Kakao

22 posts

tech.kakao.com

kakao

Kanana-2 Development Story (

Kakao has introduced Kanana-2, a series of language models that use a Mixture of Experts (MoE) architecture to achieve high intelligence while keeping inference costs low. To support stable pre-training of the largest model, at 155B parameters, the team adopted a technical stack that includes the Muon optimizer and MuonClip to prevent training instabilities. These developments reflect a strategic focus on balancing large-scale performance with "high-efficiency, low-cost" engineering.

### MoE Architecture and Scaling Strategy

* Kanana-2 models, such as the 30B version, activate only 3B parameters during inference to maximize computational efficiency without sacrificing the intelligence of a larger model.
* The team is currently training a 155B parameter version (Kanana-2-155b-a17b) using FP8 training infrastructure, MuonClip, and Hyperparameter Transfer to ensure stable convergence.
* Custom-developed MoE kernels were integrated to reduce memory usage and increase training speed, resulting in a highly stable loss curve even during constant learning rate phases.

### A Controlled Testbed for Mid- and Post-Training

* The Kanana-2-30b-a3b-base-2601 model was intentionally released without synthetic reasoning data to serve as a "clean" base for research.
* This model allows researchers to investigate phenomena like "Reasoning Trace Distribution Mismatch" and "Spurious Rewards" by providing a baseline unaffected by post-training interventions.
* By offering a high-quality Korean base model, Kakao aims to support the local AI community in conducting more rigorous experiments on mathematical and logical reasoning.

### Optimization with Muon and Polar Express

* Kakao shifted from the industry-standard AdamW optimizer to Muon, which updates parameters by orthogonalizing gradients rather than performing element-wise updates.
* To achieve more accurate orthogonalization, they implemented the Polar Express iterative algorithm instead of the standard Newton-Schulz method, aiming to reduce noise in weight updates during the latter stages of large-scale training.
* The optimization process also involved detailed adjustments to RMSNorm parameterization and learning rate (LR) management to ensure the model scales effectively.

### Training Stability via MuonClip

* To address potential "logit explosion" in large-scale models, the team used MuonClip, a technique that clips attention logits to maintain stability.
* Because standard Flash Attention stores Max Logit values only on-chip, the team modified the Flash Attention kernels to extract and return these values for monitoring and clipping purposes.
* Stress tests conducted with high learning rates showed that MuonClip prevents training divergence and maintains performance levels even when the model is pushed to its limits.

The development of Kanana-2 demonstrates that scaling to hundreds of billions of parameters requires more than just data; it also requires deep architectural optimizations and custom kernel engineering. For organizations looking to train large-scale MoE models, adopting orthogonalization-based optimizers and logit clipping mechanisms is highly recommended to ensure predictable and stable model convergence.
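
To make the idea of "orthogonalizing gradients" concrete, here is a minimal pure-Python sketch. It is not Kakao's implementation: the summary gives no Polar Express coefficients, so the sketch falls back on the classical cubic Newton-Schulz iteration, and the function names and the 2x2 toy gradient are assumptions made for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(grad, steps=20):
    """Approximate the orthogonal polar factor U V^T of a full-rank matrix.

    Uses the textbook cubic iteration X <- 1.5 X - 0.5 X X^T X
    (an assumption for this sketch, not the Polar Express schedule).
    """
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which lies inside the convergence region of the iteration.
    norm = math.sqrt(sum(v * v for row in grad for v in row))
    x = [[v / norm for v in row] for row in grad]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxt_x)]
    return x


grad = [[3.0, 1.0], [-1.0, 2.0]]           # hypothetical toy "gradient"
update = newton_schulz_orthogonalize(grad)
gram = matmul(update, transpose(update))   # near the identity if update is orthogonal
```

Each iteration pushes every singular value of the matrix toward 1 while leaving the singular vectors untouched, so the resulting update keeps the gradient's directions but discards its per-direction scale. That is the contrast with element-wise optimizers such as AdamW that the summary points to.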

kakao

Kanana-2 Development Log (

Kakao’s development of the Kanana-2 model family represents a strategic shift toward Agentic AI, prioritizing complex reasoning and execution capabilities over simple conversational fluency. By implementing a sophisticated post-training pipeline, including a specialized Mid-training stage and refined reinforcement learning, the team enhanced the model's instruction-following and tool-calling performance. This methodology ensures that the 30B parameter models excel in logical tasks and real-world agentic environments while maintaining high linguistic stability in both English and Korean.

## Mid-training and Catastrophic Forgetting Prevention

* A 250B token Mid-training stage was introduced between Pre-training and Post-training to bridge the gap in reasoning, coding, and tool-calling capabilities.
* The dataset comprised 200B tokens of high-quality reasoning data (Chain-of-Thought math and code) and 50B tokens of "replay" data from the original pre-training set.
* This replay strategy specifically targeted "Catastrophic Forgetting," preventing the model from losing its Korean linguistic nuances and performance on benchmarks like KoMT-bench while it gained English-heavy reasoning skills.
* Experimental results indicated that Mid-training serves as a foundational "force multiplier," leading to faster convergence and higher performance ceilings during subsequent Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) stages.

## Enhanced Instruction Following and Tool Calling

* To optimize for Agentic AI, the developers focused on Instruction Following (IFEval) by synthesizing high-quality, long-form responses that strictly adhere to complex constraints.
* Tool-calling capabilities were improved using "Rejection Sampling" (Iterative SFT), where model-generated trajectories are validated in a real execution environment; only successful outcomes are retained for training.
* The training data was categorized into distinct buckets (such as Chat, Math, Code, and Tool Calling), allowing for a more balanced recipe compared to previous Kanana versions.
* This approach specifically addressed multi-turn and multi-tool scenarios, ensuring the model can handle the recursive logic required for autonomous agents.

## Parallel Reinforcement Learning and Calibration Tuning

* A "Parallel RL" framework was adopted to optimize different capabilities simultaneously: the "Chat" track focused on helpfulness and safety, while the "Logic" track focused on accuracy in math and programming.
* The pipeline moved beyond standard SFT to include Reinforcement Learning from Human Feedback (RLHF), utilizing DPO and PPO-style methods to align the model with human preferences.
* A final "Calibration Tuning" step was implemented to ensure the model’s internal confidence levels match its actual accuracy, effectively reducing hallucinations and improving reliability in technical tasks.
* Comparative benchmarks show that the Kanana-2 Instruct and Thinking models significantly outperform earlier versions and rival larger open-source models in reasoning and coding benchmarks like HumanEval and GSM8K.

The Kanana-2 development cycle demonstrates that achieving "Agentic" performance requires more than just scaling data; it requires a structured transition from general language understanding to execution-verified reasoning. For organizations building AI agents, the Kanana-2 post-training recipe suggests that integrating environment-validated feedback and balancing reasoning data with foundational language "replays" is critical for creating reliable, multi-functional models.
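
The "Rejection Sampling" step described above (generate trajectories, run them in an execution environment, keep only the successes) can be sketched roughly as follows. This is a toy illustration under stated assumptions: the `Trajectory` type, the `TOOLS` registry, and the success criterion (the executed result must equal the model's claimed answer) are all hypothetical stand-ins, not Kakao's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    prompt: str
    tool_calls: list       # (tool_name, args) tuples proposed by the model
    final_answer: float    # answer the model claims after using the tools


# Hypothetical tool registry standing in for a real execution environment.
TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def run_in_environment(traj):
    """Execute every tool call; return the last result, or None on any failure."""
    result = None
    for name, args in traj.tool_calls:
        tool = TOOLS.get(name)
        if tool is None:
            return None            # the model called a tool that does not exist
        try:
            result = tool(*args)
        except TypeError:
            return None            # malformed arguments
    return result


def rejection_sample(candidates):
    """Keep only trajectories whose executed result backs the claimed answer."""
    kept = []
    for traj in candidates:
        outcome = run_in_environment(traj)
        if outcome is not None and outcome == traj.final_answer:
            kept.append(traj)
    return kept


candidates = [
    Trajectory("What is 6 * 7?", [("mul", (6, 7))], 42),     # valid
    Trajectory("What is 6 * 7?", [("add", (6, 7))], 42),     # answer not backed by the tool
    Trajectory("What is 6 * 7?", [("divide", (6, 7))], 42),  # unknown tool
    Trajectory("What is 6 * 7?", [("mul", (6,))], 42),       # malformed arguments
]
sft_data = rejection_sample(candidates)   # only the first trajectory survives
```

In an iterative setup, the retained trajectories would be added to the SFT set, the model retrained, and new candidates sampled from the improved model; the sketch shows only a single filtering round.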

kakao

Kakao’s “

Kakao's Kanana-v-4b-hybrid is a multimodal language model designed to go beyond simple image-to-text conversion by integrating logical reasoning and self-verification directly into its response process. By employing a hybrid architecture that handles both intuitive dialogue and complex visual reasoning within a single model, it achieves high accuracy and reliability for sophisticated tasks. This approach allows the model to maintain consistency in user experience while excelling in Korean-specific contexts, as evidenced by its record-breaking 92.8 score on the KoNET evaluation.

### Integrated Hybrid Architecture

* Consolidates intuitive tasks (like OCR and summarization) and logical tasks (complex reasoning) into a single model to reduce system complexity and maintenance costs.
* Eliminates the need for external routing between specialized models, ensuring a consistent tone, response format, and safety policy throughout a single conversation session.
* Utilizes a refined training recipe that balances data ratios and visual reasoning training to ensure that improvements in multimodal understanding benefit all types of user queries.

### Visual Reasoning and Self-Reflection

* Follows a natural logic flow: synthesizing information from images and text, applying conditions, verifying candidates, and finally concluding the response.
* Features a "Reflection" mechanism where the model actively monitors its own thought process to catch "small but fatal" errors, such as calculation mistakes or missed constraints.
* Excels in high-stakes visual tasks like receipt auditing, table filtering, and mathematical problem-solving by double-checking intermediate results against original image data.

### Native Korean Logical Processing

* Prioritizes "thinking in Korean" to accurately preserve the nuances of complex constraints, such as "except for X" or "only in cases of Y," which are often lost during internal translation.
* Develops a native Korean Rationale process to prevent logical drift, ensuring that the internal reasoning steps remain aligned with the linguistic structure of the user's query.
* Addresses the difficulty of processing information scattered throughout Korean-language documents or exam papers by synthesizing data without language-conversion overhead.

Kanana-v-4b-hybrid marks a shift toward "verifiable AI" that provides evidence-based answers rather than just plausible text. For applications in education, finance, or complex document processing, this model offers a blueprint for building trust through transparent reasoning and self-correction.

kakao

Development of an Ultra-lightweight Classic

Kakao developed a specialized, lightweight morphological analyzer to meet the strict resource constraints of mobile environments, where modern deep-learning models are often too heavy. By opting for a classical Viterbi-based approach implemented in C++20, the team reduced the library's binary size to approximately 200KB while ensuring high performance. This development highlights how traditional algorithmic optimization and careful language selection remain vital for mobile software efficiency.

## The Choice of C++ over Rust

- While Rust was considered for its safety, it was ultimately rejected because its default binary size (even with optimization) reached several megabytes, which was too large for the specific project requirements.
- C++ was chosen because mobile platforms like iOS and Android already include standard libraries (libc++ or libstdc++), allowing the final analyzer binary to be stripped down to core logic.
- The project used C++20 features such as Concepts and `std::span` to replace older patterns like SFINAE and `gsl::span`, resulting in more readable and maintainable code without sacrificing performance.

## Trie Compression using LOUDS

- To minimize the dictionary size, the team implemented a LOUDS (Level-Order Unary Degree Sequence) structure, which represents a Trie using a bit sequence instead of pointers.
- This approach provides a compression rate near the information-theoretic lower bound, allowing approximately 760,000 nodes to be stored in just 9.4MB.
- Further optimization was achieved through a custom encoding scheme that represents Hangul in 2 bytes and English in 1 byte, significantly reducing the dictionary's memory footprint compared to standard UTF-8.

## Optimizing the Select Bit Operation

- Initial performance profiling showed that the `select0` operation (finding the N-th zero in a bit sequence) consumed 90% of the dictionary search time due to linear search overhead.
- The solution involved dividing the bit sequence into 64-bit chunks and storing the cumulative count of zeros at each chunk boundary in a separate array.
- By using binary search to find the correct chunk and applying parallel bit-counting techniques for intra-chunk searching, the dictionary search time was reduced from 165ms to 10ms.
- These optimizations improved total analysis time from 182ms to 28ms, making the tool highly responsive for real-time mobile use.

For mobile developers facing strict hardware limitations, this project shows that combining classical data structures like LOUDS with modern low-level language features can yield performance and size benefits that deep learning alternatives currently cannot match.
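
The chunked `select0` described above can be sketched as follows. The original analyzer is written in C++20; this Python version is only an illustration of the indexing idea, and the class name, the toy bit pattern, and the plain loop used inside a chunk (in place of the parallel bit-counting the team applied) are assumptions of the sketch.

```python
from bisect import bisect_left

WORD = 64  # chunk width in bits, as in the description above


def naive_select0(bits, k):
    """Linear-scan baseline: position of the k-th zero (1-based)."""
    seen = 0
    for pos, bit in enumerate(bits):
        if bit == 0:
            seen += 1
            if seen == k:
                return pos
    raise IndexError("fewer than k zeros")


class Select0Index:
    def __init__(self, bits):
        self.n = len(bits)
        # Pack the bit sequence into 64-bit words (bit i of a word = offset i).
        self.words = []
        for start in range(0, self.n, WORD):
            word = 0
            for offset, bit in enumerate(bits[start:start + WORD]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
        # zeros_before[i] = number of zeros in all chunks before chunk i.
        self.zeros_before = [0]
        for i, word in enumerate(self.words):
            valid = min(WORD, self.n - i * WORD)   # last chunk may be partial
            ones = bin(word).count("1")
            self.zeros_before.append(self.zeros_before[-1] + valid - ones)

    def select0(self, k):
        """Position of the k-th zero (1-based) via binary search over chunks."""
        if k < 1 or k > self.zeros_before[-1]:
            raise IndexError("fewer than k zeros")
        # First boundary whose cumulative count reaches k; the chunk is just before it.
        chunk = bisect_left(self.zeros_before, k) - 1
        remaining = k - self.zeros_before[chunk]
        word = self.words[chunk]
        for offset in range(WORD):                 # simple intra-chunk scan
            if not (word >> offset) & 1:
                remaining -= 1
                if remaining == 0:
                    return chunk * WORD + offset
        raise AssertionError("unreachable when the index is consistent")


# Toy sequence: zeros at every multiple of 3, so the k-th zero is at 3 * (k - 1).
bits = [0 if i % 3 == 0 else 1 for i in range(200)]
index = Select0Index(bits)
```

The binary search replaces a scan over the whole sequence with a lookup over one small integer per chunk, leaving at most 64 bits to inspect afterwards. In a LOUDS trie, each zero terminates one node's child list, which is why `select0` sits on the hot path of every dictionary lookup.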

kakao

Smarter and More

Kakao has released Kanana-2, a high-performance open-source language model specifically engineered to power Agentic AI by enhancing tool-calling and instruction-following capabilities. Surpassing its predecessors and rivaling global frontier models like Qwen3, Kanana-2 offers a versatile suite of variants designed for practical, high-efficiency application in complex service environments.

### Optimized Model Lineup: Base, Instruct, and Thinking

* **Kanana-2-30b-a3b-base:** Provided as a foundational model with pre-training weights, allowing researchers to fine-tune the model using their own datasets.
* **Kanana-2-30b-a3b-instruct:** A version optimized through post-training to maximize the model's ability to follow complex user instructions accurately.
* **Kanana-2-30b-a3b-thinking:** Kakao’s first reasoning-specialized model, designed for tasks requiring high-level logical thinking, such as mathematics and coding.

### Strengthening Agentic AI Capabilities

* **Tool Calling:** Multi-turn tool-calling performance has improved more than threefold compared to Kanana-1.5, significantly enhancing its utility with the Model Context Protocol (MCP).
* **Instruction Following:** The model's ability to understand and execute multi-step, complex user requirements has been refined to ensure reliable task completion.
* **Reasoning-Tool Integration:** Unlike many reasoning models that lose instruction-following quality during deep thought, the "Thinking" variant maintains high performance in both logical deduction and tool use.

### High-Efficiency Architecture for Scale

* **MLA (Multi-head Latent Attention):** Compresses memory usage to handle long contexts more efficiently, reducing the resources needed for extensive data processing.
* **MoE (Mixture of Experts):** Activates only the necessary parameters during inference, maintaining high performance while drastically reducing computational costs and response times.
* **Improved Tokenization:** A newly trained tokenizer has improved Korean language token efficiency by 30%, enabling faster throughput and lower latency in high-traffic environments like KakaoTalk.

### Expanded Multilingual Support

* **Broad Linguistic Reach:** The model has expanded its support from just Korean and English to include six languages: Korean, English, Japanese, Chinese, Thai, and Vietnamese.

By open-sourcing Kanana-2, Kakao provides a robust foundation for developers seeking to build responsive, tool-integrated AI services. Its focus on practical efficiency and advanced reasoning makes it an ideal choice for implementing agentic workflows in real-world applications where speed and accuracy are critical.
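
As an illustration of how an MoE layer "activates only the necessary parameters", the following sketch routes one input to the top-k experts by router score. The summary does not describe Kanana-2's router, so everything here is an assumption: the four toy experts, `k=2`, the softmax gating, and the renormalization of the selected gate weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = route_top_k(router_logits, k)
    output = sum(weight * experts[i](x) for i, weight in gates.items())
    return output, sorted(gates)


# Four hypothetical "experts", each just scaling its input by a constant.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]     # made-up router scores for one token
output, active = moe_forward(10.0, experts, router_logits, k=2)
```

Only the experts listed in `active` are evaluated, so per-token compute scales with `k` rather than with the total number of experts. This is the property behind a naming scheme like `30b-a3b`: about 30B parameters stored, about 3B used for each token.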

kakao

12 Reasons to Upgrade to MongoDB

MongoDB 8.0 marks a significant shift in the database's evolution, moving away from simple feature expansion to prioritize architectural stability and substantial performance gains. By addressing historical criticisms regarding write latency and query overhead, this release establishes a robust foundation for enterprise-scale applications requiring high throughput and long-term reliability.

### Extended Support and Release Strategy

* MongoDB 8.0 is designated for five years of support (until October 2029), offering a stable "LTS-like" window that reduces the resource burden of frequent major upgrades.
* The "Rapid Release" policy, previously exclusive to MongoDB Atlas, now extends to on-premises environments, allowing self-managed users to access minor release features and improvements more quickly.
* This policy change provides DBAs with greater strategic flexibility to choose between prioritizing stability or adopting new features.

### Optimized "Majority" Write Concern

* The criterion for "majority" write acknowledgment has shifted from `lastApplied` (when data is written to the data file) to `lastWritten` (when the entry is recorded in the `oplog.rs` collection).
* This change bypasses the wait time for secondary nodes to physically apply changes to their storage engines, resulting in a 30–47% improvement in write throughput.
* While this improves speed, applications that read from secondaries immediately after a write may need to implement Causally Consistent Sessions to ensure they see the most recent data.

### Efficient Bulk Operations

* A new database-level `bulkWrite` command allows for operations across multiple collections within a single request, reducing network round-trip costs.
* The system now groups multiple document inserts (up to a default of 500) into a single oplog entry instead of creating individual entries for every document.
* This grouping aligns the oplog process with the WiredTiger storage engine’s internal batching, significantly reducing replication lag and improving overall write efficiency.

### High-Speed Indexing with Express Plan

* MongoDB 8.0 introduces the "Express Plan" to optimize high-frequency, simple queries by bypassing the traditional multi-stage query optimizer.
* Queries are eligible for this fast-track execution if they are point queries on the `_id` field or equality searches on fields with unique indexes (or queries using `limit: 1`).
* By skipping the overhead of query parsing, normalization, and plan stage construction, the Express Plan maximizes CPU efficiency for the most common database interaction patterns.

For organizations managing large-scale production environments, MongoDB 8.0 is a highly recommended upgrade. The combination of a five-year support lifecycle and fundamental improvements to replication and query execution makes it the most performant and operationally sound version of the database to date.
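
To get a feel for the oplog grouping described under "Efficient Bulk Operations", the toy model below counts oplog entries with and without batching. It is plain arithmetic, not MongoDB code: the helper names are hypothetical, and the only figure taken from the summary is the default cap of 500 inserts per entry.

```python
def batch_inserts(docs, max_batch=500):
    """Group documents into oplog-style batches of at most `max_batch` inserts."""
    return [docs[i:i + max_batch] for i in range(0, len(docs), max_batch)]


def oplog_entries_batched(num_inserts, max_batch=500):
    """Number of oplog entries when inserts are grouped (ceiling division)."""
    return -(-num_inserts // max_batch)


docs = [{"_id": i} for i in range(1250)]   # hypothetical bulk insert

entries_before = len(docs)                 # one oplog entry per document
batches = batch_inserts(docs)
entries_after = len(batches)               # grouped entries
batch_sizes = [len(b) for b in batches]
```

Under this model, a 1,250-document insert produces 3 oplog entries instead of 1,250. That gives a sense of why secondaries would have far fewer entries to fetch and apply, although the real saving depends on document sizes and on how the server actually forms its batches.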