Kakao

13 posts

tech.kakao.com


Kanana-2 Development Story (2)

Kakao’s development of the Kanana-2 model family represents a strategic shift toward Agentic AI, prioritizing complex reasoning and execution capabilities over simple conversational fluency. By implementing a sophisticated post-training pipeline—including a specialized Mid-training stage and refined reinforcement learning—the team successfully enhanced the model's instruction-following and tool-calling performance. This methodology ensures that the 30B parameter models excel in logical tasks and real-world agentic environments while maintaining high linguistic stability in both English and Korean.

## Mid-training and Catastrophic Forgetting Prevention

* A 250B token Mid-training stage was introduced between Pre-training and Post-training to bridge the gap in reasoning, coding, and tool-calling capabilities.
* The dataset comprised 200B tokens of high-quality reasoning data (Chain-of-Thought math and code) and 50B tokens of "replay" data from the original pre-training set.
* This replay strategy specifically targeted "Catastrophic Forgetting," preventing the model from losing its Korean linguistic nuances and performance on benchmarks like KoMT-Bench while it gained English-heavy reasoning skills.
* Experimental results indicated that Mid-training serves as a foundational "force multiplier," leading to faster convergence and higher performance ceilings during subsequent Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) stages.

## Enhanced Instruction Following and Tool Calling

* To optimize for Agentic AI, the developers focused on Instruction Following (IFEval) by synthesizing high-quality, long-form responses that strictly adhere to complex constraints.
* Tool-calling capabilities were improved using "Rejection Sampling" (Iterative SFT), where model-generated trajectories are validated in a real execution environment; only successful outcomes are retained for training (a minimal sketch of this loop follows this summary).
* The training data was categorized into distinct buckets—such as Chat, Math, Code, and Tool Calling—allowing for a more balanced recipe compared to previous Kanana versions.
* This approach specifically addressed multi-turn and multi-tool scenarios, ensuring the model can handle the recursive logic required for autonomous agents.

## Parallel Reinforcement Learning and Calibration Tuning

* A "Parallel RL" framework was adopted to optimize different capabilities simultaneously: the "Chat" track focused on helpfulness and safety, while the "Logic" track focused on accuracy in math and programming.
* The pipeline moved beyond standard SFT to include Reinforcement Learning from Human Feedback (RLHF), utilizing DPO- and PPO-style methods to align the model with human preferences.
* A final "Calibration Tuning" step was implemented to ensure the model’s internal confidence levels match its actual accuracy, effectively reducing hallucinations and improving reliability in technical tasks.
* Comparative benchmarks show that the Kanana-2 Instruct and Thinking models significantly outperform earlier versions and rival larger open-source models in reasoning and coding benchmarks like HumanEval and GSM8K.

The Kanana-2 development cycle demonstrates that achieving "Agentic" performance requires more than just scaling data; it requires a structured transition from general language understanding to execution-verified reasoning. For organizations building AI agents, the Kanana-2 post-training recipe suggests that integrating environment-validated feedback and balancing reasoning data with foundational language "replays" is critical for creating reliable, multi-functional models.
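The tool-calling bullets above describe rejection sampling against a real execution environment. Below is a minimal sketch of that loop, assuming hypothetical `generate_trajectory` and `execute_and_check` helpers that are not from the article; it only illustrates the keep-what-verifies idea.

```python
import random
from typing import Callable, Dict, List

def rejection_sampling_round(
    prompts: List[str],
    generate_trajectory: Callable[[str], Dict],  # model proposes a tool-call trajectory (hypothetical helper)
    execute_and_check: Callable[[Dict], bool],   # replays the calls in a real environment (hypothetical helper)
    samples_per_prompt: int = 4,
) -> List[Dict]:
    """Keep only trajectories whose tool calls actually succeed when executed."""
    accepted = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            trajectory = generate_trajectory(prompt)
            if execute_and_check(trajectory):            # environment-validated feedback
                accepted.append({"prompt": prompt, "trajectory": trajectory})
                break                                    # one verified sample per prompt is enough here
    random.shuffle(accepted)
    return accepted                                      # feeds the next SFT round
```

In an iterative SFT setup, the accepted trajectories would be folded back into the training mix and the cycle repeated.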


Kanana-2 Development Story (1)

Kakao has introduced Kanana-2, a series of language models utilizing a Mixture of Experts (MoE) architecture to achieve high intelligence while maintaining low inference costs. To support the stable pre-training of their largest 155B parameter model, the team implemented advanced technical stacks including the Muon optimizer and MuonClip to prevent training instabilities. These developments reflect a strategic focus on balancing large-scale performance with "high-efficiency, low-cost" engineering.

### MoE Architecture and Scaling Strategy

* Kanana-2 models, such as the 30B version, activate only 3B parameters during inference to maximize computational efficiency without sacrificing the intelligence of a larger model.
* The team is currently training a massive 155B parameter version (Kanana-2-155b-a17b) using FP8 training infrastructure, MuonClip, and Hyperparameter Transfer to ensure stable convergence.
* Custom-developed MoE kernels were integrated to reduce memory usage and increase training speed, resulting in a highly stable loss curve even during constant learning rate phases.

### A Controlled Testbed for Mid- and Post-Training

* The Kanana-2-30b-a3b-base-2601 model was intentionally released without synthetic reasoning data to serve as a "clean" base for research.
* This model allows researchers to investigate phenomena like "Reasoning Trace Distribution Mismatch" and "Spurious Rewards" by providing a baseline unaffected by post-training interventions.
* By offering a high-quality Korean base model, Kakao aims to support the local AI community in conducting more rigorous experiments on mathematical and logical reasoning.

### Optimization with Muon and Polar Express

* Kakao shifted from the industry-standard AdamW optimizer to Muon, which updates parameters by orthogonalizing gradients rather than performing element-wise updates.
* To achieve more accurate orthogonalization, they implemented the Polar Express iterative algorithm instead of the standard Newton-Schulz method, aiming to reduce noise in weight updates during the latter stages of large-scale training (a sketch of the baseline Newton-Schulz step follows this summary).
* The optimization process also involved detailed adjustments to RMSNorm parameterization and learning rate (LR) management to ensure the model scales effectively.

### Training Stability via MuonClip

* To address potential "logit explosion" in large-scale models, the team utilized MuonClip, a technique that clips attention logits to maintain stability.
* Because standard Flash Attention stores Max Logit values only on-chip, the team modified the Flash Attention kernels to extract and return these values for monitoring and clipping purposes.
* Stress tests conducted with high learning rates proved that MuonClip prevents training divergence and maintains performance levels even when the model is pushed to its limits.

The development of Kanana-2 demonstrates that scaling to hundreds of billions of parameters requires more than just data; it necessitates deep architectural optimizations and custom kernel engineering. For organizations looking to train large-scale MoE models, adopting sophisticated orthogonalization optimizers and logit clipping mechanisms is highly recommended to ensure predictable and stable model convergence.
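For context on the orthogonalization step that Muon performs (and that the article says was swapped for Polar Express), here is a minimal NumPy sketch of the quintic Newton-Schulz iteration. The coefficients and normalization follow the publicly available Muon recipe, not Kakao's internal implementation.

```python
import numpy as np

def newton_schulz_orthogonalize(G: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately orthogonalize a gradient matrix G, as Muon-style optimizers do.

    Uses the quintic Newton-Schulz iteration; the article replaces this step with
    the Polar Express iteration to reduce orthogonalization noise.
    """
    a, b, c = 3.4445, -4.7750, 2.0315            # coefficients from the public Muon recipe
    X = G / (np.linalg.norm(G) + eps)            # normalize so the spectrum lies in the convergence region
    transposed = X.shape[0] > X.shape[1]
    if transposed:                               # work with the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

# Example: singular values of the update are pushed toward 1 rather than left at their raw scale.
G = np.random.randn(64, 256)
U = newton_schulz_orthogonalize(G)
print(np.linalg.svd(U, compute_uv=False)[:5])
```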


The Development of Kakao's "

Kakao's Kanana-v-4b-hybrid is a multimodal language model designed to transcend simple image-to-text conversion by integrating logical reasoning and self-verification directly into its response process. By employing a hybrid architecture that handles both intuitive dialogue and complex visual reasoning within a single model, it achieves high accuracy and reliability for sophisticated tasks. This approach allows the model to maintain consistency in user experience while excelling in Korean-specific contexts, as evidenced by its record-breaking 92.8 score on the KoNET evaluation.

### Integrated Hybrid Architecture

* Consolidates intuitive tasks (like OCR and summarization) and logical tasks (complex reasoning) into a single model to reduce system complexity and maintenance costs.
* Eliminates the need for external routing between specialized models, ensuring a consistent tone, response format, and safety policy throughout a single conversation session.
* Utilizes a refined training recipe that balances data ratios and visual reasoning training to ensure that improvements in multimodal understanding benefit all types of user queries.

### Visual Reasoning and Self-Reflection

* Follows a natural logic flow: synthesizing information from images and text, applying conditions, verifying candidates, and finally concluding the response.
* Features a "Reflection" mechanism where the model actively monitors its own thought process to catch "small but fatal" errors, such as calculation mistakes or missed constraints (an application-level sketch of this check-then-revise idea follows this summary).
* Excels in high-stakes visual tasks like receipt auditing, table filtering, and mathematical problem-solving by double-checking intermediate results against original image data.

### Native Korean Logical Processing

* Prioritizes "thinking in Korean" to accurately preserve the nuances of complex constraints, such as "except for X" or "only in cases of Y," which are often lost during internal translation.
* Develops a native Korean Rationale process to prevent logical drift, ensuring that the internal reasoning steps remain perfectly aligned with the linguistic structure of the user's query.
* Addresses the difficulty of processing information scattered throughout Korean-language documents or exam papers by synthesizing data without language-conversion overhead.

Kanana-v-4b-hybrid marks a shift toward "verifiable AI" that provides evidence-based answers rather than just plausible text. For applications in education, finance, or complex document processing, this model offers a blueprint for building trust through transparent reasoning and self-correction.
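The article describes Reflection as happening inside the model's own reasoning trace during a single response. As a rough, application-level illustration of the same check-then-revise idea (not Kakao's internal mechanism), the sketch below wires an external verification pass around a hypothetical `generate` callable.

```python
from typing import Callable

def answer_with_reflection(
    question: str,
    image_context: str,
    generate: Callable[[str], str],          # hypothetical model call, not the article's API
    max_revisions: int = 2,
) -> str:
    """Draft an answer, then ask the model to audit its own reasoning before finalizing."""
    draft = generate(
        f"Image context:\n{image_context}\n\nQuestion: {question}\nAnswer step by step."
    )
    for _ in range(max_revisions):
        critique = generate(
            "Re-check the reasoning below against the image context. "
            "List any calculation errors or violated constraints, or reply 'OK'.\n\n" + draft
        )
        if critique.strip().upper().startswith("OK"):
            break                            # no fatal errors found; keep the draft
        draft = generate(
            f"Question: {question}\nPrevious answer:\n{draft}\n"
            f"Issues found:\n{critique}\nWrite a corrected answer."
        )
    return draft
```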


Building an Ultra-lightweight

Kakao developed a specialized, lightweight morphological analyzer to meet the strict resource constraints of mobile environments where modern deep-learning models are often too heavy. By opting for a classical Viterbi-based approach implemented in C++20, the team successfully reduced the library's binary size to approximately 200KB while ensuring high performance. This development highlights how traditional algorithmic optimization and careful language selection remain vital for mobile software efficiency.

## The Choice of C++ over Rust

- While Rust was considered for its safety, it was ultimately rejected because its default binary size (even with optimization) reached several megabytes, which was too large for the specific project requirements.
- C++ was chosen because mobile platforms like iOS and Android already include standard libraries (libc++ or libstdc++), allowing the final analyzer binary to be stripped down to core logic.
- The project utilized C++20 features such as Concepts and `std::span` to replace older patterns like SFINAE and `gsl::span`, resulting in more readable and maintainable code without sacrificing performance.

## Trie Compression using LOUDS

- To minimize the dictionary size, the team implemented a LOUDS (Level-Order Unary Degree Sequence) structure, which represents a Trie using a bit sequence instead of pointers.
- This approach provides a compression rate near the information-theoretic lower bound, allowing approximately 760,000 nodes to be stored in just 9.4MB.
- Further optimization was achieved through a custom encoding scheme that represents Hangul in 2 bytes and English in 1 byte, significantly reducing the dictionary's memory footprint compared to standard UTF-8.

## Optimizing the Select Bit Operation

- Initial performance profiling showed that the `select0` operation (finding the N-th zero in a bit sequence) consumed 90% of the dictionary search time due to linear search overhead.
- The solution involved dividing the bit sequence into 64-bit chunks and storing the cumulative count of zeros at each chunk boundary in a separate array (a sketch of this scheme follows this summary).
- By using binary search to find the correct chunk and applying parallel bit-counting techniques for intra-chunk searching, the dictionary search time was reduced from 165ms to 10ms.
- These optimizations led to a total analysis time improvement from 182ms to 28ms, making the tool highly responsive for real-time mobile use.

For mobile developers facing strict hardware limitations, this project proves that combining classical data structures like LOUDS with modern low-level language features can yield performance and size benefits that deep learning alternatives currently cannot match.
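The `select0` optimization described above is easy to illustrate outside C++. Below is a plain-Python sketch with illustrative names, assuming a list-of-bits input; the production version works on packed 64-bit words with hardware popcount, which this sketch only approximates with per-chunk zero counts and a binary search.

```python
import bisect
from typing import List

class Select0Index:
    """Answer select0(n), the position of the n-th zero bit, over a LOUDS-style bit sequence.

    Mirrors the scheme above: cache the cumulative zero count at every 64-bit chunk
    boundary, binary-search the chunk, then scan only within that chunk.
    """

    CHUNK = 64

    def __init__(self, bits: List[int]):
        self.bits = bits
        self.zero_prefix = [0]                               # zeros seen before each chunk boundary
        for start in range(0, len(bits), self.CHUNK):
            block = bits[start:start + self.CHUNK]
            self.zero_prefix.append(self.zero_prefix[-1] + block.count(0))

    def select0(self, n: int) -> int:
        """Return the index of the n-th zero (1-based), or -1 if it does not exist."""
        if n <= 0 or n > self.zero_prefix[-1]:
            return -1
        chunk = bisect.bisect_left(self.zero_prefix, n) - 1  # chunk holding the n-th zero
        remaining = n - self.zero_prefix[chunk]
        pos = chunk * self.CHUNK
        while True:                                          # short scan inside a single chunk
            if self.bits[pos] == 0:
                remaining -= 1
                if remaining == 0:
                    return pos
            pos += 1

# Example: the third zero in the repeated pattern sits at index 5.
idx = Select0Index([1, 0, 1, 1, 0, 0, 1] * 100)
print(idx.select0(3))
```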


Releasing Smarter and

Kakao has released Kanana-2, a high-performance open-source language model specifically engineered to power Agentic AI by enhancing tool-calling and instruction-following capabilities. Surpassing its predecessors and rivaling global frontier models like Qwen3, Kanana-2 offers a versatile suite of variants designed for practical, high-efficiency application in complex service environments.

### Optimized Model Lineup: Base, Instruct, and Thinking

* **Kanana-2-30b-a3b-base:** Provided as a foundational model with pre-training weights, allowing researchers to fine-tune the model using their own datasets.
* **Kanana-2-30b-a3b-instruct:** A version optimized through post-training to maximize the model's ability to follow complex user instructions accurately.
* **Kanana-2-30b-a3b-thinking:** Kakao’s first reasoning-specialized model, designed for tasks requiring high-level logical thinking, such as mathematics and coding.

### Strengthening Agentic AI Capabilities

* **Tool Calling:** Multi-turn tool-calling performance has improved more than threefold compared to Kanana-1.5, significantly enhancing its utility with the Model Context Protocol (MCP).
* **Instruction Following:** The model's ability to understand and execute multi-step, complex user requirements has been refined to ensure reliable task completion.
* **Reasoning-Tool Integration:** Unlike many reasoning models that lose instruction-following quality during deep thought, the "Thinking" variant maintains high performance in both logical deduction and tool use.

### High-Efficiency Architecture for Scale

* **MLA (Multi-head Latent Attention):** Compresses memory usage to handle long contexts more efficiently, reducing the resources needed for extensive data processing.
* **MoE (Mixture of Experts):** Activates only the necessary parameters during inference, maintaining high performance while drastically reducing computational costs and response times (see the routing sketch after this summary).
* **Improved Tokenization:** A newly trained tokenizer has improved Korean language token efficiency by 30%, enabling faster throughput and lower latency in high-traffic environments like KakaoTalk.

### Expanded Multilingual Support

* **Broad Linguistic Reach:** The model has expanded its support from just Korean and English to include six languages: Korean, English, Japanese, Chinese, Thai, and Vietnamese.

By open-sourcing Kanana-2, Kakao provides a robust foundation for developers seeking to build responsive, tool-integrated AI services. Its focus on practical efficiency and advanced reasoning makes it an ideal choice for implementing agentic workflows in real-world applications where speed and accuracy are critical.
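To make the MoE point concrete, here is a minimal NumPy sketch of top-k expert routing, which is why only a few billion parameters are active per token. The expert count, dimensions, and gating details are illustrative and not Kanana-2's actual configuration.

```python
import numpy as np

def moe_forward(x: np.ndarray, gate_w: np.ndarray, experts: list, top_k: int = 2) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs by gate weight."""
    logits = x @ gate_w                                    # [tokens, num_experts]
    top = np.argsort(logits, axis=-1)[:, -top_k:]          # indices of the chosen experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen - chosen.max())
        weights /= weights.sum()                           # softmax over the selected experts only
        for w, e in zip(weights, top[t]):
            out[t] += w * experts[e](x[t])                 # only top_k experts run for this token
    return out

# Example: 8 tiny linear "experts", 2 active per token.
rng = np.random.default_rng(0)
d = 16
experts = [lambda v, W=rng.standard_normal((d, d)) / d: v @ W for _ in range(8)]
tokens = rng.standard_normal((4, d))
print(moe_forward(tokens, rng.standard_normal((d, 8)), experts).shape)   # (4, 16)
```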


12 Reasons to Upgrade to MongoDB 8.0

MongoDB 8.0 marks a significant shift in the database's evolution, moving away from simple feature expansion to prioritize architectural stability and substantial performance gains. By addressing historical criticisms regarding write latency and query overhead, this release establishes a robust foundation for enterprise-scale applications requiring high throughput and long-term reliability.

### Extended Support and Release Strategy

* MongoDB 8.0 is designated for five years of support (until October 2029), offering a stable "LTS-like" window that reduces the resource burden of frequent major upgrades.
* The "Rapid Release" policy, previously exclusive to MongoDB Atlas, now extends to on-premise environments, allowing self-managed users to access minor release features and improvements more quickly.
* This policy change provides DBAs with greater strategic flexibility to choose between prioritizing stability or adopting new features.

### Optimized "Majority" Write Concern

* The criterion for "majority" write acknowledgment has shifted from `lastApplied` (when data is written to the data file) to `lastWritten` (when the entry is recorded in the `oplog.rs` collection).
* This change bypasses the wait time for secondary nodes to physically apply changes to their storage engines, resulting in a 30–47% improvement in write throughput.
* While this improves speed, applications that read from secondaries immediately after a write may need to implement Causally Consistent Sessions to ensure they see the most recent data (see the session sketch after this summary).

### Efficient Bulk Operations

* A new database-level `bulkWrite` command allows for operations across multiple collections within a single request, reducing network round-trip costs.
* The system now groups multiple document inserts (up to a default of 500) into a single oplog entry instead of creating individual entries for every document.
* This grouping aligns the oplog process with the WiredTiger storage engine’s internal batching, significantly reducing replication lag and improving overall write efficiency.

### High-Speed Indexing with Express Plan

* MongoDB 8.0 introduces the "Express Plan" to optimize high-frequency, simple queries by bypassing the traditional multi-stage query optimizer.
* Queries are eligible for this fast-track execution if they are point queries on the `_id` field or equality searches on fields with unique indexes (or queries using `limit: 1`).
* By skipping the overhead of query parsing, normalization, and plan stage construction, the Express Plan maximizes CPU efficiency for the most common database interaction patterns.

For organizations managing large-scale production environments, MongoDB 8.0 is a highly recommended upgrade. The combination of a five-year support lifecycle and fundamental improvements to replication and query execution makes it the most performant and operationally sound version of the database to date.
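The write-concern section notes that applications reading from secondaries immediately after a write may need causally consistent sessions. Here is a minimal PyMongo sketch of that pattern; the connection string, database, and collection names are placeholders.

```python
from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference
from pymongo.write_concern import WriteConcern

# Placeholder connection string for a local replica set.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
orders = client["shop"]["orders"]

# Write with w:"majority" (acknowledged under 8.0's lastWritten semantics), then read
# from a secondary inside a causally consistent session so the read observes that write.
with client.start_session(causal_consistency=True) as session:
    majority_orders = orders.with_options(write_concern=WriteConcern("majority"))
    majority_orders.insert_one({"sku": "A-100", "qty": 2}, session=session)

    secondary_orders = orders.with_options(read_preference=ReadPreference.SECONDARY_PREFERRED)
    print(secondary_orders.find_one({"sku": "A-100"}, session=session))
```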


Korean and Images All at Once: Kak

Kakao has developed Kanana-v-embedding, a specialized multimodal embedding model designed to bridge the gap between Korean text and visual data within a unified semantic space. By leveraging a Vision-Language Model (VLM) framework, the model enables seamless search and recommendation across various combinations of text and images, offering a significant performance boost over existing English-centric models like CLIP. This development provides a robust technical foundation for enhancing Kakao’s services, including RAG-based systems and localized content discovery.

### Unified Multimodal Meaning Space

* The model maps text and images into a single vector space where semantic similarity is measured via cosine similarity.
* Unlike traditional CLIP models that use independent encoders, this architecture treats text and images as a single sequence, allowing for "text + image" combined queries.
* It supports four primary interaction modes: Text-to-Text, Text-to-Image, Image-to-Image, and (Text+Image)-to-(Text+Image).

### VLM-Based Architecture and Instruction Tuning

* The system utilizes a VLM consisting of an LLM and an image encoder, extracting embeddings from the final hidden state of the [EOS] token.
* It employs instruction-based query embedding, where specific prompts (e.g., "Find an image matching this caption") guide the model to generate embeddings tailored to the specific task, such as retrieval or classification.
* The model is optimized for the Korean language and cultural context, addressing the limitations of previous models that struggled with non-English data.

### Advanced Training for Scalability and Precision

* **Gradient Caching:** To overcome GPU memory limitations, this technique allows the model to train with effectively large batch sizes, which is critical for the InfoNCE loss used in contrastive learning.
* **Matryoshka Representation Learning (MRL):** The model supports flexible embedding sizes ranging from 64 to 2,048 dimensions. This allows services to choose between low-latency (smaller dimensions) or high-precision (larger dimensions) without retraining (a truncation sketch follows this summary).
* **Hard Negative Mining:** The training process incorporates "hard negatives"—items that are similar but incorrect—to sharpen the model’s ability to distinguish between subtle differences in data.

### Performance Benchmarks and Efficiency

* Kanana-v-embedding significantly outperforms CLIP and VLM2Vec on the KoEmbed benchmark, particularly in Korean Text-to-Image and Image-to-Text retrieval tasks.
* In the M-BEIR (Multimodal Benchmark for Retrieval), the model demonstrated superior performance in multimodal document retrieval and image-to-text tasks compared to established open-source models.
* Evaluation of MRL showed that the model retains high accuracy even when dimensions are reduced to 256 or 512, providing a 4x to 8x improvement in storage and search efficiency with minimal loss in quality.

For organizations looking to implement multimodal RAG or advanced recommendation systems in Korean-language environments, Kanana-v-embedding offers a highly adaptable solution. Its ability to balance computational cost and retrieval quality through Matryoshka learning makes it particularly suitable for large-scale production environments where latency is a primary concern.
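The Matryoshka usage pattern described above is simple in practice: keep only the leading dimensions of an embedding, re-normalize, and search by cosine similarity. The NumPy sketch below uses random vectors and illustrative shapes rather than actual model outputs.

```python
import numpy as np

def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep only the first `dim` dimensions of a Matryoshka-style embedding and re-normalize."""
    cut = emb[..., :dim]
    return cut / np.linalg.norm(cut, axis=-1, keepdims=True)

def cosine_search(query: np.ndarray, corpus: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Rank corpus items by cosine similarity to the query (both already L2-normalized)."""
    scores = corpus @ query
    return np.argsort(scores)[::-1][:top_k]

# Illustrative shapes only: 2,048-dim full embeddings truncated to 256 dims for cheaper search.
rng = np.random.default_rng(0)
full_corpus = rng.standard_normal((1000, 2048))
full_query = rng.standard_normal(2048)

corpus_256 = truncate_and_normalize(full_corpus, 256)
query_256 = truncate_and_normalize(full_query, 256)
print(cosine_search(query_256, corpus_256))
```

Truncating from 2,048 to 256 dimensions shrinks index storage and dot-product cost by 8x, which is the efficiency trade-off the benchmarks above quantify.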


The Evolution of Kanana-o

Kakao has significantly advanced its integrated multimodal model, Kanana-o, by enhancing its ability to process complex instructions across text, image, and audio inputs while enriching its emotional vocal expression. By developing specialized datasets and sophisticated training techniques for prosody, the team has bridged the performance gap between text and audio modalities. The result is a more natural, human-like AI capable of nuanced interaction and high-performance instruction following, particularly within the Korean linguistic context.

## Advancing Multimodal Instruction Following

* Addressed the "modality gap" where multimodal models often show decreased reasoning performance when processing audio inputs compared to text.
* Constructed a structured, high-quality dataset featuring complex, multi-step instructions such as summarizing a context and then translating it into a specific language or style.
* Leveraged the Speech-KoMT-Bench to evaluate performance, showing that Kanana-o significantly outperforms global competitors of similar scale in Korean-specific tasks.
* Focused on "Domain-generalization" to ensure the model's core intelligence remains stable regardless of whether the input is text, audio, or a combination of both.

## Image-Audio-Text Modality Alignment

* Developed integrated datasets to ensure that reasoning capabilities learned in text-image or text-audio contexts generalize to complex image-audio scenarios.
* Trained the model to handle tasks where users ask questions about visual information via voice, requiring the simultaneous alignment of three different data types.
* Prioritized the maintenance of "World Knowledge" during multimodal training so that the addition of new modalities does not degrade the model’s factual accuracy.

## Enhancing Vocal Expressiveness and Prosody

* Focused on "prosody"—the rhythm, pitch, and stress of speech—to move beyond robotic, flat text-to-speech (TTS) outputs.
* Implemented a system of descriptive tokens and emotion tags (e.g., "warm voice," "excited tone") during training to give the model fine-grained control over its vocal persona.
* Incorporated natural human speech elements, such as realistic breathing patterns and contextual variations in speech speed, to make interactions feel more intuitive and less synthetic.
* Refined the model's ability to interpret the user's emotional state from their voice and respond with a matching emotional intensity.

The evolution of Kanana-o highlights a shift from simply maximizing generic benchmarks to optimizing real-world user experiences through multimodal alignment and emotional intelligence. The success of this model underscores the necessity of high-quality, structured instruction data and fine-grained control over output styles to create truly conversational AI that feels natural to the user.


What AI TOP 100

The Kakao AI Native Strategy team successfully developed a complex competition system for the "AI TOP 100" event in just two weeks by replacing traditional waterfall methodologies with an AI-centric approach. By utilizing tools like Cursor and Claude Code, the team shifted the developer’s role from manual coding to high-level orchestration and validation. This experiment demonstrates that AI does not replace developers but rather redefines the "standard" of productivity, moving the focus from execution speed to strategic decision-making.

### Rapid Prototyping as the New Specification

* The team eliminated traditional, lengthy planning documents and functional specifications.
* Every team member was tasked with creating a working prototype using AI based on their own interpretation of the project goals.
* One developer produced six different versions of the system independently, allowing the team to "see" ideas rather than read about them.
* Final requirements were established by reviewing and merging the best features of these functional prototypes, significantly reducing communication overhead.

### AI-Native Development and 99% Delegation

* The majority of the codebase (over 99%) was generated by AI tools like Claude Code and Cursor, with developers focusing on intent and review.
* One developer recorded an extreme usage of 200 million tokens in a single day to accelerate system completion.
* The high productivity of AI allowed a single frontend developer to manage the entire UI for both the preliminary and main rounds, a task that typically requires a much larger team.
* The development flow moved away from linear "think-code-test" patterns to a "dialogue-based" implementation where ideas were instantly turned into code.

### PoC-Driven Development (PDD)

* The team adopted a "Proof of Concept (PoC) Driven Development" model to handle high uncertainty and tight deadlines.
* Abstract concepts were immediately fed into AI to generate functional PoC code and architectural drafts.
* The human role shifted from "writing from scratch" to "judging and selecting" the most viable outputs generated by the AI.
* This approach allowed the team to bypass resource limitations by prioritizing speed and functional verification over perfectionist documentation.

### Human Governance and the Role of Experience

* Internal conflicts occasionally arose when different AI models suggested equally "logical" but conflicting architectural solutions.
* Senior developers played a critical role in breaking these deadlocks by applying real-world experience regarding long-term maintainability and system constraints.
* While AI provided the "engine" for speed, human intuition remained the "steering wheel" to ensure the system met specific organizational standards.
* The project highlighted that as AI handles more of the implementation, a developer’s ability to judge code quality and architectural fit becomes their most valuable asset.

This project serves as a blueprint for the future of software engineering, where AI is treated as a peer programmer rather than a simple tool. To stay competitive, development teams should move away from rigid waterfall processes and embrace a PoC-centric workflow that leverages AI to collapse the distance between ideation and deployment.


Y is Watching – The Story of Kak

Kakao developed YEYE, a dedicated Attack Surface Management (ASM) system, to proactively identify and manage the organization's vast digital footprint, including IPs, domains, and open ports. By integrating automated scanning with a human-led Daily Security Review (DSR) process, the platform transforms raw asset data into actionable security intelligence. This holistic approach ensures that potential entry points are identified and secured before they can be exploited by external threats.

## The YEYE Asset Management Framework

* Defines attack surfaces broadly to include every external-facing digital asset, such as subdomains, API endpoints, and mobile APKs.
* Categorizes assets using a standardized taxonomy based on scope (In/Out/Undefined), type (Domain/IP/Service), and identification status (Known/Unknown/3rd Party).
* Implements a labeling system that converts diverse data formats from multiple sources into a simplified, unified structure for better visibility (a sketch of such a unified record follows this summary).
* Establishes multi-dimensional relationships between assets, CVEs, certificates, and departments, allowing teams to instantly identify which business unit is responsible for a newly discovered vulnerability.

## Daily Security Review (DSR)

* Operates on the principle that "security is a process, not a product," bridging the gap between automated detection and manual remediation.
* Utilizes a rotating group system where security engineers review external feeds, public vulnerability news, and YEYE alerts every morning.
* Focuses on detecting "shadow IT" or assets deployed without formal security reviews to ensure all external touchpoints are accounted for.

## Scalable and Efficient Scanning Architecture

* Resolved internal network bandwidth bottlenecks by adopting a hybrid infrastructure that leverages public cloud resources for high-concurrency scanning tasks.
* Developed a custom distributed scanning structure using schedulers and queues to manage multiple independent workers, overcoming the limitations of single-process open-source scanners.
* Optimized infrastructure costs by identifying the "sweet spot" in server specifications, favoring the horizontal expansion of medium-spec servers over expensive, high-performance hardware.
* Mitigates service impact and false alarms by using fixed IPs and custom User-Agent (UA) strings, allowing service owners to distinguish YEYE’s security probes from actual malicious traffic.

To effectively manage a growing attack surface, organizations should combine automated asset discovery with a structured manual review process. Prioritizing data standardization and relationship mapping between assets and vulnerabilities is essential for rapid incident response and long-term infrastructure hardening.
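As a concrete illustration of the labeling taxonomy above (scope, type, identification status), here is a minimal Python sketch of a unified asset record. The field names and example values are illustrative, not YEYE's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class Scope(Enum):
    IN = "in"
    OUT = "out"
    UNDEFINED = "undefined"

class AssetType(Enum):
    DOMAIN = "domain"
    IP = "ip"
    SERVICE = "service"

class Identification(Enum):
    KNOWN = "known"
    UNKNOWN = "unknown"
    THIRD_PARTY = "3rd_party"

@dataclass
class Asset:
    """One external-facing asset, normalized from whichever scanner or feed reported it."""
    name: str
    scope: Scope
    type: AssetType
    status: Identification
    owner_team: str = ""                              # department responsible for remediation
    related_cves: List[str] = field(default_factory=list)

inventory = [
    Asset("api.example.com", Scope.IN, AssetType.DOMAIN, Identification.KNOWN, "payments"),
    Asset("203.0.113.7", Scope.UNDEFINED, AssetType.IP, Identification.UNKNOWN),
]
# Example triage query: unknown assets that still need an owner assigned.
print([a.name for a in inventory if a.status is Identification.UNKNOWN and not a.owner_team])
```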


[AI_TOP_1

The AI TOP 100 contest was designed to shift the focus from evaluating AI model performance to measuring human proficiency in solving real-world problems through AI collaboration. By prioritizing the "problem-solving process" over mere final output, the organizers sought to identify individuals who can define clear goals and navigate the technical limitations of current AI tools. The conclusion of this initiative suggests that true AI literacy is defined by the ability to maintain a "human-in-the-loop" workflow where human intuition guides AI execution and verification.

### Core Philosophy of Human-AI Collaboration

* **Human-in-the-Loop:** The contest emphasizes a cycle of human analysis, AI problem-solving, and human verification. This ensures that the human remains the "pilot" who directs the AI engine and takes responsibility for the quality of the result.
* **Strategic Intervention:** Participants were encouraged to provide AI with structural context it might struggle to perceive (like complex table relationships) and to perform data pre-processing to improve AI accuracy.
* **Task Delegation:** For complex iterative tasks, such as generating images for a montage, solvers were expected to build automated pipelines using AI agents to handle repetitive feedback loops while focusing human effort on higher-level strategy.

### Designing Against "One-Shot" Solutions

* **Low Barrier, High Ceiling:** Problems were designed to be intuitive enough for anyone to understand but complex enough to prevent "one-shot" solutions (the "click-and-solve" trap).
* **Targeting Technical Weaknesses:** Organizers intentionally embedded technical hurdles that current LLMs struggle with, forcing participants to demonstrate how they bridge the gap between AI limitations and a correct answer.
* **The Difficulty Ladder:** To account for varying domain expertise (e.g., OCR experience), problems utilized a multi-part structure. This included "Easy" starting questions to build momentum and "Medium" hint questions that guided participants toward solving the more difficult "Killer" components.

### The 4-Pattern Problem Framework

* **P1 - Insight (Analysis & Definition):** Identifying meaningful opportunities or problems within complex, unstructured data.
* **P2 - Action (Implementation & Automation):** Developing functional code or workflows to execute a defined solution.
* **P3 - Persuasion (Strategy & Creativity):** Generating logical and creative content to communicate technical solutions to non-technical stakeholders.
* **P4 - Decision (Optimization):** Making optimal choices and simulations to maximize goals under specific constraints.

### Quality Assurance and Score Calibration

* **4-Stage Pipeline:** Problems moved from Ideation to Drafting (testing for one-shot immunity), then to Candidate (analyzing abuse vulnerabilities), and finally to a Final selection based on difficulty balance.
* **Cross-Model Validation:** Internal and alpha testers solved problems using various models including Claude, GPT, and Gemini to ensure that no single tool could bypass the intended human-led process.
* **Effort-Based Scoring:** Instead of uniform points, scores were calibrated based on the "effort cost" and human competency required to solve them. This resulted in varying total points per problem to better reflect the true difficulty of the task.

In the era of rapidly evolving AI, the ability to "use" a tool is becoming less valuable than the ability to "collaborate" with it. This shift requires a move toward building automated pipelines and utilizing a "difficulty ladder" approach to tackle complex, multi-stage problems that AI cannot yet solve in a single iteration.


How the POPM program became

Kakao developed its internal POPM (Product Owner/Product Manager) training program by treating the curriculum itself as an evolving product rather than a static lecture series. By applying agile methodologies such as data-driven prioritization and iterative versioning, the program successfully moved from a generic pilot to a structured framework that aligns teams through a shared language of problem-solving. This approach demonstrates that internal capability building is most effective when managed with the same rigor and experimentation used in software development.

## Strategic Motivation for POPM Training

* Addressed the inherent ambiguity of the PO/PM role, where non-visible tasks often make it difficult for practitioners to define their own growth or impact.
* Sought to resolve the disconnect between strategic problem definition (PO) and tactical execution (PM) within Kakao’s teams.
* Prioritized the creation of a "common language" to allow cross-functional team members to define problems, analyze metrics, and design experiments under a unified structure.

## Iterative Design and Versioning

* The program transitioned through multiple "versions," starting with an 8-session pilot that covered the entire lifecycle from bottleneck exploration to execution review.
* Based on participant feedback regarding high fatigue and low efficiency in long presentations, the curriculum was condensed into 5 core modules: Strategy, Metrics, Experiment, Design, and Execution.
* The instructional design shifted from "delivering information" to "designing a rhythm," utilizing a "one slide, one question, one example" rule to maintain engagement.

## Data-Driven Program Refinement

* Applied a "Product Metaphor" to education by calculating "Opportunity Scores" using a matrix of Importance vs. Satisfaction for each session (a sketch of this kind of scoring follows this summary).
* Identified "Data/Metrics" as the highest priority for redesign because it scored high in importance but low in satisfaction, indicating a structural gap in the teaching method.
* Refined the "features" of the training by redesigning worksheets to focus on execution routines and converting mandatory practice tasks into selective, flexible modules.

## Structural Insights for Organizational Growth

* Focused on accumulating "structure" rather than just training individuals, ensuring that even as participants change, the framework for defining problems remains consistent within the organization.
* Designed practice sessions to function as "thinking structures" rather than "answer-seeking" exercises, encouraging teams to bring their training insights directly into actual team meetings.
* Prioritized scalability and simplicity in the curriculum to ensure the structure can be adopted across different departments with varying product needs.

To build effective internal capabilities, organizations should treat training as a product that requires constant maintenance and versioning. Instead of focusing on one-off lectures, leaders should design structural "rhythms" and feedback loops that allow the curriculum to evolve based on the actual pain points of the practitioners.
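The article says Opportunity Scores were computed from an Importance-vs-Satisfaction matrix but does not give the formula. The sketch below uses the common "importance plus unmet importance" form as an assumption, with made-up numbers, purely to show how such a ranking could be produced.

```python
# Illustrative numbers only, not the program's actual survey data; scores on a 1-10 scale.
sessions = {
    "Strategy":   {"importance": 8.1, "satisfaction": 7.9},
    "Metrics":    {"importance": 9.0, "satisfaction": 6.2},
    "Experiment": {"importance": 8.4, "satisfaction": 7.5},
}

def opportunity(importance: float, satisfaction: float) -> float:
    # Assumed formula: reward high importance and penalize the gap to satisfaction.
    return importance + max(importance - satisfaction, 0.0)

ranked = sorted(sessions.items(), key=lambda kv: opportunity(**kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{name:<11} opportunity={opportunity(**scores):.1f}")
```

Under this assumed formula, a session that is highly important but poorly rated (like "Metrics" here) rises to the top of the redesign queue, matching the prioritization logic described above.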


Were we solving the real

The POPM (Product Owner/Product Manager) training course at Kakao focuses on restructuring existing professional knowledge into a cohesive framework for solving real-world business problems. Rather than simply delivering new information, the program emphasizes aligning strategy with execution, transforming "strategy" from a vague concept into a practical set of decision-making criteria. The ultimate goal is to move teams away from a "release-only" mindset toward a cycle of continuous hypothesis verification and learning.

### Strategic Thinking and Metric Modeling

* **Strategic Decision Criteria**: Strategy is redefined as the standard for team judgment, utilizing frameworks like MECE, MVP, and priority-setting models to align daily tasks with long-term goals.
* **Metrics as Problem-Solving Language**: Key indicators such as Funnel, Retention, Cohort, and LTV are treated not just as data points, but as a language used to define and reveal underlying product issues.
* **Context-Based Design**: UX design is approached through "context-based logic" rather than intuition, encouraging teams to ask which specific design fits the current user journey.

### Systematic Experimentation and A/B Testing

* **The MASS Framework**: Experiments are designed and evaluated based on being Measurable, Attributable, Sensitive, and having a Short-term cycle (a minimal significance check is sketched after this summary).
* **Failure Analysis Routines**: The curriculum emphasizes the importance of establishing a routine for interpreting failed experiments, ensuring that every test contributes to the team's institutional knowledge.
* **Incremental Testing**: Encourages a culture of "starting small," giving teams the confidence to run experiments without requiring massive resource allocation.

### Building Repeatable Execution Loops

* **Metric-Based Retrospectives**: Teams transition from simply finishing a release to a structured loop of "Problem Definition → Hypothesis → Metric → Verification → Retrospective."
* **Formalizing Problem Definitions**: Using templates to formally document the problem, expected behavior, and success metrics ensures that the entire team—not just the PO—understands the "why" behind every task.
* **Operational Rhythms**: Teams are adopting fixed weekly or bi-weekly cycles for sharing insights and adjusting priorities, turning data-driven execution into a natural habit.

The most critical takeaway for product teams is to constantly ask: "Is the work we are doing right now actually a solution to a defined problem, or are we just busy releasing features?" Success lies in moving beyond the sense of accomplishment from a launch and establishing a repeatable rhythm that validates whether those efforts truly move the needle.
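The experimentation section frames tests through the MASS criteria. As one concrete way to make a result "Measurable," here is a minimal two-proportion z-test for an A/B conversion lift; the numbers are illustrative and the method is a standard statistical check rather than something prescribed by the course.

```python
import math

def two_proportion_ztest(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the two-sided p-value for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Illustrative numbers: variant B lifts checkout conversion from 5.0% to 5.6%.
p = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"p-value = {p:.4f}")   # compare against a pre-registered threshold, e.g. 0.05
```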