llm-agent

2 posts

naver

Naver Integrated Search LLM DevOps

Naver’s Integrated Search team is transitioning from manual fault response to an automated system using LLM Agents to manage the increasing complexity of search infrastructure. By integrating Large Language Models into the DevOps pipeline, the system evolves through accumulated experience, moving beyond simple alert monitoring to intelligent diagnostic analysis and action recommendation.

### Limitations of Traditional Fault Response

* **Complex Search Flows:** Naver’s search architecture involves multiple interdependent layers, which makes manual root cause analysis slow and prone to human error.
* **Fragmented Context:** Existing monitoring requires developers to manually synthesize logs and metrics from disparate telemetry sources, leading to high cognitive load during outages.
* **Delayed Intervention:** Human-led responses often suffer from a "detection-to-action" lag, especially during high-traffic periods or subtle service regressions.

### Architecture of DevOps Agent v1

* **Initial Design:** Focused on automating basic data gathering and providing preliminary textual reports to engineers.
* **Infrastructure Integration:** Built using a specialized software stack designed to bridge frontend (FE) and backend (BE) telemetry within the search infrastructure.
* **Standardized Logic:** The v1 agent operated on a fixed set of instructions to perform predefined diagnostic tasks when triggered by specific system alarms.

### Evolution to DevOps Agent v2

* **Overcoming v1 Limitations:** The first iteration struggled with maintaining deep context and providing diverse actionable insights, necessitating a more robust agentic structure.
* **Enhanced Memory and Learning:** v2 incorporates a more sophisticated architecture that allows the agent to reference historical failure data and learn from past incident resolutions.
* **Advanced Tool Interaction:** The system was upgraded to handle more complex tool-calling capabilities, allowing the agent to interact more deeply with internal infrastructure APIs (a generic tool-calling loop is sketched after this summary).

### System Operations and Evaluation

* **Trigger Queue Management:** Implements a queuing system to efficiently process and prioritize multiple concurrent system alerts without overwhelming the diagnostic pipeline.
* **Anomaly Detection:** Utilizes advanced detection methods to distinguish between routine traffic fluctuations and genuine service anomalies that require LLM intervention (see the sketch below).
* **Rigorous Evaluation:** The agent’s performance is measured through a dedicated evaluation framework that assesses the accuracy of its diagnoses against known ground-truth incidents.

### Scaling and Future Challenges

* **Context Expansion:** Efforts are focused on integrating a wider range of metadata and environmental context to provide a holistic view of system health.
* **Action Recommendation:** The system is moving toward suggesting specific recovery actions, such as rollbacks or traffic rerouting, rather than just identifying the problem.
* **Sustainability:** Ensuring the DevOps Agent remains maintainable and cost-effective as the underlying search infrastructure and LLM models continue to evolve.

Organizations managing high-scale search traffic should consider LLM-based agents as integrated infrastructure components rather than standalone tools. Moving from reactive monitoring to a proactive, experience-based agent system is essential for reducing the mean time to recovery (MTTR) in complex distributed environments.
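The post describes the trigger queue and anomaly gating only at a high level. The sketch below is a minimal Python illustration of how such a gate might sit in front of an expensive LLM diagnosis step: alerts are queued by priority, and only windows that look genuinely anomalous (robust z-score on recent metric values) trigger the model. The `Alert` shape, the 3.5 threshold, and the `diagnose_with_llm` stub are assumptions for illustration, not Naver's actual implementation.

```python
import queue
import statistics
from dataclasses import dataclass, field


@dataclass(order=True)
class Alert:
    priority: int                                   # lower value = more urgent
    service: str = field(compare=False)
    metric_window: list = field(compare=False)      # recent values of the alerting metric


def is_anomalous(window, threshold=3.5):
    """Robust z-score gate: flag the latest point only if it deviates strongly
    from the median of the preceding window, so routine fluctuations pass."""
    if len(window) < 5:
        return False
    history = window[:-1]
    median = statistics.median(history)
    mad = statistics.median(abs(x - median) for x in history) or 1e-9
    robust_z = 0.6745 * (window[-1] - median) / mad
    return abs(robust_z) > threshold


def diagnose_with_llm(alert):
    """Placeholder for the expensive step: gather logs and metrics, then ask
    the LLM agent for a root-cause hypothesis and a recommended action."""
    return f"[LLM diagnosis requested for {alert.service}]"


def run_pipeline(alerts):
    q = queue.PriorityQueue()
    for a in alerts:
        q.put(a)                                    # concurrent alerts are queued, not dropped
    while not q.empty():
        alert = q.get()
        if is_anomalous(alert.metric_window):       # gate: only real anomalies reach the LLM
            print(diagnose_with_llm(alert))
        else:
            print(f"{alert.service}: routine fluctuation, no LLM call")


if __name__ == "__main__":
    run_pipeline([
        Alert(1, "search-frontend", [102, 99, 101, 100, 103, 240]),  # latency spike
        Alert(2, "ranking-backend", [55, 57, 54, 56, 55, 58]),       # normal noise
    ])
```

The design choice this illustrates is the one the post emphasizes: the queue and the statistical gate are cheap and deterministic, so the LLM is only invoked for alerts that are both prioritized and genuinely anomalous.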

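The v2 agent's "advanced tool interaction" is likewise described without code. Below is a generic sketch of the loop such an agent typically runs: the model either requests a tool call against an internal API or returns a final diagnosis. The tool names (`fetch_error_logs`, `get_deploy_history`) and the `call_llm` stub are hypothetical, not Naver's internal APIs.

```python
import json

# Hypothetical internal tools the agent is allowed to call (illustrative only).
TOOLS = {
    "fetch_error_logs":   lambda service: f"3 timeout errors in {service} in the last 10 min",
    "get_deploy_history": lambda service: f"{service} was redeployed 12 minutes ago",
}


def call_llm(context):
    """Stub for the model call. A real agent would send the context plus tool
    schemas and receive either a tool request or a final diagnosis."""
    if "deploy" not in json.dumps(context):
        return {"tool": "get_deploy_history", "args": {"service": "search-frontend"}}
    return {"final": "Timeouts correlate with the redeploy 12 minutes ago; suggest rollback."}


def run_agent(alert_summary, max_steps=5):
    context = [{"role": "alert", "content": alert_summary}]
    for _ in range(max_steps):
        decision = call_llm(context)
        if "final" in decision:                     # agent is done: report the diagnosis
            return decision["final"]
        tool = TOOLS[decision["tool"]]              # otherwise execute the requested tool
        result = tool(**decision["args"])
        context.append({"role": "tool", "name": decision["tool"], "content": result})
    return "Max steps reached without a conclusive diagnosis."


print(run_agent("Elevated 5xx rate on search-frontend"))
```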
line

The Present State of LY Corporation's

Tech-Verse 2025 showcased LY Corporation’s strategic shift toward an AI-integrated ecosystem following the merger of LINE and Yahoo Japan. The event focused on the practical hurdles of deploying generative AI, concluding that the transition from experimental models to production-ready services requires sophisticated evaluation frameworks and deep contextual integration into developer workflows.

## AI-Driven Engineering with Ark Developer

LY Corporation’s internal "Ark Developer" solution demonstrates how AI can be embedded directly into the software development life cycle.

* The system utilizes a Retrieval-Augmented Generation (RAG) based code assistant to handle tasks such as code completion, security reviews, and automated test generation.
* Rather than treating codebases as simple text documents, the tool performs graph analysis on directory structures to maintain structural context during code synthesis (a minimal sketch of this idea follows the summary).
* Real-world application includes a seamless integration with GitHub for automated Pull Request (PR) creation, with internal users reporting higher satisfaction compared to off-the-shelf tools like GitHub Copilot.

## Quantifying Quality in Generative AI

A significant portion of the technical discussion centered on moving away from subjective "vibes-based" assessments toward rigorous, multi-faceted evaluation of AI outputs.

* To measure the quality of generated images, developers utilized traditional metrics like Fréchet Inception Distance (FID) and Inception Score (IS) alongside LAION’s Aesthetic Score.
* Advanced evaluation techniques were introduced, including CLIP-IQA, Q-Align, and Visual Question Answering (VQA) based on video-language models to analyze image accuracy.
* Technical challenges in image translation and inpainting were highlighted, specifically the difficulty of restoring layout and text structures naturally after optical character recognition (OCR) and translation.

## Global Technical Exchange and Implementation

The conference served as a collaborative hub for engineers across Japan, Taiwan, and Korea to discuss the implementation of emerging standards like the Model Context Protocol (MCP).

* Sessions emphasized the "how-to" of overcoming deployment hurdles rather than just following technical trends.
* Poster sessions (Product Street) and interactive Q&A segments allowed developers to share localized insights on LLM agent performance and agentic workflows.
* The recurring theme across diverse teams was that the "evaluation and verification" stage is now the primary driver of quality in generative AI services.

For organizations looking to scale AI, the key recommendation is to move beyond simple implementation and invest in "evaluation-driven development." By building internal tools that leverage graph-based context and quantitative metrics like Aesthetic Scores and VQA, teams can ensure that generative outputs meet professional service standards.
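The Ark Developer summary mentions graph analysis over directory structures rather than treating code as flat text, without showing how. The sketch below is one common way to build such structural context in Python: walk a repository, parse each file's imports with the `ast` module, and use the resulting dependency graph to pick which neighboring files belong in the prompt. The function names and the plain adjacency-dict representation are illustrative assumptions, not LY Corporation's implementation.

```python
import ast
import os


def build_import_graph(root):
    """Map each local module name to the set of local modules it imports."""
    modules = {}                                    # module name -> file path
    for dirpath, _, files in os.walk(root):
        for f in files:
            if f.endswith(".py"):
                modules[os.path.splitext(f)[0]] = os.path.join(dirpath, f)

    graph = {name: set() for name in modules}
    for name, path in modules.items():
        try:
            with open(path, encoding="utf-8") as fh:
                tree = ast.parse(fh.read(), filename=path)
        except SyntaxError:
            continue                                # skip files that do not parse
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                targets = [alias.name.split(".")[0] for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module.split(".")[0]]
            else:
                continue
            graph[name].update(t for t in targets if t in modules)  # keep local deps only
    return graph


def structural_context(graph, target):
    """Files worth adding to the prompt: what `target` imports, and what imports it."""
    depends_on = graph.get(target, set())
    depended_by = {m for m, deps in graph.items() if target in deps}
    return sorted(depends_on | depended_by)


if __name__ == "__main__":
    g = build_import_graph(".")                     # scan the current repository
    print(structural_context(g, "main"))            # neighbors of a module named "main"
```

The point of the sketch is the design choice the summary attributes to Ark Developer: retrieval is driven by structural relationships between files, not just by lexical similarity, so the assistant sees the callers and dependencies that a flat text search would miss.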