LINE / llm

7 posts


Building an Enterprise LLM Service, Part 1: Context Engineering

LY Corporation’s engineering team developed an AI assistant for their private cloud platform, Flava, by prioritizing "context engineering" over traditional prompt engineering. To manage a complex environment of 260 APIs and hundreds of technical documents, they implemented a strategy of progressive disclosure to ensure the LLM receives only the most relevant information for any given query. This approach allows the assistant to move beyond simple RAG-based document summarization to perform active diagnostics and resource management based on real-time API data.

### Performance Limitations of Long Contexts

* Research indicates that LLM performance can drop by 13.9% to 85% as context length increases, even if the model technically supports a large token window.
* The phenomenon of "context rot" occurs when low-quality or irrelevant information is mixed into the input, causing the model to generate confident but incorrect answers.
* Because LLMs are stateless, maintaining conversation history and processing dense JSON responses from multiple APIs quickly exhausts context windows and degrades reasoning quality.

### Progressive Disclosure and Tool Selection

* The system avoids loading all 260+ API definitions at once; instead, it analyzes the user's intent to select only the necessary tools, such as loading only Redis-related APIs when a user asks about a cluster.
* Specific product usage hints, such as the distinction between private and CDN settings for Object Storage, are injected only when those specific services are invoked.
* This phased approach significantly reduces token consumption and prevents the model from being overwhelmed by irrelevant technical specifications.

### Response Guidelines and the "Mock Tool Message" Strategy

* The team distinguished between "System Prompts" (global rules) and "Response Guidelines" (situational instructions), such as directing users to a console UI before suggesting CLI commands.
* Injecting specific guidelines into the system prompt often caused "instruction conflict," where the LLM might hallucinate information to satisfy a guideline while ignoring core requirements like using search tools.
* To resolve these conflicts, the team utilized "ToolMessages" to inject guidelines; by formatting instructions as if they were results from a tool execution, the LLM treats the information as factual context rather than a command that might override the system prompt (see the sketch at the end of this summary).

To build a robust enterprise LLM service, developers should focus on dynamic context management rather than static prompt optimization. Treating operational guidelines as external data via mock tool messages, rather than system instructions, provides a scalable way to reduce hallucinations and maintain high performance across hundreds of integrated services.
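Below is a minimal Python sketch of the two techniques above, assuming an OpenAI-style chat completions API with tool calling. The tool registry, intent mapping, guideline text, and function names are illustrative placeholders, not Flava's actual implementation.

```python
import json

# Hypothetical registry holding a small slice of the 260+ tool definitions,
# grouped by product so only one group is exposed per request.
TOOL_GROUPS = {
    "redis": [{
        "type": "function",
        "function": {
            "name": "get_redis_cluster",
            "description": "Fetch the status of a Redis cluster",
            "parameters": {
                "type": "object",
                "properties": {"cluster_id": {"type": "string"}},
                "required": ["cluster_id"],
            },
        },
    }],
    "object_storage": [{
        "type": "function",
        "function": {
            "name": "get_bucket_config",
            "description": "Fetch Object Storage bucket settings",
            "parameters": {
                "type": "object",
                "properties": {"bucket": {"type": "string"}},
                "required": ["bucket"],
            },
        },
    }],
}

# Hypothetical per-product response guidelines, injected only when relevant.
GUIDELINES = {
    "object_storage": (
        "Object Storage has separate private and CDN settings; confirm which "
        "endpoint the user means before giving configuration steps."
    ),
}

def select_tools(user_query: str) -> list[dict]:
    """Progressive disclosure: expose only the tool group matching the intent."""
    intent = "redis" if "redis" in user_query.lower() else "object_storage"
    return TOOL_GROUPS[intent]

def mock_tool_messages(product: str, call_id: str) -> list[dict]:
    """Wrap a guideline as if it were the result of a tool call, so the model
    reads it as factual context instead of an instruction that competes with
    the system prompt."""
    return [
        {"role": "assistant", "content": None, "tool_calls": [{
            "id": call_id, "type": "function",
            "function": {"name": "get_response_guideline",
                         "arguments": json.dumps({"product": product})},
        }]},
        {"role": "tool", "tool_call_id": call_id, "content": GUIDELINES[product]},
    ]

user_query = "How do I make my Object Storage bucket publicly readable?"
messages = [
    {"role": "system", "content": "You are the Flava cloud assistant."},
    {"role": "user", "content": user_query},
]
messages += mock_tool_messages("object_storage", "call_guideline_1")
tools = select_tools(user_query)
# response = client.chat.completions.create(model=..., messages=messages, tools=tools)
```

Because the guideline arrives as a tool result, it sits in the context as data the model can cite rather than as an instruction competing with the system prompt.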


Safety as the Baseline, Cost Savings as a Bonus: Why AI Services Need Separate Guardrails

AI developers often rely on system prompts to enforce safety rules, but this integrated approach frequently leads to "over-refusal" and unpredictable shifts in model performance. To ensure both security and operational efficiency, it is increasingly necessary to decouple safety mechanisms into separate guardrail systems that operate independently of the primary model's logic.

## Negative Impact on Model Utility

* Integrating safety instructions directly into system prompts often leads to a high False Positive Rate (FPR), where the model rejects harmless requests alongside harmful ones.
* Technical analysis using Principal Component Analysis (PCA) reveals that guardrail prompts shift the model's embedding results in a consistent direction toward refusal, regardless of the input's actual intent.
* Studies show that aggressive safety prompting can cause models to refuse benign technical queries, such as "how to kill a Python process," because the model adopts an overly conservative decision boundary.

## Positional Bias and Context Neglect

* Research on the "Lost in the Middle" phenomenon indicates that LLMs are most sensitive to information at the beginning and end of a prompt, while accuracy drops significantly for information placed in the center.
* The "Constraint Difficulty Distribution Index" (CDDI) demonstrates that the order of instructions matters; models generally follow instructions better when difficult constraints are placed at the beginning of the prompt.
* In complex system prompts where safety rules are buried in the middle, the model may fail to prioritize these guardrails, leading to inconsistent safety enforcement depending on the prompt's structure.

## The Butterfly Effect of Prompt Alterations

* Small, seemingly insignificant changes to a system prompt, such as adding a single whitespace, a "Thank you" note, or changing the output format to JSON, can alter more than 10% of a model's predictions.
* Modifying safety-related lines within a unified system prompt can cause "catastrophic performance collapse," where the model's internal reasoning path is diverted, affecting unrelated tasks.
* Because LLMs treat every part of the prompt as a signal that moves their decision boundaries, managing safety and task logic in a single string makes the system brittle and difficult to iterate upon.

To build robust and high-performing AI applications, developers should move away from bloated system prompts and instead implement external guardrails. This modular approach allows for precise security filtering without compromising the model's creative or logical capabilities.
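As a rough illustration of the decoupling the post argues for, here is a minimal Python sketch in which input and output guardrails run as separate checks around an unmodified task prompt. The keyword screen, policy list, and function names are placeholder assumptions; a production guardrail would typically be a dedicated classifier or a moderation service.

```python
from dataclasses import dataclass

# Placeholder policy; a real deployment would back this with a trained
# classifier or a moderation endpoint rather than keyword matching.
BLOCKED_TOPICS = ("credential harvesting", "malware payload")

@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""

def input_guardrail(user_text: str) -> GuardrailResult:
    """Runs independently of the task model, so safety rules never sit inside
    (or shift) the task model's system prompt."""
    lowered = user_text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return GuardrailResult(False, f"blocked topic: {topic}")
    return GuardrailResult(True)

def output_guardrail(model_text: str) -> GuardrailResult:
    """Symmetric check on the generated text before it reaches the user."""
    lowered = model_text.lower()
    for topic in BLOCKED_TOPICS:
        if topic in lowered:
            return GuardrailResult(False, f"blocked topic: {topic}")
    return GuardrailResult(True)

def answer(user_text: str, call_model) -> str:
    verdict = input_guardrail(user_text)
    if not verdict.allowed:
        return f"Request declined ({verdict.reason})."
    # The task prompt stays short and stable; no safety rules buried in the middle.
    reply = call_model(system="You are a helpful technical assistant.",
                       user=user_text)
    verdict = output_guardrail(reply)
    return reply if verdict.allowed else "Response withheld by output guardrail."

# Usage with a stub model; "how to kill a Python process" passes untouched.
print(answer("How do I kill a Python process?",
             call_model=lambda system, user: "Use `kill <pid>` or Ctrl+C."))
```

The point of the structure is that tightening or relaxing the policy only touches the guardrail functions; the task model's system prompt, and therefore its decision boundary, stays unchanged.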


Security Threats You May Encounter While Developing AI Products, and How to Counter Them

Developing AI products introduces unique security vulnerabilities that extend beyond traditional software risks, ranging from package hallucinations to sophisticated indirect prompt injections. To mitigate these threats, organizations must move away from trusting LLM-generated content and instead implement rigorous validation, automated threat modeling, and input/output guardrails. The following summary details the specific risks and mitigation strategies identified by LY Corporation’s security engineering team.

## Slopsquatting and Package Hallucinations

- AI models frequently hallucinate non-existent library or package names when providing coding instructions (e.g., suggesting `huggingface-cli` instead of the correct `huggingface_hub[cli]`).
- Attackers exploit this by registering these hallucinated names on public registries to distribute malware to unsuspecting developers.
- Mitigation requires developers to manually verify all AI-suggested commands and dependencies before execution in any environment (see the sketch at the end of this summary).

## Prompt Injection and Arbitrary Code Execution

- As seen in CVE-2024-5565 (Vanna AI), attackers can inject malicious instructions into prompts to force the application to execute arbitrary code.
- This vulnerability arises when developers grant LLMs the autonomy to generate and run logic within the application context without sufficient isolation.
- Mitigation involves treating LLM outputs as untrusted data, sanitizing user inputs, and strictly limiting the LLM's ability to execute system-level commands.

## Indirect Prompt Injection in Integrated AI

- AI assistants integrated into office environments (like Gemini for Workspace) are susceptible to indirect prompt injections hidden within emails or documents.
- A malicious email can contain "system-like" instructions that trick the AI into hiding content, redirecting users to phishing sites, or leaking data from other files.
- Mitigation requires the implementation of robust guardrails that scan both the input data (the content being processed) and the generated output for instructional anomalies.

## Permission Risks in AI Agents and MCP

- The use of the Model Context Protocol (MCP) and coding agents creates risks where an agent might overstep its intended scope.
- If an agent has broad access to a developer's environment, a malicious prompt in a public repository could trick the agent into accessing or leaking sensitive data (such as salary info or private keys) from a private repository.
- Mitigation centers on the principle of least privilege, ensuring AI agents are restricted to specific, scoped directories and repositories.

## Embedding Inversion and Vector Store Vulnerabilities

- Attacks targeting the retrieval phase of RAG (Retrieval-Augmented Generation) systems can lead to data leaks.
- Embedding inversion techniques may allow attackers to reconstruct original sensitive text from the vector embeddings stored in a database.
- Securing AI products requires protecting the integrity of the vector store and ensuring that retrieved context does not bypass security filters.

## Automated Security Assessment Tools

- To scale security, LY Corporation is developing internal tools like "ConA" for automated threat modeling and "LAVA" for automated vulnerability assessment.
- These tools aim to identify AI-specific risks during the design and development phases rather than relying solely on manual reviews.

Effective AI security requires a shift in mindset: treat every LLM response as a potential security risk. Developers should adopt automated threat modeling and implement strict input/output validation layers to protect both the application infrastructure and user data from evolving AI-based exploits.
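As one concrete example of the verification step recommended above, the sketch below checks whether an AI-suggested package name is actually registered on PyPI via its public JSON API before anything gets installed. The candidate names are examples only; existence on the registry is necessary but not sufficient, so the metadata still needs a human look.

```python
import json
import urllib.error
import urllib.request

def pypi_metadata(package: str) -> dict | None:
    """Return PyPI metadata for `package`, or None if it is not registered."""
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # unregistered name: a hallucination or a squat target
        raise

def vet(package: str) -> None:
    meta = pypi_metadata(package)
    if meta is None:
        print(f"{package}: NOT on PyPI - do not install; check the official docs.")
        return
    info = meta["info"]
    # A registered package can still be malicious; confirm the publisher and
    # that the project page matches the library you actually expect.
    print(f"{package}: found - latest={info['version']}, "
          f"author={info.get('author')!r}, homepage={info.get('home_page')!r}")

for name in ("requests", "huggingface_hub"):  # replace with whatever the AI suggested
    vet(name)
```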


A Month-Long Task in Just Five Days with Vibe Coding! (ChatGPT·Cursor)

This blog post explores how LY Corporation reduced a month-long development task to just five days by leveraging "vibe coding" with generative AI tools like ChatGPT and Cursor. By shifting from traditional, rigid documentation to an iterative, demo-first approach, developers can rapidly validate multiple UI/UX solutions for complex problems like restaurant menu registration. The author concludes that AI's ability to handle frequent rework makes it more efficient to "build fast and iterate" than to aim for perfection through long-form specifications.

### Strategic Shift to Rapid Prototyping

* Traditional development cycles (spec → design → dev → fix) are often too slow to keep up with market trends due to heavy documentation and impact analysis.
* The "vibe coding" approach prioritizes creating "working demos" over perfect specifications to find "good enough" answers through rapid feedback loops.
* AI reduces the psychological and logistical burden of "starting over," allowing developers to refine the context and quality of outputs through repeated interaction without the friction of manual re-documentation.

### Defining Requirements and Solution Ideation

* Initial requirements are kept minimal, focusing only on the core mission, top priorities, and essential data structures (e.g., product name, image, description) to avoid limiting AI creativity.
* ChatGPT is used to generate a wide range of solution candidates, which are then filtered into five distinct approaches: Stepper Wizards, Live Previews with Quick Add, Template/Cloning, Chat Input, and OCR-based photo scanning.
* This stage emphasizes volume and variety, using AI-generated pros and cons to establish selection criteria and identify potential UX bottlenecks early in the process.

### Detailed Design and Multi-Solution Wireframing

* Each of the five chosen solutions is expanded into detailed screen flows and UI elements, such as progress bars, bottom sheets, and validation logic.
* Prompt engineering is used iteratively; if an AI-generated result lacks a specific feature like "temporary storage" or "mandatory field validation," the prompt is adjusted to regenerate the design instantly.
* The focus remains on defining the "what" (UI elements) and "how" (user flow) through textual descriptions before moving to actual coding.

### Implementation with Cursor and Flutter

* Cursor is utilized to generate functional code based on the refined wireframes, using Flutter as the framework to ensure rapid cross-platform development for both iOS and Android.
* The development follows a "skeleton-first" approach: first creating a main navigation hub with five entry points, then populating each individual solution module one by one.
* Technical architecture decisions, such as using Riverpod for state management or SQLite for data storage, are layered onto the demo post hoc, reversing the traditional "stack-first" development order to prioritize functional validation.

### Recommendation

To maximize efficiency, developers should treat AI as a partner for high-speed iteration rather than a one-shot tool. By focusing on creating functional demos quickly and refining them through direct feedback, teams can bypass the bottlenecks of traditional software requirements and deliver user-centric products in a fraction of the time.


IUI 2025 Conference Report: On the Sustainability of AI and Human-Centered AI

The IUI 2025 conference highlighted a significant shift in the AI landscape, moving away from a sole focus on model performance toward "human-centered AI" that prioritizes collaboration, ethics, and user agency. The prevailing consensus across key sessions suggests that for AI to be sustainable and trustworthy, it must transcend simple automation to become a tool that augments human perception and decision-making through transparent, interactive, and socially aware design.

## Reality Design and Human Augmentation

The concept of "Reality Design" suggests that Human-Computer Interaction (HCI) research must expand beyond screen-based interfaces to design reality itself. As AI, sensors, and wearables become integrated into daily life, technology can be used to directly augment human perception, cognition, and memory.

* Memory extension: Systems can record and reconstruct personal experiences, helping users recall details in educational or professional settings.
* Sensory augmentation: Technologies like selective hearing or slow-motion visual playback can enhance a user's natural observational powers.
* Cognitive balance: While AI can assist with task difficulty (e.g., collaborative Lego building), designers must ensure that automation does not erode the human will to learn or remember, echoing historical warnings about technology-induced "forgetfulness."

## Bridging the Socio-technical Gap in AI Transparency

Transparency in AI, particularly for high-risk areas like finance or medicine, should not be limited to showing mathematical model weights. Instead, it must bridge the gap between technical complexity and human understanding by focusing on user goals and social contexts.

* Multi-faceted communication: Effective transparency involves model reporting (Model Cards), sharing safety evaluation results, and providing linguistic or visual cues for uncertainty rather than just numerical scores.
* Counterfactual explanations: Users gain better trust when they can see how a decision might have changed if specific input conditions were different.
* Interaction-based transparency: Transparency must be coupled with control, allowing users to act as "adjusters" who provide feedback that the model then reflects in its future outputs.

## Interactive Machine Learning and Human-in-the-Loop

The framework of Interactive Machine Learning (IML) challenges the traditional view of AI as a static black box trained on fixed data. Instead, it proposes an interactive loop where the user and the model grow together through continuous feedback (see the sketch at the end of this summary).

* User-driven training: Users should be able to inspect model classifications, correct errors, and have those corrections immediately influence the model's learning path.
* Beyond automation: This approach reframes AI from a replacement for human labor into a collaborative partner that adapts to specific user behaviors and professional expertise.
* Impact on specialized tools: Modern applications include educational platforms where students manipulate data directly and research tools that integrate human intuition into large-scale data analysis.

## Collaborative Systems in Specialized Professional Contexts

Practical applications of human-centered AI are being realized in sensitive fields like child counseling, where AI assists experts without replacing the human element.

* Counselor-AI transcription: Systems designed for counseling analysis allow AI to handle the heavy lifting of transcription while counselors manage the nuance and contextual editing.
* Efficiency through partnership: By focusing on reducing administrative burdens, these systems enable professionals to spend more time on high-level cognitive tasks and emotional support, demonstrating the value of AI as a supportive infrastructure.

The future of AI development requires moving beyond isolated technical optimization to embrace the complexity of the human experience. Organizations and developers should focus on creating systems where transparency is a tool for "appropriate trust" and where design is focused on empowering human capabilities rather than simply automating them.
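To make the interactive machine learning loop more concrete, here is a minimal Python sketch using scikit-learn's `partial_fit`, in which a stand-in "user" corrects predictions and each correction immediately updates the model. The data, the oracle, and the function names are synthetic assumptions for illustration, not anything presented at the conference.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier  # requires scikit-learn >= 1.1 for loss="log_loss"

rng = np.random.default_rng(0)
classes = np.array([0, 1])

# Seed the model with a tiny initial batch (stands in for "fixed training data").
X_seed = rng.normal(size=(20, 4))
y_seed = (X_seed[:, 0] > 0).astype(int)
model = SGDClassifier(loss="log_loss", random_state=0)
model.partial_fit(X_seed, y_seed, classes=classes)

def user_reviews(x: np.ndarray, predicted: int) -> int:
    """Placeholder for the human-in-the-loop step: the expert either accepts
    the prediction or supplies the correct label. A simple oracle stands in
    for the user here."""
    return int(x[0] > 0)

for step in range(5):
    x = rng.normal(size=(1, 4))
    predicted = int(model.predict(x)[0])
    corrected = user_reviews(x[0], predicted)
    if corrected != predicted:
        # The correction immediately influences the model's learning path.
        model.partial_fit(x, [corrected])
    print(f"step {step}: predicted={predicted} corrected={corrected}")
```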


The Current State of LY Corporation's AI Technology: A Tech-Verse 2025 Recap

Tech-Verse 2025 showcased LY Corporation’s strategic shift toward an AI-integrated ecosystem following the merger of LINE and Yahoo Japan. The event focused on the practical hurdles of deploying generative AI, concluding that the transition from experimental models to production-ready services requires sophisticated evaluation frameworks and deep contextual integration into developer workflows.

## AI-Driven Engineering with Ark Developer

LY Corporation’s internal "Ark Developer" solution demonstrates how AI can be embedded directly into the software development life cycle.

* The system utilizes a Retrieval-Augmented Generation (RAG) based code assistant to handle tasks such as code completion, security reviews, and automated test generation.
* Rather than treating codebases as simple text documents, the tool performs graph analysis on directory structures to maintain structural context during code synthesis.
* Real-world application includes a seamless integration with GitHub for automated Pull Request (PR) creation, with internal users reporting higher satisfaction compared to off-the-shelf tools like GitHub Copilot.

## Quantifying Quality in Generative AI

A significant portion of the technical discussion centered on moving away from subjective "vibes-based" assessments toward rigorous, multi-faceted evaluation of AI outputs (see the sketch at the end of this summary).

* To measure the quality of generated images, developers utilized traditional metrics like Fréchet Inception Distance (FID) and Inception Score (IS) alongside LAION’s Aesthetic Score.
* Advanced evaluation techniques were introduced, including CLIP-IQA, Q-Align, and Visual Question Answering (VQA) based on video-language models to analyze image accuracy.
* Technical challenges in image translation and inpainting were highlighted, specifically the difficulty of restoring layout and text structures naturally after optical character recognition (OCR) and translation.

## Global Technical Exchange and Implementation

The conference served as a collaborative hub for engineers across Japan, Taiwan, and Korea to discuss the implementation of emerging standards like the Model Context Protocol (MCP).

* Sessions emphasized the "how-to" of overcoming deployment hurdles rather than just following technical trends.
* Poster sessions (Product Street) and interactive Q&A segments allowed developers to share localized insights on LLM agent performance and agentic workflows.
* The recurring theme across diverse teams was that the "evaluation and verification" stage is now the primary driver of quality in generative AI services.

For organizations looking to scale AI, the key recommendation is to move beyond simple implementation and invest in "evaluation-driven development." By building internal tools that leverage graph-based context and quantitative metrics like Aesthetic Scores and VQA, teams can ensure that generative outputs meet professional service standards.
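As a small example of the quantitative evaluation described above, the sketch below computes FID with torchmetrics, one common implementation (it assumes torch and torchmetrics with the image extras are installed). The random tensors stand in for real reference and generated image batches, so the score itself is meaningless here.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# A small feature dimension keeps the demo light; 2048 is the usual choice.
fid = FrechetInceptionDistance(feature=64)

# uint8 images in [0, 255], shape (N, 3, H, W); replace with real batches.
real_images = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
generated_images = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(generated_images, real=False)
print(f"FID: {fid.compute():.2f}")  # lower is better
```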


AI and the Writer, Walking Together: Give Us Your Code, We'll Write the API Reference

LY Corporation is addressing the chronic shortage of high-quality technical documentation by treating the problem as an engineering challenge rather than a training issue. By utilizing generative AI to automate the creation of API references, the Document Engineering team has transitioned from a "manual craftsmanship" approach to an "industrialized production" model. While the system significantly improves efficiency and maintains internal context better than generic tools, the team concludes that human verification remains essential due to the high stakes of API accuracy.

### Contextual Challenges with Generic AI

Standard coding assistants like GitHub Copilot often fail to meet the specific documentation needs of a large organization.

* Generic tools do not adhere to internal company style guides or maintain consistent terminology across projects.
* Standard AI lacks awareness of internal technical contexts; for example, generic AI might mistake a company-specific identifier like "MID" for "Member ID," whereas the internal tool understands its specific function within the LY ecosystem.
* Fragmented deployment processes across different teams make it difficult for developers to find a single source of truth for API documentation.

### Multi-Stage Prompt Engineering

To ensure high-quality output without overwhelming the LLM's "memory," the team refined a complex set of instructions into a streamlined three-stage workflow (see the sketch at the end of this summary).

* **Language Recognition:** The system first identifies the programming language and specific framework being used.
* **Contextual Analysis:** It analyzes the API's logic to generate relevant usage examples and supplemental technical information.
* **Detail Generation:** Finally, it writes the core API descriptions, parameter definitions, and response value explanations based on the internal style guide.

### Transitioning to Model Context Protocol (MCP)

While the prototype began as a VS Code extension, the team shifted to using the Model Context Protocol (MCP) to ensure the tool was accessible across various development environments.

* Moving to MCP allows the tool to support multiple IDEs, including IntelliJ, which was a high-priority request from the developer community.
* The MCP architecture decouples the user interface from the core logic, allowing the "host" (like the IDE) to handle UI interactions and parameter inputs.
* This transition reduced the maintenance burden on the Document Engineering team by removing the need to build and update custom UI components for every IDE.

### Performance and the Accuracy Gap

Evaluation of the AI-generated documentation showed strong results, though it highlighted the unique risks of documenting APIs compared to other forms of writing.

* Approximately 88% of the AI-generated comments met the team's internal evaluation criteria.
* The specialized generator outperformed GitHub Copilot in 78% of cases regarding style and contextual relevance.
* The team noted that while a 99% accuracy rate is excellent for a blog post, a single error in a short API reference can render the entire document useless for a developer.

To successfully implement AI-driven documentation, organizations should focus on building tools that understand internal business logic while maintaining a strict "human-in-the-loop" workflow. Developers should use these tools to generate the bulk of the content but must perform a final technical audit to ensure the precision that only a human author can currently guarantee.
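The sketch below expresses the three-stage workflow as chained LLM calls in Python. The `call_llm` stub, the prompt wording, and the function names are illustrative assumptions, not the Document Engineering team's actual prompts or tooling.

```python
from typing import Callable

def generate_api_reference(source_code: str, style_guide: str,
                           call_llm: Callable[[str], str]) -> str:
    # Stage 1: language / framework recognition.
    lang_report = call_llm(
        "Identify the programming language and framework used in this code. "
        "Answer in one line.\n\n" + source_code
    )
    # Stage 2: contextual analysis - usage examples and supplemental notes.
    context_notes = call_llm(
        f"The code below is written in: {lang_report}\n"
        "Analyse the API's logic and draft a realistic usage example plus any "
        "supplemental technical notes a caller would need.\n\n" + source_code
    )
    # Stage 3: detail generation against the internal style guide.
    return call_llm(
        "Write the API reference: core description, parameters, and response "
        f"values. Follow this style guide strictly:\n{style_guide}\n\n"
        f"Context from earlier analysis:\n{context_notes}\n\nCode:\n{source_code}"
    )

# Usage with a stub client (replace the lambda with a real model call):
if __name__ == "__main__":
    doc = generate_api_reference(
        "def add(a: int, b: int) -> int: ...",
        "Use third person; document every parameter.",
        call_llm=lambda prompt: f"[model output for: {prompt[:40]}...]",
    )
    print(doc)
```

In practice the same three stages could sit behind an MCP tool, so that any host IDE supplies the source code and collects the generated reference, in line with the decoupling described above.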