LINE / prompt-engineering

6 posts


Building an Enterprise LLM Service, Part 1: Context Engineering

LY Corporation’s engineering team developed an AI assistant for their private cloud platform, Flava, by prioritizing "context engineering" over traditional prompt engineering. To manage a complex environment of 260 APIs and hundreds of technical documents, they implemented a strategy of progressive disclosure to ensure the LLM receives only the most relevant information for any given query. This approach allows the assistant to move beyond simple RAG-based document summarization to perform active diagnostics and resource management based on real-time API data.

### Performance Limitations of Long Contexts

* Research indicates that LLM performance can drop by 13.9% to 85% as context length increases, even if the model technically supports a large token window.
* The phenomenon of "context rot" occurs when low-quality or irrelevant information is mixed into the input, causing the model to generate confident but incorrect answers.
* Because LLMs are stateless, maintaining conversation history and processing dense JSON responses from multiple APIs quickly exhausts context windows and degrades reasoning quality.

### Progressive Disclosure and Tool Selection

* The system avoids loading all 260+ API definitions at once; instead, it analyzes the user's intent to select only the necessary tools, such as loading only Redis-related APIs when a user asks about a cluster.
* Specific product usage hints, such as the distinction between private and CDN settings for Object Storage, are injected only when those specific services are invoked.
* This phased approach significantly reduces token consumption and prevents the model from being overwhelmed by irrelevant technical specifications.

### Response Guidelines and the "Mock Tool Message" Strategy

* The team distinguished between "System Prompts" (global rules) and "Response Guidelines" (situational instructions), such as directing users to a console UI before suggesting CLI commands.
* Injecting specific guidelines into the system prompt often caused "instruction conflict," where the LLM might hallucinate information to satisfy a guideline while ignoring core requirements like using search tools.
* To resolve these conflicts, the team utilized "ToolMessages" to inject guidelines; by formatting instructions as if they were results from a tool execution, the LLM treats the information as factual context rather than a command that might override the system prompt.

To build a robust enterprise LLM service, developers should focus on dynamic context management rather than static prompt optimization. Treating operational guidelines as external data via mock tool messages, rather than system instructions, provides a scalable way to reduce hallucinations and maintain high performance across hundreds of integrated services.
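
The mock-tool-message idea described above can be sketched with OpenAI-style chat messages. This is a minimal illustration, not the team's actual implementation: the tool name `get_response_guideline`, the synthetic call id, and the guideline text are all hypothetical.

```python
# Sketch of the "mock tool message" strategy: instead of appending a
# situational guideline to the system prompt (where it can conflict with
# core instructions), inject it as if it were the result of a tool call.
# The model then reads the guideline as factual context rather than as a
# competing command. Tool and guideline names here are illustrative.

def build_messages(user_query: str, guideline: str) -> list[dict]:
    """Assemble a chat history that carries a guideline as a mock tool result."""
    tool_call_id = "call_guideline_1"  # synthetic id for the fabricated call
    return [
        {"role": "system", "content": "You are the Flava cloud assistant. "
                                      "Always use search tools before answering."},
        {"role": "user", "content": user_query},
        # The assistant "called" a guideline-lookup tool. This turn is
        # synthesized by our code, not actually chosen by the model.
        {"role": "assistant", "content": None, "tool_calls": [{
            "id": tool_call_id,
            "type": "function",
            "function": {"name": "get_response_guideline", "arguments": "{}"},
        }]},
        # The guideline arrives as tool output, so the model treats it as data.
        {"role": "tool", "tool_call_id": tool_call_id, "content": guideline},
    ]

messages = build_messages(
    "How do I resize my Redis cluster?",
    "Guide users to the console UI first; suggest CLI commands only on request.",
)
```

Because the guideline never enters the system prompt, the core rules (such as "always use search tools") stay intact and uncontested.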


We Held AI Campus Day to Improve Company-Wide AI Literacy

LY Corporation recently hosted "AI Campus Day," a large-scale internal event designed to bridge the gap between AI theory and practical workplace application for over 3,000 employees. By transforming their office into a learning campus, the company successfully fostered a culture of "AI Transformation" through peer-led mentorship and task-specific experimentation. The event demonstrated that internal context and hands-on participation are far more effective than traditional external lectures for driving meaningful AI literacy and productivity gains.

## Hands-on Experience and Technical Support

* The curriculum featured 10 specialized sessions across three tracks—Common, Creative, and Engineering—to ensure relevance for every job function.
* Sessions ranged from foundational prompt engineering for non-developers to advanced technical topics like building Model Context Protocol (MCP) servers for engineers.
* To ensure smooth execution, the organizers provided comprehensive "Session Guides" containing pre-configured account settings and specific prompt templates.
* The event utilized a high support ratio, with 26 teaching assistants (TAs) available to troubleshoot technical hurdles in real time and dedicated Slack channels for sharing live AI outputs.

## Peer-Led Mentorship and Internal Context

* Instead of hiring external consultants, the program featured 10 internal "AI Mentors" who shared how they integrated AI into their actual daily workflows at LY Corporation.
* Training focused exclusively on company-approved tools, including ChatGPT Enterprise, Gemini, and Claude Code, ensuring all demonstrations complied with internal security protocols.
* Internal mentors were able to provide specific "company context" that external lecturers lack, such as integrating AI with existing proprietary systems and data.
* A rigorous three-stage quality control process—initial flow review, final end-to-end dry run, and technical rehearsal—was implemented to ensure the educational quality of mentor-led sessions.

## Gamification and Cultural Engagement

* The event was framed as a "festival" rather than mandatory training, using campus-themed motifs like "enrollment" and "school attendance" to reduce psychological barriers.
* A "Stamp Rally" system encouraged participation by offering tiered rewards, including welcome kits, refreshments, and subscriptions to premium AI tools.
* Interactive exhibition booths allowed employees to experience AI utility firsthand, such as an AI photo zone using Gemini to generate "campus-style" portraits and an AI Agent Contest booth.
* Strong executive support played a crucial role, with leadership encouraging staff to pause routine tasks for the day to focus entirely on AI experimentation and "playing" with new technologies.

To effectively scale AI literacy within a large organization, it is recommended to move away from passive, one-size-fits-all lectures. Success lies in leveraging internal experts who understand the specific security and operational constraints of the business, and in creating a low-pressure environment where employees can experiment with hands-on tasks relevant to their specific roles.


Safety Is the Baseline, Cost Savings the Bonus: Why AI Services Need Separate Guardrails

AI developers often rely on system prompts to enforce safety rules, but this integrated approach frequently leads to "over-refusal" and unpredictable shifts in model performance. To ensure both security and operational efficiency, it is increasingly necessary to decouple safety mechanisms into separate guardrail systems that operate independently of the primary model's logic.

## Negative Impact on Model Utility

* Integrating safety instructions directly into system prompts often leads to a high False Positive Rate (FPR), where the model rejects harmless requests alongside harmful ones.
* Technical analysis using Principal Component Analysis (PCA) reveals that guardrail prompts shift the model's embeddings in a consistent direction toward refusal, regardless of the input's actual intent.
* Studies show that aggressive safety prompting can cause models to refuse benign technical queries—such as "how to kill a Python process"—because the model adopts an overly conservative decision boundary.

## Positional Bias and Context Neglect

* Research on the "Lost in the Middle" phenomenon indicates that LLMs are most sensitive to information at the beginning and end of a prompt, while accuracy drops significantly for information placed in the center.
* The "Constraint Difficulty Distribution Index" (CDDI) demonstrates that the order of instructions matters; models generally follow instructions better when difficult constraints are placed at the beginning of the prompt.
* In complex system prompts where safety rules are buried in the middle, the model may fail to prioritize these guardrails, leading to inconsistent safety enforcement depending on the prompt's structure.

## The Butterfly Effect of Prompt Alterations

* Small, seemingly insignificant changes to a system prompt—such as adding a single whitespace, a "Thank you" note, or changing the output format to JSON—can alter more than 10% of a model's predictions.
* Modifying safety-related lines within a unified system prompt can cause "catastrophic performance collapse," where the model's internal reasoning path is diverted, affecting unrelated tasks.
* Because LLMs treat every part of the prompt as a signal that moves their decision boundaries, managing safety and task logic in a single string makes the system brittle and difficult to iterate upon.

To build robust and high-performing AI applications, developers should move away from bloated system prompts and instead implement external guardrails. This modular approach allows for precise security filtering without compromising the model's creative or logical capabilities.
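
The decoupled architecture argued for above can be sketched as a separate screening stage in front of the main model. This is a toy illustration under stated assumptions: the keyword blocklist stands in for a dedicated guardrail model or service, and all names are ours, not from the post.

```python
# Sketch of decoupling safety from the task prompt: a separate guardrail
# stage screens the input, and the main model's prompt stays free of safety
# clauses. The trivial keyword check below is a placeholder for a real
# guardrail classifier; the point is the separation, not the filter itself.

BLOCKLIST = ("build a bomb", "credit card dump")  # stand-in for a real classifier

def guardrail_check(user_input: str) -> bool:
    """Return True if the input is allowed. A real system would call a
    dedicated safety model here instead of matching keywords."""
    lowered = user_input.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

def answer(user_input: str, call_model) -> str:
    """Run the guardrail first; only allowed inputs reach the main model."""
    if not guardrail_check(user_input):
        return "Sorry, I can't help with that."
    # The task prompt contains no safety instructions, so benign technical
    # queries like "how to kill a Python process" are not over-refused.
    return call_model(f"Answer concisely: {user_input}")

reply = answer("how to kill a Python process", lambda p: "Use kill <pid> or pkill.")
```

Because refusal decisions live entirely in `guardrail_check`, the safety filter can be tuned or replaced without touching, and without perturbing, the task prompt.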


A One-Month Task Done in Five Days with Vibe Coding (ChatGPT·Cursor)

This blog post explores how LY Corporation reduced a month-long development task to just five days by leveraging "vibe coding" with generative AI tools like ChatGPT and Cursor. By shifting from traditional, rigid documentation to an iterative, demo-first approach, developers can rapidly validate multiple UI/UX solutions for complex problems like restaurant menu registration. The author concludes that AI's ability to handle frequent rework makes it more efficient to "build fast and iterate" than to aim for perfection through long-form specifications.

### Strategic Shift to Rapid Prototyping

* Traditional development cycles (spec → design → dev → fix) are often too slow to keep up with market trends due to heavy documentation and impact analysis.
* The "vibe coding" approach prioritizes creating "working demos" over perfect specifications to find "good enough" answers through rapid feedback loops.
* AI reduces the psychological and logistical burden of "starting over," allowing developers to refine the context and quality of outputs through repeated interaction without the friction of manual re-documentation.

### Defining Requirements and Solution Ideation

* Initial requirements are kept minimal, focusing only on the core mission, top priorities, and essential data structures (e.g., product name, image, description) to avoid limiting AI creativity.
* ChatGPT is used to generate a wide range of solution candidates, which are then filtered into five distinct approaches: Stepper Wizards, Live Previews with Quick Add, Template/Cloning, Chat Input, and OCR-based photo scanning.
* This stage emphasizes volume and variety, using AI-generated pros and cons to establish selection criteria and identify potential UX bottlenecks early in the process.

### Detailed Design and Multi-Solution Wireframing

* Each of the five chosen solutions is expanded into detailed screen flows and UI elements, such as progress bars, bottom sheets, and validation logic.
* Prompt engineering is used iteratively; if an AI-generated result lacks a specific feature like "temporary storage" or "mandatory field validation," the prompt is adjusted to regenerate the design instantly.
* The focus remains on defining the "what" (UI elements) and "how" (user flow) through textual descriptions before moving to actual coding.

### Implementation with Cursor and Flutter

* Cursor is utilized to generate functional code based on the refined wireframes, using Flutter as the framework to ensure rapid cross-platform development for both iOS and Android.
* The development follows a "skeleton-first" approach: first creating a main navigation hub with five entry points, then populating each individual solution module one by one.
* Technical architecture decisions, such as using Riverpod for state management or SQLite for data storage, are layered onto the demo post hoc, reversing the traditional "stack-first" development order to prioritize functional validation.

### Recommendation

To maximize efficiency, developers should treat AI as a partner for high-speed iteration rather than a one-shot tool. By focusing on creating functional demos quickly and refining them through direct feedback, teams can bypass the bottlenecks of traditional software requirements and deliver user-centric products in a fraction of the time.
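
The "skeleton-first" approach described above (a navigation hub with five stub entry points, filled in one at a time) can be sketched as follows. The post builds this in Flutter; Python is used here only for brevity, and the step names are our guesses based on the post's data fields.

```python
# Skeleton-first sketch: start with a hub whose five entry points are all
# placeholder stubs, so the whole app "runs" from day one, then replace each
# stub with a working module one at a time. Entry-point names come from the
# post's five solution candidates; everything else is illustrative.

def not_implemented(name):
    def placeholder():
        return f"[{name}] coming soon"
    return placeholder

# Hub with the five entry points, all stubbed.
hub = {name: not_implemented(name) for name in [
    "Stepper Wizard", "Live Preview + Quick Add", "Template/Cloning",
    "Chat Input", "OCR Photo Scan",
]}

# Populate one module; the rest of the hub keeps working as stubs.
def stepper_wizard():
    steps = ["name", "image", "description"]  # core menu fields from the post
    return "Stepper with steps: " + " -> ".join(steps)

hub["Stepper Wizard"] = stepper_wizard
```

The payoff is that every solution candidate is demoable at all times, which is what makes side-by-side validation of the five approaches possible.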


Won't You Become a Hacker? A Visit to Hack Day 2025!

Hack Day 2025 serves as a cornerstone of LY Corporation’s engineering culture, bringing together diverse global teams to innovate beyond their daily operational scopes. By fostering a high-intensity environment focused on creative freedom, the event facilitates technical growth and strengthens interpersonal bonds across international branches. This 19th edition demonstrated how rapid prototyping and cross-functional collaboration can transform abstract ideas into functional AI-driven prototypes within a strict 24-hour window.

### Structure and Participation Dynamics

* The hackathon follows a "9 to 9" format, providing exactly 24 hours of development time followed by a day for presentations and awards.
* Participation is inclusive of all roles, including developers, designers, planners, and HR staff, allowing for holistic product development.
* Teams can be "General Teams" from the same legal entity or "Global Mixed Teams" comprising members from different regions like Korea, Japan, Taiwan, and Vietnam.
* The Developer Relations (DevRel) team facilitates team building for remote employees using digital collaboration tools like Zoom and Miro.

### AI-Powered Personality Analysis Project

* The author's team developed a "Scouter" program inspired by Dragon Ball, designed to measure professional "combat power" based on communication history.
* The system utilizes Slack bots and AI models to analyze message logs and map them to the Big 5 personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism).
* Professional metrics are visualized as game-like character statistics to make personality insights engaging and less intimidating.
* While the original plan involved using AI to generate and print physical character cards, hardware failures with photo printers forced a technical pivot to digital file downloads.

### High-Pressure Presentation and Networking

* Every team is allotted a strict 90-second window to pitch their product and demonstrate a live demo.
* The "90-second rule" includes a mandatory microphone cutoff to maintain momentum and keep the large-scale event engaging for all attendees.
* Dedicated booth sessions follow the presentations, allowing participants to provide hands-on experiences to colleagues and judges.
* The event emphasizes "Perfect the Details," a core company value, by encouraging teams to utilize all available resources—from whiteboards to AI image generators—within the time limit.

### Environmental Support and Culture

* The event occupies an entire office floor, providing a high-density yet comfortable environment designed to minimize distractions during the "Hack Time."
* Cultural exchange is encouraged through "humanity snacks," where participants from different global offices share local treats in dedicated rest areas.
* Strategic scheduling, such as "Travel Days" for international participants, ensures that teams can focus entirely on technical execution once the event begins.

Participating in internal hackathons provides a vital platform for testing new technologies—like LLMs and personality modeling—that may not fit into immediate product roadmaps. For organizations with hybrid work models, these intensive in-person events are highly recommended to bridge the communication gap and build lasting trust between global teammates.
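
The Scouter pipeline described above (message logs → Big 5 scores → game-style stats) can be sketched as a simple data flow. In the actual project an AI model scores Slack logs; the keyword counter below is a stub that only illustrates the shape of the pipeline, and every signal word and function name is hypothetical.

```python
# Toy sketch of the "Scouter" flow: score the Big 5 traits from message text
# (here with naive keyword counts standing in for an AI model), then render
# the scores as a game-like character sheet.

TRAITS = ["Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism"]

def score_big5(messages: list[str]) -> dict[str, int]:
    """Stub scorer: counts naive per-trait signals, scaled to 0-100."""
    text = " ".join(messages).lower()
    signals = {
        "Openness": ("idea", "what if"),
        "Conscientiousness": ("deadline", "checklist"),
        "Extraversion": ("!", "let's"),
        "Agreeableness": ("thanks", "great"),
        "Neuroticism": ("worried", "sorry"),
    }
    return {t: min(100, 20 * sum(text.count(s) for s in signals[t]))
            for t in TRAITS}

def render_stats(scores: dict[str, int]) -> str:
    """Format trait scores as a game-style stat sheet with bar meters."""
    return "\n".join(f"{t:<17} {'#' * (v // 10):<10} {v}"
                     for t, v in scores.items())

sheet = render_stats(score_big5(["Thanks! Great idea, let's try it"]))
```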


AI and the Writer, Walking Together: Hand Over Your Code and We'll Write Your API Reference

LY Corporation is addressing the chronic shortage of high-quality technical documentation by treating the problem as an engineering challenge rather than a training issue. By utilizing generative AI to automate the creation of API references, the Document Engineering team has transitioned from a "manual craftsmanship" approach to an "industrialized production" model. While the system significantly improves efficiency and maintains internal context better than generic tools, the team concludes that human verification remains essential due to the high stakes of API accuracy.

### Contextual Challenges with Generic AI

Standard coding assistants like GitHub Copilot often fail to meet the specific documentation needs of a large organization.

* Generic tools do not adhere to internal company style guides or maintain consistent terminology across projects.
* Standard AI lacks awareness of internal technical contexts; for example, generic AI might mistake a company-specific identifier like "MID" for "Member ID," whereas the internal tool understands its specific function within the LY ecosystem.
* Fragmented deployment processes across different teams make it difficult for developers to find a single source of truth for API documentation.

### Multi-Stage Prompt Engineering

To ensure high-quality output without overwhelming the LLM's "memory," the team refined a complex set of instructions into a streamlined three-stage workflow.

* **Language Recognition:** The system first identifies the programming language and specific framework being used.
* **Contextual Analysis:** It analyzes the API's logic to generate relevant usage examples and supplemental technical information.
* **Detail Generation:** Finally, it writes the core API descriptions, parameter definitions, and response value explanations based on the internal style guide.
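
The three-stage workflow above can be sketched as a chain of prompt stages. Each stage below is a stub returning canned output; in the real tool each would be an LLM call constrained by the internal style guide, and all function names here are ours, not the team's.

```python
# Sketch of the three-stage documentation workflow as sequential stages:
# 1) recognize the language/framework, 2) analyze context for usage examples,
# 3) generate the reference details. Stubs stand in for the LLM calls.

def recognize_language(source: str) -> str:
    """Stage 1: identify the language (naive check standing in for an LLM)."""
    return "python" if "def " in source else "unknown"

def analyze_context(source: str, language: str) -> dict:
    """Stage 2: derive usage examples and supplemental info (stubbed)."""
    return {"language": language,
            "example": f"call the function as shown in the {language} snippet"}

def generate_details(source: str, context: dict) -> str:
    """Stage 3: write the reference body per the style guide (stubbed)."""
    return (f"Language: {context['language']}\n"
            f"Usage: {context['example']}\n"
            "Parameters and responses: described per internal style guide.")

def generate_reference(source: str) -> str:
    """Chain the three stages, passing each stage's output to the next."""
    lang = recognize_language(source)
    return generate_details(source, analyze_context(source, lang))

doc = generate_reference("def get_member(mid): ...")
```

Splitting the instructions into stages like this is what keeps any single prompt small enough that the model is not overwhelmed by the full rule set at once.
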
### Transitioning to Model Context Protocol (MCP)

While the prototype began as a VS Code extension, the team shifted to using the Model Context Protocol (MCP) to ensure the tool was accessible across various development environments.

* Moving to MCP allows the tool to support multiple IDEs, including IntelliJ, which was a high-priority request from the developer community.
* The MCP architecture decouples the user interface from the core logic, allowing the "host" (such as the IDE) to handle UI interactions and parameter inputs.
* This transition reduced the maintenance burden on the Document Engineering team by removing the need to build and update custom UI components for every IDE.

### Performance and the Accuracy Gap

Evaluation of the AI-generated documentation showed strong results, though it highlighted the unique risks of documenting APIs compared to other forms of writing.

* Approximately 88% of the AI-generated comments met the team's internal evaluation criteria.
* The specialized generator outperformed GitHub Copilot in 78% of cases regarding style and contextual relevance.
* The team noted that while a 99% accuracy rate is excellent for a blog post, a single error in a short API reference can render the entire document useless for a developer.

To successfully implement AI-driven documentation, organizations should focus on building tools that understand internal business logic while maintaining a strict "human-in-the-loop" workflow. Developers should use these tools to generate the bulk of the content but must perform a final technical audit to ensure the precision that only a human author can currently guarantee.