LINE / k8s

8 posts

Reshaping LY Corporation's cloud infrastructure: introducing the architecture of Flava, the next-generation platform unifying two massive clouds

Hello, I'm Inoue, in charge of private cloud infrastructure at LY Corporation. LY Corporation's massive traffic and data are supported by a large-scale private cloud that we develop and operate in-house. We are currently consolidating two huge cloud platforms, 'Verda' from the former LINE Corporation and 'YNW' (an IaaS, infrastructure as a service, platform) from the former Yahoo Japan Corporation, into our next-generation cloud platform, Fl…

Building an enterprise LLM service, part 2: agent engineering

Introduction: In the previous post, Building an enterprise LLM service, part 1: context engineering, we shared our 'progressive disclosure' strategy for giving the LLM only the information it needs in an environment spanning 260 tools and hundreds of pages of documentation. If part 1 answered the question 'what do we hand to the AI?', part 2 moves on to the next one: 'how do we build the agent that receives that distilled context?' Before getting into the details, let me first share the real-world scorecard of the Flava AI assistant (FAA). We…

Claude Code Action: turning AI code review into a platform that safeguards code quality across the organization

Introduction: Hello, I'm Dongwon Lee from the LINE NEXT DevOps team. I'm responsible for overall infrastructure operations, including running our Kubernetes-based infrastructure, building CI/CD pipelines, monitoring, and incident response, and recently I've taken a deep interest in using AI to improve development productivity and automation, studying and experimenting in parallel with my day-to-day work. While testing a variety of AI models and tools, I've been exploring how to integrate AI naturally into the team's entire development process. In this post, I'll share how, at LINE NEXT, AI…

Creating the cloud of the future

Introduction: Hello, I'm Younghee Park of the Cloud Service CBU, in charge of the private cloud for development services. LY Corporation builds and operates an internal private cloud to provide the infrastructure and platforms needed for service development, and we are consolidating the cloud services that Yahoo! JAPAN and LINE used before the merger into LY Corporation. The new unified private cloud is named 'Flava'. In this post, I'll talk about how the cloud industry as a whole will evolve…

Scaling to Infinity: the evolution of LY Corporation's observability platform beyond its limits

Hello, I'm Gijun Oh from the LY Corporation Observability Infrastructure team, where I develop and operate our in-house time-series database (TSDB). LY Corporation's internal private cloud platform goes well beyond providing simple virtual machines, offering a vast portfolio of services including Kubernetes-based container environments, databases, and load balancers…

Why did an Athenz engineer take on the Kubestronaut challenge?

Security platform engineer Jung-woo Kim details his transition from a specialized Athenz developer to a "Kubestronaut," a prestigious CNCF designation awarded to those who master the entire Kubernetes ecosystem. By systematically obtaining five distinct certifications, he argues that deep, practical knowledge of container orchestration is essential for building secure, scalable access control systems in private cloud environments. His journey demonstrates that moving beyond application-level expertise to master cluster administration and security directly improves architectural design and operational troubleshooting.

## The Kubestronaut Framework

* The title is awarded by the Cloud Native Computing Foundation (CNCF) to individuals who pass five specific certification exams: CKA, CKAD, CKS, KCNA, and KCSA.
* The CKA (Administrator), CKAD (Application Developer), and CKS (Security Specialist) exams are performance-based, requiring candidates to solve real-world technical problems in a live terminal environment rather than answering multiple-choice questions.
* Success in these exams demands a combination of deep technical knowledge, speed, and accuracy, as practitioners must configure clusters and resolve failures under strict time constraints.
* The remaining Associate-level exams (KCNA and KCSA) provide a theoretical foundation in cloud-native security and ecosystem standards.

## A Progressive Path to Technical Mastery

* **CKAD (Application Developer):** The initial focus was on mastering the deployment of Athenz, an open-source auth system, ensuring it runs efficiently from a developer's perspective. Preparation involved rigorous use of tools like killer.sh to simulate high-pressure environments.
* **CKA (Administrator):** To manage multi-cluster environments and understand the underlying components that make Kubernetes function, the author moved to the administrator level, gaining insight into how various services interact within the cluster.
* **CKS (Security Specialist):** Given his background in security, this was the most critical and difficult stage, focusing on cluster hardening, vulnerability analysis, and implementing strict network policies to ensure the entire infrastructure remains resilient.

## Organizational Impact and Open Source Governance

* Obtaining these certifications provided a clearer understanding of open-source governance, specifically how Special Interest Groups (SIGs) and pull request (PR) workflows drive massive projects like Kubernetes.
* This technical depth was applied to a high-stakes project providing Athenz services in a Bare Metal as a Service (BMaaS) environment, allowing for more stable and efficient architecture design.
* The learning process was supported by corporate initiatives, including access to Udemy Business for technical training and a hybrid work culture that allowed for consistent early-morning study habits.

To achieve expert-level proficiency in complex systems like Kubernetes, engineers should adopt the "Ubo-cheonri" philosophy of making slow but steady progress. Starting with even one minute of study or a single GitHub commit per day can eventually lead to mastering the highest levels of cloud-native architecture. For those managing enterprise-grade infrastructure, pursuing the Kubestronaut path is highly recommended, as it transforms theoretical knowledge into a broad, practical vision for system design.

Connecting thousands of LY Corporation services with a Central Dogma control plane

LY Corporation developed a centralized control plane using Central Dogma to manage service-to-service communication across its vast, heterogeneous infrastructure of physical machines, virtual machines, and Kubernetes clusters. By adopting the industry-standard xDS protocol, the new system resolves the interoperability and scaling limitations of their legacy platform while providing a robust GitOps-based workflow. This architecture enables the company to connect thousands of services with high reliability and sophisticated traffic control capabilities.

## Limitations of the Legacy System

The previous control plane environment faced several architectural bottlenecks that hindered developer productivity and system flexibility:

* **Tight Coupling:** The system was heavily dependent on a specific internal project management tool (PMC), making it difficult to support modern containerized environments like Kubernetes.
* **Proprietary Schemas:** Communication relied on custom message schemas, which created interoperability issues between different clients and versions.
* **Lack of Dynamic Registration:** The legacy setup could not handle dynamic endpoint registration effectively, functioning more as a static registry than a functional service mesh control plane.
* **Limited Traffic Control:** It lacked the ability to perform complex routing tasks, such as canary releases or advanced client-side load balancing, across diverse infrastructures.

## Central Dogma as a Control Plane

To solve these issues, the team leveraged Central Dogma, a Git-based repository service for textual configuration, as the foundation for a new control plane:

* **xDS Protocol Integration:** The new control plane implements the industry-standard xDS protocol, ensuring seamless compatibility with Envoy and other modern data plane proxies.
* **GitOps Workflow:** By utilizing Central Dogma's mirroring features, developers can manage service configurations and traffic policies safely through pull requests in external Git repositories.
* **High Reliability:** The system inherits Central Dogma's native strengths, including multi-datacenter replication, high availability, and a robust authorization system.
* **Schema Evolution:** The control plane automatically transforms legacy metadata into standard xDS resources, allowing for a smooth transition from the old infrastructure to the new service mesh.

## Dynamic Service Discovery and Registration

The architecture provides automated ways to manage service endpoints across different environments:

* **Kubernetes Endpoint Plugin:** A dedicated plugin watches for changes in Kubernetes services and automatically updates the xDS resource tree in Central Dogma.
* **Automated API Registration:** The system provides gRPC and HTTP APIs (e.g., `RegisterLocalityLbEndpoint`) that allow services to register themselves dynamically during the startup process.
* **Advanced Traffic Features:** The new control plane supports sophisticated features like zone-aware routing, circuit breakers, automatic retries, and "slow start" mechanisms for new endpoints.

## Evolution Toward a Sidecar-less Service Mesh

A major focus of the project is improving the developer experience by reducing the operational overhead of the data plane:

* **Sidecar-less Options:** The team is working toward providing service mesh benefits without requiring a sidecar proxy for every pod, which reduces resource consumption and simplifies debugging.
* **Unified Control:** Central Dogma acts as a single source of truth for both proxy-based and proxyless service mesh configurations, ensuring consistent policy enforcement across the entire organization.

For organizations managing large-scale, heterogeneous infrastructure, transitioning to an xDS-compliant control plane backed by a reliable Git-based configuration store is highly recommended. This approach balances the need for high-speed dynamic updates with the safety and auditability of GitOps, ultimately allowing for a more scalable and developer-friendly service mesh.
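To make the xDS integration concrete, here is a minimal sketch of the kind of EDS resource (an Envoy v3 `ClusterLoadAssignment`) such a control plane would serve to its data planes. The cluster name, zone, and addresses are illustrative and not taken from the article; only the resource shape follows the Envoy v3 API.

```yaml
# Hypothetical EDS resource served by the control plane over xDS.
# Service name, locality, and endpoint addresses are made up.
version_info: "1"
resources:
- "@type": type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
  cluster_name: example-payment-service
  endpoints:
  - locality:
      zone: zone-a          # locality metadata enables zone-aware routing
    lb_endpoints:
    - endpoint:
        address:
          socket_address:
            address: 10.0.0.12
            port_value: 8080
```

Serving standard resources like this is what lets Envoy sidecars and proxyless gRPC clients consume the same configuration, which is the interoperability win the xDS protocol provides over proprietary schemas.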

A flexible multi-site architecture designed with unified Nginx configuration and Loki integration

LINE NEXT optimized its web server infrastructure by transitioning from fragmented, manual Nginx setups to a centralized native Nginx multi-site architecture. By integrating global configurations and automating the deployment pipeline with Ansible, the team reduced service launch lead times by over 80% while regaining the ability to use advanced features like GeoIP and real client IP tracking. This evolution ensures that the infrastructure can scale to support over 100 subdomains across diverse global services with high reliability and minimal manual overhead.

## Evolution of Nginx Infrastructure

* **PMC-based Structure:** The initial phase relied on a Project Management Console using `rsync` via SSH; this created security risks and led to fragmented, siloed configurations that were difficult to maintain.
* **Ingress Nginx Structure:** To improve speed, the team moved to Kubernetes-based Ingress using Helm charts, which automated domain and certificate settings but limited the use of native Nginx modules and complicated the retrieval of real client IP addresses.
* **Native Nginx Multi-site Structure:** The current hybrid approach uses native Nginx managed by Ansible, combining the speed of configuration-driven setups with the flexibility to use advanced modules like GeoIP and Loki for log collection.

## Configuration Integration and Multi-site Management

* **Master Configuration Extraction:** Common directives such as timeouts, keep-alive settings, and log formats were extracted into a master Nginx configuration file to eliminate redundancy across services.
* **Hierarchical Directory Structure:** Inspired by Apache, the team adopted a `sites-available` structure where individual `server` blocks for different environments (alpha, beta, production) are managed in separate files.
* **Operational Efficiency:** This integrated structure allows a single Nginx instance to serve multiple sites simultaneously, significantly reducing the time required to add and deploy new service domains.

## Automated Deployment with Ansible

* **Standardized Workflow:** The team replaced manual processes with Ansible playbooks that handle everything from cloning the latest configuration from Git to extracting environment-specific files.
* **Safety and Validation:** The automated pipeline includes mandatory Nginx syntax verification (`nginx -t`) and process status checks to ensure stability before a deployment is finalized.
* **Rolling Deployments:** To minimize service impact, updates are pushed sequentially across servers; the process automatically halts if an error is detected at any stage of the rollout.

To effectively manage a rapidly expanding portfolio of global services, infrastructure teams should move toward a "configuration-as-code" model that separates common master settings from service-specific logic. Leveraging automation tools like Ansible alongside a native Nginx multi-site structure provides the necessary balance between rapid deployment and the granular control required for complex logging and security requirements.
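As a rough illustration of the `sites-available` layout described above, a per-service file might look like the following. The domain, upstream name, and include path are hypothetical, not taken from the article; only the pattern of splitting shared directives from per-site `server` blocks reflects the design.

```nginx
# Hypothetical sites-available/beta-example.conf; one such file per
# service, while shared directives live in the master configuration.
server {
    listen 80;
    server_name beta.example-service.com;

    # Shared timeouts, keep-alive, and log-format settings are pulled
    # in from the extracted master configuration (illustrative path).
    include /etc/nginx/conf.d/common-proxy.conf;

    location / {
        proxy_pass http://example_service_backend;
        # Running native Nginx restores access to the real client IP.
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

In the workflow described, an Ansible playbook would assemble such files from Git and run `nginx -t` against the combined configuration before reloading, matching the validation step above.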