infrastructure

4 posts

woowahan

Delivering the Future: Global

The Global Hackathon 2025 served as a massive collaborative initiative to unite over 270 technical employees from seven global entities under Delivery Hero’s umbrella, including Woowa Brothers. By leveraging the community-building expertise of the Woowahan DevRel team, the event successfully bridged geographical and technical gaps to foster innovation in "Delivering the Future." The hackathon concluded with high-level recognition from global leadership and a strategic partnership with Google Cloud, demonstrating the power of synchronized global technical synergy.

## Strategic Planning and Global Coordination

* The event adopted a hybrid "Base Camp" model, where participants worked from their local entity offices while staying connected through 24-hour live streaming and centralized online channels.
* Organizers meticulously navigated the logistical hurdles of spanning 70 countries, including coordinating across vastly different time zones and respecting local public holidays and vacation seasons.
* Efficiency was maintained through a decentralized communication strategy, using entity-specific meetings and comprehensive guidebooks rather than frequent global meetings to prevent "meeting fatigue" across time zones.

## Technical Infrastructure and Regulatory Compliance

* To accommodate diverse technical preferences, the infrastructure had to support various stacks, including AWS, Google Cloud Platform (GCP), and specific machine learning models.
* The central organization team addressed complex regulatory challenges, ensuring all sandbox environments complied with strict global security standards and GDPR (EU General Data Protection Regulation).
* A strategic partnership with Google Cloud provided a standardized Google AI-based environment, enabling teams to experiment rapidly with mature tools and cloud-native services.

## Local Operations and Cross-Entity Collaboration

* Physical office spaces were transformed into immersive hackathon hubs to maintain the high-intensity atmosphere characteristic of offline coding marathons.
* The event encouraged "office sharing" between entities located in the same city and even supported travel for members to join different regional base camps, fostering a truly global networking culture.
* Local supporters used standardized checklists and operational frameworks to ensure a consistent experience for participants, whether they were in Seoul, Berlin, or Dubai.

Building a successful global technical event requires a delicate balance between centralized infrastructure and local autonomy. For organizations operating across multiple regions, investing in shared technical sandboxes and robust communication frameworks is essential for turning fragmented local talent into a unified global innovation engine.

daangn

Easily Operating Karrot

This blog post by the Daangn (Karrot) search platform team details their journey in optimizing Elasticsearch operations on Kubernetes (ECK). While their initial migration to ECK reduced deployment times, the team faced critical latency spikes during rolling restarts due to "cold caches" and high traffic volumes. To achieve a "deploy anytime" environment, they developed a data node warm-up system to ensure nodes are performance-ready before they begin handling live search requests.

## Scaling Challenges and Operational Constraints

- Over two years, Daangn's search infrastructure expanded from a single cluster to four specialized clusters, with peak traffic jumping from 1,000 to over 10,000 QPS.
- The initial strategy of "avoiding peak hours" for deployments became a bottleneck, as the window for safe updates narrowed while total deployment time across all clusters exceeded six hours.
- Manual monitoring became a necessity rather than an option, as engineers had to verify traffic conditions and latency graphs before and during every ArgoCD sync.

## The Hazards of Rolling Restarts in Elasticsearch

- Standard Kubernetes rolling restarts are problematic for stateful systems because a "Ready" Pod does not equate to a "Performant" Pod; Elasticsearch relies heavily on memory-resident caches (page cache, query cache, field data cache).
- A version update in the Elastic Operator once triggered an unintended rolling restart that caused a 60% error rate and 3-second latency spikes because new nodes had to fetch all data from disk.
- When a node restarts, the cluster enters a "Yellow" state where remaining replicas must handle 100% of the traffic, creating a single point of failure and increasing the load on the surviving nodes.

## Strategy for Reliable Node Warm-up

- The primary goal was to reach a state where p99 latency remains stable during restarts, regardless of whether the deployment occurs during peak traffic hours.
- The solution involves a "Warm-up System" designed to pre-load frequently accessed data into the filesystem and Elasticsearch internal caches before the node is allowed to join the load balancer.
- By executing representative search queries against a newly started node, the system ensures that the necessary segments are already in the page cache, preventing the disk I/O thrashing that typically follows a cold start.

## Implementation Goals

- Automate the validation of node readiness beyond simple health checks to include performance readiness.
- Eliminate the need for human "eyes-on-glass" monitoring during the 90-minute deployment cycles.
- Maintain high availability and consistent user experience even when shards are being reallocated and replicas are temporarily unassigned.

To maintain a truly resilient search platform on Kubernetes, it is critical to recognize that for stateful applications, "available" is not the same as "ready." Implementing a customized warm-up controller or logic is a recommended practice for any high-traffic Elasticsearch environment to decouple deployment schedules from traffic patterns.
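
The warm-up idea described above (replay representative queries against a new node and only admit it once latency looks healthy) can be sketched roughly as follows. This is a minimal illustration and not the Daangn team's implementation: `warm_up`, `percentile`, `FakeColdNode`, the 50 ms target, and the sample queries are all hypothetical names and values chosen for the example. In a real setup, `run_query` would send the query to the new node and report its latency, for instance by reading the `took` field of the Elasticsearch search response.

```python
import math
from typing import Callable, Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def warm_up(
    run_query: Callable[[str], float],
    queries: Sequence[str],
    p99_target_ms: float,
    max_rounds: int = 20,
) -> bool:
    """Replay `queries` until one full round has p99 latency under the target.

    `run_query` is a hypothetical callable that sends one query to the new
    node and returns its latency in milliseconds.
    Returns True once the node looks warm, False if max_rounds is exhausted.
    """
    for _ in range(max_rounds):
        latencies = [run_query(q) for q in queries]
        if percentile(latencies, 99) <= p99_target_ms:
            return True
    return False


class FakeColdNode:
    """Stand-in for a freshly restarted node: the first hit per query is slow."""

    def __init__(self, cold_ms: float = 3000.0, warm_ms: float = 20.0) -> None:
        self.cold_ms = cold_ms
        self.warm_ms = warm_ms
        self.cached = set()

    def search(self, query: str) -> float:
        if query in self.cached:
            return self.warm_ms
        self.cached.add(query)  # simulates segments landing in the page cache
        return self.cold_ms


node = FakeColdNode()
representative_queries = ["bicycle", "iphone", "sofa"]  # made-up sample queries
ready = warm_up(node.search, representative_queries, p99_target_ms=50.0)
```

A gate like this would sit between "the Pod is up" and "the node receives live traffic", so the readiness signal reflects measured latency rather than process health alone. How the result is wired into Kubernetes or the load balancer is outside what the summary describes and is left open here.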

line

LY's Tech Conference, 'Tech

LY Corporation’s Tech-Verse 2025 conference highlighted the company's strategic pivot toward becoming an AI-centric organization through the "Catalyst One Platform" initiative. By integrating the disparate infrastructures of LINE and Yahoo! JAPAN into a unified private cloud, the company aims to achieve massive cost efficiencies while accelerating the deployment of AI agents across its entire service ecosystem. This transformation focuses on empowering engineers with AI-driven development tools to foster rapid innovation and deliver a seamless, "WOW" experience for global users.

### Infrastructure Integration and the Catalyst One Platform

To address the redundancies following the merger of LINE and Yahoo! JAPAN, LY Corporation is consolidating its technical foundations into a single internal ecosystem known as the Catalyst One Platform.

* **Private Cloud Advantage:** The company maintains its own private cloud to achieve a four-fold cost reduction compared to public cloud alternatives, managed by a lean team of 700 people supporting 500,000 servers.
* **Unified Architecture:** The integration spans several layers, including Infrastructure (Project "DC-Hub"), Cloud (Project "Flava"), and specialized Data and AI platforms.
* **Next-Generation Cloud "Flava":** This platform integrates existing services to enhance VM specifications, VPC networking, and high-performance object storage (Ceph and Dragon).
* **Information Security:** A dedicated "SafeOps" framework is being implemented to provide governance and security across all integrated services, ensuring a safer environment for user data.

### AI Strategy and Service Agentization

A core pillar of LY’s strategy is the "AI Agentization" of all its services, moving beyond simple features to proactive, personalized assistance.

* **Scaling GenAI:** Generative AI has already been integrated into 44 different services within the group.
* **Personalized Agents:** The company is developing the capacity to generate millions of specialized agents that can be linked together to support the unique needs of individual users.
* **Agent Ecosystem:** The goal is to move from a standard platform model to one where every user interaction is mediated by an intelligent agent.

### AI-Driven Development Transformation

Beyond user-facing services, LY is fundamentally changing how its engineers work by deploying internal AI development solutions to all staff starting in July.

* **Code and Test Automation:** Proof of Concept (PoC) results showed a 96% accuracy rate for "Code Assist" and a 97% reduction in time for "Auto Test" procedures.
* **RAG Integration:** The system utilizes Retrieval-Augmented Generation (RAG) to leverage internal company knowledge and guidelines, ensuring high-quality, context-aware development support.
* **Efficiency Gains:** By automating repetitive tasks, the company intends for engineers to shift their focus from maintenance to creative service improvement and innovation.

The successful integration of these platforms and the aggressive adoption of AI-driven development tools suggest that LY Corporation is positioning itself to be a leader in the "AI-agent" era. For technical organizations, LY's model serves as a case study in how large-scale mergers can leverage private cloud infrastructure to fund and accelerate a company-wide AI transition.
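
For readers unfamiliar with the RAG pattern mentioned above, the retrieval step can be illustrated with a toy sketch: pick the internal guideline most relevant to a question and place it in the prompt before the question. Nothing here reflects LY Corporation's actual system. The guideline texts, the `retrieve` and `build_prompt` helpers, and the word-overlap scoring are all assumptions made for illustration; production systems typically use embedding-based search and then pass the prompt to a language model.

```python
import re
from typing import Dict, List

# Hypothetical guideline snippets, invented purely for illustration.
GUIDELINES: Dict[str, str] = {
    "logging": "Use structured JSON logging and never log personal data.",
    "testing": "Every public function needs a unit test before merge.",
    "naming": "Use snake_case for Python functions and modules.",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question: str, docs: Dict[str, str], top_k: int = 1) -> List[str]:
    """Return the names of the top_k docs sharing the most words with the question."""
    q_tokens = tokenize(question)
    scored = [
        (len(q_tokens & tokenize(f"{name} {body}")), name)
        for name, body in docs.items()
    ]
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


def build_prompt(question: str, docs: Dict[str, str], top_k: int = 1) -> str:
    """Prepend the retrieved guideline text to the question."""
    context = "\n".join(f"- {docs[name]}" for name in retrieve(question, docs, top_k))
    return f"Guidelines:\n{context}\n\nQuestion: {question}"


question = "How should I write a unit test for this function?"
prompt = build_prompt(question, GUIDELINES)
```

The point of the pattern is that the model answers with company-specific context in front of it instead of relying only on what it learned in training.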

line

Hosting the Tech Conference Tech-Verse

LY Corporation is hosting its global technology conference, Tech-Verse 2025, on June 30 and July 1 to showcase the engineering expertise of its international teams. The event features 127 sessions centered on core themes of AI and security, offering a deep dive into how the group's developers, designers, and product managers solve large-scale technical challenges. Interested participants can register for free on the official website to access the online live-streamed sessions, which include real-time interpretation in English, Korean, and Japanese.

### Conference Overview and Access

* The event runs for two days, from 10:00 AM to 6:00 PM (KST), and is primarily delivered via online streaming.
* Registration is open to the public at no cost through the Tech-Verse 2025 official website.
* The conference brings together technical talent from across the LY Corporation Group, including LINE Plus, LINE Taiwan, and LINE Vietnam.

### Multi-Disciplinary Technical Tracks

* The agenda is divided into 12 distinct categories to cover the full spectrum of software development and product lifecycle.
* Day 1 focuses on foundational technologies: AI, Security, Server-side development, Private Cloud, Infrastructure, and Data Platforms.
* Day 2 explores application and management layers: AI Use Cases, Frontend, Mobile Applications, Design, Product Management, and Engineering Management.

### Key Engineering Case Studies and Sessions

* **AI and Data Automation:** Sessions explore the evolution of development processes using AI, the shift from "Vibe Coding" to professional AI-assisted engineering, and the use of Generative AI to automate data pipelines.
* **Infrastructure and Scaling:** Presentations include how the "Central Dogma Control Plane" connects thousands of services within LY Corporation and methods for improving video playback quality for LINE Call.
* **Framework Migration:** A featured case study details the strategic transition of the "Demae-can" service from React Native to Flutter.
* **Product Insights:** Deep dives into user experience design and data-driven insights gathered from LINE Talk's global user base.

Tech-Verse 2025 provides a valuable opportunity for developers to learn from real-world deployments of AI and large-scale infrastructure. Given the breadth of the 127 sessions and the availability of real-time translation, tech professionals should review the timetable in advance to prioritize tracks relevant to their specific engineering interests.