coupang

Optimizing the inbound process with a machine learning model

Coupang optimized its fulfillment center inbound process by implementing a machine learning model that predicts the exact number of delivery trucks and dock slots required for vendor shipments. By moving away from manual estimates, the system minimizes resource waste from over-allocation while preventing processing delays caused by under-prediction. This automated approach ensures that the limited capacity of fulfillment center docks is used with maximum efficiency.

### The Challenges of Dock Slot Allocation

* Fulfillment centers operate with a fixed number of hourly "slots," each representing the time and space a single truck occupies at a dock to unload goods.
* Inaccurate slot forecasting creates a two-sided risk: under-prediction leads to logistical bottlenecks and delivery delays, while over-prediction results in idle docks and wasted operational overhead.
* The diversity of vendor behaviors and product types makes manual estimation of truck requirements highly inconsistent across the supply chain.

### Predictive Modeling and Feature Engineering

* Coupang used years of historical logistics data to extract features influencing truck counts, including product dimensions, categories, and vendor-specific shipment patterns.
* The system employs the LightGBM algorithm, a gradient-boosting framework selected for its high performance and ability to handle large-scale tabular logistics data.
* Hyperparameter tuning is managed via Bayesian optimization, which efficiently searches the parameter space to minimize prediction error.
* The model accounts for the inherent trade-off between under-prediction and over-prediction, prioritizing a balance that maintains high throughput without straining labor resources.

### System Integration and Real-time Processing

* The trained ML model is integrated directly into the inbound reservation system, giving vendors an immediate prediction of required slots during the request process.
* By automating the truck-count calculation, the system removes the burden of estimation from vendors and ensures consistency across different fulfillment centers.
* This integration allows Coupang to dynamically adjust its dock capacity planning based on real-time data rather than static, historical averages.

To maximize logistics efficiency, organizations should leverage granular product data and historical vendor behavior to automate capacity planning. Integrating predictive models directly into the reservation workflow ensures that data-driven insights are applied at the point of action, reducing human error and resource waste.
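The under- vs over-prediction trade-off described above can be sketched as an asymmetric cost minimization over a predicted truck-count distribution. The cost ratio, probabilities, and function names below are illustrative assumptions, not values from Coupang's system:

```python
# Hypothetical costs: a truck that cannot be unloaded on time is assumed
# to be 5x costlier than an idle reserved slot, so the objective is asymmetric.
COST_UNDER = 5.0   # cost per truck lacking a slot (delays, bottlenecks)
COST_OVER = 1.0    # cost per reserved slot that sits idle

def expected_cost(slots_reserved, truck_count_probs):
    """Expected cost of reserving `slots_reserved` slots, given a predicted
    probability distribution over the actual number of trucks."""
    cost = 0.0
    for trucks, p in truck_count_probs.items():
        if trucks > slots_reserved:
            cost += p * COST_UNDER * (trucks - slots_reserved)
        else:
            cost += p * COST_OVER * (slots_reserved - trucks)
    return cost

def best_reservation(truck_count_probs):
    """Pick the slot count that minimizes expected cost."""
    candidates = range(max(truck_count_probs) + 1)
    return min(candidates, key=lambda s: expected_cost(s, truck_count_probs))

# A model predicting "around 3 trucks" leads to reserving 4 slots when
# under-prediction is 5x costlier than over-prediction.
probs = {2: 0.2, 3: 0.5, 4: 0.3}
print(best_reservation(probs))  # -> 4
```

Skewing the cost ratio toward `COST_OVER` instead would pull the reservation back down, which is the balance the article says the model has to strike.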


Accelerating Coupang’s AI Journey with LLMs

Coupang is strategically evolving its machine learning infrastructure to integrate Large Language Models (LLMs) and foundation models across its e-commerce ecosystem. By transitioning from task-specific deep learning models to multi-modal transformers, the company aims to enhance customer experiences in search, recommendations, and logistics. This shift necessitates a robust ML platform capable of handling the massive compute, networking, and latency demands inherent in generative AI.

### Core Machine Learning Domains

Coupang’s existing ML ecosystem is built upon three primary pillars that drive business logic:

* **Recommendation Systems:** These models leverage vast datasets of user interactions—including clicks, purchases, and relevance judgments—to power home feeds, search results, and advertising.
* **Content Understanding:** Utilizing deep learning to process product catalogs, user reviews, and merchant data to create unified representations of customers and products.
* **Forecasting Models:** Predictive algorithms manage over 100 fulfillment centers, optimizing pricing and logistics for millions of products through a mix of statistical methods and deep learning.

### Enhancing Multimodal and Language Understanding

The adoption of Foundation Models (FM) has unified previously fragmented ML tasks, particularly in multilingual environments:

* **Joint Modeling:** Instead of separate embeddings, vision and language transformer models jointly model product images and metadata (titles/descriptions) to improve ad retrieval and similarity searches.
* **Cross-Border Localization:** LLMs facilitate the translation of product titles from Korean to Mandarin and improve the quality of shopping feeds for global sellers.
* **Weak Label Generation:** To overcome the high cost of human labeling in multiple languages, Coupang uses LLMs to generate high-quality "weak labels" for training downstream models, addressing label scarcity in under-resourced segments.
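The weak-label idea can be sketched as prompting a model for a label and keeping only answers that repeat consistently. The `call_llm` stub below stands in for a real LLM API call so the example runs end to end; the prompt, categories, and voting scheme are illustrative, not Coupang's actual pipeline:

```python
from typing import Optional

def call_llm(prompt: str) -> str:
    """Toy stand-in: a real system would query a hosted LLM here."""
    text = prompt.lower()
    if "keyboard" in text or "laptop" in text:
        return "electronics"
    if "sneaker" in text:
        return "fashion"
    return "grocery"

def weak_label(title: str, samples: int = 3, min_agree: int = 2) -> Optional[str]:
    """Query the model several times and keep a label only when answers
    agree -- a simple way to filter out low-confidence weak labels."""
    answers = [call_llm(f"Classify this product title: {title}") for _ in range(samples)]
    top = max(set(answers), key=answers.count)
    return top if answers.count(top) >= min_agree else None

# Mixed-language titles get labeled without any human annotation.
labeled = {t: weak_label(t) for t in ["무선 keyboard", "Running sneaker", "Organic rice"]}
print(labeled)
```

The resulting labels are "weak" (noisy) but cheap, so they can bootstrap training sets for downstream classifiers in languages where human labels are scarce.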
### Infrastructure for Large-Scale Training

Scaling LLM training requires a shift in hardware architecture and distributed computing strategies:

* **High-Performance Clusters:** The platform utilizes H100 and A100 GPU clusters interconnected with high-speed InfiniBand or RoCE (RDMA over Converged Ethernet) networking to minimize communication bottlenecks.
* **Distributed Frameworks:** To fit massive models into GPU memory, Coupang employs various parallelism techniques, including Fully Sharded Data Parallelism (FSDP), Tensor Parallelism (TP), and Pipeline Parallelism (PP).
* **Efficient Categorization:** Traditional architectures that required a separate model for every product category are being replaced by a single, massive multi-modal transformer capable of handling categorization and attribute extraction across the entire catalog.

### Optimizing LLM Serving and Inference

The transition to real-time generative AI features requires significant optimizations to manage the high computational cost of inference:

* **Quantization Strategies:** To reduce memory footprint and increase throughput, models are compressed using FP8, INT8, or INT4 precision without significant loss in accuracy.
* **Advanced Serving Techniques:** The platform implements Key-Value (KV) caching to avoid redundant computations during text generation and utilizes continuous batching (via engines like vLLM or TGI) to maximize GPU utilization.
* **Lifecycle Management:** A unified platform vision ensures that the entire end-to-end lifecycle—from data preparation and fine-tuning to deployment—is streamlined for ML engineers.

To stay competitive, Coupang is moving toward an integrated AI lifecycle where foundation models serve as the backbone for both content generation and predictive analytics. This infrastructure-first approach allows for the rapid deployment of generative features while maintaining the resource efficiency required for massive e-commerce scales.
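The INT8 compression mentioned above can be illustrated with a minimal symmetric-quantization sketch. This uses a single scale factor for the whole tensor; production systems typically use per-channel scales and calibration data, so treat this as the core idea only:

```python
# Symmetric INT8 quantization: map floats to integers in [-127, 127]
# using one scale factor, then reconstruct approximate floats on the fly.

def quantize_int8(weights):
    """Quantize a list of float weights to int8 values plus a scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

w = [0.02, -0.5, 1.27, -1.0]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, max_err)
```

Each weight now occupies one byte instead of four (for FP32), roughly a 4x memory reduction, which is the lever that lets quantized serving fit larger models per GPU and raise throughput.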


Cloud expenditure optimization for cost efficiency

Coupang addressed rising cloud costs by establishing a cross-functional Central team to bridge the gap between engineering usage and financial accountability. Through a data-driven approach involving custom analytics and automated resource management, the company successfully reduced on-demand expenditure by millions of dollars. This initiative demonstrates that aligning technical infrastructure with financial governance is essential for maintaining growth without unnecessary waste.

**The Central Team and Data-Driven Governance**

* Coupang formed a specialized Central team consisting of infrastructure engineers and technical program managers to identify efficiency opportunities across the organization.
* The team developed custom BI dashboards utilizing Amazon CloudWatch, AWS Cost and Usage Reports (CUR), and Amazon Athena to provide domain teams with actionable insights into their spending.
* The finance department partnered with engineering to enforce strict budget compliance, ensuring that domain teams managed their resources within assigned monthly and quarterly limits.

**Strategies for Spending and Paying Less**

* The company implemented "Spending Less" strategies by automating the launch of resources in non-production environments only when needed, resulting in a 25% cost reduction for those areas.
* "Paying Less" initiatives focused on rightsizing, where the Central team worked with domain owners to manually identify and eliminate unutilized or underutilized EC2 resources.
* Workloads were migrated to more efficient hardware and pricing models, specifically leveraging ARM-based AWS Graviton processors and AWS Spot Instances for data processing and storage.

**Targeted Infrastructure Optimization**

* Engineering teams focused on instance generation alignment, ensuring that services were running on the most cost-effective hardware generations available.
* Storage costs were reduced by optimizing Amazon S3 structures at rest, improving how data is organized and stored.
* The team refined Amazon EMR (Elastic MapReduce) configurations to enhance processing efficiency, significantly lowering the cost of large-scale data analysis.

To achieve sustainable cloud efficiency, engineering organizations should move beyond viewing cloud costs as a purely financial concern and instead treat resource management as a core technical metric. By integrating financial accountability directly into the engineering workflow through shared analytics and automated resource controls, companies can foster a culture of efficiency that supports long-term scalability.
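The "Spending Less" automation can be sketched as a schedule-driven on/off policy for non-production environments: run them only when engineers actually use them. The schedule, environment names, and cost figures below are hypothetical, not Coupang's actual policy:

```python
from datetime import datetime

def should_run(env: str, now: datetime) -> bool:
    """Production stays on 24/7; dev/staging run only weekday work hours."""
    if env == "prod":
        return True
    is_weekday = now.weekday() < 5       # Mon=0 .. Fri=4
    in_hours = 9 <= now.hour < 19        # 09:00-19:00
    return is_weekday and in_hours

def estimated_weekly_savings(hourly_cost: float) -> float:
    """Dollars saved per week by the schedule versus running 24/7."""
    on_hours = 5 * 10                    # 5 weekdays x 10 hours
    return (7 * 24 - on_hours) * hourly_cost

print(should_run("staging", datetime(2024, 6, 3, 14)))  # Monday 14:00 -> True
print(estimated_weekly_savings(1.50))                   # 118 idle hours saved
```

In practice a scheduler job would evaluate this policy and call the cloud provider's start/stop APIs; the point is that a non-prod fleet running 50 of 168 weekly hours cuts roughly 70% of its on-demand hours, consistent in spirit with the 25% overall reduction reported above.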


Coupang Rocket Delivery’s spatial index-based delivery management system

Coupang’s Rocket Delivery system recently transitioned from a text-based postal code infrastructure to a sophisticated spatial index-based management system to handle increasing delivery density. By adopting Uber’s H3 hexagonal grid system, the engineering team enabled the visualization and precise segmentation of delivery areas that were previously too large for a single driver to manage. This move has transformed the delivery process into an intuitive, map-centric operation that allows for data-driven optimization and real-time area modifications.

### Limitations of Text-Based Postal Codes

* While postal codes provided a government-standardized starting point, they became inefficient as delivery volumes grew from double to triple digits per code.
* The lack of spatial data meant that segmenting a single postal code into smaller units, such as individual apartment complexes or buildings, required manual input from local experts familiar with the terrain.
* Relying on text strings prevented the system from providing intuitive visual feedback or automated metrics for optimizing delivery routes.

### Adopting H3 for Geospatial Indexing

* The team evaluated different spatial indexing systems, specifically comparing Google’s S2 (square-based) and Uber’s H3 (hexagon-based) frameworks.
* H3 was chosen because hexagons provide a constant distance between the center of a cell and all six of its neighbors, which simplifies the modeling of movement and coverage.
* The hexagonal structure minimizes "edge effect" distortions compared to squares or triangles, making it more accurate for calculating delivery radius and area density.

### Technical Redesign and Implementation

* The system utilizes H3’s hierarchical indexing, allowing the platform to store delivery data at various resolutions to balance granularity with computational performance.
* Delivery zones were converted from standard polygons into "hexagonized" groups, enabling the system to treat complex geographical shapes as sets of standardized cell IDs.
* This transition allowed for the creation of a visual interface where camp leaders can modify delivery boundaries directly on a map, with changes reflected instantly across the logistics chain.

By shifting to a spatial index, Coupang has decoupled its logistics logic from rigid administrative boundaries like postal codes. This technical foundation allows for more agile resource distribution and provides the scalability needed to handle the continued growth of high-density urban deliveries.
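The constant center-to-neighbor distance that motivated the choice of hexagons can be checked with a toy axial-coordinate sketch. This is illustrative geometry only, not the H3 library's API:

```python
import math

SQRT3 = math.sqrt(3)

def axial_to_xy(q, r, size=1.0):
    """Cartesian center of a pointy-top hexagon at axial coordinates (q, r)."""
    return (size * SQRT3 * (q + r / 2), size * 1.5 * r)

# The six axial direction offsets of a hexagonal grid.
HEX_NEIGHBORS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def neighbor_distances():
    """Distances from the origin cell's center to each neighbor's center."""
    cx, cy = axial_to_xy(0, 0)
    dists = []
    for dq, dr in HEX_NEIGHBORS:
        nx, ny = axial_to_xy(dq, dr)
        dists.append(math.hypot(nx - cx, ny - cy))
    return dists

d = neighbor_distances()
print(all(abs(x - d[0]) < 1e-9 for x in d))  # True: all six equidistant
```

A square grid's 8-neighborhood has two distinct distances (1 and √2 times the cell size), which is exactly the asymmetry that complicates distance-based coverage modeling. With the real h3-py library (v4), cells are obtained via `h3.latlng_to_cell(lat, lng, res)` and neighbor rings via `h3.grid_ring`.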


Meet Coupang’s Machine Learning Platform

Coupang’s internal Machine Learning Platform (MLP) is a comprehensive "batteries-included" ecosystem designed to streamline the end-to-end lifecycle of ML development across its diverse business units, including e-commerce, logistics, and streaming. By providing standardized tools for feature engineering, pipeline authoring, and model serving, the platform significantly reduces the time-to-production while enabling scalable, efficient compute management. Ultimately, this infrastructure allows Coupang to leverage advanced models like Ko-BERT for search and real-time forecasting to enhance the customer experience at scale.

**Motivation for a Centralized Platform**

* **Reduced Time to Production:** The platform aims to accelerate the transition from ad-hoc exploration to production-ready services by eliminating repetitive infrastructure setup.
* **CI/CD Integration:** By incorporating continuous integration and delivery into ML workflows, the platform ensures that experiments are reproducible and deployments are reliable.
* **Compute Efficiency:** Managed clusters allow for the optimization of expensive hardware resources, such as GPUs, across multiple teams and diverse workloads like NLP and Computer Vision.

**Notebooks and Pipeline Authoring**

* **Managed Jupyter Notebooks:** Provides data scientists with a standardized environment for initial data exploration and prototyping.
* **Pipeline SDK:** Developers can use a dedicated SDK to define complex ML workflows as code, facilitating the transition from research to automated pipelines.
* **Framework Agnostic:** The platform supports a wide range of ML frameworks and programming languages to accommodate different model architectures.

**Feature Engineering and Data Management**

* **Centralized Feature Store:** Enables teams to share and reuse features, reducing redundant data processing and ensuring consistency across the organization.
* **Consistent Data Pipelines:** Bridges the gap between offline training and online real-time inference by providing a unified interface for data transformations.
* **Large-scale Preparation:** Streamlines the creation of training datasets from Coupang’s massive logs, including product catalogs and user behavior data.

**Training and Inference Services**

* **Scalable Model Training:** Handles distributed training jobs and resource orchestration, allowing for the development of high-parameter models.
* **Robust Model Inference:** Supports low-latency model serving for real-time applications such as ad ranking, video recommendations in Coupang Play, and pricing.
* **Dedicated Infrastructure:** Training and inference clusters abstract the underlying hardware complexity, allowing engineers to focus on model logic rather than server maintenance.

**Monitoring and Observability**

* **Performance Tracking:** Integrated tools monitor model health and performance metrics in live production environments.
* **Drift Detection:** Provides visibility into data and model drift, ensuring that models remain accurate as consumer behavior and market conditions change.

For organizations looking to scale their AI capabilities, investing in an integrated platform that bridges the gap between experimentation and production is essential. By standardizing the "plumbing" of machine learning—such as feature stores and automated pipelines—companies can drastically increase the velocity of their data science teams and ensure the long-term reliability of their production models.
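The offline/online consistency idea behind a feature store can be sketched as a single registered transformation shared by the batch (training) and single-row (serving) paths. The class and feature names below are illustrative, not Coupang's internal API:

```python
# Minimal feature-store sketch: a transformation is registered once and
# reused for both offline training data and online inference, so the two
# paths cannot drift apart.

class FeatureStore:
    def __init__(self):
        self._transforms = {}

    def register(self, name):
        """Decorator that records a feature transformation under `name`."""
        def wrap(fn):
            self._transforms[name] = fn
            return fn
        return wrap

    def offline(self, name, rows):
        """Batch path: compute the feature over historical rows for training."""
        fn = self._transforms[name]
        return [fn(row) for row in rows]

    def online(self, name, row):
        """Serving path: the exact same function, applied to one row."""
        return self._transforms[name](row)

store = FeatureStore()

@store.register("order_value_krw")
def order_value(row):
    return row["unit_price"] * row["quantity"]

history = [{"unit_price": 1000, "quantity": 3}, {"unit_price": 500, "quantity": 2}]
print(store.offline("order_value_krw", history))    # features for training
print(store.online("order_value_krw", history[0]))  # identical value at serving time
```

Because both paths call the same registered function, a model trained on the offline output sees exactly the feature values it will receive online, which is the training/serving skew problem a feature store exists to eliminate.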