Coupang / ai

5 posts

coupang

Optimizing the Logistics Inbound Process with a Machine Learning Model

Coupang has implemented a machine learning-based prediction system to optimize its logistics inbound process by accurately forecasting the number of trucks required for product deliveries. By analyzing historical logistics data and vendor characteristics, the system minimizes resource waste at fulfillment center docks and prevents operational delays caused by slot shortages. This data-driven approach ensures that limited dock slots are allocated efficiently, improving overall supply chain speed and reliability.

### Challenges in Inbound Logistics

* Fulfillment centers operate with a fixed number of "docks" for unloading and specific time "slots" assigned to each truck.
* Inaccurate predictions create a resource dilemma: under-estimating slots causes unloading delays and backlogs, while over-estimating leads to idle docks and wasted capacity.
* The goal was to move beyond manual estimation to an automated system that balances vendor requirements with actual facility throughput.

### Feature Engineering and Data Collection

* The team performed Exploratory Data Analysis (EDA) on approximately 800,000 instances of inbound data collected over two years.
* In-depth interviews with domain experts and logistics managers were conducted to identify hidden patterns and qualitative factors that influence truck requirements.
* Final feature sets were refined through feature engineering, focusing on vendor-specific behaviors and the physical characteristics of the products being delivered.

### LightGBM Implementation and Optimization

* The LightGBM algorithm was selected due to its high performance with large datasets and its efficiency in handling categorical features.
* The model utilizes a leaf-wise tree growth strategy, which allows for faster training speeds and lower loss compared to traditional level-wise growth algorithms.
* Hyperparameters were optimized using Bayesian Optimization, a method that finds the most effective model configurations more efficiently than traditional grid search methods.
* The trained model is integrated directly into the booking system, providing real-time truck quantity recommendations to vendors during the application process.

### Operational Trade-offs and Results

* The system must navigate the trade-off between under-prediction (which risks logistical bottlenecks) and over-prediction (which risks resource waste).
* By automating the prediction of necessary slots, Coupang has reduced the manual workload for vendors and improved the accuracy of fulfillment center scheduling.
* This optimization allows for more products to be processed in a shorter time frame, directly contributing to faster delivery times for the end customer.

By replacing manual estimates with a LightGBM-based predictive model, Coupang has successfully synchronized vendor deliveries with fulfillment center capacity. This technical shift not only maximizes dock utilization but also builds a more resilient and scalable inbound supply chain.
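The trade-off between under-prediction and over-prediction can be made concrete with an asymmetric loss. The sketch below uses the pinball (quantile) loss, one common way to weight the two error directions differently. The summary does not say which loss Coupang trained with, so the loss choice, the `tau` value, and the sample truck counts are all assumptions for illustration.

```python
def pinball_loss(actual, predicted, tau):
    """Mean pinball (quantile) loss.

    tau > 0.5 penalizes under-prediction more than over-prediction;
    tau < 0.5 does the opposite. The value of tau is an assumption.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        diff = y - y_hat  # positive: too few trucks were predicted
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(actual)


# Hypothetical truck counts for four inbound bookings.
actual_trucks = [3, 5, 2, 4]
under = [2, 4, 1, 3]  # every booking under-predicted by one truck
over = [4, 6, 3, 5]   # every booking over-predicted by one truck

loss_under = pinball_loss(actual_trucks, under, tau=0.75)
loss_over = pinball_loss(actual_trucks, over, tau=0.75)
```

With `tau=0.75`, being one truck short costs three times as much as booking one truck too many, so a model trained against this loss would lean toward over-allocation. LightGBM offers a comparable built-in objective (`objective="quantile"` with the `alpha` parameter), though whether it was used here is not stated.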

coupang

Accelerating ML Development with Coupang’s Machine Learning Platform

Coupang’s internal Machine Learning (ML) platform serves as a standardized ecosystem designed to accelerate the transition from experimental research to stable production services. By centralizing core functions like automated pipelines, feature engineering, and scalable inference, the platform addresses the operational complexities of managing ML at an enterprise scale. This infrastructure allows engineers to focus on model innovation rather than manual resource management, ultimately driving efficiency across Coupang’s diverse service offerings.

### Addressing Scalability and Development Bottlenecks

* The platform aims to drastically reduce "Time to Market" by providing "ready-to-use" services that eliminate the need for engineers to build custom infrastructure for every model.
* Integrating Continuous Integration and Continuous Deployment (CI/CD) into the ML lifecycle ensures that updates to data, code, and models are handled with the same rigor as traditional software engineering.
* By optimizing ML computing resources, the platform allows for the efficient scaling of training and inference workloads, preventing infrastructure costs from spiraling as the number of models grows.

### Core Services of the ML Platform

* **Notebooks and Pipelines:** Integrated Jupyter environments allow for ad-hoc exploration, while workflow orchestration tools enable the construction of reproducible ML pipelines.
* **Feature Engineering:** A dedicated feature store facilitates the reuse of data components and ensures consistency between the features used during model training and those used in real-time inference.
* **Scalable Training and Inference:** The platform provides dedicated clusters for high-performance model training and robust hosting services for real-time and batch model predictions.
* **Monitoring and Observability:** Automated tools track model performance and data drift in production, alerting engineers when a model’s accuracy begins to degrade due to changing real-world data.

### Real-World Success in Search and Pricing

* **Search Query Understanding:** The platform enabled the training of Ko-BERT (Korean Bidirectional Encoder Representations from Transformers), significantly improving the accuracy of search results by better understanding customer intent.
* **Real-time Dynamic Pricing:** Using the platform’s low-latency inference services, Coupang can predict and adjust product prices in real-time based on fluctuating market conditions and inventory levels.

To maintain a competitive edge in e-commerce, organizations should transition away from fragmented, ad-hoc ML workflows toward a unified platform that treats ML as a first-class citizen of the software development lifecycle. Investing in such a platform not only speeds up deployment but also ensures the long-term reliability and observability of production models.
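The data-drift monitoring described under Monitoring and Observability can be illustrated with a simple statistic. The sketch below computes the Population Stability Index (PSI) between a training-time sample and a production sample of one feature. PSI is a stand-in chosen for illustration; the summary does not name the metric Coupang's platform uses, and the bin edges, sample values, and the 0.2 alert threshold (a common rule of thumb) are assumptions.

```python
import math


def population_stability_index(expected, observed, bin_edges, eps=1e-6):
    """PSI between a training-time sample and a production sample.

    Values are bucketed by bin_edges (sorted ascending); empty buckets
    are floored at eps so the logarithm stays defined.
    """
    if not expected or not observed:
        raise ValueError("both samples must be non-empty")

    def bin_fractions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Bucket index = number of edges the value has reached.
            idx = sum(1 for edge in bin_edges if v >= edge)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    o = bin_fractions(observed)
    return sum((oi - ei) * math.log(oi / ei) for ei, oi in zip(e, o))


# Hypothetical feature values seen at training time and in production.
training_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
production_sample = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
edges = [4, 7]

psi_same = population_stability_index(training_sample, training_sample, edges)
psi_shifted = population_stability_index(training_sample, production_sample, edges)
drift_alert = psi_shifted > 0.2  # rule-of-thumb threshold, an assumption
```

A monitoring job would typically run a check like this per feature on a schedule and notify the model owner when the alert fires.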

coupang

Optimizing the inbound process with a machine learning model

Coupang optimized its fulfillment center inbound process by implementing a machine learning model to predict the number of delivery trucks and dock slots required for vendor shipments. By moving away from manual estimates, the system minimizes resource waste from over-allocation while preventing processing delays caused by under-prediction. This automated approach ensures that the limited capacity of fulfillment center docks is utilized with maximum efficiency.

### The Challenges of Dock Slot Allocation

* Fulfillment centers operate with a fixed number of hourly "slots," representing the time and space a single truck occupies at a dock to unload goods.
* Inaccurate slot forecasting creates a two-sided risk: under-prediction leads to logistical bottlenecks and delivery delays, while over-prediction results in idle docks and wasted operational overhead.
* The diversity of vendor behaviors and product types makes manual estimation of truck requirements highly inconsistent across the supply chain.

### Predictive Modeling and Feature Engineering

* Coupang utilized years of historical logistics data to extract features influencing truck counts, including product dimensions, categories, and vendor-specific shipment patterns.
* The system employs the LightGBM algorithm, a gradient-boosting framework selected for its high performance and ability to handle large-scale tabular logistics data.
* Hyperparameter tuning is managed via Bayesian optimization, which efficiently searches the parameter space to minimize prediction error.
* The model accounts for the inherent trade-off between under-prediction and over-prediction, prioritizing a balance that maintains high throughput without straining labor resources.

### System Integration and Real-time Processing

* The trained ML model is integrated directly into the inbound reservation system, providing vendors with an immediate prediction of required slots during the request process.
* By automating the truck-count calculation, the system removes the burden of estimation from vendors and ensures consistency across different fulfillment centers.
* This integration allows Coupang to dynamically adjust its dock capacity planning based on real-time data rather than static, historical averages.

To maximize logistics efficiency, organizations should leverage granular product data and historical vendor behavior to automate capacity planning. Integrating predictive models directly into the reservation workflow ensures that data-driven insights are applied at the point of action, reducing human error and resource waste.
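To show how a predicted truck count might feed a reservation step, here is a minimal sketch of booking hourly dock slots. It is not Coupang's reservation logic: the `book_slots` function, the capacity table, and the simplification that each truck takes exactly one slot and all trucks arrive in the same hour are assumptions made for illustration.

```python
def book_slots(free_slots, requested_hour, trucks_needed):
    """Reserve slots in the earliest hour at or after requested_hour
    that still has room for every truck in the booking.

    free_slots maps hour -> remaining slots and is updated in place.
    Returns the booked hour, or None when no hour can fit the booking.
    """
    if trucks_needed <= 0:
        raise ValueError("trucks_needed must be positive")
    for hour in sorted(h for h in free_slots if h >= requested_hour):
        if free_slots[hour] >= trucks_needed:
            free_slots[hour] -= trucks_needed
            return hour
    return None


# Hypothetical remaining capacity per hour at one fulfillment center.
free = {9: 1, 10: 4, 11: 2}
predicted_trucks = 3  # stand-in for the model's output for this vendor

booked_hour = book_slots(free, requested_hour=9, trucks_needed=predicted_trucks)
second_attempt = book_slots(free, requested_hour=9, trucks_needed=3)
```

The first booking skips the 9 o'clock hour, which has only one free slot, and lands at 10. The second finds no hour with three free slots left. An over-predicted truck count would consume slots that stay idle, while an under-predicted one would leave trucks without a slot, which is the trade-off the model is tuned around.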

coupang

Accelerating Coupang’s AI Journey with LLMs

Coupang is strategically evolving its machine learning infrastructure to integrate Large Language Models (LLMs) and foundation models across its e-commerce ecosystem. By transitioning from task-specific deep learning models to multi-modal transformers, the company aims to enhance customer experiences in search, recommendations, and logistics. This shift necessitates a robust ML platform capable of handling the massive compute, networking, and latency demands inherent in generative AI.

### Core Machine Learning Domains

Coupang’s existing ML ecosystem is built upon three primary pillars that drive business logic:

* **Recommendation Systems:** These models leverage vast datasets of user interactions (including clicks, purchases, and relevance judgments) to power home feeds, search results, and advertising.
* **Content Understanding:** Utilizing deep learning to process product catalogs, user reviews, and merchant data to create unified representations of customers and products.
* **Forecasting Models:** Predictive algorithms manage over 100 fulfillment centers, optimizing pricing and logistics for millions of products through a mix of statistical methods and deep learning.

### Enhancing Multimodal and Language Understanding

The adoption of Foundation Models (FM) has unified previously fragmented ML tasks, particularly in multilingual environments:

* **Joint Modeling:** Instead of separate embeddings, vision and language transformer models jointly model product images and metadata (titles/descriptions) to improve ad retrieval and similarity searches.
* **Cross-Border Localization:** LLMs facilitate the translation of product titles from Korean to Mandarin and improve the quality of shopping feeds for global sellers.
* **Weak Label Generation:** To overcome the high cost of human labeling in multiple languages, Coupang uses LLMs to generate high-quality "weak labels" for training downstream models, addressing label scarcity in under-resourced segments.

### Infrastructure for Large-Scale Training

Scaling LLM training requires a shift in hardware architecture and distributed computing strategies:

* **High-Performance Clusters:** The platform utilizes H100 and A100 GPU clusters interconnected with high-speed InfiniBand or RoCE (RDMA over Converged Ethernet) networking to minimize communication bottlenecks.
* **Distributed Frameworks:** To fit massive models into GPU memory, Coupang employs various parallelism techniques, including Fully Sharded Data Parallelism (FSDP), Tensor Parallelism (TP), and Pipeline Parallelism (PP).
* **Efficient Categorization:** Traditional architectures that required a separate model for every product category are being replaced by a single, massive multi-modal transformer capable of handling categorization and attribute extraction across the entire catalog.

### Optimizing LLM Serving and Inference

The transition to real-time generative AI features requires significant optimizations to manage the high computational cost of inference:

* **Quantization Strategies:** To reduce memory footprint and increase throughput, models are compressed using FP8, INT8, or INT4 precision without significant loss in accuracy.
* **Advanced Serving Techniques:** The platform implements Key-Value (KV) caching to avoid redundant computations during text generation and utilizes continuous batching (via engines like vLLM or TGI) to maximize GPU utilization.
* **Lifecycle Management:** A unified platform vision ensures that the entire end-to-end lifecycle, from data preparation and fine-tuning to deployment, is streamlined for ML engineers.

To stay competitive, Coupang is moving toward an integrated AI lifecycle where foundation models serve as the backbone for both content generation and predictive analytics. This infrastructure-first approach allows for the rapid deployment of generative features while maintaining the resource efficiency required for massive e-commerce scale.
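The quantization strategies mentioned above trade a small amount of precision for a smaller memory footprint. The following is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It shows the general idea only: production serving stacks use per-channel or per-group schemes on GPU tensors, and the function names and weight values here are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integer codes
    in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights from one layer.
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Rounding to the nearest code bounds the error by half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs one byte instead of four (for FP32), and the worst-case reconstruction error is at most half the scale. Small weights such as `0.003` collapse to zero, which is why finer-grained scales are usually preferred in practice.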

coupang

Meet Coupang’s Machine Learning Platform

Coupang’s internal Machine Learning Platform (MLP) is a comprehensive "batteries-included" ecosystem designed to streamline the end-to-end lifecycle of ML development across its diverse business units, including e-commerce, logistics, and streaming. By providing standardized tools for feature engineering, pipeline authoring, and model serving, the platform significantly reduces the time-to-production while enabling scalable, efficient compute management. Ultimately, this infrastructure allows Coupang to leverage advanced models like Ko-BERT for search and real-time forecasting to enhance the customer experience at scale.

**Motivation for a Centralized Platform**

* **Reduced Time to Production:** The platform aims to accelerate the transition from ad-hoc exploration to production-ready services by eliminating repetitive infrastructure setup.
* **CI/CD Integration:** By incorporating continuous integration and delivery into ML workflows, the platform ensures that experiments are reproducible and deployments are reliable.
* **Compute Efficiency:** Managed clusters allow for the optimization of expensive hardware resources, such as GPUs, across multiple teams and diverse workloads like NLP and Computer Vision.

**Notebooks and Pipeline Authoring**

* **Managed Jupyter Notebooks:** Provides data scientists with a standardized environment for initial data exploration and prototyping.
* **Pipeline SDK:** Developers can use a dedicated SDK to define complex ML workflows as code, facilitating the transition from research to automated pipelines.
* **Framework Agnostic:** The platform supports a wide range of ML frameworks and programming languages to accommodate different model architectures.

**Feature Engineering and Data Management**

* **Centralized Feature Store:** Enables teams to share and reuse features, reducing redundant data processing and ensuring consistency across the organization.
* **Consistent Data Pipelines:** Bridges the gap between offline training and online real-time inference by providing a unified interface for data transformations.
* **Large-scale Preparation:** Streamlines the creation of training datasets from Coupang’s massive logs, including product catalogs and user behavior data.

**Training and Inference Services**

* **Scalable Model Training:** Handles distributed training jobs and resource orchestration, allowing for the development of high-parameter models.
* **Robust Model Inference:** Supports low-latency model serving for real-time applications such as ad ranking, video recommendations in Coupang Play, and pricing.
* **Dedicated Infrastructure:** Training and inference clusters abstract the underlying hardware complexity, allowing engineers to focus on model logic rather than server maintenance.

**Monitoring and Observability**

* **Performance Tracking:** Integrated tools monitor model health and performance metrics in live production environments.
* **Drift Detection:** Provides visibility into data and model drift, ensuring that models remain accurate as consumer behavior and market conditions change.

For organizations looking to scale their AI capabilities, investing in an integrated platform that bridges the gap between experimentation and production is essential. By standardizing the "plumbing" of machine learning, such as feature stores and automated pipelines, companies can drastically increase the velocity of their data science teams and ensure the long-term reliability of their production models.
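The "unified interface for data transformations" idea can be sketched as a single feature function shared by the offline and online paths, so the two cannot drift apart. This is an illustration of the principle only; the function names, feature names, and record fields below are hypothetical and do not reflect the platform's actual SDK.

```python
import math


def build_features(raw):
    """Single source of truth for feature logic, used by both paths."""
    views = raw["views"]
    return {
        "log_price": math.log1p(raw["price"]),
        "clicks_per_view": raw["clicks"] / views if views else 0.0,
    }


def build_training_rows(log_records):
    """Offline path: batch-transform historical log records."""
    return [build_features(record) for record in log_records]


def build_online_features(request):
    """Online path: transform one live request at inference time."""
    return build_features(request)


# Hypothetical record, seen once in the logs and once as a live request.
record = {"price": 19900, "clicks": 5, "views": 20}

offline_row = build_training_rows([record])[0]
online_row = build_online_features(record)
```

Because both paths call `build_features`, a change to the feature definition reaches training and serving together, which is the kind of training/serving consistency a feature store is meant to guarantee.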