
Accelerating Coupang’s AI Journey with LLMs | by Coupang Engineering | Coupang Engineering Blog | Medium

Coupang is strategically evolving its machine learning infrastructure to integrate Large Language Models (LLMs) and foundation models across its e-commerce ecosystem. By transitioning from task-specific deep learning models to multi-modal transformers, the company aims to enhance customer experiences in search, recommendations, and logistics. This shift necessitates a robust ML platform capable of handling the massive compute, networking, and latency demands inherent in generative AI.

Core Machine Learning Domains

Coupang’s existing ML ecosystem is built on three primary pillars that power its core business:

  • Recommendation Systems: These models leverage vast datasets of user interactions—including clicks, purchases, and relevance judgments—to power home feeds, search results, and advertising.
  • Content Understanding: Deep learning models process product catalogs, user reviews, and merchant data to create unified representations of customers and products.
  • Forecasting Models: Predictive algorithms manage over 100 fulfillment centers, optimizing pricing and logistics for millions of products through a mix of statistical methods and deep learning; a minimal statistical baseline is sketched after this list.
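The statistical half of that mix can be as simple as a seasonal exponential-smoothing baseline per product. The sketch below forecasts next-week demand for a single SKU from synthetic data; the weekly seasonality, the horizon, and all numbers are illustrative assumptions, not details from the post.

```python
# Minimal seasonal forecasting baseline (Holt-Winters), assuming daily
# demand data with weekly seasonality. Data here is synthetic.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(0)
days = np.arange(180)
# Synthetic daily demand for one SKU: trend + weekly cycle + noise.
demand = 200 + 0.5 * days + 30 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 10, days.size)

model = ExponentialSmoothing(
    demand, trend="add", seasonal="add", seasonal_periods=7
).fit()
next_week = model.forecast(7)  # point forecast for the next 7 days
print(np.round(next_week, 1))
```

In practice such statistical baselines are cheap to fit per product and serve as a sanity check against heavier deep learning forecasters.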

Enhancing Multimodal and Language Understanding

The adoption of foundation models (FMs) has unified previously fragmented ML tasks, particularly in multilingual settings:

  • Joint Modeling: Rather than maintaining separate embeddings per modality, vision-language transformer models jointly model product images and metadata (titles and descriptions) to improve ad retrieval and similarity search; a minimal sketch follows this list.
  • Cross-Border Localization: LLMs facilitate the translation of product titles from Korean to Mandarin and improve the quality of shopping feeds for global sellers.
  • Weak Label Generation: To overcome the high cost of human labeling in multiple languages, Coupang uses LLMs to generate high-quality "weak labels" for training downstream models, addressing label scarcity in under-resourced segments.
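A CLIP-style encoder is one common realization of the joint image-text modeling described above. The sketch below scores one product image against candidate titles; the checkpoint openai/clip-vit-base-patch32 and the image path are stand-ins, since the post does not name Coupang’s actual model.

```python
# A minimal joint image-text scoring sketch with a CLIP-style model.
# Model checkpoint and image file are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("product.jpg")  # hypothetical product photo
titles = ["wireless earbuds", "stainless steel water bottle"]

inputs = processor(text=titles, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Similarity of the image to each candidate title, as probabilities.
probs = out.logits_per_image.softmax(dim=-1)
print(probs)
```

The same shared embedding space supports both ad retrieval (nearest-neighbor search over product vectors) and the similarity searches mentioned above.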

Infrastructure for Large-Scale Training

Scaling LLM training requires a shift in hardware architecture and distributed computing strategies:

  • High-Performance Clusters: The platform utilizes H100 and A100 GPU clusters interconnected with high-speed InfiniBand or RoCE (RDMA over Converged Ethernet) networking to minimize communication bottlenecks.
  • Distributed Frameworks: To fit massive models into GPU memory, Coupang employs several parallelism techniques, including Fully Sharded Data Parallel (FSDP), Tensor Parallelism (TP), and Pipeline Parallelism (PP); a minimal FSDP sketch follows this list.
  • Efficient Categorization: Traditional architectures that required a separate model for every product category are being replaced by a single, massive multi-modal transformer capable of handling categorization and attribute extraction across the entire catalog.
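As a concrete illustration of FSDP, the sketch below shards a toy model across GPUs with PyTorch. The model size and launch command are assumptions; a production transformer would additionally use mixed precision, activation checkpointing, and an auto-wrap policy.

```python
# Minimal FSDP sketch, assuming PyTorch >= 2.0 and a torchrun launch.
# The tiny MLP stands in for a large transformer.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 4096)
    ).cuda()
    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # so no single GPU has to hold the full model.
    model = FSDP(model)

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
```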

Optimizing LLM Serving and Inference

The transition to real-time generative AI features requires significant optimizations to manage the high computational cost of inference:

  • Quantization Strategies: To reduce memory footprint and increase throughput, models are compressed using FP8, INT8, or INT4 precision without significant loss in accuracy.
  • Advanced Serving Techniques: The platform implements Key-Value (KV) caching to avoid redundant computation during text generation and uses continuous batching (via engines such as vLLM or TGI) to maximize GPU utilization; a minimal vLLM sketch follows this list.
  • Lifecycle Management: A unified platform vision ensures that the entire end-to-end lifecycle—from data preparation and fine-tuning to deployment—is streamlined for ML engineers.
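The sketch below shows minimal serving with vLLM, which handles continuous batching and KV caching internally. The model checkpoint and the fp8 quantization flag are illustrative assumptions, not Coupang’s actual configuration.

```python
# Minimal vLLM serving sketch; model name and quantization mode are
# assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", quantization="fp8")
params = SamplingParams(temperature=0.2, max_tokens=64)

prompts = [
    "Translate this product title to Mandarin: ...",
    "Summarize these customer reviews: ...",
]
# vLLM schedules these requests with continuous batching and reuses the
# KV cache across decoding steps, so throughput grows with concurrency.
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```

For online traffic the same engine is typically exposed as an OpenAI-compatible HTTP server rather than called in-process as shown here.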

To stay competitive, Coupang is moving toward an integrated AI lifecycle in which foundation models serve as the backbone for both content generation and predictive analytics. This infrastructure-first approach enables rapid deployment of generative features while maintaining the resource efficiency required at massive e-commerce scale.