google

Geospatial Reasoning: Unlocking insights with generative AI and multiple foundation models

Google Research is introducing Geospatial Reasoning, a new framework that integrates generative AI with specialized foundation models to streamline complex geographical problem-solving. By combining large language models like Gemini with domain-specific data, the initiative seeks to make large-scale spatial analysis accessible to sectors like public health, urban development, and climate resilience. This research effort moves beyond traditional data silos, enabling agentic workflows that can interpret diverse data types—from satellite imagery to population dynamics—through natural language.

### Specialized Foundation Models for Human Activity

* The Population Dynamics Foundation Model (PDFM) captures the complex interplay between human behaviors and their local environments.
* A dedicated trajectory-based mobility foundation model has been developed to process and analyze movement patterns.
* While initially tested in the US, experimental datasets are expanding to include the UK, Australia, Japan, Canada, and Malawi for selected partners.

### Remote Sensing and Vision Architectures

* New models utilize advanced architectures including masked autoencoders, SigLIP, MaMMUT, and OWL-ViT, specifically adapted for the remote sensing domain.
* Training involves high-resolution satellite and aerial imagery paired with text descriptions and bounding box annotations to enable precise object detection.
* The models support zero-shot classification and retrieval, allowing users to locate specific features—such as "residential buildings with solar panels"—using flexible natural language queries (a retrieval sketch follows this summary).
* Internal evaluations show state-of-the-art performance across multiple benchmarks, including image segmentation and post-disaster damage assessment.

### Agentic Workflows and Industry Collaboration

* The Geospatial Reasoning framework utilizes LLMs like Gemini to manage complex datasets and orchestrate "agentic" workflows.
* These workflows are grounded in geospatial data to ensure that the insights generated are both useful and contextually accurate.
* Google is collaborating with inaugural industry partners, including Airbus, Maxar, Planet Labs, and WPP, to test these capabilities in real-world scenarios.

Organizations interested in accelerating their geospatial analysis should consider applying for the trusted tester program to explore how these foundation models can be fine-tuned for specific proprietary data and use cases.
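To make the zero-shot retrieval idea concrete, here is a minimal sketch using a generic public SigLIP checkpoint via the open_clip library as a stand-in (Google's remote-sensing models themselves are not publicly released, and the tile paths are placeholders):

```python
# Hypothetical sketch: embed a natural-language query and a set of image
# tiles with a generic SigLIP dual encoder, then rank tiles by cosine
# similarity. A stand-in for the (non-public) remote-sensing models.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-SigLIP", pretrained="webli")
tokenizer = open_clip.get_tokenizer("ViT-B-16-SigLIP")
model.eval()

tile_paths = ["tile_001.png", "tile_002.png"]  # placeholder satellite tiles
images = torch.stack([preprocess(Image.open(p)) for p in tile_paths])
text = tokenizer(["residential buildings with solar panels"])

with torch.no_grad():
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(text)
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    scores = (img_emb @ txt_emb.T).squeeze(-1)  # cosine similarity per tile

for path, score in sorted(zip(tile_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")
```

Because image and text live in one embedding space, the same index supports both retrieval ("find tiles matching this phrase") and zero-shot classification (score each tile against a list of candidate labels).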

google

Evaluating progress of LLMs on scientific problem-solving

Current scientific benchmarks for large language models (LLMs) often focus on simple knowledge recall and multiple-choice responses, which do not reflect the complex, context-rich reasoning required in real-world research. To bridge this gap, Google Research has introduced CURIE, alongside the SPIQA and FEABench datasets, to evaluate LLMs on their ability to understand long-form documents, analyze multimodal data, and solve multi-step problems. These benchmarks aim to move AI from merely surfacing facts to actively assisting scientists in workflows involving information extraction, algebraic manipulation, and tool use.

### The CURIE Multitask Benchmark

* CURIE spans six diverse scientific disciplines: materials science, condensed matter physics, quantum computing, geospatial analysis, biodiversity, and proteins.
* The benchmark includes 10 challenging tasks, such as concept tracking, information aggregation, and cross-domain expertise, based on 429 full-length research documents.
* The complexity of the benchmark is reflected in its scale, with input queries averaging 15,000 words and ground truth responses averaging 954 words.
* Domain experts were involved in every phase of development, from sourcing papers to creating nuanced ground-truth answers in formats like JSON, LaTeX, and YAML.

### Multimodal Reasoning and Agentic Simulation

* The SPIQA (Scientific Paper Image Question Answering) dataset evaluates the ability of multimodal LLMs to ground their answers in complex figures and tables found in scientific literature.
* FEABench (Finite Element Analysis Benchmark) measures the ability of LLM agents to simulate and solve multiphysics, mathematics, and engineering problems.
* These tools specifically test whether models can choose the correct computational tools and reason through the physical constraints of a given problem.

### Programmatic and Model-Based Evaluation

* Because scientific answers are often descriptive or formatted heterogeneously, the evaluation uses programmatic metrics like ROUGE-L and Intersection-over-Union (IoU); both are sketched below.
* For free-form and complex technical generation, the framework incorporates model-based evaluations to ensure AI responses align with expert assessments.
* Task difficulty is quantified by expert ratings, ensuring the benchmark measures high-level reasoning rather than just pattern matching.

These new benchmarks provide a rigorous framework for developing LLMs that can act as true collaborators in the scientific process. By focusing on long-context understanding and tool-integrated reasoning, researchers can better track the progress of AI in handling the actual complexities of modern scientific discovery.
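A minimal sketch of the two programmatic metrics named above, using Google's rouge-score package for ROUGE-L and a simple set-based Intersection-over-Union for extracted items. The example strings and item sets are invented for illustration; the actual CURIE harness is more elaborate.

```python
# Programmatic scoring sketch: ROUGE-L for free-text answers, set IoU
# for extraction tasks (e.g., lists of material names from a paper).
from rouge_score import rouge_scorer

def rouge_l(reference: str, prediction: str) -> float:
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    return scorer.score(reference, prediction)["rougeL"].fmeasure

def set_iou(reference_items: set, predicted_items: set) -> float:
    if not reference_items and not predicted_items:
        return 1.0
    intersection = len(reference_items & predicted_items)
    union = len(reference_items | predicted_items)
    return intersection / union

print(rouge_l("the ground state energy is -1.2 eV",
              "ground state energy equals -1.2 eV"))
print(set_iou({"GaAs", "InP"}, {"GaAs", "Si"}))  # -> 0.333...
```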

google

ECLeKTic: A novel benchmark for evaluating cross-lingual knowledge transfer in LLMs

ECLeKTic is a novel benchmark designed to evaluate how effectively large language models (LLMs) transfer knowledge between languages, addressing a common limitation where models possess information in a source language but fail to access it in others. By utilizing a closed-book question-answering format based on language-specific Wikipedia entries, the benchmark quantifies the gap between human-like cross-lingual understanding and current machine performance. Initial testing reveals that even state-of-the-art models have significant room for improvement, with the highest-performing model, Gemini 2.5 Pro, achieving only a 52.6% success rate.

## Methodology and Dataset Construction

The researchers built the ECLeKTic dataset by focusing on "information silos" within Wikipedia to ensure the models would need to perform internal transfer rather than simply recalling translated training data.

* The dataset targets 12 languages: English, French, German, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Mandarin Chinese, Portuguese, and Spanish.
* Researchers selected 100 articles per language from a July 2023 Wikipedia snapshot that existed exclusively in that specific language and had no equivalent articles in the other 11 targeted languages.
* This approach uses Wikipedia presence as a proxy to identify facts likely encountered by the model in only one language during its training phase.

## Human Refinement and Decontextualization

To ensure the quality and portability of the questions, the team employed native speakers to refine and verify the data generated by AI.

* Human annotators filtered Gemini-generated question-and-answer pairs to ensure they were answerable in a closed-book setting without referring to external context.
* Annotators performed "decontextualization" by adding specific details to ambiguous terms; for example, a reference to the "Supreme Court" was clarified as the "Israeli Supreme Court" to ensure the question remained accurate after translation.
* Questions were curated to focus on cultural and local salience rather than general global knowledge like science or universal current events.
* The final dataset consists of 384 unique questions, which were translated and verified across all 11 target languages, resulting in 4,224 total examples.

## Benchmarking Model Performance

The benchmark evaluates models using a specific metric called "overall success," which measures a model's ability to answer a question correctly in both the original source language and the target language (formalized in the sketch below).

* The benchmark was used to test eight leading open and proprietary LLMs.
* Gemini 2.0 Pro initially set a high bar with 41.6% success, which was later surpassed by Gemini 2.5 Pro at 52.6%.
* The results demonstrate that while models are improving, they still struggle to maintain consistent knowledge across different linguistic contexts, representing a major hurdle for equitable global information access.

The release of ECLeKTic as an open-source benchmark on Kaggle provides a vital tool for the AI community to bridge the "knowledge gap" between high-resource and low-resource languages. Developers and researchers should use this data to refine training methodologies, aiming for models that can express their internal knowledge regardless of the language used in the prompt.
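A sketch of the "overall success" metric as described above: a model scores on a (question, target-language) pair only if it answers correctly both in the question's source language and in the target language. The record fields are assumptions for illustration, not the benchmark's actual schema.

```python
# "Overall success": credit only when the model is right in BOTH the
# source language and the target language for a given question.
from dataclasses import dataclass

@dataclass
class Example:
    question_id: str
    source_correct: bool   # answered correctly in the source language
    target_correct: bool   # answered correctly in the target language

def overall_success(examples: list[Example]) -> float:
    hits = sum(1 for e in examples if e.source_correct and e.target_correct)
    return hits / len(examples)

examples = [
    Example("q1", True, True),    # knowledge transferred across languages
    Example("q1", True, False),   # known in source, inaccessible in target
    Example("q2", False, False),  # not known at all
]
print(f"overall success: {overall_success(examples):.1%}")  # 33.3%
```

Requiring correctness in the source language too is what separates "failed to transfer knowledge" from "never had the knowledge".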

google

The evolution of graph learning

The evolution of graph learning has transformed from classical mathematical puzzles into a cornerstone of modern machine learning, enabling the modeling of complex relational data. By bridging the gap between discrete graph algorithms and neural networks, researchers have unlocked the ability to generate powerful embeddings that capture structural similarities. This progression, spearheaded by milestones like PageRank and DeepWalk, has established graph-based models as essential tools for solving real-world challenges ranging from traffic prediction to molecular analysis.

**Foundations of Graph Theory and Classical Algorithms**

* Graph theory originated in 1736 with Leonhard Euler’s analysis of the Seven Bridges of Königsberg, which established the mathematical framework for representing connections between entities.
* Pre-deep learning efforts focused on structural properties, such as community detection and centrality, or solving discrete problems like shortest paths and maximum flow.
* The 1996 development of PageRank by Google’s founders applied these principles at scale, treating the internet as a massive graph of nodes (pages) and edges (hyperlinks) to revolutionize information retrieval.

**Bridging Graph Data and Neural Networks via DeepWalk**

* A primary challenge in the field was the difficulty of integrating discrete graph structures into neural network architectures, which typically favor feature-based embeddings over relational ones.
* Developed in 2014, DeepWalk became the first practical method to bridge this gap by utilizing a neural network encoder to create graph embeddings.
* These embeddings convert complex relational data into numeric representations that preserve the structural similarity between objects, allowing graph data to be processed by modern machine learning pipelines.

**The Rise of Graph Convolutional Networks and Message Passing**

* Following the success of graph embeddings, the field moved toward Graph Convolutional Networks (GCNs) in 2016 to better handle non-Euclidean data.
* Modern frameworks now utilize Message Passing Neural Networks (MPNNs), which allow nodes to aggregate information from their neighbors to learn more nuanced representations (a toy step is sketched below).
* These advancements are supported by specialized libraries in TensorFlow and JAX, enabling the application of graph learning to diverse fields such as physics simulations, disease spread modeling, and fake news detection.

To effectively model complex systems where relationships are as important as the entities themselves, practitioners should transition from traditional feature-based models to graph-aware architectures. Utilizing contemporary libraries like those available for JAX and TensorFlow allows for the integration of relational structure directly into the learning process, providing more robust insights into interconnected data.
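A toy message-passing step in NumPy, illustrating the neighbor-aggregation idea in general rather than any specific TensorFlow-GNN or JAX library API; mean aggregation and a single ReLU projection are simplifying assumptions.

```python
# One round of message passing: each node averages its neighbors'
# features, concatenates the result with its own features, and applies
# a learned projection. Stacking such rounds lets information flow
# across multi-hop neighborhoods.
import numpy as np

def message_passing_step(features, adjacency, weights):
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neighbor_mean = adjacency @ features / deg              # aggregate
    combined = np.concatenate([features, neighbor_mean], axis=1)
    return np.maximum(combined @ weights, 0.0)              # ReLU update

rng = np.random.default_rng(0)
# A 4-node path graph: 0 - 1 - 2 - 3
adj = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    adj[u, v] = adj[v, u] = 1.0

feats = rng.normal(size=(4, 8))
w = rng.normal(size=(16, 8)) * 0.1   # maps [own || neighbors] -> new features
print(message_passing_step(feats, adj, w).shape)  # (4, 8)
```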

google

Deciphering language processing in the human brain through LLM representations

Recent research by Google Research and collaborating universities indicates that Large Language Models (LLMs) process natural language through internal representations that closely mirror neural activity in the human brain. By comparing intracranial recordings from spontaneous conversations with the internal embeddings of the Whisper speech-to-text model, the study found a high degree of linear alignment between artificial and biological language processing. These findings suggest that the statistical structures learned by LLMs via next-word prediction provide a viable computational framework for understanding how humans comprehend and produce speech.

## Mapping LLM Embeddings to Brain Activity

* Researchers utilized intracranial electrodes to record neural signals during real-world, free-flowing conversations.
* The study compared neural activity against two distinct types of embeddings from the Transformer-based Whisper model: "speech embeddings" from the model’s encoder and "language embeddings" from the decoder.
* A linear transformation was used to predict brain signals based on these embeddings (see the sketch below), revealing that LLMs and the human brain share similar multidimensional spaces for coding linguistic information.
* The alignment suggests that human language processing may rely more on statistical structures and contextual embeddings rather than traditional symbolic rules or syntactic parts of speech.

## Neural Sequences in Speech Comprehension

* When a subject listens to speech, the brain follows a specific chronological sequence that aligns with model representations.
* Initially, speech embeddings predict cortical activity in the superior temporal gyrus (STG), which is responsible for processing auditory speech sounds.
* A few hundred milliseconds later, language embeddings predict activity in Broca’s area (located in the inferior frontal gyrus), marking the transition from sound perception to decoding meaning.

## Reversed Dynamics in Speech Production

* During speech production, the neural sequence is reversed, beginning approximately 500 milliseconds before a word is articulated.
* Processing starts in Broca’s area, where language embeddings predict activity as the brain plans the semantic content of the utterance.
* This is followed by activity in the motor cortex (MC), aligned with speech embeddings, as the brain prepares the physical articulatory movements.
* Finally, after articulation, speech embeddings predict activity back in the STG, suggesting the brain is monitoring the sound of the speaker's own voice.

This research validates the use of LLMs as powerful predictive tools for neuroscience, offering a new lens through which to study the temporal and spatial dynamics of human communication. By bridging the gap between artificial intelligence and cognitive biology, researchers can better model how the brain integrates sound and meaning in real-time.
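A minimal encoding-model sketch of the linear-transformation analysis described above, with synthetic arrays standing in for the per-word Whisper embeddings and the intracranial recordings; the shapes, the ridge penalty, and the train/test split are illustrative assumptions, not the study's settings.

```python
# Encoding-model sketch: fit a linear map from per-word embeddings to
# the neural signal at one electrode, then score held-out predictions
# by correlation, the usual figure of merit for such models.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_words, emb_dim = 1000, 384
embeddings = rng.normal(size=(n_words, emb_dim))      # stand-in embeddings
# Synthetic "neural signal": a noisy linear function of the embeddings.
neural = embeddings @ rng.normal(size=emb_dim) + rng.normal(size=n_words)

train, test = slice(0, 800), slice(800, None)
model = Ridge(alpha=1.0).fit(embeddings[train], neural[train])
pred = model.predict(embeddings[test])

r = np.corrcoef(pred, neural[test])[0, 1]
print(f"held-out correlation: {r:.2f}")
```

Repeating this fit per electrode and per time lag relative to word onset is what yields the temporal sequences (STG then Broca's area for comprehension, reversed for production) summarized above.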

google

Loss of Pulse Detection on the Google Pixel Watch 3

Google Research has developed a "Loss of Pulse Detection" feature for the Pixel Watch 3 to address the high mortality rates associated with unwitnessed out-of-hospital cardiac arrests (OHCA). By utilizing a multimodal algorithm that combines photoplethysmography (PPG) and accelerometer data, the device can automatically identify the transition to a pulseless state and contact emergency services. This innovation aims to transform unwitnessed medical emergencies into functionally witnessed ones, potentially increasing survival rates by ensuring timely intervention.

### The Impact of Witness Status on Survival

* Unwitnessed cardiac arrests pose a major public health challenge, with survival rates as low as 4% compared to 20% for witnessed events.
* The "Chain of Survival" traditionally relies on human bystanders to activate emergency responses, leaving those alone at a significant disadvantage.
* Every minute without resuscitation decreases the chance of survival by 7–10%, making rapid detection the most critical factor in prognosis.
* Converting an unwitnessed event into a "functionally witnessed" one via a wearable device could equate to a number needed to treat (NNT) of only six people to save one life.

### Multimodal Detection and the Three-Gate Process

* The system uses PPG sensors to measure blood pulsatility by detecting photons backscattered by tissue at green and infrared wavelengths.
* To prevent false positives and errant emergency calls, the algorithm must pass three sequential "gates" before making a classification (the flow is sketched below).
* **Gate 1:** Detects a sudden, significant drop in the alternating current (AC) component of the green PPG signal, which suggests a transition from a pulsatile to a pulseless state, paired with physical stillness.
* **Gate 2:** Employs a machine learning algorithm trained on diverse user data to quantify the probability of a true pulseless transition.
* **Gate 3:** Conducts additional sensor checks using various LED and photodiode geometries, wavelengths, and gain settings to confirm the absence of even a weak pulse.

### On-Device Processing and User Verification

* All data processing occurs entirely on the watch to maintain user privacy, consistent with Google’s established health data policies.
* If the algorithm detects a loss of pulse, it initiates two check-in prompts involving haptic, visual, and audio notifications to assess user responsiveness.
* The process can be de-escalated immediately if the user moves their arm purposefully, ensuring that emergency services are only contacted during true incapacitation.
* When a user remains unresponsive, the watch automatically contacts emergency services to provide the individual's current location and medical situation.

By providing a passive, opportunistic monitoring system on a mass-market wearable, this technology offers a critical safety net for individuals at risk of unwitnessed cardiac events. For the broader population, the Pixel Watch 3 serves as a life-saving tool that bridges the gap between a sudden medical emergency and the arrival of professional responders.
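A schematic of the three-gate escalation flow, purely as a control-flow sketch: the thresholds, signal names, and the ML-probability input are invented placeholders, since the production algorithm runs on-device and is not public.

```python
# Sequential gating sketch: every gate must pass before the watch even
# starts the user check-in prompts, which themselves precede any call
# to emergency services.
def check_loss_of_pulse(ppg_ac_drop, is_still, ml_probability,
                        confirm_weak_pulse_absent):
    # Gate 1: sudden drop in the green-PPG AC amplitude plus stillness.
    if not (ppg_ac_drop > 0.8 and is_still):
        return "no_event"
    # Gate 2: ML classifier's probability of a true pulseless transition.
    if ml_probability < 0.95:
        return "no_event"
    # Gate 3: multi-wavelength / multi-gain sensor sweep confirming that
    # not even a weak pulse is present.
    if not confirm_weak_pulse_absent():
        return "no_event"
    return "start_user_checkin"  # haptic/visual/audio prompts, then escalate

print(check_loss_of_pulse(0.9, True, 0.97, lambda: True))
```

The design choice to AND several independent checks is what keeps the false-positive rate (and errant emergency calls) low at the cost of some sensitivity.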

google

Load balancing with random job arrivals

Research from Google explores the competitive ratio of online load balancing when tasks arrive in a uniformly random order rather than an adversarial one. By analyzing a "tree balancing game" where edges must be oriented to minimize node indegree, the authors demonstrate that random arrival sequences still impose significant mathematical limitations on deterministic algorithms. The study ultimately concludes that no online algorithm can achieve a competitive ratio significantly better than $\sqrt{\log n}$, establishing new theoretical boundaries for efficient cluster management.

### The Online Load Balancing Challenge

* Modern cluster management systems, such as Google’s Borg, must distribute hundreds of thousands of jobs across machines to maximize utilization and minimize the maximum load (makespan).
* In the online version of this problem, jobs arrive one-by-one, and the system must assign them immediately without knowing what future jobs will look like.
* Traditionally, these algorithms are evaluated using "competitive analysis," comparing the performance of an online algorithm against an optimal offline version that has full knowledge of the job sequence.

### The Tree Balancing Game

* The problem is modeled as a game where an adversary presents edges of a tree (representing jobs and machines) one at a time.
* For every undirected edge $(u, v)$ presented, the algorithm must choose an orientation ($u \to v$ or $v \to u$), with the goal of minimizing the maximum number of edges pointing at any single node.
* In a worst-case adversarial arrival order, a result known since the 1990s shows that no deterministic algorithm can guarantee a maximum indegree of less than $\log n$, where $n$ is the number of nodes.

### Performance Under Random Arrival Orders

* The research specifically investigates "random order arrivals," where every possible permutation of the job sequence is equally likely, simulating a more natural distribution than a malicious adversary.
* While previous assumptions suggested that a simple "greedy algorithm" (assigning the job to the machine with the currently lower load; sketched below) performed better in this model, this research proves a new, stricter lower bound.
* The authors demonstrate that even with random arrivals, any online algorithm will still incur a maximum load proportional to at least $\sqrt{\log n}$.
* For more general load balancing scenarios beyond simple trees, the researchers established a lower bound of $\sqrt{\log \log n}$.

### Practical Implications

These findings suggest that while random job arrival provides a slight performance advantage over adversarial scenarios, system designers cannot rely on randomness alone to eliminate load imbalances. Because the maximum load grows predictably according to the $\sqrt{\log n}$ limit, large-scale systems must be architected to handle this inherent logarithmic growth in resource pressure to maintain high utilization and stability.
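The greedy rule mentioned above, as a short sketch: when edge $(u, v)$ arrives, point it at the endpoint with the smaller current indegree, so the load increment lands on the less-loaded node. The star-graph test case is an illustrative easy instance, not one of the paper's hard instances.

```python
# Greedy edge orientation for the tree balancing game.
import random
from collections import defaultdict

def greedy_orient(edges):
    indegree = defaultdict(int)
    for u, v in edges:
        # Orient toward the endpoint with the smaller current indegree.
        target = u if indegree[u] <= indegree[v] else v
        indegree[target] += 1
    return max(indegree.values())

# Random-order arrivals on a star: greedy keeps the max indegree tiny
# here, but the lower bound above says some trees still force a maximum
# load on the order of sqrt(log n), no matter the algorithm.
edges = [(0, leaf) for leaf in range(1, 1000)]
random.shuffle(edges)
print(greedy_orient(edges))  # 1 on a star
```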

google

Generating synthetic data with differentially private LLM inference

Researchers at Google have developed an inference-only method for generating differentially private (DP) synthetic data that avoids the high costs and data requirements associated with private fine-tuning. By prompting off-the-shelf large language models (LLMs) with sensitive examples in parallel and aggregating their outputs, the approach can generate thousands of high-quality synthetic data points while maintaining rigorous privacy guarantees. This method allows synthetic data to serve as a secure interface for model development, enabling teams to collaborate without requiring specialized knowledge of differential privacy.

## Differentially Private Prediction and Aggregation

The core of this method relies on "private prediction," where privacy is applied to the model's output rather than the model itself.

* Sensitive data points are distributed across multiple independent prompts, ensuring that no single individual's record can significantly influence the final output.
* The LLM generates next-token predictions for each prompt in parallel, which are then aggregated to mask individual contributions.
* The researchers designed a DP token sampling algorithm that treats the standard LLM "softmax" sampling process as a version of the exponential mechanism, a mathematical framework used to select the best option from a set while maintaining privacy (see the sketch below).

## Enhancing Efficiency via KV Caching

Previous attempts at private prediction were computationally expensive because they required a fresh batch of sensitive examples for every single token generated.

* A new privacy analysis allows the system to reuse a fixed batch of sensitive examples across an entire generation sequence.
* By maintaining the same context for each generation step, the system becomes compatible with standard inference optimization techniques like KV (Key-Value) caching.
* This improvement enables the generation of synthetic data at a scale two to three orders of magnitude larger than prior methods.

## Optimizing Privacy Spend with Public Drafters

To preserve the "privacy budget"—the limited amount of information that can be released before privacy is compromised—the method introduces a public drafter model.

* The drafter model predicts the next token based solely on previously generated synthetic text, without ever seeing the sensitive data.
* Using the sparse vector technique, the system only consumes the privacy budget when the public drafter’s suggestion disagrees with the private aggregate of the sensitive data.
* This is particularly useful for structured data, where the drafter can handle formatting and syntax tokens, saving the privacy budget for the actual content.

By leveraging off-the-shelf models like Gemma, this approach provides a scalable way to transform sensitive datasets into useful synthetic versions. These synthetic datasets are high-quality enough to replace real data in downstream machine learning tasks, such as in-context learning or fine-tuning models like BERT, without the risk of leaking individual user information.
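A minimal sketch of the aggregation-and-sampling step: each sensitive example sits in its own prompt, the per-prompt next-token log-probabilities are averaged, and a token is drawn with exponential-mechanism-style softmax sampling. The epsilon and sensitivity values are illustrative, not the paper's calibration, and the random log-probabilities stand in for real model outputs.

```python
# DP token sampling sketch: averaging bounds each example's influence,
# and the exponential mechanism turns the averaged scores into a
# privacy-calibrated sampling distribution,
# P(token) ∝ exp(eps * score / (2 * sensitivity)).
import numpy as np

def dp_sample_token(per_prompt_logprobs, epsilon, sensitivity, rng):
    """per_prompt_logprobs: (num_prompts, vocab) array; each row comes
    from one prompt holding one sensitive example."""
    avg = per_prompt_logprobs.mean(axis=0)
    scores = epsilon * avg / (2.0 * sensitivity)
    probs = np.exp(scores - scores.max())   # stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
logprobs = np.log(rng.dirichlet(np.ones(50), size=8))  # 8 prompts, 50 tokens
print(dp_sample_token(logprobs, epsilon=1.0, sensitivity=1.0, rng=rng))
```

Sampling sharper (higher epsilon) spends more privacy budget per token; the public drafter described above avoids that spend entirely whenever its suggestion already agrees with the private aggregate.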

coupang

Optimizing operational costs through cloud

Coupang’s Finance and Engineering teams collaborated to optimize cloud expenditures by focusing on resource efficiency and the company's "Hate Waste" leadership principle. Through a dedicated optimization project team and the implementation of data-driven analytics, the company successfully reduced on-demand costs by millions of dollars without compromising business growth. This initiative transformed cloud management from a reactive expense into a proactive engineering culture centered on financial accountability and technical efficiency.

### Forming the Optimization Project Team

* A specialized team consisting of Cloud Infrastructure Engineers and Technical Program Managers (TPMs) was established to bridge the gap between finance and engineering.
* The project team focused on educating domain teams about the variable cost model of cloud services, moving away from a fixed-cost mindset.
* Technical experts helped domain teams identify opportunities to use cost-efficient technologies, such as ARM-based AWS Graviton processors and AWS Spot Instances for data processing.
* The initiative established clear ownership, ensuring that each domain team understood and managed their specific cloud resource usage.

### Analytics and Dashboards for Visibility

* Engineers developed custom dashboards using Amazon Athena to process Amazon CloudWatch data, providing deep insights into resource performance.
* The team utilized AWS Cost & Usage Reports (CUR) within internal Business Intelligence (BI) tools to provide granular visibility into spending patterns.
* Finance teams worked alongside engineers to align technical roadmaps with monthly and quarterly budget goals, making cost management a shared responsibility.

### Strategies for Usage and Cost Reduction

* **Spend Less (Usage Reduction):** Coupang implemented automation to ensure that non-production environment resources were only active when needed (an approach sketched below), resulting in a 25% cost saving for those environments.
* **Pay Less (Right-sizing):** The team analyzed usage patterns to manually identify and decommission unused EC2 resources across all domain teams.
* **Instance and Storage Optimization:** The project prioritized migrating workloads to the latest instance generations and optimizing Amazon S3 storage structures to reduce costs for data at rest.

To achieve sustainable cloud efficiency, organizations should move beyond simple monitoring and foster an engineering culture where resource management is a core technical discipline. Prioritizing automated resource scheduling and adopting modern, high-efficiency hardware like Graviton instances are essential steps for any large-scale cloud operation looking to maximize its return on investment.
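A minimal sketch of the kind of automation described above: stopping tagged non-production EC2 instances outside working hours via boto3. The tag keys, region, and schedule are assumptions; Coupang's internal tooling is not public.

```python
# Stop running EC2 instances tagged as non-production. Typically run
# from a scheduled job (e.g., a nightly cron or EventBridge rule) and
# mirrored by a morning start_instances call.
import boto3

def stop_idle_nonprod(region="ap-northeast-2"):
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instances(Filters=[
        {"Name": "tag:Environment", "Values": ["dev", "staging"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ])
    ids = [inst["InstanceId"]
           for res in resp["Reservations"] for inst in res["Instances"]]
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids

if __name__ == "__main__":
    print("stopped:", stop_idle_nonprod())
```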

coupang

Coupang SCM Workflow: Developing

Coupang has developed an internal SCM Workflow platform to streamline the complex data and operational needs of its Supply Chain Management team. By implementing low-code and no-code functionalities, the platform enables developers, data scientists, and business analysts to build data pipelines and launch services without the traditional bottlenecks of manual development.

### Addressing Inefficiencies in SCM Data Management

* The SCM team manages a massive network of suppliers and fulfillment centers (FCs) where demand forecasting and inventory distribution require constant data feedback.
* Traditionally, non-technical stakeholders like business analysts (BAs) relied heavily on developers to build or modify data pipelines, leading to high communication costs and slower response times to changing business requirements.
* The new platform aims to simplify the complexity found in traditional tools like Jenkins, Airflow, and Jupyter Notebooks, providing a unified interface for data creation and visualization.

### Democratizing Access with the No-code Data Builder

* The "Data Builder" allows users to perform data queries, extraction, and system integration through a visual interface rather than writing backend code.
* It provides seamless access to a wide array of data sources used across Coupang, including Redshift, Hive, Presto, Aurora, MySQL, Elasticsearch, and S3.
* Users can construct workflows by creating "nodes" for specific tasks—such as extracting inventory data from Hive or calculating transfer quantities—and linking them together to automate complex decisions like inter-center product transfers (a toy version is sketched below).

### Expanding Capabilities through Low-code Service Building

* The platform functions as a "Service Builder," allowing users to expand domains and launch simple services without building entirely new infrastructure from scratch.
* This approach enables developers to focus on high-level algorithm development while allowing data scientists to apply and test new models directly within the production environment.
* By reducing the need for code changes to reflect new requirements, the platform significantly increases the agility of the SCM pipeline.

Organizations managing complex, data-driven ecosystems can significantly reduce operational friction by adopting low-code/no-code platforms. Empowering non-technical stakeholders to handle data processing and service integration not only accelerates innovation but also allows engineering resources to be redirected toward core architectural challenges.
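A sketch of how a node-and-link workflow like the Data Builder might be represented and executed underneath the visual interface: each node wraps one task and declares its upstream dependencies. All names and the shared-context design are invented for illustration; the actual platform is internal to Coupang.

```python
# Toy workflow engine: run each node once all of its upstream nodes
# have finished, threading a shared context dict through the tasks.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Node:
    name: str
    run: Callable[[dict], dict]              # consumes/produces the context
    upstream: list[str] = field(default_factory=list)

def execute(nodes: dict[str, Node]) -> dict:
    done, context = set(), {}
    while len(done) < len(nodes):
        for node in nodes.values():
            if node.name not in done and all(u in done for u in node.upstream):
                context = node.run(context)
                done.add(node.name)
    return context

nodes = {
    "inventory": Node("inventory",
                      lambda c: {**c, "stock": {"FC1": 10, "FC2": 90}}),
    "transfer": Node("transfer",
                     lambda c: {**c, "move": c["stock"]["FC2"] - c["stock"]["FC1"]},
                     upstream=["inventory"]),
}
print(execute(nodes)["move"])  # 80 units to rebalance between centers
```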

coupang

Optimizing Logistics Inbound Process Using

Coupang has implemented a machine learning-based prediction system to optimize its logistics inbound process by accurately forecasting the number of trucks required for product deliveries. By analyzing historical logistics data and vendor characteristics, the system minimizes resource waste at fulfillment center docks and prevents operational delays caused by slot shortages. This data-driven approach ensures that limited dock slots are allocated efficiently, improving overall supply chain speed and reliability.

### Challenges in Inbound Logistics

* Fulfillment centers operate with a fixed number of "docks" for unloading and specific time "slots" assigned to each truck.
* Inaccurate predictions create a resource dilemma: under-estimating slots causes unloading delays and backlogs, while over-estimating leads to idle docks and wasted capacity.
* The goal was to move beyond manual estimation to an automated system that balances vendor requirements with actual facility throughput.

### Feature Engineering and Data Collection

* The team performed Exploratory Data Analysis (EDA) on approximately 800,000 instances of inbound data collected over two years.
* In-depth interviews with domain experts and logistics managers were conducted to identify hidden patterns and qualitative factors that influence truck requirements.
* Final feature sets were refined through feature engineering, focusing on vendor-specific behaviors and the physical characteristics of the products being delivered.

### LightGBM Implementation and Optimization

* The LightGBM algorithm was selected due to its high performance with large datasets and its efficiency in handling categorical features.
* The model utilizes a leaf-wise tree growth strategy, which allows for faster training speeds and lower loss compared to traditional level-wise growth algorithms.
* Hyperparameters were optimized using Bayesian Optimization, a method that finds the most effective model configurations more efficiently than traditional grid search methods (see the sketch below).
* The trained model is integrated directly into the booking system, providing real-time truck quantity recommendations to vendors during the application process.

### Operational Trade-offs and Results

* The system must navigate the trade-off between under-prediction (which risks logistical bottlenecks) and over-prediction (which risks resource waste).
* By automating the prediction of necessary slots, Coupang has reduced the manual workload for vendors and improved the accuracy of fulfillment center scheduling.
* This optimization allows for more products to be processed in a shorter time frame, directly contributing to faster delivery times for the end customer.

By replacing manual estimates with a LightGBM-based predictive model, Coupang has successfully synchronized vendor deliveries with fulfillment center capacity. This technical shift not only maximizes dock utilization but also builds a more resilient and scalable inbound supply chain.
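A condensed sketch of this modeling setup: a LightGBM regressor for truck counts, tuned with Bayesian optimization, here via Optuna's TPE sampler as a stand-in for whichever Bayesian-optimization tool Coupang used. The features, target, and search ranges are synthetic placeholders.

```python
# LightGBM + Bayesian hyperparameter search (Optuna). The objective
# minimizes validation MAE over leaf count, learning rate, and the
# minimum samples per leaf.
import lightgbm as lgb
import numpy as np
import optuna
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))  # stand-in vendor/product features
y = np.maximum(1, (2 * X[:, 0] + X[:, 1] + rng.normal(size=5000)).round() + 3)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

def objective(trial):
    params = {
        "objective": "regression_l1",
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "verbosity": -1,
    }
    model = lgb.LGBMRegressor(n_estimators=200, **params)
    model.fit(X_tr, y_tr)
    return np.abs(model.predict(X_va) - y_va).mean()  # validation MAE

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```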

coupang

Accelerating ML Development through Coupang

Coupang’s internal Machine Learning (ML) platform serves as a standardized ecosystem designed to accelerate the transition from experimental research to stable production services. By centralizing core functions like automated pipelines, feature engineering, and scalable inference, the platform addresses the operational complexities of managing ML at an enterprise scale. This infrastructure allows engineers to focus on model innovation rather than manual resource management, ultimately driving efficiency across Coupang’s diverse service offerings.

### Addressing Scalability and Development Bottlenecks

* The platform aims to drastically reduce "Time to Market" by providing "ready-to-use" services that eliminate the need for engineers to build custom infrastructure for every model.
* Integrating Continuous Integration and Continuous Deployment (CI/CD) into the ML lifecycle ensures that updates to data, code, and models are handled with the same rigor as traditional software engineering.
* By optimizing ML computing resources, the platform allows for the efficient scaling of training and inference workloads, preventing infrastructure costs from spiraling as the number of models grows.

### Core Services of the ML Platform

* **Notebooks and Pipelines:** Integrated Jupyter environments allow for ad-hoc exploration, while workflow orchestration tools enable the construction of reproducible ML pipelines.
* **Feature Engineering:** A dedicated feature store facilitates the reuse of data components and ensures consistency between the features used during model training and those used in real-time inference.
* **Scalable Training and Inference:** The platform provides dedicated clusters for high-performance model training and robust hosting services for real-time and batch model predictions.
* **Monitoring and Observability:** Automated tools track model performance and data drift in production (one common drift check is sketched below), alerting engineers when a model’s accuracy begins to degrade due to changing real-world data.

### Real-World Success in Search and Pricing

* **Search Query Understanding:** The platform enabled the training of Ko-BERT (Korean Bidirectional Encoder Representations from Transformers), significantly improving the accuracy of search results by better understanding customer intent.
* **Real-time Dynamic Pricing:** Using the platform’s low-latency inference services, Coupang can predict and adjust product prices in real-time based on fluctuating market conditions and inventory levels.

To maintain a competitive edge in e-commerce, organizations should transition away from fragmented, ad-hoc ML workflows toward a unified platform that treats ML as a first-class citizen of the software development lifecycle. Investing in such a platform not only speeds up deployment but also ensures the long-term reliability and observability of production models.
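A small sketch of the kind of data-drift check such monitoring tools commonly run: the Population Stability Index (PSI) between a feature's training distribution and its live distribution. This is a generic technique, not Coupang's disclosed implementation, and the 0.2 threshold is a common rule of thumb.

```python
# PSI drift check: bin the training distribution by its own quantiles,
# then compare the live distribution's bin frequencies against it.
import numpy as np

def psi(expected, actual, bins=10):
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
live_feature = rng.normal(0.4, 1.2, 10_000)     # the world has shifted
print(f"PSI = {psi(train_feature, live_feature):.3f}")  # > 0.2: notable drift
```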

coupang

Optimizing the inbound process with a machine learning model

Coupang optimized its fulfillment center inbound process by implementing a machine learning model to predict the exact number of delivery trucks and dock slots required for vendor shipments. By moving away from manual estimates, the system minimizes resource waste from over-allocation while preventing processing delays caused by under-prediction. This automated approach ensures that the limited capacity of fulfillment center docks is utilized with maximum efficiency.

### The Challenges of Dock Slot Allocation

* Fulfillment centers operate with a fixed number of hourly "slots," representing the time and space a single truck occupies at a dock to unload goods.
* Inaccurate slot forecasting creates a binary risk: under-prediction leads to logistical bottlenecks and delivery delays, while over-prediction results in idle docks and wasted operational overhead.
* The diversity of vendor behaviors and product types makes manual estimation of truck requirements highly inconsistent across the supply chain.

### Predictive Modeling and Feature Engineering

* Coupang utilized years of historical logistics data to extract features influencing truck counts, including product dimensions, categories, and vendor-specific shipment patterns.
* The system employs the LightGBM algorithm, a gradient-boosting framework selected for its high performance and ability to handle large-scale tabular logistics data.
* Hyperparameter tuning is managed via Bayesian optimization, which efficiently searches the parameter space to minimize prediction error.
* The model accounts for the inherent trade-off between under-prediction and over-prediction (made explicit in the sketch below), prioritizing a balance that maintains high throughput without straining labor resources.

### System Integration and Real-time Processing

* The trained ML model is integrated directly into the inbound reservation system, providing vendors with an immediate prediction of required slots during the request process.
* By automating the truck-count calculation, the system removes the burden of estimation from vendors and ensures consistency across different fulfillment centers.
* This integration allows Coupang to dynamically adjust its dock capacity planning based on real-time data rather than static, historical averages.

To maximize logistics efficiency, organizations should leverage granular product data and historical vendor behavior to automate capacity planning. Integrating predictive models directly into the reservation workflow ensures that data-driven insights are applied at the point of action, reducing human error and resource waste.
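One way to make the under- versus over-prediction trade-off explicit is an asymmetric, pinball-style loss that charges more for predicting too few trucks than too many. The 3:1 cost ratio below is an illustrative choice, not Coupang's actual calibration.

```python
# Asymmetric evaluation loss: positive error means we under-predicted
# (trucks were turned away or delayed), negative means we over-predicted
# (docks sat idle). Minimizing an asymmetric linear loss corresponds to
# predicting a quantile of the truck-count distribution rather than the mean.
import numpy as np

def asymmetric_loss(y_true, y_pred, under_cost=3.0, over_cost=1.0):
    err = y_true - y_pred
    return np.where(err > 0, under_cost * err, -over_cost * err).mean()

y_true = np.array([4, 6, 3, 8])
print(asymmetric_loss(y_true, np.array([3, 5, 3, 7])))  # 2.25: under-predicting hurts
print(asymmetric_loss(y_true, np.array([5, 7, 3, 9])))  # 0.75: over-predicting is cheaper
```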

coupang

Accelerating Coupang’s AI Journey with LLMs

Coupang is strategically evolving its machine learning infrastructure to integrate Large Language Models (LLMs) and foundation models across its e-commerce ecosystem. By transitioning from task-specific deep learning models to multi-modal transformers, the company aims to enhance customer experiences in search, recommendations, and logistics. This shift necessitates a robust ML platform capable of handling the massive compute, networking, and latency demands inherent in generative AI.

### Core Machine Learning Domains

Coupang’s existing ML ecosystem is built upon three primary pillars that drive business logic:

* **Recommendation Systems:** These models leverage vast datasets of user interactions—including clicks, purchases, and relevance judgments—to power home feeds, search results, and advertising.
* **Content Understanding:** Utilizing deep learning to process product catalogs, user reviews, and merchant data to create unified representations of customers and products.
* **Forecasting Models:** Predictive algorithms manage over 100 fulfillment centers, optimizing pricing and logistics for millions of products through a mix of statistical methods and deep learning.

### Enhancing Multimodal and Language Understanding

The adoption of Foundation Models (FM) has unified previously fragmented ML tasks, particularly in multilingual environments:

* **Joint Modeling:** Instead of separate embeddings, vision and language transformer models jointly model product images and metadata (titles/descriptions) to improve ad retrieval and similarity searches.
* **Cross-Border Localization:** LLMs facilitate the translation of product titles from Korean to Mandarin and improve the quality of shopping feeds for global sellers.
* **Weak Label Generation:** To overcome the high cost of human labeling in multiple languages, Coupang uses LLMs to generate high-quality "weak labels" for training downstream models, addressing label scarcity in under-resourced segments.

### Infrastructure for Large-Scale Training

Scaling LLM training requires a shift in hardware architecture and distributed computing strategies:

* **High-Performance Clusters:** The platform utilizes H100 and A100 GPU clusters interconnected with high-speed InfiniBand or RoCE (RDMA over Converged Ethernet) networking to minimize communication bottlenecks.
* **Distributed Frameworks:** To fit massive models into GPU memory, Coupang employs various parallelism techniques, including Fully Sharded Data Parallelism (FSDP), Tensor Parallelism (TP), and Pipeline Parallelism (PP).
* **Efficient Categorization:** Traditional architectures that required a separate model for every product category are being replaced by a single, massive multi-modal transformer capable of handling categorization and attribute extraction across the entire catalog.

### Optimizing LLM Serving and Inference

The transition to real-time generative AI features requires significant optimizations to manage the high computational cost of inference:

* **Quantization Strategies:** To reduce memory footprint and increase throughput, models are compressed using FP8, INT8, or INT4 precision without significant loss in accuracy.
* **Advanced Serving Techniques:** The platform implements Key-Value (KV) caching to avoid redundant computations during text generation and utilizes continuous batching (via engines like vLLM or TGI) to maximize GPU utilization (see the sketch below).
* **Lifecycle Management:** A unified platform vision ensures that the entire end-to-end lifecycle—from data preparation and fine-tuning to deployment—is streamlined for ML engineers.

To stay competitive, Coupang is moving toward an integrated AI lifecycle where foundation models serve as the backbone for both content generation and predictive analytics. This infrastructure-first approach allows for the rapid deployment of generative features while maintaining the resource efficiency required for massive e-commerce scales.
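A brief sketch of the serving pattern named above, using vLLM's offline API: the engine handles continuous batching and KV caching internally, and the `quantization` argument selects a compressed weight format (assuming a checkpoint and hardware that support FP8). The model name and prompts are placeholders, not Coupang's deployed models.

```python
# vLLM offline-inference sketch: batched generation with FP8 weights.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", quantization="fp8")
params = SamplingParams(temperature=0.2, max_tokens=64)

prompts = [
    "Translate this product title to Mandarin: stainless steel tumbler 900ml",
    "Classify the product category of: stainless steel tumbler 900ml",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

Submitting many prompts at once lets the engine's continuous batching keep the GPU saturated, which is the main throughput lever at serving time.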

coupang

Cloud expenditure optimization for cost efficiency

Coupang addressed rising cloud costs by establishing a cross-functional Central team to bridge the gap between engineering usage and financial accountability. Through a data-driven approach involving custom analytics and automated resource management, the company successfully reduced on-demand expenditure by millions of dollars. This initiative demonstrates that aligning technical infrastructure with financial governance is essential for maintaining growth without unnecessary waste.

**The Central Team and Data-Driven Governance**

* Coupang formed a specialized Central team consisting of infrastructure engineers and technical program managers to identify efficiency opportunities across the organization.
* The team developed custom BI dashboards utilizing Amazon CloudWatch, AWS Cost and Usage Reports (CUR), and Amazon Athena to provide domain teams with actionable insights into their spending (a minimal CUR query is sketched below).
* The finance department partnered with engineering to enforce strict budget compliance, ensuring that domain teams managed their resources within assigned monthly and quarterly limits.

**Strategies for Spending and Paying Less**

* The company implemented "Spending Less" strategies by automating the launch of resources in non-production environments only when needed, resulting in a 25% cost reduction for those areas.
* "Paying Less" initiatives focused on rightsizing, where the Central team worked with domain owners to manually identify and eliminate unutilized or underutilized EC2 resources.
* Workloads were migrated to more efficient hardware and pricing models, specifically leveraging ARM-based AWS Graviton processors and AWS Spot Instances for data processing and storage.

**Targeted Infrastructure Optimization**

* Engineering teams focused on instance generation alignment, ensuring that services were running on the most cost-effective hardware generations available.
* Storage costs were reduced by optimizing Amazon S3 structures at rest, improving how data is organized and stored.
* The team refined Amazon EMR (Elastic MapReduce) configurations to enhance processing efficiency, significantly lowering the cost of large-scale data analysis.

To achieve sustainable cloud efficiency, engineering organizations should move beyond viewing cloud costs as a purely financial concern and instead treat resource management as a core technical metric. By integrating financial accountability directly into the engineering workflow through shared analytics and automated resource controls, companies can foster a culture of efficiency that supports long-term scalability.
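An illustrative sketch of pulling spend-by-service numbers out of a Cost and Usage Report with Athena via boto3, the building block behind dashboards like those described above. The database, table, region, and output-bucket names are placeholders for Coupang's internal setup; the column names are standard CUR fields.

```python
# Kick off an Athena query over the CUR table and get its execution ID;
# a real job would poll get_query_execution until the state is SUCCEEDED
# and then read the results from S3.
import boto3

athena = boto3.client("athena", region_name="ap-northeast-2")

QUERY = """
SELECT line_item_product_code AS service,
       SUM(line_item_unblended_cost) AS cost
FROM cur_database.cur_table
WHERE line_item_usage_start_date >= DATE '2024-01-01'
GROUP BY line_item_product_code
ORDER BY cost DESC
"""

run = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "cur_database"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(run["QueryExecutionId"])
```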