recommendation-systems

4 posts

meta

Adapting the Facebook Reels RecSys AI Model Based on User Feedback - Engineering at Meta

Meta has enhanced the Facebook Reels recommendation engine by shifting focus from traditional engagement signals, such as watch time and likes, to direct user feedback. By implementing the User True Interest Survey (UTIS) model, the system now prioritizes content that aligns with genuine user preferences rather than just short-term interactions. This shift has produced significant improvements in recommendation relevance, high-quality content delivery, and long-term user retention.

### Limitations of Engagement-Based Metrics

* Traditional signals like "likes" and "watch time" are often noisy and may not reflect a user's actual long-term interests.
* Models optimized solely for engagement tend to favor short-term value over the long-term utility of the product.
* Internal research found that previous heuristic-based interest models achieved only 48.3% precision in identifying what users truly care about.
* Effective interest matching requires understanding nuanced factors such as production style, mood, audio, and motivation, which implicit signals often miss.

### The User True Interest Survey (UTIS) Model

* Meta collects direct feedback via randomized, single-question surveys asking users to rate video interest on a 1–5 scale.
* The raw survey data is binarized to denoise responses and weighted to correct for sampling and nonresponse bias.
* The UTIS model functions as a lightweight "alignment model layer" built on top of the main multi-task ranking system.
* The architecture uses existing model predictions as input features, supplemented by engineered features that capture content attributes and user behavior.

### Integration into the Ranking Funnel

* **Late Stage Ranking (LSR):** The UTIS score is used as an additional input feature in the final value formula, allowing the system to boost high-interest videos and demote low-interest ones.
* **Early Stage Ranking (Retrieval):** The model aggregates survey data to reconstruct user interest profiles, helping the system source more relevant candidates during the initial retrieval phase.
* **Knowledge Distillation:** Large sequence-based retrieval models are aligned using UTIS predictions as labels through distillation objectives.

### Performance and Impact

* The deployment of UTIS has led to a measurable increase in the delivery of niche, high-quality content.
* Generic, popularity-based recommendations that often lack depth have been reduced.
* Meta observed robust improvements across core metrics, including higher follow rates, more shares, and increased user retention.
* The system now offers better interpretability, allowing engineers to understand which specific factors contribute to a user's sense of "interest match."

To continue improving the Reels ecosystem, Meta is doubling down on personalization by tackling challenges related to sparse data and sampling bias while exploring more advanced AI architectures to further diversify recommendations.
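The flow above, binarizing 1–5 survey ratings and blending an alignment score into a late-stage value formula, can be sketched in a few lines. This is a minimal illustration of the general idea; the function names, blending weight, and linear value formula are assumptions, not Meta's actual implementation.

```python
# Hypothetical sketch: denoise survey ratings, then blend a "true
# interest" (UTIS-style) score into a late-stage ranking formula.

def binarize_survey_response(rating: int, threshold: int = 4) -> int:
    """Denoise a 1-5 survey rating into a binary interest label."""
    return 1 if rating >= threshold else 0

def value_formula(engagement_score: float,
                  interest_score: float,
                  interest_weight: float = 0.4) -> float:
    """Blend the existing engagement prediction with the alignment
    model's interest score; a high interest score boosts a video,
    a low one demotes it. The linear blend and weight are illustrative."""
    return (1 - interest_weight) * engagement_score + interest_weight * interest_score

candidates = [
    {"id": "v1", "engagement": 0.9, "interest": 0.2},  # high-engagement, low true interest
    {"id": "v2", "engagement": 0.6, "interest": 0.9},  # niche, high true interest
]
ranked = sorted(candidates,
                key=lambda c: value_formula(c["engagement"], c["interest"]),
                reverse=True)
print([c["id"] for c in ranked])  # the niche, high-interest video now ranks first
```

The point of the toy blend is that a clickbait-style video with high engagement but low surveyed interest can be out-ranked by a niche video the user genuinely cares about.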

netflix

Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning | by Netflix Technology Blog | Netflix TechBlog

Netflix is evolving its recommendation systems by moving beyond simple behavior imitation toward generative recommenders that better align with true user preferences. While generative models like HSTU and OneRec effectively capture sequential user patterns, they often struggle to distinguish between habitual clicks and genuine satisfaction. To bridge this gap, Netflix developed Advantage-Weighted Supervised Fine-tuning (A-SFT), a post-training method that leverages noisy reward signals to refine model performance without the need for complex counterfactual data.

### The Shift to Generative Recommenders

* Modern generative recommenders (GRs), such as HSTU and OneRec, use transformer architectures to treat recommendation as a sequential transduction task.
* These models are typically trained with next-item prediction, where the system learns to imitate the chronological sequence of a user's activities.
* A significant drawback of this "behavior cloning" approach is that it captures external trends and noise rather than long-term user satisfaction, potentially recommending content the user finished but did not actually enjoy.

### Barriers to Reinforcement Learning in RecSys

* Traditional post-training methods used for Large Language Models, such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO), require counterfactual feedback that is difficult to obtain in recommendation contexts.
* Because user sequences span weeks or years, it is impractical to generate hypothetical, counterfactual experiences and validate them with users in real time.
* Reward signals in recommendation systems are inherently noisy; high watch time might indicate interest, but it can also result from external circumstances, making it an unreliable metric for optimization.

### Advantage-Weighted Supervised Fine-tuning (A-SFT)

* A-SFT is a hybrid approach that sits between offline reinforcement learning and standard supervised fine-tuning.
* The algorithm incorporates an advantage function to weight training examples, allowing the model to prioritize actions that lead to higher rewards while filtering out noise from the reward model.
* The method is specifically designed to handle high-variance reward signals, using them as directional guides rather than absolute truth, which prevents the model from over-exploiting inaccurate data.
* Benchmarks against other representative methods show that A-SFT achieves superior alignment between the generative recommendation policy and the underlying reward model.

For organizations managing large-scale recommendation engines, A-SFT offers a practical path to post-training improvements. By focusing on advantage-weighted signals, developers can improve recommendation quality using existing implicit feedback, like watch time and clicks, without the infrastructure hurdles of online reinforcement learning.
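To make "advantage-weighted supervised fine-tuning" concrete, here is a minimal NumPy sketch of the weighting idea: subtract a baseline from noisy rewards to get advantages, turn those into soft per-example weights, and scale the supervised next-item loss by them. The exponential weighting, temperature, and batch-mean baseline are assumptions for illustration, not Netflix's exact formulation.

```python
import numpy as np

def advantage_weights(rewards: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Turn noisy rewards into soft training weights.
    Subtracting the batch mean yields an 'advantage'; exponentiating it
    uses the reward as a directional guide rather than absolute truth."""
    advantages = rewards - rewards.mean()
    w = np.exp(advantages / temperature)
    return w / w.mean()  # normalize so the average weight stays 1

def a_sft_loss(log_probs: np.ndarray, rewards: np.ndarray) -> float:
    """Advantage-weighted supervised loss: examples whose rewards exceed
    the batch average contribute more to the next-item objective, while
    low-advantage (likely noisy) examples are down-weighted, not discarded."""
    return float(-(advantage_weights(rewards) * log_probs).mean())

# Toy batch: per-example log-likelihood of the logged item, plus a
# noisy scalar reward (e.g. derived from watch time).
log_probs = np.array([-0.5, -1.2, -0.8])
rewards = np.array([1.0, 0.1, 0.6])
print(a_sft_loss(log_probs, rewards))
```

Note how this stays supervised, it still maximizes the likelihood of logged items, which is what lets it sit between behavior cloning and offline RL without needing counterfactual rollouts.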

netflix

Behind the Streams: Real-Time Recommendations for Live Events Part 3 | by Netflix Technology Blog | Netflix TechBlog

Netflix manages the massive surge of concurrent users during live events with a hybrid strategy of prefetching and real-time broadcasting that delivers synchronized recommendations. By decoupling data delivery from the live trigger, the system avoids the "thundering herd" effect that would otherwise overwhelm cloud infrastructure during record-breaking broadcasts. This architecture ensures that millions of global devices receive timely updates and visual cues without requiring linear, inefficient scaling of compute resources.

### The Constraint Optimization Problem

To maintain a seamless experience, Netflix engineers balance three primary technical constraints: time to update, request throughput, and compute cardinality.

* **Time:** The duration required to coordinate and push a recommendation update to the entire global fleet.
* **Throughput:** The maximum capacity of cloud services to handle incoming requests without service degradation.
* **Cardinality:** The variety and complexity of unique requests necessary to serve personalized updates to different user segments.

### Two-Phase Recommendation Delivery

The system splits delivery into two distinct stages to smooth out traffic spikes and ensure high availability.

* **Prefetching Phase:** While members browse the app normally before an event, the system downloads materialized recommendations, metadata, and artwork into the device's local cache.
* **Broadcasting Phase:** When the event begins, a low-cardinality "at least once" message is broadcast to all connected devices, triggering them to display the already-cached content instantly.
* **Traffic Smoothing:** This approach eliminates the need for massive, real-time data fetches at the moment of kickoff, spreading the heavy lifting of data transfer over a longer period.

### Live State Management and UI Synchronization

A dedicated Live State Management (LSM) system tracks event schedules in real time to keep the user interface in sync with the production.

* **Dynamic Adjustments:** If a live event is delayed or ends early, the LSM adjusts the broadcast triggers to preserve accuracy and prevent "spoilers" or dead links.
* **Visual Cues:** The UI uses "Live" badging and dynamic artwork transitions to signal urgency and guide users toward the stream.
* **Frictionless Playback:** For members already on a title's detail page, the system can trigger an automatic transition into the live player the moment the broadcast begins, reducing navigation latency.

To support global-scale live events, technical teams should prioritize edge-heavy strategies that pre-position assets on client devices. By shifting from a reactive request-response model to a proactive prefetch-and-trigger model, platforms can maintain high performance and reliability even during the most significant traffic peaks.
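The prefetch-then-broadcast pattern described above can be sketched as a toy client model: the heavy payload lands in the device cache ahead of time, and the kickoff broadcast carries only an event ID that flips the cached content visible. The class and field names here are invented for illustration; this is not Netflix's client code.

```python
# Toy sketch of two-phase delivery: heavy prefetch early, tiny trigger later.

class Device:
    def __init__(self, device_id: str):
        self.device_id = device_id
        self.cache = {}       # materialized recommendations, metadata, artwork
        self.showing = None   # what the UI currently displays

    def prefetch(self, event_id: str, payload: dict) -> None:
        """Phase 1: the heavy data transfer happens while the member
        browses normally, well before the event starts."""
        self.cache[event_id] = payload

    def on_broadcast(self, event_id: str) -> None:
        """Phase 2: the broadcast message carries no payload -- it only
        triggers display of already-cached content, so the kickoff
        moment requires no large fetch (no thundering herd)."""
        if event_id in self.cache:
            self.showing = self.cache[event_id]

devices = [Device(f"d{i}") for i in range(3)]
payload = {"title": "Live Event", "artwork": "art.jpg", "badge": "Live"}
for d in devices:            # spread over hours before kickoff
    d.prefetch("event-1", payload)
for d in devices:            # low-cardinality trigger at kickoff
    d.on_broadcast("event-1")
print(all(d.showing == payload for d in devices))  # True
```

The design choice to make the broadcast idempotent ("at least once" delivery) falls out naturally here: replaying `on_broadcast` simply re-displays the same cached payload.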

coupang

Accelerating Coupang’s AI Journey with LLMs | by Coupang Engineering | Coupang Engineering Blog | Medium

Coupang is strategically evolving its machine learning infrastructure to integrate Large Language Models (LLMs) and foundation models across its e-commerce ecosystem. By transitioning from task-specific deep learning models to multimodal transformers, the company aims to enhance customer experiences in search, recommendations, and logistics. This shift requires a robust ML platform capable of handling the massive compute, networking, and latency demands inherent in generative AI.

### Core Machine Learning Domains

Coupang's existing ML ecosystem is built on three primary pillars that drive business logic:

* **Recommendation Systems:** These models leverage vast datasets of user interactions, including clicks, purchases, and relevance judgments, to power home feeds, search results, and advertising.
* **Content Understanding:** Deep learning processes product catalogs, user reviews, and merchant data to create unified representations of customers and products.
* **Forecasting Models:** Predictive algorithms manage over 100 fulfillment centers, optimizing pricing and logistics for millions of products through a mix of statistical methods and deep learning.

### Enhancing Multimodal and Language Understanding

The adoption of foundation models (FMs) has unified previously fragmented ML tasks, particularly in multilingual environments:

* **Joint Modeling:** Instead of separate embeddings, vision and language transformer models jointly model product images and metadata (titles and descriptions) to improve ad retrieval and similarity searches.
* **Cross-Border Localization:** LLMs translate product titles from Korean to Mandarin and improve the quality of shopping feeds for global sellers.
* **Weak Label Generation:** To overcome the high cost of human labeling in multiple languages, Coupang uses LLMs to generate high-quality "weak labels" for training downstream models, addressing label scarcity in under-resourced segments.

### Infrastructure for Large-Scale Training

Scaling LLM training requires a shift in hardware architecture and distributed computing strategies:

* **High-Performance Clusters:** The platform uses H100 and A100 GPU clusters interconnected with high-speed InfiniBand or RoCE (RDMA over Converged Ethernet) networking to minimize communication bottlenecks.
* **Distributed Frameworks:** To fit massive models into GPU memory, Coupang employs parallelism techniques including Fully Sharded Data Parallelism (FSDP), Tensor Parallelism (TP), and Pipeline Parallelism (PP).
* **Efficient Categorization:** Traditional architectures that required a separate model for every product category are being replaced by a single, large multimodal transformer that handles categorization and attribute extraction across the entire catalog.

### Optimizing LLM Serving and Inference

The transition to real-time generative AI features requires significant optimizations to manage the high computational cost of inference:

* **Quantization Strategies:** To reduce memory footprint and increase throughput, models are compressed to FP8, INT8, or INT4 precision without significant loss in accuracy.
* **Advanced Serving Techniques:** The platform implements key-value (KV) caching to avoid redundant computation during text generation and uses continuous batching (via engines like vLLM or TGI) to maximize GPU utilization.
* **Lifecycle Management:** A unified platform vision ensures that the entire end-to-end lifecycle, from data preparation and fine-tuning to deployment, is streamlined for ML engineers.

To stay competitive, Coupang is moving toward an integrated AI lifecycle where foundation models serve as the backbone for both content generation and predictive analytics. This infrastructure-first approach allows rapid deployment of generative features while maintaining the resource efficiency required at massive e-commerce scale.
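Of the serving optimizations mentioned above, weight quantization is easy to demonstrate end to end. The sketch below shows textbook symmetric per-tensor INT8 quantization with NumPy: a general illustration of the technique, not Coupang's serving stack, and the function names are my own.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 with a single per-tensor scale
    (symmetric quantization: zero maps to zero)."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights at inference time."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q.nbytes, w.nbytes)  # 16 64 -- a 4x smaller memory footprint vs FP32
```

The round-trip error is bounded by half a quantization step (`scale / 2`), which is why moderate-precision formats like INT8 typically cost little accuracy; INT4 halves memory again but needs finer-grained (e.g. per-channel or per-group) scales to stay accurate.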