Meta / machine-learning

3 posts

meta

Adapting the Facebook Reels RecSys AI Model Based on User Feedback

Meta has enhanced the Facebook Reels recommendation engine by shifting focus from traditional engagement signals, like watch time and likes, to direct user feedback. By implementing the User True Interest Survey (UTIS) model, the system now prioritizes content that aligns with genuine user preferences rather than just short-term interactions. This shift has resulted in significant improvements in recommendation relevance, high-quality content delivery, and long-term user retention.

**Limitations of Engagement-Based Metrics**

* Traditional signals like "likes" and "watch time" are often noisy and may not reflect a user’s actual long-term interests.
* Models optimized solely for engagement tend to favor short-term value over the long-term utility of the product.
* Internal research found that previous heuristic-based interest models only achieved 48.3% precision in identifying what users truly care about.
* Effective interest matching requires understanding nuanced factors such as production style, mood, audio, and motivation, which implicit signals often miss.

**The User True Interest Survey (UTIS) Model**

* Meta collects direct feedback via randomized, single-question surveys asking users to rate video interest on a 1–5 scale.
* The raw survey data is binarized to denoise responses and weighted to correct for sampling and nonresponse bias.
* The UTIS model functions as a lightweight "alignment model layer" built on top of the main multi-task ranking system.
* The architecture uses existing model predictions as input features, supplemented by engineered features that capture content attributes and user behavior.

**Integration into the Ranking Funnel**

* **Late Stage Ranking (LSR):** The UTIS score is used as an additional input feature in the final value formula, allowing the system to boost high-interest videos and demote low-interest ones.
* **Early Stage Ranking (Retrieval):** The model aggregates survey data to reconstruct user interest profiles, helping the system source more relevant candidates during the initial retrieval phase.
* **Knowledge Distillation:** Large sequence-based retrieval models are aligned using UTIS predictions as labels through distillation objectives.

**Performance and Impact**

* The deployment of UTIS has led to a measurable increase in the delivery of niche, high-quality content.
* Generic, popularity-based recommendations that often lack depth have been reduced.
* Meta observed robust improvements across core metrics, including higher follow rates, more shares, and increased user retention.
* The system now offers better interpretability, allowing engineers to understand which specific factors contribute to a user’s sense of "interest match."

To continue improving the Reels ecosystem, Meta is focusing on doubling down on personalization by tackling challenges related to sparse data and sampling bias while exploring more advanced AI architectures to further diversify recommendations.
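
The survey handling and late-stage scoring described in this summary can be illustrated with a minimal sketch. It is not Meta's implementation: the binarization threshold of 4, the inverse-propensity form of the weights, the linear shape of the value formula, and every name below (`SurveyResponse`, `binarize`, `inverse_propensity_weight`, `weighted_interest_rate`, `final_value`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    rating: int           # answer on the 1-5 interest scale
    sampling_prob: float  # assumed probability that this survey was shown
    response_prob: float  # assumed probability that the user answered


def binarize(rating: int, threshold: int = 4) -> int:
    """Collapse a 1-5 rating into 0/1. The threshold is an assumption."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be in 1..5, got {rating}")
    return 1 if rating >= threshold else 0


def inverse_propensity_weight(resp: SurveyResponse) -> float:
    """Upweight responses that were unlikely to be sampled or answered."""
    return 1.0 / (resp.sampling_prob * resp.response_prob)


def weighted_interest_rate(responses: list[SurveyResponse]) -> float:
    """Bias-corrected share of responses that indicate true interest."""
    weights = [inverse_propensity_weight(r) for r in responses]
    labels = [binarize(r.rating) for r in responses]
    return sum(w * y for w, y in zip(weights, labels)) / sum(weights)


def final_value(
    engagement_preds: dict[str, float],
    engagement_weights: dict[str, float],
    utis_score: float,
    utis_weight: float,
) -> float:
    """Hypothetical linear value formula with the UTIS score as one more term."""
    base = sum(engagement_weights[name] * p for name, p in engagement_preds.items())
    return base + utis_weight * utis_score
```

In this sketch a video with a high predicted UTIS score gains value in proportion to `utis_weight`, which is one simple way a ranker could boost high-interest videos and demote low-interest ones.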


DrP: Meta's Root Cause Analysis Platform at Scale

DrP is Meta’s programmatic root cause analysis (RCA) platform designed to automate incident investigations and reduce the burden of manual on-call tasks. By codifying investigation playbooks into executable "analyzers," the platform significantly lowers the mean time to resolve (MTTR) by 20% to 80% for over 300 teams. This systematic approach replaces outdated manual scripts with a scalable backend that executes 50,000 automated analyses daily, providing immediate context when alerts fire.

## Architecture and Core Components

* **Expressive SDK:** Provides a framework for engineers to codify investigation workflows into "analyzers," utilizing a rich library of helper functions and machine learning algorithms.
* **Built-in Analysis Tools:** The platform includes native support for anomaly detection, event isolation, time-series correlation, and dimension analysis to identify specific problem areas.
* **Scalable Backend:** A multi-tenant execution environment manages a worker pool that handles thousands of requests securely and asynchronously.
* **Workflow Integration:** DrP is integrated directly into Meta’s internal alerting and incident management systems, allowing for automatic triggering without human intervention.

## Authoring and Verification Workflow

* **Template Bootstrapping:** Engineers use the SDK to generate boilerplate code that captures required input parameters and context in a type-safe manner.
* **Analyzer Chaining:** The system allows for seamless dependency analysis by passing context between different analyzers, enabling investigations to span multiple interconnected services.
* **Automated Backtesting:** Before deployment, analyzers undergo automated backtesting integrated into the code review process to ensure accuracy and performance.
* **Decision Tree Logic:** Investigation steps are modeled as decision trees within the code, allowing the analyzer to follow different paths based on the data it retrieves.

## Execution and Post-Processing

* **Trigger-based Analysis:** When an alert is activated, the backend automatically queues the relevant analyzer, ensuring findings are available as soon as an engineer begins triaging.
* **Automated Mitigation:** A post-processing system can take direct action based on investigation results, such as creating tasks or submitting pull requests to resolve identified issues.
* **DrP Insights:** This system periodically reviews historical analysis outputs to identify and rank the top causes of alerts, helping teams prioritize long-term reliability fixes.
* **Alert Annotation:** Results are presented in both human-readable text and machine-readable formats, directly annotating the incident logs for the on-call responder.

## Practical Conclusion

Organizations managing large-scale distributed systems should transition from static markdown playbooks to executable investigation code. By implementing a programmatic RCA framework like DrP, teams can scale their troubleshooting expertise and significantly reduce "on-call fatigue" by automating the repetitive triage steps that typically consume the first hour of an incident.
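
To make the idea of an executable playbook concrete, the following is a minimal sketch of two chained analyzers with decision-tree logic and dual-format output. It does not use or reproduce the DrP SDK, which is internal to Meta: the `Context` and `Finding` types, the analyzer functions, the metric names, and all thresholds are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    summary: str
    details: dict = field(default_factory=dict)


@dataclass
class Context:
    """Hypothetical investigation context passed from analyzer to analyzer."""
    service: str
    metrics: dict[str, float]
    findings: list[Finding] = field(default_factory=list)


def dependency_analyzer(ctx: Context) -> Context:
    """Leaf analyzer: checks a downstream dependency (threshold is illustrative)."""
    latency = ctx.metrics.get("dependency_p99_ms", 0.0)
    if latency > 500.0:
        ctx.findings.append(Finding("Dependency latency is elevated", {"p99_ms": latency}))
    else:
        ctx.findings.append(Finding("No dependency issue found"))
    return ctx


def error_rate_analyzer(ctx: Context) -> Context:
    """Root analyzer: each branch is one step of the playbook's decision tree."""
    error_rate = ctx.metrics.get("error_rate", 0.0)
    if error_rate <= 0.05:
        ctx.findings.append(Finding("Error rate within normal range"))
        return ctx
    minutes = ctx.metrics.get("minutes_since_deploy", float("inf"))
    if minutes <= 30:
        ctx.findings.append(Finding("Errors began shortly after a deploy", {"minutes": minutes}))
        return ctx
    # No local explanation: chain to the next analyzer with the same context.
    return dependency_analyzer(ctx)


def render(ctx: Context) -> tuple[str, list[dict]]:
    """Return findings as human-readable text and as machine-readable records."""
    text = "\n".join(f"[{ctx.service}] {f.summary}" for f in ctx.findings)
    records = [{"summary": f.summary, **f.details} for f in ctx.findings]
    return text, records
```

Because each branch is ordinary code, a playbook written this way can be replayed against past incidents in review, which is the property that automated backtesting depends on.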


Efficient Optimization With Ax, an Open Platform for Adaptive Experimentation

Meta has released Ax 1.0, an open-source platform designed to automate and optimize complex, resource-intensive experimentation through machine learning. By utilizing Bayesian optimization, the platform helps researchers navigate vast configuration spaces to improve AI models, infrastructure, and hardware design efficiently. The release aims to bridge the gap between sophisticated mathematical theory and the practical requirements of production-scale engineering.

## Real-World Experimentation and Utility

* Ax is used extensively at Meta for diverse tasks, including tuning hyperparameter configurations, discovering optimal data mixtures for Generative AI, and optimizing compiler flags.
* The platform is built to handle the logistical "overhead" of experimentation, such as managing experiment states, automating orchestration, and providing diagnostic tools.
* It supports multi-objective optimization, allowing users to balance competing metrics and enforce "guardrail" constraints rather than just maximizing a single value.
* Applications extend beyond software to physical engineering, such as optimizing design parameters for AR/VR hardware.

## System Insight and Analysis

* Beyond finding optimal points, Ax serves as a diagnostic tool to help researchers understand the underlying behavior of their systems.
* It includes built-in visualizations for Pareto frontiers, which illustrate the trade-offs between different metrics.
* Sensitivity analysis tools identify which specific input parameters have the greatest impact on the final results.
* The platform provides automated plots and tables to track optimization progress and visualize the effect of parameters across the entire input space.

## Technical Methodology and Architecture

* Ax utilizes Bayesian optimization, an iterative approach that balances "exploration" (sampling new areas) with "exploitation" (refining known good areas).
* The platform relies on **BoTorch** for its underlying Bayesian components and typically employs **Gaussian processes (GP)** as surrogate models.
* GPs are preferred because they can make accurate predictions and quantify uncertainty even when provided with very few data points.
* The system uses an **Expected Improvement (EI)** acquisition function to calculate the potential value of new configurations compared to the current best-known result.
* This surrogate-based approach is designed to scale to high-dimensional settings involving hundreds of tunable parameters where traditional search methods are too costly.

To begin implementing these methods, developers can install the platform via `pip install ax-platform`. Ax 1.0 provides a robust framework for moving cutting-edge optimization research directly into production environments.
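
The exploration and exploitation trade-off behind Expected Improvement can be seen in a few lines. This sketch is not Ax or BoTorch code. It evaluates the standard closed-form EI for a single Gaussian prediction under maximization, assuming the surrogate has already supplied a posterior mean and standard deviation for each candidate; the function names and the candidate values are illustrative.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Closed-form EI for maximization, given a Gaussian posterior at one point."""
    if std < 0.0:
        raise ValueError("std must be non-negative")
    improvement = mean - best
    if std == 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(improvement, 0.0)
    z = improvement / std
    return improvement * normal_cdf(z) + std * normal_pdf(z)


def pick_next(candidates: dict[str, tuple[float, float]], best: float) -> str:
    """Return the candidate name whose (mean, std) prediction maximizes EI."""
    return max(candidates, key=lambda name: expected_improvement(*candidates[name], best))


# A confident prediction equal to the current best versus a lower but
# much more uncertain one (values are made up for illustration).
candidates = {"safe": (1.0, 0.1), "uncertain": (0.8, 1.0)}
choice = pick_next(candidates, best=1.0)
```

Here `choice` is `"uncertain"`: its predicted mean is lower, but its wide uncertainty leaves more room for a gain over the current best, so EI favors exploring it.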