Netflix / machine-learning

2 posts


Netflix's Metaflow Spin: Faster ML Development | Netflix TechBlog

Netflix has introduced Spin, a new capability in the Metaflow framework designed to significantly accelerate the iterative development cycle for ML and AI workflows. By bridging the gap between the interactive speed of notebooks and the production-grade reliability of versioned workflows, Spin lets developers experiment in stateful increments without the latency of full restarts. This keeps the "prototype to production" pipeline fluid while preserving the deterministic execution and explicit state management that Metaflow provides at scale.

### The Nature of ML and AI Iteration

* ML and AI development is distinct from traditional software engineering because it involves large, mutable datasets and computationally expensive, stochastic processes.
* State management is a primary concern in this domain: reloading data or recomputing transformations for every minor code change creates a prohibitively slow feedback loop.
* While notebooks like Jupyter or Marimo excel at preserving in-memory state for fast exploration, they often lead to "hidden state" problems and non-deterministic results due to out-of-order cell execution.

### Metaflow as a State-Aware Framework

* Metaflow uses the `@step` decorator to define checkpoint boundaries where the framework automatically persists all instance variables as versioned artifacts.
* The framework's `resume` command lets developers restart execution from a specific step, cloning previous state to avoid recomputing successful upstream tasks (see the sketch after this summary).
* This architecture addresses notebook limitations by making execution order explicit and deterministic while keeping state fully discoverable and versioned.

### Introducing Spin for Rapid Development

* Spin is a new feature introduced in Metaflow 2.19 that further reduces the friction of the iterative development loop.
* It aims to provide the near-instant feedback of a notebook environment while operating within the structure of a production-ready Metaflow workflow.
* It helps developers manage the stateful nature of ML development, enabling quick, incremental experimentation without losing continuity between code iterations.

To improve data science productivity and reduce "waiting time" during development, engineering teams should adopt Metaflow 2.19 and integrate Spin into their experimentation workflows.
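To make the `@step` / `resume` iteration pattern concrete, here is a minimal sketch of a Metaflow flow. The `FlowSpec`, `@step`, and `self.next` APIs and the `resume` command are standard Metaflow; the flow name, step names, and toy computation are illustrative only, and Spin itself is not shown because the post's Spin-specific interface is not detailed here.

```python
from metaflow import FlowSpec, step


class TrainingFlow(FlowSpec):
    """Illustrative flow: each @step boundary persists instance variables
    as versioned artifacts, which is what makes resume-style iteration cheap."""

    @step
    def start(self):
        # Load the raw data; anything assigned to self is checkpointed
        # at the end of the step.
        self.rows = list(range(1_000))
        self.next(self.featurize)

    @step
    def featurize(self):
        # An expensive transformation we do not want to redo on every edit.
        self.features = [r * 2 for r in self.rows]
        self.next(self.train)

    @step
    def train(self):
        # Editing only this step and resuming reuses the persisted artifacts
        # from start/featurize instead of recomputing them.
        self.model_score = sum(self.features) / len(self.features)
        self.next(self.end)

    @step
    def end(self):
        print(f"score: {self.model_score}")


if __name__ == "__main__":
    TrainingFlow()
```

Typical iteration: run the whole flow once with `python training_flow.py run`, then after editing only the `train` step, run `python training_flow.py resume train` to clone the upstream state and re-execute from that step. Spin is described in the post as shrinking this loop further toward notebook-like latency.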


Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning | by Netflix Technology Blog | Netflix TechBlog

Netflix is evolving its recommendation systems by moving beyond simple behavior imitation toward generative recommenders that better align with true user preferences. While generative models like HSTU and OneRec effectively capture sequential user patterns, they often struggle to distinguish habitual clicks from genuine satisfaction. To bridge this gap, Netflix developed Advantage-Weighted Supervised Fine-tuning (A-SFT), a post-training method that leverages noisy reward signals to refine model performance without requiring complex counterfactual data.

### The Shift to Generative Recommenders

* Modern generative recommenders (GRs), such as HSTU and OneRec, use transformer architectures to treat recommendation as a sequential transduction task.
* The models are typically trained with next-item prediction, where the system learns to imitate the chronological sequence of a user's activities.
* A significant drawback of this "behavior cloning" approach is that it captures external trends and noise rather than long-term user satisfaction, potentially recommending content the user finished but did not actually enjoy.

### Barriers to Reinforcement Learning in RecSys

* Traditional post-training methods used for large language models, such as Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO), require counterfactual feedback that is difficult to obtain in recommendation contexts.
* Because user sequences span weeks or years, it is impractical to generate hypothetical, counterfactual experiences and validate them with real users in real time.
* Reward signals in recommendation systems are inherently noisy; high watch time might indicate interest, but it can also result from external circumstances, making it an unreliable metric to optimize directly.

### Advantage-Weighted Supervised Fine-tuning (A-SFT)

* A-SFT is a hybrid approach that sits between offline reinforcement learning and standard supervised fine-tuning.
* The algorithm incorporates an advantage function to weight training examples, letting the model prioritize actions that lead to higher rewards while filtering out noise from the reward model (a loss sketch follows this summary).
* The method is designed to handle high-variance reward signals, treating them as directional guides rather than absolute truth, which prevents the model from over-exploiting inaccurate data.
* Benchmarks against other representative methods show that A-SFT achieves superior alignment between the generative recommendation policy and the underlying reward model.

For organizations managing large-scale recommendation engines, A-SFT offers a practical path to post-training improvements. By focusing on advantage-weighted signals, developers can improve recommendation quality using existing implicit feedback, such as watch time and clicks, without the infrastructure hurdles of online reinforcement learning.
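The post does not spell out the exact A-SFT objective, so the sketch below shows one common way to realize "advantage-weighted supervised fine-tuning": a standard next-item cross-entropy loss reweighted by a clipped, exponentiated advantage (AWR-style). The function name, the exponential weighting, the clipping, and the toy tensors are assumptions for illustration, not Netflix's published formulation.

```python
import torch
import torch.nn.functional as F


def a_sft_loss(logits, target_items, rewards, baseline, beta=1.0, max_weight=10.0):
    """Advantage-weighted next-item loss (illustrative sketch, not the exact A-SFT objective).

    logits:       (batch, num_items) next-item scores from the generative recommender
    target_items: (batch,) observed next item (the behavior-cloning target)
    rewards:      (batch,) noisy reward estimates, e.g. from a reward model
    baseline:     scalar or (batch,) expected reward, so advantage = reward - baseline
    """
    # Standard supervised (behavior-cloning) per-example loss.
    nll = F.cross_entropy(logits, target_items, reduction="none")

    # Advantage: how much better this interaction was than expected.
    advantage = rewards - baseline

    # Exponential advantage weighting, clipped so a noisy or overconfident
    # reward model cannot dominate the gradient (reward as a directional
    # guide rather than absolute truth).
    weights = torch.exp(advantage / beta).clamp(max=max_weight).detach()

    # Weighted SFT objective: imitate logged behavior, but upweight
    # interactions the reward model deems genuinely satisfying.
    return (weights * nll).mean()


# Toy usage with random tensors (shapes only; not real data).
logits = torch.randn(8, 1000, requires_grad=True)
targets = torch.randint(0, 1000, (8,))
rewards = torch.rand(8)  # e.g. a normalized, watch-time-derived reward
loss = a_sft_loss(logits, targets, rewards, baseline=rewards.mean())
loss.backward()
```

The design point this illustrates is the hybrid nature described in the post: with uniform weights the objective collapses to ordinary supervised fine-tuning, while the advantage term nudges the policy toward higher-reward behavior without requiring online rollouts or counterfactual feedback.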