Netflix

10 posts

netflixtechblog.com

netflix

How Temporal Powers Reliable Cloud Operations at Netflix | by Netflix Technology Blog | Dec, 2025 | Netflix TechBlog

Netflix has significantly enhanced the reliability of its global continuous delivery platform, Spinnaker, by adopting Temporal for durable execution of cloud operations. By migrating away from a fragile, polling-based orchestration model between its internal services, the engineering team reduced transient deployment failures from 4% to 0.0001%. This shift allows developers to write complex, long-running operational logic as standard code while the underlying platform handles state persistence and fault recovery.

### Limitations of Legacy Orchestration

* **The Polling Bottleneck:** Originally, Netflix's orchestration engine (Orca) communicated with its cloud interface (Clouddriver) via a synchronous POST request followed by continuous polling of a GET endpoint to track task status.
* **State Fragility:** Clouddriver relied on an internal orchestration engine backed by in-memory state or volatile Redis storage, so if a Clouddriver instance crashed mid-operation, the deployment state was often lost, leading to "zombie" tasks or failed deployments.
* **Manual Error Handling:** Developers had to manually implement retry logic, exponential backoffs, and state checkpointing for every cloud operation, which was both error-prone and difficult to maintain.

### Transitioning to Durable Execution with Temporal

* **Abstraction of Failures:** Temporal provides a "Durable Execution" platform in which the state of a workflow—including local variables and thread stacks—is automatically persisted. This allows code to run "as if failures don’t exist," because the system can resume exactly where it left off after a process crash or network interruption.
* **Workflows and Activities:** Netflix re-architected cloud operations into Temporal Workflows (orchestration logic) and Activities (idempotent units of work, such as calling an AWS API). This separation keeps the orchestration logic deterministic while external side effects are handled reliably.
* **Eliminating Polling:** By using Temporal’s signaling and long-running execution capabilities, Netflix moved away from the heavy overhead of thousands of services polling for status updates, replacing it with a push-based, event-driven model.

### Impact on Cloud Operations

* **Dramatic Reliability Gains:** The most significant outcome was the near-elimination of transient failures, from a 4% failure rate to 0.0001%, ensuring that critical updates to the Open Connect CDN and Live streaming infrastructure execute with high confidence.
* **Developer Productivity:** Using Temporal’s SDKs, Netflix engineers can write standard Java or Go code to define complex deployment strategies (such as canary releases or blue-green deployments) without building custom state machines or management layers.
* **Operational Visibility:** Temporal provides a native UI and a history audit log for every workflow, giving operators deep visibility into exactly which step of a deployment failed and why, along with the ability to manually retry specific failed steps if necessary.

For organizations managing complex, distributed cloud infrastructure, adopting a durable execution framework like Temporal is highly recommended. It moves the burden of state management and fault tolerance from the application layer to the platform, allowing engineers to focus on business logic rather than the mechanics of distributed-systems failure.
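The workflow/activity split described above maps directly onto Temporal's SDKs. Below is a minimal sketch using the Temporal Python SDK (the post discusses Java and Go; the `DeployWorkflow` and `update_cluster` names here are hypothetical), showing how the orchestration logic stays deterministic while an idempotent activity wraps the external side effect:

```python
# Illustrative sketch only: a deploy-style workflow with the Temporal Python SDK.
from datetime import timedelta
from temporalio import activity, workflow


@activity.defn
async def update_cluster(region: str) -> str:
    # Idempotent unit of work, e.g. a call to a cloud provider API.
    return f"cluster updated in {region}"


@workflow.defn
class DeployWorkflow:
    @workflow.run
    async def run(self, regions: list[str]) -> list[str]:
        results = []
        for region in regions:
            # Each activity result is persisted; if the worker crashes here,
            # Temporal replays the workflow and resumes after the last
            # completed activity instead of starting over.
            result = await workflow.execute_activity(
                update_cluster,
                region,
                start_to_close_timeout=timedelta(minutes=5),
            )
            results.append(result)
        return results
```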

netflix

Netflix Live Origin. Xiaomei Liu, Joseph Lynch, Chris Newton | by Netflix Technology Blog | Dec, 2025 | Netflix TechBlog

The Netflix Live Origin is a specialized, multi-tenant microservice designed to bridge the gap between cloud-based live streaming pipelines and the Open Connect content delivery network. By operating as an intelligent broker, it manages content selection across redundant regional pipelines to ensure that only valid, high-quality segments are distributed to client devices. This architecture allows Netflix to achieve high resilience and stream integrity through server-side failover and deterministic segment selection.

### Multi-Pipeline and Multi-Region Awareness

* The origin server mitigates common live streaming defects, such as missing segments, timing discontinuities, and short segments containing missing video or audio samples.
* It leverages independent, redundant streaming pipelines across different AWS regions to ensure high availability; if one pipeline fails or produces a defective segment, the origin selects a valid candidate from an alternate path.
* Implementation of epoch locking at the cloud encoder level allows the origin to interchangeably select segments from various pipelines.
* The system uses lightweight media inspection at the packager level to generate metadata, which the origin then uses to perform deterministic candidate selection.

### Stream Distribution and Protocol Integration

* The service operates on AWS EC2 instances and utilizes standard HTTP protocol features for communication.
* Upstream packagers use HTTP PUT requests to push segments into storage at specific URLs, while the downstream Open Connect network retrieves them via GET requests.
* The architecture is optimized for a manifest design that uses segment templates and constant segment durations, which reduces the need for frequent manifest refreshes.

### Open Connect Streaming Optimization

* While Netflix’s Open Connect Appliances (OCAs) were originally optimized for VOD, the Live Origin extends nginx proxy-caching functionality to meet live-specific requirements.
* OCAs are provided with Live Event Configuration data, including Availability Start Times and initial segment numbers, to determine the legitimate range of segments for an event.
* This predictive modeling allows the CDN to reject requests for objects outside the valid range immediately, reducing unnecessary traffic and load on the origin.

By decoupling the live streaming pipeline from the distribution network through this specialized origin layer, Netflix can maintain a high level of fault tolerance and stream stability. This approach minimizes client-side complexity by handling failovers and segment selection on the server side, ensuring a seamless experience for viewers of live events.
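To make the deterministic candidate selection concrete, here is a minimal, hypothetical Python sketch (not Netflix's implementation): packager-provided metadata is used to filter out defective segments, and a stable ordering guarantees that every origin node picks the same candidate.

```python
# Illustrative sketch only; field names and the validity rules are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentCandidate:
    pipeline_id: str          # e.g. "us-east-1/pipeline-a"
    segment_number: int
    duration_ms: int
    has_video: bool
    has_audio: bool


def is_valid(c: SegmentCandidate, expected_duration_ms: int) -> bool:
    # Reject defective segments: missing samples or too-short duration.
    return c.has_video and c.has_audio and c.duration_ms >= expected_duration_ms


def select_segment(candidates: list[SegmentCandidate],
                   expected_duration_ms: int) -> Optional[SegmentCandidate]:
    # Deterministic choice: filter out defects, then use a stable ordering
    # (here, lexical pipeline id) so every node selects the same candidate.
    valid = [c for c in candidates if is_valid(c, expected_duration_ms)]
    return min(valid, key=lambda c: c.pipeline_id) if valid else None
```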

netflix

AV1 — Now Powering 30% of Netflix Streaming | by Netflix Technology Blog | Dec, 2025 | Netflix TechBlog

Netflix has successfully integrated the AV1 codec into its streaming infrastructure, where it now accounts for 30% of all viewing traffic and is on track to become the platform's primary format. This transition from legacy standards like H.264/AVC is driven by AV1's superior compression efficiency, which allows for higher visual quality at significantly lower bitrates. By leveraging this open-source technology, Netflix has enhanced the user experience across a diverse range of devices while simultaneously optimizing global network bandwidth.

### Evolution of AV1 Adoption

The journey to 30% adoption began with a strategic rollout across different device ecosystems, balancing software flexibility with hardware requirements.

* **Mobile Origins:** The rollout started in 2020 on Android using the "dav1d" software decoder, which was specifically optimized for ARM chipsets to provide better quality for data-conscious mobile users.
* **Large Screen Integration:** In 2021, Netflix expanded AV1 to Smart TVs and streaming sticks, working closely with SoC vendors to certify hardware decoders capable of handling 4K and high frame rate (HFR) content.
* **Ecosystem Expansion:** Support was extended to web browsers in 2022 and eventually to the Apple ecosystem in 2023 following the introduction of hardware AV1 support in M3 and A17 Pro chips.

### Quantifiable Performance Gains

The shift to AV1 has resulted in measurable improvements in video fidelity and streaming stability compared to previous standards.

* **Visual Quality:** On average, AV1 streaming sessions achieve VMAF scores that are 4.3 points higher than AVC and 0.9 points higher than HEVC.
* **Bandwidth Efficiency:** AV1 sessions require approximately one-third less bandwidth than both AVC and HEVC to maintain the same level of quality.
* **Reliability:** The increased efficiency has led to a 45% reduction in buffering interruptions, making high-quality 4K streaming more accessible in regions with limited network infrastructure.

### Live Streaming and Spatial Video

Beyond standard video-on-demand, Netflix is utilizing AV1 to power its latest innovations in live broadcasting and immersive media.

* **Live Events:** For major live events, such as the Jake Paul vs. Mike Tyson fight, Netflix utilized 10-bit AV1 to provide better resilience against packet loss and lower latency compared to traditional codecs.
* **Immersive Content:** AV1 serves as the backbone for spatial video on devices like the Apple Vision Pro, delivering high-bitrate HDR content necessary for a convincing "cinema-grade" experience.

As AV1 continues to displace older codecs, the industry is already looking toward the next milestone with the upcoming release of AV2. For developers and hardware manufacturers, the rapid success of AV1 underscores the importance of supporting open-source media standards to meet the increasing consumer demand for high-fidelity, low-latency streaming.

netflix

How and Why Netflix Built a Real-Time Distributed Graph: Part 1 — Ingesting and Processing Data Streams at Internet Scale | by Netflix Technology Blog | Netflix TechBlog

Netflix has developed a Real-Time Distributed Graph (RDG) to unify member interaction data across its expanding business verticals, including streaming, live events, and mobile gaming. By transitioning from siloed microservice data to a graph-based model, the company can perform low-latency, relationship-centric queries that were previously hindered by expensive manual joins and data fragmentation. The resulting system enables Netflix to track user journeys across various devices and platforms in real time, providing a foundation for deeper personalization and pattern detection.

### Challenges of Data Isolation in Microservices

* While Netflix’s microservices architecture facilitates independent scaling and service decomposition, it inherently leads to data isolation where each service manages its own storage.
* Data scientists and engineers previously had to "stitch" together disparate data from various databases and the central data warehouse, which was a slow and manual process.
* The RDG moves away from table-based models to a relationship-centric model, allowing for efficient "hops" across nodes without the need for complex denormalization.
* This flexibility allows the system to adapt to new business entities (like live sports or games) without requiring massive schema re-architectures.

### Real-Time Ingestion and Normalization

* The ingestion layer is designed to capture events from diverse upstream sources, including Change Data Capture (CDC) from databases and request/response logs.
* Netflix utilizes its internal data pipeline, Keystone, to funnel these high-volume event streams into the processing framework.
* The system must handle "Internet scale" data, ensuring that events from millions of members are captured as they happen to maintain an up-to-date view of the graph.

### Stream Processing with Apache Flink

* Netflix uses Apache Flink as the core stream processing engine to handle the transformation of raw events into graph entities.
* Incoming data undergoes normalization to ensure a standardized format, regardless of which microservice or business vertical the data originated from.
* The pipeline performs data enrichment, joining incoming streams with auxiliary metadata to provide comprehensive context for each interaction.
* The final step of the processing layer maps these enriched events into a graph structure of nodes (entities) and edges (relationships), which are then emitted to the system's storage layer.

### Practical Conclusion

Organizations operating with a highly decoupled microservices architecture should consider a graph-based ingestion strategy to overcome the limitations of data silos. By leveraging stream processing tools like Apache Flink to build a real-time graph, engineering teams can give stakeholders the ability to discover hidden relationships and cross-domain insights that are often lost in traditional data warehouses.
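As a rough illustration of that final processing step, the hypothetical Python sketch below maps a normalized, enriched event into nodes and edges. In production this transformation would run inside the Apache Flink pipeline; the event fields shown are assumptions for the example.

```python
# Illustrative sketch only: turning one enriched interaction event into
# graph nodes (entities) and edges (relationships).
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_type: str
    node_id: str


@dataclass(frozen=True)
class Edge:
    src: Node
    dst: Node
    edge_type: str
    timestamp_ms: int


def event_to_graph(event: dict) -> tuple[list[Node], list[Edge]]:
    member = Node("member", event["member_id"])
    title = Node("title", event["title_id"])
    device = Node("device", event["device_id"])
    edges = [
        Edge(member, title, event["interaction"], event["timestamp_ms"]),
        Edge(member, device, "used_device", event["timestamp_ms"]),
    ]
    return [member, title, device], edges


# Example usage with a normalized, enriched event:
nodes, edges = event_to_graph({
    "member_id": "m-123", "title_id": "t-456", "device_id": "d-789",
    "interaction": "played", "timestamp_ms": 1700000000000,
})
print(len(nodes), len(edges))  # 3 nodes, 2 edges
```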

netflix

Netflix's Metaflow Spin: Faster ML Development | Netflix TechBlog

Netflix has introduced Spin, a new functionality within the Metaflow framework designed to significantly accelerate the iterative development cycle for ML and AI workflows. By bridging the gap between the interactive speed of notebooks and the production-grade reliability of versioned workflows, Spin allows developers to experiment with stateful increments without the latency of full restarts. This enhancement ensures that the "prototype to production" pipeline remains fluid while maintaining the deterministic execution and explicit state management that Metaflow provides at scale.

### The Nature of ML and AI Iteration

* ML and AI development is distinct from traditional software engineering because it involves large, mutable datasets and computationally expensive, stochastic processes.
* State management is a primary concern in this domain, as reloading data or recomputing transformations for every minor code change creates a prohibitively slow feedback loop.
* While notebooks like Jupyter or Marimo excel at preserving in-memory state for fast exploration, they often lead to "hidden state" problems and non-deterministic results due to out-of-order cell execution.

### Metaflow as a State-Aware Framework

* Metaflow uses the `@step` decorator to define checkpoint boundaries where the framework automatically persists all instance variables as versioned artifacts.
* The framework’s `resume` command allows developers to restart execution from a specific step, cloning previous state to avoid recomputing successful upstream tasks.
* This architecture addresses notebook limitations by ensuring execution order is explicit and deterministic while making the state fully discoverable and versioned.

### Introducing Spin for Rapid Development

* Spin is a new feature introduced in Metaflow 2.19 that further reduces the friction of the iterative development loop.
* It aims to provide the near-instant feedback of a notebook environment while operating within the structure of a production-ready Metaflow workflow.
* The tool helps developers manage the stateful nature of ML development, allowing for quick, incremental experimentation without losing continuity between code iterations.

To improve data science productivity and reduce "waiting time" during the development phase, engineering teams should look to adopt Metaflow 2.19 and integrate Spin into their experimentation workflows.
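The checkpointing model that Spin builds on can be shown with a minimal Metaflow flow. This sketch illustrates only the standard `@step`/`resume` mechanics described above; it does not use the new Spin APIs, and the step and artifact names are made up.

```python
# Minimal Metaflow flow: instance variables are persisted as versioned
# artifacts at each @step boundary, so `resume` can skip completed steps.
from metaflow import FlowSpec, step


class TrainFlow(FlowSpec):

    @step
    def start(self):
        # Expensive data load; persisted at the step boundary, so later
        # edits to `train` don't force a reload.
        self.rows = list(range(1_000_000))
        self.next(self.train)

    @step
    def train(self):
        # Iterate here freely: `python train_flow.py resume train` clones
        # the successful `start` step instead of recomputing it.
        self.model_score = sum(self.rows) / len(self.rows)
        self.next(self.end)

    @step
    def end(self):
        print(f"score: {self.model_score}")


if __name__ == "__main__":
    TrainFlow()
```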

netflix

Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning | by Netflix Technology Blog | Netflix TechBlog

Netflix is evolving its recommendation systems by moving beyond simple behavior imitation toward generative recommenders that better align with true user preferences. While generative models like HSTU and OneRec effectively capture sequential user patterns, they often struggle to distinguish between habitual clicks and genuine satisfaction. To bridge this gap, Netflix developed Advantage-Weighted Supervised Fine-tuning (A-SFT), a post-training method that leverages noisy reward signals to refine model performance without the need for complex counterfactual data.

### The Shift to Generative Recommenders

* Modern generative recommenders (GRs), such as HSTU and OneRec, utilize transformer architectures to treat recommendation as a sequential transduction task.
* The models are typically trained using next-item prediction, where the system learns to imitate the chronological sequence of a user’s activities.
* A significant drawback of this "behavior cloning" approach is that it captures external trends and noise rather than long-term user satisfaction, potentially recommending content the user finished but did not actually enjoy.

### Barriers to Reinforcement Learning in RecSys

* Traditional post-training methods used in Large Language Models, such as Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO), require counterfactual feedback that is difficult to obtain in recommendation contexts.
* Because user sequences span weeks or years, it is impractical to generate and test hypothetical, counterfactual experiences for real-time user validation.
* Reward signals in recommendation systems are inherently noisy; for instance, high watch time might indicate interest, but it can also be a result of external circumstances, making it an unreliable metric for optimization.

### Advantage-Weighted Supervised Fine-tuning (A-SFT)

* A-SFT is a hybrid approach that sits between offline reinforcement learning and standard supervised fine-tuning.
* The algorithm incorporates an advantage function to weight training examples, allowing the model to prioritize actions that lead to higher rewards while filtering out noise from the reward model.
* This method is specifically designed to handle high-variance reward signals, using them as directional guides rather than absolute truth, which prevents the model from over-exploiting inaccurate data.
* Benchmarks against other representative methods show that A-SFT achieves superior alignment between the generative recommendation policy and the underlying reward model.

For organizations managing large-scale recommendation engines, A-SFT offers a practical path to implementing post-training improvements. By focusing on advantage-weighted signals, developers can improve recommendation quality using existing implicit feedback—like watch time and clicks—without the infrastructure hurdles of online reinforcement learning.
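As one possible reading of "advantage-weighted" fine-tuning, the sketch below weights a standard next-item cross-entropy loss by reward-model advantage estimates. This is an illustrative interpretation in PyTorch, not the published A-SFT algorithm; the softmax weighting scheme and the `beta` parameter are assumptions.

```python
# Illustrative sketch: per-example advantage weights scale the imitation loss,
# so higher-advantage interactions contribute more without being treated as
# ground truth.
import torch
import torch.nn.functional as F


def advantage_weighted_sft_loss(
    logits: torch.Tensor,      # [batch, num_items] next-item logits
    targets: torch.Tensor,     # [batch] observed next items
    advantages: torch.Tensor,  # [batch] noisy reward-model advantage estimates
    beta: float = 1.0,
) -> torch.Tensor:
    # Standard next-item prediction (supervised fine-tuning) term, per example.
    nll = F.cross_entropy(logits, targets, reduction="none")
    # Soft weights: advantages act as directional guides; normalize so the
    # average weight stays near 1 and noisy rewards cannot dominate.
    weights = torch.softmax(beta * advantages, dim=0) * advantages.numel()
    return (weights * nll).mean()


# Example usage with random tensors:
loss = advantage_weighted_sft_loss(
    torch.randn(8, 100), torch.randint(0, 100, (8,)), torch.randn(8)
)
print(loss.item())
```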

netflix

Behind the Streams: Real-Time Recommendations for Live Events Part 3 | by Netflix Technology Blog | Netflix TechBlog

Netflix manages the massive surge of concurrent users during live events by utilizing a hybrid strategy of prefetching and real-time broadcasting to deliver synchronized recommendations. By decoupling data delivery from the live trigger, the system avoids the "thundering herd" effect that would otherwise overwhelm cloud infrastructure during record-breaking broadcasts. This architecture ensures that millions of global devices receive timely updates and visual cues without requiring linear, inefficient scaling of compute resources.

### The Constraint Optimization Problem

To maintain a seamless experience, Netflix engineers balance three primary technical constraints: time to update, request throughput, and compute cardinality.

* **Time:** The specific duration required to coordinate and push a recommendation update to the entire global fleet.
* **Throughput:** The maximum capacity of cloud services to handle incoming requests without service degradation.
* **Cardinality:** The variety and complexity of unique requests necessary to serve personalized updates to different user segments.

### Two-Phase Recommendation Delivery

The system splits the delivery process into two distinct stages to smooth out traffic spikes and ensure high availability.

* **Prefetching Phase:** While members browse the app normally before an event, the system downloads materialized recommendations, metadata, and artwork into the device's local cache.
* **Broadcasting Phase:** When the event begins, a low-cardinality "at least once" message is broadcast to all connected devices, triggering them to display the already-cached content instantaneously.
* **Traffic Smoothing:** This approach eliminates the need for massive, real-time data fetches at the moment of kickoff, distributing the heavy lifting of data transfer over a longer period.

### Live State Management and UI Synchronization

A dedicated Live State Management (LSM) system tracks event schedules in real time to ensure the user interface stays perfectly in sync with the production.

* **Dynamic Adjustments:** If a live event is delayed or ends early, the LSM adjusts the broadcast triggers to preserve accuracy and prevent "spoilers" or dead links.
* **Visual Cues:** The UI utilizes "Live" badging and dynamic artwork transitions to signal urgency and guide users toward the stream.
* **Frictionless Playback:** For members already on a title’s detail page, the system can trigger an automatic transition into the live player the moment the broadcast begins, reducing navigation latency.

To support global-scale live events, technical teams should prioritize edge-heavy strategies that pre-position assets on client devices. By shifting from a reactive request-response model to a proactive prefetch-and-trigger model, platforms can maintain high performance and reliability even during the most significant traffic peaks.
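A toy Python sketch of the two-phase model is shown below (hypothetical client code, not Netflix's): the heavy payload is cached during the prefetch phase, and the broadcast message carries only a trigger.

```python
# Illustrative sketch: prefetch-then-trigger delivery on a client device.
class DeviceClient:
    def __init__(self):
        self.cache = {}

    def prefetch(self, event_id: str, payload: dict) -> None:
        # Phase 1: during normal browsing, pull the heavy payload
        # (recommendations, artwork, metadata) into the local cache.
        self.cache[event_id] = payload

    def on_broadcast(self, message: dict):
        # Phase 2: the low-cardinality "go live" signal carries no payload;
        # the device simply surfaces what it already cached.
        if message.get("type") == "event_live":
            return self.cache.get(message["event_id"])
        return None


device = DeviceClient()
device.prefetch("fight-night", {"row": ["Live now"], "artwork": "live.jpg"})
print(device.on_broadcast({"type": "event_live", "event_id": "fight-night"}))
```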

netflix

100X Faster: How We Supercharged Netflix Maestro’s Workflow Engine | by Netflix Technology Blog | Netflix TechBlog

Netflix has significantly optimized Maestro, its horizontally scalable workflow orchestrator, to meet the evolving demands of low-latency use cases like live events, advertising, and gaming. By redesigning the core engine to move from a polling-based architecture to a high-performance event-driven model, the team achieved a 100x increase in speed. This evolution reduced workflow overhead from several seconds to mere milliseconds, drastically improving developer productivity and system efficiency.

### Limitations of the Legacy Architecture

The original Maestro architecture was built on a three-layer system that, while scalable, introduced significant latency during execution.

* **Polling Latency:** The internal flow engine relied on calling execution functions at set intervals, creating a "speedbump" where tasks waited seconds to be picked up by workers.
* **Execution Overhead:** The process of translating complex workflow graphs into parallel flows and sequentially chained tasks added internal processing time that hindered sub-hourly and ad-hoc workloads.
* **Concurrency Issues:** A lack of strong guarantees from the internal flow engine occasionally led to race conditions, where a single step might be executed by multiple workers simultaneously.

### Transitioning to an Event-Driven Engine

To support the most demanding of these use cases, Netflix replaced the traditional flow engine with a custom, high-performance execution model.

* **Direct Dispatching:** The engine moved away from periodic polling in favor of an event-driven mechanism that triggers state transitions instantly.
* **State Machine Optimization:** The new design manages the lifecycle of workflows and steps through a more streamlined state machine, ensuring faster transitions between "start," "restart," "stop," and "pause" actions.
* **Reduced Data Latency:** The team optimized data access patterns for internal state storage, reducing the time required to write Maestro data to the database during high-volume executions.

### Scalability and Functional Improvements

The redesign not only improved speed but also strengthened the engine's ability to handle massive, complex data pipelines.

* **Isolation Layers:** The engine maintains strict isolation between the Maestro step runtime (integrated with Spark and Trino) and the underlying execution logic.
* **Support for Heterogeneous Workflows:** The supercharged engine continues to support massive workflows with hundreds of thousands of jobs while providing the low latency required for iterative development cycles.
* **Reliability Guarantees:** By moving to a more robust internal event bus, the system eliminated the race conditions found in the previous distributed job queue implementation.

For organizations managing large-scale data or ML workflows, moving toward an event-driven orchestration model is essential for supporting sub-hourly execution and low-latency ad-hoc queries. These performance improvements are now available in the Maestro open-source project for wider community adoption.
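The difference between polling and direct dispatch can be illustrated with a toy Python sketch (not Maestro internals; the workflow graph and step names are invented): a step's completion immediately unlocks its successors instead of waiting out a poll interval.

```python
# Illustrative sketch: event-driven step dispatch over a toy workflow DAG.
from collections import deque

# Toy workflow graph: step -> steps unlocked by its completion.
DAG = {"extract": ["transform"], "transform": ["load"], "load": []}


def run_event_driven(dag: dict[str, list[str]]) -> list[str]:
    executed = []
    ready = deque(["extract"])
    while ready:
        step = ready.popleft()
        executed.append(step)        # execute immediately, no poll interval
        for successor in dag[step]:
            ready.append(successor)  # completion event unlocks successors
    return executed


print(run_event_driven(DAG))  # ['extract', 'transform', 'load']
```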

netflix

Building a Resilient Data Platform with Write-Ahead Log at Netflix | by Netflix Technology Blog | Netflix TechBlog

Netflix has developed a distributed Write-Ahead Log (WAL) abstraction to address critical data challenges such as accidental corruption, system entropy, and the complexities of cross-region replication. By decoupling data mutation from immediate persistence and providing a unified API, this system ensures strong durability and eventual consistency across diverse storage engines. The WAL acts as a resilient buffer that powers high-leverage features like secondary indexing and delayed retry queues while maintaining the massive scale required for global operations.

### The Role of the WAL Abstraction

* The system serves as a centralized mechanism to capture data changes and reliably deliver them to downstream consumers, mitigating the risk of data loss during administrative errors or database corruption.
* It provides a simplified `WriteToLog` gRPC endpoint that abstracts underlying infrastructure, allowing developers to focus on data logic rather than the specifics of the storage layer.
* By acting as a durable intermediary, it prevents permanent data loss during incidents where primary datastores fail or require schema changes that might otherwise lead to corruption.

### Flexible Personas and Namespaces

* The architecture utilizes "namespaces" to define logical separation, allowing different services to configure specific storage backends like Kafka or SQS based on their needs.
* The "Delayed Queues" persona leverages SQS to provide a scalable way to retry failed messages in real-time pipelines without sacrificing overall system throughput.
* The system can be configured for "Cross-Region Replication," enabling high availability and disaster recovery for storage engines that do not natively support multi-region data transfer.

### Solving System Entropy and Consistency

* The WAL addresses the "dual-write" problem, where updates to primary stores (such as Cassandra) and search indices (such as Elasticsearch) can diverge over time, leading to data inconsistency.
* It facilitates reliable secondary indexing for NoSQL databases by managing updates to multiple partitions as a coordinated sequence of events.
* The platform mitigates operational risks, such as Out-of-Memory (OOM) errors on Key-Value nodes caused by bulk deletes, by staging and throttling mutations through the log.

Organizations operating at scale should adopt a WAL-centric architecture to simplify the management of heterogeneous data stores and enhance system resilience. By centralizing the mutation log, teams can implement complex features like Change Data Capture (CDC) and cross-region failover through a single, consistent interface rather than building bespoke solutions for every service.
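To illustrate how a single log write avoids the dual-write problem, here is a minimal, hypothetical Python sketch with in-memory stand-ins for the log backend (Kafka/SQS) and the downstream stores; the `write_to_log` method name only echoes the `WriteToLog` endpoint mentioned above.

```python
# Illustrative sketch: the application writes once to the log, and downstream
# consumers replay the log so the primary store and search index converge.
from dataclasses import dataclass, field


@dataclass
class WriteAheadLog:
    entries: list = field(default_factory=list)

    def write_to_log(self, namespace: str, mutation: dict) -> int:
        # The only write path the application uses.
        self.entries.append({"namespace": namespace, **mutation})
        return len(self.entries) - 1  # log offset


def apply_consumers(log: WriteAheadLog, primary: dict, index: dict) -> None:
    # Consumers replay the log in order; both stores see the same sequence
    # of mutations, even after a crash and a re-read from the last offset.
    for entry in log.entries:
        primary[entry["key"]] = entry["value"]
        index[entry["key"]] = entry["value"]


wal, primary_store, search_index = WriteAheadLog(), {}, {}
wal.write_to_log("movies", {"key": "title:1", "value": {"name": "Example"}})
apply_consumers(wal, primary_store, search_index)
print(primary_store == search_index)  # True: no divergence from dual writes
```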

netflix

Scaling Muse: How Netflix Powers Data-Driven Creative Insights at Trillion-Row Scale | by Netflix Technology Blog | Netflix TechBlog

Netflix’s Muse platform has evolved from a simple dashboard into a high-scale Online Analytical Processing (OLAP) system that processes trillions of rows to provide creative insights for promotional media. To meet growing demands for complex audience affinity analysis and advanced filtering, the engineering team modernized the data serving layer by moving beyond basic batch pipelines. By integrating HyperLogLog sketches for approximate counting and leveraging in-memory precomputed aggregates, the system now delivers low-latency performance and high data accuracy at an immense scale.

### Approximate Counting with HyperLogLog (HLL) Sketches

To track metrics like unique impressions and qualified plays without the massive overhead of comparing billions of profile IDs, Muse utilizes the Apache DataSketches library.

* The system trades a small margin of error (approximately 0.8% with a logK of 17) for significant gains in processing speed and memory efficiency.
* Sketches are built during Druid ingestion using the HLLSketchBuild aggregator with rollup enabled to reduce data volume.
* In the Spark ETL process, all-time aggregates are maintained by merging new daily HLL sketches into existing ones using the `hll_union` function.

### Utilizing Hollow for In-Memory Aggregates

To reduce the query load on the Druid cluster, Netflix uses Hollow, an internal open-source tool designed for high-density, near-cache data sets.

* Muse stores precomputed, all-time aggregates—such as lifetime impressions per asset—within Hollow’s in-memory data structures.
* When a user requests "all-time" data, the application retrieves the results from the Hollow cache instead of forcing Druid to scan months or years of historical segments.
* This approach significantly lowers latency for the most common queries and frees up Druid resources for more complex, dynamic filtering tasks.

### Optimizing the Druid Data Layer

Efficient data retrieval from Druid is critical for supporting the application’s advanced grouping and filtering capabilities.

* The team transitioned from hash-based partitioning to range-based partitioning on frequently filtered dimensions like `video_id` to improve data locality and pruning.
* Background compaction tasks are utilized to merge small segments into larger ones, reducing metadata overhead and improving scan speeds across the cluster.
* Specific tuning was applied to the Druid broker and historical nodes, including adjusting processing threads and buffer sizes to handle the high-concurrency demands of the Muse UI.

### Validation and Data Accuracy

Because the move to HLL sketches introduces approximation, the team implemented rigorous validation processes to ensure the data remained actionable.

* Internal debugging tools were developed to compare results from the new architecture against the "ground truth" provided by legacy batch systems.
* Continuous monitoring ensures that HLL error rates remain within the expected 1–2% range and that data remains consistent across different time grains.

For organizations building large-scale OLAP applications, the Muse architecture demonstrates that performance bottlenecks can often be solved by combining approximate data structures with specialized in-memory caches to offload heavy computations from the primary database.
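The merge-into-all-time pattern can be sketched with the Apache DataSketches Python bindings (assuming `pip install datasketches`; Muse itself builds and merges its sketches in Druid and Spark rather than in Python).

```python
# Illustrative sketch: count distinct profile IDs approximately with HLL,
# then merge a daily sketch into an all-time aggregate via a union.
from datasketches import hll_sketch, hll_union

LG_K = 17  # the post cites roughly 0.8% error at this sketch size

# Daily sketch of unique profile IDs that saw an asset.
daily = hll_sketch(LG_K)
for profile_id in ("p1", "p2", "p3", "p2"):  # duplicates are deduplicated
    daily.update(profile_id)

# All-time aggregate maintained by merging each new daily sketch.
all_time = hll_union(LG_K)
all_time.update(daily)
merged = all_time.get_result()

print(round(merged.get_estimate()))  # ~3 unique profiles, within HLL error bounds
```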