Behind the Streams: Real-Time Recommendations for Live Events Part 3 | by Netflix Technology Blog | Netflix TechBlog

Netflix manages the surge of concurrent users during live events with a hybrid strategy of prefetching and real-time broadcasting that delivers synchronized recommendations. By decoupling data delivery from the live trigger, the system avoids the "thundering herd" effect, in which millions of devices request fresh data at the same moment and overwhelm cloud infrastructure. This architecture lets millions of devices worldwide receive timely updates and visual cues without scaling compute resources linearly with audience size.

The Constraint Optimization Problem

To maintain a seamless experience, Netflix engineers balance three primary technical constraints: time to update, request throughput, and compute cardinality.

  • Time: The time required to coordinate and push a recommendation update to the entire global fleet of devices.
  • Throughput: The maximum capacity of cloud services to handle incoming requests without service degradation.
  • Cardinality: The variety and complexity of unique requests necessary to serve personalized updates to different user segments.
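
How the three constraints pull against each other can be illustrated with back-of-the-envelope arithmetic. The sketch below is not from the original post: it assumes every device issues the same number of requests, spread evenly over the update window, and the function name and all figures are hypothetical.

```python
def required_rps(devices: int, requests_per_device: int, window_seconds: float) -> float:
    """Average requests per second needed to update every device within the window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return devices * requests_per_device / window_seconds


# Hypothetical figures chosen only for illustration, not Netflix numbers.
DEVICES = 10_000_000

# Fetching at kickoff: every device must refresh within a one-minute window.
at_kickoff = required_rps(DEVICES, requests_per_device=1, window_seconds=60)

# Prefetching: the same work spread over the hour before the event.
prefetched = required_rps(DEVICES, requests_per_device=1, window_seconds=3600)

print(f"at kickoff: {at_kickoff:,.0f} req/s")
print(f"prefetched: {prefetched:,.0f} req/s")
```

Under these assumptions, widening the window from one minute to one hour cuts the average request rate by a factor of 60, while more unique request variants (higher cardinality) would raise the work behind each request.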

Two-Phase Recommendation Delivery

The system splits the delivery process into two distinct stages to smooth out traffic spikes and ensure high availability.

  • Prefetching Phase: While members browse the app normally before an event, the system downloads materialized recommendations, metadata, and artwork into the device's local cache.
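  • Broadcasting Phase: When the event begins, a low-cardinality "at least once" message is broadcast to all connected devices, triggering them to display the already-cached content immediately, without a further fetch. At-least-once delivery means a device may receive the same message more than once.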
  • Traffic Smoothing: This approach eliminates the need for massive, real-time data fetches at the moment of kickoff, distributing the heavy lifting of data transfer over a longer period.
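
The two phases can be sketched from the device's point of view. This is a minimal illustration, not Netflix's client code: the `DeviceClient` and `Recommendation` classes, their fields, and the message identifiers are all hypothetical. It assumes the broadcast carries only identifiers and that duplicate deliveries are ignored by tracking message IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    event_id: str
    title: str
    artwork_url: str


@dataclass
class DeviceClient:
    """Hypothetical client illustrating prefetch-then-trigger."""
    cache: dict = field(default_factory=dict)          # event_id -> Recommendation
    displayed: list = field(default_factory=list)      # titles shown so far
    seen_messages: set = field(default_factory=set)    # broadcast IDs already handled

    def prefetch(self, rec: Recommendation) -> None:
        # Phase 1: store the full payload while the member browses normally.
        self.cache[rec.event_id] = rec

    def on_broadcast(self, message_id: str, event_id: str) -> bool:
        # Phase 2: the broadcast carries only identifiers, not the payload.
        if message_id in self.seen_messages:
            return False  # duplicate from at-least-once delivery; ignore it
        self.seen_messages.add(message_id)
        rec = self.cache.get(event_id)
        if rec is None:
            return False  # nothing prefetched; a real client would need a fallback
        self.displayed.append(rec.title)
        return True


client = DeviceClient()
client.prefetch(Recommendation("evt-1", "Live Match", "https://example.com/art.jpg"))
first = client.on_broadcast("msg-1", "evt-1")   # first delivery: shows the cached title
second = client.on_broadcast("msg-1", "evt-1")  # redelivery of the same message: no-op
```

The heavy payload travels during `prefetch`, so the broadcast itself stays small and identical for every device, which is what keeps its cardinality low.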

Live State Management and UI Synchronization

A dedicated Live State Management (LSM) system tracks event schedules in real time so that the user interface stays in sync with the actual broadcast.

  • Dynamic Adjustments: If a live event is delayed or ends early, the LSM adjusts the broadcast triggers to preserve accuracy and prevent "spoilers" or dead links.
  • Visual Cues: The UI uses "Live" badging and dynamic artwork transitions to signal urgency and guide users toward the stream.
  • Frictionless Playback: For members already on a title’s detail page, the system can trigger an automatic transition into the live player the moment the broadcast begins, reducing navigation latency.
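
The schedule tracking described above can be pictured as a small state model. The sketch below is an assumption-laden illustration rather than the LSM's actual design: the `LiveEvent` class, its three states, and the choice to shift both start and end on a delay are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class LiveState(Enum):
    UPCOMING = "upcoming"
    LIVE = "live"
    ENDED = "ended"


@dataclass
class LiveEvent:
    event_id: str
    start: float  # scheduled start, seconds since epoch
    end: float    # scheduled end, seconds since epoch

    def state_at(self, now: float) -> LiveState:
        # Derive the UI state from the current schedule.
        if now < self.start:
            return LiveState.UPCOMING
        if now < self.end:
            return LiveState.LIVE
        return LiveState.ENDED

    def delay(self, seconds: float) -> None:
        # Assumption: a delay shifts the whole window, so the "Live" badge
        # is not shown before the real start.
        self.start += seconds
        self.end += seconds

    def end_early(self, now: float) -> None:
        # Close the window so the UI stops pointing at a finished stream.
        self.end = min(self.end, now)


event = LiveEvent("evt-1", start=1000.0, end=4600.0)
event.delay(300.0)                     # kickoff pushed back five minutes
before = event.state_at(1100.0)        # still UPCOMING after the delay
during = event.state_at(1300.0)        # LIVE once the new start has passed
event.end_early(2000.0)                # broadcast wraps up ahead of schedule
after = event.state_at(2000.0)         # ENDED from that moment on
```

Deriving the state from the schedule on demand means a schedule change needs only one update, and any broadcast trigger computed afterwards reflects it.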

To support global-scale live events, technical teams should prioritize edge-heavy strategies that pre-position assets on client devices. By shifting from a reactive request-response model to a proactive prefetch-and-trigger model, platforms can maintain high performance and reliability even during the most significant traffic peaks.