How and Why Netflix Built a Real-Time Distributed Graph: Part 1 — Ingesting and Processing Data Streams at Internet Scale | by Netflix Technology Blog | Netflix TechBlog

Netflix has developed a Real-Time Distributed Graph (RDG) to unify member interaction data across its expanding business verticals, including streaming, live events, and mobile gaming. By transitioning from siloed microservice data to a graph-based model, the company can perform low-latency, relationship-centric queries that were previously hindered by expensive manual joins and data fragmentation. The resulting system enables Netflix to track user journeys across devices and platforms in real time, providing a foundation for deeper personalization and pattern detection.

Challenges of Data Isolation in Microservices

  • While Netflix’s microservices architecture facilitates independent scaling and service decomposition, it inherently leads to data isolation where each service manages its own storage.
  • Data scientists and engineers previously had to "stitch" together disparate data from various databases and the central data warehouse, which was a slow and manual process.
  • The RDG moves away from table-based models to a relationship-centric model, allowing for efficient "hops" across nodes without the need for complex denormalization.
  • This flexibility allows the system to adapt to new business entities (like live sports or games) without requiring massive schema re-architectures.
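The appeal of the relationship-centric model described above can be illustrated with a minimal in-memory sketch: instead of joining tables across services, a query simply "hops" along labeled edges. The node IDs and edge labels below (`USES_DEVICE`, `STARTED_SESSION`) are illustrative assumptions, not Netflix's actual schema.

```python
from collections import defaultdict

# Adjacency list keyed by (source node, edge label) -> list of target nodes.
edges = defaultdict(list)

def add_edge(src: str, label: str, dst: str) -> None:
    edges[(src, label)].append(dst)

def hop(nodes: list[str], label: str) -> list[str]:
    """Follow one edge label from a set of nodes -- a single graph 'hop'."""
    return [dst for n in nodes for dst in edges[(n, label)]]

# Hypothetical data: a member's devices and the sessions started on them.
add_edge("member:42", "USES_DEVICE", "device:tv-1")
add_edge("member:42", "USES_DEVICE", "device:phone-1")
add_edge("device:tv-1", "STARTED_SESSION", "session:abc")

# Two hops answer "which sessions did this member start, on any device?"
# without a manual cross-service join.
sessions = hop(hop(["member:42"], "USES_DEVICE"), "STARTED_SESSION")
```

Adding a new entity type (say, a live event) only means introducing new node and edge labels, which is the schema flexibility the bullets above describe.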

Real-Time Ingestion and Normalization

  • The ingestion layer is designed to capture events from diverse upstream sources, including Change Data Capture (CDC) from databases and request/response logs.
  • Netflix utilizes its internal data pipeline, Keystone, to funnel these high-volume event streams into the processing framework.
  • The system must handle "Internet scale" data, ensuring that events from millions of members are captured as they happen to maintain an up-to-date view of the graph.
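A normalization step like the one implied above can be sketched as mapping heterogeneous upstream events (CDC rows, request/response logs) into one common envelope. The field names here are illustrative assumptions, not the actual Keystone or RDG event schema.

```python
from datetime import datetime, timezone

def normalize(raw: dict, source: str) -> dict:
    """Map a raw upstream event into a hypothetical common envelope.

    CDC events are assumed to carry 'id'/'table'/'after' fields; log events
    'request_id'/'service'/'body'. These shapes are illustrative only.
    """
    return {
        "event_id": raw.get("id") or raw.get("request_id"),
        "entity_type": raw.get("table") or raw.get("service"),
        "payload": raw.get("after") or raw.get("body") or {},
        "source": source,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

# A CDC-style row change and a request log both normalize to the same shape.
cdc_event = {"id": "123", "table": "profiles", "after": {"profile_id": "p9"}}
log_event = {"request_id": "r-7", "service": "playback", "body": {"action": "play"}}
envelopes = [normalize(cdc_event, "cdc"), normalize(log_event, "request_log")]
```

A uniform envelope like this is what lets the downstream processing layer treat events identically regardless of which microservice emitted them.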

Stream Processing with Apache Flink

  • Netflix uses Apache Flink as the core stream processing engine to handle the transformation of raw events into graph entities.
  • Incoming data undergoes normalization to ensure a standardized format, regardless of which microservice or business vertical the data originated from.
  • The pipeline performs data enrichment, joining incoming streams with auxiliary metadata to provide a comprehensive context for each interaction.
  • The final step of the processing layer involves mapping these enriched events into a graph structure of nodes (entities) and edges (relationships), which are then emitted to the system's storage layer.
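The final mapping step above can be sketched as a pure function from an enriched event to graph nodes and edges, the shape a Flink map operator would emit to the storage layer. This is a simplified stand-in, not Netflix's actual Flink job: the `Member`/`Title`/`WATCHED` labels and event fields are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    id: str
    label: str

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    label: str
    ts: int

def to_graph(event: dict) -> tuple[list[Node], list[Edge]]:
    """Map one enriched event into nodes (entities) and edges (relationships)."""
    member = Node(id=f"member:{event['member_id']}", label="Member")
    title = Node(id=f"title:{event['title_id']}", label="Title")
    watched = Edge(src=member.id, dst=title.id, label="WATCHED", ts=event["ts"])
    return [member, title], [watched]

# One enriched playback event yields two nodes and one timestamped edge.
nodes, graph_edges = to_graph({"member_id": "42", "title_id": "m1", "ts": 1700000000})
```

In the real pipeline this mapping would run per-event inside the stream processor, with the emitted nodes and edges upserted into the graph store.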

Practical Conclusion

Organizations operating with a highly decoupled microservices architecture should consider a graph-based ingestion strategy to overcome the limitations of data silos. By leveraging stream processing tools like Apache Flink to build a real-time graph, engineering teams can provide stakeholders with the ability to discover hidden relationships and cross-domain insights that are often lost in traditional data warehouses.