Netflix’s Muse platform has evolved from a simple dashboard into a high-scale Online Analytical Processing (OLAP) system that processes trillions of rows to provide creative insights for promotional media. To meet growing demands for complex audience affinity analysis and advanced filtering, the engineering team modernized the data serving layer by moving beyond basic batch pipelines. By integrating HyperLogLog sketches for approximate counting and leveraging in-memory precomputed aggregates, the system now delivers low-latency performance and high data accuracy at an immense scale.
### Approximate Counting with HyperLogLog (HLL) Sketches
To track metrics like unique impressions and qualified plays without the massive overhead of exactly deduplicating billions of profile IDs, Muse uses the Apache DataSketches library.
* The system trades a small margin of error (approximately 0.8% with an lgK of 17) for significant gains in processing speed and memory efficiency.
* Sketches are built during Druid ingestion using the HLLSketchBuild aggregator with rollup enabled to reduce data volume.
* In the Spark ETL process, all-time aggregates are maintained by merging new daily HLL sketches into existing ones using the `hll_union` function.
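Muse relies on the Apache DataSketches implementation, but the core idea behind an HLL sketch — and why daily sketches can be losslessly folded into an all-time aggregate — can be illustrated with a deliberately simplified, pure-Python version. This is a sketch of the technique, not DataSketches' actual algorithm; the register count, hash, and correction constants here are illustrative:

```python
import hashlib
import math

class SimpleHLL:
    """Toy HyperLogLog: m = 2^lg_k registers, each storing the maximum
    leading-zero rank observed among hashes routed to that register."""

    def __init__(self, lg_k=12):
        self.lg_k = lg_k
        self.m = 1 << lg_k
        self.registers = [0] * self.m

    def update(self, item):
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        idx = h & (self.m - 1)             # low lg_k bits pick the register
        w = h >> self.lg_k                 # remaining bits feed the rank
        rank = (64 - self.lg_k) - w.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of two sketches is the element-wise max of their registers --
        # this is why new daily sketches can be folded into an all-time one.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:  # linear-counting correction for small n
            return self.m * math.log(self.m / zeros)
        return raw

day1, day2 = SimpleHLL(), SimpleHLL()
for i in range(100_000):
    day1.update(f"profile-{i}")
for i in range(50_000, 150_000):           # 50k profiles overlap with day1
    day2.update(f"profile-{i}")
day1.merge(day2)                           # "all-time" sketch; true uniques = 150,000
print(round(day1.estimate()))              # close to 150000, within a few percent
```

The merge-as-max property is what makes the Spark ETL step cheap: maintaining an all-time aggregate never requires revisiting raw profile IDs, only combining fixed-size register arrays.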
### Utilizing Hollow for In-Memory Aggregates
To reduce the query load on the Druid cluster, Netflix uses Hollow, its open-source Java library for keeping high-density, near-cache data sets entirely in memory.
* Muse stores precomputed, all-time aggregates—such as lifetime impressions per asset—within Hollow’s in-memory data structures.
* When a user requests "all-time" data, the application retrieves the results from the Hollow cache instead of forcing Druid to scan months or years of historical segments.
* This approach significantly lowers latency for the most common queries and frees up Druid resources for more complex, dynamic filtering tasks.
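Hollow itself is a Java library, but the serving pattern described above is straightforward: consult the in-memory all-time aggregate first and fall back to the OLAP store only when necessary. A minimal Python sketch of that routing — the class and method names here are illustrative assumptions, not Muse's actual API:

```python
class MuseQueryRouter:
    """Illustrative routing layer: precomputed all-time aggregates are
    served from an in-memory map (standing in for a Hollow consumer),
    while anything needing a historical scan goes to Druid."""

    def __init__(self, hollow_aggregates, druid_client):
        self.hollow_aggregates = hollow_aggregates  # asset_id -> lifetime metrics
        self.druid_client = druid_client

    def lifetime_impressions(self, asset_id):
        agg = self.hollow_aggregates.get(asset_id)
        if agg is not None:
            return agg["impressions"]               # near-cache hit, no Druid scan
        # Cache miss (e.g. a brand-new asset): fall back to the OLAP store.
        return self.druid_client.sum_impressions(asset_id)

class FakeDruid:
    """Stub standing in for a real Druid client, to show the fallback path."""
    def __init__(self):
        self.calls = 0
    def sum_impressions(self, asset_id):
        self.calls += 1
        return 0

druid = FakeDruid()
router = MuseQueryRouter({"asset-42": {"impressions": 1_234_567}}, druid)
print(router.lifetime_impressions("asset-42"))  # prints 1234567, served from memory
print(druid.calls)                              # prints 0 -- Druid untouched
```

The design choice is the same one the post describes: the common "all-time" query never touches Druid, so the cluster's capacity is reserved for dynamic filtering.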
### Optimizing the Druid Data Layer
Efficient data retrieval from Druid is critical for supporting the application’s advanced grouping and filtering capabilities.
* The team transitioned from hash-based partitioning to range-based partitioning on frequently filtered dimensions like `video_id` to improve data locality and pruning.
* Background compaction tasks are utilized to merge small segments into larger ones, reducing metadata overhead and improving scan speeds across the cluster.
* Specific tuning was applied to the Druid broker and historical nodes, including adjusting processing threads and buffer sizes to handle the high-concurrency demands of the Muse UI.
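In Druid terms, the ingestion-time pieces above map onto a spec fragment like the following. `HLLSketchBuild`, rollup, and the range `partitionsSpec` are real Druid features, but the dimension names, metric names, and row targets here are illustrative assumptions:

```json
{
  "granularitySpec": {
    "rollup": true,
    "segmentGranularity": "day"
  },
  "metricsSpec": [
    {
      "type": "HLLSketchBuild",
      "name": "unique_profiles",
      "fieldName": "profile_id",
      "lgK": 17
    }
  ],
  "tuningConfig": {
    "partitionsSpec": {
      "type": "range",
      "partitionDimensions": ["video_id"],
      "targetRowsPerSegment": 5000000
    }
  }
}
```

Range partitioning on `video_id` lets the broker prune entire segments for queries filtered by title, which hash partitioning cannot do.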
### Validation and Data Accuracy
Because the move to HLL sketches introduces approximation, the team implemented rigorous validation processes to ensure the data remained actionable.
* Internal debugging tools were developed to compare results from the new architecture against the "ground truth" provided by legacy batch systems.
* Continuous monitoring ensures that HLL error rates remain within the expected 1–2% range and that data remains consistent across different time grains.
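That kind of check is simple to express: compare sketch estimates against exact counts from the legacy batch output and flag anything outside the error budget. A minimal sketch of such a validator — the function name, data shapes, and threshold are illustrative, not Netflix's tooling:

```python
def validate_estimates(ground_truth, estimates, max_rel_error=0.02):
    """Compare approximate counts against legacy-batch 'ground truth'.
    Returns the keys whose relative error exceeds the budget (2% here)."""
    violations = []
    for key, exact in ground_truth.items():
        approx = estimates.get(key, 0)
        rel_error = abs(approx - exact) / exact if exact else float(approx > 0)
        if rel_error > max_rel_error:
            violations.append((key, exact, approx, round(rel_error, 4)))
    return violations

batch = {"asset-1": 1_000_000, "asset-2": 250_000}   # exact legacy counts
hll = {"asset-1": 1_006_500, "asset-2": 241_000}     # 0.65% and 3.6% error
print(validate_estimates(batch, hll))  # only asset-2 breaches the 2% budget
```

Running such a comparison per time grain (daily, weekly, all-time) is what catches the consistency issues the team monitors for.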
For organizations building large-scale OLAP applications, the Muse architecture demonstrates that performance bottlenecks can often be solved by combining approximate data structures with specialized in-memory caches to offload heavy computations from the primary database.