netflix

Behind the Streams: Real-Time Recommendations for Live Events Part 3 | by Netflix Technology Blog | Netflix TechBlog

Netflix manages the massive surge of concurrent users during live events by utilizing a hybrid strategy of prefetching and real-time broadcasting to deliver synchronized recommendations. By decoupling data delivery from the live trigger, the system avoids the "thundering herd" effect that would otherwise overwhelm cloud infrastructure during record-breaking broadcasts. This architecture ensures that millions of global devices receive timely updates and visual cues without requiring linear, inefficient scaling of compute resources.

### The Constraint Optimization Problem

To maintain a seamless experience, Netflix engineers balance three primary technical constraints: time to update, request throughput, and compute cardinality.

* **Time:** The specific duration required to coordinate and push a recommendation update to the entire global fleet.
* **Throughput:** The maximum capacity of cloud services to handle incoming requests without service degradation.
* **Cardinality:** The variety and complexity of unique requests necessary to serve personalized updates to different user segments.

### Two-Phase Recommendation Delivery

The system splits the delivery process into two distinct stages to smooth out traffic spikes and ensure high availability.

* **Prefetching Phase:** While members browse the app normally before an event, the system downloads materialized recommendations, metadata, and artwork into the device's local cache.
* **Broadcasting Phase:** When the event begins, a low-cardinality "at least once" message is broadcast to all connected devices, triggering them to display the already-cached content instantaneously.
* **Traffic Smoothing:** This approach eliminates the need for massive, real-time data fetches at the moment of kickoff, distributing the heavy lifting of data transfer over a longer period.

### Live State Management and UI Synchronization

A dedicated Live State Management (LSM) system tracks event schedules in real time to ensure the user interface stays perfectly in sync with the production.

* **Dynamic Adjustments:** If a live event is delayed or ends early, the LSM adjusts the broadcast triggers to preserve accuracy and prevent "spoilers" or dead links.
* **Visual Cues:** The UI utilizes "Live" badging and dynamic artwork transitions to signal urgency and guide users toward the stream.
* **Frictionless Playback:** For members already on a title’s detail page, the system can trigger an automatic transition into the live player the moment the broadcast begins, reducing navigation latency.

To support global-scale live events, technical teams should prioritize edge-heavy strategies that pre-position assets on client devices. By shifting from a reactive request-response model to a proactive prefetch-and-trigger model, platforms can maintain high performance and reliability even during the most significant traffic peaks.
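
The prefetch-and-trigger flow described above can be sketched in a few lines. This is a minimal illustration, not Netflix's implementation: the `Device` class, its method names, and the message and event identifiers are all hypothetical. It shows the two ideas the summary names: the heavy payload is cached ahead of time, and the broadcast carries only identifiers, which the client must deduplicate because "at least once" delivery can repeat a message.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical client that prefetches content and reacts to broadcasts."""
    cache: dict = field(default_factory=dict)        # event_id -> prefetched payload
    shown: list = field(default_factory=list)        # event_ids already displayed
    seen_messages: set = field(default_factory=set)  # broadcast ids already handled

    def prefetch(self, event_id: str, payload: dict) -> None:
        # Phase 1: the heavy data arrives ahead of time, during normal browsing.
        self.cache[event_id] = payload

    def on_broadcast(self, message_id: str, event_id: str) -> bool:
        # Phase 2: the broadcast carries identifiers only, never content.
        # At-least-once delivery allows duplicates, so repeated ids are ignored.
        if message_id in self.seen_messages:
            return False
        self.seen_messages.add(message_id)
        if event_id not in self.cache:
            # Nothing was prefetched; a real client would need a fallback fetch.
            return False
        self.shown.append(event_id)
        return True


device = Device()
device.prefetch("fight-night", {"title": "Fight Night", "artwork": "live.png"})
first = device.on_broadcast("msg-1", "fight-night")      # displays cached content
duplicate = device.on_broadcast("msg-1", "fight-night")  # redelivery is a no-op
```

Because the trigger is the same small message for every device, its cost does not grow with the number of personalized payloads, which were already delivered during the prefetch window.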

google

Teaching Gemini to spot exploding stars with just a few examples

Researchers have demonstrated that Google’s Gemini model can classify cosmic events with 93% accuracy, rivaling specialized machine learning models while providing human-readable explanations. By utilizing few-shot learning with only 15 examples per survey, the model addresses the "black box" limitation of traditional convolutional neural networks used in astronomy. This approach enables scientists to efficiently process the millions of alerts generated by modern telescopes while maintaining a transparent and interactive reasoning process.

## Bottlenecks in Modern Transient Astronomy

* Telescopes like the Vera C. Rubin Observatory are expected to generate up to 10 million alerts per night, making manual verification impossible.
* The vast majority of these alerts are "bogus" signals caused by satellite trails, cosmic rays, or instrumental artifacts rather than real supernovae.
* Existing specialized models often provide binary "real" or "bogus" labels without context, forcing astronomers to either blindly trust the output or spend hours on manual verification.

## Multimodal Few-Shot Learning for Classification

* The research utilized few-shot learning, providing Gemini with only 15 annotated examples for three major surveys: Pan-STARRS, MeerLICHT, and ATLAS.
* Input data consisted of image triplets (a "new" alert image, a "reference" image of the same sky patch, and a "difference" image), each 100x100 pixels in size.
* The model successfully generalized across different telescopes with varying pixel scales, ranging from 0.25" per pixel for Pan-STARRS to 1.8" per pixel for ATLAS.
* Beyond simple labels, Gemini generates a textual description of observed features and an interest score to help astronomers prioritize follow-up observations.

## Expert Validation and Self-Assessment

* A panel of 12 professional astronomers evaluated the model using a 0–5 coherence rubric, confirming that Gemini’s logic aligned with expert reasoning.
* The study found that Gemini can effectively assess its own uncertainty; low self-assigned "coherence scores" were strong indicators of likely classification errors.
* This ability to flag its own potential mistakes allows the model to act as a reliable partner, alerting scientists when a specific case requires human intervention.

The transition from "black box" classifiers to interpretable AI assistants allows the astronomical community to scale with the data flood of next-generation telescopes. By combining high-accuracy classification with transparent reasoning, researchers can maintain scientific rigor while processing millions of cosmic events in real time.
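
The self-assessment idea above amounts to a triage rule: classifications that come back with a low self-assigned coherence score go to a human. The sketch below assumes the self-assigned score uses the same 0–5 scale as the expert rubric, and the `Classification` type, the `min_coherence` cutoff of 3, and the alert ids are illustrative choices, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    alert_id: str
    label: str      # "real" or "bogus"
    coherence: int  # self-assigned score, assumed to be on a 0-5 scale


def needs_human_review(result: Classification, min_coherence: int = 3) -> bool:
    # min_coherence is an illustrative cutoff, not a threshold from the study.
    return result.coherence < min_coherence


def triage(results):
    """Split alert ids into those accepted automatically and those sent to a human."""
    auto, review = [], []
    for r in results:
        (review if needs_human_review(r) else auto).append(r.alert_id)
    return auto, review


auto, review = triage([
    Classification("a1", "real", 5),
    Classification("a2", "bogus", 1),  # low coherence: flagged for review
    Classification("a3", "real", 3),
])
```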

google

A picture's worth a thousand (private) words: Hierarchical generation of coherent synthetic photo albums

Researchers at Google have developed a hierarchical method for generating differentially private (DP) synthetic photo albums, providing a way to share representative datasets while protecting sensitive individual information. By utilizing an intermediate text representation and a two-stage generation process, the approach maintains thematic coherence across multiple images in an album, a significant challenge for traditional synthetic data methods. This framework allows organizations to apply standard, non-private analytical techniques to safe synthetic substitutes rather than modifying every individual analysis method for differential privacy.

## The Hierarchical Generation Process

* The workflow begins by converting original photo albums into structured text; an AI model generates detailed captions for each image and a summary for the entire album.
* Two large language models (LLMs) are privately fine-tuned using DP-SGD: the first is trained to produce album summaries, and the second generates individual photo captions based on those summaries.
* Synthetic data is then produced hierarchically, where the model first generates a global album summary to serve as context, followed by a series of individual photo captions that remain consistent with that context.
* The final step uses a text-to-image AI model to transform the private, synthetic text captions back into a set of coherent images.

## Benefits of Intermediate Text Representations

* Text summarization is inherently privacy-enhancing because it is a "lossy" operation, meaning the text description is unlikely to capture the exact unique details of an original photo.
* Using text as a midpoint allows for more efficient resource management, as generated albums can be filtered and curated at the text level before undergoing the computationally expensive process of image generation.
* The hierarchical approach ensures that photos within a synthetic album share the same characters and themes, as every caption in a set is derived from the same contextual summary.
* Training two separate models with shorter context windows is significantly more efficient than training one large model, because the computational cost of self-attention scales quadratically with the length of the context.

This hierarchical, text-mediated approach demonstrates that high-level semantic information and thematic coherence can be preserved in synthetic datasets without sacrificing individual privacy. Organizations should consider this workflow (translating complex multi-modal data into structured text before synthesis) to scale differentially private data generation for advanced modeling and analysis.
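
The quadratic-cost argument above can be checked with back-of-the-envelope arithmetic. The token counts below are hypothetical, and the sketch assumes each caption is produced with only the album summary plus that one caption in context; the source does not specify the exact context layout. Constant factors are dropped, so only the ratio is meaningful.

```python
def attention_cost(context_tokens: int) -> int:
    # Self-attention compares every token with every other token,
    # so cost grows with the square of the context length (constants omitted).
    return context_tokens ** 2


# Hypothetical sizes, chosen only to make the arithmetic concrete.
summary_tokens = 200
caption_tokens = 100
photos_per_album = 20

# One model that sees the summary and every caption in a single context.
single = attention_cost(summary_tokens + photos_per_album * caption_tokens)

# Two models: one context for the summary, then one short context per caption.
hierarchical = (
    attention_cost(summary_tokens)
    + photos_per_album * attention_cost(summary_tokens + caption_tokens)
)

ratio = single / hierarchical  # how many times cheaper the split is
```

With these numbers the single long context costs 2200² = 4,840,000 units against 200² + 20 × 300² = 1,840,000 for the split, roughly a 2.6× saving, and the gap widens as albums get longer.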

line

Essential Element for App Success: Error Monitoring

Effective mobile app management requires proactive outage monitoring to prevent user churn caused by failures in critical flows like registration or payment. Relying on user reports is often too late, so developers must implement systematic event collection and real-time dashboards to identify issues the moment they arise. By integrating tools like Sentry or Firebase, teams can maintain high quality through immediate response and detailed performance analysis.

### Implementing Sentry in Flutter

* **Dependency and Initialization**: Integration begins by adding `sentry_flutter` and `sentry_dio` to the project. The initialization process involves setting the Data Source Name (DSN), environment tags (e.g., production vs. staging), and release versions to ensure logs are correctly categorized.
* **Performance and Privacy**: Developers should configure `tracesSampleRate` and `profilesSampleRate` to balance monitoring depth with costs. Additionally, the `beforeSend` callback allows for masking sensitive user data like authorization headers or IP addresses before they are transmitted.
* **Contextual Tracking**: To aid debugging, the system captures user IDs via `Sentry.configureScope` and tracks user movement using `SentryNavigatorObserver`. Utilizing `SentryInterceptor` with the Dio library allows for automatic tracking of HTTP request performance and API bottlenecks.

### Strategic Log Level Design

* **Debug and Info**: Debug logs remain local to the terminal to save resources. Info logs are reserved for significant user actions that change data, such as successful sign-ups or purchases, while high-frequency read actions like "viewing a product list" are excluded to reduce noise and costs.
* **Warning**: This level tracks external system failures, such as failed API calls or push notification losses. To prevent "alert fatigue," client-side network issues (e.g., timeouts or offline status) are ignored, and alerts are triggered only when specific thresholds are met, such as 100 failures within 10 minutes.
* **Error**: Error logs represent internal logic failures that bypass defensive coding, such as null object errors, parsing failures, or unreachable code branches. These require immediate notification to the development team to facilitate rapid hotfixes.
* **Fatal**: This level is dedicated to application crashes and unhandled exceptions. When configured at the app's entry point, the system automatically captures these critical failures to provide a comprehensive "crash-free users" metric.

### Creating Effective Dashboards

* **Naming Conventions**: Logs should follow a strict structure, using tags for modules and event names (e.g., `[API] [postLogin] success`). This consistency allows for granular querying and clearer visualization on monitoring dashboards.
* **Data Enrichment**: Using the `extra` field in log events provides vital context for troubleshooting, such as including the specific endpoint, request body, and response status code for a failed transaction.
* **Actionable Metrics**: Effective monitoring focuses on key performance indicators like API error rates and the failure percentage of core business events (login, registration, payment) rather than just raw crash counts.

A robust monitoring strategy shifts the focus from simple crash reporting to comprehensive service health. By standardizing log levels and automating event collection, development teams can distinguish between transient network blips and critical logic errors, ensuring they spend their time fixing high-impact issues.
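
The Warning-level rule above ("100 failures within 10 minutes") is a sliding-window threshold. In practice this kind of rule is usually configured in the monitoring tool rather than written by hand; the sketch below only illustrates the logic in plain Python. The class name and its parameters are hypothetical, and it does not use any Sentry API.

```python
from collections import deque


class FailureThresholdAlert:
    """Fire only when enough failures land inside a rolling time window."""

    def __init__(self, threshold: int = 100, window_seconds: float = 600.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # failure times still inside the window

    def record_failure(self, now: float) -> bool:
        """Record one failure at time `now` (seconds); return True if the alert fires."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Drop failures that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# Small numbers for readability: 3 failures within 10 seconds.
alert = FailureThresholdAlert(threshold=3, window_seconds=10)
fired = [alert.record_failure(t) for t in (0, 4, 8, 30)]
```

Here the third failure trips the alert, while the isolated failure at `t=30` does not, which is the behaviour that keeps sporadic errors from paging anyone.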

google

Solving virtual machine puzzles: How AI is optimizing cloud computing

Google researchers have developed LAVA, a scheduling framework designed to optimize virtual machine (VM) allocation in large-scale data centers by accurately predicting and adapting to VM lifespans. By moving beyond static, one-time predictions toward a "continuous re-prediction" model based on survival analysis, the system significantly improves resource efficiency and reduces fragmentation. This approach allows cloud providers to solve the complex "bin packing" problem more effectively, leading to better capacity utilization and easier system maintenance.

### The Challenge of Long-Tailed VM Distributions

* Cloud workloads exhibit an extreme long-tailed distribution: while 88% of VMs live for less than an hour, these short-lived jobs consume only 2% of total resources.
* The rare VMs that run for 30 days or longer account for a massive fraction of compute resources, meaning their placement has a disproportionate impact on host availability.
* Poor allocation leads to "resource stranding," where a server's remaining capacity is too small or unbalanced to host new VMs, effectively wasting expensive hardware.
* Traditional machine learning models that provide only a single prediction at VM creation are often fragile, as a single misprediction can block a physical host from being cleared for maintenance or new tasks.

### Continuous Re-prediction via Survival Analysis

* Instead of predicting a single average lifetime, LAVA uses an ML model to generate a probability distribution of a VM's expected duration.
* The system employs "continuous re-prediction," asking how much longer a VM is expected to run given how long it has already survived (e.g., a VM that has run for five days is assigned a different remaining lifespan than a brand-new one).
* This adaptive approach allows the scheduling logic to automatically correct for initial mispredictions as more data about the VM's actual behavior becomes available over time.

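
Continuous re-prediction can be illustrated with a conditional expectation: the expected remaining lifetime given that the VM has already survived to a certain age. The sketch below stands in an empirical sample of lifetimes for the learned probability distribution. The sample values are hypothetical, chosen only to mimic a long tail, and this is not LAVA's model.

```python
def expected_remaining(lifetimes, age):
    """Mean remaining lifetime among sampled VMs that lived longer than `age`."""
    survivors = [t for t in lifetimes if t > age]
    if not survivors:
        return 0.0
    return sum(t - age for t in survivors) / len(survivors)


# Hypothetical long-tailed sample, in hours: many short VMs, a few very long ones.
sample = [0.5] * 88 + [24.0] * 8 + [720.0] * 4

at_creation = expected_remaining(sample, 0)        # about 31 hours
after_one_hour = expected_remaining(sample, 1)     # 255 hours
after_five_days = expected_remaining(sample, 120)  # 600 hours
```

The estimate rises as the VM ages because surviving the first hour rules out the short-lived majority. A one-time prediction made at creation would miss that correction.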
### Novel Scheduling and Rescheduling Algorithms

* **Non-Invasive Lifetime Aware Scheduling (NILAS):** Currently deployed on Google’s Borg cluster manager, this algorithm ranks potential hosts by grouping VMs with similar expected exit times to increase the frequency of "empty hosts" available for maintenance.
* **Lifetime-Aware VM Allocation (LAVA):** This algorithm fills resource gaps on hosts containing long-lived VMs with jobs that are at least an order of magnitude shorter. This ensures the short-lived VMs exit quickly without extending the host's overall occupation time.
* **Lifetime-Aware Rescheduling (LARS):** To minimize disruptions during defragmentation, LARS identifies and migrates the longest-lived VMs first while allowing short-lived VMs to finish their tasks naturally on the original host.

By integrating survival-analysis-based predictions into the core logic of data center management, cloud providers can transition from reactive scheduling to a proactive model. This system not only maximizes resource density but also ensures that the physical infrastructure remains flexible enough to handle large, resource-intensive provisioning requests and essential system updates.
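
The host-ranking idea behind NILAS and LAVA (prefer placements that do not extend how long a host stays occupied, and group VMs with similar exit times) can be sketched as a scoring function. The scoring rule, the `pick_host` function, and the host data below are assumptions made for illustration. The production algorithms on Borg also weigh resource fit and other signals not modeled here.

```python
def pick_host(host_exit_times, vm_exit):
    """Choose a host for a VM whose predicted exit time is `vm_exit`.

    host_exit_times maps a host name to the predicted exit times of the VMs
    already placed on it. This is an illustrative heuristic, not Google's algorithm.
    """
    def score(host):
        latest = max(host_exit_times[host], default=0.0)
        extension = max(0.0, vm_exit - latest)  # extra time the host would stay busy
        gap = abs(latest - vm_exit)             # how dissimilar the exit times are
        return (extension, gap, host)           # host name breaks exact ties

    return min(host_exit_times, key=score)


# Hypothetical predicted exit times, in hours from now.
hosts = {"a": [2.0, 3.0], "b": [100.0, 700.0], "c": [650.0]}

short_vm_host = pick_host(hosts, vm_exit=2.5)   # joins the other short-lived VMs
long_vm_host = pick_host(hosts, vm_exit=690.0)  # fits under an existing long-lived VM
```

Placing the 690-hour VM on host `b` adds no occupation time because a 700-hour VM already pins that host, while hosts `a` and `c` would both be held longer.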

google

Using AI to identify genetic variants in tumors with DeepSomatic

DeepSomatic is an AI-powered tool developed by Google Research to identify cancer-related mutations by analyzing a tumor's genetic sequence with higher accuracy than current methods. By leveraging convolutional neural networks (CNNs), the model distinguishes between inherited genetic traits and acquired somatic variants that drive cancer progression. This flexible tool supports multiple sequencing platforms and sample types, offering a critical resource for clinicians and researchers aiming to personalize cancer treatment through precision medicine.

## Challenges in Somatic Variant Detection

* Somatic variants are genetic mutations acquired after birth through environmental exposure or DNA replication errors, making them distinct from the germline variants found in every cell of a person's body.
* Detecting these mutations is technically difficult because tumor samples are often heterogeneous, containing a diverse set of variants at varying frequencies.
* Sequencing technologies often introduce small errors that can be difficult to distinguish from actual somatic mutations, especially when the mutation is only present in a small fraction of the sampled cells.

## CNN-Based Variant Calling Architecture

* DeepSomatic employs a method pioneered by DeepVariant, which involves transforming raw genetic sequencing data into a set of multi-channel images.
* These images represent various data points, including alignment along the chromosome, the quality of the sequence output, and other technical variables.
* The convolutional neural network processes these images to differentiate between three categories: the human reference genome, non-cancerous germline variants, and the somatic mutations driving tumor growth.
* By analyzing tumor and non-cancerous cells side-by-side, the model effectively filters out sequencing artifacts that might otherwise be misidentified as mutations.

## System Versatility and Application

* The model is designed to function in multiple modes, including "tumor-normal" (comparing a biopsy to a healthy sample) and "tumor-only" mode, which is vital for blood cancers like leukemia where isolating healthy cells is difficult.
* DeepSomatic is platform-agnostic, meaning it can process data from all major sequencing technologies and adapt to different types of sample processing.
* The tool has demonstrated the ability to generalize its learning to various cancer types, even those not specifically included in its initial training sets.

## Open-Source Contributions to Precision Medicine

* Google has made the DeepSomatic tool and the CASTLE dataset, a high-quality training and evaluation set, openly available to the global research community.
* This initiative is part of a broader effort to use AI for early detection and advanced research in various cancers, including breast, lung, and gynecological cancers.
* The release aims to accelerate the development of personalized treatment plans by providing a more reliable way to identify the specific genetic drivers of an individual's disease.

By providing a more accurate and adaptable method for variant calling, DeepSomatic helps researchers pinpoint the specific drivers of a patient's cancer. This tool represents a significant advancement in deep learning for genomics, potentially shortening the path from biopsy to targeted therapeutic intervention.
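
To make the three output categories and the tumor-normal comparison concrete, here is a deliberately naive rule-based baseline. It is not how DeepSomatic works: the tool uses a CNN over multi-channel images, whereas this sketch only thresholds the fraction of reads supporting a variant in each sample. The `classify_site` function and the 5% cutoff are hypothetical.

```python
def classify_site(tumor_alt, tumor_depth, normal_alt, normal_depth,
                  min_fraction=0.05):
    """Label one genomic position as 'reference', 'germline', or 'somatic'.

    *_alt is the number of reads supporting the variant and *_depth the total
    reads at the position. min_fraction is an illustrative cutoff only.
    """
    tumor_vaf = tumor_alt / tumor_depth if tumor_depth else 0.0
    normal_vaf = normal_alt / normal_depth if normal_depth else 0.0
    if normal_vaf >= min_fraction:
        return "germline"   # also present in healthy tissue, so inherited
    if tumor_vaf >= min_fraction:
        return "somatic"    # present only in the tumor sample
    return "reference"      # too few supporting reads; likely sequencing noise


tumor_only_variant = classify_site(12, 100, 0, 100)
inherited_variant = classify_site(50, 100, 48, 100)
likely_artifact = classify_site(1, 100, 0, 100)
```

A fixed cutoff like this is exactly what struggles with the cases the summary describes, such as real mutations present in only a small fraction of cells, which is the motivation for a learned model.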

google

Coral NPU: A full-stack platform for Edge AI

Coral NPU is a new full-stack, open-source platform designed to bring advanced AI directly to power-constrained edge devices and wearables. By prioritizing a matrix-first hardware architecture and a unified software stack, Google aims to overcome traditional bottlenecks in performance, ecosystem fragmentation, and data privacy. The platform enables always-on, low-power ambient sensing while providing developers with a flexible, RISC-V-based environment for deploying modern machine learning models.

## Overcoming Edge AI Constraints

* The platform addresses the "performance gap" where complex ML models typically exceed the power, thermal, and memory budgets of battery-operated devices.
* It eliminates the "fragmentation tax" by providing a unified architecture, moving away from proprietary processors that require costly, device-specific optimizations.
* On-device processing ensures a high standard of privacy and security by keeping personal context and data off the cloud.

## AI-First Hardware Architecture

* Unlike traditional chips, this architecture prioritizes the ML matrix engine over scalar compute to optimize for efficient on-device inference.
* The design is built on RISC-V ISA compliant architectural IP blocks, offering an open and extensible reference for system-on-chip (SoC) designers.
* The base design delivers performance in the 512 giga operations per second (GOPS) range while consuming only a few milliwatts of power.
* The architecture is tailored for "always-on" use cases, making it ideal for hearables, AR glasses, and smartwatches.

## Core Architectural Components

* **Scalar Core:** A lightweight, C-programmable RISC-V frontend that manages data flow using an ultra-low-power "run-to-completion" model.
* **Vector Execution Unit:** A SIMD co-processor compliant with the RISC-V Vector instruction set (RVV) v1.0 for simultaneous operations on large datasets.
* **Matrix Execution Unit:** A specialized engine using quantized outer product multiply-accumulate (MAC) operations to accelerate fundamental neural network tasks.

## Unified Developer Ecosystem

* The platform is a C-programmable target that integrates with modern compilers such as IREE and TFLM (TensorFlow Lite Micro).
* It supports a wide range of popular ML frameworks, including TensorFlow, JAX, and PyTorch.
* The software toolchain utilizes MLIR and the StableHLO dialect to facilitate the transition from high-level models to hardware-executable code.
* Developers have access to a complete suite of tools, including a simulator, custom kernels, and a general-purpose MLIR compiler.

SoC designers and ML developers looking to build the next generation of wearables should leverage the Coral NPU reference architecture to balance high-performance AI with extreme power efficiency. By utilizing the open-source documentation and RISC-V-based tools, teams can significantly reduce the complexity of deploying private, always-on ambient sensing.
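
The "outer product multiply-accumulate" operation named for the Matrix Execution Unit is a standard way to organize matrix multiplication: each step takes one column of the left matrix and one row of the right matrix and adds their outer product into a grid of accumulators. The pure-Python sketch below shows the arithmetic only. It says nothing about Coral NPU's tile sizes, data layout, or instruction set, and the function name is hypothetical.

```python
def matmul_outer_product(a, b):
    """Multiply a (m x k) by b (k x n) by accumulating k outer products.

    Inputs stand in for quantized int8 values; the accumulators are wider
    (hardware typically uses int32) so the partial sums do not overflow.
    """
    m, k, n = len(a), len(b), len(b[0])
    acc = [[0] * n for _ in range(m)]
    for step in range(k):
        column = [a[i][step] for i in range(m)]  # one column of a
        row = b[step]                            # one row of b
        for i in range(m):
            for j in range(n):
                acc[i][j] += column[i] * row[j]  # one MAC per accumulator cell
    return acc


product = matmul_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Every accumulator cell is updated in each step from just one column and one row, which is what lets a hardware engine perform all of those multiply-accumulates in parallel.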

discord

Staff Picks, September 2025: Welcome to Our Video Game Museum

This blog post celebrates National Video Games Day by reflecting on the cultural and historical significance of the gaming industry. By framing the discussion around a hypothetical museum of influential titles, the post seeks to identify the specific games that have left the most lasting impact on players and creators alike.

### Commemorating National Video Games Day

* The post acknowledges September 12th as a day to honor gaming culture and the legacy of titles released over the years.
* It encourages readers to use the occasion as an excuse to engage with their favorite games and spread appreciation for the medium.

### Identifying Historically Significant Titles

* The authors utilize a "museum" concept, referencing the character Blathers from the *Animal Crossing* series, to discuss game preservation and importance.
* The central inquiry focuses on identifying "prized games" that deserve to be showcased behind glass cases due to their industry-wide influence.

### Team Perspectives on Industry Impact

* The post features insights from four specific team members: Veronica, Scott, Tyler, and Anni.
* Each contributor provides a personal selection for the game they believe has had the most significant impact on their lives or the industry at large.

Whether through a museum exhibit or personal play, reflecting on the history of gaming helps highlight the titles that defined the medium. Readers are encouraged to consider which games they would personally archive as the most influential "prized pieces" of digital history.

google

XR Blocks: Accelerating AI + XR innovation

XR Blocks is an open-source, cross-platform framework designed to bridge the technical gap between mature AI development ecosystems and high-friction extended reality (XR) prototyping. By providing a modular architecture and high-level abstractions, the toolkit enables creators to rapidly build and deploy intelligent, immersive web applications without managing low-level system integration. Ultimately, the framework empowers developers to move from concept to interactive prototype across both desktop simulators and mobile XR devices using a unified codebase.

### Core Design Principles

* **Simplicity and Readability:** Drawing inspiration from the "Zen of Python," the framework prioritizes human-readable abstractions where a developer’s script reflects a high-level description of the experience rather than complex boilerplate code.
* **Creator-Centric Workflow:** The architecture is designed to handle the "plumbing" of XR (such as sensor fusion, AI model integration, and cross-platform logic), allowing creators to focus entirely on user interaction and experience.
* **Pragmatic Modularity:** Rather than attempting to be a perfect, all-encompassing system, XR Blocks favors an adaptable and simple architecture that can evolve alongside the rapidly changing fields of AI and spatial computing.

### The Reality Model Abstractions

* **The Script Primitive:** Acts as the logical center of an application, separating the "what" of an interaction from the "how" of its underlying technical implementation.
* **User and World:** Provides built-in support for tracking hands, gaze, and avatars while allowing the system to query the physical environment for depth, estimated lighting conditions, and object recognition.
* **AI and Agents:** Facilitates the integration of intelligent assistants, such as the "Sensible Agent," which can provide proactive, context-aware suggestions within the XR environment.
* **Virtual Interfaces:** Offers tools to augment blended reality with virtual UI elements that respond to the user's physical context.

### Technical Implementation and Integration

* **Web-Based Foundation:** The framework is built upon accessible, standard technologies including WebXR, three.js, and LiteRT (formerly TFLite) to ensure a low barrier to entry for web developers.
* **Advanced AI Support:** It features native integration with Gemini for high-level reasoning and context-aware applications.
* **Cross-Platform Deployment:** Developers can prototype depth-aware, physics-based interactions in a desktop simulator and deploy the exact same code to Android XR devices.
* **Open-Source Resources:** The project includes a comprehensive suite of templates and live demos covering specific use cases like depth mapping, gesture modeling, and lighting estimation.

By lowering the barrier to entry for intelligent XR development, XR Blocks serves as a practical starting point for researchers and developers aiming to explore the next generation of human-centered computing. Interested creators can access the source code on GitHub to begin building immersive, AI-driven applications that function seamlessly across the web and specialized XR hardware.

discord

Discord Patch Notes: October 7, 2025

Discord’s "Patch Notes" series provides a transparent look into the engineering team's ongoing efforts to improve platform performance, reliability, and responsiveness. By focusing on bug-squishing and usability enhancements, the series outlines the specific technical changes implemented to maintain a high-quality user experience across all supported devices.

### Community-Driven Bug Discovery

* Discord utilizes the community-run r/DiscordApp subreddit as a primary channel for identifying technical issues.
* Users are encouraged to post in the Bimonthly Bug Megathread, which is actively monitored by the engineering team to track and resolve persistent user concerns.
* This direct feedback loop allows developers to prioritize fixes that have the most significant impact on the general user base.

### Early Access via iOS TestFlight

* For users interested in experimental features, Discord offers an early-access program through Apple’s TestFlight platform.
* This beta version allows iOS users to test new updates before they reach the general public, serving as a final stage for identifying "pesky bugs" in a live environment.
* Participation in this program provides the engineering team with critical data on feature stability and performance on mobile hardware.

### Commit and Deployment Status

* All listed fixes in the series have already been committed and merged into Discord's primary codebase.
* Because the deployment process is staged, these updates may roll out to individual platforms and regions at slightly different times even after the notes are published.

To ensure the most stable experience and gain access to the latest performance improvements, users should keep their applications updated and consider joining the TestFlight program to help refine upcoming features.

discord

Discord Update: September 25, 2025 Changelog

Discord’s September 2025 update focuses on enhancing user expression and scaling server infrastructure to unprecedented levels. By introducing massive server capacity increases and highly customizable interface features, the platform aims to better support its largest communities and most active power users. Ultimately, these changes provide a more dynamic social experience through improved profile visibility, expanded pin limits, and flexible multitasking tools.

### Enhanced User Profiles and Multitasking

- Desktop profiles now feature a refreshed layout designed to showcase a user's current activities and history more clearly.
- Multiple concurrent activities, such as playing a game while listening to music in a voice channel, are now displayed as a "stack of cards" on the profile.
- Activities can be moved into a pop-out floating window, allowing users to participate in shared experiences like "Watch Together" while navigating other servers or DMs.
- A new audio cue now plays whenever a user turns their camera on to provide immediate feedback that their video stream is live.

### Massive Scaling and Embed Improvements

- The default server member cap has been increased to 25 million, supported by engineering optimizations to member list loading speeds for "super-super-large" communities.
- The channel pin limit has been expanded fivefold, moving from a 50-message cap to 250 messages per channel.
- Native support for AV1 video attachments and embeds was integrated to improve video quality and loading performance.
- Tumblr link embeds have been overhauled to include detailed descriptions and metadata for hashtags used in the original post.

### Custom Themes and Aesthetic Upgrades

- Nitro users can now create custom gradient themes using up to five different colors, a feature that synchronizes across both desktop and mobile clients.
- Two new Server Tag badge packs (the Pet pack and the Flex pack) introduce new iconography for server roles, including animal icons and royalty-themed badges.
- Visual updates were made to Group DM icons, which the development team refers to as "facepiles," to better represent groups of friends in the chat list.

Users should explore the new custom gradient settings in their Nitro preferences to personalize their workspace and take advantage of the expanded pin limits to better manage information in high-traffic channels.