google

How we are building the personal health coach

Google is leveraging Gemini models to create a proactive, adaptive personal health coach designed to bridge the gap between fragmented health data and actionable wellness guidance. By integrating physiological metrics with behavioral science, the system provides tailored insights and sustainable habit-building plans through a multi-agent AI architecture. This initiative, currently in public preview for Fitbit Premium users, represents a transition toward data-driven, expert-validated health coaching that evolves with an individual's progress.

## Architecting a Multi-Agent Health Coach

The system uses a multi-agent framework to coordinate specialized AI sub-agents, so that health recommendations are holistic and contextually aware.

* **Conversational Agent:** Manages multi-turn interactions, understands user intent, and orchestrates the other agents while gathering the context needed for response generation.
* **Data Science Agent:** Employs code-generation capabilities to iteratively fetch, analyze, and summarize physiological time-series data, such as sleep patterns and workout intensity.
* **Domain Expert Agent:** Analyzes user data through the lens of specific fields like fitness or nutrition to generate and adapt personalized plans based on changing user context.
* **Numerical Reasoning:** The coach reasons over health metrics, comparing current data against personal baselines and population-level statistics using capabilities derived from PH-LLM research.

## Ensuring Reliability via the SHARP Framework

To move beyond general-purpose AI capabilities, the system is grounded in established coaching frameworks and subjected to rigorous technical and clinical validation.

* **SHARP Evaluation:** The model is continuously assessed across five dimensions: Safety, Helpfulness, Accuracy, Relevance, and Personalization.
* **Human-in-the-Loop Validation:** The development process involved over 1 million human annotations and 100,000 hours of evaluation by specialists in fields such as cardiology, endocrinology, and behavioral science.
* **Expert Oversight:** Google convened a Consumer Health Advisory Panel and collaborated with professional fitness coaches to ensure the AI's recommendations align with real-world professional standards.
* **Scientific Grounding:** The coach uses novel methods to foster consensus in nuanced health areas, and scaled "autoraters" help keep wellness recommendations scientifically accurate.

Eligible Fitbit Premium users on Android in the US can now opt into the public preview to provide feedback on these personalized insights. As the tool evolves through iterative design and user research, it aims to provide a seamless connection between raw health metrics and sustainable lifestyle changes.
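
To make the division of labor between the agents more concrete, here is a minimal sketch of how a conversational agent might route a request through a data-science step and a domain-expert step. It is not Google's implementation: the function names, the keyword-based intent check, the z-score comparison against a personal baseline, and the threshold of -1.0 are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def data_science_agent(sleep_hours):
    """Summarize a time series: latest night vs. the personal baseline."""
    baseline, latest = sleep_hours[:-1], sleep_hours[-1]
    mu, sigma = mean(baseline), stdev(baseline)
    # z-score of the latest value relative to earlier nights (assumed metric).
    z = (latest - mu) / sigma if sigma > 0 else 0.0
    return {"latest": latest, "baseline_mean": mu, "z_score": z}


def domain_expert_agent(summary):
    """Turn the numeric summary into guidance (threshold is hypothetical)."""
    if summary["z_score"] < -1.0:
        return "Sleep is well below your usual range; consider an earlier wind-down."
    return "Sleep is in line with your usual range; keep your current routine."


def conversational_agent(user_message, sleep_hours):
    """Detect intent with a naive keyword check and orchestrate the other agents."""
    if "sleep" in user_message.lower():
        return domain_expert_agent(data_science_agent(sleep_hours))
    return "Could you tell me which area you'd like to focus on?"


reply = conversational_agent("How was my sleep?", [7.5, 7.0, 7.2, 7.4, 5.0])
print(reply)
```

In a real system each function would presumably be a model call rather than a rule, but the control flow (intent, data summary, expert interpretation) is the part this sketch is meant to show.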

netflix

Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning | by Netflix Technology Blog | Netflix TechBlog

Netflix is evolving its recommendation systems by moving beyond simple behavior imitation toward generative recommenders that better align with true user preferences. While generative models like HSTU and OneRec effectively capture sequential user patterns, they often struggle to distinguish between habitual clicks and genuine satisfaction. To bridge this gap, Netflix developed Advantage-Weighted Supervised Fine-tuning (A-SFT), a post-training method that leverages noisy reward signals to refine model performance without the need for complex counterfactual data.

### The Shift to Generative Recommenders

* Modern generative recommenders (GRs), such as HSTU and OneRec, use transformer architectures to treat recommendation as a sequential transduction task.
* The models are typically trained using next-item prediction, where the system learns to imitate the chronological sequence of a user’s activities.
* A significant drawback of this "behavior cloning" approach is that it captures external trends and noise rather than long-term user satisfaction, potentially recommending content the user finished but did not actually enjoy.

### Barriers to Reinforcement Learning in RecSys

* Traditional post-training methods used in Large Language Models, such as Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO), require counterfactual feedback that is difficult to obtain in recommendation contexts.
* Because user sequences span weeks or years, it is impractical to generate and test hypothetical, counterfactual experiences for real-time user validation.
* Reward signals in recommendation systems are inherently noisy; for instance, high watch time might indicate interest, but it can also be a result of external circumstances, making it an unreliable metric for optimization.

### Advantage-Weighted Supervised Fine-tuning (A-SFT)

* A-SFT is a hybrid approach that sits between offline reinforcement learning and standard supervised fine-tuning.
* The algorithm incorporates an advantage function to weight training examples, allowing the model to prioritize actions that lead to higher rewards while filtering out noise from the reward model.
* This method is specifically designed to handle high-variance reward signals, using them as directional guides rather than absolute truth, which prevents the model from over-exploiting inaccurate data.
* Benchmarks against other representative methods show that A-SFT achieves superior alignment between the generative recommendation policy and the underlying reward model.

For organizations managing large-scale recommendation engines, A-SFT offers a practical path to implementing post-training improvements. By focusing on advantage-weighted signals, developers can improve recommendation quality using existing implicit feedback, such as watch time and clicks, without the infrastructure hurdles of online reinforcement learning.
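
The idea of weighting training examples by an advantage can be sketched as a weighted negative log-likelihood. The summary above does not give Netflix's exact formula, so the exponential weighting below (in the style of advantage-weighted regression), the temperature `beta`, the clipping value `max_weight`, and the use of a single scalar `baseline` are assumptions for illustration only.

```python
import math


def advantage_weights(rewards, baseline, beta=1.0, max_weight=5.0):
    """Map noisy rewards to example weights.

    advantage = reward - baseline; weight = exp(advantage / beta), clipped so
    that a single over-estimated reward cannot dominate the batch.
    """
    return [min(math.exp((r - baseline) / beta), max_weight) for r in rewards]


def weighted_nll(log_probs, weights):
    """Supervised next-item loss where each example counts by its weight."""
    total = sum(w * -lp for w, lp in zip(weights, log_probs))
    return total / sum(weights)


# Two logged interactions: the first has the higher estimated reward.
weights = advantage_weights(rewards=[1.0, 0.0], baseline=0.5)
loss = weighted_nll(log_probs=[-1.0, -2.0], weights=weights)
print(weights, loss)
```

With all weights equal this reduces to ordinary supervised fine-tuning; a larger `beta` flattens the weights, which is one way to treat the reward as a directional guide rather than as ground truth.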

google

Google Earth AI: Unlocking geospatial insights with foundation models and cross-modal reasoning

Google Earth AI introduces a framework of geospatial foundation models and reasoning agents designed to solve complex, planetary-scale challenges through cross-modal reasoning. By integrating Gemini-powered orchestrators with specialized imagery, population, and environmental models, the system deconstructs multifaceted queries into actionable multi-step plans. This approach enables a holistic understanding of real-world events, such as disaster response and disease forecasting, by grounding AI insights in diverse geospatial data.

## Geospatial Reasoning Agents

* Gemini models act as orchestrators that manage complex queries requiring data from multiple domains.
* The agent deconstructs a high-level question, such as predicting hurricane landfalls and community vulnerability, into a sequence of smaller, executable tasks.
* It executes these plans by autonomously calling specialized foundation models, querying vast datastores, and using geospatial tools to fuse disparate data points into a single, cohesive answer.

## Remote Sensing and Imagery Foundations

* The imagery models combine vision-language models and open-vocabulary object detection, trained on a large corpus of high-resolution overhead imagery paired with text descriptions.
* This enables "zero-shot" capabilities: users can find specific objects like "flooded roads" or "building damage" using natural language, without retraining the model for specific classes.
* Technical evaluations show a 16% average improvement on text-based image search tasks and more than double the baseline accuracy for detecting novel objects in a zero-shot setting.

## Population Dynamics and Mobility

* The population models focus on the interplay between people and places, using globally consistent embeddings across 17 countries.
* They include monthly updated embeddings that capture shifting human activity patterns, which are essential for time-sensitive forecasting.
* Research conducted with the University of Oxford showed that incorporating these population embeddings into a Dengue fever forecasting model in Brazil improved the R² metric from 0.456 to 0.656 for long-range 12-month predictions.

## Environmental and Disaster Forecasting

* The environmental models integrate established Google research into weather nowcasting, flood forecasting, and wildfire boundary mapping.
* They provide the reasoning agent with the data necessary to evaluate environmental risks alongside population density and infrastructure imagery.
* The aim is to provide Search and Maps users with real-time, accurate alerts regarding natural disasters, grounded in planetary-scale environmental data.

Developers and enterprises looking to solve high-level geospatial problems can now express interest in accessing these capabilities through Google Earth and Google Cloud. By leveraging these foundation models, organizations can automate the analysis of satellite imagery and human mobility data to better prepare for environmental and social challenges.
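
For readers less familiar with the R² figures quoted for the Dengue forecast, the following sketch shows how the coefficient of determination is commonly computed from observed and predicted values. The function name and the sample numbers are illustrative assumptions; the study's actual data and evaluation code are not reproduced here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up monthly case counts, purely to exercise the function.
observed = [1, 2, 3]
print(r_squared(observed, [1, 2, 3]))  # perfect forecast
print(r_squared(observed, [2, 2, 2]))  # always predicting the mean
```

A value of 1 means the forecast explains all of the variance in the observations, while 0 means it does no better than predicting the mean, so a move from 0.456 to 0.656 corresponds to a substantially larger share of variance explained.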

google

A verifiable quantum advantage

Google Quantum AI researchers have introduced "Quantum Echoes," a new algorithm designed to measure Out-of-Time-Order Correlators (OTOCs) to characterize quantum chaos. By demonstrating this task on 103 qubits of the Willow chip, the team has achieved a verifiable quantum advantage that surpasses the limitations of previous random circuit sampling techniques. This work establishes a direct path toward solving practical problems in physics and chemistry, such as Hamiltonian learning, through the use of stable and reproducible quantum expectation values.

## Limitations of Random Circuit Sampling

* While the 2019 "quantum supremacy" milestone proved quantum computers could outperform classical ones, the bitstring sampling method used was difficult to verify and lacked practical utility.
* In large-scale quantum systems, specific bitstrings rarely repeat, which restricts the ability to extract useful, actionable information from the computation.
* The Quantum Echoes approach shifts focus to quantum expectation values, such as magnetization, density, and velocity, which remain consistent across different quantum computers and are computationally verifiable.

## The Quantum Echoes Algorithm and OTOCs

* The algorithm measures OTOCs, which represent the state of a single qubit after a series of "forward" ($U$) and "backward" ($U^\dagger$) evolutions.
* In the experiment, 103 qubits on the Willow processor underwent evolution through random quantum circuits to reach a highly chaotic state.
* A perturbation (gate $B$) is applied between the forward and backward evolutions; if the system is chaotic, this small change triggers a "butterfly effect," resulting in a final state significantly different from the initial one.
* Higher-order OTOCs involve multiple "round trips" of these evolutions, increasing the system's sensitivity to the perturbation and allowing for a more detailed characterization of the quantum dynamics.

## Many-Body Interference and Signal Amplification

* The researchers discovered that higher-order OTOCs function like many-body interferometers, where the quantum states of many particles interfere with one another.
* The perturbation gates ($B$ and $M$) act as mirrors; when a resonance condition is met (where $U^\dagger$ is the exact inverse of $U$), constructive interference occurs.
* This constructive interference amplifies specific quantum correlations, allowing the OTOC signal magnitude to scale as a negative power of the system size, rather than the exponential decay typically seen in chaotic systems.
* This amplification makes the OTOC a sensitive instrument for identifying the specific correlations generated between two different qubits during the evolution of the circuit.

## Practical Applications and Future Research

The success of the Quantum Echoes algorithm on the Willow chip marks a transition toward using quantum computers for tasks that are both beyond-classical and physically relevant. This method is particularly well-suited for Hamiltonian learning in Nuclear Magnetic Resonance (NMR) and studying the flow of electrons in high-temperature superconductors. Moving forward, the ability to measure verifiable expectation values in the chaotic regime will be essential for researchers looking to simulate complex quantum materials that are impossible to model on classical hardware.
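
The forward, perturb, backward, measure structure of an OTOC can be illustrated on a two-qubit toy system. The sketch below assumes the standard textbook form for Hermitian operators, $\langle B(t)\, M\, B(t)\, M \rangle$ with $B(t) = U^\dagger B U$, evaluated on the state $|00\rangle$. The choice of $B = X$ on the first qubit, $M = Z$ on the second, and a SWAP gate standing in for the scrambling circuit are assumptions for illustration; this is not the random-circuit experiment run on Willow.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]


def otoc(u, b, m):
    """<00| B(t) M B(t) M |00> with B(t) = U^dagger B U (B, M Hermitian)."""
    b_t = matmul(dagger(u), matmul(b, u))
    op = matmul(matmul(b_t, m), matmul(b_t, m))
    return op[0][0]  # expectation value in the state |00>


B = kron(X, I2)  # perturbation on qubit 0
M = kron(I2, Z)  # measured operator on qubit 1

print(otoc(kron(I2, I2), B, M))  # no evolution: B never reaches qubit 1
print(otoc(SWAP, B, M))          # SWAP carries B onto qubit 1
```

When the evolution leaves the perturbation away from the measured qubit, the two operators commute and the correlator stays at 1; once the evolution spreads the perturbation onto the measured qubit, the value changes, which is the toy analogue of the "butterfly effect" described above.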

netflix

Behind the Streams: Real-Time Recommendations for Live Events Part 3 | by Netflix Technology Blog | Netflix TechBlog

Netflix manages the massive surge of concurrent users during live events by using a hybrid strategy of prefetching and real-time broadcasting to deliver synchronized recommendations. By decoupling data delivery from the live trigger, the system avoids the "thundering herd" effect that would otherwise overwhelm cloud infrastructure during record-breaking broadcasts. This architecture ensures that millions of global devices receive timely updates and visual cues without requiring linear, inefficient scaling of compute resources.

### The Constraint Optimization Problem

To maintain a seamless experience, Netflix engineers balance three primary technical constraints: time to update, request throughput, and compute cardinality.

* **Time:** The specific duration required to coordinate and push a recommendation update to the entire global fleet.
* **Throughput:** The maximum capacity of cloud services to handle incoming requests without service degradation.
* **Cardinality:** The variety and complexity of unique requests necessary to serve personalized updates to different user segments.

### Two-Phase Recommendation Delivery

The system splits the delivery process into two distinct stages to smooth out traffic spikes and ensure high availability.

* **Prefetching Phase:** While members browse the app normally before an event, the system downloads materialized recommendations, metadata, and artwork into the device's local cache.
* **Broadcasting Phase:** When the event begins, a low-cardinality "at least once" message is broadcast to all connected devices, triggering them to display the already-cached content instantaneously.
* **Traffic Smoothing:** This approach eliminates the need for massive, real-time data fetches at the moment of kickoff, distributing the heavy lifting of data transfer over a longer period.

### Live State Management and UI Synchronization

A dedicated Live State Management (LSM) system tracks event schedules in real time to ensure the user interface stays in sync with the production.

* **Dynamic Adjustments:** If a live event is delayed or ends early, the LSM adjusts the broadcast triggers to preserve accuracy and prevent "spoilers" or dead links.
* **Visual Cues:** The UI uses "Live" badging and dynamic artwork transitions to signal urgency and guide users toward the stream.
* **Frictionless Playback:** For members already on a title’s detail page, the system can trigger an automatic transition into the live player the moment the broadcast begins, reducing navigation latency.

To support global-scale live events, technical teams should prioritize edge-heavy strategies that pre-position assets on client devices. By shifting from a reactive request-response model to a proactive prefetch-and-trigger model, platforms can maintain high performance and reliability even during the most significant traffic peaks.
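
The prefetch-and-trigger pattern described above can be sketched with a small in-memory model. The `Device` class, the message and event identifiers, and the de-duplication by message ID are assumptions made for illustration; they are not Netflix's actual client or messaging API.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A client that caches recommendations ahead of a live event."""
    cache: dict = field(default_factory=dict)
    shown: list = field(default_factory=list)
    seen_messages: set = field(default_factory=set)

    def prefetch(self, event_id, payload):
        # Phase 1: store the personalized payload during normal browsing.
        self.cache[event_id] = payload

    def on_broadcast(self, message_id, event_id):
        # Phase 2: at-least-once delivery can repeat a message, so the
        # handler is idempotent and ignores IDs it has already processed.
        if message_id in self.seen_messages:
            return
        self.seen_messages.add(message_id)
        payload = self.cache.get(event_id)
        if payload is not None:
            self.shown.append(payload)


def broadcast(devices, message_id, event_id):
    # The same small message goes to every device (low cardinality);
    # the personalization already lives in each device's cache.
    for device in devices:
        device.on_broadcast(message_id, event_id)


a, b = Device(), Device()
a.prefetch("live-event", "row for member A")
b.prefetch("live-event", "row for member B")
broadcast([a, b], "msg-1", "live-event")
broadcast([a, b], "msg-1", "live-event")  # duplicate delivery is harmless
print(a.shown, b.shown)
```

The point of the design is that the kickoff message carries no personalized data, so its cost does not grow with the number of distinct recommendation payloads; the expensive, high-cardinality work happened earlier and was spread over time.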

google

Teaching Gemini to spot exploding stars with just a few examples

Researchers have demonstrated that Google’s Gemini model can classify cosmic events with 93% accuracy, rivaling specialized machine learning models while providing human-readable explanations. By using few-shot learning with only 15 examples per survey, the model addresses the "black box" limitation of traditional convolutional neural networks used in astronomy. This approach enables scientists to efficiently process the millions of alerts generated by modern telescopes while maintaining a transparent and interactive reasoning process.

## Bottlenecks in Modern Transient Astronomy

* Telescopes like the Vera C. Rubin Observatory are expected to generate up to 10 million alerts per night, making manual verification impossible.
* The vast majority of these alerts are "bogus" signals caused by satellite trails, cosmic rays, or instrumental artifacts rather than real supernovae.
* Existing specialized models often provide binary "real" or "bogus" labels without context, forcing astronomers to either blindly trust the output or spend hours on manual verification.

## Multimodal Few-Shot Learning for Classification

* The research used few-shot learning, providing Gemini with only 15 annotated examples for each of three major surveys: Pan-STARRS, MeerLICHT, and ATLAS.
* Input data consisted of image triplets (a "new" alert image, a "reference" image of the same sky patch, and a "difference" image), each 100x100 pixels in size.
* The model successfully generalized across different telescopes with varying pixel scales, ranging from 0.25" per pixel for Pan-STARRS to 1.8" per pixel for ATLAS.
* Beyond simple labels, Gemini generates a textual description of observed features and an interest score to help astronomers prioritize follow-up observations.

## Expert Validation and Self-Assessment

* A panel of 12 professional astronomers evaluated the model using a 0–5 coherence rubric, confirming that Gemini’s logic aligned with expert reasoning.
* The study found that Gemini can effectively assess its own uncertainty; low self-assigned "coherence scores" were strong indicators of likely classification errors.
* This ability to flag its own potential mistakes allows the model to act as a reliable partner, alerting scientists when a specific case requires human intervention.

The transition from "black box" classifiers to interpretable AI assistants allows the astronomical community to scale with the data flood of next-generation telescopes. By combining high-accuracy classification with transparent reasoning, researchers can maintain scientific rigor while processing millions of cosmic events in real time.
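
The pixel scales quoted above imply that the same 100x100 cutout covers very different patches of sky depending on the survey, which is part of why generalizing across telescopes is non-trivial. The short calculation below works this out; the function name is hypothetical, and only the two pixel scales stated in the summary are used.

```python
def stamp_width_arcsec(pixels, arcsec_per_pixel):
    """Angular width of a square image cutout, in arcseconds."""
    return pixels * arcsec_per_pixel


STAMP_PIXELS = 100  # cutout size stated in the summary

pan_starrs = stamp_width_arcsec(STAMP_PIXELS, 0.25)  # 0.25" per pixel
atlas = stamp_width_arcsec(STAMP_PIXELS, 1.8)        # 1.8" per pixel

print(f"Pan-STARRS cutout: {pan_starrs:.0f} arcsec across")
print(f"ATLAS cutout: {atlas:.0f} arcsec across")
print(f"Ratio of widths: {atlas / pan_starrs:.1f}x")
```

An ATLAS cutout therefore spans roughly seven times the width of a Pan-STARRS cutout, so a star or artifact of a given angular size occupies far fewer pixels in the ATLAS images.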

google

A picture's worth a thousand (private) words: Hierarchical generation of coherent synthetic photo albums

Researchers at Google have developed a hierarchical method for generating differentially private (DP) synthetic photo albums, providing a way to share representative datasets while protecting sensitive individual information. By using an intermediate text representation and a two-stage generation process, the approach maintains thematic coherence across multiple images in an album, a significant challenge for traditional synthetic data methods. This framework allows organizations to apply standard, non-private analytical techniques to safe synthetic substitutes rather than modifying every individual analysis method for differential privacy.

## The Hierarchical Generation Process

* The workflow begins by converting original photo albums into structured text; an AI model generates detailed captions for each image and a summary for the entire album.
* Two large language models (LLMs) are privately fine-tuned using DP-SGD: the first is trained to produce album summaries, and the second generates individual photo captions based on those summaries.
* Synthetic data is then produced hierarchically, where the model first generates a global album summary to serve as context, followed by a series of individual photo captions that remain consistent with that context.
* The final step uses a text-to-image AI model to transform the private, synthetic text captions back into a set of coherent images.

## Benefits of Intermediate Text Representations

* Text summarization is inherently privacy-enhancing because it is a "lossy" operation, meaning the text description is unlikely to capture the exact unique details of an original photo.
* Using text as a midpoint allows for more efficient resource management, as generated albums can be filtered and curated at the text level before undergoing the computationally expensive process of image generation.
* The hierarchical approach ensures that photos within a synthetic album share the same characters and themes, as every caption in a set is derived from the same contextual summary.
* Training two separate models with shorter context windows is significantly more efficient than training one large model, because the computational cost of self-attention scales quadratically with the length of the context.

This hierarchical, text-mediated approach demonstrates that high-level semantic information and thematic coherence can be preserved in synthetic datasets without sacrificing individual privacy. Organizations should consider this workflow, which translates complex multi-modal data into structured text before synthesis, to scale differentially private data generation for advanced modeling and analysis.
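
The efficiency argument about quadratic self-attention cost can be made concrete with a back-of-the-envelope comparison. The sketch below counts cost as context length squared and compares one model that sees a whole album at once with the two-model setup, where one context holds the summary and each caption context holds the summary plus a single caption. The token counts and the assumption that each caption is processed with only the summary as context are illustrative, not figures from the research.

```python
def attention_cost(context_tokens):
    """Relative self-attention cost, proportional to context length squared."""
    return context_tokens ** 2


def single_model_cost(summary_tokens, caption_tokens, photos):
    # One long sequence: the summary followed by every caption.
    return attention_cost(summary_tokens + photos * caption_tokens)


def hierarchical_cost(summary_tokens, caption_tokens, photos):
    # One short summary sequence, plus one (summary + caption) sequence per photo.
    summary_cost = attention_cost(summary_tokens)
    caption_cost = photos * attention_cost(summary_tokens + caption_tokens)
    return summary_cost + caption_cost


# Hypothetical album: 100-token summary, 20 photos, 50 tokens per caption.
single = single_model_cost(100, 50, 20)
hierarchical = hierarchical_cost(100, 50, 20)
print(single, hierarchical, round(single / hierarchical, 2))
```

Under these made-up numbers the single long context costs more than twice as much as the hierarchical split, and the gap widens as albums get longer because only the single-model term grows quadratically with the number of photos.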