google

Accelerating the magic cycle of research breakthroughs and real-world applications

Google Research is accelerating a "magic cycle" where breakthrough scientific discoveries and real-world applications continuously reinforce one another through advanced AI models and open platforms. By leveraging agentic tools and large-scale foundations, the company is transforming complex data into actionable insights across geospatial analysis, genomics, and quantum computing. This iterative process aims to solve critical global challenges while simultaneously uncovering new frontiers for future innovation.

### Earth AI and Geospatial Reasoning

* Google has integrated various geospatial models—including those for flood forecasting, wildfire tracking, and air quality—into a unified Earth AI program.
* The newly introduced Geospatial Reasoning Agent uses Large Language Models (LLMs) to allow non-experts to ask complex questions and receive plain-language answers derived from diverse datasets.
* Riverine flood models have been significantly expanded, now providing forecasts for over 2 billion people across 150 countries.
* New Remote Sensing and Population Dynamics Foundations have been released to help researchers understand nuanced correlations in planetary data and supply chain management.

### DeepSomatic and Genomic Research

* Building on ten years of genomics work, DeepSomatic is an AI tool designed to identify somatic mutations (genetic variants in tumors) to assist in cancer research.
* The tool follows the development of previous foundational models like DeepVariant and DeepConsensus, which helped map human and non-human genomes.
* These advancements aim to move the medical field closer to precision medicine by providing health practitioners with higher-resolution data on genetic variations.

### The Magic Cycle of Research and Development

* Google highlights "Quantum Echoes" as a key breakthrough in quantum computing, contributing to the broader goal of solving fundamental scientific problems through high-scale computation.
* The acceleration of discovery is largely attributed to "agentic tools" that assist scientists in navigating massive datasets and uncovering new research opportunities.
* The company emphasizes a collaborative approach, making foundation models available to trusted testers and partners like the WHO and various international research institutes.

To maximize the impact of these breakthroughs, organizations should look toward integrating multimodal AI agents that can bridge the gap between specialized scientific data and practical decision-making. By utilizing open platforms and foundation models, the broader scientific community can translate high-level research into scalable solutions for climate resilience, healthcare, and global policy.

google

Toward provably private insights into AI use

Google Research has introduced Provably Private Insights (PPI), a framework designed to analyze generative AI usage patterns while providing mathematical guarantees of user privacy. By integrating Large Language Models (LLMs) with differential privacy and trusted execution environments (TEEs), the system enables developers to derive aggregate trends from unstructured data without exposing individual user content. This approach ensures that server-side processing remains limited to privacy-preserving computations that are fully auditable by external parties.

### The Role of LLMs in Structured Summarization

The system employs "data expert" LLMs to transform unstructured generative AI data into actionable, structured insights.

* The framework utilizes open-source Gemma 3 models to perform specific analysis tasks, such as classifying transcripts into topics or identifying user frustration levels.
* This "structured summarization" occurs entirely within a TEE, ensuring that the model processes raw data in an environment inaccessible to human operators or external processes.
* Developers can update LLM prompts frequently to answer new research questions without compromising the underlying privacy architecture.

### Confidential Federated Analytics (CFA) Infrastructure

The PPI system is built upon Confidential Federated Analytics, a technique that isolates data through hardware-based security and cryptographic verification.

* User devices encrypt data and define specific authorized processing steps before uploading it to the server.
* A TEE-hosted key management service only releases decryption keys to processing steps that match public, open-source code signatures.
* System integrity is verified using Rekor, a public, tamper-resistant transparency log that allows external parties to confirm that the code running in the TEE is exactly what was published.

### Anonymization via Differential Privacy

Once the LLM extracts features from the data, the system applies differential privacy (DP) to ensure that the final output does not reveal information about any specific individual.

* The extracted categories are aggregated into histograms, with DP noise added to the final counts to prevent the identification of single users (a minimal sketch of this step appears after this summary).
* Because the privacy guarantee is applied at the aggregation stage, the system remains secure even if a developer uses a prompt specifically designed to isolate a single user's data.
* All aggregation algorithms are open-source and reproducibly buildable, allowing for end-to-end verifiability of the privacy claims.

By open-sourcing the PPI stack through the Google Parfait project and deploying it in applications like Pixel Recorder, this framework establishes a new standard for transparent data analysis. Developers should look to integrate similar TEE-based federated analytics to balance the need for product insights with the necessity of provable, hardware-backed user privacy.
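To make the anonymization stage concrete, here is a minimal, hypothetical sketch of releasing a noisy histogram of LLM-extracted categories. It assumes each user contributes to at most one category (sensitivity 1) and uses Laplace noise; the actual PPI pipeline's noise mechanism and contribution bounding are not specified in this summary, so treat this purely as an illustration of the idea.

```kotlin
import kotlin.math.abs
import kotlin.math.ln
import kotlin.math.sign
import kotlin.random.Random

// Hypothetical sketch: release a histogram of LLM-extracted categories with
// Laplace noise added to each count. Assumes each user contributes to at most
// one category (sensitivity = 1); the real PPI pipeline may differ.
fun laplaceNoise(scale: Double, rng: Random = Random.Default): Double {
    // Inverse-CDF sampling of the Laplace(0, scale) distribution.
    val u = rng.nextDouble() - 0.5
    return -scale * sign(u) * ln(1.0 - 2.0 * abs(u))
}

fun privateHistogram(counts: Map<String, Long>, epsilon: Double): Map<String, Double> =
    counts.mapValues { (_, count) -> count + laplaceNoise(1.0 / epsilon) }

fun main() {
    // Illustrative topic counts aggregated across many users.
    val rawCounts = mapOf("coding_help" to 1200L, "travel_planning" to 430L, "frustration" to 55L)
    println(privateHistogram(rawCounts, epsilon = 1.0))
}
```

Because the noise is added to aggregate counts rather than to any individual record, the released histogram stays useful for trend analysis while bounding what it can reveal about any single user, which is the property the PPI framework relies on.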

line

Code Quality Improvement Techniques Part

Designing objects that require a specific initialization sequence often leads to fragile code and runtime exceptions. When a class demands that a method like `prepare()` be called before its primary functionality becomes available, it places the burden of safety on the consumer rather than the structure of the code itself. To improve reliability, developers should aim to create "unbreakable" interfaces where an instance is either ready for use upon creation or restricted by the type system from being used incorrectly.

### Problems with "Broken" Constructors

* Classes that allow instantiation in an "unprepared" state rely on documentation or developer memory to avoid `IllegalStateException` errors.
* When an object is passed across different layers of an application, it becomes difficult to track whether the required setup logic has been executed.
* Relying on runtime checks to verify internal state increases the surface area for bugs that only appear during specific execution paths.

### Immediate Initialization and Factory Patterns

* The most direct solution is to move initialization logic into the `init` block, allowing properties to be defined as read-only (`val`).
* Because constructors have limitations—such as the inability to use `suspend` functions or handle complex side effects—a private constructor combined with a static factory method (e.g., `companion object` in Kotlin) is often preferred.
* Using a factory method like `createInstance()` ensures that all necessary preparation logic is completed before a user ever receives the object instance.

### Lazy and Internal Preparation

* If the initialization process is computationally expensive and might not be needed for every instance, "lazy" initialization can defer the cost until the first time a functional method is called.
* In Kotlin, the `by lazy` delegate can be used to encapsulate preparation logic, ensuring it only runs once and remains thread-safe.
* Alternatively, the class can handle preparation internally within its main methods, checking the initialization state automatically so the user does not have to manage it manually.

### Type-Safe State Transitions

* For complex lifecycles, the type system can be used to enforce order by splitting the object into two distinct classes: one for the "unprepared" state and one for the "prepared" state.
* The initial class contains only the `prepare()` method, which returns a new instance of the "Prepared" class upon completion.
* This approach makes it a compile-time impossibility to call methods like `play()` on an object that hasn't been prepared, effectively eliminating a whole category of runtime errors.

### Recommendations

When designing classes with internal states, prioritize structural safety by making it impossible to represent an invalid state. Use factory functions for complex setup logic and consider splitting classes into separate types if they have distinct "ready" and "not ready" phases to leverage the compiler for error prevention (both patterns are sketched below).
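A minimal Kotlin sketch of the factory and type-split patterns, using a hypothetical `AudioPlayer` example that is not taken from the original article:

```kotlin
// Pattern 1: private constructor + companion-object factory, so every instance
// the caller receives is already fully prepared.
class AudioPlayer private constructor(private val track: String) {
    fun play() = println("Playing $track")

    companion object {
        // All preparation (validation, loading, etc.) happens here, before the
        // caller ever gets an AudioPlayer.
        fun createInstance(track: String): AudioPlayer {
            require(track.isNotBlank()) { "track must not be blank" }
            return AudioPlayer(track)
        }
    }
}

// Pattern 2: encode "unprepared" and "prepared" as separate types, so calling
// play() before prepare() is a compile-time error rather than an
// IllegalStateException at runtime.
class UnpreparedPlayer(private val track: String) {
    fun prepare(): PreparedPlayer = PreparedPlayer(track)
}

class PreparedPlayer internal constructor(private val track: String) {
    fun play() = println("Playing $track")
}

fun main() {
    AudioPlayer.createInstance("intro.mp3").play()
    UnpreparedPlayer("outro.mp3").prepare().play()
    // UnpreparedPlayer("outro.mp3").play()  // does not compile: play() only exists on PreparedPlayer
}
```

The key property in both cases is the same: the compiler, not a runtime check, guarantees that a usable instance has already gone through its preparation step.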

google

StreetReaderAI: Towards making street view accessible via context-aware multimodal AI

StreetReaderAI is a research prototype designed to make immersive street-level imagery accessible to the blind and low-vision community through multimodal AI. By integrating real-time scene analysis with context-aware geographic data, the system transforms visual mapping data into an interactive, audio-first experience. This framework allows users to virtually explore environments and plan routes with a level of detail and independence previously unavailable through traditional screen readers.

### Navigation and Spatial Awareness

The system offers an immersive, first-person exploration interface that mimics the mechanics of accessible gaming.

* Users navigate using keyboard shortcuts or voice commands, taking "virtual steps" forward or backward and panning their view in 360 degrees.
* Real-time audio feedback provides cardinal and intercardinal directions, such as "Now facing North," to maintain spatial orientation.
* Distance tracking informs the user how far they have traveled between panoramic images, while "teleport" features allow for quick jumps to specific addresses or landmarks.

### Context-Aware AI Describer

At the core of the tool is a subsystem backed by Gemini that synthesizes visual and geographic data to generate descriptions (a simplified sketch of this context assembly appears after this summary).

* The AI Describer combines the current field-of-view image with dynamic metadata about nearby roads, intersections, and points of interest.
* Two distinct modes cater to different user needs: a "Default" mode focusing on pedestrian safety and navigation, and a "Tour Guide" mode that provides historical and architectural details.
* The system utilizes Gemini to proactively predict and suggest follow-up questions relevant to the specific scene, such as details about crosswalks or building entrances.

### Interactive Dialogue and Session Memory

StreetReaderAI utilizes the Multimodal Live API to facilitate real-time, natural language conversations about the environment.

* The AI Chat agent maintains a large context window of approximately 1,048,576 tokens, allowing it to retain a "memory" of up to 4,000 previous images and interactions.
* This memory allows users to ask retrospective spatial questions, such as "Where was that bus stop I just passed?", with the agent providing relative directions based on the user's current location.
* By tracking every pan and movement, the agent can provide specific details about the environment that were captured in previous steps of the virtual walk.

### User Evaluation and Practical Application

Testing with blind screen reader users confirmed the system's utility in practical, real-world scenarios.

* Participants successfully used the prototype to evaluate potential walking routes, identifying critical environmental features like the presence of benches or shelters at bus stops.
* The study highlighted the importance of multimodal inputs—combining image recognition with structured map data—to provide a more accurate and reliable description than image analysis alone could offer.

While StreetReaderAI remains a proof-of-concept, it demonstrates that the integration of multimodal LLMs and spatial data can bridge significant accessibility gaps in digital mapping. Future implementation of these technologies could transform how visually impaired individuals interact with the world, turning static street imagery into a functional tool for independent mobility and exploration.
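The sketch below illustrates the "context-aware" part of the describer in the loosest sense: combining the user's heading and nearby map metadata into a text prompt that would accompany the field-of-view image sent to the model. None of these names come from the StreetReaderAI codebase; they are hypothetical and purely illustrative.

```kotlin
// Hypothetical scene context: the image itself would be attached as a separate
// multimodal input alongside this text prompt.
data class PlaceOfInterest(
    val name: String,
    val kind: String,            // e.g. "bus stop", "crosswalk"
    val headingDegrees: Int,     // bearing from the user's position
    val distanceMeters: Int,
)

data class SceneContext(
    val headingDegrees: Int,     // 0 = North, matching the "Now facing North" feedback
    val nearbyPlaces: List<PlaceOfInterest>,
)

fun buildDescriberPrompt(scene: SceneContext, tourGuideMode: Boolean): String {
    val focus = if (tourGuideMode) "historical and architectural details"
                else "pedestrian safety, crossings, and navigation cues"
    val lines = mutableListOf(
        "Describe this street scene for a blind pedestrian. Focus on $focus.",
        "The user is facing heading ${scene.headingDegrees} degrees.",
        "Nearby places from map data:",
    )
    scene.nearbyPlaces.forEach {
        lines += "- ${it.name} (${it.kind}), ${it.distanceMeters} m away at heading ${it.headingDegrees} degrees"
    }
    return lines.joinToString("\n")
}
```

The point of the study's multimodal finding is visible even in this toy version: the structured map metadata carries information (names, distances, bearings) that the image alone cannot reliably provide.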

google

How we are building the personal health coach

Google is leveraging Gemini models to create a proactive, adaptive personal health coach designed to bridge the gap between fragmented health data and actionable wellness guidance. By integrating physiological metrics with behavioral science, the system provides tailored insights and sustainable habit-building plans through a sophisticated multi-agent AI architecture. This initiative, currently in public preview for Fitbit Premium users, represents a transition toward data-driven, expert-validated health coaching that evolves dynamically with an individual's progress.

## Architecting a Multi-Agent Health Coach

The system utilizes a complex multi-agent framework to coordinate different specialized AI sub-agents, ensuring that health recommendations are holistic and contextually aware (a simplified sketch of this orchestration appears after this summary).

* **Conversational Agent:** Manages multi-turn interactions, understands user intent, and orchestrates the other agents while gathering necessary context for response generation.
* **Data Science Agent:** Employs code-generation capabilities to iteratively fetch, analyze, and summarize physiological time-series data, such as sleep patterns and workout intensity.
* **Domain Expert Agent:** Analyzes user data through the lens of specific fields like fitness or nutrition to generate and adapt personalized plans based on changing user context.
* **Numerical Reasoning:** The coach performs sophisticated reasoning on health metrics, comparing current data against personal baselines and population-level statistics using capabilities derived from PH-LLM research.

## Ensuring Reliability via the SHARP Framework

To move beyond general-purpose AI capabilities, the system is grounded in established coaching frameworks and subjected to rigorous technical and clinical validation.

* **SHARP Evaluation:** The model is continuously assessed across five dimensions: Safety, Helpfulness, Accuracy, Relevance, and Personalization.
* **Human-in-the-Loop Validation:** The development process involved over 1 million human annotations and 100,000 hours of evaluation by specialists in fields such as cardiology, endocrinology, and behavioral science.
* **Expert Oversight:** Google convened a Consumer Health Advisory Panel and collaborated with professional fitness coaches to ensure the AI's recommendations align with real-world professional standards.
* **Scientific Grounding:** The coach utilizes novel methods to foster consensus in nuanced health areas, ensuring that wellness recommendations remain scientifically accurate through the use of scaled "autoraters."

Eligible Fitbit Premium users on Android in the US can now opt into the public preview to provide feedback on these personalized insights. As the tool evolves through iterative design and user research, it aims to provide a seamless connection between raw health metrics and sustainable lifestyle changes.
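The following is a minimal, hypothetical sketch of the multi-agent routing described above. The agent names mirror the summary, but the interfaces and the fixed call order are illustrative only; this is not Google's implementation, and a production system would let the conversational LLM decide which sub-agents to call and when.

```kotlin
interface SubAgent {
    fun handle(query: String, context: Map<String, String>): String
}

class DataScienceAgent : SubAgent {
    // Stands in for code-generation-driven analysis of time-series metrics.
    override fun handle(query: String, context: Map<String, String>) =
        "summary of recent sleep and workout metrics relevant to: $query"
}

class DomainExpertAgent : SubAgent {
    // Stands in for fitness/nutrition expertise applied to the analyzed data.
    override fun handle(query: String, context: Map<String, String>) =
        "adjusted plan based on ${context["metrics"]} for: $query"
}

class ConversationalAgent(
    private val dataScience: SubAgent,
    private val domainExpert: SubAgent,
) {
    // The orchestrator gathers context first, then asks the domain expert to
    // interpret it before composing the user-facing reply.
    fun respond(userTurn: String): String {
        val metrics = dataScience.handle(userTurn, emptyMap())
        val advice = domainExpert.handle(userTurn, mapOf("metrics" to metrics))
        return "Here is what your recent data shows and a suggested next step: $advice"
    }
}

fun main() {
    val coach = ConversationalAgent(DataScienceAgent(), DomainExpertAgent())
    println(coach.respond("Why am I so tired after my runs this week?"))
}
```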

netflix

Post-Training Generative Recommenders with Advantage-Weighted Supervised Finetuning

Netflix is evolving its recommendation systems by moving beyond simple behavior imitation toward generative recommenders that better align with true user preferences. While generative models like HSTU and OneRec effectively capture sequential user patterns, they often struggle to distinguish between habitual clicks and genuine satisfaction. To bridge this gap, Netflix developed Advantage-Weighted Supervised Fine-tuning (A-SFT), a post-training method that leverages noisy reward signals to refine model performance without the need for complex counterfactual data.

### The Shift to Generative Recommenders

* Modern generative recommenders (GRs), such as HSTU and OneRec, utilize transformer architectures to treat recommendation as a sequential transduction task.
* The models are typically trained using next-item prediction, where the system learns to imitate the chronological sequence of a user's activities.
* A significant drawback of this "behavior cloning" approach is that it captures external trends and noise rather than long-term user satisfaction, potentially recommending content the user finished but did not actually enjoy.

### Barriers to Reinforcement Learning in RecSys

* Traditional post-training methods used in Large Language Models, such as Proximal Policy Optimization (PPO) or Direct Preference Optimization (DPO), require counterfactual feedback that is difficult to obtain in recommendation contexts.
* Because user sequences span weeks or years, it is impractical to generate and test hypothetical, counterfactual experiences for real-time user validation.
* Reward signals in recommendation systems are inherently noisy; for instance, high watch time might indicate interest, but it can also be a result of external circumstances, making it an unreliable metric for optimization.

### Advantage-Weighted Supervised Fine-tuning (A-SFT)

* A-SFT is a hybrid approach that sits between offline reinforcement learning and standard supervised fine-tuning.
* The algorithm incorporates an advantage function to weight training examples, allowing the model to prioritize actions that lead to higher rewards while filtering out noise from the reward model (a generic form of this weighting is sketched after this summary).
* This method is specifically designed to handle high-variance reward signals, using them as directional guides rather than absolute truth, which prevents the model from over-exploiting inaccurate data.
* Benchmarks against other representative methods show that A-SFT achieves superior alignment between the generative recommendation policy and the underlying reward model.

For organizations managing large-scale recommendation engines, A-SFT offers a practical path to implementing post-training improvements. By focusing on advantage-weighted signals, developers can improve recommendation quality using existing implicit feedback—like watch time and clicks—without the infrastructure hurdles of online reinforcement learning.
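For intuition, a generic advantage-weighted SFT objective (the standard advantage-weighted regression form, shown as an illustration rather than Netflix's exact loss) up-weights the log-likelihood of observed next items by an exponentiated advantage:

$$
\mathcal{L}_{\text{A-SFT}}(\theta) = -\,\mathbb{E}_{(s,a)\sim\mathcal{D}}\Big[\, w(s,a)\,\log \pi_\theta(a \mid s) \Big], \qquad w(s,a) = \exp\!\big(A(s,a)/\beta\big)
$$

Here $s$ is the user's interaction history, $a$ the observed next item, $A(s,a)$ the (noisy) advantage estimated from the reward model, and $\beta$ a temperature controlling how strongly high-advantage examples are favored. With $w \equiv 1$ the objective reduces to plain next-item supervised fine-tuning, which is why the noisy reward acts only as a directional guide rather than a hard optimization target.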

google

Google Earth AI: Unlocking geospatial insights with foundation models and cross-modal reasoning

Google Earth AI introduces a framework of geospatial foundation models and reasoning agents designed to solve complex, planetary-scale challenges through cross-modal reasoning. By integrating Gemini-powered orchestrators with specialized imagery, population, and environmental models, the system deconstructs multifaceted queries into actionable multi-step plans. This approach enables a holistic understanding of real-world events, such as disaster response and disease forecasting, by grounding AI insights in diverse geospatial data.

## Geospatial Reasoning Agents

* Utilizes Gemini models as intelligent orchestrators to manage complex queries that require data from multiple domains.
* The agent deconstructs a high-level question—such as predicting hurricane landfalls and community vulnerability—into a sequence of smaller, executable tasks.
* It executes these plans by autonomously calling specialized foundation models, querying vast datastores, and utilizing geospatial tools to fuse disparate data points into a single, cohesive answer.

## Remote Sensing and Imagery Foundations

* Employs vision-language models and open-vocabulary object detection trained on a large corpus of high-resolution overhead imagery paired with text descriptions.
* Enables "zero-shot" capabilities, allowing users to find specific objects like "flooded roads" or "building damage" using natural language without needing to retrain the model for specific classes.
* Technical evaluations show a 16% average improvement on text-based image search tasks and more than double the baseline accuracy for detecting novel objects in a zero-shot setting.

## Population Dynamics and Mobility

* Focuses on the interplay between people and places using globally consistent embeddings across 17 countries.
* Includes monthly updated embeddings that capture shifting human activity patterns, which are essential for time-sensitive forecasting.
* Research conducted with the University of Oxford showed that incorporating these population embeddings into a Dengue fever forecasting model in Brazil improved the R² metric from 0.456 to 0.656 for long-range 12-month predictions.

## Environmental and Disaster Forecasting

* Integrates established Google research into weather nowcasting, flood forecasting, and wildfire boundary mapping.
* Provides the reasoning agent with the data necessary to evaluate environmental risks alongside population density and infrastructure imagery.
* Aims to provide Search and Maps users with real-time, accurate alerts regarding natural disasters grounded in planetary-scale environmental data.

Developers and enterprises looking to solve high-level geospatial problems can now express interest in accessing these capabilities through Google Earth and Google Cloud. By leveraging these foundation models, organizations can automate the analysis of satellite imagery and human mobility data to better prepare for environmental and social challenges.

google

A verifiable quantum advantage

Google Quantum AI researchers have introduced "Quantum Echoes," a new algorithm designed to measure Out-of-Time-Order Correlators (OTOCs) to characterize quantum chaos. By demonstrating this task on the 103-qubit Willow chip, the team has achieved a verifiable quantum advantage that surpasses the limitations of previous random circuit sampling techniques. This work establishes a direct path toward solving practical problems in physics and chemistry, such as Hamiltonian learning, through the use of stable and reproducible quantum expectation values.

## Limitations of Random Circuit Sampling

* While the 2019 "quantum supremacy" milestone proved quantum computers could outperform classical ones, the bitstring sampling method used was difficult to verify and lacked practical utility.
* In large-scale quantum systems, specific bitstrings rarely repeat, which restricts the ability to extract useful, actionable information from the computation.
* The Quantum Echoes approach shifts focus to quantum expectation values—such as magnetization, density, and velocity—which remain consistent across different quantum computers and are computationally verifiable.

## The Quantum Echoes Algorithm and OTOCs

* The algorithm measures OTOCs, which represent the state of a single qubit after a series of "forward" ($U$) and "backward" ($U^\dagger$) evolutions (a standard form of this correlator is given after this summary).
* In the experiment, 103 qubits on the Willow processor underwent evolution through random quantum circuits to reach a highly chaotic state.
* A perturbation (gate $B$) is applied between the forward and backward evolutions; if the system is chaotic, this small change triggers a "butterfly effect," resulting in a final state significantly different from the initial one.
* Higher-order OTOCs involve multiple "round trips" of these evolutions, increasing the system's sensitivity to the perturbation and allowing for a more detailed characterization of the quantum dynamics.

## Many-Body Interference and Signal Amplification

* The researchers discovered that higher-order OTOCs function like many-body interferometers, where the quantum states of many particles interfere with one another.
* The perturbation gates ($B$ and $M$) act as mirrors; when a resonance condition is met (where $U^\dagger$ is the exact inverse of $U$), constructive interference occurs.
* This constructive interference amplifies specific quantum correlations, allowing the OTOC signal magnitude to scale as a negative power of the system size, rather than the exponential decay typically seen in chaotic systems.
* This amplification makes the OTOC a sensitive instrument for identifying the specific correlations generated between two different qubits during the evolution of the circuit.

## Practical Applications and Future Research

The success of the Quantum Echoes algorithm on the Willow chip marks a transition toward using quantum computers for tasks that are both beyond-classical and physically relevant. This method is particularly well-suited for Hamiltonian learning in Nuclear Magnetic Resonance (NMR) and studying the flow of electrons in high-temperature superconductors. Moving forward, the ability to measure verifiable expectation values in the chaotic regime will be essential for researchers looking to simulate complex quantum materials that are impossible to model on classical hardware.
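For reference, the quantity this forward/backward protocol measures is an out-of-time-order correlator. In its standard textbook form, written with the summary's $U$, $B$, and $M$ (and not as a claim about the paper's exact definition), it reads:

$$
C = \big\langle\, \hat{B}^\dagger(t)\, \hat{M}^\dagger\, \hat{B}(t)\, \hat{M} \,\big\rangle, \qquad \hat{B}(t) = U^\dagger\, \hat{B}\, U
$$

where $U$ is the forward evolution through the random circuit, $\hat{B}$ is the perturbation inserted between the forward and backward halves, and $\hat{M}$ acts on the single qubit that is ultimately measured. Higher-order OTOCs repeat the $U^\dagger \hat{B}\, U$ round trip several times before the final measurement, which is the mechanism behind the increased sensitivity and the many-body interference effects described above.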