Google Research

research.google/blog

Unlocking health insights: Estimating advanced walking metrics with smartwatches

Google researchers have validated that smartwatches are a highly reliable and accurate platform for estimating complex spatio-temporal gait metrics, rivaling the performance of smartphone-based methods. Using a multi-head deep learning model, the study demonstrates that wrist-worn devices can provide continuous, lab-grade insights into a user's walking speed, step length, and balance without the specific pocket placement or specialized laboratory equipment previously required for such data.

## Multi-Head Deep Learning for Wrist-Based Sensors

* The researchers developed a temporal convolutional network (TCN) architecture designed to process raw inertial measurement unit (IMU) data, specifically 3-axis accelerometer and gyroscope signals sampled at 50 Hz.
* Unlike traditional models that only track temporal events and are prone to integration drift, the multi-head approach estimates unilateral and bilateral metrics directly and simultaneously.
* The architecture extracts embeddings from the IMU signals and concatenates them with user height (a demographic scalar input) to improve the precision of spatial predictions (a minimal sketch of this design follows this summary).
* The system estimates a comprehensive suite of metrics, including gait speed, double support time (the proportion of time both feet are on the ground), step length, swing time, and stance time.

## Large-Scale Validation and Study Protocol

* The study involved a diverse cohort of 246 participants across two international sites, generating approximately 70,000 walking segments.
* Ground truth was captured with a professional-grade Zeno Gait Walkway system, providing high-precision reference data for comparison.
* The protocol covered varied walking conditions to test the model's versatility: a self-paced six-minute walk test (6MWT), fast-paced walking, and induced physical asymmetry created by wearing hinged knee braces at specific angles.
* Researchers employed a five-fold cross-validation strategy in which all data from a single participant remained within a single split, preventing data leakage and ensuring the model generalizes to new users.

## Clinical Validity and Comparative Performance

* Smartwatch estimates demonstrated strong validity and excellent reliability, with Pearson correlation coefficients (r) and intraclass correlation coefficients (ICC) exceeding 0.80 for most metrics.
* Performance comparisons showed non-significant differences in mean absolute percentage error (MAPE) between the Pixel Watch and Pixel phone, establishing the smartwatch as a viable alternative to smartphone-based tracking.
* While double support time showed slightly lower but acceptable reliability (ICC 0.56–0.60), metrics such as step length and gait speed proved highly consistent across walking speeds and styles.
* The model's success suggests that smartwatches can effectively close the gap in gait analysis, offering a more practical and consistent platform for continuous health tracking than handheld devices.

This research establishes smartwatches as a powerful tool for longitudinal health monitoring, enabling the detection of neurological or musculoskeletal changes through passive, continuous gait analysis in everyday environments.
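
To make the multi-head design concrete, here is a minimal sketch in PyTorch: a small dilated 1-D convolutional encoder standing in for the TCN, one regression head per gait metric, and user height concatenated onto the shared embedding. Layer sizes, the exact head set, and the window length are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class GaitTCN(nn.Module):
    def __init__(self, n_channels: int = 6, embed_dim: int = 64):
        super().__init__()
        # Dilated 1-D convolutions over time stand in for the TCN encoder.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, dilation=1, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, embed_dim, kernel_size=5, dilation=2, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time to a single embedding
        )
        # One regression head per gait metric, all sharing the embedding;
        # the +1 input accommodates the user-height scalar.
        metrics = ["gait_speed", "step_length", "double_support_time",
                   "swing_time", "stance_time"]
        self.heads = nn.ModuleDict(
            {m: nn.Linear(embed_dim + 1, 1) for m in metrics})

    def forward(self, imu: torch.Tensor, height: torch.Tensor):
        # imu: (batch, 6, time) accel+gyro at 50 Hz; height: (batch, 1) in m
        z = self.encoder(imu).squeeze(-1)
        z = torch.cat([z, height], dim=-1)  # append the demographic scalar
        return {name: head(z) for name, head in self.heads.items()}

model = GaitTCN()
window = torch.randn(8, 6, 500)        # eight 10-second windows at 50 Hz
height = torch.full((8, 1), 1.72)
preds = model(window, height)          # dict of per-metric estimates
```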

Hard-braking events as indicators of road segment crash risk

Google Research has established a statistically significant correlation between hard-braking events (HBEs) collected via Android Auto and actual road crash rates. By using HBEs as a "leading" indicator rather than relying on sparse, lagging historical crash data, researchers can proactively identify high-risk road segments with much greater speed and spatial granularity. This validation suggests that connected vehicle data can serve as a scalable proxy for traditional safety assessments.

### Data Density and Scalability

* HBEs, defined as forward deceleration exceeding -3 m/s², provide a signal that is 18 times denser than reported crash data.
* While crashes are statistically rare and can take years to yield a valid safety profile for a specific road segment, HBEs offer a continuous stream of information.
* This high density allows for the creation of a comprehensive "safety map" that includes local and arterial roads where crash reporting is often inconsistent or sparse.

### Statistical Validation of HBEs

* Researchers employed negative binomial regression models to analyze 10 years of public crash data from California and Virginia alongside anonymized HBE data (a sketch of this style of model follows this summary).
* The models controlled for confounding factors such as traffic volume, segment length, road type (local, arterial, highway), and infrastructure characteristics like slope and lane changes.
* The results confirmed a consistent positive association between HBE frequency and crash rates across all road types, supporting HBEs as a reliable surrogate for risk regardless of geography.

### High-Risk Identification Case Study

* An analysis of a freeway merge connecting Highway 101 and Highway 880 in California served as a practical validation of the metric.
* This segment was found to have an HBE rate 70 times higher than the state average, correlating with a historical record of one crash every six weeks.
* The HBE signal flagged this location as being in the top 1% of high-risk segments without needing years of collision reports to confirm the danger, demonstrating its utility in identifying "black spots" early.

### Real-World Application and Road Management

* Validating HBEs transforms raw sensor data into a trusted tool for urban planners and road authorities to perform network-wide safety assessments.
* This approach allows for proactive infrastructure interventions, such as adjusting signage or merge patterns, before fatalities or injuries occur.
* The findings support the integration of connected vehicle insights into platforms like Google Maps to help authorities manage road safety more dynamically.
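
As an illustration of the modeling approach named above, the sketch below fits a negative binomial regression of segment-level crash counts on an HBE rate plus the controls the post lists. The synthetic data and column names are assumptions; only the model family comes from the post.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "hbe_rate": rng.gamma(2.0, 1.0, n),              # HBEs per unit exposure
    "log_aadt": np.log(rng.uniform(1e3, 1e5, n)),    # traffic-volume control
    "log_length": np.log(rng.uniform(0.1, 2.0, n)),  # segment-length control
    "is_highway": rng.integers(0, 2, n),             # road-type control
})
# Synthetic crash counts with a positive HBE association built in.
mu = np.exp(-1.0 + 0.3 * df["hbe_rate"] + 0.2 * df["log_aadt"])
df["crashes"] = rng.poisson(mu)

X = sm.add_constant(df[["hbe_rate", "log_aadt", "log_length", "is_highway"]])
result = sm.GLM(df["crashes"], X,
                family=sm.families.NegativeBinomial(alpha=1.0)).fit()
# A positive, significant hbe_rate coefficient mirrors the association
# the study reports across road types.
print(result.summary())
```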

Next generation medical image interpretation with MedGemma 1.5 and medical speech to text with MedASR

Google Research has introduced MedGemma 1.5 4B and MedASR, expanding its suite of open medical AI models to support more complex clinical workflows. These updates significantly enhance the interpretation of high-dimensional imaging and medical speech-to-text, providing a compute-efficient foundation for healthcare developers to build upon. By keeping the models open access on Hugging Face and Vertex AI, Google aims to accelerate the integration of multimodal AI into real-world medical applications (a hedged usage sketch appears after this summary).

### Multimodal Advancements in MedGemma 1.5

The latest update to the MedGemma 4B model focuses on high-dimensional and longitudinal data, moving beyond simple 2D image interpretation.

* **3D Medical Imaging:** The model now supports volumetric representations from CT scans and MRIs, as well as whole-slide histopathology imaging.
* **Longitudinal Review:** New capabilities allow for the review of chest X-ray time series, helping clinicians track disease progression over time.
* **Anatomical Localization:** Developers can use the model to identify and localize specific anatomical features within chest X-rays.
* **Document Understanding:** Enhanced support for extracting structured data from complex medical lab reports and documents.
* **Edge Capability:** The 4B parameter size is small enough to run offline while remaining accurate enough for core medical reasoning tasks.

### Medical Speech-to-Text with MedASR

MedASR is a specialized automatic speech recognition (ASR) model designed to bridge the gap between clinical dialogue and digital documentation.

* **Clinical Dictation:** The model is fine-tuned for medical terminology and the unique nuances of clinical dictation.
* **Integrated Reasoning:** MedASR is designed to pair seamlessly with MedGemma, allowing transcribed text to be immediately processed for advanced medical reasoning or summarization.
* **Accessibility:** Like other HAI-DEF models, it is free for research and commercial use and hosted on both Hugging Face and Google Cloud's Vertex AI.

### Performance Benchmarks and Community Impact

Google is incentivizing innovation through improved performance metrics and community-driven challenges.

* **Accuracy Gains:** Internal benchmarks show MedGemma 1.5 improved disease-related CT classification by 3% and MRI classification by 14% compared to the previous version.
* **MedGemma Impact Challenge:** A Kaggle-hosted hackathon with $100,000 in prizes has been launched to encourage developers to find creative applications for these multimodal tools.
* **Model Collection:** The update complements existing tools like the MedSigLIP image encoder and the larger MedGemma 27B model, which remains the preferred choice for complex, text-heavy medical applications.

Developers and researchers are encouraged to use MedGemma 1.5 for tasks requiring efficient, offline multimodal processing, and MedASR to automate clinical documentation. By participating in the MedGemma Impact Challenge, the community can help define the next generation of AI-assisted medical diagnostics and workflows.
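
For developers who want to try the open checkpoints, below is a hedged sketch of calling a MedGemma model through the Hugging Face transformers pipeline. The model id shown is the earlier MedGemma 4B release; the exact MedGemma 1.5 checkpoint name, and the precise message format, should be checked against the published model card rather than taken from this sketch.

```python
from transformers import pipeline
from PIL import Image

# Hypothetical usage sketch; swap in the published MedGemma 1.5 model id.
pipe = pipeline("image-text-to-text", model="google/medgemma-4b-it")

xray = Image.open("chest_xray.png")   # placeholder local image path
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": xray},
        {"type": "text", "text": "Describe any abnormal findings."},
    ],
}]
out = pipe(text=messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])
```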

Dynamic surface codes open new avenues for quantum error correction

Google Research has demonstrated the operation of dynamic surface codes for quantum error correction, marking a significant shift from traditional static circuit architectures. By alternating between different circuit constructions and re-tiling "detecting regions" in each cycle, these dynamic circuits offer greater flexibility to avoid hardware defects and suppress correlated errors. Experimental results on the Willow processor show that these methods can match the performance of static codes while significantly simplifying the physical design and fabrication of quantum chips.

## Error Triangulation via Dynamic Detecting Regions

Quantum error correction (QEC) works by localizing physical errors within specific "detecting regions" over multiple cycles to prevent them from affecting logical information. While standard surface codes use a static, square tiling for these regions, dynamic codes periodically change the tiling pattern.

* Dynamic circuits allow the system to "deform" the detecting regions in spacetime, providing multiple perspectives from which to triangulate errors.
* This approach enables the use of different gate types and connectivity layouts that are not possible with fixed, repetitive cycles.
* The flexibility of dynamic re-tiling lets the system sidestep common superconducting qubit issues such as "dropouts" (failed qubits or couplers) and leakage out of the computational subspace.

## Quantum Error Correction on Hexagonal Lattices

Traditional square lattices require each physical qubit to connect to four neighbors, which creates significant overhead in wiring and coupler density. Dynamic circuits enable the use of a hexagonal lattice, where each qubit requires only three couplers.

* The hexagonal code alternates between two distinct cycle types, using one of the three couplers twice per cycle to maintain error detection capabilities.
* Testing on the Willow processor showed that scaling the hexagonal code from distance 3 to 5 improved the logical error rate by a factor of 2.15, matching the performance of standard static circuits (see the short worked example after this summary).
* Reducing coupler density simplifies the optimization of qubit and gate frequencies, leading to a 15% improvement in simulated error suppression compared to four-coupler designs.

## Walking Circuits to Mitigate Leakage

Superconducting qubits are prone to "leakage," where a qubit exits its intended computational states (0 and 1) into a higher energy state (2). In static circuits, repeated measurements on the same physical qubits can cause these leakage errors to accumulate and spread.

* "Walking" circuits solve this by shifting the roles of data and measurement qubits across the lattice in each cycle.
* By constantly moving the location where errors are measured, the circuit effectively "flushes out" leakage and other correlated errors before they can damage logical information.
* Experiments confirmed that walking circuits achieve error suppression equivalent to static circuits while offering a more robust defense against long-term error correlations.

## Flexibility with iSWAP Entangling Gates

Most superconducting quantum processors are optimized for controlled-Z (CZ) gates, but dynamic circuits prove that QEC can be implemented effectively using alternative gates such as iSWAP.

* The research team demonstrated a dynamic surface code built from iSWAP gates, which are native to many quantum hardware architectures.
* This flexibility ensures that QEC is not tethered to a specific gate set, allowing hardware designers to choose entangling operations that offer the highest physical fidelity for their specific device.

The move toward dynamic surface codes suggests a future where quantum processors are more resilient to manufacturing imperfections. By adopting hexagonal layouts and walking circuits, developers can reduce hardware complexity and mitigate physical noise, providing a more scalable path toward fault-tolerant quantum computing.
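
As a small worked example of what the distance-3 to distance-5 comparison means, the snippet below applies the reported 2.15 error-suppression factor to a hypothetical logical error rate. Only the factor itself comes from the post; the starting rate and the extrapolation are illustrative assumptions.

```python
# Hypothetical distance-3 logical error rate; only the 2.15 suppression
# factor is taken from the post's Willow results.
eps_d3 = 3.0e-3
suppression = 2.15                       # reported factor from d=3 to d=5
eps_d5 = eps_d3 / suppression
# Repeated suppression at each distance step is what makes larger codes
# worthwhile; extrapolating one more step is an assumption, not a result.
eps_d7_extrapolated = eps_d5 / suppression
print(f"d=5: {eps_d5:.2e}, extrapolated d=7: {eps_d7_extrapolated:.2e}")
```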

NeuralGCM harnesses AI to better simulate long-range global precipitation

NeuralGCM represents a significant evolution in atmospheric modeling, combining traditional fluid dynamics with neural networks to tackle the long-standing challenge of simulating global precipitation. By training the AI component directly on high-quality NASA satellite observations rather than biased reanalysis data, the model achieves marked gains in predicting daily weather cycles and extreme rainfall events. This hybrid approach offers a faster, more precise tool for both medium-range weather forecasting and multi-decadal climate projections.

## The Limitations of Cloud Parameterization

* Precipitation is driven by cloud processes occurring at scales as small as 100 meters, far below the kilometer-scale resolution of global weather models.
* Traditional models rely on "parameterizations," mathematical approximations that estimate how these small-scale events affect the larger atmosphere.
* Because these approximations are often simplified, traditional models struggle to capture the complexity of water droplet formation and ice crystal growth, leading to errors in long-term forecasts.

## Training on Direct Satellite Observations

* Unlike previous AI models trained on "reanalyses" (essentially simulations used to fill observational gaps), NeuralGCM is trained on NASA satellite-based precipitation data spanning 2001 to 2018.
* The model uses a differentiable dynamical core, an architecture that allows the neural network to learn the effects of small-scale events directly from physical observations (a toy sketch of this hybrid step follows this summary).
* By bypassing the weaknesses inherent in reanalysis data, the model effectively creates a machine-learned parameterization that is more faithful to real-world cloud physics.

## Performance in Weather and Climate Benchmarks

* At a resolution of 280 km, NeuralGCM outperforms leading operational models in medium-range forecasts (up to 15 days) and matches the precision of sophisticated multi-decadal climate models.
* The model shows a marked improvement in capturing precipitation extremes, particularly for the top 0.1% of rainfall events.
* Evaluation through WeatherBench 2 demonstrates that NeuralGCM accurately reproduces the diurnal (daily) weather cycle, a metric where traditional physics-based models frequently fall short.

NeuralGCM provides a highly efficient and accessible framework for researchers and city planners who need to simulate long-range climate scenarios, such as 100-year storms or seasonal agricultural cycles. Its ability to maintain physical consistency while leveraging the speed of AI makes it a strong candidate for the next generation of global atmospheric modeling.
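
The hybrid design can be summarized in a few lines of toy Python: a physics step advances the state, and a learned term stands in for the machine-learned precipitation parameterization. Both functions are placeholders under our own assumptions, not the NeuralGCM code, but they show where differentiability matters: gradients must be able to flow through the core so the network can be trained against observations.

```python
import numpy as np

def dynamical_core(state: np.ndarray, dt: float) -> np.ndarray:
    # Placeholder for the physics-based fluid-dynamics update.
    return state + dt * np.gradient(state)

def learned_correction(state: np.ndarray) -> np.ndarray:
    # Placeholder for the neural network's learned parameterization.
    return 0.01 * np.tanh(state)

def hybrid_step(state: np.ndarray, dt: float = 0.1) -> np.ndarray:
    # In NeuralGCM both pieces sit inside one differentiable program,
    # so training can push errors through the dynamical core itself.
    return dynamical_core(state, dt) + learned_correction(state)

state = np.random.randn(64)            # toy 1-D atmospheric field
for _ in range(10):
    state = hybrid_step(state)
```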

Google Research 2025: Bolder breakthroughs, bigger impact

Google Research in 2025 has shifted toward an accelerated "Magic Cycle" that rapidly translates foundational breakthroughs into real-world applications across science, society, and consumer products. By prioritizing model efficiency, factuality, and agentic capabilities, the organization is moving beyond static text generation toward interactive, multimodal systems that address complex global challenges. This evolution is underpinned by a commitment to responsible AI development, ensuring that new technologies like quantum computing and generative UI are both safe and culturally inclusive.

## Enhancing Model Efficiency and Factuality

* Google introduced new efficiency-focused techniques like block verification (an evolution of speculative decoding; see the toy sketch after this summary) and the LAVA scheduling algorithm, which optimizes resource allocation in large cloud data centers.
* The Gemini 3 model achieved state-of-the-art results on factuality benchmarks, including SimpleQA Verified and the newly released FACTS benchmark suite, by emphasizing grounded world knowledge.
* Research into retrieval-augmented generation (RAG) led to the LLM Re-Ranker in Vertex AI, which helps models determine whether they possess sufficient context to provide accurate answers.
* The Gemma open model expanded to support over 140 languages, supported by the TUNA taxonomy and the Amplify initiative to improve socio-cultural intelligence and data representation.

## Interactive Experiences through Generative UI

* A novel implementation of generative UI allows Gemini 3 to dynamically create visual interfaces, web pages, and tools in response to user prompts rather than providing static text.
* This technology is powered by specialized models like "Gemini 3-interactive," which are trained to output structured code and design elements.
* These capabilities have been integrated into AI Mode within Google Search, allowing for more immersive and customizable user journeys.

## Advanced Architectures and Agentic AI

* Google is exploring hybrid model architectures, such as Jamba-style models that combine state space models (SSMs) with traditional attention mechanisms to handle long contexts more efficiently.
* The development of agentic AI focuses on models that can reason, plan, and use tools, exemplified by Project Astra, a prototype for a universal AI agent.
* Specialized models like Gemini 3-code have been optimized to act as autonomous collaborators for software developers, assisting in complex coding tasks and system design.

## AI for Science and Planetary Health

* In biology, research teams used AI to map human heart and brain structures and employed RoseTTAFold-Diffusion to design new proteins for therapeutic use.
* The NeuralGCM model has advanced the Earth sciences by combining traditional physics with machine learning for faster, more accurate weather and climate forecasting.
* Environmental initiatives include the FireSat satellite constellation for global wildfire detection and the expansion of AI-driven flood forecasting and contrail mitigation.

## Quantum Computing and Responsible AI

* Google achieved significant milestones in quantum error correction, developing low-overhead codes that bring the industry closer to a reliable, large-scale quantum computer.
* Security and safety remain central, with the expansion of SynthID, a watermarking tool for AI-generated text, audio, and video, to help users identify synthetic content.
* The team continues to refine the Secure AI Framework (SAIF) to defend against emerging threats while promoting the safe deployment of generative media models like Veo and Imagen.

To maximize the impact of these advancements, organizations should focus on integrating agentic workflows and RAG-based architectures so that their AI implementations are both factual and capable of performing multi-step tasks. Developers can use the Gemma open models to build culturally aware applications that scale across diverse global markets.
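
To illustrate the draft-and-verify idea behind speculative decoding (of which block verification is described as an evolution), here is a toy acceptance loop. Both "models" are random stand-ins under our own assumptions; a real deployment pairs a cheap drafter with the large target model and verifies a whole block of draft tokens in one pass.

```python
import random

def draft_tokens(prefix: list[str], k: int = 4) -> list[str]:
    # Cheap drafter proposes k tokens ahead (random stand-in).
    return [random.choice("ab") for _ in range(k)]

def target_accepts(prefix: list[str], token: str) -> bool:
    # Stand-in for the large model's verification of one draft token.
    return random.random() < 0.8

def speculative_step(prefix: list[str]) -> list[str]:
    accepted = []
    for tok in draft_tokens(prefix):
        if target_accepts(prefix + accepted, tok):
            accepted.append(tok)       # keep drafts until the first mismatch
        else:
            break
    return accepted

print(speculative_step(list("start")))
```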

Gemini provides automated feedback for theoretical computer scientists at STOC 2026

Google Research launched an experimental program for the STOC 2026 conference using a specialized Gemini model to provide automated, rigorous feedback on theoretical computer science submissions. By identifying critical logical errors and proof gaps within a 24-hour window, the tool demonstrated that advanced AI can serve as a powerful pre-vetting collaborator for high-level mathematical research. The overwhelmingly positive reception from authors indicates that AI can effectively augment the human peer-review process by improving paper quality before formal submission.

## Advanced Reasoning via Inference Scaling

- The tool used an advanced version of Gemini 2.5 Deep Think specifically optimized for mathematical rigor.
- It employed inference-scaling methods, allowing the model to explore and combine multiple possible solutions and reasoning traces simultaneously (a toy sample-and-aggregate sketch follows this summary).
- This non-linear approach to problem-solving helps the model focus on the most salient technical issues while significantly reducing the likelihood of hallucinations.

## Structured Technical Feedback

- Feedback was delivered in a structured format that included a high-level summary of the paper's core contributions.
- The model provided a detailed analysis of potential mistakes, specifically targeting errors within lemmas, theorems, and logical proofs.
- Authors also received a categorized list of minor corrections, such as inconsistent variable naming and typographical errors.

## Identified Technical Issues and Impact

- The pilot saw high engagement, with over 80% of STOC 2026 submitters opting in to the AI-generated review.
- The tool successfully identified "critical bugs" and calculation errors that had previously evaded human authors for months.
- Survey results showed that 97% of participants found the feedback helpful, and 81% reported that the tool improved the overall clarity and readability of their work.

## Expert Verification and Hallucinations

- Because the users were domain experts, they were able to act as a filter, distinguishing between deep technical insights and occasional model hallucinations.
- While the model sometimes struggled to parse complex notation or interpret figures, authors valued its neutral tone and the speed of the two-day turnaround.
- The feedback served as a starting point for human verification, allowing researchers to refine their arguments rather than blindly following the model's output.

## Future Outlook and Educational Potential

- Beyond professional research, 75% of surveyed authors see significant educational value in using the tool to train students in mathematical rigor.
- The experiment's success has led 88% of participants to express interest in having continuous access to such a tool throughout their entire research and drafting process.

The success of the STOC 2026 pilot suggests that researchers should consider integrating specialized LLMs early in the drafting phase to catch embarrassing or logic-breaking errors. While the human expert remains the final arbiter of truth, these tools provide a layer of automated verification that can accelerate the pace of scientific discovery.
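
One simple way to picture the inference-scaling idea is a sample-and-aggregate loop: run several independent reasoning traces over the same submission and keep only the issues that recur, which suppresses one-off hallucinations. This is our own illustrative reading, not the published method, and `one_review_trace` is a hypothetical stand-in for a model call.

```python
import random
from collections import Counter

POSSIBLE_ISSUES = ["gap in Lemma 3", "unproved base case", "typo in Thm 2"]

def one_review_trace(paper_text: str) -> set[str]:
    # Hypothetical stand-in for a single model reasoning trace.
    return set(random.sample(POSSIBLE_ISSUES,
                             k=random.randint(1, len(POSSIBLE_ISSUES))))

def aggregated_review(paper_text: str, n_traces: int = 8, min_votes: int = 5):
    votes = Counter()
    for _ in range(n_traces):
        votes.update(one_review_trace(paper_text))
    # Issues flagged by several independent traces are the salient ones.
    return [issue for issue, c in votes.items() if c >= min_votes]

print(aggregated_review("paper.tex contents ..."))
```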

Spotlight on innovation: Google-sponsored Data Science for Health Ideathon across Africa

Google Research, in partnership with several pan-African machine learning communities, recently concluded the Africa-wide Data Science for Health Ideathon to address regional medical challenges. By providing access to specialized open-source health models and technical mentorship, the initiative empowered local researchers to develop tailored solutions for issues ranging from maternal health to oncology. The event demonstrated that localized innovation, supported by high-performance AI foundations, can effectively bridge healthcare gaps in resource-constrained environments.

## Collaborative Framework and Objectives

* The Ideathon was launched at the 2025 Deep Learning Indaba in Kigali, Rwanda, in collaboration with SisonkeBiotik, Ro’ya, and DS-I Africa.
* The primary goal was to foster capacity building within the African AI community, moving beyond theoretical research toward the execution of practical healthcare tools.
* Participants received hands-on training on Google’s specialized health models and were supported with Google Cloud Vertex AI compute credits and mentorship from global experts.
* Submissions were evaluated on innovation, technical feasibility, and contextual relevance to African health systems.

## Technical Foundations and Google Health Models

* Developers focused on a suite of open health AI models, including MedGemma for clinical reasoning, TxGemma for therapeutics, and MedSigLIP for medical vision-language tasks.
* The competition followed a two-phase journey: an initial "Idea Development" stage where teams defined clinical problems and outlined AI approaches, followed by a "Prototype & Pitch" phase.
* Technical implementations frequently involved advanced techniques such as retrieval-augmented generation (RAG) to ensure alignment with local medical protocols and WHO guidelines.
* Teams used fine-tuning methods, specifically low-rank adaptation (LoRA), to specialize large-scale models like MedGemma-27B-IT for niche datasets (a hedged LoRA sketch follows this summary).

## Innovative Solutions for Regional Health

* **Dawa Health:** This first-place winner developed an AI-powered cervical cancer screening tool that uses MedSigLIP to identify abnormalities in colposcopy images uploaded via WhatsApp, combined with Gemini RAG for clinical guidance.
* **Solver (CerviScreen AI):** This team built a web application for automated cervical-cytology screening by fine-tuning MedGemma-27B-IT on the CRIC dataset to assist cytopathologists with annotated images.
* **Mkunga:** A maternal health call center that adapts MedGemma and Gemini to provide advice in Swahili using speech-to-text (STT) and text-to-speech (TTS) technologies.
* **HexAI (DermaDetect):** Recognized for the best proof of concept, this offline-first mobile app allows community health workers to triage skin conditions using on-device versions of MedSigLIP, specifically designed for low-connectivity areas.

The success of the Ideathon underscores the importance of "local solutions for local priorities." By making sophisticated models like MedGemma and MedSigLIP openly available, the technical barrier to entry is lowered, allowing African developers to build high-impact, culturally and linguistically relevant medical tools. For organizations looking to implement AI in global health, this model of providing foundational tools and cloud resources to local experts is a highly effective strategy for sustainable innovation.
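
As a sketch of the LoRA recipe mentioned above, the snippet below attaches low-rank adapters to a MedGemma checkpoint with the peft library. The rank, alpha, and target modules are common defaults, not the teams' actual hyperparameters, and the model id should be verified against the Hugging Face model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/medgemma-27b-text-it"   # check the published model id
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # only a tiny fraction of the 27B weights
# ...then fine-tune on the niche dataset (e.g., CRIC-style annotations)
# with a standard transformers Trainer loop.
```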

A differentially private framework for gaining insights into AI chatbot use

Google Research has introduced Urania, a novel framework designed to extract high-level usage insights from AI chatbot conversations while maintaining rigorous differential privacy (DP) guarantees. Unlike previous heuristic methods that rely on simple redaction or LLM-based PII stripping, this pipeline ensures that no individual user's data can be reconstructed from the resulting summaries. By combining DP clustering and keyword extraction with LLM-based summarization, the system provides a formal, auditable approach to understanding platform trends without compromising sensitive information.

## Limitations of Heuristic Privacy

* Existing frameworks often rely on large language models to strip personally identifiable information (PII) from text before analysis.
* These heuristic protections are difficult to formalize or audit, and their effectiveness may diminish as models evolve or face sophisticated prompt-injection attacks.
* The Urania framework addresses these weaknesses by using mathematical privacy budgets (the epsilon parameter) to measure and limit the influence of any single user's data on the final output.

## The Differentially Private Pipeline

* **DP Clustering:** The framework first converts conversation data into numerical embeddings. These are grouped with a DP clustering algorithm, ensuring that cluster centers reflect broad trends rather than specific individual inputs.
* **DP Keyword Extraction:** The system identifies keywords for each cluster and generates a histogram of their frequency. By adding calibrated noise to these counts, the framework masks individual contributions and ensures that only keywords common to many users are retained (a minimal sketch of this stage follows this summary).
* **Keyword Generation Methods:** The researchers explored three extraction methods: LLM-guided selection of relevant terms, a differentially private version of TF-IDF, and an LLM-guided approach that selects terms from a predefined list of public keywords.
* **LLM Summarization:** In the final stage, an LLM generates a high-level summary of the cluster using only the noisy, anonymized keywords. Because the LLM never sees the raw conversation text, the post-processing property of DP guarantees that the final summary remains private.

## Privacy and Utility Trade-offs

* The framework was tested against a non-private baseline (Simple-CLIO) to evaluate how privacy constraints affect the quality of the insights generated.
* Stronger privacy settings (lower epsilon values) inherently trade away some utility, as the added noise can obscure niche usage patterns.
* Despite these trade-offs, the framework provides a robust defense against data leakage: the summarization model is structurally prevented from seeing the sensitive original text, making it resilient to prompt injection.

This framework offers a scalable way for platform providers to analyze chatbot usage patterns and enforce safety policies while providing mathematical certainty about user privacy. For organizations handling sensitive conversation data, moving from heuristic redaction to formal DP pipelines like Urania provides a more robust and auditable path to service improvement.
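
The DP keyword-extraction stage can be illustrated in a few lines: cap each user's contribution, add Laplace noise calibrated to that cap, and keep only keywords whose noisy counts clear a threshold. The parameter values and the one-contribution-per-user assumption are ours; only the overall recipe follows the post.

```python
import numpy as np
from collections import Counter

def dp_keyword_histogram(user_keyword_sets, epsilon=1.0,
                         max_kw_per_user=5, threshold=10.0):
    rng = np.random.default_rng()
    counts = Counter()
    for kws in user_keyword_sets:                  # one keyword set per user
        for kw in sorted(kws)[:max_kw_per_user]:   # cap each user's influence
            counts[kw] += 1
    scale = max_kw_per_user / epsilon              # Laplace scale = sensitivity / eps
    noisy = {kw: c + rng.laplace(0.0, scale) for kw, c in counts.items()}
    # Only keywords shared by many users survive noise plus thresholding;
    # the downstream LLM summarizes from these, never from the raw text.
    return {kw: round(v, 1) for kw, v in noisy.items() if v >= threshold}

users = [{"resume", "interview"}, {"resume", "cover letter"}] * 30
print(dp_keyword_histogram(users))
```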

Titans + MIRAS: Helping AI have long-term memory

Google Research has introduced Titans, a new architecture, and MIRAS, a theoretical framework, designed to overcome the computational limitations of Transformers while maintaining high-fidelity long-term memory. These innovations use "test-time memorization," allowing models to update their core parameters in real time as they process data, without requiring offline retraining. By combining the speed of linear recurrent neural networks (RNNs) with the accuracy of attention mechanisms, the system enables AI to handle massive contexts such as genomic analysis or full-document understanding.

## Titans and Neural Long-Term Memory

* Unlike traditional RNNs that compress context into fixed-size vectors or matrices, Titans uses a multi-layer perceptron (MLP) as a dedicated long-term memory module.
* This deep neural memory provides significantly higher expressive power, allowing the model to synthesize and understand entire narratives rather than just storing passive snapshots.
* The architecture separates memory into two distinct modules: an attention mechanism for precise short-term context and the MLP for summarizing long-term information.

## The Gradient-Based Surprise Metric

* Titans employs a "surprise metric" to decide which information is important enough to store, mirroring the human brain's tendency to remember unexpected events.
* The model computes an internal error signal (a gradient); a large gradient indicates that the new input is anomalous or context-breaking, signaling that it should be prioritized for long-term storage.
* The system incorporates momentum to track the flow of context over time, ensuring that subsequent relevant information is captured even when individual tokens are not surprising.
* To manage memory capacity over extremely long sequences, an adaptive weight-decay mechanism acts as a forgetting gate, discarding information that is no longer useful. (A minimal sketch of this update rule follows this summary.)

## MIRAS: A Unified Framework for Sequence Modeling

* MIRAS provides a theoretical blueprint that views all major sequence models, including Transformers and linear RNNs, as different forms of associative memory modules.
* The framework characterizes sequence models through four key design choices: the memory architecture (e.g., MLP vs. vector), the attentional bias, the retention gate, and the learning algorithm used to combine new and old data.
* This approach shifts AI modeling toward real-time adaptation, where the model actively learns and incorporates specific new details into its core knowledge as data streams in.

These advancements suggest a shift away from static context windows toward dynamic systems capable of lifelong learning. For developers working with large-scale data, the Titans architecture provides a practical tool for scaling performance, while the MIRAS framework offers a roadmap for designing next-generation models that adapt instantly to new information.
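
In the spirit of the description above, here is a minimal test-time memory update: the surprise signal is the gradient of a reconstruction loss on the memory MLP, accumulated with momentum and decayed by a forgetting gate. Shapes and coefficients are illustrative assumptions, not the Titans configuration.

```python
import torch

memory = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
momentum = [torch.zeros_like(p) for p in memory.parameters()]
eta, theta, alpha = 0.9, 0.1, 0.01   # momentum, step size, forgetting rate

def update_memory(key: torch.Tensor, value: torch.Tensor) -> float:
    loss = ((memory(key) - value) ** 2).mean()   # big loss = surprising input
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, m, g in zip(memory.parameters(), momentum, grads):
            m.mul_(eta).sub_(theta * g)          # surprise with momentum
            p.mul_(1 - alpha).add_(m)            # weight decay = forgetting gate
    return float(loss)

# Stream tokens at test time; the memory adapts without offline retraining.
for _ in range(100):
    k = torch.randn(16)
    update_memory(k, torch.tanh(k))
```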

From Waveforms to Wisdom: The New Benchmark for Auditory Intelligence

Google Research has introduced the Massive Sound Embedding Benchmark (MSEB) to unify the fragmented landscape of machine sound intelligence. By standardizing the evaluation of eight core auditory capabilities across diverse datasets, the framework reveals that current sound representations are far from universal and leave significant performance headroom. Ultimately, MSEB provides an open-source platform to drive the development of general-purpose sound embeddings for next-generation multimodal AI.

### Diverse Datasets for Real-World Scenarios

The benchmark uses a curated collection of high-quality, accessible datasets designed to reflect global diversity and complex acoustic environments.

* **Simple Voice Questions (SVQ):** A foundational dataset featuring 177,352 short spoken queries across 17 languages and 26 locales, recorded in varying conditions like traffic and media noise.
* **Speech-MASSIVE:** Used for multilingual spoken language understanding and intent classification.
* **FSD50K:** A large-scale dataset for environmental sound event recognition containing 200 classes based on the AudioSet Ontology.
* **BirdSet:** A massive-scale benchmark specifically for avian bioacoustics and complex soundscape recordings.

### Eight Core Auditory Capabilities

MSEB is structured around "super-tasks" that represent the essential functions an intelligent auditory system must perform within a multimodal context.

* **Retrieval and Reasoning:** These tasks simulate voice search and an assistant's ability to find precise answers within documents based on spoken questions (a toy retrieval evaluation appears after this summary).
* **Classification and Transcription:** Standard perception tasks that categorize sounds by environment or intent and convert audio signals into verbatim text.
* **Segmentation and Clustering:** These involve identifying and localizing salient terms with precise timestamps, and grouping sound samples by shared attributes without predefined labels.
* **Reranking and Reconstruction:** Advanced tasks that reorder ambiguous text hypotheses to match spoken queries, and test embedding quality by regenerating the original audio waveform.

### Unified Evaluation and Performance Goals

The framework is designed to move beyond fragmented research by providing a consistent structure for evaluating different model architectures.

* **Model Agnostic:** The open framework supports evaluation of uni-modal, cascade, and end-to-end multimodal embedding models.
* **Objective Baselines:** By establishing clear performance goals, the benchmark highlights specific research opportunities where current state-of-the-art models fall short of their potential.
* **Multimodal Integration:** Every task treats sound as the critical input but incorporates other modalities, such as text context, to better simulate real-world AI interactions.

By providing a comprehensive roadmap for auditory intelligence, MSEB encourages the community to move toward universal sound embeddings. Researchers can contribute to this evolving standard by accessing the open-source GitHub repository and using the newly released datasets on Hugging Face to benchmark their own models.
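
To show what a retrieval super-task evaluation looks like mechanically, the sketch below ranks documents against query embeddings by cosine similarity and reports recall@10. The random embeddings stand in for whichever uni-modal, cascade, or end-to-end model is being benchmarked; the specific metric choice here is ours, not MSEB's.

```python
import numpy as np

rng = np.random.default_rng(0)
query_emb = rng.normal(size=(100, 256))    # e.g., one per SVQ spoken query
doc_emb = rng.normal(size=(1000, 256))     # candidate passages
targets = rng.integers(0, 1000, size=100)  # index of the correct doc

def recall_at_k(q, d, y, k=10):
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    ranks = np.argsort(-q @ d.T, axis=1)[:, :k]   # top-k docs per query
    return np.mean([y[i] in ranks[i] for i in range(len(y))])

print(f"recall@10 = {recall_at_k(query_emb, doc_emb, targets):.3f}")
```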

Reducing EV range anxiety: How a simple AI model predicts port availability

Google Research has developed a lightweight AI model that predicts the probability of EV charging port availability at specific future intervals, directly addressing the "range anxiety" experienced by electric vehicle drivers. By co-designing the model with its deployment infrastructure, researchers found that a simple linear regression approach outperformed more complex architectures like neural networks and decision trees. The resulting system effectively predicts availability changes during high-turnover periods, providing more reliable navigation and planning data than a naive "no-change" assumption.

### Model Architecture and Feature Selection

* The development team prioritized a minimal feature set to ensure low-latency deployment and high speed in real-world navigation applications.
* After testing various architectures, a straightforward linear regression model was selected for its robustness and superior performance on this predictive task (a minimal sketch follows this summary).
* The model was trained on real-time availability data from diverse geographical regions, specifically California and Germany, with an emphasis on larger charging stations that reflect high-traffic usage patterns.

### Temporal Feature Weights and Occupancy Trends

* The model uses the hour of the day as a primary feature, treating each hour as an independent variable to capture daily cycles.
* Learned numerical weights dictate the predicted rate of occupancy change: positive weights indicate ports becoming occupied (e.g., during the morning rush), while negative weights indicate ports being freed up (e.g., during evening hours).
* The system deviates from the current occupancy state only when the change rate is statistically significant or when a station's large size amplifies the likelihood of a status change.

### Performance Benchmarking and Validation

* The model was evaluated against a "keep current state" baseline, which assumes future availability will be identical to the present status, a difficult baseline to beat since port status remains unchanged roughly 90% of the time over 30-minute windows.
* Accuracy was measured with mean squared error (MSE) and mean absolute error (MAE) over 30-minute and 60-minute horizons across 100 randomly selected stations.
* Testing confirmed that the linear regression model provides its greatest value during infrequent but critical moments of high turnover, successfully identifying when a station is likely to become full or available.

The success of this model demonstrates that sophisticated deep learning is not always the optimal solution for infrastructure challenges. By combining intuitive real-world logic, such as driver schedules and station capacity, with simple machine learning techniques, developers can create highly efficient tools that significantly improve the EV user experience without massive computational overhead.
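
A minimal version of the described model is easy to write down: one-hot hour-of-day features feeding a linear regression that predicts the change in occupancy over the next 30 minutes. The synthetic data, the exact feature set, and the target definition are assumptions for illustration, not the deployed system's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5000
hour = rng.integers(0, 24, n)
occupied_frac = rng.uniform(0, 1, n)
station_size = rng.integers(2, 20, n)

# Build a per-hour daily cycle into the synthetic target: ports fill in
# the morning (positive change) and free up in the evening (negative).
daily_cycle = 0.1 * np.sin((hour - 6) / 24 * 2 * np.pi)
delta = daily_cycle * station_size / 10 + rng.normal(0, 0.05, n)

X = np.column_stack([np.eye(24)[hour],          # hour as independent one-hot
                     occupied_frac, station_size])
model = LinearRegression().fit(X, delta)

# The learned per-hour weights play the role described in the post:
# positive = ports becoming occupied, negative = ports freeing up.
print(np.round(model.coef_[:24], 3))
```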

Real-time speech-to-speech translation

Google DeepMind and Google Core ML have developed an end-to-end speech-to-speech translation (S2ST) model that enables real-time, voice-preserving communication with only a two-second delay. By replacing traditional cascaded pipelines with a streaming architecture trained on time-synchronized data, the system overcomes long-standing issues of high latency and accumulated errors. This advancement represents a significant step toward natural, fluid cross-language dialogue that retains the original speaker's vocal identity.

## Limitations of Cascaded S2ST

Traditional real-time translation systems typically rely on a cascaded chain of three distinct models: automatic speech recognition (ASR), automatic speech translation (AST), and text-to-speech (TTS). This approach suffers from several critical drawbacks:

* **High Latency:** Processing through three separate stages results in a 4–5 second delay, forcing users into unnatural, turn-based interactions.
* **Error Propagation:** Inaccuracies in the initial transcription or translation phase accumulate, often leading to garbled or incorrect final audio output.
* **Loss of Identity:** General-purpose TTS engines generate generic voices, stripping the communication of the original speaker's unique vocal characteristics.

## Time-Synced Data Acquisition Pipeline

To train an end-to-end model capable of low-latency output, researchers created a scalable pipeline that transforms raw audio into a specialized time-synchronized dataset.

* **Alignment Multi-mapping:** The process uses forced-alignment algorithms to map source audio to source text, source text to translated text, and finally translated text to generated speech.
* **Voice Preservation:** A custom TTS engine generates the target-language audio while intentionally preserving the vocal characteristics of the original speaker.
* **Strict Validation:** Automated filters discard any segments where alignment fails or where the translated audio cannot meet specific real-time delay requirements.
* **Data Augmentation:** The training set is further hardened with techniques such as sample-rate reduction, denoising, and reverberation so the model performs well in real-world environments.

## End-to-End Streaming Architecture

The model's architecture is designed for continuous audio streams, leveraging the AudioLM framework and standard transformer blocks to make real-time decisions.

* **Streaming Encoder:** This component summarizes source audio by attending to the preceding 10-second window of input.
* **Streaming Decoder:** This module predicts translated audio autoregressively, using compressed encoder states and previous predictions to maintain flow.
* **RVQ Audio Tokens:** The system represents audio as a 2D grid of residual vector quantization (RVQ) tokens, where the X-axis represents time and the Y-axis represents audio quality and fidelity (a toy RVQ encoder follows this summary).
* **SpectroStream Integration:** Using the SpectroStream codec, the model manages hierarchical audio representations, allowing it to prioritize the sequential output of audio segments for immediate playback.

This technology effectively bridges the gap between high-quality translation and real-time responsiveness. For developers and researchers in the field, the transition from modular cascaded systems to end-to-end streaming architectures, supported by rigorous time-aligned datasets, is the recommended path toward truly seamless cross-language communication.
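
The 2D token grid the post describes can be made concrete with a toy residual vector quantizer: each time frame (X-axis) is encoded by a stack of codebook levels (Y-axis), with each level quantizing the residual left by the one before. The codebooks here are random placeholders rather than trained SpectroStream codebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_levels, codebook_size, dim = 4, 8, 16
codebooks = rng.normal(size=(n_levels, codebook_size, dim))

def rvq_encode(frame: np.ndarray) -> list[int]:
    tokens, residual = [], frame
    for level in range(n_levels):            # Y-axis: coarse-to-fine fidelity
        cb = codebooks[level]
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        tokens.append(idx)
        residual = residual - cb[idx]         # quantize what is left over
    return tokens

audio_frames = rng.normal(size=(10, dim))     # X-axis: time
grid = np.array([rvq_encode(f) for f in audio_frames])  # (time, levels)
print(grid.shape, grid[0])
```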

Generative UI: A rich, custom, visual interactive user experience for any prompt

Google Research has introduced a novel generative UI framework that enables AI models to dynamically construct bespoke, interactive user experiences, including web pages, games, and functional tools, in response to any natural-language prompt. This shift from static, predefined interfaces to AI-generated environments allows for highly customized digital spaces that adapt to a user's specific intent and context. In human evaluations, these custom-generated interfaces were strongly preferred over traditional, text-heavy LLM outputs, signaling a fundamental evolution in human-computer interaction.

### Product Integration in Gemini and Google Search

The technology is currently being deployed as an experimental feature across Google's main AI consumer platforms to enhance how users visualize and interact with data.

* **Dynamic View and Visual Layout:** These experiments in the Gemini app use agentic coding capabilities to design and code a complete interactive response for every prompt.
* **AI Mode in Google Search:** Available for Google AI Pro and Ultra subscribers, this feature uses Gemini 3's multimodal understanding to build instant, bespoke interfaces for complex queries.
* **Contextual Customization:** The system differentiates between user needs, such as providing a simplified interface for a child learning about the microbiome versus a data-rich layout for an adult.
* **Task-Specific Tools:** Beyond text, the system generates functional applications like fashion advisors, event planners, and science simulations for topics like RNA transcription.

### Technical Architecture and Implementation

The generative UI implementation relies on a multi-layered approach centered on the Gemini 3 Pro model to ensure the generated code is both functional and accurate.

* **Tool Access:** The model is connected to server-side tools, including image generation and real-time web search, to enrich the UI with external data.
* **System Instructions:** Detailed guidance gives the model specific goals, formatting requirements, and technical specifications to avoid common coding errors.
* **Agentic Coding:** The model acts as both designer and developer, writing the code needed to render the UI on the fly based on its interpretation of the user's prompt.
* **Post-Processing:** Outputs undergo a series of automated checks to address common issues and refine the final visual experience before it reaches the browser (a toy generate-then-check sketch follows this summary).

### The Shift from Static to Generative Interfaces

This research represents a move away from the traditional software paradigm in which users must navigate a fixed catalog of applications to find the tool they need.

* **Prompt-Driven UX:** Interfaces are generated from prompts as simple as a single word or as complex as multi-paragraph instructions.
* **Interactive Comprehension:** By building simulations on the fly, the system creates a dynamic environment optimized for deep learning and task completion.
* **Preference Benchmarking:** Research indicates that when generation speed is excluded as a factor, users significantly prefer these custom-built visual tools over standard, static AI responses.

To experience this new paradigm, users can select the "Thinking" option from the model menu in Google Search's AI Mode or engage with the Dynamic View experiment in the Gemini app to generate tailored tools for specific learning or productivity tasks.
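
The generate-then-check pattern described above can be sketched as follows; `generate` is a hypothetical stand-in for a real model call, and the checks mimic the kind of automated post-processing the post mentions rather than Google's actual pipeline.

```python
SYSTEM_INSTRUCTIONS = """You are both UI designer and developer.
Return one self-contained HTML document with inline CSS and JavaScript.
Do not reference external scripts or stylesheets."""

def generate(system: str, prompt: str) -> str:
    # Hypothetical stand-in for a call to a model such as Gemini 3 Pro.
    return ("<!DOCTYPE html><html><body>"
            "<h1>Microbiome explorer</h1></body></html>")

def post_process(html: str) -> str:
    # Automated checks before the result reaches the browser.
    if not html.lstrip().lower().startswith("<!doctype html"):
        raise ValueError("model did not return a full document")
    if "<script src=" in html:
        raise ValueError("external scripts are not allowed")
    return html

page = post_process(generate(SYSTEM_INSTRUCTIONS,
                             "explain the microbiome to a child"))
print(page[:60])
```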

Separating natural forests from other tree cover with AI for deforestation-free supply chains

Researchers from Google DeepMind and Google Research have developed "Natural Forests of the World 2020," an AI-powered global map that distinguishes natural ecosystems from commercial tree plantations. Using high-resolution satellite data and machine learning, the project provides a critical 10-meter-resolution baseline to support deforestation-free supply chain regulations like the EUDR. This tool enables governments and companies to monitor biodiversity-rich areas with far greater accuracy, helping ensure that natural forests are protected from industrial degradation.

**The Limitation of Traditional Tree Cover Maps**

* Existing maps frequently conflate all woody vegetation into a generic "tree cover" category, leading to apples-to-oranges comparisons between different land types.
* This lack of distinction makes it difficult to differentiate between the harvesting of short-rotation plantations and the permanent loss of ancient, biodiversity-rich natural forests.
* Precise mapping is now a legal necessity due to regulations like the European Union Regulation on Deforestation-free Products (EUDR), which bans products from land deforested or degraded after December 31, 2020.

**The MTSViT Modeling Approach**

* To accurately identify forest types, researchers developed the Multi-modal Temporal-Spatial Vision Transformer (MTSViT).
* Rather than relying on a single snapshot, the model "observes" 1280 x 1280 meter patches over the course of a year to identify seasonal, spectral, and textural signatures.
* The model integrates multi-modal data, including Sentinel-2 satellite imagery, topographical information (such as elevation and slope), and geographic coordinates.
* This temporal-spatial analysis allows the model to recognize the complex patterns of natural forests that distinguish them from the uniform, fast-growing structures of commercial plantations.

**Dataset Scale and Global Validation**

* The model was trained on a massive dataset comprising over 1.2 million global patches at 10-meter resolution.
* The final map provides seamless global coverage, achieving a best-in-class validation accuracy of 92.2% against an independent global dataset.
* The research was a collaborative effort with the World Resources Institute and the International Institute for Applied Systems Analysis to ensure scientific rigor and practical utility.

The "Natural Forests of the World 2020" dataset is publicly available via Google Earth Engine and other open repositories (a hedged access sketch follows this summary). Organizations can use this high-resolution baseline to conduct environmental due diligence, support government monitoring, and target conservation efforts ahead of global climate milestones like COP30.
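
For teams starting from Earth Engine, access looks roughly like the sketch below. The asset id is a placeholder (take the published path from the project's documentation), and the region of interest is arbitrary.

```python
import ee

ee.Initialize()
# Placeholder asset id -- substitute the published Earth Engine path.
forests = ee.Image("PLACEHOLDER/natural_forests_of_the_world_2020")

roi = ee.Geometry.Rectangle([29.0, -2.5, 31.0, -1.0])  # arbitrary region
stats = forests.reduceRegion(
    reducer=ee.Reducer.frequencyHistogram(),
    geometry=roi, scale=10, maxPixels=1e9)
print(stats.getInfo())   # class histogram at the 10 m native resolution
```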