Google Research / machine-learning

NeuralGCM harnesses AI to better simulate long-range global precipitation

NeuralGCM represents a significant evolution in atmospheric modeling by combining traditional fluid dynamics with neural networks to solve the long-standing challenge of simulating global precipitation. By training the AI component directly on high-quality NASA satellite observations rather than biased reanalysis data, the model achieves unprecedented accuracy in predicting daily weather cycles and extreme rainfall events. This hybrid approach offers a faster, more precise tool for both medium-range weather forecasting and multi-decadal climate projections.

## The Limitations of Cloud Parameterization

* Precipitation is driven by cloud processes occurring at scales as small as 100 meters, which is far below the kilometer-scale resolution of global weather models.
* Traditional models rely on "parameterizations," or mathematical approximations, to estimate how these small-scale events affect the larger atmosphere.
* Because these approximations are often simplified, traditional models struggle to accurately capture the complexity of water droplet formation and ice crystal growth, leading to errors in long-term forecasts.

## Training on Direct Satellite Observations

* Unlike previous AI models trained on "reanalyses"—which are essentially simulations used to fill observational gaps—NeuralGCM is trained on NASA satellite-based precipitation data spanning 2001 to 2018.
* The model utilizes a differentiable dynamical core, an architecture that allows the neural network to learn the effects of small-scale events directly from physical observations.
* By bypassing the weaknesses inherent in reanalysis data, the model effectively creates a machine-learned parameterization that is more faithful to real-world cloud physics.

## Performance in Weather and Climate Benchmarks

* At a resolution of 280 km, NeuralGCM outperforms leading operational models in medium-range forecasts (up to 15 days) and matches the precision of sophisticated multi-decadal climate models.
* The model shows a marked improvement in capturing precipitation extremes, particularly for the top 0.1% of rainfall events.
* Evaluation through WeatherBench 2 demonstrates that NeuralGCM accurately reproduces the diurnal (daily) weather cycle, a metric where traditional physics-based models frequently fall short.

NeuralGCM provides a highly efficient and accessible framework for researchers and city planners who need to simulate long-range climate scenarios, such as 100-year storms or seasonal agricultural cycles. Its ability to maintain physical consistency while leveraging the speed of AI makes it a powerful candidate for the next generation of global atmospheric modeling.
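
To make the "differentiable dynamical core plus learned parameterization" idea concrete, here is a minimal sketch of a hybrid step trained end to end against observations. Everything in it is illustrative: the toy state, `physics_step`, and `mlp_correction` are hypothetical stand-ins, not NeuralGCM's actual components.

```python
import jax
import jax.numpy as jnp

def physics_step(state, dt=0.1):
    # Stand-in for the resolved fluid dynamics (here: simple damping).
    return state + dt * (-0.05 * state)

def mlp_correction(params, state):
    # Tiny neural network learning the effect of unresolved (sub-grid) processes.
    h = jnp.tanh(state @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

def hybrid_step(params, state):
    # Physics tendency plus learned tendency, both differentiable end to end.
    return physics_step(state) + mlp_correction(params, state)

def rollout_loss(params, state0, observations):
    # Unroll a short trajectory and penalize mismatch with (synthetic) observations.
    state, loss = state0, 0.0
    for obs in observations:
        state = hybrid_step(params, state)
        loss += jnp.mean((state - obs) ** 2)
    return loss / len(observations)

key = jax.random.PRNGKey(0)
dim, hidden = 8, 16
params = {
    "w1": jax.random.normal(key, (dim, hidden)) * 0.1,
    "b1": jnp.zeros(hidden),
    "w2": jax.random.normal(key, (hidden, dim)) * 0.1,
    "b2": jnp.zeros(dim),
}
state0 = jax.random.normal(key, (dim,))
observations = [state0 * 0.9, state0 * 0.8, state0 * 0.7]  # synthetic targets

# Because the physics step is written in JAX, gradients flow through it,
# letting the correction network be trained directly against observations.
grads = jax.grad(rollout_loss)(params, state0, observations)
```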

Spotlight on innovation: Google-sponsored Data Science for Health Ideathon across Africa

Google Research, in partnership with several pan-African machine learning communities, recently concluded the Africa-wide Data Science for Health Ideathon to address regional medical challenges. By providing access to specialized open-source health models and technical mentorship, the initiative empowered local researchers to develop tailored solutions for issues ranging from maternal health to oncology. The event demonstrated that localized innovation, supported by high-performance AI foundations, can effectively bridge healthcare gaps in resource-constrained environments.

## Collaborative Framework and Objectives

* The Ideathon was launched at the 2025 Deep Learning Indaba in Kigali, Rwanda, in collaboration with SisonkeBiotik, Ro’ya, and DS-I Africa.
* The primary goal was to foster capacity building within the African AI community, moving beyond theoretical research toward the execution of practical healthcare tools.
* Participants received hands-on training on Google’s specialized health models and were supported with Google Cloud Vertex AI compute credits and mentorship from global experts.
* Submissions were evaluated based on their innovation, technical feasibility, and contextual relevance to African health systems.

## Technical Foundations and Google Health Models

* Developers focused on a suite of open health AI models, including MedGemma for clinical reasoning, TxGemma for therapeutics, and MedSigLIP for medical vision-language tasks.
* The competition utilized a two-phase journey: an initial "Idea Development" stage where teams defined clinical problems and outlined AI approaches, followed by a "Prototype & Pitch" phase.
* Technical implementations frequently involved advanced techniques such as Retrieval-Augmented Generation (RAG) to ensure alignment with local medical protocols and WHO guidelines.
* Fine-tuning methods, specifically Low-Rank Adaptation (LoRA), were utilized by teams to specialize large-scale models like MedGemma-27B-IT for niche datasets.

## Innovative Solutions for Regional Health

* **Dawa Health:** This first-place winner developed an AI-powered cervical cancer screening tool that uses MedSigLIP to identify abnormalities in colposcopy images uploaded via WhatsApp, combined with Gemini RAG for clinical guidance.
* **Solver (CerviScreen AI):** This team built a web application for automated cervical-cytology screening by fine-tuning MedGemma-27B-IT on the CRIC dataset to assist cytopathologists with annotated images.
* **Mkunga:** A maternal health call center that adapts MedGemma and Gemini to provide advice in Swahili using Speech-to-Text (STT) and Text-to-Speech (TTS) technologies.
* **HexAI (DermaDetect):** Recognized for the best proof-of-concept, this offline-first mobile app allows community health workers to triage skin conditions using on-device versions of MedSigLIP, specifically designed for low-connectivity areas.

The success of the Ideathon underscores the importance of "local solutions for local priorities." By making sophisticated models like MedGemma and MedSigLIP openly available, the technical barrier to entry is lowered, allowing African developers to build high-impact, culturally and linguistically relevant medical tools. For organizations looking to implement AI in global health, this model of providing foundational tools and cloud resources to local experts remains a highly effective strategy for sustainable innovation.
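
As a rough illustration of the LoRA fine-tuning approach the teams reportedly used, here is a minimal sketch with the Hugging Face PEFT library. The model identifier, target modules, and hyperparameters below are assumptions for illustration only, not any team's actual configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/medgemma-27b-it"  # placeholder identifier; verify the actual model id
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension: small, cheap adapters
    lora_alpha=32,                         # scaling factor for the adapter updates
    target_modules=["q_proj", "v_proj"],   # assumed attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Wrap the frozen base model with trainable low-rank adapters; only the adapters
# are updated during fine-tuning on the niche dataset.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```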

From Waveforms to Wisdom: The New Benchmark for Auditory Intelligence

Google Research has introduced the Massive Sound Embedding Benchmark (MSEB) to unify the fragmented landscape of machine sound intelligence. By standardizing the evaluation of eight core auditory capabilities across diverse datasets, the framework reveals that current sound representations are far from universal and have significant performance "headroom" for improvement. Ultimately, MSEB provides an open-source platform to drive the development of general-purpose sound embeddings for next-generation multimodal AI.

### Diverse Datasets for Real-World Scenarios

The benchmark utilizes a curated collection of high-quality, accessible datasets designed to reflect global diversity and complex acoustic environments.

* **Simple Voice Questions (SVQ):** A foundational dataset featuring 177,352 short spoken queries across 17 languages and 26 locales, recorded in varying conditions like traffic and media noise.
* **Speech-MASSIVE:** Used for multilingual spoken language understanding and intent classification.
* **FSD50K:** A large-scale dataset for environmental sound event recognition containing 200 classes based on the AudioSet Ontology.
* **BirdSet:** A massive-scale benchmark specifically for avian bioacoustics and complex soundscape recordings.

### Eight Core Auditory Capabilities

MSEB is structured around "super-tasks" that represent the essential functions an intelligent auditory system must perform within a multimodal context.

* **Retrieval and Reasoning:** These tasks simulate voice search and the ability of an assistant to find precise answers within documents based on spoken questions.
* **Classification and Transcription:** Standard perception tasks that categorize sounds by environment or intent and convert audio signals into verbatim text.
* **Segmentation and Clustering:** These involve identifying and localizing salient terms with precise timestamps and grouping sound samples by shared attributes without predefined labels.
* **Reranking and Reconstruction:** Advanced tasks that reorder ambiguous text hypotheses to match spoken queries and test embedding quality by regenerating original audio waveforms.

### Unified Evaluation and Performance Goals

The framework is designed to move beyond fragmented research by providing a consistent structure for evaluating different model architectures.

* **Model Agnostic:** The open framework allows for the evaluation of uni-modal, cascade, and end-to-end multimodal embedding models.
* **Objective Baselines:** By establishing clear performance goals, the benchmark highlights specific research opportunities where current state-of-the-art models fall short of their potential.
* **Multimodal Integration:** Every task assumes sound is the critical input but incorporates other modalities, such as text context, to better simulate real-world AI interactions.

By providing a comprehensive roadmap for auditory intelligence, MSEB encourages the community to move toward universal sound embeddings. Researchers can contribute to this evolving standard by accessing the open-source GitHub repository and utilizing the newly released datasets on Hugging Face to benchmark their own models.
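
To show how a retrieval-style "super-task" can be scored from fixed embeddings, here is a minimal, generic sketch: embed spoken queries and candidate documents, rank by cosine similarity, and report recall@k. This is not MSEB's actual evaluation code; the random arrays stand in for real embeddings produced by whatever model is being benchmarked.

```python
import numpy as np

def cosine_rank(query_embs, doc_embs):
    # L2-normalize so dot products become cosine similarities, then rank documents.
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = q @ d.T                       # (num_queries, num_docs)
    return np.argsort(-scores, axis=1)     # best-matching documents first

def recall_at_k(rankings, ground_truth, k=10):
    hits = [gt in ranking[:k] for ranking, gt in zip(rankings, ground_truth)]
    return float(np.mean(hits))

# Toy data standing in for embeddings of spoken queries and candidate documents.
rng = np.random.default_rng(0)
query_embs = rng.normal(size=(5, 128))
doc_embs = rng.normal(size=(100, 128))
ground_truth = [3, 17, 42, 8, 99]          # index of the correct document per query

rankings = cosine_rank(query_embs, doc_embs)
print("recall@10:", recall_at_k(rankings, ground_truth, k=10))
```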

Reducing EV range anxiety: How a simple AI model predicts port availability

Google Research has developed a lightweight AI model designed to predict the probability of EV charging port availability at specific future intervals, directly addressing the "range anxiety" experienced by electric vehicle drivers. By co-designing the model with deployment infrastructure, researchers found that a simple linear regression approach outperformed more complex architectures like neural networks and decision trees. The resulting system effectively predicts availability changes during high-turnover periods, providing more reliable navigation and planning data than traditional "no-change" assumptions.

### Model Architecture and Feature Selection

* The development team prioritized a minimal feature set to ensure low-latency deployment and high speed in real-world navigational applications.
* After testing various architectures, a straightforward linear regression model was selected for its robustness and superior performance in this specific predictive task.
* The model was trained using real-time availability data from diverse geographical regions, specifically California and Germany, with an emphasis on larger charging stations that reflect high-traffic usage patterns.

### Temporal Feature Weights and Occupancy Trends

* The model uses the hour of the day as a primary feature, treating each hour as an independent variable to capture specific daily cycles.
* Learned numerical "weights" dictate the predicted rate of occupancy change: positive weights indicate ports are becoming occupied (e.g., during morning rush), while negative weights indicate ports are being freed up (e.g., during evening hours).
* The system is designed to only deviate from the current occupancy state when the change rate is statistically significant or when a station's large size amplifies the likelihood of a status change.

### Performance Benchmarking and Validation

* The model was evaluated against a "Keep Current State" baseline, which assumes future availability will be identical to the present status—a difficult baseline to beat since port status remains unchanged roughly 90% of the time over 30-minute windows.
* Accuracy was measured using Mean Squared Error (MSE) and Mean Absolute Error (MAE) over 30-minute and 60-minute time horizons across 100 randomly selected stations.
* Testing confirmed that the linear regression model provides its greatest value during infrequent but critical moments of high turnover, successfully identifying when a station is likely to become full or available.

The success of this model demonstrates that sophisticated deep learning is not always the optimal solution for infrastructure challenges. By combining intuitive real-world logic—such as driver schedules and station capacity—with simple machine learning techniques, developers can create highly efficient tools that significantly improve the EV user experience without requiring massive computational overhead.
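
Here is a minimal sketch of the kind of model described above: a linear regression over one-hot hour-of-day features plus current occupancy, predicting occupancy a fixed horizon ahead and compared against a "keep current state" baseline. The synthetic data and exact feature choices are illustrative assumptions, not the production feature set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 5000
hour = rng.integers(0, 24, size=n)
current_occ = rng.uniform(0.0, 1.0, size=n)

# Synthetic "true" dynamics: occupancy tends to rise around midday and fall overnight.
hourly_drift = 0.05 * np.sin((hour - 6) / 24.0 * 2 * np.pi)
future_occ = np.clip(current_occ + hourly_drift + rng.normal(0, 0.02, size=n), 0, 1)

# Each hour becomes its own feature (one-hot), so the model learns a per-hour weight
# whose sign indicates whether ports are filling up or freeing up at that time.
hour_onehot = np.eye(24)[hour]
X = np.hstack([hour_onehot, current_occ[:, None]])

model = LinearRegression().fit(X[:4000], future_occ[:4000])
pred = model.predict(X[4000:])

print("model MAE:   ", mean_absolute_error(future_occ[4000:], pred))
print("baseline MAE:", mean_absolute_error(future_occ[4000:], current_occ[4000:]))
print("learned per-hour weights:", np.round(model.coef_[:24], 3))
```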

Real-time speech-to-speech translation

Google DeepMind and Google Core ML have developed an innovative end-to-end speech-to-speech translation (S2ST) model that enables real-time, voice-preserved communication with only a two-second delay. By replacing traditional cascaded pipelines with a streaming architecture trained on time-synchronized data, the system overcomes long-standing issues of high latency and accumulated errors. This advancement represents a significant shift toward natural, fluid cross-language dialogue that retains the original speaker's personality.

## Limitations of Cascaded S2ST

Traditional real-time translation systems typically rely on a cascaded chain of three distinct AI models: Automatic Speech Recognition (ASR), Automatic Speech Translation (AST), and Text-to-Speech (TTS). This approach suffers from several critical drawbacks:

* **High Latency:** Processing through three separate stages results in a 4–5 second delay, forcing users into unnatural, turn-based interactions.
* **Error Propagation:** Inaccuracies in the initial transcription or translation phase accumulate, often leading to garbled or incorrect final audio output.
* **Loss of Identity:** General-purpose TTS engines generate generic voices, stripping the communication of the original speaker’s unique vocal characteristics.

## Time-Synced Data Acquisition Pipeline

To train an end-to-end model capable of low-latency output, researchers created a scalable pipeline that transforms raw audio into a specialized time-synchronized dataset.

* **Alignment Multi-mapping:** The process uses forced alignment algorithms to map source audio to source text, source text to translated text, and finally, translated text to generated speech.
* **Voice Preservation:** A custom TTS engine generates the target language audio while intentionally preserving the vocal characteristics of the original speaker.
* **Strict Validation:** Automated filters discard any segments where alignments fail or where the translated audio cannot meet specific real-time delay requirements.
* **Data Augmentation:** The training set is further refined using techniques such as sample rate reduction, denoising, and reverberation to ensure the model performs well in real-world environments.

## End-to-End Streaming Architecture

The model’s architecture is designed for continuous audio streams, leveraging the AudioLM framework and fundamental transformer blocks to make real-time decisions.

* **Streaming Encoder:** This component summarizes source audio data by focusing on the preceding 10-second window of input.
* **Streaming Decoder:** This module predicts translated audio autoregressively, utilizing compressed encoder states and previous predictions to maintain flow.
* **RVQ Audio Tokens:** The system represents audio as a 2D set of Residual Vector Quantization (RVQ) tokens, where the X-axis represents time and the Y-axis represents audio quality/fidelity.
* **SpectroStream Integration:** By using SpectroStream codec technology, the model manages hierarchical audio representations, allowing it to prioritize the sequential output of audio segments for immediate playback.

This technology effectively bridges the gap between high-quality translation and real-time responsiveness. For developers and researchers in the field, the transition from modular cascaded systems to end-to-end streaming architectures—supported by rigorous time-aligned datasets—is the recommended path for achieving truly seamless human-to-human cross-language communication.
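
To clarify the 2D token format the streaming decoder emits, here is a minimal sketch of residual vector quantization (RVQ): each audio frame becomes a column of tokens, with time along one axis and quantizer depth (coarse-to-fine fidelity) along the other. The codebooks, sizes, and features below are illustrative, not the SpectroStream codec itself.

```python
import numpy as np

rng = np.random.default_rng(0)
frame_dim, codebook_size, num_quantizers = 64, 256, 4
codebooks = rng.normal(size=(num_quantizers, codebook_size, frame_dim))

def rvq_encode(frame, codebooks):
    """Quantize one frame vector into num_quantizers token ids, coarse to fine."""
    residual, tokens = frame, []
    for codebook in codebooks:
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        tokens.append(idx)
        residual = residual - codebook[idx]   # what remains for the next, finer level
    return tokens

def rvq_decode(tokens, codebooks):
    # Reconstruct the frame by summing the selected codewords across levels.
    return sum(codebook[idx] for codebook, idx in zip(codebooks, tokens))

frames = rng.normal(size=(10, frame_dim))           # 10 frames of "audio" features
token_grid = np.array([rvq_encode(f, codebooks) for f in frames])
print(token_grid.shape)  # (time=10, depth=4): the 2D token grid described above
```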

Separating natural forests from other tree cover with AI for deforestation-free supply chains

Researchers from Google DeepMind and Google Research have developed "Natural Forests of the World 2020," an AI-powered global map that distinguishes natural ecosystems from commercial tree plantations. By utilizing high-resolution satellite data and machine learning, the project provides a critical 10-meter resolution baseline to support deforestation-free supply chain regulations like the EUDR. This tool enables governments and companies to monitor biodiversity-rich areas with unprecedented accuracy, ensuring that natural forests are protected from industrial degradation.

**The Limitation of Traditional Tree Cover Maps**

* Existing maps frequently conflate all woody vegetation into a generic "tree cover" category, leading to "apples-to-oranges" comparisons between different land types.
* This lack of distinction makes it difficult to differentiate between the harvesting of short-term plantations and the permanent loss of ancient, biodiversity-rich natural forests.
* Precise mapping is now a legal necessity due to regulations like the European Union Regulation on Deforestation-free Products (EUDR), which bans products from land deforested or degraded after December 31, 2020.

**The MTSViT Modeling Approach**

* To accurately identify forest types, researchers developed the Multi-modal Temporal-Spatial Vision Transformer (MTSViT).
* Rather than relying on a single snapshot, the AI "observes" 1280 x 1280 meter patches over the course of a year to identify seasonal, spectral, and textural signatures.
* The model integrates multi-modal data, including Sentinel-2 satellite imagery, topographical information (such as elevation and slope), and specific geographical coordinates.
* This temporal-spatial analysis allows the AI to recognize the complex patterns of natural forests that distinguish them from the uniform, fast-growing structures of commercial plantations.

**Dataset Scale and Global Validation**

* The model was trained on a massive dataset comprising over 1.2 million global patches at 10-meter resolution.
* The final map provides seamless global coverage, achieving a best-in-class validation accuracy of 92.2% against an independent global dataset.
* The research was a collaborative effort involving the World Resources Institute and the International Institute for Applied Systems Analysis to ensure scientific rigor and practical utility.

The "Natural Forests of the World 2020" dataset is publicly available via Google Earth Engine and other open repositories. Organizations should leverage this high-resolution baseline to conduct environmental due diligence, support government monitoring, and target conservation efforts in preparation for global climate milestones like COP30.
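
As a rough illustration of the multi-modal, temporal-spatial input described above, here is a sketch of assembling one patch: a year of Sentinel-2 observations over a 1280 x 1280 m area (128 x 128 pixels at 10 m), stacked with static topography and coordinates. Band counts, time steps, and array names are illustrative assumptions, not MTSViT's actual input specification.

```python
import numpy as np

rng = np.random.default_rng(0)
T, BANDS, H, W = 12, 10, 128, 128          # monthly composites, spectral bands, 10 m pixels

sentinel2 = rng.random((T, BANDS, H, W))   # one year of optical observations
elevation = rng.random((1, H, W))          # static topography: elevation
slope = rng.random((1, H, W))              # static topography: slope
lat, lon = 1.23, 103.45                    # patch-centre coordinates (illustrative)

# Broadcast static layers across time so every time step carries the same context,
# then concatenate along the channel axis to form one model input.
static = np.broadcast_to(np.concatenate([elevation, slope], axis=0), (T, 2, H, W))
coords = np.zeros((T, 2, H, W))
coords[:, 0, :, :] = lat / 90.0            # normalized latitude channel
coords[:, 1, :, :] = lon / 180.0           # normalized longitude channel

patch_input = np.concatenate([sentinel2, static, coords], axis=1)
print(patch_input.shape)  # (12, 14, 128, 128): time x channels x height x width
```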

Differentially private machine learning at scale with JAX-Privacy

Google DeepMind and Google Research have announced the release of JAX-Privacy 1.0, a high-performance library designed to scale differentially private (DP) machine learning. By leveraging JAX’s native parallelization and functional programming model, the toolkit enables researchers to train large-scale foundation models while maintaining rigorous privacy guarantees. This version introduces modular components for advanced algorithms and empirical auditing, making private training both computationally efficient and verifiable across distributed environments.

### Scaling Differential Privacy with JAX

* The library is built directly on the JAX ecosystem, integrating seamlessly with Flax for neural network architectures and Optax for optimization.
* It utilizes JAX’s `vmap` for automatic vectorization and `shard_map` for single-program multiple-data (SPMD) parallelization, allowing DP primitives to scale across multiple accelerators.
* By using just-in-time (JIT) compilation, the library mitigates the traditional performance overhead associated with per-example gradient clipping and noise addition.

### Core Components and Advanced Algorithms

* The toolkit provides fundamental building blocks for implementing standard DP algorithms like DP-SGD and DP-FTRL, including specialized modules for data batch construction.
* It supports state-of-the-art methods such as DP matrix factorization, which improves performance by injecting correlated noise across training iterations.
* Features like micro-batching and padding are included to handle the massive, variable-sized batches often required to achieve an optimal balance between privacy and model utility.

### Verification and Privacy Auditing

* JAX-Privacy incorporates rigorous privacy accounting based on Rényi Differential Privacy to provide precise tracking of privacy budgets.
* The library includes tools for empirical auditing, allowing developers to validate their privacy guarantees through techniques like membership inference attacks and data poisoning.
* The design ensures correctness in distributed settings, specifically focusing on consistent noise generation and gradient synchronization across clusters.

JAX-Privacy 1.0 is a robust solution for researchers and engineers who need to deploy production-grade private models. Its modular architecture and integration with high-performance computing primitives make it a primary choice for training foundation models on sensitive datasets without compromising on scalability or security.
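
To show the DP-SGD pattern the library builds on, here is a hand-rolled minimal sketch: per-example gradients via `jax.vmap`, per-example L2 clipping, then calibrated Gaussian noise on the summed gradient. This illustrates the primitive, not the JAX-Privacy API itself; the model, loss, and noise multiplier are toy placeholders.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy linear model; stands in for any Flax model the library would wrap.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

def clip_grad(grad, max_norm):
    # Rescale one example's gradient so its global L2 norm is at most max_norm.
    leaves = jax.tree_util.tree_leaves(grad)
    norm = jnp.sqrt(sum(jnp.sum(g ** 2) for g in leaves))
    scale = jnp.minimum(1.0, max_norm / (norm + 1e-12))
    return jax.tree_util.tree_map(lambda g: g * scale, grad)

def dp_grad(params, xs, ys, key, max_norm=1.0, noise_mult=1.1):
    # vmap materializes one gradient per example, the core primitive DP-SGD needs.
    per_example = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, xs, ys)
    clipped = jax.vmap(clip_grad, in_axes=(0, None))(per_example, max_norm)
    summed = jax.tree_util.tree_map(lambda g: jnp.sum(g, axis=0), clipped)
    # Gaussian noise calibrated to the clipping norm, one key per parameter leaf.
    leaves, treedef = jax.tree_util.tree_flatten(summed)
    keys = jax.random.split(key, len(leaves))
    sigma = noise_mult * max_norm
    noisy = [g + sigma * jax.random.normal(k, g.shape) for g, k in zip(leaves, keys)]
    return jax.tree_util.tree_map(lambda g: g / xs.shape[0],
                                  jax.tree_util.tree_unflatten(treedef, noisy))

key = jax.random.PRNGKey(0)
params = {"w": jnp.zeros((3,)), "b": jnp.array(0.0)}
xs = jax.random.normal(key, (32, 3))
ys = jax.random.normal(key, (32,))

noisy_grads = dp_grad(params, xs, ys, key)
params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g, params, noisy_grads)
```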

DS-STAR: A state-of-the-art versatile data science agent

DS-STAR is an advanced autonomous data science agent developed to handle the complexity and heterogeneity of real-world data tasks, ranging from statistical analysis to visualization. By integrating a specialized file analysis module with an iterative planning and verification loop, the system can interpret unstructured data and refine its reasoning steps dynamically based on execution feedback. This architecture allows DS-STAR to achieve state-of-the-art performance on major industry benchmarks, effectively bridging the gap between natural language queries and executable, verified code.

## Comprehensive Data File Analysis

The framework addresses a major limitation of current agents—the over-reliance on structured CSV files—by implementing a dedicated analysis stage for diverse data formats.

* The system automatically scans a directory to extract context from heterogeneous formats, including JSON, unstructured text, and markdown files.
* A Python-based analysis script generates a textual summary of the data structure and content, which serves as the foundational context for the planning phase.
* This module ensures the agent can navigate complex, multi-file environments where critical information is often spread across non-relational sources.

## Iterative Planning and Verification Architecture

DS-STAR utilizes a sophisticated loop involving four specialized roles to mimic the workflow of a human expert conducting sequential analysis.

* **Planner and Coder:** A Planner agent establishes high-level objectives, which a Coder agent then translates into executable Python scripts.
* **LLM-based Verification:** A Verifier agent acts as a judge, assessing whether the generated code and its output are sufficient to solve the problem or if the reasoning is flawed.
* **Dynamic Routing:** If the Verifier identifies gaps, a Router agent guides the refinement process by adding new steps or correcting errors, allowing the cycle to repeat for up to 10 rounds.
* **Intermediate Review:** The agent reviews intermediate results before proceeding to the next step, similar to how data scientists use interactive environments like Google Colab.

## Benchmarking and State-of-the-Art Performance

The effectiveness of the DS-STAR framework was validated through rigorous testing against existing agents like AutoGen and DA-Agent.

* The agent secured the top rank on the public DABStep leaderboard, raising accuracy from 41.0% to 45.2% compared to previous best-performing models.
* Performance gains were consistent across other benchmarks, including KramaBench (39.8% to 44.7%) and DA-Code (37.0% to 38.5%).
* DS-STAR showed a significant advantage in "hard" tasks—those requiring the synthesis of information from multiple, varied data sources—demonstrating its superior versatility in complex environments.

By automating the time-intensive tasks of data wrangling and verification, DS-STAR provides a robust template for the next generation of AI assistants. Organizations looking to scale their data science capabilities should consider adopting iterative agentic workflows that prioritize multi-format data understanding and self-correcting execution loops.
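
Below is a skeleton of an iterative plan, code, verify, and route loop of the kind described above. The `llm` and `run_python` callables are hypothetical placeholders for a model call and a sandboxed executor; the prompts and the 10-round cap mirror the description, not DS-STAR's actual implementation.

```python
MAX_ROUNDS = 10

def analysis_loop(question, data_summary, llm, run_python):
    """Iterative plan -> code -> verify -> route loop; stops when verified or capped."""
    plan, history = [], []
    for _ in range(MAX_ROUNDS):
        # Planner: propose the next analysis step given the data context so far.
        step = llm(f"Question: {question}\nData: {data_summary}\n"
                   f"Plan so far: {plan}\nPropose the next step.")
        plan.append(step)

        # Coder: turn the step into executable Python and run it in a sandbox.
        code = llm(f"Write Python code for this step: {step}")
        result = run_python(code)
        history.append({"step": step, "code": code, "result": result})

        # Verifier: judge whether the accumulated results answer the question.
        verdict = llm(f"Question: {question}\nResults: {history}\n"
                      "Reply 'sufficient' or describe what is missing or flawed.")
        if verdict.strip().lower().startswith("sufficient"):
            return llm(f"Write the final answer to '{question}' using: {history}")

        # Router: either repair the flagged step or let the loop add a new one.
        if "flaw" in verdict.lower():
            plan.pop()        # retry the flawed step in the next round
            history.pop()
    return None  # no verified answer within the round budget
```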

Forecasting the future of forests with AI: From counting losses to predicting risk

Research from Google DeepMind and Google Research introduces ForestCast, a deep learning-based framework designed to transition forest management from retrospective loss monitoring to proactive risk forecasting. By utilizing vision transformers and pure satellite data, the team has developed a scalable method to predict future deforestation that matches or exceeds the accuracy of traditional models dependent on inconsistent manual inputs. This approach provides a repeatable, future-proof benchmark for protecting biodiversity and mitigating climate change on a global scale.

### Limitations of Traditional Forecasting

* Existing state-of-the-art models rely on specialized geospatial maps, such as infrastructure development, road networks, and regional economic indicators.
* These traditional inputs are often "patchy" and inconsistent across different countries, requiring manual assembly that is difficult to replicate globally.
* Manual data sources are not future-proof; they tend to go out of date quickly with no guarantee of regular updates, unlike continuous satellite streams.

### A Scalable Pure-Satellite Architecture

* The ForestCast model adopts a "pure satellite" approach, using only raw inputs from Landsat and Sentinel-2 satellites.
* The architecture is built on vision transformers (ViTs) that process an entire tile of pixels in a single pass to capture critical spatial context and landscape-level trends.
* The model incorporates a satellite-derived "change history" layer, which identifies previously deforested pixels and the specific year the loss occurred.
* By avoiding socio-political or infrastructure maps, the method can be applied consistently to any region on Earth, allowing for meaningful cross-regional comparisons.

### Key Findings and Benchmark Release

* Research indicates that "change history" is the most information-dense input; a model trained on this data alone performs almost as well as those using raw multi-spectral data.
* The model successfully predicts tile-to-tile variation in deforestation amounts and identifies the specific pixels most likely to be cleared next.
* Google has released the training and evaluation data as a public benchmark dataset, focusing initially on Southeast Asia to allow the machine learning community to verify and improve upon the results.

The release of ForestCast provides a template for scaling predictive modeling to Latin America, Africa, and boreal latitudes. Conservationists and policymakers should utilize these forecasting tools to move beyond counting historical losses and instead direct resources toward "frontline" areas where the model identifies imminent risk of habitat conversion.
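
To make the "change history" layer concrete, here is a minimal sketch of deriving per-pixel features from a stack of annual forest-loss masks: whether loss has ever occurred and how long ago. The years, shapes, and thresholds are illustrative, not the dataset's actual encoding.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2001, 2021)
H, W = 64, 64
annual_loss = rng.random((len(years), H, W)) < 0.01   # True where loss detected that year

def change_history(annual_loss, years, reference_year=2020):
    """Per pixel: (ever-lost flag, years since the most recent loss; -1 if never lost)."""
    ever_lost = annual_loss.any(axis=0)
    # Index of the most recent year with loss (argmax on the time-reversed stack).
    last_idx = len(years) - 1 - np.argmax(annual_loss[::-1], axis=0)
    years_since = np.where(ever_lost, reference_year - years[last_idx], -1)
    return ever_lost.astype(np.float32), years_since.astype(np.float32)

ever_lost, years_since = change_history(annual_loss, years)
print(ever_lost.shape, years_since.shape)   # (64, 64) each: two extra model input channels
```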

Exploring a space-based, scalable AI infrastructure system design

Project Suncatcher is a Google moonshot initiative aimed at scaling machine learning infrastructure by deploying solar-powered satellite constellations equipped with Tensor Processing Units (TPUs). By leveraging the nearly continuous energy of the sun in specific orbits and utilizing high-bandwidth free-space optical links, the project seeks to bypass the resource constraints of terrestrial data centers. Early research suggests that a modular, tightly clustered satellite design can achieve the necessary compute density and communication speeds required for modern AI workloads.

### Data-Center Bandwidth via Optical Links

* To match terrestrial performance, inter-satellite links must support tens of terabits per second using multi-channel dense wavelength-division multiplexing (DWDM) and spatial multiplexing.
* The system addresses signal power loss (the link budget) by maintaining satellites in extremely close proximity—kilometers or less—compared to traditional long-range satellite deployments.
* Initial bench-scale demonstrations have successfully achieved 800 Gbps each-way transmission (1.6 Tbps total) using a single transceiver pair, validating the feasibility of high-speed optical networking.

### Orbital Mechanics of Compact Constellations

* The proposed system utilizes a sun-synchronous low-earth orbit (LEO) at an altitude of approximately 650 km to maximize solar exposure and minimize the weight of onboard batteries.
* Researchers use Hill-Clohessy-Wiltshire equations and JAX-based differentiable models to manage the complex gravitational perturbations and atmospheric drag affecting satellites flying in tight 100–200m formations.
* Simulations of 81-satellite clusters indicate that only modest station-keeping maneuvers are required to maintain stable, "free-fall" trajectories within the orbital plane.

### Hardware Resilience in Space Environments

* The project specifically tests Google’s Trillium (v6e) Cloud TPUs to determine if terrestrial AI accelerators can survive the radiation found in LEO.
* Hardware is subjected to 67MeV proton beams to analyze the impact of Total Ionizing Dose (TID) and Single Event Effects (SEEs) on processing reliability.
* Preliminary testing indicates promising results for the radiation tolerance of high-performance accelerators, suggesting that standard TPU architectures may be viable for orbital deployment with minimal modification.

While still in the research and development phase, Project Suncatcher suggests that the future of massive AI scaling may involve shifting infrastructure away from terrestrial limits and toward modular, energy-rich orbital environments. Organizations should monitor the progress of free-space optical communication and radiation-hardened accelerators as these technologies will be the primary gatekeepers for space-based computation.
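
For intuition about the formation-flying analysis, here is a minimal sketch of the Hill-Clohessy-Wiltshire (HCW) linearized relative-motion equations integrated over one orbit. The altitude, initial offsets, and step size are illustrative; the actual analysis uses JAX-based differentiable models that also include perturbations and atmospheric drag, which this toy integrator omits.

```python
import numpy as np

MU_EARTH = 3.986004418e14              # m^3 / s^2
a = 6371e3 + 650e3                     # ~650 km circular orbit radius (m)
n = np.sqrt(MU_EARTH / a**3)           # mean motion (rad/s)

def hcw_derivative(state, n):
    """HCW dynamics; x radial, y along-track, z cross-track, all in meters."""
    x, y, z, vx, vy, vz = state
    ax = 3 * n**2 * x + 2 * n * vy
    ay = -2 * n * vx
    az = -(n**2) * z
    return np.array([vx, vy, vz, ax, ay, az])

def propagate(state, n, dt=1.0, steps=None):
    """Plain RK4 integration over roughly one orbital period."""
    steps = steps or int(2 * np.pi / (n * dt))
    for _ in range(steps):
        k1 = hcw_derivative(state, n)
        k2 = hcw_derivative(state + 0.5 * dt * k1, n)
        k3 = hcw_derivative(state + 0.5 * dt * k2, n)
        k4 = hcw_derivative(state + dt * k3, n)
        state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

# A satellite displaced 10 m radially from the cluster reference accumulates a
# secular along-track drift of a few hundred meters per orbit, which the modest
# station-keeping maneuvers mentioned above must counter.
initial = np.array([10.0, 0.0, 0.0, 0.0, 0.0, 0.0])
final = propagate(initial, n)
print("position drift after ~1 orbit (m):", np.round(final[:3] - initial[:3], 1))
```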

Solving virtual machine puzzles: How AI is optimizing cloud computing

Google researchers have developed LAVA, a scheduling framework designed to optimize virtual machine (VM) allocation in large-scale data centers by accurately predicting and adapting to VM lifespans. By moving beyond static, one-time predictions toward a "continuous re-prediction" model based on survival analysis, the system significantly improves resource efficiency and reduces fragmentation. This approach allows cloud providers to solve the complex "bin packing" problem more effectively, leading to better capacity utilization and easier system maintenance.

### The Challenge of Long-Tailed VM Distributions

* Cloud workloads exhibit an extreme long-tailed distribution: while 88% of VMs live for less than an hour, these short-lived jobs consume only 2% of total resources.
* The rare VMs that run for 30 days or longer account for a massive fraction of compute resources, meaning their placement has a disproportionate impact on host availability.
* Poor allocation leads to "resource stranding," where a server's remaining capacity is too small or unbalanced to host new VMs, effectively wasting expensive hardware.
* Traditional machine learning models that provide only a single prediction at VM creation are often fragile, as a single misprediction can block a physical host from being cleared for maintenance or new tasks.

### Continuous Re-prediction via Survival Analysis

* Instead of predicting a single average lifetime, LAVA uses an ML model to generate a probability distribution of a VM's expected duration.
* The system employs "continuous re-prediction," asking how much longer a VM is expected to run given how long it has already survived (e.g., a VM that has run for five days is assigned a different remaining lifespan than a brand-new one).
* This adaptive approach allows the scheduling logic to automatically correct for initial mispredictions as more data about the VM's actual behavior becomes available over time.

### Novel Scheduling and Rescheduling Algorithms

* **Non-Invasive Lifetime Aware Scheduling (NILAS):** Currently deployed on Google’s Borg cluster manager, this algorithm ranks potential hosts by grouping VMs with similar expected exit times to increase the frequency of "empty hosts" available for maintenance.
* **Lifetime-Aware VM Allocation (LAVA):** This algorithm fills resource gaps on hosts containing long-lived VMs with jobs that are at least an order of magnitude shorter. This ensures the short-lived VMs exit quickly without extending the host's overall occupation time.
* **Lifetime-Aware Rescheduling (LARS):** To minimize disruptions during defragmentation, LARS identifies and migrates the longest-lived VMs first while allowing short-lived VMs to finish their tasks naturally on the original host.

By integrating survival-analysis-based predictions into the core logic of data center management, cloud providers can transition from reactive scheduling to a proactive model. This system not only maximizes resource density but also ensures that the physical infrastructure remains flexible enough to handle large, resource-intensive provisioning requests and essential system updates.
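
Here is a minimal sketch of the "continuous re-prediction" idea: given a predicted lifetime distribution, recompute the expected remaining lifetime conditioned on how long the VM has already run, using E[L - t | L > t] = (∫ from t to ∞ of S(u) du) / S(t). The exponential mixture below is a synthetic stand-in for the model's predicted distribution, chosen only to mimic a long-tailed workload.

```python
import numpy as np

t_grid = np.linspace(0.0, 24 * 90, 200_000)            # hours, out to 90 days
dt = t_grid[1] - t_grid[0]

# Synthetic long-tailed lifetime distribution: most VMs exit within an hour,
# a small fraction lives for weeks.
survival = 0.9 * np.exp(-t_grid / 0.5) + 0.1 * np.exp(-t_grid / (24 * 20))

def expected_remaining(age_hours):
    """E[L - t | L > t]: integrate the survival curve past t and divide by S(t)."""
    mask = t_grid >= age_hours
    s_t = np.interp(age_hours, t_grid, survival)
    return np.sum(survival[mask]) * dt / s_t

for age in [0.0, 1.0, 24.0, 24.0 * 5]:
    print(f"VM age {age:6.1f} h -> expected remaining lifetime "
          f"{expected_remaining(age):7.1f} h")
# The longer a VM has already survived, the longer it is expected to keep running;
# re-predicting with age is what lets the scheduler correct early mispredictions.
```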

Using AI to identify genetic variants in tumors with DeepSomatic

DeepSomatic is an AI-powered tool developed by Google Research to identify cancer-related mutations by analyzing a tumor's genetic sequence with higher accuracy than current methods. By leveraging convolutional neural networks (CNNs), the model distinguishes between inherited genetic traits and acquired somatic variants that drive cancer progression. This flexible tool supports multiple sequencing platforms and sample types, offering a critical resource for clinicians and researchers aiming to personalize cancer treatment through precision medicine.

## Challenges in Somatic Variant Detection

* Somatic variants are genetic mutations acquired after birth through environmental exposure or DNA replication errors, making them distinct from the germline variants found in every cell of a person's body.
* Detecting these mutations is technically difficult because tumor samples are often heterogeneous, containing a diverse set of variants at varying frequencies.
* Sequencing technologies often introduce small errors that can be difficult to distinguish from actual somatic mutations, especially when the mutation is only present in a small fraction of the sampled cells.

## CNN-Based Variant Calling Architecture

* DeepSomatic employs a method pioneered by DeepVariant, which involves transforming raw genetic sequencing data into a set of multi-channel images.
* These images represent various data points, including alignment along the chromosome, the quality of the sequence output, and other technical variables.
* The convolutional neural network processes these images to differentiate between three categories: the human reference genome, non-cancerous germline variants, and the somatic mutations driving tumor growth.
* By analyzing tumor and non-cancerous cells side-by-side, the model effectively filters out sequencing artifacts that might otherwise be misidentified as mutations.

## System Versatility and Application

* The model is designed to function in multiple modes, including "tumor-normal" (comparing a biopsy to a healthy sample) and "tumor-only" mode, which is vital for blood cancers like leukemia where isolating healthy cells is difficult.
* DeepSomatic is platform-agnostic, meaning it can process data from all major sequencing technologies and adapt to different types of sample processing.
* The tool has demonstrated the ability to generalize its learning to various cancer types, even those not specifically included in its initial training sets.

## Open-Source Contributions to Precision Medicine

* Google has made the DeepSomatic tool and the CASTLE dataset—a high-quality training and evaluation set—openly available to the global research community.
* This initiative is part of a broader effort to use AI for early detection and advanced research in various cancers, including breast, lung, and gynecological cancers.
* The release aims to accelerate the development of personalized treatment plans by providing a more reliable way to identify the specific genetic drivers of an individual's disease.

By providing a more accurate and adaptable method for variant calling, DeepSomatic helps researchers pinpoint the specific drivers of a patient's cancer. This tool represents a significant advancement in deep learning for genomics, potentially shortening the path from biopsy to targeted therapeutic intervention.
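
To make the multi-channel "pileup image" idea concrete, here is a minimal sketch: a genomic window around a candidate variant is encoded as an image (reads as rows, positions as columns, channels for base identity, quality, strand, and a tumor-vs-normal flag), then classified by a small CNN into reference, germline, or somatic. The channel layout, window size, and network are illustrative assumptions, not DeepSomatic's actual encoding.

```python
import torch
import torch.nn as nn

NUM_CHANNELS, MAX_READS, WINDOW = 5, 100, 221   # assumed tensor layout

class TinyVariantCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(NUM_CHANNELS, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.classifier(h)   # logits: [reference, germline, somatic]

# One candidate site: tumor and normal reads are encoded into the same image so
# the network can contrast them and filter out sequencing artifacts.
pileup = torch.rand(1, NUM_CHANNELS, MAX_READS, WINDOW)
logits = TinyVariantCNN()(pileup)
print(logits.shape)  # torch.Size([1, 3])
```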

Coral NPU: A full-stack platform for Edge AI

Coral NPU is a new full-stack, open-source platform designed to bring advanced AI directly to power-constrained edge devices and wearables. By prioritizing a matrix-first hardware architecture and a unified software stack, Google aims to overcome traditional bottlenecks in performance, ecosystem fragmentation, and data privacy. The platform enables always-on, low-power ambient sensing while providing developers with a flexible, RISC-V-based environment for deploying modern machine learning models.

## Overcoming Edge AI Constraints

* The platform addresses the "performance gap" where complex ML models typically exceed the power, thermal, and memory budgets of battery-operated devices.
* It eliminates the "fragmentation tax" by providing a unified architecture, moving away from proprietary processors that require costly, device-specific optimizations.
* On-device processing ensures a high standard of privacy and security by keeping personal context and data off the cloud.

## AI-First Hardware Architecture

* Unlike traditional chips, this architecture prioritizes the ML matrix engine over scalar compute to optimize for efficient on-device inference.
* The design is built on RISC-V ISA compliant architectural IP blocks, offering an open and extensible reference for system-on-chip (SoC) designers.
* The base design delivers performance in the 512 giga operations per second (GOPS) range while consuming only a few milliwatts of power.
* The architecture is tailored for "always-on" use cases, making it ideal for hearables, AR glasses, and smartwatches.

## Core Architectural Components

* **Scalar Core:** A lightweight, C-programmable RISC-V frontend that manages data flow using an ultra-low-power "run-to-completion" model.
* **Vector Execution Unit:** A SIMD co-processor compliant with the RISC-V Vector instruction set (RVV) v1.0 for simultaneous operations on large datasets.
* **Matrix Execution Unit:** A specialized engine using quantized outer product multiply-accumulate (MAC) operations to accelerate fundamental neural network tasks.

## Unified Developer Ecosystem

* The platform is a C-programmable target that integrates with modern compilers such as IREE and TFLM (TensorFlow Lite Micro).
* It supports a wide range of popular ML frameworks, including TensorFlow, JAX, and PyTorch.
* The software toolchain utilizes MLIR and the StableHLO dialect to facilitate the transition from high-level models to hardware-executable code.
* Developers have access to a complete suite of tools, including a simulator, custom kernels, and a general-purpose MLIR compiler.

SoC designers and ML developers looking to build the next generation of wearables should leverage the Coral NPU reference architecture to balance high-performance AI with extreme power efficiency. By utilizing the open-source documentation and RISC-V-based tools, teams can significantly reduce the complexity of deploying private, always-on ambient sensing.
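
For intuition about the quantized outer-product MAC that a matrix engine performs, here is a minimal numerical sketch: int8 activations and weights multiply into an int32 accumulator, one rank-1 (outer product) update per step. The shapes and the final rescale factor are illustrative, not the Coral NPU datapath.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 8, 8, 16
acts = rng.integers(-128, 128, size=(K, M), dtype=np.int8)      # K activation columns
weights = rng.integers(-128, 128, size=(K, N), dtype=np.int8)   # K weight rows

acc = np.zeros((M, N), dtype=np.int32)
for k in range(K):
    # One outer product per step; accumulation stays in 32-bit to avoid overflow.
    acc += np.outer(acts[k].astype(np.int32), weights[k].astype(np.int32))

# Equivalent to an int8 matmul with int32 accumulation, followed by requantization.
assert np.array_equal(acc, acts.astype(np.int32).T @ weights.astype(np.int32))
output = np.clip(np.round(acc * 0.01), -128, 127).astype(np.int8)  # illustrative rescale
print(output.shape)
```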

Introducing interactive on-device segmentation in Snapseed

Google has introduced a new "Object Brush" feature in Snapseed that enables intuitive, real-time selective photo editing through a novel on-device segmentation technology. By leveraging a high-performance interactive AI model, users can isolate complex subjects with simple touch gestures in under 20 milliseconds, bridging the gap between professional-grade editing and mobile convenience. This breakthrough is achieved through a sophisticated teacher-student training architecture that prioritizes both pixel-perfect accuracy and low-latency performance on consumer hardware.

### High-Performance On-Device Inference

* The system is powered by the Interactive Segmenter model, which is integrated directly into the Snapseed "Adjust" tool to facilitate immediate object-based modifications.
* To ensure a fluid user experience, the model utilizes the MediaPipe framework and LiteRT’s GPU acceleration to process selections in less than 20ms.
* The interface supports dynamic refinement, allowing users to provide real-time feedback by tracing lines or tapping to add or subtract specific areas of an image.

### Teacher-Student Model Distillation

* The development team first created "Interactive Segmenter: Teacher," a large-scale model fine-tuned on 30,000 high-quality, pixel-perfect manual annotations across more than 350 object categories.
* Because the Teacher model’s size and computational requirements are prohibitive for mobile use, researchers developed "Interactive Segmenter: Edge" through knowledge distillation.
* This distillation process utilized a dataset of over 2 million weakly annotated images, allowing the smaller Edge model to inherit the generalization capabilities of the Teacher model while maintaining a footprint suitable for mobile devices.

### Training via Synthetic User Prompts

* To make the model universally capable across all object types, the training process uses a class-agnostic approach based on the Big Transfer (BiT) strategy.
* The model learns to interpret user intent through "prompt generation," which simulates real-world interactions such as random scribbles, taps, and lasso (box) selections.
* During training, both the Teacher and Edge models receive identical prompts—such as red foreground scribbles and blue background scribbles—to ensure the student model learns to produce high-quality masks even from imprecise user input.

This advancement significantly lowers the barrier to entry for complex photo manipulation by moving heavy-duty AI processing directly onto the mobile device. Users can expect a more responsive and precise editing experience that handles everything from fine-tuning a subject's lighting to isolating specific environmental elements like clouds or clothing.
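
Here is a minimal sketch of the teacher-student setup described above: both models receive the same image plus synthetic prompt channels (foreground and background scribbles), and the small student is trained to reproduce the large teacher's mask. The models, channel layout, and loss are illustrative stand-ins, not the Interactive Segmenter architecture.

```python
import torch
import torch.nn as nn

def tiny_segmenter(width):
    # Placeholder fully-convolutional model: image (3 ch) + prompts (2 ch) -> mask logits.
    return nn.Sequential(
        nn.Conv2d(5, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        nn.Conv2d(width, 1, 1),
    )

teacher = tiny_segmenter(width=64).eval()   # stands in for the large Teacher model
student = tiny_segmenter(width=8)           # small Edge model to be distilled
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(3):                        # toy distillation loop
    image = torch.rand(4, 3, 128, 128)
    # Synthetic prompts: one channel for foreground scribbles, one for background taps.
    prompts = (torch.rand(4, 2, 128, 128) > 0.98).float()
    x = torch.cat([image, prompts], dim=1)

    with torch.no_grad():
        teacher_mask = torch.sigmoid(teacher(x))   # soft target from the teacher

    loss = bce(student(x), teacher_mask)           # student mimics the teacher's mask
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: distillation loss {loss.item():.4f}")
```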

Smarter nucleic acid design with NucleoBench and AdaBeam

Google Research and Move37 Labs have introduced NucleoBench, a comprehensive open-source benchmark for nucleic acid design, alongside AdaBeam, a high-performing new optimization algorithm. While AI models have become highly proficient at predicting the biological properties of DNA and RNA, generating optimal sequences within massive search spaces—such as the $2 \times 10^{120}$ possible variations for a 5' UTR—remains a significant hurdle. By standardizing evaluation across 16 distinct biological tasks, this research identifies AdaBeam as a superior method that scales effectively to the large-scale models required for modern drug discovery.

## Standardizing the Optimization Pipeline

The process of computational nucleic acid design typically follows a five-step workflow: data collection, training a predictive model, generating candidate sequences (the design step), wet-lab validation, and iterative retraining. NucleoBench focuses specifically on the design step, which has historically lacked standardized evaluation.

* Most existing benchmarks rely on decades-old methods like simulated annealing or vanilla genetic algorithms.
* Traditional algorithms often treat predictive models as "black boxes," failing to leverage internal model data to guide the search.
* The vastness of genomic search spaces makes brute-force optimization impossible, necessitating more intelligent, model-aware generation strategies.

## The NucleoBench Framework

NucleoBench is the first large-scale benchmark designed to compare gradient-free and gradient-based design algorithms under identical conditions. The framework encompasses over 400,000 experiments to ensure statistical rigor across diverse biological challenges.

* **Algorithm Categories**: It compares gradient-free methods (like directed evolution), which are simple but ignore model internals, against gradient-based methods (like FastSeqProp), which use the model’s internal "direction of steepest improvement" to find better sequences.
* **Task Diversity**: The 16 tasks include controlling gene expression in specific cell types (liver or neuronal), maximizing transcription factor binding, and improving chromatin accessibility.
* **Scale**: The benchmark includes long-range DNA sequence challenges using large-scale models like Enformer, which are computationally demanding but critical for understanding complex genomic interactions.

## AdaBeam’s Hybrid Optimization Performance

Drawing on insights from the NucleoBench evaluation, the researchers developed AdaBeam, a hybrid algorithm that combines the strengths of various optimization strategies.

* **Success Rate**: AdaBeam outperformed existing algorithms on 11 of the 16 tasks in the benchmark.
* **Efficiency and Scaling**: Unlike many gradient-based methods that struggle with computational overhead, AdaBeam demonstrates superior scaling properties as sequences become longer and predictive models grow in complexity.
* **Methodology**: It functions as a hybrid approach, using sophisticated search techniques to navigate the sequence space more effectively than "vanilla" algorithms developed before the era of deep learning.

The researchers have made AdaBeam and the NucleoBench repository freely available to the scientific community. By providing a standardized environment for testing, they aim to accelerate the development of next-generation treatments, including more stable mRNA vaccines and precise CRISPR gene therapies.
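
For a sense of the sequence-optimization setting, here is a minimal sketch of beam-style search guided by a black-box scorer: keep the top-k candidate sequences, propose single-base mutations, rescore, and iterate. The scorer is a toy stand-in for a trained property predictor, and this is a generic illustration, not the AdaBeam algorithm itself.

```python
import numpy as np

BASES = "ACGT"
rng = np.random.default_rng(0)
target_motif = "GATTACA"                     # toy objective: reward motif occurrences

def score(seq):
    """Stand-in predictive model: motif count plus a mild GC-content preference."""
    return seq.count(target_motif) + 0.01 * (seq.count("G") + seq.count("C"))

def mutations(seq, n_samples=50):
    """Sample single-base substitutions of a sequence."""
    out = []
    for _ in range(n_samples):
        i = rng.integers(len(seq))
        b = BASES[rng.integers(4)]
        out.append(seq[:i] + b + seq[i + 1:])
    return out

def beam_search(seq_len=60, beam_width=8, rounds=200):
    beam = ["".join(rng.choice(list(BASES), size=seq_len).tolist())
            for _ in range(beam_width)]
    for _ in range(rounds):
        candidates = set(beam)
        for seq in beam:
            candidates.update(mutations(seq))
        # Keep only the top-scoring candidates for the next round.
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0], score(beam[0])

best, best_score = beam_search()
print(best_score, best[:40])
```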