foundation-models

7 posts

google

Accelerating the magic cycle of research breakthroughs and real-world applications

Google Research is accelerating a "magic cycle" in which breakthrough scientific discoveries and real-world applications continuously reinforce one another through advanced AI models and open platforms. By leveraging agentic tools and large-scale foundation models, the company is transforming complex data into actionable insights across geospatial analysis, genomics, and quantum computing. This iterative process aims to solve critical global challenges while simultaneously uncovering new frontiers for future innovation.

### Earth AI and Geospatial Reasoning

* Google has integrated various geospatial models, including those for flood forecasting, wildfire tracking, and air quality, into a unified Earth AI program.
* The newly introduced Geospatial Reasoning Agent uses large language models (LLMs) to let non-experts ask complex questions and receive plain-language answers derived from diverse datasets.
* Riverine flood models have been significantly expanded and now provide forecasts for over 2 billion people across 150 countries.
* New Remote Sensing and Population Dynamics foundation models have been released to help researchers uncover nuanced correlations in planetary data, with applications such as supply chain management.

### DeepSomatic and Genomic Research

* Building on ten years of genomics work, DeepSomatic is an AI tool designed to identify somatic mutations (genetic variants in tumors) to assist cancer research.
* The tool follows earlier foundational models such as DeepVariant and DeepConsensus, which helped map human and non-human genomes.
* These advancements aim to move the medical field closer to precision medicine by giving health practitioners higher-resolution data on genetic variations.

### The Magic Cycle of Research and Development

* Google highlights "Quantum Echoes" as a key breakthrough in quantum computing, contributing to the broader goal of solving fundamental scientific problems through large-scale computation.
* The acceleration of discovery is largely attributed to "agentic tools" that help scientists navigate massive datasets and uncover new research opportunities.
* The company emphasizes a collaborative approach, making foundation models available to trusted testers and partners such as the WHO and various international research institutes.

To maximize the impact of these breakthroughs, organizations should look toward integrating multimodal AI agents that can bridge the gap between specialized scientific data and practical decision-making. By utilizing open platforms and foundation models, the broader scientific community can translate high-level research into scalable solutions for climate resilience, healthcare, and global policy.

google

Google Earth AI: Unlocking geospatial insights with foundation models and cross-modal reasoning

Google Earth AI introduces a framework of geospatial foundation models and reasoning agents designed to solve complex, planetary-scale challenges through cross-modal reasoning. By integrating Gemini-powered orchestrators with specialized imagery, population, and environmental models, the system deconstructs multifaceted queries into actionable multi-step plans. This approach enables a holistic understanding of real-world events, such as disaster response and disease forecasting, by grounding AI insights in diverse geospatial data.

## Geospatial Reasoning Agents

* Utilizes Gemini models as intelligent orchestrators to manage complex queries that require data from multiple domains.
* The agent deconstructs a high-level question, such as predicting hurricane landfalls and community vulnerability, into a sequence of smaller, executable tasks (a minimal sketch of this orchestration pattern follows this summary).
* It executes these plans by autonomously calling specialized foundation models, querying vast datastores, and using geospatial tools to fuse disparate data points into a single, cohesive answer.

## Remote Sensing and Imagery Foundations

* Employs vision-language models and open-vocabulary object detection trained on a large corpus of high-resolution overhead imagery paired with text descriptions.
* Enables "zero-shot" capabilities, allowing users to find specific objects like "flooded roads" or "building damage" using natural language, without retraining the model for specific classes.
* Technical evaluations show a 16% average improvement on text-based image search tasks and more than double the baseline accuracy for detecting novel objects in a zero-shot setting.

## Population Dynamics and Mobility

* Focuses on the interplay between people and places using globally consistent embeddings across 17 countries.
* Includes monthly updated embeddings that capture shifting human activity patterns, which are essential for time-sensitive forecasting.
* Research conducted with the University of Oxford showed that incorporating these population embeddings into a dengue fever forecasting model in Brazil improved the R² metric from 0.456 to 0.656 for long-range 12-month predictions.

## Environmental and Disaster Forecasting

* Integrates established Google research into weather nowcasting, flood forecasting, and wildfire boundary mapping.
* Provides the reasoning agent with the data necessary to evaluate environmental risks alongside population density and infrastructure imagery.
* Aims to provide Search and Maps users with real-time, accurate alerts about natural disasters, grounded in planetary-scale environmental data.

Developers and enterprises looking to solve high-level geospatial problems can now express interest in accessing these capabilities through Google Earth and Google Cloud. By leveraging these foundation models, organizations can automate the analysis of satellite imagery and human mobility data to better prepare for environmental and social challenges.
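The plan-then-execute loop described under "Geospatial Reasoning Agents" can be illustrated with a minimal sketch. Everything below is hypothetical: the planner, the tool names, and the return values stand in for the real Gemini orchestrator and Earth AI model endpoints, which the post does not expose.

```python
# A minimal sketch of an agentic plan-then-execute loop. All tool names and
# outputs are invented stand-ins, not the Earth AI API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    tool: str    # which specialized model to call
    query: dict  # arguments for that call

# Hypothetical registry of specialized foundation models the agent can call.
TOOLS: dict[str, Callable[[dict], dict]] = {
    "weather": lambda q: {"landfall": "Gulf Coast", "eta_hours": 36},
    "population": lambda q: {"residents_at_risk": 120_000},
    "imagery": lambda q: {"critical_facilities": ["hospital", "power substation"]},
}

def plan(question: str) -> list[Step]:
    """Stand-in for the Gemini planner: deconstructs a high-level question
    into a sequence of smaller, executable tool calls."""
    return [
        Step("weather", {"event": "hurricane"}),
        Step("population", {"region": "forecast landfall zone"}),
        Step("imagery", {"detect": "critical infrastructure"}),
    ]

def answer(question: str) -> dict:
    """Executes the plan and fuses the per-tool results into one response."""
    results = {}
    for step in plan(question):
        results[step.tool] = TOOLS[step.tool](step.query)
    return results

print(answer("Where will the hurricane make landfall, and who is vulnerable?"))
```

The fusion step here is a plain dictionary merge; in the system the post describes, the orchestrating LLM would instead synthesize the tool outputs into a single plain-language answer.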

google

Time series foundation models can be few-shot learners

Researchers at Google have introduced TimesFM-ICF, a foundation model that enables time-series forecasting to transition from zero-shot to few-shot learning via in-context fine-tuning. By utilizing continued pre-training and specialized separator tokens, the model learns to adapt to a handful of related examples at inference time without requiring the complex supervised fine-tuning typically needed for task-specific optimization. This approach effectively matches or exceeds the performance of specialized models while maintaining the flexibility of a general-purpose foundation model.

### Overcoming the Limitations of Zero-Shot Models

* Traditional forecasting often requires building separate, specialized models for every unique task, which is resource-intensive and slow.
* While zero-shot models like the original TimesFM provide immediate forecasts without task-specific training, they cannot incorporate relevant context, such as data from nearby sensors or similar historical patterns.
* The In-Context Fine-tuning (ICF) approach allows the model to "learn" from a few examples provided at the time of prediction, similar to how Large Language Models (LLMs) use few-shot prompting.

### Architecture and the Common Separator Token

* TimesFM-ICF utilizes a patched decoder architecture that tokenizes 32 contiguous timepoints into a single input token.
* To prevent the model from conflating different data streams, such as separate store locations or distinct time periods, researchers introduced a "common separator token" as a digital boundary between examples (see the prompt-construction sketch after this summary).
* The model processes these tokens through a transformer stack using causal self-attention (CSA), ensuring it learns from historical context without accidentally "peeking" into the future.
* A shared multilayer perceptron (MLP) translates the processed output tokens back into a forecast spanning 128 timepoints.

### Performance Benchmarking and Results

* The model was evaluated on 23 unseen datasets, using the Mean Absolute Scaled Error (MASE) metric to aggregate performance across diverse time-series tasks.
* TimesFM-ICF demonstrated a significant performance boost over the original zero-shot TimesFM and other state-of-the-art foundation models like Moirai and Lag-Llama.
* Test results showed that providing just a few in-context examples allowed the model to match the accuracy of supervised fine-tuning, which normally requires much more computational overhead and data curation.

TimesFM-ICF represents a practical shift for businesses managing diverse data streams, offering a way to achieve high-accuracy forecasts by simply providing a few relevant historical examples. For those looking to optimize inventory or energy demands, this method provides the precision of a custom-tuned model with the deployment speed of a pre-trained foundation model.
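To make the separator-token idea concrete, here is a minimal sketch of how a few-shot prompt might be assembled from 32-point patches. It assumes NumPy and an invented `<sep>` placeholder; the actual TimesFM-ICF tokenizer and learned separator embedding are internal to the model.

```python
# A sketch of in-context prompt construction with separator tokens,
# under invented conventions -- not the real TimesFM-ICF pipeline.
import numpy as np

PATCH_LEN = 32  # the post's input patch: 32 contiguous timepoints per token

def patchify(series: np.ndarray) -> np.ndarray:
    """Split a 1-D series into (num_patches, 32) input tokens,
    truncating any remainder that does not fill a whole patch."""
    n = len(series) // PATCH_LEN * PATCH_LEN
    return series[:n].reshape(-1, PATCH_LEN)

def build_context(history: np.ndarray, examples: list) -> list:
    """Concatenate few-shot examples and the target history into one token
    sequence, placing a separator between streams so the model never
    conflates, say, two different store locations."""
    SEP = "<sep>"  # stands in for the learned common separator token
    tokens: list = []
    for ex in examples:
        tokens.extend(patchify(ex))
        tokens.append(SEP)
    tokens.extend(patchify(history))
    return tokens

rng = np.random.default_rng(0)
ctx = build_context(rng.normal(size=256), [rng.normal(size=128) for _ in range(3)])
print(len(ctx))  # 3 examples * 4 patches + 3 separators + 8 history patches = 23
```

The real model consumes learned embeddings rather than raw patches, but the bookkeeping of examples, separators, and target history is the part the post describes.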

google

SensorLM: Learning the language of wearable sensors

SensorLM is a new family of foundation models designed to bridge the gap between high-dimensional wearable sensor data and natural language descriptions. By training on a massive dataset of nearly 60 million hours of de-identified health data, the models learn to interpret complex physiological signals and provide meaningful context for human activities. This research demonstrates that integrating multimodal sensor signals with language models enables capabilities, such as zero-shot activity recognition and automated health captioning, that significantly outperform general-purpose large language models.

## Dataset Scale and Automated Annotation

* The models were pre-trained on an unprecedented 59.7 million hours of multimodal sensor data collected from over 103,000 individuals across 127 countries.
* To overcome the high cost of manual annotation, researchers developed a hierarchical pipeline that automatically generates text descriptions by calculating statistics and identifying trends within the raw sensor streams.
* Data was sourced from Fitbit and Pixel Watch devices, representing nearly 2.5 million person-days of activity and health information.

## Hybrid Training Architecture

* SensorLM unifies two primary multimodal strategies: contrastive learning and generative pre-training (a sketch of the contrastive objective follows this summary).
* Through contrastive learning, the model learns to discriminate between different states, such as a "light swim" versus a "strength workout", by matching sensor segments to corresponding text descriptions.
* The generative component allows the model to "speak" for the sensors, producing nuanced, context-aware natural language captions directly from high-dimensional biometric signals.

## Activity Recognition and Cross-Modal Capabilities

* The model demonstrates state-of-the-art performance in zero-shot human activity recognition, accurately classifying 20 different activities without any task-specific fine-tuning.
* Its few-shot learning capabilities allow the model to adapt to new tasks or individual user patterns with only a handful of examples.
* SensorLM facilitates cross-modal retrieval, enabling users or experts to find specific sensor patterns using natural language queries, or to generate descriptions based on specific sensor inputs.

## Generative Health Captioning

* Beyond simple classification, the model can generate hierarchical captions that describe the statistical, structural, and semantic dimensions of a user's data.
* Experimental results using metrics like BERTScore show that SensorLM produces captions that are more factually correct and coherent than those created by powerful non-specialist LLMs.
* This capability allows for the translation of abstract data points, such as heart rate variability or step counts, into readable summaries that explain the "why" behind physiological changes.

By providing a framework where wearable data can be understood through the lens of human language, SensorLM paves the way for more intuitive and personalized health monitoring. This technology holds the potential to transform raw biometric streams into actionable insights, helping users better understand the relationship between their activities and their overall physical well-being.
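The contrastive half of the hybrid objective follows the familiar CLIP-style pattern of matching each sensor segment to its own caption. The sketch below assumes that pattern, with random embeddings as stand-ins for the SensorLM encoder towers; the paper's exact loss and temperature are not reproduced here.

```python
# A minimal sketch of a CLIP-style contrastive objective over paired
# sensor/text embeddings -- illustrative, not the SensorLM training code.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contrastive_loss(sensor_emb, text_emb, temperature=0.07):
    """Match each sensor segment to its own caption: row i of the similarity
    matrix should peak at column i (the "light swim" text for swim sensors)."""
    # L2-normalize, then cosine similarities scaled by temperature.
    s = sensor_emb / np.linalg.norm(sensor_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature
    labels = np.arange(len(logits))
    # Symmetric loss: sensor-to-text and text-to-sensor directions.
    loss_st = -np.log(softmax(logits, axis=1)[labels, labels]).mean()
    loss_ts = -np.log(softmax(logits.T, axis=1)[labels, labels]).mean()
    return (loss_st + loss_ts) / 2

rng = np.random.default_rng(0)
print(contrastive_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64))))
```

Minimizing this loss pulls matched sensor/text pairs together in embedding space, which is what later enables zero-shot recognition and cross-modal retrieval by text query.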

google

LSM-2: Learning from incomplete wearable sensor data

LSM-2 introduces a paradigm shift in processing wearable sensor data by treating naturally occurring data gaps as inherent features rather than errors to be corrected. By utilizing the Adaptive and Inherited Masking (AIM) framework, the model learns directly from fragmented, real-world data streams without the need for biased imputation or data-discarding filters. This approach allows LSM-2 to achieve state-of-the-art performance in health-related classification and regression tasks, maintaining robustness even when sensors fail or data is highly interrupted.

## The Challenge of Pervasive Missingness

* Real-world wearable data is almost never continuous; factors such as device charging, motion artifacts, and battery-saving modes create frequent "missingness."
* Traditional self-supervised learning models require complete data, forcing researchers to use imputation, which can introduce artificial bias, or aggressive filtering that discards over 90% of potentially useful samples.
* In a dataset of 1.6 million day-long windows, research found that not a single sample had 0% missingness, highlighting the impracticality of training only on complete datasets.

## Adaptive and Inherited Masking (AIM)

* AIM extends the Masked Autoencoder (MAE) framework by treating "inherited" masks (naturally occurring gaps) and "artificial" masks (training objectives) as equivalent (see the masking sketch after this summary).
* The framework utilizes a dual masking strategy: it employs token dropout on a fixed ratio of tokens to ensure computational efficiency during encoding.
* To handle the unpredictable and variable nature of real-world gaps, AIM uses attention masking within the transformer blocks for any remaining masked tokens.
* During evaluation and fine-tuning, the model relies solely on attention masking to navigate naturally occurring gaps, allowing for accurate physiological modeling without filling in missing values.

## Scale and Training Architecture

* LSM-2 was trained on a massive dataset comprising 40 million hours of de-identified wearable data from more than 60,000 participants using Fitbit and Google Pixel devices.
* The model learns to understand underlying physiological structures by reconstructing masked segments across multimodal inputs, including heart signals, sleep patterns, and activity levels.
* Because it is trained on fragmented data, the resulting foundation model is significantly more resilient to sensor dropouts in downstream tasks like hypertension prediction or stress monitoring.

LSM-2 demonstrates that foundation models for health should be built to embrace the messiness of real-world environments. By integrating missingness directly into the self-supervised learning objective, developers can bypass the computational and statistical overhead of imputation while building more reliable diagnostic and monitoring tools.
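Here is a minimal sketch of the dual masking strategy, assuming tokens arrive as a NumPy array with NaNs marking naturally missing data; the ratios and function names are illustrative choices, not the paper's hyperparameters.

```python
# A sketch of AIM-style dual masking: inherited masks (real gaps) and
# artificial masks (training objective) are merged, then split between
# token dropout and attention masking. Illustrative only.
import numpy as np

def aim_masks(tokens: np.ndarray, artificial_ratio=0.8, drop_ratio=0.5, seed=0):
    """Merge inherited masks with artificial masks, then divide the masked
    set into dropped tokens (removed before encoding, for efficiency) and
    attention-masked tokens (hidden inside the transformer blocks)."""
    rng = np.random.default_rng(seed)
    inherited = np.isnan(tokens).any(axis=1)           # naturally occurring gaps
    artificial = rng.random(len(tokens)) < artificial_ratio
    masked = inherited | artificial                    # treated as equivalent
    masked_idx = np.flatnonzero(masked)
    rng.shuffle(masked_idx)
    n_drop = int(drop_ratio * len(masked_idx))         # fixed-ratio token dropout
    dropped = masked_idx[:n_drop]
    attn_masked = masked_idx[n_drop:]                  # handled via attention mask
    return dropped, attn_masked

tokens = np.random.default_rng(1).normal(size=(16, 4))
tokens[3] = np.nan  # simulate a device-charging gap
dropped, attn = aim_masks(tokens)
print(len(dropped), len(attn))
```

At evaluation time the dropout branch would be disabled, leaving only attention masking over the inherited gaps, which matches the post's description of inference without imputation.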

google

Introducing Mobility AI: Advancing urban transportation

Google Research has introduced Mobility AI, a comprehensive program designed to provide transportation agencies with data-driven tools for managing urban congestion, road safety, and evolving transit patterns. By leveraging advancements in measurement, simulation, and optimization, the initiative translates decades of Google's geospatial research into actionable technologies for infrastructure planning and real-time traffic management. The program aims to empower policymakers and engineers to mitigate gridlock and environmental impacts through high-resolution modeling and continuous monitoring of urban transportation systems.

### Measurement: Understanding Mobility Patterns

The measurement pillar focuses on establishing a precise baseline of current transportation conditions using real-time and historical data.

* **Congestion Functions:** Researchers use machine learning and floating car data to develop city-wide models that mathematically describe the relationship between vehicle volume and travel speeds, even on roads with limited data (a classic analytic form of such a function is sketched after this summary).
* **Geospatial Foundation Models:** By applying self-supervised learning to movement patterns, the program creates embeddings that capture local spatial characteristics. This allows for better reasoning about urban mobility in data-sparse environments.
* **Analytical Formulation:** Dedicated research explores how adjusting traffic signal timing influences the distribution of flow across urban networks, revealing patterns in how congestion propagates.

### Simulation: Forecasting and Scenario Analysis

Mobility AI uses simulation technologies to create digital twins of cities, allowing planners to test interventions before implementing them physically.

* **Traffic Simulation API:** This tool enables the modeling of complex "what-if" scenarios, such as the impact of closing a major bridge or reconfiguring lane assignments on a highway.
* **High-Fidelity Calibration:** The simulations are calibrated using large-scale, real-world data to ensure that the virtual models accurately reflect local driver behavior and infrastructure constraints.
* **Scalable Evaluation:** These digital environments provide a risk-free way to assess how new developments, such as the rise of autonomous vehicles or e-commerce logistics, will reshape existing traffic patterns.

### Optimization: Improving Urban Flow

The optimization pillar focuses on applying AI to solve large-scale coordination problems, such as signal timing and routing efficiency.

* **Project Green Light:** This initiative uses AI to provide traffic signal timing recommendations to city engineers, specifically targeting a reduction in stop-and-go traffic to lower greenhouse gas emissions.
* **System-Wide Coordination:** Optimization algorithms work to balance the needs of multiple modes of transport, including public transit, cycling, and pedestrian infrastructure, rather than focusing solely on personal vehicles.
* **Integration with Google Public Sector:** Research breakthroughs from this program are being integrated into Google Maps Platform and Google Public Sector tools to provide agencies with accessible, enterprise-grade optimization capabilities.

Transportation agencies and researchers can leverage these foundational AI technologies to transition from reactive traffic management to proactive, data-driven policymaking. By participating in the Mobility AI program, public sector leaders can gain access to advanced simulation and measurement tools designed to build more resilient and efficient urban mobility networks.
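For intuition about what a congestion function expresses, the classic Bureau of Public Roads (BPR) curve is a standard analytic relation between volume and travel time. It is only a stand-in here: the Mobility AI models described above are learned from floating car data rather than fixed to this form.

```python
# The classic BPR congestion function -- an illustrative analytic stand-in
# for the learned, city-wide volume/speed models the post describes.
def bpr_travel_time(volume, capacity, free_flow_time, alpha=0.15, beta=4.0):
    """Bureau of Public Roads curve: travel time grows polynomially as the
    volume-to-capacity ratio approaches and exceeds 1."""
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)

# Travel time on a hypothetical segment (free-flow 5 min, capacity 1800 veh/h).
for v in (900, 1800, 2700):
    t = bpr_travel_time(v, capacity=1800, free_flow_time=5.0)
    print(f"volume={v} veh/h -> {t:.2f} min")
```

The appeal of a fitted congestion function is exactly what this toy version shows: given an estimated volume, an agency can read off the expected slowdown for any road segment, even one with sparse sensor coverage.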

google

Geospatial Reasoning: Unlocking insights with generative AI and multiple foundation models

Google Research is introducing Geospatial Reasoning, a new framework that integrates generative AI with specialized foundation models to streamline complex geographical problem-solving. By combining large language models like Gemini with domain-specific data, the initiative seeks to make large-scale spatial analysis accessible to sectors like public health, urban development, and climate resilience. This research effort moves beyond traditional data silos, enabling agentic workflows that can interpret diverse data types, from satellite imagery to population dynamics, through natural language.

### Specialized Foundation Models for Human Activity

* The Population Dynamics Foundation Model (PDFM) captures the complex interplay between human behaviors and their local environments.
* A dedicated trajectory-based mobility foundation model has been developed to process and analyze movement patterns.
* While initially tested in the US, experimental datasets are expanding to include the UK, Australia, Japan, Canada, and Malawi for selected partners.

### Remote Sensing and Vision Architectures

* New models utilize advanced architectures including masked autoencoders, SigLIP, MaMMUT, and OWL-ViT, specifically adapted for the remote sensing domain.
* Training involves high-resolution satellite and aerial imagery paired with text descriptions and bounding box annotations to enable precise object detection.
* The models support zero-shot classification and retrieval, allowing users to locate specific features, such as "residential buildings with solar panels", using flexible natural language queries (a retrieval sketch follows this summary).
* Internal evaluations show state-of-the-art performance across multiple benchmarks, including image segmentation and post-disaster damage assessment.

### Agentic Workflows and Industry Collaboration

* The Geospatial Reasoning framework utilizes LLMs like Gemini to manage complex datasets and orchestrate "agentic" workflows.
* These workflows are grounded in geospatial data to ensure that the insights generated are both useful and contextually accurate.
* Google is collaborating with inaugural industry partners, including Airbus, Maxar, Planet Labs, and WPP, to test these capabilities in real-world scenarios.

Organizations interested in accelerating their geospatial analysis should consider applying for the trusted tester program to explore how these foundation models can be fine-tuned for specific proprietary data and use cases.
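At inference time, the zero-shot retrieval described above reduces to ranking imagery embeddings by similarity to an encoded text query. The sketch below assumes a SigLIP-style dual encoder with precomputed embeddings; the random vectors stand in for real remote-sensing features.

```python
# A minimal sketch of zero-shot text-to-image retrieval over precomputed
# embeddings from a dual encoder -- illustrative, not the released API.
import numpy as np

def rank_images(text_emb: np.ndarray, image_embs: np.ndarray, top_k=3):
    """Rank imagery tiles by cosine similarity to a natural-language query,
    e.g. "residential buildings with solar panels"."""
    t = text_emb / np.linalg.norm(text_emb)
    im = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = im @ t
    order = np.argsort(-scores)[:top_k]
    return list(zip(order.tolist(), scores[order].tolist()))

rng = np.random.default_rng(0)
query = rng.normal(size=128)          # stand-in for the encoded text query
tiles = rng.normal(size=(1000, 128))  # stand-in for encoded imagery tiles
print(rank_images(query, tiles))      # (tile index, similarity) pairs
```

Because only the text query changes between searches, new object classes like "flooded roads" cost one extra embedding rather than a retraining run, which is what makes the open-vocabulary workflow practical.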