google

Deeper insights into retrieval augmented generation: The role of sufficient context

Google Research has introduced "sufficient context" as a critical new metric for evaluating Retrieval Augmented Generation (RAG) systems, arguing that simple relevance is an inadequate measure of performance. By focusing on whether a retrieved context contains all the necessary information to definitively answer a query, researchers developed an LLM-based autorater that classifies context sufficiency with 93% accuracy. This framework reveals that many RAG failures, specifically hallucinations, occur because models fail to abstain from answering when information is incomplete or contradictory. ## Defining and Measuring Sufficient Context * Sufficient context is defined as containing all information necessary to provide a definitive answer, while insufficient context is relevant but incomplete, inconclusive, or contradictory. * The researchers developed an "autorater" using Gemini 1.5 Pro, utilizing chain-of-thought prompting and 1-shot examples to evaluate query-context pairs. * In benchmarks against human expert "gold standard" labels, the autorater achieved 93% accuracy, outperforming specialized models like FLAMe (fine-tuned PaLM 24B) and NLI-based methods. * Unlike traditional metrics, this approach does not require ground-truth answers to evaluate the quality of the retrieved information. ## RAG Failure Modes and Abstention Challenges * State-of-the-art models (Gemini, GPT, Claude) perform exceptionally well when provided with sufficient context but struggle when context is lacking. * The primary driver of hallucinations in RAG systems is the "abstention" problem, where a model attempts to answer a query based on insufficient context rather than stating "I don't know." * Analyzing model responses through the lens of sufficiency allows developers to distinguish between "knowledge" (the model knows the answer internally) and "grounding" (the model correctly uses the provided context). ## Implementation in Vertex AI * The insights from this research have been integrated into the Vertex AI RAG Engine via a new LLM Re-Ranker feature. * The re-ranker prioritizes retrieved snippets based on their likelihood of providing a sufficient answer, significantly improving retrieval metrics such as normalized Discounted Cumulative Gain (nDCG). * By filtering for sufficiency during the retrieval phase, the system reduces the likelihood that the LLM will be forced to process misleading or incomplete data. To minimize hallucinations and improve the reliability of RAG applications, developers should move beyond keyword-based relevance and implement re-ranking stages that specifically evaluate context sufficiency. Ensuring that an LLM has the "right" to answer based on the provided data—and training it to abstain when that data is missing—is essential for building production-grade generative AI tools.

google

Differential privacy on trust graphs

Researchers from Google have introduced Trust Graph Differential Privacy (TGDP), a framework that models privacy based on varying trust relationships between users represented as vertices in a graph. By allowing users to share data with trusted neighbors who then aggregate and privatize the information, TGDP bridges the gap between the highly accurate central DP model and the high-privacy local DP model. This approach enables more practical and accurate data analysis in scenarios where users exhibit nuanced privacy preferences rather than binary trust assumptions.

## Defining Trust Graph DP

* The model represents users as vertices and mutual trust as edges, ensuring that a user’s data remains statistically indistinguishable to any party they do not trust.
* This guarantee holds even if non-trusted parties pool their data or collaborate with a user's trusted neighbors to attempt re-identification.
* TGDP serves as a mathematical interpolation: a "star graph" topology corresponds to the central DP model, while a fully unconnected graph corresponds to the local DP model.

## Private Aggregation and Error Metrics

* The research evaluates TGDP through the fundamental task of private aggregation, where the goal is to estimate the sum of all users' private values ($\sum_i x_i$).
* Accuracy is quantified using mean-squared error, allowing researchers to establish theoretical upper and lower bounds for algorithm performance.
* These bounds demonstrate that the utility of a privacy-preserving algorithm is directly tied to the specific structure of the trust relationships within the network.

## The Dominating Set Algorithm

* The proposed algorithm utilizes the concept of a "dominating set"—a subset of users $T$ such that every user in the graph is either in $T$ or adjacent to someone in $T$.
* In this mechanism, each user sends their raw data to a trusted neighbor within the dominating set.
* The members of the dominating set aggregate the data they receive and add specific statistical noise to satisfy differential privacy before sharing the results.
* This method reduces the total noise required compared to the local model, as the number of noise-adding entities is limited to the size of the dominating set rather than the entire population (see the sketch below).

By leveraging existing trust networks, TGDP provides a rigorous way to optimize the trade-off between privacy and utility. This framework suggests that identifying small dominating sets within a community can significantly improve the accuracy of data analytics and machine learning without requiring a single, universally trusted central curator.
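The dominating-set mechanism lends itself to a short sketch. The following is a simplified illustration under assumed Laplace noise calibrated to a per-aggregator privacy budget; the paper's actual mechanism and noise calibration may differ.

```python
import numpy as np


def tgdp_sum(values, trusted_aggregator, T, epsilon, clip=1.0, rng=None):
    """Sketch of the dominating-set mechanism for private sum estimation.

    values:             {user: x_i}, each value clipped to [0, clip]
    trusted_aggregator: {user: a trusted neighbor of that user inside T}
    T:                  dominating set (every user is in T or adjacent to it)
    """
    rng = rng or np.random.default_rng()

    # 1. Each user sends raw data to a trusted member of the dominating set.
    buckets = {t: [] for t in T}
    for user, x in values.items():
        t = user if user in T else trusted_aggregator[user]
        buckets[t].append(min(max(x, 0.0), clip))

    # 2. Each dominating-set member aggregates its bucket and adds Laplace
    #    noise before sharing, so only |T| noise terms enter the estimate
    #    rather than one per user as in the local model.
    return sum(
        sum(xs) + rng.laplace(scale=clip / epsilon) for xs in buckets.values()
    )
```

Because the variance of the estimate grows with the number of noise-adding entities, a smaller dominating set translates directly into lower mean-squared error, which is the intuition behind the bounds discussed above.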

line

How should we evaluate AI-

LY Corporation is developing a text-to-image pipeline to automate the creation of branded character illustrations, aiming to reduce the manual workload for designers. The project focuses on utilizing Stable Diffusion and Flow Matching models to generate high-quality images that strictly adhere to specific corporate style guidelines. By systematically evaluating model architectures and hyperparameters, the team seeks to transform subjective image quality into a quantifiable and reproducible technical process.

### Evolution of Image Generation Models

* **Diffusion Models:** These models generate images through a gradual denoising process. They use a forward process to add Gaussian noise via a Markov chain and a reverse process to restore the original image based on learned probability distributions.
* **Stable Diffusion (SD):** Unlike standard diffusion that operates in pixel space, SD works within a "latent space" using a Variational Autoencoder (VAE). This significantly reduces computational load by denoising latent vectors rather than raw pixels.
* **SDXL and SD3.5:** SDXL improves prompt comprehension by adding a second text encoder (CLIP-G/14). SD3.5 introduces a major architectural shift by moving from diffusion to "Flow Matching," utilizing a Multimodal Diffusion Transformer (MMDiT) that handles text and image modalities in a single block for better parameter efficiency.
* **Flow Matching:** This approach treats image generation as a deterministic movement through a vector field. Instead of removing stochastic noise, it learns the velocity required to transform a simple probability distribution into a complex data distribution.

### Core Hyperparameters for Output Control

* **Seeds and Latent Vectors:** The seed is the integer value that determines the initial random noise. Since Stable Diffusion operates in latent space, this noise is essentially the starting latent vector that dictates the basic structure of the final image.
* **Prompts:** Textual inputs serve as the primary guide for the denoiser. Models are trained on image-caption pairs, allowing the U-Net or Transformer blocks to align the visual output with the user’s descriptive intent.
* **Classifier-Free Guidance (CFG):** This parameter adjusts the weight of the prompt's influence. It calculates the difference between noise predicted with a prompt and noise predicted without one (or with a negative prompt), allowing users to control how strictly the model follows the text instructions (see the sketch after this section).

### Practical Recommendation

To achieve consistent results that match a specific brand identity, it is insufficient to rely on prompts alone; developers should implement automated hyperparameter search and black-box optimization. Transitioning to Flow Matching models like SD3.5 can provide a more deterministic generation path, which is critical when attempting to scale the production of high-quality, branded assets.
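The two hyperparameters above fit in a few lines of code. This is a generic sketch using PyTorch with tensor shapes typical of Stable Diffusion latents; it is not LY Corporation's pipeline code, and the latent shape and guidance scale are illustrative.

```python
import torch

# Seed -> initial latent: the seed fixes the random starting latent vector,
# which determines the coarse structure of the final image.
generator = torch.Generator().manual_seed(42)
latents = torch.randn(1, 4, 64, 64, generator=generator)  # typical SD latent shape


def cfg_combine(noise_cond, noise_uncond, guidance_scale=7.5):
    """Classifier-free guidance: extrapolate from the unconditional (or
    negative-prompt) noise prediction toward the prompt-conditioned one,
    with `guidance_scale` controlling how strictly the text is followed."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```

Fixing the seed while sweeping `guidance_scale` isolates the effect of prompt adherence on otherwise identical images, which is exactly the kind of reproducible comparison the automated hyperparameter search above relies on.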

google

Bringing 3D shoppable products online with generative AI

Google has developed a series of generative AI techniques to transform standard 2D product images into immersive, interactive 3D visualizations for online shopping. By evolving from early neural reconstruction methods to state-of-the-art video generation models like Veo, Google can now produce high-quality 360-degree spins from as few as three images. This progression significantly reduces the cost and complexity for businesses to create shoppable 3D experiences at scale across diverse product categories.

## First Generation: Neural Radiance Fields (NeRFs)

* Launched in 2022, this initial approach utilized NeRF technology to synthesize novel views and 360° spins, specifically for footwear on Google Search.
* The system required five or more images and relied on complex sub-processes, including background removal, XYZ prediction (NOCS), and camera position estimation.
* While a breakthrough, the technology struggled with "noisy" signals and complex geometries, such as the thin structures found in sandals or high heels.

## Second Generation: View-Conditioned Diffusion

* Introduced in 2023, this version addressed previous limitations by using a diffusion-based architecture to predict unseen viewpoints from limited data.
* The model utilized Score Distillation Sampling (SDS), which compares rendered 3D models against generated targets to iteratively refine parameters for better realism (the standard form of this loss is reproduced below).
* This approach allowed Google to scale 3D visualizations to the majority of shoes viewed on Google Shopping, handling more diverse and difficult footwear styles.

## Third Generation: Generalizing with Veo

* The current advancement leverages Google’s Veo video generation model to transform product images into consistent, high-fidelity 360° videos.
* By training on millions of synthetic 3D assets, Veo captures complex interactions between light, texture, and geometry, making it effective for shiny surfaces and diverse categories like electronics and furniture.
* This method removes the need for precise camera pose estimation, increasing reliability across different environments.
* While the model can generate a 3D representation from a single image by "hallucinating" missing details, using three images significantly reduces errors and ensures high-fidelity accuracy.

These technological milestones mark a shift from specialized 3D reconstruction toward generalized AI models that make digital products feel tangible and interactive for consumers.
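For reference, Score Distillation Sampling is conventionally written as the gradient below (this is the standard form from the DreamFusion line of work; the post does not spell out Google's exact variant):

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right], \qquad x = g(\theta),$$

where $g(\theta)$ renders the current 3D model, $x_t$ is the rendered image with noise $\epsilon$ added at timestep $t$, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction conditioned on the prompt $y$, and $w(t)$ is a weighting term. Comparing the predicted noise against the injected noise is what iteratively pulls the 3D parameters toward renderings the diffusion model considers realistic.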

line

Code Quality Improvement Techniques Part 1

Maintaining a clear separation of concerns between software layers requires avoiding implicit dependencies where one layer relies on the specific implementation details of another. When different components share "hidden" knowledge—such as a repository fetching extra data specifically to trigger a UI state—the code becomes fragile and difficult to maintain. By passing explicit information through data models, developers can decouple these layers and ensure that changes in one do not inadvertently break the other.

### The Risks of Implicit Layer Dependency

When layers share implicit logic, such as a repository layer knowing the specific display requirements of the UI, the architecture becomes tightly coupled and prone to bugs.

* In the initial example, the repository fetches `MAX + 1` items specifically because the UI needs to display a "+" sign if more items exist.
* This creates a dependency where the UI logic for displaying counts relies entirely on the repository's internal fetching behavior.
* Code comments that explain one layer's behavior in the context of another (e.g., `// +1 is for the UI`) are a "code smell" indicating that responsibilities are poorly defined.

### Decoupling Through Explicit State

The most effective way to separate these concerns is to modify the data model to carry explicit state information, removing the need for "magic numbers" or leaked logic (a minimal sketch of the pattern appears below).

* By adding a boolean property like `hasMoreItems` to the `StoredItems` model, the repository can explicitly communicate the existence of additional data.
* The repository handles the logic of fetching `limit + 1`, determining the boolean state, and then truncating the list to the correct size before passing it up.
* The UI layer becomes "dumb" and only reacts to the provided data; it no longer needs to know about the `MAX_COUNT` constant or the repository's fetching strategy to determine its display state.

### Strategic Placement of Logic and Constants

Determining where constants like `ITEM_LIST_MAX_COUNT` should reside is a key architectural decision that impacts code reuse and clarity.

* **Business Logic Layer:** Placing such constants in a dedicated Domain or Use Case layer is often the best approach for maintaining a clean architecture.
* **Model Classes:** If a separate logic layer is too complex for the project scale, the constant can be housed within the model class (e.g., using a companion object in Kotlin).
* **Dependency Direction:** Developers must ensure that functional logic does not leak into generic data models, as this can create confusing dependencies where a general-purpose model becomes tied to a specific feature's algorithm.

Effective software design relies on components maintaining a "proper distance" from one another. To improve code quality, favor explicit flags and clear data contracts over implicit assumptions about how different layers of the stack will interact.
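The original article's examples are in Kotlin; the same pattern compresses into a short language-agnostic sketch, shown here in Python with a hypothetical `store.fetch` data source.

```python
from dataclasses import dataclass

ITEM_LIST_MAX_COUNT = 20  # owned by the domain layer, not the UI


@dataclass
class StoredItems:
    items: list
    has_more_items: bool  # explicit flag instead of a leaked "+1" convention


def fetch_stored_items(store, limit: int = ITEM_LIST_MAX_COUNT) -> StoredItems:
    # The repository fetches one extra row purely to learn whether more data
    # exists, then truncates so the sentinel item never reaches the UI.
    rows = store.fetch(limit + 1)
    return StoredItems(items=rows[:limit], has_more_items=len(rows) > limit)


def render_count(stored: StoredItems) -> str:
    # The UI only reacts to the explicit flag; it knows nothing about
    # MAX_COUNT or the repository's fetching strategy.
    n = len(stored.items)
    return f"{n}+" if stored.has_more_items else str(n)
```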

google

A new light on neural connections

Google and the Institute of Science and Technology Austria (ISTA) have developed LICONN, the first light-microscopy-based method capable of comprehensively mapping neurons and their connections in brain tissue. This approach overcomes the traditional reliance on expensive electron microscopy by utilizing physical tissue expansion and advanced machine learning to achieve comparable resolution and accuracy. The researchers successfully validated the technique by reconstructing nearly one million cubic microns of mouse cortex, demonstrating that light microscopy can now achieve "dense" connectomics at scale.

## Overcoming Resolution and Cost Barriers

* Connectomics has traditionally relied on electron microscopy (EM) because it offers nanometer-scale resolution, whereas standard light microscopy is limited by the diffraction limit of visible light.
* Electron microscopes cost millions of dollars and require specialized training, restricting high-level neuroscience research to wealthy, large-scale institutions.
* LICONN provides a more accessible alternative by utilizing standard light microscopy equipment already found in most life science laboratories.

## Advanced Tissue Expansion and Labeling

* The project uses a specialized expansion microscopy protocol where brain tissue is embedded in hydrogels that absorb water and physically swell.
* The technique employs three different hydrogels to create interweaving polymer networks that expand the tissue by 16 times in each dimension while preserving structural integrity.
* A whole-protein labeling process is used to provide the necessary image contrast, allowing for the tracing of densely packed neurites and the detection of synapses.

## Automated Reconstruction and Validation

* Google applied its established suite of machine learning and image analysis tools to automate the reconstruction of the expanded tissue samples.
* The team verified the accuracy of the method by tracing approximately 0.5 meters of neurites within mouse hippocampus tissue, confirming results comparable to electron microscopy.
* In a large-scale validation, the researchers provided an automated reconstruction of a volume of mouse cortex totaling nearly one million cubic microns.

## Integration of Molecular and Structural Data

* One of LICONN’s primary advantages over electron microscopy is its ability to capture multiple light wavelengths simultaneously.
* Researchers can use fluorescent markers to visualize specific proteins, neurotransmitters, and other molecules within the structural map.
* This dual-layered approach allows scientists to align molecular information with physical neuronal pathways, offering new insights into how brain circuits drive behavior and cognition.

LICONN represents a significant shift in neuroscience by democratizing high-resolution brain mapping. By replacing expensive hardware requirements with sophisticated chemical protocols and machine learning, this method enables a wider range of laboratories to contribute to the global effort of mapping the brain’s intricate wiring.

google

Making complex text understandable: Minimally-lossy text simplification with Gemini

Google Research has introduced a novel system using Gemini models to perform minimally-lossy text simplification, a process designed to enhance readability while meticulously preserving original meaning and nuance. By utilizing an automated, iterative prompt-refinement loop, the system optimizes LLM instructions to achieve high-fidelity paraphrasing that avoids the information loss typical of standard summarization. A large-scale randomized study confirms that this approach significantly improves user comprehension across complex domains like law and medicine while simultaneously reducing cognitive load for the reader.

## Automated Evaluation and Fidelity Assessment

* The system moves beyond traditional metrics like Flesch-Kincaid by using a Gemini-powered 1-10 readability scale that aligns more closely with human judgment and comprehension ease.
* Fidelity is maintained through a specialized process using Gemini 1.5 Pro that maps specific claims from the original source text directly to the simplified output.
* This mapping method identifies and weights specific error types, such as information loss, unnecessary gains, or factual distortions, to ensure the output remains a faithful representation of the technical original.

## Iterative Prompt Optimization Loop

* To overcome the limitations and slow pace of manual prompt engineering, the researchers implemented a feedback loop where Gemini models optimize their own instructions (sketched below).
* In this "LLMs optimizing LLMs" setup, Gemini 1.5 Pro analyzes the performance of simplification prompts and proposes refinements based on automated readability and fidelity scores.
* The optimization process ran for 824 iterations before performance plateaued, allowing the system to autonomously discover highly effective strategies for simplifying text without sacrificing detail.

## Validating Impact through Randomized Studies

* The effectiveness of the model was validated with 4,563 participants across 31 diverse text excerpts covering specialized fields like aerospace, philosophy, finance, and biology.
* The study utilized a randomized complete block design to compare the original text against simplified versions, measuring outcomes through nearly 50,000 multiple-choice question responses.
* Beyond accuracy, researchers measured cognitive effort using the NASA Task Load Index and tracked self-reported user confidence to ensure the simplification actually lowered the barrier to understanding.

This technology provides a scalable method for democratizing access to specialist knowledge by making expert-level discourse understandable to a general audience. The system is currently available as the "Simplify" feature within the Google app for iOS, offering a practical tool for users navigating complex digital information.
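A compressed sketch of the "LLMs optimizing LLMs" loop might look like the following. Here `simplify`, `score`, and `propose` stand in for hypothetical Gemini-backed callables, and the loop structure is inferred from the description above rather than taken from the paper.

```python
def optimize_prompt(seed_prompt, corpus, simplify, score, propose, iters=824):
    """Iterative prompt refinement: score the current prompt's outputs on
    combined readability + fidelity, then let an optimizer model rewrite
    the instructions. The iteration cap mirrors the 824 rounds reported."""
    best_prompt, best_score = seed_prompt, float("-inf")
    prompt = seed_prompt
    for _ in range(iters):
        simplified = [simplify(prompt, text) for text in corpus]
        mean_score = sum(
            score(text, out) for text, out in zip(corpus, simplified)
        ) / len(corpus)
        if mean_score > best_score:
            best_prompt, best_score = prompt, mean_score
        # The optimizer model reads the aggregate score and proposes a
        # refined instruction for the next round.
        prompt = propose(prompt, mean_score)
    return best_prompt
```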

google

Amplify Initiative: Localized data for globalized AI

The Amplify Initiative by Google Research addresses the critical lack of linguistic and cultural diversity in generative AI training data by establishing an open, community-based platform for localized data collection. By partnering with regional experts to co-create structured, high-quality datasets, the initiative aims to ensure AI models are both representative and effective in solving local challenges across health, finance, and education. This approach shifts data collection from a top-down model to a participatory framework that prioritizes responsible, locally respectful practices in the Global South.

## The Amplify Platform Framework

The initiative is designed to bridge the gap between global AI capabilities and local needs through three core pillars:

* **Participatory Co-creation:** Researchers and local communities collaborate to define specific data needs, ensuring the resulting datasets address region-specific problems like financial literacy or localized health misinformation.
* **Open Access for Innovation:** The platform provides high-quality, multilingual datasets suitable for fine-tuning and evaluating models, specifically empowering developers in the Global South to build tools for their own communities.
* **Author Recognition:** Contributors receive tangible rewards, including professional certificates, research acknowledgments, and data authorship attribution, creating a sustainable ecosystem for expert participation.

## Pilot Implementation in Sub-Saharan Africa

To test the methodology, Google Research partnered with Makerere University’s AI Lab in Uganda to conduct an on-the-ground pilot program.

* **Expert Onboarding:** The program trained 259 experts across Ghana, Kenya, Malawi, Nigeria, and Uganda through a combination of in-person workshops and app-based modules.
* **Dataset Composition:** The pilot resulted in 8,091 annotated adversarial queries across seven languages, covering salient domains such as education and finance.
* **Adversarial Focus:** By focusing on adversarial queries, the team captured localized nuances of potential AI harms, including regional stereotypes and specialized advice that generic models often miss.

## Technical Workflow and App-Based Methodology

The initiative utilizes a structured technical pipeline to scale data collection while maintaining high quality and privacy.

* **Privacy-Preserving Android App:** A dedicated app serves as the primary interface for training, data creation, and annotation, allowing experts to contribute from their own environments.
* **Automated Validation:** The app includes built-in feedback loops that use automated checks to ensure queries are relevant and to prevent the submission of semantically similar or duplicate entries (see the sketch below).
* **Domain-Specific Annotation:** Experts are provided with specialized annotation topics tailored to their professional backgrounds, ensuring that the metadata for each query is technically accurate and contextually relevant.

The Amplify Initiative provides a scalable blueprint for building inclusive AI by empowering experts in the Global South to define their own data needs. As the project expands to India and Brazil, it offers a vital resource for developers seeking to fine-tune models for local contexts and improve the safety and relevance of AI on a global scale.
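The automated duplicate check could be as simple as an embedding-similarity gate, sketched below. `embed` is a hypothetical sentence-embedding function and the 0.9 threshold is an illustrative choice, not the app's actual setting.

```python
import numpy as np


def is_near_duplicate(new_query, accepted_queries, embed, threshold=0.9):
    """Reject a submission if it is semantically too close to one already
    collected, mirroring the app's built-in validation described above."""
    v = embed(new_query)
    for query in accepted_queries:
        u = embed(query)
        cosine = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
        if cosine >= threshold:
            return True
    return False
```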

google

AMIE gains vision: A research AI agent for multimodal diagnostic dialogue

Google Research and DeepMind have introduced multimodal AMIE, an advanced research AI agent designed to conduct diagnostic medical dialogues that integrate text, images, and clinical documents. By building on Gemini 2.0 Flash and a novel state-aware reasoning framework, the system can intelligently request and interpret visual data such as skin photos or ECGs to refine its diagnostic hypotheses. This evolution moves AI diagnostic tools closer to real-world clinical practice, where visual evidence is often essential for accurate patient assessment and management.

### Enhancing AMIE with Multimodal Perception

To move beyond text-only limitations, researchers integrated vision capabilities that allow the agent to process complex medical information during a conversation.

* The system uses Gemini 2.0 Flash as its core component to interpret diverse data types, including dermatology images and laboratory reports.
* By incorporating multimodal perception, the agent can resolve diagnostic ambiguities that cannot be addressed through verbal descriptions alone.
* Preliminary testing with Gemini 2.5 Flash suggests that further scaling the underlying model continues to improve the agent's reasoning and diagnostic accuracy.

### Emulating Clinical Workflows via State-Aware Reasoning

A key technical contribution is the state-aware phase transition framework, which helps the AI mimic the structured yet flexible approach used by experienced clinicians (a simplified sketch follows this entry).

* The framework orchestrates the conversation through three distinct phases: History Taking, Diagnosis & Management, and Follow-up.
* The agent maintains a dynamic internal state that tracks known information about the patient and identifies specific "knowledge gaps."
* When the system detects uncertainty, it strategically requests multimodal artifacts—such as a photo of a rash or an image of a lab result—to update its differential diagnosis.
* Transitions between conversation phases are only triggered once the system assesses that the objectives of the current phase have been sufficiently met.

### Evaluation through Simulated OSCEs

To validate the agent’s performance, the researchers developed a robust simulation environment to facilitate rapid iteration and standardized testing.

* The system was tested using patient scenarios grounded in real-world datasets, including the SCIN dataset for dermatology and PTB-XL for ECG measurements.
* Evaluation was conducted using a modified version of Objective Structured Clinical Examinations (OSCEs), the global standard for assessing medical students and professionals.
* In comparative studies, AMIE's performance was measured against primary care physicians (PCPs) to ensure its behavior, accuracy, and tone aligned with clinical standards.

This research demonstrates that multimodal AI agents can effectively navigate the complexities of a medical consultation by combining linguistic empathy with the technical ability to interpret visual clinical evidence. As these systems continue to evolve, they offer a promising path toward high-quality, accessible diagnostic assistance that mirrors the multimodal nature of human medicine.
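The state-aware phase transition framework can be caricatured as a small state machine. The phase names follow the post; the gap names and transition rules below are hypothetical simplifications, not AMIE's actual logic.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Phase(Enum):
    HISTORY_TAKING = auto()
    DIAGNOSIS_AND_MANAGEMENT = auto()
    FOLLOW_UP = auto()


@dataclass
class DialogueState:
    phase: Phase = Phase.HISTORY_TAKING
    known_facts: dict = field(default_factory=dict)
    knowledge_gaps: set = field(default_factory=set)


def next_action(state: DialogueState) -> str:
    # Request a multimodal artifact when a gap can only be closed visually
    # (e.g., a photo of a rash or an ECG trace); gap names are hypothetical.
    visual_gaps = {"skin_lesion_photo", "ecg_trace"} & state.knowledge_gaps
    if visual_gaps:
        return f"request_artifact:{visual_gaps.pop()}"
    # Transition only once the current phase's objectives are met.
    if state.phase is Phase.HISTORY_TAKING and not state.knowledge_gaps:
        state.phase = Phase.DIAGNOSIS_AND_MANAGEMENT
        return "present_differential_diagnosis"
    return "ask_followup_question"
```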

google

Benchmarking LLMs for global health

Google Research has introduced a benchmarking pipeline and a dataset of over 11,000 synthetic personas to evaluate how Large Language Models (LLMs) handle tropical and infectious diseases (TRINDs). While LLMs excel at standard medical exams like the USMLE, this study reveals significant performance gaps when models encounter the regional context shifts and localized health data common in low-resource settings. The research concludes that integrating specific environmental context and advanced reasoning techniques is essential for making LLMs reliable decision-support tools for global health.

## Development of the TRINDs Synthetic Dataset

* Researchers created a dataset of 11,000+ personas covering 50 tropical and infectious diseases to address the lack of rigorous evaluation data for out-of-distribution medical tasks.
* The process began with "seed" templates based on factual data from the WHO, CDC, and PAHO, which were then reviewed by clinicians for clinical relevance.
* The dataset was expanded using LLM prompting to include diverse demographic, clinical, and consumer-focused augmentations.
* To test linguistic distribution shifts, the seed set was manually translated into French to evaluate how language changes impact diagnostic accuracy.

## Identifying Critical Performance Drivers

* Evaluations of Gemini 1.5 models showed that accuracy on TRINDs is lower than reported performance on standard U.S. medical benchmarks, indicating a struggle with "out-of-distribution" disease types.
* Contextual information is the primary driver of accuracy; the highest performance was achieved only when specific symptoms were combined with location and risk factors (illustrated in the sketch below).
* The study found that symptoms alone are often insufficient for an accurate diagnosis, emphasizing that LLMs require localized environmental data to differentiate between similar tropical conditions.
* Linguistic shifts pose a significant challenge, as model performance dropped by approximately 10% when processing the French version of the dataset compared to the English version.

## Optimization and Reasoning Strategies

* Implementing Chain-of-Thought (CoT) prompting—where the model is directed to explain its reasoning step-by-step—led to a significant 10% increase in diagnostic accuracy.
* Researchers utilized an LLM-based "autorater" to scale the evaluation process, scoring answers as correct if the predicted diagnosis was meaningfully similar to the ground truth.
* In tests regarding social biases, the study found no statistically significant difference in performance across race or gender identifiers within this specific TRINDs context.
* Performance remained stable even when clinical language was swapped for consumer-style descriptions, suggesting the models are robust to variations in how patients describe their symptoms.

To improve the utility of LLMs for global health, developers should prioritize the inclusion of regional risk factors and location-specific data in prompts. Utilizing reasoning-heavy strategies like Chain-of-Thought and expanding multilingual training sets are critical steps for bridging the performance gap in underserved regions.
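Since context was the primary accuracy driver, the persona-to-prompt assembly step is worth illustrating. The persona field names below are assumptions rather than the dataset's schema, and the chain-of-thought instruction reflects the roughly 10% gain reported above.

```python
def build_trinds_prompt(persona: dict) -> str:
    """Assemble a diagnostic prompt from a synthetic persona. Combining
    symptoms with location and risk factors was the study's highest-accuracy
    configuration; the `persona` keys here are hypothetical."""
    return (
        f"Symptoms: {persona['symptoms']}\n"
        f"Location: {persona['location']}\n"          # regional context
        f"Risk factors: {persona['risk_factors']}\n"  # e.g., water exposure
        "Think step by step about which tropical or infectious disease best "
        "explains this presentation, then state a single final diagnosis."
    )
```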

google

Improving brain models with ZAPBench

Google Research, in collaboration with HHMI Janelia and Harvard, has introduced ZAPBench, a first-of-its-kind whole-brain activity dataset and benchmark designed to improve the accuracy of brain activity models. Using the larval zebrafish as a model organism, the project provides single-cell resolution recordings of approximately 70,000 neurons, capturing nearly the entire vertebrate brain in action. This resource allows researchers to bridge the gap between structural connectomics and dynamic functional activity to better understand how neural wiring generates complex behavior.

## Whole-Brain Activity in Larval Zebrafish

* The dataset focuses on the six-day-old larval zebrafish because it is small, transparent, and capable of complex behaviors like motor learning, hunting, and memory.
* Researchers used light-sheet microscopy to scan the brain in 3D slices, recording two hours of continuous activity.
* The fish were engineered with GCaMP, a genetically encoded calcium indicator that emits light when neurons fire, allowing for the visualization of real-time neural impulses.
* To correlate neural activity with behavior, the fish were placed in a virtual reality environment where stimuli—such as shifting water currents and light changes—were projected around them while tail muscle activity was recorded via electrodes.

## The ZAPBench Framework

* ZAPBench standardizes the evaluation of machine learning models in neuroscience, following the tradition of benchmarks in fields like computer vision and language modeling.
* The benchmark provides a high-quality dataset of 70,000 neurons, whereas previous efforts in other species often covered less than 0.1% of the brain.
* It challenges models to predict how neurons will respond to specific visual stimuli and behavioral patterns.
* Initial results presented at ICLR 2025 demonstrate that while simple linear models provide a baseline (see the sketch below), advanced architectures like Transformers and Convolutional Neural Networks (CNNs) significantly improve prediction accuracy.

## Integrating Structure and Function

* While previous connectomics projects mapped physical neural connections, ZAPBench adds the "dynamic" layer of how those connections are used over time.
* The team is currently generating a comprehensive structural connectome for the exact same specimen used in the activity recordings.
* This dual approach will eventually allow scientists to investigate the direct relationship between precise physical wiring and the resulting patterns of neural activity across an entire vertebrate brain.

By providing an open-source dataset and standardized benchmark, ZAPBench enables the global research community to develop and compare more sophisticated models of neural dynamics, potentially leading to breakthroughs in how we simulate and understand vertebrate cognition.
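For a sense of what the simplest baseline looks like, here is a least-squares linear forecaster over activity traces. This is a sketch of the kind of linear baseline the benchmark reports, with an assumed context length; it is not ZAPBench's actual code.

```python
import numpy as np


def fit_linear_baseline(traces: np.ndarray, context: int = 4) -> np.ndarray:
    """Fit one shared linear map predicting a neuron's next activity value
    from its previous `context` values, pooled across all neurons.
    `traces` has shape (timesteps, neurons)."""
    T, _ = traces.shape
    windows = [traces[t : t + context].T for t in range(T - context)]  # (N, context)
    targets = [traces[t + context] for t in range(T - context)]        # (N,)
    X = np.concatenate(windows)
    y = np.concatenate(targets)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w  # predict with: next_value = window @ w
```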

google

Introducing Mobility AI: Advancing urban transportation

Google Research has introduced Mobility AI, a comprehensive program designed to provide transportation agencies with data-driven tools for managing urban congestion, road safety, and evolving transit patterns. By leveraging advancements in measurement, simulation, and optimization, the initiative translates decades of Google’s geospatial research into actionable technologies for infrastructure planning and real-time traffic management. The program aims to empower policymakers and engineers to mitigate gridlock and environmental impacts through high-resolution modeling and continuous monitoring of urban transportation systems.

### Measurement: Understanding Mobility Patterns

The measurement pillar focuses on establishing a precise baseline of current transportation conditions using real-time and historical data.

* **Congestion Functions:** Researchers utilize machine learning and floating car data to develop city-wide models that mathematically describe the relationship between vehicle volume and travel speeds, even on roads with limited data (a classic example of this relationship is sketched below).
* **Geospatial Foundation Models:** By applying self-supervised learning to movement patterns, the program creates embeddings that capture local spatial characteristics. This allows for better reasoning about urban mobility in data-sparse environments.
* **Analytical Formulation:** Specific research explores how adjusting traffic signal timing influences the distribution of flow across urban networks, revealing patterns in how congestion propagates.

### Simulation: Forecasting and Scenario Analysis

Mobility AI uses simulation technologies to create digital twins of cities, allowing planners to test interventions before implementing them physically.

* **Traffic Simulation API:** This tool enables the modeling of complex "what-if" scenarios, such as the impact of closing a major bridge or reconfiguring lane assignments on a highway.
* **High-Fidelity Calibration:** The simulations are calibrated using large-scale, real-world data to ensure that the virtual models accurately reflect local driver behavior and infrastructure constraints.
* **Scalable Evaluation:** These digital environments provide a risk-free way to assess how new developments, such as the rise of autonomous vehicles or e-commerce logistics, will reshape existing traffic patterns.

### Optimization: Improving Urban Flow

The optimization pillar focuses on applying AI to solve large-scale coordination problems, such as signal timing and routing efficiency.

* **Project Green Light:** This initiative uses AI to provide traffic signal timing recommendations to city engineers, specifically targeting a reduction in stop-and-go traffic to lower greenhouse gas emissions.
* **System-Wide Coordination:** Optimization algorithms work to balance the needs of multiple modes of transport, including public transit, cycling, and pedestrian infrastructure, rather than focusing solely on personal vehicles.
* **Integration with Google Public Sector:** Research breakthroughs from this program are being integrated into Google Maps Platform and Google Public Sector tools to provide agencies with accessible, enterprise-grade optimization capabilities.

Transportation agencies and researchers can leverage these foundational AI technologies to transition from reactive traffic management to proactive, data-driven policymaking. By participating in the Mobility AI program, public sector leaders can gain access to advanced simulation and measurement tools designed to build more resilient and efficient urban mobility networks.
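Congestion functions have a classic textbook form that is useful for intuition: the Bureau of Public Roads (BPR) curve below relates travel time to the volume-to-capacity ratio. Google's congestion functions are learned from floating car data rather than fixed to this parametric family, so treat this purely as an illustration of the quantity being modeled.

```python
def bpr_travel_time(volume, capacity, free_flow_time, alpha=0.15, beta=4.0):
    """Classic BPR congestion function: travel time grows polynomially with
    the volume-to-capacity ratio (alpha and beta are the standard defaults)."""
    return free_flow_time * (1.0 + alpha * (volume / capacity) ** beta)
```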

google

A new hybrid platform for quantum simulation of magnetism

Google Quantum AI researchers have developed a hybrid quantum simulation platform that combines the flexibility of digital gates with the high-speed entanglement growth of analog dynamics. Using a 69-qubit Sycamore processor, the team demonstrated high-precision simulations of quantum magnetism that are estimated to be over a million years beyond the reach of the world’s fastest supercomputers. This approach allows for the study of complex physical systems before environmental noise can degrade the quantum state.

## The Hybrid Analog-Digital Approach

* Digital simulation provides high flexibility by breaking operations into sequential logical gates, but it is relatively slow because qubits only interact in pairs.
* Analog simulation activates all qubit couplers in parallel to mimic continuous, real-world dynamics, enabling much faster growth of quantum entanglement.
* The hybrid model uses digital gates for initial state preparation and final characterization, while utilizing analog evolution for the core simulation phase.
* This combination minimizes the time the system is exposed to noise while maintaining the ability to target specific, complex problems.

## High-Precision Calibration and Benchmarking

* The team overcame the "interference" problem of analog simulation—where simultaneous coupler activation creates unpredictable results—by developing a new calibration scheme and precise hardware modeling.
* The system achieved a high level of accuracy, with an error rate of only 0.1% each time a quantum excitation moves between qubits.
* Benchmarking via random circuit sampling showed the platform can reach chaotic, highly entangled states significantly faster than purely digital methods.
* Researchers estimate that reproducing these results with the same accuracy on the Frontier supercomputer would take more than one million years.

## Discovery in Quantum Magnetism

* The researchers used the platform to study the XXZ model, a foundational paradigm in quantum magnetism, across a 69-qubit array (the standard Hamiltonian is given below).
* The experiment investigated how quantum systems reach thermal equilibrium, focusing on the Eigenstate Thermalization Hypothesis (ETH).
* The simulation revealed a surprising exception to standard physics theories: a specific parameter regime where the system resisted thermalization and remained in a non-equilibrium state.
* This finding challenges the "Generalized Gibbs Ensemble," a widely used theory for predicting the behavior of isolated quantum systems.

This hybrid platform establishes a new standard for using current-generation quantum hardware to conduct meaningful scientific research. By integrating analog speed with digital control, the approach provides a viable roadmap for exploring many-body physics and finding practical applications in the NISQ (Noisy Intermediate-Scale Quantum) era.
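For reference, the XXZ model on the qubit array is conventionally written as the nearest-neighbor Hamiltonian below (standard textbook form; the post does not spell out the exact experimental parameterization):

$$H = \sum_{\langle i,j \rangle} \left( S^x_i S^x_j + S^y_i S^y_j + \Delta\, S^z_i S^z_j \right),$$

where the sum runs over coupled qubit pairs and the anisotropy $\Delta$ is the tunable parameter. The surprising non-thermalizing behavior reported above was found in a specific regime of this parameter space.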

google

InstructPipe: Generating Visual Blocks pipelines with human instructions and LLMs

InstructPipe is a research prototype designed to simplify machine learning prototyping by generating visual programming pipelines directly from natural language instructions. By leveraging a multi-stage large language model (LLM) framework, the system automates the selection and connection of nodes to lower the barrier for novice users. The result is a streamlined workflow that transforms abstract text commands into functional, editable node-graph diagrams within the Visual Blocks for ML environment.

### Pipeline Representation and Efficiency

- Visual Blocks pipelines are structured as Directed Acyclic Graphs (DAGs) and are typically stored in a verbose JSON format.
- To improve LLM performance, InstructPipe utilizes a "pseudocode" intermediate representation that is highly token-efficient, compressing pipeline data from 2.8k tokens down to approximately 123 tokens.
- This pseudocode defines output variables, unique node IDs, and node types while specifying arguments such as input images or text prompts (e.g., `pali_1_out:pali(image=input_image_1, prompt=input_text_1)`).

### Two-Stage LLM Refinement

- The **Node Selector** module acts as a high-level filter, using brief node descriptions to identify a relevant subset of tools from the library based on the user's intent.
- The **Code Writer** module receives the filtered list and uses detailed node configurations—including specific input/output data types and usage examples—to draft the actual pipeline logic.
- This dual-prompting strategy mimics human developer behavior by first scanning documentation categories and then focusing on specific function requirements to ensure accurate node connections.

### Interpretation and Execution

- A dedicated **Code Interpreter** parses the generated pseudocode to reconstruct the final JSON-formatted pipeline required by the visual editor (a toy parser is sketched below).
- The system renders the resulting graph in an interactive workspace, allowing users to immediately execute, modify, or extend the machine learning workflow.
- Technical evaluations indicate that this approach effectively supports multimodal pipelines, such as those involving the PaLI model for vision-language tasks, while significantly reducing the learning curve for new users.

InstructPipe demonstrates how LLMs can bridge the gap between high-level human intent and low-code visual programming environments. For developers and researchers, this approach mitigates the "blank canvas" problem, allowing for faster experimentation and the rapid prototyping of complex machine learning architectures through simple text-based collaboration.
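The pseudocode representation is regular enough that a small parser illustrates the Code Interpreter's job. This sketch is inferred from the single example line quoted above (`pali_1_out:pali(image=input_image_1, prompt=input_text_1)`) and is not the prototype's actual interpreter.

```python
import re

LINE_RE = re.compile(r"(?P<out>\w+):(?P<node>\w+)\((?P<args>.*)\)")


def parse_pseudocode(lines):
    """Expand token-efficient pseudocode back into a DAG-style dict of
    nodes and edges, the shape a Visual Blocks-like JSON format needs."""
    nodes, edges = [], []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not node definitions
        node_id, node_type, args = match["out"], match["node"], match["args"]
        nodes.append({"id": node_id, "type": node_type})
        for arg in filter(None, (a.strip() for a in args.split(","))):
            name, source = (part.strip() for part in arg.split("="))
            edges.append({"from": source, "to": node_id, "input": name})
    return {"nodes": nodes, "edges": edges}


# Example from the post:
# parse_pseudocode(["pali_1_out:pali(image=input_image_1, prompt=input_text_1)"])
```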

google

Teaching machines the language of biology: Scaling large language models for next-generation single-cell analysis

Cell2Sentence-Scale (C2S-Scale) is a new family of open-source large language models designed to transform complex single-cell transcriptomic data into a text-based format accessible to natural language processing. By representing gene expression profiles as "cell sentences," the framework allows researchers to use general-purpose LLM architectures to "read" and "write" biological information. This approach simplifies single-cell analysis, enabling conversational queries and automated data interpretation that were previously limited to specialized tools and expert users.

### The Cell2Sentence Mapping Method

* Translates single-cell RNA sequencing (scRNA-seq) measurements into sequences of text by ordering gene names according to their expression levels (see the sketch below).
* Enables the integration of cellular data with text-based biological context, such as cell types, experimental metadata, and scientific literature.
* Leverages the existing vocabulary of biology—gene names and functions—to make high-dimensional data interpretable by standard language model tokenizers.

### C2S-Scale Model Architecture and Training

* Built upon Google’s Gemma open model family, maintaining the original architecture to benefit from existing scalability and infrastructure.
* Trained on a dataset exceeding 1 billion tokens derived from real-world transcriptomic data and biological metadata.
* Features a range of model sizes from 410 million to 27 billion parameters, allowing researchers to choose between computational efficiency for exploratory work and high performance for complex tasks.

### Functional Applications in Biology

* **Conversational Querying:** Researchers can interact with data through natural language to ask specific questions, such as predicting how a T cell might respond to a particular cancer therapy.
* **Automated Interpretation:** The models can generate biological summaries of experiments, describing everything from individual cell types to the characteristics of entire tissues.
* **Predictive Tasks:** The framework handles diverse tasks including cell type annotation and the generation of synthetic cells or tissues for research simulations.

### Performance and Biological Scaling Laws

* Research demonstrates that biological language models follow predictable scaling laws, where performance in tasks like cell type annotation improves as model size increases.
* Larger models show superior gene overlap and semantic similarity scores when interpreting datasets compared to smaller versions.
* Smaller models remain highly effective for parameter-efficient fine-tuning in resource-constrained environments.

C2S-Scale is available as an open-source resource on GitHub and HuggingFace, offering a flexible toolkit for the research community to apply large language models to next-generation genomic discovery.
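The core mapping is simple enough to sketch end to end: rank a cell's genes by expression and emit the top names as a sentence. The top-k cutoff and the gene-name source below are illustrative assumptions, not the framework's exact preprocessing.

```python
import numpy as np


def cell_to_sentence(expression: np.ndarray, gene_names: list, k: int = 100) -> str:
    """Convert one cell's scRNA-seq profile into a "cell sentence": gene
    names ordered by descending expression level, keeping the top-k
    expressed genes so a standard LLM tokenizer can consume the cell."""
    order = np.argsort(expression)[::-1]  # highest expression first
    tokens = [gene_names[i] for i in order[:k] if expression[i] > 0]
    return " ".join(tokens)


# usage (hypothetical counts matrix and gene list):
# sentence = cell_to_sentence(counts[cell_idx], gene_names)
```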