llm - Google Research

google Apr 21, 2026

ReasoningBank: Enabling agents to learn from experience (opens in new tab)

ReasoningBank: Enabling agents to learn from experience April 21, 2026 Jun Yan and Chen-Yu Lee, Research Scientists, Google Cloud ReasoningBank is a novel agent memory framework that uses successful and failed experiences to distill generalizable reasoning strategies, enabling a…

llm database-design postgresql contrastive-learning+4

google Apr 9, 2026

ConvApparel: Measuring and bridging the realism gap in user simulators (opens in new tab)

ConvApparel: Measuring and bridging the realism gap in user simulators April 9, 2026 Ofer Meshi and Sally Goldman, Research Scientists, Google Research We introduce ConvApparel, a new human-AI conversation dataset and a comprehensive evaluation framework designed to quantify the…

llm database-design gemini conversational-ai+4

google Apr 8, 2026

Improving the academic workflow: Introducing two AI agents for better figures and peer review (opens in new tab)

Improving the academic workflow: Introducing two AI agents for better figures and peer review April 8, 2026 Jinsung Yoon, Research Scientist, and Tomas Pfister, Director, Google Cloud Introducing two AI agents to streamline academic research. These include: PaperVizAgent, a visu…

llm ai data-visualization multi-agent-systems+4

google Apr 3, 2026

Evaluating alignment of behavioral dispositions in LLMs (opens in new tab)

Evaluating alignment of behavioral dispositions in LLMs April 3, 2026 Amir Taubenfeld, Research Engineer, Zorik Gekhman, Research Scientist, and Lior Nezry, Psychology Researcher, Google Research As part of our ongoing exploration of model behavior and alignment, we introduce a…

llm behavioral-analysis situational-judgment-tests psychological-questionnaires+4

google Mar 16, 2026

Testing LLMs on superconductivity research questions (opens in new tab)

Testing LLMs on superconductivity research questions March 16, 2026 Subhashini Venugopalan, Research Scientist, and Eun-ah Kim, Visiting Scientist, Google Research Can LLMs become expert-level research partners in modern physics? Using high-temperature superconductivity as a cas…

llm database-design gemini notebooklm+2

google Mar 4, 2026

Teaching LLMs to reason like Bayesians (opens in new tab)

Teaching LLMs to reason like Bayesians March 4, 2026 Sjoerd van Steenkiste and Tal Linzen, Research Scientists, Google Research We teach LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model. Quick links Paper Share Copy link…

llm supervised-fine-tuning bayesian-inference probabilistic-reasoning+3

google Feb 10, 2026

Beyond one-on-one: Authoring, simulating, and testing dynamic human-AI group conversations (opens in new tab)

Beyond one-on-one: Authoring, simulating, and testing dynamic human-AI group conversations February 10, 2026 Erzhen Hu, Student Researcher, and Ruofei Du, Interactive Perception & Graphics Lead, Google XR DialogLab is a research prototype that provides a unified interface to con…

llm gen-ai conversational-ai human-ai-interaction+3

google Jan 28, 2026

Towards a science of scaling agent systems: When and why agent systems work (opens in new tab)

Towards a science of scaling agent systems: When and why agent systems work January 28, 2026 Yubin Kim, Research Intern, and Xin Liu, Senior Research Scientist, Google Research Through a controlled evaluation of 180 agent configurations, we derive the first quantitative scaling…

llm ai-agent multi-agent-systems llm-evaluation+3

google Jan 27, 2026

ATLAS: Practical scaling laws for multilingual models (opens in new tab)

ATLAS: Practical scaling laws for multilingual models January 27, 2026 Shayne Longpre, Google Cloud Student Researcher, and Sayna Ebrahimi, Research Scientist, Google DeepMind We introduce new scaling laws for massively multilingual language models. ATLAS provides guidance on ho…

llm fine-tuning scaling-laws transfer-learning+4

google Dec 17, 2025

Google Research 2025: Bolder breakthroughs, bigger impact (opens in new tab)

Google Research in 2025 has shifted toward an accelerated "Magic Cycle" that rapidly translates foundational breakthroughs into real-world applications across science, society, and consumer products. By prioritizing model efficiency, factuality, and agentic capabilities, the organization is moving beyond static text generation toward interactive, multi-modal systems that solve complex global challenges. This evolution is underpinned by a commitment to responsible AI development, ensuring that new technologies like quantum computing and generative UI are both safe and culturally inclusive. ## Enhancing Model Efficiency and Factuality * Google introduced new efficiency-focused techniques like block verification (an evolution of speculative decoding) and the LAVA scheduling algorithm, which optimizes resource allocation in large cloud data centers. * The Gemini 3 model achieved state-of-the-art results on factuality benchmarks, including SimpleQA Verified and the newly released FACTS benchmark suite, by emphasizing grounded world knowledge. * Research into Retrieval Augmented Generation (RAG) led to the development of the LLM Re-Ranker in Vertex AI, which helps models determine if they possess sufficient context to provide accurate answers. * The Gemma open model expanded to support over 140 languages, supported by the TUNA taxonomy and the Amplify initiative to improve socio-cultural intelligence and data representation. ## Interactive Experiences through Generative UI * A novel implementation of generative UI allows Gemini 3 to dynamically create visual interfaces, web pages, and tools in response to user prompts rather than providing static text. * This technology is powered by specialized models like "Gemini 3-interactive," which are trained to output structured code and design elements. * These capabilities have been integrated into AI Mode within Google Search, allowing for more immersive and customizable user journeys. ## Advanced Architectures and Agentic AI * Google is exploring hybrid model architectures, such as Jamba-style models that combine State Space Models (SSMs) with traditional attention mechanisms to handle long contexts more efficiently. * The development of agentic AI focuses on models that can reason, plan, and use tools, exemplified by Project Astra, a prototype for a universal AI agent. * Specialized models like Gemini 3-code have been optimized to act as autonomous collaborators for software developers, assisting in complex coding tasks and system design. ## AI for Science and Planetary Health * In biology, research teams utilized AI to map human heart and brain structures and employed RoseTTAFold-Diffusion to design new proteins for therapeutic use. * The NeuralGCM model has revolutionized Earth sciences by combining traditional physics with machine learning for faster, more accurate weather and climate forecasting. * Environmental initiatives include the FireSat satellite constellation for global wildfire detection and the expansion of AI-driven flood forecasting and contrail mitigation. ## Quantum Computing and Responsible AI * Google achieved significant milestones in quantum error correction, developing low-overhead codes that bring the industry closer to a reliable, large-scale quantum computer. * Security and safety remain central, with the expansion of SynthID—a watermarking tool for AI-generated text, audio, and video—to help users identify synthetic content. * The team continues to refine the Secure AI Framework (SAIF) to defend against emerging threats while promoting the safe deployment of generative media models like Veo and Imagen. To maximize the impact of these advancements, organizations should focus on integrating agentic workflows and RAG-based architectures to ensure their AI implementations are both factual and capable of performing multi-step tasks. Developers can leverage the Gemma open models to build culturally aware applications that scale across diverse global markets.

llm ai gen-ai multimodal-ai+5

google Dec 14, 2025

Gemini provides automated feedback for theoretical computer scientists at STOC 2026 (opens in new tab)

Google Research launched an experimental program for the STOC 2026 conference using a specialized Gemini model to provide automated, rigorous feedback on theoretical computer science submissions. By identifying critical logical errors and proof gaps within a 24-hour window, the tool demonstrated that advanced AI can serve as a powerful pre-vetting collaborator for high-level mathematical research. The overwhelmingly positive reception from authors indicates that AI can effectively augment the human peer-review process by improving paper quality before formal submission. ## Advanced Reasoning via Inference Scaling - The tool utilized an advanced version of Gemini 2.5 Deep Think specifically optimized for mathematical rigor. - It employed inference scaling methods, allowing the model to explore and combine multiple possible solutions and reasoning traces simultaneously. - This non-linear approach to problem-solving helps the model focus on the most salient technical issues while significantly reducing the likelihood of hallucinations. ## Structured Technical Feedback - Feedback was delivered in a structured format that included a high-level summary of the paper's core contributions. - The model provided a detailed analysis of potential mistakes, specifically targeting errors within lemmas, theorems, and logical proofs. - Authors also received a categorized list of minor corrections, such as inconsistent variable naming and typographical errors. ## Identified Technical Issues and Impact - The pilot saw high engagement, with over 80% of STOC 2026 submitters opting in for the AI-generated review. - The tool successfully identified "critical bugs" and calculation errors that had previously evaded human authors for months. - Survey results showed that 97% of participants found the feedback helpful, and 81% reported that the tool improved the overall clarity and readability of their work. ## Expert Verification and Hallucinations - Because the users were domain experts, they were able to act as a filter, distinguishing between deep technical insights and occasional model hallucinations. - While the model sometimes struggled to parse complex notation or interpret figures, authors valued the "neutral tone" and the speed of the two-day turnaround. - The feedback was used as a starting point for human verification, allowing researchers to refine their arguments rather than blindly following the model's output. ## Future Outlook and Educational Potential - Beyond professional research, 75% of surveyed authors see significant educational value in using the tool to train students in mathematical rigor. - The experiment's success has led to 88% of participants expressing interest in having continuous access to such a tool throughout their entire research and drafting process. The success of the STOC 2026 pilot suggests that researchers should consider integrating specialized LLMs early in the drafting phase to catch "embarrassing" or logic-breaking errors. While the human expert remains the final arbiter of truth, these tools provide a necessary layer of automated verification that can accelerate the pace of scientific discovery.

llm ai gen-ai nlp+5

google Dec 9, 2025

A differentially private framework for gaining insights into AI chatbot use (opens in new tab)

Google Research has introduced Urania, a novel framework designed to extract high-level usage insights from AI chatbot conversations while maintaining rigorous differential privacy (DP) guarantees. Unlike previous heuristic methods that rely on simple redaction or LLM-based PII stripping, this pipeline ensures that no individual user's data can be reconstructed from the resulting summaries. By combining DP clustering and keyword extraction with LLM-based summarization, the system provides a formal, auditable approach to understanding platform trends without compromising sensitive information. ## Limitations of Heuristic Privacy * Existing frameworks often rely on large language models to manually strip personally identifiable information (PII) from text before analysis. * These heuristic protections are difficult to formalize or audit, and their effectiveness may diminish as models evolve or face sophisticated prompt injection attacks. * The Urania framework addresses these weaknesses by using mathematical privacy budgets (the epsilon parameter) to measure and limit the influence of any single user's data on the final output. ## The Differentially Private Pipeline * **DP Clustering**: The framework first converts conversation data into numerical embeddings. These are grouped using a DP clustering algorithm, ensuring that cluster centers reflect broad trends rather than specific individual inputs. * **DP Keyword Extraction**: The system identifies keywords for each cluster and generates a histogram of their frequency. By adding mathematical noise to these counts, the framework masks individual contributions and ensures that only keywords common to many users are retained. * **Keyword Generation Methods**: The researchers explored three methods for extraction: LLM-guided selection of relevant terms, a differentially private version of TF-IDF, and an LLM-guided approach that selects terms from a pre-defined list of public keywords. * **LLM Summarization**: In the final stage, an LLM generates a high-level summary of the cluster using only the noisy, anonymized keywords. Because the LLM never sees the raw conversation text, the "post-processing" property of DP guarantees that the final summary remains private. ## Privacy and Utility Trade-offs * The framework was tested against a non-private baseline (Simple-CLIO) to evaluate how privacy constraints affect the quality of the insights generated. * Stronger privacy settings (lower epsilon values) inherently result in a utility trade-off, as the added noise can obscure some niche usage patterns. * Despite these trade-offs, the framework provides a robust defense against data leakage, as the summarization model is structurally prevented from seeing sensitive original text, making it resilient to prompt injection. This framework offers a scalable way for platform providers to analyze chatbot usage patterns and enforce safety policies while providing mathematical certainty regarding user privacy. For organizations handling sensitive conversation data, moving from heuristic redaction to formal DP pipelines like Urania provides a more robust and auditable path for service improvement.

llm ai differential-privacy embeddings+3

google Nov 17, 2025

Generative UI: A rich, custom, visual interactive user experience for any prompt (opens in new tab)

Google Research has introduced a novel Generative UI framework that enables AI models to dynamically construct bespoke, interactive user experiences—including web pages, games, and functional tools—in response to any natural language prompt. This shift from static, predefined interfaces to AI-generated environments allows for highly customized digital spaces that adapt to a user's specific intent and context. Evaluated through human testing, these custom-generated interfaces are strongly preferred over traditional, text-heavy LLM outputs, signaling a fundamental evolution in human-computer interaction. ### Product Integration in Gemini and Google Search The technology is currently being deployed as an experimental feature across Google’s main AI consumer platforms to enhance how users visualize and interact with data. * **Dynamic View and Visual Layout:** These experiments in the Gemini app use agentic coding capabilities to design and code a complete interactive response for every prompt. * **AI Mode in Google Search:** Available for Google AI Pro and Ultra subscribers, this feature uses Gemini 3’s multimodal understanding to build instant, bespoke interfaces for complex queries. * **Contextual Customization:** The system differentiates between user needs, such as providing a simplified interface for a child learning about the microbiome versus a data-rich layout for an adult. * **Task-Specific Tools:** Beyond text, the system generates functional applications like fashion advisors, event planners, and science simulations for topics like RNA transcription. ### Technical Architecture and Implementation The Generative UI implementation relies on a multi-layered approach centered around the Gemini 3 Pro model to ensure the generated code is both functional and accurate. * **Tool Access:** The model is connected to server-side tools, including image generation and real-time web search, to enrich the UI with external data. * **System Instructions:** Detailed guidance provides the model with specific goals, formatting requirements, and technical specifications to avoid common coding errors. * **Agentic Coding:** The model acts as both a designer and a developer, writing the necessary code to render the UI on the fly based on its interpretation of the user’s prompt. * **Post-Processing:** Outputs undergo a series of automated checks to address common issues and refine the final visual experience before it reaches the browser. ### The Shift from Static to Generative Interfaces This research represents a move away from the traditional software paradigm where users must navigate a fixed catalog of applications to find the tool they need. * **Prompt-Driven UX:** Interfaces are generated from prompts as simple as a single word or as complex as multi-paragraph instructions. * **Interactive Comprehension:** By building simulations on the fly, the system creates a dynamic environment optimized for deep learning and task completion. * **Preference Benchmarking:** Research indicates that when generation speed is excluded as a factor, users significantly prefer these custom-built visual tools over standard, static AI responses. To experience this new paradigm, users can select the "Thinking" option from the model menu in Google Search’s AI Mode or engage with the Dynamic View experiment in the Gemini app to generate tailored tools for specific learning or productivity tasks.

llm ai gemini multimodal-ai+4

google Nov 5, 2025

DS-STAR: A state-of-the-art versatile data science agent (opens in new tab)

DS-STAR is an advanced autonomous data science agent developed to handle the complexity and heterogeneity of real-world data tasks, ranging from statistical analysis to visualization. By integrating a specialized file analysis module with an iterative planning and verification loop, the system can interpret unstructured data and refine its reasoning steps dynamically based on execution feedback. This architecture allows DS-STAR to achieve state-of-the-art performance on major industry benchmarks, effectively bridging the gap between natural language queries and executable, verified code. ## Comprehensive Data File Analysis The framework addresses a major limitation of current agents—the over-reliance on structured CSV files—by implementing a dedicated analysis stage for diverse data formats. * The system automatically scans a directory to extract context from heterogeneous formats, including JSON, unstructured text, and markdown files. * A Python-based analysis script generates a textual summary of the data structure and content, which serves as the foundational context for the planning phase. * This module ensures the agent can navigate complex, multi-file environments where critical information is often spread across non-relational sources. ## Iterative Planning and Verification Architecture DS-STAR utilizes a sophisticated loop involving four specialized roles to mimic the workflow of a human expert conducting sequential analysis. * **Planner and Coder:** A Planner agent establishes high-level objectives, which a Coder agent سپس translates into executable Python scripts. * **LLM-based Verification:** A Verifier agent acts as a judge, assessing whether the generated code and its output are sufficient to solve the problem or if the reasoning is flawed. * **Dynamic Routing:** If the Verifier identifies gaps, a Router agent guides the refinement process by adding new steps or correcting errors, allowing the cycle to repeat for up to 10 rounds. * **Intermediate Review:** The agent reviews intermediate results before proceeding to the next step, similar to how data scientists use interactive environments like Google Colab. ## Benchmarking and State-of-the-Art Performance The effectiveness of the DS-STAR framework was validated through rigorous testing against existing agents like AutoGen and DA-Agent. * The agent secured the top rank on the public DABStep leaderboard, raising accuracy from 41.0% to 45.2% compared to previous best-performing models. * Performance gains were consistent across other benchmarks, including KramaBench (39.8% to 44.7%) and DA-Code (37.0% to 38.5%). * DS-STAR showed a significant advantage in "hard" tasks—those requiring the synthesis of information from multiple, varied data sources—demonstrating its superior versatility in complex environments. By automating the time-intensive tasks of data wrangling and verification, DS-STAR provides a robust template for the next generation of AI assistants. Organizations looking to scale their data science capabilities should consider adopting iterative agentic workflows that prioritize multi-format data understanding and self-correcting execution loops.

llm ai machine-learning python+4

google Oct 30, 2025

Accelerating the magic cycle of research breakthroughs and real-world applications (opens in new tab)

Google Research is accelerating a "magic cycle" where breakthrough scientific discoveries and real-world applications continuously reinforce one another through advanced AI models and open platforms. By leveraging agentic tools and large-scale foundations, the company is transforming complex data into actionable insights across geospatial analysis, genomics, and quantum computing. This iterative process aims to solve critical global challenges while simultaneously uncovering new frontiers for future innovation. ### Earth AI and Geospatial Reasoning * Google has integrated various geospatial models—including those for flood forecasting, wildfire tracking, and air quality—into a unified Earth AI program. * The newly introduced Geospatial Reasoning Agent uses Large Language Models (LLMs) to allow non-experts to ask complex questions and receive plain-language answers derived from diverse datasets. * Riverine flood models have been significantly expanded, now providing forecasts for over 2 billion people across 150 countries. * New Remote Sensing and Population Dynamics Foundations have been released to help researchers understand nuanced correlations in planetary data and supply chain management. ### DeepSomatic and Genomic Research * Building on ten years of genomics work, DeepSomatic is an AI tool designed to identify somatic mutations (genetic variants in tumors) to assist in cancer research. * The tool follows the development of previous foundational models like DeepVariant and DeepConsensus, which helped map human and non-human genomes. * These advancements aim to move the medical field closer to precision medicine by providing health practitioners with higher-resolution data on genetic variations. ### The Magic Cycle of Research and Development * Google highlights "Quantum Echoes" as a key breakthrough in quantum computing, contributing to the broader goal of solving fundamental scientific problems through high-scale computation. * The acceleration of discovery is largely attributed to "agentic tools" that assist scientists in navigating massive datasets and uncovering new research opportunities. * The company emphasizes a collaborative approach, making foundation models available to trusted testers and partners like the WHO and various international research institutes. To maximize the impact of these breakthroughs, organizations should look toward integrating multimodal AI agents that can bridge the gap between specialized scientific data and practical decision-making. By utilizing open platforms and foundation models, the broader scientific community can translate high-level research into scalable solutions for climate resilience, healthcare, and global policy.

llm ai foundation-models quantum-computing+4