google Sep 8, 2025

Accelerating scientific discovery with AI-powered empirical software

ai llm gemini code-generation code-optimization tree-search scientific-computing empirical-software

Google Research has introduced an AI-powered system designed to accelerate scientific discovery by automating the creation and optimization of "empirical software." By leveraging the Gemini model and tree search optimization, the system can propose, implement, and iteratively improve code for complex multidisciplinary challenges, achieving results that match or exceed human expert performance. This approach transforms scientific hypothesis evaluation from a months-long manual coding process into an automated search that can be completed in hours or days.

The Concept of Empirical Software and Scorable Tasks

The system shifts focus from traditional functional correctness to "empirical software," where the primary objective is to maximize a predefined quality score.
It targets "scorable tasks," which are defined by a problem description, a specific scoring metric, and a dataset for training and validation.
This framework addresses the research bottleneck where scientists must manually test hundreds of models or parameters to achieve a breakthrough.
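
To make the three ingredients of a scorable task concrete, here is a minimal Python sketch. The names `ScorableTask`, `neg_mae`, and `evaluate` are hypothetical and chosen for illustration; they are not taken from the Google system. The sketch assumes a higher score is better and that a candidate solution is a plain function from input to prediction.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Example = Tuple[float, float]  # (input, target); a toy stand-in for real data


@dataclass(frozen=True)
class ScorableTask:
    """Hypothetical container for the three parts of a scorable task."""
    description: str                      # natural-language problem statement
    score: Callable[[Sequence[float], Sequence[float]], float]  # higher is better
    train: Sequence[Example]              # data a candidate may fit on
    validation: Sequence[Example]         # held-out data used for scoring


def neg_mae(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Negative mean absolute error, so that larger values mean better fits."""
    return -sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def evaluate(task: ScorableTask, candidate: Callable[[float], float]) -> float:
    """Score one candidate program on the task's validation split."""
    inputs = [x for x, _ in task.validation]
    targets = [y for _, y in task.validation]
    return task.score([candidate(x) for x in inputs], targets)


task = ScorableTask(
    description="Predict y from x",
    score=neg_mae,
    train=[(0.0, 0.0), (1.0, 2.0)],
    validation=[(2.0, 4.0), (3.0, 6.0)],
)

print(evaluate(task, lambda x: 2 * x))  # candidate that doubles its input
print(evaluate(task, lambda x: x))      # weaker candidate
```

Because every candidate reduces to a single number, any two pieces of code can be compared directly, which is what lets the search over candidates be automated.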

System Architecture and Optimization Strategy

The engine takes a task description and optional context, such as ideas from scientific literature, as input and generates novel methodological concepts.
It utilizes a tree search strategy inspired by AlphaZero, employing an upper confidence bound to navigate and prioritize thousands of potential code variants.
The LLM acts as an iterative rewriter, refining executable code within a sandbox to continuously improve the performance score.
Outputs are designed to be fully verifiable, interpretable, and reproducible, providing scientists with the specific coded solutions used to reach a result.
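
The search loop described above can be sketched in a few lines of Python. This is a generic UCB1-style illustration, not the selection rule the Google system actually uses. The function names (`tree_search`, `ucb`, `toy_rewrite`, `toy_score`) and the exploration constant `c` are assumptions made for the example. In a real setting, `rewrite` would call an LLM and `score` would run the candidate in a sandbox; here both are deterministic toy stand-ins.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    code: str                      # candidate program (here: a toy string)
    score: float                   # score of this exact candidate
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 1                # times this node or a descendant was expanded
    best: float = 0.0              # best score seen in this node's subtree


def ucb(node: Node, total_visits: int, c: float) -> float:
    """Upper confidence bound: exploit good subtrees, explore rarely visited ones."""
    return node.best + c * math.sqrt(math.log(total_visits) / node.visits)


def tree_search(root_code: str,
                rewrite: Callable[[str, int], str],
                score: Callable[[str], float],
                iterations: int,
                c: float = 1.0) -> Node:
    root = Node(root_code, score(root_code))
    root.best = root.score
    nodes = [root]
    for step in range(iterations):
        total = sum(n.visits for n in nodes)
        # Pick the node with the highest upper confidence bound.
        parent = max(nodes, key=lambda n: ucb(n, total, c))
        # Ask the rewriter (an LLM in the real system) for a modified candidate.
        child_code = rewrite(parent.code, step)
        child = Node(child_code, score(child_code), parent=parent)
        child.best = child.score
        parent.children.append(child)
        nodes.append(child)
        # Propagate the new result up to the root.
        ancestor = parent
        while ancestor is not None:
            ancestor.visits += 1
            ancestor.best = max(ancestor.best, child.score)
            ancestor = ancestor.parent
    return max(nodes, key=lambda n: n.score)


# Toy stand-ins: the "program" is a number stored as text, and the best value is 3.0.
def toy_score(code: str) -> float:
    return -abs(float(code) - 3.0)


def toy_rewrite(code: str, step: int) -> str:
    return str(float(code) + 0.5)


best = tree_search("0.0", toy_rewrite, toy_score, iterations=20)
print(best.code, best.score)
```

The bonus term shrinks as a node accumulates visits, so the search keeps returning to promising candidates while still occasionally expanding neglected ones. The sketch assumes scores fall in a comparable range; a production system would likely normalize them before applying the bound.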

Demonstrated Performance Across Scientific Domains

The system was tested on six benchmarks spanning fields that include genomics, public health, geospatial analysis, neuroscience, and time-series forecasting.
In genomics, the system tackled the "batch integration" of single-cell RNA sequencing (scRNA-seq) data, a complex problem that involves removing batch effects (technical noise between datasets) while preserving biological signals.
The AI discovered 40 novel methods that outperformed top expert-developed tools within the OpenProblems V2.0.0 batch integration benchmark.
Evaluation focused on advanced capabilities such as zero-shot generalization, high-dimensional signal processing, and uncertainty quantification.

This system represents a significant shift toward "research engines" that participate actively in the scientific method through iterative experimentation. Scientists can utilize these tools to explore a much broader range of hypotheses than manual coding allows, potentially leading to faster breakthroughs in data-heavy fields like genomics and climate modeling.