code-optimization

2 posts

google

Accelerating scientific discovery with AI-powered empirical software

Google Research has introduced an AI-powered system designed to accelerate scientific discovery by automating the creation and optimization of "empirical software." By leveraging the Gemini model and tree search optimization, the system can propose, implement, and iteratively improve code for complex multidisciplinary challenges, achieving results that match or exceed human expert performance. This approach transforms scientific hypothesis evaluation from a months-long manual coding process into an automated search that can be completed in hours or days.

### The Concept of Empirical Software and Scorable Tasks

* The system shifts focus from traditional functional correctness to "empirical software," where the primary objective is to maximize a predefined quality score.
* It targets "scorable tasks," which are defined by a problem description, a specific scoring metric, and a dataset for training and validation.
* This framework addresses the research bottleneck where scientists must manually test hundreds of models or parameters to achieve a breakthrough.

### System Architecture and Optimization Strategy

* The engine takes a task description and optional context, such as ideas from scientific literature, as input to generate novel methodological concepts.
* It utilizes a tree search strategy inspired by AlphaZero, employing an upper confidence bound to navigate and prioritize thousands of potential code variants.
* The LLM acts as an iterative rewriter, refining executable code within a sandbox to continuously improve the performance score.
* Outputs are designed to be fully verifiable, interpretable, and reproducible, providing scientists with the specific coded solutions used to reach a result.

### Demonstrated Performance Across Scientific Domains

* The system was tested on six diverse benchmarks, including genomics, public health, geospatial analysis, neuroscience, and time-series forecasting.
* In genomics, the system tackled the "batch integration" of single-cell RNA sequencing (scRNA-seq) data, a complex problem that involves removing technical noise while preserving biological signals.
* The AI discovered 40 novel methods that outperformed top expert-developed tools within the OpenProblems V2.0.0 batch integration benchmark.
* Evaluation focused on advanced capabilities such as zero-shot generalization, high-dimensional signal processing, and uncertainty quantification.

This system represents a significant shift toward "research engines" that participate actively in the scientific method through iterative experimentation. Scientists can utilize these tools to explore a much broader range of hypotheses than manual coding allows, potentially leading to faster breakthroughs in data-heavy fields like genomics and climate modeling.
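The post does not publish the search algorithm itself, but the AlphaZero-inspired loop it describes (select a promising code variant via an upper confidence bound, ask the LLM to rewrite it, score the result in a sandbox, backpropagate) can be sketched in miniature. In this toy version, `propose_variant` and `evaluate` are hypothetical stand-ins for the LLM rewriter and the sandboxed scorer, and a "program" is just a number whose score is its distance from a target:

```python
import math
import random

random.seed(0)  # deterministic for illustration

class Node:
    def __init__(self, code, parent=None):
        self.code = code
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_score = 0.0

    def ucb(self, c=1.4):
        # Upper confidence bound: mean score plus an exploration bonus.
        if self.visits == 0:
            return float("inf")
        mean = self.total_score / self.visits
        bonus = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return mean + bonus

def propose_variant(code):
    # Stand-in for the LLM rewriting a candidate program.
    return code + random.uniform(-0.1, 0.1)

def evaluate(code):
    # Stand-in for running the candidate in a sandbox and scoring it;
    # here the best possible "program" is the value 1.0.
    return -abs(code - 1.0)

def search(root_code, iterations=200):
    root = Node(root_code)
    root.visits = 1
    best = (evaluate(root_code), root_code)
    for _ in range(iterations):
        # Selection: descend to the child with the highest UCB.
        node = root
        while node.children:
            node = max(node.children, key=lambda n: n.ucb())
        # Expansion: generate a rewritten variant of the selected node.
        child = Node(propose_variant(node.code), parent=node)
        node.children.append(child)
        # Evaluation and backpropagation along the path to the root.
        score = evaluate(child.code)
        best = max(best, (score, child.code))
        while child:
            child.visits += 1
            child.total_score += score
            child = child.parent
    return best

best_score, best_code = search(root_code=0.0)
```

The exploration constant `c` trades off revisiting high-scoring variants against trying neglected branches; the real system presumably tunes this balance across thousands of variants rather than the 200 shown here.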

google

MLE-STAR: A state-of-the-art machine learning engineering agent

MLE-STAR is a state-of-the-art machine learning engineering agent designed to automate complex ML tasks by treating them as iterative code optimization challenges. Unlike previous agents that rely solely on an LLM’s internal knowledge, MLE-STAR integrates external web searches and targeted ablation studies to pinpoint and refine specific pipeline components. This approach allows the agent to achieve high-performance results, evidenced by its ability to win medals in 63% of Kaggle competitions within the MLE-Bench-Lite benchmark.

## External Knowledge and Targeted Ablation

The core of MLE-STAR’s effectiveness lies in its ability to move beyond generic machine learning libraries by incorporating external research and specific performance testing.

* The agent uses web search to retrieve task-specific, state-of-the-art models and approaches rather than defaulting to familiar libraries like scikit-learn.
* Instead of modifying an entire script at once, the system conducts an ablation study to evaluate the impact of individual pipeline components, such as feature engineering or model selection.
* By identifying which code blocks have the most significant impact on performance, the agent can focus its reasoning and optimization efforts where they are most needed.

## Iterative Refinement and Intelligent Ensembling

Once the critical components are identified, MLE-STAR employs a specialized refinement process to maximize the effectiveness of the generated solution.

* Targeted code blocks undergo iterative refinement based on LLM-suggested plans that incorporate feedback from prior experimental failures and successes.
* The agent features a unique ensembling strategy: it proposes multiple candidate solutions and then designs its own method to merge them.
* Rather than using simple validation-score voting, the agent iteratively improves the ensemble strategy itself, treating the combination of models as a distinct optimization task.
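The ablation idea can be illustrated with a minimal sketch: disable one pipeline component at a time, re-score, and rank components by how much the validation score drops. The component names, `run_pipeline`, and the fixed per-component gains below are hypothetical stand-ins for real training runs:

```python
# Illustrative ablation study: measure each pipeline component's
# contribution by removing it and re-evaluating. In the real agent the
# evaluation is an actual training/validation run; here it is simulated
# with hypothetical fixed gains per component.

def run_pipeline(components):
    # Simulated validation score for a pipeline built from `components`.
    gains = {
        "feature_engineering": 0.05,
        "model_selection": 0.12,
        "hyperparameter_tuning": 0.03,
    }
    return 0.70 + sum(gains[c] for c in components)

def ablation_study(all_components):
    baseline = run_pipeline(all_components)
    impact = {}
    for c in all_components:
        reduced = [x for x in all_components if x != c]
        impact[c] = baseline - run_pipeline(reduced)  # score drop without c
    return impact

components = ["feature_engineering", "model_selection", "hyperparameter_tuning"]
impact = ablation_study(components)
# Focus refinement effort on the component whose removal hurts most.
target = max(impact, key=impact.get)
```

In this toy setup the ablation would single out `model_selection` as the highest-impact block, which is exactly the kind of signal the agent uses to decide where to spend its refinement budget.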
## Robustness and Safety Verification

To ensure the generated code is both functional and reliable for real-world deployment, MLE-STAR incorporates three specialized diagnostic modules.

* **Debugging Agent:** Automatically analyzes tracebacks and execution errors in Python scripts to provide iterative corrections.
* **Data Leakage Checker:** Reviews the solution script prior to execution to ensure the model does not improperly access test dataset information during the training phase.
* **Data Usage Checker:** Analyzes whether the script is utilizing all available data sources, preventing the agent from overlooking complex data formats in favor of simpler files like CSVs.

By combining external grounding with a granular, component-based optimization strategy, MLE-STAR represents a significant shift in automated machine learning. For organizations looking to scale their ML workflows, such an agent suggests a future where the role of the engineer shifts from manual coding to high-level supervision of autonomous agents that can navigate the vast landscape of research and data engineering.
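To make the leakage-checker idea concrete, here is a deliberately toy static check: scan a script for `.fit(...)` calls whose arguments mention test data. The post implies the real checker is an LLM review of the whole solution script, so this regex pass, along with the example script, is only an illustration of the kind of pattern it looks for:

```python
import re

# Toy data-leakage check: flag any .fit(...) call whose arguments
# reference test data. A real checker would reason about data flow,
# not just line-level patterns.
LEAK_PATTERN = re.compile(r"\.fit\([^)]*test[^)]*\)")

def find_leaks(script: str):
    """Return (line_number, line) pairs that look like leakage."""
    return [
        (i + 1, line.strip())
        for i, line in enumerate(script.splitlines())
        if LEAK_PATTERN.search(line)
    ]

script = """
X_train, X_test, y_train, y_test = train_test_split(X, y)
model.fit(X_train, y_train)
scaler.fit(X_test)
"""
leaks = find_leaks(script)  # flags only the scaler fitted on test data
```

Even this crude pass catches a classic mistake (fitting a preprocessor on the test split); the value of running such a check before execution is that leakage often produces no runtime error, only silently inflated scores.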