google Nov 5, 2025

DS-STAR: A state-of-the-art versatile data science agent

ai llm machine-learning python data-visualization data-science autonomous-agents data-wrangling

DS-STAR is an advanced autonomous data science agent developed to handle the complexity and heterogeneity of real-world data tasks, ranging from statistical analysis to visualization. By integrating a specialized file analysis module with an iterative planning and verification loop, the system can interpret unstructured data and refine its reasoning steps dynamically based on execution feedback. This architecture allows DS-STAR to achieve state-of-the-art performance on major industry benchmarks, effectively bridging the gap between natural language queries and executable, verified code.

Comprehensive Data File Analysis

The framework addresses a major limitation of current agents—the over-reliance on structured CSV files—by implementing a dedicated analysis stage for diverse data formats.

The system automatically scans a directory to extract context from heterogeneous formats, including JSON, unstructured text, and markdown files.
A Python-based analysis script generates a textual summary of the data structure and content, which serves as the foundational context for the planning phase.
This module ensures the agent can navigate complex, multi-file environments where critical information is often spread across non-relational sources.
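The file analysis stage described above can be approximated with a short script. This is a minimal sketch, not DS-STAR's actual analysis code: the function names `describe_file` and `summarize_directory`, the per-format summaries, and the 200-character preview limit are all assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def describe_file(path: Path, max_chars: int = 200) -> str:
    """Return a one-line textual summary of a single data file (hypothetical helper)."""
    suffix = path.suffix.lower()
    if suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        if isinstance(data, dict):
            return f"{path.name}: JSON object with keys {sorted(data)}"
        if isinstance(data, list):
            return f"{path.name}: JSON array with {len(data)} items"
        return f"{path.name}: JSON scalar of type {type(data).__name__}"
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        header = rows[0] if rows else []
        data_rows = max(len(rows) - 1, 0)
        return f"{path.name}: CSV with columns {header} and {data_rows} data rows"
    # Fallback for markdown, plain text, and anything else: show a short preview.
    text = path.read_text(encoding="utf-8", errors="replace")
    preview = " ".join(text.split())[:max_chars]
    return f"{path.name}: text, {len(text)} characters, starts with {preview!r}"


def summarize_directory(root: str) -> str:
    """Summarize every regular file under root, one line per file, sorted by path."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    return "\n".join(describe_file(p) for p in files)
```

Calling `summarize_directory("data/")` on a folder of mixed files would return one summary line per file, which could then be passed to a planning step as plain-text context. A real system would likely need richer summaries (column types, sample values, nested JSON structure) than this sketch produces.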

Iterative Planning and Verification Architecture

DS-STAR runs a loop of four specialized agent roles (Planner, Coder, Verifier, and Router) that mimics how a human expert works through an analysis one step at a time.

Planner and Coder: A Planner agent establishes high-level objectives, which a Coder agent then translates into executable Python scripts.
LLM-based Verification: A Verifier agent acts as a judge, assessing whether the generated code and its output are sufficient to solve the problem or if the reasoning is flawed.
Dynamic Routing: If the Verifier identifies gaps, a Router agent guides the refinement process by adding new steps or correcting errors, allowing the cycle to repeat for up to 10 rounds.
Intermediate Review: The agent reviews intermediate results before proceeding to the next step, similar to how data scientists use interactive environments like Google Colab.
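The control flow of the loop above can be sketched as follows. This is an illustration under stated assumptions, not DS-STAR's implementation: each role is passed in as a plain callable (in practice these would wrap LLM calls and a code sandbox), and the convention that the router returns either the index of a flawed step to roll back to or `None` to simply add a step is a hypothetical simplification. Only the four roles and the 10-round cap come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopState:
    plan: list = field(default_factory=list)  # high-level steps accepted so far
    code: str = ""                            # latest script from the coder
    output: str = ""                          # latest execution result
    rounds: int = 0                           # rounds actually used
    verified: bool = False                    # True once the verifier accepts


def run_loop(
    query: str,
    context: str,
    planner: Callable[[str, str, list], str],
    coder: Callable[[list, str], str],
    executor: Callable[[str], str],
    verifier: Callable[[str, list, str, str], bool],
    router: Callable[[str, list, str], Optional[int]],
    max_rounds: int = 10,
) -> LoopState:
    """Plan, code, execute, and verify until accepted or max_rounds is reached."""
    state = LoopState()
    for round_number in range(1, max_rounds + 1):
        state.rounds = round_number
        # Planner proposes the next step given the steps kept so far.
        state.plan.append(planner(query, context, state.plan))
        # Coder turns the whole plan into a script; executor runs it.
        state.code = coder(state.plan, context)
        state.output = executor(state.code)
        # Verifier judges whether plan, code, and output answer the query.
        if verifier(query, state.plan, state.code, state.output):
            state.verified = True
            break
        # Router decides how to refine: roll back to a flawed step, or just continue.
        flawed_index = router(query, state.plan, state.output)
        if flawed_index is not None:
            del state.plan[flawed_index:]  # drop the flawed step and all later ones
    return state
```

Because the roles are injected, the loop can be exercised with simple stub functions before any model is wired in. Checking `state.verified` after the call distinguishes an accepted answer from one that merely ran out of rounds.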

Benchmarking and State-of-the-Art Performance

The effectiveness of the DS-STAR framework was validated through rigorous testing against existing agents like AutoGen and DA-Agent.

The agent secured the top rank on the public DABStep leaderboard, raising accuracy from the previous best of 41.0% to 45.2%.
Performance gains were consistent across other benchmarks, including KramaBench (39.8% to 44.7%) and DA-Code (37.0% to 38.5%).
DS-STAR showed a significant advantage in "hard" tasks—those requiring the synthesis of information from multiple, varied data sources—demonstrating its superior versatility in complex environments.
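To put the reported numbers on a common footing, the gains can be expressed both in percentage points and relative to the earlier best score. The snippet below is a small worked calculation using only the figures quoted above; the `gains` helper and the `RESULTS` table are names introduced here for illustration.

```python
# Accuracy in percent, as quoted above: (previous best, DS-STAR).
RESULTS = {
    "DABStep": (41.0, 45.2),
    "KramaBench": (39.8, 44.7),
    "DA-Code": (37.0, 38.5),
}


def gains(before: float, after: float) -> tuple:
    """Return (percentage-point gain, relative gain in percent), rounded to 0.1."""
    absolute = round(after - before, 1)
    relative = round(100 * (after - before) / before, 1)
    return absolute, relative


for name, (before, after) in RESULTS.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative}% relative)")

# Prints:
# DABStep: +4.2 points (10.2% relative)
# KramaBench: +4.9 points (12.3% relative)
# DA-Code: +1.5 points (4.1% relative)
```

Seen this way, the improvement is largest on KramaBench and smallest on DA-Code.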

By automating the time-intensive tasks of data wrangling and verification, DS-STAR provides a robust template for the next generation of AI assistants. Organizations looking to scale their data science capabilities should consider adopting iterative agentic workflows that prioritize multi-format data understanding and self-correcting execution loops.