🤖 AI Summary
Google Cloud researchers announced DS-STAR, a versatile data-science agent that automates end-to-end workflows across heterogeneous file types (CSV, JSON, Markdown, unstructured text) and achieves state-of-the-art results on the DABStep, KramaBench and DA-Code benchmarks. DS-STAR outperformed prior agents, raising accuracy on DABStep from 41.0% to 45.2%, on KramaBench from 39.8% to 44.7%, and on DA-Code from 37.0% to 38.5%, and it reached the top of the public DABStep leaderboard. Its significance lies in handling the multi-file, open-ended problems common in real-world projects by producing verifiable, executable code rather than relying on perfectly structured tabular inputs.
Technically, DS-STAR combines three innovations: (1) a Data File Analyzer that auto-summarizes diverse directory contents to give the LLM rich context, (2) an LLM-based Verifier that judges whether the current plan is sufficient at each step, and (3) a sequential Planner→Coder→Verifier loop with a Router that iteratively refines the plan (up to 10 rounds). Ablations show the Analyzer is critical: removing it drops hard-task accuracy to 26.98%. They also show that the Router's ability to correct earlier steps beats naively appending new ones. DS-STAR generalizes across LLMs (tested with GPT-5 and Gemini-2.5-Pro) and typically needs about 3 rounds for easy tasks and about 5.6 for hard ones. The result is a practical agent that more reliably synthesizes insights from messy, multi-source data and produces reproducible analysis code, lowering the expertise barrier for data science.
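The Planner→Coder→Verifier loop with a Router can be illustrated with a minimal Python sketch. This is not DS-STAR's actual code: the `Agents` container, the four callables, and the rule that the Router truncates the plan at the flagged step are all assumptions made for illustration, since the summary does not spell out those mechanics. Only the 10-round cap comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MAX_ROUNDS = 10  # the summary states refinement runs for up to 10 rounds


@dataclass
class Agents:
    """Hypothetical stand-ins for the LLM-backed components."""
    plan_next: Callable[[str, List[str]], str]        # (context, plan) -> next step
    write_code: Callable[[List[str]], str]            # plan -> script
    is_sufficient: Callable[[List[str], str], bool]   # (plan, script) -> verdict
    route: Callable[[List[str]], int]                 # index of a faulty step, or -1


def refine(context: str, agents: Agents,
           max_rounds: int = MAX_ROUNDS) -> Tuple[List[str], str, int]:
    """Iterate plan -> code -> verify until the Verifier accepts or rounds run out."""
    plan = [agents.plan_next(context, [])]
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = agents.write_code(plan)
        if agents.is_sufficient(plan, code):
            return plan, code, round_no
        bad = agents.route(plan)
        if bad >= 0:
            # Assumed corrective behaviour: drop the faulty step and what follows.
            plan = plan[:bad]
        plan.append(agents.plan_next(context, plan))
    return plan, code, max_rounds


def demo() -> Tuple[List[str], str, int]:
    """Toy run with scripted steps instead of real LLM calls."""
    scripted = iter(["load files", "wrong join", "correct join", "aggregate"])
    target = ["load files", "correct join", "aggregate"]
    agents = Agents(
        plan_next=lambda ctx, plan: next(scripted),
        write_code=lambda plan: "\n".join(f"# {step}" for step in plan),
        is_sufficient=lambda plan, code: plan == target,
        route=lambda plan: plan.index("wrong join") if "wrong join" in plan else -1,
    )
    return refine("summaries produced by a file analyzer", agents)


if __name__ == "__main__":
    final_plan, final_code, rounds = demo()
    print(rounds, final_plan)
```

In the toy run the Router removes the faulty step before planning continues, so the final plan never carries it forward. That is the difference between correcting a plan and only ever appending to it.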