Jupyter Agents: training LLMs to reason with notebooks (huggingface.co)

🤖 AI Summary
The Jupyter Agent project extends large language models (LLMs) with the ability to execute code directly inside Jupyter notebooks, combining step-by-step reasoning with live code execution so models can tackle complex data-analysis tasks with more autonomy. Earlier versions were powered by the large Qwen3-Coder model; the current effort shifts focus to fine-tuning smaller models to perform well on agentic data-science tasks.

To that end, the team curated a dataset of roughly 51,000 synthetic notebooks derived from Kaggle, targeted specifically at data-science workflows. Performance is assessed with the DABStep benchmark, which poses realistic data-science questions. Initial tests showed that even small models like Qwen3-4B have room for improvement, scoring 44.4% accuracy on the easy tasks; after fine-tuning and simplifying the agent's scaffolding, accuracy rose to 59.7%. This marks a significant step toward AI agents that not only execute code but also convey the reasoning behind their analyses, with the potential to change how data-driven tasks are approached across many fields.
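The summary's core idea, interleaving model reasoning with code execution in a shared notebook-like state, can be sketched minimally. This is an illustrative sketch, not the project's actual implementation: `fake_llm` is a hypothetical stand-in for a real model call (e.g. to Qwen3-4B), and the "kernel" is simply Python's `exec` over a shared namespace that persists across steps, mimicking notebook cells.

```python
import contextlib
import io
import re

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real agent would query a
    model here. The canned reply below is purely illustrative."""
    return (
        "To answer, I'll compute the mean.\n"
        "```python\n"
        "values = [2, 4, 6]\n"
        "print(sum(values) / len(values))\n"
        "```"
    )

# Extract fenced python blocks from the model's reply.
CODE_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)

def run_step(prompt: str, namespace: dict) -> str:
    """One reason-then-execute step: ask the model, run any code it emits
    in a shared namespace (like notebook cells), and return the captured
    stdout, which would be fed back to the model on the next turn."""
    reply = fake_llm(prompt)
    outputs = []
    for code in CODE_BLOCK.findall(reply):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)  # state persists across steps
        outputs.append(buf.getvalue())
    return "".join(outputs)

if __name__ == "__main__":
    ns = {}
    print(run_step("What is the mean of [2, 4, 6]?", ns))  # prints 4.0
```

A production agent would run the code in a sandboxed Jupyter kernel rather than `exec`, and loop until the model stops emitting code, but the reason-execute-observe cycle is the same.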