🤖 AI Summary
Tongyi DeepResearch is a newly released, fully open-source web agent (a 30B MoE model) that its authors report matches or exceeds OpenAI's DeepResearch on multiple benchmarks: 32.9 on Humanity's Last Exam (HLE), 43.4 on BrowseComp, 46.7 on BrowseComp-ZH, and 75 on xbench-DeepSearch. Beyond raw scores, the project ships a complete, battle-tested training and inference stack so others can reproduce high-quality agent behavior: a long-context (128K) ReAct mode that works out of the box, and a Heavy Mode (IterResearch with test-time scaling) for deep, multi-round research and synthesis tasks.
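The ReAct loop mentioned above can be sketched as follows. This is a minimal illustration of the thought → action → observation pattern, not Tongyi DeepResearch's actual code; the `llm` and `search` stubs are hypothetical stand-ins for a real model call and web-search tool.

```python
def search(query: str) -> str:
    """Stand-in web-search tool; a real agent would call a search API."""
    return f"results for: {query}"

def llm(prompt: str) -> str:
    """Stand-in model call; for brevity this stub always emits a final answer."""
    return "Final Answer: example"

def react_agent(question: str, max_steps: int = 8) -> str:
    """Alternate model steps and tool calls, accumulating context each round."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(context)  # model emits a Thought + Action, or a final answer
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Otherwise treat the step as a tool action and append its observation
        observation = search(step)
        context += f"{step}\nObservation: {observation}\n"
    return "no answer within step budget"
```

A long-context (128K) window matters here because the loop appends every thought and observation to `context`, which grows quickly over multi-round research tasks.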
Technically, Tongyi's key innovations lie in data and training. Agentic Continual Pre-training (CPT) is driven by AgentFounder, a large-scale synthetic-data flywheel that reorganizes crawled documents, knowledge graphs, and tool trajectories into entity-anchored QA and action datasets. Progressively harder, graph-based QA synthesis (WebSailor and WebShaper) adds controlled obfuscation and a formal set-theoretic task model, while an automated engine iteratively crafts PhD-level questions. The end-to-end pipeline (Agentic CPT → SFT → RL) culminates in a customized on-policy Group Relative Policy Optimization (GRPO) with token-level gradients, leave-one-out advantage estimation, and conservative negative-sample filtering to stabilize training. For practitioners, this release lowers the barrier to building capable autonomous research agents, provides reproducible engineering patterns for agentic RL and synthetic data, and supplies a practical template for scaling complex reasoning and planning in open models.
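Two of the RL ingredients named above can be sketched concretely. Leave-one-out advantage estimation baselines each rollout's reward against the mean of the *other* rollouts in its group; the negative-sample filter shown is a hypothetical policy (the summary only says the filtering is "conservative", not how it is implemented).

```python
def leave_one_out_advantages(rewards: list[float]) -> list[float]:
    """A_i = r_i - mean(rewards excluding i): a per-sample baseline that
    doesn't let a rollout's own reward leak into its baseline."""
    k = len(rewards)
    assert k >= 2, "need at least two rollouts per group"
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

def filter_negatives(samples, advantages, keep_frac=0.5):
    """Illustrative conservative filter: keep every positive-advantage
    rollout, but only a fraction of the least-negative ones, so a few
    very bad trajectories cannot dominate the gradient update."""
    pairs = list(zip(samples, advantages))
    pos = [(s, a) for s, a in pairs if a >= 0]
    neg = sorted(((s, a) for s, a in pairs if a < 0),
                 key=lambda sa: sa[1], reverse=True)  # least negative first
    return pos + neg[: int(len(neg) * keep_frac)]
```

In a GRPO-style update, the surviving advantages would then weight token-level log-probability gradients over each rollout; by construction the leave-one-out advantages of a group sum to zero, so the update is relative within the group.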