🤖 AI Summary
R-Zero introduces a breakthrough approach to developing Large Language Models (LLMs) that autonomously generate and refine their own training data, eliminating reliance on vast human-curated datasets. This self-evolving framework uses two distinct models—a Challenger and a Solver—that co-evolve through interaction: the Challenger designs increasingly difficult tasks near the Solver's capability edge, and the Solver improves by attempting these challenges. This dynamic creates a self-improving curriculum tailored to the model’s current strengths and weaknesses, advancing reasoning ability without any external labeled data.
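The Challenger–Solver loop described above can be illustrated with a minimal toy sketch. Everything here is hypothetical: the class names, the scalar "skill" stand-in for model capability, and the update rule are invented for illustration and are not R-Zero's actual training procedure, which trains real LLMs rather than scalar agents.

```python
class Solver:
    """Toy stand-in for the Solver model: capability is a single scalar."""
    def __init__(self, skill=1.0):
        self.skill = skill

    def attempt(self, difficulty):
        # Succeeds on tasks at or below its current capability.
        return difficulty <= self.skill

    def learn(self, difficulty, succeeded):
        # Illustrative rule: the Solver improves most on tasks it fails
        # that sit just beyond its edge (the "capability frontier").
        if not succeeded and difficulty - self.skill < 0.5:
            self.skill += 0.1


class Challenger:
    """Toy Challenger: proposes tasks slightly past the Solver's edge."""
    def propose(self, solver):
        return solver.skill + 0.2  # just beyond current capability


def co_evolve(rounds=20):
    # Each round: Challenger probes the frontier, Solver attempts and adapts,
    # so the curriculum difficulty rises with the Solver's ability.
    solver, challenger = Solver(), Challenger()
    for _ in range(rounds):
        task = challenger.propose(solver)
        succeeded = solver.attempt(task)
        solver.learn(task, succeeded)
    return solver.skill
```

Running `co_evolve(20)` raises the toy Solver's skill from 1.0 toward 3.0, showing the self-improving curriculum dynamic: because the Challenger tracks the Solver's current level, every proposed task stays in the productive zone just past what the Solver can already do.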
This innovation is significant for the AI/ML community as it addresses a key bottleneck in LLM training—dependency on costly, manually labeled datasets—potentially accelerating progress toward more autonomous and scalable AI systems. Empirical results demonstrate notable improvements in reasoning tasks, with R-Zero boosting the performance of backbone models like Qwen3-4B-Base by +6.49 on math reasoning and +7.54 on general reasoning benchmarks. By fostering autonomous task creation and learning, R-Zero paves the way for more efficient, scalable, and powerful AI models capable of evolving beyond human-labeled data constraints.