🤖 AI Summary
Researchers introduced Recursive Self-Aggregation (RSA), a test-time scaling method that unlocks deeper reasoning in large language models by combining the strengths of parallel and sequential inference. Instead of operating on final answers alone, RSA maintains a population of candidate reasoning chains and iteratively refines them: at each step, it aggregates subsets of chains to produce improved candidates for the next round. This evolutionary-style bootstrapping lets the method harvest partially correct intermediate steps from different chains of thought, so models can build better solutions over multiple iterations without additional training.
Technically, RSA is a population-based iterative refinement method that uses subset aggregation (richer than simple voting) to synthesize stronger reasoning traces; it scales compute at inference time and improves with larger compute budgets. Empirically, the authors show consistent gains across tasks (AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, SuperGPQA) and model families, with small models such as Qwen3-4B-Instruct-2507 reaching performance competitive with larger reasoning models (DeepSeek-R1, o3-mini (high)). They also demonstrate further improvements from training models with an aggregation-aware reinforcement-learning objective that teaches them to better combine candidate solutions. Code is available for reproduction, making RSA a practical knob for boosting reasoning without heavier models or retraining.
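The population-and-aggregation loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `generate` and `aggregate` are hypothetical callables standing in for LLM sampling and aggregation prompts, and the population size, subset size, and round count are illustrative defaults, not the paper's settings.

```python
import random

def rsa(problem, generate, aggregate,
        population_size=8, subset_size=3, rounds=4, seed=0):
    """Sketch of Recursive Self-Aggregation (hypothetical interface).

    `generate(problem)` samples one candidate reasoning chain;
    `aggregate(problem, subset)` asks the model to synthesize an
    improved chain from a subset of candidates. Both are assumed
    to wrap LLM calls; here, any callables work.
    """
    rng = random.Random(seed)
    # Round 0: sample an initial population of candidate chains in parallel.
    population = [generate(problem) for _ in range(population_size)]
    for _ in range(rounds):
        # Each new candidate is aggregated from a random subset of the
        # current population, harvesting partially correct steps from
        # different chains of thought.
        population = [
            aggregate(problem, rng.sample(population, subset_size))
            for _ in range(population_size)
        ]
    # Final round's candidates; select or vote among them as desired.
    return population
```

With toy stand-ins (candidates as numeric scores, aggregation as `max` over a subset), the loop visibly concentrates the population on stronger candidates, which mirrors how subset aggregation is meant to propagate good partial reasoning.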