Show HN: Supercharging RL with Hyper-Efficient Online Opt, +165% in 2h, $10 (www.arc.computer)

🤖 AI Summary
Researchers behind ATLAS report a simple, low-cost way to supercharge an already RL-trained teacher model: layer a hyper-efficient online optimizer (GEPA-style reflective mutation) on top of the offline RL foundation. Using ATLAS-8B-Thinking as the teacher and gemini/gemini-flash-2.5 as the reflection agent, they evolved teacher prompts in real time and achieved a +165% student performance gain (Pareto front 0.1933 → 0.5121) in ~2 hours for about $10 of inference. Validation on a held-out 50-example slice showed a comparable +142% gain, and the optimized prompts produced 97% shorter student solutions (peak efficiency 1.97). In one concrete case, iterative prompt mutations moved an initially negative score (–0.2) to 1.479.

Technically, the pipeline combines offline Reinforced Continual Learning (RCL), which builds a robust, generalist teacher, with an online GEPA loop that uses natural-language reflection to diagnose student failures and propose targeted “reflective mutations” to three interlocking templates (teacher adaptive, student diagnostic, student-with-teaching). The experiment ran on 100 training examples with a budget of up to 3,500 metric calls and generated ~9M tokens (~$0.0007/1K tokens), demonstrating that language-based online optimization can cheaply adapt RL-trained agents for task-specific improvements.

Code and models are open source (Arc-Computer/ATLAS; arc-intelligence/ATLAS-8B-Thinking), offering a practical pattern for agent builders who want rapid, resource-light adaptation without full retraining.
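To make the online loop concrete, here is a minimal Python sketch of a GEPA-style reflective-mutation cycle. It is a hedged illustration, not the ATLAS repo's actual API: `call_teacher`, `call_student`, `call_reflector`, and `metric` are hypothetical callables standing in for the teacher model, student model, reflection agent, and scoring function, and the loop keeps a single best prompt rather than the full Pareto front reported in the post.

```python
# Sketch of a GEPA-style reflective-mutation loop (hypothetical names, not the
# ATLAS codebase). Assumes black-box callables:
#   call_teacher(prompt, problem)   -> teaching text
#   call_student(problem, teaching) -> student answer
#   call_reflector(critique_input)  -> revised teacher prompt
#   metric(answer, reference)       -> float score
import random
from dataclasses import dataclass, field

@dataclass
class Candidate:
    prompt: str                                    # teacher prompt template
    scores: list = field(default_factory=list)     # per-example metric values

    @property
    def mean_score(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

def evaluate(candidate, examples, call_teacher, call_student, metric):
    """Score a prompt candidate on a batch of examples; return transcripts."""
    candidate.scores = []
    transcripts = []
    for ex in examples:
        teaching = call_teacher(candidate.prompt, ex["problem"])
        answer = call_student(ex["problem"], teaching)
        score = metric(answer, ex["reference"])
        candidate.scores.append(score)
        transcripts.append({"problem": ex["problem"], "teaching": teaching,
                            "answer": answer, "score": score})
    return transcripts

def reflective_mutation(candidate, transcripts, call_reflector, k_worst=3):
    """Ask the reflection model to diagnose the worst failures and rewrite the prompt."""
    worst = sorted(transcripts, key=lambda t: t["score"])[:k_worst]
    critique_input = (
        "Current teacher prompt:\n" + candidate.prompt +
        "\n\nFailed interactions:\n" +
        "\n---\n".join(f"problem: {t['problem']}\nteaching: {t['teaching']}\n"
                       f"student answer: {t['answer']}\nscore: {t['score']}"
                       for t in worst) +
        "\n\nDiagnose why the teaching failed and return an improved prompt."
    )
    return Candidate(prompt=call_reflector(critique_input))

def optimize(seed_prompt, examples, call_teacher, call_student, call_reflector,
             metric, budget=3500, batch_size=10):
    """Online loop: evaluate, reflect, accept the mutation if it scores higher."""
    best = Candidate(prompt=seed_prompt)
    calls = 0
    while calls < budget:
        batch = random.sample(examples, min(batch_size, len(examples)))
        transcripts = evaluate(best, batch, call_teacher, call_student, metric)
        calls += len(batch)
        child = reflective_mutation(best, transcripts, call_reflector)
        evaluate(child, batch, call_teacher, call_student, metric)
        calls += len(batch)
        if child.mean_score > best.mean_score:
            best = child          # keep the mutated prompt only if it improves
    return best
```

The metric-call budget (3,500) mirrors the figure quoted above; everything else, including the greedy accept-if-better rule, is a simplification of the reflect-then-mutate pattern the summary describes.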