🤖 AI Summary
NVIDIA researchers introduce ToolOrchestra, a method that trains a small "orchestrator" model to coordinate larger models and diverse tools to solve hard agentic tasks. They release Orchestrator, an 8B-parameter model trained with reinforcement learning using outcome-, efficiency-, and user-preference-aware rewards to decide which tools to call and how to compose them. On the Humanity's Last Exam (HLE) benchmark, Orchestrator scores 37.1% (vs. GPT-5's 35.1%) while being 2.5× more cost-efficient; on tau2-Bench and FRAMES it outperforms GPT-5 while using only ~30% of the compute cost. Extensive evaluations show Orchestrator achieves a strong performance-cost tradeoff and generalizes to unseen tools.
Technically, the key idea is moving intelligence from a single giant model into a lightweight controller that dynamically routes tasks to specialized components and external tools, optimized explicitly for outcomes, compute efficiency, and user-aligned tool choices. That design yields both better accuracy and far lower expense for complex reasoning workflows, suggesting a practical path to scalable, modular, tool-augmented AI systems. For the AI/ML community this highlights that small, RL-trained orchestrators can raise the ceiling of problem solving while reducing operational cost—paving the way for more efficient, preference-aware multi-agent and tool-use architectures.
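To make the reward design concrete, here is a minimal sketch of how an outcome-, efficiency-, and preference-aware reward could be combined into a single RL training signal. All names, weights, and term definitions are assumptions for illustration, not taken from the paper:

```python
# Hypothetical sketch of a composite orchestrator reward.
# The weights and the exact form of each term are assumptions,
# not the actual reward used to train Orchestrator.
from dataclasses import dataclass


@dataclass
class Episode:
    solved: bool          # did the tool-call trajectory solve the task?
    cost_usd: float       # total compute/API cost of the calls made
    budget_usd: float     # user's cost budget for the task
    preferred_calls: int  # calls matching the user's stated tool preferences
    total_calls: int      # total tool calls in the trajectory


def reward(ep: Episode, w_outcome: float = 1.0,
           w_eff: float = 0.3, w_pref: float = 0.2) -> float:
    """Combine outcome, efficiency, and preference terms into one scalar."""
    outcome = 1.0 if ep.solved else 0.0
    # Efficiency term: fraction of the budget left, clipped at 0.
    efficiency = max(0.0, 1.0 - ep.cost_usd / ep.budget_usd)
    # Preference term: share of tool calls matching user preferences.
    preference = ep.preferred_calls / ep.total_calls if ep.total_calls else 0.0
    return w_outcome * outcome + w_eff * efficiency + w_pref * preference


# A solved task that used 20% of its budget, with 3 of 4 preferred calls:
ep = Episode(solved=True, cost_usd=0.02, budget_usd=0.10,
             preferred_calls=3, total_calls=4)
print(round(reward(ep), 3))  # → 1.39
```

The point of such a shaped reward is that the policy is not rewarded for correctness alone: trajectories that solve the task cheaply and with user-aligned tool choices score higher, which is what pushes the orchestrator toward the performance-cost tradeoff the summary describes.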