Nemotron 3 Ultra: Open MoE Hybrid Mamba-Transformer for Agentic Reasoning [pdf] (research.nvidia.com)

🤖 AI Summary
NVIDIA has announced Nemotron 3 Ultra, a Mixture-of-Experts (MoE) hybrid Mamba-Transformer model with 550 billion total parameters. It was pre-trained on 20 trillion text tokens, its context length is extended to 1 million tokens, and its training pipeline uses reinforcement learning and Multi-Teacher On-Policy Distillation. According to NVIDIA, the architecture delivers up to 6 times higher inference throughput than existing language models at comparable accuracy. That makes it well suited to long-running autonomous workloads such as code generation and complex multi-step task execution.

The model targets agentic reasoning, meaning multi-step reasoning combined with tool use. Notable features include LatentMoE, which aims to improve accuracy per parameter; a hybrid Mamba-Attention architecture that speeds up inference; and a reasoning budget control that lets users adjust the accuracy-compute trade-off at inference time. NVIDIA has also released the model checkpoints and training data on Hugging Face to support further research on large language models and autonomous agents.
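The summary does not say how LatentMoE routes tokens, so the sketch below is only a reference point. It shows a generic top-k Mixture-of-Experts layer, a standard technique and explicitly not Nemotron's LatentMoE. A router scores every expert, but only the k highest-scoring experts run for each token. This is why a model's total parameter count (550B here) can far exceed the compute spent per token. All names (`moe_forward`, `router`, the toy scaling experts) are hypothetical and chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector, router: List[Vector], experts: List[Expert], k: int = 2
) -> Tuple[Vector, List[int]]:
    """Generic top-k MoE layer for one token (hypothetical, not LatentMoE)."""
    # Router score per expert: dot product of that expert's router row with the token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    # Keep the k highest-scoring experts (stable sort breaks ties by lower index).
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # unselected experts are never called
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


# Usage: four toy experts that scale their input; record which ones actually run.
calls: List[int] = []


def make_expert(idx: int, scale: float) -> Expert:
    def expert(v: Vector) -> Vector:
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
out, top = moe_forward([1.0, 2.0], router, experts, k=2)
print(top)    # [2, 1]
print(calls)  # [2, 1]: only 2 of the 4 experts did any work
```

Selecting experts per token keeps per-token compute roughly proportional to k rather than to the total number of experts. The summary attributes part of Nemotron 3 Ultra's throughput to a separate mechanism, the hybrid Mamba-Attention layers, which this sketch does not model.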