🤖 AI Summary
NVIDIA recently announced the release of Nemotron 3 Nano, a 30-billion-parameter Mixture-of-Experts (MoE) hybrid Mamba-Transformer language model built for agentic reasoning and chat. Trained on 25 trillion tokens, it improves on its predecessor, Nemotron 2 Nano, while activating fewer than half the parameters per forward pass, and it delivers up to 3.3x higher inference throughput than comparable models such as GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507. It also supports context lengths of up to 1 million tokens, targeting long-context understanding and generation.
The release matters to the AI/ML community because it demonstrates the efficiency and performance gains achievable from a granular MoE architecture paired with a hybrid Mamba-Transformer design. The MoE layers are highly sparse, activating only 6 of 128 experts per token at inference, which keeps compute and memory use low while maintaining accuracy. NVIDIA is publishing the model weights along with code and data, encouraging collaboration and further experimentation across the research community and positioning Nemotron 3 Nano as a strong option for researchers and developers working on complex reasoning tasks.
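To make the sparsity figure concrete, here is a minimal sketch of generic top-k expert routing of the kind used in granular MoE layers: a router scores all 128 experts for each token, only the 6 highest-scoring experts run, and their outputs are combined with renormalized weights. This is an illustrative sketch of top-k routing in general, not NVIDIA's implementation; the hidden sizes, the `TopKMoELayer` class, and the expert feed-forward structure are assumptions.

```python
# Illustrative top-k MoE routing sketch (assumed layer sizes; not NVIDIA's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, d_model=2048, n_experts=128, top_k=6, d_ff=1024):
        super().__init__()
        self.top_k = top_k
        # Router produces one score per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Granular MoE: many small feed-forward experts instead of a few large ones.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = self.router(x)                             # (tokens, n_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1) # keep only 6 experts per token
        weights = F.softmax(top_vals, dim=-1)               # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                          # naive per-token dispatch for clarity
            for slot in range(self.top_k):
                e = top_idx[t, slot].item()
                out[t] += weights[t, slot] * self.experts[e](x[t])
        return out

tokens = torch.randn(4, 2048)
layer = TopKMoELayer()
print(layer(tokens).shape)  # torch.Size([4, 2048]); only 6 of 128 experts ran per token
```

Because only 6 of 128 experts execute per token, most expert parameters sit idle on any given forward pass, which is what allows a 30B-parameter model to activate far fewer parameters per token and achieve the throughput gains quoted above.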