🤖 AI Summary
NVIDIA has released Qwen3-Nemotron-32B-RLBFF, a 32-billion-parameter Transformer built on Qwen/Qwen3-32B and fine-tuned with the new RLBFF (Binary Flexible Feedback) training regime on the HelpSteer3 dataset. The model is available on Hugging Face as a research release accompanying the RLBFF arXiv paper. It is optimized to improve "default thinking" conversational responses and supports long-context inputs: the architecture allows up to 128k tokens, though training used conversations of up to 4k tokens. NVIDIA runtime-tested it on A100/H100 hardware with NeMo-RL and standard Transformers tooling; the recommended setup is one or more 80 GB GPUs running in bfloat16, with vLLM for faster inference. Use is governed by the NVIDIA Open Model License.
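The release notes above point to standard Transformers tooling with bfloat16 on 80 GB GPUs. A minimal sketch of what that usage could look like is below; the model id `nvidia/Qwen3-Nemotron-32B-RLBFF`, the prompt, and the generation settings are illustrative assumptions, not taken from the official model card.

```python
# Hypothetical sketch: querying Qwen3-Nemotron-32B-RLBFF with Hugging Face
# Transformers. Assumes the model id below and a machine with 80 GB-class GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/Qwen3-Nemotron-32B-RLBFF"  # assumed Hugging Face id


def build_messages(prompt: str) -> list[dict]:
    # Chat-style input; the model is tuned for conversational responses.
    return [{"role": "user", "content": prompt}]


def main() -> None:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 across available GPUs, per the release's hardware guidance.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages("Summarize RLBFF in one paragraph."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

For higher-throughput serving, the same model id could be passed to vLLM instead of calling `generate` directly, which matches the release's recommendation of vLLM for faster inference.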
Why it matters: Qwen3-Nemotron-32B-RLBFF substantially improves on its Qwen3-32B base (Arena Hard v2: 44.0% → 55.6%; WildBench: 67.6 → 70.33; MT-Bench: 9.38 → 9.50) while keeping inference cost low. NVIDIA reports performance comparable to strong competing models (DeepSeek R1, o3-mini) at a much lower inference cost, making it attractive for labs and enterprises running GPU-accelerated local infrastructure. For practitioners, the release provides a strong, efficient, locally deployable model for research and product proof-of-concept work, with full model-card details, safety guidance, and reproducible code examples included.