Jamba Reasoning 3B (huggingface.co)

🤖 AI Summary
AI21 announced Jamba Reasoning 3B, a compact 3-billion-parameter reasoning model that blends Transformer attention with Mamba state-space layers to deliver strong reasoning-benchmark scores and highly efficient long-context processing. The hybrid architecture (28 layers: 26 Mamba, 2 attention; MQA with 20 query heads sharing a single KV head) cuts memory overhead and boosts throughput, enabling practical inference on laptops, GPUs and even mobile devices. Jamba supports a 256K-token context window, a 64K-entry vocabulary, and multilingual output (including English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic and Hebrew). AI21 recommends vLLM >=0.11.0 with --mamba-ssm-cache-dtype=float32 for best performance; model weights are released under Apache 2.0, alongside a GGUF model card.

For the AI/ML community, the significance is that hybrid SSM/attention designs can match or exceed larger models on multi-benchmark reasoning metrics while remaining resource-light and scaling to very long contexts without prohibitive attention caches. On combined intelligence metrics it tops peers such as Gemma 3 4B, Llama 3.2 3B and Granite 4.0 Micro (e.g., MMLU-Pro ~61%, IFBench ~52%).

Training combined large-scale pretraining, a ~0.5T-token mid-training stage focused on math and code with extended context windows, cold-start distillation, and online RL (RLVR) to improve instruction following and tool use; VeRL tooling improvements for hybrid-model training are slated for release soon.
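To see why only 2 attention layers with a single shared KV head keeps long-context memory small, here is a back-of-the-envelope sketch. The layer counts, head counts, and 256K context come from the summary above; the head dimension (128), bf16 cache dtype, and the dense-baseline configuration are assumptions for illustration, and the fixed-size Mamba SSM state is not counted.

```python
# Rough KV-cache sizing for the hybrid layout vs. a hypothetical dense-attention model.
# Assumptions (not from the announcement): head_dim=128, bf16 cache (2 bytes/value),
# and a 28-layer all-attention baseline with 20 KV heads. Mamba layers keep a
# constant-size state per layer regardless of context, so they are ignored here.

def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, context_len, bytes_per_val=2):
    """Bytes needed to cache K and V for every attention layer over the full context."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * context_len * bytes_per_val

CONTEXT = 256 * 1024   # 256K-token context window
HEAD_DIM = 128         # assumed head dimension

# Jamba Reasoning 3B: 2 attention layers, MQA with one shared KV head.
jamba = kv_cache_bytes(n_attn_layers=2, n_kv_heads=1, head_dim=HEAD_DIM, context_len=CONTEXT)

# Hypothetical 28-layer all-attention model with full multi-head KV (20 KV heads).
dense = kv_cache_bytes(n_attn_layers=28, n_kv_heads=20, head_dim=HEAD_DIM, context_len=CONTEXT)

print(f"Jamba-style attention cache:   {jamba / 2**20:.0f} MiB")  # ~256 MiB
print(f"Dense-attention cache:         {dense / 2**30:.0f} GiB")  # ~70 GiB
```

Under these assumptions the attention cache at full context drops from tens of gigabytes to a few hundred megabytes, which is what makes laptop- and mobile-scale long-context inference plausible.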
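The recommended vLLM setup can be exercised through vLLM's OpenAI-compatible server; a minimal sketch follows. The --mamba-ssm-cache-dtype=float32 flag is the one named in the summary, while the Hugging Face repo id, port, and prompt are assumptions for illustration.

```python
# Minimal sketch: query Jamba Reasoning 3B served by vLLM (>= 0.11.0) via its
# OpenAI-compatible endpoint. Start the server first, e.g.:
#   vllm serve ai21labs/AI21-Jamba-Reasoning-3B --mamba-ssm-cache-dtype float32
# The repo id and localhost:8000 port below are assumed, not taken from the announcement.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ai21labs/AI21-Jamba-Reasoning-3B",  # assumed Hugging Face repo id
    messages=[
        {"role": "user",
         "content": "Summarize the trade-offs of hybrid Mamba/attention models."}
    ],
    max_tokens=512,
)
print(response.choices[0].message.content)
```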