🤖 AI Summary
Prime Intellect today open-sourced INTELLECT-3, a 106B-parameter Mixture-of-Experts model (built on GLM‑4.5 Air) that was post-trained with supervised fine-tuning and large-scale reinforcement learning. The team says INTELLECT-3 achieves state-of-the-art results for its size across math, code, science, and reasoning benchmarks, and even outperforms many larger frontier models. Alongside the model weights, they released their full training stack, datasets (including the SYNTHETIC‑2 reasoning traces), evaluation suites, and environments so others can reproduce, ablate, and extend the work.
Technically, INTELLECT-3 was trained on 512 NVIDIA H200 GPUs across 64 nodes using prime‑rl, an async‑only production RL trainer designed for off‑policy, long‑horizon agentic rollouts. The pipeline integrates verifiers (an open toolkit for fast, modular RL environments), an Environments Hub that version-controls environment modules, and Prime Sandboxes — a low‑latency Rust-to-pod execution layer built for thousands of concurrent untrusted-code rollouts. Training ran for roughly two months with a mix of Math, Code, Science, Logic, Deep Research and Software Engineering tasks. The significance: this demonstrates that async RL can scale to 100B+ MoE models, lowers the infrastructure barrier to frontier RL research, and promotes reproducibility and ecosystem growth by making full stacks and environments public and offering hosted tooling and grants to accelerate adoption.
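The core idea behind an async, off-policy RL trainer is that rollout workers keep generating trajectories while the trainer updates the policy concurrently, so long-horizon agentic rollouts never block training. A minimal sketch of that pattern using Python's asyncio (all names here are illustrative, not prime-rl's actual API):

```python
import asyncio
import random

# Hypothetical sketch of an async, off-policy RL loop: workers generate
# trajectories tagged with the policy version they were sampled under,
# while the trainer consumes them and updates the policy concurrently.
# Samples may arrive from stale policy versions -- that is the off-policy part.

async def rollout_worker(worker_id, queue, policy_version):
    """Generate trajectories, recording the policy version used for each."""
    for _ in range(4):
        await asyncio.sleep(0)  # stand-in for model inference / env stepping
        await queue.put({
            "worker": worker_id,
            "version": policy_version["v"],  # version at generation time
            "reward": random.random(),       # stand-in for a verifier score
        })

async def trainer(queue, policy_version, total):
    """Consume trajectories as they arrive, tolerating stale (off-policy) ones."""
    consumed = []
    while len(consumed) < total:
        traj = await queue.get()
        staleness = policy_version["v"] - traj["version"]
        consumed.append((traj, staleness))
        if len(consumed) % 4 == 0:
            policy_version["v"] += 1  # simulated policy update step
    return consumed

async def main(n_workers=4):
    queue = asyncio.Queue()
    policy_version = {"v": 0}  # shared, mutable policy version counter
    workers = [rollout_worker(i, queue, policy_version)
               for i in range(n_workers)]
    results = await asyncio.gather(
        trainer(queue, policy_version, n_workers * 4), *workers)
    return results[0]

trajectories = asyncio.run(main())
```

In a real system the queue would be a distributed buffer, the sleep a GPU inference call, and the staleness bound enforced by the off-policy correction in the RL objective; the concurrency structure, though, is the same.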