🤖 AI Summary
South Korean startup FuriosaAI unveiled the RNGD Server, an enterprise AI appliance built around its in-house RNGD inference chips that delivers 4 petaFLOPS of FP8 compute and 384 GB of HBM3 memory while drawing just 3 kW. Furiosa positions the system as matching the performance of Nvidia H100-based servers at a fraction of the power: a standard 15 kW rack can hold five RNGD Servers versus a single DGX H100, and because typical data-center racks are often provisioned for 8 kW or less, the RNGD's lower draw reduces the need for costly power and cooling upgrades. The system is sampling now with global customers and is expected to be orderable in early 2026.
Beyond raw specs, Furiosa emphasizes practical adoption: the RNGD Server supports OpenAI's API and will receive continuous SDK improvements — including inter-chip tensor parallelism for scaling across chips, compiler optimizations, and expanded quantization formats — which ease porting and enable efficient LLM inference. LG AI Research is already using RNGD for its EXAONE models and reports more than 2× inference performance per watt versus GPUs. For the AI/ML community this signals an increasingly viable alternative to GPU-centric inference: lower operational costs and higher rack density could accelerate on-prem and hybrid deployments of large models, while software-stack maturity will determine how smoothly models migrate from established accelerator ecosystems.
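Because the server speaks OpenAI's API, existing client code can in principle target it by swapping the base URL and model name. A minimal sketch of what such a request payload looks like, assuming a hypothetical local endpoint and model identifier (neither the hostname, port, nor model name below is a published Furiosa value):

```python
import json

# Hypothetical OpenAI-compatible endpoint on an RNGD Server.
# Host, port, and model name are illustrative placeholders.
RNGD_ENDPOINT = "http://rngd-server.local:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload.

    Since the RNGD Server exposes OpenAI's API, a client only needs to
    change the base URL and model name; the request body is unchanged.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Serialize the body that would be sent via HTTP POST to RNGD_ENDPOINT.
payload = build_chat_request("exaone-demo", "Summarize FP8 quantization in one sentence.")
body = json.dumps(payload)
print(body)
```

In practice the payload would be POSTed to the endpoint with any OpenAI-compatible client; the point of the API compatibility claim is that this request shape, not vendor-specific code, is the porting surface.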