🤖 AI Summary
FuriosaAI unveiled its second‑generation RNGD (Renegade) AI accelerator, built on a Tensor Contraction Processor (TCP) architecture and now sampling on TSMC's 5 nm process, which aims to undercut GPU incumbents on inference density and energy efficiency. Each PCIe RNGD card packs 48 GB of HBM3, ~1.5 TB/s of memory bandwidth, and 512 TFLOPS of FP8 compute at a 180 W TDP; the 8‑card NXT RNGD Server aggregates 384 GB of HBM3, 12 TB/s, and 4 PFLOPS FP8 within a 3 kW rack envelope. Furiosa claims a roughly 3x per‑watt performance advantage over Nvidia's H100 on large language model inference, and reports that customers (LG AI Research, Kakao, even an OpenAI demo in Seoul) see ~3.5x more tokens per rack thanks to lower power draw and higher rack density.
Technically, the TCP raises the primitive from matrix multiply to tensor contraction to minimize costly DRAM↔compute transfers, using a circuit‑switching fetch network and aggressive on‑chip data reuse. That hardware was co‑designed with a compiler and software stack: a JIT PyTorch toolchain, OpenAI‑compatible serving API, vLLM drop‑in support and a low‑level API for ultra‑low latency. The significance for AI/ML is practical: if validated at scale, TCP silicon could shift inference economics by reducing energy, improving utilization, and enabling denser racks—challenging GPU dominance—though broad adoption still depends on production design wins, ecosystem maturity and real‑world benchmark replication.
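To make the architectural claim concrete, here is a minimal NumPy sketch of why tensor contraction subsumes matrix multiply as a primitive: a plain matmul is the 2‑D special case, while a single batched contraction expresses the whole loop nest at once, which is what gives hardware room to schedule operand reuse instead of staging each matmul through DRAM. The shapes and `einsum` strings here are purely illustrative and are not Furiosa's actual interface.

```python
import numpy as np

# Illustrative only (not Furiosa's API): tensor contraction generalizes
# matrix multiply by summing over one or more shared indices.
A = np.random.rand(4, 8, 16)   # e.g. [batch, rows, inner]
B = np.random.rand(4, 16, 32)  # e.g. [batch, inner, cols]

# Matrix multiply is the 2-D special case: C[i,j] = sum_k A[i,k] * B[k,j]
C2d = np.einsum("ik,kj->ij", A[0], B[0])
assert np.allclose(C2d, A[0] @ B[0])

# A batched contraction folds the whole loop nest into one primitive:
# C[b,i,j] = sum_k A[b,i,k] * B[b,k,j]
C = np.einsum("bik,bkj->bij", A, B)
assert C.shape == (4, 8, 32)
```

Exposing the contraction as the unit of work is what lets a compiler reason globally about which operands stay on chip, rather than lowering everything to a sequence of independent matmuls.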