NextSilicon Details Runtime-Reconfigurable Architecture (www.eetimes.com)

🤖 AI Summary
HPC startup NextSilicon unveiled technical details of its runtime-reconfigurable dataflow architecture and benchmark results for its Maverick2 accelerator, claiming it can outperform CPUs and GPUs on unmodified code. Maverick2 is a fabric of quickly reconfigurable ALU compute blocks (what the company calls “software-defined” compute) with on-chip telemetry, reservation stations, dispatch units, memory entry points, an MMU/TLB, and a compiler that maps existing C/C++/Fortran/Python/CUDA/ROCm/OneAPI code to the fabric. A runtime algorithm continuously identifies hot code paths (the company cites Pareto-like 1%/99% behavior in parallel code, where a small fraction of the code accounts for nearly all execution time) and reconfigures ALUs in nanoseconds to accelerate those paths, duplicate hot blocks for parallelism, or move communicating blocks closer together — all without developer changes.

Maverick2 is built on TSMC 5nm and comes in single-die (up to 96 GB HBM3e, 400 W air-cooled) and dual-die (up to 192 GB HBM3e, 750 W liquid-cooled) variants. Reported results: STREAM at 5.2 TB/s, 32.6 GUPS at 460 W, HPCG at 600 GFLOPS at 600 W, and PageRank at 40 gigapages/s (10× better than GPUs on small graphs at half the power, and able to run graphs larger than 25 GB that GPUs could not).

NextSilicon also showed Arbel, a 10-wide RISC-V host core aimed at Lion Cove/Zen 5-class performance, and says Maverick2 is already deployed at dozens of customers (including Sandia’s Vanguard 2), with Maverick3 — adding reduced-precision AI support — due around 2027. If validated broadly, this approach could cut costly porting effort, boost energy efficiency, and change how HPC/ML workloads are accelerated by making dynamic, application-aware hardware reconfiguration practical.
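NextSilicon has not published its runtime algorithm, but the Pareto-style hot-path detection it describes can be illustrated with a minimal, hypothetical sketch: count how often each basic block executes, then pick the smallest set of blocks covering ~99% of executions — the candidates a runtime would map onto the reconfigurable fabric. The block names and the coverage threshold below are assumptions for illustration only.

```python
from collections import Counter

def find_hot_blocks(block_trace, coverage=0.99):
    """Return the smallest set of blocks covering `coverage` of executions.

    This is an illustrative stand-in for telemetry-driven hot-path
    detection, not NextSilicon's actual algorithm.
    """
    counts = Counter(block_trace)
    total = sum(counts.values())
    hot, covered = [], 0
    for block, n in counts.most_common():
        if covered / total >= coverage:
            break
        hot.append(block)
        covered += n
    return hot

# Simulated execution trace: one block dominates, mimicking the
# 1%/99% behavior the article cites for parallel code.
trace = ["inner_loop"] * 990 + ["setup", "teardown", "io_wait"] * 3 + ["misc"]
hot = find_hot_blocks(trace)
print(hot)  # the dominant block(s) a runtime would target for acceleration
```

In this toy model, duplicating a hot block for parallelism or moving communicating blocks closer together would be decisions made downstream of this detection step, driven by the same telemetry counters.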