Why Is Japan Still Investing in Custom Floating Point Accelerators? (www.nextplatform.com)

0 points 9 days ago | visit original

🤖 AI Summary

Japan’s Pezy Computing continues to develop custom floating-point accelerators for high-performance computing (HPC) and AI, competing in a GPU-dominated market on energy efficiency and architectural design. After more than a decade and a half of development, Pezy unveiled its latest chip, the SC4s, at Hot Chips 2025. It is built on TSMC’s 5nm process and has 2,048 RISC-V based processor elements (PEs) running at 1.5 GHz, backed by 96 GB of HBM3 memory with 3.2 TB/s of bandwidth. The design combines a multi-level cache hierarchy with fine-grained multithreading under a Single Program, Multiple Data (SPMD) model, packs in 4.8 billion gates, and targets balanced compute-to-memory throughput at a power draw of around 600 watts.

For the AI/ML community, the significance is that Pezy aims to rival leading GPUs not only in raw floating-point performance but also in energy efficiency and architectural flexibility. For instance, Pezy’s accelerators reportedly outperform Nvidia’s H100 GPUs by more than 2x on genomics workloads, which points to their potential in specialized AI and scientific applications where FP64 precision and throughput matter. Pezy’s self-hosted RISC-V cores also remove the dependence on external CPUs for host operations, so accelerator nodes can run without a separate host processor. The roadmap hints at more powerful multi-chiplet designs with FP8 support, in line with the lower-precision formats now common in AI.

Pezy’s approach challenges GPU dominance by prioritizing minimal power overhead, fine-grained parallelism, and memory hierarchies designed specifically for HPC and AI. As Japan doubles down on homegrown accelerator technology alongside projects like FugakuNext, Pezy Computing illustrates why bespoke floating-point architectures still matter: they offer an alternative path to scaling AI workloads efficiently, beyond general-purpose GPU designs. The release of the SC4s next year and its integration into large-scale systems will be a key benchmark to watch for HPC and AI accelerators globally.
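
The "balanced compute-to-memory throughput" claim can be made a little more concrete with back-of-the-envelope arithmetic on the figures quoted in the summary. The sketch below is only that: it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and peak bandwidth shared evenly across all PEs, and the function names are illustrative rather than anything from Pezy. It does not estimate FLOPS, since the summary gives no per-PE operation rate.

```python
# Figures quoted in the summary; everything derived below is plain arithmetic.
NUM_PES = 2048                # processor elements
CLOCK_HZ = 1.5e9              # 1.5 GHz
HBM_BANDWIDTH_BPS = 3.2e12    # 3.2 TB/s (assumption: decimal terabytes)
HBM_CAPACITY_BYTES = 96e9     # 96 GB (assumption: decimal gigabytes)


def bytes_per_pe_per_cycle(bandwidth_bps, num_pes, clock_hz):
    """Average HBM bytes available to each PE per clock cycle at peak bandwidth."""
    return bandwidth_bps / (num_pes * clock_hz)


def full_memory_sweep_seconds(capacity_bytes, bandwidth_bps):
    """Time to stream the whole HBM capacity once at peak bandwidth."""
    return capacity_bytes / bandwidth_bps


if __name__ == "__main__":
    per_pe = bytes_per_pe_per_cycle(HBM_BANDWIDTH_BPS, NUM_PES, CLOCK_HZ)
    sweep = full_memory_sweep_seconds(HBM_CAPACITY_BYTES, HBM_BANDWIDTH_BPS)
    print(f"{per_pe:.2f} bytes per PE per cycle")
    print(f"{sweep * 1e3:.0f} ms to read all of HBM once")
```

Under these assumptions each PE gets roughly 1 byte of HBM traffic per cycle, or one 8-byte FP64 operand about every 8 cycles, which suggests why the cache hierarchy and multithreading matter for keeping the PEs busy.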
