🤖 AI Summary
XiangShan published V2R2 of its Vector Floating-Point Unit (VFPU) design (2025-01-20), a full-featured vector FP functional unit targeting the RISC-V Vector (RVV) 1.0 ISA. The VFPU comprises four submodules—VFAlu (add, compare, reduction), VFMA (multiply/FMA), VFDivSqrt (divide/square root), and VFCvt (format conversion/reciprocal estimate)—and supports fp16/fp32/fp64 arithmetic, including FMA, division, square root, and reciprocal-estimate operations. The design emphasizes vector throughput (1×f64, 2×f32, or 4×f16 results per 64-bit lane), mixed-precision/widening instructions (e.g., f64 = f64 + f32, 2×f32 = f32 + f16), and claims timing closure at up to 3 GHz.
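The lane-packing claim above can be illustrated in software: a 64-bit lane holds exactly one f64, two f32, or four f16 elements. The snippet below is a toy sketch (not the XiangShan RTL) that packs four fp16 values into one 8-byte lane using Python's `struct` half-precision format code.

```python
import struct

LANE_BITS = 64

# Elements per 64-bit lane for each supported format.
for fmt, bits in (("f16", 16), ("f32", 32), ("f64", 64)):
    print(fmt, "->", LANE_BITS // bits, "elements/lane")

# Pack four fp16 values (all exactly representable in half precision)
# into a single 64-bit lane, then unpack to verify round-tripping.
vals = [1.0, -2.5, 0.5, 3.0]
lane = struct.pack("<4e", *vals)      # 8 bytes = one 64-bit lane
back = list(struct.unpack("<4e", lane))
```

This is only a data-layout illustration; the report's point is that the narrow formats keep the full lane bandwidth busy rather than leaving sub-lane slots idle.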
Technically, the report tackles two core challenges for ML-centric vector FPUs: maintaining 100% bandwidth utilization for narrow formats and efficiently supporting mixed‑precision conversions (including denormals). Key innovations include reuse of high‑precision hardware via multiplexing, a fast data format conversion path to reduce timing pressure, and an improved dual‑path floating‑point addition algorithm that minimizes critical‑path signed additions by parallelizing swap/align/LZD/rounding steps. These choices — plus vectorized FMA and accumulation algorithms — make the VFPU well suited for AI/ML workloads that rely on mixed precision and high throughput (inference/training), while remaining compatible with RISC‑V vector semantics for software portability and compiler optimization.
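The dual-path addition idea mentioned above is a standard FP-adder technique: an effective subtraction with a small exponent difference can suffer massive cancellation and needs a leading-zero detector (LZD), while all other cases need only a large alignment shift and at most a one-bit normalization. A minimal sketch of that path selection (function name and interface are hypothetical, not from the report):

```python
def classify_path(sign_a: int, exp_a: int,
                  sign_b: int, exp_b: int,
                  sub: bool = False) -> str:
    """Route an FP add/subtract to the 'near' or 'far' path.

    near path: effective subtraction with |exp_a - exp_b| <= 1;
               massive cancellation is possible, so an LZD sits
               on the critical path but no large align shift.
    far path:  everything else; a large alignment shift is needed
               but normalization moves the result by at most 1 bit.
    """
    effective_sub = (sign_a ^ sign_b) ^ int(sub)
    if effective_sub and abs(exp_a - exp_b) <= 1:
        return "near"
    return "far"
```

Because each path omits the hardware the other path needs, the two run in parallel and a late mux picks the result, which is how swap/align/LZD/rounding steps come off the single critical path.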