Nvmath-Python: Nvidia Math Libraries for the Python Ecosystem (github.com)

🤖 AI Summary
Nvmath-python is a new Python package that exposes NVIDIA's high-performance math libraries (cuBLASLt, cuFFTDx, etc.) directly to the Python ecosystem through Pythonic APIs that accept PyTorch, CuPy, and NumPy tensors. It surfaces low-level parameters that other wrappers omit, lets you plan and reuse stateful primitives (e.g., Matmul objects with selectable mixed-precision compute types such as COMPUTE_32F_FAST_16F), and supports fused epilog/prolog operations (bias, rescale, etc.) that are applied without separate kernel launches. That combination of fine-grained control and framework interoperability can unlock faster, more predictable GEMM/FFT performance and easier operator fusion for ML workloads.

On the technical side, nvmath-python also exposes NVIDIA's device-side (Dx) functions, so you can call library routines from inside custom device kernels (the examples call cuFFTDx from a Numba @cuda.jit kernel). Epilogs and prologs can be written in Python and compiled to LTO-IR, so the post-processing is inlined with low overhead (the example shows unitary FFT scaling). The API returns algorithm plans you can tune and exposes block, shared-memory, and thread-layout details for device functions, and the FFT example demonstrates high numerical fidelity (low L2 error).

The project is Apache-2.0 licensed and currently in Beta. It is aimed at power users and library authors who need low-level GPU control but still want Python ergonomics.
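The fused-epilog idea can be made concrete with a CPU-only NumPy sketch of the math a bias + ReLU epilog performs on a GEMM result. This is not nvmath-python's API: `matmul_bias_relu_reference` and its `fast_16f` flag are hypothetical names for illustration, and rounding float32 inputs to float16 before a float32 matmul is only an assumed, rough stand-in for a mixed-precision compute type like COMPUTE_32F_FAST_16F. On the GPU, the point of the fused epilog is that the bias and activation are applied inside the GEMM kernel rather than as separate passes.

```python
import numpy as np


def matmul_bias_relu_reference(a, b, bias, fast_16f=False):
    """CPU reference for what a fused "bias + ReLU" GEMM epilog computes.

    Hypothetical helper for illustration only; this is NOT an nvmath-python API.
    a: (m, k), b: (k, n), bias: (m, 1), broadcast across the n output columns.
    """
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    if fast_16f:
        # Assumed, rough stand-in for a COMPUTE_32F_FAST_16F-style mode:
        # round the float32 inputs to float16, then accumulate in float32.
        a = a.astype(np.float16).astype(np.float32)
        b = b.astype(np.float16).astype(np.float32)
    # A fused epilog applies these steps to the GEMM result before it is
    # written out; here they are simply separate NumPy operations.
    out = a @ b
    out += np.asarray(bias, dtype=np.float32)  # add bias to every column
    return np.maximum(out, 0.0)                # elementwise ReLU


rng = np.random.default_rng(0)
m, n, k = 64, 32, 128
a = rng.standard_normal((m, k), dtype=np.float32)
b = rng.standard_normal((k, n), dtype=np.float32)
bias = rng.standard_normal((m, 1), dtype=np.float32)

exact = matmul_bias_relu_reference(a, b, bias)
fast = matmul_bias_relu_reference(a, b, bias, fast_16f=True)
rel_err = np.linalg.norm(fast - exact) / np.linalg.norm(exact)
print(f"relative L2 difference from fp16 input rounding: {rel_err:.2e}")
```

The printed difference gives a feel for the accuracy traded away by 16-bit inputs; the exact figure depends on matrix sizes and data.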
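Unitary FFT scaling, the epilog example mentioned above, amounts to multiplying every output of an unnormalized forward FFT by 1/sqrt(N). The NumPy sketch below shows that arithmetic on the CPU and checks it with an L2 error against NumPy's `norm="ortho"` convention. The per-element function `unitary_scale_epilog` is a hypothetical stand-in, not the callback signature that nvmath-python compiles to LTO-IR.

```python
import numpy as np


def unitary_scale_epilog(value, n):
    """Per-element post-processing an FFT epilog could apply (illustrative only).

    Hypothetical helper, not nvmath-python's callback signature: multiply each
    output element by 1/sqrt(n) so the forward transform becomes unitary.
    """
    return value / np.sqrt(n)


rng = np.random.default_rng(0)
n = 1024
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# The forward transform is unnormalized (as in cuFFT and NumPy's default)...
raw = np.fft.fft(x)
# ...so the epilog rescales every output element on the way out.
y = unitary_scale_epilog(raw, n)

# Unitary scaling matches norm="ortho" and preserves the L2 norm (Parseval).
ortho = np.fft.fft(x, norm="ortho")
rel_l2_error = np.linalg.norm(y - ortho) / np.linalg.norm(ortho)
norm_ratio = np.linalg.norm(y) / np.linalg.norm(x)
print(f"relative L2 error vs norm='ortho': {rel_l2_error:.2e}")
print(f"||y|| / ||x|| = {norm_ratio:.6f}")
```

Because a unitary transform preserves the L2 norm, the second printed ratio should be 1 up to rounding.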