🤖 AI Summary
ZLUDA 5 is out: the open-source compatibility layer that runs unmodified CUDA applications on non‑NVIDIA GPUs adds new developer tooling, correctness improvements, and early ML support. Key additions: zluda_trace, a tracing tool that lets users collect logs on Linux and attach them to bug reports; zoc, a ptxas-like offline compiler CLI that turns PTX into LLVM IR (before and after linking) and into RDNA assembly via ROCm; kernel caching, so PTX is no longer recompiled to machine code on every run; and zluda_ld, which uses LD_AUDIT to work around DT_RPATH issues so that binaries like PyTorch can be coerced into loading ZLUDA. The project now publishes prerelease binaries automatically and runs CI/unit tests plus a nightly PTX sweep to catch regressions.
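LD_AUDIT itself is a standard glibc (rtld-audit) mechanism: an audit library can intercept every shared-library search, including candidate paths derived from DT_RPATH that plain LD_LIBRARY_PATH cannot override, and return a substitute path. Below is a minimal sketch of that general mechanism in C; it is not ZLUDA's actual zluda_ld implementation, and the `/opt/zluda` path is an assumed install location.

```c
/* build: gcc -shared -fPIC -o libcuda_redirect.so cuda_redirect.c */
#define _GNU_SOURCE
#include <link.h>    /* rtld-audit interface: LAV_CURRENT, la_* hooks */
#include <string.h>

/* The dynamic linker calls this first to negotiate the audit ABI version. */
unsigned int la_version(unsigned int version) {
    (void)version;
    return LAV_CURRENT;
}

/* Called for every library search, including lookups that resolve through
 * DT_RPATH. Returning a different string redirects the load; here any
 * libcuda.so candidate is steered to a hypothetical ZLUDA copy, and
 * everything else is left untouched. */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag) {
    (void)cookie; (void)flag;
    if (strstr(name, "libcuda.so") != NULL)
        return (char *)"/opt/zluda/libcuda.so.1";  /* assumed install path */
    return (char *)name;
}
```

Running a target under such an audit library is then a matter of `LD_AUDIT=./libcuda_redirect.so python app.py`; because la_objsearch sees each candidate path before the lookup completes, it can rewrite even DT_RPATH-pinned library references.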
The release prioritizes correctness over speed and hits two ML milestones: llm.c's GPT-2 tests now run (without multi‑GPU or FlashAttention, the latter blocked on missing MIOpen APIs), and llama.cpp gains preliminary CUDA-backend support with performance comparable to recent ROCm measurements. Initial support for cuBLAS, cuBLASLt, and NVML is in place and structured for rapid expansion, but full PyTorch support remains blocked by compiler slowness, missing performance libraries (cuBLAS/cuDNN), and gaps in LLVM's AMDGPU backend. Compiler and host-API fixes have made ZLUDA largely bit‑accurate against NVIDIA GPUs across most floating-point modes, though some instructions and performance libraries are still pending. The team asks users to try the traces and prereleases and to file issues to help harden the stack.
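Since the cuBLAS surface is still early, one low-effort way to exercise a prerelease is a tiny SGEMM smoke test. The program below uses only the public CUDA runtime and cuBLAS APIs and is a generic sketch, not part of the ZLUDA release; whether it runs under ZLUDA depends on which entry points the prerelease actually implements.

```c
/* build: gcc gemm_smoke.c -lcudart -lcublas -o gemm_smoke */
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void) {
    const int n = 4;
    float hA[16], hB[16], hC[16];
    for (int i = 0; i < 16; i++) { hA[i] = 1.0f; hB[i] = 2.0f; hC[i] = 0.0f; }

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, sizeof(hA));
    cudaMalloc((void **)&dB, sizeof(hB));
    cudaMalloc((void **)&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }

    /* C = alpha*A*B + beta*C; with A all ones and B all twos,
     * every element of C should come back as 2*n = 8. */
    const float alpha = 1.0f, beta = 0.0f;
    cublasStatus_t st = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                    n, n, n, &alpha, dA, n, dB, n,
                                    &beta, dC, n);
    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("status=%d C[0]=%f (expected 8.0)\n", (int)st, hC[0]);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return st == CUBLAS_STATUS_SUCCESS ? 0 : 1;
}
```

Pointed at ZLUDA's libraries instead of NVIDIA's (for example via zluda_ld or LD_LIBRARY_PATH on Linux), a result of 8.0 in every element exercises the cuBLAS path end to end, and a failure is exactly the kind of thing the team is asking users to trace and report.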