🤖 AI Summary
Tile Language (tile-lang) is an open-source, Pythonic DSL and compiler layer built on TVM for rapidly writing high-performance GPU/CPU/accelerator kernels (GEMM, Dequant GEMM, FlashAttention, LinearAttention, Flash MLA decoding, sparse attention, etc.). It combines concise, high-level primitives (T.Kernel, T.copy, T.gemm, T.Pipelined, layout annotations and swizzles) with low-level control, so developers can prototype a kernel in a few dozen lines while still reaching hand-tuned performance. Recent updates include AscendC/AscendNPU IR backends for Huawei Ascend chips (Sept 2025 preview), 2:4 sparse tensor-core support (T.gemm_sp), an NVRTC backend to speed compilation, WebGPU codegen, and a high-performance FlashMLA implementation for AMD MI300X that matches hand-written assembly. Nightly builds and a v0.1.0 release are available.
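To make the primitives above concrete, here is a sketch of a tiled fp16 GEMM in tile-lang's style, following the pattern shown in the project's examples (exact signatures such as `T.Buffer`, `T.alloc_shared`, and `T.Pipelined` stage counts may differ across versions; treat this as illustrative, not canonical):

```python
import tilelang
import tilelang.language as T

def matmul(M, N, K, block_M, block_N, block_K,
           dtype="float16", accum_dtype="float"):
    @T.prim_func
    def main(
        A: T.Buffer((M, K), dtype),
        B: T.Buffer((K, N), dtype),
        C: T.Buffer((M, N), dtype),
    ):
        # One thread block per (block_M, block_N) output tile.
        with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M),
                      threads=128) as (bx, by):
            A_shared = T.alloc_shared((block_M, block_K), dtype)
            B_shared = T.alloc_shared((block_K, block_N), dtype)
            # Fragment-level accumulator in higher precision.
            C_local = T.alloc_fragment((block_M, block_N), accum_dtype)
            T.clear(C_local)
            # Software-pipelined copy/compute over the K dimension.
            for ko in T.Pipelined(T.ceildiv(K, block_K), num_stages=3):
                T.copy(A[by * block_M, ko * block_K], A_shared)
                T.copy(B[ko * block_K, bx * block_N], B_shared)
                T.gemm(A_shared, B_shared, C_local)
            T.copy(C_local, C[by * block_M, bx * block_N])
    return main
```

Compiling and running this requires a supported GPU and the tilelang package; the point here is how few lines separate the tile-level description from a pipelined, shared-memory GEMM.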
Technically, tile-lang generates code for CUDA/HIP/CPU backends (and now Ascend), leverages target-specific features (Auto TMA/WGMMA on H100, Auto MatrixCore on MI250, Async Copy on MI300X), and can dispatch to cute/hip for tile-level GEMMs. Examples show end-to-end JIT compilation, kernel profiling, correctness checks against PyTorch, and advanced optimizations like pipelined tile copies, L2-friendly swizzling, and fragment-level accumulators. For the AI/ML community, tile-lang lowers the barrier to experimenting with custom kernels and operator fusion while preserving device-specific tuning, accelerating both research and production paths for efficient transformer primitives, sparse kernels, and quantized workflows.
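The correctness checks mentioned above are tolerance-based comparisons against a framework reference, since fp16 tensor-core accumulation never matches a float32 reference bit-for-bit. A minimal harness in that spirit, using NumPy as a stand-in for the framework reference (the tolerances and helper name here are illustrative, not tile-lang's API):

```python
import numpy as np

def check_kernel_output(kernel_out: np.ndarray, a: np.ndarray, b: np.ndarray,
                        rtol: float = 1e-2, atol: float = 1e-2) -> float:
    """Compare a kernel's GEMM output against a float32 reference.

    Exact equality is the wrong test for fp16 kernels; tolerance-based
    comparison accounts for accumulation-order and precision differences.
    """
    ref = a.astype(np.float32) @ b.astype(np.float32)
    np.testing.assert_allclose(kernel_out.astype(np.float32), ref,
                               rtol=rtol, atol=atol)
    # Report max absolute error as a quick numeric-quality metric.
    return float(np.max(np.abs(kernel_out.astype(np.float32) - ref)))

rng = np.random.default_rng(0)
M, N, K = 64, 64, 128
a = rng.standard_normal((M, K), dtype=np.float32).astype(np.float16)
b = rng.standard_normal((K, N), dtype=np.float32).astype(np.float16)
# Stand-in for a compiled kernel: fp16 inputs, fp32 accumulate, fp16 store.
kernel_out = (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float16)
max_err = check_kernel_output(kernel_out, a, b)
```

Swapping `kernel_out` for the output of a real compiled kernel (and NumPy for `torch.matmul`) gives the PyTorch-style check used in the examples.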