🤖 AI Summary
Flash-KMeans is a Triton-based, GPU-accelerated implementation of batched K-Means clustering, released as the official clustering module for Sparse VideoGen2. The repo is pip-installable and exposes a simple API (batch_kmeans_Euclid) that accepts batched tensors on CUDA and returns cluster IDs and cluster centers; it supports FP16 and exposes convergence and logging controls such as a tolerance and a verbosity flag. The implementation focuses on Euclidean K-Means for many simultaneous clustering problems (large batch sizes, high dimensionality) and is designed for integration into large-scale video and embedding pipelines.
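To make the computation concrete, here is a minimal NumPy sketch of batched Euclidean K-Means (Lloyd's algorithm) — the operation the Triton kernels accelerate. This is an illustrative baseline, not the library's implementation: the function name `batch_kmeans_euclid_np` and its signature are hypothetical, while the real `batch_kmeans_Euclid` operates on CUDA tensors with fused Triton kernels.

```python
import numpy as np

def batch_kmeans_euclid_np(x, k, iters=20, tol=1e-4):
    """Batched Lloyd's algorithm (illustrative baseline, not the library's API).

    x: array of shape (B, N, D) -- B independent clustering problems,
       each with N points of dimension D.
    Returns (labels, centers) with shapes (B, N) and (B, k, D).
    """
    B, N, D = x.shape
    rng = np.random.default_rng(0)
    # Initialize centers by sampling k points per batch (same indices for all batches).
    centers = x[:, rng.choice(N, size=k, replace=False), :].copy()  # (B, k, D)
    for _ in range(iters):
        # Pairwise squared Euclidean distances: (B, N, k)
        d = ((x[:, :, None, :] - centers[:, None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=-1)  # (B, N) cluster IDs
        new_centers = centers.copy()
        for b in range(B):
            for c in range(k):
                members = x[b, labels[b] == c]
                if len(members):  # keep old center if a cluster empties out
                    new_centers[b, c] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # stop once centers stop moving
            break
    return labels, centers

labels, centers = batch_kmeans_euclid_np(
    np.random.default_rng(1).normal(size=(2, 50, 4)), k=3)
```

The batched distance tensor of shape (B, N, k) is exactly where a naive PyTorch version spends its memory bandwidth; fusing the distance computation and argmin into one kernel is what makes the Triton version fast.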
Its significance lies in dramatic runtime improvements for large clustering workloads that frequently appear in AI/ML — e.g., preprocessing embeddings, quantizing feature maps, or forming semantic tokens for sparse attention in generative models. On an NVIDIA H100 (FP16, batch 32, 16k points, 128-D, 1k clusters) the Triton kernels deliver up to 16× speed-up versus a standard PyTorch baseline. That combination of batched operation, FP16 support, and Triton-optimized kernels makes Flash-KMeans especially useful when K-Means is a bottleneck in training or inference pipelines. If you use it, cite the Sparse VideoGen2 paper linked in the repository.