Show HN: XSched, a scheduling framework for multitasking over diverse XPUs (github.com)

🤖 AI Summary
XSched is an open-source, preemptive scheduling framework for heterogeneous XPUs (GPUs, NPUs, ASICs, FPGAs), released and presented at OSDI 2025. Recent updates add Windows and macOS support, validation on CUDA, LevelZero, and OpenCL backends, and integrations with llama.cpp and NVIDIA Triton to enable priority-based, multi-request scheduling for inference.

The project offers transparent deployment (no application code changes) via a shim that intercepts driver calls, or optional explicit XQueue APIs; it can run as a system daemon. It has been demonstrated eliminating video stutter on AI PCs (Intel NPU) and coordinating workloads such as a fake-background webcam alongside Whisper speech-to-text using a least-laxity-first variant.

Technically, XSched exposes a preemptible command-queue abstraction (XQueue) and a multi-level hardware model that maps scheduling actions onto diverse, vendor-specific runtimes. Its components — XShim (API interception), XPreempt (queue agent and preemption operations), XAL/HAL (hardware adapter), and XScheduler (central policy engine) — communicate via IPC: agents report queue state, and the scheduler issues suspend/resume operations. The modular policy architecture supports priority, latency-aware, and custom policies with minimal runtime overhead.

Implications: better SLOs and resource sharing across mixed accelerators, lower tail latency for multi-tenant inference, and a practical path to bring fine-grained preemption to emerging XPU hardware.
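To make the least-laxity-first idea concrete, here is a minimal, self-contained sketch of the selection rule in Python. This is not XSched's actual API or implementation — the `Task` fields and function names are hypothetical — it only illustrates how laxity (time to deadline minus remaining work) decides which queue runs next.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float   # absolute deadline, in seconds
    remaining: float  # estimated remaining execution time, in seconds

def laxity(task: Task, now: float) -> float:
    # Laxity = slack before the deadline after accounting for remaining work.
    # Smaller (or negative) laxity means the task is more urgent.
    return (task.deadline - now) - task.remaining

def pick_next(tasks: list[Task], now: float) -> Task:
    # Least-laxity-first: dispatch the task with the smallest laxity.
    return min(tasks, key=lambda t: laxity(t, now))

if __name__ == "__main__":
    now = 0.0
    tasks = [
        Task("whisper-stt", deadline=0.5, remaining=0.2),       # laxity 0.3
        Task("fake-background", deadline=0.3, remaining=0.25),  # laxity 0.05
    ]
    print(pick_next(tasks, now).name)  # fake-background
```

In a preemptive setting like the one the summary describes, this rule would be re-evaluated as queue state changes, with the scheduler suspending the running queue whenever another one's laxity drops below it.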