GPU Snapshots to reduce ML coldstarts (nilesh-agarwal.com)

🤖 AI Summary
AWS-style VM snapshotting is coming to GPUs: using CRIU (v4.1+ with the NVIDIA plugin) together with NVIDIA's cuda-checkpoint, Podman/runc, and the NVIDIA Container Toolkit, you can capture a running container's full GPU and process state (VRAM, CUDA contexts, driver state, and CPU process memory) and later restore it to resume execution almost instantly. For ML serving, that means multi-gigabyte models (Stable Diffusion, Qwen3) that normally incur 30+ second GPU load times can be frozen while already resident in GPU memory and brought back on demand, dramatically reducing cold starts and enabling fast scaling, migration, and pre-warmed inference images across machines.

The workflow chains Podman's checkpoint commands through runc to CRIU and cuda-checkpoint: CRIU freezes CPU/process state and delegates GPU state capture and restore to cuda-checkpoint, producing a tarred checkpoint archive you can import elsewhere. Key requirements: a Linux host with a compatible kernel, an NVIDIA driver (≥570 recommended; cuda-checkpoint supports driver 550+), CRIU 4.1+, Podman/runc, and the NVIDIA Container Toolkit with CDI. Practical setup includes binding the /dev/nvidia* device nodes, enabling tcp-established and link-remap in /etc/criu/runc.conf, and passing checkpoint options such as --ignore-rootfs and --ignore-volumes; see the sketches below.

Caveats remain: PID and mount/lock brittleness, imperfect TCP socket reattachment, and the need for matching driver, kernel, and GPU models on target hosts. The approach is powerful but operationally sensitive for production deployments.
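A minimal sketch of the container-level workflow, assuming a CDI-enabled NVIDIA Container Toolkit install; the container name, image, and archive path are illustrative:

```shell
# CRIU options the article calls out; runc reads /etc/criu/runc.conf
# as CRIU's configuration file when checkpointing.
sudo tee /etc/criu/runc.conf <<'EOF'
tcp-established
link-remap
EOF

# Start a GPU container via CDI (NVIDIA Container Toolkit).
podman run -d --name sd-server --device nvidia.com/gpu=all \
  registry.example.com/stable-diffusion:latest

# ...wait until the model is fully loaded into VRAM...

# Checkpoint: Podman invokes runc, runc invokes CRIU, and CRIU hands
# GPU state off to cuda-checkpoint; --export writes the tar archive.
podman container checkpoint --export=/tmp/sd-server.tar.gz \
  --ignore-rootfs --ignore-volumes sd-server

# Restore on this or a compatible host (matching driver, kernel,
# and GPU model, per the caveats above).
podman container restore --import=/tmp/sd-server.tar.gz
```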
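Under the hood, CRIU's CUDA plugin drives NVIDIA's cuda-checkpoint utility. A rough hand-rolled equivalent for a bare (non-containerized) CUDA process looks like the sketch below; the PID and image directory are placeholders, and it relies on CRIU restoring the original PID by default:

```shell
PID=12345            # placeholder: PID of a running CUDA process
IMG_DIR=/tmp/ckpt    # placeholder: CRIU image directory
mkdir -p "$IMG_DIR"

# Move GPU state (VRAM, CUDA contexts) into host memory so CRIU can
# checkpoint the process like any other.
cuda-checkpoint --toggle --pid "$PID"

# Dump CPU/process state; tcp-established and link-remap mirror the
# runc.conf settings above.
sudo criu dump --shell-job --tcp-established --link-remap \
  --images-dir "$IMG_DIR" --tree "$PID"

# Later: restore the process tree, then toggle GPU state back onto
# the device.
sudo criu restore --shell-job --tcp-established \
  --images-dir "$IMG_DIR" --restore-detached
cuda-checkpoint --toggle --pid "$PID"
```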