GPUs when you need them: Introducing Flex-start VMs (cloud.google.com)

🤖 AI Summary
Google Cloud announced general availability of Flex-start VMs, a new Dynamic Workload Scheduler (DWS) feature that lets you create single Compute Engine VMs that wait in a managed queue for in-demand GPUs. Flex-start VMs are intended for tasks with flexible start times (model fine-tuning, batch inference, HPC, experiments). They increase the likelihood of getting accelerators by holding a capacity request open for a wait window of anywhere from 90 seconds to two hours, which replaces repeated manual retries with a single queued request. Google positions this as a differentiated consumption model (a first among major clouds), with discounted SKUs versus standard on-demand pricing. Technically, Flex-start integrates directly with the instances.insert API, the gcloud CLI, and the Console, and places VMs in a PENDING state until resources are provisioned or the wait window expires. Instances consume preemptible quota, can run for up to seven days, and support stop/start to pause billing and re-queue for capacity; the seven-day clock resets once the instance is provisioned again. You can also set instanceTerminationAction=STOP so that, when the runtime limit is reached, the VM is stopped and its IPs and boot disks are preserved instead of the resources being deleted. For AI/ML teams this simplifies scheduler integration, removes the engineering work of implementing retry logic, improves fair access to scarce GPUs, and lowers the cost of short-duration GPU workloads.
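
As a rough illustration of the request parameters described above, the following Python sketch checks the stated limits (a wait window of 90 seconds to two hours, a runtime of at most seven days) and assembles a request body. The `FlexStartRequest` class is hypothetical, and the field names `provisioningModel`, `maxRunDuration`, and `requestValidForDuration`, along with the `DELETE` value, are assumptions that should be checked against the Compute Engine API reference; only `instanceTerminationAction=STOP` is named in the summary.

```python
from dataclasses import dataclass

# Limits taken from the summary above.
MIN_WAIT_SECONDS = 90                 # shortest wait window
MAX_WAIT_SECONDS = 2 * 60 * 60        # two hours
MAX_RUN_SECONDS = 7 * 24 * 60 * 60    # seven days


@dataclass(frozen=True)
class FlexStartRequest:
    """Hypothetical container for the settings the summary describes."""

    name: str
    wait_seconds: int                  # how long the request may stay queued
    run_seconds: int                   # how long the VM may run once provisioned
    termination_action: str = "STOP"   # "STOP" preserves IPs and boot disks

    def validate(self) -> None:
        """Raise ValueError if a setting falls outside the stated limits."""
        if not MIN_WAIT_SECONDS <= self.wait_seconds <= MAX_WAIT_SECONDS:
            raise ValueError("wait window must be between 90 seconds and two hours")
        if not 0 < self.run_seconds <= MAX_RUN_SECONDS:
            raise ValueError("run duration must be positive and at most seven days")
        # "DELETE" is assumed to be the name of the default (delete) behavior.
        if self.termination_action not in ("STOP", "DELETE"):
            raise ValueError("termination action must be STOP or DELETE")

    def to_body(self) -> dict:
        """Build an illustrative request body; field names are assumptions."""
        self.validate()
        return {
            "name": self.name,
            "scheduling": {
                "provisioningModel": "FLEX_START",
                "maxRunDuration": {"seconds": self.run_seconds},
                "instanceTerminationAction": self.termination_action,
            },
            "params": {
                "requestValidForDuration": {"seconds": self.wait_seconds},
            },
        }
```

For example, `FlexStartRequest("ft-job", wait_seconds=3600, run_seconds=86400).to_body()` describes a one-hour wait window and a one-day runtime, while a 30-second wait window raises `ValueError` before any request would be sent. Validating locally is a design convenience only; the service remains the authority on which values it accepts.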