60x Faster Cold Starts: Treating Peer GPUs as Weight Servers (runwayml.com)

0 points 55 days ago

🤖 AI Summary

A new approach named NCCLBack cuts GPU cold-start times from several minutes to a few seconds by treating peer GPUs as weight servers. It targets a longstanding inefficiency: every new GPU worker independently fetches large model weights from cloud storage during initialization. With NCCLBack, a new worker instead receives the weights directly from a peer GPU that has already loaded them. This sharply reduces the load on cloud storage bandwidth and shortens cold starts. The change matters most for companies like Runway that deploy models frequently, because faster cold starts speed up scale-out, rollbacks, and response times, which in turn improves the user experience.

NCCLBack's architecture has four components: discovery, coordination, transfer, and verification. Its speed comes from the GPU interconnect, which reaches up to 900 GB/s with NVLink, compared with roughly 2–10 Gbps (about 0.25–1.25 GB/s) for conventional cloud storage downloads.

For reliability, the protocol verifies weight hashes to guarantee data integrity, and it applies timeouts to broadcasts so that a failed transfer cannot hang a worker indefinitely. Because multiple GPUs can share weights at the same time, NCCLBack also uses compute resources more efficiently and sets a precedent for future work on distributed AI systems.
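The four-stage flow described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Runway's implementation. The peer registry, the per-weight SHA-256 manifest, the fallback-to-storage policy, and every function name are hypothetical, and a thread feeding a `queue.Queue` stands in for an NCCL broadcast over NVLink. It shows discovery (finding a ready peer that holds the same model version), coordination (try the peer first, fall back to storage), transfer, and verification with a bounded wait on each message.

```python
import hashlib
import queue
import threading

# Illustrative sketch only: every name here (registry, manifest, pick_peer,
# peer_send, receive_weights, cold_start) is hypothetical, and a thread plus a
# queue.Queue stand in for an NCCL broadcast between GPUs.


def digest(blob: bytes) -> str:
    """SHA-256 hex digest used to verify each weight blob."""
    return hashlib.sha256(blob).hexdigest()


def pick_peer(registry: dict, model_version: str):
    """Discovery: return the id of a ready peer holding this model version."""
    for peer_id, info in registry.items():
        if info["version"] == model_version and info["ready"]:
            return peer_id
    return None


def peer_send(weights: dict, channel: queue.Queue) -> None:
    """Transfer (peer side): push each named weight blob onto the channel."""
    for name, blob in weights.items():
        channel.put((name, blob))


def receive_weights(manifest: dict, channel: queue.Queue, timeout_s: float):
    """Transfer + verification (new worker side).

    `manifest` maps weight names to expected digests. Returns the received
    weights, or None on a per-message timeout, a hash mismatch, or a missing
    or unexpected name.
    """
    received = {}
    for _ in range(len(manifest)):
        try:
            name, blob = channel.get(timeout=timeout_s)
        except queue.Empty:
            return None  # bounded wait: a stalled peer cannot hang the worker
        if manifest.get(name) != digest(blob):
            return None  # corrupted, truncated, or unexpected blob
        received[name] = blob
    return received if received.keys() == manifest.keys() else None


def cold_start(registry, model_version, manifest, open_channel,
               load_from_storage, timeout_s=1.0):
    """Coordination: try a peer first, fall back to cloud storage."""
    peer = pick_peer(registry, model_version)
    if peer is not None:
        weights = receive_weights(manifest, open_channel(peer), timeout_s)
        if weights is not None:
            return weights, "peer"
    return load_from_storage(), "storage"


if __name__ == "__main__":
    weights = {"layer0": b"\x01" * 16, "layer1": b"\x02" * 16}
    manifest = {name: digest(blob) for name, blob in weights.items()}
    registry = {"gpu-a": {"version": "v7", "ready": True}}

    def open_channel(peer_id):
        # Simulate the peer streaming its already-loaded weights.
        channel = queue.Queue()
        threading.Thread(target=peer_send, args=(weights, channel),
                         daemon=True).start()
        return channel

    loaded, source = cold_start(registry, "v7", manifest, open_channel,
                                lambda: dict(weights))
    print(source)  # peer
```

Here the timeout bounds the wait for each individual blob. A real deployment might also cap the total transfer time, and a mismatch on any single hash discards the whole peer transfer rather than patching it, which keeps the fallback path simple.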
