Microsoft Unveils First Nvidia GB300 NVL72 Supercomputing Cluster for OpenAI (blogs.nvidia.com)

🤖 AI Summary
Microsoft Azure launched the NDv6 GB300 VM series, the industry's first production supercomputing cluster built from NVIDIA GB300 NVL72 systems, purpose-built to serve OpenAI's highest-demand inference workloads. The cluster stitches together 4,608 NVIDIA Blackwell Ultra GPUs using NVIDIA's NVLink switch fabric and Quantum-X800 InfiniBand, reflecting deep Microsoft–NVIDIA co-engineering across cooling, power, memory, and software to deliver massive inference and training throughput for reasoning, agentic, and multimodal models.

Technically, each GB300 NVL72 rack packs 72 Blackwell Ultra GPUs plus 36 Grace CPUs, offering 37 TB of fast unified memory and about 1.44 exaflops of FP4 Tensor Core peak performance per VM. Intra-rack NVLink provides 130 TB/s of all-to-all bandwidth, while Quantum-X800 (with ConnectX-8 SuperNICs) delivers ~800 Gb/s per GPU across the cluster, plus adaptive routing, telemetry-based congestion control, and SHARP v4 in-network acceleration. The stack leverages the NVFP4 numeric format and the NVIDIA Dynamo inference framework; MLPerf Inference v5.1 results showed up to 5x per-GPU throughput on a 671B-parameter reasoning model versus Hopper, along with leadership results on Llama 3.1 405B.

The outcome is a production platform that materially raises the scale and efficiency envelope for next-generation models, enabling faster inference, larger unified-memory models, and more ambitious AI systems at cloud scale.
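The headline figures imply a straightforward rack topology. As a quick sanity check, here is a back-of-the-envelope sketch using only the numbers quoted above, assuming the per-rack figures apply uniformly across the cluster:

```python
# Derived cluster-level figures from the per-rack specs quoted in the article.
# Inputs are from the article; derived values are simple arithmetic and
# assume every rack is identically configured.

TOTAL_GPUS = 4608       # Blackwell Ultra GPUs in the cluster
GPUS_PER_RACK = 72      # GB300 NVL72: 72 GPUs per rack
CPUS_PER_RACK = 36      # Grace CPUs per rack
RACK_MEMORY_TB = 37     # fast unified memory per rack (TB)

racks = TOTAL_GPUS // GPUS_PER_RACK
total_cpus = racks * CPUS_PER_RACK
total_memory_tb = racks * RACK_MEMORY_TB

print(racks)            # 64 NVL72 racks
print(total_cpus)       # 2304 Grace CPUs cluster-wide
print(total_memory_tb)  # 2368 TB of unified memory cluster-wide
```

So 4,608 GPUs works out to exactly 64 NVL72 racks, which is why the NVLink domain (intra-rack) and the InfiniBand fabric (inter-rack) split the scaling problem the way the article describes.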