🤖 AI Summary
Early-access benchmarks of NVIDIA's Blackwell B200 (a self-hosted 8× B200 cluster at GreenMountain) show it substantially outpacing cloud H100s on real-world AI workloads: up to 57% faster for GPU-bound computer vision pretraining (YOLOv8-x + DINOv2 on ImageNet-1k using LightlyTrain) and a ~10% token-generation speedup for Gemma 27B in Ollama. For massive models (DeepSeek 671B), inference was roughly on par, a gap the authors attribute to early Blackwell software/driver support; they expect inference to improve as CUDA, vLLM/TensorRT-LLM, and the surrounding frameworks mature. The B200's hardware jump (~192 GB of memory, ~2.4× the memory bandwidth, and >2× the peak FP16/BF16 throughput of the H100) directly enables larger batch sizes and higher training throughput.
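To make the batch-size claim concrete, here is a minimal back-of-envelope sketch. The 80 GB (H100) and ~192 GB (B200) capacities are the real figures, but the fixed model/optimizer overhead and per-sample activation footprint below are illustrative assumptions, not measurements from the benchmark:

```python
# Back-of-envelope: how a larger per-GPU memory pool translates into batch size.
def max_batch_size(gpu_mem_gb: float, fixed_overhead_gb: float, per_sample_gb: float) -> int:
    """Largest batch that fits after reserving memory for weights/optimizer state."""
    return int((gpu_mem_gb - fixed_overhead_gb) // per_sample_gb)

FIXED_GB = 20.0       # assumed: weights + optimizer state + CUDA workspace
PER_SAMPLE_GB = 0.25  # assumed: activation memory per training sample

for name, mem_gb in [("H100 (80 GB)", 80.0), ("B200 (~192 GB)", 192.0)]:
    print(f"{name}: max batch ~= {max_batch_size(mem_gb, FIXED_GB, PER_SAMPLE_GB)}")
```

Under these assumed footprints the B200 fits roughly 2.9× the batch of an H100; the exact ratio depends entirely on the model and training recipe.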
Power, cost, and ops make the story bigger for teams running continuous workloads: measured GPU draw was much lower than the 1,000 W spec (~600 W under load, ~140 W idle), with whole-node GPU power around ~4.8 kW (~6.5–7 kW for the full system). Self-hosting at a colo yielded an estimated operating cost of ~$0.51/GPU-hour vs cloud H100 rates of $2.95–$16.10/hr, implying ~6×–30× operational savings for constant use, plus predictable performance (no noisy neighbors) and 24/7 availability on renewable power. For ML teams with sustained heavy training needs, B200 plus self-hosting promises materially lower TCO and faster iteration, contingent on ecosystem maturation for peak inference performance.
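The cost and power figures above follow from simple arithmetic; a quick sketch reproducing them (all inputs are taken from the summary itself, nothing extra is assumed):

```python
# Reproducing the summary's cost and power arithmetic.
SELF_HOSTED_RATE = 0.51           # $/GPU-hour, estimated colo operating cost
CLOUD_H100_RATES = (2.95, 16.10)  # $/GPU-hour, low and high cloud quotes

for rate in CLOUD_H100_RATES:
    print(f"cloud ${rate:.2f}/hr vs ${SELF_HOSTED_RATE:.2f}/hr "
          f"-> ~{rate / SELF_HOSTED_RATE:.1f}x savings")

# Whole-node GPU draw: 8 GPUs at the measured ~600 W under load.
gpus, watts_per_gpu = 8, 600
print(f"node GPU power ~= {gpus * watts_per_gpu / 1000:.1f} kW")  # ~4.8 kW
```

The ratios work out to roughly 5.8× at the low end and ~31.6× at the high end, consistent with the ~6×–30× range quoted, and only hold for workloads that keep the cluster busy around the clock.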