Measuring the environmental impact of delivering AI at Google Scale [pdf] (services.google.com)

🤖 AI Summary
Google published a first-party, full‑stack measurement study of AI serving at production scale, instrumenting the Gemini Apps assistant to quantify per‑prompt energy, carbon, and water impacts. Unlike prior estimates that often measured only accelerator draw, the methodology includes active AI accelerator power, host CPU/DRAM, energy from idled machines provisioned for latency, and data‑center overhead (PUE), plus market‑based emissions and embodied accelerator emissions. Key findings: the median Gemini Apps text prompt consumes about 0.24 Wh of energy and ~0.26 mL of water, and over one year Google’s software efficiency work plus clean‑energy procurement drove a ~33× reduction in energy and a ~44× reduction in carbon emissions per median prompt. This matters because published per‑prompt estimates vary by an order of magnitude depending on measurement boundaries and assumptions, making apples‑to‑apples comparisons and policy decisions difficult. Google’s paper defines three operational metrics (energy/prompt, emissions/prompt, water/prompt), highlights production levers (batching, speculative decoding, disaggregated serving, and software‑stack optimizations), and argues that standardized, comprehensive measurement is critical to properly incentivize efficiency across the full serving stack. The result reframes expected inference footprints, shows substantial real‑world gains from software and procurement, and provides a replicable framework for benchmarking and lifecycle assessments.
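The measurement boundary described above reduces to a simple accounting identity: sum the energy inside the boundary, scale by PUE, apply carbon and water intensities, and divide by prompts served. The sketch below illustrates that accounting in Python; the structure (active accelerator, host CPU/DRAM, idle capacity, PUE, market-based grid intensity, amortized embodied emissions) follows the summary, but every name and number is a placeholder chosen for illustration, not a value or API from the paper.

```python
# Illustrative sketch (not from the paper): per-prompt footprint accounting
# over the measurement boundary described in the summary. All numbers are
# placeholders; real values would come from fleet telemetry.
from dataclasses import dataclass

@dataclass
class ServingWindow:
    prompts: int                 # prompts served in the window
    accelerator_wh: float        # active accelerator energy (Wh)
    host_wh: float               # host CPU + DRAM energy (Wh)
    idle_wh: float               # idled machines provisioned for latency (Wh)
    pue: float                   # data-center power usage effectiveness
    grid_gco2e_per_kwh: float    # market-based carbon intensity (gCO2e/kWh)
    embodied_gco2e: float        # amortized embodied accelerator emissions (gCO2e)
    water_l_per_kwh: float       # data-center water use per unit energy (L/kWh)

def per_prompt_footprint(w: ServingWindow) -> dict:
    # IT energy inside the boundary, scaled by PUE for facility overhead.
    it_energy_wh = w.accelerator_wh + w.host_wh + w.idle_wh
    facility_energy_wh = it_energy_wh * w.pue
    facility_energy_kwh = facility_energy_wh / 1000.0

    operational_gco2e = facility_energy_kwh * w.grid_gco2e_per_kwh
    return {
        "energy_wh_per_prompt": facility_energy_wh / w.prompts,
        "gco2e_per_prompt": (operational_gco2e + w.embodied_gco2e) / w.prompts,
        "water_ml_per_prompt": facility_energy_kwh * w.water_l_per_kwh * 1000.0 / w.prompts,
    }

if __name__ == "__main__":
    window = ServingWindow(
        prompts=1_000_000,
        accelerator_wh=150_000,
        host_wh=40_000,
        idle_wh=20_000,
        pue=1.09,
        grid_gco2e_per_kwh=60.0,
        embodied_gco2e=10_000,
        water_l_per_kwh=1.1,
    )
    print(per_prompt_footprint(window))
```

Narrowing the boundary (e.g., counting only active accelerator energy and dropping idle capacity, host power, and PUE) shrinks every per-prompt figure, which is exactly why published estimates differ by an order of magnitude.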