American Data Centers (tech.marksblogg.com)

🤖 AI Summary
Business Insider published an interactive map, with its GeoJSON embedded in the site's JavaScript, that links diesel generator permit filings to parent companies and reveals 1,240 U.S. data-center sites with rich metadata. The author extracted the embedded GeoJSON from the site's JavaScript using esprima, converted it to line-delimited JSON, and loaded it into DuckDB (with the h3, spatial and json extensions), applying a Hilbert ordering for spatial locality to produce a compact Parquet file (234 KB, 1,240 rows, 99 columns). The dataset includes permit years, generator types and rated capacities, estimated power use at 30/50/60% utilization, water consumption notes, environmental-justice percentiles, and operator/company mappings. One example is a hyperscaler-scale Apple site in Mesa, AZ with detailed generator and capacity estimates.

For the AI/ML community this is a practical dataset for quantifying the geography, resilience and environmental footprint of compute infrastructure. It enables analyses of where training clusters and hyperscalers concentrate, rough estimates of available on-site power capacity, reliance on diesel backup (with its risk and air-quality implications), and exposure to water stress, all of which bear on model training costs, carbon accounting, latency planning, and regulatory risk. The author's reproducible pipeline (Python 3.12, esprima, DuckDB, QGIS) makes the data readily usable for spatial analysis, capacity estimation, and research into sustainable AI infrastructure.
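The GeoJSON to line-delimited JSON step mentioned in the summary can be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the author's actual code: the function name `geojson_to_ndjson`, the sample features, and the property names `operator` and `permit_year` are all hypothetical and stand in for the real dataset's 99 columns.

```python
import json


def geojson_to_ndjson(feature_collection: dict) -> str:
    """Turn a GeoJSON FeatureCollection into newline-delimited JSON.

    Each feature becomes one line: its properties are lifted to the top
    level and its geometry is kept as a nested object.
    """
    lines = []
    for feature in feature_collection.get("features", []):
        # Copy properties so the input structure is not mutated.
        record = dict(feature.get("properties") or {})
        record["geometry"] = feature.get("geometry")
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Hypothetical sample data, invented purely for illustration.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"operator": "Example Operator A", "permit_year": 2020},
            "geometry": {"type": "Point", "coordinates": [-100.0, 40.0]},
        },
        {
            "type": "Feature",
            "properties": {"operator": "Example Operator B", "permit_year": 2022},
            "geometry": {"type": "Point", "coordinates": [-90.0, 35.0]},
        },
    ],
}

ndjson = geojson_to_ndjson(sample)
print(ndjson)
```

Writing one record per line means a loader can stream the file row by row rather than parsing a single large JSON document, which is the usual reason for choosing this format before a database import.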