🤖 AI Summary
This summer Amazon Web Services' Bedrock, the managed service that hosts models like Anthropic's Claude and Meta's Llama, ran into "critical capacity constraints" and performance shortfalls that pushed several customers to rivals. An internal July memo described quota shortages (limits on tokens processed per minute and on API calls) and slow quota approvals that delayed deals or lost revenue worth tens of millions of dollars: Epic moved a $10M Fortnite project to Google Cloud, Vitol risked a $3.5M hit, and at least $52.6M in projected sales was stalled. Customers also cited latency and missing features; Thomson Reuters found Bedrock 15–30% slower and lacking some compliance certifications, prompting multi-cloud shifts and direct use of Anthropic's or Google's services.
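For context on what those quota limits mean in practice, here is a minimal sketch of how a Bedrock client experiences an exhausted tokens-per-minute or requests-per-minute quota, using boto3's bedrock-runtime Converse API. The model ID, retry count, and backoff schedule are illustrative assumptions, not details from the memo.

```python
import time
import boto3

# Bedrock enforces per-account quotas (tokens per minute, requests per
# minute). When a quota is exhausted, the runtime raises
# ThrottlingException; a common client-side mitigation is exponential
# backoff, sketched below.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

def converse_with_backoff(prompt: str, model_id: str, max_retries: int = 5) -> str:
    delay = 1.0
    for _ in range(max_retries):
        try:
            resp = client.converse(
                modelId=model_id,
                messages=[{"role": "user", "content": [{"text": prompt}]}],
            )
            return resp["output"]["message"]["content"][0]["text"]
        except client.exceptions.ThrottlingException:
            # Quota hit; wait and retry with a doubled delay.
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("still throttled after retries; request a quota increase")

# Example call (model ID is illustrative):
# print(converse_with_backoff("Hello", "anthropic.claude-3-5-sonnet-20240620-v1:0"))
```

Backoff only smooths transient throttling; the sustained shortages described in the memo require raised quotas, which is where the slow approvals hurt customers.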
The episode underscores two big implications for AI/ML: capacity and inference performance are now strategic bottlenecks, and the cloud providers that can scale chips, power, and low-latency inference will win adoption. AWS is accelerating capacity (adding roughly 3.8 GW of power this year and pushing Trainium chips for large clients), but competitors have converted customers and cut costs for startups: Google's Gemini is reported to offer 5–6× larger quotas and better benchmark results, and cheaper options like Gemini Flash undercut on price. The crisis highlights the need for a coherent Bedrock inference strategy, faster quota management, and a continuing capex arms race across clouds as demand for generative AI keeps surging.