AI's voracious appetite for data imperils key privacy principles (news.bloomberglaw.com)

🤖 AI Summary
AI’s escalating demand for training data is colliding with the long-standing privacy principle of data minimization: the idea that organizations should collect and keep only what’s necessary. Experts say the pool of publicly available text may already be close to exhausted, pushing firms to hunt for more personal and conversational data to stay competitive; Stanford fellow Jennifer King notes the field is “out of data.” That pressure has real-world consequences: Google reportedly paid for access to Reddit posts, Big Tech lobbying is pushing back on rules that would restrict data reuse, and regulators are weighing whether laws like the GDPR need reinterpretation to accommodate AI development.

Technically, the tension centers on quantity versus quality. Large language models need massive corpora, but uncurated datasets introduce noise, bias, and low-quality behavior; critics pointed to a recent Meta model trained on social posts that produced gibberish-like outputs. Some firms and lawyers argue that curation, anonymization, and task-specific data selection can reconcile AI progress with minimization, and examples like China’s DeepSeek suggest comparable performance can sometimes be achieved with less data. Still, more data means higher energy use and cost but also greater market advantage, creating strong incentives to resist strict limits and pushing policy debates toward either flexible, harm-focused rules or a loosening of minimization norms.