What Happened to Piracy? Copyright Enforcement Fades as AI Giants Rise (www.leefang.com)

🤖 AI Summary
AI’s copyright fault lines have shifted from street‑level piracy to industrial‑scale scraping: companies that once led anti‑piracy crusades are now accused of vacuuming up copyrighted books, academic articles, and other published works to train large language models. After decades of criminal enforcement driven by software firms like Microsoft, recent litigation — notably Kadrey et al. v. Meta — alleges that Meta and other AI leaders (OpenAI, Microsoft, Google, Anthropic, xAI) used mirrors of piracy repositories such as Library Genesis, and even torrenting, to assemble massive training corpora. Internal emails disclosed in the suits show engineers escalating questions about illicit sources to executives, who approved their inclusion; OpenAI’s reliance on Azure, backed by Microsoft’s $13.75B investment, ties the old and new ecosystems together.

The shift is significant because it exposes a legal and governance gap: federal criminal enforcement has largely receded, leaving authors and publishers to pursue civil class actions that could reshape data‑sourcing practices. Technically, model performance hinges on vast, diverse text corpora, so the industry’s willingness to scrape copyrighted material at scale raises IP risk, provenance and data‑quality concerns, and pressure for new licensing markets or stricter regulation. The cases highlight competing futures for model training — formalized licensing with auditable datasets versus opaque, high‑volume ingestion of illicit sources — with major consequences for compliance, model‑deployment risk, and the economics of publishing and AI development.