AIs can generate near-verbatim copies of novels from training data (arstechnica.com)

🤖 AI Summary
Recent research shows that leading AI language models, including those from OpenAI, Google, and Anthropic, can generate near-verbatim text from bestselling novels, undercutting the industry's claim that these systems do not memorize copyrighted material. Researchers from Stanford and Yale demonstrated that, with strategic prompting, the models would reproduce substantial portions of well-known works such as "Harry Potter" and "The Hunger Games"; one model reproduced nearly 77% of "Harry Potter and the Philosopher's Stone" accurately. The finding raises pressing legal and ethical questions for the AI/ML community, particularly in the context of ongoing copyright lawsuits against AI companies. Experts argue that the ability to memorize and replicate copyrighted text weakens a core defense of these companies, which have long maintained that their models learn from, rather than store, training data. As the line between learning and memorization blurs, the implications for copyright law and AI training practices could be profound, potentially shaping new regulations and industry standards for the use of copyrighted materials in AI development.