Does Using In-Copyright Works as Training Data Infringe? (cacm.acm.org)

🤖 AI Summary
Two federal judges, the first to squarely test training-data fair use, reached mixed but important rulings in Bartz v. Anthropic and Kadrey v. Meta. Both concluded that copying in-copyright books to train foundation models is, at least in purpose, "transformative" (one of the four fair-use factors courts weigh). Judge Alsup (Anthropic) found two training uses fair but ruled against Anthropic for maintaining a database of pirated books, calling piracy inherently infringing even when used for model training. Judge Chhabria (Meta) accepted Meta's fair-use defense on the record before him, but only because Kadrey's market-harm argument failed; he treated the pirated-books issue as immaterial to objective fairness in that case.

The decisions establish key precedents and fault lines: courts may accept transformative training uses and the reasonableness of copying entire works for model-building, but the expressive nature of the works weighs against fair use, and use of pirated datasets can trigger liability. Both judges rejected plaintiffs' "lost-license" theory, deeming large-scale licensing infeasible, but Chhabria flagged a novel and controversial "market dilution" theory (AI output flooding markets) that, if supported with evidence, could tilt future cases.

Practically, developers can cautiously rely on fair use but face legal risk from piracy, dataset curation choices, and evolving theories of indirect market harm; appeals are likely.