🤖 AI Summary
A preregistered study by Chakrabarty, Ginsburg, and Dhillon tested whether frontier LLMs can write literary prose that readers prefer to the work of expert human writers. They asked ChatGPT, Claude, and Gemini to emulate 50 award‑winning authors in excerpts of up to 450 words, and compared the AI outputs against pieces written by MFA‑trained writers in blind pairwise evaluations by 159 expert (MFA) judges and lay readers recruited via Prolific. With simple in‑context prompting, experts strongly rejected the AI excerpts on both stylistic fidelity (OR = 0.16, p < 10^-8) and overall quality (OR = 0.13, p < 10^-7), while lay readers were mixed. However, when ChatGPT was fine‑tuned on each author’s complete works, the results flipped: experts favored the fine‑tuned AI for stylistic fidelity (OR = 8.16, p < 10^-13) and quality (OR = 1.87, p = 0.01), with similar shifts among lay readers. The results hold under cluster‑robust inference and are consistent across authors.
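To make the reported odds ratios and cluster‑robust inference concrete, here is a minimal sketch of how such pairwise‑preference data could be analyzed: a logistic regression on a binary "preferred the AI excerpt" indicator with standard errors clustered by judge. The column names, simulated data, and model form are illustrative assumptions, not the paper’s exact specification.

```python
# Sketch: odds ratios from pairwise preferences with judge-clustered SEs.
# Data and formula are hypothetical; the study's actual model may differ.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500  # one row per blind pairwise comparison

df = pd.DataFrame({
    # 1 if the judge picked the AI excerpt, 0 if the human-written one
    "prefer_ai": rng.binomial(1, 0.45, size=n),
    # 1 for expert (MFA) judges, 0 for lay readers
    "expert": rng.binomial(1, 0.5, size=n),
    # judge identifier; each judge rates many pairs, so errors cluster
    "judge_id": rng.integers(0, 159, size=n),
})

# Logistic regression; clustering on judge accounts for repeated ratings
# by the same person.
model = smf.logit("prefer_ai ~ expert", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["judge_id"]}
)

# Exponentiated coefficients are odds ratios (OR < 1 means AI disfavored).
print(np.exp(model.params))
print(model.pvalues)
```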
Technically and legally notable: fine‑tuning almost eliminated detectable “AI” stylistic quirks (e.g., cliché density), cutting AI‑detector flags from 97% to 3%, and this drop mediated the preference reversal. The authors estimate a median fine‑tuning‑plus‑inference cost of ~$81 per author (a ~99.7% reduction versus typical professional rates), though they don’t model the extra human labor needed to turn excerpts into publishable novels. The study supplies empirical evidence bearing on copyright’s “market effect” fair‑use factor, highlights the risk of author‑specific model replication, and signals looming tensions among detection, attribution, and policy as fine‑tuned models can produce non‑verbatim prose that readers prefer.
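As a back‑of‑the‑envelope check on the cost claim, the summary’s own figures imply a professional baseline of roughly $27,000 per author; this is derived arithmetic, not a number quoted from the paper.

```python
# Implied baseline from the summary's figures: if ~$81 is a ~99.7% reduction,
# the comparable professional rate is about 81 / (1 - 0.997). Illustrative only.
cost_ai = 81          # median fine-tuning + inference cost per author (USD)
reduction = 0.997     # stated fractional reduction
implied_baseline = cost_ai / (1 - reduction)
print(f"Implied professional baseline: ~${implied_baseline:,.0f} per author")
```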