🤖 AI Summary
A preprint posted to medRxiv (12 Sept) reports that text-generating AIs can be used to mass-produce “copycat” biomedical papers that evade publishers’ plagiarism checks. Researchers screened PubMed for redundant association studies using the US NHANES open-health dataset and identified 411 near-duplicate papers published in 112 journals between Jan 2021 and July 2025; some associations appeared in six independent but almost identical papers. To test misuse, the team used ChatGPT and Google’s Gemini to rewrite three heavily duplicated articles (using the original text and NHANES data) and, after about two hours of manual cleanup per manuscript, produced new drafts that did not trigger standard plagiarism-detection thresholds.
This matters for AI/ML and research integrity: it demonstrates that large language models can generate derivative but publishable-seeming science that circumvents automated similarity checks, enabling paper mills and opportunistic authors to flood the literature with low-value, non-novel studies. Technical implications include the need for better provenance and reproducibility checks (data/code links, stricter editorial screening for redundant analyses), robust AI-usage disclosures or watermarking, and detection tools that go beyond surface text similarity to spot statistical redundancy and synthetic generation. Left unchecked, this workflow could scale across open datasets and dilute the scientific record.
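To make the gap concrete, here is a minimal, hypothetical sketch (not from the preprint) of why wording-based plagiarism checks miss this kind of redundancy: an LLM paraphrase of an NHANES association study can share almost no words with the original, yet report the same (dataset, exposure, outcome) triple. The `jaccard` and `redundant` helpers and the example sentences below are illustrative assumptions, not the study's actual method or data.

```python
# Hypothetical illustration: surface text similarity vs. statistical redundancy.
# Real plagiarism checkers are more sophisticated than word-set Jaccard, but
# the failure mode is the same: paraphrasing drives text similarity down while
# the underlying analysis stays identical.

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity, a crude stand-in for a text-overlap check."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def redundant(meta_a: dict, meta_b: dict) -> bool:
    """Flag redundancy on structured study metadata, not on wording."""
    keys = ("dataset", "exposure", "outcome")
    return all(meta_a[k] == meta_b[k] for k in keys)

# Invented example sentences describing the same association two ways.
original = {
    "text": "Serum vitamin D levels were inversely associated with depression in NHANES participants",
    "meta": {"dataset": "NHANES", "exposure": "vitamin D", "outcome": "depression"},
}
paraphrase = {
    "text": "Among nationally surveyed US adults, lower 25-hydroxyvitamin D predicted higher depressive symptom scores",
    "meta": {"dataset": "NHANES", "exposure": "vitamin D", "outcome": "depression"},
}

sim = jaccard(original["text"], paraphrase["text"])
print(f"text similarity: {sim:.2f}")  # low: would pass a wording-based check
print("statistically redundant:", redundant(original["meta"], paraphrase["meta"]))
```

The design point is that the second check operates on extracted analysis metadata rather than prose, which is the kind of signal the summary suggests editorial screening would need.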