Major AI conference flooded with peer reviews written fully by AI (www.nature.com)

🤖 AI Summary
Concerns that large language models had been drafting peer reviews at the International Conference on Learning Representations (ICLR) prompted a mass audit: after researchers reported hallucinated citations, long, vague feedback, and incorrect numerical claims, Pangram Labs scanned 19,490 submissions and 75,800 peer reviews for ICLR 2026. Using its own LLM-detection tool and scripts to extract text, Pangram flagged roughly 21% of reviews as fully AI-generated and found that over half showed signs of AI use. On the manuscript side, 199 submissions (~1%) appeared fully AI-generated, 61% were mostly human-written, and 9% contained more than 50% AI-generated text. The team posted the results and a preprint documenting their methodology and examples.

The episode is significant as the first large-scale, quantified instance of LLM-written peer review at a flagship ML conference, exposing a trust gap in a process that shapes acceptance decisions and reputations. Technically, the findings show that current LLMs can produce plausible but error-prone critiques (hallucinated citations, incorrect metrics, odd requests), which can sway reviewers or program chairs. ICLR organizers plan automated screening for policy breaches; the community now faces urgent questions about detection reliability, reviewer guidelines, auditability of reviews, and how to adapt review workflows and incentive structures to prevent AI-enabled gaming or degradation of review quality.