OpenAI: AI hallucinations are mathematically inevitable, not engineering flaws (www.computerworld.com)

🤖 AI Summary
OpenAI published a paper formalizing a stark claim: hallucinations in large language models are mathematically inevitable, not merely engineering bugs. The authors prove lower bounds tying generative error rates to fundamental training and inference limits — notably that the generative error rate is at least twice the IIV ("Is‑It‑Valid") misclassification rate — and identify three root causes: epistemic uncertainty from rare or missing data, architectural representational limits, and computational intractability for certain problems. The team demonstrated the effect empirically across state‑of‑the‑art systems (e.g., a "How many Ds are in DEEPSEEK?" probe produced inconsistent answers from the 600B‑parameter DeepSeek‑V3 as well as from competing models) and reported nontrivial hallucination rates for OpenAI's own reasoning models (o1: 16%; o3: 33%; o4‑mini: 48% in a summarization task), underscoring that greater model complexity does not eliminate these errors.

The paper also argues that industry practice exacerbates the issue: common benchmarks reward confident guesses over "I don't know," creating incentives to hallucinate. OpenAI suggests mitigations such as explicit confidence targets, better‑calibrated evaluations, and governance changes (human‑in‑the‑loop review, domain guardrails, continuous monitoring), but concedes that full elimination is impossible. For researchers and enterprises, this reframes priorities from chasing zero error to managing risk — improving uncertainty estimation, revising benchmarks, and building containment and oversight systems to safely deploy models whose outputs will sometimes be plausibly wrong.
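For readers who want the bound stated explicitly, a minimal sketch of the inequality summarized above — notation is assumed here, and the paper's additional correction terms are omitted:

```latex
% Sketch of the lower bound as summarized above (notation assumed;
% the paper's correction terms are dropped for clarity).
\[
  \underbrace{\mathrm{err}}_{\text{generative error rate}}
  \;\ge\;
  2 \cdot \underbrace{\mathrm{err}_{\mathrm{iiv}}}_{\text{IIV misclassification rate}}
\]
% Reading: if no classifier can reliably decide whether a candidate output
% is valid (high err_iiv), then a generator drawing on the same information
% cannot avoid emitting invalid outputs at roughly twice that rate.
```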
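The benchmark‑incentive point also lends itself to a short sketch. The hypothetical Python example below (the scoring rule and threshold `t` are assumptions, not the paper's exact proposal) shows why 1/0 scoring makes always guessing the score‑maximizing strategy, while a confidence‑target rule makes "I don't know" rational when the model is unsure:

```python
# Illustrative sketch (not from the paper): why binary-scored benchmarks
# reward confident guessing, and how a confidence-target scoring rule
# changes the incentive. The penalty scheme below is an assumption chosen
# to mirror the "explicit confidence targets" idea.

def binary_score(correct: bool, abstained: bool) -> float:
    """Typical benchmark scoring: 1 for a correct answer, 0 otherwise.
    Abstaining ('I don't know') scores the same as being wrong."""
    return 1.0 if (correct and not abstained) else 0.0

def confidence_target_score(correct: bool, abstained: bool, t: float) -> float:
    """Hypothetical rule with confidence target t: wrong answers cost
    t/(1-t), so guessing only pays off when the model's probability of
    being right exceeds t."""
    if abstained:
        return 0.0
    return 1.0 if correct else -t / (1.0 - t)

def expected_scores(p_correct: float, t: float = 0.75) -> None:
    # Expected value of guessing vs. abstaining under each rule.
    guess_binary = p_correct * 1.0          # abstaining scores 0 either way
    guess_target = p_correct * 1.0 + (1.0 - p_correct) * (-t / (1.0 - t))
    print(f"p={p_correct:.2f}  binary: guess {guess_binary:.2f} vs abstain 0.00"
          f" | target(t={t}): guess {guess_target:.2f} vs abstain 0.00")

if __name__ == "__main__":
    for p in (0.9, 0.5, 0.2):
        expected_scores(p)
    # Under binary scoring, guessing never does worse than abstaining, so a
    # benchmark-optimizing model should always guess -- i.e. hallucinate
    # rather than say "I don't know". Under the confidence-target rule,
    # abstaining wins whenever p_correct < t.
```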