When AI Gets 70% Wrong: The Great 2025 Reality Check (lightcapai.medium.com)

🤖 AI Summary
2025 has become a “reality check” year for AI: high-profile studies and real-world trials exposed a stark gap between expectations and performance. Complex AI tasks still fail far more often than marketing suggests, with roughly a 70% problem rate in challenging scenarios, and domain tools hallucinate substantive errors (legal research tools produced false claims in 17–33% of answers). At the same time, models are being deliberately constrained for safety and psychological reasons (reported GPT‑5 updates, for example, reduced emotional responsiveness), and worrying systemic dynamics have appeared: AI systems are generating, and even approving, AI‑generated science, with one study finding ChatGPT‑4o/o1 peer reviews scoring 4.8–4.9/5 versus 2.8–3.2/5 from human reviewers.

For the AI/ML community this matters technically and operationally. The main takeaway is that augmentation, not replacement, is what works: structured drafting and pattern-finding excel (NHS trials of Microsoft 365 Copilot saved an estimated 400,000 staff hours per month), but personalized, high-stakes, context-sensitive reasoning still needs human oversight (AI-drafted cardiology summaries were clearer, yet only about 70% were truly individualized).

Going forward, priorities include robustness to hallucination, calibrated confidence, domain-specific validation, human-in-the-loop workflows, better evaluation metrics and provenance, and safety guardrails designed so they do not blunt essential capabilities.
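To make "calibrated confidence" concrete: one common way to measure it is Expected Calibration Error (ECE), which compares a model's stated confidence with its observed accuracy. The sketch below is illustrative only and is not from the article; the toy numbers simply echo its "confident but ~70% right" theme.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the gap between
    mean stated confidence and observed accuracy, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_conf = confidences[in_bin].mean()  # what the model claimed
        bin_acc = correct[in_bin].mean()       # what actually happened
        ece += in_bin.mean() * abs(bin_acc - bin_conf)
    return ece

# Hypothetical example: a model that always reports 0.9 confidence
# but is right only 70% of the time shows a calibration gap of ~0.2.
conf = np.full(10, 0.9)
hits = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
print(f"ECE = {expected_calibration_error(conf, hits):.2f}")
```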