AI Summary
Recent research and internal OpenAI tests reveal hallucinations in large language models are worsening: OpenAI found its o3 model fabricated answers on 33% of factual questions and o4-mini on 48%. The trend isn't just noise; OpenAI itself says GPT-4o has "unintentionally increased what users perceive as 'bluffing'," and investigations (e.g., Sky News) have shown models confidently inventing transcripts or doubling down when corrected. For enterprises in healthcare, finance, law and other high-stakes fields, these error rates make human oversight mandatory and undercut AI's value proposition, since hallucinations can be plausible, random, and nearly impossible for non-experts to spot.
Technically, the root cause is how LLMs are built: they predict probable token sequences from training data rather than ground truths, and efforts to make them more human-like (empathetic, certainty-signaling) amplify confident but incorrect outputs. The article argues scaling alone won't fix this and promotes neurosymbolic AI: hybrids that pair neural language models for fluent interfaces with deterministic symbolic reasoning that encodes factual rules and can explicitly say "I don't know." That hybrid promises reproducible outputs, verifiable facts, and better auditability, offering a practical path for enterprises to gain generative capabilities without the current risk profile of pure LLM deployments.
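
A minimal sketch of that hybrid pattern, assuming a toy key-value fact store and a generic text-generation callable (the names, data, and structure here are illustrative assumptions, not the article's implementation): the symbolic layer answers deterministically and abstains when no verified fact exists, while the neural model is confined to phrasing the verified result.

    # Illustrative neurosymbolic sketch (hypothetical names, not from the article):
    # a deterministic fact store verifies or abstains; the neural model only
    # rephrases verified facts, so it cannot invent unsupported answers.

    from typing import Callable, Optional

    # Symbolic layer: explicit, auditable facts (stand-in for a knowledge graph or rule engine).
    FACTS: dict[str, str] = {
        "capital_of_france": "Paris",
        "boiling_point_of_water_c": "100",
    }

    def symbolic_lookup(question_key: str) -> Optional[str]:
        """Deterministic retrieval: the same input always yields the same, verifiable output."""
        return FACTS.get(question_key)

    def answer(question_key: str, llm_generate: Callable[[str], str]) -> str:
        """Hybrid pipeline: ground the response in the symbolic store, or abstain."""
        fact = symbolic_lookup(question_key)
        if fact is None:
            # Abstain instead of producing a fluent but unverifiable guess.
            return "I don't know."
        # The neural model is used only to phrase the already-verified fact.
        return llm_generate(f"State this fact in one sentence: {question_key} = {fact}")

    if __name__ == "__main__":
        fake_llm = lambda prompt: f"(fluent phrasing of: {prompt})"
        print(answer("capital_of_france", fake_llm))  # grounded answer
        print(answer("gdp_of_atlantis", fake_llm))    # "I don't know."

In this arrangement the symbolic store provides the reproducibility and auditability the article highlights, and the generative model adds the fluent interface without being the source of factual claims.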