LLMs believe false statements even after explicit warnings that they’re false (arstechnica.com)

0 points 1 hour ago

🤖 AI Summary

Recent research shows that large language models (LLMs) struggle to disregard false information even when it is explicitly marked as false, a phenomenon known as “negation neglect.” An international team of researchers found that LLMs continued to absorb fictitious statements despite clear warnings that the statements were inaccurate. The finding raises concerns about the reliability of AI and helps explain why these models so often hallucinate. In the study, LLMs including Qwen3.5-35B-A3B, Kimi K2.5, and GPT-4.1 were trained on synthetic documents containing outrageous false claims. For instance, belief in these falsehoods rose from 2.5% to 92.4% after fine-tuning on the misleading data. The results point to the need for better methods of curating training datasets, since how AI systems are taught to process information affects their reliability and safety in real-world applications.
