🤖 AI Summary
Researchers at Tsinghua University have identified a sparse subset of neurons in large language models (LLMs), termed "H-Neurons," that are specifically associated with generating hallucinations, i.e., plausible but factually incorrect responses. The study finds that the activations of fewer than 0.1% of a model's neurons reliably predict when hallucinations will occur, and that this predictive signal generalizes across different scenarios. These neurons are also linked to over-compliance behaviors in LLMs, which can produce unsafe or misleading outputs. By exposing neuron-level mechanisms that contribute to hallucination, the study offers new insight into the challenges of improving LLM reliability.
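To make the idea of a sparse, neuron-level hallucination predictor concrete, here is a minimal sketch, not the paper's actual method: it fits an L1-regularized logistic probe on per-response neuron activations so that only a small fraction of neurons receive nonzero weight. The data, layer width, and all variable names are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: 500 model responses x 4096 neurons from one layer,
# with binary labels marking which responses were hallucinated.
n_responses, n_neurons = 500, 4096
activations = rng.normal(size=(n_responses, n_neurons))
labels = rng.integers(0, 2, size=n_responses)

# An L1-regularized probe drives most neuron weights to zero, so the
# surviving nonzero weights mark a small candidate set of "H-Neurons".
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.05, max_iter=1000)
probe.fit(activations, labels)

h_neuron_idx = np.flatnonzero(probe.coef_[0])
print(f"{len(h_neuron_idx)} candidate H-Neurons "
      f"({len(h_neuron_idx) / n_neurons:.3%} of the layer)")
```

With real activations and labels, the nonzero-weight count would indicate how sparse a probe can be while still predicting hallucinations reliably.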
Significantly, this research bridges the gap between macro-level behavioral patterns and the microscopic neural activity within models. Because the H-Neurons were shown to arise during pre-training, understanding their origins suggests strategies for enhancing model reliability. This neuron-centric perspective not only aids in predicting hallucinations but also identifies intervention points where model behavior could be modified, promoting safer and more accurate LLM outputs.
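As an illustration of what such an intervention point might look like, the sketch below scales down a few hypothetical H-Neuron activations in one layer of a small stand-in model (GPT-2) using a forward hook. The layer, neuron indices, and damping factor are assumptions for demonstration only, not the paper's procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

h_neuron_idx = torch.tensor([11, 42, 123])  # hypothetical H-Neuron indices
damping = 0.1                                # scale factor applied to them

def dampen_h_neurons(module, inputs, output):
    # output: MLP activations of shape (batch, seq_len, intermediate_dim);
    # scale the selected neurons down and return the modified tensor.
    output[..., h_neuron_idx] *= damping
    return output

# Attach the hook to the MLP activation of one block (layer 6, arbitrarily).
hook = model.transformer.h[6].mlp.act.register_forward_hook(dampen_h_neurons)

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))

hook.remove()  # restore the original behavior
```

In practice one would compare outputs with and without the hook on prompts known to trigger hallucinations, to test whether dampening the identified neurons changes the behavior.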