RefChecker for Fine-Grained Hallucination Detection (github.com)

🤖 AI Summary
RefChecker introduces a fine-grained hallucination detection framework for the outputs of large language models (LLMs), breaking generated content down into knowledge triplets (subject-relation-object) for precise fact verification. This granular approach goes beyond traditional sentence- or paragraph-level assessment by pinpointing the truthfulness of individual factual claims. RefChecker supports three contextual settings—zero, noisy, and accurate context—reflecting common real-world use cases from open question answering to document summarization, which broadens its applicability across diverse AI tasks.

Technically, RefChecker operates via a modular, three-stage pipeline—claim extraction, hallucination checking, and result aggregation—configurable through command-line tools or APIs. It integrates with major LLMs including GPT-4, Claude 2, and LLaMA 2 through frameworks like litellm and vllm, and supports hosted model providers such as Amazon Bedrock and OpenAI. A substantial human-annotated benchmark with 2.1k LLM responses across seven models validates its effectiveness, and non-LLM-based checkers provide efficiency gains for large-scale usage.

RefChecker's released data, paper, and demo website let AI researchers and developers rigorously evaluate and improve the factual reliability of LLM-generated content, addressing a critical challenge in AI safety and trustworthiness.
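To make the three-stage pipeline concrete, here is a minimal Python sketch of triplet-based checking in the spirit of the summary above. The class names, labels, and aggregation rule are illustrative assumptions, not RefChecker's actual API; a real extractor and checker would call an LLM rather than use the toy logic shown here.

```python
# Conceptual sketch of a triplet-based checking pipeline (illustrative only;
# names and logic are assumptions, not RefChecker's actual API).
from dataclasses import dataclass
from typing import List, Optional

# Claim-level verdict labels assumed for this sketch.
ENTAILMENT, NEUTRAL, CONTRADICTION = "Entailment", "Neutral", "Contradiction"


@dataclass
class Triplet:
    """A single factual claim extracted from an LLM response."""
    subject: str
    relation: str
    obj: str


def extract_claims(response: str) -> List[Triplet]:
    """Stage 1: decompose a response into knowledge triplets.
    A real extractor would prompt an LLM; here we return a fixed example."""
    return [Triplet("RefChecker", "detects", "hallucinations")]


def check_claim(claim: Triplet, reference: Optional[str]) -> str:
    """Stage 2: label one triplet against the reference context.
    `reference` is None in the zero-context setting, retrieved passages in
    the noisy-context setting, or a grounding document in the
    accurate-context setting."""
    if reference is None:
        return NEUTRAL  # nothing to ground against in this toy checker
    claim_text = f"{claim.subject} {claim.relation} {claim.obj}"
    return ENTAILMENT if claim_text.lower() in reference.lower() else NEUTRAL


def aggregate(labels: List[str]) -> str:
    """Stage 3: roll per-claim labels up to a response-level verdict
    (a simple "any contradiction wins" rule, assumed for illustration)."""
    if CONTRADICTION in labels:
        return CONTRADICTION
    if all(label == ENTAILMENT for label in labels):
        return ENTAILMENT
    return NEUTRAL


if __name__ == "__main__":
    response = "RefChecker detects hallucinations in LLM outputs."
    reference = "RefChecker detects hallucinations at the claim level."
    claims = extract_claims(response)
    labels = [check_claim(c, reference) for c in claims]
    print(labels, "->", aggregate(labels))
```

The key design point is that verdicts are produced per triplet rather than per sentence, so a response mixing correct and incorrect facts is flagged at the level of the specific claim that fails, and the aggregation rule can be swapped out independently of extraction and checking.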