🤖 AI Summary
A new toolkit, the Hallucination Risk Calculator, offers a post-hoc calibration method that lets large language models (LLMs) estimate and control hallucination risk without retraining. Leveraging the Expectation-level Decompression Law (EDFL), it measures the information (in nats) that a prompt supplies, converts that budget into an explicit bound on hallucination risk, and makes a transparent, mathematically grounded decision to either ANSWER or REFUSE so that a user-specified reliability SLA is met. This approach enhances trustworthiness by providing explicit, interpretable risk bounds and conservative safety margins through a dual-prior strategy: worst-case priors for strict SLA adherence and average priors for realistic risk estimates.
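The summary does not spell out the bound's exact form, but a minimal sketch of the ANSWER/REFUSE gate might look like the following, assuming the SLA target h* and a skeleton-ensemble prior q are converted into a nats requirement via the binary KL divergence KL(Ber(1 − h*) ‖ Ber(q)). That form, and names such as `bits_to_trust` and `decide`, are our illustrative assumptions, not the toolkit's API:

```python
import math

def binary_kl(p: float, q: float) -> float:
    """KL(Ber(p) || Ber(q)) in nats, clipped away from 0 and 1 for numerical safety."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def bits_to_trust(h_star: float, prior_q: float) -> float:
    """Nats of lift needed to move the skeleton prior q up to reliability 1 - h_star."""
    return binary_kl(1.0 - h_star, prior_q)

def decide(delta_bar: float, skeleton_priors: list[float], h_star: float,
           conservative: bool = True) -> str:
    """ANSWER only if the measured information lift covers the Bits-to-Trust requirement.

    conservative=True uses the worst-case (smallest) skeleton prior, which demands
    the most nats and so enforces the SLA strictly; conservative=False uses the
    ensemble average for a more realistic estimate -- the dual-prior strategy.
    """
    q = min(skeleton_priors) if conservative else sum(skeleton_priors) / len(skeleton_priors)
    return "ANSWER" if delta_bar >= bits_to_trust(h_star, q) else "REFUSE"

# Example: a 5% SLA with 0.9 nats of measured lift over four skeleton priors.
print(decide(delta_bar=0.9, skeleton_priors=[0.35, 0.42, 0.50, 0.55], h_star=0.05))
```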
Technically, the system builds rolling priors from ensembles of “skeleton” prompts, generated by removing or masking key evidence and entities such as numbers and proper nouns. It operates in two modes, evidence-based (context present) and closed-book (no evidence), and requires only the OpenAI Chat Completions API rather than any model retraining. The decision to answer hinges on an Information Sufficiency Ratio (ISR) that compares the prompt's information gain against a Bits-to-Trust threshold derived from EDFL, as sketched below. Calibration options allow tuning of hallucination-rate targets, sampling stability, and masking strategies, and the method is validated on labeled datasets with statistical confidence bounds.
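To make the skeleton-ensemble and ISR steps concrete, here is a small sketch. The masking rules (a regex for numbers, capitalized tokens as a crude proper-noun proxy) and the binary-KL form of Bits-to-Trust are illustrative assumptions on our part, not the toolkit's actual masking strategy or API:

```python
import math
import re

def make_skeletons(prompt: str, n_variants: int = 6) -> list[str]:
    """Ensemble of 'skeleton' prompts: numbers are always masked, and an
    increasing share of capitalized (proper-noun-like) tokens is masked,
    so each variant carries progressively less of the original evidence."""
    masked = re.sub(r"\d+(?:[.,]\d+)*", "[NUM]", prompt)
    tokens = masked.split()
    # Skip the very first token: a crude way to ignore sentence-initial capitals.
    cap_positions = [i for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]
    skeletons = []
    for k in range(1, n_variants + 1):
        cut = round(len(cap_positions) * k / n_variants)
        hidden = set(cap_positions[:cut])
        skeletons.append(" ".join("[ENT]" if i in hidden else t
                                  for i, t in enumerate(tokens)))
    return skeletons

def information_sufficiency_ratio(delta_bar: float, prior_q: float, h_star: float) -> float:
    """ISR = measured information lift / Bits-to-Trust.

    Bits-to-Trust is taken here as KL(Ber(1 - h_star) || Ber(prior_q)) in nats,
    i.e. the lift needed to move the skeleton prior to the target reliability.
    """
    eps = 1e-12
    p = min(max(1.0 - h_star, eps), 1 - eps)
    q = min(max(prior_q, eps), 1 - eps)
    b2t = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return delta_bar / max(b2t, eps)

# Example: 1.2 nats of lift, a 0.4 skeleton prior, and a 5% hallucination target.
isr = information_sufficiency_ratio(delta_bar=1.2, prior_q=0.4, h_star=0.05)
print("ANSWER" if isr >= 1.0 else "REFUSE")
```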
This toolkit represents a significant advance in reliable LLM deployment by offering a modular, auditable framework for risk-aware generation, ultimately empowering developers to implement safer AI assistants and applications with transparent hallucination guarantees at low operational cost.