Making LLMs more accurate by using all of their layers (research.google)

🤖 AI Summary
Google Research introduced SLED (Self Logits Evolution Decoding), a decoding strategy that reduces LLM hallucinations by aligning outputs with the model’s own internal knowledge rather than relying on external retrieval or extra fine-tuning. Presented at NeurIPS 2024, SLED leverages the logits (pre-softmax scores) produced at every Transformer layer instead of using only the final layer, creating a multi-layer consensus that boosts factual accuracy across multiple-choice, free-response, and chain-of-thought benchmarks.

Technically, SLED reapplies the model’s final projection matrix to intermediate “early-exit” logits to convert them into token probability distributions, then takes a weighted average across layers to form the next-token distribution; a simplified sketch follows below. This lets earlier layers’ signals (e.g., cues that prefer “x” over “=” in a multi-step math problem) correct mistaken high-confidence choices from the final layer. The method is model-agnostic (tested on Gemma 3, GPT-OSS, Mistral), improves accuracy by up to ~16% over base decoding and prior methods like DoLa, and integrates with other factuality decoders. The trade-off is modest extra latency (roughly a few percent compared to DoLa). Code is available on GitHub, making SLED an easy plug-in for practitioners seeking better factuality without extra data or retraining.
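The sketch below illustrates only the idea the summary describes (reapplying the LM head to each layer’s hidden state to get early-exit distributions, then averaging them), not the paper’s exact algorithm. The model name `gpt2`, the uniform layer weights, and the omission of the final layer norm are illustrative assumptions; SLED’s actual layer weighting and logits-evolution step are more involved.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice (assumption); any decoder-only causal LM whose
# LM head is accessible works the same way.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

inputs = tok("2 + 3 * 4 =", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors [batch, seq, hidden].
# Reapply the final projection (LM head) to every layer's hidden state to get
# "early-exit" token distributions. (Real implementations typically also pass
# intermediate states through the model's final layer norm first.)
lm_head = model.get_output_embeddings()
layer_probs = []
for h in out.hidden_states[1:]:            # skip the embedding layer
    logits = lm_head(h[:, -1, :])          # next-token logits at this layer
    layer_probs.append(torch.softmax(logits, dim=-1))

# Weighted average across layers; uniform weights here as a stand-in for
# SLED's actual layer weighting (assumption, for illustration only).
weights = torch.full((len(layer_probs),), 1.0 / len(layer_probs))
next_token_dist = sum(w * p for w, p in zip(weights, layer_probs))

print(tok.decode(next_token_dist.argmax(dim=-1)))
```

As a usage note, the multi-layer average can be blended with the final-layer distribution (or used to nudge its logits) so that earlier layers act as a corrective signal rather than fully replacing the last layer’s prediction.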