Antislop: A framework for eliminating repetitive patterns in language models (arxiv.org)

🤖 AI Summary
Antislop is a new open-source framework that detects and removes characteristic repetitive phraseology—“slop”—that plagues many LLM outputs and makes them easy to spot as machine-generated. The authors introduce three complementary tools: the Antislop Sampler, an inference-time backtracking sampler that suppresses unwanted strings without corrupting the model’s vocabulary; an automated profiling pipeline that compares model outputs to human baselines and generates targeted training examples; and Final Token Preference Optimization (FTPO), a fine-tuning method that surgically adjusts logits for individual tokens wherever a banned pattern appears in an inference trace.

The results are substantial and practical: some slop patterns occur over 1,000× more often in LLM outputs than in human text; the Antislop Sampler can suppress 8,000+ patterns while preserving generation quality, whereas simple token banning breaks down at around 2,000 patterns; and FTPO achieves ~90% slop reduction while maintaining or improving performance on cross-domain benchmarks (GSM8K, MMLU, creative writing). Compared to alternatives like DPO, which reduced slop less effectively and hurt writing quality and lexical diversity, Antislop provides both inference-time and fine-tuning solutions that improve the human-likeness and diversity of outputs. All code and results are released under an MIT license, enabling immediate adoption and further research.
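To make the backtracking idea concrete, here is a minimal sketch (not the authors' implementation) of inference-time suppression of banned strings: whenever a generation ends in a banned phrase, the sampler rewinds to the position where the phrase began, forbids the token that started it there, and resamples. The toy "model" below just picks words from a small vocabulary; with a real LLM you would resample from the model's distribution with the offending continuation masked out. All names (`BANNED`, `sample_token`, etc.) are illustrative, not from the paper's codebase.

```python
import random

BANNED = {"tapestry of", "delve into"}   # example "slop" phrases to suppress
VOCAB = ["we", "delve", "into", "a", "tapestry", "of", "ideas", "explore", "examine"]

def sample_token(prefix, blocked):
    """Toy sampler: pick any vocabulary word not already rejected at this position."""
    choices = [w for w in VOCAB if w not in blocked]
    return random.choice(choices)

def ends_with_banned(tokens):
    """Return how many tokens to rewind if the text currently ends in a banned phrase."""
    text = " ".join(tokens)
    for phrase in BANNED:
        if text.endswith(phrase):
            return len(phrase.split())
    return 0

def generate(n_tokens=12, seed=0):
    random.seed(seed)
    tokens = []
    blocked_at = {}                      # position -> set of tokens rejected at that position
    while len(tokens) < n_tokens:
        pos = len(tokens)
        tokens.append(sample_token(tokens, blocked_at.get(pos, set())))
        rewind = ends_with_banned(tokens)
        if rewind:
            # Backtrack to where the banned phrase began, forbid the token that
            # started it at that position, and continue generating from there.
            start = len(tokens) - rewind
            blocked_at = {p: s for p, s in blocked_at.items() if p <= start}
            blocked_at.setdefault(start, set()).add(tokens[start])
            tokens = tokens[:start]
    return " ".join(tokens)

print(generate())
```

The key property this sketch illustrates is that suppression happens at the string level rather than by permanently banning tokens, so individual vocabulary items remain usable in contexts where they do not form a banned pattern.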