🤖 AI Summary
Antislop is a new open-source framework that detects and removes the characteristic repetitive phraseology ("slop") that plagues many LLM outputs and makes them easy to spot as machine-generated. The authors introduce three complementary tools: the Antislop Sampler, an inference-time backtracking sampler that suppresses unwanted strings without corrupting the model's vocabulary; an automated profiling pipeline that compares model outputs against human baselines and generates targeted training examples; and Final Token Preference Optimization (FTPO), a fine-tuning method that surgically adjusts the logits of individual tokens wherever a banned pattern appears in an inference trace.
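To make the backtracking idea concrete, here is a minimal Python sketch of the mechanism as described above: when the decoded text ends with a banned string, the sampler rewinds to the token where that string began, blocks that token at that position, and resamples. The `sample_next` and `decode` callables, and all names here, are stand-ins for a real model and tokenizer, not the actual Antislop API.

```python
from typing import Callable

def backtracking_sample(
    sample_next: Callable[[list[int], set[int]], int],  # (context ids, banned ids at this step) -> next token id
    decode: Callable[[list[int]], str],                 # token ids -> text
    prompt_ids: list[int],
    banned_strings: list[str],
    max_new_tokens: int = 256,
) -> list[int]:
    tokens = list(prompt_ids)
    blocked: dict[int, set[int]] = {}  # position -> token ids forbidden after a backtrack

    while len(tokens) - len(prompt_ids) < max_new_tokens:
        pos = len(tokens)
        tokens.append(sample_next(tokens, blocked.get(pos, set())))

        text = decode(tokens[len(prompt_ids):])
        hit = next((s for s in banned_strings if text.endswith(s)), None)
        if hit is None:
            continue

        # Character offset where the banned string begins in the generation.
        match_start = len(text) - len(hit)

        # Walk forward to the first token whose decoded span overlaps the
        # match (assumes decoding a prefix of the tokens yields a prefix of
        # the text, which holds for typical BPE tokenizers).
        start = len(prompt_ids)
        while len(decode(tokens[len(prompt_ids):start + 1])) <= match_start:
            start += 1

        # Forbid the token that began the pattern at that position, discard
        # everything from there on, and let the loop resample. A production
        # sampler would also cap retries and handle exhausted alternatives.
        blocked.setdefault(start, set()).add(tokens[start])
        del tokens[start:]

    return tokens
```

Because each ban applies only at the position where a pattern began, rather than vocabulary-wide, the individual tokens remain usable in contexts where they are not part of a slop pattern, which is why this approach avoids corrupting the model's vocabulary.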
The reported results are substantial and practical: some slop patterns occur over 1,000× more often in LLM outputs than in human text; the Antislop Sampler can suppress more than 8,000 patterns while preserving generation quality, whereas simple token banning fails beyond roughly 2,000 patterns; and FTPO achieves ~90% slop reduction while maintaining or improving performance on cross-domain benchmarks (GSM8K, MMLU, creative writing). DPO, by comparison, reduced slop less effectively and hurt both writing quality and lexical diversity, while Antislop provides inference-time and fine-tuning solutions that improve the human-likeness and diversity of outputs. All code and results are released under an MIT license, enabling immediate adoption and further research.
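The summary says FTPO "surgically adjusts the logits of individual tokens"; one plausible shape for such an objective, sketched here purely as an illustration (the paper's actual loss may differ), is a per-position margin term that pushes a banned token's logit below a preferred alternative's, plus an anchor that pins all other logits to a frozen reference model so the edit stays local. All function and parameter names below are assumptions.

```python
import torch
import torch.nn.functional as F

def ftpo_style_loss(
    logits: torch.Tensor,      # (vocab,) policy logits at the flagged position
    ref_logits: torch.Tensor,  # (vocab,) frozen reference logits, same position
    banned_id: int,            # token that began the slop pattern
    preferred_id: int,         # alternative continuation for that position
    margin: float = 1.0,
    anchor_weight: float = 0.1,
) -> torch.Tensor:
    # Margin term: the preferred token should out-score the banned one.
    pref_term = F.relu(margin - (logits[preferred_id] - logits[banned_id]))

    # Anchor term: keep every *other* logit near the reference model's,
    # so the adjustment stays local to the two tokens involved.
    mask = torch.ones_like(logits, dtype=torch.bool)
    mask[banned_id] = mask[preferred_id] = False
    anchor = F.mse_loss(logits[mask], ref_logits[mask])

    return pref_term + anchor_weight * anchor
```

Localizing the update in this way would explain the reported contrast with DPO: a sequence-level preference objective shifts the whole distribution and can erode lexical diversity, while a per-token edit leaves unrelated behavior intact.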