LLMc: Beating All Compression with LLMs (syfi.cs.washington.edu)

🤖 AI Summary
Thinking Machines Lab’s recent post on eliminating nondeterminism in LLM inference has enabled a practical surprise: LLMc, an open-source compressor that uses a large language model itself as the shared compression/decompression reference. Built by Yi Pan and colleagues, LLMc exploits the fundamental link between language modeling and source coding—the optimal code length for a symbol is proportional to its negative log-likelihood—so a high-quality autoregressive LLM can serve as a high-capacity probabilistic reference. Because Thinking Machines’ batch-invariant kernels make inference deterministic, decompression can replay the same model exactly to recover the original text.

Technically, LLMc encodes not tokens but their ranks in the model’s next-token probability distribution. In most contexts the true next token ranks among the top few candidates, so storing small integer ranks is far more compact than storing raw tokens; decompression replays the model with identical context and applies the stored ranks to reconstruct the text losslessly.

Benchmarks show LLMc outperforming traditional compressors (ZIP, LZMA) on Wikipedia, narrative, and scientific-abstract datasets, and competing with closed-source systems. Key limitations: inference is memory- and compute-bound (attention cost grows quadratically with sequence length), so LLMc chunks text to improve GPU utilization; throughput still lags conventional compressors; and numerical stability requires batch-invariant kernels and integer rank encoding. The implementation currently targets natural language; broader modalities are future work, and contributions are welcome.
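The rank-coding scheme described above is easy to prototype. The sketch below is a minimal illustration, not LLMc’s actual implementation: it assumes a Hugging Face causal LM (the model name, the lack of chunking/KV caching, and the rank list as the "compressed" output are all illustrative choices). Each token is encoded as its rank under the model’s next-token distribution, and decoding replays the same model with identical context.

```python
# Minimal sketch of rank-based compression with an autoregressive LM.
# Assumptions: any causal LM as the shared reference ("gpt2" here),
# no chunking or KV caching, ranks kept as a plain Python list.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def compress(text: str) -> tuple[int, list[int]]:
    """Encode each token as its rank in the model's next-token distribution."""
    ids = tok(text, return_tensors="pt").input_ids[0]
    ranks = []
    for i in range(1, len(ids)):
        logits = model(ids[:i].unsqueeze(0)).logits[0, -1]
        order = torch.argsort(logits, descending=True)        # most-likely token first
        rank = (order == ids[i]).nonzero(as_tuple=True)[0].item()
        ranks.append(rank)                                     # small integer when the model predicts well
    return ids[0].item(), ranks

@torch.no_grad()
def decompress(first_id: int, ranks: list[int]) -> str:
    """Replay the same model with identical context and apply the stored ranks."""
    ids = torch.tensor([first_id])
    for rank in ranks:
        logits = model(ids.unsqueeze(0)).logits[0, -1]
        order = torch.argsort(logits, descending=True)
        ids = torch.cat([ids, order[rank:rank + 1]])           # pick the token at the stored rank
    return tok.decode(ids)

text = "Language modeling and source coding are two sides of the same coin."
first_id, ranks = compress(text)
restored = decompress(first_id, ranks)
# Lossless round-trip, given deterministic inference on the same device;
# bit-exactness across batch sizes is what the batch-invariant kernels provide.
assert restored == tok.decode(tok(text, return_tensors="pt").input_ids[0])
```

In a real compressor the ranks would still be entropy-coded (they are small but far from uniform), and the quadratic cost of recomputing the full context each step is what motivates LLMc’s chunking and batching.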