🤖 AI Summary
A new proof-of-concept codec called **subtext-codec** embeds arbitrary binary data within ordinary-looking text generated by large language models (LLMs). The approach steers the model's next-token decisions using the rank of each token in the logit distribution: at every step the logits are sorted, a cumulative probability threshold cuts off a candidate list, and the token at the rank dictated by the payload is selected. The size of that candidate list is the step's active base, which adjusts with the model's confidence. The encoding is fully reversible, so the output reads naturally while carrying the hidden bytes.
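A minimal sketch of one encoding step under those assumptions: sort the logits, keep the smallest prefix whose cumulative probability reaches a threshold (that prefix length is the step's active base), and emit the token whose rank equals the next payload digit. The names here (`PayloadDigits`, `choose_token`, the `top_p` parameter) are illustrative, not subtext-codec's actual API.

```python
import numpy as np

class PayloadDigits:
    """Emit payload bytes as digits in whatever base each step asks for.

    Mixed-radix view (an assumption about the scheme): the payload is treated
    as one big integer and peeled off digit by digit, using the active base
    the model's confidence allows at each step.
    """
    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")

    def next_digit(self, base: int) -> int:
        self.value, digit = divmod(self.value, base)
        return digit

def choose_token(logits: np.ndarray, payload: PayloadDigits, top_p: float = 0.9) -> int:
    """One encoding step: sort logits, cut the candidate list at the
    cumulative-probability threshold, pick the token at the payload's rank."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)                      # token ids, best first
    cumulative = np.cumsum(probs[order])
    base = int(np.searchsorted(cumulative, top_p)) + 1  # active base this step
    if base < 2:                                    # model too confident:
        return int(order[0])                        # nothing can be hidden here
    digit = payload.next_digit(base)                # payload digit in [0, base)
    return int(order[digit])
```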
The significance of subtext-codec lies in its potential applications in covert communication and data storage within generated text. By combining deterministic next-token steering with mixed-radix payload reconstruction (sketched below), it strikes a balance between readability and data concealment. Although designed for experimentation and demonstration rather than production use, the codec offers a compact implementation compatible with popular libraries such as Hugging Face Transformers. As the AI/ML community explores steganography in language models, subtext-codec is a notable step toward merging fluent text generation with hidden data encoding.
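The decoding side of that sketch would reproduce the same logits by running the model deterministically over the stego text, recover each emitted token's rank within the candidate set, and fold the resulting (digit, base) pairs back into bytes. The function names and the assumption that the payload length is known out of band are hypothetical, not drawn from the codec itself.

```python
import numpy as np

def recover_digit(logits: np.ndarray, chosen_id: int, top_p: float = 0.9):
    """Inverse of choose_token: given the same logits and the token that was
    actually emitted, recover that step's (digit, base) pair, or None if the
    model was too confident for a digit to have been hidden."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    base = int(np.searchsorted(cumulative, top_p)) + 1
    if base < 2:
        return None
    digit = int(np.where(order == chosen_id)[0][0])
    return digit, base

def rebuild_payload(pairs, n_bytes: int) -> bytes:
    """Mixed-radix reconstruction: fold per-step (digit, base) pairs back into
    the original integer, later digits being more significant, then to bytes.
    Assumes the payload length n_bytes is known out of band."""
    value = 0
    for digit, base in reversed(pairs):
        value = value * base + digit
    return value.to_bytes(n_bytes, "big")
```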