🤖 AI Summary
Chinese AI startup DeepSeek released an OCR model notable not just for extracting text from images, but for experimenting with a new way to store and retrieve information: instead of breaking text into thousands of small text tokens, it packs written content into image-shaped "visual tokens" and applies tiered compression, storing older or less important content at lower resolution. The system reportedly matches state-of-the-art results on OCR benchmarks while using far fewer tokens, can generate over 200,000 pages of training data per day on a single GPU, and is positioned as a testbed for improving AI memory and long-context behavior.
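To make the tiered-compression idea concrete, here is a minimal Python sketch of the concept, not DeepSeek's actual pipeline: each context chunk is rendered to an image, older chunks are downscaled more aggressively, and the token cost is approximated by the number of vision patches covering the image. The patch size, the halve-per-tier policy, and the helper names (`render_chunk`, `compress_tier`, `visual_token_count`) are all illustrative assumptions.

```python
from PIL import Image, ImageDraw

PATCH = 16  # assumed vision-encoder patch size; one patch ~ one visual token


def render_chunk(text: str, width: int = 512, line_height: int = 14) -> Image.Image:
    """Render plain text onto a white canvas (a stand-in for a page image)."""
    lines = text.splitlines() or [text]
    img = Image.new("L", (width, line_height * len(lines) + 8), color=255)
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((4, 4 + i * line_height), line, fill=0)
    return img


def compress_tier(img: Image.Image, age: int) -> Image.Image:
    """Older chunks get blurrier: halve resolution per age tier (assumed policy)."""
    factor = 2 ** min(age, 3)  # cap the tier so very old text is not erased entirely
    return img.resize((max(1, img.width // factor), max(1, img.height // factor)))


def visual_token_count(img: Image.Image) -> int:
    """Approximate token cost as the number of patches covering the image."""
    return -(-img.width // PATCH) * -(-img.height // PATCH)  # ceiling division


# Newest chunk first; each step back in the history costs one tier of resolution.
history = ["newest turn ...", "older turn ...", "oldest turn ..."]
for age, chunk in enumerate(history):
    img = compress_tier(render_chunk(chunk), age)
    print(f"age={age} size={img.size} ~{visual_token_count(img)} visual tokens")
```

The intuition the sketch captures is that token cost scales with pixel area, so halving resolution quarters the number of patches: memory grows cheaper to keep the blurrier it is allowed to become.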
The significance for AI/ML is twofold: visual tokens could dramatically reduce the compute and storage costs of maintaining long conversational histories, helping to mitigate "context rot" and the growing carbon footprint of large models, and they could open new avenues for continuous-agent design and data generation. Researchers, including Andrej Karpathy and academics at Northwestern, praise the idea but caution that it is early work: current methods still tend to recall recent items linearly rather than prioritizing by importance, and more research is needed to extend visual tokens to reasoning and to dynamic, human-like forgetting. If validated, this shift from text-based to image-based context could alter how models scale memory and enable more efficient long-term agents.