DeepSeek-OCR Compression Meets Energy Search (www.tuned.org.uk)

🤖 AI Summary
ArrowSpace v0.18.0 integrates a Rust reimplementation of DeepSeek-OCR’s optical-compression primitive (built with burn.dev) and pairs it with a novel energy-based retrieval model that moves beyond cosine similarity. The optical pipeline treats rendered text as images and compresses them into compact vision tokens using two vision encoders (SAM-base, ~80M parameters, for hierarchical window attention and 16× convolutional compression; CLIP-large, ~300M parameters, for global semantics) plus an MLP projector. The pipeline yields roughly 10× text-to-token compression (e.g., ~1,000 words → ~100 vision tokens; SAM produces 256 tokens from 1024×1024 inputs) and supports five resolution modes to tune token budgets; the token-budget arithmetic is sketched below.

The implementation includes spatial binning, low-activation pooling, and information-theoretic metrics (Shannon entropy, word-to-token ratios) to validate semantic preservation, reporting ~97% OCR decoding precision at 10× compression.

Crucially, ArrowSpace replaces geometric cosine ranking with an energy-distance score combining Rayleigh quotients (λ), Gini dispersion, and bounded Dirichlet energy, so that ranking respects spectral and topological manifold structure; a minimal sketch of such a score follows the token-budget example. The v0.18.0 energymaps pipeline builds a bootstrap kNN Laplacian, diffuses node features via heat flow (illustrated in the last sketch below), splits high-dispersion nodes, and weights edges by energy distance; search results are then ranked by weighted energy terms rather than cosine similarity. Experiments on a CVE corpus (1,681 items) show optimal diffusion hyperparameters at low η and moderate step counts (e.g., η = 0.05, 4–6 steps), producing strong MRR and NDCG and suggesting that energy-aware indices can better capture dataset topology for retrieval at long contexts and high compression.
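The token budgets above can be checked with simple arithmetic. The Rust sketch below (Rust to match the ArrowSpace codebase) assumes SAM-base's standard 16-pixel ViT patch size, which the summary does not state explicitly; the 16× convolutional compression, the 256-token output, and the word-to-token and Shannon-entropy checks come from the summary itself.

```rust
// Token-budget arithmetic for the optical pipeline. The 16-pixel ViT patch
// size for SAM-base is an assumption (only the 16x conv compression and the
// 256-token output are stated); the rest follows from the numbers above.
fn sam_vision_tokens(image_side: usize, patch_px: usize, conv_compression: usize) -> usize {
    let patches_per_side = image_side / patch_px;            // 1024 / 16 = 64
    let patch_tokens = patches_per_side * patches_per_side;  // 64 * 64 = 4096
    patch_tokens / conv_compression                          // 4096 / 16 = 256
}

/// Word-to-token ratio, the compression metric behind "~1,000 words -> ~100 tokens".
fn word_to_token_ratio(words: f64, vision_tokens: f64) -> f64 {
    words / vision_tokens
}

/// Shannon entropy (bits) of an empirical token-count distribution, one of the
/// information-theoretic checks used to validate semantic preservation.
fn shannon_entropy(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

fn main() {
    assert_eq!(sam_vision_tokens(1024, 16, 16), 256);
    println!("compression: {:.0}x", word_to_token_ratio(1000.0, 100.0));
    println!("entropy: {:.2} bits", shannon_entropy(&[3, 1, 1, 1]));
}
```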
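The energy-distance ranking can be illustrated from its three named ingredients. The following sketch computes a Rayleigh quotient xᵀLx / xᵀx, a Gini coefficient over absolute activations, and a Dirichlet energy bounded via e/(1+e), then combines them into a distance. The bounding choice, the weights, and the combination are assumptions for illustration only; the summary does not give ArrowSpace's actual formula.

```rust
// Illustrative energy-distance score: Rayleigh quotient (lambda), Gini
// dispersion, and bounded Dirichlet energy. Weights and the e / (1 + e)
// bounding are assumptions, not ArrowSpace's actual formula.

/// Apply a dense graph Laplacian to a feature vector: y = L x.
fn laplacian_apply(laplacian: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    laplacian
        .iter()
        .map(|row| row.iter().zip(x).map(|(l, xi)| l * xi).sum())
        .collect()
}

/// Rayleigh quotient lambda(x) = x^T L x / x^T x.
fn rayleigh_quotient(laplacian: &[Vec<f64>], x: &[f64]) -> f64 {
    let lx = laplacian_apply(laplacian, x);
    let num: f64 = x.iter().zip(&lx).map(|(a, b)| a * b).sum();
    let den: f64 = x.iter().map(|a| a * a).sum();
    if den > 0.0 { num / den } else { 0.0 }
}

/// Dirichlet energy x^T L x, squashed into [0, 1) with e / (1 + e).
fn bounded_dirichlet_energy(laplacian: &[Vec<f64>], x: &[f64]) -> f64 {
    let lx = laplacian_apply(laplacian, x);
    let e: f64 = x.iter().zip(&lx).map(|(a, b)| a * b).sum();
    e / (1.0 + e)
}

/// Gini coefficient of absolute activations: 0 = uniform, near 1 = concentrated.
fn gini_dispersion(x: &[f64]) -> f64 {
    let mut v: Vec<f64> = x.iter().map(|a| a.abs()).collect();
    v.sort_by(|a, b| a.total_cmp(b));
    let n = v.len() as f64;
    let sum: f64 = v.iter().sum();
    if sum == 0.0 {
        return 0.0;
    }
    let weighted: f64 = v.iter().enumerate().map(|(i, a)| (i as f64 + 1.0) * a).sum();
    (2.0 * weighted) / (n * sum) - (n + 1.0) / n
}

/// Energy distance: weighted absolute differences of the three terms (lower = closer).
fn energy_distance(laplacian: &[Vec<f64>], query: &[f64], item: &[f64]) -> f64 {
    let (alpha, beta, gamma) = (1.0, 0.5, 0.5); // assumed weights
    alpha * (rayleigh_quotient(laplacian, query) - rayleigh_quotient(laplacian, item)).abs()
        + beta * (gini_dispersion(query) - gini_dispersion(item)).abs()
        + gamma
            * (bounded_dirichlet_energy(laplacian, query)
                - bounded_dirichlet_energy(laplacian, item))
            .abs()
}

fn main() {
    // Tiny 3-node path-graph Laplacian, purely for demonstration.
    let laplacian = vec![
        vec![1.0, -1.0, 0.0],
        vec![-1.0, 2.0, -1.0],
        vec![0.0, -1.0, 1.0],
    ];
    let query = vec![0.9, 0.1, 0.0]; // spiky, high-energy query
    let spiky = vec![1.0, 0.0, 0.0];
    let smooth = vec![0.5, 0.5, 0.5]; // constant vectors have zero Dirichlet energy
    println!("d(query, spiky)  = {:.3}", energy_distance(&laplacian, &query, &spiky));
    println!("d(query, smooth) = {:.3}", energy_distance(&laplacian, &query, &smooth));
}
```

In this toy example the spiky query scores closer to the spiky item than to the smooth one, which is the kind of spectral sensitivity cosine similarity alone does not provide.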
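The heat-flow diffusion step of the energymaps pipeline can be sketched as repeated explicit Euler updates x ← x − ηLx on the graph Laplacian. The kNN-graph construction, high-dispersion node splitting, and energy-based edge re-weighting are omitted here; η = 0.05 and 5 steps mirror the hyperparameter region reported for the CVE experiments.

```rust
// Heat-flow diffusion as repeated explicit Euler steps x <- x - eta * L x on a
// graph Laplacian. Graph construction and edge re-weighting are omitted;
// eta = 0.05 and 5 steps mirror the reported optimum (eta = 0.05, 4-6 steps).

/// Apply a dense graph Laplacian to a feature vector: y = L x.
fn laplacian_apply(laplacian: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    laplacian
        .iter()
        .map(|row| row.iter().zip(x).map(|(l, xi)| l * xi).sum())
        .collect()
}

/// Diffuse one feature channel over the graph: x_{t+1} = x_t - eta * L x_t.
fn heat_diffuse(laplacian: &[Vec<f64>], features: &mut [f64], eta: f64, steps: usize) {
    for _ in 0..steps {
        let lx = laplacian_apply(laplacian, features);
        for (f, d) in features.iter_mut().zip(&lx) {
            *f -= eta * d;
        }
    }
}

fn main() {
    // Path graph on three nodes: degrees on the diagonal, -1 for each edge.
    let laplacian = vec![
        vec![1.0, -1.0, 0.0],
        vec![-1.0, 2.0, -1.0],
        vec![0.0, -1.0, 1.0],
    ];
    let mut feature = vec![1.0, 0.0, 0.0]; // a spike that heat flow smooths out
    heat_diffuse(&laplacian, &mut feature, 0.05, 5);
    println!("{:?}", feature); // values move toward each other; the total is preserved
}
```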