Show HN: Spectral Indexing, from concept to paper to alpha in 45 days (www.tuned.org.uk)

🤖 AI Summary
ArrowSpace v0.21.0 ships a working proof‑of‑concept for spectral, energy‑informed vector search that closes the loop from the original paper: index build → spectral/energy construction → λτ ranking → lightweight query-time retrieval. The release formalizes two build paths and operationalizes the bounded transform E/(E+τ) for ranking. The eigenmaps path runs Laplacian → Rayleigh energy → per‑row λτ; the energymaps path runs optical compression → diffusion + subcentroid splitting → JL projection → energy Laplacian → τ‑mode λτ synthesis.

For the AI/ML community, the release demonstrates a practical, manifold-aware alternative to cosine‑only retrieval. It keeps query cost bounded, since only per‑item λ scalars, norms, and a small reduced Laplacian need to be stored; it enables hierarchical search units via diffusion‑split subcentroids; and it offers integrated dataset analysis for drift and slice comparisons.

Key technical details: search_energy computes a query λ via subcentroid mapping and ranks by an adaptive energy distance that blends λ proximity with bounded Dirichlet/L2 terms, using cosine only as a tie‑breaker. The energymaps pipeline uses optical 2D compression (DeepSeek‑OCR inspired), diffusion smoothing controlled by η (eta) and a step count, and JL (Johnson-Lindenstrauss) projection to preserve energy scoring in the reduced space.

Benchmarks on a 300K×384 CVE corpus show build times of ~75–83s and peak MRR≈0.75 (NDCG@10≈0.72) with η≈0.22–0.5 and 4–8 steps; η=0.5 with 4 steps is a pragmatic speed/accuracy sweet spot. Planned work includes auto‑computing build parameters and Parquet querying to enable datalake‑scale deployment.
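To make the Rayleigh energy and the bounded transform E/(E+τ) concrete, here is a minimal NumPy sketch. It is not ArrowSpace's implementation: the helper names (`graph_laplacian`, `rayleigh_energy`, `bounded_energy`), the toy 3-node path graph, and the choice τ = 1.0 are all assumptions made for illustration, and the summary does not specify how the project builds its Laplacian or derives per‑row λτ from these quantities.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W for a symmetric, non-negative weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def rayleigh_energy(L, x):
    """Rayleigh quotient x^T L x / x^T x; returns 0.0 for the zero vector."""
    x = np.asarray(x, dtype=float)
    denom = x @ x
    return float(x @ L @ x / denom) if denom > 0 else 0.0

def bounded_energy(E, tau):
    """Bounded transform E / (E + tau): maps E >= 0 into [0, 1) when tau > 0."""
    return E / (E + tau)

# Hypothetical toy graph: a path 0 - 1 - 2 over three feature dimensions.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
L = graph_laplacian(W)

smooth = np.array([1.0, 1.0, 1.0])   # constant along the graph: zero energy
rough = np.array([1.0, -1.0, 1.0])   # flips sign on every edge: high energy

tau = 1.0  # assumed value, chosen only for this example
e_smooth = rayleigh_energy(L, smooth)
e_rough = rayleigh_energy(L, rough)
s_smooth = bounded_energy(e_smooth, tau)
s_rough = bounded_energy(e_rough, tau)

print(f"smooth: E={e_smooth:.3f}, bounded={s_smooth:.3f}")
print(f"rough:  E={e_rough:.3f}, bounded={s_rough:.3f}")
```

In this toy case the constant vector has energy 0 and the alternating vector has energy 8/3, which the transform maps to 8/11. Because the transform is monotone in E and bounded above by 1, it preserves the energy ordering while keeping a single scalar per item, which fits the summary's point that only per‑item scalars need to be stored for ranking.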