Show HN: Searchable compression for JSON (p50≈0.18 ms; 10-min demo) (github.com)

0 points | 5 hours ago

🤖 AI Summary

SEE is a schema-aware, searchable JSON compressor that keeps data compressed yet directly queryable. The demo reports a combined size of ≈19.5% of raw JSON, sub-millisecond lookup latency (p50 ≈ 0.18 ms, p95 ≈ 0.28 ms, p99 ≈ 0.34 ms), and a ~99% skip rate from Bloom filters and skip indices. The pipeline combines structure-aware encoding, delta compression, Zstd, and page-level random access, so you can test for a value's existence or position without full decompression or JSON parsing.

Why it matters: for AI/ML teams that ingest large NDJSON logs, telemetry, or feature-event streams, I/O and JSON parsing are major cost and latency drivers. SEE accepts a modest size increase over plain Zstd in exchange for searchable compressed data, drastically reducing the egress, storage, and CPU "tax" and improving TCO for I/O- and CPU-bound workloads.

Key technical implications: schema awareness plus delta encoding preserve the repetitiveness that compression exploits, a Bloom density of ≈0.30 keeps false positives low, and page granularity enables random access and sub-millisecond lookups.

Demo materials (ZIP, wheel, quick_demo.py) reproduce the KPIs in about 10 minutes; the release also includes a OnePager and checksums. Best fit: repetitive JSON/NDJSON (logs, events, telemetry, metrics).
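To make the page-skipping idea in the summary concrete, here is a minimal sketch of per-page Bloom filters sitting in front of independently compressed pages. It is not SEE's actual format or API: the names `Bloom`, `build_pages`, and `lookup` are hypothetical, the filter size and hash count are arbitrary, and the standard library's `zlib` stands in for Zstd so the example runs without extra dependencies.

```python
import hashlib
import json
import zlib


class Bloom:
    """Small Bloom filter; bit positions come from double hashing a SHA-256 digest."""

    def __init__(self, n_bits=1024, n_hashes=4):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, token):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step for double hashing
        return [(h1 + i * h2) % self.n_bits for i in range(self.n_hashes)]

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(token))


def build_pages(records, page_size=4):
    """Split records into pages; each page gets its own filter and compressed blob."""
    pages = []
    for start in range(0, len(records), page_size):
        chunk = records[start:start + page_size]
        bloom = Bloom()
        for rec in chunk:
            for key, value in rec.items():
                bloom.add(f"{key}={value}")
        payload = "\n".join(json.dumps(rec, sort_keys=True) for rec in chunk)
        pages.append((bloom, zlib.compress(payload.encode("utf-8"))))
    return pages


def lookup(pages, key, value):
    """Return matching records and the number of pages that had to be decompressed."""
    token = f"{key}={value}"
    hits, opened = [], 0
    for bloom, blob in pages:
        if not bloom.might_contain(token):
            continue  # page skipped: no decompression, no JSON parsing
        opened += 1
        for line in zlib.decompress(blob).decode("utf-8").splitlines():
            rec = json.loads(line)
            if rec.get(key) == value:  # exact check removes Bloom false positives
                hits.append(rec)
    return hits, opened


# Illustrative data only; these records are made up for the sketch.
records = [{"user": f"u{i}", "status": 200 if i % 7 else 500, "seq": i} for i in range(40)]
pages = build_pages(records)
hits, opened = lookup(pages, "user", "u17")
print(hits, f"decompressed {opened} of {len(pages)} pages")
```

A lookup only pays for decompression and parsing on pages whose filter reports a possible match, which is the general mechanism behind a high skip rate. A real implementation would presumably also avoid re-parsing JSON inside a page, as the summary describes; this sketch does not attempt that.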
