Sick: Indexed deduplicated binary storage for JSON-like data structures (github.com)

🤖 AI Summary
SICK is an encoding format and library that converts JSON-like data into an indexed, deduplicated binary representation, so callers can read individual values on demand without parsing whole documents. It flattens structures into typed tables and links them with fixed-size "reference" pairs (type, index); variable-length lists use a compact encoding (length + offsets + concatenated payloads). This layout gives random access and supports incremental updates (including remove messages), multi-file deduplication, circular references, and custom scalar or polymorphic types. It also makes "perfect" streaming parsers possible in principle, although none is implemented yet. The design targets a fundamental limitation of JSON: its Type-2 (context-free) grammar requires a pushdown parser, and deeply nested data must be fully accumulated before it can be used. SICK instead makes every value directly addressable, and entries can be reordered as long as the stream contains no removal messages.

Technically, SICK stores every unique value once in a table (strings, objects, arrays, roots) and uses one-byte type markers for ~15 core types (null, booleans, ints, floats, strings, arrays, objects, root, etc.). References are fixed-size binary pairs, arrays and other variable-length lists use offset tables for indexing, and native types such as timestamps can be added via additional type tags. Implementations currently exist in Scala and C# (encoders/decoders only), plus a Scala.js-backed JavaScript port; streaming encoders/decoders are not yet implemented. Tradeoffs include a more complex encoder and some current limits (at most 65,534 keys per object; 2^32 array elements or unique values). The format is described as battle-tested in proprietary apps, and the authors invite third-party implementations and contributions.
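The two core ideas in the summary, deduplicated typed tables addressed by (type, index) references and the length + offsets + payloads list layout, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag values, the `Tables` class, `pack_list`, `read_item`, and the big-endian 32-bit field widths are all hypothetical and do not reflect SICK's actual wire format or library API.

```python
import struct

# Hypothetical type tags for illustration; NOT SICK's real marker values.
T_NULL, T_BOOL, T_INT, T_STR, T_ARR, T_OBJ = range(6)


class Tables:
    """One table per type; each unique value is stored exactly once."""

    def __init__(self):
        self.rows = {T_INT: [], T_STR: [], T_ARR: [], T_OBJ: []}
        self.index = {tag: {} for tag in self.rows}

    def _intern(self, tag, value):
        seen = self.index[tag]
        if value not in seen:          # first occurrence: append a new row
            seen[value] = len(self.rows[tag])
            self.rows[tag].append(value)
        return (tag, seen[value])      # the fixed-shape reference pair

    def add(self, value):
        """Flatten a JSON-like value and return a (type, index) reference."""
        if value is None:
            return (T_NULL, 0)
        if isinstance(value, bool):    # must be checked before int
            return (T_BOOL, int(value))
        if isinstance(value, int):
            return self._intern(T_INT, value)
        if isinstance(value, str):
            return self._intern(T_STR, value)
        if isinstance(value, list):
            return self._intern(T_ARR, tuple(self.add(v) for v in value))
        if isinstance(value, dict):
            entries = tuple(
                (self.add(k)[1], self.add(v)) for k, v in sorted(value.items())
            )
            return self._intern(T_OBJ, entries)
        raise TypeError(f"unsupported value: {value!r}")

    def get(self, ref):
        """Resolve one reference without touching the rest of the data."""
        tag, idx = ref
        if tag == T_NULL:
            return None
        if tag == T_BOOL:
            return bool(idx)
        return self.rows[tag][idx]


def pack_list(payloads):
    """Encode byte strings as: count, one offset per item, then payloads."""
    offsets, pos = [], 0
    for p in payloads:
        offsets.append(pos)
        pos += len(p)
    header = struct.pack(f">I{len(offsets)}I", len(payloads), *offsets)
    return header + b"".join(payloads)


def read_item(buf, i):
    """Fetch item i by reading only the count and two offsets."""
    (count,) = struct.unpack_from(">I", buf, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    data_start = 4 + 4 * count
    (start,) = struct.unpack_from(">I", buf, 4 + 4 * i)
    if i + 1 < count:
        (end,) = struct.unpack_from(">I", buf, 4 + 4 * (i + 1))
    else:
        end = len(buf) - data_start
    return buf[data_start + start:data_start + end]
```

In this sketch, adding `{"a": [1, 2, "x"], "b": [1, 2, "x"], "c": "x"}` stores the repeated array and the string `"x"` once each, and both `"a"` and `"b"` hold the same array reference. `read_item` shows why the offset table matters: any element can be located without scanning the elements before it.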