Token-Oriented Object Notation (Toon) (github.com)

0 points 1 day ago ago | visit original

🤖 AI Summary

Token-Oriented Object Notation (TOON) is a compact, human-readable serialization format designed to pass structured data into large language models while cutting token usage. It blends YAML-style indentation for nesting with CSV-like tabular rows: you declare a uniform object schema once (e.g., users[2]{id,name,role}:) and then stream rows, removing repeated keys, braces and most quotes. TOON is lossless and intended as a drop-in LLM input representation for JSON data — best for uniform arrays of objects where repeated field names dominate token cost; for deeply nested or highly irregular data JSON may be more efficient. Benchmarks show the practical tradeoffs: TOON typically reduces token counts by ~30–60% vs pretty JSON (benchmarks measured with the GPT-5 o200k_base tokenizer), and on the reported datasets it used 46.3% fewer tokens while improving overall retrieval accuracy (70.1% vs JSON’s 65.4%). The format also adds LLM-friendly guardrails — explicit field lists and lengths that aid parsing and validation — and ships with a spec, CLI and API for conversion (recommended workflow: keep JSON programmatically, convert to TOON for prompts). The takeaway for ML engineers: when you’re sending many uniform tabular payloads to LLMs and token cost or prompt clarity matters, TOON can meaningfully lower cost and often improve model retrieval accuracy; evaluate on your data shape before adopting.

Loading comments...

loading comments...