🤖 AI Summary
Last week the team released Chandra, a new OCR model that tops the independent olmocr benchmark and is built around full-page, layout-aware decoding rather than the blockwise pipeline used by earlier models (Marker/Surya). That architectural shift lets Chandra identify and treat page elements (text, images, tables, figures, checkboxes, handwriting, and math) as coherent objects, enabling tasks beyond plain text extraction, such as image cropping and captioning, structured table and figure extraction, and more reliable form and checkbox recognition. The model's strong math parsing, trained on both labeled real data and synthetic examples, has already attracted adoption by AI labs, and it outperforms some large commercial models on difficult pages (old fonts, handwritten math).
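To make the architectural contrast concrete, here is a minimal sketch of the two decoding strategies. Every name in it is an illustrative stub, not Chandra's actual API; it only shows where the blockwise pipeline discards context that full-page decoding keeps.

```python
# Sketch: blockwise pipeline OCR vs. full-page decoding.
# All function and method names here are hypothetical stand-ins.

def detect_blocks(page):
    """Layout step of a blockwise pipeline: find regions, return crops."""
    return [("text", page), ("table", page)]  # toy output

def recognize_crop(crop):
    """OCR one cropped region in isolation (no surrounding context)."""
    return "<text decoded from a single crop>"

def blockwise_ocr(page):
    # Pipeline approach (Marker/Surya-style): each region is decoded
    # independently, so context that spans blocks (a table whose header
    # sits in another crop, a caption tied to its figure) is lost at
    # the crop boundary.
    return "\n".join(recognize_crop(crop) for _, crop in detect_blocks(page))

def fullpage_ocr(model, page):
    # Full-page decoding (Chandra-style): the model attends to the whole
    # page at once and emits layout-aware structured output in one pass,
    # preserving cross-cell and cross-element relationships.
    return model.decode_page(page)  # hypothetical method
```

The practical consequence is that a full-page decoder can emit a table as one coherent object rather than a bag of disconnected cell strings, which is what enables the structured extraction and checkbox handling described above.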
Technically, Chandra supports full-page decoding to preserve cross-cell and table context, improves layout understanding for image and table extraction, and handles handwriting and degraded documents better than pipeline approaches. The team offers quantized 8-bit and 2-bit versions for on-prem deployment (up to ~4 pages/sec on an H100, or roughly 345k pages/day) with minimal accuracy loss, plus an open/free tier under a revenue limit and paid support for heavier users. Roadmap items include better low-resource language support, lower latency, and improved math, making Chandra a practical step forward for document understanding and downstream NLP/vision pipelines.
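The two throughput figures are consistent with simple sustained-rate arithmetic; the snippet below just checks that 4 pages/sec held for a full day lands on the ~345k number (batch size and page mix behind the benchmark are not specified here).

```python
# Sanity-check the quoted throughput: 4 pages/sec sustained on one H100.
pages_per_second = 4
seconds_per_day = 24 * 60 * 60          # 86,400

pages_per_day = pages_per_second * seconds_per_day
print(f"{pages_per_day:,} pages/day")   # 345,600 -- matches the ~345k claim
```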