ScribeOCR – Web interface for recognizing text, OCR, & creating digitized docs (github.com)

0 points 7 hours ago | visit original

🤖 AI Summary

ScribeOCR is a free, open web app (live at scribeocr.com) for extracting, proofreading, and exporting digitized text from images and scanned documents. It combines client-side OCR (recognition is handled by the Scribe.js library) with a precision proofreading UI that overlays editable text directly on the source images, flags low-confidence words, and generates custom per-document fonts to improve alignment. Use cases include creating searchable PDFs (an alternative to Acrobat), editing existing OCR/HOCR output (e.g., from Tesseract), and producing “ebook-style” native-text PDFs that reproduce the original formatting without large image-overlaid files.

The codebase separates the UI (this repo) from recognition (Scribe.js), and all processing runs in your browser, so no data is sent to a server. For developers and practitioners, the key benefits are speed and accuracy in post-OCR correction: the precise overlay and font optimization make errors obvious and can raise practical accuracy from ~98% to 100% during proofreading. Export supports both invisible-text-over-image PDFs and compact native-text ebook output.

You can run a local copy (git clone --recursive …; npm i; npx http-server) to serve the files locally. A desktop app isn’t available yet, but the repo invites feature requests. Docs and user guidance are at docs.scribeocr.com; discussion of recognition and its implementation belongs in the Scribe.js repository.
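To make the idea of flagging low-confidence words in existing HOCR output concrete, here is a minimal Python sketch. It is not ScribeOCR's or Scribe.js's implementation. It assumes the input follows the hOCR convention of `ocrx_word` spans whose `title` attribute carries an `x_wconf` confidence value (as Tesseract emits), and that word spans contain no nested spans. The threshold of 80, the sample markup, and the names `HocrWordParser` and `low_confidence_words` are all hypothetical.

```python
import re
from html.parser import HTMLParser

# hOCR stores per-word confidence as "x_wconf <number>" inside the title attribute.
WCONF_RE = re.compile(r"x_wconf\s+(\d+(?:\.\d+)?)")


class HocrWordParser(HTMLParser):
    """Collect (text, confidence) pairs from hOCR `ocrx_word` spans."""

    def __init__(self):
        super().__init__()
        self.words = []        # list of (text, confidence or None)
        self._in_word = False  # True while inside an ocrx_word span
        self._conf = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = WCONF_RE.search(attr.get("title") or "")
            self._in_word = True
            self._conf = float(match.group(1)) if match else None
            self._chunks = []

    def handle_data(self, data):
        if self._in_word:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        # Assumption: word spans hold no nested spans, so the next
        # closing </span> ends the current word.
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._chunks).strip(), self._conf))
            self._in_word = False


def low_confidence_words(hocr, threshold=80.0):
    """Return words whose confidence is below `threshold` (an assumed cutoff)."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return [(text, conf) for text, conf in parser.words
            if conf is not None and conf < threshold]


# Hypothetical hOCR fragment for illustration only.
SAMPLE = """
<span class='ocr_line' title='bbox 10 10 300 40'>
  <span class='ocrx_word' title='bbox 10 10 80 40; x_wconf 96'>Hello</span>
  <span class='ocrx_word' title='bbox 90 10 160 40; x_wconf 42'>w0rld</span>
</span>
"""

if __name__ == "__main__":
    for text, conf in low_confidence_words(SAMPLE):
        print(f"{text!r} needs review (confidence {conf})")
```

A proofreading UI would typically highlight the returned words rather than print them; the cutoff is a tuning choice that trades review effort against missed errors.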
