Show HN: I built a free OCR tool powered by DeepSeek and PaddleOCR engines (deepseekocr.io)

0 points | 252 days ago

🤖 AI Summary

A new free web demo, "OCR Playground," combines DeepSeek-OCR and PaddleOCR. Users drag and drop images or PDFs (first page free), then pick an engine, a model size (Tiny→Gundam), and a task type (plain OCR, Markdown conversion, figure parsing, grounding, or VLM descriptions). PaddleOCR is recommended for speed on common documents, DeepSeek for complex layouts. Results can be exported as raw text, rendered Markdown, or a visualization with bounding boxes. The site is open-source, with model weights available on GitHub; multi-page PDFs, batch processing, and a hosted API are listed as Pro or upcoming features.

Technically, DeepSeek touts two core innovations: "Contexts Optical Compression," which represents high-resolution pages with up to 10× fewer vision tokens, and a low-memory DeepEncoder architecture that enables high throughput and deployment on modest hardware. The team reports up to ~97% OCR accuracy on complex benchmarks, support for nearly 100 languages, and a claimed throughput of ~200k pages/day on one GPU.

Those properties make it useful for large-scale data extraction, building LLM/VLM training corpora, archival digitization, and automating finance and academic workflows. The tradeoffs are explicit: choose PaddleOCR for cost and speed, DeepSeek for layout-aware accuracy. There is no hosted API yet, but the open repo allows self-hosting and integration.
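The throughput and compression figures quoted in the summary can be turned into rough capacity numbers. The sketch below is only back-of-the-envelope arithmetic in Python. The function names and the example token count and corpus size are hypothetical and chosen for illustration, and the 200k pages/day and 10× values are the claimed upper bounds from the summary, not measurements.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def pages_per_second(pages_per_day: float) -> float:
    """Convert a daily page throughput into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


def days_to_process(total_pages: int, pages_per_day: float) -> float:
    """Estimate how many days one GPU would need at a steady daily rate."""
    return total_pages / pages_per_day


def compressed_tokens(tokens_before: int, ratio: float) -> int:
    """Token count after compression by `ratio`, rounded up to a whole token."""
    return math.ceil(tokens_before / ratio)


if __name__ == "__main__":
    claimed_rate = 200_000  # pages/day on one GPU, as claimed in the summary
    print(f"{pages_per_second(claimed_rate):.2f} pages/s on average")

    # Hypothetical corpus size, for illustration only.
    print(f"{days_to_process(1_000_000, claimed_rate):.1f} days for 1M pages")

    # Hypothetical page costing 1,000 tokens before compression,
    # compressed at the claimed best-case ratio of 10x.
    print(f"{compressed_tokens(1_000, 10)} vision tokens after compression")
```

At the claimed rate, a single GPU would average a little over two pages per second, so a one-million-page archive would take about five days if that rate held. Real throughput would depend on model size, page resolution, and hardware, none of which the summary specifies.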
