Show HN: ocrbase – PDF/IMG –>.MD/JSON Model-Agnostic OCR API (github.com)

0 points 72 days ago

🤖 AI Summary

ocrbase is a lightweight, model-agnostic optical character recognition (OCR) API that standardizes document parsing across vision-language models (VLMs). Built on Bun and Elysia with a small footprint, it can be deployed with a single command. It works with multiple models, including GLM-OCR and PaddleOCR-VL, by pointing it at the model's URL. The project reports state-of-the-art performance, scoring above 94.5 on the OmniDocBench v1.5 benchmark. Endpoints cover synchronous and asynchronous document parsing and job status checks, and the API integrates with Amazon S3 for file storage and retrieval. Setting environment variables for Redis and S3 enables queuing and job tracking. The result is a simpler way to deploy OCR services that stays usable across different AI models.
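To illustrate how environment variables could switch on queuing and storage, and how a client might wait on an asynchronous job, here is a minimal TypeScript sketch. It is not taken from the ocrbase codebase: the variable names `REDIS_URL` and `S3_BUCKET`, the `Features` shape, the job status values, and both functions are assumptions made for illustration only.

```typescript
// Hypothetical sketch: none of these names come from the ocrbase codebase.

type Env = Record<string, string | undefined>;

interface Features {
  syncParse: boolean;   // assumed to need no extra services
  asyncJobs: boolean;   // assumed to need a queue (Redis)
  fileStorage: boolean; // assumed to need object storage (S3)
}

// Assumed variable names; check the project's README for the real ones.
const REDIS_VAR = "REDIS_URL";
const S3_VAR = "S3_BUCKET";

// Decide which optional features are available from the environment.
function resolveFeatures(env: Env): Features {
  const has = (key: string): boolean => (env[key] ?? "").trim() !== "";
  return {
    syncParse: true,
    asyncJobs: has(REDIS_VAR),
    fileStorage: has(S3_VAR),
  };
}

// Assumed job lifecycle; the real API may use different status names.
type JobStatus = "queued" | "processing" | "done" | "failed";

// Poll a caller-supplied status function until the job settles
// or the attempt budget runs out.
async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  maxAttempts = 10,
  delayMs = 500,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await getStatus();
    if (status === "done" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error("job did not finish in time");
}
```

In this sketch `waitForJob` takes the status lookup as a function argument, so it makes no assumption about the real status endpoint's path or response format; a caller would wrap whatever request the actual API expects.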
