🤖 AI Summary
This blog post explores how to recreate Navan’s receipt scanning feature by extracting the total amount with the SmolDocling-256M-preview machine learning model. SmolDocling is an image-text-to-text model: it takes a receipt image together with a text prompt and outputs the relevant text, such as the total charge. Despite being a relatively small model (256 million parameters) that runs efficiently on modest hardware like a MacBook M1 Pro, it showed promising accuracy, correctly extracting the total in 4 out of 5 receipt tests once the prompts had been refined.
The significance for the AI/ML community lies in showing that smaller, accessible pre-trained models can handle practical tasks like receipt parsing with decent accuracy (about 80%, i.e. 4 of the 5 receipts tested) and reasonable inference times (roughly 2.1 to 2.7 seconds per image), potentially reducing manual data entry in expense management apps. However, the model struggled with receipts that had small font sizes or more complex layouts, which points to the need for larger models or better prompt engineering to improve reliability. The experiment underscores the trade-offs between model size, inference speed, and accuracy in real-world document OCR tasks, and offers a viable baseline for teams that want lightweight receipt recognition without heavy resource demands.
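To make the accuracy figure above concrete, here is a minimal sketch of how such a test could be scored. It is not the post's actual code: the helper names `parse_total` and `accuracy` are hypothetical, and it assumes the model returns free text in which the last currency-like number is the total, which is a heuristic rather than something the post confirms.

```python
import re
from decimal import Decimal
from typing import Optional, Sequence

# Matches amounts such as "23.10", "1234.56" or "1,234.56" (two decimal places).
_AMOUNT = re.compile(r"\d{1,3}(?:,\d{3})*\.\d{2}|\d+\.\d{2}")


def parse_total(model_output: str) -> Optional[Decimal]:
    """Return the last currency-like amount in the model's text, or None.

    Assumption: the total is the last amount mentioned in the output.
    """
    matches = _AMOUNT.findall(model_output)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", ""))


def accuracy(outputs: Sequence[str], expected: Sequence[Decimal]) -> float:
    """Fraction of model outputs whose parsed total equals the expected total."""
    if len(outputs) != len(expected):
        raise ValueError("outputs and expected must have the same length")
    if not expected:
        raise ValueError("at least one receipt is required")
    correct = sum(1 for out, exp in zip(outputs, expected) if parse_total(out) == exp)
    return correct / len(expected)
```

Using `Decimal` rather than `float` keeps the comparison exact for monetary values. With five receipts of which four parse to the right total, `accuracy` returns 0.8, matching the roughly 80% reported.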