🤖 AI Summary
A developer built a quick CLI “ask your PDF” tool in Ruby using the RubyLLM gem to let an LLM answer questions against a 1,249‑page API PDF without sending the whole document to the model. Instead of dumping the file, they parse pages locally with the pdf-reader gem and expose two RubyLLM::Tool subclasses: PdfPageReader (returns text for specified pages, squeezing noisy dotted lines to save tokens) and PdfPageSearch (runs pdfgrep with PCRE to find pages matching a regex). The chat agent is configured with .with_tool and instruction prompts (e.g., scan the TOC on pages 31–49 first, ask for multiple pages per call) so the model fetches only relevant slices. The script also fetches its API key from 1Password and prints token usage (an example run consumed ~95k input and 643 output tokens).
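The summary doesn’t reproduce the post’s source, but the two tools might look roughly like this — a minimal sketch assuming RubyLLM’s Tool DSL (description/param/execute), a placeholder PDF path, and pdfgrep’s real `-P` (PCRE) and `-n` (print page numbers) flags:

```ruby
require "ruby_llm"
require "pdf-reader"
require "open3"

PDF_PATH = "api_docs.pdf" # placeholder; the post uses a 1,249-page API PDF

# Sketch of a tool that returns the text of requested pages.
class PdfPageReader < RubyLLM::Tool
  description "Reads the text of specific pages from the PDF."
  param :page_numbers, desc: "Comma-separated 1-based page numbers, e.g. '31,32,33'"

  def execute(page_numbers:)
    reader = PDF::Reader.new(PDF_PATH)
    page_numbers.split(",").map do |n|
      text = reader.page(n.strip.to_i).text
      # Collapse runs of TOC dot leaders ("........") to a single dot to save tokens.
      "-- page #{n.strip} --\n#{text.squeeze('.')}"
    end.join("\n\n")
  end
end

# Sketch of a tool that shells out to pdfgrep for regex search across pages.
class PdfPageSearch < RubyLLM::Tool
  description "Searches the PDF with a Perl-compatible regex; returns matching pages."
  param :pattern, desc: "PCRE pattern to search for"

  def execute(pattern:)
    out, _status = Open3.capture2("pdfgrep", "-Pn", pattern, PDF_PATH)
    out.empty? ? "No matches." : out
  end
end
```

Returning several pages per call is what lets the instruction prompt (“ask for multiple pages per call”) cut down on roundtrips.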
Significance: this is a pragmatic, low-effort pattern for “talk-to-your-docs” workflows that reduces token waste and roundtrips by combining local parsing/search with LLM orchestration via tools. Technical takeaways: use local PDF parsing/search to prefilter pages, implement tool wrappers so the model can request page ranges or regex searches, and be mindful of model limits (Gemini’s 1,000‑page parsing limit prompted the hybrid approach). The author later switched from Gemini to OpenAI o3 and reported better search performance, showing the approach is portable across LLM providers.
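Wiring this up is where the portability shows: swapping providers is a one-argument change. A hedged sketch of the agent setup, assuming RubyLLM’s configure/with_tools/with_instructions chat API, the real 1Password CLI `op read` command (with a hypothetical vault path), and “o3” as the model id:

```ruby
RubyLLM.configure do |config|
  # Fetch the API key from 1Password's CLI; the vault path here is hypothetical.
  config.openai_api_key = `op read "op://Private/OpenAI/credential"`.strip
end

chat = RubyLLM.chat(model: "o3") # originally Gemini; switched for better search performance
             .with_tools(PdfPageReader, PdfPageSearch)
             .with_instructions(<<~PROMPT)
               Answer questions about the API PDF. Scan the table of contents
               (pages 31-49) first, then request only the relevant pages,
               asking for multiple pages per tool call.
             PROMPT

response = chat.ask(ARGV.join(" "))
puts response.content
puts "tokens: #{response.input_tokens} in / #{response.output_tokens} out"
```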