Show HN: Nano PDF – A CLI Tool to Edit PDFs with Gemini's Nano Banana (github.com)

0 points 227 days ago | visit original

🤖 AI Summary

Nano PDF is a command-line tool that uses Google’s Gemini 3 Pro Image (aka “Nano Banana”) to edit PDF slide decks from natural-language prompts, e.g., “Change the chart to a bar graph” or “Add a title slide.” It can modify multiple pages in parallel, insert new slides that match an existing visual style, and preserve the searchable text layer through “OCR re-hydration” (Tesseract) after the AI image edits.

The workflow has three stages. Poppler renders the PDF pages to images. The images and prompts, plus optional style-reference pages or full document context, are sent to Gemini for image generation. Tesseract then restores the text layer, and the pages are stitched back into the PDF. Resolution is configurable (4K/2K/1K) to trade quality against speed and cost. For AI/ML practitioners this is a practical example of a multimodal editing pipeline that combines LLM/image-generation APIs, classical OCR, and PDF tooling so that the edited pages remain searchable.

Important technical notes: it requires Python 3.10+, Poppler and Tesseract installed, and a paid Google Gemini API key with billing enabled (the free tier doesn’t support image generation). Parallel processing speeds up multi-page jobs, but high-resolution output increases API costs, and OCR may struggle with stylized fonts or small text. Also consider privacy and compliance implications: pages and document text may be sent to Google, and the tool can optionally allow Google Search during generation.
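
The render, edit, and OCR stages described above can be sketched roughly as follows. This is a minimal illustration, not Nano PDF's actual code: the function names (`render_cmd`, `ocr_cmd`, `process_page`, `process_pages`), the file-naming scheme, and the 150 DPI default are all assumptions made for the example, and `edit_image` is a hypothetical callable standing in for the Gemini image-generation request. Only the `pdftoppm` and `tesseract` command-line invocations refer to real tools.

```python
from __future__ import annotations

import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# Hypothetical signature: takes a page image and a prompt, returns the edited image path.
EditFn = Callable[[Path, str], Path]


def render_cmd(pdf: Path, out_prefix: Path, page: int, dpi: int = 150) -> list[str]:
    """Poppler command that rasterizes one page to <out_prefix>.png."""
    return ["pdftoppm", "-png", "-r", str(dpi), "-f", str(page), "-l", str(page),
            "-singlefile", str(pdf), str(out_prefix)]


def ocr_cmd(image: Path, out_base: Path) -> list[str]:
    """Tesseract command that writes <out_base>.pdf: the image plus a searchable text layer."""
    return ["tesseract", str(image), str(out_base), "pdf"]


def process_page(pdf: Path, page: int, prompt: str, edit_image: EditFn,
                 workdir: Path, run=subprocess.run) -> Path:
    """Render one page, hand it to the (assumed) image editor, then OCR the result."""
    prefix = workdir / f"page-{page}"
    run(render_cmd(pdf, prefix, page), check=True)
    edited = edit_image(prefix.with_suffix(".png"), prompt)
    out_base = workdir / f"edited-{page}"
    run(ocr_cmd(edited, out_base), check=True)
    return out_base.with_suffix(".pdf")


def process_pages(pdf: Path, prompts: dict[int, str], edit_image: EditFn,
                  workdir: Path, run=subprocess.run, max_workers: int = 4) -> dict[int, Path]:
    """Process several pages concurrently; each page is independent of the others."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            page: pool.submit(process_page, pdf, page, prompt, edit_image, workdir, run)
            for page, prompt in prompts.items()
        }
        return {page: fut.result() for page, fut in futures.items()}
```

Threads fit here because each page spends most of its time waiting on a subprocess or a network call. The final step, stitching the per-page PDFs back into the original document, is left out of the sketch; it would need a PDF library, and the summary does not say which one the tool uses.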
