🤖 AI Summary
unpdf is a cross-runtime utility library that bundles a serverless-optimized build of Mozilla's PDF.js to provide PDF extraction and rendering across Node.js, Deno, Bun, browsers, and edge/serverless environments. It is pitched as a modern, zero-dependency alternative to pdf-parse and is aimed squarely at serverless AI applications: document ingestion, summarization, link extraction, and image extraction for downstream LLM pipelines. The package ships with a custom serverless PDF.js build (based on v5.4.149) that inlines the worker and applies platform tweaks (string replacements, global mocks) so that it runs on edge runtimes out of the box. You can also opt into the official or legacy PDF.js build if needed.
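To make the ingestion use case concrete, here is a minimal sketch of turning PDF bytes into a bounded text snippet for an LLM prompt. The helper name `pdfToPromptChunk`, the `MergedTextExtractor` type, and the 4000-character default are hypothetical and not part of unpdf. The extractor is passed in as a parameter and is assumed to behave like unpdf's `extractText` called with `mergePages: true`, resolving to `{ totalPages, text }`.

```typescript
// Structural type for the extractor. Assumption: this matches how unpdf's
// extractText behaves when called with { mergePages: true }.
type MergedTextExtractor = (
  data: Uint8Array,
  options: { mergePages: true },
) => Promise<{ totalPages: number; text: string }>;

// Hypothetical helper: extract merged text and trim it to a character budget.
async function pdfToPromptChunk(
  data: Uint8Array,
  extract: MergedTextExtractor,
  maxChars = 4000,
): Promise<string> {
  const { totalPages, text } = await extract(data, { mergePages: true });
  // Collapse whitespace runs so the budget is spent on content, not layout.
  const cleaned = text.replace(/\s+/g, " ").trim();
  const body = cleaned.length > maxChars ? cleaned.slice(0, maxChars) : cleaned;
  // Prefix the page count so downstream prompts know the document size.
  return `[${totalPages} page(s)]\n${body}`;
}
```

With unpdf installed, its `extractText` export (or a thin wrapper around it) would presumably be passed as the `extract` argument. Injecting it keeps the helper independent of the runtime and easy to test with a stub.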
Key features and technical notes: the high-level APIs are extractText (with an optional mergePages flag that returns a single string rather than one string per page), extractLinks, extractImages (returns raw pixel buffers with width, height, and channels), and renderPageAsImage (returns an ArrayBuffer or a data URL); getResolvedPDFJS and definePDFJSModule give low-level access to PDF.js. Image rendering in Node.js requires the @napi-rs/canvas package and the official PDF.js build. unpdf's serverless bundle includes a polyfill for Promise.withResolvers, which PDF.js v5.x relies on and which Node.js versions before 22 do not provide natively. The library simplifies PDF preprocessing for AI workflows by removing build friction on edge platforms while preserving full access to PDF.js when you need more control.
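For readers unfamiliar with Promise.withResolvers, the sketch below shows what a polyfill for it can look like. It is an illustration of the standard method's behavior, not unpdf's actual polyfill code; the `withResolvers` function and `Resolvers` interface names are hypothetical.

```typescript
// Shape returned by the standard Promise.withResolvers().
interface Resolvers<T> {
  promise: Promise<T>;
  resolve: (value: T | PromiseLike<T>) => void;
  reject: (reason?: unknown) => void;
}

// Hypothetical standalone implementation; not taken from unpdf or PDF.js.
function withResolvers<T>(): Resolvers<T> {
  let resolve!: (value: T | PromiseLike<T>) => void;
  let reject!: (reason?: unknown) => void;
  // The executor runs synchronously, so both callbacks are set before return.
  const promise = new Promise<T>((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Install only when the runtime lacks the native method (e.g. Node.js < 22).
const PromiseCtor = Promise as unknown as {
  withResolvers?: <T>() => Resolvers<T>;
};
if (typeof PromiseCtor.withResolvers !== "function") {
  PromiseCtor.withResolvers = withResolvers;
}
```

The method exposes a promise together with its settle functions, which is convenient when resolution happens outside the executor, for example in a worker message handler.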