Show HN: Epub2md – Turn ePub books into Markdown folders for LLM agents (github.com)

🤖 AI Summary
Epub2md is a small CLI tool that converts EPUB books into tidy, chapter-per-file Markdown folders ready for LLM ingestion. Install with pip and run epub2md book.epub (or specify an output folder) to get a numbered set of Markdown files (01-chapter-i.md, 02-chapter-ii.md, …) plus an images/ directory with extracted JPEGs. Images are git-ignored by default (remove book/images/.gitignore if you want to commit them). The tool requires Python 3.8+ and pandoc and is released under the MIT license.

For the AI/ML community, Epub2md streamlines turning long-form text into model-friendly chunks for retrieval-augmented generation, embedding generation, fine-tuning, or indexing. Numbered filenames preserve reading order for context windows; separate images make multimodal preprocessing explicit; and per-chapter Markdown simplifies token budgeting, annotation, and metadata addition. Its simple, dependency-light workflow makes it easy to integrate into corpus pipelines or agent toolchains that need clean, versionable source files rather than monolithic EPUBs.
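
To make the token-budgeting point concrete, here is a minimal sketch of how a downstream pipeline might read the numbered chapter files in order and group them into batches that fit a context budget. It is not part of Epub2md. The helper names (chapters_in_order, estimate_tokens, pack_chapters) are hypothetical, the folder name book is only an example, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```python
import re
from pathlib import Path


def chapters_in_order(book_dir):
    """Return the chapter Markdown files sorted by their numeric prefix."""
    def prefix(path):
        match = re.match(r"(\d+)", path.name)
        # Files without a numeric prefix sort after the numbered chapters.
        return int(match.group(1)) if match else float("inf")

    return sorted(Path(book_dir).glob("*.md"), key=lambda p: (prefix(p), p.name))


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate (assumption: about 4 characters per token)."""
    return len(text) // chars_per_token


def pack_chapters(book_dir, budget):
    """Greedily group consecutive chapters so each batch stays within budget.

    A single chapter larger than the budget still gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for path in chapters_in_order(book_dir):
        cost = estimate_tokens(path.read_text(encoding="utf-8"))
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path.name)
        used += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # "book" is a hypothetical output folder name for illustration.
    for batch in pack_chapters("book", budget=8000):
        print(batch)
```

Sorting on the parsed integer rather than the raw filename keeps reading order intact even if a book's prefixes are not zero-padded to the same width. For real budgeting, swap estimate_tokens for the tokenizer of the target model.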