🤖 AI Summary
Epub2md is a small CLI tool that converts EPUB books into tidy Markdown folders, one file per chapter, ready for LLM ingestion. Install it with pip, then run epub2md book.epub (optionally specifying an output folder) to get a numbered set of Markdown files (01-chapter-i.md, 02-chapter-ii.md, …) plus an images/ directory containing the extracted JPEGs. Images are git-ignored by default; remove book/images/.gitignore if you want to commit them. The tool requires Python 3.8+ and pandoc and is released under the MIT license.
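Since the tool depends on Python 3.8+ and an external pandoc binary, a pipeline may want to verify both before invoking it. The following is a minimal sketch of such a check, not part of Epub2md itself; the function name missing_prerequisites and its parameters are hypothetical and exist only for illustration.

```python
import shutil
import sys


def missing_prerequisites(version=None, which=shutil.which):
    """Return a list of unmet requirements (hypothetical helper, not Epub2md's API).

    `version` and `which` are injectable so the check can be exercised
    without changing the real interpreter or PATH.
    """
    version = sys.version_info if version is None else version
    missing = []
    # The summary states Python 3.8+ is required.
    if tuple(version[:2]) < (3, 8):
        missing.append("python>=3.8")
    # pandoc must be discoverable on PATH as an executable.
    if which("pandoc") is None:
        missing.append("pandoc")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("ready" if not problems else "missing: " + ", ".join(problems))
```

Returning a list rather than raising lets the caller decide whether a missing dependency is fatal or merely worth a warning.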
For the AI/ML community, Epub2md streamlines turning long-form text into model-friendly chunks for retrieval-augmented generation, embedding generation, fine-tuning, or indexing. Numbered filenames preserve reading order when chapters are packed into context windows, separate images make multimodal preprocessing explicit, and per-chapter Markdown simplifies token budgeting, annotation, and adding metadata. Its simple, dependency-light workflow makes it easy to integrate into corpus pipelines or agent toolchains that need clean, versionable source files rather than monolithic EPUBs.
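To illustrate how the numbered filenames help with reading order and token budgeting, here is a rough sketch that sorts an output folder by numeric prefix and greedily groups consecutive chapters under a size budget. It assumes only the file layout described above (NN-name.md files in one folder); the helper names and the 4-characters-per-token ratio are assumptions for illustration, and a real pipeline would use its model's actual tokenizer.

```python
import re
from pathlib import Path

# Matches the leading chapter number in names like "01-chapter-i.md".
_PREFIX = re.compile(r"^(\d+)-")


def ordered_chapters(book_dir):
    """Return chapter Markdown files sorted by their numeric prefix."""
    numbered = []
    for path in Path(book_dir).glob("*.md"):
        match = _PREFIX.match(path.name)
        if match:  # skip Markdown files that lack a numeric prefix
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def estimate_tokens(text, chars_per_token=4):
    """Crude size estimate; the ratio is an assumption, not a tokenizer."""
    return max(1, len(text) // chars_per_token)


def pack_chapters(book_dir, budget):
    """Group consecutive chapters so each group's estimate stays within `budget`.

    A single chapter larger than the budget still gets its own group.
    """
    groups, current, used = [], [], 0
    for path in ordered_chapters(book_dir):
        cost = estimate_tokens(path.read_text(encoding="utf-8"))
        if current and used + cost > budget:
            groups.append(current)
            current, used = [], 0
        current.append(path.name)
        used += cost
    if current:
        groups.append(current)
    return groups
```

Sorting on the parsed integer rather than the raw string keeps the order correct even if a book's prefixes are not zero-padded to a uniform width.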