A Graph Explorer of Epstein Emails (github.com)

0 points 1 hour ago

🤖 AI Summary

A searchable, interactive "Graph Explorer" has been released that ingests the Jeffrey Epstein document corpus, uses Claude (Anthropic) to extract entities, actions, and events into RDF-style triples, and visualizes the resulting knowledge graph in a React/D3 force-directed UI. The live demo (deployed on Render) and the repo include an analysis pipeline that converts PDFs to JSON, runs LLM extraction, tags triples with contextual metadata (legal, financial, travel, etc.), and deduplicates entity mentions. The UI offers actor-centric views, a timeline browser, and a full-text document viewer. The system currently surfaces 15,000+ relationships, supports incremental processing of new documents, and preserves provenance (document links, timestamps, and AI-generated summaries).

For AI/ML practitioners this is a practical, production-grade example of combining LLM extraction, embedding-based clustering, and scalable visualization. Key technical choices: Claude for extraction, Qwen3-Embedding-0.6B-ONNX for cached embeddings, K-means++ (cosine distance) to map 28,000+ tags into 30 semantic clusters, LLM-assisted entity deduplication, and top-3 cluster IDs materialized in SQLite for 10x faster filtering. The stack uses TypeScript, Express + better-sqlite3, React/Vite/Tailwind, react-force-graph-2d, and D3; API endpoints expose stats, cluster metadata, filtered relationships, and document text. The project demonstrates how to turn messy legal corpora into structured, explorable graphs while addressing performance (indexing, materialized columns, rate limits) and incremental update workflows.
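The "top-3 cluster IDs" idea can be illustrated with a short TypeScript sketch. It is not taken from the project's repository: the names `cosineSimilarity` and `topClusters`, the `Vector` type, and the tie-breaking rule are all assumptions made for illustration. It assumes each tag already has an embedding and that the 30 cluster centroids are available as plain number arrays; the three best-matching cluster IDs per tag would then be computed once and stored, so that filtering does not need to touch embeddings at query time.

```typescript
// Hypothetical sketch: names and types are illustrative, not the project's code.
type Vector = number[];

// Cosine similarity between two vectors of equal length.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  const dot = a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (normA * normB);
}

// Rank centroids by similarity to the embedding and return the indices
// of the k closest ones (ties broken by lower cluster index, an assumption).
function topClusters(embedding: Vector, centroids: Vector[], k = 3): number[] {
  return centroids
    .map((centroid, id) => ({ id, score: cosineSimilarity(embedding, centroid) }))
    .sort((x, y) => y.score - x.score || x.id - y.id)
    .slice(0, k)
    .map((entry) => entry.id);
}

// Usage: a toy 2-D example with four centroids.
const exampleIds = topClusters([1, 0.1], [[1, 0], [0, 1], [1, 1], [-1, 0]]);
console.log(exampleIds); // [ 0, 2, 1 ]
```

Storing the resulting IDs in ordinary indexed columns is one plausible reading of "materialized" here; the summary does not say how the project lays out its schema.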
