Building a local LLM powered media search and organiser (ikouchiha47.github.io)

🤖 AI Summary
Cinestar is a privacy-first, local-first media search and organizer that makes images and long videos searchable on-device, with no cloud uploads. The project evolved from an Electron image-search prototype into a full video pipeline that stays responsive by running only local AI models (Whisper, Llama 3.2 Vision, Ollama-hosted models, BGE-large embeddings) against local storage (SQLite with sqlite-vec and FTS5). Its significance lies in enabling immediate, multimodal search (audio + visual + scene/context) over sensitive media without the latency and privacy trade-offs of cloud services.

Stability and UX rest on two architectural choices: a resource-aware ffmpeg pool that caps concurrent ffmpeg instances so ingestion cannot overload the machine, and a CQRS-inspired split between a fast read path (a search API querying vector.db and main.db) and an asynchronous write path (JobQueue → VideoJobProcessor/ImageJobProcessor).

Videos are chunked into 5-minute segments for Phase 0 audio transcription (searchable in ~3 s per segment), then enriched with visual captioning, scene reconstruction using an RNN-style sliding-window context, and three refinement passes (thresholds 0.8 → 0.6 → 0.4). Searches use a hybrid ranker (α = 0.7: 70% vector similarity, 30% FTS) with modality-aware boosts, answering queries like "romantic scene in dimly lit room" while keeping the UI snappy and processing scalable on local machines.
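To make the resource-aware ffmpeg pool concrete, here is a minimal sketch of the idea: a counting semaphore gating child-process spawns, so ingestion can never launch more than a fixed number of ffmpeg processes at once. The class names and the cap of 2 are illustrative, not taken from the Cinestar source.

```typescript
import { spawn } from "node:child_process";

// Counting semaphore: release() hands its permit directly to the next waiter,
// so at most `permits` jobs ever run at once.
class Semaphore {
  private queue: Array<() => void> = [];
  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next) next(); // permit transfers straight to the waiter
    else this.permits++;
  }
}

// Resource-aware pool: every ffmpeg invocation must acquire a permit first,
// so heavy ingestion cannot spawn an unbounded number of ffmpeg processes.
class FfmpegPool {
  private sem: Semaphore;

  constructor(maxConcurrent: number) {
    this.sem = new Semaphore(maxConcurrent);
  }

  async run(args: string[]): Promise<number> {
    await this.sem.acquire();
    try {
      return await new Promise<number>((resolve, reject) => {
        const proc = spawn("ffmpeg", args, { stdio: "ignore" });
        proc.on("error", reject);
        proc.on("close", (code) => resolve(code ?? -1));
      });
    } finally {
      this.sem.release();
    }
  }
}

// Cap at 2 concurrent ffmpeg processes regardless of queue depth.
const pool = new FfmpegPool(2);
```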
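The CQRS-inspired split can be read as: searches only touch already-built indexes, while ingestion goes through a queue drained in the background. Below is a sketch under that reading; JobQueue and the processor names come from the summary, but every signature here is an assumption.

```typescript
type Job =
  | { kind: "video"; path: string }
  | { kind: "image"; path: string };

// Write path: enqueue() returns immediately; a single background loop drains
// the queue, dispatching to the video/image processors. Indexing work never
// runs on the read path.
class JobQueue {
  private jobs: Job[] = [];
  private draining = false;

  constructor(private process: (job: Job) => Promise<void>) {}

  enqueue(job: Job): void {
    this.jobs.push(job);
    void this.drain(); // fire and forget; the caller is not blocked
  }

  private async drain(): Promise<void> {
    if (this.draining) return;
    this.draining = true;
    try {
      let job: Job | undefined;
      while ((job = this.jobs.shift()) !== undefined) {
        await this.process(job); // VideoJobProcessor / ImageJobProcessor work
      }
    } finally {
      this.draining = false;
    }
  }
}

// Read path: the search API only queries the already-built indexes
// (vector.db via sqlite-vec, main.db via FTS5 in the post); it never
// waits on the job queue, which is what keeps the UI snappy.
async function search(query: string): Promise<string[]> {
  // ...run the vector + FTS queries and merge them with the hybrid ranker...
  return [];
}
```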
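Phase 0 chunking maps naturally onto ffmpeg's segment muxer. A hedged example using the pool from the first sketch: the 300-second length is the summary's 5-minute figure, the flags are standard ffmpeg, and the file names are illustrative.

```typescript
const SEGMENT_SECONDS = 300; // 5-minute chunks, per the post

// Split a video into fixed-length chunks without re-encoding (-c copy).
// -reset_timestamps makes each chunk start at t=0, which keeps per-chunk
// transcript timestamps simple.
async function chunkVideo(pool: FfmpegPool, input: string): Promise<void> {
  await pool.run([
    "-i", input,
    "-f", "segment",
    "-segment_time", String(SEGMENT_SECONDS),
    "-c", "copy",
    "-reset_timestamps", "1",
    "chunk_%03d.mp4",
  ]);
}

// Extract 16 kHz mono WAV from one chunk, the input format Whisper expects.
async function extractAudio(pool: FfmpegPool, chunk: string, wav: string): Promise<void> {
  await pool.run(["-i", chunk, "-vn", "-ac", "1", "-ar", "16000", wav]);
}
```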
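The summary doesn't spell out what the 0.8 → 0.6 → 0.4 thresholds gate. One plausible reading, and it is only an assumption, is that each pass re-describes segments whose scene confidence sits below a progressively looser threshold, feeding a sliding window of preceding captions in as context (the "RNN-style" part). The window size and the describeScene stub below are hypothetical.

```typescript
interface Scene {
  caption: string;
  confidence: number; // model's confidence in its own description, in [0, 1]
}

const WINDOW = 3; // how many prior captions to carry forward (assumed size)
const THRESHOLDS = [0.8, 0.6, 0.4]; // the three refinement passes in the post

// Placeholder for the real local-model call (the post uses Llama 3.2 Vision);
// here it just folds the context in so the sketch stays self-contained.
async function describeScene(scene: Scene, context: string[]): Promise<Scene> {
  return {
    caption: [...context, scene.caption].join(" "),
    confidence: Math.min(1, scene.confidence + 0.2), // pretend improvement
  };
}

// Three passes: each re-describes only the scenes still below the current
// threshold, using a sliding window of the already-refined captions before
// them as context.
async function refineScenes(scenes: Scene[]): Promise<Scene[]> {
  for (const threshold of THRESHOLDS) {
    for (let i = 0; i < scenes.length; i++) {
      if (scenes[i].confidence >= threshold) continue; // good enough
      const context = scenes
        .slice(Math.max(0, i - WINDOW), i)
        .map((s) => s.caption);
      scenes[i] = await describeScene(scenes[i], context);
    }
  }
  return scenes;
}
```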
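Finally, the hybrid ranker reduces to a weighted sum: score = α·vec + (1 − α)·fts with α = 0.7, matching the 70/30 split, then multiplied by a per-modality boost. The boost values and the assumption that both inputs are normalised to [0, 1] are mine, not the post's.

```typescript
const ALPHA = 0.7; // 70% vector similarity, 30% FTS, per the post

// Per-modality boosts; the post says boosts exist, these values are made up.
const MODALITY_BOOST: Record<"audio" | "visual" | "scene", number> = {
  audio: 1.0,  // transcript match
  visual: 1.1, // frame-caption match
  scene: 1.2,  // reconstructed-scene match
};

function hybridScore(
  vectorSimilarity: number, // e.g. 1 - cosine distance from sqlite-vec, in [0, 1]
  ftsScore: number,         // FTS5 rank normalised to [0, 1]
  modality: keyof typeof MODALITY_BOOST,
): number {
  const base = ALPHA * vectorSimilarity + (1 - ALPHA) * ftsScore;
  return base * MODALITY_BOOST[modality];
}

// Worked example: 0.7 * 0.82 + 0.3 * 0.40 = 0.694; scene boost -> ~0.833
console.log(hybridScore(0.82, 0.4, "scene"));
```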