🤖 AI Summary
Cinestar is a privacy-first, local-first media search and organizer that makes images and long videos searchable on-device, with no cloud uploads. The project evolved from an Electron image-search prototype into a full video pipeline that stays responsive while running only local AI models (Whisper, Llama 3.2 Vision, Ollama-hosted models, BGE-large embeddings) over local storage (SQLite with sqlite-vec and FTS5). Its significance lies in enabling immediate, multimodal search (audio + visual + scene/context) over sensitive media while avoiding the latency and privacy trade-offs of cloud services.
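A minimal sketch of what that storage layer could look like, assuming the better-sqlite3 and sqlite-vec npm packages; the schema is illustrative rather than taken from the project, 1024 is BGE-large's embedding dimension, and the two database files match the read path described below:

```ts
import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

// main.db holds metadata and the FTS5 index; vector.db holds embeddings.
const main = new Database("main.db");
const vectors = new Database("vector.db");
sqliteVec.load(vectors); // registers the vec0 virtual-table module

// FTS5 index over transcript/caption text (illustrative schema).
main.exec(`
  CREATE VIRTUAL TABLE IF NOT EXISTS segments_fts
  USING fts5(media_id UNINDEXED, text);
`);

// sqlite-vec table; 1024 is BGE-large's embedding dimension.
vectors.exec(`
  CREATE VIRTUAL TABLE IF NOT EXISTS segment_vectors
  USING vec0(segment_id INTEGER PRIMARY KEY, embedding float[1024]);
`);
```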
Technically, stability and UX problems were addressed through two key architectural choices: a resource-aware ffmpeg pool (capping concurrent ffmpeg instances to avoid overloading the system) and a CQRS-inspired split between a fast read path (the search API querying vector.db and main.db) and an asynchronous write path (JobQueue → VideoJobProcessor/ImageJobProcessor). Videos are chunked into 5-minute segments for Phase 0 audio transcription (each segment becoming searchable in ~3 s), then enriched with visual captioning, scene reconstruction using an RNN-style sliding-window context, and three refinement passes at descending thresholds (0.8 → 0.6 → 0.4). Searches use a hybrid ranker blending vector similarity (70%) with FTS (30%), i.e. α = 0.7, plus modality-aware boosts, answering queries like "romantic scene in dimly lit room" while keeping the UI snappy and processing scalable on local machines.
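At its core, the resource-aware ffmpeg pool is a concurrency limiter around spawned ffmpeg processes. A sketch under that assumption; the pool size and helper names are hypothetical, not from the project:

```ts
import { spawn } from "node:child_process";

// Cap concurrent ffmpeg processes so bulk imports don't starve the UI.
// MAX_FFMPEG is a hypothetical knob; a real pool might size it dynamically.
const MAX_FFMPEG = 2;
let running = 0;
const waiting: Array<() => void> = [];

async function withFfmpegSlot<T>(task: () => Promise<T>): Promise<T> {
  if (running >= MAX_FFMPEG) {
    // Queue until a finishing job hands its slot over.
    await new Promise<void>((resolve) => waiting.push(resolve));
  } else {
    running++;
  }
  try {
    return await task();
  } finally {
    const next = waiting.shift();
    if (next) next(); // hand this slot directly to the next queued job
    else running--;
  }
}

// Run one ffmpeg invocation inside a pool slot.
function runFfmpeg(args: string[]): Promise<void> {
  return withFfmpegSlot(
    () =>
      new Promise<void>((resolve, reject) => {
        const proc = spawn("ffmpeg", args, { stdio: "ignore" });
        proc.on("error", reject);
        proc.on("close", (code) =>
          code === 0 ? resolve() : reject(new Error(`ffmpeg exited with ${code}`))
        );
      })
  );
}
```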
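The CQRS-inspired split can be read as: searches hit the databases synchronously, while imports only enqueue jobs for background processors and return immediately. A toy sketch; only the JobQueue/VideoJobProcessor/ImageJobProcessor names come from the summary, everything else is illustrative:

```ts
// Write path: enqueue and return; a background worker drains the queue.
type Job = { kind: "video" | "image"; path: string };

class JobQueue {
  private jobs: Job[] = [];
  enqueue(job: Job): void { this.jobs.push(job); }
  dequeue(): Job | undefined { return this.jobs.shift(); }
}

async function drain(queue: JobQueue): Promise<void> {
  for (let job = queue.dequeue(); job; job = queue.dequeue()) {
    // Dispatch to the matching processor; both run off the UI path.
    if (job.kind === "video") await processVideo(job.path);
    else await processImage(job.path);
  }
}

// Stand-ins for VideoJobProcessor / ImageJobProcessor.
async function processVideo(path: string): Promise<void> {
  console.log("video job:", path); // chunk, transcribe, caption, embed…
}
async function processImage(path: string): Promise<void> {
  console.log("image job:", path);
}
```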
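Phase 0 chunking maps naturally onto ffmpeg's segment muxer. Reusing the runFfmpeg helper from the pool sketch above; the file names and the stream-copy choice are assumptions:

```ts
// Split a long video into 5-minute (300 s) chunks via the segment muxer.
// -c copy stream-copies (no re-encode); independent chunks are what let
// Whisper make each segment searchable shortly after transcription.
async function chunkVideo(input: string): Promise<void> {
  await runFfmpeg([
    "-i", input,
    "-f", "segment",
    "-segment_time", "300",
    "-reset_timestamps", "1",
    "-c", "copy",
    "chunk_%03d.mp4",
  ]);
}
```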
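"RNN-style sliding-window context" suggests each window's scene description is generated with the previous window's output carried forward, like a hidden state. A sketch under that reading; describeScene is a hypothetical call into the local vision/LLM stack:

```ts
// Sliding-window scene reconstruction: each step consumes the current
// chunk's caption plus the carried context from the previous step.
async function reconstructScenes(
  captions: string[], // per-chunk visual captions, in timeline order
  describeScene: (caption: string, context: string) => Promise<string>
): Promise<string[]> {
  const scenes: string[] = [];
  let context = ""; // carried context, analogous to an RNN hidden state
  for (const caption of captions) {
    const scene = await describeScene(caption, context);
    scenes.push(scene);
    context = scene; // this window's output feeds the next window
  }
  return scenes;
}
```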
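The three refinement passes read as one sweep per confidence threshold, relaxing the bar each time; the summary does not say what a pass actually refines, so refine here is a hypothetical callback:

```ts
// Descending thresholds from the summary: lock in high-confidence matches
// first, then progressively relax the bar for whatever is still unrefined.
const REFINEMENT_THRESHOLDS = [0.8, 0.6, 0.4];

interface Segment {
  id: number;
  confidence: number; // assumed 0..1 score from earlier pipeline stages
  refined: boolean;
}

function runRefinementPasses(
  segments: Segment[],
  refine: (seg: Segment) => void // hypothetical per-segment refinement step
): void {
  for (const threshold of REFINEMENT_THRESHOLDS) {
    for (const seg of segments) {
      if (!seg.refined && seg.confidence >= threshold) {
        refine(seg);
        seg.refined = true;
      }
    }
  }
}
```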
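The ranker's blend is presumably score = α·vector + (1 − α)·FTS with α = 0.7, which reproduces the 70/30 weighting; the modality boosts are not quantified in the summary, so the values below are illustrative:

```ts
// α = 0.7 gives the 70/30 vector/FTS weighting; both inputs are assumed
// normalized to [0, 1]. Boost values are hypothetical placeholders.
const ALPHA = 0.7;

const MODALITY_BOOST: Record<string, number> = {
  audio: 1.0,
  visual: 1.1,
  scene: 1.2,
};

function hybridScore(
  vectorSim: number, // e.g. cosine similarity from sqlite-vec
  ftsScore: number,  // e.g. normalized FTS5 bm25 rank
  modality: "audio" | "visual" | "scene"
): number {
  const base = ALPHA * vectorSim + (1 - ALPHA) * ftsScore;
  return base * (MODALITY_BOOST[modality] ?? 1.0);
}
```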