ThalamusDB: Query text, tables, images, and audio (github.com)

0 points | 3 hours ago

🤖 AI Summary

ThalamusDB is a new approximate query engine that extends standard SQL with semantic operators for querying multimodal data (text, images, audio) using large language and multimodal models. It is installable via pip and runs on DuckDB. Columns containing file paths are treated as images or audio, and the NLfilter and NLjoin predicates let you express conditions in natural language (e.g., "the car in the picture is red" or "the car is from a German manufacturer"). It works with models from multiple providers (OpenAI, Google, etc.), configured per modality and per operator through a JSON model configuration (modalities, priority, and kwargs such as the model id and reasoning_effort). The repo includes a cars example and a Google Colab demo for quick experimentation.

The technically notable part is ThalamusDB's approximate processing design: queries are evaluated progressively and return intermediate results with quantified uncertainty. Aggregation queries report lower and upper bounds; retrieval queries return the rows that appear in all possible results. Error is computed from the bounds (for aggregates) or from the ratio between the maximal row count and the number of rows in the intersection of all possible results (for retrieval). Execution stops according to configurable criteria (defaults: max_seconds 600, max_calls 100, max_tokens 1,000,000, max_error 0.0). The combination of SQL, multimodal LLM-driven predicates, and explicit approximation semantics is aimed at interactive exploration of large unstructured datasets.
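
The stopping behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not ThalamusDB's code or API: the class `StopCriteria` and the functions `aggregate_error`, `retrieval_error`, and `should_stop` are hypothetical names, and both error formulas are assumptions, since the summary only says the error is derived from the bounds or from the ratio of maximal to intersection rows. Only the four setting names and their default values come from the summary.

```python
from dataclasses import dataclass


@dataclass
class StopCriteria:
    # Setting names and defaults as quoted in the summary.
    max_seconds: float = 600
    max_calls: int = 100
    max_tokens: int = 1_000_000
    max_error: float = 0.0


def aggregate_error(lower: float, upper: float) -> float:
    """ASSUMED metric: width of the bound interval relative to the larger bound."""
    if upper == lower:
        return 0.0
    return (upper - lower) / max(abs(lower), abs(upper))


def retrieval_error(certain_rows: int, maximal_rows: int) -> float:
    """ASSUMED metric: share of the maximal result that is not yet certain.

    certain_rows: rows present in all possible results (the intersection).
    maximal_rows: largest number of rows any possible result could contain.
    """
    if maximal_rows == 0:
        return 0.0
    return 1.0 - certain_rows / maximal_rows


def should_stop(criteria: StopCriteria, seconds: float, calls: int,
                tokens: int, error: float) -> bool:
    """Stop once any budget is exhausted or the error is small enough."""
    return (
        seconds >= criteria.max_seconds
        or calls >= criteria.max_calls
        or tokens >= criteria.max_tokens
        or error <= criteria.max_error
    )


if __name__ == "__main__":
    criteria = StopCriteria()
    # An aggregate whose true value lies somewhere between 2 and 4.
    error = aggregate_error(lower=2, upper=4)
    print(error, should_stop(criteria, seconds=1.0, calls=1, tokens=500, error=error))
```

Under these assumptions, the default max_error of 0.0 means a query keeps refining until the result is exact or one of the time, call, or token budgets runs out; raising max_error trades accuracy for an earlier stop.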
