Show HN: We Built Mix, a Multimodal Agents SDK (github.com)

🤖 AI Summary
Mix is a new multimodal agents SDK that provides an “agentic backbone” for building apps that combine text, images, audio, and video. Unlike CLI-first toolkits, Mix treats the SDK as a first-class product: it ships Python and TypeScript SDKs plus an HTTP backend, so the frontend is just one possible client. It includes a GUI playground (built on the TypeScript SDK) for testing and debugging workflows, example scripts for multimodal search, TikTok-style short-video assembly, and dynamic system-prompt swapping, and it stores all project data as plain text and native media files to avoid vendor lock-in. Technically, Mix runs as an HTTP server with SDK clients speaking plain REST, supports hosted SQLite/LibSQL (e.g. Turso), and integrates standard multimedia tooling (ffmpeg, GSAP) and LLM providers (recommended: Claude Sonnet 4 via OAuth or an Anthropic API key; Gemini for the ReadMedia tool; Brave for search). That combination makes it straightforward to orchestrate multimodal pipelines (search → fetch media → clip/animate → export) and to swap LLMs or tools without changing storage or app logic. For ML/AI developers, Mix lowers the barrier to building, debugging, and extending multimodal agent workflows while keeping projects portable and extensible.
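
Because the backend is an HTTP server, any client can drive it; the SDKs are convenience wrappers over that interface. Below is a minimal Python sketch of what driving such a backend directly might look like, using the standard `requests` library. The base URL, endpoint paths, and JSON fields here are illustrative assumptions, not Mix's actual API; consult the repository for the real SDK surface.

```python
# Hypothetical sketch of driving a Mix-style HTTP backend directly.
# The base URL, endpoint paths, and payload fields are assumptions
# for illustration only -- the real Mix API may differ.
import time

import requests

BASE_URL = "http://localhost:8080"  # assumed local Mix server address


def run_pipeline(prompt: str) -> dict:
    """Create a session, submit a multimodal task, and poll for the result."""
    # 1. Create a session (hypothetical endpoint).
    session = requests.post(f"{BASE_URL}/sessions", json={}).json()
    session_id = session["id"]

    # 2. Submit a task, e.g. search -> fetch media -> clip/animate -> export.
    task = requests.post(
        f"{BASE_URL}/sessions/{session_id}/messages",
        json={"role": "user", "content": prompt},
    ).json()

    # 3. Poll until the agent run finishes (hypothetical status field).
    while task.get("status") not in ("done", "error"):
        time.sleep(1)
        task = requests.get(
            f"{BASE_URL}/sessions/{session_id}/messages/{task['id']}"
        ).json()
    return task


if __name__ == "__main__":
    result = run_pipeline(
        "Find three clips about humpback whales and cut a 15-second short."
    )
    print(result)
```

The point of the sketch is architectural: because state lives behind the HTTP server (and in plain-text/native-media project files), a client like this can be swapped for the Python or TypeScript SDK, or for the GUI playground, without changing storage or app logic.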