Show HN: Speak to AI – offline speech-to-text for Linux (alternative to Dragon) (github.com)

🤖 AI Summary
Speak-to-AI is a minimalist, privacy-first desktop app that brings offline speech-to-text to Linux by running OpenAI's Whisper model locally (via whisper.cpp). Built in Go and distributed as an AppImage (Flatpak planned), it transcribes speech and types the result directly into the active window (editors, browsers, IDEs, or AI assistants), using xdotool on X11 with optional ydotool support on Wayland. Features include multi-language recognition, global hotkeys, system tray integration (GNOME/KDE), visual notifications, a WebSocket API for automation, and a clipboard fallback when automatic typing isn't available. The project is MIT-licensed, documented (ARCHITECTURE.md, DEVELOPMENT.md), and solicits testing and contributions on GitHub.

Technically, Speak-to-AI targets modest desktop resources: the release bundles a quantized Whisper small q5 model (~277 MB on disk) and typically uses ~300 MB of RAM during operation; it requires an AVX-capable CPU (Intel/AMD, roughly 2011 onward). Using whisper.cpp keeps inference local and efficient, preserving privacy and avoiding cloud latency and costs, which makes it a notable alternative to cloud-based dictation tools like Dragon. Caveats: typing on Wayland requires extra setup (ydotool), and hotkey functionality requires the user to be in the input group. For developers and privacy-conscious users, it's an approachable, extensible option for local voice typing and workflow integration.
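The fallback chain described above (xdotool on X11, ydotool on Wayland, clipboard otherwise) can be sketched in Go, the project's own language. This is a minimal illustration, not the app's actual code: the function name `pickBackend` and the exact selection logic are assumptions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pickBackend chooses how recognized text could be delivered to the
// active window, mirroring the fallback chain the summary describes.
// Hypothetical sketch; the real app's logic may differ.
func pickBackend(session string) string {
	have := func(tool string) bool {
		_, err := exec.LookPath(tool) // is the tool on PATH?
		return err == nil
	}
	switch {
	case session == "x11" && have("xdotool"):
		return "xdotool" // simulate keystrokes on X11
	case session == "wayland" && have("ydotool"):
		return "ydotool" // needs ydotoold running and input-group membership
	default:
		return "clipboard" // fallback: place text on the clipboard for pasting
	}
}

func main() {
	// XDG_SESSION_TYPE is typically "x11" or "wayland" on Linux desktops.
	fmt.Println(pickBackend(os.Getenv("XDG_SESSION_TYPE")))
}
```

A real implementation would also need to invoke the chosen tool (e.g. `xdotool type <text>`), but the selection step alone shows why Wayland support carries extra setup requirements.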