🤖 AI Summary
Spotty is an open-source project that turns a Boston Dynamics Spot into a conversational, vision-aware robot by combining voice activation, navigation, and multimodal scene understanding. The system uses Picovoice wake-word detection ("Hey Spot"), OpenAI Whisper for speech-to-text, and text-to-speech plus a conversational memory layer to handle natural-language queries. Navigation builds on Boston Dynamics' GraphNav with automatic waypoint labeling, location-based commands ("Go to the kitchen"), and object search. Vision capabilities include scene description, visual question answering, and object detection via a GPT-4o-mini + CLIP pipeline. A multimodal retrieval-augmented generation (RAG) stack backed by FAISS grounds responses and navigation decisions in location context.
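The voice front end of such a system is straightforward to prototype: a wake-word engine gates the microphone, and a short recording captured after the trigger is transcribed before being handed off to the language model. The sketch below is illustrative only and is not the repo's actual code; it assumes a Picovoice access key, a custom "Hey Spot" keyword file (hey_spot.ppn), and the openai-whisper package, and it uses a fixed-length capture for simplicity.

```python
"""Minimal sketch of a wake-word -> speech-to-text front end (illustrative;
names, files, and the fixed-length capture are assumptions, not repo code)."""
import struct
import wave

import pvporcupine              # Picovoice wake-word engine
import whisper                  # openai-whisper speech-to-text
from pvrecorder import PvRecorder

ACCESS_KEY = "YOUR_PICOVOICE_KEY"   # placeholder
KEYWORD_PATH = "hey_spot.ppn"       # assumed custom "Hey Spot" keyword file
UTTERANCE_SECONDS = 5               # fixed-length capture after the wake word


def listen_for_command() -> str:
    porcupine = pvporcupine.create(access_key=ACCESS_KEY,
                                   keyword_paths=[KEYWORD_PATH])
    recorder = PvRecorder(frame_length=porcupine.frame_length)
    stt = whisper.load_model("base")
    recorder.start()
    try:
        # Block until the wake word is detected in the audio stream.
        while True:
            pcm = recorder.read()
            if porcupine.process(pcm) >= 0:
                break
        # Record a short utterance and write it out as 16-bit mono PCM.
        frames = []
        n_frames = int(UTTERANCE_SECONDS * porcupine.sample_rate
                       / porcupine.frame_length)
        for _ in range(n_frames):
            frames.extend(recorder.read())
        with wave.open("utterance.wav", "wb") as f:
            f.setnchannels(1)
            f.setsampwidth(2)
            f.setframerate(porcupine.sample_rate)
            f.writeframes(struct.pack(f"<{len(frames)}h", *frames))
        # Transcribe the utterance; the text would then go to the LLM layer.
        return stt.transcribe("utterance.wav")["text"]
    finally:
        recorder.stop()
        porcupine.delete()


if __name__ == "__main__":
    print(listen_for_command())
```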
Technically, Spotty requires Python 3.8+, the Spot SDK, and OpenAI and Picovoice API keys. The repo (github.com/vocdex/SpottyAI) includes orchestrator modules (UnifiedSpotInterface, GraphNav Interface, Audio Interface, Vision System, RAG Annotation) and utility scripts to record maps, auto-label waypoints with CLIP prompts, build vector databases, and visualize setups. Example voice commands cover activation, movement, object search, and visual question answering ("What do you see?"). Distributed under the MIT license and intended for research and education, Spotty demonstrates a practical integration of LLMs, multimodal models, and robotic navigation, lowering the barrier to experimenting with conversational, context-aware mobile robots in real environments.
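To make the waypoint auto-labeling and vector-database steps concrete, here is a hedged sketch of how CLIP zero-shot classification and a FAISS index can be combined for location grounding. It assumes a per-waypoint camera snapshot on disk and the open-source clip and faiss packages; the label set, file paths, and function names are hypothetical and not taken from the repo.

```python
"""Illustrative sketch: CLIP-based waypoint labeling plus a FAISS index for
text-to-waypoint retrieval (assumed label set, paths, and helper names)."""
from typing import Dict, List, Tuple

import clip
import faiss
import numpy as np
import torch
from PIL import Image

LABELS = ["kitchen", "hallway", "office", "meeting room", "lab"]  # assumed

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
text_tokens = clip.tokenize([f"a photo of a {l}" for l in LABELS]).to(device)


def label_waypoint(image_path: str) -> str:
    """Zero-shot classify one waypoint snapshot against the candidate labels."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text_tokens)
        probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]
    return LABELS[int(probs.argmax())]


def build_waypoint_index(snapshots: Dict[str, str]) -> Tuple[faiss.Index, List[str]]:
    """Embed each waypoint snapshot and store it in a FAISS index so that
    later queries ("go to the kitchen") can be matched by similarity."""
    ids, vectors = [], []
    for waypoint_id, image_path in snapshots.items():
        image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
        with torch.no_grad():
            emb = model.encode_image(image).cpu().numpy()[0]
        ids.append(waypoint_id)
        vectors.append(emb / np.linalg.norm(emb))  # cosine similarity via inner product
    xb = np.stack(vectors).astype("float32")
    index = faiss.IndexFlatIP(xb.shape[1])
    index.add(xb)
    return index, ids


def find_waypoint(query: str, index: faiss.Index, ids: List[str]) -> str:
    """Retrieve the waypoint whose snapshot best matches a text query."""
    with torch.no_grad():
        q = model.encode_text(clip.tokenize([query]).to(device)).cpu().numpy()[0]
    q = (q / np.linalg.norm(q)).astype("float32")
    _, neighbors = index.search(q[None, :], 1)
    return ids[neighbors[0][0]]
```

In a full pipeline the retrieved waypoint ID would be handed to GraphNav for navigation, and the same index could supply location context to the RAG layer when answering questions.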