Gacua: An open-source computer use agent with one-command start (github.com)

🤖 AI Summary
GACUA (Gemini CLI as Computer Use Agent) is a newly released open-source computer use agent that starts with a single command and lets users control and automate tasks on their computers. It is built on Node.js and pairs Gemini 2.5 Pro with an "Image Slicing + Two-Step Grounding" method for more precise grounding. Execution is transparent and step by step: users can review, accept, or reject each proposed action. This contrasts with typical black-box agents and gives users more trust in, and oversight of, what the agent does.

A standout feature is the decoupled architecture, which separates the "Brain" (which processes commands and requires Gemini API access) from the "Body" (which executes commands on the controlled machine). This design allows remote operation across devices on the same network, or across different networks given a stable connection, without conflicts over direct peripheral control. GACUA runs as a local web server, so users can manage their computer from a mobile phone or another device.

The project also supports extensibility through a pluggable agent architecture and autonomous sub-task tool creation for efficiency, with a CLI mode and advanced prompt management planned. GACUA is licensed under Apache 2.0 and is aimed at developers and enthusiasts who want accessible, transparent, and flexible AI-driven automation.
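The review-then-execute flow and the Brain/Body split described above can be illustrated with a minimal TypeScript sketch. Everything here is an assumption made for illustration: the names `ProposedAction`, `AgentBrain`, `AgentBody`, and `runWithReview` are hypothetical and are not taken from GACUA's codebase. The brain is a scripted stub instead of a Gemini call, and the body only records actions rather than driving a mouse or keyboard.

```typescript
// Hypothetical types for illustration only; not GACUA's actual API.
type ProposedAction =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string };

type ReviewDecision = "accept" | "reject";

// The "Brain" decides what to do next (in a real agent, by calling a model).
interface AgentBrain {
  propose(task: string, history: string[]): ProposedAction | null;
}

// The "Body" carries out an accepted action on the controlled machine.
interface AgentBody {
  execute(action: ProposedAction): string;
}

// Ask the brain for one action at a time and execute it only if the
// reviewer accepts it. Rejections are logged so the brain can see them.
function runWithReview(
  task: string,
  brain: AgentBrain,
  body: AgentBody,
  review: (action: ProposedAction) => ReviewDecision,
  maxSteps = 20,
): string[] {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = brain.propose(task, history);
    if (action === null) break; // the brain considers the task finished
    if (review(action) === "accept") {
      history.push(`done: ${body.execute(action)}`);
    } else {
      history.push(`rejected: ${JSON.stringify(action)}`);
    }
  }
  return history;
}

// Scripted stand-ins so the sketch runs without a model or real input devices.
const plannedActions: ProposedAction[] = [
  { kind: "click", x: 100, y: 200 },
  { kind: "type", text: "rm -rf /" },
  { kind: "type", text: "hello" },
];

const scriptedBrain: AgentBrain = {
  // One planned action per history entry, then null to signal completion.
  propose: (_task, history) => plannedActions[history.length] ?? null,
};

const executed: ProposedAction[] = [];
const recordingBody: AgentBody = {
  execute: (action) => {
    executed.push(action);
    return action.kind === "click"
      ? `clicked (${action.x}, ${action.y})`
      : `typed "${action.text}"`;
  },
};

// A reviewer that rejects one obviously dangerous input and accepts the rest.
const cautiousReviewer = (action: ProposedAction): ReviewDecision =>
  action.kind === "type" && action.text.includes("rm ") ? "reject" : "accept";

const reviewLog = runWithReview(
  "demo task",
  scriptedBrain,
  recordingBody,
  cautiousReviewer,
);
console.log(reviewLog);
```

In this sketch the reviewer is a plain function, but the same loop would work if the decision came from a person using a web page served to their phone. Because the brain and body sit behind separate interfaces, they could run on different machines, which is the property the summary attributes to the decoupled design.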