Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview (github.com)

0 points 61 days ago ago | visit original

🤖 AI Summary

Dirac, an open-source coding agent, has achieved remarkable success by topping the Terminal-Bench-2 leaderboard with a score of 65.2% while utilizing the Gemini-3-flash-preview model. This performance surpasses both Google's baseline of 47.6% and the closed-source agent Junie CLI, which scored 64.3%. The key to Dirac's efficiency lies in its innovative approach to context management; it leverages hash-anchored parallel edits and Abstract Syntax Tree (AST) manipulation to enhance reasoning accuracy while significantly cutting API costs by an average of 64.8%. What makes Dirac particularly noteworthy for the AI/ML community is its capacity to maintain high accuracy—achieving 100% correctness across various complex software tasks—while reducing operational costs. The agent's ability to perform multi-file edits in a single round trip reduces latency and streamlines the development process. Dirac can autonomously manage tasks such as file reading/writing and executing terminal commands while keeping human oversight in its approval-based workflow. With its open-source nature and reproducible benchmarks, Dirac promotes accessibility and innovation in the AI coding landscape, paving the way for advancements in tool efficiency and resource management.

Loading comments...

loading comments...