Show HN: DOS – a referee between AI agents that doesn't believe their "done" (github.com)

🤖 AI Summary
DOS, a new tool shown on HN, acts as a referee for AI agents that claim to have completed tasks. Instead of trusting an agent's assertion that a task is "done," DOS verifies the claim against real-world evidence, primarily the git history. For instance, when an agent states that it has shipped a login endpoint, DOS checks the actual git commits to confirm that the work is really there. The verdict is reported as an exit code: if the claim is backed by a commit, DOS exits 0 (SHIPPED); otherwise it exits 1 (NOT_SHIPPED). This reduces the risk of unchecked failures and misreporting. For the AI/ML community, the significance of DOS is that it improves workflow reliability by preventing agents from grading their own work. Integrated into CI/CD pipelines, it lets teams manage multiple agents working simultaneously, keeping accountability accurate and reducing silent failures and overlapping work. The open-source tool is written in Python and requires minimal setup, which makes it easy to adopt. It also flags stalled processes and discrepancies between what agents claim and what was actually done, which helps with overall task management and software quality.
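The exit-code idea described above can be sketched in a few lines of Python. This is not DOS's actual implementation or API: the function names `commit_subjects` and `referee`, and the matching rule (a claim counts as shipped only if its ID appears in some commit subject), are assumptions made for illustration. Only the 0 (SHIPPED) and 1 (NOT_SHIPPED) exit codes come from the summary.

```python
import subprocess

# Exit codes as described in the summary.
SHIPPED, NOT_SHIPPED = 0, 1


def commit_subjects(repo_dir="."):
    """Return the subject line of every commit reachable from HEAD."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%s"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def referee(claim_id, subjects):
    """Hypothetical rule: the claim is backed only if a commit subject mentions its ID."""
    wanted = claim_id.lower()
    backed = any(wanted in subject.lower() for subject in subjects)
    return SHIPPED if backed else NOT_SHIPPED
```

In a CI step, a wrapper script could call `sys.exit(referee("AUTH-12", commit_subjects()))` (with `AUTH-12` as a made-up claim ID) so that the pipeline fails whenever an agent's "done" has no commit behind it. Keeping the decision in `referee` separate from the git call means the rule can be checked against plain lists of commit subjects, without a repository.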