Experiments in Autonomous AI Development (kenforthewin.github.io)

🤖 AI Summary
An engineer who began 2025 skeptical of LLM coding tools built an experimental autonomous development platform, Matic, and ran end-to-end agent workflows on real repos to test whether AI can truly own software tasks. Matic is a Phoenix/Elixir web app that composes Agents into directed execution graphs; each Agent has a system prompt, a model (accessed via OpenRouter here), a toolset (including a Docker-hosted MCP server with bash, file editing/search, and a spawn_subagent tool), and structured outputs used to hand off work. A typical flow clones a repo, creates a branch, spawns Developer sub-agents to implement tasks, commits with AI-generated commit messages, and opens a PR, all without human coding intervention.

The results suggest small greenfield apps are now plausible to automate: a falling-sand cellular-automaton experiment completed in minutes at a cost of cents to a few dollars, with quality and resource use varying dramatically by model pair (Claude wrote richer docs and tests; GLM 4.6 was unexpectedly the cheapest and fastest; Sonnet 4.5 and GPT-5 reduce error rates).

Key limitations remain: context-window management, LLM myopia ("priming"), and coordination in large monorepos. The author proposes repo-native context artifacts (AGENTS.md, structured context docs) and smarter agent orchestration as the next steps. The takeaway: autonomous, end-to-end AI development is rapidly maturing for smaller tasks, shifting developer roles toward architecting agent workflows and context-management tooling.
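For illustration, here is a minimal Elixir sketch of the agent-graph idea the summary describes. All names are hypothetical: Demo.Agent, Demo.Graph, and the call_model callback are stand-ins, not Matic's actual API.

```elixir
# Hypothetical sketch; module and field names are illustrative, not Matic's API.
defmodule Demo.Agent do
  # Each agent bundles a system prompt, a model id (routed via OpenRouter),
  # a toolset (e.g. bash, file edit/search, spawn_subagent on an MCP server),
  # and a schema for the structured output it hands to the next node.
  defstruct [:name, :system_prompt, :model, :tools, :output_schema]
end

defmodule Demo.Graph do
  # A directed execution graph: nodes are agents, edges say whose
  # structured output feeds whom.
  defstruct nodes: %{}, edges: %{}

  def add(%__MODULE__{} = graph, %Demo.Agent{} = agent) do
    %{graph | nodes: Map.put(graph.nodes, agent.name, agent)}
  end

  def connect(%__MODULE__{} = graph, from, to) do
    %{graph | edges: Map.update(graph.edges, from, [to], &[to | &1])}
  end

  # Walk the graph from an entry node, threading each agent's structured
  # output into its successors as input. `call_model` stands in for the
  # actual LLM call and returns the agent's structured output.
  def run(%__MODULE__{} = graph, entry, input, call_model) do
    agent = Map.fetch!(graph.nodes, entry)
    output = call_model.(agent, input)

    case Map.get(graph.edges, entry, []) do
      [] -> output
      nexts -> Enum.map(nexts, &run(graph, &1, output, call_model))
    end
  end
end
```

Under the same assumptions, a Manager agent could hand a structured task list to a Developer sub-agent, loosely mirroring the clone → branch → implement → commit → PR flow described above (model ids are OpenRouter-style and purely indicative):

```elixir
# Hypothetical wiring: a manager plans, a developer implements.
manager = %Demo.Agent{name: :manager, system_prompt: "Break the task into subtasks.",
                      model: "anthropic/claude-sonnet-4.5", tools: [:spawn_subagent],
                      output_schema: :task_list}

developer = %Demo.Agent{name: :developer, system_prompt: "Implement the assigned subtask.",
                        model: "z-ai/glm-4.6", tools: [:bash, :edit_file, :search],
                        output_schema: :diff}

graph =
  %Demo.Graph{}
  |> Demo.Graph.add(manager)
  |> Demo.Graph.add(developer)
  |> Demo.Graph.connect(:manager, :developer)
```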