Can coding agents build complex systems? (technicaldeft.com)

🤖 AI Summary
The AI coding agent Claude Code (Opus 4.1) was tested on a demanding, architecturally rich task: implementing a relational database server in Ruby, guided by a SQL test suite of several hundred statements from the author's book. Run with claude --permission-mode acceptEdits and a project CLAUDE.md, the agent made steady progress, reached a green test suite faster than the human author had, and iteratively debugged failures, often by guessing at fixes, adding logging, or writing reproduction scripts. The task exercised real database responsibilities (parsing SQL, building an AST, validating references and types, and executing queries against a storage engine), which makes it a useful yardstick for autonomous coding agents.

The experiment revealed clear strengths and notable limitations.

Strengths:
- Rapid implementation and persistent debugging.
- Kept the test suite green as the system's complexity grew.

Weaknesses:
- Poor code quality: long methods and redundant comments.
- Inconsistent or confusing abstractions: the tokenizer and parser were conflated, and the QueryPlanner was mis-scoped.
- Unsafe patterns: broad rescue blocks and hash lookups that silently return nil (see the sketch below).
- Fragile, regex-heavy parsing with SQL-injection and edge-case risks.
- Substantial dead code: coverband found roughly 500 unused lines.
- Difficulty following workflow instructions, such as making separate commits and maintaining refactoring discipline.

Conclusion: agents can build functioning systems for tested behaviors and prototypes, but not reliable, maintainable, production-grade architectures without stronger automated feedback loops (linters, coverage, stricter APIs) and sustained human oversight.
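To ground the "unsafe patterns" critique, here is a minimal Ruby sketch, not the article's actual code: all method and error names (execute_broad, execute_narrow, QueryError) are hypothetical, chosen only to contrast the two patterns the summary names with safer idioms.

```ruby
# Hypothetical sketch, not the article's code: the two unsafe Ruby
# patterns called out in the review, next to safer alternatives.

QueryError = Class.new(StandardError) # illustrative error class

# 1. Broad rescue: a bare `rescue` catches StandardError and every
# subclass, so parser bugs and even typos (NameError) are swallowed.
def execute_broad(sql)
  raise ArgumentError, "empty statement" if sql.strip.empty?
  "executed: #{sql}"
rescue => e
  nil # every failure becomes nil, far from its cause
end

# Safer: rescue only the errors this layer can handle, and re-raise
# with context so callers see a meaningful failure.
def execute_narrow(sql)
  raise ArgumentError, "empty statement" if sql.strip.empty?
  "executed: #{sql}"
rescue ArgumentError => e
  raise QueryError, "bad statement: #{e.message}"
end

# 2. Hash lookups that return nil: a misspelled key yields nil and the
# error surfaces later; Hash#fetch raises KeyError at the mistake.
row = { name: "users", column_count: 3 }
row[:colum_count]        # => nil, typo silently hidden
row.fetch(:column_count) # => 3; a typo here raises KeyError at once
```

The design point is that nil-on-failure and catch-all rescues pass a green test suite while pushing errors away from their causes, which is exactly the maintainability gap the article describes.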