The quality of AI-assisted software depends on unit of work management (blog.nilenso.com)

🤖 AI Summary
AI-assisted software quality hinges less on raw model intelligence than on managing the "unit of work" and its context. The article argues that context engineering (how much of the code, docs, and requirements is presented to an LLM, and which parts) determines output quality, because models generate token by token against a finite context window. Small, focused units reduce hallucination and brittle integration failures; large or noisy contexts dilute attention.

The author quantifies how errors compound: an agent with a 5% per-action error rate succeeds on a 10-turn task only ~59.9% of the time (0.95^10 ≈ 0.599; see the sketch below). The author also argues that METR's reported long-horizon wins (e.g., GPT-5 completing ~2-hour tasks at ~70% success) understate real-world "messiness": METR's tasks average 3.2 on its 16-point messiness scale, versus 7–13 for typical software work, and adjusting for that could cut the success rate to ~40%.

The proposed remedy is to break projects into "right-sized" units that are human-legible and deliver business value: user stories augmented with additional context to guide agents, plus verifiable, human-readable checkpoints that limit error propagation. Planning modes in agents are useful but often provide only technical scaffolding; the missing piece is a unit-of-work spec that nudges agents to gather the right context for business outcomes (a hypothetical sketch follows below). To explore this, the author launched StoryMachine, an experiment that turns PRDs and tech specs into story cards and will iterate on evaluation metrics to find unit descriptions that make AI-assisted development more reliable and auditable.
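To make the compounding claim concrete, here is a minimal sketch of the arithmetic, assuming independent per-action outcomes; the ~40% messiness-adjusted figure is the author's estimate and does not fall out of this formula alone:

```python
def task_success(per_action_success: float, n_actions: int) -> float:
    """Probability an agent completes n actions with zero errors,
    assuming each action succeeds independently with the same probability."""
    return per_action_success ** n_actions

# A 5% per-action error rate compounds quickly over a 10-turn task:
print(f"{task_success(0.95, 10):.3f}")  # 0.599, i.e. ~59.9% task success
```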
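And a hypothetical illustration of what a unit-of-work spec might contain, pairing a human-legible story with context pointers and verifiable checkpoints; the `UnitOfWork` name, fields, and example values are assumptions for illustration, not the article's or StoryMachine's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class UnitOfWork:
    # Human-legible user story tied to a business outcome.
    story: str
    # Code and docs the agent should pull into its context window.
    context_refs: list[str] = field(default_factory=list)
    # Verifiable, human-readable checkpoints that bound error propagation.
    acceptance_checks: list[str] = field(default_factory=list)

unit = UnitOfWork(
    story="As a shopper, I can remove an item from my cart",
    context_refs=["cart_service.py", "docs/cart-api.md"],
    acceptance_checks=[
        "DELETE /cart/{item_id} returns 204",
        "cart total reflects the removal",
    ],
)
```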