🤖 AI Summary
Andrew Gallagher warns against “vibe coding” unit tests with LLMs, that is, auto-generating large suites of tests without deliberate design. Using a React Button component as his example, he shows that coding agents (Claude and others) typically emit dozens of brittle, noisy tests (roughly 30 tests and 200 lines of code) that mostly assert “what the code does” (rendering, class names, forwarded attributes, event handler calls) rather than “what the code should do.” LLMs rarely ask clarifying questions and tend to verify every detail, producing bloated spec files that lock in implementation details, inflate PR size, and amplify maintenance burden.
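For illustration, here is a minimal sketch of the kind of implementation-coupled test the article cautions against, assuming a hypothetical Button component tested with React Testing Library and Jest (the component, class names, and assertions are assumptions, not code from Gallagher's post):

```tsx
import { render, screen } from "@testing-library/react";
import "@testing-library/jest-dom";
import { Button } from "./Button"; // hypothetical component under test

test("renders a primary button with default classes and forwarded attributes", () => {
  render(<Button aria-label="save">Save</Button>);
  const button = screen.getByRole("button");
  // Asserting exact class names couples the test to styling internals:
  // any CSS refactor breaks it without catching a real regression.
  expect(button).toHaveClass("btn", "btn-primary");
  // Re-verifying every forwarded attribute restates what the code does
  // rather than specifying what it should do.
  expect(button).toHaveAttribute("aria-label", "save");
});
```

Multiply this pattern across ~30 generated tests and the suite documents the current implementation rather than the intended behavior.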
This matters to the AI/ML and engineering community because autogenerated tests harm downstream workflows: they consume context-window and semantic-search budget for developer agents, surface as high-ranked but low-signal artifacts, and create fragile coverage that needs constant updating as the code evolves. Gallagher concedes that LLMs shine at abstract, algorithmic tests, but argues that for product code the right approach is human-guided testing: write focused tests one at a time, instruct the agent to test specific behaviors, validate each test, and prefer minimal, intent-driven assertions. The practical takeaway: use LLMs as assistants, not as test autopilots; favoring quality over quantity reduces brittleness, PR friction, and long-term tech debt.
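By contrast, a sketch of the focused, intent-driven style the summary recommends, again using an assumed Button API rather than code from the article:

```tsx
import { render, screen } from "@testing-library/react";
import userEvent from "@testing-library/user-event";
import { Button } from "./Button"; // hypothetical component under test

test("invokes onClick when the user activates the button", async () => {
  const onClick = jest.fn();
  render(<Button onClick={onClick}>Save</Button>);
  // Assert the behavior callers rely on, not rendering details.
  await userEvent.click(screen.getByRole("button", { name: "Save" }));
  expect(onClick).toHaveBeenCalledTimes(1);
});
```

One behavior per test, written or reviewed deliberately, keeps the suite small enough that a failure signals a genuine regression rather than a styling refactor.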