Code research projects with async coding agents like Claude Code and Codex (simonwillison.net)

0 points 17 hours ago | visit original

🤖 AI Summary

Simon Willison describes a practical pattern for "code research" with asynchronous coding agents (Claude Code, Codex and similar): fire-and-forget agents that run experiments, commit their work and open pull requests against a dedicated GitHub repository. Rather than asking for explanations, you give the agent a clear research goal and an environment where it can fetch dependencies, run tests and produce artifacts (reports, charts, JSON, code). Because the agents execute code, mistakes and hallucinations become less harmful: if the code runs and the tests pass, the result is evidence rather than just prose. As a result, Willison is running multiple projects a day with minimal supervision. The setup favors a separate repository, public or private, with full network access so agents can install packages, fetch web data and, when needed, run heavy workflows (e.g., using Emscripten to compile C extensions to WebAssembly). Examples include a benchmark that found cmarkgfm fastest among Python Markdown libraries, a port of cmarkgfm to Pyodide (compiling a wheel and loading it in Node.js), and a scikit-learn tag-suggestion pipeline that produced JSON results and scripts. Caveats: agents can't prove that something is impossible, they still hallucinate and need human review, and unrestricted network access raises prompt-injection and data-leak risks. Hence the recommendation to quarantine the work in repos that hold nothing sensitive and to review outputs before publication.
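To make the "result is evidence rather than just prose" point concrete, here is a minimal sketch of the kind of benchmark harness an agent might commit for a research task like the Markdown comparison. It is not Willison's actual code. The `benchmark` helper and the two stand-in renderers (`render_replace`, `render_join`) are hypothetical and use only the standard library; in a real run you would swap in calls to the libraries under test.

```python
import json
import timeit
from typing import Callable, Dict


def benchmark(renderers: Dict[str, Callable[[str], str]], text: str,
              repeat: int = 5, number: int = 100) -> Dict[str, float]:
    """Return the best-of-`repeat` seconds per call for each renderer."""
    results = {}
    for name, render in renderers.items():
        # timeit.repeat returns one total per round; the minimum is the
        # round least disturbed by other activity on the machine.
        timings = timeit.repeat(lambda: render(text), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


# Hypothetical stand-ins (assumption): placeholders for real Markdown
# libraries so the sketch runs without installing anything.
def render_replace(text: str) -> str:
    return "<p>" + text.replace("\n\n", "</p><p>") + "</p>"


def render_join(text: str) -> str:
    return "".join(f"<p>{block}</p>" for block in text.split("\n\n"))


if __name__ == "__main__":
    sample = "Hello *world*\n\n" * 200
    results = benchmark({"replace": render_replace, "join": render_join}, sample)
    ranked = sorted(results.items(), key=lambda item: item[1])
    # Emit machine-readable output that can be committed alongside the report.
    print(json.dumps({"fastest": ranked[0][0], "seconds_per_call": results}, indent=2))
```

Writing the timings out as JSON means a reviewer can re-run the script and compare numbers instead of trusting a written claim about which library is fastest.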
