Experimenting with ACL2 and Claude Code (mikedodds.org)

0 points 6 hours ago ago | visit original

🤖 AI Summary

In a short, hands‑on experiment the author used Claude Code (by prompting only, with zero lines of hand-written code) to produce 50+ ACL2 theorem proofs lifted and translated from Software Foundations book 1 and to build an MCP server in Python for iterative ACL2 development. The MCP server (15 helper tools) implements persistent, stateful solver sessions with checkpoint/rollback support so you can save prover state, try risky proof steps, and revert if needed. Total time was roughly 3–4 hours; Claude Code also generated feedback-driven improvements and two rounds of security audits (though the setup remains unsafe without containerization). The result is notable because ACL2 is a lower‑representation ITP compared with systems like Lean, yet the model learned syntax and proof patterns via in‑context learning well enough to handle introductory induction, list, and polymorphic-style theorems. A revealing technical highlight was the fold-product-append proof: success required selective theory control—temporarily disabling commutativity-of-* during the main induction and re‑enabling commutativity/associativity only at a specific subgoal—illustrating how aggressive rewriting can obscure needed algebraic structure and why proof engineering often needs careful orchestration. The experiment suggests large models can bridge representation gaps for entry‑level formal work, but scaling to research‑level mathematics or industrial verification (and secure deployment) remains an open question.

Loading comments...

loading comments...