🤖 AI Summary
Recent days have sharpened a practical divide in AI-for-mathematics: “informal” systems that ask LLMs to write human-style proofs (which then rely on human graders) versus “formal” hybrids in which an LLM generates code for a proof assistant such as Lean and the checker verifies correctness. The hybrid route has produced striking headlines (e.g., the Erdős problem 707 development using ChatGPT+Lean), but the author warns against overclaiming: LLMs are sycophantic, rarely say “I don’t know,” hallucinate proofs containing a single fatal error, and are already flooding journals with plausible-but-wrong papers. Numeric-answer benchmarks that mark a problem solved if any agent ever hits the right number likewise give a misleading impression of real mathematical progress. A minimal sketch of the formal workflow follows.
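To make the hybrid loop concrete, here is a minimal sketch using only core Lean 4 (no mathlib); the theorem name and example are illustrative, not from the source. The point is that the kernel either accepts the whole file or rejects it, so a hallucinated proof cannot slip through.

```lean
-- Minimal sketch of the LLM+proof-checker loop, in core Lean 4.
-- An LLM proposes a statement and a candidate proof term; Lean's
-- kernel either accepts the file or rejects it. No partial credit.

theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A hallucinated proof term (say, `Nat.add_comm b b`) would fail to
-- type-check here, and the file would not compile -- there is no
-- human grader to persuade.
```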
Technically, the biggest bottleneck for formal approaches is not proof search but coverage: most modern research-math concepts (Tate–Shafarevich groups, Calabi–Yau varieties, algebraic stacks, automorphic representations, etc.) are not yet encoded in major libraries like mathlib, so many theorems cannot even be stated formally. The candidate fixes each carry trade-offs: manual formalization by mathematicians (slow, and undervalued in academia), LLM-generated definitions (which risk being subtly incorrect while still type-checking), and scaled-up community funding for mathlib. The author’s group, supported by a grant, is manually encoding research-level definitions in Lean; the failure mode they are guarding against is sketched below. Bottom line: LLM+proof-checker hybrids are the most promising near-term path, but broad, reliable AI assistance will require a large, careful investment in formalizing modern mathematical definitions.
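A hedged illustration of both problems, again in core Lean 4. `TateShafarevichWrong` is a made-up placeholder, not a real mathlib definition; the sketch shows that Lean happily accepts a definition that type-checks yet means nothing, so theorems proved about it are vacuous.

```lean
-- The coverage problem: a theorem cannot even be stated if its
-- definitions are missing. Uncommenting the line below fails, since
-- neither `EllipticCurve` nor `TateShafarevich` exists in this file:
--
-- theorem sha_finite (E : EllipticCurve) : Finite (TateShafarevich E) := sorry

-- The subtler risk: an LLM can invent a definition that type-checks
-- but is mathematically wrong. Here the "group" is just the empty
-- type (a hypothetical stand-in, NOT a mathlib definition):
def TateShafarevichWrong : Type := Empty

-- Statements about the empty type are vacuously provable, so the
-- checker is satisfied while the mathematics says nothing:
theorem sha_vacuous (x : TateShafarevichWrong) : False :=
  Empty.elim x
```

This is why the author favors careful manual formalization: the type checker guarantees internal consistency, not that a definition matches the mathematical object it is named after.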