🤖 AI Summary
Researchers introduce Program of Thoughts (PoT), a prompting method that separates symbolic computation from natural-language reasoning to improve numerical problem solving in language models. Instead of chain-of-thought (CoT) prompting, where the model performs both reasoning and arithmetic inside text, PoT prompts code-generating models (primarily Codex) to emit a short program encoding the reasoning steps; an external executor runs the program to perform the arithmetic and return the result. Evaluated in few-shot and zero-shot settings across five math datasets (GSM8K, AQuA, SVAMP, TabMWP, MultiArith) and three financial QA sets (FinQA, ConvFinQA, TAT-QA), PoT yields roughly a 12% average accuracy gain over CoT. Combined with self-consistency decoding over multiple sampled programs, PoT reaches state-of-the-art on the math benchmarks and near-SOTA on the financial tasks.
The approach matters because it reduces numeric hallucination and precision errors by outsourcing computation to a deterministic executor while preserving the model's high-level reasoning. Technically, PoT leverages the code-generation strengths of LMs to make reasoning interpretable, debuggable, and reproducible; it also enables ensembling and self-consistency over program outputs, plus the option to swap in stronger execution engines. Trade-offs include reliance on program-synthesis quality and executor correctness, and potential limits for problems not easily expressed as short programs, but PoT signals a practical path to more reliable numerical reasoning with LMs.
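The pipeline the summary describes (sample candidate programs from the model, execute them externally, then majority-vote the results for self-consistency) can be sketched roughly as below. The word problem, the hard-coded program strings, and the `run_program` helper are hypothetical illustrations standing in for real model samples, not the paper's actual prompts or code:

```python
from collections import Counter

# Hypothetical PoT-style program samples, as a code model might emit them
# for: "Alice has 3 bags of 12 apples and gives away 7; how many remain?"
program_samples = [
    "ans = 3 * 12 - 7",
    "total = 3 * 12\nans = total - 7",
    "ans = 3 * 12 - 7",
]

def run_program(src: str):
    """Execute one candidate program; arithmetic is done by the
    interpreter (the external executor), not by the language model."""
    scope = {}
    exec(src, {}, scope)
    return scope.get("ans")

# Self-consistency: execute every sample and take the majority answer.
answers = [run_program(p) for p in program_samples]
final = Counter(answers).most_common(1)[0][0]
print(final)  # → 29
```

In a real deployment the `program_samples` list would come from temperature-sampled completions of a code LM, and the executor would be sandboxed rather than a bare `exec`.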