🤖 AI Summary
Researchers demonstrate that, given the same token and compute budget, running fewer reasoning chains that iteratively refine answers (sequential scaling) outperforms the dominant parallel self-consistency approach (many independent chains + majority vote). Across five state-of-the-art open-source LLMs and three challenging reasoning benchmarks, sequential refinement beat parallel sampling in 95.6% of tested configurations, with accuracy improvements as large as 46.7%. The comparison was done at matched compute/token budgets to isolate the effect of chaining style rather than extra resources, directly challenging the prevailing inference-time orthodoxy established by self-consistency decoding.
They also introduce inverse-entropy weighted voting, a simple training-free aggregation that weights candidate answers by the inverse entropy of their reasoning chains (i.e., lower-entropy chains get more weight). This mechanism further amplifies sequential scaling’s advantage over majority voting, suggesting that chain-level confidence is a useful proxy for correctness. The implications are practical and immediate: inference strategies should favor sequential refinement and entropy-aware aggregation when optimizing for reasoning accuracy under fixed compute, prompting a rethink of deployment choices (chain length vs. number of parallel samples), decoding algorithms, and test-time ensembling in LLM-based reasoning systems.
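The aggregation idea can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: it assumes each chain's entropy is approximated by the average negative log-probability of its sampled tokens (the full definition may use the model's complete per-step distributions), and the candidate names and values are invented for the example.

```python
import math
from collections import defaultdict

def chain_entropy(token_probs):
    """Approximate a chain's entropy as the average negative log-probability
    (in nats) of the tokens the model actually sampled. This is a common
    proxy when only sampled-token probabilities are available."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def inverse_entropy_vote(candidates, eps=1e-6):
    """candidates: list of (answer, token_probs) pairs, one per chain.
    Each answer's score is the sum of 1/entropy over the chains that
    produced it, so lower-entropy (more confident) chains count more."""
    scores = defaultdict(float)
    for answer, token_probs in candidates:
        h = chain_entropy(token_probs)
        scores[answer] += 1.0 / (h + eps)  # eps guards near-zero entropy
    return max(scores, key=scores.get)

# Hypothetical example: two uncertain chains agree on "42", while one
# highly confident (low-entropy) chain answers "41". Plain majority
# voting would pick "42"; inverse-entropy weighting picks "41".
chains = [
    ("42", [0.5, 0.4, 0.6]),
    ("42", [0.5, 0.5, 0.5]),
    ("41", [0.99, 0.98, 0.97]),
]
print(inverse_entropy_vote(chains))  # → 41
```

Note how the weighting lets a single confident chain outvote a larger but less certain majority, which is exactly the failure mode of unweighted majority voting that this aggregation targets.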