Internalizing Self-Consistency in LMs: Multi-Agent Consensus Alignment (arxiv.org)

🤖 AI Summary
Researchers introduce Multi-Agent Consensus Alignment (MACA), a reinforcement-learning post-training method that teaches language models to internalize self-consistency by favoring reasoning trajectories that align with their own multi-agent consensus. Instead of relying on inference-time fixes (e.g., majority voting over independent samples), MACA creates deliberative multi-agent exchanges in which peers ground their reasoning in each other's arguments. The framework uses majority/minority outcomes from these debates as a learning signal, encouraging models to adopt concise, decisive chains of thought that reflect internal consensus rather than disparate, exploratory paths.

Technically, MACA formalizes self-consistency as an intrinsic property of the model and trains it to prefer trajectories consistent with peer-grounded consensus, using reinforcement-learning rewards derived from multi-agent outcomes.

This unsupervised self-alignment yields substantial gains across reasoning benchmarks: +27.6% on GSM8K (self-consistency), +23.7% on single-agent MATH, +22.4% Pass@20 on MATH for sampling-based inference, and +42.7% for multi-agent ensemble decisions on MathQA. It also generalizes to unseen tasks (+16.3% on GPQA, +11.6% on CommonsenseQA), demonstrating that internalized consensus can unlock latent reasoning capability without external supervision and improve both single-agent and collaborative decision-making.
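The majority/minority learning signal can be sketched concretely. The following is a minimal illustration, not the paper's implementation: it collapses a deliberation round to a set of final answers (the real framework grounds each trajectory in peers' arguments), and the helper names (`consensus_rewards`, `normalized_advantages`) and the ±1 reward scheme are hypothetical stand-ins for MACA's actual reward derivation.

```python
from collections import Counter
from statistics import mean, pstdev

def consensus_rewards(answers: list[str]) -> list[float]:
    """Assign +1 to trajectories whose final answer matches the round's
    majority answer and -1 otherwise (the majority/minority signal)."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else -1.0 for a in answers]

def normalized_advantages(rewards: list[float]) -> list[float]:
    """Turn raw consensus rewards into zero-mean advantages, the form a
    policy-gradient update would typically consume."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a unanimous round
    return [(r - mu) / sigma for r in rewards]

# Toy deliberation round: five peers answer the same question.
answers = ["42", "42", "41", "42", "17"]
rewards = consensus_rewards(answers)         # [1.0, 1.0, -1.0, 1.0, -1.0]
advantages = normalized_advantages(rewards)  # majority trajectories reinforced

print(rewards)
print(advantages)
```

Normalizing to zero-mean advantages is a common design choice in this style of RL post-training: majority-aligned trajectories are reinforced, minority ones are penalized, and no external labels are needed anywhere in the loop.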