More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models (arxiv.org)

🤖 AI Summary
Recent research reveals a counterintuitive link between reasoning-trajectory length and position bias in models answering multiple-choice questions (MCQs). Although chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 were expected to mitigate bias through more deliberate reasoning, the study finds that longer reasoning trajectories are associated with higher position bias scores (PBS) across configurations. Testing thirteen models on datasets including MMLU and ARC-Challenge, the authors show that as the reasoning trajectory lengthens, the preference for position-favored options intensifies, with PBS values ranging from 0.11 to 0.41 in most configurations.

This finding matters for the AI/ML community because it challenges the assumption that improved reasoning automatically safeguards against biased outputs. Rather than eliminating position bias, added reasoning may merely shift how the bias manifests, pointing to a need for more rigorous evaluation practices. The researchers propose a diagnostic toolkit, including PBS and truncation probes, to audit and better understand this length-driven bias. Overall, the study underscores the importance of re-examining how reasoning capability is interpreted in bias assessments and suggests that models may be less order-robust than previously thought in MCQ evaluation frameworks.
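The summary does not spell out how the paper defines its position bias score, so the helper below is only a hedged sketch of the general idea: it scores positional preference as the total-variation distance between the empirical distribution of chosen option slots (collected under shuffled option orderings) and the uniform distribution. The function name and the exact metric are illustrative assumptions, not the paper's formula.

```python
from collections import Counter

def position_bias_score(chosen_positions, n_options=4):
    """Toy position-bias measure (assumed, not the paper's PBS):
    total-variation distance between the empirical distribution of
    chosen option slots and the uniform distribution over slots.
    Returns 0.0 for no positional preference; grows toward the
    maximum as the model keeps picking the same slot."""
    counts = Counter(chosen_positions)
    n = len(chosen_positions)
    uniform = 1.0 / n_options
    return 0.5 * sum(abs(counts.get(p, 0) / n - uniform)
                     for p in range(n_options))

# Example: across 10 trials with shuffled option orders, a biased
# model picks slot 0 in 7 of them.
biased = position_bias_score([0, 0, 0, 1, 0, 2, 0, 0, 3, 0])  # 0.45
balanced = position_bias_score([0, 1, 2, 3])                   # 0.0
```

Under this toy metric, comparing the score for answers produced with short versus long reasoning traces would mimic the paper's length-versus-bias comparison, though the authors' actual PBS and truncation probes may be defined quite differently.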