A Comprehensive Survey on Trustworthiness in Reasoning with LLMs (arxiv.org)

🤖 AI Summary
A comprehensive new survey examines the trustworthiness of reasoning with Large Language Models (LLMs), focusing on chain-of-thought (CoT) techniques that prompt models to produce intermediate reasoning steps. By improving accuracy and interpretability on tasks such as language understanding, problem solving, and code generation, long chain-of-thought (Long-CoT) reasoning represents a significant leap in LLM capabilities. However, the survey highlights a crucial gap: despite these advances, the implications of CoT-based reasoning for model trustworthiness remain underexamined across five key dimensions: truthfulness, safety, robustness, fairness, and privacy. The paper systematically reviews recent research in chronological order, analyzing how reasoning methods affect each of these trust factors. While CoT strategies show promise in reducing hallucinations, detecting harmful content, and improving robustness, the survey finds that current reasoning models still face serious vulnerabilities, particularly in safety and privacy. This tension underscores an ongoing challenge for the AI/ML community: advances in interpretability and reasoning do not automatically translate into improved security or fairness. By synthesizing these insights and outlining future research directions, the work serves as a useful resource for AI safety researchers working to build more trustworthy reasoning systems with LLMs.