LLMs achieve adult human performance on higher-order "theory of mind" tasks (pmc.ncbi.nlm.nih.gov)

0 points 140 days ago

🤖 AI Summary

Recent research reports that large language models (LLMs) such as GPT-4 and Flan-PaLM reach, or come close to, adult human performance on higher-order theory of mind (ToM) tasks, which require reasoning about nested mental states. The study introduces a new benchmark, Multi-Order Theory of Mind Q&A (MoToMQA), which tests LLMs on ToM reasoning from orders 2 to 6, alongside a human dataset. Notably, GPT-4 exceeded adult performance on 6th-order inferences. For the AI/ML community, the findings suggest that LLMs may be capable of sophisticated social reasoning comparable to that of humans, beyond language processing alone. This could inform the design of more effective and personalized social AI agents, with applications in education, therapy, and social interaction. The study also identifies an interplay between model size and fine-tuning, which suggests that further refinement of LLMs could lead to more advanced cognitive capabilities and broaden how these systems interact with users in complex environments.
