🤖 AI Summary
A recent study critically evaluated the performance of Large Language Models (LLMs), specifically Gemini 2.5 Pro, in generating evidence synthesis reviews for academic publishing. The authors prompted the LLM to produce a scoping review on the neural mechanisms of cross-education and submitted the unedited manuscript for peer review by four experts in the field. The reviewers' feedback revealed significant shortcomings, including the model's inability to clearly identify research questions, follow methodological frameworks, or conduct trustworthy literature searches. Notably, the reviewers highlighted problems with referencing accuracy and potential plagiarism, with the LLM showing a bias towards open-access sources and failing to present evidence hierarchically.
These findings raise important concerns for the AI/ML community, particularly as LLMs are increasingly used in scientific writing and research. The model's inability to independently produce a coherent and reliable manuscript underscores the need for human oversight in the academic publishing process. As LLMs continue to evolve, understanding their limitations will be vital for developing responsible guidelines and improving their integration into scholarly work, while preserving the integrity, accuracy, and ethical standards of academic literature.