Microsoft Research: LLMs Corrupt your files during delegated work (www.microsoft.com)

🤖 AI Summary
Microsoft Research reports that Large Language Models (LLMs) can significantly corrupt documents during delegated tasks. In a study using DELEGATE-52, which simulates long workflows requiring detailed document editing across various professional fields, researchers found that even advanced models such as Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 corrupted about 25% of document content. The problem is more pronounced with larger documents, longer interactions, and the presence of distractor files. Attempts to use agentic tools did not improve performance, which points to fundamental limitations in how LLMs handle delegated tasks. For the AI/ML community, the findings call into question the viability of LLMs for critical knowledge work, where accuracy is paramount, and suggest that reliability must improve for tasks that demand high precision over lengthy interactions. As reliance on AI in professional settings grows, addressing these weaknesses becomes essential for building trust in AI-assisted workflows.