Accumulating Context Changes the Beliefs of Language Models (lm-belief-change.github.io)

🤖 AI Summary
Researchers demonstrate that as language-model assistants accumulate context, either through multi-turn conversation (“talking”) or by ingesting long texts (“reading”), their internal “belief profiles” (what they state and how they act) can drift substantially and silently. Using a framework that measures both stated beliefs (direct answers) and behaviors (tool use in agentic systems), the paper shows systematic, model- and task-dependent shifts across multiple LMs. Intentional tasks like debate and persuasion produce large immediate belief changes, while non-intentional tasks (reading, research) yield smaller but meaningful drift. In conversations, stated beliefs often shift early (within 2–4 turns) while behavioral changes accumulate more slowly (up to ~10 turns). Different models show distinct vulnerability patterns, and embedding analyses indicate shifts are driven more by broad contextual framing than by exposure to specific facts: masking semantically relevant sentences does not eliminate the effect.

This finding is significant because it reveals a hidden reliability and alignment risk for persistent, memory-capable assistants and agentic systems: user trust can build even as model opinions and actions diverge from prior behavior or intended constraints. Practically, this calls for new monitoring, memory-management, and alignment interventions that detect and correct cumulative framing effects, tighter evaluation protocols for long-term deployments, and careful design of agentic tool use to prevent slow, unobserved belief drift.
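The paper's framework pairs stated-belief probes with behavioral probes. As a rough illustration of the stated-belief side only (not the authors' code), the sketch below embeds a model's answer to the same belief question after each turn and tracks its cosine distance from the initial answer; the embedding model choice, the `belief_drift` helper, and the 0.25 threshold are assumptions for illustration.

```python
# Minimal drift-monitoring sketch (illustrative, not the paper's framework):
# re-elicit the same stated-belief answer after each conversation turn,
# embed the answers, and flag when the answer drifts far from turn 0.
import numpy as np
from sentence_transformers import SentenceTransformer

# Any sentence encoder works; this model name is an arbitrary choice.
_encoder = SentenceTransformer("all-MiniLM-L6-v2")


def belief_drift(stated_answers: list[str], threshold: float = 0.25) -> list[dict]:
    """Return per-turn cosine distance from the initial stated belief.

    stated_answers[0] is the answer elicited before context accumulates;
    later entries are re-elicitations after each turn or ingested document.
    The threshold is an arbitrary illustrative cutoff, not a calibrated value.
    """
    embs = _encoder.encode(stated_answers, normalize_embeddings=True)
    baseline = embs[0]
    report = []
    for turn, emb in enumerate(embs):
        distance = float(1.0 - np.dot(baseline, emb))  # cosine distance (unit vectors)
        report.append({"turn": turn, "distance": distance, "drifted": distance > threshold})
    return report


if __name__ == "__main__":
    # Toy example: answers to the same question re-asked across turns.
    answers = [
        "Remote work is generally good for productivity.",
        "Remote work is generally good for productivity, with some caveats.",
        "On balance, remote work tends to hurt team productivity.",
    ]
    for row in belief_drift(answers):
        print(row)
```

In a real deployment this kind of check would be paired with behavioral probes (e.g., re-running the same tool-use scenario at each checkpoint), since the paper reports that stated and behavioral drift follow different timelines.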