Culturally transmitted color categories in LLMs reflect efficient compression (arxiv.org)

🤖 AI Summary
Researchers tested whether large language models implicitly develop human-like semantic categories by examining color naming and cultural transmission under the Information Bottleneck (IB) complexity–accuracy tradeoff, a tradeoff under which human languages appear to achieve near-optimal compression. They replicated two classic behavioral paradigms with Gemini 2.0-flash and Llama 3.3-70B-Instruct: (1) an English color-naming task, in which Gemini’s naming patterns closely matched those of native speakers and scored highly on IB-efficiency, while Llama produced a simpler but still efficient system; and (2) simulated cultural evolution via iterated in-context learning, in which both models repeatedly learned and reproduced pseudo color-naming systems. Over iterations, initially random systems converged toward greater IB-efficiency and toward patterns attested across real-world languages. The work shows that LLMs, despite not being trained to optimize IB, exhibit an inductive bias toward efficient, perceptually grounded semantic systems rather than merely parroting training data. Key technical points: use of IB-efficiency as an evaluation metric, replication of human iterated-learning experiments through in-context prompting, and comparisons across two contemporary LLM architectures. Implications span cognitive modeling, language evolution, and interpretability: LLMs can serve as testbeds for emergent semantic biases and may reveal how compression-driven principles arise from large-scale predictive training.
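
To make the IB-efficiency metric concrete, here is a minimal sketch of the two quantities it trades off, assuming the common formulation for color naming: complexity as the mutual information I(C;W) between color chips and words, and accuracy as the informativeness I(W;U) of listener-reconstructed meanings. The array shapes, variable names, and toy demo are illustrative assumptions, not the paper's code; a full efficiency score would additionally compare each system to the theoretical IB frontier.

```python
import numpy as np

def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution given as a 2-D array summing to 1."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    outer = px @ py
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / outer[mask])))

def ib_complexity_accuracy(p_chip, q_word_given_chip, meaning_given_chip):
    """Complexity I(C;W) and accuracy I(W;U), both in bits, for one naming system.

    p_chip:             (n_chips,)          prior over color chips
    q_word_given_chip:  (n_chips, n_words)  naming distribution q(w|c), rows sum to 1
    meaning_given_chip: (n_chips, n_chips)  speaker meanings m_c(u), rows sum to 1
    """
    joint_cw = p_chip[:, None] * q_word_given_chip        # p(c, w)
    complexity = mutual_information(joint_cw)             # I(C; W)

    p_word = joint_cw.sum(axis=0)                         # q(w)
    used = p_word > 0                                     # ignore unused words
    q_chip_given_word = (joint_cw[:, used] / p_word[used]).T   # q(c|w)
    m_word = q_chip_given_word @ meaning_given_chip       # listener meaning m_w(u)

    p_u = p_chip @ meaning_given_chip                     # marginal over meaning space
    ratio = np.where(m_word > 0, m_word / p_u, 1.0)
    kl = (m_word * np.log2(ratio)).sum(axis=1)            # KL(m_w || p_u) per word
    accuracy = float(p_word[used] @ kl)                   # I(W; U)
    return complexity, accuracy

# Toy demo: 4 "chips", 2 words, slightly smeared perceptual meanings.
p_chip = np.full(4, 0.25)
q = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
m = 0.7 * np.eye(4) + 0.1 * np.ones((4, 4))
m /= m.sum(axis=1, keepdims=True)
print(ib_complexity_accuracy(p_chip, q, m))
```

A more complex system (more words, finer partitions) raises I(C;W); a system whose words pick out perceptually coherent regions raises I(W;U). Efficient systems get high accuracy for their level of complexity.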
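The iterated in-context learning setup can be pictured as a transmission chain in which each generation's output becomes the next generation's training data. The sketch below is a rough illustration under stated assumptions: `query_llm`, the pseudo-word inventory, the prompt wording, and the sample sizes are hypothetical placeholders rather than the paper's actual protocol; a real run would swap in the Gemini or Llama API and score each generation's system for IB-efficiency.

```python
import random

N_CHIPS = 330        # e.g. a Munsell-style chip chart
N_GENERATIONS = 10   # length of the transmission chain
N_EXAMPLES = 40      # labelled chips shown in-context per generation

PSEUDO_WORDS = ["fep", "dax", "wug", "blicket", "tupa", "zorv"]  # hypothetical terms

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call (e.g. Gemini or Llama).
    Replace with a real API call; this stub just guesses a random term."""
    return random.choice(PSEUDO_WORDS)

def transmit(system: dict[int, str]) -> dict[int, str]:
    """One generation: show a sample of (chip, label) pairs in-context,
    then ask the model to label every chip; its answers form the next system."""
    examples = random.sample(sorted(system.items()), N_EXAMPLES)
    context = "\n".join(f"chip {c}: {w}" for c, w in examples)
    next_system = {}
    for chip in range(N_CHIPS):
        prompt = (
            "You are learning an alien color-naming system.\n"
            f"Labelled examples:\n{context}\n"
            f"What is the name of chip {chip}? Answer with one word."
        )
        next_system[chip] = query_llm(prompt)
    return next_system

# Seed with a random pseudo color-naming system and iterate the chain.
system = {chip: random.choice(PSEUDO_WORDS) for chip in range(N_CHIPS)}
for gen in range(N_GENERATIONS):
    system = transmit(system)
    # In the study, each generation's system would be evaluated for
    # IB-efficiency here to track convergence over the chain.
```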