In a First, AI Models Analyze Language as Well as a Human Expert (www.quantamagazine.org)

🤖 AI Summary
Researchers from UC Berkeley and Rutgers put several large language models through a rigorous battery of linguistics tests designed to rule out memorization, and found that one model, OpenAI's o1, performed at roughly the level of a graduate student in linguistics. The four-part evaluation included syntactic tree-diagramming (Chomskyan constituency trees), recursion and center-embedding challenges (e.g., nested clauses like "the cat the dog bit died"), ambiguity resolution by producing multiple valid parses, and phonological inference on 30 newly invented mini-languages (each with 40 nonce words). Because the test used original sentences and made-up phonologies, o1 couldn't have learned the answers from training data, yet it correctly analyzed deeply recursive structures, extended sentences with additional levels of embedding, produced multiple parse trees for ambiguous sentences, and recovered phonological rules such as a vowel becoming "breathy" after a voiced obstruent. The result is significant because it challenges long-standing claims that LLMs merely mimic language without metalinguistic reasoning, a position voiced by some linguists, including Chomsky. Technically, the work shows these models can internalize hierarchical syntactic structure and abstract phonological generalizations, not just surface token statistics. Caveats remain: most other models failed, o1 didn't invent new linguistic theory, and limits tied to training objectives and generalization persist. Still, the study tightens benchmarks for probing reasoning, sharpens the question of whether scale alone will yield human-level linguistic insight, and underscores the need to map where LLMs truly understand language versus where they only approximate it.
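To make the ambiguity-resolution task concrete, here is a minimal sketch (not from the study; the toy grammar and the use of NLTK's chart parser are illustrative assumptions) of how a single syntactically ambiguous sentence yields two distinct constituency trees, one attaching the prepositional phrase to the verb and one to the noun.

```python
import nltk

# Toy grammar (invented for illustration, not the grammar used in the study)
# in which prepositional-phrase attachment is ambiguous.
grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> 'I' | Det N | Det N PP
VP  -> V NP | VP PP
PP  -> P NP
Det -> 'the'
N   -> 'man' | 'telescope'
V   -> 'saw'
P   -> 'with'
""")

parser = nltk.ChartParser(grammar)
tokens = "I saw the man with the telescope".split()

# The parser yields one tree per reading: the PP "with the telescope"
# can modify the verb (instrument of seeing) or the noun (the man has it).
for tree in parser.parse(tokens):
    print(tree)
```

The study's harder variant of this task used original sentences rather than textbook examples, so the model had to produce every valid tree itself instead of recalling a known analysis.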