What I don’t like about chains of thoughts (2023) (samsja.github.io)

🤖 AI Summary
A blog post argues that chain-of-thought (CoT) prompting — having autoregressive, next-token LLMs “think out loud” — is a powerful but fundamentally inefficient hack. Because an autoregressive model pays a forward-pass cost for every output token, CoT expands the model’s computation budget simply by forcing it to generate many intermediate tokens. That lets a model solve harder tasks (e.g., listing ten primes greater than 100,000 rather than ten arbitrary numbers), but only by serializing internal computation into language. The author uses simple complexity intuitions and vivid examples (Messi’s sub‑second tactical decisions, animal problem‑solving) to argue that real-time, nonverbal reasoning is far denser and faster than tokenized inner speech. For AI/ML researchers the technical implication is clear: language is a communication-optimized tokenization, not an ideal substrate for internal reasoning. Moving reasoning into task-specific latent/embedding spaces (analogous to image generation moving from pixels to latents) could yield large gains in throughput and capability per FLOP. Practically, this motivates research into latent planning modules, nonlinguistic internal representations, hybrid neuro-symbolic systems, and architectures that decouple internal computation from surface token generation. CoT is an important practical breakthrough, but the post warns it is unlikely to be the final or most efficient path toward more capable reasoning systems.
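The compute-per-token argument can be made concrete with a back-of-the-envelope sketch. The model size, token counts, and the standard ~2N FLOPs-per-token decode estimate below are illustrative assumptions, not figures from the post:

```python
# Sketch of the post's core claim: an autoregressive model spends a
# roughly fixed amount of compute per generated token, so chain-of-thought
# buys extra computation simply by emitting more intermediate tokens.

PARAMS = 7e9                     # assumed model size: 7B parameters
FLOPS_PER_TOKEN = 2 * PARAMS     # common ~2N FLOPs/token decode estimate


def generation_flops(num_tokens: int) -> float:
    """Approximate total decode compute for generating num_tokens tokens."""
    return num_tokens * FLOPS_PER_TOKEN


direct = generation_flops(10)    # terse answer: ~10 output tokens
cot = generation_flops(500)      # step-by-step reasoning: ~500 tokens

print(f"direct: {direct:.2e} FLOPs, CoT: {cot:.2e} FLOPs, "
      f"ratio: {cot / direct:.0f}x")
```

Under these assumptions, a 500-token reasoning trace gives the model 50× the forward-pass compute of a 10-token direct answer — which is exactly why CoT helps, and also why it is an inefficient way to buy computation: every extra FLOP must be routed through a serial, language-shaped bottleneck.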