🤖 AI Summary
The author revisits Andrej Karpathy’s 2015 viral blog post “The Unreasonable Effectiveness of Recurrent Neural Networks,” re-implementing the experiments in PyTorch (using the built-in LSTM) and reflecting on why the old RNN demos—like Shakespeare- and TV-script‑style text generation—captured public imagination. The writeup highlights the post’s combination of clear exposition, striking sample outputs, and early interpretability findings (individual neurons tracking syntax/structure), and notes Karpathy’s early nod to attention as a key innovation that would soon reshape the field.
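Since the summarized post re-implements Karpathy's experiments with PyTorch's built-in LSTM, a minimal sketch of what such a character-level model typically looks like is shown below. The class name, hyperparameters, and training snippet are illustrative assumptions, not the author's actual code.

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Character-level language model around PyTorch's built-in nn.LSTM (illustrative)."""

    def __init__(self, vocab_size: int, embed_dim: int = 64,
                 hidden_dim: int = 256, num_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        # tokens: (batch, seq_len) integer character ids
        x = self.embed(tokens)            # (batch, seq_len, embed_dim)
        out, state = self.lstm(x, state)  # out: (batch, seq_len, hidden_dim)
        return self.head(out), state      # logits over next characters

# Next-character prediction: targets are the inputs shifted by one position.
vocab_size = 128
model = CharLSTM(vocab_size)
batch = torch.randint(0, vocab_size, (8, 100))  # random stand-in for encoded text
logits, _ = model(batch[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1)
)
loss.backward()
```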
Technically, the piece contrasts RNNs/LSTMs with modern Transformer-based LLMs. RNNs process sequences token by token, maintaining a fixed-size hidden state that acts as memory; in practice this creates a "fixed-length bottleneck" (limited by floating-point precision) despite the architecture's theoretical Turing-completeness. Transformers, introduced in "Attention Is All You Need" (2017), ingest whole sequences as tensors and use attention over a context that grows with token count, avoiding that bottleneck at the cost of more computation: attention makes training and inference scale roughly O(n^2) with sequence length, whereas RNN inference uses constant space and linear time. The post teases follow-ups: a PyTorch writeup, a vanilla RNN implementation, and a handcrafted LSTM, aiming to clarify where old designs still matter and why Transformers became dominant.
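The sketch below illustrates that contrast under stated assumptions (random tensors as stand-ins for real token embeddings; an untrained LSTM cell and raw dot-product scores rather than a full attention layer): the recurrent state stays the same size no matter how long the sequence gets, while the attention score matrix grows quadratically with it.

```python
import torch
import torch.nn as nn

hidden_dim, seq_len, embed_dim = 256, 1000, 64

# RNN-style inference: one token at a time, constant-size memory.
cell = nn.LSTMCell(embed_dim, hidden_dim)
h = torch.zeros(1, hidden_dim)
c = torch.zeros(1, hidden_dim)
for t in range(seq_len):
    x_t = torch.randn(1, embed_dim)  # stand-in for the t-th token embedding
    h, c = cell(x_t, (h, c))         # (h, c) remain (1, hidden_dim) at every step

# Attention-style processing: the whole sequence at once, quadratic score matrix.
x = torch.randn(1, seq_len, embed_dim)  # stand-in for all token embeddings
q = k = x                               # untrained stand-ins for queries/keys
scores = q @ k.transpose(-2, -1) / embed_dim ** 0.5

print(h.shape)       # torch.Size([1, 256])        -- independent of seq_len
print(scores.shape)  # torch.Size([1, 1000, 1000]) -- grows as O(n^2)
```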