Autoreview: The Dragon Hatchling – The Missing Link Between the Transformer and Models of the Brain (arxiviq.substack.com)

🤖 AI Summary
The paper introduces Dragon Hatchling (BDH), a new LLM architecture that reframes inference as local, biologically inspired graph dynamics rather than global matrix multiplications. At BDH's core is an edge-reweighting kernel that blends a modus-ponens–like inference rule with Hebbian-style synaptic potentiation: the model's dynamic state lives on neuron-to-neuron edges (synapses), which are updated locally. To run on modern hardware, the authors present BDH-GPU, a tensor-friendly state-space instantiation that scales mainly in a single, large neuronal dimension n and matches GPT-2–style performance across the 10M–1B parameter range.

Technically, BDH-GPU combines high-dimensional linear attention with a ReLU low-rank feed-forward block, z ↦ ReLU(DEz), that enforces positive, highly sparse activations (roughly 5% nonzero). Training yields emergent, scale-free modular connectivity and "monosemantic" synapses (individual links that reliably encode abstract concepts), giving unusually direct interpretability and locality of state. Because the architecture scales uniformly in n, models can also be composed simply by concatenating their neuron dimensions, preserving each constituent's knowledge without retraining.

Important caveats: learning still depends on backpropagation through time, and BDH-GPU relies on a mean-field approximation of the theoretical local dynamics. Still, BDH offers a practical micro-to-macro bridge between Transformers and brain-like computation, pointing toward more predictable, composable, and interpretable architectures for long-horizon reasoning.
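The summary compresses several mechanisms into a few phrases, so here is a minimal NumPy sketch of how the two components it names (a ReLU low-rank feed-forward block and linear attention with a Hebbian-style edge state) could fit together. The dimensions, variable names (d, n, E, D, S), and the choice to reuse the sparse activations as queries, keys, and values are illustrative assumptions for this sketch, not the paper's actual parameterization.

```python
# Toy sketch of the two mechanisms the summary attributes to BDH-GPU:
# (1) a ReLU low-rank feed-forward block z -> ReLU(D E z), and
# (2) unnormalized linear attention whose state S lives on neuron-to-neuron
#     "edges" and is updated with a Hebbian-style outer product.
# All names and shapes are assumptions made for illustration only.
import numpy as np

rng = np.random.default_rng(0)

d, n = 64, 1024  # small model width d, large neuronal dimension n (n >> d)

# Low-rank pair of maps: project n -> d, expand d -> n, then rectify.
E = rng.normal(0, d ** -0.5, size=(d, n))
D = rng.normal(0, n ** -0.5, size=(n, d))

def relu_lowrank_ff(z: np.ndarray) -> np.ndarray:
    """z -> ReLU(D E z): positive activations, sparse in trained models."""
    return np.maximum(D @ (E @ z), 0.0)

def linear_attention_step(S: np.ndarray, k: np.ndarray, v: np.ndarray,
                          q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One step of unnormalized linear attention over an edge state S."""
    S = S + np.outer(v, k)   # Hebbian-style potentiation of co-active pairs
    y = S @ q                # read the stored associations back out
    return S, y

# Roll out over a short toy "sequence" of random inputs.
S = np.zeros((n, n))
for _ in range(4):
    z = rng.normal(size=n)
    a = relu_lowrank_ff(z)                       # positive neuron activations
    S, y = linear_attention_step(S, k=a, v=a, q=a)

# With random weights roughly half the activations are nonzero; the ~5%
# sparsity cited in the summary is an emergent property of trained models.
print("fraction of nonzero activations:", float(np.mean(a > 0)))
```

Note that this sketch keeps only the structural idea: a state stored on edges, updated locally by outer products, read by queries, and fed by rectified low-rank projections. The paper's edge-reweighting kernel, normalization, and training details are not reproduced here.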