The Curved Spacetime of Transformer Architectures (arxiv.org)

🤖 AI Summary
The paper proposes a geometric framework that maps Transformer behavior onto the language of curved manifolds from General Relativity: queries and keys define an effective metric on the token representation space, attention acts like a discrete connection that parallel-transports value vectors between tokens, and the stacked layers serve as time-slices through which token embeddings evolve. Backpropagation is cast as a least-action principle that sculpts loss-minimizing trajectories in parameter space. On this view, token trajectories through the layers should bend (show curvature) rather than follow straight lines in feature space.

To test this, the authors visualize a curvature landscape across tokens and layers and quantify trajectory geometry with turning angles and length-to-chord ratios, running simulations and statistical controls to show the observed bends are not explained by dimensionality or chance. Finally, inspired by the 1919 eclipse test of Einstein's light-deflection prediction, they perform controlled context edits and measure systematic deflections in embedding trajectories that align with meaning-preserving shifts in attention.

The work is significant because it gives a principled, testable geometric interpretation of attention mechanics: new interpretability tools, diagnostics for how context reshapes representations, and potential avenues for architectures or regularizers that explicitly manage curvature in embedding space.
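The "metric plus transport" reading of attention can be made concrete with ordinary single-head attention algebra. The sketch below is an illustration under stated assumptions, not the authors' code: the names `X`, `Wq`, `Wk`, `Wv`, the single head, and the absence of masking are all choices made here. The query-key bilinear form plays the role of the effective metric on token space, and the row-stochastic attention weights carry value vectors onto each token.

```python
import numpy as np

def attention_as_transport(X, Wq, Wk, Wv):
    """Single-head attention written to highlight the geometric reading.

    Assumed shapes (not from the paper):
      X:  (n_tokens, d_model)   token representations
      Wq, Wk, Wv: (d_model, d_head)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv

    # Query-key bilinear form: the summary's "effective metric" on token space.
    g = Q @ K.T / np.sqrt(K.shape[-1])

    # Row-wise softmax turns the form into transport weights.
    A = np.exp(g - g.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)

    # Each token receives an attention-weighted mixture of value vectors,
    # i.e. values "parallel-transported" from the other tokens.
    return A @ V
```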
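The trajectory statistics named above (turning angles and length-to-chord ratios) are easy to compute from a token's per-layer hidden states. A minimal sketch, assuming a `(num_layers, d_model)` array of that token's embedding after each layer; this is not the authors' implementation and omits their statistical controls.

```python
import numpy as np

def trajectory_geometry(hidden_states):
    """Turning angles and length-to-chord ratio for one token's layer trajectory."""
    h = np.asarray(hidden_states, dtype=float)   # (num_layers, d_model), assumed layout
    steps = np.diff(h, axis=0)                   # displacement between consecutive layers
    norms = np.linalg.norm(steps, axis=1)

    # Angle between consecutive displacement vectors (0 for a straight path).
    cosines = np.einsum("ij,ij->i", steps[:-1], steps[1:]) / (norms[:-1] * norms[1:] + 1e-12)
    turning_angles = np.arccos(np.clip(cosines, -1.0, 1.0))

    # Total arc length vs. the straight chord from first to last layer.
    arc_length = norms.sum()
    chord = np.linalg.norm(h[-1] - h[0])
    return turning_angles, arc_length / (chord + 1e-12)
```

For a straight-line trajectory the turning angles are near zero and the ratio is near 1; values above 1 indicate the kind of bending the paper reports.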