🤖 AI Summary
A new model, the Tauformer, has been introduced: a 30-million-parameter Topological Transformer that replaces dot-product attention with a Laplacian-derived scalar called the taumode. Instead of scoring token pairs by the geometric similarity of query and key vectors, attention ranks them by proximity in taumode's scalar space, injecting domain-specific structure directly into the attention mechanism. Because each key vector is compressed down to this single scalar, Tauformer reduces the storage required for attention computations by approximately 50%.
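The summary does not spell out how the taumode scalar is computed or how attention weights are formed from it. A minimal sketch, assuming each token carries a precomputed scalar tau and that a pair's attention weight decays with distance in scalar space, might look like the following in PyTorch. The function name `taumode_attention`, the `temperature` parameter, and the negative-absolute-distance scoring rule are all illustrative assumptions, not Tauformer's actual formulation:

```python
import torch
import torch.nn.functional as F

def taumode_attention(tau_q, tau_k, v, temperature=1.0):
    """Hypothetical sketch: attention ranked by proximity in scalar space.

    tau_q: (batch, n_q)    assumed per-token taumode scalars for queries
    tau_k: (batch, n_k)    assumed per-token taumode scalars for keys
    v:     (batch, n_k, d) value vectors
    """
    # Closer taumodes -> higher score: each pair is scored by the negative
    # absolute gap between its two scalars, so the ranking depends on
    # scalar proximity rather than dot-product similarity.
    scores = -torch.abs(tau_q.unsqueeze(-1) - tau_k.unsqueeze(-2)) / temperature
    weights = F.softmax(scores, dim=-1)  # (batch, n_q, n_k)
    return torch.bmm(weights, v)         # (batch, n_q, d)
```

Under this reading, the roughly 50% storage saving is plausible because the cache holds one scalar per key (`tau_k`) instead of a full d-dimensional key vector, while value vectors are kept as-is.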
For the AI/ML community, Tauformer's significance lies in how it restructures attention for domain-specific applications. By operating on a sparsely stored Laplacian matrix instead of full key tensors, it streamlines processing while aiming to maintain accuracy. Initial training results show rapid convergence, with validation loss dropping sharply early in training. Beyond the efficiency gains, the architecture raises open questions about how loss metrics interact with taumode behavior, pointing the way for future work on model efficiency and structure.
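How the sparse Laplacian is built and how a scalar is read off it is likewise unspecified. One plausible interpretation, offered purely as a hedged sketch, is to form a sparse k-nearest-neighbor similarity graph over token embeddings, take its normalized Laplacian, and use a low eigenvector (here the Fiedler vector) as the per-token taumode. The function `laplacian_taus` and the `k_neighbors` parameter below are hypothetical:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

def laplacian_taus(x, k_neighbors=8):
    """Hypothetical: derive one scalar per token from a sparse Laplacian.

    x: (n, d) token embeddings; returns (n,) taumode scalars.
    """
    n = x.shape[0]
    sims = x @ x.T
    rows, cols = [], []
    for i in range(n):
        # Connect each token to its k most similar neighbors.
        for j in np.argpartition(-sims[i], k_neighbors)[:k_neighbors]:
            if i != j:
                rows.append(i)
                cols.append(j)
    data = np.ones(len(rows))
    adj = sp.coo_matrix((data, (rows, cols)), shape=(n, n))
    adj = ((adj + adj.T) > 0).astype(float)  # symmetrize; stays sparse
    lap = laplacian(adj, normed=True)        # sparse normalized Laplacian
    # Fiedler vector: eigenvector of the second-smallest eigenvalue,
    # giving one structure-aware scalar per token.
    _, vecs = eigsh(lap, k=2, which='SM')
    return vecs[:, 1]
```

Storing the sparse graph plus one scalar per token is what would stand in for the full (n, d) key tensor in this reading.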