Learning Pseudorandom Numbers with Transformers (arxiv.org)

🤖 AI Summary
A recent study shows that Transformer models can learn, in context, the sequences produced by Permuted Congruential Generators (PCGs), a family of pseudo-random number generators (PRNGs) that scramble the hidden state of a linear congruential generator before emitting it. The result matters to the AI/ML community because these prediction tasks lie beyond the reach of classical attacks. The experiments scale moduli up to $2^{22}$, with models of up to 50 million parameters trained on datasets containing billions of tokens, and the findings show that even truncated outputs can be reliably predicted. When several distinct PRNGs are mixed during training, the model learns them jointly, picking up the structure shared across different permutations. The study also reports a scaling law: the number of in-context sequence elements needed for accurate prediction grows in proportion to $\sqrt{m}$, where $m$ is the modulus. Separately, training on larger moduli requires a curriculum: incorporating data from smaller moduli is essential for the model to train successfully. An analysis of the embedding layers reveals a clustering phenomenon in which the integer inputs, viewed through their principal components, group into bitwise rotationally-invariant clusters. This clustering gives an interpretable picture of how the learned representations carry over from smaller moduli to larger ones.
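For readers unfamiliar with PCGs, the following is a minimal sketch of a toy PCG-style generator in Python. It is only an illustration of the general idea (an LCG state update followed by a xorshift, a state-dependent rotation, and truncation); the bit widths, shift amounts, and LCG constants are assumptions chosen for this example and are not taken from the study, which may use different variants and parameters.

```python
# Toy PCG-style generator: illustrative only, not the paper's exact setup.
STATE_BITS = 16          # assumed toy state size; modulus m = 2**STATE_BITS
OUT_BITS = 8             # truncated output width
ROT_BITS = 3             # log2(OUT_BITS): top state bits pick the rotation
STATE_MASK = (1 << STATE_BITS) - 1
OUT_MASK = (1 << OUT_BITS) - 1

# Assumed LCG constants: A % 4 == 1 and C odd give full period mod 2**16.
A, C = 25173, 13849


def rotr(x: int, r: int, width: int = OUT_BITS) -> int:
    """Rotate the low `width` bits of x right by r positions."""
    r %= width
    return ((x >> r) | (x << (width - r))) & ((1 << width) - 1)


def lcg_step(state: int) -> int:
    """Advance the hidden state: s -> (A*s + C) mod 2**STATE_BITS."""
    return (A * state + C) & STATE_MASK


def pcg_output(state: int) -> int:
    """Permute the hidden state: xorshift, truncate, then rotate."""
    xorshifted = (((state >> 5) ^ state) >> 5) & OUT_MASK
    rot = state >> (STATE_BITS - ROT_BITS)   # top 3 bits of the state
    return rotr(xorshifted, rot)


def pcg_sequence(seed: int, n: int) -> list[int]:
    """Return n outputs; each is emitted before the state advances."""
    state = seed & STATE_MASK
    out = []
    for _ in range(n):
        out.append(pcg_output(state))
        state = lcg_step(state)
    return out


if __name__ == "__main__":
    print(pcg_sequence(seed=42, n=10))
```

A model in the setting described above would see only a prefix of such an output sequence and be asked to predict the next element, without access to the hidden state. As a rough sense of scale for the reported law, $\sqrt{m} = 2048$ at $m = 2^{22}$; the constant of proportionality is not given in this summary.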