Single prompt to Gemini to 6M parameters model (github.com)

🤖 AI Summary
This project presents a small decoder-only Transformer language model built from scratch in PyTorch, reportedly scaffolded from a single prompt to Gemini. It covers the full lifecycle of a language model: generating a custom "Pseudo-Wikipedia" dataset of 100,000 samples spanning mathematics, coding, and storytelling; training a 6.39-million-parameter, 8-layer model with a character-level tokenizer; and serving it for inference. The implementation is a clean, educational take on the GPT architecture, with a custom training loop driven by the AdamW optimizer and an interactive Streamlit web app for real-time inference, letting users watch the model's emerging grasp of English grammar and basic recall. By targeting local-machine training (including Mac M1/M2/M3 compatibility), the project lowers the barrier to hands-on LLM development and offers a useful starting point for builders and researchers experimenting with Transformer models.
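The "character-level tokenizer" mentioned above simply maps each unique character in the training corpus to an integer id, which keeps the vocabulary tiny at the cost of longer token sequences. A minimal sketch in plain Python, with illustrative names not taken from the repository:

```python
# Minimal character-level tokenizer sketch. The class and method names
# (CharTokenizer, encode, decode) are illustrative assumptions, not the
# project's actual API.

class CharTokenizer:
    def __init__(self, corpus: str):
        # One token id per unique character; sorted for determinism.
        chars = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}
        self.vocab_size = len(chars)

    def encode(self, text: str) -> list[int]:
        # Each character becomes one token id.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        # Inverse mapping: token ids back to characters.
        return "".join(self.itos[i] for i in ids)
```

A vocabulary built this way is typically under a few hundred entries, which is one reason a 6M-parameter model remains trainable on a laptop: the embedding and output layers stay small compared to a subword vocabulary of tens of thousands of tokens.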