RustGPT: A pure-Rust transformer LLM built from scratch (github.com)

🤖 AI Summary
RustGPT is a from-scratch transformer LLM implemented entirely in Rust (no PyTorch, TensorFlow, or Candle), using ndarray for the linear algebra. The repo includes a complete training pipeline (src/main.rs), the core model and training logic (src/llm.rs), the transformer building blocks (self_attention.rs, feed_forward.rs, layer_norm.rs, embeddings.rs, output_projection.rs), an Adam optimizer, and tests for each component.

The project runs two training phases, a factual pre-training stage followed by an instruction-tuning stage, then drops into an interactive chat mode. Key model specs: dynamic vocabulary, embedding dim 128, hidden dim 256, max sequence length 80, 3 transformer blocks, cross-entropy loss, Adam with gradient clipping at an L2-norm cap of 5.0, and greedy decoding. Default training runs 100 epochs of pre-training (LR 0.0005) and 100 epochs of instruction tuning (LR 0.0001).

For the AI/ML community this is a practical, educational blueprint showing that a transformer and full backpropagation can be built and tested in a systems language. It is significant for people who want production-friendly, dependency-light LLM tooling, systems-level optimizations (SIMD, parallelism), or deeper pedagogical insight into transformers. Limitations: small model capacity, simple greedy decoding, in-memory-only parameters (persistence is a TODO), and modest sequence and dimension sizes, so it is best treated as a learning platform and a base for engineering improvements (positional encodings, better samplers, checkpointing, multi-GPU training).
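
The gradient-clipping step is easy to picture in ndarray. The sketch below is a minimal illustration under stated assumptions, not the repo's actual code: the function name clip_gradients and the flat slice of Array2 gradients are hypothetical, and src/llm.rs may organize its parameters differently.

```rust
// Minimal sketch of global L2-norm gradient clipping with ndarray (assumes ndarray 0.15+).
// `clip_gradients` is a hypothetical helper; the repo's real implementation may differ.
use ndarray::Array2;

fn clip_gradients(grads: &mut [Array2<f32>], max_norm: f32) {
    // Global L2 norm over every parameter gradient.
    let total_sq: f32 = grads.iter().map(|g| g.mapv(|x| x * x).sum()).sum();
    let norm = total_sq.sqrt();
    // Rescale all gradients so the combined norm does not exceed `max_norm`
    // (5.0 in the summary above).
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            g.mapv_inplace(|x| x * scale);
        }
    }
}

fn main() {
    let mut grads = vec![Array2::<f32>::from_elem((2, 3), 4.0)];
    clip_gradients(&mut grads, 5.0);
    println!("{:?}", grads[0]); // elements scaled down so the global norm is 5.0
}
```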
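
Greedy decoding just takes the argmax of the output projection's logits at each step. Here is a hedged sketch of that selection step, assuming the logits arrive as an ndarray Array1<f32>; the repo's actual types and function names may differ.

```rust
use ndarray::{array, Array1};

// Greedy decoding step: pick the highest-scoring token id from a logits vector.
fn greedy_pick(logits: &Array1<f32>) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(idx, _)| idx)
        .expect("logits must be non-empty")
}

fn main() {
    let logits = array![0.1_f32, 2.3, -0.7, 1.9];
    assert_eq!(greedy_pick(&logits), 1); // token id 1 has the largest logit
}
```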