NanoChat – The best ChatGPT that $100 can buy (github.com)

🤖 AI Summary
Nanochat is a compact, end-to-end ChatGPT-like codebase (released on GitHub) that aims to make training and running an LLM accessible on a modest budget. The repo packages tokenization, pretraining, midtraining/fine-tuning, evaluation, inference, and a simple web UI into a dependency-light, hackable pipeline you can run with a single script (speedrun.sh). On an 8×H100 node (~$24/hr), the "$100 tier" run finishes in about 4 hours (≈4e19 FLOPs) and yields a very small, chat-capable model plus a report.md of evaluation metrics. The project is intended as a pedagogical, reproducible capstone for an LLM course and as a strong, minimal baseline for micro-model research on sub-$1,000 budgets.

Technically, nanochat uses vanilla PyTorch, torchrun for multi-GPU training, a rustbpe tokenizer with tests, and simple scripts that expose knobs like --depth (model size) and --device_batch_size (VRAM tuning). Scaling to a GPT-2-grade d26 model takes a few config changes plus more data shards, at an estimated ~$300 and ~12 hours on the same hardware. When you reduce the device batch size for smaller GPUs, the code compensates automatically with gradient accumulation, and it also runs (more slowly) on single-GPU or A100 nodes.

Example evals in the repo show modest but measurable gains (e.g., GSM8K improving through the training stages up to ~0.0758, HumanEval ~0.0854, MMLU ~0.315). Nanochat's value is in lowering the cognitive and monetary barriers to hands-on LLM experimentation while remaining explicit about its limits compared with multi-million-dollar production models.
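To make the batch-size knob concrete, here is a minimal sketch of how lowering --device_batch_size can be absorbed by gradient accumulation while the effective tokens per optimizer step stay fixed; the variable names and the specific numbers are illustrative assumptions, not nanochat's actual defaults.

```python
# Illustrative sketch (assumed values, not nanochat's real configuration):
# the target tokens per optimizer step stay fixed, and a smaller per-GPU
# batch is absorbed by more gradient-accumulation micro-steps.
total_batch_tokens = 524_288   # target tokens per optimizer step (assumed)
seq_len = 2_048                # tokens per sequence (assumed)
world_size = 8                 # GPUs in the node, e.g. 8xH100
device_batch_size = 16         # sequences per GPU per micro-step (the --device_batch_size knob)

tokens_per_micro_step = device_batch_size * seq_len * world_size   # 262,144
grad_accum_steps = total_batch_tokens // tokens_per_micro_step     # 2

print(f"micro-steps per optimizer step: {grad_accum_steps}")
# Dropping device_batch_size to 8 (e.g., on GPUs with less VRAM) makes
# grad_accum_steps 4, so the optimizer still sees the same effective batch
# and the training recipe is unchanged; only wall-clock throughput differs.
```

This is the trade-off the summary describes: smaller or fewer GPUs take more micro-steps per update rather than changing the training recipe.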