Counting R in strawberry – nanochat guide (github.com)

🤖 AI Summary
The author demonstrates how they taught nanochat d32 to count letters (e.g., "How many 'r' are in strawberry?") by injecting a small synthetic task into midtraining and SFT: two-turn dialogues in which a user asks to count a letter and the assistant answers in a fixed, stepwise style. Diverse user templates (including some foreign-language variants) raise the entropy of the trigger phrasings, and the assistant habitually shows a short manual reasoning trace followed by a Python interpreter check; both are currently deterministic during training but are intended to instill a chain-of-thought habit. The technical lessons are concrete and broadly applicable: explicitly spell the target word out as individual character tokens (punctuation and spacing control how the tokenizer splits it), list the letters without spaces, using commas or colons to create token boundaries, and make counting explicit by showing per-letter comparisons and an incrementing counter. This spreads the work across many tokens, reducing the computation demanded at any single token and helping smaller models. The writeup also covers practical tradeoffs (synthetic data vs. full retraining, simulated mistakes, RLFT, and mixed sampling to avoid forgetting) and shows how tokenization-aware SFT can add narrow capabilities while encouraging transfer of stepwise problem solving.
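A minimal sketch of the kind of synthetic dialogue generator the summary describes, assuming illustrative template wording and helper names (this is not nanochat's actual code): the word is spelled out with commas to force per-character token boundaries, each comparison is written out with a running counter, and the manual count is paired with a Python `str.count` check.

```python
import random

# Illustrative templates; the writeup uses many more, including
# foreign-language variants, to diversify the trigger phrasings.
USER_TEMPLATES = [
    "How many '{letter}' are in {word}?",
    "Count the letter {letter} in the word {word}.",
    'How many times does {letter} appear in "{word}"?',
]

def make_dialogue(word: str, letter: str, rng: random.Random) -> dict:
    """Build one two-turn (user, assistant) training example."""
    user = rng.choice(USER_TEMPLATES).format(word=word, letter=letter)

    # Spell the word as comma-separated characters so each letter
    # lands in its own token instead of being merged by the tokenizer.
    spelled = ",".join(word)

    # Make the counting explicit: one comparison per character with a
    # running counter, spreading the work across many output tokens.
    steps, count = [], 0
    for ch in word:
        if ch == letter:
            count += 1
        steps.append(f"{ch} {'==' if ch == letter else '!='} {letter}, count = {count}")

    assistant = (
        f"Spelling out: {spelled}\n"
        + "\n".join(steps)
        + f"\nManual count: {count}\n"
        # The writeup pairs the manual trace with an interpreter check.
        + f'Check: "{word}".count("{letter}") -> {count}\n'
        + f"Answer: {count}"
    )
    return {"user": user, "assistant": assistant}

example = make_dialogue("strawberry", "r", random.Random(0))
```

In training, many such examples would be mixed into the midtraining/SFT data alongside the regular distribution (the "mixed sampling" the summary mentions) so the narrow skill is added without forgetting.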