GPT-5-Codex is a better AI researcher than me (www.seangoedecke.com)

🤖 AI Summary
A developer challenged themselves to train the best possible language model on a laptop in five minutes, using OpenAI's GPT-5-Codex (plus the Codex CLI) as an autonomous coding and research assistant. Codex ran dozens of experiments on the TinyStories dataset: sweeping n-gram baselines, training roughly 50 small transformers (best: 3 layers, 4 heads, dim 144, perplexity 8.53), and trying hybrid tricks like kNN/cache heads and shallow fusion.

Crucially, Codex automated the loop itself: editing training scripts, running a few quick trials, analyzing results, and proposing next steps, a workflow the author dubs "vibe research." That agentic loop produced better models and faster iteration than the author could achieve alone.

The key technical takeaway is that standard metrics can mislead: shallow fusion (combining transformer and n-gram predictions in log space, sketched below) cut perplexity to 7.38 but produced highly repetitive, low-quality text, demonstrating that optimizing perplexity alone harms generation quality. The most effective technique was distillation from a quickly trained n-gram teacher into a transformer: forcing the transformer to match the n-gram's predictions for ~200 steps, then continuing normal training (also sketched below). This jumpstarted grammatical competence and yielded the most coherent short stories within the five-minute constraint.

Implications for ML practice: coding agents can massively accelerate hyperparameter sweeps and prototyping on small compute budgets, but researchers should watch seed variance, metric choice, token costs, and sandboxing/safety when delegating experiment control to agents.
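For readers unfamiliar with shallow fusion: it scores each candidate next token by a weighted sum of the two models' log-probabilities. A minimal PyTorch sketch under stated assumptions; `transformer`, `ngram_logprobs`, and `fusion_weight` are illustrative names, not the author's actual code:

```python
# Sketch of shallow fusion, assuming a PyTorch transformer LM that maps
# token ids to (batch, seq, vocab) logits and a hypothetical n-gram model
# exposing next-token log-probs for a given context.
import torch
import torch.nn.functional as F

def fused_next_token_logprobs(transformer, ngram_logprobs, input_ids,
                              fusion_weight=0.3):
    """Combine transformer and n-gram next-token predictions in log space.

    ngram_logprobs: callable mapping input_ids to a (batch, vocab) tensor
    of log-probabilities (assumed interface, for illustration only).
    """
    with torch.no_grad():
        logits = transformer(input_ids)[:, -1, :]   # last position: (batch, vocab)
    lm_logprobs = F.log_softmax(logits, dim=-1)
    ng_logprobs = ngram_logprobs(input_ids)         # (batch, vocab)
    # Shallow fusion: weighted sum of log-probs, renormalized for sampling.
    fused = lm_logprobs + fusion_weight * ng_logprobs
    return F.log_softmax(fused, dim=-1)
```

This is exactly the kind of change that improves perplexity while degrading samples: the n-gram term rewards locally frequent continuations, which can push generation toward repetition.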
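The distillation trick amounts to a two-phase training loop: first minimize KL divergence against the n-gram teacher's distribution, then switch to ordinary cross-entropy on the data. A sketch under the same assumptions; `model`, `ngram_teacher_probs`, and `batch_iterator` are hypothetical stand-ins:

```python
# Sketch of n-gram-teacher distillation followed by normal training,
# assuming PyTorch. All names here are illustrative, not from the post.
import torch
import torch.nn.functional as F

def distill_then_train(model, optimizer, batch_iterator, ngram_teacher_probs,
                       distill_steps=200):
    for step, (input_ids, targets) in enumerate(batch_iterator):
        logits = model(input_ids)                   # (batch, seq, vocab)
        log_probs = F.log_softmax(logits, dim=-1)
        if step < distill_steps:
            # Phase 1: match the n-gram teacher's next-token distribution.
            # ngram_teacher_probs returns probabilities, shape (batch, seq, vocab).
            teacher = ngram_teacher_probs(input_ids)
            loss = F.kl_div(log_probs, teacher, reduction="batchmean")
        else:
            # Phase 2: standard next-token cross-entropy on the real data.
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
            )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The appeal under a five-minute budget is that an n-gram model trains almost instantly, so a few hundred distillation steps give the transformer basic grammar cheaply before the clock is spent on real training.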