LoRA without Regret from scratch (github.com)

🤖 AI Summary
This repo reproduces the “LoRA without Regret” SFT and RL experiments and confirms the core claim: in low-data regimes LoRA can match full fine-tuning performance while requiring far fewer trainable parameters.

SFT: On Qwen3-4B trained on the first 6,400 No Robots examples (1 epoch, 200 steps, effective batch 32, AdamW, LoRA α=32, constant LR), LoRA SFT reaches the same test NLL as full fine-tuning (1.8457). Crucially, the optimal LoRA learning rate is roughly 10× higher than for full fine-tuning (full FT LR 2.5e-5 vs. LoRA rank-256 LR ≈2.5e-4), and lower-rank LoRAs need lower optimal LRs (rank 1: 1.2e-4, with slightly worse NLL). Applying LoRA to both MLP and attention layers outperformed MLP-only in these runs, differing from the original blog's MLP-only parity; training curves show high variance and different configs lead at different steps, suggesting limited generalizability.

RL: With Qwen3-1.7B (GRPO, 50 steps, 32 prompts × 8 rollouts, on-policy, Adam, LoRA α=32), LoRA also matched full fine-tuning, even at rank 1, demonstrating practical parameter-efficient tuning for both SFT and RL in low-data settings.

Takeaway for practitioners: LoRA is an effective, compute- and memory-saving alternative, but it requires careful LR sweeps by rank and placement (MLP vs. attention), and results can be sensitive to run-to-run variability.
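To make the SFT configuration concrete, here is a minimal sketch assuming HuggingFace PEFT and TRL; the repo's actual training code, the exact Hub IDs (Qwen/Qwen3-4B, HuggingFaceH4/no_robots), and the per-device batch / gradient-accumulation split are assumptions, not taken from the source.

```python
# Hedged sketch of the SFT run described above (not the repo's code).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# First 6,400 No Robots examples, 1 epoch (~200 steps at effective batch 32).
# Hub ID is an assumption.
dataset = load_dataset("HuggingFaceH4/no_robots", split="train[:6400]")

# Rank-256 LoRA, alpha = 32, applied to both attention and MLP projections
# (the MLP+attention placement that did best in these runs).
peft_config = LoraConfig(
    r=256,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
    task_type="CAUSAL_LM",
)

# LoRA LR ~10x the full-FT LR (2.5e-4 vs 2.5e-5), AdamW, constant schedule.
# 4 x 8 gives the effective batch of 32; the exact split is an assumption.
args = SFTConfig(
    output_dir="qwen3-4b-lora-sft",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2.5e-4,
    lr_scheduler_type="constant",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",  # Hub ID is an assumption
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

Under the same assumptions, the full fine-tuning baseline would drop `peft_config` and set `learning_rate=2.5e-5`, and the rank-1 variant would use `r=1` with `learning_rate=1.2e-4`, matching the numbers reported above.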
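The RL setup can be sketched the same way; below is a hedged GRPO example with a rank-1 LoRA on Qwen3-1.7B, again assuming TRL. The reward function, prompt dataset, RL learning rate, and the mapping of "32 prompts × 8 rollouts" onto TRL batch arguments are all placeholders or assumptions, not taken from the source.

```python
# Hedged sketch of the GRPO run described above (not the repo's code).
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompt dataset; the actual RL task is not stated in the summary.
prompts = Dataset.from_dict({"prompt": ["Write a haiku about autumn."] * 1024})

# Rank-1 LoRA, alpha = 32 (rank 1 still matched full fine-tuning in these runs).
peft_config = LoraConfig(
    r=1,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# 50 on-policy steps, 8 rollouts per prompt. The batch sizing approximates
# "32 prompts x 8 rollouts" as 256 completions per step and is an assumption.
args = GRPOConfig(
    output_dir="qwen3-1.7b-lora-grpo",
    max_steps=50,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=32,
    num_generations=8,
    learning_rate=1e-5,  # placeholder; the summary does not report the RL LR
)

def reward_fn(completions, **kwargs):
    # Placeholder reward: prefer completions near 200 characters.
    return [-abs(len(c) - 200) / 100.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B",  # Hub ID is an assumption
    args=args,
    reward_funcs=reward_fn,
    train_dataset=prompts,
    peft_config=peft_config,
)
trainer.train()
```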