LoRA Without Regret (thinkingmachines.ai)

🤖 AI Summary
Researchers evaluated low-rank adaptation (LoRA)—which replaces each pretrained weight W with W' = W + γBA, where A and B are low-rank matrices—across supervised and reinforcement-learning post-training to pin down when LoRA matches full fine-tuning (FullFT). Sweeping LoRA rank (1–512) and learning rates over large instruction/reasoning corpora (Tulu3, OpenThoughts3) and RL math tasks, they found a broad “low‑regret” regime: for most small-to-medium post‑training datasets, LoRA attains the same sample efficiency and final loss as FullFT.

Key empirical facts: high-rank LoRAs track FullFT learning curves (loss decreases roughly linearly with log steps) until a rank-dependent capacity threshold; RL policy-gradient fine-tuning matches FullFT even at very low ranks (rank=1); and optimal LoRA learning rates are relatively rank‑invariant and roughly 10× higher than FullFT’s.

The work has practical implications for PEFT adoption: LoRA’s memory, transfer, and multi‑tenant serving benefits are validated for typical post‑training workloads, but practitioners should apply adapters to all layers (MLP and MoE as well as attention) rather than attention-only, sweep learning rates carefully, and avoid overly large batch sizes (LoRA shows greater batch‑size sensitivity due to the BA parametrization). Finally, when datasets grow beyond the adapter’s capacity, LoRA underperforms FullFT, so rank should be chosen in proportion to dataset size to stay in the low‑regret regime.
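For readers unfamiliar with the W' = W + γBA parametrization, here is a minimal illustrative sketch of a LoRA-adapted linear layer in PyTorch. This is not the authors' code; the class name, rank, and alpha arguments are assumptions chosen for clarity, with B initialized to zero so the adapter starts as a no-op.

```python
# Minimal sketch of a LoRA-adapted linear layer: W' = W + gamma * B @ A.
# Illustrative only; not the implementation used in the article.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weight W; only the low-rank factors A and B train.
        for p in self.base.parameters():
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        # A: (rank, d_in) random init; B: (d_out, rank) zero init, so BA = 0 at step 0.
        self.A = nn.Parameter(torch.randn(rank, d_in) / d_in**0.5)
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        self.gamma = alpha / rank  # scale factor gamma applied to the BA update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (W + gamma * B @ A)^T, computed without forming B @ A.
        return self.base(x) + self.gamma * (x @ self.A.T @ self.B.T)
```

Applying such a wrapper to every linear layer (MLP/MoE projections as well as attention), rather than attention-only, matches the article's recommendation for staying in the low-regret regime.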