Source-Optimal Training Is Transfer-Suboptimal (arxiv.org)

🤖 AI Summary
The paper proves a fundamental mismatch between the regularization that minimizes source-domain risk and the regularization that maximizes transfer benefit. Analyzing L2-SP (ridge) fine-tuning with sharp phase boundaries, the authors derive the transfer-optimal source penalty τ0* and show that it typically differs from the source-optimal value: in high-SNR regimes the transfer-optimal choice is a stronger pull toward the source (larger τ0*), while in low-SNR regimes it is less regularization (smaller τ0*). In isotropic, analytically tractable settings they further show that the binary decision of whether to transfer at all is largely independent of target sample size and noise; it depends primarily on task alignment and source characteristics.

This result is significant for practitioners and theorists because it undermines the common heuristic of choosing source regularization simply to minimize source risk before transfer. Instead, regularization should be tuned for the transfer objective itself, taking into account SNR and the alignment between tasks. Theoretical phase maps and closed-form characterizations in the L2-SP ridge model give actionable intuition, and CIFAR-10/MNIST experiments indicate that the counterintuitive pattern extends to non-linear networks. Overall, the work highlights that “source-optimal” training is generally transfer-suboptimal and provides principled guidance for setting fine-tuning penalties in transfer learning.
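The L2-SP ridge setting is simple enough to simulate directly. Below is a minimal, hypothetical sketch (not the paper's code; the data-generation choices, alignment level, noise level, and fixed fine-tuning strength are all assumptions) that contrasts the source penalty τ0 minimizing source risk with the τ0 minimizing target risk after L2-SP fine-tuning.

```python
# Minimal sketch, assuming: ridge source training with penalty tau0, then
# L2-SP fine-tuning on the target (a ridge penalty pulling toward the source
# weights). All constants here are illustrative, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
d, n_src, n_tgt = 50, 200, 40
w_src_true = rng.normal(size=d) / np.sqrt(d)                              # source task
w_tgt_true = 0.8 * w_src_true + 0.2 * rng.normal(size=d) / np.sqrt(d)     # partially aligned target task
sigma = 0.5                                                                # noise level (sets the SNR)

X_src = rng.normal(size=(n_src, d))
y_src = X_src @ w_src_true + sigma * rng.normal(size=n_src)
X_tgt = rng.normal(size=(n_tgt, d))
y_tgt = X_tgt @ w_tgt_true + sigma * rng.normal(size=n_tgt)

def ridge(X, y, tau, anchor=None):
    """Closed-form solution of min_w ||y - Xw||^2 + tau * ||w - anchor||^2 (anchor defaults to 0)."""
    if anchor is None:
        anchor = np.zeros(X.shape[1])
    A = X.T @ X + tau * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + tau * anchor)

def excess_risk(w, w_true):
    """Population excess risk for isotropic, unit-variance features."""
    return float(np.sum((w - w_true) ** 2))

tau_fine = 1.0                     # fixed L2-SP fine-tuning strength for this illustration
taus = np.logspace(-2, 3, 60)      # sweep of source penalties tau0
source_risks, transfer_risks = [], []
for tau0 in taus:
    w_source = ridge(X_src, y_src, tau0)                           # source training with penalty tau0
    source_risks.append(excess_risk(w_source, w_src_true))         # source-domain risk
    w_finetuned = ridge(X_tgt, y_tgt, tau_fine, anchor=w_source)   # L2-SP fine-tuning toward w_source
    transfer_risks.append(excess_risk(w_finetuned, w_tgt_true))    # target risk after transfer

print("source-optimal tau0:  ", taus[int(np.argmin(source_risks))])
print("transfer-optimal tau0:", taus[int(np.argmin(transfer_risks))])
```

In runs of this kind of toy simulation the two argmins generally land at different values of τ0, which is the qualitative mismatch the paper formalizes; the direction of the gap depends on the chosen SNR and task alignment.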