LoRA-XS: Low-Rank Adaptation with Small Number of Parameters (arxiv.org)

🤖 AI Summary
Researchers introduced LoRA-XS, a theory-backed, ultra-parameter-efficient fine-tuning method that shrinks adapter size by inserting a tiny trainable weight matrix between frozen low-rank factors obtained from a truncated SVD of the pre-trained weights. Instead of learning full low-rank update matrices as in LoRA, LoRA-XS freezes the singular vectors and optimizes only a compact r×r “core” that modulates them, so each adapter module can scale down to as little as a single parameter or scale up as needed. This design yields dramatic storage savings (over 100× fewer adapter parameters than standard LoRA for 7B models) and decouples adapter size from model dimensions, making per-task or per-user personalization far more viable in storage-constrained deployments. Empirically, LoRA-XS matches or outperforms LoRA and VeRA on GLUE, GSM8K, MATH, and commonsense reasoning benchmarks across multiple model scales; ablations show the frozen singular vectors are central to its effectiveness. The results suggest that the singular vectors of transformer weight matrices form a stable, useful basis that tiny trainable cores can exploit, opening a practical path to extreme adapter compression without sacrificing accuracy. For practitioners, LoRA-XS offers a plug-and-play way to cut the storage and compute cost of maintaining many small adapters while preserving or improving downstream performance.
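
To make the mechanism concrete, here is a minimal PyTorch sketch of a LoRA-XS-style linear layer, based on the summary's description (frozen SVD factors with a small trainable core between them). The class name `LoRAXSLinear`, the zero initialization of the core, and other details are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAXSLinear(nn.Module):
    """Sketch of a LoRA-XS-style adapted linear layer (illustrative, not the official code).

    The frozen factors come from a truncated SVD of the pre-trained weight W:
        W ~= U_r @ diag(S_r) @ V_r^T
    Only the tiny r x r core R is trained, so the adapter update is
        delta_W = (U_r @ diag(S_r)) @ R @ V_r^T.
    """

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor, rank: int):
        super().__init__()
        # Frozen pre-trained weight and bias.
        self.weight = nn.Parameter(weight, requires_grad=False)
        self.bias = nn.Parameter(bias, requires_grad=False) if bias is not None else None

        # Truncated SVD of the pre-trained weight; both factors are frozen buffers.
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("A", U[:, :rank] * S[:rank])  # (d_out, r) = U_r diag(S_r)
        self.register_buffer("B", Vh[:rank, :])             # (r, d_in)  = V_r^T

        # The only trainable parameters: an r x r core, initialized to zero so
        # training starts exactly from the pre-trained model (an assumption here).
        self.R = nn.Parameter(torch.zeros(rank, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = F.linear(x, self.weight, self.bias)
        # Adapter path x -> B -> R -> A, i.e. x @ (A R B)^T.
        update = x @ self.B.T @ self.R.T @ self.A.T
        return base + update
```

Under this reading, each adapted layer contributes only rank² trainable parameters regardless of the layer's dimensions (e.g. rank 16 gives 256 parameters, and rank 1 gives a single parameter), which is where the claimed independence of adapter size from model size comes from.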