YaRN: Efficient Context Window Extension of Large Language Models (2024) [PDF] (proceedings.iclr.cc)

🤖 AI Summary
A paper published at ICLR 2024 introduces YaRN (Yet another RoPE extensioN method), a compute-efficient technique for extending the context window of large language models (LLMs) that use Rotary Position Embeddings (RoPE), such as LLaMA, GPT-NeoX, and PaLM. The authors, from Nous Research, EleutherAI, and the University of Geneva, observe that RoPE-based models fail to generalize past the sequence lengths they were trained on. YaRN addresses this while requiring 10 times fewer tokens and 2.5 times fewer training steps than previous context-extension methods, and models fine-tuned with it can effectively use context lengths of up to 128k tokens. Longer usable contexts matter for tasks that depend on in-context learning (ICL), where the model must attend to many examples or long documents at once. By enabling models to extrapolate beyond their original training limits, YaRN not only surpasses previous state-of-the-art methods but also generalizes with minimal fine-tuning, using less than 0.1% of the original pre-training data. This makes richer context available without prohibitive compute, helping make long-context LLM applications more accessible and efficient.
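To make the mechanism concrete, here is a minimal sketch of the idea behind RoPE-based context extension. Standard RoPE rotates each feature pair at position `pos` by an angle `pos * theta_i` with `theta_i = base^(-2i/d)`. The simplest extension baseline (uniform position interpolation, which YaRN refines with wavelength-aware scaling and attention-temperature adjustments) divides every frequency by a scale factor so that longer positions map back into the trained range. The function names below are illustrative, not from the paper's code.

```python
import math  # not strictly needed here, kept for clarity if you extend the sketch


def rope_frequencies(dim, base=10000.0):
    """Per-pair rotation frequencies of standard RoPE: theta_i = base^(-2i/dim)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def interpolated_frequencies(dim, scale, base=10000.0):
    """Uniform position interpolation: divide every frequency by `scale`,
    so positions up to scale * original_length land inside the trained range.
    (YaRN itself scales selectively by wavelength; this is the naive variant.)"""
    return [f / scale for f in rope_frequencies(dim, base)]


def rotation_angles(pos, freqs):
    """Angles applied to each 2-D feature pair at integer position `pos`."""
    return [pos * f for f in freqs]
```

With a scale factor of 4, position 4096 under the interpolated frequencies produces exactly the same rotation angles as position 1024 under the original ones, which is why the extended model stays inside the regime it was trained on.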