HRM-Text: Efficient Pretraining Beyond Scaling (arxiv.org)

0 points 2 hours ago

🤖 AI Summary

A recent paper introduces HRM-Text, an approach to pretraining large language models that targets the high computational cost of current methods. Inspired by biological systems, HRM-Text uses a Hierarchical Recurrent Model (HRM) that separates strategy and execution into distinct layers to improve sample efficiency. The model is trained on instruction-response pairs rather than the large volumes of raw text typically used, and it relies on two new techniques, MagicNorm and warmup deep credit assignment, to stabilize training. A 1B-parameter HRM-Text model trained on 40 billion unique tokens with a budget of $1,500 is reported to achieve competitive performance on benchmarks including MMLU and ARC-C. That amounts to up to 900 times fewer tokens and over 400 times less compute than standard methods. The result suggests that co-designing model architecture and training objective can bring foundational pretraining research within reach of groups with far smaller budgets.
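To put the quoted ratios in perspective, here is a rough back-of-the-envelope calculation in Python. It uses only the figures from the summary (40 billion tokens, $1,500, 900x, 400x). The variable names are illustrative, and the dollar figure assumes that cost scales linearly with compute at the same price per unit, which the summary does not state.

```python
# Figures quoted in the summary above.
hrm_tokens = 40_000_000_000   # 40 billion unique training tokens
hrm_budget_usd = 1_500        # reported training budget in US dollars
token_ratio = 900             # "up to 900 times fewer tokens"
compute_ratio = 400           # "over 400 times less compute"

# Token count of the largest baseline implied by the "up to 900x" claim.
baseline_tokens = hrm_tokens * token_ratio

# ASSUMPTION (not from the summary): dollar cost scales linearly with
# compute at the same price per unit, so this is only a rough lower bound.
baseline_budget_usd = hrm_budget_usd * compute_ratio

print(f"implied baseline tokens: {baseline_tokens / 1e12:.0f} trillion")
print(f"implied baseline budget: at least ${baseline_budget_usd:,}")
```

Under these assumptions the comparison point is a model trained on roughly 36 trillion tokens at a cost of at least $600,000, which is the scale the summary describes as a barrier to participation.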
