🤖 AI Summary
Researchers propose a "knowledge infusion scaling law" to guide how much domain-specific data to inject during pretraining of large language models. They address a key trade-off: modest infusion can boost domain performance and reduce hallucinations, but too much causes "memory collapse," a sharp, catastrophic loss of previously learned knowledge. Through systematic experiments the authors identify model-specific critical collapse points and show that these thresholds scale predictably with model size. Crucially, the scaling law lets practitioners analyze smaller models to predict the optimal infusion amount for larger counterparts, avoiding expensive trial-and-error on full-scale LLMs.
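A minimal sketch of how that small-to-large extrapolation might work, assuming the collapse threshold follows a power law in parameter count (the summary does not state the paper's exact functional form). The `sizes` and `thresholds` arrays and the 7B target below are hypothetical placeholders, not measurements from the paper:

```python
import numpy as np

# Hypothetical sweep results on small models: (parameter count, observed
# collapse threshold in domain tokens). Illustrative values only.
sizes = np.array([125e6, 350e6, 760e6, 1.3e9])
thresholds = np.array([2.1e8, 5.5e8, 1.1e9, 1.8e9])

# Fit threshold ~ a * N^b by ordinary least squares in log-log space.
b, log_a = np.polyfit(np.log(sizes), np.log(thresholds), 1)
a = np.exp(log_a)

def predicted_threshold(n_params: float) -> float:
    """Extrapolated collapse threshold (in domain tokens) for a given model size."""
    return a * n_params ** b

target = 7e9  # e.g., a 7B-parameter model we don't want to sweep directly
print(f"fitted exponent b = {b:.2f}, prefactor a = {a:.3g}")
print(f"predicted threshold at {target:.0e} params: {predicted_threshold(target):.3e} tokens")
print(f"budget with a 20% safety margin: {0.8 * predicted_threshold(target):.3e} tokens")
```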
Technically, the work demonstrates a consistent scaling relationship between model capacity and the token budget for domain data: each model has a threshold beyond which knowledge retention degrades rapidly. The law was validated across multiple model sizes and token-budget regimes, suggesting it generalizes. For the AI/ML community this offers a practical recipe for dataset curation and pretraining design: more efficient domain adaptation, protection against catastrophic forgetting, and compute savings from estimating infusion regimes on cheaper, smaller models. The findings also inform the choice between pretraining-time infusion and later fine-tuning or continual-learning strategies when balancing specialization against broad knowledge retention.
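A hedged sketch of how a predicted threshold could feed into dataset curation: cap the domain slice of a fixed pretraining budget below the assumed collapse point and fill the remainder with general data. The `infusion_mix` helper and all numbers here are illustrative assumptions, not from the paper:

```python
def infusion_mix(total_tokens: float, collapse_threshold: float,
                 safety_margin: float = 0.2) -> dict:
    """Split a pretraining token budget into domain vs. general data,
    keeping the domain share below the (assumed) collapse threshold
    by a configurable safety margin."""
    domain = min(total_tokens, (1.0 - safety_margin) * collapse_threshold)
    return {
        "domain_tokens": domain,
        "general_tokens": total_tokens - domain,
        "domain_fraction": domain / total_tokens,
    }

# Example: a 2T-token run with a predicted 15B-token collapse threshold
# yields a ~12B-token domain slice (0.6% of the mix).
print(infusion_mix(total_tokens=2e12, collapse_threshold=1.5e10))
```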