🤖 AI Summary
Recent research identifies "geometry conflict" as a key driver of forgetting during continual post-training of large language models (LLMs). The study asks how new updates can be integrated into an LLM without causing catastrophic forgetting. By analyzing the covariance geometry induced by parameter updates, the researchers conclude that forgetting arises when a new task's updates become geometrically incompatible with the model state shaped by earlier updates. They introduce Geometry-Conflict Wasserstein Merging (GCWM), a method that accounts for this conflict when merging updates, improving knowledge retention and overall performance without requiring replay data.
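The summary does not spell out how GCWM measures geometric incompatibility, but the core idea of comparing the covariance geometry of two sets of parameter updates can be illustrated with a toy conflict score. The sketch below (an assumption for illustration, not the paper's actual construction) estimates a covariance matrix from each task's per-example updates and scores their mismatch with the squared 2-Wasserstein (Bures) distance between the corresponding zero-mean Gaussians:

```python
import numpy as np
from scipy.linalg import sqrtm

def update_covariance(deltas):
    """Covariance of a (n_samples, n_params) matrix of parameter updates."""
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    return centered.T @ centered / max(len(deltas) - 1, 1)

def geometry_conflict(c1, c2):
    """Squared 2-Wasserstein distance between N(0, c1) and N(0, c2).

    W2^2 = tr(c1 + c2 - 2 * (c1^{1/2} c2 c1^{1/2})^{1/2}).
    A larger value indicates that the two update distributions occupy
    geometrically less compatible subspaces (a toy proxy for the paper's
    notion of geometry conflict).
    """
    root = sqrtm(c1)
    cross = sqrtm(root @ c2 @ root)
    return float(np.trace(c1 + c2 - 2.0 * cross).real)

rng = np.random.default_rng(0)
# Task A updates concentrated along the first axis; task B along the second.
task_a = rng.normal(size=(200, 2)) * np.array([3.0, 0.3])
task_b = rng.normal(size=(200, 2)) * np.array([0.3, 3.0])
aligned = geometry_conflict(update_covariance(task_a), update_covariance(task_a))
conflicting = geometry_conflict(update_covariance(task_a), update_covariance(task_b))
print(aligned, conflicting)  # the conflicting pair scores much higher
```

A merging scheme could then, for example, downweight or rotate updates whose conflict score against the accumulated model state is high; whether GCWM does exactly this is not stated in the summary.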
This work matters for the AI/ML community on two fronts: it deepens our understanding of what drives forgetting in LLMs, and it offers a practical way to improve continual learning. The findings indicate that controlling the geometric compatibility of successive updates is critical to retaining prior knowledge. By demonstrating GCWM's effectiveness across model sizes and training scenarios, the work points toward more robust continual-learning frameworks that sustain knowledge retention while adapting to new tasks.