🤖 AI Summary
Researchers introduce LLM-JEPA, the first practical application of Joint Embedding Predictive Architectures (JEPAs) to large language models. Rather than relying solely on the standard input-space reconstruction or autoregressive generation objectives used in LLM pretraining and fine-tuning, LLM-JEPA trains models with embedding-space predictive losses, mirroring techniques that yielded major gains in computer vision. The paper presents a JEPA-based objective designed for both pretraining and fine-tuning and reports consistent, significant improvements over conventional LLM objectives across multiple benchmarks (NL-RX, GSM8K, Spider, RottenTomatoes) and model families (Llama3, OpenELM, Gemma2, Olmo). The authors also note increased robustness to overfitting.
This work is significant because it challenges the long-standing assumption that language models must be trained primarily with input-space generative losses; moving to embedding-space prediction could yield better generalization, more stable fine-tuning, and alternative evaluation regimes. Technically, LLM-JEPA reframes supervision as predicting joint embeddings rather than tokens, letting lessons from self-supervised learning in vision transfer to language models. The code is available, making it easier for practitioners to reproduce results and experiment with JEPA objectives in LLM pretraining and downstream adaptation.
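Since the summary describes the objective only at a high level, the sketch below shows one possible reading in code: the usual next-token loss is kept, and an embedding-space predictive term is added that asks a small predictor to map the embedding of one view of an example (say, a natural-language description) onto the embedding of a paired view (say, its program). The field names `text_ids`/`code_ids`, the mean-pooled sequence embedding, the cosine-distance loss, the `predictor` head, and the weight `lam` are all illustrative assumptions, not the paper's exact formulation; a Hugging Face-style causal LM interface is assumed.

```python
# Minimal sketch of a JEPA-style objective for LLM fine-tuning (illustrative,
# not the paper's reference implementation).
import torch.nn.functional as F


def sequence_embedding(hidden_states, attention_mask):
    """Mean-pool final-layer hidden states into one embedding per sequence."""
    mask = attention_mask.unsqueeze(-1).type_as(hidden_states)    # (B, T, 1)
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1)    # (B, D)


def jepa_style_loss(model, predictor, batch, lam=1.0):
    """Standard next-token loss on the text view plus an embedding-space
    predictive term: predictor(emb(text)) should match emb(code)."""
    # Forward pass on the text view: token-level LM loss plus hidden states.
    text_out = model(input_ids=batch["text_ids"],
                     attention_mask=batch["text_mask"],
                     labels=batch["text_ids"],
                     output_hidden_states=True)
    lm_loss = text_out.loss
    z_text = sequence_embedding(text_out.hidden_states[-1], batch["text_mask"])

    # Embedding of the paired view (no token loss needed here).
    code_out = model(input_ids=batch["code_ids"],
                     attention_mask=batch["code_mask"],
                     output_hidden_states=True)
    z_code = sequence_embedding(code_out.hidden_states[-1], batch["code_mask"])

    # Predict the code embedding from the text embedding and penalize the
    # cosine distance between prediction and target.
    pred = predictor(z_text)                                      # (B, D)
    jepa_loss = 1.0 - F.cosine_similarity(pred, z_code, dim=-1).mean()

    return lm_loss + lam * jepa_loss
```

In this sketch `predictor` could be as small as a single linear layer trained jointly with the LLM, and setting `lam` to zero recovers ordinary fine-tuning, which makes the embedding-space term easy to ablate.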