Nested Learning – A new ML paradigm for continual learning (research.google)

🤖 AI Summary
Researchers at Google introduced "Nested Learning," a new paradigm presented at NeurIPS 2025 that reframes a single ML model as a system of interconnected, multi-level optimization problems. Instead of treating architecture and optimization as separate, Nested Learning treats each subcomponent as its own optimization problem with its own "context flow" and update frequency, ordered into hierarchical levels. That reframing lets models maintain a richer, spectrum-like memory (a continuum memory system, CMS) and mitigate catastrophic forgetting by allowing components to update at different rates—enabling persistent long-term knowledge alongside fast short-term context handling. Technically, the team models backpropagation and attention as associative-memory modules, recasts optimizers (e.g., momentum) with L2-style objectives to make them more robust to noisy data, and introduces "deep optimizers" that operate as inner learning problems. Their proof-of-concept architecture, Hope—a self-modifying variant of Titans—implements unbounded in‑context learning and uses CMS blocks to scale context windows. Empirically, Hope shows lower perplexity, higher accuracy, and markedly better long‑context memory on language modeling, long‑context Needle‑In‑A‑Haystack tasks, and continual learning benchmarks. The work opens a new design axis for self‑improving systems: stacking heterogeneous update rates and nested optimizers to build models that learn continually without overwriting older skills.
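To make the "heterogeneous update rates" idea concrete, here is a minimal PyTorch sketch of a toy two-level model in which a "fast" parameter group updates every step while a "slow" group updates only every few steps. The module names, sizes, and schedule are invented for illustration; this is not the Hope architecture or the paper's code, just the multi-timescale update pattern the summary describes.

```python
# Illustrative sketch only: two parameter groups with different update
# frequencies, echoing the multi-rate idea behind a continuum memory system.
# All names and hyperparameters here are hypothetical.
import torch
import torch.nn as nn

class TwoLevelMemory(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.fast = nn.Linear(dim, dim)  # "fast" level: short-term, updated every step
        self.slow = nn.Linear(dim, dim)  # "slow" level: long-term, updated less often

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fast(torch.relu(self.slow(x)))

def train_step(model, opt_fast, opt_slow, x, y, step, slow_every=8):
    """One step with heterogeneous update frequencies across levels."""
    loss = nn.functional.mse_loss(model(x), y)
    opt_fast.zero_grad()
    opt_slow.zero_grad()
    loss.backward()
    opt_fast.step()               # fast level: steps every iteration
    if step % slow_every == 0:
        opt_slow.step()           # slow level: steps only every `slow_every` iterations
    return loss.item()

model = TwoLevelMemory()
opt_fast = torch.optim.SGD(model.fast.parameters(), lr=1e-2)
opt_slow = torch.optim.SGD(model.slow.parameters(), lr=1e-3)

for step in range(32):
    x, y = torch.randn(16, 64), torch.randn(16, 64)
    train_step(model, opt_fast, opt_slow, x, y, step)
```

The design point is the schedule, not the architecture: slowly updated components act as longer-term memory that is harder to overwrite, while frequently updated components absorb recent context.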