🤖 AI Summary
A research team introduced a constructive method for “stuffing” single-hidden-layer MLPs with discrete facts: rather than probing pretrained LLMs, they explicitly build MLPs that implement key→value mappings with provable correctness. Their encoder–decoder construction (a gated MLP encoder plus a linear decoder, with a Johnson–Lindenstrauss-style random-projection bottleneck) handles realistic, anisotropic embeddings, matches the information-theoretic facts-per-parameter bound in idealized settings, and yields precise scaling laws governed by a new embedding-quality metric called decodability. Practically, this means the parameter count an MLP needs to store a given fact set can be predicted for real LLM internal embeddings, and the construction is asymptotically as efficient as gradient-descent-trained MLPs.
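As a rough intuition for this style of construction (not the paper's exact recipe; all dimensions, thresholds, and names below are illustrative assumptions), one can hand-build a single-hidden-layer MLP with one gated hidden unit per fact: a random projection compresses keys, each hidden unit fires only when the projected query matches its key, and a linear decoder emits the stored value.

```python
import numpy as np

rng = np.random.default_rng(0)
n_facts, d_model, d_proj = 50, 64, 32  # toy sizes, chosen for illustration

# Random unit-norm key embeddings and random value embeddings
# (stand-ins for LLM internal embeddings).
keys = rng.normal(size=(n_facts, d_model))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.normal(size=(n_facts, d_model))

# Johnson-Lindenstrauss-style random projection bottleneck:
# compress keys to half the dimension before matching.
P = rng.normal(size=(d_model, d_proj)) / np.sqrt(d_proj)
proj_keys = keys @ P

# Constructed one-hidden-layer MLP: one hidden unit per fact.
W_in = proj_keys                              # hidden weights = projected keys
b_in = -0.8 * np.sum(proj_keys**2, axis=1)    # threshold near self-similarity
W_out = values                                # linear decoder rows = values

def mlp_lookup(query):
    # Gated (ReLU) encoder: unit i fires only if the projected query
    # is close enough to projected key i; linear decoder reads the value.
    h = np.maximum(query @ P @ W_in.T + b_in, 0.0)
    return h @ W_out

# Recall: query with each stored key; check the nearest value is correct.
recovered = np.array([mlp_lookup(k) for k in keys])
hits = np.argmax(recovered @ values.T, axis=1) == np.arange(n_facts)
print(f"recall: {hits.mean():.2f}")
```

With near-orthogonal random keys, the thresholded gate keeps spurious activations small, so nearest-value recall is essentially perfect at these sizes; the paper's contribution is making this kind of guarantee precise for realistic, anisotropic embeddings.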
Crucially, the team shows that Transformers can reliably read these constructed MLP fact stores under modest architectural constraints (tying embeddings, freezing the pre-MLP RMSNorm and the value/out projections, and removing residual bypasses), achieving >99% recall on a synthetic task and facts-per-parameter scaling comparable to LLMs. They also reveal a capacity–usability tradeoff: whitening the output embeddings increases capacity but raises the MLP's Lipschitz constant and harms Transformer usability. As a proof of concept, swapping entire fact-storing MLP blocks lets a Transformer adopt new facts instantly, with no retraining, substantially improving modular fact-editing performance over prior methods: a step toward plug-and-play, editable model memory.
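The capacity–usability tradeoff can be sketched numerically (a toy illustration under assumed anisotropy, not the paper's experiment): whitening makes an anisotropic embedding set isotropic, which helps capacity, but the whitening map must amplify the rarest embedding directions by roughly 1/σ_min, inflating the Lipschitz constant of any MLP composed with it.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 64

# Anisotropic "output embeddings": a few directions dominate,
# as is typical of real LLM embedding matrices (assumed scales).
scales = np.geomspace(1.0, 0.05, d)
E = rng.normal(size=(n, d)) * scales

# ZCA-style whitening matrix built from the empirical covariance.
cov = E.T @ E / n
evals, evecs = np.linalg.eigh(cov)
W_whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
E_white = E @ W_whiten

# Whitened embeddings are isotropic: covariance is the identity...
cov_white = E_white.T @ E_white / n
iso_err = np.abs(cov_white - np.eye(d)).max()

# ...but the whitening map itself has a large Lipschitz constant
# (its top singular value ~ 1/sigma_min), so perturbations along
# rare directions are strongly amplified.
lipschitz = np.linalg.svd(W_whiten, compute_uv=False).max()
print(f"isotropy error: {iso_err:.2e}, whitening Lipschitz: {lipschitz:.1f}")
```

Here the smallest embedding scale is 0.05, so the whitening map amplifies some directions by a factor of roughly 20, which is the usability cost the summary refers to.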