Reversible Deep Equilibrium Models (arxiv.org)

🤖 AI Summary
Deep Equilibrium Models (DEQs) define their output implicitly as the fixed point of a learned single-layer mapping iterated to convergence, trading explicit depth for repeated application of one layer. This paper introduces Reversible Deep Equilibrium Models (RevDEQs), a variant that enables exact gradient computation through the equilibrium, replacing the approximate implicit differentiation used in standard DEQs. Because the gradients are exact, RevDEQs eliminate the stabilizing regularization and large solver-iteration budgets that DEQs typically require, yielding more stable training with far fewer function evaluations.

Technically, RevDEQs keep the implicit fixed-point formulation but alter the model structure so that the backward pass can be computed exactly rather than via approximate Jacobian solves, reducing the computational overhead and instability associated with iterative gradient estimation.

Empirically, the authors report state-of-the-art results on language modeling and image classification relative to both other implicit (DEQ-family) models and comparable explicit (fixed-depth) models. The work suggests a practical path to retaining the parameter and representational efficiency of equilibrium models while addressing their training fragility and solver cost, making implicit architectures more attractive for large-scale AI/ML workloads.
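For context, a standard DEQ computes z* satisfying z* = f(z*, x) by fixed-point iteration, then backpropagates through the equilibrium by solving a linear system involving the Jacobian of f. The minimal PyTorch sketch below shows this baseline; the function names, stopping rules, and solver settings are illustrative, not from the paper.

```python
import torch

def fixed_point(f, x, z0, max_iter=50, tol=1e-4):
    """Standard DEQ forward pass: iterate z <- f(z, x) until convergence."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if torch.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def implicit_vjp(f, z_star, x, grad_out, max_iter=50, tol=1e-4):
    """Standard DEQ backward pass (the part RevDEQs avoid):
    approximately solve u = J_f(z*)^T u + grad_out by iteration,
    i.e. u ~= (I - J_f)^{-T} grad_out, using repeated Jacobian-vector
    products instead of a stored forward trace."""
    z_star = z_star.detach().requires_grad_(True)
    fz = f(z_star, x)  # one extra evaluation to build the local graph
    u = grad_out.clone()
    for _ in range(max_iter):
        # J_f(z*)^T u via reverse-mode autodiff
        Jtu = torch.autograd.grad(fz, z_star, grad_outputs=u, retain_graph=True)[0]
        u_next = Jtu + grad_out
        if torch.norm(u_next - u) < tol:
            return u_next
        u = u_next
    return u
```

The resulting vector u is then backpropagated once through f at z* to get parameter gradients. Per the summary, RevDEQs restructure the model so that this inner iterative solve, and the approximation error and extra function evaluations it brings, disappear.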
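The summary does not spell out how reversibility is achieved; the construction is in the paper. As an illustration of the general principle only (not the RevDEQ layer), RevNet-style additive coupling makes each update exactly invertible, so a backward pass can reconstruct intermediate states in closed form rather than storing them or solving for them approximately:

```python
import torch
import torch.nn as nn

class ReversibleCouple(nn.Module):
    """Illustrative RevNet-style additive coupling (an analogy, not RevDEQ).

    Forward:  z1 = y1 + f(y2);  z2 = y2 + g(z1)
    Inverse:  y2 = z2 - g(z1);  y1 = z1 - f(y2)

    Because the inverse is exact, the backward pass can recompute
    (y1, y2) from (z1, z2) and apply ordinary backprop step by step,
    giving exact gradients without storing intermediate activations.
    """
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, y1, y2):
        z1 = y1 + self.f(y2)
        z2 = y2 + self.g(z1)
        return z1, z2

    def inverse(self, z1, z2):
        # Reconstruction only; gradients come from re-running forward.
        with torch.no_grad():
            y2 = z2 - self.g(z1)
            y1 = z1 - self.f(y2)
        return y1, y2

# Quick check that the inverse recovers the inputs up to float roundoff:
f = nn.Sequential(nn.Linear(16, 16), nn.Tanh())
g = nn.Sequential(nn.Linear(16, 16), nn.Tanh())
block = ReversibleCouple(f, g)
y1, y2 = torch.randn(4, 16), torch.randn(4, 16)
z1, z2 = block(y1, y2)
r1, r2 = block.inverse(z1, z2)
assert torch.allclose(r1, y1, atol=1e-5) and torch.allclose(r2, y2, atol=1e-5)
```

The appeal of this property in the equilibrium setting, as the summary describes it, is that exact reconstruction replaces both the approximate Jacobian solve and the regularization needed to keep it stable.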