One Thousand Layer Networks for Self-Supervised RL (arxiv.org)

🤖 AI Summary
Researchers report that simply scaling neural network depth—up to 1,024 layers—produces large gains in self-supervised, goal-conditioned reinforcement learning. Working in an unsupervised setting where agents receive no demonstrations or task rewards, the team trains contrastive RL agents to maximize the likelihood of reaching commanded goals. Across simulated locomotion and manipulation benchmarks, deeper networks boosted success rates by factors of roughly 2×–50× and outperformed other goal-conditioned baselines; increasing depth not only improved scores but also produced qualitatively different, more capable behaviors. Code, demos, and a project webpage are provided.

This work is significant because it transfers a simple but powerful scaling knob from language and vision—model depth—into RL, where most prior work used very shallow policies (2–5 layers). The key technical takeaway is that depth can unlock richer representations and goal-reaching capabilities even in purely self-supervised regimes, and that depth scaling materially alters what policies can learn. Practically, this suggests RL research should re-evaluate architectural baselines and explore extreme depths alongside other scaling axes; it also raises follow-up questions about compute, training stability, and when depth yields diminishing returns.
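To make the setup concrete, here is a minimal sketch of the two ingredients the summary mentions: a very deep residual network (residual connections are the standard way to keep hundreds of layers trainable) and an InfoNCE-style contrastive objective that scores whether a goal is reachable from a state-action pair. This is a generic illustration in NumPy, not the paper's actual architecture or code; all names, dimensions, and the weight-scaling choice are assumptions for the sketch.

```python
import numpy as np

def residual_mlp(x, weights):
    # Deep residual MLP: each block adds a nonlinear transform to its input.
    # The skip connection is what lets stacks of hundreds or a thousand
    # layers propagate signal without vanishing.
    for W in weights:
        x = x + np.maximum(0.0, x @ W)  # ReLU residual block
    return x

def contrastive_logits(sa_emb, goal_emb):
    # InfoNCE-style similarity matrix: entry (i, j) scores whether goal j
    # is reachable from state-action i. Within a batch, the diagonal holds
    # the positive pairs and off-diagonal entries serve as negatives.
    return sa_emb @ goal_emb.T

rng = np.random.default_rng(0)
depth, dim, batch = 64, 16, 8  # toy sizes; the paper scales depth far higher
# Small per-layer weight scale (a hypothetical 1/sqrt(depth) heuristic here)
# keeps activations bounded as depth grows.
ws = [rng.normal(scale=0.02 / np.sqrt(depth), size=(dim, dim))
      for _ in range(depth)]

sa_emb = residual_mlp(rng.normal(size=(batch, dim)), ws)    # state-action tower
goal_emb = residual_mlp(rng.normal(size=(batch, dim)), ws)  # goal tower
logits = contrastive_logits(sa_emb, goal_emb)
print(logits.shape)  # (8, 8)
```

In an actual training loop one would apply a cross-entropy loss over each row of `logits` with the diagonal as the target, so the embeddings learn to rank the commanded goal above in-batch negatives; scaling `depth` is the knob the summary says drives the reported gains.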