🤖 AI Summary
Researchers propose "transformation learning," a reframing of continual learning that treats catastrophic forgetting not as a memory-loss problem but as the consequence of asking one network to implement mathematically impossible mappings (e.g., XOR and then XNOR on the same inputs). Instead of overwriting parameters, the method freezes a base network and learns lightweight task-specific transforms that map the base's features to each task's outputs. On a toy XOR/XNOR test this eliminates forgetting (baseline 0% → transformation 100%), and on MNIST it scales to five tasks with 98.3% accuracy while saving ~75.6% of parameters versus training five separate networks. Remarkably, accuracy on the frozen base task even improves slightly (99.86% → 99.91%) as new tasks are added.
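A minimal sketch of that setup on the XOR/XNOR toy problem, assuming a PyTorch implementation; the names and sizes here (BaseNet, train, the 8-unit hidden layer) are illustrative, not the authors' code:

```python
# Sketch only: train a tiny base net on XOR, freeze it, then learn a
# lightweight transform on its frozen features for XNOR.
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_xor = torch.tensor([[0.], [1.], [1.], [0.]])
y_xnor = 1.0 - y_xor  # the conflicting second mapping on the same inputs

class BaseNet(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(2, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        f = self.features(x)
        return self.head(f), f  # logits for the base task, plus features

def train(params, forward, targets, steps=2000, lr=0.1):
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(forward(), targets)
        loss.backward()
        opt.step()

# 1) Train the base network on XOR (the base task).
base = BaseNet()
train(base.parameters(), lambda: base(X)[0], y_xor)

# 2) Freeze the base so its XOR behaviour can no longer be overwritten.
for p in base.parameters():
    p.requires_grad_(False)

# 3) Learn a lightweight transform from frozen features to XNOR.
transform = nn.Linear(8, 1)
train(transform.parameters(), lambda: transform(base(X)[1]), y_xnor)

with torch.no_grad():
    xor_pred = (base(X)[0] > 0).float()
    xnor_pred = (transform(base(X)[1]) > 0).float()
print("XOR accuracy :", (xor_pred == y_xor).float().mean().item())
print("XNOR accuracy:", (xnor_pred == y_xnor).float().mean().item())
```

Because the base parameters never receive gradients from the second task, the XOR mapping is preserved by construction rather than protected by a penalty.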
Key technical details: transforms work far better in feature space (128D intermediate representations → 96.9% on 2-task MNIST) than at the logit/output level (5D → 80.6%). The architecture uses a star topology: one frozen base network feeds independent transforms for each task, avoiding the error accumulation of chained mappings. Freezing the base prevents gradient-induced degradation (e.g., dormant neurons, weight/rank collapse). The approach requires task IDs at inference for best accuracy; reward-based routing without IDs reached only 79.7%. Comparisons: unlike EWC/SI/PackNet, which protect existing weights, or Progressive Networks, which grow parameters linearly with each task, transformation learning reformulates the problem itself and remains parameter-efficient. Open questions remain for more complex benchmarks (CIFAR-100, real-world datasets) and for settings without a natural base task.
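For the multi-task case, a hedged sketch of the star topology under the same assumptions: a frozen base exposes 128D features, each task adds an independent lightweight head on those features, and a task ID routes to the right head at inference. Class names, layer sizes, and task IDs below are illustrative, not from the paper:

```python
# Star topology sketch: frozen base -> independent per-task transforms.
import torch
import torch.nn as nn

class FrozenBase(nn.Module):
    """Base network trained on the first task, then frozen."""
    def __init__(self, feat_dim=128, n_base_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )
        self.base_head = nn.Linear(feat_dim, n_base_classes)
    def forward(self, x):
        f = self.encoder(x)            # 128D intermediate representation
        return self.base_head(f), f

class StarTransforms(nn.Module):
    """Independent per-task transforms on frozen features (no chaining)."""
    def __init__(self, base, feat_dim=128):
        super().__init__()
        self.base = base
        self.feat_dim = feat_dim
        for p in self.base.parameters():  # freezing avoids gradient-induced degradation
            p.requires_grad_(False)
        self.heads = nn.ModuleDict()
    def add_task(self, task_id, n_classes):
        # Lightweight transform: far cheaper than training a full new network.
        self.heads[task_id] = nn.Sequential(
            nn.Linear(self.feat_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))
    def forward(self, x, task_id):
        _, f = self.base(x)
        if task_id == "base":
            return self.base.base_head(f)
        return self.heads[task_id](f)     # route by task ID at inference

# Usage sketch: register heads for four additional MNIST-style tasks.
model = StarTransforms(FrozenBase())
for t in ["task1", "task2", "task3", "task4"]:
    model.add_task(t, n_classes=10)
x = torch.randn(8, 1, 28, 28)
print(model(x, "base").shape, model(x, "task2").shape)
```

Because each head reads the frozen features directly rather than the previous task's outputs, errors do not accumulate across tasks, which is the stated reason for the star rather than a chained layout.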