PackNet

Keywords: packnet, continual learning

PackNet (Mallya & Lazebnik, 2018) is a continual learning method that uses iterative pruning to allocate a separate subnetwork within a single neural network for each task. Instead of growing the network (as progressive networks do), PackNet reuses the capacity freed by pruning to learn new tasks while protecting the weights that earlier tasks depend on.

How PackNet Works

- Task 1: Train the full network on task 1, then prune it: identify and zero out the least important weights (typically those with the smallest magnitude). This frees a significant portion of the network's capacity; a brief retraining of the surviving weights recovers most of the accuracy lost to pruning.
- Freeze: Mark the remaining (unpruned) task 1 weights as frozen; they will never be modified again.
- Task 2: Train only the freed (pruned) weights on task 2. The frozen task 1 weights participate in forward passes but don't receive gradient updates. After training, prune task 2 weights similarly.
- Repeat: Each new task trains on the remaining free capacity. The network accumulates binary task masks recording which weights belong to which task. A minimal sketch of the prune/freeze/train loop follows this list.

Key Properties

- Fixed Network Size: Unlike progressive networks, the model does not grow. All tasks share the same network, just using different subsets of weights.
- Zero Forgetting: Previous task weights are frozen, guaranteeing no catastrophic forgetting.
- Task Masks: Each task has a binary mask indicating its active weights. At inference time, the appropriate mask is applied (see the sketch after this list).
- Capacity Limit: Eventually the network runs out of free weights. The number of tasks is limited by the pruning ratio and network size.

Typical Pruning Ratios

- 50–75% pruning per task is common, meaning each task keeps only 25–50% of the weights it was allowed to train.
- At 75% pruning, a network can support roughly 4 tasks if each task claims a fixed 25% of the total weights. If the ratio is instead applied to the shrinking free pool, the pool never empties, but later tasks get geometrically less capacity, as the arithmetic below shows.

Advantages Over Progressive Networks

- Constant model size — no linear growth.
- Efficient parameter usage — leverages the well-known observation that neural networks are over-parameterized and can achieve good performance with far fewer weights.

Limitations

- Finite Capacity: Cannot support unlimited tasks — the network eventually runs out of free parameters.
- Limited Transfer: New tasks can read the frozen weights of earlier tasks in the forward pass, but cannot adapt them, and earlier tasks never benefit from later learning (no backward transfer).
- Task ID Required: Must know which task mask to apply at inference time.

PackNet demonstrated that the over-parameterization of modern neural networks could be directly exploited for continual learning — a key insight for the field.
