Model-Agnostic Meta-Learning (MAML) is a gradient-based framework for learning to learn — it computes meta-parameters (an initialization) that enable rapid task-specific adaptation in a few gradient steps, achieving strong few-shot performance across vision and language tasks.
Learning to Learn Concept:
- Meta-learning objective: maximize performance on new tasks after few adaptation steps; not just single-task accuracy
- Task diversity: train on diverse tasks; learn common structure enabling generalization to new task distributions
- Rapid adaptation: few gradient steps on task-specific data sufficient; leverages learned initialization
- Few-shot adaptation: contrast with transfer learning, which fine-tunes all parameters from a generic pre-trained checkpoint; MAML updates from an initialization explicitly optimized for fast adaptation
MAML Bilevel Optimization:
- Inner loop: task-specific optimization; gradient descent on task loss with learned initialization θ
- Outer loop: meta-level optimization; update initialization θ to minimize loss on query set after inner loop steps
- Bilevel structure: inner loop nested within outer loop; optimization of optimization procedure
- Computational cost: requires computing gradients through inner loop (second-order derivatives); expensive but powerful
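The bilevel structure can be made concrete on a toy problem where everything is available in closed form. In this hypothetical sketch (not from any paper's code), each "task" is a 1-D quadratic loss L_t(w) = (w − c_t)², the inner loop takes one SGD step on the support loss, and the outer loop differentiates through that step — the (1 − 2α) factor is exactly the chain rule through the inner update:

```python
import numpy as np

alpha, beta = 0.1, 0.05              # inner (task) and outer (meta) learning rates
centers = np.array([1.0, 3.0, 5.0])  # one toy task per center: L_t(w) = (w - c_t)^2
theta = 0.0                          # meta-learned initialization

for step in range(2000):
    meta_grad = 0.0
    for c in centers:
        # inner loop: one gradient step on the "support" loss
        g_support = 2.0 * (theta - c)
        theta_adapted = theta - alpha * g_support
        # outer loop: gradient of the "query" loss w.r.t. theta, differentiating
        # THROUGH the inner step (chain rule contributes the 1 - 2*alpha factor)
        g_query = 2.0 * (theta_adapted - c)
        meta_grad += (1.0 - 2.0 * alpha) * g_query
    theta -= beta * meta_grad / len(centers)

print(round(theta, 3))  # → 3.0
```

For these symmetric quadratics the best initialization is the mean of the task centers, and the meta-gradient descent recovers it.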
Algorithm Details:
- Meta-update: ∇_θ L_meta = ∑_tasks ∇_θ L_task^query(θ − α ∇_θ L_task^support(θ))
- Hessian computation: exact second-order derivatives require Hessian-vector products through the inner loop, which is expensive; common alternatives are dropping the second-order terms (first-order MAML) or implicit differentiation (iMAML)
- Computational efficiency: first-order MAML (FOMAML) drops the second-order terms; significant speedup with minimal accuracy loss
- Multiple inner steps: 1-5 inner gradient steps typical; more steps better performance but higher computational cost
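For a single inner step, the meta-gradient in the update above expands via the chain rule; the Hessian term is exactly what FOMAML discards (a standard derivation, restated here for clarity, with per-task support/query losses):

```latex
\begin{aligned}
\theta'_i &= \theta - \alpha \,\nabla_\theta \mathcal{L}^{\mathrm{support}}_i(\theta) \\
\nabla_\theta \,\mathcal{L}^{\mathrm{query}}_i(\theta'_i)
  &= \bigl(I - \alpha \,\nabla^2_\theta \mathcal{L}^{\mathrm{support}}_i(\theta)\bigr)\,
     \nabla_{\theta'} \mathcal{L}^{\mathrm{query}}_i(\theta'_i) \\
\text{FOMAML:}\quad
  \nabla_\theta \,\mathcal{L}^{\mathrm{query}}_i(\theta'_i)
  &\approx \nabla_{\theta'} \mathcal{L}^{\mathrm{query}}_i(\theta'_i)
\end{aligned}
```

With k inner steps the Jacobian factors multiply, one per step, which is why cost grows with the number of inner updates.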
Meta-Learning on Few-Shot Classification:
- Support set: small set of labeled examples (5 per class typical) for task-specific adaptation
- Query set: test examples evaluating adapted model; loss on query set defines meta-loss
- Episode sampling: randomly sample tasks during training; each task has own support/query split
- Task distribution: diverse task distribution critical; meta-learning assumes test tasks from same distribution
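Episode construction is mostly bookkeeping; a minimal sketch of an N-way K-shot sampler (the function name and signature are illustrative, not from any library):

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=5, n_query=15, seed=0):
    """Sample one N-way K-shot episode as a support/query index split.

    labels: list of class labels, one per dataset index.
    Returns (support, query): lists of (dataset_index, episode_class) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # pick n_way classes, then k_shot + n_query disjoint examples per class
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for ep_label, y in enumerate(classes):
        picked = rng.sample(by_class[y], k_shot + n_query)
        support += [(i, ep_label) for i in picked[:k_shot]]
        query += [(i, ep_label) for i in picked[k_shot:]]
    return support, query
```

Class labels are re-indexed per episode (0..N−1), since the meta-learner should adapt to the episode's classes rather than memorize global label identities.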
Reptile Meta-Learning:
- Relation to MAML: like first-order MAML, Reptile avoids second-order derivatives entirely; it also needs no explicit support/query split
- Simplified algorithm: adapt to each task with plain SGD, then move the initialization toward the average of the adapted parameters; surprisingly effective
- Computational efficiency: substantially faster than MAML; enables scaling to larger models
- Empirical performance: competitive with MAML on few-shot benchmarks; simpler implementation
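The whole Reptile meta-update fits in a few lines; a minimal sketch on the same toy quadratic tasks (the helper name is illustrative):

```python
import numpy as np

def reptile_step(theta, grad_fn, tasks, inner_lr=0.01, inner_steps=5, meta_lr=0.1):
    """One Reptile meta-update: run plain SGD on each task, then move the
    initialization toward the mean of the task-adapted parameters.
    No second-order derivatives, no support/query split."""
    adapted = []
    for task in tasks:
        w = theta.copy()
        for _ in range(inner_steps):
            w = w - inner_lr * grad_fn(w, task)
        adapted.append(w)
    return theta + meta_lr * (np.mean(adapted, axis=0) - theta)

# toy tasks: L_t(w) = ||w - c_t||^2, so the gradient is 2 (w - c_t)
centers = [np.array([1.0]), np.array([3.0]), np.array([5.0])]
theta = np.array([0.0])
for _ in range(2000):
    theta = reptile_step(theta, lambda w, c: 2.0 * (w - c), centers)

print(round(float(theta[0]), 3))  # → 3.0
```

As with MAML on this toy problem, the initialization drifts to the mean task optimum — but here purely by interpolating toward adapted weights.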
Model-Agnostic Property:
- Architecture independence: applicable to any model trained via gradient descent; no special modules
- Flexibility: used for classification, reinforcement learning, neural ODEs, optimization itself
- Black-box compatibility: needs only the model's loss and gradients, not architecture-specific modules; any differentiable model qualifies
- Multi-modal learning: MAML applied to joint vision-language models; learns cross-modal adaptation
Prototypical Networks Comparison:
- Embedding-based vs optimization-based: prototypical networks learn embedding space; MAML learns initialization
- Computational comparison: prototypical networks efficient inference; MAML requires inner loop adaptation
- Performance: both state-of-the-art on few-shot; prototypical networks simpler; MAML potentially more flexible
- Task adaptation: MAML more naturally incorporates task information; prototypical networks class-agnostic
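The embedding-based alternative is simple enough to state in full: given an embedding function, prototypical-network inference is a mean and a nearest-neighbor lookup, with no inner-loop gradients. A minimal sketch (function name illustrative; embeddings assumed precomputed):

```python
import numpy as np

def proto_classify(support_emb, support_y, query_emb, n_way):
    """Prototypical-network inference: each class prototype is the mean of its
    support embeddings; queries are assigned to the nearest prototype
    (squared Euclidean distance)."""
    protos = np.stack([support_emb[support_y == c].mean(axis=0)
                       for c in range(n_way)])
    # (Q, n_way) pairwise squared distances via broadcasting
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)
```

This is why inference is cheap relative to MAML: adaptation is a single averaging pass over the support set rather than a gradient-based inner loop.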
Meta-Learning for Hyperparameter Optimization:
- HPO meta-learning: learn hyperparameter schedules for optimization; HPO-as-few-shot-learning
- Learning rate schedules: meta-learn initial learning rates; task-specific tuning adapted quickly
- Data augmentation: meta-learn augmentation policies optimized for task; transfer across tasks
- Domain transfer: meta-learned initializations transfer across related domains; enables efficient fine-tuning
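As a concrete instance of meta-learning a learning rate, here is a hypothetical sketch (an assumed setup in the spirit of Meta-SGD, not any paper's code) where the inner-loop step size α is itself the meta-parameter, updated by descending the post-adaptation query loss. For the toy quadratics used earlier, the learned rate approaches 0.5 — the step size that solves each task exactly in one inner step:

```python
import numpy as np

theta = 0.0                           # initialization held fixed here
alpha = 0.05                          # inner learning rate, now meta-learned
centers = np.array([1.0, 3.0, 5.0])   # toy tasks: L_t(w) = (w - c_t)^2
meta_lr = 0.01

for _ in range(500):
    g_alpha = 0.0
    for c in centers:
        adapted = theta - alpha * 2.0 * (theta - c)        # one inner step
        # d(query loss)/d(alpha) by the chain rule:
        # d/da [ (adapted - c)^2 ] = 2 (adapted - c) * (-2 (theta - c))
        g_alpha += 2.0 * (adapted - c) * (-2.0 * (theta - c))
    alpha -= meta_lr * g_alpha / len(centers)

print(round(alpha, 3))  # → 0.5
```

The same mechanism — treating an optimizer hyperparameter as a meta-parameter and backpropagating through adaptation — underlies meta-learned schedules and per-parameter learning rates.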
Applications Across Domains:
- Vision: few-shot classification on miniImageNet, Omniglot, CUB (bird classification); strong baselines
- Language: few-shot language modeling; meta-learning task-specific language adaptation; pre-training improvements
- Reinforcement learning: meta-RL enables rapid policy adaptation to new tasks; sample-efficient learning
- Robotics: few-shot robot control; meta-learning robot manipulation skills transferable across tasks
Meta-learning Challenges:
- Task distribution assumption: test tasks must match training task distribution; distribution shift problematic
- Overfitting to meta-training tasks: the learner can memorize task-specific solutions (meta-overfitting), reducing generalization to genuinely new tasks
- Computational cost: second-order derivatives expensive; limits scalability to very large models
- Optimization challenges: saddle points and local minima in bilevel optimization; convergence difficult
MAML enables rapid few-shot adaptation through learned initializations — using bilevel optimization to find meta-parameters that facilitate task-specific learning with minimal gradient updates.