Zero-Shot and Few-Shot Learning

Keywords: zero-shot learning attribute, generalized zero-shot, semantic embedding space, seen-unseen class transfer, visual-semantic embedding

Zero-shot and few-shot learning are transfer-learning paradigms that enable recognition of novel, unseen classes through semantic attributes, embeddings, or a handful of labeled examples, making them critical for scaling to classes with little or no labeled training data.

Attribute-Based Zero-Shot Learning:
- Semantic attributes: human-defined attributes describing classes (e.g., cats: furry, four-legged, carnivorous)
- Zero-shot inference: classifier trained on seen classes; attributes transfer to predict unseen class labels
- Attribute prediction: classifier learns visual-to-attribute mapping from seen classes; applies to unseen classes
- Handcrafted attributes: domain expert designs attribute set; labor-intensive but interpretable
- Learned attributes: automatically discovered attributes from data; more flexible and potentially more informative
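The attribute-based pipeline above can be sketched end to end on synthetic data: learn a visual-to-attribute mapping on seen classes (here with closed-form ridge regression), then classify unseen-class images by matching predicted attributes to unseen-class signatures. All attribute signatures, dimensions, and the feature generator below are illustrative, not a standard dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # visual feature dimensionality (illustrative)

# Hypothetical binary attribute signatures: rows are classes, columns attributes.
seen_attr = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], float)
unseen_attr = np.array([[1, 0, 1], [1, 1, 1]], float)  # never seen in training

# Synthetic seen-class features correlated with their attribute signatures.
W_true = rng.normal(size=(3, dim))
y_seen = rng.integers(0, 5, 200)
X_seen = seen_attr[y_seen] @ W_true + 0.05 * rng.normal(size=(200, dim))

# Learn the visual-to-attribute mapping with ridge regression (closed form).
A = seen_attr[y_seen]
M = np.linalg.solve(X_seen.T @ X_seen + 1e-2 * np.eye(dim), X_seen.T @ A)

def zero_shot_predict(x):
    """Predict attributes from visual features, then match the nearest unseen signature."""
    a_hat = x @ M
    return int(np.argmin(np.linalg.norm(unseen_attr - a_hat, axis=1)))

x_test = unseen_attr[1] @ W_true  # clean feature from unseen class 1
print(zero_shot_predict(x_test))  # → 1
```

No image of an unseen class was used for training; only its attribute signature bridges the gap.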

Visual-Semantic Embedding Space:
- Joint embedding: visual features and semantic embeddings (word2vec, GloVe, BERT) projected to shared space
- Similarity matching: unseen class prototype (semantic embedding) matched to test image embedding; nearest neighbor in shared space
- Cross-modal learning: learn similarity function aligning visual and semantic modalities; enables class transfer
- Embedding quality: semantic embeddings capture rich linguistic properties; word2vec encodes semantic relationships
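Nearest-neighbor matching in the shared space reduces to a similarity search against class prototypes. In the sketch below, the 3-dimensional "word vectors" are toy stand-ins for real word2vec/GloVe embeddings, and the visual feature is assumed to be already projected into the semantic space.

```python
import numpy as np

# Toy stand-ins for semantic class embeddings (in practice: word2vec/GloVe/BERT).
class_embeddings = {
    "zebra": np.array([0.9, 0.1, 0.4]),
    "horse": np.array([0.8, 0.2, 0.1]),
    "tiger": np.array([0.1, 0.9, 0.5]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(projected_visual):
    """Pick the class whose semantic prototype is nearest by cosine similarity."""
    return max(class_embeddings, key=lambda c: cosine(projected_visual, class_embeddings[c]))

# A visual feature assumed to be already mapped into the shared space.
x = np.array([0.85, 0.15, 0.35])
print(classify(x))  # → zebra
```

Unseen classes are handled by simply adding their semantic embeddings to the dictionary; no visual training data for them is required.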

Generalized Zero-Shot Learning:
- Seen + unseen classes: both seen and unseen classes available at test time; more realistic and challenging
- Bias toward seen classes: seen classes have more training data; models biased toward seen class predictions
- Hubness problem: in high-dimensional embedding spaces a few "hub" points become nearest neighbors of many test samples; these hubs are often seen-class embeddings, so seen classes dominate predictions
- Balancing mechanism: bias correction or calibration methods balance predictions toward seen/unseen classes
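A common calibration method is calibrated stacking: subtract a fixed constant from seen-class scores before taking the argmax, trading seen-class accuracy for unseen-class recall. The scores and the calibration constant below are illustrative.

```python
import numpy as np

def gzsl_predict(scores, seen_mask, gamma=0.3):
    """Calibrated stacking: penalize seen-class scores by gamma before argmax."""
    adjusted = scores - gamma * seen_mask
    return int(np.argmax(adjusted))

scores = np.array([0.6, 0.5, 0.45])   # classes 0, 1 seen; class 2 unseen
seen_mask = np.array([1.0, 1.0, 0.0])
print(gzsl_predict(scores, seen_mask))  # → 2 (unseen class wins after calibration)
```

With `gamma = 0` the biased seen-class score (0.6) would win; gamma is typically tuned on a validation split to balance the harmonic mean of seen and unseen accuracy.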

Few-Shot Learning Evaluation Protocol:
- N-way K-shot: evaluate on N classes with K examples per class; standard benchmark configurations (e.g., 5-way 1-shot, 5-way 5-shot)
- Episode evaluation: sample random tasks (N-way K-shot episodes); evaluate across many episodes and average
- Meta-test performance: only new classes at test time; models not trained on test classes; true transfer capability
- Benchmark datasets: miniImageNet (100 classes, 600 images/class), Omniglot (1623 classes, 20 images/class)
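Episode sampling is mechanical: draw N classes, then draw K support and Q query examples per class from disjoint pools. A minimal sampler, using integer ids as stand-ins for images:

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries):
    """Sample an N-way K-shot episode: disjoint support and query sets."""
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):  # relabel classes 0..N-1 per episode
        examples = random.sample(dataset[cls], k_shot + q_queries)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query

# Toy dataset: class name -> list of "images" (here, just example ids).
data = {f"class{i}": list(range(20)) for i in range(10)}
s, q = sample_episode(data, n_way=5, k_shot=5, q_queries=3)
print(len(s), len(q))  # → 25 15
```

Benchmark accuracy is then the mean query accuracy over many such episodes, usually reported with a 95% confidence interval.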

Prototypical Networks:
- Embedding-based meta-learning: learn embedding where same-class examples cluster; prototype = class mean
- Few-shot inference: new class prototype computed from K support examples; query classified by nearest prototype
- Metric learning: episodic training encourages compact clusters for same class; separated clusters for different classes
- Simplicity: straightforward approach; competitive with more complex meta-learning methods
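The prototype computation and nearest-prototype rule fit in a few lines. The 2-D "embeddings" below are illustrative; in practice they come from a learned encoder network.

```python
import numpy as np

def prototypes(support_embeds, support_labels, n_way):
    """Class prototype = mean of that class's support embeddings."""
    return np.stack([support_embeds[support_labels == c].mean(axis=0)
                     for c in range(n_way)])

def classify(query_embed, protos):
    """Assign the query to the nearest prototype (squared Euclidean distance)."""
    d = ((protos - query_embed) ** 2).sum(axis=1)
    return int(np.argmin(d))

# Toy 2-way 2-shot episode in an assumed embedding space.
emb = np.array([[0.0, 0.1], [0.1, 0.0],    # class 0 support
                [1.0, 0.9], [0.9, 1.1]])   # class 1 support
labels = np.array([0, 0, 1, 1])
protos = prototypes(emb, labels, n_way=2)
print(classify(np.array([0.05, 0.05]), protos))  # → 0
```

During episodic training, the negative distances act as logits for a softmax cross-entropy loss, which shapes the embedding so same-class points cluster tightly.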

Matching Networks:
- Attention-based matching: soft matching between query and support set; learned similarity function
- External memory: support set stored; matching network attends to relevant support examples
- Episodic training: simulates few-shot task at training time; trains model to match on small support sets
- Full-context embeddings: an LSTM with attention processes the support set so each embedding is conditioned on the whole set; the model learns which support examples are most relevant for each query
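The core matching step can be sketched without the LSTM machinery: attention weights are a softmax over similarities between the query and each support example, and the prediction is the attention-weighted vote over support labels. Embeddings here are toy values, and cosine similarity stands in for the learned similarity function.

```python
import numpy as np

def matching_predict(query, support_x, support_y, n_classes):
    """Attention-weighted vote: softmax over cosine similarities to support examples."""
    sims = support_x @ query / (np.linalg.norm(support_x, axis=1) * np.linalg.norm(query))
    attn = np.exp(sims) / np.exp(sims).sum()       # soft attention over the support set
    onehot = np.eye(n_classes)[support_y]
    return (attn[:, None] * onehot).sum(axis=0)    # per-class probability estimate

support_x = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])  # toy support embeddings
support_y = np.array([0, 0, 1])
probs = matching_predict(np.array([1.0, 0.05]), support_x, support_y, 2)
print(int(np.argmax(probs)))  # → 0
```

Because the prediction is a differentiable function of the embeddings, the encoder can be trained end to end on sampled episodes.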

Model-Agnostic Meta-Learning (MAML):
- Optimization-based meta-learning: learn initialization enabling rapid few-shot adaptation with few gradient steps
- Inner loop: update parameters on support set (few examples) via gradient descent
- Outer loop: meta-update initialization based on query set performance; learn better initialization
- Few-shot adaptation: after MAML pretraining, just 1-5 gradient steps on support set; excellent performance
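The inner/outer structure can be demonstrated with first-order MAML on a toy 1-D regression family. Everything below (the linear model f(x) = w·x, the task distribution of slopes in [2, 4], and both learning rates) is illustrative; real MAML backpropagates through the inner step on a neural network.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.1, 0.01  # inner- and outer-loop learning rates (illustrative)

def loss_grad(w, xs, ys):
    """Gradient of MSE for the toy linear model f(x) = w * x."""
    return 2 * np.mean((w * xs - ys) * xs)

w0 = 0.0  # the meta-learned initialization
for _ in range(2000):
    slope = rng.uniform(2.0, 4.0)             # sample a task: y = slope * x
    xs = rng.uniform(-1, 1, 10)
    ys = slope * xs
    # Inner loop: one gradient step on the task's support half.
    w_adapted = w0 - alpha * loss_grad(w0, xs[:5], ys[:5])
    # Outer loop (first-order MAML): meta-update using the query half.
    w0 -= beta * loss_grad(w_adapted, xs[5:], ys[5:])

print(round(w0, 1))  # settles near the middle of the task distribution (~3)
```

The learned initialization sits where one gradient step can reach any task in the family quickly, which is exactly the property the outer loop optimizes for.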

In-Context Learning as Implicit Few-Shot:
- Large language models: few-shot learning without parameter updates; examples in prompt condition predictions
- Implicit learning: models learn through pretraining to adapt to examples; weights frozen at test time
- Tokenization advantage: text examples easily incorporated; enables flexible few-shot prompting
- Scaling: larger models show better few-shot learning; implicit few-shot emerges from scale

Challenges in Zero-Shot and Few-Shot Learning:
- Semantic gap: visual features and semantic embeddings from different modalities; bridging gap challenging
- Attribute sparsity: limited attributes may not capture distinguishing characteristics; rich attribute sets labor-intensive
- Domain shift: attributes/embeddings trained on source domain; transfer to target domain challenging
- Imbalanced data: few examples limit training; high variance in few-shot learning; uncertainty quantification needed

Zero-shot and few-shot learning leverage semantic embeddings and small example sets — enabling transfer to novel classes without requiring large labeled datasets, critical for real-world applications with evolving class sets.
