Technique

Keywords: saliency maps, AI safety

Saliency maps highlight which input tokens most influence a model's output, using gradient-based attribution.

Technique: Compute the gradient of the output with respect to the input embeddings; the gradient's magnitude indicates importance (a high gradient means a small input change causes a large output change).

Methods (each sketched in code below):
- Simple gradient (vanilla)
- Gradient × Input (element-wise product of gradient and input)
- Integrated Gradients (gradients accumulated along a path from a baseline to the input)
- SmoothGrad (gradients averaged over noisy copies of the input)

Interpretation: High-saliency tokens are important to the prediction, but their influence can be positive or negative.

Advantages: Model-agnostic within differentiable models; no additional training; fast to compute.

Limitations:
- Gradient saturation: a low gradient does not mean a token is unimportant.
- Faithfulness: attributions may not reflect the model's actual reasoning.
- Baseline dependence: Integrated Gradients requires choosing a baseline.

For NLP: Apply saliency in embedding space, then aggregate across the embedding dimensions to get one score per token.

Tools: Captum (PyTorch), TensorFlow explainability tooling, or custom gradient computation.

Visualization: Highlight tokens by saliency score, using color intensity.

Comparison to attention: Saliency is attribution (which inputs matter); attention is a mechanism (how information flows). Saliency is a useful diagnostic, but interpret it cautiously.
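Below is a minimal PyTorch sketch of the first two methods, computed in embedding space and aggregated per token. The toy classifier, dimensions, and variable names are illustrative assumptions, not from the original text or any particular library.

```python
import torch
import torch.nn as nn

# Toy setup: everything here (vocab size, dims, the classifier) is illustrative.
torch.manual_seed(0)
vocab_size, hidden_dim, num_classes, seq_len = 100, 16, 2, 5

embedding = nn.Embedding(vocab_size, hidden_dim)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * hidden_dim, num_classes))

token_ids = torch.randint(0, vocab_size, (1, seq_len))

# Detach from the embedding table and track gradients on the embeddings
# themselves, so d(output)/d(embeddings) is available.
emb = embedding(token_ids).detach().requires_grad_(True)

logits = classifier(emb)
target_class = logits.argmax(dim=-1).item()
score = logits[0, target_class]                     # scalar output to attribute

grad, = torch.autograd.grad(score, emb)             # (1, seq_len, hidden_dim)

# Aggregate across the embedding dimension: one score per token.
vanilla = grad.norm(dim=-1).squeeze(0)              # simple gradient (L2 norm)
grad_x_input = (grad * emb).sum(dim=-1).squeeze(0)  # Gradient x Input

print("vanilla:      ", vanilla.tolist())
print("grad x input: ", grad_x_input.tolist())
```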
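Integrated Gradients is available off the shelf in Captum. The sketch below reuses the same toy setup; the all-zero embedding baseline is one common choice, and swapping it (e.g., for a pad-token embedding) can change the attributions, which is the baseline dependence noted above.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

torch.manual_seed(0)
vocab_size, hidden_dim, num_classes, seq_len = 100, 16, 2, 5
embedding = nn.Embedding(vocab_size, hidden_dim)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(seq_len * hidden_dim, num_classes))

token_ids = torch.randint(0, vocab_size, (1, seq_len))
emb = embedding(token_ids)

# Attribute in embedding space: the forward function maps embeddings to logits.
ig = IntegratedGradients(classifier)

# Baseline choice matters: all-zero embeddings are a common default.
baseline = torch.zeros_like(emb)

attributions = ig.attribute(emb, baselines=baseline, target=0, n_steps=50)

# Sum over the embedding dimension for one signed attribution per token.
token_attr = attributions.sum(dim=-1).squeeze(0)
print("IG attributions:", token_attr.tolist())
```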
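SmoothGrad needs no special tooling: average gradients over several Gaussian-noised copies of the input. The helper below is a hypothetical sketch; the n_samples and noise_std defaults are illustrative.

```python
import torch

def smoothgrad(forward_fn, emb, target_class, n_samples=25, noise_std=0.1):
    """Average gradients over Gaussian-noised copies of the embeddings.

    forward_fn: callable mapping (1, seq_len, hidden_dim) embeddings to logits.
    Returns one saliency score per token (L2 norm over embedding dims).
    """
    grads = torch.zeros_like(emb)
    for _ in range(n_samples):
        noisy = (emb.detach() + noise_std * torch.randn_like(emb)).requires_grad_(True)
        score = forward_fn(noisy)[0, target_class]
        grad, = torch.autograd.grad(score, noisy)
        grads += grad
    return (grads / n_samples).norm(dim=-1).squeeze(0)

# Usage with the toy classifier from the first sketch:
# scores = smoothgrad(classifier, emb, target_class)
```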
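For visualization, one simple approach is to normalize scores to [0, 1] and map them to color intensity. The helper below emits HTML spans and is a toy sketch; Captum also ships richer text-visualization utilities.

```python
def highlight_html(tokens, scores):
    """Map normalized saliency scores to background color intensity."""
    s = scores - scores.min()
    s = s / (s.max() + 1e-8)
    spans = (
        f'<span style="background: rgba(255,0,0,{w:.2f})">{t}</span>'
        for t, w in zip(tokens, s.tolist())
    )
    return " ".join(spans)

# html = highlight_html(["the", "movie", "was", "great", "!"], vanilla)
```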
