Input × Gradient is an attribution method for neural network explainability. It computes a feature importance score for each input element by multiplying that element's value by the gradient of the model output with respect to it, yielding an attribution map from a single backward pass that identifies which input elements most influenced a specific prediction. The product combines the magnitude of each feature (how much of it is present) with the model's local sensitivity to it (how much the output changes per unit change in that feature), making Input × Gradient the computationally efficient baseline for feature-level explainability in deep learning.
Core Formula and Intuition
For a model f with input x and scalar output S (typically a class score or log probability):
Attribution_i = x_i × (∂S / ∂x_i)
The gradient ∂S/∂x_i measures the local rate of change — how sensitive the output is to infinitesimal perturbations of feature i. Multiplying by x_i itself weights this sensitivity by the feature's actual value in the input.
Intuitive decomposition:
- Large |x_i|, large |∂S/∂x_i|: Feature is present AND the model is sensitive to it → HIGH importance
- Large |x_i|, small |∂S/∂x_i|: Feature is present but model ignores it → LOW importance
- Small |x_i|, large |∂S/∂x_i|: Model is sensitive to this feature but it's near-absent → LOW importance (correctly)
- Small |x_i|, small |∂S/∂x_i|: Feature absent and model insensitive → LOW importance
This captures the notion that importance requires BOTH presence AND relevance — unlike pure gradient attribution (∂S/∂x_i), which can assign high importance to features near zero where the gradient happens to be large.
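In a framework with automatic differentiation, the method is one element-wise product on top of a standard backward pass. A minimal PyTorch sketch (the helper name and the assumption that `model` returns class logits are illustrative, not from the original text):

```python
import torch

def input_x_gradient(model, x, target_class):
    """x_i · ∂S/∂x_i, where S is the score of one target class."""
    x = x.detach().clone().requires_grad_(True)  # track gradients w.r.t. the input
    score = model(x)[..., target_class].sum()    # scalar output S
    grad = torch.autograd.grad(score, x)[0]      # ∂S/∂x_i
    return x.detach() * grad                     # element-wise product

# e.g. attr = input_x_gradient(cnn, images, target_class=3)
```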
Relationship to Other Attribution Methods
| Method | Formula | Key Property |
|--------|---------|-------------|
| Gradient (Saliency) | ∂S/∂x_i | Ignores feature magnitude; can highlight near-zero features |
| Input × Gradient | x_i · ∂S/∂x_i | First-order Taylor term; weights sensitivity by feature value |
| Integrated Gradients | x_i · ∫₀¹ ∂S(αx)/∂x_i dα (zero baseline) | Satisfies completeness and sensitivity axioms |
| SHAP (DeepSHAP) | Shapley-weighted average of marginal contributions | Game-theoretic, locally linear approximation |
| GradCAM | ReLU(Σ_k α_k A^k), α_k = globally pooled ∂S/∂A^k | Spatial; uses activations, not inputs |
| SmoothGrad | Average Input×Grad over noisy input copies | Noise reduction, sharper attributions |
Input × Gradient is the first-order Taylor approximation of the difference in model output between input x and a baseline of 0:
f(x) - f(0) ≈ Σᵢ x_i · (∂f/∂x_i evaluated at x)
This connection reveals the method's theoretical limitation: the Taylor approximation is accurate only locally (near x), and f(0) may not be a meaningful baseline for all inputs.
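A quick numerical check makes the gap concrete. The sketch below uses a small, randomly initialized MLP (purely illustrative); on a ReLU network with biases, the sum of attributions tracks f(x) - f(0) only approximately:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy network, used only to probe the first-order approximation.
f = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

x = torch.randn(1, 8, requires_grad=True)
fx = f(x)
grad = torch.autograd.grad(fx.sum(), x)[0]

attr_sum = (x * grad).sum().item()            # Σᵢ x_i · ∂f/∂x_i evaluated at x
exact = (fx - f(torch.zeros_like(x))).item()  # f(x) - f(0)
print(attr_sum, exact)  # similar magnitude, but generally not equal
```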
Completeness and the Sensitivity Axiom
The Integrated Gradients paper (Sundararajan et al., 2017) shows that Input × Gradient violates the completeness axiom: the sum of attribution scores does not necessarily equal f(x) - f(baseline).
Input × Gradient also violates sensitivity: a feature on which the output genuinely depends (relative to the baseline) can still receive zero attribution, because the gradient is evaluated only at x, where it may vanish even though changing the feature changes the output.
Despite these theoretical violations, Input × Gradient produces practically useful attributions for many tasks — the theoretical limitations manifest mainly in saturated regions of the network (post-ReLU dead neurons, high-confidence sigmoid outputs).
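The sensitivity violation has a standard one-dimensional counterexample from Sundararajan et al. (2017): f(x) = 1 - ReLU(1 - x). For x > 1 the function is flat, so the gradient (and hence the attribution) is zero, even though f(x) - f(0) = 1. A sketch reproducing it:

```python
import torch

def f(x):
    # 1 - ReLU(1 - x): saturates (gradient 0) for x > 1
    return 1 - torch.relu(1 - x)

x = torch.tensor(2.0, requires_grad=True)
f(x).backward()

print(f(x).item() - f(torch.tensor(0.0)).item())  # 1.0 -> output depends on x
print((x * x.grad).item())                        # 0.0 -> attribution misses it
```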
Gradient Saturation Problem
For ReLU networks, neurons become inactive (output = 0, gradient = 0) when their input is negative. In deep networks, many neurons may be simultaneously inactive for a given input, causing gradients to propagate through only a sparse subset of pathways. The resulting attribution map can be noisy or assign zero to clearly important features.
SmoothGrad addresses this by averaging Input × Gradient over n noisy copies:
Attribution_i^{SG} = (1/n) Σⱼ x_i · ∂S(x + ε_j)/∂x_i, where ε_j ~ N(0, σ²)
The averaging smooths out noise while preserving signal, producing sharper, more visually coherent attribution maps.
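A direct transcription of this formula into PyTorch might look like the following (the function name and the defaults n=25, σ=0.1 are illustrative assumptions, not canonical values):

```python
import torch

def smoothgrad_input_x_grad(model, x, target_class, n=25, sigma=0.1):
    """SmoothGrad-averaged Input × Gradient, per the formula above."""
    x = x.detach()
    total = torch.zeros_like(x)
    for _ in range(n):
        # ε_j ~ N(0, σ²): perturb the input, take the gradient at the noisy point
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[..., target_class].sum()
        grad = torch.autograd.grad(score, noisy)[0]
        total += x * grad  # clean x_i times the noisy gradient
    return total / n
```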
Computational Properties
- Cost: Exactly one forward + one backward pass — same cost as computing the training gradient
- Batch-compatible: Attributions for all examples in a batch computed simultaneously
- Architecture-agnostic: Works for any differentiable model to which gradient access is available (CNNs, transformers, MLPs, RNNs)
- Output-dependent: Separately computed for each output class (or neuron) of interest; the sketch below handles per-example target classes in a single batched pass
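Both the batching and the per-class behavior fall out of standard autograd mechanics. A batched sketch (shapes and names are illustrative assumptions) that attributes a possibly different target class per example in one forward/backward pass:

```python
import torch

def batched_input_x_grad(model, x, target_classes):
    """Input × Gradient for a whole batch at once.

    x: (B, ...) input batch; target_classes: (B,) long tensor of class ids.
    """
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)  # (B, num_classes)
    # Per-example gradients are independent, so summing the selected
    # class scores lets one backward pass serve the entire batch.
    score = logits.gather(1, target_classes.unsqueeze(1)).sum()
    grad = torch.autograd.grad(score, x)[0]
    return x.detach() * grad
```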
Input × Gradient serves as the standard sanity-check baseline in explainability research — a new attribution method that cannot outperform Input × Gradient on a given task is generally considered not worth the added complexity.