Input × Gradient

Input × Gradient is an attribution method for neural network explainability that scores each input feature by multiplying it element-wise by the gradient of the model output with respect to that feature. The result is an attribution map, computed with a single backward pass, that indicates which input elements most influenced a specific prediction. It combines the magnitude of each feature (how strongly it is present in the input) with the model's local sensitivity (how much the output changes per unit change in that feature), and it serves as a computationally efficient baseline for feature-level explainability in deep learning.

Core Formula and Intuition

For a model f with input x and scalar output S (typically a class score or log probability):

Attribution_i = x_i × (∂S / ∂x_i)

The gradient ∂S/∂x_i measures the local rate of change — how sensitive the output is to infinitesimal perturbations of feature i. Multiplying by x_i itself weights this sensitivity by the feature's actual value in the input.

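Intuitive decomposition:

x_i: how strongly feature i is present in the input (magnitude)
∂S/∂x_i: how much the output changes per unit change in feature i (sensitivity)

This captures the notion that importance requires both presence and relevance, unlike pure gradient attribution (∂S/∂x_i), which can assign high importance to a feature whose value is near zero simply because the gradient there happens to be large.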
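As a minimal sketch of the formula, the example below computes Input × Gradient for a hypothetical three-feature linear score. The function names and the toy score are assumptions made for illustration. The gradient is approximated with central finite differences only to keep the example dependency-free; an autodiff framework would normally obtain it in one backward pass.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def input_x_gradient(f, x):
    """Attribution_i = x_i * dS/dx_i."""
    grad = numerical_gradient(f, x)
    return [xi * gi for xi, gi in zip(x, grad)]


def score(x):
    # Hypothetical toy score, not taken from any real model.
    return 1.0 * x[0] - 5.0 * x[1] + 0.5 * x[2]


x = [2.0, 0.0, 4.0]
gradient = numerical_gradient(score, x)      # roughly [1, -5, 0.5]
attributions = input_x_gradient(score, x)    # roughly [2, 0, 2]
```

Feature 1 has the largest gradient magnitude, so a pure gradient map would rank it first, but its value is zero and Input × Gradient assigns it no importance.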

Relationship to Other Attribution Methods

| Method | Formula | Key Property |
| --- | --- | --- |
| Gradient (Saliency) | ∂S/∂x_i | Pure local sensitivity; ignores feature magnitude and vanishes where the network saturates |
| Input × Gradient | x_i · ∂S/∂x_i | Weights sensitivity by feature value; first-order Taylor term |
| Integrated Gradients | x_i · ∫₀¹ (∂S/∂x_i)(αx) dα, for a zero baseline | Path integral from a baseline; satisfies completeness |
| SHAP (DeepSHAP) | Shapley-weighted average of marginal contributions | Game-theoretic, locally linear approximation |
| GradCAM | ReLU(Σ_k α_k A_k), with α_k the spatial average of ∂S/∂A_k | Spatial, uses activations not inputs |
| SmoothGrad | Average of a gradient-based map over noisy input copies | Noise reduction, sharper attributions |

Input × Gradient is the first-order Taylor approximation of the difference in model output between input x and a baseline of 0:

f(x) - f(0) ≈ Σᵢ x_i · (∂f/∂x_i evaluated at x)

This connection reveals the method's theoretical limitation: the Taylor approximation is accurate only locally (near x), and f(0) may not be a meaningful baseline for all inputs.
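To make the locality of the approximation concrete, the sketch below compares f(x) - f(0) with the sum of Input × Gradient attributions for a hypothetical model f(x) = tanh(w · x). The weights and the two test inputs are assumptions chosen so that one input lies near the origin and the other lies in the saturated part of tanh.

```python
import math

W = [0.5, -1.0, 2.0]  # hypothetical weights, for illustration only


def f(x):
    return math.tanh(sum(w * xi for w, xi in zip(W, x)))


def grad_f(x):
    # Analytic gradient of tanh(w . x): (1 - tanh^2(w . x)) * w_i
    z = sum(w * xi for w, xi in zip(W, x))
    slope = 1.0 - math.tanh(z) ** 2
    return [slope * w for w in W]


def taylor_gap(x):
    """(f(x) - f(0)) minus the sum of Input x Gradient attributions."""
    attributions = [xi * gi for xi, gi in zip(x, grad_f(x))]
    exact = f(x) - f([0.0] * len(x))
    return exact - sum(attributions)


gap_near = taylor_gap([0.1, 0.05, 0.05])  # w . x = 0.1, near-linear regime
gap_far = taylor_gap([1.0, 0.5, 1.5])     # w . x = 3.0, saturated regime
```

Under these assumptions the gap is close to zero for the first input and close to the full output difference for the second, because the gradient at a saturated point says little about how the output changed on the way from 0 to x.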

Completeness and the Sensitivity Axiom

Under the axioms introduced with Integrated Gradients (Sundararajan et al., 2017), Input × Gradient does not satisfy completeness: the sum of its attribution scores does not necessarily equal f(x) - f(baseline).

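Input × Gradient can also violate sensitivity, the requirement that a feature receive non-zero attribution whenever changing it from its baseline value to its actual value changes the output. Because the gradient is evaluated only at x and not along the way from the baseline, a feature that sits in a flat (saturated) region of f gets zero gradient, and hence zero attribution, even though the output depends on it.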
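A small counter-example illustrates both violations. The sketch below uses a hypothetical one-feature model f(x) = 1 - ReLU(1 - x), which rises linearly and then plateaus at 1, and compares Input × Gradient with a Riemann-sum approximation of Integrated Gradients from a zero baseline. The function names and the step count are assumptions for illustration.

```python
def f(x):
    # Hypothetical one-feature model: linear up to x = 1, then flat at 1.
    return 1.0 - max(0.0, 1.0 - x)


def grad_f(x):
    # Analytic derivative: 1 on the rising part, 0 on the plateau.
    return 1.0 if x < 1.0 else 0.0


def input_x_gradient(x):
    return x * grad_f(x)


def integrated_gradients(x, steps=1000):
    # Midpoint Riemann sum of the gradient along the straight path 0 -> x.
    avg_grad = sum(grad_f((k + 0.5) / steps * x) for k in range(steps)) / steps
    return x * avg_grad


x = 2.0
output_change = f(x) - f(0.0)   # the output clearly depends on the feature
ixg = input_x_gradient(x)       # gradient at x = 2 is zero on the plateau
ig = integrated_gradients(x)    # the path integral passes the rising part
```

Here the output changes by 1 between the baseline and x, yet Input × Gradient returns 0, while the path-integrated attribution recovers the full change.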

Despite these theoretical violations, Input × Gradient produces practically useful attributions for many tasks. The limitations show up mainly in saturated regions of the network, such as inactive ReLU units and high-confidence sigmoid outputs.

Gradient Saturation Problem

For ReLU networks, neurons become inactive (output = 0, gradient = 0) when their input is negative. In deep networks, many neurons may be simultaneously inactive for a given input, causing gradients to propagate through only a sparse subset of pathways. The resulting attribution map can be noisy or assign zero to clearly important features.

SmoothGrad reduces this noise by averaging a gradient-based map over n noisy copies of the input. Applied to Input × Gradient:

Attribution_i^{SG} = (1/n) Σⱼ x_i · ∂S(x + ε_j)/∂x_i, where ε_j ~ N(0, σ²)

The averaging smooths out noise while preserving signal, producing sharper, more visually coherent attribution maps.
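The averaging can be sketched in a few lines. The example below uses a hypothetical two-feature score S = ReLU(x_0 - x_1) with an analytic gradient, evaluated at an input where the ReLU is just barely inactive. The sample count n, the noise level sigma and the fixed seed are assumptions for illustration, not recommended settings.

```python
import random


def grad_score(x):
    # Hypothetical toy score S = relu(x[0] - x[1]); analytic gradient.
    active = x[0] - x[1] > 0.0
    return [1.0, -1.0] if active else [0.0, 0.0]


def input_x_gradient(x):
    return [xi * gi for xi, gi in zip(x, grad_score(x))]


def smoothgrad_input_x_gradient(x, n=500, sigma=0.2, seed=0):
    """Average x_i * dS(x + eps_j)/dx_i over n Gaussian-perturbed inputs."""
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        grad = grad_score(noisy)          # gradient at the noisy point
        for i, gi in enumerate(grad):
            totals[i] += x[i] * gi        # weighted by the clean input value
    return [t / n for t in totals]


x = [1.0, 1.05]                            # ReLU is inactive at this point
plain = input_x_gradient(x)                # both attributions are zero
smooth = smoothgrad_input_x_gradient(x)    # non-zero after averaging
```

At this input the plain attribution is exactly zero because the unit is inactive, whereas the averaged version picks up the nearby region where the unit is active and assigns both features a non-zero score.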

Computational Properties

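Input × Gradient needs only a single backward pass per explained output, the same cost as a plain gradient map. This low cost is why it serves as a standard sanity-check baseline in explainability research: a new attribution method that cannot outperform Input × Gradient on a given task is generally considered not worth the added complexity.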
