Integrated Gradients (IG) is an axiomatic attribution method that explains a neural network's prediction by integrating gradients along a path from a baseline input to the actual input. It satisfies provable mathematical properties (sensitivity and implementation invariance) that simpler gradient methods violate, which makes it a widely used choice for feature attribution in high-stakes applications.

What Are Integrated Gradients?

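For a model F, the attribution assigned to feature i is:

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha$$

where x' is the baseline, x is the actual input, and α ∈ [0, 1] parameterizes the straight-line interpolation path from x' to x.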
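As a minimal sketch of how the path integral can be approximated in practice, the snippet below uses a midpoint Riemann sum in plain Python. The toy model `F(x) = x0 * x1`, its hand-written gradient `grad_F`, and the helper name `integrated_gradients` are assumptions chosen for illustration, not part of any library.

```python
def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Approximate IG with a midpoint Riemann sum over the straight-line path."""
    dims = len(x)
    avg_grad = [0.0] * dims
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(dims):
            avg_grad[i] += grad[i] / n_steps
    # Scale the path-averaged gradient by (input - baseline), feature by feature
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Hypothetical toy model F(x) = x0 * x1 with its analytic gradient
def F(x):
    return x[0] * x[1]


def grad_F(x):
    return [x[1], x[0]]


x, baseline = [2.0, 3.0], [0.0, 0.0]
attributions = integrated_gradients(grad_F, x, baseline)

# Completeness check: attributions should sum to F(x) - F(baseline)
completeness_gap = abs(sum(attributions) - (F(x) - F(baseline)))
```

For this toy model each feature receives an attribution of about 3, and the two attributions add up to F(x) - F(x') = 6, which is what the completeness property predicts.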

Why Integrated Gradients Matters

The Baseline Choice

The baseline x' is the "neutral" input from which attribution is measured:

| Modality | Common Baseline | Rationale |
| --- | --- | --- |
| Images | Black image (zeros) | No visual information |
| Text (embeddings) | Zero embedding vector | No semantic content |
| Text (tokens) | Padding token [PAD] | Empty/absent input |
| Tabular | Feature means | Average input |
| Audio | Silence (zeros) | No signal |

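Baseline choice affects attributions significantly, because different baselines answer different questions. A feature-mean baseline asks how the input differs from an average case, while a zero baseline asks how it differs from the absence of any signal.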
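To make the effect of the baseline concrete, here is a small sketch that attributes the same input against two baselines. The toy model `F(x) = x0 * x1`, the "feature mean" values `[1.0, 1.0]`, and the helper `ig` are all hypothetical and chosen only so the arithmetic is easy to follow.

```python
def ig(grad_fn, x, baseline, n_steps=50):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / n_steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# Hypothetical toy model F(x) = x0 * x1 and its analytic gradient
def grad_F(x):
    return [x[1], x[0]]


x = [2.0, 3.0]
zero_baseline = [0.0, 0.0]   # "absence of signal"
mean_baseline = [1.0, 1.0]   # assumed feature means, for illustration only

attr_vs_zero = ig(grad_F, x, zero_baseline)
attr_vs_mean = ig(grad_F, x, mean_baseline)
```

Against the zero baseline both features get roughly equal credit (about 3 each); against the assumed mean baseline the first feature's share drops to about 2 while the second stays near 3. Neither answer is wrong: each explains the difference between F(x) and F evaluated at its own baseline.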

Computing Integrated Gradients

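import torch

def integrated_gradients(model, input_x, baseline_x, n_steps=300):
    # Create interpolated inputs along the straight-line path.
    # alphas is reshaped to (n_steps, 1, ..., 1) so it broadcasts over inputs of any shape.
    alphas = torch.linspace(0, 1, n_steps, device=input_x.device, dtype=input_x.dtype)
    alphas = alphas.view(-1, *([1] * input_x.dim()))
    delta = (input_x - baseline_x).detach()
    interpolated = baseline_x.detach() + alphas * delta

    # Compute the gradient of the model output at each interpolation step
    grads = []
    for interp in interpolated:
        # Use a fresh leaf tensor so each step gets its own gradient
        interp = interp.clone().requires_grad_(True)
        # model is assumed to return a scalar (e.g. the logit of the class being explained)
        output = model(interp)
        # autograd.grad avoids accumulating into the model parameters' .grad fields
        grad, = torch.autograd.grad(output, interp)
        grads.append(grad)

    # Integrate: average gradients (Riemann approximation), scale by (input - baseline)
    avg_grads = torch.stack(grads).mean(dim=0)
    return delta * avg_grads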
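One way to judge whether `n_steps` is large enough is to check how closely the attributions sum to F(x) - F(x'). The sketch below does this in plain Python for a hypothetical toy model `F(x) = tanh(x0 + 2*x1)`, averaging gradients at evenly spaced points that include both endpoints, in the same way as the function above. The model, weights, and helper name `completeness_gap` are illustrative assumptions.

```python
import math


def completeness_gap(n_steps):
    # Hypothetical toy model F(x) = tanh(x0 + 2*x1), baseline (0, 0), input (1, 1)
    x, baseline, weights = [1.0, 1.0], [0.0, 0.0], [1.0, 2.0]

    def F(p):
        return math.tanh(sum(w * v for w, v in zip(weights, p)))

    avg_grad = [0.0, 0.0]
    for k in range(n_steps):
        alpha = k / (n_steps - 1)  # same spacing as torch.linspace(0, 1, n_steps)
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        slope = 1.0 - F(point) ** 2  # derivative of tanh at this point
        for i, w in enumerate(weights):
            avg_grad[i] += slope * w / n_steps
    attributions = [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]
    # How far the attributions are from summing to F(x) - F(baseline)
    return abs(sum(attributions) - (F(x) - F(baseline)))


gaps = {n: completeness_gap(n) for n in (5, 50, 300)}
```

In this toy setting the gap shrinks as the number of steps grows, which is why a few hundred steps is a reasonable starting point. A gap that stays large suggests increasing `n_steps` or switching to a more accurate quadrature rule.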

Applications

Integrated Gradients vs. Other Attribution Methods

| Method | Sensitivity Axiom | Completeness | Baseline Required | Speed |
| --- | --- | --- | --- | --- |
| Vanilla Gradient | Fails | No | No | Very fast |
| Gradient × Input | Partial | No | No | Very fast |
| Guided Backprop | Fails (faithless) | No | No | Fast |
| Integrated Gradients | Yes | Yes | Yes | Moderate |
| SHAP (KernelSHAP) | Yes | Yes | Yes | Slow |
| SHAP (GradientSHAP) | Approximate | Approximate | Yes | Moderate |
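The sensitivity failure of plain gradients can be seen on a one-dimensional saturating function. The sketch below uses f(x) = 1 - ReLU(1 - x) with baseline 0 and input 2, a commonly cited illustration; the variable names and the midpoint-rule integration are assumptions made for this example.

```python
def f(x):
    # f(x) = 1 - ReLU(1 - x): rises linearly up to x = 1, then stays flat at 1
    return 1.0 - max(0.0, 1.0 - x)


def grad_f(x):
    # Slope is 1 on the rising part and 0 once the function has saturated
    return 1.0 if x < 1.0 else 0.0


x, baseline, n_steps = 2.0, 0.0, 100

vanilla_gradient = grad_f(x)          # local slope at the input
gradient_times_input = grad_f(x) * x  # still zero in the saturated region

# IG with a midpoint Riemann sum along the path from baseline to x
avg_grad = sum(
    grad_f(baseline + (k + 0.5) / n_steps * (x - baseline))
    for k in range(n_steps)
) / n_steps
integrated_gradient = (x - baseline) * avg_grad
```

The output changes from f(0) = 0 to f(2) = 1, yet the local gradient at x = 2 is zero, so the gradient-based scores assign no credit to the feature in this case. Integrating along the path picks up the region where the function was still rising and recovers the full change in output.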

Integrated Gradients offers the mathematical guarantees that high-stakes applications require. Because its attributions satisfy sensitivity, implementation invariance, and completeness, they are tied to the model's actual computation rather than being plausible-but-arbitrary post-hoc stories. This gives IG a rigorous explanatory foundation for the trusted deployment of neural networks in medicine, law, and finance.
