Adversarial Training

Adversarial Training is a defense strategy that improves neural network robustness by training on adversarially perturbed examples. It is framed as a min-max optimization problem. The inner maximization searches for the perturbation that maximizes the loss within an allowed budget, and the outer minimization updates the model so that it classifies those perturbed inputs correctly. It is widely regarded as the most reliable empirical defense against adversarial examples. The price is significant training overhead and reduced accuracy on clean inputs.
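Written out, the min-max objective described above is commonly stated as follows for an L∞ threat model with budget ε. The symbols are notation introduced here for illustration:

- f_θ is the model with parameters θ.
- 𝓛 is the training loss.
- 𝒟 is the data distribution.
- δ is the perturbation.

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\|\delta\|_\infty \le \epsilon}
  \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \Big]
```

The inner max corresponds to the attack step and the outer min to the training step.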

What Is Adversarial Training?

Why Adversarial Training Matters

Training Procedure

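Standard Adversarial Training (PGD-AT): For each training batch (x, y):

1. Inner Maximization (Attack Step): start from a random point in the ε-ball around x. Take K projected gradient ascent steps on the loss, x′ ← Π_{B_ε(x)}(x′ + α · sign(∇_{x′} L(f_θ(x′), y))), to obtain adversarial examples x′.

2. Outer Minimization (Training Step): update the model parameters on the adversarial batch with learning rate η, θ ← θ − η · ∇_θ L(f_θ(x′), y).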

Typical hyperparameters:

- K = 7 to 20 PGD steps.
- Step size α, e.g. 2/255 when ε = 8/255.
- L∞ budget ε = 8/255 on CIFAR-10 or ε = 4/255 on ImageNet.
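The two-step procedure can be sketched without any framework. This is a minimal sketch under a few assumptions:

- The model is a binary logistic classifier, so input gradients have the closed form (p − y)·w.
- The batch size is 1.
- Inputs are not clipped to a valid range.

The function names (`pgd_attack`, `pgd_adversarial_training`) and the default hyperparameters are hypothetical and illustrative only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(w, b, x, y, eps, alpha, steps, rng):
    """Inner maximization: K steps of L-inf PGD on the logistic loss (y in {0, 1})."""
    # Random start inside the eps-ball around x.
    x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
    for _ in range(steps):
        p = sigmoid(logit(w, b, x_adv))
        # Gradient of binary cross-entropy w.r.t. the input is (p - y) * w.
        grad = [(p - y) * wi for wi in w]
        # Signed-gradient ascent step, then projection back onto the eps-ball.
        x_adv = [min(max(xa + alpha * sign(g), xi - eps), xi + eps)
                 for xa, g, xi in zip(x_adv, grad, x)]
    return x_adv


def pgd_adversarial_training(data, eps=0.3, alpha=0.1, steps=7,
                             lr=0.1, epochs=50, seed=0):
    """Hypothetical PGD-AT loop for a logistic model; returns (w, b)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_attack(w, b, x, y, eps, alpha, steps, rng)
            # Outer minimization: SGD step on the loss at the adversarial point.
            p = sigmoid(logit(w, b, x_adv))
            w = [wi - lr * (p - y) * xa for wi, xa in zip(w, x_adv)]
            b -= lr * (p - y)
    return w, b


if __name__ == "__main__":
    toy = [([1.0, 1.0], 1), ([2.0, 1.0], 1), ([-1.0, -1.0], 0), ([-2.0, -1.0], 0)]
    w, b = pgd_adversarial_training(toy)
    print("weights:", w, "bias:", b)
```

Real implementations differ in two ways. They run on mini-batches with automatic differentiation, and they also clip x′ to the valid input range (e.g., [0, 1] for images). The structure of K attack steps followed by one parameter update per batch is the same.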

Variants and Improvements

Method | Key Innovation | Accuracy Cost | Robustness Gain
--- | --- | --- | ---
PGD-AT (Madry) | Multi-step PGD inner attack | High | High
TRADES | Explicitly trades off clean and robust accuracy via a KL regularizer | Medium | High
MART | Emphasizes examples that are misclassified on clean inputs | Medium | High
Fast-AT | Single-step FGSM with random initialization | Low | Moderate
AWP (Adversarial Weight Perturbation) | Adversarially perturbs weights as well as inputs during training | Medium | High
Consistency AT | Label smoothing on adversarial examples | Low | Moderate
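To make the TRADES entry concrete, its objective adds a clean loss to β times the KL divergence between the model's predictions on x and on a perturbed x′. Here x′ is chosen inside the ε-ball to maximize that KL. The sketch below is a minimal illustration of that loss for the same kind of binary logistic model, assuming closed-form gradients and batch size 1. For a logistic model, d KL/dx′ = (q − p)·w. The helper names and defaults (including β = 6.0) are hypothetical and illustrative.

```python
import math
import random

_CLAMP = 1e-12  # keeps log() away from 0


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    return (v > 0) - (v < 0)


def bernoulli_kl(p, q):
    """KL(Bern(p) || Bern(q)), clamped for numerical safety."""
    p = min(max(p, _CLAMP), 1 - _CLAMP)
    q = min(max(q, _CLAMP), 1 - _CLAMP)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def bce(p, y):
    p = min(max(p, _CLAMP), 1 - _CLAMP)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def trades_loss(w, b, x, y, eps=0.3, alpha=0.1, steps=7, beta=6.0, rng=None):
    """Hypothetical TRADES-style loss: clean BCE + beta * max KL(p(x) || p(x'))."""
    rng = rng if rng is not None else random.Random(0)
    p = sigmoid(logit(w, b, x))
    # At x' = x the KL gradient is exactly zero, so start from small Gaussian noise.
    x_adv = [xi + 0.001 * rng.gauss(0.0, 1.0) for xi in x]
    for _ in range(steps):
        q = sigmoid(logit(w, b, x_adv))
        # Gradient of KL(p || q) w.r.t. x' for a logistic model.
        grad = [(q - p) * wi for wi in w]
        x_adv = [min(max(xa + alpha * sign(g), xi - eps), xi + eps)
                 for xa, g, xi in zip(x_adv, grad, x)]
    q = sigmoid(logit(w, b, x_adv))
    # Natural loss on clean input plus beta-weighted robustness regularizer.
    return bce(p, y) + beta * bernoulli_kl(p, q)
```

Setting β = 0 recovers plain training on clean inputs. Larger β pushes the model toward consistent predictions inside the ε-ball, which is the explicit clean/robust trade-off knob the table refers to.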

The Accuracy-Robustness Trade-off

Adversarial training typically reduces accuracy on clean (unperturbed) inputs relative to standard training.
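The trade-off is usually quantified by reporting two numbers side by side: clean accuracy and robust accuracy under a fixed threat model. For a linear classifier sign(w·x + b), the L∞ worst case has a closed form. The perturbation δ = −s·ε·sign(w), where s is the ±1 label, lowers the signed margin by exactly ε‖w‖₁. So both numbers can be computed exactly in a short sketch. For deep networks, robust accuracy is instead estimated with strong attacks such as PGD. The function name below is hypothetical.

```python
def linear_clean_and_robust_accuracy(w, b, data, eps):
    """Return (clean_acc, robust_acc) for sign(w.x + b) under an L-inf budget eps.

    data is a list of (x, y) pairs with y in {0, 1}.
    """
    l1 = sum(abs(wi) for wi in w)
    clean = robust = 0
    for x, y in data:
        s = 1 if y == 1 else -1  # signed label
        margin = s * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        clean += margin > 0
        # The worst-case L-inf perturbation reduces the margin by eps * ||w||_1.
        robust += margin - eps * l1 > 0
    n = len(data)
    return clean / n, robust / n
```

A point can be correctly classified yet sit closer to the boundary than ε‖w‖₁. Such a point counts toward clean accuracy but not robust accuracy, which is why the two numbers diverge.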

Scaling to Large Models

Certified vs. Empirical Robustness

Adversarial training is the empirical robustness standard that has withstood adaptive evaluation. No defense is perfectly unbreakable, but PGD adversarial training remains the most battle-tested method for building neural networks that keep their predictive accuracy under deliberate, worst-case input manipulation.

