The Alignment Tax

The Alignment Tax is the empirical and theoretical phenomenon in which making AI models safer, more aligned, and better at following human preferences can reduce their raw performance on some capability benchmarks. It represents the real and perceived trade-off between capability optimization and value alignment in AI training.
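One simple way to make the definition concrete is to treat the tax on a benchmark as the score of the base model minus the score of the aligned model. The sketch below assumes that reading. The function name `alignment_tax`, the benchmark names, and the scores are hypothetical placeholders, not measured results.

```python
def alignment_tax(base_scores, aligned_scores):
    """Per-benchmark drop in score from the base model to the aligned model.

    Positive values mean the aligned model scored lower (a tax);
    negative values mean alignment training improved the score.
    Only benchmarks present in both dictionaries are compared.
    """
    shared = base_scores.keys() & aligned_scores.keys()
    return {name: base_scores[name] - aligned_scores[name] for name in sorted(shared)}


# Placeholder numbers for illustration only, not measured results.
base = {"benchmark_a": 70.0, "benchmark_b": 55.0}
aligned = {"benchmark_a": 68.5, "benchmark_b": 56.0}

tax = alignment_tax(base, aligned)
for name, drop in tax.items():
    print(f"{name}: {drop:+.1f} points")
```

A per-benchmark view like this keeps regressions and improvements visible side by side, rather than averaging them into one number.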

What Is the Alignment Tax?

Why the Alignment Tax Matters

Where the Alignment Tax Appears

Creative Tasks:

Dual-Use Knowledge:

Benchmark Performance:

Sycophancy Tax:

Evidence Against Large Alignment Tax

Mitigation Approaches

Approach | Mechanism | How It Reduces the Tax
Joint capability + safety training | Train on diverse helpful + safe data | Prevents capability regression
DPO over PPO | More stable, less distributional shift | Reduces capability degradation
High-quality preference data | Better human feedback signal quality | Reduces sycophancy
Larger base models | More capacity for both objectives | Structural reduction
Constitutional AI | Principled safety, not over-refusal | Reduces over-refusal tax
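To illustrate the "DPO over PPO" row, here is a minimal sketch of the Direct Preference Optimization loss for a single preference pair, in its commonly stated form. The page itself gives no formula, so treat this as an assumption about what is meant. The function and parameter names are hypothetical, and the inputs are summed log-probabilities that a real training loop would obtain from the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the total log-probability of a response under
    either the policy being trained or the frozen reference model.
    beta scales how strongly deviations from the reference count.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(x)) written as a numerically stable softplus of -x.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# When the policy equals the reference, the loss is log(2).
neutral = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Raising the chosen response's log-probability lowers the loss.
improved = dpo_loss(-4.0, -7.0, -5.0, -7.0)
```

Because the loss depends only on log-probability differences relative to the reference model, the policy is rewarded for reordering preferences rather than for drifting far from the reference, which is one plausible reading of "less distributional shift" in the table.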

The alignment tax is a real but solvable engineering challenge rather than a fundamental law. As alignment training techniques get better at jointly optimizing capability and safety, the tax is shrinking, which suggests that the dichotomy between capable AI and safe AI is a temporary artifact of early-stage alignment research rather than an inevitable feature of AI development.
