Counterfactual Fairness

Keywords: counterfactual fairness, fairness

Counterfactual Fairness is a causal-reasoning-based fairness criterion requiring that a model's prediction for an individual remain the same in a counterfactual world where their protected attribute (race, gender, age) had been different. It offers a strong individual-level fairness guarantee by asking: "would this person have received the same decision if they had been a different race or gender, with everything causally downstream of the attribute adjusted appropriately?"

What Is Counterfactual Fairness?

- Definition: A prediction Ŷ is counterfactually fair if P(Ŷ_A←a = y | X=x, A=a) = P(Ŷ_A←b = y | X=x, A=a) for every outcome y and every attainable value b — conditioned on the observed evidence, the distribution of the prediction would be identical in the counterfactual world where the individual's protected attribute had been different.
- Core Framework: Uses causal models (structural equation models) to reason about what would change if a protected attribute were different.
- Key Innovation: Goes beyond statistical correlation to causal reasoning about fairness.
- Origin: Kusner et al. (2017), "Counterfactual Fairness," NeurIPS.
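The definition can be sketched with a toy structural causal model. Everything below — the linear structural equations, the coefficients, and the two predictors — is an illustrative assumption for exposition, not the construction from Kusner et al. (2017). The key idea: a predictor that depends only on latent background variables (non-descendants of A) is counterfactually fair, while one that uses a feature causally influenced by A is not.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_x(a, u):
    """Assumed structural equation: feature X depends on the
    protected attribute A and a latent background variable U."""
    return 2.0 * u + 1.5 * a

def predict_unfair(x):
    # Uses X, which carries A's causal influence.
    return 0.8 * x

def predict_fair(u):
    # Uses only U, a non-descendant of A.
    return 1.6 * u

u = rng.normal(size=1000)          # latent background, fixed per individual
a = rng.integers(0, 2, size=1000)  # binary protected attribute

# Counterfactual: flip A while holding the same latent background U.
y_factual = predict_unfair(generate_x(a, u))
y_counter = predict_unfair(generate_x(1 - a, u))
print("unfair predictor, max counterfactual gap:",
      np.max(np.abs(y_factual - y_counter)))  # nonzero: not fair

yf_factual = predict_fair(u)
yf_counter = predict_fair(u)  # U is unchanged by the intervention on A
print("fair predictor, max counterfactual gap:",
      np.max(np.abs(yf_factual - yf_counter)))  # exactly zero
```

Under these assumed equations the unfair predictor's gap is 0.8 × 1.5 = 1.2 for every individual, while the fair predictor's gap is identically zero.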

Why Counterfactual Fairness Matters

- Individual Justice: Evaluates fairness at the individual level, not just across groups.
- Causal Reasoning: Distinguishes between legitimate and illegitimate influences of protected attributes.
- Path-Specific: Can identify which causal pathways from protected attributes to outcomes are fair and which are discriminatory.
- Intuitive Appeal: "Would the decision change if this person were a different race?" is naturally compelling.
- Legal Alignment: Closely matches legal concepts of "but-for" causation in discrimination law.

How Counterfactual Fairness Works

| Step | Action | Purpose |
|------|--------|---------|
| 1. Causal Model | Define causal graph relating attributes, features, and outcomes | Map relationships |
| 2. Identify Paths | Trace causal paths from protected attribute to prediction | Find influence channels |
| 3. Counterfactual | Compute prediction with protected attribute changed | Test fairness |
| 4. Compare | Check if prediction changes across counterfactuals | Measure unfairness |
| 5. Intervene | Modify model to equalize counterfactual predictions | Enforce fairness |

Causal Pathways

- Direct Path: Protected attribute → Prediction (always unfair).
- Indirect Path via Proxy: Protected attribute → ZIP code → Prediction (typically unfair).
- Legitimate Path: Protected attribute → Qualification → Prediction (context-dependent).
- Resolving Path: Protected attribute → Effort → Achievement → Prediction (arguably fair).
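One way to make this taxonomy concrete is to enumerate every causal path from the protected attribute to the prediction in an explicit graph, since path-specific fairness judgments apply per path. The graph below is an illustrative assumption mirroring the examples above, with node names made up for the sketch:

```python
# Assumed causal graph as an adjacency list: A = protected attribute,
# Yhat = prediction; edges are illustrative, not from a real model.
graph = {
    "A": ["Yhat", "ZIP", "Qualification"],
    "ZIP": ["Yhat"],            # proxy path (typically unfair)
    "Qualification": ["Yhat"],  # legitimate path (context-dependent)
}

def all_paths(g, src, dst, path=None):
    """Enumerate all directed paths from src to dst via DFS."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    return [p for nxt in g.get(src, [])
            for p in all_paths(g, nxt, dst, path)]

for p in all_paths(graph, "A", "Yhat"):
    print(" -> ".join(p))
```

Each printed path (direct, via ZIP, via Qualification) would then be classified as blocked or allowed, which is what a path-specific fairness constraint operationalizes.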

Advantages Over Statistical Fairness

- Individual-Level: Evaluates fairness for each person, not just group averages.
- Causal Clarity: Distinguishes legitimate from illegitimate feature influences.
- Handles Proxies: Identifies and addresses proxy discrimination through causal paths.
- Compositional: Can allow some causal paths while blocking others.

Limitations

- Causal Model Required: Requires specifying a causal graph, which may be contested or unknown.
- Counterfactual Identity: "What would this person be like as a different race?" is philosophically complex.
- Computational Cost: Computing counterfactuals through structural equation models is expensive.
- Sensitivity: Results depend heavily on the assumed causal structure.

Counterfactual Fairness is among the most principled approaches to individual-level algorithmic fairness — grounding fairness in causal reasoning rather than statistical correlation, and providing intuitive guarantees about how decisions would change in counterfactual worlds where protected attributes had been different.
