Property inference is a privacy attack against machine learning models that enables an adversary to determine aggregate statistical properties of the training dataset, such as the proportion of training examples with a particular attribute, the presence of a demographic subgroup, or the distribution of sensitive characteristics, by analyzing model parameters, outputs, or behavior patterns. It is a privacy threat distinct from membership inference, which targets individual records: property inference can reveal population-level secrets even when individual privacy is protected.
Distinction from Other Privacy Attacks
| Attack Type | Target | What Is Recovered | Example |
|-------------|--------|------------------|---------|
| Membership inference | Individual records | Was this specific person in the training set? | Determining if patient X's record was used |
| Model inversion | Input reconstruction | What did the training inputs look like? | Reconstructing faces from face recognition model |
| Property inference | Dataset statistics | What fraction of training data has property P? | Inferring % of female patients in training set |
| Training data extraction | Memorized content | Exact verbatim training examples | Extracting memorized text from language models |
Property inference is particularly insidious because it can succeed even when:
- the model implements differential privacy (which protects individuals, not population statistics)
- individual membership cannot be determined
- the model appears to behave normally on all evaluation inputs
Attack Methodology
Property inference attacks typically follow one of two approaches:
Meta-classifier attack (Ganju et al., 2018): The adversary trains a meta-model on shadow models to predict the property from model parameters or activations.
Step 1: Train a large number of "shadow" models on datasets with known property prevalence (50% female, 30% female, 70% female, etc.)
Step 2: Extract features from each shadow model (weight statistics, activation patterns, gradient signatures)
Step 3: Train a meta-classifier mapping model features → property value
Step 4: Apply meta-classifier to the target model to infer its training set property
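The four steps above can be sketched end to end on synthetic data. Everything here is a toy assumption chosen for illustration: the "models" are closed-form ridge regressions, the sensitive attribute shifts the subgroup's feature means, and the meta-classifier is an ordinary linear regression from shadow-model weights to the known prevalence.

```python
import numpy as np

rng = np.random.default_rng(0)
SHIFT = np.array([1.0, 0.5, -0.5])   # assumed: subgroup's feature-mean shift
BETA = np.array([1.0, -0.5, 0.3])    # assumed: true task signal

def make_dataset(p, n=2000):
    """Synthetic training set in which a fraction p belongs to the subgroup."""
    a = rng.random(n) < p
    x = rng.normal(size=(n, 3))
    x[a] += SHIFT                     # subgroup occupies a shifted region
    y = ((x @ BETA + 0.5 * a) > 0).astype(float)
    return x, y

def fit_model(x, y):
    """Stand-in 'model': closed-form ridge regression (weights + intercept)."""
    X = np.column_stack([x, np.ones(len(x))])
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(4), X.T @ y)

# Steps 1-2: train shadow models at known prevalences; features = their weights.
ps = np.linspace(0.1, 0.9, 33)
feats = np.array([fit_model(*make_dataset(p)) for p in ps])

# Step 3: meta-classifier mapping weight vector -> prevalence (linear fit).
F = np.column_stack([feats, np.ones(len(ps))])
meta, *_ = np.linalg.lstsq(F, ps, rcond=None)

def infer_prevalence(w):
    """Step 4: apply the meta-classifier to a target model's weights."""
    return float(np.concatenate([w, [1.0]]) @ meta)

# Target models trained at hidden prevalences 0.2 and 0.7.
est_low = infer_prevalence(fit_model(*make_dataset(0.2)))
est_high = infer_prevalence(fit_model(*make_dataset(0.7)))
print(f"inferred prevalences: {est_low:.2f} vs {est_high:.2f}")
```

Under these assumptions the meta-classifier roughly recovers the hidden prevalence from weights alone; real attacks replace the ridge regressions with neural networks and the raw weight vector with permutation-invariant per-layer summaries, as proposed by Ganju et al.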
Behavioral probing: Design probe inputs that elicit different model behaviors depending on training set composition:
- Submit input texts referencing demographic groups and measure differential response rates
- Craft feature perturbations that reveal whether underrepresented groups are present
- Analyze confidence calibration differences across subgroups
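The probing idea can be sketched on synthetic data. All specifics here are assumptions for illustration: the "model" is a ridge regression, the subgroup occupies a shifted region of feature space, and the attacker knows that region well enough to sample probe inputs from it.

```python
import numpy as np

rng = np.random.default_rng(1)
SHIFT = np.array([1.0, 0.5, -0.5])   # assumed: subgroup's feature-mean shift
BETA = np.array([1.0, -0.5, 0.3])    # assumed: true task signal

def train_target(p, n=4000):
    """Train a ridge-regression 'model' on data with subgroup prevalence p."""
    a = rng.random(n) < p
    x = rng.normal(size=(n, 3))
    x[a] += SHIFT
    y = ((x @ BETA + 0.5 * a) > 0).astype(float)
    X = np.column_stack([x, np.ones(n)])
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(4), X.T @ y)

def probe_score(w, n=2000):
    """Mean predicted score on probe inputs drawn from the subgroup's region."""
    x = rng.normal(size=(n, 3)) + SHIFT
    X = np.column_stack([x, np.ones(n)])
    return float((X @ w).mean())

# Compare a model that barely saw the subgroup with one that saw plenty of it.
score_absent = probe_score(train_target(0.05))
score_present = probe_score(train_target(0.60))
print(f"probe scores: {score_absent:.2f} vs {score_present:.2f}")
```

The model trained with the subgroup well represented scores subgroup-typical probes noticeably higher; that behavioral gap is the attacker's signal, and query-only attacks of this kind work even when the defender withholds the weights.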
Properties That Can Be Inferred
Research has demonstrated inference of:
- Gender and racial composition of training datasets (face recognition, medical imaging)
- Presence of specific individuals in training data (without identifying which individuals)
- Geographic distribution of training examples
- Economic characteristics of training population (income levels in financial models)
- Presence of sensitive behaviors (e.g., detecting whether a text model was trained on toxic content)
- Training data source composition (detecting which datasets were included in pretraining)
Defenses
| Defense | Mechanism | Limitation |
|---------|-----------|------------|
| Differential privacy | Add calibrated noise to gradients | Protects individuals but not aggregate properties by design |
| Representation scrubbing | Remove property-correlated features from representations | May degrade utility on legitimate tasks |
| Output perturbation | Add noise to API outputs | Reduces attack accuracy but degrades utility |
| Model weight encryption | Prevent direct weight access | Does not prevent behavioral probing |
| Access control and rate limiting | Limit query volume | Slows attack, does not prevent it |
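As a concrete illustration of the output-perturbation row (and of why rate limiting appears alongside it), here is a hypothetical noisy-API wrapper; the noise scale and the scores are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_api(scores, scale=0.1):
    """Defense: return model scores with Gaussian noise, clipped to [0, 1]."""
    noise = rng.normal(scale=scale, size=len(scores))
    return np.clip(np.asarray(scores) + noise, 0.0, 1.0)

true_scores = np.full(5, 0.7)          # hypothetical model outputs
one_query = noisy_api(true_scores)     # a single response is unreliable

# An attacker who can repeat the query simply averages the noise away,
# which is why perturbation is paired with rate limiting in practice.
avg = np.mean([noisy_api(true_scores).mean() for _ in range(500)])
print(f"true mean 0.70, averaged estimate {avg:.3f}")
```

A single perturbed response hides the signal, but repeated queries recover it to within a few thousandths, matching the table's limitation that perturbation reduces attack accuracy without preventing the attack.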
Significance for Regulated Industries
In healthcare, financial services, and government:
- Training dataset composition may be commercially sensitive or legally restricted
- Revealing that a medical AI was trained predominantly on one demographic group raises fairness concerns and regulatory scrutiny
- Property inference can constitute a data breach under GDPR if the inferred properties are personal data of the training population
Property inference represents a fundamental tension in ML privacy: differential privacy provides strong individual-level protection but by design allows aggregate statistics to be learned — which is exactly what property inference exploits.