Differential Privacy
What is Differential Privacy?
Differential privacy (DP) is a mathematical framework that gives rigorous privacy guarantees. The probability distribution of a computation's output is nearly the same whether or not any single individual's data is included in the input.
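Formal Definition
A randomized mechanism M is ε-differentially private if, for every pair of datasets D and D′ that differ in one element and for every set of outputs S:
$$\Pr[M(D) \in S] \le e^{\varepsilon} \Pr[M(D') \in S]$$
Lower ε means stronger privacy, because the output distributions on D and D′ must be closer together.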
Key Concepts
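| Concept | Description |
|---|---|
| Epsilon (ε) | Privacy budget; lower values mean stronger privacy |
| Delta (δ) | Additive slack in approximate (ε, δ)-DP, often read as the probability that the ε guarantee fails; usually chosen much smaller than 1/n for n records |
| Sensitivity (Δ) | Maximum change in a query's output when one individual's record is added, removed, or changed |
| Noise | Randomness added to the output, calibrated to the sensitivity and the privacy parameters |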
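The Delta and Sensitivity rows correspond to two standard textbook definitions, written here with the same M, D, D′, S and ε as the formal definition above. Pure ε-DP is the special case δ = 0. The Laplace mechanism is calibrated to the ℓ1 sensitivity (p = 1) and the Gaussian mechanism to the ℓ2 sensitivity (p = 2).

```latex
% Approximate (epsilon, delta)-DP: for all neighboring D, D' and all output sets S
\Pr[M(D) \in S] \le e^{\varepsilon}\,\Pr[M(D') \in S] + \delta

% L_p sensitivity of a query f, maximized over neighboring datasets D ~ D'
\Delta_p f = \max_{D \sim D'} \lVert f(D) - f(D') \rVert_p
```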
DP Mechanisms
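Laplace Mechanism
For numeric queries, add noise drawn from a Laplace distribution with scale b = Δ/ε, where Δ is the query's ℓ1 sensitivity. This satisfies pure ε-DP: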
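```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    # Noise scale b = sensitivity / epsilon (sensitivity measured in the L1 norm)
    scale = sensitivity / epsilon
    noise = np.random.laplace(0.0, scale)
    return true_value + noise
```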
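In practice the hard part is bounding the sensitivity. The sketch below is one common pattern, assuming add/remove-one neighboring datasets: clip every value into a fixed range so that one person can change a sum by at most max(|lower|, |upper|), then apply Laplace noise at that scale. `dp_bounded_sum` is a hypothetical helper written for illustration. It uses NumPy's `Generator.laplace`.

```python
import numpy as np

def dp_bounded_sum(values, lower, upper, epsilon, rng=None):
    """Release sum(values) with epsilon-DP (hypothetical helper).

    Assumes add/remove-one neighbors: after clipping into [lower, upper],
    one record changes the sum by at most max(|lower|, |upper|),
    which is therefore the L1 sensitivity.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = max(abs(lower), abs(upper))
    return clipped.sum() + rng.laplace(0.0, sensitivity / epsilon)

# Ages clipped to [0, 100]: the outlier 150 contributes only 100
ages = [23, 35, 47, 150]
print(dp_bounded_sum(ages, 0, 100, epsilon=1.0))
```

The clipping range must be fixed before looking at the data. Choosing it from the data itself would leak information that the noise does not account for.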
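Gaussian Mechanism
For approximate (ε, δ)-DP, add Gaussian noise calibrated to the query's ℓ2 sensitivity. The classic calibration used here is proven only for 0 < ε < 1: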
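```python
import math
import numpy as np

def gaussian_mechanism(true_value, sensitivity, epsilon, delta):
    # sigma = sensitivity * sqrt(2 ln(1.25 / delta)) / epsilon, with L2 sensitivity and 0 < epsilon < 1
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    noise = np.random.normal(0.0, sigma)
    return true_value + noise
```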
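The ℓ1 vs ℓ2 distinction matters for vector-valued queries. Suppose one person can add 1 to each of k histogram bins. Then the ℓ1 sensitivity is k, while the ℓ2 sensitivity is only √k. The sketch below compares the per-bin noise of the two calibrations under that assumed contribution pattern. The guarantees differ (pure ε-DP vs (ε, δ)-DP), so this illustrates scaling rather than giving an exact like-for-like comparison. The function names are illustrative.

```python
import math

def laplace_scale(l1_sensitivity, epsilon):
    # Laplace mechanism: b = Delta_1 / epsilon (pure epsilon-DP)
    return l1_sensitivity / epsilon

def gaussian_sigma(l2_sensitivity, epsilon, delta):
    # Classic Gaussian calibration; its proof assumes 0 < epsilon < 1
    if not 0 < epsilon < 1:
        raise ValueError("this sigma formula is only proven for 0 < epsilon < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

# Assumed query: one person contributes +1 to each of k histogram bins
k, epsilon, delta = 100, 0.5, 1e-5
b = laplace_scale(k, epsilon)                         # L1 sensitivity = k
sigma = gaussian_sigma(math.sqrt(k), epsilon, delta)  # L2 sensitivity = sqrt(k)

# Standard deviation of Laplace(0, b) is sqrt(2) * b
print(f"Laplace noise std per bin:  {math.sqrt(2) * b:.1f}")
print(f"Gaussian noise std per bin: {sigma:.1f}")
```

As k grows, the Laplace scale grows linearly while the Gaussian σ grows only with √k.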
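DP-SGD (Differentially Private Training)
DP-SGD bounds each example's influence by clipping its gradient to an L2 norm of at most clip_norm. It then adds Gaussian noise with standard deviation noise_multiplier * clip_norm to the summed gradient before the update: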
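```python
import torch

def dp_sgd_step(model, batch, clip_norm, noise_multiplier, lr):
    # compute_per_sample_gradients is a placeholder, not a library function: it must
    # return, for each example in the batch, a list of per-parameter gradient tensors
    per_sample_grads = compute_per_sample_gradients(model, batch)
    batch_size = len(per_sample_grads)

    # Clip each example's full gradient (all parameters together) to L2 norm <= clip_norm
    clipped_grads = []
    for grads in per_sample_grads:
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        factor = min(1.0, clip_norm / (total_norm.item() + 1e-12))
        clipped_grads.append([g * factor for g in grads])

    # Sum per parameter, add N(0, (noise_multiplier * clip_norm)^2) noise, then average
    with torch.no_grad():
        for i, param in enumerate(model.parameters()):
            summed = sum(grads[i] for grads in clipped_grads)
            noise = torch.randn_like(summed) * noise_multiplier * clip_norm
            noisy_grad = (summed + noise) / batch_size
            # Plain SGD update
            param -= lr * noisy_grad
```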
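To see the clip, noise and average steps end to end without a deep-learning framework, here is a minimal NumPy sketch that swaps in logistic regression as the model. It assumes Poisson sampling (each example joins a batch independently with probability q, matching the `sample_rate` used in privacy accounting). It divides the noisy sum by the expected batch size, because the realized batch size depends on the private sampling. All function names are illustrative and do not come from any DP library.

```python
import numpy as np

def per_example_grads(w, X, y):
    # Logistic-loss gradient for each example: (sigmoid(x . w) - y) * x
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X                     # shape (batch, dim)

def clip_rows(G, clip_norm):
    # Scale each row so its L2 norm is at most clip_norm
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    return G * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

def dp_sgd_logreg(X, y, clip_norm=1.0, noise_multiplier=1.1, lr=0.5,
                  batch_size=64, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    q = batch_size / n                              # Poisson sampling rate
    for _ in range(steps):
        mask = rng.random(n) < q                    # include each example with probability q
        G = clip_rows(per_example_grads(w, X[mask], y[mask]), clip_norm)
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=d)
        w -= lr * (G.sum(axis=0) + noise) / batch_size  # divide by expected batch size
    return w

# Synthetic, linearly separable data
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = dp_sgd_logreg(X, y)
print("accuracy:", np.mean((X @ w > 0) == (y == 1)))
```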
Privacy Accounting
Track cumulative privacy loss across all training steps:
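```python
from opacus.accountants import RDPAccountant

# Illustrative values; sample_rate = expected batch size / dataset size
noise_multiplier, sample_rate, steps = 1.1, 256 / 60_000, 10_000

accountant = RDPAccountant()
for _ in range(steps):
    accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)

delta = 1e-5
epsilon = accountant.get_epsilon(delta=delta)
print(f"Total privacy: eps={epsilon:.2f}, delta={delta}")
```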
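Accounting matters because privacy loss accumulates over every step. The RDP accountant computes a tight bound using Rényi DP. As a simpler illustration of the same idea, the sketch below implements the two classic composition theorems. Basic composition adds the ε values. Advanced composition (Dwork, Rothblum and Vadhan) gives a total of ε√(2k ln(1/δ′)) + kε(e^ε − 1) at the cost of an extra δ′. These are not what Opacus computes. They only show how the choice of bound changes the reported budget.

```python
import math

def basic_composition(epsilon, k):
    # k adaptively chosen epsilon-DP steps are (k * epsilon)-DP
    return k * epsilon

def advanced_composition(epsilon, k, delta_prime):
    # k (epsilon, delta)-DP steps are (eps_total, k*delta + delta_prime)-DP
    return (epsilon * math.sqrt(2 * k * math.log(1 / delta_prime))
            + k * epsilon * (math.exp(epsilon) - 1))

for k in (10, 100, 1000):
    basic = basic_composition(0.1, k)
    advanced = advanced_composition(0.1, k, 1e-5)
    print(f"k={k}: basic eps={basic:.2f}, advanced eps={advanced:.2f}")
```

For a few steps the basic bound is tighter. For many steps the advanced bound grows roughly with √k instead of k, which is why training runs with thousands of steps rely on tighter accountants.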
Tools
| Tool | Features |
|---|---|
| Opacus | PyTorch DP training |
| TF Privacy | TensorFlow DP |
| PyDP | DP primitives |
| Tumult Analytics | DP analytics |
Trade-offs
| Stronger privacy (lower ε) | Weaker privacy (higher ε) |
|---|---|
| More noise | Less noise |
| Lower accuracy | Higher accuracy |
| Slower convergence (noisier gradients) | Faster convergence |
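The first two rows can be made concrete for the Laplace mechanism. Laplace(0, b) noise has mean absolute value exactly b = Δ/ε, so halving ε doubles the expected error. The Monte Carlo sketch below checks this for a sensitivity-1 query such as a count. The helper name is illustrative.

```python
import numpy as np

def mean_abs_laplace_error(epsilon, sensitivity=1.0, n=100_000, seed=0):
    # Empirical E|noise| for Laplace(0, sensitivity / epsilon)
    rng = np.random.default_rng(seed)
    noise = rng.laplace(0.0, sensitivity / epsilon, size=n)
    return np.abs(noise).mean()

for epsilon in (0.1, 1.0, 10.0):
    # Theory: E|noise| = sensitivity / epsilon
    print(f"eps={epsilon}: mean |error| = {mean_abs_laplace_error(epsilon):.3f}, "
          f"theory = {1.0 / epsilon:.3f}")
```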
Best Practices
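- Start with a reasonable ε (1 to 10 is a common range for model training)
- Use privacy accounting throughout, since privacy loss accumulates over every query or training step
- Choose between local DP (each user randomizes their own data, so no trusted curator is needed) and central DP (a trusted curator adds noise to aggregate results, which gives better accuracy for the same ε)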
- Validate utility on downstream tasks
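The local vs central choice can be illustrated with a counting query. In the local model, the sketch below uses randomized response, a classic mechanism not discussed above. Each user reports their true bit with probability e^ε/(1 + e^ε), which makes the ratio of report probabilities exactly e^ε, and the aggregator debiases the noisy total. In the central model, a trusted curator adds Laplace noise with sensitivity 1 to the exact count. The data is synthetic and the function names are illustrative.

```python
import numpy as np

def randomized_response(bits, epsilon, rng):
    # Each user keeps their true bit with probability e^eps / (1 + e^eps), else flips it
    p_keep = np.exp(epsilon) / (1 + np.exp(epsilon))
    keep = rng.random(len(bits)) < p_keep
    return np.where(keep, bits, 1 - bits)

def estimate_count_local(reports, epsilon):
    # E[sum(reports)] = (2p - 1) * count + (1 - p) * n, solved for count
    p = np.exp(epsilon) / (1 + np.exp(epsilon))
    n = len(reports)
    return (reports.sum() - (1 - p) * n) / (2 * p - 1)

def estimate_count_central(bits, epsilon, rng):
    # Trusted curator: exact count plus Laplace noise (sensitivity 1)
    return bits.sum() + rng.laplace(0.0, 1.0 / epsilon)

rng = np.random.default_rng(0)
bits = (rng.random(10_000) < 0.3).astype(int)  # synthetic: ~30% of users have the attribute
eps = 1.0
print("true count:         ", bits.sum())
print("local DP estimate:  ", round(estimate_count_local(randomized_response(bits, eps, rng), eps)))
print("central DP estimate:", round(estimate_count_central(bits, eps, rng)))
```

With the same ε, the local estimate's error grows roughly with √n, while the central estimate's error stays around 1/ε regardless of n. This is the accuracy cost of not trusting a curator.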