Federated Learning

Keywords: federated learning distributed,fedavg federated averaging,federated aggregation privacy,communication efficient federated,differential privacy federated

Federated Learning is a distributed machine learning paradigm in which a shared model is trained across multiple decentralized data sources (devices, organizations) without centralizing the data. Privacy is preserved by exchanging only model updates (gradients or parameters) rather than raw training data, enabling collaboration between parties that cannot or will not share sensitive information.

FedAvg Algorithm:
- Communication Round: server sends the current global model to a selected client subset (typically tens to hundreds out of thousands of available clients); each client trains the model locally for E epochs on its private data; clients send updated model parameters back to the server
- Aggregation: server averages client model updates weighted by dataset size: w_global = Σ_k (n_k/n)·w_k, where n_k is client k's data size and n = Σ_k n_k; this weighted average approximates centralized SGD under IID data assumptions (see the sketch after this list)
- Local Training: each client performs multiple local SGD steps before communication, reducing communication frequency by 10-100× vs single-step SGD; more local steps increase communication efficiency but introduce client drift
- Client Selection: random subset selection each round; not all clients participate every round (device availability, bandwidth constraints); stochastic participation introduces variance equivalent to mini-batch noise
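
As a concrete illustration of the round structure above, here is a minimal Python/NumPy sketch. The flat weight vectors and the clients[i].local_train() method (which runs E local epochs and returns updated weights plus dataset size) are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """w_global = sum_k (n_k/n) * w_k: average weighted by client dataset size."""
    n = float(sum(client_sizes))
    return sum((n_k / n) * w_k for w_k, n_k in zip(client_weights, client_sizes))

def federated_round(global_w, clients, rng, fraction=0.1):
    """One FedAvg communication round over a random client subset."""
    k = max(1, int(fraction * len(clients)))
    selected = rng.choice(len(clients), size=k, replace=False)
    results = [clients[i].local_train(global_w) for i in selected]  # hypothetical client API
    weights = [w for w, _ in results]
    sizes = [n for _, n in results]
    return fedavg_aggregate(weights, sizes)

# usage: rng = np.random.default_rng(0); global_w = federated_round(global_w, clients, rng)
```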

Non-IID Challenges:
- Data Heterogeneity: different clients have drastically different data distributions (a hospital specializes in certain conditions, a user types in a specific language); non-IID data is the primary challenge in federated learning
- Client Drift: with heterogeneous data, local updates push models in different directions; averaging drifted models degrades convergence compared to IID settings; convergence rate degrades proportionally to the degree of heterogeneity
- Solutions: FedProx adds a proximal term penalizing deviation from the global model during local training (a minimal sketch follows this list); SCAFFOLD uses control variates to correct for client drift; FedBN keeps batch normalization layers local (personalized) while sharing other parameters
- Personalization: instead of a single global model, produce personalized models for each client; approaches include local fine-tuning after global training, mixture of global and local models, and meta-learning based initialization (Per-FedAvg)
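
A minimal PyTorch-style sketch of the FedProx proximal term; the framework choice and function names are assumptions, and this shows only the modified local objective, not the full algorithm.

```python
import torch

def fedprox_loss(task_loss, model, global_params, mu=0.01):
    """FedProx local objective: task loss + (mu/2) * ||w - w_global||^2."""
    prox = sum(((w - wg) ** 2).sum()
               for w, wg in zip(model.parameters(), global_params))
    return task_loss + 0.5 * mu * prox

# global_params: frozen copies of the server model's parameters for this round, e.g.
# global_params = [p.detach().clone() for p in server_model.parameters()]
# mu trades local fit against drift; mu = 0 recovers plain FedAvg local training.
```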

Privacy and Security:
- Differential Privacy (DP): add calibrated noise to model updates before aggregation; bounds how much any individual training example can influence, and thus be inferred from, the aggregated model; privacy budget ε controls the privacy-utility tradeoff (lower ε = more privacy, noisier model); a clip-and-noise sketch appears after this list
- Secure Aggregation: cryptographic protocol ensuring the server only sees the aggregated sum of client updates, not individual updates; prevents the server from inspecting any single client's model changes; costs 2-10× communication overhead (toy masking sketch below)
- Gradient Inversion Attacks: an adversarial server or client can attempt to reconstruct training data from gradient updates; modern attacks can reconstruct images from batch gradients with >90% fidelity for small batches; defenses include differential privacy, gradient compression, and larger batches
- Byzantine Robustness: malicious clients may send poisoned updates to corrupt the global model; robust aggregation methods (coordinate-wise median, trimmed mean, Krum) filter or down-weight outlier updates (sketched below)
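
A clip-and-noise sketch of the DP mechanism above, in NumPy. The names are illustrative; noise is added per client here (local DP), whereas many deployments noise only the aggregate (central DP), and translating (clip norm, noise multiplier, rounds) into a concrete ε requires a privacy accountant.

```python
import numpy as np

def sanitize_update(delta, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Clip a client update to L2 norm clip_norm, then add calibrated Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    scale = min(1.0, clip_norm / (np.linalg.norm(delta) + 1e-12))  # clip to bound sensitivity
    return delta * scale + rng.normal(0.0, noise_mult * clip_norm, size=delta.shape)
```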
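
The cancellation idea behind secure aggregation can be shown with a toy pairwise-masking sketch; real protocols (e.g., the Bonawitz et al. secure aggregation protocol) derive masks from pairwise key agreement and handle client dropouts, none of which this sketch attempts.

```python
import numpy as np

def pairwise_mask(updates, rng):
    """Toy masking: each client pair shares a mask that one adds and one subtracts."""
    masked = [u.astype(float) for u in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            m = rng.normal(size=updates[i].shape)  # stands in for a shared secret
            masked[i] += m
            masked[j] -= m
    return masked  # masks cancel in the total: sum(masked) == sum(updates)
```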
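
Two of the robust aggregation rules above are a few lines of NumPy each; this sketch assumes enough honest clients that trimming leaves a nonempty set.

```python
import numpy as np

def coordinate_median(updates):
    """Per-coordinate median across client updates; resistant to outlier updates."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean(updates, trim=0.1):
    """Discard the trim fraction of largest/smallest values per coordinate, then average."""
    stacked = np.sort(np.stack(updates), axis=0)
    k = int(trim * len(updates))
    return stacked[k:len(updates) - k].mean(axis=0)
```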

Communication Efficiency:
- Gradient Compression: quantize gradient updates to lower precision (1-bit SGD, ternary quantization); sparsification sends only the top-K% of gradient values by magnitude (or a random subset): 10-100× communication reduction with modest accuracy impact (see the sparsification sketch after this list)
- Federated Distillation: clients send model predictions (logits) on a public dataset rather than model parameters; this removes architecture constraints (clients may use heterogeneous models) and shrinks communication to prediction vectors (logit-averaging sketch below)
- Asynchronous Federated: remove synchronization barriers; the server aggregates client updates as they arrive; faster wall-clock convergence but introduces staleness; bounded-staleness protocols balance freshness with efficiency
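
A top-K sparsification sketch in NumPy; the (indices, values) encoding is an illustrative assumption, and practical systems pair it with error feedback, accumulating the discarded residual locally so it is eventually sent.

```python
import numpy as np

def topk_sparsify(grad, k_frac=0.01):
    """Keep only the top k_frac fraction of entries by magnitude; client sends (idx, values)."""
    flat = grad.ravel()
    k = max(1, int(k_frac * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest magnitudes
    return idx, flat[idx], grad.shape

def densify(idx, values, shape):
    """Server-side reconstruction of the sparse update."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = values
    return out.reshape(shape)
```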
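
And a hedged sketch of server-side logit aggregation in federated distillation; weighting by dataset size is one common choice among several, and the function name is an assumption.

```python
import numpy as np

def aggregate_logits(client_logits, client_sizes):
    """Average client logits on the shared public dataset, weighted by dataset size."""
    n = float(sum(client_sizes))
    return sum((n_k / n) * z for z, n_k in zip(client_logits, client_sizes))

# each party then distills: train on the public inputs using the averaged logits
# (softened by a temperature) as soft targets, instead of exchanging parameters
```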

Federated learning is a key enabling technology for privacy-preserving collaborative AI: it allows hospitals to jointly train diagnostic models without sharing patient records, banks to detect fraud across institutions without exposing transaction data, and mobile devices to improve predictive keyboards without uploading user text to the cloud.
