Data poisoning

Data poisoning injects malicious samples into training data to corrupt a model's behavior.

Attack goals fall into three categories. Untargeted attacks degrade overall model performance. Targeted attacks make the model misbehave on specific inputs while maintaining overall accuracy. Backdoor attacks install a hidden trigger that causes a specific behavior when the trigger appears in the input.

Attack vectors include compromised labelers, poisoned public datasets, adversarial data contributions, and supply chain attacks on training pipelines.

Poisons are commonly grouped by how they are built. Clean-label poisons carry correct labels but adversarial features. Dirty-label poisons are intentionally mislabeled examples. Gradient-based poisons are crafted to have the maximum effect on the trained model.

Examples of impact include a spam filter trained to ignore specific spam patterns and a classifier trained to misclassify specific targets.

Defenses include data sanitization, anomaly detection, certified defenses, robust training algorithms, and provenance tracking. The main challenges are detecting subtle poisoning, spotting clean-label attacks (whose labels look correct), and distinguishing poison from ordinary noise.

Federated learning is also vulnerable: malicious clients can poison the aggregated model.

Data poisoning is a real concern for crowdsourced and web-scraped datasets, and defending against it requires careful security across the data pipeline.
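As a rough illustration of dirty-label poisoning and of data sanitization as a defense, the sketch below flips some labels in a toy dataset and then flags samples whose label disagrees with most of their nearest neighbors. Everything in it is an assumption made for illustration: the 1-D toy data, the helper names (`make_dataset`, `flip_labels`, `flag_suspicious`), the choice of `k=5`, and the attacker flipping every fifth label. It is one simple heuristic, not a complete or certified defense, and it would not catch clean-label poisons, whose labels agree with their neighbors.

```python
from collections import Counter

def make_dataset(n_per_class=20):
    """Toy 1-D data (hypothetical): class 0 at x = 0..19, class 1 at x = 100..119."""
    data = [(x, 0) for x in range(n_per_class)]
    data += [(100 + x, 1) for x in range(n_per_class)]
    return data

def flip_labels(data, poison_idx):
    """Dirty-label poisoning: flip the binary label at each chosen index."""
    return [(x, 1 - y) if i in poison_idx else (x, y)
            for i, (x, y) in enumerate(data)]

def flag_suspicious(data, k=5):
    """Return indices whose label disagrees with the majority label
    of their k nearest neighbors (a simple sanitization heuristic)."""
    flagged = set()
    for i, (xi, yi) in enumerate(data):
        # Distance to every other sample, nearest first (ties broken by index).
        nearest = sorted((abs(xi - xj), j)
                         for j, (xj, _) in enumerate(data) if j != i)[:k]
        votes = Counter(data[j][1] for _, j in nearest)
        majority_label = votes.most_common(1)[0][0]
        if majority_label != yi:
            flagged.add(i)
    return flagged

clean = make_dataset()
# Assumed attacker behavior: flip every fifth label, starting at index 2.
poison_idx = set(range(2, len(clean), 5))
poisoned = flip_labels(clean, poison_idx)

flagged = flag_suspicious(poisoned)
sanitized = [sample for i, sample in enumerate(poisoned) if i not in flagged]
print(f"poisoned: {len(poison_idx)}, flagged: {len(flagged)}, kept: {len(sanitized)}")
```

On this toy data the flipped labels are sparse and the two classes are far apart, so the filter flags exactly the eight flipped samples and keeps the other 32. Real datasets are noisier: a neighbor-vote filter will also discard some legitimately unusual samples, which is the "distinguishing poison from noise" problem mentioned above.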
