Purpose

Model watermarking embeds a secret signal in a model so that its owner can prove ownership or detect unauthorized use. It serves IP protection, leak detection, usage tracking, and compliance verification.

Watermarking types:
Weight-based: the signal is encoded in the model parameters, as specific patterns in the weights.
Behavior-based: the model produces specific outputs for secret trigger inputs (backdoor-style).
API-based: the watermark is added to outputs at inference time.

Embedding techniques: modifying training so the model learns the watermark, modifying the weights after training, and training on trigger-response pairs.

Detection: present the trigger inputs and verify the expected responses, or run a statistical analysis of the weights.

Properties needed:
Fidelity: the watermark does not hurt model performance.
Robustness: it survives fine-tuning, pruning, and quantization.
Undetectability: it is hard to find and remove.
Capacity: it carries enough bits for identification.

Attacks on watermarks: fine-tuning to remove the signal, model extraction into a new architecture, and watermark detection followed by removal.

Open source challenge: publicly shared weights are hard to watermark, because anyone can inspect and modify them and the signal becomes known.

Applications: proving model theft, checking licensing compliance, and detecting model laundering.

Model watermarking is an active research area as model IP becomes more valuable.
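As a rough illustration of the weight-based approach and of detection by statistical analysis of the weights, the Python sketch below adds a key-derived +1/-1 pattern to a flat list of weights and scores its presence by correlation. It is a toy spread-spectrum scheme chosen for illustration, not a method taken from the text above. The function names (carrier, embed, detection_score), the key strings, the strength value, and the Gaussian stand-in weights are all assumptions, and added Gaussian noise is only a crude proxy for fine-tuning.

```python
import hashlib
import math
import random


def carrier(secret_key: str, n: int) -> list[int]:
    """Derive a pseudo-random +1/-1 sequence of length n from the secret key."""
    seed = int.from_bytes(hashlib.sha256(secret_key.encode()).digest(), "big")
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(n)]


def embed(weights: list[float], secret_key: str, strength: float) -> list[float]:
    """Nudge every weight by +/- strength along the key-derived pattern."""
    pattern = carrier(secret_key, len(weights))
    return [w + strength * s for w, s in zip(weights, pattern)]


def detection_score(weights: list[float], secret_key: str) -> float:
    """Correlate the weights with the key pattern, normalized so that a model
    without this key's watermark scores roughly like a standard normal value."""
    pattern = carrier(secret_key, len(weights))
    correlation = sum(w * s for w, s in zip(weights, pattern))
    return correlation / math.sqrt(sum(w * w for w in weights))


# Hypothetical stand-in for a flattened weight tensor (assumption: Gaussian).
rng = random.Random(0)
original = [rng.gauss(0.0, 0.05) for _ in range(20_000)]

# Embed with a small strength so the change to each weight stays minor.
marked = embed(original, "owner-secret", strength=0.005)

# Crude proxy for fine-tuning: add independent noise to every weight.
tuned = [w + rng.gauss(0.0, 0.02) for w in marked]

score_marked = detection_score(marked, "owner-secret")
score_tuned = detection_score(tuned, "owner-secret")
score_clean = detection_score(original, "owner-secret")
score_wrong_key = detection_score(marked, "someone-else")

print(f"marked:    {score_marked:.2f}")
print(f"tuned:     {score_tuned:.2f}")
print(f"clean:     {score_clean:.2f}")
print(f"wrong key: {score_wrong_key:.2f}")
```

Under these assumptions, the correct key on marked weights scores around strength * sqrt(n) / sigma, where n is the number of weights and sigma their spread, while an unmarked model or a wrong key scores near zero, so a threshold of about 4 separates the cases in this toy setup. The sketch says nothing about pruning, quantization, or deliberate removal, which a real scheme would also need to withstand.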
