Model Extraction (Model Stealing)

Model extraction (model stealing) is an adversarial attack in which an adversary reconstructs a functional copy of a proprietary machine learning model by systematically querying its API and training a surrogate model on the collected (input, output) pairs. A successful attack enables theft of intellectual property, lets the attacker sidestep API restrictions by running the capability locally, and yields a local model for crafting more effective adversarial attacks against the original.

What Is Model Extraction?

Why Model Extraction Matters

Attack Strategies

Equation-Solving (Linear/Logistic Models):
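
As a minimal sketch of the equation-solving idea, assume the victim is a logistic regression model whose API returns an unrounded probability. Because logit(p(x)) = w·x + b, every query yields one linear equation in the unknown parameters, so d + 1 well-chosen queries are enough to recover d weights and the bias. The names `make_victim` and `extract_logistic` and the secret parameter values are hypothetical and exist only for this illustration.

```python
import math


def logit(p):
    """Inverse of the sigmoid: maps a probability back to w.x + b."""
    return math.log(p / (1.0 - p))


def make_victim(weights, bias):
    """Hypothetical black-box API that returns a full-precision probability."""
    def predict_proba(x):
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return predict_proba


def extract_logistic(predict_proba, dim):
    """Recover weights and bias with dim + 1 queries."""
    # Query the origin: logit(p(0)) = b.
    bias = logit(predict_proba([0.0] * dim))
    weights = []
    for i in range(dim):
        # Query the i-th unit vector: logit(p(e_i)) = w_i + b.
        e = [0.0] * dim
        e[i] = 1.0
        weights.append(logit(predict_proba(e)) - bias)
    return weights, bias


# Assumed secret parameters, unknown to the attacker.
secret_w, secret_b = [0.8, -1.5, 2.0], 0.25
api = make_victim(secret_w, secret_b)

stolen_w, stolen_b = extract_logistic(api, dim=3)
print(stolen_w, stolen_b)
```

The recovery is exact up to floating-point error only because the assumed API leaks full-precision confidence scores; with rounded scores or hard labels the same queries give only approximate or no equations.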

Learning-Based Extraction:
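
The query-then-train loop described in the introduction can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the victim is a hypothetical linear classifier that returns only hard labels, the attacker draws queries uniformly at random, and the surrogate is a small logistic regression trained with plain SGD. All function names and parameter values are invented for the example.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def victim_label(x):
    """Hypothetical black-box API: returns only a hard 0/1 label."""
    return 1 if 1.5 * x[0] - 2.0 * x[1] + 0.3 > 0 else 0


def train_surrogate(samples, labels, epochs=50, lr=0.5):
    """Fit a logistic-regression surrogate with plain SGD."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def surrogate_label(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


rng = random.Random(0)


def random_input():
    return [rng.uniform(-1, 1), rng.uniform(-1, 1)]


# 1. Query the API on attacker-chosen inputs and record the answers.
queries = [random_input() for _ in range(500)]
answers = [victim_label(x) for x in queries]

# 2. Train the surrogate on the collected (input, output) pairs.
surrogate = train_surrogate(queries, answers)

# 3. Measure fidelity: agreement with the victim on fresh inputs.
holdout = [random_input() for _ in range(1000)]
agree = sum(surrogate_label(surrogate, x) == victim_label(x) for x in holdout)
fidelity = agree / len(holdout)
print(f"queries: {len(queries)}, fidelity: {fidelity:.3f}")
```

Fidelity (agreement with the victim) is used here rather than accuracy on ground-truth labels, since the attacker's goal is to copy the victim's behavior, including its mistakes.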

Active Learning Extraction:
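
One common way to reduce the query count is uncertainty sampling: instead of querying at random, the attacker repeatedly queries the inputs on which the current surrogate is least confident. The sketch below assumes a hypothetical label-only victim, an unlabeled candidate pool the attacker can generate for free, and a logistic-regression surrogate. The budget of 100 queries and all names are illustrative choices, and other selection criteria would work as well.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def victim_label(x):
    """Hypothetical black-box API: returns only a hard 0/1 label."""
    return 1 if 1.5 * x[0] - 2.0 * x[1] + 0.3 > 0 else 0


def train(samples, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression surrogate from scratch with plain SGD."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


def prob(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rng = random.Random(1)
pool = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(2000)]

# Seed round: spend a small part of the budget on random pool points.
labeled_x = [pool.pop() for _ in range(20)]
labeled_y = [victim_label(x) for x in labeled_x]
model = train(labeled_x, labeled_y)

# Each round, query the pool points the surrogate is least sure about
# (predicted probability closest to 0.5), then retrain.
for _ in range(8):
    pool.sort(key=lambda x: abs(prob(model, x) - 0.5))
    batch, pool = pool[:10], pool[10:]
    labeled_x += batch
    labeled_y += [victim_label(x) for x in batch]
    model = train(labeled_x, labeled_y)

queries_used = len(labeled_x)
holdout = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(1000)]
agree = sum((prob(model, x) > 0.5) == (victim_label(x) == 1) for x in holdout)
fidelity = agree / len(holdout)
print(f"queries: {queries_used}, fidelity: {fidelity:.3f}")
```

Only the points that are actually sent to `victim_label` count against the query budget; scoring the pool with the surrogate is free for the attacker.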

Knockoff Nets (Orekondy et al.):

Query Efficiency

| Attack Type | Queries Required | Accuracy Achieved |
|---|---|---|
| Random queries | 50K-500K | 80-95% of original |
| Active learning | 5K-50K | 80-90% of original |
| Distribution-matched | 100K | 90-98% of original |
| Architecture-matched | 10K | Near-perfect |

Defenses

Detection:
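
Because extraction depends on large numbers of queries, one simple detection heuristic is to track per-client query volume over a sliding time window and flag clients that exceed a budget. The sketch below is a deliberately simplified illustration: `QueryMonitor`, its thresholds, and the client identifier are hypothetical, and a production detector would likely also examine the distribution of the queries, not just their count.

```python
from collections import defaultdict, deque


class QueryMonitor:
    """Flags clients that exceed max_queries within window_seconds."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> recent timestamps

    def record(self, client_id, timestamp):
        """Log one query; return True if the client is over budget."""
        recent = self._history[client_id]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_queries


# Illustrative thresholds: at most 3 queries per 60 seconds.
monitor = QueryMonitor(max_queries=3, window_seconds=60)
flags = [monitor.record("client-a", t) for t in range(5)]
later = monitor.record("client-a", 1000)
print(flags, later)
```

A volume threshold alone is easy to evade by spreading queries across accounts or over time, so it is best read as a first filter rather than a complete defense.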

Mitigation:
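
A common mitigation is to reduce how much information each response carries, for example by rounding confidence scores or returning only the predicted label. The sketch below assumes a hypothetical logistic-regression victim with a known bias of 0.25 and shows that the bias an attacker infers from a single query at the origin is exact with full-precision scores but noticeably off once the score is rounded to two decimals. The function names and parameters are illustrative only.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def raw_api(x):
    """Hypothetical victim: returns a full-precision probability."""
    z = 0.8 * x[0] - 1.5 * x[1] + 0.25
    return 1.0 / (1.0 + math.exp(-z))


def rounded_api(x, decimals=2):
    """Hardened variant: confidence rounded to a few decimals."""
    return round(raw_api(x), decimals)


def label_only_api(x):
    """Strictest variant: only the predicted class is returned."""
    return int(raw_api(x) >= 0.5)


origin = [0.0, 0.0]
true_bias = 0.25

# At the origin, logit(p) equals the bias, so one query leaks it directly.
bias_from_raw = logit(raw_api(origin))
bias_from_rounded = logit(rounded_api(origin))

print(f"error with raw scores:     {abs(bias_from_raw - true_bias):.2e}")
print(f"error with rounded scores: {abs(bias_from_rounded - true_bias):.2e}")
print(f"label-only response:       {label_only_api(origin)}")
```

Coarser outputs raise the attacker's query cost rather than eliminate the attack: a label-only API can still be copied by learning-based extraction, only with more queries.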

Watermarking:

Model extraction is the intellectual property theft attack enabled by the API economy of AI. As valuable ML models are increasingly deployed as API services, an attacker's ability to recover their behavior from query-response pairs exposes a fundamental tension: providers need to monetize ML capabilities, yet no black-box system that must answer user queries can fully prevent information from leaking through those answers.
