
Capability control is the AI safety strategy of limiting what an AI system is physically able to do, independent of what it is trained to want to do. It acts as a defense-in-depth measure against alignment failures: even if an AI system's values or goals deviate from human intentions, its ability to cause harm is bounded by hard technical constraints.

What Is Capability Control?

Why Capability Control Matters

Critical Capabilities to Control

Self-Replication:

Resource Acquisition:

Internet and Network Access:

Code Execution Scope:

Tool Use Boundaries:

Capability Control in Practice

Minimal Footprint Principle:

Sandboxing Architecture:

Tripwires and Circuit Breakers:

Capability Control vs. Alignment

| Approach           | Goal                     | Failure Mode               | When It Helps                       |
|--------------------|--------------------------|----------------------------|-------------------------------------|
| Alignment          | Make AI want good things | Values learned incorrectly | Prevents misaligned intent          |
| Capability control | Limit what AI can do     | Overrides too restrictive  | Bounds impact of misalignment       |
| Monitoring         | Detect failures early    | Attacker evades detection  | Enables rapid response              |
| Interpretability   | Understand AI reasoning  | Misinterpret findings      | Predicts problems before they occur |

Capability control is the architectural safety harness that makes AI development safer during the critical period before we have robust alignment guarantees. By ensuring that even imperfectly aligned AI systems cannot take catastrophic or irreversible actions without human oversight, it buys the time and error tolerance needed to develop AI alignment into a mature, reliable engineering discipline.
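To make the idea of blocking irreversible actions without human oversight concrete, here is a minimal Python sketch of a gate that sits between an AI system and its tools. It is an illustration under stated assumptions, not a reference design: the names `CapabilityGate`, `CapabilityDenied`, `deny_by_default`, and the tool names in the usage note are all hypothetical. The gate enforces its limits no matter what the caller intends, which is the sense in which capability control is independent of alignment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


class CapabilityDenied(Exception):
    """Raised when a requested action falls outside the granted capabilities."""


def deny_by_default(tool: str, args: dict) -> bool:
    # Hypothetical stand-in for a human reviewer: refuses unless replaced.
    return False


@dataclass
class CapabilityGate:
    # Tools the system may call at all (hypothetical allowlist).
    allowed_tools: Set[str]
    # Subset of tools treated as irreversible; these need human sign-off.
    irreversible_tools: Set[str] = field(default_factory=set)
    # Approval callback; the default denies every irreversible action.
    approve: Callable[[str, dict], bool] = deny_by_default

    def call(self, tool: str, args: dict,
             registry: Dict[str, Callable[..., object]]) -> object:
        # Hard constraint 1: anything not allowlisted is refused outright.
        if tool not in self.allowed_tools:
            raise CapabilityDenied(f"tool not allowlisted: {tool}")
        # Hard constraint 2: irreversible actions require explicit approval.
        if tool in self.irreversible_tools and not self.approve(tool, args):
            raise CapabilityDenied(f"human approval required: {tool}")
        return registry[tool](**args)
```

As a usage example, a gate built with `allowed_tools={"read_file", "delete_file"}` and `irreversible_tools={"delete_file"}` would pass a `read_file` call through to the registry, raise `CapabilityDenied` for `delete_file` until an approving callback is supplied, and raise `CapabilityDenied` for any tool that is not on the allowlist. Denying by default is a deliberate choice in this sketch: a misconfigured gate fails closed rather than open.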

