Secure enclaves for ML inference are hardware-isolated execution environments that protect sensitive data and model parameters during computation. Processor-level isolation technologies (Intel SGX, AMD SEV, ARM TrustZone, AWS Nitro Enclaves) create tamper-resistant "trusted execution environments" (TEEs) in which neither the cloud provider's privileged software (OS, hypervisor), nor other tenants, nor physical attackers can access plaintext data, model weights, or intermediate computations. This enables confidential AI inference for healthcare, finance, and government applications where data sovereignty is non-negotiable.
The Threat Model
Standard cloud ML inference operates in an environment with multiple untrusted layers:
| Layer | Who Controls It | Can They See Your Data? |
|-------|----------------|------------------------|
| Application | Customer | Yes (you control this) |
| Container / VM | Cloud provider infrastructure | Yes (hypervisor has full access) |
| Operating system | Cloud provider | Yes (kernel sees all memory) |
| Hardware | Cloud provider / data center staff | Yes (physical memory access) |
Secure enclaves isolate a small protected region that is inaccessible even to the OS and hypervisor — only the CPU itself enforces the isolation boundary.
Intel SGX (Software Guard Extensions)
SGX is the most widely deployed server-side TEE technology:
Architecture: Code and data within an "enclave" are encrypted in RAM using an ephemeral AES key stored only within the CPU. The Memory Encryption Engine (MEE) automatically encrypts/decrypts as data moves between CPU cache and DRAM.
Remote attestation: Before sending sensitive data to an SGX enclave, the data owner can cryptographically verify:
1. The enclave is running on genuine Intel hardware
2. The specific software running inside the enclave (via code measurement hash)
3. The platform's SGX firmware and CPU microcode are up to date
This "trust but verify" mechanism enables secure delegation: the data owner sends encrypted data only after confirming what software will process it.
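The measurement-check step can be sketched as follows. This is a toy verifier for illustration only: real SGX attestation involves a signed "quote" produced by Intel's Quoting Enclave and verified via Intel's attestation infrastructure; the field names and the nonce-binding scheme here are simplified assumptions.

```python
import hashlib
import hmac

# Toy sketch of the data owner's attestation check (illustrative only).
# Real SGX attestation verifies a quote signed by Intel's Quoting Enclave;
# here we model just the code-measurement comparison and a freshness nonce.

# Hash of the inference binary we audited and approved (hypothetical build).
EXPECTED_MRENCLAVE = hashlib.sha256(b"approved-inference-binary-v1.2").hexdigest()

def verify_quote(quote: dict, nonce: bytes) -> bool:
    """Accept the enclave only if its code measurement matches the audited
    build AND the quote is bound to our fresh nonce (no replay)."""
    if quote["mrenclave"] != EXPECTED_MRENCLAVE:
        return False  # different code is running inside the enclave
    if not hmac.compare_digest(quote["report_data"],
                               hashlib.sha256(nonce).digest()):
        return False  # quote not bound to this session: possible replay
    return True

# Simulated enclave response bound to our session nonce
nonce = b"session-1234"
quote = {
    "mrenclave": EXPECTED_MRENCLAVE,
    "report_data": hashlib.sha256(nonce).digest(),
}
print(verify_quote(quote, nonce))           # only now send encrypted data
print(verify_quote(quote, b"stale-nonce"))  # reject: wrong nonce
```

Only after `verify_quote` succeeds does the data owner release encrypted inputs to the enclave.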
SGX for ML Inference: The ML model and inference code run inside the enclave. Input data is decrypted inside the enclave (only the CPU sees plaintext), inference executes, output is re-encrypted before leaving the enclave. The cloud provider runs the hardware but provably cannot access inputs, model weights, or outputs.
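The flow can be simulated end to end. The XOR keystream cipher below is a deliberately weak stand-in for the authenticated AES-GCM channel a real deployment would establish after attestation, and the "model" is a placeholder function; the point is the data-visibility boundary, not the cryptography.

```python
import hashlib
from itertools import count

# Toy model of the SGX inference flow: plaintext exists only "inside the
# enclave" (trusted_enclave); the untrusted host forwards opaque bytes.
# The XOR keystream below is NOT real crypto -- it stands in for AES-GCM.

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Symmetric toy cipher: XOR data with a SHA-256-derived keystream."""
    stream = bytearray()
    for block_idx in count():
        if len(stream) >= len(data):
            break
        stream += hashlib.sha256(key + block_idx.to_bytes(4, "big")).digest()
    return bytes(b ^ k for b, k in zip(data, stream))

SESSION_KEY = b"negotiated-after-attestation"  # known to client + enclave only

def trusted_enclave(ciphertext: bytes) -> bytes:
    """Everything here runs inside the TEE: decrypt, infer, re-encrypt."""
    plaintext = keystream_xor(SESSION_KEY, ciphertext)
    score = sum(plaintext) % 100                # placeholder "model"
    return keystream_xor(SESSION_KEY, f"risk_score={score}".encode())

# --- client side: encrypt before leaving the trust boundary -----------
record = b"patient: glucose=182, bp=140/90"
encrypted_in = keystream_xor(SESSION_KEY, record)

# --- untrusted host sees only ciphertext in and ciphertext out --------
encrypted_out = trusted_enclave(encrypted_in)

# --- client decrypts the inference result -----------------------------
print(keystream_xor(SESSION_KEY, encrypted_out))
```

The host (cloud provider) observes only `encrypted_in` and `encrypted_out`; the plaintext record and result exist solely inside `trusted_enclave` and at the client.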
Limitations: SGX memory is limited (typically 256MB to several GB), restricting model size. Large language models (7B+ parameters) exceed SGX capacity — requiring model partitioning across multiple enclaves or alternative TEE designs.
AMD SEV (Secure Encrypted Virtualization)
AMD SEV provides VM-level rather than application-level isolation:
- The entire VM memory is encrypted with a per-VM key managed by the AMD Secure Processor (separate from the main CPU)
- The hypervisor cannot read VM memory even with root access
- SEV-SNP (Secure Nested Paging) adds integrity protection against hypervisor-based manipulation of page tables
AMD SEV is more suitable than SGX for large-model inference because it encrypts the entire VM rather than a limited enclave region — supporting any model that fits in the VM's RAM allocation.
ARM TrustZone
TrustZone partitions the ARM processor into "Secure World" and "Normal World":
- Trusted OS (e.g., OP-TEE) runs in Secure World and handles sensitive operations
- Regular OS (Android, Linux) runs in Normal World and cannot access Secure World memory
Widely deployed in mobile devices for biometric processing (fingerprint, face recognition) and payment credential storage. Increasingly used for on-device AI inference on sensitive data (medical monitoring, private communication analysis).
AWS Nitro Enclaves
AWS-specific technology that carves an isolated compute environment out of a parent EC2 instance's own CPUs and memory:
- No persistent storage, no interactive access, no networking (except local socket to parent EC2)
- Cryptographic attestation of enclave identity
- Parent EC2 instance cannot access enclave memory
Designed specifically for processing sensitive data in the cloud: medical record processing, cryptographic key operations, and confidential ML inference.
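Because the vsock socket to the parent is the enclave's only I/O channel, all traffic must be serialized over it. The sketch below shows a length-prefixed framing helper plus a guarded vsock send; the enclave CID (16) and port (5000) are illustrative values — the real CID is assigned when the enclave is launched with `nitro-cli run-enclave`.

```python
import socket
import struct

# Parent <-> enclave communication over vsock, the sole I/O channel of a
# Nitro Enclave. CID 16 and port 5000 are placeholder values.

def frame(payload: bytes) -> bytes:
    """Length-prefix a message so the peer knows how many bytes to read."""
    return struct.pack(">I", len(payload)) + payload

def unframe(buf: bytes) -> bytes:
    """Strip the 4-byte big-endian length prefix and return the payload."""
    (length,) = struct.unpack(">I", buf[:4])
    return buf[4 : 4 + length]

def send_to_enclave(payload: bytes, cid: int = 16, port: int = 5000) -> None:
    # AF_VSOCK is available in Python 3.7+ on Linux; guard elsewhere.
    if not hasattr(socket, "AF_VSOCK"):
        raise OSError("vsock not supported on this platform")
    with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as s:
        s.connect((cid, port))
        s.sendall(frame(payload))

print(unframe(frame(b"encrypted-inference-request")))
```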
Performance Overhead
TEE overhead compared to unprotected execution:
- SGX memory operations: 10-40% overhead (memory encryption/decryption, cache pressure from EPC paging)
- AMD SEV: 2-10% overhead (bulk encryption more efficient than SGX page-level encryption)
- Attestation overhead: One-time cost (<1 second) per enclave session establishment
For many applications, the privacy guarantee is worth the performance cost — particularly when the alternative is not using cloud ML at all due to compliance constraints.
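The overhead ranges above translate directly into latency budgets. A back-of-envelope calculation (the 20 ms baseline is an illustrative figure, not a measurement):

```python
# Latency impact of the TEE overhead ranges quoted above.
# baseline_ms is a hypothetical unprotected inference latency.

def tee_latency_ms(baseline_ms: float, overhead_pct: float) -> float:
    return baseline_ms * (1 + overhead_pct / 100)

baseline = 20.0
for name, pct in [("SGX worst case", 40), ("SGX best case", 10),
                  ("SEV worst case", 10), ("SEV best case", 2)]:
    print(f"{name}: {tee_latency_ms(baseline, pct):.1f} ms")
```

Even the SGX worst case adds only 8 ms to this hypothetical 20 ms request, and the sub-second attestation cost amortizes across the whole session.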
Confidential Computing Consortium
The Linux Foundation's Confidential Computing Consortium standardizes TEE interfaces and attestation protocols across AMD, Intel, ARM, Nvidia, and the major cloud providers. Nvidia's Hopper H100 includes a Confidential Computing mode that extends TEE protection to GPU-accelerated inference, removing a long-standing limitation: previously, models that required GPU acceleration could not run inside a TEE.