TDP (Thermal Design Power) is the maximum amount of heat a processor is designed to dissipate under sustained workload, measured in watts. This specification determines cooling requirements and power delivery, directly impacting system design for AI workloads, where GPU TDPs range from roughly 72W to 750W.
What Is TDP?
- Definition: Maximum heat output under sustained load, in watts.
- Purpose: Specifies cooling system requirements.
- Measurement: Sustained power, not peak.
- Relation: Roughly equals power consumption under load.
Why TDP Matters for AI
- Cooling Design: Higher TDP needs larger/better coolers.
- Power Delivery: PSU must supply TDP + headroom.
- Data Center: Determines rack density and cooling capacity.
- Operating Costs: Higher TDP = higher electricity bills.
- Thermal Throttling: Inadequate cooling reduces performance.
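The operating-cost point can be made concrete with a rough estimate. This sketch assumes an illustrative electricity rate, utilization, and PUE (none of these figures come from the text above):

```python
# Rough annual electricity cost for a GPU running near TDP.
# Assumptions (illustrative only): $0.12/kWh, 80% average utilization,
# and a PUE of 1.4 to account for data-center cooling overhead.

def annual_energy_cost(tdp_watts, rate_per_kwh=0.12, utilization=0.8, pue=1.4):
    hours_per_year = 24 * 365
    kwh = tdp_watts / 1000 * hours_per_year * utilization * pue
    return kwh * rate_per_kwh

for gpu, tdp in [("L4", 72), ("RTX 4090", 450), ("H100 SXM", 700)]:
    print(f"{gpu:>10}: ${annual_energy_cost(tdp):,.0f}/year")
```

Under these assumptions, a 700W H100 costs roughly 10× as much to power as a 72W L4 — the TDP gap flows straight through to the electricity bill.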
GPU TDP Comparison
AI/ML GPUs:
```
GPU | TDP (W) | Memory | Use Case
-----------------|---------|-----------|------------------
NVIDIA H100 SXM | 700 | 80GB HBM3 | Training/Inference
NVIDIA H100 PCIe | 350 | 80GB HBM3 | Inference, lower power
NVIDIA A100 SXM | 400 | 80GB HBM2e| Training/Inference
NVIDIA A100 PCIe | 300 | 80GB HBM2e| Inference
NVIDIA L40S | 350 | 48GB GDDR6| Inference
NVIDIA L4 | 72 | 24GB GDDR6| Edge inference
AMD MI300X       | 750     | 192GB HBM3| Training
```
Consumer GPUs:
```
GPU | TDP (W) | Memory | AI Use
-----------------|---------|-----------|------------------
RTX 4090 | 450 | 24GB | Dev, small training
RTX 4080 Super | 320 | 16GB | Development
RTX 4070 | 200 | 12GB | Inference
RTX 3090         | 350     | 24GB      | Budget training
```
TDP vs. Power Consumption
Understanding the Relationship:
```
TDP: Design thermal envelope (sustained)
Peak Power: Can exceed TDP briefly
Idle Power: Much lower than TDP
Actual Power: Depends on workload
Example (RTX 4090):
TDP: 450W
Peak: ~600W (transient)
Typical gaming: 300-400W
Idle: 20-30W
LLM inference: 250-350W
```
Power Modes:
```
Mode | Power | Performance
---------------|----------|-------------
Full TDP | 100% | 100%
Power limited | 70-80% | 95%
Eco mode | 50-60% | 80%
Undervolted    | 80-90%   | 100%
```
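The table's figures imply that reduced power modes buy efficiency at a small performance cost. A quick calculation using the table's midpoint values makes the trade-off explicit:

```python
# Relative performance-per-watt for the power modes in the table above,
# normalized to full TDP. Values are midpoints of the table's ranges.

modes = {
    "Full TDP":      (1.00, 1.00),   # (power fraction, performance fraction)
    "Power limited": (0.75, 0.95),
    "Eco mode":      (0.55, 0.80),
    "Undervolted":   (0.85, 1.00),
}

for name, (power, perf) in modes.items():
    print(f"{name:>13}: {perf / power:.2f}x perf/W vs. full TDP")
```

Power limiting to ~75% trades about 5% performance for roughly a 1.27× gain in performance per watt, which is why it is a common default in power-constrained deployments.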
Cooling Requirements
Cooling Solutions by TDP:
```
TDP Range | Cooling Type | Noise
-------------|----------------------|-------
<100W | Single fan | Low
100-200W | Dual fan | Medium
200-350W | Triple fan/AIO | Medium-High
350-500W | Custom loop/blower | High
500W+        | Liquid (rack/water)  | Varies
```
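The tiers above can be encoded as a simple lookup. This is a sketch of the table's breakpoints, not vendor guidance — real cooling choices also depend on case airflow and ambient temperature:

```python
# Pick a cooling tier for a given TDP, following the table above.
# Breakpoints mirror the table; actual products vary by vendor.

COOLING_TIERS = [
    (100, "Single fan"),
    (200, "Dual fan"),
    (350, "Triple fan/AIO"),
    (500, "Custom loop/blower"),
]

def cooling_for(tdp_watts):
    for limit, tier in COOLING_TIERS:
        if tdp_watts < limit:
            return tier
    return "Liquid (rack/water)"

print(cooling_for(72))    # L4
print(cooling_for(450))   # RTX 4090
print(cooling_for(700))   # H100 SXM
```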
Data Center Cooling:
```
Cooling Type | Capacity | Density
-----------------|-------------|-------------------
Air cooling | <30kW/rack | Standard
Rear-door heat | 30-50kW/rack| Medium density
Direct liquid | 50-100kW/rack| High density H100
Immersion        | 100kW+/rack | Extreme density
```
Power Budget Planning
System Power Calculation:
```
Component | Power (W)
-----------------|----------
GPU (H100 SXM) | 700
CPU | 200-350
Memory | 50-100
Storage | 25-50
Networking | 25-50
Misc | 50-100
System total     | ~1050-1350W
PSU requirement: 1.5× total = ~1600-2000W
```
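The budget above can be sketched in a few lines, taking midpoints of the component ranges (the figures are typical planning values from the table, not measurements):

```python
# System power budget for a single-GPU H100 SXM node, following the
# table above. Component values are midpoints of the quoted ranges.

components = {
    "GPU (H100 SXM)": 700,
    "CPU":            275,   # midpoint of 200-350W
    "Memory":          75,
    "Storage":         40,
    "Networking":      40,
    "Misc":            75,
}

total = sum(components.values())
psu = total * 1.5            # 1.5x headroom for transients and PSU efficiency
print(f"System total: {total}W, PSU target: {psu:.0f}W")
```

The 1.5× multiplier covers transient power spikes (which can exceed TDP briefly, as noted above) and keeps the PSU in its efficient mid-load range.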
Rack Planning:
```
8× H100 SXM system: ~10kW
Per-rack capacity: 30-100kW depending on cooling
H100 systems per rack: 3-10
Data center power: MW to hundreds of MW
```
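The systems-per-rack range follows directly from dividing rack power budget by system draw. A minimal sketch, using the ~10kW per-system figure quoted above:

```python
# Systems per rack = rack power budget // per-system draw.
# Uses the ~10kW figure for an 8x H100 SXM system quoted above.

SYSTEM_KW = 10.0

for cooling, rack_kw in [("Air", 30),
                         ("Rear-door heat exchanger", 50),
                         ("Direct liquid", 100)]:
    print(f"{cooling:>24}: {int(rack_kw // SYSTEM_KW)} systems/rack")
```

This reproduces the 3-10 range: air cooling caps a rack at about 3 systems, while direct liquid cooling supports around 10.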
Efficiency Considerations
Performance per Watt:
```
GPU | TDP | FP16 TFLOPS | TFLOPS/W
------------|------|-------------|----------
H100 SXM | 700W | 1979 | 2.83
H100 PCIe | 350W | 1513 | 4.32
A100 SXM | 400W | 312 | 0.78
L4          | 72W  | 121         | 1.68
```
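The TFLOPS/W column is simply throughput divided by TDP, and can be reproduced from the table's own figures:

```python
# Reproduce the TFLOPS/W column from the table above.
gpus = {
    "H100 SXM":  (700, 1979),   # (TDP in watts, FP16 TFLOPS as quoted)
    "H100 PCIe": (350, 1513),
    "A100 SXM":  (400, 312),
    "L4":        (72, 121),
}

for name, (tdp, tflops) in gpus.items():
    print(f"{name:>9}: {tflops / tdp:.2f} TFLOPS/W")
```

Note the H100 PCIe's higher ratio: halving TDP costs only about a quarter of the throughput, which is the same diminishing-returns pattern the power-mode table showed.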
Optimization:
- Power limiting (90% power → 98% perf typical)
- Undervolting for efficiency
- Workload-appropriate GPU selection
- Batch scheduling to maximize utilization
TDP specification is fundamental to AI infrastructure planning — understanding thermal requirements determines cooling design, power delivery, operating costs, and ultimately the density and efficiency of AI compute deployments.