TDP (Thermal Design Power)

Keywords: power budget, tdp, thermal, cooling, watt, heat, efficiency

TDP (Thermal Design Power) is the maximum amount of heat a processor generates under sustained workload — measured in watts, this specification determines cooling requirements and power delivery, directly impacting system design for AI workloads where GPU TDP ranges from 75W to 700W.

What Is TDP?

- Definition: Maximum heat output under sustained load, in watts.
- Purpose: Specifies cooling system requirements.
- Measurement: Sustained power, not peak.
- Relation: Roughly equals power consumption under load.

Why TDP Matters for AI

- Cooling Design: Higher TDP needs larger/better coolers.
- Power Delivery: PSU must supply TDP + headroom.
- Data Center: Determines rack density and cooling capacity.
- Operating Costs: Higher TDP = higher electricity bills.
- Thermal Throttling: Inadequate cooling reduces performance.

GPU TDP Comparison

AI/ML GPUs:
``
GPU | TDP (W) | Memory | Use Case
-----------------|---------|-----------|------------------
NVIDIA H100 SXM | 700 | 80GB HBM3 | Training/Inference
NVIDIA H100 PCIe | 350 | 80GB HBM3 | Inference, lower power
NVIDIA A100 SXM | 400 | 80GB HBM2e| Training/Inference
NVIDIA A100 PCIe | 300 | 80GB HBM2e| Inference
NVIDIA L40S | 350 | 48GB GDDR6| Inference
NVIDIA L4 | 72 | 24GB GDDR6| Edge inference
AMD MI300X | 750 | 192GB HBM3| Training
`

Consumer GPUs:
`
GPU | TDP (W) | Memory | AI Use
-----------------|---------|-----------|------------------
RTX 4090 | 450 | 24GB | Dev, small training
RTX 4080 Super | 320 | 16GB | Development
RTX 4070 | 200 | 12GB | Inference
RTX 3090 | 350 | 24GB | Budget training
`

TDP vs. Power Consumption

Understanding the Relationship:
`
TDP: Design thermal envelope (sustained)
Peak Power: Can exceed TDP briefly
Idle Power: Much lower than TDP
Actual Power: Depends on workload

Example (RTX 4090):
TDP: 450W
Peak: ~600W (transient)
Typical gaming: 300-400W
Idle: 20-30W
LLM inference: 250-350W
`

Power Modes:
`
Mode | Power | Performance
---------------|----------|-------------
Full TDP | 100% | 100%
Power limited | 70-80% | 95%
Eco mode | 50-60% | 80%
Undervolted | 80-90% | 100%
`

Cooling Requirements

Cooling Solutions by TDP:
`
TDP Range | Cooling Type | Noise
-------------|----------------------|-------
<100W | Single fan | Low
100-200W | Dual fan | Medium
200-350W | Triple fan/AIO | Medium-High
350-500W | Custom loop/blower | High
500W+ | Liquid (rack/water) | Varies
`

Data Center Cooling:
`
Cooling Type | Capacity | Density
-----------------|-------------|-------------------
Air cooling | <30kW/rack | Standard
Rear-door heat | 30-50kW/rack| Medium density
Direct liquid | 50-100kW/rack| High density H100
Immersion | 100kW+/rack | Extreme density
`

Power Budget Planning

System Power Calculation:
`
Component | Power (W)
-----------------|----------
GPU (H100 SXM) | 700
CPU | 200-350
Memory | 50-100
Storage | 25-50
Networking | 25-50
Misc | 50-100
System total | ~1100-1350W

PSU requirement: 1.5× total = 1650-2000W
`

Rack Planning:
`
8× H100 SXM system: ~10kW
Per-rack capacity: 30-100kW depending on cooling
H100 systems per rack: 3-10
Data center power: MW to hundreds of MW
`

Efficiency Considerations

Performance per Watt:
`
GPU | TDP | FP16 TFLOPS | TFLOPS/W
------------|------|-------------|----------
H100 SXM | 700W | 1979 | 2.83
H100 PCIe | 350W | 1513 | 4.32
A100 SXM | 400W | 312 | 0.78
L4 | 72W | 121 | 1.68
`

Optimization:
`
- Power limiting (90% power → 98% perf typical)
- Undervolting for efficiency
- Workload-appropriate GPU selection
- Batch scheduling to maximize utilization
``

TDP specification is fundamental to AI infrastructure planning — understanding thermal requirements determines cooling design, power delivery, operating costs, and ultimately the density and efficiency of AI compute deployments.

Want to learn more?

Search 13,225+ semiconductor and AI topics or chat with our AI assistant.

Search Topics Chat with CFSGPT