TDP (Thermal Design Power) is the maximum amount of heat a processor is designed to dissipate under sustained workload, measured in watts. This specification determines cooling requirements and power delivery, directly impacting system design for AI workloads, where GPU TDPs range from roughly 72W to 750W.
What Is TDP?
- Definition: Maximum heat output under sustained load, in watts.
- Purpose: Specifies cooling system requirements.
- Measurement: Sustained power, not peak.
- Relation: Roughly equals power consumption under load.
Why TDP Matters for AI
- Cooling Design: Higher TDP needs larger/better coolers.
- Power Delivery: PSU must supply TDP + headroom.
- Data Center: Determines rack density and cooling capacity.
- Operating Costs: Higher TDP = higher electricity bills.
- Thermal Throttling: Inadequate cooling reduces performance.
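The operating-cost point can be made concrete with a rough estimate. This sketch assumes an illustrative electricity rate, utilization, and PUE (none of these figures come from the text above):

```python
# Rough annual electricity cost for a GPU running near TDP.
# Assumptions (illustrative only): $0.12/kWh, 80% average utilization,
# and a PUE of 1.4 to account for data-center cooling overhead.

def annual_energy_cost(tdp_watts, rate_per_kwh=0.12, utilization=0.8, pue=1.4):
    hours_per_year = 24 * 365
    kwh = tdp_watts / 1000 * hours_per_year * utilization * pue
    return kwh * rate_per_kwh

for gpu, tdp in [("L4", 72), ("RTX 4090", 450), ("H100 SXM", 700)]:
    print(f"{gpu:>10}: ${annual_energy_cost(tdp):,.0f}/year")
```

Under these assumptions, a 700W H100 costs roughly 10× as much to power as a 72W L4 — the TDP gap flows straight through to the electricity bill.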
GPU TDP Comparison
AI/ML GPUs:
```
GPU | TDP (W) | Memory | Use Case
-----------------|---------|-----------|------------------
NVIDIA H100 SXM | 700 | 80GB HBM3 | Training/Inference
NVIDIA H100 PCIe | 350 | 80GB HBM3 | Inference, lower power
NVIDIA A100 SXM | 400 | 80GB HBM2e| Training/Inference
NVIDIA A100 PCIe | 300 | 80GB HBM2e| Inference
NVIDIA L40S | 350 | 48GB GDDR6| Inference
NVIDIA L4 | 72 | 24GB GDDR6| Edge inference
AMD MI300X       | 750     | 192GB HBM3| Training
```
Consumer GPUs:
```
GPU | TDP (W) | Memory | AI Use
-----------------|---------|-----------|------------------
RTX 4090 | 450 | 24GB | Dev, small training
RTX 4080 Super | 320 | 16GB | Development
RTX 4070 | 200 | 12GB | Inference
RTX 3090         | 350     | 24GB      | Budget training
```
TDP vs. Power Consumption
Understanding the Relationship:
```
TDP: Design thermal envelope (sustained)
Peak Power: Can exceed TDP briefly
Idle Power: Much lower than TDP
Actual Power: Depends on workload
Example (RTX 4090):
TDP: 450W
Peak: ~600W (transient)
Typical gaming: 300-400W
Idle: 20-30W
LLM inference: 250-350W
```
Power Modes:
```
Mode | Power | Performance
---------------|----------|-------------
Full TDP | 100% | 100%
Power limited | 70-80% | 95%
Eco mode | 50-60% | 80%
Undervolted    | 80-90%   | 100%
```
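The table's figures imply that reduced power modes buy efficiency at a small performance cost. A quick calculation using the table's midpoint values makes the trade-off explicit:

```python
# Relative performance-per-watt for the power modes in the table above,
# normalized to full TDP. Values are midpoints of the table's ranges.

modes = {
    "Full TDP":      (1.00, 1.00),   # (power fraction, performance fraction)
    "Power limited": (0.75, 0.95),
    "Eco mode":      (0.55, 0.80),
    "Undervolted":   (0.85, 1.00),
}

for name, (power, perf) in modes.items():
    print(f"{name:>13}: {perf / power:.2f}x perf/W vs. full TDP")
```

Power limiting to ~75% trades about 5% performance for roughly a 1.27× gain in performance per watt, which is why it is a common default in power-constrained deployments.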
Cooling Requirements
Cooling Solutions by TDP:
```
TDP Range | Cooling Type | Noise
-------------|----------------------|-------
<100W | Single fan | Low
100-200W | Dual fan | Medium
200-350W | Triple fan/AIO | Medium-High
350-500W | Custom loop/blower | High
500W+        | Liquid (rack/water)  | Varies
```
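The tiers above can be encoded as a simple lookup. This is a sketch of the table's breakpoints, not vendor guidance — real cooling choices also depend on case airflow and ambient temperature:

```python
# Pick a cooling tier for a given TDP, following the table above.
# Breakpoints mirror the table; actual products vary by vendor.

COOLING_TIERS = [
    (100, "Single fan"),
    (200, "Dual fan"),
    (350, "Triple fan/AIO"),
    (500, "Custom loop/blower"),
]

def cooling_for(tdp_watts):
    for limit, tier in COOLING_TIERS:
        if tdp_watts < limit:
            return tier
    return "Liquid (rack/water)"

print(cooling_for(72))    # L4
print(cooling_for(450))   # RTX 4090
print(cooling_for(700))   # H100 SXM
```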
Data Center Cooling:
```
Cooling Type | Capacity | Density
-----------------|-------------|-------------------
Air cooling | <30kW/rack | Standard
Rear-door heat | 30-50kW/rack| Medium density
Direct liquid | 50-100kW/rack| High density H100
Immersion        | 100kW+/rack | Extreme density
```
Power Budget Planning
System Power Calculation:
```
Component | Power (W)
-----------------|----------
GPU (H100 SXM) | 700
CPU | 200-350
Memory | 50-100
Storage | 25-50
Networking | 25-50
Misc | 50-100
System total     | ~1050-1350W
PSU requirement: 1.5× total = ~1600-2000W
```
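The budget above can be sketched in a few lines, taking midpoints of the component ranges (the figures are typical planning values from the table, not measurements):

```python
# System power budget for a single-GPU H100 SXM node, following the
# table above. Component values are midpoints of the quoted ranges.

components = {
    "GPU (H100 SXM)": 700,
    "CPU":            275,   # midpoint of 200-350W
    "Memory":          75,
    "Storage":         40,
    "Networking":      40,
    "Misc":            75,
}

total = sum(components.values())
psu = total * 1.5            # 1.5x headroom for transients and PSU efficiency
print(f"System total: {total}W, PSU target: {psu:.0f}W")
```

The 1.5× multiplier covers transient power spikes (which can exceed TDP briefly, as noted above) and keeps the PSU in its efficient mid-load range.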
Rack Planning:
```
8× H100 SXM system: ~10kW
Per-rack capacity: 30-100kW depending on cooling
H100 systems per rack: 3-10
Data center power: MW to hundreds of MW
```
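The systems-per-rack range follows directly from dividing rack power budget by system draw. A minimal sketch, using the ~10kW per-system figure quoted above:

```python
# Systems per rack = rack power budget // per-system draw.
# Uses the ~10kW figure for an 8x H100 SXM system quoted above.

SYSTEM_KW = 10.0

for cooling, rack_kw in [("Air", 30),
                         ("Rear-door heat exchanger", 50),
                         ("Direct liquid", 100)]:
    print(f"{cooling:>24}: {int(rack_kw // SYSTEM_KW)} systems/rack")
```

This reproduces the 3-10 range: air cooling caps a rack at about 3 systems, while direct liquid cooling supports around 10.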
Efficiency Considerations
Performance per Watt:
```
GPU | TDP | FP16 TFLOPS | TFLOPS/W
------------|------|-------------|----------
H100 SXM | 700W | 1979 | 2.83
H100 PCIe | 350W | 1513 | 4.32
A100 SXM | 400W | 312 | 0.78
L4          | 72W  | 121         | 1.68
```
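The TFLOPS/W column is simply throughput divided by TDP, and can be reproduced from the table's own figures:

```python
# Reproduce the TFLOPS/W column from the table above.
gpus = {
    "H100 SXM":  (700, 1979),   # (TDP in watts, FP16 TFLOPS as quoted)
    "H100 PCIe": (350, 1513),
    "A100 SXM":  (400, 312),
    "L4":        (72, 121),
}

for name, (tdp, tflops) in gpus.items():
    print(f"{name:>9}: {tflops / tdp:.2f} TFLOPS/W")
```

Note the H100 PCIe's higher ratio: halving TDP costs only about a quarter of the throughput, which is the same diminishing-returns pattern the power-mode table showed.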
Optimization:
- Power limiting (90% power → 98% perf typical)
- Undervolting for efficiency
- Workload-appropriate GPU selection
- Batch scheduling to maximize utilization
TDP specification is fundamental to AI infrastructure planning — understanding thermal requirements determines cooling design, power delivery, operating costs, and ultimately the density and efficiency of AI compute deployments.