Mix-and-Match Chiplet Design is the semiconductor engineering practice of combining dies (chiplets) fabricated on different process technologies, at different foundries, or by different companies into a single heterogeneous package, so that system architects can place each functional block on its ideal process node rather than compromising on a single monolithic die. Enabled by advanced packaging technologies such as TSMC CoWoS, Intel EMIB, and the UCIe industry standard, this paradigm is arguably the most important architectural shift in semiconductor design in decades, offering a path beyond the economic and physical limits of monolithic scaling.
Why Monolithic SoCs Hit a Wall
Traditionally, chips were designed as a single die on a single process node. This created three compounding problems as nodes advanced beyond 7nm:
1. Yield cliff: At a defect density of ~1 defect/cm², a 100mm² die on 3nm yields ~37% under a simple Poisson model, while a 400mm² monolithic die yields only ~1.7%, which is economically catastrophic. Splitting the design into four 100mm² chiplets does not help by itself: blindly assembling untested dies gives 0.37⁴ × ~98% packaging yield ≈ 1.8%, no better than monolithic. The win comes from pre-screening each chiplet for known good die, so that package yield tracks the ~98% assembly step and the silicon consumed per good system drops by roughly 20x in this example (see the sketch after this list).
2. Process mismatch: High-performance compute logic benefits from 3nm FinFET/GAA transistors. SerDes and RF circuits often perform better on mature, optimized 16nm/12nm processes. Analog and power-management blocks may call for 28nm-class or BCD (Bipolar-CMOS-DMOS) processes. No single node is optimal for all of these functions.
3. Cost: The NRE (non-recurring engineering) cost of a single 3nm SoC mask set runs to $40M+. A chiplet-based design can reuse existing, already-amortized dies from prior generations, paying leading-edge NRE only for the blocks that need it.
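The yield figures above come from a simple Poisson defect model; the short Python sketch below reproduces them, treating the 1 defect/cm² density and 98% assembly yield as assumed round numbers rather than foundry data.

```python
import math

def die_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson yield model: Y = exp(-A * D), with die area A in cm^2."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)

D = 1.0                 # assumed defect density (defects per cm^2)
ASSEMBLY_YIELD = 0.98   # assumed packaging/assembly yield

print(f"100 mm^2 die yield: {die_yield(100, D):.1%}")   # ~36.8%
print(f"400 mm^2 die yield: {die_yield(400, D):.1%}")   # ~1.8%

# Blindly assembling four untested 100 mm^2 dies is no better than monolithic:
blind = die_yield(100, D) ** 4 * ASSEMBLY_YIELD
print(f"4-chiplet package, unscreened dies: {blind:.1%}")   # ~1.8%

# With known-good-die (KGD) screening, compare silicon consumed per good system:
mono_si    = 400 / die_yield(400, D)
chiplet_si = 4 * (100 / die_yield(100, D)) / ASSEMBLY_YIELD
print(f"Silicon per good system: monolithic ~{mono_si:,.0f} mm^2 "
      f"vs chiplets ~{chiplet_si:,.0f} mm^2 (~{mono_si / chiplet_si:.0f}x less)")
```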
Mix-and-Match Enables Best-in-Class Integration
A typical AI accelerator example (similar to AMD MI300X/MI300A):
| Chiplet Type | Process Node | Foundry | Function |
|-------------|-------------|---------|----------|
| Compute die | TSMC N5/N4P | TSMC | GPU/GPGPU compute cores, tensor units |
| I/O die | TSMC N6 | TSMC | PCIe 5.0, network interfaces, memory controllers |
| HBM3 stack | DRAM process | SK Hynix/Samsung | High-bandwidth memory |
| Silicon interposer | Mature node (passive) | TSMC | CoWoS 2.5D interconnect backplane |
Each component sits on its ideal process. The compute die uses the newest, most expensive node. The I/O die uses a mature node whose SerDes and analog circuitry would gain little, and cost more, if migrated to 3nm. HBM uses a specialized DRAM process entirely different from logic.
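For illustration only, the package above can be described as a small "bill of dies"; the dataclass, names, and entries below are a hypothetical sketch, not any vendor's actual design database.

```python
from dataclasses import dataclass

@dataclass
class Chiplet:
    name: str        # hypothetical identifiers, for illustration only
    process: str
    foundry: str
    function: str

package = [
    Chiplet("compute-die", "TSMC N5/N4P", "TSMC",             "GPU compute cores, tensor units"),
    Chiplet("io-die",      "TSMC N6",     "TSMC",             "PCIe 5.0, NICs, memory controllers"),
    Chiplet("hbm3-stack",  "DRAM",        "SK Hynix/Samsung", "High-bandwidth memory"),
    Chiplet("interposer",  "Mature node", "TSMC",             "CoWoS 2.5D interconnect backplane"),
]

# One package, several process families and suppliers: the essence of mix-and-match.
print("Process nodes in package:", sorted({c.process for c in package}))
print("Suppliers in package:    ", sorted({c.foundry for c in package}))
```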
UCIe: The Industry's USB Standard for Chiplets
Universal Chiplet Interconnect Express (UCIe) is an open industry standard launched in 2022 by a consortium including AMD, Intel, TSMC, Samsung, Arm, Qualcomm, Google Cloud, Meta, and Microsoft, with NVIDIA and many others joining later that year:
- Physical layer: Defines bump pitch, signal integrity, and power delivery for die-to-die connections
- Protocol layer: Supports PCIe and CXL protocols over the physical interface
- Variants: Standard package (organic substrate, 100-130μm bump pitch) and Advanced package (silicon bridge or interposer, 25-55μm bump pitch), both scaling to 32 GT/s per lane, with the Advanced package delivering several times the bandwidth per millimeter of die edge
- Purpose: Enables chiplets from different vendors to interoperate, much as USB replaced a tangle of proprietary connector formats
Without UCIe, each company relies on proprietary die-to-die interconnects (AMD's Infinity Fabric, Intel's AIB and the links carried over EMIB/Foveros) that prevent multi-vendor chiplet mixing.
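As a rough feel for the numbers, the sketch below computes per-module bandwidth from lane count and per-lane data rate; the 16-lane and 64-lane module widths and the 32 GT/s rate come from the public UCIe 1.x spec, while the specific configurations shown are illustrative rather than any product's.

```python
def module_bandwidth_gb_s(lanes: int, gt_per_s: float) -> float:
    """Aggregate raw bandwidth of one die-to-die module in GB/s (1 bit per lane per transfer)."""
    return lanes * gt_per_s / 8.0

# Per UCIe 1.x: a Standard-package module has 16 data lanes, an Advanced-package
# module has 64, and per-lane rates scale up to 32 GT/s.
print(f"Standard, 16 lanes @ 32 GT/s: {module_bandwidth_gb_s(16, 32):.0f} GB/s per module")   # 64 GB/s
print(f"Advanced, 64 lanes @ 32 GT/s: {module_bandwidth_gb_s(64, 32):.0f} GB/s per module")   # 256 GB/s
```

Real designs scale this further by instantiating multiple modules along the die edge.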
Real-World Mix-and-Match Examples
AMD MI300X (2023):
- 12 chiplets in the active stack: 8 GPU compute dies (XCDs, TSMC N5) hybrid-bonded on top of 4 I/O base dies (IODs, TSMC N6)
- 192GB of HBM3 from SK Hynix, co-packaged alongside the compute complex
- 3D stacking via TSMC SoIC hybrid bonding, the same technology family behind AMD's 3D V-Cache
- Result: 1,307 TFLOPS dense FP8, 5.3 TB/s memory bandwidth
Intel Meteor Lake (2023):
- Compute tile: Intel 4 (Intel's first EUV process)
- GPU tile: TSMC N5
- SoC tile: TSMC N6
- I/O tile: TSMC N6
- Base tile: Intel 16
- All tiles stacked on the base tile via Intel Foveros 3D packaging
Apple M2 Ultra:
- Two M2 Max dies connected via 2.5TB/s die-to-die interconnect (Apple UltraFusion)
- Software transparent: Applications see it as a single chip with up to 192GB of unified memory
NVIDIA H100 SXM:
- H100 GPU die (TSMC 4N, a custom N5-class node): compute
- HBM3 stacks (DRAM process): co-packaged with the GPU on a TSMC CoWoS interposer
- GPU-to-GPU NVLink fabric: provided by separate NVSwitch chips on the HGX baseboard, outside the package
Design Challenges
Mix-and-match introduces engineering challenges not present in monolithic design:
Signal Integrity: Die-to-die connections cross process boundaries with different gate oxide thickness, metal resistivity, and timing models. Serialization/deserialization (SerDes) or parallel interfaces at the chiplet boundary require careful impedance matching.
Thermal Co-management: Different chiplets have different power densities and thermal resistance. A compute die at 400W next to an HBM stack (temperature-sensitive) requires precise thermal co-simulation. TIMs (thermal interface materials) must bridge non-uniform surfaces.
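As a zero-order illustration of why co-simulation is needed, the sketch below estimates junction temperature from per-die power and a junction-to-case thermal resistance; the power, resistance, and case-temperature values are invented for illustration, and the model deliberately ignores the lateral heat coupling between neighboring dies that real co-simulation must capture.

```python
def junction_temp_c(power_w: float, theta_jc_c_per_w: float, case_temp_c: float) -> float:
    """Zero-order estimate: Tj = Tcase + P * theta_jc (junction-to-case thermal resistance)."""
    return case_temp_c + power_w * theta_jc_c_per_w

# Assumed, illustrative numbers, not vendor specifications.
compute_tj = junction_temp_c(power_w=400.0, theta_jc_c_per_w=0.08, case_temp_c=55.0)
hbm_tj     = junction_temp_c(power_w=30.0,  theta_jc_c_per_w=0.25, case_temp_c=55.0)

print(f"Compute die Tj ~ {compute_tj:.0f} C")   # ~87 C
print(f"HBM stack  Tj ~ {hbm_tj:.1f} C")        # ~62.5 C in isolation

# This ignores lateral coupling: heat from the 400 W die raises the effective case
# temperature under the neighboring HBM, which is exactly what co-simulation captures.
```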
Test and Assembly Yield: Known Good Die (KGD) selection is critical: when multiple dies are assembled into one package, any single defective die can scrap the entire assembly. Pre-screening each chiplet before assembly is therefore required, and complex test flows must cover both individual chiplet functionality and inter-chiplet communication.
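Here is a minimal sketch of how imperfect wafer-sort coverage feeds into multi-die package yield; the die yield, fault coverage, and assembly yield below are assumed figures, not measured data.

```python
def escape_rate(die_yield: float, fault_coverage: float) -> float:
    """Fraction of dies passed to assembly that are actually defective.

    Defective dies pass wafer sort with probability (1 - fault_coverage); the
    shipped population is the good dies plus those test escapes."""
    bad = 1.0 - die_yield
    escapes = bad * (1.0 - fault_coverage)
    return escapes / (die_yield + escapes)

def package_yield(n_dies: int, escape: float, assembly_yield: float = 0.98) -> float:
    """Probability an n-die package works after assembly (assembly_yield is assumed)."""
    return (1.0 - escape) ** n_dies * assembly_yield

esc = escape_rate(die_yield=0.37, fault_coverage=0.99)          # assumed 99% wafer-sort coverage
print(f"Escape rate among 'known good' dies: {esc:.2%}")        # ~1.7%
print(f"4-die package yield with KGD:        {package_yield(4, esc):.1%}")  # ~91.6%
print(f"4-die package yield, perfect test:   {package_yield(4, 0.0):.1%}")  # 98.0%
```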
Supply Chain Coordination: Multi-foundry supply means coordinating yield, bin splits, and inventory from TSMC, Samsung, GlobalFoundries, and/or memory manufacturers simultaneously. Lead times compound.
Timing Convergence: Signals crossing die boundaries add latency (on the order of a few nanoseconds per crossing) and require multi-die timing signoff tools. EDA flows from Synopsys and Cadence have evolved to support 2.5D/3D design.
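As a quick worked example, crossing latency translates directly into extra pipeline stages at the fabric clock; the 3 ns crossing latency below is an assumed round number.

```python
import math

def crossing_cycles(latency_ns: float, clock_ghz: float) -> int:
    """Pipeline stages consumed by one die-to-die crossing at a given fabric clock."""
    return math.ceil(latency_ns * clock_ghz)  # cycles = latency x frequency

for f_ghz in (1.0, 2.0, 3.0):
    print(f"{f_ghz:.1f} GHz fabric: {crossing_cycles(3.0, f_ghz)} cycles per crossing (assumed 3 ns)")
# 1.0 GHz -> 3 cycles, 2.0 GHz -> 6 cycles, 3.0 GHz -> 9 cycles
```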
The Economic Case
For a hypothetical 600mm² system:
- Monolithic 3nm: Yield well below 1% under the Poisson model used earlier (~0.25% at 1 defect/cm²; clustered-defect models are somewhat kinder), wafer cost ~$20K → cost per good die in the tens of thousands of dollars
- Six 100mm² chiplets at 3nm: ~37% yield each; with known-good-die screening, package yield is set by the ~98% assembly step rather than the product of die yields → roughly $2,000-3,000 per assembled module including interposer, assembly, and test (a rough cost sketch follows this list)
- Reuse I/O and SerDes chiplets from N7 (already amortized): Further reduces NRE by 30-50%
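Below is a rough Python sketch of the comparison above, reusing the Poisson yield model from the earlier section; wafer price, usable wafer area, and defect density are assumed round numbers, and real clustered-defect models are somewhat kinder to large dies.

```python
import math

WAFER_COST = 20_000.0   # USD per leading-edge 300 mm wafer (assumed)
WAFER_AREA = 60_000.0   # mm^2 of usable area after edge loss (assumed)
DEFECT_D   = 1.0        # defects per cm^2 (assumed)

def die_yield(area_mm2: float) -> float:
    """Poisson model, as in the yield section above."""
    return math.exp(-(area_mm2 / 100.0) * DEFECT_D)

def cost_per_good_die(area_mm2: float) -> float:
    candidates = WAFER_AREA / area_mm2      # ignores scribe lines and edge geometry
    return WAFER_COST / (candidates * die_yield(area_mm2))

mono    = cost_per_good_die(600.0)
chiplet = 6 * cost_per_good_die(100.0)      # six known-good 100 mm^2 dies

print(f"Monolithic 600 mm^2 good die: ~${mono:,.0f}")     # tens of thousands of dollars
print(f"Six good 100 mm^2 chiplets:   ~${chiplet:,.0f}")  # a few hundred dollars of silicon
# Interposer, HBM, assembly, and test make up most of the rest of the module cost.
```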
Mix-and-match chiplet architecture is no longer a cutting-edge option — it is the standard design methodology for AI, HPC, and data center chips at NVIDIA, AMD, Intel, Apple, Qualcomm, and every major hyperscaler building custom silicon.