Chemical Mechanical Planarization (CMP) Modeling in Semiconductor Manufacturing

Chemical Mechanical Planarization (CMP) Modeling in Semiconductor Manufacturing

1. Fundamentals of CMP

1.1 Definition and Principle

Chemical Mechanical Planarization (CMP) is a hybrid process combining:

- Chemical etching: Reactive slurry chemistry modifies surface properties
- Mechanical abrasion: Physical removal via abrasive particles and pad

The fundamental material removal can be expressed as:

$$
\text{Material Removal} = f(\text{Chemical Reaction}, \text{Mechanical Abrasion})
$$

1.2 Process Components

| Component | Function | Key Parameters |
|-----------|----------|----------------|
| Wafer | Substrate to be planarized | Material type, pattern density |
| Polishing Pad | Provides mechanical action | Hardness, porosity, asperity distribution |
| Slurry | Chemical + abrasive medium | pH, oxidizer, particle size/concentration |
| Carrier | Holds and rotates wafer | Down force, rotation speed |
| Platen | Rotates polishing pad | Rotation speed, temperature |

1.3 Key Process Parameters

- Down Force ($F$): Pressure applied to wafer, typically $1-7$ psi
- Platen Speed ($\omega_p$): Pad rotation, typically $20-100$ rpm
- Carrier Speed ($\omega_c$): Wafer rotation, typically $20-100$ rpm
- Slurry Flow Rate ($Q$): Typically $100-300$ mL/min
- Temperature ($T$): Typically $20-50°C$

2. Classical Physical Models

2.1 Preston Equation (Foundational Model)

The foundational model for CMP is the Preston equation (1927):

$$
\boxed{MRR = k_p \cdot P \cdot v}
$$

Where:
- $MRR$ = Material Removal Rate $[\text{nm/min}]$
- $k_p$ = Preston's coefficient $[\text{m}^2/\text{N}]$
- $P$ = Applied pressure $[\text{Pa}]$
- $v$ = Relative velocity $[\text{m/s}]$

The relative velocity between wafer and pad:

$$
v = \sqrt{(\omega_p r_p)^2 + (\omega_c r_c)^2 - 2\omega_p \omega_c r_p r_c \cos(\theta)}
$$

Where:
- $\omega_p, \omega_c$ = Angular velocities of platen and carrier
- $r_p, r_c$ = Radial positions
- $\theta$ = Phase angle

2.2 Modified Preston Models

2.2.1 Pressure-Velocity Product Modification

$$
MRR = k_p \cdot P^a \cdot v^b
$$

Where $a, b$ are empirical exponents (typically $0.5 < a, b < 1.5$)

2.2.2 Chemical Enhancement Factor

$$
MRR = k_p \cdot P \cdot v \cdot f(C, T, pH)
$$

Where $f(C, T, pH)$ represents chemical effects:
- $C$ = Oxidizer concentration
- $T$ = Temperature
- $pH$ = Slurry pH

2.2.3 Arrhenius-Modified Preston Equation

$$
MRR = k_0 \cdot \exp\left(-\frac{E_a}{RT}\right) \cdot P \cdot v
$$

Where:
- $k_0$ = Pre-exponential factor
- $E_a$ = Activation energy $[\text{J/mol}]$
- $R$ = Gas constant $= 8.314$ J/(mol$\cdot$K)
- $T$ = Temperature $[\text{K}]$

2.3 Tribocorrosion Model

For metal CMP (e.g., tungsten, copper):

$$
MRR = \frac{M}{z F \rho} \cdot \left( i_{corr} + \frac{Q_{pass}}{A \cdot t_{pass}} \right) \cdot f_{mech}
$$

Where:
- $M$ = Molar mass of metal
- $z$ = Number of electrons transferred
- $F$ = Faraday constant $= 96485$ C/mol
- $\rho$ = Density
- $i_{corr}$ = Corrosion current density
- $Q_{pass}$ = Passivation charge
- $f_{mech}$ = Mechanical factor

2.4 Contact Mode Classification

| Mode | Condition | Preston Constant | Friction Coefficient |
|------|-----------|------------------|---------------------|
| Contact | $\frac{\eta v_R}{p} < (\frac{\eta v_R}{p})_c$ | High, constant | High ($\mu > 0.3$) |
| Mixed | $\frac{\eta v_R}{p} \approx (\frac{\eta v_R}{p})_c$ | Transitional | Medium |
| Hydroplaning | $\frac{\eta v_R}{p} > (\frac{\eta v_R}{p})_c$ | Low, variable | Low ($\mu < 0.1$) |

Where:
- $\eta$ = Slurry viscosity
- $v_R$ = Relative velocity
- $p$ = Pressure

3. Pattern Density Models

3.1 Effective Pattern Density Model (Stine Model)

The local material removal rate depends on effective pattern density:

$$
\frac{dz}{dt} = -\frac{K}{\rho_{eff}(x, y)}
$$

Where:
- $z$ = Surface height
- $K$ = Blanket removal rate $= k_p \cdot P \cdot v$
- $\rho_{eff}$ = Effective pattern density

3.1.1 Effective Density Calculation

$$
\rho_{eff}(x, y) = \iint_{-\infty}^{\infty} \rho_0(x', y') \cdot W(x - x', y - y') \, dx' \, dy'
$$

Where:
- $\rho_0(x, y)$ = Local pattern density
- $W(x, y)$ = Weighting function (planarization kernel)

3.1.2 Elliptical Weighting Function

$$
W(x, y) = \frac{1}{\pi L_x L_y} \cdot \exp\left(-\frac{x^2}{L_x^2} - \frac{y^2}{L_y^2}\right)
$$

Where $L_x, L_y$ are planarization lengths in x and y directions.

3.2 Step Height Evolution Model

For oxide CMP with step height $h$:

$$
\frac{dh}{dt} = -K \cdot \left(1 - \frac{h_{contact}}{h}\right) \quad \text{for } h > h_{contact}
$$

$$
\frac{dh}{dt} = 0 \quad \text{for } h \leq h_{contact}
$$

Where $h_{contact}$ is the pad contact threshold height.

3.3 Integrated Density-Step Height Model

Combined model for oxide thickness evolution:

$$
z(x, y, t) = z_0 - K \cdot t \cdot \frac{1}{\rho_{eff}(x, y)} \cdot g(h)
$$

Where $g(h)$ is the step-height dependent function:

$$
g(h) = \begin{cases}
1 & \text{if } h > h_c \\
\frac{h}{h_c} & \text{if } h \leq h_c
\end{cases}
$$

4. Dishing and Erosion Models

4.1 Copper Dishing Model

Dishing depth $D$ for copper lines:

$$
D = K_{Cu} \cdot t_{over} \cdot f(w)
$$

Where:
- $K_{Cu}$ = Copper removal rate
- $t_{over}$ = Overpolish time
- $w$ = Line width
- $f(w)$ = Width-dependent function

Empirical relationship:

$$
D = D_0 \cdot \left(1 - \exp\left(-\frac{w}{w_c}\right)\right)
$$

Where:
- $D_0$ = Maximum dishing depth
- $w_c$ = Critical line width

4.2 Oxide Erosion Model

Erosion $E$ in dense pattern regions:

$$
E = K_{ox} \cdot t_{over} \cdot \rho_{metal}
$$

Where:
- $K_{ox}$ = Oxide removal rate
- $\rho_{metal}$ = Local metal pattern density

4.3 Combined Dishing-Erosion

Total copper thickness loss:

$$
\Delta z_{Cu} = D + E \cdot \frac{\rho_{metal}}{1 - \rho_{metal}}
$$

4.4 Pattern Density Effects

| Pattern Density | Dishing Behavior | Erosion Behavior |
|-----------------|------------------|------------------|
| Low ($< 20\%$) | Minimal | Minimal |
| Medium ($20-50\%$) | Moderate | Increasing |
| High ($> 50\%$) | Saturates | Severe |

5. Contact Mechanics Models

5.1 Pad Asperity Contact Model

Assuming Gaussian asperity height distribution:

$$
P(z) = \frac{1}{\sigma_s \sqrt{2\pi}} \exp\left(-\frac{(z - \bar{z})^2}{2\sigma_s^2}\right)
$$

Where:
- $\sigma_s$ = Standard deviation of asperity heights
- $\bar{z}$ = Mean asperity height

5.2 Real Contact Area

$$
A_r = \pi n \int_{d}^{\infty} R(z - d) \cdot P(z) \, dz
$$

Where:
- $n$ = Number of asperities per unit area
- $R$ = Asperity tip radius
- $d$ = Separation distance

For Gaussian distribution:

$$
A_r = \pi n R \sigma_s \cdot F_1\left(\frac{d}{\sigma_s}\right)
$$

Where $F_1$ is a statistical function.

5.3 Hertzian Contact

For elastic contact between abrasive particle and wafer:

$$
a = \left(\frac{3FR}{4E^*}\right)^{1/3}
$$

$$
\delta = \frac{a^2}{R} = \left(\frac{9F^2}{16RE^{*2}}\right)^{1/3}
$$

Where:
- $a$ = Contact radius
- $F$ = Normal force
- $R$ = Particle radius
- $\delta$ = Indentation depth
- $E^*$ = Effective elastic modulus

$$
\frac{1}{E^*} = \frac{1 -
u_1^2}{E_1} + \frac{1 -
u_2^2}{E_2}
$$

5.4 Material Removal by Single Abrasive

Volume removed per abrasive per pass:

$$
V = K_{wear} \cdot \frac{F_n \cdot L}{H}
$$

Where:
- $K_{wear}$ = Wear coefficient
- $F_n$ = Normal force on particle
- $L$ = Sliding distance
- $H$ = Hardness of wafer material

5.5 Multi-Scale Model Framework

``- ┌─────────────────────────────────────────────────────────────┐ │ WAFER SCALE (mm-cm) │ │ Pressure distribution, global uniformity │ ├─────────────────────────────────────────────────────────────┤ │ DIE SCALE ($\mu$m-mm) │ │ Pattern density effects, planarization │ ├─────────────────────────────────────────────────────────────┤ │ FEATURE SCALE (nm-$\mu$m) │ │ Dishing, erosion, step height evolution │ ├─────────────────────────────────────────────────────────────┤ │ PARTICLE SCALE (nm) │ │ Abrasive-surface interactions │ ├─────────────────────────────────────────────────────────────┤ │ MOLECULAR SCALE (Å) │ │ Chemical reactions, atomic removal │ └─────────────────────────────────────────────────────────────┘`

6. Machine Learning and Neural Network Models

6.1 Overview of ML Approaches

Machine learning methods for CMP modeling:

- Supervised Learning - Artificial Neural Networks (ANN) - Convolutional Neural Networks (CNN) - Support Vector Machines (SVM) - Random Forests / Gradient Boosting - Deep Learning - Deep Belief Networks (DBN) - Long Short-Term Memory (LSTM) - Generative Adversarial Networks (GAN)

- Transfer Learning - Pre-trained models adapted to new process conditions

6.2 Neural Network Architecture for CMP

6.2.1 Input Features

$$ \mathbf{x} = [P, v, t, \rho, w, s, pH, C_{ox}, T, ...]^T $$

Where: - $P$ = Pressure - $v$ = Velocity - $t$ = Polish time - $\rho$ = Pattern density - $w$ = Feature width - $s$ = Feature spacing - $pH$ = Slurry pH - $C_{ox}$ = Oxidizer concentration - $T$ = Temperature

6.2.2 Multi-Layer Perceptron (MLP)

$$ \mathbf{h}^{(1)} = \sigma(\mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)}) $$

$$ \mathbf{h}^{(2)} = \sigma(\mathbf{W}^{(2)} \mathbf{h}^{(1)} + \mathbf{b}^{(2)}) $$

$$ \hat{y} = \mathbf{W}^{(out)} \mathbf{h}^{(2)} + \mathbf{b}^{(out)} $$

Where: - $\sigma$ = Activation function (ReLU, tanh, sigmoid) - $\mathbf{W}^{(i)}$ = Weight matrices - $\mathbf{b}^{(i)}$ = Bias vectors

6.2.3 Activation Functions

| Function | Formula | Use Case | |----------|---------|----------| | ReLU | $\sigma(x) = \max(0, x)$ | Hidden layers | | Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | Output (binary) | | Tanh | $\sigma(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Hidden layers | | Softmax | $\sigma(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ | Classification |

6.3 CNN-Based CMP Modeling (CmpCNN)

6.3.1 Architecture

`Input: Layout Image (Binary) + Density Map ↓ Conv2D Layer (3×3 kernel, 32 filters) ↓ MaxPooling2D (2×2) ↓ Conv2D Layer (3×3 kernel, 64 filters) ↓ MaxPooling2D (2×2) ↓ Flatten ↓ Dense Layer (256 units) ↓ Dense Layer (128 units) ↓ Output: Post-CMP Height Map`

6.3.2 Convolution Operation

$$ (I * K)(i, j) = \sum_m \sum_n I(i+m, j+n) \cdot K(m, n) $$

Where: - $I$ = Input image (layout) - $K$ = Convolution kernel - $(i, j)$ = Output position

6.4 Loss Functions

6.4.1 Mean Squared Error (MSE)

$$ \mathcal{L}_{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $$

6.4.2 Root Mean Square Error (RMSE)

$$ RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} $$

6.4.3 Mean Absolute Percentage Error (MAPE)

$$ MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| $$

6.5 Transfer Learning Framework

For adapting models across process nodes:

$$ \mathcal{L}_{transfer} = \mathcal{L}_{target} + \lambda \cdot \mathcal{L}_{domain} $$

Where: - $\mathcal{L}_{target}$ = Target domain loss - $\mathcal{L}_{domain}$ = Domain adaptation loss - $\lambda$ = Regularization parameter

6.6 Performance Metrics

| Metric | Formula | Target | |--------|---------|--------| | $R^2$ | $1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$ | $> 0.95$ | | RMSE | $\sqrt{\frac{1}{N}\sum(y_i - \hat{y}_i)^2}$ | $< 5$ Å | | MAE | $\frac{1}{N}\sum|y_i - \hat{y}_i|$ | $< 3$ Å |

7. Slurry Chemistry Modeling

7.1 Kaufman Mechanism

Cyclic passivation-depassivation process:

$$ \text{Metal} \xrightarrow{\text{Oxidizer}} \text{Metal Oxide} \xrightarrow{\text{Abrasion}} \text{Removal} $$

7.2 Electrochemical Reactions

7.2.1 Copper CMP

Oxidation: $$ \text{Cu} \rightarrow \text{Cu}^{2+} + 2e^- $$

Passivation (with BTA): $$ \text{Cu} + \text{BTA} \rightarrow \text{Cu-BTA}_{film} $$

Complexation: $$ \text{Cu}^{2+} + n\text{L} \rightarrow [\text{CuL}_n]^{2+} $$

Where L = chelating agent (e.g., glycine, citrate)

7.2.2 Tungsten CMP

Oxidation: $$ \text{W} + 3\text{H}_2\text{O} \rightarrow \text{WO}_3 + 6\text{H}^+ + 6e^- $$

With hydrogen peroxide: $$ \text{W} + 3\text{H}_2\text{O}_2 \rightarrow \text{WO}_3 + 3\text{H}_2\text{O} $$

7.3 Pourbaix Diagram Integration

Stability regions defined by:

$$ E = E^0 - \frac{RT}{nF} \ln Q - \frac{RT}{F} \cdot m \cdot pH $$

Where: - $E$ = Electrode potential - $E^0$ = Standard potential - $Q$ = Reaction quotient - $m$ = Number of H⁺ in reaction

7.4 Abrasive Particle Effects

7.4.1 Particle Size Distribution (PSD)

Log-normal distribution:

$$ f(d) = \frac{1}{d \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln d - \mu)^2}{2\sigma^2}\right) $$

Where: - $d$ = Particle diameter - $\mu$ = Mean of $\ln(d)$ - $\sigma$ = Standard deviation of $\ln(d)$

7.4.2 Zeta Potential

$$ \zeta = \frac{4\pi \eta \mu_e}{\varepsilon} $$

Where: - $\eta$ = Viscosity - $\mu_e$ = Electrophoretic mobility - $\varepsilon$ = Dielectric constant

7.5 Slurry Components Summary

| Component | Function | Typical Materials | |-----------|----------|-------------------| | Abrasive | Mechanical removal | SiO₂, CeO₂, Al₂O₃ | | Oxidizer | Surface modification | H₂O₂, KIO₃, Fe(NO₃)₃ | | Complexant | Metal dissolution | Glycine, citric acid | | Inhibitor | Corrosion protection | BTA, BBI | | Surfactant | Particle dispersion | CTAB, SDS | | Buffer | pH control | Phosphate, citrate |

8. Chip-Scale and Full-Chip Models

8.1 Within-Wafer Non-Uniformity (WIWNU)

$$ WIWNU = \frac{\sigma_{thickness}}{\bar{thickness}} \times 100\% $$

Where: - $\sigma_{thickness}$ = Standard deviation of thickness - $\bar{thickness}$ = Mean thickness

8.2 Pressure Distribution Model

For a flexible carrier:

$$ P(r) = P_0 + \sum_{i=1}^{n} P_i \cdot J_0\left(\frac{\alpha_i r}{R}\right) $$

Where: - $P_0$ = Base pressure - $J_0$ = Bessel function of first kind - $\alpha_i$ = Bessel zeros - $R$ = Wafer radius

8.3 Multi-Zone Pressure Control

For zone $i$:

$$ MRR_i = k_p \cdot P_i \cdot v_i $$

Target uniformity achieved when:

$$ MRR_1 = MRR_2 = ... = MRR_n $$

8.4 Full-Chip Simulation Flow

`- ┌─────────────────────┐ │ Design Layout (GDS)│ └──────────┬──────────┘ ↓ ┌─────────────────────┐ │ Density Extraction │ │ ρ(x,y) for each │ │ metal/dielectric │ └──────────┬──────────┘ ↓ ┌─────────────────────┐ │ Effective Density │ │ ρ_eff = ρ * W │ └──────────┬──────────┘ ↓ ┌─────────────────────┐ │ CMP Simulation │ │ z(t) evolution │ └──────────┬──────────┘ ↓ ┌─────────────────────┐ │ Post-CMP Topography │ │ Dishing/Erosion Map │ └──────────┬──────────┘ ↓ ┌─────────────────────┐ │ Hotspot Detection │ │ Design Rule Check │ └─────────────────────┘``

9. Process Control Applications

9.1 Run-to-Run (R2R) Control

9.1.1 EWMA Controller

$$
\hat{y}_{k+1} = \lambda y_k + (1 - \lambda) \hat{y}_k
$$

Where:
- $\hat{y}_{k+1}$ = Predicted output for next run
- $y_k$ = Current measured output
- $\lambda$ = Smoothing factor $(0 < \lambda < 1)$

9.1.2 Recipe Adjustment

$$
u_{k+1} = u_k + G^{-1} (y_{target} - \hat{y}_{k+1})
$$

Where:
- $u$ = Process recipe (time, pressure, etc.)
- $G$ = Process gain matrix
- $y_{target}$ = Target output

9.2 Virtual Metrology

$$
\hat{y} = f_{VM}(\mathbf{x}_{FDC})
$$

Where:
- $\hat{y}$ = Predicted wafer quality
- $\mathbf{x}_{FDC}$ = Fault Detection and Classification sensor data

9.3 Endpoint Detection

9.3.1 Motor Current Monitoring

$$
I(t) = I_0 + \Delta I \cdot H(t - t_{endpoint})
$$

Where $H$ is the Heaviside step function.

9.3.2 Optical Endpoint

$$
R(\lambda, t) = R_{film}(\lambda, d(t))
$$

Where reflectance $R$ changes as film thickness $d$ decreases.

10. Current Challenges and Future Directions

10.1 Key Challenges

- Sub-5nm nodes: Atomic-scale precision required
- Thickness variation target: $< 5$ Å (3σ)
- Defect density target: $< 0.01$ defects/cm²

- New materials integration:
- Low-κ dielectrics ($\kappa < 2.5$)
- Cobalt interconnects
- Ruthenium barrier layers

- 3D integration:
- Through-Silicon Via (TSV) CMP
- Hybrid bonding surface preparation
- Wafer-level packaging

10.2 Future Model Development

- Physics-informed neural networks (PINNs):

$$
\mathcal{L} = \mathcal{L}_{data} + \lambda_{physics} \cdot \mathcal{L}_{physics}
$$

Where:
$$
\mathcal{L}_{physics} = \left\| \frac{\partial z}{\partial t} + \frac{K}{\rho_{eff}} \right\|^2
$$

- Digital twins for real-time process optimization
- Federated learning across multiple fabs

10.3 Industry Requirements

| Node | Thickness Uniformity | Defect Density | Dishing Limit |
|------|---------------------|----------------|---------------|
| 7nm | $< 10$ Å | $< 0.05$/cm² | $< 200$ Å |
| 5nm | $< 7$ Å | $< 0.03$/cm² | $< 150$ Å |
| 3nm | $< 5$ Å | $< 0.01$/cm² | $< 100$ Å |
| 2nm | $< 3$ Å | $< 0.005$/cm² | $< 50$ Å |

Symbol Glossary

| Symbol | Description | Units |
|--------|-------------|-------|
| $MRR$ | Material Removal Rate | nm/min |
| $k_p$ | Preston coefficient | m²/N |
| $P$ | Pressure | Pa, psi |
| $v$ | Relative velocity | m/s |
| $\rho$ | Pattern density | dimensionless |
| $\rho_{eff}$ | Effective pattern density | dimensionless |
| $L$ | Planarization length | $\mu$m |
| $D$ | Dishing depth | Å, nm |
| $E$ | Erosion depth | Å, nm |
| $w$ | Feature width | nm, $\mu$m |
| $h$ | Step height | nm |
| $t$ | Polish time | s, min |
| $T$ | Temperature | K, °C |
| $\eta$ | Viscosity | Pa$\cdot$s |
| $\mu$ | Friction coefficient | dimensionless |

Key Equations

Preston Equation
$$
MRR = k_p \cdot P \cdot v
$$

Effective Density
$$
\rho_{eff}(x,y) = \iint \rho_0(x',y') \cdot W(x-x', y-y') \, dx' dy'
$$

Material Removal (Density Model)
$$
\frac{dz}{dt} = -\frac{K}{\rho_{eff}(x,y)}
$$

Dishing Model
$$
D = D_0 \cdot \left(1 - e^{-w/w_c}\right)
$$

Erosion Model
$$
E = K_{ox} \cdot t_{over} \cdot \rho_{metal}
$$

Neural Network
$$
\hat{y} = \sigma(\mathbf{W}^{(n)} \cdot ... \cdot \sigma(\mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)}) + \mathbf{b}^{(n)})
$$

Chemical Mechanical Planarization (CMP) Modeling in Semiconductor Manufacturing

Want to learn more?