Chemical Mechanical Planarization (CMP) Modeling in Semiconductor Manufacturing
1. Fundamentals of CMP
1.1 Definition and Principle
Chemical Mechanical Planarization (CMP) is a hybrid process combining:
- Chemical etching: Reactive slurry chemistry modifies surface properties
- Mechanical abrasion: Physical removal via abrasive particles and pad
The fundamental material removal can be expressed as:
$$
\text{Material Removal} = f(\text{Chemical Reaction}, \text{Mechanical Abrasion})
$$
1.2 Process Components
| Component | Function | Key Parameters |
|-----------|----------|----------------|
| Wafer | Substrate to be planarized | Material type, pattern density |
| Polishing Pad | Provides mechanical action | Hardness, porosity, asperity distribution |
| Slurry | Chemical + abrasive medium | pH, oxidizer, particle size/concentration |
| Carrier | Holds and rotates wafer | Down force, rotation speed |
| Platen | Rotates polishing pad | Rotation speed, temperature |
1.3 Key Process Parameters
- Down Force ($F$): Load pressing the wafer against the pad, usually quoted as a pressure, typically $1-7$ psi
- Platen Speed ($\omega_p$): Pad rotation, typically $20-100$ rpm
- Carrier Speed ($\omega_c$): Wafer rotation, typically $20-100$ rpm
- Slurry Flow Rate ($Q$): Typically $100-300$ mL/min
- Temperature ($T$): Typically $20-50$ °C
2. Classical Physical Models
2.1 Preston Equation (Foundational Model)
The foundational model for CMP is the Preston equation (1927):
$$
\boxed{MRR = k_p \cdot P \cdot v}
$$
Where:
- $MRR$ = Material Removal Rate ($[\text{m/s}]$ with the SI units below; usually reported in $\text{nm/min}$ after conversion)
- $k_p$ = Preston's coefficient $[\text{m}^2/\text{N}]$
- $P$ = Applied pressure $[\text{Pa}]$
- $v$ = Relative velocity $[\text{m/s}]$
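As a unit sanity check, the following sketch evaluates the Preston equation in SI and converts the result to nm/min. The coefficient and operating point are illustrative values chosen for the example, not measured data.

```python
PSI_TO_PA = 6894.757  # pascals per psi


def preston_mrr_nm_per_min(k_p: float, pressure_psi: float, velocity_m_s: float) -> float:
    """Preston removal rate MRR = k_p * P * v, converted from m/s to nm/min."""
    mrr_m_per_s = k_p * (pressure_psi * PSI_TO_PA) * velocity_m_s
    return mrr_m_per_s * 1e9 * 60.0


# Hypothetical inputs: k_p = 1e-13 m^2/N, 3 psi down pressure, 1 m/s relative velocity
rate = preston_mrr_nm_per_min(1e-13, 3.0, 1.0)
print(f"MRR = {rate:.1f} nm/min")  # MRR = 124.1 nm/min
```

Because the model is linear in both $P$ and $v$, doubling either input doubles the predicted rate.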
The relative velocity between wafer and pad:
$$
v = \sqrt{(\omega_p r_p)^2 + (\omega_c r_c)^2 - 2\omega_p \omega_c r_p r_c \cos(\theta)}
$$
Where:
- $\omega_p, \omega_c$ = Angular velocities of platen and carrier
- $r_p, r_c$ = Distances of the wafer point from the platen and carrier rotation axes
- $\theta$ = Angle between the two radius vectors at that point
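A minimal sketch of the relative-velocity expression, assuming speeds are supplied in rpm and radii in metres (both conventions are choices made for this example):

```python
import math


def relative_velocity(omega_p_rpm, omega_c_rpm, r_p, r_c, theta):
    """Magnitude of the pad-wafer relative velocity in m/s (law of cosines)."""
    w_p = omega_p_rpm * 2.0 * math.pi / 60.0  # rad/s
    w_c = omega_c_rpm * 2.0 * math.pi / 60.0  # rad/s
    v_sq = (w_p * r_p) ** 2 + (w_c * r_c) ** 2 \
        - 2.0 * w_p * w_c * r_p * r_c * math.cos(theta)
    return math.sqrt(max(v_sq, 0.0))  # guard against tiny negative round-off


# Hypothetical point: 0.20 m from the platen axis, 0.05 m from the carrier axis
v = relative_velocity(60.0, 60.0, 0.20, 0.05, math.pi / 2)
print(f"v = {v:.3f} m/s")
```

For $\theta = 0$ the expression collapses to $|\omega_p r_p - \omega_c r_c|$, which is a convenient check on any implementation.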
2.2 Modified Preston Models
2.2.1 Pressure-Velocity Product Modification
$$
MRR = k_p \cdot P^a \cdot v^b
$$
Where $a, b$ are empirical exponents (typically $0.5 < a, b < 1.5$)
2.2.2 Chemical Enhancement Factor
$$
MRR = k_p \cdot P \cdot v \cdot f(C, T, pH)
$$
Where $f(C, T, pH)$ represents chemical effects:
- $C$ = Oxidizer concentration
- $T$ = Temperature
- $pH$ = Slurry pH
2.2.3 Arrhenius-Modified Preston Equation
$$
MRR = k_0 \cdot \exp\left(-\frac{E_a}{RT}\right) \cdot P \cdot v
$$
Where:
- $k_0$ = Pre-exponential factor
- $E_a$ = Activation energy $[\text{J/mol}]$
- $R$ = Gas constant $= 8.314$ J/(mol$\cdot$K)
- $T$ = Temperature $[\text{K}]$
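To illustrate the temperature sensitivity this form implies, the sketch below compares the predicted rate at two temperatures. The activation energy of 30 kJ/mol is a placeholder for illustration only.

```python
import math

R_GAS = 8.314  # J/(mol*K)


def arrhenius_preston(k0, e_a, temp_k, pressure, velocity):
    """MRR = k0 * exp(-Ea / (R * T)) * P * v."""
    return k0 * math.exp(-e_a / (R_GAS * temp_k)) * pressure * velocity


# Hypothetical activation energy; compare 25 C and 50 C at the same P and v
cold = arrhenius_preston(1.0, 30e3, 298.15, 1.0, 1.0)
hot = arrhenius_preston(1.0, 30e3, 323.15, 1.0, 1.0)
print(f"rate ratio (50 C / 25 C) = {hot / cold:.2f}")
```

The ratio depends only on $E_a$ and the two temperatures, so it can be fitted from removal-rate data without knowing $k_0$.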
2.3 Tribocorrosion Model
For metal CMP (e.g., tungsten, copper):
$$
MRR = \frac{M}{z F \rho} \cdot \left( i_{corr} + \frac{Q_{pass}}{A \cdot t_{pass}} \right) \cdot f_{mech}
$$
Where:
- $M$ = Molar mass of metal
- $z$ = Number of electrons transferred
- $F$ = Faraday constant $= 96485$ C/mol
- $\rho$ = Density
- $i_{corr}$ = Corrosion current density
- $Q_{pass}$ = Passivation charge
- $f_{mech}$ = Mechanical factor
2.4 Contact Mode Classification
| Mode | Condition | Preston Constant | Friction Coefficient |
|------|-----------|------------------|---------------------|
| Contact | $\frac{\eta v_R}{p} < (\frac{\eta v_R}{p})_c$ | High, constant | High ($\mu > 0.3$) |
| Mixed | $\frac{\eta v_R}{p} \approx (\frac{\eta v_R}{p})_c$ | Transitional | Medium |
| Hydroplaning | $\frac{\eta v_R}{p} > (\frac{\eta v_R}{p})_c$ | Low, variable | Low ($\mu < 0.1$) |
Where:
- $\eta$ = Slurry viscosity
- $v_R$ = Relative velocity
- $p$ = Pressure
3. Pattern Density Models
3.1 Effective Pattern Density Model (Stine Model)
The local material removal rate depends on effective pattern density:
$$
\frac{dz}{dt} = -\frac{K}{\rho_{eff}(x, y)}
$$
Where:
- $z$ = Surface height
- $K$ = Blanket removal rate $= k_p \cdot P \cdot v$
- $\rho_{eff}$ = Effective pattern density
3.1.1 Effective Density Calculation
$$
\rho_{eff}(x, y) = \iint_{-\infty}^{\infty} \rho_0(x', y') \cdot W(x - x', y - y') \, dx' \, dy'
$$
Where:
- $\rho_0(x, y)$ = Local pattern density
- $W(x, y)$ = Weighting function (planarization kernel)
3.1.2 Elliptical Weighting Function
$$
W(x, y) = \frac{1}{\pi L_x L_y} \cdot \exp\left(-\frac{x^2}{L_x^2} - \frac{y^2}{L_y^2}\right)
$$
Where $L_x, L_y$ are planarization lengths in x and y directions.
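The convolution in 3.1.1 with the kernel of 3.1.2 can be sketched on a discrete density grid as follows. The grid spacing, kernel truncation, and edge handling (clamping to the nearest cell) are assumptions of this example; production tools typically use FFT-based convolution on much larger grids.

```python
import math


def gaussian_kernel(l_x, l_y, cell, half_width):
    """Sample W(x, y) on a grid and renormalize so the weights sum to 1.

    Renormalizing makes the analytic prefactor 1 / (pi * Lx * Ly) unnecessary.
    """
    offsets = range(-half_width, half_width + 1)
    kernel = [[math.exp(-(i * cell / l_x) ** 2 - (j * cell / l_y) ** 2)
               for j in offsets] for i in offsets]
    total = sum(map(sum, kernel))
    return [[w / total for w in row] for row in kernel]


def effective_density(rho0, kernel):
    """Discrete convolution of rho0 with a symmetric kernel (edges clamped)."""
    n_rows, n_cols = len(rho0), len(rho0[0])
    half = len(kernel) // 2
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            acc = 0.0
            for i, row in enumerate(kernel):
                for j, w in enumerate(row):
                    rr = min(max(r + i - half, 0), n_rows - 1)
                    cc = min(max(c + j - half, 0), n_cols - 1)
                    acc += w * rho0[rr][cc]
            out[r][c] = acc
    return out


# Hypothetical layout: empty left half, fully dense right half
layout = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
kernel = gaussian_kernel(l_x=2.0, l_y=2.0, cell=1.0, half_width=3)
rho_eff = effective_density(layout, kernel)
print([round(v, 3) for v in rho_eff[0]])
```

The sharp density step in the layout becomes a smooth transition in $\rho_{eff}$ whose width is set by the planarization length.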
3.2 Step Height Evolution Model
For oxide CMP with step height $h$:
$$
\frac{dh}{dt} = -K \cdot \left(1 - \frac{h_{contact}}{h}\right) \quad \text{for } h > h_{contact}
$$
$$
\frac{dh}{dt} = 0 \quad \text{for } h \leq h_{contact}
$$
Where $h_{contact}$ is the pad contact threshold height.
3.3 Integrated Density-Step Height Model
Combined model for oxide thickness evolution:
$$
z(x, y, t) = z_0 - K \cdot t \cdot \frac{1}{\rho_{eff}(x, y)} \cdot g(h)
$$
Where $g(h)$ is the step-height dependent function:
$$
g(h) = \begin{cases}
1 & \text{if } h > h_c \\
\frac{h}{h_c} & \text{if } h \leq h_c
\end{cases}
$$
4. Dishing and Erosion Models
4.1 Copper Dishing Model
Dishing depth $D$ for copper lines:
$$
D = K_{Cu} \cdot t_{over} \cdot f(w)
$$
Where:
- $K_{Cu}$ = Copper removal rate
- $t_{over}$ = Overpolish time
- $w$ = Line width
- $f(w)$ = Width-dependent function
Empirical relationship:
$$
D = D_0 \cdot \left(1 - \exp\left(-\frac{w}{w_c}\right)\right)
$$
Where:
- $D_0$ = Maximum dishing depth
- $w_c$ = Critical line width
4.2 Oxide Erosion Model
Erosion $E$ in dense pattern regions:
$$
E = K_{ox} \cdot t_{over} \cdot \rho_{metal}
$$
Where:
- $K_{ox}$ = Oxide removal rate
- $\rho_{metal}$ = Local metal pattern density
4.3 Combined Dishing-Erosion
Total copper thickness loss:
$$
\Delta z_{Cu} = D + E \cdot \frac{\rho_{metal}}{1 - \rho_{metal}}
$$
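The dishing, erosion, and combined-loss expressions of 4.1 to 4.3 translate directly into code. The sketch below uses the empirical saturating form for dishing; all numerical inputs are placeholders.

```python
import math


def dishing(d0, width, w_c):
    """Empirical dishing depth D = D0 * (1 - exp(-w / w_c))."""
    return d0 * (1.0 - math.exp(-width / w_c))


def erosion(k_ox, t_over, rho_metal):
    """Oxide erosion E = K_ox * t_over * rho_metal."""
    return k_ox * t_over * rho_metal


def copper_loss(d, e, rho_metal):
    """Total copper thickness loss: D + E * rho / (1 - rho)."""
    if not 0.0 <= rho_metal < 1.0:
        raise ValueError("rho_metal must lie in [0, 1)")
    return d + e * rho_metal / (1.0 - rho_metal)


# Hypothetical values: D0 = 60 nm, w_c = 2 um, K_ox = 20 nm/min, 0.5 min overpolish
for width_um in (0.5, 2.0, 10.0):
    d = dishing(60.0, width_um, 2.0)
    e = erosion(20.0, 0.5, 0.5)
    print(f"w = {width_um:4.1f} um: D = {d:5.1f} nm, total = {copper_loss(d, e, 0.5):5.1f} nm")
```

The density factor $\rho_{metal}/(1-\rho_{metal})$ diverges as the metal density approaches 1, which is why the function rejects that input.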
4.4 Pattern Density Effects
| Pattern Density | Dishing Behavior | Erosion Behavior |
|-----------------|------------------|------------------|
| Low ($< 20\%$) | Minimal | Minimal |
| Medium ($20-50\%$) | Moderate | Increasing |
| High ($> 50\%$) | Saturates | Severe |
5. Contact Mechanics Models
5.1 Pad Asperity Contact Model
Assuming Gaussian asperity height distribution:
$$
P(z) = \frac{1}{\sigma_s \sqrt{2\pi}} \exp\left(-\frac{(z - \bar{z})^2}{2\sigma_s^2}\right)
$$
Where:
- $\sigma_s$ = Standard deviation of asperity heights
- $\bar{z}$ = Mean asperity height
5.2 Real Contact Area
$$
A_r = \pi n \int_{d}^{\infty} R(z - d) \cdot P(z) \, dz
$$
Where:
- $n$ = Number of asperities per unit area
- $R$ = Asperity tip radius
- $d$ = Separation distance
For Gaussian distribution:
$$
A_r = \pi n R \sigma_s \cdot F_1\left(\frac{d}{\sigma_s}\right)
$$
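Where $F_1(h) = \int_h^{\infty} (s - h)\,\phi(s)\,ds$, with $\phi$ the standard normal density and $d$ measured from the mean asperity height $\bar{z}$.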
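For a Gaussian height distribution, the statistical function has the closed form $F_1(h) = \phi(h) - h\,Q(h)$, where $\phi$ is the standard normal density and $Q$ its upper-tail probability. The sketch below assumes this Greenwood-Williamson form is the one intended and that $d$ is measured from the mean asperity height; the pad parameters are illustrative.

```python
import math


def f1(h):
    """F1(h) = integral from h to infinity of (s - h) * phi(s) ds."""
    pdf = math.exp(-0.5 * h * h) / math.sqrt(2.0 * math.pi)
    upper_tail = 0.5 * math.erfc(h / math.sqrt(2.0))
    return pdf - h * upper_tail


def real_contact_area_fraction(n, tip_radius, sigma_s, separation):
    """A_r per unit nominal area: pi * n * R * sigma_s * F1(d / sigma_s)."""
    return math.pi * n * tip_radius * sigma_s * f1(separation / sigma_s)


# Hypothetical pad: 1e9 asperities/m^2, 50 um tip radius, 5 um height spread
for d_over_sigma in (0.0, 1.0, 2.0):
    frac = real_contact_area_fraction(1e9, 50e-6, 5e-6, d_over_sigma * 5e-6)
    print(f"d/sigma = {d_over_sigma:.0f}: contact fraction = {frac:.4f}")
```

The contact fraction falls steeply with separation, which is the usual explanation for why only a small share of the pad surface touches the wafer.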
5.3 Hertzian Contact
For elastic contact between abrasive particle and wafer:
$$
a = \left(\frac{3FR}{4E^*}\right)^{1/3}
$$
$$
\delta = \frac{a^2}{R} = \left(\frac{9F^2}{16RE^{*2}}\right)^{1/3}
$$
Where:
- $a$ = Contact radius
- $F$ = Normal force
- $R$ = Particle radius
- $\delta$ = Indentation depth
- $E^*$ = Effective elastic modulus
$$
\frac{1}{E^*} = \frac{1 - \nu_1^2}{E_1} + \frac{1 - \nu_2^2}{E_2}
$$
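A short numerical sketch of the Hertzian relations follows. The particle load, radius, and elastic constants are order-of-magnitude placeholders, not values taken from a specific slurry or film.

```python
def effective_modulus(e1, nu1, e2, nu2):
    """1 / E* = (1 - nu1^2) / E1 + (1 - nu2^2) / E2."""
    return 1.0 / ((1.0 - nu1 ** 2) / e1 + (1.0 - nu2 ** 2) / e2)


def hertz_contact(force, radius, e_star):
    """Return (contact radius a, indentation depth delta) for a sphere on a flat."""
    a = (3.0 * force * radius / (4.0 * e_star)) ** (1.0 / 3.0)
    delta = a * a / radius
    return a, delta


# Hypothetical case: 50 nm radius particle, 0.1 uN load, both bodies E = 70 GPa, nu = 0.17
e_star = effective_modulus(70e9, 0.17, 70e9, 0.17)
a, delta = hertz_contact(1e-7, 50e-9, e_star)
print(f"E* = {e_star / 1e9:.1f} GPa, a = {a * 1e9:.2f} nm, delta = {delta * 1e9:.3f} nm")
```

The elastic solution only holds while the indentation stays small compared with the particle radius; beyond that, plastic deformation has to be considered.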
5.4 Material Removal by Single Abrasive
Volume removed per abrasive per pass:
$$
V = K_{wear} \cdot \frac{F_n \cdot L}{H}
$$
Where:
- $K_{wear}$ = Wear coefficient
- $F_n$ = Normal force on particle
- $L$ = Sliding distance
- $H$ = Hardness of wafer material
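The expression has the form of Archard's wear law. A minimal sketch with placeholder inputs:

```python
def archard_volume(k_wear, normal_force, sliding_distance, hardness):
    """Wear volume V = K_wear * F_n * L / H (SI units give m^3)."""
    if hardness <= 0:
        raise ValueError("hardness must be positive")
    return k_wear * normal_force * sliding_distance / hardness


# Hypothetical single-particle pass: K = 1e-3, 0.1 uN load, 1 um slide, H = 8 GPa
volume = archard_volume(1e-3, 1e-7, 1e-6, 8e9)
print(f"V = {volume:.3e} m^3 per pass")
```

Multiplying this per-particle volume by the number of active abrasives and the pass rate is one common route from particle scale to a wafer-scale removal rate.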
5.5 Multi-Scale Model Framework
```
┌────────────────────────────────────────────┐
│ WAFER SCALE (mm-cm)                        │
│ Pressure distribution, global uniformity   │
├────────────────────────────────────────────┤
│ DIE SCALE (µm-mm)                          │
│ Pattern density effects, planarization     │
├────────────────────────────────────────────┤
│ FEATURE SCALE (nm-µm)                      │
│ Dishing, erosion, step height evolution    │
├────────────────────────────────────────────┤
│ PARTICLE SCALE (nm)                        │
│ Abrasive-surface interactions              │
├────────────────────────────────────────────┤
│ MOLECULAR SCALE (Å)                        │
│ Chemical reactions, atomic removal         │
└────────────────────────────────────────────┘
```
6. Machine Learning and Neural Network Models
6.1 Overview of ML Approaches
Machine learning methods for CMP modeling:
- Supervised Learning
  - Artificial Neural Networks (ANN)
  - Convolutional Neural Networks (CNN)
  - Support Vector Machines (SVM)
  - Random Forests / Gradient Boosting
- Deep Learning
  - Deep Belief Networks (DBN)
  - Long Short-Term Memory (LSTM)
  - Generative Adversarial Networks (GAN)
- Transfer Learning
  - Pre-trained models adapted to new process conditions
6.2 Neural Network Architecture for CMP
6.2.1 Input Features
$$
\mathbf{x} = [P, v, t, \rho, w, s, pH, C_{ox}, T, ...]^T
$$
Where:
- $P$ = Pressure
- $v$ = Velocity
- $t$ = Polish time
- $\rho$ = Pattern density
- $w$ = Feature width
- $s$ = Feature spacing
- $pH$ = Slurry pH
- $C_{ox}$ = Oxidizer concentration
- $T$ = Temperature
6.2.2 Multi-Layer Perceptron (MLP)
$$
\mathbf{h}^{(1)} = \sigma(\mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)})
$$
$$
\mathbf{h}^{(2)} = \sigma(\mathbf{W}^{(2)} \mathbf{h}^{(1)} + \mathbf{b}^{(2)})
$$
$$
\hat{y} = \mathbf{W}^{(out)} \mathbf{h}^{(2)} + \mathbf{b}^{(out)}
$$
Where:
- $\sigma$ = Activation function (ReLU, tanh, sigmoid)
- $\mathbf{W}^{(i)}$ = Weight matrices
- $\mathbf{b}^{(i)}$ = Bias vectors
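The forward pass defined by these three equations can be written out in a few lines of plain Python. The weights below are arbitrary toy values chosen so the result is easy to verify by hand; a real model would learn them from polish data and use a numerical library.

```python
def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in vector]


def affine(weights, x, bias):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_forward(x, hidden_layers, w_out, b_out):
    """Apply each hidden layer with ReLU, then a linear output layer."""
    h = x
    for weights, bias in hidden_layers:
        h = relu(affine(weights, h, bias))
    return affine(w_out, h, b_out)


# Toy network: 2 inputs -> 2 hidden units -> 1 hidden unit -> 1 output
hidden = [([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0]),
          ([[2.0, 1.0]], [0.5])]
y_hat = mlp_forward([1.0, 2.0], hidden, [[1.0]], [-1.0])
print(y_hat)  # [1.5]
```

The output layer is left linear because the target (thickness or removal rate) is a continuous quantity.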
6.2.3 Activation Functions
| Function | Formula | Use Case |
|----------|---------|----------|
| ReLU | $\sigma(x) = \max(0, x)$ | Hidden layers |
| Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | Output (binary) |
| Tanh | $\sigma(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Hidden layers |
| Softmax | $\sigma(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ | Classification |
6.3 CNN-Based CMP Modeling (CmpCNN)
6.3.1 Architecture
```
Input: Layout Image (Binary) + Density Map
    ↓
Conv2D Layer (3×3 kernel, 32 filters)
    ↓
MaxPooling2D (2×2)
    ↓
Conv2D Layer (3×3 kernel, 64 filters)
    ↓
MaxPooling2D (2×2)
    ↓
Flatten
    ↓
Dense Layer (256 units)
    ↓
Dense Layer (128 units)
    ↓
Output: Post-CMP Height Map
```
6.3.2 Convolution Operation
$$
(I * K)(i, j) = \sum_m \sum_n I(i+m, j+n) \cdot K(m, n)
$$
Where:
- $I$ = Input image (layout)
- $K$ = Convolution kernel
- $(i, j)$ = Output position
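A direct implementation of the sum above, restricted to output positions where the kernel fits entirely inside the image ("valid" mode, an assumption of this sketch). Note that the indexing $I(i+m, j+n)$ is the cross-correlation form that deep-learning frameworks conventionally call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding or kernel flipping."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(k_h) for n in range(k_w))
             for j in range(out_w)]
            for i in range(out_h)]


# Toy 3x3 "layout" and a 2x2 kernel that sums the main diagonal
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_valid(image, kernel))  # [[6, 8], [12, 14]]
```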
6.4 Loss Functions
6.4.1 Mean Squared Error (MSE)
$$
\mathcal{L}_{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$
6.4.2 Root Mean Square Error (RMSE)
$$
RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}
$$
6.4.3 Mean Absolute Percentage Error (MAPE)
$$
MAPE = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
$$
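The three error measures can be computed as follows. The thickness values are made up for the example.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the data."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if any true value is zero."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 / len(y_true) * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))


measured = [100.0, 200.0]   # hypothetical post-CMP thickness, nm
predicted = [110.0, 190.0]
print(mse(measured, predicted), rmse(measured, predicted), mape(measured, predicted))
```

MAPE weights errors relative to the measured value, so it penalizes the same absolute error more heavily on thin films than on thick ones.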
6.5 Transfer Learning Framework
For adapting models across process nodes:
$$
\mathcal{L}_{transfer} = \mathcal{L}_{target} + \lambda \cdot \mathcal{L}_{domain}
$$
Where:
- $\mathcal{L}_{target}$ = Target domain loss
- $\mathcal{L}_{domain}$ = Domain adaptation loss
- $\lambda$ = Regularization parameter
6.6 Performance Metrics
| Metric | Formula | Target |
|--------|---------|--------|
| $R^2$ | $1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$ | $> 0.95$ |
| RMSE | $\sqrt{\frac{1}{N}\sum(y_i - \hat{y}_i)^2}$ | $< 5$ Å |
| MAE | $\frac{1}{N}\sum \lvert y_i - \hat{y}_i \rvert$ | $< 3$ Å |
7. Slurry Chemistry Modeling
7.1 Kaufman Mechanism
Cyclic passivation-depassivation process:
$$
\text{Metal} \xrightarrow{\text{Oxidizer}} \text{Metal Oxide} \xrightarrow{\text{Abrasion}} \text{Removal}
$$
7.2 Electrochemical Reactions
7.2.1 Copper CMP
Oxidation:
$$
\text{Cu} \rightarrow \text{Cu}^{2+} + 2e^-
$$
Passivation (with BTA):
$$
\text{Cu} + \text{BTA} \rightarrow \text{Cu-BTA}_{film}
$$
Complexation:
$$
\text{Cu}^{2+} + n\text{L} \rightarrow [\text{CuL}_n]^{2+}
$$
Where L = chelating agent (e.g., glycine, citrate)
7.2.2 Tungsten CMP
Oxidation:
$$
\text{W} + 3\text{H}_2\text{O} \rightarrow \text{WO}_3 + 6\text{H}^+ + 6e^-
$$
With hydrogen peroxide:
$$
\text{W} + 3\text{H}_2\text{O}_2 \rightarrow \text{WO}_3 + 3\text{H}_2\text{O}
$$
7.3 Pourbaix Diagram Integration
Stability regions defined by:
$$
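E = E^0 - \frac{RT}{nF} \ln Q - \frac{2.303\,RT}{F} \cdot \frac{m}{n} \cdot pH
$$
Where:
- $E$ = Electrode potential
- $E^0$ = Standard potential
- $Q$ = Reaction quotient (excluding $\text{H}^+$, which is carried by the pH term)
- $n$ = Number of electrons transferred
- $m$ = Number of $\text{H}^+$ in the reaction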
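The sketch below evaluates a Pourbaix boundary using the standard Nernst form, in which the pH term carries a factor $\ln 10 \approx 2.303$ and the ratio $m/n$ of protons to electrons. The standard potential passed in is a placeholder, not a tabulated value.

```python
import math

R_GAS = 8.314       # J/(mol*K)
FARADAY = 96485.0   # C/mol


def pourbaix_potential(e0, n, m, ph, temp_k=298.15, q=1.0):
    """E = E0 - (RT / nF) ln Q - ln(10) * (RT / F) * (m / n) * pH."""
    nernst_term = R_GAS * temp_k / (n * FARADAY) * math.log(q)
    ph_term = math.log(10.0) * R_GAS * temp_k / FARADAY * (m / n) * ph
    return e0 - nernst_term - ph_term


# Tungsten oxidation from 7.2.2 transfers 6 H+ and 6 electrons (m = n = 6).
# e0 = 0.0 V is a placeholder so that only the pH slope is examined.
slope = pourbaix_potential(0.0, 6, 6, 1.0) - pourbaix_potential(0.0, 6, 6, 0.0)
print(f"boundary slope = {slope * 1000:.1f} mV per pH unit")
```

For reactions with equal numbers of protons and electrons the boundary falls by roughly 59 mV per pH unit at 25 °C, which is the familiar sloped line in Pourbaix diagrams.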
7.4 Abrasive Particle Effects
7.4.1 Particle Size Distribution (PSD)
Log-normal distribution:
$$
f(d) = \frac{1}{d \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln d - \mu)^2}{2\sigma^2}\right)
$$
Where:
- $d$ = Particle diameter
- $\mu$ = Mean of $\ln(d)$
- $\sigma$ = Standard deviation of $\ln(d)$
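A small sketch of the log-normal density and two summary statistics that follow from it. The 50 nm median and $\sigma = 0.3$ are illustrative, not data for a particular slurry.

```python
import math


def lognormal_pdf(d, mu, sigma):
    """Log-normal probability density for particle diameter d > 0."""
    if d <= 0:
        return 0.0
    exponent = -(math.log(d) - mu) ** 2 / (2.0 * sigma ** 2)
    return math.exp(exponent) / (d * sigma * math.sqrt(2.0 * math.pi))


def lognormal_median(mu):
    """Median diameter, exp(mu)."""
    return math.exp(mu)


def lognormal_mean(mu, sigma):
    """Mean diameter, exp(mu + sigma^2 / 2)."""
    return math.exp(mu + 0.5 * sigma ** 2)


mu, sigma = math.log(50.0), 0.3  # hypothetical slurry: median 50 nm
print(f"median = {lognormal_median(mu):.1f} nm, mean = {lognormal_mean(mu, sigma):.1f} nm")
```

The mean exceeds the median because the distribution is skewed toward large particles, the tail most often associated with scratch defects.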
7.4.2 Zeta Potential
$$
\zeta = \frac{4\pi \eta \mu_e}{\varepsilon}
$$
Where:
- $\eta$ = Viscosity
- $\mu_e$ = Electrophoretic mobility
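- $\varepsilon$ = Dielectric constant of the medium (Smoluchowski form in Gaussian units; in SI the relation reads $\zeta = \eta \mu_e / (\varepsilon_r \varepsilon_0)$)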
7.5 Slurry Components Summary
| Component | Function | Typical Materials |
|-----------|----------|-------------------|
| Abrasive | Mechanical removal | SiO₂, CeO₂, Al₂O₃ |
| Oxidizer | Surface modification | H₂O₂, KIO₃, Fe(NO₃)₃ |
| Complexant | Metal dissolution | Glycine, citric acid |
| Inhibitor | Corrosion protection | BTA, BBI |
| Surfactant | Particle dispersion | CTAB, SDS |
| Buffer | pH control | Phosphate, citrate |
8. Chip-Scale and Full-Chip Models
8.1 Within-Wafer Non-Uniformity (WIWNU)
$$
WIWNU = \frac{\sigma_{\text{thickness}}}{\overline{\text{thickness}}} \times 100\%
$$
Where:
- $\sigma_{\text{thickness}}$ = Standard deviation of thickness
- $\overline{\text{thickness}}$ = Mean thickness
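Computing WIWNU from a set of site measurements is a one-liner with the standard library. This sketch uses the sample standard deviation, which is an assumption; some fabs use the population form or a range-based definition instead.

```python
import statistics


def wiwnu_percent(thicknesses):
    """Within-wafer non-uniformity: sample standard deviation over mean, in percent."""
    return statistics.stdev(thicknesses) / statistics.mean(thicknesses) * 100.0


# Hypothetical four-site thickness map in nm
sites = [100.0, 102.0, 98.0, 100.0]
print(f"WIWNU = {wiwnu_percent(sites):.2f} %")
```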
8.2 Pressure Distribution Model
For a flexible carrier:
$$
P(r) = P_0 + \sum_{i=1}^{n} P_i \cdot J_0\left(\frac{\alpha_i r}{R}\right)
$$
Where:
- $P_0$ = Base pressure
- $J_0$ = Bessel function of first kind
- $\alpha_i$ = Bessel zeros
- $R$ = Wafer radius
8.3 Multi-Zone Pressure Control
For zone $i$:
$$
MRR_i = k_p \cdot P_i \cdot v_i
$$
Target uniformity achieved when:
$$
MRR_1 = MRR_2 = ... = MRR_n
$$
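Under the plain Preston model, equalizing the zone removal rates amounts to solving $P_i = MRR_{target} / (k_p v_i)$ for each zone. The sketch below does exactly that; the zone velocities and target rate are placeholders, and a real controller would also respect pressure limits and zone coupling.

```python
def zone_pressures(k_p, zone_velocities, target_mrr):
    """Pressure per zone that yields the same Preston removal rate everywhere."""
    return [target_mrr / (k_p * v) for v in zone_velocities]


def zone_rates(k_p, pressures, zone_velocities):
    """Preston removal rate per zone, MRR_i = k_p * P_i * v_i."""
    return [k_p * p * v for p, v in zip(pressures, zone_velocities)]


# Hypothetical three-zone carrier: velocities in m/s, target rate in m/s
velocities = [0.8, 1.0, 1.2]
pressures = zone_pressures(1e-13, velocities, 2e-9)
print([round(p) for p in pressures])  # pressures in Pa
print(zone_rates(1e-13, pressures, velocities))
```

Zones with lower relative velocity need proportionally higher pressure to keep up.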
8.4 Full-Chip Simulation Flow
```
┌─────────────────────┐
│ Design Layout (GDS) │
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ Density Extraction  │
│ ρ(x,y) for each     │
│ metal/dielectric    │
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ Effective Density   │
│ ρ_eff = ρ * W       │
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ CMP Simulation      │
│ z(t) evolution      │
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ Post-CMP Topography │
│ Dishing/Erosion Map │
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│ Hotspot Detection   │
│ Design Rule Check   │
└─────────────────────┘
```
9. Process Control Applications
9.1 Run-to-Run (R2R) Control
9.1.1 EWMA Controller
$$
\hat{y}_{k+1} = \lambda y_k + (1 - \lambda) \hat{y}_k
$$
Where:
- $\hat{y}_{k+1}$ = Predicted output for next run
- $y_k$ = Current measured output
- $\lambda$ = Smoothing factor $(0 < \lambda < 1)$
9.1.2 Recipe Adjustment
$$
u_{k+1} = u_k + G^{-1} (y_{target} - \hat{y}_{k+1})
$$
Where:
- $u$ = Process recipe (time, pressure, etc.)
- $G$ = Process gain matrix
- $y_{target}$ = Target output
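The two update equations of 9.1.1 and 9.1.2 can be exercised on a toy single-input process. The sketch applies them literally for a scalar recipe; the linear plant with a constant offset is a hypothetical stand-in for a real polisher.

```python
def ewma_r2r(target, gain_model, plant, u0, lam, n_runs):
    """Run-to-run loop: EWMA prediction followed by a gain-based recipe update."""
    u, y_hat = u0, target  # start by assuming the process is on target
    outputs = []
    for _ in range(n_runs):
        y = plant(u)                               # measure this run
        outputs.append(y)
        y_hat = lam * y + (1.0 - lam) * y_hat      # EWMA prediction for next run
        u = u + (target - y_hat) / gain_model      # scalar form of G^-1 (y_target - y_hat)
    return outputs


def toy_plant(u):
    """Hypothetical process: removed thickness = 2.0 * polish time + 5.0 offset."""
    return 2.0 * u + 5.0


outputs = ewma_r2r(target=100.0, gain_model=2.0, plant=toy_plant, u0=40.0, lam=0.5, n_runs=60)
print(f"first run: {outputs[0]:.1f}, last run: {outputs[-1]:.3f}")
```

A smaller $\lambda$ filters metrology noise more strongly but slows the response to real process shifts.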
9.2 Virtual Metrology
$$
\hat{y} = f_{VM}(\mathbf{x}_{FDC})
$$
Where:
- $\hat{y}$ = Predicted wafer quality
- $\mathbf{x}_{FDC}$ = Fault Detection and Classification sensor data
9.3 Endpoint Detection
9.3.1 Motor Current Monitoring
$$
I(t) = I_0 + \Delta I \cdot H(t - t_{endpoint})
$$
Where $H$ is the Heaviside step function.
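A minimal sketch of threshold-based endpoint detection on this idealized signal. The sampling interval, baseline window, and threshold are illustrative assumptions; real traces are noisy and are usually filtered before thresholding.

```python
def motor_current(t, i0, delta_i, t_endpoint):
    """Ideal trace I(t) = I0 + dI * H(t - t_endpoint), taking H(0) = 1."""
    return i0 + (delta_i if t >= t_endpoint else 0.0)


def detect_endpoint(times, currents, baseline_samples, threshold):
    """Return the first time the current departs from its early baseline."""
    baseline = sum(currents[:baseline_samples]) / baseline_samples
    for t, current in zip(times, currents):
        if abs(current - baseline) > threshold:
            return t
    return None  # no endpoint seen in this trace


times = [0.5 * k for k in range(120)]  # hypothetical 2 Hz sampling, in seconds
currents = [motor_current(t, 10.0, -1.5, 42.0) for t in times]
detected = detect_endpoint(times, currents, baseline_samples=10, threshold=0.75)
print(f"endpoint detected at t = {detected} s")
```

Setting the threshold to about half the expected current change gives some margin against drift in either direction.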
9.3.2 Optical Endpoint
$$
R(\lambda, t) = R_{film}(\lambda, d(t))
$$
Where reflectance $R$ changes as film thickness $d$ decreases.
10. Current Challenges and Future Directions
10.1 Key Challenges
- Sub-5nm nodes: Atomic-scale precision required
  - Thickness variation target: $< 5$ Å (3σ)
  - Defect density target: $< 0.01$ defects/cm²
- New materials integration:
  - Low-κ dielectrics ($\kappa < 2.5$)
  - Cobalt interconnects
  - Ruthenium barrier layers
- 3D integration:
  - Through-Silicon Via (TSV) CMP
  - Hybrid bonding surface preparation
  - Wafer-level packaging
10.2 Future Model Development
- Physics-informed neural networks (PINNs):
$$
\mathcal{L} = \mathcal{L}_{data} + \lambda_{physics} \cdot \mathcal{L}_{physics}
$$
Where:
$$
\mathcal{L}_{physics} = \left\| \frac{\partial z}{\partial t} + \frac{K}{\rho_{eff}} \right\|^2
$$
- Digital twins for real-time process optimization
- Federated learning across multiple fabs
10.3 Industry Requirements
| Node | Thickness Uniformity | Defect Density | Dishing Limit |
|------|---------------------|----------------|---------------|
| 7nm | $< 10$ Å | $< 0.05$/cm² | $< 200$ Å |
| 5nm | $< 7$ Å | $< 0.03$/cm² | $< 150$ Å |
| 3nm | $< 5$ Å | $< 0.01$/cm² | $< 100$ Å |
| 2nm | $< 3$ Å | $< 0.005$/cm² | $< 50$ Å |
Symbol Glossary
| Symbol | Description | Units |
|--------|-------------|-------|
| $MRR$ | Material Removal Rate | nm/min |
| $k_p$ | Preston coefficient | m²/N |
| $P$ | Pressure | Pa, psi |
| $v$ | Relative velocity | m/s |
| $\rho$ | Pattern density | dimensionless |
| $\rho_{eff}$ | Effective pattern density | dimensionless |
| $L$ | Planarization length | $\mu$m |
| $D$ | Dishing depth | Å, nm |
| $E$ | Erosion depth | Å, nm |
| $w$ | Feature width | nm, $\mu$m |
| $h$ | Step height | nm |
| $t$ | Polish time | s, min |
| $T$ | Temperature | K, °C |
| $\eta$ | Viscosity | Pa$\cdot$s |
| $\mu$ | Friction coefficient | dimensionless |
Key Equations
Preston Equation
$$
MRR = k_p \cdot P \cdot v
$$
Effective Density
$$
\rho_{eff}(x,y) = \iint \rho_0(x',y') \cdot W(x-x', y-y') \, dx' dy'
$$
Material Removal (Density Model)
$$
\frac{dz}{dt} = -\frac{K}{\rho_{eff}(x,y)}
$$
Dishing Model
$$
D = D_0 \cdot \left(1 - e^{-w/w_c}\right)
$$
Erosion Model
$$
E = K_{ox} \cdot t_{over} \cdot \rho_{metal}
$$
Neural Network
$$
\hat{y} = \sigma(\mathbf{W}^{(n)} \cdot ... \cdot \sigma(\mathbf{W}^{(1)} \mathbf{x} + \mathbf{b}^{(1)}) + \mathbf{b}^{(n)})
$$