Design of Experiments (DOE) in Semiconductor Manufacturing
DOE is a statistical methodology for systematically investigating relationships between process parameters and responses (yield, thickness, defects, etc.).
1. Fundamental Mathematical Model
First-order linear model:
y = β₀ + Σᵢβᵢxᵢ + ε
Second-order model (with curvature and interactions):
y = β₀ + Σᵢβᵢxᵢ + Σᵢβᵢᵢxᵢ² + Σᵢ<ⱼβᵢⱼxᵢxⱼ + ε
Where:
• y = response (e.g., oxide thickness, threshold voltage)
• xᵢ = coded factor levels (scaled to [-1, +1])
• β = model coefficients
• ε = random error ~ N(0, σ²)
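The coding step behind xᵢ can be sketched as follows. The function names and the 900-1000 °C temperature window are illustrative assumptions, not values from a specific process.

```python
def to_coded(value, low, high):
    """Map a natural-unit setting onto the coded [-1, +1] scale."""
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range


def to_natural(coded, low, high):
    """Inverse map: coded level back to natural units."""
    center = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range


# Hypothetical oxidation temperature window of 900-1000 degC.
coded_mid = to_coded(950.0, 900.0, 1000.0)      # center of the window
coded_high = to_coded(1000.0, 900.0, 1000.0)    # upper setting
natural_back = to_natural(-1.0, 900.0, 1000.0)  # lower setting in degC
```

Coding puts all factors on a common scale, so the magnitudes of the β coefficients can be compared directly.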
2. Matrix Formulation
Model in matrix form:
Y = Xβ + ε
Least squares estimation:
β̂ = (X'X)⁻¹X'Y
Variance-covariance of estimates:
Var(β̂) = σ²(X'X)⁻¹
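A minimal NumPy sketch of these formulas on a 2² design. The response values are made up for illustration, and σ² is replaced by its residual estimate.

```python
import numpy as np

# 2^2 factorial in coded units with one hypothetical response per run.
x1 = np.array([-1.0, 1.0, -1.0, 1.0])
x2 = np.array([-1.0, -1.0, 1.0, 1.0])
y = np.array([98.0, 104.0, 101.0, 109.0])   # illustrative thickness values

# Model matrix with columns: intercept, x1, x2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# beta_hat = (X'X)^-1 X'Y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Estimate sigma^2 from the residuals, then Var(beta_hat) = sigma^2 (X'X)^-1.
residuals = y - X @ beta_hat
n, p = X.shape
sigma2_hat = residuals @ residuals / (n - p)
cov_beta = sigma2_hat * XtX_inv
```

The explicit inverse mirrors the formula above; for ill-conditioned model matrices, np.linalg.lstsq is usually the safer numerical choice.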
3. Factorial Designs
Full Factorial (2ᵏ)
For k factors at 2 levels: requires 2ᵏ runs.
Orthogonality property:
X'X = NI (N = 2ᵏ runs, factors coded as ±1)
All effects are estimated independently and with equal precision: Var(β̂ⱼ) = σ²/N.
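The orthogonality property can be checked directly. This sketch builds a 2³ design with the standard library; the helper names are illustrative.

```python
from itertools import product


def full_factorial(k):
    """All 2^k combinations of coded levels -1 and +1."""
    return [list(run) for run in product([-1, 1], repeat=k)]


def model_matrix(design):
    """Prepend an intercept column of ones to each run."""
    return [[1] + run for run in design]


def gram(X):
    """Compute X'X as a list of lists."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


design = full_factorial(3)
XtX = gram(model_matrix(design))   # diagonal matrix with 8 on the diagonal
```

Because X'X is diagonal, each coefficient estimate depends only on its own column of X.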
Fractional Factorial (2ᵏ⁻ᵖ)
Resolution determines confounding:
• Resolution III: Main effects aliased with two-factor interactions (2FIs)
• Resolution IV: Main effects clear of 2FIs; 2FIs aliased with each other
• Resolution V: Main effects and 2FIs all estimable, provided three-factor and higher interactions are negligible
For a 2⁵⁻² design (Resolution III) with generators D = AB, E = AC:
• Defining relation: I = ABD = ACE = BCDE
• Find the aliases of an effect by multiplying it by each word in the defining relation; squared letters cancel (e.g., A × ABD = BD)
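The alias rule is a symmetric difference on letters, so it is easy to automate. A small sketch for the 2⁵⁻² example above (function names are illustrative):

```python
def multiply(word_a, word_b):
    """Multiply two effect words; squared letters cancel (A*A = I)."""
    letters = set(word_a) ^ set(word_b)
    return "".join(sorted(letters)) or "I"


# Words of the defining relation I = ABD = ACE = BCDE.
defining_words = ["ABD", "ACE", "BCDE"]


def aliases(effect):
    """Effects confounded with `effect` under the defining relation."""
    return [multiply(effect, word) for word in defining_words]


alias_of_A = aliases("A")     # main effect A
alias_of_BC = aliases("BC")   # two-factor interaction BC
```

Here A is aliased with BD and CE, which confirms that main effects are confounded with 2FIs in this design.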
4. Response Surface Methodology (RSM)
Central Composite Design (CCD)
Combines:
• 2ᵏ or 2ᵏ⁻ᵖ factorial points
• 2k axial points at ±α from center
• n₀ center points
Rotatability condition:
α = (2ᵏ)^(1/4) = F^(1/4), where F is the number of factorial points
• For k=2: α = √2 ≈ 1.414
• For k=3: α = 2^(3/4) ≈ 1.682
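A sketch of assembling a rotatable CCD in coded units from a full 2ᵏ factorial. The default of three center points is an assumed value, not a recommendation.

```python
from itertools import product


def central_composite(k, n_center=3):
    """Rotatable CCD in coded units: factorial, axial, and center runs."""
    alpha = (2 ** k) ** 0.25
    factorial = [list(map(float, run)) for run in product([-1, 1], repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha     # move along one axis only
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return alpha, factorial + axial + center


alpha2, ccd2 = central_composite(2, n_center=3)
alpha3, ccd3 = central_composite(3, n_center=3)
```

The run count is 2ᵏ + 2k + n₀, so the k=3 design above has 8 + 6 + 3 = 17 runs.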
Box-Behnken Design
• 3 levels per factor
• No corner points (useful when extremes are dangerous)
• Run count comparable to or smaller than a CCD (k = 3: 12 vs. 14 runs before center points; k = 4: 24 each)
5. Optimal Design Theory
D-optimal: Maximize |X'X|
• Minimizes volume of joint confidence region
A-optimal: Minimize trace[(X'X)⁻¹]
• Minimizes average variance of estimates
I-optimal: Minimize integrated prediction variance:
∫ Var[ŷ(x)] dx, taken over the design region
G-optimal: Minimize maximum prediction variance
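To make the D and A criteria concrete, the sketch below scores two candidate 4-run designs for a first-order model in two factors. Both designs are illustrative; a real optimal-design search would iterate over many candidates (for example with a coordinate-exchange algorithm, not shown here).

```python
import numpy as np


def criteria(design):
    """Return (D, A) = (|X'X|, trace[(X'X)^-1]) for a first-order model."""
    design = np.asarray(design, dtype=float)
    X = np.column_stack([np.ones(len(design)), design])
    XtX = X.T @ X
    return np.linalg.det(XtX), np.trace(np.linalg.inv(XtX))


# Corners of the coded region versus the same pattern shrunk to +/-0.5.
factorial = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
shrunk = [[-0.5, -0.5], [0.5, -0.5], [-0.5, 0.5], [0.5, 0.5]]

d_fact, a_fact = criteria(factorial)     # larger |X'X|, smaller trace
d_shrunk, a_shrunk = criteria(shrunk)
```

Spreading the runs to the edges of the region wins on both criteria, which matches the intuition that wider factor ranges give more precise slope estimates.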
6. Analysis of Variance (ANOVA)
Sum of squares decomposition:
SS_total = SS_model + SS_residual
SS_model = Σᵢ(ŷᵢ - ȳ)²
SS_residual = Σᵢ(yᵢ - ŷᵢ)²
F-test for significance:
F = MS_effect / MS_error = (SS_effect/df_effect) / (SS_error/df_error)
Effect estimation:
Effect_A = ȳ(A+) - ȳ(A-) (mean response at the high level of A minus mean response at the low level)
β̂_A = Effect_A / 2
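A worked sketch of effect estimation on an unreplicated 2² design. The yield values are invented for illustration. The sum-of-squares line uses the standard two-level identity SS = N·Effect²/4, which is not stated in the text above.

```python
def main_effect(levels, y):
    """Average response at +1 minus average response at -1."""
    high = [yi for xi, yi in zip(levels, y) if xi > 0]
    low = [yi for xi, yi in zip(levels, y) if xi < 0]
    return sum(high) / len(high) - sum(low) / len(low)


# 2^2 design in standard order with hypothetical yields (%).
A = [-1, 1, -1, 1]
B = [-1, -1, 1, 1]
y = [60.0, 72.0, 54.0, 68.0]
AB = [a * b for a, b in zip(A, B)]       # interaction contrast column

effect_A = main_effect(A, y)
effect_B = main_effect(B, y)
effect_AB = main_effect(AB, y)
beta_A = effect_A / 2                    # regression coefficient for coded A

# Each effect in an N-run two-level design carries 1 df.
N = len(y)
ss = {name: N * e ** 2 / 4
      for name, e in [("A", effect_A), ("B", effect_B), ("AB", effect_AB)]}
ss_total = sum((yi - sum(y) / N) ** 2 for yi in y)
```

With every degree of freedom used by the model, no error term is left here; an F-test would need replicates or center points to supply MS_error.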
7. Semiconductor-Specific Designs
Split-Plot Designs
Used when hard-to-change factors (temperature, pressure) are held fixed across a group of runs while easy-to-change factors (gas flow) vary within it:
yᵢⱼₖ = μ + αᵢ + δᵢⱼ + βₖ + (αβ)ᵢₖ + εᵢⱼₖ
Where:
• αᵢ = whole-plot factor (hard to change)
• δᵢⱼ = whole-plot error
• βₖ = subplot factor (easy to change)
• εᵢⱼₖ = subplot error
Variance Components (Nested Designs)
For Lots → Wafers → Dies → Measurements:
σ²_total = σ²_lot + σ²_wafer + σ²_die + σ²_meas
Mixture Designs
For etch gas chemistry where components sum to 1:
Σᵢxᵢ = 1
Uses simplex-lattice designs and Scheffé models.
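A sketch of generating a {q, m} simplex-lattice design, in which each component takes the proportions 0, 1/m, ..., 1 and every run sums to 1. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def simplex_lattice(q, m):
    """{q, m} simplex lattice: proportions in {0, 1/m, ..., 1} summing to 1."""
    points = []
    for combo in product(range(m + 1), repeat=q):
        if sum(combo) == m:              # keep only blends that sum to 1
            points.append(tuple(Fraction(c, m) for c in combo))
    return points


# Three-gas mixture at proportions 0, 1/2, 1: pure components and 50/50 blends.
lattice = simplex_lattice(3, 2)
```

Exact fractions avoid rounding drift in the sum-to-one constraint; convert to float when passing the points to a fitting routine.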
8. Robust Parameter Design (Taguchi)
Signal-to-Noise ratios:
Nominal-is-best:
S/N = 10·log₁₀(ȳ²/s²)
Smaller-is-better:
S/N = -10·log₁₀[(1/n)·Σyᵢ²]
Larger-is-better:
S/N = -10·log₁₀[(1/n)·Σ(1/yᵢ²)]
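The three ratios translate directly into code. In this sketch s² is taken as the sample variance (n-1 denominator), and the data are illustrative.

```python
import math
from statistics import mean, variance


def sn_nominal(y):
    """Nominal-is-best: 10*log10(ybar^2 / s^2), s^2 = sample variance."""
    return 10 * math.log10(mean(y) ** 2 / variance(y))


def sn_smaller(y):
    """Smaller-is-better: -10*log10(mean of y^2)."""
    return -10 * math.log10(sum(v ** 2 for v in y) / len(y))


def sn_larger(y):
    """Larger-is-better: -10*log10(mean of 1/y^2)."""
    return -10 * math.log10(sum(1 / v ** 2 for v in y) / len(y))


thickness = [99.0, 100.0, 101.0]         # mean 100, s^2 = 1
sn_nb = sn_nominal(thickness)            # 10*log10(10000) = 40 dB
sn_sb = sn_smaller([1.0, 1.0, 1.0])      # 0 dB
sn_lb = sn_larger([10.0, 10.0, 10.0])    # 20 dB
```

In all three cases a higher S/N is better, so the same maximization logic applies regardless of the response type.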
9. Sequential Optimization
Steepest Ascent/Descent:
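∇ŷ = (β̂₁, β̂₂, ..., β̂ₖ)
Step sizes: Δxᵢ ∝ β̂ᵢ in coded units; multiply by the half-range of xᵢ to convert to natural units. For descent, move against the gradient.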
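A sketch of generating candidate runs along the path of steepest ascent. The coefficients, factor ranges, and step length of one coded unit are all assumed values for illustration.

```python
import math


def steepest_ascent_path(beta, half_ranges, centers, n_steps=3, step=1.0):
    """Points along the gradient direction, reported in natural units.

    beta        : first-order coefficients in coded units
    half_ranges : natural-unit change per coded unit for each factor
    centers     : natural-unit center of the current design
    step        : path length per step in coded units (an assumed choice)
    """
    norm = math.sqrt(sum(b * b for b in beta))
    direction = [b / norm for b in beta]          # unit vector in coded space
    path = []
    for m in range(1, n_steps + 1):
        coded = [m * step * d for d in direction]
        natural = [c + x * h for c, x, h in zip(centers, coded, half_ranges)]
        path.append(natural)
    return path


# Hypothetical fit: beta = (3, 4); temperature 950 +/- 50 degC, time 60 +/- 10 min.
path = steepest_ascent_path([3.0, 4.0], [50.0, 10.0], [950.0, 60.0])
```

Runs are made along the path until the response stops improving, at which point a new design is centered on the best point. For steepest descent, pass the negated coefficients.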
10. Model Diagnostics
Coefficient of determination:
R² = 1 - SS_residual/SS_total
Adjusted R² (n runs, p model parameters including the intercept):
R²_adj = 1 - [SS_residual/(n-p)] / [SS_total/(n-1)]
PRESS statistic:
PRESS = Σᵢ(yᵢ - ŷ₍ᵢ₎)², where ŷ₍ᵢ₎ is the prediction for run i from the model fitted without run i
Prediction R²:
R²_pred = 1 - PRESS/SS_total
Variance Inflation Factor:
VIFⱼ = 1/(1 - R²ⱼ), where R²ⱼ comes from regressing xⱼ on the other model terms
VIF > 10 is a common rule-of-thumb threshold for problematic collinearity.
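These diagnostics can be computed together from one least-squares fit. The sketch below uses the hat-matrix shortcut for PRESS (leave-one-out residual = eᵢ/(1 - hᵢᵢ)) and the inverse correlation matrix for VIF. The data are a hypothetical 2² design with two center points.

```python
import numpy as np


def fit_diagnostics(X, y):
    """R^2, adjusted R^2, PRESS, and predicted R^2 for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    resid = y - H @ y
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    # Leave-one-out residuals without refitting: e_i / (1 - h_ii)
    press = float(np.sum((resid / (1.0 - np.diag(H))) ** 2))
    return {
        "r2": 1 - ss_res / ss_tot,
        "r2_adj": 1 - (ss_res / (n - p)) / (ss_tot / (n - 1)),
        "press": press,
        "r2_pred": 1 - press / ss_tot,
    }


def vif(X_terms):
    """VIF of each non-intercept column (diagonal of inverse correlation)."""
    corr = np.corrcoef(np.asarray(X_terms, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Hypothetical 2^2 factorial plus two center points, first-order model.
terms = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1], [0, 0], [0, 0]], float)
y = np.array([98.0, 104.0, 101.0, 109.0, 103.0, 103.0])
X = np.column_stack([np.ones(len(terms)), terms])

diag = fit_diagnostics(X, y)
vifs = vif(terms)            # orthogonal columns give VIF = 1
```

A large gap between R² and R²_pred is a common sign of overfitting or of a few high-leverage runs dominating the fit.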
11. Power and Sample Size
Minimum detectable effect:
δ = σ × √[2(z_(α/2) + z_β)²/n], where n is the number of runs at each level of the factor
Power calculation (normal approximation, σ known):
Power = Φ(|δ|√n / (σ√2) - z_(α/2))
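Both formulas can be evaluated with the standard library. This sketch assumes a two-sided test, known σ, and n counted per factor level; the defaults α = 0.05 and 80% power are conventional choices, not values from the text.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()   # standard normal distribution


def power(delta, sigma, n, alpha=0.05):
    """Approximate two-sided power; n = runs at each level of the factor."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    return _norm.cdf(abs(delta) * sqrt(n) / (sigma * sqrt(2)) - z_alpha)


def runs_per_level(delta, sigma, alpha=0.05, target_power=0.80):
    """Smallest n for which an effect of size delta is detectable."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(target_power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)


n_needed = runs_per_level(delta=1.0, sigma=1.0)      # effect of one sigma
achieved = power(delta=1.0, sigma=1.0, n=n_needed)
```

Because σ is normally estimated rather than known, a t-based calculation would ask for slightly more runs than this approximation.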
12. Multivariate Optimization
Desirability function for target T between L and U:
d = [(y-L)/(T-L)]ˢ when L ≤ y ≤ T
d = [(U-y)/(U-T)]ᵗ when T ≤ y ≤ U
d = 0 otherwise; the exponents s and t set how sharply desirability falls off away from T
Overall desirability:
D = (∏ᵢ dᵢ^wᵢ)^(1/Σᵢwᵢ)
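A sketch of the target-is-best desirability and the weighted geometric mean. The spec limits and responses are hypothetical, and d is set to 0 outside [L, U].

```python
import math


def desirability_target(y, low, target, high, s=1.0, t=1.0):
    """Two-sided desirability: 1 at the target, 0 outside [low, high]."""
    if y < low or y > high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** s
    return ((high - y) / (high - target)) ** t


def overall_desirability(d_values, weights=None):
    """Weighted geometric mean of individual desirabilities."""
    if weights is None:
        weights = [1.0] * len(d_values)
    if any(d == 0.0 for d in d_values):
        return 0.0                       # one unacceptable response vetoes all
    log_sum = sum(w * math.log(d) for d, w in zip(d_values, weights))
    return math.exp(log_sum / sum(weights))


d_thickness = desirability_target(98.0, 95.0, 100.0, 105.0)   # below target
d_uniformity = desirability_target(1.5, 0.0, 1.0, 3.0)        # above target
D = overall_desirability([d_thickness, d_uniformity])
```

The geometric mean is what makes D zero whenever any single response is out of spec, which an arithmetic average would not do.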
13. Process Capability Integration
Cₚ = (USL - LSL) / (6σ)
Cₚₖ = min[(USL - μ)/(3σ), (μ - LSL)/(3σ)]
DOE improves Cₚₖ by identifying settings that center the mean μ between the spec limits and reduce σ.
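The two levers, centering and variance reduction, show up separately in a quick calculation. The spec limits and process values below are hypothetical.

```python
def cp(usl, lsl, sigma):
    """Potential capability: spec width over six sigma."""
    return (usl - lsl) / (6 * sigma)


def cpk(usl, lsl, mu, sigma):
    """Actual capability: distance from the mean to the nearer spec limit."""
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))


# Hypothetical oxide spec 100 +/- 6 with sigma = 1.5 and the mean off-center.
before = cpk(106.0, 94.0, 102.0, 1.5)      # off-center process
centered = cpk(106.0, 94.0, 100.0, 1.5)    # mean moved to target
tightened = cpk(106.0, 94.0, 100.0, 1.0)   # variation also reduced
```

Once the process is centered, Cₚₖ equals Cₚ, and only a reduction in σ can raise it further.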
14. Model Selection
AIC:
AIC = n·ln(SSE/n) + 2p
BIC:
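BIC = n·ln(SSE/n) + p·ln(n)
Here SSE is the residual sum of squares, n the number of runs, and p the number of estimated parameters; lower values indicate a better trade-off between fit and complexity.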
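A sketch comparing a first-order and a second-order model on the same data. The SSE values and parameter counts are invented for illustration.

```python
import math


def aic(sse, n, p):
    """AIC for a least-squares fit, up to an additive constant."""
    return n * math.log(sse / n) + 2 * p


def bic(sse, n, p):
    """BIC for a least-squares fit, up to an additive constant."""
    return n * math.log(sse / n) + p * math.log(n)


# Hypothetical fits to the same n = 20 runs.
n = 20
first_order = {"sse": 40.0, "p": 4}
second_order = {"sse": 12.0, "p": 10}

aic_1 = aic(first_order["sse"], n, first_order["p"])
aic_2 = aic(second_order["sse"], n, second_order["p"])
bic_1 = bic(first_order["sse"], n, first_order["p"])
bic_2 = bic(second_order["sse"], n, second_order["p"])
```

In this example both criteria favor the second-order model, but BIC narrows the margin because its per-parameter penalty ln(n) exceeds 2 once n is 8 or more.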
15. Modern Advances
Definitive Screening Designs (DSD)
• Jones & Nachtsheim (2011)
• Requires only 2k+1 runs for k factors
• Estimates main effects, quadratic effects, and some 2FIs
Bayesian DOE
• Prior: p(β)
• Posterior: p(β|Y) ∝ p(Y|β)p(β)
• Expected Improvement as the acquisition function for choosing the next run
Gaussian Process (Kriging)
• Non-parametric, data-driven
• Provides uncertainty quantification
Summary
DOE provides a rigorous framework for process optimization in settings where:
• Single experiments cost tens of thousands of dollars
• Cycle times span weeks to months
• Maximum information from minimum runs is essential