distilling reasoning ability, model compression
Transfer reasoning from larger model.
656 technical terms and definitions
DistMult represents relations as diagonal matrices for bilinear scoring of knowledge graph triples.
Bilinear model for KG completion.
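As a rough illustration of bilinear scoring with a diagonal relation matrix, the sketch below computes the score as an element-wise product of head, relation, and tail vectors summed over dimensions. The entity names and embedding values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def distmult_score(head, relation, tail):
    """Score a (head, relation, tail) triple as sum_i h_i * r_i * t_i.

    `relation` holds the diagonal of the relation matrix, so the full
    bilinear form h^T diag(r) t reduces to an element-wise product.
    """
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("all embeddings must share one dimension")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


# Hypothetical 3-dimensional embeddings (illustrative values only).
paris = [1.0, 0.5, -1.0]
france = [0.5, 2.0, -1.0]
capital_of = [2.0, 1.0, 0.5]

score = distmult_score(paris, capital_of, france)
reverse = distmult_score(france, capital_of, paris)
```

Because the product is element-wise, swapping head and tail leaves the score unchanged, so this form cannot by itself tell a relation apart from its inverse.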
Distral learns a distilled shared policy from multiple task-specific policies, enabling multi-task transfer.
Multiple ESD devices across chip.
Retrieval across multiple machines.
Track requests across multiple services.
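DDP (Distributed Data Parallel): each GPU holds a full copy of the model, and gradients are synchronized across GPUs. FSDP (Fully Sharded Data Parallel): shards the model across GPUs. Both enable training on multiple GPUs.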
Split training across multiple GPUs or machines.
Match class distributions.
Distribution shift occurs when test data differs from training distribution.
Test distribution differs from training.
Distributional Bellman equations propagate full return distributions rather than only their expectations, which can improve learning stability.
Learn full return distribution.
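One way to make the idea concrete is a categorical backup, where the return distribution lives on a fixed grid of atoms and the shifted distribution of reward + gamma * Z is projected back onto that grid. This is a minimal sketch in the style of categorical distributional RL, not a full agent; the atom grid, reward, and discount below are assumptions picked for illustration.

```python
import math


def categorical_backup(probs, atoms, reward, gamma):
    """Project the distribution of reward + gamma * Z onto the fixed atoms.

    Assumes `atoms` is sorted and evenly spaced, and `probs` sums to 1.
    """
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (len(atoms) - 1)
    target = [0.0] * len(atoms)
    for p, z in zip(probs, atoms):
        # Shift and shrink each atom, then clip to the supported range.
        tz = min(max(reward + gamma * z, v_min), v_max)
        pos = (tz - v_min) / delta
        lower = int(math.floor(pos))
        upper = min(lower + 1, len(atoms) - 1)
        frac = pos - lower
        # Split the probability mass between the two neighbouring atoms.
        target[lower] += p * (1.0 - frac)
        target[upper] += p * frac
    return target


# Illustrative values: three atoms, next-state distribution, one transition.
atoms = [0.0, 1.0, 2.0]
next_probs = [0.5, 0.5, 0.0]
backed_up = categorical_backup(next_probs, atoms, reward=1.0, gamma=0.5)
mean_return = sum(p * z for p, z in zip(backed_up, atoms))
```

When no clipping occurs, the mean of the backed-up distribution matches the ordinary expected Bellman backup (here 1 + 0.5 × 0.5 = 1.25), while the distribution also keeps the spread around that mean.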
Pair of adjacent vacancies.
Class changed for many reasons.
Diverse beam search encourages varied hypotheses through dissimilarity penalties.
Generate diverse beams.
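A dissimilarity penalty can be sketched as a per-token adjustment applied before a beam group picks its next token: tokens already chosen by earlier groups at the same step lose score in proportion to how often they were chosen. The function name, the token scores, and the penalty strength here are all hypothetical; a real decoder would apply this inside its beam loop.

```python
from collections import Counter


def apply_diversity_penalty(log_probs, earlier_group_tokens, strength):
    """Lower the score of tokens that earlier beam groups already selected.

    `log_probs` maps candidate tokens to log-probabilities for the current
    group; `earlier_group_tokens` lists the tokens chosen at this time step
    by previously processed groups.
    """
    counts = Counter(earlier_group_tokens)
    return {
        token: lp - strength * counts[token]
        for token, lp in log_probs.items()
    }


# Illustrative scores: "the" is the most likely token before the penalty.
log_probs = {"the": -0.5, "a": -1.0, "one": -1.5}
adjusted = apply_diversity_penalty(log_probs, ["the", "the"], strength=0.5)
best_token = max(adjusted, key=adjusted.get)
```

With the penalty applied, the current group prefers "a" over "the", which steers it away from the hypotheses the earlier groups are already exploring.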
Balance relevance with variety.
Diversity-based intrinsic motivation rewards an agent for discovering behaviors that maximize state coverage or the mutual information between states and skills.
Diversity regularization in recommendations balances accuracy with result diversity by penalizing overly similar recommended items.
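One simple way to realize such a penalty is greedy re-ranking: pick items one at a time, subtracting from each candidate's relevance a term proportional to its similarity to items already selected. The sketch below is one possible instantiation, not a specific system's method; the item names, relevance scores, genre-based similarity, and penalty weight are all made up for illustration.

```python
def rerank_with_diversity(relevance, similarity, k, penalty):
    """Greedily select k items, penalizing similarity to earlier picks."""
    selected = []
    candidates = set(relevance)

    def adjusted(item):
        # Highest similarity to anything already chosen (0.0 if none yet).
        max_sim = max((similarity(item, s) for s in selected), default=0.0)
        return relevance[item] - penalty * max_sim

    while candidates and len(selected) < k:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=adjusted)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical catalogue: two action films and one drama.
relevance = {"action_1": 0.9, "action_2": 0.85, "drama_1": 0.6}
genre = {"action_1": "action", "action_2": "action", "drama_1": "drama"}


def same_genre(a, b):
    return 1.0 if genre[a] == genre[b] else 0.0


diverse = rerank_with_diversity(relevance, same_genre, k=2, penalty=0.5)
accuracy_only = rerank_with_diversity(relevance, same_genre, k=2, penalty=0.0)
```

With `penalty=0.0` the two action films fill the list; with `penalty=0.5` the second slot goes to the drama, trading a little relevance for variety.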
Diversity sampling selects varied demonstrations covering different task aspects.
Factorize attention across space and time.
Django is a full-featured Python web framework with a built-in ORM, admin interface, and authentication. It is commonly used for enterprise web apps.
Six Sigma improvement process.
Six Sigma methodology.
Define-Measure-Analyze-Improve-Control (DMAIC) provides a structured methodology for process improvement.
Use biological molecules.
Densely connected NAS search space enables rich architectural diversity through dense connections between layers.
Chart deviations from target.
Do-calculus provides rules for deriving causal effects from observational distributions and causal graphs.
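A well-known result that these rules yield is the backdoor adjustment, P(y | do(x)) = Σ_z P(y | x, z) P(z), valid when Z blocks every backdoor path from X to Y. The sketch below evaluates that formula on a tiny discrete example; the variable names and all probabilities are invented for illustration, and the assumption that Z satisfies the backdoor criterion is taken as given rather than checked.

```python
def backdoor_adjust(p_z, p_y_given_xz, x):
    """Return P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z).

    Assumes Z satisfies the backdoor criterion for the effect of X on Y.
    """
    return sum(p_z[z] * p_y_given_xz[(x, z)] for z in p_z)


# Hypothetical observational quantities for binary X, Y, Z.
p_z = {0: 0.5, 1: 0.5}
p_y_given_xz = {
    (0, 0): 0.125,
    (0, 1): 0.625,
    (1, 0): 0.25,
    (1, 1): 0.75,
}

p_do_treated = backdoor_adjust(p_z, p_y_given_xz, x=1)
p_do_control = backdoor_adjust(p_z, p_y_given_xz, x=0)
average_effect = p_do_treated - p_do_control
```

The key point is that the weights come from the marginal P(z), not from P(z | x), which is what separates the interventional quantity from the ordinary conditional P(y | x).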
Paste code or text and say "explain this step by step." I will add comments, clarify logic, and help you refactor or document it clearly.
Containerize ML environments.
I can containerize apps with Docker, write Dockerfiles, and explain basic Kubernetes concepts like pods, services, and deployments.
Generate docstrings and documentation. Explain functions automatically.
Document AI extracts structured data from documents. Layout understanding, table extraction, form parsing.
How to split documents.
Categorize legal documents.
Document expansion enriches documents with predicted relevant terms.
Clean and prepare documents.
Distinguish retrieval and generation quality.
Rotate document to predict original start.
Retrieve document summaries instead of full text.
Generate docstrings and technical documentation from code.
Document learnings, prompts, experiments. Shared wiki prevents knowledge silos. Onboard new team members faster.
# Design of Experiments (DOE) in Semiconductor Manufacturing

DOE is a statistical methodology for systematically investigating relationships between process parameters and responses (yield, thickness, defects, etc.).

## 1. Fundamental Mathematical Model

First-order linear model:

$$y = \beta_0 + \sum_i \beta_i x_i + \varepsilon$$

Second-order model (with curvature and interactions):

$$y = \beta_0 + \sum_i \beta_i x_i + \sum_i \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon$$

Where:
• $y$ = response (oxide thickness, threshold voltage)
• $x_i$ = coded factor levels (scaled to $[-1, +1]$)
• $\beta$ = model coefficients
• $\varepsilon$ = random error, $\varepsilon \sim N(0, \sigma^2)$

## 2. Matrix Formulation

Model in matrix form:

$$Y = X\beta + \varepsilon$$

Least squares estimation:

$$\hat{\beta} = (X'X)^{-1} X'Y$$

Variance-covariance of estimates:

$$\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$

## 3. Factorial Designs

### Full Factorial ($2^k$)

For $k$ factors at 2 levels: requires $2^k$ runs.

Orthogonality property: $X'X = nI$. All effects are estimated independently with equal precision.

### Fractional Factorial ($2^{k-p}$)

Resolution determines confounding:
• Resolution III: Main effects aliased with 2FIs
• Resolution IV: Main effects clear; 2FIs aliased with each other
• Resolution V: Main effects and 2FIs all estimable

For a $2^{5-2}$ design with generators $D = AB$, $E = AC$:
• Defining relation: $I = ABD = ACE = BCDE$
• Find aliases by multiplying the effect by the defining relation

## 4. Response Surface Methodology (RSM)

### Central Composite Design (CCD)

Combines:
• $2^k$ or $2^{k-p}$ factorial points
• $2k$ axial points at $\pm\alpha$ from center
• $n_0$ center points

Rotatability condition:

$$\alpha = (2^k)^{1/4} = F^{1/4}$$

• For $k=2$: $\alpha = \sqrt{2} \approx 1.414$
• For $k=3$: $\alpha = 2^{3/4} \approx 1.682$

### Box-Behnken Design

• 3 levels per factor
• No corner points (useful when extremes are dangerous)
• More economical than CCD for 3+ factors

## 5. Optimal Design Theory

D-optimal: Maximize $|X'X|$
• Minimizes volume of joint confidence region

A-optimal: Minimize $\mathrm{trace}[(X'X)^{-1}]$
• Minimizes average variance of estimates

I-optimal: Minimize integrated prediction variance, $\int \mathrm{Var}[\hat{y}(x)]\,dx$

G-optimal: Minimize maximum prediction variance

## 6. Analysis of Variance (ANOVA)

Sum of squares decomposition:

$$SS_{\text{total}} = SS_{\text{model}} + SS_{\text{residual}}$$

$$SS_{\text{model}} = \sum_i (\hat{y}_i - \bar{y})^2$$

$$SS_{\text{residual}} = \sum_i (y_i - \hat{y}_i)^2$$

F-test for significance:

$$F = \frac{MS_{\text{effect}}}{MS_{\text{error}}} = \frac{SS_{\text{effect}} / df_{\text{effect}}}{SS_{\text{error}} / df_{\text{error}}}$$

Effect estimation:

$$\text{Effect}_A = \bar{y}_{A+} - \bar{y}_{A-}, \qquad \hat{\beta}_A = \text{Effect}_A / 2$$

## 7. Semiconductor-Specific Designs

### Split-Plot Designs

For hard-to-change factors (temperature, pressure) vs easy-to-change (gas flow):

$$y_{ijk} = \mu + \alpha_i + \delta_{ij} + \beta_k + (\alpha\beta)_{ik} + \varepsilon_{ijk}$$

Where:
• $\alpha_i$ = whole-plot factor (hard to change)
• $\delta_{ij}$ = whole-plot error
• $\beta_k$ = subplot factor (easy to change)
• $\varepsilon_{ijk}$ = subplot error

### Variance Components (Nested Designs)

For Lots → Wafers → Dies → Measurements:

$$\sigma^2_{\text{total}} = \sigma^2_{\text{lot}} + \sigma^2_{\text{wafer}} + \sigma^2_{\text{die}} + \sigma^2_{\text{meas}}$$

### Mixture Designs

For etch gas chemistry where components sum to 1:

$$\sum_i x_i = 1$$

Uses simplex-lattice designs and Scheffé models.

## 8. Robust Parameter Design (Taguchi)

Signal-to-Noise ratios:

Nominal-is-best: $S/N = 10 \log_{10}(\bar{y}^2 / s^2)$

Smaller-is-better: $S/N = -10 \log_{10}\left[\frac{1}{n} \sum_i y_i^2\right]$

Larger-is-better: $S/N = -10 \log_{10}\left[\frac{1}{n} \sum_i \frac{1}{y_i^2}\right]$

## 9. Sequential Optimization

Steepest Ascent/Descent:

$$\nabla y = (\beta_1, \beta_2, \ldots, \beta_k)$$

Step sizes: $\Delta x_i \propto \beta_i \times (\text{range of } x_i)$

## 10. Model Diagnostics

Coefficient of determination:

$$R^2 = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}$$

Adjusted $R^2$:

$$R^2_{\text{adj}} = 1 - \frac{SS_{\text{residual}} / (n-p)}{SS_{\text{total}} / (n-1)}$$

PRESS statistic:

$$\text{PRESS} = \sum_i (y_i - \hat{y}_{(i)})^2$$

Prediction $R^2$:

$$R^2_{\text{pred}} = 1 - \frac{\text{PRESS}}{SS_{\text{total}}}$$

Variance Inflation Factor:

$$\text{VIF}_j = \frac{1}{1 - R^2_j}$$

VIF > 10 indicates problematic collinearity.

## 11. Power and Sample Size

Minimum detectable effect:

$$\delta = \sigma \sqrt{\frac{2 (z_{\alpha/2} + z_\beta)^2}{n}}$$

Power calculation:

$$\text{Power} = \Phi\left(\frac{|\delta| \sqrt{n}}{\sigma \sqrt{2}} - z_{\alpha/2}\right)$$

## 12. Multivariate Optimization

Desirability function for target $T$ between $L$ and $U$:

$$d = \left[\frac{y - L}{T - L}\right]^s \quad \text{when } L \le y \le T$$

$$d = \left[\frac{U - y}{U - T}\right]^t \quad \text{when } T \le y \le U$$

Overall desirability:

$$D = \left(\prod_i d_i^{w_i}\right)^{1 / \sum_i w_i}$$

## 13. Process Capability Integration

$$C_p = \frac{USL - LSL}{6\sigma}$$

$$C_{pk} = \min\left[\frac{USL - \mu}{3\sigma}, \frac{\mu - LSL}{3\sigma}\right]$$

DOE improves $C_{pk}$ by centering and reducing variation.

## 14. Model Selection

AIC: $\text{AIC} = n \ln(SSE/n) + 2p$

BIC: $\text{BIC} = n \ln(SSE/n) + p \ln(n)$

## 15. Modern Advances

### Definitive Screening Designs (DSD)

• Jones & Nachtsheim (2011)
• Requires only $2k+1$ runs for $k$ factors
• Estimates main effects, quadratic effects, and some 2FIs

### Bayesian DOE

• Prior: $p(\beta)$
• Posterior: $p(\beta \mid Y) \propto p(Y \mid \beta)\, p(\beta)$
• Expected Improvement for sequential selection

### Gaussian Process (Kriging)

• Non-parametric, data-driven
• Provides uncertainty quantification

## Summary

DOE provides the rigorous framework for process optimization where:
• Single experiments cost tens of thousands of dollars
• Cycle times span weeks to months
• Maximum information from minimum runs is essential
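The full factorial and effect-estimation formulas in the DOE entry above can be sketched in a few lines. This is a minimal illustration, not a production analysis: it builds a 2³ design in coded units, checks that the factor columns are mutually orthogonal, and estimates each main effect as the difference between the mean response at the high and low levels. The response values are generated from a hypothetical noise-free model (y = 100 + 5·x₁ − 2·x₂) so the recovered coefficients can be checked by eye.

```python
from itertools import product


def full_factorial(k):
    """Return all 2**k runs of a two-level design in coded units (-1, +1)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def is_orthogonal(design):
    """Check that every pair of factor columns has zero dot product."""
    k = len(design[0])
    return all(
        sum(run[a] * run[b] for run in design) == 0
        for a in range(k)
        for b in range(a + 1, k)
    )


def main_effects(design, responses):
    """Effect_j = mean(y at x_j = +1) - mean(y at x_j = -1)."""
    effects = []
    for j in range(len(design[0])):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


design = full_factorial(3)

# Hypothetical noise-free responses: y = 100 + 5*x1 - 2*x2 (x3 inactive).
responses = [100 + 5 * x1 - 2 * x2 + 0 * x3 for x1, x2, x3 in design]

effects = main_effects(design, responses)
# Regression coefficients in coded units are half the effects.
coefficients = [effect / 2 for effect in effects]
```

Because the columns are orthogonal, each effect is estimated independently of the others, which is why the simple high-minus-low averages recover the coefficients 5, −2, and 0 used to generate the data.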
Recognize coded language with hidden meaning.
Dolly is Databricks' open-source instruction-tuned language model.
Domain adaptation in ASR transfers models across acoustic environments or speaking styles.
Domain adaptation techniques reduce distribution shift when applying recommendation models to new domains.