single-node multi-GPU, distributed training
Multiple GPUs in one machine.
Process one unit at a time.
Process one wafer at a time for tight control.
Clean, rinse, and dry one wafer at a time for tighter control.
Containers for HPC environments.
Fixed sinusoidal position functions.
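A minimal sketch of fixed sinusoidal position functions, assuming the usual Transformer form: even dimensions use sin(pos / 10000^(2i/d)) and odd dimensions use cos of the same angle. The function name `sinusoidal_encoding` and the base 10000 are assumptions for illustration.

```python
import math


def sinusoidal_encoding(num_positions: int, d_model: int) -> list[list[float]]:
    """Return a num_positions x d_model table of fixed (non-learned) encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for j in range(d_model):
            i = j // 2  # index of the sin/cos dimension pair
            angle = pos / (10000 ** (2 * i / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = sinusoidal_encoding(num_positions=4, d_model=8)
print(pe[0])  # position 0 alternates sin(0) = 0.0 and cos(0) = 1.0
```

Because nothing is learned, the table can be computed for any sequence length at inference time.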
Silicon oxynitride (SiON) interfacial layer (IL).
Use periodic (sine) activations for implicit neural representations (INRs).
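A minimal pure-Python sketch of a sine-activated layer in the SIREN style, computing sin(omega_0 * (W x + b)). The frequency scale omega_0 = 30 and the weight initialization (first layer U(-1/n, 1/n), hidden layers U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0)) follow the commonly cited SIREN recipe; the bias initialization and all function names here are assumptions.

```python
import math
import random

OMEGA_0 = 30.0  # assumed frequency scale applied inside every sine activation


def siren_layer_init(n_in, n_out, first_layer, rng):
    # First layer: U(-1/n_in, 1/n_in); hidden layers: U(-sqrt(6/n_in)/omega_0, +...).
    bound = 1.0 / n_in if first_layer else math.sqrt(6.0 / n_in) / OMEGA_0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]  # assumed bias init
    return weights, biases


def siren_layer_forward(x, weights, biases):
    # Periodic activation: sin(omega_0 * (W x + b)) for each output unit.
    return [
        math.sin(OMEGA_0 * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
w1, b1 = siren_layer_init(2, 16, first_layer=True, rng=rng)
w2, b2 = siren_layer_init(16, 16, first_layer=False, rng=rng)
# Map a 2D coordinate through two sine layers.
h = siren_layer_forward(siren_layer_forward([0.5, -0.25], w1, b1), w2, b2)
```

The scaled initialization keeps pre-activations in a range where the sine does not collapse or saturate as depth grows.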
Test after installation.
Flatness within a small measurement site.
The six big losses are breakdowns, setup time, small stops, reduced speed, startup defects, and quality defects.
Categories reducing OEE.
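A small sketch of how the six loss categories feed OEE = availability × performance × quality, assuming the conventional mapping: breakdowns and setup reduce availability, small stops and reduced speed reduce performance, and startup and quality defects reduce quality. The function name and the example shift figures are hypothetical.

```python
def oee(planned_time_min, downtime_min, ideal_cycle_time_min, total_count, good_count):
    # Availability: breakdowns and setup time reduce run time.
    run_time = planned_time_min - downtime_min
    availability = run_time / planned_time_min
    # Performance: small stops and reduced speed lower output vs. the ideal rate.
    performance = ideal_cycle_time_min * total_count / run_time
    # Quality: startup and quality defects reduce the share of good parts.
    quality = good_count / total_count
    return availability * performance * quality


# Hypothetical shift: 480 min planned, 60 min down, 1 min ideal cycle, 400 made, 380 good.
score = oee(480, 60, 1.0, 400, 380)
print(f"OEE = {score:.1%}")
```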
3.4 defects per million opportunities.
Six Sigma is a data-driven approach to eliminating defects, targeting 3.4 DPMO.
Quality methodology targeting 3.4 defects per million opportunities.
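A minimal sketch of the DPMO arithmetic and the conversion to a sigma level, assuming the conventional 1.5-sigma long-term shift (which is what makes 3.4 DPMO correspond to "six sigma"). Function names and the inspection counts are hypothetical.

```python
from statistics import NormalDist


def dpmo(defects, units, opportunities_per_unit):
    # Defects per million opportunities.
    return defects / (units * opportunities_per_unit) * 1_000_000


def sigma_level(dpmo_value, shift=1.5):
    # Convert the defect rate to a z-score, then add the assumed 1.5-sigma shift.
    z = NormalDist().inv_cdf(1 - dpmo_value / 1_000_000)
    return z + shift


# Hypothetical inspection: 17 defects in 1,000 units with 5 opportunities each.
rate = dpmo(17, 1000, 5)
print(rate, round(sigma_level(rate), 2))
print(round(sigma_level(3.4), 2))  # roughly 6
```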
Recognize actions from pose sequences.
Generate sketch-like images.
Ensure signals arrive at the same time.
Skew is the timing difference between related signals, affecting differential signaling and bus synchronization.
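A small sketch of the skew arithmetic: for a group of related signals, skew is the latest arrival minus the earliest. The arrival times and the 10 ps budget below are hypothetical values chosen for illustration, not a design rule.

```python
def skew_ps(arrival_times_ps):
    # Skew across related signals: latest arrival minus earliest arrival.
    times = list(arrival_times_ps)
    return max(times) - min(times)


# Hypothetical arrival times (ps) for the two legs of a differential pair.
pair = {"p": 812.0, "n": 805.0}
SKEW_BUDGET_PS = 10.0  # hypothetical intra-pair budget

pair_skew = skew_ps(pair.values())
print(pair_skew, pair_skew <= SKEW_BUDGET_PS)
```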
Skill discovery in RL learns diverse, reusable behaviors without external rewards by optimizing information-theoretic objectives.
Skills matrices track employee competencies, identifying training needs.
Skin effect concentrates high-frequency current near the conductor surface, increasing effective resistance.
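The effect can be quantified with the standard skin-depth formula, delta = sqrt(rho / (pi * f * mu)), the depth at which current density falls to 1/e of its surface value. This sketch assumes a copper resistivity of about 1.68e-8 ohm·m and a non-magnetic conductor (mu_r = 1).

```python
import math

MU_0 = 4 * math.pi * 1e-7  # permeability of free space, H/m
RHO_CU = 1.68e-8  # approximate room-temperature resistivity of copper, ohm*m


def skin_depth(resistivity_ohm_m, freq_hz, mu_r=1.0):
    # delta = sqrt(rho / (pi * f * mu_r * mu_0))
    return math.sqrt(resistivity_ohm_m / (math.pi * freq_hz * mu_r * MU_0))


for f in (1e6, 1e9):
    print(f"{f:.0e} Hz: {skin_depth(RHO_CU, f) * 1e6:.2f} um")
```

Because delta shrinks as 1/sqrt(f), the current-carrying cross-section narrows at high frequency, which is why effective resistance rises.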
Classify skin conditions from photos.
Initialize residual branches to zero.
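A minimal sketch of the idea, assuming the variant where each residual branch is scaled by a learnable scalar `alpha` initialized to zero, so every block starts as the identity map. `toy_branch` is a hypothetical stand-in for the layers inside the branch.

```python
def residual_block(x, branch, alpha):
    # Output = x + alpha * f(x); alpha is a learnable scalar in a real model.
    return [xi + alpha * fi for xi, fi in zip(x, branch(x))]


def toy_branch(x):
    # Hypothetical stand-in for conv/MLP layers inside the residual branch.
    return [2.0 * xi + 1.0 for xi in x]


alpha = 0.0  # zero initialization: the block passes its input through unchanged
x = [1.0, -2.0, 3.0]
print(residual_block(x, toy_branch, alpha))
```

Starting at the identity keeps the signal scale stable at initialization, and training can then grow `alpha` as the branch becomes useful.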
SkipNet learns to skip layers adaptively per input, reducing computation.
Service Level Agreements define expected supplier performance, including delivery times, defect rates, and response requirements.
Define SLAs, for example 99.9% uptime and p95 latency under 500 ms. Monitor and alert on SLA breaches. Reliability is a feature.
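A minimal sketch of checking a latency SLA such as "p95 under 500 ms", assuming the nearest-rank percentile definition (other percentile interpolation methods give slightly different values). Function names and sample data are hypothetical.

```python
def p95(latencies_ms):
    # Nearest-rank percentile: the ceil(0.95 * n)-th smallest sample.
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil avoids float rounding
    return ordered[rank - 1]


def sla_breached(latencies_ms, threshold_ms=500.0):
    # The SLA requires p95 < threshold, so p95 >= threshold is a breach.
    return p95(latencies_ms) >= threshold_ms


healthy = [100.0] * 100
slow_tail = [100.0] * 90 + [800.0] * 10  # 10% of requests are slow
print(sla_breached(healthy), sla_breached(slow_tail))
```

In practice this check would run over a rolling window of request latencies and trigger an alert when it returns `True`.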
Learning-based SLAM methods.
Learning rate schedule for transfer learning.
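Assuming this entry refers to slanted triangular learning rates (STLR) from ULMFiT, the schedule rises linearly for the first `cut_frac` of training and then decays linearly, with `ratio` setting how much smaller the lowest rate is than `eta_max`. The defaults below (eta_max = 0.01, cut_frac = 0.1, ratio = 32) are the values usually quoted for ULMFiT; treat them as assumptions.

```python
import math


def stlr(t, total_steps, eta_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at step t (assumes total_steps * cut_frac >= 1)."""
    cut = math.floor(total_steps * cut_frac)
    if t < cut:
        p = t / cut  # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    return eta_max * (1 + p * (ratio - 1)) / ratio


schedule = [stlr(t, 1000) for t in range(1000)]
print(max(schedule), schedule[0])
```

The short increase lets the model move quickly toward a task-suitable region, and the long decay refines the parameters.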
Slate recommendation jointly optimizes ordered lists of items considering inter-item dependencies and position effects.
Slate-level bandits extend contextual bandits to jointly optimize item sets considering within-slate interactions.
Slew rate measures signal transition speed, affecting noise generation and timing margins.
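A small sketch of the slew-rate arithmetic, assuming the edge is measured between the 10% and 90% points so the voltage covered is 0.8 of the full swing (some specifications use 20% to 80% instead). The 3.3 V swing and 1 ns transition are hypothetical.

```python
def slew_rate_v_per_ns(v_swing, t_10_90_ns):
    # Voltage change over the 10%-90% portion of the edge divided by its duration.
    return 0.8 * v_swing / t_10_90_ns


print(f"{slew_rate_v_per_ns(3.3, 1.0):.2f} V/ns")
```

A faster slew rate improves timing margin but increases di/dt, which raises crosstalk and ground bounce.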
Create presentation slides.
Only attend to recent tokens within a fixed window.
Attention mechanism that only attends to nearby tokens within a fixed window to handle long sequences.
Move the window as the conversation progresses.
Process overlapping windows.
Sliding window attention restricts attention to local neighborhoods, reducing complexity.
Sliding window forecasting maintains a fixed-length training history, discarding the oldest observations.
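A minimal sketch of a fixed-length training history: `collections.deque` with `maxlen` drops the oldest observation automatically when a new one arrives. The "model" here is a hypothetical window mean refit at every step, standing in for any forecaster.

```python
from collections import deque


def sliding_window_forecasts(series, window=3):
    history = deque(maxlen=window)  # keeps only the most recent `window` points
    forecasts = []
    for y in series:
        if len(history) == window:
            # Hypothetical model: forecast the next value as the window mean.
            forecasts.append(sum(history) / window)
        history.append(y)  # oldest value is discarded once the deque is full
    return forecasts


print(sliding_window_forecasts([1, 2, 3, 4, 5, 6], window=3))
```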
Sliding window attention limits each token to attending over nearby tokens, so cost grows linearly with sequence length for a fixed window size. Mistral 7B uses a 4,096-token sliding window.
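A minimal sketch of a causal sliding-window attention mask, where query i may attend to key j only if j <= i and i - j < window. The exact boundary convention (whether the window includes W or W + 1 positions) varies between implementations and is an assumption here.

```python
def sliding_window_mask(seq_len, window):
    # mask[i][j] is True when query i may attend to key j:
    # causal (j <= i) and within the last `window` positions (i - j < window).
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


mask = sliding_window_mask(seq_len=5, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed keys, so the total number of attended pairs is about seq_len × window rather than seq_len².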
Train a single network supporting multiple widths.
SLOs define target reliability (e.g., 99.9% uptime). The error budget approach treats the allowed unreliability as a budget, balancing reliability against development velocity.
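A minimal sketch of the error-budget arithmetic: the budget is (1 - SLO) times the measurement period. The 30-day window and the downtime figure are assumptions for illustration.

```python
def error_budget_minutes(slo, period_days=30):
    # Allowed unavailability = (1 - SLO) * period length in minutes.
    return (1 - slo) * period_days * 24 * 60


def budget_remaining(slo, downtime_min, period_days=30):
    return error_budget_minutes(slo, period_days) - downtime_min


print(f"{error_budget_minutes(0.999):.1f} min per 30 days")
print(f"{budget_remaining(0.999, downtime_min=30):.1f} min left")  # hypothetical 30 min outage
```

While budget remains, teams can ship changes faster; once it is spent, effort shifts toward reliability work.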
Create memorable, catchy slogans and taglines.
Decompose scenes into object slots.
Extract required information.
Extract specific information to fill predefined slots.
Design rules for openings in metal planes.
Represent information in discrete slots.
The slow corner combines the process, voltage, and temperature conditions that minimize transistor speed and is used to verify setup timing.
Learn slowly varying features.