Ensure experts are used roughly equally to avoid underutilization.
Load balancing distributes work evenly, preventing bottlenecks in agent systems.
Load balancing loss encourages uniform expert utilization.
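A minimal sketch of one common auxiliary load-balancing loss (a Switch-Transformer-style formulation in NumPy; function and variable names are illustrative): the product of each expert's fraction of routed tokens and its mean routing probability, summed and scaled by the number of experts, is smallest when utilization is uniform.

    import numpy as np

    def load_balancing_loss(router_logits, expert_index):
        """router_logits: (tokens, experts); expert_index: (tokens,) chosen expert per token."""
        num_tokens, num_experts = router_logits.shape
        probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)             # softmax over experts
        frac_tokens = np.bincount(expert_index, minlength=num_experts) / num_tokens  # f_i
        mean_probs = probs.mean(axis=0)                         # P_i
        return num_experts * np.dot(frac_tokens, mean_probs)    # minimized when both are uniform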
Load shedding rejects requests when system capacity is exceeded, preventing overload.
The local level model is the simplest structural time series model, with a stochastically evolving level component.
Perform multiple local updates before synchronizing.
Local trend model includes both stochastically evolving level and slope components for trending series.
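As a sketch, both models can be written as state-space recursions: y_t = mu_t + eps_t with mu_t = mu_{t-1} + beta_{t-1} + xi_t and beta_t = beta_{t-1} + zeta_t; setting the slope and its noise to zero recovers the local level model. The NumPy simulation below uses illustrative noise scales.

    import numpy as np

    def simulate_local_trend(n=200, sigma_eps=1.0, sigma_xi=0.1, sigma_zeta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        level, slope = 0.0, 0.0
        y = np.empty(n)
        for t in range(n):
            y[t] = level + rng.normal(0, sigma_eps)     # observation: y_t = mu_t + eps_t
            level += slope + rng.normal(0, sigma_xi)    # mu_{t+1} = mu_t + beta_t + xi_t
            slope += rng.normal(0, sigma_zeta)          # beta_{t+1} = beta_t + zeta_t (zero for local level)
        return y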
Combine local sliding window with sparse global attention.
Locally typical sampling selects tokens whose information content is close to the expected information content (conditional entropy) given the context.
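A minimal sketch (NumPy; the mass threshold tau is illustrative): tokens are ranked by how far their surprisal is from the distribution's entropy, and the smallest set of most-typical tokens covering tau probability mass is sampled from.

    import numpy as np

    def typical_sampling(logits, tau=0.95, seed=None):
        rng = np.random.default_rng(seed)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        entropy = -(p * np.log(p + 1e-12)).sum()          # expected information content
        score = np.abs(-np.log(p + 1e-12) - entropy)      # distance from typical surprisal
        order = np.argsort(score)                         # most "typical" tokens first
        k = np.searchsorted(np.cumsum(p[order]), tau) + 1
        keep = order[:k]
        q = p[keep] / p[keep].sum()
        return rng.choice(keep, p=q)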
Lock-in thermography applies periodic electrical stimulation and phase-sensitive thermal imaging to detect weak heat sources from intermittent defects.
Thermal imaging of defects.
Local Outlier Factor adapted for time series detects anomalies by comparing local densities in feature space.
Local Outlier Factor for time series detects anomalies by comparing densities in windowed feature space.
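A minimal sketch assuming scikit-learn (window length and neighbor count are illustrative): the series is embedded as overlapping windows and each window is scored by LocalOutlierFactor.

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    def lof_anomaly_scores(series, window=24, n_neighbors=20):
        # Embed the series as overlapping windows, then compare local densities.
        X = np.lib.stride_tricks.sliding_window_view(series, window)
        lof = LocalOutlierFactor(n_neighbors=n_neighbors)
        labels = lof.fit_predict(X)                       # -1 marks anomalous windows
        return labels, -lof.negative_outlier_factor_      # larger score = more anomalous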
Logarithmic quantization represents values in the log domain, improving dynamic range.
Log-Gaussian Cox processes use Gaussian random fields to model spatially or temporally varying intensity functions.
Use logarithmic scale for quantization.
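A minimal sketch (NumPy; the exponent range is illustrative) of power-of-two quantization: each value is snapped to the nearest signed power of two by rounding its log2 magnitude.

    import numpy as np

    def log2_quantize(x, min_exp=-8, max_exp=0):
        sign = np.sign(x)
        mag = np.abs(x)
        exp = np.clip(np.round(np.log2(np.maximum(mag, 2.0**min_exp))), min_exp, max_exp)
        return sign * 2.0**exp    # values become signed powers of two (zeros stay zero)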
Use LLMs to interact with logic systems.
Logistics optimization determines efficient transportation routing, warehousing, and distribution strategies.
Logit bias adjusts token probabilities to encourage or discourage specific outputs.
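A minimal sketch (NumPy; token ids and bias values are illustrative) of adding a logit-bias map to the raw logits before softmax, in the spirit of APIs that accept a logit_bias dictionary:

    import numpy as np

    def apply_logit_bias(logits, bias):    # bias: {token_id: additive_bias}
        out = logits.copy()
        for token_id, b in bias.items():
            out[token_id] += b             # a large negative bias effectively bans a token
        return out

    logits = np.array([1.0, 2.0, 0.5])
    biased = apply_logit_bias(logits, {2: 5.0, 0: -100.0})   # encourage token 2, suppress token 0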
Decode intermediate activations.
Models handling 100K+ tokens.
Long convolutions model extended dependencies through large kernel sizes.
Identify overly long methods.
Deal with prompts exceeding limit.
Long-tail recommendation focuses on effectively suggesting less popular items with few interactions.
Long-term memory stores experiences and knowledge for retrieval in future tasks.
Capture dependencies across many frames.
Combination of local and global attention.
Longformer combines local sliding window with global attention for efficient long context.
Model with local+global attention for long documents.
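A minimal sketch (NumPy; window size and global positions are illustrative) of the combined attention mask: every token sees a local sliding window, while designated global tokens attend to, and are attended by, all positions.

    import numpy as np

    def local_global_mask(seq_len, window=2, global_idx=(0,)):
        i = np.arange(seq_len)
        mask = np.abs(i[:, None] - i[None, :]) <= window    # sliding-window neighborhood
        mask[list(global_idx), :] = True                    # global tokens attend everywhere
        mask[:, list(global_idx)] = True                    # and every token attends to them
        return mask                                         # True = attention allowed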
Lookahead decoding generates multiple future tokens simultaneously when possible.
Loop optimization reorders and transforms loops maximizing parallelism and data locality.
Loop unrolling replicates loop bodies reducing branching overhead and enabling instruction-level parallelism.
LoRA and DreamBooth customize diffusion models by training on a few images for personalized generation.
Low-Rank Adaptation fine-tunes diffusion models efficiently by learning low-rank weight updates.
Efficient fine-tuning with low-rank adaptation.
Efficient fine-tuning of diffusion models with low-rank adapters.
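A minimal sketch of the low-rank update (NumPy; rank and scaling are illustrative): the frozen weight W is augmented with a learned correction (alpha/r)·B·A, so only the small matrices A and B are trained.

    import numpy as np

    class LoRALinear:
        def __init__(self, W, r=8, alpha=16, seed=0):
            rng = np.random.default_rng(seed)
            self.W = W                                      # frozen pretrained weight, (out, in)
            self.A = rng.normal(0, 0.01, (r, W.shape[1]))   # trainable down-projection
            self.B = np.zeros((W.shape[0], r))              # trainable up-projection, zero-init
            self.scale = alpha / r

        def __call__(self, x):                              # x: (batch, in)
            return x @ (self.W + self.scale * self.B @ self.A).T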
Combine multiple LoRAs.
Multiply loss by constant to prevent gradients from underflowing in FP16.
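A small worked example (NumPy; the scale factor is illustrative) of why this helps: a gradient of 1e-8 underflows to zero in FP16, but scaling keeps it representable, and dividing by the same factor in FP32 before the optimizer step recovers the true value.

    import numpy as np

    true_grad = 1e-8
    print(np.float16(true_grad))              # 0.0 -> underflows, the update is lost
    scale = 1024.0
    scaled = np.float16(true_grad * scale)    # ~1.0e-05, representable in FP16
    print(np.float32(scaled) / scale)         # ~1e-8 recovered after unscaling in FP32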
Loss spikes indicate training instability. Reduce the learning rate, check the data, and add gradient clipping; training may need to be restarted.
Sudden increases in loss during training.
Lot sizing determines optimal production or order quantities balancing setup costs and inventory.
Lottery ticket hypothesis posits that dense networks contain sparse subnetworks trainable to full accuracy.
Sparse subnetworks that train from scratch.
Fast community detection method.
Small misorientation between grains.
Use FP16 or BF16 for training.
Low-rank factorization decomposes weight matrices into products of smaller matrices.
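A minimal sketch (NumPy; the rank is illustrative) using a truncated SVD, replacing one large matrix multiply with two smaller ones:

    import numpy as np

    def low_rank_factorize(W, k=32):
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        W1 = U[:, :k] * s[:k]     # (out, k)
        W2 = Vt[:k, :]            # (k, in)
        return W1, W2             # W ≈ W1 @ W2; forward pass becomes x @ W2.T @ W1.T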
Efficient tensor fusion.
Bound perturbations in Lp norm.
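A minimal sketch (NumPy; eps is illustrative) of projecting a perturbation back onto an Lp ball for the two most common norms:

    import numpy as np

    def project_lp(delta, eps, p):
        if p == np.inf:
            return np.clip(delta, -eps, eps)                    # elementwise bound
        if p == 2:
            norm = np.linalg.norm(delta)
            return delta if norm <= eps else delta * (eps / norm)
        raise NotImplementedError("only L2 and L-inf shown here")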