Dropped Tokens

Dropped tokens are tokens that a sparse Mixture of Experts (MoE) model discards because the expert the router selected for them has already filled its fixed capacity buffer. A dropped token receives no computation from that expert, which causes information loss, training instability, and inconsistent outputs: whether a given token is dropped depends on which other tokens share its batch. Token dropping is the most visible failure mode of discrete top-k routing in MoE architectures. It has driven the development of alternative routing strategies (expert choice routing, soft MoE) and of mitigations such as capacity-factor tuning, all aimed at eliminating or minimizing this pathological behavior.

What Are Dropped Tokens?
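
To make the definition concrete, the sketch below simulates top-1 routing with a fixed capacity buffer per expert. It assumes the common sizing rule capacity = ceil(tokens / experts × capacity factor) and fills each buffer in token order. Real systems may round differently or prioritize tokens by router confidence instead. `expert_capacity` and `route_top1` are hypothetical helpers, not a library API.

```python
import math
from typing import List, Tuple


def expert_capacity(num_tokens: int, num_experts: int, capacity_factor: float) -> int:
    # Assumed sizing rule: slots per expert = ceil(tokens / experts * capacity_factor).
    return math.ceil(num_tokens / num_experts * capacity_factor)


def route_top1(
    expert_ids: List[int], num_experts: int, capacity_factor: float
) -> Tuple[List[List[int]], List[int]]:
    """Fill each expert's buffer in token order; overflow tokens are dropped.

    expert_ids[t] is the expert the router chose for token t (argmax of its logits).
    Returns (token indices accepted per expert, indices of dropped tokens).
    """
    cap = expert_capacity(len(expert_ids), num_experts, capacity_factor)
    buffers: List[List[int]] = [[] for _ in range(num_experts)]
    dropped: List[int] = []
    for t, e in enumerate(expert_ids):
        if len(buffers[e]) < cap:
            buffers[e].append(t)
        else:
            dropped.append(t)  # expert e is full, so token t gets no expert computation
    return buffers, dropped


# 8 tokens, 4 experts, capacity factor 1.0 -> 2 slots per expert.
buffers, dropped = route_top1([0, 0, 0, 1, 0, 2, 3, 3], num_experts=4, capacity_factor=1.0)
print(buffers)  # [[0, 1], [3], [5], [6, 7]]
print(dropped)  # [2, 4]
```

Expert 0 is chosen by four tokens but has only two slots, so tokens 2 and 4 overflow and are dropped, a 25% drop rate even though experts 1 and 2 sit half empty.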

Why Dropped Tokens Are a Problem
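
Why is a dropped token harmful? In typical implementations the token is not deleted from the sequence. Instead it skips the expert feed-forward computation and reaches the next layer only through the residual connection, so the layer contributes nothing to its representation. The toy layer below makes that visible. It is a simplified sketch: experts are arbitrary Python callables, the router probability that usually scales the expert output is omitted, and `moe_layer_with_residual` is a hypothetical name.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def moe_layer_with_residual(
    hidden: Sequence[Vector],
    expert_ids: Sequence[int],
    experts: Sequence[Callable[[Vector], Vector]],
    capacity: int,
) -> List[Vector]:
    """Toy top-1 MoE layer: output = h + expert(h), or just h if the token is dropped.

    Simplification: the router probability that usually scales expert(h) is omitted.
    """
    load = [0] * len(experts)
    outputs: List[Vector] = []
    for h, e in zip(hidden, expert_ids):
        if load[e] < capacity:
            load[e] += 1
            update = experts[e](h)
        else:
            update = [0.0] * len(h)  # dropped: only the residual path survives
        outputs.append([x + u for x, u in zip(h, update)])
    return outputs


def double(v: Vector) -> Vector:
    return [2.0 * x for x in v]


def add_one(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


hidden = [[1.0, 1.0], [2.0, 0.0], [3.0, 3.0]]
out = moe_layer_with_residual(hidden, expert_ids=[0, 0, 1], experts=[double, add_one], capacity=1)
print(out)  # [[3.0, 3.0], [2.0, 0.0], [7.0, 7.0]]
```

The second token leaves the layer unchanged only because the first token claimed expert 0's single slot. Reordering the batch would change which token is dropped, which is one concrete source of the inconsistent outputs described above.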

Mitigation Strategies

Capacity Factor Tuning:
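
Capacity-factor tuning trades dropped tokens for wasted compute. A larger capacity factor gives every expert more slots, so fewer tokens overflow, but lightly loaded experts then carry more empty (padded) slots. The sketch below quantifies both sides for one fixed, deliberately skewed routing assignment. It assumes the same ceil(tokens / experts × capacity factor) sizing rule, and `capacity_tradeoff` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import List, NamedTuple


class CapacityStats(NamedTuple):
    capacity: int        # slots per expert
    drop_rate: float     # fraction of tokens that overflowed a full expert
    padding_rate: float  # fraction of allocated slots left empty


def capacity_tradeoff(expert_ids: List[int], num_experts: int, capacity_factor: float) -> CapacityStats:
    n = len(expert_ids)
    cap = math.ceil(n / num_experts * capacity_factor)  # assumed sizing rule
    counts = Counter(expert_ids)
    processed = sum(min(counts[e], cap) for e in range(num_experts))
    total_slots = cap * num_experts
    return CapacityStats(cap, (n - processed) / n, (total_slots - processed) / total_slots)


# Skewed routing: 16 tokens over 4 experts with loads 7, 5, 3, 1.
ids = [0] * 7 + [1] * 5 + [2] * 3 + [3]
for cf in (1.0, 1.25, 2.0):
    print(cf, capacity_tradeoff(ids, num_experts=4, capacity_factor=cf))
```

For this assignment the drop rate falls from 25% to 12.5% to 0% as the capacity factor goes from 1.0 to 1.25 to 2.0, while the share of padded slots rises from 25% to 30% to 50%. Raising the capacity factor treats the symptom; the load imbalance itself remains.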

Load Balancing Loss:
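
A load-balancing loss is an auxiliary term added to the training objective. It penalizes the router for concentrating tokens on a few experts, which is what overflows their buffers. The sketch below follows the Switch Transformer formulation, loss = α · N · Σᵢ fᵢ · Pᵢ, where N is the number of experts, fᵢ is the fraction of tokens whose top-1 choice is expert i, and Pᵢ is the mean router probability for expert i. It is a plain-Python illustration without autograd; in a real framework only the Pᵢ term carries gradients, since fᵢ comes from a non-differentiable argmax. The coefficient α = 0.01 is an assumed value, and `load_balancing_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [v / total for v in exps]


def load_balancing_loss(router_logits: Sequence[Sequence[float]], alpha: float = 0.01) -> float:
    """Switch-Transformer-style auxiliary loss: alpha * N * sum_i f_i * P_i."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    f = [0.0] * num_experts  # f_i: fraction of tokens whose top-1 expert is i
    for p in probs:
        f[max(range(num_experts), key=p.__getitem__)] += 1.0 / num_tokens
    # P_i: mean router probability assigned to expert i
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, P))


balanced = [[10.0 if e == t else 0.0 for e in range(4)] for t in range(4)]  # each token prefers a different expert
collapsed = [[10.0, 0.0, 0.0, 0.0] for _ in range(4)]  # every token prefers expert 0
print(load_balancing_loss(balanced))   # ~0.01 (= alpha)
print(load_balancing_loss(collapsed))  # ~0.04 (close to N * alpha)
```

A perfectly balanced router yields a loss of α, while a router that sends every token to one expert approaches N · α, so minimizing the term pushes routing toward the uniform load that keeps buffers from overflowing.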

Expert Choice Routing:
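
Expert choice routing inverts the selection. Instead of each token picking its top-k experts, each expert picks its top tokens by router affinity, so every expert processes exactly its budget and no buffer can overflow. The trade-off is that a token may be chosen by several experts or by none. The sketch below assumes softmax affinities over experts and a per-expert budget of k = tokens × capacity factor / experts (rounded down here); `expert_choice` is a hypothetical helper, not a library function.

```python
import heapq
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [v / total for v in exps]


def expert_choice(router_logits: Sequence[Sequence[float]], capacity_factor: float = 1.0) -> List[List[int]]:
    """Each expert selects its top-k tokens by affinity; returns token indices per expert."""
    n = len(router_logits)
    num_experts = len(router_logits[0])
    k = max(1, int(n * capacity_factor / num_experts))  # per-expert budget (assumed rounding)
    scores = [softmax(row) for row in router_logits]  # token-to-expert affinity
    selections: List[List[int]] = []
    for j in range(num_experts):
        # Rank tokens by their affinity for expert j and keep the best k.
        top = heapq.nlargest(k, range(n), key=lambda t: scores[t][j])
        selections.append(sorted(top))
    return selections


# Every token prefers expert 0, but each expert still takes exactly k = 2 tokens.
logits = [[3.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
print(expert_choice(logits))  # [[0, 1], [2, 3]]
```

Under top-1 token choice with two slots per expert, all four tokens would pick expert 0 and two would be dropped. Under expert choice, expert 0 takes tokens 0 and 1 and expert 1 takes tokens 2 and 3, so the load stays balanced by construction.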

Soft MoE:
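
Soft MoE removes discrete assignment altogether. Each expert slot receives a softmax-weighted average of all tokens (weights normalized over tokens), and each token's output is a softmax-weighted average of all slot outputs (weights normalized over slots). Every token contributes to every slot, so nothing can be dropped. The sketch below is a plain-Python illustration under stated assumptions: `phi` stands in for a learned d × (experts × slots_per_expert) parameter matrix, experts are Python callables, and `soft_moe` is a hypothetical name.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [v / total for v in exps]


def soft_moe(
    x: Matrix,
    phi: Matrix,
    experts: Sequence[Callable[[List[float]], List[float]]],
    slots_per_expert: int,
) -> Matrix:
    """x: n tokens of dim d; phi: d x (num_experts * slots_per_expert) slot parameters."""
    n, d = len(x), len(x[0])
    s = len(phi[0])
    assert s == len(experts) * slots_per_expert, "phi must have one column per slot"
    logits = [[sum(x[t][k] * phi[k][j] for k in range(d)) for j in range(s)] for t in range(n)]
    # Dispatch: each slot mixes ALL tokens (softmax taken over the token axis).
    dispatch = [softmax([logits[t][j] for t in range(n)]) for j in range(s)]
    slot_in = [[sum(dispatch[j][t] * x[t][k] for t in range(n)) for k in range(d)] for j in range(s)]
    # Each expert processes its own contiguous group of slots.
    slot_out = [experts[j // slots_per_expert](slot_in[j]) for j in range(s)]
    # Combine: each token mixes ALL slot outputs (softmax taken over the slot axis).
    outputs: Matrix = []
    for t in range(n):
        w = softmax(logits[t])
        outputs.append([sum(w[j] * slot_out[j][k] for j in range(s)) for k in range(len(slot_out[0]))])
    return outputs


def double(v: List[float]) -> List[float]:
    return [2.0 * a for a in v]


x = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
phi = [[0.0, 0.0], [0.0, 0.0]]  # zero parameters -> uniform dispatch and combine weights
y = soft_moe(x, phi, experts=[double, double], slots_per_expert=1)
print(y)  # each row is approximately [2.0, 2.0]
```

With `phi` set to zeros all weights are uniform, so each slot sees the token mean [1.0, 1.0] and, with two identical doubling experts, every token's output is about twice that mean, which makes the example easy to check by hand.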

Dropped Token Impact Analysis

| Drop rate | Quality impact | Cause | Action |
|---|---|---|---|
| < 1% | Negligible | Normal routing variance | Acceptable |
| 1–5% | Measurable degradation | Moderate imbalance | Increase capacity factor |
| 5–15% | Significant quality loss | Poor load balance | Add/tune balance loss |
| > 15% | Training failure | Router collapse | Switch routing strategy |
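
The thresholds in the table can be applied mechanically to a measured drop rate, for example one logged per MoE layer during training. The sketch below encodes them. The table leaves exact boundaries open, so treating 5% and 15% as the upper edges of their bands is an assumption, and `triage_drop_rate` is a hypothetical helper.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    drop_rate: float
    quality_impact: str
    action: str


def triage_drop_rate(num_dropped: int, num_tokens: int) -> Triage:
    """Map a measured drop rate onto the impact-analysis table (band edges assumed)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    rate = num_dropped / num_tokens
    if rate < 0.01:
        return Triage(rate, "Negligible", "Acceptable")
    if rate <= 0.05:
        return Triage(rate, "Measurable degradation", "Increase capacity factor")
    if rate <= 0.15:
        return Triage(rate, "Significant quality loss", "Add/tune balance loss")
    return Triage(rate, "Training failure", "Switch routing strategy")


# e.g. 42 of 1,000 tokens dropped in one layer during a training step
print(triage_drop_rate(42, 1000).action)  # Increase capacity factor
```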

Dropped tokens are the canary in the MoE coal mine. They are the most visible symptom of routing pathology, signaling expert underutilization, load imbalance, and wasted model capacity, and they have driven the evolution from naive top-k routing toward more sophisticated mechanisms that keep computation sparse without sacrificing tokens.
