Knowledge Distillation

Knowledge distillation is a model compression technique in which a large, high-performing teacher model transfers its learned representations to a smaller, more efficient student model. The student is trained to mimic the teacher's soft probability distributions rather than only the hard ground-truth labels, which lets it capture inter-class relationships and decision boundaries that hard labels cannot convey.
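To make the idea of mimicking soft probability distributions concrete, here is a minimal sketch of a commonly used distillation loss for a single training example. It softens both sets of logits with a temperature, measures the KL divergence from teacher to student, and mixes that with ordinary cross-entropy on the hard label. The function names, the temperature of 4.0, and the mixing weight `alpha` of 0.5 are illustrative assumptions, not values taken from this article. A real training loop would compute this over batches in a deep learning framework.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label])

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    temperature and alpha are hypothetical defaults chosen for illustration.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The temperature**2 factor keeps the soft-term gradients on a scale
    # comparable to the hard-label term as the temperature changes.
    soft_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    # The hard-label term uses the student's unsoftened predictions.
    hard_term = cross_entropy(softmax(student_logits), label)
    return alpha * soft_term + (1 - alpha) * hard_term

# Example with made-up logits for a three-class problem.
teacher_logits = [8.0, 3.0, 1.0]
student_logits = [5.0, 2.5, 0.5]
loss = distillation_loss(student_logits, teacher_logits, label=0)
```

Raising the temperature spreads the teacher's probability mass across the non-target classes, which is where the inter-class similarity information lives. Setting `alpha` to 0 recovers plain supervised training on hard labels.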

Distillation Framework:

Distillation Strategies:

Advanced Techniques:

Results and Applications:

Knowledge distillation is one of the most practical techniques for deploying large-model capabilities on resource-constrained hardware. It transfers the dark knowledge embedded in the teacher's probability distributions to a compact student model, so that much of a large model's accuracy can be retained on smaller devices.

