What is model compression distillation?

by SMEBOOK (admin) · February 28, 2021

Model compression distillation, or knowledge distillation (KD), is widely regarded as an effective model compression technique in which a compact model (the student) is trained under the supervision of a larger pre-trained model or an ensemble of models (the teacher). In practice, a large, complex model is first trained on a large dataset. Once this model generalizes well to unseen data, its knowledge is transferred to a smaller network, typically by training the student to match the teacher's output distribution. The larger model is known as the teacher model and the smaller network as the student network.
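To make this concrete, here is a minimal sketch of knowledge distillation in PyTorch (an assumed framework; the article does not specify one). The toy teacher/student architectures, the temperature `T`, and the weighting `alpha` are illustrative assumptions rather than details from the original text.

```python
# Minimal knowledge distillation sketch (PyTorch). Architectures, temperature,
# and loss weighting below are illustrative assumptions, not from the article.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine the soft-target loss (teacher supervision) with hard-label cross-entropy."""
    # Soft targets: the student matches the teacher's softened output distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after temperature softening
    # Hard targets: standard cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy teacher (large) and student (compact) networks; the teacher is frozen.
teacher = nn.Sequential(nn.Linear(784, 1200), nn.ReLU(), nn.Linear(1200, 10))
student = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))
teacher.eval()

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(32, 784)          # dummy batch of inputs
y = torch.randint(0, 10, (32,))   # dummy labels

# One training step: only the student's parameters are updated.
with torch.no_grad():
    teacher_logits = teacher(x)
student_logits = student(x)
loss = distillation_loss(student_logits, teacher_logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this sketch the teacher only provides targets (its weights stay frozen), while the student learns from both the teacher's softened predictions and the ground-truth labels, which is the standard recipe for transferring a large model's knowledge into a smaller one.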


