Knowledge distillation is a paradigm in which a compact “student” network is trained to emulate the performance of a larger, more complex “teacher” network. By transferring dark knowledge—subtle ...
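As a rough illustration of the student-teacher setup described above, the following Python sketch computes a temperature-softened distillation loss. The passage does not specify a loss function, so this assumes the commonly used formulation from Hinton et al. (2015): both networks' logits are softened by a temperature, and the student minimizes the KL divergence from the teacher's softened distribution to its own. The function names, the default temperature of 2.0, and the T² scaling factor are illustrative assumptions, not details taken from the text.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature.

    A higher temperature flattens the distribution, exposing the relative
    probabilities the teacher assigns to the non-target classes.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: the T^2 factor follows the common convention of keeping
    gradient magnitudes comparable across temperatures.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0.0
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [1.0, 2.0, 5.0]   # hypothetical teacher logits for 3 classes
    student = [0.5, 2.5, 3.0]   # hypothetical student logits
    print(distillation_loss(student, teacher))
```

When the student's logits equal the teacher's, the loss is zero; it grows as the two softened distributions diverge. In practice this term is usually combined with an ordinary cross-entropy loss on the ground-truth labels, with a weighting that is treated as a hyperparameter.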