Hi folks,
I hope you are doing well. I wanted to let y'all know about my latest blog post on Knowledge Distillation, a technique for transferring the "dark knowledge" of large neural networks to much smaller ones. To say that Knowledge Distillation works would be an understatement: for example, the Super-Resolution model you see in this notebook is only 33 KB.
In the blog post, I discuss Knowledge Distillation as a concept, touch upon the loss functions typically used with it, and walk through a number of training recipes with code in TensorFlow. What is even more exciting is that Knowledge Distillation composes well with other optimization techniques such as quantization and pruning.
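For a quick taste of the idea, here is a minimal sketch of the classic distillation loss (KL divergence between temperature-softened teacher and student distributions). This is an illustrative NumPy version, not code from the post; the function names and the temperature value are my own choices:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    # KL divergence between the softened teacher and student distributions,
    # scaled by T^2 so its gradient magnitude matches the hard-label loss.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return (temperature ** 2) * kl.mean()

# Toy check: identical teacher and student logits give zero loss.
logits = np.array([[2.0, 1.0, 0.1]])
print(distillation_loss(logits, logits))  # 0.0
```

In practice this term is combined with the usual cross-entropy on the ground-truth labels, weighted by a hyperparameter; the post covers those recipes in detail.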
Happy to address any feedback.