I want to implement knowledge distillation in Caffe.
1. Is there any way to reuse intermediate results (share blob memory) to save GPU memory in Caffe?
2. How can I use different modes in the same net, e.g. one branch for training and the other for testing (used only to generate the final output)?
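For question 1, the only built-in mechanism I'm aware of is in-place computation, where a layer writes its output over its input blob by giving `top` the same name as `bottom`; this works for element-wise layers like `ReLU` but not in general. A sketch (layer names are placeholders):

```protobuf
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"   # top == bottom -> in-place, no extra blob allocated
}
```

Is there something more general than this?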
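For question 2, what I have in mind is something like the per-layer `include { phase: ... }` rules, so that one branch only exists in one phase. A rough sketch of the idea (layer names and types are just placeholders, not my actual net):

```protobuf
# Student branch: active during training.
layer {
  name: "student_fc"
  type: "InnerProduct"
  bottom: "feat"
  top: "student_out"
  include { phase: TRAIN }
}
# Output branch: active only at test time, to generate the final output.
layer {
  name: "deploy_fc"
  type: "InnerProduct"
  bottom: "feat"
  top: "final_out"
  include { phase: TEST }
}
```

But I'm not sure whether `phase` rules are the right tool here, or whether I need two separate prototxt files that share weights by layer name.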
Thanks