The prototxt file I provided contains both the teacher and the student net. However, the learning rate of every layer in the teacher net is zero, so the teacher net is not updated during training. So what you have to do is:
1. First, train a teacher net that reaches good accuracy.
2. Once the teacher net is trained, load its weights by passing '-weights' in the terminal training command. The learnt teacher weights are then copied into the new model, which contains both teacher and student. Weights are copied into a layer when its name matches a layer name in the trained model, so the teacher layers must keep the same names in both prototxt files. The output of the teacher's softmax layer is fed to the student model for matching. You can always extract the student model from this hybrid model later.
3. To test the model, use only the student architecture in test.prototxt.
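To make the "matching" in step 2 concrete, here is a minimal sketch in plain Python of a softmax with a temperature and of a cross-entropy between the teacher's softened output and the student's. It follows the usual distillation formulation and is not taken from the prototxt; the function names, the example logits and the temperature value are all assumptions chosen for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (a higher temperature gives a flatter output)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student distribution against the teacher's soft targets."""
    return -sum(p * math.log(q + eps) for p, q in zip(teacher_probs, student_probs))


# Hypothetical logits for one training image with three classes.
teacher_logits = [8.0, 2.0, 0.5]
student_logits = [5.0, 3.0, 1.0]
T = 4.0  # illustrative temperature, not a recommended value

teacher_soft = softmax_with_temperature(teacher_logits, T)
student_soft = softmax_with_temperature(student_logits, T)
loss = soft_cross_entropy(teacher_soft, student_soft)
```

The loss is smallest when the student's softened output equals the teacher's, which is the sense in which the student is trained to match the teacher.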
This method makes it easier to train the student model when the temperature parameter has to be changed many times. However, for more difficult datasets like CIFAR10, where the teacher model is cumbersome, training takes a lot of time, because the teacher's forward pass runs in every iteration. In that case you may want to modify the method I suggested. For that case, I follow another way, which requires a little coding but is faster than the previous method:
1. Train the teacher model.
2. Get the soft outputs of the teacher model for the training data.
3. Create an hdf5 dataset that contains the training images, the ground truth labels and the soft outputs.
4. Use this hdf5 dataset to train the student model.
Be very careful when creating the hdf5 dataset. All the required preprocessing must be applied correctly; only then can the dataset be used for training.
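As a rough sketch of step 3, the Python below bundles preprocessed images, hard labels and teacher soft outputs and writes them with h5py. Several things are assumptions rather than part of the original setup: the dataset names `data`, `label` and `soft_label` are hypothetical and would have to match the top names of the HDF5 data layer in the student prototxt, the `mean` and `scale` arguments stand in for whatever preprocessing the teacher was trained with, and the soft outputs are assumed to have been computed already in step 2. Images are converted to float32 in (N, C, H, W) order, which is the layout Caffe blobs use.

```python
import numpy as np


def build_arrays(images_hwc, labels, soft_outputs, mean=0.0, scale=1.0):
    """Preprocess images and bundle them with hard labels and teacher soft outputs."""
    images_hwc = np.asarray(images_hwc)
    if images_hwc.ndim != 4:
        raise ValueError("expected images shaped (N, H, W, C)")
    # Placeholder preprocessing: subtract the mean, then scale.
    data = ((images_hwc.astype(np.float32) - mean) * scale).astype(np.float32)
    # Reorder (N, H, W, C) to (N, C, H, W).
    data = np.ascontiguousarray(data.transpose(0, 3, 1, 2))
    label = np.asarray(labels, dtype=np.float32)
    soft = np.asarray(soft_outputs, dtype=np.float32)
    if not (len(data) == len(label) == len(soft)):
        raise ValueError("images, labels and soft outputs must have the same length")
    # Hypothetical dataset names; they must match the tops of the HDF5 data layer.
    return {"data": data, "label": label, "soft_label": soft}


def write_hdf5(path, arrays):
    """Write every array as one HDF5 dataset under its dictionary key."""
    import h5py  # imported here so build_arrays can be checked without h5py installed

    with h5py.File(path, "w") as f:
        for name, values in arrays.items():
            f.create_dataset(name, data=values)
```

A call such as `write_hdf5("train.h5", build_arrays(images, labels, soft_outputs, mean, scale))` would produce the file. Caffe's HDF5 data layer takes as its source a text file listing such .h5 paths, one per line. It is worth reading a few samples back and comparing them with what the teacher saw before starting a long training run.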