Hello fellow brewers,
let me first say hi, as this is my first post (hopefully not my last, though).
I am using a different network architecture, though, one that has a dropout layer at the end. The problem with the dropout layer is that it gives non-deterministic results when you want to predict something while Caffe is in the TRAIN phase.
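To illustrate what I mean (a toy numpy sketch of the usual inverted-dropout technique, not Caffe's actual code): in the TRAIN phase a random mask zeroes activations, so two forward passes over the same input differ, while in the TEST phase dropout is the identity and predictions are deterministic.

```python
import numpy as np

def dropout_forward(x, p=0.5, phase="TRAIN", rng=None):
    """Inverted dropout: rescale at train time so test time is the identity.

    Toy numpy sketch of the standard technique, not Caffe's implementation.
    """
    if phase == "TRAIN":
        rng = rng or np.random.default_rng()
        mask = (rng.random(x.shape) >= p) / (1.0 - p)  # keep & rescale
        return x * mask
    return x  # TEST phase: deterministic identity

x = np.ones((4, 50))
# Two TRAIN-phase passes differ in general (fresh random masks)...
a = dropout_forward(x, phase="TRAIN")
b = dropout_forward(x, phase="TRAIN")
# ...while TEST-phase passes are always identical.
t1 = dropout_forward(x, phase="TEST")
t2 = dropout_forward(x, phase="TEST")
```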
For those who are not familiar with the paper, the method works like this (simply put): you have an image, and you want to take an action based on that image.
You take the image as input and predict which action to take. Then you compute your loss from the action you took and the best action you could have taken at that time step.
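To make that loop concrete, here is a toy numpy sketch of the predict-then-score step. The linear "network", the action count, the oracle action, and the margin loss are all made up for illustration; the paper's actual model and loss will of course differ.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)) * 0.1       # toy "network": 8-pixel image -> 4 action scores

def predict(image):
    # Forward pass in TEST mode: deterministically pick the highest-scoring action.
    return int(np.argmax(W @ image))

def action_loss(image, taken, best):
    # Toy margin loss between the taken action and the (hypothetical) best action.
    scores = W @ image
    if taken == best:
        return 0.0
    return max(0.0, scores[taken] - scores[best] + 1.0)

image = rng.normal(size=8)
action = predict(image)
best_action = 2                          # pretend an oracle says action 2 was optimal
l = action_loss(image, action, best_action)
```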
The problem is that I have to switch constantly between the training and test phases: the test phase to predict the action, and the training phase to train the layers. As I understand it, I could initialize two networks, one in the training phase and one in the test phase, and constantly copy the trained weights from the training network to the test network. However, that takes a lot of time, especially if you have to do it 10+ million times.
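On the two-network idea: the copying could be avoided entirely if the two nets shared the same underlying parameter blobs instead of each owning a copy. If I remember correctly, Caffe's solver does something like this for its test nets via Net::ShareTrainedLayersWith (exposed in pycaffe as net.share_with), so maybe that is already good enough. A toy numpy sketch of sharing versus copying:

```python
import numpy as np

# Two "nets" that hold the SAME parameter arrays (references, not copies),
# mimicking weight sharing: an in-place update through one is immediately
# visible in the other, with no per-iteration copy.
params = {"conv1": np.zeros((3, 3)), "fc1": np.zeros(4)}
train_net = {name: arr for name, arr in params.items()}
test_net = {name: arr for name, arr in params.items()}  # shares, doesn't copy

train_net["fc1"] += 1.0  # in-place "training" update on the shared array
```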
I want to have only one network that I can switch from the test to the training state and back. This is currently not possible with Caffe, so I have to implement it myself. I tried, but I didn't really understand how to do it, which is where you guys come in. Each layer has its own "phase_" variable, so I just have to get the dropout layer and change it, right? Or is "phase_" part of the network, so that I would have to implement a new function that changes the phase at the network level? And do I have to synchronize the change with any data on the GPU, or is this kind of state only stored on the CPU side?
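For the single switchable network, here is a toy Python mock-up of what I mean by flipping a layer's phase (my own sketch, not Caffe's API). At least for dropout, the random mask is drawn fresh on every TRAIN forward pass, so in this toy the phase is just a host-side flag with no persistent state that would need syncing:

```python
import numpy as np

class ToyDropout:
    """Toy dropout layer with a switchable phase flag (not Caffe's API)."""

    def __init__(self, p=0.5, phase="TRAIN"):
        self.p = p
        self.phase = phase
        self.rng = np.random.default_rng(0)

    def set_phase(self, phase):
        self.phase = phase  # just flipping a flag, nothing else to update

    def forward(self, x):
        if self.phase == "TRAIN":
            # Mask is regenerated on every call, so no state survives a phase switch.
            mask = (self.rng.random(x.shape) >= self.p) / (1.0 - self.p)
            return x * mask
        return x  # TEST phase: identity

layer = ToyDropout()
x = np.ones((4, 50))
layer.set_phase("TEST")
y_test = layer.forward(x)    # deterministic
layer.set_phase("TRAIN")
y_train = layer.forward(x)   # stochastic again
```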
Maybe you have some ideas on how to do that; I would really appreciate it.
Thanks in advance