Manually changing the phase from TRAIN to TEST during learning

Leif Blaese

Jun 8, 2015, 3:01:06 AM
to caffe...@googlegroups.com
Hello fellow brewers,
let me first say hi, as this is my first post (hopefully not my last, though).

I want to implement a version of the deep reinforcement learning algorithm that was shown in Nature a few months ago (http://www.nature.com/nature/journal/v518/n7540/pdf/nature14236.pdf), similar to what muupan has shown on GitHub (https://github.com/muupan/dqn-in-the-caffe).

I am using a different network architecture though, one that has a dropout layer at the end. The problem with the dropout layer is that it gives non-deterministic results when you want to predict something while the net is in Caffe's TRAIN phase.

For those who are not familiar with the paper, what you do in this method is (simply put): you have an image, and you want to take an action based on that image.
You feed the image in as the input and predict which action to take. Then you compute your loss based on the action you took and the best action you could have taken at that time step.

The problem is that I have to switch constantly between the training and test phase: the test phase for predicting the action and the training phase for training the layers. As I understand it, I could just initialize two networks, one in the training phase and one in the test phase, and then constantly copy the trained layers from the training network onto the test network. However, that takes a lot of time, especially if you have to do it 10+ million times.

I want to have only one network that I can switch from the test to the training state and back. This is currently not possible with Caffe, so I have to do it myself. I tried, but I didn't really understand how to do it, which is where you guys come in. Each layer has its "phase_" variable, so I just have to get the dropout layer and change it, right? Or is "phase_" part of the network, so that I would have to implement a new function that changes the phase at the network level? Do I have to synchronize the change with any data on the GPU, or is this kind of data only stored on the CPU side?

Maybe you have some ideas on how to do that; I would really appreciate it.
Thanks in advance

Leif Blaese

Jun 8, 2015, 7:51:34 AM
to caffe...@googlegroups.com

Edit: Does anyone remember how this was done in older versions of Caffe, back when it was still implemented?

Leif Blaese

Jun 9, 2015, 4:55:17 AM
to caffe...@googlegroups.com
So, just to let you know, I solved it like this:


I edited layer.cpp and net.cpp and moved the "phase_" variable from protected to public (I know there are better ways to do this, but it is just a quick and dirty hack). Then I wrote a new function in net.cpp like this:


template <typename Dtype>
void Net<Dtype>::change_phase(const std::string& layername, Phase newPhase) {
  // Look up the layer by name and overwrite its phase_ member directly
  // (this only works because phase_ was made public above).
  const shared_ptr<Layer<Dtype> > tmpLayer =
      layers_[layer_names_index_.find(layername)->second];
  tmpLayer->phase_ = newPhase;
}
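
For this to compile, a matching declaration also has to be added to the Net class itself. A minimal sketch of that addition (the exact header path and placement are assumptions, not stated in the thread):

    // In include/caffe/net.hpp (assumed path), inside class Net:
    void change_phase(const std::string& layername, Phase newPhase);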


In my main code, I can then just change the phase on the net like this:

const std::string layername = "myDropoutLayer";
net_->change_phase(layername, caffe::TEST); // net_ is a pointer to my net
// some code
net_->change_phase(layername, caffe::TRAIN);


Best,
Leif

 

Evan Shelhamer

Jun 23, 2015, 1:43:20 PM
to Leif Blaese, caffe...@googlegroups.com
The problem is that I have to switch constantly between the training and test phase: the test phase for predicting the action and the training phase for training the layers. As I understand it, I could just initialize two networks, one in the training phase and one in the test phase, and then constantly copy the trained layers from the training network onto the test network. However, that takes a lot of time, especially if you have to do it 10+ million times.

Instead of copying the parameters back and forth, you can share layers between the train and test nets. Since they are shared, this costs no extra memory and no time is spent transferring. In Python one could do

    solver.test_nets[0].share_with(solver.net)
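
For a C++ setup like the one described in the original post, the counterpart of this Python call is Net::ShareTrainedLayersWith, which is what share_with maps to. A minimal sketch of the sharing mechanism, with a placeholder prototxt path (how the training net is then optimized, e.g. through a Solver, is omitted):

    #include <caffe/caffe.hpp>

    int main() {
      // One net per phase, built from the same architecture: dropout is active
      // in the TRAIN net and disabled in the TEST net.
      caffe::Net<float> train_net("net.prototxt", caffe::TRAIN);
      caffe::Net<float> test_net("net.prototxt", caffe::TEST);

      // Share (not copy) the learned parameters: the test net's weight blobs
      // point at the train net's data, so no per-iteration transfer is needed.
      test_net.ShareTrainedLayersWith(&train_net);
      return 0;
    }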


We decided to make phase immutable to avoid confusion about the current state.

Evan Shelhamer

Amir Abdi

Oct 7, 2016, 12:40:22 AM
to Caffe Users, leifb...@gmail.com
The share_with functionality does not seem to work.
I am feeding data manually in Python like this:
solver.net.blobs['data'].data[...] = x
or
solver.test_nets[0].blobs['data'].data[...] = x

Then I forward the network based on whether we are testing or training, like this:
solver.step(1)
or
solver.test_nets[0].forward()

I tried calling 
solver.test_nets[0].share_with(solver.net)
or 
solver.net.share_with(solver.test_nets[0])
at various points in the code, before training and afterwards, but nothing seems to work.

The training is working fine, but the test network does not share the weights with the train net.

The train and test nets in my solver.prototxt are defined as follows:
...
test_net: "/home/amir/echoProject/framework/models/caffe_nets/echo_netx/valnet.prototxt"
train_net: "/home/amir/echoProject/framework/models/caffe_nets/echo_netx/trainnet.prototxt"
...