deploy.prototxt examples: why do they include solver information?

csail123

Apr 4, 2016, 1:35:06 PM
to Caffe Users
Why do example deploy.prototxt files from the caffe repo include learning rate parameters? 

For example, in the caffe repo:

models/bvlc_alexnet/deploy.prototxt, in the "conv1" first convolution layer:



layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1      # shouldn't be needed?
    decay_mult: 1   # shouldn't be needed?
  }
  param {
    lr_mult: 2      # shouldn't be needed?
    decay_mult: 0   # shouldn't be needed?
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}

These are obviously directives for the solver and don't have a use in inference, but I wanted to check here before I submitted a PR or the like.  I suspect they are simply ignored during inference.

Unless there is some valid reason they are kept in?

Just curious. 

Jan

Apr 15, 2016, 8:03:12 AM
to Caffe Users
You are right, they are not needed and are ignored during inference. You could just as well delete them from the deploy version of the network. These directives are probably still there because the writer of the deploy file copied the train_val version and just rewrote the input layers, leaving the others untouched. Which is fine.
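Just to illustrate the point (this is not part of caffe itself, only a quick sketch): you could strip those param { ... } blocks from a deploy prototxt mechanically, e.g. with a regex in Python. In practice you could also just delete them by hand or go through caffe's protobuf classes.

```python
import re

def strip_solver_params(prototxt: str) -> str:
    """Remove 'param { ... }' blocks (lr_mult / decay_mult), which only
    matter for the solver and are ignored at inference time."""
    # \bparam avoids matching 'convolution_param' (no word boundary after
    # the underscore); [^{}]* keeps the match inside one brace level.
    return re.sub(r"\n\s*\bparam\s*\{[^{}]*\}", "", prototxt)

deploy = """layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}"""

print(strip_solver_params(deploy))
```

The convolution_param block survives untouched; only the two solver-only param blocks are removed.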

Jan

SRQ

Apr 18, 2016, 10:59:14 AM
to Caffe Users
Thank you for the question, I was just wondering the exact same thing. Also, are these required?


layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {                ##### STARTING HERE
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }                              ##### Ending here. Is this part useful, as neither the weights nor the biases are being trained here?
  }
}

Is there anything else that is unnecessary that I am missing?

Jan

Apr 19, 2016, 5:19:25 AM
to Caffe Users
No, you can also remove the weight filler directives. As one would expect, these are only used to initialize the weights in the network. So if you have a set of trained weights for the network, they don't really matter anymore.
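Putting the two answers together, the deploy version of that layer could be trimmed down to something like this (a sketch; the convolution parameters are taken from the snippet above):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
  }
}
```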

Jan

SRQ

Apr 19, 2016, 5:22:05 AM
to Caffe Users
Thank you for the reply.

Hossein Hasanpour

Apr 19, 2016, 5:53:10 AM
to Caffe Users
Hello guys, a somewhat different question: what is a deploy file? How does one use it?
Where can I find more information about these files and working with them?
Thanks in advance

Jan

Apr 19, 2016, 6:12:10 AM
to Caffe Users
It is basically the same as the train_val.prototxt, only for deployment, i.e. for use in conjunction with a caffemodel file and custom data (not lmdb/leveldb-provided). There is no need to define a loss layer. Weight and bias filler directives, as well as directives related only to learning (like lr_mult or loss_weight), can be stripped from the config. Usually, instead of data input layers there is only a layer of type "Input", which now has the same task as the directives "input" and "input_shape" had earlier: define the name and shape of every input blob. The actual data is then stored inside the blob at runtime through the API. This is especially interesting if you want to use the network as a component in a larger software framework, which then operates caffe through its API.

All of this is just a conceptual thing, a convention for how to "deploy" a caffe model. In reality you can do whatever you want; maybe you don't even need something like a "deploy" config. There is nothing hardcoded into caffe about "deployment" configs or "train_val" configs.
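For illustration, a typical "Input" layer in a deploy config looks roughly like this (the blob name and shape are just example values for a single 227x227 RGB image):

```
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}
```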

Jan

Hossein Hasanpour

Apr 19, 2016, 7:24:38 AM
to Caffe Users
Thanks a lot.
By the way, do you mind if I ask another question? Do you have a very, very tiny sample of code demonstrating the API and how one would use it?
I'm kind of lost here.
Basically, all I can think of is something like this. You train your model using something like this?
    string prototextPath = "";
    cout << "\nEnter the prototext file (e.g. lenet_solver-leveldb.prototxt)\n ";
    getline(cin, prototextPath);

    // parse solver parameters
    string solver_prototxt = prototextPath;  // "examples/mnist/lenet_solver-leveldb.prototxt"
    caffe::SolverParameter solver_param;
    caffe::ReadProtoFromTextFileOrDie(solver_prototxt, &solver_param);

    // set device id and mode
    Caffe::SetDevice(0);
    Caffe::set_mode(Caffe::GPU);

    // solver handler
    caffe::shared_ptr<caffe::Solver<float>> solver(caffe::GetSolver<float>(solver_param));

    // start solver
    solver->Solve();
And for testing, something like this would be needed:

    // get a testing image and display it
    Mat img = imread(path);  // (CAFFE_ROOT + "/examples/images/mnist_5.png")
    cvtColor(img, img, CV_BGR2GRAY);
    imshow("img", img);
    waitKey(1);

    // load net
    Net<float> net(prototextPath);  // (CAFFE_ROOT + "/examples/mnist/lenet_test-memory-1.prototxt")
    string model_file = modelPath;  // CAFFE_ROOT + "/examples/mnist/lenet_iter_10000.caffemodel"
    net.CopyTrainedLayersFrom(model_file);

    // set the patch for testing
    vector<Mat> patches;
    patches.push_back(img);

    // push vector<Mat> to data layer
    float loss = 0.0;
    boost::shared_ptr<MemoryDataLayer<float>> memory_data_layer;
    memory_data_layer = boost::static_pointer_cast<MemoryDataLayer<float>>(net.layer_by_name("data"));

    vector<int> labels(patches.size());
    memory_data_layer->AddMatVector(patches, labels);

    // net forward
    const vector<Blob<float>*>& results = net.ForwardPrefilled(&loss);
    float* output = results[1]->mutable_cpu_data();

    // display the output
    for (int i = 0; i < 10; i++) {
        printf("Probability to be Number %d is %.3f\n", i, output[i]);
    }

So basically what API do we need to use in that case?
Thanks again
I really appreciate your time and help :)

Jan

Apr 19, 2016, 7:35:42 AM
to Caffe Users
Well I suppose that would work. I usually train using the cmdline caffe tool. For the rest (visualization, plotting, whatever) I use the pycaffe interface. Loading a net and forwarding is as simple as

net = caffe.Net('prototxt', 'caffemodel', caffe.TEST)

net.blobs['myblob'].data[...] = ...

net.forward()

# access all blobs you like to view results via net.blobs['blobname']
# and the current/trained layer parameters via net.params['layername']


Jan

Hossein Hasanpour

Apr 19, 2016, 7:49:21 AM
to Caffe Users
Thanks. Where can I find that pycaffe interface? Does it have all of the visualization and plotting stuff?

Jan

Apr 19, 2016, 8:40:52 AM
to Caffe Users
No, you can use it to access the caffe network; the plotting and visualization you have to do yourself. It is located in the "python" subfolder of the caffe repo. Compile it by doing "make pycaffe" and use it in Python with "import caffe".

Jan

Hossein Hasanpour

Apr 19, 2016, 10:53:21 AM
to Caffe Users
Thanks a lot, sir.
I really appreciate your help.
God bless you