deploy.prototxt examples: why do they include solver information?

csail123

Apr 4, 2016, 1:35:06 PM
to Caffe Users
Why do example deploy.prototxt files from the caffe repo include learning rate parameters? 

For example, in the caffe repo:

models/bvlc_alexnet/deploy.prototxt, in the "conv1" first convolution layer:



layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1      # shouldn't be needed?
    decay_mult: 1   # shouldn't be needed?
  }
  param {
    lr_mult: 2      # shouldn't be needed?
    decay_mult: 0   # shouldn't be needed?
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}

These are obviously directives for the solver and don't have a use in inference, but I wanted to check here before I submitted a PR or the like.  I suspect they are simply ignored during inference.

Unless there is some valid reason they are kept in?

Just curious. 

Jan

Apr 15, 2016, 8:03:12 AM
to Caffe Users
You are right, they are not needed and are ignored during inference. You could just as well delete them from the deploy version of the network. These directives are probably still there because the writer of the deploy file copied the train_val version and just rewrote the input layers, leaving the others untouched. Which is fine.
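Just to illustrate the point (this is not part of caffe itself, only a quick sketch): you could strip those param { ... } blocks from a deploy prototxt mechanically, e.g. with a regex in Python. In practice you could also just delete them by hand or go through caffe's protobuf classes.

```python
import re

def strip_solver_params(prototxt: str) -> str:
    """Remove 'param { ... }' blocks (lr_mult / decay_mult), which only
    matter for the solver and are ignored at inference time."""
    # \bparam avoids matching 'convolution_param' (no word boundary after
    # the underscore); [^{}]* keeps the match inside one brace level.
    return re.sub(r"\n\s*\bparam\s*\{[^{}]*\}", "", prototxt)

deploy = """layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}"""

print(strip_solver_params(deploy))
```

The convolution_param block survives untouched; only the two solver-only param blocks are removed.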

Jan

SRQ

Apr 18, 2016, 10:59:14 AM
to Caffe Users
Thank you for the question, I was just wondering the exact same thing. Also, are these required?


layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {                ##### STARTING HERE
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }                              ##### Ending here. Is this part useful, as neither the weights nor the biases are being trained here?
  }
}

Is there anything else that is unnecessary that I am missing?

Jan

Apr 19, 2016, 5:19:25 AM
to Caffe Users
No, you can also remove the weight filler directives. As one would expect, these are only used to initialize the weights in the network. So if you have a set of trained weights for the network, they don't really matter anymore.
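Putting the two answers together, the deploy version of that layer could be trimmed down to something like this (a sketch; the convolution parameters are taken from the snippet above):

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
  }
}
```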

Jan

SRQ

Apr 19, 2016, 5:22:05 AM
to Caffe Users
Thank you for the reply.

Hossein Hasanpour

Apr 19, 2016, 5:53:10 AM
to Caffe Users
Hello guys, a somewhat different question: what is a deploy file? How does one use it?
Where can I find more information about these files and working with them?
Thanks in advance

Jan

Apr 19, 2016, 6:12:10 AM
to Caffe Users
It is basically the same as the train_val.prototxt, only for deployment, i.e. for use in conjunction with a caffemodel file and custom data (not lmdb/leveldb-provided). There is no need to define a loss layer. Weight and bias filler directives, as well as directives related only to learning (like lr_mult or loss_weight), can be stripped from the config. Usually, instead of data input layers there is only a layer of type "Input", which now has the same task as the directives "input" and "input_shape" had earlier: define the name and shape of every input blob. The actual data is then stored inside the blob at runtime through the API. This is especially interesting if you want to use the network as a component in a larger software framework, which then operates caffe through its API.

All of this is just a conceptual thing, a convention for how to "deploy" a caffe model. In reality you can do whatever you want; maybe you don't even need something like a "deploy" config. There is nothing hardcoded into caffe about "deployment" configs or "train_val" configs.
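For illustration, a typical "Input" layer in a deploy config looks roughly like this (the blob name and shape are just example values for a single 227x227 RGB image):

```
layer {
  name: "data"
  type: "Input"
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }
}
```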

Jan

Hossein Hasanpour

Apr 19, 2016, 7:24:38 AM
to Caffe Users
Thanks a lot.
By the way, do you mind if I ask another question? Do you have a very, very tiny sample of code demonstrating the API and how one would use it?
I'm kind of lost here.
Basically, all I can think of is something like this. You train your model using something like this?
    string prototextPath = "";
    cout << "\nEnter the prototext file (e.g. lenet_solver-leveldb.prototxt)\n ";
    getline(cin, prototextPath);

    // parse solver parameters
    string solver_prototxt = prototextPath;  // "examples/mnist/lenet_solver-leveldb.prototxt"
    caffe::SolverParameter solver_param;
    caffe::ReadProtoFromTextFileOrDie(solver_prototxt, &solver_param);

    // set device id and mode
    Caffe::SetDevice(0);
    Caffe::set_mode(Caffe::GPU);

    // solver handler
    caffe::shared_ptr<caffe::Solver<float>> solver(caffe::GetSolver<float>(solver_param));

    // start solver
    solver->Solve();
And for testing, something like this would be needed:

    // get a testing image and display it
    Mat img = imread(path);  // (CAFFE_ROOT + "/examples/images/mnist_5.png")
    cvtColor(img, img, CV_BGR2GRAY);
    imshow("img", img);
    waitKey(1);

    // load net
    Net<float> net(prototextPath);  // (CAFFE_ROOT + "/examples/mnist/lenet_test-memory-1.prototxt")
    string model_file = modelPath;  // CAFFE_ROOT + "/examples/mnist/lenet_iter_10000.caffemodel"
    net.CopyTrainedLayersFrom(model_file);

    // set the patch for testing
    vector<Mat> patches;
    patches.push_back(img);

    // push vector<Mat> to data layer
    float loss = 0.0;
    boost::shared_ptr<MemoryDataLayer<float>> memory_data_layer;
    memory_data_layer = boost::static_pointer_cast<MemoryDataLayer<float>>(net.layer_by_name("data"));

    vector<int> labels(patches.size());
    memory_data_layer->AddMatVector(patches, labels);

    // net forward
    const vector<Blob<float>*>& results = net.ForwardPrefilled(&loss);
    float* output = results[1]->mutable_cpu_data();

    // display the output
    for (int i = 0; i < 10; i++) {
        printf("Probability to be Number %d is %.3f\n", i, output[i]);
    }

So basically what API do we need to use in that case?
Thanks again
I really appreciate your time and help :)

Jan

Apr 19, 2016, 7:35:42 AM
to Caffe Users
Well I suppose that would work. I usually train using the cmdline caffe tool. For the rest (visualization, plotting, whatever) I use the pycaffe interface. Loading a net and forwarding is as simple as

net = caffe.Net('prototxt', 'caffemodel', caffe.TEST)

net.blobs['myblob'].data[...] = ...

net.forward()

# access all blobs you like to view results via net.blobs['blobname']
# and the current/trained layer parameters via net.params['layername']


Jan

Hossein Hasanpour

Apr 19, 2016, 7:49:21 AM
to Caffe Users
Thanks. Where can I find that pycaffe interface? Does it have all of the visualization and plotting stuff?

Jan

Apr 19, 2016, 8:40:52 AM
to Caffe Users
No, you can use it to access the caffe network; the plotting and visualization you have to do yourself. It is located in the "python" subfolder of the caffe repo. Compile it by doing "make pycaffe" and use it in Python with "import caffe".

Jan

Hossein Hasanpour

Apr 19, 2016, 10:53:21 AM
to Caffe Users
Thanks a lot, sir.
I really appreciate your help.
God bless you