How to validate? Automatic vs. manual solving.


Ses Vinyes

Mar 27, 2018, 12:18:19 PM
to Caffe Users
Hi everyone! So, I'm a complete beginner with Caffe (PyCaffe actually) and this may seem like something very simple, but I can't seem to wrap my head around it. After reviewing many tutorials, GitHub discussions and forum posts, the only thing that is clear to me is that, in Caffe, there are SEVERAL ways of doing the same thing. However, I am not quite sure whether this is the result of different Caffe versions being merged in my head or whether this multiplicity actually exists.

So, I am trying to train an already existing Caffe model (a basic CNN with images as input). I have available the model_train_val.prototxt, which includes two input layers: one for phase: TRAIN and one for phase: TEST, which from my understanding (correct me if I am mistaken, please) are going to be used for training and validation respectively. For testing I should be using a second model_deploy.prototxt that is equal to the training one but without the input and loss layers (right?). The other thing I have is the solver.prototxt. I only have one of these, as I understand that for testing I can simply load the net and perform all the forward passes manually (there is no need to deal with the "complicated" backward and update steps, which the Solver handles automatically when training). In the solver I have included parameters for a validation round every now and then:

test_iter: 100
test_interval: 500

which, from my understanding, is going to perform 100 test (validation) iterations every 500 training iterations. Also, the net parameter points to the train prototxt model file.
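(For reference, this is roughly how I am loading and inspecting that solver from PyCaffe -- the file name is just whatever I saved it as, so take it as a sketch rather than the canonical way:)

import caffe

caffe.set_mode_gpu()                          # or caffe.set_mode_cpu()
solver = caffe.get_solver('solver.prototxt')  # builds the nets described by the solver file
print(solver.net)        # TRAIN-phase net, built from the net parameter
print(solver.test_nets)  # list with one TEST-phase net per test_* configuration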

Now, for training I am using a custom dataset, with some complex data preprocessing for which I already have a set of functions created for input handling. For this reason my data flow is as follows:
  • Load batch of images (with aforementioned custom functions).
  • Set data and label blobs of the network to the loaded batch. (I am using a MemoryData layer, as I thought that was the appropriate type to feed the data in this fashion.)
  • solver.solve(1) (Now that I think of it, should I tell the solver to solve starting from the first conv layer? I've seen this somewhere, as it seems that starting the forward pass from the beginning may overwrite the data I just placed into the net. However, this was regarding the forward function, so I am not sure if that holds for the solve function too. I'd think so, but you never know. A rough sketch of this whole flow follows the list.)
  • Output some statistics.
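For concreteness, one iteration of this flow looks roughly like the sketch below. load_next_batch() stands for my own preprocessing functions, the 'loss' blob name is a guess, and I am using solver.step(1) as the single-iteration call:

import numpy as np
import caffe

solver = caffe.get_solver('solver.prototxt')

# 1. Load a batch with my custom functions (placeholder).
images, labels = load_next_batch()   # images: (N, C, H, W), labels: (N,)

# 2. Feed the MemoryData layer; arrays must be float32 and C-contiguous.
solver.net.set_input_arrays(
    np.ascontiguousarray(images, dtype=np.float32),
    np.ascontiguousarray(labels, dtype=np.float32))

# 3. One forward/backward/update step.
solver.step(1)

# 4. Output some statistics (assuming the loss blob is called 'loss').
print('loss =', float(solver.net.blobs['loss'].data))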

So, for my actual question(s). Given that I am solving one step at a time, is the validation specified in the solver.prototxt actually going to be performed when it reaches the test_interval, or does that only work if I run solver.solve() without specifying the number of steps? Should I, alternatively, remove the validation parameters from the solver and perform it manually? This is hard for me to understand, as in many tutorials they use such validation parameters on the solver but then work step by step. Or not. And maybe they validate manually. Or not. Hence my doubt.


In case the second option holds, how should I do it? What comes to my mind right now is to have a second data loader for the validation set and perform the desired forward passes on it without any backward or update calls. The solver would be untouched during this process, so it would resume training after this validation as if it never happened. Would this be correct? Also, I understand that in this case I could remove the second input layer in the model and that there would be no need for the phase parameter.
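To make the idea concrete, something like the sketch below is what I have in mind. load_val_batch() and the 'accuracy' blob name are my own placeholders, and I am reusing the same train_val prototxt in TEST phase:

import numpy as np
import caffe

# A separate net built from the same prototxt in TEST phase, sharing the weights
# that are currently being trained (no copy is made).
val_net = caffe.Net('model_train_val.prototxt', caffe.TEST)
val_net.share_with(solver.net)

accs = []
for _ in range(100):                      # enough batches to cover the validation set
    images, labels = load_val_batch()     # placeholder loader for the validation split
    val_net.set_input_arrays(
        np.ascontiguousarray(images, dtype=np.float32),
        np.ascontiguousarray(labels, dtype=np.float32))
    val_net.forward()                     # forward only: no backward(), no update
    accs.append(float(val_net.blobs['accuracy'].data))
print('validation accuracy:', np.mean(accs))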


I am assuming, however, that the snapshots will indeed be saved automatically when the solver's iteration count reaches a multiple of the snapshot parameter. And will the learning rate be updated? Or should I handle these manually too? What would the solver be for when using single steps then (apart from joining forward, backward and update in a single function)? 


And finally, as I've mentioned above, for the testing (deploy) phase I am supposed to be using a second model_deploy.prototxt without an input layer (or loss layer). How am I supposed to load data into the model if it does not have an input layer? Couldn't I do this in the same fashion as I am validating? With simple forward passes and the basic training model definition (with input and loss layers) I should be able to test my model, shouldn't I?


If you need any more information to understand my problem, please let me know.


Thanks in advance.

Przemek D

Mar 29, 2018, 3:53:21 AM
to Caffe Users
Great post and an interesting set of questions -- to me it is a clear sign that what Caffe needs at the moment is a consolidated tutorial introducing new users to its basic concepts, as currently much of the required knowledge is spread over many sources, often of varying degrees of accuracy and up-to-dateness.

Now to try and actually answer these. The functions you use from the Python interface are only bindings to the respective functions in the C++ backend - see python/caffe/_caffe.cpp for details. When you call solver.solve(), it is equivalent to running caffe train -solver=solver.prototxt, training- and validation-wise. Calling solver.step(n) gives you finer control over essentially the same thing: solve() internally calls step(), which contains all the training behaviour (forward & backward passes, learning rate updates, loss value outputs, testing, snapshotting, callbacks, ...). This means that you're free to work step by step - when you reach a test_interval iteration, the solver will automatically run the test net for you (during the step() call); same with the log display and snapshots. So there's no need to do anything manually - everything you said in the solver.prototxt, Caffe will do for you, step-by-step operation or not.
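In other words (the loader, file names and numbers below are only placeholders, not something Caffe provides), a step-by-step session can look like this and still get testing, logging and snapshotting for free:

import numpy as np
import caffe

solver = caffe.get_solver('solver.prototxt')
while solver.iter < 10000:                # or whatever max_iter you want to honour
    images, labels = load_next_batch()    # your own loading/preprocessing code
    solver.net.set_input_arrays(
        np.ascontiguousarray(images, dtype=np.float32),
        np.ascontiguousarray(labels, dtype=np.float32))
    # One forward/backward/update; the solver also runs the test net, prints the log
    # and writes snapshots whenever its own schedule (test_interval, display, snapshot) says so.
    solver.step(1)

One caveat: if your TEST-phase input is also a MemoryData layer, that net needs data as well, so you would have to call solver.test_nets[0].set_input_arrays(...) before the automatic test pass runs.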

As to your issue with deploy.prototxt -- I think you're confusing "test" with "deployment". This is partially due to how the Caffe tutorials are written, though, so that is another thing that should be made clearer in them. In my opinion, testing a model means running it on another known dataset (imagine an ILSVRC scenario, where you train on one dataset, validate on another, but the test server measures performance on a third, secret one) - so you have one more database with images and labels and you want to measure the accuracy on it. In this situation you want to have the same prototxt as before, that is: with input layer, accuracy, loss etc., except you would change the data layer to point to your other (test) dataset. Running caffe test with this model (you will probably have to specify the network stage - consult caffe --help for details) will output the test results (accuracy, loss, whatever) on this other dataset.
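If you prefer to stay in Python rather than use the caffe binary, that kind of test run is just a TEST-phase net loaded with the trained weights plus a loop of forward passes; the file names and blob names below are only examples:

import numpy as np
import caffe

# TEST-phase net with trained weights; here the TEST data layer points to the test set.
test_net = caffe.Net('model_train_val.prototxt', 'snapshot_iter_10000.caffemodel', caffe.TEST)

accs, losses = [], []
for _ in range(100):             # plays the role of -iterations in the caffe tool
    out = test_net.forward()     # the data layer supplies the batches itself
    accs.append(float(out['accuracy']))
    losses.append(float(out['loss']))
print('test accuracy:', np.mean(accs), '| test loss:', np.mean(losses))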
Now, deployment is another thing entirely: it means using the trained network as a predictor for new, unseen data. The main difference is that you no longer have labels for the data -- you want the network to tell you those. For this reason you cannot use accuracy or loss layers -- there is no ground truth to compare against. You might also want to use your network as an on-line predictor: that is, you do not collect the data beforehand -- it flows in as a continuous, potentially unending stream. So you might not be able to assemble a database, but instead want to take each newly acquired data point and put it directly into the network -- this is exactly what using an input field lets you do (and exactly why you can't use a data layer).
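As a rough sketch of that deployment scenario (the file names, the 'data' input name and the 'prob' output name are assumptions that depend on your model):

import numpy as np
import caffe

# The deploy prototxt declares the input shape (input/input_shape fields) instead of a data layer.
net = caffe.Net('model_deploy.prototxt', 'snapshot_iter_10000.caffemodel', caffe.TEST)

sample = acquire_new_sample()            # hypothetical on-line data source, shape (C, H, W)
net.blobs['data'].data[0, ...] = sample  # write straight into the input blob
out = net.forward()
print('prediction:', out['prob'][0].argmax())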

I hope this clears things up for you. Please ask any further questions you might have though :)

Ses Vinyes

Mar 29, 2018, 12:38:55 PM
to Caffe Users
Ok! Thanks for your quick response!

So, from what you say, it seems that the solver is going to perform everything you tell it to perform when it gets to the appropriate iteration. In the case of a validation round (even if test_iter tells the solver to perform 100 iterations for validation), Caffe will run the whole validation stage before continuing training. That is good to know.

Regarding what you say about the difference between Testing and Deploy, that totally makes sense, thank you for the clear explanation.

I think I am ready to start training now ;)

Thank you very much!