Hi everyone! So, I'm a complete beginner with Caffe (PyCaffe, actually) and this may seem like something very simple, but I can't seem to wrap my head around it. After reviewing many tutorials, GitHub discussions and forum posts, the only thing that is clear to me is that, in Caffe, there are SEVERAL ways of doing the same thing. However, I am not quite sure whether this is the result of different Caffe versions getting merged in my head or whether this multiplicity actually exists.
So, I am trying to train an already existing Caffe model (a basic CNN with images as input). I have available the `model_train_val.prototxt`, which includes two input layers: one for `phase: TRAIN` and one for `phase: TEST`, which from my understanding (correct me if I am mistaken, please) are used for training and validation respectively. For testing I should be using a second `model_deploy.prototxt` that is equal to the training one but without the input and loss layers (right?). The other thing I have is the `solver.prototxt`. I only have one of these, as I understand that for testing I can simply load the net and perform all the forward passes manually (there is no need to deal with the "complicated" backward and update steps, which the Solver handles automatically when training). In the solver I have included parameters for a validation round every now and then:
test_iter: 100
test_interval: 500
Which from my understanding is going to perform 100 test (validation) iterations every 500 train iterations. Also, the `net` parameter points to the train prototxt model file.
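For context, here is a sketch of what my full `solver.prototxt` roughly looks like. The path, learning-rate policy and all numeric values besides `test_iter`/`test_interval` are illustrative, not my real settings:

```
net: "model_train_val.prototxt"
test_iter: 100        # validation iterations per validation round
test_interval: 500    # validate every 500 training iterations
base_lr: 0.01
lr_policy: "step"
stepsize: 10000
gamma: 0.1
snapshot: 5000
snapshot_prefix: "snapshots/model"
max_iter: 50000
solver_mode: GPU
```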
Now, for training I am using a custom dataset, with some complex data preprocessing for which I already have a set of functions created for input handling. For this reason my data flow is as follows:
- Load batch of images (with aforementioned custom functions).
- Set data and label blobs of the network to the loaded batch. (I am using a MemoryData layer, as I thought that was the appropriate type to feed the data in this fashion.)
- `solver.step(1)`. (I originally wrote `solver.solve(1)`, but as far as I can tell `step(n)` is the PyCaffe call that runs n iterations, while `solve()` runs the whole training schedule. Also, now that I think of it, should I tell the solver to start the forward pass from the first conv layer? I've seen this somewhere, as it seems that starting the forward pass from the beginning may overwrite the data I just placed into the net. However, that was regarding the `forward` function, so I am not sure whether it holds for the solver too. I'd think so, but you never know.)
- Output some statistics.
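To make the data flow above concrete, this is roughly what my training loop looks like. `load_batch` stands in for my custom preprocessing functions, and the `'loss'` blob name is an assumption; note that PyCaffe's `set_input_arrays` requires a `MemoryData` input layer and contiguous float32 arrays:

```python
import numpy as np
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.get_solver('solver.prototxt')

for it in range(max_iters):
    # load_batch() is my custom loader (hypothetical name); it returns
    # data as float32 (N, C, H, W) and labels as float32 (N,)
    data, labels = load_batch()

    # feed the MemoryData layer of the training net
    solver.net.set_input_arrays(data, labels)

    # one forward + backward + parameter update
    solver.step(1)

    # output some statistics ('loss' is an assumed blob name)
    print(it, float(solver.net.blobs['loss'].data))
```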
So, on to my actual question(s). Given that I am solving one step at a time, will the validation specified in `solver.prototxt` actually be performed when the iteration count reaches `test_interval`, or does that only work if I run `solver.solve()` without specifying the number of steps? Should I, alternatively, remove the validation parameters from the solver and perform validation manually? This is hard for me to understand because many tutorials use such validation parameters in the solver but then train step by step. Or not. And maybe they validate manually. Or not. Hence my doubt.
In case the second option holds, how should I do it? What comes to mind right now is to have a second data loader for the validation set and perform the desired forward passes on it without any backward or update calls. The solver would be untouched during this process, so it would resume training afterwards as if the validation had never happened. Would this be correct? I also understand that in this case I could remove the second input layer from the model, and there would be no need for the `phase` parameter.
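The manual-validation idea sketched above could look something like this. It assumes a separate net instantiated in TEST phase that shares weights with the training net via `share_with`; `val_loader` and the `'loss'`/`'accuracy'` blob names are assumptions:

```python
import numpy as np
import caffe

# Validation net built from the same train/val prototxt, TEST phase
val_net = caffe.Net('model_train_val.prototxt', caffe.TEST)

def validate(solver, val_net, val_loader, n_batches):
    # point the validation net at the current training weights
    val_net.share_with(solver.net)
    losses, accs = [], []
    for _ in range(n_batches):
        data, labels = val_loader()            # custom loader (hypothetical)
        val_net.set_input_arrays(data, labels)
        out = val_net.forward()                # forward only: no backward/update
        losses.append(float(out['loss']))      # blob names are assumptions
        accs.append(float(out['accuracy']))
    return np.mean(losses), np.mean(accs)
```

Since the solver object is never touched here, its iteration count, momentum history and learning rate should be exactly as they were before the validation round.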
I am assuming, however, that snapshots will indeed be saved automatically when the solver's iteration count reaches a multiple of the `snapshot` parameter. And will the learning rate be updated? Or should I handle these manually too? What would the solver be for when using single steps, then (apart from combining forward, backward and update into a single call)?
And finally, as I've mentioned above, for the testing (deploy) phase I am supposed to be using a second `model_deploy.prototxt` without an input layer (or a loss layer). How am I supposed to load data into the model if it does not have an input layer? Couldn't I do this in the same fashion that I am validating? With simple forward passes and the basic training model definition (with input and loss layers) I should be able to test my model, shouldn't I?
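For what it's worth, from what I've read, deploy prototxts usually still declare an input blob (via an `Input` layer or `input:`/`input_shape:` fields); what they drop is the data-reading and loss layers. Assuming such a deploy file with a blob named `"data"`, feeding it might look like this (file names, `load_and_preprocess`, and the `'prob'` output blob are all hypothetical):

```python
import numpy as np
import caffe

# Load the deploy net with trained weights (paths are illustrative)
net = caffe.Net('model_deploy.prototxt',
                'snapshots/model_iter_50000.caffemodel',
                caffe.TEST)

image = load_and_preprocess('test.png')             # custom preprocessing (hypothetical)
batch = image[np.newaxis, ...].astype(np.float32)   # shape (1, C, H, W)

net.blobs['data'].reshape(*batch.shape)  # resize the input blob to the batch
net.blobs['data'].data[...] = batch      # copy the data in
out = net.forward()                      # forward pass only
probs = out['prob']                      # output blob name is an assumption
```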
If you need any more information to understand my problem, please let me know.
Thanks in advance.