Hey guys,
I've been working with Caffe for the past couple of days, trying to complete the tutorials, install Caffe properly, and train some networks. Apart from Caffe/Ubuntu/Nvidia/dependencies being a huge pain in the behind, once it works it's very nice.
I've had some issues along the way, most of which I've been able to solve on my own, but some things I can't grasp yet. I'm fairly new to the CNN business, so I have a hard time understanding all the parameters and options. Caffe's limited documentation doesn't make this easier.
Anyway, I was hoping you could answer some of my questions.
1) When following the tutorial on MNIST I noticed that they chose to scale the input by 1/256. I don't understand why. I've toyed around with this setting a bit and noticed that removing it destroys all learning completely, while changing the number by a modest amount does not have a significant effect. How come?
2) What is an easy way to classify a single (or multiple) image? I've used this tutorial so far, but it fails to explain most of the commands and the reason behind them. Especially the transformations and blob reshapes are like magic to me, although I am able to find the correct settings with some experimentation.
3) How can I change the number of outputs in the GoogLeNet network? I've semi-successfully done this with LeNet and the Caffe reference net, to match the two classes of my own data. I can't get it to work on GoogLeNet though; simply changing the num_output in the loss3/top-5 layer just messes up the learning process.
4) Every now and then, especially when using ImageNet (the Caffe Reference Net that is), the loss will stay around 0.68 or 0.7 for the rest of the training. I can't find the reason why. I've changed every setting imaginable, but it doesn't work. Even when I completely reset the settings it won't work. I have this problem with LeNet and CaffeRef, with both my own data and the example data.
5) What is the deploy.prototxt file for? I've noticed it describes the blobs in the network, but it's not needed for training and testing.
6) What is the importance of batch size? I've noticed that, combined with a certain resolution of an image, it can make or break the learning process, but I don't understand why.
7) Last but not least: do you know any good beginner tutorials that explain the parameters in the prototxt files, especially what their effect/importance is?
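To make question 1 a bit more concrete, this is roughly what I understand the scale setting to do. It's only a sketch in plain Python, not Caffe code: the pixel values are made up, the function name is mine, and I'm assuming the 1/256 factor (0.00390625) is simply multiplied into each raw 8-bit pixel.

```python
# Sketch only: plain Python, not Caffe. The pixel values below are made up.
SCALE = 1.0 / 256  # the 1/256 factor from the MNIST tutorial (0.00390625)


def scale_pixels(pixels, scale=SCALE):
    """Multiply each raw 8-bit pixel value by the scale factor."""
    return [p * scale for p in pixels]


raw = [0, 64, 128, 255]  # hypothetical 8-bit grayscale values
scaled = scale_pixels(raw)

print(scaled)                        # every value now lies in [0, 1)
print(scale_pixels(raw, 1.0 / 200))  # a nearby factor gives a similar range
```

So with the scale the inputs end up in [0, 1), and without it they run up to 255. I guess that difference in range is what matters, but I'd like to understand why.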
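One observation on question 4, in case it helps: for my own two-class data, 0.68 to 0.7 is very close to ln(2), which is what a softmax (cross-entropy) loss would be if the net predicted 50/50 for every image. A quick check in plain Python, assuming a softmax loss over equally likely classes (the function name is just mine):

```python
import math


def uniform_softmax_loss(num_classes):
    """Cross-entropy loss when every class gets probability 1/num_classes."""
    return -math.log(1.0 / num_classes)


print(uniform_softmax_loss(2))   # about 0.693 for two classes
print(uniform_softmax_loss(10))  # about 2.303 for ten classes
```

That would only explain the number in the two-class case, so it may not be the whole story for the example data.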
I'm sorry about the number of questions, but I would highly appreciate it if someone could answer one or more of them!
Cheers,
Thomas