How to classify and how to configure a net properly?

Thomas Herus

unread,
Jul 28, 2015, 9:57:54 AM
to Caffe Users
Hey guys,

I've been working with Caffe for the past couple of days, trying to complete the tutorials, installing Caffe properly and training some networks. Apart from Caffe/Ubuntu/Nvidia/dependencies being a huge pain in the behind, once it works it's very nice.

I've had some issues along the way, most of which I've been able to solve on my own, but some things I can't grasp yet. I'm fairly new to the CNN business, so I have a hard time understanding all the parameters and options. Caffe's limited documentation doesn't make this easier.

Anyway, I was hoping you could answer some of my questions.

1) When following the MNIST tutorial I noticed that they chose to scale the input by 1/256, and I don't understand why. I've toyed around with this setting a bit: removing it destroys all learning completely, while changing the number by a small amount has no significant effect. How come?

2) What is an easy way to classify a single image (or several)? I've used this tutorial so far, but it fails to explain most of the commands and the reasoning behind them. Especially the transformations and blob reshapes are like magic to me, although I am able to find the correct settings with some experimentation.

3) How can I change the number of outputs in the GoogLeNet network? I've semi-successfully done this with LeNet and the Caffe reference net, to match the two classes of my own data. I can't get it to work with GoogLeNet, though; simply changing the num_output in the loss3/top-5 layer just messes up the learning process.

4) Every now and then, especially when using ImageNet (the Caffe reference net, that is), the loss will stay around 0.68 or 0.7 for the rest of the training, and I can't find the reason why. I've changed every setting imaginable, but it doesn't help; even when I completely reset the settings it won't work. I have this problem with LeNet and CaffeRef, with both my own data and the example data.

5) What is the deploy.prototxt file for? I've noticed it describes the blobs in the network, but it's not needed for training and testing.

6) What is the importance of batch size? I've noticed that, combined with a certain image resolution, it can make or break the learning process, but I don't understand why.

7) Last but not least: do you know any good beginner tutorials that explain the parameters in the prototxt files? Especially what their effect/importance is.

I'm sorry about the number of questions, but I would highly appreciate it if someone could answer one or more of them!

Cheers,

Thomas

Thomas Herus

unread,
Jul 28, 2015, 10:01:11 AM
to Caffe Users, thomas...@gmail.com
As some extra information: I have used the MNIST dataset as well as my own dataset, consisting of 2000 labeled, noisy images of either a rectangle or a circle. I use 500 noise-free images for validation. The numbers of circles and rectangles are pretty much equal, and the data is shuffled.
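For concreteness, something like this NumPy sketch could generate that kind of data (illustrative only, not my actual pipeline; all sizes and noise levels are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

def make_sample(size=28, noise=0.2):
    """Return a noisy image of either a rectangle (label 0) or a circle (label 1)."""
    img = np.zeros((size, size))
    label = int(rng.integers(0, 2))
    if label == 0:
        img[8:20, 6:22] = 1.0  # filled rectangle
    else:
        yy, xx = np.mgrid[:size, :size]
        img[(yy - size // 2) ** 2 + (xx - size // 2) ** 2 < 64] = 1.0  # filled circle
    img += rng.normal(scale=noise, size=img.shape)  # additive Gaussian noise
    return img, label

images, labels = zip(*(make_sample() for _ in range(8)))
print(len(images), sorted(set(labels)))
```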

ath...@ualberta.ca

unread,
Jul 29, 2015, 11:57:53 AM
to Caffe Users, thomas...@gmail.com
I would recommend a machine learning course like:
and/or neural nets:

followed by this fantastic CNN course (no videos):

Cheers,
Andy

npit

unread,
Jul 31, 2015, 5:47:44 AM
to Caffe Users, thomas...@gmail.com
1) The tutorial does explain it: "We will use a batch size of 64, and scale the incoming pixels so that they are in the range [0,1). Why 0.00390625? It is 1 divided by 256."
The image pixels are in uint8 format, i.e. one byte per pixel, and a byte can store unsigned integer values from 0 to 255. I don't know why the range has to be [0,1) rather than [0,1], but since scaling to [0,1] means dividing every element by the max value (255), dividing by 256 instead makes sure that every pixel value is strictly less than 1.
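The arithmetic is easy to verify, e.g. with NumPy (the scale value is the one from the tutorial's Data layer):

```python
import numpy as np

# uint8 pixels span 0..255 (one byte per pixel)
pixels = np.array([0, 127, 255], dtype=np.uint8)

# Caffe's MNIST tutorial uses scale: 0.00390625 (= 1/256) in the Data layer
scaled = pixels * 0.00390625

print(scaled.max() < 1)  # True: dividing by 256 keeps every value strictly below 1
print(scaled)            # the brightest pixel maps to 255/256 = 0.99609375, not 1.0
```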

2) Python is probably the easiest out-of-the-box solution. Alternatively, you could use the extract_features tool, take the output of the last fully connected layer, and apply a softmax function to map the scores into the probability range.
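The softmax step is simple enough to do by hand; a NumPy sketch, with made-up scores standing in for the last fully connected layer's output:

```python
import numpy as np

def softmax(z):
    # subtract the max before exponentiating, for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

# hypothetical raw scores from the network's last fully connected layer
fc_scores = np.array([2.0, 0.5, -1.0])

probs = softmax(fc_scores)
print(round(float(probs.sum()), 6))  # 1.0 -- a proper probability distribution
print(int(np.argmax(probs)))         # 0 -- the class with the highest raw score
```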

3) While most CNNs have a single classification FC layer, GoogLeNet has three, to diminish the vanishing-gradient problem during backprop; check the GoogLeNet paper: http://arxiv.org/pdf/1409.4842v1.pdf
So you have to change all three classifier layers (loss1/classifier, loss2/classifier, and loss3/classifier) to produce the number of outputs you want. The top-5 layer is an accuracy layer; you can leave it as-is (it doesn't even have an output-dimension field).
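For a two-class problem that means editing num_output in all three classifier layers of the train prototxt; a sketch of one of them (layer and blob names as in the BVLC GoogLeNet definition):

```protobuf
layer {
  name: "loss3/classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/classifier"
  inner_product_param {
    num_output: 2  # was 1000; edit loss1/classifier and loss2/classifier the same way
  }
}
```

If you fine-tune from the pretrained weights, it's also worth renaming the changed layers (e.g. "loss3/classifier-2") so Caffe initializes them fresh instead of hitting a shape mismatch when loading the old 1000-way weights.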

4) The 0.68 value represents near-random guesses for a classification network, using the negative log loss. So if the loss stays around there, the network is not learning anything: either something is wrong with your data, or you have to change some hyper-parameters (learning rate, LR decay rate and scheme, training batch size, etc.).
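The chance-level loss is easy to check for any class count with the standard library; note that -log(1/2) ≈ 0.69 is exactly the plateau you'd see on a two-class problem, while 1000 classes (ImageNet) would sit near 6.9:

```python
import math

def chance_loss(k):
    # cross-entropy loss of a classifier that assigns uniform probability 1/k
    return -math.log(1.0 / k)

print(round(chance_loss(2), 4))     # 0.6931 -- two classes (circle vs. rectangle)
print(round(chance_loss(1000), 4))  # 6.9078 -- ImageNet's 1000 classes
```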

5) I think it's for the matcaffe interface.

6) The network weights are updated using the gradients from a batch of images. A larger batch means each update is computed from a more representative portion of the dataset (assuming you did not forget to shuffle your training data!), so learning is smoother and less prone to overfitting.
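A toy illustration of that point, assuming NumPy (the one-parameter linear model and all numbers are made up): the spread of the mini-batch gradient estimate shrinks as the batch grows, so larger batches give a less noisy update direction.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy 1-D problem: loss = (w*x - y)^2, gradient w.r.t. w is 2*x*(w*x - y)
x = rng.normal(size=10000)
y = 3.0 * x + rng.normal(scale=0.5, size=10000)
w = 0.0  # untrained weight

def batch_grad(batch_size):
    # gradient of the mean loss over one randomly drawn mini-batch
    idx = rng.integers(0, len(x), size=batch_size)
    xb, yb = x[idx], y[idx]
    return np.mean(2 * xb * (w * xb - yb))

# estimate the spread of the gradient for small vs. large batches
small = np.std([batch_grad(8) for _ in range(500)])
large = np.std([batch_grad(256) for _ in range(500)])
print(small > large)  # True: the batch-256 gradient estimate is far less noisy
```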

7) Sadly, no. I resort to the source code to find things out. You can check the layer catalogue and the other sections of the "Tour" here: http://caffe.berkeleyvision.org/tutorial/

Hope this helps!

npit

unread,
Jul 31, 2015, 5:49:58 AM
to Caffe Users, thomas...@gmail.com, pittar...@gmail.com


4)The 0.68 value represents near random guesses for a classification network with 1000 classes, using the negative log loss. (-log(1/100)). So if the loss stays around there, the network is not learning anything, and either something's wrong with your data, or you have to change some hyper-parameters (LR, LR decay rate & scheme, training batchsize, etc)


I meant -log(1/1000), btw, which is ~= 6.9.