Fine-tuning issue: loss becomes zero and negative.


arpit jain

Jun 30, 2016, 9:37:09 PM
to Caffe Users

Hi,

I am new to Caffe. I am working on a project aimed at identifying 3 types of objects in an image. Hence, my classification task needs 4 outputs: the three object classes and None.

My dataset is quite small.
Training set: 3000 images total, 1000 for each of the 3 categories
Validation set: 3000 images total, 1000 for each of the 3 categories
(Note that my dataset does not contain any images with none of the three objects.)

I have to use the BVLC AlexNet pretrained model.
Two methods have been suggested to me:

Method 1:
Add an extra inner product layer at the end of the model (before the loss and accuracy layers) so that it gives 4 outputs.
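As a sketch, Method 1 means leaving fc8 (1000 outputs) in place and stacking a new 4-way inner product layer on top of it. The layer name "fc9" here is my own assumption, not something from the thread:

```protobuf
layer {
  name: "fc9"
  type: "InnerProduct"
  bottom: "fc8"            # takes the 1000-dim fc8 output as input
  top: "fc9"
  inner_product_param {
    num_output: 4          # 3 object classes + None
    weight_filler { type: "gaussian" std: 0.01 }
  }
}
# the SoftmaxWithLoss and Accuracy layers would then use bottom: "fc9"
```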

Method 2:
Modify the last inner product layer (i.e. fc8): change its num_output parameter from 1000 to 4.
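For reference, a minimal sketch of what the modified layer might look like in train_val.prototxt. The name "fc8_new" is an assumption on my part; renaming the layer is the usual trick so that Caffe re-initializes it with fresh weights instead of trying to copy the pretrained 1000-way fc8 weights into a 4-way layer:

```protobuf
layer {
  name: "fc8_new"          # renamed so pretrained fc8 weights are not copied
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8_new"
  param { lr_mult: 10 decay_mult: 1 }   # weights: learn faster than other layers
  param { lr_mult: 20 decay_mult: 0 }   # biases
  inner_product_param {
    num_output: 4          # 3 object classes + None
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
```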

I have tried both methods.
With Method 2, the loss goes negative after just 20 iterations.
With Method 1, the loss becomes zero and the accuracy becomes 1 after 20 iterations.

In both methods, I increased the 'lr_mult' parameter from 1 to 10 so that the new layer learns faster than the other layers.
I am working in CPU mode, and my training is very, very slow: it took me 14 hours just to reach 1200 iterations.

Can anyone please tell me:
1. Where am I going wrong in each of the two methods?
2. Is the way I have set up my dataset correct?
3. Which method is preferable?
4. How should I change my hyperparameters to obtain good results?
5. Should I also change the deploy.prototxt file?
6. Is training supposed to be this slow?
7. I created the mean image file from my own dataset instead of using the pretrained model's one. Is that fine?

http://pastebin.com/0vtJfjC4 - Method 2 train_val.prototxt
http://pastebin.com/tgYfmmU7 - Solver.prototxt

Hieu Do Trung

Jul 1, 2016, 3:50:14 AM
to Caffe Users
How did you prepare the data for your 4th category? Images that contain objects other than those from the other 3 categories?
If I had only 3 categories to classify, I would use just 3 outputs in the last layer.

Training on CPU is much, much slower than on GPU.
The deploy.prototxt is only used after training is done, for classification.
max_iter: 10000 -> I'd use stepsize: 2500~3000, so that after 3 step-downs the learning rate would be 0.001/1000 = 0.000001, just like the values provided on the project's GitHub site.
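A sketch of the suggested solver settings. The base_lr, stepsize, and max_iter values come from this thread; gamma: 0.1 is an assumption (it is what makes three step-downs give 0.001 * 0.1^3 = 0.000001), and the remaining fields are typical AlexNet fine-tuning defaults, not something the poster confirmed:

```protobuf
net: "train_val.prototxt"
base_lr: 0.001        # starting learning rate for fine-tuning
lr_policy: "step"     # drop the rate every `stepsize` iterations
gamma: 0.1            # multiply the rate by 0.1 at each step-down
stepsize: 2500        # allows 3 step-downs within max_iter
max_iter: 10000       # final rate: 0.001 * 0.1^3 = 0.000001
momentum: 0.9
weight_decay: 0.0005
snapshot: 2500
snapshot_prefix: "snapshots/finetune"
solver_mode: CPU      # the poster is training on CPU
```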

I'm no expert in this, so my answer may contain incorrect info.