Retraining GoogleNet - loss too small and then negative

Максим Купрашевич

Apr 4, 2017, 6:05:31 AM
to Caffe Users
Hello. I'm not very experienced with Caffe and it seems I need help.

I want to retrain GoogleNet to reproduce the Deep Dream example.

So here is what I did: I collected 2893 examples, converted them all to 3-channel 256x256 .jpg files, and put each one in its own directory, so that every image is its own class.

Then I downloaded Caffe, changed the three num_output values in train_val.prototxt from 1000 to 2893, and changed 1000 to 2893 in deploy.prototxt as well.

As far as I understand Caffe, that is everything needed for retraining (I also renamed the classification layers so I could load the weights from the ImageNet-trained GoogleNet).
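
For what it's worth, a quick pycaffe sanity check of that edit looks roughly like the sketch below; the file names and the renamed layer name ('loss3/classifier_retrain') are just placeholders, not necessarily what is in the attached prototxts.

import caffe

# Minimal sketch: paths and the renamed classifier layer name are placeholders.
net = caffe.Net('deploy.prototxt',            # edited net definition
                'bvlc_googlenet.caffemodel',  # ImageNet-trained GoogleNet weights
                caffe.TEST)

# Layers that keep their original names get the ImageNet weights copied in;
# a renamed classifier starts from random initialization, so the only thing
# to check here is its shape -- the first dimension should be 2893.
print(net.params['loss3/classifier_retrain'][0].data.shape)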

But I ran into a strange issue. With learning rate 0.01 it converges in < 50 iterations to impossibly small loss values, and then the loss goes negative!
Example:
I0404 13:00:21.652196 16349 solver.cpp:337] Iteration 55, Testing net (#0)
I0404 13:01:12.943830 16349 solver.cpp:404]     Test net output #0: loss1/loss1 = 0 (* 0.3 = 0 loss)
I0404 13:01:12.943946 16349 solver.cpp:404]     Test net output #1: loss1/top-1 = 1
I0404 13:01:12.943969 16349 solver.cpp:404]     Test net output #2: loss1/top-5 = 1
I0404 13:01:12.943979 16349 solver.cpp:404]     Test net output #3: loss2/loss2 = 0 (* 0.3 = 0 loss)
I0404 13:01:12.943985 16349 solver.cpp:404]     Test net output #4: loss2/top-1 = 1
I0404 13:01:12.943991 16349 solver.cpp:404]     Test net output #5: loss2/top-5 = 1
I0404 13:01:12.944000 16349 solver.cpp:404]     Test net output #6: loss3/loss3 = 0 (* 1 = 0 loss)
I0404 13:01:12.944006 16349 solver.cpp:404]     Test net output #7: loss3/top-1 = 1
I0404 13:01:12.944012 16349 solver.cpp:404]     Test net output #8: loss3/top-5 = 1
I0404 13:01:13.739944 16349 solver.cpp:228] Iteration 55, loss = 5.54323e-08
I0404 13:01:13.740008 16349 solver.cpp:244]     Train net output #0: loss1/loss1 = 0 (* 0.3 = 0 loss)
I0404 13:01:13.740020 16349 solver.cpp:244]     Train net output #1: loss2/loss2 = 0 (* 0.3 = 0 loss)
I0404 13:01:13.740030 16349 solver.cpp:244]     Train net output #2: loss3/loss3 = 0 (* 1 = 0 loss)
I0404 13:01:13.740038 16349 sgd_solver.cpp:106] Iteration 55, lr = 0.01

This is the output when training from scratch! In this example I did not load the ImageNet weights.

Only with a learning rate <= 1e-8 does the network do something that looks like normal training.

What am I doing wrong? Please help me understand the network's behavior.

I've attached all files.
deploy.prototxt
solver.prototxt
train_val.prototxt

Przemek D

Apr 4, 2017, 6:24:07 AM
to Caffe Users
Your "classes" do not group images sharing the same features, so they are not real classes - just identifiers for each image in your dataset. As a consequence, you taught your network to simply memorize images, and it looks like it's doing a splendid job at it - zero loss means no mistakes.
Negative loss, however, can be a sign of serious problems, but your log does not show an example of that. Could you provide a complete log file where this happens (preferably as an attachment rather than pasted into the post)?

Максим Купрашевич

Apr 4, 2017, 9:32:14 AM
to Caffe Users
Yes, but I want to use it for DeepDream, not really as a classification network. For DeepDream, overfitting is OK, isn't it?

Well, since my first post I've completely rebuilt Caffe from scratch, and now the situation has magically reversed: the loss stays exactly the same on every step, with any LR, even one as big as 0.1.

Please check my logs, maybe you can help.

On Tuesday, April 4, 2017 at 13:24:07 UTC+3, Przemek D wrote:
report.txt

Максим Купрашевич

Apr 4, 2017, 9:49:44 AM
to Caffe Users
https://github.com/BVLC/caffe/issues/4950
Exactly the same values as in the third output in my case, but no solution there.

On Tuesday, April 4, 2017 at 16:32:14 UTC+3, Максим Купрашевич wrote:

Максим Купрашевич

Apr 4, 2017, 10:16:43 AM
to Caffe Users
Okay, the non-decreasing loss was because the LR was too high.
Now everything is as it was at the start: in < 80 iterations the loss goes to zero and then negative.
I've attached a log with full debug info. Please, someone help me.

On Tuesday, April 4, 2017 at 16:49:44 UTC+3, Максим Купрашевич wrote:
report.txt

Максим Купрашевич

Apr 4, 2017, 12:56:33 PM
to Caffe Users
OK, I solved the problem. It's really non-intuitive. The first problem, with the <= 0 loss, was because I forgot to set the class numbers in train.txt, the file that holds the image paths. I thought Caffe would take them from the subfolders, since it runs without error in that case...
So instead of:
img1 0
img2 1
...
It was:
img1
img2
...
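
For anyone who hits the same thing, here is a minimal sketch of generating that list from the one-subfolder-per-class layout (the 'data_root' path and the 'train.txt' name are placeholders):

import os

# Minimal sketch, assuming the layout described above: one subfolder per
# class, each holding the 256x256 3-channel .jpg images.
data_root = 'data/deepdream_classes'

class_dirs = sorted(d for d in os.listdir(data_root)
                    if os.path.isdir(os.path.join(data_root, d)))

with open('train.txt', 'w') as f:
    for label, class_dir in enumerate(class_dirs):
        for img in sorted(os.listdir(os.path.join(data_root, class_dir))):
            # Each line: "<path relative to the data root> <integer label>"
            f.write('%s/%s %d\n' % (class_dir, img, label))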

The second problem, the never-decreasing loss (87.+), was because Caffe can't work with the latest cuDNN. The solution is to set USE_CUDNN := 0 in Makefile.config.

On Tuesday, April 4, 2017 at 17:16:43 UTC+3, Максим Купрашевич wrote:

Przemek D

Apr 5, 2017, 2:55:14 AM
to Caffe Users
Ah yes. I don't think the caffe docs mention the input file list format anywhere.

As for your cuDNN error, that one is more interesting. I've seen the 87.3365 loss (this exact number) multiple times and it was always quite mysterious. You mean version 6.0 caused it in your case? Maybe consider posting an issue on the Caffe GitHub?

Also consider marking your post above as the answer. It might help others.