High loss training fcn8s at once


Kostas

Feb 20, 2017, 5:24:25 AM
to Caffe Users
Hi All,

I'm trying to train fcn8s-at-once on my own dataset of about 2,500 images and 23 classes. I'm fine-tuning from the pre-trained voc-fcn8s-atonce model; I've made the necessary modifications to the score layers to accommodate the number of classes, and I've tried a few different learning rates. I started at 1e-10, as per the default in the solver file, but the loss became unstable after 20K iterations, so I've now dropped the learning rate to 1e-12 and I'm getting better results. Still not quite good enough; the best results were produced at iteration 44K with:

loss 72293.2818645
overall accuracy 0.877659267828
mean accuracy 0.412693646808
mean IU 0.283250753423
fwavacc 0.826358571841
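(For reference, these four numbers match the metrics reported by the standard FCN evaluation script; a minimal sketch of how they are computed from a confusion matrix of ground-truth vs. predicted labels — function names here are mine, not the script's:)

```python
import numpy as np

def fast_hist(labels, preds, n_cls):
    # Confusion matrix: rows = ground truth, cols = prediction.
    k = (labels >= 0) & (labels < n_cls)
    return np.bincount(n_cls * labels[k].astype(int) + preds[k],
                       minlength=n_cls ** 2).reshape(n_cls, n_cls)

def seg_metrics(hist):
    # Overall pixel accuracy: correct pixels / all pixels.
    overall_acc = np.diag(hist).sum() / hist.sum()
    # Mean accuracy: per-class recall, averaged over classes.
    mean_acc = np.nanmean(np.diag(hist) / hist.sum(axis=1))
    # Mean IU: intersection over union per class, averaged.
    iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
    mean_iu = np.nanmean(iu)
    # Frequency-weighted IU: IU weighted by class pixel frequency.
    freq = hist.sum(axis=1) / hist.sum()
    fwavacc = (freq[freq > 0] * iu[freq > 0]).sum()
    return overall_acc, mean_acc, mean_iu, fwavacc
```

A mean accuracy of 0.41 next to an overall accuracy of 0.88 usually means a few frequent classes are predicted well while many rare classes are mostly missed.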

At iteration 48K the loss started oscillating heavily again. Do you have any suggestions on what to try next? Should I reduce the learning rate further? Use a variable learning rate? Let it run longer? Is there an empirical rule for how many iterations the net should need to converge given the size of my dataset? A mean accuracy of 41% seems pretty low. Any pointers on what could be the issue would be very much appreciated!
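(On the variable learning rate: Caffe's solver supports a step-decay policy out of the box; a sketch of the relevant solver.prototxt fields, with purely illustrative, untuned values:)

```
# solver.prototxt fragment -- illustrative values, not tuned
base_lr: 1e-12      # starting learning rate
lr_policy: "step"   # multiply lr by gamma every stepsize iterations
gamma: 0.1
stepsize: 20000
momentum: 0.99      # the FCN solvers pair a tiny lr with high momentum
```

The voc-fcn8s-atonce solver ships with `lr_policy: "fixed"`; switching to "step" would drop the rate automatically around the point where the loss starts oscillating.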

Many Thanks!

figure_1.png

Pasc Peli

Mar 11, 2018, 1:03:33 PM
to Caffe Users
Hello Kosta,

Have you found a solution to this problem, or figured out what might be causing it?

Kostas

Mar 12, 2018, 7:04:53 AM
to Caffe Users
Hi Pasc,

It's been a while, I honestly can't remember what I did, and unfortunately I don't use this model any more. If I were you, I'd check and visualise my dataset to make sure it's right. Also, 2,500 images is not that many, and what I've seen in other models is that training all layers ends up "untuning" the model rather than fine-tuning it, which results in overfitting. So as a general suggestion I'd say try freezing the weights on most layers, or selectively reducing the per-layer learning-rate multipliers, or go for an even lower learning rate overall. Looking at this chart again, I'd say that by 48K you might well be overfitting with so few images. You might(?) also have vanishing gradients contributing to this behaviour.
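(For what it's worth, freezing a layer in Caffe is done in the train prototxt by zeroing its learning-rate and decay multipliers; the layer below is illustrative, though fcn8s's first conv does use pad: 100:)

```
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  # lr_mult: 0 freezes the blob; decay_mult: 0 skips weight decay too
  param { lr_mult: 0  decay_mult: 0 }  # weights
  param { lr_mult: 0  decay_mult: 0 }  # bias
  convolution_param { num_output: 64  kernel_size: 3  pad: 100 }
}
```

Setting `lr_mult` to a small nonzero value instead (e.g. 0.1) is the "selectively reducing" option: the layer still trains, just much more slowly than the new score layers.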

Good luck!