High loss training fcn8s at once


Kostas

Feb 20, 2017, 5:24:25 AM
to Caffe Users
Hi All,

I'm trying to train fcn8s-at-once on my own dataset of about 2,500 images and 23 classes. I'm fine-tuning from the pre-trained voc-fcn8s-atonce model; I've made the necessary modifications to the score layers to accommodate the number of classes, and I've tried a few different learning rates. I started at 1e-10, as per the default in the solver file, but the loss became unstable after 20K iterations, so I've now dropped the learning rate to 1e-12 and I'm getting better results. Still not quite good enough; the best results were produced at iteration 44K with:

loss 72293.2818645
overall accuracy 0.877659267828
mean accuracy 0.412693646808
mean IU 0.283250753423
fwavacc 0.826358571841
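(For reference, these four numbers match the metrics reported by the standard FCN evaluation script; a minimal sketch of how they are computed from a confusion matrix of ground-truth vs. predicted labels — function names here are mine, not the script's:)

```python
import numpy as np

def fast_hist(labels, preds, n_cls):
    # Confusion matrix: rows = ground truth, cols = prediction.
    k = (labels >= 0) & (labels < n_cls)
    return np.bincount(n_cls * labels[k].astype(int) + preds[k],
                       minlength=n_cls ** 2).reshape(n_cls, n_cls)

def seg_metrics(hist):
    # Overall pixel accuracy: correct pixels / all pixels.
    overall_acc = np.diag(hist).sum() / hist.sum()
    # Mean accuracy: per-class recall, averaged over classes.
    mean_acc = np.nanmean(np.diag(hist) / hist.sum(axis=1))
    # Mean IU: intersection over union per class, averaged.
    iu = np.diag(hist) / (hist.sum(axis=1) + hist.sum(axis=0) - np.diag(hist))
    mean_iu = np.nanmean(iu)
    # Frequency-weighted IU: IU weighted by class pixel frequency.
    freq = hist.sum(axis=1) / hist.sum()
    fwavacc = (freq[freq > 0] * iu[freq > 0]).sum()
    return overall_acc, mean_acc, mean_iu, fwavacc
```

A mean accuracy of 0.41 next to an overall accuracy of 0.88 usually means a few frequent classes are predicted well while many rare classes are mostly missed.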

At iteration 48K the loss started oscillating heavily again. Do you have any suggestions on what to try next? Should I reduce the learning rate further? Use a variable learning rate? Let it run longer? Is there an empirical rule for how many iterations the net should need to converge given the size of my dataset? A mean accuracy of 41% seems pretty low. Any pointers on what could be the issue would be very much appreciated!
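(On the variable learning rate: Caffe's solver supports a step-decay policy out of the box; a sketch of the relevant solver.prototxt fields, with purely illustrative, untuned values:)

```
# solver.prototxt fragment -- illustrative values, not tuned
base_lr: 1e-12      # starting learning rate
lr_policy: "step"   # multiply lr by gamma every stepsize iterations
gamma: 0.1
stepsize: 20000
momentum: 0.99      # the FCN solvers pair a tiny lr with high momentum
```

The voc-fcn8s-atonce solver ships with `lr_policy: "fixed"`; switching to "step" would drop the rate automatically around the point where the loss starts oscillating.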

Many Thanks!

figure_1.png

Pasc Peli

Mar 11, 2018, 1:03:33 PM
to Caffe Users
Hello Kosta,

Have you found a solution to this problem, or figured out what might be causing it?

Kostas

Mar 12, 2018, 7:04:53 AM
to Caffe Users
Hi Pasc,

It's been a while, I honestly can't remember what I did, and unfortunately I don't use this model any more. If I were you, I'd check and visualise my dataset to make sure it's right. Also, 2,500 images is not that many, and what I've seen in other models is that training all layers ends up "untuning" the model rather than fine-tuning it, which results in overfitting. So as a general suggestion I'd say try freezing the weights on most layers, or selectively reducing the per-layer learning-rate multipliers, or go for an even lower learning rate overall. Looking at this chart again, I'd say that by 48K you might well be overfitting with so few images. You might(?) also have vanishing gradients contributing to this behaviour.
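(For what it's worth, freezing a layer in Caffe is done in the train prototxt by zeroing its learning-rate and decay multipliers; the layer below is illustrative, though fcn8s's first conv does use pad: 100:)

```
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  # lr_mult: 0 freezes the blob; decay_mult: 0 skips weight decay too
  param { lr_mult: 0  decay_mult: 0 }  # weights
  param { lr_mult: 0  decay_mult: 0 }  # bias
  convolution_param { num_output: 64  kernel_size: 3  pad: 100 }
}
```

Setting `lr_mult` to a small nonzero value instead (e.g. 0.1) is the "selectively reducing" option: the layer still trains, just much more slowly than the new score layers.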

Good luck!