FCN32 training loss getting stuck in local minima


skeptic.sisyphus

Jul 22, 2016, 2:59:29 AM
to Caffe Users

Hi,

I am training FCN and it is getting stuck in a local minimum. The loss has not decreased for more than 30K iterations. I have tried decreasing the learning rate, but the problem persists. I trained the first 70K iterations with a learning rate of 1e-10; later I loaded the model and finetuned it with a learning rate of 1e-12. However, the results look like this:

[training loss plot attached]
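For concreteness, the two-stage schedule described above might look like this in pycaffe; a minimal sketch, with hypothetical solver and snapshot file names:

import caffe

caffe.set_mode_gpu()

# Stage 1: train with base_lr: 1e-10 set inside solver.prototxt (hypothetical name).
solver = caffe.SGDSolver('solver.prototxt')
solver.step(70000)

# Stage 2: load the stage-1 snapshot and finetune with a second solver file
# that is identical except for base_lr: 1e-12.
finetune = caffe.SGDSolver('solver_finetune.prototxt')
finetune.net.copy_from('fcn32_iter_70000.caffemodel')  # hypothetical snapshot name
finetune.step(30000)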



mprl

Jul 22, 2016, 6:23:42 AM
to Caffe Users
I see two solutions for escaping from local minima:
- Increasing the learning rate (but you did the opposite, so tell me if I'm wrong!)
- Decreasing the batch size (your training loss will oscillate more, which can help it escape from local minima). Again, if that's not true, tell me, I'm interested! Both knobs are sketched below.
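Both knobs can be changed without touching the net definition: the per-iteration batch is set in the data layer, and the learning rate in the solver. A minimal sketch that regenerates a solver file with a raised base_lr, using standard Caffe SolverParameter fields; the paths and values are placeholders, not recommendations from this thread:

from caffe.proto import caffe_pb2

# Sketch: write out a solver with a higher learning rate.
s = caffe_pb2.SolverParameter()
s.train_net = 'train.prototxt'   # hypothetical path
s.base_lr = 1e-8                 # raised from 1e-10 to help jump out of the minimum
s.lr_policy = 'fixed'
s.momentum = 0.9
s.iter_size = 1                  # effective batch = data-layer batch_size * iter_size

with open('solver_highlr.prototxt', 'w') as f:
    f.write(str(s))              # protobuf text format, as Caffe expects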

skeptic.sisyphus

Jul 22, 2016, 8:42:52 AM
to Caffe Users
Thanks for the response. You are correct, I did the opposite; the reason is that I have been struggling with NaNs as well. So, in order not to get NaNs, I attempted to decrease the learning rate, but to no avail. I am now trying to finetune with a higher learning rate to see if it gets out or overshoots.
I cannot find the batch_size for FCN (Fully Convolutional Network) anywhere. Am I right in guessing that the batch is the individual image?
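One quick way to check is to read the batch dimension straight off the data blob in pycaffe; a minimal sketch, assuming the conventional 'data' blob name and a hypothetical net file:

import caffe

# Load the net just to inspect blob shapes (file name is a placeholder).
net = caffe.Net('train.prototxt', caffe.TEST)

# Caffe blobs are N x C x H x W; the leading dimension is the batch size.
# The reference FCN data layers load one image at a time, so N is 1 here.
print(net.blobs['data'].data.shape)  # e.g. (1, 3, 500, 500)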

skeptic.sisyphus

Jul 22, 2016, 9:39:48 AM
to Caffe Users
Here is the update with the increased learning rate. The loss shoots up as expected and then comes back to the previous level. So it is somehow not getting past a certain level. Any idea why that might be happening?

Message has been deleted

Victor Genty

Jul 22, 2016, 11:56:39 AM
to Caffe Users
I missed seeing the graph in the previous post. The loss is leveling off at 16K? That seems a bit high. Have you paused to run inference and checked the output on validation data? Without checking, we can't be too sure you are in a minimum. By the way, you can set the batch size in the Python layer.
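A minimal sketch of such an inference check, along the lines of FCN's infer.py; the file paths and the 'score' blob name are assumptions:

import numpy as np
from PIL import Image
import caffe

# Load the trained snapshot (paths are placeholders).
net = caffe.Net('deploy.prototxt', 'fcn32_iter_70000.caffemodel', caffe.TEST)

# Preprocess one validation image the same way the training layer does:
# RGB -> BGR, subtract the training mean, move channels first.
im = np.array(Image.open('val_image.jpg'), dtype=np.float32)
im = im[:, :, ::-1]                                # RGB -> BGR
im -= np.array((104.00699, 116.66877, 122.67892))  # mean used by reference FCN
im = im.transpose((2, 0, 1))                       # HWC -> CHW

# Run a forward pass on this single image.
net.blobs['data'].reshape(1, *im.shape)
net.blobs['data'].data[...] = im
net.forward()

# Per-pixel prediction: argmax over the class channel of the score map.
out = net.blobs['score'].data[0].argmax(axis=0)

If the predicted label map looks like noise rather than rough object masks, the net has not converged, whatever the loss curve suggests.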

skeptic.sisyphus

Jul 22, 2016, 1:09:52 PM
to Caffe Users
I wonder why my previous post and plot got deleted. Here is the latest snapshot, zoomed in on the last part of the iterations.
By running inference, do you mean computing the accuracy on the validation set? Wouldn't such a high loss result in a lot of false positives?
There are no parameters to set in the layers file; however, a comment reads like this:

def reshape(self, bottom, top):
        # load data for tops and reshape tops to fit (1 is the batch dim)
...
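That comment confirms the reference layers hard-code a batch of one. If you wanted a larger batch, the layer would have to stack several same-sized images; a rough sketch, with hypothetical helper methods (next_indices, load_image, load_label):

import numpy as np
import caffe

class BatchedDataLayer(caffe.Layer):
    # Hypothetical FCN-style data layer with batch_size > 1. The helpers
    # next_indices / load_image / load_label are placeholders; load_image
    # and load_label must return fixed-size CHW arrays so they can be stacked.

    def reshape(self, bottom, top):
        idx = self.next_indices()  # batch_size image indices
        self.data = np.stack([self.load_image(i) for i in idx])
        self.label = np.stack([self.load_label(i) for i in idx])
        # Reshape tops to fit; the leading dim is now batch_size, not 1.
        top[0].reshape(*self.data.shape)
        top[1].reshape(*self.label.shape)

    def forward(self, bottom, top):
        top[0].data[...] = self.data
        top[1].data[...] = self.label

    def backward(self, top, propagate_down, bottom):
        pass  # data layers have no gradient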

skeptic.sisyphus

Jul 22, 2016, 2:22:47 PM
to Caffe Users
Just an update about the inference: I have checked it with an example image and the result is rather poor, so sadly it hasn't converged.

