Loss doesn't decrease problem.


mrgloom

Oct 6, 2016, 4:04:57 AM
to DIGITS Users
How do I debug and fix the problem when the loss doesn't decrease? Any general recommendations?

Greg Heinrich

Oct 6, 2016, 4:24:36 AM
to DIGITS Users
Hello, you will probably find this useful: https://youtu.be/F1ka6a13S9I

mrgloom

Oct 10, 2016, 5:35:44 AM
to DIGITS Users
This lecture seems too entry-level and there is no discussion of loss debugging, so I need more practical advice,
like decreasing/increasing the learning rate or batch size, or debugging the weights during training.
Like here:
https://github.com/BVLC/caffe/issues/401
https://github.com/BVLC/caffe/issues/3243
https://github.com/BVLC/caffe/issues/2731

On Thursday, October 6, 2016 at 11:24:36 AM UTC+3, Greg Heinrich wrote:

Greg Heinrich

Oct 10, 2016, 10:31:08 AM
to mrgloom, DIGITS Users
I wouldn't take the advice from Andrew Ng so lightly... are you having a bias or variance issue?


Renz Abergos

Oct 11, 2016, 4:10:33 AM
to DIGITS Users
Are you referring to the Training Loss or Validation Loss?

The training loss is supposed to decrease over time with a good learning rate. If it doesn't, I suggest decreasing the learning rate. Experiment with different learning rate values.

Also, try different solvers such as Adam (learning rate of 0.001) and SGD + Nesterov momentum (LR of 0.1 to 0.001). These help with fast convergence.
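
If you run Caffe directly rather than through the DIGITS form, this is roughly how such a solver configuration can be put together, assuming a recent Caffe build where the solver type is set with the "type" string field; the net path, schedule and snapshot prefix below are placeholders:

# Sketch: build a solver configuration with pycaffe's protobuf bindings and
# switch it to Adam; all file names and schedules here are placeholders.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

s = caffe_pb2.SolverParameter()
s.net = "train_val.prototxt"   # hypothetical network definition
s.type = "Adam"                # or "Nesterov" for SGD + Nesterov momentum
s.base_lr = 0.001              # Adam is usually tried around 1e-3
s.momentum = 0.9
s.momentum2 = 0.999            # Adam's second-moment decay
s.lr_policy = "fixed"
s.max_iter = 10000
s.snapshot_prefix = "snapshots/experiment"

with open("solver.prototxt", "w") as f:
    f.write(text_format.MessageToString(s))

In DIGITS the same knobs (solver type and base learning rate) are exposed in the model creation form, so the comparison can also be done there without writing any prototxt by hand.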

mrgloom

Oct 12, 2016, 8:25:04 AM
to DIGITS Users, gloomy...@gmail.com
I don't think my problem is a bias/variance issue; my network is unable to learn anything.

For a 2-class problem I get 50% accuracy (random guessing), and the train and test loss have not decreased since the start of training.
I trained the network for 10 epochs and then ran a prediction on a single image to look at the weights and activations; in the deeper layers the mean and std of the activations were almost 0.0.
So it seems that even in the forward pass the activations become 0.0 for some reason. How can I prevent this?
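
For reference, this is roughly how those per-layer statistics can be checked outside DIGITS with pycaffe; the deploy prototxt, snapshot name and the random input are placeholders (a real preprocessed image should be used for a meaningful check):

# Sketch: load a trained snapshot, run one forward pass and print mean/std of
# activations and weights per layer to spot layers whose outputs collapsed to ~0.
import numpy as np
import caffe

net = caffe.Net("deploy.prototxt", "snapshot_iter_1000.caffemodel", caffe.TEST)

data_shape = net.blobs["data"].data.shape
net.blobs["data"].data[...] = np.random.randn(*data_shape)  # placeholder input
net.forward()

for name, blob in net.blobs.items():
    act = blob.data
    print("activation %-10s mean=%.6f std=%.6f" % (name, act.mean(), act.std()))

for name, params in net.params.items():
    w = params[0].data
    print("weights    %-10s mean=%.6f std=%.6f" % (name, w.mean(), w.std()))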



mrgloom

Oct 13, 2016, 4:10:41 PM
to DIGITS Users, gloomy...@gmail.com

Here I found some info about this phenomenon:

One dangerous pitfall that can be easily noticed with this visualization is that some activation maps may be all zero for many different inputs, which can indicate dead filters, and can be a symptom of high learning rates.

During training the learning rate is decreasing, but the situation is not changing.

Here is a screenshot from DIGITS; the conv2 layer activations are zero.
In the pool1 layer I use global average pooling.
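
For completeness, this is roughly how the "all zero for many different inputs" check can be done for the conv2 blob with pycaffe; the file names and the random placeholder inputs are assumptions:

# Sketch: count how often each conv2 filter yields an all-zero activation map
# across a set of inputs; filters that are zero on every input are likely dead.
import numpy as np
import caffe

net = caffe.Net("deploy.prototxt", "snapshot_iter_1000.caffemodel", caffe.TEST)
inputs = [np.random.randn(*net.blobs["data"].data.shape[1:])  # replace with real preprocessed images
          for _ in range(32)]

num_filters = net.blobs["conv2"].data.shape[1]
dead_counts = np.zeros(num_filters, dtype=int)

for img in inputs:
    net.blobs["data"].data[0] = img
    net.forward()
    maps = net.blobs["conv2"].data[0]                 # shape: (filters, H, W)
    dead_counts += (np.abs(maps).max(axis=(1, 2)) == 0).astype(int)

print("filters all-zero on every input:", np.where(dead_counts == len(inputs))[0])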


On Wednesday, October 12, 2016 at 3:25:04 PM UTC+3, mrgloom wrote:

Renz Abergos

Oct 13, 2016, 8:33:56 PM
to DIGITS Users, gloomy...@gmail.com
Hi, look at the data shape in each layer in your visualization. Double-check the connections between your layers; I'm guessing conv2 is not properly connected to conv1.
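
A quick way to sanity-check the shapes and the wiring outside the DIGITS visualization, roughly sketched with pycaffe (the prototxt name is a placeholder):

# Sketch: print the shape of every blob and the bottom/top wiring of every
# layer, to verify that conv2 is really fed by the layer you expect.
import caffe
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe.Net("deploy.prototxt", caffe.TEST)
for name, blob in net.blobs.items():
    print("blob  %-10s shape=%s" % (name, tuple(blob.data.shape)))

param = caffe_pb2.NetParameter()
with open("deploy.prototxt") as f:
    text_format.Merge(f.read(), param)
for layer in param.layer:
    print("layer %-10s bottom=%s top=%s" % (layer.name, list(layer.bottom), list(layer.top)))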

mrgloom

Oct 15, 2016, 8:01:41 AM
to DIGITS Users, gloomy...@gmail.com

Seems it's OK; it's a DAG.

I fixed the previous problem by setting a smaller learning rate, but with a 2-layer network I can only get 60% accuracy.

But there is a new problem: with a bigger net (a sequence of 5 CONV->RELU layers) the activations of the last layers are not zero, but the train and test loss don't decrease much during 30 epochs of training. Maybe because of the small learning rate it's stuck in some local minimum?
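
One way to test the too-small-learning-rate guess is a short sweep, roughly like this with pycaffe, assuming an existing solver.prototxt and a loss blob named "loss" (both assumptions):

# Sketch: run a short budget of iterations at several learning rates and compare
# how far the training loss drops, to see whether the LR is simply too small.
import caffe
from caffe.proto import caffe_pb2
from google.protobuf import text_format

base = caffe_pb2.SolverParameter()
with open("solver.prototxt") as f:          # existing solver config (placeholder name)
    text_format.Merge(f.read(), base)

for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    base.base_lr = lr
    with open("solver_sweep.prototxt", "w") as f:
        f.write(text_format.MessageToString(base))
    solver = caffe.get_solver("solver_sweep.prototxt")
    solver.step(200)                                  # short trial per learning rate
    loss = float(solver.net.blobs["loss"].data)       # assumes the loss blob is named "loss"
    print("base_lr=%g -> training loss after 200 iters: %.4f" % (lr, loss))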





On Friday, October 14, 2016 at 3:33:56 AM UTC+3, Renz Abergos wrote: