Fluctuating loss during training


Clint Sebastian

May 31, 2016, 3:39:53 AM
to Caffe Users
Hi,

I am trying to train a network for segmentation on my own dataset, but the loss is large and keeps fluctuating.

The loss ranges between 15k and 30k.

I have tried altering the learning rate, momentum, and other parameters but the problem still persists.

Any ideas or suggestions to fix this would be appreciated.

Please have a look at my solver and train prototxt files.

Thank you.
solver.prototxt
train_test.prototxt

Clint Sebastian

May 31, 2016, 5:09:56 AM
to Caffe Users
I also trained my model with FCN and I am having the same problem of fluctuating loss. It would be nice if someone could give me some insight.

Evan Shelhamer

May 31, 2016, 12:27:47 PM
to Clint Sebastian, Caffe Users
The losses for FCNs can fluctuate a lot since the effective batch size includes all the pixels of an image, and images can differ a lot in (1) the distribution of outputs, such as the classes in semantic segmentation, and (2) the accuracy of the network for any given image. As long as the loss decreases overall across training it is fine.

The loss may look high overall simply because it is the sum of the loss at every pixel; if there are a lot of pixels, it will look like a lot of loss. Compare the metrics for your task over the course of training to ground your sense of what loss values are good or bad for your data.
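
For example, the loss layer can be told to average over pixels instead of summing, which gives a number that is easier to compare across images. A minimal sketch, assuming a SoftmaxWithLoss layer; the layer name and bottoms here are placeholders, not taken from your train_test.prototxt:

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    # VALID averages the loss over all non-ignored pixels, so the
    # reported value no longer scales with the number of pixels.
    normalization: VALID
  }
}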

Hope that helps,

Evan Shelhamer





Martin M

May 31, 2016, 5:15:05 PM
to Caffe Users, clint.s...@gmail.com
I'm trying to train an autoencoder (with both convolution/deconvolution and fully connected layers). During the first few hundred iterations the loss decreases continuously, then it suddenly increases by a huge factor (to much more than at the very beginning), e.g.:

I0531 19:53:05.429440  6605 solver.cpp:228] Iteration 0, loss = 1.21995e+08
I0531 19:53:05.429461  6605 solver.cpp:244]     Train net output #0: cross_entropy_error = 1.36218e+06
I0531 19:53:05.429482  6605 solver.cpp:244]     Train net output #1: l2_error = 1.21995e+08 (* 1 = 1.21995e+08 loss)
I0531 19:53:05.429488  6605 sgd_solver.cpp:106] Iteration 0, lr = 1
I0531 19:53:50.632805  6605 solver.cpp:228] Iteration 100, loss = 1.2425e+06
I0531 19:53:50.632892  6605 solver.cpp:244]     Train net output #0: cross_entropy_error = 139539
I0531 19:53:50.632917  6605 solver.cpp:244]     Train net output #1: l2_error = 1.2425e+06 (* 1 = 1.2425e+06 loss)
I0531 19:53:50.632931  6605 sgd_solver.cpp:106] Iteration 100, lr = 1
I0531 19:54:37.757134  6605 solver.cpp:228] Iteration 200, loss = 233545
I0531 19:54:37.757241  6605 solver.cpp:244]     Train net output #0: cross_entropy_error = 67159
I0531 19:54:37.757263  6605 solver.cpp:244]     Train net output #1: l2_error = 233549 (* 1 = 233549 loss)
I0531 19:54:37.757268  6605 sgd_solver.cpp:106] Iteration 200, lr = 1
I0531 19:55:24.640571  6605 solver.cpp:228] Iteration 300, loss = 43443
I0531 19:55:24.640671  6605 solver.cpp:244]     Train net output #0: cross_entropy_error = 38911.9
I0531 19:55:24.640691  6605 solver.cpp:244]     Train net output #1: l2_error = 43447 (* 1 = 43447 loss)
I0531 19:55:24.640696  6605 sgd_solver.cpp:106] Iteration 300, lr = 1
I0531 19:56:11.049823  6605 solver.cpp:228] Iteration 400, loss = 8476.57
I0531 19:56:11.049924  6605 solver.cpp:244]     Train net output #0: cross_entropy_error = 31773.4
I0531 19:56:11.049945  6605 solver.cpp:244]     Train net output #1: l2_error = 8480.57 (* 1 = 8480.57 loss)
I0531 19:56:11.049949  6605 sgd_solver.cpp:106] Iteration 400, lr = 1
I0531 19:56:56.804311  6605 solver.cpp:228] Iteration 500, loss = 3066.52
I0531 19:56:56.804388  6605 solver.cpp:244]     Train net output #0: cross_entropy_error = 30970.7
I0531 19:56:56.804397  6605 solver.cpp:244]     Train net output #1: l2_error = 3070.52 (* 1 = 3070.52 loss)
I0531 19:56:56.804414  6605 sgd_solver.cpp:106] Iteration 500, lr = 1
I0531 19:57:34.892603  6605 solver.cpp:228] Iteration 600, loss = 1.94462e+21
I0531 19:57:34.892720  6605 solver.cpp:244]     Train net output #0: cross_entropy_error = 4.46449e+12
I0531 19:57:34.892741  6605 solver.cpp:244]     Train net output #1: l2_error = 1.94417e+21 (* 1 = 1.94417e+21 loss)

After that, the loss more or less fluctuates around this high value. The training data is shuffled, and I have varied the batch size and used the AdaDelta optimizer. Any ideas where this effect comes from and how to avoid it?
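
For reference, here is a purely illustrative sketch of the kind of solver settings I could experiment with next, e.g. capping the gradient norm with clip_gradients; the net path and all values are placeholders rather than my actual configuration:

net: "autoencoder_train.prototxt"   # placeholder path
type: "AdaDelta"
delta: 1e-6
base_lr: 1.0
lr_policy: "fixed"
# Clip the global gradient norm so that a single bad batch cannot
# push the weights into a divergent region.
clip_gradients: 10
display: 100
max_iter: 10000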

Martin M

Clint Sebastian

Jun 1, 2016, 3:54:30 AM
to Caffe Users, clint.s...@gmail.com
Thanks, Evan. That clears up a lot of things. 

