I'm trying to train an autoencoder (with both convolution/deconvolution and fully connected layers). During the first few hundred iterations the loss decreases steadily, but then it suddenly increases by a huge factor (ending up much larger than at the very beginning), e.g.
I0531 19:53:05.429440 6605 solver.cpp:228] Iteration 0, loss = 1.21995e+08
I0531 19:53:05.429461 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 1.36218e+06
I0531 19:53:05.429482 6605 solver.cpp:244] Train net output #1: l2_error = 1.21995e+08 (* 1 = 1.21995e+08 loss)
I0531 19:53:05.429488 6605 sgd_solver.cpp:106] Iteration 0, lr = 1
I0531 19:53:50.632805 6605 solver.cpp:228] Iteration 100, loss = 1.2425e+06
I0531 19:53:50.632892 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 139539
I0531 19:53:50.632917 6605 solver.cpp:244] Train net output #1: l2_error = 1.2425e+06 (* 1 = 1.2425e+06 loss)
I0531 19:53:50.632931 6605 sgd_solver.cpp:106] Iteration 100, lr = 1
I0531 19:54:37.757134 6605 solver.cpp:228] Iteration 200, loss = 233545
I0531 19:54:37.757241 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 67159
I0531 19:54:37.757263 6605 solver.cpp:244] Train net output #1: l2_error = 233549 (* 1 = 233549 loss)
I0531 19:54:37.757268 6605 sgd_solver.cpp:106] Iteration 200, lr = 1
I0531 19:55:24.640571 6605 solver.cpp:228] Iteration 300, loss = 43443
I0531 19:55:24.640671 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 38911.9
I0531 19:55:24.640691 6605 solver.cpp:244] Train net output #1: l2_error = 43447 (* 1 = 43447 loss)
I0531 19:55:24.640696 6605 sgd_solver.cpp:106] Iteration 300, lr = 1
I0531 19:56:11.049823 6605 solver.cpp:228] Iteration 400, loss = 8476.57
I0531 19:56:11.049924 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 31773.4
I0531 19:56:11.049945 6605 solver.cpp:244] Train net output #1: l2_error = 8480.57 (* 1 = 8480.57 loss)
I0531 19:56:11.049949 6605 sgd_solver.cpp:106] Iteration 400, lr = 1
I0531 19:56:56.804311 6605 solver.cpp:228] Iteration 500, loss = 3066.52
I0531 19:56:56.804388 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 30970.7
I0531 19:56:56.804397 6605 solver.cpp:244] Train net output #1: l2_error = 3070.52 (* 1 = 3070.52 loss)
I0531 19:56:56.804414 6605 sgd_solver.cpp:106] Iteration 500, lr = 1
I0531 19:57:34.892603 6605 solver.cpp:228] Iteration 600, loss = 1.94462e+21
I0531 19:57:34.892720 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 4.46449e+12
I0531 19:57:34.892741 6605 solver.cpp:244] Train net output #1: l2_error = 1.94417e+21 (* 1 = 1.94417e+21 loss)
After that, the loss more or less fluctuates around that high value. The training data was shuffled, I have varied the batch size, and I am using the AdaDelta optimizer. Any ideas where this effect comes from and how to avoid it?
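For completeness, the relevant part of my solver.prototxt looks roughly like this (a sketch: the solver type, base_lr, and display interval match the log above; the net file name, momentum, and delta are placeholder values, not necessarily what I actually have):

net: "autoencoder_train.prototxt"  # placeholder file name
type: "AdaDelta"
base_lr: 1.0           # AdaDelta scale factor; appears as "lr = 1" in the log
lr_policy: "fixed"
momentum: 0.95         # AdaDelta decay rate (placeholder value)
delta: 1e-6            # numerical stability term (placeholder value)
display: 100           # matches the logging interval above
# clip_gradients: 10   # not set; would gradient clipping help here?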
Martin M