I'm trying to train an autoencoder (with both convolution/deconvolution and fully connected layers). During the first few hundred iterations the loss decreases steadily, but then it suddenly increases by a huge factor (ending up much larger than at the very beginning), e.g.
I0531 19:53:05.429440 6605 solver.cpp:228] Iteration 0, loss = 1.21995e+08
I0531 19:53:05.429461 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 1.36218e+06
I0531 19:53:05.429482 6605 solver.cpp:244] Train net output #1: l2_error = 1.21995e+08 (* 1 = 1.21995e+08 loss)
I0531 19:53:05.429488 6605 sgd_solver.cpp:106] Iteration 0, lr = 1
I0531 19:53:50.632805 6605 solver.cpp:228] Iteration 100, loss = 1.2425e+06
I0531 19:53:50.632892 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 139539
I0531 19:53:50.632917 6605 solver.cpp:244] Train net output #1: l2_error = 1.2425e+06 (* 1 = 1.2425e+06 loss)
I0531 19:53:50.632931 6605 sgd_solver.cpp:106] Iteration 100, lr = 1
I0531 19:54:37.757134 6605 solver.cpp:228] Iteration 200, loss = 233545
I0531 19:54:37.757241 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 67159
I0531 19:54:37.757263 6605 solver.cpp:244] Train net output #1: l2_error = 233549 (* 1 = 233549 loss)
I0531 19:54:37.757268 6605 sgd_solver.cpp:106] Iteration 200, lr = 1
I0531 19:55:24.640571 6605 solver.cpp:228] Iteration 300, loss = 43443
I0531 19:55:24.640671 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 38911.9
I0531 19:55:24.640691 6605 solver.cpp:244] Train net output #1: l2_error = 43447 (* 1 = 43447 loss)
I0531 19:55:24.640696 6605 sgd_solver.cpp:106] Iteration 300, lr = 1
I0531 19:56:11.049823 6605 solver.cpp:228] Iteration 400, loss = 8476.57
I0531 19:56:11.049924 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 31773.4
I0531 19:56:11.049945 6605 solver.cpp:244] Train net output #1: l2_error = 8480.57 (* 1 = 8480.57 loss)
I0531 19:56:11.049949 6605 sgd_solver.cpp:106] Iteration 400, lr = 1
I0531 19:56:56.804311 6605 solver.cpp:228] Iteration 500, loss = 3066.52
I0531 19:56:56.804388 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 30970.7
I0531 19:56:56.804397 6605 solver.cpp:244] Train net output #1: l2_error = 3070.52 (* 1 = 3070.52 loss)
I0531 19:56:56.804414 6605 sgd_solver.cpp:106] Iteration 500, lr = 1
I0531 19:57:34.892603 6605 solver.cpp:228] Iteration 600, loss = 1.94462e+21
I0531 19:57:34.892720 6605 solver.cpp:244] Train net output #0: cross_entropy_error = 4.46449e+12
I0531 19:57:34.892741 6605 solver.cpp:244] Train net output #1: l2_error = 1.94417e+21 (* 1 = 1.94417e+21 loss)
After that, the loss more or less fluctuates around that high value. The training data was shuffled, I have varied the batch size, and I am using the AdaDelta optimizer. Any ideas where this effect comes from and how to avoid it?
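For completeness, the relevant part of my solver.prototxt looks roughly like this (a sketch: the solver type, base_lr, and display interval match the log above; the net file name, momentum, and delta are placeholder values, not necessarily what I actually have):

net: "autoencoder_train.prototxt"  # placeholder file name
type: "AdaDelta"
base_lr: 1.0           # AdaDelta scale factor; appears as "lr = 1" in the log
lr_policy: "fixed"
momentum: 0.95         # AdaDelta decay rate (placeholder value)
delta: 1e-6            # numerical stability term (placeholder value)
display: 100           # matches the logging interval above
# clip_gradients: 10   # not set; would gradient clipping help here?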
Martin M