loss suddenly jumps while fine-tuning GoogleNet

dpvo

May 15, 2016, 3:49:43 PM5/15/16
to Caffe Users
I have a classification problem with 280 classes and ~278,000 images.
I am fine-tuning the GoogleNet model (bvlc_googlenet in Caffe) using quick_solver.txt.
My solver is as follows:

test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.001
lr_policy: "poly"
power: 0.5
max_iter: 800000
momentum: 0.9
weight_decay: 0.0002
snapshot: 20000
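
For reference, the "poly" policy decays the learning rate as lr = base_lr * (1 - iter/max_iter)^power. A small Python check with the values from my solver reproduces the lr values printed in the logs below:

base_lr, power, max_iter = 0.001, 0.5, 800000

def poly_lr(it):
    # Caffe's "poly" schedule: base_lr * (1 - iter/max_iter)^power
    return base_lr * (1.0 - float(it) / max_iter) ** power

print(poly_lr(40))      # ~0.000999975, as printed at iteration 40
print(poly_lr(119000))  # ~0.000922632, as printed at iteration 119000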

During training I use a batch size of 32, and a test batch size of 32 as well. I relearn three layers (loss1/classifier, loss2/classifier and loss3/classifier) from scratch by renaming them so that their weights are not copied from the pretrained model.
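
Since Caffe copies pretrained weights by layer name, only the renamed layers should start from random initialization. To double-check which layers that affects, here is a rough sketch comparing my prototxt against the original one (my file path is a placeholder):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

def layer_names(prototxt):
    # Parse a .prototxt and return the set of layer names it defines.
    net = caffe_pb2.NetParameter()
    with open(prototxt) as f:
        text_format.Merge(f.read(), net)
    return set(layer.name for layer in net.layer)

orig = layer_names('models/bvlc_googlenet/train_val.prototxt')
mine = layer_names('my_googlenet_train_val.prototxt')  # placeholder path

print('renamed/new layers (randomly initialized):', sorted(mine - orig))
print('layers that keep the pretrained names:', sorted(mine & orig))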

Logfile of the very first iterations:
I0515 08:44:35.029089  1279 solver.cpp:279] Solving GoogleNet
I0515 08:44:35.029093  1279 solver.cpp:280] Learning Rate Policy: poly
I0515 08:44:35.131124  1279 solver.cpp:228] Iteration 0, loss = 13.0141
I0515 08:44:35.131161  1279 solver.cpp:244]     Train net output #0: loss1/loss1 = 10.2031 (* 0.3 = 3.06092 loss)
I0515 08:44:35.131167  1279 solver.cpp:244]     Train net output #1: loss2/loss1 = 10.9583 (* 0.3 = 3.28748 loss)
I0515 08:44:35.131171  1279 solver.cpp:244]     Train net output #2: loss3/loss3 = 6.66573 (* 1 = 6.66573 loss)
I0515 08:44:35.131183  1279 sgd_solver.cpp:106] Iteration 0, lr = 0.001
I0515 08:44:41.838122  1279 solver.cpp:228] Iteration 40, loss = 9.72169
I0515 08:44:41.838163  1279 solver.cpp:244]     Train net output #0: loss1/loss1 = 5.7261 (* 0.3 = 1.71783 loss)
I0515 08:44:41.838170  1279 solver.cpp:244]     Train net output #1: loss2/loss1 = 5.65961 (* 0.3 = 1.69788 loss)
I0515 08:44:41.838173  1279 solver.cpp:244]     Train net output #2: loss3/loss3 = 5.46685 (* 1 = 5.46685 loss)
I0515 08:44:41.838179  1279 sgd_solver.cpp:106] Iteration 40, lr = 0.000999975


By the 100,000th iteration, my net reaches ~50% top-1 and ~80% top-5 accuracy:
I0515 13:45:59.789113  1279 solver.cpp:337] Iteration 100000, Testing net (#0)
I0515 13:46:53.914217  1279 solver.cpp:404]     Test net output #0: loss1/loss1 = 2.08631 (* 0.3 = 0.625893 loss)
I0515 13:46:53.914274  1279 solver.cpp:404]     Test net output #1: loss1/top-1 = 0.458375
I0515 13:46:53.914279  1279 solver.cpp:404]     Test net output #2: loss1/top-5 = 0.768781
I0515 13:46:53.914284  1279 solver.cpp:404]     Test net output #3: loss2/loss1 = 1.88489 (* 0.3 = 0.565468 loss)
I0515 13:46:53.914288  1279 solver.cpp:404]     Test net output #4: loss2/top-1 = 0.494906
I0515 13:46:53.914290  1279 solver.cpp:404]     Test net output #5: loss2/top-5 = 0.805906
I0515 13:46:53.914294  1279 solver.cpp:404]     Test net output #6: loss3/loss3 = 1.77118 (* 1 = 1.77118 loss)
I0515 13:46:53.914297  1279 solver.cpp:404]     Test net output #7: loss3/top-1 = 0.517719
I0515 13:46:53.914299  1279 solver.cpp:404]     Test net output #8: loss3/top-5 = 0.827125


At the 119,000th iteration everything is still normal:
I0515 14:43:38.669674  1279 solver.cpp:228] Iteration 119000, loss = 2.70265
I0515 14:43:38.669777  1279 solver.cpp:244]     Train net output #0: loss1/loss1 = 2.41406 (* 0.3 = 0.724217 loss)
I0515 14:43:38.669783  1279 solver.cpp:244]     Train net output #1: loss2/loss1 = 2.38374 (* 0.3 = 0.715123 loss)
I0515 14:43:38.669787  1279 solver.cpp:244]     Train net output #2: loss3/loss3 = 1.92663 (* 1 = 1.92663 loss)
I0515 14:43:38.669798  1279 sgd_solver.cpp:106] Iteration 119000, lr = 0.000922632


Right after that, the loss suddenly jumps back to roughly its initial value (from 8 to 9):
I0515 14:43:45.377710  1279 solver.cpp:228] Iteration 119040, loss = 8.3068
I0515 14:43:45.377751  1279 solver.cpp:244]     Train net output #0: loss1/loss1 = 5.77026 (* 0.3 = 1.73108 loss)
I0515 14:43:45.377758  1279 solver.cpp:244]     Train net output #1: loss2/loss1 = 5.76971 (* 0.3 = 1.73091 loss)
I0515 14:43:45.377763  1279 solver.cpp:244]     Train net output #2: loss3/loss3 = 5.70022 (* 1 = 5.70022 loss)
I0515 14:43:45.377768  1279 sgd_solver.cpp:106] Iteration 119040, lr = 0.000922605
I0515 14:43:52.083770  1279 solver.cpp:228] Iteration 119080, loss = 9.07503
I0515 14:43:52.083809  1279 solver.cpp:244]     Train net output #0: loss1/loss1 = 5.6852 (* 0.3 = 1.70556 loss)
I0515 14:43:52.083816  1279 solver.cpp:244]     Train net output #1: loss2/loss1 = 5.6172 (* 0.3 = 1.68516 loss)
I0515 14:43:52.083819  1279 solver.cpp:244]     Train net output #2: loss3/loss3 = 5.6686 (* 1 = 5.6686 loss)
I0515 14:43:52.083825  1279 sgd_solver.cpp:106] Iteration 119080, lr = 0.000922578


And the net cannot reduce that loss even long after the sudden jump happened:
I0515 16:51:10.485610  1279 solver.cpp:228] Iteration 161040, loss = 9.01994
I0515 16:51:10.485649  1279 solver.cpp:244]     Train net output #0: loss1/loss1 = 5.63485 (* 0.3 = 1.69046 loss)
I0515 16:51:10.485656  1279 solver.cpp:244]     Train net output #1: loss2/loss1 = 5.63484 (* 0.3 = 1.69045 loss)
I0515 16:51:10.485661  1279 solver.cpp:244]     Train net output #2: loss3/loss3 = 5.62972 (* 1 = 5.62972 loss)
I0515 16:51:10.485666  1279 sgd_solver.cpp:106] Iteration 161040, lr = 0.0008937
I0515 16:51:17.204901  1279 solver.cpp:228] Iteration 161080, loss = 9.01589
I0515 16:51:17.204941  1279 solver.cpp:244]     Train net output #0: loss1/loss1 = 5.61281 (* 0.3 = 1.68384 loss)
I0515 16:51:17.204946  1279 solver.cpp:244]     Train net output #1: loss2/loss1 = 5.6128 (* 0.3 = 1.68384 loss)
I0515 16:51:17.204951  1279 solver.cpp:244]     Train net output #2: loss3/loss3 = 5.61579 (* 1 = 5.61579 loss)
I0515 16:51:17.204964  1279 sgd_solver.cpp:106] Iteration 161080, lr = 0.000893672
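
To locate the jump precisely I pull the smoothed training loss (average_loss: 40) out of the solver log with a quick script; 'train.log' is just a placeholder for wherever the solver output was redirected:

import re

pattern = re.compile(r'Iteration (\d+), loss = ([\d.]+)')
points = []
with open('train.log') as f:  # placeholder for the redirected solver output
    for line in f:
        m = pattern.search(line)
        if m:
            points.append((int(m.group(1)), float(m.group(2))))

# Report the first display interval where the smoothed loss more than doubles.
for (it0, l0), (it1, l1) in zip(points, points[1:]):
    if l1 > 2 * l0:
        print('loss jumps between iteration %d (%.2f) and %d (%.2f)' % (it0, l0, it1, l1))
        break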


I reran the experiment twice and the jump happens at exactly the same iteration, 119040. For further information: I shuffled the data when creating the LMDB database, and I used the same database to train a VGG-16 (step learning rate policy, max 80k iterations, 20k iterations per step) without any problem. With VGG I obtained 55% top-1 accuracy.
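
One thing I can still check: as far as I know the standard Data layer reads the LMDB sequentially during training (the only shuffling is the one done when building the database), so the batch fed at iteration 119040 should always contain the same records. A rough sketch of which record indices that batch would cover, assuming purely sequential reads and my approximate dataset size:

batch_size = 32
num_images = 278000  # approximate; the real LMDB entry count should be used
it = 119040

start = (it * batch_size) % num_images
indices = [(start + i) % num_images for i in range(batch_size)]
print('record indices fed at iteration %d: %d .. %d' % (it, indices[0], indices[-1]))

Inspecting those records directly would at least rule out a corrupt batch.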

Has anybody met a similar problem?

Norman he

Jun 10, 2016, 12:58:20 PM6/10/16
to Caffe Users
It is overfitting after the 119040th iteration.