Understanding base_lr in solver.prototxt


ranju mandal

Nov 11, 2017, 11:46:43 AM11/11/17
to Caffe Users

Hi Everyone,
I just want to train my network with base_lr = 0.01 in solver.prototxt, as recommended. But the network's training loss suddenly goes up to loss = 87.3365 and never comes down.

I1112 02:23:49.529306 172471 caffe.cpp:248] Starting Optimization
I1112 02:23:49.529321 172471 solver.cpp:272] Solving GoogleNet
I1112 02:23:49.529326 172471 solver.cpp:273] Learning Rate Policy: step
I1112 02:23:49.533668 172471 solver.cpp:330] Iteration 0, Testing net (#0)
I1112 02:24:43.198619 172478 data_layer.cpp:73] Restarting data prefetching from start.
I1112 02:25:29.000083 172471 solver.cpp:397] Test net output #0: loss1/top-1 = 0.293104
I1112 02:25:29.315837 172471 solver.cpp:218] Iteration 0 (-1.4013e-45 iter/s, 99.7842s/100 iters), loss = 11.2306
I1112 02:25:29.315917 172471 sgd_solver.cpp:105] Iteration 0, lr = 0.01
I1112 02:26:01.957139 172471 solver.cpp:218] Iteration 100 (3.06368 iter/s, 32.6405s/100 iters), loss = 87.3365
I1112 02:26:01.957279 172471 sgd_solver.cpp:105] Iteration 100, lr = 0.01
I1112 02:26:34.587430 172471 solver.cpp:218] Iteration 200 (3.06472 iter/s, 32.6294s/100 iters), loss = 87.3365
I1112 02:26:34.587604 172471 sgd_solver.cpp:105] Iteration 200, lr = 0.01
I1112 02:27:07.223541 172471 solver.cpp:218] Iteration 300 (3.06418 iter/s, 32.6352s/100 iters), loss = 87.3365
I1112 02:27:07.223726 172471 sgd_solver.cpp:105] Iteration 300, lr = 0.01
I1112 02:27:39.855602 172471 solver.cpp:218] Iteration 400 (3.06456 iter/s, 32.6311s/100 iters), loss = 87.3365
I1112 02:27:39.855777 172471 sgd_solver.cpp:105] Iteration 400, lr = 0.01
I1112 02:28:12.496170 172471 solver.cpp:218] Iteration 500 (3.06376 iter/s, 32.6396s/100 iters), loss = 87.3365
I1112 02:28:12.496341 172471 sgd_solver.cpp:105] Iteration 500, lr = 0.01
I1112 02:28:45.131556 172471 solver.cpp:218] Iteration 600 (3.06424 iter/s, 32.6345s/100 iters), loss = 87.3365
I1112 02:28:45.131744 172471 sgd_solver.cpp:105] Iteration 600, lr = 0.01
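
For reference, the stuck value 87.3365 is a telltale constant: it equals -ln(FLT_MIN) for 32-bit floats. Caffe's SoftmaxWithLoss clamps the predicted probability at FLT_MIN before taking the log, so a loss pinned at 87.3365 means the probability of the true class has underflowed to zero, i.e. the net has diverged rather than merely stalled. A quick check in plain Python:

import math

# Smallest positive normalized float32 (FLT_MIN in <cfloat>).
FLT_MIN = 1.17549435e-38

# SoftmaxWithLoss reports -log(max(p, FLT_MIN)); once p underflows
# to FLT_MIN, the loss saturates at this constant.
print(-math.log(FLT_MIN))  # prints ~87.3365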

However, if I set base_lr = 0.0001, the network behaves properly and the training loss decreases:

I1112 02:14:23.399960 172365 net.cpp:255] Network initialization done.
I1112 02:14:23.400444 172365 solver.cpp:56] Solver scaffolding done.
I1112 02:14:23.402956 172365 caffe.cpp:248] Starting Optimization
I1112 02:14:23.402972 172365 solver.cpp:272] Solving GoogleNet
I1112 02:14:23.402979 172365 solver.cpp:273] Learning Rate Policy: step
I1112 02:14:23.407411 172365 solver.cpp:330] Iteration 0, Testing net (#0)
I1112 02:15:16.486804 172372 data_layer.cpp:73] Restarting data prefetching from start.
I1112 02:16:01.746248 172365 solver.cpp:397] Test net output #0: loss1/top-1 = 0.705125
I1112 02:16:02.059989 172365 solver.cpp:218] Iteration 0 (-1.03629e-34 iter/s, 98.6552s/100 iters), loss = 10.0794
I1112 02:16:02.060058 172365 sgd_solver.cpp:105] Iteration 0, lr = 0.0001
I1112 02:16:34.620509 172365 solver.cpp:218] Iteration 100 (3.07127 iter/s, 32.5598s/100 iters), loss = 0.805383
I1112 02:16:34.620595 172365 sgd_solver.cpp:105] Iteration 100, lr = 0.0001
I1112 02:17:07.216924 172365 solver.cpp:218] Iteration 200 (3.0679 iter/s, 32.5956s/100 iters), loss = 0.661281
I1112 02:17:07.217043 172365 sgd_solver.cpp:105] Iteration 200, lr = 0.0001
I1112 02:17:39.866210 172365 solver.cpp:218] Iteration 300 (3.06293 iter/s, 32.6484s/100 iters), loss = 0.569396
I1112 02:17:39.866411 172365 sgd_solver.cpp:105] Iteration 300, lr = 0.0001
I1112 02:18:12.551064 172365 solver.cpp:218] Iteration 400 (3.05961 iter/s, 32.6839s/100 iters), loss = 0.556595
I1112 02:18:12.551189 172365 sgd_solver.cpp:105] Iteration 400, lr = 0.0001
I1112 02:18:45.240438 172365 solver.cpp:218] Iteration 500 (3.05918 iter/s, 32.6885s/100 iters), loss = 0.666869
I1112 02:18:45.240638 172365 sgd_solver.cpp:105] Iteration 500, lr = 0.0001
I1112 02:19:17.942977 172365 solver.cpp:218] Iteration 600 (3.05795 iter/s, 32.7016s/100 iters), loss = 0.591102
I1112 02:19:17.943089 172365 sgd_solver.cpp:105] Iteration 600, lr = 0.0001
I1112 02:19:50.637046 172365 solver.cpp:218] Iteration 700 (3.05874 iter/s, 32.6932s/100 iters), loss = 0.702966
I1112 02:19:50.637214 172365 sgd_solver.cpp:105] Iteration 700, lr = 0.0001
I1112 02:20:23.344806 172365 solver.cpp:218] Iteration 800 (3.05746 iter/s, 32.7068s/100 iters), loss = 0.712518
I1112 02:20:23.344913 172365 sgd_solver.cpp:105] Iteration 800, lr = 0.0001
I1112 02:20:56.043337 172365 solver.cpp:218] Iteration 900 (3.05832 iter/s, 32.6977s/100 iters), loss = 0.664418
I1112 02:20:56.043498 172365 sgd_solver.cpp:105] Iteration 900, lr = 0.0001

----------------------------------------------------------------

I just want to know why base_lr = 0.01 does not work for my network. Could anybody explain? Thanks in advance.

H Zhao

Nov 11, 2017, 7:48:23 PM11/11/17
to Caffe Users

Look at the picture. If the base_lr is too large, your network may fail to converge.
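
To see why, here is a minimal sketch (a toy example, not Caffe code) of plain SGD on the quadratic loss f(w) = 0.5*a*w**2. The update is w <- (1 - lr*a)*w, so the iterates blow up as soon as lr > 2/a; a deep net behaves the same way along its sharpest curvature directions:

a = 10.0  # curvature of the toy loss; the divergence threshold is 2/a = 0.2
for lr in (0.01, 0.3):
    w = 1.0
    for _ in range(20):
        w -= lr * a * w  # gradient of 0.5*a*w**2 is a*w
    print("lr=%g -> w after 20 steps: %.3e" % (lr, w))
# lr=0.01 shrinks w toward 0; lr=0.3 explodes to ~1e6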



On Sunday, November 12, 2017 at 12:46:43 AM UTC+8, ranju mandal wrote:

ranju mandal

Nov 12, 2017, 2:59:46 AM11/12/17
to Caffe Users
Many thanks for your reply. However, do you think it is OK to keep base_lr = 0.0001?

H Zhao

Nov 12, 2017, 4:14:12 AM11/12/17
to Caffe Users
You can decrease the base_lr from large to small to find a proper learning rate. A proper learning rate should make the network converge quickly. If your network converges too slowly, you can set a larger learning rate.
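
If you want to automate that sweep, here is a rough pycaffe sketch (untested; the solver path 'solver.prototxt' and the loss blob name 'loss1/loss1' are assumptions to adjust for your setup, and note that every trial re-initializes the weights):

import caffe
from caffe.proto import caffe_pb2
from google.protobuf import text_format

def probe_lr(base_lr, solver_file='solver.prototxt'):
    # Parse the solver prototxt and override base_lr in a temp copy.
    param = caffe_pb2.SolverParameter()
    with open(solver_file) as f:
        text_format.Merge(f.read(), param)
    param.base_lr = base_lr
    with open('_probe_solver.prototxt', 'w') as f:
        f.write(text_format.MessageToString(param))
    # A short burst of iterations is enough to tell a falling loss
    # from one pinned at 87.3365 (divergence).
    solver = caffe.SGDSolver('_probe_solver.prototxt')
    solver.step(100)
    return float(solver.net.blobs['loss1/loss1'].data)  # blob name: adjust to your net

for lr in (0.01, 0.003, 0.001, 0.0003, 0.0001):  # from large to small
    print(lr, probe_lr(lr))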

On Sunday, November 12, 2017 at 3:59:46 PM UTC+8, ranju mandal wrote: