Problem, training googlenet


Andreas Kölsch

Jun 3, 2016, 6:19:29 AM
to Caffe Users
Hi guys,
I am training GoogLeNet on ImageNet right now using the solver.prototxt provided on GitHub. The training is not done yet, but it is foreseeable that it won't reach the recognition rates of the bundled model. This is what it looks like right now: googlenet.png.
Iteration 7072000, top-1 accuracy: 0.55794

Looking at the graph I don't think I will get anywhere near the reported 68% by iteration 10M.

Therefore, I decided to train using the quick_solver.prototxt, but it did not learn a thing. At iteration 12k, accuracy was still at random-guessing level (0.001). Should I train longer to get the first results?

Any ideas, why I get such poor accuracies?

Thanks in advance,
Andreas

PS: using caffe-1.0rc3
googlenet.png

Andreas Kölsch

Jun 6, 2016, 9:39:42 AM
to Caffe Users
Has anyone ever trained googlenet with the provided solver.prototxt and could show me the training history?

Norman he

Jun 28, 2016, 10:38:02 PM
to Caffe Users

It seems you are getting into the plateau, and you are getting there pretty slowly, too.

If you are using batch size 64, you should be able to get to the plateau with lr = 0.01, then reduce lr by half.

Norman he

Jun 28, 2016, 11:15:12 PM
to Caffe Users
Try lr = 0.001 if you use batch size 64.

Norman he

Jun 29, 2016, 12:18:21 AM
to Caffe Users
I got exactly what you show in the graph and got stuck around the plateau. After that, I switched to lr = 0.001, gamma = 0.96, stepsize = 20k. I am still trying to hit 68.6%. :)
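For reference, Caffe's "step" lr_policy computes the rate as base_lr * gamma^floor(iter / stepsize). A minimal sketch of what the schedule above works out to:

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Caffe 'step' policy: lr = base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# With the values from this post: base_lr = 0.001, gamma = 0.96, stepsize = 20000
for it in (0, 20000, 28000, 40000, 100000):
    print(it, step_lr(0.001, 0.96, 20000, it))
```

At iteration 28000 this gives 0.00096, consistent with the `lr = 0.00096` line in the solver log quoted in this thread.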

Norman he

Jun 29, 2016, 2:13:12 PM
to Caffe Users
Accuracy keeps on climbing :)

I0629 09:34:19.202311 92674 solver.cpp:338] Iteration 28000, Testing net (#0)
I0629 09:40:04.110777 92674 solver.cpp:406]     Test net output #0: loss1/loss1 = 2.15742 (* 0.3 = 0.647227 loss)
I0629 09:40:04.110847 92674 solver.cpp:406]     Test net output #1: loss1/top-1 = 0.50298
I0629 09:40:04.110853 92674 solver.cpp:406]     Test net output #2: loss1/top-5 = 0.75754
I0629 09:40:04.110860 92674 solver.cpp:406]     Test net output #3: loss2/loss1 = 1.91715 (* 0.3 = 0.575144 loss)
I0629 09:40:04.110865 92674 solver.cpp:406]     Test net output #4: loss2/top-1 = 0.54536
I0629 09:40:04.110870 92674 solver.cpp:406]     Test net output #5: loss2/top-5 = 0.795441
I0629 09:40:04.110877 92674 solver.cpp:406]     Test net output #6: loss3/loss3 = 1.70343 (* 1 = 1.70343 loss)
I0629 09:40:04.110910 92674 solver.cpp:406]     Test net output #7: loss3/top-1 = 0.59296
I0629 09:40:04.110915 92674 solver.cpp:406]     Test net output #8: loss3/top-5 = 0.826301
I0629 09:40:05.214720 92674 solver.cpp:229] Iteration 28000, loss = 2.44163
I0629 09:40:05.214779 92674 solver.cpp:245]     Train net output #0: loss1/loss1 = 2.32132 (* 0.3 = 0.696397 loss)
I0629 09:40:05.214787 92674 solver.cpp:245]     Train net output #1: loss2/loss1 = 1.82686 (* 0.3 = 0.548058 loss)
I0629 09:40:05.214792 92674 solver.cpp:245]     Train net output #2: loss3/loss3 = 1.19718 (* 1 = 1.19718 loss)
I0629 09:40:05.214800 92674 sgd_solver.cpp:106] Iteration 28000, lr = 0.00096

Andreas Kölsch

Jun 29, 2016, 2:53:18 PM
to Caffe Users
Hi Norman,
thanks for your reply. After my initial post, I realized that the bundled model is trained using the quick_solver, so after the 10M iterations of the solver I tried the quick_solver again, and it started to learn. See the intermediate results in the attached graph. Again, I doubt it will reach the accuracy of the bundled model. Of course, I could experiment with different learning rates and such, but I wonder why they added prototxts that do not reproduce the results of the bundled model. Was the bundled model perhaps trained with a different prototxt than the one provided? Or could it be that the new Caffe version makes it worse?
I would like to play with a few parameters of the network, and for comparison it would be nice if I could run them without manually changing the learning rate during training. So first, I want to make sure I can reproduce the results of GoogLeNet. I am surprised no one has stumbled across this before... Should I try a different version of Caffe, or do you know whether the bundled model was created using a different prototxt?

Cheers
googlenet.png
googlenet_quick.png

Norman he

Jun 29, 2016, 4:46:01 PM
to Caffe Users
Other people's published results are all at around 60 epochs; we only trained for around 10 epochs (1 epoch = 1.281167 million images; how many iterations that is depends on your batch size). I am trying different adaptive learning methods to speed it up now. Hopefully we can achieve 68.6% accuracy in far shorter time. Increase momentum to 0.94, lr = 0.0005. :)

I0629 13:06:23.347209 22223 solver.cpp:338] Iteration 4000, Testing net (#0)
I0629 13:12:07.014796 22223 solver.cpp:406]     Test net output #0: loss1/loss1 = 2.18214 (* 0.3 = 0.654642 loss)
I0629 13:12:07.014844 22223 solver.cpp:406]     Test net output #1: loss1/top-1 = 0.49964
I0629 13:12:07.014852 22223 solver.cpp:406]     Test net output #2: loss1/top-5 = 0.75406
I0629 13:12:07.014858 22223 solver.cpp:406]     Test net output #3: loss2/loss1 = 1.89004 (* 0.3 = 0.567011 loss)
I0629 13:12:07.014868 22223 solver.cpp:406]     Test net output #4: loss2/top-1 = 0.54942
I0629 13:12:07.014873 22223 solver.cpp:406]     Test net output #5: loss2/top-5 = 0.798261
I0629 13:12:07.014878 22223 solver.cpp:406]     Test net output #6: loss3/loss3 = 1.62255 (* 1 = 1.62255 loss)
I0629 13:12:07.014883 22223 solver.cpp:406]     Test net output #7: loss3/top-1 = 0.60774
I0629 13:12:07.014889 22223 solver.cpp:406]     Test net output #8: loss3/top-5 = 0.837381
I0629 13:12:08.108912 22223 solver.cpp:229] Iteration 4000, loss = 2.92022
I0629 13:12:08.108964 22223 solver.cpp:245]     Train net output #0: loss1/loss1 = 2.39938 (* 0.3 = 0.719813 loss)
I0629 13:12:08.108973 22223 solver.cpp:245]     Train net output #1: loss2/loss1 = 2.29997 (* 0.3 = 0.689991 loss)
I0629 13:12:08.108980 22223 solver.cpp:245]     Train net output #2: loss3/loss3 = 1.51042 (* 1 = 1.51042 loss)
I0629 13:12:08.108989 22223 sgd_solver.cpp:106] Iteration 4000, lr = 0.0005
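The epoch arithmetic above can be made concrete. Assuming the ILSVRC-2012 training-set size of 1,281,167 images stated in the post, the iterations per epoch for a given batch size are just:

```python
import math

ILSVRC12_TRAIN_IMAGES = 1281167  # "1 epoc = 1.281167 million images"

def iters_per_epoch(batch_size, images=ILSVRC12_TRAIN_IMAGES):
    """Solver iterations needed to see every training image once."""
    return math.ceil(images / batch_size)

print(iters_per_epoch(32))  # batch size 32
print(iters_per_epoch(64))  # batch size 64
```

So at batch size 32, 60 epochs is roughly 2.4M iterations, which puts the "60 epochs" numbers above into perspective.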

Norman he

Jun 29, 2016, 5:35:20 PM
to Caffe Users
I am pretty sure the published GitHub solver.prototxt will only get you to the plateau, where top-1 is around 50%... The gamma is too large; if you try gamma = 0.5, it will work out after 60 epochs.
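To see why a large gamma keeps the rate high for so long, one can count how many stepsize drops it takes under the "step" policy before the lr has decayed by a given factor. A quick sketch, comparing gamma = 0.96 against gamma = 0.5:

```python
import math

def drops_to_decay(gamma, factor=100):
    """Number of 'step' drops until lr falls below base_lr / factor."""
    return math.ceil(math.log(1 / factor) / math.log(gamma))

print(drops_to_decay(0.96))  # gamma = 0.96: over a hundred drops
print(drops_to_decay(0.5))   # gamma = 0.5: only a handful
```

With gamma = 0.96 it takes 113 drops to cut the lr by 100x, versus only 7 with gamma = 0.5, which is why the schedule barely anneals within the published max_iter.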



Norman he

Jun 29, 2016, 8:42:52 PM
to Caffe Users

Keep on dropping lr; so far so good.

I0629 17:29:24.272935 93047 solver.cpp:338] Iteration 2000, Testing net (#0)
I0629 17:35:08.542156 93047 solver.cpp:406]     Test net output #0: loss1/loss1 = 2.06764 (* 0.3 = 0.620292 loss)
I0629 17:35:08.542244 93047 solver.cpp:406]     Test net output #1: loss1/top-1 = 0.51834
I0629 17:35:08.542254 93047 solver.cpp:406]     Test net output #2: loss1/top-5 = 0.770541
I0629 17:35:08.542261 93047 solver.cpp:406]     Test net output #3: loss2/loss1 = 1.85069 (* 0.3 = 0.555206 loss)
I0629 17:35:08.542269 93047 solver.cpp:406]     Test net output #4: loss2/top-1 = 0.55856
I0629 17:35:08.542275 93047 solver.cpp:406]     Test net output #5: loss2/top-5 = 0.804001
I0629 17:35:08.542282 93047 solver.cpp:406]     Test net output #6: loss3/loss3 = 1.59296 (* 1 = 1.59296 loss)
I0629 17:35:08.542289 93047 solver.cpp:406]     Test net output #7: loss3/top-1 = 0.614659
I0629 17:35:08.542294 93047 solver.cpp:406]     Test net output #8: loss3/top-5 = 0.842102
I0629 17:35:09.641232 93047 solver.cpp:229] Iteration 2000, loss = 2.3549
I0629 17:35:09.641288 93047 solver.cpp:245]     Train net output #0: loss1/loss1 = 1.73005 (* 0.3 = 0.519015 loss)
I0629 17:35:09.641309 93047 solver.cpp:245]     Train net output #1: loss2/loss1 = 1.5902 (* 0.3 = 0.47706 loss)
I0629 17:35:09.641317 93047 solver.cpp:245]     Train net output #2: loss3/loss3 = 1.35883 (* 1 = 1.35883 loss)
I0629 17:35:09.641326 93047 sgd_solver.cpp:106] Iteration 2000, lr = 0.0001

Norman he

Jun 30, 2016, 2:51:41 PM
to Caffe Users
Best result I got so far; I don't think it will progress to 68.3% :) I am in epoch 26.

I0630 02:28:24.517712 81687 solver.cpp:338] Iteration 22000, Testing net (#0)
I0630 02:34:15.971487 81687 solver.cpp:406]     Test net output #0: loss1/loss1 = 2.04278 (* 0.3 = 0.612836 loss)
I0630 02:34:15.971526 81687 solver.cpp:406]     Test net output #1: loss1/top-1 = 0.52528
I0630 02:34:15.971532 81687 solver.cpp:406]     Test net output #2: loss1/top-5 = 0.77476
I0630 02:34:15.971539 81687 solver.cpp:406]     Test net output #3: loss2/loss1 = 1.82704 (* 0.3 = 0.548111 loss)
I0630 02:34:15.971544 81687 solver.cpp:406]     Test net output #4: loss2/top-1 = 0.56368
I0630 02:34:15.971547 81687 solver.cpp:406]     Test net output #5: loss2/top-5 = 0.807662
I0630 02:34:15.971552 81687 solver.cpp:406]     Test net output #6: loss3/loss3 = 1.56157 (* 1 = 1.56157 loss)
I0630 02:34:15.971556 81687 solver.cpp:406]     Test net output #7: loss3/top-1 = 0.62178
I0630 02:34:15.971561 81687 solver.cpp:406]     Test net output #8: loss3/top-5 = 0.847341
I0630 02:34:17.066014 81687 solver.cpp:229] Iteration 22000, loss = 3.79609
I0630 02:34:17.066068 81687 solver.cpp:245]     Train net output #0: loss1/loss1 = 3.11895 (* 0.3 = 0.935685 loss)
I0630 02:34:17.066076 81687 solver.cpp:245]     Train net output #1: loss2/loss1 = 2.77588 (* 0.3 = 0.832765 loss)
I0630 02:34:17.066083 81687 solver.cpp:245]     Train net output #2: loss3/loss3 = 2.02764 (* 1 = 2.02764 loss)
I0630 02:34:17.066092 81687 sgd_solver.cpp:106] Iteration 22000, lr = 4e-05

Norman he

Jul 1, 2016, 12:43:19 PM
to Caffe Users
I am starting to explore Nesterov, Adam, AdaGrad, AdaDelta ... from epoch 26 :)

Norman he

Jul 6, 2016, 1:53:54 PM
to Caffe Users
Trained for another 15.4 epochs; best result:

I0704 11:37:04.426751 28504 solver.cpp:338] Iteration 154000, Testing net (#0)
I0704 11:43:07.569953 28504 solver.cpp:406]     Test net output #0: loss1/loss1 = 2.00542 (* 0.3 = 0.601627 loss)
I0704 11:43:07.569998 28504 solver.cpp:406]     Test net output #1: loss1/top-1 = 0.53236
I0704 11:43:07.570016 28504 solver.cpp:406]     Test net output #2: loss1/top-5 = 0.78096
I0704 11:43:07.570022 28504 solver.cpp:406]     Test net output #3: loss2/loss1 = 1.79131 (* 0.3 = 0.537393 loss)
I0704 11:43:07.570026 28504 solver.cpp:406]     Test net output #4: loss2/top-1 = 0.570819
I0704 11:43:07.570030 28504 solver.cpp:406]     Test net output #5: loss2/top-5 = 0.813521
I0704 11:43:07.570035 28504 solver.cpp:406]     Test net output #6: loss3/loss3 = 1.52793 (* 1 = 1.52793 loss)
I0704 11:43:07.570039 28504 solver.cpp:406]     Test net output #7: loss3/top-1 = 0.629399
I0704 11:43:07.570042 28504 solver.cpp:406]     Test net output #8: loss3/top-5 = 0.852481
I0704 11:43:08.663274 28504 solver.cpp:229] Iteration 154000, loss = 2.67388
I0704 11:43:08.663331 28504 solver.cpp:245]     Train net output #0: loss1/loss1 = 2.27942 (* 0.3 = 0.683826 loss)
I0704 11:43:08.663341 28504 solver.cpp:245]     Train net output #1: loss2/loss1 = 2.02491 (* 0.3 = 0.607474 loss)
I0704 11:43:08.663346 28504 solver.cpp:245]     Train net output #2: loss3/loss3 = 1.38258 (* 1 = 1.38258 loss)
I0704 11:43:08.663354 28504 sgd_solver.cpp:106] Iteration 154000, lr = 1.25e-05

Norman he

Jul 8, 2016, 1:32:00 PM
to Caffe Users
I am also training on a distributed system. See