hello all, I am confused about "batch_size" in testing net & "test_interval" in solver.prototxt


may lin

Nov 9, 2016, 12:54:57 AM11/9/16
to Caffe Users
In caffe MNIST example, it says the following:
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations (test_iter: 100) ,
# covering the full 10,000 testing images.


I want to know whether setting batch size = 1 & test_iter = 10,000 is any different from batch size = 100 & test_iter = 100 when I train a Caffe model. Does it give a different accuracy result or a different computational situation?


Thank you all.

Armaan Bhullar

Nov 9, 2016, 1:10:13 PM11/9/16
to Caffe Users
Yes, the computational situation is different (implying different accuracy in general). Think of it this way:
  • Every training iteration involves one backprop pass and the corresponding weight update (recall that full-batch gradient descent requires you to first compute the error on the entire training set); to get this update, the minibatch strategy uses "batch size" images instead. In other words, every update is based on 100 images if batch size = 100.
  • A higher batch size generally lets you train faster (fewer updates per epoch) and also reduces "noise" in the gradients, but it requires more memory; also, a little noise can sometimes be helpful for escaping local minima.
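The points above can be sketched in plain NumPy (not Caffe's API; the model and names are illustrative): with N training samples and batch size B, one epoch performs N/B updates, each computed from B samples only.

```python
import numpy as np

def sgd_epoch(w, X, y, batch_size, lr=0.1):
    """One epoch of minibatch SGD on a toy linear least-squares model."""
    n_updates = 0
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        grad = xb.T @ (xb @ w - yb) / len(xb)  # gradient from this batch only
        w = w - lr * grad                      # one update per batch
        n_updates += 1
    return w, n_updates

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
w0 = np.zeros(5)
_, updates_small = sgd_epoch(w0, X, y, batch_size=1)    # 1000 updates per epoch
_, updates_large = sgd_epoch(w0, X, y, batch_size=100)  # 10 updates per epoch
print(updates_small, updates_large)  # → 1000 10
```

Each batch's gradient is a noisier estimate of the full-batch gradient when the batch is small, which is the "noise" mentioned above.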

Birol Kuyumcu

Nov 9, 2016, 1:23:31 PM11/9/16
to Caffe Users
"Every iteration involves one backprop and the corresponding update"
But this is about testing. Why would there be backprop and an update there? If so, it would mean that "Caffe uses the test data for training as well!"


On Wednesday, November 9, 2016 at 21:10:13 UTC+3, Armaan Bhullar wrote:

Armaan Bhullar

Nov 10, 2016, 3:37:29 AM11/10/16
to Caffe Users
No, Caffe does not calculate weight updates during the testing phase, but the accuracies reported are over batch_size * test_iter images. In my understanding, batch_size just determines how many images the input blob loads from disk at a time. These images can then be used for different purposes, including calculating an update (during training).

may_lin

Nov 10, 2016, 9:22:49 PM11/10/16
to Caffe Users
Thank you for your replies.

@Armaan Bhullar  It is a good point that "a little noise may be helpful to escape local minima".
As you both said, test images do not participate in the training phase or in backprop.


I ran an experiment with Test batch_size=1, test_iter=1,800 and with Test batch_size=100, test_iter=18. Both experiments use 1,800 images for the Test phase.

batch_size=1, test_iter=1800: 
I1110 16:04:32.444972  9722 solver.cpp:337] Iteration 10, Testing net (#0)
I1110 16:05:11.255198  9722 solver.cpp:404]     Test net output #0: accuracy = 0.518889
I1110 16:05:11.255300  9722 solver.cpp:404]     Test net output #1: loss = 0.696062 (* 1 = 0.696062 loss)
I1110 16:05:17.068862  9722 solver.cpp:337] Iteration 15, Testing net (#0)
I1110 16:05:55.884680  9722 solver.cpp:404]     Test net output #0: accuracy = 0.537222
I1110 16:05:55.884776  9722 solver.cpp:404]     Test net output #1: loss = 0.691227 (* 1 = 0.691227 loss)
I1110 16:06:01.924124  9722 solver.cpp:337] Iteration 20, Testing net (#0)
I1110 16:06:40.695374  9722 solver.cpp:404]     Test net output #0: accuracy = 0.552778
I1110 16:06:40.695482  9722 solver.cpp:404]     Test net output #1: loss = 0.684588 (* 1 = 0.684588 loss)

batch_size=100, test_iter=18:
I1110 16:40:41.339869  9859 solver.cpp:337] Iteration 10, Testing net (#0)
I1110 16:40:44.912143  9859 solver.cpp:404]     Test net output #0: accuracy = 0.518889
I1110 16:40:44.912185  9859 solver.cpp:404]     Test net output #1: loss = 0.696061 (* 1 = 0.696061 loss)
I1110 16:40:51.390815  9859 solver.cpp:337] Iteration 15, Testing net (#0)
I1110 16:40:54.948755  9859 solver.cpp:404]     Test net output #0: accuracy = 0.537222
I1110 16:40:54.948791  9859 solver.cpp:404]     Test net output #1: loss = 0.691227 (* 1 = 0.691227 loss)
I1110 16:41:01.446266  9859 solver.cpp:337] Iteration 20, Testing net (#0)
I1110 16:41:05.033687  9859 solver.cpp:404]     Test net output #0: accuracy = 0.552778
I1110 16:41:05.033730  9859 solver.cpp:404]     Test net output #1: loss = 0.684588 (* 1 = 0.684588 loss)

Over the two experiments (batch_size * test_iter = 1,800 in both), I found that the larger the Test batch_size, the shorter the test computation time:
batch_size=1, test_iter=1800 took about 39 seconds, vs. batch_size=100, test_iter=18, which took about 4 seconds.
Does that mean the testing phase also uses parallel computing (CUDA)? So with a larger Test batch_size, more images are processed in parallel, while the test accuracy stays the same (test accuracy = correct labels / (batch_size * test_iter)); the Test batch_size is only limited by GPU memory.
Is it correct to say that?

-----

@Armaan Bhullar
What does "batch_size*test_iter images can be used for different purposes including calculating an update" mean? Is it that you can gauge the current network model's performance from the test accuracy, or does it have another meaning? Thank you.