Does changing batch size speed up training?

Caleb Belth

Feb 11, 2016, 1:40:04 PM
to Caffe Users
I'm new to Caffe and I'm running the ImageNet tutorial. I don't have enough GPU memory for ImageNet, so I'm using the CPU. I shrank the dataset to 25 classes to try to make training faster. I want to know whether I can change the batch size to speed up training. Also, since I reduced the number of classes from 1000 to 25, I believe I need to change the output layer of the network to 25. To do this I changed num_output to 25 in the inner_product_param field of train_val.prototxt, but I'm not sure that's correct. I attached the prototxt for reference. Thanks for the help!
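For reference, this is roughly what the change looks like, assuming the stock CaffeNet train_val.prototxt where the last inner-product layer is named "fc8" (unrelated fields omitted):

    layer {
      name: "fc8"
      type: "InnerProduct"
      bottom: "fc7"
      top: "fc8"
      # param and filler fields left unchanged
      inner_product_param {
        num_output: 25   # was 1000; one output per class
      }
    }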
half_train_val.prototxt

Alex Orloff

Feb 11, 2016, 2:21:49 PM
to Caffe Users
I think you are right.
Have you started your network yet? Does the training process begin without any errors?

Sure, you can change the batch size, and it will affect training speed. A separate question is how it will affect it; that's not so straightforward.
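If you want to try it, the batch size is set per phase in the data layers of train_val.prototxt, something like this (layer name and LMDB path are from the stock ImageNet example; yours may differ):

    layer {
      name: "data"
      type: "Data"
      top: "data"
      top: "label"
      include { phase: TRAIN }
      # transform_param omitted for brevity
      data_param {
        source: "examples/imagenet/ilsvrc12_train_lmdb"  # your LMDB path may differ
        batch_size: 256   # lower this to reduce per-iteration time and memory on CPU
        backend: LMDB
      }
    }

The TEST-phase data layer has its own batch_size. Keep in mind the solver counts iterations, not images: with a smaller batch each iteration is faster but sees fewer images, so you may need to adjust max_iter (and possibly the learning rate) to train on the same amount of data.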

But if you think that by reducing the dataset 40 times you can increase training speed 40 times, I have bad news for you: it doesn't work that way.
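A rough back-of-the-envelope sketch of why (using the stock batch_size of 256 and the full ImageNet's ~1.28M training images, roughly 1,280 per class; substitute your actual numbers):

    iterations per epoch = images / batch_size
    1000 classes: ~1,280,000 / 256 ≈ 5,000 iterations
      25 classes:    ~32,000 / 256 ≈ 125 iterations

Every iteration still pushes a full batch of 256 images through the net, so each iteration costs the same. A smaller dataset gives you fewer iterations per epoch, not faster iterations, and the solver runs for max_iter iterations regardless of dataset size.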

Caleb Belth

Feb 11, 2016, 2:41:19 PM
to Caffe Users
Right now this is the output:

I0211 14:03:39.892946 18839 solver.cpp:340] Iteration 0, Testing net (#0)
I0211 12:42:16.511376  1395 solver.cpp:408]     Test net output #0: accuracy = 0
I0211 12:42:16.522457  1395 solver.cpp:408]     Test net output #1: loss = 3.23429 (* 1 = 3.23429 loss)
I0211 12:44:29.792454  1395 solver.cpp:236] Iteration 0, loss = 7.64491
I0211 12:44:29.792937  1395 solver.cpp:252]     Train net output #0: loss = 7.64491 (* 1 = 7.64491 loss)
I0211 12:44:29.793012  1395 sgd_solver.cpp:106] Iteration 0, lr = 0.01
caffe: malloc.c:3700: _int_malloc: Assertion `victim->fd_nextsize->bk_nextsize == victim' failed.
*** Aborted at 1455212803 (unix time) try "date -d @1455212803" if you are using GNU date ***
PC: @     0x7f51a1ccfcc9 (unknown)

The strange thing is that the process hasn't actually crashed (i.e. the train command is still executing, but top no longer shows Caffe running). I haven't seen this before. Also, I understand that training speed isn't directly proportional to dataset size, but are the two completely unrelated?

Thanks for your help.