Different image sizes - Caffe seems very slow in GPU mode


Alex Orloff

Oct 31, 2017, 4:45:06 PM
to Caffe Users
Hi all,
I use Caffe's Python module.
My network consists of a few conv layers (no fully connected layers), so it can accept different image sizes.

Forward-pass time increases as image size increases (seems OK).
But if I feed a big image and then a small image, both take nearly the same time to process (seems not OK).

Is this a bug or a feature?
How can I correct it?

In other words: "After processing a big image, a small image takes the same GPU time to calculate in the forward pass."
I don't understand why that is.
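
For context, a conv-only net of the kind described could be defined with pycaffe's NetSpec roughly as below; the layer names, sizes, and file name here are illustrative only, not the actual net from this thread:

import caffe
from caffe import layers as L

# Illustrative conv-only net: no fully connected layers, so the
# spatial dimensions of the 'data' blob can be reshaped at will.
n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 512, 512]))
n.conv1 = L.Convolution(n.data, num_output=16, kernel_size=3, pad=1)
n.relu1 = L.ReLU(n.conv1, in_place=True)
n.conv2 = L.Convolution(n.relu1, num_output=1, kernel_size=3, pad=1)

with open('convnet.prototxt', 'w') as f:
    f.write(str(n.to_proto()))

Net = caffe.Net('convnet.prototxt', caffe.TEST)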


Alex Orloff

Oct 31, 2017, 10:23:58 PM
to Caffe Users
I've checked that this is not a GPU or CUDA issue.
In the Torch7 framework, a bigger image takes more time and a smaller one takes less, regardless of feed order.

Alex Orloff

Nov 1, 2017, 1:15:33 PM
to Caffe Users
Hi again, here is the code:

import time
import numpy as np

# Net is an already-initialized caffe.Net with an input blob named 'data'
total_spend = 0.0

h = 512
w = 512
im_data = np.empty([h, w, 3])
print(im_data.shape)
im_data = np.swapaxes(im_data, 0, 2)  # HWC -> CWH to match the blob shape below
im_data = np.array([im_data], dtype=np.float)
Net.blobs['data'].reshape(1, 3, w, h)
Net.blobs['data'].data[...] = im_data
k = 0
while k < 20:
    start_time = time.time()
    out = Net.forward()
    total_spend = total_spend + time.time() - start_time
    print("Total spend = %f, current spend = %f" % (total_spend, time.time() - start_time))
    k = k + 1

h = 16
w = 16
im_data = np.empty([h, w, 3])
print(im_data.shape)
im_data = np.swapaxes(im_data, 0, 2)
im_data = np.array([im_data], dtype=np.float)
Net.blobs['data'].reshape(1, 3, w, h)
Net.blobs['data'].data[...] = im_data
k = 0
while k < 20:
    start_time = time.time()
    out = Net.forward()
    total_spend = total_spend + time.time() - start_time
    print("Total spend = %f, current spend = %f" % (total_spend, time.time() - start_time))
    k = k + 1

Result (CPU):
(512, 512, 3)
Total spend = 0.453633, current spend = 0.453639
Total spend = 0.611434, current spend = 0.157802
Total spend = 0.766412, current spend = 0.154979
Total spend = 0.920541, current spend = 0.154131
Total spend = 1.076776, current spend = 0.156237
Total spend = 1.231442, current spend = 0.154668
Total spend = 1.386451, current spend = 0.155010
Total spend = 1.540230, current spend = 0.153780
Total spend = 1.696707, current spend = 0.156478
Total spend = 1.853254, current spend = 0.156549
Total spend = 2.010661, current spend = 0.157408
Total spend = 2.166058, current spend = 0.155399
Total spend = 2.321408, current spend = 0.155352
Total spend = 2.476787, current spend = 0.155380
Total spend = 2.631135, current spend = 0.154350
Total spend = 2.786622, current spend = 0.155488
Total spend = 2.940805, current spend = 0.154185
Total spend = 3.096548, current spend = 0.155744
Total spend = 3.251005, current spend = 0.154458
Total spend = 3.406091, current spend = 0.155088
(16, 16, 3)
Total spend = 3.463368, current spend = 0.057279
Total spend = 3.463493, current spend = 0.000125
Total spend = 3.463600, current spend = 0.000107
Total spend = 3.463706, current spend = 0.000106
Total spend = 3.463817, current spend = 0.000112
Total spend = 3.463918, current spend = 0.000101
Total spend = 3.464019, current spend = 0.000100
Total spend = 3.464120, current spend = 0.000101
Total spend = 3.464220, current spend = 0.000100
Total spend = 3.464379, current spend = 0.000159
Total spend = 3.464515, current spend = 0.000136
Total spend = 3.464651, current spend = 0.000136
Total spend = 3.464751, current spend = 0.000101
Total spend = 3.464851, current spend = 0.000100
Total spend = 3.464952, current spend = 0.000101
Total spend = 3.465053, current spend = 0.000100
Total spend = 3.465153, current spend = 0.000100
Total spend = 3.465253, current spend = 0.000100
Total spend = 3.465353, current spend = 0.000100
Total spend = 3.465453, current spend = 0.000100

Result (GPU):
(512, 512, 3)
Total spend = 0.024570, current spend = 0.024575
Total spend = 0.035578, current spend = 0.011011
Total spend = 0.046762, current spend = 0.011186
Total spend = 0.057938, current spend = 0.011177
Total spend = 0.068915, current spend = 0.010979
Total spend = 0.080003, current spend = 0.011089
Total spend = 0.090985, current spend = 0.010983
Total spend = 0.101882, current spend = 0.010898
Total spend = 0.112621, current spend = 0.010739
Total spend = 0.123411, current spend = 0.010790
Total spend = 0.134041, current spend = 0.010630
Total spend = 0.144658, current spend = 0.010617
Total spend = 0.155418, current spend = 0.010762
Total spend = 0.166558, current spend = 0.011141
Total spend = 0.177589, current spend = 0.011033
Total spend = 0.188786, current spend = 0.011199
Total spend = 0.200002, current spend = 0.011218
Total spend = 0.210893, current spend = 0.010892
Total spend = 0.221989, current spend = 0.011098
Total spend = 0.233669, current spend = 0.011682
(16, 16, 3)
Total spend = 0.251532, current spend = 0.017865
Total spend = 0.260617, current spend = 0.009086
Total spend = 0.269569, current spend = 0.008954
Total spend = 0.278521, current spend = 0.008953
Total spend = 0.287494, current spend = 0.008973
Total spend = 0.296458, current spend = 0.008965
Total spend = 0.305350, current spend = 0.008893
Total spend = 0.314149, current spend = 0.008799
Total spend = 0.322967, current spend = 0.008819
Total spend = 0.331809, current spend = 0.008844
Total spend = 0.341357, current spend = 0.009550
Total spend = 0.350171, current spend = 0.008816
Total spend = 0.359381, current spend = 0.009211
Total spend = 0.368544, current spend = 0.009164
Total spend = 0.377791, current spend = 0.009249
Total spend = 0.386919, current spend = 0.009129
Total spend = 0.395754, current spend = 0.008837
Total spend = 0.404894, current spend = 0.009143
Total spend = 0.413824, current spend = 0.008932
Total spend = 0.422858, current spend = 0.009035
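
One caveat with timing this way: CUDA launches are asynchronous, so wall-clock time around forward() only captures all GPU work if something forces a synchronization. Depending on the pycaffe version, forward() may already copy the output blobs to the host, but touching a blob explicitly makes the synchronization point unambiguous. A rough sketch, with 'score' standing in for the net's actual output blob name:

start_time = time.time()
out = Net.forward()
# Reading blob data back to the host forces the device-to-host copy
# to complete, so the interval includes all queued GPU work.
# 'score' is a placeholder for the real output blob name.
_ = Net.blobs['score'].data.copy()
print("synchronized forward time = %f" % (time.time() - start_time))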


Hieu Do Trung

Nov 2, 2017, 6:19:12 AM
to Caffe Users
It might be that a 16x16x3 image is too small, so the setup time (copying data from host to device, etc.) outweighs the speed gain from using the GPU.


"MNIST is a small dataset, so training with GPU does not really introduce too much benefit due to communication overheads. On larger datasets with more complex models, such as ImageNet, the computation speed difference will be more significant."

If this is the case, you can try removing the big image and testing with small images only.
Then it's not "after processing a big image, a small image takes the same GPU time in the forward pass"
but rather "a small image takes the same GPU time in the forward pass",
which is reasonable here.
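
One way to structure that test is to treat the first pass at each size as a warm-up and average the rest. A minimal helper along those lines, assuming Net is already loaded with an input blob named 'data' (names and defaults here are illustrative):

import time

def time_forward(net, h, w, n_iter=20, n_warmup=1):
    # Resize the input blob and propagate the new shape through the net.
    net.blobs['data'].reshape(1, 3, h, w)
    net.reshape()
    # Warm-up passes absorb one-off costs (buffer allocation, transfers).
    for _ in range(n_warmup):
        net.forward()
    total = 0.0
    for _ in range(n_iter):
        start = time.time()
        net.forward()
        total += time.time() - start
    return total / n_iter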

Alex Orloff

Nov 2, 2017, 8:28:59 AM
to Caffe Users
Hi, thanks for your reply.

But I've already checked what you suggest:
If I run 16x16 -> 512x512 -> 16x16 on the CPU, both 16x16 passes take nearly the same time.
If I run 16x16 -> 512x512 -> 16x16 on the GPU, the second 16x16 pass takes SIGNIFICANTLY more time.
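
A sketch of that ordering test, reusing the time_forward helper sketched above (illustrative, not the exact driver used for the figures posted later):

# Reproduce the 16 -> 512 -> 16 ordering and report per-size averages.
for h, w in [(16, 16), (512, 512), (16, 16)]:
    avg = time_forward(Net, h, w)
    print("average forward time at %dx%d = %f" % (h, w, avg))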

Thanks

Alex Orloff

Nov 7, 2017, 8:08:38 PM
to Caffe Users
Here are the figures:
(16, 16, 3)
Time to pass 16x16 = 0.139678
(512, 512, 3)
Time to pass 512x512 = 1.097116
(16, 16, 3)
Time to pass 16x16 once again = 0.900323