Different image sizes - Caffe seems very slow in GPU mode


Alex Orloff

Oct 31, 2017, 4:45:06 PM
to Caffe Users
Hi all,
I use Caffe's Python module.
My network consists of a few conv layers (no fully connected layers), so it can accept different image sizes.

Forward-pass time increases as image size increases (seems OK).
But if I feed a big image and then a small image, both take nearly the same time to process (seems not OK).

Is this a bug or a feature?
How can I correct it?

In other words: "After processing a big image, a small image takes the same GPU time to calculate in the forward pass."
I don't understand why that is.
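
For context, a conv-only net of the kind described could be defined with pycaffe's NetSpec roughly as below; the layer names, sizes, and file name here are illustrative only, not the actual net from this thread:

import caffe
from caffe import layers as L

# Illustrative conv-only net: no fully connected layers, so the
# spatial dimensions of the 'data' blob can be reshaped at will.
n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 512, 512]))
n.conv1 = L.Convolution(n.data, num_output=16, kernel_size=3, pad=1)
n.relu1 = L.ReLU(n.conv1, in_place=True)
n.conv2 = L.Convolution(n.relu1, num_output=1, kernel_size=3, pad=1)

with open('convnet.prototxt', 'w') as f:
    f.write(str(n.to_proto()))

Net = caffe.Net('convnet.prototxt', caffe.TEST)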


Alex Orloff

Oct 31, 2017, 10:23:58 PM
to Caffe Users
I've checked that this is not a GPU or CUDA issue.
In the Torch7 framework, a bigger image takes more time and a smaller one takes less, regardless of feed order.

Alex Orloff

Nov 1, 2017, 1:15:33 PM
to Caffe Users
Hi again, here is the code:

import time
import numpy as np

# Net is an already-initialized caffe.Net with an input blob named 'data'
total_spend = 0.0

h = 512
w = 512
im_data = np.empty([h, w, 3])
print(im_data.shape)
im_data = np.swapaxes(im_data, 0, 2)  # HWC -> CWH to match the blob shape below
im_data = np.array([im_data], dtype=np.float)
Net.blobs['data'].reshape(1, 3, w, h)
Net.blobs['data'].data[...] = im_data
k = 0
while k < 20:
    start_time = time.time()
    out = Net.forward()
    total_spend = total_spend + time.time() - start_time
    print("Total spend = %f, current spend = %f" % (total_spend, time.time() - start_time))
    k = k + 1

h = 16
w = 16
im_data = np.empty([h, w, 3])
print(im_data.shape)
im_data = np.swapaxes(im_data, 0, 2)
im_data = np.array([im_data], dtype=np.float)
Net.blobs['data'].reshape(1, 3, w, h)
Net.blobs['data'].data[...] = im_data
k = 0
while k < 20:
    start_time = time.time()
    out = Net.forward()
    total_spend = total_spend + time.time() - start_time
    print("Total spend = %f, current spend = %f" % (total_spend, time.time() - start_time))
    k = k + 1

Result (CPU):
(512, 512, 3)
Total spend = 0.453633, current spend = 0.453639
Total spend = 0.611434, current spend = 0.157802
Total spend = 0.766412, current spend = 0.154979
Total spend = 0.920541, current spend = 0.154131
Total spend = 1.076776, current spend = 0.156237
Total spend = 1.231442, current spend = 0.154668
Total spend = 1.386451, current spend = 0.155010
Total spend = 1.540230, current spend = 0.153780
Total spend = 1.696707, current spend = 0.156478
Total spend = 1.853254, current spend = 0.156549
Total spend = 2.010661, current spend = 0.157408
Total spend = 2.166058, current spend = 0.155399
Total spend = 2.321408, current spend = 0.155352
Total spend = 2.476787, current spend = 0.155380
Total spend = 2.631135, current spend = 0.154350
Total spend = 2.786622, current spend = 0.155488
Total spend = 2.940805, current spend = 0.154185
Total spend = 3.096548, current spend = 0.155744
Total spend = 3.251005, current spend = 0.154458
Total spend = 3.406091, current spend = 0.155088
(16, 16, 3)
Total spend = 3.463368, current spend = 0.057279
Total spend = 3.463493, current spend = 0.000125
Total spend = 3.463600, current spend = 0.000107
Total spend = 3.463706, current spend = 0.000106
Total spend = 3.463817, current spend = 0.000112
Total spend = 3.463918, current spend = 0.000101
Total spend = 3.464019, current spend = 0.000100
Total spend = 3.464120, current spend = 0.000101
Total spend = 3.464220, current spend = 0.000100
Total spend = 3.464379, current spend = 0.000159
Total spend = 3.464515, current spend = 0.000136
Total spend = 3.464651, current spend = 0.000136
Total spend = 3.464751, current spend = 0.000101
Total spend = 3.464851, current spend = 0.000100
Total spend = 3.464952, current spend = 0.000101
Total spend = 3.465053, current spend = 0.000100
Total spend = 3.465153, current spend = 0.000100
Total spend = 3.465253, current spend = 0.000100
Total spend = 3.465353, current spend = 0.000100
Total spend = 3.465453, current spend = 0.000100

Result (GPU):
(512, 512, 3)
Total spend = 0.024570, current spend = 0.024575
Total spend = 0.035578, current spend = 0.011011
Total spend = 0.046762, current spend = 0.011186
Total spend = 0.057938, current spend = 0.011177
Total spend = 0.068915, current spend = 0.010979
Total spend = 0.080003, current spend = 0.011089
Total spend = 0.090985, current spend = 0.010983
Total spend = 0.101882, current spend = 0.010898
Total spend = 0.112621, current spend = 0.010739
Total spend = 0.123411, current spend = 0.010790
Total spend = 0.134041, current spend = 0.010630
Total spend = 0.144658, current spend = 0.010617
Total spend = 0.155418, current spend = 0.010762
Total spend = 0.166558, current spend = 0.011141
Total spend = 0.177589, current spend = 0.011033
Total spend = 0.188786, current spend = 0.011199
Total spend = 0.200002, current spend = 0.011218
Total spend = 0.210893, current spend = 0.010892
Total spend = 0.221989, current spend = 0.011098
Total spend = 0.233669, current spend = 0.011682
(16, 16, 3)
Total spend = 0.251532, current spend = 0.017865
Total spend = 0.260617, current spend = 0.009086
Total spend = 0.269569, current spend = 0.008954
Total spend = 0.278521, current spend = 0.008953
Total spend = 0.287494, current spend = 0.008973
Total spend = 0.296458, current spend = 0.008965
Total spend = 0.305350, current spend = 0.008893
Total spend = 0.314149, current spend = 0.008799
Total spend = 0.322967, current spend = 0.008819
Total spend = 0.331809, current spend = 0.008844
Total spend = 0.341357, current spend = 0.009550
Total spend = 0.350171, current spend = 0.008816
Total spend = 0.359381, current spend = 0.009211
Total spend = 0.368544, current spend = 0.009164
Total spend = 0.377791, current spend = 0.009249
Total spend = 0.386919, current spend = 0.009129
Total spend = 0.395754, current spend = 0.008837
Total spend = 0.404894, current spend = 0.009143
Total spend = 0.413824, current spend = 0.008932
Total spend = 0.422858, current spend = 0.009035
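
One caveat with timing this way: CUDA launches are asynchronous, so wall-clock time around forward() only captures all GPU work if something forces a synchronization. Depending on the pycaffe version, forward() may already copy the output blobs to the host, but touching a blob explicitly makes the synchronization point unambiguous. A rough sketch, with 'score' standing in for the net's actual output blob name:

start_time = time.time()
out = Net.forward()
# Reading blob data back to the host forces the device-to-host copy
# to complete, so the interval includes all queued GPU work.
# 'score' is a placeholder for the real output blob name.
_ = Net.blobs['score'].data.copy()
print("synchronized forward time = %f" % (time.time() - start_time))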


Hieu Do Trung

Nov 2, 2017, 6:19:12 AM
to Caffe Users
It might be that a 16x16x3 image is too small, so the setup time (copying data from host to device, etc.) outweighs the speed gain from using the GPU.


"MNIST is a small dataset, so training with GPU does not really introduce too much benefit due to communication overheads. On larger datasets with more complex models, such as ImageNet, the computation speed difference will be more significant."

If this is the case, you can try removing the big image and testing with small images only.
Then it's not "after processing a big image, a small image takes the same GPU time in the forward pass"
but rather "a small image takes the same GPU time in the forward pass",
which is reasonable here.
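
One way to structure that test is to treat the first pass at each size as a warm-up and average the rest. A minimal helper along those lines, assuming Net is already loaded with an input blob named 'data' (names and defaults here are illustrative):

import time

def time_forward(net, h, w, n_iter=20, n_warmup=1):
    # Resize the input blob and propagate the new shape through the net.
    net.blobs['data'].reshape(1, 3, h, w)
    net.reshape()
    # Warm-up passes absorb one-off costs (buffer allocation, transfers).
    for _ in range(n_warmup):
        net.forward()
    total = 0.0
    for _ in range(n_iter):
        start = time.time()
        net.forward()
        total += time.time() - start
    return total / n_iter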

Alex Orloff

Nov 2, 2017, 8:28:59 AM
to Caffe Users
Hi, thanks for your reply.

But I've already checked what you suggest:
If I run 16x16 -> 512x512 -> 16x16 on the CPU, both 16x16 passes take nearly the same time.
If I run 16x16 -> 512x512 -> 16x16 on the GPU, the second 16x16 pass takes SIGNIFICANTLY more time.
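
A sketch of that ordering test, reusing the time_forward helper sketched above (illustrative, not the exact driver used for the figures posted later):

# Reproduce the 16 -> 512 -> 16 ordering and report per-size averages.
for h, w in [(16, 16), (512, 512), (16, 16)]:
    avg = time_forward(Net, h, w)
    print("average forward time at %dx%d = %f" % (h, w, avg))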

Thanks

Alex Orloff

Nov 7, 2017, 8:08:38 PM
to Caffe Users
Here are the figures:
(16, 16, 3)
Time to pass 16x16 = 0.139678
(512, 512, 3)
Time to pass 512x512 = 1.097116
(16, 16, 3)
Time to pass 16x16 once again = 0.900323