gpu: out of memory, even though batch_size=1


Manu

Feb 18, 2016, 11:58:56 AM2/18/16
to Caffe Users
Dear community, 

I'm following this tutorial, in which a pre-trained model is loaded to perform a simple forward pass of an image.
The blob to be processed is defined as:
net.blobs['data'].reshape(50,3,227,227)

In CPU mode everything works fine, but when I set GPU mode I get this error:
F0218 17:51:32.118782 2115146496 syncedmem.cpp:64] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***

I know this is a common error, and the usual fix is to reduce the batch size. But in my case I get the same error even with batch_size=1.
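For reference, the relevant part of my script looks roughly like this (the prototxt/caffemodel paths are placeholders for my local copies of the pre-trained model from the tutorial):

import numpy as np
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()   # everything runs fine if I use caffe.set_mode_cpu() instead

# Placeholder paths to the deploy prototxt and the pre-trained weights.
net = caffe.Net('deploy.prototxt', 'bvlc_reference_caffenet.caffemodel', caffe.TEST)

# Reduced from (50, 3, 227, 227) to a single image, still the same error.
net.blobs['data'].reshape(1, 3, 227, 227)
net.blobs['data'].data[...] = np.random.rand(1, 3, 227, 227)  # dummy input; the real script uses a preprocessed image
out = net.forward()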

I'm running OS X 10.10.5 with CUDA 7.5 and the latest CUDA driver installed. These are my GPU properties according to Caffe:
lobelia:~ user$ /path_to_caffe/build/tools/caffe device_query -gpu 0
I0218 17:56:36.361524 2115146496 caffe.cpp:112] Querying GPUs 0
I0218 17:56:36.480191 2115146496 common.cpp:168] Device id:                     0
I0218 17:56:36.480242 2115146496 common.cpp:169] Major revision number:         3
I0218 17:56:36.480260 2115146496 common.cpp:170] Minor revision number:         0
I0218 17:56:36.480271 2115146496 common.cpp:171] Name:                          GeForce GT 650M
I0218 17:56:36.480284 2115146496 common.cpp:172] Total global memory:           1073414144
I0218 17:56:36.480293 2115146496 common.cpp:173] Total shared memory per block: 49152
I0218 17:56:36.480300 2115146496 common.cpp:174] Total registers per block:     65536
I0218 17:56:36.480307 2115146496 common.cpp:175] Warp size:                     32
I0218 17:56:36.480314 2115146496 common.cpp:176] Maximum memory pitch:          2147483647
I0218 17:56:36.480320 2115146496 common.cpp:177] Maximum threads per block:     1024
I0218 17:56:36.480337 2115146496 common.cpp:178] Maximum dimension of block:    1024, 1024, 64
I0218 17:56:36.480345 2115146496 common.cpp:181] Maximum dimension of grid:     2147483647, 65535, 65535
I0218 17:56:36.480353 2115146496 common.cpp:184] Clock rate:                    900000
I0218 17:56:36.480388 2115146496 common.cpp:185] Total constant memory:         65536
I0218 17:56:36.480397 2115146496 common.cpp:186] Texture alignment:             512
I0218 17:56:36.480403 2115146496 common.cpp:187] Concurrent copy and execution: Yes
I0218 17:56:36.480409 2115146496 common.cpp:189] Number of multiprocessors:     2
I0218 17:56:36.480415 2115146496 common.cpp:190] Kernel execution timeout:      Yes

Does anyone know what is going on?





sar...@cube26ar.com

Feb 18, 2016, 2:11:35 PM2/18/16
to Caffe Users
You should decrease your image size; I don't think your GPU has enough memory to handle that workload.
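To put rough numbers on the input blob alone (float32, so 4 bytes per value; the layer activations and the model weights come on top of that), a quick sketch:

# Back-of-the-envelope size of the 'data' blob for a few input shapes.
for batch, h, w in [(50, 227, 227), (1, 227, 227), (1, 128, 128)]:
    mb = batch * 3 * h * w * 4 / (1024.0 ** 2)
    print("batch=%2d, %dx%d -> %.1f MB" % (batch, h, w, mb))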



Jan C Peters

Feb 19, 2016, 4:31:46 AM2/19/16
to Caffe Users
Yeah well, you do have a CUDA-capable GPU, but it has relatively little memory (1 GB) compared to cards like the Titan X (12 GB), which are the ones mainly used for training and testing massive networks like those for the ImageNet challenge. So an out-of-memory error is not too surprising. If reducing the batch size does not solve your problem, you will have to resort to smaller networks (fewer layers and/or filters) for GPU training, or get a better card. Sadly, you need quite a high-end card to make real use of it for deep learning. But if you are just experimenting, don't run to the store just yet: start with LeNet and MNIST, which are much smaller and which your card should handle without any problems. And it is just as instructive as the ImageNet stuff.
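For reference, the stock LeNet/MNIST example can be driven from pycaffe with something like the following (a minimal sketch, assuming you run it from the Caffe root and have already fetched and converted the data with data/mnist/get_mnist.sh and examples/mnist/create_mnist.sh):

import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

# Uses the solver and network definitions shipped with Caffe;
# paths are relative to the repository root.
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
solver.solve()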

Jan