Check failed: error == cudaSuccess (2 vs. 0) out of memory


Caleb Belth

Jan 26, 2016, 11:53:43 PM
to Caffe Users
I'm an undergraduate computer science student beginning to do research in machine learning and I'm new to Caffe, so I started with the tutorials provided by Caffe (MNIST, cifar10, and ImageNet). MNIST and cifar10 were a breeze and I got a bit of a feel for how Caffe works. I then went to tackle ImageNet, which I left for last because of the gargantuan size of the dataset. Unfortunately, I don't have much of a GPU to work with (attached as GPU.txt, with specs at https://www.techpowerup.com/gpudb/894/geforce-gtx-650.html), so, understandably, I got a GPU out of memory error (attached as output0.txt).

I scoured this Caffe Users group and found multiple "out of memory" issues, but every one I found was solved simply by reducing the batch size. So I reduced the batch size from 256 to 64. It still crashed. I figured it might have to do with the size of each individual image, so I also reduced the JPEG quality factor from the default 75% to 50% to shrink the images. It still crashed. Next I reduced the dataset to 4 classes instead of the default 1,000. Finally it worked.

Unfortunately, the command I used to redirect the output to a file failed, so the network trained (for 48 hours) with only the end results visible to me. Strangely, the accuracy at the end was 0. I figured out what was wrong with the output redirect and went to train the network anew so I could investigate the 0 accuracy, but the out of memory crash reappeared. I had changed nothing in my network parameters or dataset.

Having scoured this group some more, I lowered the batch size incrementally all the way to 1, to no avail. I then reduced the JPEG quality factor all the way to 10% and dropped the number of classes from 4 to 1. It still crashed. I'm baffled as to what could have changed to cause my once-working setup to fail. Can anyone give me suggestions for how to proceed? Feel free to ask follow-up questions about my predicament. Thanks for your time!
output0.txt
output1.txt
GPU.txt
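
As a rough illustration of the batch-size reasoning in the post, the sketch below estimates how much GPU memory the input data blob alone would occupy at the batch sizes mentioned (256, 64, 1). It assumes a 3 x 227 x 227 float32 crop per image, which is the crop size of the reference CaffeNet model and may not match the poster's actual configuration; the function name `input_blob_bytes` is hypothetical. The estimate ignores weights and intermediate activations, so it is a lower bound, not a prediction of total usage. It also shows why the on-disk JPEG quality would not be expected to change this figure: the blob holds decoded pixels of a fixed shape.

```python
def input_blob_bytes(batch_size, channels=3, height=227, width=227,
                     bytes_per_value=4):
    """Size in bytes of an N x C x H x W float32 input blob.

    The 3 x 227 x 227 default is an assumption (reference CaffeNet crop),
    not something taken from the attached logs.
    """
    return batch_size * channels * height * width * bytes_per_value


if __name__ == "__main__":
    # Batch sizes tried in the post.
    for batch_size in (256, 64, 1):
        mib = input_blob_bytes(batch_size) / 2**20
        print(f"batch_size={batch_size:>3}: {mib:.1f} MiB for the input blob")
```

Since the input blob shrinks linearly with batch size, a crash that persists at a batch size of 1 suggests the limit is being hit somewhere that batch size does not control.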