Faster R-CNN Joint Learning: Out of Memory - 5376 MB (changed batch sizes and no. of RPN proposals)


dusa

Jul 13, 2016, 4:27:47 PM
to Caffe Users

I have worked with the alternating optimization MATLAB code before with the same settings (both training and testing, ZF architecture); currently I am trying to get joint learning running. I am able to run the test demo on my GPU, a Tesla 2070 (compute capability 2.0, so it cannot use cuDNN). For training, I have set all the batch sizes to 1:

__C.TRAIN.IMS_PER_BATCH = 1
__C.TRAIN.BATCH_SIZE = 1
__C.TRAIN.RPN_BATCHSIZE = 1 (updated yml to 1 as well since it was overridden)
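
To make sure the yml is not silently winning again, this is a minimal sketch of what I run before training (assuming the stock py-faster-rcnn layout, where fast_rcnn.config provides cfg and cfg_from_file; the yml path is just an example):

    from fast_rcnn.config import cfg, cfg_from_file
    import pprint

    # load the experiment yml first, then force the batch sizes afterwards
    # so nothing overrides them later (path is an example, not my exact file)
    cfg_from_file('experiments/cfgs/faster_rcnn_end2end.yml')
    cfg.TRAIN.IMS_PER_BATCH = 1
    cfg.TRAIN.BATCH_SIZE = 1
    cfg.TRAIN.RPN_BATCHSIZE = 1
    pprint.pprint(cfg)  # the merged config that training will actually see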

But I still hit the same failure: error == cudaSuccess (2 vs. 0) out of memory.

I have also tried lowering the number of proposals. With the original (alternating optimization) method, I had changed the number of proposals kept after NMS in the MATLAB code (to around 350/500) and it ran smoothly, so I thought I would make the same change here. (The defaults are below.)

train:
# Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000

test:
# Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TEST.RPN_PRE_NMS_TOP_N = 6000
# Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TEST.RPN_POST_NMS_TOP_N = 300

I tried values as low as pre: 100, post: 50 as a sanity check.
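
(Concretely, the sanity-check values I set, again assuming the same config module, were:)

    from fast_rcnn.config import cfg

    # sanity-check values, far below the 12000/2000 defaults
    cfg.TRAIN.RPN_PRE_NMS_TOP_N = 100
    cfg.TRAIN.RPN_POST_NMS_TOP_N = 50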

And I am still not able to train without the out-of-memory error. What am I missing here? My Tesla has 5376 MB of dedicated memory and I use it only for this (I have a separate GPU driving my screen). I am positive I read from one of the authors himself that 5376 MB should be enough.

Thanks.

At the moment I am trying to run training without flipping, but now I am getting a dozen reshape errors and still no luck.
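
(For reference, the way I am disabling flipping is the single flag below; the name is my reading of lib/fast_rcnn/config.py, so treat it as an assumption:)

    from fast_rcnn.config import cfg

    # skip generating horizontally flipped copies of the training roidb
    cfg.TRAIN.USE_FLIPPED = False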


dusa

Jul 13, 2016, 5:05:40 PM
to Caffe Users
As soon as I run training, python goes crazy:

27361 root 20 0 58.416g 123480 53816 R 148.6 0.3 0:04.48 python

(58.416g is the VIRT column; %CPU starts at 148.6, then settles around 99 before it goes out of memory)
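
To see what the GPU itself is doing while this happens, this is a small sketch of what I poll alongside top (it only relies on nvidia-smi's documented CSV query flags):

    import subprocess
    import time

    # poll the card's actual memory use once a second while training spins up
    while True:
        out = subprocess.check_output(
            ['nvidia-smi', '--query-gpu=memory.used,memory.total',
             '--format=csv,noheader'])
        print(out.decode().strip())
        time.sleep(1)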