Out of memory on K80 with 12G - fully convolutional net


eran paz

Jul 9, 2015, 3:16:54 PM
to caffe...@googlegroups.com
Hi all,
I'd appreciate any insight into this.

I'm running a fully convolutional net (the longjon future branch). I had a ton of problems getting it to run, but it finally runs.
However, after about 6000 iterations it fails with:
F0709 22:02:38.919351 30207 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0)  out of memory

Now, that doesn't make sense at all; I have a K80 with 12 GB (and actually have a bunch of them):
caffe@iltlvl383:~$ ./caffe/build/tools/caffe device_query -gpu=0
I0709 21:50:32.001188 32412 caffe.cpp:73] Querying device ID = 0
I0709 21:50:35.214002 32412 common.cpp:157] Device id:                     0
I0709 21:50:35.214074 32412 common.cpp:158] Major revision number:         3
I0709 21:50:35.214087 32412 common.cpp:159] Minor revision number:         7
I0709 21:50:35.214097 32412 common.cpp:160] Name:                          Tesla K80
I0709 21:50:35.214107 32412 common.cpp:161] Total global memory:           12079136768
I0709 21:50:35.214126 32412 common.cpp:162] Total shared memory per block: 49152
I0709 21:50:35.214136 32412 common.cpp:163] Total registers per block:     65536
I0709 21:50:35.214146 32412 common.cpp:164] Warp size:                     32
I0709 21:50:35.214156 32412 common.cpp:165] Maximum memory pitch:          2147483647
I0709 21:50:35.214165 32412 common.cpp:166] Maximum threads per block:     1024
I0709 21:50:35.214174 32412 common.cpp:167] Maximum dimension of block:    1024, 1024, 64
I0709 21:50:35.214184 32412 common.cpp:170] Maximum dimension of grid:     2147483647, 65535, 65535
I0709 21:50:35.214193 32412 common.cpp:173] Clock rate:                    823500
I0709 21:50:35.214201 32412 common.cpp:174] Total constant memory:         65536
I0709 21:50:35.214210 32412 common.cpp:175] Texture alignment:             512
I0709 21:50:35.214220 32412 common.cpp:176] Concurrent copy and execution: Yes
I0709 21:50:35.214236 32412 common.cpp:178] Number of multiprocessors:     13
I0709 21:50:35.214246 32412 common.cpp:179] Kernel execution timeout:      No

while all I need for the model is:
I0709 21:48:47.162098 30207 net.cpp:219] Memory required for data: 122306400
which is only about 120 MB, obviously far less than what I have.
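
For reference, a rough way to sanity check that number from pycaffe (just a sketch, not exact: it assumes pycaffe is built, uses my train prototxt path, and assumes the blobs are float32, i.e. 4 bytes per value):

# Rough sketch: add up the sizes of the net's blobs via pycaffe and compare
# against the "Memory required for data" line in the log.
# Assumes pycaffe is built and blobs are float32 (4 bytes per value).
import caffe

caffe.set_mode_cpu()  # only inspecting blob shapes, no GPU needed
net = caffe.Net('./models/models/V3_FCN/train_val_s32.prototxt', caffe.TRAIN)
total_bytes = sum(blob.data.size * 4 for blob in net.blobs.values())
print('data blobs: %.1f MB' % (total_bytes / 1024.0 / 1024.0))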

In addition, this is a smaller model than the one I'm actually trying to run: I've reduced the number of possible pixel classes from 100 to 10 (just for testing).
I'm also running with batch_size=1 and iter_size=1.
This is my solver definition:
net: "./models/models/V3_FCN/train_val_s32.prototxt"
test_iter: 500
test_interval: 10000 # py solving tests
display: 500
#average_loss: 20
lr_policy: "fixed"
base_lr: 1e-4
momentum: 0.9
iter_size: 1
# base_lr: 1e-9
# momentum: 0.99
max_iter: 100000
weight_decay: 0.0005
snapshot: 6000
test_initialization: false
snapshot_prefix: "./models/snapshots/V3_FCN/snapshot"
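
One thing I plan to try is polling GPU memory while it trains, to see whether usage creeps up gradually or jumps suddenly around the iteration where it dies. A rough sketch (it assumes nvidia-smi is on the PATH and that the job runs on GPU 0):

# Rough sketch: poll GPU memory every few seconds while training runs, to
# see whether usage grows gradually or spikes right before the OOM.
# Assumes nvidia-smi is on the PATH and the training job is on GPU 0;
# stop it with Ctrl-C.
import subprocess
import time

while True:
    used = subprocess.check_output(
        ['nvidia-smi', '--id=0',
         '--query-gpu=memory.used,memory.total',
         '--format=csv,noheader']).decode().strip()
    print('%s  %s' % (time.strftime('%H:%M:%S'), used))
    time.sleep(5)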


So, if anybody can tell me why my memory explodes, how to solve it, or how to make the run use less memory, I'd appreciate it.
THX

Evan Shelhamer

Jul 9, 2015, 5:59:39 PM
to eran paz, caffe...@googlegroups.com
Note that PR #2016, which shares convolution buffers, reduces memory use (sometimes drastically in the case of fully convolutional nets).

Check also that you do not have an input image that is simply too large.
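
A quick way to check, if you keep a plain list of the training image paths (a rough sketch; PIL and the 'train.txt' file name are assumptions, not something from your setup):

# Rough sketch: find the largest training image, since a single oversized
# input can make a fully convolutional net run out of memory by itself.
# Assumes PIL/Pillow is installed; 'train.txt' (one image path per line)
# is a placeholder for however the image list is actually stored.
from PIL import Image

with open('train.txt') as f:
    paths = [line.split()[0] for line in f if line.strip()]

biggest = None
for p in paths:
    w, h = Image.open(p).size  # lazy open: only the header is read
    if biggest is None or w * h > biggest[0] * biggest[1]:
        biggest = (w, h, p)

print('largest image: %dx%d  %s' % biggest)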

Evan Shelhamer


eran paz

Jul 10, 2015, 8:06:44 AM
to caffe...@googlegroups.com
Hi Evan
Thanks for the quick reply.
Using PR #2016 sure helped: I was able to run my mini dataset (10 classes instead of 100) to completion, and I'm now trying the full dataset.
Since this PR isn't merged into the longjon future branch, I assume you ran the semantic segmentation experiments without it, so I'm trying to figure out why it doesn't work in my case.
My images are not especially large; I just embedded some objects into images from the ImageNet dataset.

THX
Eran

Evan Shelhamer

Jul 17, 2015, 12:26:00 AM
to eran paz, caffe...@googlegroups.com
Re: #2016, we did run our experiments with shared buffers, and the longjon:future branch readme suggests it, although we haven't bundled it into the branch.

Evan Shelhamer
