cuda runtime error (60) : peer mapping resources exhausted while requiring 'cunn'/'cutorch'

Qingnan Fan

Jun 22, 2016, 8:57:16 AM6/22/16
to torch7
Here is the full error output:
--------------------------------------------------------
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9142/cutorch/lib/THC/THCGeneral.c line=176 error=60 : peer mapping resources exhausted
qlua: cuda runtime error (60) : peer mapping resources exhausted at /tmp/luarocks_cutorch-scm-1-9142/cutorch/lib/THC/THCGeneral.c:176
stack traceback:
[C]: at 0x7f89b4f589c0
[C]: at 0x7f8941b84970
[C]: in function 'require'
/home/qingnan/torch/install/share/lua/5.1/cutorch/init.lua:2: in main chunk
[C]: in function 'require'
/mnt/qn/code/imgSmooth/test2.lua:267: in main chunk
--------------------------------------------------------

I can install Torch successfully, but when I require a CUDA-related module like cunn or cutorch, it throws this error.

I'm using two servers: Torch works fine on the one with 8 K80 GPUs, but fails on this one (16 K80 GPUs). Does this mean Torch can't handle that many GPUs at the same time? Every time I run Torch, it starts a thread on each GPU.

Could anybody help me with this?


alban desmaison

Jun 22, 2016, 10:32:37 AM6/22/16
to torch7
You can use the NVIDIA CUDA_VISIBLE_DEVICES environment variable to prevent Torch from initializing on the GPUs that you don't want to use:

CUDA_VISIBLE_DEVICES=0,1,2,3 th -lcutorch
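For instance (the device indices here are only an example), you can export the variable for the whole shell session and then confirm from the interpreter that only those GPUs are visible:

export CUDA_VISIBLE_DEVICES=0,1,2,3    # only these four GPUs are visible to processes started from this shell
th -lcutorch                           # then, at the interpreter prompt:
print(cutorch.getDeviceCount())        -- should print 4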

Qingnan Fan

Jun 27, 2016, 11:51:05 AM6/27/16
to torch7 on behalf of alban desmaison
Great, it works! But why can't the cutorch.setDevice() method allocate a single GPU for computation?

Sorry for the late reply!


alban desmaison

Jun 27, 2016, 12:26:34 PM6/27/16
to torch7
When you use the CUDA_VISIBLE_DEVICES environment variable, only the devices listed there are usable, so you cannot call setDevice on an index outside that range. You can use cutorch.getDeviceCount() to see how many devices are available.
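Here is a minimal Lua sketch of that behaviour (assuming the process was launched with CUDA_VISIBLE_DEVICES=0,1,2,3, so four devices are visible):

require 'cutorch'

local n = cutorch.getDeviceCount()      -- number of GPUs visible to this process (4 in this example)
print('visible GPUs: ' .. n)

cutorch.setDevice(1)                    -- valid: cutorch device indices run from 1 to n
print(cutorch.getDevice())              -- prints 1

-- cutorch.setDevice(n + 1)             -- would fail: that index is outside the visible range

local t = torch.CudaTensor(10):fill(1)  -- tensors are allocated on the currently selected device
print(t:sum())                          -- prints 10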