Here is the full error output:
--------------------------------------------------------
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9142/cutorch/lib/THC/THCGeneral.c line=176 error=60 : peer mapping resources exhausted
qlua: cuda runtime error (60) : peer mapping resources exhausted at /tmp/luarocks_cutorch-scm-1-9142/cutorch/lib/THC/THCGeneral.c:176
stack traceback:
[C]: at 0x7f89b4f589c0
[C]: at 0x7f8941b84970
[C]: in function 'require'
/home/qingnan/torch/install/share/lua/5.1/cutorch/init.lua:2: in main chunk
[C]: in function 'require'
/mnt/qn/code/imgSmooth/test2.lua:267: in main chunk
--------------------------------------------------------
Torch itself installs successfully, but whenever I require a CUDA-related module such as cunn or cutorch, this error appears.
I'm using two servers. Torch works fine on the one with 8 K80 GPUs but fails on this one, which has 16 K80 GPUs. Does this mean Torch can't handle that many GPUs at the same time? Every time I run Torch, it starts a thread on each GPU.
Could anybody help me with this?
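My guess at what is happening, as a quick back-of-the-envelope check. I'm assuming two things I haven't verified. First, cutorch enables peer access between every pair of visible GPUs when it initializes. Second, the hardware limit is the one the CUDA C Programming Guide gives for non-NVSwitch systems: at most eight peer connections per device. The helper names below are my own. The count is also an upper bound, because some GPU pairs (for example, ones on different PCIe root complexes) may not support peer access at all.

```python
# Hypothetical check: would enabling peer access between every pair of GPUs
# exceed a per-device peer-connection limit?
# Assumptions (not verified): all-pairs enabling at init, and a limit of 8.
MAX_PEERS_PER_DEVICE = 8  # assumed limit for non-NVSwitch systems


def peers_needed(num_gpus: int) -> int:
    """Peer connections each GPU needs if every pair of GPUs is mapped."""
    return max(num_gpus - 1, 0)


def all_pairs_fits(num_gpus: int, limit: int = MAX_PEERS_PER_DEVICE) -> bool:
    """True if all-pairs peer mapping stays within the per-device limit."""
    return peers_needed(num_gpus) <= limit


def visible_devices(num_gpus: int, limit: int = MAX_PEERS_PER_DEVICE) -> str:
    """Largest device list (0..k-1) whose all-pairs mapping fits the limit,
    formatted as a value for the CUDA_VISIBLE_DEVICES environment variable."""
    count = min(num_gpus, limit + 1)
    return ",".join(str(i) for i in range(count))


for n in (8, 16):
    print(n, peers_needed(n), all_pairs_fits(n))
# 8 7 True
# 16 15 False
```

If this guess is right, the 8-GPU server stays under the limit and the 16-GPU server does not. One way to test it would be to restrict the visible devices before launching (e.g. `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`) and check whether `require 'cutorch'` still fails.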