I am running theano.test() on two separate GPUs simultaneously.
After a while I realized both instances had stopped, each waiting for a lock
to be released. They were stuck for about 10 minutes, each periodically
printing a message along the lines of:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
.FINFO (theano.gof.compilelock): Waiting for existing lock by process
'9319' (I am process '9307')
INFO (theano.gof.compilelock): To manually release the lock, delete
/home/brian/.theano/compiledir_Linux-2.6.38-gentoo-r6-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.0.3-Intel_R_Core_TM_i7-2600K_CPU_@_3.40GHz-2.7.1/lock_dir
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
After a while, I saw the following. I wasn't sure what to make of it,
so I'm guessing one of you will:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
INFO (theano.gof.compilelock): Waiting for existing lock by unknown
process (I am process '9307')
INFO (theano.gof.compilelock): To manually release the lock, delete
/home/brian/.theano/compiledir_Linux-2.6.38-gentoo-r6-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.0.3-Intel_R_Core_TM_i7-2600K_CPU_@_3.40GHz-2.7.1/lock_dir
...............................[ 0.2 1.2 2.2 3.2 4.2 5.2]
[ 0.2 1.2 2.2 3.2 4.2 5.2]
........ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/mnt/space/projects/Theano/theano/gof/opt.py", line 928, in process_node
    replacements = lopt.transform(node)
  File "/mnt/space/projects/Theano/theano/sandbox/cuda/opt.py", line 673, in local_gpu_advanced_incsubtensor1
    return [host_from_gpu(GpuAdvancedIncSubtensor1()(gpu_x, gpu_y, *coords))]
  File "/mnt/space/projects/Theano/theano/gof/op.py", line 377, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/mnt/space/projects/Theano/theano/sandbox/cuda/basic_ops.py", line 1803, in make_node
    assert x_.type.ndim == y_.type.ndim
AssertionError
FF.........................INFO (theano.gof.compilelock): Waiting for
existing lock by unknown process (I am process '9307')
INFO (theano.gof.compilelock): To manually release the lock, delete
/home/brian/.theano/compiledir_Linux-2.6.38-gentoo-r6-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.0.3-Intel_R_Core_TM_i7-2600K_CPU_@_3.40GHz-2.7.1/lock_dir
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-Brian
If you used the init_gpu_device flag, tell us. We will need to check
why this test failed.
Fred
I set THEANO_FLAGS (I believe that is the right variable) to device=gpu0
in one shell and device=gpu1 in the other.
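
For reference, here is a rough sketch of how the two environments could be
built from Python. The helper name theano_env and the directory
~/.theano_per_device are made up for illustration. The base_compiledir part
rests on an assumption: that the lock waits come from both processes sharing
one compile directory, and that a separate directory per process would avoid
them. I have not verified that this is the cause here.

```python
import os


def theano_env(device, compiledir_root=None, base_env=None):
    """Return a copy of the environment with THEANO_FLAGS set for one device.

    `theano_env` is a hypothetical helper, not part of Theano.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = ["device=%s" % device]
    if compiledir_root is not None:
        # Assumption: a separate base_compiledir per process keeps the two
        # runs from waiting on the same lock_dir.
        flags.append(
            "base_compiledir=%s" % os.path.join(compiledir_root, device)
        )
    env["THEANO_FLAGS"] = ",".join(flags)
    return env


if __name__ == "__main__":
    root = os.path.expanduser("~/.theano_per_device")  # hypothetical location
    for device in ("gpu0", "gpu1"):
        env = theano_env(device, compiledir_root=root)
        print(env["THEANO_FLAGS"])
        # To launch a test run with this environment, something like:
        # subprocess.Popen(
        #     [sys.executable, "-c", "import theano; theano.test()"], env=env)
```

Leaving out compiledir_root gives just device=gpu0 or device=gpu1, which is
what I actually had set in each shell.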
-Brian
So, initially I ran the tests with the default, device=cpu, and some
tests failed. I imagine you guys want to see the results?
-Brian