failed test

Brian Vandenberg

Feb 8, 2012, 4:10:41 AM
to thean...@googlegroups.com
I assume this is the right list; please let me know if I'm mistaken.

I am running theano.test() on two separate GPUs simultaneously.
After a while I realized both instances were stuck waiting for a lock
to be released. They stayed stuck for about 10 minutes, each periodically
printing a message along the lines of:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
.FINFO (theano.gof.compilelock): Waiting for existing lock by process '9319' (I am process '9307')
INFO (theano.gof.compilelock): To manually release the lock, delete
/home/brian/.theano/compiledir_Linux-2.6.38-gentoo-r6-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.0.3-Intel_R_Core_TM_i7-2600K_CPU_@_3.40GHz-2.7.1/lock_dir
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

After a while, I saw the following. I wasn't sure what to make of it,
so I'm guessing one of you will:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '9307')
INFO (theano.gof.compilelock): To manually release the lock, delete
/home/brian/.theano/compiledir_Linux-2.6.38-gentoo-r6-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.0.3-Intel_R_Core_TM_i7-2600K_CPU_@_3.40GHz-2.7.1/lock_dir
...............................[ 0.2 1.2 2.2 3.2 4.2 5.2]
[ 0.2 1.2 2.2 3.2 4.2 5.2]
........ERROR (theano.gof.opt): Optimization failure due to: local_gpu_advanced_incsubtensor1
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/mnt/space/projects/Theano/theano/gof/opt.py", line 928, in process_node
    replacements = lopt.transform(node)
  File "/mnt/space/projects/Theano/theano/sandbox/cuda/opt.py", line 673, in local_gpu_advanced_incsubtensor1
    return [host_from_gpu(GpuAdvancedIncSubtensor1()(gpu_x, gpu_y, *coords))]
  File "/mnt/space/projects/Theano/theano/gof/op.py", line 377, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/mnt/space/projects/Theano/theano/sandbox/cuda/basic_ops.py", line 1803, in make_node
    assert x_.type.ndim == y_.type.ndim
AssertionError

FF.........................INFO (theano.gof.compilelock): Waiting for existing lock by unknown process (I am process '9307')
INFO (theano.gof.compilelock): To manually release the lock, delete
/home/brian/.theano/compiledir_Linux-2.6.38-gentoo-r6-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.0.3-Intel_R_Core_TM_i7-2600K_CPU_@_3.40GHz-2.7.1/lock_dir
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

-Brian

Olivier Delalleau

Feb 8, 2012, 6:54:02 AM
to thean...@googlegroups.com
Hi,

You can ignore the "waiting for existing lock..." messages: they appear because the two processes share the same compilation directory and compete for its lock. To avoid this, give each process its own directory (config.[base_]compiledir[_format], see http://deeplearning.net/software/theano/library/config.html).
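Olivier's workaround amounts to giving each concurrent run its own compilation directory. Below is a minimal sketch that assumes THEANO_FLAGS is a plain comma-separated list of key=value pairs with no quoting. The helpers `with_private_compiledir` and `env_for` are hypothetical names, and `/tmp` as the root is an arbitrary choice. The sketch sets base_compiledir per process. Judging by the lock path in the log above, lock_dir sits inside compiledir, which by default lives under base_compiledir, so two runs configured this way should no longer wait on each other's lock. The trade-off is that each directory builds its own cache of compiled modules.

```python
import os


def with_private_compiledir(flags, tag, compiledir_root):
    """Return `flags` (a THEANO_FLAGS string) with base_compiledir pointing at
    a directory unique to `tag`, so concurrent runs stop sharing a lock_dir.

    Assumes the simple comma-separated key=value format (no quoting).
    """
    private_dir = os.path.join(compiledir_root, "theano_" + tag)
    items = [item.strip() for item in flags.split(",")]
    # Drop empty entries and any base_compiledir already present.
    kept = [item for item in items
            if item and not item.startswith("base_compiledir=")]
    kept.append("base_compiledir=" + private_dir)
    return ",".join(kept)


def env_for(flags, tag, compiledir_root="/tmp"):
    """Copy of os.environ with THEANO_FLAGS rewritten for one process (hypothetical helper)."""
    env = dict(os.environ)
    env["THEANO_FLAGS"] = with_private_compiledir(flags, tag, compiledir_root)
    return env
```

For example, `subprocess.Popen([sys.executable, "-c", "import theano; theano.test()"], env=env_for("device=cpu", "run0"))` would start one run. A second run with a different tag would get a separate directory.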

Besides this, the test suite is currently intended to be run with "device=cpu". GPU tests will still be run if a GPU is available.
That being said, the specific optimization error you see looks like a bug.

Thanks for the report,

-=- Olivier

Frédéric Bastien

Feb 8, 2012, 10:21:38 AM
to thean...@googlegroups.com
How did you tell the tests to run on the GPU: with the device flag or
the init_gpu_device flag? You must use the init_gpu_device flag, as
Olivier said; the tests are built to have Theano functions run on the
CPU by default.
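Following Fred's advice, a shell that previously set device=gpu0 would instead pass device=cpu together with init_gpu_device=gpu0. The short Python sketch below rewrites an existing THEANO_FLAGS string that way. `to_test_flags` is a hypothetical helper, not part of Theano. It assumes the plain comma-separated key=value format and does not handle quoted values.

```python
def to_test_flags(flags):
    """Rewrite a THEANO_FLAGS string so that a GPU selected with device=gpu*
    is passed as init_gpu_device instead, with device=cpu.

    Other flags are kept in their original order.
    """
    pairs = {}
    for item in flags.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()

    device = pairs.get("device", "cpu")
    if device.startswith("gpu"):
        pairs["device"] = "cpu"  # keeps the key's original position
        pairs.setdefault("init_gpu_device", device)  # appended if absent
    return ",".join("{}={}".format(k, v) for k, v in pairs.items())
```

With this, the two shells in this thread would use `to_test_flags("device=gpu0")` and `to_test_flags("device=gpu1")`, giving `device=cpu,init_gpu_device=gpu0` and `device=cpu,init_gpu_device=gpu1`.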

If you used the init_gpu_device flag, please tell us; we will need to
check why this test failed.

Fred

Brian Vandenberg

Feb 8, 2012, 10:51:47 AM
to thean...@googlegroups.com

I set device=gpu0 in one shell and device=gpu1 in the other (through
THEANO_FLAGS, I believe).

-Brian

James Bergstra

Feb 8, 2012, 10:53:23 AM
to thean...@googlegroups.com
People often run the tests with device=gpu.

What do you think of adding functionality to sandbox.cuda.__init__ to detect when the import has been done by nosetests, and then do something more sensible in response?
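James's proposal could look roughly like the sketch below. It makes several assumptions. The names `imported_under_nose` and `check_test_device` are hypothetical and not part of Theano. It treats the presence of `nose` in `sys.modules` as a heuristic for "running under nosetests", which can give false positives if other code imports nose. The "more sensible" response shown here is only a warning that points to init_gpu_device. The `modules` parameter exists so the check can be exercised without nose installed.

```python
import sys
import warnings


def imported_under_nose(modules=None):
    """Heuristic: assume a nose run if the `nose` package is already imported."""
    modules = sys.modules if modules is None else modules
    return "nose" in modules


def check_test_device(device, modules=None):
    """Hypothetical guard for sandbox.cuda.__init__: warn when the GPU was
    selected with device=gpu* during a nose run.

    Returns the warning text, or None when no warning was issued.
    """
    if not device.startswith("gpu") or not imported_under_nose(modules):
        return None
    message = (
        "Running the Theano test suite with device={0}; the tests expect "
        "device=cpu with init_gpu_device={0} instead.".format(device)
    )
    warnings.warn(message)
    return message
```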

- James

Olivier Delalleau

Feb 8, 2012, 11:14:56 AM
to thean...@googlegroups.com
Alternatively, would it be a lot of work to make the test suite run with device=gpu (possibly skipping some tests if needed)?

-=- Olivier

Brian Vandenberg

Feb 8, 2012, 11:27:21 AM
to thean...@googlegroups.com
On Wed, Feb 8, 2012 at 9:14 AM, Olivier Delalleau
<dela...@iro.umontreal.ca> wrote:
> Or, would it be a lot of work to make the test suite run with device=gpu?
> (possibly skipping some tests if needed).

So, initially I ran the tests with the default, device=cpu, and some
tests failed. I imagine you guys want to see the results?

-Brian

Olivier Delalleau

Feb 8, 2012, 11:36:12 AM
to thean...@googlegroups.com
Sure thing! (you can ignore KnownFailureTests though)

-=- Olivier