This is really bad. I suggest that Theano should detect when the driver is 280.13 and crash immediately, before it screws up anyone's work. The runtime API has a call that returns the kernel driver version:
Brian, do you feel like patching a call to that function into the "gpu_init" method in the theano.sandbox.cuda.cuda_ndarray.cu file, and reporting exactly what you get as the driver version? That way we can start a blacklist of driver versions that Theano can flag as buggy. Hopefully the blacklist always has length 1.

- James

On Mon, Feb 6, 2012 at 10:14 PM, Chris Currivan <ccur...@gmail.com> wrote:
Yes, I do. Thanks, that's probably it. I'll try rolling back to 270.41.19.
Chris
On Mon, Feb 6, 2012 at 9:26 PM, Olivier Delalleau
<dela...@iro.umontreal.ca> wrote:
> This reminds me of people having issues with buggy drivers:
> https://groups.google.com/group/theano-users/browse_thread/thread/186db6e00bcc8ccf/7b9ade6ba3a08cc3?hl=en&lnk=gst
>
> Do you have 280.13 CUDA drivers too?
>
> -=- Olivier
>
>
> On 6 February 2012 at 21:13, Chris Currivan <ccur...@gmail.com> wrote:
>
>> Some more information... It works correctly on the GPU with
>> mode='FAST_COMPILE', and 'DEBUG_MODE' catches the bug. theano.test()
>> passes. The bug also appears with fvector input variables, not just
>> with shared variables.
>>
>>
>> DEBUG_MODE output:
>>
>> Traceback (most recent call last):
>>   File "tests/test_theano.py", line 26, in <module>
>>     print "np mse: %f, theano mse: %f" % (mse, f_mse())
>>   File "/usr/local/lib/python2.7/dist-packages/Theano-0.5.0rc2-py2.7.egg/theano/compile/function_module.py", line 636, in __call__
>>     outputs = self.fn()
>>   File "/usr/local/lib/python2.7/dist-packages/Theano-0.5.0rc2-py2.7.egg/theano/compile/debugmode.py", line 1601, in deco
>>     return f()
>>   File "/usr/local/lib/python2.7/dist-packages/Theano-0.5.0rc2-py2.7.egg/theano/compile/debugmode.py", line 1484, in f
>>     raise BadCLinkerOutput(r, val_py=r_vals[r], val_c=storage_map[r][0])
>> theano.compile.debugmode.BadCLinkerOutput: BadCLinkerOutput
>> variable: GpuSum{1}.0
>> Outputs Type : CudaNdarrayType(float32, scalar)
>> Inputs Type: [CudaNdarrayType(float32, vector)]
>> Apply : GpuSum{1}(GpuElemwise{Composite{[sqr(sub(i0, i1))]},no_inplace}.0)
>> val_py : <CudaNdarray object at 0x7ec1130>
>> val_c : <CudaNdarray object at 0x7ebdfb0>
>> op : <class 'theano.sandbox.cuda.basic_ops.GpuSum'>
>> Max Abs Diff: 64.0
>> Mean Abs Diff: 64.0
>> Median Abs Diff: 64.0
>> Std Abs Diff: 0.0
>> Max Rel Diff: 0.470588
>> Mean Rel Diff: 0.470588237047
>> Median Rel Diff: 0.470588237047
>> Std Rel Diff: 0.0
>>
>>
>>
>>
>> On Mon, Feb 6, 2012 at 8:03 PM, Chris Currivan <ccur...@gmail.com>
>> wrote:
>> > I'm seeing some strange behavior with a trivial mean squared error
>> > calculation. It does the right thing on the cpu, but the gpu result is
>> > wrong. I'm using the latest code (commit
>> > 0f86ecd9cc47e31ee14eb5fda2683e4b71123195) with CUDA 4.0 on a GTX 560
>> > Ti under Ubuntu. Any suggestions on how to debug this?
>> >
>> >
>> > Correct CPU result:
>> > $ python tests/test_theano.py
>> > np mse: 1.000000, theano mse: 1.000000
>> >
>> > On the GPU:
>> > $ python tests/test_theano.py
>> > Using gpu device 0: GeForce GTX 560 Ti
>> > np mse: 1.000000, theano mse: 0.360000
>> >
>> >
>> >
>> > # test script
>> > import numpy as np
>> > import theano
>> > import theano.tensor as T
>> > from theano import config
>> >
>> > a = np.arange(100)
>> > b = a+1
>> > mse = np.mean((a-b)**2)
>> >
>> > A = theano.shared(name="A", value=a.astype(config.floatX))
>> > B = theano.shared(name="B", value=b.astype(config.floatX))
>> >
>> > MSE = T.mean((A-B)**2)
>> > f_mse = theano.function(inputs=[], outputs=MSE)
>> >
>> > print "np mse: %f, theano mse: %f" % (mse, f_mse())
>> >
>> >
>> >
>> >
>> > .theanorc:
>> >
>> > [cuda]
>> > root = /usr
>> >
>> > [global]
>> > floatX = float32
>> > intX = int32
>> > device = gpu
>> >
>> > [blas]
>> > ldflags = -lblas
>> >
>> > [nvcc]
>> > fastmath = True
>> > compiler_bindir = ~/.theano/nvcc-bindir
>> >
>> >
>> >
>> > $ nvcc --version
>> > nvcc: NVIDIA (R) Cuda compiler driver
>> > Copyright (c) 2005-2011 NVIDIA Corporation
>> > Built on Thu_May_12_11:09:45_PDT_2011
>> > Cuda compilation tools, release 4.0, V0.2.1221
>
>
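The DebugMode statistics line up with the wrong GPU result. The NumPy sketch below recomputes them, assuming (1) the division by the element count inside T.mean was correct, so the bad value comes entirely from GpuSum, and (2) the relative difference is |x - y| / (|x| + |y|). The second point is an assumption that happens to reproduce the reported 0.470588; it is not taken from Theano's source. Under those assumptions GpuSum returned 36 where 100 was expected.

```python
import numpy as np

# Reference computation from the test script (CPU/NumPy path).
a = np.arange(100)
b = a + 1
sq = (a - b) ** 2              # every element is (-1)**2 == 1
n = sq.size                    # 100 elements

expected_sum = float(sq.sum())     # what GpuSum{1} should return: 100.0
expected_mse = expected_sum / n    # 1.0, the "np mse" printed by the script

# The GPU run printed an MSE of 0.36, so (assumption 1) GpuSum returned 0.36 * n.
gpu_mse = 0.36
gpu_sum = gpu_mse * n              # about 36.0

abs_diff = abs(expected_sum - gpu_sum)   # about 64.0, the "Max Abs Diff"
# Assumption 2: relative difference defined as |x - y| / (|x| + |y|).
rel_diff = abs_diff / (abs(expected_sum) + abs(gpu_sum))   # 64 / 136

print("expected sum %.1f, gpu sum %.1f" % (expected_sum, gpu_sum))
print("abs diff %.1f, rel diff %.6f" % (abs_diff, rel_diff))
```

In other words, the reduction silently dropped about 64 of the 100 unit terms, which points at the GpuSum kernel (or the driver running it) rather than at the elementwise square.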
My PR that tests the driver when Theano's GPU code is used has now been
merged. So if someone else uses a bad driver, Theano will crash when
they try to use the GPU code.
Fred
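A driver blacklist like the one James describes can be sketched in a few lines. This is not the code from the merged PR; it is a hypothetical, Linux-only illustration that reads the kernel-module version string from /proc/driver/nvidia/version instead of calling the CUDA runtime. The file's exact format is an assumption (a line containing "Kernel Module" followed by the version), and the names BAD_DRIVER_VERSIONS, parse_nvidia_driver_version, check_driver and read_driver_version_file are made up for this example.

```python
import re

# Hypothetical blacklist: 280.13 is the only version implicated in this thread.
BAD_DRIVER_VERSIONS = {"280.13"}


def parse_nvidia_driver_version(proc_text):
    """Return a version such as '280.13' found after 'Kernel Module',
    or None if the text does not match the assumed format."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def check_driver(proc_text):
    """Raise before any GPU work is done if the driver is blacklisted."""
    version = parse_nvidia_driver_version(proc_text)
    if version in BAD_DRIVER_VERSIONS:
        raise RuntimeError(
            "NVIDIA driver %s is known to give wrong results; "
            "please install a different driver version." % version)
    return version


def read_driver_version_file(path="/proc/driver/nvidia/version"):
    """Read the version text on Linux (path assumed; absent on other OSes)."""
    with open(path) as f:
        return f.read()
```

On a Linux machine one would call check_driver(read_driver_version_file()) once at GPU initialization, so a bad driver fails loudly up front instead of producing wrong numbers later.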