Re: [theano-users] Re: Buggy MSE calculation on GPU only

Olivier Delalleau

Feb 8, 2012, 9:02:21 PM
to thean...@googlegroups.com
I tried that, but unfortunately the value returned for the driver version does not look anything like 280.13 (or any other driver version number).
I get 4010 on leprof and 3020 on ceylon, which I suspect correspond to toolkit versions 4.1 and 3.2.
If I use the cudaRuntimeGetVersion function instead, I get 3020 on both computers.
Does anyone know whether the faulty 280.13 drivers can be reliably mapped to a similar number?
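
A note on these numbers: CUDA's version calls report a single integer that is documented as 1000 * major + 10 * minor, which is consistent with 4010 and 3020 reading as 4.1 and 3.2 rather than as a driver release such as 280.13. A minimal sketch of that decoding follows; the helper name `decode_cuda_version` is made up for illustration and is not part of Theano or CUDA.

```python
def decode_cuda_version(value):
    """Split a CUDA version integer into (major, minor).

    Assumes the encoding 1000 * major + 10 * minor.
    """
    major = value // 1000
    minor = (value % 1000) // 10
    return major, minor


# The two values reported in this message.
for reported in (4010, 3020):
    print("%d -> %d.%d" % ((reported,) + decode_cuda_version(reported)))
```

If that encoding holds, the integer alone cannot distinguish driver release 280.13 from other releases that support the same CUDA version.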

-=- Olivier

On February 6, 2012 at 23:07, James Bergstra <james.b...@gmail.com> wrote:
This is really bad.  I suggest that theano should detect that the driver is 280.13 and just crash immediately before screwing up anyone's work. The runtime API has a call that returns the kernel driver version:


Brian - do you feel like patching a call to that function into the theano.sandbox.cuda.cuda_ndarray.cu file's "gpu_init" method, and reporting exactly what you get as the driver version? That way we can start a blacklist of driver versions that Theano can flag as being buggy. Hopefully the blacklist always has length 1.
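
For anyone who wants to see what such a call reports without rebuilding Theano, here is a rough sketch that queries the CUDA runtime library from Python through ctypes. It assumes the call in question is `cudaDriverGetVersion` (with `cudaRuntimeGetVersion` alongside for comparison) and that the runtime is installed as a shared library ctypes can find under the name "cudart"; both are assumptions, and `query_cuda_versions` is a hypothetical helper, not Theano code.

```python
import ctypes
import ctypes.util


def query_cuda_versions():
    """Return (driver_version, runtime_version) as ints, or None if unavailable.

    Assumption: the CUDA runtime is a shared library that ctypes can
    locate under the name "cudart".
    """
    path = ctypes.util.find_library("cudart")
    if path is None:
        return None
    try:
        cudart = ctypes.CDLL(path)
        driver = ctypes.c_int(0)
        runtime = ctypes.c_int(0)
        # Both calls return a cudaError_t; 0 means cudaSuccess.
        if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
            return None
        if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
            return None
    except (OSError, AttributeError):
        # Library could not be loaded, or a symbol is missing.
        return None
    return driver.value, runtime.value


print(query_cuda_versions())
```

On a machine without the CUDA runtime this simply prints None.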

- James

On Mon, Feb 6, 2012 at 10:14 PM, Chris Currivan <ccur...@gmail.com> wrote:
Yes, I do. Thanks, that's probably it. I'll try rolling back to 270.41.19.

Chris

On Mon, Feb 6, 2012 at 9:26 PM, Olivier Delalleau
<dela...@iro.umontreal.ca> wrote:
> This reminds me of people having issues with buggy drivers:
> https://groups.google.com/group/theano-users/browse_thread/thread/186db6e00bcc8ccf/7b9ade6ba3a08cc3?hl=en&lnk=gst
>
> Do you have 280.13 CUDA drivers too?
>
> -=- Olivier
>
>
> On February 6, 2012 at 21:13, Chris Currivan <ccur...@gmail.com> wrote:
>
>> Some more information... It works correctly on the GPU with
>> mode='FAST_COMPILE', and 'DEBUG_MODE' catches the bug. theano.test()
>> passes. The bug appears with fvector input variables and not just
>> shared.
>>
>>
>> DEBUG_MODE output:
>>
>> Traceback (most recent call last):
>>  File "tests/test_theano.py", line 26, in <module>
>>    print "np mse: %f, theano mse: %f" % (mse, f_mse())
>>  File
>> "/usr/local/lib/python2.7/dist-packages/Theano-0.5.0rc2-py2.7.egg/theano/compile/function_module.py",
>> line 636, in __call__
>>    outputs = self.fn()
>>  File
>> "/usr/local/lib/python2.7/dist-packages/Theano-0.5.0rc2-py2.7.egg/theano/compile/debugmode.py",
>> line 1601, in deco
>>    return f()
>>  File
>> "/usr/local/lib/python2.7/dist-packages/Theano-0.5.0rc2-py2.7.egg/theano/compile/debugmode.py",
>> line 1484, in f
>>    raise BadCLinkerOutput(r, val_py=r_vals[r], val_c=storage_map[r][0])
>> theano.compile.debugmode.BadCLinkerOutput: BadCLinkerOutput
>>  variable: GpuSum{1}.0
>>  Outputs Type    : CudaNdarrayType(float32, scalar)
>>  Inputs Type: [CudaNdarrayType(float32, vector)]
>>  Apply   : GpuSum{1}(GpuElemwise{Composite{[sqr(sub(i0,
>> i1))]},no_inplace}.0)
>>  val_py  : <CudaNdarray object at 0x7ec1130>
>>  val_c   : <CudaNdarray object at 0x7ebdfb0>
>>  op      : <class 'theano.sandbox.cuda.basic_ops.GpuSum'>
>>  Max Abs Diff:  64.0
>>  Mean Abs Diff:  64.0
>>  Median Abs Diff:  64.0
>>  Std Abs Diff:  0.0
>>  Max Rel Diff:  0.470588
>>  Mean Rel Diff:  0.470588237047
>>  Median Rel Diff:  0.470588237047
>>  Std Rel Diff:  0.0
>>
>>
>>
>>
>> On Mon, Feb 6, 2012 at 8:03 PM, Chris Currivan <ccur...@gmail.com>
>> wrote:
>> > I'm seeing some strange behavior with a trivial mean squared error
>> > calculation. It does the right thing on the cpu, but the gpu result is
>> > wrong. I'm using the latest code (commit
>> > 0f86ecd9cc47e31ee14eb5fda2683e4b71123195) with CUDA 4.0 on a GTX 560
>> > Ti under Ubuntu. Any suggestions how to debug this?
>> >
>> >
>> > Correct CPU result:
>> > $ python tests/test_theano.py
>> > np mse: 1.000000, theano mse: 1.000000
>> >
>> > On the GPU:
>> > $ python tests/test_theano.py
>> > Using gpu device 0: GeForce GTX 560 Ti
>> > np mse: 1.000000, theano mse: 0.360000
>> >
>> >
>> >
>> > # test script
>> > import numpy as np
>> > import theano
>> > import theano.tensor as T
>> > from theano import config
>> >
>> > a = np.arange(100)
>> > b = a+1
>> > mse = np.mean((a-b)**2)
>> >
>> > A = theano.shared(name="A", value=a.astype(config.floatX))
>> > B = theano.shared(name="B", value=b.astype(config.floatX))
>> >
>> > MSE = T.mean((A-B)**2)
>> > f_mse = theano.function(inputs=[], outputs=MSE)
>> >
>> > print "np mse: %f, theano mse: %f" % (mse, f_mse())
>> >
>> >
>> >
>> >
>> > .theanorc:
>> >
>> > [cuda]
>> > root = /usr
>> >
>> > [global]
>> > floatX = float32
>> > intX = int32
>> > device = gpu
>> >
>> > [blas]
>> > ldflags = -lblas
>> >
>> > [nvcc]
>> > fastmath = True
>> > compiler_bindir = ~/.theano/nvcc-bindir
>> >
>> >
>> >
>> > $ nvcc --version
>> > nvcc: NVIDIA (R) Cuda compiler driver
>> > Copyright (c) 2005-2011 NVIDIA Corporation
>> > Built on Thu_May_12_11:09:45_PDT_2011
>> > Cuda compilation tools, release 4.0, V0.2.1221
>
>
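
The figures in the quoted DEBUG_MODE report agree with the GPU output of the test script, and a few lines of arithmetic show it. The sketch below assumes that the relative difference is computed as |x - y| / (|x| + |y|); that formula is an assumption chosen because it reproduces the reported value, not something stated in the thread.

```python
n = 100                      # length of a and b in the test script
expected_sum = float(n)      # every (a - b) ** 2 term equals 1
gpu_mean = 0.36              # value printed on the GPU
gpu_sum = gpu_mean * n       # sum implied by the GPU mean

abs_diff = abs(expected_sum - gpu_sum)
# Assumption: relative difference taken as |x - y| / (|x| + |y|).
rel_diff = abs_diff / (abs(expected_sum) + abs(gpu_sum))

print("abs diff: %.1f, rel diff: %.6f" % (abs_diff, rel_diff))
```

Under that assumption the numbers match the report's Max Abs Diff of 64.0 and Max Rel Diff of 0.470588, which suggests the GpuSum reduction returned 36 where the reference implementation returned 100.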


Frédéric Bastien

Feb 20, 2012, 1:10:46 PM
to thean...@googlegroups.com
Hi,

My PR that tests the driver whenever the GPU code in Theano is used is now
merged. So if someone else uses a bad driver, Theano will crash as soon as
they try to use the GPU code.
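
To illustrate the general idea of such a check (this is not the code from the merged PR), here is a minimal sketch: run a reduction whose answer is known in advance and refuse to continue if the result is wrong. The names `reduction_is_sane` and `check_driver_or_raise` are hypothetical, and `device_sum` stands in for whatever actually runs the sum on the GPU.

```python
def reduction_is_sane(device_sum, n=10000):
    """Compare a device reduction against a known answer.

    device_sum is a hypothetical callable standing in for the GPU sum;
    summing n ones must give exactly n.
    """
    result = device_sum([1.0] * n)
    return result == float(n)


def check_driver_or_raise(device_sum):
    """Raise immediately if the reduction gives a wrong result."""
    if not reduction_is_sane(device_sum):
        raise RuntimeError(
            "GPU reduction returned a wrong result; "
            "the installed driver may be buggy")


# Stand-in for a healthy device: Python's own sum.
check_driver_or_raise(sum)

# Stand-in for a faulty device that drops half of the input.
try:
    check_driver_or_raise(lambda xs: sum(xs[: len(xs) // 2]))
except RuntimeError as err:
    print(err)
```

A behavioural test like this does not depend on being able to read a driver release number.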

Fred
