GPU error: Error when tring to find the memory information on the GPU: RuntimeError: CudaNdarray.__setitem__: syncing structure to device failed


Sonam Singh

Mar 1, 2016, 4:38:50 AM
to theano-users
Hi,

I have an RNN encoder-decoder model that runs fine on CPU but throws this error on GPU.
Note: I can run a similar model (an RNN encoder-decoder with slight cost-function changes, on different data) without any errors, so the drivers and the rest of the setup seem fine.

Let me know if an exception_verbosity=high trace is needed.


TRACE:

Error when tring to find the memory information on the GPU: unspecified launch failure
Error freeing device pointer 0x130040c600 (unspecified launch failure). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.
CudaNdarray_uninit: error freeing dev_structure memory 0x130040c600 (self=0x7f3433c3bab0)





    reraise(exc_type, exc_value, exc_trace)
  File "/home/ms/ssingh/anaconda2/lib/python2.7/site-packages/Theano-0.8.0.dev0-py2.7.egg/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
  File "/home/ms/ssingh/anaconda2/lib/python2.7/site-packages/Theano-0.8.0.dev0-py2.7.egg/theano/scan_module/scan_op.py", line 963, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/ms/ssingh/anaconda2/lib/python2.7/site-packages/Theano-0.8.0.dev0-py2.7.egg/theano/scan_module/scan_op.py", line 952, in <lambda>
    self, node)
  File "theano/scan_module/scan_perform.pyx", line 505, in theano.scan_module.scan_perform.perform (/home/ms/ssingh/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.7.10-64/scan_perform/mod.cpp:5551)
RuntimeError: CudaNdarray.__setitem__: syncing structure to device failed
Apply node that caused the error: forall_inplace,gpu,grad_of_scan_fn}(TensorConstant{10}, GpuDimShuffle{0,2,1}.0, GpuDimShuffle{0,2,1}.0, GpuElemwise{Composite{(i0 - sqr(i1))},no_inplace}.0, GpuSubtensor{::int64}.0, GpuAlloc{memset_0=True}.0, GpuAlloc{memset_0=True}.0, GpuAlloc{memset_0=True}.0, GpuDimShuffle{1,0}.0)
Toposort index: 247
Inputs types: [TensorType(int64, scalar), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(), (10, 2048, 4), (10, 10000, 4), (10, 4, 2048), (11, 4, 2048), (2, 10000, 2048), (2, 2048, 2048), (2, 2048), (2048, 2048)]
Inputs strides: [(), (-8192, 1, 2048), (-10000, 1, 100000), (8192, 2048, 1), (-8192, 2048, 1), (20480000, 2048, 1), (4194304, 2048, 1), (2048, 1), (1, 2048)]
Inputs values: [array(10), 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[], [GpuSubtensor{int64}(forall_inplace,gpu,grad_of_scan_fn}.1, Constant{1})], [GpuSubtensor{int64}(forall_inplace,gpu,grad_of_scan_fn}.2, Constant{1})], [GpuSubtensor{int64}(forall_inplace,gpu,grad_of_scan_fn}.3, Constant{1})]]



Thanks,
Sonam

Frédéric Bastien

Mar 8, 2016, 6:20:40 PM
to theano-users

This is hard to debug. Do you still see this with the same model but a smaller layer size? Can you try setting the environment variable CUDA_LAUNCH_BLOCKING=1?
This can give a better error message.
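
A minimal sketch of one way to set this up, assuming the model is launched from a separate script: the helper below builds an environment with CUDA_LAUNCH_BLOCKING=1 and adds exception_verbosity=high to THEANO_FLAGS, then runs the script in a child process so the variables are in place before CUDA is initialized. The names debug_env and run_with_debug_env, and the script path passed in, are hypothetical and not part of Theano.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with GPU-debugging settings added."""
    env = dict(os.environ if base is None else base)
    # Make CUDA kernel launches synchronous, so a failure is reported at the
    # launch that caused it rather than at some later, unrelated call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    # THEANO_FLAGS is a comma-separated list; append to it, do not replace it.
    flags = [f for f in env.get("THEANO_FLAGS", "").split(",") if f]
    if not any(f.startswith("exception_verbosity=") for f in flags):
        flags.append("exception_verbosity=high")
    env["THEANO_FLAGS"] = ",".join(flags)
    return env


def run_with_debug_env(script):
    """Run `script` (a hypothetical training script path) in a child process."""
    return subprocess.call([sys.executable, script], env=debug_env())
```

Running the model in a child process is only one option; exporting the same two variables in the shell before starting Python should have the same effect. Synchronous launches slow things down, so this is meant for debugging runs only.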

Fred

--

---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sonam Singh

Mar 11, 2016, 12:24:45 AM
to theano...@googlegroups.com
I guess it was a driver issue. I tweaked things here and there and wasn't able to reproduce it after a few days.

Thanks
-Sonam


Aritz Bilbao

Sep 18, 2016, 4:58:58 PM
to theano-users
How did you solve the problem? What caused it?

Thanks!