I have an RNN encoder-decoder model that runs fine on CPU but throws the error below on GPU.
Note: I can run a similar model (an RNN encoder-decoder with slight cost-function changes, on different data) without any errors, so the drivers and the rest of the setup seem fine.
Let me know if a trace with exception_verbosity=high is needed.
Error when tring to find the memory information on the GPU: unspecified launch failure
Error freeing device pointer 0x130040c600 (unspecified launch failure). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.
CudaNdarray_uninit: error freeing dev_structure memory 0x130040c600 (self=0x7f3433c3bab0)
reraise(exc_type, exc_value, exc_trace)
File "/home/ms/ssingh/anaconda2/lib/python2.7/site-packages/Theano-0.8.0.dev0-py2.7.egg/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
File "/home/ms/ssingh/anaconda2/lib/python2.7/site-packages/Theano-0.8.0.dev0-py2.7.egg/theano/scan_module/scan_op.py", line 963, in rval
r = p(n, [x[0] for x in i], o)
File "/home/ms/ssingh/anaconda2/lib/python2.7/site-packages/Theano-0.8.0.dev0-py2.7.egg/theano/scan_module/scan_op.py", line 952, in <lambda>
self, node)
File "theano/scan_module/scan_perform.pyx", line 505, in theano.scan_module.scan_perform.perform (/home/ms/ssingh/.theano/compiledir_Linux-2.6-el6.x86_64-x86_64-with-redhat-6.5-Santiago-x86_64-2.7.10-64/scan_perform/mod.cpp:5551)
RuntimeError: CudaNdarray.__setitem__: syncing structure to device failed
Apply node that caused the error: forall_inplace,gpu,grad_of_scan_fn}(TensorConstant{10}, GpuDimShuffle{0,2,1}.0, GpuDimShuffle{0,2,1}.0, GpuElemwise{Composite{(i0 - sqr(i1))},no_inplace}.0, GpuSubtensor{::int64}.0, GpuAlloc{memset_0=True}.0, GpuAlloc{memset_0=True}.0, GpuAlloc{memset_0=True}.0, GpuDimShuffle{1,0}.0)
Toposort index: 247
Inputs types: [TensorType(int64, scalar), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, 3D), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(), (10, 2048, 4), (10, 10000, 4), (10, 4, 2048), (11, 4, 2048), (2, 10000, 2048), (2, 2048, 2048), (2, 2048), (2048, 2048)]
Inputs strides: [(), (-8192, 1, 2048), (-10000, 1, 100000), (8192, 2048, 1), (-8192, 2048, 1), (20480000, 2048, 1), (4194304, 2048, 1), (2048, 1), (1, 2048)]
Inputs values: [array(10), 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[], [GpuSubtensor{int64}(forall_inplace,gpu,grad_of_scan_fn}.1, Constant{1})], [GpuSubtensor{int64}(forall_inplace,gpu,grad_of_scan_fn}.2, Constant{1})], [GpuSubtensor{int64}(forall_inplace,gpu,grad_of_scan_fn}.3, Constant{1})]]
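For reference, a minimal sketch of how the exception_verbosity=high trace mentioned above could be captured. It only prepares the environment for a rerun and does not import Theano. The script name `train.py` and the helper names `debug_env` and `rerun` are placeholders, not part of the actual setup. Setting `CUDA_LAUNCH_BLOCKING=1` is an extra assumption on my side: it makes CUDA kernel launches synchronous, which may help an "unspecified launch failure" surface at the call that triggered it.

```python
import os
import subprocess
import sys


def debug_env(base_env=None):
    """Return a copy of the environment with extra debugging settings."""
    env = dict(os.environ if base_env is None else base_env)
    # Keep any THEANO_FLAGS already set, but replace an existing
    # exception_verbosity entry instead of duplicating it.
    flags = [
        f for f in env.get("THEANO_FLAGS", "").split(",")
        if f and not f.startswith("exception_verbosity=")
    ]
    flags.append("exception_verbosity=high")
    env["THEANO_FLAGS"] = ",".join(flags)
    # Assumption: synchronous kernel launches make the failing call
    # easier to locate; this slows the run down considerably.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun(script="train.py"):  # hypothetical script name
    """Rerun the training script with the debugging environment."""
    return subprocess.call([sys.executable, script], env=debug_env())
```

Calling `rerun()` with the real script path would reproduce the failure with the more detailed trace; flags such as the device setting are inherited from the existing environment.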