Error Means GPU Out of Memory?

6,430 views

John Coolidge

16.9.2014, 21:41:34
to theano...@googlegroups.com
I have an error in my Theano code that only surfaces when I run it on a GPU (error output at end).  I've run the code on my laptop's GeForce GTX 870M and on Amazon EC2's GRID K520, and both show the error.  The symptoms of the error are a little strange, but I think they generally point to low memory:

1.) I use lookup tables and if I get rid of the largest lookup table and replace it with a smaller one the error disappears.  Removing one of the smaller lookup tables without replacement also fixes the error.

2.) If I decrease the batch size, the error goes away

3.) I've found 3 samples in the training dataset that are correlated with the error.  If I replace them with a different sample the error goes away, yet if I just delete them outright the error persists. There's nothing immediately obviously wrong with these samples, and they seem to be preprocessed correctly.

4.) If I train the network using SGD with rprop as opposed to vanilla SGD, the error goes away.  This is surprising because the rprop implementation should use more memory as it requires additional shared variables.

5.) If at random I remove enough samples (> ~100) from the training dataset, the error goes away.


Here's the console output on EC2:
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0x7023a4c00 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0x702a02200 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0xcd7fe30, self->devata=0x702a02200)
Traceback (most recent call last):
  File "train_and_save_action_type.py", line 507, in <module>
    main()
  File "train_and_save_action_type.py", line 454, in main
    test_score, val_score = ol.optimize_sgd(max_n_epochs=250, batch_size=1000,l1_reg=0,l2_reg=0)
  File "/home/ec2-user/cc_service/at_train/../theano_wrapper.py", line 642, in optimize_sgd
    train_model(minibatch_index)
  File "/usr/lib/python2.7/site-packages/theano/compile/function_module.py", line 588, in __call__
    self.fn.thunks[self.fn.position_of_error])
  File "/usr/lib/python2.7/site-packages/theano/compile/function_module.py", line 579, in __call__
    outputs = self.fn()
RuntimeError: Cuda error: GpuElemwise node_0b37b87ae450f460e9c32025b91e4cd5_0 Composite: an illegal memory access was encountered.
    n_blocks=30 threads_per_block=256
   Call: kernel_Composite_node_0b37b87ae450f460e9c32025b91e4cd5_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, i0_data, i1_data, i2_data, i3_data, i4_data, o0_data)

Apply node that caused the error: GpuElemwise{Composite{[Composite{[Composite{[sub(i0, mul(i1, i2))]}(i0, i1, add(i2, i3))]}(i0, i1, add(i2, i3), i4)]}}[(0, 0)](InputConvolution2D1DMaxComboLayer.lookup_table0#opt#ser, CudaNdarrayConstant{[[ -6.26268447e-05]]}, GpuAdvancedIncSubtensor1_dev20{no_inplace,inc}.0, GpuAdvancedIncSubtensor1_dev20{no_inplace,inc}.0, GpuAdvancedIncSubtensor1_dev20{inplace,inc}.0)
Inputs shapes: [(16789, 15), (1, 1), (16789, 15), (16789, 15), (16789, 15)]
Inputs strides: [(15, 1), (0, 0), (15, 1), (15, 1), (15, 1)]
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, (True, True)), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Use the Theano flag 'exception_verbosity=high' for a debugprint of this apply node.

Does this seem like a problem caused by low GPU memory?  Perhaps on the GPU it's trying to allocate memory but can't, and then tries to access the returned invalid pointer, and that creates the illegal memory access error? Also, I should add that I have the latest stable version of Theano (installed via pip).

Thoughts?

Thanks!

Sander Dieleman

17.9.2014, 6:02:25
to theano...@googlegroups.com
What version of CUDA are you using? Afaik there was a bug in CUDA 5.0 through 6.0 that could lead to illegal memory access errors, and it affected the new GpuCorrMM implementation. It is fixed in CUDA 6.5. Maybe this is the same bug? I don't know if that's possible, but the symptoms look similar.

Sander

Frédéric Bastien

17.9.2014, 8:13:11
to theano-users
The first error is:

RuntimeError: Cuda error: GpuElemwise node_0b37b87ae450f460e9c32025b91e4cd5_0 Composite: an illegal memory access was encountered.

The following errors are due to the driver being in a bad state.

GPU execution is asynchronous, so the error is usually raised at a different place than where it was generated. To force synchronous execution (slower, but the error appears at the right place), set this environment variable:

CUDA_LAUNCH_BLOCKING=1

Try this and give us the new error.

Fred

--

---
You received this message because you are subscribed to the Google Groups "theano-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to theano-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Coolidge

17.9.2014, 17:33:00
to theano...@googlegroups.com
Thanks for getting back to me.

I updated CUDA to 6.5 and still get the error.  I also set CUDA_LAUNCH_BLOCKING=1, but the output did not change.  Did I do it correctly? On the command line I entered: CUDA_LAUNCH_BLOCKING=1 THEANO_FLAGS=device=gpu,floatX=float32,cuda.root=/usr/local/cuda-6.5/ python train_and_save_action_type.py

Any ideas?

John Coolidge

17.9.2014, 17:38:01
to theano...@googlegroups.com
I guess I should add that when I said I have the latest version of Theano, I meant the latest stable version (0.6.0).

Frédéric Bastien

17.9.2014, 20:03:40
to theano-users
The command you gave seems good.

Try the development version. It is very stable; I recommend it to everybody instead of the latest release:

http://deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions

There are many speedups and other fixes.

Fred

John Coolidge

18.9.2014, 17:05:29
to theano...@googlegroups.com
Thanks Fred, that fixed things!  Whew, that's a relief.

Albert Zeyer

30.1.2015, 10:00:20
to theano...@googlegroups.com
Hi there,

Sorry to resurrect this thread, but I get the same error and I have the latest Theano version from Git and CUDA 6.5.

It is quite non-deterministic for me. I do Neural Network training and in about half the cases, it runs through without a problem. In the other cases, it also runs for quite a while (> 30 mins or so), i.e. it does many successful iterations / mini-batches but then I get the error.


That is with a GeForce GTX 680, exclusively used by my process, on Ubuntu 12.04.

It's a bit hard to debug because it is so non-deterministic and it never happens right at the start but always only after a while.

Any thoughts?

I have multiple Python threads running. Could that be in any way related?

Thanks, Kind Regards,
Albert

Frédéric Bastien

30.1.2015, 10:14:02
to theano-users
On Fri, Jan 30, 2015 at 10:00 AM, Albert Zeyer <alb...@googlemail.com> wrote:
> Hi there,
>
> Sorry to resurrect this thread, but I get the same error and I have the latest Theano version from Git and CUDA 6.5.
>
> It is quite non-deterministic for me. I do Neural Network training and in about half the cases, it runs through without a problem. In the other cases, it also runs for quite a while (> 30 mins or so), i.e. it does many successful iterations / mini-batches but then I get the error.
>
> That is with a GeForce GTX 680, exclusively used by my process, on Ubuntu 12.04.

Do you use it for the monitor too? How are you sure no other process is using it? If one is, it can change the memory allocation pattern and let you hit some strange corner-case bug.

Can you run, as was suggested earlier in this thread, with CUDA_LAUNCH_BLOCKING=1 as an environment variable?

Can you also run the code with cuda-memcheck?

This can help find the cause. If you get a similar error with small variations, post it here; it can give us more information.

Are you sure the power supply is strong enough for the computer? If not, this can cause very strange errors.

Can you try to set your GPU in exclusive mode to be sure nothing else runs on it?
 

> It's a bit hard to debug because it is so non-deterministic and it never happens right at the start but always only after a while.
>
> Any thoughts?
>
> I have multiple Python threads running. Could that be in any way related?

Are you sure they are Python threads and not Python processes? Is there only one thread that uses Theano? Do you still have the problem if you use only one thread?

Fred

Albert Zeyer

30.1.2015, 10:24:02
to theano...@googlegroups.com
Thanks for the answer.

>> That is with a GeForce GTX 680, exclusively used by my process, on Ubuntu
>> 12.04.
>
>
> Do you use it for the monitor too? How are you sure no other process is using
> it? If one is, it can change the memory allocation pattern and
> let you hit some strange corner-case bug.

It's used in exclusive mode.


> Can you run as was told in this thread with CUDA_LAUNCH_BLOCKING=1 as an
> environment variable?
>
> Can you also run the code with cuda-memcheck?

Will do and report back.


> This can help find the cause. If you get a similar error with small
> variations, post it here; it can give us more information.

Error when trying to find the memory information on the GPU: an illegal memory access was encountered
Error allocating 36 bytes of device memory (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
Error when trying to find the memory information on the GPU: an illegal memory access was encountered
Error allocating 36 bytes of device memory (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xb069e0000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x1583a070, self->devata=0xb069e0000)
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xb072c0000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing dev_structure memory 0xb072c0000 (self=0x1583a070)

...
File "/u/zeyer/py-envs/py2-theano/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
    line: outputs = self.fn()
    locals:
      outputs = <not found>
      self = <local> <theano.compile.function_module.Function object at 0x148c8cd0>
      self.fn = <local> <theano.gof.vm.CVM object at 0x15890d00>
File "/u/zeyer/py-envs/py2-theano/local/lib/python2.7/site-packages/theano/gof/op.py", line 748, in rval
    line: r = p(n, [x[0] for x in i], o)
    locals:
      r = <not found>
      p = <local> <bound method GpuFlatten.perform of <theano.sandbox.cuda.basic_ops.GpuFlatten object at 0x2acfd54f53d0>>
      n = <local> GpuFlatten{2}(GpuElemwise{Add}[(0, 1)].0)
      x = <local> [<CudaNdarray object at 0x1583a030>], _[0]: {len = 4501, _[0]: {len = 512, _[0]: {len = 1}}}
      i = <local> [[<CudaNdarray object at 0x1583a030>]], _[0]: {_[0]: {len = 4501, _[0]: {len = 512}}}
      o = <local> [[None]]
File "/u/zeyer/py-envs/py2-theano/local/lib/python2.7/site-packages/theano/tensor/basic.py", line 3990, in perform
    line: out[0] = x.reshape(newshape)
    locals:
      out = <local> [None]
      x = <local> <CudaNdarray object at 0x1583a030>, len = 4501, _[0]: {len = 512, _[0]: {len = 1, _[0]: {len = 0}}}
      x.reshape = <local> <built-in method reshape of CudaNdarray object at 0x1583a030>
      newshape = <local> (4501, 512)
RuntimeError: Cuda error: k_elemwise_unary_rowmajor_copy: an illegal memory access was encountered. (n_blocks=4096, n_threads_per_block=256)

Apply node that caused the error: GpuFlatten{2}(GpuElemwise{Add}[(0, 1)].0)
Inputs types: [CudaNdarrayType(float32, 3D)]
Inputs shapes: [(4501, 512, 1)]
Inputs strides: [(1, 4501, 0)]
Inputs values: ['not shown']


> Are you sure the power supply is strong enough for the computer? If not this
> can cause very strange errors.

I guess so. How can I be sure?

Albert Zeyer

30.1.2015, 10:27:49
to theano...@googlegroups.com

> Are you sure they are Python threads and not Python processes? Is there only one thread that uses Theano? Do you still have the problem if you use only one thread?


Multiple threads. Other processes would not have access because of the exclusive mode.

I'm not 100% sure that only one thread will access any Theano functions, although all the training etc will run in 1 thread.

I cannot really avoid having multiple threads in my application. So if that could be the cause for such problems, how can I solve them with multiple threads?


 

Frédéric Bastien

30.1.2015, 10:56:23
to theano-users
Compiled Theano functions aren't thread-safe. This means that if you have two threads using the same Theano function at the same time, you will have problems: wrong answers and/or crashes.

If multiple threads use the same Theano function, you should put a lock around the Theano function calls to prevent that. Alternatively, each thread can have its own copy of the Theano function. For now, the easiest way to do that is to compile it multiple times. I think pickling/unpickling it would also work with a recent Theano version (development version); I didn't test it.
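The locking option Fred describes can be sketched as follows. This is a minimal illustration, not Theano code: `train_model` here is a hypothetical stand-in for a compiled Theano function, since the point is only that every call goes through one lock.

```python
import threading

# Hypothetical stand-in for a compiled Theano function. Compiled
# functions are not thread-safe, so all calls are routed through
# a single lock below.
def train_model(x):
    return x * 2

theano_lock = threading.Lock()
results = []

def worker(value):
    # Only one thread at a time may run the (stand-in) Theano function.
    with theano_lock:
        results.append(train_model(value))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # → [0, 2, 4, 6]
```

The alternative (one compiled function per thread) trades memory for concurrency; the lock is the simpler fix when calls are cheap relative to the rest of the loop.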

Fred

--

Albert Zeyer

30.1.2015, 11:04:58
to theano...@googlegroups.com


On Friday, January 30, 2015 at 4:56:23 PM UTC+1, nouiz wrote:
> Compiled Theano functions aren't thread-safe. This means that if you have two threads using the same Theano function at the same time, you will have problems: wrong answers and/or crashes.
>
> If multiple threads use the same Theano function, you should put a lock around the Theano function calls to prevent that. Alternatively, each thread can have its own copy of the Theano function. For now, the easiest way to do that is to compile it multiple times. I think pickling/unpickling it would also work with a recent Theano version (development version); I didn't test it.


Ok, that is not the case. I only have one theano.function and it is only ever used by one thread at a time.

Another thing that came to my mind (I'm not sure about this):

In the same thread where I call the theano.function, before that, I do:

some_shared_var.set_value(some_data, borrow=True)

where some_shared_var = theano.shared(..., borrow=True)
and some_data is some Numpy array (on the CPU).

some_data is originally coming from another thread, although I don't access it in that other thread anymore.

I'm not sure: is that set_value blocking, and if not, does it keep its own reference to some_data?

Another thing: All recent errors were always with the same reshape:

    line: out[0] = x.reshape(newshape)
    locals:
      out = <local> [None]
      x = <local> <CudaNdarray object at 0x1583a030>, len = 4501, _[0]: {len = 512, _[0]: {len = 1, _[0]: {len = 0}}}
      x.reshape = <local> <built-in method reshape of CudaNdarray object at 0x1583a030>
      newshape = <local> (4501, 512)
RuntimeError: Cuda error: k_elemwise_unary_rowmajor_copy: an illegal memory access was encountered. (n_blocks=4096, n_threads_per_block=256)
Apply node that caused the error: GpuFlatten{2}(GpuElemwise{Add}[(0, 1)].0)


Frédéric Bastien

30.1.2015, 11:24:40
to theano-users
set_value with an ndarray, where the shared value lives on the GPU, will always trigger a transfer to the GPU. This is synchronous, so it should not cause problems.

I do not have more ideas. Did the run with cuda-memcheck return anything? From memory, it can be much slower.

Fred

Albert Zeyer

7.2.2015, 10:03:06
to theano...@googlegroups.com
I am using CUDA_LAUNCH_BLOCKING=1 now, and it seems I get the error much more rarely (only 1 out of 20 runs, where each run does about 10k theano.function calls). Although I'm not sure whether that is related to CUDA_LAUNCH_BLOCKING or just coincidence.

I also tried to use cuda-memcheck, but I have some trouble getting it running. I get the cuda-memcheck error 'Could not start the application (14)'. Some more details: http://stackoverflow.com/questions/28328813/cuda-memcheck-could-not-start-the-application-14

Another colleague who uses the same SGE cluster now suddenly gets a lot of similar 'illegal memory access' errors from Theano. He can reproduce the error in about 95% of cases.

We are both starting to think that this is somehow related to Theano (something in ndarray), multithreading and/or multiprocessing. He uses a multiprocessing.Process to do some non-CUDA/Theano-related background task. Could it be that os.fork() is problematic for Theano/CUDA? He doesn't call any Theano-related stuff in the subprocess; however, maybe some Python GC cleanup triggers some Theano cleanup in the subprocess, and this somehow messes up the CUDA driver state.

In my setting, I don't use multiprocessing, but I do use multiple threads, which also should not interfere with the single Theano thread; however, you never really know with all the implicit Python GC actions.

Pascal Lamblin

9.2.2015, 9:45:05
to theano...@googlegroups.com
On Sat, Feb 07, 2015, Albert Zeyer wrote:
> We are both starting to think that this is somehow related to Theano
> (something in ndarray), multithreading and/or multiprocessing. He uses a
> multiprocessing.Process to do some non-CUDA/Theano related background task.
> Could it be that os.fork() is problematic for Theano/CUDA? He doesn't call
> any Theano related stuff in the subprocess, however, maybe some Python GC
> cleanup triggers some Theano cleanup stuff in the subprocess and this
> somehow messes up the CUDA driver state.

os.fork() is problematic for CUDA if you fork a process where a CUDA context
has already been initialized, even if you do not use the context in the
forked process.
If the fork happens before the initialization of CUDA, it should be OK.


--
Pascal

Frédéric Bastien

9.2.2015, 10:07:03
to theano-users
This page gives some info about multi-process and GPUs:

https://github.com/Theano/Theano/wiki/Using-Multiple-GPUs

in case you want more details.

Fred

Colin Raffel

9.3.2015, 11:15:20
to theano...@googlegroups.com
Just wanted to chime in to say I am also getting this error, using the latest Theano from GitHub and CUDA 6.5.  The only place I've seen it happen is when I've been using hyperopt, which shouldn't be relevant.  The experiment takes a long time to run (on the order of days), so it's frustrating when this happens and crashes it.  The error is:
Error freeing device pointer 0x1306009800 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x171966f0, self->devata=0x1306009800)
Error when trying to find the memory information on the GPU: an illegal memory access was encountered
...
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "/usr/local/lib/python2.7/dist-packages/Theano-0.6.0-py2.7.egg/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
MemoryError: Error allocating 563200 bytes of device memory (an illegal memory access was encountered).

Aloïs

11.3.2015, 4:55:04
to theano...@googlegroups.com

With the flag optimizer_including=conv_meta, with some CNN code based on Lasagne, and when I use a lot of filters per layer, I get the same error. I'm not out of memory on the GPU (only 30% of its RAM is used).

The error goes away when I lower the number of filters, or when I remove the flag optimizer_including=conv_meta.

I have a full page of errors; ask me if needed.

The error pops up when I compile the finetuning function (the model is already built).

I'm using a K40, but the error also appears with another GPU.

Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xbc5000000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total 
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x7fca69ad6c30, self->devata=0xbc5000000)

Error when trying to find the memory information on the GPU: an illegal memory access was encountered
Error allocating 30515200 bytes of device memory (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total 
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xbc3280000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total 
CudaNdarray_uninit: error freeing self->devdata. (self=0x7fca69ad6df0, self->devata=0xbc3280000)
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xbbf7a0000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total 
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x7fca69ad6bb0, self->devata=0xbbf7a0000)
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xb047c9800 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total 
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing dev_structure memory 0xb047c9800 (self=0x7fca69ad6bb0)
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xbbda40000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total 
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x7fca6a1aef70, self->devata=0xbbda40000)
ERROR (theano.gof.opt): Optimization failure due to: LocalOptGroup(,local_conv_dnn,local_conv_gemm)
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 1493, in process_node
replacements = lopt.transform(node)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 991, in transform
repl = opt.transform(node)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/opt.py", line 881, in transform
for opt in self.optimizers:
MemoryError: error freeing device pointer 0xbc3280000 (an illegal memory access was encountered)

Albert Zeyer

11.3.2015, 5:13:37
to theano...@googlegroups.com
I also still have this error from time to time.

Up to now, I'm pretty confident that it is related to multithreading. Multiprocessing (multiprocessing.Process or os.fork()) can also cause similar errors under some circumstances. If you use exclusive mode, it will probably be clearer whether multiprocessing is the problem in your case, so you can maybe rule that out. Multithreading, however, is not so clear.

Theano functions are not thread-safe, but that is not a problem for me because I only use them in one thread. Still, I think other threads might cause these problems: maybe Python's GC frees some CudaNdarray in another thread while the theano.function is running.

I looked a bit at the relevant Theano code and am not sure it covers all such cases:
https://github.com/Theano/Theano/blob/master/theano/sandbox/cuda/cuda_ndarray.cu

Note that you might not even be aware that you have background threads. Some Python stdlib code can spawn such threads, e.g. multiprocessing.Queue will do that.

Btw, can some of you vote to undelete the relevant SO question?
http://stackoverflow.com/questions/28221191

Pascal Lamblin

11.3.2015, 14:25:01
to theano...@googlegroups.com
On Wed, Mar 11, 2015, Albert Zeyer wrote:
> Up to now, I'm pretty confident that it is related to multithreading.
> Also, multiprocessing (multiprocessing.Process or os.fork()) can cause
> similar errors under some circumstances. In case you use exclusive
> mode, it will probably be more clear if multiprocessing is a problem
> in your case - so you can maybe rule that case out. However,
> multithreading is not so clear.

I think it was established some time ago that spawning new threads in Python after the CUDA context is initialized could lead to such problems, as CUDA does not support that kind of thing. The symptom would be double "free"s on memory allocated before the thread was created (as each thread would try to free it), which would then put the GPU in an inconsistent state.

I think you could create a bunch of threads before theano is imported, and then either:
- only import theano in one thread, or
- set device to cpu, and then call theano.sandbox.cuda.use(...) in only the thread that will access the GPU.
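The second option above can be sketched as an ordering of events. This is only an illustration: the worker body is hypothetical and the theano calls are commented out because they need a GPU machine, but the thread layout is the point.

```python
import threading

started = threading.Event()

def gpu_worker():
    # With THEANO_FLAGS=device=cpu set for the process, this would be
    # the only thread to bind the GPU, e.g.:
    # import theano.sandbox.cuda
    # theano.sandbox.cuda.use("gpu0")
    started.set()

# Create all threads first, before theano (and hence CUDA) is touched...
worker = threading.Thread(target=gpu_worker)
worker.start()
worker.join()

# ...so the main thread never initializes the GPU itself.
print(started.is_set())  # → True
```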

--
Pascal

Albert Zeyer

27.3.2015, 7:30:18
to theano...@googlegroups.com
For anyone who has this problem: I cannot avoid having multiple threads, so until this is fixed in Theano, I create a new subprocess with a single thread where I do all the Theano work. This also has several advantages, such as a cleaner separation of the code, being faster in some cases because it all really runs in parallel, and being able to use multiple GPUs.

Note that just using the multiprocessing module did not work that well for me, because a few libs (Numpy and others, and maybe Theano itself) can behave badly in a forked process (depending on the versions, the OS, and race conditions). Thus, I needed a real subprocess (fork + exec, not just fork).

My code is here, in case anyone is interested:
https://gist.github.com/albertz/4177e40d41cb7f9f7c68

There is ExecingProcess, which is modeled after multiprocessing.Process but does a fork+exec. (Btw, on Windows, the multiprocessing module does this anyway, because there is no fork on Windows.) And there is AsyncTask, which adds a duplex pipe to this and works with both ExecingProcess and the standard multiprocessing.Process.
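The fork+exec idea can also be sketched with the standard subprocess module, which does a fork followed by exec, so the child starts from a fresh interpreter and inherits no CUDA context or threads from the parent. The child command here is a stand-in for the real Theano worker:

```python
import subprocess
import sys

# subprocess.Popen/check_output does fork+exec: the child is a
# brand-new interpreter with no inherited CUDA state.
child_code = "print('theano worker would run here')"
output = subprocess.check_output([sys.executable, "-c", child_code])
print(output.decode().strip())  # → theano worker would run here
```

Albert's ExecingProcess adds a pipe on top of this so parent and child can exchange work, which plain check_output does not give you.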

Frédéric Bastien

27.3.2015, 10:25:25
to theano-users
Thanks for sharing that.

Do you want to add a link with some description/explanation of why/when this is useful on that page:

https://github.com/Theano/Theano/wiki/Using-Multiple-GPUs

That way people will find it more frequently.

Does this fix all the crashes you had?

Also, I do not know why, but it has happened to me a few times that the Python gc didn't do all the cleanup (and we rely on it to free some of the GPU objects). So calling

import gc
gc.collect()

a few times helped in some cases.

Fred

Albert Zeyer

7.4.2015, 7:52:13
to theano...@googlegroups.com


On Friday, March 27, 2015 at 3:25:25 PM UTC+1, nouiz wrote:
> Thanks for sharing that.
>
> Do you want to add a link with some description/explanation of why/when this is useful on that page:
>
> https://github.com/Theano/Theano/wiki/Using-Multiple-GPUs
>
> That way people will find it more frequently.

Did that now.
 
> Does this fix all the crashes you had?

So far, yes. Not a single crash anymore. And I already have done quite a lot of experiments since then.
 

Samuel Leeman-Munk

8.4.2015, 8:55:03
to theano...@googlegroups.com
I'm running on a single thread. I've even tried disabling OpenMP, and my GPU still returns this error.

Error when tring to find the memory information on the GPU: unspecified launch failure
Error freeing device pointer 0xf025c5800 (unspecified launch failure). Driver report 0 bytes free and 0 bytes total 
Error when tring to find the memory information on the GPU: unspecified launch failure
Error freeing device pointer 0xf02b7da00 (unspecified launch failure). Driver report 0 bytes free and 0 bytes total 
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x136e28070, self->devata=0xf02b7da00)
Error when trying to find the memory information on the GPU: unspecified launch failure
Error allocating 276560 bytes of device memory (unspecified launch failure). Driver report 0 bytes free and 0 bytes total 

Sam

Samuel Leeman-Munk

unread,
8.4.2015 at 08:56:31
to theano...@googlegroups.com
I've also installed a second GPU to run the display on my computer, so my system uses my high-performance GPU exclusively for processing.

Samuel Leeman-Munk

unread,
8.4.2015 at 14:34:36
to theano...@googlegroups.com
I just finished carefully making sure that my system was using the latest version of Theano, the latest version of CUDA, the latest version of my card's driver, and the proper host compiler for this version of nvcc.
I'm still getting the issue.

Pascal Lamblin

unread,
8.4.2015 at 15:18:23
to theano...@googlegroups.com
On Wed, Apr 08, 2015, Samuel Leeman-Munk wrote:
> I just finished carefully making sure that my system was using the latest
> version of Theano, the latest version of CUDA, the latest version of my
> card's driver, and the proper host compiler for this version of nvcc.
> I'm still getting the issue.

Just to make sure: can you compile and run example programs that are
distributed with the CUDA SDK?
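For example, on Linux with CUDA 7.0 something like the following should work (the helper script name and paths are assumptions based on a default toolkit install):

```shell
# Copy the samples to a writable location, then build and run deviceQuery.
# On a healthy setup, deviceQuery ends with "Result = PASS".
cuda-install-samples-7.0.sh ~/cuda-samples
cd ~/cuda-samples/NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery
```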

Cheng Guo

unread,
6.6.2015 at 07:01:04
to theano...@googlegroups.com
I have the same problem. I have a GTX 980, and Theano, CUDA and cuDNN are up to date. Here is the error message with cuda-memcheck:

========= CUDA-MEMCHECK
Using gpu device 0: GeForce GTX 980
Loading the dataset...
Loaded 8155 training examples, 1019 valid examples and 1020 test examples
num_valid_chunks=0
... building the model
W_bound= 0.00690197037968
========= Program hit cudaErrorInitializationError (error 3) due to "initialization error" on CUDA API call to cudaThreadSynchronize. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
=========     Host Frame:/usr/local/cuda-7.0/lib64/libcudart.so.7.0 (cudaThreadSynchronize + 0x166) [0x2f8a6]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so (_Z11device_freePv + 0x40) [0x68f0]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so [0x6bdf]
=========     Host Frame:python3 [0x31af4]
=========     Host Frame:python3 [0x31980]
=========     Host Frame:python3 [0x6bce9]
=========     Host Frame:python3 [0x31980]
=========     Host Frame:python3 [0x5a0aa]
=========     Host Frame:python3 [0x33a42]
=========     Host Frame:python3 [0x1c95b9]
=========     Host Frame:python3 (_PyObject_GC_NewVar + 0xd1) [0x6a1f1]
=========     Host Frame:python3 (PyTuple_New + 0x65) [0x6a295]
=========     Host Frame:python3 [0x1aa38c]
=========     Host Frame:python3 [0x1aaa1a]
=========     Host Frame:python3 [0x1aa3e2]
=========     Host Frame:python3 [0x1aaa1a]
=========     Host Frame:python3 [0x11616a]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x786c) [0x17c7bc]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x705a) [0x17bfaa]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x17df80]
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========     Host Frame:python3 (_PyObject_CallMethodIdObjArgs + 0xeb) [0x10c06b]
=========     Host Frame:python3 (PyImport_ImportModuleLevelObject + 0x771) [0xbbda1]
=========     Host Frame:python3 [0x1fa00b]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x6a3a) [0x17b98a]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x705a) [0x17bfaa]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x17df80]
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========     Host Frame:python3 (_PyObject_CallMethodIdObjArgs + 0xeb) [0x10c06b]
=========     Host Frame:python3 (PyImport_ImportModuleLevelObject + 0x632) [0xbbc62]
=========     Host Frame:python3 [0x1fa00b]
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========     Host Frame:python3 [0x367b8]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x3e99) [0x178de9]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x20ed9c]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x6a3a) [0x17b98a]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x705a) [0x17bfaa]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x17df80]
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========
========= Program hit cudaErrorInitializationError (error 3) due to "initialization error" on CUDA API call to cudaFree. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
=========     Host Frame:/usr/local/cuda-7.0/lib64/libcudart.so.7.0 (cudaFree + 0x186) [0x3afd6]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so (_Z11device_freePv + 0x50) [0x6900]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so [0x6bdf]
=========     [... remaining python3 host frames identical to the first backtrace above; elided ...]
=========
========= Program hit cudaErrorInitializationError (error 3) due to "initialization error" on CUDA API call to cudaGetLastError. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
=========     Host Frame:/usr/local/cuda-7.0/lib64/libcudart.so.7.0 (cudaGetLastError + 0x163) [0x2f083]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so (_Z11device_freePv + 0x7c) [0x692c]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so [0x6bdf]
=========     [... remaining python3 host frames identical to the first backtrace above; elided ...]
=========
========= Program hit cudaErrorInitializationError (error 3) due to "initialization error" on CUDA API call to cudaMemGetInfo. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
=========     Host Frame:/usr/local/cuda-7.0/lib64/libcudart.so.7.0 (cudaMemGetInfo + 0x1a9) [0x39429]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so (_Z11device_freePv + 0x9d) [0x694d]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so [0x6bdf]
=========     [... remaining python3 host frames identical to the first backtrace above; elided ...]
=========
========= Program hit cudaErrorInitializationError (error 3) due to "initialization error" on CUDA API call to cudaGetLastError. 
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2e40d3]
=========     Host Frame:/usr/local/cuda-7.0/lib64/libcudart.so.7.0 (cudaGetLastError + 0x163) [0x2f083]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so (_Z11device_freePv + 0x111) [0x69c1]
=========     Host Frame:/home/cheng/.theano/compiledir_Linux-3.13--generic-x86_64-with-LinuxMint-17-qiana-x86_64-3.4.0-64/cuda_ndarray/cuda_ndarray.so [0x6bdf]
=========     Host Frame:python3 [0x31af4]
=========     Host Frame:python3 [0x31980]
Error when tring to find the memory information on the GPU: initialization error
=========     Host Frame:python3 [0x6bce9]
Error freeing device pointer 0x904c08000 (initialization error). Driver report 0 bytes free and 0 bytes total 
=========     Host Frame:python3 [0x31980]
=========     Host Frame:python3 [0x5a0aa]
=========     Host Frame:python3 [0x33a42]
=========     Host Frame:python3 [0x1c95b9]
CudaNdarray_uninit: error freeing self->devdata. (self=0x7f37f7759030, self->devata=0x904c08000)
=========     Host Frame:python3 (_PyObject_GC_NewVar + 0xd1) [0x6a1f1]
=========     Host Frame:python3 (PyTuple_New + 0x65) [0x6a295]
=========     Host Frame:python3 [0x1aa38c]
=========     Host Frame:python3 [0x1aaa1a]
=========     Host Frame:python3 [0x1aa3e2]
=========     Host Frame:python3 [0x1aaa1a]
=========     Host Frame:python3 [0x11616a]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x786c) [0x17c7bc]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x705a) [0x17bfaa]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
Exception ignored in: 'garbage collection'
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x17df80]
MemoryError: error freeing device pointer 0x904c08000 (initialization error)
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========     Host Frame:python3 (_PyObject_CallMethodIdObjArgs + 0xeb) [0x10c06b]
Fatal Python error: unexpected exception during garbage collection
=========     Host Frame:python3 (PyImport_ImportModuleLevelObject + 0x771) [0xbbda1]
=========     Host Frame:python3 [0x1fa00b]

=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x6a3a) [0x17b98a]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x705a) [0x17bfaa]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x17df80]
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========     Host Frame:python3 (_PyObject_CallMethodIdObjArgs + 0xeb) [0x10c06b]
=========     Host Frame:python3 (PyImport_ImportModuleLevelObject + 0x632) [0xbbc62]
=========     Host Frame:python3 [0x1fa00b]
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========     Host Frame:python3 [0x367b8]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x3e99) [0x178de9]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x20ed9c]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x6a3a) [0x17b98a]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x705a) [0x17bfaa]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalFrameEx + 0x718b) [0x17c0db]
=========     Host Frame:python3 (PyEval_EvalCodeEx + 0x943) [0x17d3d3]
=========     Host Frame:python3 [0x17df80]
=========     Host Frame:python3 (PyObject_Call + 0x5a) [0x3810a]
=========

Current thread 0x00007f380ce14740 (most recent call first):
  File "<frozen importlib._bootstrap>", line 656 in _compile_bytecode
  File "<frozen importlib._bootstrap>", line 1547 in get_code
  File "<frozen importlib._bootstrap>", line 1444 in exec_module
  File "<frozen importlib._bootstrap>", line 1129 in _exec
  File "<frozen importlib._bootstrap>", line 1200 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 2203 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 2214 in _find_and_load
  File "<frozen importlib._bootstrap>", line 321 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 2261 in _handle_fromlist
  File "/usr/lib/python3/dist-packages/PIL/BmpImagePlugin.py", line 30 in <module>
  File "<frozen importlib._bootstrap>", line 321 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1448 in exec_module
  File "<frozen importlib._bootstrap>", line 1129 in _exec
  File "<frozen importlib._bootstrap>", line 1200 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 2203 in _find_and_load_unlocked
W_bound= 0.00835191380976
  File "<frozen importlib._bootstrap>", line 2214 in _find_and_load
  File "<frozen importlib._bootstrap>", line 321 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 2261 in _handle_fromlist
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 324 in preinit
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2002 in open
  File "/usr/lib/python3/dist-packages/scipy/ndimage/io.py", line 44 in imread
  File "/home/cheng/workplace/convnet/data_generator.py", line 25 in load_images
  File "/home/cheng/workplace/convnet/data_generator.py", line 210 in realtime_augmented_data_gen_single_thread
  File "/home/cheng/workplace/convnet/data_generator.py", line 252 in fill_buffer_process
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93 in run
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254 in _bootstrap
  File "/usr/lib/python3.4/multiprocessing/popen_fork.py", line 77 in _launch
  File "/usr/lib/python3.4/multiprocessing/popen_fork.py", line 21 in __init__
  File "/usr/lib/python3.4/multiprocessing/context.py", line 267 in _Popen
  File "/usr/lib/python3.4/multiprocessing/context.py", line 212 in _Popen
  File "/usr/lib/python3.4/multiprocessing/process.py", line 105 in start

Cheng Guo

unread,
6.6.2015 at 11:22:45
to theano...@googlegroups.com
I tried changing CNN parameters, reinstalling libs, rebooting the computer, and nothing helped. Finally I found a method that makes it work, at least in my case:

I run my program with THEANO_FLAGS=mode='DebugMode'. No error is reported. Then I run in normal mode. The error no longer shows up again.
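Concretely, the workaround amounts to something like this (train.py is a placeholder for your own entry point):

```shell
# First run: DebugMode is very slow, but it validates each op's output.
THEANO_FLAGS=mode=DebugMode python train.py
# Later runs: back to the normal, fast mode.
THEANO_FLAGS=mode=FAST_RUN python train.py
```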
...

Frédéric Bastien

unread,
11.6.2015 at 17:56:49
to theano-users

If running it once under DebugMode fixes the problem for subsequent runs, that is very strange.

I wouldn't be surprised if there is a hardware problem, like a bad PSU, especially if you added a second GPU.

Fred

--

Zichao Yang

unread,
29.6.2015 at 03:04:57
to theano...@googlegroups.com
May I ask if this bug was solved? I have met with similar problems. I have read through all the previous posts and haven't found a solution yet...

Here is the error message :

Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xb047d3800 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
Error when tring to find the memory information on the GPU: an illegal memory access was encountered
Error freeing device pointer 0xb15e1e000 (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
device_free: cudaFree() returned an error, but there is already an Python error set. This happen during the clean up when there is a first error and the CUDA driver is in a so bad state that it don't work anymore. We keep the previous error set to help debugging it.CudaNdarray_uninit: error freeing self->devdata. (self=0x20e35230, self->devata=0xb15e1e000)
Error when trying to find the memory information on the GPU: an illegal memory access was encountered
Error allocating 16777216 bytes of device memory (an illegal memory access was encountered). Driver report 0 bytes free and 0 bytes total
Traceback (most recent call last):
  File "image_qa.py", line 141, in <module>
    label_idx, label_mask, label)
  File "/home/zichaoy/local/lib/python2.6/site-packages/Theano-0.7.0-py2.6.egg/theano/compile/function_module.py", line 608, in __call__
    storage_map=self.fn.storage_map)
  File "/home/zichaoy/local/lib/python2.6/site-packages/Theano-0.7.0-py2.6.egg/theano/compile/function_module.py", line 597, in __call__
    outputs = self.fn()
MemoryError: Error allocating 16777216 bytes of device memory (an illegal memory access was encountered).
Apply node that caused the error: GpuElemwise{mul,no_inplace}(CudaNdarrayConstant{error while transferring the value: error copying data to host}, GpuDimShuffle{x,x}.0, image_hidden_w)
Inputs types: [CudaNdarrayType(float32, (True, True)), CudaNdarrayType(float32, (True, True)), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(1, 1), (1, 1), (4096, 1024)]
Inputs strides: [(0, 0), (0, 0), (1024, 1)]
Inputs values: [<CudaNdarray object at 0x17a8c2f0>, <CudaNdarray object at 0x20e40e70>, 'not shown']


I have been using theano for quite a while and have this problem only with the current project.

My case is quite simple, and no multiprocessing/threads are involved at all.

Can anyone give some suggestions on what the problem could be? Many thanks!


On Tuesday, September 16, 2014 at 6:41:34 PM UTC-7, John Coolidge wrote:
I have an error in my theano code that only surfaces when I run it on a gpu (error output at end).  I've run the code on my laptop's GeForce GTX 870M and amazon's EC2 GRID K520 and both have the error. [... remainder of the original post and its error output quoted in full; see the first message of this thread ...]

Frédéric Bastien

unread,
29.6.2015 at 10:37:51
to theano-users
Can you run with cuda-memcheck?

cuda-memcheck python ...

This could help understand what is going on.

Fred

--

Zichao Yang

unread,
29.6.2015 at 13:09:22
to theano...@googlegroups.com
The error message is long. I attached it.

Thanks,
zichao
error.txt

Frédéric Bastien

unread,
3.7.2015 at 22:35:08
to theano-users
The problem is that you have some indexing like this:

a_matrix[a_int_vector]

where the indices are out of bounds. It seems Theano does not handle that correctly (we try to read the bad data). I checked the code; we also do not support negative indices in that case, but I'm not sure we should support them there.

If you run on the CPU and the indices are really out of bounds, you will get the proper error. If you use negative indices, stop doing that for now, until it is fixed.
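The CPU behaviour described above can be sketched with plain NumPy, which bounds-checks advanced indexing, so a bad index fails loudly instead of silently reading garbage the way the GPU path did at the time:

```python
import numpy as np

# Same shape of expression as in the post: a_matrix[a_int_vector].
a_matrix = np.arange(12, dtype=np.float32).reshape(4, 3)
a_int_vector = np.array([0, 2, 7])  # 7 is out of range for axis 0 (size 4)

try:
    a_matrix[a_int_vector]
except IndexError as err:
    # On the CPU the out-of-bounds index is caught and reported cleanly.
    print("caught:", err)
```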


Thanks for the report.

Fred