I'm using HOOMD and 99% of the time (or more) it works perfectly. But once in a while I get a random CUDA error, which is not reproducible. For instance, sometimes hoomd.context.initialize fails with a CUDA error. More recently I got the following error:
RuntimeError: CUDA Error
**ERROR**: an illegal memory access was encountered before /hoomd/GPUArray.h:672

This one is apparently a memory deallocation error. It happens in the middle of a run.
These errors appear randomly (and rarely) in otherwise working code. So my questions are:
1) Do these errors necessarily indicate a bug in the code (or a compilation issue)?
2) Is there a way to catch these errors from the Python script and continue running?
For the second part, I tried catching the exception with try/except and then calling hoomd.context.initialize again, but it usually fails again.
I'm using v2.1.9, and I have written my own plugins (forces and integrators). It runs on Tesla K80 GPUs on our cluster.
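A possible workaround for question 2, offered only as a sketch: an illegal memory access is a "sticky" CUDA error. It leaves the process's CUDA context unusable, and every later CUDA call in that process returns the same error. That would explain why calling hoomd.context.initialize again inside the same script keeps failing. A retry therefore needs a fresh process. The wrapper below assumes a hypothetical simulation script (run_sim.py) that writes periodic checkpoints and resumes from the latest one when relaunched. In HOOMD 2.x, that could presumably be done with the restart argument of hoomd.init.read_gsd together with hoomd.run_upto, but check the documentation for your version. The wrapper itself should not touch the GPU. It only retries; it does not fix the underlying fault.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, delay_s=0.0):
    """Run cmd in a fresh process, retrying while it exits with a nonzero code.

    Each attempt is a new process and therefore gets a new CUDA context, so a
    sticky error (e.g. an illegal memory access) cannot carry over to the next
    attempt. Returns (attempts_used, last_returncode).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    returncode = None
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return attempt, returncode
        print("attempt %d failed with exit code %d" % (attempt, returncode),
              file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay_s)  # optional pause before relaunching
    return max_attempts, returncode
```

For example, run_with_retries([sys.executable, "run_sim.py"], max_attempts=3) relaunches the (hypothetical) script up to three times. Capping the number of attempts keeps a deterministic bug, such as an out-of-bounds access in a plugin kernel, from looping forever.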
--
You received this message because you are subscribed to the Google Groups "hoomd-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hoomd-users...@googlegroups.com.
To post to this group, send email to hoomd...@googlegroups.com.
Visit this group at https://groups.google.com/group/hoomd-users.
For more options, visit https://groups.google.com/d/optout.
Hi Kirill,
On Feb 13, 2018, at 11:43 AM, Kirill Moskovtsev <kmosk...@gmail.com> wrote:
> For instance, sometimes hoomd.context.initialize fails with a CUDA error.

If it fails in context.initialize(), this would hint at a GPU problem, not a problem with the code.

> RuntimeError: CUDA Error
> **ERROR**: an illegal memory access was encountered before /hoomd/GPUArray.h:672
> This one is apparently a memory deallocation error. It happens in the middle of a run.

We will need to track this down further. Can you run with --gpu_error_checking and give us the exact location of the failure? A run under cuda-memcheck would also give useful output.

- Jens
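A sketch of how the two debugging runs suggested above might be launched. CUDA kernel launches are asynchronous, so an error is normally reported at the next point where HOOMD checks for errors (here GPUArray.h:672), not necessarily where it happened. Presumably --gpu_error_checking makes HOOMD check after each kernel so the report lands closer to the faulty kernel, at some cost in speed. cuda-memcheck is considerably slower still, but names the kernel that made the invalid access. The helper only builds the command line. It assumes a hypothetical script my_sim.py that calls hoomd.context.initialize() with no arguments, so HOOMD 2.x reads its options from the command line. Passing the option directly, as in hoomd.context.initialize("--gpu_error_checking"), should work as well.

```python
import shlex


def debug_command(script, use_memcheck=False, python="python"):
    """Return the argv list for a debugging run of a HOOMD 2.x script.

    Assumes the script calls hoomd.context.initialize() without arguments,
    so HOOMD picks up --gpu_error_checking from the command line.
    """
    cmd = [python, script, "--gpu_error_checking"]
    if use_memcheck:
        # cuda-memcheck wraps the whole Python process and reports invalid
        # device memory accesses, including the kernel that performed them.
        cmd = ["cuda-memcheck"] + cmd
    return cmd


def as_shell(cmd):
    """Quote an argv list so it can be pasted into a shell or job script."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# my_sim.py is a hypothetical name; substitute the real simulation script.
print(as_shell(debug_command("my_sim.py", use_memcheck=True)))
# prints: cuda-memcheck python my_sim.py --gpu_error_checking
```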
Thanks,
Kirill
Hi Kirill,