Simulation Crash


Mohammad Wasfi

Jan 22, 2023, 12:01:43 PM
to ProjectChrono
Hi, 

This is a DEME-related question. 

I have been running into a problem where my simulation crashes after being normal for a while. The error I get is the following:
//////////////////////////////////////////////////
-------- Simulation crashed potentially due to too many geometries in a bin --------
Right now, the dT reported (by user specification or by calculation) max velocity is 0.133465
The contact margin thickness is 9.35108e-06
If the velocity is extremely large, then the simulation probably diverged due to encountering large particle velocities, and decreasing the step size could help.
If the velocity is fair but the margin is large compared to particle sizes, then perhaps too many contact geometries are in one bin, and decreasing the step size, update frequency or the bin size could help.
If they are both fair and you do not see "exceeding maximum allowance" reports before the crash, then it is probably not too many geometries in a bin and it crashed for other reasons.

terminate called after throwing an instance of 'std::runtime_error'
  what():  GPU Assertion: an illegal memory access was encountered. This happened in /DEM-Engine/src/algorithms/DEMCubContactDetection.cu:

////////////////////////////////////////////////////////////////

I have tried reducing the bin size to as little as 0.5 times the particle radius. I have also tried decreasing and increasing other parameters, such as the update frequency and the safety multiplier, but the simulation still crashes after running normally for a while (I have a video that I can share with you by email if you like). I also reduced the time step size considerably (to 4e-7), but that did not help. Finally, I reduced my mesh to a total of 6790 triangles, which I do not think is large. I have attached my simulation file for your reference.

In addition, I tried using a different material from one of your demos with the same time step, but I still see the same problem.

One more thing I noticed: every time I increase the CD update frequency, the simulation reports a higher value for "Average steps per dynamic update". For example, when I set the update frequency to 15, the simulation reports "Average steps per dynamic update: 16.94662", and when I increase it to 20, it reports "Average steps per dynamic update: 21.997". Is that how it is supposed to be?

Thank you in advance for your help, 




DEMdemo_ScrewDrop.cpp

Ruochun Zhang

Jan 22, 2023, 3:38:15 PM
to ProjectChrono
Hi Mohammad,

First, please make sure that you did not see something like "geometries in a bin exceeding maximum allowance of 256" in the error report. If you did not, then I'd like you to confirm that the crash happens a while into the simulation and that it occurs at different points in time across several runs.

If you can confirm both, then it's probably a good time to have this discussion... I'll ask you to try this: pull the latest repo, which contains a stability fix that I pushed today. Build that and run your script (if you work on your own branch, that is ideal, since you can just merge main into it). Please build it in a new, empty build folder; alternatively, you can remove the kernel directory in your old build directory and then build again there. Let's see whether it helps, and if it does, I can explain what happened.

About the update frequency: when you see what you described, it means you can increase the update frequency setting. If you make the bins very small, kT's workload is high, so a larger CDFreq number is needed. The reported frequency can be 1 or 2 higher than the limit you define, simply because of how I calculate it. When you see that, you know that dT waits for kT from time to time in the simulation (this can also be confirmed in the final collaboration stats, but I guess if the simulation crashes you don't get to see them), and you should increase that number to improve efficiency. None of this should be a problem once the solver gains the ability to adjust the frequency by itself.

Thank you,
Ruochun

Mohammad Wasfi

Jan 22, 2023, 8:40:03 PM
to ProjectChrono
Hi Ruochun, 

Thank you for your reply. 

I would like to confirm that I did not see a message like "geometries in a bin exceeding maximum allowance of 256" in the error report.
In addition, when I ran multiple tests, the simulation failed at a different point each time.

I have pulled the latest version of the repository, and everything seems to be working fine now.

Thank you so much, 

Ruochun Zhang

Jan 23, 2023, 5:59:52 PM
to ProjectChrono
Hi Mohammad,

If it is still working fine, then that is good, and I can explain this new fix. For an end user, however, it is not a big concern.

There is a step in the domain discretization that is implemented by running two essentially identical kernels in sequence: one to determine how much memory to allocate, and one to actually fill that memory. The code used to rely on those two kernels producing the same result, and in most cases they do. However, when triangles are involved, there is roughly a 1 in 10 billion to 1 in 100 billion chance that they do not agree, presumably because of the execution order of the floating-point arithmetic. In the end, this can leave one integer slot in a random state, and that slot is then used as an index, causing a segmentation fault. It is subtle, and I did learn something in the process of chasing it down.
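
To make the pattern concrete, here is a minimal CPU-side sketch of a count-then-fill scheme that tolerates a disagreement between the two passes. It is not the actual DEME code or the actual fix; the names `countThenFill`, `BinLister`, `BinPairs` and `kInvalidBin` are hypothetical and exist only for this illustration. The idea is to pre-fill every allocated slot with a sentinel and to clamp the fill pass to the space reserved by the count pass, so that a mismatch can never leave an uninitialized value that is later used as an index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

// Sentinel marking a slot that the fill pass never wrote (hypothetical name).
constexpr std::uint32_t kInvalidBin = std::numeric_limits<std::uint32_t>::max();

// Returns the IDs of the bins that item i touches (hypothetical interface).
using BinLister = std::function<std::vector<std::uint32_t>(std::size_t)>;

struct BinPairs {
    std::vector<std::size_t> offsets;  // item i owns bins[offsets[i] .. offsets[i+1])
    std::vector<std::uint32_t> bins;   // flattened bin IDs; kInvalidBin if unwritten
};

// countPass and fillPass are passed separately so that a disagreement between
// the two passes can be simulated; ideally they return the same result.
BinPairs countThenFill(std::size_t n, const BinLister& countPass, const BinLister& fillPass) {
    BinPairs out;
    out.offsets.assign(n + 1, 0);

    // Pass 1: count the bins per item and build the offsets (exclusive scan).
    for (std::size_t i = 0; i < n; ++i) {
        out.offsets[i + 1] = out.offsets[i] + countPass(i).size();
    }

    // Allocate with the sentinel so that no slot is ever left in a random state.
    out.bins.assign(out.offsets[n], kInvalidBin);

    // Pass 2: fill, never writing more than pass 1 reserved for this item.
    for (std::size_t i = 0; i < n; ++i) {
        const std::vector<std::uint32_t> ids = fillPass(i);
        const std::size_t reserved = out.offsets[i + 1] - out.offsets[i];
        const std::size_t m = std::min(ids.size(), reserved);
        assert(out.offsets[i] + m <= out.bins.size());
        for (std::size_t k = 0; k < m; ++k) {
            out.bins[out.offsets[i] + k] = ids[k];
        }
    }
    return out;
}
```

Any consumer of `bins` would then skip entries equal to `kInvalidBin` instead of using them as indices. An alternative design is to compute the bin list once and reuse it for both the count and the fill, which removes the possibility of disagreement at the cost of extra storage.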

Thank you,
Ruochun
