DEM-Engine - Bus error (core dumped)


Julian Reis

Mar 28, 2024, 11:55:44 AM
to ProjectChrono
Hello,

I've tried to set up a Docker container for DEM-Engine, using nvidia/cuda:12.0.1-devel-ubuntu22.04 as the base image.
I followed the compile instructions from the GitHub repo and the code compiles fine.
When I try to run any of the test cases, though, the simulation crashes with the following error:
Bus error (core dumped)
right after the following output from the demo SingleSphereCollide:
These owners are tracked: 0,
Meshes' owner--offset pairs: {1, 0}, {2, 1},
kT received a velocity update: 1

Are you aware of any problems like this?

Julian

Julian Reis

Mar 29, 2024, 10:58:18 AM
to ProjectChrono
I was able to run a simulation on a different GPU setup, using 2 GPUs. Is it not possible to run DEM-Engine on a single GPU?

Ruochun Zhang

Mar 29, 2024, 2:54:35 PM
to ProjectChrono
Hi Julian,

I suspect it's a more general problem with the Docker usage itself rather than DEME, since it appears you have problems accessing the device. I'll post ChatGPT's comments first; please click this link for them.

My own comment is that the most probable causes are the devices not being detectable from the container, or a driver/CUDA version mismatch. If you could, please try the suggested methods, and then see whether you can install and run any other simple CUDA-based utility or code with that base image at all. It's unfortunate that I don't have anything at hand to reproduce or test it, but if none of the paths above resolves the issue, we can probably find someone more experienced to help you.

Thank you,
Ruochun

Ruochun Zhang

Mar 29, 2024, 2:57:49 PM
to ProjectChrono
Just to be clear, DEM-Engine runs on a single GPU as well; there is no difference other than being (around) half as fast.

Ruochun 


Julian Reis

Mar 29, 2024, 3:38:40 PM
to ProjectChrono
Hi Ruochun,
Thank you for your answer and for trying to help me.
I have been able to run a simulation in the container, using the same image, on another GPU machine (a cluster with several NVIDIA RTX 2080 Ti w/ 12GB).
When I try to run a simulation on my local machine, which I use for development purposes and has an NVIDIA GTX 970 w/ 4GB, the simulation crashes.
I also tried to run the simulation outside of a container, and it still crashes with the same error. Other projects using CUDA do run on my local machine.
Both machines, the cluster and the local one, run the exact same CUDA and NVIDIA driver versions, so I'm assuming that running the simulation inside the Docker container is not the issue.

I'm assuming there is an issue with the compute capability of my local GPU; are there any minimum hardware requirements?

Julian

Ruochun Zhang

Mar 29, 2024, 4:23:29 PM
to ProjectChrono
Hi Julian,

I see. The minimum compute capability we tested is 6.1 (the 10 series). The jump from the 9 series to the 10 series was a big one, and DEME is a new package that makes heavy use of newer CUDA features, so most likely the GTX 970 does not support them. Quite a good reason to get an upgrade, I would say, no?

Thank you,
Ruochun

Julian Reis

May 13, 2024, 9:05:34 AM
to ProjectChrono
Hi Ruochun,
I've upgraded my hardware and now everything is working fine.

I'm trying to run a co-simulation with the DEM-Engine in which it is necessary to pass an acceleration for each particle to the simulation.
From the code, I've seen that there are two options: either adding an acceleration or using a prescribed force/acceleration.

If I read the comments in the code correctly, the added acceleration is only applied for the next time step and is not constant over the whole DoDynamics call?
From my tests it looks like the acceleration has no effect on the trajectory of my particle.
On the other hand, a prescribed acceleration can only be added during initialisation, and not between DoDynamics calls.

Is there an option to add an acceleration to a particle that affects the particle over the whole DoDynamics call?

Thank you for your help,
Julian

Ruochun Zhang

May 14, 2024, 1:03:59 PM
to ProjectChrono
Hi Julian,

Glad that you are able to move on to doing co-simulations. 

If you use a tracker to add acceleration to some owners, it affects only the next time step. This is consistent with the other tracker Set methods (such as SetPos), which technically also affect the simulation only once. It is also because setting acceleration with trackers is assumed to happen in a co-simulation, where the acceleration probably changes at each step anyway. If the acceleration modification were to persist indefinitely, it would become the user's responsibility to deactivate it once it is no longer needed. Of course, this is not necessarily the best or most intuitive design choice, and I am open to suggestions.
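In code, the intended co-simulation pattern is to re-apply the acceleration before each step, along these lines (a minimal sketch; the Track call and make_float3 follow the DEME demos, and my_clump is a placeholder for whatever clump you loaded):

// Track one clump so its state can be modified between steps.
auto my_tracker = DEMSim.Track(my_clump);
// This acceleration is consumed by the very next time step only...
my_tracker->AddAcc(make_float3(0., 0., 5.));
DEMSim.DoDynamics(step_size);  // ...so after this call it no longer applies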

The acceleration prescription can only be added before initialization because it is just-in-time compiled into the CUDA kernels for efficiency. Prescriptions are expected to stay unchanged during the simulation and, although fixed prescribed motions are very common in DEM simulations, they are perhaps not suitable for co-simulations.
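For completeness, a prescription is given as strings that are jit-compiled when Initialize() is called, roughly like this (a sketch only; the family number and velocity values are made up, see the repo demos for the exact usage):

// Family 1 gets a fixed, jit-compiled velocity prescription; it cannot be
// changed after Initialize() without recompiling the kernels.
DEMSim.SetFamilyPrescribedLinVel(1, "0", "0", "-0.1");
DEMSim.Initialize();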

If in your test case the added acceleration seems to have no effect, it is likely that it is either too small, or DoDynamics is called with a duration that is significantly larger than the time step size. If this is not the case and you suspect a bug, please provide a minimal reproducible example so I can look into it.
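To put (hypothetical) numbers on it: with a step size of 1e-6 s, an acceleration of 10 m/s^2 added through a tracker before DoDynamics(0.01) acts during the first step only, changing the velocity by 10 * 1e-6 = 1e-5 m/s over a 0.01 s call, which is practically invisible in the trajectory.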

Thank you,
Ruochun

Julian Reis

May 14, 2024, 6:27:26 PM
to ProjectChrono
Thank you for your fast reply; you've been very helpful already.

I'm using the trackers to track granular particles inside a fluid flow.
Thank you for pointing out the difference between the time step size and the duration of the DoDynamics call; I'm pretty sure that is where my error comes from.
Since we're using adaptive time stepping for the fluid simulation, the time step of the flow can vary throughout the simulation. For this reason I'm running the DoDynamics call with the time step size of the fluid simulation. Usually the time step of the flow is much smaller than the DEM time step. (*additional question towards the end)
It would be possible to add an additional time step criterion based on the DEM simulation on the fluid side, but this would probably result in unnecessarily long simulations, since we haven't fully coupled the system yet.

So when I pass the states of my particles, I want them to move according to the forces from the fluid. The problem I observed is exactly what you described: basically, I'm applying a short acceleration during the first DEM time step, but after that the particle is not accelerated by that force any further. I was able to recreate some experimental results by pre-calculating the resulting velocities from the acceleration, but this is definitely not a long-term solution...

For this particular case it would be handy if the acceleration stayed constant over the time steps of a DoDynamics call and were cleared again afterwards.
Is this something that would be easy for me to tweak in the code? Or do you maybe have an alternative suggestion for me?

* additional question: I don't know if this will ever be the case in my simulation, but what would happen if the DoDynamics duration is smaller than the DEM time step?

Thank you, Julian

Ruochun Zhang

May 15, 2024, 5:22:41 AM
to ProjectChrono
To achieve what you need, there might be an easy way with the current code. First, know that you can change the time step size by calling UpdateStepSize. You can then replace long DoDynamics calls with step-by-step calls to circumvent the problem. That is, replacing

my_tracker->AddAcc(...);
DEMSim.DoDynamics(a_long_time);

with

// Match the DEM step size to the current (adaptive) fluid step...
DEMSim.UpdateStepSize(current_stepsize);
// ...then advance one DEM step at a time, re-applying the fluid force
// before each step, so it is effectively constant over the whole duration.
for (double t = 0.; t < a_long_time; t += current_stepsize) {
    my_tracker->AddAcc(...);
    DEMSim.DoDynamics(current_stepsize);
}

You may be concerned about the performance, and indeed transferring an array to the device at each step will take its toll, but it's probably not that bad considering how heavy each DEM step is anyway (I may add a utility that applies a persistent acceleration later on). On the other hand, splitting a DoDynamics call into multiple pieces in a for loop should, by itself, affect the performance very little, so you should not be worried. This way, it should also be safe to advance the fluid simulation for several time steps and then advance the DEM simulation by one step; in fact, I do this in my own co-simulations just fine.

A note: in theory, UpdateStepSize should only be used from a synchronized solver stance, meaning after a DoDynamicsThenSync call, because the step size is used to determine how proactive the contact detection has to be. But if your step size change is a micro tweak, you should be able to get away with it even if it follows asynchronous calls, a.k.a. DoDynamics.
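In other words, the safe pattern looks like this (a sketch; calling DoDynamicsThenSync with a zero duration to force synchronization follows the demos, and new_stepsize is a placeholder for your next fluid step):

DEMSim.DoDynamicsThenSync(0.);        // sync kT and dT; no physical time advanced
DEMSim.UpdateStepSize(new_stepsize);  // now it is safe to change the step size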

As for the call duration being smaller than the step size (but larger than 0): this is a good question. Right now it would still advance the simulation by one full time step, which puts the simulation time ahead of what you would expect. So it's better to call UpdateStepSize as needed to stay safe. This behavior might be improved later.

Thank you,
Ruochun

Julian Reis

Jun 5, 2024, 10:02:58 AM
to ProjectChrono
Screenshot 2024-06-05 at 15.56.14.png
Hi Ruochun,
Thank you for that suggestion; this implementation will work for me for now.

I've been trying to run some simulations, but they keep crashing.
It is difficult to share a code snippet because I've already abstracted the DEME calls away in my own code.

My basic test setup right now looks as follows, though (see the sketch after the list):
- a cube represented as a mesh, which I'm planning to use as my boundary (I also tried the same with the BoxDomainBoundaryConditions); the normals point inwards accordingly (1.4 in each direction)
- the inside of the cube is filled with clumps: simple spheres with diameter 0.04 and a spacing of 0.1
- material properties: {"E", 1e9}, {"nu", 0.33}, {"CoR", 0.8}, {"mu", 0.3}, {"Crr", 0.00}
- time step: 1e-6
- initial bin size: 1.0
(I had problems when I was not setting the initial bin size; the simulation crashed already during initialisation. Then I saw the comment about setting the bin size to 25 * the granular radius in the DEMdemo_Mixer file, and after that the simulation kept running for a while.)
- max CDUpdateFreq: 20
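Reconstructed as DEME calls, the relevant part of the setup is roughly the following (a sketch only; the call names are taken from the DEME demos, and my own code wraps them differently):

// Material shared by the particles and the meshed box (values from the list above).
auto mat = DEMSim.LoadMaterial(
    {{"E", 1e9}, {"nu", 0.33}, {"CoR", 0.8}, {"mu", 0.3}, {"Crr", 0.0}});
DEMSim.SetInitTimeStep(1e-6);  // DEM time step size
DEMSim.SetInitBinSize(1.0);    // initial contact-detection bin size
DEMSim.SetCDUpdateFreq(20);    // max number of steps between contact detections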

Eventually the simulation crashes with the following error:
// 233 contacts were active at time 0.192906 on dT, but they are not detected on kT, therefore being removed unexpectedly!
// terminate called recursively
// terminate called after throwing an instance of 'std::runtime_error'
// GPU Assertion: an illegal memory access was encountered. This happened in /tmp/chrono-dem/src/DEM/dT.cpp:1941 (also happened at different locations in dT.cpp)

I've been trying to tweak some of the parameters but couldn't find a working set of reasonable values. Do you maybe have any suggestions on what could be wrong?
I've added a screenshot of the last output I could get from the simulation; to me it looks fine until then. I've also added my particle output file and mesh file; it's the output from my code, so it's an H5 file, if that is of any help.

Also, I don't really understand how I should set the initial bin size; can you maybe give me some insight into how this parameter affects the simulation?

Thank you for your help, so far!
Julian
phase_Granular.h5part
Housing_0.obj

Ruochun Zhang

Jun 6, 2024, 11:38:17 AM
to ProjectChrono
Hi Julian,

Let me get a couple of things out of the way first.

1. Quite often you see "illegal memory access" errors in DEME when the simulation diverges very badly. If it diverges only a little, you are more likely to see "velocity too high" or "too many spheres in bin" errors. I don't fully understand the mechanism, but it is empirically so.
2. That "233 contacts were active at time 0.192906 on dT, but they are not detected on kT, therefore being removed unexpectedly!" message is in fact a warning, and you can turn it off. But it does indicate that the simulation is already not running normally at that point.
3. You usually don't have to worry about the bin size, or explicitly set it. It is selected and adapted during the simulation (unless you turn this functionality off). A bin can hold at least 32768 spheres, and the solver should have enough time to adapt if the number of spheres per bin is rising alarmingly. So if the initial bin size does matter for how long your simulation can run, the simulation is probably escalating quickly from the start anyway, and you should worry about other things.

The information you gave allows me to make some guesses about the cause, but not much more than that, especially since the parameters you showed seem reasonable. I suspect this is caused by an invisible boundary at an unexpected location. I suggest you debug using the following procedure:
1. Remove all analytical boundaries (no automatic box domain boundaries, i.e. the "none" option, no analytical objects, etc.) and the mesh, then run the simulation; a sketch of this stripped-down configuration is given after the list. Make sure you see the particles free-fall in space without a problem.
2. Then add your meshed box back to the simulation. See if the particles make contact with it normally.
3. Revert everything back to the original and see if it runs.

This should help you isolate the problem: which wall is causing it?
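For step 1, the stripped-down configuration would look something like this (a sketch based on the DEME demos; the "none" option disables the automatic domain-boundary BCs, and mat stands for your material handle):

// No automatic box-domain boundaries: particles can leave the domain freely,
// so any crash in this test cannot come from an invisible wall.
DEMSim.InstructBoxDomainBoundingBC("none", mat);
// Do not add the meshed cube or any analytical objects for this test.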

Thank you,
Ruochun

Julian Reis

Jun 7, 2024, 11:15:13 AM
to ProjectChrono
Hi Ruochun,
Thank you for your help again.
After doing some more testing and running my test case independently of my code, I found that it was not a problem with the setup.

My simulation seems to crash because of an unknown interaction with my own code. My code also uses the same GPU for its calculations, and apparently there is a problem there.
When I run the setup and exclude all the other GPU calls, the simulation runs without any problems. So there has to be a problem there somewhere...
I know this is a problem that I probably have to figure out by myself.
My only question would be whether you have any experience with co-simulations involving other GPU solvers.

Julian

Ruochun Zhang

Jun 8, 2024, 6:26:47 AM
to ProjectChrono
Hi Julian,

That is a possibility, and it's interesting. DEME automatically uses up to 2 available GPUs and the user doesn't control this behavior, yet it creates, uses, and synchronizes its own two GPU streams, so I don't think it should affect other GPU-bound applications running simultaneously. However, admittedly I have never tested running another GPU-bound application as part of a co-simulation, and maybe I should, in due time. It can obviously be an interplay between the two, but I am of course more inclined to guess that it's caused by some inappropriate device synchronization calls from the other application in question.

Please let us know what you find. And if you can tell us what the application you ran alongside DEME was, maybe we can help better.

Thank you,
Ruochun

Julian Reis

Jun 10, 2024, 11:10:03 AM
to ProjectChrono
Hi Ruochun,

I've been trying to find something but haven't been successful so far.

I'm working on an SPH solver that is based on OpenFPM (http://openfpm.mpi-cbg.de). Since basically all the CUDA calls run through OpenFPM, I'm assuming the issue has to be somewhere in there.
Also, after I removed all the explicitly set solver settings, the simulation now crashes already during the setup, with the following error:
terminate called after throwing an instance of 'std::runtime_error'
  what():  GPU Assertion: an illegal memory access was encountered. This happened in /tmp/chrono-dem/src/algorithms/DEMCubContactDetection.cu:384
Based on this error, you're probably right about some device synchronization behavior.

I'm going to keep looking for a solution, but since I'm quite new to GPU computing, this could take some time.

Julian