--
You received this message because you are subscribed to the Google Groups "ProjectChrono" group.
To unsubscribe from this group and stop receiving emails from it, send an email to projectchron...@googlegroups.com.
To post to this group, send email to projec...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Francesco
Let me comment on some points here, even though the "parallel" version of Chrono is not my contribution (I am mostly involved in the development of the "serial" version).
1. The fast SOR_MULTITHREAD solver, which is available in the default "serial" Chrono, is experimental and non-deterministic: different processors can give different results, and so can different runs on the same machine. It should therefore not be used for scientific work. Prefer SSOR or SOR for tests, and BARZILAIBORWEIN for better precision at the cost of slower performance. Note that BARZILAIBORWEIN already uses some OpenMP multithreading, although I admit that it could be optimized.
2. The serial Chrono, except for those few OpenMP optimizations, uses mostly one core. That said, I am surprised that the parallel version is loading just one core in your tests...
3. The ChSystem::SetParallelThreadNumber() setting is a bit obsolete. It was used only by the SOR_MULTITHREAD code I mentioned before. Otherwise, the number of threads should be set automatically by OpenMP in the rest of the code.
4. Yes, at the moment not all features of the serial Chrono are available in Chrono parallel. Sorry for this issue. In the future we'll port more functionality to the parallel part. Do you need some specific missing feature apart from the AABB stuff that you already mentioned?
By the way: nice work in your video...
Best regards
Alessandro Tasora
Hi Francesco,
One of the settings that you should *not* leave at default value is the number of bins used for the broad-phase collision detection. If I recall correctly, the default is something like 20x20x20 or 10x10x10. You will likely need quite a bit more than that, but will need to experiment to find the sweet spot. As a rule of thumb, I suggest you try some values that lead to roughly 2-4 objects per bin per direction (in other words, bins with 8-64 objects each).
Indeed, Chrono::Parallel does not use the collision detection from Bullet. Instead it uses a custom grid-based algorithm for the broad phase and a combination of Minkowski Portal Refinement and analytical methods for the narrow phase. The broad-phase algorithm used in Chrono::Parallel (based on bins on a uniform grid which is assumed to be aligned with the global reference frame) is nowhere near as sophisticated as it should be. It works best for objects of roughly the same size and for simulations where there is little "flow" of the collision shapes: situations like those encountered in terramechanics when modeling soil as granular material. This means that, for the type of simulations you are doing, a bin setting that works well at the beginning of the simulation may not be all that great when bodies coalesce.
You can in principle adjust the bin setting as the simulation progresses. However, the current implementation assumes a uniform grid (i.e. bins of equal size in a given direction), so this will anyway be sub-optimal for your problems. One solution is to allow for a non-uniform grid (and allow the user to control this, in a problem-dependent manner, as the simulation progresses) -- a modification along these lines is on my todo list, but I do not know when I'll get to it.
By the way, you can change the number of bins using something like:
my_system.GetSettings()->collision.bins_per_axis = vec3(bins_x, bins_y, bins_z);
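To make the rule of thumb above concrete, here is a minimal sketch of how a starting bin count could be estimated from the number of collision shapes. EstimateBinsPerAxis is a hypothetical helper, not part of Chrono, and it assumes the shapes fill a roughly cubic region more or less evenly, which will stop being true once bodies aggregate.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not a Chrono function): estimate a uniform number of
// bins per axis so that each bin holds about `objects_per_bin` shapes.
// Assumes the shapes are spread roughly evenly over a roughly cubic region.
int EstimateBinsPerAxis(long num_shapes, double objects_per_bin) {
    if (num_shapes <= 0 || objects_per_bin <= 0.0) return 1;
    // Total number of bins needed to reach the target occupancy.
    double total_bins = static_cast<double>(num_shapes) / objects_per_bin;
    // Same count along each axis: cube root of the total, rounded to nearest.
    int per_axis = static_cast<int>(std::lround(std::cbrt(total_bins)));
    return std::max(per_axis, 1);
}
```

With 64000 shapes and a target of 8 to 64 shapes per bin, this gives somewhere between 20 and 10 bins per axis. The result is only a starting point for experimentation; it could be used for all three components of bins_per_axis in the line above.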
I'm very interested to see how much you can get out of using a more appropriate binning, at least at the beginning of your simulations, before significant aggregation occurs. I also expect that you will see diminishing returns from parallelizing functions such as f_Store_AABB_AABB_Intersection() and f_Count_AABB_AABB_Intersection() since those will have little work left per active bin.
Having said that, it would be great if you could share with us the modifications you've experimented with. In particular, which parallel for loops did you find benefited from dynamic scheduling? A while ago (must be 2-3 years now) we did some extensive testing trying to find the best scheduling for the various parallel for loops in Chrono::Parallel. What is implemented right now is the best compromise we converged on at that time, using certain benchmark problems that unfortunately may not be representative of what you are doing.
You are also correct that Chrono::Parallel supports only a subset of what can be done with the serial Chrono solver (most notably, there is no support for FEA).
Indeed, removing bodies from a Chrono::Parallel system is not implemented (this has to do with the underlying data structures used in the parallel code); there is some experimental code to allow that, but it had to be commented out due to some bugs.
Also, not all queries you can make for a serial Chrono system can be made for a Chrono::Parallel system, or at least not with the same API. For your specific question, I think there should be a way to access the current AABBs. If that is critical for you, I can look into this and give you a definite answer.
Bottom line is that Chrono::Parallel is overdue for a refactoring and revamping. Feedback such as yours, together with applications from domains other than those we typically work with, is very useful. Thank you for this and keep it coming :-)
Best,
Radu
Francesco,
For now a couple more pointers to help with your current simulations.
I agree with your assessment of what was happening when using a small number of bins. Indeed, the binning works by first finding a global AABB (which encompasses all AABBs of collision shapes in the system) and then dividing that large box into equal-size bins. The small number of bins and the clusters in your simulation effectively led to a large number of sequential tests within the crowded bins.
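The uniform binning described above can be sketched as follows. This is an illustration only, not the Chrono::Parallel implementation; Vec3 and BinCoords are made-up names, and the global AABB is passed in explicitly.

```cpp
#include <algorithm>
#include <array>

struct Vec3 { double x, y, z; };  // illustrative type, not a Chrono class

// Bin index along one axis of a uniform grid spanning [lo, hi] with n bins.
static int BinIndex(double v, double lo, double hi, int n) {
    if (hi <= lo || n <= 1) return 0;
    int i = static_cast<int>((v - lo) / (hi - lo) * n);
    // Clamp so that points on the upper boundary fall in the last bin.
    return std::min(std::max(i, 0), n - 1);
}

// Map a point to integer bin coordinates inside the global AABB (gmin, gmax).
std::array<int, 3> BinCoords(const Vec3& p, const Vec3& gmin, const Vec3& gmax,
                             const std::array<int, 3>& bins) {
    std::array<int, 3> out = {{BinIndex(p.x, gmin.x, gmax.x, bins[0]),
                               BinIndex(p.y, gmin.y, gmax.y, bins[1]),
                               BinIndex(p.z, gmin.z, gmax.z, bins[2])}};
    return out;
}
```

Because the bin size is the extent of the global AABB divided by the bin count, stretching the global AABB (for example by one far-away body) makes every bin larger, so points that used to land in different bins can end up sharing one.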
As you can probably see, this is still an issue if you have bodies which get farther and farther from your domain of interest. There is one trick in Chrono::Parallel that you can use to address this issue, but you will need to ensure that it does not affect the physics of the problem you're trying to solve. You can specify a large AABB that defines a boundary that your bodies cannot cross. If you enable this option, as soon as a body reaches the boundary of this AABB, the body will be deactivated (it will not be removed from the system, but it will become a zombie). This feature ensures that the computational domain stays bounded and that a single body being "ejected" does not negatively affect the performance of the collision detection.
You can enable this feature with code like this:
my_system.GetSettings()->collision.use_aabb_active = true;
my_system.GetSettings()->collision.aabb_min = bmin;
my_system.GetSettings()->collision.aabb_max = bmax;
where (bmin, bmax) are triplets that define the "active area" AABB.
In your case, you probably need to specify a box that is large enough and then ignore any zombie body when computing gravitational forces (hopefully, this is an acceptable approximation). You can test whether a body is "active" or a "zombie" by testing the corresponding entry in the vector:
my_system.data_manager->host_data.active_rigid
using the body id as index. If you have a shared pointer to a body, you can do a test like so:
if (my_system.data_manager->host_data.active_rigid[body->GetId()] != 0) {
    // active body
} else {
    // zombie
}
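As an illustration of ignoring zombies in the gravitational force computation, here is a minimal self-contained sketch. It does not use the Chrono API: Vec3 and GravityAccelerations are hypothetical names, and the `active` vector is a stand-in for a per-body flag array indexed by body id (nonzero meaning active), in the spirit of the test above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };  // illustrative type, not a Chrono class

// Pairwise Newtonian gravity, skipping any body flagged as inactive.
// `active[i] != 0` means body i is active; zombies neither feel nor exert force.
std::vector<Vec3> GravityAccelerations(const std::vector<Vec3>& pos,
                                       const std::vector<double>& mass,
                                       const std::vector<char>& active,
                                       double G) {
    std::vector<Vec3> acc(pos.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < pos.size(); ++i) {
        if (!active[i]) continue;  // zombie: leave its acceleration at zero
        for (std::size_t j = 0; j < pos.size(); ++j) {
            if (j == i || !active[j]) continue;  // skip self and zombies
            double dx = pos[j].x - pos[i].x;
            double dy = pos[j].y - pos[i].y;
            double dz = pos[j].z - pos[i].z;
            double r2 = dx * dx + dy * dy + dz * dz;
            if (r2 == 0.0) continue;  // coincident bodies: avoid division by zero
            double s = G * mass[j] / (r2 * std::sqrt(r2));  // G * m_j / r^3
            acc[i].x += s * dx;
            acc[i].y += s * dy;
            acc[i].z += s * dz;
        }
    }
    return acc;
}
```

The all-pairs loop is O(n^2) and is only meant to show where the active test would go; whether dropping the contribution of deactivated bodies is acceptable depends on the problem.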
Removing bodies from a Chrono::Parallel system is not straightforward. This is because of how data is organized in arrays with different levels of indirection. Removing a body would mess with the indexing of bodies and shapes and would require recalculating all indices.
Finally, shape AABBs. These are accessible in vectors in the data manager:
my_system.data_manager->host_data.aabb_min
my_system.data_manager->host_data.aabb_max
However, these arrays are indexed by shape id. So you can loop over all shapes in the system, access each shape's current AABB, find the id of the body associated with that shape, and so on. But not the other way around (at least not efficiently). You can get a better picture of this if you look at the implementation of a function such as ChCAABBGenerator::GenerateAABB in ChAABBGenerator.cpp (line 110).
One more thing: the AABBs stored in the above two arrays are shifted such that they all lie in the 1st octant (i.e. they all have only positive values). You can find the current value of the offset in
my_system.data_manager->measures.collision.global_origin
and undo the offset if needed (i.e. apply the inverse of the shift implemented in ChBroadphase::OffsetAABB).
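A minimal sketch of undoing the shift, under the assumption that the stored boxes were obtained by subtracting the global origin from the world-frame corners (please check ChBroadphase::OffsetAABB to confirm the direction of the shift). Vec3, AABB and UnshiftAABB are illustrative names, not Chrono types or functions.

```cpp
struct Vec3 { double x, y, z; };  // illustrative type, not a Chrono class
struct AABB { Vec3 min, max; };   // illustrative type, not a Chrono class

// Recover a world-frame AABB from a shifted one.
// ASSUMPTION: shifted = world - origin, so world = shifted + origin.
AABB UnshiftAABB(const Vec3& shifted_min, const Vec3& shifted_max,
                 const Vec3& origin) {
    AABB box;
    box.min = Vec3{shifted_min.x + origin.x, shifted_min.y + origin.y,
                   shifted_min.z + origin.z};
    box.max = Vec3{shifted_max.x + origin.x, shifted_max.y + origin.y,
                   shifted_max.z + origin.z};
    return box;
}
```

In a loop over shape ids, the two stored corner arrays and the global origin value would supply the three arguments.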
Having said that, I think it'd be possible to provide an implementation for ChCollisionModelParallel::GetAABB (currently absent). I'll look into this.
--Radu