Hello,
Lots of good information here.
Just wanted to ask for some advice.
For very large simulations, say 20 million+ cells, what combination yields the best results on a machine/cluster that has 20 physical cores?
Is it a hybrid combination of MPI + OpenMP?
Running the whole simulation as a single mesh with OpenMP alone doesn't seem very efficient.
Or is it creating 20 individual mesh pieces and using MPI to solve them simultaneously? When I do this, I find that CPU utilization fluctuates and is not 100%. I know this approach creates many open boundaries and a "communication penalty".
From reading the user guide, it appears the optimal number of OpenMP threads is 4. Therefore, should I break my large simulation into 5 mesh pieces, assign one MPI process to each, and then let OpenMP do its work within each piece, with 4 threads per mesh piece?
I have not done any benchmarks yet, but with this method I seem to get full CPU load (which I think is what matters for simulations).
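For what it's worth, the 5-mesh / 4-thread split above can be written down as a launch script. This is only a sketch, assuming a generic MPI-parallel solver started through mpiexec; the binary name `solver` and input file `job.in` are placeholders, not from your setup:

```shell
# Hybrid launch sketch: 5 MPI ranks x 4 OpenMP threads = 20 physical cores.
# "solver" and "job.in" are placeholder names for your solver and input file.
export OMP_NUM_THREADS=4          # threads each MPI rank may spawn

# One MPI rank per mesh piece (5 meshes -> -n 5).
# Printed here rather than executed, since the solver name is hypothetical:
echo "mpiexec -n 5 solver job.in"
```

The key point is that ranks times threads should not exceed the physical core count, otherwise the threads oversubscribe the cores and utilization numbers become misleading.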
There is also the opportunity to daisy-chain machines together, so I could combine two 20-core machines. What's the fastest combo there?
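If you do chain two machines, here is a minimal sketch assuming Open MPI's hostfile mechanism; the hostnames `nodeA`/`nodeB` and the 10-rank x 4-thread split are hypothetical and would need tuning for your cluster:

```shell
# Two-machine sketch using an Open MPI-style hostfile.
# "nodeA"/"nodeB" are placeholder hostnames; slots = physical cores per machine.
cat > hostfile <<'EOF'
nodeA slots=20
nodeB slots=20
EOF

# e.g. 10 ranks x 4 threads = 40 cores spread over both machines.
# Printed rather than executed, since the solver name is a placeholder:
echo "OMP_NUM_THREADS=4 mpiexec --hostfile hostfile -n 10 solver job.in"
```

Note that inter-node MPI traffic goes over the network, so the communication penalty between mesh pieces on different machines is much larger than between pieces on the same machine; grouping neighboring meshes on the same node usually helps.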
I guess ultimately my question is: is there a point where OpenMP + MPI is faster than MPI by itself? I think that beyond a certain number of MPI processes I see large RAM usage and inefficiencies, especially if I can't balance the mesh count exactly to 20 segments.
Thanks in advance.