Little help with MUMPS

Abbas Ballout

Nov 15, 2023, 6:09:31 AM
to deal.II User Group
This isn't a deal.II problem per se.

I am trying to run a number of simulations of the same code with different parameters, with MUMPS as the underlying solver. I am using `mpirun --cpu-set` to bind and isolate the different simulations to different cores (as I believe I should).

Profiling: 
The number of DoFs is about 262,144. The assemble function is called 25 times and the solve function is called 20 times. The solve function uses MUMPS; nothing fancy is happening there.
If I run `mpirun -np 2`, the assemble time is 25.8 seconds and the MUMPS solve time is 51.7 seconds.
If I run `mpirun --cpu-set 0-1 -np 2`, the assemble time is 26 seconds (unchanged) but the solve time is 94.9 seconds!
 
Is this normal and expected? 

Maybe relevant details: 
I have 4 physical cores. I am on Ubuntu. I am using PETSc vectors and PETScWrappers::SparseDirectMUMPS.
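
To put the profiling numbers above in perspective, here is a small arithmetic sketch in Python. It assumes the quoted times are totals accumulated over all calls (25 assemble calls, 20 solve calls), which the post does not state explicitly. Under that assumption the solve is roughly 1.8 times slower with `--cpu-set 0-1`, while assembly is essentially unchanged.

```python
# Timings quoted in the post (wall time in seconds).
# Assumption: each number is the total over all calls, not a per-call time.
assemble_calls = 25
solve_calls = 20

assemble_unbound = 25.8  # mpirun -np 2
assemble_bound = 26.0    # mpirun --cpu-set 0-1 -np 2
solve_unbound = 51.7     # mpirun -np 2
solve_bound = 94.9       # mpirun --cpu-set 0-1 -np 2


def slowdown(bound, unbound):
    """Ratio of bound to unbound wall time (1.0 means no change)."""
    return bound / unbound


solve_ratio = slowdown(solve_bound, solve_unbound)
assemble_ratio = slowdown(assemble_bound, assemble_unbound)

print(f"solve:    {solve_unbound / solve_calls:.3f} s/call -> "
      f"{solve_bound / solve_calls:.3f} s/call ({solve_ratio:.2f}x)")
print(f"assemble: {assemble_unbound / assemble_calls:.3f} s/call -> "
      f"{assemble_bound / assemble_calls:.3f} s/call ({assemble_ratio:.2f}x)")
```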


Wolfgang Bangerth

Nov 15, 2023, 11:39:59 AM
to dea...@googlegroups.com


On 11/15/23 04:09, Abbas Ballout wrote:
>
> If I run mpirun -np 2 the assemble times are 25.8 seconds and the MUMPS
> solve times are 51.7 seconds.
> If I run mpirun --cpu-set 0-1 -np 2, the assemble times are 26 seconds
> (unchanged) but the solve times are at 94.9 seconds!
> Is this normal and expected?

What happens if you use `--cpu-set 0,2` or other combinations?

Cores have memory channels that they often share with other cores. I
wonder whether, for example, cores 0 and 1 share a memory channel and
consequently step on each other's toes during the direct solve (which is
memory bound). Or they share SIMD execution units. It would be
interesting to see what happens if you choose other sets of cores to use.

Best
W.
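
A minimal sketch of how the suggested experiment could be enumerated, in Python. It only builds the command lines for every 2-core subset of 4 cores; it does not run them. The executable name `./my_simulation` is hypothetical, and the core numbering 0 to 3 assumes the machine exposes exactly four logical CPUs.

```python
from itertools import combinations


def cpu_set_commands(n_cores, n_procs, executable):
    """Return one mpirun command line per n_procs-sized subset of cores."""
    commands = []
    for cores in combinations(range(n_cores), n_procs):
        cpu_set = ",".join(str(c) for c in cores)
        commands.append(f"mpirun --cpu-set {cpu_set} -np {n_procs} {executable}")
    return commands


# "./my_simulation" is a hypothetical executable name.
commands = cpu_set_commands(n_cores=4, n_procs=2, executable="./my_simulation")
for cmd in commands:
    print(cmd)
```

Timing the solve for each of these commands would show whether a particular pair of cores is responsible for the slowdown.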

blais...@gmail.com

Nov 15, 2023, 5:42:43 PM
to deal.II User Group
In addition to what Wolfgang wrote,
Sometimes the OS will map the cores of a CPU in the following order:
Physical-Logical-Physical-Logical-Physical-Logical etc.
Consequently, if your processor supports hyperthreading through the use of logical cores, running on cores 0-1 means you are essentially running on a single physical core but using two logical cores. For intensive operations that use the same instruction set, this is generally not a good idea. In the present case though, since the assembly is fast but the matrix solve is not, I would presume it is more a question of memory channel issues than anything else. Your CPU might be really memory-bandwidth bound.
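
One way to check the physical/logical mapping described above on Linux is to read the kernel's topology files under sysfs. The following Python sketch assumes a Linux system that exposes `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`; on other systems it simply finds nothing. The helper names are made up for this example.

```python
import glob


def parse_cpu_list(text):
    """Expand a Linux CPU list such as "0-1" or "0,4" into sorted integers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list",
):
    """Return the distinct groups of logical CPUs that share a physical core."""
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(tuple(parse_cpu_list(f.read())))
    return sorted(groups)


# Each printed group lists logical CPUs that live on the same physical core.
for group in sibling_groups():
    print("logical CPUs sharing one core:", group)
```

If a group contains two CPUs, such as `(0, 1)`, then binding two MPI ranks to that pair puts them on one physical core. If every group has a single entry, hyperthreading is off or unsupported, and the slowdown has to come from somewhere else.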

Abbas Ballout

Nov 16, 2023, 4:44:01 AM
to deal.II User Group
Thanks W. and Blais. 

I ran all six 2-core combinations and got the same results. I also passed `--cpu-set 0,1,2,3` with `-np 2` and still got the same thing.
I monitored the CPU usage and nothing is out of the ordinary.  
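
Besides watching CPU usage, one can check which CPUs a launched process is actually allowed to run on. This is a minimal Python sketch, not something used in the thread; it relies on `os.sched_getaffinity`, which is available on Linux, and returns `None` on platforms without it.

```python
import os


def allowed_cpus():
    """Return the sorted logical CPUs this process may run on, or None if unavailable."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return None


print(f"pid {os.getpid()} may run on CPUs: {allowed_cpus()}")
```

Saved under a hypothetical name such as `check_affinity.py` and started through the same `mpirun --cpu-set ... -np 2` line in place of the simulation, each rank would print its own CPU set. Whether each rank is pinned to a single core or to the whole set depends on the binding policy of the MPI implementation.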

The good news is that I don't get this issue when I run on the desktop which has more cores and RAM.  
All good then c: 

