Weird CPU usage with step-3 and step-17


Pai Liu

Jul 19, 2018, 12:41:27 PM
to deal.II User Group
Hi all,

1. When I run step-3 on my laptop, which has 2 cores with 2 threads each,
the "top" command shows the CPU usage of "step-3" at about 370%, which means all 4 hardware threads are being used.

How does this happen, given that step-3 does not contain any explicit parallel-computing setup?

2. Furthermore, when I run step-17 on my laptop with "mpirun -np N ./step-17", top shows N "step-17" processes, each taking about 200% CPU usage (so each MPI process seems to use two cores?).

3. I further tested step-17 on my PC, which has 4 cores with 2 threads each.
On my PC's Linux system:
With "mpirun -np 1 ./step-17", top shows one "step-17" process taking about 200% CPU usage;
With "mpirun -np 2 ./step-17", top shows two "step-17" processes, each taking about 200% CPU usage;
With "mpirun -np 3 ./step-17", top shows three "step-17" processes, each taking about 260% CPU usage; in other words, "mpirun -np 3 ./step-17" keeps all 8 of my hardware threads busy.

However, I also tried step-17 in a Linux virtual machine under my PC's Windows system. In that virtual machine, with "mpirun -np N ./step-17", top shows N "step-17" processes, each taking about 100% CPU usage. This last case is the only one that makes sense to me. I also find that, for step-17, the computation in the virtual machine is even faster than on my PC's native Linux system, which is very strange.

How can these cases be explained?
I am really confused, and any help is appreciated.

Best,
Liu

Bruno Turcksin

Jul 19, 2018, 2:09:01 PM
to deal.II User Group
Liu,


On Thursday, July 19, 2018 at 12:41:27 PM UTC-4, Pai Liu wrote:
How does this happen, given that step-3 does not contain any explicit parallel-computing setup?
By default, some of the functions in deal.II use multithreading (via TBB) to speed up the computation. So even if the program looks serial, parts of the code run multithreaded.
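
If you want to limit that, a minimal sketch (assuming a reasonably recent deal.II) is to cap deal.II's thread pool yourself; the DEAL_II_NUM_THREADS environment variable achieves the same thing without recompiling:

  #include <deal.II/base/multithread_info.h>

  int main()
  {
    // Restrict deal.II's own (TBB-based) multithreading to a single thread.
    // Equivalently, export DEAL_II_NUM_THREADS=1 before running the program.
    dealii::MultithreadInfo::set_thread_limit(1);

    // ... the rest of step-3 stays unchanged ...
  }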
 

3. I further tested step-17 on my PC, which has 4 cores with 2 threads each.
On my PC's Linux system:
With "mpirun -np 1 ./step-17", top shows one "step-17" process taking about 200% CPU usage;
With "mpirun -np 2 ./step-17", top shows two "step-17" processes, each taking about 200% CPU usage;
With "mpirun -np 3 ./step-17", top shows three "step-17" processes, each taking about 260% CPU usage; in other words, "mpirun -np 3 ./step-17" keeps all 8 of my hardware threads busy.
Yes, that's normal; see here.

However, I also tried step-17 in a Linux virtual machine under my PC's Windows system. In that virtual machine, with "mpirun -np N ./step-17", top shows N "step-17" processes, each taking about 100% CPU usage. This last case is the only one that makes sense to me. I also find that, for step-17, the computation in the virtual machine is even faster than on my PC's native Linux system, which is very strange.
I guess that in this case multithreading was disabled, or that TBB did not detect the right number of cores. It might depend on how many cores you allowed the virtual machine to use.

Best,

Bruno

Pai Liu

Jul 22, 2018, 9:00:35 AM
to deal.II User Group
Hi Bruno,

Thank you for your information. Regarding the problem that each MPI process takes about 200% CPU usage, I have found that it is caused by the OpenMP settings.
After I set the environment variable OMP_NUM_THREADS=1, each MPI process takes at most 100% CPU usage, and everything works fine for me.
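
For reference, this is roughly how I launch it now (the -np value is only an example; use whatever fits your machine):

  export OMP_NUM_THREADS=1
  mpirun -np 4 ./step-17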


Best,
Pai

Uwe Köcher

Jul 23, 2018, 5:28:52 AM
to deal.II User Group
Hei Liu,


On Sunday, 22 July 2018 15:00:35 UTC+2, Pai Liu wrote:
Thank you for your information. Regarding the problem that each MPI process takes about 200% CPU usage, I have found that it is caused by the OpenMP settings.
After I set the environment variable OMP_NUM_THREADS=1, each MPI process takes at most 100% CPU usage, and everything works fine for me.


Exactly, this is what I do in that case as well.

As far as I can tell about your issue: some years ago the BLAS package (I think) was built serial-only, and setting the number of threads to one in the MPI init/finalize call was enough. Nowadays you have to do both (limit the threads in init/finalize and set OMP_NUM_THREADS=1) to avoid having multiple threads per process running on Linux systems.
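
A minimal sketch of the init/finalize part I mean (the last argument of deal.II's MPI_InitFinalize object limits the number of threads each MPI rank may start; 1 is what I pass):

  #include <deal.II/base/mpi.h>

  int main(int argc, char *argv[])
  {
    // The third argument caps the number of threads per MPI process.
    // Combine this with OMP_NUM_THREADS=1 so that a threaded BLAS does
    // not spawn additional threads behind your back.
    dealii::Utilities::MPI::MPI_InitFinalize mpi_initialization(argc, argv, 1);

    // ... the rest of the step-17 program as before ...
  }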

A second note: hardware threads are usually reported as separate CPUs nowadays, so on your laptop you should see 4 CPUs in top. Looking at
  cat /proc/cpuinfo
shows you which logical CPU number belongs to which physical core; look for the "processor" and "core id" fields.
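
For example (the exact layout varies between kernels and CPUs, so read the numbers on your own machine):

  grep -E "processor|core id" /proc/cpuinfo

Logical CPUs that report the same "core id" are hyper-threads of the same physical core.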

Best
  Uwe

Pai Liu

Jul 25, 2018, 9:29:13 AM
to deal.II User Group
Hi Uwe,

Thank you for your reply. Your comments help me understand the problem better.

Best,
Pai