Running multiple programs at the same time

Toni Vidal

Feb 11, 2020, 11:45:03 AM
to deal.II User Group

Dear deal.II users and developers,

I am currently running a deal.II-based code thousands of times with different input parameters. Each simulation takes about 30 seconds on a single processor. To do this, I have written a Python script that runs 4 simulations at a time (using the multiprocessing module). However, each simulation then takes about 60 seconds, even though my machine has 8 cores (Intel® Core™ i7-9700K CPU @ 3.60GHz × 8). Shouldn't each one take approximately the same amount of time, since the processors are independent? Am I
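
For readers who want to reproduce this setup, here is a minimal sketch of such a driver script. It is not the original script: the names `run_one` and `run_batch` are hypothetical, the commands are placeholders for the real simulation executable, and a thread pool is swapped in for the multiprocessing module because each worker only waits on an external process.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one command and return (return code, wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode, time.perf_counter() - start


def run_batch(commands, max_concurrent=4):
    """Run all commands with at most max_concurrent of them active at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_one, commands))


# Placeholder commands; a real run would use something like
# ["./my_simulation", "params_0.prm"] (hypothetical names).
commands = [[sys.executable, "-c", "pass"] for _ in range(8)]
results = run_batch(commands, max_concurrent=4)
```

Recording the wall-clock time per run, as done here, makes it easy to compare a batch with `max_concurrent=1` against one with `max_concurrent=4`.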

In order to isolate the problem, I have executed deal.II's step-6 (with 12 cycles and 1e5 maximum solver steps) 1, 2 and 4 times at the same time (using different terminals).

1 running program  ~ 32 s user time
2 running programs ~ 44 s user time 
4 running programs ~ 80 s user time 
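
To put these numbers in perspective, the short sketch below computes the per-program slowdown and the aggregate throughput relative to running one program at a time. It assumes that the wall-clock time of each program is close to its reported user time, which the measurements above do not confirm.

```python
# User times reported above, keyed by the number of concurrent programs.
user_time_s = {1: 32.0, 2: 44.0, 4: 80.0}


def slowdown(n):
    """How much longer each program takes compared with running alone."""
    return user_time_s[n] / user_time_s[1]


def relative_throughput(n):
    """Simulations finished per second, relative to one program at a time.

    Assumption: wall-clock time per program is close to its user time.
    """
    return (n / user_time_s[n]) / (1 / user_time_s[1])


for n in sorted(user_time_s):
    print(f"{n} programs: slowdown {slowdown(n):.3f}, "
          f"relative throughput {relative_throughput(n):.3f}")
```

Under that assumption, four concurrent programs still complete about 1.6 times as many simulations per second as a single one, well short of the factor of 4 one would hope for.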

Why do the programs not take the same time even though my computer has 8 cores?
Any ideas? Am I missing something obvious?

Ton Vidal

David Wells

Feb 11, 2020, 11:53:53 AM
to deal.II User Group
Hi Toni,

I think that this is due to each individual program creating the same
number of threads as you have physical processors. Try adding

MultithreadInfo::set_thread_limit(1);

at the top of your code to prevent this from happening. Let us know if
this works!
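
If recompiling is inconvenient, the same limit can probably be imposed from the launching script. The sketch below assumes that the deal.II build in use honors the `DEAL_II_NUM_THREADS` environment variable; `single_thread_env` is a hypothetical helper, and the child command is a placeholder that only echoes the variable back.

```python
import os
import subprocess
import sys


def single_thread_env(base_env=None):
    """Return a copy of the environment that asks for one thread per process."""
    env = dict(os.environ if base_env is None else base_env)
    env["DEAL_II_NUM_THREADS"] = "1"  # assumed to be read by deal.II at startup
    env["OMP_NUM_THREADS"] = "1"      # limits OpenMP-based libraries as well
    return env


# Placeholder child process; a real run would start the simulation instead.
child = [sys.executable, "-c",
         "import os; print(os.environ['DEAL_II_NUM_THREADS'])"]
output = subprocess.run(
    child, env=single_thread_env(), capture_output=True, text=True, check=True
).stdout.strip()
```

Passing the environment explicitly keeps the setting local to the launched simulations instead of changing it for the whole shell session.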

Thanks,
David
> --
> The deal.II project is located at http://www.dealii.org/
> For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
> ---
> You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/767ae142-a02d-4bc2-a454-d4c385d1b217%40googlegroups.com.

Wolfgang Bangerth

Feb 11, 2020, 12:31:24 PM
to dea...@googlegroups.com
On 2/11/20 9:53 AM, David Wells wrote:
> I think that this is due to each individual program creating the same
> number of threads as you have physical processors.

In other words, by default deal.II uses multiple threads to run some
things in parallel.

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Toni Vidal

Feb 11, 2020, 4:27:21 PM
to deal.II User Group
Hi David,

That did not solve the problem with step 6. I got the same times.

In fact, I have installed deal.II without threads (DEAL_II_WITH_THREADS = OFF) and I set OMP_NUM_THREADS=1 in my .bashrc.

Any other idea?


On Tuesday, 11 February 2020 17:53:53 UTC+1, David Wells wrote:

Wolfgang Bangerth

Feb 11, 2020, 4:37:31 PM
to dea...@googlegroups.com
On 2/11/20 2:27 PM, Toni Vidal wrote:
>
> That did not solve the problem with step 6. I got the same times.
>
> In fact, I have installed deal.II without threads (DEAL_II_WITH_THREADS =
> OFF) and I set OMP_NUM_THREADS=1 in my .bashrc.

There are many other possible reasons for contention. For example, most
finite element programs are limited by the transfer of data from memory
to the processor. If you have just one program running, then only one
program is using the memory bus and is getting its full speed. But if
you have multiple programs running, then they are all competing for the
same bandwidth on the memory bus, and each of them will run more slowly
than a single program would on its own.
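
One rough way to check whether memory bandwidth is the bottleneck on a given machine is to time a memory-bound loop in one process and then in several at once. The sketch below is only an illustration: `copy_times` and `CHILD_CODE` are hypothetical names, the buffer size and repeat count are arbitrary, and copying a large buffer in Python is only a crude stand-in for the memory traffic of a finite element code.

```python
import subprocess
import sys

# Each child repeatedly copies a large buffer, which streams data through memory.
CHILD_CODE = """
import sys, time
size, repeats = int(sys.argv[1]), int(sys.argv[2])
buf = bytearray(size)
start = time.perf_counter()
for _ in range(repeats):
    copy = bytes(buf)
print(time.perf_counter() - start)
"""


def copy_times(n_procs, size_bytes=32 * 1024 * 1024, repeats=50):
    """Start n_procs copy loops at (roughly) the same time; return their seconds."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(size_bytes), str(repeats)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n_procs)
    ]
    return [float(p.communicate()[0]) for p in procs]
```

Comparing `copy_times(1)` with `copy_times(4)` gives an indication: if the per-process times grow noticeably with the number of processes even though enough idle cores are available, the processes are likely competing for memory bandwidth rather than for cores.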

It could also be that you have, say, 4 cores on your processor but 3
other programs currently running. Then running one instance of your code
will get a full core, but if you ran four, the total of 7 codes would
have to compete for 4 cores, and all would be slowed down.
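
The arithmetic of this second scenario can be sketched as follows, under the simplifying assumptions that every job is purely CPU-bound and that the scheduler shares the cores evenly. The function name `core_share` is hypothetical.

```python
def core_share(n_cores, n_jobs):
    """Fraction of one core each job receives if cores are shared evenly."""
    return min(1.0, n_cores / n_jobs)


# One instance plus 3 other programs on 4 cores: everyone gets a full core.
share_one_instance = core_share(4, 1 + 3)

# Four instances plus 3 other programs on 4 cores: 7 jobs share 4 cores.
share_four_instances = core_share(4, 4 + 3)

# Expected slowdown of each job relative to having a core to itself.
slowdown_four_instances = 1.0 / share_four_instances
```

With 7 jobs on 4 cores each job gets 4/7 of a core, so under these assumptions every job would take about 1.75 times as long.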

By the way, to see whether your program really is using only one thread,
you can run the program 'top' in a separate command line window. It will
show you which percentage of a processor each running job takes up.
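
The thread count of a running process can also be read programmatically. The following sketch assumes a Linux system with the `/proc` filesystem; `thread_count` is a hypothetical helper that returns `None` where that information is not available.

```python
import os


def thread_count(pid):
    """Return the number of threads of process pid, or None if unknown."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                # The line looks like "Threads:<tab>1".
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        # No /proc filesystem, or the process does not exist (any more).
        return None
    return None


# Example: inspect the current Python process itself.
own_threads = thread_count(os.getpid())
```

Calling `thread_count` with the process ID of a running simulation should report 1 if the program really is single-threaded.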

Ahmad Shahba

Feb 11, 2020, 4:40:42 PM
to dea...@googlegroups.com
I was just wondering how much I/O operations contribute to your timings. What happens if you minimize the I/O activity, for example by commenting out the output_results method? Does anything change?

Regards 
Ahmad

Toni Vidal

Feb 12, 2020, 4:25:04 AM
to deal.II User Group
Hi Ahmad,

I commented out the output_results method, but the times did not change significantly.

Toni

On Tuesday, 11 February 2020 22:40:42 UTC+1, Ahmad Shahba wrote:

Toni Vidal

Feb 12, 2020, 5:13:04 AM
to deal.II User Group
Hello Wolfgang,

Memory bus bandwidth seems the most plausible explanation. But two more questions arise:
- Would matrix-free methods reduce this problem?
- Will this also affect the scalability of MPI parallelization on this computer?

Regards,
Toni
 

On Tuesday, 11 February 2020 22:37:31 UTC+1, Wolfgang Bangerth wrote:

Wolfgang Bangerth

Feb 12, 2020, 9:16:10 AM
to dea...@googlegroups.com
On 2/12/20 3:13 AM, Toni Vidal wrote:
> - Would matrix-free methods reduce this problem?

Maybe. Probably.

> - Will this also affect the scalability of MPI parallelization on this computer?

Yes. That's one of the issues with having modern many-core chips and running
lots of MPI processes on them.