Poor parallel efficiency of FEValues::reinit() with Workstream
Jonas Knoch
May 4, 2023, 6:44:30 AM5/4/23
to deal.II User Group
Dear deal.II community,
I have implemented a multiscale finite element problem using deal.II. It includes PDEs on different levels, namely a macroscopic/effective PDE and so-called "cell problems" (not to be confused with cells of the grid), which have to be solved in each quadrature point of the macroscopic grid in order to compute the effective coefficient of the macroscopic PDE.
I am trying to parallelize parts of the code using the Workstream class, in particular the solution of the cell problems. Unfortunately, I observe suboptimal parallel efficiency when I compare the runtimes for different numbers of threads: as an example, the efficiency is only approx. 75% when I change from one thread to two threads with MultiThreadInfo::set_thread_limit(). I should mention that I only compare the runtimes for the parallelized part of the program, i.e. I expect an efficiency close to 100%.
I was able to trace the efficiency problem back to the following issue: when assembling the system matrix for a cell problem (which has to be done for each quadrature point of the macroscopic grid), I measure the following wall times for one call to scratch.fe_values.reinit(cell):
- one thread: ~1.5e-6 s
- two threads: ~5e-6 s
If the time spent on the task (i.e. the local assembly) on a cell is comparably small, the parallel efficiency suffers.
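For reference, per-call wall times like the ones above can be collected with std::chrono inside the worker function. The helper below is a generic sketch (time_call_ns is an illustrative name, not part of deal.II); in the actual code the callable would wrap the scratch.fe_values.reinit(cell) call:

```cpp
#include <chrono>
#include <utility>

// Measure the wall time of a single call to `f`, in nanoseconds.
// `f` stands in for whatever should be timed, e.g. a lambda that
// calls scratch.fe_values.reinit(cell).
template <typename Callable>
long long time_call_ns(Callable&& f)
{
  const auto start = std::chrono::steady_clock::now();
  std::forward<Callable>(f)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
      .count();
}
```

Note that at the ~1e-6 s scale, accumulating the time over many calls and dividing by the call count gives more reliable numbers than timing a single call.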
However, the problem vanishes (regain of ~100% efficiency) if the time spent on one cell during assembly is increased, e.g. by using higher order finite elements or by simply putting the thread to sleep for a sufficiently long time. This is not really an option in my application.
I have also attached a MWE. A similar behavior can be observed in tutorial step 9 when the FE order in line 259 is lowered from 5 to 1.
My questions are:
- What exactly is the reason for this behavior of the FEValues::reinit function, or how can I find it out by myself?
- Is there anything I can do to circumvent this problem?
Thank you in advance, any help/hint is highly appreciated.
to deal.II User Group
Jonas,
What you are seeing is expected. There is an overhead associated with using multiple threads. Since the overhead is constant, it appears to vanish when each task does more work. You probably want to give your threads more work to do in order to amortize that constant cost.
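The amortization argument can be illustrated without deal.II: if each worker processes a contiguous batch of items instead of a single item, the constant per-thread cost is paid once per batch rather than once per item. The sketch below uses plain std::thread (batched_sum is a hypothetical name; in the actual application the per-item work would be the assembly and solution of one cell problem):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Split `data` into one contiguous batch per thread. The constant cost of
// launching and joining a thread is amortized over the whole batch, so
// per-item overhead shrinks as the batch size grows.
double batched_sum(const std::vector<double> &data, unsigned n_threads)
{
  std::vector<double> partial(n_threads, 0.0);
  std::vector<std::thread> workers;
  const std::size_t chunk = (data.size() + n_threads - 1) / n_threads;

  for (unsigned t = 0; t < n_threads; ++t)
    workers.emplace_back([&partial, &data, chunk, t]() {
      const std::size_t begin = t * chunk;
      const std::size_t end = std::min(data.size(), begin + chunk);
      for (std::size_t i = begin; i < end; ++i)
        partial[t] += data[i]; // stand-in for per-cell assembly work
    });

  for (auto &w : workers)
    w.join();
  return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

In WorkStream terms, the analogous knob is how much work one invocation of the cell worker performs: making each task cover more work (larger chunks, higher-order elements, or several cell problems per task) is what restores the ~100% efficiency observed above.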