Poor parallel efficiency of FEValues::reinit() with WorkStream

Jonas Knoch

May 4, 2023, 6:44:30 AM
to deal.II User Group
Dear deal.II community,

I have implemented a multiscale finite element problem using deal.II. It includes PDEs on different levels, namely a macroscopic/effective PDE and so-called "cell problems" (not to be confused with cells of the grid), which have to be solved in each quadrature point of the macroscopic grid in order to compute the effective coefficient of the macroscopic PDE.

I am trying to parallelize parts of the code using WorkStream, in particular the solution of the cell problems. Unfortunately, I observe suboptimal parallel efficiency when I compare the runtimes for different numbers of threads: for example, the efficiency is only approx. 75% when I go from one thread to two threads with MultithreadInfo::set_thread_limit(). I should mention that I only compare the runtimes of the parallelized part of the program, so I would expect an efficiency close to 100%.
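For reference, the efficiency figure quoted above can be computed as follows. This is a minimal sketch assuming the usual definition E = T1 / (p * Tp); the function name `parallel_efficiency` is hypothetical and not part of deal.II.

```cpp
#include <cassert>

// Parallel efficiency E = T1 / (p * Tp), where T1 is the wall time with one
// thread and Tp the wall time with p threads, both measured on the same
// parallelized section of the program. (Hypothetical helper, not deal.II API.)
double parallel_efficiency(const double       t_one_thread,
                           const double       t_p_threads,
                           const unsigned int n_threads)
{
  assert(n_threads > 0 && t_p_threads > 0.0);
  return t_one_thread / (n_threads * t_p_threads);
}
```

Under this definition, a section that takes 3 s on one thread and 2 s on two threads has an efficiency of 0.75, which matches the roughly 75% reported here.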

I was able to trace the efficiency problem back to the following issue: when assembling the system matrix for a cell problem (which has to be done for each quadrature point of the macroscopic grid), I measure the following wall times for one call to scratch.fe_values.reinit(cell):
- one thread: ~1.5e-6s,
- two threads: ~5e-6s.
If the time spent on the task (i.e. the local assembly) on a cell is comparatively small, the parallel efficiency suffers.
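Single calls in the microsecond range are hard to time reliably, so one way to obtain such numbers is to average over many repetitions. The following is only a sketch of that idea using the standard library; the helper `mean_wall_time_seconds` is a hypothetical name, and it is an assumption that repeating the call is representative of what happens inside the assembly loop.

```cpp
#include <chrono>
#include <cstddef>

// Calls `call` `repetitions` times and returns the mean wall time per call
// in seconds. Returns 0 if no repetitions are requested.
// (Hypothetical helper, not part of deal.II.)
template <typename Callable>
double mean_wall_time_seconds(Callable &&call, const std::size_t repetitions)
{
  if (repetitions == 0)
    return 0.0;

  const auto start = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < repetitions; ++i)
    call();
  const auto stop = std::chrono::steady_clock::now();

  // duration<double> converts the elapsed time to seconds.
  return std::chrono::duration<double>(stop - start).count() / repetitions;
}
```

Inside the worker function this could be used as `mean_wall_time_seconds([&] { scratch.fe_values.reinit(cell); }, 1000)`, with the repetition count chosen arbitrarily.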

However, the problem vanishes (the efficiency returns to ~100%) if the time spent on one cell during assembly is increased, e.g. by using higher-order finite elements or by simply putting the thread to sleep for a sufficiently long time. Neither is really an option in my application.

I have also attached a MWE. A similar behavior can be observed in tutorial step 9 when the FE order in line 259 is lowered from 5 to 1.

My questions are:
- What exactly is the reason for this behavior of the FEValues::reinit() function, or how can I find out myself?
- Is there anything I can do to circumvent this problem?

Thank you in advance, any help/hint is highly appreciated.

Best,
Jonas
parallel_efficiency_issue.cc

Bruno Turcksin

May 5, 2023, 8:55:25 AM
to deal.II User Group
Jonas,

What you are seeing is expected. There is an overhead associated with using multiple threads. Since the overhead is constant, it looks like it disappears when you do more work. You probably want to give your threads more work to do in order to amortize the constant cost.

Best,

Bruno
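
The amortization argument in this reply can be illustrated with a deliberately simple cost model. It is an assumption made here for illustration, not a measurement of deal.II: every task handed to a thread pays a fixed overhead on top of its useful work, so the efficiency is useful / (useful + overhead). The function name `modeled_efficiency` and its parameters are hypothetical.

```cpp
#include <cassert>

// Toy model (an assumption, not deal.II behavior): each task consists of
// `items_per_task` work items costing `work_per_item` each, plus one fixed
// `overhead_per_task`. The modeled efficiency is the fraction of time spent
// on useful work.
double modeled_efficiency(const double       work_per_item,
                          const double       overhead_per_task,
                          const unsigned int items_per_task)
{
  assert(work_per_item > 0.0 && overhead_per_task >= 0.0 && items_per_task > 0);
  const double useful = work_per_item * items_per_task;
  return useful / (useful + overhead_per_task);
}
```

In this model, a task with 3 units of work and 1 unit of overhead reaches an efficiency of 0.75, while bundling five such work items into one task raises it to 15/16. The overhead has not changed; it is only spread over more work.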