Tips on writing "versatile" assembly function

94 views
Skip to first unread message

blais...@gmail.com

unread,
Jan 4, 2021, 3:31:45 PM1/4/21
to deal.II User Group
Dear all,
I wish you all an happy new year!

One problem we always end up facing with FEM problems is that, as program grow, more and more features are added to the equations. This leads to multiple variation of the same equations (for example, Navier-Stokes with Newtonian and non-Newtonian viscosity, etc.). In turn, this leads to assembly functions for the non-linear systems of equations which branch out to multiple possibilities.

I was wondering what is the best approach to deal with multiple "switches" in an assembly function. The naïve approach would be to use if conditions, but I have a feeling that if they appear down in the assembly (for example the loop on dofs), this would lead to significantly higher assembly cost because the loops would spend time just checking if.

From experience, I have seen the following approaches:
1- If or switches in the assembly routine. Simplest/most versatile way, but adds significant overhead
2- Template the assembly function with the parameters. I think this would lead to zero additional cost, but as the number of parameters grows, this can become more and more complex since the various possibilities have to be known at compile time.
3- Use generic interface objects for common elements (for example a viscosity class to calculate viscosity, etc.) and use inheritance to specialise the model. I think this could be inefficient because of the need to extensively look up through the vtable.

What is the general consensus in the deal.II community on this problem? What's the best way to deal with multiple possibilities in the same assembly function? I would be very interested in hearing the perspective of the ASPECT author since it has such a large range of sub models.


Bruno Turcksin

unread,
Jan 5, 2021, 3:15:06 PM1/5/21
to deal.II User Group
Bruno,

If you are worry about the cost of looking up though the vtable, I think that you are stuck using template. So either use 2 or 3 and CRTP. But first of all, I think that you should profile your code and verify that this is a problem. There is no point in spending time refactoring your code, if your are going to gain less than 1%...

Best,

Bruno

blais...@gmail.com

unread,
Jan 5, 2021, 4:28:03 PM1/5/21
to deal.II User Group
Bruno,

Thanks, you are right. As always, measure first and then optimize after. No point in optimising stuff that costs nothing...

Timo Heister

unread,
Jan 5, 2021, 4:34:19 PM1/5/21
to dea...@googlegroups.com
Hi Bruno,

We mitigate the performance problem by making the decision per cell in ASPECT:
We have a set of "assemblers" that are executed one after each other
per cell. This means the vtable access cost is small compared to the
actual work.
See
https://github.com/geodynamics/aspect/blob/b9add5f53172aac18bdbb19d14ca266e88c491dd/include/aspect/simulator/assemblers/interface.h#L493-L515
> --
> The deal.II project is located at http://www.dealii.org/
> For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en
> ---
> You received this message because you are subscribed to the Google Groups "deal.II User Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dealii+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/a501667f-4cb7-4894-9364-fb2eaa47c0ecn%40googlegroups.com.



--
Timo Heister
http://www.math.clemson.edu/~heister/

blais...@gmail.com

unread,
Jan 5, 2021, 4:48:00 PM1/5/21
to deal.II User Group
Hi Timo,

I understand. It makes a lot of sense.
Thanks!
Bruno

Timo Heister

unread,
Jan 5, 2021, 6:18:38 PM1/5/21
to dea...@googlegroups.com
What I forgot to say:
We used to have something like

if (use_anisotropic_viscosity==true)
cell(i,j) += viscosity_tensor * ....
else
cell(i,j) += viscosity_constant * ....

and improved the performance by making two separate assemblers (note
that there is no function call/vtable lookup here, just an "if"). I
don't remember how big the difference was, so I went back and found
the PR:
https://github.com/geodynamics/aspect/pull/2139
Rene claims 25% fewer instructions.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dealii/8748edb0-c096-4830-92e3-4a50baae00e9n%40googlegroups.com.

Doug Shi-Dong

unread,
Jan 5, 2021, 6:26:56 PM1/5/21
to deal.II User Group
Hello Prof. Blais,

Optimizing code is always fun, so I've had this discussion multiple times with a colleague. Dr. Turcksin comment was also our conclusion. Templating seems to be the way to go, but only on options/variations where mispredicted branches actually slow down your performance since branch is known at compile time. However, from my understanding, current processors are pretty good with branch prediction from if-statements, that is if you turn on either Newtonian or non-Newtonian viscosity, and you don't flip-flop between them, the cost is relatively low.

Another interesting technique you might find interesting is: branchless programming. It basically computes the "if-statement" and discards its results when unnecessary. This can be useful when your if-statement is deep inside your loops and the computed operation is inexpensive. It's also easy to implement with very little refactoring.

Lastly, this might not be as helpful, but C++20 has the "un/likely" attribute that can give hints the compiler to favour a branch.

Best regards,

Doug

Doug Shi-Dong

unread,
Jan 5, 2021, 6:33:11 PM1/5/21
to deal.II User Group
That's interesting. Seems like it was more about inlining than branch prediction. Surprising how much difference it made.

Corbin Foucart

unread,
Jun 11, 2023, 2:41:08 PM6/11/23
to deal.II User Group
Hello everyone, 

I"m encountering a similar question for large 3D incompressible fluid solver with many assembly options (which I implement via inheritance). I'd like to investigate how much class/templated inheritance plays a role in performance---Bruno, how do you go about profiling something like this as you suggest (without having to implement an entirely new class free of inheritance for comparison)?

Thank you,
Corbin

blais...@gmail.com

unread,
Jun 12, 2023, 12:13:57 AM6/12/23
to deal.II User Group
Dear Corbin,
I mostly used callgrind to identify bottlenecks.
Currently, our matrix assembly and RHS assembly codes use virtualization at the cell level. I think this is a very adequate equilibrium. We keep all the flexibility, but we have not seen a drop in performance. We use the WorkStream paradigm and have scratch / copy data structures.

Hope that helps!
Best
Bruno
Reply all
Reply to author
Forward
0 new messages