Run time analysis for step-75 with matrix-free or matrix-based method


yy.wayne

unread,
Oct 19, 2022, 4:38:34 AM
to deal.II User Group
Hello everyone,

I modified step-75 a little and tried to test its runtime. However, the results are hard for me to explain, especially the disproportionate assembly and solve times. Here are the changes:
1. A matrix-based version of step-75 is constructed to compare with the matrix-free one.
2. No mesh refinement and no GMG, and fe_degree is constant across all cells within every cycle; fe_degree increases by one after each cycle. I use this setting to compare runtime as a function of fe_degree.
3. A direct solver on the coarsest grid. I think it won't affect the runtime comparison since the coarsest grid never changes.

For the final cycle, fe_degree = 6 and DoFs = 111,361.
For the matrix-based method, the overall runtime is 301 s, most of which is spent in setup system (84 s) and solve system (214 s). In step-75, solve system actually does the multigrid matrix assembly, the smoother construction, and the CG solve. The runtime of this case is shown:
matrix-based.png
On each level I print the time spent assembling the level matrix. Solve system decomposes mostly into MG matrix assembly (83.9 + 33.6 + ... = 133 s), smoother setup (65 s), coarse-grid solve (6 s), and CG solve (2.56 s). My doubt is: why does the actual CG solve take only 2.56 of the 301 seconds for this problem? The time spent on assembly and smoother construction accounts for so much that it seems a burden.

For the matrix-free method, however, the runtime is much smaller since no matrices are assembled. The CG solve costs more, I guess because matrix-free evaluation requires more computation per iteration. But that the smoother construction time also drops significantly was beyond my expectation.
matrix-free.png

The matrix-free framework saves assembly time, but it seems too efficient to be real. The parts in bold are my main confusion. Could someone share some experience with the time consumption of matrix-free and multigrid methods?

Best,
Wayne

Peter Munch

unread,
Oct 19, 2022, 5:10:27 AM
to deal.II User Group
Hi Wayne,

your numbers make total sense. Don't forget that you are running at high order: degree = 6! The number of non-zeros per element stiffness matrix is ((degree + 1)^dim)^2, and the cost of computing the element stiffness matrix is even ((degree + 1)^dim)^3 if I am not mistaken (three nested loops: i, j, and q). Higher orders are definitely made for matrix-free algorithms!

Out of curiosity: how large is the setup cost of MG in the matrix-free run? As a comment: don't be surprised that the setup costs are relatively high compared to the solution process: you are probably setting up new Triangulation, DoFHandler, MatrixFree, ... objects per level. In many simulations you can reuse these objects, since you don't perform AMR in every time step.

Peter

yy.wayne

unread,
Oct 19, 2022, 6:08:19 AM
to deal.II User Group
Thanks for your reply Peter,

The matrix-free run is basically the same as step-75, except that I substituted the coarse-grid solver. For fe_degree = 6 without GMG, with fe_degree decreasing by 1 per level for pMG, the solve_system() function runtime is 24.1 s. It decomposes into MatrixFree MG operator construction (1.36 s), MatrixFree MG transfers (2.73 s), the KLU coarse-grid solver (5.7 s), setting smoother_data and compute_inverse_diagonal for the level matrices (3.4 s), and CG iterations (9.8 s).

The two parts in bold cost a lot more (133 s and 62 s, respectively) in the matrix-based multigrid case. I noticed that, just as in step-16, the finest-level matrix is assembled twice (once for system_matrix and once for mg_matrices[maxlevel]), so assembly costs even more.

Best,
Wayne

Martin Kronbichler

unread,
Oct 19, 2022, 7:40:01 AM
to dea...@googlegroups.com

Dear Wayne,

I am a bit surprised by your numbers and find them rather high, at least with the chosen problem sizes. I would expect the matrix-free solver to run in less than a second for 111,000 unknowns on typical computers, not almost 10 seconds. I need to honestly say that I do not have a good explanation at this point. I did not write this tutorial program, but I know more or less what should happen. Let me ask a basic question first: Did you record the timings with release mode? The numbers would make more sense if they are based on the debug mode.

Best,
Martin

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en

yy.wayne

unread,
Oct 19, 2022, 7:50:32 AM
to deal.II User Group
Thanks Martin !

I never considered debug vs. optimized mode before. The CMake output says I'm using debug mode.

Some more information: the computation is done with deal.II 9.4.0 in an Oracle VirtualBox VM, with 1 MPI process, launched from Qt Creator; the CPU is an Intel 10600KF. I didn't change the CMakeLists and just copied it from the examples, so I think it's debug mode by default.
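For reference, switching an existing build directory to release mode is a one-liner (this assumes the standard tutorial CMakeLists, which also provides `make release` as a shortcut target):

```shell
cmake -DCMAKE_BUILD_TYPE=Release .
make
```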

Best,
Wayne

Martin Kronbichler

unread,
Oct 19, 2022, 7:52:36 AM
to dea...@googlegroups.com

Dear Wayne,

For performance it certainly matters, because some components of our codes have more low-level checks in debug mode than others, and because the compiler optimizations do not have the same effect on all parts of our code. Make sure to test the release mode and see if it makes more sense. We'd be happy to help from there.

Best,
Martin

yy.wayne

unread,
Oct 19, 2022, 8:30:53 AM
to deal.II User Group
I ran both the matrix-based and matrix-free versions in release mode, and both speed up a lot. The matrix-free CG iteration is 30 times faster than in debug mode. The coarse-grid solver doesn't speed up, because I'm using the Trilinos interface.
Matrix-based:
matrix-based_opt.png
20.6 = mg_matrices

Matrix-free:
matrix-free_opt.png

Thank you so much Martin.

Best,
Wayne

yy.wayne

unread,
Oct 19, 2022, 9:32:55 AM
to deal.II User Group
Besides, the Trilinos direct solver used was Amesos_Lapack (a mistake). Changing to KLU therefore saves even more time.
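For reference, a sketch of how the solver type can be selected through deal.II's Trilinos wrappers (assuming the `AdditionalData` constructor that takes the Amesos solver name; please check the installed version's documentation, and the surrounding names are placeholders):

```cpp
// Sketch: pick KLU explicitly instead of Amesos_Lapack for the
// coarse-grid solve (coarse_matrix/coarse_rhs are hypothetical names).
SolverControl coarse_control;
TrilinosWrappers::SolverDirect::AdditionalData data(
  /*output_solver_details=*/false,
  /*solver_type=*/"Amesos_Klu");
TrilinosWrappers::SolverDirect coarse_solver(coarse_control, data);
coarse_solver.initialize(coarse_matrix);
coarse_solver.solve(coarse_solution, coarse_rhs);
```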