Sparsity pattern

Petr Krysl

Mar 12, 2024, 12:58:37 PM
to deal.II User Group
It would appear that the sparsity pattern is not (and cannot be) computed in parallel. That is my reading of the code. Am I wrong?

Bruno Turcksin

Mar 12, 2024, 1:39:22 PM
to deal.II User Group
Hello,

SparsityPattern is used for deal.II's own matrices, which do not support MPI. As far as I can tell, SparsityPattern does not use multithreading, and I don't think adding elements is thread-safe. So you are right.

Best,

Bruno

Petr Krysl

Mar 12, 2024, 2:17:56 PM
to deal.II User Group
Thanks, that is a helpful confirmation.
Would you know if there is any multithreaded sparsity pattern computation available anywhere?
Is something like this planned for deal.II?

Bruno Turcksin

Mar 12, 2024, 3:08:29 PM
to dea...@googlegroups.com
In Trilinos, the Tpetra library has a Graph class, which is basically a SparsityPattern. Tpetra uses Kokkos (which abstracts multithreading and GPU support) everywhere, so I would expect that they use Kokkos to build the Graph too. Multithreading for deal.II's SparsityPattern is not planned so far. It could be done, but building the SparsityPattern is usually extremely fast compared to the rest of the code, so it's not worth doing.

Bruno

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en

Petr Krysl

Mar 12, 2024, 5:15:22 PM
to deal.II User Group
Good point about Tpetra: I will check it out.

> SparsityPattern is usually extremely fast
If it is the only sequential part of a program executed with multiple threads, it can still limit the scalability.

Which reminds me: in your WorkStream paper, you do not report the time taken to construct the sparsity pattern.
Could you tell me how expensive it was compared to the sequential assembly of the numbers into the matrix?

Many thanks,

Petr

Petr Krysl

Mar 12, 2024, 5:22:34 PM
to deal.II User Group
That is a good point about Tpetra: I will check it out.

> SparsityPattern is usually extremely fast
It can still become a bottleneck, provided everything else runs in parallel. ;-)
Which reminds me: I wanted to ask you about the WorkStream paper: the sparsity pattern computation was not reported.
Could you tell me how expensive it was compared to the sequential assembly of the matrix?
For linear problems, both the sparsity pattern computation and the computation of the entries need to be done.
Therefore, it does not really help if the computation of the entries runs in parallel when the sparsity pattern is done in serial.
Isn't that right?

Petr Krysl

Mar 12, 2024, 5:22:38 PM
to deal.II User Group
Oops. I was wondering what happened to my response; I did not realize it was subject to approval, and posted another attempt at framing the answer. Sorry.

Bruno Turcksin

Mar 12, 2024, 5:46:07 PM
to dea...@googlegroups.com
I am not sure why but I need to approve all of your messages. Usually we only need to approve the first time someone posts. Not sure what's going on here.

I don't remember how long the computation of the sparsity pattern took, but it was small. The reason this part is not parallelized in deal.II is that nobody has found it to be a bottleneck. If you fill your matrix with ones, you probably want the sparsity pattern computation to be multithreaded. However, if you are assembling a mass matrix on a non-uniform mesh, you have so much more work to do that it doesn't matter whether you build your sparsity pattern in serial. In theory, I agree everything should be multithreaded, but in practice it has not been worth the effort so far. If the performance of the assembly is really critical, you should use the MatrixFree class instead of building a matrix.

Bruno

Petr Krysl

Mar 12, 2024, 8:38:12 PM
to deal.II User Group
In Matlab, calling sparse() is equivalent to building a sparsity pattern (just forget about the actual values)
from the I, J row and column indices. And that is NOT a negligible expense. It is a sizeable fraction of
the cost of building a Laplacian matrix on triangles, for instance.

Bruno Turcksin

Mar 12, 2024, 9:59:19 PM
to dea...@googlegroups.com
If building the sparsity pattern is the bottleneck in your code, feel free to open an issue and we can take a look at how we can improve it. Though if the assembly of the matrix is performance critical, using the MatrixFree class is the way to go.

Bruno


Petr Krysl

Mar 12, 2024, 10:29:49 PM
to deal.II User Group
Thank you. I actually have no complaints about the performance.
I just want to know what is going on in deal.II for a paper I am working on.

Petr Krysl

Mar 13, 2024, 12:46:55 AM
to deal.II User Group
Here is a data point: 
```
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$ make
Consolidate compiler generated dependencies of target step-3
[ 50%] Building CXX object CMakeFiles/step-3.dir/step-3.cc.o
[100%] Linking CXX executable step-3
[100%] Built target step-3
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$ time ./step-3
Number of active cells: 262144
Number of degrees of freedom: 263169
Took: 5.520668
DEAL:cg::Starting value 0.00779724
DEAL:cg::Convergence step 583 value 7.44264e-09

real    1m2.503s
user    4m45.171s
sys     0m2.427s
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$

```
I measured
```
#include <cstdio> // for printf()
#include <ctime>  // for clock()

clock_t start = clock();
DoFTools::make_sparsity_pattern(dof_handler, dsp);
clock_t end = clock();
printf("Took: %f\n", (double)(end - start) / CLOCKS_PER_SEC);
```
5.5 seconds of sparsity pattern construction compared to
62 seconds of total run time is definitely not negligible, wouldn't you agree?

Bruno Turcksin

Mar 13, 2024, 9:56:57 AM
to dea...@googlegroups.com
I agree, in that case it's not negligible. Be aware that by default deal.II is multithreaded, so if you ran deal.II without explicitly disabling multithreading or forcing deal.II to use a single core, the code was using all the cores available when it could.

Best,

Bruno

Petr Krysl

Mar 13, 2024, 10:48:34 AM
to deal.II User Group
Something is very wrong. I checked with my own code on the same machine:
heat equation with 1M quads; the conductivity matrix
is completely assembled in 1.9 seconds. The deal.II mesh is smaller, yet just the sparsity pattern takes 3x as long.
And this can't be multithreading, can it?

Bruno Turcksin

Mar 13, 2024, 11:03:50 AM
to dea...@googlegroups.com
Yes, something is strange. Are you running in release mode? deal.II is a lot slower in debug mode because of all the asserts we have. Can you try `make release` in the step-3 directory? We actually use step-3 as one of our performance benchmarks, using 2 threads. As you can see here, the assembly takes 0.6 seconds.

Petr Krysl

Mar 13, 2024, 11:17:16 AM
to deal.II User Group
Oh, wow! You must be doing a great deal of checking in debug mode!!!

All fixed now:

```
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$ time DEAL_II_NUM_THREADS=1 ./step-3
Number of active cells: 1048576
Number of degrees of freedom: 1050625
Sparsity: 0.496119
Assembly: 0.741549
DEAL:cg::Starting value 0.00390244
DEAL:cg::Failure step 1000 value 4.73155e-07
terminate called after throwing an instance of 'dealii::SolverControl::NoConvergence'
```

But: the sparsity pattern is definitely not negligible compared to the overall assembly time!

Petr Krysl

Mar 13, 2024, 11:56:17 AM
to deal.II User Group
Btw: the assembly does NOT appear to be multithreaded. Only the CG solve...

Bruno Turcksin

Mar 13, 2024, 12:37:35 PM
to dea...@googlegroups.com
In step-3, the user assembles the matrix by writing their own for loop, so it is not multithreaded. To use multithreaded assembly, you need to use mesh_loop, as in step-16 for example.

deal.II's own functions are sometimes multithreaded and sometimes not. Usually, if someone (most often a developer) would get a significant speedup by multithreading a function, then it gets multithreaded. Since we are not keeping track of which functions are multithreaded and which are not, I cannot tell you what's multithreaded in step-3. The solver is multithreaded; maybe some other functions are too, maybe not.

Petr Krysl

Mar 13, 2024, 12:39:33 PM
to deal.II User Group
Thanks. That makes sense.

By the way: how do you handle multithreading at one level of the library interfering with multithreading at another?
For instance, BLAS using its own thread scheduling...

Bruno Turcksin

Mar 13, 2024, 1:24:03 PM
to dea...@googlegroups.com
This is a difficult problem. We have talked about it before, but if I remember correctly, we couldn't find a satisfying solution, so we just pretend there is no problem. It has happened that a multithreaded part of deal.II calls into BLAS and you oversubscribe the machine. Since we don't use OpenMP and the BLAS library usually does, you can play with OMP_NUM_THREADS and DEAL_II_NUM_THREADS to work around that issue.
