Sparsity pattern

Petr Krysl

Mar 12, 2024, 12:58:37 PM
to deal.II User Group
It would appear that the sparsity pattern is not (and cannot be) computed in parallel. That is my reading of the code. Am I wrong?

Bruno Turcksin

Mar 12, 2024, 1:39:22 PM
to deal.II User Group
Hello,

SparsityPattern is used for deal.II's own matrices, which do not support MPI. As far as I can tell, SparsityPattern does not use multithreading, and I don't think adding elements is thread-safe. So you are right.

Best,

Bruno

Petr Krysl

Mar 12, 2024, 2:17:56 PM
to deal.II User Group
Thanks, that is a helpful confirmation.
Would you know if there is any multithreaded sparsity pattern computation available anywhere?
Is something like this planned for deal.II?

Bruno Turcksin

Mar 12, 2024, 3:08:29 PM
to dea...@googlegroups.com
In Trilinos, the Tpetra library has a Graph class, which is basically a SparsityPattern. Tpetra uses Kokkos (which abstracts multithreading and GPU support) everywhere, so I would expect that they use Kokkos to build the Graph too. Multithreading for deal.II's SparsityPattern is not planned so far. It could be done, but building the SparsityPattern is usually extremely fast compared to the rest of the code, so it's not worth doing.

Bruno

--
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see https://groups.google.com/d/forum/dealii?hl=en

Petr Krysl

Mar 12, 2024, 5:15:22 PM
to deal.II User Group
Good point about Tpetra: I will check it out.

> SparsityPattern is usually extremely fast
If it is the only sequential part of a program executed with multiple threads, it can still limit the scalability.

Which reminds me: in your WorkStream paper, you do not report the time taken to construct the sparsity pattern.
Could you tell me how expensive it was compared to the sequential assembly of the numbers into the matrix?

Many thanks,

Petr

Petr Krysl

Mar 12, 2024, 5:22:34 PM
to deal.II User Group
That is a good point about Tpetra: I will check it out.

> SparsityPattern is usually extremely fast
It can still become a bottleneck, provided everything else runs in parallel. ;-)
Which reminds me: I wanted to ask you about the WorkStream paper: the sparsity pattern computation was not reported.
Could you tell me how expensive it was compared to the sequential assembly of the matrix?
For linear problems, both the sparsity pattern computation and the computation of the entries need to be done.
Therefore, it does not really help if the computation of the entries runs in parallel when the sparsity pattern is done in serial.
Isn't that right?

Petr Krysl

Mar 12, 2024, 5:22:38 PM
to deal.II User Group
Oops. I was wondering what happened to my response; I did not realize it was subject to approval, and posted another attempt at framing the answer. Sorry.

Bruno Turcksin

Mar 12, 2024, 5:46:07 PM
to dea...@googlegroups.com
I am not sure why but I need to approve all of your messages. Usually we only need to approve the first time someone posts. Not sure what's going on here.

I don't remember how long the computation of the sparsity pattern took, but it was small. The reason this part is not parallelized in deal.II is that nobody has found it to be a bottleneck. If you fill your matrix with ones, you probably want the sparsity pattern computation to be multithreaded. However, if you are assembling a mass matrix on a non-uniform mesh, you have so much more work to do that it doesn't matter whether you build your sparsity pattern in serial. In theory, I agree everything should be multithreaded, but in practice it has not been worth the effort so far. If the performance of the assembly is really critical, you should use the MatrixFree class instead of building a matrix.

Bruno

Petr Krysl

Mar 12, 2024, 8:38:12 PM
to deal.II User Group
In Matlab, calling sparse() is equivalent to building a sparsity pattern (just forget about the actual values)
from the I, J row and column indices. And that is NOT a negligible expense. It is a sizeable fraction of
the cost of building a Laplacian matrix on triangles, for instance.

Bruno Turcksin

Mar 12, 2024, 9:59:19 PM
to dea...@googlegroups.com
If building the sparsity pattern is the bottleneck in your code, feel free to open an issue and we can take a look at how we can improve it. Though if the assembly of the matrix is performance critical, using the MatrixFree class is the way to go.

Bruno


Petr Krysl

Mar 12, 2024, 10:29:49 PM
to deal.II User Group
Thank you. I actually have no complaints about the performance.
I just want to know what is going on in deal.II for a paper I am working on.

Petr Krysl

Mar 13, 2024, 12:46:55 AM
to deal.II User Group
Here is a data point: 
```
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$ make
Consolidate compiler generated dependencies of target step-3
[ 50%] Building CXX object CMakeFiles/step-3.dir/step-3.cc.o
[100%] Linking CXX executable step-3
[100%] Built target step-3
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$ time ./step-3
Number of active cells: 262144
Number of degrees of freedom: 263169
Took: 5.520668
DEAL:cg::Starting value 0.00779724
DEAL:cg::Convergence step 583 value 7.44264e-09

real    1m2.503s
user    4m45.171s
sys     0m2.427s
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$

```
I measured
```
#include <cstdio> // for printf()
#include <ctime>  // for clock()

clock_t start = clock();
DoFTools::make_sparsity_pattern(dof_handler, dsp);
clock_t end = clock();
printf("Took: %f\n", (double)(end - start) / CLOCKS_PER_SEC);
```
5.5 seconds of sparsity pattern construction compared to
62 seconds of total run time is definitely not negligible, wouldn't you agree?

Bruno Turcksin

Mar 13, 2024, 9:56:57 AM
to dea...@googlegroups.com
I agree, in that case it's not negligible. Be aware that by default deal.II is multithreaded, so if you ran deal.II without explicitly disabling multithreading or forcing deal.II to use a single core, the code was using all the cores available when it could.

Best,

Bruno

Petr Krysl

Mar 13, 2024, 10:48:34 AM
to deal.II User Group
Something is very wrong. I checked with my own code on the same machine:
heat equation with 1M quads; the conductivity matrix
is completely assembled in 1.9 seconds. The deal.II mesh is smaller, yet just the sparsity pattern takes 3x as long.
And this can't be multithreading, can it?

Bruno Turcksin

Mar 13, 2024, 11:03:50 AM
to dea...@googlegroups.com
Yes, something is strange. Are you running in release mode? deal.II is a lot slower in debug mode because of all the asserts we have. Can you try `make release` in the step-3 directory? We actually use step-3 as one of our performance benchmarks, using 2 threads. As you can see here, the assembly takes 0.6 seconds.

Petr Krysl

Mar 13, 2024, 11:17:16 AM
to deal.II User Group
Oh, wow! You must be doing a great deal of checking in debug mode!!!

All fixed now:

```
pkrysl@firenze:~/dealii-9.5.2/examples/step-3$ time DEAL_II_NUM_THREADS=1 ./step-3
Number of active cells: 1048576
Number of degrees of freedom: 1050625
Sparsity: 0.496119
Assembly: 0.741549
DEAL:cg::Starting value 0.00390244
DEAL:cg::Failure step 1000 value 4.73155e-07
terminate called after throwing an instance of 'dealii::SolverControl::NoConvergence'
```

But: the sparsity pattern is definitely not negligible compared to the overall assembly time!

Petr Krysl

Mar 13, 2024, 11:56:17 AM
to deal.II User Group
Btw: the assembly does NOT appear to be multithreaded. Only the CG solve...

Bruno Turcksin

Mar 13, 2024, 12:37:35 PM
to dea...@googlegroups.com
In step-3, the user assembles the matrix by writing their own for loop, so it is not multithreaded. To use multithreaded assembly, you need to use mesh_loop, as in step-16 for example.

deal.II's own functions are sometimes multithreaded and sometimes not. Usually, if someone (most often a developer) would get a significant speedup by multithreading a function, then it gets multithreaded. Since we are not keeping track of which functions are multithreaded and which are not, I cannot tell you what's multithreaded in step-3. The solver is multithreaded; maybe some other functions are too, maybe not.

Petr Krysl

Mar 13, 2024, 12:39:33 PM
to deal.II User Group
Thanks. That makes sense.

By the way: how do you handle multithreading at one level of the library interfering with multithreading at another?
For instance, BLAS using its own thread scheduling...

Bruno Turcksin

Mar 13, 2024, 1:24:03 PM
to dea...@googlegroups.com
This is a difficult problem. We have talked about it before, but if I remember correctly, we couldn't find a satisfying solution, so we just pretend there is no problem. It has happened that a multithreaded part of deal.II calls into BLAS and you oversubscribe the machine. Since we don't use OpenMP and the BLAS library usually does, you can play with OMP_NUM_THREADS and DEAL_II_NUM_THREADS to work around that issue.
