PETSc Sparse LU Preallocation


Lucas Campos

Dec 7, 2017, 5:12:29 AM
to deal.II User Group

Dear all,

Currently I am using a direct LU solver via PETSc/MUMPS to solve my linear system. However, I have noticed that I spend a lot of time in allocation at every step. Is it possible (or useful) to preallocate the internal structures necessary to solve the matrix? According to [1], it is possible to do this if I use UMFPack, but it seems I would need to change a bit more code to still work with MPI, so it would be simpler to do it while staying with PETSc/MUMPS.
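
For reference, such a direct solve in deal.II goes through the PETScWrappers::SparseDirectMUMPS class, roughly along the lines of the sketch below. The function and variable names are placeholders, and the header names and exact constructor signature depend on the deal.II version in use:

#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/petsc_solver.h>
#include <deal.II/lac/petsc_sparse_matrix.h>
#include <deal.II/lac/petsc_vector.h>

using namespace dealii;

// Minimal sketch of a PETSc/MUMPS direct solve. 'system_matrix',
// 'solution', 'system_rhs' and 'mpi_communicator' stand in for whatever
// the surrounding program defines.
void solve_direct(const PETScWrappers::MPI::SparseMatrix &system_matrix,
                  PETScWrappers::MPI::Vector             &solution,
                  const PETScWrappers::MPI::Vector       &system_rhs,
                  MPI_Comm                                mpi_communicator)
{
  // The tolerance is essentially irrelevant for a direct solver, but the
  // interface still requires a SolverControl object.
  SolverControl solver_control(1, 1e-10);

  PETScWrappers::SparseDirectMUMPS solver(solver_control, mpi_communicator);

  // solve() computes the LU factorization and applies it to the
  // right-hand side; the allocation mentioned above happens inside
  // this call.
  solver.solve(system_matrix, solution, system_rhs);
}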

Bests,
Lucas

Wolfgang Bangerth

Dec 7, 2017, 9:40:30 AM
to dea...@googlegroups.com
On 12/07/2017 03:12 AM, Lucas Campos wrote:
>
> Currently I am using a direct LU solver via PETSc/MUMPS to solve my matrix.
> However, I have noticed that I spend a lot of time in allocation, at every
> step. Is it possible (or useful) to preallocate the internal structures
> necessary to solve the matrix? According to [1], it is possible to do it if I
> use UMFPack, but it seems I would need to change a bit more code to still work
> with MPI, so would be simpler to do it while using PETSc/MUMPS.

Lucas -- I don't think I entirely understand the question. Where is the time
spent allocating memory? When you build the matrix, or when MUMPS works on it
to compute the LU decomposition?

You have no control over what MUMPS does -- it's a black box from our
perspective. Or do you know whether PETSc allows setting parameters that are
then passed to MUMPS?

Best
W.

--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Lucas Campos

Dec 7, 2017, 10:28:40 AM
to dea...@googlegroups.com
Dear Wolfgang,

I am sorry for not being clear. 

> Lucas -- I don't think I entirely understand the question. Where is the time spent allocating memory? When you build the matrix, or when MUMPS works on it to compute the LU decomposition?

I mean that the LU decomposition solver takes a huge amount of RAM, and it seems to me that allocating that once and reusing the space would be better. Attached you can find a simple graph* showing how the free memory evolves over time. I ran an instance of my program with around 164k cells, running on 7 threads. As you can see, the solving step consumes a lot of RAM and then deallocates it after the solver finishes. What I wonder is whether it is useful and possible to do this allocation/freeing just once, at the start of the program.

> You have no control over what MUMPS does -- it's a black box from our perspective. Or do you know whether PETSc allows setting parameters that are then passed to MUMPS?

I don't know if PETSc allows changing MUMPS' configuration. In fact, I have only ever used PETSc via deal.II. What I understood from the documentation was that UMFPack would allow me to use a single allocation, but currently I am not 100% sure how to make it play nice with the PETSc interface, and I want to check if there is a simpler/more direct way before diving into it.

*: This graph is less than scientific. I simply ran free every 5 seconds while my program ran, but I think it gets the point across.

Bests,
Lucas




(Attachment: memory.png)

Lucas Campos

Dec 7, 2017, 10:31:35 AM
to dea...@googlegroups.com
Dear Wolfgang,

I just noticed a minor mistake in the graph: two of the arrows are slightly misplaced. While this does not change the main message, I am still sending you the corrected figure.

Lucas
(Attachment: memory.png)

Wolfgang Bangerth

Dec 7, 2017, 10:44:46 AM
to dea...@googlegroups.com, Lucas Campos

Lucas,

> I mean that the LU Decomposition solver takes a huge amount of RAM, and it
> seems to me that allocating that once and reusing the space would be better.
> Attached you can find a simple graph* showing how the free memory in time. I
> ran an instance of my program using around 164k cells, running on 7 threads.
> As you can see, the solving step consumes a lot of RAM, and then deallocates
> it after the solver finishes. What I wonder is if it is useful and possible to
> just do this allocation/freeing once, at the start of the program.

I don't think it matters. First, you can't know how much memory you're going
to use if you do a sparse LU decomposition. It all depends on the sparsity
pattern of the matrix. Second, MUMPS does this internally -- I don't think you
have control over it. Third, sparse decompositions are so expensive to compute
that the cost of memory allocation is likely completely negligible.

I think there's little to gain from trying to tweak the memory allocation part
of this. The fact that MUMPS takes a lot of memory is also nothing you can
change -- that's just what you get for using a direct solver.

Nice graph, though. It clearly shows what's going on!

Cheers

Lucas Campos

Dec 7, 2017, 10:58:02 AM
to deal.II User Group
Dear Wolfgang,

> I think there's little to gain from trying to tweak the memory allocation part of this. The fact that MUMPS takes a lot of memory is also nothing you can change -- that's just what you get for using a direct solver.

Thanks for your thoughts on this. I have tried to use iterative solvers, but they all seem to fail at some point, and it seems that the maximum number of iterations in every one of them is capped at 10000. I will give them another try before accepting this RAM-devouring fate. In the worst case, I hope that when I distribute this to several nodes the "parallel RAM" will (even if only slightly) compensate for this effect.

> Nice graph, though. It clearly shows what's going on!

Thank you! I hope it makes up for the unclear first post!

Bests,
L

Wolfgang Bangerth

Dec 7, 2017, 12:26:24 PM
to dea...@googlegroups.com

> Thanks for your thoughts on this. I have tried to use iterative solvers,
> but they all seem to fail after at some point,

Yes, you need good preconditioners!

> and it seems that the
> maximum number of iterations in every one of them is capped at 10000.

No, you can select that in the SolverControl object. But if you need
more than 10,000 iterations, then that really is no longer efficient.
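
Concretely, the iteration cap is just the first constructor argument of the SolverControl object; a minimal sketch, with purely illustrative numbers:

#include <deal.II/lac/solver_control.h>

using namespace dealii;

// Allow up to 50,000 iterations and stop once the residual drops below
// 1e-10 (both numbers are only illustrative).
SolverControl solver_control(50000, 1e-10);

// The control object is then handed to whichever iterative solver is
// used, e.g.  SolverCG<Vector<double>> cg(solver_control);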

Best

Lucas Campos

Dec 8, 2017, 3:21:36 AM
to dea...@googlegroups.com
> Yes, you need good preconditioners!

I am investigating this now. I have to admit I was overwhelmed by the plethora of options, but I am going through Barrett et al.'s Templates book now, and I hope it will help.
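
For what it's worth, a preconditioned iterative solve with the PETSc wrappers looks roughly like the sketch below. BoomerAMG is just one of many possible preconditioners, CG assumes a symmetric positive definite matrix, the variable names are placeholders, and the exact constructor signatures depend on the deal.II version:

#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/petsc_solver.h>
#include <deal.II/lac/petsc_precondition.h>
#include <deal.II/lac/petsc_sparse_matrix.h>
#include <deal.II/lac/petsc_vector.h>

using namespace dealii;

// Sketch: CG with an algebraic multigrid preconditioner (BoomerAMG).
void solve_iterative(const PETScWrappers::MPI::SparseMatrix &system_matrix,
                     PETScWrappers::MPI::Vector             &solution,
                     const PETScWrappers::MPI::Vector       &system_rhs,
                     MPI_Comm                                mpi_communicator)
{
  // Stop after 10,000 iterations or once the residual is below 1e-10.
  SolverControl solver_control(10000, 1e-10);

  PETScWrappers::SolverCG solver(solver_control, mpi_communicator);

  PETScWrappers::PreconditionBoomerAMG preconditioner;
  preconditioner.initialize(system_matrix);

  solver.solve(system_matrix, solution, system_rhs, preconditioner);
}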





Wolfgang Bangerth

Dec 8, 2017, 10:49:20 AM
to dea...@googlegroups.com
On 12/08/2017 01:21 AM, Lucas Campos wrote:
> > Yes, you need good preconditioners!
>
> I am investigating this now. I have to admit I was overwhelmed with the
> plethora of options, but I am going through Barrett et al's Template Book now,
> and I hope it will help.

That's pretty old.

Take a look at the (many) video lectures I've recorded on the topic.

Jie Cheng

Dec 17, 2017, 12:19:48 AM
to deal.II User Group
Hi Lucas and Wolfgang

I have something to say on this issue because I think it might be helpful to Lucas or other users. I know large memory usage is unavoidable in direct solvers. But I guess Lucas' issue with MUMPS was that he did not know how to reuse the factorization, like one always does with SparseDirectUMFPACK. MUMPS does not have initialize and vmult functions similar to UMFPACK's, and its documentation is not clear on how to reuse the memory.
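
For readers who have not used it, the factorize-once/apply-many pattern with SparseDirectUMFPACK looks roughly like the sketch below; note that it operates on deal.II's serial SparseMatrix and Vector types, which is why it does not combine easily with the PETSc MPI objects discussed above (the function and variable names are placeholders):

#include <deal.II/lac/sparse_direct.h>
#include <deal.II/lac/sparse_matrix.h>
#include <deal.II/lac/vector.h>

using namespace dealii;

void solve_with_umfpack(const SparseMatrix<double> &system_matrix,
                        Vector<double>             &solution,
                        const Vector<double>       &system_rhs)
{
  SparseDirectUMFPACK direct_solver;

  // Compute and store the LU factorization once.
  direct_solver.initialize(system_matrix);

  // Apply the stored factorization; this can be repeated for many
  // right-hand sides without refactorizing.
  direct_solver.vmult(solution, system_rhs);
}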

Upon checking this discussion, I realize the MUMPS solver does have the ability to reuse the factorization; it's just that the API of this class is not consistent with UMFPACK or TrilinosWrappers::SolverDirect (there is actually a GitHub issue for it).

In my case, I managed to reuse the factorization with a workaround: I construct the MUMPS solver in the constructor of my upper-level class and then call MUMPS::solve in another member function of that class. Therefore, as long as I do not reset the upper-level object, the memory reallocation does not happen. With this workaround, the time consumption of MUMPS in a specific problem drops from 49 s to 7 s (one processor), which, compared with UMFPACK (9 s), is quite satisfactory.
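
Such a workaround might look roughly like the sketch below. The class and member names are invented for illustration, and the constructor signature follows the deal.II version current at the time of this thread; the essential point is only that the SparseDirectMUMPS object is constructed once and outlives the individual calls to solve():

#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/petsc_solver.h>
#include <deal.II/lac/petsc_sparse_matrix.h>
#include <deal.II/lac/petsc_vector.h>

using namespace dealii;

// Hypothetical upper-level class holding the MUMPS solver as a member,
// so the same solver object is reused by every call to solve_system().
class TimeStepper
{
public:
  TimeStepper(MPI_Comm mpi_communicator)
    : solver_control(1, 1e-10)
    , direct_solver(solver_control, mpi_communicator)
  {}

  void solve_system(const PETScWrappers::MPI::SparseMatrix &system_matrix,
                    PETScWrappers::MPI::Vector             &solution,
                    const PETScWrappers::MPI::Vector       &system_rhs)
  {
    // The solver object persists across time steps, which is what avoids
    // the repeated set-up cost described above.
    direct_solver.solve(system_matrix, solution, system_rhs);
  }

private:
  SolverControl                    solver_control;
  PETScWrappers::SparseDirectMUMPS direct_solver;
};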

Thanks
Jie
