Direct solvers, adaptivity and memory issues


Marco Favino

Sep 29, 2016, 10:29:59 AM
to moose-users
Hi all,

Our application models fractures in a rock.
We use adaptive mesh refinement to obtain a fine mesh at fracture locations and a coarse mesh otherwise.
We specify the locations of the fractures and, thus, where the refinement has to happen.
As the iterative solvers do not converge for our problem, we use a direct solver (MUMPS or SuperLU).
Depending on the number of fractures in the model, the application crashes at an earlier or later mesh-refinement step.
SuperLU reports the error "Not enough memory to perform factorisation."
MUMPS simply produces a segmentation fault.
However, we are running this code on a fat node of our cluster with 512 GB of memory.
The log file indicates that at most 256 GB of memory were in use, and no other user was on that node at the time.
In theory, there should therefore have been plenty of memory available.
So the question is: why did it crash?
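For reference, we select the direct solver through PETSc options in the Executioner block, roughly along these lines (a sketch rather than our exact input; on newer PETSc versions the option is -pc_factor_mat_solver_type instead of -pc_factor_mat_solver_package):

  [Executioner]
    type = Steady                      # our actual executioner settings differ
    solve_type = NEWTON
    # Direct solve: full LU factorization through an external package
    petsc_options_iname = '-pc_type -pc_factor_mat_solver_package'
    petsc_options_value = 'lu mumps'   # or 'lu superlu_dist'
  []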

Thanks in advance
Marco and Jürg


Peterson, JW

Sep 29, 2016, 10:55:00 AM
to moose-users
There may have been 256 GB of memory *in use* when the code crashed, but you would also need to know how much memory was being *requested* at the time of the crash.  It might be some ridiculous amount due to, e.g., some other bug in SuperLU/MUMPS...
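If you are going through PETSc's MUMPS interface, one way to see what is actually being requested (a sketch; -mat_mumps_icntl_<n> is how PETSc exposes MUMPS' ICNTL parameters, so availability and spelling can depend on your PETSc version) is to raise MUMPS' own verbosity so it prints its estimated and allocated working memory:

  # In the [Executioner] block: ICNTL(4) = 2 asks MUMPS to print errors,
  # warnings and its main statistics, including per-process memory estimates.
  petsc_options_iname = '-pc_type -pc_factor_mat_solver_package -mat_mumps_icntl_4'
  petsc_options_value = 'lu mumps 2'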

--
John

Benjamin Spencer

Sep 29, 2016, 10:58:19 AM
to moose-users
As you have observed, the downside of direct solvers is that they have significantly higher memory requirements than iterative solvers, especially for larger models. We have tracked the history of memory usage with both SuperLU and MUMPS, and both of them show very large spikes in memory usage at certain points during the solve. They are both very powerful tools for moderately sized problems, but they are not as scalable as iterative solvers, and it sounds like you are running up against their limits with the size of your model.
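If the finer refinement levels push you past what a direct solver can handle, a common fallback to try (a sketch only; it assumes hypre was built into your PETSc, and whether it converges for a fracture problem depends on the physics and on having a good Jacobian) is a Krylov solver preconditioned with algebraic multigrid:

  [Executioner]
    type = Steady
    solve_type = NEWTON
    # GMRES (PETSc's default KSP) preconditioned with hypre's BoomerAMG
    petsc_options_iname = '-pc_type -pc_hypre_type'
    petsc_options_value = 'hypre boomeramg'
  []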

-Ben



Kong, Fande

Sep 29, 2016, 11:10:34 AM
to moose...@googlegroups.com
On Thu, Sep 29, 2016 at 8:29 AM, Marco Favino <phd...@gmail.com> wrote:
Hi all,

Our application models fractures in a rock.
We use adaptive mesh refinement to obtain a fine mesh at fracture locations and a coarse mesh otherwise.
We specify the locations of the fractures and, thus, where the refinement has to happen.
As the iterative solvers do not converge for our problem, we use a direct solver (MUMPS or SuperLU).

Iterative solvers usually work as long as you have a CORRECT Jacobian matrix, unless your problem is extremely difficult. I suspect your problem is fine in that respect, because you do not report any issue with the convergence of the nonlinear solver.
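If you want to rule out a bad Jacobian before writing off the iterative solvers, PETSc can compare the hand-coded Jacobian against a finite-difference approximation. A sketch of how to invoke it (the application and input-file names are placeholders; the option spelling depends on the PETSc version, and PETSc options given on the command line are forwarded by MOOSE):

  # Newer PETSc: check the coded Jacobian against a finite-difference one
  ./your-app-opt -i your_input.i -snes_test_jacobian

  # Older PETSc (around 2016) used a dedicated SNES type for the same check
  ./your-app-opt -i your_input.i -snes_type test -snes_test_display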

Fande,
 

