Handling NaN Values in TrilinosWrappers::SparseMatrix with Amesos_SuperluDist

Simon

Dec 18, 2024, 3:28:25 PM
to deal.II User Group
Dear all,

In the attached example, I deliberately assign NaN values to a TrilinosWrappers::SparseMatrix to test how the Amesos_SuperluDist direct solver handles such inputs. The code is run using deal.II version 9.4.0.

Unfortunately, the program results in a segmentation fault. Specifically, no exception (TrilinosWrappers::SolverDirect::ExcTrilinosError) is thrown when the matrix containing NaN values is passed to the solver's solve function.

In contrast, when running a sequential program with SparseDirectUMFPACK, the solver correctly handles the issue, and I am able to catch the exception SparseDirectUMFPACK::ExcUMFPACKError.

Can this be considered a Trilinos bug? Has this issue been addressed in newer versions of deal.II or Trilinos?

Clearly, assigning NaN values to a matrix does not make much sense. However, in the optimized release version of my program, bad inputs can occasionally lead to cases where the cell matrix contains NaN values. In such situations, I rely on catching exceptions (e.g., from SparseDirectUMFPACK) to handle the error gracefully.


Best,
Simon
main.cc

Wolfgang Bangerth

Dec 18, 2024, 5:53:42 PM
to dea...@googlegroups.com

Simon:

> Unfortunately, the program results in a segmentation fault. Specifically, no
> exception (TrilinosWrappers::SolverDirect::ExcTrilinosError) is thrown when
> the matrix containing NaN values is passed to the solver's solve function.

If you run the program in a debugger, can you find out where the segfault happens?


> In contrast, when running a sequential program with SparseDirectUMFPACK, the
> solver correctly handles the issue, and I am able to catch the exception
> SparseDirectUMFPACK::ExcUMFPACKError.
>
> Can this be considered a Trilinos bug? Has this issue been addressed in newer
> versions of deal.II or Trilinos?

Segfaults should never happen. Better error messages are always better, but I
don't know whether this has been addressed. Perhaps this is your chance to try
a newer version of deal.II? ;-)


> Clearly, assigning NaN values to a matrix does not make much sense. However,
> in the optimized release version of my program, bad inputs can occasionally
> lead to cases where the cell matrix contains NaN values. In such situations, I
> rely on catching exceptions (e.g., from SparseDirectUMFPACK) to handle the
> error gracefully.

Perhaps a better approach than seeing whether a linear solver succeeds would
be to first check whether the matrix/rhs have NaNs before you hand them off to
the solver. This can be done cheaply via the Vector::l1_norm() and
SparseMatrix::frobenius_norm() functions, which simply add up the absolute
values (or the squares) of the entries. If the result is NaN, you know
something is amiss.
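The norm check suggested above can be sketched without deal.II as follows. This is a minimal standalone illustration, assuming the matrix's stored values and the right-hand side are available as flat `std::vector<double>` arrays; `l1_norm`, `frobenius_norm` and `system_looks_sane` here are hypothetical free functions that mimic what the deal.II member functions compute. In real code you would call `system_matrix.frobenius_norm()` and `system_rhs.l1_norm()` directly. A single NaN entry turns either sum into NaN because any arithmetic involving NaN yields NaN.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sum of |x_i|: the quantity Vector::l1_norm() returns.
// One NaN entry makes the whole sum NaN.
double l1_norm(const std::vector<double> &v)
{
  double sum = 0.0;
  for (const double x : v)
    sum += std::abs(x);
  return sum;
}

// sqrt(sum of squares) over the stored entries: the quantity
// SparseMatrix::frobenius_norm() returns (entries outside the sparsity
// pattern are zero and contribute nothing).
double frobenius_norm(const std::vector<double> &stored_values)
{
  double sum = 0.0;
  for (const double x : stored_values)
    sum += x * x;
  return std::sqrt(sum);
}

// Hypothetical helper: true if both norms are finite, i.e. no NaN (and no Inf)
// made it into the linear system.
bool system_looks_sane(const std::vector<double> &matrix_values,
                       const std::vector<double> &rhs)
{
  return std::isfinite(frobenius_norm(matrix_values)) &&
         std::isfinite(l1_norm(rhs));
}
```

Testing with `std::isfinite` rather than `std::isnan` also catches Inf entries, which produce an infinite norm rather than a NaN one. The trade-off is that the sum of squares can overflow for very large but finite entries (above roughly 1e154), which would then be reported as bad as well.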

Best
W.


--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/


Simon

Dec 19, 2024, 4:37:45 AM
to deal.II User Group
> If you run the program in a debugger, can you find out where the segfault happens?

Yes, the starting point of the trace is "TrilinosWrappers::SolverDirect::do_solve", and the segmentation fault occurs in the libsuperlu_dist library. 

Looking at the do_solve function, it seems that the exception TrilinosWrappers::SolverDirect::ExcTrilinosError is meant to be thrown. With Amesos_Mumps this exception is correctly thrown; unfortunately, with Amesos_Superludist it is not. :(

As you mentioned, it might be worthwhile to try a newer version of deal.II, which provides wrappers for the more advanced Amesos2 and Tpetra packages.


> Perhaps a better approach than seeing whether a linear solver succeeds would
> be to first check whether the matrix/rhs have NaNs before you hand them off to
> the solver.

My motivation for avoiding such checks was to keep the optimized release code free of unnecessary overhead, trusting that the solver would throw appropriate exceptions when fed with bad input. However, I probably cannot rely on that for all solvers, and manually checking for NaN is the more reliable approach.
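One way to keep the exception-based error handling while no longer depending on each solver's behavior is to run the check yourself and throw before the solver is ever called. The sketch below assumes flat `std::vector<double>` storage; `NonFiniteSystemError`, `all_finite` and `check_linear_system` are hypothetical names, not part of deal.II or Trilinos. It tests each entry with `std::isfinite` instead of summing, a variant of the norm check that cannot overflow.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical exception type for a linear system that contains NaN or Inf.
struct NonFiniteSystemError : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// True if no entry is NaN or +/-Inf. One pass over the data, no extra memory.
// Testing each entry (rather than a sum of squares) cannot overflow, so large
// but finite entries are never reported as bad.
bool all_finite(const std::vector<double> &values)
{
  for (const double x : values)
    if (!std::isfinite(x))
      return false;
  return true;
}

// Hypothetical guard to call right before handing the system to any direct
// solver. It turns bad input into an exception the caller can catch,
// regardless of how the particular solver would react to NaNs.
void check_linear_system(const std::vector<double> &matrix_values,
                         const std::vector<double> &rhs)
{
  if (!all_finite(matrix_values))
    throw NonFiniteSystemError("system matrix contains NaN or Inf entries");
  if (!all_finite(rhs))
    throw NonFiniteSystemError("right-hand side contains NaN or Inf entries");
}
```

In a parallel program each rank only sees its locally owned entries, so a per-entry check like this would need its result combined across ranks (for example, by taking Utilities::MPI::max() of an integer flag) before throwing, so that all ranks take the same branch. The deal.II norm functions on distributed matrices and vectors should already return global values, which avoids this extra step.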

-Simon

Wolfgang Bangerth

Dec 19, 2024, 12:15:27 PM
to dea...@googlegroups.com

> *If you run the program in a debugger, can you find out where the segfault
> happens?*
>
> Yes, the starting point of the trace is
> "TrilinosWrappers::SolverDirect::do_solve", and the segmentation fault occurs
> in the libsuperlu_dist library.
>
> Looking at the do_solve function, it seems that the exception
> TrilinosWrappers::SolverDirect::ExcTrilinosError is meant to be thrown. With
> Amesos_Mumps this exception is correctly thrown; unfortunately, with
> Amesos_Superludist it is not. :(

You might want to report that to them.


> *Perhaps a better approach than seeing whether a linear solver succeeds would
> be to first check whether the matrix/rhs have NaNs before you hand them off to
> the solver. *
>
> My motivation for avoiding such checks was to keep the optimized release code
> free of unnecessary overhead, trusting that the solver would throw appropriate
> exceptions when fed with bad input. However, I probably cannot rely on that
> for all solvers, and manually checking for NaN is the more reliable approach.

In fairness, I think it's reasonable for a solver to expect a matrix that at
least doesn't have NaNs in it. It shouldn't segfault, of course, but I also
don't think that it ought to check for NaNs itself.

Compared to the overall cost of a direct solver, going over all matrix entries
once and summing them up is quite cheap. It doesn't seem worth it (to me) to
avoid the check, given that it makes it easier for you to get good error
messages from your code.
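To put "quite cheap" into rough numbers, here is a standard textbook estimate (an addition, not from the thread). It assumes a finite element matrix of size N with a bounded number of nonzeros per row, so that nnz = O(N), and a nested-dissection fill-reducing ordering on a regular mesh:

```latex
\text{cost}_{\text{check}} = \mathcal{O}(\mathrm{nnz}) = \mathcal{O}(N),
\qquad
\text{cost}_{\text{factorization}} =
\begin{cases}
  \mathcal{O}(N^{3/2}) & \text{2D meshes},\\[2pt]
  \mathcal{O}(N^{2})   & \text{3D meshes}.
\end{cases}
```

Under these assumptions the relative cost of the check shrinks like N^{-1/2} in 2D and N^{-1} in 3D as the problem grows.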