Sundials::Kinsol exceptions only checked in debug mode


Simon

Apr 26, 2025, 6:16:59 AM
to deal.II User Group
Dear all,

I am solving an elasticity BVP where a nonlinear system must be solved at each time step. To speed up my assembly, I am currently testing the SUNDIALS::KINSOL wrapper in deal.II version 9.4.0.

The main logic happens in
// call to KINSol
status = KINSol(kinsol_mem, solution, data.strategy, u_scale, f_scale);
AssertKINSOL(status);

where AssertKINSOL translates into an Assert(code >= 0, ExcKINSOLError), which is optimized away in release mode. Therefore, wrapping the solve() call in a try/catch block does not work in release mode.
However, I believe that catching ExcKINSOLError would also be valuable in release mode, for example when:
- The residual callback cannot be evaluated (returns a negative integer) in the first call.
- The residual callback cannot be evaluated five times in a row.
In those cases, KINSol returns a negative error code, but the solve function currently returns the number of iterations taken as if the solve had been successful.
Robust error handling strategies (if an exception were thrown and caught) could include switching to a different nonlinear solver or adjusting the outer time step size.
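
The recovery strategy mentioned above (adjusting the outer time step after a failed solve) could look roughly like the following sketch. It assumes the nonlinear solve signals failure by throwing. `solve_with_step_halving` and `solve_step` are hypothetical names, and `std::runtime_error` merely stands in for whatever exception the solver wrapper would throw.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical driver: 'solve_step' stands in for one nonlinear solve over a
// time step of size dt and is assumed to throw std::runtime_error on failure.
// Returns the step size for which the solve finally succeeded.
double solve_with_step_halving(const std::function<void(double)> &solve_step,
                               double                             dt,
                               const unsigned int                 max_retries)
{
  for (unsigned int attempt = 0; attempt <= max_retries; ++attempt)
    {
      try
        {
          solve_step(dt);
          return dt; // the solve succeeded with this step size
        }
      catch (const std::runtime_error &)
        {
          dt *= 0.5; // the solve failed: retry with half the step size
        }
    }
  throw std::runtime_error("nonlinear solve failed for all step sizes tried");
}
```

This only works if the failure actually reaches the caller as an exception, which is the point of the question.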

I am aware that the SUNDIALS wrappers have been updated since deal.II 9.4.0, but I believe the above issue still exists.

My questions are:
1. What are the reasons why the ExcKINSOLError is defined using Assert rather than AssertThrow? 
2. What can be done instead to catch the above errors?
(Of course, there is the option to compute the residual after the solve call, but I was hoping for a cheaper solution, ideally letting Kinsol itself decide if the nonlinear solve was successful.) 
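
For readers unfamiliar with the distinction behind question 1, here is a minimal sketch of a debug-only check versus an always-active check. The macros `SKETCH_ASSERT` and `SKETCH_ASSERT_THROW` are hypothetical stand-ins written in plain C++, not the deal.II macros, and the assumption that the debug-only variant aborts rather than throws is a simplification.

```cpp
#include <cstdio>
#include <cstdlib>
#include <stdexcept>

// Hypothetical stand-ins for the two kinds of checks. These are NOT the
// deal.II macros, only a sketch of the same idea.
#ifdef NDEBUG
// Release build: the check disappears entirely.
#  define SKETCH_ASSERT(cond, msg) ((void)0)
#else
// Debug build: print the message and abort the program.
#  define SKETCH_ASSERT(cond, msg)  \
    do                              \
      {                             \
        if (!(cond))                \
          {                         \
            std::fputs(msg, stderr); \
            std::abort();           \
          }                         \
      }                             \
    while (false)
#endif

// Always active, regardless of build mode: throws a catchable exception.
#define SKETCH_ASSERT_THROW(cond, msg)     \
  do                                       \
    {                                      \
      if (!(cond))                         \
        throw std::runtime_error(msg);     \
    }                                      \
  while (false)

// Debug-only check: in a release build a negative status passes silently.
int check_debug_only(const int status)
{
  SKETCH_ASSERT(status >= 0, "solver failed\n");
  return status;
}

// Unconditional check: a negative status always raises an exception.
int check_always(const int status)
{
  SKETCH_ASSERT_THROW(status >= 0, "solver failed");
  return status;
}
```

Only the second variant gives a surrounding try/catch block something to catch in a release build.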

Thank you.

Best,
Simon

Wolfgang Bangerth

Apr 27, 2025, 5:37:52 PM
to dea...@googlegroups.com

Simon:

> I am solving an elasticity BVP where a nonlinear system must be solved at each
> time step. To speed up my assembly, I am currently testing the
> SUNDIALS::KINSOL wrapper in dealii version 9.4.0.
>
> The main logic is happening during
> // call to KINSol
> status = KINSol(kinsol_mem, solution, data.strategy, u_scale, f_scale);
> AssertKINSOL(status);
>
> where the AssertKINSOL translates into an Assert(code>=0, ExcKINSOLError),
> which is optimized away in release mode. [...]
> My questions are:
> 1. What are the reasons why the ExcKINSOLError is defined using Assert rather
> than AssertThrow?
> 2. What can be done instead to catch the above errors?
> (Of course, there is the option to compute the residual after the solve call,
> but I was hoping for a cheaper solution, ideally letting Kinsol itself decide
> if the nonlinear solve was successful.)

This code has been rewritten a couple of years ago, and I think that it now
does exactly what you want it to do:
https://github.com/dealii/dealii/blob/master/source/sundials/kinsol.cc#L522-L567
This assumes that you have a way in your callbacks to throw an exception. This
is also documented here:
https://dealii.org/developer/doxygen/deal.II/DEALGlossary.html#GlossUserProvidedCallBack

Or perhaps I misunderstand? Do you have a situation where your callbacks do
what they are expected to do, but KINSOL still creates an error for legitimate
reasons?

Best
W.


Simon

Apr 28, 2025, 2:45:13 AM
to deal.II User Group
Dear Wolfgang,

attached, I have adjusted your "kinsol_06" test to the deal.II 9.4.0 format (callbacks return integers rather than throwing exceptions)
and forced it to report an evaluation failure five times in a row. In that situation, KINSol cannot recover and returns a negative status value.

I can only test the program in deal.II 9.4.0, where, in release mode, the solve() function ignores whatever KINSol returns.
But if I understand ...

     "This code has been rewritten a couple of years ago, and I think that it now
      does exactly what you want it to do:"

correctly, the attached program behaves differently in deal.II 9.6.0 by throwing an exception, thereby offering the possibility to handle those errors?

Best,
Simon
kinsol_06_v2.cc

Wolfgang Bangerth

Apr 29, 2025, 1:41:54 PM
to dea...@googlegroups.com

On 4/28/25 00:45, Simon wrote:
>
> attached, I have adjusted your "kinsol_06" test to dealii 9.4.0 format
> (callbacks return integers rather than throwing exceptions)
> and forced it to report an evaluation failure five times in a row. In
> that situation, KINSol cannot recover and returns a negative status value.

Ah, I see -- the error in that case does not originate in a user
callback, but instead is a legitimate error code of KINSOL not related
to a programming mistake. Indeed that should result in an exception, not
a program termination (or nothing at all, as currently in release mode).


> I can only test the program in dealii 9.4.0, where the .solve() function
> ignores in release mode whatever KINSol returns.
> But if I understand ...
>
>      "This code has been rewritten a couple of years ago, and I think
> that it now
>       does exactly what you want it to do:"
>
> correctly, the attached program behaves differently in dealii 9.6.0 by
> throwing an exception, thereby offering the possibility to handle those
> errors?

No, in this case the behavior has not changed -- but it should.

Do you know whether KINSOL returns different error codes based on
whether it just can't find a solution, or whether something is
conceptually wrong (i.e., a programming error)?

Would you be willing to create this test case for the current deal.II
sources as well? That would make it easier for me to come up with a patch.

Best
W.

Simon

Apr 30, 2025, 2:40:08 AM
to deal.II User Group
"Do you know whether KINSOL returns different error codes based on
whether it just can't find a solution, or whether something is
conceptually wrong (i.e., a programming error)?"

I believe there are currently 19 possible return values when KINSol does not succeed.
For instance, 'KIN_MAXITER_REACHED' or 'KIN_LINESEARCH_NONCONV' suggest that KINSol could not find a solution, whereas errors like
'KIN_FIRST_SYSFUNC_ERR' (and others) indicate recoverable errors.
That said, I am not sure whether any of these error codes are intended to indicate actual programming errors.
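
One possible way to group such codes is sketched below. The grouping itself is an assumption of this sketch, not something prescribed by the KINSOL documentation. The enum is a local, hypothetical mirror of a few KINSOL flag names, with arbitrary values that are not those defined in kinsol.h, and `classify` is a hypothetical helper.

```cpp
// Local, hypothetical mirror of a few KINSOL failure names. The enumerator
// values are arbitrary and are NOT the values defined in kinsol.h.
enum class KinsolFailure
{
  maxiter_reached,    // cf. KIN_MAXITER_REACHED
  linesearch_nonconv, // cf. KIN_LINESEARCH_NONCONV
  first_sysfunc_err,  // cf. KIN_FIRST_SYSFUNC_ERR
  reptd_sysfunc_err,  // cf. KIN_REPTD_SYSFUNC_ERR
  mem_null,           // cf. KIN_MEM_NULL
  ill_input           // cf. KIN_ILL_INPUT
};

enum class FailureKind
{
  solver_gave_up,  // the iteration ran but did not converge
  callback_failed, // the residual callback reported failures
  likely_misuse    // the solver was probably set up or called incorrectly
};

// Assumed grouping: which failures a caller might treat as "try again with
// different settings" and which ones point at a setup problem.
FailureKind classify(const KinsolFailure f)
{
  switch (f)
    {
      case KinsolFailure::maxiter_reached:
      case KinsolFailure::linesearch_nonconv:
        return FailureKind::solver_gave_up;
      case KinsolFailure::first_sysfunc_err:
      case KinsolFailure::reptd_sysfunc_err:
        return FailureKind::callback_failed;
      case KinsolFailure::mem_null:
      case KinsolFailure::ill_input:
        return FailureKind::likely_misuse;
    }
  return FailureKind::likely_misuse; // not reached; silences warnings
}
```

Under this grouping, the first two kinds would be candidates for a catchable exception, while the third kind is closer to a programming error.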

"No, in this case the behavior has not changed -- but it should.

Would you be willing to create this test case for the current deal.II
sources as well? That would make it easier for me to come up with a patch."

Attached is the test case for the deal.II 9.6.0 release. 
I was also able to install version 9.6.0 and run the program myself.
Currently, the program throws a 'StandardExceptions::RecoverableUserCallbackError',
which can be caught and handled as a recoverable error. However, I believe this is not sufficient. 
For example, if we limit the number of nonlinear iterations to two via the AdditionalData argument,
we can easily trigger a 'KIN_MAXITER_REACHED' error, which is still ignored in release mode.

Best,
Simon
kinsol_06_v2.cc

Wolfgang Bangerth

Apr 30, 2025, 4:03:22 PM
to dea...@googlegroups.com

On 4/30/25 00:40, Simon wrote:
>
> Attached is the test case for the deal.II 9.6.0 release.
> I was also able to install version 9.6.0 and run the program myself.
> Currently, the program throws a
> 'StandardExceptions::RecoverableUserCallbackError',
> which can be caught and handled as a recoverable error. However, I
> believe this is not sufficient.
> For example, if we limit the number of nonlinear iterations to two via
> the AdditionalData argument,
> we can easily trigger a 'KIN_MAXITER_REACHED' error -- still being
> ignored in release mode.

I tried that, but I still get the following with a slightly modified
testcase:

21: DEAL::Computing residual for the 1st time, at 10.0000
21: DEAL::Setting up Jacobian system at u=10.0000
21: DEAL::Computing residual for the 2nd time, at 10.0000
21: DEAL::Reporting recoverable failure.
21:
21: sundials/kinsol_06_v2.debug: RUN failed. ------ Additional output on stdout/stderr:
21:
21:
21: [KINSOL ERROR] KINSol
21: The linear solver's solve function failed recoverably, but the Jacobian data is already current.
21:
21: terminate called after throwing an instance of 'dealii::StandardExceptions::RecoverableUserCallbackError'
21: what():
21: --------------------------------------------------------
21: An error occurred in line <0> of file <> in function
21:
21: The violated condition was:
21:
21: Additional information:
21: A user call-back function encountered a recoverable error, but the
21: underlying library that called the call-back did not manage to recover
21: from the error and aborted its operation.
21:
21: See the glossary entry on user call-back functions for more
21: information.
21: --------------------------------------------------------

That actually sounds like a reasonable error message to me, and it's an
exception I can catch. I posted this testcase here:
https://github.com/dealii/dealii/pull/18404
So I can't reproduce what you are saying -- though I can see that what
you are saying is right in principle. Would you be willing to modify
this testcase in such a way that it actually shows the issue you mention?

Best
W.

Simon

May 2, 2025, 1:51:41 AM
to deal.II User Group
"So I can't reproduce what you are saying -- though I can see that what
you are saying is right in principle. Would you be willing to modify
this testcase in such a way that it actually shows the issue you mention?"

The only change I made to your new testcase was commenting out the check on count in line 61:
if ((u[0] < -10) || (u[0] > 20) /* || (count > 0) */)
After this modification, I get the following output:
------------------------------------------------------------------------------------------
Computing residual for the 1th time, at u=10

Setting up Jacobian system at u=10
Computing residual for the 2th time, at u=10
Computing residual for the 3th time, at u=-88.0839
Reporting recoverable failure.
Computing residual for the 3th time, at u=-39.0419
Reporting recoverable failure.
Computing residual for the 3th time, at u=-14.521
Reporting recoverable failure.
Computing residual for the 3th time, at u=-2.26049
Setting up Jacobian system at u=-2.26049
Computing residual for the 4th time, at u=-2.26049
Computing residual for the 5th time, at u=7.84693
Computing residual for the 6th time, at u=7.84693
Computing residual for the 7th time, at u=2.07902
Setting up Jacobian system at u=2.07902
Computing residual for the 8th time, at u=2.07902
Computing residual for the 9th time, at u=-1.23396
Computing residual for the 10th time, at u=-1.23396
Computing residual for the 11th time, at u=6.16274


[KINSOL ERROR]  KINSol
  The maximum number of iterations was reached before convergence.
------------------------------------------------------------------------------------------

In other words, if there is no pending exception from the callbacks, the status returned by KINSOL is ignored in release mode.
One solution would be to define the AssertKINSOL macro as "AssertThrow(...)" rather than "Assert(...)".

Best,
Simon

Wolfgang Bangerth

May 6, 2025, 4:43:32 PM
to dea...@googlegroups.com

Simon:

> That said, if there is no pending exception from the callbacks, the
> status returned by KINSOL
> is ignored in release mode.
> One solution would be to define the AssertKINSOL macro as
> "AssertThrow(...)", rather than "Assert(...)".

Yes, thanks for showing me a situation where this can happen! I made
this into a test, and changed the Assert into AssertThrow here:
https://github.com/dealii/dealii/pull/18428
I think that's what you had in mind, right?

Do you want to take a look whether the PETSC SNES and the Trilinos NOX
solvers behave in the same way and perhaps also need to be fixed?

Best & thanks for the collaboration!
W.

Simon

May 8, 2025, 4:44:54 AM
to deal.II User Group
"I think that's what you had in mind, right?"

Yes :)

"Do you want to take a look whether the PETSC SNES and the Trilinos NOX
solvers behave in the same way and perhaps also need to be fixed?"

I do not have Trilinos configured with NOX on my system.
For PETSC SNES, I attached a test that allows only a single nonlinear iteration.
Here, the error message (in release mode) is already very descriptive:

***************************************************************
Computing residual for the 1th time, at u=10
Setting up Jacobian system at u=10
Computing residual for the 2th time, at u=10
Computing residual for the 3th time, at u=-88.0839
Reporting recoverable failure.
Computing residual for the 3th time, at u=-39.0419
Reporting recoverable failure.
Computing residual for the 3th time, at u=-14.521
Reporting recoverable failure.
Computing residual for the 3th time, at u=-2.26049
Computing residual for the 4th time, at u=8.77395
Nonlinear Solver threw an exception with the following message:

--------------------------------------------------------
An error occurred in line <649> of file </calculate/temp/ltmadmin/spack-stage-dealii-9.6.0-zazikg5icx5chodqsv5hc7v77ksctgrj/spack-src/include/deal.II/lac/petsc_snes.templates.h> in function
    unsigned int dealii::PETScWrappers::NonlinearSolver<VectorType, PMatrixType, AMatrixType>::solve(VectorType&) [with VectorType = dealii::PETScWrappers::MPI::Vector; PMatrixType = dealii::PETScWrappers::MatrixBase; AMatrixType = dealii::PETScWrappers::MatrixBase]
The violated condition was:
    reason > 0
Additional information:
    SNES solver did not converge after 1 iterations with reason
    DIVERGED_MAX_IT

***************************************************************

Additionally, if the user callbacks fail repeatedly (e.g., after re-enabling the check on the count variable), the resulting error message remains informative.

Therefore, I have no suggestions regarding PETSC SNES. 

Best,
Simon

nonlsolver.cc

Wolfgang Bangerth

May 9, 2025, 7:56:12 PM
to dea...@googlegroups.com

Simon:

> I do not have Trilinos configured with NOX on my system.
> For PETSC SNES, I attached a test that allows only a single nonlinear iteration.
> But the error message (release mode) is very descriptive:
> [...]

Thanks for trying this out! I made your program into a test so that we can
check that this behavior keeps working in the future as well!
https://github.com/dealii/dealii/pull/18443

Best
W.


--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/


Simon

May 19, 2025, 2:52:57 PM
to deal.II User Group
" Thanks for trying this out! I made your program into a test so that we can
check that this behavior keeps working in the future as well!"

Excellent!

Do you have any suggestions for how I could work around the issue with my existing deal.II installation, i.e., without the fix?
Essentially, I need a way to determine whether KINSol was successful without access to the error code, particularly for the codes related to repeated evaluation failures.
One option is to compute the residual after the solve and compare it to the tolerances specified in AdditionalData.
However, can you think of a more efficient alternative that avoids residual assembly?
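
For reference, the residual-based fallback described above might be sketched as follows. It assumes that the relevant stopping criterion is a scaled maximum norm of the residual compared against a function tolerance; whether this matches KINSOL's internal test exactly should be checked against the SUNDIALS documentation. `scaled_max_norm` and `solve_looks_converged` are hypothetical names, and plain `std::vector<double>` stands in for the actual vector type.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Scaled maximum norm: max_i |f_scale[i] * residual[i]|.
// Both vectors are assumed to have the same length.
double scaled_max_norm(const std::vector<double> &residual,
                       const std::vector<double> &f_scale)
{
  double norm = 0.0;
  for (std::size_t i = 0; i < residual.size(); ++i)
    norm = std::max(norm, std::abs(f_scale[i] * residual[i]));
  return norm;
}

// Hypothetical post-solve check: accept the solution only if the scaled
// residual norm is below the tolerance that was requested from the solver.
bool solve_looks_converged(const std::vector<double> &residual,
                           const std::vector<double> &f_scale,
                           const double               function_tolerance)
{
  return scaled_max_norm(residual, f_scale) < function_tolerance;
}
```

The cost is one extra residual evaluation per solve, which is exactly what the question hopes to avoid.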

Best,
Simon

Wolfgang Bangerth

May 19, 2025, 8:18:34 PM
to dea...@googlegroups.com

> Do you have any suggestions for how I could work around the fix with my
> existing deal.II installation?
> Essentially, I need a way to determine whether KINSol was successful without
> access to the error code -- particularly those related to repetitive
> evaluation failures.
> One option is to compute the residual after the solve and compare it to the
> tolerances specified in AdditionalData.
> However, can you think of a more efficient alternative that avoids residual
> assembly?

Hi Simon,
I can't see any convenient way to achieve what you want short of just
upgrading deal.II. This would of course also give you dozens or hundreds of
other bug fixes :-)

Best
W.