SIGABORT when throwing exception?


Sergi Molins Rafa

Apr 22, 2023, 4:09:58 PM
to Amanzi-ATS Users
Hi all

I am getting a SIGABRT when Alquimia_PK.cc throws an exception carrying an error message. The code is all in the Amanzi source, but the simulation where this is an issue runs with ATS. I stepped through it in the debugger, but I can't figure out why this message causes the abort.

I am on Ubuntu, using a relatively old gcc (9.3). The same setup works for the same problem when running amanzi-ats 1.3 (before the clang-related changes to exceptions.hh). Replacing exceptions.hh in the dev version with the one from this older version does not solve the problem.

Any suggestions?

Thanks

Sergi

error

terminate called after throwing an instance of 'Errors::Message'
  what():  Failure in Alquimia_PK::AdvanceStep
[smolins-u33:00865] *** Process received signal ***
[smolins-u33:00865] *** Process received signal ***
[smolins-u33:00865] Signal: Aborted (6)
[smolins-u33:00865] Signal code:  (-6)

Alquimia_PK.cc

  if (recv[0] != 0) {
    Errors::Message msg;
    msg << "Failure in Alquimia_PK::AdvanceStep";
    Exceptions::amanzi_throw(msg);
  }

exceptions.hh

template <typename E>
void
amanzi_throw(const E& exception)
{
  if (behavior == Exceptions::RAISE)
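    throw exception;  // propagates until a matching catch; std::terminate() if none
  else
    abort();          // immediate SIGABRT, no search for a handler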
}

----


Coon, Ethan

Apr 23, 2023, 11:19:44 AM
to Sergi Molins Rafa, Amanzi-ATS Users
Is the problem that you’re not getting a more detailed error message from the engine, or were you expecting a catch block somewhere?  

Ethan


From: ats-...@googlegroups.com <ats-...@googlegroups.com> on behalf of Sergi Molins Rafa <smo...@lbl.gov>
Sent: Saturday, April 22, 2023 4:09:44 PM
To: Amanzi-ATS Users <ats-...@googlegroups.com>
Subject: [EXTERNAL] SIGABORT when throwing exception?
 

Sergi Molins Rafa

Apr 23, 2023, 1:50:16 PM
to Coon, Ethan, Amanzi-ATS Users
The main problem is that the simulation should not abort. A failed step is OK; it should just result in a time step cut.

Coon, Ethan

Apr 23, 2023, 3:01:33 PM
to Sergi Molins Rafa, Amanzi-ATS Users

Right, OK. It should be as simple as throwing CutTimeStep instead of Errors::Message, though that may be an implicit thing only, or possibly just returning false in AdvanceStep rather than throwing. I can take a look. Is the run a simple one that you could send me the XML for?

 

Ethan

 

-- 

-------------------------------------------------------------------------

Ethan Coon

Senior Research Scientist

Oak Ridge National Laboratory

 

865-241-1296

https://www.ornl.gov/staff-profile/ethan-t-coon

-------------------------------------------------------------------------

 


Sergi Molins Rafa

Aug 3, 2023, 4:56:46 PM
to Coon, Ethan, Amanzi-ATS Users
I am getting back to this. 

The problem I have here is that after throwing this exception, with behavior == Exceptions::RAISE, the code still terminates. I would like to know what the possible reasons for that are. The place where it happens is too low-level for me to make sense of:

eh_throw.cc line 98:

  // Some sort of unwinding error.  Note that terminate is a handler.
  __cxa_begin_catch (&header->exc.unwindHeader);
  std::terminate ();

Thanks

Sergi

On Sun, Apr 23, 2023 at 8:19 AM Coon, Ethan <coo...@ornl.gov> wrote:

Coon, Ethan

Aug 3, 2023, 9:56:25 PM
to Sergi Molins Rafa, Amanzi-ATS Users

I’m not sure what you would like to have happen. Behavior == RAISE and behavior == ABORT will both terminate the simulation. Behavior == ABORT terminates immediately, do not pass go, with SIGABRT. Behavior == RAISE throws an exception, which then propagates up the call stack to see if someone is going to catch it. If no one catches it, std::terminate() is called, which also ends with SIGABRT.

 

ATS does not wrap the entire main() in a try/catch, so currently every call to amanzi_throw(Errors::Message) terminates with SIGABRT.
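A minimal sketch of the kind of top-level guard ATS does not currently have. It assumes nothing about the real ATS driver: `run_guarded` is a hypothetical helper, and `std::runtime_error` stands in for `Errors::Message` (which appears to derive from `std::exception`, since the terminate handler printed a `what()` line). Any exception that escapes to this level is reported and turned into an exit code instead of reaching `std::terminate()`.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical top-level guard: runs the driver and converts any escaping
// exception into a nonzero exit code instead of letting it reach
// std::terminate() (and hence abort()/SIGABRT).
int run_guarded(const std::function<void()>& driver)
{
  try {
    driver();
    return 0;
  } catch (const std::exception& e) {
    // Catches anything derived from std::exception, which should include
    // Errors::Message if it derives from std::exception as assumed above.
    std::cerr << "Fatal error: " << e.what() << '\n';
    return 1;
  } catch (...) {
    // Anything else (e.g. a thrown int) is still reported, not aborted on.
    std::cerr << "Fatal error: unknown exception\n";
    return 2;
  }
}
```

Used as `int main() { return run_guarded([] { /* build and run the simulation */ }); }`. This only makes the exit cleaner; the run still ends, so by itself it does not give a time step cut.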

 

I believe this mechanism was introduced on Day 1 of Amanzi, when the thinking was that Amanzi would get wrapped in Akuna, which would be doing UQ with Amanzi and would therefore catch Errors::Message as a way of just saying “this set of parameters resulted in a bad run, move on.”

 

Don’t try to catch Errors::Message.  If you’re going to catch something, throw a special error and catch that.  A bit of explanation of what you’re trying to throw and catch would help.
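To illustrate the "special error" idea, here is a sketch with a hypothetical `StepFailed` type, standing in for something like the CutTimeStep mentioned earlier in the thread (its real definition is not shown here). `advance_step` and `try_step` are also hypothetical. The caller catches only `StepFailed`, so a genuinely fatal error, modeled here by a plain `std::runtime_error` playing the role of `Errors::Message`, still propagates.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical dedicated exception meaning "this step failed, try a smaller dt".
// It is deliberately not Errors::Message, so catching it cannot swallow
// genuinely fatal errors.
class StepFailed : public std::runtime_error {
 public:
  explicit StepFailed(const std::string& message) : std::runtime_error(message) {}
};

// Stand-in for a PK step: throws StepFailed on a recoverable failure and a
// plain std::runtime_error (the Errors::Message role) on a fatal one.
void advance_step(double dt, double dt_limit, bool fatal)
{
  if (fatal) throw std::runtime_error("fatal: bad input");
  if (dt > dt_limit) throw StepFailed("chemistry did not converge");
}

// Returns true on success, false on a recoverable failure.
// Any other exception is not caught here and keeps propagating.
bool try_step(double dt, double dt_limit, bool fatal)
{
  try {
    advance_step(dt, dt_limit, fatal);
    return true;
  } catch (const StepFailed&) {
    return false;  // recoverable: the caller can cut dt and retry
  }
}
```

Because the `catch` names only `StepFailed`, there is no risk of turning a real configuration or input error into an endless series of time step cuts.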

 

Ethan

 

From: ats-...@googlegroups.com <ats-...@googlegroups.com> on behalf of Sergi Molins Rafa <smo...@lbl.gov>


Date: Thursday, August 3, 2023 at 4:56 PM
To: Coon, Ethan <coo...@ornl.gov>
Cc: Amanzi-ATS Users <ats-...@googlegroups.com>

Amanzi-ATS Users

Aug 5, 2023, 3:50:40 PM
to Amanzi-ATS Users
OK, sorry Sergi. I wasn't seeing the full thread on my laptop for some reason, so I didn't see your earliest email about wanting it to cut the time step.

Assuming you're in Alquimia_PK::AdvanceStep(), rather than throwing, you should simply "return false" on error. This must be done collectively on all ranks. Then, on the next call to get_dt(), Alquimia_PK should return a smaller value of dt. This somewhat comes back to our previous conversation about refactoring it to use the TimestepController class for modifying its dt, though it could also be done in the existing step size code in that PK.
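A single-process sketch of that flow, with hypothetical names throughout: `ChemistryStepper`, `take_one_step`, and the per-rank dt limits are illustrative, not the Amanzi PK interface. The `std::all_of` reduction stands in for a collective such as `MPI_Allreduce` with `MPI_LAND`, so every rank sees the same result. It follows the convention described above (false means the step failed); check the return convention of the actual PK base class before copying it.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Stand-in for a collective reduction (e.g. MPI_Allreduce with MPI_LAND):
// the step succeeds only if it succeeded on every rank, so all ranks agree.
bool all_ranks_succeeded(const std::vector<bool>& local_ok)
{
  return std::all_of(local_ok.begin(), local_ok.end(), [](bool ok) { return ok; });
}

// Hypothetical PK-like stepper (not the Amanzi PK interface). Each simulated
// rank fails locally when dt exceeds what its cells can handle.
class ChemistryStepper {
 public:
  ChemistryStepper(double dt, double reduction_factor, std::vector<double> rank_dt_limit)
    : dt_(dt), reduction_factor_(reduction_factor), rank_dt_limit_(std::move(rank_dt_limit))
  {}

  // Returns false on failure instead of throwing.
  bool AdvanceStep(double dt)
  {
    std::vector<bool> local_ok;
    for (double limit : rank_dt_limit_) local_ok.push_back(dt <= limit);
    bool ok = all_ranks_succeeded(local_ok);
    if (!ok) dt_ = dt * reduction_factor_;  // the next get_dt() returns a smaller value
    return ok;
  }

  double get_dt() const { return dt_; }

 private:
  double dt_;
  double reduction_factor_;
  std::vector<double> rank_dt_limit_;
};

// Driver: retry from the same t_old with the PK's suggested dt until a step
// succeeds. Returns the accepted dt, or a negative value if dt fell below dt_min.
double take_one_step(ChemistryStepper& pk, double dt_min)
{
  for (double dt = pk.get_dt(); dt >= dt_min; dt = pk.get_dt()) {
    if (pk.AdvanceStep(dt)) return dt;
  }
  return -1.0;
}
```

The key point is that the failure decision is made once, collectively, so no rank retries with a different dt than the others.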

Ethan

Sergi Molins Rafa

Aug 8, 2023, 8:23:07 PM
to Amanzi-ATS Users
No need to apologize. The previous answer was actually very useful to me as well. 

I created a pull request that fixes the bug that was causing the discrepancies I was seeing (the bug is not related to the Alquimia exception itself, but it was what triggered it). The pull request also catches the Alquimia exception further up the stack.

Next would be to replace the old Alquimia_PK ComputeNextTimeStep with TimestepController-based time step control, but that would be on the Amanzi side. (If anybody can point to a PK where that has been implemented, that would be helpful.)

Thanks

Sergi

Coon, Ethan

Aug 9, 2023, 9:33:19 AM
to Sergi Molins Rafa, Amanzi-ATS Users

Great, I’ll look at it ASAP. We have quite the backlog of PRs for Konstantin and me to go through, in both ATS and Amanzi, which is great. We’ll work through these and hopefully get out a new release this fiscal year.

 

As to users of TimestepController, the only one I’m aware of currently is in src/time_integration/BDF1_TI.hh, which isn’t really a PK but should help show how to use it. Obviously that one is set up around nonlinear iteration counts for a single, collective nonlinear solve. I’m not sure whether you want that or not; you’d have to come up with some way of choosing the dt based on iteration counts from NCELL different solves.
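One possible way to feed NCELL per-cell iteration counts into a single-dt rule is to reduce them to the worst (largest) count and then apply the usual min/max iteration logic. The struct below is a hypothetical sketch of that idea, not Amanzi's TimestepController class; its field names only echo the parameters mentioned in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical iteration-count controller (not Amanzi's TimestepController).
struct IterationDtController {
  int min_iterations;       // worst count below this: grow dt
  int max_iterations;       // worst count above this: shrink dt
  double increase_factor;   // > 1
  double reduction_factor;  // in (0, 1)
  double dt_max;            // upper bound on dt

  // One possible reduction of NCELL per-cell counts to a single number:
  // use the worst (largest) count, since all cells share one dt.
  static int worst_count(const std::vector<int>& per_cell_iterations)
  {
    assert(!per_cell_iterations.empty());
    return *std::max_element(per_cell_iterations.begin(), per_cell_iterations.end());
  }

  double next_dt(double dt, const std::vector<int>& per_cell_iterations) const
  {
    int iters = worst_count(per_cell_iterations);
    if (iters > max_iterations) return dt * reduction_factor;
    if (iters < min_iterations) return std::min(dt * increase_factor, dt_max);
    return dt;  // within the target band: keep dt
  }
};
```

Taking the maximum is conservative: a single hard cell drives dt down for the whole domain. A mean or a high percentile would be alternatives, at the cost of more failed steps in the hard cells.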

 

Alternatively, you could just enforce in code min_iterations = 1 and max_iterations = INF, and effectively just use the “time step reduction factor” to decrease dt on failure. The question then is whether you want to recover the bigger dt once you get through the “hard” part. Since you’re not really subcycling here, I’m assuming that you have to have the same dt across all cells... Some more thought required!
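A sketch of the failure-only variant, plus one hypothetical answer to the recovery question: grow dt again after a fixed number of consecutive successful steps, capped at `dt_max`. The class and its parameters (`recovery_steps`, `increase_factor`) are illustrative, not part of Amanzi. With `recovery_steps = 0` it only ever shrinks dt, which is how an iteration-based rule behaves with min_iterations = 1 and max_iterations = INF, assuming every solve takes at least one iteration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical failure-driven dt rule (not Amanzi code).
class FailureDrivenDt {
 public:
  FailureDrivenDt(double dt, double reduction_factor, double increase_factor,
                  int recovery_steps, double dt_max)
    : dt_(dt), reduction_(reduction_factor), increase_(increase_factor),
      recovery_steps_(recovery_steps), dt_max_(dt_max)
  {
    assert(reduction_factor > 0.0 && reduction_factor < 1.0);
    assert(increase_factor >= 1.0);
  }

  // A failed step always cuts dt and resets the success streak.
  void on_failure()
  {
    dt_ *= reduction_;
    successes_ = 0;
  }

  // After recovery_steps consecutive successes, try a bigger dt again.
  // recovery_steps == 0 disables growth entirely.
  void on_success()
  {
    ++successes_;
    if (recovery_steps_ > 0 && successes_ >= recovery_steps_) {
      dt_ = std::min(dt_ * increase_, dt_max_);
      successes_ = 0;
    }
  }

  double dt() const { return dt_; }

 private:
  double dt_;
  double reduction_;
  double increase_;
  int recovery_steps_;
  double dt_max_;
  int successes_ = 0;
};
```

Since dt is shared by all cells, the same object (or the same reduced success/failure flag) would have to drive every rank.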

 

Ethan

 

 
