Exception handling for multithreaded case


Paras Kumar

Aug 16, 2021, 9:18:30 AM
to deal.II User Group
Dear deal.ii Community,

I am working on the exception handling in my code, which uses dealii::WorkStream::run() for thread-based parallelization of the FE system assembly. The code uses the Assert and AssertThrow macros, in conjunction with dealii::ExcMessage(), in several places. I call dealii::deal_II_exceptions::disable_abort_on_exception() so that all Assert macros throw as well.

I would like an exception thrown inside a function to propagate up to main(), where it can be caught so that the program exits "elegantly" instead of aborting.
This works fine for exceptions thrown from single-threaded functions, but I get the following error for an exception thrown from the multi-threaded assembly function.

---------------------------------------------------------
In one of the sub-threads of this program, an exception
was thrown and not caught. Since exceptions do not
propagate to the main thread, the library has caught it.
The information carried by this exception is given below.

---------------------------------------------------------

Looking at the source code shows that this is the intended behavior for an exception thrown from a sub-thread, but I do not understand why. Alternatively, is there some other way to ensure that such an exception is propagated to the main thread?
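For reference, the catch-in-main pattern that already works in the single-threaded case can be sketched as follows. This is a minimal sketch in plain C++; run_guarded is a hypothetical helper name, not a deal.II function, and the simulation callable stands in for whatever main() would normally run.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Hypothetical helper (not part of deal.II): runs `simulation` and turns any
// exception that escapes it into a non-zero exit code instead of an abort.
int run_guarded(const std::function<void()> &simulation)
{
  // An empty std::function would throw std::bad_function_call when invoked.
  assert(simulation);

  try
    {
      simulation();
    }
  catch (const std::exception &exc)
    {
      std::cerr << "Exception on exit: " << exc.what() << std::endl;
      return 1;
    }
  catch (...)
    {
      std::cerr << "Unknown exception on exit" << std::endl;
      return 1;
    }
  return 0;
}
```

A main() function would then simply return run_guarded(...) with the simulation wrapped in a lambda. This only covers exceptions thrown on the calling thread.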


Thanks in advance for your help.

Best regards,
Paras

Wolfgang Bangerth

Aug 16, 2021, 11:36:10 AM
to dea...@googlegroups.com

Paras,
This is actually very difficult to achieve. I made that work for the 9.3
release when you use `new_task()`, and I think one could probably make that
work for `new_thread()` as well based on features C++11/14 provides. But I
don't think this is easily possible when using WorkStream and it's not clear
to me to begin with what kind of semantics this would have. Your best bet is
to ensure that functions you call via WorkStream do not throw exceptions.
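One way to keep worker functions from throwing is to catch everything inside the worker, store the exception, and rethrow it on the calling thread after all workers have finished. The following is a minimal sketch in standard C++ using std::thread and std::exception_ptr; it does not use WorkStream, and FirstException and run_workers are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of deal.II): remembers the first exception
// raised by any worker so that the calling thread can rethrow it later.
class FirstException
{
public:
  // Call from inside a catch block on a worker thread.
  void capture()
  {
    std::lock_guard<std::mutex> lock(mutex);
    if (!stored)
      stored = std::current_exception();
  }

  // Call on the calling thread once all workers have been joined.
  void rethrow_if_any() const
  {
    if (stored)
      std::rethrow_exception(stored);
  }

private:
  std::mutex         mutex;
  std::exception_ptr stored;
};

// Runs `n_workers` threads; the worker with index `failing_worker` throws
// (pass -1 for no failure). No exception ever leaves a worker function.
void run_workers(const int n_workers, const int failing_worker)
{
  assert(n_workers > 0);

  FirstException           first_exception;
  std::vector<std::thread> threads;

  for (int i = 0; i < n_workers; ++i)
    threads.emplace_back([i, failing_worker, &first_exception]() {
      try
        {
          if (i == failing_worker)
            throw std::runtime_error("worker " + std::to_string(i) +
                                     " failed");
        }
      catch (...)
        {
          first_exception.capture();
        }
    });

  for (auto &thread : threads)
    thread.join();

  // Back on the calling thread, where propagating the exception is safe.
  first_exception.rethrow_if_any();
}
```

Only the first exception is kept; if several workers fail, the others are silently dropped, which is one of the semantic choices any such scheme has to make.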

As for turning Assert into throwing exceptions instead of aborting the
program: The way we treat the conditions used in Assert statements is that
they are irrecoverable: There is nothing you can do to make the program do
anything useful if you trigger an Assert, and as such aborting the program is
the correct choice: You just need to fix the program. A properly debugged
program should never run into an Assert statement.

Best
Wolfgang


--
------------------------------------------------------------------------
Wolfgang Bangerth email: bang...@colostate.edu
www: http://www.math.colostate.edu/~bangerth/

Paras Kumar

Aug 17, 2021, 10:57:23 AM
to dea...@googlegroups.com
Dear Wolfgang,

Thank you for the quick response.

> This is actually very difficult to achieve. I made that work for the 9.3
> release when you use `new_task()`, and I think one could probably make that
> work for `new_thread()` as well based on features C++11/14 provides. But I
> don't think this is easily possible when using WorkStream and it's not clear
> to me to begin with what kind of semantics this would have. Your best bet is
> to ensure that functions you call via WorkStream do not throw exceptions.

This is exactly what I am trying to do now.

I have a few more (un)related questions:

1. So this issue would not exist for simple parallel tasks involving a reduction operation, such as computing average stresses over the domain, using Threads::new_task?

2. The ExceptionBase::print_stack_trace() function does not print anything if I catch an exception thrown using AssertThrow(cond, dealii::ExcMessage()) or even my own MyException class derived from ExceptionBase. How do I ensure that the stack trace is "populated"? I tried using the set_fields() function, but it did not help.

3. Considering the parallelization of stress averaging, how could I do a reduction operation to sum the thread-local stresses into a "global" variable once I have distributed the computation using the idea described on page 10 of the Video Lecture 40 slides? Is WorkStream the only solution, or is there some less involved alternative, since I just need to sum the return values of the Compute_stressSumOnCellRange() function?

 
> As for turning Assert into throwing exceptions instead of aborting the
> program: The way we treat the conditions used in Assert statements is that
> they are irrecoverable: There is nothing you can do to make the program do
> anything useful if you trigger an Assert, and as such aborting the program is
> the correct choice: You just need to fix the program. A properly debugged
> program should never run into an Assert statement.

This was just for the purpose of testing the procedure.


Thanks and best regards,
Paras

Wolfgang Bangerth

Aug 17, 2021, 3:09:45 PM
to dea...@googlegroups.com

> 1.  Thus this issue would not exist for simple parallel tasks involving
> a reduction operation, such as computing average stresses over the
> domain, using Threads::new_task?

Correct. This is the test that checks this:

https://github.com/dealii/dealii/blob/master/tests/multithreading/task_01_exception.cc
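The behavior described here, where an exception thrown inside a task reappears on the thread that waits for the result, has a direct analogue in standard C++. The following minimal sketch uses std::async and std::future rather than Threads::new_task; sum_chunk and sum_in_two_tasks are hypothetical names, and the linked test remains the authoritative reference for the deal.II interface.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <stdexcept>
#include <vector>

// Stand-in for a per-chunk computation; throws on invalid input.
double sum_chunk(const std::vector<double> &values)
{
  double sum = 0.0;
  for (const double v : values)
    {
      if (v < 0.0)
        throw std::invalid_argument("negative entry");
      sum += v;
    }
  return sum;
}

// Sums two chunks on separate tasks and reduces the two return values.
double sum_in_two_tasks(const std::vector<double> &a,
                        const std::vector<double> &b)
{
  std::future<double> task_a =
    std::async(std::launch::async, sum_chunk, std::cref(a));
  std::future<double> task_b =
    std::async(std::launch::async, sum_chunk, std::cref(b));
  assert(task_a.valid() && task_b.valid());

  // get() waits for the task and rethrows, on this thread, any exception
  // that the task function threw.
  const double result_a = task_a.get();
  const double result_b = task_b.get();
  return result_a + result_b;
}
```

Because each task returns its partial sum, the reduction is just the addition of the two results on the calling thread, with no shared state between tasks.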


> 2. The ExceptionBase::print_stack_trace() function does not print
> anything if I catch an exception thrown using AssertThrow(cond,
> dealii::ExcMessage()) or even my own MyException class derived from
> ExceptionBase. How do I ensure that the stack trace is "populated". I
> tried using the set_fields() function but it did not help.

Can you illustrate in a small test case what you are trying to do?


> 3. Considering the parallelization of stress averaging, how could I do a
> reduction operation to ensure summation of the thread local stresses
> into a "global" variable once I have distributed the computation using
> the idea described on page 10 of the Video Lecture 40 slides? Is WorkStream
> the only solution, or is there some less involved alternative, since I
> just need to sum the return values of the Compute_stressSumOnCellRange()
> function?

It's difficult because you probably don't want the object you want to
sum into to be thread-local (assuming that it is not just a single
number). WorkStream was invented to work around exactly these sorts of
problems. We wrote a whole paper about WorkStream precisely because it
is not trivial to get right, and any alternative solution I could offer
would also not be trivial.

Best
W.

Paras Kumar

Aug 18, 2021, 2:38:12 PM
to dea...@googlegroups.com

>> 2. The ExceptionBase::print_stack_trace() function does not print
>> anything if I catch an exception thrown using AssertThrow(cond,
>> dealii::ExcMessage()) or even my own MyException class derived from
>> ExceptionBase. How do I ensure that the stack trace is "populated". I
>> tried using the set_fields() function but it did not help.
>
> Can you illustrate in a small test case what you are trying to do?

Sorry, that was my mistake. Just to confirm my understanding: such a stack trace is only printed for Assert() and not for AssertThrow(). I realised this while creating the attached MWE.

Stacktrace:
-----------
#0  ./mwe-stack-trace: Solve_timeStep()
#1  ./mwe-stack-trace: Simulate()
#2  ./mwe-stack-trace: main
--------------------------------------------------------
 


>> 3. Considering the parallelization of stress averaging, how could I do a
>> reduction operation to ensure summation of the thread local stresses
>> into a "global" variable once I have distributed the computation using
>> the idea described on page 10 of the Video Lecture 40 slides? Is WorkStream
>> the only solution, or is there some less involved alternative, since I
>> just need to sum the return values of the Compute_stressSumOnCellRange()
>> function?
>
> It's difficult because you probably don't want the object you want to
> sum into to be thread-local (assuming that it is not just a single
> number). WorkStream was invented to work around exactly these sorts of
> problems. We wrote a whole paper about WorkStream precisely because it
> is not trivial to get right, and any alternative solution I could offer
> would also not be trivial.

The object could be a scalar or a dealii::Tensor<1,dim,double>. I was looking for something along the lines of #pragma omp for, with globalSum as a shared variable and '+' as the reduction operation.

TensorType globalSum;

// Parallel part, executed by each thread; globalSum is shared, '+' is the reduction.
{
  TensorType localSum;

  // Loop over a sub-range of cells:
  localSum += cellContribution;

  // Synchronised reduction, i.e. globalSum += localSum, while avoiding race conditions.
}

// Back to the serial part.

It seemed to me that there should be a procedure equivalent to the OpenMP one for such simple reduction tasks.
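The pattern outlined above, thread-local partial sums followed by a synchronised addition into a shared variable, can be sketched in standard C++ as follows. This is only an illustration under stated assumptions: Vec3 is a stand-in for dealii::Tensor<1,dim,double> so that the code compiles without deal.II, parallel_sum is a hypothetical name, and plain std::thread is used instead of any deal.II facility.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for dealii::Tensor<1,dim,double> with dim = 3 (an assumption).
using Vec3 = std::array<double, 3>;

Vec3 parallel_sum(const std::vector<Vec3> &cell_contributions,
                  const unsigned int       n_threads)
{
  assert(n_threads > 0);

  Vec3       global_sum{{0.0, 0.0, 0.0}};
  std::mutex global_sum_mutex;

  // Split the cells into contiguous chunks, one per thread.
  const std::size_t n_cells = cell_contributions.size();
  const std::size_t chunk   = (n_cells + n_threads - 1) / n_threads;

  std::vector<std::thread> threads;
  for (unsigned int t = 0; t < n_threads; ++t)
    {
      const std::size_t begin = std::min(n_cells, t * chunk);
      const std::size_t end   = std::min(n_cells, begin + chunk);

      threads.emplace_back([&, begin, end]() {
        // Thread-local accumulation: no synchronisation needed here.
        Vec3 local_sum{{0.0, 0.0, 0.0}};
        for (std::size_t c = begin; c < end; ++c)
          for (std::size_t d = 0; d < 3; ++d)
            local_sum[d] += cell_contributions[c][d];

        // Synchronised reduction: one short critical section per thread.
        std::lock_guard<std::mutex> lock(global_sum_mutex);
        for (std::size_t d = 0; d < 3; ++d)
          global_sum[d] += local_sum[d];
      });
    }

  for (auto &thread : threads)
    thread.join();

  return global_sum;
}
```

Since the order in which threads reach the critical section is not fixed, the floating-point result can differ in the last bits from run to run.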

Best regards,
Paras
mwe-stack-trace.cc

Wolfgang Bangerth

Aug 20, 2021, 6:48:31 PM
to dea...@googlegroups.com

> Can you illustrate in a small test case what you are trying to do?
>
>
> Sorry, it was my mistake. Just to reassure myself, such a stack-trace is only
> printed for Assert() and not for AssertThrow(). I realised this while creating
> the attached MWE.

Interesting. I had no idea that there was a difference, and why. I'm tracking
this here:
https://github.com/dealii/dealii/issues/12693
I think I know how to fix this, but you'll only get to see this in the next
release (or with developer versions), of course.


>  The object could be a scalar or a dealii::Tensor<1,dim,double>. I was
> looking for something on the lines of #pragma omp for using globalSum as a
> shared variable and a '+' reduction operation.

You can try to use
std::atomic<double>
and
Tensor<1,dim,std::atomic<double>>
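For the scalar case, a sketch of how std::atomic<double> could serve as the shared accumulator is given below. It assumes nothing beyond standard C++: atomic_add and sum_with_atomic are hypothetical names, and a compare-exchange loop is used because std::atomic<double>::fetch_add is only available from C++20 on. The Tensor<1,dim,std::atomic<double>> variant is not shown, and whether it compiles with a given deal.II version is not checked here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Atomically adds `value` to `target` using a compare-exchange loop, which
// also works with compilers that only support C++11.
void atomic_add(std::atomic<double> &target, const double value)
{
  double expected = target.load();
  // On failure, compare_exchange_weak stores the current value of `target`
  // in `expected`, so the next attempt uses an up-to-date sum.
  while (!target.compare_exchange_weak(expected, expected + value))
    {
    }
}

double sum_with_atomic(const std::vector<double> &values,
                       const unsigned int         n_threads)
{
  assert(n_threads > 0);

  std::atomic<double>      global_sum(0.0);
  std::vector<std::thread> threads;

  for (unsigned int t = 0; t < n_threads; ++t)
    threads.emplace_back([&values, &global_sum, t, n_threads]() {
      // Each thread sums every n_threads-th entry into a local variable ...
      double local_sum = 0.0;
      for (std::size_t i = t; i < values.size(); i += n_threads)
        local_sum += values[i];

      // ... and touches the shared atomic only once.
      atomic_add(global_sum, local_sum);
    });

  for (auto &thread : threads)
    thread.join();

  return global_sum.load();
}
```

Accumulating locally and performing a single atomic update per thread keeps contention on the shared variable low compared with an atomic update per cell.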


> To me it seemed that there would be an openmp equivalent procedure for such
> simple reduction tasks.

Even standard C++ :-)

Best
Wolfgang