High level thread design question

Larry Palmer

Aug 13, 2002, 7:38:33 AM

I'm doing a project which basically involves keeping state for a large
number of customer sessions, such that there might be as many as, say 20,000
simultaneous "sessions", each lasting minutes to hours (but requiring only
infrequent actions during the session, so the total CPU load is still fairly
small). It's been suggested that I should model each transaction as a
thread, to exploit the elegant and intuitive concept of writing a simple
piece of code that handles a single session, then simply replicating it as
many times as needed. Essentially, a thread would be used as the
unit of local state for a single session. It's also been suggested that
this might make the program more "robust", in the sense that a single thread
blowing up due to a bug might allow all the other sessions to carry on
unaffected, or at least have a higher probability of doing so.

My question is, first, is that a valid model? Secondly, of course, is it
even feasible to have 20,000 simultaneous threads (under redhat linux, or
possibly realtime linux)? Certainly it involves raising the allowed number
of threads in the kernel, as well as recompiling libpthread to allow more
threads per process (and having a lot of memory for all those stacks). I
suspect there will be other limits as well that will have to be changed as
they're encountered. Is it doable?

My gut feeling is that this is asking for grief by using threads in a way
that's not really intended (as repositories of local state, rather than
simply units of concurrency). My inclination is to do a more standard
design with a single process and an array of objects, one for each session,
containing finite state machines showing the current state of the session,
and using non-blocking calls for all IO. A few threads could conceivably be
used to minimize latency by making sure there's always a thread ready to
handle input from any data source, in the event that some lengthy code paths
appear.

Is there any conventional wisdom on this subject? Or any thoughts one way
or the other?
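
A minimal sketch of the "array of session objects, each holding a finite state machine" design described above. The state names, event names, and the SessionTable class are hypothetical (nothing in the post names them), and the non-blocking I/O loop that would call dispatch() is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical states and events; the real session protocol is not specified.
enum class State { Idle, AwaitingReply, Closed };
enum class Event { Request, Reply, Timeout, Hangup };

// All per-session state lives in a small object, not in a thread stack.
struct Session {
    State state = State::Idle;
    unsigned requests_served = 0;
};

// Advance one session's state machine by one event. It never blocks.
inline void step(Session& s, Event e) {
    switch (s.state) {
    case State::Idle:
        if (e == Event::Request) s.state = State::AwaitingReply;
        else if (e == Event::Hangup) s.state = State::Closed;
        break;
    case State::AwaitingReply:
        if (e == Event::Reply) { ++s.requests_served; s.state = State::Idle; }
        else if (e == Event::Timeout || e == Event::Hangup) s.state = State::Closed;
        break;
    case State::Closed:
        break;  // terminal state: ignore further events
    }
}

class SessionTable {
public:
    explicit SessionTable(std::size_t n) : sessions_(n) {}

    // Called by the (omitted) I/O loop when session `id` has an event ready.
    void dispatch(std::size_t id, Event e) { step(sessions_.at(id), e); }

    const Session& at(std::size_t id) const { return sessions_.at(id); }

    // Storage used by the session objects themselves.
    std::size_t bytes() const { return sessions_.size() * sizeof(Session); }

private:
    std::vector<Session> sessions_;
};
```

With this layout the storage for 20,000 sessions is simply 20,000 times sizeof(Session), which is easy to budget for in advance.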

Tony Gale

Aug 13, 2002, 8:28:33 AM

In article <ulhruom...@corp.supernews.com>,

"Larry Palmer" <ac...@concentric.net> writes:
>
> My gut feeling is that this is asking for grief by using threads in a way
> that's not really intended (as repositories of local state, rather than
> simply units of concurrency). My inclination is to do a more standard
> design with a single process and an array of objects, one for each session,
> containing finite state machines showing the current state of the session,
> and using non-blocking calls for all IO. A few threads could conceivably be
> used to minimize latency by making sure there's always a thread ready to
> handle input from any data source, in the event that some lengthy code paths
> appear.

Yes, this is a better approach. This way, for example, you can easily
calculate how much storage 20,000 instances is going to require. You
can use a thread pool with a request queue to service incoming requests.

>
> Is there any conventional wisdom on this subject? Or any thoughts one way
> or the other?
>

Yes, don't allocate a thread-per-client. Use a thread pool. And
remember, threads don't automatically give you extra performance,
they allow you to easily utilise extra processors to potentially give
you better performance (assuming your model scales).

-tony

Mark Johnson

Aug 13, 2002, 9:18:18 AM

Larry Palmer wrote:
>
> I'm doing a project which basically involves keeping state for a large
> number of customer sessions, such that there might be as many as, say 20,000
> simultaneous "sessions", each lasting minutes to hours (but requiring only
> infrequent actions during the session, so the total CPU load is still fairly
> small). It's been suggested that I should model each transaction as a
> thread, to exploit the elegant and intuitive concept of writing a simple
> piece of code that handles a single session, then simply replicating it as
> many times as needed. Essentially, a thread would be used as the
> unit of local state for a single session. It's also been suggested that
> this might make the program more "robust", in the sense that a single thread
> blowing up due to a bug might allow all the other sessions to carry on
> unaffected, or at least have a higher probability of doing so.
>

Please don't use a single thread for each session. Tens of thousands of
threads will likely *not* work on the OS you selected, nor does it do
anything to help make the program more robust. If you want it "more
robust", you need separate processes (not threads) with separate address
spaces. A single thread corrupting memory can take down the entire
application.

> My question is, first, is that a valid model?

Yes, but not very efficient. The kernel overhead for a thread is often
far higher than the state you are trying to maintain for each session
and does not take the place of the state data.

> Secondly, of course, is it
> even feasible to have 20,000 simultaneous threads (under redhat linux, or
> possibly realtime linux)?

Not with a stock kernel.

> Certainly it involves raising the allowed number
> of threads in the kernel, as well as recompiling libpthread to allow more
> threads per process (and having a lot of memory for all those stacks). I
> suspect there will be other limits as well that will have to be changed as
> they're encountered. Is it doable?
>

Yes, but I still would not do that.

> My gut feeling is that this is asking for grief by using threads in a way
> that's not really intended (as repositories of local state, rather than
> simply units of concurrency).

A good hunch.

> My inclination is to do a more standard
> design with a single process and an array of objects, one for each session,
> containing finite state machines showing the current state of the session,
> and using non-blocking calls for all IO. A few threads could conceivably be
> used to minimize latency by making sure there's always a thread ready to
> handle input from any data source, in the event that some lengthy code paths
> appear.
>

Unless you have multiple CPUs, the lengthy code paths will probably slow
you down anyway. Each CPU can only do one thing at a time :-).

> Is there any conventional wisdom on this subject? Or any thoughts one way
> or the other?

Thread pools are a common approach. What you described as "your
inclination" will work just fine with careful implementation.
--Mark

Eric D Crahen

Aug 13, 2002, 10:05:20 AM

You can find some good information in the POSA books about this.

http://www.cs.wustl.edu/~schmidt/POSA/

There is also some related information on the C10K page.

http://www.kegel.com/c10k.html

- Eric
http://www.cse.buffalo.edu/~crahen

Eric Sosman

Aug 13, 2002, 11:08:21 AM

Larry Palmer wrote:
>
> I'm doing a project which basically involves keeping state for a large
> number of customer sessions, such that there might be as many as, say 20,000
> simultaneous "sessions", each lasting minutes to hours (but requiring only
> infrequent actions during the session, so the total CPU load is still fairly
> small). It's been suggested that I should model each transaction as a
> thread, to exploit the elegant and intuitive concept of writing a simple
> piece of code that handles a single session, then simply replicating it as
> many times as needed. Essentially, a thread would be used as the
> unit of local state for a single session. It's also been suggested that
> this might make the program more "robust", in the sense that a single thread
> blowing up due to a bug might allow all the other sessions to carry on
> unaffected, or at least have a higher probability of doing so. [...]

Others have addressed the drawbacks of using so many threads (even
if you somehow manage to create them all, which is doubtful), but I
haven't seen any discussion of the robustness claim. Here's my take
on it: the claim is flat-out wrong, R-O-N-G, wrong.

If a single thread "blows up" you need to consider what it might
have done to the state of the rest of the program while exploding.
What data belonging to threads A,B,C,... are immune from damage when
thread X runs amok? Why, none at all. The entire motivation for
running multiple threads inside a single process is to allow them to
share the same address space and to have the same access privileges
to all the memory it represents. If thread X goes berserk and starts
using uninitialized pointers and the like, all threads are vulnerable.
In fact, the thread that "blows up" may not even be the thread "at
fault:" for example, the rogue X could have trashed a data structure
"belonging to" the innocent A, causing the latter to go off the rails.

Even if the jaywalker treads only on "its own" data and dies
before stomping on anything else, there can still be plenty of trouble.
What if thread X locks a mutex and then dies before unlocking it? Later
"undamaged" threads trying to acquire that same mutex will block, and
will block forever. Even if you come up with some sort of scheme for
forcibly overtaking the already-locked mutex, what is the state of the
data that mutex protected? The whole point of a mutex is to prevent
threads from "seeing" protected data in the inconsistent states that
quite commonly arise even in correct programs. If you drive past the
"DANGER - DRAWBRIDGE OPEN" sign, it's nobody's fault but your own if
you wind up in the drink.
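
A small sketch of that last point, under stated assumptions: the Accounts structure and its invariant are invented for illustration, and std::mutex / std::lock_guard are C++11 facilities that postdate this discussion. The lock is released during unwinding, so nobody hangs, yet the data the mutex guarded is left half-updated.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical shared data. Invariant whenever the mutex is free: a + b == 100.
struct Accounts {
    std::mutex m;
    long a = 100;
    long b = 0;
};

// Moves `amount` from a to b. When fail_midway is set, the code "blows up"
// between the two updates, i.e. while the invariant is temporarily broken.
inline void transfer(Accounts& acc, long amount, bool fail_midway) {
    std::lock_guard<std::mutex> lock(acc.m);  // released even if we throw
    acc.a -= amount;
    if (fail_midway) throw std::runtime_error("bug in session code");
    acc.b += amount;
}

// What any other thread sees once it acquires the mutex afterwards.
inline long observed_total(Accounts& acc) {
    std::lock_guard<std::mutex> lock(acc.m);
    return acc.a + acc.b;
}
```

After a failed transfer, observed_total() acquires the mutex without trouble and returns a total that no longer matches the invariant: unlocking cleanly did nothing to repair the data.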

Just about the only kind of blowup a thread can experience without
putting the other threads in jeopardy is a "hang" -- an infinite loop
or self-deadlock or something of the sort. And even then, the hang
must occur while the thread holds no mutexes other threads might later
need to acquire.

Errors, faults, and bugs are not properties of individual threads,
but of the entire program. "All for one, and one for all."

--
Eric....@sun.com

Alexander Terekhov

Aug 13, 2002, 12:40:28 PM

Yeah, but a whole bunch of boost.org experts have a completely different
opinion; they really want that to a) unwind and b) make JOIN propagate
ALL EXCEPTIONS [*UNHANDLED INCLUDING*] to the join() caller(s)! ("never
joined"/detached threads aside -- they just aren't "that far" in their
EH/MT design discussions at this moment, so to speak)

Here's the latest bit from the key designer:

"....
I think I'm convinced that strict termination is too difficult to use."

Got it? ;-)

regards,
alexander.

Alexander Terekhov

Aug 13, 2002, 1:36:21 PM

It's sooooo funny, folks... sorry, but I just can't resist:

-------- Fwd Message --------
From: "William E. Kempf" <willia...@hotmail.com>
Newsgroups: gmane.comp.lib.boost.devel
Subject: Re: Re: Re: Threads & Exceptions
Date: Tue, 13 Aug 2002 12:16:53 -0500

From: "Anthony Williams" <ant...@nortelnetworks.com>
> William E. Kempf writes:
> > I claimed that the concept could not be applied _to the main
> > thread_, because that results in an explicit call to terminate(),
> > possibly without stack unwinding, no less.
>
> OK, you've convinced me (with the stack-unwinding (or possible lack
> thereof) issue) --- unhandled exceptions from any thread should
> terminate the process by default.

Just to make things as confusing as possible, I'm no longer in the camp of
believing this. The reasons I thought the process should terminate are
still valid, but the fact that the termination handlers would execute in the
context of that thread and how difficult it would be to exit cleanly seem to
weigh much heavier than other factors, so I think only the thread in
question should be terminated.

> If a user wishes to catch/pass through exceptions from a
> thread, then they can install a thread_terminate_handler (of some
> variety) to do this, under the knowledge that installing such a handler
> for the main() thread (which would have to be done under the guise of a
> real terminate_handler, using set_terminate, even if the library hides
> this fact) may skip stack unwinding, or cause other problems (depending
> on the platform).

set_terminate() applies to the process, not to the main thread. I think
this distinction will cause issues, especially if set_thread_terminate() is
routed to set_terminate() for the main thread in this way. As much as I
dislike the disparity, I think I just have to document that the main thread
behaves differently in this regard.

> (BTW: Are my posts coming through in plain text now?)

Appear to be, yes.

Bill Kempf
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Hillel Y. Sims

Aug 13, 2002, 2:40:19 PM

"Alexander Terekhov" <tere...@web.de> wrote in message
news:3D594395...@web.de...

>
> -------- Fwd Message --------
> From: "William E. Kempf" <willia...@hotmail.com>
> Newsgroups: gmane.comp.lib.boost.devel
> Subject: Re: Re: Re: Threads & Exceptions
> Date: Tue, 13 Aug 2002 12:16:53 -0500
>

> Just to make things as confusing as possible, I'm no longer in the camp of
> believing this. The reasons I thought the process should terminate are
> still valid, but the fact that the termination handlers would execute in
> the context of that thread and how difficult it would be to exit cleanly
> seem to weigh much heavier than other factors, so I think only the thread
> in question should be terminated.
>

I have posted the following response a short time ago to that thread on the
boost newsgroup, which I believe highlights a critical flaw in that line of
thinking:

"Hillel Y. Sims" <hs...@factset.com> wrote in message
news:ajbhm1$b36$1...@main.gmane.org...
>
> "William E. Kempf" <willia...@hotmail.com> wrote in message
> news:OE52qXkXriph1...@hotmail.com...
> >
> > This means terminate_thread()
> > instead, which only terminates the current thread and not the
> > application. A set_terminate_thread_handler() can be employed to handle
> > this case in the same manner that you'd use for application termination.
>
> Hi Bill, what do you think of the following code?
> (please overlook the pseudocode aspect, I'm not completely familiar yet
> with the boost.thread syntax..)
>
> Mutex mutex;
> Condition cond;
> DataObj sharedData;
> bool serverRunning = true;
>
> void* threadfunc(void*);
>
> int main()
> {
>     Thread* t = new Thread(&threadfunc);
>     while (serverRunning) {
>         ...
>         Lock lock(mutex);
>         sharedData.ready = false;
>         sharedData.question = ...;
>         while (!sharedData.ready)
>             cond.wait(lock);   // waits for threadfunc to signal
>         cout << "the answer is: " << sharedData.answer << endl;
>         ...
>     }
>     thread_join(t);
> }
>
> void* threadfunc(void*)
> {
>     Lock lock(mutex);
>     sharedData.answer = 42;
>     sharedData.ready = true;
>     if (random() % 2)
>         throw catch_me_if_you_can();   // may leave before the signal below
>     cond.signal();
>     return 0;
> }
>

(However, I am still unsure as to whether there could still exist any
potential deadlock or race conditions via the use of a custom
terminate_handler by way of std::terminate() when terminating the entire
process on unhandled exception in any thread? Actually, I will guess there
probably are, given my experience with multi-threaded exit handling on
VMS...)

hys

(ps, the above message was copyright by me, but I promise not to sue if
anyone quotes it in their responses ;-)

--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com

Alexander Terekhov

Aug 13, 2002, 3:24:34 PM

"Hillel Y. Sims" wrote:
[...]

> I have posted the following response a short time ago to that thread on the
> boost newsgroup, which I believe highlights a critical flaw in that line of
> thinking: ....

Yeah, and you've got tons of replies (so far)... well, mine are still
"on hold"; Abrahams is just way too busy to "moderate" them quickly,
I guess. ;-) ;-)

-------- Original Message --------
boost...@lists.boost.org@lists.boost.org on 08/13/2002 04:47:08 PM

Sent by: boost-...@lists.boost.org
Subject: Your message to Boost awaits moderator approval

Your mail to 'Boost' with the subject

Re: Attempting resolution of Threads & Exceptions Issue

Is being held until the list moderator can review it for approval.

The reason it is being held:

Post by a moderated member

Either the message will get posted to the list, or you will receive
notification of the moderator's decision. If you would like to cancel
this posting, please visit the following URL:

-------- Original Message --------
boost...@lists.boost.org@lists.boost.org on 08/13/2002 05:46:01 PM

Sent by: boost-...@lists.boost.org
Subject: Your message to Boost awaits moderator approval

Your mail to 'Boost' with the subject

Re: Attempting resolution of Threads & Exceptions Issue

Is being held until the list moderator can review it for approval.

The reason it is being held:

Post by a moderated member

Either the message will get posted to the list, or you will receive
notification of the moderator's decision. If you would like to cancel
this posting, please visit the following URL:

-------- Original Message --------
Alexander Terekhov
08/13/2002 04:46 PM

To: bo...@lists.boost.org
cc: pdi...@mmltd.net
Subject: Re: Attempting resolution of Threads & Exceptions Issue

Peter Dimov wrote:
[...]
> This is a valid concern, but it's possible to have the cake and eat it,
> too. Provide the "unsafe" interface, where thread<R> propagates, as an
> experiment. Wait for user feedback. Obtain real data on whether the
> mistakes caused by this behavior are a significant problem, and not a
> small part of the vast majority of thread-related errors. Nobody argues
> with real data. :-)

If you really want/need it, why don't you simply "wrap" a function
throwing something useful [that you'd EXPECT and are really going to
HANDLE in the joined thread(s)] using some return-value/exception(s)-
variant-holder-functor that would propagate EXPECTED exceptions caught
and in its launching/landing-"pad" function-wrapper ON RETURN VALUE
EXTRACTOR "method" invocation [or f.ex.pthread->join()->
throwCaughtExceptionIfAny() for void operations, thread cancellation
aside in this case]?

Consider that thread join/timedjoin/tryjoin interface would have to
return A POINTER to a wrapper object [stored-in/managed-by thread
object] anyway -- to indicate NO-RESULT and STILL-RUNNING conditions
[unique pointers] as the consequence of thread cancelation and timed-
out/still-busy timedjoin() and tryjoin() calls.
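
A rough sketch of the wrapping idea in the two paragraphs above, with loud caveats: CapturingThread is a hypothetical name, std::runtime_error merely stands in for "the exception you EXPECT", and std::thread / std::exception_ptr are C++11 facilities that did not exist when this was written. The timedjoin/tryjoin and cancellation cases are not covered.

```cpp
#include <exception>
#include <stdexcept>
#include <thread>

// Hypothetical wrapper: runs f() on its own thread and stores either the
// return value or one EXPECTED exception type for the joiner to extract.
template <typename R>
class CapturingThread {
public:
    template <typename F>
    explicit CapturingThread(F f)
        : thread_([this, f]() mutable {
              try {
                  result_ = f();
              } catch (const std::runtime_error&) {
                  // Expected failure: keep it for whoever calls get().
                  error_ = std::current_exception();
              }
              // Any other exception is deliberately NOT caught here; it
              // leaves the thread function and std::terminate() is called.
          }) {}

    CapturingThread(const CapturingThread&) = delete;
    CapturingThread& operator=(const CapturingThread&) = delete;

    ~CapturingThread() {
        if (thread_.joinable()) thread_.join();
    }

    // Join, then either return the value or rethrow the captured exception.
    R get() {
        if (thread_.joinable()) thread_.join();
        if (error_) std::rethrow_exception(error_);
        return result_;
    }

private:
    R result_{};
    std::exception_ptr error_;
    std::thread thread_;  // declared last: the other members exist before it starts
};
```

Only the listed exception type crosses the thread boundary; anything unexpected still takes the process down, which keeps the sketch in line with the "terminate on unhandled" position argued in this thread.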

Well, as for "Nobody argues with real data"... consider the following
opinions [Mr. Butenhof, of course]:

"POSIX threads do not have parent-child relationships. All threads
are peers, and as nearly equal as possible. (The only real exception
is that a return from most threads is equivalent to pthread_exit()
while a return from the initial thread [main()] is equivalent to
exit(), terminating the process.)

DCE did try to build this sort of hierarchy for RPC relationships;
an exception that terminated an RPC server routine would propagate
to the stub routine in the client (caller). This proved REALLY
annoying; but of course in that case exception support was primitive
(C macros) and we were propagating the exception (sans stack context)
across address spaces and even architectures/operating systems."

"There are exceptions, and building cancellation on, well, exceptions,
allows a clean implementation in such cases. The original application
was DCE. DCE was designed to propagate a cancel across the RPC link
from server to client, unwinding the "virtual call stack". (Not
necessarily a good idea since they're in different address spaces,
but that's beside the point.) However, although the server routine
might be cancelled, the server thread would remain to run other
instances. Therefore, the "wrapper" would catch (and finalize) any
exception (including cancel) raised by the RPC server routine, and
propagate it through the comm link to the client, while continuing
to run normally. There are very few examples where a scheme like
this makes any sense."

regards,
alexander.

-------- Original Message --------
Alexander Terekhov
08/13/2002 05:45 PM

To: bo...@lists.boost.org
Subject: Re: Attempting resolution of Threads & Exceptions Issue

William E. Kempf wrote:
[...]
> However, the point is probably irrelevant, since I think I'm convinced
> that strict termination is too difficult to use.

"On Digital UNIX, an unhandled exception in a thread will terminate the
process -- and I believe that's the best possible implementation. The
thread is an inseparable part of the application, not a safely protected
address space like a process. If it's gone awry for reasons you don't
understand, the entire process is suspect.

An unhandled signal WILL terminate the process, so there's no such thing
(in POSIX) as a thread dying from a signal. Again, if you're working with
some non-portable extensions that allow an individual thread to die of a
signal, then you need to look to that implementation to provide whatever
assistance you require. And, again, I think such an implementation would
be a bad idea. We originally did that with DCE threads, and came to realize
how dangerous it was. When a thread gets a SIGSEGV, there's something wrong
with its code or address space -- and the other threads in the process have
the same problem. There's no protection. You're best off just taking the
core dump and figuring out what happened. If you really need "fail safety",
then fork the process from a "watcher" that takes the SIGCHLD (or just
waits) and forks a new, clean, copy to start over.

And, finally, even if your implementation doesn't terminate the process
on an unhandled signal or exception, and even if it allows you to capture
the termination status with something like pthread_join, beware of the
consequences of breaking the code that created the thread, expecting to
later join with it itself." --Mr.B
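
A minimal sketch of the "watcher" arrangement mentioned in the quote above, assuming a POSIX system. The supervise() function, the Worker signature, and flaky_worker are hypothetical names invented for this illustration; the watcher here simply waits with waitpid() rather than handling SIGCHLD.

```cpp
#include <cerrno>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical worker signature: gets the attempt number, returns an exit code.
using Worker = int (*)(int attempt);

// Forks a child to run `worker`, waits for it, and forks a fresh copy if the
// child was killed by a signal or exited non-zero. Returns the attempt number
// that succeeded, or -1 if fork/waitpid failed or max_restarts was used up.
inline int supervise(Worker worker, int max_restarts) {
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) _exit(worker(attempt));  // child: separate address space

        int status = 0;
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR) return -1;     // retry only if interrupted
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return attempt;
        // Otherwise the child crashed or reported failure: start a clean copy.
    }
    return -1;
}

// Demo worker (illustrative only): crashes on its first two attempts.
inline int flaky_worker(int attempt) {
    if (attempt < 2) std::abort();  // child dies from SIGABRT
    return 0;
}
```

Whatever the crashing child did to its own memory is discarded with it; the watcher's state is untouched, which is the protection that threads in one address space cannot offer.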

regards,
alexander.

-------- Original Message --------
Date: Mon, 12 Aug 2002 16:07:27 +0200
From: Alexander Terekhov <tere...@web.de>
Reply-To: tere...@web.de
Newsgroups: comp.std.c++
Subject: Re: C++ exception handling

"Hillel Y. Sims" wrote:
>
> "Alexander Terekhov" <tere...@web.de> wrote in message
> news:3D501B4D...@web.de...
> >
> >
> > Do you really think that runtime checks without unwinding would be stupid?
> >
>
> Isn't exc-spec checking really unnecessary runtime overhead in general
> though (for non-empty throw(..) specs)?

< really important empty throw() specs aside >

Preventing catch(...){}/rethrow-like idioms from screwing things up [and
somewhat silly "translation" in unexpected()], my answer is: SORT-OF-YES.
Uhmm, well, Hillel, we've "already been there and done that" at least
>>TWICE<< w.r.t. your points I've snipped below. So(*),

[snip]

All, read the following rather old articles, think about the issues
discussed in them.. [DON'T USE THE PROPOSED "SOLUTIONS" ;-) ;-) ;-)]

http://www.bleading-edge.com/Publications/C++Report/v9609/Column3.rtf
http://www.bleading-edge.com/Publications/C++Report/v9609/Column3.htm
(Exceptions and Debugging, Jack Reeves)

http://www.bleading-edge.com/Publications/C++Report/v9703/Column5.rtf
http://www.bleading-edge.com/Publications/C++Report/v9703/Column5.htm
(Reflections on Exceptions and the STL, Jack Reeves)

regards,
alexander.

(*) http://groups.google.com/groups?selm=3C9B85C2.39936A4B%40web.de
http://groups.google.com/groups?selm=3CB18FDE.93636516%40web.de

Daniel Miller

Aug 13, 2002, 4:43:05 PM

Alexander Terekhov wrote:

> Eric Sosman wrote:
> [...]
>
>> Others have addressed the drawbacks of using so many threads (even
>>if you somehow manage to create them all, which is doubtful), but I
>>haven't seen any discussion of the robustness claim. Here's my take
>>on it: the claim is flat-out wrong, R-O-N-G, wrong.

[...snip...]

> Yeah, but a whole bunch of boost.org experts

To which subject matter of expertise are you referring? :-)

I have been reading that recent set of tumultuous MT
newsgroup-threads over at
news://news.gmane.org/gmane.comp.lib.boost.devel. "Expert" was not the
first word which came to my mind.

> have completely different
> opinion; they really want that to a) unwind and b) make JOIN propagate
> ALL EXCEPTIONS [*UNHANDLED INCLUDING*] to the join() caller(s)! ("never
> joined"/detached threads aside -- they just aren't "that far" in their
> EH/MT design discussions at this moment, so to speak)

Yeah, I too have been silently wondering when they will get around to
that <satire>minor detail</satire>.

It never ceases to amaze me how discussions about the "portable"
Boost.Threads MT library (which even goes so far as to fuss endlessly over
thread-id because of the remote theoretical possibility that there might
be absolutely no common representation of thread-id across all operating
systems, not even some POD union) fail to grasp that entire schools of
MT thought exist where joining threads is not only frowned upon, but
where whole well-respected MT RTOSes historically lacked a thread-join
feature for two decades (e.g., pSOS). I find it disappointing that
there exist people who think that the primary reason for creating a
thread is to later join it---i.e., I am disappointed with their
Arizona-lightning-bolt thread-model with its incumbent
start-a-thread/shutdown-a-thread overhead not to mention its incumbent
do-I-have-enough-resources-to-start-yet-another-branching-thread
uncertainty. I find it completely baffling that the same people who
fuss endlessly over how to represent thread-id portably (when in fact
every modern operating system has a substantially similar
thread-identifier) completely ignore operating systems where joining
threads is impossible/impractical and then blithely focus so much of
their library design on joining threads when in fact that thread-join
concept is not portable to some very important MT environments in
embedded systems using certain RTOSes.

1) With at least one regular Booster strongly making the claim that
the entire topic of MT is congruent to asynchronous function-call and
asynchronous function-call is congruent to the entire topic of MT over
in news://news.gmane.org/gmane.comp.lib.boost.devel's recent MT
newsgroup-threads and 2) with other people making other bold statements
about MT with which I not only disagree but which I find ranging somewhere
from naive to preposterous and 3) with still other people
acquiescing to some of these outside-of-the-mainstream bold statements,
I am uncomfortable with the whole concept of Boost even *possibly* being
in charge of the future of MT in C++, let alone apparently having the
favorite-son status for C++0x standardization of an MT library.

I keep oscillating between two theories of how to handle
Boost.Threads versus the future:

1) All of us who have extensive mainstream MT experience over the
years should go into Boost and fix what is messed up, lest we get a
C++0x (i.e., language or libraries or both) which is likewise messed up
in MT matters which will take years of education to dismantle.

2) We should let Boost merrily go on its arcane MT path. The
end-result might be so peculiar that it might, fortunately, be rejected
by the long-standing traditional MT mainstream.

> Here's the latest bit from the key designer:
>
> "....
> I think I'm convinced that strict termination is too difficult to use."
>
> Got it? ;-)

<Knock. Knock. Knock.> INTERPOL Copyright Police here, Mr Terekhov.
Do you have permission from the copyright holder to quote that
outside of Boost? ;^)

> regards,
> alexander.

FYI, here is a URL to an image of the stereotypical Arizona lightning
bolt fracturing into further sublightning bolts, possibly
not-well-bounded in number:

http://www.alamy.com/thumbs/4/{D53DBB94-F11C-4B82-BB9A-7205A0D5B2A3}/A15A9B.jpg

The alternative thread model school of thought to which I subscribe
resembles digital electrical engineering's timing diagrams, as shown
briefly at the following URL.

http://www.redbrick.dcu.ie/academic/CLD/chapter8/chapter08.doc.anc4.gif

Create all of the threads at startup-time to ensure that there exist
enough resources to support the maximum number of threads needed by the
design.

Rising edge is when a (detached) thread obtains a unit of work (e.g.,
pops a unit-of-work/transaction message from a message-queue; receives
an event on which it was pending).

Falling edge is when a (detached) thread finishes that
unit-of-work/transaction and once again pends, awaiting the next
unit-of-work/transaction.

Alexander Terekhov

Aug 13, 2002, 5:57:25 PM

Daniel Miller wrote:
[...]

> I keep oscillating between two theories of how to handle
> Boost.Threads versus the future: ....

All, FYI:

<quote>

-------- Original Message --------
From: Beman Dawes <bda...@acm.org>
Newsgroups: gmane.comp.lib.boost.devel
Subject: RE: Re: Re: Re: Threads & Exceptions

At 08:21 PM 8/11/2002, David Bergman wrote:

>I truly hope that this forum will be open for diverse interpretations of
>how the C++ standard should be applied to MT C++, at least till such a
>standard emerges (in, say, six years).

Threads are currently on track for the Technical Report. That is on a much
shorter schedule - final proposals are due next Spring.

--Beman


</quote>

regards,
alexander.

Hillel Y. Sims

Aug 13, 2002, 7:40:46 PM

"Alexander Terekhov" <tere...@web.de> wrote in message

news:3D595CF2...@web.de...

>
> "Hillel Y. Sims" wrote:
> [...]
> > I have posted the following response a short time ago to that thread on
> > the boost newsgroup, which I believe highlights a critical flaw in that
> > line of thinking: ....
>
> Yeah, and you've got tons of replies (so far)...

Is there a logic error in my code or reasoning demonstrating why I think
"thread_terminate()" would not necessarily work well? (Other than it
precludes all the fun discussion of how to pass exceptions back across
thread boundaries, which somehow leads back to catch(...) etc.)

hys

ps:

>
> -------- Original Message --------
> Date: Mon, 12 Aug 2002 16:07:27 +0200
> From: Alexander Terekhov <tere...@web.de>
> Reply-To: tere...@web.de
> Newsgroups: comp.std.c++
> Subject: Re: C++ exception handling
>

That post never showed up either...?

>
> "Hillel Y. Sims" wrote:
> >
> > "Alexander Terekhov" <tere...@web.de> wrote in message
> > news:3D501B4D...@web.de...
> > >
> > >
> > > Do you really think that runtime checks without unwinding would be stupid?
> > >
> >
> > Isn't exc-spec checking really unnecessary runtime overhead in general
> > though (for non-empty throw(..) specs)?
>
> < really important empty throw() specs aside >
>
> Preventing catch(...){}/rethrow-like idioms from screwing things up [and
> somewhat silly "translation" in unexpected()], my answer is: SORT-OF-YES.

I guess... it's really the whole "shoot yourself in the foot thing" though.

> Uhmm, well, Hillel, we've "already been there and done that" at least
> >>TWICE<< w.r.t. your points I've snipped below. So(*),

In a good way or a bad way? (only twice? ;-)

>
> [snip]
>
> All, read the following rather old articles, think about the issues
> discussed in them.. [DON'T USE THE PROPOSED "SOLUTIONS" ;-) ;-) ;-)]
>
> http://www.bleading-edge.com/Publications/C++Report/v9609/Column3.rtf
> http://www.bleading-edge.com/Publications/C++Report/v9609/Column3.htm
> (Exceptions and Debugging, Jack Reeves)
>
> http://www.bleading-edge.com/Publications/C++Report/v9703/Column5.rtf
> http://www.bleading-edge.com/Publications/C++Report/v9703/Column5.htm
> (Reflections on Exceptions and the STL, Jack Reeves)
>
> regards,
> alexander.
>
> (*) http://groups.google.com/groups?selm=3C9B85C2.39936A4B%40web.de
> http://groups.google.com/groups?selm=3CB18FDE.93636516%40web.de

--

Alexander Terekhov

Aug 14, 2002, 6:29:20 AM

"Hillel Y. Sims" wrote:
[...]

> Is there a logic error in my code or reasoning demonstrating why I think
> "thread_terminate()" would not necessarily work well?

I don't think so. However, your insistence on std::exception and
replacement catch(...) with catch(std::exception)-&-eat-it-or-rethrow
is WRONG. The real strategy is much simpler and straightforward:
don't catch things you have no idea what they are meant for (use
RAII objects [finally in SEH] for cleanup/fixup instead). Using this
strategy you'd only need throw() ex.specs. -- that's also quite the
advantage (having maintenance in mind). On the other hand, any LIBRARY
developer [library code using templates, or whatever meant to invoke
client code, aside... unless you want to impose nothrow-safety on the
client here and there] is well advised to systematically use non-empty
ex.specs -- as a defense(*) against silly-user-written catch(...) and
catch(std::exception) and etc. "handlers" stealing INTERNAL/UNEXPECTED
exceptions from would-be-core-dump and his change/service team.

regards,
alexander.

(*) Brain-dead unwinding on ex.specs. violation and implementation-
defined unwinding for uncaught/unhandled exceptions aside.
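
The "use RAII objects for cleanup instead of catching" strategy can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the discussion: `ScopedFlag`, `risky_step` and `do_work` are hypothetical names, and `noexcept` (C++11 and later) stands in for the empty `throw()` specification mentioned above.

```cpp
#include <stdexcept>

// Hypothetical RAII guard: sets a flag for the lifetime of a scope and
// always clears it again, whether the scope exits normally or by exception.
class ScopedFlag {
public:
    explicit ScopedFlag(bool& flag) noexcept : flag_(flag) { flag_ = true; }
    ~ScopedFlag() { flag_ = false; }   // cleanup also runs during unwinding
    ScopedFlag(const ScopedFlag&) = delete;
    ScopedFlag& operator=(const ScopedFlag&) = delete;
private:
    bool& flag_;
};

// Hypothetical operation that may fail.
void risky_step(bool fail) {
    if (fail) throw std::runtime_error("risky_step failed");
}

// No catch(...) here: the guard restores the flag, and the exception keeps
// propagating to whoever actually knows what it means.
void do_work(bool& busy, bool fail) {
    ScopedFlag guard(busy);
    risky_step(fail);
}
```

The point of the design is that `do_work` never names an exception type at all, so it cannot accidentally swallow one it does not understand.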

--
< I wanna post it here because I just can't work and monitor boost
newsgroup -- way too funny.. resulting in way too prolonged ROFLs >

David Abrahams wrote:
[...]
> If all the code is exception-neutral and gives the basic guarantee,
> by the time the thread exits, all program invariants are intact.

Even if some throw() operation would suddenly throw? Well, if life is
so perfect as Abrahams seems to paint it, and C++ programs, threads aside,
are meant to recover whatever-is-thrown at any point, then why does the
standard have that nice terminate()->abort() mechanism with UNBELIEVABLY
SHORT list of "situations"/"such cases"... saying *specifically* the
following [emphasized part], to begin with:

"[except.terminate] The terminate() function
In the following situations exception handling must be abandoned
for less subtle error handling techniques: ....
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^(*)

In such cases,

void terminate();

is called (18.6.3). In the situation where no matching handler is
found, it is implementation-defined whether or not the stack is
unwound before terminate() is called. In all other situations,
the stack shall not be unwound before terminate() is called."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

<?!>

regards,
alexander.

(*) < C++ Standard Core Language Active Issues, Revision 22 >

"Bjarne Stroustrup: I strongly prefer to have the call to
std::terminate() be conforming. I see std::terminate() as
a proper way to blow away "the current mess" and get to the
next level of error handling. I do not want that escape to
be non-conforming - that would imply that programs relying
on a error handling based on serious errors being handled
by terminating a process (which happens to be a C++ program)
in std::terminate() becomes non-conforming. In many systems,
there are - and/or should be - error-handling and recovery
mechanisms beyond what is offered by a single C++ program."

Alexander Terekhov

Aug 14, 2002, 8:15:55 AM8/14/02

to

Alexander Terekhov wrote:
[...]

> -------- Original Message --------
> Alexander Terekhov
> 08/13/2002 04:46 PM
>
> To: bo...@lists.boost.org
> cc: pdi...@mmltd.net
> Subject: Re: Attempting resolution of Threads & Exceptions Issue
>
> Peter Dimov wrote:
> [...]
> > This is a valid concern, but it's possible to have the cake and eat it,
> > too. Provide the "unsafe" interface, where thread<R> propagates, as an
> > experiment. Wait for user feedback. Obtain real data on whether the
> > mistakes caused by this behavior are a significant problem, and not a
> > small part of the vast majority of thread-related errors. Nobody argues
> > with real data. :-)
>
> If you really want/need it, why don't you simply "wrap" a function
> throwing something useful [that you'd EXPECT and are really going to
> HANDLE in the joined thread(s)] using some return-value/exception(s)-
> variant-holder-functor that would propagate EXPECTED exceptions caught
> and in its launching/landing-"pad" function-wrapper ON RETURN VALUE
> EXTRACTOR "method" invocation [or f.ex.pthread->join()->
> throwCaughtExceptionIfAny() for void operations, thread cancellation
> aside in this case]? ....

Eric D Crahen wrote:
>
> Hillel Y. Sims wrote:
>
> > I just thought of another one:
> > 4. The standard library already throws exceptions inherited from
> > std::exception, and you don't want to introduce another completely
> > orthogonal hierarchy just for the sake of being contrary.
>
[...]
> ImageFuture f("somefile"); // Start loading an image in another thread
^^^^^^^^^^^^^^^^^^^^^^^(A)

> using
> // Image.read()
>
> // .. // Do some other things for a while, setup the
> // frame to display it, find the matching
> // thumbnail that won't take as long to
> // load image, whatever
>
> try {
^^^^^^^(B)

>
> Image& img = f.getImage(); // Get the ImageFuture threads result.
^^^^^^^^^^^(C)

>
> // paint it, print it, etc
>
> } catch(ErrorCorruptImage&) { // not derived from std::exception
> // ... Handle one error one way
> } catch(ErrorFileOpen&) { // not derived from std::exception
> // ... Handle the other error another
> }
[...]
> I don't think having the threads exception unwind another stack is that
> big of a problem for something like this.

The problem is that another thread has its own stack and dynamic context
for exception handling. The EXPECTED handlers are known at the point (B)
in the joiner thread; NOT at the point (A) -- when you start another
thread or just queue a workitem. Now, presuming that future C++ would
provide a mechanism to "extract" a properly ordered typelist of exception
types from exceptions specifications [opened ex.specs. aside -- you'd have
to provide your own typelist or simply catch-wrapper-function], you could
then easily write generic/meta code that would catch all exceptions and
simply store the copy constructed/placed into some >>discriminated union<<
(problems w.r.t. alignments aside) inside that "future-"/"thread-" object,
and later, could then re-throw it on "result extractor method" invocation;
(C) in your example. You could then even provide some optional
unhandledException() hook/handler that would be invoked on "future-"/
"thread-" object destruction with never-re-thrown exceptions; it would
then cover "never-joined"/detached "threads"/"futures" case as well.

But that has really NOTHING to do with thread v. process termination
and unwinding/propagation of ALL exceptions, however.

regards,
alexander.
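
The store-and-rethrow idea described above (catch at the thread boundary, keep the exception inside the "future" object, rethrow at point (C), and call a hook for exceptions nobody ever extracted) can be sketched with `std::exception_ptr`, a facility that only arrived later with C++11. This is a rough sketch under assumptions: `SimpleFuture` and `on_unhandled` are hypothetical names, `R` is assumed to be default-constructible, and the sketch catches everything rather than a typelist of expected exceptions, which is exactly the part the post argues should be restricted.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical "future": runs a callable in its own thread, stores either
// the result or the exception, and rethrows the exception on get().
template <class R>
class SimpleFuture {
public:
    template <class F>
    explicit SimpleFuture(F f,
                          std::function<void(std::exception_ptr)> on_unhandled = nullptr)
        : on_unhandled_(std::move(on_unhandled)),
          worker_([this, f]() mutable {
              try { result_ = f(); }
              catch (...) { error_ = std::current_exception(); }
          }) {}

    // Result extractor: point (C) in the quoted example.
    R get() {
        if (worker_.joinable()) worker_.join();
        if (error_) {
            consumed_ = true;
            std::rethrow_exception(error_);
        }
        return result_;
    }

    // Covers the never-joined case: report exceptions nobody looked at.
    ~SimpleFuture() {
        if (worker_.joinable()) worker_.join();
        if (error_ && !consumed_ && on_unhandled_) on_unhandled_(error_);
    }

private:
    R result_{};
    std::exception_ptr error_;
    bool consumed_ = false;
    std::function<void(std::exception_ptr)> on_unhandled_;
    std::thread worker_;   // declared last so the other members exist first
};
```

A caller would write something like `SimpleFuture<int> f(load); ... int v = f.get();` inside its own try block, so the handlers remain those known at the extraction point.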

David Butenhof

Aug 15, 2002, 10:27:06 AM8/15/02

to

Daniel Miller wrote:

I haven't tried, and I don't intend to. At least, I've avoided any momentary
impulses in that direction based on what I've seen of the responses to
those who HAVE tried. (I know that Alexander can drive people crazy with
his style, and I certainly don't always agree with his points of view...
but he DOES have serious technical issues that cannot be ignored.) It's
clear (whether that impression is right or wrong is irrelevant: they've
caused it) that Boost doesn't care and won't listen. And all this copyright
stuff feels suspiciously like a smokescreen that could only be intended to
prevent dissenting views from being taken elsewhere.

I'm not sure of the relationship between the Boost folk and the "core" of
the C++ committee. I've gotten feedback from the latter (during a brief
spate of interesting discussion in comp.std.c++) that the committee is
extremely interested in knowledgeable feedback and discussion. If Boost is
not, their work may prove irrelevant when it goes to committee where
unbiased discussion may be not merely allowed but encouraged.

> 2) We should let Boost merrily go on its arcane MT path. The
> end-result might be so peculiar that it might, fortunately, be rejected
> by the long-standing traditional MT mainstream.

There is always a risk in this strategy. We tried this several times with
obviously brain-dead and uselessly insane cruft that someone wanted to get
into the POSIX standard. It wasn't rejected. It was an option, and we
thought that was OK... until the working group for TOG's SUSv2 came along
and ignored all rational arguments to require that everyone implement the
"braindead and useless cruft" option.

The idea of propagating exceptions across a join is silly. The meaning and
disposition of an exception exists within the context of its call tree;
that's why exceptions are handled by unwinding the call tree instead of
registering global cleanup routines. Once the call tree is unwound, an
exception is a disembodied error status; a ghost. There's nothing the
joiner can possibly do with this that it couldn't do with a simple error
status. (And of course you'd better be sure that the object you've raised
is global and not local to the thread, or it CAN do something an error
status couldn't do... SIGSEGV.)

To put it more succinctly, once you've taken the exception out of its call
tree you might as well have delivered a standard UNIX signal to a global
signal handler. (With POSIX realtime signals you even get an argument
value...)

Or, even worse, they mean to keep the thread and its call tree alive and
unwound until and unless someone joins and finalizes the pending exception?
Now that's REALLY scary. Only partly because the cleanup would be done in a
different thread that, for example, doesn't own any mutexes that might be
unlocked. Or would the deferred unwind actually re-animate the zombie
thread to unwind up to the join barrier and then pass off to the joiner? (I
have to agree with Robin Williams's Genie here... "I can't bring the dead
back to life. It's not a pretty sight.")

"Be afraid. Be very afraid."

--
/--------------------[ David.B...@hp.com ]--------------------\
| Hewlett-Packard Company Tru64 UNIX & VMS Thread Architect |
| My book: http://www.awl.com/cseng/titles/0-201-63392-2/ |
\-------------[ http://homepage.mac.com/~dbutenhof ]--------------/

Mark Johnson

Aug 15, 2002, 2:17:09 PM8/15/02

to

David Butenhof wrote:
> [snip]

> The idea of propagating exceptions across a join is silly. The meaning and
> disposition of an exception exists within the context of its call tree;
> that's why exceptions are handled by unwinding the call tree instead of
> registering global cleanup routines.

Hmm. I'm kind of surprised that you do not consider the use of global
handlers. Let me use Asynchronous System Traps from VMS as a counter
example. You could specify...
- a "first chance" handler
- handlers at each level of the call tree
- a "debug" handler
- a "last chance" handler
for AST's. I consider the idea of a global cleanup routine to be similar
to the "last chance" handler. Admittedly on VMS it was within the
context of a single process, but I don't see the real problem with
handlers across threads except for access to that thread's registers and
enough status to determine the cause of the problem.

When I think about it, I wish that exceptions (and signals) were handled
more like AST's on VMS since you could take some reasonable action (say
in the first chance handler) and resume execution at that point and save
the sequence of calculations (instead of abandoning them as you do now).
I really appreciated the ability to convert divide by zero's to a
suitably large value in the first chance handler and letting the user
get a report that had *some* useful data in it. [and we flagged the
suspect data appropriately]

> [snip]

>
> To put it more succinctly, once you've taken the exception out of its call
> tree you might as well have delivered a standard UNIX signal to a global
> signal handler. (With POSIX realtime signals you even get an argument
> value...)
>

With the way signals and exceptions are defined - I pretty much agree
with you. To make it usable, you need to get some insight into the
context of the condition that caused the problem (and have the ability
to fix it and resume execution).

--Mark

Eric Sosman

Aug 15, 2002, 3:57:08 PM8/15/02

to

Mark Johnson wrote:
>
> David Butenhof wrote:
> > [snip]
> > The idea of propagating exceptions across a join is silly. The meaning and
> > disposition of an exception exists within the context of its call tree;
> > that's why exceptions are handled by unwinding the call tree instead of
> > registering global cleanup routines.
>
> Hmm. I'm kind of surprised that you do not consider the use of global
> handlers. Let me use Asynchronous System Traps from VMS as a counter
> example. You could specify...
> - a "first chance" handler
> - handlers at each level of the call tree
> - a "debug" handler
> - a "last chance" handler
> for AST's.

I think you've confused "conditions" (which are synchronous)
with "asynchronous system traps" (which aren't). The former were
handled by stack unwinding, etc. The latter were passed directly
to a global function designated for the purpose; there was no
provision for walking up a call chain.

When you think about it, the state of the call chain has very
little to do with the handling of an asynchronous event. By its
nature, an asynchronous event is independent of the call context;
it arises from activities unrelated to the program counter and the
current stack contents.

--
Eric....@sun.com

David Butenhof

Aug 16, 2002, 11:08:54 AM8/16/02

to

Mark Johnson wrote:

As Eric Sosman pointed out, you're describing VMS conditions (which are
exceptions, though with detached rather than C++-style attached handlers).
ASTs are essentially a more modular version of UNIX signals, where each AST
is directed (always queued and with arguments, like POSIX realtime signals)
to a uniquely specified function rather than to a generic global handler.

There's no way for arbitrary code to intercept an AST before delivery to the
specified handler, nor to get control after it returns. Each AST is
directed only to the single handler routine specified when the AST was
requested.

Condition handling provides "first chance" handlers and a last chance
handler for the debugger. Normal code cannot use them. Normal propagation
and handling is strictly by stack unwinding. Although there's a special
trick shared libraries can play in load-time initialization to call back
into the image activator in order to get a permanent condition handler at
(or near) the base of the process stack. Each thread stack is "cactus
linked" back to the main thread's (pre-main()) stack so all that is
preserved in a threaded process... but that works only because these
handlers aren't really "stack based", and don't assume there's any valid
call stack.

The Tru64 UNIX exception package provides for last-chance and
thread-last-chance handlers that it calls when it's done unwinding the
stack and the exception hasn't been finalized. The thread-last-chance
handler is really just a first-level intercept on the last-chance handler:
it gets first shot at the unhandled exception, and the exception facility
goes on to call the other if it doesn't handle it. (We use that to trap
cancel and exit within each thread, while leaving last-chance for the C++
runtime.)

>> [snip]
>>
>> To put it more succinctly, once you've taken the exception out of its
>> call tree you might as well have delivered a standard UNIX signal to a
>> global signal handler. (With POSIX realtime signals you even get an
>> argument value...)
>>
> With the way signals and exceptions are defined - I pretty much agree
> with you. To make it usable, you need to get some insight into the
> context of the condition that caused the problem (and have the ability
> to fix it and resume execution).

Right. You can't get that from another thread, so propagating an exception
makes no sense.

Besides, the thread that raised the exception is ASYNCHRONOUS with the
joining thread. By the time you've joined, the exception (and any
accompanying irreversible damage) happened some arbitrary time ago. Of
course, one could argue that if the process has remained standing during
the interval, it'll probably continue to stand as long as you don't push
too hard. (But then you could argue that for a lot of things, and I don't
see the point.)

I'm not arguing against trying to clean up. I'm arguing against pretending
that the joined thread's call stack is some sort of weird asynchronous
extension of the joining thread's call stack -- which is what you're saying
if you propagate the exception.

If you want to clean up on an unhandled exception, you'd better OWN the
thread that got the exception (otherwise, how would you know it's even OK
to try, or have any idea how... and, in any case, you need the thread ID to
join). So it's easy. You just code the thread's start routine (C++ method,
of course, in this case) with a catch(...) [ugh, but there you are] and
terminate the thread with a return value that will be visible to the
joining thread. If Boost wants to say that a Boost thread's return value is
an OBJECT rather than a (void*), that's probably appropriate, and certainly
reasonable. If the joiner chooses to handle this by raising that object to
clean up and gracefully terminate, why, that's just peachy too.
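
A minimal sketch of that approach, assuming C++11's `std::thread` in place of raw POSIX threads: the start routine catches whatever escapes and hands the joiner a plain status object. `ThreadStatus`, `start_routine` and `run_and_join` are hypothetical names chosen for illustration.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical status object a thread leaves behind for its joiner.
struct ThreadStatus {
    bool ok = true;
    std::string what;   // description of the failure, if any
};

// Start routine: do the work, and turn any escaping exception into a status.
template <class Work>
void start_routine(Work& work, ThreadStatus& status) {
    try {
        work();
    } catch (const std::exception& e) {
        status.ok = false;
        status.what = e.what();
    } catch (...) {
        status.ok = false;
        status.what = "unknown exception";
    }
}

// The owner of the thread creates it, joins it, and inspects the status.
template <class Work>
ThreadStatus run_and_join(Work work) {
    ThreadStatus status;
    std::thread t([&] { start_routine(work, status); });
    t.join();   // after join() the status is safe to read
    return status;
}
```

What the joiner does with a failed status (log it, raise its own exception, shut down) stays entirely its own decision.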

It's this notion that it might make any sense at all to propagate an
exception automatically/transparently across the join to another thread
that frightens me. That's a whole different kettle of creepy-crawlies.

The routine/method that called join knows it did so, and can correctly
handle cleanup on failure. But it doesn't need an exception to do so; a
status would be just fine. The caller, however, and the rest of the call
tree, cannot possibly distinguish the "mirrored" exception from the real
thing, and cannot be expected to understand the (lost) context.

And if you want all this to be automatic, Boost could specify (and
implement) a standard wrapper routine for all threads that catches any
unhandled exception and calls some globally registered thread-last-chance
handler. It can either return, terminate the process, or try some form of
cleanup. (Though I see no possible excuse for putting cleanup there rather
than in the thread start routine.) IF it returns, the Boost wrapper could
then terminate the thread with an "unhandled thread exception" exception.
The Boost join method could recognize that status and raise the exception
into the joiner's call tree to provoke an unwind.

The big difference? There's no pretense here that it's the same exception
that was unhandled in the joinee. This isn't "propagation". The joiner is
simply reporting the event by raising its own unique exception,
synchronously and normally, into the caller's call tree. Any callers of the
join method must be prepared to handle this new exception... but there's
only ONE new exception, not an open set of all possible exceptions the
joinee might have raised.
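
The wrapper scheme in the last two paragraphs might look roughly like this. It is a sketch only, written against C++11 `std::thread`; `WrappedThread`, `thread_last_chance_handler` and `unhandled_thread_exception` are hypothetical names, not anything Boost specifies.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>

// The ONE new exception a caller of join() must be prepared for.
struct unhandled_thread_exception : std::runtime_error {
    unhandled_thread_exception()
        : std::runtime_error("joined thread ended with an unhandled exception") {}
};

// Hypothetical globally registered thread-last-chance handler. It may
// return, attempt cleanup, or end the process (e.g. via std::abort()).
inline std::function<void()>& thread_last_chance_handler() {
    static std::function<void()> handler = [] {};
    return handler;
}

class WrappedThread {
public:
    template <class F>
    explicit WrappedThread(F f)
        : worker_([this, f]() mutable {
              try { f(); }
              catch (...) {
                  thread_last_chance_handler()();   // runs in the failing thread
                  failed_ = true;                   // only reached if it returns
              }
          }) {}

    // Reports the event with a fresh exception; nothing is "propagated".
    void join() {
        worker_.join();
        if (failed_) throw unhandled_thread_exception();
    }

    ~WrappedThread() { if (worker_.joinable()) worker_.join(); }

private:
    bool failed_ = false;   // declared before worker_ so it is initialized first
    std::thread worker_;
};
```

The original exception object never leaves the thread that raised it, which is the distinction drawn above between reporting and propagating.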

However... I still think it's a mistake to continue "business as usual" in
the process after any thread dies with an unhandled exception. Something is
seriously wrong, and nobody knows what it is. (Or it'd have been handled.)
This is not a recipe for reliable operation... or even for useful/safe
cleanup. Kill the process with a core file and sort it out later. If an
application really can't afford a shutdown, then the whole thing should be
running in a captive subprocess in the first place. The parent, safely
isolated from the suspect address space, (and other resources), can
fork/exec a new copy when one child dies unexpectedly.
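
As a rough illustration of the captive-subprocess arrangement on a POSIX system, the parent below restarts the child whenever it exits abnormally. Assumptions: `supervise` and `child_main` are hypothetical names, and for brevity the child runs a function after `fork()` instead of exec'ing a separate program image as described above.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run child_main in a forked child and relaunch it while it dies abnormally.
// Returns the number of the launch that finally succeeded, or -1 if fork or
// waitpid failed or max_launches was exhausted.
template <class ChildMain>
int supervise(ChildMain child_main, int max_launches) {
    for (int launch = 1; launch <= max_launches; ++launch) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            // Child: the suspect address space lives and dies here.
            _exit(child_main(launch));
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return launch;
        // Killed by a signal (e.g. SIGABRT) or nonzero exit: start a new copy.
    }
    return -1;
}
```

The parent keeps no state shared with the child beyond the exit status, so a crash in the child cannot corrupt the supervisor.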

Sure, that's "inconvenient"; but reliability (like performance) often is.
Pretending to handle cleanup in an unknown environment isn't reliable.
You've just painted over the cracks so you can stand back and sigh with
satisfaction over a job well done... until you actually try to walk into
the room, and fall through.

Mark Johnson

Aug 16, 2002, 1:49:28 PM8/16/02

to

Eric Sosman wrote:
>
> Mark Johnson wrote:
> >
> > David Butenhof wrote:
> > > [snip]
> > > The idea of propagating exceptions across a join is silly. The meaning and
> > > disposition of an exception exists within the context of its call tree;
> > > that's why exceptions are handled by unwinding the call tree instead of
> > > registering global cleanup routines.
> >
> > Hmm. I'm kind of surprised that you do not consider the use of global
> > handlers. Let me use Asynchronous System Traps from VMS as a counter
> > example. You could specify...
> > - a "first chance" handler
> > - handlers at each level of the call tree
> > - a "debug" handler
> > - a "last chance" handler
> > for AST's.
>
> I think you've confused "conditions" (which are synchronous)
> with "asynchronous system traps" (which aren't). The former were
> handled by stack unwinding, etc. The latter were passed directly
> to a global function designated for the purpose; there was no
> provision for walking up a call chain.
>

Yes, my mistake. I should have reread the "gray wall" out in the lab
before making my comment. There is another mistake in the three handlers
which should have read...
- primary (used by the debugger)
- secondary (available for program use)
- the handlers at each level of the call tree
- last chance (used by the debugger and/or supervisor)
These are set using the SYS$SETEXV system call and provide a "global
means" of handling exceptions.

When I read about "global cleanup routines", I think about this kind of
mechanism (which was in VMS, and is not defined in Posix).
--Mark

Mark Johnson

Aug 16, 2002, 2:16:41 PM8/16/02

to

David Butenhof wrote:
>
> Mark Johnson wrote:
>
> [snip - mistaken characterization of AST instead of condition handlers]

>
> As Eric Sosman pointed out, you're describing VMS conditions (which are
> exceptions, though with detached rather than C++-style attached handlers).
> ASTs are essentially a more modular version of UNIX signals, where each AST
> is directed (always queued and with arguments, like POSIX realtime signals)
> to a uniquely specified function rather than to a generic global handler.
>

I agree - it's been too long since I dealt with VMS and should have read
the "gray wall" before commenting.

> Condition handling provides "first chance" handlers and a last chance
> handler for the debugger. Normal code cannot use them. Normal propagation
> and handling is strictly by stack unwinding. Although there's a special
> trick shared libraries can play in load-time initialization to call back
> into the image activator in order to get a permanent condition handler at
> (or near) the base of the process stack. Each thread stack is "cactus
> linked" back to the main thread's (pre-main()) stack so all that is
> preserved in a threaded process... but that works only because these
> handlers aren't really "stack based", and don't assume there's any valid
> call stack.

I agree that the first and last chance handlers were reserved for the
debugger [I had to reread the documentation to confirm that]. However,
the secondary handler could be set by an application using SYS$SETEXV to
handle the exception. I used that mechanism in the mid 80's to trap...
- FORTRAN format errors to suppress the stack traceback and issue the
** quietly
- floating divide by zero operations to replace the result with a
suitably large (10**20 I recall) value and continue execution
This was put into a statistics package that was brought over from
another system and allowed our users to get *some* meaningful data from
the operation of the program. When the secondary handler runs, the stack
is perfectly valid.
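
C++ has no resumable exceptions, but the "fix it at the point of failure and carry on" behavior can be approximated by passing a handler down to the operation. The sketch below is only an analogy to the VMS secondary handler, not a description of it; `divide`, `ratios` and `DivideByZeroHandler` are hypothetical names, and 1e20 mirrors the substitute value recalled above.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical handler: called at the point of failure, it returns a
// substitute result so the computation can continue.
using DivideByZeroHandler = std::function<double(double numerator)>;

double divide(double numerator, double denominator,
              const DivideByZeroHandler& on_zero) {
    if (denominator == 0.0) return on_zero(numerator);   // resume with handler's value
    return numerator / denominator;
}

// Compute element-wise ratios, substituting a large value for x/0 and
// flagging which entries are suspect.
std::vector<double> ratios(const std::vector<double>& num,
                           const std::vector<double>& den,
                           std::vector<bool>& suspect) {
    std::vector<double> out;
    suspect.assign(num.size(), false);
    for (std::size_t i = 0; i < num.size() && i < den.size(); ++i) {
        out.push_back(divide(num[i], den[i], [&](double) {
            suspect[i] = true;   // flag the data, as the report did
            return 1e20;
        }));
    }
    return out;
}
```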

The mechanisms you have in Ada, Java, C++, etc. for exception processing
do not allow for that kind of global solution. I find it extremely
irritating to work on a large system, have an exception thrown and
caught at some high level w/ only half the processing done. It means
that someone didn't take appropriate care at the lower level design and
implementation.

Of course, if you are looking for something to fire after the stack is
completely unwound, that is a different problem. :-(

>
> [explanation of Tru64 UNIX exception package]
>
I'll take your word for it, I haven't used that system.

> >> [snip]
> >>
> >> To put it more succinctly, once you've taken the exception out of its
> >> call tree you might as well have delivered a standard UNIX signal to a
> >> global signal handler. (With POSIX realtime signals you even get an
> >> argument value...)
> >>
> > With the way signals and exceptions are defined - I pretty much agree
> > with you. To make it usable, you need to get some insight into the
> > context of the condition that caused the problem (and have the ability
> > to fix it and resume execution).
>
> Right. You can't get that from another thread, so propagating an exception
> makes no sense.
>

I think we agree on this point - you can't do this today. You might be
able to do this with some new mechanism, but that will be a *lot* of work.

> [snip - a lot of boost specific commentary]

> However... I still think it's a mistake to continue "business as usual" in
> the process after any thread dies with an unhandled exception. Something is
> seriously wrong, and nobody knows what it is. (Or it'd have been handled.)
> This is not a recipe for reliable operation... or even for useful/safe
> cleanup. Kill the process with a core file and sort it out later. If an
> application really can't afford a shutdown, then the whole thing should be
> running in a captive subprocess in the first place. The parent, safely
> isolated from the suspect address space, (and other resources), can
> fork/exec a new copy when one child dies unexpectedly.
>

Exactly. However, the current Posix exception handling is not conducive
to a global solution and condition handlers like those VMS had are one way to
provide that ability without a lot of overhead.
--Mark

Alexander Terekhov

Aug 16, 2002, 3:59:31 PM8/16/02

to

David Butenhof wrote:
[...]

> If you want to clean up on an unhandled exception, you'd better OWN the
> thread that got the exception (otherwise, how would you know it's even OK
> to try, or have any idea how... and, in any case, you need the thread ID to
> join). So it's easy. You just code the thread's start routine (C++ method,
> of course, in this case) with a catch(...) [ugh, but there you are] and
> terminate the thread with a return value that will be visible to the
> joining thread. If Boost wants to say that a Boost thread's return value is
> an OBJECT rather than a (void*), that's probably appropriate, and certainly
> reasonable. If the joiner chooses to handle this by raising that object to
> clean up and gracefully terminate, why, that's just peachy too.

I believe that a simple "show case" demonstrating what some Boost folks
want to achieve is this:

// can be canceled and can also throw "operation_error"
object operation( another_object ) throw( operation_error, thread_cancel );

<somewhere (create a new thread -- async. "operation" function invocation)>
.
.
.
thread_ptr tp = new_thread( &operation,another_object(/**/) );
.
.
.

<somewhere (with access to "tp" or its copy or result of publishing
thread::current() from the "tp"-referenced thread itself) >
.
.
.
// promotion [rtti checked] -- now can type-safely join
joinable_thread_ptr< object > jtp( tp );

try {

// *join() can now throw "operation_error" if joinee ended with it
object* presult = jtp->timedjoin( timeout::relative( 1000 ) );

if ( thread::timedout( presult ) &&
thread::canceled( presult = jtp->cancel().join() ) ) {

std::cout << "OOPS: timedout and canceled!" << std::endl;

}
else {

std::cout << "DONE: " << *presult << std::endl;

}

}
catch( const operation_error& e ) {

std::cout << "operation failed: " << e.what() << std::endl;

}

However, I think, the problem is that even restricting the set
of "propagated" [caught-and-ready-for-rethrow-on-join] exceptions
to only those that are specified in the exception specs, we still
might:

a) inject too many catch handlers catching rather silly things
[like std::logic_error, std::invalid_argument, etc.] you
wouldn't "normally" want to catch, and

b) have "some problems" with never joined threads [detached
automatically on last ref. ("self" including) evaporation].

[...]

> However... I still think it's a mistake to continue "business as usual" in
> the process after any thread dies with an unhandled exception. Something is
> seriously wrong, and nobody knows what it is. (Or it'd have been handled.)
> This is not a recipe for reliable operation... or even for useful/safe
> cleanup. Kill the process with a core file and sort it out later. If an
> application really can't afford a shutdown, then the whole thing should be
> running in a captive subprocess in the first place. The parent, safely
> isolated from the suspect address space, (and other resources), can
> fork/exec a new copy when one child dies unexpectedly.

Yep; just another level of "recovery"; it can even end up in cluster-wide
"fail over" [to another system].

regards,
alexander.

Peter Dimov

Aug 17, 2002, 10:28:02 AM8/17/02

to

David Butenhof <David.B...@compaq.com> wrote in message news:<aA879.13$on2.8...@news.cpqcorp.net>...

The idea behind the exception propagation "Boost proposal" is not to
clean up, or handle unexpected exceptions.

Basically, (my interpretation) the idea is to extend the basic concept
of

class thread
{
thread(function-returning-void);
void join();
};

(currently provided by Boost.Threads) to

template<class ReturnValue> class thread
{
thread(function-returning-ReturnValue);
ReturnValue join();
};

where thread<void> provides the old behavior (no "join" return value,
no exceptions), and thread<R> serves as an asynchronous function call,
allowing code like:

int f();

try
{
int x = f();
}
catch(std::runtime_error)
{
// f() failed
}

to be transformed to:

int f();

thread<int> t(f);

try
{
int x = t.join();
}
catch(std::runtime_error)
{
// f() failed
}

The possible exceptions thrown by f() are treated, and transported by
the join(), as a return value would.

It is possible to not offer library support for this, and expect
people to wrap the f's with try/catch blocks that somehow translate
exceptions to return codes, and "unmangle" them at the join() side,
but this is just forcing users to do the same work manually.
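
For comparison, C++11 later standardized `std::async` and `std::future`, whose `get()` behaves much like the `join()` described here: it returns the value or rethrows the exception the function exited with. A minimal sketch, where `parse_port` and `parse_port_async` are hypothetical stand-ins for f() and its caller:

```cpp
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical f(): returns a value or throws std::runtime_error.
int parse_port(const std::string& text) {
    if (text.empty()) throw std::runtime_error("empty input");
    int port = 0;
    for (char c : text) {
        if (c < '0' || c > '9') throw std::runtime_error("not a number: " + text);
        port = port * 10 + (c - '0');
    }
    return port;
}

// Run parse_port in another thread; report failure as -1.
int parse_port_async(const std::string& text) {
    std::future<int> result = std::async(std::launch::async, parse_port, text);
    try {
        return result.get();   // plays the role of t.join()
    } catch (const std::runtime_error&) {
        return -1;             // f() failed
    }
}
```

Note that `std::future` transports any exception type, so it does not address the objection about limiting which exceptions cross the thread boundary.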

Alexander Terekhov

Aug 17, 2002, 1:49:03 PM8/17/02

to

Peter Dimov wrote:
[...]

> class thread
> {
> thread(function-returning-void);
> void join();
> };
>
> (currently provided by Boost.Threads) to
>
> template<class ReturnValue> class thread
> {
> thread(function-returning-ReturnValue);
> ReturnValue join();
> };
>
> where thread<void> provides the old behavior (no "join" return value,
> no exceptions),

That's a rather unnecessary "restriction", I think. I personally
would have "no problem" [never joined threads aside for a moment]
if

void operation() throw( std::bad_alloc, std::thread_cancel ); // ;-)

used in "new_thread( operation )->join();" statement, it would be
expected to throw std::bad_alloc indicating out of memory as the
result of operation() doing something in another thread too -- NOT
only "new_thread" out-of-memory failure. (this example is somewhat
silly, but never mind ;-) )

Exceptions and return values are orthogonal concepts, in my view.

> and thread<R> serves as an asynchronous function call,
> allowing code like:
>
> int f();

Here is where I personally DISAGREE STRONGLY. This f() can throw
ANYTHING! So, in order to make the following "transformation"
(your version of join NOT returning a pointer to report canceled/
timedout/stillbusy conditions [note: join(), and also timedjoin(),
and also tryjoin() operations ;-) ] aside for a moment):

> try
> {
> int x = f();
> }
> catch(std::runtime_error)
> {
> // f() failed
> }
>
> to be transformed to:
>
> int f();
>
> thread<int> t(f);
>
> try
> {
> int x = t.join();
> }
> catch(std::runtime_error)
> {
> // f() failed
> }
>
> The possible exceptions thrown by f() are treated, and transported by
> the join(), as a return value would.

you'd have to catch ALL exceptions using silly (if used w/o ES
"protection") catch(...) and some currently unavailable (I guess,
C++EX.ABI calls would work here, BTW) mechanism to store whatever-
is-thrown and "rethrow" it on join in the joiner thread(s). That's
BRAIN-DEAD. You ought to somehow LIMIT the set of exceptions caught
and "propagated" on join operations!(*) Also-brain-dead [currently
standard] unwinding on ES violations aside, in many, but NOT in all
cases, limiting the set to only those that are specified in the
prototype/exception specifications [thread_cancel and thread_exit
exceptions aside since they are "finalized" internally] would
"do it"... well, "probably".

regards,
alexander.

(*) http://groups.google.com/groups?selm=3D5A49FB.CED1038C%40web.de
(please pay some attention to the "A-B-C-problem"...)

Eric D Crahen

Aug 17, 2002, 3:09:33 PM

On 08 Aug 2002, Alexander Terekhov wrote:

> But that has really NOTHING to do with thread v. process termination
> and unwinding/propagation of ALL exceptions, however.

On 17 Aug 2002, Alexander Terekhov wrote:

> Exceptions and return values are orthogonal concepts, in my view.

I think I would have to agree with you about that. (What? Did I really
just say that?) :)

I've seen a lot of good, sensible arguments for why propagating ALL
exceptions is not that great an idea. The only time something similar to
this would be useful, in my view anyway, would be in the context of some
kind of Future pattern. But that's quite different, as Alexander mentioned.

So, my question is what exactly is Boost trying to gain by doing this? In
other words, what is the upside to propagating all exceptions?
My impression, from following their mailing list discussion, was that they
were originally exploring the implementation of a Future (they were
referring to it as an async function wrapper). I'm unsure why the focus
shifted to propagating all exceptions.

- Eric
http://www.cse.buffalo.edu/~crahen

Alexander Terekhov

Aug 17, 2002, 5:10:20 PM

Eric D Crahen wrote:
[...]

> So, my question is what exactly is Boost trying to gain by doing this? In
> other words, what is the upside to propagating all exceptions?
> My impression, from following their mailing list discussion, was that they
> were originally exploring the implementation of a Future (they were
> referring to it as an async function wrapper). I'm unsure why the focus
> shifted to propagating all exceptions.

Functions may return values. Functions may also throw "bad" things.

What they want is simply that "the effect" of {try/timed}
joining a *non-canceled* and "finished" thread, created for a specific
function, would produce "the same result" [with respect to its return
value and the exceptions that it can throw] as simply calling that function
synchronously, on the same thread ["A-B-C-problem" aside].

The "problem" is that thinking in "generic code"/templates terms, you
don't normally have any knowledge about exceptions that can be thrown
by the "user code". (unless you impose throw()-nothing "restrictions")

So, I guess, that's the only reason why they want "catch and propagate
everything".

regards,
alexander.

Peter Dimov

Aug 19, 2002, 1:39:28 PM

Eric D Crahen <cra...@cse.buffalo.edu> wrote in message news:<Pine.SOL.4.30.02081...@hadar.cse.Buffalo.EDU>...

What do you gain by _not_ propagating all exceptions? (On a conceptual
level, that is.)

Peter Dimov

Aug 19, 2002, 1:45:59 PM

Alexander Terekhov <tere...@web.de> wrote in message news:<3D5E8C8F...@web.de>...

>
> Exceptions and return values are orthogonal concepts, in my view.

Not really. If you are wrapping a function, and it throws an
exception, you have no return value to propagate, so you must throw an
exception, too.

> > and thread<R> serves as an asynchronous function call,
> > allowing code like:
> >
> > int f();
>
> Here is what I personally DISAGREE STRONGLY. This f() can throw
> ANYTHING! So, in order to make the following "transformation"
> (your version of join NOT returning a pointer to report canceled/
> timedout/stillbusy conditions [note: join(), and also timedjoin(),
> and also tryjoin() operations ;-) ] aside for a moment):
>

[...]

>
> you'd have to catch ALL exceptions using silly (if used w/o ES
> "protection") catch(...) and some currently unavailable (I guess,
> C++EX.ABI calls would work here, BTW) mechanism to store whatever-
> is-thrown and "rethrow" it on join in the joiner thread(s).

Yes.

> That's
> BRAIN-DEAD.

What is brain dead, the concept or the implementation? If the concept
is solid, I can live with an approximate library-based implementation
until the compilers catch up.

Alexander Terekhov

Aug 19, 2002, 2:47:32 PM

Peter Dimov wrote:
[...]

> What is brain dead, the concept or the implementation?

The concept. I, personally, really don't like the idea of propagating
std::logic_error (and similar beasts) across threads on join -- ideally,
I want it to kill the process at throw point, unless someone really
wants to pretend that s/he can "recover" from it and catch it for that
or some other (most likely silly) reason; like ignoring it and/or just
wanting to cause stack unwinding to try to release something "important"
(transactional stuff like ATM cards, etc. should be released/rolled back
by some isolated external watcher/observer/manager or just on the next
automatic restart after abnormal termination with uncommitted stuff).

Well, "futures" aside for a moment, how about the following "concept"
(just an illustration/ideas):

new_thread{< thread >}( function{, ...args...} );

{exception_propagator< exception{, ...exceptions...} >::}
{thread_attr_object.}new_thread( function{, args} );

{exception_propagator< thread, exception{, ...exceptions...} >::}
{thread_attr_object.}new_thread( function{, args} );

For example, given:

void operation(); // Oh, BTW, in the next release this might throw std::bad_alloc

We could then have:

a) no propagation of exceptions on join ["default"]:

new_thread( operation );

new_thread< my_fancy_thread >( operation );

thread::attr().set_name( "007" ).
new_thread( operation );

my_fancy_thread::attr().set_something( something ).
new_thread( operation );

b) propagation of exceptions [specified at thread CREATION point] on join:

exception_propagator< std::bad_alloc >::
new_thread( operation );

exception_propagator< my_fancy_thread,std::bad_alloc >::
new_thread( operation );

exception_propagator< std::bad_alloc >::
attr().set_name( "007" ).
new_thread( operation );

exception_propagator< my_fancy_thread,std::bad_alloc >::
attr().set_something( something ).
new_thread( operation );

where "my_fancy_thread" would be something along the lines of:

class my_fancy_thread : public thread {
public:

class attr : public thread::attr {
public: /*...add some fancy stuff...*/ };

/*...add some fancy stuff...*/ };

(with "thread fields", custom on_thread_start() and
on_thread_termination() "hooks", etc. )

Basically, the idea is to use the "exception_propagator" beast to communicate
the typelist of exceptions that need to be caught, stored and propagated
on join to the "thread" template -- it would simply have a discriminated
union for storing the result (void* for void functions; just to have
"something" instead of void) PLUS all exceptions from that typelist.

This typelist should be properly ordered and will be used for "generic"
catch-and-store-it-in-a-union "finalization" in the launching/landing
pad routine. Join operations would simply fire a "visitor pattern" to
throw this or that exception caught and stored in the joinee thread.

Generic code would have to be parameterized to let users specify a
typelist containing any exception types s/he wants to catch-and-
propagate-on-join for this or that async. "operation" invocation.
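
As a rough sketch of the typelist idea (not an implementation of the
interface above), the "discriminated union plus visitor" part can be
approximated with C++17's std::variant and std::visit, which postdate this
discussion. The names exception_list, guard, outcome, run_captured and
join_result are all hypothetical. No thread is started here; run_captured
stands in for the launching/landing pad routine and join_result for the
join side. A non-void result type that differs from every listed exception
type is assumed.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical typelist naming the exceptions to catch and propagate.
template <class... Es> struct exception_list {};

// Base case: no handlers left, just run the body.
template <class Outcome, class Body>
void guard(Outcome&, Body& body, exception_list<>) { body(); }

// Wrap the body in a handler for E, then let the remaining types wrap that.
// The first listed type ends up innermost, so it is tried first, like the
// first catch clause of a try block (hence "properly ordered").
template <class Outcome, class Body, class E, class... Rest>
void guard(Outcome& out, Body& body, exception_list<E, Rest...>) {
    auto inner = [&] {
        try { body(); }
        catch (const E& e) { out.template emplace<E>(e); }  // stores a copy; may slice
    };
    guard(out, inner, exception_list<Rest...>{});
}

// Discriminated union: nothing yet, the result, or one of the listed exceptions.
template <class R, class... Es>
using outcome = std::variant<std::monostate, R, Es...>;

// "Landing pad": runs f and stores its result or a listed exception.
// Anything not in the list is NOT caught and simply escapes.
template <class R, class F, class... Es>
outcome<R, Es...> run_captured(F f, exception_list<Es...> list) {
    outcome<R, Es...> out;
    auto body = [&] { out.template emplace<1>(f()); };
    guard(out, body, list);
    return out;
}

// "Join": a visitor that returns the result or throws the stored exception.
template <class R, class... Es>
R join_result(const outcome<R, Es...>& out) {
    return std::visit([](const auto& alt) -> R {
        using T = std::decay_t<decltype(alt)>;
        if constexpr (std::is_same_v<T, R>) return alt;
        else if constexpr (std::is_same_v<T, std::monostate>)
            throw std::logic_error("no result stored");
        else throw alt;
    }, out);
}
```

An exception type that is not in the list passes straight through
run_captured; in a real thread entry routine it would reach the top of the
thread, which is the "no propagation" default described in (a).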

Well, I could also probably live with something along the lines of:
(ES: exception specification/throw({...})-spec; ES_stuff_only: things
specified in the ES excluding thread_cancel and thread_exit exceptions;
well, I'm somewhat unsure with respect to thread_restart exception ;-) )

new_thread_that_will_propagate_on_join_ES_stuff_only( operation );

new_thread_that_will_propagate_on_join_ES_stuff_only<
my_fancy_thread >( operation );

thread::attr().set_name( "007" ).
new_thread_that_will_propagate_on_join_ES_stuff_only( operation );

my_fancy_thread::attr().set_something( something ).
new_thread_that_will_propagate_on_join_ES_stuff_only( operation );

but I don't think that this can be done in the current C++... its
standard-required-and-utterly-silly unwinding on ES violations aside
for a moment.

regards,
alexander.

Peter Dimov

Aug 20, 2002, 8:46:52 AM

Alexander Terekhov <tere...@web.de> wrote in message news:<3D613D44...@web.de>...

> Peter Dimov wrote:
> [...]
> > What is brain dead, the concept or the implementation?
>
> The concept. I, personally, really don't like the idea of propagating
> std::logic_error (and similar beasts) across threads on join -- ideally,
> I want it to kill the process at throw point, unless someone really
> wants to pretend that s/he can "recover" from it and catch it for that
> or some other (most likely silly) reason;

You can put an exception specification that prohibits std::logic_error
on the threadproc (and hope that it doesn't unwind ;-) .)

Alexander Terekhov

Aug 20, 2002, 11:23:06 AM

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Peter, I'm going to IGNORE your wink. ;-) That's a rather serious
issue/design flaw in the current C++ language, I strongly believe.

To others: IT'S HOPELESS! Unfortunately, brain-dead unwinding on
ES violations IS THE STANDARD REQUIRED "FEATURE" in the current
C++ language ("The C++ Programming Language", 3rd Edition, Bjarne
Stroustrup [but annotations are mine]):

<quote>

14.6 Exception Specifications

Throwing or catching an exception affects the way a function
relates to other functions. It can therefore be worthwhile to
specify the set of exceptions that might be thrown as part of
the function declaration.

For example:

void f(int a) throw ( x2, x3 );

This specifies that f() may throw only exceptions x2, x3, and
exceptions derived from these types, but no others. When a
function specifies what exceptions it might throw, it effectively
offers a guarantee to its callers. If during execution that
function does something that tries to abrogate the guarantee,
the attempt will be transformed into a call of std::unexpected().
The default meaning of unexpected() is std::terminate(), which
in turn normally calls abort(); see §9.4.1.1 for details.

In effect,

void f() throw ( x2, x3 )
{
// stuff
}

is equivalent to:
^^^^^^^^^^^^^^^^^ <annotation>

a) IT IS NOT "is equivalent to:" -- pls see the
next annotation below (it's simply the second
major flaw; after the general idea to always
unwind unexpected things... partially hiding
behind rather silly "On other systems, it is
architecturally close to impossible not to
invoke the destructors while searching for a
handler" argument[1]. (quoted from: Section
14.7, Uncaught Exceptions, TC++PL)

b) copying/constructing temporary ("x2" or "x3")
caught exception object aside; BTW, I really
prefer to catch "const refs" (for "class type"
exceptions)

</annotation>

void f()
try
^^^ <annotation>

that's "function-try-block" -- pretty useless thing, and
only good for "translating"/"logging"/"attach something to"
exceptions thrown from c-tors and d-tors (but d-tors should
normally NOT throw)... And, of course, it *WORKS JUST FINE*
w.r.t. brain-dead catch(...)->unexpected() unwinding on ES
violations! ;-) Well, actually, it's >>UTTERLY BRAIN-DEAD<<
(and IS NOT "is equivalent to:", to begin with) given that
the current language simply BREAKS "RAII model" with respect
to user specified (using RAII do/undo objects) terminate()
and unexpected() handlers (when hitting ES "barriers";
"implementation-defined" unwinding on uncaught exceptions
aside for a moment):

http://groups.google.com/groups?selm=m.collett-E982F8.12450216072002%40lust.ihug.co.nz
http://groups.google.com/groups?threadm=3D3547BE.3045A2A6%40web.de
(Subject: Re: Is internal catch-clause rethrow standard?)

</annotation>
{
// stuff
}
catch (x2) { throw; } // re-throw
catch (x3) { throw; } // re-throw
catch (...) {
std::unexpected(); // unexpected() will not return
}

The most important advantage is that the function declaration
belongs to an interface that is visible to its callers. Function
definitions, on the other hand, are not universally available.
Even when we do have access to the source code of all our
libraries, we strongly prefer not to have to look at it very
often. In addition, a function with an exception-specification
is shorter and clearer than the equivalent hand-written version.

A function declared without an exception-specification is
assumed to throw every exception.

For example:

int f(); // can throw any exception

A function that will throw no exceptions can be declared with
an empty list:

int g() throw(); // no exception thrown

</quote>

regards,
alexander.

[1] http://groups.google.com/groups?threadm=3C91397A.9FC5F5A4%40web.de
(Subject: Re: C++ and threads)

P.S. http://lists.boost.org/MailArchives/boost/msg34156.php
([boost] Re: Attempting resolution of Threads & Exceptions Issue)

"....
In my view, there should be NO difference whatsoever between
"no-handler-found" and hitting some exception propagation barrier
[ES, d-tor that is unwound itself, etc.] -- that everything should
result in calling unexpected() AT THROW POINT (no stack unwinding;
two phase processing).
...."

Pontus Gagge

Sep 11, 2002, 8:15:52 AM

So, barging in long after the original posting: what exactly
is *wrong* with the "Arizona lightning model" vs the "signal
timing diagram" thread design models? I haven't been following
this NG for long, so please bear with me.

I can certainly see advantages to the latter in terms of
predictable response times, possibly fixed bounds on
resource allocation (though that would depend on the tasks):
generally, I'd say, a design which should deliver performance
better suited to "harder" real-time requirements (if not all
*that* hard, less hardware-level interrupts).

However, I can also see advantages to creating/destroying
threads in response to changing loads and response times,
particularly in applications where resources are less
limited, occasional failures to serve a request are
permissible, the goal is to maximise throughput within some
upper limit on response times, and your process will coexist
with others and should therefore try not to grab every
resource at once (i.e., your average application server/web
server scenario). Likewise, I'm sure, an argument can
be made for threading only when absolutely necessary
in a mostly single-threaded application (e.g., worker
threads in GUI-intensive processes).

So, is branching/joining threads really a Bad Thing(tm)
as the quoted posting seems to imply, or should we just
recognize that we, as in any area of software design,
simply must consider our engineering tradeoffs from case
to case, without pledging our souls to any single design
model? I'm curious to hear further arguments.

Daniel Miller <daniel...@tellabs.com> wrote in message news:<3D596F59...@tellabs.com>...