Following is the code snippet I am using:
Server process:

    pthread_mutexattr_t mattr;
    int ret = pthread_mutexattr_init(&mattr);
    ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
    ret = pthread_mutex_init(&(mutxtestObj->mut_free), &mattr);
Client 1:

    pthread_mutex_lock(&(mutxtestObj->mut_free));
    cout << "\n inside mutex locking ";
    testObj->values = 10;
    ...
    pthread_mutex_unlock(&(mutxtestObj->mut_free));
Client 2:

    pthread_mutex_lock(&(mutxtestObj->mut_free));
    cout << "\n inside mutex locking ";
    testObj->values = 10;
    ...
    pthread_mutex_unlock(&(mutxtestObj->mut_free));
Thanks in advance
Radha G
atexit()
No, because the thread that takes the termination signal may not be the
one that owns the mutex, and therefore can't release it.
The answer is that "this is the way it's supposed to work". A locked
mutex implies broken shared data invariants. If the code terminates
without releasing the mutex, you cannot know that the data is correct,
and a hang on the locked mutex is the best failure mode.
(In general.)
Recent POSIX, however, includes a new "robust" mutex type designed to
address this case for applications that believe they CAN reliably
recover from the failure. You set the "robust" type by calling
pthread_mutexattr_setrobust() on a mutex attributes object and then
creating the mutex with that attributes object.
With a "robust" mutex, the next process/thread to attempt a lock on the
"abandoned" mutex will receive status EOWNERDEAD informing it of the
fact. It can then attempt to recover the state protected by the mutex
and issue a new mutex function (pthread_mutex_consistent()) that
restores the mutex to normal operation. If the mutex is unlocked WITHOUT
having made that new call, the mutex is assumed "non-recoverable", and
all subsequent attempts to lock the mutex will fail with the status
ENOTRECOVERABLE.
Note that you should be very paranoid about relying on this capability
between fully (or largely) shared memory processes. The
"unexpected termination" that abandoned the mutex is likely to be
something like a SIGSEGV, which often implies random memory corruption
that may not be isolated to the data your application might THINK it
needs to recover. (But you're welcome to try...)
Of course you may not find robust mutexes widely available yet...
[...]
Here is some "relevant" information wrt actually implementing robust
synchronization objects:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b5775d829f3f1259
http://groups.google.com/group/comp.programming.threads/msg/a2e6d0e5b18a7475
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/38d68d1289352a53
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/7dda9ad554a976a6/
What does POSIX require a standard conforming robust mutex implementation to
do on failure?
The short answer is "precisely what it says".
A more useful answer depends on what you really intend to ask.
POSIX provides a list of errors for each function, associating
particular standard error codes with specific failures. POSIX rarely
prohibits implementations from detecting and reporting any OTHER
failures by whatever means seems appropriate; in those rare cases where
it does, the prohibitions are explicit. (For example, the mutex
functions specify "These functions shall not return an error code of
[EINTR]." This doesn't mean simply that if interrupted by a signal the
function must return something else, but that it's not allowed to return
merely because it was interrupted by a signal.)
Obviously, robust mutexes add the specific new failure codes EOWNERDEAD
and ENOTRECOVERABLE to pthread_mutex_lock(), pthread_mutex_trylock(),
and pthread_mutex_timedlock(). EOWNERDEAD replaces the normal behavior
of blocking, failing with EBUSY, or timing out when the mutex is locked
but the current owner no longer exists. The pthread_mutex_consistent()
function must return EINVAL if the mutex is not valid, is not a robust
mutex, or if the mutex "does not protect an inconsistent state" (i.e.,
has not been abandoned by a terminated process).
Of course all the other specified and likely failure modes of mutex
creation, destruction, locking, and unlocking apply. Since "robust" is a
separate mutex attribute rather than a mutex type, a robust mutex can
also be recursive, error-check, normal, or default. The standard
specifically says that a call to pthread_mutex_unlock() after
pthread_mutex_consistent() "releases" (not "unlocks") the mutex so that
"it can be used normally by other threads". This implies that the
combination of pthread_mutex_consistent() and pthread_mutex_unlock()
must "unroll" any recursive locks on a recursive mutex, not merely
decrease the lock depth as on a normal pthread_mutex_unlock().
... or you may realize that this is the very behaviour that mutex
semaphores have on OS/2, and have had since 1987, that Dave Cutler so
famously abhorred. Those who fail to learn from OS/2 are doomed to
reinvent it, it seems.
SystemV semaphores have similar capabilities. This is nothing new or
radical. They are traditional kernel objects designed before
multiprocessing and shared address spaces became widely prevalent, and
are intended for coarse-grained, non-shared address synchronization.
They can work well for that purpose.
The addition of shared memory makes recovery of any failures difficult;
fully shared memory makes it effectively impossible unless you've got an
embedded monolithic application and are willing to put a lot of effort
into analyzing and repairing any random corruption in your address
space. That's why POSIX mutexes didn't have a "recovery" mechanism in
the first place. If a process dies with a SIGSEGV, it probably means
memory was trashed. It need not be (and usually isn't) isolated in the data
controlled by the mutex; or even in the region that's actively shared
between processes.
Nevertheless, there ARE some applications that really can do this, and
more that think they can and want to try even if they really can't. The
"robust" mutex extension (which isn't really very "robust") was proposed
to make them happy. So, fine; let them have it. ;-)
Without this type of behaviour, how can one ensure that the system has a
chance to recover if an app dies while holding one or more mutexes?
Chris
> If a process dies with a SIGSEGV, it probably means
> memory was trashed. It need not be (and usually isn't) isolated in the data
> controlled by the mutex; or even in the region that's actively shared
> between processes.
If the memory that was trashed is not in the data protected by a
particular mutex (and possibly not even accessible), why would the next
entity to acquire that mutex care about it?
Chris
Because the application is now in an unknown and unreliable state,
maybe? Even if a previous corruption didn't happen to affect THIS shared
data, continuing execution as if nothing happened might well do it
later. Unless you can find and fix the problem, you don't know; you're
just tossing dice with your data. Well, now there's a standard "throw
dice and sit back" mutex, so anyone who thinks they're up to it can give
it a go. I'm just saying, use with caution and be aware of the larger
context in which the application operates. ;-)
--
Ian Collins.
> "has a chance" pretty well sums up the situation. What odds would you
> consider acceptable in order to continue rather than bail?
It's not a matter of odds: the data is either provably correct (where
provably means that CRCs match, or something similar, rather than
mathematically certain) or else it's considered corrupt.
If I were to write something using a robust mutex, the recovery
mechanism would be something like:
    try to acquire mutex
    if mutex is "abandoned"
        run integrity checks on data covered by mutex
        if integrity passes
            set mutex to consistent
        else
            bail
The integrity tests would basically ensure that the data is
self-consistent. So things like CRCs at multiple levels, string lengths
stored in addition to the null-terminated string itself, etc.
Depending on the app, it might be possible to continue on even if a
subset of the data is corrupt. It may be possible to re-generate that
data from other sources, for instance.
Chris
>> If the memory that was trashed is not in the data protected by a
>> particular mutex (and possibly not even accessible), why would the
>> next entity to acquire that mutex care about it?
> Because the application is now in an unknown and unreliable state,
> maybe? Even if a previous corruption didn't happen to affect THIS shared
> data, continuing execution as if nothing happened might well do it
> later.
I think I see what you're getting at. The real issue is that the
application that went nuts could theoretically have trampled data not
protected by any of the mutexes held at the time it went nuts. In a
multithreaded environment this could be particularly nasty.
Even so, there are cases where "robust" mutexes could still be useful.
I have an app where two separate processes both access a single chunk of
shared memory. They currently use file locking to synchronize, but a
process-shared mutex would work as well. If I were to include a CRC of
the data in the shared memory area, then if one of them dies while
holding the mutex, the other could easily verify whether the data is
self-consistent simply by verifying the CRC.
If the app that died corrupted other memory it really doesn't matter to
the second app, because the only chunk it cares about (and in this case
the only area it can actually see) would be covered by the mutex.
Chris
Yes, there are. But only for this sort of scenario, where the shared
area of memory is small, controlled, and "provably correct[able]"... or
in a monolithic app where literally ALL memory can be analyzed and
repaired. Both are very special cases, and the original phase of POSIX
threading was, deliberately, narrowly focused on general purpose
primitives. (And enough of THEM were sufficiently contentious without
dealing with even more specialized cases like this.)
MOST code cannot "prove" that all possible shared memory with the dead
process is correct. And certainly not with a fully shared address space
inhabited by completely independent shared libraries.
> If the app that died corrupted other memory it really doesn't matter to
> the second app, because the only chunk it cares about (and in this case
> the only area it can actually see) would be covered by the mutex.
Sure... as long as there's NO other possible "leakage" beyond a region
of memory you know and can reliably repair. Again... it DOES happen, but
it's definitely not a common case.
http://groups.google.com/group/comp.programming.threads/msg/9b1edc053cc58f2e
http://groups.google.com/group/comp.programming.threads/msg/94f2a233bd65bf8a
http://groups.google.com/group/comp.programming.threads/msg/a2e6d0e5b18a7475
--
Ian Collins.
By the way, your clock appears to be on the blink; your reply is dated
12 hours before my post!
--
Ian Collins.
> Even so, there are cases where "robust" mutexes could still be useful.
> I have an app where two separate processes both access a single chunk of
> shared memory. They currently use file locking to synchronize, but a
> process-shared mutex would work as well. If I were to include a CRC of
> the data in the shared memory area, then if one of them dies while
> holding the mutex, the other could easily verify whether the data is
> self-consistent simply by verifying the CRC.
How would that help? The CRC is calculated by the entity you don't
trust, so you don't trust the CRC. For example, suppose the thread
goes berserk, corrupts the data, calculates the correct CRC, and then
crashes due to the corruption.
The CRC being correct just tells you the thread wrote what it intended
to write. However, given that it crashed before it released the mutex,
it may well have failed because it wrote the wrong data in the first
place.
DS
;^)
> A more useful answer depends on what you really intend to ask.
I was wondering if the mutex's internal state would have to be protected, or
hidden in the kernel with no way to access it from user-space.
[...]
>
> Obviously, robust mutexes add the specific new failure codes EOWNERDEAD
> and ENOTRECOVERABLE to pthread_mutex_lock(), pthread_mutex_trylock(), and
> pthread_mutex_timedlock(). EOWNERDEAD replaces the normal behavior of
> blocking, failing with EBUSY, or timing out when the mutex is locked but
> the current owner no longer exists. The pthread_mutex_consistent()
> function must return EINVAL if the mutex is not valid, is not a robust
> mutex, or if the mutex "does not protect an inconsistent state" (i.e., has
> not been abandoned by a terminated process).
Okay. Thanks for that.
> Of course all the other specified and likely failure modes of mutex
> creation, destruction, locking, and unlocking apply. Since "robust" is a
> separate mutex attribute rather than a mutex type, a robust mutex can also
> be recursive, error-check, normal, or default. The standard specifically
> says that a call to pthread_mutex_unlock() after
> pthread_mutex_consistent() "releases" (not "unlocks") the mutex so that
> "it can be used normally by other threads". This implies that the
> combination of pthread_mutex_consistent() and pthread_mutex_unlock() must
> "unroll" any recursive locks on a recursive mutex, not merely decrease the
> lock depth as on a normal pthread_mutex_unlock().
Ouch. Seems like a pain in the neck to handle robust recursive mutexes...
DAMN! You're right. Will fix. I can't believe I have not noticed that!
:^0
> I was wondering if the mutex's internal state would have to be
> protected, or hidden in the kernel with no way to access it from user-space.
Yeah, it really would. At least, it needs to be unwritable from the
"loose cannon" user-mode code that's going to trash something and kill
the process. (Or else the mutex itself might get trashed, and all bets
are kinda off.) Which, in most UNIX systems, is essentially the same thing.
> Ouch. Seems like a pain in the neck to handle robust recursive mutexes...
Yeah, well, just one more reason on the long list to stay away from
recursive mutexes. ;-)
Agreed. However, I am not sure that this is any sort of requirement demanded
by the POSIX standard... Technically, would a robust mutex implementation
whose state was completely accessible to "loose cannon", ;^0, user-space
programs that may or may not be drunk while the scheduler lets them drive...
Please correct me if I am misunderstanding the standard on this issue...
>> Ouch. Seems like a pain in the neck to handle robust recursive mutexes...
>
> Yeah, well, just one more reason on the long list to stay away from
> recursive mutexes. ;-)
Bingo!
Oops. Forgot the question part of the last sentence above: Would a mutex
like that be standard conforming wrt fully supporting the robust attribute?
There's no explicit POSIX rule that robust mutexes not be susceptible to
user-mode corruption. Technically, "a mutex", in strictly POSIX terms,
is just a pthread_mutex_t -- a structure allocated by the user in the
process address space and initialized by pthread_mutex_init(). If it's
in process-shared memory and initialized with the pshared attribute,
then it can be used to synchronize between processes. There's nothing
anywhere that requires, or even strongly suggests, that any part of the
operational context of a mutex be stored in kernel space.
The existence of the "robust" mutex attribute strongly IMPLIES that the
context of the mutex should be protected, because "robustness" is pretty
much irrelevant if the mutex itself is subject to the same corruption
you're hoping to repair. But nowhere is that formally required by POSIX.
Nor, within the format and context of POSIX, CAN it be required. POSIX
is an API, not an ABI, and anything more than informative hints about
possible implementation is out of scope.
So, yes, a corruptible "robust" mutex with all the state hanging out in
user space could conform -- and yet be virtually useless for the stated
purpose. It's a mere "quality of implementation" issue. ;-)
> So, yes, a corruptible "robust" mutex with all the state hanging out in
> user space could conform -- and yet be virtually useless for the stated
> purpose. It's a mere "quality of implementation" issue. ;-)
Wow: Thank you so much for that wonderful answer! Humm... Okay, well,
quality of implementation issues are key to the success of the standard,
imvho of course. If the standard's stance is so strict as to prevent common
sense, clever, and downright genius innovations from flooding throughout the
various possible implementations of a particular standard, then it may be
dead on arrival in more ways than one. How do you, as a concrete and
potential influence on the POSIX standard, write documentation that tries to
get across that a "best" implementation route for a given specification is
recommended, without sounding like a darn salesman?
BTW, thanks for letting me basically interview you!
Many thanks across the board.
The best and most appropriate implementation for many functions is going
to depend on the intended users -- as well as a bunch of other factors.
Many implementation decisions in POSIX APIs will be vastly different for
an embedded realtime system used for, say, a spaceship flight control
system, vs a multiuser timeshare system.
It's not easy to make implementation requirements without risk of
prohibiting someone's "next great idea". There are a couple of hints,
basically where someone thought people might think "whoa, this is
too hard to do"... but in general it's assumed that someone sufficiently
familiar with OS systems development will be able to figure out an
implementation that makes sense within the environment(s) they're trying
to support.
In theory, as a USER of POSIX conforming systems, it's your job to
thoroughly evaluate various implementations to find one appropriate to
your specific needs.
As a developer, of course, it's usually more a matter of figuring out
how to get by with the implementation(s) you have to support. POSIX at
least helps by standardizing the basic APIs. But sometimes it might come
down to designing your application so that it uses some robust
synchronization that IS a pure kernel object, perhaps SystemV
semaphores, on implementations where robust mutexes don't work as you'd
like. Or, if you can get away with it, don't support that system... or
document that their implementation endangers your application's attempts
at robustness and it just won't be as reliable there. (Generally,
though, users aren't happy with that because you're just making excuses.)