Following is the code snippet I am using:
Server process:

    pthread_mutexattr_t mattr;
    int ret = pthread_mutexattr_init(&mattr);
    ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
    ret = pthread_mutex_init(&(mutxtestObj->mut_free), &mattr);
Client 1:

    pthread_mutex_lock(&(mutxtestObj->mut_free));
    cout << "\n inside mutex locking ";
    testObj->values = 10;
    ...
    pthread_mutex_unlock(&(mutxtestObj->mut_free));
Client 2:

    pthread_mutex_lock(&(mutxtestObj->mut_free));
    cout << "\n inside mutex locking ";
    testObj->values = 10;
    ...
    pthread_mutex_unlock(&(mutxtestObj->mut_free));
Thanks in advance
Radha G
atexit()
No, because the thread that takes the termination signal may not be the
one that owns the mutex, and therefore can't release it.
The answer is that "this is the way it's supposed to work". A locked
mutex implies broken shared data invariants. If the code terminates
without releasing the mutex, you cannot know that the data is correct,
and a hang on the locked mutex is the best failure mode.
(In general.)
Recent POSIX, however, includes a new "robust" mutex type designed to
address this case for applications that believe they CAN reliably
recover from the failure. You set the "robust" type by calling
pthread_mutexattr_setrobust() on a mutex attributes object and then
creating the mutex with that attributes object.
With a "robust" mutex, the next process/thread to attempt a lock on the
"abandoned" mutex will receive status EOWNERDEAD informing it of the
fact. It can then attempt to recover the state protected by the mutex
and issue a new mutex function (pthread_mutex_consistent()) that
restores the mutex to normal operation. If the mutex is unlocked WITHOUT
having made that new call, the mutex is assumed "non-recoverable", and
all subsequent attempts to lock the mutex will fail with the status
ENOTRECOVERABLE.
Note that you should be very paranoid about relying on this capability
between fully (or largely) shared memory processes. The
"unexpected termination" that abandoned the mutex is likely to be
something like a SIGSEGV, which often implies random memory corruption
that may not be isolated to the data your application might THINK it
needs to recover. (But you're welcome to try...)
Of course you may not find robust mutexes widely available yet...
[...]
Here is some "relevant" information wrt actually implementing robust
synchronization objects:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b5775d829f3f1259
http://groups.google.com/group/comp.programming.threads/msg/a2e6d0e5b18a7475
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/38d68d1289352a53
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/7dda9ad554a976a6/
What does POSIX require a standard conforming robust mutex implementation to
do on failure?
The short answer is "precisely what it says".
A more useful answer depends on what you really intend to ask.
POSIX provides a list of errors for each function, associating
particular standard error codes with specific failures. POSIX rarely
prohibits implementations from detecting and reporting any OTHER
failures by whatever means seems appropriate; in those rare cases where
it does, the prohibitions are explicit. (For example, the mutex
functions specify "These functions shall not return an error code of
[EINTR]." This doesn't mean simply that if interrupted by a signal the
function must return something else, but that it's not allowed to return
merely because it was interrupted by a signal.)
Obviously, robust mutexes add the specific new failure codes EOWNERDEAD
and ENOTRECOVERABLE to pthread_mutex_lock(), pthread_mutex_trylock(),
and pthread_mutex_timedlock(). EOWNERDEAD replaces the normal behavior
of blocking, failing with EBUSY, or timing out when the mutex is locked
but the current owner no longer exists. The pthread_mutex_consistent()
function must return EINVAL if the mutex is not valid, is not a robust
mutex, or if the mutex "does not protect an inconsistent state" (i.e.,
has not been abandoned by a terminated process).
Of course all the other specified and likely failure modes of mutex
creation, destruction, locking, and unlocking apply. Since "robust" is a
separate mutex attribute rather than a mutex type, a robust mutex can
also be recursive, error-check, normal, or default. The standard
specifically says that a call to pthread_mutex_unlock() after
pthread_mutex_consistent() "releases" (not "unlocks") the mutex so that
"it can be used normally by other threads". This implies that the
combination of pthread_mutex_consistent() and pthread_mutex_unlock()
must "unroll" any recursive locks on a recursive mutex, not merely
decrease the lock depth as on a normal pthread_mutex_unlock().
... or you may realize that this is the very behaviour that mutex
semaphores have on OS/2, and have had since 1987, that Dave Cutler so
famously abhorred. Those who fail to learn from OS/2 are doomed to
reinvent it, it seems.
SystemV semaphores have similar capabilities. This is nothing new or
radical. They are traditional kernel objects designed before
multiprocessing and shared address spaces became widely prevalent, and
are intended for coarse-grained, non-shared address synchronization.
They can work well for that purpose.
The addition of shared memory makes recovery of any failures difficult;
fully shared memory makes it effectively impossible unless you've got an
embedded monolithic application and are willing to put a lot of effort
into analyzing and repairing any random corruption in your address
space. That's why POSIX mutexes didn't have a "recovery" mechanism in
the first place. If a process dies with a SIGSEGV, it probably means
memory was trashed. It need not be (and usually isn't) isolated in the data
controlled by the mutex; or even in the region that's actively shared
between processes.
Nevertheless, there ARE some applications that really can do this, and
more that think they can and want to try even if they really can't. The
"robust" mutex extension (which isn't really very "robust") was proposed
to make them happy. So, fine; let them have it. ;-)
Without this type of behaviour, how can one ensure that the system has a
chance to recover if an app dies while holding one or more mutexes?
Chris
> If a process dies with a SIGSEGV, it probably means
> memory was trashed. It need not be (and usually isn't) isolated in the data
> controlled by the mutex; or even in the region that's actively shared
> between processes.
If the memory that was trashed is not in the data protected by a
particular mutex (and possibly not even accessible), why would the next
entity to acquire that mutex care about it?
Chris
Because the application is now in an unknown and unreliable state,
maybe? Even if a previous corruption didn't happen to affect THIS shared
data, continuing execution as if nothing happened might well do it
later. Unless you can find and fix the problem, you don't know; you're
just tossing dice with your data. Well, now there's a standard "throw
dice and sit back" mutex, so anyone who thinks they're up to it can give
it a go. I'm just saying, use with caution and be aware of the larger
context in which the application operates. ;-)
--
Ian Collins.
> "has a chance" pretty well sums up the situation. What odds would you
> consider acceptable in order to continue rather than bail?
It's not a matter of odds: the data is either provably correct (where
provably means that CRCs match, or something similar, rather than
mathematically certain) or else it's considered corrupt.
If I were to write something using a robust mutex, the recovery
mechanism would be something like:
    try to acquire mutex
    if mutex is "abandoned"
        run integrity checks on data covered by mutex
        if integrity passes
            set mutex to consistent
        else
            bail
The integrity tests would basically ensure that the data is
self-consistent. So things like CRCs at multiple levels, string lengths
stored in addition to the null-terminated string itself, etc.
Depending on the app, it might be possible to continue on even if a
subset of the data is corrupt. It may be possible to re-generate that
data from other sources, for instance.
Chris
>> If the memory that was trashed is not in the data protected by a
>> particular mutex (and possibly not even accessible), why would the
>> next entity to acquire that mutex care about it?
> Because the application is now in an unknown and unreliable state,
> maybe? Even if a previous corruption didn't happen to affect THIS shared
> data, continuing execution as if nothing happened might well do it
> later.
I think I see what you're getting at. The real issue is that the
application that went nuts could theoretically have trampled data not
protected by any of the mutexes held at the time it went nuts. In a
multithreaded environment this could be particularly nasty.
Even so, there are cases where "robust" mutexes could still be useful.
I have an app where two separate processes both access a single chunk of
shared memory. They currently use file locking to synchronize, but a
process-shared mutex would work as well. If I were to include a CRC of
the data in the shared memory area, then if one of them dies while
holding the mutex, the other could easily verify whether the data is
self-consistent simply by verifying the CRC.
If the app that died corrupted other memory it really doesn't matter to
the second app, because the only chunk it cares about (and in this case
the only area it can actually see) would be covered by the mutex.
Chris
Yes, there are. But only for this sort of scenario, where the shared
area of memory is small, controlled, and "provably correct[able]"... or
in a monolithic app where literally ALL memory can be analyzed and
repaired. Both are very special cases, and the original phase of POSIX
threading was, deliberately, narrowly focused on general purpose
primitives. (And enough of THEM were sufficiently contentious without
dealing with even more specialized cases like this.)
MOST code cannot "prove" that all possible shared memory with the dead
process is correct. And certainly not with a fully shared address space
inhabited by completely independent shared libraries.
> If the app that died corrupted other memory it really doesn't matter to
> the second app, because the only chunk it cares about (and in this case
> the only area it can actually see) would be covered by the mutex.
Sure... as long as there's NO other possible "leakage" beyond a region
of memory you know and can reliably repair. Again... it DOES happen, but
it's definitely not a common case.
http://groups.google.com/group/comp.programming.threads/msg/9b1edc053cc58f2e
http://groups.google.com/group/comp.programming.threads/msg/94f2a233bd65bf8a
http://groups.google.com/group/comp.programming.threads/msg/a2e6d0e5b18a7475
--
Ian Collins.
By the way, your clock appears to be on the blink; your reply is dated
12 hours before my post!
--
Ian Collins.
> Even so, there are cases where "robust" mutexes could still be useful.
> I have an app where two separate processes both access a single chunk of
> shared memory. They currently use file locking to synchronize, but a
> process-shared mutex would work as well. If I were to include a CRC of
> the data in the shared memory area, then if one of them dies while
> holding the mutex, the other could easily verify whether the data is
> self-consistent simply by verifying the CRC.
How would that help? The CRC is calculated by the entity you don't
trust, so you don't trust the CRC. For example, suppose the thread
goes berserk, corrupts the data, calculates the correct CRC, and then
crashes due to the corruption.
The CRC being correct just tells you the thread wrote what it intended
to write. However, given that it crashed before it released the mutex,
it may well have failed because it wrote the wrong data in the first
place.
DS
;^)
> A more useful answer depends on what you really intend to ask.
I was wondering if the mutex's internal state would have to be protected, or
hidden in the kernel with no way to access it from user-space.
[...]
>
> Obviously, robust mutexes add the specific new failure codes EOWNERDEAD
> and ENOTRECOVERABLE to pthread_mutex_lock(), pthread_mutex_trylock(), and
> pthread_mutex_timedlock(). EOWNERDEAD replaces the normal behavior of
> blocking, failing with EBUSY, or timing out when the mutex is locked but
> the current owner no longer exists. The pthread_mutex_consistent()
> function must return EINVAL if the mutex is not valid, is not a robust
> mutex, or if the mutex "does not protect an inconsistent state" (i.e., has
> not been abandoned by a terminated process).
Okay. Thanks for that.
> Of course all the other specified and likely failure modes of mutex
> creation, destruction, locking, and unlocking apply. Since "robust" is a
> separate mutex attribute rather than a mutex type, a robust mutex can also
> be recursive, error-check, normal, or default. The standard specifically
> says that a call to pthread_mutex_unlock() after
> pthread_mutex_consistent() "releases" (not "unlocks") the mutex so that
> "it can be used normally by other threads". This implies that the
> combination of pthread_mutex_consistent() and pthread_mutex_unlock() must
> "unroll" any recursive locks on a recursive mutex, not merely decrease the
> lock depth as on a normal pthread_mutex_unlock().
Ouch. Seems like a pain in the neck to handle robust recursive mutexes...
DAMN! You're right. Will fix. I can't believe I have not noticed that!
:^0
> I was wondering if the mutex's internal state would have to be
> protected, or hidden in the kernel with no way to access it from user-space.
Yeah, it really would. At least, it needs to be unwritable from the
"loose cannon" user-mode code that's going to trash something and kill
the process. (Or else the mutex itself might get trashed, and all bets
are kinda off.) Which, in most UNIX systems, is essentially the same thing.
> Ouch. Seems like a pain in the neck to handle robust recursive mutexes...
Yeah, well, just one more reason on the long list to stay away from
recursive mutexes. ;-)
Agreed. However, I am not sure that this is any sort of requirement demanded
by the POSIX standard... Technically, would a robust mutex implementation
whose state was completely accessible to "loose cannon", ;^0, user-space
programs that may or may not be drunk while the scheduler lets them drive...
Please correct me if I am misunderstanding the standard on this issue...
>> Ouch. Seems like a pain in the neck to handle robust recursive mutexes...
>
> Yeah, well, just one more reason on the long list to stay away from
> recursive mutexes. ;-)
Bingo!
Oops. Forgot the question part of the last sentence above: Would a mutex
like that be standard conforming wrt fully supporting the robust attribute?
There's no explicit POSIX rule that robust mutexes not be susceptible to
user-mode corruption. Technically, "a mutex", in strictly POSIX terms,
is just a pthread_mutex_t -- a structure allocated by the user in the
process address space and initialized by pthread_mutex_init(). If it's
in process-shared memory and initialized with the pshared attribute,
then it can be used to synchronize between processes. There's nothing
anywhere that requires, or even strongly suggests, that any part of the
operational context of a mutex be stored in kernel space.
The existence of the "robust" mutex attribute strongly IMPLIES that the
context of the mutex should be protected, because "robustness" is pretty
much irrelevant if the mutex itself is subject to the same corruption
you're hoping to repair. But nowhere is that formally required by POSIX.
Nor, within the format and context of POSIX, CAN it be required. POSIX
is an API, not an ABI, and anything more than informative hints about
possible implementation is out of scope.
So, yes, a corruptible "robust" mutex with all the state hanging out in
user space could conform -- and yet be virtually useless for the stated
purpose. It's a mere "quality of implementation" issue. ;-)
> So, yes, a corruptible "robust" mutex with all the state hanging out in
> user space could conform -- and yet be virtually useless for the stated
> purpose. It's a mere "quality of implementation" issue. ;-)
Wow: Thank you so much for that wonderful answer! Humm... Okay, well,
quality of implementation issues are key to the success of the standard,
imvho of course. If the standard's stance is so strict as to prevent common
sense, clever, and downright genius innovations from flooding throughout the
various possible implementations of a particular standard, then it may be
dead on arrival in more ways than one. How do you, as a concrete and
potential influence on the POSIX standard, write documentation that tries to
get across that a "best" implementation route for a given specification is
recommended, without sounding like a darn salesman?
BTW, thanks for letting me basically interview you!
Many thanks across the board.
The best and most appropriate implementation for many functions is going
to depend on the intended users -- as well as a bunch of other factors.
Many implementation decisions in POSIX APIs will be vastly different for
an embedded realtime system used for, say, a spaceship flight control
system, vs a multiuser timeshare system.
It's not easy to make implementation requirements without risk of
prohibiting someone's "next great idea". There are a couple of hints,
basically where someone thought people might think "whoa, this is
too hard to do"... but in general it's assumed that someone sufficiently
familiar with OS systems development will be able to figure out an
implementation that makes sense within the environment(s) they're trying
to support.
In theory, as a USER of POSIX conforming systems, it's your job to
thoroughly evaluate various implementations to find one appropriate to
your specific needs.
As a developer, of course, it's usually more a matter of figuring out
how to get by with the implementation(s) you have to support. POSIX at
least helps by standardizing the basic APIs. But sometimes it might come
down to designing your application so that it uses some robust
synchronization that IS a pure kernel object, perhaps SystemV
semaphores, on implementations where robust mutexes don't work as you'd
like. Or, if you can get away with it, don't support that system... or
document that their implementation endangers your application's attempts
at robustness and it just won't be as reliable there. (Generally,
though, users aren't happy with that because you're just making excuses.)