
Detecting mutexes owned by dead processes


Otto Lind

Dec 2, 1996

Hi,

I have a problem I want to pose to the readers of this newsgroup. We
are currently developing a product using multiple processes, with each
process potentially using multiple POSIX threads. Inter-process
communication is done using objects stored in memory-mapped shared
memory files. Access to these objects is coordinated using POSIX
mutexes initialized with the PTHREAD_PROCESS_SHARED attribute.
Currently, primary development is on Solaris 2.5.1, and a parallel port
is being done on Windows NT; ports to other Unix platforms that use
POSIX threads will follow.

One aspect of our architecture is that we have a coordinator process
which tries to clean up all system resources in use if a child process
dies, and restarts the failed process in recovery mode so that it can
restore a consistent state of all objects it manages and continue
execution. This is done since each process will be interfacing with
the outside world through various protocol stacks and third-party
libraries. We have no idea what the internals of these libraries will
do: they could have bugs which cause SIGSEGV or SIGBUS signals, or
they might just decide to call exit() at some point.

One thing that I have not been able to figure out is how to determine
if a dead process was holding a POSIX mutex when it died. We could
try a pthread_cancel() mechanism in which each lock installs a cleanup
handler; we would set a handler for all signals (and an atexit()
handler), and try to clean things up gracefully by cancelling each
thread and joining with it. However, I don't think this is a robust
solution, for the following reasons:

o The POSIX standard has not defined a reasonable set of
cancellation points, and async cancels are unusable.

o In my experience, signals and threads have never quite worked
together in a "harmonious" fashion.

o Code complexity will quickly become a nightmare; we anticipate
several thousand active mutexes in shared memory at any given
time.

o I'm worried about the performance impact.

Instead of trying to use cancel, I would rather do some post mortem
analysis on the shared memory segment to find out which mutexes were
locked by the dead process, or have any waiters of the lock notified
that the current owner of the mutex died (which NT can do).

If using the post mortem approach, I need a way to determine if the
current mutex holder is dead. This can be done by calling a routine
which returns the process id of the current holder (I know the dead
process id), or by just indicating if the current holder is a valid
process. Currently POSIX threads doesn't have anything to support
this. Is there a Solaris port specific method we can use? The
pthread_mutex_t structure has a pthread_mutex_owner64 field, but I'm
pretty sure this just references kernel info we don't have access to,
correct?

The approach where the pthread_mutex_lock() returns an error if the owner
dies is actually my preferred solution, since I then wouldn't have to
maintain a list of all active mutexes. On NT, if the thread owning a
mutex dies, one of the active waiters is given a WAIT_ABANDONED return
code. This can be used to coordinate recovery. Are there any plans to
update the POSIX standard to have the mutex lock operations fail if
the current owner dies? Is there any way we could emulate this
behaviour on Solaris? (We'll worry about the other POSIX ports later.)
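
For reference, the NT behaviour looks roughly like this (an untested
sketch; acquire_with_recovery is my own name, and hMutex is assumed to
come from CreateMutex()):

#include <windows.h>

/* Returns 1 if we acquired the mutex because its previous owner died
   (recovery needed), 0 on a normal acquire, -1 on error. */
int acquire_with_recovery(HANDLE hMutex)
{
    DWORD rc = WaitForSingleObject(hMutex, INFINITE);
    if (rc == WAIT_ABANDONED)
        return 1;   /* we own it, but the protected data may be inconsistent */
    return (rc == WAIT_OBJECT_0) ? 0 : -1;
}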

Thanks in advance for any advice,

Otto

--
Otto Lind Softwire Corporation (North office)
ot...@olcs.com 12125 285th street, Lindstrom, MN 55045
skypoint!olcs!otto voice:(612)257-1259 fax:(612)257-0923

Bryan O'Sullivan

Dec 2, 1996

o> One aspect of our architecture is that we have a coordinator
o> process which tries to clean up all system resources in use if a
o> child process dies, and restarts the failed process in recovery
o> mode so that it can restore a consistent state of all objects it
o> manages and continue execution.

Wow. If you're using multiple processes, each with multiple threads,
all communicating through shared memory, I'm glad you're the one
trying to do the recovery work upon a crash, and not me.

o> One thing that I have not been able to figure out is how to
o> determine if a dead process was holding a POSIX mutex when it died.

There is no portable interface for doing this, because it is not a
useful feature in general. Trying to restore a state of sanity in the
face of shared data structures being screwy strikes me as sufficiently
hard as to probably not be worth the effort for 99% of people.

o> Instead of trying to use cancel, I would rather do some post mortem
o> analysis on the shared memory segment to find out which mutexes
o> were locked by the dead process, or have any waiters of the lock
o> notified that the current owner of the mutex died (which NT can
o> do).

This sounds like a more sensible way to do things. If I were you, I'd
write a heavyweight mutex library that maintained enough bookkeeping
information for you to be able to figure out this kind of thing with
ease. You could piggyback atop POSIX threads for portability.

o> Currently POSIX threads doesn't have anything to support [finding
o> out who owns a mutex]. Is there a Solaris port specific method we
o> can use?

No, nor will there be at any point in the future. The thread_db
library gets you some of the way there, but it's specific to Solaris,
only documented under 2.5 and above, isn't supported, and is subject
to change at the whim of the maintainers. Oh, and it's a bit of a
pain to use, too.

o> Are there any plans to update the POSIX standard to have the mutex
o> lock operations fail if the current owner dies?

Not that I am aware of, but Dave Butenhof could better answer this
question, since he's actually involved in the standards activity stuff
and I'm not.

<b

--
Let us pray:
What a Great System. b...@eng.sun.com
Please Do Not Crash. b...@serpentine.com
http://www.serpentine.com/~bos

Andrew Gabriel

Dec 2, 1996

Define your own mutex, say our_mutex_t, which contains a mutex_t
and a process number. Any process locking the mutex writes its
own process number in after locking it, and clears it out to
something invalid just before unlocking it.
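
A rough, untested sketch of the idea (all the our_mutex_* names are
invented):

#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t mutex;   /* initialized with PTHREAD_PROCESS_SHARED */
    pid_t           owner;   /* 0 means "not locked by anyone" */
} our_mutex_t;

int our_mutex_lock(our_mutex_t *m)
{
    int rc = pthread_mutex_lock(&m->mutex);
    if (rc == 0)
        m->owner = getpid();  /* a crash just before this store loses the owner */
    return rc;
}

int our_mutex_unlock(our_mutex_t *m)
{
    m->owner = 0;             /* a crash just after this store looks unowned */
    return pthread_mutex_unlock(&m->mutex);
}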

However, a more significant problem is the tidying up.

First part of tidying up is recovering the data structure which
the mutex is protecting. It is presumably in an inconsistent
state, which is why the mutex was preventing other access to it.
Your cleanup handler will need to know how to repair it.

Then you have a locked mutex with no (official) way to unlock it,
since (officially) it can only be unlocked by the process(/thread)
which locked it, and that will never exist again. Reading
between the lines, I suspect Solaris will allow another process
to unlock it, but such behaviour would certainly not be portable
or guaranteed to work predictably in the future. Perhaps a way
round this would be to use semaphores (and hope a sem_post or
sem_wait doesn't crash :-) ).

--
Andrew Gabriel Home: And...@cucumber.demon.co.uk
Consultant Software Engineer Work: Andrew....@net-tel.co.uk


Otto Lind

Dec 3, 1996

Andrew Gabriel <and...@cucumber.demon.co.uk> wrote:
: Define your own mutex, say our_mutex_t, which contains a mutex_t
: and a process number. Any process locking the mutex writes its
: own process number in after locking it, and clears it out to
: something invalid just before unlocking it.

I've already done that, and in stress testing found the window (more than
once) where the process dies after locking the mutex, but before it can
set the process id; or clears the id and crashes before doing the unlock.

: First part of tidying up is recovering the data structure which
: the mutex is protecting. It is presumably in an inconsistent
: state, which is why the mutex was preventing other access to it.
: Your cleanup handler will need to know how to repair it.

We've got that part down (I've done transactionally logged shared memory
implementations before).

: Then you have a locked mutex with no (official) way to unlock it,
: since (officially) it can only be unlocked by the process(/thread)
: which locked it, and that will never exist again. Reading
: between the lines, I suspect Solaris will allow another process
: to unlock it, but such behaviour would certainly not be portable
: or guaranteed to work predictably in the future. Perhaps a way
: round this would be to use semaphores (and hope a sem_post or
: sem_wait doesn't crash :-) ).

I consider this a hole in the standard, making interprocess mutexes
somewhat worthless on paper (hey, if ya gotta problem, just kill all your
processes, or better yet, just reboot the machine!!! Where are those
alt-control-del keys? :-). Solaris pthreads (in 2.5.1) allows anyone to
unlock a mutex that was locked by a dead process. We'll address this on
a platform/OS case by case basis.

Dave Butenhof

Dec 3, 1996

Bryan O'Sullivan wrote:
> o> Are there any plans to update the POSIX standard to have the mutex
> o> lock operations fail if the current owner dies?
>
> Not that I am aware of, but Dave Butenhof could better answer this
> question, since he's actually involved in the standards activity stuff
> and I'm not.

Gee, Bryan, and up until that LAST paragraph, I was all ready to just
assume you'd said all that needed to be said, and go on to the next
message. ;-)

No, there are no such plans in POSIX. It's unlikely that any such
"widget" would be added in the future, although it's not impossible.

Verifying the current owner on a lock attempt is expensive, especially
with pshared mutexes. It means a call into the kernel (a kill(pid,0) or
equivalent) to validate the pid that owns the mutex. You would actually
need to do a little more, since (at least in theory) a pid could be
recycled eventually, and you'd need to eliminate the false positive by
doing some additional check.
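
The probe itself is simple enough to write; a sketch (pid_exists is my
own name):

#include <sys/types.h>
#include <signal.h>
#include <errno.h>

/* Signal 0 is never delivered; kill(pid, 0) just asks whether the pid
   exists. EPERM means it exists but belongs to someone else. This is
   exactly the check that can give a false positive once a pid has been
   recycled. */
int pid_exists(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}

The cost is the kernel call on every lock attempt, not the code.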

POSIX mutexes aren't required to track ownership, mostly to avoid
overhead. In theory, even with pshared mutexes, a lock can be a few
quick instructions, and determining the identity of the thread can be
relatively expensive. Priority inheritance requires tracking ownership,
and extended mutex types like "recursive" mutexes require ownership.
There's no inherent requirement that pshared mutexes track ownership,
which means rather than slow down the fast locking path, if I were to
design such a beast as you propose, we'd be talking about a new
"detect-orphan" mutex type that would be penalized for the additional
error recovery support. Before going to sleep on a "detect-orphan"
mutex, the thread would validate the pthread_t and pid_t of the current
owner, returning some sort of new error on failure.

Then what? I suppose it could wake any other waiters with the same
failure status. You could then restart everything with a new mutex. You
can't even destroy the "broken" mutex -- waking the waiters would NOT
unlock the mutex, so you can't lock it or do anything else to it. Why
not unlock it? Because mutexes protect shared data, and the fact that
it's locked means that data is (or at least may be) inconsistent. YOU
would have to analyze and restore the data, and you can't do that until
after you've detected the failure. So unlocking the mutex when the
thread library sees the problem would be wrong (another thread could
immediately lock the mutex and blow up over inconsistent data). No way
are you going to get a supported function to unlock a mutex owned by
another thread.

Interestingly, POSIX has already addressed this sort of issue. The
amendments 1003.1d and 1003.1j both contain functions to support
realtime "failsafe" error recovery mechanisms, including mutex wait
timeouts and abort (non-recoverable cancellation for stubborn threads
that don't respond to normal cancellation). But these are intended to
satisfy the "hard realtime" requirements of NASA and DOD, without
severely compromising the architectural principles of POSIX threads.
Hard realtime applications are a very different breed from what you're
talking about -- they KNOW which threads are doing what, and what data
they might touch, and all sorts of things that you don't know. You could
probably go "most of the way" with these extensions, when and if they
are supported by the systems you care about, but not as easily as you'd
like.

I have a feeling there's not enough demand for "detect-orphan" mutexes
to justify the development work. But by all means register your needs
with the O/S vendors you care about -- you may NOT be the only one, and
maybe, if everyone doesn't stay silent, there WILL be enough demand.

In any case, in the short term, go with Bryan's advice. You can always
get the current thread's pthread_t, and the current process pid_t, and
you can store them. So you can create your own mutex with all the
information you want. The main hitch is that you may have to add a
server thread into each process to validate pthread_t values using
pthread_kill(tid,0) -- you can't do that from outside the process. As
for any "mutex wrapper" package, use a condition variable to wait for a
locked mutex, and use the "real mutex" only to protect access to your
extended mutex structure.
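
The in-process validation might look like this (a sketch; thread_exists
is an invented name):

#include <pthread.h>
#include <signal.h>

/* pthread_kill with signal 0 delivers nothing; it just reports whether
   the thread id is still valid within this process (ESRCH if not). */
int thread_exists(pthread_t tid)
{
    return pthread_kill(tid, 0) == 0;
}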

/---[ Dave Butenhof ]-----------------------[ bute...@zko.dec.com ]---\
| Digital Equipment Corporation 110 Spit Brook Rd ZKO2-3/Q18 |
| 603.881.2218, FAX 603.881.0120 Nashua NH 03062-2698 |
\-----------------[ Better Living Through Concurrency ]----------------/

Steven G. Townsend

Dec 3, 1996

Otto Lind wrote:

>
> Dave Butenhof <bute...@zko.dec.com> wrote:
>
> > Before going to sleep on a "detect-orphan" mutex, the thread would
> > validate the pthread_t and pid_t of the current owner, returning some
> > sort of new error on failure.
>
> This is one approach; another would be to have the kernel keep track of
> all mutexes owned by a process (or thread), and implicitly unlock them
> upon exit. In either case, the mutex itself would be flagged so that the
> next waiter which is granted ownership on it is returned a special error
> code indicating that it was granted ownership only because the last owner
> died. This is pretty much how "Mutex" is implemented on NT.

This would be very expensive. For instance, for every successful lock the
system would have to inspect and modify the kernel process data structure.
This would involve some additional locking of kernel data structures.
Additionally, an unlock would require a search of the list (you said that
there could be thousands?) and the removal of the information from the
list (lock proc mutex, unlink entry, free() mem block, unlock proc mutex).
The only way to speed up the mutex tracking in this case is to statically
allocate an array with one entry for each possible mutex that might be
involved. In this case that would result (in your example) in 10000+
entities being assigned to each thread (which *significantly* increases
the size of the thread and process structures -- and introduces other
problems). Even if you did this you would still have the proc/thread
structure manipulation to deal with...
And remember, the mutex lock/unlock paths are supposed to be *FAST*.

>
> > Then what? I suppose it could wake any other waiters with the same
> > failure status. You could then restart everything with a new mutex.
>

> No need for a new mutex; just re-use the same one, only with the
> special error return.
>

But if the process died, the mutex now "protects" (potentially)
corrupted data. Keeping it locked prevents other threads from using
corrupt data as "good" and spreading the damage and corruption.

> > Why not unlock it? Because mutexes protect shared data, and the fact
> > that it's locked means that data is (or at least may be) inconsistent.
> > YOU would have to analyze and restore the data, and you can't do that
> > until after you've detected the failure.
>

> If ownership is relinquished upon process death, and a special error is
> returned, you would only have to handle the special case of the lock
> returning the error code. Yes, it would be up to the application to make
> sure that the data being protected by the mutex is consistent after being
> granted ownership, but that is better than the current situation, which
> is that the data is never accessible again, unless all processes are
> killed and the shared memory mutex blown away.

In order for thread B to "clean up" thread A's inconsistent data, it would
need to know exactly what thread A was doing when it died (and potentially
the values of thread-private data and/or thread specific stack data).
This is not the typical situation (or even desired situation) when working
with multiple threads.

Consider the following scenario:
A global structure is maintained which contains the maximum time any
thread has been executing, the amount of time since the last check, and
that thread's id. Pseudocode might look like this:

Set the (stack) variable start_time for this thread to the current time
at thread creation time.
Set the (stack) variable t_pid for this thread to the thread id for this
thread.

while (i = <random number between 0 and 10000>)
{
    thread_sleep(i);
    lock mutex();
    if ((<current_time> - start_time) > global_max_time)
    {
        global_max_th_id = t_pid;
        global_max_time = <current_time> - start_time;
        global_last_interval = i;
    }
    unlock mutex();
}

Assume that the thread dies inside the routine which returns the current
time inside the if statement... The thread_id has already been updated.
What value should thread B assign to global_max_time and
global_last_interval when "cleaning up" the data structure?


>
> > I have a feeling there's not enough demand for "detect-orphan" mutexes
> > to justify the development work. But by all means register your needs
> > with the O/S vendors you care about -- you may NOT be the only one, and
> > maybe, if everyone doesn't stay silent, there WILL be enough demand.
>

> And by the year 2020, something might actually get implemented. No
> thanks, I was wondering if there was a workaround that I could use today.
>
> Frankly, I feel that the Unix/POSIX vendors will need to address this in
> some fashion, as more developers start implementing multi-process
> multi-threaded architectures. Otherwise, the builtin support for this on
> NT will tend to sway some people towards that platform (something I would
> hate to see happen; I'm not a big fan of NT).


>
> > In any case, in the short term, go with Bryan's advice. You can always
> > get the current thread's pthread_t, and the current process pid_t, and
> > you can store them.
>

> But aren't there race conditions when establishing a lock and setting this
> info? What if the process dies after the lock is granted, but before the
> instructions are executed to do the set? This is what is currently
> happening to me in my stress tests.

You would have to build your own wrapper datastructure and use a separate
mutex to protect that datastructure as appropriate (as well as the mutex
which protects the actual data you are protecting)... Sounds a bit like
the "lock the proc mutex...." huh?

> > As for any "mutex wrapper" package, use a condition variable to wait
> > for a locked mutex, and use the "real mutex" only to protect access to
> > your extended mutex structure.
>

> I'm not following this, are you saying that I should use a global mutex
> and condition variable to protect the "real mutex"? Wouldn't that create
> a "minor" hotspot in the system when it comes to contention? And how
> would I protect the global mutex from process death?

See above two entries... It would certainly increase the number of times
you would be locking/unlocking mutexes, blocking on condvars, etc.
This gets back to the original argument that providing "detect-orphan"
mutex support is an expensive proposition.

Hope this helps.
>
> Thanks for your response(s), any additional insight would be appreciated.
>
> Otto

Otto Lind

Dec 4, 1996

Dave Butenhof <bute...@zko.dec.com> wrote:

> Before going to sleep on a "detect-orphan" mutex, the thread would
> validate the pthread_t and pid_t of the current owner, returning some
> sort of new error on failure.

This is one approach; another would be to have the kernel keep track of
all mutexes owned by a process (or thread), and implicitly unlock them
upon exit. In either case, the mutex itself would be flagged so that the
next waiter which is granted ownership on it is returned a special error
code indicating that it was granted ownership only because the last owner
died. This is pretty much how "Mutex" is implemented on NT.

> Then what? I suppose it could wake any other waiters with the same
> failure status. You could then restart everything with a new mutex.

No need for a new mutex; just re-use the same one, only with the
special error return.

> Why not unlock it? Because mutexes protect shared data, and the fact
> that it's locked means that data is (or at least may be) inconsistent.
> YOU would have to analyze and restore the data, and you can't do that
> until after you've detected the failure.

If ownership is relinquished upon process death, and a special error is
returned, you would only have to handle the special case of the lock
returning the error code. Yes, it would be up to the application to make
sure that the data being protected by the mutex is consistent after being
granted ownership, but that is better than the current situation, which
is that the data is never accessible again, unless all processes are
killed and the shared memory mutex blown away.

> I have a feeling there's not enough demand for "detect-orphan" mutexes
> to justify the development work. But by all means register your needs
> with the O/S vendors you care about -- you may NOT be the only one, and
> maybe, if everyone doesn't stay silent, there WILL be enough demand.

And by the year 2020, something might actually get implemented. No
thanks, I was wondering if there was a workaround that I could use today.

Frankly, I feel that the Unix/POSIX vendors will need to address this in
some fashion, as more developers start implementing multi-process
multi-threaded architectures. Otherwise, the builtin support for this on
NT will tend to sway some people towards that platform (something I would
hate to see happen; I'm not a big fan of NT).

> In any case, in the short term, go with Bryan's advice. You can always
> get the current thread's pthread_t, and the current process pid_t, and
> you can store them.

But aren't there race conditions when establishing a lock and setting this
info? What if the process dies after the lock is granted, but before the
instructions are executed to do the set? This is what is currently
happening to me in my stress tests.

> As for any "mutex wrapper" package, use a condition variable to wait
> for a locked mutex, and use the "real mutex" only to protect access to
> your extended mutex structure.

I'm not following this, are you saying that I should use a global mutex
and condition variable to protect the "real mutex"? Wouldn't that create
a "minor" hotspot in the system when it comes to contention? And how
would I protect the global mutex from process death?

Thanks for your response(s), any additional insight would be appreciated.

Bart Smaalders

Dec 4, 1996

The problem of handling process death in the case of shared memory
mutexes is rather thorny. Without operating system intervention it is
not possible to reliably deal with the asynchronous death of a
process. The design center for shared memory mutexes is intimately
cooperating processes, perhaps directly parent and child, rather than
distantly related processes that may crash (or worse yet, corrupt w/o
crashing those shared memory segments containing the locks). The
System V IPC mechanisms seem ideally suited to handling the latter
case, and allow less trusted apps to safely synchronize accesses to
a shared resource.
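
In particular, SEM_UNDO gets you the crash cleanup for free: the kernel
backs out a process's semaphore operations if it exits (or dies) while
"holding" the semaphore. A rough, untested sketch, with error handling
omitted and semid assumed to come from semget():

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

void sysv_sem_lock(int semid)
{
    struct sembuf op = { 0, -1, SEM_UNDO };   /* P operation on sem #0 */
    semop(semid, &op, 1);
}

void sysv_sem_unlock(int semid)
{
    struct sembuf op = { 0, 1, SEM_UNDO };    /* V operation on sem #0 */
    semop(semid, &op, 1);
}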

Overall, after re-reading your original request my feeling is that
you'd be better off partitioning your application differently so that
only trusted code is part of the database portion; crashes will then
be at least less frequent :-). When a crash does occur, it will be
necessary to halt all pending operations and perform recovery operations
back to the last known correct data; by having the non-crashed database
processes back out of their locks you can avoid having to recover very
many corrupted objects. This all implies, of course, that you don't
support transactional operations (atomic modification to multiple objects
at once).

- Bart


---
Bart Smaalders Solaris Clustering SunSoft
ba...@cyber.eng.sun.com (415) 786-5335 MS UMPK17-301
http://playground.sun.com/~barts 2550 Garcia Ave
Mt View, CA 94043-1100


Otto Lind

Dec 4, 1996

Steven G. Townsend <stow...@pacbell.net> wrote:

> In order for thread B to "clean up" thread A's inconsistent data, it would
> need to know exactly what thread A was doing when it died (and potentially
> the values of thread-private data and/or thread specific stack data).
> This is not the typical situation (or even desired situation) when working
> with multiple threads.

But it is a common occurrence when dealing with databases. This is one
aspect that I'm not worried about; we've architected a fairly elegant
solution for this.

> You would have to build your own wrapper datastructure and use a separate
> mutex to protect that datastructure as appropriate (as well as the mutex
> which protects the actual data you are protecting)... Sounds a bit like
> the "lock the proc mutex...." huh?

But isn't this "separate mutex" also vulnerable to process death? Who
protects the "protection mutex"? I'm just curious, since we've decided
to take a more conservative approach based on feedback from this group.

The new approach will detect unexpected process deaths, stop all other
running processes, recover object state and reinitialize all shared
memory mutexes, then restart all processes. Cuts into our high
availability, but it's better than having a hosed system (and hopefully
won't happen that often!).

Thanks to everyone who responded,

Dave Butenhof

Dec 5, 1996

> > Before going to sleep on a "detect-orphan" mutex, the thread would
> > validate the pthread_t and pid_t of the current owner, returning some
> > sort of new error on failure.
>
> This is one approach; another would be to have the kernel keep track of
> all mutexes owned by a process (or thread), and implicitly unlock them
> upon exit. In either case, the mutex itself would be flagged so that the
> next waiter which is granted ownership on it is returned a special error
> code indicating that it was granted ownership only because the last owner
> died. This is pretty much how "Mutex" is implemented on NT.

Sure, we could have the "detect-orphan" mutex type fully implemented
inside the kernel, where the lock/unlock sequences can be interrupted
only if the system crashes, and the pid/tid can be validated and checked
each time anyone tries to lock it. Awfully expensive (even compared to
other kernel synchronization mechanisms, much less user-mode mutexes),
and extremely special-purpose. I think this is way beyond the legitimate
scope of "POSIX threads".

> Yes, it would be up to the application to make
> sure that the data being protected by the mutex is consistent after being
> granted ownership, but that is better than the current situation, which
> is that the data is never accessible again, unless all processes are
> killed and the shared memory mutex blown away.

The data is completely accessible. You just need to know how to repair
the inconsistent state before using it. You said you can do that. Fine.
You need to discard the mutex (because you can't unlock it), and you
need to do something with any threads blocked on the mutex -- awkward,
but no worse than tracking who's doing what to which data and then
recovering from an arbitrary failure. Could "the system" do it better?
Maybe -- but few could use the capability, and it would cost everyone.

Why don't you just use a lock file -- write a process/thread ID pair
into a file. If the file's there, it's locked -- and you can read it to
find the ID and see if the process is there, ask it if the thread is
there. What if it crashes between creation and write? It could first
write to a secondary "intent to write" log file, for example, and if
you find an empty primary lock file, validate each intended locker.
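
In outline, something like this (an untested sketch -- pid only, for
brevity, and without the intent-log refinement; unlocking is just
unlink(path)):

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <errno.h>

/* O_CREAT|O_EXCL makes file creation the atomic "lock" step. */
int lockfile_lock(const char *path)
{
    char buf[32];
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;            /* already locked, or a real error */
    sprintf(buf, "%ld\n", (long)getpid());
    write(fd, buf, strlen(buf));
    close(fd);
    return 0;
}

/* A contender that finds the file can read the pid back and probe it. */
int lockfile_owner_alive(pid_t pid)
{
    return kill(pid, 0) == 0 || errno == EPERM;
}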

Expensive? Sure. So is what you're proposing. Maybe it's too expensive
for your needs -- but even at that, are you sure your proposal wouldn't
be?

What you need is maybe something more along the lines of the OpenVMS
"lock manager" facility -- you can even get immediate notification if
the owner of a lock goes away. Something like it seems to be fairly
common in UNIX "clustering" packages, because the model can be made to
work well between cluster members without shared memory.

Andrew Gabriel

Dec 5, 1996

In article <32A41E...@zko.dec.com>,
Dave Butenhof <bute...@zko.dec.com> writes:
>As for any "mutex wrapper" package, use a condition variable to wait for a
>locked mutex, and use the "real mutex" only to protect access to your
>extended mutex structure.

Please, why is this?
I've seen this stated a few times, and have implemented this way myself
(in order to emulate NT's recursive mutex on a pthreads system), but
I've never understood why things should be done this way.

Douglas C. Schmidt

Dec 8, 1996

In article <581rrv$t...@stratus.skypoint.net>,
Otto Lind <ot...@skypoint.com> wrote:
++ I consider this a hole in the standard, making interprocess mutexes
++ somewhat worthless on paper (hey, if ya gotta problem, just kill all your
++ processes, or better yet, just reboot the machine!!!

I agree. I have found interprocess mutexes (e.g., USYNC_PROCESS in
Solaris threads) to be completely useless for the concurrent
programming I've done. For instance, unless you can guarantee the
order in which processes start up, it's hard to initialize and name
interprocess mutexes correctly. Likewise, rebooting the machine seems
like the only portable way to handle certain types of error
conditions, as you describe above.

For these reasons, ACE implements "Process_Mutex" using either System V
semaphores, named POSIX semaphores, or named Win32 mutexes, depending
on the platform.

Doug
--
Dr. Douglas C. Schmidt (sch...@cs.wustl.edu)
Department of Computer Science, Washington University
St. Louis, MO 63130. Work #: (314) 935-7538; FAX #: (314) 935-7302
http://www.cs.wustl.edu/~schmidt/

Dave Butenhof

Dec 9, 1996

Andrew Gabriel wrote:
>
> In article <32A41E...@zko.dec.com>,
> Dave Butenhof <bute...@zko.dec.com> writes:
> >As for any "mutex wrapper" package, use a condition variable to wait for a
> >locked mutex, and use the "real mutex" only to protect access to your
> >extended mutex structure.
>
> Please, why is this?
> I've seen this stated a few times, and have implemented this way myself
> (in order to emulate NT's recursive mutex on a pthreads system), but
> I've never understood why things should be done this way.

First off, in nearly any case where you want to store thread ids, you'll
want to check them. You usually need to do this before trying to lock
the "real mutex". Especially for a recursive mutex, where you're trying
to avoid the deadlock that would otherwise occur (you want to skip the
lock attempt entirely). If you check your current ID (pthread_t) against
that stored in your "my_mutex_t" structure, without owning a mutex that
had been previously UNLOCKED by the thread that wrote the ID (after
writing it), then you're not guaranteed to see the right data. That's
the way the POSIX memory model works, and for very good reasons too
deep to get into here. Simply put, if you look at any shared data
without a mutex, you're hosed.

It's silly to invent a second mutex that controls access to the data
that you wrap around the "real mutex". Of course, if you do invent such
a mutex, you must ensure that, when you wait for the "real mutex" to
become free, you atomically release the structure mutex before trying
to lock the real mutex -- otherwise you risk the state changing before
you block on the mutex. (You may need to re-lock the structure mutex and
verify.)

But if you use a mutex to protect your structure, and a condition
variable to wait for the (now imaginary) "real mutex" to become free,
you've simplified everything and solved all the problems. You have data
visibility because the data is protected by a mutex. You don't have to
worry about atomicity because that's handled already by condition wait
semantics. You don't have deadlock because you know that the mutex will
NOT be locked by the current thread -- you unlock at the end of each
call into your "extended mutex package".

Georges Brun-Cottan

Dec 10, 1996

Dave Butenhof <bute...@zko.dec.com> writes:

> Andrew Gabriel wrote:
> >
> > In article <32A41E...@zko.dec.com>,
> > Dave Butenhof <bute...@zko.dec.com> writes:
> > >As for any "mutex wrapper" package, use a condition variable to wait for a
> > >locked mutex, and use the "real mutex" only to protect access to your
> > >extended mutex structure.
> >
> > Please, why is this?
> > I've seen this stated a few times, and have implemented this way myself
> > (in order to emulate NT's recursive mutex on a pthreads system), but
> > I've never understood why things should be done this way.
>
> First off, in nearly any case where you want to store thread ids, you'll
> want to check them. You usually need to do this before trying to lock
> the "real mutex". Especially for a recursive mutex, where you're trying
> to avoid the deadlock that would otherwise occur (you want to skip the
> lock attempt entirely). If you check your current ID (pthread_t) against
> that stored in your "my_mutex_t" structure, without owning a mutex that
> had been previously UNLOCKED by the thread that wrote the ID (after
> writing it), then you're not guaranteed to see the right data. That's
> the way the POSIX memory model works, and for very good reasons too
> deep to get into here. Simply put, if you look at any shared data
> without a mutex, you're hosed.

Humm. What's wrong with the following implementation (pseudo-C) of a
recursive lock on top of a non-recursive one? 'owner' and 'level' are
shared data not protected by any locks:

1: UserLock () {
2:     if ( (level == 0) || (owner != SELF) ) {
3:         system_raw_not_recursive_lock();
4:         owner = SELF;
5:     }
6:     level++;
7: }

Let's assume that 'level' and 'owner' are states associated with the
lock. For correctness, we also need 'level' to satisfy the register
property, which is usually the case for a scalar in shared memory.

The essence of the proof is to exploit the causality (ordering)
between the statement at line 4 and the one at line 6: if
level is greater than 0, then the owner field is correct
(deterministically identifying the current owner thread). If level is
zero, owner accuracy is not important. A data race on owner can occur
only when level is zero.

It looks simple: just one mutex, no condition variable.

For reference, the unlock is:

1: UserUnlock () {
2:     level--;
3:     if (level == 0)
4:         system_raw_not_recursive_unlock();
5: }


--
georges

Dave Butenhof

Dec 12, 1996

Georges Brun-Cottan wrote:
> Humm. What's wrong with the following implementation (pseudo-C) of a
> recursive lock on top of a non-recursive one? 'owner' and 'level' are
> shared data not protected by any locks:
>
> 1: UserLock () {
> 2:     if ( (level == 0) || (owner != SELF) ) {
> 3:         system_raw_not_recursive_lock();
> 4:         owner = SELF;
> 5:     }
> 6:     level++;
> 7: }
>
> Let's assume that 'level' and 'owner' are states associated with the
> lock. For correctness, we also need 'level' to satisfy the register
> property, which is usually the case for a scalar in shared memory.
>
> The essence of the proof is to exploit the causality (ordering)
> between the statement at line 4 and the one at line 6: if
> level is greater than 0, then the owner field is correct
> (deterministically identifying the current owner thread). If level is
> zero, owner accuracy is not important. A data race on owner can occur
> only when level is zero.

You're making some non-portable assumptions about memory systems. Not
all memory systems do writes and reads "in order" across multiple
processors. Handling them out of order can result in substantially
better performance most of the time. The only time ordering matters is
for synchronization -- and you're supposed to do something (like locking
and unlocking mutexes) to make your ordering assumptions explicit.

If you clear the owner on release, it's possible that your code will
work. That would avoid the possibility of a thread seeing its own
(stale) owner value but some other thread's more recent (non-zero) level
value, and skipping the real mutex lock.
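
In your notation, the change would look something like this (a sketch;
NOBODY is an invented "no owner" sentinel):

1: UserUnlock () {
2:     level--;
3:     if (level == 0) {
4:         owner = NOBODY;    /* clear the owner before releasing */
5:         system_raw_not_recursive_unlock();
6:     }
7: }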

> It looks simple: just one mutex, no condition variable.

Yup. Simple C code. Much more complicated set of assumptions about the
memory system. As coded, it won't work. If you clear the owner field,
it'll probably work -- unless I've missed something in my brief
analysis.

Following the POSIX memory visibility rules will result in more
complicated C code implementing an architecture that is really
substantially simpler -- because it no longer includes implicit
dependencies on the underlying system architecture.

Georges Brun-Cottan

Dec 12, 1996

Dave Butenhof <bute...@zko.dec.com> writes:

>
> > It looks simple: just one mutex, no condition variable.
>
> Yup. Simple C code. Much more complicated set of assumptions about the
> memory system. As coded, it won't work. If you clear the owner field,
> it'll probably work -- unless I've missed something in my brief
> analysis.

First, on uniprocessors, it works, because C and all imperative
language compilers will (and must) enforce the semantics of the
program with respect to the ordering of instructions, however weird
the underlying memory/cache/processor architecture may be. And that's
the only property needed.

Second, on multiprocessors, it works if at least processor consistency
is achieved. That is, roughly, FIFO ordering is achieved between
processors: each processor sees the writes of every other processor in
the order those writes were issued on that processor. Can we imagine
more "relaxed" semantics? Yes, sure... But if processor consistency is
not achieved *at least*, the semantics of programs change! So it is
unlikely to be the case. All the relaxed memory models I am aware of
(weak consistency, release consistency and derivatives) achieve at
least FIFO ordering between processors.

Last, clearing the owner field is useless. It doesn't change anything
about the ordering, on one processor, of the execution of the two
instructions:
    owner = SELF;
    level++;

which is the only thing needed in addition to the register property of
'level'. In fact, from the register property I need only the part
stating that "a read returns a value which has been written".


> You're making some non-portable assumptions about memory systems.

True. But see above. Does such a multiprocessor exist? Would it work
with existing compiler techniques?


> If you clear the owner on release, it's possible that your code will
> work. That would avoid the possibility of a thread seeing its own
> (stale) owner value but some other thread's more recent (non-zero) level
> value, and skipping the real mutex lock.

See the processor consistency requirement above.

>
> You're making some non-portable assumptions about memory systems. Not
> all memory systems do writes and reads "in order" across multiple
> processors. Handling them out of order can result in substantially
> better performance most of the time. The only time ordering matters is
> for synchronization -- and you're supposed to do something (like locking
> and unlocking mutexes) to make your ordering assumptions explicit.
>

[...]


> Yup. Simple C code. Much more complicated set of assumptions about the
> memory system. As coded, it won't work. If you clear the owner field,
> it'll probably work -- unless I've missed something in my brief
> analysis.

Just that : "As coded, it won't work." is a too strong statement: On a
uniprocessor using a POSIX compliant compiler (I'm sure such a beast
must exist :-)) it works. On a multiprocessor, It is likely to work
even on most of the weird architecture (Do you have a DASH machine or
somthing equivalent ? I would enjoy testing that).

I think you have overgeneralized the ordering problem of
multiprocessors. Total ordering of access is *very* problematic, not
the FIFO ordering (even on loosely coopled architecture).

--
georges

Bil Lewis

Dec 12, 1996

Dave Butenhof wrote:
>
> -- Georges Brun-Cottan wrote:
> > Humm. What's wrong in the following implementation (pseudo C) of a
> > recursive lock on top of non recursive one ?
...

> You're making some non-portable assumptions about memory systems. Not
> all memory systems do writes and reads "in order" across multiple
> processors. Handling them out of order can result in substantially
> better performance most of the time. The only time ordering matters is
> for synchronization -- and you're supposed to do something (like locking
> and unlocking mutexes) to make your ordering assumptions explicit.

Georges,

Dave is correct, of course. (He seems to be making a habit of that.) To
complete the discussion, I'd like to add a bit of code...

Said rules basically say "Lock all shared data." Here is a functioning
recursive mutex. Note that a mutex is taken before any data is accessed.
After it is released, nothing more is done. It *has* to be this way.
Virtually any attempt to use data outside of the mutex will result either
in data races or lost wakeups. (This code is about 3x slower than simple
mutexes.) (Full listing on the web page below.)

/* Assumes m->owner is a pthread_t and that pthread_t is a scalar type
   (true on Solaris), so == comparison and a NULL owner are meaningful. */
int pthread_np_recursive_mutex_lock(pthread_np_recursive_mutex_t *m)
{
    pthread_t tid = pthread_self();

    pthread_mutex_lock(&(m->lock));
    while (!((m->owner == NULL) || (m->owner == tid)))
        pthread_cond_wait(&(m->cv), &(m->lock));
    m->owner = tid;
    m->count++;
    pthread_mutex_unlock(&(m->lock));
    return (0);
}


int pthread_np_recursive_mutex_unlock(pthread_np_recursive_mutex_t *m)
{
    pthread_mutex_lock(&(m->lock));
    m->count--;
    if (m->count == 0) {
        m->owner = NULL;
        pthread_mutex_unlock(&(m->lock));
        pthread_cond_signal(&(m->cv));
        return (0);
    }
    pthread_mutex_unlock(&(m->lock));
    return (0);
}


-Bil

--
================
B...@LambdaCS.com

http://www.LambdaCS.com
Lambda Computer Science
555 Bryant St. #194
Palo Alto, CA,
94301

Phone/FAX: (415) 328-8952

Dave Butenhof

Dec 16, 1996

>> Simple C code. Much more complicated set of assumptions about the
>> memory system. As coded, it won't work. If you clear the owner field,
>> it'll probably work -- unless I've missed something in my brief
>> analysis.
>
> First, on uniprocessors, it works, because C and all imperative
> language compilers will (and must) enforce the semantics of the
> program with respect to the ordering of instructions, however weird
> the underlying memory/cache/processor architecture may be. And that's
> the only property needed.

Yes, on uniprocessors it works. Fine. There *are* advantages to using
multiple threads on a uniprocessor, and that's cool. But to design
threaded code that will *only* work on a uniprocessor seems a little
pointless, doesn't it?

> Second, on multiprocessors, it works if at least processor consistency
> is achieved. That is, roughly, FIFO ordering is achieved between
> processors: each processor sees the writes of every other processor in
> the order those writes were issued on that processor. Can we imagine
> more "relaxed" semantics? Yes, sure... But if processor consistency is
> not achieved *at least*, the semantics of programs change! So it is
> unlikely to be the case. All the relaxed memory models I am aware of
> (weak consistency, release consistency and derivatives) achieve at
> least FIFO ordering between processors.

No, multiprocessors do NOT need to support FIFO ordering between
processors. Some already do not (for example, Alpha), and you'll see
more of them as everyone scrambles for faster memory architectures to
avoid falling even further behind CPU speeds. Although I believe all
current SPARC systems provide fully ordered SMP memory access, I have
seen a few hints that Sun "reserves the right" to change this in future
versions of the architecture (whether that would ever happen is a
different matter -- it would be a difficult, though ultimately
rewarding, transition.)

Your code simply won't work without memory ordering unless you clear
your owner field. Clearing it on release will, at least, prevent a
previous owner from seeing a "false positive" and believing it owns the
recursive mutex after another thread has locked it. There may well be
other problems -- but maybe you're lucky.

To write portable threaded code, NEVER (NEVER!!!) access shared data
without holding a mutex, even for read-only access. Period. No
exceptions. Look at the recursive mutex code Bil Lewis posted. It's
simple, and it's correct.

You can usually make a few shortcuts on shared memory reads... a single
aligned int (and long, when long is 32 bits or you're on a 64 bit
machine) will *probably* be read and written by a single atomic bus
transaction on most systems. Not guaranteed, but reasonably safe.
However short and char data are much less likely to be safe, because
most systems (especially RISC systems) don't support short bus
transactions. And even when you know a transaction is atomic, that
doesn't help if you need to read more than one *consistent* value,
because you need to *synchronize* the two reads.

Beyond that... you're treading in completely processor-specific
territory. Study the architecture manual for your system VERY carefully
and very thoroughly. Experiment. Understand what's true of YOUR model,
and what's guaranteed by the architecture. Do this for EACH architecture
you care about. And make sure you comment your architectural assumptions
carefully so the next person to come along won't be hopelessly lost in
your code.

And if you want to ignore my advice, that's just fine. You don't need to
argue about it. I'm just trying to help. I give advice, but I don't make
rules, and your code is your responsibility, not mine.

Bryan O'Sullivan

Dec 16, 1996

d> No, multiprocessors do NOT need to support FIFO ordering between
d> processors. Some already do not, (for example, Alpha), and you'll
d> see more of them as everyone scrambles for faster memory
d> architectures to avoid falling even further behind CPU
d> speeds. Although I believe all current SPARC systems provide fully
d> ordered SMP memory access, [...]

They do not; they provide single-CPU total store ordering. Stores
from a single processor will be handled in FIFO order by
SuperSPARC-based systems, but if two processors perform a store one
cycle apart, the order in which the stores are committed to memory is
undefined.

This topic is treated quite thoroughly in Curt Schimmel's book "Unix
Systems for Modern Architectures".

Andrew Gabriel

Dec 17, 1996

In article <32B542...@zko.dec.com>,
Dave Butenhof <bute...@zko.dec.com> writes:
>
>No, multiprocessors do NOT need to support FIFO ordering between
>processors. Some already do not (for example, Alpha), and you'll see
>more of them as everyone scrambles for faster memory architectures to
>avoid falling even further behind CPU speeds.

Does this imply that some sort of delayed-writeback memory cache between
each CPU and the shared memory (more accurately, a
delayed-writeback-out-of-order cache) must be allowed for?

And furthermore, that all such caches are effectively flushed whenever
synchronisation operations (mutexes, condition variables, etc.) are
performed?

Dave Butenhof

Dec 18, 1996

Andrew Gabriel wrote:
>
> In article <32B542...@zko.dec.com>,
> Dave Butenhof <bute...@zko.dec.com> writes:
> >
> >No, multiprocessors do NOT need to support FIFO ordering between
> >processors. Some already do not (for example, Alpha), and you'll see
> >more of them as everyone scrambles for faster memory architectures to
> >avoid falling even further behind CPU speeds.
>
> Does this imply that some sort of delayed-writeback memory cache between
> each CPU and the shared memory (more accurately, a
> delayed-writeback-out-of-order cache) must be allowed for?

Yes, but even more than that -- the memory pipeline within the CPU,
even before cache (or after cache on the input side) can reorder, too.
For example, if a cache line is busy, the CPU might decide to go on to
the next item in the queue. Unless there's a memory barrier...

> And furthermore, that all such caches are effectively flushed whenever
> synchronisation operations (mutexes, condition variables, etc.) are
> performed?

It may be much less general than a "cache flush", but synchronization
operations must "propagate to the point of memory coherence" in the
memory system to ensure that there is synchronization. This is true,
for example, of Alpha's LDx_L/STx_C instructions. But the memory
coherence guaranteed by the instructions extends only to a small region
of memory around the referenced address -- say, 64 bytes.

To ensure the sequence of operations outside the lock itself, you use
memory barriers (MB) to limit reordering of memory operations, as I
described before.
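
Schematically, for a lock built this way (a sketch, not real code for
any particular machine; memory_barrier() stands in for whatever the
architecture provides, such as Alpha's MB instruction):

    acquire_lock(&l);    /* e.g. an LDx_L/STx_C retry loop */
    memory_barrier();    /* reads/writes below can't move above the lock */
    /* ... operate on the shared data ... */
    memory_barrier();    /* writes above become visible before the release */
    release_lock(&l);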

Ian Emmons

Dec 18, 1996

Dave Butenhof wrote:
> To write portable threaded code, NEVER (NEVER!!!) access shared data
> without holding a mutex, even for read-only access. Period. No
> exceptions. Look at the recursive mutex code Bil Lewis posted. It's
> simple, and it's correct.
>
> You can usually make a few shortcuts on shared memory reads... a single
> aligned int (and long, when long is 32 bits or you're on a 64 bit
> machine) will *probably* be read and written by a single atomic bus
> transaction on most systems. Not guaranteed, but reasonably safe.
> However short and char data are much less likely to be safe, because
> most systems (especially RISC systems) don't support short bus
> transactions. And even when you know a transaction is atomic, that
> doesn't help if you need to read more than one *consistent* value,
> because you need to *synchronize* the two reads.

I have recently been reading about concurrent sorted linked lists, and some
of the algorithms I've found claim that they can avoid using read locks as
long as reads and writes of pointers are atomic. Now, I suspect that they
are also relying on memory ordering, but I'm not sure of that. For the
moment, assume that this is not the case.

Is it really and truly risky (RISC-y?) to assume that a read or a write of
an aligned pointer is atomic? It seems to me that while a super-fancy
memory architecture might go a bit faster, the overhead of locking and
unlocking many mutexes would swamp any such performance gain.

Assuming that you tell me that this is indeed risky (which is likely, given
the course this thread has taken so far), can you actually name systems that
exhibit the pathological behaviors you speak of? I know that relying on any
answers you give here would be risky, since they could change, but it would
help to know (a) the extent of the problem, and (b) what hardware to test
against to expose the most problems in my code.

A loosely related question: does POSIX have a counterpart to the following
Win32 APIs?
InterlockedCompareExchange
InterlockedDecrement
InterlockedExchange
InterlockedExchangeAdd
InterlockedIncrement
If it did, it would certainly help to lessen the impact of having to assume
that all modern memory architectures are being designed by a small band of
rogue hardware engineers that get pleasure out of watching software
engineers go insane.

Thanks in advance,

Ian

___________________________________________________________________________
Ian Emmons Work phone: (415) 372-3623
i...@persistence.com Work fax: (415) 341-8432
Persistence Software, 1720 S. Amphlett Blvd. Suite 300, San Mateo, CA 94402

Xavier Leroy

Dec 19, 1996

Ian Emmons <i...@persistence.com> writes:

> A loosely related question: does POSIX have a counterpart to the following
> Win32 APIs?
> InterlockedCompareExchange
> InterlockedDecrement
> InterlockedExchange
> InterlockedExchangeAdd
> InterlockedIncrement

Depends whether you want these atomic operations to be asynchronous
signal-safe or not. If you don't care about signal safety, then you
can always implement these operations using one or several global
mutexes to guarantee atomicity.
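
For instance, something like this (an untested sketch; the function
names are my own):

#include <pthread.h>

static pthread_mutex_t interlock = PTHREAD_MUTEX_INITIALIZER;

/* Mutex-based stand-ins for two of the Win32 interlocked operations:
   atomic with respect to other threads using the same functions, but
   NOT async-signal-safe. */
long interlocked_exchange_add(long *target, long delta)
{
    long old;

    pthread_mutex_lock(&interlock);
    old = *target;
    *target += delta;
    pthread_mutex_unlock(&interlock);
    return old;
}

long interlocked_compare_exchange(long *target, long newval, long expected)
{
    long old;

    pthread_mutex_lock(&interlock);
    old = *target;
    if (old == expected)
        *target = newval;
    pthread_mutex_unlock(&interlock);
    return old;
}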

Of course, on some processors, you can have much more efficient
mutex-free implementations, e.g. lock; cmpxchg on the 386 or
load-with-reservation/store-conditional on the Alpha and the PowerPC.

If you need these operations to be signal-safe, you can't use mutexes
nor any form of spin-locking. In this case, you're in big trouble on
several widely used hardware architectures which do not provide enough
support for implementing these operations atomically. These include
the Sparc V7 and V8, and also the HPPA 1.1.

At least that's my understanding of these architectures. If anyone
knows how to implement an atomic, lock-free compare-and-exchange on
the Sparc V7, I'd sure like to know.

- Xavier Leroy
--
Valid e-mail address (without the underscores): Xavier.Leroy@i_n_r_i_a.f_r
This is a protection against junk mail. Apologies for the inconvenience.
Home page: http://pauillac.inria.fr/~xleroy/

Bryan O'Sullivan

Dec 19, 1996

x> If anyone knows how to implement an atomic, lock-free
x> compare-and-exchange on the Sparc V7, I'd sure like to know.

You can't, and there isn't really any reason to. Atomic exchanges
don't map well to RISC implementations, whereas NT was designed
primarily for the x86 family (and if you think that Alpha is a
significant source of NT sales, I'm a high-volume reseller of
bridges). It's also easy enough to argue that the atomic exchange
operations provided by NT don't buy you anything useful that mutexes
don't already provide.

Dave Butenhof

Dec 20, 1996

Ian Emmons wrote:
> I have recently been reading about concurrent sorted linked lists, and some
> of the algorithms I've found claim that they can avoid using read locks as
> long as reads and writes of pointers are atomic. Now, I suspect that they
> are also relying on memory ordering, but I'm not sure of that. For the
> moment, assume that this is not the case.

Sure. It's good mental exercise to imagine unlikely things. ;-)

> Is it really and truly risky (RISC-y?) to assume that a read or a write of
> an aligned pointer is atomic? It seems to me that while a super-fancy
> memory architecture might go a bit faster, the overhead of locking and
> unlocking many mutexes would swamp any such performance gain.

Yes, it is "risky", because there are absolutely no guarantees anywhere
that your code will port -- even to newer versions of the architecure
you're using currently.

> Assuming that you tell me that this is indeed risky (which is likely, given
> the course this thread has taken so far), can you actually name systems that
> exhibit the pathological behaviors you speak of? I know that relying on any
> answers you give here would be risky, since they could change, but it would
> help to know (a) the extent of the problem, and (b) what hardware to test
> against to expose the most problems in my code.

"Risk" is a very relative thing. I take a risk every time I get into my
car. I take a greater risk when I turn the ignition. And getting out
into the street... well, that's a whole different order of risk. But I
do it anyway because I consider it a sensible risk for the value.

I can't think of any architecture where an ALIGNED item of pointer size
is not read and written atomically. Nor do I think such an architecture
would make much sense. On the other hand, there's nothing impossible
about it, either, and I don't claim to know every detail of every
current computer architecture. So on the risk scale, I'd say we're
maybe somewhere between "getting in the car" and "starting it". Probably
nowhere near pulling out onto the road.

The question is how much risk are YOU willing to accept in your code?
If you're running mostly on one architecture, and it guarantees atomic
pointer-size accesses, and you're careful to keep all pointers naturally
aligned, then you're probably safe enough to satisfy all but the most
paranoid programmer.
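
As a sketch of the pattern in question -- an illustration, not a
portable construction -- assume a machine that guarantees atomic
loads and stores of aligned pointer-sized data, and ignore memory
ordering entirely (the separate hazard raised earlier in the thread):

#include <pthread.h>
#include <stddef.h>

struct node {
    int key;
    struct node *volatile next;  /* naturally aligned; volatile only
                                    keeps the compiler from caching
                                    it -- it is NOT a memory barrier */
};

static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Readers walk the list with no lock, relying solely on each pointer
 * load returning entirely-old or entirely-new bits. */
struct node *search(struct node *head, int key)
{
    struct node *p;

    for (p = head; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}

/* Writers serialize among themselves; each insertion becomes visible
 * to readers through a single aligned pointer store. */
void insert_after(struct node *pos, struct node *n)
{
    pthread_mutex_lock(&writer_lock);
    n->next = pos->next;         /* fully build the node first...   */
    pos->next = n;               /* ...then publish it in one store */
    pthread_mutex_unlock(&writer_lock);
}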

On the other hand, if you're doing things that could result in UNaligned
pointers, all bets are off. RISC systems may handle reads and writes to
unaligned data -- but in general it's emulated using multiple bus
transactions, with no interlock. The result is that you have no
atomicity guarantees at all, and "word tearing" is likely. But then,
RISC systems generally make it difficult to create unaligned data, and
you don't want to do it anyway, just because it's (at best) twice as
slow.

Andrew Gabriel

Dec 21, 1996

In article <32BA84...@zko.dec.com>,

Dave Butenhof <bute...@zko.dec.com> writes:
>
>I can't think of any architecture where an ALIGNED item of pointer size
>is not read and written atomically.

Any system where the data bus is narrower than the pointer size.
I know of two (there are probably others):

80386SX (which will run Solaris, albeit painfully I suspect)
GEC 4160 (32-bit minicomputer using 16-bit hardware)

>Nor do I think such an architecture would make much sense.

Not now, no :-)

Yousuf Khan

Dec 23, 1996

Dave Butenhof wrote:
>
> Ian Emmons wrote:
> > I have recently been reading about concurrent sorted linked lists, and some
> > of the algorithms I've found claim that they can avoid using read locks as
> > long as reads and writes of pointers are atomic. Now, I suspect that they
> > are also relying on memory ordering, but I'm not sure of that. For the
> > moment, assume that this is not the case.

<snip>

> On the other hand, if you're doing things that could result in UNaligned
> pointers, all bets are off. RISC systems may handle reads and writes to
> unaligned data -- but in general it's emulated using multiple bus
> transactions, with no interlock. The result is that you have no
> atomicity guarantees at all, and "word tearing" is likely. But then,
> RISC systems generally make it difficult to create unaligned data, and
> you don't want to do it anyway, just because it's (at best) twice as
> slow.

I missed the start of this thread. Without rehashing every detail
I've likely missed, I'll just ask some specific questions. What exactly
is an atomic read or write of a pointer?

Does it involve some sort of hardware-assisted protection, like
intelligent memory caches, or internal algorithms in the CPUs or
chipsets, etc.?

Dave Butenhof

Dec 27, 1996

Yousuf Khan wrote:
> I missed the start of this thread. Without rehashing every detail
> I've likely missed, I'll just ask some specific questions. What exactly
> is an atomic read or write of a pointer?

The CPU (or peripheral) says "give me this value", and the memory
responds with the value. Another CPU (or some peripheral) may be
simultaneously writing a new value to that memory -- but the reader is
guaranteed to get either the old or new bits, and nothing else.

Without atomic read/write, the reader may get some new bits and some old
bits. For example, let's take the most common case ("word tearing"),
where a read from an unaligned address conflicts with a concurrent write
to one half of the data. The reader gets the OLD value of one half of
the "word", and the NEW value of the other half.

Imagine reading a 16-bit value, for example, while another CPU
rolls that value over from 255 (0xff) to 256 (0x100). An atomic
read/write architecture will always return either 255 or 256. But
without atomic read/write, you might instead get either 0 (the low 8
bits "after" and the high 8 bits "before") or 511 (0x1ff -- the low 8
bits "before" and the high 8 bits "after").

Shaheen Ali

Dec 31, 1996

Yousuf Khan <yk...@bellglobal.com> writes:
>
> I missed the start of this thread. Without rehashing every detail
> I've likely missed, I'll just ask some specific questions. What exactly
> is an atomic read or write of a pointer?
>

Some definitions of an atomic read/write:

A read/write that takes one centrally arbitrated bus cycle.
In other words, if it takes two bus cycles to read a multi-byte word
then it ain't atomic.

A read/write that is started and completed before any other
process can start and/or finish a read/write.

The point is to make sure that if you are reading a word, that you get
the word in its entirety. A non-atomic read would start reading a
byte of a word, while another process starts writing to that same
word. A race would ensue and the reader would end up with a scrambled
word.

Memory that displays this behavior is called bogus memory. Very
technical :-)

Shaheen


Ian Emmons

Jan 6, 1997

Dave,

I posted a message on this thread a couple of weeks ago, and then went on
vacation. The replies that I was seeking seem to have expired. Is there an
archive of this newsgroup somewhere? If so, where?

Thanks,

Chris Colohan

Jan 7, 1997

In article <32D19F...@persistence.com>,

Ian Emmons <i...@persistence.com> wrote:
>Dave,
>
>I posted a message on this thread a couple of weeks ago, and then went on
>vacation. The replies that I was seeking seem to have expired. Is there an
>archive of this newsgroup somewhere? If so, where?

I believe http://www.dejanews.com keeps a one-year archive of every
newsgroup on Usenet. You can search it by author to find your
articles, then pick out individual threads...

Chris

Zlatko Calusic

Jan 7, 1997

Ian Emmons <i...@persistence.com> wrote:
> Dave,

> I posted a message on this thread a couple of weeks ago, and then went on
> vacation. The replies that I was seeking seem to have expired. Is there an
> archive of this newsgroup somewhere? If so, where?

> Thanks,

http://www.dejanews.com/

For all groups.
--
__________
/\ _______\ Posted by Zlatko Calusic
\ \ \____ / E-mail: zcal...@srce.hr
\ \ \/ / / URL: http://fly.cc.fer.hr/~maverick/
\ \/ / / ---------------------------------------------------------------
\ / / FER - Faculty of Electrical Engineering and Computer Science
\/_/ SRCE - University Computing Centre, University of Zagreb, Croatia

WARNING: my messages are offensive to morons!

Andrew Gabriel

Jan 8, 1997

In article <32D19F...@persistence.com>,

Ian Emmons <i...@persistence.com> writes:
>Dave,
>
>I posted a message on this thread a couple of weeks ago, and then went on
>vacation. The replies that I was seeking seem to have expired. Is there an
>archive of this newsgroup somewhere? If so, where?

Mailed a copy from my newsserver to the poster.

gee....@gmail.com

Feb 2, 2018

I have my own lock solution which solves this problem, but with some
extra work and ONLY for write locking. This is because the lock
structure stores only the last writer's pid and thread id.
Unfortunately, since read locks are granted to multiple threads, and
since I do not store all the threads which have so far acquired the
read lock, the mechanism can detect a dead owner only for write locks.
If one were also to add code to maintain the entire list of reader
threads, it would work for reader crashes too, but that would make the
lock object quite complicated.

I have a mutex and a bunch of counters & thread ids which are in
my own private lock structure.

When a read or write lock is requested, the code checks whether a write
lock has already been granted. If so, AND the pid/thread which currently
holds the write lock is NOT the same as the requesting pid/thread, a
quick check is made then and there, using the pid/thread id of the
current writer, to see if it is still alive. If the holder is alive, no
problem: the lock is not granted. But if the pid/thread has already
died, then the lock is granted to the new requester.

Basically, what is being done is detecting whether a pid/thread is alive
only at the instant when another pid/thread requests the write lock.
This limits the liveness checking to the moments when the lock is
actually needed; at any other time it doesn't matter.
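
A sketch of that liveness probe at process granularity: kill() with
signal 0 delivers nothing and performs only the existence and
permission checks. Two caveats: pids are eventually recycled, and
POSIX offers no portable way to probe a single thread inside another
process.

#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns nonzero if the recorded writer process still exists. */
static int writer_is_alive(pid_t writer_pid)
{
    if (kill(writer_pid, 0) == 0)
        return 1;              /* exists, and we may signal it */
    return errno == EPERM;     /* exists, but under another uid */
}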

As I said at the beginning, since a write lock is granted to only one
thread at a time, this is easy to do. If "dead" threads must also be
detected when readers are involved, a list of readers has to be
maintained.

So make sure you crash only when you have the write lock and not the read lock :-)


Kaz Kylheku

Feb 2, 2018
On 2018-02-02, gee....@gmail.com <gee....@gmail.com> wrote:
>
> I have my own lock solution which solves this problem, but with some
> extra work and ONLY for write locking.

I think even POSIX doesn't solve this problem for read-write locks;
the whole "EOWNERDEAD" error check is only specified for mutexes.

This is as good a reason to avoid read-write locks as any, and implement
some sort of process-distributed RCU-like mechanism instead
(in which any necessary locks are process-shared robust mutexes).

The problem then probably reduces to just detecting when there are
processes that died while in an RCU read-side critical section, which
seems simpler due to being batched inside the "synchronize_rcu"
operation rather than tied to a lock. You need just one global table
of processes that are in a read-side critical section, or something
like that.

RCU means that you need data structures that can be traversed by
readers even while being updated; not an easy retrofit for all
read-write lock situations.
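
For reference, here is a sketch of initializing the kind of
process-shared robust mutex mentioned above, using the POSIX.1-2008
interfaces (which postdate the 1996 posts earlier in this thread);
m is assumed to point into a shared memory mapping:

#include <pthread.h>

int init_shared_robust_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err;

    if ((err = pthread_mutexattr_init(&attr)) != 0)
        return err;
    /* Lockable from any process mapping the memory. */
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    /* Lock attempts report EOWNERDEAD if the owner died. */
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    err = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return err;
}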

Steve Watt

Feb 5, 2018
In article <8aa2d575-75b9-4bbd...@googlegroups.com>,
<gee....@gmail.com> wrote:
[ ... ]
>As I said at the beginning, since a write lock is granted to only one
>thread at a time, this is easy to do. If "dead" threads must also be
>detected when readers are involved, a list of readers has to be
>maintained.
>
>So make sure you crash only when you have the write lock and not the read lock :-)

And make sure you crash only when you're leaving the data structure in a
state that won't nuke other accessors. But the owning thread crashed,
so it's a huge gamble whether the protected data structure is safe.

Having a shared lock silently succeed when the previous write-locking
owner dies is a terrible idea. EOWNERDEAD exists so that code can do
sanity checks or reconstruction on the protected data structures.
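
The recovery pattern looks roughly like this -- a sketch assuming a
process-shared robust mutex like the one initialized in the previous
post's example, with repair_data standing in for the application's
own sanity checks:

#include <errno.h>
#include <pthread.h>

int lock_with_recovery(pthread_mutex_t *m, int (*repair_data)(void))
{
    int err = pthread_mutex_lock(m);

    if (err == EOWNERDEAD) {
        /* We hold the lock, but the old owner died mid-update. */
        if (!repair_data()) {
            /* Unsalvageable: unlocking without marking the mutex
             * consistent makes later locks fail ENOTRECOVERABLE. */
            pthread_mutex_unlock(m);
            return ENOTRECOVERABLE;
        }
        pthread_mutex_consistent(m);  /* data is sane again */
        err = 0;
    }
    return err;
}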

Basically what will happen with your scheme is that some thread will
take the write lock, start manipulating the data structure, and explode
while pointers are still in an inconsistent state. Then the next (more
innocent) thread will come along, attempt to modify the data structure
in some other way, and also crash. Lather, rinse, repeat, until all
processes that might update the shared structure have crashed.

Good luck debugging the mess that ensues.
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...