Using RCU to manage mutex destruction during EBUSY...


SenderX

Jul 14, 2004, 9:58:37 PM
Could this happen in a legal POSIX application and/or implementation:

/* created in main() */
static pthread_mutex_t dm;

/* Mutex is visible to all threads */


Thread A
-------------

a1: pthread_mutex_lock( &dm );
a2: pthread_mutex_unlock( &dm );

/* It's legal to call this after an unlock, no? */
a3: pthread_mutex_destroy( &dm );


Thread B
------------

/* Crappy application calls this without regard to a3's actions */
b1: pthread_mutex_lock( &dm );
{
b1a: touches memory in dm->Whatever
b1b: touches some more memory in dm->Whatever
}

Now, the execution happens to go as follows:

a1
a2
b1
a3
b1a - Touches destroyed memory!
b1b - Touches destroyed memory!

If this indeed can happen in a "legal POSIX application", RCU could be used
to defer the actual destruction of the memory making up the pthread_mutex_t
until all threads have gone through a quiescent period. It would totally
eliminate the race-condition. Would it be overkill? The reason I ask is
because my new library exposes a pthread emulation API for Windows, and I
was wondering how to properly handle EBUSY... I want my pthread Win32
emulation to be robust and follow the standard...
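For what it's worth, the deferred-destruction idea can be sketched with a small reclamation list. Everything below is hypothetical (these are not the poster's library names): destroy would queue the mutex's storage instead of freeing it, and the actual free would run only once a grace period has elapsed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical sketch: instead of freeing a mutex's storage inside
 * pthread_mutex_destroy(), queue it on a deferred-reclamation list and
 * free it only after a grace period, when no thread can still be inside
 * pthread_mutex_lock() on the old memory. */

struct deferred {
    struct deferred *next;
    void (*reclaim)(void *);       /* called after the grace period */
    void *obj;
};

static struct deferred *deferred_head = NULL;
static pthread_mutex_t deferred_lock = PTHREAD_MUTEX_INITIALIZER;

/* Queue an object for reclamation after the next grace period. */
void defer_reclaim(void *obj, void (*reclaim)(void *))
{
    struct deferred *d = malloc(sizeof *d);
    d->obj = obj;
    d->reclaim = reclaim;
    pthread_mutex_lock(&deferred_lock);
    d->next = deferred_head;
    deferred_head = d;
    pthread_mutex_unlock(&deferred_lock);
}

/* Called once a grace period has elapsed (i.e. every thread has passed
 * through a quiescent state): reclaim everything queued before it. */
void grace_period_elapsed(void)
{
    pthread_mutex_lock(&deferred_lock);
    struct deferred *d = deferred_head;
    deferred_head = NULL;
    pthread_mutex_unlock(&deferred_lock);
    while (d) {
        struct deferred *next = d->next;
        d->reclaim(d->obj);
        free(d);
        d = next;
    }
}

static int reclaimed;
static void count_reclaim(void *obj) { (void)obj; reclaimed++; }

/* Tiny single-threaded demonstration of the deferral discipline:
 * nothing is reclaimed before the grace period, everything after. */
int deferred_demo(void)
{
    int dummy;
    defer_reclaim(&dummy, count_reclaim);
    int before = reclaimed;
    grace_period_elapsed();
    return before == 0 && reclaimed == 1;
}
```

The hard part, of course, is deciding when `grace_period_elapsed()` may be called; that is exactly the quiescent-period detection question discussed later in the thread.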

Any thoughts?


Ronald Landheer-Cieslak

Jul 15, 2004, 9:19:14 AM
SenderX wrote:

IIRC, using a non-created or already-destroyed mutex (or whatever other
kind of) descriptor causes undefined behaviour, so it's up to the
developer to make sure his descriptors are valid when used. If it's not
undefined behaviour, it should be, IMHO.

Hence, IMHO, using RCU to eliminate this particular race condition would
be overkill.

I'd be interested, though, to know how you implement RCU on Windows (of
all places!). I haven't gone through all of the thread of your
discussion with Mr. Seigh on this subject yet, but it seems more
interesting than I first thought ;)

rlc

SenderX

Jul 15, 2004, 3:24:59 PM
> IIRC, using a non-created or already-destroyed mutex (or whatever other
> kind of) descriptor causes undefined behaviour, so it's up to the
> developer to make sure his descriptors are valid when used. If it's not
> undefined behaviour, it should be, IMHO.

> > b1
> > a3

Well, b1 has executed pthread_mutex_lock first. So, technically speaking,
that should swing the lock state into EBUSY, IMHO, because a thread holds a
reference to the mutex and has already called into the function "before" a3
executed. So, a3 should get EBUSY? It seems to be a nasty race-condition...

Oh well, I now see the power of undefined behaviors: less error checking!

;)


> Hence, IMHO, using RCU to eliminate this particular race condition would
> be overkill.

Yeah, most likely...

I was actually thinking of ways to create robust sync primitives that will
not get freed while any thread holds a reference. I think my experimental
RCU (so far it's working) would be a non-invasive method. Of course, this
is under the assumption that the user application will not be destroying
sync objects frequently; one every 5 to 10 ms is fine.


> I'd be interested, though, to know how you implement RCU on Windows (of
> all places!).

Actually, the algo can be implemented in user-space on any OS that has
pthreads and/or native Windows threads. The CPU-specific part is the CAS and
some memory barriers, if needed.


> I haven't gone through all of the thread of your
> discussion with Mr. Seigh on this subject yet, but it seems more
> interesting than I first thought ;)

Well, I can't release the exact algorithm into the public domain. If it
turns out IBM already has the quiescent-period detection algorithm I came up
with patented, then I will post it. The RCU API will be presented in the
library I'm tweaking and polishing right now. I need to test for some more
weeks/months. I gave a "very crude" outline of the basic algo; the working
algo is loosely based on it.


Ronald Landheer-Cieslak

Jul 15, 2004, 4:02:01 PM
SenderX wrote:
>>IIRC, using a non-created or already-destroyed mutex (or whatever other
>>kind of) descriptor causes undefined behaviour, so it's up to the
>>developer to make sure his descriptors are valid when used. If it's not
>>undefined behaviour, it should be, IMHO.
>>>b1
>>>a3
> Well, b1 has executed pthread_mutex_lock first. So, technically speaking,
> that should swing the lock state into EBUSY, IMHO, because a thread holds a
> reference to the mutex and has already called into the function "before" a3
> executed. So, a3 should get EBUSY? It seems to be a nasty race-condition...
>
> Oh well, I now see the power of undefined behaviors: less error checking!
>
> ;)
Exactly! Whenever I write technical specs, I pepper them with
"unspecified" and "undefined" behaviour, and when the bug reports arrive,
I simply hand them the specs and say "well, duh!" ;)

>>Hence, IMHO, using RCU to eliminate this particular race condition would
>>be overkill.
> Yeah, most likely...
>
> I was actually thinking of ways to create robust sync primitives that will
> not get freed while any thread holds a reference. I think my experimental
> RCU (so far it's working) would be a non-invasive method. Of course, this
> is under the assumption that the user application will not be destroying
> sync objects frequently; one every 5 to 10 ms is fine.

Ehm.. I don't know about yours, but my sync primitives are usually
either initialized and destroyed by the main thread at startup/shutdown
time, respectively, or local to the thread. I would probably never need
that kind of robustness. Even ADTs that have their own sync primitives
would have them initialized when they're initialized, and destroyed when
they're destroyed - and I always consider using destroyed ADTs
"undefined behaviour" (I think I even mention it in those words in the
libcontain docs).

Robustness is nice, but good programming is better: I wouldn't try to
allow the user of your sync primitive to use bad programming - the
programming will only get worse..

>>I'd be interested, though, to know how you implement RCU on Windows (of
>>all places!).
> Actually, the algo can be implemented in user-space on any OS that has
> pthreads and/or native Windows threads. The CPU-specific part is the CAS and
> some memory barriers, if needed.

I really do have to take a look at the original article - I always
assumed some special kernel support was necessary for it to tell you
when all CPUs had gone through the quiescent state..

Your implementation wouldn't be FLOSS (*), perchance? (reading on, I
guess not).

>>I haven't gone through all of the thread of your
>>discussion with Mr. Seigh on this subject yet, but it seems more
>>interesting than I first thought ;)
> Well, I can't release the exact algorithm into the public domain. If it
> turns out IBM already has the quiescent-period detection algorithm I came up
> with patented, then I will post it. The RCU API will be presented in the
> library I'm tweaking and polishing right now. I need to test for some more
> weeks/months. I gave a "very crude" outline of the basic algo; the working
> algo is loosely based on it.

Too bad.. I saw the mention of the IBM patents. I used to live in a
country where you couldn't patent algorithms, but I haven't checked
whether that's still the case (i.e. I moved to Canada, the laws in
France haven't changed on the subject)..

Anyways, when I get round to taking a close look at RCU, I guess I'll
take a look at those patents as well and try to come up with a "free" algo..

rlc

*: FLOSS == Free/Libre Open Source Software

Joe Seigh

Jul 15, 2004, 5:21:01 PM

SenderX wrote:
>
> Could this happen in a legal POSIX application and/or implementation:
>

(snip example of thread locking destroyed lock)


>
> If this indeed can happen in a "legal POSIX application", RCU could be used
> to defer the actual destruction of the memory making up the pthread_mutex_t
> until all threads have gone through a quiescent period. It would totally
> eliminate the race-condition. Would it be overkill? The reason I ask is
> because my new library exposes a pthread emulation API for Windows, and I
> was wondering how to properly handle EBUSY... I want my pthread Win32
> emulation to be robust and follow the standard...
>

From the point of view of the pthread_mutex_t structure, no, as the application
would be free to do with that memory whatever it liked, including freeing it
if it was allocated on the heap, or letting it go out of scope if it was
allocated on the stack.

You're probably thinking of additionally allocated memory that is linked
into pthread_mutex_t, depending on the implementation. Yes, you could do that,
but I don't think POSIX can require pthread_mutex_lock to safely and reliably
return EINVAL even if your implementation could do it.
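A best-effort variant an emulation layer could try (illustrative only, all names invented): stamp the wrapper with a magic value at init and clear it at destroy, so lock can usually, though never reliably, return EINVAL on a destroyed mutex. The check is itself racy, which is exactly why POSIX cannot mandate it.

```c
#include <errno.h>
#include <pthread.h>

#define MUTEX_MAGIC 0x4D555458u  /* "MUTX" */

/* Hypothetical emulation wrapper: the magic field marks the mutex as
 * live.  This catches many use-after-destroy bugs, but a concurrent
 * destroy can still slip between the check and the lock. */
struct emu_mutex {
    unsigned magic;
    pthread_mutex_t real;
};

int emu_mutex_init(struct emu_mutex *em)
{
    em->magic = MUTEX_MAGIC;
    return pthread_mutex_init(&em->real, NULL);
}

int emu_mutex_lock(struct emu_mutex *em)
{
    if (em->magic != MUTEX_MAGIC)    /* racy, best-effort check */
        return EINVAL;
    return pthread_mutex_lock(&em->real);
}

int emu_mutex_unlock(struct emu_mutex *em)
{
    if (em->magic != MUTEX_MAGIC)
        return EINVAL;
    return pthread_mutex_unlock(&em->real);
}

int emu_mutex_destroy(struct emu_mutex *em)
{
    em->magic = 0;                   /* further locks will see EINVAL */
    return pthread_mutex_destroy(&em->real);
}
```

In the single-threaded case this reliably reports EINVAL after destroy; in the racy interleaving from the first post it may or may not, which matches the "undefined behaviour" reading of the standard.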

Joe Seigh

SenderX

Jul 15, 2004, 7:42:01 PM
> Ehm.. I don't know about yours, but my sync primitives are usually
> either initialized and destroyed by the main thread at startup/shutdown
> time, respectively.

> Robustness is nice, but good programming is better: I wouldn't try to
> allow the user of your sync primitive to use bad programming - the
> programming will only get worse..

Very good point. I may add it in under a mutex type in pthread_mutexattr_t:

PTHREAD_MUTEX_DEMO_RCU_NP perhaps..

You never know what a user is going to do... The less your lib crashes, the
fewer phone calls... But you can always fall back on that nice undefined
behavior clause. :P

lol.

Anyway, I guess it could be used as a demonstration of what rcu can do.


> I really do have to take a look at the original article - I always
> assumed some special kernel support was necessary for it to tell you
> when all CPUs had gone through the quiescent state..

Look at K42. They use counters for their per-CPU quiescent state and
(grace) period algorithms. If you can grasp their RCU algorithm, then
you're off to a good start.
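A minimal sketch of counter-based quiescent-period detection in user space, loosely in the spirit of the counter scheme described above. All names and details here are invented for illustration; this is neither K42's code nor the poster's algorithm.

```c
#include <sched.h>

#define MAX_THREADS 8

/* One quiescent-state counter per registered thread. */
static volatile unsigned long qs_count[MAX_THREADS];

/* Each reader calls this at a natural quiescent point, i.e. whenever it
 * holds no references into RCU-protected data. */
void rcu_quiescent(int tid)
{
    __sync_synchronize();        /* order prior reads before the bump */
    qs_count[tid]++;
    __sync_synchronize();
}

/* Grace-period wait: snapshot every counter, then spin until each other
 * thread's counter has moved past its snapshot, proving that thread has
 * been through a quiescent state since we started waiting. */
void rcu_synchronize(int self, int nthreads)
{
    unsigned long snap[MAX_THREADS];
    int i;

    for (i = 0; i < nthreads; i++)
        snap[i] = qs_count[i];
    rcu_quiescent(self);         /* the caller is quiescent itself */
    for (i = 0; i < nthreads; i++) {
        if (i == self)
            continue;
        while (qs_count[i] == snap[i])
            sched_yield();       /* wait for thread i's next QS */
    }
}

/* Single-threaded smoke test: with only the caller registered, a grace
 * period completes immediately and bumps our own counter once. */
int qsbr_demo(void)
{
    unsigned long before = qs_count[0];
    rcu_synchronize(0, 1);
    return qs_count[0] == before + 1;
}
```

A real implementation needs thread registration/unregistration and stronger memory-ordering guarantees than this sketch shows, but the counter-snapshot idea is the core of it.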


> Your implementation wouldn't be FLOSS (*), perchance? (reading on, I
> guess not).

Not sure yet. I could use user-space RCU all over the place in some server
applications that make frequent iterations over dynamic collections. Server
applications are loaded with natural quiescent states. In fact, you could do
a specialized RCU just by using the natural quiescent points in your code.

It's still very experimental. I have asserts all over the place!


> Too bad.. I saw the mention of the IBM patents. I used to live in a
> country where you couldn't patent algorithms, but I haven't checked
> whether that's still the case (i.e. I moved to Canada, the laws in
> France haven't changed on the subject)..

They patent everything!


SenderX

Jul 16, 2004, 4:05:19 AM
> You're probably thinking of additionally allocated memory that is linked
> in pthread_mutex_t, depending on the implementation.

Exactly.


> Yes, you could do that,
> but I don't think POSIX can require pthread_mutex_lock to safely and
> reliably return EINVAL even if your implementation could do it.

Right...

I wonder if garbage-collected sync primitives would be more appropriate for
mission-critical servers...
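One hedged sketch of what "garbage collected" could mean here without a full collector: reference-count the wrapper so the storage is freed only when the last holder drops its reference. All names below are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* A mutex wrapper whose storage outlives pthread_mutex-style destroy
 * calls until every thread that took a reference has released it. */
struct gc_mutex {
    pthread_mutex_t m;
    int refs;                    /* guarded by refs_lock */
    pthread_mutex_t refs_lock;
};

struct gc_mutex *gc_mutex_create(void)
{
    struct gc_mutex *gm = malloc(sizeof *gm);
    pthread_mutex_init(&gm->m, NULL);
    pthread_mutex_init(&gm->refs_lock, NULL);
    gm->refs = 1;                /* the creator's reference */
    return gm;
}

void gc_mutex_ref(struct gc_mutex *gm)
{
    pthread_mutex_lock(&gm->refs_lock);
    gm->refs++;
    pthread_mutex_unlock(&gm->refs_lock);
}

/* Drop a reference; destroy and free only on the last drop.
 * Returns 1 if this call actually freed the object, 0 otherwise. */
int gc_mutex_unref(struct gc_mutex *gm)
{
    pthread_mutex_lock(&gm->refs_lock);
    int last = (--gm->refs == 0);
    pthread_mutex_unlock(&gm->refs_lock);
    if (last) {
        pthread_mutex_destroy(&gm->m);
        pthread_mutex_destroy(&gm->refs_lock);
        free(gm);
    }
    return last;
}
```

Note the catch: `refs_lock` has the same destroy-time race this thread started with, just one level down, which is why atomic refcounts or RCU-style deferred reclamation keep coming back into the picture.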


SenderX

Jul 20, 2004, 6:20:34 PM
> Your implementation wouldn't be FLOSS (*), perchance? (reading on, I
> guess not).

All of the algorithms that I did not think up myself will have source code.

It's starting to look like my method for detecting quiescent periods in
user-space is fairly unique. I mean... it runs on Windows and Linux...

:O


Joe Seigh

Jul 21, 2004, 6:36:22 AM

As far as you know. You're not the only one who doesn't publish. :)

Joe Seigh

SenderX

Jul 21, 2004, 6:47:08 AM
> As far as you know. You're not the only one who doesn't publish. :)

;)


Joe Seigh

Jul 21, 2004, 7:01:39 AM

SenderX wrote:
>
> > As far as you know. You're not the only one who doesn't publish. :)
>
> ;)

I've been fooling around with the published RCU algorithms.

Fun with dynamically resizable lock-free combining trees and maybe porting
the zero overhead signaling to unix. Not sure why. It's not like I'll have
1000's of threads running on anything anytime soon.

Joe Seigh

Joe Seigh

Jul 22, 2004, 6:29:37 PM

> Fun with dynamically resizable lock-free combining trees and maybe porting
> the zero overhead signaling to unix. Not sure why. It's not like I'll have
> 1000's of threads running on anything anytime soon.
>

A little bit more overhead but still beats the heck out of mutex and rwlock.
Would definitely scale better than the first two.

Three RCU implementations is enough. Bored already. Maybe I'll get out of
multithreading and into something more useful.

Joe Seigh
