Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

A pthread compatible, lock-free semaphore class...

172 views
Skip to first unread message

SenderX

unread,
Jul 23, 2003, 9:30:03 PM7/23/03
to
Humm, I thought somebody would notice the race-condition here...

It has to do with the fact that you can't use an auto-reset event here, or
any pthread emulation of a auto-reset event!


The fixed code now has zero-race-conditions:

http://appcore.home.comcast.net/pthreads/lfsema.cpp


This is because the wake signals persist now. There will be some spurious
wakes, but the atomic predicate check will handle them...

Any thoughts?

--
The designer of the experimental, SMP and HyperThread friendly, AppCore
library.

http://AppCore.home.comcast.net


Joe Seigh

unread,
Jul 24, 2003, 5:48:11 AM7/24/03
to

SenderX wrote:
>
> Humm, I thought somebody would notice the race-condition here...
>
> It has to do with the fact that you can't use an auto-reset event here, or
> any pthread emulation of a auto-reset event!
>
> The fixed code now has zero-race-conditions:
>
> http://appcore.home.comcast.net/pthreads/lfsema.cpp
>
> This is because the wake signals persist now. There will be some spurious
> wakes, but the atomic predicate check will handle them...
>
> Any thoughts?
>

I think you can do the whole thing with just one count field, e.g. something like

Joe Seigh

unread,
Jul 24, 2003, 6:10:12 AM7/24/03
to

SenderX wrote:
>
> Humm, I thought somebody would notice the race-condition here...
>
> It has to do with the fact that you can't use an auto-reset event here, or
> any pthread emulation of a auto-reset event!
>
> The fixed code now has zero-race-conditions:
>
> http://appcore.home.comcast.net/pthreads/lfsema.cpp
>
> This is because the wake signals persist now. There will be some spurious
> wakes, but the atomic predicate check will handle them...
>
> Any thoughts?

(ignore that previous post. apparently I've discovered an undocumented Netscape
key sequence. I need to get a new browser, probably not anything by the same
people responsible for Netscape)

I think you can do the whole thing with just one count field, e.g. something like

#include <windows.h>
#include <winbase.h>

class qsem {
public:
qsem(int z = 0) {
count = z;
hSem = CreateSemaphore(NULL, 0, 999999, NULL);
}

~qsem() {
CloseHandle(hSem);
}

void wait() {
if (InterlockedDecrement(&count) < 0)
WaitForSingleObject(hSem, INFINITE);
}

void post() {
if (InterlockedIncrement(&count) <= 0)
ReleaseSemaphore(hSem, 1, NULL);
}

private:
LONG count;
HANDLE hSem;

};


It's basically recursive. You can basically do the same thing for
some event types if you can define them as a FSM (Mealy machine).
The recursion would show up in the output action on some of the
transitions.

Joe Seigh

Alexander Terekhov

unread,
Jul 24, 2003, 4:27:49 PM7/24/03
to

Joe Seigh wrote:
[...]

> void wait() {
> if (InterlockedDecrement(&count) < 0)
> WaitForSingleObject(hSem, INFINITE);
> }
>
> void post() {
> if (InterlockedIncrement(&count) <= 0)
> ReleaseSemaphore(hSem, 1, NULL);
> }

Now it's time to add cancelation and timeouts. The funny thing is that
it seems that on some windows "version(s)", timeouts on sema-waits do
consume "concurrent posts". I had to remove the spurious wakes
detection/prevention from pthread-win32 condvars for that reason. But
the most funny thing is the NPTL sema-stuff... don't miss it. ;-) ;-)

regards,
alexander.

Joe Seigh

unread,
Jul 24, 2003, 5:03:57 PM7/24/03
to

Adding cancelation would probably obviate any benefits from using the
interlocked stuff in the first place. Another one of those lack of
performance hints things. Tell me again why we're using threads.

Joe Seigh

Joe Seigh

unread,
Jul 25, 2003, 6:48:31 AM7/25/03
to

If case anyone hasn't figured it out by now, to handle timeouts or
cancelations, or any error, you just increment the count (undo your
decrement). No other actions.

Kind of ironic that the easiest to fast path synchronization primitive
is one of the least useful. I don't think I've ever used semaphores
more than a couple of times. I code I posted is scrap code for a situation
that turns out not solvable by semaphores.

Joe Seigh

Alexander Terekhov

unread,
Jul 25, 2003, 2:24:51 PM7/25/03
to

Joe Seigh wrote:
[...]

> If case anyone hasn't figured it out by now, to handle timeouts or
> cancelations, or any error, you just increment the count (undo your
> decrement). No other actions.

Yep, just ignore all those races with respect to "concurrent signals".

Well, in case anyone hasn't figured it out by now, NPTL futex-based
stuff can be "classified" into two categories:

a) "astonishingly curious" --> their CV impl, primarily;

-and-

b) "brain-damaged"/"totally broken" --> the rest, so to speak.

I may be missing and/or misunderstanding something, of course.

regards,
alexander.

--
"If Unix were a car, they said, SCO Unix was like a driving motorized
wheelbarrow through a mosquito-infested swamp naked with both hands
tied behind your back."

-- http://www.humorix.org/articles/may03/sco.shtml

Joe Seigh

unread,
Jul 25, 2003, 4:18:36 PM7/25/03
to

Alexander Terekhov wrote:
>
> Joe Seigh wrote:
> [...]
> > If case anyone hasn't figured it out by now, to handle timeouts or
> > cancelations, or any error, you just increment the count (undo your
> > decrement). No other actions.
>
> Yep, just ignore all those races with respect to "concurrent signals".

But it's consistent, i.e. you are going to have the races w.r.t
"concurrent signals" whether or not you fast path the semaphore.

>
> Well, in case anyone hasn't figured it out by now, NPTL futex-based
> stuff can be "classified" into two categories:
>
> a) "astonishingly curious" --> their CV impl, primarily;
>
> -and-
>
> b) "brain-damaged"/"totally broken" --> the rest, so to speak.
>
> I may be missing and/or misunderstanding something, of course.
>

I haven't looked at the futex code. I presume it's some sort of
"fast" mutex.

Joe Seigh

SenderX

unread,
Jul 25, 2003, 6:23:35 PM7/25/03
to
> I haven't looked at the futex code. I presume it's some sort of
> "fast" mutex.

http://groups.google.com/groups?selm=3DADB46D.2376E22B%40web.de

Its not fast, as it locks a mutex on all modifications.

Alexander Terekhov

unread,
Jul 25, 2003, 6:30:10 PM7/25/03
to

SenderX wrote:
>
> > I haven't looked at the futex code. I presume it's some sort of
> > "fast" mutex.
>
> http://groups.google.com/groups?selm=3DADB46D.2376E22B%40web.de

http://groups.google.com/groups?selm=3F0D8EE1.E2BE3ECA%40web.de

>
> Its not fast, as it locks a mutex on all modifications.

See also "P.S." in the 2nd message referenced above.

regards,
alexander.

Alexander Terekhov

unread,
Jul 25, 2003, 6:35:30 PM7/25/03
to

Alexander Terekhov wrote:
[...]

> > Its not fast, as it locks a mutex on all modifications.
>
> See also "P.S." in the 2nd message referenced above.

I forgot to mention that the "real" futex doesn't need any locking in
the user space. All its locking is done in the kernel. The user space
is meant to be "lock-free" with respect to futex modifications, Sender.

regards,
alexander.

SenderX

unread,
Jul 25, 2003, 6:41:18 PM7/25/03
to
Clever to use a negative count, as a pseudo-waiter count itself...


> void wait() {
> if (InterlockedDecrement(&count) < 0)
> WaitForSingleObject(hSem, INFINITE);
> }


Wouldn't this wait function need to re-check the predicate?:

while( InterlockedDecrement( &count ) < 0 )
{
WaitForSingleObject(hSem, INFINITE);
}


This may make it impossible to add timeouts, without messing up the counter?

while( InterlockedDecrement( &count ) < 0 )
{
if ( WaitForSingleObject(hSem, 1000) != WAIT_OBJECT_0 )
{
// What do we do with the count?

// Do we remove our previous decrement?

// InterlockedIncrement( &count );

return false;
}
}

I may be missing something?

;)


Using the cas64 version, you can keep exact track of waiters, even after
timeouts.

SenderX

unread,
Jul 25, 2003, 6:54:20 PM7/25/03
to
> I forgot to mention that the "real" futex doesn't need any locking in
> the user space. All its locking is done in the kernel. The user space
> is meant to be "lock-free" with respect to futex modifications, Sender.

Ahh yes... Wonderfull! It should go into the pthreads library:


pthread_futex_init
(
pthread_futex_t * futex,
const pthread_futexattr_t * attr
);

...

pthread_futex_destroy
(
pthread_futex_t * futex
);

;)


P.S.

I will be posting an entire library to this group, that exports
fast-user-space synchronization primitives.

The windows version will be able to sync processes and/or threads. The
pthreads versions will only sync threads, unless some portable shared memory
functions get added to the pthread library...

Joe Seigh

unread,
Jul 25, 2003, 7:16:47 PM7/25/03
to

I already commented in that thread. "concurrent" events aren't a problem
if it's left up to the implementation to decide which one occurred first.
If it's not, i.e. it's tied to externally observable actions, then you
are a lot more limited in what you can do and you can end up being forced
to do a really suboptimal implementation.

Also, some of the POSIX scheduling parameters force the implementation into
the kernel due to scheduling and dispatching considerations which you will
never be able to address outside the kernel.

On those grounds, you should probably avoid trying to do a strict POSIX
implementation. Just do "generic" mutexes, condvars, semaphores, etc...
... plus all the new stuff that will never get into POSIX because it's
too new, less than ten years old.

Joe Seigh

Joe Seigh

unread,
Jul 25, 2003, 9:19:25 PM7/25/03
to

SenderX wrote:
>
> Clever to use a negative count, as a pseudo-waiter count itself...
>
> > void wait() {
> > if (InterlockedDecrement(&count) < 0)
> > WaitForSingleObject(hSem, INFINITE);
> > }
>
> Wouldn't this wait function need to re-check the predicate?:
>
> while( InterlockedDecrement( &count ) < 0 )
> {
> WaitForSingleObject(hSem, INFINITE);
> }

No. Decrementing the count to a negative value means it has to
wait to be explicitly signaled. If it wakes up and the count
is still negative, it only means that other threads are waiting
to be explicitly signaled.

>
> This may make it impossible to add timeouts, without messing up the counter?
>
> while( InterlockedDecrement( &count ) < 0 )
> {
> if ( WaitForSingleObject(hSem, 1000) != WAIT_OBJECT_0 )
> {
> // What do we do with the count?
>
> // Do we remove our previous decrement?
>
> // InterlockedIncrement( &count );
>
> return false;
> }
> }
>
> I may be missing something?

No loop but I'm probably missing something. Let me look at it some
more. This has some of the same issues that I ran into doing a
win32 condvar.

Joe Seigh

Torsten Robitzki

unread,
Jul 26, 2003, 5:39:04 PM7/26/03
to
Hi Mister unknown SenderX,

SenderX wrote:
> This code uses a lock-free algo. for the count and waiters, and uses a
> pthread semaphore (sem_t) to implement the waits. The waiter semaphore is
> only called when needed, due to the lock-free algo. The object keeps exact
> track of waiters at all times. The class will signal and wait for any
> concurrent waiters in the destructor.
>
> The count and the waiters get updated together in a single atomic operation
> ( cas ), this eliminates count and waiter race-conditions. The signaling is
> done via. pthread semaphore with a max of 1 put on the semaphore count and
> the sem_post call is guarded, so it acts like a win32 auto-reset event.
>
> The lfsema::counter union uses an __int64 for 64-bit integer values, not
> sure if this is portable, maybe switch it to uint64_t or u_int64_t? Are
> there portable 64-bit integer types?
<snip>


> What do you think of it?

am I'm to stupid or what purpose will your class serve over a normal
semaphor? Or is just to prevent calls into the kernal? If so, why do you
think that functions like sem_post() will call into the kernel without
waiters?

regards Torsten

SenderX

unread,
Jul 26, 2003, 6:16:04 PM7/26/03
to
> am I'm to stupid or what purpose will your class serve over a normal
> semaphor?

Your not stupid, but you overlooked one thing...


> Or is just to prevent calls into the kernal? If so, why do you
> think that functions like sem_post() will call into the kernel without
> waiters?


The pthread sem_post(...) sources for win32:

ftp://sources.redhat.com/pub/pthreads-win32/sources/pthreads-snap-2003-05-10
/sem_post.c

ftp://sources.redhat.com/pub/pthreads-win32/sources/pthreads-snap-2003-05-10
/ptw32_increase_semaphore.c


Those calls clearly enter the kernel on every counter update. The latter
uses a critical section which skips the kernel upon NO contention... If a
critical section gets any contention whatsoever ( more than 1 thread ), it
waits on a kernel object.

However, if more than ONE thread tries to update the current pthread
semaphore at the SAME time, they will BLOCK in the kernel! Not good. There
is absolutely NO reason for multi-threads to go into the kernel when
concurrently updating a semaphore...

My lock-free semaphore will avoid the kernel 100% of the time under any kind
of actual load, and try to avoid it as much as it can under very light-load.
My algo. actually works faster under any kind of light-moderate to
real-heavy loads ( cycling millions of posts through 500 threads, for days
on end ).

Test an see on a win32 platform, watch the kernel usage. It's 0 under load.
Compare this to a normal pthread sema's kernel usage...

It doesn't even come close.


;)

Joe Seigh

unread,
Jul 27, 2003, 1:34:01 PM7/27/03
to

>
> SenderX wrote:
> >
> > This may make it impossible to add timeouts, without messing up the counter?
> >
> > while( InterlockedDecrement( &count ) < 0 )
> > {
> > if ( WaitForSingleObject(hSem, 1000) != WAIT_OBJECT_0 )
> > {
> > // What do we do with the count?
> >
> > // Do we remove our previous decrement?
> >
> > // InterlockedIncrement( &count );
> >
> > return false;
> > }
> > }
> >
> > I may be missing something?
>
> No loop but I'm probably missing something. Let me look at it some
> more. This has some of the same issues that I ran into doing a
> win32 condvar.
>

Yes, no loop but not quite as simple as just doing an increment but not
complicated either. Doing proofs of semaphore correctness is weird. I
think I'll just let it set a while. Anyway you don't need sem_timedwait()
anyhow.

"The sem_timedwait() function is part of the Semaphores
and Timeouts options and need not be provided on all
implementations."

Joe Seigh

SenderX

unread,
Jul 27, 2003, 6:12:10 PM7/27/03
to
> Anyway you don't need sem_timedwait() anyhow.

What about adding a trywait?

I find trywait's very useful for sync primitives. Especially when the algo
to implement the trywait is non-blocking.


> Doing proofs of semaphore correctness is weird.

I was trying to implement a semaphore using the Interlocked API's, but could
not find a way to keep the waiter count and the actual semaphore count
together for the timeouts. They would drift apart do to race-conditions. I
needed a way to atomically update the waiter and the count.

I think a timedout thread needs to revoke its waiter increment, and update
the waiter atomically with the sema count in order to leave the wait
function correctly...

Humm...


The InterlockedCompareExchangeAcquire/Release64 API's would be perfect for
this!

Joe Seigh

unread,
Jul 28, 2003, 6:57:04 AM7/28/03
to

SenderX wrote:
>
> > Anyway you don't need sem_timedwait() anyhow.
>
> What about adding a trywait?

Trivial.

>
> I find trywait's very useful for sync primitives. Especially when the algo
> to implement the trywait is non-blocking.
>
> > Doing proofs of semaphore correctness is weird.
>
> I was trying to implement a semaphore using the Interlocked API's, but could
> not find a way to keep the waiter count and the actual semaphore count
> together for the timeouts. They would drift apart do to race-conditions. I
> needed a way to atomically update the waiter and the count.
>
> I think a timedout thread needs to revoke its waiter increment, and update
> the waiter atomically with the sema count in order to leave the wait
> function correctly...
>
> Humm...

You are trying to coordinate things that don't need coordinating.

Maybe I'm taking the wrong tact here with a minimalist solution. Maybe
I should come up with a bloatware solution. Everyone complains about
bloatware but the marketplace seems to love it. New strategy. Come up
with the most complicated solution possible. Yeah, I know there is
serious competition out there already (ACE, ...) but that just makes
it more challenging.

>
> The InterlockedCompareExchangeAcquire/Release64 API's would be perfect for
> this!
>

Hmm...

Joe Seigh

SenderX

unread,
Jul 28, 2003, 7:11:13 AM7/28/03
to
> You are trying to coordinate things that don't need coordinating.

I don't quite see how to add a timeout function using your algo. without
getting race-conditions... I am obviously missing something...

Could you please enlighten me?

;)


> Maybe I'm taking the wrong tact here with a minimalist solution. Maybe
> I should come up with a bloatware solution.

lol.

Your solution for this, is very fast!

Joe Seigh

unread,
Jul 28, 2003, 8:22:28 AM7/28/03
to

SenderX wrote:
>
> > You are trying to coordinate things that don't need coordinating.
>
> I don't quite see how to add a timeout function using your algo. without
> getting race-conditions... I am obviously missing something...
>
> Could you please enlighten me?
>

I will post the code for this when I'm reasonably sure I haven overlooked
anything.

>
> > Maybe I'm taking the wrong tact here with a minimalist solution. Maybe
> > I should come up with a bloatware solution.
>
> lol.
>
> Your solution for this, is very fast!
>

Yeah, but fast doesn't pay the bills. And bloatware vendors seem to making
tons of money.

Joe Seigh

John Hickin

unread,
Jul 25, 2003, 8:39:19 AM7/25/03
to

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:3F204B9D...@xemaps.com...

>
> Adding cancelation would probably obviate any benefits from using the
> interlocked stuff in the first place. Another one of those lack of
> performance hints things. Tell me again why we're using threads.
>

Why not use Windows itself to cancel?

Instead of WaitForSingleObject use WaitForSingleObjectEx. To cancel use
QueueUserAPC and enque an APC that throws an exception.


Regards, John.


Joe Seigh

unread,
Jul 28, 2003, 4:07:08 PM7/28/03
to

SenderX wrote:
>
> > You are trying to coordinate things that don't need coordinating.
>
> I don't quite see how to add a timeout function using your algo. without
> getting race-conditions... I am obviously missing something...
>
> Could you please enlighten me?
>

Here's the source code for fast semaphores w/ timeouts.

How it works Basically, you can cancel a wait by incrementing the count field
if it's still negative which will prevent a subsequent post from happening. If the
count is not negative then it's too late to cancel.

Here's where the trickery comes in. Since this algorithm is a loop free sequence
of atomic actions, the intervals between the atomic actions can be of arbitrary
length including well neigh infinite. And since semaphores have no concept
of ownership, the cancellations can be done by any thread. So basically the
solution is to defer the cancellations until convenient, i.e. when the count goes
negative. A convenient place is the post method. You can do it in the wait
method but it doesn't buy you that much, even slows you down a fraction.

Since the cancellations can be indefinitely delayed, what if they were never done,
would this algorithm still work? Yes, basically the algorithm devolves to a semaphore
without any fast path, the count staying negative and the actual count being maintained
in the internal semaphore.

tryWait(). There is one. It's implemented as timedWait(0). The success path is
fast path but the fail path is not. If you want fast path on the tryWait fail path
you will have to implement a different algorithm.

Joe Seigh

--
#include <windows.h>
#include <winbase.h>

class qsem {
public:

qsem(int z = 0) {
count = z;

cancel = 0;


hSem = CreateSemaphore(NULL, 0, 999999, NULL);
}

~qsem() {
CloseHandle(hSem);
}

bool timedWait(DWORD dwMillisecs) {
bool rc = true;

if (InterlockedDecrement(&count) < 0) {
if (WaitForSingleObject(hSem, dwMillisecs) == WAIT_TIMEOUT) {
// uncomment 1 and only one of the next 2 lines.
InterlockedIncrement(&cancel);
//processCancels(1);
rc = false;
}
}

return rc;
}

inline void wait() {
timedWait(INFINITE);
}

inline bool tryWait() {
return timedWait(0);
}

void post() {

if (cancel > 0 && count < 0)
processCancels(InterlockedExchange(&cancel, 0));

if (InterlockedIncrement(&count) <= 0)
ReleaseSemaphore(hSem, 1, NULL);
}

private:

// increment count by min(cancelCount, -(count)) iff count < 0
inline void processCancels(LONG cancelCount) {
LONG temp2, temp3, temp4, temp5;

if (cancelCount > 0) {
temp2 = count;
while (temp2 < 0) {
temp3 = 0; // unprocessed cancels
temp4 = cancelCount + temp2; // new count

if (temp4 > 0) {
temp3 = temp4; // unprocessed cancels
temp4 = 0; // new count
}

temp5 = temp2;
if ((temp2 = InterlockedCompareExchange(&count, temp4, temp2)) == temp5)
break;
}

if (temp3 > 0)
InterlockedExchangeAdd(&cancel, temp3);
}

}

public:
LONG count;
LONG cancel; // deferred cancel count
HANDLE hSem;

};

/*-*/

SenderX

unread,
Jul 28, 2003, 6:12:25 PM7/28/03
to
> And since semaphores have no concept
> of ownership, the cancellations can be done by any thread.

Yes they can.

Your algo keeps track of the semaphore count, waiters, and timeouts(
cancels ), in a loop-free manor, using 64-bits of data. Very nice!

My algo does the waiter canceling in the same thread that waited. A waiting
thread can't leave a function unless its waiter increment has been revoked.


> > > You are trying to coordinate things that don't need coordinating.

I wanted to make sure that the waiter and count stayed together, to ease my
mind. And to ease the flow of the algo. as everything in done in a simple
CAS section.

For some reason I still think the waiter and count should be updated
together... But I see that you seem to atomically queue and process the
timeouts, using the cancel counter...


I will study the code. Especially processCancels!

;)

Joe Seigh

unread,
Jul 28, 2003, 6:26:22 PM7/28/03
to

SenderX wrote:

> For some reason I still think the waiter and count should be updated
> together... But I see that you seem to atomically queue and process the
> timeouts, using the cancel counter...
>
> I will study the code. Especially processCancels!

The amount subtracted from the cancel count is the same as
the amount added to the wait count. You could do both
at once with a cas64 but it's probably more portable with
single word cas.

Joe Seigh

SenderX

unread,
Jul 28, 2003, 6:36:03 PM7/28/03
to
> You could do both
> at once with a cas64 but it's probably more portable with
> single word cas.

More portable to 32-bit systems for sure.

64-bit systems should have a native LL/SC or CAS that works on 64-bit
values. So ANY algo. that uses cas64 to swap two 32-bit values, that don't
contain pointers, should be able to port to it.

I still am wondering how MS is going to put the SList API on a 64-bit system
that lacks a cas that works on memory double the pointer size. They must use
arrays and offsets, or mapped files...

SenderX

unread,
Jul 28, 2003, 7:51:35 PM7/28/03
to
> If you want fast path on the tryWait fail path
> you will have to implement a different algorithm.

> if (InterlockedDecrement(&count) < 0) {


> if (WaitForSingleObject(hSem, dwMillisecs) == WAIT_TIMEOUT) {
> // uncomment 1 and only one of the next 2 lines.
> InterlockedIncrement(&cancel);
> //processCancels(1);
> rc = false;
> }
> }


Why wouldn't this work?


if (InterlockedDecrement(&count) < 0) {

bool bCancel = true;

if ( dwMillisecs > 0 )
{
if (WaitForSingleObject(hSem, dwMillisecs) == WAIT_OBJECT_0)
{
bCancel = false;
}
}

if ( bCancel ) {


// uncomment 1 and only one of the next 2 lines.
InterlockedIncrement(&cancel);
//processCancels(1);
rc = false;
}
}


Sure seems like it should...

0 new messages