Bug-Free Mutexs and CondVars w/ EventCounts...

Chris Thomasson

unread,

Mar 24, 2007, 7:30:15 PM3/24/07

to

I simply had to do this in order to clear up some of the crappy damn BUGS
that were in some of the previous mutex pseudo-code I posted:

The following logic is bug free:

Correct Pseudo-Code EventCount "Interface"
--------------------------
typedef unsigned int AtomicWord_t;

class EventCount {
public:
typedef AtomicWord_t Value_t;

public: // wait functions
Value_t Load() {...}
void Wait(Value_t) {...}
bool TryWait(Value_t) {...}
bool TimedWait(Value_t, const timespec&) {...}

public: // signal functions
void Signal() {...}
void Broadcast() {...}
};

Correct Pseudo-Code for Mutex
--------------------------

class Mutex {
AtomicWord_t m_State;
EventCount m_Waitset;

enum Config_e {
RELEASED, ACQUIRED, CONTENTION
};

public:
Mutex() : m_State(RELEASED) {}

public:
void Lock() {
AtomicWord_t LockState = ACQUIRED;
while(AtomicSwap(&m_State, LockState) != RELEASED) {
LockState = CONTENTION;
EventCount::Value_t ec_local = m_Waitset.Load();
if (AtomicSwap(&m_State, LockState) == RELEASED) {
return;
}
m_Waitset.Wait(ec_local);
}
}

void Unlock() {
if (AtomicSwap(&m_State, RELEASED) == CONTENTION) {
m_Waitset.Signal();
}
}
};

Correct Pseudo-Code for Condvar
--------------------------

class CondVar {
EventCount m_Waitset;

public:
void Wait(Mutex &mtx) {
EventCount::Value_t ec_local = m_Waitset.Load();
mtx.Unlock();
m_Waitset.TimedWait(ec_local);
mtx.Lock();
}

bool TimedWait(Mutex &mtx, const timespec &tspec) {
EventCount::Value_t ec_local = m_Waitset.Load();
mtx.Unlock();
bool retval = m_Waitset.TimedWait(ec_local, tspec);
mtx.Lock();
return retval;
}

void Signal() { m_Waitset.Signal(); }
void Broadcast() { m_Waitset.Broadcast(); }
};

enjoy!

;^)

Peter Dimov

unread,

Mar 26, 2007, 3:17:51 PM3/26/07

to

On Mar 25, 2:30 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> I simply had to do this in order to clear up some of the crappy damn BUGS
> that were in some of the previous mutex pseudo-code I posted:
>
> The following logic is bug free:
>
> Correct Pseudo-Code EventCount "Interface"

Cool, thanks. Can you give a somewhat formal specification for the
eventcount primitive or point me at a paper that does? I've tried to
find more info on it but could not.

Steve Watt

unread,

Mar 26, 2007, 7:24:54 PM3/26/07

to

In article <l-adnUw7LazjKZjb...@comcast.com>,

Chris Thomasson <cri...@comcast.net> wrote:
>I simply had to do this in order to clear up some of the crappy damn BUGS
>that were in some of the previous mutex pseudo-code I posted:
>
>The following logic is bug free:

I can't resist:

Are you certain?

What was Knuth's disclaimer again? Something like "I have only proven
it correct, not tested it."

:)
--
Steve Watt KD6GGD PP-ASEL-IA ICBM: 121W 56' 57.5" / 37N 20' 15.3"
Internet: steve @ Watt.COM Whois: SW32-ARIN
Free time? There's no such thing. It just comes in varying prices...

Chris Thomasson

unread,

Mar 27, 2007, 2:18:06 AM3/27/07

to

"Steve Watt" <steve.re...@Watt.COM> wrote in message
news:eu9ko6$2nk8$1...@wattres.Watt.COM...

> In article <l-adnUw7LazjKZjb...@comcast.com>,
> Chris Thomasson <cri...@comcast.net> wrote:
>>I simply had to do this in order to clear up some of the crappy damn BUGS
>>that were in some of the previous mutex pseudo-code I posted:
>>
>>The following logic is bug free:
>
> I can't resist:
>
> Are you certain?

well, the first part of the condvar:

-----------------
class CondVar {
EventCount m_Waitset;

public:
void Wait(Mutex &mtx) {
EventCount::Value_t ec_local = m_Waitset.Load();
mtx.Unlock();
m_Waitset.TimedWait(ec_local);
mtx.Lock();
}

----------------

should use m_Waitset.Wait(ec_local) instead of .TimedWait... So, there is a
typo... Anyway, the compiler will warn that you need to include a const
reference to a timespec in the EventCount::TimedWait function.

So:

void CondVar::Wait(Mutex &mtx) {

EventCount::Value_t ec_local = m_Waitset.Load();
mtx.Unlock();

m_Waitset.Wait(ec_local);
mtx.Lock();
}

works better!

> What was Knuth's disclaimer again?

:^)

Well, perhaps I should stop typing this stuff out in the newsreader
interface... I guess I need to start submitting more full blown compliable
programs!

:^O

Any thoughts?

Chris Thomasson

unread,

Mar 27, 2007, 2:50:10 AM3/27/07

to

"Peter Dimov" <pdi...@gmail.com> wrote in message
news:1174936671....@y66g2000hsf.googlegroups.com...

> On Mar 25, 2:30 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>> I simply had to do this in order to clear up some of the crappy damn BUGS
>> that were in some of the previous mutex pseudo-code I posted:
>>
>> The following logic is bug free:
>>
>> Correct Pseudo-Code EventCount "Interface"
>
> Cool, thanks. Can you give a somewhat formal specification for the
> eventcount primitive

Well, they are kind of like win32 events, but, they allow you to wait on a
"multiple number" of events in a single primitive. Its "safer" than a normal
event because your not going to miss any signals... It's a non-lossy event
so to speak...

You use them kind of like you would use a condition vairable... The usage
pattern for "generic predicated-waits" is like this...:

// Waiters
while(! TestGenericPredicate()) {
val = LoadEventCount();
if (TestGenericPredicate()) { break; }
WaitEventCount(val);
}

//Signals
SatisfySingleGenericPredicate();
SignalEventCount();

//Broadcasts
SatisfyMultipleGenericPredicates();
BroadcastEventCount();

Humm, okay here is a fairly simple example...

Lets take a simple abstract thread-safe queue interface:

class MTQueue {
public:
void Push(void*);
void* TryPop();
};

The eventcount allows us to setup generic predicates, and arbitrarily timed
waits on their state... The state shifts are transformed into event count
increments which may wake up waitset generations. You can setup a condition
with an eventcount that says "must wait for work if there is no work." like
this:

class WaitMTQueue {
MTQueue m_WorkQ;
EventCount m_Waitset;

public:
void SignalPush(void *state) {
m_WorkQ.Push(state);
m_Waitset.Signal();
}

void* WaitPop() {
void *retval = m_WorkQ.TryPop();
while(! retval) {

EventCount::Value_t ec_local = m_Waitset.Load();

retval = m_WorkQ.TryPop();
if (retval) { return retval; }
m_WorkQ.Wait(ec_local);
retval = m_WorkQ.TryPop();
}
return retval;
}
};

Joe Seigh has mentioned this pattern in this group before... Its a
double-checked thing that works well with eventcount implmentations that
have signalling function that is nearly free...

> or point me at a paper that does?

http://portal.acm.org/citation.cfm?id=359076

And the fairly portable original lock-free fast-pathed implementation I did
here:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380

The stuff I did is not "quite" the same as the stuff in the paper... My
stuff is a little different... For instance, it sets a waiters bit in the
eventcount state when its value is "loaded" by a thread.

> I've tried to
> find more info on it but could not.

I am sorry for the limited information... The eventcount is so versatile
that it deserves better info...

Chris Thomasson

unread,

Mar 27, 2007, 3:04:51 AM3/27/07

to

> What was Knuth's disclaimer again? Something like "I have only proven
> it correct, not tested it."

I have tested the hell out of it. The eventcount is a sound primitive. The
reason I blast pseudo-code out in the newsreader is because its quicker than
cutting pieces of actual source code and stripping away all of the
library/system details. I created AppCore so that I could show off some real
code:

http://appcore.home.comcast.net/

comp.programming.thread is a place where I tend to
brainstorm/experiment/discuss "new" ideas, and post quick pseudo-code of my
existing working models...

Steve Watt

unread,

Mar 27, 2007, 7:31:33 PM3/27/07

to

In article <u66dnb-zeNKYX5Xb...@comcast.com>,

Wholly understood, I'm just poking fun at your habit of posting 3-4
"oops" messages on a bit of code. Whenever I see you post an idea,
I know to wait a day or so while you find all your own bugs. Then
the rest of us can look. :)

Mirek Fidler

unread,

Mar 30, 2007, 5:21:09 AM3/30/07

to

> Well, they are kind of like win32 events, but, they allow you to wait on a
> "multiple number" of events in a single primitive. Its "safer" than a normal
> event because your not going to miss any signals... It's a non-lossy event
> so to speak...

Well, that sounds pretty much like semaphore. What is the difference?

Mirek

Chris Thomasson

unread,

Mar 30, 2007, 5:52:23 PM3/30/07

to

"Mirek Fidler" <c...@ntllib.org> wrote in message
news:1175246469....@r56g2000hsd.googlegroups.com...

Here is the short answer:

You cannot use a single semaphore to create a condition variable. You can
use a single eventcount to create a condition variable.

Semaphores simply do not keep proper waitset generations because waiters are
allowed to steal increments that were not meant for them. Eventcounts do
keep proper waitset generations.

Does that help?

Mirek Fidler

unread,

Mar 31, 2007, 4:41:01 AM3/31/07

to

No, but I will try to google the rest :) Thanks.

Mirek

rani_s...@hotmail.com

unread,

Apr 1, 2007, 10:35:02 PM4/1/07

to

> > Well, that sounds pretty much like semaphore. What is the difference?
>
> Here is the short answer:
>
> You cannot use a single semaphore to create a condition variable. You can
> use a single eventcount to create a condition variable.

Let me rephrase the question.

For the lock free construct that you presented (e.g. lock free queue)
what are the benefits for using Event-Count, Conditional Variable or
Lock-Free Semaphore (pseudo implementation below).

With semaphore I can rewrite the lock-free wait construct in two ways:

// #1 - test and wait
while(!TestGenericPredicate()) {
Wait-On-Semaphore();
}

// #2 wait and get
Wait-On-Semaphore();
Dequeue();

The weaker just signal the semaphore.
(I still trying to figure out which way is better in general. My guess
is #2 since it seems to have less context switches).

Can you compare between the semaphore approach and other approaches:
1) Performance
2) Scalability - high loaded
3) Scalability - Number of CPUs
4) Fairness pessimism
5) Potential optimizations to avoid unnecessary context switches

Lock free semaphore (for dummies):
void Wait() {
if (InterlockedIncrement(&m_lCount) > 0) SemaWait();
}

void Release() {
if (InterlockedDecrement(&m_lCount) >= 0) SemaRelease();
}

Thanks,
Rani

Chris Thomasson

unread,

Apr 2, 2007, 2:20:42 AM4/2/07

to

<rani_s...@hotmail.com> wrote in message
news:1175481302.3...@l77g2000hsb.googlegroups.com...

>> > Well, that sounds pretty much like semaphore. What is the difference?
>>
>> Here is the short answer:

[...]> Let me rephrase the question.

>
> For the lock free construct that you presented (e.g. lock free queue)
> what are the benefits for using Event-Count, Conditional Variable or
> Lock-Free Semaphore (pseudo implementation below).
>
> With semaphore I can rewrite the lock-free wait construct in two ways:

[...]

> Can you compare between the semaphore approach and other approaches:
> 1) Performance

[...]

> void Release() {
> if (InterlockedDecrement(&m_lCount) >= 0) SemaRelease();
> }

The Release function of a semaphore is basically analogous to the Signal
function for an EventCount. However, there is one MAJOR difference. The
semaphore is simply FORCED to execute an interlocked RMW operation, while an
EventCount has the ability to get a interlocked-op-free fast-path for it's
Signals or Broadcasts. It boils down to the cost of a MOV and TEST
instruction versus the cost of a LOCK DEC/XADD instruction.

The LOCK prefix, along with the interlocked RMW is FAR more expensive and
caries vastly greater overheads than the MOV and TEST instructions do.

Is this starting to make sense?

Sorry for any confusion!
Thank you.

rani_s...@hotmail.com

unread,

Apr 2, 2007, 8:54:52 PM4/2/07

to

> > void Release() {
> > if (InterlockedDecrement(&m_lCount) >= 0) SemaRelease();
> > }
>
> The Release function of a semaphore is basically analogous to the Signal
> function for an EventCount. However, there is one MAJOR difference. The
> semaphore is simply FORCED to execute an interlocked RMW operation, while an
> EventCount has the ability to get a interlocked-op-free fast-path for it's
> Signals or Broadcasts. It boils down to the cost of a MOV and TEST
> instruction versus the cost of a LOCK DEC/XADD instruction.
>
> The LOCK prefix, along with the interlocked RMW is FAR more expensive and
> caries vastly greater overheads than the MOV and TEST instructions do.
>
> Is this starting to make sense?

I think that I understand your intention. You are trying to move
expensive operation from the fast waker's path (i.e. no waiters) into
the slow waiter's path (i.e. about to wait due to no items). You are
doing so by "marking" the fact that there is a potential waiter in the
additional get function (i.e. interlocked m_ec & 0x80000000). The
technique seems to go beyond lock-free into being interlocked-free. I
assume that you are *not* a big fan of ref-counting using interlocked
operations.

I have some concern about interlocked-free programming. Do you rely
(architecture-wise) on the fact that the waker operation before
signaling (e.g. en-queuing) ensures the visibility of the EC get
operation? It seems that otherwise there is no guarantee (in general)
that waker's CPU will see the change.

I don't see the same problem with the interlocked-free implementation
of conditional variable signaling since the race condition is always
there yet the CV usage construct is robust enough due to the
serializing mutex.

Another concert that I had is regarding the actual benefit of avoiding
additional interlocked operation in the common use case in which the
waker already preformed interlocked operation before signaling. I
thought that CPU architectures are motivated to optimize common use
cases like multiple interlocked operations in short time (e.g. async
queue using CV or plain mutex with fast scope).
Do you have some measurements? For example async queue via EC, CV and
semaphore. I'm mainly interested to know scalability in respect to the
number of CPUs in various loads.

Thanks,
Rani

Chris Thomasson

unread,

Apr 5, 2007, 10:11:03 PM4/5/07

to

<rani_s...@hotmail.com> wrote in message
news:1175561692.3...@p15g2000hsd.googlegroups.com...
[...]

>> Is this starting to make sense?
>
> I think that I understand your intention. You are trying to move
> expensive operation from the fast waker's path (i.e. no waiters) into
> the slow waiter's path (i.e. about to wait due to no items).

[...]

Yup.

> I
> assume that you are *not* a big fan of ref-counting using interlocked
> operations.

Yup.

> I have some concern about interlocked-free programming. Do you rely
> (architecture-wise) on the fact that the waker operation before
> signaling (e.g. en-queuing) ensures the visibility of the EC get
> operation? It seems that otherwise there is no guarantee (in general)
> that waker's CPU will see the change.

Joe Seigh already outlined the required memory barriers here:

http://groups.google.com/group/comp.programming.threads/msg/8e7a0379b55557c0