Condition Variables on Win32


Anthony Williams

Sep 19, 2006, 11:33:56 AM
Hi all,

I've been working on implementing condition variables on win32 for the boost
thread library (www.boost.org), and I'd like folks here to comment on my
algorithm.

The condition variable algorithm is actually a variant of Alexander Terekhov's
8a algorithm, and uses the same "gate" idea. However, my implementation uses
QueueUserAPC to wake up the waiting threads, rather than have them wait on an
event or semaphore. This gets round what I think is a deficiency in
Alexander's algorithm, which is that once a notify operation is underway, no
new threads can wait on the condition until all notified threads have actually
woken up, and the last one has "opened the gate". By using QueueUserAPC, we
can notify the threads to wake up, and open the gate, without having to wait
for all the notified threads to resume. The flipside is that the first thread
to wait is the first to be notified. Whether this is a good or bad thing is
open to debate.

The code is in the boost CVS repository, at boost/thread/win32/condition.hpp
on the thread_rewrite branch.

http://boost.cvs.sourceforge.net/*checkout*/boost/boost/boost/thread/win32/condition.hpp?revision=1.1.2.4&pathrev=thread_rewrite

The algorithm is as follows:

waiting_list_node:
    waiting_list_node* next, prev
    HANDLE thread_handle
    bool notified

waiting_list: doubly-linked list of waiting_list_node
gate: mutex

init():
    waiting_list.next=waiting_list.prev=&waiting_list
    init gate mutex

wait(external_mutex, timeout):
    create a new waiting_list_node
    new_node.thread_handle=thread handle for this thread
    new_node.prev=&waiting_list
    lock(gate)
    new_node.next=waiting_list.next
    waiting_list.next=&new_node
    new_node.next->prev=&new_node
    unlock(external_mutex)
    unlock(gate)

    // Any APC will break the sleep, so keep sleeping until we've been
    // notified, or we've timed out
    while(!atomic_read(new_node.notified)
          && SleepEx(milliseconds_until(timeout), true)==WAIT_IO_COMPLETION);

    lock(gate)
    unlink(new_node)
    unlock(gate)
    lock(external_mutex)
    return new_node.notified // did we time out, or were we notified?


unlink(node):
    // unlink the node from the list
    node.next->prev=node.prev
    node.prev->next=node.next
    node.next=node.prev=&node

notify_and_unlink_entry(node):
    atomic_set(node.notified,true)
    unlink(node)
    // wake the node's thread by queueing an APC;
    // the APC func doesn't have to do anything to wake the thread
    QueueUserAPC(NOP,node.thread_handle,0)

notify_one():
    lock(gate)
    if(waiting_list.prev==&waiting_list) do nothing
    else notify_and_unlink_entry(waiting_list.prev)
    unlock(gate)

notify_all():
    create a waiting_list_node for new_list
    lock(gate)
    if(waiting_list.next==&waiting_list)
        unlock(gate)
        return // no waiters, nothing to transfer
    // transfer the existing list over to new_list
    new_list.prev=waiting_list.prev
    new_list.next=waiting_list.next
    new_list.next->prev=&new_list
    new_list.prev->next=&new_list
    // the waiting_list is now empty
    waiting_list.next=waiting_list.prev=&waiting_list
    unlock(gate) // no one has to wait for us any more

    while(new_list.prev!=&new_list) // loop until the captured list is empty
        notify_and_unlink_entry(new_list.prev)
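
As a sanity check, the list bookkeeping above can be rendered in portable C++ and exercised without any Win32 calls. This is a hypothetical sketch of just the link/unlink/capture steps (the gate locking, SleepEx, and QueueUserAPC parts are omitted); `link_after_head` and `capture_all` are invented names for the inline steps of wait() and notify_all().

```cpp
struct waiting_list_node {
    waiting_list_node* next;
    waiting_list_node* prev;
    bool notified = false;
    waiting_list_node() : next(this), prev(this) {}  // empty list is self-linked
};

// wait(): insert the new node right after the list head (newest waiter first).
inline void link_after_head(waiting_list_node& head, waiting_list_node& new_node) {
    new_node.prev = &head;
    new_node.next = head.next;
    head.next = &new_node;
    new_node.next->prev = &new_node;
}

// unlink(node): remove the node and re-point it at itself.
inline void unlink(waiting_list_node& node) {
    node.next->prev = node.prev;
    node.prev->next = node.next;
    node.next = node.prev = &node;
}

// notify_all() list capture: splice every waiter onto new_list and leave
// waiting_list empty; in the real algorithm this happens with the gate held.
inline void capture_all(waiting_list_node& waiting_list, waiting_list_node& new_list) {
    if (waiting_list.next == &waiting_list) return;  // nothing to capture
    new_list.prev = waiting_list.prev;
    new_list.next = waiting_list.next;
    new_list.next->prev = &new_list;
    new_list.prev->next = &new_list;
    waiting_list.next = waiting_list.prev = &waiting_list;
}
```

Since new waiters go in at `head.next` and notify_one() takes `waiting_list.prev`, the oldest waiter is notified first, i.e. the FIFO order discussed above.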


Thanks for reading, and for any comments.

Anthony
--
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk

Chris Thomasson

Sep 20, 2006, 1:48:41 AM
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:8xkf3m...@yahoo.com...

> Hi all,
>
> I've been working on implementing condition variables on win32 for the
> boost
> thread library (www.boost.org), and I'd like folks here to comment on my
> algorithm.
>

FWIW, here is another technique using one of my heavily fast-pathed
eventcount algorithms:


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380/79016c78a3696f6c?lnk=gst&q=portable+eventcount&rnum=1#79016c78a3696f6c


Joe Seigh and I have had some discussions on this zero-overhead signaling
stuff a while back... IIRC, his atomic-ptr-plus project had a
high-performance futex-based eventcount implementation... I can't really
remember if he used his trick for implementing a 63-bit counter that only
made use of 32-bit atomic operations...

;)


One thing about my algorithm that you will quickly notice is that it uses
pthread calls for all of its slow-paths! Funny... It seems that wrapping a
clever lock-free algorithm around most existing pthread implementations
usually buys you some extreme performance enhancements... It sure does seem
that most of the pthread implementations out there simply deserve to be
"restricted to the slow-paths of any synchronization algorithms"...

Just kidding!

:)


class mutex {...};

class condition {
    ec m_ec;

public:
    void wait(mutex &mtx) {
        int ec = m_ec.get();
        mutex::guard_unlock unlock(mtx);
        m_ec.wait(ec);
    }

    void broadcast() { m_ec.signal(); }
};


A simple eventcount-based condition variable with a waiters bit. It has an
atomic-operation-free fast-path for signalers; it only relies on a load and a
membar. It's basically a broadcast-only, generation-based (e.g., basic
eventcount functionality) waitset; adding support for single-signaling and
timeouts is fairly trivial...
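
For readers unfamiliar with the eventcount shape that `condition` relies on, here is a minimal, portable sketch assuming only the broadcast-style semantics described above: get() samples a generation, wait(g) blocks only while the generation still equals g, and signal() bumps it and wakes everyone. This is not Chris's lock-free fast-pathed implementation; it deliberately uses a plain mutex/condvar slow path throughout.

```cpp
#include <condition_variable>
#include <mutex>

class ec {
    std::mutex m_;
    std::condition_variable cv_;
    unsigned gen_ = 0;   // the "event count" generation
public:
    // Sample the current generation before re-checking the predicate.
    unsigned get() {
        std::lock_guard<std::mutex> lk(m_);
        return gen_;
    }
    // Sleep only if no signal has happened since `observed` was sampled.
    void wait(unsigned observed) {
        std::unique_lock<std::mutex> lk(m_);
        // A signal between get() and wait() bumps gen_, so we return
        // immediately instead of sleeping through the wakeup.
        cv_.wait(lk, [&] { return gen_ != observed; });
    }
    // Broadcast-only, as described: bump the generation and wake all waiters.
    void signal() {
        { std::lock_guard<std::mutex> lk(m_); ++gen_; }
        cv_.notify_all();
    }
};
```

The key property is that a signal can never be lost between sampling the count and blocking, which is exactly what lets the condition wrapper drop the external mutex before calling wait().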


Chris Thomasson

Sep 20, 2006, 2:00:58 AM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:XOadnTwGRr4oTo3Y...@comcast.com...

> "Anthony Williams" <anthon...@yahoo.com> wrote in message
> news:8xkf3m...@yahoo.com...
>> Hi all,
>>
>> I've been working on implementing condition variables on win32 for the
>> boost
>> thread library (www.boost.org), and I'd like folks here to comment on my
>> algorithm.
>>
>
> FWIW, here is another technique using one of my heavily fast-pathed
> eventcount algorithms:


Here is another one of my algorithms that might be of interest to you:

http://groups.google.com/group/comp.programming.threads/msg/af00cb5997f95027
(pseudo-code impl)


http://groups.google.com/group/comp.programming.threads/msg/2b5b6096101948d8
(fix)


http://groups.google.com/group/comp.programming.threads/msg/496d9522aa637656
(fix)


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/204de663711e39b8/35c84a8d008b8ee1?lnk=gst&q=lock-free+win32+condvar&rnum=1#35c84a8d008b8ee1


IIRC, once you apply the fixes that I outlined in that thread, the
implementation should work... However, there is a caveat: even though it uses
a scheme that "doesn't rely on a single kernel object as a waitset", the
algorithm is bound to SCHED_OTHER. I believe that this holds true for your
implementation as well?

Also, how does your implementation cope with POSIX safety, wrt the
race-condition that exists when a thread starts to destroy synchronization
objects; any waiters, anybody?...

;)

In order to achieve proper POSIX safety you need to use static storage
(e.g., fixed arrays of waitsets w/ parking interface), or you have to use
GC, PDR, or atomic reference counting...

More on this:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/38d68d1289352a53/7e52b3c8552fdae9?lnk=gst&q=and+the+EBUSY+&rnum=1#7e52b3c8552fdae9


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8b855eab7dc55ad9/14dc31d4327d12f0?lnk=gst&q=and+the+EBUSY+&rnum=2#14dc31d4327d12f0


Most implementations simply do not take this stuff into account...

;^(...


Alexander Terekhov

Sep 20, 2006, 4:10:48 AM
You don't have to use funky stuff for CVs with real queues. Per-thread
binary semas for parking/unparking would do just fine. The problem is
that you're making scheduling decisions (FIFO sucks BTW, LIFO is much
better for CVs) and you might have problems in the process shared case
(8x was used to implement CV style primitive on top of process shared
semas, and no-pointers was a must-have).

regards,
alexander.

Anthony Williams

Sep 20, 2006, 5:34:23 AM
Alexander Terekhov <tere...@web.de> writes:

> You don't have to use funky stuff for CVs with real queues. Per-thread
> binary semas for parking/unparking would do just fine.

I agree that binary semas would work fine. I was concerned at the overhead of
creating and destroying an OS-level sync object (win32 Event or Semaphore) for
every wait. Was that really what you meant, or did you have something else in
mind?

> The problem is
> that you're making scheduling decisions (FIFO sucks BTW, LIFO is much
> better for CVs) and you might have problems in the process shared case
> (8x was used to implement CV style primitive on top of process shared
> semas, and no-pointers was a must-have).

I agree that this technique would have problems in the process shared case,
but I don't intend it to work under that scenario, only under the in-process
case.

I agree my algorithm is making scheduling decisions. I'm not sure why that's a
bad thing, since it's a "fair" scheme.

Why is a LIFO queue better than a FIFO queue for CVs? Surely if there are many
waiters, only being notified one at a time, then the first-to-wait may end up
waiting for a very long time? Changing to a LIFO is really easy, but I'd want
to understand why before making the change.

Anthony Williams

Sep 20, 2006, 5:50:39 AM
"Chris Thomasson" <cri...@comcast.net> writes:

> "Chris Thomasson" <cri...@comcast.net> wrote in message
> news:XOadnTwGRr4oTo3Y...@comcast.com...
>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>> news:8xkf3m...@yahoo.com...
>>> Hi all,
>>>
>>> I've been working on implementing condition variables on win32 for the
>>> boost
>>> thread library (www.boost.org), and I'd like folks here to comment on my
>>> algorithm.
>>>
>>
>> FWIW, here is another technique using one of my heavily fast-pathed
>> eventcount algorithms:
>
>
> Here is another one of my algorithms that might be of interest to you:

These look really similar to my implementation, except that they use an event
for each waiting thread in order to signal the wakeups.

> IIRC, once you apply the fixes that I outlined in that thread, the
> implementation should work... However, there is a caveat: It uses a scheme
> that "doesn't rely on a single kernel object as a waitset"; your algorithm
> is bound to SCHED_OTHER. I believe that this holds true for your
> implementation as well?

Yes, just like yours; it's the same algorithm.

> Also, how does your implementation cope with POSIX safety, wrt the
> race-condition that exists when a thread starts to destroy synchronization
> objects; any waiters, anybody?...

Undefined behaviour, as usual.

Chris Newbold

Sep 20, 2006, 8:42:05 AM
On Tue, 19 Sep 2006 16:33:56 +0100, Anthony Williams wrote:

> I've been working on implementing condition variables on win32 for the
> boost thread library (www.boost.org), and I'd like folks here to comment
> on my algorithm.

I'm still looking over your example, but I'll come back with any comments.

In the meantime, here is a condition-variable implementation which I
assembled quickly to meet our needs for POSIX semantics on Win32. It does
not address issues with destruction (undefined behavior here) and does not
attempt to make any scheduling decisions; I defer those to the OS with
ReleaseSemaphore().

I did not spend more than an afternoon on this, so there is likely
something horribly wrong with it, though so far we've never caught it
violating the POSIX definition.

One place where I can see that it is a bit dodgy is with a concurrent
wait() and signal(): it's possible for the wait() to grab the semaphore
that was just posted by signal() and sail on through without sleeping.
This isn't "incorrect" in the sense that no other thread will be released,
but it is certainly not fair from a scheduling standpoint. I guess it
amounts to LIFO under these conditions.

-Chris


class ConditionImpl
{
public:
    explicit ConditionImpl(HANDLE externalLock);
    ~ConditionImpl();

    void signal();
    void signalAll();

    void wait() { wait(INFINITE); }
    WaitStatus wait(DWORD duration);

private:
    // The external mutex which is released and reacquired
    // during wait().
    HANDLE _extLock;

    // Count of threads sleeping in wait() and a fast lock to
    // protect it.
    CRITICAL_SECTION _intLock;
    LONG _numWaiters;

    // Semaphore acting as the queue of threads to wake in
    // response to signal() or signalAll().
    HANDLE _wakeupSemaphore;
}; // class ConditionImpl


ConditionImpl::ConditionImpl(HANDLE externalLock)
    : _extLock(externalLock),
      _numWaiters(0),
      _wakeupSemaphore(CreateSemaphore(NULL,       /* no security */
                                       0,          /* initially 0 */
                                       0x7fffffff, /* max count */
                                       NULL        /* unnamed */))
{
    assert(_wakeupSemaphore != NULL);
    InitializeCriticalSection(&_intLock);
}

ConditionImpl::~ConditionImpl()
{
    if (CloseHandle(_wakeupSemaphore) == 0) {
        assert(0);
    }
    DeleteCriticalSection(&_intLock);
}

void ConditionImpl::signal()
{
    // Snarf one of the waiters
    EnterCriticalSection(&_intLock);
    bool wakeup(false);
    if (_numWaiters != 0) {
        --_numWaiters;
        wakeup = true;
    }
    LeaveCriticalSection(&_intLock);

    // If there was a waiter, wake someone up
    if (wakeup) {
        if (ReleaseSemaphore(_wakeupSemaphore, 1, NULL) == 0) {
            assert(0);
        }
    }
}

void ConditionImpl::signalAll()
{
    // Snarf all of the waiters
    EnterCriticalSection(&_intLock);
    const LONG count(_numWaiters);
    _numWaiters = 0;
    LeaveCriticalSection(&_intLock);

    // If there were waiters, wake them all
    if (count != 0) {
        if (ReleaseSemaphore(_wakeupSemaphore, count, NULL) == 0) {
            assert(0);
        }
    }
}

WaitStatus ConditionImpl::wait(DWORD duration)
{
    // Note that there is another thread waiting
    EnterCriticalSection(&_intLock);
    ++_numWaiters;
    LeaveCriticalSection(&_intLock);

    // Release the external mutex and wait on the semaphore for a
    // signal() or signalAll().
    DWORD e = SignalObjectAndWait(_extLock,
                                  _wakeupSemaphore,
                                  duration,
                                  FALSE /* alertable */);

    WaitStatus result(THR_INVALID);

    switch (e) {
    case WAIT_TIMEOUT:
        result = THR_TIMEDOUT;
        break;
    case WAIT_OBJECT_0:
        result = THR_ACQUIRED;
        break;
    default:
        assert(0);
    }

    // Reacquire the external mutex before returning
    e = WaitForSingleObject(_extLock, INFINITE);
    assert(e == WAIT_OBJECT_0);

    return result;
}

Anthony Williams

Sep 20, 2006, 8:51:53 AM
Chris Newbold <cnew...@mathworks.com> writes:

> On Tue, 19 Sep 2006 16:33:56 +0100, Anthony Williams wrote:
>
>> I've been working on implementing condition variables on win32 for the
>> boost thread library (www.boost.org), and I'd like folks here to comment
>> on my algorithm.
>
> I'm still looking over your example, but I'll come back with any comments.
>
> In the mean time, here is a condition-variable implementation which I
> assembled quickly to meet our needs for POSIX semantics on Win32.

Thanks.

> One place where I can see that it is a bit dodgy is with a concurrent
> wait() and signal(): it's possible for the wait() to grab the semaphore
> that was just posted by signal() and sail on through without sleeping.
> This isn't "incorrect" in the sense that no other thread will be released,
> but it is certainly not fair from a scheduling standpoint. I guess it
> amounts to LIFO under these conditions.

For a single signal/wait it's not too bad. The issue is with signalAll --- a
fresh waiter might steal a wake-up intended for one of the original waiters,
which does violate POSIX. That's why I isolate the current waiter list before
notifying any of them.

Alexander Terekhov

Sep 20, 2006, 9:14:39 AM

Anthony Williams wrote:
>
> Alexander Terekhov <tere...@web.de> writes:
>
> > You don't have to use funky stuff for CVs with real queues. Per-thread
> > binary semas for parking/unparking would do just fine.
>
> I agree that binary semas would work fine. I was concerned at the overhead of
> creating and destroying an OS-level sync object (win32 Event or Semaphore) for
> every wait. Was that really what you meant, or did you have something else in
> mind?

I meant per-thread semas "permanently" assigned to threads.

[...]


> Why is a LIFO queue better than a FIFO queue for CVs?

Basically same thing as with a lock and "bias" in favor of the currently
running thread regarding unlock()->lock().

regards,
alexander.

Alexander Terekhov

Sep 20, 2006, 9:16:31 AM

Chris Newbold wrote:
[...]

> something horribly wrong with it, though so far we've never caught it
> violating the POSIX definition.

Try to play

http://sourceware.org/ml/pthreads-win32/2001/msg00015.html
(tennis.cpp/tennisb.cpp)

with it.

regards,
alexander.

Anthony Williams

Sep 26, 2006, 12:19:20 PM
Alexander Terekhov <tere...@web.de> writes:

> Anthony Williams wrote:
>>
>> Alexander Terekhov <tere...@web.de> writes:
>>
>> > You don't have to use funky stuff for CVs with real queues. Per-thread
>> > binary semas for parking/unparking would do just fine.
>>
>> I agree that binary semas would work fine. I was concerned at the overhead of
>> creating and destroying an OS-level sync object (win32 Event or Semaphore) for
>> every wait. Was that really what you meant, or did you have something else in
>> mind?
>
> I meant per-thread semas "permanently" assigned to threads.

Thanks. That's an idea worth considering.

> [...]
>> Why is a LIFO queue better than a FIFO queue for CVs?
>
> Basically same thing as with a lock and "bias" in favor of the currently
> running thread regarding unlock()->lock().

Do you have a reference where I can read more on this? The only
mention I've been able to find was someone saying that LIFO can lead to
starvation, which matches my gut feeling.

Chris Thomasson

Sep 26, 2006, 3:42:18 PM
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:ejtyy5...@yahoo.com...

> Alexander Terekhov <tere...@web.de> writes:
>
>> Anthony Williams wrote:
>>>
>>> Alexander Terekhov <tere...@web.de> writes:
>>>
>>> > You don't have to use funky stuff for CVs with real queues. Per-thread
>>> > binary semas for parking/unparking would do just fine.
>>>
>>> I agree that binary semas would work fine. I was concerned at the
>>> overhead of
>>> creating and destroying an OS-level sync object (win32 Event or
>>> Semaphore) for
>>> every wait. Was that really what you meant, or did you have something
>>> else in
>>> mind?
>>
>> I meant per-thread semas "permanently" assigned to threads.
>
> Thanks. That's an idea worth considering.

I used it here:

http://groups.google.com/group/comp.programming.threads/msg/af00cb5997f95027

per-thread bin-semas. It works fine, but SCHED_OTHER is a problem...


>> [...]
>>> Why is a LIFO queue better than a FIFO queue for CVs?
>>
>> Basically same thing as with a lock and "bias" in favor of the currently
>> running thread regarding unlock()->lock().
>
> Do you have a reference for somewhere I can read more on this. The only
> mention I've been able to find was someone saying that LIFO can lead to
> starvation, which is my gut feel.

Yeah. You can starve under the right conditions... Usually under moderate
periods of load where there could be persistent contention on the lock. A
waiter at the bottom of a LIFO stack could stay there for a while... Of
course, when you are creating your own waitsets, you don't have to choose
LIFO or FIFO and stick to it. You can switch back and forth on the fly. You
could introduce an extra piece of state into your waitsets that can
determine that there is a possibility of starvation, and switch over to FIFO
to "drain the queue", after that you can switch back over to LIFO...
Simple...

;)
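
A toy, single-threaded sketch of that switch-over idea. The `waitset` class, the waiter representation, and the starvation threshold are all invented for illustration; a real condvar would do this bookkeeping under the internal lock and park/unpark real threads.

```cpp
#include <deque>

class waitset {
    std::deque<int> q_;            // front = oldest waiter, back = newest
    int passed_over_ = 0;          // times the oldest waiter has been skipped
    bool draining_ = false;        // true while draining in FIFO order
    static const int kStarveLimit = 3;  // arbitrary threshold for the sketch
public:
    void push(int waiter) { q_.push_back(waiter); }
    bool empty() const { return q_.empty(); }

    // Normally pop LIFO (newest first); once the oldest waiter has been
    // passed over kStarveLimit times, drain the whole queue FIFO, then
    // switch back to LIFO.
    int pop_next() {
        if (draining_) {
            int w = q_.front(); q_.pop_front();       // FIFO: oldest first
            if (q_.empty()) { draining_ = false; passed_over_ = 0; }
            return w;
        }
        if (q_.size() > 1 && ++passed_over_ >= kStarveLimit) {
            draining_ = true;                          // switch over to FIFO
            return pop_next();
        }
        int w = q_.back(); q_.pop_back();              // LIFO: newest first
        return w;
    }
};
```

With five waiters pushed in order 1..5, the pop order comes out 5, 4 (LIFO), then 1, 2, 3 (the FIFO drain), which is the "drain the queue" behavior described above.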


Anyway, I have developed a high-performance solution that doesn't tie a
user's synchronization scheme to SCHED_OTHER. Here is a brief introduction:

http://appcore.home.comcast.net/vzoom/pthread/

The new synchronization primitive I created allows Win95 through Vista to
have heavily fast-pathed and robust process/thread-shared condition
variables... Pretty cool huh?

:)

Any thoughts?


Anthony Williams

Sep 27, 2006, 12:03:00 PM
"Chris Thomasson" <cri...@comcast.net> writes:

> "Anthony Williams" <anthon...@yahoo.com> wrote in message
> news:ejtyy5...@yahoo.com...
>> Alexander Terekhov <tere...@web.de> writes:

>>> I meant per-thread semas "permanently" assigned to threads.
>>
>> Thanks. That's an idea worth considering.
>
> I used it here:
>
> http://groups.google.com/group/comp.programming.threads/msg/af00cb5997f95027

Yes. I knew I'd seen an implementation that did that here at some point.

>>> [...]
>>>> Why is a LIFO queue better than a FIFO queue for CVs?
>>>
>>> Basically same thing as with a lock and "bias" in favor of the currently
>>> running thread regarding unlock()->lock().
>>
>> Do you have a reference for somewhere I can read more on this. The only
>> mention I've been able to find was someone saying that LIFO can lead to
>> starvation, which is my gut feel.
>
> Yeah. You can starve under the right conditions... Usually under moderate
> periods of load where there could be persistent contention on the lock. A
> waiter at the bottom of a LIFO stack could stay there for a while... Of
> course, when you are creating your own waitsets, you don't have to choose
> LIFO or FIFO and stick to it. You can switch back and forth on the fly. You
> could introduce an extra piece of state into your waitsets that can
> determine that there is a possibility of starvation, and switch over to FIFO
> to "drain the queue", after that you can switch back over to LIFO...
> Simple...

Yes, I could do that.

> Anyway, I have developed a high-performance solution that doesn't tie a
> users synchronization scheme to SCHED_OTHER. Here is a brief introduction:
>
> http://appcore.home.comcast.net/vzoom/pthread/
>
> The new synchronization primitive I created allows Win95 through Vista to
> have heavily fast-pathed and robust process/thread-shared condition
> variables... Pretty cool huh?

> Any thoughts?

Sounds neat, but that's worthless without code we can use.

When can you post some code? What licence would it be under?

Chris Thomasson

Sep 27, 2006, 7:09:40 PM
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:1wpxxp...@yahoo.com...

> "Chris Thomasson" <cri...@comcast.net> writes:
>
>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>> news:ejtyy5...@yahoo.com...
>>> Alexander Terekhov <tere...@web.de> writes:
>
>>>> I meant per-thread semas "permanently" assigned to threads.
>>>
>>> Thanks. That's an idea worth considering.
>>
>> I used it here:
>>
>> http://groups.google.com/group/comp.programming.threads/msg/af00cb5997f95027
>
> Yes. I knew I'd seen an implementation that did that here at somepoint.

;)


[...]


>> Yeah. You can starve under the right conditions...

[...]

>> Of course, when you are creating your own waitsets, you don't have to
>> choose
>> LIFO or FIFO and stick to it. You can switch back and forth on the fly.
>> You
>> could introduce an extra piece of state into your waitsets that can
>> determine that there is a possibility of starvation, and switch over to
>> FIFO
>> to "drain the queue", after that you can switch back over to LIFO...
>> Simple...
>
> Yes, I could do that.

I have experimentally used the technique a couple of times.

It sure does seem to solve/address the starvation problem...


>> Anyway, I have developed a high-performance solution that doesn't tie a
>> users synchronization scheme to SCHED_OTHER. Here is a brief
>> introduction:
>>
>> http://appcore.home.comcast.net/vzoom/pthread/
>>
>> The new synchronization primitive I created allows Win95 through Vista to
>> have heavily fast-pathed and robust process/thread-shared condition
>> variables... Pretty cool huh?
>
>> Any thoughts?
>
> Sounds neat, but that's worthless without code we can use.

I hope you can read pseudo-code! Just kidding!

:)


> When can you post some code?

Not sure if I am going to post full blown source code... I was thinking
about including pseudo-code in a technical document that will explain some
of the algorithms in some detail. I have two versions of my library; an
academic version and a commercial version. The commercial version has some
"sensitive" algorithms that I don't want to give out to the public domain
just yet; it's also based on some of my patented techniques...

The academic version will have pseudo-code and technical documents to guide
you through...

Well, I can say the academic version includes a lock-free PDR algorithm I
invented:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/b2f751d5dad6c07b

This invention can get around DWCAS...


> What licence would it be under?

The basic pthread library would be under a non-commercial academic style
license; still working out specifics. The full pthread library would be
under a commercial style license. Of course, the commercial version of my
library has tons of extras, including several full-blown working
versions of my vZOOM patent... and a highly tuned version of the
high-performance memory allocator I invented:

http://groups.google.com/group/comp.arch/browse_frm/thread/24c40d42a04ee855

What do you think of this invention? I intend to make the commercial version
give you the most bang for your buck!

:)


Markus Elfring

Sep 27, 2006, 11:55:20 PM
> I've been working on implementing condition variables on win32 for the boost
> thread library (www.boost.org), and I'd like folks here to comment on my algorithm.

Can the evolving interfaces help in the future?
http://msdn.microsoft.com/library/en-us/dllproc/base/using_condition_variables.asp

See also the article "New Vista concurrency features" by Joe Duffy:
http://www.bluebytesoftware.com/blog/PermaLink,guid,17433c64-f45e-40f7-8772-dedb69ab2190.aspx

Regards,
Markus

Chris Thomasson

Sep 28, 2006, 12:19:56 AM
"Markus Elfring" <Markus....@web.de> wrote in message
news:4o0v9pF...@individual.net...

They are acting like a fast-pathed rw-lock is something new:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5ffb0ed46a4bcb4b

Humm...


Anthony Williams

Sep 28, 2006, 4:26:24 AM
"Chris Thomasson" <cri...@comcast.net> writes:

> "Anthony Williams" <anthon...@yahoo.com> wrote in message
> news:1wpxxp...@yahoo.com...
>> "Chris Thomasson" <cri...@comcast.net> writes:
>>
>
>>> Anyway, I have developed a high-performance solution that doesn't tie a
>>> users synchronization scheme to SCHED_OTHER. Here is a brief
>>> introduction:
>>>
>>> http://appcore.home.comcast.net/vzoom/pthread/
>>>
>

>> What licence would it be under?
>
> The basic pthread library would be under a non-commercial academic style
> license; still working out specifics. The full pthread library would be
> under a commercial style license. Of course, the commercial version of my
> library has tons of extra's; Including a several full blown working
> versions' of my vZOOM patent...

That's no good for boost, then; the code needs to be under a no-strings free
licence, which can be used for commercial and non-commercial use:

See http://www.boost.org/more/license_info.html for background
See http://www.boost.org/LICENSE_1_0.txt for license text

Chris Thomasson

Sep 28, 2006, 9:45:14 PM
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:wt7ov1...@yahoo.com...

So, you have to give your intellectual property away to the world when you
contribute anything to boost?

Humm.... Interesting... I wonder if you have to teach Boost Consulting how
to properly teach their clients exactly how to leverage other people's
inventions against their in-house frameworks...

Wow...


Joe Seigh

Sep 28, 2006, 10:41:32 PM
C++0x will eventually get around to defining the low-level threading
support. Then at some point in the future they will get around to
defining the intermediate APIs to support thread-safe and lock-free
library implementation. PDR should be part of that API. If you
define a PDR API, then you can have multiple implementations of that API,
e.g. GC, RCU, SMR, VZOOM, single-target lock-free reference counting, etc...
Boost will have to be satisfied with whatever PDR schemes are in the public
domain. And you'll possibly have commercial high performance analogs of
the Boost library api. For money of course. :)


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Anthony Williams

Sep 29, 2006, 3:23:06 AM
"Chris Thomasson" <cri...@comcast.net> writes:

> "Anthony Williams" <anthon...@yahoo.com> wrote in message
> news:wt7ov1...@yahoo.com...
>> "Chris Thomasson" <cri...@comcast.net> writes:
>>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>>> news:1wpxxp...@yahoo.com...
>>>> "Chris Thomasson" <cri...@comcast.net> writes:
>>> The basic pthread library would be under a non-commercial academic style
>>> license; still working out specifics. The full pthread library would be
>>> under a commercial style license. Of course, the commercial version of my
>>> library has tons of extra's; Including a several full blown working
>>> versions' of my vZOOM patent...
>>
>> That's no good for boost, then; the code needs to be under a no-strings
>> free
>> licence, which can be used for commercial and non-commercial use:
>>
>> See http://www.boost.org/more/license_info.html for background
>> See http://www.boost.org/LICENSE_1_0.txt for license text
>
> So, you have to give your intellectual property away to the world when you
> contribute anything to boost?

Yes. Same as for many open source projects. Contributing to the boost project
is done because you think other people will find your contribution useful, and
you want to share with the world. The fact that people are using your code
should be its own reward.

Boost is a highly respected collection of C++ libraries, many of which are
being considered (or already accepted) into the next C++ standard.

> Humm.... Interesting... I wonder if you have to teach Boost Consulting how
> to properly teach their clients exactly how to leverage other peoples
> inventions against their in-house frameworks'...

Boost Consulting is a private company that happens to provide many resources
to the boost project, including a couple of the most active contributors, a
bit like the way Cygnus contributes to other open source projects.

Chris Thomasson

Sep 29, 2006, 6:53:12 PM
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:lko387...@yahoo.com...

> "Chris Thomasson" <cri...@comcast.net> writes:
>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>> news:wt7ov1...@yahoo.com...
>>> "Chris Thomasson" <cri...@comcast.net> writes:
>>>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>>>> news:1wpxxp...@yahoo.com...
>>>>> "Chris Thomasson" <cri...@comcast.net> writes:

[...]


>> So, you have to give your intellectual property away to the world when
>> you
>> contribute anything to boost?
>
> Yes. Same as for many open source projects.

Oh well. I have some stuff that I don't want to give away just yet... Sorry!

:(


> Contributing to the boost project
> is done because you think other people will find your contribution useful,
> and
> you want to share with the world. The fact that people are using your code
> should be its own reward.

Humm... I have contributed tons of algorithms, techniques and a library to
this group:

http://appcore.home.comcast.net/


I will "donate" one of those contributions to Boost:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/4e8717d3bcdedfe9


This will give Boost the ability to have "true" atomically thread-safe
"mostly" lock-free reference-counted pointers; something that it's currently
lacking. shared_ptr doesn't cut it. So, you can keep shared_ptr. We make a
new pointer class (e.g. name-goes-here_ptr) ;).

These pointers can be compare-and-swapped or exchanged (e.g., cmpxchg, xchg
or cas, swap for SPARC). You can't do this with shared_ptr...


What do you think of this?


BTW, I have never posted any code for anything to do with boost. What is the
easiest way to go about doing that?


> Boost is a highly respected collection of C++ libraries, many of which are
> being considered (or already accepted) into the next C++ standard.
>
>> Humm.... Interesting... I wonder if you have to teach Boost Consulting
>> how
>> to properly teach their clients exactly how to leverage other peoples
>> inventions against their in-house frameworks'...
>
> Boost Consulting is a private company that happens to provide many
> resources
> to the boost project, including a couple of the most active contributors,
> a
> bit like the way Cygnus contributes to other open source projects.

Seriously here, would I have to teach Boost Consulting how to use the stuff
I post to Boost? I believe they would have to learn about it all on their
own... They're private...


I am thinking about donating one more thing:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e0c011baf08844c4


Once you augment this with a "backoff-and-retry" algorithm, it will provide
Boost with a lock-based software transactional memory (STM) library. It will
also give it a deadlock-free two-phased locking algorithm...


Chris Thomasson
Sep 29, 2006, 7:26:38 PM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:JsGdnXntf82lGoHY...@comcast.com...

> Chris Thomasson wrote:
>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>> news:wt7ov1...@yahoo.com...
>>>"Chris Thomasson" <cri...@comcast.net> writes:
>>>>"Anthony Williams" <anthon...@yahoo.com> wrote in message

[...]


> C++0x will eventually get around to defining the low level threading
> support.

Hopefully. Before I started on threading or any atomic operations / memory
barrier API, I would go ahead and address compiler and hardware reordering
first. I have given them some quick suggestions:


http://groups.google.com/group/comp.lang.c++.moderated/msg/3cc13067c406e2fe

http://groups.google.com/group/comp.lang.c++.moderated/msg/027d27df6b617f2d

http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/c471250c03498781


std::atomic should behave as if it were an unknown, externally assembled
function, so that you get a compiler barrier. I would copy the memory barrier
functionality from SPARC to solve the hardware ordering. Simply embed
the barriers in the atomic operations:


#define mbstore_acquire std::storeload | std::storestore

#define mbload_acquire std::loadstore | std::loadload

#define mbload_acquire_depends mbload_acquire | std::depends


std::atomic<int32_t>::cas<mbstore_acquire>(...);

std::atomic<int32_t>::load<mbload_acquire>(...);

std::atomic<int32_t>::load<mbload_acquire_depends>(...);

That should pretty much solve everything? Humm...

;)
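For concreteness, that template-parameter idea could be sketched like so in
C++. The flag names follow the sketch above; the mapping onto C++
acquire/release-style orderings is my own guess at the intent, nothing
official:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical SPARC-style membar flags, usable as a template-parameter
// bitmask on atomic operations. Names follow the sketch above.
enum membar : unsigned {
    loadload   = 1u << 0,
    loadstore  = 1u << 1,
    storeload  = 1u << 2,
    storestore = 1u << 3,
    depends    = 1u << 4   // consume-style dependency ordering
};

// Crude approximation of the intent: store-side bits imply release,
// load-side bits imply acquire, both together imply fully ordered.
constexpr std::memory_order order_for(unsigned mask) {
    return (mask & (storeload | storestore)) && (mask & (loadload | loadstore))
               ? std::memory_order_seq_cst
           : (mask & (storeload | storestore)) ? std::memory_order_release
           : (mask & (loadload | loadstore))   ? std::memory_order_acquire
                                               : std::memory_order_relaxed;
}

// Atomic cell whose operations take the barrier mask as a template argument,
// in the spirit of std::atomic<int32_t>::cas<mbstore_acquire>(...).
template <typename T>
struct membar_atomic {
    std::atomic<T> cell;

    template <unsigned Mask>
    T load() const {
        return cell.load(order_for(Mask));
    }

    template <unsigned Mask>
    bool cas(T& expected, T desired) {
        return cell.compare_exchange_strong(expected, desired, order_for(Mask));
    }
};
```

The point is only that the barrier choice is fixed at compile time per call
site, so an implementation could emit exactly the membar instructions the
mask asks for.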


> Then at some point in the future they will get around to
> defining the intermediate api's to support thread-safe and lock-free
> library implementation.

Yes. They can't really have any built-in lock-free library in the std::
namespace... Because of some patents...


Anyway, I would not expect C/C++ to have any lock-free libraries. I like
the low-level nature of C/C++, so I would fully support an atomic operations
and memory barrier API, and that's about it...


> PDR should be part of that api.

Indeed!

:)


> If you
> define a PDR api, then you can have multiple implementations of that api,
> e.g. GC, RCU, SMR, VZOOM, single-target lock-free reference counting,
> etc...

Very true. Yes... All of those techniques could naturally support a common
PDR api... In fact, the most current version of vZOOM uses something like
the following:


/* configure PDR behavior */
vzpdr_cfg(...)


/* enter/increment a PDR collected region; this is recursive. */
vzpdr_inc(...)


/* leave/decrement a PDR collected region */
vzpdr_dec(...)


/* defer objects for later processing */
vzpdr_defer(...)


/* waits for certain PDR events (e.g., sync-epochs) */
vzpdr_wait(...)

Of course this API has no support for reference counting of the objects that
are actually protected by the PDR. As you know, one can extend the lifetime
of an object across multiple synchronization epochs by reference counting
it. I think that support for PDR-based reference counting should go into a
separate API...
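To make the calling pattern concrete, here is a toy, single-threaded
stand-in with the same shape. The vzpdr_* bodies below are placeholders
made up to model the enter/leave/defer protocol; they bear no relation to
vZOOM's real internals:

```cpp
#include <cassert>
#include <cstdlib>

// Toy stand-ins for a vzpdr_*-shaped API. Real PDR implementations
// (RCU, SMR, VZOOM, ...) do far more; these bodies only model the
// enter/leave/defer calling pattern in a single thread.
static int pdr_depth;          // recursion depth of the collected region
static void *deferred[16];     // objects awaiting reclamation
static int ndeferred;

void vzpdr_inc() { ++pdr_depth; }    // enter/increment a collected region

void vzpdr_dec() {                   // leave/decrement a collected region
    if (--pdr_depth == 0) {
        // "sync epoch": no reader is inside, reclaim deferred objects
        for (int i = 0; i < ndeferred; ++i)
            std::free(deferred[i]);
        ndeferred = 0;
    }
}

void vzpdr_defer(void *obj) {        // defer an object for later processing
    deferred[ndeferred++] = obj;
}
```

A reader brackets its traversal with vzpdr_inc()/vzpdr_dec(); a node handed
to vzpdr_defer() is only reclaimed once the outermost region is exited.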


I think we can design a PDR standard here on c.p.t! You already started with
this post:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/82376b7295e6ce1a


Maybe I will post something entitled:

[RFC] The C.P.T PDR Standard...


And we should all start a company called C.P.T Consulting and claim
everything posted here as Prior Art...

lol!

;)


We will instantly have a company that has some of the best multi-threaded
programmers around...


> Boost will have to be satisfied with whatever PDR schemes are in the
> public
> domain. And you'll possibly have commercial high performance analogs of
> the Boost library api. For money of course. :)

Yes... Lots of $$$!

:)


Chris Thomasson
Sep 29, 2006, 11:01:24 PM
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:JsGdnXntf82lGoHY...@comcast.com...
>> Chris Thomasson wrote:
>>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>>> news:wt7ov1...@yahoo.com...
>>>>"Chris Thomasson" <cri...@comcast.net> writes:
>>>>>"Anthony Williams" <anthon...@yahoo.com> wrote in message
>> PDR should be part of that api.
>
> Indeed!
>
> :)


PDR seems like it's low-level enough to be put into C/C++... It's a useful
tool for building "slightly higher-level" lock-free libraries...


Martin James
Sep 30, 2006, 12:10:41 AM
Alexander Terekhov wrote:
> Anthony Williams wrote:
>> Alexander Terekhov <tere...@web.de> writes:
>>
>>> You don't have to use funky stuff for CVs with real queues. Per-thread
>>> binary semas for parking/unparking would do just fine.
>> I agree that binary semas would work fine. I was concerned at the overhead of
>> creating and destroying an OS-level sync object (win32 Event or Semaphore) for
>> every wait. Was that really what you meant, or did you have something else in
>> mind?
>
> I meant per-thread semas "permanently" assigned to threads.
>

Why not keep a queue pool of synchro-objects inside the CV/whatever
class? If a thread needs to wait, it can dequeue an event and wait on
it. The event can then be put on a LIFO/FIFO queue of events. A new
synchro-object, (ie kernel call), would then only be required when the
number of waiting threads exceeds the peak count of previous-waiters.

This would avoid threads that wish to use the PC-queue/CV/whatever
having to supply their own synchro-object, and so improve flexibility and
encapsulation.

I use this approach for P-C queues - it works OK.
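A minimal sketch of that pooling scheme, using a portable condvar-based
auto-reset event in place of a Win32 Event (all names here are made up for
illustration):

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// A minimal auto-reset "event" (condvar + flag) standing in for a Win32
// Event object.
struct event {
    std::mutex m;
    std::condition_variable cv;
    bool signaled = false;

    void set() {
        std::lock_guard<std::mutex> lk(m);
        signaled = true;
        cv.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return signaled; });
        signaled = false;  // auto-reset so the event can be pooled and reused
    }
};

// LIFO pool of events: a new event (a kernel object, in the Win32 analogy)
// is only constructed when the number of concurrent waiters exceeds the
// previous peak; otherwise a previously used one is dequeued.
class event_pool {
    std::mutex m;
    std::vector<std::unique_ptr<event>> free_list;

public:
    std::size_t created = 0;  // how many events were ever constructed

    std::unique_ptr<event> get() {
        std::lock_guard<std::mutex> lk(m);
        if (free_list.empty()) {
            ++created;
            return std::unique_ptr<event>(new event);
        }
        auto ev = std::move(free_list.back());  // LIFO reuse
        free_list.pop_back();
        return ev;
    }
    void put(std::unique_ptr<event> ev) {
        std::lock_guard<std::mutex> lk(m);
        free_list.push_back(std::move(ev));
    }
};
```

A waiter calls get(), waits on the event, then returns it with put(); in
steady state no new synchro-objects are created at all.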

Rgds,
Martin

Joe Seigh
Sep 30, 2006, 7:56:09 AM

Although, more likely than not, given who is working on C++0x, it will be a
GC interface. Even there you may not see a standard lock-free library
like Java's java.util.concurrent. If they do something, they'll reinvent
it on their own and not take advantage of Java having worked out some of
the concurrency issues.

Anthony Williams
Oct 3, 2006, 5:17:15 AM
"Chris Thomasson" <cri...@comcast.net> writes:

> "Anthony Williams" <anthon...@yahoo.com> wrote in message
> news:lko387...@yahoo.com...
>> "Chris Thomasson" <cri...@comcast.net> writes:
>>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>>> news:wt7ov1...@yahoo.com...
>>>> "Chris Thomasson" <cri...@comcast.net> writes:
>> Contributing to the boost project
>> is done because you think other people will find your contribution useful,
>> and
>> you want to share with the world. The fact that people are using your code
>> should be its own reward.
>
> Humm... I have contributed tons of algorithms, techniques and a library to
> this group:
>
> http://appcore.home.comcast.net/

Yes, and it's much appreciated.

> I will "donate" one of those contributions to Boost:
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/4e8717d3bcdedfe9
>
> this will give Boost the ability to have "true" atomically thread-safe
> "mostly" lock-free reference counted pointers; something that its currently
> lacking. shared_ptr doesn't cut it. So, you can keep shared_ptr. We make a
> new pointer class (e.g. name-goes-here_ptr) ;).
>
> These pointers can be compare-and-swapped or exchanged (e.g., cmpxchg, xchg
> or cas, swap for SPARC). You can't do this with shared_ptr...
>
> What do you think of this?

Does it require different semantics, or could it be used as the backend for
shared_ptr (i.e. your code could be used as the basis for an improved
shared_ptr implementation)?

> BTW, I have never posted any code for anything to do with boost. What is the
> easiest way to go about doing that?

Post a proposal to the boost developers list. See
http://lists.boost.org/mailman/listinfo.cgi/boost for subscription details, or
news://news.gmane.org/gmane.comp.lib.boost.devel for a newsgroup interface.

> Seriously here, would I have to teach Boost Consulting how to use the stuff
> I post to Boost? I believe they would have to learn about it all on their
> own... Their private...

If you make a submission, you would have to explain to the reviewers of your
code how it is intended to be used, and provide documentation.

> I am thinking about donating one more thing:
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e0c011baf08844c4
>
>
> Once you augment this with a "backoff-and-retry" algorithm, it will provide
> Boost with a lock-based software transactional memory (STM) library. It will
> also give it a deadlock-free two-phased locking algorithm...

Sounds good.

Chris Thomasson
Oct 7, 2006, 2:47:55 AM
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:zmcdg3...@yahoo.com...

> "Chris Thomasson" <cri...@comcast.net> writes:
>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>> news:lko387...@yahoo.com...
>>> "Chris Thomasson" <cri...@comcast.net> writes:
>>>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>>>> news:wt7ov1...@yahoo.com...
>>>>> "Chris Thomasson" <cri...@comcast.net> writes:

[...]

>>> Contributing to the boost project
>>> is done because you think other people will find your contribution
>>> useful,
>>> and
>>> you want to share with the world. The fact that people are using your
>>> code
>>> should be its own reward.
>>
>> Humm... I have contributed tons of algorithms, techniques and a library
>> to
>> this group:
>>
>> http://appcore.home.comcast.net/
>
> Yes, and it's much appreciated.

Thank You! :^)


>> I will "donate" one of those contributions to Boost:
>>
>> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/4e8717d3bcdedfe9

Here is an IA-32 based prototype of my idea; it's currently in "pre-alpha"
status:

http://appcore.home.comcast.net/vzoom/refcount/


[...]

> Does it require different semantics, or could it be used as the backend
> for
> shared_ptr (i.e. your code could be used as the basis for an improved
> shared_ptr implementation)?

I would keep things separate, for now... I am not skilled enough with the
current shared_ptr implementation. I have had some discussions with Peter
Dimov in the past. We were discussing the "classic" race-condition that
exists in shared_ptr. shared_ptr falls victim to this infamous
race-condition simply because it's not truly based on any sort of
atomically thread-safe counting implementation. FWIW, here are some links
that briefly describe the "specific" race-condition:

http://groups.google.com/group/comp.programming.threads/msg/e4058f431de68e45

http://groups.google.com/group/comp.programming.threads/msg/9be1ea81f6020d5b

http://groups.google.com/group/comp.programming.threads/msg/93983d56839ffb45

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/3c763698da537d8f


My refcount library can provide a solution to the "nasty" race-condition!

;)
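For anyone following along, the shape of the race is: copying a shared_ptr
*object* in one thread while another thread reassigns that same object is a
data race; the copy can read the control block, lose the CPU, and bump a
count the writer has already dropped. Chris's refcount library aims at a
lock-free fix; the sketch below is only the simple mutex-based workaround,
with names made up for illustration:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

// One shared_ptr object shared between threads. Accessing g_ptr directly
// from several threads (one copying, one reassigning) is the race in
// question; guarding every access with a mutex is the lock-based fix.
std::shared_ptr<int> g_ptr = std::make_shared<int>(0);
std::mutex g_ptr_mutex;

std::shared_ptr<int> safe_copy() {
    std::lock_guard<std::mutex> lk(g_ptr_mutex);
    return g_ptr;             // refcount is bumped while the lock is held
}

void safe_replace(std::shared_ptr<int> p) {
    std::lock_guard<std::mutex> lk(g_ptr_mutex);
    g_ptr = std::move(p);     // the old object is released under the lock
}
```

Replacing g_ptr_mutex with an atomically thread-safe counted pointer is
exactly what would make the copy/replace pair lock-free.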


>> BTW, I have never posted any code for anything to do with boost. What is
>> the
>> easiest way to go about doing that?
>
> Post a proposal to the boost developers list. See
> http://lists.boost.org/mailman/listinfo.cgi/boost for subscription
> details, or
> news://news.gmane.org/gmane.comp.lib.boost.devel for a newsgroup
> interface.

Humm... I "think" that I am a "fairly" decent C and Assembly programmer; not
too sure about C++! I created experimental in-house lock-free libraries
a couple of years ago; however, they seemed to have little "C-isms"
scattered around here and there.

;^(...


Once I finalize the user api that refcount exports, perhaps you can help me
review some of my C++ wrappers for my refcount C API?

lol!


I am thinking about wrapping this low-level C reference counting api up with
a smart pointer, or something... Humm...

;^)


One quick question: how would the Boost reviewers generally react to a
C++ synchronization library submission that has implementation details which
are largely based on strict C and Assembly Language?

I was thinking that as long as I declare a C++ API for my submission, and stick
to it, then it should not really matter what the implementation details
actually look like. The details could be based entirely on POSIX mutexes, or
be specialized for the architecture (e.g., my refcount wrt IA-32 and SPARC).
I need to think through some of the details and finish up all of my tedious
bug hunting activities; so far so good!

;)


[...]

>> Seriously here, would I have to teach Boost Consulting how to use the
>> stuff
>> I post to Boost? I believe they would have to learn about it all on their
>> own... Their private...
>
> If you make a submission, you would have to explain to the reviewers of
> your
> code how it is intended to be used, and provide documentation.

So Boost Consulting can make $$$ when it teaches one of its "private"
clients how to make efficient use of "anything" in the boost library? I am
committed to providing Boost with my refcount library, but, this aspect kind
of makes me wary about posting some of my more advanced techniques to
Boost... Humm...


>> I am thinking about donating one more thing:
>>
>> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/e0c011baf08844c4
>>
>>
>> Once you augment this with a "backoff-and-retry" algorithm, it will
>> provide
>> Boost with a lock-based software transactional memory (STM) library. It
>> will
>> also give it a deadlock-free two-phased locking algorithm...
>
> Sounds good.

I was thinking that this could work out fine... I would have to implement it
fully, and augment it with the proper generalized form of two-phased
locking... I could even use the refcount library to help this multi_mutex
thing along... Humm...


Chris Thomasson
Oct 24, 2006, 7:36:52 AM
"Anthony Williams" <anthon...@yahoo.com> wrote in message
news:64fizx...@yahoo.com...

> "Chris Thomasson" <cri...@comcast.net> writes:
>> "Chris Thomasson" <cri...@comcast.net> wrote in message
>> news:XOadnTwGRr4oTo3Y...@comcast.com...
>>> "Anthony Williams" <anthon...@yahoo.com> wrote in message
>>> news:8xkf3m...@yahoo.com...

[...]

>> Here is another one of my algorithms that might be of interest to you:
>
> These look really similar to my implementation, except that they use an
> event
> for each waiting thread in order to signal the wakeups.
>
>> IIRC, once you apply the fixes that I outlined in that thread, the
>> implementation should work...

[...]

> Yes, just like yours; it's the same algorithm.

Except that mine has the ability to fast-path the signal/broadcast function
when there are no waiters (e.g., it only performs a CAS on the fast-path*).
And I don't have to create a new wait node every time a thread needs to
wait; I link the thread structure directly...

Other than that, they are the same.

* - this condition variable can actually avoid the atomic operation on the
fast-paths of signalers/broadcasters:

http://groups.google.com/group/comp.programming.threads/msg/b9e80d0b4de447ea


This is a high-performance condition variable implementation indeed. You can
even use it to speed up virtually any suboptimal mutex and/or condition
variable out there:

http://groups.google.com/group/comp.programming.threads/msg/ac7dabf396a4d3b9


Humm...
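The no-waiters fast path can be approximated portably like this. It is a
rough sketch of the idea only, not the Win32 implementation discussed in
this thread: signalers read a waiter count first and skip the notification
machinery entirely when it is zero.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Condition wrapper with a "no waiters => cheap signal" fast path.
// Correctness depends on both wait() and signal() being called with the
// associated mutex held, so the waiter count cannot change between the
// check and the notify.
class fastpath_cond {
    std::condition_variable cv;
    std::atomic<int> waiters{0};

public:
    // Call with lk holding the mutex associated with this condition.
    void wait(std::unique_lock<std::mutex>& lk, bool& predicate) {
        waiters.fetch_add(1, std::memory_order_relaxed);
        cv.wait(lk, [&] { return predicate; });
        waiters.fetch_sub(1, std::memory_order_relaxed);
    }
    // Call with the same mutex held.
    void signal() {
        if (waiters.load(std::memory_order_relaxed) == 0)
            return;  // fast path: nobody is waiting, skip the notify
        cv.notify_one();
    }
};
```

The real algorithm does this with a single CAS instead of a counter plus a
lock, but the structural point is the same: a signal with no waiters never
touches the slow-path machinery.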

