
Why does Microsoft never document threading right?


David Schwartz

Sep 29, 2006, 5:03:51 AM

I'm sorry to vent, but here's what has me irritated right now:
http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx

"A critical section is a lightweight mechanism for allowing only one
thread at a time to execute a given piece of code. Critical sections
are typically used when modifying global data such as collection
classes. Unlike events, mutexes, and semaphores, which are also used
for multithreaded synchronization, critical sections don't always
perform an expensive control transfer to kernel mode. As you'll see
later, acquiring an unheld critical section requires, in effect, just a
few memory modifications and is very quick. Only if you try to acquire
an already-held critical section does it jump into kernel mode. The
downside to this lightweight behavior is that critical sections can
only be used to synchronize threads within the same process."

The article goes on to give internal details of how the
CRITICAL_SECTION structure is implemented on Windows.

Could someone please explain to the authors of this article that the
Windows CRITICAL_SECTION structure is not a critical section but is in
fact a mutex.

For example, if you have *two* instances of the same class, each may
have their own 'CRITICAL_SECTION' object to protect the data in the
class. Guess what, the exact same code that runs with the critical
section acquired can run concurrently in two threads, one holding each
object's critical section.

DS

adeb...@club-internet.fr

Sep 29, 2006, 9:03:14 AM

David Schwartz wrote:
> I'm sorry to vent, but here's what has me irritated right now:
> http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx
>
> "A critical section is a lightweight mechanism for allowing only one
> thread at a time to execute a given piece of code. Critical sections
> are typically used when modifying global data such as collection
> classes. Unlike events, mutexes, and semaphores, which are also used
> for multithreaded synchronization, critical sections don't always
> perform an expensive control transfer to kernel mode. As you'll see
> later, acquiring an unheld critical section requires, in effect, just a
> few memory modifications and is very quick. Only if you try to acquire
> an already-held critical section does it jump into kernel mode. The
> downside to this lightweight behavior is that critical sections can
> only be used to synchronize threads within the same process."

What's wrong with this description?

> The article goes on to give internal details in how the
> CRITICAL_SECTION structure is implemented internally on Windows.
>
> Could someone please explain to the authors of this article the the
> Windows CRITICAL_SECTION structure is not a critical section but is in
> fact a mutex.

It is *implemented* using a mutex, but it is a critical section....
Anyway, mutexes and critical sections have very similar semantics, but
different performance characteristics (a mutex can be used between
processes, but a critsec is generally faster... exactly for the reason
explained in the article).

>
> For example, if you have *two* instances of the same class, each may
> have their own 'CRITICAL_SECTION' object to protect the data in the
> class. Guess what, the exact same code that runs with the critical
> section acquired can run concurrently in two threads, one holding each
> object's critical section.

Yes, and so what? If the 2 instances of the same class each had their
own mutex, the behaviour would be the same...

ONE critical section protects ONE set of resources (or one portion of
code). If you've got 2 different critical sections, that's another
story.... Anyway, this is exactly the same thing for mutexes.

Arnaud

Joe Seigh

Sep 29, 2006, 10:59:55 AM
adeb...@club-internet.fr wrote:
> David Schwartz wrote:
>
>>I'm sorry to vent, but here's what has me irritated right now:
>>http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx
>>
>>"A critical section is a lightweight mechanism for allowing only one
>>thread at a time to execute a given piece of code.

>

> What's wrong with this description?
>
>

Only if all the threads in question have locked the same instance of
a critical section. Anyway, code isn't the issue; shared data is.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Friesen

Sep 29, 2006, 11:41:20 AM
David Schwartz wrote:
> I'm sorry to vent, but here's what has me irritated right now:
> http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx
>
> "... As you'll see

> later, acquiring an unheld critical section requires, in effect, just a
> few memory modifications and is very quick. Only if you try to acquire
> an already-held critical section does it jump into kernel mode."

Interesting... this sounds exactly like futexes on Linux. I wonder which
came first.

Chris

John Hickin

Sep 29, 2006, 11:57:20 AM

"Chris Friesen" <cbf...@mail.usask.ca> wrote in message
news:12hqfl0...@corp.supernews.com...

It may not much matter if one works and the other doesn't :-)

Johnson Hart made an empirical study of CRITICAL_SECTION and found that it
degenerated very badly in performance when there was significant contention
on multi-processor machines.

Regards, John.


Joe Seigh

Sep 29, 2006, 12:18:00 PM

Fast pathing has been around forever.

Chris Thomasson

Sep 29, 2006, 3:59:38 PM
"John Hickin" <hic...@nortelnetworks.com> wrote in message
news:efjfp0$pgq$1...@zcars129.ca.nortel.com...

> "Chris Friesen" <cbf...@mail.usask.ca> wrote in message
> news:12hqfl0...@corp.supernews.com...
>> David Schwartz wrote:

[...]


> It may not much matter if one works and the other doesn't :-)
>
> Johnson Hart made an empirical study of CRITICAL_SECTION and found that it
> degenerated very badly in performance when there was significant
> contention
> on multi-processor machines.

Yes. They used to hand off ownership of the mutex across context-switches:


http://groups.google.com/group/microsoft.public.win32.programmer.kernel/msg/525a1a8ac1a46f0f?hl=en


I don't think that they do this anymore...


Chris Thomasson

Sep 29, 2006, 4:01:05 PM

David Schwartz

Sep 29, 2006, 5:47:46 PM

adeb...@club-internet.fr wrote:

> > "A critical section is a lightweight mechanism for allowing only one
> > thread at a time to execute a given piece of code. Critical sections
> > are typically used when modifying global data such as collection
> > classes. Unlike events, mutexes, and semaphores, which are also used
> > for multithreaded synchronization, critical sections don't always
> > perform an expensive control transfer to kernel mode. As you'll see
> > later, acquiring an unheld critical section requires, in effect, just a
> > few memory modifications and is very quick. Only if you try to acquire
> > an already-held critical section does it jump into kernel mode. The
> > downside to this lightweight behavior is that critical sections can
> > only be used to synchronize threads within the same process."

> What's wrong with this description?

It massively confuses "critical sections", a computer science term that
refers to code that can only be run by one thread at a time, with the
Windows "CRITICAL_SECTION" object, which is a fast mutex (and has
nothing to do with protecting code but protects data).

> > The article goes on to give internal details in how the
> > CRITICAL_SECTION structure is implemented internally on Windows.

> > Could someone please explain to the authors of this article the the
> > Windows CRITICAL_SECTION structure is not a critical section but is in
> > fact a mutex.

> It is *implemented* using a mutex, but it is a critical section....

No, it is a mutex. If it was a critical section, then it would do what
the article says critical sections do, allow only one thread at a time
to run a particular section of code. Instead it does what mutexes do,
which is prevent two threads from accessing the same data, whether
they're running the same code or different code.

> Anyway, mutexes and critical sections have very similar semantics, but
> different performances (a mutex can be used between processes, but a
> critsec is genreally faster... exactly for the reason explained in the
> article).

Yes, if by "mutex" you mean the Windows structure named "Mutex" and by
"critical section" you mean the Windows structure named
"CRITICAL_SECTION". But this has nothing to do with actual critical
sections which allow only one thread at a time to access a particular
section of code.

The article sometimes means a critical section, it sometimes means the
CRITICAL_SECTION class, and sometimes means both at the same time in
different parts of the same sentence. You'll sometimes see sentences
like:

A critical section does (something true about a critical section, but
false about the Windows CRITICAL_SECTION object) and (something true
about the CRITICAL_SECTION object but false about critical sections in
general).

The article is a *disaster* and feeds into one of the more serious
multithreading errors that people make -- confusing the protection of
data with concurrent execution of the same code.

> > For example, if you have *two* instances of the same class, each may
> > have their own 'CRITICAL_SECTION' object to protect the data in the
> > class. Guess what, the exact same code that runs with the critical
> > section acquired can run concurrently in two threads, one holding each
> > object's critical section.

> Yes, and so what? If the 2 instances of the same class each had their
> own mutex, the behaviour would be the same...

Right, so what the article says is false: the Windows CRITICAL_SECTION
object does not allow only one thread at a time to execute the same code.
It protects data, not code.

> ONE critical section protect ONE set of ressources (or one portion of
> code). If you've got 2 different critical sections, that's another
> story.... Anyway, this is exactly the same thing for mutexes.

Right, that's why the article is totally wrong. Windows
CRITICAL_SECTION objects have nothing whatsoever to do with ensuring
only one thread at a time executes some particular section of code.

DS

David Schwartz

Sep 29, 2006, 5:52:41 PM

Chris Friesen wrote:

> David Schwartz wrote:

It's a superficial similarity. Futexes are really not at all
(internally) like Windows critical sections. That's too bad, because
you can use the Linux futex internals to create fast and efficient
condition variables and reader/writer locks, and that's very hard to do
on Windows.

DS

Chris Thomasson

Sep 29, 2006, 6:16:57 PM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1159566761.7...@i42g2000cwa.googlegroups.com...

> Chris Friesen wrote:
>> David Schwartz wrote:
>
>> > I'm sorry to vent, but here's what has me irritated right now:
>> > http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx
>
>> > "... As you'll see
>> > later, acquiring an unheld critical section requires, in effect, just a
>> > few memory modifications and is very quick. Only if you try to acquire
>> > an already-held critical section does it jump into kernel mode."
>
>> Interesting...this sounds exactly like futexes on linux. I wonder which
>> came first.
>
> It's a superficial similarity. Futexes are really not at all
> (internally) like Windows critical sections.

Remember to keep in mind that futexes are tricky:

http://people.redhat.com/~drepper/futex.pdf

;)


> That's too bad, because
> you can use the Linux futex internals to create fast and efficient
> condition variables

Curious... Are you using Joe Seigh's futex-based eventcount from
atomic-ptr-plus? Or have you invented something else?


> and reader/writer locks,

Fast-pathed rw-locks are fairly easy to create on Windows:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/5ffb0ed46a4bcb4b

You can use semaphores for the slow paths...


>and that's very hard to do on Windows.

Not really. I think I have to disagree here...

FWIW, I have a native eventcount implementation for Windows that allows me
to implement heavily fast-pathed synchronization primitives:


http://appcore.home.comcast.net/vzoom/pthread/

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380

http://appcore.home.comcast.net/appcore/src/ac_eventcount_algo1_c.html

http://appcore.home.comcast.net/appcore/include/ac_eventcount_algo1_h.html


My native Windows eventcount implementation even has atomic-operation-free
fast-paths... That is about as high-performance as you are going to get wrt
synchronization primitive implementations...

Any thoughts?


David Hopwood

Sep 30, 2006, 3:42:53 PM
David Schwartz wrote:
> I'm sorry to vent, but here's what has me irritated right now:
> http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx

This is not platform documentation. It's a magazine article.

(I don't disagree that similar criticisms apply to some of the actual
documentation.)

--
David Hopwood <david.nosp...@blueyonder.co.uk>

Chris Friesen

Sep 30, 2006, 11:04:49 PM
Chris Thomasson wrote:
> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:1159566761.7...@i42g2000cwa.googlegroups.com...

>>It's a superficial similarity. Futexes are really not at all


>>(internally) like Windows critical sections.
>
>
> Remember to keep in mind that. Futexs are tricky:
>
> http://people.redhat.com/~drepper/futex.pdf
>
> ;)

Yeah, naked futexes are a pain in the butt. Which is why it's nice that
with NPTL the pthreads reader-writer locks and condition variables are
implemented internally using futexes, so that the end-user doesn't need
to deal with them.

Chris

Joe Seigh

Oct 1, 2006, 8:25:05 AM


Well, they're a low-level interface, so they're not as easy to use.
Still, they're a lot trickier than they have to be for implementing
things at that level.

Christopher Layne

Oct 2, 2006, 4:31:45 AM
Chris Thomasson wrote:
> My native windows eventcount implementation even has atomic operation free
> fast-paths... That is about as high-performance as you are going to get wrt
> synchronization primitive implementations'...
>
> Any thoughts?

Sure, is there anything you *don't* have?

Joe Seigh

Oct 2, 2006, 8:34:56 AM


From the SCOOL Panel session "Are locks dead?" from about
a year ago, a quote by Satnam Singh of Microsoft

Non-blocking code:
we’ve tried it: it’s mind bending!

What would you do if you developed a "mind bending" technique?
Just not say anything about it? I know, c.p.t. is just filled
with people who are overly modest about their accomplishments. :)

The real problem with lock-free is that it's about scalability.
Scalability means large programs for the most part. So it's
hard for someone to demonstrate that without writing something
huge like a database or something. Written an enterprise class
database lately? It took you more than a weekend, didn't it?

Michael K. O'Neill

Oct 2, 2006, 2:08:12 PM

"David Schwartz" <dav...@webmaster.com> wrote in message
news:1159566466.2...@i42g2000cwa.googlegroups.com...

>
> Right, that's why the article is totally wrong. Windows
> CRITICAL_SECTION objects have nothing whatsoever to do with ensuring
> only one thread at a time executes some particular section of code.
>
> DS
>

I think that this statement is overly broad and is therefore subject to
criticism on the basis that it's not always true.

Of course you are correct that Microsoft's CRITICAL_SECTIONs are all about
protecting data. But consider the following very simple example, which is
executed identically by multiple threads to add a new handle to a
vector of open handles:

EnterCriticalSection( &g_CriticalSection );
g_Handles.push_back( newHandle );
LeaveCriticalSection( &g_CriticalSection );

Sure, the main point here is to protect shared data. But it's also true
that the critical section has the effect of "ensuring only one thread at a
time executes some particular section of code", which here is the
"push_back" code.

OTOH, I think that the article's focus on protection of code execution,
rather than on the protection of shared data, is misleading to those
learning about multithreading, and in the long run will cause them needless
confusion.

Mike


David Schwartz

Oct 3, 2006, 2:50:22 PM

Michael K. O'Neill wrote:

> "David Schwartz" <dav...@webmaster.com> wrote in message
> news:1159566466.2...@i42g2000cwa.googlegroups.com...

> > Right, that's why the article is totally wrong. Windows
> > CRITICAL_SECTION objects have nothing whatsoever to do with ensuring
> > only one thread at a time executes some particular section of code.

> I think that this statement is overly broad and is therefore subject to


> criticism on the basis that it's not always true.

How so?

> Of course you are correct that Microsoft's CRITICAL_SECTION's are all about
> protecting data. But consider the following very-simple example, which is
> executed by identical ones of multiple threads to add a new handle to a
> vector of open handles:
>
> EnterCriticalSection( &g_CriticalSection );
> g_Handles.push_back( newHandle );
> LeaveCriticalSection( &g_CriticalSection );

If someone defined a car as "a device you can eat breakfast on" and I
took issue with it, would it be fair for you to respond with a picture
of a person eating breakfast on a car? Yes, you can use a Windows
CRITICAL_SECTION as a critical section, just as you can use any mutex
as a critical section. But a mutex *is* *not* a critical section. It's
a different synchronization mechanism entirely, even though it could be
used to implement a critical section if you wanted to do that with it.

> Sure, the main point here is to protect shared data. But it's also true
> that the critical section has the effect of "ensuring only one thread at a
> time executes some particular section of code", which here is the
> "push_back" code.

It can have that effect if it's used for that purpose. But that's not
what it does inherently.

> OTOH, I think that the article's focus on protection of code execution,
> rather than on the protection of shared data, is misleading to those
> learning about multithreading, and in the long run will cause them needless
> confusion.

Exactly. There are several specific serious misconceptions that this
article states flat out as if they were fact.

DS

Chris Thomasson

Oct 3, 2006, 7:58:58 PM
"Chris Friesen" <cbf...@mail.usask.ca> wrote in message
news:12huc2h...@corp.supernews.com...

> Chris Thomasson wrote:
>> "David Schwartz" <dav...@webmaster.com> wrote in message
>> news:1159566761.7...@i42g2000cwa.googlegroups.com...

[...]

>> http://people.redhat.com/~drepper/futex.pdf

[...]

> Yeah, naked futexes are a pain in the butt.

;)


> Which is why its nice that with NPTL the pthreads reader-writer locks and
> condition variables are implemented internally using futexes, so that the
> end-user doesn't need to deal with them.

I remember browsing some NPTL code and noticing that the rw-mutexes were
implemented with mutexes and condvars internally. I also remember something
about the condition variable implementation; it was lacking an
atomic-operation-free fast-path in the signal/broadcast-without-waiters
case...

Has this all changed? Could you point me to a quick description of the
algorithms for the "new" NPTL rw-mutexes and condvars? I wonder if they
finally got their act together...


David Schwartz

Oct 3, 2006, 10:01:03 PM

Chris Thomasson wrote:

> Remember to keep in mind that. Futexs are tricky:
>
> http://people.redhat.com/~drepper/futex.pdf

Sounds like going to a lot of effort to make something very simple
hard, just because you can. I've used futexes, and they're no harder to
use than other synchronization ultra-primitives.

DS

Christopher Layne

Oct 3, 2006, 10:17:10 PM
Chris Thomasson wrote:

> Has this all changed? Could you point me to a quick description of the
> algorithms for the "new" NPTL rw-mutexs and condvars? I wonder if they
> finally got their act together...

Yea.. those NPTL guys.. such idiots right?

Why don't you write to Ingo and Ulrich and ask them if they've "gotten their
act together" yet and are ready to follow your "new model", otherwise can you
spare the group for a bit? It's getting just a tad bit predictable around
here.

Chris Thomasson

Oct 4, 2006, 2:15:22 AM
"Christopher Layne" <cla...@com.anodized> wrote in message
news:1159928314_2151@news-west.n...

> Chris Thomasson wrote:
>
>> Has this all changed? Could you point me to a quick description of the
>> algorithms for the "new" NPTL rw-mutexs and condvars? I wonder if they
>> finally got their act together...
>
> Yea.. those NPTL guys.. such idiots right?

So, you think they're idiots... Why? I don't agree with you at all. I think
you're way out of line here!

I was being a *bit* sarcastic as well...


> Why don't you write to Ingo and Ulrich and ask them if they've "gotten
> their
> act together" yet and are ready to follow your "new model",

Actually... My signaling/broadcasting model, inspired by previous
discussions with Joe Seigh, involves something called an eventcount. I was
experimenting one day and discovered a technique for implementing
eventcounts that allows for atomic-operation-free fast-pathed
signal/broadcasting... I have never made direct use of a condition variable
in any of my code again, after I created the following algorithm:


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380


This is 100% compatible with POSIX mutexes and virtually any lock-free
algorithm that wishes to add support for
waiting/timeouts/signaling/broadcasting. It works for lock-free algorithms
because it has ultra-low-overhead signaling. A futex is kind of like an
eventcount... You can use eventcounts for lock-free programming... I can use
futexes for some lock-free programming... The problem I have with NPTL is
that an implementation I was looking through had rw-mutexes and condition
variables that were based on locks, and had no real low-overhead
fast-pathing... They could do MUCH better! Got a problem with that? Any
questions?


> otherwise can you spare the group for a bit?

I guess you should killfile me then. Oh well...

:O


> It's getting just a tad bit predictable around here.

Please clarify that statement???

:^/


Chris Thomasson

Oct 4, 2006, 2:19:42 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1159927263.5...@k70g2000cwa.googlegroups.com...

> Chris Thomasson wrote:
>> Remember to keep in mind that. Futexs are tricky:
>> http://people.redhat.com/~drepper/futex.pdf
>
> Sounds like going to a lot of effort to make something very simple
> hard, just because you can.

You kind of have to take this stuff into consideration...


> I've used futexes,
> and they're no harder to
> use than other synchronization ultra-primitives.


Exactly how are you using futexes to implement condition variables? Are you
using something like Joe created for atomic-ptr-plus? What about rw-mutexes?


Joe Seigh

Oct 4, 2006, 6:54:34 AM
Chris Thomasson wrote:
> "Chris Friesen" <cbf...@mail.usask.ca> wrote in message
> news:12huc2h...@corp.supernews.com...
>
>
>>Which is why its nice that with NPTL the pthreads reader-writer locks and
>>condition variables are implemented internally using futexes, so that the
>>end-user doesn't need to deal with them.
>
>
> I remember browsing some NPTL code and noticing that the rw-mutexs were
> implemented with mutexs and condvars internally. I also remember something
> about the condition variable implementation; it was lacking an atomic
> operation free fast-path in the signal/broadcast without waiters case...
>
> Has this all changed? Could you point me to a quick description of the
> algorithms for the "new" NPTL rw-mutexs and condvars? I wonder if they
> finally got their act together...
>
>


http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/pthread/?cvsroot=glibc

Doesn't look like it, unless the "new" stuff is somewhere else.
rwlocks are suboptimal anyway. Why bother?

Joe Seigh

Oct 4, 2006, 7:05:00 AM


They're just these guys, see?

Anyway, anyone who has worked in those types of development environment
knows that people get a vested interest in doing things a certain way
and don't change. It has nothing to do with what's the best way to do
something or not. Even if you can demonstrate there's a better way.

Joe Seigh

Oct 4, 2006, 7:09:47 AM

The tricky part was trying to make something that was designed
for one thing, mutexes, work for other things as well, rather
than design more appropriate primitives.

David Schwartz

Oct 4, 2006, 9:50:11 AM

Chris Thomasson wrote:

> > I've used futexes,
> > and they're no harder to
> > use than other synchronization ultra-primitives.

> Exactly how are you using futexs to implement condition variables? Are you
> using something like Joe created for atomic-ptr-plus? What about rw-mutexs?

Because condition variables don't work very well on Win32, I don't use
them. They're an ugly compromise anyway.

I use various types of locks (pure sleep, spin-sleep, reader/writer and
a few others) as well as events of various types (wake one, wake all,
gate).

I do have one structure that's kind of like a mutex and a condition
variable in one class. We do implement it with futexes on Linux. It
looks kind of like this:

void Lock(void)
{
    while(InterlockedExchange(&locked, 1)!=0)
    {
        InterlockedIncrement(&waiters);
        while(InterlockedGet(&locked)!=0) // wait until locked is 0
            sys_futex(&locked, FUTEX_WAIT, 1, NULL);
        InterlockedDecrement(&waiters);
    }
}

bool TryLock(void)
{
    return InterlockedExchange(&locked, 1)==0;
}

void Unlock(void)
{
    __asm__ __volatile__("" : : : "memory");
    InterlockedSet(&locked, 0);
    if(InterlockedGet(&waiters)!=0) // any waiters?
        sys_futex(&locked, FUTEX_WAKE, 1, NULL);
}

void Wake(int count=INT_MAX)
{ // best to call without lock
    InterlockedIncrement(&generation);
    sys_futex(&generation, FUTEX_WAKE, count, NULL);
}

void Block(void)
{ // must have lock
    LONG gen=InterlockedGet(&generation);
    Unlock();
    sys_futex(&generation, FUTEX_WAIT, gen, NULL);
    Lock();
}

Wake(1) is roughly equivalent to pthread_cond_signal and Wake() is
roughly equivalent to pthread_cond_broadcast. Note that trying to add
timed waits to this would make things a bit more complex -- you'd have
to make sure you didn't consume a Wake(1) when you timed out, possibly
just by signalling an intentional possibly-spurious wakeup.

DS

Christopher Layne

Oct 4, 2006, 5:17:42 PM
Chris Thomasson wrote:

> So, you think their idiots...Why? I don't agree with you at all. I think
> your way out of line here!
>
> I was being a *bit sarcastic as well...

I was obviously being sarcastic - as both Ingo and Ulrich know what they are
doing - and have proven so.

> Actually... My signaling/broadcasting model, inspired from previous
> discussions' with Joe Seigh, involves something called an eventcount. I was

"my, my my"

> experimenting one day and discovered a technique for implementing
> eventcounts that allow for atomic operation free fast-pathed
> signal/broadcasting... I have never made direct use of a condition variable
> in any of my code again, after I created the following algorithm:

It's not the concepts that are even the issue. It's that you're selfishly
using this newsgroup as a self-advertisement avenue for your library at any
chance you can get. If it's not a reference to some past post you made where
you once pontificated on the difference between lock-free algorithms for
shower nozzles or how to lock a mutex faster than an adolescent can crash
their first car - it's something else. But the fact of the matter is - I don't
take away anything positive when you post - just the same tiring 'look at
me!' nonsense.

Reminds me of another guy who used to grandstand, I mean post, in here..
SenderX.

Joe Seigh

Oct 4, 2006, 5:35:18 PM
Christopher Layne wrote:
>
> It's not the concepts that are even the issue. It's that you're selfishly
> using this newsgroup as a self-advertisement avenue for your library at any
> chance you can get. If it's not a reference to some past post you made where
> once you pontificated on the difference between lockless free algorithims for
> shower nozzles or how do lock a mutex faster than an adolescent can crash
> their first car - it's something else. But fact of the matter is - I don't
> take away anything positive when you post - just the same tiring 'look at
> me!' nonsense.
>
> Reminds me of another guy who used to grandstand, I mean post, in here..
> SenderX.

IIRC, Douglas Schmidt used to flog ACE quite a bit back when he was trying
to get it started. It seemed all his Usenet postings were an excuse to
promote ACE.

Chris Thomasson

Oct 4, 2006, 9:11:28 PM
"Christopher Layne" <cla...@com.anodized> wrote in message
news:1159996749_19@news-west.n...

> Chris Thomasson wrote:
>
>> So, you think their idiots...Why? I don't agree with you at all. I think
>> your way out of line here!
>>
>> I was being a *bit sarcastic as well...
>
> I was obviously being sarcastic - as both Ingo and Ulrich know what they
> are
> doing - and have proven so.
>
>> Actually... My signaling/broadcasting model, inspired from previous
>> discussions' with Joe Seigh, involves something called an eventcount. I
>> was
>
> "my, my my"

:)


>> experimenting one day and discovered a technique for implementing
>> eventcounts that allow for atomic operation free fast-pathed
>> signal/broadcasting... I have never made direct use of a condition
>> variable
>> in any of my code again, after I created the following algorithm:
>
> It's not the concepts that are even the issue. It's that you're selfishly
> using this newsgroup as a self-advertisement avenue for your library at
> any
> chance you can get.

Are you talking about AppCore or vZOOM?


> If it's not a reference to some past post you made where

> once you pontificated on the difference between lockless free algorithms'

> for
> shower nozzles or how do lock a mutex faster than an adolescent can crash
> their first car - it's something else.

I try to help people all the time in this group. It just so happens that I
am skilled in the art of multi-threading, so, if I have been there and done
that, I WILL POINT PEOPLE TO MY PRIOR ART!


What's wrong with that? So what if I enjoy helping others design highly
scalable code; I must be ignorant or something, right? As many people know,
I have posted many different algorithms to this group; including one of the
fastest memory allocators out there... Of course if you don't seem to get
it, or if it deeply offends you... Well:


TOO BAD!


You can lead a horse to water; can you get him to drink?


:^)


> But fact of the matter is - I don't
> take away anything positive when you post - just the same tiring 'look at
> me!' nonsense.

I assume that you would be happy/appeased if I never share any techniques
that I invent with this group? I wonder if anybody else would be happy with
that... I think I have posted some fairly good stuff in this group; what say
you?


> Reminds me of another guy who used to grandstand, I mean post, in here..
> SenderX.

That's me... But, that was the "old" SenderX. You know: Lock-Free, or, DIE!

He is still locked up in his box-and-chains...

:)

Trust me, if I let him out of the box, and completely remove all of his
restraints, I fully expect David Butenhof to swoop down again and take care
of business... David has experience with the old SenderX... I don't think he
liked him very much...

;)


However, I do think I have gained/earned his respect over the past couple of
years... Apparently, you still have a problem with me... Well, that's what a
killfile is for; correct?


P.S.

Are you trolling here? If so, I am sorry for feeding you...


Chris Thomasson

Oct 4, 2006, 9:31:08 PM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:q-Wdnb7U9MytD77Y...@comcast.com...

> Chris Thomasson wrote:
>> "Chris Friesen" <cbf...@mail.usask.ca> wrote in message
>> news:12huc2h...@corp.supernews.com...

[...]

Damn! That's the same stuff I looked at in the past... The first thing I see
when I look at condition variable signals/broadcast is:


/* Make sure we are alone. */
lll_mutex_lock (cond->__data.__lock);


> unless the "new" stuff is somewhere else.

I hope so!

;)


> rwlocks are suboptimal anyway.

Big time!


> Why bother?

Interesting... You have brought this up in the past; remember NPTL2?

Nobody listened! Humm...


Christopher Layne

Oct 4, 2006, 9:39:26 PM
Chris Thomasson wrote:
> I assume that you would be happy/appeased if I never share any techniques
> that I invent with this group? I wonder if anybody else would be happy with
> that... I think I have posted some fairly good stuff in this group; what say
> you?

It's called balance.

Chris Friesen

Oct 5, 2006, 12:28:17 AM
Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:q-Wdnb7U9MytD77Y...@comcast.com...

>>http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/nptl/sysdeps/pthread/?cvsroot=glibc


>>
>>Doesn't look like it

> Damn! That's the same stuff I looked at in the past... The first thing I see
> when I look at condition variable signals/broadcast is:
>
>
> /* Make sure we are alone. */
> lll_mutex_lock (cond->__data.__lock);

Which appears to be implemented using a futex. So, it uses atomic ops,
but doesn't require a syscall.
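The uncontended futex fast path Chris is pointing at can be sketched as follows. This is an illustrative Linux-specific sketch in C11, modeled on the well-known design from Ulrich Drepper's "Futexes Are Tricky", not the actual NPTL source: uncontended lock/unlock are a single atomic op in user space, and the kernel is entered only under contention.

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked (no waiters), 2 = locked (maybe waiters). */
typedef atomic_int futex_mutex;

static void fm_lock(futex_mutex *m)
{
    int c = 0;
    /* Fast path: one CAS in user space, no syscall at all. */
    if (atomic_compare_exchange_strong(m, &c, 1))
        return;
    /* Slow path: mark "maybe waiters" and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(m, 2);
    while (c != 0) {
        syscall(SYS_futex, m, FUTEX_WAIT, 2, NULL, NULL, 0);
        c = atomic_exchange(m, 2);
    }
}

static void fm_unlock(futex_mutex *m)
{
    /* Fast path: if nobody ever contended, no syscall on unlock either. */
    if (atomic_fetch_sub(m, 1) != 1) {
        atomic_store(m, 0);
        syscall(SYS_futex, m, FUTEX_WAKE, 1, NULL, NULL, 0);
    }
}
```

So "uses atomic ops, but doesn't require a syscall" is exactly right for the uncontended case; the syscall only appears once a waiter has forced the state to 2.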

Chris

Chris Thomasson

Oct 5, 2006, 1:25:23 AM
"Chris Friesen" <cbf...@mail.usask.ca> wrote in message
news:12i92f2...@corp.supernews.com...

The point I am trying to get across is that condition variable
implementations don't need to use any atomic operations when they
signal/broadcast to a condition variable or lock-free algorithm, whenever
they hit a "there are no waiters" case/path (e.g., they hit the fast-path).
This happens to be fairly compatible with virtually any existing lock-free
algorithm, simply because it doesn't require the "user" algorithm to
maintain any extra state, and it knows when to hit the fast-path in a
highly opportunistic manner; look at Joe Seigh's usage pattern - simple, and
effective. It's contained in a previous link I posted.

The eventcount can simply rely on a "user" lock-free algorithm's "natural"
points (e.g., queue empty/full, msg limits, etc.). You can implement an
eventcount, then implement any type of condition variable/monitor-thing that
outperforms the "current" NPTL. If NPTL is indeed currently using the code
that Joe links to, then my assertion holds true for now. However, you can
apply my eventcount algorithm to the code that Joe linked to, and get some
very nice performance gains... Interesting:


You can choose to simply wrap the "current" NPTL with my algorithm:


http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380


it doesn't matter if "current" NPTL uses atomic ops for condvars... You wrap
them with my eventcount, and you ban the "current" NPTL to the
slow-paths... If they insist on using atomic ops when they don't have to,
they catch my eye, and that makes me use them as "kernel-based" waitsets...
Slow-paths indeed.


>> /* Make sure we are alone. */
>> lll_mutex_lock (cond->__data.__lock);

Applying load to this lock for a condition variable is not truly needed.
Keep in mind that "any extra" atomic operations are a no-go wrt many
different types of existing lock-free algorithms... An eventcount can
provide the exact same functionality as condition variables, can be used
with POSIX mutexes, and can be used with lock-free algorithms. They depend
on CAS, memory barriers, and a POSIX condition variable/mutex for the
slow-path. Simple and fairly portable.
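To make this concrete, here is a minimal, portable sketch of such an eventcount (an illustration for this thread, not code from any poster's library; C11 atomics assumed, with a plain pthread mutex/condvar serving as the slow-path waitset). The low bit of the counter is the "someone is waiting" flag, so a signaler that finds it clear does nothing but a plain load:

```c
#include <pthread.h>
#include <stdatomic.h>

/* Low bit of `count` is the waiters flag; the rest is a generation counter. */
#define EC_WAITERS 1u

typedef struct {
    atomic_uint     count;
    pthread_mutex_t mtx;   /* slow-path waitset only */
    pthread_cond_t  cond;
} eventcount;

#define EVENTCOUNT_INIT { 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Called by a waiter BEFORE re-checking its predicate: advertise interest. */
static unsigned ec_get(eventcount *ec)
{
    return atomic_fetch_or(&ec->count, EC_WAITERS) | EC_WAITERS;
}

/* Block until the generation moves past `key`. Spurious returns are fine:
   the caller re-checks its predicate, exactly as with a condvar. */
static void ec_wait(eventcount *ec, unsigned key)
{
    pthread_mutex_lock(&ec->mtx);
    if (atomic_load(&ec->count) == key)
        pthread_cond_wait(&ec->cond, &ec->mtx);
    pthread_mutex_unlock(&ec->mtx);
}

/* Fast path: a plain load and a branch -- no atomic RMW, no syscall --
   whenever no thread has advertised itself as a waiter. */
static void ec_signal(eventcount *ec)
{
    if (!(atomic_load(&ec->count) & EC_WAITERS))
        return;
    pthread_mutex_lock(&ec->mtx);
    unsigned c = atomic_load(&ec->count);
    atomic_store(&ec->count, (c + 2) & ~EC_WAITERS);
    pthread_cond_broadcast(&ec->cond);
    pthread_mutex_unlock(&ec->mtx);
}
```

This sketch keeps everything sequentially consistent and always broadcasts; a production version would also think about counter wraparound and weaker memory orderings. Missed wakeups are avoided because the waiter's key comparison in ec_wait happens under the same mutex that the signaler's generation bump does.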


Any thoughts?


Chris Thomasson

Oct 5, 2006, 1:42:12 AM
"David Schwartz" <dav...@webmaster.com> wrote in message
news:1159969811....@c28g2000cwb.googlegroups.com...

>
> Chris Thomasson wrote:
>
>> > I've used futexes,
>> > and they're no harder to
>> > use than other synchronization ultra-primitives.
>
>> Exactly how are you using futexs to implement condition variables? Are
>> you
>> using something like Joe created for atomic-ptr-plus? What about
>> rw-mutexs?
>
> Because condition variables don't work very well on Win32, I don't use
> them. They're an ugly compromise anyway.

Not when you use a "native" Windows eventcount. I have created one. I think
there are other techniques you can use besides mine, whatever. You can
emulate true robust kernel objects with PDR, and "still provide atomic
operation free fast-paths for signals/broadcasts in Windows 95 - Vista".
Period.

Well, you can add timed waits to an eventcount without adding extra
overhead. A waiter has to real state to set. It just reads the eventcount,
and compares to the previous "load/get". So timeouts/cancellations are very
straightforward and simple. A futex is basically an eventcount. I highly
suggest that you look for Joe Seigh's eventcount that is based on futexes. I
think he posted a link to some source code on this group in the past.

Your

> void Wake(int count=INT_MAX)

Uses this operation:

> InterlockedIncrement(&generation);

when it does not have to. Also, it hits a slow-path for every call to this
function:


> sys_futex(&generation, FUTEX_WAKE, count, NULL);

when it clearly does not have to... I think that your algorithm is correct,
but can be improved...


Chris Thomasson

Oct 5, 2006, 1:44:28 AM
> Well, you can add timed waits to an eventcount without adding extra
> overhead. A waiter has to real state to set. It just reads the eventcount,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I meant to say:

A waiter has NO real state to set.


Chris Thomasson

Oct 5, 2006, 1:56:38 AM
Okay... I think I see what you are doing here. You seem to be using
eventcount functionality of sorts...


FWIW, you don't have to modify the generation (e.g., your eventcount logic)
in the Wake, unless you hit a slow-path. You read the generation or
eventcount, and set the waiters-bit in Block(...) BEFORE the Unlock(...).
You transfer the slow-paths to WAITERS. You do not want SIGNALERS to hit the
slow-path ALL of the time... A signaler should only have to load the
word-sized shared eventcount variable with the appropriate memory barriers,
and check for a waiters-bit that should have been set by any potential
waiter (e.g., the eventcount read-and-set by the waiter in the
critical-section provided by the "user" mutex). The signaler incs the
monotonic event counter, and turns off the waiters-bit for slow-paths...
Very simple...

;)


> void Wake(int count=INT_MAX)
> { // best to call without lock
> InterlockedIncrement(&generation);
> sys_futex(&generation, FUTEX_WAKE, count, NULL);
> }

sys_futex is very expensive to call.


> void Block(void)
> { // must have lock
> LONG gen=InterlockedGet(&generation);
> Unlock();
> sys_futex(&generation, FUTEX_WAIT, gen, NULL);
> Lock();
> }

Okay. Your eventcount implementation can be improved. Instead of integrating
the condition variable with the eventcount, create the eventcount as a
completely separate object, then create the condition variable on top of
it...

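Concretely, the rework being described might look like the following (an illustrative Linux/C11 sketch, not code posted by either participant): the generation only changes on the slow path, and FUTEX_WAIT's atomic compare of the word against the caller's snapshot is what prevents lost wakeups.

```c
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

#define WAITERS_BIT 1u

static atomic_uint generation;  /* low bit: waiters flag; rest: counter */

/* Best called without the user lock held.  When nobody has advertised
   itself as a waiter, this is a plain load -- no RMW, no syscall. */
static void Wake(int count)
{
    unsigned g = atomic_load(&generation);
    if (!(g & WAITERS_BIT))
        return;
    atomic_store(&generation, (g + 2) & ~WAITERS_BIT);
    syscall(SYS_futex, &generation, FUTEX_WAKE, count, NULL, NULL, 0);
}

/* Called with the user lock held, before re-checking the predicate:
   set the waiters-bit and take a snapshot of the generation. */
static unsigned PrepareBlock(void)
{
    return atomic_fetch_or(&generation, WAITERS_BIT) | WAITERS_BIT;
}

/* Called after dropping the user lock.  FUTEX_WAIT compares `generation`
   to `g` atomically in the kernel, so a Wake that slipped in between
   makes the call return immediately instead of being lost. */
static void Block(unsigned g)
{
    syscall(SYS_futex, &generation, FUTEX_WAIT, g, NULL, NULL, 0);
}
```

The waiters-bit clearing in Wake uses a plain store, which is safe here only because any waiter whose fetch_or gets clobbered holds a stale snapshot and so falls straight through FUTEX_WAIT; a careful implementation would still want to audit that interleaving on its target memory model.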
Any thoughts?


Chris Thomasson

Oct 5, 2006, 2:00:21 AM
"Chris Thomasson" <cri...@comcast.net> wrote in message
news:Mb6dnXiyVeSNBLnY...@comcast.com...

Well, it has to set the waiters-bit, but that's after the fact of testing
the predicate of data held in a critical-section provided by an external
mutex, or the natural conditions in a lock-free algorithm. So, even the
waiters are fast-pathed in a sense. You stick to Joe Seigh's usage
pattern...


Chris Thomasson

Oct 5, 2006, 2:07:22 AM
"Christopher Layne" <cla...@com.anodized> wrote in message
news:1160012451_38@news-west.n...

Well, I know for sure that I don't want to make predictions or balances wrt
the "level-of-difficulty" when I post stuff to this group. Other than that,
I see your point.


David Schwartz

Oct 5, 2006, 3:53:09 AM

Chris Thomasson wrote:

> > void Wake(int count=INT_MAX)
> > { // best to call without lock
> > InterlockedIncrement(&generation);
> > sys_futex(&generation, FUTEX_WAKE, count, NULL);
> > }
>
> sys_futex is very expensive to call.

Good point. This can be improved. The code will look a bit uglier, but
the performance will likely be better if no waiters is the most common
case. As it happens, by pure luck, no waiters is very rare in the only
thing I actually use this for. But that was sheer luck on my part.

> > void Block(void)
> > { // must have lock
> > LONG gen=InterlockedGet(&generation);
> > Unlock();
> > sys_futex(&generation, FUTEX_WAIT, gen, NULL);
> > Lock();
> > }

> Okay. Your eventcount implementation can be improved. Instead of integrating
> condition variable with eventcount, create the eventcount as a completely
> separate object, then create the condition variable on top of it...

That was already done but I removed it for simplification.

DS

Joe Seigh

Oct 5, 2006, 8:00:05 AM

That's only in the non-contention case; otherwise you do get syscalls.
rwlocks are almost by definition high contention, otherwise a mutex
would work just as well and you wouldn't need rwlocks. You could
argue that since the internal mutex is only held for a very short
period of time, contention would be rare. But how do you know
that?

I've done a lock-free futex-based condvar implementation and compared
it against the NPTL lock-based one. IIRC, it was about 10% better than
the NPTL implementation under high contention on a single processor. It
would likely be a lot higher on multiprocessor systems. So even mutexes
held for only a short interval can be a problem.

There are a lot of systems that have large global data structures, like
search trees or hash tables, where a large number of read only look ups
are done. And a lot of those don't bother to use rwlocks or mutexes to
ensure the look ups are performed correctly. Some of that is due to
laziness or ignorance. But a lot of them believe the look up is so
fast they won't hit a race condition and so the overhead of an rwlock is
not justified. And since your vendor's support engineers won't be able
to recreate the race condition that you experienced, they don't have to
fix it and can close the problem as non-reproducible. This is what
thinking NPTL's code is "good enough" gets you.

Chris Friesen

Oct 5, 2006, 11:05:22 AM
Chris Thomasson wrote:

> The point I am trying to get across is that condition variable
> implementations' don't need to use any atomic operations when they
> signal/broadcast to a condition variable or lock-free algorithm, whenever
> its hits a "there are not waiters case/path" (e.g., they hit the fast-path).

Fair enough.

> You can choose to simply wrap the "current" NPTL with my algorithm:
>
>
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/aa8c62ad06dbb380

>>> /* Make sure we are alone. */
>>> lll_mutex_lock (cond->__data.__lock);

> Applying load to this lock for a condition variable is not truly needed.

Just curious...have you proposed this modification to the libc guys?
Sounds like Joe may have had some interaction there...

Chris

Joe Seigh

Oct 5, 2006, 12:30:32 PM

There was some discussion on LKML about condition variables. I don't think
the NPTL or libc guys hang out there.

As far as rwlocks are concerned, I don't think it's worth optimizing them,
given that they're inherently suboptimal to begin with. Lock-free is the
way to go.

The thing about rwlocks is a lot of people imprint on the standard textbook
example of an rwlock, the one with # readers, # writers, # of waiting
readers, and # of waiting writers, with slight logic changes for read
preference or write preference, etc. With that many variables, most people
require an internal mutex to keep track of what state everything is in,
since most people can't deal with concurrency and asynchronicity. NPTL's
rwlock is a case in point. "Make sure we are alone" is the same as "Stop
everything so we can figure out what's going on".
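For reference, the "standard textbook" rwlock being criticized looks roughly like this (a writer-preference sketch in POSIX C, my illustration rather than the NPTL source). Note that every reader, even when completely uncontended, must take the internal mutex twice:

```c
#include <pthread.h>

/* The "textbook" rwlock: every state transition, reader or writer,
   funnels through one internal mutex -- "make sure we are alone". */
typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  ok_read;
    pthread_cond_t  ok_write;
    int readers;             /* active readers */
    int writer;              /* 1 if a writer is active */
    int waiting_writers;     /* for writer preference */
} tb_rwlock;

#define TB_RWLOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
                         PTHREAD_COND_INITIALIZER, 0, 0, 0 }

static void rd_lock(tb_rwlock *rw)
{
    pthread_mutex_lock(&rw->m);
    while (rw->writer || rw->waiting_writers)   /* writer preference */
        pthread_cond_wait(&rw->ok_read, &rw->m);
    rw->readers++;
    pthread_mutex_unlock(&rw->m);
}

static void rd_unlock(tb_rwlock *rw)
{
    pthread_mutex_lock(&rw->m);
    if (--rw->readers == 0)
        pthread_cond_signal(&rw->ok_write);
    pthread_mutex_unlock(&rw->m);
}

static void wr_lock(tb_rwlock *rw)
{
    pthread_mutex_lock(&rw->m);
    rw->waiting_writers++;
    while (rw->writer || rw->readers)
        pthread_cond_wait(&rw->ok_write, &rw->m);
    rw->waiting_writers--;
    rw->writer = 1;
    pthread_mutex_unlock(&rw->m);
}

static void wr_unlock(tb_rwlock *rw)
{
    pthread_mutex_lock(&rw->m);
    rw->writer = 0;
    pthread_cond_broadcast(&rw->ok_read);
    pthread_cond_signal(&rw->ok_write);
    pthread_mutex_unlock(&rw->m);
}
```

Under read-mostly load, that internal mutex is exactly the serialization point Joe is objecting to: the "readers can run concurrently" promise is bracketed by two fully serialized critical sections per lookup.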
