Depending on the testcase parms I can get up to 32000 reads/sec/thread on
Linux. So it's clear it's fast enough that all I'm seeing on a single
processor system is scheduler artifacts.
Now all I have to do (after the rewrite of course) is find a killer
app to write for it.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
I can think of lots of non-threaded programs to write.
Interesting. Reference counting wouldn't be an option here as it
would be for normal hazard pointers. Plus arbitrarily requiring
reference counting turns reference counting into a problem rather
than a solution. I think I have something that will work though.
If C++ goes with deferred reference counting, they're going to run into
the same issues RCU+SMR would have with reference counting so it wouldn't
even be very feasible.
> Joe Seigh wrote:
>> Joe Seigh wrote:
>>> Joe Seigh wrote:
>>>> Joe Seigh wrote:
>>>>> Joe Seigh wrote:
>>>>>> Joe Seigh wrote:
>>>>>>> Joe Seigh wrote:
>>>>>>>> Joe Seigh wrote:
>>>>>>>>> Joe Seigh wrote:
>>>>>>>>>> Joe Seigh wrote:
Is this a blog site, or a newsgroup?
There's some lock-free synchronization stuff here.
http://atomic-ptr-plus.sourceforge.net/
The testcase isn't meant to be an example. It uses a doubly
linked circular queue for some reason. I've changed that to
a singly linked list but that's not out there yet. The
RCU+SMR package that's out there doesn't work for linked
lists, just simple objects. I have a version that does
work for linked lists but it's not released yet. I'm
thinking of using it in some suitable application if I can
find one. Plus it's still in the middle of what we quaintly
call refactoring, plus documentation, plus some form of
quasi-formal proofs. And I'm currently having fun trying
to make sense of the scheduler artifacts I'm seeing on
a single processor system.
And as you can see by this topic practically being a monologue,
there's no huge demand for this kind of stuff, so I'm taking
advantage of the more leisurely schedule that allows.
What would be really useful is a lock-free *multiple-writers*,
single-reader FIFO message queue. An implementation of a message-
passing concurrent language uses this kind of queue for all
communication between processes, and would benefit from an
implementation with very low overhead on multiprocessors. The
queue entries have to be traceable by a GC.
--
David Hopwood <david.nosp...@blueyonder.co.uk>
What's wrong with Maged Michael's lock-free FIFO and variations
on that? The only problem I'm aware of is that if you use the
GC to avoid the ABA problem you can trash the GC if you have
a high rate of usage.
Way back, I came up with a scheme where you make the queue a lock-free
LIFO stack using compare and swap. The single reader just grabbed the
entire queue using compare and swap, reversed order making it FIFO
and worked off of that until it was empty and then repeated the whole
process by grabbing the shared LIFO stack again.
Haven't done that kind of stuff in a while. Usually, I'm doing single
writer, multiple reader lock-free queues for multi-casting.
I wonder if lack of portability is partly to blame?
What would be the minimal number of extensions to C (or
whatever) that would allow this to be implemented portably by a
compiler supporting, for example 'C99+atomic'?
If you can quantify that, you might actually be able to make a
case for language extensions so that lock-free algorithms can be
used in cross-platform production projects.
I would be very interested in anything along those lines, but as
long as it is either OS-centric, CPU-centric, or both, I don't
see much value in it for the type of software I am involved
with.
--
Randy Howard (2reply remove FOOBAR)
That topic pops up from time to time. It doesn't go anywhere
due to lack of interest. Very few people use that kind of stuff.
There are some api's out there like atomic_ops here
http://www.hpl.hp.com/research/linux/qprof/
but it's fairly limited. If you have to add your own primitives,
you might as well do the whole api on your own, as it's not
that much more work. If I used atomic_ops, I'd still have
to provide my own atomic ops package.
>
> If you can quantify that, you might actually be able to make a
> case for language extensions so that lock-free algorithms can be
> used in cross-platform production projects.
It's mostly a chicken and egg question. Present usage doesn't
justify putting in the features.
>
> I would be very interested in anything along those lines, but as
> long as it is either OS-centric, CPU-centric, or both, I don't
> see much value in it for the type of software I am involved
> with.
>
You have to start somewhere and I can't exactly port to hardware
I don't have. So probably ia32-64 on Linux for the time being.
Apple is getting off of ppc and I'm not going to buy proprietary
hardware just for the privilege of porting to it. Ditto with
Sun. Also, their list of Solaris qualified hardware is unusable.
You'd likely end up getting hardware you really didn't want,
only to find it doesn't work with Solaris. I can't afford
to do that kind of experimentation.
You get what you pay for. Free software gets little or no
support.
The other thing here is api's are a really hard sell. I did
a low level api once, a program that translated boolean
queries into decision trees. Almost no interest apart
from my own use. I tacked it onto a program to scan files,
a sort of flat file database, and it became hugely popular.
More than 10 years after I stopped supporting it and after I
left IBM, someone tracked me down to get it enhanced (couldn't,
I didn't have the source anymore.) So that's why I'm looking
for a moderate sized project to do instead.
> Randy Howard wrote:
>> On Thu, 30 Jun 2005 20:06:52 -0500, Joe Seigh wrote
>> (in article <3cqdnbE9cbI...@comcast.com>):
>>
>>> And as you can see by this topic practically being a monologue,
>>> there's no huge demand for this kind of stuff, so I'm taking
>>> advantage of the more leisurely schedule that allows.
>>
>>
>> I wonder if lack of portability is partly to blame?
>>
>> What would be the minimal number of extensions to C (or
>> whatever) that would allow this to be implemented portably by a
>> compiler supporting, for example 'C99+atomic'?
>
> That topic pops up from time to time. It doesn't go anywhere
> due to lack of interest. Very few people use that kind of stuff.
> There are some api's out there like atomic_ops here
> http://www.hpl.hp.com/research/linux/qprof/
> but it's fairly limited.
I'll take a look at it, thanks.
>> If you can quantify that, you might actually be able to make a
>> case for language extensions so that lock-free algorithms can be
>> used in cross-platform production projects.
>
> It's mostly a chicken and egg question. Present usage doesn't
> justify putting in the features.
Fair enough.
>> I would be very interested in anything along those lines, but as
>> long as it is either OS-centric, CPU-centric, or both, I don't
>> see much value in it for the type of software I am involved
>> with.
>
> You have to start somewhere and I can't exactly port to hardware
> I don't have.
Understood. For the record, I was not shooting darts at your
approach, just stating why I personally can't use it as is.
Hopefully you didn't misread that as an attack or something
along those lines.
> Apple is getting off of ppc and I'm not going to buy proprietary
> hardware just for the privilege of porting to it.
Well, I can't argue with that approach. I just bought a G5 dual
powermac (about a month before the switch). First Apple
computer in my life, having avoided them for decades, then boom,
I get shot between the eyes. :-)
Oh well, at least it's a decent box and gives me another testbed
to play with, despite its long-term boat anchor status.
> You get what you pay for. Free software gets little or no
> support.
It seems to depend upon the software's usage (in volume). A
very few OSS projects get massive support because they appeal to
a very broad audience, but they are the exception.
> The other thing here is api's are a really hard sell.
True. That's why I was wondering about standardization. With
all the pushing for SMP, dual-core, multi-core, (and the
abomination known as "hyperthreading"), you would think some of
the big names (like Intel and AMD) would be pushing for ways to
get the most out of new processor features and parallelism
without everyone having to roll their own.
Oh well, I guess it will be a long time before this sort of
thing is widely used enough to become more generally applicable
to portable software.
Intel and AMD are basically hardware companies and are run by
hardware not software types. They know HPC (high performance
computing) and parallel programming. Anything outside of that
is pretty much alien to them. They also have a view on ROI
(return on investment) that's hardware rather than software
oriented. We're still running Cobol programs written in the 60's
because it cost too much to rewrite them. Intel and AMD will
stick an extension on just long enough to sell a few million
processors and then drop it once they've got their money back.
So while you could do some really neat things with hyperthreading,
no one will because it will be gone by the time the OSes could
provide an api to exploit it.
A good example of what you can do with hyperthreading is
producer/consumer. One of the problems with producer/consumer
is that either the consumers or the producers will get ahead of the other.
In normal threading, you use condition variables to suspend one
and resume it when the other catches up. That suspend and resume
adds overhead. One way people try to avoid this is to use
lock-free techniques, but if you spin wait you're wasting cycles. With hyperthreading,
the faster thread can just spin wait using the PAUSE instruction and
the cpu resources are given to the other thread to make it run faster.
As soon as it catches up, the waiting thread resumes again with no
overhead. There is no way to do the equivalent in normal threading
except to say that condition variables act that way sometimes except
with more overhead.
It's not just Intel and AMD. Sun Research has a whole group that
does synchronization, mostly lock-free. But I have yet to see any
evidence that Sun exploits any of this. And they're coming out with
Niagara which will have up to 8 cores, 4 threads per core. You
would think.
>
> Oh well, I guess it will be a long time before this sort of
> thing is widely used enough to become more generally applicable
> to portable software.
>
Well, a killer app will speed things up. But people expect bigger
and more feature rich apps. It's not like the old days when you
could write something from scratch in a few days or weeks.
I was going to suggest to Intel that they list and sponsor projects
that would benefit from being "parallelized" much like IBM and
Google did with some projects. But I'm not sure they know what
projects those would be.
> Randy Howard wrote:
>>
>> That's why I was wondering about standardization. With
>> all the pushing for SMP, dual-core, multi-core, (and the
>> abomination known as "hyperthreading"), you would think some of
>> the big names (like Intel and AMD) would be pushing for ways to
>> get the most out of new processor features and parallelism
>> without everyone having to roll their own.
>
> Intel and AMD are basically hardware companies and are run by
> hardware not software types.
True, but I think they (especially Intel) realize that the
profit is in systems, not isolating hardware from software.
IOW, recognizing that they go together. Intel has had a lot of
effort in software over the years, including compilers, IDF,
software development forums on their website, reams of
documentation, white papers, example code, etc. They have been
harping on threading and the need to develop parallel solutions
for a long time.
> They know HPC (high performance computing) and parallel programming.
Exactly. If they have the energy to push something like OpenMP
and their own compilers, then working on pushing lock-free
adoption doesn't seem /that/ unlikely.
> So while you could do some really neat things with hyperthreading,
> no one will because it will be gone by the time the OSes could
> provide an api to exploit it.
I hope so. Hyperthreading is beneficial in a very few special
cases, but in general it sucks compared to SMP or multi-core
solutions. I suppose keeping it around inside of multi-core
processors isn't horrible, as long as you can turn it off. In
some cases, it makes things much worse enabled than disabled.
> It's not just Intel and AMD.
True, but they represent the bulk of the market. If they go in
a direction, then it can happen. If they don't, then the idea
will die on the vine, more often than not.
> Sun Research has a whole group that does synchronization, mostly
> lock-free. But I have yet to see any evidence that Sun exploits any of this.
If sun doesn't use it themselves, you can't expect any of their
customers or developer community to do so. They *could* push
it, but they've got bigger fish to fry right now trying to keep
Java alive in the onslaught of alternatives coming along now.
> And they're coming out with Niagara which will have up to 8 cores,
> 4 threads per core. You would think.
I hope those folks in India and China are up on threaded
programming. :-)
>> Oh well, I guess it will be a long time before this sort of
>> thing is widely used enough to become more generally applicable
>> to portable software.
>>
> Well, a killer app will speed things up.
True. I've yet to hear of any commercial or popular OSS
application bragging about the massive performance benefits
achieved through the use of lock-free techniques. Real-world
examples convince people more than research papers. Perhaps it
is being widely used but not talked about, or I just missed the
claims?
...
>
>>>Oh well, I guess it will be a long time before this sort of
>>>thing is widely used enough to become more generally applicable
>>>to portable software.
>>>
>>
>>Well, a killer app will speed things up.
>
>
> True. I've yet to hear of any commercial or popular OSS
> application bragging about the massive performance benefits
> achieved through the use of lock-free techniques. Real-world
> examples convince people more than research papers. Perhaps it
> is being widely used but not talked about, or I just missed the
> claims?
>
>
Well, Linux used RCU to speed it up. SCO initially claimed IBM
stole it from them but I think SCO had no idea what RCU actually
was.
ANTs Software has a "lock-free" database. They specialize in putting
out a press release every month about their "lock-free" database.
They have two patents,
6,760,726 System and method of managing concurrent operations on linked lists
6,763,447 Lock-free list for use with computer system utilizing FIFO queue for tracking order of various sublists
They look like a LIFO stack using double-wide CAS with a counter field and a circular FIFO
queue of some sort to me, but what do I know. I guess databases aren't the killer app. :)
A major open source file server is hitting enough contention on some session tables
that they need a lock-free reader/writer solution.
There's a debugger for multi-threaded programs that's experiencing serious
contention on some AVL trees they use to verify valid storage accesses and
they need a lock-free reader/writer solution.
These last two aren't desperate enough to actually try any lock-free
solutions yet, but if they're having problems it means others are as well.
Do you mean <http://www.cs.rochester.edu/u/michael/PODC96.html>, or
something more recent? Possibly there's nothing wrong with it. However,
the correctness argument seems very informal, and I don't see any
discussion of what memory consistency model is being assumed (apart
from an assumption that the ABA problem does not occur).
It also looks expensive: two CAS operations for enqueue in the best
case, and one CAS to dequeue each item (rather than being able to
dequeue multiple items at once, which should be possible since we
have a single reader). Although a message queue must in general
handle multiple writers, most such queues will in fact only be written
by a single writer; the problem is that the language implementation
doesn't know that statically. So it should be possible to do much
better than Michael's algorithms for this application.
> The only problem I'm aware of is that if you use the
> GC to avoid the ABA problem you can trash the GC if you have
> a high rate of usage.
>
> Way back, I came up with a scheme where you make the queue a lock-free
> LIFO stack using compare and swap. The single reader just grabbed the
> entire queue using compare and swap, reversed order making it FIFO
> and worked off of that until it was empty and then repeated the whole
> process by grabbing the shared LIFO stack again.
These comments prompted me to think of another way of implementing message
passing that greatly reduces the need for (and hence the overhead of)
synchronization. I'll post it here with the subject "Implementing message
passing on multiprocessors".
--
David Hopwood <david.nosp...@blueyonder.co.uk>
>>So while you could do some really neat things with hyperthreading,
>>no one will because it will be gone by the time the OSes could
>>provide an api to exploit it.
>
> I hope so. Hyperthreading is beneficial in a very few special
> cases, but in general it sucks compared to SMP or multi-core
> solutions. I suppose keeping it around inside of multi-core
> processors isn't horrible, as long as you can turn it off. In
> some cases, it makes things much worse enabled than disabled.
I'm not a fan of hyperthreading either, but putting it in processors
and then turning it off is the worst of all worlds: the processor
designers have to make compromises to support HT and spend time
debugging it, that could have been spent more profitably with other
optimizations and with verification.
(On the subject of excessive complexity in processors, watch this
talk by Bob Colwell, formerly of Intel:
<http://stanford-online.stanford.edu/courses/ee380/040218-ee380-100.asx>.)
> True. I've yet to hear of any commercial or popular OSS
> application bragging about the massive performance benefits
> achieved through the use of lock-free techniques.
I very much doubt that there is enough benefit from using lock-free
techniques in *applications* to justify the complexity cost, and the
effort required to learn how to use them correctly. They belong in
operating systems and language implementations, where these costs are
amortized over many apps.
--
David Hopwood <david.nosp...@blueyonder.co.uk>
> I very much doubt that there is enough benefit from using lock-free
> techniques in *applications* to justify the complexity cost, and the
> effort required to learn how to use them correctly. They belong in
> operating systems and language implementations, where these costs are
> amortized over many apps.
And where portability is less important since you already know you're
going to have to reimplement and tweak.
DS
Yep. That works very well. I have used the simple scheme for a while as
well. I used an atomic exchange for the consumers:
typedef struct n_
{
    struct n_ *next;    /* = 0 */
    void *state;
} n_t;

typedef struct q_
{
    n_t *front;         /* = 0 */
} q_t;

static q_t q;

// producer: push a new node onto the LIFO with a CAS loop
n_t *o, *c, *n = malloc( sizeof( *n ) );
o = q.front;
do
{
    n->next = o;
    c = o;
    o = InterlockedCompareExchangePointer
        ( &q.front,
          n,
          c );
}
while ( c != o );

// consumer: grab the whole stack with a single atomic exchange
n_t *f = InterlockedExchangePointer( &q.front, 0 );
if ( ! f ) { return EAGAIN; }
// reverse order of f, and process.
LIFO producers are inherently not affected by ABA. On the other hand, LIFO
consumers are affected, but only if there is a "compare" type method being
used for synchronization. Using an atomic exchange to set the stack anchor
to zero gets around ABA wrt LIFO lock-free stacks. Was that similar to
your initial solution?
--
http://appcore.home.comcast.net/
(portable lock-free data-structures)
Joe's idea of converting an IBM freelist into a queue does reduce the number
of instructions normally needed for a lock-free queue. Consumers only need an
atomic exchange, and producers need a normal CAS loop. Look on my site for
my implementation of a zero-overhead queue that does not need any
synchronization instructions, just a dependent load for the consumer and a
store barrier for the producer. However, it is a single-producer/consumer
queue. But you can use it in various distributed message passing schemes to
help reduce their synchronization overhead.
> I'll post it here with the subject "Implementing message
> passing on multiprocessors".
There was one scheme I used that avoided synchronization instructions for a
certain type of thread-to-thread message passing. The basic scheme was that
each producer thread had its own zero-overhead single-producer/consumer
queue that would be sampled by dedicated consumer threads. This allowed me
to create a fairly efficient "many-producer/few-consumer" distributed
message passing system.
I don't think it does, really. Note that I was talking about using lock-free
techniques directly in applications; not using "infrastructure" libraries
that happen to be implemented using lock-free stuff, but that do not expose
it to their clients. My argument is that lock-free primitives are inherently
too complex and error-prone to be exposed directly for general-purpose
application programming, no matter how well-designed the API.
> Some people believe threading is too difficult to use and
> should not be used by applications. Some have gone as far as to say that
> they would fire any of their employees that used multi-threading. I've
> even seen that in job ads, that the candidate should know how to not
> use threading.
Concurrency is essential; threading isn't.
--
David Hopwood <david.nosp...@blueyonder.co.uk>
My experience is that people who believe their low level api is too complex
for normal programmers to use, and only allow usage through a high level
api, generally suck at designing api's. If the low level api was available,
at least someone else who isn't so bad at api design could try to design a
better one. And they certainly couldn't design a new or different api if the
low level api wasn't available. If the only api available is a lock-free
linked list and someone wants to design a lock-free tree, they won't be able
to do one.
Java is a good example here. There are a lot of classes that are final and
not extendable simply because the class author doesn't trust users to extend
their class even though that is the whole *point* of OOP. There are design
patterns whose main purpose seems to be to get around such restrictions,
which is certainly ironic.
My experience is that people generally suck at designing APIs. This does not
imply that all APIs suck.
> If the low level api was available, at least someone else who isn't so bad
> at api design could try to design a better one. And they certainly couldn't
> design a new or different api if the low level api wasn't available.
That's not the case: they can always drop down to assembler. Or they can reuse
the low-level guts of an open-source implementation of some other high-level
API.
> If the only api available is a lock-free linked list and someone wants to
> design a lock-free tree, they won't be able to do one.
Yes they will, by modifying the lock-free linked list code to implement a tree.
"Black box" reuse doesn't work, but reuse based on modification of source
code has been quite successful.
If the lock-free linked list implementation doesn't come with source, throw
it away; you can't rely on it.
> Java is a good example here. There are a lot of classes that are final and
> not extendable simply because the class author doesn't trust users to
> extend their class even though that is the whole *point* of OOP.
'final' was included in Java for security reasons, because some of the
system classes would have been exploitable by subclasses (it can be argued
that this was addressing the symptom and not the cause of other security
design problems).
If an API designer is making basic mistakes in the use of 'final', they're
probably making other basic mistakes, and their code would not have been
reusable anyway.
--
David Hopwood <david.nosp...@blueyonder.co.uk>
If documentation is available for it somehow, like for the Linux futex.
It's an unstable api, so you can't depend on it as if it were an official api.
I've done an async safe eventcount using it. If I wanted to port it I
would have to figure out low level api's for other platforms though.
>
>> If the only api available is a lock-free linked list and someone wants to
>> design a lock-free tree, they won't be able to do one.
>
>
> Yes they will, by modifying the lock-free linked list code to implement
> a tree.
> "Black box" reuse doesn't work, but reuse based on modification of source
> code has been quite successful.
>
Assuming the linked list hasn't been hard coded into the low level api.
Unfortunately, I have seen low level api's screwed up that way.
Have there been any studies that show that lack of documentation for a low level
api has never inhibited anyone from using the low level api? How would they
measure that? Source isn't really a substitute. Otherwise I wouldn't have
to document RCU+SMR which would save a lot of trouble for me.