Somebody is commercializing lock-free

Joe Seigh

unread,

Apr 24, 2006, 7:31:59 AM4/24/06

to

That somebody is here
http://www.pss-ab.com/

Mostly write lock-free based on multi-word compare and swap if
I remember earlier papers by these guys correctly.

In the documents section the first paper
http://www.non-blocking.com/download/Multithread.pdf
has some interesting comments dissing the alternatives
(reader writer solutions) namely RCU and some "amateur"
implementations. "just running for any number of hours",
who would that be? :)

Also further down the list are a couple of papers on lock-free
reference counting.
http://www.non-blocking.com/download/GidPST05_LockFreeGC_TR.pdf
http://www.non-blocking.com/download/Sun04_WaitFreeRef_TR.pdf
They use hazard pointers to ensure the refcount increment is
done safely. They also seem to know about Sun's SLRC refcounting
scheme so the paper is fairly recent.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Thomasson

unread,

Apr 24, 2006, 8:37:43 PM4/24/06

to

Joe Seigh wrote:
> That somebody is here
> http://www.pss-ab.com/
>
> Mostly write lock-free based on multi-word compare and swap if
> I remember earlier papers by these guys correctly.
>
> In the documents section the first paper
> http://www.non-blocking.com/download/Multithread.pdf
> has some interesting comments dissing the alternatives
> (reader writer solutions) namely RCU and some "amateur"
> implementations. "just running for any number of hours",
> who would that be? :)

Wow. They call us amateur... ?

;)

They really seem to have missed the whole point of RCU-SMR...

They fuckin% trash RCU, and us "amateurs". They must not be paying too
much attention... They use SMR, and they trash RCU!!!!!!!

fuc2$@#$@#$

I love one of his points:

Amateur implementations: Might not be really lock-free or wait-free....

There are some benefits of blending lock-free and lock-based algorithms
together. They must be like the "old" SenderX... Lock-free or Die! I am
pissed off now...

lol!

I notice I don't see any kind of zero-overhead eventcount in their
package... Oh, yeah... An eventcount might block, therefore it would be
unacceptable for their 100% lock-free SMR hazard pointer based package
of expensive memory barriers!

There package does not seem to contain ANY sort of virtually
zero-overhead algorithms... They all seem to use atomic operations and
membars. Don't even get me started on their "transactional memory"
overheads... I would not even get near STM until it was supported
directly by the hardware.... Even then, it has to be more expensive then
"normal" LL/SC.

Luckily, the zero-overhead stuff we have, blows them away... VZOOM and
your RCU-SMR outperform SMR by a very, very wide margin...

Let see here, the main point of attack wrt RCU is on memory consumption
during periods of load. This is 100% true for "original" RCU. The patent
teachings did not address this issue. However, the solution is trivial,
and scalable. It has been mentioned here many times:

http://groups.google.com/group/comp.programming.threads/msg/de3eff43d6905998

> Also further down the list are a couple of papers on lock-free
> reference counting.
> http://www.non-blocking.com/download/GidPST05_LockFreeGC_TR.pdf
> http://www.non-blocking.com/download/Sun04_WaitFreeRef_TR.pdf
> They use hazard pointers to ensure the refcount increment is
> done safely.

Yeah, my AppCore project has been doing this for a very long time now:

http://appcore.home.comcast.net/
http://appcore.home.comcast.net/appcore/include/cpu/i686/ac_i686_lfgc_refcount_h.html

I have not read their documentation yet. If you are correct that they
are indeed using SMR, I have one serious question.....

Are they using the original SMR algorithm?

If they are, then they truly have not been paying attention. I could not
imagine why they would use plain SMR for reference count updates. My
god, they trash RCU and use a solution that requires a #StoreLoad to set
the pointer and #LoadStore|#StoreStore to release the pointer. Then you
have to factor in the atomic operation and membar needed to protect the
reference count updates... You can have 3 membars and an atomic op
(probably CAS to guard against incrementing a count that is < 1), in a
loop, just to modify a reference count!

AFAICT, they did not mention the massive overhead involved with that
#StoreLoad dependency inherent in the original SMR algorithm.

As for their NOBEL library, here is what I thought of an older version:

http://groups.google.com/group/comp.programming.threads/msg/0ec3907e9cb3e50e

I wonder if they still are using Valois linked-list algorithms(s). I
warned people back then to be weary of NOBEL because they were using it.
Even if they are not using it anymore, apparently, they are using
SMR. Which, IMHO, is completely useless due to the members...

> They also seem to know about Sun's SLRC refcounting
> scheme so the paper is fairly recent.

ummmm, don't you mean... Your atomic_ptr?

;)

Chris Thomasson

unread,

Apr 24, 2006, 8:53:24 PM4/24/06

to

Joe Seigh wrote:
> http://www.non-blocking.com/download/GidPST05_LockFreeGC_TR.pdf

I just briefly scanned some of their counting pseudo-code, and it
actually uses hazard pointers! Now I know why they did now show any
memory barriers in their pseudo-code... If they did, it would quickly
show off how inefficient it really is...

I bet they will not be getting rid of those expensive barriers anytime
soon... They seem to be referring to RCU as the great satan......

Chris Thomasson

unread,

Apr 24, 2006, 9:01:17 PM4/24/06

to

Joe Seigh wrote:
> http://www.non-blocking.com/download/Sun04_WaitFreeRef_TR.pdf

The counting algorithms in this paper, seem to rely on knowing the total
number of threads in advance. This seems to be a constant NR_THREADS in
the paper. IMHO, not acceptable, and is major drawback of SMR-like
algorithms...

Chris Thomasson

unread,

Apr 24, 2006, 9:04:39 PM4/24/06

to

Joe Seigh wrote:
> That somebody is here
> http://www.pss-ab.com/
>
> Mostly write lock-free based on multi-word compare and swap if
> I remember earlier papers by these guys correctly.
>
> In the documents section the first paper
> http://www.non-blocking.com/download/Multithread.pdf
> has some interesting comments dissing the alternatives
> (reader writer solutions) namely RCU and some "amateur"
> implementations. "just running for any number of hours",
> who would that be? :)
>
> Also further down the list are a couple of papers on lock-free
> reference counting.
> http://www.non-blocking.com/download/GidPST05_LockFreeGC_TR.pdf
> http://www.non-blocking.com/download/Sun04_WaitFreeRef_TR.pdf
> They use hazard pointers to ensure the refcount increment is
> done safely.

Humm, did they license the SMR patent from IBM or something? Surely they
know about the IBM patent application...

Joe Seigh

unread,

Apr 24, 2006, 9:17:09 PM4/24/06

to

I'm not sure why they're concerned with RCU since it's not the same type
of lock-free as what they do except maybe some uses of refcounting.
They also seem to think the read lock-free trade offs are bad. Everything
is a trade off. If you use a pure lock-free algorithms you're benefiting
writer performance at the expense of reader performance. It's a judgement
call. There are no absolute scales of performance for lock-free or for
locking.

Joe Seigh

unread,

Apr 24, 2006, 9:18:21 PM4/24/06

to

Joe Seigh wrote:
> That somebody is here
> http://www.pss-ab.com/
>
> Mostly write lock-free based on multi-word compare and swap if
> I remember earlier papers by these guys correctly.
>

Nah, that's some guys in England. These guys just did lock-free
data structures.

Chris Thomasson

unread,

Apr 25, 2006, 5:29:09 AM4/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:AZKdnX2wHr5...@comcast.com...

> Joe Seigh wrote:
>> That somebody is here
>> http://www.pss-ab.com/
>>
>> Mostly write lock-free based on multi-word compare and swap if
>> I remember earlier papers by these guys correctly.
>>
> Nah, that's some guys in England. These guys just did lock-free
> data structures.

Well, it seems that at least one of NOBEL's programmers "briefly" posted
something on this a while back:

http://groups.google.ca/group/comp.programming.threads/msg/5e42ba895bcc602a

Humm...

FWIW
----------
IMHO, I believe that some of our key inventions can be used to improve
"virtually all" of NOBELS "heavily SMR-based" writer lock-free algorithms.
If you actually look at "some" of the algorithms the library is apparently
using, you "should" quickly notice that they are just "saturated" with
CAS-loops that contain various "internal" #StoreLoad dependencies. They
posted links on their site to some of the papers that NOBEL is apparently
based on. Let's see if I can, very briefly, compare-and-contrast some of the
algorithms that NOBLE exposes, to some of our virtually zero-overhead
alternatives. Remember, they refer to us as "amateurs":

1. Writer Lock-Free Algorithms /w Conditional Blocking
---------------
This is means that an application thread has the ability to conditionally
block on target algorithms natural or special conditions, for a specified
amount of time. A simple example would be when consumer threads choose to
block on an empty collection (e.g., queue), or when producer threads choose
to block on a full collection, ect...

NOBEL
----------
AFAICT, none of the writer lock-free algorithms presented by the library
have any sort of "wait and/or timedwait" functionality. It does not seem to
provide anything that could allow a thread to block on any type of
conditions that naturally exist in "most" writer lock-free algorithms (e.g.,
stack empty). So, it seems like there isn't any sort of "conditional
blocking" mechanism offered by NOBEL.

IMHO
----------
A "robust" lock-free library should really provide "some sort" of external,
or even built-in, "lightweight" mechanism to allow for a thread to
conditionally block on target lock-free algorithms natural/special
conditions. There is a very simple and robust synchronization object, called
an eventcount, which can provide what we need.

- I prefer to use my "portable" version a virtually zero-overhead
eventcount. I first mentioned the algorithm here, in pseudo-code:

http://groups.google.com/group/comp.programming.threads/msg/f46ad77b39935c93

I finally implemented it for my AppCore library, here:

http://appcore.home.comcast.net
> http://appcore.home.comcast.net/appcore/src/ac_eventcount_algo1_c.html
>
http://appcore.home.comcast.net/appcore/include/ac_eventcount_algo1_h.html

- I would also have no problem using Joe Seighs high-performance eventcount
algorithm that is based on futexs:

http://prdownloads.sourceforge.net/atomic-ptr-plus/eventcount_0-0-0-i386.tar.gz?download

- Also, I mentioned so called "build-in" conditional-blocking. Here my
straightforward pseudo-code example of this sort of setup:

http://groups.google.de/group/comp.programming.threads/msg/632b6bdc2a137459

2. Multi-Producer/Consumer Queue's
---------------

NOBEL
----------
The lock-free queue by Michael & Scott... IIRC, they are using that
algorithm, or something very similar. The last version I saw of that
particular "beast" requires at least 2, sometimes 3 CAS operations that all
need membars; the logic is in a giant loop. Please note that this is for the
push operation alone; the pop operations overheads are similar. Basic and
result is that the overhead is equal to, or sometimes greater than, the
overhead associated with most mutex implementations... Humm...

IMHO
----------
A "robust" lock-free library should really provide a multi producer/consumer
queue implementation that actually has less overhead than uncontended mutexs
do; or what's the point?

- For a multi-producer/consumer lock-free queue I would much rather use the
"reverse" IBM Freelist trick that you mentioned here:

http://groups.google.ca/group/comp.programming.threads/msg/49a5db082507cb87

I posted a pseudo-code implementation here:

http://groups.google.ca/group/comp.programming.threads/msg/a359bfb41c68b98b

This scheme should totally outperform the algorithm that NOBEL is using.
IIRC, you mentioned this "optimization" here a while back. It doesn't even
require DWCAS! The overhead consists of a single release barrier along with
a normal CAS for the "ABA-proof" push operations, and a single loopless
atomic XCHG operation along with a single acquire barrier for the pop
operations. After a pop operation completes its XCHG and acquire barrier it
then simply processes the LIFO ordered data, in reverse order... Bingo! You
have one of the fastest multi-producer/consumer lock-free queues out there;
no kidding!

3. Single-Producer/Consumer Queue's
---------------
Sometimes, an application will only use a single producer thread, and a
single consumer thread to access a shared queue. A specialized queue
implementation can take direct advantage of this knowledge. The types of
optimizations that can be achieved are quite remarkable.

NOBEL
----------
AFAICT, the library does not have any queues that address the cases where
only a "single producer and consumer will ever access the queue".

IMHO
----------
A "robust" lock-free library should really provide an ultra low overhead
single-producer/consumer queue implementation. It should have, at least,
substantially less overhead than a multi-producer/consumer queue...

- For a single-producer/consumer lock-free queue I would much rather use my
virtually zero-overhead queue located here:

http://appcore.home.comcast.net/
>
http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html
> http://appcore.home.comcast.net/appcore/src/ac_queue_spsc_c.html
> http://appcore.home.comcast.net/appcore/include/ac_queue_spsc_h.html

No atomic-ops, or any #StoreLoad dependencies. Alls that is requires is a
single #StoreStore for the producer thread and dependant load ordering for
the consumer thread. This is a very lightweight and efficent design.

I am going to study their SMR-based counting algorithms some more... From
what I can now gather, every single writer-base lock-free algorithm in their
package makes use of their counting algorithm. They are most likely using
them to avoid using DWCAS to solve ABA, and to make then
"dynamically-sized". The older versions of NOBEL's algorithms were
statically sized; no memory management except static-sized IBM Freelist's
was used... Humm... I could go on and on... Their description of us, or
basically anybody else, that provides a lock-free library as "amateurs" has
provided me with a little extra motivation... So to speak...

;)

Chris Thomasson

unread,

Apr 25, 2006, 6:37:36 AM4/25/06

to

"Joe Seigh" <jsei...@xemaps.com> wrote in message

news:AZKdnUKwHr4...@comcast.com...

> Chris Thomasson wrote:
>> Joe Seigh wrote:
>>
>>> http://www.non-blocking.com/download/GidPST05_LockFreeGC_TR.pdf
>>
>>
>> I just briefly scanned some of their counting pseudo-code, and it
>> actually uses hazard pointers! Now I know why they did now show any
>> memory barriers in their pseudo-code... If they did, it would quickly
>> show off how inefficient it really is...
>>
>> I bet they will not be getting rid of those expensive barriers anytime
>> soon... They seem to be referring to RCU as the great satan......
>
> I'm not sure why they're concerned with RCU since it's not the same type
> of lock-free as what they do except maybe some uses of refcounting.
> They also seem to think the read lock-free trade offs are bad.

Yes, it sure does seem that way. Humm... I am finding it somewhat amusing
that they attack RCU, and then turn around and, apparently, make the
original SMR algorithm a "central theme" throughout their "entire library".
This will have the unfortunate "side-effect" of spreading the overhead that
comes along with SMR, all over the internals of the NOBEL library; no doubt
about it...

I have assumed that it was fairly well known, by now, that the original SMR
algorithm will add, at least, 2 expensive membars to "any" lock-free
operation that makes proper use of it... I guess my assumption may have
been, wrong... Yikes!

;(...

>Everything is a trade off.

Humm, this brings a couple of, perhaps, "sarcastic" questions to mind:

- Would I trade in a library that provides low-overhead queues, for one that
has queues that are even remotely similar to the Michael & Scott algorithm?

- Would I trade in a library that provides low-overhead eventcounts, for one
that does not have any support for conditional blocking?

- Would I trade in a CAS-based lock-free algorithm, for one that is based on
STM?

- Would I trade in a writer lock-free algorithm that used IBM Freelist-based
memory management, for one that uses the original SMR for its memory
management?

- Would I trade in VZOOM or Joe Seigh's RCU-SMR, for the original SMR thing?

... My answer to every question, happens to be the same ...

No FU%^ING Way!!!

I have to admit that the comments they made about the inventions that were
created by us "amateurs" made me get just a little pissed off... Humm... I
wonder why I continually choose to make successful use of ultra low-overhead
Proxy GC and queues in many different client/server projects I have been
responsible for; both large and small, mission-critical or not...

Humm... There has to be a reason why I continue to make such "radical"
decisions wrt implementing software that has to handle very high-loads...
Perhaps I am suffering from a personal problem, or something like that...
Humm...

Well come to think of it, after all, they describe us as "amateurs",
therefore, they really should cut us a little slack. No?

;)

Joe Seigh

unread,

Apr 25, 2006, 8:06:19 AM4/25/06

to

Chris Thomasson wrote:
> Humm...
>
> Well come to think of it, after all, they describe us as "amateurs",
> therefore, they really should cut us a little slack. No?
>

Well, they consider "amateur" attempts seriously enough as a threat that
it is some form of compliment. Anyway, they're promoting lock-free.