STL Containers w/ Virtually Zero-Overhead Lock-Free Reads...


Chris Thomasson

Sep 19, 2005, 1:36:36 PM

Should it be done? I wonder how popular a read optimized and memory barrier
free library of STL containers would actually be. Does anybody know if there
are any plans to incorporate "zero-overhead" lock-free techniques into the
STL? It seems that something like this is going to be badly needed in the
near future... The performance difference between normal lock-free and
zero-overhead is quite remarkable. Perhaps I should create a demo virtually
zero-overhead container class...

Would anybody be interested in such a thing?

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

kimo

Sep 20, 2005, 5:37:49 AM

Here's some stuff on C++ threading development
http://www.techweb.com/wire/software/163700651

particularly the fancy Boost Threads library; Boost libraries, I believe,
are what typically end up adopted into the standard.

http://www.boost.org/doc/html/threads.html#threads.introduction

Maciej Sobczak

Sep 20, 2005, 5:40:20 AM

Chris Thomasson wrote:

> Perhaps I should create a demo virtually
> zero-overhead container class...

Please do, it would be a perfect feasibility study.

But be careful to implement the existing interface exactly and
completely, especially the guarantees about iterators lifetime and
validity - this will be needed to actually apply some algorithms to it.

> Would anybody be interested in such a thing?

Certainly.


--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

kanze

Sep 20, 2005, 7:21:58 AM

Chris Thomasson wrote:

> Should it be done?

I'm not sure exactly what you are asking for. Currently, the
SGI implementation doesn't require external synchronization if
all accesses are non-modifying. Globally, I don't think that
they had to do anything special to achieve this. (It does mean
that std::string cannot be COW.)

If you're asking about lock free read access while someone else
is modifying, and only the modifier having a lock, I'm
sceptical. A modifying access can generally rearrange memory,
changing the addresses of objects, etc. Which in turn
invalidates results from read accesses, even in a single thread
environment.

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

kibun

Sep 20, 2005, 9:59:12 AM

are you claiming that it is possible to write a portable,
"zero-overhead", lock-free version of all the STL containers ?

David Abrahams

Sep 20, 2005, 10:00:40 AM

Maciej Sobczak <no....@no.spam.com> writes:

> Chris Thomasson wrote:
>
>> Perhaps I should create a demo virtually
>> zero-overhead container class...
>
> Please do, it would be a perfect feasibility study.
>
> But be careful to implement the existing interface exactly and
> completely, especially the guarantees about iterators lifetime and
> validity - this will be needed to actually apply some algorithms to it.

I could be missing something important, but it seems to me that if you
want to "actually apply some algorithms to it" then a lock-free
container is of little use, since you get synchronization at the wrong
level of granularity. You'd need algorithm-level synchronization,
rather than container-level synchronization, in that case. In other
words, the SGI argument against putting mutexes in the containers
applies just as well to the question of whether lock-free container
operations are a good idea.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

Maciej Sobczak

Sep 20, 2005, 8:03:10 PM

David Abrahams wrote:

>>>Perhaps I should create a demo virtually
>>>zero-overhead container class...
>>
>>Please do, it would be a perfect feasibility study.
>>
>>But be careful to implement the existing interface exactly and
>>completely, especially the guarantees about iterators lifetime and
>>validity - this will be needed to actually apply some algorithms to it.
>
> I could be missing something important, but it seems to me that if you
> want to "actually apply some algorithms to it" then a lock-free
> container is of little use, since you get synchronization at the wrong
> level of granularity.

That's exactly the point. STL containers do not exist in a vacuum, but
are part of something bigger (well, STL) and their design is part of
this bigger picture. Implementing the STL containers using some
lock-free techniques (the goal of the original poster) may force the
implementer to drive the design somewhere else - just to show the
advantages of lock-free programming - meaning that it will not be STL
any longer.

Having said that, the implementation of STL that I'm using is lock-free
in the sense that there are no locks at all. Why improve it? ;-)

But more seriously, I admit that I have no idea what the original poster
*really* meant by lock-free STL containers, but anything that is
constructive will be for sure of service to the community.


Joe Seigh

Sep 20, 2005, 9:04:54 PM

David Abrahams wrote:
> Maciej Sobczak <no....@no.spam.com> writes:
>
>
>>Chris Thomasson wrote:
>>
>>
>>>Perhaps I should create a demo virtually
>>>zero-overhead container class...
>>
>>Please do, it would be a perfect feasibility study.
>>
>>But be careful to implement the existing interface exactly and
>>completely, especially the guarantees about iterators lifetime and
>>validity - this will be needed to actually apply some algorithms to it.
>
>
> I could be missing something important, but it seems to me that if you
> want to "actually apply some algorithms to it" then a lock-free
> container is of little use, since you get synchronization at the wrong
> level of granularity. You'd need algorithm-level synchronization,
> rather than container-level synchronization, in that case. In other
> words, the SGI argument against putting mutexes in the containers
> applies just as well to the question of whether lock-free container
> operations are a good idea.
>

It's not like putting synchronization wrappers around an existing
STL container. It involves different implementations and possibly
subtle or not so subtle semantic differences, particularly in the
area of iterators and handling of "temporary" raw references. I
think this lack of transparency and the lack of experience in this
area will be the cause of some amount of shock if and when lock-free
containers show up. Putting on my pointy ears, I'd estimate the
probability of lock-free containers being adopted as quite low.
Low enough that I've stopped doing any serious work with developing
new lock-free techniques and mechanisms.


--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

Chris Thomasson

Sep 21, 2005, 5:13:32 AM

>> Perhaps I should create a demo virtually
>> zero-overhead container class...
>
> Please do, it would be a perfect feasibility study.
>
> But be careful to implement the existing interface exactly and
> completely, especially the guarantees about iterators lifetime and
> validity - this will be needed to actually apply some algorithms to

IMHO, this is the most important issue of all.

I know virtually zero-overhead STL is possible; however, there are a couple
of subtle “features” that come along with “state-of-the-art solutions” to
the classic reader/writer problem.

Unfortunately, these “features” are not very forgiving wrt algorithms that
incorporate object membership in a particular collection into their standard
logic.

Basically, objects can legitimately exist in a collection “after other
threads” have atomically removed them.

It is safe, and “normal”, for iterating threads to access a “recently
removed” object, simply because the object’s memory is being meticulously
managed by the “collector”.
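[A rough model of the lifetime semantics described above — readers keeping a "recently removed" object alive until the collector reclaims it — with plain shared_ptr reference counting standing in for the collector. This is a hypothetical sketch, not Chris's patented algorithm; note that std::atomic_load on a shared_ptr is itself not barrier-free, so it only illustrates the lifetime behavior, not the zero-overhead read path.]

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical read-mostly singly linked list: a reader takes a snapshot of
// the head pointer and may keep traversing nodes even after a writer has
// unlinked them; the nodes stay alive until the last reader drops its
// reference (here the "collector" is shared_ptr reference counting).
struct Node {
    int value;
    std::shared_ptr<Node> next;
};

class SnapshotList {
public:
    // Writer: push a new head (concurrent writers would serialize on a mutex).
    void push_front(int v) {
        auto n = std::make_shared<Node>();
        n->value = v;
        n->next = std::atomic_load(&head_);
        std::atomic_store(&head_, n);
    }

    // Writer: unlink the head node; readers holding a snapshot still see it.
    void pop_front() {
        auto h = std::atomic_load(&head_);
        if (h) std::atomic_store(&head_, h->next);
    }

    // Reader: no locks; the snapshot pins every node reachable from it.
    std::shared_ptr<Node> snapshot() const { return std::atomic_load(&head_); }

private:
    std::shared_ptr<Node> head_;
};
```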


>> Would anybody be interested in such a thing?
>
> Certainly.

Okay. I thought that there might be a little interest. I am currently
polishing a proof-of-concept library right now. However, I am very busy
incorporating parts of it into some in-house server software. I will have
the time to provide the demo library in the near future (most likely before
next year). Hmm... Perhaps a “new collection and iteration model” should be
created, and the STL could stay the same. Something like

std::fast_map< ... > etc.

I believe something has to be done in C++ that can address the
high-performance computers that are becoming available to everyday
customers...


Thank you for your comments.


P.S.

When I use the term “features”, I am referring to an inherent side-effect
that is common to algorithms that allow writes, and memory barrier / atomic
operation free reads, to execute in parallel. This type of behavior is key
to scalability wrt SMP, NUMA, etc. BTW, there is an odd “solution” to the
iteration problem; however, I am really not allowed to post any
information. ;(...

kanze

Sep 21, 2005, 5:48:21 AM

Maciej Sobczak wrote:

[...]


> Having said that, the implementation of STL that I'm using is
> lock-free in the sense that there are no locks at all.

Not even in the allocators? In the end, all of the allocator
implementations I know finish sooner or later in malloc, and the
malloc implementations I know use a lock.

(But that's just a nit, of course.)

--
James Kanze GABI Software
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Chris Thomasson

Sep 21, 2005, 7:56:32 AM

> Having said that, the implementation of STL that I'm using is lock-free
> in the sense that there are no locks at all. Why improve it? ;-)

Good point!

:)


> But more seriously, I admit that I have no idea what the original poster
> *really* meant by lock-free STL containers, but anything that is
> constructive will be for sure of service to the community.

I am VERY interested in creating "memory barrier and atomic operation free"
reads for shared containers. Unfortunately, these types of extremely
expensive operations are common place in many other lock-free algorithms. I
have a couple of virtually zero-overhead "STL-Like" containers, however they
all suffer from the subtle iteration issue. They work fine as drop-in
replacements, except when a certain type of algorithm uses them. Oh well. I
guess a new type of iteration model may be in order. Would you still be
interested?

;)

Chris Thomasson

Sep 21, 2005, 7:57:18 AM

> If you're asking about lock free read access while someone else
> is modifying, and only the modifier having a lock, I'm
> sceptical.

When you get some time, go ahead and study some of the RCU literature.
Basically, readers can observe "and access" objects that have been removed;
memory is tracked by the collector...


> A modifying access can generally rearrange memory,
> changing the addresses of objects, etc. Which in turn
> invalidates results from read accesses, even in a single thread
> environment.

You would not generally directly modify the object that a node was
representing in a collection, unless the "reader algorithm" you use has the
proper logic to handle and detect updates. There are several methods you can
use. For instance, you could use a mutex to pop the node, then send it
through the collector. When it comes out on the other side you are
guaranteed that no other threads hold any references to it. You then modify
and push it back into the collection. You could also use monotonic
counters, like some ABA solutions, etc., etc...
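[One of the counter-based schemes alluded to here — a seqlock-style monotonic version counter that lets a reader detect a concurrent update and retry — might be sketched as follows. This is an illustration only (names are hypothetical); a production seqlock would need atomic accesses to the payload to avoid a formal data race.]

```cpp
#include <atomic>
#include <cassert>

// The writer bumps a monotonic counter around each update (odd value means
// an update is in progress); a reader retries whenever the counter was odd
// or changed across its read, so it never acts on a torn value.
class VersionedValue {
public:
    void write(int v) {            // single writer, or writers hold a mutex
        seq_.fetch_add(1, std::memory_order_release);  // now odd: in progress
        value_ = v;
        seq_.fetch_add(1, std::memory_order_release);  // now even: stable
    }

    int read() const {
        for (;;) {
            unsigned s0 = seq_.load(std::memory_order_acquire);
            if (s0 & 1) continue;              // update in progress, retry
            int v = value_;                    // (atomic in a real seqlock)
            unsigned s1 = seq_.load(std::memory_order_acquire);
            if (s0 == s1) return v;            // counter unchanged: consistent
        }
    }

private:
    std::atomic<unsigned> seq_{0};
    int value_{0};
};
```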

Chris Thomasson

Sep 21, 2005, 7:56:56 AM

> are you claiming that it is possible to write a portable,
> "zero-overhead", lock-free version of all the STL containers ?

Yes. There is only one issue...

It creates some iteration "features" that may affect "certain" algorithms.
Unfortunately, the framework that allows for great performance on SMP and
NUMA boxes has a side-effect that can break certain existing algorithms that
incorporate "membership in a collection" into their logic.

As far as portability goes, I patented a method that can rely on POSIX
mutex semantics. IMHO, it is basically as portable as you can get.

kibun

Sep 21, 2005, 12:43:58 PM

- just to be sure, is your solution a container+algorithm combo or
container only ?
- if it relies on mutexes, it is not lock-free, is it?

chris noonan

Sep 22, 2005, 4:54:51 AM


kanze wrote:
> [...] In the end, all of the allocator

> implementations I know finish sooner or later in malloc, and the
> malloc implementations I know use a lock.

There are lock-free allocators around. Check out
http://www.leapheap.com

Regards,
Chris Noonan

Markus Elfring

Sep 22, 2005, 4:55:27 AM

> Not even in the allocators? In the end, all of the allocator
> implementations I know finish sooner or later in malloc, and the
> malloc implementations I know use a lock.

Would you like to consider lock-free memory allocators?
Is the approach described in the paper "Scalable Lock-Free Dynamic Memory
Allocation" by Maged M. Michael appropriate for the discussed use cases?
http://www.research.ibm.com/people/m/michael/pldi-2004.pdf

What do you think about alternative ways to improve the situation for
multi-threaded programming?
http://en.wikipedia.org/wiki/Hoard_memory_allocator

Regards,
Markus

Zhen Yao

Sep 22, 2005, 5:11:15 AM

I think you don't really understand what "lock-free" means. You may
like to search the web for "Lock-Free Data Structures" or "Lock-Free
Algorithms" to get a fundamental idea of it.

Cheers,

Zhen Yao

Chris Thomasson

Sep 22, 2005, 5:16:34 AM

> I could be missing something important, but it seems to me that if you
> want to "actually apply some algorithms to it" then a lock-free
> container is of little use, since you get synchronization at the wrong
> level of granularity. You'd need algorithm-level synchronization,
> rather than container-level synchronization, in that case. In other
> words, the SGI argument against putting mutexes in the containers
> applies just as well to the question of whether lock-free container
> operations are a good idea.

Per-container mutexes? That would be a nightmare! Think of an application
that happens to create extremely high numbers of collection objects...

;)


You could use groups of global "static" arrays of mutexes. Hash the this
pointer of a collection into an index and do your business. There is a
catch: you can deadlock real fast, like this:

http://groups.google.com/group/comp.programming.threads/msg/56406d367ac85dcb?hl=en


A scalable solution involves keeping track, on a per-thread basis, of which
mutexes in the array an application thread has locked, and in what order...
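[A minimal sketch of such a hashed lock table, locking slots in ascending index order to avoid the deadlock mentioned above. The table size and all names here are arbitrary assumptions for illustration.]

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>

// A global table of mutexes indexed by hashing an object's address.
constexpr std::size_t kSlots = 64;  // table size is an arbitrary choice
std::mutex g_locks[kSlots];

std::size_t slot_of(const void* p) {
    return std::hash<const void*>{}(p) % kSlots;
}

// Lock the slots for two objects, run f, unlock.  A fixed (ascending-index)
// acquisition order prevents two threads from deadlocking against each
// other; if both objects hash to the same slot, lock it only once.
template <class F>
void with_two_locks(const void* a, const void* b, F f) {
    std::size_t i = slot_of(a), j = slot_of(b);
    if (i == j) {
        std::lock_guard<std::mutex> g(g_locks[i]);
        f();
    } else {
        if (i > j) std::swap(i, j);       // always lock lower index first
        std::lock_guard<std::mutex> g1(g_locks[i]);
        std::lock_guard<std::mutex> g2(g_locks[j]);
        f();
    }
}
```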

Dietmar Kuehl

Sep 22, 2005, 7:19:44 AM

kanze wrote:
> Not even in the allocators? In the end, all of the allocator
> implementations I know finish sooner or later in malloc, and the
> malloc implementations I know use a lock.

Wouldn't 'malloc()' be one of the first functions to be written in
a lock-free manner?
--
<mailto:dietma...@yahoo.com> <http://www.dietmar-kuehl.de/>
<http://www.eai-systems.com> - Efficient Artificial Intelligence

David Abrahams

Sep 22, 2005, 10:38:32 AM

Joe Seigh <jsei...@xemaps.com> writes:

> David Abrahams wrote:
>> Maciej Sobczak <no....@no.spam.com> writes:
>>
>>
>>>Chris Thomasson wrote:
>>>
>>>
>>>>Perhaps I should create a demo virtually
>>>>zero-overhead container class...
>>>
>>>Please do, it would be a perfect feasibility study.
>>>
>>>But be careful to implement the existing interface exactly and
>>>completely, especially the guarantees about iterators lifetime and
>>>validity - this will be needed to actually apply some algorithms to it.
>>
>>
>> I could be missing something important, but it seems to me that if you
>> want to "actually apply some algorithms to it" then a lock-free
>> container is of little use, since you get synchronization at the wrong
>> level of granularity. You'd need algorithm-level synchronization,
>> rather than container-level synchronization, in that case. In other
>> words, the SGI argument against putting mutexes in the containers
>> applies just as well to the question of whether lock-free container
>> operations are a good idea.
>>
>
> It's not like putting synchronization wrappers around an existing
> STL container. It involves different implememntations

I know that.

> and possibly subtle or not so subtle semantic differences,
> particularly in the area of iterators and handling of "temporary"
> raw references.

Well, yeah. But either that's not what I'm talking about or it's a
very low-level way to say the same thing.

Lock-free seems to make sense for things like memory allocators,
reference counts, and queues because they represent "virtually uniform
and unordered" resources -- if somebody gets in while you're trying to
do your work, it's no big deal; you can just start over. It doesn't
matter whether you get this memory block or the next one; it doesn't
matter if some other thread gets on the queue ahead of you as long as
you get on reasonably soon. But not everything is like that.

We could make vector::push_back lock-free at a complexity cost (O(N)
instead of amortized O(1)), and then vectors could *almost* be used as
queues. But you still don't have the right interface to remove
elements from such a queue because you need to get the element value
and remove it in one operation. Even if we solve that, vectors are
general-purpose containers; it's not possible to make sensible
semantic guarantees for arbitrary combinations of operations.
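[The O(N) lock-free push_back described above can be sketched as copy-and-CAS on a shared snapshot: each writer copies the whole vector and tries to install the copy atomically, retrying if another writer got in first. A hypothetical illustration, not a proposal for std::vector.]

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <vector>

// Copy-on-write push_back: every write pays the O(N) copy Dave mentions,
// instead of amortized O(1).  Readers just load the current snapshot and
// can never observe a half-finished update.
class CowVector {
public:
    CowVector() : data_(std::make_shared<std::vector<int>>()) {}

    void push_back(int v) {
        for (;;) {
            auto cur = std::atomic_load(&data_);
            auto next = std::make_shared<std::vector<int>>(*cur);  // O(N) copy
            next->push_back(v);
            // Install the copy only if no other writer won the race.
            if (std::atomic_compare_exchange_weak(&data_, &cur, next))
                return;
        }
    }

    std::shared_ptr<const std::vector<int>> snapshot() const {
        return std::atomic_load(&data_);
    }

private:
    std::shared_ptr<std::vector<int>> data_;
};
```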

Suppose we had lock-free vectors. Now one thread calls
sort(v.begin(), v.end()) while another calls v.push_back(5). What
does either thread see? It seems to me that even if it's possible to
come up with coherent semantics for this case (which in itself is
debatable), it's impossible to come up with _useful_ semantics.
If containers are to be shared across threads, their state will almost
always be just a small piece of some larger invariant, in which case
you'll need to synchronize or "lock-free-ize" at a different level.

And once you start handing out forward/bidirectional/random-access
iterators, references, and pointers into a data structure, the
implementor of the data structure loses _all_ control over the kind of
atomicity you need when removing elements from queues. So, this
doesn't look like a subtle problem to me, not at all.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Peter Dimov

Sep 22, 2005, 3:49:47 PM

Maciej Sobczak wrote:

> But more seriously, I admit that I have no idea what the original poster
> *really* meant by lock-free STL containers, but anything that is
> constructive will be for sure of service to the community.

The container portion of java.util.concurrent (made as close to STL as
possible but no closer) would be a good first step, I guess.

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html

Torsten Robitzki

Sep 23, 2005, 9:39:28 AM

Dietmar Kuehl wrote:

> kanze wrote:
>
>>Not even in the allocators? In the end, all of the allocator
>>implementations I know finish sooner or later in malloc, and the
>>malloc implementations I know use a lock.
>
>
> Wouldn't 'malloc()' be one of the first functions to be written in
> a lock-free manner?

It depends ;-) If you have little to no contention on the shared data
structures used by the allocator, you might get better performance with
a plain old lock. Entering and leaving a critical section would then
cost only flipping a bit plus a membar on each side.

But if you have a lot of CPUs and each of them makes heavy use of a
global heap, you might get better performance by using a lock-free heap
that can make progress on every CPU, or at worst spin a little.
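[As a rough illustration of why a lock-free malloc is plausible: a per-size-class free list can be kept as a Treiber stack, where push and pop use compare-and-swap instead of a mutex. All names here are hypothetical, and this sketch deliberately ignores the classic ABA hazard under concurrent pops, which real allocators must solve with version tags or safe memory reclamation.]

```cpp
#include <atomic>
#include <cassert>

// A recycled-memory block; the block size is an arbitrary assumption.
struct Block {
    Block* next;
    unsigned char payload[64];
};

// Treiber stack of free blocks: the head pointer is the only shared word,
// updated with compare-and-swap loops.
class FreeList {
public:
    void push(Block* b) {
        Block* old = head_.load(std::memory_order_relaxed);
        do {
            b->next = old;
        } while (!head_.compare_exchange_weak(old, b,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    Block* pop() {
        Block* old = head_.load(std::memory_order_acquire);
        // Reading old->next here is where the ABA hazard lives under
        // concurrent pops; fine for a single-threaded demonstration.
        while (old &&
               !head_.compare_exchange_weak(old, old->next,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
        }
        return old;   // nullptr when the list is empty
    }

private:
    std::atomic<Block*> head_{nullptr};
};
```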

Chris Thomasson

Sep 23, 2005, 10:28:43 AM

>- just to be sure, is your solution a container+algorithm combo or
> container only ?

Just containers for now.


> - if it relies on mutexes, it is not lock-free, is it?

Yes. I figured out a way to use mutexes for creating an "environment" in
which all sorts of virtually lock-free algorithms/containers are possible...

Joe Seigh

Sep 23, 2005, 10:27:58 AM

David Abrahams wrote:
> We could make vector::push_back lock-free at a complexity cost (O(N)
> instead of amortized O(1)), and then vectors could *almost* be used as
> queues. But you still don't have the right interface to remove
> elements from such a queue because you need to get the element value
> and remove it in one operation. Even if we solve that, vectors are
> general-purpose containers; it's not possible to make sensible
> semantic guarantees for arbitrary combinations of operations.

I have done a linked list implementation where the find operation
could be done lock-free and just the remove operation used a
conventional lock (reusing the iterator value without redoing the
lookup). Don't worry. It was just an experiment.


>
> Suppose we had lock-free vectors. Now one thread calls
> sort(v.begin(), v.end()) while another calls v.push_back(5). What
> does either thread see? It seems to me that even if it's possible to
> come up with coherent semantics for this case (which in itself is
> debatable), it's impossible to come up with _useful_ semantics.
> If containers are to be shared across threads, their state will almost
> always be just a small piece of some larger invariant, in which case
> you'll need to synchronize or "lock-free-ize" at a different level.
>
> And once you start handing out forward/bidirectional/random-access
> iterators, references, and pointers into a data structure, the
> implementor of the data structure loses _all_ control over the kind of
> atomicity you need when removing elements from queues. So, this
> doesn't look like a subtle problem to me, not at all.
>

It's not. And it's an easy-to-avoid problem. Just don't create
lock-free containers. You don't really need them. Not that I
think anyone is going to listen to me here. They'll go ahead and
do it anyhow and they won't wait for it to be done "properly".
You need to convince them to wait until (if ever) C++ supports doing
it properly.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


Torsten Robitzki

Sep 23, 2005, 10:54:35 AM

Chris Thomasson wrote:

>>are you claiming that it is possible to write a portable,
>>"zero-overhead", lock-free version of all the STL containers ?
>
>
> Yes. There is only one issue...

I wonder how a sequence like

    your::vector<int> v(/* filled somehow */);

    if ( !v.empty() )
    {
        v.first();
    }

could be implemented without a lock, and without a lock external to the
container.

Lock-free or not lock-free, I think that STL containers have an interface
that is quite useless for being used in a threaded environment.

regards
Torsten

Chris Thomasson

Sep 23, 2005, 6:21:41 PM

>- just to be sure, is your solution a container+algorithm combo or
> container only ?

Just to clarify:

I created and patented an algorithm that I am using to build
high-performance shared containers.

I have only used the invention for shared containers so far...

Chris Thomasson

Sep 23, 2005, 6:21:07 PM

>> I could be missing something important, but it seems to me that if you
>> want to "actually apply some algorithms to it" then a lock-free
>> container is of little use,

Not true.


>> since you get synchronization at the wrong
>> level of granularity. You'd need algorithm-level synchronization,
>> rather than container-level synchronization, in that case.

I believe you might be misunderstanding what I mean by virtually
zero-overhead lock-free reads. User-defined algorithms are not restricted
from using their own form(s) of mutual exclusion to protect
“critical-sequences” of operations that are required to be executed
atomically:


Thread_A {
    user_app::mutex::t_guard lock( m_lock );
    sort( v.begin(), v.end() );
}

Thread_B {
    user_app::mutex::t_guard lock( m_lock );
    v.push_back(5);
}


This is perfectly fine; a well built generic lock-free read container doesn’t
need to rely on internally defined forms of mutual exclusion in order to
provide "read-only access" that doesn't use any nasty atomic operations
and/or StoreLoad style memory barriers...


> Suppose we had lock-free vectors. Now one thread calls
> sort(v.begin(), v.end()) while another calls v.push_back(5).

These would be considered writes.


> What does either thread see?

It depends on which thread executes first; remember writes are mutually
excluded.


> It seems to me that even if it's possible to
> come up with coherent semantics for this case (which in itself is
> debatable), it's impossible to come up with _useful_ semantics.
> If containers are to be shared across threads, their state will almost
> always be just a small piece of some larger invariant, in which case
> you'll need to synchronize or "lock-free-ize" at a different level.

This is completely compatible with virtually zero-overhead lock-free reads.
Many existing application-defined algorithms that decided to use STL
containers in multithreaded environments use mutexes to protect everything
anyway; therefore they are already compatible with the requirements of the
highly flexible lock-free algorithm I am using. Application-defined
algorithms can literally pick and choose exactly which of their "read-only
operations" can be executed without a lock. For instance, a thread that
makes frequent read-only const iterations over a shared collection would be
a great candidate to "lock-free-ize"...

:)

Joe Seigh

Sep 23, 2005, 10:00:00 PM

Peter Dimov wrote:
> Maciej Sobczak wrote:
>
>
>>But more seriously, I admit that I have no idea what the original poster
>>*really* meant by lock-free STL containers, but anything that is
>>constructive will be for sure of service to the community.
>
>
> The container portion of java.util.concurrent (made as close to STL as
> possible but no closer) would be a good first step, I guess.
>
> http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html

It has a few of the write lock-free classes and a couple of read lock-free
classes, the COW (copy on write) ones. It might be illustrative to compare
this to the regular Java collections to see what kind of things can be made
lock-free and what the differences are.

Part of the problem that both Java and the STL have is that the original
collections were over-engineered, making an efficient lock-free
implementation which preserves all of the original semantics all but
impossible.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


David Abrahams

Sep 24, 2005, 4:17:35 PM

Torsten Robitzki <MyFir...@Robitzki.de> writes:

> Chris Thomasson wrote:
>
>>>are you claiming that it is possible to write a portable,
>>>"zero-overhead", lock-free version of all the STL containers ?
>>
>>
>> Yes. There is only one issue...
>
> I wonder how a sequence like
>
> your::vector<int> v(/* filled some how */);
>
> if ( !v.empty() )
> {
> v.first();
> }
>
> could be implemented without a lock, and without a lock external to the
> container.
>
> Lock-Free or Not Lock Free, I think that STL-Containers have a interface
> that is quit useless for beeing used in a threaded eviroment.

That's quite a bit too strong.

The STL Container interface makes them useless **as shared data
structures** in a threaded environment **unless you use external
locks**.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Calum Grant

Sep 24, 2005, 4:21:49 PM

Chris Thomasson wrote:
>>- just to be sure, is your solution a container+algorithm combo or
>>container only ?
>
>
> Just to clarify:
>
> I created and patented an algorithm that I am using to build
> high-performance shared containers.

How does this compare to Microsoft's SLists (nonblocking singly-linked
lists), that are part of the Windows SDK?

>
>
> I have only used the invention for shared containers so far...

Which containers have you implemented? Do you have performance data?

Calum

Chris Thomasson

Sep 24, 2005, 9:34:51 PM

> I wonder how a sequence like
>
> your::vector<int> v(/* filled some how */);
>
> if ( !v.empty() )
> {
> v.first();
> }
>
> could be implemented without a lock, and without a lock external to the
> container.

An external lock would be completely fine and compatible with my algorithm;
the method I am using provides the ability for STL containers to achieve
lock-free read-only iteration. This means that any "critical-sequences" of
operations that need to be executed atomically are allowed to use various
forms of mutual exclusion, including mutexes...


> Lock-Free or Not Lock Free, I think that STL-Containers have a interface
> that is quit useless for beeing used in a threaded eviroment.

Yeah. Your example demonstrates this. Luckily, STL containers are not
useless for high-performance lock-free read-only iteration...

David Abrahams

Sep 24, 2005, 9:38:56 PM

"Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> writes:

>>> I could be missing something important, but it seems to me that if you
>>> want to "actually apply some algorithms to it" then a lock-free
>>> container is of little use,
>
> Not true.
>
>>> since you get synchronization at the wrong
>>> level of granularity. You'd need algorithm-level synchronization,
>>> rather than container-level synchronization, in that case.
>
> I believe you might be misunderstanding what I mean by virtually
> zero-overhead lock-free reads.

I don't think so.

> User-defined algorithms are not restricted
> from using their own form(s) of mutual exclusion to protect
> “critical-sequences” of operations that are required to be executed
> atomically:
>
>
> Thread_A {
> user_app::mutex::t_guard lock( m_lock );
> sort( v.begin(), v.end() );
> }
>
> Thread_B {
> user_app::mutex::t_guard lock( m_lock );
> v.push_back(5);
> }
>
>
> This is perfectly fine; a well built generic lock-free read container doesn’t
> need to rely on internally defined forms of mutual exclusion in order to
> provide "read-only access" that doesn't use any nasty atomic operations
> and/or StoreLoad style memory barriers...

Of course. But the effort you put into making your vector lock-free is
not needed to make the above work, so in effect it goes to waste. And
even lock free primitives like CAS are significantly more expensive
than their "plain old" counterparts, so making the vector lock free
actually imposes an efficiency cost for no benefit in this case.

Anyway "read-only access" usually works just fine even with a
completely thread-unaware container, so I'm not sure what you're
driving at here.

>> Suppose we had lock-free vectors. Now one thread calls
>> sort(v.begin(), v.end()) while another calls v.push_back(5).
>
> These would be considered writes.

Yes.

>> What does either thread see?
>
> It depends on which thread executes first; remember writes are mutually
> excluded.

The individual writes to the vector are mutually excluded (when you
have control over them -- which you don't, since the sort algorithm
gets random access iterators and can thus grab addresses and
references into the vector), but the call to sort consists of many
individual writes to the vector. Even if you _could_ mutually exclude
each one (which you can't, for reasons I just described) there's no
reason to think that push_back doesn't sneak between two of them.

>> It seems to me that even if it's possible to
>> come up with coherent semantics for this case (which in itself is
>> debatable), it's impossible to come up with _useful_ semantics.
>> If containers are to be shared across threads, their state will almost
>> always be just a small piece of some larger invariant, in which case
>> you'll need to synchronize or "lock-free-ize" at a different level.
>
> This is completely compatible with virtually zero-overhead lock-free
> reads.

Yes, but it renders wasteful any special measures taken or cycles
spent to do lock free reads.

> Many existing application defined algorithms that decided to use STL
> containers in multithreaded environments use mutexs to protect everything
> anyway; therefore they are already compatible with the requirements of the
> highly flexible lock-free algorithm I am using. Application defined
> algorithm can literately pick and choose exactly which if their "read-only
> operations" can be executed without a lock. For instance, a thread that
> makes frequent read-only const_iteration's over a shared collection would be
> a great candidate to "lock-free-ize"...

I thought you were "lock-free-izing" the container. If so, you have
done nothing to protect the entire process of iterating over it from
being disturbed by writes. If not, maybe you should revise your
description of what you're proposing.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Chris Thomasson

Sep 25, 2005, 10:08:58 AM

Chris Thomasson

Sep 25, 2005, 10:08:23 AM
>> a well built generic lock-free read container doesn’t
>> need to rely on internally defined forms of mutual exclusion in order to
>> provide "read-only access" that doesn't use any nasty atomic operations
>> and/or StoreLoad style memory barriers...
>
> Of course. But the effort you put into making your vector lock-free is
> not needed to make the above work, so in effect it goes to waste.

Correct.


> , so in effect it goes to waste.

My solution, wrt STL containers, doesn’t involve lock-free or lock-based
writes. It’s strictly a solution for readers.

> And even lock free primitives like CAS are significantly more expensive
> than their "plain old" counterparts, so making the vector lock free
> actually imposes an efficiency cost for no benefit in this case.

Humm... I wonder what “plain old” counterparts you might be referring to? I
totally agree that atomic operations are very expensive for modern
processors to execute. This is one of the main reasons why I am working on
completely eliminating them from read-only iterations.


> Anyway "read-only access" usually works just fine even with a
> completely thread-unaware container

Humm... This is not really true in a multi-threaded environment...


What happens if a writer thread pops a node and frees it while there are a
couple of reader threads concurrently performing iterations? How can we
efficiently and portably ensure that the readers don’t segfault? I will
refer to this specific situation as the “reader/writer problem”.


Now, you might be thinking that shared_ptr could be a viable solution to the
problem. Unfortunately, it’s simply not compatible; believe it or not,
shared_ptr is not a “true” atomic reference count:


http://groups.google.com/group/comp.lang.c++.moderated/msg/0c6f5c5ab72164d0?hl=en

http://groups.google.com/group/comp.programming.threads/msg/a60ce03f406dfcb5

http://groups.google.com/group/comp.programming.threads/msg/93983d56839ffb45


Even if shared_ptr could handle this type of reference counting, it still
relies on expensive atomic operations and StoreLoad style memory barriers
for virtually every reference modification. This produces a high amount of
unneeded overhead.


I guess one could use full-blown garbage collection. IMHO, that would be
overkill and, unfortunately, garbage collectors still produce unneeded
overhead in the form of stop-the-world techniques, memory barriers, etc...


Luckily, all is not lost! :)

There are some existing solutions to the reader/writer problem:

IBM: Read, Copy and Update (RCU)

IBM: Safe Memory Reclamation (SMR)

SUN: Repeat Offender Problem (ROP)

All of these work; however, they are not practical for any sort of “generic
solution”. Here are just a couple of reasons why:

http://groups.google.com/group/comp.programming.threads/msg/b28bdb0051e2c2c6


> , so I'm not sure what you're driving at here.

I am very interested in incorporating a flexible solution to the
reader/writer problem into STL container logic. This would give them the
inherent ability to perform “extremely well” on various modern
multi-processor systems.


>>> Suppose we had lock-free vectors. Now one thread calls
>>> sort(v.begin(), v.end()) while another calls v.push_back(5).
>>
>> These would be considered writes.
>
> Yes.
>
>>> What does either thread see?
>>
>> It depends on which thread executes first; remember writes are mutually
>> excluded.
>
> The individual writes to the vector are mutually excluded (when you
> have control over them -- which you don't, since the sort algorithm
> gets random access iterators and can thus grab addresses and
> references into the vector), but the call to sort consists of many
> individual writes to the vector. Even if you _could_ mutually exclude
> each one (which you can't, for reasons I just described) there's no
> reason to think that push_back doesn't sneak between two of them.

The calls to sort and push_back could be protected by an external mutex. The
individual writes would fall under the same critical-section; therefore they
would be atomic. Please note, I am not advocating lock-free writes in any
way, shape or form. You are 100% correct in your assertion that lock-free
writes are generally useless wrt STL Containers; however, this is not true
for lock-free reads...
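The external-locking pattern described here can be sketched with today's standard primitives (C++11 std::mutex and std::thread, used for brevity; the 2005-era equivalent would be pthreads or Boost.Threads). All names are illustrative:

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// The container itself stays thread-unaware; the application serializes
// each whole operation (sort, push_back) under one external mutex.
static std::mutex g_lock;
static std::vector<int> shared_vec{3, 1, 2};

void thread_a_sort() {
    std::lock_guard<std::mutex> guard(g_lock);  // whole sort is one critical section
    std::sort(shared_vec.begin(), shared_vec.end());
}

void thread_b_push() {
    std::lock_guard<std::mutex> guard(g_lock);  // push_back cannot interleave with sort
    shared_vec.push_back(5);
}

std::size_t run_both() {
    std::thread a(thread_a_sort), b(thread_b_push);
    a.join();
    b.join();
    return shared_vec.size();  // either ordering leaves 4 elements
}
```

Either thread may win the race for the lock, but each complete call runs atomically with respect to the other, which is exactly the guarantee the sort/push_back example needs.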


>>> It seems to me that even if it's possible to
>>> come up with coherent semantics for this case (which in itself is
>>> debatable), it's impossible to come up with _useful_ semantics.
>>> If containers are to be shared across threads, their state will almost
>>> always be just a small piece of some larger invariant, in which case
>>> you'll need to synchronize or "lock-free-ize" at a different level.
>>
>> This is completely compatible with virtually zero-overhead lock-free
>> reads.
>
> Yes, but it renders wasteful any special measures taken or cycles
> spent to do lock free reads.

How so? In the realm of lock-free reads, special measures simply have to be
taken if you don’t want your collection to basically be a massive
race-condition. Writers need to be specially coordinated with the readers in
order to solve the nasty reader/writer problem...


>> Many existing application defined algorithms that decided to use STL
>> containers in multithreaded environments use mutexs to protect everything
>> anyway; therefore they are already compatible with the requirements of
>> the
>> highly flexible lock-free algorithm I am using. Application defined
>> algorithm can literately pick and choose exactly which if their
>> "read-only
>> operations" can be executed without a lock. For instance, a thread that
>> makes frequent read-only const_iteration's over a shared collection would
>> be
>> a great candidate to "lock-free-ize"...
>
> I thought you were "lock-free-izing" the container.

Not the entire container. I just think that they should give an application
programmer the ability to “lock-free-ize” read-only iterations. That's all.


> If so, you have
> done nothing to protect the entire process of iterating over it from
> being disturbed by writes.

This is exactly where a reader/writer solution would come into play. They
can directly address this type of situation.


> If not, maybe you should revise your description of what you're proposing.

I agree. I have a feeling that I am not stating my case sufficiently clearly.
For that, I am sorry. Let me attempt to clear some things up:


1. I am proposing incorporating a solution to the reader/writer problem into
the STL container logic.

2. I am not advocating lock-free writes in any way, shape, or form.


I think that STL containers should provide the ability for lock-free
read-only access to application programmers. This would allow STL containers
to efficiently address multi-processor systems. I believe this ability would
further help C++ enter the world of high-performance multi-threading...


Any thoughts?


BTW:

Thank you, David, for your input, patience and more importantly, your time.

I really do appreciate it.

:)

kimo

Sep 25, 2005, 10:01:13 AM
Hey Chris-

Since you have patented the method and we are all programmers here,
please post the actual code or detailed pseudo code, so there is no
more beating around the bush-

I think we are interested in your innovation but are getting caught
up on definition of terms.

Also this will allow people to perform benchmarking, and an analysis of
the tradeoffs of this technique with others that are out there -
(memory, SMP, Distributed, Message based etc)

Chris Thomasson

Sep 25, 2005, 10:13:27 AM
>> I created and patented an algorithm that I am using to build
>> high-performance shared containers.
>
> How does this compare to Microsoft's SLists (nonblocking singly-linked
> lists), that are part of the Windows SDK?

They are simply not comparable because the "SList" algorithm exists in a
completely different category. BTW, did you know that this particular one
has been around for ages, and is better known as the IBM FreeList:


http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf


Look in appendix A, under multi-processing examples. You will actually find
Microsoft's algorithm in an IBM publication...

;)


For what its worth, here is my old and experimental x86 implementation of
the IBM FreeList:

http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html


The algorithm resides in the following functions:

ac_i686_stack_mpmc_push_cas
np_ac_i686_stack_mpmc_pop_dwcas


The data-structure is here:

http://appcore.home.comcast.net/appcore/include/cpu/i686/ac_i686_h.html

ac_i686_stack_mpmc_t


You can use this on Linux; no windows required!

;)


Please note, my code uses the mfence instruction for its SMR implementation.
This instruction doesn't work on every x86. You might need to change it to
a (lock xadd) instruction that operates on a dummy location.
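Both mfence and a locked read-modify-write on a dummy location amount to a full barrier (one that includes StoreLoad ordering). For readers without the exact hardware, here is a hedged sketch of the two equivalents in portable modern C++; on x86, compilers typically emit mfence or a locked instruction for either:

```cpp
#include <atomic>

// Dummy word for the locked-RMW variant of the barrier.
static std::atomic<int> g_dummy{0};

// Portable full barrier: the direct analogue of mfence.
void full_barrier_via_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Equivalent in effect to "lock xadd" on a dummy location:
// a no-op atomic RMW that still orders surrounding accesses.
void full_barrier_via_locked_rmw() {
    g_dummy.fetch_add(0, std::memory_order_seq_cst);
}
```

The fetch_add of zero never changes the dummy word; its only purpose is the implied ordering, which is exactly the trick the original code used on pre-SSE2 x86 processors.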


>> I have only used the invention for shared containers so far...
>
> Which containers have you implemented?

Mainly just singly and doubly-linked lists, hash-maps, arrays and trees. I
am also implementing smart-pointers, eventcounts, and memory allocators...


> Do you have performance data?

I have a whole lot of performance data. I am getting ready to present it. It
should be ready before 2006. There will also be some portable and adjustable
demo applications that show off some rather amazing performance benefits.
The demos will show how reference counting and linked-lists can be
efficiently implemented for multi-processor systems.


For instance... There is a demo that creates a group of reader threads and a
couple of writer threads. Each reader acquires a reference to a shared
object, reads some data from the object, randomly yields, releases the
reference and repeats until the object is null. The writer threads, every so
often, swap the shared object with a new one, and then destroy the old
copy. The writer threads swap in a null pointer to end the test after they have
performed all of their writes. I measure the average number of reads from the
shared object a reader thread performed per second during the duration of
the test.
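For reference, a minimal mutex-based baseline with the same shape as the demo described above can be sketched like this (all names are illustrative, yielding is omitted, and this is the slow coarse-lock baseline, not VZOOM):

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct Shared { int payload; };

static std::mutex g_mtx;
static std::shared_ptr<Shared> g_obj;

// One reader: acquire a reference, read some data, release, repeat
// until a writer has swapped in null.
std::size_t reader() {
    std::size_t reads = 0;
    for (;;) {
        std::lock_guard<std::mutex> guard(g_mtx);
        if (!g_obj) return reads;   // a writer ended the test
        (void)g_obj->payload;       // "read some data from the object"
        ++reads;
    }
}

// One writer: swap the shared object several times (the old copy is
// destroyed inside the lock), then swap in null to end the test.
void writer(int updates) {
    for (int i = 0; i < updates; ++i) {
        std::lock_guard<std::mutex> guard(g_mtx);
        g_obj = std::make_shared<Shared>(Shared{i});
    }
    std::lock_guard<std::mutex> guard(g_mtx);
    g_obj.reset();
}

void run_benchmark(int nreaders, int updates) {
    g_obj = std::make_shared<Shared>(Shared{0});
    std::vector<std::thread> readers;
    for (int i = 0; i < nreaders; ++i)
        readers.emplace_back([] { (void)reader(); });
    std::thread w(writer, updates);
    for (auto& t : readers) t.join();
    w.join();
}
```

Every acquire/release here takes the global mutex, which is precisely the per-read overhead the lock-free schemes in the numbers below are avoiding.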


Here are some quick numbers for a very common processor:


P4 HyperThreaded 3.06ghz
( I have also tested SMP )


Boost Mutex
10 Readers - 2 Writers ( 500 updates )
-------------------------
22,254 avg. reads per-second per-reader.


Generic Proxy Based Collector
10 Readers - 2 Writers ( 500 updates )
-------------------------
583,045 avg. reads per-second per-reader.


SMR Hazard Pointers
10 Readers - 2 Writers ( 500 updates )
-------------------------
706,881 avg. reads per-second per-reader.


VZOOM
10 Readers - 2 Writers ( 500 updates )
-------------------------
1,235,159 avg. reads per-second per-reader.


The reason VZOOM allows the reader threads to achieve so many reads
per-second is because, unlike the others, it doesn't use any loops, atomic
operations, and/or memory barriers to acquire/release a reference to the
shared object.


When the demo is published, you will be able to "test and tweak" on various
systems to see some of the numbers for yourself...

;)

Joe Seigh

Sep 25, 2005, 10:15:28 AM
David Abrahams wrote:

> "Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> writes:
>
>>Many existing application defined algorithms that decided to use STL
>>containers in multithreaded environments use mutexs to protect everything
>>anyway; therefore they are already compatible with the requirements of the
>>highly flexible lock-free algorithm I am using. Application defined
>>algorithm can literately pick and choose exactly which if their "read-only
>>operations" can be executed without a lock. For instance, a thread that
>>makes frequent read-only const_iteration's over a shared collection would be
>>a great candidate to "lock-free-ize"...
>
>
> I thought you were "lock-free-izing" the container. If so, you have
> done nothing to protect the entire process of iterating over it from
> being disturbed by writes. If not, maybe you should revise your
> description of what you're proposing.

He's proposing a readers/writers solution, i.e. write with concurrent
reads. It's for scalability which is mostly subjective. If you
don't think you have a scalability problem then you don't need lock-free.
If you have a single threaded application then you won't expect better
performance on a multiprocessor vs. a single processor system and you
don't need lock-free. If you have a multi-threaded application and
you don't expect better performance the more processors you have then
you don't need lock-free.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


Torsten Robitzki

Sep 27, 2005, 10:34:33 AM
David Abrahams wrote:

> Torsten Robitzki <MyFir...@Robitzki.de> writes:
>>
>>Lock-Free or Not Lock Free, I think that STL-Containers have a interface
>>that is quit useless for beeing used in a threaded eviroment.
>
>
> That's quite a bit too strong.
>
> The STL Container interface makes them useless **as shared data
> structures** in a threaded environment **unless you use external
> locks**.


Yes, you are right. And as Chris said, most of the STL implementations
should be usable if only read accesses occur.

regards
Torsten

Markus Elfring

Sep 27, 2005, 10:51:27 AM
> I have a whole lot of performance data. I am getting ready to present it. It
> should be ready before 2006. There will also be some portable and adjustable
> demo applications that show off some rather amazing performance benefits.
> The demos will show how reference counting and linked-lists can be
> efficiently implemented for multi-processor systems.

Would you like to compare your approach with this library implementation?
http://www.cs.chalmers.se/~noble/

Regards,
Markus

gottlo...@gmail.com

Sep 27, 2005, 10:51:05 AM

Chris Thomasson wrote:
> ... VZOOM...

Only one question: if you are patenting it, what good is it to us?

Joe Seigh

Sep 27, 2005, 10:28:26 PM
gottlo...@gmail.com wrote:
> Chris Thomasson wrote:
>
>>... VZOOM...
>
>
> Only one question: if you are patenting it, what good is it to us?

A better question would be what's not been or being patented? Can
you list the lock-free techniques that are in the public domain?
Even techniques using tracing GC that is atomically thread-safe
like Java uses are being patented. Sun has a patent on a lock-free
linked list in Java. Patent activity in this area (in everything
actually) has been really intense lately. If you're a late arrival
in this area of computing, you won't find much left that's free to use.
There will be lock-free libraries and api's. They just won't be
part of an open and free standard.

And another thing is I don't think you can actually put anything
in the public domain even if you wanted to as a practical matter.
A blocking patent on top of a public domain idea can have you owning
that idea. The initial presentation of any non-trivial idea is likely
to have minor errors or omissions. You can patent those bug fixes
or omissions. If you patent the underlying idea then you diminish
the relative power of the blocking patents.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


Chris Thomasson

Sep 28, 2005, 6:15:26 AM
"kimo" <kimocr...@gmail.com> wrote in message
news:1127609579.5...@o13g2000cwo.googlegroups.com...

> Hey Chris-
>
> Since you have patented the method and we are all programmers here,
> please post the actual code or detailed psuedo code, so there is no
> more beating around the bush-
>
> I think we are are interested in your innovation but are getting caught
> up on definition of terms.

I am sorry for that. Unfortunately, I simply have to run some documentation
by another party before I have permission to publicly post some details on
how my stuff actually works. This should be cleared before 2006, and then I
can go ahead and post some fairly detailed pseudo-code and figures. The
problem is there are a couple of embodiments that fall under the same
claims, and it's hard to choose which one to publish. I am not sure I should
post the "preferred embodiment", when there are others that still generate
high performance numbers, but don't give away something "too private..."

;)

Chris Thomasson

Sep 28, 2005, 6:15:48 AM
> Would you like to compare your approach with this library implementation?
> http://www.cs.chalmers.se/~noble/

Not comparable. All of Noble's stuff is statically sized. My algorithm tracks
the memory that a lock-free algorithm uses. It's a tool that can be used to
build scalable and dynamically sized lock-free data structures, or allow
for the lock-free traversal of various linked structures while there are
concurrent modifications. Mutexes, read-write locks, and SMR- or ROP-like
algorithms are better candidates for comparison. RCU algorithms would
probably not be a fair comparison, because in RCU you have to drop all of
your references before you exit a quiescent state; a quiescent state is
basically a "safe-point" between "collected-regions"...
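The quiescent-state discipline mentioned above can be illustrated with per-thread epoch counters. This is a simplified sketch with assumed names, not RCU proper and not the vZOOM algorithm: each reader bumps its counter when it passes a point where it holds no references, and a reclaimer waits until every counter has advanced past a snapshot before reclaiming anything retired earlier:

```cpp
#include <atomic>
#include <thread>

constexpr int kReaders = 2;

// One epoch counter per reader thread; zero-initialized at static storage.
static std::atomic<unsigned> g_epoch[kReaders];

// Called by reader 'tid' only while it holds no references (a "safe-point").
void quiescent_point(int tid) {
    g_epoch[tid].fetch_add(1, std::memory_order_release);
}

// Blocks until every reader has passed through at least one quiescent
// point since the snapshot; objects retired before the snapshot are then
// unreachable and may be reclaimed.
void wait_for_grace_period() {
    unsigned snap[kReaders];
    for (int i = 0; i < kReaders; ++i)
        snap[i] = g_epoch[i].load(std::memory_order_acquire);
    for (int i = 0; i < kReaders; ++i)
        while (g_epoch[i].load(std::memory_order_acquire) == snap[i])
            std::this_thread::yield();
}
```

The "drop all references before the quiescent point" rule is what makes the scheme sound, and it is exactly the restriction that references persisting across sync epochs (as claimed for vZOOM) would lift.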

Anyway, I have to run some documentation through another party before I have
permission to publicly post some details on how my stuff actually works.
This should be cleared before 2006, and then I can post some pseudo-code and
figures. The problem is there are a couple of embodiments that fall under
the same claims, and it's hard to choose which one to publish. I am not sure
I should post the preferred embodiment, when there are others that still
generate high performance numbers, but don't give away something too
private...

;)


Chris Thomasson

Sep 28, 2005, 6:01:25 PM
> your references before you exit a quiescent state; a quiescent state is
^^^^

I got that backwards! It should say "enter" instead of "exit"!!

Sorry.

Chris Thomasson

Dec 9, 2005, 5:58:51 AM
"Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> wrote in message
news:MaidnfVDJOk...@comcast.com...
> Should it be done? I wonder how popular a read optimized and memory
> barrier
> free library of STL containers would actually be. Does anybody know if
> there
> are any plans to incorporate "zero-overhead" lock-free techniques into the
> STL? It seems that something like this is going to be badly needed in the
> near feature... The performance difference between normal lock-free and
> zero-overhead is quite remarkable. Perhaps I should create a demo
> virtually
> zero-overhead container class...
>
> Would anybody be interested in such a thing?

Well, I finally setup a site for VZOOM at:

http://home.comcast.net/~vzoom/

Various demo applications will be available for download before 2006. Some
documentation will be posted sometime in January 2007. I am sorry for the
delay, but I am just so damn busy! Oh well. Anyway, I really do believe you
will find the techniques I used to implement VZOOM to be pretty nifty. IMHO,
it's an elegant and straightforward algorithm.

For those of you who are interested, you might want to take a quick look at a
particular algorithm that I created and used to benchmark against VZOOM
during its early development phase. It's your basic proxy collector scheme.
It doesn't require DWCAS. The basic trick is to align collector structures
on a 128 or greater byte-boundary. The extra space is used as a master
reference count. Differential counting between the master count and a
per-collector private count along with back-linked reference counting are
used to manage the lifetime of the collectors and to ensure strict FIFO
ordering:

http://groups.google.com/group/comp.programming.threads/msg/3b0b0439045991f4?hl=en
(FIFO ordering wrt certain proxy designs, like the following, is crucial)
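The alignment trick can be sketched as follows: a 128-byte boundary guarantees the low 7 bits of a collector pointer are zero, so one machine word can carry both the pointer and a small count, and both can be updated with a single-word CAS (no DWCAS). The field layout here is an assumption for illustration, not the actual proxy collector's design:

```cpp
#include <cstdint>

// 128-byte alignment frees the low 7 bits of every collector pointer;
// those bits are reused as a small embedded reference count.
constexpr std::uintptr_t kAlign     = 128;
constexpr std::uintptr_t kCountMask = kAlign - 1;  // low 7 bits

// Pack a collector pointer and a count (0..127) into one word.
std::uintptr_t encode(void* collector, unsigned count) {
    return reinterpret_cast<std::uintptr_t>(collector) | (count & kCountMask);
}

// Recover the aligned collector pointer.
void* decode_ptr(std::uintptr_t word) {
    return reinterpret_cast<void*>(word & ~kCountMask);
}

// Recover the embedded count.
unsigned decode_count(std::uintptr_t word) {
    return static_cast<unsigned>(word & kCountMask);
}
```

Because pointer and count live in the same word, a plain compare-and-swap on that word observes and updates both atomically, which is what makes the differential-counting bookkeeping possible without double-width CAS.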


This is some of the first demo code I can provide right now. This proxy
collector algorithm has some nice performance characteristics. This is
mainly due to the fact that only one atomic sequence requires a loop; the
CAS logic in the pc_dec(...) function:

http://home.comcast.net/~vzoom/demos/pc_sample.c

However, believe it or not ;), the VZOOM algorithm blows this one completely
out of the water. This is because it is highly distributed and does not use
atomic ops/membars to acquire/release a reference to a shared data-object.
The overall synchronization scheme can be tailored to an application's needs,
and it's also a "lot" more portable. VZOOM can be configured to only rely on
dependent-load ordering (e.g., read_barrier_depends()). IMHO, it's about as
portable as you can get. VZOOM and the presented proxy collector demo code
can be compatible with virtually all of the existing user created
"thread-safe" STL-based algorithms out there. A user algorithm is probably
using external locking to protect accesses to any shared STL-containers.
Believe it or not, this pattern goes very well with proxy collector
behavior(s) in general. A nifty STL-implementation could assume that if it
is being used in a multi-threaded environment the user must be providing a
form of external mutual-exclusion. Therefore, all accesses to the
STL-container would fall under a "critical-section". This means it could
reap all of the benefits of being in a "collected-region". A collected
region can be as simple as this:


enter_proxy_collected_region(...);

/* if the user needs any external sync, invoke it here */

/* access STL-container(s) */

/* if the user used external sync, revoke it here */

leave_proxy_collected_region(...);


Does this look similar to a solution that involves mutexes? Before I go on
and on, I should ask if you have any thoughts/questions? I am looking
forward to discussing the subject in-depth...
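The enter/leave bracket above maps naturally onto a C++ RAII guard. Here is a sketch with stub entry points; the function names and the depth counter are hypothetical stand-ins for a real collector's API:

```cpp
#include <vector>

// Stand-in entry points: a real proxy collector would pin/unpin a
// collector structure here. The depth counter just makes the stubs testable.
static int g_region_depth = 0;

void enter_proxy_collected_region() { ++g_region_depth; }
void leave_proxy_collected_region() { --g_region_depth; }

// RAII wrapper: the region is left on every exit path, including exceptions,
// which is exactly the property that makes the bracket safe to use around
// arbitrary STL-container accesses.
struct CollectedRegion {
    CollectedRegion()  { enter_proxy_collected_region(); }
    ~CollectedRegion() { leave_proxy_collected_region(); }
};

int read_under_region(const std::vector<int>& v) {
    CollectedRegion region;        // enter_proxy_collected_region
    int sum = 0;
    for (int x : v) sum += x;      // access STL container
    return sum;                    // leave_proxy_collected_region on scope exit
}
```

This is the same shape as a scoped lock guard, which is why the bracket drops so easily into code that already uses external mutual exclusion.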


Thank you for your Time,
Chris Thomasson

Chris Thomasson

Dec 13, 2005, 6:07:18 PM
DOH!

I made a mistake in the code when I was stripping out some of my library
specific stuff. The cas loop in the pc_dec( ... ) function did not reload
the cmp var when a cas failed!

Here is the fixed code:

http://home.comcast.net/~vzoom/demos/pc_sample.c

Sorry!
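For what it's worth, this class of bug is hard to reproduce with a std::atomic CAS loop, because a failed compare_exchange writes the freshly observed value back into the comparand, which is exactly the reload the broken pc_dec loop was missing:

```cpp
#include <atomic>
#include <thread>
#include <vector>

static std::atomic<long> g_count{0};

// A correct CAS loop: on failure, compare_exchange_weak refreshes
// 'expected' with the current value of g_count, so the retry always
// compares against a freshly reloaded comparand.
void add_via_cas_loop(long n) {
    long expected = g_count.load(std::memory_order_relaxed);
    while (!g_count.compare_exchange_weak(expected, expected + n)) {
        // 'expected' was just updated by the failed CAS; simply retry.
    }
}

// Hammer the loop from several threads and return the final count.
long stress(int nthreads, int iters) {
    std::vector<std::thread> ts;
    for (int t = 0; t < nthreads; ++t)
        ts.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) add_via_cas_loop(1);
        });
    for (auto& t : ts) t.join();
    return g_count.load();
}
```

With the reload missing, a contended loop either spins forever or applies a stale-based update; with it, every retry is against the latest value.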

Chris Thomasson

Jun 28, 2006, 6:28:00 AM
"Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> wrote in message
news:MaidnfVDJOk...@comcast.com...
> Should it be done? I wonder how popular a read optimized and memory
> barrier
> free library of STL containers would actually be. Does anybody know if
> there
> are any plans to incorporate "zero-overhead" lock-free techniques into the
> STL? It seems that something like this is going to be badly needed in the
> near feature... The performance difference between normal lock-free and
> zero-overhead is quite remarkable. Perhaps I should create a demo
> virtually
> zero-overhead container class...
>
> Would anybody be interested in such a thing?

Well, I really lucked out this time!


My vZOOM library is one of the four finalists of the SUN CoolThreads Prize
for Innovation Contest:

https://coolthreads.dev.java.net/

http://newsletter.paragon-systems.com/articles/94/1/news/15572


"Hopefully", I will soon be able to provide my easy to use, and 100%
working, lock-free/zero-overhead framework to the programming community. To
show off my marketing side, ;), it has one of the fastest lock-free memory
allocators out there:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8245f4b48591fc69/e3efa5628aad4a82?q=lock-free+slab&rnum=1#e3efa5628aad4a82
(quick and dirty sketch)

http://groups.google.com/group/comp.programming.threads/msg/0aab29e24e6631ab
http://groups.google.com/group/comp.programming.threads/msg/907573c0e81dba56
http://groups.google.com/group/comp.programming.threads/msg/1d3aeaa6ebf5181f
(part of the implementation I used in vZOOM)


Even outperforms Michael's lock-free allocator... By a wide margin... And it
has a built-in virtually zero-overhead proxy garbage collection scheme that,
IMHO, works well and is very similar to C++ RAII... I am soooo very sorry
for keeping this thing so secret; I am in a serious competition, that just
got a "bit" more serious... :O

P.S.

Here is what my "first round" winning vZOOM project consists of:


Distributed Lock-Free Memory Allocator
http://groups.google.ca/group/comp.programming.threads/browse_thread/thread/8245f4b48591fc69/e3efa5628aad4a82?q=lock-free+slab&rnum=1#e3efa5628aad4a82

(BTW, this is a blazingly fast allocator, wow! :)


Lock-Free Single-Producer/Consumer Queue
http://appcore.home.comcast.net/

(This is also extremely fast)


Lock-Free EventCount
http://groups.google.ca/group/comp.programming.threads/browse_thread/thread/aa8c62ad06dbb380/79016c78a3696f6c?q=portable+eventcount&rnum=1#79016c78a3696f6c

(the eventcount solution is far more flexible than built-in condition
blocking; everything has its tradeoffs, and it's good to cover both sides
of the spectrum... ;)


POSIX Version Of vZOOM
http://groups.google.ca/group/comp.programming.threads/msg/59e9b6e427b4a144

(Portable to any arch/os combination that has mutexes and dependent load
ordering; a super portable solution to the reader-writer problem in general)

For the second round I am going to have to add one of my old algorithms:

Lock-Free Stack /w Built-In Conditional Blocking
http://groups.google.de/group/comp.programming.threads/msg/632b6bdc2a137459

(the built-in blocking is more efficient than an eventcount solution)
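For readers unfamiliar with eventcounts: the idea is to separate announcing the intent to wait from actually blocking, so a waiter can re-check its predicate in between and never lose a wakeup. A minimal mutex/condvar sketch follows, with illustrative names; a real eventcount would use atomics on the fast path rather than taking the lock in prepare_wait:

```cpp
#include <condition_variable>
#include <mutex>

class EventCount {
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_ = 0;
public:
    // Step 1: capture the current count before re-checking the predicate.
    unsigned prepare_wait() {
        std::lock_guard<std::mutex> g(m_);
        return count_;
    }
    // Step 2: block only if no signal has arrived since 'key' was captured.
    void wait(unsigned key) {
        std::unique_lock<std::mutex> g(m_);
        cv_.wait(g, [&] { return count_ != key; });
    }
    // A signaler bumps the count and wakes any waiters.
    void signal() {
        { std::lock_guard<std::mutex> g(m_); ++count_; }
        cv_.notify_all();
    }
};
```

A waiter does key = prepare_wait(), re-checks its own condition, then calls wait(key); a signaler updates shared state first and calls signal() second. If the signal lands between the two waiter steps, wait(key) returns immediately, which is the lost-wakeup protection that makes the primitive composable with lock-free queues and stacks.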


I believe that my project has the potential to help the programming
community efficiently address the concurrency revolution that has recently
trampled over our industry in general...

Joe Seigh

Jun 29, 2006, 4:54:49 AM
Chris Thomasson wrote:
> "Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> wrote in message
> news:MaidnfVDJOk...@comcast.com...
>
>>Should it be done? I wonder how popular a read optimized and memory
>>barrier
>>free library of STL containers would actually be. Does anybody know if
>>there
>>are any plans to incorporate "zero-overhead" lock-free techniques into the
>>STL? It seems that something like this is going to be badly needed in the
>>near feature... The performance difference between normal lock-free and
>>zero-overhead is quite remarkable. Perhaps I should create a demo
>>virtually
>>zero-overhead container class...
>>
>>Would anybody be interested in such a thing?
>
>
> Well, I really lucked out this time!
>
>
> My vZOOM library is one of the four finalists of the SUN CoolThreads Prize
> for Innovation Contest:
>
> https://coolthreads.dev.java.net/
>
> http://newsletter.paragon-systems.com/articles/94/1/news/15572
>
....

Go for it!

The atomic_ptr stuff is pretty much dead because of patent issues.
Though I will probably see how much money can be made with what
are known as blocking patents. RCU+SMR would have been a really
good one. Plus you don't actually have to write and support code.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


Chris Thomasson

Jul 7, 2006, 7:58:30 PM
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:jPOdnWmFiOw08D_Z...@comcast.com...

> Chris Thomasson wrote:
>> "Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> wrote in
>> message
>> news:MaidnfVDJOk...@comcast.com...
[...]

>> My vZOOM library is one of the four finalists of the SUN CoolThreads
>> Prize
>> for Innovation Contest:
>>
>> https://coolthreads.dev.java.net/
>>
>> http://newsletter.paragon-systems.com/articles/94/1/news/15572
>>
> ....
>
> Go for it!
>
> The atomic_ptr stuff is pretty much dead because of patent issues.
> Though I will probably see how much money can be made with what
> are known as blocking patents. RCU+SMR would have been a really
> good one. Plus you don't actually have to write and support code.

Humm... The vZOOM patent application has claims that could block a RCU or
SMR "upgrade" patent...

I claim that a plurality of persistent references can be acquired or
released without the use of atomic operations and/or memory barriers... I
claim that a plurality of persistent references can persist across two or
more successive synchronization epochs... I claim that a synchronization
epoch may be executed by a cpu or thread on an episodic or periodic basis...

I claim that a polling entity may be executed on an episodic or periodic
basis... I claim that a polling entity may poll for a plurality of
synchronization epochs on an episodic or periodic basis... I claim that a
polling entity may detect that a plurality of cpu or thread has recently
executed a synchronization epoch on an episodic or periodic basis... I claim
that a polling entity may poll a plurality of persistent references after
synchronization epoch detection on an episodic or periodic basis...

That "seems" to block RCU or SMR upgrades...

Joe Seigh

Jul 8, 2006, 7:29:11 PM
:)

Well, the RCU+SMR stuff is in the public domain. I haven't mentioned
which additional techniques, if any exist, would be applicable as
blocking patents. One of the reasons for deactivating the
atomic-ptr-plus subprojects was so there'd be no clues here.
Plus I can take my time in deciding whether the techniques are
nontrivial enough.

Blocking or defensive patents are improvements on the original patents.
Stuff that didn't occur to the original inventors. Sometimes the patent
office will grant patents for really trivial improvements that the original
inventors thought too obvious to mention in the claims and that can be real
annoying.

I don't think this will affect C++0x if they keep all their support for
atomic ops at a low level where they won't actually be using any algorithms.
So the actual lock-free stuff will go in as proprietary libraries, which
probably precludes STL or something like it ever being lock-free.

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.


Chris Thomasson

unread,
Jul 11, 2006, 4:45:03 PM7/11/06
to
"Joe Seigh" <jsei...@xemaps.com> wrote in message
news:JvKdnTsP5J1QnTLZ...@comcast.com...

> Chris Thomasson wrote:
>> "Joe Seigh" <jsei...@xemaps.com> wrote in message
>> news:jPOdnWmFiOw08D_Z...@comcast.com...
>>
>>>Chris Thomasson wrote:
>>>
>>>>"Chris Thomasson" <_no_spam_cristom@no_spam_comcast._net> wrote in
>>>>message
>>>>news:MaidnfVDJOk...@comcast.com...
>>
>> [...]
>>
>>>>My vZOOM library is one of the four finalists of the SUN CoolThreads
>>>>Prize
>>>>for Innovation Contest:
>>>>
>>>>https://coolthreads.dev.java.net/
>>>>
>>>>http://newsletter.paragon-systems.com/articles/94/1/news/15572
>>>>
>>>
>>>....

[...]


>> That "seems" to block RCU or SMR upgrades...
>>
>>
> :)
>
> Well, the RCU+SMR stuff is in the public domain. I haven't mentioned
> which additional techniques, if any exist, would be applicable as
> blocking patents.

Actually, the stuff I have is different than RCU, SMR or RCU+SMR. Here are
some basic points:


- It differs from RCU because references can persist across sync epochs
(rcu-grace/quiescent); this is a fundamental difference...


- It differs from SMR because it does not use any sort of "hazard pointer
scheme". vZOOM is built on one of my distributed proxy reference counting
techniques. Also, with vZOOM, there is no bound put on the number of
references a thread may acquire at any one time; this is a fundamental
difference...


- It differs from RCU+SMR for the same reasons it differs from SMR...
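To make the "distributed proxy reference counting" contrast with SMR concrete: the actual vZOOM technique is unpublished, so the following is only a generic split-counter sketch under my own naming. Each thread counts its acquires and releases in thread-private storage, and a collector later sums the per-thread deltas; unlike SMR's fixed hazard-pointer slots, nothing bounds how many references a thread holds:

```cpp
#include <cassert>
#include <vector>

// Generic distributed ("split") reference count. The owning thread
// updates its own slot with plain stores -- no shared atomic traffic
// on the fast path. A reference handed to another thread may be
// released there, so an individual delta can go negative; only the
// sum across all threads is meaningful.
struct PerThreadCount {
    long acquires = 0; // owner-thread writes only
    long releases = 0;
};

long outstanding_refs(const std::vector<PerThreadCount>& counts) {
    long total = 0;
    for (const auto& c : counts) total += c.acquires - c.releases;
    return total;
}
```

A proxy object becomes reclaimable once the summed count is zero and a synchronization epoch has passed, so the collector cannot observe a transiently-zero sum mid-transfer.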


What do you think?


> One of the reasons for deactivating the
> atomic-ptr-plus subprojects was so there'd be no clues here.

[...]

;)


> Blocking or defensive patents are improvements on the original patents.
> Stuff that didn't occur to the original inventors. Sometimes the patent
> office will grant patents for really trivial improvements that the
> original
> inventors thought too obvious to mention in the claims, and that can be
> really annoying.
>
> I don't think this will affect C++0x if they keep all their support for
> atomic ops at a low level where they won't actually be using any
> algorithms.

Humm... I would be fine with that... Come to think about it... C++0x doesn't
"really" need any garbage collectors either...

We can build them ourselves with the low-level atomic ops and memory barrier
functionality that will be reliably enforced by a coherent C++ Standard...

:)
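"Build them ourselves" is plausible: the classic example of layering lifetime management on raw atomics is the reference-count-release idiom, sketched below with the `std::atomic` interface that C++0x was then debating (and C++11 eventually shipped). The `RefCounted`/`release` names are mine:

```cpp
#include <atomic>
#include <cassert>

struct RefCounted {
    std::atomic<int> count{1};
};

// Returns true when the caller dropped the last reference and must
// destroy the object. The release decrement orders this thread's
// prior writes before the decrement; the acquire fence on the final
// decrement makes all threads' writes visible before destruction.
bool release(RefCounted& obj) {
    if (obj.count.fetch_sub(1, std::memory_order_release) == 1) {
        std::atomic_thread_fence(std::memory_order_acquire);
        return true; // caller may now reclaim the object
    }
    return false;
}
```

This is exactly the kind of algorithm that needs only well-specified low-level ops from the Standard, not a standardized collector.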


[...]

Joe Seigh

unread,
Jul 12, 2006, 6:48:06 PM7/12/06
to
Chris Thomasson wrote:
> "Joe Seigh" <jsei...@xemaps.com> wrote in message
> news:JvKdnTsP5J1QnTLZ...@comcast.com...

>>
>>I don't think this will affect C++0x if they keep all their support for
>>atomic ops at a low level where they won't actually be using any
>>algorithms.
>
>
> Humm... I would be fine with that... Come to think about it... C++0x doesn't
> "really" need any garbage collectors either...
>
> We can build them ourselves with the low-level atomic ops and memory barrier
> functionality that will be reliably enforced by a coherent C++ Standard...
>

Discussion on the C++0x mailing list has picked up again but they still
haven't figured out how to define semantics for the atomic ops. I'd
say at this point there's a minimal chance that they come up with anything
that's generally useful (as opposed to being useful for specific preconceived
purposes). So it looks like lock-free implementors will be stuck with
non-portable inline assembler for the foreseeable future.
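In hindsight the pessimism was only half right: C++11 did standardize exactly the loop that lock-free implementors were hand-coding in assembler. A Treiber-style push in the standardized interface, as a sketch (pop is omitted deliberately, since safe reclamation of popped nodes is the hard part and the subject of this whole thread):

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value;
    Node* next;
};

// Lock-free push onto an intrusive stack. On CAS failure,
// compare_exchange_weak reloads the current head into n->next,
// so the loop body needs no explicit re-read.
void push(std::atomic<Node*>& head, Node* n) {
    n->next = head.load(std::memory_order_relaxed);
    while (!head.compare_exchange_weak(n->next, n,
                                       std::memory_order_release,
                                       std::memory_order_relaxed))
        ; // retry with the refreshed head
}
```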

--
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software.

