I've nearly finished my master thesis which covers the shift to
parallelism and C++0x's upcoming memory model. I'd really like get to
some comments on the C++ part from you experts. Of course, you are
more than welcome to comment on other parts too.
The document and some example lock-free C++0x code can be found at
http://74.53.180.2/~thorpe/
ABSTRACT
The first part is an overview of the paradigmatic shift to parallelism
that is currently taking place. It explains why processors needs to
become parallel, how they might function and what types of parallelism
that are. Given that information, it explains why threads and locks is
not a suitable programming model, how threading is being improved and
used to extract parallel performance and what problems awaits new
parallel programming models and how they might work. The final chapter
surveys the landscape of existing parallel software and hardware
projects and relates them to the overview. The overview is intended
for desktop and embedded programmers and architects.
The second part explains how to use C++'s upcoming memory model and
atomic API. It also relates the memory model to classical definitions
of distributed computing in an attempt to bridge the gap in
terminology between the research literature and C++. An implementation
of hazard pointers and a lock-free stack and queue are given as
example C++0x code. This part is aimed at expert C++ developers and
the research community.
Best Regards, Johan Torp
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
> An implementation
> of hazard pointers
Aren't those patented, and thus, unusable?
It's just sad how so many things in lock-free programming seem to be
patented. What's the point of research if it cannot be used in the
real world before decades?
Yes.
> and thus, unusable?
You would need to grab a licence from IBM.
> It's just sad how so many things in lock-free programming seem to be
> patented. What's the point of research if it cannot be used in the
> real world before decades?
RCU is patented and its used within the Linux Kernel. If you use Linux, your
using patented algorihtms indeed!
Well, there is still parts of the world where software patents are not
recognised, so unless you plan to export your software to parts where
they are you can still use them.
--
Erik Wikström
> RCU is patented and its used within the Linux Kernel.
That's because IBM allows royalty-free usage of RCU for any software
licensed under GPL.
On Sep 7, 2:28 am, Mathias Gaunard <loufo...@gmail.com> wrote:
> On 6 sep, 18:33,Johan Torp<johan.t...@gmail.com> wrote:
>
> > An implementation
> > of hazard pointers
>
> Aren't those patented, and thus, unusable?
Yes, http://www.freepatentsonline.com/y2006/0265373.html
I'm not sure how unusable they are, if they can be used in commercial
applications with Maged Michael's approval. I can also imagine that
Michael patented hazard pointers to not get run over by the patents of
big companies.
> It's just sad how so many things in lock-free programming seem to be
> patented. What's the point of research if it cannot be used in the
> real world before decades?
I agree, it's a real shame. Lock-free programming might have a short
shelf-life too, since transactional memory is hopefully on the way and
since non thread-based programming models will hopefully become more
and more commonplace. I.e., lock-free programming should be applied
now, to help extract performance out of today's threading solutions.
The whole concept of lock-free programming (not the lock-free
property) seems like a big hack to me, you really do not want to
program that way but have to with today's software stack if you are to
extract maximum throughput, responsiveness and to avoid dead-locks.
Johan
You use it like so:
struct A:
enable_mutual_ptr<A>//allocator and intrusive ref counter
{
int a;
int b;
};
mutual_ptr<A> global;//multiple threads can access atomically
void threadfunc(void*){
isolated_ptr<A> read;
isolated_ptr<A> update;
do{
read=global;
if(read){
update.reset(new A(*read));//copy
update->a=read->a+1011;//modify, etc
}else{
update.reset(new A);
//more stuff
}
}while(global.CAS(read,update));
}
The allocator serves a few purposes:
1. That only A's are allocated, since fields may be accessed after
return to the allocator (a standard allocator does not allow this)
2. I used DCAS internally, however special allocator would allow for
pointer compression if DCAS is undesirable
3. it is also lock-free
The basics of the guts are this:
When an A is constructed, a version is stored inside the object next
to the reference count. a 0 value for version is special.
The mutual_ptr stores both the pointer and the version
The release method checks both the reference count and the version. If
the versions match AND the refcount is 0, then the version is also set
to 0. (i.e. if refcount is 1 and version is X, set (version,refcount)-
>(0,0) atomically. Then delete.
//part of "enable_mutual_ptr"
friend void mutual_ptr_release(T const* p){
//storage_type is pair of refcount and version(tag)
do{
storage_type oldval(p->m_refs.get_data(),p->m_refs.get_tag());//tag
and count
unsigned new_tag=oldval.get_tag();
if(new_tag==0)return;
unsigned cnt=oldval.get_data();
if(cnt==1)new_tag=0;
storage_type newval(cnt-1,new_tag);
if(p->m_refs.CAS(oldval,newval)){
if(cnt==1) delete const_cast<T*>(p);
return;
}
}while(true);
}
Now, the tricky part comes in when making an isolated_ptr from a
mutual_ptr -- it is possible that mutual_ptr contains a stale pointer.
So instead of the traditional refcount increment, we do this
//part of "enable_mutual_ptr"
freind unsigned mutual_ptr_add_ref(T const* p, unsigned version){
//storage_type is pair of refcount and version
unsigned ret=0;
do{
storage_type oldval(p->m_refs.get_version(),p->m_refs.get_tag());//
tag and count
if(oldval.get_tag()!=tag)return 0;
ret=oldval.get_data()+1;
storage_type newval(ret,tag);
if(p->m_refs.CAS(oldval,newval))break;
}while(true);
return ret;
}
When the pointer is isolated, the traditional "increment the reference
count" can be done.
Now when making the copy, we must try a repeatable read (some will
argue that a repeatable read is not strictly lock-free, since is it
possible that no thread makes progress. However, it woould have the
obstruction free property--if I compressed the pointer, it would be
lock free)
template<class Y>
isolated_ptr(mutual_ptr<Y> const & mutual_rhs)
{
//m_p is member which is pair of pointer and version
do{
m_p=mutual_rhs.m_p;//pointer and version
if(!m_p.CAS(mutual_rhs.m_p,mutual_rhs.m_p))//not sliced
continue;
T* lcl=m_p.get_data();
if(lcl==0)return;
unsigned cnt = mutual_ptr_add_ref(lcl,m_p.get_tag());
if(cnt >0)//not a zombie
return;
else{
sched_yield();//backoff
}
//eventually mutual_rhs will either have a live value or a 0
}while(true);
}
These are basically the highlights. I use this inside some market data
servers (those things that takes all those stock market ticks and does
stuff with them --typically 100K msg per second with <1ms latency)
So there you have it. A lockfree pointer that has no patents (and not
even a copyright). I suppose a newsgroup is evidence enough of prior-
art is someone did try to patent it.
Lance
C++0x is geared toward providing the ability for programmers to be able to
create portable lock-free programming indeed! Good for C++.
Anyway, you have not done proper research because lock-free programming has
been around for ages. Sorry, but its has already been on the shelf for a
very long time indeed. One simple example:
http://publibz.boulder.ibm.com/epubs/pdf/dz9zr002.pdf
In appendix A, under multi-processing. Freelist and the execellent PLO
instruction.
> since transactional memory is hopefully on the way
BTW, did you know that transaction memory has been around for ages as well?
You need to do some more research. Think of the Journal File System for AIX
in rs/6000:
http://groups.google.com/group/comp.arch/msg/995379a16beb3b69
How would you solve the major caveats?
Live-lock for starters:
http://groups.google.com/group/comp.lang.c++/msg/93170262d5b55519
(read all; even Intel STM compiler does not have an answer!)
AFAICT, the best thing so far would be the excellent research from AMD.
However, its geared toward creating lock-free algorihtms:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8d9708596c832a18
Lock-free programming is not going away anytime soon indeed.
> and
> since non thread-based programming models will hopefully become more
> and more commonplace. I.e., lock-free programming should be applied
> now, to help extract performance out of today's threading solutions.
Sure. C++0x will be a GREAT help indeed.
> The whole concept of lock-free programming (not the lock-free
> property) seems like a big hack to me
Why???
> , you really do not want to
> program that way but have to with today's software stack if you are to
> extract maximum throughput, responsiveness and to avoid dead-locks.
This is nothing new.
I posted the following in a forum in which its on-topic:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a8a8fde275843504
Please respond as I am interested in your algorithm. Anyway, there are
several non-patented memory reclamation algorihtms here:
http://atomic-ptr-plus.sourceforge.net
Here is one of mine:
http://webpages.charter.net/appcore/misc/pc_sample_h_v1.html
Here is another:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/3ded0e3c5496eaa0
If you want instructions on how to use them, please post a query in
comp.programming.threads.
Right. Its up to the patent holder to decide how his/her invention is going
to be used!
Please check out Joe Seighs atomic_ptr:
http://atomic-ptr-plus.sourceforge.net
Its not patented and is fully lock-free, however, it uses DWCAS. You could
get around that by using an alignment trick.
I know, I've read plenty of articles which are 10-20 years old. I
probably wasn't clear enough.
I meant that lock-free programming for thread-based programming on
desktop computers mihgt have a short shelf-life. Lock-free programming
provides three things; throughput, responsiveness and the progress
condition. As other programming models (such as transactional memory,
more general purpose DLP models, message passing languages, other
models not based on threads etc) emerge lock-free programming will
become less important other than for the most performance critical
parts. Also (and perhaps more importantly), lock-free programming is
tied to shared memory architectures which might not stay mainstream
for desktop computing for more than 5-10 years (wild speculation, but
it seems possible).
> > since transactional memory is hopefully on the way
>
> BTW, did you know that transaction memory has been around for ages as well?
Yes. For instance, I mention a transactional memory paper from the 80s
(Tom Knight 86).
> Live-lock for starters:
>
> http://groups.google.com/group/comp.lang.c++/msg/93170262d5b55519
> (read all; even Intel STM compiler does not have an answer!)
I'm well aware that transactional memory is not a silver bullet and
that it might not even be viable as a programming model. Time will
tell, it will be interesting to see what happens when Sun releases
Rock. It is like all programming models, you have to have 100s or
1000s of large projects testing them before you can discern the real
pros and cons.
> AFAICT, the best thing so far would be the excellent research from AMD.
> However, its geared toward creating lock-free algorihtms:
Thanks for the pointers, I will look in to them.
> Lock-free programming is not going away anytime soon indeed.
I don't think so either. The question is how many programmers will be
doing lock-free programming and how many of the performance critical
languages and middleware will be using it.
> > and
> > since non thread-based programming models will hopefully become more
> > and more commonplace. I.e., lock-free programming should be applied
> > now, to help extract performance out of today's threading solutions.
>
> Sure. C++0x will be a GREAT help indeed.
I hope so, but I'm not 100% sure. It will probably be good for
targeting efficiency for one architecture and getting portable code
for others for free. But that does not mean that you get efficient
code on those other architectures. Which CMP architecture you're
targeting might decide which high-level algorithm you should use. It
can also affect lower level atomic API calls such as using
memory_order_seq_cst or adding some extra memory_order_acq_rel
operations to do some sycnhronization manually. Also, C++0x's memory
model might be to complicated. Perhaps it is not a good model to make
formal proofs in either. Again, time will tell if it is successful.
> > The whole concept of lock-free programming (not the lock-free
> > property) seems like a big hack to me
>
> Why???
It is a horribly complex way of programming. You manually have to
figure out how to perform operations that appear atomical on multiple
memory locations with CAS-loops and the like. It is not modular; You
basically have to consider ALL methods of a concurrent object together
to verify that your code is correct. You also have to keep track of
what OTHER threads are doing and manually do things like helping. Lock-
free objects to do not automatically compose with other lock-free
objects. It is very hard to spot, test and reproduce bugs. It is
definetely not a good environment for the mainstream parallel
programmer.
> > , you really do not want to
> > program that way but have to with today's software stack if you are to
> > extract maximum throughput, responsiveness and to avoid dead-locks.
>
> This is nothing new.
I'm not aiming to provide something new. I'm aiming to provide an
overview of contemporary desktop parallelism. The overview in itself
is new, not the content.
Johan