Re: [lock-free] Single target proxy collector

485 views
Skip to first unread message

Anthony Williams

unread,
Feb 15, 2013, 4:16:36 AM2/15/13
to lock...@googlegroups.com
On 14/02/13 23:07, jseig...@gmail.com wrote:
> C11 and C++11 left out the most powerful atomic primitives. Despite
> that you can do some limited implementations though they're much more
> complicated than they should be and a PITA to develop.

I'd love to know what primitives you think we left out. Pretty much
everything that was proposed for threading was included, and we're
actively seeking proposals for C++14 and C++17, so a clear description
of the desired facility could lead to it being in the next C++ standard.

> Also not
> helping, stdatomic.h seems to be missing from Ubuntu 12.10. I have no
> idea where it is or when or if it will ever show up.

I didn't think gcc was targetting C11. gcc 4.7 certainly has the
<atomic> header from C++11.

> http://sourceforge.net/projects/atomic-ptr-plus/files/stpc/

Interesting. I'll have to have a good look at this.

Cheers,

Anthony
--
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++11 thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976

Anthony Williams

unread,
Feb 15, 2013, 7:42:17 AM2/15/13
to lock...@googlegroups.com
On 15/02/13 12:09, jseig...@gmail.com wrote:
>
>
> On Friday, February 15, 2013 4:16:36 AM UTC-5, Anthony Williams wrote:
>
> On 14/02/13 23:07, jseig...@gmail.com <javascript:> wrote:
> > C11 and C++11 left out the most powerful atomic primitives. Despite
> > that you can do some limited implementations though they're much more
> > complicated than they should be and a PITA to develop.
>
> I'd love to know what primitives you think we left out. Pretty much
> everything that was proposed for threading was included, and we're
> actively seeking proposals for C++14 and C++17, so a clear description
> of the desired facility could lead to it being in the next C++
> standard.
>
>
> Double word compare and swap, or load locked/store conditional. Either
> one makes it trivial to implement a reference counted proxy collector,
> e.g. rcpc on the same project site. There was some "universal atomic
> primitive" elixir thinking going on. If double word compare and swap
> wasn't really important, why would IBM and Intel continue to support it
> every time they bumped up the word size. Intel didn't have to add
> CMPXCHG16B for compatibility since it was a new instruction. Try
> implementing a lock-free stack using C++11 or C11. It's trivial using
> double word compare and swap or ll/sc. The algorithm using the former
> has been around for almost 4 decades which is an eternity in the
> computing milieu.

If the CPU has DCAS, so does C++11:

struct dw{
unsigned first;
unsigned second;
};

std::atomic<dw> value;

void func(){
dw expected={42,97};
dw temp={1,2};
if(value.compare_exchange_strong(expected,temp)){
do_something();

Anthony Williams

unread,
Feb 15, 2013, 12:44:56 PM2/15/13
to lock...@googlegroups.com
On 15/02/13 17:38, jseig...@gmail.com wrote:
>
>
> On Friday, February 15, 2013 7:42:17 AM UTC-5, Anthony Williams wrote:
> If the CPU has DCAS, so does C++11:
>
> struct dw{
> unsigned first;
> unsigned second;
> };
>
> std::atomic<dw> value;
>
> void func(){
> dw expected={42,97};
> dw temp={1,2};
> if(value.compare_exchange_strong(expected,temp)){
> do_something();
> }
> }
>
>
> Interesting. It wasn't clear they were going to do that. I assume
> they emulate that on architectures with double word ll/sc, e.g. powerpc
> and arm. So what architectures are left out? Sparc, MIPS? Ones
> most of us don't care about probably.

If there is no native DCAS or double-word ll/sc that can be used, then
this will still compile, but will use a library-supplied mutex instead.
You can test this with the is_lock_free() member function.

Chris M. Thomasson

unread,
Feb 16, 2013, 3:10:54 PM2/16/13
to Scalable Synchronization Algorithms


On Feb 14, 3:07 pm, jseigh16...@gmail.com wrote:
> C11 and C++11 left out the most powerful atomic primitives.   Despite that
> you can do some limited implementations though they're much more
> complicated than they should be and a PITA to develop.  Also not helping,
> stdatomic.h seems to be missing from Ubuntu 12.10.   I have no idea where
> it is or when or if it will ever show up.
>


Nice to hear from you Joe! :^)


Anyway, FWIW, here is another "single target" proxy collector that
does not use any CAS at all:

http://pastebin.com/f71480694


This is in the from of a unit test that runs in Relacy Race Detector:

http://www.1024cores.net/home/relacy-race-detector


It uses just a single fetch-and-add for acquiring a proxy reference,
and single fetch-and-sub for releasing it. IMVHO, that's a pretty
"nifty" property. Here is a discussion I had regarding the algorithm
over on the Boost mailing list a couple of years ago:

http://article.gmane.org/gmane.comp.lib.boost.devel/198747


IMO, this algorithm can be extended to more than 2 collector objects
quite easily...




> http://sourceforge.net/projects/atomic-ptr-plus/files/stpc/

Looks very interesting! I will make sure to study this one Joe.


Thanks!

Anthony Williams

unread,
Feb 19, 2013, 6:19:17 AM2/19/13
to lock...@googlegroups.com
On 19/02/13 10:59, jseig...@gmail.com wrote:
> Side note. Does anyone recall any discussion of explicit intent to
> implement maximum width compare and swap in C++11 or C11 in the
> cpp-threads mailing list? I see some double word stuff in the NP
> atomics library. Also, not surprisingly, support for Itanium. I
> don't think it will show up in gcc. Some vendor compilers probably.

My Just::Threads implementation of the C++11 thread library for gcc 4.3+
and MSVC 2005+ has double-word compare and swap
(http://www.stdthread.co.uk).

For 64-bit targets, if the CPU doesn't have CMPXCHG16B then it falls
back to an internal mutex. You can test this at runtime with the
is_lock_free() member function I mentioned earlier.

I expect other vendors will follow suit in due course --- I had explicit
customer requests for this when it was omitted from early versions.

Chris M. Thomasson

unread,
Feb 20, 2013, 5:23:22 PM2/20/13
to lock...@googlegroups.com, jseig...@gmail.com

On first glance, I see 1 CAS-loop on `LL(…)’, and one or “more” CAS-loop(s) on the `SC(…)’.  Actually, it seems as if it can spiral into many CAS-loops on `SC()’ side of the algorithm.  Keep in mind that `setNLPred()’ also contains its own CAS-loop; `Transfer()’ has one too...

Actually, if I take the papers example code “literally”, and if (sizeof(int) == sizeof(void*)) happens to be true, then something like DWCAS will be required…     ;^)


My algorithm has no loops.  IMVVHO, since I am using raw atomic RMW operations that return no failure value in the sense that they _always_ mutate data. (e.g., Fetch-and-Add/Exchange instructions), well, that has to at least count for something damn it, on certain arch’s at least!   ;^/


Need to study STPC some more!   :^)


What do you think?



I hate dealing with any sort of imperial entanglement…


:^o


Chris M. Thomasson

unread,
Feb 20, 2013, 5:31:53 PM2/20/13
to lock...@googlegroups.com, jseig...@gmail.com


On Wednesday, February 20, 2013 2:23:22 PM UTC-8, Chris M. Thomasson wrote:

On first glance, I see 1 CAS-loop on `LL(…)’, and one or “more” CAS-loop(s) on the `SC(…)’.  Actually, it seems as if it can spiral into many CAS-loops on `SC()’ side of the algorithm.  Keep in mind that `setNLPred()’ also contains its own CAS-loop; `Transfer()’ has one too...

[...]

 

My algorithm has no loops.  

 
Well, one could count the call to `proxy<...>::prv_destroy(...)' as a loop.

My only argument is that I did not lace it with CAS-loops...

Also, there is no reason the thread that detected quiescence should be the exact same thread that iterates a list of dtors...

Chris M. Thomasson

unread,
Feb 26, 2013, 3:20:27 AM2/26/13
to lock...@googlegroups.com, jseig...@gmail.com
FWIW, here is some weird seqlock proxy collector hybrid thing. Needs mutex on collect phase, but hay now... It certainly keeps the collector in line (e.g., atomic) with the _current_ reference count or else the whole thing bites the dust!

;^o


http://pastebin.com/TgTcfYtR



Anyway, wrt stpc, the fact that you load a pointer to collector along with the CAS loop to a seq number in VERY interesting indeed!

:^)

Chris M. Thomasson

unread,
Feb 26, 2013, 4:45:38 AM2/26/13
to lock...@googlegroups.com, jseig...@gmail.com
FWIW, here is yet another proxy collector, just to show off how useful DWCAS can actually be:


;^)

Chris M. Thomasson

unread,
Oct 31, 2017, 1:32:07 AM10/31/17
to Scalable Synchronization Algorithms
On Friday, February 15, 2013 at 1:16:36 AM UTC-8, Anthony Williams wrote:
On 14/02/13 23:07, jseig...@gmail.com wrote:
> C11 and C++11 left out the most powerful atomic primitives.   Despite
> that you can do some limited implementations though they're much more
> complicated than they should be and a PITA to develop.

I'd love to know what primitives you think we left out. Pretty much
everything that was proposed for threading was included, and we're
actively seeking proposals for C++14 and C++17, so a clear description
of the desired facility could lead to it being in the next C++ standard.

> Also not
> helping, stdatomic.h seems to be missing from Ubuntu 12.10.   I have no
> idea where it is or when or if it will ever show up.

I didn't think gcc was targetting C11. gcc 4.7 certainly has the
<atomic> header from C++11.

> http://sourceforge.net/projects/atomic-ptr-plus/files/stpc/

Interesting. I'll have to have a good look at this.

It seems that Joe Seigh's posts got deleted from this thread. Damnm, that is a shame.

Dmitry Vyukov

unread,
Oct 31, 2017, 2:32:54 AM10/31/17
to lock...@googlegroups.com
Humm... Maybe Joe deleted it himself? At least we have it in Anthony's reply.
Reply all
Reply to author
Forward
0 new messages