
Simple Proxy Collector...


Chris M. Thomasson

Mar 27, 2018, 11:56:07 PM
Fwiw, since I have been working on fractals so much, I was wondering
whether creating a C++11 implementation of a proxy collector of mine would
be as easy as it was in the past. For some more information, please read here:

https://groups.google.com/d/topic/lock-free/X3fuuXknQF0/discussion

Think of user-space RCU

https://en.wikipedia.org/wiki/Read-copy-update

Here is my C++11 code. It should compile and run on 32-bit x86 with
support for DWCAS, and it should run on any other architecture with
support for lock-free DWCAS. In other words, the following struct needs
to be at least lock-free atomic wrt std::atomic<T>:
_______________________
struct anchor
{
std::intptr_t count;
node* head;
};
_______________________

http://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free

is_lock_free() has to return true for std::atomic<anchor>. The lock-free
part is the DWCAS in the acquire function.
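
For reference, here is a tiny standalone check of that requirement. This is
only an illustrative sketch: the anchor layout mirrors the one in the full
listing below, and on some compilers you may need extra switches (for
example -mcx16, and possibly -latomic, on 64-bit GCC/Clang) before the
DWCAS is actually lock-free:
_______________________
#include <atomic>
#include <cstdint>
#include <cstdio>

struct node; // forward declaration; the real type is in the listing below

struct anchor
{
std::intptr_t count;
node* head;
};

int main()
{
std::atomic<anchor> head(anchor{ 0, nullptr });

// The proxy collector requires this to print "yes".
std::printf("std::atomic<anchor> is lock-free: %s\n",
head.is_lock_free() ? "yes" : "no");

return 0;
}
_______________________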

Here is my C++11 code [1]:

https://pastebin.com/raw/KAt4nhCj

The core algorithm has been verified with Relacy Race Detector:

https://groups.google.com/d/topic/lock-free/QuLBH87z6B4/discussion

I am still writing up how this can be useful; sorry about that.
Basically, it's like RCU: reader threads can read a data-structure
concurrently while writer threads mutate it. Writer threads can collect
objects while reader threads are still accessing them. Everything is
accounted for using differential counting.
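
In a nutshell, the usage pattern looks like this. It is just a condensed
sketch of the reader and writer sides of the test program in [1]; gc, node
and object are the types from that listing:
_______________________
static ct::proxy::gc g_gc;

void reader_side()
{
// Pin the current proxy node: this is the read-side "critical region".
ct::proxy::node* pcn = g_gc.acquire();

// ... iterate the shared data-structure; nothing observed in here can be
// destroyed until the proxy node is released ...

g_gc.release(pcn);
}

void writer_side(ct::proxy::object* removed)
{
// After unlinking 'removed' from the shared data-structure, hand it to
// the collector; it is destroyed only after all prior readers drain.
g_gc.collect(removed);
}
_______________________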

I can get the code to compile and run on 32-bit x86.

Can anybody else get it to compile and/or run?

Thanks.


[1] code:
___________________________
/* Simple Proxy Collector (DWCAS) - C++11 Version

http://www.1024cores.net/home/relacy-race-detector

Copyright 3/27/2018
___________________________________________________*/


#if defined (_MSC_VER)
# define _ENABLE_ATOMIC_ALIGNMENT_FIX // for dwcas
#endif

#include <atomic>
#include <mutex>

#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <cassert>


static std::atomic<unsigned int> g_allocs(0);


/* Utilities
___________________________________________________*/
void ct_cout(
const char* msg
) {
static std::mutex g_cout_lock;
std::lock_guard<std::mutex> lock(g_cout_lock);
std::printf("%s", msg); // do not treat msg as a format string
std::fflush(stdout);
}


/* Memory Fence Abstraction
___________________________________________________*/
#define ct_mb_relaxed std::memory_order_relaxed
#define ct_mb_acquire std::memory_order_acquire
#define ct_mb_release std::memory_order_release
#define ct_mb_acq_rel std::memory_order_acq_rel
#define ct_mb_seq_cst std::memory_order_seq_cst
#define ct_mb_fence(mb_membar) std::atomic_thread_fence(mb_membar)


namespace ct {
namespace proxy {

// User object base
struct object
{
virtual ~object() {}
};


// Proxy node
struct node
{
std::atomic<std::intptr_t> count;
node* next;
object* obj;

node(node const& n)
: count(n.count.load(ct_mb_relaxed)), next(n.next), obj(n.obj)
{

}

node& operator =(node const& n)
{
count.store(n.count.load(ct_mb_relaxed));
next = n.next;
obj = n.obj;
return *this;
}

node(std::intptr_t count_, node* next_, object* obj_)
: count(count_), next(next_), obj(obj_)
{
g_allocs.fetch_add(1, ct_mb_relaxed);
}

~node()
{
g_allocs.fetch_sub(1, ct_mb_relaxed);
}
};


// DWCAS target
struct anchor
{
std::intptr_t count;
node* head;
};


// Proxy Collector
struct gc
{
std::atomic<anchor> head;

gc() : head(anchor { 0, new node(0, nullptr, nullptr) }) {}

~gc()
{
anchor cmp = head.load(ct_mb_relaxed);
assert(cmp.count > -1);
assert(cmp.head->next == nullptr);
prv_dtor(cmp.head);
}

// Release a reference
bool prv_release(node* n)
{
std::intptr_t count = n->count.fetch_sub(2, ct_mb_relaxed);
if (count == 3) return true;
return false;
}

// Destroy a node
void prv_dtor(node* n)
{
object* obj = n->obj;
if (obj != nullptr) delete obj;
delete n;
}

// Dump backlinks
void prv_dump(node* n)
{
node* cur = n->next;
n->next = nullptr;

// Release
while (prv_release(cur))
{
ct_mb_fence(ct_mb_acquire);
node* next = cur->next;
cur->next = n;
n = cur;
cur = next;
}

// Destroy
while (n)
{
node* next = n->next;
prv_dtor(n);
n = next;
}
}

// Collect a node
void prv_collect(node* n)
{
anchor xchg = { 0, n };

ct_mb_fence(ct_mb_release);

anchor cmp = head.exchange(xchg, ct_mb_relaxed);

ct_mb_fence(ct_mb_acquire);
cmp.head->next = xchg.head;
ct_mb_fence(ct_mb_release);

std::intptr_t count = cmp.head->count.fetch_add(cmp.count + 1, ct_mb_relaxed);

if (count + cmp.count == 0)
{
prv_dump(cmp.head);
}
}

// Acquire a node
node* acquire()
{
anchor cmp = head.load(ct_mb_relaxed);
anchor xchg = { cmp.count + 2, cmp.head };

while (! head.compare_exchange_weak(cmp, xchg, ct_mb_relaxed))
{
xchg = { cmp.count + 2, cmp.head };
}

ct_mb_fence(ct_mb_acquire);

return cmp.head;
}

// Release a node
void release(node* n)
{
ct_mb_fence(ct_mb_release);

if (prv_release(n))
{
ct_mb_fence(ct_mb_acquire);
prv_dump(n);
}
}

// Collect an object
void collect(object* obj)
{
prv_collect(new node(2, nullptr, obj));
}
};
}}


// Test object
struct foo : public ct::proxy::object
{
std::atomic<foo*> next;

foo(foo* next_ = nullptr) : next(next_)
{
g_allocs.fetch_add(1, ct_mb_relaxed);
}

virtual ~foo()
{
g_allocs.fetch_sub(1, ct_mb_relaxed);
}
};

// An atomic LIFO of test objects
struct foo_lifo
{
std::atomic<foo*> head;

foo_lifo(foo* next_ = nullptr) : head(next_) { }

void push(foo* n)
{
foo* cmp = head.load(ct_mb_relaxed);

do {
n->next.store(cmp, ct_mb_relaxed);
ct_mb_fence(ct_mb_release);
} while (! head.compare_exchange_weak(cmp, n, ct_mb_relaxed));
}

// Flush all items, and return LIFO list or nullptr.
foo* flush()
{
foo* cmp = head.exchange(nullptr, ct_mb_relaxed);
if (cmp) ct_mb_fence(ct_mb_acquire);
return cmp;
}
};


/* Test program
____________________________________________ */
#include <thread>
#include <functional>

#define READERS 7
#define WRITERS 5
#define ITERS 1000000
#define THREADS (READERS + WRITERS)


// Our shared state
struct ct_test_state
{
ct::proxy::gc gc;
foo_lifo lifo;

ct_test_state() : gc(), lifo() {}
};


// Read the tstate.lifo shared data-structure.
void ct_reader_thread(
ct_test_state& tstate
) {
ct_cout("ct_reader_thread: Enter\n");

// Read...
for (unsigned int i = 0; i < ITERS * 2; ++i)
{
// Acquire proxy
ct::proxy::node* pcn = tstate.gc.acquire();

// Iterate list
{
foo* cur = tstate.lifo.head.load(ct_mb_relaxed);
ct_mb_fence(ct_mb_acquire);

while (cur)
{
foo* next = cur->next.load(ct_mb_relaxed);
std::this_thread::yield();
cur = next;
}
}

// Release proxy
tstate.gc.release(pcn);

if (! (i % (ITERS / 64)))
{
ct_cout("ct_reader_thread: Processing...\n");
}
}

ct_cout("ct_reader_thread: Exit\n");
}


// Mutate the tstate.lifo shared data-structure.
void ct_writer_thread(
ct_test_state& tstate
) {
ct_cout("ct_writer_thread: Enter\n");

// Create and collect some nodes
for (unsigned int i = 0; i < ITERS; ++i)
{
// Create
tstate.lifo.push(new foo(nullptr));
std::this_thread::yield();
tstate.lifo.push(new foo(nullptr));
std::this_thread::yield();
tstate.lifo.push(new foo(nullptr));
std::this_thread::yield();

// Collect
foo* cur = tstate.lifo.flush();

while (cur)
{
std::this_thread::yield();
foo* next = cur->next.load(ct_mb_relaxed);
tstate.gc.collect(cur);
std::this_thread::yield();
cur = next;
}

if (!(i % (ITERS / 64)))
{
ct_cout("ct_writer_thread: Processing...\n");
}
}

ct_cout("ct_writer_thread: Exit\n");
}


int main()
{
{
ct_test_state tstate;

std::thread threads[THREADS];

// Reader threads
for (unsigned int i = 0; i < READERS; ++i)
{
threads[i] = std::thread(ct_reader_thread, std::ref(tstate));
}

// Writer threads
for (unsigned int i = READERS; i < THREADS; ++i)
{
threads[i] = std::thread(ct_writer_thread, std::ref(tstate));
}

// Join threads
for (unsigned int i = 0; i < THREADS; ++i)
{
threads[i].join();
}
}

// Check for memory leaks
if (g_allocs.load(ct_mb_relaxed) != 0)
{
std::printf("MEMORY LEAK!\n");
assert(false);
}

std::printf("\n\nFin!\n");
std::fflush(stdout);
std::getchar();

return EXIT_SUCCESS;
}
___________________________

Chris M. Thomasson

Mar 30, 2018, 12:30:39 AM
On 3/27/2018 8:55 PM, Chris M. Thomasson wrote:
> Fwiw, since I have been working on fractals so much, I was wondering
> whether creating a C++11 implementation of a proxy collector of mine would
> be as easy as it was in the past. For some more information, please read here:
>
> https://groups.google.com/d/topic/lock-free/X3fuuXknQF0/discussion
>
> Think of user-space RCU
>
> https://en.wikipedia.org/wiki/Read-copy-update
>
> Here is my C++11 code. It should compile and run on 32-bit x86 with
> support for DWCAS, and it should run on any other architecture with
> support for lock-free DWCAS. In other words, the following struct needs
> to be at least lock-free atomic wrt std::atomic<T>:
> _______________________
> struct anchor
> {
>     std::intptr_t count;
>     node* head;
> };
> _______________________
>
> http://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free
>
> is_lock_free() has to return true for std::atomic<anchor>. The lock-free
> part is the DWCAS in the acquire function.
>
> Here is my C++11 code [1]:
>
> https://pastebin.com/raw/KAt4nhCj
[...]
> I am still writing up how this can be useful; sorry about that.
> Basically, it's like RCU: reader threads can read a data-structure
> concurrently while writer threads mutate it. Writer threads can collect
> objects while reader threads are still accessing them. Everything is
> accounted for using differential counting.
>
> I can get the code to compile and run on 32-bit x86.
>
> Can anybody else get it to compile and/or run?
>
> Thanks.
>
>
> [1] code:
> ___________________________
[...]
> ___________________________

Perhaps I can explain this through a series of questions. Let me try to
start...

Question 1

How many people want to iterate over a shared data-structure without
using a read/write mutex? In other words, would making the writers run
concurrently with the readers be beneficial to you at all? Readers can
run full steam ahead while the writers do not interrupt them. This can
increase concurrency and overall performance, and can get rid of certain
bottlenecks, especially if the problem is mostly read-based.
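
For contrast, the classic read/write-mutex pattern that the question refers
to looks something like the sketch below (using std::shared_timed_mutex,
which is C++14; the type and names here are just for illustration). The
point is that the shared lock still makes readers and writers block each
other, which is exactly what the proxy collector's acquire()/release() pair
avoids:
_______________________
#include <shared_mutex> // std::shared_timed_mutex / std::shared_lock (C++14)
#include <mutex> // std::unique_lock
#include <list>

struct shared_list
{
std::shared_timed_mutex mtx;
std::list<int> items;

void read_all()
{
// Readers do not block each other, but they do block writers.
std::shared_lock<std::shared_timed_mutex> lock(mtx);
for (int x : items) { (void)x; /* read x... */ }
}

void push(int x)
{
// A writer blocks every reader for the duration of the update.
std::unique_lock<std::shared_timed_mutex> lock(mtx);
items.push_back(x);
}
};
_______________________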

Chris M. Thomasson

Apr 5, 2018, 3:50:38 PM
Think about it... being able to have a shi% load of reader threads
concurrently iterating a shared data-structure, without ever being blocked
by any concurrent writer threads. I can think of many situations where this
is a desirable feature...

Can anybody else think of one?

Chris M. Thomasson

Apr 8, 2018, 12:32:54 AM
Think of frequent read-based database queries that you do not want to be
blocked up by other concurrent tasks that happen to have some writes...

Jorgen Grahn

Apr 9, 2018, 11:20:25 AM
On Sun, 2018-04-08, Chris M. Thomasson wrote:
> On 4/5/2018 12:50 PM, Chris M. Thomasson wrote:
>> On 3/29/2018 9:30 PM, Chris M. Thomasson wrote:
>>> On 3/27/2018 8:55 PM, Chris M. Thomasson wrote:
...
> Think of frequent read-based database queries that you do not want to be
> blocked up by other concurrent tasks that happen to have some writes...

I /do/ believe there are people who are interested in this stuff,
but they don't seem to read comp.lang.c++ ...

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Chris M. Thomasson

Apr 9, 2018, 5:32:13 PM
On 4/9/2018 8:20 AM, Jorgen Grahn wrote:
> On Sun, 2018-04-08, Chris M. Thomasson wrote:
>> On 4/5/2018 12:50 PM, Chris M. Thomasson wrote:
>>> On 3/29/2018 9:30 PM, Chris M. Thomasson wrote:
>>>> On 3/27/2018 8:55 PM, Chris M. Thomasson wrote:
> ...
>> Think of frequent read-based database queries that you do not want to be
>> blocked up by other concurrent tasks that happen to have some writes...
>
> I /do/ believe there are people who are interested in this stuff,
> but they don't seem to read comp.lang.c++ ...

Damn. Well, this simple proxy collector can help out. It is not the most
efficient solution, but it gives an idea of how something like this can
be implemented in pure C++11. Unfortunately, the following group seems
to have less traffic:

https://groups.google.com/forum/#!forum/lock-free

Argh!

Jorgen Grahn

Apr 10, 2018, 11:23:12 AM
Surely there's /some/ place where this is discussed openly? Although
if the forum is C-oriented, I imagine mentioning C++ can be difficult.

Richard

Apr 10, 2018, 3:55:24 PM
[Please do not mail me a copy of your followup]

Jorgen Grahn <grahn...@snipabacken.se> spake the secret code
<slrnpcplmj.31...@frailea.sa.invalid> thusly:

>Surely there's /some/ place where this is discussed openly? Although
>if the forum is C-oriented, I imagine mentioning C++ can be difficult.

Wherever it is, it is http based and not nntp based :-P
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Chris M. Thomasson

Apr 10, 2018, 4:37:29 PM
On 4/10/2018 8:22 AM, Jorgen Grahn wrote:
> On Mon, 2018-04-09, Chris M. Thomasson wrote:
>> On 4/9/2018 8:20 AM, Jorgen Grahn wrote:
>>> On Sun, 2018-04-08, Chris M. Thomasson wrote:
>>>> On 4/5/2018 12:50 PM, Chris M. Thomasson wrote:
>>>>> On 3/29/2018 9:30 PM, Chris M. Thomasson wrote:
>>>>>> On 3/27/2018 8:55 PM, Chris M. Thomasson wrote:
>>> ...
>>>> Think of frequent read-based database queries that you do not want to be
>>>> blocked up by other concurrent tasks that happen to have some writes...
>>>
>>> I /do/ believe there are people who are interested in this stuff,
>>> but they don't seem to read comp.lang.c++ ...
>>
>> Damn. Well, this simple proxy collector can help out. It is not the most
>> efficient solution, but it gives an idea of how something like this can
>> be implemented in pure C++11. Unfortunately, the following group seems
>> to have less traffic:
>>
>> https://groups.google.com/forum/#!forum/lock-free
>>
>> Argh!
>
> Surely there's /some/ place where this is discussed openly?

comp.programming.threads used to be the place a decade or so ago. Humm.
Perhaps we can try to revive it. ;^)


> Although
> if the forum is C-oriented, I imagine mentioning C++ can be difficult.

Well, then we go into C11. The Pelles C compiler can handle that. Ahh, I
should implement my proxy collector in C11.

Chris M. Thomasson

Apr 10, 2018, 4:37:56 PM
On 4/10/2018 12:55 PM, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> Jorgen Grahn <grahn...@snipabacken.se> spake the secret code
> <slrnpcplmj.31...@frailea.sa.invalid> thusly:
>
>> Surely there's /some/ place where this is discussed openly? Although
>> if the forum is C-oriented, I imagine mentioning C++ can be difficult.
>
> Wherever it is, it is http based and not nntp based :-P
>

Argh! ;^)