
a lock-based proxy collector?


Chris M. Thomasson

Dec 18, 2009, 8:36:00 AM
Okay, for those of you who are familiar with proxy garbage collectors, check
this simple pseudo-code out:
________________________________________________________________
struct node
{
    node* m_next;
};


struct collector
{
    node* m_defer; // = NULL
    rwmutex m_rwmutex;


    void acquire()
    {
        m_rwmutex.lock_read();
    }


    void release()
    {
        m_rwmutex.unlock_read();
    }


    // defer is only called while this collector is acquired
    void defer(node* n)
    {
        n->m_next = ATOMIC_SWAP_RELAXED(&m_defer, n);
    }


    node* collect()
    {
        m_rwmutex.lock_write();

        node* n = m_defer;

        m_defer = NULL;

        m_rwmutex.unlock_write();

        // if `n' is not NULL then all nodes in the
        // chain are now in a quiescent state.

        return n;
    }
};


template<size_t T_depth>
struct proxy
{
    uint32_t m_head; // = 0
    collector m_collect[T_depth];


    // loads a reference to the current collector
    collector& load()
    {
        uint32_t head = ATOMIC_LOAD(&m_head);

        return m_collect[head % T_depth];
    }


    // returns a reference to a collector that can be collected
    collector& mutate()
    {
        // atomically install the next collector
        uint32_t head = ATOMIC_FAA(&m_head, 1);

        return m_collect[head % T_depth];
    }
};
________________________________________________________________


Unless I am missing something, as long as `proxy::mutate()' is called on a
periodic/episodic basis by any thread, everything should work fine. The
performance of this collector should not be "all that bad", especially when
used with a "decent" read/write mutex implementation like:


http://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/65822
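For concreteness, the pseudo-code above can be fleshed out into compilable C++17, using `std::shared_mutex` for the read/write mutex and `std::atomic` for the swap and fetch-and-add. This is only a sketch of the intended call pattern under those assumptions, not a tuned implementation:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <shared_mutex>

struct node { node* m_next = nullptr; };

struct collector {
    std::atomic<node*> m_defer{nullptr};
    std::shared_mutex m_rwmutex;

    void acquire() { m_rwmutex.lock_shared(); }   // reader enters
    void release() { m_rwmutex.unlock_shared(); } // reader exits

    // defer() is only called while this collector is acquired
    void defer(node* n) {
        n->m_next = m_defer.exchange(n, std::memory_order_relaxed);
    }

    // grab the deferred chain; the write lock waits out all readers
    node* collect() {
        m_rwmutex.lock();
        node* n = m_defer.load(std::memory_order_relaxed);
        m_defer.store(nullptr, std::memory_order_relaxed);
        m_rwmutex.unlock();
        return n; // every node in the chain is now quiescent
    }
};

template<std::size_t T_depth>
struct proxy {
    std::atomic<std::uint32_t> m_head{0};
    collector m_collect[T_depth];

    collector& load() {   // current collector, for readers
        return m_collect[m_head.load() % T_depth];
    }
    collector& mutate() { // install next collector, return the previous one
        return m_collect[m_head.fetch_add(1) % T_depth];
    }
};
```

A reader brackets its traversal with `acquire()`/`release()` on the collector returned by `load()`; a writer unlinks a node, then pushes it through `defer()` on an acquired collector and later calls `mutate()`/`collect()` to reap it.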


BTW, this is not the same as stop-the-world style garbage collection. For
instance, if threads A-B are in collector C1; thread C mutates to collector
C2 and obtains C1; threads D-F enter C2; thread C collects C1.


This means that thread C does not need to synchronize with threads D-F
because they were not in collector C1 at the time of mutation. AFAICT, this
should significantly reduce chances of contention. Also, to further reduce
contention, thread C could collect C1 after it mutates and obtains another
collector. This would give threads A-B a chance to leave C1. You could allow
thread C to defer collections even further to try and reduce contention even
more.


I need to look over the code to see if I am missing anything, but other than
that, well, it seems like this is another example of how combining blocking
and non-blocking algorithms can be beneficial...


;^)

Chris M. Thomasson

Dec 18, 2009, 8:53:57 AM

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:3yLWm.35418$gd1....@newsfe05.iad...

> Okay, for those of you who are familiar with proxy garbage collectors,
> check this simple pseudo-code out:
> ________________________________________________________________
[...]

> ________________________________________________________________
>
>
>
>
> Unless I am missing something, as long as `proxy::mutate()' is called
> on a periodic/episodic basis by any thread, everything should work
> fine.

[...]

> I need to look over the code to see if I am missing anything,


I am missing the fact that this is going to require back-linking collector
objects which will ruin everything!!!

Oh well, idea shot down; shi% happens!

;^o

Dmitriy Vyukov

Dec 18, 2009, 9:48:14 AM
On Dec 18, 4:53 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> wrote in messagenews:3yLWm.35418$gd1....@newsfe05.iad...

>
> > Okay, for those of you who are familiar with proxy garbage collectors,
> > check this simple pseudo-code out:
> > ________________________________________________________________
> [...]
> > ________________________________________________________________
>
> > Unless I am missing something, as long as `proxy::mutate()' is called
> > on a periodic/episodic basis by any thread, everything should work
> > fine.
>
> [...]
>
> > I need to look over the code to see if I am missing anything,
>
> I am missing the fact that this is going to require back-linking collector
> objects which will ruin everything!!!
>
> Oh well, idea shot down; shi% happens!


Yeah, if a thread acquires collector Cn, it also acquires Cn+1, Cn+2...
till infinity.
I will think on this; probably the issue can be avoided somehow...

--
Dmitriy V'jukov

Dmitriy Vyukov

Dec 18, 2009, 2:39:56 PM

I think you may organize the collectors as a bounded circular queue.
"Head" denotes the current collector, "tail" denotes the "to reclaim"
collector; everything between head and tail is busy, everything
between tail and head is free.

It may also be beneficial if the reader that is last to release the
rw_mutex of a non-current collector purges it as well. Plus, mutate the
collector automatically when a garbage threshold is reached. This way you
do not need any "external periodic" activity.

--
Dmitriy V'jukov

Chris M. Thomasson

Dec 19, 2009, 1:07:45 AM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:3497f35c-827e-42c2...@s31g2000yqs.googlegroups.com...

Need to think some more, but it seems like you cannot reclaim any nodes
unless the head was equal to the tail at the time of mutation...

Chris M. Thomasson

Dec 20, 2009, 5:38:12 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:r3_Wm.9221$ft1....@newsfe10.iad...

[...]


Humm, well, it seems like a "possible" scheme might be to let the readers
simply load a reference to the current collector, just like in my first
example pseudo-code. The mutator collects nodes and only frees them when a
threshold is encountered. The mutator could do this by simply collecting
from the _current_ collector instead of the previous one. Okay, check this
scheme out:
______________________________________________________________________
template<size_t T_depth,
         size_t T_limit>
struct proxy
{
    uint32_t m_head; // = 0
    collector m_collect[T_depth];

    node* m_defer; // = NULL
    uint32_t m_defer_count; // = 0


    collector& load()
    {
        return m_collect[LOAD(&m_head) % T_depth];
    }


    // should only be called by a single thread at a time!!!
    void mutate()
    {
        if (m_defer_count >= T_limit)
        {
            collector& c = m_collect[LOAD(&m_head) % T_depth];

            uint32_t count;
            node* head = c.collect();
            node* tail = get_tail_from_lifo_list_and_count(head, count);

            if (tail)
            {
                tail->m_next = m_defer;
                m_defer = head;
                m_defer_count += count;
            }

            while (m_defer)
            {
                node* next = m_defer->m_next;
                delete m_defer;
                --m_defer_count;
                m_defer = next;
            }

            assert(! m_defer_count);
        }

        else
        {
            collector& c = m_collect[FAA(&m_head, 1) % T_depth];

            uint32_t count;
            node* head = c.collect();
            node* tail = get_tail_from_lifo_list_and_count(head, count);

            if (tail)
            {
                tail->m_next = m_defer;
                m_defer = head;
                m_defer_count += count;
            }
        }
    }
};
______________________________________________________________________


Need to think if this can possibly work or not... I will probably model this
in Relacy.

;^)

Chris M. Thomasson

Dec 20, 2009, 8:40:14 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:X6nXm.3563$yM1...@newsfe11.iad...
[...]

> ______________________________________________________________________
>
>
> Need to think if this can possibly work or not... I will probably model
> this in Relacy.

I don't think this will work. Need to think some more...

Chris M. Thomasson

Dec 20, 2009, 8:51:54 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:BNpXm.702$8e4...@newsfe03.iad...

[...]
>> Humm, well it seems like a "possible" scheme might be to let the readers
>> simply load a reference to current collector just like in my first
>> example pseudo-code. The mutator collects nodes and only frees them when
>> a threshold is encountered. The mutator could do this by simply
>> collecting from the _current_ collector instead of the previous one .
>> Okay, check this scheme out:
>> ______________________________________________________________________
> [...]
>> ______________________________________________________________________
>>
>>
>> Need to think if this can possibly work or not... I will probably model
>> this in Relacy.
>
> I don't think this will work. Need to think some more...

Well, so far Relacy disagrees with me because the following test I modeled
of the algorithm is churning right along as I type:


http://relacy.pastebin.com/f5292fbf4


And Relacy usually finds an error in about half a second or so! This
algorithm uses deferment in order to help reduce contention. In order to
free nodes, it collects from the current collector, dumps all previously
collected nodes, and keeps the nodes it got from the current collector until
the next synchronization epoch. I will investigate how much contention is
actually being avoided.

Chris M. Thomasson

Dec 20, 2009, 10:39:17 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:vYpXm.703$8e4...@newsfe03.iad...

> "Chris M. Thomasson" <n...@spam.invalid> wrote in message
> news:BNpXm.702$8e4...@newsfe03.iad...
> [...]
>>> Humm, well it seems like a "possible" scheme might be to let the readers
>>> simply load a reference to current collector just like in my first
>>> example pseudo-code. The mutator collects nodes and only frees them when
>>> a threshold is encountered. The mutator could do this by simply
>>> collecting from the _current_ collector instead of the previous one .
>>> Okay, check this scheme out:
>>> ______________________________________________________________________
>> [...]
>>> ______________________________________________________________________
>>>
>>>
>>> Need to think if this can possibly work or not... I will probably model
>>> this in Relacy.
>>
>> I don't think this will work. Need to think some more...
>
> Well, so far Relacy disagrees with me because the following test I modeled
> of the algorithm is churning right along as I type:
>
>
> http://relacy.pastebin.com/f5292fbf4
>

I posted the wrong version damn it! Here is the corrected link:


http://relacy.pastebin.com/f5253cbb2


Sorry about that! I will let you know if Relacy finds an error.


It seems like when I set the `COLLECTOR_DEPTH' macro to a higher value
(e.g., 32), contended read accesses are very few and far between, which
is damn good. Humm... Now, I need to test for write contention on
collections... Interesting.

Chris M. Thomasson

Dec 20, 2009, 10:48:23 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:dxrXm.3864$yM1....@newsfe11.iad...
[...]

>
> I posted the wrong version damn it! Here is the corrected link:
>
>
> http://relacy.pastebin.com/f5253cbb2
>
>
> Sorry about that! I will let you know if Relacy finds an error.

Yes. Relacy finds an access-to-freed-memory error in the stack on line `288'
on iteration `2448409'. Can you reproduce the error if you set the initial
state to 2448409? I am not exactly sure what's going on here, because if I
define `RL_FORCE_SEQ_CST' then the test passes that iteration and continues
on its merry way. This is telling me there is a memory visibility issue
somewhere. As of now, I cannot seem to find it!

DAMN!

;^)

Dmitriy Vyukov

Dec 20, 2009, 11:27:58 AM
On Dec 20, 6:48 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Chris M. Thomasson" <n...@spam.invalid> wrote in messagenews:dxrXm.3864$yM1....@newsfe11.iad...


Yes, I see the error on iteration 2448409.
I think that iterations with RL_FORCE_SEQ_CST defined are not the same
iterations as without RL_FORCE_SEQ_CST. The relaxed memory model
"consumes" additional indeterminacy, so to say. I.e., if the error is
NOT fixed by RL_FORCE_SEQ_CST, it will be detected on a different
iteration. So you had better re-run the test from scratch with
RL_FORCE_SEQ_CST defined.


--
Dmitriy V'jukov

Chris M. Thomasson

Dec 20, 2009, 12:39:11 PM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:c1971339-c0b7-40ad...@a32g2000yqm.googlegroups.com...

I am in the process of re-running the test with RL_FORCE_SEQ_CST defined.
Anyway, I believe I have found the error. Imagine there are 4 collectors
[C1..C4]. The tail has collected from [C1..C2]; now the head wraps to C2.
The tail detects this and frees the nodes collected from [C1..C2] without
collecting from [C3..C4]. This can cause a race condition in which the freed
nodes are still being accessed from [C3..C4].

Does that make sense to you?

Dmitriy Vyukov

Dec 20, 2009, 1:36:54 PM
On Dec 20, 8:39 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message


I can't say right now.
But I know that wrapping represents a problem for such algorithms. One
may have an old writer step on a new writer, an old writer step on a new
reader, or an old reader step on a new writer. All of these cases must be
correctly handled. A monotonic head/tail index may solve some of these
problems (because it effectively becomes an IBM-style ABA counter).
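A tiny illustration of that last point: if the head/tail positions are kept as free-running (monotonic) counters and only reduced modulo the depth when indexing, two positions can map to the same slot while the full counter values still tell the laps apart (the helper names below are illustrative):

```cpp
#include <cstdint>

constexpr std::uint32_t DEPTH = 4;

// A free-running position doubles as an ABA counter: two positions map
// to the same slot iff they differ by a multiple of DEPTH, but their
// full monotonic values still distinguish "same slot, different lap".
inline std::uint32_t slot(std::uint64_t pos) {
    return static_cast<std::uint32_t>(pos % DEPTH);
}

inline bool same_position(std::uint64_t a, std::uint64_t b) {
    return a == b; // compare full values, never the reduced slots
}
```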


--
Dmitriy V'jukov

Chris M. Thomasson

Dec 20, 2009, 1:54:03 PM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:0ed72eb1-1132-4a8f...@d20g2000yqh.googlegroups.com...

On Dec 20, 8:39 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
[...]

> > I am in the process of re-running the test with RL_FORCE_SEQ_CST
> > defined.

It passes, but I feel that there is still a problem.


> > Anyway, I believe I have found the error. Imagine there are 4 collectors
> > [C1..C4]. The tail has collected from [C1..C2], now the head wraps to
> > C2.
> > The tail detects this and frees nodes collected from [C1..C2] without
> > collecting from [C3..C4]. This can cause a race-condition in which the
> > freed
> > nodes are being accessed from [C3..C4].
> >
> > Does that make sense to you?


> I can't say right now.
> But I know that wrapping represents a problem for such algorithms. One
> may have old writer step on new writer, old writer step on new reader,
> old reader step on new writer. All these cases must be correctly
> handled. Monotonic head/tail index may solve some of these problems
> (because it effectively becomes IBM ABA counter).

Humm... You could not allow the head to acquire a collector that the tail
was collecting from. I am not sure how to handle this. Need to think some
more.

Dmitriy Vyukov

Dec 21, 2009, 3:44:38 AM
On Dec 20, 9:54 pm, "Chris M. Thomasson" <n...@spam.invalid> wrote:
> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message


Just a thought. Maybe introduce explicit state for each collector:
busy, collection pending, collecting, free. Then readers will be able
to detect collisions.
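That per-collector state idea can be sketched as a small transition table (the transition rules here are an illustrative guess at the lifecycle, not something the post specifies):

```cpp
#include <cstdint>

// Hypothetical per-collector lifecycle for collision detection:
// free -> busy (becomes the current collector),
// busy -> pending (mutated away; readers may still be inside),
// pending -> collecting (write lock taken, draining readers),
// collecting -> free (nodes reclaimed, slot reusable).
enum class cstate : std::uint8_t { free_, busy, pending, collecting };

inline bool can_advance(cstate from, cstate to) {
    switch (from) {
        case cstate::free_:      return to == cstate::busy;
        case cstate::busy:       return to == cstate::pending;
        case cstate::pending:    return to == cstate::collecting;
        case cstate::collecting: return to == cstate::free_;
    }
    return false;
}
```

A reader that loads a collector and then finds it in any state other than `busy` knows it collided with a mutation and can retry.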


--
Dmitriy V'jukov

Chris M. Thomasson

Dec 24, 2009, 8:50:02 PM
"Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
news:c085d068-f533-4ab7...@j14g2000yqm.googlegroups.com...

I "think" I have made an observation that "should" detect grace periods
sufficient to safely destroy nodes within this scheme. Take the case with an
array of 3 collector objects. And the index points to collector 1. A mutator
advances the index to collector 2; takes write lock on previous collector 1;
defers object A in current defer list; releases write lock on collector 1.
Now, assume current readers in collector 2 have raced in have a reference to
object A, which is entirely possible. Then the mutator performs the same
action and defers node B; the current collector is now 3. AFAICT, it is
completely impossible for any readers in collector 3 to have visibility to
node A, therefore node A is quiescent because write locks were acquired and
released for collector 1 and 2.

Okay, this tells me that two mutations have to occur in order for a node
deferred at the first mutation to become quiescent. To handle collector
index wrapping, the per-collector defer list would be a FIFO. Popping an
item would remove the oldest node, which would be the quiescent one.

A simple way to do this would be to use a single collector thread that has a
FIFO defer list. As soon as it detects that it has made two mutations, it
can pop a node from the FIFO and destroy it. I really need to draw all of
this out on paper; well, it seems to work out here:


http://relacy.pastebin.com/f571d4312
(passes 10,000,000 iterations without `RL_FORCE_SEQ_CST' defined)


Now, if this observation turns out to be true, then I believe I can apply it
to any "array-based" proxy collector scheme. Also, I don't even think I need
acquire/release semantics on the current collector index variable, because
the collector objects provide proper synchronization in and of themselves.

The key element is that objects are deferred in FIFO order. After two
mutations occur, one can pop a node off the FIFO and destroy it because it's
quiescent. I believe there are many improvements I can make to this
lock-based collector, as well as to non-blocking collectors based on the
same basic logic. I will do further testing with Relacy...
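The two-mutation rule can be modeled without any threading at all: tag each deferred node with the mutation epoch at which it was deferred, and destroy a node only once two further mutations have occurred. A hedged sketch (`epoch_gc` and its members are illustrative names, not from the actual Relacy model):

```cpp
#include <cstdint>
#include <deque>

// Single collector thread: nodes are deferred in FIFO order, tagged with
// the mutation epoch. A node deferred at epoch E is quiescent once the
// current epoch reaches E + 2, because write locks have since been
// acquired and released on two successive collectors.
struct epoch_gc {
    struct entry { int* payload; std::uint64_t epoch; };
    std::deque<entry> m_fifo;
    std::uint64_t m_epoch = 0;
    std::size_t m_freed = 0;

    void defer(int* p) { m_fifo.push_back({p, m_epoch}); }

    void mutate() {
        ++m_epoch;
        // pop-and-destroy everything that is two mutations old; FIFO
        // order guarantees the front entries are always the oldest
        while (!m_fifo.empty() && m_epoch >= m_fifo.front().epoch + 2) {
            delete m_fifo.front().payload;
            m_fifo.pop_front();
            ++m_freed;
        }
    }
};
```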


What do you think about this Dmitriy?


BTW, Merry Christmas Everybody!

:^D

Chris M. Thomasson

Dec 24, 2009, 8:56:42 PM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:URUYm.2918$8e4....@newsfe03.iad...
[...]

> Also, I don't even think I need
> acquire/release semantics on the current collector index variable because
> the collector objects provide proper synchronization in and of themselves.

On second thought, I do need acquire/release, because I want the deferred
node's queuing into the FIFO to be committed _before_ I bump the index.

[...]

Chris M. Thomasson

Dec 25, 2009, 12:20:18 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:URUYm.2918$8e4....@newsfe03.iad...

> "Dmitriy Vyukov" <dvy...@gmail.com> wrote in message
> news:c085d068-f533-4ab7...@j14g2000yqm.googlegroups.com...
[...]

> The key element is that objects are deferred in FIFO order. After two
> mutations
> occur, one can pop a node off the FIFO and destroy it because it's
> quiescent. I believe there are many improvements I can make to this
> lock-based collector as well as non-blocking collectors based on the same
> basic logic. I will do further testing with Relacy...

Test passes 50,000,000 iterations. I cannot seem to find any problems right
now.

Chris M. Thomasson

Dec 20, 2009, 9:30:41 AM
"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:vYpXm.703$8e4...@newsfe03.iad...

> "Chris M. Thomasson" <n...@spam.invalid> wrote in message
> news:BNpXm.702$8e4...@newsfe03.iad...
> [...]
>>> Humm, well it seems like a "possible" scheme might be to let the readers
>>> simply load a reference to current collector just like in my first
>>> example pseudo-code. The mutator collects nodes and only frees them when
>>> a threshold is encountered. The mutator could do this by simply
>>> collecting from the _current_ collector instead of the previous one .
>>> Okay, check this scheme out:
>>> ______________________________________________________________________
>> [...]
>>> ______________________________________________________________________
>>>
>>>
>>> Need to think if this can possibly work or not... I will probably model
>>> this in Relacy.
>>
>> I don't think this will work. Need to think some more...
>
> Well, so far Relacy disagrees with me because the following test I modeled
> of the algorithm is churning right along as I type:
>
>
> http://relacy.pastebin.com/f5292fbf4
>

I posted the wrong version damn it! Here is the corrected link:


http://relacy.pastebin.com/f53e494e6


Sorry about that! I am running the test with `RL_FORCE_SEQ_CST' defined in
order to see if the algorithm will work at all.


I don't know about this. I have my doubts on the algorithm.


Oh well.


Dmitriy Vyukov

Dec 26, 2009, 6:50:27 AM
On Dec 25, 4:50 am, "Chris M. Thomasson" <n...@spam.invalid> wrote:

> The key element is that objects are deferred in FIFO order. After two mutations
> occur, one can pop a node off the FIFO and destroy it because it's
> quiescent. I believe there are many improvements I can make to this
> lock-based collector as well as non-blocking collectors based on the same
> basic logic. I will do further testing with Relacy...
>
> What do you think about this Dmitriy?


I need to think on this.
Currently I am busy with my new CopyOnWrite-MultiVersion-ReaderWriter-Transactional-Sequence-Lock.
Doing tests and benchmarks on a 2-processor/8-core/16-hardware-thread Xeon
and an 8-core/64-hardware-thread UltraSPARC T2. It's fun to see how
traditional mutexes (pthread_mutex_t/pthread_spinlock_t/pthread_rwlock_t)
behave in synthetic benchmarks on such machines :)


--
Dmitriy V'jukov
