my first RCU design - comments / thoughts, please!

Toebs Douglass

unread,

Feb 26, 2011, 9:29:24 AM2/26/11

to

Hi.

I'm now implementing RCU for liblfds.

I've come up with a design and I would really, REALLY appreciate
feedback / peer review. You all know how complex and subtle this stuff
is and can be, and this is my first attempt at RCU.

So...

Each thread has a thread-local singly-threaded list, where it stores
delete candidates.

There is a global counter which is the generation counter.

There is a global, lock-free add-only singly-linked list, with one
element per thread - these are the per-thread states for RCU. When a
thread begins, it scans the list for the first available state and adds
a new element if there are none. A wrapper function is implemented for
the OS new_thread() function, whereby a structure is returned, which
contains the normal OS output from that function (typically a thread
handle of some kind), as well as a pointer to the RCU state for that
thread in this list. When that thread calls the RCU functions, it must
pass in the pointer to its RCU state. This saves us having to do
lookups in that list every time we call an RCU function.

There is a global lock (a single lock-free flag) which is used with the
list when it comes to seeing if the generation counter can be
incremented, in a way I will explain.

Each thread goes about its business and when it deletes, the element in
question is moved to the thread-local delete candidate list, being
marked with the current generation counter. When an element is moved
into the delete candidate list, the thread checks the current value of
the global generation counter against a local copy and if the global
counter has advanced, updates the local copy and purges the delete
candidate list, free()ing all elements from the old local generation to
the new generation.

The per-thread RCU state contains two flags; one is raised when the
thread enters an RCU critical section and is lowered when it exits. The
other is raised upon exit (and is never lowered, except as part of a
scan of the list, which is described further down).

As such, we know when a thread is *in* an RCU section and if it has,
since the exited flag was last lowered, *exited* a section. (This is
all to do with being able to handle idle threads).

After exiting n RCU critical sections (or some other criteria), a thread
will attempt to hold the global lock on the list. If the lock cannot be
held, the thread simply continues, since it knows some other thread
already holds the lock and is performing the work it would itself now
try to perform, if it had taken the lock.

When a thread takes the lock, it then scans the list. The list contains
the value of the RCU flags for this thread *and the value of the exit
flag for the last time the list was checked*.

As such, we can see if a thread has exited an RCU critical section since
the last check (the exit flag is raised), or if the thread is *in* an
RCU critical section *now* (the "in" flag is raised) and we can see if a
thread has been idle - the old exit flag is lowered, the current flag is
lowered, and the "in" flag is lowered.

If all threads have exited an RCU critical section and/or been idle
since the last check (not entered or exited), then it is safe to
increment the generation counter. (New threads can be added at any time
- the list lock does not prevent this; but this is okay, since they
cannot be dependent on any elements which are already in delete
candidate lists. Threads which terminated will leave their RCU state
marked vacant, so they will be ignored).

If it is safe to do so, the checking thread makes a second pass down the
list, updating the old values of the RCU flags with the current values
and resetting the current exit flags to LOWERED. (This may overwrite an
exit which occurred, but it's an inefficiently only).

The checking thread then increments the global generation counter and
releases the list lock. Other threads, when they next exit an RCU
section, will notice the generation counter has changed and purge their
thread-local delete candidate lists accordingly.

---

So, that's it! It's a lot to take in; I'm obviously not anticipating
anyone really sitting down and working it all out! but it was very
helpful writing it out for an audience, and maybe also there will be
things in there people can spot and comment on, just reading lightly
over it all.

Dmitriy Vyukov

unread,

Feb 26, 2011, 1:18:53 PM2/26/11

to

Hi Toebs,

In general the design looks sane to me. Thread local defer lists are
good, thread local states are good.

Be careful to prevent a race between a thread entering into RCU read
section and a thread promoting epoch. RCU read lock hardly can be just
a set of the bit, it must be either a store-load-branch sequence (with
a potential store-load fence in between) or a load-store sequence
(which represents an acknowledgment of epoch change).

Also be careful with blocked threads or threads that do not execute
RCU read section for a long time, they should not prevent system-wide
memory reclamation.

Do you aware of URCU design? If not, check out the sources and google
a paper about it - especially (AFAIR) 3 work modes.

--
Dmitry Vyukov
All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net

Toebs Douglass

unread,

Feb 26, 2011, 2:14:23 PM2/26/11

to

In article <488cd2a0-dbd2-48de-b1d1-
fb15f2...@k16g2000vbq.googlegroups.com>, dvy...@gmail.com says...
> Hi Toebs,

Hey, Dmitry!

> In general the design looks sane to me. Thread local defer lists are
> good, thread local states are good.

Yay!

> Be careful to prevent a race between a thread entering into RCU read
> section and a thread promoting epoch. RCU read lock hardly can be just
> a set of the bit, it must be either a store-load-branch sequence (with
> a potential store-load fence in between) or a load-store sequence
> (which represents an acknowledgment of epoch change).

Right now my thought is that moving into an RCU read section will
require an atomic set of a flag, so that the thread checking the shared
list of RCU states will always correctly know when a thread really is
actually in an RCU read section. The flag that a thread has come *out*
of a section can be a normal variable - it doesn't matter if the list
checking thread sees it lowered when it's actually been raised - it's
just an inefficiency.

> Also be careful with blocked threads or threads that do not execute
> RCU read section for a long time, they should not prevent system-wide
> memory reclamation.

Idle threads are explicitly handled - I *know* when a thread is idle, so
I know I can safely advance the RCU generation counter. Blocked threads
however kill me stone dead. I don't see what can be done about them.

> Do you aware of URCU design? If not, check out the sources and google
> a paper about it - especially (AFAIR) 3 work modes.

I've not read the URCU design. I'm tempted to look, but I think you
really -learn- something by working it out from scratch yourself.

Thanks for looking it over, Dmitry! you're always there for the
newbies!

Toebs Douglass

unread,

Feb 26, 2011, 2:43:31 PM2/26/11

to

In article <MPG.27d36f79b...@news-europe.giganews.com>,
a...@b.com says...

> Right now my thought is that moving into an RCU read section will
> require an atomic set of a flag, so that the thread checking the shared
> list of RCU states will always correctly know when a thread really is
> actually in an RCU read section.

So I'm making a trade-off here. I could as an alternative provide the
user an API function where he indicates a thread will be idle for a long
time and should be ignored. Or I can track idleness myself - but that
requires an atomic set on entry to RCU.

Toebs Douglass

unread,

Feb 26, 2011, 8:54:35 PM2/26/11

to

I wrote:
> [snip RCU design]

I missed an issue.

What happens to the delete candidate list of a thread which has finished
and has been deregistered from RCU?

It can be that some or even all of the elements in that threads list are
not yet safe to deallocate; but there is no longer a thread using that
state, so there is no thread to notice when the generation counter has
moved on and free()ing can occur.

One response is fairly passive; you can say that since the RCU thread
states never get deallocated (it's an add-only singly-linked list), then
in time another thread will take over that RCU thread state, and the
outstanding elements will then again be taken care of.

But I think there's a better solution. When a thread takes the RCU
thread state lock and performs the scan to see if the generation counter
can be incremented, that thread can also detect any not-in-use elements
which have non-empty delete candidate lists, temporarily take them over
and purge their delete candidate lists, then return them to being
available.

This means that when an RCU thread state is released and becomes
available, we in fact in the case of non-empty delete candidate lists
need to take care to retain some of the RCU state data (current
generation for that thread), so we can continue to deal with those
delete candidate lists.

Toebs Douglass

unread,

Feb 26, 2011, 8:57:46 PM2/26/11

to

In article <MPG.27d3ccb0c...@news-europe.giganews.com>,
a...@b.com says...

> But I think there's a better solution. When a thread takes the RCU
> thread state lock and performs the scan to see if the generation counter
> can be incremented, that thread can also detect any not-in-use elements
> which have non-empty delete candidate lists, temporarily take them over
> and purge their delete candidate lists, then return them to being
> available.
>
> This means that when an RCU thread state is released and becomes
> available, we in fact in the case of non-empty delete candidate lists
> need to take care to retain some of the RCU state data (current
> generation for that thread), so we can continue to deal with those
> delete candidate lists.

Although this begins to bring up sychronization issues, since a thread
can be scanning the RCU thread state list at any time and a thread can
claim an available RCU thread state at any time.

I need to think about this.

Dmitriy Vyukov

unread,

Feb 27, 2011, 6:52:29 AM2/27/11

to

On 26 фев, 22:14, Toebs Douglass <a...@b.com> wrote:
> In article <488cd2a0-dbd2-48de-b1d1-
> fb15f2c7f...@k16g2000vbq.googlegroups.com>, dvyu...@gmail.com says...

>
> > Hi Toebs,
>
> Hey, Dmitry!
>
> > In general the design looks sane to me. Thread local defer lists are
> > good, thread local states are good.
>
> Yay!
>
> > Be careful to prevent a race between a thread entering into RCU read
> > section and a thread promoting epoch. RCU read lock hardly can be just
> > a set of the bit, it must be either a store-load-branch sequence (with
> > a potential store-load fence in between) or a load-store sequence
> > (which represents an acknowledgment of epoch change).
>
> Right now my thought is that moving into an RCU read section will
> require an atomic set of a flag, so that the thread checking the shared
> list of RCU states will always correctly know when a thread really is
> actually in an RCU read section. The flag that a thread has come *out*
> of a section can be a normal variable - it doesn't matter if the list
> checking thread sees it lowered when it's actually been raised - it's
> just an inefficiency.

Atomic RMW will do.

> > Also be careful with blocked threads or threads that do not execute
> > RCU read section for a long time, they should not prevent system-wide
> > memory reclamation.
>
> Idle threads are explicitly handled - I *know* when a thread is idle, so
> I know I can safely advance the RCU generation counter. Blocked threads
> however kill me stone dead. I don't see what can be done about them.

If you would read the URCU paper then you would see ;) I suggested it
here on c.p.t as well, but unable to find it right now.
Basically, you need a pair functions that denote a region where a
thread is going to use RCU:
rcu_enter()/rcu_leave()
also a function that denotes a quiescence state:
rcu_quiescent()
and a pair functions that denote potential IO blocking:
rcu_before_blocking()/rcu_after_blocking()

However, if you use atomic RMW in rcu_enter(), then you don't actually
need all that. You only need to discourage a user from blocking inside
of RCU read section. If he do, then there is [almost] nothing you can
do with that. And he does not, then blocking outside of read section
won't create any problems anyway.

> > Do you aware of URCU design? If not, check out the sources and google
> > a paper about it - especially (AFAIR) 3 work modes.
>
> I've not read the URCU design. I'm tempted to look, but I think you
> really -learn- something by working it out from scratch yourself.
>
> Thanks for looking it over, Dmitry! you're always there for the
> newbies!

--
Dmitry Vyukov

Dmitriy Vyukov

unread,

Feb 27, 2011, 6:58:02 AM2/27/11

to

On 27 фев, 04:57, Toebs Douglass <a...@b.com> wrote:
> In article <MPG.27d3ccb0cd709668989...@news-europe.giganews.com>,

I think that it will be better if a deregistering thread transfers his
defer list to a special global list. It will eliminate necessity in
scanning whole list of thread descriptors (there can be hundreds of
threads) on each epoch promote. And potentially it will eliminate some
problems with list synchronization.

--
Dmitry Vyukov

Toebs Douglass

unread,

Feb 27, 2011, 12:49:20 PM2/27/11

to

In article <4230769b-3cca-409b-8e22-81e36bef9374
@x4g2000prf.googlegroups.com>, dvy...@gmail.com says...

> On 26 ???, 22:14, Toebs Douglass <a...@b.com> wrote:
> > In article <488cd2a0-dbd2-48de-b1d1-
> > fb15f2c7f...@k16g2000vbq.googlegroups.com>, dvyu...@gmail.com says...

> > > Also be careful with blocked threads or threads that do not

execute
> > > RCU read section for a long time, they should not prevent system-wide
> > > memory reclamation.
> >
> > Idle threads are explicitly handled - I *know* when a thread is idle, so
> > I know I can safely advance the RCU generation counter. Blocked threads
> > however kill me stone dead. I don't see what can be done about them.
>
> If you would read the URCU paper then you would see ;) I suggested it
> here on c.p.t as well, but unable to find it right now.
> Basically, you need a pair functions that denote a region where a
> thread is going to use RCU:
> rcu_enter()/rcu_leave()
> also a function that denotes a quiescence state:
> rcu_quiescent()
> and a pair functions that denote potential IO blocking:
> rcu_before_blocking()/rcu_after_blocking()
>
> However, if you use atomic RMW in rcu_enter(), then you don't actually
> need all that. You only need to discourage a user from blocking inside
> of RCU read section. If he do, then there is [almost] nothing you can
> do with that. And he does not, then blocking outside of read section
> won't create any problems anyway.

I was thinking about this earlier today. I don't have a problem with
blocked threads, not yet, anyway - *ALL* my RCU read sections are inside
what from the users point of view are single function calls. The *user*
is not calling the RCU entry/exit functions - all he's doing is calling,
say, queue_enqueue().

Toebs Douglass

unread,

Mar 3, 2011, 6:38:29 PM3/3/11

to

In article <a6e3050f-beb0-4924-ae91-7b6663acb527
@o14g2000prb.googlegroups.com>, dvy...@gmail.com says...

> On 27 ???, 04:57, Toebs Douglass <a...@b.com> wrote:

> > Although this begins to bring up sychronization issues, since a
thread
> > can be scanning the RCU thread state list at any time and a thread can
> > claim an available RCU thread state at any time.
> >
> > I need to think about this.

> I think that it will be better if a deregistering thread transfers his
> defer list to a special global list. It will eliminate necessity in
> scanning whole list of thread descriptors (there can be hundreds of
> threads) on each epoch promote. And potentially it will eliminate some
> problems with list synchronization.

That's a good idea.

Toebs Douglass

unread,

Mar 6, 2011, 5:22:57 AM3/6/11

to

In article <MPG.27da44dbd...@news-europe.giganews.com>,
a...@b.com says...

> In article <a6e3050f-beb0-4924-ae91-7b6663acb527
> @o14g2000prb.googlegroups.com>, dvy...@gmail.com says...

> > I think that it will be better if a deregistering thread transfers

his
> > defer list to a special global list. It will eliminate necessity in
> > scanning whole list of thread descriptors (there can be hundreds of
> > threads) on each epoch promote. And potentially it will eliminate some
> > problems with list synchronization.
>
> That's a good idea.

I'm not sure I properly understood this, in fact.

So, we have potentially many threads exiting at the same moment and all
wishing to place their outstanding delete candidates into a single,
global list.

The atomic list I have is add-only. I could use that - and I do do the
RCU thread states - but it's not as nice to have to use it for elements,
because there could be a lot, many more than thread states.

I could use a normal singly-threaded list, with a lock, which would mean
threads upon exit take maybe a bit longer to exit, since they need to
grab the lock so they can shift their elements into the singly-threaded
list.

That's not too bad, although it is potentially a bottleneck, since it
will delay the reclamation of delete candidates (where they idle in an
exiting threads list while it waits for the lock).

So I was wondering if you had a particular idea about how to go about
doing this - but something has just occurred to me now, re-reading your
paragraph.

I was thinking of a global list of delete candidates. I think you were
thinking about a global list of *retired rcu thread states*.

Toebs Douglass

unread,

Mar 6, 2011, 5:33:38 AM3/6/11

to

In article <MPG.27dd7ea2...@news-europe.giganews.com>, a...@b.com
says...

> I was thinking of a global list of delete candidates. I think you
were
> thinking about a global list of *retired rcu thread states*.

So, yeah, I think I have it now; you have this second global list, also
atomic add-only singly-linked. When you come to retire a state in the
RCU thread state list, you claim or make a new element in the retired
list, copy the retiring states state over to that element and then mark
in the main list the element as available.