If the link works
http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220060265373%22.PGNR.&OS=DN/20060265373&RS=DN/20060265373
otherwise try
http://appft1.uspto.gov/netahtml/PTO/srchnum.html
and paste in the application number.
And no, I have no clue.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
On Dec 15, 1:34 am, Joe Seigh <jseigh...@xemaps.com> wrote:
> 20060265373 Hybrid multi-threaded access to data structures using hazard pointers for reads and locks for updates
>
> If the link works http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=...
> otherwise try http://appft1.uspto.gov/netahtml/PTO/srchnum.html
> and paste in the application number.
>
> And no, I have no clue.
>
For such a patent issue, isn't it very inconvenient for people outside the US?
The software patent issue is very screwed up for now. It's not clear
how things will work out. If you're outside the US and don't do
business inside the US you could probably ignore them. What's going
on in this particular case is very strange. I don't understand IBM's
strategy here.
I looked at the patent in a little more detail and it doesn't appear to be
that at all. I think the fact that McKenney and Michael worked on it
misled me into thinking it had something to do with hazard pointers and RCU,
especially with all the mentions of memory barriers.
I'm guessing (patents are hard to read) that it has to do with PDR using
hazard pointers. It looks like a proxy collector, but instead of using
multiple objects they use the data structure itself as a proxy and only
free removed nodes when no hazard pointer points to the data structure,
and thus there are no readers of the data structure. So it doesn't appear to have
the FIFO logic and granularity of what I've been defining a proxy collector
as having.
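The guessed scheme above could be sketched roughly as follows. This is a minimal, single-threaded illustration in C++11-style atomics, not the patent's actual method: the whole structure acts as the proxy, readers publish a hazard pointer to it, removed nodes sit on a limbo list, and they are only freed once no hazard pointer references the structure. All names (`ProxyStructure`, `retire`, `try_reclaim`) are invented; a real version needs per-thread hazard slots, barriers, and a scan protocol.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct Node { int value; Node* next; };

struct ProxyStructure {
    std::atomic<Node*> head{nullptr};
    std::vector<Node*> limbo;                     // removed, not yet freed
    static const int kSlots = 4;
    std::atomic<ProxyStructure*> hazard[kSlots];  // one slot per reader

    ProxyStructure() { for (auto& h : hazard) h.store(nullptr); }

    // Reader side: publish a hazard pointer to the whole structure.
    void enter(int slot) { hazard[slot].store(this); }
    void exit(int slot)  { hazard[slot].store(nullptr); }

    bool readers_present() const {
        for (int i = 0; i < kSlots; ++i)
            if (hazard[i].load() == this) return true;
        return false;
    }

    // Writer side: after unlinking a node, defer the free.
    void retire(Node* n) { limbo.push_back(n); }

    // Free limbo nodes only when no hazard pointer covers the structure.
    std::size_t try_reclaim() {
        if (readers_present()) return 0;
        std::size_t freed = limbo.size();
        for (Node* n : limbo) delete n;
        limbo.clear();
        return freed;
    }
};
```

Note the coarse granularity: one reader anywhere in the structure blocks reclamation of every limbo node, which is exactly why it lacks the per-batch FIFO behavior described for proxy collectors.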
And a few weeks would have been an extraordinarily short amount of time
to execute a patent application. It would have to have been in the works
for a while.
Never mind. :)
It's closer to Treiber's technique but using hazard pointers instead
of a counter, I think.
This is Joe's idea, not McKenney's idea.
But "FUCK!" is a perfectly valid reaction to any patent
in general. :)
It can take six months to do a patent application plus
eighteen months to publish the application, so we'll have to
wait a while longer to see if anyone filed any patent
applications in this area.
I'm trying to do a write up on proxy collectors. Trying
to explain how RCU is a proxy collector is a bit more
work than I want to do so I'll probably just edit it
out.
I'll have to figure out when I said you could do proxy
collectors with hazard pointers. I guess you could
file patents on proxy collectors with any PDR scheme
we haven't mentioned so far. I'm not seriously suggesting
it, but you could.
Maybe I'll just put that write up on hold for now. It's
really difficult to write articles without any editorial
feedback.
I asked Paul if he could discuss some of this stuff here on c.p.t from time
to time. I think he may agree... So far he has not heard of vZOOM, but he
said you guys sent some e-mails back-and-forth wrt your RCU+SMR
algorithm... I think Paul's input would be an asset to this group.
Apparently, he is helping out with the current C++ thread/memory model
standardization process... Seems like they may be having a bit of a
difficult time totally grasping loads with associated data-dependency
hints...
IMO, the C++ memory model simply has to be weak enough to support
data dependencies...
The problem seems to be more with defining rules to allow the optimizer
to be as clever as possible without actually breaking anything, and to
allow unordered atomic operations in the language. Given these
requirements, the need to explicitly define some sort of dependency
rules seems almost unavoidable. But this may simply be an instance
where the design is being over-complicated. Given the frequency with
which unordered atomic operations will probably be performed, it may be
sufficient to require the compiler to not reorder stores to occur
before an atomic load, regardless of ordering constraints. Recently,
I've had the feeling that the discussion may benefit most from the
participation of more compiler writers, as the discussion in this area
seems a bit too speculative.
Sean
>
>
> I asked Paul if he could discuss some of this stuff here on c.p.t from time
> to time. I think he may agree... So far he has not heard of vZOOM, but he
> said you guys sent some e-mails back-and-forth wrt your RCU+SMR
> algorithm... I think Paul's input would be an asset to this group.
> Apparently, he is helping out with the current C++ thread/memory model
> standardization process... Seems like they may be having a bit of a
> difficult time totally grasping loads with associated data-dependency
> hints...
Yeah, I noticed. I'm still not sure whether getting involved in the
C++0x threading stuff would do any good. They're pretty set in
their ways.
I suppose this isn't true. The spec may simply be able to include a
clause containing the conditions that must be considered when
optimizing code--largely concerning requirements and assumptions for
atomic types. But this doesn't seem to be the way the committee is
heading. Rather, they seem inclined to define a set of ordering rules
for all operations within the language. Perhaps there is some
requirement that this all be explicitly stated in the spec? I'll admit
I don't entirely understand the reason for such a complex definition,
particularly since it seems extremely difficult to predict and account
for future developments in compiler and hardware design. So I suppose
I agree with what you said regarding a weak memory model :-)
Sean
In other words, it will limit better implementations that can take
advantage of improvements in hardware, compilers, or algorithms.
Dependent loads are a little bit strange. I did point out before
that if you can't rely on dependent loads working and have to use
load with acquire, you don't need load-acquire everywhere you
used a dependent load. You just need them for mutable pointers and
regular loads for non-mutable pointers. So it's a little more
subtle, and you basically need smarter pointers.
I suspect it will. Though there may be a way to work around the
restrictions and retain the important bits (restricting compiler
optimizations in critical areas, for example). And there's still a lot
about which to speculate. The spec for atomic concrete types hasn't
been made public, for example.
Sean
Sean
The regular ordering rules are based on memory barriers that are part
of most memory models. Dependent load ordering isn't explicitly
a part of any official memory model AFAIK. The only people using
dependent loads are those working with lock-free, since there's a
significant performance advantage in using dependent loads over loads
with acquire semantics. Java only has load-acquire, via the volatile
attribute. The only reasons they get away with it are, one, Java has
a lot of overhead anyway, and two, how would you know?
Right. I think the pertinent point here is that it's sometimes useful
to restrict compiler reordering without restricting hardware
reordering. So a dependent load operation might be flagged as an
acquire with respect to the compiler but raw with respect to the
hardware. This is sort of what D does with its re-interpretation of
volatile--in D, the volatile statement is strictly a compiler-level
barrier. The only downside to this approach is that it assumes a
certain level of understanding which appears to be beyond the level the
C++ committee is targeting.
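The compiler-only barrier idea described for D can be sketched in later C++ terms with std::atomic_signal_fence, which constrains compiler reordering but emits no hardware fence, leaving the hardware free to rely on the data dependency. This is only an analogy for illustration; `g_slot` and `read_dependent` are invented names.

```cpp
#include <atomic>

std::atomic<int*> g_slot{nullptr};

int read_dependent() {
    int* p = g_slot.load(std::memory_order_relaxed);
    // Compiler-level acquire: later loads may not be hoisted above this
    // point by the optimizer, but no hardware barrier is emitted.
    std::atomic_signal_fence(std::memory_order_acquire);
    return p ? *p : -1;  // the hardware orders this via the dependency on p
}
```

On an Alpha-style machine that does not order dependent loads, this would of course be insufficient, which is the whole debate.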
What I don't entirely understand is if the compiler must already
recognize some values or operations as atomic, it seems like that
should in itself be sufficient to determine which optimizations are
allowable. What is the value in explicitly defining the rules for all
situations? It allows the correctness of a compiler implementation to
be verified directly against the spec, I suppose, but is this
necessary?
Sean
They can then use the atomic operations defined in the memory model
to do meta implementations defining higher level synchronization.
Look at the Java specs. While not a meta implementation since they
stayed a little vague to sound more profound, you're really restricted
as to the implementations allowed.
http://java.sun.com/docs/books/vmspec/2nd-edition/html/Threads.doc.html#25549
So in the future if someone implemented STM for Java, they'd have to add
a new keyword "atomic". To convert programs to STM they'd have to change
all occurrences of the string "synchronized" to the string "atomic",
since just implementing Java locks with STM would be forbidden by the specs.
Right!
http://groups.google.com/group/comp.programming.threads/msg/7b427bceff6f75da
You amortize the cost of #LoadLoad by reading a single batch of nodes at a
time... The single #LoadLoad will apply to every one in the group. The
bigger the group, the less overheads you will suffer from the damn
#LoadLoad!
I am happy to know that clever PDR implementations can still be very speedy
on architectures that simply do not support dependent loads...
:^)
Oh well, it's not like we are going to see any more architectures like the
DEC Alpha anymore... IMHO, the current trend is fuc$king iron-clad bloated cache
coherency... You know, the kind of total crap that has to be implemented in
order to even think about supporting any kind of HTM!
ACK!!!
;^)
I don't know how that would work. For example, using proxy collectors
doesn't remove the requirement for load with acquire. It's whether
the pointers you load are mutable or not. E.g., in a LIFO queue only
the queue anchor is mutable, so you can use a load w/ acquire on the queue
anchor and plain loads on the node pointers, or dependent loads on all
of them. For a linked list with random insert and delete, you need
either load w/ acquire on everything, or dependent loads on everything.
The pointers in LIFO queue nodes are not mutable. The pointers in a
random insert/delete queue are mutable.
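The LIFO case above can be sketched with C++11-style atomics (used here purely for illustration; the thread predates them): one acquire load on the mutable anchor is enough, and the immutable `next` pointers inside published nodes can then be walked with plain loads.

```cpp
#include <atomic>

struct Node { int value; Node* next; };  // next is immutable once published

std::atomic<Node*> anchor{nullptr};      // the only mutable pointer

void push(Node* n) {
    n->next = anchor.load(std::memory_order_relaxed);
    // Release store publishes the node contents along with the pointer.
    while (!anchor.compare_exchange_weak(n->next, n,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {}
}

int sum_all() {
    // One load-acquire on the mutable anchor...
    Node* p = anchor.load(std::memory_order_acquire);
    int sum = 0;
    // ...then plain loads while walking the immutable next pointers.
    for (; p != nullptr; p = p->next) sum += p->value;
    return sum;
}
```

With random insert/delete the `next` fields themselves mutate, so every traversal step would need acquire (or dependent-load) semantics, which is the distinction being made.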
I've mentioned it here down at the bottom
http://groups.google.com/group/comp.programming.threads/msg/c8206c2535fe816e?hl=en&
but I think I mentioned it earlier as well.
Well, you could create and initialize a batch of nodes... do a #StoreStore,
and atomically store a pointer to the batch to a location that a reader thread
can see... Then the reader loads the pointer to the batch and does a
#LoadLoad, then it does a read-only traversal of the stuff that the
#StoreStore helped to make visible (e.g., every node in the batch)...
Any thoughts?
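A minimal sketch of that batch-publication idea, in C++11-style atomics for illustration (`Batch`, `mailbox`, etc. are invented names): the writer's release store plays the #StoreStore role and the reader's single acquire load plays the #LoadLoad role, amortized across every node in the batch.

```cpp
#include <atomic>

struct Batch { int values[64]; int count; };

std::atomic<Batch*> mailbox{nullptr};

void publish(Batch* b) {
    // All plain stores into *b are ordered before this release store
    // (the #StoreStore in the description above).
    mailbox.store(b, std::memory_order_release);
}

int consume_sum() {
    // One acquire load (the #LoadLoad), paid once per batch...
    Batch* b = mailbox.load(std::memory_order_acquire);
    if (!b) return 0;
    int sum = 0;
    // ...then a read-only traversal with plain loads.
    for (int i = 0; i < b->count; ++i) sum += b->values[i];
    return sum;
}
```

The bigger the batch, the smaller the per-node share of the barrier cost, as noted earlier in the thread.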
Figuring out how to batch it without adding more overhead than a
conventional solution would be the trick.
Good point!
;^)
Well, I tend to batch things naturally... I mean, if the queue you're pulling
from is not empty yet, why process the message right away... Load up a nice
batch of work if the queue is not empty... I make frequent use of
queue_trypop-like functions and eventcounts for my message-passing scheme...