If the link works
http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220060265373%22.PGNR.&OS=DN/20060265373&RS=DN/20060265373
otherwise try
http://appft1.uspto.gov/netahtml/PTO/srchnum.html
and paste in the application number.
And no, I have no clue.
--
Joe Seigh
When you get lemons, you make lemonade.
When you get hardware, you make software.
On Dec 15, 1:34 am, Joe Seigh <jseigh...@xemaps.com> wrote:
> 20060265373 Hybrid multi-threaded access to data structures using hazard pointers for reads and locks for updates
>
> If the link works http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=...
> otherwise try http://appft1.uspto.gov/netahtml/PTO/srchnum.html
> and paste in the application number.
>
> And no, I have no clue.
>
For such a patent issue, isn't it very inconvenient for people outside the US?
The software patent issue is very screwed up for now. It's not clear
how things will work out. If you're outside the US and don't do
business inside the US you could probably ignore them. What's going
on in this particular case is very strange. I don't understand IBM's
strategy here.
I looked at the patent in a little more detail and it doesn't appear to be
that at all. I think the fact that McKenney and Michael worked on it
misled me into thinking it had something to do with hazard pointers and RCU,
especially with all the mentions of memory barriers.
I'm guessing (patents are hard to read) that it has to do with PDR using
hazard pointers. It looks like a proxy collector, but instead of using
multiple objects they use the data structure itself as a proxy and only
free removed nodes when no hazard pointer points to the data structure,
and thus there are no readers of the data structure. So it doesn't appear to have
the FIFO logic and granularity of what I've been defining a proxy collector
as having.
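The guessed scheme above could be sketched roughly as follows. This is a minimal, single-threaded illustration in C++11-style atomics, not the patent's actual method: the whole structure acts as the proxy, readers publish a hazard pointer to it, removed nodes sit on a limbo list, and they are only freed once no hazard pointer references the structure. All names (`ProxyStructure`, `retire`, `try_reclaim`) are invented; a real version needs per-thread hazard slots, barriers, and a scan protocol.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

struct Node { int value; Node* next; };

struct ProxyStructure {
    std::atomic<Node*> head{nullptr};
    std::vector<Node*> limbo;                     // removed, not yet freed
    static const int kSlots = 4;
    std::atomic<ProxyStructure*> hazard[kSlots];  // one slot per reader

    ProxyStructure() { for (auto& h : hazard) h.store(nullptr); }

    // Reader side: publish a hazard pointer to the whole structure.
    void enter(int slot) { hazard[slot].store(this); }
    void exit(int slot)  { hazard[slot].store(nullptr); }

    bool readers_present() const {
        for (int i = 0; i < kSlots; ++i)
            if (hazard[i].load() == this) return true;
        return false;
    }

    // Writer side: after unlinking a node, defer the free.
    void retire(Node* n) { limbo.push_back(n); }

    // Free limbo nodes only when no hazard pointer covers the structure.
    std::size_t try_reclaim() {
        if (readers_present()) return 0;
        std::size_t freed = limbo.size();
        for (Node* n : limbo) delete n;
        limbo.clear();
        return freed;
    }
};
```

Note the coarse granularity: one reader anywhere in the structure blocks reclamation of every limbo node, which is exactly why it lacks the per-batch FIFO behavior described for proxy collectors.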
And a few weeks would have been an extraordinarily short amount of time
to execute a patent application. It would have to have been in the works
for a while.
Never mind. :)
It's closer to Treiber's technique but using hazard pointers instead
of a counter, I think.
This is Joe's idea, not McKenney's idea.
But "FUCK!" is a perfectly valid reaction to any patent
in general. :)
It can take six months to do a patent application plus
eighteen months to publish the application, so we'll have to
wait a while longer to see if anyone filed any patent
applications in this area.
I'm trying to do a write up on proxy collectors. Trying
to explain how RCU is a proxy collector is a bit more
work than I want to do so I'll probably just edit it
out.
I'll have to figure out when I said you could do proxy
collectors with hazard pointers. I guess you could
file patents on proxy collectors with any PDR scheme
we haven't mentioned so far. I'm not seriously suggesting
it, but you could.
Maybe I'll just put that write up on hold for now. It's
really difficult to write articles without any editorial
feedback.
I asked Paul if he could discuss some of this stuff here on c.p.t from time
to time. I think he may agree... So far he has not heard of vZOOM, but he
said you guys sent some e-mails back-and-forth wrt your RCU+SMR
algorithm... I think Paul's input would be an asset to this group.
Apparently, he is helping out with the current C++ thread/memory model
standardization process... Seems like they may be having a bit of a
difficult time totally grasping loads with associated data-dependency
hints...
IMO, the C++ memory model simply has to be weak enough to support
data dependencies...
The problem seems to be more with defining rules to allow the optimizer
to be as clever as possible without actually breaking anything, and to
allow unordered atomic operations in the language. Given these
requirements, the need to explicitly define some sort of dependency
rules seems almost unavoidable. But this may simply be an instance
where the design is being over-complicated. Given the frequency with
which unordered atomic operations will probably be performed, it may be
sufficient to require the compiler to not reorder stores to occur
before an atomic load, regardless of ordering constraints. Recently,
I've had the feeling that the discussion may benefit most from the
participation of more compiler writers, as the discussion in this area
seems a bit too speculative.
Sean
>
>
> I asked Paul if he could discuss some of this stuff here on c.p.t from time
> to time. I think he may agree... So far he has not heard of vZOOM, but he
> said you guys sent some e-mails back-and-forth wrt your RCU+SMR
> algorithm... I think Paul's input would be an asset to this group.
> Apparently, he is helping out with the current C++ thread/memory model
> standardization process... Seems like they may be having a bit of a
> difficult time totally grasping loads with associated data-dependency
> hints...
Yeah, I noticed. I'm still not sure whether getting involved in the
C++0x threading stuff would do any good. They're pretty set in
their ways.
I suppose this isn't true. The spec may simply be able to include a
clause containing the conditions that must be considered when
optimizing code--largely concerning requirements and assumptions for
atomic types. But this doesn't seem to be the way the committee is
heading. Rather, they seem inclined to define a set of ordering rules
for all operations within the language. Perhaps there is some
requirement that this all be explicitly stated in the spec? I'll admit
I don't entirely understand the reason for such a complex definition,
particularly since it seems extremely difficult to predict and account
for future developments in compiler and hardware design. So I suppose
I agree with what you said regarding a weak memory model :-)
Sean
In other words, it will limit better implementations that can take
advantage of improvements in hardware, compilers, or algorithms.
Dependent loads are a little bit strange. I did point out before
that if you can't rely on dependent loads working and have to use
load with acquire, you don't need load-acquire everywhere you
used a dependent load. You just need them for mutable pointers and
regular loads for non-mutable pointers. So it's a little more
subtle, and you basically need smarter pointers.
I suspect it will. Though there may be a way to work around the
restrictions and retain the important bits (restricting compiler
optimizations in critical areas, for example). And there's still a lot
about which to speculate. The spec for atomic concrete types hasn't
been made public, for example.
Sean
Sean
The regular ordering rules are based on memory barriers that are part
of most memory models. Dependent load ordering isn't explicitly
a part of any official memory model AFAIK. The only people using
dependent loads are those working with lock-free, since there's a
significant performance advantage in using dependent loads over loads
with acquire semantics. Java only has load-acquire, via the volatile
attribute. The only reasons they get away with it are, one, Java has
a lot of overhead anyway, and two, how would you know?
Right. I think the pertinent point here is that it's sometimes useful
to restrict compiler reordering without restricting hardware
reordering. So a dependent load operation might be flagged as an
acquire with respect to the compiler but raw with respect to the
hardware. This is sort of what D does with its re-interpretation of
volatile--in D, the volatile statement is strictly a compiler-level
barrier. The only downside to this approach is that it assumes a
certain level of understanding which appears to be beyond the level the
C++ committee is targeting.
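The compiler-only barrier idea described for D can be sketched in later C++ terms with std::atomic_signal_fence, which constrains compiler reordering but emits no hardware fence, leaving the hardware free to rely on the data dependency. This is only an analogy for illustration; `g_slot` and `read_dependent` are invented names.

```cpp
#include <atomic>

std::atomic<int*> g_slot{nullptr};

int read_dependent() {
    int* p = g_slot.load(std::memory_order_relaxed);
    // Compiler-level acquire: later loads may not be hoisted above this
    // point by the optimizer, but no hardware barrier is emitted.
    std::atomic_signal_fence(std::memory_order_acquire);
    return p ? *p : -1;  // the hardware orders this via the dependency on p
}
```

On an Alpha-style machine that does not order dependent loads, this would of course be insufficient, which is the whole debate.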
What I don't entirely understand is if the compiler must already
recognize some values or operations as atomic, it seems like that
should in itself be sufficient to determine which optimizations are
allowable. What is the value in explicitly defining the rules for all
situations? It allows the correctness of a compiler implementation to
be verified directly against the spec, I suppose, but is this
necessary?
Sean
They can then use the atomic operations defined in the memory model
to do meta implementations defining higher level synchronization.
Look at the Java specs. While not a meta implementation since they
stayed a little vague to sound more profound, you're really restricted
as to the implementations allowed.
http://java.sun.com/docs/books/vmspec/2nd-edition/html/Threads.doc.html#25549
So in the future if someone implemented STM for Java, they'd have to add
a new keyword "atomic". To convert programs to STM they'd have to change
all occurrences of the string "synchronized" to the string "atomic",
since just implementing Java locks with STM would be forbidden by the specs.
Right!
http://groups.google.com/group/comp.programming.threads/msg/7b427bceff6f75da
You amortize the cost of #LoadLoad by reading a single batch of nodes at a
time... The single #LoadLoad will apply to every one in the group. The
bigger the group, the less overheads you will suffer from the damn
#LoadLoad!
I am happy to know that clever PDR implementations can still be very speedy
on architectures that simply do not support dependent loads...
:^)
Oh well, it's not like we are going to see any more architectures like the
DEC Alpha anymore... IMHO, the current trend is fuc$king iron-clad bloated cache
coherency... You know, the kind of total crap that has to be implemented in
order to even think about supporting any kind of HTM!
ACK!!!
;^)
I don't know how that would work. For example, using proxy collectors
doesn't remove the requirement for load with acquire. It's whether
the pointers you load are mutable or not. E.g., in a LIFO queue only
the queue anchor is mutable, so you can use a load w/ acquire on the queue
anchor and plain loads on the node pointers, or dependent loads on all
of them. For a linked list with random insert and delete, you need
either load w/ acquire on everything, or dependent loads on everything.
The pointers in LIFO queue nodes are not mutable. The pointers in a
random insert/delete queue are mutable.
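The LIFO case above can be sketched with C++11-style atomics (used here purely for illustration; the thread predates them): one acquire load on the mutable anchor is enough, and the immutable `next` pointers inside published nodes can then be walked with plain loads.

```cpp
#include <atomic>

struct Node { int value; Node* next; };  // next is immutable once published

std::atomic<Node*> anchor{nullptr};      // the only mutable pointer

void push(Node* n) {
    n->next = anchor.load(std::memory_order_relaxed);
    // Release store publishes the node contents along with the pointer.
    while (!anchor.compare_exchange_weak(n->next, n,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {}
}

int sum_all() {
    // One load-acquire on the mutable anchor...
    Node* p = anchor.load(std::memory_order_acquire);
    int sum = 0;
    // ...then plain loads while walking the immutable next pointers.
    for (; p != nullptr; p = p->next) sum += p->value;
    return sum;
}
```

With random insert/delete the `next` fields themselves mutate, so every traversal step would need acquire (or dependent-load) semantics, which is the distinction being made.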
I've mentioned it here down at the bottom
http://groups.google.com/group/comp.programming.threads/msg/c8206c2535fe816e?hl=en&
but I think I mentioned it earlier as well.
Well, you could create and initialize a batch of nodes... do a #StoreStore,
and atomically store a pointer to the batch to a location that a reader thread
can see... Then the reader loads the pointer to the batch and does a
#LoadLoad, then it does a read-only traversal of the stuff that the
#StoreStore helped to make visible (e.g., every node in the batch)...
Any thoughts?
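A minimal sketch of that batch-publication idea, in C++11-style atomics for illustration (`Batch`, `mailbox`, etc. are invented names): the writer's release store plays the #StoreStore role and the reader's single acquire load plays the #LoadLoad role, amortized across every node in the batch.

```cpp
#include <atomic>

struct Batch { int values[64]; int count; };

std::atomic<Batch*> mailbox{nullptr};

void publish(Batch* b) {
    // All plain stores into *b are ordered before this release store
    // (the #StoreStore in the description above).
    mailbox.store(b, std::memory_order_release);
}

int consume_sum() {
    // One acquire load (the #LoadLoad), paid once per batch...
    Batch* b = mailbox.load(std::memory_order_acquire);
    if (!b) return 0;
    int sum = 0;
    // ...then a read-only traversal with plain loads.
    for (int i = 0; i < b->count; ++i) sum += b->values[i];
    return sum;
}
```

The bigger the batch, the smaller the per-node share of the barrier cost, as noted earlier in the thread.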
Figuring out how to batch it without adding more overhead than a
conventional solution would be the trick.
Good point!
;^)
Well, I tend to batch things naturally... I mean, if the queue you're pulling
from is not empty yet, why process the message right away... Load up a nice
batch of work if the queue is not empty... I make frequent use of
queue_trypop-like functions and eventcounts for my message-passing scheme...