In terms of timelines, I have a couple of Haswell laptops and am planning to order a dual E5-2600 v2 system this week.
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Was there a video of his talk?
Mike.
On 3 February 2014 08:56, Martin Thompson <mjp...@gmail.com> wrote:
> Attached is a presentation on what's new in Haswell from Ravi, who invented
> Lock Elision. I managed to get him out of hiding to speak at QCon SF.
Time: @t=0, @t=1, @t=2, @t=4

HTM_Thrd_1 (Operator):

    public synchronized void tx_mutate_cache() {
        /* From the Java bytecode view this method executes with
           isolation=PESSIMISTIC_SERIALIZABLE, i.e. while HTM_Thrd_1
           enters/owns the lock, HTM_Thrd_2 may not perform any CRUD
           operation on the HTM_CACHE operand.
           From the HTM-modified machine code view, however, the native
           locking protocol is rewritten to
           isolation=OPTIMISTIC_(SPECULATIVE)_READ_COMMITTED, i.e. while
           HTM_Thrd_1 owns this block, HTM_Thrd_2 may still perform a READ
           operation. If the HTM coherency protocol at any time determines
           that an optimistic READ has been exposed to DIRTY_READ risk (or
           any other coherency conflict), the HTM machine code reverts to
           the isolation=PESSIMISTIC_SERIALIZABLE flow of control and
           retries the entire transaction. */
        mutate_cache($200);
        /* does processing */
    }

HTM_CACHE (Operand), over time: $100.00, $200, $200, $100

HTM_Thrd_2 (Operator):

    public synchronized void tx_access_cache() {
        /* Same two views as above: the Java bytecode view is
           isolation=PESSIMISTIC_SERIALIZABLE (HTM_Thrd_2 may not perform
           any CRUD operation in this block while HTM_Thrd_1 owns the
           lock); the HTM-modified machine code speculates with
           isolation=OPTIMISTIC_(SPECULATIVE)_READ_COMMITTED and, on any
           coherency conflict, reverts to PESSIMISTIC_SERIALIZABLE and
           retries the transaction. */
        /* From the Java view, @t=2 nothing happens (isolation=SERIALIZABLE) */
        /* From the HTM native view, @t=2 the READ is allowed to proceed,
           speculating that isolation=OPTIMISTIC_READ_COMMITTED will succeed */
        bal = access_cache(); // bal assigned $200.00
        /* does processing */
        /* From the HTM native view, @t=3 the READ performed @t=2
           (bal=$200) must now be scored as a DIRTY_READ conflict */
        if (noCoherencyConflict()) { commit(); } else { rollBackRetry(); /* this is what EXECUTES from the HTM view */ }
    }
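The speculate / validate / fall-back flow sketched above can be approximated in plain Java with `java.util.concurrent.locks.StampedLock` (Java 8+). This is a software analogue only, not HTM itself; the class and method names below are illustrative, and the `$`-amounts from the diagram become plain doubles:

```java
import java.util.concurrent.locks.StampedLock;

// Software analogue of the HTM flow above: read speculatively, then
// validate; on conflict, fall back to the pessimistic (SERIALIZABLE-style)
// path under the real lock, just as the HTM view "rolls back and retries".
public class SpeculativeCache {
    private final StampedLock lock = new StampedLock();
    private double balance = 100.00;

    public void mutateCache(double newBalance) {
        long stamp = lock.writeLock();          // pessimistic write path
        try {
            balance = newBalance;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public double accessCache() {
        long stamp = lock.tryOptimisticRead();  // speculative READ
        double bal = balance;
        if (!lock.validate(stamp)) {            // conflict: DIRTY_READ risk
            stamp = lock.readLock();            // revert to pessimistic path
            try {
                bal = balance;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return bal;
    }
}
```

Unlike real HTM, `validate()` only detects an intervening write lock rather than cache-line conflicts, but the control flow is the same shape.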
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-sympathy+unsub...@googlegroups.com.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
See inline.
On 3 Feb 2014 14:35, "Rüdiger Möller" <moru...@gmail.com> wrote:
> Some questions:
> If the programmer uses coarse-grained locking, this will only be successful on low-contention data structures, isn't it?
What matters is low write contention on cache lines. Different threads can alter different parts of a data structure concurrently.
> Does allocation inside speculative code increase probability of having an abort due to modifications of common data in the GC ?
AFAICS it doesn't have to if the constructor has no side effects. The memory allocation is usually thread-local.
> If cache eviction triggers abort, there will be even more need to control memory data layout (or let the VM get more clever doing that).
Aborts can happen, if rarely, even for fairly optimal code, e.g. cache lines happen to exceed the 8-way associativity, or your hyperthreaded CPU does something unusual in the other thread (which shares the same L1 cache).
> How expensive is an abort ?
I imagine it can be pretty expensive, so you want the JVM to monitor how often a block of code aborts and change the code back to regular locking if this happens too much.
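The "monitor aborts and fall back to regular locking" policy can be sketched as follows. All names and the threshold are hypothetical, and the speculative attempt is a stand-in (real HTM would be an XBEGIN..XEND region; here every attempt "aborts" so the fallback path is exercised):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of an adaptive elision policy: attempt the speculative path
// while the abort count stays under a threshold; after that, stop
// eliding and always take the regular lock.
public class ElidableLock {
    private static final int MAX_ABORTS = 5;     // hypothetical tuning knob
    private final ReentrantLock fallback = new ReentrantLock();
    private final AtomicInteger aborts = new AtomicInteger();

    public void withLock(Runnable critical) {
        if (aborts.get() < MAX_ABORTS && trySpeculatively(critical)) {
            return;                              // speculative commit succeeded
        }
        fallback.lock();                         // pessimistic path
        try {
            critical.run();
        } finally {
            fallback.unlock();
        }
    }

    // Stand-in for an HTM transaction; pretends every attempt aborts,
    // so callers always end up on the fallback lock in this sketch.
    private boolean trySpeculatively(Runnable critical) {
        aborts.incrementAndGet();
        return false;
    }
}
```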
Previously I had spent some time reading Cliff Click's account of the experience Azul had with HTM in the Vega boxes. The big takeaway seemed to be that HTM could work very well, except that a lot of code tends to hit a common piece of data, e.g. the modCount field in the collections classes. I would expect that this problem could still dominate most existing code.
Is anyone considering building HTM friendly collections libraries etc?
As an aside, as I was re-reading Cliff's articles I noticed you guys were using a micro-kernel OS on the Vega boxes. How did you find that?
However, as multi-core platforms started to become more and more common, people obviously care more and more about serializing operations that prevent multiple cores from being used at the same time, and those bottlenecks get worked on. Two things happen because of that:
1. The biggest bottlenecks that people run into get addressed first, leaving less and less for a future OTC/HTM thing to help with. This one is obvious.
2. People spend time to "study" the bottlenecks even if they don't fix them. This one has a less obvious and strongly detrimental effect for OTC/HTM: the first thing people seem to do when they identify a scale-limiting contention point is to instrument it. This often takes the form of adding some counters to the operation, and that in turn makes data contention (on the counters, not the actual data protected by the critical section) equal to lock contention. The very act of adding counters inside critical sections removes most of the OTC/HTM ability to improve them.
So the side effect is that OTC/HTM can help things nobody cared enough about to study...
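The counter problem described above can be shown concretely. A single shared counter field inside a critical section makes every elided execution write the same cache line, guaranteeing a conflict; a striped counter such as `java.util.concurrent.atomic.LongAdder` spreads the writes so speculation can still commit. The class below is an illustrative sketch:

```java
import java.util.concurrent.atomic.LongAdder;

// Instrumentation inside critical sections, two ways.
public class Instrumented {
    // Contended: every elided execution writes this one field
    // (and its cache line), defeating OTC/HTM speculation.
    private long hits;

    // HTM-friendlier: LongAdder stripes its writes across cells,
    // so concurrent speculative executions rarely touch the same line.
    private final LongAdder hitsStriped = new LongAdder();

    public synchronized void opContended() { hits++; }

    public synchronized void opStriped()   { hitsStriped.increment(); }

    public long total() { return hitsStriped.sum(); }
}
```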
Can an HTM rewrite of a strict Java synchronized code block ever produce a runtime whose concurrent access scheme exhibits anything other than isolation=SERIALIZABLE?
Thanks Gil (and Ben, as I have now read your response too).
What would be a user-view example of HTM capabilities empowering apps with performance benefits, if the HTM rewrite can't get past strict isolation=SERIALIZABLE? I don't see what is gained.
HashMap could work concurrently if the size doesn't change, e.g. a replace, plus some altering of how modCount is detected.
HashMap is in the JDK, so it could be optimised/changed. What I had in mind is code which is not easily changed but doesn't have side effects like a mod count.
To see a practical user-view of the actual effects of semantically identical synchronized block execution with HTM assist, look at slides 22/23 in my "Speculative Locking: Breaking the Scale Barrier" (JAOO 2005) presentation. These are not hypothetical or modeled numbers; they are actual measurements on actual Vega hardware and actual production JVMs. This is just how Vega JVMs behave, and we've been shipping them with OTC on by default since 2006.
Also note that the vertical axis is logarithmic ;-). I highly recommend people go through the whole deck (step by step). It's 8+ years old, but re-reading my own material in the Intel TSX-capable commodity server context, it reads as if I had a time-machine view of what you'll need to know later this year in the commodity hardware world. The presentation walks through the motivation and logic in detail, including hints on how to write better "HTM friendly" code in your synchronized blocks. As long as Vega was the only machine that did this stuff, we didn't really expect much HTM-aware code writing to happen, but with TSX showing up on every commodity Intel x86 server starting later this year, writing HTM-sympathetic code is something to start thinking about. I should probably start submitting the presentation (almost as is) to conference talks this year...
On Tuesday, February 4, 2014 8:58:02 AM UTC-8, Michael Hamrick wrote:
Not quite. I am a step ahead of that in the off heap JEP.
I am assuming HTM will be available one day and that the concurrency library, e.g. Lock or similar, will have support for it as well. This will need native/intrinsic operations, and traditionally these would have been added to Unsafe, but that is becoming restricted to internal code.
So the suggested JEP includes support for HTM in the replacement for Unsafe, in particular so it can be applied to off heap memory as well.
I suspect that Oracle doesn't have the same experience that Azul has in this space, and I expect that their use of HTM in Java 9 will be minimal. I hope it will be more than just the proposed 128-bit CAS, which is a little underwhelming.
However, given there are just three machine code instructions for Intel TSX, all you need for a "new" Unsafe is three intrinsic methods and perhaps a method to say whether it is supported. Ideally this would be compatible with Vega's and AMD's implementations.
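A sketch of what those three intrinsics might look like, mirroring Intel TSX's XBEGIN/XEND/XABORT (the RTM instruction set). None of these names exist in the JDK; this is purely a hypothetical shape for a "new Unsafe" surface, with a trivial no-HTM fallback implementation:

```java
// Hypothetical intrinsic surface mirroring Intel RTM:
//   begin()  ~ XBEGIN  -- returns STARTED, or an abort status code
//   end()    ~ XEND    -- commits the transaction
//   abort()  ~ XABORT  -- explicit abort with a reason code
// plus a feature probe (e.g. the CPUID RTM bit on x86).
public interface TransactionalMemory {
    // Intel's _xbegin() intrinsic reports "started" as all-ones (~0u),
    // i.e. -1 as a signed int.
    int STARTED = -1;

    int begin();
    void end();
    void abort(int reason);
    boolean isSupported();
}

// Trivial software fallback for platforms without HTM: "transactions"
// always report started and commit immediately (no speculation at all).
class NoHtm implements TransactionalMemory {
    public int begin() { return STARTED; }
    public void end() { }
    public void abort(int reason) { }
    public boolean isSupported() { return false; }
}
```

A JIT could intrinsify `begin()`/`end()`/`abort()` into the raw instructions where `isSupported()` is true, and fall back to plain locking otherwise.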
Bingo. Now "the light" has come on in my head. It is no longer strict SERIALIZABLE; it is now a better SERIALIZABLE, established not through "strictness" but through "validated speculation".
So cool.
Now my question is (learning through endless questions, you know): what will Peter's JEP ask the Java runtime to adopt so that it can assist HTM in rewriting (strict SERIALIZABLE) Java synchronized blocks into (better SERIALIZABLE) HTM machine code? What does Peter's JEP ask the Java runtime to do to help?
Thanks!
On Tuesday, February 4, 2014 11:32:05 AM UTC-5, Martin Grajcar wrote:
On Tue, Feb 4, 2014 at 5:00 PM, Michael Hamrick <michael.to...@gmail.com> wrote:
> Thanks Gil (and Ben, as I have now read your response too). What would be a user-view example of HTM capabilities empowering apps with performance benefits, if the HTM rewrite can't get past strict isolation=SERIALIZABLE? I don't see what is gained.
I guess I could answer this part (learning by answering, you know). A synchronized block as such allows no concurrency, but if it gets implemented as HTM it can. Imagine two threads reading from a synchronized HashMap (I mean what you get from Collections.synchronizedMap(new HashMap())). Normally, one must wait till the other finishes, but with HTM both can work in parallel, as the outcome is guaranteed to be the same. It's sort of a better implementation of SERIALIZABLE. Using a synchronized block is the simplest and least concurrent implementation. Using HTM is a clever trick allowing concurrent access as long as no conflict occurs (a write plus any access to the same cache line is a conflict; reading alone is fine, and so is accessing different cache lines).
For writes to a synchronized HashMap, this would nearly work, too. As long as both threads access parts of the HashMap belonging to different cache lines, they could work in parallel. Unfortunately, there are fields like size and modCount, which get modified on (nearly) every write, so this wouldn't work in that case. It might work for one writer and one (or more) readers.
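The reader scenario described above, in code. Both `get()` calls below synchronize on the same monitor, so the JVM must serialize them today; yet neither call writes anything, so an HTM-elided lock could run them truly in parallel with an identical (SERIALIZABLE) outcome. The hot `size`/`modCount` fields only come into play once `put()`/`remove()` enter the picture:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Two threads reading a Collections.synchronizedMap-wrapped HashMap.
// Semantically serialized by the monitor; safely parallelizable by HTM
// because the speculative read sets never conflict.
public class SyncMapReads {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Integer> m =
                Collections.synchronizedMap(new HashMap<>());
        m.put("a", 1);
        m.put("b", 2);

        Thread t1 = new Thread(() -> System.out.println(m.get("a")));
        Thread t2 = new Thread(() -> System.out.println(m.get("b")));
        t1.start(); t2.start();
        t1.join();  t2.join();
    }
}
```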