class A { self =>
  lazy val x: Int = {
    (new Thread() {
      override def run() = self.synchronized { while (true) {} }
    }).start()
    1
  }
}
This code won't create deadlocks with the implementation from scalac, as lazy val initialization holds the monitor on self until the val initialization succeeds.
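For reference, the scheme scalac currently emits for a lazy val is essentially double-checked locking on `this`. A rough sketch, written in Java for clarity (field names like `bitmap$0` are illustrative, not the exact generated code):

```java
// Rough sketch of scalac's lazy val encoding: double-checked locking on `this`.
// The monitor is held for the whole initialization, which is why the spawned
// thread in the example above blocks until the initializer finishes.
class LazyA {
    private volatile boolean bitmap$0 = false;  // "is x initialized?" flag
    private int x$lzy;

    int x() {
        if (!bitmap$0) {
            synchronized (this) {       // monitor on `this` held during init
                if (!bitmap$0) {        // re-check under the lock
                    x$lzy = 1;          // run the initializer
                    bitmap$0 = true;
                }
            }
        }
        return x$lzy;
    }
}
```

Once `bitmap$0` is set, subsequent reads take the fast path and never touch the monitor, which is why the looping thread above does not block later accesses.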
will give a thread-unsafe implementation. Though, I believe that we should leave @volatile as the default, following the 'least surprise' principle, but I didn't come up with an annotation name that isn't misleading for such 'thread-unsafe lazy vals'.
Cheers,
Dmitry
--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Vis-à-vis @paulp and his critiques of Scala in its current state, what effect will Dotty have on the language?
I can't imagine Dotty reaching the mainstream, if that happens at all, before 2016/2017, but I am very curious to know its potential impact on the Scala ecosystem.
I'd prefer just having one version of lazy. If it is slower, because it's thread-safe, then so be it.
Having two lazies is not worth the trouble and confusion imho.
On Sunday, February 16, 2014 05:45:02 Simon Ochsenreither wrote:
> Thinking further about this, I'd really like to drop the "2." from Scala's
> version number ... the effect from it is kind of insane from a marketing
> POV anyway.
Or like Java did it: version 1.6.0 is *marketed* as Java SE 6.
Cheers,
Juha
On Sat, Feb 15, 2014 at 6:57 PM, Rex Kerr <ich...@gmail.com> wrote:
@noThreads
I think it's pretty clear what the assumptions are with that name.
It’s technically wrong though. Although I prefer having to use @volatile, if we must stay backwards compatible, a better name would be @singleThread.
Hi Dmitry,

Using global (even if striped) locks for lazy vals is _extremely_ unsatisfactory (especially when used with identityHashCode, as it requires the hashCode to be stored in the object header due to compaction by the GC; at least this is what I was told). Using global locks will make lazy vals basically unusable for multithreaded systems, as seemingly unrelated things will create contention, nondeterministically, due to TLAB address ranges and allocation rates varying between runs.

Some ideas come to mind:
1) deadlock detection + blow up
2) being able to specify what monitor to use for lazy vals

Have you looked into either of those, and what does that look like?
On Mon, Feb 17, 2014 at 1:19 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:
The simpler alternative would be to use the object containing the lazy val itself for the locking. The only disadvantage is then that user code that takes a lock on this same object can mess things up. So we would have to state a policy that objects containing lazy vals may not be used as locks in user code themselves. I am personally OK with that and think it might be more predictable than throwing too much magic with global lock pools at the problem. What do others think?
I once prototyped a scheme whereby an object could declare a member `__lazyLock` which would be used in preference to `this` as the monitor for all lazy vals inside it.
-jason
We could also change the defaults a bit, and synthesize that member as a val containing a fresh object. If you don't want to burn the extra field/allocation, and are happy to live with the risk of other code contending on the lock, you could define that member manually to return `this`.
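A minimal sketch of this idea, in Java for clarity: the lazy val's accessor synchronizes on a synthesized lock object instead of `this`, so user code that locks the instance itself cannot block initialization. All names here are assumptions, not real scalac output:

```java
// Hypothetical encoding of the `__lazyLock` scheme: a dedicated lock object
// stands in for `this` as the monitor for lazy val initialization.
class C {
    private final Object lazyLock = new Object(); // the synthesized member
    private volatile boolean initialized = false;
    private int x;

    int x() {
        if (!initialized) {
            synchronized (lazyLock) {   // never locks on `this`
                if (!initialized) {
                    x = 42;             // stand-in for the real initializer
                    initialized = true;
                }
            }
        }
        return x;
    }
}
```

With this encoding, `synchronized (c) { ... }` in user code no longer interacts with lazy val initialization at all; the cost is one extra field and allocation per instance.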
Can we make something where you can specify different monitors for different lazy vals (independently) in the same instance?
I'm wondering if the proposals that allow customization of lazy vals are not going a bit too far? How often do you need to specify different monitors for lazy vals? Would it be OK to just roll your own implementation of the lazy val in such cases?
On Mon, Feb 17, 2014 at 1:56 PM, Grzegorz Kossakowski <grzegorz.k...@gmail.com> wrote:
I tend to agree. In any case, if there is only one class containing the lazy val, the __lazyLock workaround does not seem to buy us much.
Instead of

  class C {
    val __lazyLock = new Object
    lazy val x = ...
  }

one could equally well write

  class C {
    object lazies { lazy val x = ... }
  }

In both cases, the C instances would then be free to take locks.

Cheers

 - Martin
--
Martin Odersky
EPFL
For

  class C { object lazies { lazy val x = ... } }

you'd need:

  def x = lazies.x
I want to see a revised benchmark where the initialization is properly lazy before I commit to agreeing that the feature sounds great. By-name parameters aren't free either, and right now the benchmark mixes the two issues.
There should be head-to-head non-by-name tests, and head-to-head by-name tests.
On 17 February 2014 13:19, √iktor Ҡlang <viktor.klang@gmail.com> wrote:
Hi Viktor,
That's true; I've spent a week digging into this potential issue.
Please note that each lock is held only for a small period of time that doesn't depend on the user-provided initialization code. That's actually why I'm creating a number of monitors quadratic in the number of processors. I've been benchmarking on 2x E5645 (12 cores, 24 threads) and collecting contention information with the Intel tool VTune Amplifier XE. The test was: 50 threads concurrently creating lazy vals, initializing them and throwing them away. For 2*CPU_COUNT^2 monitors, contention was only 15% bigger than when using independent monitors. But: when using independent monitors there was a 5% slowdown due to monitor expansion (monitors start as minimalistic biased locks, but are expanded when there's contention). If 5*CPU_COUNT^2 monitors are used, the 'global-monitors' implementation is actually 2-3% faster.
But note that those benchmarks are EXTREMELY specific to the hardware configuration.
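For concreteness, here is a sketch of the striped global-monitor scheme being benchmarked, assuming a fixed array of 2*CPU_COUNT^2 monitors selected by identityHashCode. The names and sizing follow the description above, not the prototype's actual API:

```java
// Illustrative striped-monitor pool: every lazy-val holder deterministically
// maps to one of a fixed set of shared monitors via its identity hash code,
// so holder objects themselves never have their monitors inflated.
final class LazyMonitors {
    private static final int CPUS = Runtime.getRuntime().availableProcessors();
    private static final int SIZE = 2 * CPUS * CPUS;   // sizing from the thread
    private static final Object[] MONITORS = new Object[SIZE];

    static {
        for (int i = 0; i < SIZE; i++) MONITORS[i] = new Object();
    }

    // Stable mapping: the same holder always gets the same monitor.
    static Object monitorFor(Object holder) {
        int h = System.identityHashCode(holder);
        return MONITORS[(h & 0x7fffffff) % SIZE];
    }
}
```

Two unrelated holders can hash to the same stripe, which is exactly the contention the benchmark above tries to measure.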
On 17 February 2014 13:19, √iktor Ҡlang <viktor...@gmail.com> wrote:
especially when used with identityHashCode, as it requires the hashCode to be stored in the object header due to compaction by GC (at least this is what I was told).

Both the identityHashCode and the lock information are stored (or referenced) in the 'MarkWord' field of the object header. Its structure is shown on slide 6 of http://www.cs.princeton.edu/picasso/mats/HotspotOverview.pdf
The transitions between different representations are explained here: https://wikis.oracle.com/display/HotSpotInternals/Synchronization
Note that by default biased locking IS enabled, but as soon as the first contention is found it is disabled for the whole class by bulk revocation. And this IS slow. Note that with a CAS, biased locking isn't required, as the CAS effectively replaces it with a better implementation.
The proposed approach with a predefined set of monitors has the advantage of not expanding monitors on lazy-val holder objects into heavyweight monitors, but it does switch those objects to the 'unlocked non-biasable' state.
I believe that identityHashCode is a lot less evil than locking on 'this', and actually more deterministic.
On Mon, Feb 17, 2014 at 4:01 PM, Dmitry Petrashko <dmitry.p...@gmail.com> wrote:
Note that by default, biased locking IS enabled, but as soon as first contention is found it's disabled for all class by bulk revocation. And this IS slow. Note that due to CAS biased locking isn't required, as CAS effectively replaces it with a better implementation.
This is only in HotSpot though. We have J9, Zing, JRockit, etc.
The proposed approach with predefined set of monitors has advantage of not expanding monitors on lazy-val holder objects to heavyweight monitors, but it indeed switches objects to 'unlocked non-biasable' states.
Yes, it would also most likely throw lock coarsening and lock elision out the window. Amirite?
I believe that identityHashCode is a lot less evil than locking on 'this', and actually more deterministic.

How so?
On 17 February 2014 16:07, √iktor Ҡlang <viktor...@gmail.com> wrote:
Yes, it would also most likely throw lock coarsening and lock elision out the window. Amirite?

AFAIK lock coarsening is independent of the MarkWord structure, and I've been observing lock elision in my implementation just by reading the native code. I believe they still work.
I believe that identityHashCode is a lot less evil than locking on 'this', and actually more deterministic.

How so?

Because in the proposed implementation all logic (both ours and the JVM's) works on a per-instance basis, while a single contended lazy val synchronizing on 'this' can disable biased locking for ALL instances of the same class. In total we have both a low cost for the uncontended case (a simple CAS), and we also do better in the contended case. In the contended case we're better because:
1) In case the initializer is fast: in most cases the initializer thread finishes initialization before anyone is able to successfully CAS the bitmap to state 2 (this state means that someone is waiting for initialization to finish). These 'failed' CASes of the other threads effectively act as a short spinlock.
2) In case the initializer is slow, waiting IS required.

There can be a case where an additional hand-written spinlock would be beneficial: the case where the initializer's cost is comparable to the cost of a CAS. But I've tried this, and with the spinlock the method becomes too big to inline, and benchmarks show a 15% slowdown in the more common cases.
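The two cases above can be sketched as a small CAS-driven state machine. This is only my reading of the description (states: 0 = uninitialized, 1 = initializing, 2 = initializing with waiters, 3 = initialized), written in Java for clarity, not the actual prototype:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the lazy val state machine described in the thread. A thread that
// loses the 0 -> 1 race either spins at the top of the loop (failed CASes act
// as a short spinlock for fast initializers) or parks via wait() after
// announcing itself with the 1 -> 2 transition (slow initializers).
class LazyCell {
    private final AtomicInteger state = new AtomicInteger(0);
    private int value;  // safely published via the volatile state transitions

    int get() {
        while (true) {
            int s = state.get();
            if (s == 3) return value;                  // fast path: initialized
            if (s == 0 && state.compareAndSet(0, 1)) { // we won the race
                int v = 42;                            // stand-in initializer
                synchronized (this) {
                    value = v;
                    state.set(3);
                    notifyAll();                       // wake any waiters
                }
                return v;
            }
            if (s == 1 || s == 2) {
                synchronized (this) {
                    state.compareAndSet(1, 2);         // announce we are waiting
                    while (state.get() != 3) {
                        try { wait(); } catch (InterruptedException e) { /* retry */ }
                    }
                }
                return value;
            }
        }
    }
}
```

The monitor here could equally be one of the striped global monitors discussed earlier; the state machine itself only needs the CAS on the bitmap.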
On Mon, Feb 17, 2014 at 4:30 PM, Dmitry Petrashko <dmitry.p...@gmail.com> wrote:
AFAIK lock coarsening is independent of the MarkWord structure, and I've been observing lock elision in my implementation just by reading the native code. I believe they still work.

I don't see how it could elide it, since you'll make the striped locks contended.
But the problem is that _all_ contended initializations that hash to the same lock that is already in use during an initialization will essentially park. Doesn't that come with a huge domino-warning?

Shouldn't the stripe length be configurable as a System property? With a followup: what is a good value for the stripe length?
Did you try the technique I linked? And if so, what was the result?
On 17 February 2014 16:59, √iktor Ҡlang <viktor...@gmail.com> wrote:
I don't see how it could elide it, since you'll make the striped locks contended.

As the lock is held for a very short period, it's very hard to make it contended. But it is indeed possible. As I said, the direct benchmark that actually measures this locking contention shows that the contention isn't big.
But the problem is that _all_ contended initializations that hash to the same lock that is already used during initialization will essentially park.

They will only park if they intend to wait on this monitor, which is a very rare occurrence for fast initializers and, I believe, expected behavior for slow ones.

Shouldn't the stripe length be configured as a System property? With followup: What is a good value for the stripe length?

I performed the benchmark for the sake of detecting this good value. But indeed, a system property could be provided. But then we're back to the question: if a user wants a highly customizable lazy val, maybe they'd be better off with their own preferred implementation of Lazy[T]?
Did you try the technique I linked? And if so, what was the result?

If I understood the technique correctly, it is a loop that with probability 1/2 decreases a counter initially equal to 256? I've been trying a technique with a predefined spinlock of 500 iterations.
On Mon, Feb 17, 2014 at 6:03 PM, Dmitry Petrashko <dmitry.p...@gmail.com> wrote:
What's your testbed, and which are the test cases?
True, but are you expecting him/her to rewrite all his/her lazy-val code, and get the added benefit of higher memory usage + worse locality?
There can be the case, that additional hand-written spinlock will be beneficial: the case where initializers's cost is comparative to cost of CAS, but I've tried to do this, and with spin lock method is already big to inline, and benchmarks show see a 15% slowdown in more common cases.
Did you try the technique I linked? And if so, what was the result?If I understood technique correctly, this is a cycle, that with probability of 1/2 decreases counter that was initially equal to 256?
I've been trying technique with predefined spinlock on 500 iterations.AFAICS the contended benchmark has at most 4 Threads.
On Monday, 17 February 2014 18:25:43 UTC+1, √iktor Klang wrote:
On Mon, Feb 17, 2014 at 6:03 PM, Dmitry Petrashko <dmitry.p...@gmail.com> wrote:
On 17 February 2014 16:59, √iktor Ҡlang <viktor...@gmail.com> wrote:
On Mon, Feb 17, 2014 at 4:30 PM, Dmitry Petrashko <dmitry.p...@gmail.com> wrote:
On 17 February 2014 16:07, √iktor Ҡlang <viktor...@gmail.com> wrote:
On Mon, Feb 17, 2014 at 4:01 PM, Dmitry Petrashko <dmitry.p...@gmail.com> wrote:
On 17 February 2014 13:19, √iktor Ҡlang <viktor...@gmail.com> wrote:
> especially when used with identityHashCode, as it requires the hashCode to be stored in the object header due to compaction by GC (at least this is what I was told).

Both the identity hash code and the lock information are stored (or referenced) in the 'MarkWord' field of the object header. Its structure is shown on slide 6 of http://www.cs.princeton.edu/picasso/mats/HotspotOverview.pdf
The transitions between different representations are explained here: https://wikis.oracle.com/display/HotSpotInternals/Synchronization
Note that biased locking IS enabled by default, but as soon as the first contention is detected, it is disabled for the whole class by bulk revocation. And this IS slow. Note also that with the proposed CAS-based scheme biased locking isn't required, as the CAS effectively replaces it with a better mechanism.
> This is only in HotSpot though. We have J9, Zing, JRockit etc.

The proposed approach with a predefined set of monitors has the advantage of not inflating the monitors of lazy-val holder objects into heavyweight monitors, but it does switch the objects to the 'unlocked, non-biasable' state.
> Yes, it would also most likely throw lock coarsening and lock elision out the window. Amirite?

AFAIK lock coarsening is independent of the MarkWord structure, and I've observed lock elision in my implementation just by reading the generated native code. I believe they still work.

> I don't see how it could elide it since you'll make the striped locks contended.

As the lock is held for a very short period, it's hard to make it contended, though indeed possible. As I said, the direct benchmark that actually measures this locking contention shows that the contention isn't big.

> What's your testbed and which are the test-cases?

Testbeds: 1) i7-3930K, Oracle JDK 1.7.0_45; 2) 2x Xeon E5645, Oracle JDK 1.7.0_51; 3) i7-2640M, Oracle JDK 1.7.0_51.
Test case: given an array of 1,000,000 objects, 50 threads start locking on them concurrently, each from a different starting position; this is compared against locking on getMonitor(object) as implemented in the prototype. The body of each synchronized section is an update of a per-thread public volatile field.
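To make the getMonitor idea concrete, here is a minimal sketch of striped locking via identityHashCode. The object name, the stripe count of 1024, and the masking scheme are assumptions for illustration, not the prototype's actual values:

```scala
// Sketch of a striped-lock table: a fixed array of monitor objects,
// selected by the identity hash code of the instance. The stripe
// count (1024, a power of two, so masking works) is an assumed value.
object StripedMonitors {
  private final val StripeCount = 1024
  private val monitors: Array[AnyRef] =
    Array.fill[AnyRef](StripeCount)(new Object)

  // Deterministically map an instance to one of the striped monitors.
  def getMonitor(obj: AnyRef): AnyRef =
    monitors(System.identityHashCode(obj) & (StripeCount - 1))
}
```

Contention then depends only on hash collisions within the fixed table, not on locking 'this', which is why biased-locking bulk revocation on user objects is avoided.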
> > I believe that identityHashCode is a lot less evil than locking on 'this', and actually more deterministic.
>
> How so?

Because in the proposed implementation the logic (both ours and the JVM's) works on a per-instance basis, whereas a single contended lazy val synchronizing on 'this' can disable biased locking for ALL instances of the same class. In total we have both a low cost for the uncontended case (a simple CAS), and we are also better in the contended case, because:
1) if the initializer is fast, in most cases the initializing thread finishes initialization before anyone else manages to CAS the bitmap to state 2 (the state meaning that someone is waiting for initialization to finish). The failed CASes of the other threads effectively act as a short spinlock;
2) if the initializer is slow, waiting IS required.

> But the problem is that _all_ contended initializations that hash to the same lock that is already used during initialization will essentially park.

They will only park if they intend to wait on this monitor, which is a very rare occurrence for fast initializers and, as I believe, the expected behavior for slow ones.

> Doesn't that come with a huge domino-warning? Shouldn't the stripe length be configured as a System property? With followup: what is a good value for the stripe length?

I performed the benchmark for the sake of finding this good value. But, indeed, a system property can be provided. Still, we're back to the question: if a user wants a highly customizable lazy val, wouldn't they be better off with their own preferred implementation of Lazy[T]?

> True, but are you expecting him/her to rewrite all his/her lazy-val code and get the added benefit of higher memory usage + worse locality?

I'm trying to come up with an implementation that does reasonably well for the common cases and is easy to maintain.
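The fast/slow cases above can be sketched as a small state machine. This is a hypothetical reconstruction of the scheme being discussed, with an assumed state encoding (0 = uninitialized, 1 = initializing, 2 = initializing with waiters, 3 = initialized); the real bitmap-based implementation differs in detail:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of the CAS-based lazy initialization discussed
// above. Assumed states: 0 = uninitialized, 1 = initializing,
// 2 = initializing with waiters, 3 = initialized.
final class LazyCell(init: () => Int) {
  private val state = new AtomicInteger(0)
  @volatile private var value: Int = _

  def get: Int = {
    if (state.get == 3) return value // fast path: already initialized
    if (state.compareAndSet(0, 1)) {
      // We won the race: run the initializer.
      value = init()
      // Publish; if someone CASed to state 2 meanwhile, wake the waiters.
      if (state.getAndSet(3) == 2) state.synchronized { state.notifyAll() }
      value
    } else {
      // Another thread is initializing: only now do we touch the monitor.
      state.synchronized {
        while (state.get != 3) {
          state.compareAndSet(1, 2) // failed CASes act as a short spin
          if (state.get != 3) state.wait()
        }
      }
      value
    }
  }
}
```

Note how a fast initializer usually reaches state 3 before any other thread manages the 1 → 2 CAS, so no thread ever blocks on the monitor, matching point 1) above.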
There could be a case where an additional hand-written spinlock would be beneficial: when the initializer's cost is comparable to the cost of a CAS. I have tried this, but with the spinlock the method becomes too big to inline, and benchmarks show a 15% slowdown in the more common cases.
> Did you try the technique I linked? And if so, what was the result?

If I understood the technique correctly, it is a loop that, with probability 1/2, decrements a counter initially equal to 256?
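If that reading is right, a sketch of such a probabilistic spin-wait might look like the following. The counter value 256 and the 1/2 probability are taken from the description above; everything else (names, the callback shape) is assumed for illustration:

```scala
import java.util.concurrent.ThreadLocalRandom

// Sketch of the probabilistic spin described above: spin until `done`
// holds or a counter, starting at 256 and decremented with probability
// 1/2 per iteration, reaches zero. Randomizing the decrement spreads
// out retry timing between competing threads.
def spinWait(done: () => Boolean): Boolean = {
  var budget = 256
  while (budget > 0) {
    if (done()) return true
    if (ThreadLocalRandom.current().nextBoolean()) budget -= 1
  }
  done() // if still false, the caller would fall back to parking
}
```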
I've been trying a variant with a predefined spin limit of 500 iterations.

> AFAICS the contended benchmark has at most 4 Threads.

This particular contended benchmark shows only a constant measurable difference in running time for different numbers of threads (until the number of cores is reached). Increasing the number of threads further makes the test less informative, as it starts to benchmark the OS scheduler. While running this test on the 2x Xeon machine, I pin threads to different CPUs during warmup.
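For reference, here is a scaled-down sketch of the kind of contention test described earlier in the thread. Object count and thread count are reduced from the text's 1,000,000 objects / 50 threads, and all names are illustrative; this is not the actual benchmark harness (a real run would use JMH or similar with warmup):

```scala
// Many threads lock on objects from a shared array, each starting at a
// different offset, and update a per-thread volatile counter inside
// the critical section -- mirroring the test case described above.
object ContentionSketch {
  final class Counter { @volatile var n: Long = 0L }

  def run(objects: Array[AnyRef], threads: Int, opsPerThread: Int): Long = {
    val counters = Array.fill(threads)(new Counter)
    val workers = (0 until threads).map { t =>
      new Thread(new Runnable {
        def run(): Unit = {
          val offset = t * objects.length / threads // per-thread start
          var i = 0
          while (i < opsPerThread) {
            val obj = objects((offset + i) % objects.length)
            obj.synchronized { counters(t).n += 1 } // critical section
            i += 1
          }
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    counters.map(_.n).sum // total operations completed
  }
}
```

Swapping `obj.synchronized` for synchronization on a striped `getMonitor(obj)` gives the comparison point discussed above.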