Summary of discussions on lazy vals


Aleksandar Prokopec

Apr 24, 2013, 1:29:40 PM
to scala-i...@googlegroups.com, Miguel Garcia, Hubert Plociniczak, Odersky Martin
Hi all,

I am posting a summary of the discussions we (Martin, Miguel, Hubert and
I) have recently had about how lazy vals are implemented and a possible
solution to the problem.
Several lazy-val-related problems were mentioned, which I describe first.
A discussion and a proposed solution come after that.
At the end of the post there are several questions.

Problem 1
--------

One problem we discussed at one of the recent Scala meetings was
that lazy vals initialized by different threads can potentially lead to
deadlocks.
It is as follows.
Assume there are two objects A and B with lazy vals `a` and `c`, and
`b`, respectively:

object A {
  lazy val a = B.b
  lazy val c = ...
}

object B {
  lazy val b = A.c
}

The initialization block of `a` above refers to `b` in B, and the
initialization block of `b` refers to `A.c`.
While a circular dependency exists between these two objects, there is
actually no circular dependency between the specific lazy vals `a`, `b` and `c`.

However, in the current lazy vals implementation there exists a
possibility of a deadlock occurring if thread Ta attempts to initialize
`A.a` and thread Tb attempts to initialize `B.b`.
The initialization looks like the following (simplified below):

object A {
  def a = if (bitmap(a)) a else this.synchronized {
    if (bitmap(a)) a else {
      val res = B.b
      bitmap(a) = true
      a = res
      a
    }
  }
}

And the similar pattern holds for `b`.
Assume now that both Ta and Tb enter their synchronized blocks
simultaneously - a deadlock will occur due to each thread trying to grab
the lock of the other object.

A solution is described below that eliminates the possibility of the
deadlock in this situation.


Problem 2
--------
Just discussed with Hubert - assume that in the previous example `c`
refers to `A.a`, thus forming a circular dependency between the lazy vals.
Both in the current lazy vals implementation and in the solution
described lower in the post this would cause a deadlock.

Note that in a single-threaded scenario lazy val circular dependencies
lead to stack overflows:

scala> object Test {
| object A { lazy val a: Int = B.b }
| object B { lazy val b: Int = A.a }
| }
defined object Test

scala> Test
res0: Test.type = Test$@6fd1046d

scala> Test.A.a
java.lang.StackOverflowError

Therefore, I think it is ok to treat this as erroneous code, and this
should not be a major issue.



Problem 3
--------
There was a discussion on whether or not singleton objects (and lazy
vals) as a whole are a broken abstraction, and whether their initialization
semantics are wrong.
Concretely, the problem has to do with a singleton object whose
constructor creates a thread that accesses that same singleton object,
and then waits for the completion of that thread.

I won't post a snippet here, but there is a Stack Overflow question about
exactly the same issue with parallel collections used from within singleton
objects (the same thing happens with futures, actors, or just by using
threads):

http://stackoverflow.com/questions/15176199/scala-parallel-collection-in-object-initializer-causes-a-program-to-hang


Apparently, Java suffers from exactly the same issues when it comes to
static initialization blocks:

http://stackoverflow.com/questions/7517964/program-hangs-if-thread-is-created-in-static-initializer-block

http://docs.oracle.com/javase/specs/jls/se7/html/jls-12.html#jls-12.4.2

I think that ignoring this amounts to doing what Java does in exactly the
same situation.

Note that this problem also happens if two separate threads try to
initialize two singleton objects that refer to each other, which is
somewhat more severe.
Nevertheless, the rule of thumb for programmers should be -
singleton objects should not refer to each other during initialization.


Discussion
---------
One possible solution proposed was to allow potential double
initializations and assign values to the lazy field with a CAS
atomically to avoid deadlocks.

Personally, I'm not too much in favor of having lazy fields potentially
doubly-initialized, which a pure lock-free CAS approach would allow.
1) Bugs that arise from this kind of semantics are extremely hard to
track. A deadlock occurs more regularly, and can be tracked down more
easily (in the sense of everything coming to a halt, which you can then
analyze with VisualVM, YourKit or something similar - you get a stack
trace of the problem, rather than of the consequences, from which you
would have to deduce what caused them).
2) While it eliminates deadlocks, it does not eliminate stack overflows
and infinite recursion in e.g. problem 2 above. Erroneous code stays
erroneous.
3) There is the problem of CAS instructions not being available due to
permission limitations. A synchronized block, on the other hand, is always
available.

However, Miguel had a nice idea about how this problem could be solved.
The idea is to move the initialization code out of the synchronized block.
It is as follows:

0) Every lazy val field has the type AnyRef, initialized with some
singleton value UNINITIALIZED.
1) To initialize the lazy val, read the lazy val field. If it is set to
UNINITIALIZED, go to 2).
Otherwise, if it is set to some singleton value INPROGRESS,
synchronized(this) { while (READ(lazyValField) == INPROGRESS) this.wait()
}, then re-read.
2) Use CAS on the lazy val object field to switch it from UNINITIALIZED
to INPROGRESS. If unsuccessful, retry from 1), otherwise go to 3).
3) Initialize the lazy val by executing the initialization block and CAS
it from INPROGRESS to the computed value.
4) synchronized(this) { this.notifyAll() }
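To make the steps concrete, here is a hedged, hand-written sketch of this scheme in plain Scala (the `LazyRef` name and everything inside it are illustrative, not actual compiler output); it also shows where the CAS, boxing and casting costs come in:

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical hand-written encoding of steps 0)-4) above.
class LazyRef[T](init: => T) {
  private val UNINITIALIZED = new AnyRef // sentinel values
  private val INPROGRESS = new AnyRef
  // step 0): one AnyRef field per lazy val, holding a sentinel or the value
  private val state = new AtomicReference[AnyRef](UNINITIALIZED)

  def apply(): T = {
    val v = state.get
    if (v eq UNINITIALIZED) {
      // step 2): try to claim the initialization
      if (state.compareAndSet(UNINITIALIZED, INPROGRESS)) {
        // step 3): run the initializer outside any lock, then publish
        val computed = init
        state.set(computed.asInstanceOf[AnyRef])
        // step 4): wake up threads waiting for the value
        synchronized { notifyAll() }
        computed
      } else apply() // lost the race - retry from 1)
    } else if (v eq INPROGRESS) {
      // step 1), slow path: wait until initialization completes
      synchronized { while (state.get eq INPROGRESS) wait() }
      apply()
    } else {
      v.asInstanceOf[T] // initialized: note the cast (and boxing for primitives)
    }
  }
}
```

Because the initializer runs outside `synchronized(this)`, a thread initializing `A.a` never holds A's monitor while evaluating `B.b`, so the Problem 1 deadlock cannot occur.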

Note that this solves Problem 1 above - it eliminates the possibility of
a deadlock. But it requires CAS, boxing and casting.
The same idea can be implemented using only synchronized blocks,
without CAS, boxing or casting.

0) Every lazy val field has its own type (value or reference type),
initialized with the default value.
It also uses a volatile bitmap as in the current implementation, but
each lazy val state is represented with 2 bits instead of 1.
1) To initialize the lazy val, read its corresponding 2 bits in the
bitmap (from now on, I refer to this as the state). If the state is
INITIALIZED, just read the lazy val field (fast path).
Otherwise, if it is UNINITIALIZED, go to 2).
Otherwise, if it is INPROGRESS, synchronized(this) { while (readBitmap
== INPROGRESS) this.wait() }, then re-read.
2) synchronized(this) { if (readBitmap == UNINITIALIZED)
writeBitmap(INPROGRESS) else "go to 1)" }
3) compute the value for the lazy val
4) synchronized(this) { writeBitmap(INITIALIZED); writeField(computed
value); this.notifyAll() }
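As a concrete illustration, here is a hand-written sketch of this synchronized-only variant for a single lazy val, so the 2-bit bitmap degenerates to one volatile byte (all names are hypothetical; the sketch writes the field before the flag, since the fast path reads without locking):

```scala
class LazyCell(init: => Int) {
  // 2-bit state: 0 = UNINITIALIZED, 1 = INPROGRESS, 2 = INITIALIZED
  @volatile private[this] var bitmap: Byte = 0
  private[this] var value: Int = 0

  def apply(): Int =
    if (bitmap == 2) value // fast path - unchanged, no locking
    else initialize()

  private def initialize(): Int = {
    while (bitmap != 2) {
      if (bitmap == 1) {
        // step 1): wait for the initializer; re-check to survive spurious wakeups
        synchronized { while (bitmap == 1) wait() }
      } else {
        // step 2): claim the initialization under the lock
        val claimed = synchronized {
          if (bitmap == 0) { bitmap = 1; true } else false
        }
        if (claimed) {
          val computed = init // step 3): evaluated with no lock held
          synchronized {      // step 4): publish and notify
            value = computed  // field first, then flag, for the lock-free fast path
            bitmap = 2
            notifyAll()
          }
        }
      }
    }
    value
  }
}
```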

The main difference is that in the new solution the `this` object is not
used as a lock to protect the critical section, but as a monitor on
which objects can wait for the initialization of a specific lazy val to
complete and then be properly notified once it is.

The changes versus the current implementation are thus:
- each lazy val requires 2 bits instead of one to encode 3 different
states. I don't think this changes much in the memory footprint, since
we waste _at least_ an entire byte anyway as soon as the first lazy val
appears in a class, whose bitmap is currently encoded as a
boolean (see why here:
http://www.codeinstructions.com/2008/12/java-objects-memory-structure.html),
and in most cases objects that come in great numbers shouldn't
have many lazy vals in them
- we have 2 synchronized blocks instead of one - one at the beginning to
set the bitmap state for the lazy val to INPROGRESS, and one at the end
to set the bitmap to INITIALIZED and write to the field
- note - the fast path does not change

Questions for concurrency and performance experts:
1) Do you see anything wrong with this idea (in terms of solving Problem
1, but also in general)?
2) Performance-wise, do you see any problems with having two
synchronized blocks instead of just one as in the current implementation?

Questions in general:
1) Do you see this kind of a change as valuable / should it be implemented?
2) Does this warrant a SIP?

Thanks,
Alex



--
Aleksandar Prokopec
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec

Eugene Burmako

Apr 24, 2013, 1:37:43 PM
to <scala-internals@googlegroups.com>, Miguel Garcia, Hubert Plociniczak, Odersky Martin
How will this work with multiple lazy vals in the same object initialized simultaneously?




--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-internals+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Sébastien Doeraene

Apr 24, 2013, 1:51:56 PM
to scala-i...@googlegroups.com, Miguel Garcia, Hubert Plociniczak, Odersky Martin
On Wed, Apr 24, 2013 at 7:29 PM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:
0) Every lazy val field has its own type (value or reference type), initialized with the default value.
  It also uses a volatile bitmap as in the current implementation, but each lazy val state is represented with 2 bits instead of 1.
1) To initialize the lazy val, read its corresponding 2 bits in the bitmap (from now on, I refer to this as state). If state is INITIALIZED, just read the lazy val field (fast path).
  Otherwise, if it is UNINITIALIZED, go to 2).
  Otherwise, if it is INPROGRESS, synchronize(this) { while (readBitmap == INPROGRESS) this.wait() }, the re-read.
2) synchronize(this) { if (readBitmap == UNINITIALIZED) writeBitmap(INPROGRESS) else "go to 1)" }
3) compute the value for the lazy val
4) synchronize(this) { writeBitmap(INITIALIZED); writeField(computed value); this.notifyAll() }

Shouldn't the two assignments in step 4) be inverted?
writeField(computed value); writeBitmap(INITIALIZED);
to cope with a concurrent thread just arriving in the fast path? If that thread sees the bitmap change and proceeds to the fast path (without locking), it could read the not-yet-initialized field, couldn't it?

And even so, isn't there a possibility that the write to the bitmap is seen by another thread doing the fast path before that thread sees the write of the computed value? I'm not completely sure of the memory model of the JVM in that respect.

Cheers,
Sébastien

Simon Ochsenreither

Apr 24, 2013, 2:13:07 PM
to scala-i...@googlegroups.com, Miguel Garcia, Hubert Plociniczak, Odersky Martin
Did you consider something along the lines of http://www.slideshare.net/CharlesNutter/jax-2012-invoke-dynamic-keynote, slides 82ff.?

Miguel Garcia

Apr 24, 2013, 2:54:00 PM
to scala-i...@googlegroups.com, Miguel Garcia, Hubert Plociniczak, Odersky Martin
It gets a bit off topic (Java 8) but in a nutshell, "lazy constants" via invokedynamic,

  https://www.java.net//jsr-292-goodness-lazy-singleton-pattern-12

amounts to having the bootstrap method for that particular invokedynamic occurrence return something like:

  new ConstantCallSite(MethodHandles.constant(the-computed-lazy-value));

From then on, that invokedynamic occurrence will just invoke the already installed ConstantCallSite, which will return the wrapped value. The VM is supposed to reduce all that to just a constant load, at least that's the idea.

In other words, the body of the getter for a lazy val consists solely of the invokedynamic in question (that getter should remain the single entry point to the bootstrap method that actually computes the value).

Two remarks:

  (a) computing the value for the first time requires synchronization in the bootstrap method, thus the main problem doesn't go away.
  (b) on the plus side, after initialization no bit-checking, no double-checked locking, nothing at all is needed. Assuming the VM optimizes as advertised :)
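Called directly from Scala, the shape of such a bootstrap method can be sketched as follows (a simplified stand-in: a real bootstrap method also receives a `MethodHandles.Lookup`, a name and a `MethodType` from the VM, and `LazyConstant`/`computeOnce` are hypothetical names):

```scala
import java.lang.invoke.{CallSite, ConstantCallSite, MethodHandles}

object LazyConstant {
  // Simplified stand-in for a bootstrap method: compute the value once and
  // bind the call site permanently to a constant method handle returning it.
  def bootstrap(computeOnce: () => AnyRef): CallSite = {
    val value = computeOnce() // first access - would need synchronization in general
    new ConstantCallSite(MethodHandles.constant(classOf[AnyRef], value))
  }
}
```

Every later invokedynamic at that site then just invokes the installed constant handle, which the VM can fold to a constant load.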


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

Rex Kerr

Apr 24, 2013, 3:05:36 PM
to scala-i...@googlegroups.com
On Wed, Apr 24, 2013 at 1:29 PM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:

0) Every lazy val field has its own type (value or reference type), initialized with the default value.
  It also uses a volatile bitmap as in the current implementation, but each lazy val state is represented with 2 bits instead of 1.
1) To initialize the lazy val, read its corresponding 2 bits in the bitmap (from now on, I refer to this as state). If state is INITIALIZED, just read the lazy val field (fast path).
  Otherwise, if it is UNINITIALIZED, go to 2).
  Otherwise, if it is INPROGRESS, synchronize(this) { while (readBitmap == INPROGRESS) this.wait() }, the re-read.
2) synchronize(this) { if (readBitmap == UNINITIALIZED) writeBitmap(INPROGRESS) else "go to 1)" }
3) compute the value for the lazy val
4) synchronize(this) { writeBitmap(INITIALIZED); writeField(computed value); this.notifyAll() }


Well, aside from the trivial change of swapping the field write and initialized (so that when the flag goes live, the data is in place), that looks fine.  I don't quite follow why you want to synchronize on the INPROGRESS loop in (1); you end up waiting out of it anyway.  Is this just to trigger the notify instead of busy-waiting?

In my simplest-possible test of a single synchronized block vs. two wrapping an increment (i.e. sync { x += 1 } vs. val y = sync{x}+1, sync{x = y}), the former is twice as fast.
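The two variants being compared look roughly like this (a sketch, not the actual benchmark harness; timing loop and warmup omitted):

```scala
class SyncCounter {
  private var x = 0

  // one synchronized block around the whole increment
  def incrOneBlock(): Unit = synchronized { x += 1 }

  // two synchronized blocks: read under one acquisition, write under
  // another - roughly the lock/unlock pattern of the proposed scheme
  def incrTwoBlocks(): Unit = {
    val y = synchronized { x } + 1
    synchronized { x = y }
  }

  def get: Int = synchronized { x }
}
```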

Since you have to pay this double penalty _every_ time, I wonder whether it is worth it to switch between the two implementations.  That is, you _only_ use this method if the computation of the lazy val may involve another lazy val; otherwise if you've got the lock you just compute the lazy val right there instead of dropping back out and in again.  Otherwise, memoization-heavy operations could take twice as long.

  --Rex

Aleksandar Prokopec

Apr 24, 2013, 4:01:40 PM
to scala-internals
A completely valid remark, Sebastien - yes, they should be inverted.

Note, the bitmap must be volatile.
I would have to brush up on the JMM details, but I believe that since
the read of the bitmap in the 2nd thread precedes the read of the
field, and the write of the bitmap in the initializing thread comes
after the write to the field, the visibility of the lazy field write
should be ensured.

Thanks,
Alex

Aleksandar Prokopec

Apr 24, 2013, 4:03:08 PM
to scala-internals
This is precisely the scenario that this is supposed to address - if
the lazy value initialization does not form a circular dependency,
then any number of threads doing the initialization should eventually
complete.

Could you elaborate further - do you maybe have some specific
initialization example?

Cheers,
Alex

On Apr 24, 7:37 pm, Eugene Burmako <eugene.burm...@epfl.ch> wrote:
> How will this work with multiple lazy vals in the same object initialized
> simultaneously?

Eugene Burmako

Apr 24, 2013, 4:06:03 PM
to <scala-internals@googlegroups.com>
Nothing specific, I just wondered whether you're essentially suggesting to serialize the initialization of possibly independent lazy fields of the same object. But, on second thought, you can't know that they are independent, so I guess it's the only way to go.



Aleksandar Prokopec

Apr 24, 2013, 4:13:40 PM
to scala-internals
In the loop in (1) we synchronize on this, and before leaving the monitor
we verify that the field has really been initialized - a spurious wakeup
may have woken up the thread, in which case it would go to wait again.
Am I missing something?

I think that with knowledge that an initialization expression does not
depend on other lazy vals you could safely use the single synchronized
block initialization - seems like a nice idea.
One would have to include the interaction with static class
initializers into this analysis, though, as that could cause a
deadlock in the same way as two mutually interacting lazy vals.

Cheers,
Alex

Jason Zaugg

Apr 24, 2013, 4:46:41 PM
to scala-i...@googlegroups.com
On Wed, Apr 24, 2013 at 7:29 PM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:
Questions in general:
1) Do you see this kind of a change as valuable / should it be implemented?
2) Does this warrant a SIP?

Yes and yes.

I share your opinion that we should not* pursue any scheme that merely minimizes, rather than eliminates, the chance of a lazy val being successfully computed twice.

I'd like an 'executable' SIP for this one: elaborate the scheme(s) under consideration with some manually implemented examples, and let the reviewers try to find ways to deadlock them, and to measure the performance.

-jason

* well, there is probably still room for, as Paul recently suggested, `@unsynchronized lazy val ... `, which would be useful for caches or for objects that need not be thread safe.

Rex Kerr

Apr 24, 2013, 5:12:12 PM
to scala-i...@googlegroups.com
No, you're not missing anything; I'm just wondering if there's any point I missed beside the avoidance of a busy-wait.  Busy-waits are generally faster for very short times, but of course you have no idea how long the initialization might take--you might read a file, download the entire internet, etc.--so I didn't mean that the scheme was a bad one.  I think I used the wrong phrasing to indicate the nature of my uncertainty.

Anyway, yes, you would need to be careful about static initializers also.  I am afraid I don't fully understand the rules behind static initializers; I've seen enough weird stack traces through them to convince myself that it's stranger than I'd imagined.

  --Rex

P.S. It's also worth keeping in mind enforcing an ordering on lock acquisition to avoid deadlocks.  I'm not sure whether this kind of analysis would be available, but it's an alternate way to do things: if A < B, and you lock B, you guarantee that you will return without needing to lock A.  Otherwise, you must lock A first, then B.  If you can find (and follow) a partial ordering s.t. every set of locks actually required simultaneously is totally ordered, you are guaranteed not to deadlock.  (This is generally not a good idea if there are dozens of leaves you might possibly want to lock.  For relatively shallow chains it can be very effective.)
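The locking discipline sketched in the P.S. could look like this (a hypothetical helper; `System.identityHashCode` merely stands in for a total order here and would need a tie-breaker for collisions in practice):

```scala
object LockOrder {
  // Acquire two monitors in a fixed global order, so that no cycle of
  // waiting threads can form, whichever order callers name the locks in.
  def lockBoth[T](a: AnyRef, b: AnyRef)(body: => T): T = {
    val (first, second) =
      if (System.identityHashCode(a) <= System.identityHashCode(b)) (a, b)
      else (b, a)
    first.synchronized {
      second.synchronized {
        body
      }
    }
  }
}
```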




Erik Osheim

Apr 25, 2013, 1:57:37 PM
to scala-i...@googlegroups.com
On Wed, Apr 24, 2013 at 10:46:41PM +0200, Jason Zaugg wrote:
> On Wed, Apr 24, 2013 at 7:29 PM, Aleksandar Prokopec <
> aleksanda...@gmail.com> wrote:
>
> > Questions in general:
> > 1) Do you see this kind of a change as valuable / should it be implemented?
> > 2) Does this warrant a SIP?
> >
>
> Yes and yes.

Agreed.

> I'd like an 'executable' SIP for this one: elaborate the scheme(s) under
> consideration with some manually implemented examples, and let the
> reviewers try to find ways to deadlock them, and to measure the performance.

This is a great idea.

-- Erik

Paul Phillips

Apr 25, 2013, 3:33:08 PM
to scala-i...@googlegroups.com

On Wed, Apr 24, 2013 at 1:46 PM, Jason Zaugg <jza...@gmail.com> wrote:
I'd like an 'executable' SIP for this one:

I'd like an 'executable' SIP for all of them.

Erik Bruchez

Apr 25, 2013, 5:53:49 PM
to scala-i...@googlegroups.com
On Wed, Apr 24, 2013 at 11:54 AM, Miguel Garcia <miguel...@tuhh.de> wrote:
> It gets a bit off topic (Java 8) but in a nutshell, "lazy constants" via
> invokedynamic,

[…]

> (b) on the plus side, after initialization no bit-checking, no
> double-checked-lock, no nothing is needed. Assuming the VM optimizes as
> advertised :)

This is great! A discussion about the future of lazy vals should
definitely look into performance considerations.

-Erik

Aleksandar Prokopec

May 17, 2013, 8:46:56 AM
to scala-i...@googlegroups.com, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin
Hi,

As a follow-up to what was agreed on at the last Scala meeting,
I am posting the performance measurements for the current lazy val initialization overheads,
and a breakdown of the overheads related to the proposed changes.

I measured the following:
- instantiating an object with a value
- instantiating an object and initializing its lazy value
- same thing, but with manually implemented lazy vals with boolean bitmaps
- same thing, but with byte bitmaps
- the proposed lazy val change with 2 synchronized blocks, but without notifying potential waiters
- same thing, but with a notify call
repeated 1000000 through 5000000 times, in steps of 1000000.
The instantiated object is assigned to a field each time, to defeat potential VM optimizations.
The initialization code consists of creating an integer literal `0` - I chose the minimum amount
of computation to avoid having the computation costs mask the lazy-val-related initialization costs.
This should thus be the borderline case where the program does nothing else but instantiate objects and initialize their lazy vals.
Before taking a look at the results, note that this is not what most applications do.

In these tests I focused on the case where there are no other threads accessing these fields - the access to the field is not contended.

My platform:
i7-2600, quad-core, hyperthreading, 3.4GHz
MacOS X 10.7.5
Java 7, update 4, 64-bit
Scala 2.10.1

The reports themselves are here:

http://lampwww.epfl.ch/~prokopec/lazyvals/report/

(click on the LazyVals box in the field to select all of the different measurements)

Comment:
The current lazy val implementation seems to incur initialization costs that are at least 6 times greater than those of referencing a regular val.
The handwritten implementation produces nearly identical bytecode, with the difference that the calls are virtual instead of just querying the field value, which makes it 25% slower.
Replacing boolean bitmaps with bytes seems to slow things down by another 25%, possibly due to bit manipulations (or int-to-byte conversions).
The 2 synchronized blocks design is as fast as the manually written byte-bitmap version
(although only after a _long_ warmup; initially it is 50% slower - possibly some JIT optimization kicks in that would not be available if the code weren't executing in isolation).
The 2 synchronized blocks design with a notify is 5 times slower than the current lazy val implementation - just adding the `notifyAll` changes things considerably.

The repo with the executable microbenchmarks (using ScalaMeter) with the code that produced these:

https://github.com/axel22/lazy-val-bench

The snippets being tested themselves are here:

https://github.com/axel22/lazy-val-bench/blob/master/src/test/scala/example/SyncBench.scala

Any comments on different approaches to measuring the performance overheads are appreciated.
Also, feel free to run these on your machines as well.

The question is also whether we're willing to bite the bullet with these numbers, and if we're not, whether there is a way to avoid the notify.

Thanks,
Alex

Simon Ochsenreither

May 17, 2013, 9:21:47 AM
to scala-i...@googlegroups.com, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin
Wow, nice benchmark!
Did you consider a design based on a SwitchPoint?

Aleksandar Prokopec

May 17, 2013, 9:39:11 AM
to scala-i...@googlegroups.com, Simon Ochsenreither, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin
Interesting. No, I did not think about that implementation - I have no prior experience with MethodHandles performance.

Cheers,
Alex



On 5/17/13 3:21 PM, Simon Ochsenreither wrote:
Wow, nice benchmark!
Did you consider a design based on a SwitchPoint?


-- 
Aleksandar Prokopec,
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec

Simon Ochsenreither

May 17, 2013, 9:47:23 AM
to scala-i...@googlegroups.com, Simon Ochsenreither, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin, aleksanda...@epfl.ch
As far as I understand (and I could be totally wrong), you could have a SwitchPoint where the “primary” action is to evaluate the val. As soon as this has happened, the SwitchPoint switches to the “secondary” action which just returns the value and invalidates the old call-sites so that they pick up the new action. This would mean we wouldn't need any bitset at all.

Please take all of this with tons of salt.
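For reference, the raw SwitchPoint mechanics (in the `java.lang.invoke` API) can be sketched like this - a guarded handle runs its target until the switch point is invalidated, and its fallback afterwards; the string values here are purely illustrative:

```scala
import java.lang.invoke.{MethodHandles, SwitchPoint}

object SwitchPointDemo {
  val sp = new SwitchPoint
  // target while the switch point is valid ("evaluate the val"),
  // fallback after invalidation ("just return the cached value")
  val handle = sp.guardWithTest(
    MethodHandles.constant(classOf[String], "computing"),
    MethodHandles.constant(classOf[String], "cached")
  )
}
```

After `SwitchPoint.invalidateAll(Array(sp))`, every call through `handle` permanently takes the fallback path.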

Aleksandar Prokopec

May 17, 2013, 9:49:50 AM
to scala-i...@googlegroups.com, Simon Ochsenreither, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin
Would that mean we would need one SwitchPoint object per lazy-val field?


On 5/17/13 3:47 PM, Simon Ochsenreither wrote:
As far as I understand (and I could be totally wrong), you could have a SwitchPoint where the "primary" action is to evaluate the val. As soon as this has happened, the SwitchPoint switches to the "secondary" action which just returns the value and invalidates the old call-sites so that they pick up the new action. This would mean we wouldn't need any bitset at all.


Please take all of this with tons of salt.

Johannes Rudolph

May 17, 2013, 10:03:23 AM
to scala-i...@googlegroups.com, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin
Isn't this a "static" (link-time) optimization which only works for
global singletons? Or, in other words, many call-sites don't access
singleton lazy vals but rather fields of instances. How would
switching out the call-target help there?

--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Miguel Garcia

May 17, 2013, 10:25:55 AM
to scala-i...@googlegroups.com, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin, johannes...@googlemail.com

SwitchPoints are JDK 8 music, therefore not for 2.11

global-singleton lazy vals can be implemented either as:
  (a.1) invokedynamic returning a constant callsite wrapping a constant MH
        or
  (a.2) as a SwitchPoint singleton

a non-global lazy val can be implemented with a SwitchPoint instance,
  reset upon entering the scope of the lazy val (for example, a lazy val declared in a loop body is reset on each iteration);
  afterwards, the constant method handle - established upon first access after a reset - is invoked

The implementation in both cases is simpler than the current one, with the caveat that execution on "first access" still needs synchronization. Subsequent accesses should be faster.

Sounds promising, but not for 2.11.


Miguel
http://lampwww.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/

Simon Ochsenreither

May 17, 2013, 10:47:34 AM
to scala-i...@googlegroups.com, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin, johannes...@googlemail.com

SwitchPoints is JDK 8 music therefore not for 2.11

It's in JDK7, but considering the sad state of invokedynamic, I agree.

Sounds promising, but not for 2.11.

Agree.

Johannes Rudolph

May 17, 2013, 11:20:34 AM
to scala-i...@googlegroups.com, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin
Thanks for the explanation, Miguel. So, IIUC, you trade the
unsynchronized test for one (or more?) additional indirect jumps.

On Fri, May 17, 2013 at 4:25 PM, Miguel Garcia <miguel...@tuhh.de> wrote:
> The implementation in both cases is simpler than current one, with the
> caveat that execution on "first access" still needs synchronization.
> Subsequent accesses should be faster.

Faster than guarded by a condition (the current situation), or faster than
on first access? It's not clear to me why the added indirection should
be faster than a simple condition (though it is maybe easier to implement).
For branch prediction the guard would maybe even be easier: if you
assume that instances usually are initialized the branch predictor
could make use of this fact. To predict the right branch when you have
a loop that accesses different instances of a lazy val field is
impossible with the added indirection because every indirection would
be to a different target (especially if all values are initialized
because all will usually point to different constant methods). But
maybe my understanding or intuition is wrong or it just doesn't make
such a big difference.

Johannes Rudolph

May 17, 2013, 11:54:59 AM
to scala-i...@googlegroups.com, Paul Phillips, Jason Zaugg, Miguel Garcia, Hubert Plociniczak, Adriaan Moors, Odersky Martin
On Fri, May 17, 2013 at 5:20 PM, Johannes Rudolph
<johannes...@googlemail.com> wrote:
> be to a different target (especially if all values are initialized
> because all will usually point to different constant methods). But
> maybe my understanding or intuition is wrong or it just doesn't make
> such a big difference.

Scrap that, I was talking garbage: there would, of course, be only one
constant method containing the code to just access the value of the
instance. That should be just as predictable as the guard.

Peter Levart

May 20, 2013, 5:31:29 PM5/20/13
to scala-i...@googlegroups.com, Miguel Garcia, Hubert Plociniczak, Odersky Martin
Hi Aleksandar,

I saw your post on the concurrenc...@cs.oswego.edu list about the overhead of notifyAll(). Here's what I came up with:

https://github.com/plevart/concurrent-utils/blob/master/src/si/pele/concurrent/LazyLock.java

and a usecase:

https://github.com/plevart/concurrent-utils/blob/master/src/si/pele/concurrent/Container.java


This is a hybrid of both approaches described in your post. It uses a 32-bit CAS, which is standard JDK 7 API accessible to anyone (not just system classes), and does not box individual lazy vals. It maintains the "initialized" state together with the "lock" state in 2 bits per lazy val (4 different states). Initially the state overhead is one int field, which is enough for 16 lazy vals; this is because the CAS API is 32-bit only. The benefit of this approach is that it uses .wait() and .notifyAll() only when there is contention (2 or more threads competing to initialize the lazy val). When there's no contention on the lazy val initialization, the cost of initialization is just 2 x CAS.
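To make the bit arithmetic concrete, here is a small sketch (not the actual LazyLock.java; the class and method names are made up) of how 16 two-bit states can be packed into one int and updated with a 32-bit CAS, including the retry loop that concurrent updates to neighbouring states make necessary:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: 2 bits per lazy val, 16 lazy vals per int.
// The state of lazy val i occupies bits 2*i and 2*i+1.
final class StateBitmap {
    private final AtomicInteger bits = new AtomicInteger(0);

    // Read the 2-bit state of lazy val i.
    int state(int i) {
        return (bits.get() >>> (2 * i)) & 3;
    }

    // CAS the state of lazy val i from `expect` to `next`, leaving the
    // other 15 states untouched. Retries if a *different* lazy val's bits
    // changed between the read and the CAS.
    boolean casState(int i, int expect, int next) {
        int shift = 2 * i;
        while (true) {
            int cur = bits.get();
            if (((cur >>> shift) & 3) != expect) return false;
            int upd = (cur & ~(3 << shift)) | (next << shift);
            if (bits.compareAndSet(cur, upd)) return true;
        }
    }
}
```

With states such as 0 = uninitialized, 1 = initializing, 2 = initializing-with-waiters and 3 = initialized, the uncontended path is exactly the 2 x CAS mentioned above: one `casState(i, 0, 1)` to claim the initialization and one `casState(i, 1, 3)` to publish.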

What do you think about this approach?

Regards, Peter

Aleksandar Prokopec

May 20, 2013, 6:03:52 PM5/20/13
to scala-i...@googlegroups.com
I just took a brief look, will respond in more detail in the morning.
So, is the idea to try to circumvent the notifyAll call if there are no other readers?

In fact, we might do this without a CAS too - we could have the second-arriving thread set the bitmap to the 4th state, which means that a notify is needed. We use 2 bits per lazy val in the new scheme anyway, and this allows us to encode these 4 states.


Peter Levart

May 20, 2013, 6:43:55 PM5/20/13
to scala-i...@googlegroups.com


On Tuesday, May 21, 2013 12:03:52 AM UTC+2, Aleksandar Prokopec wrote:
I just took a brief look, will respond in more detail in the morning.
So, is the idea to try to circumvent the notifyAll call if there are no other readers?

Exactly. If there are no concurrent 1st-time readers.
 

In fact, we might do this without a CAS too - we could have the second-arriving thread set the bitmap to the 4th state, which means that a notify is needed. We use 2 bits per lazy val in the new scheme anyway, and this allows us to encode these 4 states.

That's exactly the idea I implemented. But I also used CAS instead of synchronization for the case where there are no concurrent 1st-time readers and no notification is necessary. It could be done with synchronization too; in a situation with no contention the object monitor will not be inflated, so it could be fast as well. CAS should of course be faster. Why don't you like CAS? Because it's JDK7+ only?

Regards, Peter
 

Aleksandar Prokopec

May 21, 2013, 4:33:05 AM5/21/13
to scala-i...@googlegroups.com, Peter Levart

We could use the CAS in JDK6 as well, either with the atomic field updaters or by using Unsafe directly (we already do this in some places).
We actually had a long discussion about this at one of the Scala meetings, and the overall opinion was that we should avoid using a library-level CAS for a basic language feature - different permissions
could disallow the use of Unsafe or the Atomic* classes, rendering the code unrunnable.

Here's the updated version of the graphs:

http://lampwww.epfl.ch/~prokopec/lazyvals/report/
(the `lazy-simulation-v3` version is the initialization with 4 states)

The code:
https://github.com/axel22/lazy-val-bench/blob/master/src/test/scala/example/SyncBench.scala#L104

Avoiding the `notifyAll` in the uncontended case seems to work nicely.

Cheers,
Alex


-- 
Aleksandar Prokopec,
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec

Peter Levart

May 21, 2013, 8:34:52 AM5/21/13
to Aleksandar Prokopec, scala-i...@googlegroups.com
On 05/21/2013 10:33 AM, Aleksandar Prokopec wrote:
We could use the CAS in JDK6 as well, either with the atomic field updaters or by using Unsafe directly (we already do this in some places).
We actually had a long discussion about this at one of the Scala meetings, and the overall opinion was that we should avoid using a library-level CAS for a basic language feature - different permissions
could disallow the use of Unsafe or the Atomic* classes, rendering the code unrunnable.

Unsafe is restricted to use by classes loaded by the bootstrap classloader (JDK system classes), but with no SecurityManager installed it can be used by ordinary classes too, using reflection to obtain access. In environments with a SecurityManager (app servers, applets, JavaWebStart apps) this can be prevented, so Unsafe is not appropriate. But the Atomic* classes are not restricted - I don't know of any permission that would disallow their use. It's like ConcurrentHashMap, for example, or practically any class in java.util.concurrent. Are you strict about not using standard core Java libraries in language features, to minimize dependencies?

Regards, Peter

Aleksandar Prokopec

May 21, 2013, 1:36:38 PM5/21/13
to Peter Levart, scala-i...@googlegroups.com
I believe that the concern was partly about minimizing dependencies to
ease the future development of alternate backends, and partly about Unsafe.
I will microbenchmark the solution that uses the updaters.

Regards,
Aleksandar

√iktor Ҡlang

May 21, 2013, 1:39:35 PM5/21/13
to scala-i...@googlegroups.com, Peter Levart
On Tue, May 21, 2013 at 7:36 PM, Aleksandar Prokopec <aleksanda...@epfl.ch> wrote:
I believe that the concern was partly about minimizing dependencies to ease the future development of alternate backends, and partly about Unsafe.
I will microbenchmark the solution that uses the updaters.

Updaters are in my experience 5-15% slower than direct Unsafe access, YMMV.
Cheers,
 


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-internals+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.





--
Viktor Klang
Director of Engineering

Twitter: @viktorklang

Aleksandar Prokopec

May 22, 2013, 3:37:18 AM5/22/13
to scala-i...@googlegroups.com, Peter Levart
I've just merged the CAS-based version PR from Viktor for the 1 lazy val
per class case (lazy-simulation-v4).
It seems to be roughly as fast as the current scheme with one
synchronization block.

http://lampwww.epfl.ch/~prokopec/lazyvals/report/

Regards,
Aleksandar

martin odersky

May 22, 2013, 6:28:14 AM5/22/13
to scala-internals, Peter Levart
On Wed, May 22, 2013 at 9:37 AM, Aleksandar Prokopec <aleksanda...@epfl.ch> wrote:
I've just merged the CAS-based version PR from Viktor for the 1 lazy val per class case (lazy-simulation-v4).
It seems to be roughly as fast as the current scheme with one synchronization block.

http://lampwww.epfl.ch/~prokopec/lazyvals/report/

Regards,
Aleksandar

Way to go then! - Martin 





--
Martin Odersky
Prof., EPFL and Chairman, Typesafe
PSED, 1015 Lausanne, Switzerland
Tel. EPFL: +41 21 693 6863
Tel. Typesafe: +41 21 691 4967

√iktor Ҡlang

May 22, 2013, 7:42:02 AM5/22/13
to scala-i...@googlegroups.com, Peter Levart
For whom it may concern: :)

The idea behind v4 is that the uncontended init phase is: volatile read + tableswitch + LOCK CMPXCHG + store + LOCK XCHG + return (piggybacking the safe publication of the value on the LOCK XCHG, establishing a happens-before relationship with the initial volatile read).
The uncontended subsequent reads are: volatile read + tableswitch + return
Since we can detect contended inits we also know when we need to use monitors, so we only pay the price of synchronized for those situations.
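Spelled out as code, that fast path might look roughly like this for the one-lazy-val-per-class case (a sketch with made-up names, not the actual patch; `computeValue` stands in for the compiled initializer, and the states are assumed to be 0 = uninitialized, 1 = initializing, 2 = contended, 3 = done):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the described fast path: the cell *is* its own state word, so
// the uncontended init is one CAS (claim), a plain store of the value, and
// one getAndSet (publish + detect contention); subsequent reads are just a
// volatile read plus a switch.
class CasLazyCell extends AtomicInteger {
    private int value_0; // published by the getAndSet on the state

    int value() {
        while (true) {
            switch (get()) {                 // volatile read + tableswitch
                case 0:
                    if (compareAndSet(0, 1)) {   // LOCK CMPXCHG: claim init
                        int v = computeValue();
                        value_0 = v;             // plain store
                        if (getAndSet(3) == 2) { // LOCK XCHG: publish
                            // someone flagged contention while we initialized
                            synchronized (this) { notifyAll(); }
                        }
                        return v;
                    }
                    break;
                case 1:
                    compareAndSet(1, 2); // mark that a notify will be needed
                    break;
                case 2:
                    synchronized (this) {
                        boolean interrupted = false;
                        while (get() != 3) {
                            try { wait(); }
                            catch (InterruptedException e) { interrupted = true; }
                        }
                        if (interrupted) Thread.currentThread().interrupt();
                    }
                    return value_0;
                default: // 3: initialized, the uncontended fast path
                    return value_0;
            }
        }
    }

    int computeValue() { return 42; } // placeholder for the lazy val's RHS
}
```

Because `value_0` is written before the `getAndSet(3)` and read only after a volatile read sees state 3, the plain field is safely published without itself being volatile.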

Cheers,



Aleksandar Prokopec

May 22, 2013, 12:09:51 PM5/22/13
to scala-i...@googlegroups.com, √iktor Ҡlang, Peter Levart, Odersky Martin
The CAS approach indeed looks promising.
However, I think that we would need to generalize it a bit.
It seems to me that we cannot use `getAndSet` in general - when there are multiple lazy val fields we need some retry logic, because of concurrent accesses to the other lazy val fields.
Also, we don't always have the option of extending AtomicInteger, and we cannot use Unsafe for the reasons discussed earlier.
That leaves us with atomic integer field updaters.

Here is an alternative, more general implementation for cases where there are multiple lazy vals per objects, and we use atomic field updaters:

// in a Java file - we need a public bitmap_0
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

public class LazyCellBase {
  public static final AtomicIntegerFieldUpdater<LazyCellBase> aifu_0 =
    AtomicIntegerFieldUpdater.newUpdater(LazyCellBase.class, "bitmap_0");
  public volatile int bitmap_0 = 0;
}

// in Scala
import scala.annotation.{switch, tailrec}

final class LazyCell extends LazyCellBase {
  import LazyCellBase._
  var value_0: Int = _

  @tailrec final def value(): Int = (aifu_0.get(this): @switch) match {
    case 0 =>
      if (aifu_0.compareAndSet(this, 0, 1)) {
        val result = 0 // stands in for the lazy val initializer
        value_0 = result

        @tailrec def complete(): Unit = (aifu_0.get(this): @switch) match {
          case 1 =>
            if (!aifu_0.compareAndSet(this, 1, 3)) complete()
          case 2 =>
            if (aifu_0.compareAndSet(this, 2, 3)) {
              synchronized { notifyAll() }
            } else complete()
        }

        complete()
        result
      } else value()
    case 1 =>
      aifu_0.compareAndSet(this, 1, 2)
      synchronized {
        while (aifu_0.get(this) != 3) wait()
      }
      value_0
    case 2 =>
      synchronized {
        while (aifu_0.get(this) != 3) wait()
      }
      value_0
    case 3 => value_0
  }
}


With these modifications, the initialization (v4-general) becomes almost as slow as the 2 synchronized blocks approach, although it does work better in the contended case than the current lazy val initialization scheme.

http://lampwww.epfl.ch/~prokopec/lazyvals/report/

Note that I added a test with contention, and note that this is a borderline synthetic microbenchmark, not a typical use case.

After having spent some time thinking about this, I am still convinced that we should take the synchronized path.
Here are some points:
- the overall memory footprint is possibly increased by always using an integer bitmap field - we would spend a minimum of 4 bytes for the bitmap on the first lazy val, where this was previously 1 byte
- in a setting with multiple bitmaps or an existing base class, we cannot extend `AtomicInteger` (which internally uses `Unsafe` directly), but need to use `AtomicIntegerFieldUpdater`s, which are slower due to extra checks
- in a setting with multiple lazy val fields, we can no longer use a `getAndSet` in the initialization (concurrent accesses to other lazy fields may modify the bitmap, so we have to read and recompute the expected bitmap state) - we need a `compareAndSet` and some retry logic (see the `complete` method above), which is slower
- due to the restrictions on the `AtomicIntegerFieldUpdater`s, we would need to make the bitmap_0 field publicly visible at the bytecode level, which might be an issue for Java code interfacing with Scala code
- the 2 synchronized blocks implementation is much simpler overall
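The field-updater restriction is visible even in a minimal sketch (hypothetical names): the updater is created reflectively from the field's name, and every call re-checks access and receiver type, which is where both the extra cost and the visibility requirement come from.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// The updater can only target a *volatile int* field that is accessible to
// the class creating the updater - hence the public bitmap_0 requirement.
class BitmapHolder {
    public volatile int bitmap_0;

    static final AtomicIntegerFieldUpdater<BitmapHolder> AIFU =
        AtomicIntegerFieldUpdater.newUpdater(BitmapHolder.class, "bitmap_0");
}
```

Calls then go through `BitmapHolder.AIFU.compareAndSet(instance, expect, update)`, paying a receiver-type check on each access that a direct `Unsafe` CAS avoids.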

Cheers,
Alex

√iktor Ҡlang

May 22, 2013, 12:45:55 PM5/22/13
to scala-i...@googlegroups.com, Peter Levart, Odersky Martin
Well, except for multithreaded/highly concurrent applications. (Akka comes to mind :-) )
 

After having spent some time thinking about this, I am still convinced that we should take the synchronized path.

I disagree, I think there's a non-blocking solution waiting to be discovered :-)
 
Here are some points:
- the overall memory footprint is possibly increased by always using an integer bitmap field - we would spend a minimum of 4 bytes per bitmap on first lazy val, where this was previously 1 byte

IIRC a single "byte" field on the JVM eats 32 bits of storage, and only byte arrays are 1 byte per cell.
 
- in a setting with multiple bitmaps or an existing base class, we cannot extend `AtomicInteger` (that internally uses `Unsafe` directly), but need to use `AtomicIntegerFieldUpdater`s that are slower due to extra checks

I'd opt for Unsafe as scala.concurrent is already based on it. FieldUpdaters are bad.
 
- in a setting with multiple lazy val fields, we can no longer use a `getAndSet` in the initialization (concurrent accesses to other lazy fields may modify the bitmap - we have to read and recompute the expected bitmap state) - we need a `compareAndSet` and have some retry-logic (see the `complete` method above), which is slower

True, let me see what I can do about that case.
 
- due to the restrictions on the `AtomicIntegerFieldUpdater`s, we would need to make the bitmap_0 field publicly visible on the bytecode level, which might be an issue for Java code interfacing Scala code

That problem goes away with Unsafe (and I think that package-protected is good enough for AIFU?)
 
- the 2 synchronized blocks implementation is much simpler overall

I think the solution could be abstracted so that the generated compute methods would be lean as well.
I'll see if I can take a stab at simplifying the generated code while still providing decent uncontended performance.

Cheers,

Rex Kerr

May 22, 2013, 12:57:49 PM5/22/13
to scala-i...@googlegroups.com, Peter Levart, Odersky Martin
On Wed, May 22, 2013 at 12:45 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:



IIRC a single "byte" on the JVM eats 32-bit of storage and only byte arrays are 1-byte per cell.

Everything is rounded up to 8, but it's packed efficiently within that constraint.

  --Rex
 

√iktor Ҡlang

May 22, 2013, 1:01:16 PM5/22/13
to scala-i...@googlegroups.com, Peter Levart, Odersky Martin
I couldn't find it in the spec so I guess it's implementation dependent, or do you have a linky?

Cheers,
 



Rex Kerr

May 22, 2013, 1:15:20 PM5/22/13
to scala-i...@googlegroups.com
I haven't found that information in the spec.  I used an agent to tell me experimentally.  A bare class takes 12 bytes, by the way (rounded up to 16), so you can store up to an Int for "free".  See below:

kerrr@angband:~/code/scala/ichi$ scala -J-javaagent:/jvm/IchiAgent.jar -cp /jvm/Ichi.jar
Welcome to Scala version 2.10.0 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_37).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val I = ichi.tools.ScalaInstrumentation.instrument.get
I: java.lang.instrument.Instrumentation = sun.instrument.InstrumentationImpl@3d0bbf6d

scala> import I.{ getObjectSize => sz }
import I.{getObjectSize=>sz}

scala> class A {}; sz(new A)
defined class A
res0: Long = 16

scala> class B { var x: Byte = 0 }; sz(new B)
defined class B
res1: Long = 16

scala> class C { var x: Byte = 0; var y: Short = 0 }; sz(new C)
defined class C
res2: Long = 16

scala> class D { var x: Byte = 0; var y: Short = 0; var z: Byte = 0 }; sz(new D)
defined class D
res3: Long = 16

scala> class E { var x: Byte = 0; var y: Short = 0; var z: Byte = 0; var w: Byte = 0 }; sz(new E)
defined class E
res4: Long = 24

scala> class F { var a: Int = 0; var b: Long = 0 }; sz(new F)
defined class F
res5: Long = 24

scala> class G { var a: Int = 0; var b: Long = 0; var c: Boolean = false }; sz(new G)
defined class G
res6: Long = 32


  --Rex

Aleksandar Prokopec

May 22, 2013, 1:24:01 PM5/22/13
to scala-i...@googlegroups.com
I'm not sure if that is spec-ed, but there's a detailed blog post about the rules of how the object fields are packed.


There is a memory footprint benchmark in the repo I've previously shared - seems like the footprint stays the same for the case of a single lazy val field object.
I think that due to the alignment to an 8-byte boundary the footprint should usually stay the same; in some situations it might slightly increase, if the new bitmap size causes the object size to cross the 8-byte boundary.

Simon Ochsenreither

May 22, 2013, 1:39:37 PM5/22/13
to scala-i...@googlegroups.com
I remember that there has been a highly informative talk by Gil Tene who addressed these questions and explained some recent changes in the JVM, but I can't find the link right now.

Aleksandar Prokopec

May 22, 2013, 1:39:39 PM5/22/13
to scala-i...@googlegroups.com
http://lampwww.epfl.ch/~prokopec/lazyvals/report/

Note that I added a test with contention, and note that this is a borderline synthetic microbenchmark, not a typical use case.

Well, except for multithreaded/highly concurrent applications. (Akka comes to mind :-) )
  

I think that multithreaded/highly concurrent applications might not be using lazy vals in that particular scenario. :)
 

 
After having spent some time thinking about this, I am still convinced that we should take the synchronized path.

I disagree, I think there's a non-blocking solution waiting to be discovered :-)

Well, in the general case that might be hard, because what the initializer block does might have side-effects.
Can't think of anything there besides using a sandboxing non-blocking STM for the lazy val initialization ;)
 
 
- in a setting with multiple bitmaps or an existing base class, we cannot extend `AtomicInteger` (that internally uses `Unsafe` directly), but need to use `AtomicIntegerFieldUpdater`s that are slower due to extra checks

I'd opt for Unsafe as scala.concurrent is already based on it. FieldUpdaters are bad.

Agree with you completely there - Unsafe is faster.
We had previously held a long discussion at one of the Scala meetings (where I had originally argued for the CAS approach), and the general concern people had was the interaction with SecurityManagers.
 
Cheers,
Alex

Rex Kerr

May 22, 2013, 2:03:35 PM5/22/13
to scala-i...@googlegroups.com
The blog doesn't make it clear, but Shorts and Bytes don't each receive their own padding - as of 1.6u37, anyway: one Short and two Bytes fit in the four bytes free in a basic object (note: the base object size is 12 bytes, not 8, on a 64-bit machine).

  --Rex

√iktor Ҡlang

May 22, 2013, 2:11:14 PM5/22/13
to scala-i...@googlegroups.com
On Wed, May 22, 2013 at 7:39 PM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:

http://lampwww.epfl.ch/~prokopec/lazyvals/report/

Note that I added a test with contention, and note that this is a borderline synthetic microbenchmark, not a typical use case.

Well, except for multithreaded/highly concurrent applications. (Akka comes to mind :-) )
  

I think that multithreaded/highly concurrent applications might not be using lazy vals in that particular scenario. :)

You mean for the highly contended init? ;-)
Well, technically the problem is more and more prevalent the longer the init takes…
 
 

 
After having spent some time thinking about this, I am still convinced that we should take the synchronized path.

I disagree, I think there's a non-blocking solution waiting to be discovered :-)

Well, in the general case that might be hard, because what the initializer block does might have side-effects.
Can't think of anything there besides using a sandboxing non-blocking STM for the lazy val initialization ;)

Hehe, sounds like a fair challenge ;-)
 
 
 
- in a setting with multiple bitmaps or an existing base class, we cannot extend `AtomicInteger` (that internally uses `Unsafe` directly), but need to use `AtomicIntegerFieldUpdater`s that are slower due to extra checks

I'd opt for Unsafe as scala.concurrent is already based on it. FieldUpdaters are bad.

Agree with you completely there - Unsafe is faster.
We had previously held a long discussion on the Scala meeting (where I had originally argued for the CAS approach) and the general concern that the people had was the interaction with SecurityManagers.

I'd be cheering you on if I was present! Regarding SecurityManagers, we'd have the same problem with scala.concurrent, right? So it's more of a question of "sandboxing" from a scala-library perspective, right?

As for double-synchronized-on-init-but-volatile-read-when-inited it could be a decent tradeoff as most commonly it'd be completely non-blocking.

Cheers,
 
 

Rex Kerr

May 22, 2013, 2:54:35 PM5/22/13
to scala-i...@googlegroups.com
Sorry that I haven't fully followed the discussion, but have you tried a two-separate-flag-var, three-bits-total strategy to get down to a single synchronized block when uncontested?

if flag1 == DONE, use value
else {
  sync {
    if flag2a==WORKING {
      flag2b = WAITING
      if (flag1 != DONE) wait()
    } else {
      flag2a = WORKING
      flag2b = ! WAITING
    }
  }
  run
  load
  flag1 = DONE
  if flag2b==WAITING {
    sync {
      flag2a = ! WORKING
      notifyAll()
    }
  }
}
This way DONE is only set when there's actually a value there; and because the waiter sets WAITING first and then checks for DONE before wait(), while the initializer sets DONE first and then checks for WAITING once it has actually finished the work, you can't get stuck in a wait without a notify.

So you should have a single sync strategy when uncontested (with only one extra volatile var read over what we have now), and the other nice properties of the two-sync strategy should be preserved.  (I.e. this is two-sync-on-demand.)
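Rendered as (hypothetical, unoptimized) Java, with the waiter path returning directly after wait() so the initializer body runs exactly once, the sketch could look like:

```java
// Sketch of the single-sync-when-uncontested idea: `done` plays flag1 and is
// written outside any lock (hence volatile); `working`/`waiting` play flags
// 2a/2b. The Dekker-style ordering - waiter sets waiting then checks done,
// initializer sets done then checks waiting - avoids a lost notify.
class OneSyncLazy {
    private volatile boolean done;     // flag1: value is available
    private boolean working;           // flag2a: guarded by the monitor
    private volatile boolean waiting;  // flag2b: read outside the monitor

    private int value;

    int get() {
        if (done) return value;       // fast path: one volatile read
        synchronized (this) {
            if (working) {            // someone else is initializing
                waiting = true;
                boolean interrupted = false;
                while (!done) {
                    try { wait(); }
                    catch (InterruptedException e) { interrupted = true; }
                }
                if (interrupted) Thread.currentThread().interrupt();
                return value;
            }
            working = true;
            waiting = false;
        }
        value = compute();            // "run" + "load", outside any lock
        done = true;                  // publish
        if (waiting) {                // a waiter showed up: second sync block
            synchronized (this) {
                working = false;
                notifyAll();
            }
        }
        return value;
    }

    int compute() { return 42; }      // stands in for the lazy val initializer
}
```

In the uncontested case this executes one synchronized block and one extra volatile read, matching the trade-off described above.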

  --Rex



Rex Kerr

May 22, 2013, 2:57:50 PM5/22/13
to scala-i...@googlegroups.com
I should clarify that the storage requirement is an extra byte per lazy val in addition to however much is needed to store the 2a/2b flags, as flag1 is manipulated outside of a lock.  Otherwise you'd need yet more logic to avoid overwrites from different threads dealing with different lazy vals.

  --Rex

√iktor Ҡlang

May 22, 2013, 3:24:10 PM5/22/13
to scala-i...@googlegroups.com
We're actually down to 0 sync blocks for the uncontended case. The current issue with that approach is that it becomes a bit cryptic once multiple vals share the same bitmap.

Cheers,

Rex Kerr

May 22, 2013, 3:33:30 PM5/22/13
to scala-i...@googlegroups.com
Er, sorry, I mean without using unsafe or anything bulky like AtomicWhatever.  Isn't having unsafe in something as central as lazy vals just raising headaches for any non-Oracle-JVM target?  Or is unsafe completely standardized across JVMs these days (and portable to CLR)?

  --Rex

Simon Ochsenreither

May 22, 2013, 3:38:11 PM5/22/13
to scala-i...@googlegroups.com

I'd be cheering you on if I was present! Regarding SecurityManagers, we'd have the same problem with scala.concurrent right, so it's more of a question of "sandboxing" from a scala-library perspective, right?

As far as I understand, it is possible to sign the jar file to allow for a higher privilege levels (not sure if this only applies to Applets/WebStart or to all SecurityManager-secured environments). But this isn't an option for Scala anyway, because the library is not safe from a security POV.

√iktor Ҡlang

May 22, 2013, 3:43:28 PM5/22/13
to scala-i...@googlegroups.com


On May 22, 2013 9:33 PM, "Rex Kerr" <ich...@gmail.com> wrote:
>
> Er, sorry, I mean without using unsafe or anything bulky like AtomicWhatever.

See Alex's original proposal.

  Isn't having unsafe in something as central as lazy vals just raising headaches for any non-Oracle-JVM target?  Or is unsafe completely standardized across JVMs these days

Yes, according to Mr Lea.

(and portable to CLR)?

How lazy vals are encoded must be platform specific.

Rex Kerr

May 22, 2013, 4:36:06 PM5/22/13
to scala-i...@googlegroups.com
On Wed, May 22, 2013 at 3:43 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:


On May 22, 2013 9:33 PM, "Rex Kerr" <ich...@gmail.com> wrote:
>
> Er, sorry, I mean without using unsafe or anything bulky like AtomicWhatever.

See Alex's original proposal.


That's what I was responding to--removing one of the two synchronized blocks at the cost of a bit more storage space.
 

  Isn't having unsafe in something as central as lazy vals just raising headaches for any non-Oracle-JVM target?  Or is unsafe completely standardized across JVMs these days

Yes, according to Mr Lea.

Well, I guess if anyone could define the truth, he'd be the one (in this realm).  That is, if your JVM doesn't support it, you can always complain, "Hey, this isn't a real implementation--no sun.misc.Unsafe?  It's just a toy, I guess."

  --Rex
 

√iktor Ҡlang

May 22, 2013, 4:58:14 PM5/22/13
to scala-i...@googlegroups.com
It's a shame that there is no low-level officially supported API. Having to implement "languages"/standard libraries using the same tools that are available to application developers isn't really the right way forward, as witnessed by the use of intrinsics in the JDK.
</rant>

Oh well...

Cheers,
 



Simon Ochsenreither

May 22, 2013, 5:11:48 PM5/22/13
to scala-i...@googlegroups.com
It's a shame that there is no low-level officially supported API. Having to implement "languages"/standard libraries using the same tools that are available to application developers isn't really the right way forward, as witnessed by the use of intrinsics in the JDK.
</rant>

Feel free to specify the semantics you want to have and make a pitch for it on the Avian mailing list. :-P

√iktor Ҡlang

May 22, 2013, 5:44:16 PM5/22/13
to scala-i...@googlegroups.com
Thanks for the invite :)
 
