What needs to be done short-term: Auto-generate BoxingConversion
objects. Enforce the restrictions laid down in the SIP. More testing.
Convert selected library classes to value classes.
Longer term: Try to get specialization to a point where common
FlatArray operations no longer lead to boxing. Right now, every
time you select an element in a FlatArray it is boxed, because
FlatArray's interface is generic. The boxed object is typically
short-lived because we often store it in unboxed form afterwards.
Specialization is something that should be doable here. It will
require effort, so it might arrive after 2.10. But I think the basic
value classes scheme is validated and usable now.
Cheers
-- Martin
Is the proposal limited to value classes with a single constructor
parameter? If so, how would it interact with future work to model
structs, e.g. Complex(r: Double, i: Double)?
These are supported on .NET, and might be added to the JVM one day. I
suppose a JVM compiler could encode them as groups of
fields/parameters, too.
-jason
--
Daniel C. Sobral
I travel to the future all the time.
That's not _quite_ true. It just requires a lot of extra compiler work and wouldn't perform as well as a JVM-supported version.
Personally I don't think it's worth the effort, but I do think it's plausible. These are just the changes that one makes by hand in order to achieve top performance when it really matters.
It is, as you say, the best one can do in the JVM though.
I'd been thinking about this some. How slow are the reinterpret-cast
type operations? doubleToRawLongBits etc? One crazy idea is to
actually back everything by Array[Byte] or Array[Long] and do
everything c-style. It solves the cache problem, but it's probably
hard to get all the subtleties right.
-- David
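To make the Array[Long] idea concrete, here is a minimal sketch. The class name LongBackedDoubles is made up for illustration. Only java.lang.Double.doubleToRawLongBits and java.lang.Double.longBitsToDouble are real JDK calls. It says nothing about how fast the conversions are, it only shows the shape of the encoding.

```scala
object RawBitsSketch {
  // Hypothetical container: stores Doubles as raw bit patterns in an Array[Long].
  final class LongBackedDoubles(size: Int) {
    private val bits = new Array[Long](size)

    def length: Int = bits.length

    // Reinterpret the stored bits as a Double on the way out.
    def apply(i: Int): Double = java.lang.Double.longBitsToDouble(bits(i))

    // Reinterpret the Double as raw bits on the way in.
    def update(i: Int, x: Double): Unit =
      bits(i) = java.lang.Double.doubleToRawLongBits(x)
  }

  def demo(): Double = {
    val xs = new LongBackedDoubles(2)
    xs(0) = 1.5
    xs(1) = -0.25
    xs(0) + xs(1)
  }
}
```

The same layout would extend to several fields per element by using more than one slot per logical entry, which is where the subtleties mentioned above start to matter.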
>
> Best,
> Ismael
> On Fri, Feb 17, 2012 at 13:16, martin odersky <martin....@epfl.ch> wrote:
>> I now have a proposal I am quite happy with. I changed the array
>> handling scheme to FlatArrays that take implicit parameters
>> representing box/unbox conversions. This went well. I have a version
>> of FlatArray which is fully integrated in Scala collections.
>>
>> What needs to be done short-term: Auto-generate BoxingConversion
>> objects. Enforce the restrictions laid down in the SIP. More testing.
>> Convert selected library classes to value classes.
>>
>> Longer term: Try to get specialization to a point where common
>> FlatArray operations no longer lead to boxing. Right now, every
>> time you select an element in a FlatArray it is boxed, because
>> FlatArray's interface is generic. The boxed object is typically
>> short-lived because we often store it in unboxed form afterwards.
>> Specialization is something that should be doable here. It will
>> require effort, so it might arrive after 2.10. But I think the basic
>> value classes scheme is validated and usable now.
>>
>> Cheers
>>
>> -- Martin
>
>
>
> --
> Daniel C. Sobral
>
> I travel to the future all the time.
--
Martin Odersky
Prof., EPFL and Chairman, Typesafe
PSED, 1015 Lausanne, Switzerland
Tel. EPFL: +41 21 693 6863
Tel. Typesafe: +41 21 691 4967
Cheers
-- Martin
n-arr parameter
> These are supported on .NET, and might be added to the JVM one day. I
> suppose a JVM compiler could encode them as groups of
> fields/parameters, too.
>
> -jason
--
Yes, this is the easy part.
> You still get into trouble if you have to return values (you have to
> create an object, but if you made it mutable you could cleverly push it
> outside of loops so you didn't have to pay the creation tax every
> iteration) or put them into a generic collection (you have to box, but
> you would have had to with a primitive anyway).
This won't work in the general case. What about calling a method in a
loop that calls another method that returns value types? The only way I
can see to remove object creation totally is to either inline everything,
or pass a singleton, mutable object containing lots of primitive fields
of all types that are used for return values. Neither solution is very
practical.
If only the JVM added multiple return values you could quite easily
emulate .NET value types in the JVM similar to the way Martin has done
for single value types. Unfortunately this is not very high on Oracle's
priority list as they are reluctant towards JVM spec changes.
/Jesper Nordenberg
I just wanted to follow up on my earlier email about value classes (SIP-15).
When replying to Simon's email in that thread Martin says that
"flatArray is the best anyone can do." But I think we must try to do
better.
My basic concern is that value classes as specified are a bit too
unpredictable. Specifically, I expect a programmer to use a value type
for performance reasons when she wants to ensure that the underlying
value is never boxed.
Allowing value classes to extend traits (via the universal trait
mechanism) is nice, but the price (complicated rules around Arrays and
boxing) is not worth it. I think that programmers who use value types
would be willing to give up inheritance and reflection in order to gain
assurances about unboxed representation. (At least, I would.) Even
without this, the extension methods, rerouting, and peephole
optimizations still make using value classes worthwhile.
Type classes give us a way to abstract across the AnyVal types, and it
seems totally natural to me that other value classes would require the
same mechanism (rather than being able to extend Ordered[T] or
Printable or whatnot). While allowing value classes to extend universal
traits is nice, it's not worth the pain of boxing (even if we hope that
specialization might be able to solve that in the future)... especially
since types like Int and Double will still need typeclasses.
If we didn't permit value classes to extend traits, I think we *could*
do a lot better than FlatArray, right? The example in SIP-15
(illustrating why arrays must do boxing) would not be legal (since a
value class T could not be a subtype of Printable).
If there is some reason why arrays *still* must box which I haven't
thought of, I think we should move to a FlatArray that is parameterized
on both Boxed and Unboxed. At least then, users would have some way to
get and set unboxed values directly without intermediate boxing.
Without this, I'm not sure FlatArray is much better than Array (you
save on storage space a bit but pay even more costs on object
creation).
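A rough sketch of what such a FlatArray could look like follows. Everything here is hypothetical: the BoxingConversion trait is a simplified stand-in for the one named in the thread, and getUnboxed/setUnboxed are invented names for the direct accessors. It is not the SIP's implementation.

```scala
object FlatArraySketch {
  // Hypothetical stand-in for the BoxingConversion mentioned in the thread.
  trait BoxingConversion[Boxed, Unboxed] {
    def box(x: Unboxed): Boxed
    def unbox(x: Boxed): Unboxed
  }

  final class Meter(val underlying: Double) extends AnyVal

  implicit object MeterBoxing extends BoxingConversion[Meter, Double] {
    def box(x: Double): Meter = new Meter(x)
    def unbox(x: Meter): Double = x.underlying
  }

  // Parameterized on both the boxed and the unboxed representation.
  final class FlatArray[Boxed, Unboxed](val raw: Array[Unboxed])(
      implicit conv: BoxingConversion[Boxed, Unboxed]) {
    def length: Int = raw.length

    // Generic interface: every access goes through the conversion.
    def apply(i: Int): Boxed = conv.box(raw(i))
    def update(i: Int, x: Boxed): Unit = raw(i) = conv.unbox(x)

    // Direct access to the stored representation, skipping the conversion.
    def getUnboxed(i: Int): Unboxed = raw(i)
    def setUnboxed(i: Int, x: Unboxed): Unit = raw(i) = x
  }

  def demo(): Double = {
    val arr = new FlatArray[Meter, Double](new Array[Double](2))
    arr(0) = new Meter(1.5)   // goes through unbox
    arr.setUnboxed(1, 2.0)    // writes the Double directly
    arr(0).underlying + arr.getUnboxed(1)
  }
}
```

Note that without specialization the generic signatures of getUnboxed and setUnboxed may still box the primitive as a java.lang.Double, so this only removes the Meter wrapper from the path, not necessarily all allocation.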
I realize that such a FlatArray would not be very compatible with
collections, but when I have to reach for FlatArray it's because I
*really* don't want boxing... so building boxing into it seems pretty
wrong. I can provide an alternate FlatArray implementation if people
would like to compare/contrast.
Sorry to be so negative--I really am excited about value classes, which
is why I'm hoping to make the spec as nice as possible. My opinion is
that it is close, but not quite there.
Thanks,
-- Erik
There's more than that. If you go down that route,
- members of value classes must print like their underlying type. I.e.
you could not get 10.0m as a toString from a Meter quantity.
- members of value classes would be equal to members of the
underlying type. So 10 meter would equal 10 would equal 10 feet. The
last one, for me, is the killer.
- Martin
Philosophically, we have to realize that Array on the JVM is
fundamentally broken.
I have seen this already.
I agree that this would happen when the type of the parameter isn't
known. I could imagine being able to create an "extension" of toString
that would work around it in some cases, but you're right that in many
cases it would happen. To me it's acceptable, even if it's a bit ugly.
> - members of value classes would be equal to members of the
> underlying type. So 10 meter would equal 10 would equal 10 feet. The
> last one, for me, is the killer.
This is a big problem, for sure, a natural outcome of the "equals
situation" we inherit from Java.
I am on the fence about how bad this is, compared to the boxing issues
I raised. In some cases (like units) it's clearly not great. Unsigned
integers are another place that is a bit strange. I'm not sure that
Complex(0.0F, 0.0F) == 0L is wrong per se though.
I think for my purposes I would rather have value classes that are
guaranteed to be fast (unboxed) but have some gotchas: use something
besides toString to print them out, and something else (e.g. ===) to
test for equality instead of ==.
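As an illustration of that style, here is a small sketch with hand-rolled Eq and Show type classes. The names Eq, Show, EqOps and the Meter class are assumptions made for the example and do not come from any particular library.

```scala
object TypeClassSketch {
  final class Meter(val value: Double) extends AnyVal

  // Hypothetical type classes standing in for equals and toString.
  trait Eq[A] { def eqv(x: A, y: A): Boolean }
  trait Show[A] { def show(x: A): String }

  implicit object MeterEq extends Eq[Meter] {
    def eqv(x: Meter, y: Meter): Boolean = x.value == y.value
  }

  implicit object MeterShow extends Show[Meter] {
    def show(x: Meter): String = x.value.toString + "m"
  }

  // Adds === to any type that has an Eq instance in scope.
  implicit class EqOps[A](val self: A) extends AnyVal {
    def ===(other: A)(implicit eq: Eq[A]): Boolean = eq.eqv(self, other)
  }

  def show[A](x: A)(implicit s: Show[A]): String = s.show(x)

  def demo(): (Boolean, String) = {
    val a = new Meter(10.0)
    val b = new Meter(10.0)
    (a === b, show(a))
  }
}
```

Because a === b needs an Eq[Meter] for both sides, comparing a Meter with a value of another unit type does not compile. The generic Eq[A] and EqOps[A] erase A, so this shows the API shape only, not an allocation-free encoding.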
What are other people's thoughts? I'm sure there are many perspectives
on this.
-- Erik
I am also interested in people's opinions on this. Personally, I
believe it is wrong to give up semantic integrity for speed.
- Martin
Trunk currently has this:
  final class StringOps(override val repr: String) extends AnyVal with StringLike[String] {
  final class Ensuring[A](val __resultOfEnsuring: A) extends AnyVal {
  final class ArrowAssoc[A](val __leftOfArrow: A) extends AnyVal {
I haven't measured performance changes, but the bytecode changes
looked promising to me. Now, the first one would be made illegal by
the change proposed, and I can't think of any case where the other two
would be affected.
I'm not sure about the issues surrounding array. It seems to me that
trying to abstract anything about them is always so full of corner
cases as to make them too dangerous to use. Of course, I'd love to be
proven wrong, but if the choice is between having to use unabstracted
arrays vs having to tread warily whenever I use abstracted arrays, I'd
pick the former (ie, keep the current SIP).
> On Mon, Mar 26, 2012 at 08:29:51PM +0200, martin odersky wrote:
>> There's more than that. If you go down that route,
>>
>> - members of value classes must print like their underlying type. I.e
>> you could not get 10.0m as a toString from a Meter quantity.
>
> I have seen this already.
>
> I agree that this would happen when the type of the parameter isn't
> known. I could imagine being able to create an "extension" of toString
> that would work around it in some cases, but you're right that in many
> cases it would happen. To me it's acceptable, even if it's a bit ugly.
>
>> - members of value classes would be equal to members of the
>> underlying type. So 10 meter would equal 10 would equal 10 feet. The
>> last one, for me, is the killer.
>
> This is a big problem, for sure, a natural outcome of the "equals
> situation" we inherit from Java.
>
> I am on the fence about how bad this is, compared to the boxing issues
> I raised. In some cases (like units) it's clearly not great. Unsigned
> integers are another place that is a bit strange. I'm not sure that
> Complex(0.0F, 0.0F) == 0L is wrong per se though.
>
You picked exactly the one value for which this is okay. Would you say the same for Complex(1f, 1f)==4575657222473777152L? (my own concocted encoding, don’t know if this matches what you do)
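For reference, one possible encoding that reproduces the number above packs the raw bits of the real part into the high 32 bits of a Long and those of the imaginary part into the low 32 bits. The object and method names are made up for this sketch, and this layout is only an assumption about what such an encoding might look like.

```scala
object PackedComplexSketch {
  // Hypothetical encoding: real part in the high 32 bits, imaginary part in the low 32 bits.
  def pack(re: Float, im: Float): Long = {
    val hi = java.lang.Float.floatToRawIntBits(re).toLong << 32
    val lo = java.lang.Float.floatToRawIntBits(im).toLong & 0xFFFFFFFFL
    hi | lo
  }

  // Recover the two components from the packed representation.
  def re(bits: Long): Float = java.lang.Float.intBitsToFloat((bits >>> 32).toInt)
  def im(bits: Long): Float = java.lang.Float.intBitsToFloat(bits.toInt)
}
```

Under this layout pack(0f, 0f) is 0L and pack(1f, 1f) is 4575657222473777152L, so equality against the underlying Long only looks meaningful for zero.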
> I think for my purposes I would rather have value classes that are
> guaranteed to be fast (unboxed) but have some gotchas: use something
> besides toString to print them out, and something else (e.g. ===) to
> test for equality instead of ==.
>
Consistency is rather important when discussing a programming language feature, I think. If you want raw arrays, then you obviously care so much that you should probably just use raw arrays, I guess.
> What are other people's thoughts? I'm sure there are many perspectives
> on this.
>
Take this with a pinch of salt, as I have never actually implemented something close to what you want to have on the JVM. But I’m still worried about the equals thingy.
Regards,
Roland
Yeah, I'm not honestly sure how useful these are going to be if arrays
are slow. I'd bet that the vast majority of the use cases surrounding
value types will want arrays. It's hard for me to think of any non-toy
numeric application of value classes that won't use arrays. And, as
has been pointed out, we're going to need typeclasses to do any actual
generic operations with these things, anyway, since we'll want fast
Double/Float/Int ops still.
I do a lot of AI/ML work, and I'd love to have floats with more
exponent and less mantissa, or being able to represent doubles in log
space without overhead. (I'd especially love being able to have a
double and a long in a value class. Maybe one day.) As it currently
stands, I don't think we could really get any benefit out of using
these in Scalala, and I don't think I could use them in my inner
loops. (Basic example inner loop: CKY for weighted context free
grammars.)
Maybe it'll be possible to write a macro class version of FlatArray
one day? I have no faith in specialization...
-- David
martin odersky wrote:
FWIW, I agree that giving up semantic integrity for speed would be wrong. But Object#equals is already pretty broken (in my book) and I see toString abused more often than not. Personally, I could live with having to use a "Show" type class instead of "toString" and === resp. an "Equal" type class instead of == resp. equals.
But then... toString and equals are probably in too widespread use already, making value classes incompatible with libraries that are already out there.
All in all, I suspect it would bring more trouble than it's worth. :(
Kind regards
Andreas
Sorry if that seemed flippant. I guess my opinion is that if you're
comparing Complex to Long you should know what you're doing. Especially
if we don't get improved array support I expect I will be doing a lot
of Complex#u == Long tests, so this result would be OK.
Really, in my own work I would probably be using an Eq[A] typeclass
with === so I would only use == precisely when I was willing to have
something "interesting" like this happen.
> Consistency is rather important when discussing a programming
> language feature, I think. If you want raw arrays, then you obviously
> care so much that you should probably just use raw arrays, I guess.
Of course. I accept that weirdos who care about performance of large
arrays of data are not usually the main constituency for these things.
:) That said, for a moment it seemed like value classes would permit a
lot of interesting uses in this area.
Thanks for your feedback.
-- Erik
That's a good point. Just because value classes aren't solving all my
problems doesn't mean they aren't solving any of them. :)
If we can (eventually) get these things playing nicely with specialized
Ops[A] classes then that by itself will be a gigantic win.
-- Erik
> So, Value classes are solving a few different problems (in Scala). One is that of extension methods and implicit views. We can get the same performance (or bytecode) as extension methods on implicit views using value classes. That's pretty spiffy.
>
> The remaining additions are gravy. I'd prefer to not lose the above optimisation in anything else we do. I think it's ok for us to err on the side of correctness rather than efficiency and hope the JVM eventually corrects itself.
+1. Our primary interest in value classes for my company's code base is the optimization of implicits. The other stuff is merely a nice-to-have for us.
I've been getting nervous when I see people talk about how value classes might be more trouble than they are worth based on arrays or whatnot. I couldn't disagree more!
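A minimal sketch of the implicit-wrapper use case, assuming Scala 2.10 implicit classes. DoubleOps and squared are invented names for the example.

```scala
object ExtensionSketch {
  // An implicit wrapper declared as a value class: the intent is that calling
  // the extension method does not allocate a DoubleOps instance.
  implicit class DoubleOps(val self: Double) extends AnyVal {
    def squared: Double = self * self
  }

  def demo(): Double = {
    val x = 3.0
    x.squared
  }
}
```

Whether the wrapper is actually elided in a given call depends on the compiler version and on how the value is used, so it is worth checking the bytecode for hot paths.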
>
> I've been getting nervous when I see people talk about how value classes might be more trouble than they are worth based on arrays or whatnot. I couldn't disagree more!
Just to be clear, I meant if value classes give up semantic consistency (toString, equals) for speed, then they might be more trouble than they are worth. The proposal in its current form doesn't have that problem, as far as I understand, and I'm all for it. Sorry if I wasn't clear about that. :(
Kind regards
Andreas
I strongly agree with this sentiment. There are clearly places to
favor performance over correctness, and if (generic) you disagree you
had better at minimum be using BigInt everywhere.
The complexity burden that specialization has placed on the
implementation has been very very high. We must avoid the further
imposition of complexity on the noble implementors where the upside of
the complexity fails to carry its weight.
I believe it is that FlatArray[Meter] needs to take Meters as the
elements that go in and out of a FlatArray, including any constructor
arguments. But FlatArray is a generic class, so any elements have to
be boxed and unboxed when they go in or out of a FlatArray. They would
still be stored as unboxed. But to take an element out of a FlatArray,
say, you'd have to box it for the generic interface, then immediately
unbox it at the monomorphic use site. One hopes that escape analysis
will find and eliminate the short-lived object, but that depends on
the code being inlined by Hotspot.
In the SIP we intentionally did not want to include specialization.
But to make FlatArray go well we need specialization to value class
type parameters. This will be rather hairy I am afraid, maybe not even
possible without a fundamental rethinking of the specialization
architecture.
The first problem here is one of phases. Value classes need to be
expanded during erasure. Specialization happens before that. So
Specialization is not able to treat a value class as its underlying
type. To be able to do that is the whole point of the transform in
erasure and postErasure.
So it seems for the mid term our only choices are:
- keep FlatArray and rely on escape analysis.
- don't have a solution for arrays at all.
In fact, for large arrays, FlatArray should win over non-flat arrays
even if boxing occurs at the interface because of the better locality
and smaller memory footprint. So there might be reasons enough to keep
FlatArray.
Cheers
- Martin
> It will be interesting to see the actual performance tradeoffs in
> Scala when run on the different VMs.
>
> Anyways, I think you and all the other Scala contributors are doing
> amazing things with the language right now. I believe that macros
> alone, if implemented well, have enormous potential behind them. On
> top of that, we have reflection, extension methods, a better pattern
> matcher, and so many other things, all in one release!
>
> -- Sean
--
If the use-case for value classes is mostly for things like implicit
conversions (rather than for creating new number types which are
totally unboxed), then I would support removing FlatArray to simplify
the SIP and to avoid creating expectations which we can't fulfill.
> In fact, for large arrays, FlatArray should win over non-flat arrays
> even if boxing occurs at the interface because of the better locality
> and smaller memory footprint. So there might be reasons enough to keep
> FlatArray.
This is true, but you can always allocate the Array[Unboxed] yourself
and get/set manually to get the same space savings. This is a bit more
verbose but at least it is clear how it works (Array[Foo] is boxed,
Array[Double] is not), and not very error prone.
-- Erik
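The manual approach described above might look like the following sketch. Meter and the helper names are hypothetical. The point is only that storage is a plain Array[Double] and the value class appears at the use site.

```scala
object ManualArraySketch {
  final class Meter(val value: Double) extends AnyVal {
    def +(other: Meter): Meter = new Meter(value + other.value)
  }

  // Storage is a plain Array[Double]; wrapping happens only where elements are used.
  def total(raw: Array[Double]): Meter = {
    var sum = new Meter(0.0)
    var i = 0
    while (i < raw.length) {
      sum = sum + new Meter(raw(i)) // wrap on read
      i += 1
    }
    sum
  }

  def demo(): Double = {
    val raw = Array(1.0, 2.5, 4.0)
    raw(1) = new Meter(3.0).value   // unwrap on write
    total(raw).value
  }
}
```

The cost is that nothing stops a caller from storing a value in the wrong unit, since the array itself only knows about Double.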
I think both are use cases. And, I also think that FlatArray is the
only thing we will ever be able to do for arrays. No other scheme will
work.
The choice is: Introduce it now, and hope for escape analysis and
future improvements to specialization to remove the bottlenecks. Or
introduce it later when these improvements have materialized but then
face the problem that people will have used other solutions in their
code (ie boxed arrays or arrays of primitive values with manual
packing/unpacking).
Cheers
- Martin
That's not dissimilar from the choice which was faced with
specialization. And I thought then and still think that "introduce it
now" was not the way to go.
But there is a crucial difference. Specialization is a huge and hairy beast.
FlatArray is a simple class with 150 lines of code total, fully
integrated into collections. And it's the canonical class, it will not
be any different if we introduce it in 2.11 or 2.10.
I would still also hesitate if there was a chance that there would be
a better way to deal with arrays in the future. But that won't happen.
Cheers
- Martin
After working some more on the compilation scheme I came to make two
more changes to the value class proposal.
First, the restriction that value classes may not have super accesses
can be dropped. This was fairly easy to implement.
Second, I also had to drop the idea that we could auto-generate the
implicit BoxingConversion terms. The reason is pretty technical but it
comes down to this: We need to implement these implicits very early in
the compilation scheme, so that any potential calls will see the
implicits. But finding out whether to generate these implicits in that
early phase would have triggered more type exploration than previously,
with the consequence that some reasonable programs would be rejected
with "illegal cyclic reference" errors. The only way we could avoid
the errors would be if there was a syntactic rule that described value
classes unambiguously without resorting to name and type explorations.
Note that the "extends AnyVal" in
  class Foo(x) extends AnyVal
is not unambiguous. AnyVal might be imported to mean something else,
so sometimes extending something named "AnyVal" does not imply having
a value class. Or else we might have a type alias
  type AV = scala.AnyVal
  class Bar(x) extends AV
This shows that some value classes do not extend AnyVal literally.
I toyed with the idea of finding new syntax for value classes. But in
the end I came to the opinion that instead of micro-optimizing
now we should try in the long run to find a more general scheme where
classes could derive stuff automatically. So, it seemed best to keep
the minimalistic syntax.
Cheers
-- Martin
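The two ambiguous cases can be spelled out in compilable form. This is only a sketch: the object names Shadowed and Aliased are invented, and the constructor parameters are given explicit types and val modifiers so that the definitions compile.

```scala
object Shadowed {
  // A local trait that happens to be called AnyVal shadows scala.AnyVal in this scope.
  trait AnyVal

  // Extends the local trait, so this is an ordinary reference class.
  class Foo(val x: Int) extends AnyVal
}

object Aliased {
  type AV = scala.AnyVal

  // Extends scala.AnyVal through the alias: a value class whose
  // definition never mentions the name "AnyVal".
  class Bar(val x: Int) extends AV
}
```

In both cases the parser alone cannot tell which definition is a value class. That only becomes known once names and types are resolved.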
On Tue, Mar 27, 2012 at 7:57 PM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
>
> Maybe `val class Meter(x: Double)` ?
>
>
Yes, that might indeed be a possibility. When I first thought of it, it
did not look appealing, but now that I see it written out, it does not
look so bad.
--Rex
I'm very sceptical that will happen, and certainly not in the
timeframes we want to consider. Look at the treatments of arrays today
in Java - you can't even define an Array[List[String]].
- Martin
1. Value classes as presented in the SIP: Can extend traits, need
FlatArray for unboxed array representation.
What some people here are after is this:
2. Newtypes over primitive value types. These could neither extend
traits nor reimplement an Object method (equals, toString, hashCode).
Since == maps to equals, == can also not be redefined. The latter is a
point that may or may not change in the future. Such val newtypes
could be stored in plain arrays without being boxed.
It turns out that (2) is not only compatible with (1), but it needs
(1) to work well. The only way to attach new behavior to a val newtype
is with an implicit decorator. And the only reliable way to make
implicit decorators work without overhead is (1).
So I believe that (1) can very well stand on its own. I would also
give FlatArray a shot and see how far we can get with it with
optimizations on the Scala side and escape analysis on the JVM (which
is also getting better btw). The worst that can happen is that we
conclude that we need (2) for Scala 2.11, and that then FlatArray
would be less useful (I claim it would likely still be useful). If
nobody uses it in 2.11, we can deprecate it and get rid of it later.
But I do not expect that to happen.
Cheers
- Martin
But the most dangerous usages of equals are the ones you do not see:
Java collections, for instance.
It would be very inconvenient if you stored meters and feet in a
collection and they got confused with each other.
- Martin
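A small sketch of the collection scenario, assuming value classes as in the SIP (boxed when stored in a generic collection). Meter and Feet are illustrative case classes, and java.util.HashSet stands in for any Java collection that relies on equals and hashCode.

```scala
object UnitsInCollections {
  case class Meter(value: Double) extends AnyVal
  case class Feet(value: Double) extends AnyVal

  def demo(): (Int, Int) = {
    // Stored through the generic interface, each element is boxed as its own class,
    // so equals can tell a Meter from a Feet.
    val typed = new java.util.HashSet[Any]()
    typed.add(Meter(10.0))
    typed.add(Feet(10.0))

    // If only the underlying Doubles were stored, the two quantities would collapse.
    val raw = new java.util.HashSet[Any]()
    raw.add(Meter(10.0).value)
    raw.add(Feet(10.0).value)

    (typed.size, raw.size)
  }
}
```

Here the first set keeps two elements and the second keeps one, which is the confusion between meters and feet described above.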