Hash map plans

1,289 views
Skip to first unread message

Rex Kerr

unread,
Nov 1, 2013, 6:47:18 PM11/1/13
to scala-i...@googlegroups.com
I have been working on improving HashMap for 2.11.  To make a long story short, a significant part of why Java's HashMap is faster than the Scala one is that Java does less work on equality checking.  Scala provides a very full-featured == which will let you get the same answer for boxed primitives as if you knew the types and wrote ==.  But there's runtime overhead associated with even allowing for the possibility of this once you get outside of tight loops that always do the same thing (i.e. most non-microbenchmark code).

As a solution, I propose to add three new hash maps to the mutable collections.  They will inherit from Map but _not_ from HashMap because the contract is subtly but importantly different.  They are:

(1) EqualsHashMap.  (I am open to a better name.)  This map will work just like Java's HashMap in that it will rely solely on (a equals b) on boxed types to check whether two things are the same.  If something is boxed, it will work as well or poorly as the boxes' equals works.  This will be consistently faster than the existing mutable.HashMap as long as the constraints are okay.  (For something like String, it will be okay.)  I will try allowing the user to specify an equals, but I expect that the overhead of multiple dispatch via typeclasses--here manifested as inability to inline--will eat up any performance gains.  If not, it would be great to allow eq only, equals, or == all through the same interface.

(2) LongHashMap.  This map has Long keys (smaller types are okay too).  It will be much faster (2-3x at least) than using boxed primitive whole number keys.  I will probably parameterize it so that you can specify that only some types are okay as keys; I haven't yet seen if there's a performance penalty there.

(3) DoubleHashMap.  This map has Double keys (Float is okay too).  Again, it will be 2x+ faster than boxed primitives.

This should address the performance complaints about Scala HashMap being slower than Java; EqualsHashMap should be every bit as fast (and I might just wrap Java's, though I think one of the variants I've tried is superior) and have more features, and the Long and Double versions will be faster than anything shipped with Java.

Does this sound like a reasonable plan?

  --Rex

Aleksandar Prokopec

unread,
Nov 1, 2013, 7:00:47 PM11/1/13
to scala-i...@googlegroups.com
Hi,

Sounds pretty good. Did you consider having the same specialized hashmap codebase for all three implementations, and extract the (double/long/reference) array instantiation as well as equality comparisons into a typeclass, or did you plan to have three separate implementations?

Cheers,
Alex


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Erik Osheim

unread,
Nov 1, 2013, 7:02:59 PM11/1/13
to scala-i...@googlegroups.com
On Fri, Nov 01, 2013 at 03:47:18PM -0700, Rex Kerr wrote:
> As a solution, I propose to add three new hash maps to the mutable
> collections. They will inherit from Map but _not_ from HashMap because the
> contract is subtly but importantly different. They are:

FWIW, I've actually had good results using specialization with mutable
HashMaps. Maybe it's worth trying that, or even just specializing the
interface and then doing something with class tags on instantiation? I
guess either way the need for class tags is a bit ugly, but obviously
specialization can remove some of the overhead that equals and
hashCode impose.

As an aside, I'm happy to try to collaborate on some fast
implementations if you have something you want to send me privately.

-- Erik

Rex Kerr

unread,
Nov 1, 2013, 7:14:11 PM11/1/13
to scala-i...@googlegroups.com
Hi Erik and Alex,

Specialization produces too much code.  It's also valuable to have a leaner library.

I will have to have three separate implementations since JIT compilers are not all that great at realizing that instead of de-optimizing code in an inner loop, it can maintain multiple versions of the whole hierarchy.  So I will essentially be manually miniboxing.  I agree that it's annoying, but unless a solution is on a near horizon, I don't see a better option.

(I will explore making the JavaConverters wrapper over java.util.HashMap more robust and using that instead of EqualsHashMap, since that's what the Java version does anyway.)

  --Rex


Simon Ochsenreither

unread,
Nov 2, 2013, 9:22:18 AM11/2/13
to scala-i...@googlegroups.com
Can you expand on the equality stuff a bit? What would be a case where Java equality approach to HashMap would return different results than in Scala?

LongHashMap/DoubleHashMap: This sounds a lot like admitting defeat in the face of erasure. If we decide to do that, can we do it as loudly and publicly as possible? :-)

Rex Kerr

unread,
Nov 2, 2013, 1:49:35 PM11/2/13
to scala-i...@googlegroups.com
Hi Simon,

Scala hash maps understand boxing.  Java hash maps don't.  See example below.

We already have LongMap in collection.immutable.  So it's not exactly a precedent to add a LongMap to mutable as well.  (I guess I'll drop the "hash" part for brevity.)

I will add to the scaladoc of HashMap the tradeoffs between different hash maps.

  --Rex

scala> val hm1 = new scala.collection.mutable.HashMap[Any, String]
hm1: scala.collection.mutable.HashMap[Any,String] = Map()

scala> hm1 += ((1.0, "one"))
res24: hm1.type = Map(1.0 -> one)

scala> hm1(1)
res25: String = one


scala> val hm2 = new java.util.HashMap[Any, String]
hm2: java.util.HashMap[Any,String] = {}

scala> hm2 put (1.0, "one")
res26: String = null

scala> hm2.get(1)
res27: String = null


scala> hm2.get(1.0)
res28: String = one




On Sat, Nov 2, 2013 at 6:22 AM, Simon Ochsenreither <simon.och...@gmail.com> wrote:
Can you expand on the equality stuff a bit? What would be a case where Java equality approach to HashMap would return different results than in Scala?

LongHashMap/DoubleHashMap: This sounds a lot like admitting defeat in the face of erasure. If we decide to do that, can we do it as loudly and publicly as possible? :-)

--

Paul Phillips

unread,
Nov 2, 2013, 5:28:03 PM11/2/13
to scala-i...@googlegroups.com

On Sat, Nov 2, 2013 at 10:49 AM, Rex Kerr <ich...@gmail.com> wrote:
We already have LongMap in collection.immutable.  So it's not exactly a precedent to add a LongMap to mutable as well.  (I guess I'll drop the "hash" part for brevity.)

Are you planning on factoring out common logic? Doesn't seem wise to have completely independently developed implementations in mutable.LongMap and immutable.LongMap.


Paul Phillips

unread,
Nov 2, 2013, 5:38:55 PM11/2/13
to scala-i...@googlegroups.com
On Sat, Nov 2, 2013 at 10:49 AM, Rex Kerr <ich...@gmail.com> wrote:
Scala hash maps understand boxing.  Java hash maps don't.  See example below.

It is worth noting that the java behavior depends on java's signature for get, which I guess is some kind of "oops no contravariance" adjustment.

  public V get(Object key)

Observe what happens if key is a "K" instead of Object.

scala> class GetterHashMap[K, V] extends java.util.HashMap[K, V] { def typedGet(key: K): V = this get key }
defined class GetterHashMap

scala> val hm = new GetterHashMap[Float, String]
hm: GetterHashMap[Float,String] = {}

scala> hm put (1.0f, "one")
res7: String = null

scala> hm get 1
res8: String = null

scala> hm typedGet 1
res9: String = one


Rex Kerr

unread,
Nov 2, 2013, 8:50:54 PM11/2/13
to scala-i...@googlegroups.com
No, I'm not planning on factoring out common logic because there is a negligible amount, and multiple dispatch is the bane of performance.  So I would likely save very little effort and make both slower.  I will revisit the decision once I've actually got mutable.LongMap fully implemented, but right now I don't see a win there.  If I change my mind, the common stuff will of course go into collection.LongMapLike.

  --Rex



Paul Phillips

unread,
Nov 2, 2013, 8:53:01 PM11/2/13
to scala-i...@googlegroups.com

On Sat, Nov 2, 2013 at 5:50 PM, Rex Kerr <ich...@gmail.com> wrote:
No, I'm not planning on factoring out common logic because there is a negligible amount, and multiple dispatch is the bane of performance.

I phrase it "factoring out the common logic" explicitly to suggest that inheritance is not the only mechanism of abstract. Multiple dispatch is not required for code reuse.

Paul Phillips

unread,
Nov 2, 2013, 8:53:23 PM11/2/13
to scala-i...@googlegroups.com

On Sat, Nov 2, 2013 at 5:53 PM, Paul Phillips <pa...@improving.org> wrote:
of abstract

"of abstraction", right.

Rex Kerr

unread,
Nov 2, 2013, 8:56:45 PM11/2/13
to scala-i...@googlegroups.com
If you neglect all the stuff in common that is best handled by inheritance, then you are really scraping the bottom of the barrel for common code.

  --Rex


Paul Phillips

unread,
Nov 2, 2013, 8:59:54 PM11/2/13
to scala-i...@googlegroups.com
On Sat, Nov 2, 2013 at 5:56 PM, Rex Kerr <ich...@gmail.com> wrote:
If you neglect all the stuff in common that is best handled by inheritance, then you are really scraping the bottom of the barrel for common code.

I thought your implication was that none of it is best handled by inheritance, because multiple dispatch is the bane of performance. But, anyway, I leave it to you. Codebase isn't going to notice a little more duplication.

Rex Kerr

unread,
Nov 2, 2013, 9:03:14 PM11/2/13
to scala-i...@googlegroups.com
To clarify:

There are commonalities of interface that are best handled by inheritance, but you had better not actually use that common interface in the implementation of anything where you want maximum speed and might use both the immutable and mutable classes.

There is very little code in common otherwise, so any effort to factor out what's in common would end up making both code bases less clear.

  --Rex


Martin Odersky

unread,
Nov 3, 2013, 10:02:41 AM11/3/13
to scala-internals
Agreed on doing everything we must do to get good performance for HashMap. But I generally don't like these manual specializations. I think immutable.LongMap was a mistake, adding to it will not improve things. 

One question: Did you try hoisting out the test in HashMap? I mean, instead of going through the complicated logic for each comparison, have a test like this one:

  def get(key: Object) =
     if (key.isInstanceOf[Number]) numGet(key)
     else if (key.isInstanceOf[Character]) charGet(key)
     else fastGet(key)

where fastGet would use (x equals y) insead of (x == y). 

Cheers

 - Martin





 
  --Rex

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
Martin Odersky
Chairman, Typesafe
The company for Reactive Apps on the JVM



Rex Kerr

unread,
Nov 3, 2013, 5:57:35 PM11/3/13
to scala-i...@googlegroups.com
I did try doing part of the equality test early but it didn't work all that well.

I can get more precise numbers if you'd like, but the problems are
  (1) numGet(key) is slow itself compared to simply looking up a primitive or doing fastGet, so if you use only Int you cannot be as fast as with Java (unless your algorithm is far superior otherwise, which is hard to manage)
  (2) You still do have to run through all that logic once, and then the other half of the logic once you hit a key (to figure out what the other thing is).  If you have to do N compares, you save on about (N-1)/2 of the logic you would have to do.  Unfortunately, with a well-designed hash map N ~= 1/3 anyway, which means you save on -1/3 of your logic: you do more lookup than you need to when you just hit an empty cell.  (Yes, with additional complexity you could defer the logic until you needed to test equality.  I haven't tried that.)
  (3) Once you have three or more implementations of get, you aren't able to manually inline parts of get where doing so provides a significant performance boost.

I understand the reluctance to manually specialize, but I hardly think adding a few collections that handle primitive types or simple equality for those cases where they really matter is a bad tradeoff given that
  (a) The less clean use of the Scala library is more than offset in users' code by people using all sorts of ad-hoc non-standard-library solutions
  (b) A sensible compiler-generated solution is not imminent
  (c) The hard part is creating, not maintaining, these collections, and I already know how to and partly have
  (d) Although having immutable.LongMap has been a mistake, inconsistency in the library is also not great

I'll continue to try to find some better compromise, but I've explored this reasonably thoroughly now and I doubt one exists.  I'll also post numbers when I get them.

  --Rex



Aleksandar Prokopec

unread,
Nov 4, 2013, 4:43:07 AM11/4/13
to scala-i...@googlegroups.com
Hi,

Just as a follow-up to what I said before, here's a gist with the specialized hash map with a separate typeclass for empty slots and array instantiation - I guess equality comparisons could be a part of the `Empty` typeclass too:

https://gist.github.com/axel22/7300277

I have to admit that I had not done any benchmarking on it, though, so cannot tell if it actually speeds things out.
It does avoid the need for explicit separate classes.

Cheers,
Alex
-- 
Aleksandar Prokopec,
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec

martin odersky

unread,
Nov 4, 2013, 4:58:52 AM11/4/13
to scala-internals
On Sun, Nov 3, 2013 at 11:57 PM, Rex Kerr <ich...@gmail.com> wrote:
I did try doing part of the equality test early but it didn't work all that well.

I can get more precise numbers if you'd like, but the problems are
  (1) numGet(key) is slow itself compared to simply looking up a primitive or doing fastGet, so if you use only Int you cannot be as fast as with Java (unless your algorithm is far superior otherwise, which is hard to manage)

I was mostly interested to make the non-int case fast. If we must, we can add a mutable.LongMap. But it would be nice if the standard HashMap was roughly as fast as the proposed EqualsHashMap. It's one thing to specialize on a type, but another to specialize on a different algorithm which is faster but supposedly less correct.

  (2) You still do have to run through all that logic once, and then the other half of the logic once you hit a key (to figure out what the other thing is).  If you have to do N compares, you save on about (N-1)/2 of the logic you would have to do.  Unfortunately, with a well-designed hash map N ~= 1/3 anyway, which means you save on -1/3 of your logic: you do more lookup than you need to when you just hit an empty cell.  (Yes, with additional complexity you could defer the logic until you needed to test equality.  I haven't tried that.)

No, if the selector is not a number you do not need to do the comparison again once you hit a key. None of the special cases in == applies to the case where a number is compared to something which is not a number. Also, of course, you can use hashCode instead of ##.

  (3) Once you have three or more implementations of get, you aren't able to manually inline parts of get where doing so provides a significant performance boost.

Ideally, the frontline get would be under 35 bytes, so would be inlined itself. 

I understand the reluctance to manually specialize, but I hardly think adding a few collections that handle primitive types or simple equality for those cases where they really matter is a bad tradeoff given that
  (a) The less clean use of the Scala library is more than offset in users' code by people using all sorts of ad-hoc non-standard-library solutions
  (b) A sensible compiler-generated solution is not imminent
  (c) The hard part is creating, not maintaining, these collections, and I already know how to and partly have
  (d) Although having immutable.LongMap has been a mistake, inconsistency in the library is also not great

I am ready to concede a mutable.LongMap :-). DoubleMap: Not sure. Given floating point imprecision I believe there will not be many use cases. EqualsMap: It would be good if we could avoid it. Let's do a best effort to speed up HashMap to see whether we can.

Cheers

 - Martin



--
Prof. Martin Odersky
LAMP/IC, EPFL
Station 14, 1015 Lausanne, Switzerland
Tel: +41 21 693 6863

√iktor Ҡlang

unread,
Nov 4, 2013, 5:20:02 AM11/4/13
to scala-i...@googlegroups.com
In my fantasy-land there'd be a typeclass to HashMap which provides the hashCode impl and equality-check.
Cheers,

Viktor Klang

Director of Engineering

Twitter: @viktorklang

Matthew Pocock

unread,
Nov 4, 2013, 5:29:02 AM11/4/13
to scala-i...@googlegroups.com

In mine, scalac would online the typeclass instance.

M

Sent from my android - may contain predictive text entertainment.

√iktor Ҡlang

unread,
Nov 4, 2013, 5:35:51 AM11/4/13
to scala-i...@googlegroups.com
Sounds mazing. It would put it on teh Internetz?

Vlad Ureche

unread,
Nov 4, 2013, 7:58:49 AM11/4/13
to scala-internals, Rex Kerr

2013/11/2 Rex Kerr <ich...@gmail.com>

Hi Erik and Alex,

Specialization produces too much code.  It's also valuable to have a leaner library.

I will have to have three separate implementations since JIT compilers are not all that great at realizing that instead of de-optimizing code in an inner loop, it can maintain multiple versions of the whole hierarchy.  So I will essentially be manually miniboxing.  I agree that it's annoying, but unless a solution is on a near horizon, I don't see a better option.

(I will explore making the JavaConverters wrapper over java.util.HashMap more robust and using that instead of EqualsHashMap, since that's what the Java version does anyway.)


I'm a little late to the party, but I'm glad to hear you are trying to use miniboxing to improve the HashMap performance. If I can help with anything just ping me.

Also, I would encourage you to try the miniboxing plugin and benchmark the code it produces. I'm currently working on implementing your suggestion of splitting Long and Double and I have a new java runtime that is always inlined by the JVM, thus yielding very reliable speedups.

Cheers,
Vlad

Nils Kilden-Pedersen

unread,
Nov 4, 2013, 12:57:28 PM11/4/13
to scala-i...@googlegroups.com

On Fri, Nov 1, 2013 at 5:47 PM, Rex Kerr <ich...@gmail.com> wrote:

I have been working on improving HashMap for 2.11.  To make a long story short, a significant part of why Java's HashMap is faster than the Scala one is that Java does less work on equality checking.  Scala provides a very full-featured == which will let you get the same answer for boxed primitives as if you knew the types and wrote ==.  But there's runtime overhead associated with even allowing for the possibility of this once you get outside of tight loops that always do the same thing (i.e. most non-microbenchmark code).

As a solution, I propose to add three new hash maps to the mutable collections.  They will inherit from Map but _not_ from HashMap because the contract is subtly but importantly different.  They are:

(1) EqualsHashMap.  (I am open to a better name.)  This map will work just like Java's HashMap in that it will rely solely on (a equals b) on boxed types to check whether two things are the same.  If something is boxed, it will work as well or poorly as the boxes' equals works.  This will be consistently faster than the existing mutable.HashMap as long as the constraints are okay.  (For something like String, it will be okay.)  I will try allowing the user to specify an equals, but I expect that the overhead of multiple dispatch via typeclasses--here manifested as inability to inline--will eat up any performance gains.  If not, it would be great to allow eq only, equals, or == all through the same interface.

(2) LongHashMap.  This map has Long keys (smaller types are okay too).  It will be much faster (2-3x at least) than using boxed primitive whole number keys.  I will probably parameterize it so that you can specify that only some types are okay as keys; I haven't yet seen if there's a performance penalty there.

(3) DoubleHashMap.  This map has Double keys (Float is okay too).  Again, it will be 2x+ faster than boxed primitives.

How would NaN behave? Like the boxed version, or lost forever?


This should address the performance complaints about Scala HashMap being slower than Java; EqualsHashMap should be every bit as fast (and I might just wrap Java's, though I think one of the variants I've tried is superior) and have more features, and the Long and Double versions will be faster than anything shipped with Java.

Does this sound like a reasonable plan?

  --Rex

Aleksandar Prokopec

unread,
Nov 4, 2013, 1:46:02 PM11/4/13
to scala-i...@googlegroups.com
Btw, Rex, are you planning to implement a bucketed hash map with closed addressing, or an open addressing one with linear collision resolution (like current mutable.HashSets)?
If the choice is the latter, will it be possible to choose a special value that denotes an empty bucket (e.g. 0, -1, Long.MinValue, depending on the client's needs)?

Paul Phillips

unread,
Nov 4, 2013, 3:24:44 PM11/4/13
to scala-i...@googlegroups.com
On Mon, Nov 4, 2013 at 10:46 AM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:
If the choice is the latter, will it be possible to choose a special value that denotes an empty bucket (e.g. 0, -1, Long.MinValue, depending on the client's needs)?

That screams value class, or would if value classes were better at being values than classes. Still, "ClientKey.Empty" is quite an improvement over "0L" if one looks past performance to some of the lesser considerations. Will one have to use undecorated Longs as keys to see the gains?

Rex Kerr

unread,
Nov 4, 2013, 4:27:35 PM11/4/13
to scala-i...@googlegroups.com
On Mon, Nov 4, 2013 at 1:58 AM, martin odersky <martin....@epfl.ch> wrote:

On Sun, Nov 3, 2013 at 11:57 PM, Rex Kerr <ich...@gmail.com> wrote:
I did try doing part of the equality test early but it didn't work all that well.

I can get more precise numbers if you'd like, but the problems are
  (1) numGet(key) is slow itself compared to simply looking up a primitive or doing fastGet, so if you use only Int you cannot be as fast as with Java (unless your algorithm is far superior otherwise, which is hard to manage)

I was mostly interested to make the non-int case fast. If we must, we can add a mutable.LongMap. But it would be nice if the standard HashMap was roughly as fast as the proposed EqualsHashMap. It's one thing to specialize on a type, but another to specialize on a different algorithm which is faster but supposedly less correct.

Ah.  I was worrying more about the Int case because this is where the biggest difference is from Java.
 

  (2) You still do have to run through all that logic once, and then the other half of the logic once you hit a key (to figure out what the other thing is).  If you have to do N compares, you save on about (N-1)/2 of the logic you would have to do.  Unfortunately, with a well-designed hash map N ~= 1/3 anyway, which means you save on -1/3 of your logic: you do more lookup than you need to when you just hit an empty cell.  (Yes, with additional complexity you could defer the logic until you needed to test equality.  I haven't tried that.)

No, if the selector is not a number you do not need to do the comparison again once you hit a key. None of the special cases in == applies to the case where a number is compared to something which is not a number. Also, of course, you can use hashCode instead of ##.

Indeed.  That's worth trying, given LongMap as an alternative.
 
I understand the reluctance to manually specialize, but I hardly think adding a few collections that handle primitive types or simple equality for those cases where they really matter is a bad tradeoff given that
  (a) The less clean use of the Scala library is more than offset in users' code by people using all sorts of ad-hoc non-standard-library solutions
  (b) A sensible compiler-generated solution is not imminent
  (c) The hard part is creating, not maintaining, these collections, and I already know how to and partly have
  (d) Although having immutable.LongMap has been a mistake, inconsistency in the library is also not great

I am ready to concede a mutable.LongMap :-). DoubleMap: Not sure. Given floating point imprecision I believe there will not be many use cases. EqualsMap: It would be good if we could avoid it. Let's do a best effort to speed up HashMap to see whether we can.

Well, the other problem with HashMap is that the interface is inherently difficult to speed up.  It relies enormously on the JIT compiler figuring out how to remove extra object creations and so on.  Sometimes it can, but the extra step of switching back and forth between HashEntry and Tuple2 is not really a good way to make the JVM's life easy.  So there is some advantage to another map that does not have the same baggage.

But I'll try a three-pronged strategy:
  (1) Manually inline the equals logic in the non-number case and give that a good workout
  (2) Add a LongMap to cover the number case
  (3) See if an alternate implementation with has enough of a performance boost to be worth offering

I don't think it's practical to remove the old HashTable/HashMap given how much it's used (not without having a superior alternative around for a while).

--Rex

Rex Kerr

unread,
Nov 4, 2013, 4:29:24 PM11/4/13
to Vlad Ureche, scala-internals
Sounds very interesting!  I'm going to be hard-pressed to get the HashMap stuff ready for M7, though, so I will probably not try this until after.  And I think with only handling LongMap, what I write won't really be much like manual miniboxing except inasmuch as the smaller primitive number types widen into Long automatically.

  --Rex

Rex Kerr

unread,
Nov 4, 2013, 4:31:05 PM11/4/13
to scala-i...@googlegroups.com
It probably won't behave like anything since Martin's less enthusiastic about DoubleMap and I'm short on time.  I handled it as a special case in my highly incomplete test code.
  --Rex

Rex Kerr

unread,
Nov 4, 2013, 4:36:52 PM11/4/13
to scala-i...@googlegroups.com
Open addressing with nonlinear collision resolution is the best-performing for Long keys, I think.  I'm using an exponential search stride right now, but I havent' exhaustively tested different stride types.  (The problem with linear is that if there are twenty entries in bucket N, all of the buckets N+1 through N+20 are guaranteed a slow search because they all travel the same path.  Most every nonlinear strategy will not suffer from this problem.)

The empty bucket is zero, and there is a special slot for the contents of zero.  So there are no forbidden keys.

I'll try to get around to testing your specialized versions for speed.

  --Rex


On Mon, Nov 4, 2013 at 10:46 AM, Aleksandar Prokopec <aleksanda...@gmail.com> wrote:

Erik Osheim

unread,
Nov 4, 2013, 5:21:18 PM11/4/13
to scala-i...@googlegroups.com
On Mon, Nov 04, 2013 at 01:36:52PM -0800, Rex Kerr wrote:
> Open addressing with nonlinear collision resolution is the best-performing
> for Long keys, I think. I'm using an exponential search stride right now,
> but I havent' exhaustively tested different stride types. (The problem
> with linear is that if there are twenty entries in bucket N, all of the
> buckets N+1 through N+20 are guaranteed a slow search because they all
> travel the same path. Most every nonlinear strategy will not suffer from
> this problem.)
>
> The empty bucket is zero, and there is a special slot for the contents of
> zero. So there are no forbidden keys.
>
> I'll try to get around to testing your specialized versions for speed.

If you're using a special slot for zero, do you have another sentinel
value with a special slot as well? If not, how are you tracking
deletions?

Is there any chance of seeing the implementation you're working on
before it gets officially committed? I'd love to be able to try to
help speed it up while it's in the design phase as opposed to the
binary compatibility phase.

-- Erik

P.S. I agree that open addressing makes a lot of sense here. In my
experience open addressing has performed better than closed in most
cases, as long as the rehashing strategy is reasonable.

Rex Kerr

unread,
Nov 4, 2013, 5:44:31 PM11/4/13
to scala-i...@googlegroups.com
Hi Erik,

Sure, I'll send you my implementation when it's not bits and pieces all over the place.

There are special slots for 0 and Long.MinValue because these you can pull out with a single condition x == -x (so as to slow down the normal cases as little as possible).  I haven't actually checked that this is faster than something like x == 0 || x == 1 (or x==0 | x==1).

  --Rex


Nils Kilden-Pedersen

unread,
Nov 4, 2013, 7:03:11 PM11/4/13
to scala-i...@googlegroups.com
On Mon, Nov 4, 2013 at 3:31 PM, Rex Kerr <ich...@gmail.com> wrote:
It probably won't behave like anything since Martin's less enthusiastic about DoubleMap and I'm short on time.  I handled it as a special case in my highly incomplete test code.

Since Double bits can be translated to Long bits (and back), couldn't you wrap the LongMap? Or are there corner cases (or too much overhead)?

Rex Kerr

unread,
Nov 4, 2013, 8:03:21 PM11/4/13
to scala-i...@googlegroups.com
I think I'll let people convert their Doubles bitwise into Longs on their own and just punt on the NaN etc. issues.  Wrappings are hard to maintain.  (Usually actually less hard than implementations with duplicated code, but they can be sociologically harder since when you fiddle with LongMap you probably won't feel like going through DoubleMap to see if any of the assumptions made are now violated.)

  --Rex

Rex Kerr

unread,
Nov 8, 2013, 10:51:50 PM11/8/13
to scala-i...@googlegroups.com
For what it's worth, the existing HashMap cannot be modified in a straightforward way to use the manual dispatch for both ## and BoxesRunTime.equals2; even though without optimization it gives a nice though not overwhelming performance boost (~15% on contains), with optimization there is some horrible failure of inlining that results in 2x slower code.

So I'm exploring whether I can provide a fully general alternative (flat) hash map in addition to the two that we have already got (yay, code duplication?!), or whether it's going to be LongMap alone, or whether I think there's a case for a RefMap that is not just a wrapper around Java.

That sounds like a lot, but I am actually almost done here.

  --Rex



On Mon, Nov 4, 2013 at 1:58 AM, martin odersky <martin....@epfl.ch> wrote:

Rex Kerr

unread,
Nov 10, 2013, 10:35:02 PM11/10/13
to scala-i...@googlegroups.com
It's very late in the release cycle to be making any additional plans, but the case for a replacement HashMap is a complicated one.

I've condensed a bunch of testing over a range of map sizes (1 to 300,000) and key data types (sequential integers, non-sequential integers, strings where reference equality holds between creation and testing, strings where reference equality fails, nested Tuple2s, and a mix of all of the above) into a series of graphs.

There are five basic tests: get (without modification), contains (without modification), iteration (accumulating a sum), interleaved deletion/re-addition of everything in the map, and creation of a map from scratch (which involves lots of putting).  These are plotted in different shapes and colors in the attached graphs.

The first thing to notice (in Scala_vs_Java) is that for a wide variety of objects Scala's HashMap is a lot slower than Java's.  Not a surprise, but you can see that get and contains, which are presumably the most heavily-used methods, are frequently 1.5x - 3x slower across practically all conditions.

The revised map I've created improves matters substantially for small maps, but starts lagging badly for large ones (especially for, it turns out, sequential integer keys).  This is illustrated in Any_vs_Java.

If we restrict the keys to Long using a similar strategy, suddenly we can beat Java handily (Long_vs_Java).  I've already submitted a pull request with LongMap to cover that case.

If we restrict the keys to AnyRef, which allows us to use .equals and .hashCode instead of BoxesRunTime.equals2 and .##, then the performance tracks Java quite well, except--not shown--the new data structure is more amenable to speeding up particular collections operations.  For example, it's possible to give builders access to special put methods that can bypass the need to check for removed keys.  There's also a performance penalty for wrapping Java's HashMap (also not shown), so overall it's usually a win.  AnyRef_vs_Java shows how they compare.

The question then is whether to supply a hash map that can be used with any key, or only with AnyRef keys.  The detailed information one needs to make that decision is n Any_vs_AnyRef: you can observe an inconsistent improvement with the latter on types that both support, but overall you generally get a 20-40% improvement.  (I've not been able to use early detection of key type to improve the performance.)

I prefer using an AnyRef-only map since anything that isn't an AnyRef can either be stored unboxed (value types) or converted to a Long (and thus go into LongMap).  This would lead naturally to selection of the type of map that will give best performance.  I think the performance gains are large enough to justify doing something.  It's close to a drop-in replacement, but the new implementation isn't fully source-compatible with collection.mutable.HashMap so I can't just swap the old one out.

I'm not sure there's much of a use case to have distinct Any and AnyRef capability, so we should pick one.  I know Martin favored the generic capability, but for the reasons described above I prefer the Long/AnyRef specialization without an Any-key implementation.

Graphs follow.  In all cases, the thick black line is "same speed" (1x).  If a point is to the right and below, it means that the collection in question is slower than one would hope.  To the left and above, faster.  Most comparisons are to Java, since that's what people blog about.

I plan to try submitting pull requests for both the AnyRef and the Any maps in the expectation that at least one will be rejected, but I'm not sure I'll have time to finish all the niceties with both (builders, docs, etc.).  Any hints for prioritization would help.

  --Rex


Scala_vs_Java.png
Any_vs_Java.png
Long_vs_Java.png
AnyRef_vs_Java.png
Any_vs_AnyRef.png

martin odersky

unread,
Nov 11, 2013, 2:32:26 AM11/11/13
to scala-internals
On Mon, Nov 11, 2013 at 4:35 AM, Rex Kerr <ich...@gmail.com> wrote:
It's very late in the release cycle to be making any additional plans, but the case for a replacement HashMap is a complicated one.

I've condensed a bunch of testing over a range of map sizes (1 to 300,000) and key data types (sequential integers, non-sequential integers, strings where reference equality holds between creation and testing, strings where reference equality fails, nested Tuple2s, and a mix of all of the above) into a series of graphs.

There are five basic tests: get (without modification), contains (without modification), iteration (accumulating a sum), interleaved deletion/re-addition of everything in the map, and creation of a map from scratch (which involves lots of putting).  These are plotted in different shapes and colors in the attached graphs.

The first thing to notice (in Scala_vs_Java) is that for a wide variety of objects Scala's HashMap is a lot slower than Java's.  Not a surprise, but you can see that get and contains, which are presumably the most heavily-used methods, are frequently 1.5x - 3x slower across practically all conditions.

The revised map I've created improves matters substantially for small maps, but starts lagging badly for large ones (especially for, it turns out, sequential integer keys).  This is illustrated in Any_vs_Java.

If we restrict the keys to Long using a similar strategy, suddenly we can beat Java handily (Long_vs_Java).  I've already submitted a pull request with LongMap to cover that case.

If we restrict the keys to AnyRef, which allows us to use .equals and .hashCode instead of BoxesRunTime.equals2 and .##, then the performance tracks Java quite well, except--not shown--the new data structure is more amenable to speeding up particular collections operations.  For example, it's possible to give builders access to special put methods that can bypass the need to check for removed keys.  There's also a performance penalty for wrapping Java's HashMap (also not shown), so overall it's usually a win.  AnyRef_vs_Java shows how they compare.

The question then is whether to supply a hash map that can be used with any key, or only with AnyRef keys.  The detailed information one needs to make that decision is n Any_vs_AnyRef: you can observe an inconsistent improvement with the latter on types that both support, but overall you generally get a 20-40% improvement.  (I've not been able to use early detection of key type to improve the performance.)

I think we definitely need a HashMap for Any keys. Any is the standard usage case. AnyRef is already a specialization. It's a pity that early key type detection did not give a win. I am somewhat puzzled why not. There should be a clear reduction of operations, right? I mean, in a normal contains case you need to do at least one ## and one ==, and, if there are collisions, potentially more than one ==. If these operations are expensive, then cutting them by half or more should show a win. Do we have an explanation why not?
 
I prefer using an AnyRef-only map since anything that isn't an AnyRef can either be stored unboxed (value types) or converted to a Long (and thus go into LongMap).  This would lead naturally to selection of the type of map that will give best performance.  I think the performance gains are large enough to justify doing something.  It's close to a drop-in replacement, but the new implementation isn't fully source-compatible with collection.mutable.HashMap so I can't just swap the old one out.

I'm not sure there's much of a use case to have distinct Any and AnyRef capability, so we should pick one.  I know Martin favored the generic capability, but for the reasons described above I prefer the Long/AnyRef specialization without an Any-key implementation.

So what would the implementation of a Map[Any, Any] be then?

Cheers

 - Martin
 
Graphs follow.  In all cases, the thick black line is "same speed" (1x).  If a point is to the right and below, it means that the collection in question is slower than one would hope.  To the left and above, faster.  Most comparisons are to Java, since that's what people blog about.

I plan to try submitting pull requests for both the AnyRef and the Any maps in the expectation that at least one will be rejected, but I'm not sure I'll have time to finish all the niceties with both (builders, docs, etc.).  Any hints for prioritization would help.

  --Rex


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Rex Kerr

unread,
Nov 11, 2013, 3:58:52 AM11/11/13
to scala-i...@googlegroups.com
To clarify: the existing HashMap would not change.  The reason is that it's heavily used and extended (mostly with the admittedly problematic SynchronizedMap and the not-entirely-elegant MultiMap, though there are a number of other cases), and the hooks we give people to extend it (via HashTable) prevent any substantial changes to the algorithm.  We're basically limited to reshuffling the bits of the hash code.  Although I had hoped to provide a drop-in replacement, I don't think now is the time for that.  (If we could break things more extensively maybe the open strategy with Any is the way to go.)

Given that we can't really get away from HashMap, Map[Any, Any] would just be a HashMap as always.

If HashMap can't change, the question is what to do about the speed difference with Java.

We could
  (1) Wrap Java's HashMap, but silently lose "proper" behavior of key comparison since Java only uses .hashCode and equals.
  (2) Provide a second HashMap (like OpenHashMap tries to do) that works with all types.
  (3) Only provide specialized maps that cover the most common cases.  LongMap does this with Long, and most everything else is an AnyRef.  I could specialize even further to StringMap, since most maps reportedly have string keys.

I prefer (3) since (1) alone is broken, and (2) seems redundant--we have too many hash maps around already.  (3) will tend to lead people who need performance to be likely to make the right choices.

Finally, I think the reason key type detection doesn't work well is that it is very difficult to keep the bytecode down and the JIT compiler is already being taxed to its limits to make hash maps work fast.  In particular,

  def put(k: K, v: V): Option[V] = {
    if (k.isInstanceOf[java.lang.Number] || k.isInstanceOf[java.lang.Character]) putA(k, v)
    else putB(k, v)
  }

is below the magic 35 byte limit but separate puts for Number and Character are not (unless you -XX:MaxInlineSize=38, I suppose...maybe I should have tried?).  And it's probably running afoul of InlineSmallCode and FreqInlineSize limits also.

Anyway, I'm not certain; I didn't check the machine code.  I agree that it is formally less work.  But as a practical matter it seems to like to have the work structured as it is now, with the checks local to the usage.

Note that _sometimes_ the early type detection works.  Sometimes there's no penalty for late type detection either!  But I can't get consistent improvement (and often find degradations) in code that otherwise is consistently faster when I just switch .## to .hashCode and == to equals.  With infinite time I could probably figure out exactly why and maybe how to fix it (under the Oracle JVM only), but I don't have infinite time.

  --Rex
    



√iktor Ҡlang

unread,
Nov 11, 2013, 4:12:33 AM11/11/13
to scala-i...@googlegroups.com
What about:

def put(k: K, v: V): Option[V] = if (k.isInstanceOf[java.lang.Number]) putNum(k, v) else putOther(k, v)
private[this] final def putOther(k: K, v: V): Option[V] = if (k.isInstanceOf[java.lang.Character]) putChar(k, v) else putAny(k, v)

Rex Kerr

unread,
Nov 11, 2013, 2:43:43 PM11/11/13
to scala-i...@googlegroups.com
Split up two ways instead of three, which should be better for the Any branch, gives sometimes equal performance to just bare .equals and .hashCode, and sometimes about 20% worse (on contains).

  --Rex

√iktor Ҡlang

unread,
Nov 11, 2013, 2:47:42 PM11/11/13
to scala-i...@googlegroups.com
I'm not sure I understand how you mean, elaborate?

Rex Kerr

unread,
Nov 11, 2013, 4:00:07 PM11/11/13
to scala-i...@googlegroups.com
The code is the same as I posted before, except with contains:

  def contains(k: K): Boolean = {
    if (k.isInstanceOf[java.lang.Number] || k.isInstanceOf[java.lang.Character])
      containsNC(k)
    else containsAny(k)
  }

which is below the 35 byte limit and still sometimes gives a ~20% penalty.  Not always, mind you--just occasionally.  And when the only thing that is changing is the size of the collection.

  --Rex

√iktor Ҡlang

unread,
Nov 11, 2013, 4:12:53 PM11/11/13
to scala-i...@googlegroups.com

Yes, but what is the impl of containsNC?

Rex Kerr

unread,
Nov 11, 2013, 4:23:02 PM11/11/13
to scala-i...@googlegroups.com
containsNC's implementation has been either of

  = ???

and

  (the full logic for looking up keys with ==)

and it doesn't matter which implementation is used (as it shouldn't) when the keys are String.

I was only worrying about the impact on containsAny.  So to even dispatch to containsAny in this way you _sometimes_ get a 20% penalty.  If you have a primitive, though, you should use LongMap instead.  As shown in my email with the graphs, that one really is faster.

  --Rex

√iktor Ҡlang

unread,
Nov 11, 2013, 5:18:02 PM11/11/13
to scala-i...@googlegroups.com
So how did my suggestion perform in comparison?

Rex Kerr

unread,
Nov 11, 2013, 7:09:49 PM11/11/13
to scala-i...@googlegroups.com
Even worse.  There is no improvement in your scheme over just using == in the inner loop.  So it's about a 20% penalty when you get it, but you get it under more conditions (about the same as with ==), at least in my incomplete testing.

  --Rex

√iktor Ҡlang

unread,
Nov 11, 2013, 7:21:46 PM11/11/13
to scala-i...@googlegroups.com
So the cost is not in the branching or the inlining?

Rex Kerr

unread,
Nov 11, 2013, 7:55:49 PM11/11/13
to scala-i...@googlegroups.com
I haven't enough time to track it down in gory detail.  I suspect that the cost is in the branching, and that inlining can be used by the JIT to understand more context and therefore skip branches.  But "helping" the JIT reduce branches manually can induce different choices for inlining.  So it's a wash.

But that's just speculation.  I'd have to check the compiled code to know, and I don't have the tooling set up to make that easy, and I don't have the time to either generate the tooling or wade through the morass of compilation output that you get when you turn on assembly output.

  --Rex

Rex Kerr

unread,
Nov 11, 2013, 11:56:33 PM11/11/13
to scala-i...@googlegroups.com
Here's an example, for the curious, of why tuning for performance on the JVM is so tricky when one gets to this level of detail and intricacy.  Consider two pieces of code:

  def contains(key: K): Boolean = seekEntry(hashOf(key), key) >= 0

and

  def getOrNull(key: K): V = {
    val i = seekEntry(hashOf(key), key)
    (if (i < 0) null else myValues(i)).asInstanceOf[V]
  }

Since the second method requires all the work of the first plus an extra lookup and cast on success, obviously the first will be faster.

Benchmark (in 629.1 ms):  containsOC100
    AnyMap     18.29 ns   95% CI 17.90 ns - 18.68 ns

Benchmark (in 577.9 ms):  getOrNullOC100
    AnyMap     13.17 ns   95% CI 12.86 ns - 13.48 ns

Um, thanks JIT compiler!  Do more work and take less time!  If only this extended indefinitely....

(This only happens with the non-AnyRef specialized version.)

So, shall I commit the Any-containing version or not?

  --Rex

Fernando Racca

unread,
Nov 12, 2013, 4:30:23 AM11/12/13
to scala-i...@googlegroups.com


On Monday, 11 November 2013 08:58:52 UTC, Rex Kerr wrote:

If HashMap can't change, the question is what to do about the speed difference with Java.

We could
  (1) Wrap Java's HashMap, but silently lose "proper" behavior of key comparison since Java only uses .hashCode and equals.
  (2) Provide a second HashMap (like OpenHashMap tries to do) that works with all types.
  (3) Only provide specialized maps that cover the most common cases.  LongMap does this with Long, and most everything else is an AnyRef.  I could specialize even further to StringMap, since most maps reportedly have string keys.

I prefer (3) since (1) alone is broken, and (2) seems redundant--we have too many hash maps around already.  (3) will tend to lead people who need performance to be likely to make the right choices.

If ever comes the chance of having to choose for even just one optimized-for-speed map, then i would certainly favour having a StringMap (or StringKeyMap perhaps?).

My 2 cents,
Fernando Racca 

Vlad Ureche

unread,
Nov 12, 2013, 8:22:09 AM11/12/13
to scala-internals, Rex Kerr
Rex, did you check how the maps react to multiple type profiles and paths taken through the code?
I found this affects both miniboxed and generic code to a big extent:

  if (multi_context) {
    ignore_result(bechmark_create_INT())
    ignore
_result(bechmark_create_DOUBLE())
    ignore
_result(bechmark_create_SHORT())
  }
  report(bechmark_create_LONG()) // depending on whether the calls
                                 //
above were executed, the timing
                                 // can change, because different
                                 // inlining decision are taken

Just setting multi_context to true gives me a 110x slowdown when benchmarking a 3M element generic linked list creation (see page 15 of the miniboxing paper):

(Generic) List creation
Single Context (multi_context=false):   16.7 ms
Multi Context
(multi_context=true)  : 1841.0 ms

Speaking of obtaining the numbers, is your project public so I can try benchmarking it?

Cheers,


2013/11/11 Rex Kerr <ich...@gmail.com>

Rex Kerr

unread,
Nov 12, 2013, 3:50:39 PM11/12/13
to Vlad Ureche, scala-internals
I didn't specifically check.  One test uses a mix of different types, though, and it produces similar results to the others.

(I am aware that multiple dispatch can cause dramatic changes to later optimization of code.  I've never seen an effect quite as huge as the one you saw; the biggest I found was something like 12x IIRC.)

The project isn't public because I was scrambling to get everything working and it's kind of a mess.  I suppose I can put it somewhere if you don't mind the random combination of bugs, poor design practices, and multiple ways of doing the same thing interleaved in a way that ends up working but doesn't look like it should.

  --Rex

martin odersky

unread,
Nov 12, 2013, 3:57:21 PM11/12/13
to scala-internals
On Mon, Nov 11, 2013 at 8:43 PM, Rex Kerr <ich...@gmail.com> wrote:
Split up two ways instead of three, which should be better for the Any branch, gives sometimes equal performance to just bare .equals and .hashCode, and sometimes about 20% worse (on contains).

If I understood you correctly, that's looking promising, right?

 - Martin

Rex Kerr

unread,
Nov 12, 2013, 5:15:24 PM11/12/13
to scala-i...@googlegroups.com
Well, I'm not really sure whether that is "promising" or "a lot of code duplication for dubious gain".

It *is* a lot of code duplication.  A number of my attempts to push that through all the methods have resulted in even slower code than just relying on ==.

I guess I can run a full set of head-to-head benchmarks for == vs. two-way split vs. AnyRef specialized.  The spot testing wasn't very encouraging.

  --Rex

Jason Zaugg

unread,
Nov 12, 2013, 5:15:10 PM11/12/13
to scala-i...@googlegroups.com
On Tue, Nov 12, 2013 at 9:50 PM, Rex Kerr <ich...@gmail.com> wrote:
I didn't specifically check.  One test uses a mix of different types, though, and it produces similar results to the others.

(I am aware that multiple dispatch can cause dramatic changes to later optimization of code.  I've never seen an effect quite as huge as the one you saw; the biggest I found was something like 12x IIRC.)

The project isn't public because I was scrambling to get everything working and it's kind of a mess.  I suppose I can put it somewhere if you don't mind the random combination of bugs, poor design practices, and multiple ways of doing the same thing interleaved in a way that ends up working but doesn't look like it should.

I've been working on compiler tuning for the last two weeks. In aggregate, I've got close to 30% bump on this branch for hot, repeated, resident compilation of scalap. Throw in GenBCode and -Ydelambdafy:method and we hit 35% speedup.

Indeed, some of the wins have come from avoiding s.c.m.HashMap, and also `BoxesRuntime.{_hash, equals2}`. I can benchmark competing AnyRef map implementations in the compiler to help inform our decision making, please let me know when you're ready for that.

Until now, I was just using YourKit, but today I bought a copy of Java Performance and tried once again to spin up a Centos VM to run the benchmark through the oh-so-catchily named "Oracle Solaris Studio Performance Analyzer". Previous attempts led only to frustration and pain, but this time I *might* have something running.

There is a 3x slowdown in the VM compared to the host, but soldiering on, this result seems interesting enough to share. (Forgive the horrific fonts / screenshot, if I knew how to get this as text I would have.) What we're looking at here are the top callers to the most heavily used method in the system, "<vtable stub>"

Inline image 1

There are a few other vtable stubs, with a similar array of hot, megamorphic call sites leading into them. Notice that HashTable features heavily. We can see that "index" is in there, as "improves" is polymorphic.

It occurs to me that our encoding of traits must give rise to "accidental" megamorphism: each mixin creates a different forwarder, which all happen to contain the same logic to forward to one implementation class.

We can also see callers to another top player in the trace "slow_subtype_check Runtime1 stub" 

Inline image 2

I need to run the benchmark for substantially longer to reduce the noise, and learn what else the tool can tell us. Will keep digging into this tomorrow.

I'd appreciate any pointers from folks more familiar with this low-level profiling.

-jason
image.png
image.png

√iktor Ҡlang

unread,
Nov 12, 2013, 5:17:57 PM11/12/13
to scala-i...@googlegroups.com
Jason: Can the forwarders be eliminated via pruning or similar?


--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
image.png
image.png

Jason Zaugg

unread,
Nov 12, 2013, 5:24:48 PM11/12/13
to scala-i...@googlegroups.com
On Tue, Nov 12, 2013 at 11:17 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:
Jason: Can the forwarders be eliminated via pruning or similar?

I'm not sure which technique you're referring to, nor whether you mean in scalac or in HotSpot.

To elaborate a little on what I was talking about, see the code below. We end up with two concrete implementations of `foo`. Abstract classes are shackling, but lead to bytecode patterns that seem more amenable to HotSpot.

-jason

trait T {
  def foo = 0
}
class C extends T
class D extends T
topic/opt5 /code/scala3 scalac -Xprint:mixin sandbox/test.scala
[[syntax trees at end of                     mixin]] // test.scala
package <empty> {
  abstract trait T extends Object {
    def foo(): Int
  };
  class C extends Object with T {
    def foo(): Int = T$class.foo(C.this);
    def <init>(): C = {
      C.super.<init>();
      T$class./*T$class*/$init$(C.this);
      ()
    }
  };
  class D extends Object with T {
    def foo(): Int = T$class.foo(D.this);
    def <init>(): D = {
      D.super.<init>();
      T$class./*T$class*/$init$(D.this);
      ()
    }
  };
  abstract trait T$class extends  {
    def foo($this: T): Int = 0;
    def /*T$class*/$init$($this: T): Unit = {
      ()
    }
  }
}

√iktor Ҡlang

unread,
Nov 12, 2013, 5:42:13 PM11/12/13
to scala-i...@googlegroups.com
On Tue, Nov 12, 2013 at 11:24 PM, Jason Zaugg <jza...@gmail.com> wrote:
On Tue, Nov 12, 2013 at 11:17 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:
Jason: Can the forwarders be eliminated via pruning or similar?

I'm not sure which technique you're referring to, nor whether you mean in scalac or in HotSpot.

I thought you meant that you got chains of forwarders.
 

--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Grzegorz Kossakowski

unread,
Nov 13, 2013, 5:43:50 AM11/13/13
to scala-internals
On 12 November 2013 23:15, Jason Zaugg <jza...@gmail.com> wrote:

I'd appreciate any pointers from folks more familiar with this low-level profiling.

If you went that far then you can also give Intel's VTune a spin. I've heard that it has smaller overhead and higher precision than most profilers due to the fact that it reads data from special hardware counters that are available in Intel's i7.

I never got it to work because it requires Linux/Windows installation without VM being involved (so it can access the hardware directly). I guess this is a good idea to try if you drop VM and use Parallels on your Mac.

--
Grzegorz Kossakowski
Scalac hacker at Typesafe
twitter: @gkossakowski

Rex Kerr

unread,
Nov 13, 2013, 1:46:37 PM11/13/13
to scala-i...@googlegroups.com
Okay, here's the documentation and more comprehensive analysis of what is going on with early key detection.

In comparison to AnyRef specialized maps, when keys are of type AnyRef,
  * Sometimes Any maps using == are the same speed and sometimes they're about 50% slower.
  * Of those which are 50% slower
    - A few can be rescued a to "normal" by key specialization
    - Many more are worsened to 50% slower by key specialization
  * There are also a few operations that are about 20% slower
    - A few can be rescued to "normal" by key specialization
    - One is worsened from normal to 20% slower

Overall, this is not a win.  And you'd never want to use this for Long keys because Long is way faster.

So LongMap is clearly a win.

AnyRefMap is more of a win than AnyMap.

The options are (with consequences):

(1) Have AnyMap but not AnyRefMap.  Make people live with 50% slowdowns under some conditions in exchange for library simplicity.
(2) Have AnyRef map but not AnyMap.  Make people specialize on keys but have automatic best-class performance.
(3) Have both AnyMap and AnyRefMap.  Suffer increased library complexity but explain the tradeoffs in the Scaladoc.

I have been assuming (2) is the way to go, so haven't added AnyMap to the pull request for these things.  But it's all tested and ready to go.  If we want to switch to or add AnyMap, I can do so in short order.

  --Rex

P.S. I didn't try key specialization for update, given that get was worse than contains.

P.P.S. getOrNull--which is perhaps not ideally named but returns null.asInstanceOf[V] when a key is not found--is 20% faster on a 50-50 mix of found and missed keys than is apply or getOrElse.  That's why I included it in the API and why I benchmarked using it.

contains-spec.png
Reply all
Reply to author
Forward
0 new messages