--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Can you expand on the equality stuff a bit? What would be a case where Java equality approach to HashMap would return different results than in Scala?
LongHashMap/DoubleHashMap: This sounds a lot like admitting defeat in the face of erasure. If we decide to do that, can we do it as loudly and publicly as possible? :-)
--
We already have LongMap in collection.immutable. So it's not exactly a precedent to add a LongMap to mutable as well. (I guess I'll drop the "hash" part for brevity.)
Scala hash maps understand boxing. Java hash maps don't. See example below.
No, I'm not planning on factoring out common logic because there is a negligible amount, and multiple dispatch is the bane of performance.
If you neglect all the stuff in common that is best handled by inheritance, then you are really scraping the bottom of the barrel for common code.
--Rex
--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--Martin OderskyChairman, Typesafe
The company for Reactive Apps on the JVM
-- Aleksandar Prokopec, Doctoral Assistant LAMP, IC, EPFL http://people.epfl.ch/aleksandar.prokopec
I did try doing part of the equality test early but it didn't work all that well.(1) numGet(key) is slow itself compared to simply looking up a primitive or doing fastGet, so if you use only Int you cannot be as fast as with Java (unless your algorithm is far superior otherwise, which is hard to manage)
I can get more precise numbers if you'd like, but the problems are
(2) You still do have to run through all that logic once, and then the other half of the logic once you hit a key (to figure out what the other thing is). If you have to do N compares, you save on about (N-1)/2 of the logic you would have to do. Unfortunately, with a well-designed hash map N ~= 1/3 anyway, which means you save on -1/3 of your logic: you do more lookup than you need to when you just hit an empty cell. (Yes, with additional complexity you could defer the logic until you needed to test equality. I haven't tried that.)
(3) Once you have three or more implementations of get, you aren't able to manually inline parts of get where doing so provides a significant performance boost.
(b) A sensible compiler-generated solution is not imminent(a) The less clean use of the Scala library is more than offset in users' code by people using all sorts of ad-hoc non-standard-library solutionsI understand the reluctance to manually specialize, but I hardly think adding a few collections that handle primitive types or simple equality for those cases where they really matter is a bad tradeoff given that
(c) The hard part is creating, not maintaining, these collections, and I already know how to and partly have(d) Although having immutable.LongMap has been a mistake, inconsistency in the library is also not great
In mine, scalac would online the typeclass instance.
M
Sent from my android - may contain predictive text entertainment.
(I will explore making the JavaConverters wrapper over java.util.HashMap more robust and using that instead of EqualsHashMap, since that's what the Java version does anyway.)I will have to have three separate implementations since JIT compilers are not all that great at realizing that instead of de-optimizing code in an inner loop, it can maintain multiple versions of the whole hierarchy. So I will essentially be manually miniboxing. I agree that it's annoying, but unless a solution is on a near horizon, I don't see a better option.Hi Erik and Alex,Specialization produces too much code. It's also valuable to have a leaner library.
On Fri, Nov 1, 2013 at 5:47 PM, Rex Kerr <ich...@gmail.com> wrote:
(3) DoubleHashMap. This map has Double keys (Float is okay too). Again, it will be 2x+ faster than boxed primitives.(2) LongHashMap. This map has Long keys (smaller types are okay too). It will be much faster (2-3x at least) than using boxed primitive whole number keys. I will probably parameterize it so that you can specify that only some types are okay as keys; I haven't yet seen if there's a performance penalty there.(1) EqualsHashMap. (I am open to a better name.) This map will work just like Java's HashMap in that it will rely solely on (a equals b) on boxed types to check whether two things are the same. If something is boxed, it will work as well or poorly as the boxes' equals works. This will be consistently faster than the existing mutable.HashMap as long as the constraints are okay. (For something like String, it will be okay.) I will try allowing the user to specify an equals, but I expect that the overhead of multiple dispatch via typeclasses--here manifested as inability to inline--will eat up any performance gains. If not, it would be great to allow eq only, equals, or == all through the same interface.I have been working on improving HashMap for 2.11. To make a long story short, a significant part of why Java's HashMap is faster than the Scala one is that Java does less work on equality checking. Scala provides a very full-featured == which will let you get the same answer for boxed primitives as if you knew the types and wrote ==. But there's runtime overhead associated with even allowing for the possibility of this once you get outside of tight loops that always do the same thing (i.e. most non-microbenchmark code).As a solution, I propose to add three new hash maps to the mutable collections. They will inherit from Map but _not_ from HashMap because the contract is subtly but importantly different. They are:
How would NaN
behave? Like the boxed version, or lost forever?
--RexDoes this sound like a reasonable plan?This should address the performance complaints about Scala HashMap being slower than Java; EqualsHashMap should be every bit as fast (and I might just wrap Java's, though I think one of the variants I've tried is superior) and have more features, and the Long and Double versions will be faster than anything shipped with Java.
If the choice is the latter, will it be possible to choose a special value that denotes an empty bucket (e.g. 0, -1, Long.MinValue, depending on the client's needs)?
On Sun, Nov 3, 2013 at 11:57 PM, Rex Kerr <ich...@gmail.com> wrote:
I did try doing part of the equality test early but it didn't work all that well.(1) numGet(key) is slow itself compared to simply looking up a primitive or doing fastGet, so if you use only Int you cannot be as fast as with Java (unless your algorithm is far superior otherwise, which is hard to manage)
I can get more precise numbers if you'd like, but the problems are
I was mostly interested to make the non-int case fast. If we must, we can add a mutable.LongMap. But it would be nice if the standard HashMap was roughly as fast as the proposed EqualsHashMap. It's one thing to specialize on a type, but another to specialize on a different algorithm which is faster but supposedly less correct.
(2) You still do have to run through all that logic once, and then the other half of the logic once you hit a key (to figure out what the other thing is). If you have to do N compares, you save on about (N-1)/2 of the logic you would have to do. Unfortunately, with a well-designed hash map N ~= 1/3 anyway, which means you save on -1/3 of your logic: you do more lookup than you need to when you just hit an empty cell. (Yes, with additional complexity you could defer the logic until you needed to test equality. I haven't tried that.)
No, if the selector is not a number you do not need to do the comparison again once you hit a key. None of the special cases in == applies to the case where a number is compared to something which is not a number. Also, of course, you can use hashCode instead of ##.
(b) A sensible compiler-generated solution is not imminentI understand the reluctance to manually specialize, but I hardly think adding a few collections that handle primitive types or simple equality for those cases where they really matter is a bad tradeoff given that(a) The less clean use of the Scala library is more than offset in users' code by people using all sorts of ad-hoc non-standard-library solutions
(c) The hard part is creating, not maintaining, these collections, and I already know how to and partly have(d) Although having immutable.LongMap has been a mistake, inconsistency in the library is also not great
I am ready to concede a mutable.LongMap :-). DoubleMap: Not sure. Given floating point imprecision I believe there will not be many use cases. EqualsMap: It would be good if we could avoid it. Let's do a best effort to speed up HashMap to see whether we can.
It probably won't behave like anything since Martin's less enthusiastic about DoubleMap and I'm short on time. I handled it as a special case in my highly incomplete test code.
It's very late in the release cycle to be making any additional plans, but the case for a replacement HashMap is a complicated one.
I've condensed a bunch of testing over a range of map sizes (1 to 300,000) and key data types (sequential integers, non-sequential integers, strings where reference equality holds between creation and testing, strings where reference equality fails, nested Tuple2s, and a mix of all of the above) into a series of graphs.
There are five basic tests: get (without modification), contains (without modification), iteration (accumulating a sum), interleaved deletion/re-addition of everything in the map, and creation of a map from scratch (which involves lots of putting). These are plotted in different shapes and colors in the attached graphs.
The first thing to notice (in Scala_vs_Java) is that for a wide variety of objects Scala's HashMap is a lot slower than Java's. Not a surprise, but you can see that get and contains, which are presumably the most heavily-used methods, are frequently 1.5x - 3x slower across practically all conditions.
The revised map I've created improves matters substantially for small maps, but starts lagging badly for large ones (especially for, it turns out, sequential integer keys). This is illustrated in Any_vs_Java.
If we restrict the keys to Long using a similar strategy, suddenly we can beat Java handily (Long_vs_Java). I've already submitted a pull request with LongMap to cover that case.If we restrict the keys to AnyRef, which allows us to use .equals and .hashCode instead of BoxesRunTime.equals2 and .##, then the performance tracks Java quite well, except--not shown--the new data structure is more amenable to speeding up particular collections operations. For example, it's possible to give builders access to special put methods that can bypass the need to check for removed keys. There's also a performance penalty for wrapping Java's HashMap (also not shown), so overall it's usually a win. AnyRef_vs_Java shows how they compare.
The question then is whether to supply a hash map that can be used with any key, or only with AnyRef keys. The detailed information one needs to make that decision is n Any_vs_AnyRef: you can observe an inconsistent improvement with the latter on types that both support, but overall you generally get a 20-40% improvement. (I've not been able to use early detection of key type to improve the performance.)
I prefer using an AnyRef-only map since anything that isn't an AnyRef can either be stored unboxed (value types) or converted to a Long (and thus go into LongMap). This would lead naturally to selection of the type of map that will give best performance. I think the performance gains are large enough to justify doing something. It's close to a drop-in replacement, but the new implementation isn't fully source-compatible with collection.mutable.HashMap so I can't just swap the old one out.
I'm not sure there's much of a use case to have distinct Any and AnyRef capability, so we should pick one. I know Martin favored the generic capability, but for the reasons described above I prefer the Long/AnyRef specialization without an Any-key implementation.
Graphs follow. In all cases, the thick black line is "same speed" (1x). If a point is to the right and below, it means that the collection in question is slower than one would hope. To the left and above, faster. Most comparisons are to Java, since that's what people blog about.
I plan to try submitting pull requests for both the AnyRef and the Any maps in the expectation that at least one will be rejected, but I'm not sure I'll have time to finish all the niceties with both (builders, docs, etc.). Any hints for prioritization would help.
--Rex
--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Yes, but what is the impl of containsNC?
If HashMap can't change, the question is what to do about the speed difference with Java.We could(1) Wrap Java's HashMap, but silently lose "proper" behavior of key comparison since Java only uses .hashCode and equals.(2) Provide a second HashMap (like OpenHashMap tries to do) that works with all types.(3) Only provide specialized maps that cover the most common cases. LongMap does this with Long, and most everything else is an AnyRef. I could specialize even further to StringMap, since most maps reportedly have string keys.I prefer (3) since (1) alone is broken, and (2) seems redundant--we have too many hash maps around already. (3) will tend to lead people who need performance to be likely to make the right choices.
Split up two ways instead of three, which should be better for the Any branch, gives sometimes equal performance to just bare .equals and .hashCode, and sometimes about 20% worse (on contains).
I didn't specifically check. One test uses a mix of different types, though, and it produces similar results to the others.(I am aware that multiple dispatch can cause dramatic changes to later optimization of code. I've never seen an effect quite as huge as the one you saw; the biggest I found was something like 12x IIRC.)
The project isn't public because I was scrambling to get everything working and it's kind of a mess. I suppose I can put it somewhere if you don't mind the random combination of bugs, poor design practices, and multiple ways of doing the same thing interleaved in a way that ends up working but doesn't look like it should.
--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Jason: Can the forwarders be eliminated via pruning or similar?
On Tue, Nov 12, 2013 at 11:17 PM, √iktor Ҡlang <viktor...@gmail.com> wrote:
Jason: Can the forwarders be eliminated via pruning or similar?I'm not sure which technique you're referring to, nor whether you mean in scalac or in HotSpot.
--
You received this message because you are subscribed to the Google Groups "scala-internals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-interna...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
I'd appreciate any pointers from folks more familiar with this low-level profiling.