TrieMap & Hasher

168 views
Skip to first unread message

√iktor Ҡlang

unread,
May 20, 2012, 12:06:58 PM5/20/12
to scala-i...@googlegroups.com
Hey,

could we add the opportunity to specify a Hasher into TrieMap, so if I need to create an IdentityTrieMap or just want to improve the hash of a known bad implementation I can override the default which would be key.##?

Cheers,

--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang

Prokopec Aleksandar

unread,
May 20, 2012, 12:11:50 PM5/20/12
to scala-i...@googlegroups.com
Hi!

I personally see no reason why not.
This is different than the current approach in mutable.HashTable, where an appropriate method should be overridden to achieve the same effect.
But I think it's a better approach to have a custom Hasher object, due to the fact that calling `par` on an overridden hashtable would produce a parallel hash table which "forgets" the original hashing strategy. So, maybe we should do the same thing there.

Cheers,
Alex


________________________________________
From: scala-i...@googlegroups.com [scala-i...@googlegroups.com] on behalf of √iktor Ҡlang [viktor...@gmail.com]
Sent: 20 May 2012 18:06
To: scala-i...@googlegroups.com
Subject: [scala-internals] TrieMap & Hasher

Hey,

could we add the opportunity to specify a Hasher into TrieMap, so if I need to create an IdentityTrieMap or just want to improve the hash of a known bad implementation I can override the default which would be key.##?

Cheers,


--
Viktor Klang

Akka Tech Lead
Typesafe<http://www.typesafe.com/> - The software stack for applications that scale

Twitter: @viktorklang

√iktor Ҡlang

unread,
May 20, 2012, 12:49:13 PM5/20/12
to scala-i...@googlegroups.com
Hi Aleks,

sounds like a good plan, looking forward to switching from CHM to TrieMap in Akka 2.1 :-) (just need that Hasher)

Cheers,
Typesafe - The software stack for applications that scale

Twitter: @viktorklang

Adriaan Moors

unread,
May 31, 2012, 7:46:56 AM5/31/12
to scala-i...@googlegroups.com
We were just talking about this in a meeting, and Martin remarked that we should do this factoring
as high up as possible in the hierarchy. It's also a nice opportunity
to unify WeakIdentityHashMap and WeakHashMap

On Sunday, May 20, 2012 6:49:13 PM UTC+2, √iktor Klang wrote:
Hi Aleks,

sounds like a good plan, looking forward to switching from CHM to TrieMap in Akka 2.1 :-) (just need that Hasher)

Cheers,


On Sun, May 20, 2012 at 6:11 PM, Prokopec Aleksandar <aleksanda...@epfl.ch> wrote:
Hi!

I personally see no reason why not.
This is different than the current approach in mutable.HashTable, where an appropriate method should be overridden to achieve the same effect.
But I think it's a better approach to have a custom Hasher object, due to the fact that calling `par` on an overridden hashtable would produce a parallel hash table which "forgets" the original hashing strategy. So, maybe we should do the same thing there.

Cheers,
Alex


________________________________________

Sent: 20 May 2012 18:06

Subject: [scala-internals] TrieMap & Hasher

Hey,

could we add the opportunity to specify a Hasher into TrieMap, so if I need to create an IdentityTrieMap or just want to improve the hash of a known bad implementation I can override the default which would be key.##?

Cheers,


--
Viktor Klang

Akka Tech Lead
Typesafe<http://www.typesafe.com/> - The software stack for applications that scale

Twitter: @viktorklang

√iktor Ҡlang

unread,
May 31, 2012, 7:55:09 AM5/31/12
to scala-i...@googlegroups.com
On Thu, May 31, 2012 at 1:46 PM, Adriaan Moors <adriaa...@gmail.com> wrote:
We were just talking about this in a meeting, and Martin remarked that we should do this factoring
as high up as possible in the hierarchy. It's also a nice opportunity
to unify WeakIdentityHashMap and WeakHashMap


Good call

Stefan Zeiger

unread,
May 31, 2012, 8:25:44 AM5/31/12
to scala-i...@googlegroups.com
On 2012-05-31 13:46, Adriaan Moors wrote:
> We were just talking about this in a meeting, and Martin remarked that
> we should do this factoring
> as high up as possible in the hierarchy. It's also a nice opportunity
> to unify WeakIdentityHashMap and WeakHashMap

Note that this doesn't work for wrapped Java Maps because they don't
provide a similar mechanism. WeakHashMap is currently written in that
way. We'd need a completely new implementation.

-sz

Jonas Bonér

unread,
May 31, 2012, 8:26:22 AM5/31/12
to scala-i...@googlegroups.com
Very nice.

On Thu, May 31, 2012 at 1:46 PM, Adriaan Moors <adriaa...@gmail.com> wrote:
> We were just talking about this in a meeting, and Martin remarked that we
> should do this factoring
> as high up as possible in the hierarchy. It's also a nice opportunity
> to unify WeakIdentityHashMap and WeakHashMap
>
>
> On Sunday, May 20, 2012 6:49:13 PM UTC+2, √iktor Klang wrote:
>>
>> Hi Aleks,
>>
>> sounds like a good plan, looking forward to switching from CHM to TrieMap
>> in Akka 2.1 :-) (just need that Hasher)
>>
>> Cheers,
>> √
>>
>> On Sun, May 20, 2012 at 6:11 PM, Prokopec Aleksandar
>> <aleksanda...@epfl.ch> wrote:
>>>
>>> Hi!
>>>
>>> I personally see no reason why not.
>>> This is different than the current approach in mutable.HashTable, where
>>> an appropriate method should be overridden to achieve the same effect.
>>> But I think it's a better approach to have a custom Hasher object, due to
>>> the fact that calling `par` on an overridden hashtable would produce a
>>> parallel hash table which "forgets" the original hashing strategy. So, maybe
>>> we should do the same thing there.
>>>
>>> Cheers,
>>> Alex
>>>
>>>
>>> ________________________________________
>>> From: scala-i...@googlegroups.com [scala-i...@googlegroups.com]
>>> on behalf of √iktor Ҡlang [viktor...@gmail.com]
>>> Sent: 20 May 2012 18:06
>>> To: scala-i...@googlegroups.com
>>> Subject: [scala-internals] TrieMap & Hasher
>>>
>>> Hey,
>>>
>>> could we add the opportunity to specify a Hasher into TrieMap, so if I
>>> need to create an IdentityTrieMap or just want to improve the hash of a
>>> known bad implementation I can override the default which would be key.##?
>>>
>>> Cheers,
>>> √
>>>
>>> --
>>> Viktor Klang
>>>
>>> Akka Tech Lead
>>> Typesafe<http://www.typesafe.com/> - The software stack for applications
>>> that scale
>>>
>>> Twitter: @viktorklang
>>
>>
>>
>>
>> --
>> Viktor Klang
>>
>> Akka Tech Lead
>> Typesafe - The software stack for applications that scale
>>
>> Twitter: @viktorklang
>>
>



--
Jonas Bonér
CTO
Typesafe - The software stack for applications that scale
Phone: +46 733 777 123
Twitter: @jboner

Aleksandar Prokopec

unread,
May 31, 2012, 8:32:21 AM5/31/12
to scala-i...@googlegroups.com
I've implemented it here:

I can also move this further up, and put it on ordinary HashMaps.
However, a problem I see with it is that after you do a map, a Hasher is not propagated to the mapped HashMap/TrieMap.
This is a similar issue to Ordering not being propagated with have on ordered collections and priority queues.
Below the ordering is not preserved.

import collectioWelcome to Scala version 2.10.0-20120515-155010-c7699ebe45 (Java HotSpot(TM) Server VM, Java 1.7.0_03).
Type in expressions to have them evaluated.
Type :help for more information.

scala> import collection._
import collection._

scala> val pq = new mutable.PriorityQueue()(implicitly[Ordering[Int]].reverse)
pq: scala.collection.mutable.PriorityQueue[Int] = PriorityQueue()

scala> pq ++= List(1, 2, 3)
res1: pq.type = PriorityQueue(1, 2, 3)

scala> pq map { x => x} dequeueAll
warning: there were 1 feature warnings; re-run with -feature for details
res9: scala.collection.immutable.IndexedSeq[Int] = Vector(3, 2, 1)

Aleksandar Prokopec

unread,
May 31, 2012, 8:38:05 AM5/31/12
to scala-i...@googlegroups.com
A possible solution to this problem is to make newBuilder take a parameter collection so that it can extract the relevant information from it - such as a custom hasher, an ordering, load factor, initial capacity, a manifest/tag or other things one might desire.

Josh Suereth

unread,
May 31, 2012, 8:47:04 AM5/31/12
to scala-i...@googlegroups.com
That's not a bad idea.   Most of the builders already take a collection as a "size hint".  Maybe we just make that part mandatory and not size-hint related?

You still have the issue of *ordering* and *hashing* though.   You have to ensure the type parameters after mapping are exactly the same type, otherwise you wind up with an incompatible hasher.

- Josh

Aleksandar Prokopec

unread,
May 31, 2012, 8:50:20 AM5/31/12
to scala-i...@googlegroups.com
True, this only solves the problem there for those few transformers where the type parameter does not change.
Though, ideally, when we use a transformer which can change the collection type (i.e. it takes a CanBuildFrom), but the element type does not change, then ideally, we want to keep the same ordering/hashing.

Stefan Zeiger

unread,
May 31, 2012, 8:51:32 AM5/31/12
to scala-i...@googlegroups.com
On 2012-05-31 14:47, Josh Suereth wrote:
> That's not a bad idea. Most of the builders already take a
> collection as a "size hint". Maybe we just make that part mandatory
> and not size-hint related?

But you also need to be able to build a collection from scratch. Most
uses of builders outside of the actual collection implementations are
probably of that kind.

-sz

Josh Suereth

unread,
May 31, 2012, 9:19:42 AM5/31/12
to scala-i...@googlegroups.com
Yep.  I agree.  How to accomplish this though.....

Ismael Juma

unread,
May 31, 2012, 3:43:24 PM5/31/12
to scala-i...@googlegroups.com
On Thu, May 31, 2012 at 2:19 PM, Josh Suereth <joshua....@gmail.com> wrote:
Yep.  I agree.  How to accomplish this though.....

It is probably relevant to see what is being done in the Java side:

"The current proposal for Java 8:

- A new interface Hashable32 is introduced.
- Hashable32 provides a method hash32()
- String implements Hashable32 and hash32() method
- HashMap et al recognize String and invoke hash32() rather than hashCode()
- ** End of current implementation **
- When the mainline javac supports extensions methods the implementation will be extended.
- Add extension method to Hashable32.hash32() that calls hashCode()
- Object implements Hashable32  [controversial idea]
- HashMap et al unconditionally call hash32(). Other JDK code may also switch to call hash32() instead of hashCode()"


Best,
Ismael

Stefan Zeiger

unread,
May 31, 2012, 5:37:03 PM5/31/12
to scala-i...@googlegroups.com
And all this just to change String.hashCode without actually changing String.hashCode (because that would require a spec change)? Or is there some other reason for this seemingly pointless* shuffling around?

-sz

* at least if Object implements Hashable32

Ismael Juma

unread,
May 31, 2012, 7:01:36 PM5/31/12
to scala-i...@googlegroups.com
On Thu, May 31, 2012 at 10:37 PM, Stefan Zeiger <sze...@novocode.com> wrote:
And all this just to change String.hashCode without actually changing String.hashCode (because that would require a spec change)?

It's a bit unclear. On the one hand, it does look like that. At the same time, the initial plan is to only use the new hashCode for strings larger than a certain size as the old hashCode is supposedly better for smaller Strings.

Best,
Ismael

Aleksandar Prokopec

unread,
Jun 1, 2012, 5:02:07 AM6/1/12
to scala-i...@googlegroups.com, Josh Suereth
Not sure.
We can't do any checks at runtime, obviously, so this decision should be done statically.
Possibly by having a second CanBuildFrom implicit value somewhere which is only triggered if the source element type and target type are the same. This would require adding an additional type parameter to CanBuildFrom, however...
- Josh





Sent: 20 May 2012 18:06

Subject: [scala-internals] TrieMap & Hasher

Hey,

could we add the opportunity to specify a Hasher into TrieMap, so if I need to create an IdentityTrieMap or just want to improve the hash of a known bad implementation I can override the default which would be key.##?

Cheers,


--
Viktor Klang

Akka Tech Lead
Typesafe<http://www.typesafe.com/> - The software stack for applications that scale

Twitter: @viktorklang



--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang




--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang








-- 
Aleksandar Prokopec,
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec

Aleksandar Prokopec

unread,
Jun 1, 2012, 5:05:06 AM6/1/12
to scala-i...@googlegroups.com, Ismael Juma
I think that maybe we should have a Hashing typeclass, similar to Ordering typeclass, for collections which use custom hashing.
It would have a default value or a globally available implicit value which just uses `hashCode()`.

Stefan Zeiger

unread,
Jun 1, 2012, 5:23:41 AM6/1/12
to scala-i...@googlegroups.com
On 2012-06-01 11:05, Aleksandar Prokopec wrote:
I think that maybe we should have a Hashing typeclass, similar to Ordering typeclass, for collections which use custom hashing.
It would have a default value or a globally available implicit value which just uses `hashCode()`.

If we want to allow identity hashmaps (required to unify WeakHashMap and WeakIdentityHashMap), hashing is not enough. We also need equality.

-sz

Simon Ochsenreither

unread,
Jun 1, 2012, 6:02:46 AM6/1/12
to scala-i...@googlegroups.com
The idea to stick random interfaces to Object is just amazingly stupid. I wonder what they were thinking ... so next year we probably have “class Object implements Hashable32, Bug123423, SomeOtherStupidInterfaceJustBecauseWeCan, WorkAroundForFoo, SomeThingWeHaven'tThoughtAboutYet, ...”?

... and the decision to add yet another field to String is just bizarre. It looks like the left hand doesn't know what the rigt-hand does at Oracle.
The HotSpot team is implementing VM-level hacks to reduce the bloat of Strings and those guys just add another field to it? That's like ... brilliant.

Aleksandar Prokopec

unread,
Jun 1, 2012, 7:09:58 AM6/1/12
to scala-i...@googlegroups.com
True. Do you have a link to the current weakidentity hashmap implementation?
--
Aleksandar Prokopec
LAMP, IC, EPFL

Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Stefan Zeiger

unread,
Jun 1, 2012, 6:19:55 AM6/1/12
to scala-i...@googlegroups.com
On 2012-06-01 13:09, Aleksandar Prokopec wrote:
True. Do you have a link to the current weakidentity hashmap implementation?

"The" current implementation does not exists. There is one in SLICK that I wrote a couple of days ago (and posted here, in case there is any interest in including it in the standard library): https://github.com/slick/slick/commit/922db71c03676e153c5f3d02673bd815cd71ca67 . With some further tweaking that could possibly become the standard WeakHashMap implementation for Scala. (The current one won't work because it wraps a Java WeakHashMap - there is no way to change hashing and equality)

-sz

Aleksandar Prokopec

unread,
Jun 1, 2012, 10:30:49 AM6/1/12
to scala-i...@googlegroups.com, Stefan Zeiger
Here's the Equality and Hashing typeclasses prototypes:

https://github.com/axel22/scala-github/compare/feature/hasher

We should not use these on all the hash-related classes in 2.10, but if it works out ok, we can push them to all the maps in a later release.
Currently, only TrieMaps use them.
One thing I've been thinking about is whether Hashing should extend Equality, since two equal objects should produce two equal hashcodes.

Josh Suereth

unread,
Jun 1, 2012, 10:34:03 AM6/1/12
to scala-i...@googlegroups.com, Stefan Zeiger
I like.  I'd also think the === and !== method using the type-class would be fun too.  That's the JavaScript guy in me....

Aleksandar Prokopec

unread,
Jun 1, 2012, 11:05:21 AM6/1/12
to scala-i...@googlegroups.com, Josh Suereth, Stefan Zeiger
Sounds tempting.
Would you put these on the Equality companion object, so that they have to be imported? Or make them globally available?

Stefan Zeiger

unread,
Jun 1, 2012, 11:12:43 AM6/1/12
to scala-i...@googlegroups.com
Please don't make those global names. We're using === for equality in the SLICK query language, and I suppose many other DSLs do the same.

BTW, !== is a bad choice because it has the wrong precedence. SLICK is using using =!= instead. Witness:

scala> object x { def !== (x: Any) = true; def =!= (x: Any) = true }
defined module x

scala> x =!= x && x =!= x
res6: Boolean = true

scala> x !== x && x !== x
<console>:10: error: value && is not a member of object x
              x !== x && x !== x
                      ^

Aleksandar Prokopec

unread,
Jun 1, 2012, 11:49:35 AM6/1/12
to scala-i...@googlegroups.com, Josh Suereth, Stefan Zeiger
Here's a pull:

https://github.com/scala/scala/pull/652

We can additionally add some methods on the Equality companion object once we reach a consensus for this.


On 6/1/12 4:34 PM, Josh Suereth wrote:

Erik Osheim

unread,
Jun 1, 2012, 1:15:38 PM6/1/12
to scala-i...@googlegroups.com
On Fri, Jun 01, 2012 at 05:12:43PM +0200, Stefan Zeiger wrote:
> BTW, !== is a bad choice because it has the wrong precedence. SLICK
> is using using =!= instead. Witness:

Yeah, we got hit with this in Spire. We also use =!= now. :(

-- Erik

nafg

unread,
Jun 24, 2012, 3:36:46 PM6/24/12
to scala-i...@googlegroups.com

On Friday, June 1, 2012 11:12:43 AM UTC-4, Stefan Zeiger wrote:
Please don't make those global names. We're using === for equality in the SLICK query language, and I suppose many other DSLs do the same.

BTW, !== is a bad choice because it has the wrong precedence. SLICK is using using =!= instead. Witness:

Why is that? Aren't = and ! on the same line in the precedence table?

Erik Osheim

unread,
Jun 24, 2012, 4:18:40 PM6/24/12
to scala-i...@googlegroups.com
On Sun, Jun 24, 2012 at 12:36:46PM -0700, nafg wrote:
> Why is that? Aren't = and ! on the same line in the precedence table?

If you read the SLS closely you'll notice there is an exception to the
precedence hierarchy:

6.12.3 Infix Operations

The precedence of an infix operator is determined by the operator's
first character.

...

There's one exception to this rule, which concerns assignment
operators (6.12.4). The precedence of an assigment operator is the
same as the one of simple assignment (=). That is, it is lower than
the precedence of any other operator.

The definition of assignment operators follows:

6.12.4 Assignment Operators

An assignment operator is an operator symbol (syntax category op in
(1.1)) that ends in an equals character "=", with the exception of
operators for which one of the following conditions holds:

*the operator also starts with an equals character, or
*the operator is one of (<=), (>=), (!=).

As you can see, !== is interpreted as an assignment operator, thus
having the lowest possible precedence. This is why many of us choose to
use things like =!= or =/= instead.

-- Erik

Naftoli Gugenheim

unread,
Jun 24, 2012, 4:37:57 PM6/24/12
to scala-i...@googlegroups.com
Ah, not having read 6.12.4, I assumed equality would not be an assignment operator, although in retrospect there's no way the compiler could know that early.
var b = { def !(c: C) = null }
val a = b ! c
b != c
Reply all
Reply to author
Forward
0 new messages