hash-map based on identity

Mark Engelberg

unread,

Mar 7, 2009, 1:49:55 AM3/7/09

to clo...@googlegroups.com

Is there a variation of hash-map which supports comparison of keys
using identical? rather than = ? Ditto with sets.

Jeffrey Straszheim

unread,

Mar 7, 2009, 9:31:47 AM3/7/09

to clo...@googlegroups.com

java.util.IdentityHashMap

David Powell

unread,

Mar 7, 2009, 10:57:41 AM3/7/09

to Mark Engelberg, clo...@googlegroups.com

> Is there a variation of hash-map which supports comparison of keys
> using identical? rather than = ? Ditto with sets.

There is java.util.IdentityHashMap, but it is mutable.

What objects are you trying to store? The ideal in the Clojure world,
is that most objects are immutable.

Immutable objects will implement value equality in .equals() ('=' in
Clojure), and you shouldn't care about whether two objects are
identical? because it won't affect anything.

Mutable objects will implement .equals() the same as identical?, so
you shouldn't need to use identical? because .equals() ('=') will do
the same thing.

Of course, if you are using Java objects from some existing code, they
might not implement equality in the ideal way described above.

--
Dave

Mark Engelberg

unread,

Mar 7, 2009, 3:44:15 PM3/7/09

to David Powell, clo...@googlegroups.com

Here's why it would be useful to have the option of hash maps that use
identity...

Let's say my keys are rather long lists. Equality comparison will be
slow. But if I know that the keys are always the exact same long
lists (perhaps because I traversed the key sequence to find a key with
a certain property, and then am updating with that precise key as one
example), then I can save lots of time by using an identity
comparison.

FWIW, the Scheme I usually use (PLT Scheme) supports both eq? and
equal? based hash tables, and I have found both to be useful (although
I prefer equal? based hash tables as the default).

On Sat, Mar 7, 2009 at 7:57 AM, David Powell <djpo...@djpowell.net> wrote:
> What objects are you trying to store? The ideal in the Clojure world,
> is that most objects are immutable.
>
> Immutable objects will implement value equality in .equals() ('=' in
> Clojure), and you shouldn't care about whether two objects are
> identical? because it won't affect anything.

It's an efficiency issue.

David Nolen

unread,

Mar 7, 2009, 4:58:02 PM3/7/09

to clo...@googlegroups.com

Depending on what you are doing perhaps you could use (hash x). This might not be very helpful if you need to update keys frequently. However if you're mostly doing lookups, then this would help quite a bit.

Rich Hickey

unread,

Mar 7, 2009, 5:57:18 PM3/7/09

to Clojure

On Mar 7, 3:44 pm, Mark Engelberg <mark.engelb...@gmail.com> wrote:
> Here's why it would be useful to have the option of hash maps that use
> identity...
>
> Let's say my keys are rather long lists. Equality comparison will be
> slow. But if I know that the keys are always the exact same long
> lists (perhaps because I traversed the key sequence to find a key with
> a certain property, and then am updating with that precise key as one
> example), then I can save lots of time by using an identity
> comparison.
>

Identity is tested first in equality, if identical, equal, full stop.
So if you are using identical and unique collections as keys you'll
find them without a value-by-value comparison. If they are not
present, it's unlikely you'll get a matching hashCode and a matching
count and a matching long prefix from a mismatch.

> It's an efficiency issue.

I don't think so. Identity keys are most often needed e.g. when you
are trying to save/restore object graphs and need to know that you've
already seen a particular object perhaps equal to another non-
identical object in the same graph, or otherwise tracking object
identity for synchronization with an external system. Using
IdentityHashMap as an implementation detail in these situations is
perfectly fine. I don't see a compelling argument for clojure to offer
identity-based collections.

Rich

Mark Engelberg

unread,

Mar 7, 2009, 8:30:58 PM3/7/09

to clo...@googlegroups.com

On Sat, Mar 7, 2009 at 2:57 PM, Rich Hickey <richh...@gmail.com> wrote:
> Identity is tested first in equality, if identical, equal, full stop.
> So if you are using identical and unique collections as keys you'll
> find them without a value-by-value comparison. If they are not
> present, it's unlikely you'll get a matching hashCode and a matching
> count and a matching long prefix from a mismatch.

But if the key is a long list, wouldn't it have to traverse the list
just to come up with the hash code for that list? And doesn't it need
that hash code just to know which "bin" to look in for matching keys?
If so, then the performance hit occurs before you even get to
comparing against other keys.

My understanding is that eq?-based associative collections use a
completely different hashing scheme, hashing on the address or some
other unique/stable value associated with each object, and then use
eq? (like identical?) to do the actual comparison when they get to the
right "bin".

Another use case: Consider sorting a collection of lengthy lists, and
you want to sort based on the length of the lists. You want to cache
the length of those lists. Using an identity-based hash-map would be
the right kind of cache for such a thing, because you're looking up
the exact same lists and want to be efficient about it (assuming my
comments above about hash performance is correct).

Rich Hickey

unread,

Mar 7, 2009, 9:03:39 PM3/7/09

to clo...@googlegroups.com

On Mar 7, 2009, at 8:30 PM, Mark Engelberg wrote:

>
> On Sat, Mar 7, 2009 at 2:57 PM, Rich Hickey <richh...@gmail.com>
> wrote:
>> Identity is tested first in equality, if identical, equal, full stop.
>> So if you are using identical and unique collections as keys you'll
>> find them without a value-by-value comparison. If they are not
>> present, it's unlikely you'll get a matching hashCode and a matching
>> count and a matching long prefix from a mismatch.
>
> But if the key is a long list, wouldn't it have to traverse the list
> just to come up with the hash code for that list? And doesn't it need
> that hash code just to know which "bin" to look in for matching keys?
> If so, then the performance hit occurs before you even get to
> comparing against other keys.

The core collection classes cache their hash codes. There is an issue
to have them create them incrementally, amortizing the cost over all
insertions, which will work for all but lists, which have a front-to-
back hash algorithm but are built back-to-front.

http://code.google.com/p/clojure/issues/detail?id=11

>
> My understanding is that eq?-based associative collections use a
> completely different hashing scheme, hashing on the address or some
> other unique/stable value associated with each object, and then use
> eq? (like identical?) to do the actual comparison when they get to the
> right "bin".
>

Yes, so? It's still not an argument for another collection family,
which will only engender confusion as people make similar incorrect
assumptions about performance and equality.

> Another use case: Consider sorting a collection of lengthy lists, and
> you want to sort based on the length of the lists. You want to cache
> the length of those lists. Using an identity-based hash-map would be
> the right kind of cache for such a thing, because you're looking up
> the exact same lists and want to be efficient about it (assuming my
> comments above about hash performance is correct).
>

This argument only applies if you never want to calculate the hash
code by a walk, even once. I think that's a false economy, and again,
not persuasive. With incremental hashcode calculation, even less so.

Clojure has other data structures that address some of the
deficiencies of lists, but even PersistentList is Counted, i.e.
provides its count in constant time, as do maps, sets and vectors,
array, vector and string seqs etc.

Rich

Stephen C. Gilardi

unread,

Mar 7, 2009, 9:13:11 PM3/7/09

to clo...@googlegroups.com

On Mar 7, 2009, at 8:30 PM, Mark Engelberg wrote:

> But if the key is a long list, wouldn't it have to traverse the list
> just to come up with the hash code for that list?

Yes, but only once. Since Clojure lists are immutable, their hash code
is calculated once and cached and the cache will never become invalid.

--Steve

David Powell

unread,

Mar 8, 2009, 6:17:29 AM3/8/09

to Rich Hickey, Clojure

> Identity is tested first in equality, if identical, equal, full stop.

That's what I'd assumed (it's what the JDK collections do), but
looking at the code, to say, APersistentVectory, I can't see where the
identity test is done? Am I looking in the wrong place?

--
Dave

Rich Hickey

unread,

Mar 8, 2009, 10:28:45 PM3/8/09

to Clojure

Clojure code routes through clojure.lang.Util.equiv()/equals() which
do the check, but each collection should as well, for use outside of
Clojure.

Issue/patch welcome.

Rich

Chouser

unread,

Mar 9, 2009, 10:09:32 AM3/9/09

to clo...@googlegroups.com

I've created an issue, just so we don't lose track of this, but I'm
not writing a patch at the moment. This is probably some nice
low-hanging fruit for someone who has sent in their CA and is
interested in starting to get patches into Clojure itself: