There is java.util.IdentityHashMap, but it is mutable.
What objects are you trying to store? The ideal in the Clojure world,
is that most objects are immutable.
Immutable objects will implement value equality in .equals() ('=' in
Clojure), and you shouldn't care about whether two objects are
identical? because it won't affect anything.
Mutable objects will implement .equals() the same as identical?, so
you shouldn't need to use identical? because .equals() ('=') will do
the same thing.
Of course, if you are using Java objects from some existing code, they
might not implement equality in the ideal way described above.
--
Dave
Let's say my keys are rather long lists. Equality comparison will be
slow. But if I know that the keys are always the exact same long
lists (perhaps because I traversed the key sequence to find a key with
a certain property, and then am updating with that precise key as one
example), then I can save lots of time by using an identity
comparison.
FWIW, the Scheme I usually use (PLT Scheme) supports both eq? and
equal? based hash tables, and I have found both to be useful (although
I prefer equal? based hash tables as the default).
On Sat, Mar 7, 2009 at 7:57 AM, David Powell <djpo...@djpowell.net> wrote:
> What objects are you trying to store? The ideal in the Clojure world,
> is that most objects are immutable.
>
> Immutable objects will implement value equality in .equals() ('=' in
> Clojure), and you shouldn't care about whether two objects are
> identical? because it won't affect anything.
It's an efficiency issue.
But if the key is a long list, wouldn't it have to traverse the list
just to come up with the hash code for that list? And doesn't it need
that hash code just to know which "bin" to look in for matching keys?
If so, then the performance hit occurs before you even get to
comparing against other keys.
My understanding is that eq?-based associative collections use a
completely different hashing scheme, hashing on the address or some
other unique/stable value associated with each object, and then use
eq? (like identical?) to do the actual comparison when they get to the
right "bin".
Another use case: Consider sorting a collection of lengthy lists, and
you want to sort based on the length of the lists. You want to cache
the length of those lists. Using an identity-based hash-map would be
the right kind of cache for such a thing, because you're looking up
the exact same lists and want to be efficient about it (assuming my
comments above about hash performance is correct).
>
> On Sat, Mar 7, 2009 at 2:57 PM, Rich Hickey <richh...@gmail.com>
> wrote:
>> Identity is tested first in equality, if identical, equal, full stop.
>> So if you are using identical and unique collections as keys you'll
>> find them without a value-by-value comparison. If they are not
>> present, it's unlikely you'll get a matching hashCode and a matching
>> count and a matching long prefix from a mismatch.
>
> But if the key is a long list, wouldn't it have to traverse the list
> just to come up with the hash code for that list? And doesn't it need
> that hash code just to know which "bin" to look in for matching keys?
> If so, then the performance hit occurs before you even get to
> comparing against other keys.
The core collection classes cache their hash codes. There is an issue
to have them create them incrementally, amortizing the cost over all
insertions, which will work for all but lists, which have a front-to-
back hash algorithm but are built back-to-front.
http://code.google.com/p/clojure/issues/detail?id=11
>
> My understanding is that eq?-based associative collections use a
> completely different hashing scheme, hashing on the address or some
> other unique/stable value associated with each object, and then use
> eq? (like identical?) to do the actual comparison when they get to the
> right "bin".
>
Yes, so? It's still not an argument for another collection family,
which will only engender confusion as people make similar incorrect
assumptions about performance and equality.
> Another use case: Consider sorting a collection of lengthy lists, and
> you want to sort based on the length of the lists. You want to cache
> the length of those lists. Using an identity-based hash-map would be
> the right kind of cache for such a thing, because you're looking up
> the exact same lists and want to be efficient about it (assuming my
> comments above about hash performance is correct).
>
This argument only applies if you never want to calculate the hash
code by a walk, even once. I think that's a false economy, and again,
not persuasive. With incremental hashcode calculation, even less so.
Clojure has other data structures that address some of the
deficiencies of lists, but even PersistentList is Counted, i.e.
provides its count in constant time, as do maps, sets and vectors,
array, vector and string seqs etc.
Rich
> But if the key is a long list, wouldn't it have to traverse the list
> just to come up with the hash code for that list?
Yes, but only once. Since Clojure lists are immutable, their hash code
is calculated once and cached and the cache will never become invalid.
--Steve
That's what I'd assumed (it's what the JDK collections do), but
looking at the code, to say, APersistentVectory, I can't see where the
identity test is done? Am I looking in the wrong place?
--
Dave
I've created an issue, just so we don't lose track of this, but I'm
not writing a patch at the moment. This is probably some nice
low-hanging fruit for someone who has sent in their CA and is
interested in starting to get patches into Clojure itself:
http://code.google.com/p/clojure/issues/detail?id=92
--Chouser