hashmap keys

Jay Porcasi

unread,

Nov 10, 2017, 12:08:48 AM11/10/17

to Clojure

i would love to know in a bit more detail how Clojure manages to allow arbitrary data structures as keys for its immutable hashmaps

hashing an atomic value as a string or number is fast and i see how that would work when atomic values are used as keys

but how about using a big nested vector as a key? wouldn't hashing it take an unacceptable amount of time?

yet Clojure manages to do just that and i wonder how

thank you for any bit of clarification

Jay

Timothy Baldridge

unread,

Nov 10, 2017, 12:29:41 AM11/10/17

to clo...@googlegroups.com

Most Clojure collections cache their hashcode, so that improves things quite bit. Also, very large collections are rarely used as *keys* in other maps. Most of the time key collections are one or two values. This means that what we're really talking about is combining the hash values of a few values, and that's pretty fast.

So sure, the initial hash of one million symbols inside a vector may take a fair amount of time, but I've never seen that in the wild.

Timothy

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

“One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
(Robert Firth)

Jay Porcasi

unread,

Nov 12, 2017, 2:35:29 AM11/12/17

to Clojure

thank you very much for the clarification

so no magic there :-)

using big nested structures as keys is gonna slow down things a bit

but how caching can help?

Gary Verhaegen

unread,

Nov 12, 2017, 3:33:31 AM11/12/17

to clo...@googlegroups.com

Retrieving a value from a hashmap goes through the following steps:

1) Compute the hashcode of the supplied key.

2) Navigate through the map's internal tree structure to the bucket that should contain this hashcode (the complexity of this step depends on the number of different hashcodes stored in the map, not on the complexity of the key you're searching for).

3) If that bucket is empty, return the provided "missing" value (default: nil).

4) If the bucket is not empty, for each item in the bucket (all of them have the same hashcode):

5) Compare for pointer equality; if equal, return the associated value

6) Compare for true equality; if equal, return the associated value

7) Return the provided "missing" value.

So, yes, if you use complex, deeply nested keys _and_ you retrieve the associated value through using newly-created, value-equal instances of those keys, it will be a bit slow because you'll have to recompute the hashcode of the supplied key each time you build it again, and then you'll still need to traverse the supplied key and the existing copy in the hashmap for true equality.

In practice, it will most likely be much faster than the worst case because you will not rebuild the full key from scratch each time, and Clojure value equality is aware of structural sharing (i.e. if there is a subtree of the nested keys you're comparing that is actually equal, Clojure will not need to traverse that part). Due to the way Clojure works with immutable values, the most likely scenario is that you will have built that key once, and use the same key (same object in memory) in the hasmap as in the code that retrieves it; it's also likely that you will have just that one key for that hashcode value in the map. In that situation, in order to retrieve the key you just need one hashcode retrieval (it's been cached on first computation, when you added the key to the map), then one traversal to the correct bucket (that's typically a O(log32 N) depth), then one pointer equality check; all of that will be very fast and independent of the complexity of the key.

Note that the above assumes your keys are Clojure values; if they are arbitrary Java objects the equality comparison defaults to Java's .equals so it may be a bit different; also, most Java objects do not cache their hashcodes.

Jay Porcasi

unread,

Nov 12, 2017, 7:29:28 AM11/12/17

to Clojure

that was extremely instructive

thank you very much!

Reply all

Reply to author

Forward