Keyword comparison performance

973 views
Skip to first unread message

Jony Hudson

unread,
Oct 10, 2014, 6:23:10 PM10/10/14
to clo...@googlegroups.com
Hi All,

 I've been optimising a piece of code lately, and have come to wonder about the performance of keyword comparison. Specifically, I'm not sure whether the performance I'm seeing is what is expected. The data structures page on clojure.org [1] indicates that keywords "provide very fast equality tests". If I micro-benchmark with criterium, then I find the following:

As a baseline, comparing integers with `(= 0 1)` takes around 4ns.

Comparing keywords with `(= :plus :minus)` takes around 30ns.

This is about the same amount of time it takes to compare strings, `(= "plus" "minus")`, which comes in at about 25ns.

This surprised me, as I would have guessed that "fast" would have been closer to the integer performance than the string performance. It's worth saying that I don't know a lot about benchmarking, but I do have some "real" code that's performance depends heavily on comparisons, and it seems to line up performance-wise with these micro-benchmarks.

So, am I doing something silly (like I don't know about the fast = for keywords)? Or, are my expectations wrong, and this is about how long "fast" should be? Or is there a performance bug lurking?

I'm using Clojure 1.6.0 (but have tried 1.5.0 and 1.7.0-alpha1 with similar results).
x86_64 Mac OS X 10.9.5 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 25.5-b02

Thanks in advance for any input,


Jony

Stephen Gilardi

unread,
Oct 10, 2014, 8:17:02 PM10/10/14
to clo...@googlegroups.com
 I've been optimising a piece of code lately, and have come to wonder about the performance of keyword comparison. Specifically, I'm not sure whether the performance I'm seeing is what is expected. The data structures page on clojure.org [1] indicates that keywords "provide very fast equality tests". If I micro-benchmark with criterium, then I find the following:

As a baseline, comparing integers with `(= 0 1)` takes around 4ns.

Comparing keywords with `(= :plus :minus)` takes around 30ns.

I ran the same test and saw similar results.

I noticed that using “identical?” instead of “=“ brings the performance much closer to the integer case because “identical?" does less work.

In most cases, equal keywords are also identical because the keyword objects are cached by clojure.lang.Keyword/intern. If you don’t need to worry about the case of multiple Clojure runtimes being loaded by separate classloaders[1], and if you’re in control of the code doing the comparison, you could change to using “identical?" for better performance.

Treatment of keyword equality and identity also came up during the development of ClojureScript. I found this discussion interesting: https://groups.google.com/forum/#!topic/clojurescript/bSFK6CEE3PE .

—Steve


Mikera

unread,
Oct 11, 2014, 6:41:12 AM10/11/14
to clo...@googlegroups.com
I believe this is a symptom of the fact that the Clojure compiler isn't very type aware, and = inserts a bunch of redundant runtime type checks (see clojure.lang.Util.equiv(Object, Object) )

In particular, it always checks for whether the arguments are instances of Number or IPersistentCollection first, because these get special handling, before calling Keyword.equals(). Obviously, these checks are pointless if you already know one or both arguments is a keyword, but the compiler doesn't make that inference at compile time so we have to pay the runtime overhead. These type checks are pretty cheap, but certainly not free. It's also possible that the JVM isn't inlining as much as it could.

Mikera

unread,
Oct 11, 2014, 6:50:07 AM10/11/14
to clo...@googlegroups.com
Just done some of my own quick benchmarking on this.

(= :foo :bar)          ;; 21.7 ns
(.equals :foo :bar)    ;;  3.0 ns
(identical? :foo :bar) ;;  3.0 ns

(.equals "foo" "bar")  ;;  5.6 ns
(= "foo" "bar")        ;; 24.4 ns

This seems to support my suspicion that the overhead is in the extra overhead in clojure.core/=, rather than an intrinsic issue with keyword equality performance.

David Nolen

unread,
Oct 11, 2014, 9:25:47 AM10/11/14
to clo...@googlegroups.com
As already mentioned use identical? In ClojureScript you must use keyword-identical? for fast comparisons.
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jony Hudson

unread,
Oct 11, 2014, 10:59:47 AM10/11/14
to clo...@googlegroups.com
Thanks all, that makes a lot of sense.

To summarise, if I understand correctly, (on the JVM) comparing with = :
- first checks if the objects are one and the same, returns true if so
- if not, checks whether they are numbers or persistent collections, which have special-case checks
- if neither of the above, delegate to the object's .equals method.
Keywords, as best I can tell, don't override Object/equals, so the test is the same as identical? in the end i.e. whether they are the same object.

It makes sense, then, that comparing keywords that are the same is fast (~4ns, same as an identical? comparison on my machine), as this returns on the first step above. And comparison of differing keywords is slower as it goes through all of the steps above, as noted by @Mikera.

I also notice that the code for core/case seems to suggest that if the cases are labelled by keyword, then an "identical" check will be used (although, I don't yet understand the compiler code well enough to verify that this is true in the implementation of case*).

I guess a question that remains is whether this behaviour of keywords, that identical? and = give the same result, is something that is guaranteed, or whether it should be viewed as an implementation detail? If it is a guarantee I wonder whether there should be something on the data structures page of clojure.org indicating this, and noting how to do fast keyword comparison?

Another question, which is more for my understanding, is given that is currently the case that identical? and = are equivalent for keywords (and some core code depends on this behaviour), why does it work? Looking at Keyword.java I see that weak references are stored to interned keywords in a table. But if any of these Keyword objects were garbage-collected, I think it would break the parity between identical? and =. So it must be the case that something else is holding on to strong references somewhere. Which makes me wonder why weak references are used in the table. Anyone familiar with the compiler able to shed light on this for my edification? :-)


Jony

Stephen Gilardi

unread,
Oct 11, 2014, 11:35:36 AM10/11/14
to clo...@googlegroups.com

On Oct 11, 2014, at 10:59 AM, Jony Hudson <jonye...@gmail.com> wrote:

But if any of these Keyword objects were garbage-collected, I think it would break the parity between identical? and =. 

The Keyword construction and interning mechanism ensures that whenever there exists at least once (strong) reference to a given Keyword object, all references will be to the same object.

A given Keyword object can only be garbage collected if it’s garbage—if there are no (strong) references to it currently in existence. That means it’s not a key or value in any map, it’s not in use by any compiled code, it’s not the value of any local, etc. Without any existing strong reference to it, no comparison involving that Keyword object is possible.

If at any point in time there are no strong references to it, it’s safe to un-intern the Keyword object. If a keyword of that name (and namespace) is needed subsequently, a new Keyword object can be created and interned and the promised identity and equality semantics will be satisfied from that point forward.

—Steve

Colin Fleming

unread,
Oct 11, 2014, 7:28:25 PM10/11/14
to clo...@googlegroups.com
Does this mean that keywords don't have any efficiency advantages over strings when used as map keys?

Andy Fingerhut

unread,
Oct 11, 2014, 7:59:41 PM10/11/14
to clo...@googlegroups.com
I haven't taken the time to verify these statements with microbenchmarks, but there are 2 reasons why keywords should have efficiency advantages over strings when used as hash map keys:

(a) For keywords, hash values are calculated and cached when the keyword is first constructed.  When looking up a keyword as a hash key, it simply returns the precomputed hashed value [1].  For strings, the hash is recalculated on every lookup [2].

(b) The common case after the hash value is calculated, if the key exists is to compare against a single candidate key in the hash map.  For string = between the same strings, but not identical, this is slower than identical?.  For = between the same keywords, they are guaranteed to be identical? and thus fast to compare for =.

For array map keys (for smaller maps, usually with at most 8 key/value pairs), I can't think of much of a reason that keys would be faster than strings, except for the final = comparison if/when a matching key is found.

[1] https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Keyword.java#L87
[2] https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Util.java#L171

David Nolen

unread,
Oct 11, 2014, 8:18:58 PM10/11/14
to clo...@googlegroups.com
Advantages exist, ClojureScript and Clojure both use the most efficient equality check during key lookup in maps.

Mike Rodriguez

unread,
Oct 11, 2014, 8:28:39 PM10/11/14
to clo...@googlegroups.com
To the point (b) it seems that this posts is saying the clj's = will not be faster for keyword than string since the runtime type checking overhead is where most time is spent. So the identity part of keyword equals doesn't show its benefit here (unless these were long strings vs long keywords I suspect, but this is uncommon).

The hash code calculation stuff would add performance value though.

I'm just saying I don't see how point (b) is adding performance gains in the context of what else I've read here.

If the compiler (doesn't seem always possible) or map impl's or something is smart enough to not use clj's basic = impl on keywords though and use its equals directly instead, that'd sound like a win.

Message has been deleted
Message has been deleted

Andy Fingerhut

unread,
Oct 11, 2014, 9:06:36 PM10/11/14
to clo...@googlegroups.com
The common case in hash map lookup for keywords _that finds a matching key in the map_ is:

(a) hash the keyword (fast, because it is cached)
(b) use the hash code value to walk one or more nodes in a tree to reach a leaf, where there can be a single value, or a linked list of multiple colliding values (the colliding value case should be quite rare in Clojure 1.6.0 and later)
(c) now compare the searched-for keyword against the key in the map, which by the assumption in the first sentence is =.  If two keywords are =, then they are also identical, and so = takes the fastest possible path of returning true because they are identical.

The common case for looking up a keyword that does not find a matching key in the map is (a), then (b), then falling out of the tree and not comparing against any other value, so even faster than the above.

Andy


Mike Rodriguez

unread,
Oct 11, 2014, 9:28:22 PM10/11/14
to clo...@googlegroups.com
Thanks for taking the time for the (detailed) clarification. I understand what you were saying now.
:)

Alex Miller

unread,
Oct 13, 2014, 4:53:41 PM10/13/14
to clo...@googlegroups.com
Not really chiming in directly on this, but one thing I wanted to note is that getting meaningful performance numbers for keyword-related code is somewhat tricky due to caching and interning.

Keywords are cached so new keyword construction is much slower than existing keyword construction (and there are GC and concurrency-related consequences). Keywords hold symbols that hold strings for namespace and name. Those strings are (until Clojure 1.7) interned in the JVM. That has its own consequences for string construction (interning itself is slow) however hash codes are cached on Strings. As of 1.7, the strings in Symbols are not interned (which is faster for construction, but uses more memory). 

It is challenging to account for all of these factors when benchmarking in ways that actually represent real applications. Testing your own actual application performance with different strategies and versions is of course, the best way to understand how it actually affects you. 
Reply all
Reply to author
Forward
0 new messages