hash equality between Clojure and ClojureScript


Peter Taoussanis

Apr 20, 2015, 12:03:08 AM
to clojur...@googlegroups.com
Hi there,

Am running Clojure 1.7.0-beta1, ClojureScript 0.0-3196.

Just noticed:
(hash 1) ; 1, ClojureScript
(hash 1) ; 1392991556, Clojure
(.hashCode 1) ; 1, Clojure

i.e. numeric hashes aren't consistent between Clojure and ClojureScript.

I'm assuming that's intentional?

This got me wondering: is there an official contract somewhere describing hash behaviour similarities we _can_ safely depend on?

Keywords, strings, and collections of these seem to produce matching hashes (?) - but is that dependable behaviour or subject to change?

Thanks a lot, cheers! :-)

Herwig Hochleitner

Apr 20, 2015, 9:00:48 AM
to clojurescript
2015-04-20 6:03 GMT+02:00 Peter Taoussanis <ptaou...@gmail.com>:
This got me wondering: is there an official contract somewhere describing hash behaviour similarities we _can_ safely depend on?

I don't think there is. IIRC clojurescript _has_ duplicated the recent work to minimize collisions in collection hashing and I think that a ticket unifying numeric hashes _might_ as well be accepted, if the performance impact is not too high.
However, the only real requirement is that a hash equality in either CLJ or CLJS implies a hash equality in the other. There is value in being able to tweak the trade-off between performance, hash distribution and conformance to the host platform, so I don't expect hash algorithms to be standardized.

When you want reproducibility between runtimes, the right way (tm) is to use cryptographic content-based hashing. See https://github.com/ghubber/hasch for an implementation that might suit your needs.

Francis Avila

Apr 20, 2015, 2:02:23 PM
to clojur...@googlegroups.com
There's no contract, but strings, keywords, and symbols should hash the same, and collections of these (vectors, lists, maps, sets) should hash the same.

It's difficult to hash numbers the same between Clojure and ClojureScript.

ClojureScript numbers are all doubles (because JS), so they should in theory hash the same as Clojure doubles. Clojure hashes doubles using Java's Double.hashCode(), which relies on knowing the exact bits of the double. These bits are not available in JavaScript (at least not easily or without using typed arrays). I'm also not sure if this particular implementation of hashCode is part of the Java spec (i.e. implemented the same by all JDKs and JVMs).
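
For reference, the Java API documentation for Double.hashCode does state the formula: the XOR of the two 32-bit halves of the bit pattern returned by Double.doubleToLongBits. A minimal Java sketch of that formula follows; the class and method names are made up for illustration.

```java
public class DoubleHashSketch {
    // Formula documented for Double.hashCode: XOR the high and low 32 bits
    // of the IEEE 754 bit pattern returned by Double.doubleToLongBits.
    public static int doubleHash(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) (bits ^ (bits >>> 32));
    }

    public static void main(String[] args) {
        System.out.println(doubleHash(1.0));       // 1072693248 (0x3FF00000)
        System.out.println(Double.hashCode(1.0));  // same value as the line above
        System.out.println(Long.hashCode(1L));     // 1, the (.hashCode 1) result on a long
    }
}
```

The hash depends on reading the raw 64-bit pattern, which is the step that is awkward to reproduce in JavaScript.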

Of course in practice the same clojure form (as read) will not hash the same in clj and cljs because clj uses longs most of the time.

You could take a hybrid approach where integers in cljs are hashed like longs in Clojure. This works up to 52 bits, but longs with more bits than that are not representable in Clojurescript. This is an approach I was pursuing in my murmur3 hashing implementation for cljs:


In practice you will hash the same most of the time if you mostly deal with integer numbers, but any doubles, bigdecimals, or large integers will still hash differently.
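
To illustrate why large longs are the problem case, here is a small Java sketch (Java, so the long and double sides can be shown next to each other). The cutoff used is an assumption: it is JavaScript's Number.MAX_SAFE_INTEGER, 2^53 - 1, slightly above the 52 bits mentioned above. The class and method names are hypothetical.

```java
public class SafeIntegerSketch {
    // Assumed cutoff: JavaScript's Number.MAX_SAFE_INTEGER, i.e. 2^53 - 1.
    public static final long MAX_SAFE = (1L << 53) - 1;

    // True when the long survives conversion to a double without losing precision.
    public static boolean fitsInDouble(long n) {
        return n >= -MAX_SAFE && n <= MAX_SAFE;
    }

    public static void main(String[] args) {
        long big = (1L << 53) + 1;
        System.out.println(fitsInDouble(1L));   // true
        System.out.println(fitsInDouble(big));  // false
        // Two distinct longs collapse onto the same double, so no hash
        // computed from the double can tell them apart.
        System.out.println((double) big == (double) (1L << 53)); // true
    }
}
```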

This is what happens in clojurescript now:

(js-mod (Math/floor o) 2147483647)
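
A rough Java transliteration of that expression, to show what it does to a few inputs. This is only a sketch of the one-liner quoted above, not the ClojureScript source, and the names are made up.

```java
public class CljsNumberHashSketch {
    // Floor the number, then take the remainder modulo 2^31 - 1.
    // Java's % on doubles keeps the sign of the dividend, like the JS % operator.
    public static int flooredModHash(double o) {
        return (int) (Math.floor(o) % 2147483647.0);
    }

    public static void main(String[] args) {
        System.out.println(flooredModHash(1.0));          // 1
        System.out.println(flooredModHash(1.9));          // 1, the fraction is discarded
        System.out.println(flooredModHash(2147483648.0)); // 1, wraps around the modulus
    }
}
```

So small integers hash to themselves, which matches the (hash 1) result reported for ClojureScript at the top of the thread, while fractional and large values collide.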

Peter Taoussanis

Apr 21, 2015, 3:03:50 AM
to clojur...@googlegroups.com
Thanks Herwig, Francis - appreciate the assistance!

Christian Weilbach

Apr 21, 2015, 5:54:56 AM
to clojur...@googlegroups.com

I am the author of hasch (1) for cryptographic cross-platform hashing,
which Herwig mentioned. The numeric types were a problem for me, too,
as Francis describes. I guess that all the corner cases are difficult
to catch, so I decided to treat all numbers like doubles in edn data.
Still this is fragile due to floating point arithmetic differences to
JVM integer arithmetic... JavaScript is really bad for numerical tasks
sadly and I don't see a lightweight work-around. So once you
numerically calculate the same thing on both runtimes and expect the
hashed results to be the same, there could be trouble.

Additionally there is no Character type in JavaScript, so you have to
treat them like Strings as well. Normally different types with the
same content should hash differently, except for seqs and vectors, to
my current understanding.

My implementation is just a recursive hashing scheme protocol, which
allows swapping the hash function, so if you want something more
lightweight than sha512, you could also use a cross-platform murmur
implementation with hasch maybe. Maps and Sets are hashed elementwise
and XORed afterwards, which might cause performance problems in your
case. I haven't found a cheaper way to ensure against malicious


(1) https://github.com/ghubber/hasch
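
The elementwise-then-XOR idea for maps and sets described above can be sketched in Java roughly as follows. This is not hasch's actual code: the class and method names are hypothetical, elements are assumed to be plain strings, and any type tagging or recursion the real library does is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class XorSetHashSketch {
    // SHA-512 digest of the UTF-8 bytes of a string (64 bytes).
    public static byte[] sha512(String s) {
        try {
            return MessageDigest.getInstance("SHA-512")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hash every element on its own, then XOR the digests together.
    // XOR is commutative, so iteration order does not affect the result.
    public static byte[] hashSet(Iterable<String> elements) {
        byte[] acc = new byte[64];
        for (String element : elements) {
            byte[] h = sha512(element);
            for (int i = 0; i < acc.length; i++) {
                acc[i] ^= h[i];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        byte[] a = hashSet(Arrays.asList("a", "b", "c"));
        byte[] b = hashSet(Arrays.asList("c", "a", "b"));
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

The cost is one full digest per element, which is where the performance concern mentioned above comes from.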

