hash equality between Clojure and ClojureScript

Peter Taoussanis

Apr 20, 2015, 12:03:08 AM
to clojur...@googlegroups.com
Hi there,

I'm running Clojure 1.7.0-beta1 and ClojureScript 0.0-3196.

Just noticed:
(hash 1)       ; 1, ClojureScript
(hash 1)       ; 1392991556, Clojure
(.hashCode 1)  ; 1, Clojure

i.e. numeric hashes aren't consistent between Clojure and ClojureScript.
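
For reference, the Clojure number is not Java's hashCode: since Clojure 1.6, hash on a long is mixed with Murmur3 (through the clojure.lang.Murmur3/hashLong helper in Clojure's runtime), while .hashCode is plain Long.hashCode. A small JVM check, assuming Clojure 1.6 or later:

```clojure
;; Clojure (JVM), 1.6 or later. Integer hashes are Murmur3-mixed,
;; so they no longer equal Java's Long.hashCode.
(def clj-hash  (hash 1))                           ; what = and hash maps use
(def java-hash (.hashCode (Long/valueOf 1)))       ; Java's Long.hashCode
(def murmur    (clojure.lang.Murmur3/hashLong 1))  ; the mixing step behind hash

(println clj-hash java-hash murmur)  ; prints 1392991556 1 1392991556
```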

I'm assuming that's intentional?

This got me wondering: is there an official contract somewhere describing hash behaviour similarities we _can_ safely depend on?

Keywords, strings, and collections of these seem to produce matching hashes (?), but is that dependable behaviour or subject to change?

Thanks a lot, cheers! :-)

Herwig Hochleitner

Apr 20, 2015, 9:00:48 AM
to clojurescript
2015-04-20 6:03 GMT+02:00 Peter Taoussanis <ptaou...@gmail.com>:
> This got me wondering: is there an official contract somewhere describing hash behaviour similarities we _can_ safely depend on?

I don't think there is. IIRC, ClojureScript _has_ duplicated the recent work to minimize collisions in collection hashing, and I think a ticket unifying numeric hashes _might_ well be accepted if the performance impact is not too high.
However, the only real requirement is that a hash equality in either CLJ or CLJS implies a hash equality in the other. There is value in being able to tweak the trade-off between performance, hash distribution and conformance to the host platform, so I don't expect hash algorithms to be standardized.
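
For contrast, the contract each runtime does guarantee on its own is that values which are = also have equal hash values, even across concrete types. A small Clojure (JVM) check, with values chosen only for illustration:

```clojure
;; Within one runtime, (= a b) implies (= (hash a) (hash b)).
(def v [1 2])                ; persistent vector
(def l '(1 2))               ; persistent list
(def i (Integer/valueOf 1))  ; boxed java.lang.Integer, not a Long

(println (= v l) (= (hash v) (hash l)))  ; vector and list are =, so hashes match
(println (= i 1) (= (hash i) (hash 1)))  ; Integer 1 and Long 1 are =, so hashes match
```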

When you want reproducibility between runtimes, the right way (tm) is to use cryptographic content-based hashing. See https://github.com/ghubber/hasch for an implementation that might suit your needs.
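
To make the idea concrete: a cryptographic digest is defined over bytes, so the same byte sequence yields the same digest on every platform, unlike hash. The sketch below uses SHA-256 from the JDK's MessageDigest, chosen only for illustration; sha-256-hex is a hypothetical helper name, and hasch's own hash function and encoding may differ.

```clojure
(import '(java.security MessageDigest))

(defn sha-256-hex
  "Hex-encoded SHA-256 digest of the UTF-8 bytes of s."
  [^String s]
  (let [md (MessageDigest/getInstance "SHA-256")
        bs (.digest md (.getBytes s "UTF-8"))]
    ;; JVM bytes are signed, so mask each to 0-255 before hex formatting
    (apply str (map #(format "%02x" (bit-and % 0xff)) bs))))

(println (sha-256-hex "abc"))
```

The hard part, which a library like hasch addresses, is turning arbitrary edn values into a canonical byte sequence before digesting them.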

Francis Avila

Apr 20, 2015, 2:02:23 PM
to clojur...@googlegroups.com
There's no contract, but strings, keywords, and symbols should hash the same on both platforms, and so should collections of these (vectors, lists, maps, sets).

It's difficult to hash numbers the same between Clojure and Clojurescript.

ClojureScript numbers are all doubles (because JavaScript), so in theory they should hash the same as Clojure doubles. Clojure hashes doubles using Java's Double.hashCode(), which relies on knowing the exact bits of the double. These bits are not available in JavaScript (at least not easily, or not without using typed arrays). I'm also not sure whether this particular implementation of hashCode is part of the Java spec (i.e. implemented the same way by all JDKs and JVMs).
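
For what it's worth, the Java API documentation for Double.hashCode does specify the result exactly, as (int)(v ^ (v >>> 32)) where v is Double.doubleToLongBits of the value, so conforming JDKs must agree on it. The fold is easy to spell out in Clojure; double-hash-code is a hypothetical name used only for this sketch. On the JavaScript side the same 64 bits could be read through a Float64Array or DataView over a shared buffer, which is the typed-array route mentioned above.

```clojure
;; Java's Double.hashCode spelled out: XOR the high and low 32-bit halves
;; of the IEEE-754 bit pattern, then truncate to an int.
(defn double-hash-code [^double d]
  (let [bits (Double/doubleToLongBits d)]
    (unchecked-int (bit-xor bits (unsigned-bit-shift-right bits 32)))))

(println (= (double-hash-code 1.5) (.hashCode (Double/valueOf 1.5))))
```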

Of course, in practice the same Clojure form (as read) will not hash the same in CLJ and CLJS, because CLJ uses longs most of the time (integer literals are read as longs).

You could take a hybrid approach where integers in CLJS are hashed like longs in Clojure. This works for integers up to 2^53 in magnitude (a double has a 53-bit significand), but longs larger than that are not exactly representable in ClojureScript. This is an approach I was pursuing in my murmur3 hashing implementation for CLJS:

http://dev.clojure.org/jira/browse/CLJS-754
https://github.com/favila/clojurescript/blob/murmur3/src/cljs/cljs/core.cljs#L1111
https://github.com/favila/clojurescript/blob/murmur3/src/cljs/cljs/murmur3.cljs#L61
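
To make the 2^53 limit concrete, here is a JVM-side sketch of such a hybrid rule. It is only an assumption about how the rule could look, not the code in the branch linked above, and hybrid-hash and max-safe-integer are hypothetical names: an integral double inside the exactly representable range is hashed as the equivalent long, anything else as a double.

```clojure
;; JVM sketch of the hybrid rule; a real version would live in ClojureScript.
(def max-safe-integer 9007199254740991)  ; 2^53 - 1, like JS Number.MAX_SAFE_INTEGER

(defn hybrid-hash
  "Hash an integral double in the exact range the way Clojure hashes the
  equivalent long; fall back to the double's own hash otherwise."
  [^double x]
  (if (and (== x (Math/floor x))
           (<= (Math/abs x) max-safe-integer))
    (hash (long x))
    (hash x)))

;; Past 2^53 a double cannot tell neighbouring longs apart:
(println (== (double (inc max-safe-integer))      ; 2^53
             (double (+ max-safe-integer 2))))    ; 2^53 + 1 rounds to the same double
```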


In practice you will get the same hashes most of the time if you mostly deal with integers, but any doubles, BigDecimals, or large integers will still hash differently.

This is what ClojureScript currently does to hash a number o:

(js-mod (Math/floor o) 2147483647)
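
A JVM emulation of that expression, assuming js-mod behaves like JavaScript's % operator (the result takes the sign of the dividend, as with Clojure's rem). The name cljs-number-hash is hypothetical. It shows why small integers come out as themselves and why any fractional part is simply dropped:

```clojure
;; Emulates (js-mod (Math/floor o) 2147483647) on the JVM.
;; 2147483647 is 2^31 - 1, which keeps the result within 32-bit int range.
(defn cljs-number-hash [o]
  (long (rem (Math/floor (double o)) 2147483647)))

(println (map cljs-number-hash [1 1.5 1.99 2147483648]))  ; prints (1 1 1 1)
```

With this scheme 1, 1.5 and 1.99 collide, whereas Clojure hashes the long 1 and the double 1.5 differently.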

Peter Taoussanis

Apr 21, 2015, 3:03:50 AM
to clojur...@googlegroups.com
Thanks Herwig, Francis - appreciate the assistance!

Christian Weilbach

Apr 21, 2015, 5:54:56 AM
to clojur...@googlegroups.com
Hi,

I am the author of hasch (1) for cryptographic cross-platform hashing,
which Herwig mentioned. The numeric types were a problem for me, too,
as Francis describes. I guess all the corner cases are difficult to
catch, so I decided to treat all numbers as doubles in edn data. This
is still fragile, because floating-point arithmetic can give different
results from JVM integer arithmetic... Sadly, JavaScript is really bad
for numerical tasks, and I don't see a lightweight workaround. So if
you compute the same thing numerically on both runtimes and expect the
hashed results to be the same, there could be trouble.
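
A small JVM illustration of that fragility; the numbers are chosen for illustration and do not come from the thread. Near 2^53, summing in exact long arithmetic and converting to a double once gives a different value from summing in doubles throughout (which is what JavaScript does), so normalizing both results to doubles before hashing does not make them agree:

```clojure
(def two-53 9007199254740992)  ; 2^53

;; JVM: exact long arithmetic, then a single conversion to double.
(def jvm-result (double (+ two-53 1 1)))   ; 2^53 + 2, exactly representable

;; JS-like: every intermediate result is a double, so each +1 rounds away.
(def js-like (+ (double two-53) 1 1))      ; stays at 2^53

(println jvm-result js-like (== jvm-result js-like))
```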

Additionally, there is no Character type in JavaScript, so characters
have to be treated as strings as well. Normally, different types with
the same content should hash differently, except for seqs and vectors,
to my current understanding.

My implementation is just a recursive hashing scheme (a protocol),
which allows swapping the hash function, so if you want something more
lightweight than SHA-512, you could perhaps also use a cross-platform
Murmur implementation with hasch. Maps and sets are hashed elementwise
and the results XORed, which might cause performance problems in your
case. I haven't found a cheaper way to guard against malicious
collisions.
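
A minimal sketch of a recursive scheme along the lines described above, to make it concrete. Everything here is an assumption for illustration, not hasch's actual protocol, type tags or byte encoding (it uses a plain cond instead of a protocol for brevity, and all names are hypothetical): the JDK's SHA-512 via MessageDigest, every number normalized to the bytes of its double, characters hashed as strings, one shared tag for vectors and seqs, and XORed element digests for maps and sets so that iteration order does not matter.

```clojure
(ns example.content-hash
  (:import (java.nio ByteBuffer)
           (java.security MessageDigest)))

;; Hypothetical sketch only: not hasch's API, type tags or byte encoding.

(defn- digest
  "SHA-512 over the concatenation of the given byte arrays (64 bytes out)."
  ^bytes [& byte-arrays]
  (let [md (MessageDigest/getInstance "SHA-512")]
    (doseq [^bytes b byte-arrays]
      (.update md b))
    (.digest md)))

(defn- utf8 ^bytes [^String s]
  (.getBytes s "UTF-8"))

(defn- tagged
  "Digest a payload behind a 3-letter ASCII type tag, so different types
  with the same content get different digests. No tag prefixes another."
  ^bytes [^String tag ^bytes payload]
  (digest (utf8 tag) payload))

(defn- xor-bytes
  "Element-wise XOR of two equal-length byte arrays."
  ^bytes [^bytes a ^bytes b]
  (byte-array (map bit-xor a b)))

(defn content-hash
  "Recursive, SHA-512-based digest of an edn-like value."
  ^bytes [x]
  (cond
    (nil? x) (tagged "nil" (byte-array 0))
    (instance? Boolean x) (tagged "boo" (byte-array [(if x 1 0)]))

    ;; every number is normalized to the 8 big-endian bytes of its double
    (number? x)
    (tagged "num" (.array (.putDouble (ByteBuffer/allocate 8) (double x))))

    ;; JavaScript has no Character type, so characters hash as strings
    (or (string? x) (char? x)) (tagged "str" (utf8 (str x)))
    (keyword? x) (tagged "kwd" (utf8 (str x)))
    (symbol? x)  (tagged "sym" (utf8 (str x)))

    ;; unordered collections: XOR element digests, so order is irrelevant
    (map? x)
    (tagged "map" (reduce xor-bytes (byte-array 64)
                          (map (fn [[k v]] (digest (content-hash k) (content-hash v)))
                               x)))
    (set? x)
    (tagged "set" (reduce xor-bytes (byte-array 64) (map content-hash x)))

    ;; ordered collections: vectors and seqs deliberately share one tag
    (sequential? x) (tagged "seq" (apply digest (map content-hash x)))

    :else (throw (ex-info "unsupported type" {:value x}))))
```

With this sketch {:a 1 :b [1 \x]} and {:b (1.0 "x") :a 1.0} get the same digest, while "a" and :a do not. Swapping in a cheaper hash function would only mean changing digest.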

Cheers,
Christian

(1) https://github.com/ghubber/hasch
