Expected inconsistency between set and map w/ ArrayList?

314 views
Skip to first unread message

John D. Hume

unread,
May 9, 2014, 6:06:19 PM5/9/14
to clo...@googlegroups.com
Is this behavioral change in Clojure 1.6.0 expected? Under 1.6.0, a set and a map seem to treat a java.util.ArrayList differently with respect to its equivalence to a vector.
https://gist.github.com/duelinmarkers/7c9f84cfc238e5d37a09 

user=> (-> {} (assoc [1 2] "vec") (assoc (java.util.ArrayList. [1 2]) "alist"))
{[1 2] "alist"}
user=> (-> #{} (conj [1 2]) (conj (java.util.ArrayList. [1 2])))
#{[1 2]}
user=> *clojure-version*
{:major 1, :minor 5, :incremental 1, :qualifier nil}
 
;;;;;;;;;;;;;;;;;;;;;
 
user> (-> {} (assoc [1 2] "vec") (assoc (java.util.ArrayList. [1 2]) "alist"))
{[1 2] "alist"}
user> (-> #{} (conj [1 2]) (conj (java.util.ArrayList. [1 2])))
#{[1 2] [1 2]}
user> *clojure-version*
{:major 1, :minor 6, :incremental 0, :qualifier nil}

Mike Fikes

unread,
May 9, 2014, 6:18:04 PM5/9/14
to clo...@googlegroups.com
Perhaps this is a consequence of the hashing changes in 1.6. (http://dev.clojure.org/display/design/Better+hashing)

Andy Fingerhut

unread,
May 9, 2014, 7:34:26 PM5/9/14
to clo...@googlegroups.com
Mike is correct.  This change in behavior is due to the hashing changes in Clojure 1.6.  See the comments on ticket http://dev.clojure.org/jira/browse/CLJ-1372 for some discussion of whether this is considered a bug.  It appears that perhaps the hash consistency is not promised for mutable objects, only immutable ones.

Below are a couple more REPL transcripts that illustrate the change in behavior:


user=> *clojure-version*
{:major 1, :minor 5, :incremental 1, :qualifier nil}
user=> (= [1 2] (java.util.ArrayList. [1 2]))
true
user=> (map hash [ [1 2] (java.util.ArrayList. [1 2]) ])
(994 994)
user=> (hash-map [1 2] "vec" (java.util.ArrayList. [1 2]) "alist")
{[1 2] "alist"}



user=> *clojure-version*

{:major 1, :minor 6, :incremental 0, :qualifier nil}
user=> (= [1 2] (java.util.ArrayList. [1 2]))
true
user=> (map hash [ [1 2] (java.util.ArrayList. [1 2]) ])
(156247261 994)
user=> (hash-map [1 2] "vec" (java.util.ArrayList. [1 2]) "alist")
{[1 2] "alist", [1 2] "vec"}


Andy


On Fri, May 9, 2014 at 3:18 PM, Mike Fikes <mike...@me.com> wrote:
Perhaps this is a consequence of the hashing changes in 1.6. (http://dev.clojure.org/display/design/Better+hashing)

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Hume

unread,
May 10, 2014, 10:51:24 AM5/10/14
to clo...@googlegroups.com
Thanks for the ticket pointer. I didn't find it in my initial search because it "Affects Version/s: None" :-/, and I knew this behavior was new to 1.6.

For those not up for the medium-length comment-thread read: non-Clojure-aware Java Collections Framework interface instances are not considered values,* and 'Rich says "all bets should be off for hasheq/equiv of non-values."' That's a bit down in the details, but I think the upshot is that they're not currently supported as set members or hash keys. Unfortunately they used to be supported (unintentionally, I guess) due to Clojure following Java's lead, so I have some interop-heavy code to review and test very carefully. Though the above seems to frame non-support as based on principle, it seems the driver is keeping the new hasheq stuff highly performant. I would hope a performant implementation that supported the Collections interfaces would be accepted for some future version.

*If there is one, I'd like to read a clear definition of what "values" means in Clojure. The headings on http://clojure.org/data_structures are almost an enumeration of them, except the "Collections" section talks a little about Java Collections classes, hashCode, and hasheq without making it clear that non-Clojure collections are not values. Unfortunately the current definition seems to admit only the Java primitives and near-primitives (String and primitive wrappers) and clojure-aware types.

What "not supported" looks like via maps:
user> *clojure-version*
{:major 1, :minor 6, :incremental 0, :qualifier nil}
user> (hash-map [1 2] "v" (java.util.ArrayList. [1 2]) "a")
{[1 2] "a", [1 2] "v"}
user> {[1 2] "v" (java.util.ArrayList. [1 2]) "a"}
IllegalArgumentException Duplicate key: [1 2]  clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:70)
user> (-> {} (assoc [1 2] "v") (assoc (java.util.ArrayList. [1 2]) "a"))
{[1 2] "a"}

Andy Fingerhut

unread,
May 10, 2014, 3:36:38 PM5/10/14
to clo...@googlegroups.com
On Sat, May 10, 2014 at 7:51 AM, John Hume <duelin....@gmail.com> wrote:
Thanks for the ticket pointer. I didn't find it in my initial search because it "Affects Version/s: None" :-/, and I knew this behavior was new to 1.6.

I changed Affects Version/s to Release 1.6

Whether that field is filled in for any particular ticket depends on the choice of the person creating the ticket, in most cases, although occasionally others will edit that field after that.

Andy

Mikera

unread,
May 11, 2014, 3:06:15 PM5/11/14
to clo...@googlegroups.com, cloju...@googlegroups.com
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.

Hash and equality functions generally have a very simple rule: if two things are equal they must have the same hashcode. But now we've got:

(= (java.util.ArrayList. [1 2]) [1 2])
=> true

(hash (java.util.ArrayList. [1 2]))
=> 994

(hash [1 2])
=> 156247261

I consider this a major defect. If we want to define our own equals and hashcode that's fine, but then these functions absolutely should be consistent with each other.

If it is really true that non-Clojure collections aren't considered as values and aren't expected to participate in hash/= comparisons then I'd expect an exception, not an erroneous value. Silent potential corruption of data is *much* worse than a noisy failure. But I think even that is a bad direction to go: easy Java ecosystem interop is definitely one of the main reasons why I'm using Clojure, and I'd like to see this maintained.

I can think of only two sensible ways to fix this inconsistency:
a) Roll back the hash changes. Correctness is much more important than a performance edge case. (I suspect the hash changes have hurt average case performance anyway...)
b) Make (= (java.util.ArrayList. [1 2]) [1 2]) and similar comparisons evaluate to false (on the grounds that ArrayLists aren't Clojure collections, so should not be considered equal).

Luc Prefontaine

unread,
May 11, 2014, 4:28:48 PM5/11/14
to clo...@googlegroups.com
Euh....
From the top of my head after a long working day,
aren't we comparing apples with cabbages here ?

How can you expect a mutable
structure to have any stability
over time ?

They're not values and even worse
they are alien to Clojure.

To how much contortions should
go through to deal with this issue over
all the implementations while
retaining some common semantic ?

I have a pretty rigid principle
of keeping interop stuff away from
the core of my code.

Interop is needed but at the edges.

My answer to such an issue is that you need to
convert alien structures before
getting them to participate on the
Clojure side of the fence.

But I may be old fashion...

Luc P.
> >> http://dev.clojure.org/jira/browse/CLJ-1372<http://www.google.com/url?q=http%3A%2F%2Fdev.clojure.org%2Fjira%2Fbrowse%2FCLJ-1372&sa=D&sntz=1&usg=AFQjCNG6I-rrRfAMsUZUomYEbGOJkJqWGA>for some discussion of whether this is considered a bug. It appears that
> >> perhaps the hash consistency is not promised for mutable objects, only
> >> immutable ones.
> >>
> >> Below are a couple more REPL transcripts that illustrate the change in
> >> behavior:
> >>
> >> user=> *clojure-version*
> >> {:major 1, :minor 5, :incremental 1, :qualifier nil}
> >> user=> (= [1 2] (java.util.ArrayList. [1 2]))
> >> true
> >> user=> (map hash [ [1 2] (java.util.ArrayList. [1 2]) ])
> >> (994 994)
> >> user=> (hash-map [1 2] "vec" (java.util.ArrayList. [1 2]) "alist")
> >> {[1 2] "alist"}
> >>
> >>
> >>
> >> user=> *clojure-version*
> >> {:major 1, :minor 6, :incremental 0, :qualifier nil}
> >> user=> (= [1 2] (java.util.ArrayList. [1 2]))
> >> true
> >> user=> (map hash [ [1 2] (java.util.ArrayList. [1 2]) ])
> >> (156247261 994)
> >> user=> (hash-map [1 2] "vec" (java.util.ArrayList. [1 2]) "alist")
> >> {[1 2] "alist", [1 2] "vec"}
> >>
> >>
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
--
Luc Prefontaine<lprefo...@softaddicts.ca> sent by ibisMail!

Jozef Wagner

unread,
May 11, 2014, 4:58:30 PM5/11/14
to clo...@googlegroups.com, cloju...@googlegroups.com
I'm the reporter of the mentioned ticket and I'm no longer inclined into fixing the hash. The real issue seems to lay in the fact that the = equality tries to work with host non-values. Whether the usefulness of such interoperability outweights the correctness of the =/hash is the issue which should be more discussed, with real examples. 

So for me, the option a) is not the way we should go, and I would like to add an option c), to introduce a strict variant of = operation alongside the current one, and use it in cases where the =/hash inconsistency causes real trouble.

Jozef

Alex Miller

unread,
May 11, 2014, 7:33:25 PM5/11/14
to cloju...@googlegroups.com, clo...@googlegroups.com
On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.

Reports of real problems in real codebases would change my own opinion of the severity of this issue. 
 
Hash and equality functions generally have a very simple rule: if two things are equal they must have the same hashcode. But now we've got:

(= (java.util.ArrayList. [1 2]) [1 2])
=> true

(hash (java.util.ArrayList. [1 2]))
=> 994

(hash [1 2])
=> 156247261

I consider this a major defect. If we want to define our own equals and hashcode that's fine, but then these functions absolutely should be consistent with each other.

If it is really true that non-Clojure collections aren't considered as values and aren't expected to participate in hash/= comparisons then I'd expect an exception, not an erroneous value. Silent potential corruption of data is *much* worse than a noisy failure. But I think even that is a bad direction to go: easy Java ecosystem interop is definitely one of the main reasons why I'm using Clojure, and I'd like to see this maintained.

Please be more specific about where an exception could be thrown (equals vs hash vs collection usage). 

I can think of only two sensible ways to fix this inconsistency:
a) Roll back the hash changes. Correctness is much more important than a performance edge case.

We're not doing that.
 
(I suspect the hash changes have hurt average case performance anyway...)

We have made hash calculation slightly slower to generate better hashes across the board. Timing info is at http://dev.clojure.org/display/design/Better+hashing but in general, the time increases per hash are on the order of nanoseconds for simple values and sub-millisecond for more complicated strings and collections (where hashes are cached and times are likely to be dwarfed by other things). The better hashes avoid catastrophically bad (and relatively easy to find) cases. 

It is quite challenging to predict the performance impact of the hashing changes in Strings, keywords, and symbols due to the effects of usage patterns, interning, caching, etc. I am aware of several people that have reported better performance overall in 1.6 and I don't think I've heard any reports of significantly slower performance in 1.6. I'm calling FUD on this unless you have data to report.
 
b) Make (= (java.util.ArrayList. [1 2]) [1 2]) and similar comparisons evaluate to false (on the grounds that ArrayLists aren't Clojure collections, so should not be considered equal).

Equality comparison between Clojure and Java collections is something that seems relatively common. For example, calling a Java API that returns a List and comparing the returned value with a vector. I don't want to lose that ability.

c) Fail on hasheq of Java collections

Adding this check would decrease the performance of hasheq for many other types. Given what I know right now, I'm not inclined to make that tradeoff. 

d) Fail when conj'ing Java collections into Clojure map keys or sets.

This would scope the check more but still add a performance hit for this case on all other types. Note that Clojure maps or sets of only Java collections currently work so this would take away functionality that might be useful even if in a gray area.



 

--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.

John Hume

unread,
May 12, 2014, 1:15:31 PM5/12/14
to clo...@googlegroups.com, cloju...@googlegroups.com
On Sunday, May 11, 2014 6:33:25 PM UTC-5, Alex Miller wrote:
On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.

Reports of real problems in real codebases would change my own opinion of the severity of this issue. 

Here's the real problem I feared when thinking about what was new in Clojure 1.6, which led to the testing that led to this thread:
One of our Clojure applications has two kinds of clients. (1) Legacy Java clients use a legacy remoting lib w/ generated, shared-across-client-and-server Java value types as messages. Where those have a collection, that's exposed as a java.util.List<AnotherValueType>. (2) Newer (Clojure) clients use a lighter-weight message format that's easily read into Clojure data structures. Clients interested in the same data need to be aggregated, and the commonality is contained in one java.util.List<AnotherValueType> from a message the legacy clients send. The old code used those as map keys. The new code for Clojure clients currently builds a vector of AnotherValueType and throws that into the same map, where, under Clojure 1.5.1, vectors and non-Clojure Lists can play nicely together. Under Clojure 1.6, they won't play nicely any more.

This is an easy problem to work around by making the keys from the two client types more similar, either vectors on both sides or non-Clojure java.util.Lists on both sides. Luckily the change on our side is recent, and Clojure 1.6 is brand new, so this jumped out at me as something to test well before embarking on an upgrade, and this is a (probably) subtle bug we've avoided.

What primarily worries me about the new behavior is that Clojure has always embraced the host platform, but the notion of "value" that seems to be reified in 1.6 excludes the closest things the Java platform has to collection values. While I agree a java.util.ArrayList is a bad value, mutability is not part of the java.util.List contract – it's optional – and even the best (immutable) Java collection value type one can write wouldn't be treated as a value by Clojure 1.6 unless it also implemented some clojure.lang interface(s) (which no sane Java lib would do). In the spirit of embracing the host platform, I'd like to see support for Java collections as values (and let those who mutate their value-treated Java collections deal with the nasty consequences).

From the comments on the ticket (http://dev.clojure.org/jira/browse/CLJ-1372), particularly Alex Miller's two longer comments from March 10th, I have the impression that there's no principled objection to this, just the performance concern with supporting so many types in hasheq. Does that seem right, Alex? I'd feel much better about an open issue and agreement that Clojure is in search of a high-performance implementation treating Java collections as values than about clear documentation of when and why hasheq is and will remain inconsistent w/ = and all the quirks that follow from that. (Though since we're living w/ the inconsistency, that documentation is important.)

Thanks.
-hume.

Alex Miller

unread,
May 12, 2014, 1:53:34 PM5/12/14
to cloju...@googlegroups.com, clo...@googlegroups.com
On Mon, May 12, 2014 at 12:15 PM, John Hume <duelin....@gmail.com> wrote:
On Sunday, May 11, 2014 6:33:25 PM UTC-5, Alex Miller wrote:
On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.

Reports of real problems in real codebases would change my own opinion of the severity of this issue. 

Here's the real problem I feared when thinking about what was new in Clojure 1.6, which led to the testing that led to this thread:
One of our Clojure applications has two kinds of clients. (1) Legacy Java clients use a legacy remoting lib w/ generated, shared-across-client-and-server Java value types as messages. Where those have a collection, that's exposed as a java.util.List<AnotherValueType>. (2) Newer (Clojure) clients use a lighter-weight message format that's easily read into Clojure data structures. Clients interested in the same data need to be aggregated, and the commonality is contained in one java.util.List<AnotherValueType> from a message the legacy clients send. The old code used those as map keys.

I think the general recommendation would be to not do that. :)  My recommendation in Java would be the same - using mutable objects as keys in a map (where mutability changes the hashcode) is a bug waiting to happen. 
 
The new code for Clojure clients currently builds a vector of AnotherValueType and throws that into the same map, where, under Clojure 1.5.1, vectors and non-Clojure Lists can play nicely together. Under Clojure 1.6, they won't play nicely any more.

This is an easy problem to work around by making the keys from the two client types more similar, either vectors on both sides or non-Clojure java.util.Lists on both sides.

IMO vectors are the only choice that makes sense here.
 
Luckily the change on our side is recent, and Clojure 1.6 is brand new, so this jumped out at me as something to test well before embarking on an upgrade, and this is a (probably) subtle bug we've avoided.

Depending on your point of view, you could also say that the use of a mutable Java collection as a key is the bug you had already incurred and it's just now biting you. :)  

I really am sympathetic to what you're describing, but just trying to highlight how the mutability of Java collections really is a problem when used for hashing (and this is true regardless of language). Immutability is what allows us to cache the hash code of a collection (something that cannot be done in Java or Scala due to the mutability of objects, even in immutable collections). 
 
What primarily worries me about the new behavior is that Clojure has always embraced the host platform, but the notion of "value" that seems to be reified in 1.6 excludes the closest things the Java platform has to collection values.

I really don't think anything has changed philosophically in 1.6. There is a tension between interop support for mutable objects and the immutability at the core of Clojure. We've come to a point where performance concerns are driving us a bit further in one direction. 
 
While I agree a java.util.ArrayList is a bad value, mutability is not part of the java.util.List contract – it's optional – and even the best (immutable) Java collection value type one can write wouldn't be treated as a value by Clojure 1.6 unless it also implemented some clojure.lang interface(s) (which no sane Java lib would do). In the spirit of embracing the host platform, I'd like to see support for Java collections as values (and let those who mutate their value-treated Java collections deal with the nasty consequences).

In general, Clojure treats Java collections (and arrays) as values now ... but if you do mutation, then any Clojure guarantees are off (in hashing, in STM, etc). I think that's been generally a happy compromise so far. The question is to what extent that should continue. 
 
From the comments on the ticket (http://dev.clojure.org/jira/browse/CLJ-1372), particularly Alex Miller's two longer comments from March 10th, I have the impression that there's no principled objection to this, just the performance concern with supporting so many types in hasheq. Does that seem right, Alex?

That's my feeling (not speaking for Rich) - I'd prefer to say without exception that equiv=true => hash=same. As I said above, I think treating Java collections as "look the other way" values has been mostly a happy compromise. 

It may even be that this could be supported without much of a hit. Strings, numbers, keywords, symbols, and all Clojure collections should be handled above the suggested change and that covers most common key types. Some research would need to be done on what common non-Clojure classes (hence not IHashEq implementors) fall into the final else case - java.lang.Class is one that comes to mind which is used a lot inside Clojure itself. 
 
I'd feel much better about an open issue and agreement that Clojure is in search of a high-performance implementation treating Java collections as values than about clear documentation of when and why hasheq is and will remain inconsistent w/ = and all the quirks that follow from that. (Though since we're living w/ the inconsistency, that documentation is important.)

Thanks.
-hume.

--

John D. Hume

unread,
May 12, 2014, 5:26:49 PM5/12/14
to cloju...@googlegroups.com, clo...@googlegroups.com
On Mon, May 12, 2014 at 12:53 PM, Alex Miller <al...@puredanger.com> wrote:
My recommendation in Java would be the same - using mutable objects as keys in a map (where mutability changes the hashcode) is a bug waiting to happen. 

Although I used java.util.ArrayList in the sample REPL session showing the surprising behavior, I never said anything about the Java types used in our application being mutable. That gets at my point: the "best" Java collection values, the only ones that make good hashmap keys, are immutable, but Clojure still gives them this mixed status where they're equiv but not hasheq to IPersistentCollections w/ the same contents.

[Having made that point again, I'll now admit that under the cover of java.util.List<V>, the lists in messages our app receives are ArrayLists :-), BUT they're never mutated, and we'd have the same issue if they were Guava ImmutableList<V>, which is a fine value type (so long as V is immutable).]

In general, Clojure treats Java collections (and arrays) as values now ... but if you do mutation, then any Clojure guarantees are off (in hashing, in STM, etc). I think that's been generally a happy compromise so far. The question is to what extent that should continue. 

I'm totally happy to see crazy things happen to anyone who treats a mutable collection as a value and then mutates it.

It may even be that this could be supported without much of a hit. Strings, numbers, keywords, symbols, and all Clojure collections should be handled above the suggested change and that covers most common key types. Some research would need to be done on what common non-Clojure classes (hence not IHashEq implementors) fall into the final else case - java.lang.Class is one that comes to mind which is used a lot inside Clojure itself.

I like the sound of this. One of your comments on the ticket mentioned JVM inlining issues, which scared me.

Is it a safe bet that protocol dispatch would be too slow for hasheq?

Alex Miller

unread,
May 12, 2014, 5:41:27 PM5/12/14
to cloju...@googlegroups.com, clo...@googlegroups.com
On Mon, May 12, 2014 at 4:26 PM, John D. Hume <duelin....@gmail.com> wrote:
On Mon, May 12, 2014 at 12:53 PM, Alex Miller <al...@puredanger.com> wrote:
My recommendation in Java would be the same - using mutable objects as keys in a map (where mutability changes the hashcode) is a bug waiting to happen. 

Although I used java.util.ArrayList in the sample REPL session showing the surprising behavior, I never said anything about the Java types used in our application being mutable. That gets at my point: the "best" Java collection values, the only ones that make good hashmap keys, are immutable, but Clojure still gives them this mixed status where they're equiv but not hasheq to IPersistentCollections w/ the same contents.

[Having made that point again, I'll now admit that under the cover of java.util.List<V>, the lists in messages our app receives are ArrayLists :-), BUT they're never mutated, and we'd have the same issue if they were Guava ImmutableList<V>, which is a fine value type (so long as V is immutable).]

If you don't know, then you don't know... But, point taken.
 
In general, Clojure treats Java collections (and arrays) as values now ... but if you do mutation, then any Clojure guarantees are off (in hashing, in STM, etc). I think that's been generally a happy compromise so far. The question is to what extent that should continue. 

I'm totally happy to see crazy things happen to anyone who treats a mutable collection as a value and then mutates it.

It may even be that this could be supported without much of a hit. Strings, numbers, keywords, symbols, and all Clojure collections should be handled above the suggested change and that covers most common key types. Some research would need to be done on what common non-Clojure classes (hence not IHashEq implementors) fall into the final else case - java.lang.Class is one that comes to mind which is used a lot inside Clojure itself.

I like the sound of this. One of your comments on the ticket mentioned JVM inlining issues, which scared me.

Yes, me too. :)  Needs investigation, which is why we did not try to jam it into 1.6. 

Is it a safe bet that protocol dispatch would be too slow for hasheq?

No clue - needs more investigation.  

Mikera

unread,
May 12, 2014, 10:13:53 PM5/12/14
to cloju...@googlegroups.com, clo...@googlegroups.com
On Monday, 12 May 2014 00:33:25 UTC+1, Alex Miller wrote:
On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.

Reports of real problems in real codebases would change my own opinion of the severity of this issue. 

Are you suggesting correctness issues in APIs aren't severe? This is broken according to the documentation. From the docstrings:

clojure.core/= "Same as
  Java x.equals(y) except it also works for nil, and compares
  numbers and collections in a type-independent manner."

clojure.core/hash "Returns the hash code of its argument. Note this is the hash code
  consistent with =, and thus is different than .hashCode for Integer,
  Short, Byte and Clojure collections."

This promises consistency, and also promises behaviour that works like x.equals(y) and x.hashCode() - except for a a few defined exceptions. Which means I should expect them to work on arbitrary Java objects.

As a result, all sorts of code that depends on hashes directly or indirectly now has subtle bugs. clojure.core/merge is now broken for example:

(let [a {[1 2] :a}
      b {(java.util.ArrayList. [1 2]) :a}
      m (zipmap (range 20) (range 20))]
  [(= a b)   (= (merge m a) (merge m b))])
=> [true false]

I hope everyone agrees that this isn't the kind of behaviour we want in core functions?

 
Hash and equality functions generally have a very simple rule: if two things are equal they must have the same hashcode. But now we've got:

(= (java.util.ArrayList. [1 2]) [1 2])
=> true

(hash (java.util.ArrayList. [1 2]))
=> 994

(hash [1 2])
=> 156247261

I consider this a major defect. If we want to define our own equals and hashcode that's fine, but then these functions absolutely should be consistent with each other.

If it is really true that non-Clojure collections aren't considered as values and aren't expected to participate in hash/= comparisons then I'd expect an exception, not an erroneous value. Silent potential corruption of data is *much* worse than a noisy failure. But I think even that is a bad direction to go: easy Java ecosystem interop is definitely one of the main reasons why I'm using Clojure, and I'd like to see this maintained.

Please be more specific about where an exception could be thrown (equals vs hash vs collection usage). 

I can think of only two sensible ways to fix this inconsistency:
a) Roll back the hash changes. Correctness is much more important than a performance edge case.

We're not doing that.
 
(I suspect the hash changes have hurt average case performance anyway...)

We have made hash calculation slightly slower to generate better hashes across the board. Timing info is at http://dev.clojure.org/display/design/Better+hashing but in general, the time increases per hash are on the order of nanoseconds for simple values and sub-millisecond for more complicated strings and collections (where hashes are cached and times are likely to be dwarfed by other things). The better hashes avoid catastrophically bad (and relatively easy to find) cases.  

It is quite challenging to predict the performance impact of the hashing changes in Strings, keywords, and symbols due to the effects of usage patterns, interning, caching, etc. I am aware of several people that have reported better performance overall in 1.6 and I don't think I've heard any reports of significantly slower performance in 1.6. I'm calling FUD on this unless you have data to report.

It's absolutely not FUD. I'm generally a pretty big Clojure evangelist.... and my aim here is simply to be very clear about issues so that we can improve Clojure. I hope all my comments are taken in that spirit.

The page you have linked shows that the hashing changes have slowed down various workloads (including the full Clojure compile). The Clojure micro-benchmarks posted recently also show various slowdowns related to hashing. 

For good measure (and because I care about small vector performance for numerical work / core.matrix stuff) here's my own criterium benchmark:

;; Clojure 1.6.0
(c/quick-bench 
     (dotimes [i 1000] (hash [i i])))
WARNING: Final GC required 9.3578822831726 % of runtime
WARNING: Final GC required 119.680511859946 % of runtime
Evaluation count : 1950 in 6 samples of 325 calls.
             Execution time mean : 339.879773 µs
    Execution time std-deviation : 98.446114 µs
   Execution time lower quantile : 240.545585 µs ( 2.5%)
   Execution time upper quantile : 453.154577 µs (97.5%)
                   Overhead used : 4.007792 ns

;; Clojure 1.5.1
(c/quick-bench 
     (dotimes [i 1000] (hash [i i])))
WARNING: Final GC required 50.6032996175532 % of runtime
Evaluation count : 5058 in 6 samples of 843 calls.
             Execution time mean : 185.715873 µs
    Execution time std-deviation : 16.476580 µs
   Execution time lower quantile : 165.946452 µs ( 2.5%)
   Execution time upper quantile : 207.266354 µs (97.5%)
                   Overhead used : 4.905631 ns

That's about a 1.8x slowdown, which seems quite noticeable (I conclude the cost here is from hashing changes, since the performance of vector construction is unchanged - the same benchmarks omitting the call to hash take about 57 microseconds in both Clojure 1.5.1 and 1.6.0)

So - the data I've seen so far supports the assertion that the hashing changes slow down performance in the average / typical cases.

Please Note: I'm not saying that the hashing changes are bad overall (I certainly like the guarantee of better hashes and the edge cases are indeed nasty). But let's not deny that this is a real trade-off.

 
 
b) Make (= (java.util.ArrayList. [1 2]) [1 2]) and similar comparisons evaluate to false (on the grounds that ArrayLists aren't Clojure collections, so should not be considered equal).

Equality comparison between Clojure and Java collections is something that seems relatively common. For example, calling a Java API that returns a List and comparing the returned value with a vector. I don't want to lose that ability.

Agreed. This would be a breaking change. I certainly rely on this a lot in test suites.
 

c) Fail on hasheq of Java collections

Adding this check would decrease the performance of hasheq for many other types. Given what I know right now, I'm not inclined to make that tradeoff. 

I don't think this is a good idea either. Worse than performance, this would be a pretty severe breaking change. 
 

d) Fail when conj'ing Java collections into Clojure map keys or sets.

This would scope the check more but still add a performance hit for this case on all other types. Note that Clojure maps or sets of only Java collections currently work so this would take away functionality that might be useful even if in a gray area.

Again this is a big breaking change in behaviour. I don't think this is a good idea.

Ultimately, I think the only good way to fix all this is to make hash and = consistent for all types. If we want to avoid significant breaking changes and we also want to keep the new better hash functions, then I think this implies extending the new hash calculation to relevant Java collections. 

Michał Marczyk

unread,
May 13, 2014, 12:38:54 AM5/13/14
to cloju...@googlegroups.com, clo...@googlegroups.com
I've posted a patch that makes java.util.{List,Map,Map.Entry,Set}
hashes consistent with those of appropriate Clojure collection types
on the ticket.

Performance of repeated hasheq lookups on a single PHM actually seems
to be slightly better with this patch applied. Adding the hasheqs of a
PHM, a string and a long takes slightly longer than on 1.6.0. See the
ticket for details.

The patch also changes "default" return-the-hashCode behaviour of
hasheq to match Strings. That's mostly because I originally wanted to
leave out the String case altogether, but that seems to hurt String
hashing quite badly. The original branch is up on GitHub, however (see
ticket for link).


On 13 May 2014 04:13, 'Mikera' via Clojure Dev

Mike Fikes

unread,
May 13, 2014, 8:00:50 AM5/13/14
to clo...@googlegroups.com, cloju...@googlegroups.com
To facilitate inlining, the patch calls out to a separate larger method which handles a group of cases.

+        if(o.getClass().getName().startsWith("java.util."))
+                return doalienhasheq(o);
+        return o.hashCode();

I was wondering whether an efficient improvement is possible that would support things like Guava ImmutableList.

In particular, I was wonder which "default" cases are currently handled by the return o.hashCode() above. Replacing the three lines above with 

+        return doalienhasheq(o);

would allow the patch to also handle non-java.util collection implementations, but push the "default" cases down into the bottom of that method.

Michał Marczyk

unread,
May 13, 2014, 8:20:00 AM5/13/14
to clojure, cloju...@googlegroups.com
It's not just to facilitate inlining, but also to limit the perf hit
for hashing non-collections, some of which make completely reasonable
map keys and set members. I've used the classes Alex Miller mentioned
he was interested in for benchmarking: Class, Character, Var; these
are all good examples, and only one is under Clojure's control.

I used a "map or map entry or collection" test in earlier versions of
the patch, but switched to the class name hack because the three-way
type check seems to slow down getting to the default case much more
than examining the class name. For types that actually implement
IHashEq, however, the original patch was faster, so I think HotSpot is
happy to inline either version.
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+u...@googlegroups.com.

Michał Marczyk

unread,
May 13, 2014, 8:40:42 AM5/13/14
to clojure, cloju...@googlegroups.com
Also, = defers to equals in all cases except two: (1) numeric
arguments, (2) at least one IPersistentCollection among the arguments.

Clojure collections are allowed to determine whether they are = to the
other thing or not. So, we should calculate special hashCodes for the
java.util collections Clojure collections know about.

Other collection types will just use equals for comparisons and so for
them we have to return hashCode to preserve =/hash semantics by
relying on the equals/hashCode contract -- there is nothing else we
can do. Well, strictly speaking we could apply an arbitrary function
to hashCode, so murmuring it (as with Strings) is an option; that
would come at a performance cost, however (I benchmarked a version
that did that), and I don't think there'd be much benefit -- certainly
none at all for classes using identity hash codes.

Cheers,
Michał

Michał Marczyk

unread,
May 13, 2014, 8:47:17 AM5/13/14
to clojure, cloju...@googlegroups.com
Unfortunately the Guava example means that this approach doesn't
really solve the problem at hand... Thanks for pointing it out.

The earlier version with the three-way test in place of the
"java.util." check should be semantically correct, at least.

Michał Marczyk

unread,
May 13, 2014, 9:10:50 AM5/13/14
to clojure, cloju...@googlegroups.com
New patch reinstates the three-way check, but puts it in a separate
static method. Performance seems to be just fine.

I wonder if my previous round of benchmarking with the inline
three-way check was somehow flawed... Either that or the JIT is
happier with the new arrangement for some reason. In any case I kind
of like the separate isAlien.

Phillip Lord

unread,
May 15, 2014, 10:29:31 AM5/15/14
to clo...@googlegroups.com


I am trying to dump a representation of the contents of a list to file.
I've recently changed how I generated this list and it's now lazy (not really
by design more by side-effect, if you will excuse the poor choice of words).

I was using

(spit "file" (str lst "\n"))

which worked quite nicely, but now it is failing. The problem is that I get a
file full of "clojure.lang.LazySeq@" lines. The problem comes from LazySeq
directly, as this demonstration with "range" shows.

user> (str (list 1 2 3 4 5 ))
"(1 2 3 4 5)"
user> (str (range 4))
"clojure.lang.LazySeq@e1b83"
user> (println (range 4))
(0 1 2 3)
nil


println is using prn and a multimethod to print out. In fact,
clojure.lang.LazySeq doesn't implement toString, nor does it's super class.

The best solution that I have come up with so far is to do

(str (apply list (range 4)))

I guess I can see why LazySeq doesn't implement toString by printing
everything out, but is there a better way around my problem?

Phil

Michał Marczyk

unread,
May 15, 2014, 10:59:50 AM5/15/14
to clojure
Use pr-str:

user=> (str (lazy-seq (list 1 2 3)))
"clojure.lang.LazySeq@7861"
user=> (pr-str (lazy-seq (list 1 2 3)))
"(1 2 3)"

Cheers,
Michał

Phillip Lord

unread,
May 16, 2014, 4:52:42 AM5/16/14
to clo...@googlegroups.com

Ah, that's better. Thank you!

Phil
--
Phillip Lord, Phone: +44 (0) 191 222 7827
Lecturer in Bioinformatics, Email: philli...@newcastle.ac.uk
School of Computing Science, http://homepages.cs.ncl.ac.uk/phillip.lord
Room 914 Claremont Tower, skype: russet_apples
Newcastle University, twitter: phillord
NE1 7RU
Reply all
Reply to author
Forward
0 new messages