user=> (-> {} (assoc [1 2] "vec") (assoc (java.util.ArrayList. [1 2]) "alist")){[1 2] "alist"}user=> (-> #{} (conj [1 2]) (conj (java.util.ArrayList. [1 2])))#{[1 2]}user=> *clojure-version*{:major 1, :minor 5, :incremental 1, :qualifier nil};;;;;;;;;;;;;;;;;;;;;user> (-> {} (assoc [1 2] "vec") (assoc (java.util.ArrayList. [1 2]) "alist")){[1 2] "alist"}user> (-> #{} (conj [1 2]) (conj (java.util.ArrayList. [1 2])))#{[1 2] [1 2]}user> *clojure-version*{:major 1, :minor 6, :incremental 0, :qualifier nil}
Perhaps this is a consequence of the hashing changes in 1.6. (http://dev.clojure.org/display/design/Better+hashing)
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Thanks for the ticket pointer. I didn't find it in my initial search because it "Affects Version/s: None" :-/, and I knew this behavior was new to 1.6.
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.
Hash and equality functions generally have a very simple rule: if two things are equal they must have the same hashcode. But now we've got:(= (java.util.ArrayList. [1 2]) [1 2])=> true(hash (java.util.ArrayList. [1 2]))=> 994(hash [1 2])=> 156247261I consider this a major defect. If we want to define our own equals and hashcode that's fine, but then these functions absolutely should be consistent with each other.If it is really true that non-Clojure collections aren't considered as values and aren't expected to participate in hash/= comparisons then I'd expect an exception, not an erroneous value. Silent potential corruption of data is *much* worse than a noisy failure. But I think even that is a bad direction to go: easy Java ecosystem interop is definitely one of the main reasons why I'm using Clojure, and I'd like to see this maintained.
I can think of only two sensible ways to fix this inconsistency:a) Roll back the hash changes. Correctness is much more important than a performance edge case.
(I suspect the hash changes have hurt average case performance anyway...)
b) Make (= (java.util.ArrayList. [1 2]) [1 2]) and similar comparisons evaluate to false (on the grounds that ArrayLists aren't Clojure collections, so should not be considered equal).
--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev...@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.Reports of real problems in real codebases would change my own opinion of the severity of this issue.
On Sunday, May 11, 2014 6:33:25 PM UTC-5, Alex Miller wrote:On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.Reports of real problems in real codebases would change my own opinion of the severity of this issue.Here's the real problem I feared when thinking about what was new in Clojure 1.6, which led to the testing that led to this thread:One of our Clojure applications has two kinds of clients. (1) Legacy Java clients use a legacy remoting lib w/ generated, shared-across-client-and-server Java value types as messages. Where those have a collection, that's exposed as a java.util.List<AnotherValueType>. (2) Newer (Clojure) clients use a lighter-weight message format that's easily read into Clojure data structures. Clients interested in the same data need to be aggregated, and the commonality is contained in one java.util.List<AnotherValueType> from a message the legacy clients send. The old code used those as map keys.
The new code for Clojure clients currently builds a vector of AnotherValueType and throws that into the same map, where, under Clojure 1.5.1, vectors and non-Clojure Lists can play nicely together. Under Clojure 1.6, they won't play nicely any more.This is an easy problem to work around by making the keys from the two client types more similar, either vectors on both sides or non-Clojure java.util.Lists on both sides.
Luckily the change on our side is recent, and Clojure 1.6 is brand new, so this jumped out at me as something to test well before embarking on an upgrade, and this is a (probably) subtle bug we've avoided.
What primarily worries me about the new behavior is that Clojure has always embraced the host platform, but the notion of "value" that seems to be reified in 1.6 excludes the closest things the Java platform has to collection values.
While I agree a java.util.ArrayList is a bad value, mutability is not part of the java.util.List contract – it's optional – and even the best (immutable) Java collection value type one can write wouldn't be treated as a value by Clojure 1.6 unless it also implemented some clojure.lang interface(s) (which no sane Java lib would do). In the spirit of embracing the host platform, I'd like to see support for Java collections as values (and let those who mutate their value-treated Java collections deal with the nasty consequences).
From the comments on the ticket (http://dev.clojure.org/jira/browse/CLJ-1372), particularly Alex Miller's two longer comments from March 10th, I have the impression that there's no principled objection to this, just the performance concern with supporting so many types in hasheq. Does that seem right, Alex?
I'd feel much better about an open issue and agreement that Clojure is in search of a high-performance implementation treating Java collections as values than about clear documentation of when and why hasheq is and will remain inconsistent w/ = and all the quirks that follow from that. (Though since we're living w/ the inconsistency, that documentation is important.)
Thanks.-hume.
--
My recommendation in Java would be the same - using mutable objects as keys in a map (where mutability changes the hashcode) is a bug waiting to happen.
In general, Clojure treats Java collections (and arrays) as values now ... but if you do mutation, then any Clojure guarantees are off (in hashing, in STM, etc). I think that's been generally a happy compromise so far. The question is to what extent that should continue.
It may even be that this could be supported without much of a hit. Strings, numbers, keywords, symbols, and all Clojure collections should be handled above the suggested change and that covers most common key types. Some research would need to be done on what common non-Clojure classes (hence not IHashEq implementors) fall into the final else case - java.lang.Class is one that comes to mind which is used a lot inside Clojure itself.
On Mon, May 12, 2014 at 12:53 PM, Alex Miller <al...@puredanger.com> wrote:
My recommendation in Java would be the same - using mutable objects as keys in a map (where mutability changes the hashcode) is a bug waiting to happen.Although I used java.util.ArrayList in the sample REPL session showing the surprising behavior, I never said anything about the Java types used in our application being mutable. That gets at my point: the "best" Java collection values, the only ones that make good hashmap keys, are immutable, but Clojure still gives them this mixed status where they're equiv but not hasheq to IPersistentCollections w/ the same contents.
[Having made that point again, I'll now admit that under the cover of java.util.List<V>, the lists in messages our app receives are ArrayLists :-), BUT they're never mutated, and we'd have the same issue if they were Guava ImmutableList<V>, which is a fine value type (so long as V is immutable).]
In general, Clojure treats Java collections (and arrays) as values now ... but if you do mutation, then any Clojure guarantees are off (in hashing, in STM, etc). I think that's been generally a happy compromise so far. The question is to what extent that should continue.I'm totally happy to see crazy things happen to anyone who treats a mutable collection as a value and then mutates it.It may even be that this could be supported without much of a hit. Strings, numbers, keywords, symbols, and all Clojure collections should be handled above the suggested change and that covers most common key types. Some research would need to be done on what common non-Clojure classes (hence not IHashEq implementors) fall into the final else case - java.lang.Class is one that comes to mind which is used a lot inside Clojure itself.
I like the sound of this. One of your comments on the ticket mentioned JVM inlining issues, which scared me.
Is it a safe bet that protocol dispatch would be too slow for hasheq?
On Sun, May 11, 2014 at 2:06 PM, Mikera <mike.r.an...@gmail.com> wrote:
OK..... this thread is a bit worrying. If I understand correctly, it means that we've now got inconsistent hash and equals functions. I suspect this hasn't bitten many people yet simply because it is unusual to mix Java and Clojure collections as keys in the same structure, but it seems like a really nasty issue.Reports of real problems in real codebases would change my own opinion of the severity of this issue.
Hash and equality functions generally have a very simple rule: if two things are equal they must have the same hashcode. But now we've got:(= (java.util.ArrayList. [1 2]) [1 2])=> true(hash (java.util.ArrayList. [1 2]))=> 994(hash [1 2])=> 156247261I consider this a major defect. If we want to define our own equals and hashcode that's fine, but then these functions absolutely should be consistent with each other.If it is really true that non-Clojure collections aren't considered as values and aren't expected to participate in hash/= comparisons then I'd expect an exception, not an erroneous value. Silent potential corruption of data is *much* worse than a noisy failure. But I think even that is a bad direction to go: easy Java ecosystem interop is definitely one of the main reasons why I'm using Clojure, and I'd like to see this maintained.Please be more specific about where an exception could be thrown (equals vs hash vs collection usage).I can think of only two sensible ways to fix this inconsistency:a) Roll back the hash changes. Correctness is much more important than a performance edge case.
We're not doing that.(I suspect the hash changes have hurt average case performance anyway...)We have made hash calculation slightly slower to generate better hashes across the board. Timing info is at http://dev.clojure.org/display/design/Better+hashing but in general, the time increases per hash are on the order of nanoseconds for simple values and sub-millisecond for more complicated strings and collections (where hashes are cached and times are likely to be dwarfed by other things). The better hashes avoid catastrophically bad (and relatively easy to find) cases.
It is quite challenging to predict the performance impact of the hashing changes in Strings, keywords, and symbols due to the effects of usage patterns, interning, caching, etc. I am aware of several people that have reported better performance overall in 1.6 and I don't think I've heard any reports of significantly slower performance in 1.6. I'm calling FUD on this unless you have data to report.
b) Make (= (java.util.ArrayList. [1 2]) [1 2]) and similar comparisons evaluate to false (on the grounds that ArrayLists aren't Clojure collections, so should not be considered equal).Equality comparison between Clojure and Java collections is something that seems relatively common. For example, calling a Java API that returns a List and comparing the returned value with a vector. I don't want to lose that ability.
c) Fail on hasheq of Java collectionsAdding this check would decrease the performance of hasheq for many other types. Given what I know right now, I'm not inclined to make that tradeoff.
d) Fail when conj'ing Java collections into Clojure map keys or sets.This would scope the check more but still add a performance hit for this case on all other types. Note that Clojure maps or sets of only Java collections currently work so this would take away functionality that might be useful even if in a gray area.
+ if(o.getClass().getName().startsWith("java.util.")) + return doalienhasheq(o); + return o.hashCode();
I was wondering whether an efficient improvement is possible that would support things like Guava ImmutableList.
In particular, I was wonder which "default" cases are currently handled by the return o.hashCode() above. Replacing the three lines above with
+ return doalienhasheq(o);
would allow the patch to also handle non-java.util collection implementations, but push the "default" cases down into the bottom of that method.