Bugs in contains? (?)

8 views
Skip to first unread message

Jason

unread,
Jan 3, 2009, 4:32:08 AM1/3/09
to Clojure
Is this desired behavior for contains?

1:1 user=> (= [1 2] '(1 2))
true
1:2 user=> (contains? (hash-set [1 2]) '(1 2))
false
1:3 user=> (contains? (sorted-set [1 2]) '(1 2))
java.lang.ClassCastException: clojure.lang.PersistentList (repl-1:2)
1:4 user=> (contains? (hash-map [1 2] nil) '(1 2))
false
1:5 user=> (contains? (array-map [1 2] nil) '(1 2))
true

As far as I can figure all five of these should return true (the
exception is particularly weird). Am I missing something here?

Thanks, Jason

MikeM

unread,
Jan 3, 2009, 3:19:54 PM1/3/09
to Clojure
I believe the exception is expected here due to the implementation of
sorted sets. See this thread for explanation of a similar exception
thrown by a sorted map:
http://groups.google.com/group/clojure/browse_frm/thread/34def2ec6342f40c/c87ee712482110ee#c87ee712482110ee

Martin Wood-Mitrovski

unread,
Jan 3, 2009, 5:40:20 AM1/3/09
to clo...@googlegroups.com

actually i think what i said wasnt accurate, its not checking identity
because you can do this :


=> (def x [1 2])
=> (def z (hash-set [1 2]))

=> (contains? z x)
true

=> (contains? z [1 2])
true

=> (== x [1 2])
false


That means the check for contains isnt identity, but rather the same
type and same value.

You could put those three things in some kind of order of strictness :

== : the same thing
contains? : has something of the same type and value
= : the same value


im sure someone else can explain in more exact terms what happens :)

Martin Wood-Mitrovski

unread,
Jan 3, 2009, 5:35:00 AM1/3/09
to clo...@googlegroups.com

Im just starting with clojure but to me it looks right.

In the first case the compare is true because its comparing them by
value, whereas in all the other cases it fails because you are asking
if a structure with a vector in it contains a list, which isnt true
because its not comparing them by value but by identity.

thats how i would understand it.

So if you change 1 to :

(== [1 2] '(1 2))

then you get false for that as well.

Martin Wood-Mitrovski

unread,
Jan 3, 2009, 5:40:20 AM1/3/09
to clo...@googlegroups.com

Martin Wood-Mitrovski

unread,
Jan 3, 2009, 5:35:00 AM1/3/09
to clo...@googlegroups.com
On Sat, 3 Jan 2009 01:32:08 -0800 (PST)
Jason <jaw...@berkeley.edu> wrote:

Im just starting with clojure but to me it looks right.

Jason

unread,
Jan 3, 2009, 4:24:19 PM1/3/09
to Clojure
> I believe the exception is expected here due to the implementation of
> sorted sets. See this thread for explanation of a similar exception
> thrown by a sorted map:http://groups.google.com/group/clojure/browse_frm/thread/34def2ec6342...

OK, that part makes sense to me, thanks!

-Jason

Jason

unread,
Jan 3, 2009, 4:30:30 PM1/3/09
to Clojure

> You could put those three things in some kind of order of strictness :
>
> ==        : the same thing
> contains? : has something of the same type and value
> =         : the same value
>

Thanks for your posts. I think I understand what happens now, but I
still maintain that it's a bug. In particular, the Java API says: "If
two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce
the same integer result." This contract is clearly violated by
the .hashCode and .equals methods for Clojure vectors and lists:

user> (.equals [1 2] '(1 2))
true
user> (list (.hashCode [1 2]) (.hashCode '(1 2)))
(994 -1919631597)

Cheers, Jason

Andrew Baine

unread,
Jan 3, 2009, 4:40:57 PM1/3/09
to clo...@googlegroups.com


On Sat, Jan 3, 2009 at 1:30 PM, Jason <jaw...@berkeley.edu> wrote:
Thanks for your posts.  I think I understand what happens now, but I
still maintain that it's a bug.  In particular, the Java API says: "If
two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce
the same integer result."  This contract is clearly violated by
the .hashCode and .equals methods for Clojure vectors and lists:

user> (.equals [1 2] '(1 2))
true
user> (list (.hashCode [1 2]) (.hashCode '(1 2)))
(994 -1919631597)

Cheers, Jason


I agree.  This does not seem right:

user> (= [1 2] '(1 2))
true
user> (map #(contains? (hash-set [1 2]) %) '((1 2) [1 2])) 
(false true)

Meikel Brandmeyer

unread,
Jan 3, 2009, 5:16:37 PM1/3/09
to clo...@googlegroups.com
Hi,

Am 03.01.2009 um 11:40 schrieb Martin Wood-Mitrovski:

> => (== x [1 2])
> false

== is for numbers.

1:2 user=> (def x [1 2])
#'user/x
1:3 user=> (= x [1 2])
true

The checks for = are by value not identity. Identity
can be checked with identical?, but this is rarely
needed.

Sincerely
Meikel

Christian Vest Hansen

unread,
Jan 3, 2009, 5:48:27 PM1/3/09
to clo...@googlegroups.com
On Sat, Jan 3, 2009 at 10:30 PM, Jason <jaw...@berkeley.edu> wrote:
>
>
>> You could put those three things in some kind of order of strictness :
>>
>> == : the same thing

== is only for numeric types. The Java == operator is called
identical? in Clojure.

>> contains? : has something of the same type and value
>> = : the same value
>>
>
> Thanks for your posts. I think I understand what happens now, but I
> still maintain that it's a bug. In particular, the Java API says: "If
> two objects are equal according to the equals(Object) method, then
> calling the hashCode method on each of the two objects must produce
> the same integer result." This contract is clearly violated by
> the .hashCode and .equals methods for Clojure vectors and lists:
>
> user> (.equals [1 2] '(1 2))
> true
> user> (list (.hashCode [1 2]) (.hashCode '(1 2)))
> (994 -1919631597)
>
> Cheers, Jason
> >
>



--
Venlig hilsen / Kind regards,
Christian Vest Hansen.

Chouser

unread,
Jan 3, 2009, 5:50:00 PM1/3/09
to clo...@googlegroups.com
On Sat, Jan 3, 2009 at 5:35 AM, Martin Wood-Mitrovski
<marv...@gmail.com> wrote:
>
> On Sat, 3 Jan 2009 01:32:08 -0800 (PST) Jason <jaw...@berkeley.edu> wrote:
>
>>
>> Is this desired behavior for contains?
>>
>> 1:1 user=> (= [1 2] '(1 2))
>> true
>> 1:2 user=> (contains? (hash-set [1 2]) '(1 2))
>> false
>> 1:3 user=> (contains? (sorted-set [1 2]) '(1 2))
>> java.lang.ClassCastException: clojure.lang.PersistentList (repl-1:2)
>> 1:4 user=> (contains? (hash-map [1 2] nil) '(1 2))
>> false
>> 1:5 user=> (contains? (array-map [1 2] nil) '(1 2))
>> true
>>
>> As far as I can figure all five of these should return true (the
>> exception is particularly weird). Am I missing something here?
>
> Im just starting with clojure but to me it looks right.
>
> In the first case the compare is true because its comparing them by
> value, whereas in all the other cases it fails because you are asking
> if a structure with a vector in it contains a list, which isnt true
> because its not comparing them by value but by identity.

Neither == nor hashes require true identity.

(def v1 [1 2])
(def v2 [1 2])
(def l1 '(1 2))
(def l2 '(1 2))

You can do what I believe are simple pointer comparisons (Java ==)
with 'identical?':
(identical? v1 v1) ==> true
(identical? l1 l1) ==> true
(identical? v1 v2) ==> false
(identical? l1 l2) ==> false

If hashes used identity, then you wouldn't be able to use v1 to look
up a hash with a key of v2, but you can:

(get (hash-map v1 :found) v2) ==> :found
(get (hash-map l1 :found) l2) ==> :found

But as noted originally, hashes are a bit more strict than Clojure =:

(= v1 l1) ==> true
(get (hash-map v1 :found) l1) ==> nil

You can see why by calling the 'hash' function directly:

(hash v1) ==> 994
(hash l1) ==> -1919631597

The same applies for hash-sets. ...but not for array-maps or
sorted-maps. Others have already explained the sorted-map exception,
but it may be worth noting again as the original post demonstrated
that array-maps appear to use =, ignoring the results of 'hash'

(get (array-map v1 :found) l1) ==> :found

So you can list things that relate to equality in order from most- to
least-strict as: identical?, hash, =

The == function doesn't fit on the scale because for numbers
it seems to act like =, but for non-numbers it just returns false:

(== v1 v1) ==> false
(identical? 1 1.0) ==> false
(identical? 1 1M) ==> false
(== 1 1M 1.0) ==> true
(= 1 1M 1.0) ==> true

The value of == is that it's much faster than =, and responds well to
working with primitive numbers:

(time (dotimes [_ 123456789] (= 1 1.0))) ==> 1350.525414 msecs
(time (dotimes [_ 123456789] (== 1 1.0))) ==> 18.603425 msecs
(time (dotimes [_ 123456789] (== (int 1) (float 1.0)))) ==> 10.348273 msecs

--Chouser

Stephen C. Gilardi

unread,
Jan 3, 2009, 5:51:17 PM1/3/09
to clo...@googlegroups.com

On Jan 3, 2009, at 5:16 PM, Meikel Brandmeyer wrote:

== is for numbers.

That's correct. Now that "=" also handles numbers in an type-independent way, do well still need "==" in Clojure?

Is it primarily a higher performance equality check when you know you're dealing with numbers?

--Steve

Christian Vest Hansen

unread,
Jan 3, 2009, 5:56:55 PM1/3/09
to clo...@googlegroups.com

Yes. Using == instead of = when the arguments are guaranteed to be
numbers can give you quite a bit of a performance boots if your loops
are tight (according to a blog post I read the other day :p )

> --Steve

Stephen C. Gilardi

unread,
Jan 3, 2009, 6:14:08 PM1/3/09
to clo...@googlegroups.com
On Jan 3, 2009, at 5:50 PM, Chouser wrote:

But as noted originally, hashes are a bit more strict than Clojure =:

(= v1 l1)  ==> true
(get (hash-map v1 :found) l1)  ==> nil

I see this as a bug. For clojure data structures, = maps to .equals and Java defines rules for the relationship between .equals and .hashCode that appear to be violated by the current Clojure implementation.

The value of == is that it's much faster than =, and responds well to
working with primitive numbers:

(time (dotimes [_ 123456789] (=  1 1.0)))  ==> 1350.525414 msecs
(time (dotimes [_ 123456789] (== 1 1.0)))  ==> 18.603425 msecs
(time (dotimes [_ 123456789] (== (int 1) (float 1.0))))  ==> 10.348273 msecs

That's quite a large performance advantage, and "==" does fit in with others like ">=" in being numeric only, 2 characters long, and made up of non-alphabetic characters.

I really like the story on equality (as implemented by "=") that Clojure provides (value equality of immutable objects, with almost no emphasis on object identity) as compared to the variety of different kinds of equality in Common Lisp.

I think "==" entered Clojure's vocabulary at a time when "=" did not compare numbers in a type independent way. Now that it does, and now that we can use "=" in almost all cases, I wonder if it would be an improvement to rename "==" to "num=" to express more clearly its primary use as a high performance version of "=" for numbers only.

--Steve

Jason Wolfe

unread,
Jan 3, 2009, 6:16:21 PM1/3/09
to Clojure
On Jan 3, 2:50 pm, Chouser <chou...@gmail.com> wrote:
> (def v1 [1 2])
> (def l1 '(1 2))
...
> But as noted originally, hashes are a bit more strict than Clojure =:
>
> (= v1 l1) ==> true
> (get (hash-map v1 :found) l1) ==> nil
>
> You can see why by calling the 'hash' function directly:
>
> (hash v1) ==> 994
> (hash l1) ==> -1919631597

OK, I understand all of that. I still claim this is a bug. Clojure
hashes are allowed to be more strict than Clojure =, I guess, although
I would very much dislike this state of affairs.

However, it is a Java *contract* [1] that
(.equals x y) ==> (== (.hashCode x) (.hashCode y)) and currently
Clojure data structures violate this contract:

user> (doseq [s ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
2])]]
(print "\n" (.hashCode s))
(doseq [t ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
2])]]
(print "\t" (.equals s t))))

-1919631597 true true true true false
-1919631597 true true true true false
994 true true true true true
-1919631597 true true true true false
994 false false true false true

IMHO, Clojure seqs/lists should use the java.util.List hash and
equality rules [2] so that the above matrix contains only "994" and
"true".

If the hash codes are left the same, the only other consistent option
is to change the definition of .equals so we get

-1919631597 true true false true false
-1919631597 true true false true false
994 false false true false true
-1919631597 true true false true false
994 false false true false true

If it's really by design to violate the Java .hashCode contract, I'd
be very interested in hearing why.

Cheers,
Jason


[1] http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#hashCode()
[2] http://java.sun.com/j2se/1.4.2/docs/api/java/util/List.html#hashCode()

Jason Wolfe

unread,
Jan 3, 2009, 7:11:22 PM1/3/09
to Clojure
> However, it is a Java *contract* [1] that
> (.equals x y) ==> (== (.hashCode x) (.hashCode y)) and currently
> Clojure data structures violate this contract:

> [1]http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html#hashCode()

And, while on this subject, it's also worth pointing out that this
contract is also violated by Clojure data structures if you mutate
things behind their backs:

user> (let [xs (take 2 (repeatedly #(make-array Object 1)))
init-seqs (map seq xs)]
(doseq [x init-seqs] (hash x))
(aset (first xs) 0 10)
(let [all-seqs (cons (seq (first xs)) init-seqs)]
(doseq [s all-seqs]
(print "\n" (.hashCode s))
(doseq [t all-seqs]
(print "\t" (.equals s t))))))

-1640531517 true true false
-1640531527 true true false
-1640531527 false false true

As an fun fact, commenting out the (on the surface, purely functional)
doseq line
actually eliminates the problem, since hash values are presumably
computed on first request and then cached.

I understand why this is done and don't see it as much of a problem,
just something that users should be aware of. Perhaps one of the
pages on the website (Java interop / data structures?) should mention
this?

Cheers, Jason

Jason Wolfe

unread,
Jan 15, 2009, 1:44:23 PM1/15/09
to Clojure
Update: Rich just fixed this in <a href="http://code.google.com/p/
clojure/issues/detail?id=37&colspec=ID%20Type%20Status%20Priority
%20Reporter%20Owner%20Summary">svn 1215 </a>.

-Jason

Jason Wolfe

unread,
Jan 15, 2009, 1:47:10 PM1/15/09
to Clojure
Er, oops, forgot you can't HTML here.

Anyway, the upshot is that now

user=> (import '(java.util ArrayList))
nil
(doseq [s ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
2])]]
(print "\n" (.hashCode s))
(doseq [t ['(1 2) (seq '(1 2)) [1 2] (seq [1 2]) (ArrayList. [1
2])]]
(print "\t" (.equals s t))))

994 true true true true true
994 true true true true true
994 true true true true true
994 true true true true true
994 true true true true true

and

user=> (map #(contains? (hash-set [1 2]) %) '((1 2) [1 2]))
(true true)

Woohoo, thanks Rich!

-Jason
Reply all
Reply to author
Forward
0 new messages