Improving equality (==)


David James

Oct 19, 2009, 11:14:59 AM
to MongoMapper
This is how I would expect an equality test to work in MongoMapper:

# Given Document, a MongoMapper Document with a :title key
@doc = Document.new
@doc_copy = @doc.dup
@doc.title = "changed"
assert_not_equal @doc_copy, @doc

My rationale: @doc and @doc_copy should be unequal because not all of
their key values have the same value. (Also of note: this would match
the styles in ActiveRecord and DataMapper.)

To double check my thinking, I did some research. Here is a great
summary of identity and equality in Ruby:
http://kentreis.wordpress.com/2007/02/08/identity-and-equality-in-ruby-and-smalltalk/

Currently, the MongoMapper == implementation looks like this:

  def ==(other)
    other.is_a?(self.class) && id == other.id
  end

I notice two things wrong with this:
1. it doesn't check the underlying key values
2. it does a type check. equality in Ruby using == should not test
type or class.
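
To make point 1 concrete, here is a minimal sketch using a hypothetical stand-in class (`FakeDocument` is made up for illustration and only copies the shape of the == quoted above; it is not MongoMapper's code). It assumes an unsaved document has no id yet, so both ids are nil and the changed title is never consulted:

```ruby
# Hypothetical stand-in for a document class; not MongoMapper's real code.
class FakeDocument
  attr_accessor :id, :title

  # Same shape as the == quoted above: class check plus id comparison.
  def ==(other)
    other.is_a?(self.class) && id == other.id
  end
end

doc = FakeDocument.new     # unsaved, so id is nil (an assumption of this sketch)
doc_copy = doc.dup
doc.title = "changed"

same = (doc_copy == doc)   # true: nil == nil, and title is never compared
```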

I'm going to work on a patch so that MongoMapper's == checks the
underlying key values.

-David

David James

Oct 19, 2009, 11:20:05 AM
to MongoMapper
One more thing. There is a time and place for checking type, but it
isn't ==. It is "eql?". So I plan on implementing eql? as well. From
the link I showed earlier: eql? is "True if the receiver and the
argument have both the same type and equal values."
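
Ruby's core numeric classes show the distinction. A small illustration in plain Ruby (nothing here is MongoMapper-specific, and the variable names are only for the example):

```ruby
# == compares by value and tolerates different numeric classes.
loose = (1 == 1.0)

# eql? additionally requires both sides to be the same class.
strict = 1.eql?(1.0)
same_type = 1.eql?(1)

# equal? is identity: true only for the very same object.
a = String.new("abc")
b = String.new("abc")
identical = a.equal?(b)
value_equal = (a == b)
```

Here `loose`, `same_type` and `value_equal` are true, while `strict` and `identical` are false.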

David James

Oct 19, 2009, 11:51:58 AM
to MongoMapper
Regarding the built-in '_id' property, I'm going to argue it shouldn't
be considered when checking equality with ==.

I consider the MongoMapper 'id' field to be analogous to Ruby's
#object_id. Ruby's == doesn't care if object_id is different. By
analogy, MongoMapper's == shouldn't care if '_id' is different.

To explain the Ruby example in detail:
s1 = "abc"
s2 = "abc"
assert_equal s1, s2
assert_not_equal s1.object_id, s2.object_id

John Nunemaker

Oct 19, 2009, 11:56:48 AM
to mongo...@googlegroups.com
My initial reaction is I don't agree at all. I guess I'll have to see
what you come up with before I can really make up my mind.

David James

Oct 19, 2009, 12:16:15 PM
to MongoMapper
Do you think the blog post I referenced accurately conveyed the "Ruby
way" of doing equality and identity?

David James

Oct 19, 2009, 12:30:47 PM
to MongoMapper

Pol Llovet

Oct 19, 2009, 2:03:45 PM
to mongo...@googlegroups.com
Just to add two cents, but I do agree with David on this one.

- pol


Roy Wright

Oct 19, 2009, 2:25:23 PM
to mongo...@googlegroups.com
I'm leaning towards agreeing with David too. In my case I hit an
issue where I had two queries and then discovered that #uniq did not
work on the combined result arrays. Here's a (very contrived) rspec
showing the issue:

http://gist.github.com/213584
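
For readers without the gist at hand, a sketch of the general mechanism (the classes `EqOnly` and `EqlAndHash` are hypothetical and are not taken from the gist): Array#uniq compares elements through #hash and #eql?, not ==, so a class that defines only == keeps its duplicates.

```ruby
# Hypothetical record that defines only ==.
class EqOnly
  attr_reader :id

  def initialize(id)
    @id = id
  end

  def ==(other)
    other.is_a?(self.class) && id == other.id
  end
end

# Same record, with eql? and hash kept consistent with ==.
class EqlAndHash < EqOnly
  def eql?(other)
    self == other
  end

  def hash
    id.hash
  end
end

# Two separately loaded objects that represent the same record.
eq_only_size = [EqOnly.new(1), EqOnly.new(1)].uniq.size
eql_and_hash_size = [EqlAndHash.new(1), EqlAndHash.new(1)].uniq.size
```

With only == defined, `eq_only_size` stays at 2 because the default #hash and #eql? are identity-based; with the consistent pair, `eql_and_hash_size` is 1.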

HTH,
Roy

Michael Dirolf

Oct 19, 2009, 2:48:17 PM
to mongo...@googlegroups.com
I think I am with John on this. _id in general can be any type, and
there are many cases (maybe not with MongoMapper, but with schemas in
MongoDB in general) where users use _id to store information that is
key to the data being represented (e.g., usernames in a social network).
If MongoMapper is to play nice in the MongoDB ecosystem then I think
that _id should be taken into account when comparing documents for
equality.

- Mike

David James

Oct 19, 2009, 3:57:53 PM
to MongoMapper
To summarize (and perhaps clarify and restate) my points:
A. unsaved documents (no _id):
1. == should depend only on key values (not type)
2. eql? should depend on type and key values
B. saved documents (with _id):
1. == should depend only on key values (not type)
2. eql? should depend on type and key values
* I think ignoring the value of '_id' makes the most sense
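
One way to read the summary above as code. This is only a sketch under stated assumptions: `SketchDoc`, `OtherDoc` and `comparable_attributes` are hypothetical names, the keys live in a plain Hash, and none of this is the proposed patch or MongoMapper's API.

```ruby
class SketchDoc
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes
  end

  # Key values with the machine-generated "_id" left out (the * point).
  def comparable_attributes
    attributes.reject { |key, _value| key == "_id" }
  end

  # ==: key values only, no class check.
  def ==(other)
    other.respond_to?(:comparable_attributes) &&
      comparable_attributes == other.comparable_attributes
  end

  # eql?: same class and same key values.
  def eql?(other)
    other.class == self.class && self == other
  end

  # Kept consistent with eql? so Hash keys and Array#uniq behave.
  def hash
    [self.class, comparable_attributes].hash
  end
end

class OtherDoc < SketchDoc; end

a = SketchDoc.new("_id" => 1, "title" => "x")
b = SketchDoc.new("_id" => 2, "title" => "x")
c = OtherDoc.new("title" => "x")
d = SketchDoc.new("title" => "changed")
```

Under these definitions `a == b` and `a == c` hold, `a.eql?(c)` does not (different class), and `a == d` does not (different title).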

To elaborate on the (*) above... It is pretty clear to me that when
_id is just an arbitrary, machine-generated identifier, it should not
be part of the comparison -- because if it was, only object references
to the same database document would be == or eql?.

In the case Mike mentioned, when _id is used to store user-generated
information, I could understand why _id could be used as a basis for
differentiation. This situation (a special case in my opinion) could
be solved easily enough by overriding ==. It would be clean and easy
to do. On the other hand, if we decide the opposite (to make _id
matter for ==) then overriding would not be as clean.

Daniel DeLeo

Oct 19, 2009, 5:51:07 PM
to mongo...@googlegroups.com
My two cents:
I'm unsure about some of these arguments. From the blog post itself:

It’s also worth noting that if you override eql? or ==, you’re expected to check to make sure that the objects have the same type before you start comparing any details. This is the common pattern in Smalltalk, too. Most Smalltalk implementations of #= first check in some way to see that the two objects are the same kind of thing and then proceed to compare relevant details.

Also, I often find that eql? is simply an alias for == and too unreliable across different versions of Ruby to be of much practical use. For example, in 1.8.6 Hash#eql? is an alias for equal?, but in 1.8.7, it is an alias for ==. 

I don't know if this muddles the discussion or illuminates it, but it turns out that Hash#==, when comparing an object that isn't a hash, a) immediately returns false if the object has no #to_hash method and b) if the object does have a #to_hash method, delegates the decision to the other object (i.e., hash==(like_a_hash) ends up calling like_a_hash==hash). I chose Hash as the example since it's the most similar core datatype to a MM Document.
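
The delegation can be observed with a small sketch (`HashLike` is a hypothetical class written for this illustration; the call counter exists only to show that Hash#== hands the decision over):

```ruby
# Hypothetical hash-like wrapper, used only to observe Hash#== delegation.
class HashLike
  attr_reader :calls

  def initialize(data)
    @data = data
    @calls = 0
  end

  def to_hash
    @data
  end

  def ==(other)
    @calls += 1
    other.respond_to?(:to_hash) && @data == other.to_hash
  end
end

plain = Object.new
like = HashLike.new("title" => "x")

# No #to_hash on the right-hand side: Hash#== answers false on its own.
no_to_hash = ({ "title" => "x" } == plain)

# #to_hash present: Hash#== calls like == hash and returns that result.
delegated = ({ "title" => "x" } == like)
```

After the second comparison `like.calls` is 1, which shows that the right-hand object made the decision.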

So, I'm unconvinced about having == and eql? be different, but whether or not there should be a type check really depends on whether you'd be happy to have a hash instead of a MM Document. I haven't really thought through all of the implications of one design over the other, so I don't have a strong opinion on that issue yet.

Cheers,
Daniel DeLeo

David James

Oct 19, 2009, 6:34:27 PM
to MongoMapper
We have a lot of things up in the air now. So I'm going to start with
the least controversial point that I've made. Please consider the very
first code snippet I posted:

@doc = Document.new
@doc_copy = @doc.dup
@doc.title = "changed"
assert_not_equal @doc_copy, @doc

Unless I'm missing something, this makes sense. (And the opposite
would not make sense.) Do we agree this is how we want it to behave?
I've gotten yays from Pol, Roy, and Daniel. Reading between the lines,
I think Mike may agree with this part as well. John expressed
disagreement earlier, but I wonder if he actually disagreed with this
part or something else.

Michael Dirolf

Oct 19, 2009, 6:37:16 PM
to mongo...@googlegroups.com
I think that makes sense.

Sho Fukamachi

Oct 19, 2009, 7:39:21 PM
to mongo...@googlegroups.com
I disagree with not including _id in an == check. It is not a
disposable temporary attribute like object_id; it is a permanent,
important part of the record. Excluding it is counterintuitive in my
opinion.

I sort of agree with your other point, that in an ideal world equality
should be checked by comparing keys. Problem is nested arrays etc
which can be arbitrarily deep and are very difficult to compare in
that way. I'd assume that's why John went with the method he did. In
fact I think his method is instructive because it reminds us of the
can of worms an equality check can be.

IMO the best, indeed only way to compare is to look at what the DB
actually sees - ie use mongo-ruby's functionality to dump to BSON
(including _id) and compare that. Anything else is making assumptions
which may not be correct all or even most of the time. But for now I
think John's implementation is fine, and if you actually care about
doing a recursive key-by-key check, it should be done at the
application level (where you know what you care about checking) or
maybe added as an additional function to MM (#really_equal?). The
current equality check should probably fail if _id is blank, though.


Sho

David James

Oct 19, 2009, 9:24:22 PM
to MongoMapper
Sho wrote:
> The current equality check should probably fail if _id is blank, though.

So we have another +1 for the code snippet above. I'm trying to drive
towards consensus on the easy issues, if you can't tell already. :)

Sho Fukamachi

Oct 19, 2009, 10:19:34 PM
to mongo...@googlegroups.com
Well kinda yeah. But the thing is, we all agree on what the
*undesired* behaviour is (returns true in your test case) - but what
is the *desired* behaviour? With these object comparisons you either
go the whole way or nothing at all.

Could this solve the problem?

  def ==(other)
    if id && other.id
      other.is_a?(self.class) && id == other.id
    else
      raise CantDoThatException
    end
  end

One could make the case that's as far as it should go and that
comparisons of complex objects are an application level thing.

David James

Oct 20, 2009, 9:55:50 AM
to MongoMapper
Sho, I think you've made a beautiful point: raising an error when you
try to do a comparison with an unsaved document IS probably the only
self-consistent thing you can do if you don't believe in checking the
underlying values.

>One could make the case that's as far as it should go and that
>comparisons of complex objects are an application level thing.
I wouldn't go there. Hashes (a sufficiently complicated example) know
how to compare against other hashes based on their
contents. MongoMapper should too.

Sho Fukamachi

Oct 20, 2009, 10:48:53 AM
to mongo...@googlegroups.com

On 21/10/2009, at 12:55 AM, David James wrote:

> I wouldn't go there. Hashes (a sufficiently complicated example) know
> how to compare against other hashes based on their
> contents. MongoMapper should too.

If only it were that simple. Anything and everything can throw it off,
and once you say you can do it, you better do it right. I'd rather
just disclaim responsibility for the whole thing.

I just think this is one of these things which sounds simple but
isn't. I mean a few mails ago you were arguing against including _id
in a comparison; I would be dead against that. Timestamps are another
obvious point of controversy; if you're not checking _id then you
certainly wouldn't check timestamps and yet in my opinion they are
absolutely part of the data. Then there's time objects themselves,
always a nightmare. What about the question of do you compare
existence of keys, nils vs the key not being there in the first place,
hell, is it an ordered hash or an unordered hash. What if people have
callbacks. How about default values, should they be compared. What
about meta-data; dirty, new, etc, should that be compared.

Etc etc ....

I understand where you are coming from. I just think it is a giant
pain to implement, difficult to reach consensus on how it should work,
and in the end, there's only one person who knows what is the salient
data to be compared, and that is the application level developer.

That said I am actually completely ambivalent about this behaviour in
MM, I just thought it was an interesting topic. So I think I should
shut up and let you make your case, sorry... :)



David James

Oct 20, 2009, 4:49:22 PM
to mongo...@googlegroups.com
Sho, I think there is some truth to this point you made:
> there's only one person who knows what is the salient
> data to be compared, and that is the application level developer.

Ok, let's say for the sake of argument that I accept this. If an application designer wants to have a specialized version of equality, they are free to redefine it. But this doesn't mean that MM should punt on what equality means at the framework level. The framework (MM) should behave sensibly about equality!

The way MM behaves now is *not* sensible:

  @doc = Document.new
  @doc_copy = @doc.dup
  @doc.title = "changed"
  @doc_copy == @doc # => true

My goal is to convince enough people (especially John) that the desired result is ... drum roll please:

  false

I don't think any sane person would argue that the result should be true.

Sho threw out the possibility that the result could be an exception. I can see that, in the same-document-type-and-id-is-all-that-matters worldview, this would have the virtue of being self-consistent. But there are problems: you would have to wrap your statements containing == with rescues. THAT would be a bit ridiculous.
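
A sketch of what that would look like in practice (`RaisingDoc` is a hypothetical stand-in that copies the raising == proposed earlier in the thread, and `CantDoThatException` is defined here only so the example runs). Even an ordinary membership test goes through == and would need a rescue:

```ruby
class CantDoThatException < StandardError; end

# Hypothetical stand-in with the raising == proposed earlier in the thread.
class RaisingDoc
  attr_accessor :id

  def ==(other)
    raise CantDoThatException unless id && other.id
    other.is_a?(self.class) && id == other.id
  end
end

saved = RaisingDoc.new
saved.id = 1
unsaved = RaisingDoc.new

# Array#include? calls == on each element, so it can now raise.
result =
  begin
    [saved].include?(unsaved)
  rescue CantDoThatException
    :raised
  end
```

Here `result` ends up as `:raised` rather than true or false.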

Roy Wright

Oct 21, 2009, 12:27:25 PM
to mongo...@googlegroups.com
I'm going to throw an idea out for discussion.  What if equality (==) tested the #hash of the objects:

def hash
  _id || raise(CantDoThatException)
end

def ==(other)
  hash == other.hash
end

Then the application developer could easily include what should be tested by overriding the #hash.

Just an idea.

Have fun,
Roy

David James

Oct 21, 2009, 4:29:17 PM
to MongoMapper
Roy, I'm not a big fan of this idea. #hash is a special function and
has a special relationship with #eql?:
"+a.eql?(b)+ implies +a.hash == b.hash+." From the docs:

------------------------------------------------------------ Object#hash
     obj.hash    => fixnum
------------------------------------------------------------------------
     Generates a +Fixnum+ hash value for this object. This function must
     have the property that +a.eql?(b)+ implies +a.hash == b.hash+. The
     hash value is used by class +Hash+. Any hash value that exceeds the
     capacity of a +Fixnum+ will be truncated before being used.

If an application wants to override ==, it should just override ==.
(There is no need to get hash involved, because if you do, then you
have to make sure eql? has a good definition as well. That is too much
complexity compared with a simple override of ==.)
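
A sketch of the extra work involved (the classes `HashOnly` and `HashAndEql` are hypothetical and only loosely follow the #hash-based idea above): once #hash is overridden, Hash lookups still depend on #eql?, so both have to be defined together.

```ruby
# Hypothetical class that overrides hash and routes == through it.
class HashOnly
  def initialize(id)
    @id = id
  end

  def hash
    @id.hash
  end

  def ==(other)
    hash == other.hash
  end
end

# Same class with eql? added so it agrees with hash.
class HashAndEql < HashOnly
  def eql?(other)
    self == other
  end
end

# Hash lookup checks hash first, then eql?. The default eql? is identity.
index = { HashOnly.new(1) => :found }
miss = index[HashOnly.new(1)]

index_with_eql = { HashAndEql.new(1) => :found }
hit = index_with_eql[HashAndEql.new(1)]
```

Here `miss` is nil even though the two keys are == and have equal hashes, while `hit` is `:found`.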