[guava] thoughts on Sets.union/intersection using alternative comparator?

260 views
Skip to first unread message

fishtoprecords

unread,
Apr 25, 2010, 5:44:39 PM4/25/10
to guava-discuss
This is fundamentally about the Collections part of the Guava world;

I find the Sets implementations of union, intersection, and difference
to be very useful. But I have occasions when I would like to specify
an alternative Comparator<T> for the equality test. The parallel is
with the standard JDK Collections.sort() that can optionally take a
comparator to change the natural sort order.

Actually, I think it would be more properly a EquivalanceRelation,
than just a Comparator, but the idea would be the same, sometimes you
want to execute the usual union, intersection, difference functions
using something other than the class's natural (or overridden)
equals() function.

Was this considered during the development of Google-Collections?

if not, would an implementation be considered worthwhile?

Thanks
pat

--
guava-...@googlegroups.com.
http://groups.google.com/group/guava-discuss?hl=en
unsubscribe: guava-discus...@googlegroups.com

This list is for discussion; for help, post to Stack Overflow instead:
http://stackoverflow.com/questions/ask
Use the tag "guava".

Kevin Bourrillion

unread,
Apr 25, 2010, 5:56:40 PM4/25/10
to guava-...@googlegroups.com
If the inputs are SortedSets with your chosen comparator, it should work fine.
--
Kevin Bourrillion @ Google
internal:  http://goto/javalibraries
external: http://guava-libraries.googlecode.com

Kevin Bourrillion

unread,
Apr 25, 2010, 5:57:48 PM4/25/10
to guava-...@googlegroups.com
Of course, it won't be the most optimal implementation in that case -- it won't know to take advantage of the sortedness, so to speak.  There's something coming later this summer that would do the smart thing, called OrderedIterator.

Raymond Conner

unread,
Apr 25, 2010, 7:12:51 PM4/25/10
to guava-...@googlegroups.com
While a Comparator can be made to do that job, I'm not sure it's the right abstraction. An equivalence relation is the right abstraction, but of course that doesn't exist.

I'm not sure Pat realizes it, but either case would require that the Sets being delegated to have been constructed using this alternate definition of .equals(). And that means you either do as Kevin suggested, or wrap your objects in something that changes what .equals() and .hashCode() mean as the elements are added.

Saying it another way, you can't both have a lazy set union/intersection/difference and redefine the semantics of equality at the same time. The lazy implementations depend on delegateSet.contains(), iterator(), etc. doing the right thing, and those delegate sets don't know about your redefinition of equals().

The wrapper route can be made to work easily enough, but not without either exposing the fact that things are wrapped (a Set< EquivalenceWrapper< Foo > >), or violating the Set API contract. And now that I think about it, that's probably why there is no general equivalence relation interface. The abstraction is going to leak if you don't break the Set API, so why bother defining it?

I've got to go fix some old code that I now realize is broken. Thanks for making me think about this problem again :)

- Ray A. Conner

Pat Farrell

unread,
Apr 25, 2010, 9:05:11 PM4/25/10
to guava-...@googlegroups.com
On Sun, Apr 25, 2010 at 7:12 PM, Raymond Conner <ray.a....@gmail.com> wrote:
While a Comparator can be made to do that job, I'm not sure it's the right abstraction. An equivalence relation is the right abstraction, but of course that doesn't exist.


I agree that what you really want is an EquivalenceRelation thing, but they don't exist.
 
 
I'm not sure Pat realizes it, but either case would require that the Sets being delegated to have been constructed using this alternate definition of .equals(). And that means you either do as Kevin suggested, or wrap your objects in something that changes what .equals() and .hashCode() mean as the elements are added.

For now, I'm doing the wrapping hack. It works, but is kinda ugly
 

fishtoprecords

unread,
Apr 26, 2010, 2:23:23 AM4/26/10
to guava-discuss
On Apr 25, 9:05 pm, Pat Farrell <pat22...@gmail.com> wrote:
> For now, I'm doing the wrapping hack. It works, but is kinda ugly

Well, I spoke too soon. Its not functioning the way I want, but that
could be bugs in my side.

In case anyone is interested in why:

I have some nice PersistentObjects with all sorts of DB-related
fields, that I pull from the DB and populate. When I talk to the GUI,
I make up new Objects from the entered data, but they don't, of
course, have the DB related values such as Primary Key into the DB,
dirty flag, etc..

So what I want is "equals" that looks only at the "important" fields
and skips the DB related ones.

Roberto Scaramuzzi

unread,
Apr 26, 2010, 12:48:23 PM4/26/10
to guava-...@googlegroups.com
.NET has IEqualityComparer (http://msdn.microsoft.com/en-us/library/ms132152(v=VS.80).aspx) and a Hashtable constructor that takes it as an argument. 
--
Roberto Scaramuzzi
Software Engineer
Google

tomgibara

unread,
Apr 26, 2010, 3:28:22 PM4/26/10
to guava-discuss
I'm not familiar with .NET, but I think the equivalence relation
abstraction is cleaner, the natural object model for which would
probably be via equivalence classes.

interface EquivalenceRelation<E, C> {

C getEquivalenceClass(E element);

}

equals/hashCode on instances of C would stand-in for the respective
methods on E in collections that supported them. The EqualityComparer
interface would be more efficient in cases where instances of C cannot
be obtained directly from instances of E, but equivalence relations
have the benefit of being a long-time universally adopted abstraction.

Tom

PS: I think it was an unfortunate mistake that equals() and hashCode()
were included on Object - equality is never intrinsic. Lots of Java
code bends itself out of shape because the standard collections (at
least those that don't take comparators) don't allow equality to be
specified independently of their elements.

Raymond Conner

unread,
Apr 26, 2010, 6:40:50 PM4/26/10
to guava-...@googlegroups.com
Not being familiar with .NET, I'm guessing the difference is that the addition doesn't break the Hashtable contract.

Being more explicit, using such a thing would not break anything inside the Set implementation. What it breaks is the API contract as specified by the javadocs. For example:

Set< Foo > set = ....;
a.equals( b );     // returns true
set.contains( a ); // returns true
set.contains( b ); // returns false

// or this
x.equals( y );     // returns true
set.add( x );      // returns true, x was added
set.add( y );      // returns true, y was added

And those are clear violations of the Set API. This gets you into trouble when you supply these to other code that you don't control. When I write a library method that takes a Set argument from the user, I'm expecting it to conform to the contract. I'm sure Guava is no different. And that's exactly the kind of thing that leads to subtle, hard-to-find, bugs.

- Ray A. Conner
Reply all
Reply to author
Forward
0 new messages