Sets.union can take several sets

1,109 views
Skip to first unread message

Stanislav Kurilin

unread,
Apr 9, 2012, 12:16:18 PM4/9/12
to guava-...@googlegroups.com
Hi!
I believe it would be useful if Sets.union method would accept Collection/Iterable of sets instead of only two. You can make it using varaargs or with new method unionAll.

Thanks.

Louis Wasserman

unread,
Apr 9, 2012, 12:20:06 PM4/9/12
to Stanislav Kurilin, guava-...@googlegroups.com
-1. The performance of the methods of the "union set" degenerates linearly with the number of sets, so Sets.union(Sets.union(Sets.union(...))) is rarely what you want to do, especially with an unbounded number of sets.  Usually, a much better choice is adding the sets, one by one, to an ImmutableSet.Builder, and then building that.

The current setup makes it easier to do "the right thing" -- either take the union of a constant number of sets, or to build multiple sets into an ImmutableSet if you don't know how many sets you're getting.  I like that a lot.

Stanislav Kurilin

unread,
Apr 9, 2012, 12:23:55 PM4/9/12
to guava-...@googlegroups.com, Stanislav Kurilin
unionAll  method implementation can be implemented with constructing new set using ImmutableSet.Builder instead of calling Sets.union(Sets.union(Sets.union(...))), can't it?

Louis Wasserman

unread,
Apr 9, 2012, 12:27:00 PM4/9/12
to Stanislav Kurilin, guava-...@googlegroups.com
Sets.union(Set, Set) is specified to return a view, not a newly constructed copy, and for Sets.union(Iterable<Set>) to have different behavior could violate expectations even worse than a badly performing Set that at least has the same semantics as the two-argument version.

Stanislav Kurilin

unread,
Apr 9, 2012, 12:47:00 PM4/9/12
to Louis Wasserman, guava-...@googlegroups.com
OK. But we can deal with it by naming it unionCopy or copyUnionAll or
smt like that. Anyway, I believe, it could be very useful method.

2012/4/9 Louis Wasserman <wasserm...@gmail.com>:

--
Yours faithfully,
Stanislav Kurilin

Louis Wasserman

unread,
Apr 9, 2012, 12:52:32 PM4/9/12
to Stanislav Kurilin, guava-...@googlegroups.com
Perhaps, but it's already straightforward enough to implement the same functionality with already available methods, even in a single line: ImmutableSet.copyOf(Iterables.concat(sets)).  Given that, it doesn't seem likely that the minor savings of providing a single method for this operation outweighs the added API surface of a new method to an already-quite-large utility class.

Tim Peierls

unread,
Apr 9, 2012, 12:53:51 PM4/9/12
to Stanislav Kurilin, Louis Wasserman, guava-...@googlegroups.com

It's easy to write yourself if you want it, and leaving it out of Guava avoids the risk of confusion. Doesn't seem to me that it carries its own weight.

Emily Soldal

unread,
Apr 10, 2012, 11:51:46 AM4/10/12
to guava-discuss
Could we perhaps add this sort of thing to the comments? I feel it
would be useful to know how to get around such a limitation and be
made aware of it.

On 9 apr, 12:20, Louis Wasserman <wasserman.lo...@gmail.com> wrote:
> -1. The performance of the methods of the "union set" degenerates linearly
> with the number of sets, so Sets.union(Sets.union(Sets.union(...))) is
> rarely what you want to do, especially with an unbounded number of sets.
>  Usually, a much better choice is adding the sets, one by one, to an
> ImmutableSet.Builder, and then building that.
>
> The current setup makes it easier to do "the right thing" -- either take
> the union of a constant number of sets, or to build multiple sets into an
> ImmutableSet if you don't know how many sets you're getting.  I like that a
> lot.
>
> Louis Wasserman
> wasserman.lo...@gmail.comhttp://profiles.google.com/wasserman.louis

Louis Wasserman

unread,
Apr 10, 2012, 12:04:06 PM4/10/12
to Emily Soldal, guava-discuss
Hrrrrrmkay.  I'm weakly in favor of this, but Guava provides a staggering number of "view" operations, and (basically) all view operations add some small constant overhead to the operations of the underlying object.  I feel mostly comfortable assuming the knowledge that "views add overhead, and views of views add linear overhead in their depth."

To be sure, the "union operation" is probably more likely to be used recursively on many inputs than most view operations, so I can see the arguments for giving it special documentation that that's not always a good idea.

Dimitris Andreou

unread,
Apr 10, 2012, 1:08:26 PM4/10/12
to Emily Soldal, guava-discuss
This has bit me some years back too: http://code-o-matic.blogspot.com/2009/10/beware-of-recursive-set-union-building.html

(Ignore the false statement at the end of it, regarding HashSet being wasteful)

Another one to watch out for are recursive Iterators.concat(), which turns iteration from O(n) to O(nlogn) - not as bad as the O(n^2) of recursive union() (which could use a comment I guess), but still. This one might even be fixable. Why don't we unwrap the nested iterators when we encounter a concat() iterator, flattening the tree? Has anyone experimented with that?

Louis Wasserman

unread,
Apr 10, 2012, 1:11:55 PM4/10/12
to Dimitris Andreou, Emily Soldal, guava-discuss
If you try it, you discover that it's a bit more complicated than that, just because you have to do it "on the fly," in the course of the iteration.

Dimitris Andreou

unread,
Apr 10, 2012, 1:49:55 PM4/10/12
to Louis Wasserman, Emily Soldal, guava-discuss
Sure. When the next iterator we access, it actually a concat(), we'd better disintermediate it. Not sure what the complication was, I guess if I find time to try (just might) I'll see for myself.

Btw, I got the complexity wrong earlier? Isn't iteration of recursive concat O(n^2)? Not just O(nlogn) - that would only be for a balanced tree.

Louis Wasserman

unread,
Apr 10, 2012, 1:56:44 PM4/10/12
to Dimitris Andreou, Emily Soldal, guava-discuss
A call to Iterator.next() on a "deeply concatenated iterator" costs time linear in the depth of the concatenation at that point.  For a balanced tree, that's O(log m), where m is the number of iterators being concatenate.d

That said, "disintermediating" the concatenated iterator is...easier said than done.

Dimitris Andreou

unread,
Apr 12, 2012, 9:34:28 PM4/12/12
to Louis Wasserman, Emily Soldal, guava-discuss
http://code.google.com/p/google-collections/issues/detail?id=151#c5

In short: we can go from O(N^3) to O(N), for all tree shapes, for a constant amount of extra memory per concat(). 

I was assuming the current implementation was quadratic (worst case), but it is cubic. Also I assumed that in the balanced tree, iteration was O(N * logN), but it is O(N * log(N)^2)

Louis Wasserman

unread,
Apr 12, 2012, 10:11:14 PM4/12/12
to Dimitris Andreou, Emily Soldal, guava-discuss
I'm just not convinced that the current implementation can be improved upon significantly, I guess.

Dimitris Andreou

unread,
Apr 12, 2012, 10:34:23 PM4/12/12
to Louis Wasserman, Emily Soldal, guava-discuss
Hmm? I already have a linear implementation, compared to the cubic one. The only line of defense of the previous algorithm is that (hopefully) most of the time the depth of the tree is a small constant, so the cubic slowdown isn't noticed. But if any user applies concat() recursively, it's a performance bug. In the meantime, I guess I'll add a complexity note - while I'm at it, I will do the same thing for Sets#union
Reply all
Reply to author
Forward
0 new messages