Request for limits on Sets.cartesianProduct

243 views
Skip to first unread message

vinodhi...@gmail.com

unread,
Sep 3, 2013, 7:27:24 AM9/3/13
to guava-...@googlegroups.com
Hello All,
 
Thank you for such a great Guava project.
 
It would be really good to have an overloaded method for Sets.cartesianProduct(List<ImmutableSet<String>>, int limit);
 
e.g:
If limit is 200, only the first 200 cartesian product will be returned.
 
Advantages:
1.If the user only wants the first 200 elements, he can get it
2. If the user only wants first 200, but if the cartesian product is more than Integer.MAX_VALUE, it throws cartesian product too big. So by introducing the limit, user can get the limited number of outputs from cartesianProduct method
 
Hope this would be introduced in nearing releases. :)
 
Thanks
Vinodhini
 

Osvaldo Doederlein

unread,
Sep 3, 2013, 11:34:29 AM9/3/13
to vinodhi...@gmail.com, guava-discuss
This looks like a relatively niche feature, remarkably as you need to make it even more complex by defining what you mean by "first 200 elements". Is this the first 200 tuples, each with unrestricted size (no good if your input Sets have very skewed sizes)? Or maybe the opposite (i.e. "column-first" for two Sets), ok that you can get by changing the order of the Sets. But maybe you want the first 200 elements, no matter how many tuples. Or, somebody else may want a limit determined by diagonal scanning: the smallest top-left-cornered square (, cube...) that will contain at least 200 elements; this has the advantage of sampling the same number of elements of all input Sets (but may return less than the desired limit for skewed inputs). You could also make diagonal scanning with proportionally-identical limits for each Set, which enforces identical sampling of each Set and avoids problems with skewed inputs.

Or you can easily emulate all these scenarios by passing subsets of the inputs, like this:

int limit = 200;
double fraction = (double)limit / (list1.size() + list2.size() + list3.size());
Lists.cartesianProduct(ImmutableList.of(
    list1.subList(0, max(list1.size(), (int) ceil(fraction * list1.size())),
    list2.subList(0, max(list2.size(), (int) ceil(fraction * list2.size())),
    list3.subList(0, max(list3.size(), (int) ceil(fraction * list3.size())));

This code implements the fanciest, proportional diagonal sampling; other scenarios will be simpler. Notice you need Lists for this due to the sublist operations, lists are really more adequate for code that needs to slice and dice data (wouldn't be hard to take a random sample of each 'dimension', etc.). If your input data is already in ImmutableSets, the ImmutableList.copyOf() operation will be O(1).

 


--
--
guava-...@googlegroups.com
Project site: http://guava-libraries.googlecode.com
This group: http://groups.google.com/group/guava-discuss
 
This list is for general discussion.
To report an issue: http://code.google.com/p/guava-libraries/issues/entry
To get help: http://stackoverflow.com/questions/ask (use the tag "guava")
---
You received this message because you are subscribed to the Google Groups "guava-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to guava-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/guava-discuss/fc8abf9d-b8d6-414e-a48b-4f73598c0358%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
Osvaldo Doederlein | Software Engineer, Doubleclick Ad Exchange |  opi...@google.com

Jared Levy

unread,
Sep 3, 2013, 12:57:40 PM9/3/13
to vinodhi...@gmail.com, guava-discuss
On Tue, Sep 3, 2013 at 4:27 AM, <vinodhi...@gmail.com> wrote:
Hello All,
 
Thank you for such a great Guava project.
 
It would be really good to have an overloaded method for Sets.cartesianProduct(List<ImmutableSet<String>>, int limit);
 
e.g:
If limit is 200, only the first 200 cartesian product will be returned.
 
Advantages:
1.If the user only wants the first 200 elements, he can get it

To deal with this case, you could pass the output of Sets.cartesianProduct() to Iterables.limit().


 
2. If the user only wants first 200, but if the cartesian product is more than Integer.MAX_VALUE, it throws cartesian product too big. So by introducing the limit, user can get the limited number of outputs from cartesianProduct method
 
Hope this would be introduced in nearing releases. :)
 
Thanks
Vinodhini
 

--
Reply all
Reply to author
Forward
0 new messages