Cartesian porduct function

690 views
Skip to first unread message

Glen Hertz

unread,
Dec 15, 2012, 10:23:56 PM12/15/12
to julia...@googlegroups.com
Hi,

Does Julia have a way to calculate the cartesian product?  Something like:

product([1,2,3],[5,6]) --> [[1,5], [1,6], [2,5], [2,6], [3,5], [3,6]]

Thanks,

Glen

Patrick O'Leary

unread,
Dec 15, 2012, 10:39:21 PM12/15/12
to julia...@googlegroups.com
[(x, y) for x in [1,2,3], y in [5 6]] is my usual preference. Note that various choices of brackets ([], {}, ()) for either the reduction or the comprehension can have subtly different effects on the result; you can experiment a bit to get a feel.

Glen Hertz

unread,
Dec 15, 2012, 11:19:43 PM12/15/12
to julia...@googlegroups.com
That's very nice but I should have been cleaer.  I'm passing a variable number of vectors to a function.  For example:

function myfunc(vectors...)
  for combination in product(vectors...)
    # do something with each cartesian product
  end
end

myfunc([1,2,3], [4,5], [7,8])


Kevin Squire

unread,
Dec 16, 2012, 2:18:07 AM12/16/12
to julia...@googlegroups.com
There is a product function in the Iterators package:

julia> load("pkg")

julia> Pkg.update()
...
julia> Pkg.add("Iterators")
Installing Iterators: v0.0.0
Cloning into 'Iterators'...
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 7 (delta 0), reused 7 (delta 0)
Receiving objects: 100% (7/7), done.

julia> require("Iterators")

julia> for prod in Iterators.product([1,2],[3,4],[5,6])
          println(prod)
       end
(1,3,5)
(2,3,5)
(1,4,5)
(2,4,5)
(1,3,6)
(2,3,6)
(1,4,6)
(2,4,6)

Kevin

Glen Hertz

unread,
Dec 17, 2012, 9:41:58 PM12/17/12
to julia...@googlegroups.com
Hi Kevin,

Thanks, this is what I was looking for.  Do you know what the thinking is for why this isn't included as part of the Julia?

Glen

Kevin Squire

unread,
Dec 18, 2012, 1:09:59 AM12/18/12
to julia...@googlegroups.com
Hi Glen,

I think the current strategy is to keep the core of julia minimal and have as much as possible in packages.  I'm not sure what language(s) you're coming from, but even, e.g., in python, this type of operation is part of the itertools package (which is included with python, but you still have to import it).  At least one of the core developers (Jeff) has expressed interest in creating a version of julia available for down load which contains many/most packages, but for the near future, at least, they'll be bundled separately.

Kevin

Tim Holy

unread,
Dec 18, 2012, 5:16:45 AM12/18/12
to julia...@googlegroups.com
On Monday, December 17, 2012 06:41:58 PM Glen Hertz wrote:
> Thanks, this is what I was looking for. Do you know what the thinking is
> for why this isn't included as part of the Julia?

It's also probably fair to say that everyone has a few things they use all the
time and would rather have be part of "core." But because those things are
different for different people, adding them all would definitely bloat Julia.

A nice way to make sure you always have your favorite functionality available
is to customize your ~/.julirarc.jl file, which gets run on startup.

--Tim

Glen Hertz

unread,
Dec 18, 2012, 3:41:34 PM12/18/12
to julia...@googlegroups.com
I agree to not load() everything but having it available by default to load and documented would help out a lot.  I find that when I add a package and use help() it doesn't return anything.  I'm also not sure about the package quality when I grab it externally and I find it hard to do what I want.  For example, it is not obvious to me how to search and replace a string.  Is there any way to find out what package might provide this functionality?  If it is reasonable for any user to expect it as part of basic functionality (and it satisfies the core developers) I think it should be part of the install and documented under "Extras" in the manual.

Glen



--Tim

--



Jason Knight

unread,
Dec 18, 2012, 4:10:30 PM12/18/12
to julia...@googlegroups.com
Unfortunately, the documentation system (esp. for packages) is in flux and being discussed here: https://github.com/JuliaLang/julia/pull/1619

So help() will not work for packages at the moment.

Even the core documentation is in constant flux as it is hard to track a moving target. For example, see all of the file API undergoing a renaming here: https://github.com/JuliaLang/julia/issues/1782.

Kevin Squire

unread,
Dec 18, 2012, 5:07:48 PM12/18/12
to julia...@googlegroups.com

Regarding your desire to have basic packages available: one thing that might be useful would be to have package groups, including one which includes a number of useful packages by default.  (I'm struggling to come up with a good name... "Base" and "Core" are already used internally by julia...)  It probably isn't useful to include specialized packages in that group (e.g., I work in bioinformatics, and most bioinformatics packages won't be useful to most people), but a few core packages would


On Tuesday, December 18, 2012 12:41:34 PM UTC-8, Glen Hertz wrote:
I agree to not load() everything but having it available by default to load and documented would help out a lot.  I find that when I add a package and use help() it doesn't return anything.  I'm also not sure about the package quality when I grab it externally and I find it hard to do what I want.  For example, it is not obvious to me how to search and replace a string.  Is there any way to find out what package might provide this functionality?  If it is reasonable for any user to expect it as part of basic functionality (and it satisfies the core developers) I think it should be part of the install and documented under "Extras" in the manual.

Yeah, I think Julia is at a place where more people, like yourself, want to try it out, but it's documentation isn't quite up to snuff.  Hopefully that will be decided soon in the thread that Jason mentioned.!  In the mean time, I'll note that the (undocumented) replace() function is what you're looking for (and it's in Base).

I did post an issue previously about Itertools not being documented (https://github.com/JuliaLang/julia/issues/1506), but haven't gotten around to doing this myself, and no one else has picked up on it.

Kevin

Kevin Squire

unread,
Dec 18, 2012, 5:08:51 PM12/18/12
to julia...@googlegroups.com

Yeah, I think Julia is at a place where more people, like yourself, want to try it out, but it's documentation isn't quite up to snuff.  Hopefully that will be decided soon in the thread that Jason mentioned.!  In the mean time, I'll note that the (undocumented) replace() function is what you're looking for (and it's in Base).

(And just in case: Base is loaded by default.)

Cheers!

  Kevin

Jeff Bezanson

unread,
Dec 18, 2012, 5:44:00 PM12/18/12
to julia...@googlegroups.com
Iterating over a cartesian product is an especially interesting and
important case. It is very hard to efficiently update the iteration
state by one step, as you can see by looking at the code for the
Product iterator. This is one reason I hesitate to add this to Base.
I have been experimenting with this. So far it seems to me the most
efficient approach is to invert control and have a dedicated function
handle all the iteration:

product(a, b, c, ...) do (x, y, z, ...)
# loop body
end

I can make that 10x faster than any other approach we currently have.
This is definitely food for thought.
Tim, I suspect this might be relevant/helpful to your grid package
among many other things?
> --
>
>

Tim Holy

unread,
Dec 18, 2012, 6:21:25 PM12/18/12
to julia...@googlegroups.com
On Tuesday, December 18, 2012 05:44:00 PM Jeff Bezanson wrote:
> Tim, I suspect this might be relevant/helpful to your grid package
> among many other things?'

I was thinking the same thing---Grid's "Counter" iterator is exactly of this
type. I once tried a similar implementation with tuples, and got pretty poor
performance with mine, especially compared to the Array-based implementation
in Grid now. Bbut I didn't try the one in Iterators, simply because I didn't
remember its existence.

In its current incarnation, Grid's Counter performance seems quite good, with
a caveat (you surely remember this, but for others...): for grids where the
first dimension is not small, it turns out you can get particularly good
performance simply by expunging the first dimension from the general
multidimensional iterator and handling it separately in a for loop. (An
example of this trick can be found in prolong() for non-BLAS types). Under
this condition, you get pretty close to linear-indexing performance, so there
isn't much room for performance improvement.

This trick works great for grids with sizable first dimensions, because you
don't end up incrementing the Counter that often. For small grids, I've not
had success in getting great performance from anything other than
make_arrayind_loop_nest type of code, which is a bit scary for routine use.

So something that works in the general case would be most welcome!

--Tim

Tim Holy

unread,
Dec 18, 2012, 9:16:25 PM12/18/12
to julia...@googlegroups.com
Jeff, I did some performance testing. Counter is less general than Product, but
it is a wee bit faster:

File testcart.jl:
require("Iterators")
require("Grid")
using Iterators
using Grid

function it_product(niter::Int, r...)
s = 0
for i = 1:niter
for p in Iterators.Product(r...)
s += sum(p)
end
end
s
end

function it_counter(niter::Int, sz...)
s = 0
for i = 1:niter
for c in Counter(sz)
s += sum(c)
end
end
s
end

# Do each once to compile, then run for real
it_product(1, 10, 10)
@time sp = it_product(200, 1:10, 1:10, 1:10, 1:10)
it_counter(1, 10, 10)
@time sc = it_counter(200, 10, 10, 10, 10)
@assert sp == sc



julia> include("testcart.jl")
elapsed time: 2.6126868724823 seconds
elapsed time: 0.05725288391113281 seconds

How does this compare with your do method?

--Tim


On Tuesday, December 18, 2012 05:44:00 PM Jeff Bezanson wrote:

Jeff Bezanson

unread,
Dec 18, 2012, 10:34:49 PM12/18/12
to julia...@googlegroups.com
I get 2.21 seconds for it_product and 0.045 for Counter, similar to
your times, and my "do" method gets 0.3 seconds. In your case
necessity has certainly been the mother of optimization!

The advantages of my method are that it is purely functional, and will
work on any combination of iterables. There is also the hope that we
could eventually inline the closure, which could dramatically improve
my time. Counter is probably pretty close to maxed-out.

My code is just a simple wrapper for the nested loop generator we all
know and have mixed feelings about:

prod_cache = Dict()
function product(body, it...)
gen_cartesian_map(prod_cache, ivars->:(body($(ivars...))),
it, (:body,), body)
end
> --
>
>

Tim Holy

unread,
Dec 19, 2012, 5:21:03 AM12/19/12
to julia...@googlegroups.com
On Tuesday, December 18, 2012 10:34:49 PM Jeff Bezanson wrote:
> The advantages of my method are that it is purely functional, and will
> work on any combination of iterables.

That's a big plus, especially for anything one contemplates putting in Base.

> There is also the hope that we
> could eventually inline the closure, which could dramatically improve
> my time. Counter is probably pretty close to maxed-out.
>
> My code is just a simple wrapper for the nested loop generator we all
> know and have mixed feelings about:
>
> prod_cache = Dict()
> function product(body, it...)
> gen_cartesian_map(prod_cache, ivars->:(body($(ivars...))),
> it, (:body,), body)
> end

That's very elegant.

I'm actually a bit puzzled. We definitely know that gen_cartesian_map and
related can be blazingly fast, especially because in this case it doesn't have
to do that "multidimensional index"->"linear index" calculation that motivated
gen_array_index_map. The only thing I can think of is the closure, but isn't
the closure only being run once? If so, I confess I still don't understand why
your method doesn't beat mine. Could there be an issue with type inference
inside the body? Is it the dictionary lookup on the body expression? (If so a
local copy of prod_cache, plus integer-based dictionary lookup, might be a big
improvement.) Or is dictionary lookup somehow failing, and it needs to
recompile the body each time?

It occurs to me that perhaps running the profiler on your code along with a
renamed copy of gen_cartesian_map might lead to some insights. I'm too short
on time for the next couple of days to dig into this any further now, however.

--Tim

Reply all
Reply to author
Forward
0 new messages