Julia combinations again - Julia combinations vs. Julia for-loop

curiou...@gmail.com

Aug 16, 2013, 12:04:31 PM8/16/13
to julia...@googlegroups.com
Julia combinations vs. Julia for-loop

The two pieces of code achieve the same objective. One uses combinations, the other uses nested for-loops. The latter is about 100x faster. Does this mean there is huge room for improvement in Julia's combinations function, or is something else happening here?

BTW, Julia's performance on the for-loop version is really impressive. PyPy takes about 12 seconds for an equivalent for-loop version, and C takes 0.64 seconds with the -O3 option.


Code using combinations (takes about 93 seconds)
========================================

function countcombos()
    counter = 0
    sumcounter = 0
    
    combos = combinations(1:334, 4)
    
    for c in combos
        counter = counter + 1
        if sum(c) <= 93
            sumcounter = sumcounter + 1
        end
        #if counter % 1000000 == 0
        #    println(counter)
        #end
    end
    println("counter: $counter")
    println("sumcounter: $sumcounter")
end

start_time = time()
countcombos()
stop_time = time()

println("time taken: $(stop_time - start_time)")


Code using loop (takes about 0.95 seconds)
===================================
t0 = time()

function findcombos()
    counter = 0
    sumcounter = 0
    for i = 1:331
        for j = i+1:332
            for k = j+1:333
                for l = k+1:334
                    s = i + j + k + l  # renamed from `sum`, which shadows Base's sum function
                    counter = counter + 1
                    if s <= 93
                        sumcounter = sumcounter + 1
                    end
                end
            end
        end
        println(i)
    end
    println("Counter: ", counter)
    println("Sumcounter: ", sumcounter)
end

findcombos()
t1 = time()
println(t1-t0)

John Myles White

Aug 16, 2013, 12:10:48 PM8/16/13
to julia...@googlegroups.com
I don't see any obvious reason why the combinations function should be so much slower, except maybe that the code has to keep pulling references out of an array and can't just use four registers the way the loop code might be able to.

When you're timing this sort of thing, the @time and @elapsed macros will save you a bit of typing.
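For instance (a quick illustrative sketch; `busywork` here is just a hypothetical stand-in for countcombos(), and the syntax shown is for current Julia):

```julia
# Any function works as the timing target; this one is a made-up stand-in.
function busywork(n)
    acc = 0
    for i in 1:n
        acc += i % 7
    end
    return acc
end

@time busywork(10^6)         # prints elapsed time plus allocation counts
t = @elapsed busywork(10^6)  # returns only the elapsed seconds as a Float64
```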

 -- John

Tim Holy

Aug 16, 2013, 12:38:11 PM8/16/13
to julia...@googlegroups.com
I don't have time to look into this myself, but try these tools:
https://github.com/timholy/ProfileView.jl
http://docs.julialang.org/en/latest/stdlib/profile/
(The first one is not required, but I find it nicer to use)

Best,
--Tim


Stefan Karpinski

Aug 16, 2013, 1:03:01 PM8/16/13
to Julia Users
It's not at all clear to me that these two code fragments are doing the same amount of work. The for-loop one uses a lot of hard-coded knowledge about the specific problem you're solving, whereas the combinations one does not – in particular, the for-loop version doesn't work with any arrays at all, instead representing things using four numbers. Does a C function that can generically find all combinations for any array and any combination size perform any better? I don't imagine it would be much better. Perhaps combinations should be able to work more efficiently with ranges, since they have special properties, but that's a feature request, not a performance problem.
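(As an aside: because the items form the range 1:n, this particular count doesn't actually require enumeration at all; a small dynamic program over sums gives it directly. A sketch, not related to the combinations() internals:)

```julia
# Count t-element subsets of 1:n whose sum is at most cap, without
# enumerating them. Illustrative only; runs in O(n * t * cap).
function count_bounded(n, t, cap)
    # dp[k+1, s+1] = number of k-subsets with sum s, over the values seen so far
    dp = zeros(Int, t + 1, cap + 1)
    dp[1, 1] = 1                        # the empty subset (k = 0, sum 0)
    for v in 1:min(n, cap)              # values above cap can never be used
        for k in t:-1:1, s in cap:-1:v  # descending, so each value is used at most once
            dp[k + 1, s + 1] += dp[k, s - v + 1]
        end
    end
    return sum(dp[t + 1, :])
end

# count_bounded(334, 4, 93) then reproduces the sumcounter value instantly.
```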

Steven G. Johnson

Aug 16, 2013, 1:11:34 PM8/16/13
to julia...@googlegroups.com


On Friday, August 16, 2013 1:03:01 PM UTC-4, Stefan Karpinski wrote:
It's not at all clear to me that these two code fragments are doing the same amount of work. The for-loop one uses a lot of hard-coded knowledge about the specific problem you're solving, whereas the combinations one does not – in particular, the for-loop version doesn't work with any arrays at all, instead representing things using four numbers. Does a C function that can generically find all combinations for any array and any combination size perform any better?

The Python itertools implementation *is* a C function.  Stefan is probably right in that a specialized loop-based implementation that only works for a single case like this will inevitably be faster than any generic code, especially one that has to allocate and return arrays on each iteration.  On the other hand, there may be room for special-case optimizations in our combinations() iterator for the common case of small t.

Stefan Karpinski

Aug 16, 2013, 1:18:05 PM8/16/13
to Julia Users
Right, that's true – the Python version is already a C implementation of this that we're killing, although going through the CPython API and handling arbitrary Python objects inevitably introduces a lot of overhead. A C++ templated version might be a more direct comparison, but the main point here is that these two versions aren't doing the same thing at all.

Tim Holy

Aug 16, 2013, 1:23:34 PM8/16/13
to julia...@googlegroups.com
The difference turns out to be virtually 100% garbage-collection. Someone who
cares about the performance of this code might take inspiration from Grid's
Counter, which avoids this problem:
https://github.com/timholy/Grid.jl/blob/master/src/counter.jl

--Tim

curiou...@gmail.com

Aug 16, 2013, 2:38:10 PM8/16/13
to julia...@googlegroups.com
John and Tim, thanks for the suggestions about tools that I can use for profiling.

Stefan and Steven, maybe there is some confusion: in this case I am not talking about Python's itertools, but about comparing two ways of attaining a given result in Julia. I understand that the two approaches are different, but I am still surprised that the performance gap can be that large, and I suppose the surprise comes from not understanding these things well. My thinking was that both versions have to produce every combination for the given case (combinations of 4 from 1:334; the loop does it one by one, while combinations() presumably returns some form of iterator) and take the sum of the elements of each combination. I did not expect that the only way to write a flexible function would make it so much slower than generating the combinations manually for a specific case.

Finally, I am very new to C (I started learning it three days ago), and hence don't yet know how to use the combinations library available in GSL. But I think what you are suggesting is to check what happens in that case. I will post back when I get around to it, though it may take me some time.

Thank you.

Steven G. Johnson

Aug 16, 2013, 5:11:12 PM8/16/13
to julia...@googlegroups.com


On Friday, August 16, 2013 2:38:10 PM UTC-4, curiou...@gmail.com wrote:
Finally, I am very new to C, started learning it 3 days back, and hence don't know yet how to use the combinations library that is available in GSL. But I think what you are suggesting is to check what happens in that case. I will post back when I get around to checking with that...but it may take me some time to get to that.

Note that GSL will also have a performance advantage for the same allocation-based reason: GSL identifies a combination by an array of indices, and pre-allocates the index array before iteration.

To get this kind of performance in Julia, we'd mainly just need to pre-allocate the array holding the combinations (and be careful of other memory allocations in the inner loop).  The downside of this is that the user would have to be aware that different iterations are returning the same array with mutated contents.  e.g. collect(combinations(...)) wouldn't work.
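A rough sketch of what such a preallocating iterator could look like (illustrative names, written against the current iteration protocol rather than the 2013 one; this is not the actual Base implementation):

```julia
# Iterate t-combinations of 1:n while reusing ONE index vector, so the
# inner loop allocates nothing. Names here are hypothetical.
struct InPlaceCombinations
    n::Int
    t::Int
end

Base.length(c::InPlaceCombinations) = binomial(c.n, c.t)
Base.eltype(::Type{InPlaceCombinations}) = Vector{Int}

function Base.iterate(c::InPlaceCombinations)
    c.t > c.n && return nothing
    buf = collect(1:c.t)      # the single shared buffer
    return buf, buf
end

function Base.iterate(c::InPlaceCombinations, buf)
    i = c.t
    while i >= 1 && buf[i] == c.n - c.t + i  # rightmost index that can still advance
        i -= 1
    end
    i == 0 && return nothing
    buf[i] += 1
    for j in i+1:c.t                         # reset the tail to the minimal run
        buf[j] = buf[j-1] + 1
    end
    return buf, buf
end

# Counting with it allocates only the one buffer:
function count_small_sums(n, t, cap)
    total, hits = 0, 0
    for comb in InPlaceCombinations(n, t)
        total += 1
        hits += (sum(comb) <= cap)
    end
    return total, hits
end

# count_small_sums(334, 4, 93) would reproduce the thread's counts without
# per-iteration allocation; but, as noted above, collect() now returns
# aliased references: every element is the SAME (mutated) array.
```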

Tim Holy

Aug 16, 2013, 7:22:04 PM8/16/13
to julia...@googlegroups.com
On Friday, August 16, 2013 02:11:12 PM Steven G. Johnson wrote:
> To get this kind of performance in Julia, we'd mainly just need to
> pre-allocate the array holding the combinations (and be careful of other
> memory allocations in the inner loop).

That's exactly how Counter works, and why it's very fast. Here, one extra
challenge is figuring out the right approach for cases where the different items
have different types (if that's relevant at all).

--Tim

Kevin Squire

Aug 16, 2013, 8:47:02 PM8/16/13
to julia...@googlegroups.com
When I reworked the `partitions` functions recently (which are quite related to combinations), I explicitly allocated a new array for each output.  Before this, I was preallocating and reusing the output array, and `collect(partitions(x))` gave an array of references to the preallocated array, which all had the same value.

It would be possible for `collect` to be specialized for `combinations` and company, which is a compromise between having a different value for every iteration and full copy-on-write semantics.
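(Such a specialized `collect` might, roughly, just copy each yielded value. An illustrative sketch, not the actual Base code; the buffer-reusing iterator is simulated here with a generator that mutates and re-yields one array:)

```julia
# A buffer-reusing iterator, simulated with a generator over one array:
buf = [0]
reusing() = (begin buf[1] = i; buf end for i in 1:3)

# Plain collect stores three references to the SAME array:
collect(reusing())          # -> three aliases, all showing [3]

# A specialized collect would snapshot each yielded value instead:
collect_copying(itr) = [copy(c) for c in itr]
collect_copying(reusing())  # -> [[1], [2], [3]]
```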

Kevin

P.S. For reference, `collect` does not work with Counter for the same reason:

julia> for i in Counter([2,3])
          println(i)
       end
[1,1]
[2,1]
[1,2]
[2,2]
[1,3]
[2,3]

julia> collect(Counter([2,3]))
6-element Any Array:
 [1,4]
 [1,4]
 [1,4]
 [1,4]
 [1,4]
 [1,4]

Stefan Karpinski

Aug 16, 2013, 8:52:55 PM8/16/13
to Julia Users
Our general trend has been towards allowing the "dangerous", fast version, documenting it as dangerous, and letting the user do explicit copying. I'd be OK with that here. Kevin, your intermediate solution sounds pretty reasonable to me.

Tim Holy

Aug 17, 2013, 6:31:37 AM8/17/13
to julia...@googlegroups.com
Ah, I see the choices here. Specializing `collect` to make the copy seems like
a good compromise; allocate new memory only in cases where it's required.

--Tim