Generators vs Comprehensions, Type-stability?

518 views
Skip to first unread message

Christoph Ortner

unread,
Sep 22, 2016, 2:21:36 PM9/22/16
to julia-users
I hope that there is something I am missing, or making a mistake in the following example: 

r = rand(10)
test1
(r) = sum( t^2 for t in r )
test2
(r)= sum( [t^2 for t in r] )
@code_warntype test1(r)   # return type Any is inferred
@code_warntype test2(r)   # return type Float64 is inferred


This caused a problem for me, beyond execution speed: I used a generator to create the elements for a comprehension, since the type was not inferred the zero-element could not be created.

Is this a known issue?

Stefan Karpinski

unread,
Sep 22, 2016, 3:27:37 PM9/22/16
to Julia Users
I see the same, yet:

julia> r = rand(10^5);

julia> @time test1(r)
  0.000246 seconds (7 allocations: 208 bytes)
33375.54531253989

julia> @time test2(r)
  0.001029 seconds (7 allocations: 781.500 KB)
33375.54531253966

So test1 is efficient, despite the codewarn output. Not sure what's up.

Christoph Ortner

unread,
Sep 22, 2016, 3:49:00 PM9/22/16
to julia-users
I didn't actually test performance - the problem for me was re-use of the output of test1. But it is hard to reproduce this with a simple example. The same code works in some situations and not in others - I haven't yet found out why.


Patrick Kofod Mogensen

unread,
Sep 22, 2016, 3:50:39 PM9/22/16
to julia-users
I've seen the same, and the answer I got at the JuliaLang gitter channel was that it could not be inferred because r could be of length 0, and in that case, the return type could not be inferred. My Julia-fu is too weak to then explain why the comprehension would be able to infer the return type.

Christoph Ortner

unread,
Sep 22, 2016, 4:43:57 PM9/22/16
to julia-users
sum( Float64[] ) = 0.0 ?

Christoph Ortner

unread,
Sep 22, 2016, 5:25:21 PM9/22/16
to julia-users
Yeah, this definitely matters for performance. It is a real shame since the generators are so elegant to use.

inner1(R, i) = sum( R[j,i] for j = 1:size(R,1) )
inner2
(R, i) = sum( [R[j,i] for j = 1:size(R,1)] )
   
function test(R, inner)
    n
= [ inner(R, i)^2 for i = 1:size(R,2) ]
    N
= length(n)
    s
= 0.0
   
for i = 1:N, j = max(1, i-5):min(N, i+5)
        s
+= n[i] * n[j]
   
end
   
return s
end
   
R
= rand(10, 1000);


@time test(R, inner1)
@time test(R, inner1)
@time test(R, inner1)  
#   0.002539 seconds (76.02 k allocations: 1.396 MB)
#   0.002264 seconds (76.02 k allocations: 1.396 MB)
#   0.002094 seconds (76.02 k allocations: 1.396 MB)


@time test(R, inner2)
@time test(R, inner2)
@time test(R, inner2)
#   0.000131 seconds (4.01 k allocations: 242.547 KB)
#   0.000106 seconds (4.01 k allocations: 242.547 KB)
#   0.000104 seconds (4.01 k allocations: 242.547 KB)



Christoph Ortner

unread,
Sep 22, 2016, 5:30:19 PM9/22/16
to julia-users
would it maybe be possible to introduce a macro like @inbounds that somehow turns off the check that the generator is empty?

Patrick Kofod Mogensen

unread,
Sep 22, 2016, 5:47:43 PM9/22/16
to julia-users
There might be a perfectly valid explanation for this, but this also surprises me.
r = rand(10)
f
(x) =  x^2
test1
(r) = sum( f(x) for t in r )
test2
(r) = sum( [f(x) for t in r] )

@code_warntype test1(r)   # return type Any is inferred
@code_warntype test2(r)   # return type Float64 is inferred


g
(x)::Float64 =  x^2
test3
(r) = sum( g(x) for t in r )
test4
(r) = sum( [g(x) for t in r] )
@code_warntype test3(r)   # return type Any is inferred
@code_warntype test4(r)   # return type Any is inferred

Why would the return type annotation in g(x) (compared to f(x)) ruin inference for test4? I might be doing something stupid, but...

Tsur Herman

unread,
Sep 22, 2016, 5:48:50 PM9/22/16
to julia-users

On my side both function perform equally. although test2 had to be timed twice to get to the same performance.

julia> test2(x)= sum( [t^2 for t in x] )

julia> @time test2(r)
  0.017423 seconds (13.22 k allocations: 1.339 MB)

julia> @time test2(r)
  0.000332 seconds (9 allocations: 781.531 KB)

I think the discrepancy  comes from the JITing process because if I time it without using  the macro @time, it works from the first run.

julia> test2(x)= sum( [t^2 for t in x] )
WARNING: Method definition test2(Any) in module Main at REPL[68]:1 overwritten at REPL[71]:1.
test2 (generic function with 1 method)

julia> tic();for i=1:10000 ; test2(r);end;toc()/10000
elapsed time: 3.090764498 seconds
0.0003090764498

About the memory footprint -> test2 first constructs the inner vector then calls sum.


since the type was not inferred the zero-element could not be created.
The sum of an empty set or vector is undefined it is not zero.
you can rewrite it in a more explicit way

test3(r) = begin
    total = Float64(0);
 for t in r total+=t ;end;end

Steven G. Johnson

unread,
Sep 22, 2016, 5:50:18 PM9/22/16
to julia-users
I don't think the empty case should be the problem.  If it can't infer the type, sum just throws an error.  So test1(r) actually always returns the same type for r::Array{Float64} in any case where it returns a value at al.

The real problem is that eltype(t^2 for t in rand(10)) returns Any.

Tsur Herman

unread,
Sep 22, 2016, 5:54:29 PM9/22/16
to julia-users
By the way my test3 functions is super fast

 
@time test3(r)
 
0.000032 seconds (4 allocations: 160 bytes)

Tsur Herman

unread,
Sep 22, 2016, 6:10:29 PM9/22/16
to julia-users
The real problem is that eltype(t^2 for t in rand(10)) returns Any.


that is not a problem (t^2 for t in rand(10)) is a generator its element type is Any which means a pointer to something complex.

Steven G. Johnson

unread,
Sep 22, 2016, 8:23:18 PM9/22/16
to julia-users


On Thursday, September 22, 2016 at 6:10:29 PM UTC-4, Tsur Herman wrote:
The real problem is that eltype(t^2 for t in rand(10)) returns Any.


that is not a problem (t^2 for t in rand(10)) is a generator its element type is Any which means a pointer to something complex.

It is a problem, because it means that the result type of sum cannot be inferred.

We could use type inference on the function t -> t^2 (which is buried in the generator) to determine a more specific eltype. 

Tsur Herman

unread,
Sep 23, 2016, 12:38:00 AM9/23/16
to julia...@googlegroups.com
I can a see a point in what you say .. eltype of a function should be the return type of that function if it can be inferred. 
Because an array is just a special kind of function with a special notation.

Milan Bouchet-Valat

unread,
Sep 23, 2016, 2:25:24 AM9/23/16
to julia...@googlegroups.com
Le jeudi 22 septembre 2016 à 14:54 -0700, Tsur Herman a écrit :
> By the way my test3 functions is super fast
>
>  @time test3(r)
>   0.000032 seconds (4 allocations: 160 bytes)
Beware, if you don't return 'total' from the function, LLVM optimizes
away the whole loops and turns the function into a no-op (have a look
at @code_llvm or @code_native).


Regards
> > > I've seen the same, and the answer I got at the JuliaLang gitter
> > > channel was that it could not be inferred because r could be of
> > > length 0, and in that case, the return type could not be
> > > inferred. My Julia-fu is too weak to then explain why the
> > > comprehension would be able to infer the return type.
> > >

Michele Zaffalon

unread,
Sep 23, 2016, 2:42:00 AM9/23/16
to julia...@googlegroups.com
On Fri, Sep 23, 2016 at 2:23 AM, Steven G. Johnson <steve...@gmail.com> wrote:

We could use type inference on the function t -> t^2 (which is buried in the generator) to determine a more specific eltype. 

Does this not require evaluating the function on all inputs thereby losing the advantage of having a generator? 

Christoph Ortner

unread,
Sep 23, 2016, 4:13:53 AM9/23/16
to julia-users
The sum of an empty set or vector is undefined it is not zero.
you can rewrite it in a more explicit way


Actually a sum over an empty set is normally defined to be zero while a product over an empty set is normally defined to be one.
 

Christoph Ortner

unread,
Sep 23, 2016, 4:14:35 AM9/23/16
to julia-users
why would type inference for sum(t^2 for t in r) be different from [t^2 for t in r] ?

Michele Zaffalon

unread,
Sep 23, 2016, 4:27:44 AM9/23/16
to julia...@googlegroups.com
From reading the manual, producing a value is delayed until that value is required. Is my understanding wrong?

Tsur Herman

unread,
Sep 23, 2016, 5:54:14 AM9/23/16
to julia-users

{We could use type inference on the function t -> t^2 (which is buried in the generator) to determine a more specific eltype.}

We can declare :

eltype(G::Base.Generator) = Base.code_typed(G.f,(eltype(G.iter),))[1].rettype

element type of Generator G will be the inferred return type of G.f with arguments of type eltype(G.iter)

And more generally the element type of function F with arguments args can be set to
Base.code_typed(F,args)[1].rettype 

Steven G. Johnson

unread,
Sep 23, 2016, 7:14:18 AM9/23/16
to julia-users
No, not if the eltype of the thing the generator iterates over (in this case, an Array{Float64}) is known.

Steven G. Johnson

unread,
Sep 23, 2016, 7:16:33 AM9/23/16
to julia-users
More precisely, the empty sum and product are the additive and multiplicative identities, respectively.

However, if you are summing something whose element type is unknown (Any), then the additive identity is also unknown (zero(Any) is undefined), and so sum(Any[]) is undefined (and throws an error).

Patrick Kofod Mogensen

unread,
Sep 23, 2016, 9:00:29 AM9/23/16
to julia-users
I still don't quite get why a) inference between the generator and the comprehension is different, and b) why inference went down the drain when I added the type annotation for the return value in my example above... Sorry if the answer is in this discussion somewhere!

Cristóvão Duarte Sousa

unread,
Sep 23, 2016, 9:35:28 AM9/23/16
to julia-users
On the other hand, it works ok for the mean function:

= rand(10)
test
(r) = mean( t^2 for t in r )
@code_warntype test(r)   # return type Float64 is inferred

Michele Zaffalon

unread,
Sep 24, 2016, 9:09:38 AM9/24/16
to julia...@googlegroups.com
Sorry for being slow: the input array rand(10) or the output array, the square of each element of rand(10)?

julia> (begin;println(t);t^2;end for t=1:10)
Base.Generator{UnitRange{Int64},##37#38}(#37,1:10)

Steven G. Johnson

unread,
Sep 24, 2016, 2:54:35 PM9/24/16
to julia-users


On Saturday, September 24, 2016 at 9:09:38 AM UTC-4, Michele Zaffalon wrote: 
Sorry for being slow: the input array rand(10) or the output array, the square of each element of rand(10)?

julia> (begin;println(t);t^2;end for t=1:10)
Base.Generator{UnitRange{Int64},##37#38}(#37,1:10)

Julia knows that the input to the generator is a UnitRange{Int64}, i.e. 1:10, so the input elements are Int64.   It knows that the function being computed is t -> t^2.   The compiler is smart enough that it can figure out that if t is an Int64, then t^2 is an Int64 too, because that function is type stable.   So, it can figure out that the eltype of the output (i.e. the Generator) is Int64, all at compile time without actually evaluating the function.

Michele Zaffalon

unread,
Sep 25, 2016, 1:03:44 PM9/25/16
to julia...@googlegroups.com
On Sat, Sep 24, 2016 at 8:54 PM, Steven G. Johnson <steve...@gmail.com> wrote:

julia> (begin;println(t);t^2;end for t=1:10)
Base.Generator{UnitRange{Int64},##37#38}(#37,1:10)

Julia knows that the input to the generator is a UnitRange{Int64}, i.e. 1:10, so the input elements are Int64.   It knows that the function being computed is t -> t^2.   The compiler is smart enough that it can figure out that if t is an Int64, then t^2 is an Int64 too, because that function is type stable.   So, it can figure out that the eltype of the output (i.e. the Generator) is Int64, all at compile time without actually evaluating the function.

It just feels like magic. Thank you for the explanation.

Christoph Ortner

unread,
Sep 25, 2016, 5:29:15 PM9/25/16
to julia-users

I didn't quite follow what the conclusion is: is it a bug that should be fixed (i.e. open an issue?), or is it expected behaviour and I should stop using generators when I need type inference?

Thanks.

Michele Zaffalon

unread,
Sep 25, 2016, 5:50:48 PM9/25/16
to julia...@googlegroups.com
In an email a few days back, Steven Johnson wrote: "We could use type inference on the function t -> t^2 (which is buried in the generator) to determine a more specific eltype." A feature request, maybe?
Reply all
Reply to author
Forward
0 new messages