Optimizing StepRanges in for-loops

Matthias Reisinger

unread,

Jun 22, 2016, 7:36:38 AM6/22/16

to julia-dev, Tobias Grosser, Jameson Nash, Tim Holy

Dear contributors,

recently I was inspecting the LLVM code that Julia produces when lowering loops of the form `for i = start:step:stop`. Julia will construct a `StepRange` for these loops to iterate over:

immutable StepRange{T,S} <: OrdinalRange{T,S}
    start::T
    step::S
    stop::T

    function StepRange(start::T, step::S, stop::T)
        new(start, step, steprange_last(start,step,stop))
    end
end

Fortunately, the `StepRange` constructor will be inlined and the only noticeable residual that will also remain in the LLVM code is the call to `steprange_last`. This, however, might become problematic when working with nested loops like

for i = 1:m, j = 1:step:n

For the `j` loop their would remain a call to `steprange_last` in the LLVM code that will not be hoisted outside the outer loop, i.e., `steprange_last` will be called at each iteration of the `i` loop. I noticed this when testing Polly on Julia because the presence of such calls within loops prevents Polly from optimizing such code regions. To make this more amenable to Polly it would therefore be necessary to "optimize away" these calls. I was especially thinking of situations in which the `step` is a known constant. For example, would it be possible to lower loops like

for i = 1:2:n
    ...
end

naively to something like

i = 1
while i <= n
    ...
    i += 2
end

Or likewise, when the step is a known negative constant like in

for i = n:-2:1
    ...
end

to

i = n
while i >= 1
    ...
    i -= 2
end

This would avoid the call to `steprange_last` which would maybe also be helpful for other optimizations outside Polly?

I am looking forward to your thoughts on this!

Best regards,

Matthias

Tobias Grosser

unread,

Jun 22, 2016, 10:35:35 AM6/22/16

to juli...@googlegroups.com

Hi Matthias,

thank you for this investigations.

On Wed, Jun 22, 2016, at 01:36 PM, Matthias Reisinger wrote:
> Dear contributors,
>
> recently I was inspecting the LLVM code that Julia produces when lowering
> loops of the form `for i = start:step:stop`. Julia will construct a
> `StepRange` for these loops to iterate over:
>
> immutable StepRange{T,S} <: OrdinalRange{T,S}
> start::T
> step::S
> stop::T
>
> function StepRange(start::T, step::S, stop::T)
> new(start, step, steprange_last(start,step,stop))
> end
> end
>
>
> Fortunately, the `StepRange` constructor will be inlined and the only
> noticeable residual that will also remain in the LLVM code is the call to
> `steprange_last`. This, however, might become problematic when working
> with
> nested loops like

Did you investigate why this call is not inlined. The function seems to
be trivial enough that the inliner should possibly catch it. Is this due
to the inliner believing that it is not profitable to inline
stepprange_last? Or is there some issue with when the inliner is called
(if at all).

Best,
Tobias

Message has been deleted

Matthias Reisinger

unread,

Jun 23, 2016, 7:17:59 AM6/23/16

to julia-dev, poll...@googlegroups.com, Tobias Grosser

Hi Tobias,

thank you for the answer! After looking at jitlayers.cpp [1] it seems that Julia does not register LLVM's inlining pass. There only seems to exist explicit inlining in `emit_function` in codegen.cpp [2] and I'm not sure how this mechanism decides what to inline. I also naively tried to annotate `steprange_last` via `@inline` in order to test the effects of inlining it but at this point in the base library `@inline` is not yet available. I therefore made a naive experiment by defining my own version of `steprange_last` to imitate the result of inlining it. To analyse this, I used the following function:

@polly function steprange_test(A)
    m,n = size(A)
    for i = 1:m
        start = 1
        step = 2
        last = my_inlined_steprange_last(1,2,n)
        j = start
        while j <= last
            A[i,j] = 0
            j += step
        end
    end
end

The code that is generated for this example should roughly correspond to

@polly function steprange_test(A)
    m,n = size(A)
    for i = 1:m, j = 1:2:n
        A[i,j] = 0
    end
end

with the difference that the call to `inlined_steprange_last` will be inlined. Polly will produce the following output for the former example (for "-debug-only=polly-detect"):

Checking region: top => <Function Return>

Top level region is invalid

Checking region: top.split => L22

Non affine loop bound '***COULDNOTCOMPUTE***' in loop: if44

Checking region: L8 => L.L22_crit_edge

Non affine loop bound '***COULDNOTCOMPUTE***' in loop: if44

Region can not profitably be optimized!

Checking region: L18 => L.loopexit

Non affine loop bound '***COULDNOTCOMPUTE***' in loop: if44

Checking region: if44 => L19.L.loopexit_crit_edge

OK

Expanding if44 => L19.L.loopexit_crit_edge

Trying if44 => L.loopexit

to if44 => L.loopexit

Region can not profitably be optimized!

The region "if44 => L.loopexit" marks the inner loop, so it seems that Polly will only be able to optimize the inner code region in this example. I also tried to optimize a simplified version of this, to make sure that Polly would be able in general to optimize such a code region when emitted by Julia:

@polly function simplified_steprange_test(A)
    m,n = size(A)
    for i = 1:m
        j = 1
        while j <= n
            A[i,j] = 0
            j += 2
        end
    end
end

for which Polly will print the following:

Checking region: top => <Function Return>

Top level region is invalid

Checking region: top.split => L5

OK

Expanding top.split => L5

Expanding top.split => L5 failed

Checking region: top.split => L5

OK

The region "top.split => L5" will contain both loops, so Polly will be able to optimize also the outer loop nest here. However, I have not yet analysed the exact reason why the former example is rejected by Polly, however it seems that the inlining here will cause quite complex control flow in the outermost loop. I also attached for both examples the LLVM code emitted by Julia, just before Polly is invoked (or to be more precise, after Polly's code-preparation pass has run).

Please correct me in case I made wrong assumptions about the result of the inlining here or in case I missed anything important. I am looking forward to your thoughts on this!

Best regards,

Matthias

[1] https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp

[2] https://github.com/JuliaLang/julia/blob/master/src/codegen.cpp#L4811

steprange_test.ll

simplified_steprange_test.ll

Tim Holy

unread,

Jun 23, 2016, 2:44:05 PM6/23/16

to juli...@googlegroups.com

Hi Matthias,

As you observed, eariy in julia's bootstrap you can't say

@inline function foo(args...)
statement1
...

but you can say

function foo(args...)
@_inline_meta
statement1
...

Best,
--Tim

On Thursday, June 23, 2016 2:33:32 AM CDT Matthias Reisinger wrote:
> Hi Tobias,
>
> thank you for the answer! After looking at jitlayers.cpp [1] it seems that
> Julia does not register LLVM's inlining pass. There only seems to exist
> explicit inlining in `emit_function` in codegen.cpp [2] and I'm not sure
> how this mechanism decides what to inline. I also naively tried to annotate

> `steprange_last` via with `@inline` in order to test the effects of

> inlining it but at this point in the base library `@inline` is not yet
> available. I therefore made a naive experiment by defining my own version

> of `steprange_last` to immitate the result of inlining it. To analyse this,

> I used the following function:
>

> @polly function foo(A)

> m,n = size(A)
> for i = 1:m
> start = 1
> step = 2
> last = my_inlined_steprange_last(1,2,n)
> j = start
> while j <= last
> A[i,j] = 0
> j += step
> end
> end
> end
>
> The code that is generated for this example should roughly correspond to
>

> @polly function foo(A)
> m,n = size(A)

> for i = 1:m, j = 1:2:n
> A[i,j] = 0
> end
> end
>
>
>
>
>

> @polly function init_array(A)

> m,n = size(A)
> for i = 1:m
> j = 1
> while j <= n
> A[i,j] = 0
> j += 2
> end
> end
> end
>
>
>

> [x] https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp
> [x] https://github.com/JuliaLang/julia/blob/master/src/codegen.cpp#L4811

>
> On Wednesday, 22 June 2016 16:35:35 UTC+2, Tobias Grosser wrote:

Matthias Reisinger

unread,

Jun 23, 2016, 3:38:08 PM6/23/16

to julia-dev

Hi Tim,

thank you for the hint, that's indeed good to know.

Just a side note: I saw that the message that you replied to is the draft from earlier today for which I accidentally hit the post-button, sorry for that.

Best,

Matthias

Tobias Grosser

unread,

Jun 23, 2016, 3:52:37 PM6/23/16

to Matthias Reisinger, julia-dev, poll...@googlegroups.com

I also just had a look at steprange and it seems the PHI nodes %last2.1
is pretty complicated especially due to the computations around
%remain.0 that feed into it. If you add a full -O3 sequence before
running Polly, some of this is simplified away, but Polly is still not
able to catch this. Getting any insights where all this code comes from
would be interesting.

Best,
Tobias

Matthias Reisinger

unread,

Jun 24, 2016, 6:02:41 AM6/24/16

to julia-dev, matthias.j...@gmail.com, poll...@googlegroups.com, Tobias Grosser, Jameson Nash, Tim Holy

Thank you for bringing this to notice, Tobias. However, unfortunately, I also noticed with the help of `@_inline_meta` that inlining `steprange_last` in the constructor of `StepRange` will destroy the ability to inline its object construction, which is also already reflected in the comment on `steprange_last` in `base/range.jl`:

# to make StepRange constructor inlineable, so optimizer can see `step` value
function steprange_last{T}(start::T, step, stop) ...

This seems to lower the chances of getting an inlined version of `steprange_last`. Maybe it's possible to change its implementation to make it inlineable without destroying the ability to inline the `StepRange` constructor? However, the logic within `steprange_last` suggests that there have been put profound thoughts in its implementation.

Best regards,

Matthias

Jameson Nash

unread,

Jun 25, 2016, 4:12:36 PM6/25/16

to juli...@googlegroups.com, poll...@googlegroups.com, Tobias Grosser

We don't use the llvm inlining pass since we have already optimized the IR to handle inlining in inference.jl. Take a look at `inlineable` to see where we do the inlining, or `inline_worthy` to see where we compute the cost metric that decides if it appears likely to be profitable.

Reply all

Reply to author

Forward