[llvm-dev] LSR

Jonas Paulsson via llvm-dev

unread,

Apr 10, 2017, 9:47:22 AM4/10/17

to llvm-dev

Hi,

I find that LSR is not helping enough on avoiding unfoldable offsets for
SystemZ. When the loop has three stores with unfoldable offsets, LSR
rewrites the IV in a good way. However, if adding another store with a
foldable offset that fits already, LSR fails to rewrite the three stores.

And if I happen to add a too big *positive* offset (the first three were
negative) instead of a foldable one, only the positive gets transformed.

* LSR is not rewriting the IV to have three foldable offsets rather than
one.

* It would actually be preferred in this case to use a second address
register for the offset that is too far away from the others.

Has anyone any idea on how to best handle this? Can LSR "split" an IV to
use an extra register? Or would this need to be done in a target
specific pass?

For a reduced test case for this problem, see
https://bugs.llvm.org//show_bug.cgi?id=32548.

Thanks,

Jonas

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Hal Finkel via llvm-dev

unread,

Apr 10, 2017, 10:53:10 PM4/10/17

to Jonas Paulsson, llvm-dev

On 04/10/2017 08:47 AM, Jonas Paulsson via llvm-dev wrote:
> Hi,
>
> I find that LSR is not helping enough on avoiding unfoldable offsets
> for SystemZ. When the loop has three stores with unfoldable offsets,
> LSR rewrites the IV in a good way. However, if adding another store
> with a foldable offset that fits already, LSR fails to rewrite the
> three stores.
>
> And if I happen to add a too big *positive* offset (the first three
> were negative) instead of a foldable one, only the positive gets
> transformed.
>
> * LSR is not rewriting the IV to have three foldable offsets rather
> than one.
>
> * It would actually be preferred in this case to use a second address
> register for the offset that is too far away from the others.
>
> Has anyone any idea on how to best handle this? Can LSR "split" an IV
> to use an extra register? Or would this need to be done in a target
> specific pass?

When you say "an extra address register" would this imply LSR adding an
additional PHI?

-Hal

>
> For a reduced test case for this problem, see
> https://bugs.llvm.org//show_bug.cgi?id=32548.
>
> Thanks,
>
> Jonas
>
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Jonas Paulsson via llvm-dev

unread,

Apr 11, 2017, 9:52:56 AM4/11/17

to Hal Finkel, llvm-dev

>> Has anyone any idea on how to best handle this? Can LSR "split" an IV
>> to use an extra register? Or would this need to be done in a target
>> specific pass?
>
> When you say "an extra address register" would this imply LSR adding
> an additional PHI?
>
> -Hal
>

Yes, that would have worked well at least in this type of loop. Can LSR
do this?

I experimented with adding a check for 12 bit offsets distance in
isProfitableIncrement() (checking against all members of the chain),
which resulted in several chains being produced by LSR, instead of just
one. The chains that now formed now had immediate offsets that were
close to each other, so that they should result in addresses with 12 bit
offsets. But, to my disappointment, LSR did not handle these different
chains by generating new PHI-nodes for each of them (or by skipping
those that ended up with just one store in the chain), but instead it
still output the stores in the same way as before.

I also see that LSR is thinking in terms of increments between the
memory accesses. In the loop I am working with it's disappointing to see
that before each memory access, the base address is loaded into
register, and then the offset is added, and then the access, which is 3
instructions. It should have been just an add/sub after the previous
access before the memory access, per LSRs intentions. I wonder where
this is supposed to be handled: In some sort of target pre-isel pass
that chains the GEPs? Or is this just folded more often on other targets?

/Jonas

Hal Finkel via llvm-dev

unread,

Apr 12, 2017, 10:11:50 AM4/12/17

to Jonas Paulsson, llvm-dev

On 04/11/2017 08:52 AM, Jonas Paulsson wrote:
>
>>> Has anyone any idea on how to best handle this? Can LSR "split" an
>>> IV to use an extra register? Or would this need to be done in a
>>> target specific pass?
>>
>> When you say "an extra address register" would this imply LSR adding
>> an additional PHI?
>>
>> -Hal
>>
> Yes, that would have worked well at least in this type of loop. Can
> LSR do this?

No, LSR won't add new PHIs. This is a long-standing deficiency (and
also, in part, prevents it from properly handling pre/post-inc
addressing modes, which motives some target-specific passes such as
lib/Target/PowerPC/PPCLoopPreIncPrep.cpp).

>
> I experimented with adding a check for 12 bit offsets distance in
> isProfitableIncrement() (checking against all members of the chain),
> which resulted in several chains being produced by LSR, instead of
> just one. The chains that now formed now had immediate offsets that
> were close to each other, so that they should result in addresses with
> 12 bit offsets. But, to my disappointment, LSR did not handle these
> different chains by generating new PHI-nodes for each of them (or by
> skipping those that ended up with just one store in the chain), but
> instead it still output the stores in the same way as before.
>
> I also see that LSR is thinking in terms of increments between the
> memory accesses. In the loop I am working with it's disappointing to
> see that before each memory access, the base address is loaded into
> register, and then the offset is added, and then the access, which is
> 3 instructions. It should have been just an add/sub after the previous
> access before the memory access, per LSRs intentions. I wonder where
> this is supposed to be handled: In some sort of target pre-isel pass
> that chains the GEPs? Or is this just folded more often on other targets?

As I recall, it does not do this now (although this is also needed for
handling pre/post-inc addressing modes properly).

-Hal

>
> /Jonas
>

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________

Jonas Paulsson via llvm-dev

unread,

Apr 18, 2017, 1:29:07 AM4/18/17

to Hal Finkel, llvm-dev

Hi Hal,

>
> No, LSR won't add new PHIs. This is a long-standing deficiency (and
> also, in part, prevents it from properly handling pre/post-inc
> addressing modes, which motives some target-specific passes such as
> lib/Target/PowerPC/PPCLoopPreIncPrep.cpp).
>

I do find in this regression that gcc manages to use four address
registers, while llvm uses just one, with a lot of extra address
building instructions as a result, where the gcc loop looks very clean.
I suspect that this could be a needed feature. Would you recommend doing
this (splitting PHIs) as a "LSRPrep" pass, perhaps in the target, or
would you try to extend LSR itself?

>>
>>
>> I also see that LSR is thinking in terms of increments between the
>> memory accesses. In the loop I am working with it's disappointing to
>> see that before each memory access, the base address is loaded into
>> register, and then the offset is added, and then the access, which is
>> 3 instructions. It should have been just an add/sub after the
>> previous access before the memory access, per LSRs intentions. I
>> wonder where this is supposed to be handled: In some sort of target
>> pre-isel pass that chains the GEPs? Or is this just folded more often
>> on other targets?
>
> As I recall, it does not do this now (although this is also needed for
> handling pre/post-inc addressing modes properly).
>

Same question - It might be simpler to do a separate post-LSR GEP
handling (in CodeGenPrepare, perhaps?), but I suspect it would also be
possible to extend LSR to do this instead?

/Jonas

Hal Finkel via llvm-dev

unread,

Apr 19, 2017, 12:25:21 PM4/19/17

to Jonas Paulsson, llvm-dev, Andrew Trick

On 04/18/2017 12:28 AM, Jonas Paulsson wrote:
> Hi Hal,
>>
>> No, LSR won't add new PHIs. This is a long-standing deficiency (and
>> also, in part, prevents it from properly handling pre/post-inc
>> addressing modes, which motives some target-specific passes such as
>> lib/Target/PowerPC/PPCLoopPreIncPrep.cpp).
>>
> I do find in this regression that gcc manages to use four address
> registers, while llvm uses just one, with a lot of extra address
> building instructions as a result, where the gcc loop looks very
> clean. I suspect that this could be a needed feature. Would you
> recommend doing this (splitting PHIs) as a "LSRPrep" pass, perhaps in
> the target, or would you try to extend LSR itself?

I recall thinking that the "right" way to do this is to extend LSR
itself. This way the cost of adding extra PHIs could be weighed with
addressing modes, register pressure, and other factors. I have not,
however, looked enough into the details to say exactly how this would
work - if I had, I probably would have done it myself ;) -- cc'ing Andy
in case he'd like to chime in.

>
>>>
>>>
>>> I also see that LSR is thinking in terms of increments between the
>>> memory accesses. In the loop I am working with it's disappointing to
>>> see that before each memory access, the base address is loaded into
>>> register, and then the offset is added, and then the access, which
>>> is 3 instructions. It should have been just an add/sub after the
>>> previous access before the memory access, per LSRs intentions. I
>>> wonder where this is supposed to be handled: In some sort of target
>>> pre-isel pass that chains the GEPs? Or is this just folded more
>>> often on other targets?
>>
>> As I recall, it does not do this now (although this is also needed
>> for handling pre/post-inc addressing modes properly).
>>
> Same question - It might be simpler to do a separate post-LSR GEP
> handling (in CodeGenPrepare, perhaps?), but I suspect it would also be
> possible to extend LSR to do this instead?

Splitting is what, in practice, we have now. Targets have "fixup/prep"
passes to account for the fact that LSR won't add new PHIs. This seems
simpler, but it is probably also suboptimal (i.e. it works reasonably
for targets with simpler addressing modes, like PPC, although there are
still issues, but I don't see that it would work well for targets with
complicated ones).

-Hal

>
> /Jonas
>

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________

Reply all

Reply to author

Forward