By default, InstCombine tries to sink an instruction into a successor basic block when possible, so that the instruction isn't executed on paths where its result isn't needed. But doing so can also increase a value's live range. For example:
```llvm
entry:
  ..
  %6 = load float, ..
  %s.0 = load float, ..
  %mul22 = fmul float %6, %s.0
  %add23 = fadd float %mul22, zeroinitializer
  %7 = load float, ..
  %s.1 = load float, ..
  %mul26 = fmul float %7, %s.1
  %add27 = fadd float %add23, %mul26
  ..
  br i1 %cmp, label %cleanup, label %if.end1

if.end1:
  %15 = load float, ..
  %add67 = fadd float %add27, %15
  store float %add67, ..
  br label %cleanup

cleanup:
  ret void
```
In the original input, only %add27 has a longer live range, but after InstCombine with instcombine-code-sinking=true (the default), %6, %s.0, %7, and %s.1 end up with longer live ranges:
```llvm
entry:
  ..
  %6 = load float, ..
  %s.0 = load float, ..
  %7 = load float, ..
  %s.1 = load float, ..
  ..
  br i1 %cmp, label %cleanup, label %if.end1

if.end1:
  %mul22 = fmul float %6, %s.0
  %add23 = fadd float %mul22, zeroinitializer
  %mul26 = fmul float %7, %s.1
  %add27 = fadd float %add23, %mul26
  %15 = load float, ..
  %add67 = fadd float %add27, %15
  store float %add67, ..
  br label %cleanup

cleanup:
  ret void
```
This causes an issue in our customized register allocator, which ends up keeping values like %6, %s.0, %7, and %s.1 in registers for a long period.
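For experimentation, the sinking behavior can be toggled with the instcombine-code-sinking option mentioned above. A minimal sketch (assumes LLVM's `opt` is installed; the test IR below is a made-up reduction of the pattern, not the original input):

```shell
# Write a small reduction of the pattern: a value computed in the entry
# block but only used on one branch (hypothetical test input).
cat > /tmp/sink_test.ll <<'EOF'
define float @f(float %a, float %b, i1 %c) {
entry:
  %m = fmul float %a, %b
  br i1 %c, label %use, label %exit
use:
  ret float %m
exit:
  ret float 0.0
}
EOF
# Compare InstCombine output with code sinking disabled vs. the default.
if command -v opt >/dev/null 2>&1; then
  opt -passes=instcombine -instcombine-code-sinking=false -S /tmp/sink_test.ll
else
  echo "opt not found; skipping"
fi
```

Diffing this against a run without `-instcombine-code-sinking=false` shows whether the `fmul` stays in `entry` or moves into `use`.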
My questions are:
Does LLVM expect the backend's instruction scheduler and register allocator to handle this properly?
Can this be solved by LLVM's GlobalISel?
Thank you!
CY
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
The original input pattern is as below:
```c
/* local memory */
for (...) {
    /* a function (with side effects) that copies from global to local memory */
    /* access data in local memory and do the compute */
}
if (...)
    return;
/* store the computed result back */
```
If the for loop is fully unrolled, and the computing part is sunk into the basic block that stores the computed result back, then the backend compiler needs to find places (registers or memory) to keep the copied data.
I've tested with aarch64 and amdgcn; on this test pattern both targets spill the data to memory. If in the for loop we copy directly instead of calling a copy function, both targets can generate better basic-block layouts (aarch64: the "Machine code sinking (machine-sink)" pass, amdgcn: the "Code sinking (sink)" pass).
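To see what the machine-level pass does on a given target, the MIR can be dumped right after it runs. A sketch, assuming LLVM's `llc` is installed and using the machine-sink pass name quoted above (the test IR is again a made-up reduction):

```shell
# Hypothetical reduction: value computed in entry, used on one branch only.
cat > /tmp/sink_demo.ll <<'EOF'
define float @f(float %a, float %b, i1 %c) {
entry:
  %m = fmul float %a, %b
  br i1 %c, label %use, label %done
use:
  ret float %m
done:
  ret float 0.0
}
EOF
# Dump the machine IR after the machine-sink pass (output goes to stderr).
if command -v llc >/dev/null 2>&1; then
  llc -mtriple=aarch64 -print-after=machine-sink /tmp/sink_demo.ll -o /dev/null
else
  echo "llc not found; skipping"
fi
```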
> On Thu, Oct 14, 2021 at 6:02 AM Amara Emerson <am...@apple.com> wrote:
> > Can this be solved by llvm’s GlobalISel?
> GlobalISel’s function-scope optimization doesn’t really help in these cases unless the target can somehow fold expressions into simpler instructions. If that’s not possible, the generated code should be fairly similar to that of SelectionDAG.