While experimenting with alternative ways of lowering funnel shifts on
AMDGPU I have run into a case where this GMIR:
%0:vgpr(s32) = COPY $vgpr0
%1:vgpr(s32) = COPY $vgpr1
%2:sgpr_64 = COPY $sgpr30_sgpr31
%3:sgpr(s32) = G_CONSTANT i32 5
%7:vgpr(s64) = G_MERGE_VALUES %1(s32), %0(s32)
%9:vgpr(s32) = COPY %3(s32)
%8:vgpr(s64) = G_LSHR %7, %9(s32)
%4:vgpr(s32) = G_TRUNC %8(s64)
$vgpr0 = COPY %4(s32)
%5:ccr_sgpr_64 = COPY %2
S_SETPC_B64_return %5, implicit $vgpr0
does not get matched by this selection pattern:
def : GCNPat<(i32 (trunc (srl i64:$src0, (i32 ShiftAmt32Imm:$src1)))),
(V_ALIGNBIT_B32_e64 (i32 (EXTRACT_SUBREG VReg_64:$src0, sub1)),
(i32 (EXTRACT_SUBREG VReg_64:$src0, sub0)),
VSrc_b32:$src1)>;
because the COPY of G_CONSTANT 5 from sgpr to vgpr gets in the way of
matching the (i32 ShiftAmt32Imm:$src1).
What's a good way of fixing this? I know there has been some
discussion before about whether the generated matcher should look
through copies, either automatically or only when the tablegen pattern
is augmented with some kind of "please look through copies here" node.
As an alternative, would it make sense to run some cross-bank constant
propagation after regbankselect? It seems pretty obvious to me that %9
above should be simplified to:
%9:vgpr(s32) = G_CONSTANT i32 5
so long as it's legal, which we can easily check, can't we?
Thanks,
Jay.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
For simplicity our regbankselect says that all operands of VALU
instructions have to go in vgprs. Moving some of them into sgprs is
left as an optimisation for a later pass. As you know there are limits
on //how many// operands of a VALU instruction can be sgprs or
constants, which are not simple to express in terms of alternative
operand mappings.
Thanks,
Jay.
For simplicity our regbankselect says that all operands of VALU
instructions have to go in vgprs. Moving some of them into sgprs is
left as an optimisation for a later pass. As you know there are limits
on //how many// operands of a VALU instruction can be sgprs or
constants, which are not simple to express in terms of alternative
operand mappings.
Yes, good point.
> The remaining options I see right now are:
>
> (1) Allow pattern matching to look through COPYs.
> (2) Eliminate those COPYs before instruction selection.
>
> Not having thought about this very long, the relative merits seem to be:
>
> - (2) results in smaller GMIR and a simpler pattern matcher, which is better for compile time
> - (2) requires legalization after(?) instruction selection (constant bus limit etc.), which is going to be a pain in the existing CodeGen framework
>
> Neither option makes me happy :) I guess I would go with (2) if this was a greenfield development, but (1) fits better in the existing LLVM code base.
Here's a patch that tackles (1): https://reviews.llvm.org/D86294 . The
current version of the patch adds a dag operator that looks through
copies. I'm not particularly thrilled about that because as a pattern
writer I don't know when I would be required to use it, so I would
probably sprinkle it everywhere.
Previous versions of the same patch changed GlobalISel pattern
matching to look through copies automatically, without any special dag
operator. This seems much more user friendly to me, but Matt argued
that it "is selecting instructions incorrectly, and then relying on
something else to fix them up".
To be honest I am confused about the role of RegBankSelect when you
have instruction selection patterns that match more than one GMIR
node. How can RegBankSelect possibly tell you what alternative
mappings (i.e. instructions) are available, without predicting what
patterns will be matched by instruction selection?
> On Mar 27, 2021, at 04:56, Jay Foad via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Hi Nicolai!
>
> For simplicity our regbankselect says that all operands of VALU
> instructions have to go in vgprs. Moving some of them into sgprs is
> left as an optimisation for a later pass. As you know there are limits
> on //how many// operands of a VALU instruction can be sgprs or
> constants, which are not simple to express in terms of alternative
> operand mappings.
>
> Thanks,
> Jay.
There are 2 issues:
1. Current RegBankSelect does not consider the uses when selecting the bank. This is a general missing optimization
2. For the AMDGPU case, I think we should have a post-regbankselect combiner for this. It’s often better to materialize constants for each bank
I don’t think we actually want to have to look through copies, and the places we do are just working around the status quo.
The folding SGPR/constants into instructions should be a new and improved version of SIFoldOperands. I think optimizing this is beyond the scope of what RegBankSelect and selection patterns. Far too much code would need to be taught to respect and preserve the constant bus limitation otherwise, so that’s why everything uses VGPRs.
-Matt
On Mar 29, 2021, at 06:14, Jay Foad <jay....@gmail.com> wrote:To be honest I am confused about the role of RegBankSelect when you
have instruction selection patterns that match more than one GMIR
node. How can RegBankSelect possibly tell you what alternative
mappings (i.e. instructions) are available, without predicting what
patterns will be matched by instruction selection?
I can understand leaving it to a later pass to fold //sgprs or
constants// into an instruction. What I can't understand is how you do
the same kind of thing for more complex selection patterns like:
t:sgpr = G_ADD y:sgpr, z:sgpr
t':vgpr = COPY t:sgpr
r:vgpr = G_ADD x:vgpt, t':vgpr
How can we select v_add3_u32 from this? I can only think of two options:
1. Select s_add and v_add and leave it to a later pass to combine
them. This seems to be giving up on doing decent pattern-based
instruction selection.
2. Match it in the instruction selector, using a pattern that
(explicitly or implicitly) looks through the cross-bank copy. But then
you're back to the problem that two of the inputs are sgprs, which may
or may not be valid according to complex operand restrictions.
Thanks,
Jay.
On Mon, 29 Mar 2021 at 14:04, Matt Arsenault <ars...@gmail.com> wrote:
> > On Mar 27, 2021, at 04:56, Jay Foad via llvm-dev <llvm...@lists.llvm.org> wrote:
> >
> > Hi Nicolai!
> >
> > For simplicity our regbankselect says that all operands of VALU
> > instructions have to go in vgprs. Moving some of them into sgprs is
> > left as an optimisation for a later pass. As you know there are limits
> > on //how many// operands of a VALU instruction can be sgprs or
> > constants, which are not simple to express in terms of alternative
> > operand mappings.
> >
> > Thanks,
> > Jay.
>
>
> There are 2 issues:
> 1. Current RegBankSelect does not consider the uses when selecting the bank. This is a general missing optimization
> 2. For the AMDGPU case, I think we should have a post-regbankselect combiner for this. It’s often better to materialize constants for each bank
>
> I don’t think we actually want to have to look through copies, and the places we do are just working around the status quo.
>
> The folding SGPR/constants into instructions should be a new and improved version of SIFoldOperands. I think optimizing this is beyond the scope of what RegBankSelect and selection patterns. Far too much code would need to be taught to respect and preserve the constant bus limitation otherwise, so that’s why everything uses VGPRs.
I can understand leaving it to a later pass to fold //sgprs or
constants// into an instruction. What I can't understand is how you do
the same kind of thing for more complex selection patterns like:
t:sgpr = G_ADD y:sgpr, z:sgpr
t':vgpr = COPY t:sgpr
r:vgpr = G_ADD x:vgpt, t':vgpr
How can we select v_add3_u32 from this? I can only think of two options:
1. Select s_add and v_add and leave it to a later pass to combine
them. This seems to be giving up on doing decent pattern-based
instruction selection.
2. Match it in the instruction selector, using a pattern that
(explicitly or implicitly) looks through the cross-bank copy. But then
you're back to the problem that two of the inputs are sgprs, which may
or may not be valid according to complex operand restrictions.
Thanks,
Jay.
Yes but I had always assumed that was just a heuristic. Are we saying
it's required for correctness? Actually I'm confused about the whole
concept of checking for constant bus violations at this stage.
In a normal compiler, the instruction selector would be allowed to
select any instruction that works, regardless of register classes, and
it would be the register allocator's job to copy the input values into
suitable input registers. So given this GMIR:
t:sgpr = G_ADD y:sgpr, z:sgpr
t':vgpr = COPY t:sgpr
r:vgpr = G_ADD x:vgpt, t':vgpr
I would naively hope that the instruction selector could select this
without worrying about register banks or constant bus restrictions:
%4:vgpr_32 = V_ADD3_U32 %0:vgpr_32, %1:vgpr_32, %2:vgpr_32
Then optionally SIFoldOperands (which does know about constant bus
restrictions) could modify some of the input register classes to take
sgprs instead.
Then the register allocator would do its job, inserting sgpr-to-vgpr
copies as and where necessary.
On Mon, 29 Mar 2021 at 19:51, Nicolai Hähnle <nhae...@gmail.com> wrote:
> On Mon, Mar 29, 2021 at 3:34 PM Jay Foad <jay....@gmail.com> wrote:
>> On Mon, 29 Mar 2021 at 14:04, Matt Arsenault <ars...@gmail.com> wrote:
>> > I don’t think we actually want to have to look through copies, and the places we do are just working around the status quo.
>> >
>> > The folding SGPR/constants into instructions should be a new and improved version of SIFoldOperands. I think optimizing this is beyond the scope of what RegBankSelect and selection patterns. Far too much code would need to be taught to respect and preserve the constant bus limitation otherwise, so that’s why everything uses VGPRs.
>>
>> I can understand leaving it to a later pass to fold //sgprs or
>> constants// into an instruction. What I can't understand is how you do
>> the same kind of thing for more complex selection patterns like:
>>
>> t:sgpr = G_ADD y:sgpr, z:sgpr
>> t':vgpr = COPY t:sgpr
>> r:vgpr = G_ADD x:vgpt, t':vgpr
>>
>> How can we select v_add3_u32 from this? I can only think of two options:
>>
>> 1. Select s_add and v_add and leave it to a later pass to combine
>> them. This seems to be giving up on doing decent pattern-based
>> instruction selection.
>> 2. Match it in the instruction selector, using a pattern that
>> (explicitly or implicitly) looks through the cross-bank copy. But then
>> you're back to the problem that two of the inputs are sgprs, which may
>> or may not be valid according to complex operand restrictions.
>
> The SelectionDAG pattern for v_add3 and friends already checks this using a C++ code fragment, doesn't it?
Yes but I had always assumed that was just a heuristic. Are we saying
it's required for correctness? Actually I'm confused about the whole
concept of checking for constant bus violations at this stage.
In a normal compiler, the instruction selector would be allowed to
select any instruction that works, regardless of register classes, and
it would be the register allocator's job to copy the input values into
suitable input registers. So given this GMIR:
t:sgpr = G_ADD y:sgpr, z:sgpr
t':vgpr = COPY t:sgpr
r:vgpr = G_ADD x:vgpt, t':vgpr
I would naively hope that the instruction selector could select this
without worrying about register banks or constant bus restrictions:
%4:vgpr_32 = V_ADD3_U32 %0:vgpr_32, %1:vgpr_32, %2:vgpr_32
Then optionally SIFoldOperands (which does know about constant bus
restrictions) could modify some of the input register classes to take
sgprs instead.
Then the register allocator would do its job, inserting sgpr-to-vgpr
copies as and where necessary.
Jay.
Jay.