[llvm-dev] GlobalISel round table follow-up: multi-stage legalization

19 views

Skip to first unread message

Dominik Montada via llvm-dev

unread,

Oct 7, 2020, 8:07:27 AM10/7/20

to LLVM Developers' List

Hi all,

first of all thanks to Matt for hosting the round table yesterday and to
everyone else for their time and the valuable discussions. As promised
in the chat yesterday, here is the email regarding the legalizer /
artifact combiner issue we talked about. I will send a separate email
regarding the regbankselect discussion.

In case someone on the list didn't attend the round table yesterday,
here is quick summary:

One problem that some of us are seeing in the legalizer is that
sometimes instructions get expanded before they get folded away in the
combiner. IIRC Matt sees this happening with division, while I am seeing
something similar with unmerge.

To my particular problem: due to the nature of the architecture I'm
working with, we have pretty strict legalization rules. One example is
that we only allow unmerges from 64 to 32 bit. Unmerges of 32-bit or
less get lowered to bit-arithmetic. However if we would do the same for
anything bigger than 64-bit, the legalization of the resulting bit
arithmetic would introduce illegal unmerges again, which then cause an
endless loop in the legalizer.

So our rules are carefully designed to create exactly the unmerges we
can handle and let everything else get handled by the artifact combiner.

The artifact combiner usually cannot get rid of the illegal unmerges on
the first go and must do some intermediary combines first. Some of those
try to create other illegal unmerges but ask the legalizer whether those
are supported. If we only provide rules which work on the unmerges that
we can handle, then this query would return false and the combine would
not work. But we also cannot create a rule for those unmerges without
causing endless loops through legalization.

The way we get around this problem is to add a custom rule, in which we
simply return false (i.e. UnableToLegalize). This is enough for the
combiner to produce the intermediary unmerge (because it is now support
through the custom rule), while also getting around the problem of
expanding something we don't want in the legalizer (because we say we
didn't legalize anything). This is obviously a hack.

One of the ideas that was floated around yesterday sounded quite
interesting to me: multi-stage legalization where you could specify
which of your rules apply at which stage. I'm pretty sure this would
solve our problem. In our case we would declare all artifacts as legal
in the first stage to not hinder the combiner and in the second stage we
could then focus on actually legalizing any left-over artifacts we have.

I do however see the problem that this could clutter up the existing
legalization info. Due to the amount of instructions and rules, it
already is quite complex and if rules could apply to different stages in
the same file, it could make it quite difficult to understand what
exactly is happening now.

I think Aditya pointed out that multi-stage legalization might be
already possible by just having two legalizer passes with different
legalization info and I feel like this might be the better approach of
the two. I guess this would still require some tweeks as currently in
llc we can only say `-stop-before/after=legalizer` but not which one of
those.

Another thing I was thinking of when I implemented this hack for our
use-case was that we need some kind of rule which tells the combiner
that something is supported but is actually not doing any legalization
in the legalizer (something like a `.combine()`, `.combineFor({s96})`).

Cheers,

Dominik

Daniel Sanders via llvm-dev

unread,

Oct 12, 2020, 2:56:00 PM10/12/20

to Dominik Montada, LLVM Developers' List

> On 7 Oct 2020, at 05:07, Dominik Montada <dominik...@hightec-rt.com> wrote:
>
> Hi all,
>
> first of all thanks to Matt for hosting the round table yesterday and to everyone else for their time and the valuable discussions. As promised in the chat yesterday, here is the email regarding the legalizer / artifact combiner issue we talked about. I will send a separate email regarding the regbankselect discussion.
>
> In case someone on the list didn't attend the round table yesterday, here is quick summary:
>
> One problem that some of us are seeing in the legalizer is that sometimes instructions get expanded before they get folded away in the combiner. IIRC Matt sees this happening with division, while I am seeing something similar with unmerge.

IIRC there were two main manifestations of it. For one it ends up at the same MIR but takes a slower-than-necessary route to get there. For the other it ends up with worse code which potentially can't be folded post-legalization.

> To my particular problem: due to the nature of the architecture I'm working with, we have pretty strict legalization rules. One example is that we only allow unmerges from 64 to 32 bit. Unmerges of 32-bit or less get lowered to bit-arithmetic. However if we would do the same for anything bigger than 64-bit, the legalization of the resulting bit arithmetic would introduce illegal unmerges again, which then cause an endless loop in the legalizer.

Are there particular >s64 cases that are the problem or is it all of them? I'd expect s128, s256, etc. to G_UNMERGE fairly simply but non-powers-of-2 are more likely to be tricky

> So our rules are carefully designed to create exactly the unmerges we can handle and let everything else get handled by the artifact combiner.
>
> The artifact combiner usually cannot get rid of the illegal unmerges on the first go and must do some intermediary combines first. Some of those try to create other illegal unmerges but ask the legalizer whether those are supported. If we only provide rules which work on the unmerges that we can handle, then this query would return false and the combine would not work. But we also cannot create a rule for those unmerges without causing endless loops through legalization.
>
> The way we get around this problem is to add a custom rule, in which we simply return false (i.e. UnableToLegalize). This is enough for the combiner to produce the intermediary unmerge (because it is now support through the custom rule), while also getting around the problem of expanding something we don't want in the legalizer (because we say we didn't legalize anything). This is obviously a hack.
>
> One of the ideas that was floated around yesterday sounded quite interesting to me: multi-stage legalization where you could specify which of your rules apply at which stage. I'm pretty sure this would solve our problem. In our case we would declare all artifacts as legal in the first stage to not hinder the combiner and in the second stage we could then focus on actually legalizing any left-over artifacts we have.
>
> I do however see the problem that this could clutter up the existing legalization info. Due to the amount of instructions and rules, it already is quite complex and if rules could apply to different stages in the same file, it could make it quite difficult to understand what exactly is happening now.

It would definitely add some clutter but I suspect it would be manageable. Essentially it would be a common ruleset for most operations and each pass would add its own version of the merge/unmerge rules.

> I think Aditya pointed out that multi-stage legalization might be already possible by just having two legalizer passes with different legalization info and I feel like this might be the better approach of the two. I guess this would still require some tweeks as currently in llc we can only say `-stop-before/after=legalizer` but not which one of those.

That's right. Each legalizer pass owns it's own ruleset so two passes is a possibility. The -stop-before/after bit has been solved for some other passes but it does need a bit of boilerplate. Each subclass needs it's own pass id and INITIALIZE_* macros and getPassName() needs to be overridable.

> Another thing I was thinking of when I implemented this hack for our use-case was that we need some kind of rule which tells the combiner that something is supported but is actually not doing any legalization in the legalizer (something like a `.combine()`, `.combineFor({s96})`).

I'm not quite sure what you mean here. Are you thinking of legalization rules for combining or something like legalization rules but for the combiner? Are you thinking of artifact combines in particular or more generally for combines?

> Cheers,
>
> Dominik
>

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Dominik Montada via llvm-dev

unread,

Oct 13, 2020, 3:48:18 AM10/13/20

to Daniel Sanders, LLVM Developers' List

Hi Daniel,

thanks for the follow-up! I left inline comments down below.

Am 12.10.20 um 20:55 schrieb Daniel Sanders:

>> One problem that some of us are seeing in the legalizer is that sometimes instructions get expanded before they get folded away in the combiner. IIRC Matt sees this happening with division, while I am seeing something similar with unmerge.
> IIRC there were two main manifestations of it. For one it ends up at the same MIR but takes a slower-than-necessary route to get there. For the other it ends up with worse code which potentially can't be folded post-legalization.

I see. In our case it is definitely the latter: we end with with
worse-code, or rather we end up with code which cannot be legalized
further and therefore stops the compilation.

>> To my particular problem: due to the nature of the architecture I'm working with, we have pretty strict legalization rules. One example is that we only allow unmerges from 64 to 32 bit. Unmerges of 32-bit or less get lowered to bit-arithmetic. However if we would do the same for anything bigger than 64-bit, the legalization of the resulting bit arithmetic would introduce illegal unmerges again, which then cause an endless loop in the legalizer.
> Are there particular >s64 cases that are the problem or is it all of them? I'd expect s128, s256, etc. to G_UNMERGE fairly simply but non-powers-of-2 are more likely to be tricky

I only noticed this problem with non-power-of-2 cases. Particularly s96,
which we often encounter for some reason.

>> One of the ideas that was floated around yesterday sounded quite interesting to me: multi-stage legalization where you could specify which of your rules apply at which stage. I'm pretty sure this would solve our problem. In our case we would declare all artifacts as legal in the first stage to not hinder the combiner and in the second stage we could then focus on actually legalizing any left-over artifacts we have.
>>
>> I do however see the problem that this could clutter up the existing legalization info. Due to the amount of instructions and rules, it already is quite complex and if rules could apply to different stages in the same file, it could make it quite difficult to understand what exactly is happening now.
> It would definitely add some clutter but I suspect it would be manageable. Essentially it would be a common ruleset for most operations and each pass would add its own version of the merge/unmerge rules.

I think as long as most operations would use the common ruleset, it
would indeed be manageable. But I suspect it would blow up significantly
when the rule sets would largely differ. In those cases it would
probably be better to allow a backend to define different legalization
infos per stage. To be fair, I don't see a use-case for such a blown-up
case at the moment, but if its fairly trivial, we shouldn't restrict
backends too much here IMO.

>> I think Aditya pointed out that multi-stage legalization might be already possible by just having two legalizer passes with different legalization info and I feel like this might be the better approach of the two. I guess this would still require some tweeks as currently in llc we can only say `-stop-before/after=legalizer` but not which one of those.
> That's right. Each legalizer pass owns it's own ruleset so two passes is a possibility. The -stop-before/after bit has been solved for some other passes but it does need a bit of boilerplate. Each subclass needs it's own pass id and INITIALIZE_* macros and getPassName() needs to be overridable.

Is there an existing example that I could look at? I would be fairly
interested to try this out in our backend.

>> Another thing I was thinking of when I implemented this hack for our use-case was that we need some kind of rule which tells the combiner that something is supported but is actually not doing any legalization in the legalizer (something like a `.combine()`, `.combineFor({s96})`).
> I'm not quite sure what you mean here. Are you thinking of legalization rules for combining or something like legalization rules but for the combiner? Are you thinking of artifact combines in particular or more generally for combines?

Here I was only thinking about artifact combines. So in your
legalization info you would define your rule set for e.g.
G_UNMERGE_VALUES like so: `.legalFor({...})[...].combine()` The
`.combine()` would have no effect on the legalizer itself, i.e. it would
only return something equivalent to `UnableToLegalize` when
`legalizeInstrStep(MI)` is called. However it would have an effect on
the LegalizationArtifactCombiner: when the artifact combiner asks
whether the resulting instruction of some combine is supported (which
only checks whether there is some legalization rule covering the
resulting instruction), it would return true and therefore enable the
combine. I hope that makes it a bit clearer what I meant by this.
>> Cheers,
>>
>> Dominik
>>
>>
--
----------------------------------------------------------------------
Dominik Montada Email: dominik...@hightec-rt.com
HighTec EDV-Systeme GmbH Phone: +49 681 92613 19
Europaallee 19 Fax: +49-681-92613-26
D-66113 Saarbrücken WWW: http://www.hightec-rt.com

Managing Director: Vera Strothmann
Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222

This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient please notify the sender immediately
and destroy this e-mail. Any unauthorised copying, disclosure or
distribution of the material in this e-mail is strictly forbidden.
---

Daniel Sanders via llvm-dev

unread,

Oct 19, 2020, 1:33:17 PM10/19/20

to Dominik Montada, LLVM Developers' List

> On 13 Oct 2020, at 00:47, Dominik Montada <dominik...@hightec-rt.com> wrote:
>
> Hi Daniel,
>
> thanks for the follow-up! I left inline comments down below.
>
> Am 12.10.20 um 20:55 schrieb Daniel Sanders:
>>> One problem that some of us are seeing in the legalizer is that sometimes instructions get expanded before they get folded away in the combiner. IIRC Matt sees this happening with division, while I am seeing something similar with unmerge.
>> IIRC there were two main manifestations of it. For one it ends up at the same MIR but takes a slower-than-necessary route to get there. For the other it ends up with worse code which potentially can't be folded post-legalization.
> I see. In our case it is definitely the latter: we end with with worse-code, or rather we end up with code which cannot be legalized further and therefore stops the compilation.
>>> To my particular problem: due to the nature of the architecture I'm working with, we have pretty strict legalization rules. One example is that we only allow unmerges from 64 to 32 bit. Unmerges of 32-bit or less get lowered to bit-arithmetic. However if we would do the same for anything bigger than 64-bit, the legalization of the resulting bit arithmetic would introduce illegal unmerges again, which then cause an endless loop in the legalizer.
>> Are there particular >s64 cases that are the problem or is it all of them? I'd expect s128, s256, etc. to G_UNMERGE fairly simply but non-powers-of-2 are more likely to be tricky
> I only noticed this problem with non-power-of-2 cases. Particularly s96, which we often encounter for some reason.

Does it go straight to bit arithmetic or pass through some steps first? I'd expect something like s96 to unmerge to 3x s32, then the first two might merge to form {s64, s32}.

>>> One of the ideas that was floated around yesterday sounded quite interesting to me: multi-stage legalization where you could specify which of your rules apply at which stage. I'm pretty sure this would solve our problem. In our case we would declare all artifacts as legal in the first stage to not hinder the combiner and in the second stage we could then focus on actually legalizing any left-over artifacts we have.
>>>
>>> I do however see the problem that this could clutter up the existing legalization info. Due to the amount of instructions and rules, it already is quite complex and if rules could apply to different stages in the same file, it could make it quite difficult to understand what exactly is happening now.
>> It would definitely add some clutter but I suspect it would be manageable. Essentially it would be a common ruleset for most operations and each pass would add its own version of the merge/unmerge rules.
> I think as long as most operations would use the common ruleset, it would indeed be manageable. But I suspect it would blow up significantly when the rule sets would largely differ. In those cases it would probably be better to allow a backend to define different legalization infos per stage. To be fair, I don't see a use-case for such a blown-up case at the moment, but if its fairly trivial, we shouldn't restrict backends too much here IMO.

I agree. I think the only place we assume there's only one at the moment is in Legalizer::runOnMachineFunction() where it asks the subtarget for the LegalizerInfo. As long as we can do something about that I'd expect it to work whether you have common function to set them up or be entirely independent.

>>> I think Aditya pointed out that multi-stage legalization might be already possible by just having two legalizer passes with different legalization info and I feel like this might be the better approach of the two. I guess this would still require some tweeks as currently in llc we can only say `-stop-before/after=legalizer` but not which one of those.
>> That's right. Each legalizer pass owns it's own ruleset so two passes is a possibility. The -stop-before/after bit has been solved for some other passes but it does need a bit of boilerplate. Each subclass needs it's own pass id and INITIALIZE_* macros and getPassName() needs to be overridable.
> Is there an existing example that I could look at? I would be fairly interested to try this out in our backend.

AArch64PreLegalizerCombiner/AArch64PostLegalizerCombiner use a method where they define the pass and delegate into a common implementation.

It's not the one I was looking for (I remember seeing SelectionDAG use this method somewhere too) but X86ExecutionDomainFix does the minimal subclass method

>>> Another thing I was thinking of when I implemented this hack for our use-case was that we need some kind of rule which tells the combiner that something is supported but is actually not doing any legalization in the legalizer (something like a `.combine()`, `.combineFor({s96})`).
>> I'm not quite sure what you mean here. Are you thinking of legalization rules for combining or something like legalization rules but for the combiner? Are you thinking of artifact combines in particular or more generally for combines?
> Here I was only thinking about artifact combines. So in your legalization info you would define your rule set for e.g. G_UNMERGE_VALUES like so: `.legalFor({...})[...].combine()` The `.combine()` would have no effect on the legalizer itself, i.e. it would only return something equivalent to `UnableToLegalize` when `legalizeInstrStep(MI)` is called. However it would have an effect on the LegalizationArtifactCombiner: when the artifact combiner asks whether the resulting instruction of some combine is supported (which only checks whether there is some legalization rule covering the resulting instruction), it would return true and therefore enable the combine. I hope that makes it a bit clearer what I meant by this.

Ah I think I see. I think you effectively want to turn `!isInstUnsupported() || combineExists()`. The main snag with this is that combines aren't guaranteed to change the original instruction and if it fails to do so there'd be no legalization rule to ensure the unsupported instruction is eliminated. We'd therefore have to fail to compile.

I think we can achieve the same effect without the new mechanism via custom() where the mutation function (which is currently a monolithic function but I've wanted to change that for a while*) calls functions from LegalizationArtifactCombiner(). That mutation function would be able to add the necessary guarantees that _something_ happens to eliminate the original instruction even though the combiner itself doesn't.

*Incidentally, we really ought to find some time to get rid of the old setAction() mechanism

>>> Cheers,
>>>
>>> Dominik
>>>
>>>
> --
> ----------------------------------------------------------------------
> Dominik Montada Email: dominik...@hightec-rt.com
> HighTec EDV-Systeme GmbH Phone: +49 681 92613 19
> Europaallee 19 Fax: +49-681-92613-26
> D-66113 Saarbrücken WWW: http://www.hightec-rt.com
>
> Managing Director: Vera Strothmann
> Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222
>
> This e-mail may contain confidential and/or privileged information. If
> you are not the intended recipient please notify the sender immediately
> and destroy this e-mail. Any unauthorised copying, disclosure or
> distribution of the material in this e-mail is strictly forbidden.
> ---
>

_______________________________________________

Dominik Montada via llvm-dev

unread,

Oct 23, 2020, 12:57:07 PM10/23/20

to Daniel Sanders, LLVM Developers' List

Am 19.10.20 um 19:33 schrieb Daniel Sanders:

>> On 13 Oct 2020, at 00:47, Dominik Montada <dominik...@hightec-rt.com> wrote:
>>
>> Hi Daniel,
>>
>> thanks for the follow-up! I left inline comments down below.
>>
>> Am 12.10.20 um 20:55 schrieb Daniel Sanders:
>>>> One problem that some of us are seeing in the legalizer is that sometimes instructions get expanded before they get folded away in the combiner. IIRC Matt sees this happening with division, while I am seeing something similar with unmerge.
>>> IIRC there were two main manifestations of it. For one it ends up at the same MIR but takes a slower-than-necessary route to get there. For the other it ends up with worse code which potentially can't be folded post-legalization.
>> I see. In our case it is definitely the latter: we end with with worse-code, or rather we end up with code which cannot be legalized further and therefore stops the compilation.
>>>> To my particular problem: due to the nature of the architecture I'm working with, we have pretty strict legalization rules. One example is that we only allow unmerges from 64 to 32 bit. Unmerges of 32-bit or less get lowered to bit-arithmetic. However if we would do the same for anything bigger than 64-bit, the legalization of the resulting bit arithmetic would introduce illegal unmerges again, which then cause an endless loop in the legalizer.
>>> Are there particular >s64 cases that are the problem or is it all of them? I'd expect s128, s256, etc. to G_UNMERGE fairly simply but non-powers-of-2 are more likely to be tricky
>> I only noticed this problem with non-power-of-2 cases. Particularly s96, which we often encounter for some reason.
> Does it go straight to bit arithmetic or pass through some steps first? I'd expect something like s96 to unmerge to 3x s32, then the first two might merge to form {s64, s32}.

That depends. LegalizerHelper only supports legalization actions on the
result type of the unmerge (i.e the smaller type) and the only supported
operations are widen, lower and fewer or more elements.

So if we have an unmerge from s96 to 3 x s32 for example, then the
result type is already exactly what we want to see and there is no other
supported legalizer action that makes sense for us, except to go
straight to bit-arithmetic.

If on the other hand we have an unmerge to something smaller then s32,
we widen and can either end up with another non-power-of-2 with the same
problem or we get lucky, end up with a power-of-2 type and everything
goes away nicely.

>>>> One of the ideas that was floated around yesterday sounded quite interesting to me: multi-stage legalization where you could specify which of your rules apply at which stage. I'm pretty sure this would solve our problem. In our case we would declare all artifacts as legal in the first stage to not hinder the combiner and in the second stage we could then focus on actually legalizing any left-over artifacts we have.
>>>>
>>>> I do however see the problem that this could clutter up the existing legalization info. Due to the amount of instructions and rules, it already is quite complex and if rules could apply to different stages in the same file, it could make it quite difficult to understand what exactly is happening now.
>>> It would definitely add some clutter but I suspect it would be manageable. Essentially it would be a common ruleset for most operations and each pass would add its own version of the merge/unmerge rules.
>> I think as long as most operations would use the common ruleset, it would indeed be manageable. But I suspect it would blow up significantly when the rule sets would largely differ. In those cases it would probably be better to allow a backend to define different legalization infos per stage. To be fair, I don't see a use-case for such a blown-up case at the moment, but if its fairly trivial, we shouldn't restrict backends too much here IMO.
> I agree. I think the only place we assume there's only one at the moment is in Legalizer::runOnMachineFunction() where it asks the subtarget for the LegalizerInfo. As long as we can do something about that I'd expect it to work whether you have common function to set them up or be entirely independent.

Sounds like it should be easy to ask the subtarget for the LegalizerInfo
for stage X and then it would be up to the subtarget to either return
the same one or a different one.

>>>> I think Aditya pointed out that multi-stage legalization might be already possible by just having two legalizer passes with different legalization info and I feel like this might be the better approach of the two. I guess this would still require some tweeks as currently in llc we can only say `-stop-before/after=legalizer` but not which one of those.
>>> That's right. Each legalizer pass owns it's own ruleset so two passes is a possibility. The -stop-before/after bit has been solved for some other passes but it does need a bit of boilerplate. Each subclass needs it's own pass id and INITIALIZE_* macros and getPassName() needs to be overridable.
>> Is there an existing example that I could look at? I would be fairly interested to try this out in our backend.
> AArch64PreLegalizerCombiner/AArch64PostLegalizerCombiner use a method where they define the pass and delegate into a common implementation.
>
> It's not the one I was looking for (I remember seeing SelectionDAG use this method somewhere too) but X86ExecutionDomainFix does the minimal subclass method

Thanks! I'll take a look and try it out when I find the time :)

>
>>>> Another thing I was thinking of when I implemented this hack for our use-case was that we need some kind of rule which tells the combiner that something is supported but is actually not doing any legalization in the legalizer (something like a `.combine()`, `.combineFor({s96})`).
>>> I'm not quite sure what you mean here. Are you thinking of legalization rules for combining or something like legalization rules but for the combiner? Are you thinking of artifact combines in particular or more generally for combines?
>> Here I was only thinking about artifact combines. So in your legalization info you would define your rule set for e.g. G_UNMERGE_VALUES like so: `.legalFor({...})[...].combine()` The `.combine()` would have no effect on the legalizer itself, i.e. it would only return something equivalent to `UnableToLegalize` when `legalizeInstrStep(MI)` is called. However it would have an effect on the LegalizationArtifactCombiner: when the artifact combiner asks whether the resulting instruction of some combine is supported (which only checks whether there is some legalization rule covering the resulting instruction), it would return true and therefore enable the combine. I hope that makes it a bit clearer what I meant by this.
> Ah I think I see. I think you effectively want to turn `!isInstUnsupported() || combineExists()`. The main snag with this is that combines aren't guaranteed to change the original instruction and if it fails to do so there'd be no legalization rule to ensure the unsupported instruction is eliminated. We'd therefore have to fail to compile.

Maybe I'm misunderstanding you here, but isn't that already the case
now? Artifacts are tried to combine, then they are tried to legalize. If
both of these things fail and the surrounding code hasn't changed as
well, we fail to compile.

> I think we can achieve the same effect without the new mechanism via custom() where the mutation function (which is currently a monolithic function but I've wanted to change that for a while*) calls functions from LegalizationArtifactCombiner(). That mutation function would be able to add the necessary guarantees that _something_ happens to eliminate the original instruction even though the combiner itself doesn't.

I assume with mutation function you mean the functions that actually
change the original instruction, like for example a `widenScalar`?
If so, I agree that it would be nice to gain more fine-grained control
over how combines are done by having them call functions from
LegalizationArtifactCombiner.

I don't see however how that means that it could add necessary guarantees?

Daniel Sanders via llvm-dev

unread,

Oct 26, 2020, 6:47:27 PM10/26/20

to Dominik Montada, LLVM Developers' List

On 23 Oct 2020, at 09:56, Dominik Montada <dominik...@hightec-rt.com> wrote:

Am 19.10.20 um 19:33 schrieb Daniel Sanders:
On 13 Oct 2020, at 00:47, Dominik Montada <dominik...@hightec-rt.com> wrote:

Hi Daniel,

thanks for the follow-up! I left inline comments down below.

Am 12.10.20 um 20:55 schrieb Daniel Sanders:
One problem that some of us are seeing in the legalizer is that sometimes instructions get expanded before they get folded away in the combiner. IIRC Matt sees this happening with division, while I am seeing something similar with unmerge.
IIRC there were two main manifestations of it. For one it ends up at the same MIR but takes a slower-than-necessary route to get there. For the other it ends up with worse code which potentially can't be folded post-legalization.
I see. In our case it is definitely the latter: we end with with worse-code, or rather we end up with code which cannot be legalized further and therefore stops the compilation.
To my particular problem: due to the nature of the architecture I'm working with, we have pretty strict legalization rules. One example is that we only allow unmerges from 64 to 32 bit. Unmerges of 32-bit or less get lowered to bit-arithmetic. However if we would do the same for anything bigger than 64-bit, the legalization of the resulting bit arithmetic would introduce illegal unmerges again, which then cause an endless loop in the legalizer.
Are there particular >s64 cases that are the problem or is it all of them? I'd expect s128, s256, etc. to G_UNMERGE fairly simply but non-powers-of-2 are more likely to be tricky
I only noticed this problem with non-power-of-2 cases. Particularly s96, which we often encounter for some reason.
Does it go straight to bit arithmetic or pass through some steps first? I'd expect something like s96 to unmerge to 3x s32, then the first two might merge to form {s64, s32}.

That depends. LegalizerHelper only supports legalization actions on the result type of the unmerge (i.e the smaller type) and the only supported operations are widen, lower and fewer or more elements.

So if we have an unmerge from s96 to 3 x s32 for example, then the result type is already exactly what we want to see and there is no other supported legalizer action that makes sense for us, except to go straight to bit-arithmetic.

If on the other hand we have an unmerge to something smaller then s32, we widen and can either end up with another non-power-of-2 with the same problem or we get lucky, end up with a power-of-2 type and everything goes away nicely.

Ah ok, I think I see the problem. The current strategy for narrowing overshoots for non-powers-of-two and at best hits a case we can easily widen from, or lands on a case that's legal and stops to early, or at worst lands on a difficult case and pays a heavy price to widen back to the intended type.

I suppose we might be a little too keen to use G_UNMERGE_VALUES and maybe should deal with non-powers-of-2 with a pair of G_EXTRACT's before G_UNMERGE_VALUE's. For this particular case, it could be that we have the wrong strategy. It seems safe to narrow an operation using G_ANYEXT to a wider type because the opcode is changing to a 'simpler' one. However, it wouldn't be safe to then narrow that G_ANYEXT using wider types as that would potentially cause a loop involving other operations.

I think there's also another issue which is that we only have one strategy for each legalization step and expect it to work for all kinds of targets while in truth certain paths towards legalization are riskier than others as different targets find different operations trickier.

One of the ideas that was floated around yesterday sounded quite interesting to me: multi-stage legalization where you could specify which of your rules apply at which stage. I'm pretty sure this would solve our problem. In our case we would declare all artifacts as legal in the first stage to not hinder the combiner and in the second stage we could then focus on actually legalizing any left-over artifacts we have.

I do however see the problem that this could clutter up the existing legalization info. Due to the amount of instructions and rules, it already is quite complex and if rules could apply to different stages in the same file, it could make it quite difficult to understand what exactly is happening now.
It would definitely add some clutter but I suspect it would be manageable. Essentially it would be a common ruleset for most operations and each pass would add its own version of the merge/unmerge rules.
I think as long as most operations would use the common ruleset, it would indeed be manageable. But I suspect it would blow up significantly when the rule sets would largely differ. In those cases it would probably be better to allow a backend to define different legalization infos per stage. To be fair, I don't see a use-case for such a blown-up case at the moment, but if its fairly trivial, we shouldn't restrict backends too much here IMO.
I agree. I think the only place we assume there's only one at the moment is in Legalizer::runOnMachineFunction() where it asks the subtarget for the LegalizerInfo. As long as we can do something about that I'd expect it to work whether you have common function to set them up or be entirely independent.
Sounds like it should be easy to ask the subtarget for the LegalizerInfo for stage X and then it would be up to the subtarget to either return the same one or a different one.
I think Aditya pointed out that multi-stage legalization might be already possible by just having two legalizer passes with different legalization info and I feel like this might be the better approach of the two. I guess this would still require some tweeks as currently in llc we can only say `-stop-before/after=legalizer` but not which one of those.
That's right. Each legalizer pass owns it's own ruleset so two passes is a possibility. The -stop-before/after bit has been solved for some other passes but it does need a bit of boilerplate. Each subclass needs it's own pass id and INITIALIZE_* macros and getPassName() needs to be overridable.
Is there an existing example that I could look at? I would be fairly interested to try this out in our backend.
AArch64PreLegalizerCombiner/AArch64PostLegalizerCombiner use a method where they define the pass and delegate into a common implementation.

It's not the one I was looking for (I remember seeing SelectionDAG use this method somewhere too) but X86ExecutionDomainFix does the minimal subclass method
Thanks! I'll take a look and try it out when I find the time :)

Another thing I was thinking of when I implemented this hack for our use-case was that we need some kind of rule which tells the combiner that something is supported but is actually not doing any legalization in the legalizer (something like a `.combine()`, `.combineFor({s96})`).
I'm not quite sure what you mean here. Are you thinking of legalization rules for combining or something like legalization rules but for the combiner? Are you thinking of artifact combines in particular or more generally for combines?
Here I was only thinking about artifact combines. So in your legalization info you would define your rule set for e.g. G_UNMERGE_VALUES like so: `.legalFor({...})[...].combine()` The `.combine()` would have no effect on the legalizer itself, i.e. it would only return something equivalent to `UnableToLegalize` when `legalizeInstrStep(MI)` is called. However it would have an effect on the LegalizationArtifactCombiner: when the artifact combiner asks whether the resulting instruction of some combine is supported (which only checks whether there is some legalization rule covering the resulting instruction), it would return true and therefore enable the combine. I hope that makes it a bit clearer what I meant by this.
Ah I think I see. I think you effectively want to turn `!isInstUnsupported() || combineExists()`. The main snag with this is that combines aren't guaranteed to change the original instruction and if it fails to do so there'd be no legalization rule to ensure the unsupported instruction is eliminated. We'd therefore have to fail to compile.
Maybe I'm misunderstanding you here, but isn't that already the case now? Artifacts are tried to combine, then they are tried to legalize. If both of these things fail and the surrounding code hasn't changed as well, we fail to compile.

I'm saying that if both fail, it's a bug in the legalizations not in the combines (in principle at least). The legalizations have to be able to deal with anything illegal that's given to them whereas the artifact combiners are allowed to leave illegal instructions as-is and leave it to the legalizers to fix them.

I think we can achieve the same effect without the new mechanism via custom() where the mutation function (which is currently a monolithic function but I've wanted to change that for a while*) calls functions from LegalizationArtifactCombiner(). That mutation function would be able to add the necessary guarantees that _something_ happens to eliminate the original instruction even though the combiner itself doesn't.
I assume with mutation function you mean the functions that actually change the original instruction, like for example a `widenScalar`?

Yes, that's right.

If so, I agree that it would be nice to gain more fine-grained control over how combines are done by having them call functions from LegalizationArtifactCombiner.

I don't see however how that means that it could add necessary guarantees?

Because you could include them in the implementation of the mutation function. For example:

.maxScalar(0, s64)

is currently equivalent to:

.narrowScalar(0, s64)

which will lower an s96 with:

%1(s32), %2(s32), %3(s32) = G_UNMERGE_VALUES %0(s96)

%4(s64) = G_MERGE_VALUES %1(s32), %2(s32)

%5(s32) = G_IMPLICIT_DEF

%6(s64) = G_MERGE_VALUES %3(s32), undef

which the combiner then has to fold away somehow so we don't end up with bit arithmetic. For targets with 2xs32 subregisters in each s64, that's trivial. It can just pick registers appropriately. However other targets might not have that convenient structure and end up packing into an s64 register with bit arithmetic.

Meanwhile:

.customIf(typeIsToLarge(0, 64), myNarrowScalar)

can be aware that the target isn't going to like packing 2x s32 into an s64 and can instead implement it as:

%1(s128) = G_ANYEXT %0(s96)

%2(s64), %3(s64) = G_UNMERGE_VALUES %1(s128)

%4(s32) = G_TRUNC %3

which will hopefully optimize out the excess bits of %3 G_UNMERGE_VALUES but even if it doesn't we still have no bit arithmetic. Also, unlike custom() in general, added your own custom handling of a particular rule doesn't take on _all_ responsibility for fully legalizing the given operation (and eliminating the opcode). You're allowed to fix your bit of the problem and give it back to the legalization loop to continue work.

If effect, our custom legalization here would provide the guarantee that it doesn't pass through the smaller type on the way to the larger type

Reply all

Reply to author

Forward

0 new messages