[llvm-dev] [GlobalISel] Prioritizing long patterns in combiner over short ones

54 views
Skip to first unread message

Dominik Montada via llvm-dev

unread,
Dec 15, 2020, 10:09:46 AM12/15/20
to LLVM Developers' List
Hi,

I'm currently writing target specific combiners with GlobalISel. I have
a case where a sub-node of a larger pattern also matches another,
smaller combiner pattern. Because the combiner runs top-down, the
smaller pattern is matched before the larger pattern has a chance to be
matched. Do I have to teach my larger pattern to handle this case or is
there a better way to do this?

More importantly, are there any plans to improve this behavior?

Cheers,

Dominik

--
----------------------------------------------------------------------
Dominik Montada Email: dominik...@hightec-rt.com
HighTec EDV-Systeme GmbH Phone: +49 681 92613 19
Europaallee 19 Fax: +49-681-92613-26
D-66113 Saarbrücken WWW: http://www.hightec-rt.com

Managing Director: Vera Strothmann
Register Court: Saarbrücken, HRB 10445, VAT ID: DE 138344222

This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient please notify the sender immediately
and destroy this e-mail. Any unauthorised copying, disclosure or
distribution of the material in this e-mail is strictly forbidden.
---

Jason Eckhardt via llvm-dev

unread,
Dec 15, 2020, 11:06:06 AM12/15/20
to LLVM Developers' List, Dominik Montada
GICombineGroup may be of use:

// A group of combine rules that can be added to a GICombiner or another group.
class GICombineGroup<list<GICombine> rules> : GICombine {
  // The rules contained in this group. The rules in a group are flattened into
  // a single list and sorted into whatever order is most efficient. However,
  // they will never be re-ordered such that behaviour differs from the
  // specified order. It is therefore possible to use the order of rules in this
  // list to describe priorities.
  let Rules = rules;
}



From: llvm-dev <llvm-dev...@lists.llvm.org> on behalf of Dominik Montada via llvm-dev <llvm...@lists.llvm.org>
Sent: Tuesday, December 15, 2020 9:09 AM
To: LLVM Developers' List <llvm...@lists.llvm.org>
Subject: [llvm-dev] [GlobalISel] Prioritizing long patterns in combiner over short ones
 
External email: Use caution opening links or attachments

Dominik Montada via llvm-dev

unread,
Dec 15, 2020, 11:19:50 AM12/15/20
to Jason Eckhardt, LLVM Developers' List

Hi Jason,


thanks for the reply. That's what we're already using but it's not useful in this case. It seems that the order only actually matters when the roots are the same. In my case, the roots are different. In one case I'm trying to match a G_AND and in the other one I'm trying to match G_OR. The G_AND pattern also shows up in the larger G_OR pattern. Since the combiner walks top-down, it sees the G_AND before the G_OR and matches the smaller pattern. When it then reaches the G_OR later on, the larger pattern doesn't match anymore because the G_AND has been combined away. If the combiner would walk bottom-up, it would see the G_OR before the G_AND and my use-case would work as expected.


So this is not a problem of priority, but rather a problem of the traversal that the combiner does. I do understand though that simply switching the traversal to bottom-up won't work in every case and might even cause the same problem.


Cheers,


Dominik


Am 15.12.20 um 17:05 schrieb Jason Eckhardt:

Jay Foad via llvm-dev

unread,
Dec 17, 2020, 5:14:56 AM12/17/20
to Dominik Montada, LLVM Developers' List
I am really surprised that "the combiner runs top-down". Surely it's
better to combine smaller sub-expressions before combining larger
expressions? Or is there actually a good reason to keep the top-down
order??

DAGCombiner also runs mostly top-down (but it can also add nodes to
the worklist in a bottom-up direction for certain cases). The top-down
order causes problems when you have deep expressions trees that could
be simplified away to nothing. Outer combines see these subexpression
unsimplified. They can use recursive helper functions like
computeKnownBits to try to analyse the unsimplified expression, but
the recursive helper functions typically have a max recursion depth of
6. which is easy to hit.

DAGCombiner is so embedded now that it is hard to change the basic
top-down order, but I would hope we could change the GlobalISel
combiner.

Jay.

> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Amara Emerson via llvm-dev

unread,
Jan 20, 2021, 1:51:48 AM1/20/21
to Jay Foad, LLVM Developers' List
+Daniel.

Daniel, do you remember if there was a strong reason for choosing to go top-down on the combiner?

Amara

Daniel Sanders via llvm-dev

unread,
Jan 20, 2021, 1:50:34 PM1/20/21
to Amara Emerson, LLVM Developers' List
To be honest, hearing it was top-down surprised me too at first but I think I was remembering the algorithm I wanted to experiment with which required bottom-up.

I asked Aditya about this and apparently it's to address the pathological case where we repeatedly do an entire pass only to simplify a single node of the MIR and slowly propagate that through the MIR. Another reason was because InstCombine is like this too.

I think we can probably address a fair portion of the pathological case by controlling how we populate the work list. Instead of adding all MI's to it each time, we could filter out blocks that can't possibly match on this round. I'm thinking that a block only needs revisiting on the next pass if it contains a use of a register that was def'd by an instruction we changed on the current pass. Therefore the observer could track the vreg defs affected by a pass and build a list of MBB's to revisit. I think we'd have to revisit all the successors of those blocks too if the pass is running top-down (in case we make changes that allow further changes in the same pass) but if we changed it to bottom-up we wouldn't need that. This wouldn't help with large blocks but maybe we can find a similar scheme there.

Jay Foad via llvm-dev

unread,
Jan 21, 2021, 12:02:58 PM1/21/21
to Daniel Sanders, LLVM Developers' List
I think the terminology is confusing. By "bottom up" I mean visiting
the leaves of an expression tree first like in BURS
(https://en.wikipedia.org/wiki/BURS). This is "bottom up" if you draw
your trees with the root at the top. For the expression (fadd (fmul a,
b), c) it would visit the mul before the add. I think simplifying
things in this order is sane because when you visit a node, you can
assume that the operands of that node have already been simplified.

In my defence I think I got this definition from earlier discussions
about the order in which DAGCombiner runs:
https://reviews.llvm.org/D33587#1372912

Unfortunately if you think of textual IR in a basic block, then
"bottom up" order starts at the top of the block and proceeds towards
the bottom :-(
%mul = fmul float %a, %b
%add = fadd float %mul, %c

A quick experiment shows that InstCombine is bottom-up:
$ cat z.ll
define float @main(float %a, float %b, float %c, float %d, float %e) {
%add = fadd float %a, %b
%sub = fsub float %add, %c
%mul = fmul float %sub, %d
%div = fdiv float %mul, %e
ret float %div
}
$ opt -instcombine -debug-only=instcombine z.ll
IC: Visiting: %add = fadd float %a, %b
IC: Visiting: %sub = fsub float %add, %c
IC: Visiting: %mul = fmul float %sub, %d
IC: Visiting: %div = fdiv float %mul, %e

DAGCombiner is top-down, which I think is wrong but it's hard to fix:
$ llc -march=aarch64 -debug-only=dagcombine z.ll
Combining: t14: f32 = fdiv t13, t10
Combining: t13: f32 = fmul t12, t8
Combining: t12: f32 = fsub t11, t6
Combining: t11: f32 = fadd t2, t4

I'm happy to see that GISel Combiner is bottom-up after all:
$ llc -march=aarch64 -global-isel -debug-only=gi-combiner z.ll
Try combining %5:_(s32) = G_FADD %0:_, %1:_
Try combining %6:_(s32) = G_FSUB %5:_, %2:_
Try combining %7:_(s32) = G_FMUL %6:_, %3:_
Try combining %8:_(s32) = G_FDIV %7:_, %4:_


Sorry if I have derailed your original questions Dominik. I think the
answers are "yes you have to teach your larger pattern to handle this
case" and "no there are no plans to improve this behaviour as far as I
know, and I'm not even sure how it would be improved".

Jay.

Jay.

Dominik Montada via llvm-dev

unread,
Jan 22, 2021, 2:28:20 AM1/22/21
to Jay Foad, Daniel Sanders, LLVM Developers' List
Hi Jay,

thanks for this clarification and sorry for the confusion I seem to have
caused! I was solely thinking in terms of IR instead of the pattern.
Also thanks for linking that DAGCombiner discussion as well. I'll update
my pattern to handle my use-case. In the future I might even add a 2nd
post-combiner-combiner to our backend, should it turn out to be beneficial.

Cheers,

Dominik

Am 21.01.21 um 18:02 schrieb Jay Foad:
Reply all
Reply to author
Forward
0 new messages