Hi Jason,
Thanks for the reply. That's what we're already using, but it's not useful in this case. It seems that the order only actually matters when the roots are the same. In my case, the roots are different: in one pattern I'm trying to match a G_AND, and in the other a G_OR. The G_AND pattern also shows up inside the larger G_OR pattern. Since the combiner walks top-down, it sees the G_AND before the G_OR and matches the smaller pattern. When it then reaches the G_OR later on, the larger pattern no longer matches because the G_AND has already been combined away. If the combiner walked bottom-up, it would see the G_OR before the G_AND and my use case would work as expected.
So this is not a problem of priority, but rather a problem of the
traversal that the combiner does. I do understand though that
simply switching the traversal to bottom-up won't work in every
case and might even cause the same problem.
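To make the ordering problem concrete, here is a self-contained toy (no real GlobalISel API; all names and the "MIR" representation are invented for the sketch). A small rule rewrites AND(x, x), and a larger rule only fires on OR(AND(a, b), c) while the AND is still present. A top-down walk fires the small rule first and kills the larger match; a bottom-up walk fires the larger rule:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy "instruction": op name plus two operand indices. Non-negative
// operands refer to the defining node's index; negative ones are
// external vregs. This is an invented stand-in, not GlobalISel MIR.
struct Node { std::string op; int lhs, rhs; bool dead = false; };

// Two combine rules:
//   small: AND(x, x)          -> COPY x
//   large: OR(AND(a, b), c)   -> BITSELECT (needs the AND intact)
bool combineNode(std::vector<Node> &MIR, int i) {
  Node &N = MIR[i];
  if (N.dead) return false;
  if (N.op == "OR" && N.lhs >= 0 && !MIR[N.lhs].dead &&
      MIR[N.lhs].op == "AND") {
    N.op = "BITSELECT";   // large pattern fired
    MIR[N.lhs].dead = true;
    return true;
  }
  if (N.op == "AND" && N.lhs == N.rhs) {
    N.op = "COPY";        // small pattern fired
    return true;
  }
  return false;
}

// Build "%0 = AND %a, %a ; %1 = OR %0, %b", combine in the given
// direction, and report what the OR ended up as.
std::string run(bool topDown) {
  std::vector<Node> MIR = {{"AND", -1, -1}, {"OR", 0, -2}};
  if (topDown)
    for (int i = 0; i < (int)MIR.size(); ++i) combineNode(MIR, i);
  else
    for (int i = (int)MIR.size() - 1; i >= 0; --i) combineNode(MIR, i);
  return MIR[1].op;
}
```

Top-down, the AND is rewritten to COPY before the OR is visited, so the larger pattern never fires; bottom-up, the OR is combined to BITSELECT first.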
Cheers,
Dominik
DAGCombiner also runs mostly top-down (though it can also add nodes to
the worklist in a bottom-up direction in certain cases). The top-down
order causes problems when you have deep expression trees that could
be simplified away to nothing. Outer combines see these subexpressions
unsimplified. They can use recursive helper functions like
computeKnownBits to try to analyse the unsimplified expression, but
the recursive helper functions typically have a max recursion depth of
6, which is easy to hit.
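Here is a toy analogue of that depth limit (the Expr type and the analysis are invented for illustration; only the "give up past a fixed depth" behaviour mirrors computeKnownBits). Each node ANDs its operand with a mask that clears one bit, so a full walk of a 10-deep chain would prove 10 bits are zero, but the depth-limited walk only sees 6 levels:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr unsigned MaxDepth = 6;  // analogous to the helpers' recursion cap

// Invented expression node: AND of its operand with `mask`.
// operand == nullptr marks a leaf with no known bits.
struct Expr { uint32_t mask; const Expr *operand; };

// Toy known-zero-bits analysis: bits clear in the mask are known zero,
// plus whatever is known about the operand -- unless we hit the depth
// limit, in which case we conservatively know nothing more.
uint32_t knownZeroBits(const Expr *E, unsigned depth = 0) {
  if (!E) return 0;                 // leaf: nothing known
  if (depth >= MaxDepth) return 0;  // recursion limit: give up
  return ~E->mask | knownZeroBits(E->operand, depth + 1);
}

// Chain of n ANDs where node i clears bit i, then analyse from the top.
uint32_t buildAndAnalyze(unsigned n) {
  std::vector<Expr> nodes(n);
  for (unsigned i = 0; i < n; ++i) {
    nodes[i].mask = ~(1u << i);
    nodes[i].operand = (i + 1 < n) ? &nodes[i + 1] : nullptr;
  }
  return knownZeroBits(&nodes[0]);
}
```

With a 10-deep unsimplified chain, only the 6 outermost masks contribute, so an outer combine querying the analysis sees far less than is actually true.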
DAGCombiner is so embedded now that it is hard to change the basic
top-down order, but I would hope we could change the GlobalISel
combiner.
Jay.
> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Daniel, do you remember if there was a strong reason for choosing to go top-down on the combiner?
Amara
I asked Aditya about this, and apparently it's to address the pathological case where we repeatedly do an entire pass only to simplify a single node of the MIR and slowly propagate that through the MIR. Another reason was that InstCombine works this way too.
I think we can probably address a fair portion of the pathological case by controlling how we populate the worklist. Instead of adding all MIs to it each time, we could filter out blocks that can't possibly match on this round. I'm thinking that a block only needs revisiting on the next pass if it contains a use of a register that was def'd by an instruction we changed on the current pass. Therefore the observer could track the vreg defs affected by a pass and build a list of MBBs to revisit. I think we'd have to revisit all the successors of those blocks too if the pass is running top-down (in case we make changes that allow further changes in the same pass), but if we changed it to bottom-up we wouldn't need that. This wouldn't help with large blocks, but maybe we can find a similar scheme there.
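The revisit filter could look roughly like this (a minimal sketch with invented types, not real GlobalISel API; the top-down successor expansion mentioned above is omitted). The observer records the set of vregs redefined during a pass, and the next pass only revisits blocks that use one of them:

```cpp
#include <cassert>
#include <map>
#include <set>

using VReg = unsigned;     // invented: a virtual register number
using BlockId = unsigned;  // invented: an MBB identifier

// Given the vregs whose defs changed this pass and a per-block summary
// of vreg uses, return only the blocks worth revisiting next pass.
std::set<BlockId>
blocksToRevisit(const std::set<VReg> &ChangedDefs,
                const std::map<BlockId, std::set<VReg>> &UsesInBlock) {
  std::set<BlockId> Revisit;
  for (const auto &[BB, Uses] : UsesInBlock)
    for (VReg R : Uses)
      if (ChangedDefs.count(R)) {
        Revisit.insert(BB);
        break;  // one matching use is enough to queue the block
      }
  return Revisit;
}
```

A bottom-up pass could use this set directly; a top-down pass would additionally have to union in the successors of each block in the set, as described above.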
In my defence I think I got this definition from earlier discussions
about the order in which DAGCombiner runs:
https://reviews.llvm.org/D33587#1372912
Unfortunately if you think of textual IR in a basic block, then
"bottom up" order starts at the top of the block and proceeds towards
the bottom :-(
%mul = fmul float %a, %b
%add = fadd float %mul, %c
A quick experiment shows that InstCombine is bottom-up:
$ cat z.ll
define float @main(float %a, float %b, float %c, float %d, float %e) {
%add = fadd float %a, %b
%sub = fsub float %add, %c
%mul = fmul float %sub, %d
%div = fdiv float %mul, %e
ret float %div
}
$ opt -instcombine -debug-only=instcombine z.ll
IC: Visiting: %add = fadd float %a, %b
IC: Visiting: %sub = fsub float %add, %c
IC: Visiting: %mul = fmul float %sub, %d
IC: Visiting: %div = fdiv float %mul, %e
DAGCombiner is top-down, which I think is wrong but it's hard to fix:
$ llc -march=aarch64 -debug-only=dagcombine z.ll
Combining: t14: f32 = fdiv t13, t10
Combining: t13: f32 = fmul t12, t8
Combining: t12: f32 = fsub t11, t6
Combining: t11: f32 = fadd t2, t4
I'm happy to see that GISel Combiner is bottom-up after all:
$ llc -march=aarch64 -global-isel -debug-only=gi-combiner z.ll
Try combining %5:_(s32) = G_FADD %0:_, %1:_
Try combining %6:_(s32) = G_FSUB %5:_, %2:_
Try combining %7:_(s32) = G_FMUL %6:_, %3:_
Try combining %8:_(s32) = G_FDIV %7:_, %4:_
Sorry if I have derailed your original questions Dominik. I think the
answers are "yes you have to teach your larger pattern to handle this
case" and "no there are no plans to improve this behaviour as far as I
know, and I'm not even sure how it would be improved".
Jay.