[llvm-dev] AArch64 fmul/fadd fusion

Meador Inge via llvm-dev

unread,

Sep 18, 2015, 11:15:03 PM9/18/15

to llvm...@lists.llvm.org

Hi All,

Recently I was doing some AArch64 work and noticed some cases where
fmuls were not getting fused with fadds. Is there any particular
reason that the AArch64 machine combiner doesn't do this like it does
for add/mul?

I am happy to work up a patch for this, but I wanted to make sure that
there wasn't a good reason for it not already being there. FWIW, I
see where GCC is doing this optimization.

Cheers,

Meador
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Tim Northover via llvm-dev

unread,

Sep 18, 2015, 11:34:26 PM9/18/15

to Meador Inge, llvm...@lists.llvm.org

On 18 September 2015 at 20:14, Meador Inge via llvm-dev

<llvm...@lists.llvm.org> wrote:
> Recently I was doing some AArch64 work and noticed some cases where
> fmuls were not getting fused with fadds. Is there any particular
> reason that the AArch64 machine combiner doesn't do this like it does
> for add/mul?

AArch64's fmadd instruction is fused, which means it can produce a
different result to the two operations executed separately. The C and
C++ standards do not allow such changes.

We support them via various flags (-ffast-math is the obvious one,
though an argument could be made for supporting -mfused-madd and
-mnofused-madd as well). But in the backend we definitely have to
check *somthing* before doing the substitution.

> I am happy to work up a patch for this, but I wanted to make sure that
> there wasn't a good reason for it not already being there. FWIW, I
> see where GCC is doing this optimization.

You might want to get together with Ana Pazos, who just asked similar
questions today.

Personally, I'd be sad to see Clang's default execution become less
conforming to the standard. But it's pretty undeniable that
"-std=gnu99/gnu11/..." do allow it by default.

Cheers.

Tim.

Meador Inge via llvm-dev

unread,

Sep 19, 2015, 12:19:05 AM9/19/15

to Tim Northover, llvm...@lists.llvm.org

On Fri, Sep 18, 2015 at 10:34 PM, Tim Northover <t.p.no...@gmail.com> wrote:

> AArch64's fmadd instruction is fused, which means it can produce a
> different result to the two operations executed separately. The C and
> C++ standards do not allow such changes.

Sorry, sloppy language on my part. I was aware of fmadd, but I was
really asking about turning sequences like:

fmul s0, s0, s2
fadd s0, s1, s0

into a fmadd:

fmadd s0, s0, s2, s1

> We support them via various flags (-ffast-math is the obvious one,
> though an argument could be made for supporting -mfused-madd and
> -mnofused-madd as well). But in the backend we definitely have to
> check *somthing* before doing the substitution.

Support in what way? I don't see any patterns or machine combiners
to do the above replacement. Did I miss something?

If I didn't miss something, is there interest in adding this to the AArch64
machine combiners assuming it was guarded by the right flag?

> You might want to get together with Ana Pazos, who just asked similar
> questions today.

I see that on cfe-dev now. Thanks for pointing that out.

> Personally, I'd be sad to see Clang's default execution become less
> conforming to the standard. But it's pretty undeniable that
> "-std=gnu99/gnu11/..." do allow it by default.

Agreed.

Thanks for the reply!

Cheers,

Meador

Joerg Sonnenberger via llvm-dev

unread,

Sep 20, 2015, 5:36:40 PM9/20/15

to llvm...@lists.llvm.org

On Fri, Sep 18, 2015 at 11:18:49PM -0500, Meador Inge via llvm-dev wrote:
> On Fri, Sep 18, 2015 at 10:34 PM, Tim Northover <t.p.no...@gmail.com> wrote:
>
> > AArch64's fmadd instruction is fused, which means it can produce a
> > different result to the two operations executed separately. The C and
> > C++ standards do not allow such changes.
>
> Sorry, sloppy language on my part. I was aware of fmadd, but I was
> really asking about turning sequences like:
>
> fmul s0, s0, s2
> fadd s0, s1, s0
>
> into a fmadd:
>
> fmadd s0, s0, s2, s1

...which is exactly the transform Tim is talking about. FMA is required
to provide a correctly rounded result *without* intermediate rounding.
The mul+add sequence on other the side are required to perform that
rounding. There are algorithms known to break under such optimisations,
which is why it is not enabled by default. The logic for producing FMA
is not target specific otherwise.

Joerg

Tim Northover via llvm-dev

unread,

Sep 20, 2015, 10:37:41 PM9/20/15

to Meador Inge, llvm...@lists.llvm.org

>> We support them via various flags (-ffast-math is the obvious one,
>> though an argument could be made for supporting -mfused-madd and
>> -mnofused-madd as well). But in the backend we definitely have to
>> check *somthing* before doing the substitution.
>
> Support in what way? I don't see any patterns or machine combiners
> to do the above replacement. Did I miss something?

(Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly
covered before target-specific code gets involved, in the generic DAG
logic at the moment. ISD::FMA nodes should only be formed from paired
mul/add based on "AllowFPOpFusion" checks (and other similar explicit
permissions).

So I think the status quo is that generic code does any permissible
fusion, and AArch64 code really doesn't have any freedom to fuse more
operations on top of that.

> If I didn't miss something, is there interest in adding this to the AArch64
> machine combiners assuming it was guarded by the right flag?

If you can find a missing-but-allowed case, adding it to the generic
handling would probably be better than making it AArch64 only.

Cheers.

Tim.

Meador Inge via llvm-dev

unread,

Sep 21, 2015, 8:56:21 AM9/21/15

to Tim Northover, llvm...@lists.llvm.org

On Sun, Sep 20, 2015 at 9:37 PM, Tim Northover <t.p.no...@gmail.com> wrote:

> (Having just checked lib/CodeGen/SelectionDAG) I believe it's mostly
> covered before target-specific code gets involved, in the generic DAG
> logic at the moment. ISD::FMA nodes should only be formed from paired
> mul/add based on "AllowFPOpFusion" checks (and other similar explicit
> permissions).

Ah, I see it in the DAG combiner now. Thanks.

> So I think the status quo is that generic code does any permissible
> fusion, and AArch64 code really doesn't have any freedom to fuse more
> operations on top of that.

Yeah. I originally got thrown off because the integer version is AArch64
specific. There doesn't seem to be any generic handling for that.

> If you can find a missing-but-allowed case, adding it to the generic
> handling would probably be better than making it AArch64 only.

The cases that I was looking into all seem to be handled.

Thanks!

-- Meador

Reply all

Reply to author

Forward