[llvm-dev] defaults for FP contraction [e.g. fused multiply-add]: suggestion and patch to be slightly more aggressive and to make Clang`s optimization settings closer to having the same meaning as when they are given to GCC [at least for "-O3"]

Abe Skolnik via llvm-dev

unread,

Sep 9, 2016, 5:19:12 PM9/9/16

to llvm-dev

Dear all,

In the process of investigating a performance difference between Clang & GCC when both compile
the same non-toolchain program while using the "same"* compiler flags, I have found something
that may be worth changing in Clang, developed a patch, and confirmed that the patch has its
intended effect.

*: "same" in quotes b/c the essence of the problem is that the _meaning_ of "-O3" on Clang
differs from that of "-O3" on GCC in at least one way.

The specific problem here relates to the default settings for FP contraction, e.g. fused
multiply-add. At -O2 and higher, GCC defaults FP contraction to "fast", i.e. always on. I`m
not suggesting that Clang/LLVM/both need to do the same, since Clang+LLVM has good support for
"#pragma STDC FP_CONTRACT".

If we keep Clang`s default for FP contraction at "on" [which really means "according to the
pragma"] but change the default value of the _pragma_ [currently off] to on at -O3, then Clang
will be more competitive with GCC at high optimization settings without resorting to the
more-brutish "fast by default" at plain -O3 [as opposed to "-Ofast", "-O3 -ffast-math", etc.].

Since I don`t know what Objective-C [and Objective-C++] have to say about FP operations, I have
made my patch very selective based on language. Also, I noticed that the CUDA front-end seems
to already have its own defaults for FP contraction, so there`s no need to change this for
every language.

I needed to change one test case because it made an assumption that FP contraction is off by
default when compiling with "-O3" but without any additional optimization-related flags.

Patch relative to upstream code with Git ID b0768e805d1d33d730e5bd44ba578df043dfbc66
------------------------------------------------------------------------------------

diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 619ea9c..d02d873 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2437,6 +2437,13 @@ bool CompilerInvocation::CreateFromArgs(CompilerInvocation &Res,
if (Arch == llvm::Triple::spir || Arch == llvm::Triple::spir64) {
Res.getDiagnosticOpts().Warnings.push_back("spir-compat");
}
+
+ // If there will ever be e.g. "LangOpts.C", replace "LangOpts.C11 || LangOpts.C99" with
"LangOpts.C" on the next line.
+ if ( (LangOpts.C11 || LangOpts.C99 || LangOpts.CPlusPlus) // ...
+ /*...*/ && ( CodeGenOptions::FPC_On == Res.getCodeGenOpts().getFPContractMode() ) // ... //
just being careful
+ /*...*/ && (Res.getCodeGenOpts().OptimizationLevel >= 3) )
+ LangOpts.DefaultFPContract = 1;
+
return Success;
}

diff --git a/clang/test/CodeGen/fp-contract-pragma.cpp b/clang/test/CodeGen/fp-contract-pragma.cpp
index 1c5921a..0949272 100644
--- a/clang/test/CodeGen/fp-contract-pragma.cpp
+++ b/clang/test/CodeGen/fp-contract-pragma.cpp
@@ -13,6 +13,7 @@ float fp_contract_2(float a, float b, float c) {
// CHECK: _Z13fp_contract_2fff
// CHECK: %[[M:.+]] = fmul float %a, %b
// CHECK-NEXT: fadd float %[[M]], %c
+ #pragma STDC FP_CONTRACT OFF
{
#pragma STDC FP_CONTRACT ON
}

Please give me any and all feedback you may have on this suggested change and this proposed patch.

Regards,

Abe
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Stephen Canon via llvm-dev

unread,

Sep 9, 2016, 5:32:01 PM9/9/16

to Abe Skolnik, llvm-dev

Gating this on -Owhatever is dangerous, . We should simply default to the pragma “on” state universally.

– Steve

Abe Skolnik via llvm-dev

unread,

Sep 9, 2016, 6:23:00 PM9/9/16

to Stephen Canon, llvm-dev

On 09/09/2016 04:31 PM, Stephen Canon wrote:

> Gating this on -Owhatever is dangerous, . We should simply default to the pragma “on” state universally.

Why so? [honestly asking, not arguing]

My guess: b/c we don`t want programs to give different results when compiled at different
"-O<...>" settings with the exception of "-Ofast".

At any rate, the above change is trivial to apply to my recent proposed patch: just remove the
"&& (Res.getCodeGenOpts().OptimizationLevel >= 3)" part of the condition.

Steve Canon via llvm-dev

unread,

Sep 9, 2016, 6:28:01 PM9/9/16

to Abe Skolnik, llvm-dev

Sent from my iPhone

On Sep 9, 2016, at 6:21 PM, Abe Skolnik <a.sk...@samsung.com> wrote:

On 09/09/2016 04:31 PM, Stephen Canon wrote:

Gating this on -Owhatever is dangerous, . We should simply default to the pragma “on” state universally.

Why so? [honestly asking, not arguing]

My guess: b/c we don`t want programs to give different results when compiled at different "-O<...>" settings with the exception of "-Ofast".

Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).

Abe Skolnik via llvm-dev

unread,

Sep 9, 2016, 6:49:22 PM9/9/16

to Steve Canon, llvm-dev

[Stephen Canon wrote:]

>>> Gating this on -Owhatever is dangerous, . We should simply default to the pragma “on” state universally.

[Abe wrote:]

>> Why so? [honestly asking, not arguing]
>> My guess: b/c we don`t want programs to give different results when compiled at different "-O<...>" settings with the exception of "-Ofast".

[Steve Canon wrote:]

> Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).

I strongly agree with that philosophy. I tried arguing that GCC should change its policy, but
I was rebuffed ["RESOLVED INVALID"].

For reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77515

Chris Lattner via llvm-dev

unread,

Sep 9, 2016, 10:40:42 PM9/9/16

to Steve Canon, llvm-dev, Abe Skolnik

I think that’s unavoidable, because of the way the optimization levels work. Even fma contraction is on by default (something I’d like to see), at -O0, we wouldn't be doing contraction for:

auto x = a*b;

auto y = x+c;

but we would do that at -O2 since we do mem2reg on x.

-Chris

Steve Canon via llvm-dev

unread,

Sep 10, 2016, 6:33:38 AM9/10/16

to Chris Lattner, llvm-dev, Abe Skolnik

Sent from my iPhone

In C, we don't contract (the equivalent of) this unless we're passed fp-contract=fast. The pragma only licenses contraction within a statement.

IIRC, the situation in C++ is somewhat different, and the standard allows contraction across statement boundaries, though I don't think we take advantage of it at present.

You're definitely correct that there will still be differences; e.g.:

x = a*b + c;

y = a*b;

It might be that at some optimization level we prove y is unused / constant / etc. When targeting a machine where fma is costlier than mul, we generate mul+add in one case and fma in the other. These cases are necessarily rarer than if we gate it on optimization level, however. (And we want the perf win for -O0 anyway).

TLDR: yeah, let's do this.

Chris Lattner via llvm-dev

unread,

Sep 10, 2016, 9:18:21 PM9/10/16

to Steve Canon, llvm-dev, Abe Skolnik

On Sep 10, 2016, at 3:33 AM, Steve Canon <sca...@apple.com> wrote:

Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).

I think that’s unavoidable, because of the way the optimization levels work. Even fma contraction is on by default (something I’d like to see), at -O0, we wouldn't be doing contraction for:

auto x = a*b;
auto y = x+c;

but we would do that at -O2 since we do mem2reg on x.

In C, we don't contract (the equivalent of) this unless we're passed fp-contract=fast. The pragma only licenses contraction within a statement.

Ah ok. What’s GCC’s policy on this?

IIRC, the situation in C++ is somewhat different, and the standard allows contraction across statement boundaries, though I don't think we take advantage of it at present.

Is language standard pedanticism what we want to base our policies on? It’s great to not violate the standard of course, but it would be suboptimal for switching a .c file to .cpp to change its behavior. I’m not sure which way this cuts on this topic though, or if the cost is worth bearing.

TLDR: yeah, let's do this.

Nice :-)

-Chris

Abe Skolnik via llvm-dev

unread,

Sep 12, 2016, 3:00:02 PM9/12/16

to llvm-dev

The now-ungated-by-O3-or-higher passes with no new unexpected failures when run on Ubuntu
14.04.1 on a Xeon-based server in 64-bit mode. [No known unexpected failures when testing on
any other platform.]

-- Abe

The below patch is relative to...
---------------------------------
commit b0768e805d1d33d730e5bd44ba578df043dfbc66
Author: George Burgess IV <george.b...@gmail.com>
Date: Wed Sep 7 20:03:19 2016 +0000

diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 619ea9c..4b04937 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2437,6 +2437,12 @@ bool CompilerInvocation::CreateFromArgs(CompilerInvocation &Res,

if (Arch == llvm::Triple::spir || Arch == llvm::Triple::spir64) {
Res.getDiagnosticOpts().Warnings.push_back("spir-compat");
}
+
+ // If there will ever be e.g. "LangOpts.C", replace "LangOpts.C11 || LangOpts.C99" with
"LangOpts.C" on the next line.
+ if ( (LangOpts.C11 || LangOpts.C99 || LangOpts.CPlusPlus) // ...

+ /*...*/ && ( CodeGenOptions::FPC_On == Res.getCodeGenOpts().getFPContractMode() ) ) // ...
// just being careful

+ LangOpts.DefaultFPContract = 1;
+
return Success;
}

diff --git a/clang/test/CodeGen/fp-contract-pragma.cpp b/clang/test/CodeGen/fp-contract-pragma.cpp
index 1c5921a..0949272 100644
--- a/clang/test/CodeGen/fp-contract-pragma.cpp
+++ b/clang/test/CodeGen/fp-contract-pragma.cpp
@@ -13,6 +13,7 @@ float fp_contract_2(float a, float b, float c) {
// CHECK: _Z13fp_contract_2fff
// CHECK: %[[M:.+]] = fmul float %a, %b
// CHECK-NEXT: fadd float %[[M]], %c
+ #pragma STDC FP_CONTRACT OFF
{
#pragma STDC FP_CONTRACT ON
}

Abe Skolnik via llvm-dev

unread,

Sep 12, 2016, 3:43:59 PM9/12/16

to llvm-dev

On 09/12/2016 01:58 PM, Abe Skolnik wrote:
> The now-ungated-by-O3-or-higher passes with no new unexpected failures when run on Ubuntu 14.04.1 on a
> Xeon-based server in 64-bit mode. [No known unexpected failures when testing on any other
platform.]

Oops. I did a "make check" when I _should_ have done a "make check-all". Some test cases
_are_ broken. I will work on fixing them [as well as finishing my new test cases that will
test in the future that the WIP improvement will not have regressed] and report again later.

-- Abe

Abe Skolnik via llvm-dev

unread,

Sep 12, 2016, 5:10:17 PM9/12/16

to llvm-dev

Dear all,

I have added 4 test cases that all fail on the "vanilla" [i.e. unmodified] compiler and succeed
with my patch applied. Please see below, presented for comments/feedback.

The only difference across the non-O0 files is the -O<something> flag; would other people
prefer that I factor this out into one include file and 3 short stubs, if I can?

The only difference other than -O<something> between the O0 test and all the rest is that in
the -O0 case I have removed the "CHECK-NEXT" for "ret" immediately following the "fmadd" b/c at
-O0 the optimizer is not eliminating the boilerplate stack-related code that in this case is
not truly needed.

Regards,

Abe

diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O0___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O0___aarch64-backend.c
new file mode 100644
index 0000000..fd4a979
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O0___aarch64-backend.c
@@ -0,0 +1,15 @@
+// RUN: %clang_cc1 -triple aarch64 -O0 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+
diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O1___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O1___aarch64-backend.c
new file mode 100644
index 0000000..f5c55a9
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O1___aarch64-backend.c
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 -triple aarch64 -O1 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+// CHECK-NEXT: ret
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+// CHECK-NEXT: ret
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+
diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O2___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O2___aarch64-backend.c
new file mode 100644
index 0000000..98b82a6
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O2___aarch64-backend.c
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 -triple aarch64 -O2 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+// CHECK-NEXT: ret
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+// CHECK-NEXT: ret
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+
diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O3___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O3___aarch64-backend.c
new file mode 100644
index 0000000..4d64738
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O3___aarch64-backend.c
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 -triple aarch64 -O3 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+// CHECK-NEXT: ret
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+// CHECK-NEXT: ret
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+

Reply all

Reply to author

Forward