In the process of investigating a performance difference between Clang & GCC when both compile
the same non-toolchain program while using the "same"* compiler flags, I have found something
that may be worth changing in Clang, developed a patch, and confirmed that the patch has its
intended effect.
*: "same" in quotes b/c the essence of the problem is that the _meaning_ of "-O3" on Clang
differs from that of "-O3" on GCC in at least one way.
The specific problem here relates to the default settings for FP contraction, e.g. fused
multiply-add. At -O2 and higher, GCC defaults FP contraction to "fast", i.e. always on. I`m
not suggesting that Clang/LLVM/both need to do the same, since Clang+LLVM has good support for
"#pragma STDC FP_CONTRACT".
If we keep Clang`s default for FP contraction at "on" [which really means "according to the
pragma"] but change the default value of the _pragma_ [currently off] to on at -O3, then Clang
will be more competitive with GCC at high optimization settings without resorting to the
more-brutish "fast by default" at plain -O3 [as opposed to "-Ofast", "-O3 -ffast-math", etc.].
Since I don`t know what Objective-C [and Objective-C++] have to say about FP operations, I have
made my patch very selective based on language. Also, I noticed that the CUDA front-end seems
to already have its own defaults for FP contraction, so there`s no need to change this for
every language.
I needed to change one test case because it made an assumption that FP contraction is off by
default when compiling with "-O3" but without any additional optimization-related flags.
Patch relative to upstream code with Git ID b0768e805d1d33d730e5bd44ba578df043dfbc66
------------------------------------------------------------------------------------
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 619ea9c..d02d873 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2437,6 +2437,13 @@ bool CompilerInvocation::CreateFromArgs(CompilerInvocation &Res,
if (Arch == llvm::Triple::spir || Arch == llvm::Triple::spir64) {
Res.getDiagnosticOpts().Warnings.push_back("spir-compat");
}
+
+ // If there will ever be e.g. "LangOpts.C", replace "LangOpts.C11 || LangOpts.C99" with
"LangOpts.C" on the next line.
+ if ( (LangOpts.C11 || LangOpts.C99 || LangOpts.CPlusPlus) // ...
+ /*...*/ && ( CodeGenOptions::FPC_On == Res.getCodeGenOpts().getFPContractMode() ) // ... //
just being careful
+ /*...*/ && (Res.getCodeGenOpts().OptimizationLevel >= 3) )
+ LangOpts.DefaultFPContract = 1;
+
return Success;
}
diff --git a/clang/test/CodeGen/fp-contract-pragma.cpp b/clang/test/CodeGen/fp-contract-pragma.cpp
index 1c5921a..0949272 100644
--- a/clang/test/CodeGen/fp-contract-pragma.cpp
+++ b/clang/test/CodeGen/fp-contract-pragma.cpp
@@ -13,6 +13,7 @@ float fp_contract_2(float a, float b, float c) {
// CHECK: _Z13fp_contract_2fff
// CHECK: %[[M:.+]] = fmul float %a, %b
// CHECK-NEXT: fadd float %[[M]], %c
+ #pragma STDC FP_CONTRACT OFF
{
#pragma STDC FP_CONTRACT ON
}
Please give me any and all feedback you may have on this suggested change and this proposed patch.
Regards,
Abe
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
– Steve
> Gating this on -Owhatever is dangerous, . We should simply default to the pragma “on” state universally.
Why so? [honestly asking, not arguing]
My guess: b/c we don`t want programs to give different results when compiled at different
"-O<...>" settings with the exception of "-Ofast".
At any rate, the above change is trivial to apply to my recent proposed patch: just remove the
"&& (Res.getCodeGenOpts().OptimizationLevel >= 3)" part of the condition.
On 09/09/2016 04:31 PM, Stephen Canon wrote:Gating this on -Owhatever is dangerous, . We should simply default to the pragma “on” state universally.
Why so? [honestly asking, not arguing]
My guess: b/c we don`t want programs to give different results when compiled at different "-O<...>" settings with the exception of "-Ofast".
[Abe wrote:]
>> Why so? [honestly asking, not arguing]
>> My guess: b/c we don`t want programs to give different results when compiled at different "-O<...>" settings with the exception of "-Ofast".
[Steve Canon wrote:]
> Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).
I strongly agree with that philosophy. I tried arguing that GCC should change its policy, but
I was rebuffed ["RESOLVED INVALID"].
For reference: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77515
Pretty much. In particular, imagine a user trying to debug an unexpected floating point result caused by conversion of a*b + c into fma(a, b, c).I think that’s unavoidable, because of the way the optimization levels work. Even fma contraction is on by default (something I’d like to see), at -O0, we wouldn't be doing contraction for:auto x = a*b;auto y = x+c;but we would do that at -O2 since we do mem2reg on x.In C, we don't contract (the equivalent of) this unless we're passed fp-contract=fast. The pragma only licenses contraction within a statement.
IIRC, the situation in C++ is somewhat different, and the standard allows contraction across statement boundaries, though I don't think we take advantage of it at present.
TLDR: yeah, let's do this.
-- Abe
The below patch is relative to...
---------------------------------
commit b0768e805d1d33d730e5bd44ba578df043dfbc66
Author: George Burgess IV <george.b...@gmail.com>
Date: Wed Sep 7 20:03:19 2016 +0000
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 619ea9c..4b04937 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -2437,6 +2437,12 @@ bool CompilerInvocation::CreateFromArgs(CompilerInvocation &Res,
if (Arch == llvm::Triple::spir || Arch == llvm::Triple::spir64) {
Res.getDiagnosticOpts().Warnings.push_back("spir-compat");
}
+
+ // If there will ever be e.g. "LangOpts.C", replace "LangOpts.C11 || LangOpts.C99" with
"LangOpts.C" on the next line.
+ if ( (LangOpts.C11 || LangOpts.C99 || LangOpts.CPlusPlus) // ...
+ /*...*/ && ( CodeGenOptions::FPC_On == Res.getCodeGenOpts().getFPContractMode() ) ) // ...
// just being careful
+ LangOpts.DefaultFPContract = 1;
+
return Success;
}
diff --git a/clang/test/CodeGen/fp-contract-pragma.cpp b/clang/test/CodeGen/fp-contract-pragma.cpp
index 1c5921a..0949272 100644
--- a/clang/test/CodeGen/fp-contract-pragma.cpp
+++ b/clang/test/CodeGen/fp-contract-pragma.cpp
@@ -13,6 +13,7 @@ float fp_contract_2(float a, float b, float c) {
// CHECK: _Z13fp_contract_2fff
// CHECK: %[[M:.+]] = fmul float %a, %b
// CHECK-NEXT: fadd float %[[M]], %c
+ #pragma STDC FP_CONTRACT OFF
{
#pragma STDC FP_CONTRACT ON
}
Oops. I did a "make check" when I _should_ have done a "make check-all". Some test cases
_are_ broken. I will work on fixing them [as well as finishing my new test cases that will
test in the future that the WIP improvement will not have regressed] and report again later.
-- Abe
I have added 4 test cases that all fail on the "vanilla" [i.e. unmodified] compiler and succeed
with my patch applied. Please see below, presented for comments/feedback.
The only difference across the non-O0 files is the -O<something> flag; would other people
prefer that I factor this out into one include file and 3 short stubs, if I can?
The only difference other than -O<something> between the O0 test and all the rest is that in
the -O0 case I have removed the "CHECK-NEXT" for "ret" immediately following the "fmadd" b/c at
-O0 the optimizer is not eliminating the boilerplate stack-related code that in this case is
not truly needed.
Regards,
Abe
diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O0___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O0___aarch64-backend.c
new file mode 100644
index 0000000..fd4a979
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O0___aarch64-backend.c
@@ -0,0 +1,15 @@
+// RUN: %clang_cc1 -triple aarch64 -O0 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+
diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O1___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O1___aarch64-backend.c
new file mode 100644
index 0000000..f5c55a9
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O1___aarch64-backend.c
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 -triple aarch64 -O1 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+// CHECK-NEXT: ret
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+// CHECK-NEXT: ret
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+
diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O2___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O2___aarch64-backend.c
new file mode 100644
index 0000000..98b82a6
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O2___aarch64-backend.c
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 -triple aarch64 -O2 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+// CHECK-NEXT: ret
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+// CHECK-NEXT: ret
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+
diff --git a/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O3___aarch64-backend.c
b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O3___aarch64-backend.c
new file mode 100644
index 0000000..4d64738
--- /dev/null
+++ b/clang/test/CodeGen/fp-contract-pragma___on-by-default___-O3___aarch64-backend.c
@@ -0,0 +1,17 @@
+// RUN: %clang_cc1 -triple aarch64 -O3 -S -o - %s | FileCheck %s
+// REQUIRES: aarch64-registered-target
+
+// CHECK-LABEL: fmadd_double:
+// CHECK: fmadd d0, d{{[0-7]}}, d{{[0-7]}}, d{{[0-7]}}
+// CHECK-NEXT: ret
+double fmadd_double(double a, double b, double c) {
+ return a*b+c;
+}
+
+// CHECK-LABEL: fmadd_single:
+// CHECK: fmadd s0, s{{[0-7]}}, s{{[0-7]}}, s{{[0-7]}}
+// CHECK-NEXT: ret
+float fmadd_single(float a, float b, float c) {
+ return a*b+c;
+}
+