As a developer (who always reads the docs and generally makes good life choices), if I turn on -ffast-math, I want the compiler to produce the fastest possible floating point math code possible, floating point semantics be darned. Given this viewpoint, my opinion on this topic is that the compiler should do whatever it wants, given the constraints of the documented behavior of NaN. I think the clang docs for -ffast-math are pretty clear on this subject:
```
Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include:
...
- Operands to floating-point operations are not equal to NaN and Inf ...
```
The compiler may assume that operands to floating point operations are not NaN or infinity. So:
- What should return `std::numeric_limits<double>::has_quiet_NaN()`? : It should return true if it would have returned true with fast math disabled. Clang is not required to pretend NaN doesn't exist; it's allowed to pretend arguments cannot be NaN if that is convenient.
- What body should this function have if it is used in a program where some functions are compiled with `fast-math` and some without? : This function should be allowed to act as if NaN exists in all cases.
- Should inlining of a function compiled with `fast-math` into a function compiled without it be prohibited in the inliner? No. The author of the function that uses fast-math made their choices, and the user of that function should have vetted their dependencies better. In my view, this is no different than if somebody wrote `if (x == y/z) ...`; it's a bug on the user. It's not clang's fault that this code doesn't work as the author wanted.
- Should `std::isnan(std::numeric_limits<float>::quiet_NaN())` be true? : No. quiet_NaN() can return whatever it wants, but the call to std::isnan can be replaced with false since it may assume its argument is not NaN.
Of course, this all sounds fine and well, but the reality is that people don't read docs and don't make good life choices. They turn on fast math because they want it to reduce `x * 0` to `0`, and are surprised when their NaN handling code fails. This is unfortunate, but I don't think we should reduce the effectiveness of fast-math because of this human issue. Other flags exist for these users, and when they complain they should be told about them. Really this is an issue of poor developer discipline, and if we really want to solve this, perhaps some sort of "fast math sanitizer" can be created. It can statically analyze code and complain when it sees things like `if (isnan(foo))` not guarded by `__FAST_MATH__` with fast math enabled. Or, maybe the compiler can just issue a warning unconditionally in this case.
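For illustration, a guard of the kind such a checker would look for might be sketched as follows. `__FAST_MATH__` and `__FINITE_MATH_ONLY__` are macros that GCC and Clang predefine; `kNanChecksReliable` and `checkedIsNan` are hypothetical names invented for this sketch, not part of any library.

```cpp
#include <cmath>

// Hypothetical flag: true only when this translation unit was built without
// -ffast-math / -ffinite-math-only, i.e. when isnan() can be trusted.
#if defined(__FAST_MATH__) || (defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__)
constexpr bool kNanChecksReliable = false;
#else
constexpr bool kNanChecksReliable = true;
#endif

// Hypothetical wrapper: consults std::isnan only when the result is
// meaningful. In a finite-math build it states the "no NaNs" assumption
// explicitly instead of relying on a check the compiler may fold away.
inline bool checkedIsNan(double x) {
    return kNanChecksReliable && std::isnan(x);
}
```

Code that genuinely depends on NaN detection could additionally `static_assert(kNanChecksReliable, ...)` so that a fast-math build fails loudly rather than silently skipping the check.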
Thanks,
Chris Tetreault
From: cfe-dev <cfe-dev...@lists.llvm.org> On Behalf Of
Serge Pavlov via cfe-dev
Sent: Wednesday, September 8, 2021 10:03 AM
To: LLVM Developers <llvm...@lists.llvm.org>; Clang Dev <cfe...@lists.llvm.org>
Subject: [cfe-dev] Should isnan be optimized out in fast-math mode?
There is a huge difference between optimisations that assume NaN is not
present and breaking checks for them. I'm not convinced at all that
constant-folding isnan to false will actually speed up real world code.
Joerg
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Are fast math flags _required_ to make assumptions? Or simply _allowed_? The difference is key here.
-----Original Message-----
From: cfe-dev <cfe-dev...@lists.llvm.org> On Behalf Of Joerg Sonnenberger via cfe-dev
Sent: Wednesday, September 08, 2021 4:58 PM
To: llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
On Wed, Sep 08, 2021 at 06:04:08PM +0000, Chris Tetreault via llvm-dev wrote:
> As a developer (who always reads the docs and generally makes good life
> choices), if I turn on -ffast-math, I want the compiler to produce the
> fastest possible floating point math code possible, floating point
> semantics be darned. Given this viewpoint, my opinion on this topic is
> that the compiler should do whatever it wants, given the constraints of
> the documented behavior of NaN.
There is a huge difference between optimisations that assume NaN is not
present and breaking checks for them. I'm not convinced at all that
constant-folding isnan to false will actually speed up real world code.
Joerg
I expressed my strong support for this on the previous thread, but I'll just repost the most important piece...
I believe the proposed semantics from the Clang level ought to be:
The -ffinite-math-only and -fno-signed-zeros options do not impact the ability to accurately load, store, copy, or pass or return such values from general function calls. They also do not impact any of the "non-computational" and "quiet-computational" IEEE-754 operations, which includes classification functions (fpclassify, signbit, isinf/isnan/etc), sign-modification (copysign, fabs, and negation `-(x)`), as well as the totalorder and totalordermag functions. Those correctly handle NaN, Inf, and signed zeros even when the flags are in effect. These flags do affect the behavior of other expressions and math standard-library calls, as well as comparison operations.
_______________________________________________
If we say that the fast-math flags are “enabling optimizations that the presence of nans otherwise prohibits”, then there is no reason for clang to keep calls to “isnan” around, or to keep checks like “fpclassify(x) == it’s_a_nan” unfolded. These are exactly the types of optimizations that the presence of NaNs would prohibit.
I understand the need for having some NaN-handling preserved in an otherwise finite-math code. We already have fast-math-related attributes attached to each function in the LLVM IR, so we could introduce a source-level attribute for enabling/disabling these flags per function.
--
Krzysztof Parzyszek kpar...@quicinc.com AI tools development
From: cfe-dev <cfe-dev...@lists.llvm.org> On Behalf Of
Chris Lattner via cfe-dev
Sent: Wednesday, September 8, 2021 5:51 PM
To: James Y Knight <jykn...@google.com>
Cc: LLVM Developers <llvm...@lists.llvm.org>; Clang Dev <cfe...@lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
On Sep 8, 2021, at 3:27 PM, James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:
Not sure which way to go, but I agree that we need to improve the docs/user experience either way.
Let's try to iron this out with an example (this is based on https://llvm.org/PR51775 ):
#include <math.h>
#include <stdlib.h>
int main() {
  const double d = strtod("1E+1000000", NULL);
  return d == HUGE_VAL;
}
What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?
This goes back to what these options actually imply. The interpretation that I favor is “this code will never see a NaN”, or “the program can assume that no floating point expression will evaluate to a NaN”. The benefit of that is that it’s intuitively clear. In that case “isnan(x)” is false, because x cannot be a NaN. There is no distinction between “isnan(x+x)” and “isnan(x)”. If the user wants to preserve “isnan(x)”, they can apply some pragma (which clang may actually have already).
To be honest, I’m not sure that I understand your argument. Are you saying that under your interpretation we could optimize “isnan(x+x) -> false”, but not “isnan(x) -> false”?
If the issue is that users want their asserts to fire, then they should be encouraged to only enable fast math in release builds.
The simplicity is only apparent. As the discussion on the GCC mailing list demonstrated (https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544641.html), this is actually an unpromising approach. From a practical viewpoint it is also a bad solution, as users cannot even check the assertions.
The intent here is that users can preserve the NaN behavior by annotating the code with either attributes or pragmas. I don’t think that the linked discussion actually shows that the “no NaNs ever” interpretation is any worse than the “arithmetic operations do not produce NaNs”. A large part of it was about what happens to `__builtin_nan`, but if your code explicitly produces NaNs and you compile it with finite-math, you shouldn’t expect anything meaningful. IMO it’s much better to have a flag with clarity about what it does, even if it leads to potentially unexpected results, than to have an option whose description is open to interpretation. At least the users will know what caused the issue, rather than wonder if they had found a compiler bug or not.
I agree that there may be issues with multiple definitions of functions compiled with different settings, although that is not strictly limited to FP flags. There should be some unified approach to that, and I don’t know what the right thing to do is off the top of my head.
The argument of `isnan(x+x)` is the result of an arithmetic operation. According to the meaning of -ffinite-math-only, it cannot produce NaN, so this call can be optimized out. In the general case, the value `x` in `isnan(x)` may be, say, loaded from memory. A load is not an arithmetic operation, so nothing prevents it from yielding a NaN. Optimizing the call out is dangerous in this case.
`x` is not a load, it’s an expression. Also, even in the presence of NaNs, x+0 preserves the value type (i.e. normal/subnormal/infinity/NaN), except signaling NaNs perhaps. I’m not sure whether we even consider signaling NaNs, so let’s forget them for a moment. If x+0 is a NaN iff x is a NaN, then the compiler should be able to rewrite x -> x+0 regardless of any flags. But then, given that x+0 is now “arithmetic”, isnan(x+0) could become `false`. This is fundamentally counterintuitive.
Furthermore, if we had `a = isnan(x)`, we couldn’t fold it to `false`, but if we had `a = isnan(x); b = isnan(x+x)`, then we could fold both to `false`. This is, again, unintuitive.
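To make the `x` versus `x+0` point concrete, here is a minimal sketch. The function names are hypothetical and exist only for this illustration. Without any fast-math flags the two classifiers agree for every input, which is why a reading that allows folding one but not the other is hard to justify.

```cpp
#include <cmath>

// Classifies the value directly, with no arithmetic involved.
inline bool nanDirect(double x) { return std::isnan(x); }

// Classifies the result of an addition that cannot change whether the
// value is a NaN (only the sign of zero or the quiet/signaling bit may
// differ).
inline bool nanViaAdd(double x) { return std::isnan(x + 0.0); }

// Under strict IEEE-754 semantics this holds for every x. Under the
// "only arithmetic results are assumed non-NaN" reading, a compiler could
// fold nanViaAdd to false while leaving nanDirect alone.
inline bool classifiersAgree(double x) { return nanDirect(x) == nanViaAdd(x); }
```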
In this case, I think it’s perfectly reasonable to reinterpret_cast the floats to uint32_t, and then inspect the bit pattern. Since NaN is being used as a sentinel value, I assume it’s a known bit pattern, and not just any old NaN.
I think it’s fine that fast-math renders isnan useless. As far as I know, the C++ standard wasn’t written to account for compilers providing fast-math flags. fast-math is itself a workaround for “IEEE floats do not behave like actual real numbers”, so working around a workaround seems reasonable to me.
Let me describe a real-life example.
There is a realtime program that processes float values from a huge array. Calculations do not produce NaNs and do not expect them. Using -ffinite-math-only substantially speeds up the program, so it is highly desirable to use it. The problem is that the array contains NaNs; they mark elements that should not be processed.
An obvious solution is to check an element for NaN and, if it is not one, process it. Now there is no clean way to do so, only workarounds such as using integer arithmetic. The function 'isnan' has become useless. And there are many cases where users complain about this optimization.
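The integer-arithmetic workaround referred to here might look roughly like the sketch below. It assumes IEEE-754 binary32 `float`; `isNanBits` and `sumValid` are hypothetical names, and the summation merely stands in for whatever processing the real program does. Whether a given compiler leaves the integer test alone under -ffinite-math-only is exactly what this thread is debating, so treat this as an illustration rather than a guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption of this sketch: float is IEEE-754 binary32.
static_assert(std::numeric_limits<float>::is_iec559 &&
                  sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes IEEE-754 binary32 floats");

// Hypothetical helper: a NaN has all exponent bits set and a non-zero
// mantissa. The test is done on the integer representation, so no
// floating-point comparison is involved.
inline bool isNanBits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bytes
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

// Hypothetical kernel: accumulates every element that is not marked
// with a NaN sentinel.
inline float sumValid(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (isNanBits(data[i])) continue;  // skip marked elements
        total += data[i];
    }
    return total;
}
```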
On Thu, Sep 9, 2021 at 10:34 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
> Let me describe a real-life example.
> There is a realtime program that processes float values from a huge array. Calculations do not produce NaNs and do not expect them. Using -ffinite-math-only substantially speeds up the program, so it is highly desirable to use it. The problem is that the array contains NaNs; they mark elements that should not be processed.
> An obvious solution is to check an element for NaN and, if it is not one, process it. Now there is no clean way to do so, only workarounds such as using integer arithmetic. The function 'isnan' has become useless. And there are many cases where users complain about this optimization.
I personally would separate the "pre-processing" of the input into a compilation unit that isn't compiled with -ffinite-math-only and isolate the perf-critical routines to be compiled with this flag if needed (I'd also like a sanitizer to have a build mode that validates that no NaNs are ever seen in these routines).
In general, Krzysztof's reasoning in this thread makes sense to me, in particular in terms of being consistent with how we treat isnan(x) vs isnan(x+0) for example.
(Speaking only for myself here, and mostly as someone who doesn’t typically write floating-point-heavy code).
The root issue we have here is that, as with many compiler extensions, fast-math flags end up creating a vaguely-defined variant of the C specification governed by the “obvious” semantics, and, as is the case with “obvious” semantics, there are several different “obvious” results.
Given the standard C taste for undefined behavior, it would seem to me that the most natural definition of -ffinite-math-only would be to say that any operation that produces NaN or infinity results is undefined behavior, or produces a poison value using LLVM’s somewhat tighter definition here [1]. This notably doesn’t give a clear answer on what to do with floating-point operations that don’t produce floating-point results (e.g., casts, comparison operators), and the volume of discussions on this point is I think indicative that there are multiple reasonable options here. Personally, I find the extension of the UB to cases that consume but do not produce floating-point values to be the most natural option.
It’s also the case that many users don’t like undefined behavior as a concept, in large part because it can be very difficult to work around in a few cases where it is desired to explicitly override the undefined behavior. For some of the more basic integer UB, clang already provides builtin overflow checking macros to handle the I-want-to-check-if-it-overflowed-without-UB case, for example. And if fast math flags are to create UB, then similar functionality to override the floating-point UB ought to be provided. Already, C provides a mechanism to twiddle floating-point behavior on a per-scope basis (e.g., #pragma STDC FENV_ACCESS, CX_LIMITED_RANGE, FP_CONTRACT). LLVM already supports these flags on a per-instruction basis, so it really shouldn’t be very difficult to have Clang support pragmas to twiddle fast-math flags like the existing C pragmas. And in this model, the -ffast-math and related flags are doing nothing more than setting the default values of these pragmas.
In that vein, I can imagine a user writing a program that would look something like this:
int some_hard_math_kernel(float *inputs, float *outputs, int N) {
  {
#pragma clang fast_math off
    for (int i = 0; i < N; i++) {
      if (isinf(inputs[i]) || isnan(inputs[i]))
        return ILLEGAL_ARGUMENT;
    }
  }
#pragma clang fast_math on
  // Do fancy math here…
  // and if we see isnan(x) here, even if it’s in a library routine [compiled with -ffast-math],
  // or maybe implied by some operation the compiler understands [say, complex multiplication]
  // it is optimized to false.
  return SUCCESS;
}
I can clearly see use cases where the programmer might wish to have the optimizer eliminate any isnan calls that are generated when -ffast-math is used, but like other UB, I think it is extremely beneficial to provide some way to explicitly opt-out of UB on a case-by-case basis.
I would even go so far as to suggest that maybe the C standards committee should discuss how to handle at least the nsz/nnan/ninf parts of fast-math flags, given that very similar concepts seem to exist in all of the major C/C++ compilers.
[1] I fully expect any user who is knowledgeable about poison in LLVM—which admittedly is a fairly expert user—would expect poison to kick in most of the time C or C++ provides for undefined behavior, and potentially to rely on that expectation.
The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension. If you enable it, you should expect the compiler to diverge from the language standard. I’m sure there’s precedent for this. If I write #pragma once at the top of my header, and include it twice back to back, the preprocessor won’t paste my header twice. Should #pragma once be removed because it breaks #include?
Now, you have a real-world example that uses NaN as a sentinel value. In your case, it would be nice if the compiler worked as you suggest. Now, suppose I have a “safe matrix multiply”:
```
std::optional<MyMatrixT> safeMul(const MyMatrixT & lhs, const MyMatrixT & rhs) {
  for (int i = 0; i < lhs.rows; ++i) {
    for (int j = 0; j < lhs.cols; ++j) {
      if (isnan(lhs[i][j])) {
        return {};
      }
    }
  }
  for (int i = 0; i < rhs.rows; ++i) {
    for (int j = 0; j < rhs.cols; ++j) {
      if (isnan(rhs[i][j])) {
        return {};
      }
    }
  }
  // do the multiply
}
```
In this case, if isnan(x) can be constant folded to false with fast-math enabled, then these two loops can be completely eliminated since they are empty and do nothing. If MyMatrixT is a 100 x 100 matrix, and/or safeMul is called in a hot loop, this could be huge. What should I do instead here?
Really, it would be much more consistent if we apply the clang documentation for fast-math “Operands to floating-point operations are not equal to NaN and Inf” literally, and not actually implement “Operands to floating-point operations are not equal to NaN and Inf, except in the case of isnan(), but only if the argument to isnan() is a value stored in a variable and not an expression”. As far as using isnan from the standard library compiled without fast-math vs a compiler builtin, I don’t think this is an issue. Really, enabling fast-math is basically telling the compiler “My code has no NaNs. I won’t try to do anything with them, and you should optimize assuming they aren’t there”. If a developer does their part, why should it matter to them that isnan() might work?
Thanks,
Chris Tetreault
Not sure which way to go, but I agree that we need to improve the docs/user experience either way.
Let's try to iron this out with an example (this is based on https://llvm.org/PR51775 ):
#include <math.h>
#include <stdlib.h>
int main() {
  const double d = strtod("1E+1000000", NULL);
  return d == HUGE_VAL;
}
What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?
The proposed documentation text isn't clear to me. Should clang apply "nnan ninf" to the IR call for "strtod"?
"strtod" is not in the enumerated list of functions where we would block fast-math-flags, but it is a standard lib call, so "nnan ninf" would seem to apply...but we also don't want "-ffinite-math-only" to alter the ability to return an INF from a "general function call"?
The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension.
Would it be reasonable to treat operations on Inf and NaN values as UB in this mode only if the same operation on a signaling NaN might signal? (Approximately, that'd mean we imagine these non-finite value encodings all encode sNaNs that are UB if they would signal.) That means the operations that ISO 60559 defines as non-computational or quiet-computational would be permitted to receive NaN and Inf as input and produce them as output, but that other computational operations would not.
Per ISO 60559, the quiet-computational operations that I think are relevant to us are: copy, negate, abs, copySign, and conversions between encoding (e.g., bitcast). The non-computational operations that I think are relevant to us are classification functions (including isNaN).
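As a rough illustration of which operations this proposal would keep well-defined, the sketch below chains only the quiet-computational and non-computational operations listed above. `NanProbe` and `probe` are hypothetical names. The comments describe the proposed semantics, not what any compiler currently promises under -ffinite-math-only; built without fast-math flags, the behaviour shown is simply ordinary IEEE-754 behaviour.

```cpp
#include <cmath>
#include <limits>

// Hypothetical result type for this sketch.
struct NanProbe {
    bool stillNan;  // classification result after the chain of operations
    bool negative;  // sign bit after the chain of operations
};

// Every step below is quiet-computational (copy, negate, abs, copySign) or
// non-computational (isnan, signbit), so under the proposal a NaN or Inf
// input would pass through untouched. Something like `x + 1.0` would fall
// outside that set.
inline NanProbe probe(double x) {
    double copied = x;                             // copy
    double negated = -copied;                      // negate
    double magnitude = std::fabs(negated);         // abs
    double y = std::copysign(magnitude, -1.0);     // copySign
    return {std::isnan(y), std::signbit(y)};       // classification
}
```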
I’m not super knowledgeable on the actual implementation of floating point math in clang, but on the surface this seems fine. My position is that we should provide no guarantees as to the behavior of code with NaN or infinity if fast-math is enabled. We can go with this behavior, but we shouldn’t tell users that they can rely on this behavior. Clang should have maximal freedom to optimize floating point math with fast-math, and any constraint we place potentially results in missed opportunities. Similarly we should feel free to change this implementation in the future, the goal not being stability for users who chose to rely on our implementation details. If users value reproducibility, they should not be using fast math.
The only thing I think we should guarantee is that casts work. I should be able to load some bytes from disk, cast the char array to a float array, and any NaNs that I loaded from disk should not be clobbered. After that, I should be able to cast an element of my float array back to another type and inspect the bit pattern (assuming I did not transform that element in the array in any other way after casting it from char) to support use cases like Serge’s. Any other operation should be fair game.
Thanks,
Chris Tetreault
I would argue that #undef’ing a macro provided by the compiler is a much worse kludge than static casting your float to an unsigned int. Additionally, you have to redefine isnan to whatever it was after your function (lest it pollute unrelated code that possibly isn’t even being compiled with fast math), which can’t be done portably as far as I know. Additionally, this requires you to be the author of safeMul. What if it’s in a dependency for which you don’t have the source? At that point, your only recourse is to open an issue with libProprietaryMatrixMath and hope your org is paying them enough to fast track a fix.
Without trying to be too harsh, this is the bad justification GCC has
used for years for exploiting all kinds of UB and implementation-defined
behavior in the name of performance. As has been shown over and over
again, the breakage is rarely matched by equivalent performance gains.
So once more, do we even have proof that significant code exists where
isnan and friends are used in a performance critical code path? I would
find that quite surprising and more an argument for throwing a compile
error...
Joerg
The problem is that math code is often templated, so `template <typename T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a header.
Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”. Just like sometimes you can dereference a pointer after it is free’d, but you should not count on this working. If the compiler I’m using emits a call to a library function instead of providing a macro, and this results in isnan actually computing if x is NaN, then so be it. But if the compiler provides a macro that evaluates to false under fast-math, then the two loops in safeMul can be optimized. Either way, as a developer, I know that I turned on fast-math, and I write code accordingly.
I think working around these sorts of issues is something that C and C++ developers are used to. This sort of “inconsistent” behavior between compilers is something we accept because we know it comes with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting, especially when the fix is also just one line:
```
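// reinterpret_cast<uint32_t>(x) is ill-formed for a float value; std::bit_cast (C++20, <bit>) performs the intended conversion.
#define myIsNan(x) (std::bit_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)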
```
I would probably call the macro something else like `shouldProcessElement`.
Thanks,
Chris Tetreault
From: Serge Pavlov <sepa...@gmail.com>
Sent: Friday, September 10, 2021 11:26 AM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Richard Smith <ric...@metafoo.co.uk>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
It should not be done in headers, of course. Redefinition of this macro in a source file that is compiled with -ffinite-math-only is free from the described drawbacks. Besides, the macro `isnan` is defined by libc, not the compiler, and IIRC it is defined as a macro to allow such manipulations.