As a developer (who always reads the docs and generally makes good life choices), if I turn on -ffast-math, I want the compiler to produce the fastest possible floating point math code possible, floating point semantics be darned. Given this viewpoint, my opinion on this topic is that the compiler should do whatever it wants, given the constraints of the documented behavior of NaN. I think the clang docs for -ffast-math are pretty clear on this subject:
```
Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include:
...
- Operands to floating-point operations are not equal to NaN and Inf ...
```
The compiler may assume that operands to floating point operations are not NaN or infinity. So:
- What should `std::numeric_limits<double>::has_quiet_NaN()` return? It should return true if it would have returned true with fast math disabled. Clang is not required to pretend NaN doesn't exist; it's allowed to pretend arguments cannot be NaN if that is convenient.
- What body should this function have if it is used in a program where some functions are compiled with `fast-math` and some without? This function should be allowed to act as if NaN exists in all cases.
- Should the inliner be prohibited from inlining a function compiled with `fast-math` into a function compiled without it? No. The author of the function that uses fast-math made their choices, and the user of that function should have vetted their dependencies better. In my view, this is no different than if somebody wrote `if (x == y/z) ...`; the bug is on the user. It's not clang's fault that this code doesn't work as the author wanted.
- Should `std::isnan(std::numeric_limits<float>::quiet_NaN())` be true? No. quiet_NaN() can return whatever it wants, but the call to std::isnan can be replaced with false, since the compiler may assume its argument is not NaN.
Of course, this all sounds well and good, but the reality is that people don't read docs and don't make good life choices. They turn on fast math because they want it to reduce `x * 0` to `0`, and are surprised when their NaN handling code fails. This is unfortunate, but I don't think we should reduce the effectiveness of fast-math because of this human issue. Other flags exist for these users, and when they complain they should be told about them. Really, this is an issue of poor developer discipline, and if we really want to solve it, perhaps some sort of "fast math sanitizer" could be created. It could statically analyze code and complain when it sees things like `if (isnan(foo))` not guarded by `__FAST_MATH__` with fast math enabled. Or maybe the compiler could just issue a warning unconditionally in this case.
Thanks,
Chris Tetreault
From: cfe-dev <cfe-dev...@lists.llvm.org> On Behalf Of
Serge Pavlov via cfe-dev
Sent: Wednesday, September 8, 2021 10:03 AM
To: LLVM Developers <llvm...@lists.llvm.org>; Clang Dev <cfe...@lists.llvm.org>
Subject: [cfe-dev] Should isnan be optimized out in fast-math mode?
WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.
There is a huge difference between optimisations that assume NaN is not
present and breaking checks for them. I'm not convinced at all that
constant-folding isnan to false will actually speed up real world code.
Joerg
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Are fast math flags _required_ to make assumptions? Or simply _allowed_? The difference is key here.
-----Original Message-----
From: cfe-dev <cfe-dev...@lists.llvm.org> On Behalf Of Joerg Sonnenberger via cfe-dev
Sent: Wednesday, September 08, 2021 4:58 PM
To: llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
EXTERNAL
On Wed, Sep 08, 2021 at 06:04:08PM +0000, Chris Tetreault via llvm-dev wrote:
> As a developer (who always reads the docs and generally makes good life
> choices), if I turn on -ffast-math, I want the compiler to produce the
> fastest possible floating point math code possible, floating point
> semantics be darned. Given this viewpoint, my opinion on this topic is
> that the compiler should do whatever it wants, given the constraints of
> the documented behavior of NaN.
There is a huge difference between optimisations that assume NaN is not
present and breaking checks for them. I'm not convinced at all that
constant-folding isnan to false will actually speed up real world code.
Joerg
I expressed my strong support for this on the previous thread, but I'll just repost the most important piece. I believe the proposed semantics from the Clang level ought to be: the -ffinite-math-only and -fno-signed-zeros options do not impact the ability to accurately load, store, copy, pass, or return such values from general function calls. They also do not impact any of the "non-computational" and "quiet-computational" IEEE-754 operations, which include the classification functions (fpclassify, signbit, isinf/isnan/etc.), sign modification (copysign, fabs, and negation `-(x)`), and the totalorder and totalordermag functions. Those correctly handle NaN, Inf, and signed zeros even when the flags are in effect. These flags do affect the behavior of other expressions and math standard-library calls, as well as comparison operations.
_______________________________________________
If we say that the fast-math flags are “enabling optimizations that the presence of nans otherwise prohibits”, then there is no reason for clang to keep calls to “isnan” around, or to keep checks like “fpclassify(x) == it’s_a_nan” unfolded. These are exactly the types of optimizations that the presence of NaNs would prohibit.
I understand the need for having some NaN-handling preserved in an otherwise finite-math code. We already have fast-math-related attributes attached to each function in the LLVM IR, so we could introduce a source-level attribute for enabling/disabling these flags per function.
--
Krzysztof Parzyszek kpar...@quicinc.com AI tools development
From: cfe-dev <cfe-dev...@lists.llvm.org> On Behalf Of
Chris Lattner via cfe-dev
Sent: Wednesday, September 8, 2021 5:51 PM
To: James Y Knight <jykn...@google.com>
Cc: LLVM Developers <llvm...@lists.llvm.org>; Clang Dev <cfe...@lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
On Sep 8, 2021, at 3:27 PM, James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:
Not sure which way to go, but I agree that we need to improve the docs/user experience either way. Let's try to iron this out with an example (this is based on https://llvm.org/PR51775 ):
```
#include <math.h>
#include <stdlib.h>

int main() {
  const double d = strtod("1E+1000000", NULL);
  return d == HUGE_VAL;
}
```
What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?
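As an aside (my sketch, not from the thread): overflow can be detected without comparing against HUGE_VAL at all, using the errno protocol the C standard specifies for strtod. The helper name below is mine.

```cpp
#include <cerrno>
#include <cstdlib>

// Returns true if the string parsed without overflow. On overflow the C
// standard says strtod returns +/-HUGE_VAL and sets errno to ERANGE, so
// this check relies on integer state rather than a floating comparison
// that -ffinite-math-only might interfere with.
bool parseFiniteDouble(const char *s, double &out) {
    errno = 0;
    char *end = nullptr;
    out = std::strtod(s, &end);
    return end != s && errno != ERANGE;
}
```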
This goes back to what these options actually imply. The interpretation that I favor is “this code will never see a NaN”, or “the program can assume that no floating point expression will evaluate to a NaN”. The benefit of that is that it’s intuitively clear. In that case “isnan(x)” is false, because x cannot be a NaN. There is no distinction between “isnan(x+x)” and “isnan(x)”. If the user wants to preserve “isnan(x)”, they can apply some pragma (which clang may actually have already).
To be honest, I’m not sure that I understand your argument. Are you saying that under your interpretation we could optimize “isnan(x+x) -> false”, but not “isnan(x) -> false”?
If the issue is that users want their asserts to fire, then they should be encouraged to only enable fast math in release builds.
It is apparent simplicity only. As the discussion on the GCC mailing list demonstrated (https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544641.html), this is actually an unpromising way. From a practical viewpoint it is also a bad solution, as users cannot even check their assertions.
The intent here is that users can preserve the NaN behavior by annotating the code with either attributes or pragmas. I don’t think that the linked discussion actually shows that the “no NaNs ever” interpretation is any worse than “arithmetic operations do not produce NaNs”. A large part of it was about what happens to `__builtin_nan`, but if your code explicitly produces NaNs and you compile it finite-math, you shouldn’t expect anything meaningful. IMO it’s much better to have a flag with clarity about what it does, even if it leads to potentially unexpected results, than to have an option whose description is open to interpretation. At least the users will know what caused the issue, rather than wonder whether they have found a compiler bug.
I agree that there may be issues with multiple definitions of functions compiled with different settings, although that is not strictly limited to FP flags. There should be some unified approach to that, and I don’t know what the right thing to do is off the top of my head.
The argument of `isnan(x+x)` is the result of an arithmetic operation. According to the meaning of -ffinite-math-only, it cannot produce NaN, so this call can be optimized out. In the general case, the value of `isnan(x)` may be, say, loaded from memory. A load is not an arithmetic operation, so nothing prevents it from loading a NaN. Optimizing the call out is dangerous in this case.
`x` is not a load, it’s an expression. Also, even in the presence of NaNs, x+0 preserves the value type (i.e. normal/subnormal/infinity/NaN), except signaling NaNs perhaps. I’m not sure whether we even consider signaling NaNs, so let’s forget them for a moment. If x+0 is a NaN iff x is a NaN, then the compiler should be able to rewrite x -> x+0 regardless of any flags. But then, given that x+0 is now “arithmetic”, isnan(x+0) could become `false`. This is fundamentally counterintuitive.
Furthermore, if we had `a = isnan(x)`, we couldn’t fold it to `false`, but if we had `a = isnan(x); b = isnan(x+x)`, then we could fold both to `false`. This is, again, unintuitive.
In this case, I think it’s perfectly reasonable to reinterpret_cast the floats to uint32_t, and then inspect the bit pattern. Since NaN is being used as a sentinel value, I assume it’s a known bit pattern, and not just any old NaN.
I think it’s fine that fast-math renders isnan useless. As far as I know, the C++ standard wasn’t written to account for compilers providing fast-math flags. fast-math is itself a workaround for “IEEE floats do not behave like actual real numbers”, so working around a workaround seems reasonable to me.
Let me describe a real life example. There is a realtime program that processes float values from a huge array. Calculations do not produce NaNs and do not expect them. Using -ffinite-math-only substantially speeds up the program, so it is highly desirable to use it. The problem is that the array contains NaNs; they mark elements that should not be processed. An obvious solution is to check an element for NaN, and if it is not, process it. Now there is no clean way to do so, only workarounds, like using integer arithmetic. The function 'isnan' became useless. And there are many cases where users complain about this optimization.
On Thu, Sep 9, 2021 at 10:34 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote the real-life example above. I personally would separate the "pre-processing" of the input into a compilation unit that isn't compiled with -ffinite-math-only, and isolate the perf-critical routines to be compiled with this flag if needed. (I'd also like a sanitizer to have a build mode that validates that no NaNs are ever seen in these routines.)
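The split described there could look like this at build time (a sketch with hypothetical file names; the flag placement is the point):

```shell
# scan.cpp: the NaN pre-filter, compiled with full IEEE semantics
clang++ -O2 -c scan.cpp -o scan.o
# kernel.cpp: the hot numeric routines, compiled with the no-NaN assumption
clang++ -O2 -ffinite-math-only -c kernel.cpp -o kernel.o
clang++ scan.o kernel.o -o app
```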
In general, Krzysztof's reasoning in this thread makes sense to me, in particular in terms of being consistent with how we treat isnan(x) vs isnan(x+0) for example.
(Speaking only for myself here, and mostly as someone who doesn’t typically write floating-point-heavy code).
The root issue we have here is that, as with many compiler extensions, fast-math flags end up creating a vaguely-defined variant of the C specification governed by the “obvious” semantics, and as is the case with “obvious” semantics, there are several different “obvious” results.
Given the standard C taste for undefined behavior, it would seem to me that the most natural definition of -ffinite-math-only would be to say that any operation that produces NaN or infinity results is undefined behavior, or produces a poison value using LLVM’s somewhat tighter definition here [1]. This notably doesn’t give a clear answer on what to do with floating-point operations that don’t produce floating-point results (e.g., casts, comparison operators), and the volume of discussions on this point is I think indicative that there are multiple reasonable options here. Personally, I find the extension of the UB to cases that consume but do not produce floating-point values to be the most natural option.
It’s also the case that many users don’t like undefined behavior as a concept, in large part because it can be very difficult to work around in a few cases where it is desired to explicitly override the undefined behavior. For some of the more basic integer UB, clang already provides builtin overflow checking macros to handle the I-want-to-check-if-it-overflowed-without-UB case, for example. And if fast math flags are to create UB, then similar functionality to override the floating-point UB ought to be provided. Already, C provides a mechanism to twiddle floating-point behavior on a per-scope basis (e.g., #pragma STDC FENV_ACCESS, CX_LIMITED_RANGE, FP_CONTRACT). LLVM already supports these flags on a per-instruction basis, so it really shouldn’t be very difficult to have Clang support pragmas to twiddle fast-math flags like the existing C pragmas. And in this model, the -ffast-math and related flags are doing nothing more than setting the default values of these pragmas.
In that vein, I can imagine a user writing a program that would look something like this:
```
int some_hard_math_kernel(float *inputs, float *outputs, int N) {
  {
    #pragma clang fast_math off
    for (int i = 0; i < N; i++) {
      if (isinf(inputs[i]) || isnan(inputs[i]))
        return ILLEGAL_ARGUMENT;
    }
  }
  #pragma clang fast_math on
  // Do fancy math here…
  // and if we see isnan(x) here, even if it’s in a library routine [compiled with -ffast-math],
  // or maybe implied by some operation the compiler understands [say, complex multiplication]
  // it is optimized to false.
  return SUCCESS;
}
```
I can clearly see use cases where the programmer might wish to have the optimizer eliminate any isnan calls that are generated when -ffast-math is used, but like other UB, I think it is extremely beneficial to provide some way to explicitly opt-out of UB on a case-by-case basis.
I would even go so far as to suggest that maybe the C standards committee should discuss how to handle at least the nsz/nnan/ninf parts of fast-math flags, given that very similar concepts seem to exist in all of the major C/C++ compilers.
[1] I fully expect any user who is knowledgeable about poison in LLVM—which admittedly is a fairly expert user—would expect poison to kick in most of the time C or C++ provides for undefined behavior, and potentially to rely on that expectation.
The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension. If you enable it, you should expect the compiler to diverge from the language standard. I’m sure there’s precedent for this. If I write #pragma once at the top of my header, and include it twice back to back, the preprocessor won’t paste my header twice. Should #pragma once be removed because it breaks #include?
Now, you have a real-world example that uses NaN as a sentinel value. In your case, it would be nice if the compiler worked as you suggest. Now, suppose I have a “safe matrix multiply”:
```
std::optional<MyMatrixT> safeMul(const MyMatrixT & lhs, const MyMatrixT & rhs) {
for (int i = 0; i < lhs.rows; ++i) {
for (int j = 0; j < lhs.cols; ++j) {
if (isnan(lhs[i][j])) {
return {};
}
}
}
for (int i = 0; i < rhs.rows; ++i) {
for (int j = 0; j < rhs.cols; ++j) {
if (isnan(rhs[i][j])) {
return {};
}
}
}
// do the multiply
}
```
In this case, if isnan(x) can be constant folded to false with fast-math enabled, then these two loops can be completely eliminated since they are empty and do nothing. If MyMatrixT is a 100 x 100 matrix, and/or safeMul is called in a hot loop, this could be huge. What should I do instead here?
Really, it would be much more consistent if we apply the clang documentation for fast-math “Operands to floating-point operations are not equal to NaN and Inf” literally, and not actually implement “Operands to floating-point operations are not equal to NaN and Inf, except in the case of isnan(), but only if the argument to isnan() is a value stored in a variable and not an expression”. As far as using isnan from the standard library compiled without fast-math vs a compiler builtin, I don’t think this is an issue. Really, enabling fast-math is basically telling the compiler “My code has no NaNs. I won’t try to do anything with them, and you should optimize assuming they aren’t there”. If a developer does their part, why should it matter to them that isnan() might work?
Thanks,
Chris Tetreault
Not sure which way to go, but I agree that we need to improve the docs/user experience either way. Let's try to iron this out with an example (this is based on https://llvm.org/PR51775 ):
```
#include <math.h>
#include <stdlib.h>

int main() {
  const double d = strtod("1E+1000000", NULL);
  return d == HUGE_VAL;
}
```
What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?
The proposed documentation text isn't clear to me. Should clang apply "nnan ninf" to the IR call for "strtod"?
"strtod" is not in the enumerated list of functions where we would block fast-math-flags, but it is a standard lib call, so "nnan ninf" would seem to apply...but we also don't want "-ffinite-math-only" to alter the ability to return an INF from a "general function call"?
The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension.
Would it be reasonable to treat operations on Inf and NaN values as UB in this mode only if the same operation on a signaling NaN might signal? (Approximately, that'd mean we imagine these non-finite value encodings all encode sNaNs that are UB if they would signal.) That means the operations that ISO 60559 defines as non-computational or quiet-computational would be permitted to receive NaN and Inf as input and produce them as output, but that other computational operations would not.
Per ISO 60559, the quiet-computational operations that I think are relevant to us are: copy, negate, abs, copySign, and conversions between encodings (e.g., bitcast). The non-computational operations that I think are relevant to us are the classification functions (including isNaN).
I’m not super knowledgeable on the actual implementation of floating point math in clang, but on the surface this seems fine. My position is that we should provide no guarantees as to the behavior of code with NaN or infinity if fast-math is enabled. We can go with this behavior, but we shouldn’t tell users that they can rely on this behavior. Clang should have maximal freedom to optimize floating point math with fast-math, and any constraint we place potentially results in missed opportunities. Similarly we should feel free to change this implementation in the future, the goal not being stability for users who chose to rely on our implementation details. If users value reproducibility, they should not be using fast math.
The only thing I think we should guarantee is that casts work. I should be able to load some bytes from disk, cast the char array to a float array, and any NaNs that I loaded from disk should not be clobbered. After that, I should be able to cast an element of my float array back to another type and inspect the bit pattern (assuming I did not transform that element in any other way after casting it from char) to support use cases like Serge’s. Any other operation should be fair game.
Thanks,
Chris Tetreault
I would argue that #undef’ing a macro provided by the compiler is a much worse kludge than static-casting your float to an unsigned int. Additionally, you have to redefine isnan to whatever it was after your function (letting it pollute unrelated code that possibly isn’t even being compiled with fast math), which can’t be done portably as far as I know. Additionally, this requires you to be the author of safeMul. What if it’s in a dependency for which you don’t have the source? At that point, your only recourse is to open an issue with libProprietaryMatrixMath and hope your org is paying them enough to fast-track a fix.
Without trying to be too harsh, this is the bad justification GCC has
used for years for exploiting all kinds of UB and implementation-defined
behavior in the name of performance. As has been shown over and over
again, the breakage is rarely matched by equivalent performance gains.
So once more, do we even have proof that significant code exists where
isnan and friends are used in a performance critical code path? I would
find that quite surprising and more an argument for throwing a compile
error...
Joerg
The problem is that math code is often templated, so `template <typename T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a header.
Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”. Just like sometimes you can dereference a pointer after it is free’d, but you should not count on this working. If the compiler I’m using emits a call to a library function instead of providing a macro, and this results in isnan actually computing if x is NaN, then so be it. But if the compiler provides a macro that evaluates to false under fast-math, then the two loops in safeMul can be optimized. Either way, as a developer, I know that I turned on fast-math, and I write code accordingly.
I think working around these sorts of issues is something that C and C++ developers are used to. These sorts of “inconsistent” between compilers behaviors is something we accept because we know it comes with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting. Especially when the fix is also just one line:
```
#define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
```
I would probably call the macro something else like `shouldProcessElement`.
Thanks,
Chris Tetreault
From: Serge Pavlov <sepa...@gmail.com>
Sent: Friday, September 10, 2021 11:26 AM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Richard Smith <ric...@metafoo.co.uk>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
It should not be done in headers, of course. Redefinition of this macro in the source file which is compiled with -ffinite-math-only is free from the described drawbacks. Besides, the macro `isnan` is defined by libc, not the compiler, and IIRC it is defined as a macro to allow such manipulations.
The problem is that math code is often templated, so `template <typename T> MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a header.
Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”.
I think working around these sorts of issues is something that C and C++ developers are used to. These sorts of “inconsistent” between compilers behaviors is something we accept because we know it comes with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting. Especially when the fix is also just one line:
```
#define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
```
The working construct is `reinterpret_cast<uint32_t&>(x)`. It however has the same drawback: it requires `x` to be in memory.
If the compiler provides “isnan”, the user can’t redefine it. Redefining/undefining any function or a macro provided by a compiler is UB.
The “old” behavior can be tuned with #pragmas to restore the functionality of NaNs where needed.
The “old” behavior doesn’t have a problem with “has_nan”---it returns “true”. What other issues are there?
From: cfe-dev <cfe-dev...@lists.llvm.org> On Behalf Of
Serge Pavlov via cfe-dev
Sent: Monday, September 13, 2021 8:50 AM
To: James Y Knight <jykn...@google.com>
Cc: llvm-dev <llvm...@lists.llvm.org>; Clang Dev <cfe...@lists.llvm.org>
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
Let's weigh the alternatives.
Thanks,
--Serge
From: Serge Pavlov <sepa...@gmail.com>
Sent: Monday, September 13, 2021 9:31 AM
To: Krzysztof Parzyszek <kpar...@quicinc.com>
From: Serge Pavlov <sepa...@gmail.com>
`isnan` does not begin with an underscore, so it is not a reserved identifier. Why is its redefinition UB?
The standard says so, but I can’t find the corresponding passage in the draft...
Honestly, we can do this until the end of time. I think we both agree, that for either scheme, there exists workarounds. The question is which workarounds are more palatable, which is a matter of opinion. I think we’ve come to an impasse, so let me just state that my opinion on the question “Should isnan be optimized out in fast-math mode?” is “Yes”, which is what you asked to get in your original message. I think that the implementation of fast-math will be cleaner if we don’t special case a bunch of random constructs in order to do what the user meant instead of what they said. I think fast-math is a notorious footgun, and any attempts to mitigate this will only reduce the effectiveness of the tool, while not really improving the user experience.
As a user, if I read that:
```
if (isnan(x)) {
```
… is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
```
if (isnan(x + 0)) {
```
… does not also work. I’m going to open a bug and complain, and the slide down the slippery slope will continue. You and I understand the difference, and the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but Joe Coder is just trying to figure out why he’s got NaNs in his matrices despite his careful NaN handling code. Joe is not a compiler expert, and on the face of it, it seems like a silly limitation. This will never end until fast-math is gutted.
Thanks,
Chris Tetreault
From: Serge Pavlov <sepa...@gmail.com>
Sent: Friday, September 10, 2021 9:21 PM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Richard Smith <ric...@metafoo.co.uk>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctet...@quicinc.com> wrote:
my opinion on the question “Should isnan be optimized out in fast-math mode?” is “Yes” […]
+1
From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of
Chris Tetreault via llvm-dev
Sent: Monday, September 13, 2021 11:46 AM
To: Serge Pavlov <sepa...@gmail.com>
If it would be helpful, I am happy to put people in touch with the
WG14 C Floating Point study group. Then these questions can be asked
of a wider audience of compiler vendors to see if there's some common
thoughts on the subject, even if WG14 won't have an "official" stance
because the behavior is wholly a matter of QoI. I'd request that we
nominate one person from the LLVM community to hold that discussion
and report back so we don't throw the study group into the deep end of
our pool (and given my lack of domain knowledge, I'd appreciate it if
that someone was not me).
~Aaron
On Mon, Sep 13, 2021 at 12:46 PM Chris Tetreault via cfe-dev
<cfe...@lists.llvm.org> wrote:
>
> … is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
If the compiler provides “isnan”, the user can’t redefine it. Redefining/undefining any function or a macro provided by a compiler is UB.
The “old” behavior can be tuned with #pragmas to restore the functionality of NaNs where needed.
The “old” behavior doesn’t have a problem with “has_nan”---it returns “true”. What other issues are there?
If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".
If your program has “x = *p”, it means that at this point p is never a null pointer. Does this imply that the type of p can no longer represent a null pointer?
Good example! If you use integer division `r = a / b`, you promise that `b` is not zero. It does not, however, mean that a preceding check `b == 0` may be optimized to `false`.
The statement "there are no NaNs" means that properties of type `float` are modified so that NaN is no longer an allowed value of it. In this case it is allowed to optimize out `isnan`. If the guarantee is only that NaN cannot be an argument of an arithmetic operation, NaN is still a valid value of `float` and `isnan` cannot be replaced with `false`.
Granted, the statement “there are no NaNs” is somewhat ambiguous, but taken to mean “NaNs will not happen at runtime” it would allow you to remove the NaN equivalent of “b == 0” without changing the meaning of “float”. This is the interpretation I’m arguing for.
From: Serge Pavlov <sepa...@gmail.com>
Sent: Tuesday, September 14, 2021 9:22 AM
To: Krzysztof Parzyszek <kpar...@quicinc.com>
On Tue, Sep 14, 2021 at 8:21 PM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:
> > If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".
> If your program has “x = *p”, it means that at this point p is never a null pointer. Does this imply that the type of p can no longer represent a null pointer?

Good example! If you use integer division `r = a / b`, you promise that `b` is not zero. It does not, however, mean that a preceding check `b == 0` may be optimized to `false`.
On Mon, Sep 13, 2021 at 10:28 PM Arthur O'Dwyer <arthur....@gmail.com> wrote:
> Btw, I don't think this thread has paid enough attention to Richard Smith's suggestion:

I can only subscribe to James Y Knight's opinion. Indeed, it can be a good criterion of which operations should work in finite-math-only mode and which cannot. The only thing I worry about is the possibility of checking the operation result for infinity (and NaN, for symmetry). But the suggested criterion is formulated in terms of arguments, not results, so it must allow such checks.
On Tue, Sep 14, 2021 at 9:22 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
> On Tue, Sep 14, 2021 at 8:21 PM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:
> > > If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".
> > If your program has “x = *p”, it means that at this point p is never a null pointer. Does this imply that the type of p can no longer represent a null pointer?
> Good example! If you use integer division `r = a / b`, you promise that `b` is not zero. It does not, however, mean that a preceding check `b == 0` may be optimized to `false`.

In C and C++, it actually does mean that, although of the compilers I just tested on Godbolt, only MSVC seems to take advantage of that permission.

The question of whether it is acceptable to treat as equivalent the statements “p is known to be dereferenced in all successors of B” and “p is known to be non-null in B” was discussed extensively about 20 years ago, and then again 12 years ago when it bit someone in the Linux kernel:

On Mon, Sep 13, 2021 at 10:28 PM Arthur O'Dwyer <arthur....@gmail.com> wrote:
> Btw, I don't think this thread has paid enough attention to Richard Smith's suggestion:

> I can only subscribe to James Y Knight's opinion. Indeed, it can be a good criterion of which operations should work in finite-math-only mode and which cannot. The only thing I worry about is the possibility of checking the operation result for infinity (and NaN, for symmetry). But the suggested criterion is formulated in terms of arguments, not results, so it must allow such checks.

What is the opinion to which you subscribe?

On Thu, Sep 9, 2021, 8:59 PM Richard Smith via llvm-dev <llvm...@lists.llvm.org> wrote:
> Would it be reasonable to treat operations on Inf and NaN values as UB in this mode only if the same operation on a signaling NaN might signal? (Approximately, that'd mean we imagine these non-finite value encodings all encode sNaNs that are UB if they would signal.) That means the operations that ISO 60559 defines as non-computational or quiet-computational would be permitted to receive NaN and Inf as input and produce them as output, but that other computational operations would not.
> Per ISO 60559, the quiet-computational operations that I think are relevant to us are: copy, negate, abs, copySign, and conversions between encodings (e.g., bitcast). The non-computational operations that I think are relevant to us are the classification functions (including isNaN).

I'm in favor. (Perhaps unsurprisingly, as this is precisely the proposal I made earlier, worded slightly differently. :)

Anyway, Richard's "quiet is signaling and signals are unspecified values" is really the only way out of the difficulty, as far as compiler people are concerned. You two (Serge and Krzysztof) can keep talking past each other at the application level, but the compiler people are going to have to do something in the code eventually, and that something is going to have to be expressed in terms similar to what Richard and I have been saying, because these are the terms that the compiler understands.

Thanks,
Arthur
Hopefully the clarified semantics can be coordinated between and
implemented in a consistent manner in both LLVM and GCC.
Thanks, David
Anyway, Richard's "quiet is signaling and signals are unspecified values" is really the only way out of the difficulty, as far as compiler people are concerned. You two (Serge and Krzysztof) can keep talking past each other at the application level, but the compiler people are going to have to do something in the code eventually, and that something is going to have to be expressed in terms similar to what Richard and I have been saying, because these are the terms that the compiler understands.
I don’t know why you’re saying “at the application level”. My concerns are motivated by what the compiler is supposed to do. I don’t think that the consequences of “arithmetic operations don’t produce NaNs” are fully understood, and are likely not completely intuitive either. We may end up having discussions as to whether we should optimize x+0 to x or not, because “x+0” carries the information that it won’t result in a NaN, while “x” alone doesn’t. This is one case that comes to mind and I’m concerned that there are many others that we aren’t aware of yet.
From: Arthur O'Dwyer <arthur....@gmail.com>
Sent: Tuesday, September 14, 2021 10:15 AM
To: Serge Pavlov <sepa...@gmail.com>
Cc: Krzysztof Parzyszek <kpar...@quicinc.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
On Tue, Sep 14, 2021 at 9:22 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
Fundamentally, the question Serge asked has nothing to do with the concerns of “compiler people”, and everything to do with the user facing behavior of the compiler. Any talk of how the behavior should be implemented is (in my opinion) off topic until we settle the question of “should the compiler guarantee, as a special case, that isnan(x) will not be optimized out”. This is a Yes-or-No question, and the explanation for the answer needs to be able to be concisely described in the docs without the use of compiler jargon that Joe GameDev and Tom MLScientist probably don’t understand. I have stated my opinion, and am reluctant to wade into this argument again, but I think it’s important that we understand the issue at hand.
There are two productive outcomes of this question that I can see:
Let’s not put the cart before the horse. The “compiler people” don’t necessarily have to do anything.
Thanks,
Chris Tetreault
Fundamentally, the question Serge asked has nothing to do with the concerns of “compiler people”, and everything to do with the user facing behavior of the compiler. Any talk of how the behavior should be implemented is (in my opinion) off topic until we settle the question of “should the compiler guarantee, as a special case, that isnan(x) will not be optimized out”. This is a Yes-or-No question [...]
I agree. However, we *can* answer the question “is my call to isnan(x) guaranteed not to be optimized out?” Currently, the answer is “No”. Serge is asking if we can change the answer to be “Yes” which is the application level matter that we’re discussing. It seems that Serge has done the legwork to find out if their isnan call is being optimized out, and for them it is. At this time, it’s not really important what we would do to implement the Yes behavior, other than to establish the difficulty of implementation. We want to know if we should, not if we could.
As maintainers of the compiler, it should be our goal to provide a good user experience. As it stands, it sounds like fast-math is poorly understood by users. It is within our purview to improve this UX issue by clarifying the behavior or making the behavior match user expectations.
Thanks,
Chris Tetreault
From: Arthur O'Dwyer <arthur....@gmail.com>
Sent: Wednesday, September 15, 2021 10:58 AM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Krzysztof Parzyszek <kpar...@quicinc.com>; Serge Pavlov <sepa...@gmail.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
On Wed, Sep 15, 2021 at 12:42 PM Chris Tetreault <ctet...@quicinc.com> wrote:
FWIW, I personally come down on the side of *not* removing the call to
isnan() that the user explicitly wrote. It's not beyond belief that a
C API is called from another language that can generate a NaN, and a
user who has enabled finite math only may still wish to guard against
those cross-module cases passing in a NaN that they know their TU
can't properly handle.
~Aaron
>
> Thanks,
> --Serge
> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
I think you are not adequately summing up the risks of your approach; there are three other issues I see.
First, redefining `isnan` as a macro is expressly undefined behavior in C (see section 7.1.3, clauses 2 and 3—it’s undefined behavior to define a macro as a same name as reserved identifier in a standard library header). Conditionally redefining an `isnan` macro is therefore not a permissible solution.
The second issue, which has been brought up repeatedly but is missing from your summary, is that `isnan` may still be inconsistently optimized out. `isnan(x)` would only be retained in the program if the compiler cannot deduce that `x` is the result of a nnan arithmetic operation. If it can deduce that—the simplest case being the somewhat questionable `isnan(x + 0)` example, but it’s also possible that, e.g., you’re calling `isnan(sum)` on the result of a summation, which would be the result of an arithmetic expression post-mem2reg/SROA—then the compiler would still elide it. It could be that this is less surprising to users than unconditionally optimizing away `isnan(x)`, but it should still be admitted that there is a potential for surprise here.
A final point is that the potential optimization benefits of eliding `isnan` are not limited to the cost of running the function itself (which are likely to be negligible), but also include the benefits of deleting any subsequent code that is attempting to handle NaN values, which may be fairly large blocks. A salient example is complex multiplication and division, where the actual expansion of the multiplication and division code itself is dwarfed by the recalculation code if the result turns out to be a NaN.
From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of
Serge Pavlov via llvm-dev
Sent: Thursday, September 16, 2021 1:37
To: Chris Tetreault <ctet...@quicinc.com>
On Mon, Sep 13, 2021 at 11:46 PM Chris Tetreault <ctet...@quicinc.com> wrote:
> … is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:

You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not due to the user's promise. It is especially important for `isinf`. Addition of two finite values may produce infinity, and there is no universal way to predict it. It is probably not an issue for types like float or double, but ML cores use halfs or even minifloats, where overflow is much more probable. If, in the code:

```
float r = a + b;
if (isinf(r)) { ... }
```

`isinf` were optimized out just because -ffinite-math-only is in effect, the user could not check whether overflow occurred.
The standard makes no real distinction between what's the job of the
compiler and what's the job of the library; it's all "the
implementation" as far as the standard is concerned. FWIW, there are
plenty of libc things which are produced by the compiler (see
https://github.com/llvm/llvm-project/tree/main/clang/lib/Headers for
all the standard library interfaces provided by Clang).
> It does not cause undefined behavior by itself.
A user-defined macro named `isnan` is UB per 7.1.3p1 if the user
includes <math.h> in the TU.
> Redefining macro defined in system headers may be harmful if the new macro is inconsistent with other libc implementation (`errno` comes to mind). So this looks like a kind of legal disclaimer.
I would hope so; the standard says explicitly that redefining that
macro is UB. :-)
~Aaron
On Tue, Sep 14, 2021 at 12:50 AM Serge Pavlov <sepa...@gmail.com> wrote:
> You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not due to the user's promise. It is especially important for `isinf`. Addition of two finite values may produce infinity, and there is no universal way to predict it. […]

Rules proposed by Richard are also formulated using arguments, not results. Now there is no intention to optimize such a case.
I'm really hoping that comparison is a signaling operation.
On 16 Sep 2021, at 20:18, Arthur O'Dwyer via cfe-dev <cfe...@lists.llvm.org> wrote:
> On Thu, Sep 16, 2021 at 1:31 PM Serge Pavlov <sepa...@gmail.com> wrote:
> > You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not due to the user's promise. It is especially important for `isinf`. Addition of two finite values may produce infinity, and there is no universal way to predict it. […]
>
> Infinity (HUGE_VAL) is already not NaN, so this example doesn't have anything to do with the NaN cases being discussed.
C/C++ already has cases like this. Pointer arithmetic on null pointers
is undefined behaviour, even if adding[1,2]/subtracting[3] zero. I
don't think it is too far fetched to expect from users to know that an
operation is undefined behaviour even if one of the operands is zero.
Michael
[1] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
[2] https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
[3] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c
The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.
It seems to me that most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.
To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math should be separated into its own source file. This source file, and only this source file should be compiled with fast-math. The outer driver loop should not be compiled with fast math. This solution is clean, (probably) easy, and doesn't require a change in the compiler.
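As I understand Mehdi's suggestion, the build would look something like this (file and target names are hypothetical; the same flags work with gcc):

```
# kernel.c holds the hot numeric loop; driver.c holds the NaN-aware bookkeeping.
clang -O2 -ffast-math -c kernel.c -o kernel.o   # only this TU gets fast-math
clang -O2             -c driver.c -o driver.o   # default FP semantics here
clang kernel.o driver.o -o app
```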
Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and msvc with /fp:fast), and clang and old versions of clang.
The behavior of fast-math with respect to NaN is consistent across the mainstream c/c++ compilers: no promises are made, users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason to not make this change.
If the behavior is confusing to users, that's because it's poorly explained.
Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer so it could certainly be improved. This is the only thing that needs to be fixed in my opinion.
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctet...@quicinc.com> wrote:
> The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.
> It seems to me that most confusion related to -ffast-math is likely caused by people who are transitioning to using it. […] Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.

It is a common way to explain problems with -ffinite-math-only by user ignorance. However, user misunderstandings and complaints may indicate a flaw in the compiler implementation, which I believe we have in this case. Using NaNs as sentinels is natural when you cannot spend extra memory to keep a flag for each item, cannot spend extra cycles to read that flag, and do not want to pollute the cache. It does not depend on reading documentation or writing the code from scratch. It is simply the best solution for storing the data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case. The fact that the compiler does not allow it is a compiler drawback.

> To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math, should be separated into its own source file. […]

It is a workaround; it works in some cases but not in others. An ML kernel is often a single translation unit, and there may be no such thing as a linker for that processor. At the same time it is computation intensive, and using fast-math in it may be very profitable.

> Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and msvc with /fp:fast), and clang and old versions of clang.

ICC and MSVC do not remove `isnan` in fast-math mode. If `isnan` is implemented in libc, it is also a real check. Actually, removing `isnan` creates inconsistency.

> The behavior of fast-math with respect to NaN is consistent across the mainstream C/C++ compilers: no promises are made, users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason to not make this change.

Removing `isnan` is only an optimization; it does not intend to change semantics, so it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent between compilers and libc implementations. The only possible issue is performance loss; this is discussed above, and it is an unlikely case. Anyway, if such a loss exists and is absolutely intolerable for a user, the hack of redefining `isnan` restores the previous code generation.

> If the behavior is confusing to users, that's because it's poorly explained.

What is confusing? That an explicitly written call to a function is not removed? According to user feedback, it is the silent removal of `isnan` that confuses users.

> Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer, so it could certainly be improved. This is the only thing that needs to be fixed in my opinion.

The documentation says about -ffinite-math-only: "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." Is it clear whether `isnan` is arithmetic or not?
Thanks,
Chris Tetreault
-----Original Message-----
From: Michael Kruse <cfe...@meinersbur.de>
Sent: Thursday, September 16, 2021 12:29 PM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Serge Pavlov <sepa...@gmail.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
On Mon, Sep 13, 2021 at 11:46 AM, Chris Tetreault via cfe-dev <cfe...@lists.llvm.org> wrote:
> As a user, if I read that:
>
> ```
> if (isnan(x)) {
> ```
>
> … is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
>
> ```
> if (isnan(x + 0)) {
> ```
>
> … does not also work. I’m going to open a bug and complain, and the slide down the slippery slope will continue. You and I understand the difference, and the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but Joe Coder is just trying to figure out why he’s got NaNs in his matrices despite his careful NaN handling code. Joe is not a compiler expert, and on the face of it, it seems like a silly limitation. This will never end until fast-math is gutted.
C/C++ already has cases like this. Pointer arithmetic on null pointers is undefined behaviour, even if adding[1,2]/subtracting[3] zero. I don't think it is too far fetched to expect from users to know that an operation is undefined behaviour even if one of the operands is zero.
Michael
[1] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
[2] https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
[3] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c
On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
> On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctet...@quicinc.com> wrote:
>> The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.
>>
>> It seems to me that most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.
>
> It is a common way to explain problems with -ffinite-math-only by user ignorance. However, user misunderstandings and complaints may indicate a flaw in the compiler implementation, which I believe we have in this case. Using NaN as a sentinel is natural when you cannot spend extra memory keeping a flag for each item, cannot spend extra cycles reading that flag, and do not want to pollute the cache. It does not depend on reading documentation or writing the code from scratch. It is simply the best solution for storing the data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case. The fact that the compiler does not allow it is a compiler drawback.
>
>> To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math, should be separated into its own source file. This source file, and only this source file, should be compiled with fast-math. The outer driver loop should not be compiled with fast math. This solution is clean, (probably) easy, and doesn't require a change in the compiler.
>
> It is a workaround; it works in some cases but not in others. An ML kernel is often a single translation unit; there may be no such thing as a linker for that processor. At the same time it is computation intensive, and using fast-math in it may be very profitable.

Switching mode in a single TU seems valuable, but could this be handled with pragmas or function attributes instead?

>> Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and MSVC with /fp:fast), and between clang and old versions of clang.
>
> ICC and MSVC do not remove `isnan` in fast-math mode. If `isnan` is implemented in libc, it is also a real check. Actually removing `isnan` creates inconsistency.
>
>> The behavior of fast-math with respect to NaN is consistent across the mainstream C/C++ compilers: no promises are made, users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason to not make this change.
>
> Removing `isnan` is only an optimization; it does not intend to change semantics, so it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent between compilers and libc implementations. The only possible issue is performance loss; this is discussed above, and it is an unlikely case. Anyway, if such a loss exists and is absolutely intolerable for a user, a hack with redefinition of `isnan` restores the previous code generation.
>
>> If the behavior is confusing to users, that's because it's poorly explained.
>
> What is confusing? That the explicitly written call to a function is not removed? According to user feedback it is the silent removal of `isnan` that confuses users.
>
>> Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer, so it could certainly be improved. This is the only thing that needs to be fixed in my opinion.
>
> The documentation says about -ffinite-math-only: "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs." Is it clear whether `isnan` is arithmetic or not?

If the result of floating-point arithmetic is fed directly to `isnan()`, are we allowed to eliminate the computation and fold the check to none? (It seems so, according to the sentence you're quoting.) Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`?
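[Editor's note] The TU-splitting approach discussed above (fast-math kernel in its own file, NaN screening in a driver compiled without fast-math) can be sketched in a few lines. This is only an illustration: `kernel_sum` and `validate_and_sum` are made-up names, and the two-file split is indicated in comments; in a real build only the kernel file would get -ffast-math.

```cpp
#include <cmath>
#include <vector>

// kernel.cpp in a real build, compiled WITH -ffast-math:
// assumes its inputs have already been screened for NaN.
double kernel_sum(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;  // the hot loop fast-math is meant to speed up
    return s;
}

// driver.cpp, compiled WITHOUT fast-math: here std::isnan keeps its
// IEEE 754 meaning, so invalid inputs are rejected before the kernel runs.
bool validate_and_sum(const std::vector<double>& v, double& out) {
    for (double x : v)
        if (std::isnan(x)) return false;  // reject NaN up front
    out = kernel_sum(v);
    return true;
}
```

The driver does all NaN screening; the kernel never calls `isnan`, so it loses nothing when the flag folds such checks away.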
On Thu, Sep 16, 2021 at 1:31 PM Serge Pavlov <sepa...@gmail.com> wrote:
> On Mon, Sep 13, 2021 at 11:46 PM Chris Tetreault <ctet...@quicinc.com> wrote:
>> … is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
>
> You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not due to the user's promise. It is especially important for `isinf`. Addition of two finite values may produce infinity and there is no universal way to predict it. It is probably not an issue for types like float or double, but ML cores use halfs or even minifloats, where overflow is much more probable. If, in the code
>
>     float r = a + b;
>     if (isinf(r)) { ... }
>
> `isinf` were optimized out just because -ffinite-math-only is in effect, the user could not check whether overflow occurred. Rules proposed by Richard are also formulated using arguments, not results. Now there is no intention to optimize such a case.

Infinity (HUGE_VAL) is already not NaN, so this example doesn't have anything to do with the NaN cases being discussed. However, let's rephrase it as a NaN situation:

```
bool f1(float a, float b) {
  float r = a + b;
  return isnan(r);
}
bool result = f1(-HUGE_VAL, HUGE_VAL); // expect "true"
```

Here, `a + b` can produce a quiet NaN (if `a` is -HUGE_VAL and `b` is +HUGE_VAL). By Richard Smith's -ffast-math proposal as I understand it, this quiet-NaN result would be treated "as if" it were a signaling NaN. Under IEEE 754, no operation ever produces a signaling NaN, so unfortunately IEEE 754 can't guide us here; but intuitively, I think we'd all say that merely producing a signaling NaN would not itself cause a signal. So we store the quiet-NaN result in `r`.

Then we ask whether `isnan(r)`; the quiet-NaN result in `r` is used. By Richard Smith's proposal as I understand it, any operation would produce an unspecified result if it would raise a signal; but in fact `isnan(r)` is a non-signaling operation, so even though we're treating quiet NaN as signaling NaN, `isnan(r)` never raises any signal. So this code has well-defined behavior in -ffast-math mode. (And because the code's behavior is well-defined, `isnan(r)` has its usual meaning. When `r` holds a quiet NaN, as in this case, `isnan(r)` will correctly return `true`.)

I've googled, but failed to discover, whether comparison against a signaling NaN is expected to signal. That is:

```
bool f2(float a, float b) {
  float r = a + b;
  return (r != r);
}
bool result = f2(-HUGE_VAL, HUGE_VAL);
// expect "true" in IEEE 754 mode, but perhaps "false" in -ffast-math mode
```

I'm really hoping that comparison is a signaling operation. If it is, then according to Richard Smith's proposal as I understand it, the compiler would be free to optimize `(r != r)` into `(false)` in -ffast-math mode. (And, as a corollary, the compiler would not generally be free to transform `isnan(r)` into `(r != r)`, because the latter expression has more preconditions than the former.)

–Arthur
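[Editor's note] Under default (IEEE 754) compilation the two formulations above agree; here is a standalone sketch of Arthur's `f1`/`f2` that can be checked directly. Behavior under -ffast-math is, of course, exactly what is in dispute.

```cpp
#include <cmath>

// Library classification of the result of an addition.
bool f1(float a, float b) {
    float r = a + b;
    return std::isnan(r);
}

// Self-comparison: under IEEE 754, x != x is true exactly when x is NaN.
bool f2(float a, float b) {
    float r = a + b;
    return r != r;
}
```

With default flags, `f1(-HUGE_VALF, HUGE_VALF)` and `f2(-HUGE_VALF, HUGE_VALF)` are both true, since inf + (-inf) is a quiet NaN; with -ffast-math either or both may be folded to false.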
_______________________________________________
On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joke...@gmail.com> wrote:
> [earlier quoted exchange trimmed; it repeats the messages above]
> Switching mode in a single TU seems valuable, but could this be handled with pragmas or function attributes instead?

GCC allows it by using `#pragma GCC optimize()`, but clang does not support it. No suitable function attribute exists for that.
> It is a common way to explain problems with -ffinite-math-only by user ignorance. However user misunderstandings and complaints may indicate a flaw in compiler implementation, which I believe we have in this case.
The *intent* of -ffast-math is to do absolutely everything possible to get the fastest floating point math possible. In other words, to pretend that floating point values are actually real numbers. In real math, there is no negative 0, or NaN, and infinity is not something you do algebra with. If you try to divide by 0, you just get points off your midterm; there’s no “behavior” that needs to be defined.
> Using NaN as sentinels is a natural way when you cannot spend extra memory for keeping flags for each item, spend extra cycles to read that flag and do not want to pollute cache. It does not depend on reading documentation or writing the code from scratch. It is simply the best solution for storing data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case. The fact that the compiler does not allow it is a compiler drawback.
The use case is legitimate, I agree, and the compiler does allow it. In fancyAlgo.cpp, have your input validation code that checks for NaN. In fancyAlgoFast.cpp, have your ML kernel, compiled with fast-math. If you really *must* have them all be in the same TU, then compile with -fhonor-nans.
> It is a workaround, it works in some cases but does not in others. ML kernel often is a single translation unit, there may be no such thing as linker for that processor. At the same time it is computation intensive and using fast-math in it may be very profitable.
It's a workaround for well-defined behavior that you opted into. If having multiple TUs is not an option, then you can use some of the other solutions presented, or compile with -fhonor-nans.
> ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is implemented in libc, it is also a real check. Actually removing `isnan` creates inconsistency.
I don’t know about ICC, but this is a quote from the docs for MSVC: “Special values (NaN, +infinity, -infinity, -0.0) may not be propagated or behave strictly according to the IEEE-754 standard”. My interpretation of this is the same as that for clang: no guarantees are provided. Maybe it works today, but that doesn’t mean it will work tomorrow.
> Removing `isnan` is only an optimization, it does not intend to change semantics. So it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent between compilers and libc implementations. The only possible issue is performance loss, this is discussed above, it is an unlikely case. Anyway, if such loss exists and it is absolutely intolerable for a user, a hack with redefinition of `isnan` restores the previous code generation.
It creates a portability issue if we change the compiler to guarantee that isnan(x) works, and then users rely on it working. The resulting code is non portable to other compilers, and to old versions of clang.
> The documentation says about -ffinite-math-only:
> "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."
> Is it clear whether `isnan` is arithmetic or not?
The docs for -ffinite-math-only on https://clang.llvm.org/docs/UsersManual.html actually say “Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf”. Notice that it says nothing about arithmetic. I’m not sure where you got your quote, but it’s not from the docs for the currently available release of upstream clang.
Thanks,
Chris Tetreault
For the record, I think having a pragma that allows you to control fast-math behavior is fine. This sort of thing is far from unprecedented, and is far more palatable to me than just having random special cases. It’s also far more useful as I can do something like:
```cpp
#pragma clang fast-math push
#pragma clang fast-math-on
// bunch of hairy floating point math here
#pragma clang fast-math-pop
```
… and easily isolate fast math to the smallest area it can be. This is morally equivalent to putting the scary math in its own TU, but less annoying.
The pragma should probably have a push/pop, but otherwise I don’t really care what color the bike shed is.
Thanks,
Chris Tetreault
From: Mehdi AMINI <joke...@gmail.com>
Sent: Friday, September 17, 2021 9:17 AM
To: Serge Pavlov <sepa...@gmail.com>
On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepa...@gmail.com> wrote:
> [earlier quoted exchange trimmed; it repeats the messages above]
> GCC allows it by using `#pragma GCC optimize()`, but clang does not support it. No suitable function attribute exists for that.

Right, I know that clang does not support it, but it could :)

So since we're looking at what provides the best user experience: isn't that it? Shouldn't we look into providing this level of granularity? (whether function-level or finer grain)
You’re confusing implementation details (you have a Godbolt link that shows that MSVC just happens to not remove the isnan call) with documented behavior (I provided a link to the MSVC docs that shows that no promises are made with respect to NaN). The fact is that no compiler (Maybe ICC does, I don’t know, I haven’t checked. I bet their docs say something similar to MSVC, clang, and GCC though.) guarantees that isnan(x) will not be optimized out with fast-math enabled. There is no inconsistency: all the compilers document that they are free to optimize as if there were no NaNs, and they then do whatever is best for their implementation. If you think this is inconsistent, then let me tell you about that time I dereferenced a null pointer and it didn’t segfault.
Now, many people have suggested in this thread that a pragma be added. I personally fully support this proposal. I think it’s a very clean solution, and any non-trivial portable codebase probably already has a library of preprocessor macros that abstract this sort of thing. Do you have a concrete reason why a pragma is unsuitable?
From: Serge Pavlov <sepa...@gmail.com>
Sent: Monday, September 20, 2021 1:23 AM
To: Mehdi AMINI <joke...@gmail.com>
Cc: Chris Tetreault <ctet...@quicinc.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?
WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.
On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joke...@gmail.com> wrote:
> [earlier quoted exchange trimmed; it repeats the messages above]
> Right, I know that clang does not support it, but it could :)
> So since we're looking at what provides the best user experience: isn't that it? Shouldn't we look into providing this level of granularity? (whether function-level or finer grain)

It could mitigate the problem if it were implemented. A user who needs to handle NaNs in a -ffinite-math-only compilation and writes the code from scratch could use this facility to get things working. I also think such a pragma, implemented with a sufficient degree of flexibility, could be useful irrespective of this topic.

However, in general it does not solve the problem. The most important issue which remains unaddressed is the inconsistency of the implementation. The handling of `isnan` in -ffinite-math-only mode by clang is not consistent because:
- It differs from what other compilers do. Namely MSVC and Intel compiler do not throw away `isnan` in this mode: https://godbolt.org/z/qTaz47qhP.
- It depends on optimization options. With -O2 the check is removed but with -O0 remains: https://godbolt.org/z/cjYePv7s7. Other options also can affect the behavior, for example with `-ffp-model=strict` the check is generated irrespective of the optimization mode (see the same link).
- It is inconsistent with libc implementations. If `isnan` is provided by libc, it is a real check, but the compiler may drop it.
It would not be an issue if `isnan` removal were just an optimization. It however changes semantics in the presence of NaNs, so such removal can break user code. In the typical use case a user puts a call to `isnan` in place to ensure no operations on NaNs occur. The call can also be present in some header that implements some functionality for the general case. It may work because `isnan` is provided by libc. Later on, when the configuration changes or libc is updated, the code may break, because the implementation of `isnan` changes, as happened after https://reviews.llvm.org/D69806.

If clang kept calls to `isnan`, it would be consistent with ICC and MSVC and with all libc implementations. The behavior would be different from gcc, but clang would be on the winning side, because the number of programs that work with clang would be larger. Also, if we agree that NaNs can appear in code compiled with -ffinite-math-only, there must be a way to check whether a number is a NaN.
Thanks,
--Serge
> all the compilers document that they are free to optimize as if there were no NaNs, and they then do whatever is best for their implementation.

> Do you have a concrete reason why a pragma is unsuitable?
I think part of my concern is with the "result of floating point
arithmetic can't have NaN" statement. I consider this case:
if (isnan(foo / bar)) {}
to be fundamentally different from this case:
if (isnan(foo)) {}
because the first example can never result in the if branch being
taken with -ffinite-math-only while the second example can. Assuming
that all values are the result of floating-point arithmetic is a
faulty assumption.
~Aaron
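[Editor's note] Aaron's distinction can be made concrete with a pair of tiny functions (names are illustrative). Under -ffinite-math-only, folding the first is at least defensible, because its operand is the result of arithmetic the user promised is NaN-free; the second inspects a bare argument, which may be a sentinel read from a file or a value produced in a TU compiled without fast-math.

```cpp
#include <cmath>

// Operand is the *result* of floating-point arithmetic: the user's
// finite-math promise arguably covers it, so folding this to false
// under -ffinite-math-only has some justification.
bool quotient_is_nan(float foo, float bar) {
    return std::isnan(foo / bar);
}

// Operand is a bare *argument*: no arithmetic produced it inside this
// function, so assuming it cannot be NaN is the contested step.
bool value_is_nan(float foo) {
    return std::isnan(foo);
}
```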
On Mon, Sep 20, 2021 at 1:04 PM Mehdi AMINI via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> I'd find it unfortunate though that in a mode which specifies that the result of floating point arithmetic can't have NaN we can't constant fold isnan(x) where x is a potentially complex expression (think that x could be dead-code if not for the isnan).
I think part of my concern is with the "result of floating point
arithmetic can't have NaN" statement. I consider this case:
if (isnan(foo / bar)) {}
to be fundamentally different from this case:
if (isnan(foo)) {}
because the first example can never result in the if branch being
taken with -ffinite-math-only while the second example can.
Thanks! So long as we are conservative with the optimization and only
fold the call to isnan when the operand is proven to be the result of
an arithmetic expression, my concerns are lessened.
~Aaron
>
> HTH,
> –Arthur
Right, but having to manage the fact that `x + 0. -> x` would become a
potential pessimization for the optimizer is quite disturbing to me.
--
Mehdi
Eh, this is in an "ignore the standard" compiler mode, so all of it is
quite disturbing to me. :-D But more seriously, I would rather we err
on the side of caution when deciding which code the user explicitly
wrote that can be removed by the optimizer. When my code goes slow, I
can profile it to see what's causing that and react accordingly if I
care. When the optimizer removes some error handling code I have, I am
100% reliant on my testing infrastructure catching that before my
users do, and testing code is sometimes run in debug mode for a
variety of reasons and not everyone writes fantastic testing code that
covers all of their error cases. To me, being conservative is the less
user-hostile approach in this case.
~Aaron
That does not seem like a terrible idea: we already limit inlining when
function attributes mismatch in this way.
But here you don't even need inlining if the pragma is used only for a
sequence of statements inside a function. Fortunately we handle
fast-math with individual instruction flags, so you could imagine:

```cpp
#pragma GCC optimize("fast-math")
x = a + b;
#pragma GCC optimize("no-fast-math")
if (isnan(x)) {
  ...
}
```

which would tag the fadd with the fast flag but not the isnan.
In practice you'd write it this way though:

```cpp
x = a + b; // default specified on the command line
#pragma GCC push_options
#pragma GCC optimize("no-fast-math")
if (isnan(x)) {
  ...
}
#pragma GCC pop_options
```
>
> And of course some programmer is going to try something dumb like
>
> #pragma GCC optimize("fast-math")
> #define REAL_ISNAN(x) std::isnan(x)
> #pragma GCC optimize("no-fast-math")
>
> which "of course" won't work, but who's going to explain it to them?
>
> Not to mention, if the pragma is active at the top of the TU where some template or implicitly defaulted special member is defined, but then it's not active at the point where the template is instantiated or the special member is implicitly defined... what the heck happens in that case? and who's going to write the StackOverflow answer about it?
>
> Basically, the translation unit is the natural unit of... hmm... translation. There's very little return-on-investment involved in trying to circumvent that.
>
> –Arthur
On Wed, Sep 08, 2021 at 06:04:08PM +0000, Chris Tetreault via llvm-dev wrote:
> As a developer (who always reads the docs and generally makes good life
> choices), if I turn on -ffast-math, I want the compiler to produce the
> fastest possible floating point math code possible, floating point
> semantics be darned. Given this viewpoint, my opinion on this topic is
> that the compiler should do whatever it wants, given the constraints of
> the documented behavior of NaN.
There is a huge difference between optimisations that assume NaN is not
present and breaking checks for them. I'm not convinced at all that
constant-folding isnan to false will actually speed up real world code.
I don’t know if that check against infinity is UB, but the way I read the docs this code is not safe. If we had a pragma to control fast-math, then it would be easy to fix this.
From: Jorg Brown <jorg....@gmail.com>
Sent: Monday, September 20, 2021 11:29 AM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Serge Pavlov <sepa...@gmail.com>; LLVM Developers <llvm...@lists.llvm.org>; cfe...@lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
On Wed, Sep 8, 2021 at 11:04 AM Chris Tetreault via llvm-dev <llvm...@lists.llvm.org> wrote:
> Exactly. Leaving `isnan` in the code makes compiler behavior more consistent and convenient for users. Clang also can go this way.
It is never a good idea to rely on implementation details when the documentation states otherwise. Just because MSVC and ICC *just happen to* work as you’d like clang to *today*, doesn’t mean that they will tomorrow. Their docs state this clearly. Dereferencing a pointer you just freed just happens to work sometimes too, but I can’t think of anybody who would use this as a basis for an argument that use-after-free is a reasonable use case.
> I described the concerns in the reply to Mehdi Amini's message.
I did read your response to Mehdi. You said “It could mitigate the problem if it were implemented”. You then went on to reiterate all your other arguments, completely ignoring a clean solution.
Adding a pragma is what’s most convenient for users. That way they can opt into, or opt out of, having their isnan calls being candidates for optimization. If clang guaranteed that isnan(x) would not be optimized out, then I cannot use a pragma to force it to be an optimization candidate. If clang keeps its current behavior, then a new fast-math pragma can be used to ensure that it is not optimized out. It’s the most flexible solution.
From: Serge Pavlov <sepa...@gmail.com>
Sent: Monday, September 20, 2021 10:10 AM
To: Chris Tetreault <ctet...@quicinc.com>
On 20/09/2021 20:59, Chris Tetreault via llvm-dev wrote:
> > Exactly. Leaving `isnan` in the code makes compiler behavior more
> consistent and convenient for users. Clang also can go this way.
>
> It is never a good idea to rely on implementation details when the
> documentation states otherwise. Just because MSVC and ICC **just happen
> to** work as you’d like clang to **today**, doesn’t mean that they will
> tomorrow. Their docs state this clearly. Dereferencing a pointer you
> just freed just happens to work sometimes too, but I can’t think of
> anybody who would use this as a basis for an argument that
> use-after-free is a reasonable use case.
And the trouble with that is that what YOU want isn't necessarily what
SOMEONE ELSE wants.
>
> > I described the concerns in the reply to Mehdi Amini's message.
>
> I did read your response to Mehdi. You said “It could mitigate the
> problem if it were implemented”. You then went on to reiterate all your
> other arguments, completely ignoring a clean solution.
>
> Adding a pragma is what’s most convenient for users. That way they can
> opt into, or opt out of, having their isnan calls being candidates for
> optimization. If clang guaranteed that isnan(x) would not be optimized
> out, then I cannot use a pragma to force it to be an optimization
> candidate. If clang keeps its current behavior, then a new fast-math
> pragma can be used to ensure that it is not optimized out. It’s the most
> flexible solution.
>
I know I may be coming in at half cock, but what is the CORRECT
MATHEMATICAL behaviour? (Yes, I know there may be multiple "correct"
behaviours.) Users should always be able to select a "correct"
behaviour, and one of them should be the default, not copying some other
implementation just because "they did it thataway". IIRC, Excel used to
accept 29 Feb 1900 as a valid date - that's no reason for all the other
spreadsheets to get it wrong, too ...
(MY "correct" hobby horse - I'd like "divide by zero" to return
"infinity" and "divide by infinity" return "zero", because that way the
origin does not have special status and it makes maths much easier. I've
heard a bunch of reasons why I'm wrong, but should changing the
co-ordinate system really alter the result of a mathematical operation?
Didn't Einstein say everything is relative? There is nothing special
about any observation point?)
Cheers,
Wol
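(For what it's worth, IEEE 754's default semantics already grant part of that wish: dividing a finite nonzero number by zero yields a signed infinity, and dividing by infinity yields zero; only 0/0 and inf/inf produce NaN. A quick sketch, valid when compiled *without* fast-math; the function names are illustrative only.)

```cpp
#include <cmath>

// IEEE 754 default behaviour (no fast-math): x/0 for nonzero finite x is
// a signed infinity, x/inf is zero, and only 0/0 (or inf/inf) yields NaN.
// The zero arrives as a parameter so the compiler cannot fold the
// divisions at compile time.
bool one_over_zero_is_inf(double zero)  { return std::isinf(1.0 / zero); }
bool one_over_inf_is_zero(double zero)  { return 1.0 / (1.0 / zero) == 0.0; }
bool zero_over_zero_is_nan(double zero) { return std::isnan(zero / zero); }
```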
> I know I may be coming in at half cock, but what is the CORRECT
> MATHEMATICAL behaviour?
(tl;dr) I seem to have become embroiled in this argument again, and this is clearly going nowhere. Therefore, I propose:
(my rebuttal to this last message)
Many users of fast-math *can* actually ignore the existence of NaNs. For example, in a real time 3D renderer, the presence of NaN is typically a bug. In the hot render loop, you cannot afford to do any sort of error handling related to NaN, and getting a NaN in one your matrices or vertices will result in black triangles or a black screen. For these users, you must just be disciplined, and avoid creating NaNs.
However, suppose such a user depends on a math library that contains isnan. (The library is probably templated, since graphics programmers overwhelmingly prefer C++, which means they compile it themselves but can’t easily modify it.) It would be a shame if the compiler were forbidden to treat that isnan as an optimization candidate merely because the user called a function that “uses” it for input validation or something. Such code would look something like:
```
template <typename T> T f(T x) {
  // some sort of branch based on isnan(x) here, e.g.:
  if (std::isnan(x))
    return T{};  // reject invalid input
  return x;
}
```
… which meets the criteria you propose for not being eliminated, but is also absolutely undesirable. For users who do care about NaN, a pragma would be an elegant way to ensure that the specific isnan calls they care about are not eliminated. For a graphics developer, an assert on isnan would suddenly become useful if a pragma let them enable it selectively.
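(As a sketch of the discipline angle: GCC and Clang predefine the `__FAST_MATH__` macro under -ffast-math, so a library author can at least make the mismatch visible in source by guarding the check. The function name and fallback value below are illustrative, not from the thread.)

```cpp
#include <cmath>

// Sketch: guard the NaN check on the __FAST_MATH__ macro that GCC and
// Clang predefine under -ffast-math.
template <typename T> T validated(T x) {
#if defined(__FAST_MATH__)
  // Under -ffast-math the isnan call may legally fold to false, so do not
  // pretend the check works; this build promised NaN never occurs.
  return x;
#else
  if (std::isnan(x))
    return T{};  // input-validation path
  return x;
#endif
}
```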
Thanks,
Christopher Tetreault
From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of
Serge Pavlov via llvm-dev
Sent: Tuesday, September 21, 2021 6:05 AM
To: antlists <antl...@youngman.org.uk>
Cc: LLVM Developers <llvm...@lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?
On Tue, Sep 21, 2021 at 3:46 AM antlists via llvm-dev <llvm...@lists.llvm.org> wrote: