[llvm-dev] Should isnan be optimized out in fast-math mode?


Serge Pavlov via llvm-dev

Sep 8, 2021, 1:03:09 PM
to LLVM Developers, Clang Dev
Hi all,

One of the purposes of `llvm.isnan` was to help preserve the check made by `isnan` if fast-math mode is
specified (https://reviews.llvm.org/D104854). I'd like to describe the reason for that and propose adopting the behavior
implemented in that patch.

The option `-ffast-math` is often used when performance is important, as it allows a compiler to generate faster code.
This option itself is a collection of different optimization techniques, each having its own option. For this topic only the
option `-ffinite-math-only` is of interest. With it the compiler treats floating point numbers as mathematical real numbers,
so transformations like `0 * x -> 0` become valid.
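To see why `0 * x -> 0` is only valid under such assumptions, here is a minimal C sketch of the unoptimized IEEE-754 behavior (the helper name `times_zero` is hypothetical). For NaN or infinite `x` the product is NaN, and for negative finite `x` it is `-0.0`; the sign issue is covered by `-fno-signed-zeros`, which `-ffast-math` also enables.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: 0 * x evaluated with ordinary IEEE-754 semantics.
 *   times_zero(3.0)      -> 0.0
 *   times_zero(-3.0)     -> -0.0 (compares equal to 0.0, but signbit is set)
 *   times_zero(INFINITY) -> NaN
 *   times_zero(NAN)      -> NaN
 * Folding this to a constant 0 is only valid once NaN and Inf inputs are
 * assumed away (and signed zeros are ignored). */
static double times_zero(double x) {
    return 0.0 * x;
}
```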

In clang documentation (https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffast-math) this option is described as:

    "Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf."

GCC documentation (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) is a bit more concrete:

    "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."

**What is the issue?**

The C standard defines a macro `isnan`, which can be mapped to an intrinsic function provided by the compiler. For both
clang and gcc it is `__builtin_isnan`. How should this function behave if `-ffinite-math-only` is specified? Should it make a
real check, or can the compiler assume that it always returns false?

GCC optimizes out `isnan`. This follows from the viewpoint expressed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50724#c1:

    "With -ffinite-math-only you are telling that there are no NaNs and thus GCC optimizes isnan (x) to 0."

Such treatment of `-ffinite-math-only` has significant drawbacks. In particular, it makes it impossible to check the validity of
data: a user cannot write

```c
assert(!isnan(x));
```

because the compiler replaces the actual function call with its expected value. There are many complaints in the GCC bug tracker; the proposed
solutions are to use integer operations to make the check, to turn off `-ffinite-math-only` in some parts of the code, or to
ensure that the libc function is called. This clearly demonstrates that `isnan` is useless in this case, yet users need its functionality
and have no proper tool to make the required checks. A similar direction was criticized in LLVM as well (https://reviews.llvm.org/D18513#387418).
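The integer-operation workaround mentioned above typically looks like the following minimal C sketch, which assumes `double` is IEEE-754 binary64; `isnan_bits` and `check_valid` are hypothetical names. Copying the bits with `memcpy` keeps the type punning well-defined.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical workaround: test for NaN on the integer representation,
 * assuming IEEE-754 binary64 layout. */
static bool isnan_bits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    /* NaN means all exponent bits set and a non-zero significand. Clearing
     * the sign bit and comparing against the +Inf pattern tests both. */
    return (bits & UINT64_C(0x7FFFFFFFFFFFFFFF)) > UINT64_C(0x7FF0000000000000);
}

/* Replacement for assert(!isnan(x)) built on the integer comparison. */
static void check_valid(double x) {
    assert(!isnan_bits(x));
    (void)x; /* avoid an unused-parameter warning when NDEBUG is defined */
}
```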

**Why is imposing restrictions on floating-point types bad?**

If `-ffinite-math-only` modifies properties of the `double` type, several issues arise, for instance:
- What should `std::numeric_limits<double>::has_quiet_NaN` be?
- What definition should it have if it is used in a program where some functions are compiled with `fast-math` and some without?
- Should the inliner refuse to inline a function compiled with `fast-math` into a function compiled without it?
- Should `std::isnan(std::numeric_limits<float>::quiet_NaN())` be true?

If the type `double` cannot hold a NaN value, then `double` and `double` under `-ffinite-math-only` are different types
(https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544641.html). Such an approach could solve these problems, but it is so expensive
that it hardly has a chance of being implemented.

**The solution**

Instead of modifying properties of floating-point types, the effect of `-ffinite-math-only` can be expressed as a restriction on
the use of operations. In fact, the clang and gcc documentation already takes this approach, and fast-math flags in LLVM IR are likewise
attributes of instructions. The only question is whether `isnan` and similar functions are floating-point arithmetic.

From a practical viewpoint, treating non-computational functions as arithmetic gives no advantage: if code uses `isnan` extensively
(and so could profit from removing the calls), it is probably not suitable for -ffinite-math-only anyway. This interpretation, however,
creates the problems described above. So it is preferable to consider `isnan` and similar functions as non-arithmetical.

**Why is it safe to leave `isnan`?**

A likely concern about this solution is the deviation from gcc behavior. There are several reasons why this is not an issue.

1. -ffinite-math-only is an optimization option. A correct program should behave identically whether or not it is compiled with
   -ffinite-math-only, provided the conditions for using -ffinite-math-only are fulfilled. So keeping the check cannot break functionality.
2. `isnan` is implemented by libc, which can map it to a compiler builtin or use its own implementation, depending on
   configuration options. An `isnan` implemented in libc always performs the real check.
3. ICC and MSVC preserve `isnan` in fast-math mode.

The proposal is not to treat `isnan` and other such functions as arithmetic operations, and not to optimize them out
just because -ffinite-math-only is specified. Of course, there are cases when `isnan` may still be optimized out: for instance,
`isnan(a + b)` may be folded when -ffinite-math-only is in effect, due to the assumption that the result of an arithmetic operation is not NaN.

What are your opinions?

Thanks,
--Serge

Chris Tetreault via llvm-dev

Sep 8, 2021, 2:04:36 PM
to Serge Pavlov, LLVM Developers, cfe...@lists.llvm.org

As a developer (who always reads the docs and generally makes good life choices), if I turn on -ffast-math, I want the compiler to produce the fastest floating point math code possible, floating point semantics be darned. Given this viewpoint, my opinion on this topic is that the compiler should do whatever it wants, given the constraints of the documented behavior of NaN. I think the clang docs for -ffast-math are pretty clear on this subject:

 

```

Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include:

...

- Operands to floating-point operations are not equal to NaN and Inf ...

```

 

The compiler may assume that operands to floating point operations are not NaN or infinity. So:

 

- What should `std::numeric_limits<double>::has_quiet_NaN` be? : It should be true if it would have been true with fast math disabled. Clang is not required to pretend NaN doesn't exist; it's allowed to pretend arguments cannot be NaN if that is convenient.

- What definition should it have if it is used in a program where some functions are compiled with `fast-math` and some without? : It should be allowed to act as if NaN exists in all cases.

- Should the inliner refuse to inline a function compiled with `fast-math` into a function compiled without it? No. The author of the function that uses fast-math made their choices, and the user of that function should have vetted their dependencies better. In my view, this is no different than if somebody wrote `if (x == y/z) ...`; it's a bug on the user. It's not clang's fault that this code doesn't work as the author wanted.

- Should `std::isnan(std::numeric_limits<float>::quiet_NaN())` be true? : No. quiet_NaN() can return whatever it wants, but the call to std::isnan can be replaced with false since it may assume its argument is not NaN.

 

Of course, this all sounds fine and well, but the reality is that people don't read docs and don't make good life choices. They turn on fast math because they want it to reduce `x * 0` to `0`, and are surprised when their NaN handling code fails. This is unfortunate, but I don't think we should reduce the effectiveness of fast-math because of this human issue. Other flags exist for these users, and when they complain they should be told about them. Really this is an issue of poor developer discipline, and if we really want to solve this, perhaps some sort of "fast math sanitizer" can be created. It can statically analyze code and complain when it sees things like `if (isnan(foo))` not guarded by `__FAST_MATH__` with fast math enabled. Or, maybe the compiler can just issue a warning unconditionally in this case.
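The `__FAST_MATH__` guard mentioned above can already be written at the source level. A minimal sketch, relying on GCC and Clang predefining `__FAST_MATH__` under `-ffast-math` and `__FINITE_MATH_ONLY__` as 1 under `-ffinite-math-only`; the helper name and the warning text are hypothetical, and this is a user-side check, not the compiler diagnostic suggested here.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__FAST_MATH__)
#warning "isnan() checks in this file may be folded to false under -ffast-math"
#endif

/* Hypothetical helper: reports whether this translation unit was compiled
 * in a mode where the compiler may fold isnan() to false. */
static bool nan_checks_may_be_folded(void) {
#if defined(__FAST_MATH__) || (defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__)
    return true;
#else
    return false;
#endif
}
```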

 

Thanks,

   Chris Tetreault

 


Joerg Sonnenberger via llvm-dev

Sep 8, 2021, 4:58:41 PM
to llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Wed, Sep 08, 2021 at 06:04:08PM +0000, Chris Tetreault via llvm-dev wrote:
> As a developer (who always reads the docs and generally makes good life
> choices), if I turn on -ffast-math, I want the compiler to produce the
> fastest possible floating point math code possible, floating point
> semantics be darned. Given this viewpoint, my opinion on this topic is
> that the compiler should do whatever it wants, given the constraints of
> the documented behavior of NaN.

There is a huge difference between optimisations that assume NaN is not
present and breaking checks for them. I'm not convinced at all that
constant-folding isnan to false will actually speed up real world code.

Joerg
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Chris Tetreault via llvm-dev

Sep 8, 2021, 6:16:51 PM
to Joerg Sonnenberger, llvm...@lists.llvm.org, cfe...@lists.llvm.org
Maybe not, but it will simplify the implementation of clang, and eliminating even 1 instruction is technically a speedup. If the check is in an assert, then it would ideally be removed in a release build and not matter anyways. If the check is in a branch, then that's a whole branch that can get eliminated as dead code, which may be huge if it's deep in the hot render loop.

But really, "a check for NaN" is an operation, so by the documented behavior of -ffast-math, it should assume that it does not receive NaN as an argument. Absent a compelling use case, I think consistent behavior is a very valuable thing to have. By turning on fast math, as a developer you are saying "My code doesn't have NaN, so feel free to optimize assuming this". To then go ahead and have code that expects checks for NaN to work is kind of silly. If the user wants this behavior, they should pass -funsafe-math-optimizations (or whatever subset of the fast-math flags they really wanted). After all, what is the point of checking for NaN if "you don't have NaN"?

Really, the problem is that `-ffast-math` is the flag that everybody knows about, so they use it and get upset when it doesn't do what they want. This is a problem of education, not something the compiler should be working around. Now, if we want to issue warnings about misuse of things like isnan or isfinite in the presence of fast math, then that would be great.

Thanks,
Chris Tetreault

_______________________________________________
cfe-dev mailing list
cfe...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Kevin Neal via llvm-dev

Sep 8, 2021, 6:18:54 PM
to llvm...@lists.llvm.org, Clang Dev
Constant folding away isnan() has already been mentioned as something that surprises people when it eliminates useful things like assert(!isnan(x)). This can be worked around by using integer operations, of course. But having isnan() ignore fast math flags will produce instructions that will frequently be faster than the integer operations.

Are fast math flags _required_ to make assumptions? Or simply _allowed_? The difference is key here.


James Y Knight via llvm-dev

Sep 8, 2021, 6:27:54 PM
to Serge Pavlov, LLVM Developers, Clang Dev
I expressed my strong support for this on the previous thread, but I'll just repost the most important piece...

I believe the proposed semantics from the Clang level ought to be:
  The -ffinite-math-only and -fno-signed-zeros options do not impact the ability to accurately load, store, copy, or pass or return such values from general function calls. They also do not impact any of the "non-computational" and "quiet-computational" IEEE-754 operations, which includes classification functions (fpclassify, signbit, isinf/isnan/etc), sign-modification (copysign, fabs, and negation `-(x)`), as well as the totalorder and totalordermag functions. Those correctly handle NaN, Inf, and signed zeros even when the flags are in effect. These flags do affect the behavior of other expressions and math standard-library calls, as well as comparison operations.

I would not expect this to have an actual negative impact on the performance benefit of those flags, since the optimization benefits mainly arise from comparisons and the general computation instructions which are unchanged.

In further support of this position, I note that the previous thread uncovered at least one vendor -- Apple (https://opensource.apple.com/source/Libm/Libm-2026/Source/Intel/math.h.auto.html) -- going out of their way to cause isnan and friends to function properly with -ffast-math enabled.
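The operations in this list can be exercised with a short C sketch. The wrapper names are hypothetical; the values checked are the ordinary IEEE-754 results, which, under the semantics proposed here, would stay the same with `-ffinite-math-only` and `-fno-signed-zeros` in effect.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Thin wrappers over "non-computational" / "quiet-computational"
 * operations: classification, sign inspection, and sign modification. */
static int classify(double x) { return fpclassify(x); }
static bool negative_sign(double x) { return signbit(x) != 0; }
static double magnitude_with_sign(double mag, double sign_src) {
    return copysign(mag, sign_src);
}
static double negate(double x) { return -(x); }
```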


Chris Lattner via llvm-dev

Sep 8, 2021, 6:51:43 PM
to James Y Knight, LLVM Developers, Clang Dev
On Sep 8, 2021, at 3:27 PM, James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:

> I expressed my strong support for this on the previous thread, but I'll just repost the most important piece...
>
> I believe the proposed semantics from the Clang level ought to be:
>   The -ffinite-math-only and -fno-signed-zeros options do not impact the ability to accurately load, store, copy, or pass or return such values from general function calls. They also do not impact any of the "non-computational" and "quiet-computational" IEEE-754 operations, which includes classification functions (fpclassify, signbit, isinf/isnan/etc), sign-modification (copysign, fabs, and negation `-(x)`), as well as the totalorder and totalordermag functions. Those correctly handle NaN, Inf, and signed zeros even when the flags are in effect. These flags do affect the behavior of other expressions and math standard-library calls, as well as comparison operations.

FWIW, I completely agree - these flags are about enabling optimizations that the presence of nans otherwise prohibits.  We shouldn’t take a literal interpretation of an old GCC manual, as that would not be useful.

If we converge on this definition, I think it should be documented.  This is a source of confusion that comes up periodically.

-Chris

Krzysztof Parzyszek via llvm-dev

Sep 9, 2021, 9:30:31 AM
to Chris Lattner, James Y Knight, LLVM Developers, cfe...@lists.llvm.org

If we say that the fast-math flags are “enabling optimizations that the presence of nans otherwise prohibits”, then there is no reason for clang to keep calls to “isnan” around, or to keep checks like “fpclassify(x) == it’s_a_nan” unfolded.  These are exactly the types of optimizations that the presence of NaNs would prohibit.

 

I understand the need for having some NaN-handling preserved in an otherwise finite-math code.  We already have fast-math-related attributes attached to each function in the LLVM IR, so we could introduce a source-level attribute for enabling/disabling these flags per function.

 

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 


Sanjay Patel via llvm-dev

Sep 9, 2021, 11:02:44 AM
to Krzysztof Parzyszek, LLVM Developers, cfe...@lists.llvm.org
Not sure which way to go, but I agree that we need to improve the docs/user experience either way.
Let's try to iron this out with an example (this is based on https://llvm.org/PR51775 ):

```c
#include <math.h>
#include <stdlib.h>
int main() {
    const double d = strtod("1E+1000000", NULL);
    return d == HUGE_VAL;
}
```

What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?

The proposed documentation text isn't clear to me. Should clang apply "nnan ninf" to the IR call for "strtod"?
"strtod" is not in the enumerated list of functions where we would block fast-math-flags, but it is a standard lib call, so "nnan ninf" would seem to apply...but we also don't want "-ffinite-math-only" to alter the ability to return an INF from a "general function call"?

Serge Pavlov via llvm-dev

Sep 9, 2021, 11:45:37 AM
to Sanjay Patel, LLVM Developers, cfe...@lists.llvm.org
On Thu, Sep 9, 2021 at 10:02 PM Sanjay Patel via llvm-dev <llvm...@lists.llvm.org> wrote:
> Not sure which way to go, but I agree that we need to improve the docs/user experience either way.
> Let's try to iron this out with an example (this is based on https://llvm.org/PR51775 ):
>
> #include <math.h>
> #include <stdlib.h>
> int main() {
>     const double d = strtod("1E+1000000", NULL);
>     return d == HUGE_VAL;
> }
>
> What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?

Comparison `d == HUGE_VAL` is an arithmetic operation, so the requirements for using -ffinite-math-only are violated. Both compilers are right.
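Under the interpretation described here, a check that uses a classification macro instead of an arithmetic comparison would remain meaningful. A minimal sketch assuming that interpretation (the helper `parse_overflows` is hypothetical); it also consults `errno`, which `strtod` sets to `ERANGE` on overflow.

```c
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: reports whether parsing s overflowed to infinity.
 * isinf() is a classification macro rather than an arithmetic comparison
 * like d == HUGE_VAL, so under the proposed interpretation it would remain
 * a real check with -ffinite-math-only. */
static bool parse_overflows(const char *s) {
    errno = 0;
    const double d = strtod(s, NULL);
    return errno == ERANGE && isinf(d);
}
```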

Serge Pavlov via llvm-dev

Sep 9, 2021, 12:10:31 PM
to Krzysztof Parzyszek, LLVM Developers, cfe...@lists.llvm.org
On Thu, Sep 9, 2021 at 8:30 PM Krzysztof Parzyszek via cfe-dev <cfe...@lists.llvm.org> wrote:

> If we say that the fast-math flags are “enabling optimizations that the presence of nans otherwise prohibits”, then there is no reason for clang to keep calls to “isnan” around, or to keep checks like “fpclassify(x) == it’s_a_nan” unfolded.  These are exactly the types of optimizations that the presence of NaNs would prohibit.

The transformation `x * 0 -> 0` is an optimization allowed in the absence of NaNs as arguments, because it produces a program that behaves identically under the given restrictions. Replacing `isnan(x + x)` is also an optimization under the same restrictions. Replacing `isnan(x)` in the general case is not, because we cannot assume that x is not a NaN.

> I understand the need for having some NaN-handling preserved in an otherwise finite-math code.  We already have fast-math-related attributes attached to each function in the LLVM IR, so we could introduce a source-level attribute for enabling/disabling these flags per function.

GCC allows using `#pragma GCC optimize ("finite-math-only")` or `#pragma GCC optimize ("no-finite-math-only")` to enable/disable the optimization on a per-function basis. Clang could support this pragma, or `#pragma clang fp` could be extended to support similar functionality.
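A minimal sketch of the GCC pragma approach, applied to a single hypothetical helper. GCC's manual describes the `optimize` attribute (which the pragma applies to each following function) as intended for debugging rather than production code, and this sketch does not establish how it interacts with inlining into finite-math callers; the `__clang__` guard reflects the assumption that Clang does not honor `#pragma GCC optimize`.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Ask GCC to compile only the following function without
 * -ffinite-math-only, so its isnan() is meant to stay a real check. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("no-finite-math-only")
#endif
static bool element_is_nan(double x) {
    return isnan(x);
}
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```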

Krzysztof Parzyszek via llvm-dev

Sep 9, 2021, 12:29:32 PM
to Serge Pavlov, LLVM Developers, cfe...@lists.llvm.org

This goes back to what these options actually imply.  The interpretation that I favor is “this code will never see a NaN”, or “the program can assume that no floating point expression will evaluate to a NaN”.  The benefit of that is that it’s intuitively clear.  In that case “isnan(x)” is false, because x cannot be a NaN.  There is no distinction between “isnan(x+x)” and “isnan(x)”.  If the user wants to preserve “isnan(x)”, they can apply some pragma (which clang may actually have already).

 

To be honest, I’m not sure that I understand your argument.  Are you saying that under your interpretation we could optimize “isnan(x+x) -> false”, but not “isnan(x) -> false”?

 

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 

Serge Pavlov via llvm-dev

Sep 9, 2021, 12:53:25 PM
to Krzysztof Parzyszek, LLVM Developers, cfe...@lists.llvm.org
On Thu, Sep 9, 2021 at 11:29 PM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:

> This goes back to what these options actually imply.  The interpretation that I favor is “this code will never see a NaN”, or “the program can assume that no floating point expression will evaluate to a NaN”.  The benefit of that is that it’s intuitively clear.  In that case “isnan(x)” is false, because x cannot be a NaN.  There is no distinction between “isnan(x+x)” and “isnan(x)”.  If the user wants to preserve “isnan(x)”, they can apply some pragma (which clang may actually have already).

The simplicity is only apparent. As the discussion on the GCC mailing list demonstrated (https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544641.html), this is actually an unpromising approach. From a practical viewpoint it is also a bad solution, as users cannot even check their assertions.

> To be honest, I’m not sure that I understand your argument.  Are you saying that under your interpretation we could optimize “isnan(x+x) -> false”, but not “isnan(x) -> false”?

The argument of `isnan(x+x)` is the result of an arithmetic operation. According to the meaning of -ffinite-math-only, that operation cannot produce a NaN, so this call can be optimized out. In the general case, the argument of `isnan(x)` may be, say, loaded from memory. A load is not an arithmetic operation, so nothing prevents it from loading a NaN. Optimizing the call out is dangerous in this case.

Chris Tetreault via llvm-dev

unread,
Sep 9, 2021, 1:10:08 PM9/9/21
to Serge Pavlov, Krzysztof Parzyszek, llvm...@lists.llvm.org, cfe...@lists.llvm.org

If the issue is that users want their asserts to fire, then they should be encouraged to only enable fast math in release builds.

Krzysztof Parzyszek via llvm-dev

unread,
Sep 9, 2021, 1:30:49 PM9/9/21
to Serge Pavlov, LLVM Developers, cfe...@lists.llvm.org

> The simplicity is only apparent. As the discussion on the GCC mailing list demonstrated (https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544641.html), this is actually an unpromising approach. From a practical viewpoint it is also a bad solution, as users cannot even check their assertions.

The intent here is that users can preserve the NaN behavior by annotating the code with either attributes or pragmas.  I don’t think that the linked discussion actually shows that the “no NaNs ever” interpretation is any worse than the “arithmetic operations do not produce NaNs” one. A large part of it was about what happens to `__builtin_nan`, but if your code explicitly produces NaNs and you compile it with finite-math, you shouldn’t expect anything meaningful.  IMO it’s much better to have a flag whose effect is clear, even if it leads to potentially unexpected results, than an option whose description is open to interpretation.  At least the users will know what caused the issue, rather than wonder whether they have found a compiler bug.

I agree that there may be issues with multiple definitions of functions compiled with different settings, although that is not strictly limited to FP flags.  There should be some unified approach to that, and I don’t know what the right thing to do is off the top of my head.

> The argument of `isnan(x+x)` is the result of an arithmetic operation. According to the meaning of -ffinite-math-only, that operation cannot produce a NaN, so this call can be optimized out. In the general case, the argument of `isnan(x)` may be, say, loaded from memory. A load is not an arithmetic operation, so nothing prevents it from loading a NaN. Optimizing the call out is dangerous in this case.

`x` is not a load, it’s an expression.  Also, even in the presence of NaNs, x+0 preserves the value type (i.e. normal/subnormal/infinity/NaN), except signaling NaNs perhaps.  I’m not sure whether we even consider signaling NaNs, so let’s forget them for a moment.  If x+0 is a NaN iff x is a NaN, then the compiler should be able to rewrite x -> x+0 regardless of any flags.  But then, given that x+0 is now “arithmetic”, isnan(x+0) could become `false`.  This is fundamentally counterintuitive.

 

Furthermore, if we had `a = isnan(x)`, we couldn’t fold it to `false`, but if we had `a = isnan(x); b = isnan(x+x)`, then we could fold both to `false`.  This is, again, unintuitive.
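The observation that `x + 0` keeps the classification of `x` can be checked with a short C sketch under default IEEE-754 semantics (the helper name is hypothetical). The one visible difference is the sign of zero: in the default round-to-nearest mode, `-0.0 + 0.0` is `+0.0`, although its class is still zero.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: does x + 0.0 have the same fpclassify() category
 * (normal, subnormal, zero, infinite, NaN) as x? */
static bool add_zero_keeps_class(double x) {
    return fpclassify(x + 0.0) == fpclassify(x);
}
```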

Serge Pavlov via llvm-dev

unread,
Sep 9, 2021, 1:34:54 PM9/9/21
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
Let me describe a real-life example.

There is a realtime program that processes float values from a huge array. The calculations neither produce nor expect NaNs. Using -ffinite-math-only substantially speeds up the program, so it is highly desirable to use it. The problem is that the array contains NaNs; they mark elements that should not be processed.

An obvious solution is to check each element for NaN and process it only if it is not one. Currently there is no clean way to do so, only workarounds like using integer arithmetic. The function `isnan` has become useless, and there are many cases of users complaining about this optimization.
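The loop in this example might look like the following minimal C sketch; the function name is hypothetical and a running sum stands in for the real processing. NaN elements are only ever passed to `isnan`, never to arithmetic, which is the pattern the proposal wants to remain valid under `-ffinite-math-only`.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* NaN elements are markers meaning "do not process". Only isnan() sees
 * them; all arithmetic is performed on the remaining elements. */
static double sum_marked_array(const float *data, size_t n) {
    assert(data != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(data[i]))
            continue; /* skip marker elements */
        total += data[i];
    }
    return total;
}
```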

Thanks,
--Serge

Serge Pavlov via llvm-dev

unread,
Sep 9, 2021, 1:52:24 PM9/9/21
to Krzysztof Parzyszek, LLVM Developers, cfe...@lists.llvm.org
On Fri, Sep 10, 2021 at 12:30 AM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:

>> The simplicity is only apparent. As the discussion on the GCC mailing list demonstrated (https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544641.html), this is actually an unpromising approach. From a practical viewpoint it is also a bad solution, as users cannot even check their assertions.
>
> The intent here is that users can preserve the NaN behavior by annotating the code with either attributes or pragmas.  I don’t think that the linked discussion actually shows that the “no NaNs ever” interpretation is any worse than the “arithmetic operations do not produce NaNs” one. A large part of it was about what happens to `__builtin_nan`, but if your code explicitly produces NaNs and you compile it with finite-math, you shouldn’t expect anything meaningful.

The purpose of -ffinite-math-only was to make calculations faster by excluding corner cases when the user is sure that they do not occur. Why should it prohibit all operations on NaNs, like reading, writing and checking? Does prohibiting them make programs faster or otherwise better?

> IMO it’s much better to have a flag whose effect is clear, even if it leads to potentially unexpected results, than an option whose description is open to interpretation.  At least the users will know what caused the issue, rather than wonder whether they have found a compiler bug.

This solution seems overcomplicated: a new flag with a probably complex meaning. Limiting the effect of -ffinite-math-only to the cases where the restriction actually gives benefits would solve the problem without multiplying entities.

Chris Tetreault via llvm-dev

unread,
Sep 9, 2021, 2:03:17 PM9/9/21
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

In this case, I think it’s perfectly reasonable to reinterpret_cast the floats to uint32_t, and then inspect the bit pattern. Since NaN is being used as a sentinel value, I assume it’s a known bit pattern, and not just any old NaN.
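One way to do the bit inspection suggested above, sketched in C with a hypothetical sentinel pattern (`0x7FC00000`, a common default quiet-NaN encoding for `float`). Reading the bits through a pointer cast such as `*(uint32_t *)&f` violates strict-aliasing rules; `memcpy` (or `std::bit_cast` in C++20) is the well-defined route. Comparing the exact pattern also distinguishes the sentinel from other NaNs.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sentinel pattern; a real program would use whatever
 * encoding its data producer writes. */
#define SENTINEL_BITS UINT32_C(0x7FC00000)

static float make_sentinel(void) {
    uint32_t bits = SENTINEL_BITS;
    float f;
    memcpy(&f, &bits, sizeof f); /* well-defined type punning */
    return f;
}

static bool is_sentinel(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits == SENTINEL_BITS; /* exact pattern, not "any NaN" */
}
```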

 

I think it’s fine that fast-math renders isnan useless. As far as I know, the C++ standard wasn’t written to account for compilers providing fast-math flags. fast-math is itself a workaround for “IEEE floats do not behave like actual real numbers”, so working around a workaround seems reasonable to me.

Serge Pavlov via llvm-dev

unread,
Sep 9, 2021, 2:28:02 PM9/9/21
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Fri, Sep 10, 2021 at 1:03 AM Chris Tetreault <ctet...@quicinc.com> wrote:

> In this case, I think it’s perfectly reasonable to reinterpret_cast the floats to uint32_t, and then inspect the bit pattern. Since NaN is being used as a sentinel value, I assume it’s a known bit pattern, and not just any old NaN.

The C standard defines `isnan` to determine whether a value is a NaN. The fact that it does not work in this case demonstrates that the optimization is incorrect. Again, if `isnan` comes from the libc implementation, it works, but if it is provided by the compiler, it does not. Users expect consistent behavior.

If NaNs are not completely prohibited in -ffinite-math-only mode, `isnan` must work as specified in the standard.

> I think it’s fine that fast-math renders isnan useless. As far as I know, the C++ standard wasn’t written to account for compilers providing fast-math flags. fast-math is itself a workaround for “IEEE floats do not behave like actual real numbers”, so working around a workaround seems reasonable to me.

I feel you are right that fast-math is a workaround, but the compiler is a practical tool and it must be convenient and suitable for a wide set of tasks. A situation where users have to invent workarounds because some optimization changes the semantics of a standard function is not good.

As for -ffinite-math-only, it is actually more or less a safe mode. When we use integer division, we know that the divisor must not be zero. The case of -ffinite-math-only is similar.

Mehdi AMINI via llvm-dev

Sep 9, 2021, 2:30:24 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Thu, Sep 9, 2021 at 10:34 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
Let me describe a real life example.

There is a realtime program that processes float values from a huge array. The calculations do not produce NaNs and do not expect them. Using -ffinite-math-only substantially speeds up the program, so it is highly desirable to use it. The problem is that the array contains NaNs; they mark elements that should not be processed.

An obvious solution is to check whether an element is NaN and, if it is not, process it. Now there is no clean way to do so, only workarounds like using integer arithmetic. The function `isnan` has become useless. And there are many cases where users complain about this optimization.

I personally would separate the "pre-processing" of the input into a compilation unit that isn't compiled with -ffinite-math-only and isolate the perf-critical routines to be compiled with this flag if needed (I'd also like a sanitizer to have a build mode that validates that no NaNs are ever seen in these routines).

In general, Krzysztof's reasoning in this thread makes sense to me, in particular in terms of being consistent with how we treat isnan(x) vs isnan(x+0) for example.

-- 
Mehdi

Serge Pavlov via llvm-dev

Sep 9, 2021, 2:55:47 PM
to Mehdi AMINI, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Fri, Sep 10, 2021 at 1:29 AM Mehdi AMINI <joke...@gmail.com> wrote:


On Thu, Sep 9, 2021 at 10:34 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
Let me describe a real life example.

There is a realtime program that processes float values from a huge array. The calculations do not produce NaNs and do not expect them. Using -ffinite-math-only substantially speeds up the program, so it is highly desirable to use it. The problem is that the array contains NaNs; they mark elements that should not be processed.

An obvious solution is to check whether an element is NaN and, if it is not, process it. Now there is no clean way to do so, only workarounds like using integer arithmetic. The function `isnan` has become useless. And there are many cases where users complain about this optimization.

I personally would separate the "pre-processing" of the input into a compilation unit that isn't compiled with -ffinite-math-only and isolate the perf-critical routines to be compiled with this flag if needed (I'd also like a sanitizer to have a build mode that validates that no NaNs are ever seen in these routines).

It could be a workaround. GCC supports `#pragma GCC optimize`, which could be used to turn -ffinite-math-only on and off. Clang does not support this pragma, so the only option is separate translation units with subsequent linking, which is not possible in some cases, such as ML kernels.


In general, Krzysztof's reasoning in this thread makes sense to me, in particular in terms of being consistent with how we treat isnan(x) vs isnan(x+0) for example.


The key point here is what guarantees the user gives the compiler when they specify -ffinite-math-only. If the guarantee is "NaNs can never be seen", then indeed `isnan` may be optimized out. If it is "NaNs do not occur in arithmetic operations", then `isnan` must be kept unless we know for sure that its argument cannot be a NaN. The choice should be based on practical needs, IMHO. The second approach is more flexible and enables more use cases.

Cranmer, Joshua via llvm-dev

Sep 9, 2021, 4:39:01 PM
to Mehdi AMINI, Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

(Speaking only for myself here, and mostly as someone who doesn’t typically write floating-point-heavy code).

 

The root issue we have here is that, as with many compiler extensions, fast-math flags end up creating a vaguely defined variant of the C specification governed by the “obvious” semantics, and, as is the case with “obvious” semantics, there are several different “obvious” results.

 

Given the standard C taste for undefined behavior, it would seem to me that the most natural definition of -ffinite-math-only would be to say that any operation that produces NaN or infinity results is undefined behavior, or produces a poison value using LLVM’s somewhat tighter definition here [1]. This notably doesn’t give a clear answer on what to do with floating-point operations that don’t produce floating-point results (e.g., casts, comparison operators), and the volume of discussions on this point is I think indicative that there are multiple reasonable options here. Personally, I find the extension of the UB to cases that consume but do not produce floating-point values to be the most natural option.

 

It’s also the case that many users don’t like undefined behavior as a concept, in large part because it can be very difficult to work around in the few cases where it is desirable to explicitly override the undefined behavior. For some of the more basic integer UB, clang already provides builtin overflow-checking functions (e.g., `__builtin_add_overflow`) to handle the I-want-to-check-if-it-overflowed-without-UB case. And if fast-math flags are to create UB, then similar functionality to override the floating-point UB ought to be provided. C already provides a mechanism to twiddle floating-point behavior on a per-scope basis (e.g., #pragma STDC FENV_ACCESS, CX_LIMITED_RANGE, FP_CONTRACT). LLVM already supports these flags on a per-instruction basis, so it really shouldn’t be very difficult to have Clang support pragmas to twiddle fast-math flags like the existing C pragmas. And in this model, the -ffast-math and related flags do nothing more than set the default values of these pragmas.

 

In that vein, I can imagine a user writing a program that would look something like this:

 

```
#include <math.h>

// Placeholder status codes for the example.
enum { SUCCESS = 0, ILLEGAL_ARGUMENT = 1 };

int some_hard_math_kernel(float *inputs, float *outputs, int N) {
  {
    // Hypothetical pragma: the per-scope control proposed above,
    // not an existing Clang pragma.
    #pragma clang fast_math off
    for (int i = 0; i < N; i++) {
      if (isinf(inputs[i]) || isnan(inputs[i]))
        return ILLEGAL_ARGUMENT;
    }
  }
  #pragma clang fast_math on
  // Do fancy math here…
  // and if we see isnan(x) here, even if it’s in a library routine [compiled with -ffast-math],
  // or maybe implied by some operation the compiler understands [say, complex multiplication]
  // it is optimized to false.
  return SUCCESS;
}
```

 

I can clearly see use cases where the programmer might wish to have the optimizer eliminate any isnan calls that are generated when -ffast-math is used, but like other UB, I think it is extremely beneficial to provide some way to explicitly opt-out of UB on a case-by-case basis.

 

I would even go so far as to suggest that maybe the C standards committee should discuss how to handle at least the nsz/nnan/ninf parts of fast-math flags, given that very similar concepts seem to exist in all of the major C/C++ compilers.

 

[1] I fully expect any user who is knowledgeable about poison in LLVM—which admittedly is a fairly expert user—would expect poison to kick in most of the time C or C++ provides for undefined behavior, and potentially to rely on that expectation.

Chris Tetreault via llvm-dev

Sep 9, 2021, 4:55:16 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension. If you enable it, you should expect the compiler to diverge from the language standard. I’m sure there’s precedent for this. If I write #pragma once at the top of my header, and include it twice back to back, the preprocessor won’t paste my header twice. Should #pragma once be removed because it breaks #include?

 

Now, you have a real-world example that uses NaN as a sentinel value. In your case, it would be nice if the compiler worked as you suggest. Now, suppose I have a “safe matrix multiply”:

 

```
std::optional<MyMatrixT> safeMul(const MyMatrixT & lhs, const MyMatrixT & rhs) {
  for (int i = 0; i < lhs.rows; ++i) {
    for (int j = 0; j < lhs.cols; ++j) {
      if (isnan(lhs[i][j])) {
        return {};
      }
    }
  }
  for (int i = 0; i < rhs.rows; ++i) {
    for (int j = 0; j < rhs.cols; ++j) {
      if (isnan(rhs[i][j])) {
        return {};
      }
    }
  }
  // do the multiply
}
```

 

In this case, if isnan(x) can be constant folded to false with fast-math enabled, then these two loops can be completely eliminated since they are empty and do nothing. If MyMatrixT is a 100 x 100 matrix, and/or safeMul is called in a hot loop, this could be huge. What should I do instead here?

 

Really, it would be much more consistent if we applied the clang documentation for fast-math, “Operands to floating-point operations are not equal to NaN and Inf”, literally, rather than actually implementing “Operands to floating-point operations are not equal to NaN and Inf, except in the case of isnan(), but only if the argument to isnan() is a value stored in a variable and not an expression”. As for using isnan from a standard library compiled without fast-math versus a compiler builtin, I don’t think this is an issue. Really, enabling fast-math is basically telling the compiler “My code has no NaNs. I won’t try to do anything with them, and you should optimize assuming they aren’t there”. If a developer does their part, why should it matter to them that isnan() might work?

 

Thanks,

   Chris Tetreault

James Y Knight via llvm-dev

Sep 9, 2021, 4:57:12 PM
to Sanjay Patel, LLVM Developers, cfe...@lists.llvm.org
On Thu, Sep 9, 2021 at 11:02 AM Sanjay Patel <spa...@rotateright.com> wrote:
Not sure which way to go, but I agree that we need to improve the docs/user experience either way.
Let's try to iron this out with an example (this is based on https://llvm.org/PR51775 ):

#include <math.h>
#include <stdlib.h>
int main() {
    const double d = strtod("1E+1000000", NULL);

This should be covered by the "general function call" rule, is therefore unaffected by -ffinite-math-only, and may validly return inf.

    return d == HUGE_VAL;

For this comparison, however, the compiler can assume its operands are always finite. Thus, this comparison results in a poison value (in LLVM IR terminology).

What should this program return when compiled with -ffinite-math-only? Should this trigger a clang warning?

We could indeed emit a diagnostic (when -ffinite-math-only is in effect) to let you know that you are doing something guaranteed to be incorrect, by using a manifest constant INF, where you promised that you would not.
 
The proposed documentation text isn't clear to me. Should clang apply "nnan ninf" to the IR call for "strtod"?
"strtod" is not in the enumerated list of functions where we would block fast-math-flags, but it is a standard lib call, so "nnan ninf" would seem to apply...but we also don't want "-ffinite-math-only" to alter the ability to return an INF from a "general function call"?


The strtod function should be allowed to return inf/nan. There are two ways we could accomplish that:
1. We could specify in LLVM that nnan/ninf are meaningless to most function calls. In this case, Clang may continue emitting it everywhere, as is done today, including on strtod, but it would have no impact.
2. We could specify that clang should not emit nnan/ninf except on certain calls. In this case, Clang would not emit it on strtod.

I haven't thought about which option would be better. I've been trying to discuss the desired C-facing semantics first.
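For reference, here is Sanjay's example assembled into a complete function so the result can be inspected. Compiled with default IEEE semantics, strtod reports overflow by returning HUGE_VAL and setting errno to ERANGE, so the comparison is true; the open question in this thread is only what the comparison may do once it is compiled under -ffinite-math-only. The function name is illustrative.

```cpp
#include <cerrno>
#include <cmath>
#include <cstdlib>

// Sanjay's example as a function. Per the C standard, strtod returns
// HUGE_VAL and sets errno to ERANGE when the value overflows double.
bool overflow_yields_huge_val(int* err_out) {
    errno = 0;
    const double d = std::strtod("1E+1000000", nullptr);
    *err_out = errno;
    // Under IEEE semantics this compares +inf with +inf (HUGE_VAL is +inf
    // on IEEE 754 targets). Under -ffinite-math-only, the compiler may
    // assume neither operand of == is infinite.
    return d == HUGE_VAL;
}
```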
  

Richard Smith via llvm-dev

Sep 9, 2021, 8:59:20 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Thu, 9 Sept 2021 at 13:55, Chris Tetreault via llvm-dev <llvm...@lists.llvm.org> wrote:

The point I was trying to make regarding the C++ standard is that fast-math is a non-standard language extension.


-ffinite-math-only does not need to be a non-standard language extension. Neither C nor C++ requires that floating-point types can represent infinity or NaN, and we could define this flag as meaning that there are (notionally) simply no such values in the relevant types. Of course, that's not actually consistent with what we currently do, nor with what GCC does.

Would it be reasonable to treat operations on Inf and NaN values as UB in this mode only if the same operation on a signaling NaN might signal? (Approximately, that'd mean we imagine these non-finite value encodings all encode sNaNs that are UB if they would signal.) That means the operations that ISO 60559 defines as non-computational or quiet-computational would be permitted to receive NaN and Inf as input and produce them as output, but that other computational operations would not.

Per ISO 60559, the quiet-computational operations that I think are relevant to us are: copy, negate, abs, copySign, and conversions between encoding (eg, bitcast). The non-computational operations that I think are relevant to us are classification functions (including isNaN).
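To make the proposed split concrete, the sketch below applies the operations listed above to a NaN input, compiled with ordinary IEEE semantics (no fast-math flags). Negate, abs, and copySign only manipulate the sign bit, a bitcast only moves bits, and isNaN only classifies; none of them performs arithmetic that could signal. The struct and function names are illustrative, not part of any proposal.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Bitcast float -> uint32_t (a "conversion between encodings").
inline std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

constexpr std::uint32_t kSignMask = 0x80000000u;

struct QuietOpResults {
    bool neg_is_nan;           // negate keeps the value a NaN
    bool abs_clears_sign;      // abs clears the sign bit, still a NaN
    bool copysign_sets_sign;   // copySign sets the sign bit, still a NaN
    bool neg_flips_sign_only;  // negate changes nothing but the sign bit
};

QuietOpResults run_quiet_ops() {
    const float n = std::numeric_limits<float>::quiet_NaN();
    const float neg = -n;
    const float mag = std::fabs(neg);
    const float with_sign = std::copysign(n, -1.0f);

    QuietOpResults r;
    r.neg_is_nan = std::isnan(neg);
    r.abs_clears_sign = (bits_of(mag) & kSignMask) == 0 && std::isnan(mag);
    r.copysign_sets_sign =
        (bits_of(with_sign) & kSignMask) != 0 && std::isnan(with_sign);
    r.neg_flips_sign_only = (bits_of(neg) ^ bits_of(n)) == kSignMask;
    return r;
}
```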

James Y Knight via llvm-dev

Sep 10, 2021, 10:30:08 AM
to Richard Smith, llvm-dev, Clang Dev
On Thu, Sep 9, 2021, 8:59 PM Richard Smith via llvm-dev <llvm...@lists.llvm.org> wrote:
Would it be reasonable to treat operations on Inf and NaN values as UB in this mode only if the same operation on a signaling NaN might signal? (Approximately, that'd mean we imagine these non-finite value encodings all encode sNaNs that are UB if they would signal.) That means the operations that ISO 60559 defines as non-computational or quiet-computational would be permitted to receive NaN and Inf as input and produce them as output, but that other computational operations would not.

Per ISO 60559, the quiet-computational operations that I think are relevant to us are: copy, negate, abs, copySign, and conversions between encoding (eg, bitcast). The non-computational operations that I think are relevant to us are classification functions (including isNaN).

I'm in favor. (Perhaps unsurprisingly, as this is precisely the proposal I made earlier, worded slightly differently. :)

Serge Pavlov via llvm-dev

Sep 10, 2021, 1:42:32 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
If clang does not remove `__builtin_isnan` in `-ffinite-math-only` mode and a user wants calls to `isnan` to be optimized out, they can do it in literally a couple of lines:

#undef isnan
#define isnan(x) false

If clang optimizes out `__builtin_isnan` and a user wants to check if some float is NaN, they have no appropriate way for that, only hacks and kludges.

The approach in which -ffinite-math-only means "there are no NaNs" is too rigid: it prevents several coding techniques, does not provide additional optimization possibilities, and provokes user complaints.

Thanks,
--Serge


On Fri, Sep 10, 2021 at 11:28 PM Chris Tetreault <ctet...@quicinc.com> wrote:

I’m not super knowledgeable on the actual implementation of floating point math in clang, but on the surface this seems fine. My position is that we should provide no guarantees as to the behavior of code with NaN or infinity if fast-math is enabled. We can go with this behavior, but we shouldn’t tell users that they can rely on this behavior. Clang should have maximal freedom to optimize floating point math with fast-math, and any constraint we place potentially results in missed opportunities. Similarly we should feel free to change this implementation in the future, the goal not being stability for users who chose to rely on our implementation details. If users value reproducibility, they should not be using fast math.

 

The only thing I think we should guarantee is that casts work. I should be able to load some bytes from disk, cast the char array to a float array, and any NaNs that I loaded from disk should not be clobbered. After that, I should be able to cast an element of my float array back to another type and inspect the bit pattern (assuming I did not transform that element in the array in any other way after casting it from char) to support use cases like Serge’s. Any other operation should be fair game.

 

Thanks,

   Chris Tetreault

Chris Tetreault via llvm-dev

Sep 10, 2021, 2:25:11 PM
to Richard Smith, llvm...@lists.llvm.org, cfe...@lists.llvm.org

I’m not super knowledgeable on the actual implementation of floating point math in clang, but on the surface this seems fine. My position is that we should provide no guarantees as to the behavior of code with NaN or infinity if fast-math is enabled. We can go with this behavior, but we shouldn’t tell users that they can rely on this behavior. Clang should have maximal freedom to optimize floating point math with fast-math, and any constraint we place potentially results in missed opportunities. Similarly we should feel free to change this implementation in the future, the goal not being stability for users who chose to rely on our implementation details. If users value reproducibility, they should not be using fast math.

 

The only thing I think we should guarantee is that casts work. I should be able to load some bytes from disk, cast the char array to a float array, and any NaNs that I loaded from disk should not be clobbered. After that, I should be able to cast an element of my float array back to another type and inspect the bit pattern (assuming I did not transform that element in the array in any other way after casting it from char) to support use cases like Serge’s. Any other operation should be fair game.
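A minimal sketch of that guarantee, assuming 4-byte IEEE 754 floats: bytes are copied into a float array and back with std::memcpy, with no arithmetic in between, so NaN encodings in the input (here a hypothetical quiet NaN with payload 0x1234) come back unchanged. The helper names are illustrative.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "expects a 32-bit float");

constexpr std::size_t kCount = 3;
using Bytes = std::array<unsigned char, kCount * sizeof(float)>;

// Reinterpret a byte buffer as floats and back again, touching the values
// only through memcpy (no floating-point arithmetic in between).
Bytes round_trip(const Bytes& in) {
    float values[kCount];
    std::memcpy(values, in.data(), in.size());

    Bytes out{};
    std::memcpy(out.data(), values, out.size());
    return out;
}

// Builds a buffer holding 1.0f, a quiet NaN with payload 0x1234, and 2.5f.
Bytes make_input() {
    const std::uint32_t words[kCount] = {0x3F800000u, 0x7FC01234u, 0x40200000u};
    Bytes buf{};
    std::memcpy(buf.data(), words, buf.size());
    return buf;
}
```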

 

Thanks,

   Chris Tetreault

 

Chris Tetreault via llvm-dev

Sep 10, 2021, 2:25:34 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

I would argue that #undef’ing a macro provided by the compiler is a much worse kludge than static casting your float to an unsigned int. Additionally, you have to redefine isnan to whatever it was after your function (lest it pollute unrelated code that possibly isn’t even being compiled with fast math), which can’t be done portably as far as I know. Additionally, this requires you to be the author of safeMul. What if it’s in a dependency for which you don’t have the source? At that point, your only recourse is to open an issue with libProprietaryMatrixMath and hope your org is paying them enough to fast-track a fix.

Serge Pavlov via llvm-dev

Sep 10, 2021, 2:26:43 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
It should not be done in headers, of course. Redefining this macro in a source file that is compiled with -ffinite-math-only is free from the described drawbacks. Besides, the macro `isnan` is defined by libc, not the compiler, and IIRC it is defined as a macro to allow such manipulations.

The influence of libc on the behavior of `isnan` under -ffinite-math-only is also an argument against "there are no NaNs", because it makes the behavior inconsistent. Libc can provide its own implementation that does not rely on the compiler's `__builtin_isnan`, in which case user code that uses `isnan` works. But at some point the configuration script or the libc macro changes, and your code starts working incorrectly, as happened after commit 767eadd78 in the LLVM libcxx project. Keeping `isnan` would make such changes in libc less harmful.

Thanks,
--Serge

Joerg Sonnenberger via llvm-dev

Sep 10, 2021, 2:39:34 PM
to cfe...@lists.llvm.org, llvm...@lists.llvm.org
On Fri, Sep 10, 2021 at 04:28:31PM +0000, Chris Tetreault via cfe-dev wrote:
> I’m not super knowledgeable on the actual implementation of floating
> point math in clang, but on the surface this seems fine. My position
> is that we should provide no guarantees as to the behavior of code with
> NaN or infinity if fast-math is enabled. We can go with this behavior,
> but we shouldn’t tell users that they can rely on this behavior. Clang
> should have maximal freedom to optimize floating point math with
> fast-math, and any constraint we place potentially results in missed
> opportunities. Similarly we should feel free to change this
> implementation in the future, the goal not being stability for users
> who chose to rely on our implementation details. If users value
> reproducibility, they should not be using fast math.

Without trying to be too harsh, this is the bad justification GCC has
used for years for exploiting all kinds of UB and implementation-defined
behavior in the name of performance. As has been shown over and over
again, the breakage is rarely matched by equivalent performance gains.
So once more, do we even have proof that significant code exists where
isnan and friends are used in a performance critical code path? I would
find that quite surprising and more an argument for throwing a compile
error...

Joerg

Chris Tetreault via llvm-dev

Sep 10, 2021, 3:40:07 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

The problem is that math code is often templated, so `template <typename T>  MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a header.

 

Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”. Just as you can sometimes dereference a pointer after it is freed, you should not count on this working. If the compiler I’m using emits a call to a library function instead of providing a macro, and this results in isnan actually computing whether x is NaN, then so be it. But if the compiler provides a macro that evaluates to false under fast-math, then the two loops in safeMul can be optimized away. Either way, as a developer, I know that I turned on fast-math, and I write code accordingly.

 

I think working around these sorts of issues is something that C and C++ developers are used to. We accept these sorts of behaviors that are inconsistent between compilers because we know they come with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting, especially when the fix is also just one line:

 

```
#define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
```

 

I would probably call the macro something else like `shouldProcessElement`.

 

Thanks,

   Chris Tetreault

 


Serge Pavlov via llvm-dev

Sep 11, 2021, 12:20:53 AM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctet...@quicinc.com> wrote:
The problem is that math code is often templated, so `template <typename T>  MyMatrixT<T> safeMul(const MyMatrixT<T> & lhs …` is going to be in a header.

No problem, the user can write:
```
#ifdef __FAST_MATH__
#undef isnan
#define isnan(x) false
#endif
```
and put it somewhere in the headers.

On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctet...@quicinc.com> wrote:
Regardless, my position isn’t “there is no NaN”. My position is “you cannot count on operations on NaN working”. 

Exactly. Attempts to express the condition of -ffast-math as restrictions on types are not fruitful. I think this is why the GCC documentation does not use the simple and clear "there is no NaN" but prefers more complicated wording about arithmetic.

On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctet...@quicinc.com> wrote:
I think working around these sorts of issues is something that C and C++ developers are used to. We accept these sorts of behaviors that are inconsistent between compilers because we know they come with improved performance. In this case, the fix is easy, so I don’t think this corner case is worth supporting, especially when the fix is also just one line:
```
#define myIsNan(x) (reinterpret_cast<uint32_t>(x) == THE_BIT_PATTERN_OF_MY_SENTINEL_NAN)
```

It won't work this way. If `x == 5.0`, then `reinterpret_cast<uint32_t>(x) == 5`. What you need there is a bitcast, and standard C has no such operation. To emulate it, a reinterpret_cast of memory can be used: `*reinterpret_cast<int *>(&x)`. Another way is to use a union. Both of these solutions require memory operations, which is not good for performance, especially on GPU and ML cores. Of course, a smart compiler can eliminate the memory operation, but it does not have to do so every time, as it is only an optimization. Moving a value between the float and integer pipelines may also incur a performance penalty. At the same time, this check can often be done with a single instruction.

Thanks,
--Serge

Serge Pavlov via llvm-dev

Sep 13, 2021, 2:02:56 AM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
I was also wrong about reinterpret_cast, sorry. `reinterpret_cast<uint32_t>(float)` is an invalid construct. The working construct is `reinterpret_cast<uint32_t&>(x)`. However, it has the same drawback: it requires `x` to be in memory.

Thanks,
--Serge

James Y Knight via llvm-dev

Sep 13, 2021, 8:00:42 AM
to Serge Pavlov, llvm-dev, Clang Dev
On Mon, Sep 13, 2021, 2:02 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
The working construct is `reinterpret_cast<uint32_t&>(x)`. It however possesses the same drawback, it requires `x` be in memory.

We're getting rather far afield of the thread topic here, but .. that is UB, don't do that.

Instead, always memcpy, e.g.
uint32_t y;
memcpy(&y, &flo, sizeof(uint32_t));


This has effectively no runtime overhead, the compiler is extremely good at deleting calls to memcpy when it has a constant smallish size. And remember that every local variable started out in memory. Only through optimizations does the memory location and the loads/stores for every access get eliminated.
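Building on the memcpy idiom above, a NaN test for IEEE 754 binary32 can be written without any floating-point operation: a value is NaN exactly when its exponent field is all ones and its significand is nonzero, i.e. when the bits with the sign cleared exceed 0x7F800000 (the encoding of +infinity). The sketch below uses it to skip NaN-marked elements, as in Serge's sentinel example. The function names are hypothetical, and whether the memcpy is fully folded away is up to the optimizer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// NaN test on the encoding only: exponent all ones and significand nonzero.
// Assumes float is IEEE 754 binary32.
inline bool is_nan_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return (u & 0x7FFFFFFFu) > 0x7F800000u;
}

// Sums the elements that are not NaN-marked. Arithmetic is only applied
// to values that passed the bit-level check.
float sum_unmarked(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        if (is_nan_bits(data[i])) continue;  // sentinel: skip this element
        total += data[i];
    }
    return total;
}
```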

Serge Pavlov via llvm-dev

Sep 13, 2021, 9:50:24 AM
to James Y Knight, llvm-dev, Clang Dev
Let's weigh the alternatives.

We are discussing two approaches for handling `isnan` and similar functions in -ffinite-math-only mode:
1. "Old" behavior: "with -ffinite-math-only you are telling that there are no NaNs", so `isnan` may be optimized to `false`.
2. "New" behavior: with -ffinite-math-only you are telling that the operands of arithmetic operations are not NaNs but otherwise NaN may be used. As `isnan` is not an arithmetic operation, it should be preserved.

Advantages of the "old" behavior are:
- "It’s intuitively clear."
- It is close to the GCC current behavior.

Advantages of the "new" behavior are:
- `isnan` is still available to the user, which allows, for instance, validation of working data or selection between fast and slow path.
- NaN is available and may be used, for instance, as sentinel.
- Consistency between compiler and library implementations, both would behave similarly.
- In most real cases the "old" behavior can be easily obtained by redefinition of `isnan`.
- It is free from issues like "what does numeric_limits<float>::has_quiet_NaN() return?".

It is unlikely that the "old" behavior gives a noticeable performance gain. In any case, `isnan` may be redefined to `false` if it actually does.

The intuitive clarity of the "old" way is questionable for users, because it is not clear why functions like `isnan` silently disappear, or what the `numeric_limits` specializations should return.

There are cases where checking for NaN is needed even in -ffinite-math-only mode. To do it, users have to resort to workarounds like integer arithmetic on float values, which reduce code clarity and make the code unportable and slower.

Are there any other advantages/disadvantages of these approaches?

Thanks,
--Serge

Krzysztof Parzyszek via llvm-dev

Sep 13, 2021, 10:04:12 AM
to Serge Pavlov, James Y Knight, llvm-dev, cfe...@lists.llvm.org

If the compiler provides “isnan”, the user can’t redefine it.  Redefining/undefining any function or a macro provided by a compiler is UB.

 

The “old” behavior can be tuned with #pragmas to restore the functionality of NaNs where needed.

The “old” behavior doesn’t have a problem with “has_nan”: it returns “true”. What other issues are there?

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 


Serge Pavlov via llvm-dev

Sep 13, 2021, 10:31:52 AM
to Krzysztof Parzyszek, llvm-dev, cfe...@lists.llvm.org
`isnan` does not begin with an underscore, so it is not a reserved identifier. Why is redefining it UB?

Thanks,
--Serge

Krzysztof Parzyszek via llvm-dev

Sep 13, 2021, 11:10:12 AM
to Serge Pavlov, llvm-dev, cfe...@lists.llvm.org

The standard says so, but I can’t find the corresponding passage in the draft...

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 


Arthur O'Dwyer via llvm-dev

Sep 13, 2021, 11:28:19 AM
to Krzysztof Parzyszek, Serge Pavlov, James Y Knight, llvm-dev, cfe...@lists.llvm.org
On Mon, Sep 13, 2021 at 10:09 AM Krzysztof Parzyszek via cfe-dev <cfe...@lists.llvm.org> wrote:
From: Serge Pavlov <sepa...@gmail.com
`isnan` does not begin with an underscore, so it is not a reserved identifier. Why is its redefinition an UB?

The standard says so, but I can’t find the corresponding passage in the draft...


I don't know about C, but in C++ redefining any library name as a macro is forbidden by

Btw, I don't think this thread has paid enough attention to Richard Smith's suggestion: that in fast-math mode, the implementation should
- treat all quiet NaNs as if they are signaling NaNs
- treat all "signals" as if they produce an unspecified value (rather than UB)
So, any floating-point operations that IEEE 754 guarantees will work silently even on signaling NaNs must continue to work on any kind of NaN in fast-math mode. But any operation that is allowed to signal is therefore allowed to give wrong results if you feed it any kind of NaN in fast-math mode. In this model, we don't talk about specific mathematical identities like "x+0 == x". Instead, we say "If !isnan(x), then computationally x+0 == x; and if isnan(x), then x+0 is allowed to signal and therefore in fast-math mode we can make its result come out to any value we like. Therefore, if the optimizer sometimes wants to pretend that QNAN + 0 == QNAN, that's perfectly acceptable."

Notice that you cannot make "signaling" into actual UB; you must make it produce an unspecified value. If you make it UB, then the compiler will happily optimize

    {
        if (!isnan(someGlobal)) puts("it's not nan");  // #1
        double x = someGlobal;
        x += 1;  // This is a signaling operation
    }

into

    {
        puts("it's not nan");  // because if it were NaN on line #1, then either we'd hit that signaling operation, or we'd have a data race
    }

But if you just make "signaling" operations produce unspecified values when given NaN, then I think everything works fine and you end up with behavior that's pretty darn close to what Serge is advocating for with his "New" behavior.

my $.02,
–Arthur

Chris Tetreault via llvm-dev

Sep 13, 2021, 12:46:28 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

Honestly, we can do this until the end of time. I think we both agree that for either scheme there exist workarounds. The question is which workarounds are more palatable, which is a matter of opinion. I think we’ve come to an impasse, so let me just state that my opinion on the question “Should isnan be optimized out in fast-math mode?” is “Yes”, which is what you asked for in your original message. I think that the implementation of fast-math will be cleaner if we don’t special-case a bunch of random constructs in order to do what the user meant instead of what they said. I think fast-math is a notorious footgun, and any attempts to mitigate this will only reduce the effectiveness of the tool, while not really improving the user experience.

 

As a user, if I read that:

 

```c
if (isnan(x)) {
```

 

… is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:

 

```c
if (isnan(x + 0)) {
```

 

… does not also work. I’m going to open a bug and complain, and the slide down the slippery slope will continue. You and I understand the difference, and the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but Joe Coder is just trying to figure out why he’s got NaN in his matrices despite his careful NaN-handling code. Joe is not a compiler expert, and on the face of it, this seems like a silly limitation. This will never end until fast-math is gutted.
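To make the `isnan(x)` versus `isnan(x + 0)` concern concrete, here is a hypothetical pair of checks. Compiled with default flags, both detect a NaN argument. Under a rule that keeps explicit `isnan(x)` calls but still lets the optimizer assume that arithmetic results are never NaN, the second check could be folded to false, because `x + 0.0` is itself an arithmetic result. The function names are illustrative only.

```c
#include <math.h>

/* Checks the argument directly. A "preserve isnan" rule would keep this. */
int check_direct(double x) {
    return isnan(x) != 0;
}

/* Checks the result of an arithmetic operation. If the optimizer may
   assume arithmetic results are never NaN (LLVM's `nnan` flag), it may
   fold this whole function to `return 0;`, even if calls like the one
   in check_direct are preserved. */
int check_after_add(double x) {
    return isnan(x + 0.0) != 0;
}
```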

 

Thanks,

   Chris Tetreault

 

From: Serge Pavlov <sepa...@gmail.com>
Sent: Friday, September 10, 2021 9:21 PM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Richard Smith <ric...@metafoo.co.uk>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

 


On Sat, Sep 11, 2021 at 2:39 AM Chris Tetreault <ctet...@quicinc.com> wrote:

Aaron Ballman via llvm-dev

Sep 13, 2021, 1:05:41 PM
to llvm...@lists.llvm.org, cfe...@lists.llvm.org

Krzysztof Parzyszek via llvm-dev

Sep 13, 2021, 1:06:35 PM
to Chris Tetreault, Serge Pavlov, llvm-dev, cfe...@lists.llvm.org

my opinion on the question “Should isnan be optimized out in fast-math mode?” is “Yes” […]

 

+1

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Chris Tetreault via llvm-dev
Sent: Monday, September 13, 2021 11:46 AM
To: Serge Pavlov <sepa...@gmail.com>

Aaron Ballman via llvm-dev

Sep 13, 2021, 1:09:07 PM
to llvm...@lists.llvm.org, cfe...@lists.llvm.org
Sorry for fat-fingering my previous attempt at a response.

If it would be helpful, I am happy to put people in touch with the
WG14 C Floating Point study group. Then these questions can be asked
of a wider audience of compiler vendors to see if there's some common
thoughts on the subject, even if WG14 won't have an "official" stance
because the behavior is wholly a matter of QoI. I'd request that we
nominate one person from the LLVM community to hold that discussion
and report back so we don't throw the study group into the deep end of
our pool (and given my lack of domain knowledge, I'd appreciate it if
that someone was not me).

~Aaron

On Mon, Sep 13, 2021 at 12:46 PM Chris Tetreault via cfe-dev
<cfe...@lists.llvm.org> wrote:
>

Serge Pavlov via llvm-dev

Sep 13, 2021, 1:50:41 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
What I'd like to emphasize is that this option was introduced not for logical consistency but for practical needs. It allows users to get faster code, and this is why it is an important option. The two approaches we are discussing are not equivalent. If `isnan` is unconditionally optimized out, users who need it have to use workarounds, which leads to a loss of portability and performance. If `isnan` is preserved, no workarounds are required; a simple redefinition restores the "old" behavior. It seems to me that the implementation of this option should serve practical needs and enable most use cases. The current implementation does not fit user needs, as the complaints in the GCC bug tracker and forums show. We could make clang more user-friendly if this option were implemented slightly differently than it is now.

On Mon, Sep 13, 2021 at 11:46 PM Chris Tetreault <ctet...@quicinc.com> wrote:
… is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:

You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not because of the user's promise. This is especially important for `isinf`: the addition of two finite values may produce infinity, and there is no universal way to predict it. It is probably not an issue for types like float or double, but ML cores use halves or even minifloats, where overflow is much more probable. If in the code:
```
float r = a + b;
if (isinf(r)) {...
```
`isinf` were optimized out just because -ffinite-math-only is in effect, the user could not check whether overflow occurred. This contrasts with the definition of `ninf` in LLVM IR:

"No Infs - Allow optimizations to assume the arguments and result are not +/-Inf."

It is possible to ensure that arguments are not Infs but for the result it is much more difficult to guarantee.

Thanks,
--Serge

Serge Pavlov via llvm-dev

Sep 14, 2021, 8:04:47 AM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 13, 2021 at 9:03 PM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:

If the compiler provides “isnan”, the user can’t redefine it.  Redefining/undefining any function or a macro provided by a compiler is UB.

 

Actually it does not matter. This is needed only to emulate the "old" behavior, which itself breaks the standard.
 

The “old” behavior can be tuned with #pragmas to restore the functionality of NaNs where needed.


Did you mean `#pragma GCC optimize("ffinite-math-only")`? Clang does not support it.
 

The “old” behavior doesn’t have a problem with “has_nan”---it returns “true”.  What other issues are there?


 If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".

On Mon, Sep 13, 2021 at 10:28 PM Arthur O'Dwyer <arthur....@gmail.com> wrote:

Btw, I don't think this thread has paid enough attention to Richard Smith's suggestion:

I can only subscribe to James Y Knight's opinion. Indeed, it can be a good criterion of which operations should work in finite-math-only mode and which can not work. The only thing which I worry about is the possibility of checking the operation result for infinity (and nan for symmetry). But the suggested criterion is formulated in terms of arguments, not results, so it must allow such checks.
 
Thanks,
--Serge

Krzysztof Parzyszek via llvm-dev

Sep 14, 2021, 9:21:41 AM
to Serge Pavlov, Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org

If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".

 

If your program has “x = *p”, it means that at this point p is never a null pointer.  Does this imply that the type of p can no longer represent a null pointer?

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 

Serge Pavlov via llvm-dev

Sep 14, 2021, 10:12:40 AM
to Aaron Ballman, llvm...@lists.llvm.org, cfe...@lists.llvm.org
If we cannot come to a solution, it could be a good chance to get things moving.
It is not clear to me whether this topic would interest the standardization people and whether they would discuss it.
But if there is such a possibility, I would be happy to use it.

Thanks,
--Serge

Serge Pavlov via llvm-dev

Sep 14, 2021, 10:22:33 AM
to Krzysztof Parzyszek, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Tue, Sep 14, 2021 at 8:21 PM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:

If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".

 

If your program has “x = *p”, it means that at this point p is never a null pointer.  Does this imply that the type of p can no longer represent a null pointer?


Good example! If you use integer division `r = a / b`, you promise that `b` is not zero. It does not, however, mean that a preceding check `b == 0` may be optimized to `false`.

The statement "there are no NaNs" means that properties of type `float` are modified so that NaN is no longer an allowed value of it. In this case it is allowed to optimize out `isnan`. If the guarantee is only that NaN cannot be an argument of an arithmetic operation, NaN is still a valid value of `float` and `isnan` cannot be replaced with `false`.
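The division analogy, written out as a hypothetical sketch. The first function is ordinary C: the check guards the division, so a compiler must keep it. The second is the floating-point analogue Serge argues for; whether `-ffinite-math-only` should preserve that `isnan` check is exactly what this thread disputes, and Krzysztof's reply takes the opposite view.

```c
#include <math.h>

/* The check precedes the division, so the promise "b != 0" made by
   a / b only holds on the path where the division executes. The
   compiler may not fold `b == 0` to false here. */
int safe_div(int a, int b, int *ok) {
    if (b == 0) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return a / b;
}

/* Floating-point analogue: x + 1.0 would promise "x is not NaN" only
   on the path where it executes, so on this reading the guarding
   isnan check would have to stay. */
double safe_increment(double x, int *ok) {
    if (isnan(x)) {
        *ok = 0;
        return 0.0;
    }
    *ok = 1;
    return x + 1.0;
}
```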

Krzysztof Parzyszek via llvm-dev

Sep 14, 2021, 11:01:48 AM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

Good example! If you use integer division `r = a / b`, you promise that `b` is not zero. It however does not mean  that preceding check `b == 0` may be optimized to `false`.

 

The statement "there are no NaNs" means that properties of type `float` are modified so that NaN is no longer an allowed value of it. In this case it is allowed to optimize out `isnan`. If the guarantee is only that NaN cannot be an argument of an arithmetic operation, NaN is still a valid value of `float` and `isnan` cannot be replaced with `false`.

 

Granted, the statement “there are no NaNs” is somewhat ambiguous, but taken to mean “NaNs will not happen at runtime” it would allow you to remove the NaN equivalent of “b == 0” without changing the meaning of “float”.  This is the interpretation I’m arguing for.

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 

From: Serge Pavlov <sepa...@gmail.com>
Sent: Tuesday, September 14, 2021 9:22 AM
To: Krzysztof Parzyszek <kpar...@quicinc.com>

Arthur O'Dwyer via llvm-dev

Sep 14, 2021, 11:15:36 AM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Tue, Sep 14, 2021 at 9:22 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
On Tue, Sep 14, 2021 at 8:21 PM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:

If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".

 

If your program has “x = *p”, it means that at this point p is never a null pointer.  Does this imply that the type of p can no longer represent a null pointer?


Good example! If you use integer division `r = a / b`, you promise that `b` is not zero. It however does not mean  that preceding check `b == 0` may be optimized to `false`.

In C and C++, it actually does mean that, although of the compilers I just tested on Godbolt, only MSVC seems to take advantage of that permission.

The question of whether it is acceptable to treat as equivalent the statements "p is known to be dereferenced in all successors of B" and "p is known to be non-null in B," was discussed extensively about 20 years ago, and then again 12 years ago when it bit someone in the Linux kernel:

On Mon, Sep 13, 2021 at 10:28 PM Arthur O'Dwyer <arthur....@gmail.com> wrote: 

Btw, I don't think this thread has paid enough attention to Richard Smith's suggestion:

I can only subscribe to James Y Knight's opinion. Indeed, it can be a good criterion of which operations should work in finite-math-only mode and which can not work. The only thing which I worry about is the possibility of checking the operation result for infinity (and nan for symmetry). But the suggested criterion is formulated in terms of arguments, not results, so it must allow such checks.


What is the opinion to which you subscribe?

Anyway, Richard's "quiet is signaling and signals are unspecified values" is really the only way out of the difficulty, as far as compiler people are concerned. You two (Serge and Krzysztof) can keep talking past each other at the application level, but the compiler people are going to have to do something in the code eventually, and that something is going to have to be expressed in terms similar to what Richard and I have been saying, because these are the terms that the compiler understands.

Thanks,
Arthur

Serge Pavlov via llvm-dev

Sep 14, 2021, 12:36:36 PM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Tue, Sep 14, 2021 at 10:15 PM Arthur O'Dwyer <arthur....@gmail.com> wrote:
On Tue, Sep 14, 2021 at 9:22 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
On Tue, Sep 14, 2021 at 8:21 PM Krzysztof Parzyszek <kpar...@quicinc.com> wrote:

If `has_nan` returns "true", it means that the explanation "there are no NaNs" does not work anymore and something more complex is needed to explain the effect of the option. In this case it is difficult to say that this approach is "intuitively clear".

 

If your program has “x = *p”, it means that at this point p is never a null pointer.  Does this imply that the type of p can no longer represent a null pointer?


Good example! If you use integer division `r = a / b`, you promise that `b` is not zero. It however does not mean  that preceding check `b == 0` may be optimized to `false`.

In C and C++, it actually does mean that, although of the compilers I just tested on Godbolt, only MSVC seems to take advantage of that permission.

But this is the *following* check, not preceding.  
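For contrast, a hypothetical function in which the check follows the division. Because `a / b` with `b == 0` is undefined behavior, a C or C++ compiler may treat the later test as always false and delete it; as Arthur notes, whether a particular compiler actually does so varies.

```c
/* The check follows the division. If b were 0, the division would
   already have been undefined behavior, so the compiler is permitted
   to treat `b == 0` as false here and delete the branch. */
int div_then_check(int a, int b) {
    int r = a / b;
    if (b == 0)
        return -1; /* may be removed as unreachable */
    return r;
}
```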


The question of whether it is acceptable to treat as equivalent the statements "p is known to be dereferenced in all successors of B" and "p is known to be non-null in B," was discussed extensively about 20 years ago, and then again 12 years ago when it bit someone in the Linux kernel:

On Mon, Sep 13, 2021 at 10:28 PM Arthur O'Dwyer <arthur....@gmail.com> wrote: 

Btw, I don't think this thread has paid enough attention to Richard Smith's suggestion:

I can only subscribe to James Y Knight's opinion. Indeed, it can be a good criterion of which operations should work in finite-math-only mode and which can not work. The only thing which I worry about is the possibility of checking the operation result for infinity (and nan for symmetry). But the suggested criterion is formulated in terms of arguments, not results, so it must allow such checks.


What is the opinion to which you subscribe?

I mean this post:

On Fri, Sep 10, 2021 at 9:29 PM James Y Knight via cfe-dev <cfe...@lists.llvm.org> wrote:
On Thu, Sep 9, 2021, 8:59 PM Richard Smith via llvm-dev <llvm...@lists.llvm.org> wrote:
Would it be reasonable to treat operations on Inf and NaN values as UB in this mode only if the same operation on a signaling NaN might signal? (Approximately, that'd mean we imagine these non-finite value encodings all encode sNaNs that are UB if they would signal.) That means the operations that ISO 60559 defines as non-computational or quiet-computational would be permitted to receive NaN and Inf as input and produce them as output, but that other computational operations would not.

Per ISO 60559, the quiet-computational operations that I think are relevant to us are: copy, negate, abs, copySign, and conversions between encoding (eg, bitcast). The non-computational operations that I think are relevant to us are classification functions (including isNaN).

I'm in favor. (Perhaps unsurprisingly, as this is precisely the proposal I made earlier, worded slightly differently. :)
 
Sorry for the unclear statement.


Anyway, Richard's "quiet is signaling and signals are unspecified values" is really the only way out of the difficulty, as far as compiler people are concerned. You two (Serge and Krzysztof) can keep talking past each other at the application level, but the compiler people are going to have to do something in the code eventually, and that something is going to have to be expressed in terms similar to what Richard and I have been saying, because these are the terms that the compiler understands.

 
Glad to hear it. If we decide to implement Richard's approach, I will put back the patches that implement `llvm.isnan` and continue implementing other similar intrinsics. Clang needs the missing `__builtin_*` intrinsics, which should be mapped to the new LLVM intrinsics. I am not sure whether the user's manual should be updated now or after the desired behavior is implemented.
 
Thanks,
Arthur

David Edelsohn via llvm-dev

Sep 14, 2021, 12:41:16 PM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org

Hopefully the clarified semantics can be coordinated between and
implemented in a consistent manner in both LLVM and GCC.

Thanks, David

Krzysztof Parzyszek via llvm-dev

Sep 14, 2021, 6:28:26 PM
to Arthur O'Dwyer, Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

Anyway, Richard's "quiet is signaling and signals are unspecified values" is really the only way out of the difficulty, as far as compiler people are concerned. You two (Serge and Krzysztof) can keep talking past each other at the application level, but the compiler people are going to have to do something in the code eventually, and that something is going to have to be expressed in terms similar to what Richard and I have been saying, because these are the terms that the compiler understands.

 

I don’t know why you’re saying “at the application level”.  My concerns are motivated by what the compiler is supposed to do.  I don’t think that the consequences of “arithmetic operations don’t produce NaNs” are fully understood, and are likely not completely intuitive either.  We may end up having discussions as to whether we should optimize x+0 to x or not, because “x+0” carries the information that it won’t result in a NaN, while “x” alone doesn’t.  This is one case that comes to mind and I’m concerned that there are many others that we aren’t aware of yet.

 

--

Krzysztof Parzyszek  kpar...@quicinc.com   AI tools development

 

From: Arthur O'Dwyer <arthur....@gmail.com>
Sent: Tuesday, September 14, 2021 10:15 AM
To: Serge Pavlov <sepa...@gmail.com>

Cc: Krzysztof Parzyszek <kpar...@quicinc.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?

 


On Tue, Sep 14, 2021 at 9:22 AM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:

Chris Tetreault via llvm-dev

Sep 15, 2021, 1:42:35 PM
to Krzysztof Parzyszek, Arthur O'Dwyer, Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

Fundamentally, the question Serge asked has nothing to do with the concerns of “compiler people”, and everything to do with the user facing behavior of the compiler. Any talk of how the behavior should be implemented is (in my opinion) off topic until we settle the question of “should the compiler guarantee, as a special case, that isnan(x) will not be optimized out”. This is a Yes-or-No question, and the explanation for the answer needs to be able to be concisely described in the docs without the use of compiler jargon that Joe GameDev and Tom MLScientist probably don’t understand. I have stated my opinion, and am reluctant to wade into this argument again, but I think it’s important that we understand the issue at hand.

 

There are two productive outcomes of this question that I can see:

 

  1. The answer is No. In this case, we just need a small doc fix.
  2. The answer is Yes. Only in this case do we actually need to modify the implementation.

 

Let’s not put the cart before the horse. The “compiler people” don’t necessarily have to do anything.

 

Thanks,

   Chris Tetreault

Arthur O'Dwyer via llvm-dev

Sep 15, 2021, 1:58:25 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Wed, Sep 15, 2021 at 12:42 PM Chris Tetreault <ctet...@quicinc.com> wrote:

Fundamentally, the question Serge asked has nothing to do with the concerns of “compiler people”, and everything to do with the user facing behavior of the compiler. Any talk of how the behavior should be implemented is (in my opinion) off topic until we settle the question of “should the compiler guarantee, as a special case, that isnan(x) will not be optimized out”. This is a Yes-or-No question [...]


That's a good illustration of what I meant by "application level."  The user who asks "Will my call to isnan() be 'optimized out'?" doesn't really have any sense of what's going on at the compiler level. The compiler person can say, "Yes, of course some calls to `isnan` can be eliminated" and show them all kinds of examples:

    #include <math.h>
    #include <stdio.h>

    void one(double d) { (void)isnan(d); puts("hello world"); }
    void two(double d) { d = 2.0; if (isnan(d)) puts("this branch is dead"); }
    void three(double d) { d *= 0.0; if (isnan(d)) puts("this branch is dead only in fast-math mode"); }

but they don't really care about these examples; they care about one specific use of isnan() somewhere deep inside their specific application. We can't really give a Yes or No answer about that one because we can't see it; all we can do is try to explain the rules by which the compiler decides whether a given transformation is OK or not, and then the application developer has to take those rules back home and study their own code to figure out whether the compiler will consider the transformation OK to apply to that code.

HTH,
Arthur

Chris Tetreault via llvm-dev

Sep 15, 2021, 3:01:41 PM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org

I agree. However, we *can* answer the question “is my call to isnan(x) guaranteed not to be optimized out?” Currently, the answer is “No”. Serge is asking if we can change the answer to be “Yes” which is the application level matter that we’re discussing. It seems that Serge has done the legwork to find out if their isnan call is being optimized out, and for them it is. At this time, it’s not really important what we would do to implement the Yes behavior, other than to establish the difficulty of implementation. We want to know if we should, not if we could.

 

As maintainers of the compiler, it should be our goal to provide a good user experience. As it stands, it sounds like fast-math is poorly understood by users. It is within our purview to improve this UX issue by clarifying the behavior or making the behavior match user expectations.

 

Thanks,

   Chris Tetreault

 

From: Arthur O'Dwyer <arthur....@gmail.com>

Sent: Wednesday, September 15, 2021 10:58 AM
To: Chris Tetreault <ctet...@quicinc.com>

Cc: Krzysztof Parzyszek <kpar...@quicinc.com>; Serge Pavlov <sepa...@gmail.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?

 


On Wed, Sep 15, 2021 at 12:42 PM Chris Tetreault <ctet...@quicinc.com> wrote:

Serge Pavlov via llvm-dev

Sep 16, 2021, 1:37:57 AM
to Chris Tetreault, Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org
Let me make some summary. I omit references for brevity, they are spread in the thread.

Treatment of `isnan` with `-ffinite-math-only` has issues:
- Many users have complained about, or disagreed with, this treatment in the GCC bug tracker and forums.
- There are legitimate use cases when `isnan` needs to be called in `-ffinite-math-only` mode.
- Users have to invent workarounds to get the functionality of `isnan`, which results in a loss of portability and performance.
- The behavior is inconsistent: libc always does a real check, while the compiler omits it.
Preserving `isnan` in the code would solve all of them.

What is the risk?

`-ffinite-math-only` is an optimization option, so preserving `isnan` cannot break the behavior of correct programs. The only possible negative impact is some loss of performance. It is unlikely that a real program spends so much time in `isnan` calls that it has a noticeable effect, but if it does, a user can conditionally redefine the `isnan` macro.

Preserving `isnan` in `-ffinite-math-only` mode is safe and makes the compiler more reliable and user-friendly.

Thanks,
--Serge

Aaron Ballman via llvm-dev

Sep 16, 2021, 8:06:45 AM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

FWIW, I personally come down on the side of *not* removing the call to
isnan() that the user explicitly wrote. It's not beyond belief that a
C API is called from another language that can generate a NaN, and a
user who has enabled finite math only may still wish to guard against
those cross-module cases passing in a NAN that they know their TU
can't properly handle.

~Aaron

>
> Thanks,
> --Serge
> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Serge Pavlov via llvm-dev

Sep 16, 2021, 9:48:08 AM
to Aaron Ballman, llvm...@lists.llvm.org, cfe...@lists.llvm.org
At least Fortran users have the same problem and have to use similar kludges: https://stackoverflow.com/questions/15944614/is-it-possible-to-make-isnan-work-in-gfortran-o3-ffast-math.

Rust has experimental fast-math intrinsics (https://doc.rust-lang.org/core/intrinsics/fn.fadd_fast.html). Their implementation must be affected by this optimization as well. Users report that using -fast-math gives a 20% improvement (https://github.com/rust-lang/rust/issues/21690#issuecomment-307244726) but wrong results due to "some strange things (e.g., NaN checks always return false)".

This is one more reason for not removing `isnan`, at least at the LLVM level.

Thanks,
--Serge

Cranmer, Joshua via llvm-dev

Sep 16, 2021, 11:03:04 AM
to Serge Pavlov, Chris Tetreault, llvm...@lists.llvm.org, Arthur O'Dwyer, cfe...@lists.llvm.org

I think you are not adequately summing up the risks of your approach; there are three other issues I see.

 

First, redefining `isnan` as a macro is expressly undefined behavior in C (see section 7.1.3, clauses 2 and 3—it’s undefined behavior to define a macro as a same name as reserved identifier in a standard library header). Conditionally redefining an `isnan` macro is therefore not a permissible solution.

 

The second thing that has been repeatedly brought up that is missing is the fact that `isnan` may still be inconsistently optimized out. `isnan(x)` would only be retained in the program if the compiler cannot deduce that `x` is the result of an nnan arithmetic operation. If it can deduce that (the simplest case being the somewhat questionable `isnan(x + 0)` example, but it’s also possible that, e.g., you’re calling `isnan(sum)` on the result of a summation, which would be the result of an arithmetic expression post-mem2reg/SROA), then the compiler would still elide it. It could be that this is less surprising to users than unconditionally optimizing away `isnan(x)`, but it should still be admitted that there is a potential for surprise here.

 

A final point is that the potential optimization benefits of eliding `isnan` are not limited to the cost of running the function itself (which are likely to be negligible), but also include the benefits of deleting any subsequent code that is attempting to handle NaN values, which may be fairly large blocks. A salient example is complex multiplication and division, where the actual expansion of  the multiplication and division code itself is dwarfed by the recalculation code if the result turns out to be a NaN.
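A sketch of the kind of code described here, modeled on the example complex multiplication function in Annex G of C11, with a small struct standing in for `double complex`; the names `cpair` and `cmul` are hypothetical. The straight-line product is four multiplications and two additions. Everything inside the `isnan(x) && isnan(y)` branch is recovery code that an optimizer could delete if it may assume the results are never NaN.

```c
#include <math.h>

typedef struct { double re, im; } cpair; /* stand-in for double complex */

/* (a + bi) * (c + di), following the structure of the Annex G example. */
cpair cmul(cpair z, cpair w) {
    double a = z.re, b = z.im, c = w.re, d = w.im;
    double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
    double x = ac - bd, y = ad + bc;

    /* Recovery path: reached only when both parts came out NaN, e.g. an
       infinite factor times a finite one. If isnan may be assumed false,
       this entire block is dead code. */
    if (isnan(x) && isnan(y)) {
        int recalc = 0;
        if (isinf(a) || isinf(b)) {          /* z is infinite: box it */
            a = copysign(isinf(a) ? 1.0 : 0.0, a);
            b = copysign(isinf(b) ? 1.0 : 0.0, b);
            if (isnan(c)) c = copysign(0.0, c);
            if (isnan(d)) d = copysign(0.0, d);
            recalc = 1;
        }
        if (isinf(c) || isinf(d)) {          /* w is infinite: box it */
            c = copysign(isinf(c) ? 1.0 : 0.0, c);
            d = copysign(isinf(d) ? 1.0 : 0.0, d);
            if (isnan(a)) a = copysign(0.0, a);
            if (isnan(b)) b = copysign(0.0, b);
            recalc = 1;
        }
        if (!recalc && (isinf(ac) || isinf(bd) || isinf(ad) || isinf(bc))) {
            /* infinities from overflow: change NaN inputs to zero */
            if (isnan(a)) a = copysign(0.0, a);
            if (isnan(b)) b = copysign(0.0, b);
            if (isnan(c)) c = copysign(0.0, c);
            if (isnan(d)) d = copysign(0.0, d);
            recalc = 1;
        }
        if (recalc) {
            x = INFINITY * (a * c - b * d);
            y = INFINITY * (a * d + b * c);
        }
    }
    cpair r = { x, y };
    return r;
}
```

For example, `(Inf + Inf i) * (1 + 0i)` computes `NaN + NaN i` on the fast path, and only the recovery branch turns it back into an infinite result.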

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Serge Pavlov via llvm-dev
Sent: Thursday, September 16, 2021 1:37
To: Chris Tetreault <ctet...@quicinc.com>

Serge Pavlov via llvm-dev

Sep 16, 2021, 1:31:37 PM
to Cranmer, Joshua, llvm...@lists.llvm.org, cfe...@lists.llvm.org, Arthur O'Dwyer
On Thu, Sep 16, 2021 at 10:02 PM Cranmer, Joshua <joshua....@intel.com> wrote:

I think you are not adequately summing up the risks of your approach; there are three other issues I see.

 

First, redefining `isnan` as a macro is expressly undefined behavior in C (see section 7.1.3, clauses 2 and 3—it’s undefined behavior to define a macro as a same name as reserved identifier in a standard library header). Conditionally redefining an `isnan` macro is therefore not a permissible solution.


Defining things like `isnan` is the job of libc, which often is not a part of the compiler. It does not cause undefined behavior by itself. Redefining macro defined in system headers may be harmful if the new macro is inconsistent with other libc implementation (`errno` comes to mind). So this looks like a kind of legal disclaimer. As with `-ffinite-math-only`, the redefinition is used at your own risk. For our purpose, removing a call to a pure function, the risk is negligible. And it is needed only to reproduce the old behavior.
 

 

The second thing that has been repeatedly brought up that is missing is the fact that `isnan` may still be inconsistently optimized out. `isnan(x)` would only be retained in the program if the compiler cannot deduce that `x` is the result of an nnan arithmetic operation. If it can deduce that (the simplest case being the somewhat questionable `isnan(x + 0)` example, but it’s also possible that, e.g., you’re calling `isnan(sum)` on the result of a summation, which would be the result of an arithmetic expression post-mem2reg/SROA), then the compiler would still elide it. It could be that this is less surprising to users than unconditionally optimizing away `isnan(x)`, but it should still be admitted that there is a potential for surprise here.


Regarding your example, this thread already contains consideration of this case:

On Tue, Sep 14, 2021 at 12:50 AM Serge Pavlov <sepa...@gmail.com> wrote:
On Mon, Sep 13, 2021 at 11:46 PM Chris Tetreault <ctet...@quicinc.com> wrote:
… is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
You are right, this was a bad idea. Compiler may optimize out `isnan` but only when it deduces that the value cannot be NaN, but not due to the user's promise. It is especially important for `isinf`. Addition of two finite values may produce infinity and there is no universal way to predict it. It is probably not an issue for types like float or double, but ML cores use halfs or even minifloats, where overflow is much more probable. If in the code:
```
float r = a + b;
if (isinf(r)) {...
```
`isinf` were optimized out just because -ffinite-math-only is in effect, the user cannot check if overflow did not occur.
 
The rules proposed by Richard are also formulated in terms of arguments, not results, so there is no longer any intention to optimize such a case.

 

A final point is that the potential optimization benefits of eliding `isnan` are not limited to the cost of running the function itself (which are likely to be negligible), but also include the benefits of deleting any subsequent code that is attempting to handle NaN values, which may be fairly large blocks. A salient example is complex multiplication and division, where the actual expansion of  the multiplication and division code itself is dwarfed by the recalculation code if the result turns out to be a NaN.


If -ffinite-math-only is used correctly, the code intended for handling NaNs will never be executed, so the only cost is the check itself and the associated branch. For code that does intensive computation, that cost should be negligible. In any case, the user can redefine `isnan`; that is as safe as `-ffinite-math-only` itself.

Aaron Ballman via llvm-dev

Sep 16, 2021, 1:56:53 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Thu, Sep 16, 2021 at 1:31 PM Serge Pavlov via cfe-dev
<cfe...@lists.llvm.org> wrote:
>
> On Thu, Sep 16, 2021 at 10:02 PM Cranmer, Joshua <joshua....@intel.com> wrote:
>>
>> I think you are not adequately summing up the risks of your approach; there are three other issues I see.
>>
>>
>>
>> First, redefining `isnan` as a macro is expressly undefined behavior in C (see section 7.1.3, clauses 2 and 3: it’s undefined behavior to define a macro with the same name as a reserved identifier in a standard library header). Conditionally redefining an `isnan` macro is therefore not a permissible solution.
>
>
> Defining things like `isnan` is the job of libc, which often is not a part of the compiler.

The standard makes no real distinction between what's the job of the
compiler and what's the job of the library; it's all "the
implementation" as far as the standard is concerned. FWIW, there are
plenty of libc things which are produced by the compiler (see
https://github.com/llvm/llvm-project/tree/main/clang/lib/Headers for
all the standard library interfaces provided by Clang).

> It does not cause undefined behavior by itself.

A user-defined macro named `isnan` is UB per 7.1.3p1 if the user
includes <math.h> in the TU.

> Redefining a macro defined in system headers may be harmful if the new macro is inconsistent with the rest of the libc implementation (`errno` comes to mind). So this looks like a kind of legal disclaimer.

I would hope so; the standard says explicitly that redefining that
macro is UB. :-)

~Aaron

> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

Arthur O'Dwyer via llvm-dev

Sep 16, 2021, 2:18:58 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Thu, Sep 16, 2021 at 1:31 PM Serge Pavlov <sepa...@gmail.com> wrote:
On Tue, Sep 14, 2021 at 12:50 AM Serge Pavlov <sepa...@gmail.com> wrote:
On Mon, Sep 13, 2021 at 11:46 PM Chris Tetreault <ctet...@quicinc.com> wrote:
… is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not because of the user's promise. This is especially important for `isinf`: the addition of two finite values may produce infinity, and there is no universal way to predict it. It is probably not an issue for types like float or double, but ML cores use halves or even minifloats, where overflow is much more probable. If in the code:
```
float r = a + b;
if (isinf(r)) {...
```
`isinf` were optimized out just because -ffinite-math-only is in effect, the user could not check whether overflow occurred.
 
Rules proposed by Richard are also formulated using arguments, not results. Now there is no intention to optimize such a case.

Infinity (HUGE_VAL) is already not NaN, so this example doesn't have anything to do with the NaN cases being discussed.
However, let's rephrase as a NaN situation:

    bool f1(float a, float b) {
        float r = a + b;
        return isnan(r);
    }
    bool result = f1(-HUGE_VAL, HUGE_VAL);  // expect "true"

Here, `a + b` can produce quiet-NaN (if `a` is -HUGE_VAL and `b` is +HUGE_VAL).
By Richard Smith's -ffast-math proposal as I understand it, this quiet-NaN result would be treated "as if" it were a signaling NaN.
Under IEEE 754, no operation ever produces a signaling NaN, so unfortunately IEEE 754 can't guide us here; but intuitively, I think we'd all say that merely producing a signaling NaN would not itself cause a signal. So we store the quiet-NaN result in `r`.
Then we ask whether `isnan(r)`. The quiet-NaN result in `r` is used. By Richard Smith's -ffast-math proposal as I understand it, any operation would produce an unspecified result if it would raise a signal; but in fact `isnan(r)` is a non-signaling operation, so even though we're treating quiet-NaN as signaling-NaN, isnan(r) never raises any signal. So this code has well-defined behavior in -ffast-math mode.
(And because the code's behavior is well-defined, therefore `isnan(r)` has its usual meaning. When `r` holds a quiet-NaN, as in this case, `isnan(r)` will correctly return `true`.)

I've googled, but failed to discover, whether comparison against a signaling NaN is expected to signal. That is,

    bool f2(float a, float b) {
        float r = a + b;
        return (r != r);
    }
    bool result = f2(-HUGE_VAL, HUGE_VAL);  // expect "true" in IEEE754 mode, but perhaps "false" in -ffast-math mode

I'm really hoping that comparison is a signaling operation. If it is, then according to Richard Smith's proposal as I understand it, the compiler would be free to optimize `(r != r)` into `(false)` in -ffast-math mode.  (And, as a corollary, the compiler would not generally be free to transform `isnan(r)` into `(r != r)`, because the latter expression has more preconditions than the former.)
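For reference, the two functions above can be written as plain C to show the baseline IEEE 754 behavior in default mode (no fast-math flags). This only confirms the "expect true" comments for the default mode; it says nothing about what any compiler does under -ffast-math. `HUGE_VALF` is used instead of `HUGE_VAL` so that the arguments are already `float`.

```c
#include <math.h>
#include <stdbool.h>

/* Same logic as f1 above: -inf + inf is an invalid operation,
   so r holds a quiet NaN and the non-signaling isnan() reports it. */
static bool f1(float a, float b) {
    float r = a + b;
    return isnan(r);
}

/* Same logic as f2 above: under IEEE 754 semantics a NaN compares
   unequal to itself, so r != r is true exactly when r is NaN. */
static bool f2(float a, float b) {
    float r = a + b;
    return r != r;
}
```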

–Arthur

Serge Pavlov via llvm-dev

Sep 16, 2021, 3:04:15 PM
to Aaron Ballman, llvm...@lists.llvm.org, cfe...@lists.llvm.org
Yes, formally this is UB. But removing `isnan` is also a standard violation, because `isnan` has definite semantics that are not preserved in this case. We are already outside standard behavior with the `isnan` removal, so things cannot become worse.

Anyway, this is only a workaround for emulating the previous behavior, for the (hypothetical) case in which performance drops noticeably because the checks are executed. It is not clear whether such cases really exist.

Thanks,
--Serge

Serge Pavlov via llvm-dev

Sep 16, 2021, 3:15:49 PM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org
Good example, thank you!

I'm really hoping that comparison is a signaling operation.

Comparison is a signaling-computational operation as per IEEE-754 (5.6.1). All of them signal on signaling NaN.

Thanks,
--Serge

Dimitry Andric via llvm-dev

Sep 16, 2021, 3:27:58 PM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On 16 Sep 2021, at 20:18, Arthur O'Dwyer via cfe-dev <cfe...@lists.llvm.org> wrote:

On Thu, Sep 16, 2021 at 1:31 PM Serge Pavlov <sepa...@gmail.com> wrote:
On Tue, Sep 14, 2021 at 12:50 AM Serge Pavlov <sepa...@gmail.com> wrote:
On Mon, Sep 13, 2021 at 11:46 PM Chris Tetreault <ctet...@quicinc.com> wrote:
… is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not because of the user's promise. This is especially important for `isinf`: the addition of two finite values may produce infinity, and there is no universal way to predict it. It is probably not an issue for types like float or double, but ML cores use halves or even minifloats, where overflow is much more probable. If in the code:
```
float r = a + b;
if (isinf(r)) {...
```
`isinf` were optimized out just because -ffinite-math-only is in effect, the user could not check whether overflow occurred.
 
Rules proposed by Richard are also formulated using arguments, not results. Now there is no intention to optimize such a case.

Infinity (HUGE_VAL) is already not NaN, so this example doesn't have anything to do with the NaN cases being discussed.

Well, there's https://bugs.llvm.org/show_bug.cgi?id=51775, "-ffast-math breaks strtod() due to "== HUGE_VAL" considered impossible"... :-)
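The linked report concerns a comparison against `HUGE_VAL` of the kind sketched below. `strtod` reports overflow by returning ±`HUGE_VAL` and setting `errno` to `ERANGE`; on IEEE 754 systems `HUGE_VAL` is an infinity, so a compiler that assumes no infinities may treat the comparison as impossible. `parse_overflows` is a hypothetical helper, and the behavior shown is that of a build without fast-math flags.

```c
#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses `text` into *out and reports whether the value overflowed. */
static bool parse_overflows(const char *text, double *out) {
    errno = 0;
    *out = strtod(text, NULL);
    return *out == HUGE_VAL || *out == -HUGE_VAL;  /* the check at issue */
}
```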

-Dimitry



Michael Kruse via llvm-dev

Sep 16, 2021, 3:29:39 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 13, 2021 at 11:46 AM Chris Tetreault via
cfe-dev <cfe...@lists.llvm.org> wrote:

> As a user, if I read that:
>
> ```
> if (isnan(x)) {
> ```
>
> … is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
>
> ```
> if (isnan(x + 0)) {
> ```
>
> … does not also work. I’m going to open a bug and complain, and the slide down the slippery slope will continue. You and I understand the difference, and the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but Joe Coder is just trying to figure out why he’s got NaN in his matrices despite his careful NaN-handling code. Joe is not a compiler expert, and on the face of it, it seems like a silly limitation. This will never end until fast-math is gutted.

C/C++ already has cases like this. Pointer arithmetic on null pointers
is undefined behaviour, even if adding[1,2]/subtracting[3] zero. I
don't think it is too far-fetched to expect users to know that an
operation is undefined behaviour even if one of the operands is zero.

Michael

[1] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
[2] https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
[3] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c



Chris Tetreault via llvm-dev

Sep 16, 2021, 4:11:36 PM
to Michael Kruse, llvm...@lists.llvm.org, cfe...@lists.llvm.org
The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.

To me, I think Mehdi had the best solution: the algorithm that is the bottleneck, and that experiences the huge speedup from fast-math, should be separated into its own source file. This source file, and only this source file, should be compiled with fast-math. The outer driver loop should not be compiled with fast-math. This solution is clean, (probably) easy, and doesn't require a change in the compiler. Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and msvc with /fp:fast), and between clang and old versions of clang. The behavior of fast-math with respect to NaN is consistent across the mainstream C/C++ compilers: no promises are made, and users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason not to make this change.
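The split suggested above can be sketched as two translation units, shown here in one listing for brevity. The file and function names are hypothetical; in a real build the per-file flags (-ffast-math for the kernel only) would be set in the build system.

```c
#include <math.h>
#include <stddef.h>

/* --- kernel.c (hypothetical): the only file built with -ffast-math --- */
/* Assumes every input is finite and contains no NaN checks. */
float kernel_sum_of_squares(const float *v, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += v[i] * v[i];
    return acc;
}

/* --- driver.c (hypothetical): built WITHOUT -ffast-math --- */
/* Here isnan() is an ordinary IEEE check, so NaN sentinels are compacted
   away (in place) before the kernel ever sees the data. */
float driver_sum_of_squares(float *v, size_t n) {
    size_t m = 0;
    for (size_t i = 0; i < n; ++i)
        if (!isnan(v[i]))
            v[m++] = v[i];
    return kernel_sum_of_squares(v, m);
}
```

The trade-off, as the reply below points out, is that this needs separate translation units and a linker, which some targets may not have.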

If the behavior is confusing to users, that's because it's poorly explained. Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer so it could certainly be improved. This is the only thing that needs to be fixed in my opinion.

Thanks,
Chris Tetreault

-----Original Message-----
From: Michael Kruse <cfe...@meinersbur.de>
Sent: Thursday, September 16, 2021 12:29 PM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Serge Pavlov <sepa...@gmail.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?

WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.

Serge Pavlov via llvm-dev

Sep 16, 2021, 11:23:41 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org, Michael Kruse
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctet...@quicinc.com> wrote:
The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that  most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.

It is common to attribute problems with -ffinite-math-only to user ignorance. However, user misunderstandings and complaints may indicate a flaw in the compiler implementation, which I believe is the case here.

Using NaNs as sentinels is a natural choice when you cannot spend extra memory on a flag for each item or extra cycles reading that flag, and do not want to pollute the cache. It does not depend on reading the documentation or on writing the code from scratch; it is simply the best way to store such data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case, and the fact that the compiler does not allow it is a compiler drawback.
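A minimal sketch of the two storage choices described above; `reading_flagged` and `mean_present` are hypothetical names. The flagged layout pays at least one extra byte per item (usually more, because of padding), while the sentinel layout is a plain `float` array that relies on `isnan` being a real check. If a compiler folded that check to false, the loop would no longer skip the sentinels.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag-based layout: each item carries an explicit validity flag. */
struct reading_flagged {
    float value;
    bool valid;
};

/* Sentinel layout: NaN means "no reading". Returns the mean of the
   present values, or NaN if there are none. */
static float mean_present(const float *v, size_t n) {
    float sum = 0.0f;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(v[i]))   /* the check this thread is about */
            continue;
        sum += v[i];
        ++count;
    }
    return count ? sum / (float)count : NAN;
}
```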


To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math should be separated into its own source file. This source file, and only this source file should be compiled with fast-math. The outer driver loop should not be compiled with fast math. This solution is clean, (probably) easy, and doesn't require a change in the compiler.

It is a workaround; it works in some cases but not in others. An ML kernel is often a single translation unit, and there may be no such thing as a linker for that processor. At the same time, it is computation-intensive, and using fast-math in it may be very profitable.
 
Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and msvc with /fp:fast), and clang and old versions of clang.

ICC and MSVC do not remove `isnan` in fast-math mode. If `isnan` is implemented in libc, it is also a real check. It is actually removing `isnan` that creates inconsistency.
 
The behavior of fast-math with respect to NaN is consistent across the mainstream C/C++ compilers: no promises are made, users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason to not make this change.

Removing `isnan` is only an optimization; it is not intended to change semantics, so no longer removing it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent across compilers and libc implementations. The only possible issue is a performance loss; as discussed above, that is unlikely. In any case, if such a loss exists and is absolutely intolerable for a user, a hack that redefines `isnan` restores the previous code generation.


If the behavior is confusing to users, that's because it's poorly explained.
 
What is confusing? That the explicitly written call to a function is not removed? According to user feedback it is the silent removal of `isnan` that confuses users.

Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer so it could certainly be improved. This is the only thing that needs to be fixed in my opinion.

The documentation says about -ffinite-math-only:

 "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."

Is it clear whether `isnan` is arithmetic or not?

Mehdi AMINI via llvm-dev

Sep 16, 2021, 11:53:57 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctet...@quicinc.com> wrote:
The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that  most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.

It is common to attribute problems with -ffinite-math-only to user ignorance. However, user misunderstandings and complaints may indicate a flaw in the compiler implementation, which I believe is the case here.

Using NaNs as sentinels is a natural choice when you cannot spend extra memory on a flag for each item or extra cycles reading that flag, and do not want to pollute the cache. It does not depend on reading the documentation or on writing the code from scratch; it is simply the best way to store such data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case, and the fact that the compiler does not allow it is a compiler drawback.


To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math should be separated into its own source file. This source file, and only this source file should be compiled with fast-math. The outer driver loop should not be compiled with fast math. This solution is clean, (probably) easy, and doesn't require a change in the compiler.

It is a workaround; it works in some cases but not in others. An ML kernel is often a single translation unit, and there may be no such thing as a linker for that processor. At the same time, it is computation-intensive, and using fast-math in it may be very profitable.

Switching mode in a single TU seems valuable, but could this be handled with pragmas or function attributes instead?
 
 
Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and msvc with /fp:fast), and clang and old versions of clang.

ICC and MSVC do not remove `isnan` in fast-math mode. If `isnan` is implemented in libc, it is also a real check. It is actually removing `isnan` that creates inconsistency.
 
The behavior of fast-math with respect to NaN is consistent across the mainstream C/C++ compilers: no promises are made, users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason to not make this change.

Removing `isnan` is only an optimization; it is not intended to change semantics, so no longer removing it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent across compilers and libc implementations. The only possible issue is a performance loss; as discussed above, that is unlikely. In any case, if such a loss exists and is absolutely intolerable for a user, a hack that redefines `isnan` restores the previous code generation.


If the behavior is confusing to users, that's because it's poorly explained.
 
What is confusing? That the explicitly written call to a function is not removed? According to user feedback it is the silent removal of `isnan` that confuses users.

Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer so it could certainly be improved. This is the only thing that needs to be fixed in my opinion.

The documentation says about -ffinite-math-only:

 "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."

Is it clear whether `isnan` is arithmetic or not?

If the result of floating-point arithmetic is fed directly to `isnan()`, are we allowed to eliminate the computation and fold the check away? (It seems so, according to the sentence you're quoting.)
Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`?

-- 
Mehdi


 


Thanks,
   Chris Tetreault

-----Original Message-----
From: Michael Kruse <cfe...@meinersbur.de>
Sent: Thursday, September 16, 2021 12:29 PM
To: Chris Tetreault <ctet...@quicinc.com>
Cc: Serge Pavlov <sepa...@gmail.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?


On Mon, Sep 13, 2021 at 11:46 AM Chris Tetreault via cfe-dev <cfe...@lists.llvm.org> wrote:
> As a user, if I read that:
>
> ```
> if (isnan(x)) {
> ```
>
> … is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
>
> ```
> if (isnan(x + 0)) {
> ```
>
> … does not also work. I’m going to open a bug and complain, and the slide down the slippery slope will continue. You and I understand the difference, and the technical reason why `isnan(x)` is supported but `isnan(x + 0)` isn’t, but Joe Coder is just trying to figure out why he’s got NaN in his matrices despite his careful NaN-handling code. Joe is not a compiler expert, and on the face of it, it seems like a silly limitation. This will never end until fast-math is gutted.

C/C++ already has cases like this. Pointer arithmetic on null pointers is undefined behaviour, even if adding[1,2]/subtracting[3] zero. I don't think it is too far-fetched to expect users to know that an operation is undefined behaviour even if one of the operands is zero.

Michael

[1] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-addition.c
[2] https://github.com/llvm/llvm-project/blob/main/compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp
[3] https://github.com/llvm/llvm-project/blob/main/clang/test/Sema/pointer-subtraction.c



Serge Pavlov via llvm-dev

Sep 17, 2021, 2:19:09 AM
to Mehdi AMINI, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joke...@gmail.com> wrote:


On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctet...@quicinc.com> wrote:
The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that  most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.

It is common to attribute problems with -ffinite-math-only to user ignorance. However, user misunderstandings and complaints may indicate a flaw in the compiler implementation, which I believe is the case here.

Using NaNs as sentinels is a natural choice when you cannot spend extra memory on a flag for each item or extra cycles reading that flag, and do not want to pollute the cache. It does not depend on reading the documentation or on writing the code from scratch; it is simply the best way to store such data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case, and the fact that the compiler does not allow it is a compiler drawback.


To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math should be separated into its own source file. This source file, and only this source file should be compiled with fast-math. The outer driver loop should not be compiled with fast math. This solution is clean, (probably) easy, and doesn't require a change in the compiler.

It is a workaround; it works in some cases but not in others. An ML kernel is often a single translation unit, and there may be no such thing as a linker for that processor. At the same time, it is computation-intensive, and using fast-math in it may be very profitable.

Switching mode in a single TU seems valuable, but could this be handled with pragmas or function attributes instead?

GCC allows it by using `#pragma GCC optimize()`, but clang does not support it. No suitable function attribute exists for that.
 
 
 
Changing the compiler is hard, affects everybody who uses the compiler, and creates inconsistency in behavior between clang and gcc (and msvc with /fp:fast), and clang and old versions of clang.

ICC and MSVC do not remove `isnan` in fast-math mode. If `isnan` is implemented in libc, it is also a real check. It is actually removing `isnan` that creates inconsistency.
 
The behavior of fast-math with respect to NaN is consistent across the mainstream C/C++ compilers: no promises are made, users should not assume that they can use it for anything. Changing it now would create a major portability issue for user codebases, which in and of itself is a very strong reason to not make this change.

Removing `isnan` is only an optimization; it is not intended to change semantics, so no longer removing it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent across compilers and libc implementations. The only possible issue is a performance loss; as discussed above, that is unlikely. In any case, if such a loss exists and is absolutely intolerable for a user, a hack that redefines `isnan` restores the previous code generation.


If the behavior is confusing to users, that's because it's poorly explained.
 
What is confusing? That the explicitly written call to a function is not removed? According to user feedback it is the silent removal of `isnan` that confuses users.

Honestly, I think the docs are pretty clear, but "It's clear, you just need to learn to read" is never an acceptable answer so it could certainly be improved. This is the only thing that needs to be fixed in my opinion.

The documentation says about -ffinite-math-only:

 "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."

Is it clear whether `isnan` is arithmetic or not?

If the result of floating-point arithmetic is fed directly to `isnan()`, are we allowed to eliminate the computation and fold the check away? (It seems so, according to the sentence you're quoting.)
Are we back to `isnan(x+0.0)` can be folded but not `isnan(x)`?

Initially that was the intention, but during the discussion it became clear that it is not profitable. For `isinf` there is a realistic use case in which removing the call `isinf(a+b)` is a problem, and Arthur O'Dwyer demonstrated a similar example for `isnan`. It is harder to provide guarantees about results than about arguments.

Richard Smith via llvm-dev

Sep 17, 2021, 4:19:56 AM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Thu, 16 Sept 2021 at 11:18, Arthur O'Dwyer via cfe-dev <cfe...@lists.llvm.org> wrote:
On Thu, Sep 16, 2021 at 1:31 PM Serge Pavlov <sepa...@gmail.com> wrote:
On Tue, Sep 14, 2021 at 12:50 AM Serge Pavlov <sepa...@gmail.com> wrote:
On Mon, Sep 13, 2021 at 11:46 PM Chris Tetreault <ctet...@quicinc.com> wrote:
… is guaranteed to work, and I read that fast-math enables the compiler to reason about constructs like `x + 0` being equal to `x`, then I’m going to be very confused when:
You are right, this was a bad idea. The compiler may optimize out `isnan`, but only when it deduces that the value cannot be NaN, not because of the user's promise. This is especially important for `isinf`: the addition of two finite values may produce infinity, and there is no universal way to predict it. It is probably not an issue for types like float or double, but ML cores use halves or even minifloats, where overflow is much more probable. If in the code:
```
float r = a + b;
if (isinf(r)) {...
```
`isinf` were optimized out just because -ffinite-math-only is in effect, the user could not check whether overflow occurred.
 
Rules proposed by Richard are also formulated using arguments, not results. Now there is no intention to optimize such a case.

Infinity (HUGE_VAL) is already not NaN, so this example doesn't have anything to do with the NaN cases being discussed.
However, let's rephrase as a NaN situation:

    bool f1(float a, float b) {
        float r = a + b;
        return isnan(r);
    }
    bool result = f1(-HUGE_VAL, HUGE_VAL);  // expect "true"

Here, `a + b` can produce quiet-NaN (if `a` is -HUGE_VAL and `b` is +HUGE_VAL).
By Richard Smith's -ffast-math proposal as I understand it, this quiet-NaN result would be treated "as if" it were a signaling NaN.
Under IEEE 754, no operation ever produces a signaling NaN, so unfortunately IEEE 754 can't guide us here; but intuitively, I think we'd all say that merely producing a signaling NaN would not itself cause a signal. So we store the quiet-NaN result in `r`.
Then we ask whether `isnan(r)`. The quiet-NaN result in `r` is used. By Richard Smith's -ffast-math proposal as I understand it, any operation would produce an unspecified result if it would raise a signal; but in fact `isnan(r)` is a non-signaling operation, so even though we're treating quiet-NaN as signaling-NaN, isnan(r) never raises any signal. So this code has well-defined behavior in -ffast-math mode.
(And because the code's behavior is well-defined, therefore `isnan(r)` has its usual meaning. When `r` holds a quiet-NaN, as in this case, `isnan(r)` will correctly return `true`.)

For what it's worth, this is indeed exactly what I meant. Thanks for clarifying!
 
I've googled, but failed to discover, whether comparison against a signaling NaN is expected to signal. That is,

    bool f2(float a, float b) {
        float r = a + b;
        return (r != r);
    }
    bool result = f2(-HUGE_VAL, HUGE_VAL);  // expect "true" in IEEE754 mode, but perhaps "false" in -ffast-math mode

I'm really hoping that comparison is a signaling operation. If it is, then according to Richard Smith's proposal as I understand it, the compiler would be free to optimize `(r != r)` into `(false)` in -ffast-math mode.  (And, as a corollary, the compiler would not generally be free to transform `isnan(r)` into `(r != r)`, because the latter expression has more preconditions than the former.)

–Arthur

Mehdi AMINI via llvm-dev

Sep 17, 2021, 12:17:39 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepa...@gmail.com> wrote:
On Fri, Sep 17, 2021 at 10:53 AM Mehdi AMINI <joke...@gmail.com> wrote:


On Thu, Sep 16, 2021 at 8:23 PM Serge Pavlov via cfe-dev <cfe...@lists.llvm.org> wrote:
On Fri, Sep 17, 2021 at 3:11 AM Chris Tetreault <ctet...@quicinc.com> wrote:
The difference there is that doing pointer arithmetic on null pointers doesn't *usually* work, unless you turn on -ffast-pointers.

It seems to me that most confusion related to -ffast-math is likely caused by people who are transitioning to using it. I have some codebase, and I turn on fast math, and then a few months down the road I notice a strangeness that I did not catch during the initial transition period. If you're writing new code with fast-math, you don't do things like try to use NaN as a sentinel value in a TU with fast math turned on. This is the sort of thing you catch when you try to transition an existing codebase. Forgive me for the uncharitable interpretation, but it's much easier to ask the compiler to change to accommodate your use case than it is to refactor your code.

It is common to attribute problems with -ffinite-math-only to user ignorance. However, user misunderstandings and complaints may indicate a flaw in the compiler implementation, which I believe is the case here.

Using NaNs as sentinels is natural when you cannot spend extra memory on a per-item flag, cannot spend extra cycles reading that flag, and do not want to pollute the cache. It does not depend on reading documentation or writing the code from scratch; it is simply the best way to store the data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case, and the fact that the compiler does not allow it is a compiler drawback.
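A minimal sketch of the sentinel pattern described here, assuming IEEE semantics: missing samples are stored as `NAN` in the data array itself, so no separate flag array has to be allocated or read. `mean_present` is a hypothetical name. This is the kind of loop at risk under `-ffinite-math-only`: if the compiler folds the `isnan` test to false, the NaN entries flow into the sum.

```c
#include <math.h>
#include <stddef.h>

/* Mean of the samples that are present; NaN marks a missing sample.
   Returns NaN if no sample is present. */
double mean_present(const double *v, size_t n) {
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (isnan(v[i]))
            continue;              /* skip the sentinel */
        sum += v[i];
        ++count;
    }
    return count ? sum / (double)count : NAN;
}
```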


To me, I think Mehdi had the best solution: The algorithm that is the bottleneck, and experiences the huge speedup using fast-math should be separated into its own source file. This source file, and only this source file should be compiled with fast-math. The outer driver loop should not be compiled with fast math. This solution is clean, (probably) easy, and doesn't require a change in the compiler.

It is a workaround: it works in some cases but not in others. An ML kernel is often a single translation unit, and there may be no linker at all for that processor. At the same time, the kernel is computation-intensive, so using fast-math in it may be very profitable.

Switching mode in a single TU seems valuable, but could this be handled with pragmas or function attributes instead?

GCC allows it by using `#pragma GCC optimize()`, but clang does not support it. No suitable function attribute exists for that.

Right, I know that clang does not support it, but it could :)
So since we're looking at what provides the best user-experience: isn't that it? Shouldn't we look into providing this level of granularity? (whether function-level or finer grain)

Chris Tetreault via llvm-dev

Sep 17, 2021, 1:06:11 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org, Michael Kruse

> It is common to attribute problems with -ffinite-math-only to user ignorance. However, user misunderstandings and complaints may indicate a flaw in the compiler implementation, which I believe is the case here.

 

The *intent* of -ffast-math is to do absolutely everything possible to get the fastest floating point math possible. In other words, to pretend that floating point values are actually real numbers. In real math, there is no negative 0, or NaN, and infinity is not something you do algebra with. If you try to divide by 0, you just get points off your midterm; there’s no “behavior” that needs to be defined.

 

> Using NaNs as sentinels is natural when you cannot spend extra memory on a per-item flag, cannot spend extra cycles reading that flag, and do not want to pollute the cache. It does not depend on reading documentation or writing the code from scratch; it is simply the best way to store the data. If performance of the data processing is critical, -ffast-math is a good solution. This is a fairly legitimate use case, and the fact that the compiler does not allow it is a compiler drawback.

 

The use case is legitimate, I agree, and the compiler does allow it. In fancyAlgo.cpp, put your input validation code that checks for NaN. In fancyAlgoFast.cpp, put your ML kernel, compiled with fast-math. If you really *must* have them all in the same TU, then compile with -fhonor-nans.

 

> It is a workaround: it works in some cases but not in others. An ML kernel is often a single translation unit, and there may be no linker at all for that processor. At the same time, the kernel is computation-intensive, so using fast-math in it may be very profitable.

 

It's a workaround for a well-defined behavior that you opted into. If having multiple TUs is not an option, then you can use one of the other solutions presented or compile with -fhonor-nans.

 

> ICC and MSVC do not remove `isnan` in fast math mode. If `isnan` is implemented in libc, it is also a real check. Actually removing `isnan` creates inconsistency.

 

I don’t know about ICC, but this is a quote from the docs for MSVC: “Special values (NaN, +infinity, -infinity, -0.0) may not be propagated or behave strictly according to the IEEE-754 standard”. My interpretation of this is the same as that for clang: no guarantees are provided. Maybe it works today, but that doesn’t mean it will work tomorrow.

 

> Removing `isnan` is only an optimization, it does not intend to change semantics. So it cannot create portability issues. Quite the contrary, it helps portability by making behavior consistent between compilers and libc implementations. The only possible issue is performance loss, this is discussed above, it is an unlikely case. Anyway, if such loss exists and it is absolutely intolerable for a user, a hack with redefinition of `isnan` restores the previous code generation.

 

It creates a portability issue if we change the compiler to guarantee that isnan(x) works, and then users rely on it working. The resulting code is non portable to other compilers, and to old versions of clang.

 

> The documentation says about -ffinite-math-only:

 

 > "Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs."

 

> Is it clear whether `isnan` is arithmetic or not?

 

The docs for -ffinite-math-only on https://clang.llvm.org/docs/UsersManual.html actually say “Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf”. Notice that it says nothing about arithmetic. I’m not sure where you got your quote, but it’s not from the docs for the currently available release of upstream clang.

 

Thanks,

   Chris Tetreault

Chris Tetreault via llvm-dev

Sep 17, 2021, 1:35:04 PM
to Mehdi AMINI, Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

For the record, I think having a pragma that allows you to control fast-math behavior is fine. This sort of thing is far from unprecedented, and is far more palatable to me than just having random special cases. It’s also far more useful as I can do something like:

 

   #pragma clang fast-math push
   #pragma clang fast-math-on
   // bunch of hairy floating point math here
   #pragma clang fast-math-pop

 

… and easily isolate fast math to the smallest possible area. This is morally equivalent to putting the scary math in its own TU, but less annoying.

 

The pragma should probably have a push/pop, but otherwise I don’t really care what color the bike shed is.

 

Thanks,

   Chris Tetreault

 

From: Mehdi AMINI <joke...@gmail.com>
Sent: Friday, September 17, 2021 9:17 AM
To: Serge Pavlov <sepa...@gmail.com>

Serge Pavlov via llvm-dev

Sep 20, 2021, 4:23:33 AM
to Mehdi AMINI, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joke...@gmail.com> wrote:
Right, I know that clang does not support it, but it could :)
So since we're looking at what provides the best user-experience: isn't that it? Shouldn't we look into providing this level of granularity? (whether function-level or finer grain)

It could mitigate the problem if it were implemented. A user who needs to handle NaNs in -ffinite-math-only compilation and writes the code from scratch could use this facility to get things working. I also think such a pragma, implemented with enough flexibility, could be useful irrespective of this topic.

However, in general it does not solve the problem. The most important issue, which remains unaddressed, is the inconsistency of the implementation.

Clang's handling of `isnan` under -ffinite-math-only is inconsistent because:
- It differs from what other compilers do. Namely MSVC and Intel compiler do not throw away `isnan` in this mode: https://godbolt.org/z/qTaz47qhP.
- It depends on optimization options. With -O2 the check is removed but with -O0 remains: https://godbolt.org/z/cjYePv7s7. Other options also can affect the behavior, for example with `-ffp-model=strict` the check is generated irrespective of the optimization mode (see the same link).
- It is inconsistent with libc implementations. If `isnan` is provided by libc, it is a real check, but the compiler may drop it.
This would not be an issue if removing `isnan` were just an optimization. However, it changes semantics in the presence of NaNs, so such removal can break user code.

In the typical use case, a user calls `isnan` to ensure that no operations on NaNs occur. The call can also be present in a header that implements some functionality for the general case. It may work because `isnan` is provided by libc. Later, when the configuration changes or libc is updated, the code may break because the implementation of `isnan` changes, as happened after https://reviews.llvm.org/D69806.

If clang kept calls to `isnan`, it would be consistent with ICC and MSVC and with all libc implementations. The behavior would be different from gcc, but clang would be on the winning side, because the number of programs that work with clang would be larger.

Also if we agree that NaNs can appear in the code compiled with -ffinite-math-only, there must be a way to check if a number is a NaN. 
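One workaround sometimes used when a floating-point `isnan` cannot be trusted is to inspect the IEEE 754 binary32 encoding with integer operations: a value is a NaN exactly when all exponent bits are set and the significand is non-zero. The sketch below assumes `float` is IEEE 754 binary32; `is_nan_bits` is a hypothetical name. It is only a sketch: integer operations are not covered by the fast-math flags, but if the NaN was itself produced by arithmetic that the compiler assumed to be NaN-free, the optimizer's reasoning about that value may still make the result unreliable. It is most defensible for NaNs that come from outside, such as loaded data.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* NaN test on the raw IEEE 754 binary32 encoding.
   Bits 23..30 hold the exponent, bits 0..22 the significand.
   A NaN has an all-ones exponent and a non-zero significand, which is
   the same as: magnitude bits (sign cleared) > 0x7f800000 (infinity). */
bool is_nan_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* copy the bytes; no aliasing issues */
    return (bits & 0x7fffffffu) > 0x7f800000u;
}
```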

Thanks,
--Serge

Chris Tetreault via llvm-dev

Sep 20, 2021, 12:39:46 PM
to Serge Pavlov, Mehdi AMINI, llvm...@lists.llvm.org, cfe...@lists.llvm.org

You’re confusing implementation details (you have a Godbolt link that shows that MSVC just happens to not remove the isnan call) with documented behavior (I provided a link to the MSVC docs that shows that no promises are made with respect to NaN). The fact is that no compiler (Maybe ICC does, I don’t know, I haven’t checked. I bet their docs say something similar to MSVC, clang, and GCC though.) guarantees that isnan(x) will not be optimized out with fast-math enabled. There is no inconsistency: all the compilers document that they are free to optimize as if there were no NaNs, and they then do whatever is best for their implementation. If you think this is inconsistent, then let me tell you about that time I dereferenced a null pointer and it didn’t segfault.

 

Now, many people have suggested in this thread that a pragma be added. I personally fully support this proposal. I think it’s a very clean solution, and any non-trivial portable codebase probably already has a library of preprocessor macros that abstract this sort of thing. Do you have a concrete reason why a pragma is unsuitable?

 

From: Serge Pavlov <sepa...@gmail.com>
Sent: Monday, September 20, 2021 1:23 AM
To: Mehdi AMINI <joke...@gmail.com>
Cc: Chris Tetreault <ctet...@quicinc.com>; llvm...@lists.llvm.org; cfe...@lists.llvm.org
Subject: Re: [cfe-dev] [llvm-dev] Should isnan be optimized out in fast-math mode?

 


On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joke...@gmail.com> wrote:

Mehdi AMINI via llvm-dev

Sep 20, 2021, 1:04:38 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 20, 2021 at 1:23 AM Serge Pavlov <sepa...@gmail.com> wrote:
It could mitigate the problem if it were implemented. A user who needs to handle NaNs in -ffinite-math-only compilation and writes the code from scratch could use this facility to get things working. I also think such a pragma, implemented with enough flexibility, could be useful irrespective of this topic.

However, in general it does not solve the problem. The most important issue, which remains unaddressed, is the inconsistency of the implementation.

Clang's handling of `isnan` under -ffinite-math-only is inconsistent because:

Some counterpoints, so as not to be one-sided:
 
 
- It differs from what other compilers do. Namely MSVC and Intel compiler do not throw away `isnan` in this mode: https://godbolt.org/z/qTaz47qhP.

Right but we're not consistent with other compilers in our handling of fast-math in general: the numerical result of a floating point program will be potentially vastly different with fast-math enabled on different compilers.
 
- It depends on optimization options. With -O2 the check is removed but with -O0 remains: https://godbolt.org/z/cjYePv7s7. Other options also can affect the behavior, for example with `-ffp-model=strict` the check is generated irrespective of the optimization mode (see the same link).

Right, but that seems by design to me: being able to optimize is actually the point!
There are other examples of such behavior:

a = *p;  // *p is UB if p is null, so the compiler may assume p != NULL below
if (!p) {
  printf("p is null\n");
}
The check will also be eliminated with O2 and not O0 here. 
 
- It is inconsistent with libc implementations. If `isnan` is provided by libc, it is a real check, but the compiler may drop it.

I don't really understand the argument of comparison with libc to be honest. Everything the compiler optimizes may make it different from a library implementation under this kind of special flags. The very reason for these flags to exist is actually to allow the compiler to do this!

This would not be an issue if removing `isnan` were just an optimization. However, it changes semantics in the presence of NaNs, so such removal can break user code.

In the typical use case, a user calls `isnan` to ensure that no operations on NaNs occur. The call can also be present in a header that implements some functionality for the general case. It may work because `isnan` is provided by libc. Later, when the configuration changes or libc is updated, the code may break because the implementation of `isnan` changes, as happened after https://reviews.llvm.org/D69806.

If clang kept calls to `isnan`, it would be consistent with ICC and MSVC and with all libc implementations. The behavior would be different from gcc, but clang would be on the winning side, because the number of programs that work with clang would be larger.

Also if we agree that NaNs can appear in the code compiled with -ffinite-math-only, there must be a way to check if a number is a NaN.

I'd find it unfortunate though that in a mode which specifies that the result of floating point arithmetic can't have NaN we can't constant fold isnan(x) where x is a potentially complex expression (think that x could be dead-code if not for the isnan).

Best,

-- 
Mehdi

 
 

Thanks,
--Serge

Serge Pavlov via llvm-dev

Sep 20, 2021, 1:10:11 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
MSVC documentation says: “Special values (NaN, +infinity, -infinity, -0.0) may not be propagated or behave strictly according to the IEEE-754 standard”. Such an exclusion is necessary to apply transformations that are valid for real numbers only, like `x * 0 -> 0`. NaNs in arithmetic operations propagate from input to output: in most operations, if an operand is NaN, the result is also NaN. `isnan` has nothing to do with NaN propagation; it just performs the check. The documentation does not justify removing `isnan`.
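The distinction between NaN propagation and NaN inspection can be made concrete under default IEEE 754 semantics (no fast-math flags). The real-number identity `x * 0 == 0` fails for infinite or NaN `x`, and for negative finite `x` the product is `-0.0`, so that fold also relies on ignoring signed zeros; `isnan`, by contrast, only classifies its argument. The helper names below are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* In real-number algebra x * 0 == 0. Under IEEE 754, inf * 0 is an
   invalid operation (NaN) and NaN * 0 propagates the NaN. */
double times_zero(double x) {
    return x * 0.0;
}

/* Inspection only: reports whether x * 0 produced a NaN, i.e. whether
   x was an infinity or a NaN. */
bool zero_product_is_nan(double x) {
    return isnan(times_zero(x));
}
```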

all the compilers document that they are free to optimize as if there were no NaNs, and they then do whatever is best for their implementation.

Exactly. Leaving `isnan` in the code makes compiler behavior more consistent and convenient for users. Clang also can go this way.

Do you have a concrete reason why a pragma is unsuitable?

I described the concerns in the reply to Mehdi Amini's message. 

Thanks,
--Serge

Aaron Ballman via llvm-dev

Sep 20, 2021, 1:11:24 PM
to Mehdi AMINI, llvm...@lists.llvm.org, cfe...@lists.llvm.org

I think part of my concern is with the "result of floating point
arithmetic can't have NaN" statement. I consider this case:

if (isnan(foo / bar)) {}

to be fundamentally different from this case:

if (isnan(foo)) {}

because the first example can never result in the if branch being
taken with -ffinite-math-only while the second example can. Assuming
that all values are the result of floating-point arithmetic is a
faulty assumption.
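The second case is easy to reach in practice: a NaN can enter a program through I/O without any arithmetic producing it. In the sketch below (hypothetical `parse_sample` and `sample_is_missing`, IEEE semantics assumed), `strtod` accepts the string "nan" in any case, as specified since C99, so `isnan(foo)` on the parsed value is a genuine check even though no floating-point operation ever produced a NaN.

```c
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses a sample; strtod accepts "nan" (case-insensitive) since C99. */
double parse_sample(const char *text) {
    return strtod(text, NULL);
}

/* Validation in the style of if (isnan(foo)) {}: foo is not the result
   of floating-point arithmetic here, it comes straight from the input. */
bool sample_is_missing(const char *text) {
    return isnan(parse_sample(text));
}
```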

~Aaron

Arthur O'Dwyer via llvm-dev

Sep 20, 2021, 1:13:46 PM
to Chris Tetreault, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 20, 2021 at 12:40 PM Chris Tetreault via cfe-dev <cfe...@lists.llvm.org> wrote:

You’re confusing implementation details (you have a Godbolt link that shows that MSVC just happens to not remove the isnan call) with documented behavior (I provided a link to the MSVC docs that shows that no promises are made with respect to NaN). The fact is that no compiler (Maybe ICC does, I don’t know, I haven’t checked. I bet their docs say something similar to MSVC, clang, and GCC though.) guarantees that isnan(x) will not be optimized out with fast-math enabled. There is no inconsistency: all the compilers document that they are free to optimize as if there were no NaNs, and they then do whatever is best for their implementation. If you think this is inconsistent, then let me tell you about that time I dereferenced a null pointer and it didn’t segfault.


+1.
 

Now, many people have suggested in this thread that a pragma be added. I personally fully support this proposal. I think it’s a very clean solution, and any non-trivial portable codebase probably already has a library of preprocessor macros that abstract this sort of thing. Do you have a concrete reason why a pragma is unsuitable?


I think that there are two questions in this thread.
- How should fast-math mode actually behave? [Maybe we're settled on the "NANs are SNANs and signaling operations produce unspecified values" model. Gee I hope so.]
- Should switching into/out-of fast-math mode be controlled only by a TU-level command line option, or should there also be a pragma for it?
(Btw, multiply these questions by the number of different modes we support; I've consciously been trying to phrase everything in terms of NANs, but Serge likes to talk about -ffinite-math-only, where not just NANs but also INF and -INF are verboten. And then there's the -fno-signed-zeros option, which does not forbid -0.0, but does permit it to be treated as a-zero-value-of-unspecified-sign. I think -ffast-math probably also forbids subnormals... but maybe it just treats them as either-their-actual-value-or-zero-of-the-appropriate-sign.)
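To illustrate what `-fno-signed-zeros` gives up: under IEEE 754, `-0.0` compares equal to `+0.0`, but the sign is still observable, for instance through `signbit` or through division. A minimal sketch with hypothetical helper names, assuming IEEE semantics and no fast-math flags:

```c
#include <math.h>
#include <stdbool.h>

/* -0.0 == +0.0 is true under IEEE 754, so equality cannot tell them apart. */
bool zeros_compare_equal(double neg_zero, double pos_zero) {
    return neg_zero == pos_zero;
}

/* The sign is still observable: 1/-0.0 is -inf and 1/+0.0 is +inf.
   Returns -1 for a negative zero and +1 for a positive zero. */
int zero_sign_via_division(double z) {
    return (1.0 / z) < 0.0 ? -1 : 1;
}
```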

Anyway, should there be a pragma in addition to the TU-level command line option?:

There must be a command-line option, anyway — I mean, it already exists (-ffast-math, etc). Pragmas are basically about taking some command-line decision and allowing the decision to be made more granularly. Look at `#pragma GCC diagnostic ignored "-Wfoo"`, for example; it's expressed in terms of the command-line option. So if Clang were to support something like
    #pragma GCC optimize("-ffast-math")  // cf. #pragma GCC optimize("O2")
that would still be expressed in terms of the command-line option, and hopefully both the option and the pragma would end up setting the same internal bits.

However, pragmas are hard to get right. Consider:

    #include <cmath>  // HUGE_VAL
    bool unoptimized(double x) { return (x + 1) > x; }
    #pragma GCC optimize("-ffast-math")
    bool optimized(double x) { return unoptimized(x+1); }
    #pragma GCC optimize("-fno-fast-math")
    int main() {
        return optimized(HUGE_VAL);
    }

The compiler would have to think about what it means to inline `unoptimized` into `optimized`.  The arithmetic in `optimized` produces INF, but then it's passed to `unoptimized`, which is not marked as fast-math, so I guess the compiler can't optimize `(x+1) > x` into `true` in that context?  It's at least confusing and subtle for the compiler vendor to get right; and possibly philosophically confusing as well.
Alternatively, you could forbid inlining between functions with different optimization levels... but that's clearly a terrible idea, right?

And of course some programmer is going to try something dumb like

    #pragma GCC optimize("-ffast-math")
    #define REAL_ISNAN(x) std::isnan(x)
    #pragma GCC optimize("-fno-fast-math")

which "of course" won't work, but who's going to explain it to them?

Not to mention, if the pragma is active at the top of the TU where some template or implicitly defaulted special member is defined, but then it's not active at the point where the template is instantiated or the special member is implicitly defined... what the heck happens in that case? and who's going to write the StackOverflow answer about it?

Basically, the translation unit is the natural unit of... hmm... translation. There's very little return-on-investment involved in trying to circumvent that.

–Arthur

Arthur O'Dwyer via llvm-dev

Sep 20, 2021, 1:16:11 PM
to Aaron Ballman, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 20, 2021 at 1:11 PM Aaron Ballman via cfe-dev <cfe...@lists.llvm.org> wrote:
On Mon, Sep 20, 2021 at 1:04 PM Mehdi AMINI via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> I'd find it unfortunate though that in a mode which specifies that the result of floating point arithmetic can't have NaN we can't constant fold isnan(x) where x is a potentially complex expression (think that x could be dead-code if not for the isnan).

I think part of my concern is with the "result of floating point
arithmetic can't have NaN" statement. I consider this case:

if (isnan(foo / bar)) {}

to be fundamentally different from this case:

if (isnan(foo)) {}

because the first example can never result in the if branch being
taken with -ffinite-math-only while the second example can.

Mehdi's statement is consistent with your interpretation, yes.  In the first case, "foo / bar" is the result of floating-point arithmetic.  In the second case, "foo" is not the result of floating-point arithmetic — or at least, you didn't demonstrate that it was.

Of course if the preceding line was

    foo = foo / bar;
    if (isnan(foo)) {}

then we can constant-fold `isnan(foo)` to false, as Mehdi said, because `foo`'s value is the result of floating-point arithmetic.

HTH,
–Arthur

Aaron Ballman via llvm-dev

Sep 20, 2021, 1:21:44 PM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org

Thanks! So long as we are conservative with the optimization and only
fold the call to isnan when the operand is proven to be the result of
an arithmetic expression, my concerns are lessened.

~Aaron

>
> HTH,
> –Arthur

Mehdi AMINI via llvm-dev

Sep 20, 2021, 1:35:10 PM
to Aaron Ballman, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 20, 2021 at 10:11 AM Aaron Ballman <aa...@aaronballman.com> wrote:
>
> On Mon, Sep 20, 2021 at 1:04 PM Mehdi AMINI via llvm-dev
> <llvm...@lists.llvm.org> wrote:
> > On Mon, Sep 20, 2021 at 1:23 AM Serge Pavlov <sepa...@gmail.com> wrote:
> >>
> >> On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joke...@gmail.com> wrote:
> >>>
> >>> On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepa...@gmail.com> wrote:
> >> Also if we agree that NaNs can appear in the code compiled with -ffinite-math-only, there must be a way to check if a number is a NaN.
> >
> >
> > I'd find it unfortunate though that in a mode which specifies that the result of floating point arithmetic can't have NaN we can't constant fold isnan(x) where x is a potentially complex expression (think that x could be dead-code if not for the isnan).
>
> I think part of my concern is with the "result of floating point
> arithmetic can't have NaN" statement. I consider this case:
>
> if (isnan(foo / bar)) {}
>
> to be fundamentally different from this case:
>
> if (isnan(foo)) {}
>
> because the first example can never result in the if branch being
> taken with -ffinite-math-only while the second example can. Assuming
> that all values are the result of floating-point arithmetic is a
> faulty assumption.

Right, but having to manage the fact that `x + 0. -> x` would become a
potential pessimization for the optimizer is quite disturbing to me.

--
Mehdi
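The `x + 0. -> x` example, sketched under default IEEE 754 semantics with hypothetical names: `isnan(x + 0.0)` and `isnan(x)` always agree, so after the rewrite the answer is unchanged, but the operand is no longer visibly the result of arithmetic, and a rule that folds `isnan` only for proven arithmetic results could no longer fold it. The rewrite itself is also not an exact identity once signed zeros are taken into account.

```c
#include <math.h>
#include <stdbool.h>

/* isnan applied to an arithmetic result... */
bool nan_after_add_zero(double x) {
    return isnan(x + 0.0);
}

/* ...and to the operand itself. Under IEEE 754 the two agree for every x,
   so rewriting x + 0.0 as x keeps the answer but hides the fact that the
   operand came from arithmetic. */
bool nan_of_operand(double x) {
    return isnan(x);
}

/* x + 0.0 is not an exact identity: -0.0 + 0.0 is +0.0 in the default
   rounding mode, which is why the rewrite needs "no signed zeros". */
bool add_zero_keeps_sign(double x) {
    return !signbit(x + 0.0) == !signbit(x);
}
```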

Aaron Ballman via llvm-dev

Sep 20, 2021, 1:45:06 PM
to Mehdi AMINI, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 20, 2021 at 1:35 PM Mehdi AMINI <joke...@gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 10:11 AM Aaron Ballman <aa...@aaronballman.com> wrote:
> >
> > On Mon, Sep 20, 2021 at 1:04 PM Mehdi AMINI via llvm-dev
> > <llvm...@lists.llvm.org> wrote:
> > > On Mon, Sep 20, 2021 at 1:23 AM Serge Pavlov <sepa...@gmail.com> wrote:
> > >>
> > >> On Fri, Sep 17, 2021 at 11:17 PM Mehdi AMINI <joke...@gmail.com> wrote:
> > >>>
> > >>> On Thu, Sep 16, 2021 at 11:19 PM Serge Pavlov <sepa...@gmail.com> wrote:
> > >> Also if we agree that NaNs can appear in the code compiled with -ffinite-math-only, there must be a way to check if a number is a NaN.
> > >
> > >
> > > I'd find it unfortunate though that in a mode which specifies that the result of floating point arithmetic can't have NaN we can't constant fold isnan(x) where x is a potentially complex expression (think that x could be dead-code if not for the isnan).
> >
> > I think part of my concern is with the "result of floating point
> > arithmetic can't have NaN" statement. I consider this case:
> >
> > if (isnan(foo / bar)) {}
> >
> > to be fundamentally different from this case:
> >
> > if (isnan(foo)) {}
> >
> > because the first example can never result in the if branch being
> > taken with -ffinite-math-only while the second example can. Assuming
> > that all values are the result of floating-point arithmetic is a
> > faulty assumption.
>
> Right, but having to manage the fact that `x + 0. -> x` would become a
> potential pessimization for the optimizer is quite disturbing to me.

Eh, this is in an "ignore the standard" compiler mode, so all of it is
quite disturbing to me. :-D But more seriously, I would rather we err
on the side of caution when deciding which code the user explicitly
wrote that can be removed by the optimizer. When my code goes slow, I
can profile it to see what's causing that and react accordingly if I
care. When the optimizer removes some error handling code I have, I am
100% reliant on my testing infrastructure catching that before my
users do, and testing code is sometimes run in debug mode for a
variety of reasons and not everyone writes fantastic testing code that
covers all of their error cases. To me, being conservative is the less
user-hostile approach in this case.

~Aaron

Mehdi AMINI via llvm-dev

Sep 20, 2021, 1:46:24 PM
to Arthur O'Dwyer, llvm...@lists.llvm.org, cfe...@lists.llvm.org
On Mon, Sep 20, 2021 at 10:13 AM Arthur O'Dwyer
<arthur....@gmail.com> wrote:
>
> On Mon, Sep 20, 2021 at 12:40 PM Chris Tetreault via cfe-dev <cfe...@lists.llvm.org> wrote:
>>
>> You’re confusing implementation details (you have a Godbolt link that shows that MSVC just happens to not remove the isnan call) with documented behavior (I provided a link to the MSVC docs that shows that no promises are made with respect to NaN). The fact is that no compiler (Maybe ICC does, I don’t know, I haven’t checked. I bet their docs say something similar to MSVC, clang, and GCC though.) guarantees that isnan(x) will not be optimized out with fast-math enabled. There is no inconsistency: all the compilers document that they are free to optimize as if there were no NaNs, and they then do whatever is best for their implementation. If you think this is inconsistent, then let me tell you about that time I dereferenced a null pointer and it didn’t segfault.
>
>
> +1.
>
>>
>> Now, many people have suggested in this thread that a pragma be added. I personally fully support this proposal. I think it’s a very clean solution, and any non-trivial portable codebase probably already has a library of preprocessor macros that abstract this sort of thing. Do you have a concrete reason why a pragma is unsuitable?
>
>
> I think that there are two questions in this thread.
> - How should fast-math mode actually behave? [Maybe we're settled on the "NANs are SNANs and signaling operations produce unspecified values" model. Gee I hope so.]
> - Should switching into/out-of fast-math mode be controlled only by a TU-level command line option, or should there also be a pragma for it?
> (Btw, multiply these questions by the number of different modes we support; I've consciously been trying to phrase everything in terms of NANs, but Serge likes to talk about -ffinite-math-only, where not just NANs but also INF and -INF are verboten. And then there's the -fno-signed-zeros option, which does not forbid -0.0, but does permit it to be treated as a-zero-value-of-unspecified-sign. I think -ffast-math probably also forbids subnormals... but maybe it just treats them as either-their-actual-value-or-zero-of-the-appropriate-sign.)
>
> Anyway, should there be a pragma in addition to the TU-level command line option?:
>
> There must be a command-line option, anyway — I mean, it already exists (-ffast-math, etc). Pragmas are basically about taking some command-line decision and allowing the decision to be made more granularly. Look at `#pragma GCC diagnostic ignored "-Wfoo"`, for example; it's expressed in terms of the command-line option. So if Clang were to support something like
> #pragma GCC optimize("-ffast-math") // cf. #pragma GCC optimize("O2")
> that would still be expressed in terms of the command-line option, and hopefully both the option and the pragma would end up setting the same internal bits.
>
> However, pragmas are hard to get right. Consider:
>
> #include <cmath>  // HUGE_VAL
> bool unoptimized(double x) { return (x + 1) > x; }
> #pragma GCC optimize("-ffast-math")
> bool optimized(double x) { return unoptimized(x+1); }
> #pragma GCC optimize("-fno-fast-math")
> int main() {
>   return optimized(HUGE_VAL);
> }
>
> The compiler would have to think about what it means to inline `unoptimized` into `optimized`. The arithmetic in `optimized` produces INF, but then it's passed to `unoptimized`, which is not marked as fast-math, so I guess the compiler can't optimize `(x+1) > x` into `true` in that context? It's at least confusing and subtle for the compiler vendor to get right; and possibly philosophically confusing as well.
> Alternatively, you could forbid inlining between functions with different optimization levels... but that's clearly a terrible idea, right?
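Under strict IEEE-754 semantics, Arthur's example has a definite answer. `HUGE_VAL + 1` is still +infinity, and `inf > inf` is false, so `main` should return 0. Folding `(x + 1) > x` to `true` would make it return 1 instead. This is a minimal sketch of the strict evaluation under three assumptions:

- `HUGE_VAL` is +infinity, as on IEEE-754 platforms.
- The build uses no fast-math flags.
- `unoptimized` returns `bool`.

The function names simply mirror the example.

```cpp
#include <cmath>  // HUGE_VAL, std::isinf

// Evaluated with ordinary IEEE-754 semantics (no fast-math flags).
bool unoptimized(double x) { return (x + 1) > x; }

// For x == HUGE_VAL (+infinity), x + 1 is still +infinity, and
// +infinity > +infinity is false, so this returns false.
bool optimized(double x) { return unoptimized(x + 1); }
```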

That does not seem like a terrible idea: we already limit inlining when
function attributes mismatch in this way.
But here you don't even need inlining if the pragma is used only for a
sequence of statements inside a function. Fortunately we handle
fast-math with individual instruction flags, so you could imagine:

#pragma GCC optimize("-ffast-math")

x = a + b;


#pragma GCC optimize("-fno-fast-math")

if (isnan(x)) {
...
}

which would tag the fadd with the fast flag but not the isnan.

In practice you'd write it this way, though:


x = a + b; // default specified on the command line
#pragma GCC push_options
#pragma GCC optimize("-fno-fast-math")
if (isnan(x)) {
...
}
#pragma GCC pop_options


>
> And of course some programmer is going to try something dumb like
>
> #pragma GCC optimize("-ffast-math")
> #define REAL_ISNAN(x) std::isnan(x)
> #pragma GCC optimize("-fno-fast-math")
>
> which "of course" won't work, but who's going to explain it to them?
>
> Not to mention, if the pragma is active at the top of the TU where some template or implicitly defaulted special member is defined, but then it's not active at the point where the template is instantiated or the special member is implicitly defined... what the heck happens in that case? and who's going to write the StackOverflow answer about it?
>
> Basically, the translation unit is the natural unit of... hmm... translation. There's very little return-on-investment involved in trying to circumvent that.
>
> –Arthur

Jorg Brown via llvm-dev

Sep 20, 2021, 2:23:51 PM
to Joerg Sonnenberger, LLVM Developers Mailing List, cfe...@lists.llvm.org
On Wed, Sep 8, 2021 at 1:58 PM Joerg Sonnenberger via llvm-dev <llvm...@lists.llvm.org> wrote:
> On Wed, Sep 08, 2021 at 06:04:08PM +0000, Chris Tetreault via llvm-dev wrote:
> > As a developer (who always reads the docs and generally makes good life
> > choices), if I turn on -ffast-math, I want the compiler to produce the
> > fastest floating point math code possible, floating point
> > semantics be darned. Given this viewpoint, my opinion on this topic is
> > that the compiler should do whatever it wants, given the constraints of
> > the documented behavior of NaN.
>
> There is a huge difference between optimisations that assume NaN is not
> present and breaking checks for them. I'm not convinced at all that
> constant-folding isnan to false will actually speed up real world code.

I am.

Here's the first 20ish lines of the code in question, which you can see in full at https://github.com/abseil/abseil-cpp/blob/master/absl/strings/numbers.cc :

size_t numbers_internal::SixDigitsToBuffer(double d, char* const buffer) {
  static_assert(std::numeric_limits<float>::is_iec559,
                "IEEE-754/IEC-559 support only");

  char* out = buffer;  // we write data to out

  if (std::isnan(d)) {
    strcpy(out, "nan");  // NOLINT(runtime/printf)
    return 3;
  }
  if (d == 0) {  // +0 and -0 are handled here
    if (std::signbit(d)) *out++ = '-';
    *out++ = '0';
    *out = 0;
    return out - buffer;
  }
  if (d < 0) {
    *out++ = '-';
    d = -d;
  }
  if (std::isinf(d)) {
    strcpy(out, "inf");  // NOLINT(runtime/printf)
    return out + 3 - buffer;
  }
  // ... (remainder of the function omitted)
}


This routine formats a double-precision floating-point number as 6 digits, the same way "%g" would do it.  The calls to std::isnan and std::isinf are a measurable slowdown.

(In fact, as a result of this discussion I'm realizing that the call to isinf can be replaced with a comparison with positive infinity, since negative infinity is not a possibility here.  Patch in progress...)
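Jorg's observation can be sketched as follows. Once NaN has been rejected and a negative `d` has been negated, -infinity can no longer occur. `std::isinf(d)` therefore reduces to a single equality test against +infinity. This is a hypothetical helper, not the actual abseil patch, and `is_inf_after_sign_strip` is not an abseil name. It assumes IEEE-754 doubles and no fast-math flags.

```cpp
#include <limits>

// Hypothetical helper mirroring the tail of the excerpt. Precondition:
// d is not NaN (the excerpt rejects NaN before reaching this point).
bool is_inf_after_sign_strip(double d) {
  if (d < 0) d = -d;  // -infinity becomes +infinity here
  // Only +infinity can remain, so one comparison replaces std::isinf.
  return d == std::numeric_limits<double>::infinity();
}
```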

-- Jorg

Jorg Brown via llvm-dev

Sep 20, 2021, 2:29:28 PM
to Chris Tetreault, LLVM Developers, cfe...@lists.llvm.org
On Wed, Sep 8, 2021 at 11:04 AM Chris Tetreault via llvm-dev <llvm...@lists.llvm.org> wrote:

> Of course, this all sounds fine and well, but the reality is that people don't read docs and don't make good life choices. They turn on fast math because they want it to reduce `x * 0` to `0`, and are surprised when their NaN handling code fails. This is unfortunate, but I don't think we should reduce the effectiveness of fast-math because of this human issue. Other flags exist for these users, and when they complain they should be told about them. Really this is an issue of poor developer discipline, and if we really want to solve this, perhaps some sort of "fast math sanitizer" can be created. It can statically analyze code and complain when it sees things like `if (isnan(foo))` not guarded by `__FAST_MATH__` with fast math enabled. Or, maybe the compiler can just issue a warning unconditionally in this case.
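Until a pragma or sanitizer exists, there is a crude compile-time version of the `__FAST_MATH__` guard mentioned above. Code whose correctness depends on a NaN check can refuse to build when fast-math macros are predefined. In this minimal sketch, GCC and Clang predefine `__FAST_MATH__` under `-ffast-math` and set `__FINITE_MATH_ONLY__` to 1 under `-ffinite-math-only`, but verify this for your compiler. `is_missing_value` is a hypothetical function.

```cpp
#include <cmath>

// Stop the build if fast-math assumptions are in effect, since the NaN
// check below could then be folded to false.
#if defined(__FAST_MATH__) || (defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__)
#error "this file relies on std::isnan; compile it without -ffast-math"
#endif

// Hypothetical validation routine whose NaN check must not be removed.
bool is_missing_value(double x) { return std::isnan(x); }
```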


A "fast math sanitizer" sounds like a GREAT idea.

Also, I'd say that if -ffast-math is on, and -fsanitize=undefined, then the UB detection should already be doing this.

One caveat, though.  Suppose there is code like this:

if (d == std::numeric_limits<double>::infinity()) {

Does the comparison against infinity count as UB, since the compiler is allowed to assume there are no infinite fp values?  I hope not, or an optimization I'm about to check in will have to be reverted.

-- Jorg
 

Chris Tetreault via llvm-dev

Sep 20, 2021, 3:48:02 PM
to Jorg Brown, LLVM Developers, cfe...@lists.llvm.org

I don’t know if that check against infinity is UB, but the way I read the docs this code is not safe. If we had a pragma to control fast-math, then it would be easy to fix this.

 

From: Jorg Brown <jorg....@gmail.com>
Sent: Monday, September 20, 2021 11:29 AM
To: Chris Tetreault <ctet...@quicinc.com>

Cc: Serge Pavlov <sepa...@gmail.com>; LLVM Developers <llvm...@lists.llvm.org>; cfe...@lists.llvm.org
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

 


On Wed, Sep 8, 2021 at 11:04 AM Chris Tetreault via llvm-dev <llvm...@lists.llvm.org> wrote:

Chris Tetreault via llvm-dev

Sep 20, 2021, 3:59:32 PM
to Serge Pavlov, llvm...@lists.llvm.org, cfe...@lists.llvm.org

> Exactly. Leaving `isnan` in the code makes compiler behavior more consistent and convenient for users. Clang also can go this way.

 

It is never a good idea to rely on implementation details when the documentation states otherwise. Just because MSVC and ICC *just happen to* work as you’d like clang to *today*, doesn’t mean that they will tomorrow. Their docs state this clearly. Dereferencing a pointer you just freed just happens to work sometimes too, but I can’t think of anybody who would use this as a basis for an argument that use-after-free is a reasonable use case.

 

> I described the concerns in the reply to Mehdi Amini's message. 

 

I did read your response to Mehdi. You said “It could mitigate the problem if it were implemented”. You then went on to reiterate all your other arguments, completely ignoring a clean solution.

 

Adding a pragma is what’s most convenient for users. That way they can opt into, or opt out of, having their isnan calls being candidates for optimization. If clang guaranteed that isnan(x) would not be optimized out, then I cannot use a pragma to force it to be an optimization candidate. If clang keeps its current behavior, then a new fast-math pragma can be used to ensure that it is not optimized out. It’s the most flexible solution.

 

From: Serge Pavlov <sepa...@gmail.com>
Sent: Monday, September 20, 2021 10:10 AM
To: Chris Tetreault <ctet...@quicinc.com>

antlists via llvm-dev

Sep 20, 2021, 4:46:43 PM
to llvm...@lists.llvm.org

On 20/09/2021 20:59, Chris Tetreault via llvm-dev wrote:
> > Exactly. Leaving `isnan` in the code makes compiler behavior more
> > consistent and convenient for users. Clang also can go this way.
>
> It is never a good idea to rely on implementation details when the
> documentation states otherwise. Just because MSVC and ICC *just happen
> to* work as you’d like clang to *today*, doesn’t mean that they will
> tomorrow. Their docs state this clearly. Dereferencing a pointer you
> just freed just happens to work sometimes too, but I can’t think of
> anybody who would use this as a basis for an argument that
> use-after-free is a reasonable use case.

And the trouble with that is that what YOU want isn't necessarily what
SOMEONE ELSE wants.


>
> > I described the concerns in the reply to Mehdi Amini's message.
>
> I did read your response to Mehdi. You said “It could mitigate the
> problem if it were implemented”. You then went on to reiterate all your
> other arguments, completely ignoring a clean solution.
>
> Adding a pragma is what’s most convenient for users. That way they can
> opt into, or opt out of, having their isnan calls being candidates for
> optimization. If clang guaranteed that isnan(x) would not be optimized
> out, then I cannot use a pragma to force it to be an optimization
> candidate. If clang keeps its current behavior, then a new fast-math
> pragma can be used to ensure that it is not optimized out. It’s the most
> flexible solution.
>

I know I may be coming in at half cock, but what is the CORRECT
MATHEMATICAL behaviour? (Yes I know there may be multiple "correct"
behaviours.) Users should always be able to select a "correct"
behaviour, and one of them should be the default, not copying some other
implementation just because "they did it thataway". IIRC Excel used to
accept 29 Feb 1900 as a valid date - that's no reason for all the other
spreadsheets to get it wrong, too ...

(MY "correct" hobby horse - I'd like "divide by zero" to return
"infinity" and "divide by infinity" return "zero", because that way the
origin does not have special status and it makes maths much easier. I've
heard a bunch of reasons why I'm wrong, but should changing the
co-ordinate system really alter the result of a mathematical operation?
Didn't Einstein say everything is relative? There is nothing special
about any observation point?)
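For comparison with Wol's wish, IEEE-754 division already behaves this way in most cases; `double` follows it on the usual platforms. A nonzero value divided by zero gives a signed infinity, and a finite value divided by infinity gives zero. Cases with no meaningful answer, such as 0/0, yield NaN. The sign of a zero divisor also decides the sign of the resulting infinity. This small check assumes `is_iec559` holds and no fast-math flags are used. `divide` is a hypothetical helper.

```cpp
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "assumes IEEE-754 double");

// Plain IEEE-754 division; zero and infinity get no special handling here.
double divide(double num, double den) { return num / den; }
```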

Cheers,
Wol

Serge Pavlov via llvm-dev

Sep 21, 2021, 9:05:26 AM
to antlists, LLVM Developers
On Tue, Sep 21, 2021 at 3:46 AM antlists via llvm-dev <llvm...@lists.llvm.org> wrote:
> I know I may be coming in at half cock, but what is the CORRECT
> MATHEMATICAL behaviour?

I'll use your question to make a summary.

Currently clang removes calls to `isnan` under -ffinite-math-only. The justification for this behavior comes from an early GCC viewpoint (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50724#c1):

"With -ffinite-math-only you are telling that there are no NaNs"

On this assumption the removal of `isnan` is a perfectly valid optimization: it produces a program that behaves identically to the original program containing `isnan`, giving the same output for all valid input data.

Actually the existence of NaNs cannot be ignored even if no operations on NaNs take place. There are several reasons for that, including:
- Users need to check the input data, for example to assert, or to choose a slower but general path.
- NaN can be used as a sentinel. In this case it is actually not a number but some distinguishable bit pattern.
- NaNs can be produced by operations on "allowed" numbers. This is easiest to demonstrate with infinities: the expression `x * y` may produce infinity when its operands are finite numbers, and it is hard to reveal that without executing the operation.
- Code compiled with `-ffinite-math-only` may call functions from libraries or from other parts of the program, compiled without this flag. These functions may return infinities and NaNs. It is often impractical to check the arguments prior to the call to ensure the result would be finite. In some cases it is even impossible. In this case it is necessary to check if the result is finite.
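The third point is easy to check. Finite operands can overflow to infinity, and subtracting two such infinities yields NaN, even though no NaN or infinity appears anywhere in the inputs. This small sketch assumes IEEE-754 doubles and a build without `-ffinite-math-only`. Both function names are hypothetical.

```cpp
#include <cmath>

// Hypothetical example: all four inputs are finite, yet the result can be NaN.
double difference_of_products(double a, double b, double c, double d) {
  double p = a * b;  // e.g. 1e200 * 1e200 overflows to +infinity
  double q = c * d;  // likewise +infinity
  return p - q;      // +infinity - +infinity is NaN
}

// Inspects an intermediate: true if a * b overflowed to infinity.
bool product_overflows(double a, double b) { return std::isinf(a * b); }
```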

The "correct behavior" depends on whether NaNs are allowed in the code compiled with -ffinite-math-only:
- if no, we have the current behavior,
- if yes, `isnan` cannot be optimized out.

The approach "-ffinite-math-only means there are no NaNs" is an abstraction that does not fit user needs. This problem is common in practice; there are many user reports in the GCC bug tracker:
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50724
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949
as well as in forums:
- https://stackoverflow.com/questions/38978951/can-ffast-math-be-safely-used-on-a-typical-project
- https://stackoverflow.com/questions/47703436/isnan-does-not-work-correctly-with-ofast-flags
To support `isnan` with -ffinite-math-only, users have to use kludges like emulating `isnan` with integer arithmetic, calling the library implementation, or moving their own `isnan` implementation to separate translation units. All of them have drawbacks and cost portability and performance.
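The first kludge listed, emulating `isnan` with integer arithmetic, might look like the sketch below. It assumes a 64-bit IEEE-754 `double`, which the `static_assert` checks, and `bitwise_isnan` is a hypothetical name. Nothing in this thread establishes that an optimizer will never see through such code under fast-math. Treat it as one of the workarounds described, not a guarantee. Its dependence on the bit layout is the portability cost mentioned above.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "assumes a 64-bit IEEE-754 double");

// NaN test on the raw bits: all exponent bits set and a nonzero
// mantissa (+/-infinity has a zero mantissa). No floating-point
// comparison is performed on x itself.
bool bitwise_isnan(double x) {
  std::uint64_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const std::uint64_t kExponentMask = 0x7FF0000000000000ULL;
  const std::uint64_t kMantissaMask = 0x000FFFFFFFFFFFFFULL;
  return (bits & kExponentMask) == kExponentMask && (bits & kMantissaMask) != 0;
}
```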

The proposal was to support the programming model that many users expect. Its advantages and risks were discussed earlier in the thread.

Thanks,
--Serge

Chris Tetreault via llvm-dev

Sep 21, 2021, 1:18:45 PM
to Serge Pavlov, antlists, llvm...@lists.llvm.org, cfe...@lists.llvm.org

(tl;dr) I seem to have become embroiled in this argument again, and this is clearly going nowhere. Therefore, I propose:

 

  1. At a minimum, we should do nothing but (maybe) clarify the docs. I think the current behavior is acceptable for the vast majority of users, and there exist workarounds for users that really need them.
  2. If interested users really want to be able to do isnan(x) with fast-math enabled without workarounds, they should implement the pragma. Such a pragma would be of great value to society, and surely users would thank us.
  3. We should not create a special case for isnan(x). I have laid out many arguments why it’s not a good solution. It would reduce the value of fast-math for users such as the graphics programmers described below, and create portability issues if code begins to rely on this new special case (becoming non-portable to any other compiler or old versions of clang. And again, just because MSVC and ICC *just happen to*, as an implementation detail, work like this today doesn’t mean that they can be relied on to continue working like this, or that old versions worked like this).

 

(my rebuttal to this last message)

 

Many users of fast-math *can* actually ignore the existence of NaNs. For example, in a real-time 3D renderer, the presence of NaN is typically a bug. In the hot render loop, you cannot afford to do any sort of error handling related to NaN, and getting a NaN in one of your matrices or vertices will result in black triangles or a black screen. These users simply must be disciplined and avoid creating NaNs.

 

However, if such a user were to depend on a math lib (probably templated because graphics programmers overwhelmingly prefer C++, which means that they would be compiling it themselves, but can’t easily modify it) that contains isnan, it would be a shame if the compiler were forbidden to consider it an optimization candidate because they called a function that “uses” it for input validation or something. Such code would look something like:

 

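```
#include <cmath>

template <typename T> T f(T x) {
  if (std::isnan(x)) {
    // some sort of branch based on isnan(x) here, e.g. input validation
  }
  // ... the rest of the math ...
  return x;
}
```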

 

… which meets the criteria you propose for not being eliminated, but is also absolutely undesirable. For users that do care about NaN, a pragma would be an elegant way for them to ensure that the isnan call they care about is not eliminated. For a graphics developer, an assert on isnan would suddenly become useful if they had a pragma and could enable it selectively.

 

Thanks,

   Christopher Tetreault

 

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Serge Pavlov via llvm-dev
Sent: Tuesday, September 21, 2021 6:05 AM
To: antlists <antl...@youngman.org.uk>
Cc: LLVM Developers <llvm...@lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] Should isnan be optimized out in fast-math mode?

 


On Tue, Sep 21, 2021 at 3:46 AM antlists via llvm-dev <llvm...@lists.llvm.org> wrote:
