[llvm-dev] Intrinsic llvm::isnan

Serge Pavlov via llvm-dev

unread,

Aug 23, 2021, 6:57:44 AM8/23/21

to LLVM Developers, Clang Dev

Hi all,

Some time ago a new intrinsic `llvm.isnan` was introduced, which is intended to represent IEEE-754 operation `isNaN` as well as a family of C library functions `isnan*`. Recently during post-commit review concern was raised (see https://reviews.llvm.org/D104854) that this functionality must have had RFC to make sure there is consensus on semantics.

Previously the frontend intrinsic `__builtin_isnan` was converted into `cmp uno` during IR generation in clang codegen. There are two main reasons why this solution is not satisfactory.

1. Strict floating-point semantics.

If FP exceptions are not ignored, `cmp uno` must be replaced with its constrained counterpart, namely `llvm.experimental.constrained.fcmp` or `llvm.experimental.constrained.fcmps`. None of them is compatible with the semantics of `isnan`. Both IEEE-754 (5.7.2) an C standard (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf, F.3p6) demand that this function does not raise floating point exceptions. Both the constrained compare intrinsics raise an exception if either operand is a SNAN (https://llvm.org/docs/LangRef.html#id1131). So there was no target-independent IR construct that could express `isnan`.

This drawback was significant enough and some attempts to alleviate it were undertaken. In https://reviews.llvm.org/D95948 `isnan` was implemented using integer operations in strictfp functions. It however is not suitable for targets where a more efficient way exists, like dedicated instruction. Another solution was implemented in https://reviews.llvm.org/D96568, where a hook 'clang::TargetCodeGenInfo::testFPKind' was introduced, which injects target specific code into IR. Such a solution makes IR more target-dependent and prevents some IR-level optimizations.

2. Compilation with -ffast-math

The option '-ffast-math' is often used for performance critical code, as it can produce faster code. In this case the user must ensure that NaNs are not used as operand values. `isnan` is just proposed for such checks, but it was unusable when `isnan` was represented by compare instruction, because the latter may be optimized out. One of use cases is data in memory, which is processed by a function compiled with `-ffast-math`. Some items in the data are NaNs to denote absence of values.

This point requires some remarks about using NaNs when a function is compiled with `-ffast-math`. GCC manual does not specify how this option works, it only states about `-ffinite-math-only` (https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Optimize-Options.html#Optimize-Options):

`Allow optimizations for floating-point arithmetic that assume that arguments and results are not NaNs or +-Infs.`

`isnan` does not do any arithmetic, only check, so this statement apparently does not apply to it. There is a GCC bug report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949, where investigation conforms that std::isnan() and std::fpclassify() should works with NaNs as specified even in -ffast-math mode.

Extending NaN restrictions in -ffast-math mode to functions like `isnan` does not make code faster, but is a source of broken user expectations. If a user writes `isnan` they usually expect an actual check. Silently removing the check is a stronger action than assuming that float value contains only real numbers.

Intrinsic `llvm.isnan` solves these problems. It
- represents the check throughout the IR pipeline and saves it from undesired optimizations,
- is lowered in selector, which can choose the most suitable implementation for particular target,
- helps keeping IR target-independent,
- facilitates program analysis as the operation is presented explicitly and is not hidden behind general nodes.

Note that `llvm.isnan` is optimized out if its argument is an operation with `nnan` flag, this behavior agrees with the definition of this flag in LLVM documentation.

Any feedback is welcome.

Thanks,
--Serge

Roman Lebedev via llvm-dev

unread,

Aug 23, 2021, 7:12:55 AM8/23/21

to Serge Pavlov, LLVM Developers, Clang Dev

Thank you for posting the RFC!

I do not believe we should conflate StrictFP support, and
`-ffast-math` handling, these are two separate/separatable concerns.

As for the latter, right now i'm not convinced that we should
second-guess/override explicit user request.
This is inconsistent, and does not match how at least the GCC deals with it.
I think changing the status-quo (before said patch) should be a separate RFC,
and that change should be undone until after that RFC is accepted.

As for the latter, the main point of confusion is,
why is `@llvm.isnan` still used in non-StrictFP code?
The argument that we need `@llvm.isnan` because we *might* transition
in and out of StrictFP section does not seem to hold for me, because
https://llvm.org/docs/LangRef.html#constrainedfp says:

> If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR.

So presumably when codegen'ing a function, we already know that we
will use StrictFP ops, and that should be the knob to use `@llvm.isnan`,
i think.

Roman

> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Serge Pavlov via llvm-dev

unread,

Aug 23, 2021, 9:01:16 AM8/23/21

to Roman Lebedev, LLVM Developers, Clang Dev

On Mon, Aug 23, 2021 at 6:12 PM Roman Lebedev <lebed...@gmail.com> wrote:

Thank you for posting the RFC!

I do not believe we should conflate StrictFP support, and
`-ffast-math` handling, these are two separate/separatable concerns.

You are right, they are separate, but they originate from the implementation of the same function and can be solved with the same solution.

As for the latter, right now i'm not convinced that we should
second-guess/override explicit user request.
This is inconsistent, and does not match how at least the GCC deals with it.
I think changing the status-quo (before said patch) should be a separate RFC,
and that change should be undone until after that RFC is accepted.

Actually we have two explicit user requests, a call of 'isnan' and an option '-ffast-math'. IMHO they do not contradict each other as 'isnan' is not an arithmetic operation. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion that limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'. As for GCC behavior, they agree that this behavior is a bag: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949. Intel and Microsoft compilers do not replace 'isnan' with assumed value.

As for the latter, the main point of confusion is,
why is `@llvm.isnan` still used in non-StrictFP code?

We have to introduce an intrinsic to represent `isnan` in strictfp environment. It is natural to use it for the default environment as well. Besides, a target may have a more efficient way to represent `isnan` than unordered comparison.

The argument that we need `@llvm.isnan` because we *might* transition
in and out of StrictFP section does not seem to hold for me, because
https://llvm.org/docs/LangRef.html#constrainedfp says:

> If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR.

There was no such intention. The primary motivation was strict fp exceptions.

Sanjay Patel via llvm-dev

unread,

Aug 23, 2021, 9:10:55 AM8/23/21

to Serge Pavlov, LLVM Developers, Clang Dev

I'm confused about the definition of:

https://llvm.org/docs/LangRef.html#llvm-experimental-constrained-fcmp-and-llvm-experimental-constrained-fcmps-intrinsics

These intrinsics require an "exception behavior" argument. That argument can take the value “fpexcept.ignore” which is defined as:

"optimization passes may assume that the exception status flags will not be read and that floating-point exceptions will be masked"

i1 @llvm.experimental.constrained.fcmp.f64(double %x, double %x, metadata !"uno", metadata !"fpexcept.ignore")

How is this call in LLVM different than the semantics of "isnan(x)" that is required by IEEE-754 or the C standard?

Serge Pavlov via llvm-dev

unread,

Aug 23, 2021, 9:38:46 AM8/23/21

to Sanjay Patel, LLVM Developers, Clang Dev

> How is this call in LLVM different than the semantics of "isnan(x)" that is required by IEEE-754 or the C standard?

If either of the arguments of `llvm.experimental.constrained.fcmp` is signaling NaN, this function should raise an 'Invalid' exception. 'isnan' never raises exceptions.

Thanks,
--Serge

Sanjay Patel via llvm-dev

unread,

Aug 23, 2021, 9:43:22 AM8/23/21

to Serge Pavlov, LLVM Developers, Clang Dev

You're saying that the function definition text overrides the argument definition text. Why are we choosing that interpretation rather than the inverse (and documenting it one way or the other)?

Serge Pavlov via llvm-dev

unread,

Aug 23, 2021, 9:50:43 AM8/23/21

to Sanjay Patel, LLVM Developers, Clang Dev

When codegen lowers this call, it does not know if this is a regular compare and it may create CMP instruction, as for default environment, or this is 'isnan' for which it should generate different code, which does not influence FP state.

On most architectures compare instructions change FP state, in default environment it is just ignored, but actually hardware registers can be modified, For 'isnan' instructions must actually leave FP state intact.

Thanks,
--Serge

Sanjay Patel via llvm-dev

unread,

Aug 23, 2021, 10:13:33 AM8/23/21

to Serge Pavlov, LLVM Developers, Clang Dev

If the FP state is not made invisible/irrelevant by "fpexcept.ignore", then why is that argument on this intrinsic call at all?

Ie, why is that argument not used by the backend to decide to lower to a CMP instruction or special isnan instruction or library call?

An example would be helpful - in what case would these two lower differently given that SNAN always raises a visible exception?

call i1 @llvm.experimental.constrained.fcmp.f64(double %x, double %x, metadata !"uno", metadata !"fpexcept.ignore") ; "floating-point exceptions will be masked"

call i1 @llvm.experimental.constrained.fcmp.f64(double %x, double %x, metadata !"uno", metadata !"fpexcept.strict") ; "this mode can also be used with code that unmasks FP exceptions"

Wang, Pengfei via llvm-dev

unread,

Aug 23, 2021, 11:04:18 AM8/23/21

to Sanjay Patel, Serge Pavlov, llvm...@lists.llvm.org, Clang Dev

AFAIU, the fpexcept metadata is just a hint for optimizations, which have to respect the sequence of FP instructions. “ignore” is not a command to ask backend to mask the exception. It just tells optimizations the exceptions are not concerned by user, so that the FP instructions can be scheduled.

I think an non exception intrinsic is needed, because it is a commend to request backend not to emit instructions that may raise exception.

Thanks

Pengfei

Sanjay Patel via llvm-dev

unread,

Aug 23, 2021, 1:56:09 PM8/23/21

to Wang, Pengfei, llvm...@lists.llvm.org, Clang Dev

Why would a target be allowed to lower the constrained fcmp intrinsic with "ignore" to an operation that might raise an exception?

If that scenario does not exist, then make the definition of fcmp "ignore" stronger by saying that it *requires* a lowering that does not raise an exception.

If refining the definition of "ignore" in this case is too hard to reconcile with other FP ops, how about adding another exception mode ("none") to specify that exceptions are guaranteed not to be raised?

Kaylor, Andrew via llvm-dev

unread,

Aug 23, 2021, 2:26:38 PM8/23/21

to Sanjay Patel, Wang, Pengfei, llvm...@lists.llvm.org, Clang Dev

> Why would a target be allowed to lower the constrained fcmp intrinsic with "ignore" to an operation that might raise an exception?

Because the “fpexcept.ignore” argument means that exceptions are being ignored. That is, it doesn’t matter whether exceptions are raised or not. It’s a promise that exceptions are not unmasked or being explicitly checked in this part of the code.

The “fpexcept.ignore” argument isn’t used for the fully strict mode. It is used for two cases:

1) To get back to the “default” state (in combination with “fpround.tonearest”) when a non-constrained operation is inlined into a function that has constrained operations.

2) To enable dynamic rounding (-frounding-math) without requiring strict exception semantics.

Regarding NaN comparison support in the presence of fast-math flags, I think this is an excellent reason to have the intrinsic. I needed this for a (non-C/C++) downstream compiler before the llvm.isnan intrinsic was introduced and found that the only ways to achieve it were either to introduce an intrinsic for the comparison or to remove the ‘nnan’ flag everywhere.

-Andy

Kevin Neal via llvm-dev

unread,

Aug 23, 2021, 2:40:29 PM8/23/21

to Sanjay Patel, Wang, Pengfei, LLVM Developers, Clang Dev

The exception behavior “ignore” is supposed to be the same behavior as with our normal FP instructions. We normally _assume_ that we are running with traps _off_. Thus there’s no requirement that we avoid instructions that may trap.

Which means these are intended to be the same:

%z = fadd float %x, %y

%z = call float llvm.experimental.constrained.fadd.f64(float %x, float %y, metadata !”round.tonearest”, metadata !”fpexcept.ignore”)

If you are compiling to an environment with traps _on_ then you probably need “maytrap” or “strict” instead. Well, unless you just don’t care about continuing after a trap and thus don’t need to know exactly where a trap happened.

That said, whether or not traps are allowed is per-instruction rather than with the constrained metadata. See the “fneg” instruction which is defined to never generate traps. We should be defining isnan() to never trap. It’s surprising as all when this traps:

Z = (isnan(x) ? 0 : x * y);

That’s real code (from memory) that bit me and is part of the reason I got into this strictfp work.

From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Sanjay Patel via llvm-dev
Sent: Monday, August 23, 2021 1:56 PM
To: Wang, Pengfei <pengfe...@intel.com>
Cc: llvm...@lists.llvm.org; Clang Dev <cfe...@lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] Intrinsic llvm::isnan

EXTERNAL

Sanjay Patel via llvm-dev

unread,

Aug 23, 2021, 3:42:46 PM8/23/21

to Kaylor, Andrew, Kevin Neal, llvm...@lists.llvm.org, Clang Dev

Ok, does this edit to the LangRef make sense for the definition of "ignore":

"optimization passes may assume that the exception status flags will not be read and that floating-point exceptions **will** be masked" -->

"optimization passes may assume that the exception status flags will not be read and that floating-point exceptions **may** be masked"

It's still not clear to me if there's a benefit from having an intrinsic vs. one more exception mode ("none" or "off").

That answer might depend on how many more of these intrinsics we'll need? <cmath> has these:

isfinite()

isinf()

isnormal()

signbit()

others?

James Y Knight via llvm-dev

unread,

Aug 23, 2021, 4:46:42 PM8/23/21

to Sanjay Patel, Clang Dev, llvm...@lists.llvm.org

According to IEEE 754, the following operations are "non-computational", and should not signal if provided with an sNAN argument.

I've annotated each with -> C function/macro equivalent, taken from the table in the C standard draft.

enum class(source) -> fpclassify, signbit, issignaling
boolean isSignMinus(source) -> signbit
boolean isNormal(source) -> isnormal
boolean isFinite(source) -> isfinite
boolean isZero(source) -> iszero
boolean isSubnormal(source) -> issubnormal
boolean isInfinite(source) -> isinf
boolean isNaN(source) -> isnan
boolean isSignaling(source) -> issignaling
boolean isCanonical(source) -> iscanonical
enum radix(source) -> FLT_RADIX constant
boolean totalOrder(source, source) -> totalorder
boolean totalOrderMag(source, source) -> totalordermag

(Decimal numbers:)
boolean sameQuantum(source, source) -> samequantum

The following list are "quiet-computational" operations, which also should not signal when provided with an sNAN:

sourceFormat copy(source) -> memcpy, memmove,+(x) -- (Somewhat surprisingly, "x = y" is implementation-defined to be either a "copy" (non-signaling) or a "convertFormat" (signaling), even when x and y are the same type.)
sourceFormat negate(source) -> -(x)
sourceFormat abs(source) -> fabs

(Decimal numbers):

decimalEncoding encodeDecimal(decimal) -> encodedec
decimal decodeDecimal(decimalEncoding) -> decodedec
binaryEncoding encodeBinary(decimal) -> encodebin
decimal decodeBinary(binaryEncoding) -> decodebin

Arthur O'Dwyer via llvm-dev

unread,

Aug 23, 2021, 5:37:20 PM8/23/21

to Sanjay Patel, Clang Dev, llvm...@lists.llvm.org

On Mon, Aug 23, 2021 at 3:42 PM Sanjay Patel via cfe-dev <cfe...@lists.llvm.org> wrote:

Ok, does this edit to the LangRef make sense for the definition of "ignore":
"optimization passes may assume that the exception status flags will not be read and that floating-point exceptions **will** be masked" -->
"optimization passes may assume that the exception status flags will not be read and that floating-point exceptions **may** be masked"

I haven't been following the technical details, but in terms of the English documentation, it makes no sense to say that someone "may assume that [X] may happen." Either [X] always happens, in which case optimization passes may safely assume that it happens; or [X] never happens, in which case optimization passes may safely assume that it does not happen; or else [X] sometimes happens and sometimes doesn't, in which case optimizations passes must not assume anything about [X].

So you might say: "optimization passes may assume that the exception status flags will not be read. Floating-point exceptions might or might not be masked, depending on [____]" (and then mention the relevant variable, such as "instruction set" or "optimization level" or whatever).

HTH,

Arthur

Kaylor, Andrew via llvm-dev

unread,

Aug 23, 2021, 6:31:46 PM8/23/21

to Arthur O'Dwyer, Sanjay Patel, llvm...@lists.llvm.org, Clang Dev

I think the confusion here has to do with what it means for the optimizer to assume that “floating-point exceptions will be masked”. What I’m about to say may be specific to the x86-64-based processor behavior because that’s the one I know, but I think other architectures will have similar behavior but possibly with different terms?

If a floating point exception occurs in an operation and that exception is masked in the floating point control registers, the status bit for that exception will be set and nothing else will happen. If a floating point exception occurs in an operation and that exception is NOT masked in the floating point control registers, an exception handler will be called.

So, what the current documentation means to say is that the optimizer may assume that the status flags will not be read and that no exception handler will be called if a floating point exception occurs. Basically, it means that floating point exceptions will be ignored in all ways.

> It's still not clear to me if there's a benefit from having an intrinsic vs. one more exception mode ("none" or "off").

The key point to recognize here is that the metadata arguments to the constrained intrinsics are intended to describe assumptions that can or cannot be made. They are not intended to bring about the state that they describe. So, for example, if the rounding mode argument is set to “fpround.tonearest” that means the optimizer can assume that the processor is set to use the “tonearest” rounding mode, but it does not enforce this. If we see this flag during ISel, we can select an instruction that takes an explicit rounding mode argument and use the “tonearest” value for that argument, but we are also allowed to select an instruction that uses the rounding mode from the MXCSR register and assume that when this instruction is executed that rounding mode will be “tonearest”.

If you need an intrinsic that is guaranteed not to raise exceptions, that should be a distinct intrinsic from any similar intrinsic that may raise exceptions. See, for example, llvm.experimental.constrained.nearbyint and llvm.experimental.constrained.rint which differ only in exception behavior.

-Andy

Kevin Neal via llvm-dev

unread,

Aug 24, 2021, 8:26:27 AM8/24/21

to Kaylor, Andrew, Arthur O'Dwyer, Sanjay Patel, llvm...@lists.llvm.org, Clang Dev

It probably doesn’t help that IEEE 754-2019 uses the word “signal” to mean, paraphrased, “set a bit in a status register and continue producing a result with no trap”, and that’s in the default FP environment. So IEEE “signaling” is not Unix “signaling”.

Once one picks up on that distinction the IEEE 754 document reads a lot like Andy’s description of x86-64 behavior.

From: Kaylor, Andrew <andrew...@intel.com>
Sent: Monday, August 23, 2021 6:31 PM
To: Arthur O'Dwyer <arthur....@gmail.com>; Sanjay Patel <spa...@rotateright.com>
Cc: Kevin Neal <Kevin...@sas.com>; Wang, Pengfei <pengfe...@intel.com>; llvm...@lists.llvm.org; Clang Dev <cfe...@lists.llvm.org>
Subject: RE: [cfe-dev] [llvm-dev] Intrinsic llvm::isnan

EXTERNAL

Sanjay Patel via llvm-dev

unread,

Aug 24, 2021, 11:59:26 AM8/24/21

to Kevin Neal, Arthur O'Dwyer, Clang Dev, llvm...@lists.llvm.org

Thanks for making this clearer. So there's general consensus that we need an intrinsic to represent isnan()...

1. How many others like this do we need? (see the list that James provided a couple of mails earlier)

2. How do we introduce these and limit regressions? D104854 had clang produce isnan() for all modes simultaneously, but that leads to problems like: https://llvm.org/PR51556 . I suggest introducing these in LLVM only or in clang with strictFP modes only as a 1st step, so we're not causing perf regressions that require reverting large patches.

Serge Pavlov via llvm-dev

unread,

Aug 24, 2021, 1:34:17 PM8/24/21

to Sanjay Patel, llvm...@lists.llvm.org, Clang Dev

On Tue, Aug 24, 2021 at 10:59 PM Sanjay Patel via cfe-dev <cfe...@lists.llvm.org> wrote:

Thanks for making this clearer. So there's general consensus that we need an intrinsic to represent isnan()...
1. How many others like this do we need? (see the list that James provided a couple of mails earlier)

We need all classification functions: isnan, isinf, isfinite, isnormal, issubnormal, iszero, fpclassify. In strict exception mode even iszero requires special treatment, it cannot be implemented as compare with zero, because the argument may be a signaling NaN.

2. How do we introduce these and limit regressions? D104854 had clang produce isnan() for all modes simultaneously, but that leads to problems like: https://llvm.org/PR51556 . I suggest introducing these in LLVM only or in clang with strictFP modes only as a 1st step, so we're not causing perf regressions that require reverting large patches.

The patch is already 3 weeks in the tree. PR51556 is the only problem so far and it is a missed optimization, not malfunction. Besides, as it is said in the ticket, "the SLP call vectorization fails as we don't properly account for calls that have different return/arg types", so the problem already existed, this intrinsic only elucidated it. I think if in a couple of weeks no new problems will be found, work in this direction can be continued, as if there were something broken, it would be already reported.

_______________________________________________

Roman Lebedev via llvm-dev

unread,

Aug 24, 2021, 1:54:03 PM8/24/21

to Serge Pavlov, llvm...@lists.llvm.org, Clang Dev

Regardless of everything, i would like to see a patch that restores
the -ffast-math handling, and *then* the RFC on what the is-nan check
should do when -ffast-math is present.
It is more than possible that the RFC will succeed,
but i don't think a change like that should happen the way it did.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 was mentioned as
motivational, but i don't see any resolution there,
it's not even in "confirmed" state.

The change landed *after* the branch, so i think we are not too
constrained on the timeline for it. If there is no RFC before Aug 30'th,
i will be posting a patch to revert said change myself.

Roman.

Serge Pavlov via llvm-dev

unread,

Aug 24, 2021, 2:28:38 PM8/24/21

to Roman Lebedev, llvm...@lists.llvm.org, Clang Dev

There is no problem to restore the previous behavior if -ffast-math is specified. But why cannot this RFC be used for the discussion of this behavior?

Making a real check instead of optimizing out the function seems safe, as it does not break functionality. Could you please share, what is your concern?

Thanks,
--Serge

James Y Knight via llvm-dev

unread,

Aug 24, 2021, 5:25:37 PM8/24/21

to Roman Lebedev, llvm...@lists.llvm.org, Clang Dev

On Tue, Aug 24, 2021 at 1:53 PM Roman Lebedev via cfe-dev <cfe...@lists.llvm.org> wrote:

Regardless of everything, i would like to see a patch that restores
the -ffast-math handling, and *then* the RFC on what the is-nan check
should do when -ffast-math is present.
It is more than possible that the RFC will succeed,
but i don't think a change like that should happen the way it did.

I find the rationale to be convincing, as to the need for a change. But, the scope of the proposal is too narrow. We cannot discuss fast-math semantics changes only for "isnan", it needs to be in the context of the desired behavior for all operations -- the RFC should cover the entire set of changes we want to eventually make, even if isnan is the only thing implemented so far. Discussing this greater scope could result in a different desired implementation, rather than simply adding "llvm.isnan" intrinsic.

Yet, even with that expanded scope, the two halves of the proposal are still going to be closely linked, so I suspect it still makes sense to discuss both the strict-fp and fast-math changes in a single RFC.

Anyhow, for the fast-math section, I believe the proposed semantics ought to be:

The -ffinite-math-only and -fno-signed-zeros options do not impact the ability to accurately load, store, copy, or pass or return such values from general function calls. They also do not impact any of the "non-computational" and "quiet-computational" IEEE-754 operations, which includes classification functions (fpclassify, signbit, isinf/isnan/etc), sign-modification (copysign, fabs, and negation `-(x)`), as well as the totalorder and totalordermag functions. Those correctly handle NaN, Inf, and signed zeros even when the flags are in effect. These flags do affect the behavior of other expressions and math standard-library calls, as well as comparison operations.

I would not expect this to have an actual negative impact on the performance benefit of those flags, since the optimization benefits mainly arise from comparisons and the general computation instructions which are unchanged.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 was mentioned as
motivational, but i don't see any resolution there,
it's not even in "confirmed" state.

I agree, this is not at all clear evidence as to GCC's position on the matter.

Serge Pavlov via llvm-dev

unread,

Aug 25, 2021, 7:13:08 AM8/25/21

to James Y Knight, llvm...@lists.llvm.org, Clang Dev

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 was mentioned as
motivational, but i don't see any resolution there,
it's not even in "confirmed" state.

I agree, this is not at all clear evidence as to GCC's position on the matter.

Sure, but that demonstrates that they are inclined to such interpretation.

GCC mail list has relevant discussion on this topic: https://gcc.gnu.org/pipermail/gcc-patches/2020-April/544622.html. They tried to make the documentation about -ffinite-math-only clearer. The discussion was based on a view that if -ffinite-math-only was specified, a floating point value cannot be NaN or Inf. During the discussion an interesting observation was made:

… double and double under -ffinite-math-only are different types. Any function call from
one world to the other is dangerous. Inline functions translated in different
TUs compiled with different math flags violate the ODR.

If different parts of a program are compiled with and without -ffinite-math-only doubles of different flavors can intermix. In the code compiled with -ffast-math a user cannot check assumption about the value by calling "assert(!isnan(x));" because `isnan` is replaced with expected value due to optimizations. The only usable solution in this case could be UBSAN, which is a much heavier solution.

Two different flavors of double require different mangling. Template specializations also must be different, in particular, specializations of `std::numeric_limits<T>` must be different for these two double types, the methods `has_infinity` and `has_quite_NaN` must return different values.

They agree that it is profitable to treat NaNs and Infs separately. In this case there would be 4 different double types. It is not clear what to do with constexpr expressions, should the compiler treat using NaN/Inf as undefined behavior even if the ultimate result is finite?

Participants agree that such optimizations are not good:

… the example of simplifying x * 0 to 0 is about preserving NaNs
during expression simplification when the FPU would. I think these
kind of optimizations are what originally was intended to be allowed
with -ffinite-math-only - that we started to simplify isnan(x) to false
was extending the scope and that change wasn't uncontested since
it makes -ffinite-math-only less useful to some people.

Eventually they came to conclusion:

… if we want a version of
-ffinite-math-only that's well-behaved in language terms (including
in terms of the macros that are defined and in the values of
numeric_limits), perhaps this should be an official (optional) C/C++
extension that defines what the rules are.

Thanks,

--Serge

Sanjay Patel via llvm-dev

unread,

Sep 1, 2021, 11:07:17 AM9/1/21

to Serge Pavlov, llvm...@lists.llvm.org, Clang Dev

I'll take a shot at summarizing where we are. Correct as needed:

1. We want isnan() for strict-FP support. The arguments are similar to why we added "fneg" as a real instruction.

2. We also need a bunch of other FP classify functions for strict-FP support to properly deal with SNAN: isinf, isfinite, isnormal, issubnormal, iszero, fpclassify.

3. There's a 2nd motivation to use at least some of these functions in the regular LLVM FP environment with fast-math. For isnan(), this boils down to is "fcmp ord nnan" always true, poison, or unknown? So the root cause might really be that we shouldn't have fast-math-flags on fcmp (see https://bugs.llvm.org/show_bug.cgi?id=38086 and related bugs).

4. The change to the regular FP env requires adding/changing documentation to the LLVM LangRef and clang (James posted a draft in a reply on 8/24).

5. The change to the regular FP env would make clang/LLVM behave differently than GCC as noted here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 . That bug report has been cited as motivation, but IIUC, the last comment says everything is behaving as expected and the report should be closed pending a C/C++ language change.

Several people have suggested reverting the original patch, so we can address this both at larger scale (so we have a clear plan for the other functions that are needed) and with smaller steps (so we don't break things).

An argument was made that because the original patch has been in main for a few weeks, and there are no known bug reports, that it is fine as-is. I disagree with that: it's not easy to recognize the potential harm (likely small perf regression), not many users live/test continuously against trunk for FP fast-math perf, and it's not easy to file bugs against LLVM.

_______________________________________________

Serge Pavlov via llvm-dev

unread,

Sep 1, 2021, 1:27:01 PM9/1/21

to Sanjay Patel, llvm...@lists.llvm.org, Clang Dev

Thank you for summarizing!

Let's divide topics.

1. We want isnan() for strict-FP support. The arguments are similar to why we added "fneg" as a real instruction.
2. We also need a bunch of other FP classify functions for strict-FP support to properly deal with SNAN: isinf, isfinite, isnormal, issubnormal, iszero, fpclassify.

Correct. It looks like we have consensus on this topic and agree that `isnan` as well as other classification intrinsics are necessary. If it is so then we discuss only behavior of the intrinsic in fast-math mode.

The problem with fast-math is that it is unclear what guarantees the user provides to the compiler by using this option. It is not clearly documented anywhere. Implicit understanding is usually one of these:

1. Values of type `double` never are NaNs.

2. Arithmetic operations never get or produce NaN values.

If we adhere to the first approach, then `isnan` may be optimized out, because NaNs are not in the set of possible values for `double`. This is the understanding used in the aforementioned discussion in the GCC mail list. As follows from the discussion, this is a pretty fruitless way. It means `double` in `fast-math` compilation is a different type than regular `double`, they relate to each other like C enumerations or Pascal subranges. In this approach we must answer the questions:

- How would the user check data that come from a function that was compiled without `fast-math`? They cannot write `assert(isnan(x))` if `isnan` is optimized out and there is no conversion between two different flavors of `double`.

- What should return `std::numeric_limits<double>::has_quiet_NaN()`?

- What body should have this function if it is used in a program where some files are compiled with `fast-math` and some without?

- Should inlining of a function compiled with `fast-math` to a function compiled without it be prohibited in inliner?

- Should `std::isnan(std::numeric_limits<float>::quiet_NaN())` be true?

In the cited GCC mail discussion it was found that consistent implementation of this approach requires tremendous efforts and eventually requires changes of C standard.

In the second approach the guarantees are similar to those implied in integer division - we assume the divisor is not zero. It is consistent with llvm IR, where fast math flags are properties of an operation, not a type. None of the listed problems do not rise in this case. At the same time it allows optimizations like `0 * x -> 0`, which was the main motivation for this mode.There must be severe reasons to reject this approach in favour of the first.

5. The change to the regular FP env would make clang/LLVM behave differently than GCC as noted here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 . That bug report has been cited as motivation, but IIUC, the last comment says everything is behaving as expected and the report should be closed pending a C/C++ language change.

I know nothing about such a pending C/C++ language change. Could you please share the references?

An argument was made that because the original patch has been in main for a few weeks, and there are no known bug reports, that it is fine as-is. I disagree with that: it's not easy to recognize the potential harm (likely small perf regression), not many users live/test continuously against trunk for FP fast-math perf, and it's not easy to file bugs against LLVM.

Note that leaving `isnan` in the code compiled with `fast-math` is at most a missed optimization. It **does not** break semantics. Hardly it brings noticeable performance regression. It could if the code had many calls to 'isnan', but it is unlikely for real code that is compiled with `fast-math`. Quite the contrary, there are users who have to use integer arithmetics to check for NaN values, because they come from outside (memory, other modules), where `fast-math` is not used. Users expect that `isnan` behaves according to C standard but it does not due to this optimization.

Thanks,
--Serge

Sanjay Patel via llvm-dev

unread,

Sep 1, 2021, 4:03:31 PM9/1/21

to Serge Pavlov, llvm...@lists.llvm.org, Clang Dev

On Wed, Sep 1, 2021 at 1:26 PM Serge Pavlov <sepa...@gmail.com> wrote:

1. We want isnan() for strict-FP support. The arguments are similar to why we added "fneg" as a real instruction.
2. We also need a bunch of other FP classify functions for strict-FP support to properly deal with SNAN: isinf, isfinite, isnormal, issubnormal, iszero, fpclassify.

Correct. It looks like we have consensus on this topic and agree that `isnan` as well as other classification intrinsics are necessary. If it is so then we discuss only behavior of the intrinsic in fast-math mode.

This is ignoring the fact that we need more general functionality for strict-FP. Do we want N different intrinsics to accomplish that? Or is one intrinsic that takes parameters and/or returns some enum value a better choice? Aren't we going to need that more general function for 'fpclassify' anyway?

5. The change to the regular FP env would make clang/LLVM behave differently than GCC as noted here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 . That bug report has been cited as motivation, but IIUC, the last comment says everything is behaving as expected and the report should be closed pending a C/C++ language change.

I know nothing about such a pending C/C++ language change. Could you please share the references?

I don't either. I'm just reading the comment there:

"Thus, it seems the status quo is working as intended. It's just that we're missing a standard interface to ask for behavior conformance. Any progress on this issue must go via WG21."

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949#c8

An argument was made that because the original patch has been in main for a few weeks, and there are no known bug reports, that it is fine as-is. I disagree with that: it's not easy to recognize the potential harm (likely small perf regression), not many users live/test continuously against trunk for FP fast-math perf, and it's not easy to file bugs against LLVM.

Note that leaving `isnan` in the code compiled with `fast-math` is at most a missed optimization. It **does not** break semantics. Hardly it brings noticeable performance regression. It could if the code had many calls to 'isnan', but it is unlikely for real code that is compiled with `fast-math`. Quite the contrary, there are users who have to use integer arithmetics to check for NaN values, because they come from outside (memory, other modules), where `fast-math` is not used. Users expect that `isnan` behaves according to C standard but it does not due to this optimization.

I haven't seen any evidence that users are expecting the new behavior, and I don't know how the C standard applies to code that explicitly exempts itself from that standard by using -ffast-math.

In any case, the patch that introduced the isnan intrinsic changed default FP behavior/perf without the appropriate changes to the LLVM LangRef or clang manual to even let users know that things changed. We could also argue that we're worse off in the current state because isnan changed, but isinf did not. At least before, the behavior was consistent?

Roman Lebedev via llvm-dev

unread,

Sep 2, 2021, 6:56:53 AM9/2/21

to Sanjay Patel, llvm...@lists.llvm.org, Clang Dev

I've gone ahead and pulled the plug.
I'm eagerly looking forward to seeing an RFC posted to llvm-dev+cfe-dev.
I think the main underlying thing this addresses (strict-fp) is important.

Roman

Serge Pavlov via llvm-dev

unread,

Sep 3, 2021, 2:02:14 AM9/3/21

to Sanjay Patel, llvm...@lists.llvm.org, Clang Dev

On Thu, Sep 2, 2021 at 3:03 AM Sanjay Patel <spa...@rotateright.com> wrote:

On Wed, Sep 1, 2021 at 1:26 PM Serge Pavlov <sepa...@gmail.com> wrote:

1. We want isnan() for strict-FP support. The arguments are similar to why we added "fneg" as a real instruction.
2. We also need a bunch of other FP classify functions for strict-FP support to properly deal with SNAN: isinf, isfinite, isnormal, issubnormal, iszero, fpclassify.

Correct. It looks like we have consensus on this topic and agree that `isnan` as well as other classification intrinsics are necessary. If it is so then we discuss only behavior of the intrinsic in fast-math mode.

This is ignoring the fact that we need more general functionality for strict-FP. Do we want N different intrinsics to accomplish that? Or is one intrinsic that takes parameters and/or returns some enum value a better choice? Aren't we going to need that more general function for 'fpclassify' anyway?

I think separate intrinsics for each function is a better solution. It allows effective implementation and expresses semantics more clearly. Universal functions like `fpclassify` or hypothetical `is_of_fp_class(FP_SUBNORMAL)` involve constants which are target-dependent and require more effort in codegen to produce better code.

5. The change to the regular FP env would make clang/LLVM behave differently than GCC as noted here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 . That bug report has been cited as motivation, but IIUC, the last comment says everything is behaving as expected and the report should be closed pending a C/C++ language change.

I know nothing about such a pending C/C++ language change. Could you please share the references?

I don't either. I'm just reading the comment there:
"Thus, it seems the status quo is working as intended. It's just that we're missing a standard interface to ask for behavior conformance. Any progress on this issue must go via WG21."
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949#c8

AFAIK there is no such pending proposal neither in WG21 nor in WG14.

An argument was made that because the original patch has been in main for a few weeks, and there are no known bug reports, that it is fine as-is. I disagree with that: it's not easy to recognize the potential harm (likely small perf regression), not many users live/test continuously against trunk for FP fast-math perf, and it's not easy to file bugs against LLVM.

Note that leaving `isnan` in the code compiled with `fast-math` is at most a missed optimization. It **does not** break semantics. Hardly it brings noticeable performance regression. It could if the code had many calls to 'isnan', but it is unlikely for real code that is compiled with `fast-math`. Quite the contrary, there are users who have to use integer arithmetics to check for NaN values, because they come from outside (memory, other modules), where `fast-math` is not used. Users expect that `isnan` behaves according to C standard but it does not due to this optimization.

I haven't seen any evidence that users are expecting the new behavior, and I don't know how the C standard applies to code that explicitly exempts itself from that standard by using -ffast-math.

-fast-math is just an optimization option, it is a hint to the compiler, which can generate faster code in this case. It should not deny the C standard.

As for user discontent about such optimization, actually there are bug reports and discussions in forums. For example, GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50724. In the first posts there the submitter described their concern about using isnan with fast-math, which is typical for such case. In https://stackoverflow.com/questions/22931147/stdisinf-does-not-work-with-ffast-math-how-to-check-for-infinity there is a recomendation to turn off temporarily fast-math when doing such check. `isnan` has become useless with fast-math.

Library developers also alleviate the problem. According to C standard, `isnan` is a macro and can be expanded in different ways depending on configuration options. An external function that implement `isnan` is compiled obviously without fast-math and allows users to work around this compiler behavior.

In any case, the patch that introduced the isnan intrinsic changed default FP behavior/perf without the appropriate changes to the LLVM LangRef or clang manual to even let users know that things changed. We could also argue that we're worse off in the current state because isnan changed, but isinf did not. At least before, the behavior was consistent?

That patch didn't change anything that LLVM LangRef or clang manual states. fast-math is an optimization hint, a correct program that satisfies requirements for -ffast-math must behave identically when compiled with or without this option. In the worst case leaving the function call in code may be considered as missed optimization but not changed functionality. There was nothing that should be mentioned in the documentation. Of course, remarks in documentation may be useful.

There is no consistency in using `isnan`. If this function is taken from libc implementation, it makes a real check. If compiler builtin is used, it disappears. Leaving `isnan` in the code would restore the consistency.

Reply all

Reply to author

Forward