I do not believe we should conflate StrictFP support, and
`-ffast-math` handling, these are two separate/separatable concerns.
As for the latter, right now i'm not convinced that we should
second-guess/override explicit user request.
This is inconsistent, and does not match how at least the GCC deals with it.
I think changing the status-quo (before said patch) should be a separate RFC,
and that change should be undone until after that RFC is accepted.
As for the latter, the main point of confusion is,
why is `@llvm.isnan` still used in non-StrictFP code?
The argument that we need `@llvm.isnan` because we *might* transition
in and out of StrictFP section does not seem to hold for me, because
https://llvm.org/docs/LangRef.html#constrainedfp says:
> If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR.
So presumably when codegen'ing a function, we already know that we
will use StrictFP ops, and that should be the knob to use `@llvm.isnan`,
i think.
Roman
> _______________________________________________
> cfe-dev mailing list
> cfe...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Thank you for posting the RFC!
I do not believe we should conflate StrictFP support, and
`-ffast-math` handling, these are two separate/separatable concerns.
As for the latter, right now i'm not convinced that we should
second-guess/override explicit user request.
This is inconsistent, and does not match how at least the GCC deals with it.
I think changing the status-quo (before said patch) should be a separate RFC,
and that change should be undone until after that RFC is accepted.
As for the latter, the main point of confusion is,
why is `@llvm.isnan` still used in non-StrictFP code?
The argument that we need `@llvm.isnan` because we *might* transition
in and out of StrictFP section does not seem to hold for me, because
https://llvm.org/docs/LangRef.html#constrainedfp says:
> If any FP operation in a function is constrained then they all must be constrained. This is required for correct LLVM IR.
AFAIU, the fpexcept metadata is just a hint for optimizations, which have to respect the sequence of FP instructions. “ignore” is not a command to ask backend to mask the exception. It just tells optimizations the exceptions are not concerned by user, so that the FP instructions can be scheduled.
I think an non exception intrinsic is needed, because it is a commend to request backend not to emit instructions that may raise exception.
Thanks
Pengfei
> Why would a target be allowed to lower the constrained fcmp intrinsic with "ignore" to an operation that might raise an exception?
Because the “fpexcept.ignore” argument means that exceptions are being ignored. That is, it doesn’t matter whether exceptions are raised or not. It’s a promise that exceptions are not unmasked or being explicitly checked in this part of the code.
The “fpexcept.ignore” argument isn’t used for the fully strict mode. It is used for two cases:
1) To get back to the “default” state (in combination with “fpround.tonearest”) when a non-constrained operation is inlined into a function that has constrained operations.
2) To enable dynamic rounding (-frounding-math) without requiring strict exception semantics.
Regarding NaN comparison support in the presence of fast-math flags, I think this is an excellent reason to have the intrinsic. I needed this for a (non-C/C++) downstream compiler before the llvm.isnan intrinsic was introduced and found that the only ways to achieve it were either to introduce an intrinsic for the comparison or to remove the ‘nnan’ flag everywhere.
-Andy
The exception behavior “ignore” is supposed to be the same behavior as with our normal FP instructions. We normally _assume_ that we are running with traps _off_. Thus there’s no requirement that we avoid instructions that may trap.
Which means these are intended to be the same:
%z = fadd float %x, %y
%z = call float llvm.experimental.constrained.fadd.f64(float %x, float %y, metadata !”round.tonearest”, metadata !”fpexcept.ignore”)
If you are compiling to an environment with traps _on_ then you probably need “maytrap” or “strict” instead. Well, unless you just don’t care about continuing after a trap and thus don’t need to know exactly where a trap happened.
That said, whether or not traps are allowed is per-instruction rather than with the constrained metadata. See the “fneg” instruction which is defined to never generate traps. We should be defining isnan() to never trap. It’s surprising as all when this traps:
Z = (isnan(x) ? 0 : x * y);
That’s real code (from memory) that bit me and is part of the reason I got into this strictfp work.
From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of
Sanjay Patel via llvm-dev
Sent: Monday, August 23, 2021 1:56 PM
To: Wang, Pengfei <pengfe...@intel.com>
Cc: llvm...@lists.llvm.org; Clang Dev <cfe...@lists.llvm.org>
Subject: Re: [llvm-dev] [cfe-dev] Intrinsic llvm::isnan
EXTERNAL
enum class(source) -> fpclassify, signbit, issignalingboolean isSignMinus(source) -> signbitboolean isNormal(source) -> isnormalboolean isFinite(source) -> isfiniteboolean isZero(source) -> iszeroboolean isSubnormal(source) -> issubnormalboolean isInfinite(source) -> isinfboolean isNaN(source) -> isnanboolean isSignaling(source) -> issignalingboolean isCanonical(source) -> iscanonicalenum radix(source) -> FLT_RADIX constantboolean totalOrder(source, source) -> totalorderboolean totalOrderMag(source, source) -> totalordermag(Decimal numbers:)boolean sameQuantum(source, source) -> samequantum
sourceFormat copy(source) -> memcpy, memmove,+(x) -- (Somewhat surprisingly, "x = y" is implementation-defined to be either a "copy" (non-signaling) or a "convertFormat" (signaling), even when x and y are the same type.)sourceFormat negate(source) -> -(x)sourceFormat abs(source) -> fabs
(Decimal numbers):
decimalEncoding encodeDecimal(decimal) -> encodedecdecimal decodeDecimal(decimalEncoding) -> decodedecbinaryEncoding encodeBinary(decimal) -> encodebindecimal decodeBinary(binaryEncoding) -> decodebin
Ok, does this edit to the LangRef make sense for the definition of "ignore":"optimization passes may assume that the exception status flags will not be read and that floating-point exceptions **will** be masked" -->"optimization passes may assume that the exception status flags will not be read and that floating-point exceptions **may** be masked"
I think the confusion here has to do with what it means for the optimizer to assume that “floating-point exceptions will be masked”. What I’m about to say may be specific to the x86-64-based processor behavior because that’s the one I know, but I think other architectures will have similar behavior but possibly with different terms?
If a floating point exception occurs in an operation and that exception is masked in the floating point control registers, the status bit for that exception will be set and nothing else will happen. If a floating point exception occurs in an operation and that exception is NOT masked in the floating point control registers, an exception handler will be called.
So, what the current documentation means to say is that the optimizer may assume that the status flags will not be read and that no exception handler will be called if a floating point exception occurs. Basically, it means that floating point exceptions will be ignored in all ways.
> It's still not clear to me if there's a benefit from having an intrinsic vs. one more exception mode ("none" or "off").
The key point to recognize here is that the metadata arguments to the constrained intrinsics are intended to describe assumptions that can or cannot be made. They are not intended to bring about the state that they describe. So, for example, if the rounding mode argument is set to “fpround.tonearest” that means the optimizer can assume that the processor is set to use the “tonearest” rounding mode, but it does not enforce this. If we see this flag during ISel, we can select an instruction that takes an explicit rounding mode argument and use the “tonearest” value for that argument, but we are also allowed to select an instruction that uses the rounding mode from the MXCSR register and assume that when this instruction is executed that rounding mode will be “tonearest”.
If you need an intrinsic that is guaranteed not to raise exceptions, that should be a distinct intrinsic from any similar intrinsic that may raise exceptions. See, for example, llvm.experimental.constrained.nearbyint and llvm.experimental.constrained.rint which differ only in exception behavior.
-Andy
It probably doesn’t help that IEEE 754-2019 uses the word “signal” to mean, paraphrased, “set a bit in a status register and continue producing a result with no trap”, and that’s in the default FP environment. So IEEE “signaling” is not Unix “signaling”.
Once one picks up on that distinction the IEEE 754 document reads a lot like Andy’s description of x86-64 behavior.
From: Kaylor, Andrew <andrew...@intel.com>
Sent: Monday, August 23, 2021 6:31 PM
To: Arthur O'Dwyer <arthur....@gmail.com>; Sanjay Patel <spa...@rotateright.com>
Cc: Kevin Neal <Kevin...@sas.com>; Wang, Pengfei <pengfe...@intel.com>; llvm...@lists.llvm.org; Clang Dev <cfe...@lists.llvm.org>
Subject: RE: [cfe-dev] [llvm-dev] Intrinsic llvm::isnan
EXTERNAL
Thanks for making this clearer. So there's general consensus that we need an intrinsic to represent isnan()...1. How many others like this do we need? (see the list that James provided a couple of mails earlier)
2. How do we introduce these and limit regressions? D104854 had clang produce isnan() for all modes simultaneously, but that leads to problems like: https://llvm.org/PR51556 . I suggest introducing these in LLVM only or in clang with strictFP modes only as a 1st step, so we're not causing perf regressions that require reverting large patches.
_______________________________________________
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 was mentioned as
motivational, but i don't see any resolution there,
it's not even in "confirmed" state.
The change landed *after* the branch, so i think we are not too
constrained on the timeline for it. If there is no RFC before Aug 30'th,
i will be posting a patch to revert said change myself.
Roman.
Regardless of everything, i would like to see a patch that restores
the -ffast-math handling, and *then* the RFC on what the is-nan check
should do when -ffast-math is present.
It is more than possible that the RFC will succeed,
but i don't think a change like that should happen the way it did.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 was mentioned as
motivational, but i don't see any resolution there,
it's not even in "confirmed" state.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 was mentioned as
motivational, but i don't see any resolution there,
it's not even in "confirmed" state.I agree, this is not at all clear evidence as to GCC's position on the matter.
… double and double under -ffinite-math-only are different types. Any function call from
one world to the other is dangerous. Inline functions translated in different
TUs compiled with different math flags violate the ODR.
… the example of simplifying x * 0 to 0 is about preserving NaNs
during expression simplification when the FPU would. I think these
kind of optimizations are what originally was intended to be allowed
with -ffinite-math-only - that we started to simplify isnan(x) to false
was extending the scope and that change wasn't uncontested since
it makes -ffinite-math-only less useful to some people.
… if we want a version of
-ffinite-math-only that's well-behaved in language terms (including
in terms of the macros that are defined and in the values of
numeric_limits), perhaps this should be an official (optional) C/C++
extension that defines what the rules are.
_______________________________________________
1. We want isnan() for strict-FP support. The arguments are similar to why we added "fneg" as a real instruction.
2. We also need a bunch of other FP classify functions for strict-FP support to properly deal with SNAN: isinf, isfinite, isnormal, issubnormal, iszero, fpclassify.
5. The change to the regular FP env would make clang/LLVM behave differently than GCC as noted here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 . That bug report has been cited as motivation, but IIUC, the last comment says everything is behaving as expected and the report should be closed pending a C/C++ language change.
An argument was made that because the original patch has been in main for a few weeks, and there are no known bug reports, that it is fine as-is. I disagree with that: it's not easy to recognize the potential harm (likely small perf regression), not many users live/test continuously against trunk for FP fast-math perf, and it's not easy to file bugs against LLVM.
1. We want isnan() for strict-FP support. The arguments are similar to why we added "fneg" as a real instruction.
2. We also need a bunch of other FP classify functions for strict-FP support to properly deal with SNAN: isinf, isfinite, isnormal, issubnormal, iszero, fpclassify.Correct. It looks like we have consensus on this topic and agree that `isnan` as well as other classification intrinsics are necessary. If it is so then we discuss only behavior of the intrinsic in fast-math mode.
5. The change to the regular FP env would make clang/LLVM behave differently than GCC as noted here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 . That bug report has been cited as motivation, but IIUC, the last comment says everything is behaving as expected and the report should be closed pending a C/C++ language change.I know nothing about such a pending C/C++ language change. Could you please share the references?
An argument was made that because the original patch has been in main for a few weeks, and there are no known bug reports, that it is fine as-is. I disagree with that: it's not easy to recognize the potential harm (likely small perf regression), not many users live/test continuously against trunk for FP fast-math perf, and it's not easy to file bugs against LLVM.Note that leaving `isnan` in the code compiled with `fast-math` is at most a missed optimization. It **does not** break semantics. Hardly it brings noticeable performance regression. It could if the code had many calls to 'isnan', but it is unlikely for real code that is compiled with `fast-math`. Quite the contrary, there are users who have to use integer arithmetics to check for NaN values, because they come from outside (memory, other modules), where `fast-math` is not used. Users expect that `isnan` behaves according to C standard but it does not due to this optimization.
Roman
On Wed, Sep 1, 2021 at 1:26 PM Serge Pavlov <sepa...@gmail.com> wrote:1. We want isnan() for strict-FP support. The arguments are similar to why we added "fneg" as a real instruction.
2. We also need a bunch of other FP classify functions for strict-FP support to properly deal with SNAN: isinf, isfinite, isnormal, issubnormal, iszero, fpclassify.Correct. It looks like we have consensus on this topic and agree that `isnan` as well as other classification intrinsics are necessary. If it is so then we discuss only behavior of the intrinsic in fast-math mode.This is ignoring the fact that we need more general functionality for strict-FP. Do we want N different intrinsics to accomplish that? Or is one intrinsic that takes parameters and/or returns some enum value a better choice? Aren't we going to need that more general function for 'fpclassify' anyway?
5. The change to the regular FP env would make clang/LLVM behave differently than GCC as noted here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949 . That bug report has been cited as motivation, but IIUC, the last comment says everything is behaving as expected and the report should be closed pending a C/C++ language change.I know nothing about such a pending C/C++ language change. Could you please share the references?I don't either. I'm just reading the comment there:"Thus, it seems the status quo is working as intended. It's just that we're missing a standard interface to ask for behavior conformance. Any progress on this issue must go via WG21."
An argument was made that because the original patch has been in main for a few weeks, and there are no known bug reports, that it is fine as-is. I disagree with that: it's not easy to recognize the potential harm (likely small perf regression), not many users live/test continuously against trunk for FP fast-math perf, and it's not easy to file bugs against LLVM.Note that leaving `isnan` in the code compiled with `fast-math` is at most a missed optimization. It **does not** break semantics. Hardly it brings noticeable performance regression. It could if the code had many calls to 'isnan', but it is unlikely for real code that is compiled with `fast-math`. Quite the contrary, there are users who have to use integer arithmetics to check for NaN values, because they come from outside (memory, other modules), where `fast-math` is not used. Users expect that `isnan` behaves according to C standard but it does not due to this optimization.I haven't seen any evidence that users are expecting the new behavior, and I don't know how the C standard applies to code that explicitly exempts itself from that standard by using -ffast-math.
In any case, the patch that introduced the isnan intrinsic changed default FP behavior/perf without the appropriate changes to the LLVM LangRef or clang manual to even let users know that things changed. We could also argue that we're worse off in the current state because isnan changed, but isinf did not. At least before, the behavior was consistent?