[LLVMdev] [RFC] Extend LLVM IR to express "fast-math" at a per-instruction level


Michael Ilseman

Oct 29, 2012, 7:34:47 PM
to llv...@cs.uiuc.edu
Introduction
---

LLVM IR currently does not have any support for specifying fine-grained control
over relaxing floating point requirements for the optimizer. The below is a
proposal to extend floating point IR instructions to support a number of flags
that a creator of IR can use to allow for greater optimizations when
desired. Such changes are sometimes referred to as fast-math, but this proposal
is about finer-grained specifications at a per-instruction level.


What this doesn't address
---

Default behavior is retained, and this proposal only addresses relaxing
restrictions. For example, the assumption of the default rounding mode remains
untouched. Discussion of changing the default behavior of LLVM, or of allowing
for more restrictive behavior, is outside the scope of this proposal. This
proposal also does not address the behavior of denormals, which is more of a
backend concern.

Specifying exact precision control or requirements is outside the scope of this
proposal, and can probably be handled with the existing metadata implementation.

This proposal covers changes to and optimizations over LLVM IR, and changes to
codegen are outside the scope of this proposal. The flags described in the next
section exist only at the IR level, and will not be propagated into codegen or
the SelectionDAG.


Flags
---
no NaNs (N)
- ignore the existence of NaNs when convenient
no Infs (I)
- ignore the existence of Infs when convenient
no signed zeros (S)
- ignore the existence of negative zero when convenient
allow fusion (F)
- fuse FP operations when convenient, despite possible differences in rounding
(e.g. form FMAs)
unsafe algebra (A)
- allow for algebraically equivalent transformations that may dramatically
change results in floating point. (e.g. reassociation)

Throughout I'll refer to these options in their short-hand, e.g. 'A'.
Internally, these flags are to reside in SubclassData.


======
Question:

Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below semilattice of sensible relations:
A > S > I > N
A > F
Meaning that 'A' implies all the others, 'S' implies 'I' and 'N', etc.

It might make sense to change the S, I, and N options to be some kind of finite
option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
is still the case that A would imply pretty much everything else.
======


Changes to LangRef
---

Change the definitions of the floating point arithmetic operations; below is how
fadd will change:

'fadd' Instruction
Syntax:

<result> = fadd {flag}* <ty> <op1>, <op2> ; yields {ty}:result
...
Semantics:
...
flag can be one of the following optimizer hints to enable otherwise unsafe
floating point optimizations:
N: no NaNs - ignore the existence of NaNs when convenient
I: no infs - ignore the existence of Infs when convenient
S: no signed zeros - ignore the existence of negative zero when convenient
F: allow fusion - fuse FP operations when convenient, despite possible
differences in rounding
A: unsafe algebra - allow for algebraically equivalent transformations that
may dramatically change results in floating point.
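
As a purely illustrative sketch (the concrete textual spelling of the flag
tokens is not fixed by this proposal; the short-hand letters are assumed
here), a flagged fadd might look like:

  ; add that may assume its operands are neither NaN nor +/-Inf
  %sum = fadd N I float %a, %b

  ; fully relaxed add; 'A' permits reassociation and similar rewrites
  %acc = fadd A float %sum, %c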


Changes to optimizations
---

Optimizations should be allowed to perform unsafe optimizations provided the
instructions involved have the corresponding restrictions relaxed. When
combining instructions, optimizations should do what makes sense to not remove
restrictions that previously existed (commonly, a bitwise-AND of the flags).
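
For example (illustrative only, again assuming the short-hand letters as the
textual flag syntax), folding the two subtracts in 0 - (x - y) should keep
only the flags common to both instructions:

  %t = fsub N S float %x, %y
  %r = fsub S float 0.0, %t
    ==>
  %r = fsub S float %y, %x    ; 'N' is dropped; only the shared 'S' survives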

Below are some example optimizations that could be allowed with the given
relaxations.

N - no NaNs
x == x ==> true

S - no signed zeros
x - 0 ==> x
0 - (x - y) ==> y - x

NS - no signed zeros AND no NaNs
x * 0 ==> 0

NI - no infs AND no NaNs
x - x ==> 0
Inf > x ==> true

A - unsafe-algebra
Reassociation
(x + C1) + C2 ==> x + (C1 + C2)
Redistribution
(x * C) + x ==> x * (C+1)
(x * C) + (x + x) ==> x * (C + 2)
Reciprocal
x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.
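
As a concrete sketch of that requirement (using the short-hand letters as a
stand-in for whatever textual syntax is chosen), the x * 0 fold above would
only fire when the multiply carries both needed flags:

  %r = fmul N S float %x, 0.0   ; N and S present: uses of %r may simply be
                                ; replaced with the constant 0.0
  %s = fmul S float %x, 0.0     ; no 'N': %x could be NaN, so the multiply
                                ; must be kept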

I propose to expand -instsimplify and -instcombine to perform these kinds of
optimizations. -reassociate will be expanded to reassociate floating point
operations when allowed. Similar to existing behavior regarding integer
wrapping, -early-cse will not CSE FP operations with mismatched flags, while
-gvn will (conservatively). This allows later optimizations to optimize the
expressions independently between runs of -early-cse and -gvn.
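
For instance (an illustrative sketch, analogous to how nsw/nuw are handled
today), given two otherwise identical adds with mismatched flags:

  %a = fadd N float %x, %y
  %b = fadd float %x, %y

-early-cse would leave both instructions alone, while -gvn could replace %b's
uses with %a after conservatively dropping the 'N' flag from %a.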


Changes to frontends
---

Frontends are free to generate code with flags set as they desire. Frontends
should continue to call llc with their desired options, as the flags apply only
at the IR level and not at codegen or the SelectionDAGs.

Below is a suggested change to clang's command-line options.

-ffast-math
Currently described as:
Enable the *frontend*'s 'fast-math' mode. This has no effect on optimizations,
but provides a preprocessor macro __FAST_MATH__ the same as GCC's -ffast-math
flag

I propose to change the description and behavior to:

Enable 'fast-math' mode. This allows for optimizations that may produce
incorrect and unsafe results, and thus should only be used with care. This
also provides a preprocessor macro __FAST_MATH__ the same as GCC's -ffast-math
flag

I propose that this turn on all flags for all floating point instructions. If
this flag doesn't already cause clang to run llc with -enable-unsafe-fp-math,
then I propose that it does so as well.

-fp-contract=<value>
I'm not too familiar with this option, but I recommend that 'all' turn on the
'F' bit for all FP instructions, that 'default' do so only when following the
pragma, and that 'off' never do so. This option should still be passed to the
backend.

(Optional)
I propose adding the below flags:

-ffinite-math-only
Allow optimizations to assume that floating point arguments and results are
not NaNs or +/-Inf. This may produce incorrect results, and so should be used with
care.

This would set the 'I' and 'N' bits on all generated floating point instructions.

-fno-signed-zeros
Allow optimizations to ignore the signedness of zero. This may produce
incorrect results, and so should be used with care.

This would set the 'S' bit on all FP instructions.


Changes to llvm cli tools
---
opt and llc already have the command line options
-enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
-enable-fp-mad: Enable less precise MAD instructions to be generated
-enable-no-infs-fp-math: Enable FP math optimizations that assume no +-Infs
-enable-no-nans-fp-math: Enable FP math optimizations that assume no NaNs
However, opt makes no use of them as they are currently only considered to be
TargetOptions. llc will remain unchanged, as these options apply to DAG
optimizations while this proposal deals with IR optimizations.

(Optional)
Have an opt pass that adds the desired flags to floating point instructions.


Miscellaneous explanations in the form of Q&A
---

Why not just have "fast-math" rather than individual flags?

Having the individual flags gives the granularity to choose the levels of
optimizations. For example, unsafe-algebra can lead to dramatically different
results in corner cases, and may not be desired when a user just wants to ensure
that x*0 folds to 0.


Why have these flags attached to the instruction itself, rather than be a
compiler mode?

Being attached to the instruction itself allows much greater flexibility both
for other optimizations and for the concerns of the source and target. For
example, a frontend may desire that x - x be folded to 0. This would require
no-NaNs for the subtract. However, the frontend may want to keep NaNs for its
comparisons.

Additionally, these properties can be set internally in the optimizer when the
property has been proven. For example, if x has been found to be positive, then
operations involving x and a constant can be marked to ignore signed zero.
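
For instance (a hypothetical sketch; the analysis and the flag syntax are
illustrative only):

  ; before: suppose an analysis has proven %x can never be negative (so never -0.0)
  %r = fadd float %x, 1.0
  ; after: the optimizer may record that fact on the add itself
  %r = fadd S float %x, 1.0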

Finally, having these flags allows for greater safety and optimization when code
with different flags is mixed. For example, a function author may set the
unsafe-algebra flag knowing that such transformations will not meaningfully
alter its result. If that function gets inlined into a caller, however, we don't
want to always assume that the function's expressions can be reassociated with
the caller's expressions. These properties allow us to preserve the
optimizations of the inlined function without affecting the caller.


Why not use metadata rather than flags?

There is existing metadata to denote precisions, and this proposal is orthogonal
to those efforts. These flags are analogous to nsw/nuw, and are inherent
properties of the IR instructions themselves that all transformations should
respect.


Krzysztof Parzyszek

Oct 29, 2012, 8:18:59 PM
to llv...@cs.uiuc.edu
On 10/29/2012 6:34 PM, Michael Ilseman wrote:
>
> N: no NaNs - ignore the existence of NaNs when convenient

Maybe distinguish between quiet and signaling NaNs?


> NI - no infs AND no NaNs
> x - x ==> 0
> Inf > x ==> true

Inf * x ==> 0?

I think that if an infinity appears when NI (or I) is given, the result
should be left as "undefined". Similarly with NaNs. In such cases,
it's impossible to predict the accuracy of the result, so trying to
define what happens is pretty much moot. In this case Inf > x may as
well be simplified to "false" without any loss of (already absent) meaning.

-Krzysztof



--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Eli Friedman

Oct 29, 2012, 8:30:44 PM
to Krzysztof Parzyszek, llv...@cs.uiuc.edu
On Mon, Oct 29, 2012 at 5:18 PM, Krzysztof Parzyszek
<kpar...@codeaurora.org> wrote:
> On 10/29/2012 6:34 PM, Michael Ilseman wrote:
>>
>> N: no NaNs - ignore the existence of NaNs when convenient
>
> Maybe distinguish between quiet and signaling NaNs?

We already ignore the existence of signaling NaNs by default. The
proposal could make that more clear, though.

-Eli

Michael Ilseman

Oct 29, 2012, 11:12:23 PM
to Eli Friedman, llv...@cs.uiuc.edu

On Oct 29, 2012, at 5:30 PM, Eli Friedman <eli.fr...@gmail.com> wrote:

> On Mon, Oct 29, 2012 at 5:18 PM, Krzysztof Parzyszek
> <kpar...@codeaurora.org> wrote:
>> On 10/29/2012 6:34 PM, Michael Ilseman wrote:
>>>
>>> N: no NaNs - ignore the existence of NaNs when convenient
>>
>> Maybe distinguish between quiet and signaling NaNs?
>
> We already ignore the existence of signaling NaNs by default. The
> proposal could make that more clear, though.
>

Yes, the default LLVM behavior is:
* No signaling NaNs
* Default rounding mode
* FENV_ACCESS is off
I'll be more explicit from now on.

Michael Ilseman

Oct 29, 2012, 11:22:46 PM
to Krzysztof Parzyszek, llv...@cs.uiuc.edu

On Oct 29, 2012, at 5:18 PM, Krzysztof Parzyszek <kpar...@codeaurora.org> wrote:

> On 10/29/2012 6:34 PM, Michael Ilseman wrote:
> >
> > N: no NaNs - ignore the existence of NaNs when convenient
>
> Maybe distinguish between quiet and signaling NaNs?
>
>
> > NI - no infs AND no NaNs
> > x - x ==> 0
> > Inf > x ==> true
>
> Inf * x ==> 0?
>
> I think that if an infinity appears when NI (or I) is given, the result should be left as "undefined". Similarly with NaNs. In such cases, it's impossible to predict the accuracy of the result, so trying to define what happens is pretty much moot. In this case Inf > x may as well be simplified to "false" without any loss of (already absent) meaning.
>

The goal is not necessarily to un-define Inf/NaN, but to opt-in to unsafe optimizations that would otherwise not be allowed to be applied, e.g. x*0==>0. There may be examples where these optimizations produce arbitrary results as though those constructs were absent in meaning, but that doesn't make Inf/NaN constants completely undefined in general. The "when convenient" wording is already a little vague/permissive, and could be re-worded to state that Values are assumed to not be Inf/NaN when convenient, but Constants may be honored.

Duncan Sands

Oct 30, 2012, 4:46:47 AM
to llv...@cs.uiuc.edu
Hi Michael,

> Flags
> ---
> no NaNs (N)
> - ignore the existence of NaNs when convenient
> no Infs (I)
> - ignore the existence of Infs when convenient
> no signed zeros (S)
> - ignore the existence of negative zero when convenient

while the above flags make perfect sense for me, the other two seem more
dubious:

> allow fusion (F)
> - fuse FP operations when convenient, despite possible differences in rounding
> (e.g. form FMAs)
> unsafe algebra (A)
> - allow for algebraically equivalent transformations that may dramatically
> change results in floating point. (e.g. reassociation)

They don't seem to be capturing a clear concept, they seem more like a grab-bag
of "everything else" (A) or "here's a random thing that is important today so
let's have a flag for it" (F).

...

> Why not use metadata rather than flags?
>
> There is existing metadata to denote precisions, and this proposal is orthogonal
> to those efforts. These flags are analogous to nsw/nuw, and are inherent
> properties of the IR instructions themselves that all transformations should
> respect.

If you drop any of these flags then things are still conservatively correct,
just like with metadata. In my opinion this could be implemented as metadata.
(I'm not saying it should be represented as metadata, I'm saying it could be).

Disadvantages of metadata:

- Bloats the IR (however my measurements suggest this is by < 2% for math heavy
code)
- More painful to work with (though helper classes can mitigate this)
- Less efficient to modify (but will flags be cleared that often)?

Disadvantages of using subclass data bits:

- Can only represent flags. Thus you might end up with a mix of flags and
metadata for floating point math, with the metadata holding the non-flag
info, and subclass data holding the flags. In which case it might be better
to just have it all be metadata in the first place
- Only a limited number of bits (but hey)

Hopefully Chris will weigh in with his opinion.

Ciao, Duncan.

Krzysztof Parzyszek

Oct 30, 2012, 9:48:40 AM
to llvmdev@cs.uiuc.edu
On 10/29/2012 10:22 PM, Michael Ilseman wrote:
>
>
> The goal is not necessarily to un-define Inf/NaN, but to opt-in to unsafe optimizations that would otherwise not be allowed to be applied, e.g. x*0==>0. There may be examples where these optimizations produce arbitrary results as though those constructs were absent in meaning, but that doesn't make Inf/NaN constants completely undefined in general. The "when convenient" wording is already a little vague/permissive, and could be re-worded to state that Values are assumed to not be Inf/NaN when convenient, but Constants may be honored.

The problem may be that, in general, it may not be clear whether a
given constant appears in the simplifiable computation or not. For
example, if we have "x > y", and we manage to constant propagate "inf"
in place of "x", we end up with "inf > y", which you suggested be folded
to "true". However, as our constant propagation algorithm becomes more
aggressive, it may be capable of propagating a constant into "y", which
may also turn out to be "inf". This way we end up with "inf > inf". In
such case, again we follow the rule of respecting constants, but now we
generate "false".

Once we assume that there are no infinities, and an infinity is
actually present, the results are unpredictable.

Dan Gohman

Oct 30, 2012, 11:23:10 AM
to Michael Ilseman, llv...@cs.uiuc.edu
Hi Michael,

On Mon, Oct 29, 2012 at 4:34 PM, Michael Ilseman <mils...@apple.com> wrote:

Flags
---
no NaNs (N)
  - ignore the existence of NaNs when convenient
no Infs (I)
  - ignore the existence of Infs when convenient
no signed zeros (S)
  - ignore the existence of negative zero when convenient

Does this mean ignore the possibility of NaNs as operands, as results, or both? Ditto for infinity and negative zero.

Also, what does "ignore" mean? As worded, it seems to imply Undefined Behavior if the value is encountered. Is that intended?
 
allow fusion (F)
  - fuse FP operations when convenient, despite possible differences in rounding
    (e.g. form FMAs)

What do you intend to be the relationship between this and @llvm.fmuladd? It's not clear whether you're trying to replace it or trying to set up an alternative for different use cases.

Is your wording of "fusing" intended to imply fusing with infinite intermediate precision only, or is mere increased precision also valid?

unsafe algebra (A)
  - allow for algebraically equivalent transformations that may dramatically
    change results in floating point. (e.g. reassociation)
[...]
Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below semilattice of sensible relations:
  A > S > I > N
  A > F
Meaning that 'A' implies all the others, 'S' implies 'I' and 'N', etc.

Why does it make sense for S to imply I and N? GCC's -fno-signed-zeros flag doesn't seem to imply -ffinite-math-only, among other things. The concept of negative zero isn't inherently linked with the concepts of infinity or NaN.
 

It might make sense to change the S, I, and N options to be some kind of finite
option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
is still the case that A would imply pretty much everything else.

N - no NaNs
  x == x ==> true

This is not true if x is infinity.
 

S - no signed zeros
  x - 0 ==> x
  0 - (x - y) ==> y - x

NS - no signed zeros AND no NaNs
  x * 0 ==> 0

NI - no infs AND no NaNs
  x - x ==> 0
  Inf > x ==> true

With the I flag, would the infinity as an operand make this undefined?
 

A - unsafe-algebra
  Reassociation
    (x + C1) + C2 ==> x + (C1 + C2)
  Redistribution
    (x * C) + x ==> x * (C+1)
    (x * C) + (x + x) ==> x * (C + 2)
  Reciprocal
   x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.

I'm confused. In other places, you seem to imply that reassociation would be valid even on non-constant values. It's not clear whether you meant to contradict that here.
 
[...]
-fp-contract=<value>
  I'm not too familiar with this option, but I recommend that 'all' turn on the
  'F' bit for all FP instructions, that 'default' do so only when following the
  pragma, and that 'off' never do so. This option should still be passed to the backend.

Please coordinate with Lang and others who have already done a fair amount of work on FP_CONTRACT.
 

(Optional)
I propose adding the below flags:

-ffinite-math-only
  Allow optimizations to assume that floating point arguments and results are
  not NaNs or +/-Inf. This may produce incorrect results, and so should be used with
  care.

  This would set the 'I' and 'N' bits on all generated floating point instructions.

-fno-signed-zeros
  Allow optimizations to ignore the signedness of zero. This may produce
  incorrect results, and so should be used with care.

  This would set the 'S' bit on all FP instructions.

These are established flags in GCC. Do you know if there are any semantic differences between your proposed semantics and the semantics of these flags in GCC? If so, it would be good to either change to match them, or document the differences.

Dan

Dan Gohman

Oct 30, 2012, 11:31:53 AM
to Michael Ilseman, llv...@cs.uiuc.edu
On Tue, Oct 30, 2012 at 8:23 AM, Dan Gohman <dan4...@gmail.com> wrote:

On Mon, Oct 29, 2012 at 4:34 PM, Michael Ilseman <mils...@apple.com> wrote:


N - no NaNs
  x == x ==> true

This is not true if x is infinity.

Oops, I was wrong here. Infinity is defined to be equal to infinity.

Dan
 

Michael Ilseman

Oct 30, 2012, 12:36:38 PM
to Duncan Sands, llv...@cs.uiuc.edu

On Oct 30, 2012, at 1:46 AM, Duncan Sands <bald...@free.fr> wrote:

> Hi Michael,
>
>> Flags
>> ---
>> no NaNs (N)
>> - ignore the existence of NaNs when convenient
>> no Infs (I)
>> - ignore the existence of Infs when convenient
>> no signed zeros (S)
>> - ignore the existence of negative zero when convenient
>
> while the above flags make perfect sense for me, the other two seem more
> dubious:
>
>> allow fusion (F)
>> - fuse FP operations when convenient, despite possible differences in rounding
>> (e.g. form FMAs)
>> unsafe algebra (A)
>> - allow for algebraically equivalent transformations that may dramatically
>> change results in floating point. (e.g. reassociation)
>
> They don't seem to be capturing a clear concept, they seem more like a grab-bag
> of "everything else" (A) or "here's a random thing that is important today so
> let's have a flag for it" (F).
>

'A' is certainly a bit of a grab-bag, but I had difficulty breaking it apart into finer-grained pieces that a user would want to pick and choose between. I'd be interested in any suggestions you might have along these lines.

Why is 'F' such a random flag to have? 'F' implies ignoring intermediate rounding when a more efficient version exists, and it seems fair for it to be its own category.
Thanks for the feedback!

Michael Ilseman

Oct 30, 2012, 1:18:40 PM
to Dan Gohman, llv...@cs.uiuc.edu
On Oct 30, 2012, at 8:23 AM, Dan Gohman <dan4...@gmail.com> wrote:

Hi Michael,

On Mon, Oct 29, 2012 at 4:34 PM, Michael Ilseman <mils...@apple.com> wrote:
Flags
---
no NaNs (N)
  - ignore the existence of NaNs when convenient
no Infs (I)
  - ignore the existence of Infs when convenient
no signed zeros (S)
  - ignore the existence of negative zero when convenient

Does this mean ignore the possibility of NaNs as operands, as results, or both? Ditto for infinity and negative zero.


I wrote this thinking both, though I could certainly imagine it being clearer if defined as operands. The example optimizations section is written along the lines of ignoring both.

Also, what does "ignore" mean? As worded, it seems to imply Undefined Behavior if the value is encountered. Is that intended?
 

What I'm intending is for optimizations to be allowed to ignore the possibility of those values. Thinking about it more, this is pretty vague. With your and Krzysztof's feedback in mind, I think something along the lines of:

no NaNs (N)
  - The operands' values can be assumed to be non-NaN by the optimizer. The result of this operator is Undef if passed a NaN.

Might be more clear. I'll think about that more and revise the examples section too.

allow fusion (F)
  - fuse FP operations when convenient, despite possible differences in rounding
    (e.g. form FMAs)

What do you intend to be the relationship between this and @llvm.fmuladd? It's not clear whether you're trying to replace it or trying to set up an alternative for different use cases.


Interesting, I had not seen llvm.fmuladd. I'll have to think about this more; perhaps fmuladd can already provide what I was intending here.

Is your wording of "fusing" intended to imply fusing with infinite intermediate precision only, or is mere increased precision also valid?


My intention is that increased precision is also valid, though I haven't thought too deeply about the difference.

unsafe algebra (A)
  - allow for algebraically equivalent transformations that may dramatically
    change results in floating point. (e.g. reassociation)
[...]
Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below semilattice of sensible relations:
  A > S > I > N
  A > F
Meaning that 'A' implies all the others, 'S' implies 'I' and 'N', etc.

Why does it make sense for S to imply I and N? GCC's -fno-signed-zeros flag doesn't seem to imply -ffinite-math-only, among other things. The concept of negative zero isn't inherently linked with the concepts of infinity or NaN.
 

What I mean here is that I'm finding it hard to think of a case where a user would desire to specify 'I' and not specify 'N'. This is more so a question I had as to whether we could/should express this as a fast-math level rather than allow each flag to be individually toggle-able. Any thoughts on this?


It might make sense to change the S, I, and N options to be some kind of finite
option with levels 3, 2, and 1 respectively. F and A could be kept distinct. It
is still the case that A would imply pretty much everything else.

N - no NaNs
  x == x ==> true

This is not true if x is infinity.
 

S - no signed zeros
  x - 0 ==> x
  0 - (x - y) ==> y - x

NS - no signed zeros AND no NaNs
  x * 0 ==> 0

NI - no infs AND no NaNs
  x - x ==> 0
  Inf > x ==> true

With the I flag, would the infinity as an operand make this undefined?
 

I'll think about this more with regards to the prior changes.


A - unsafe-algebra
  Reassociation
    (x + C1) + C2 ==> x + (C1 + C2)
  Redistribution
    (x * C) + x ==> x * (C+1)
    (x * C) + (x + x) ==> x * (C + 2)
  Reciprocal
   x / C ==> x * (1/C)

These examples apply when the new constants are permitted, e.g. not denormal,
and all the instructions involved have the needed flags.

I'm confused. In other places, you seem to imply that reassociation would be valid even on non-constant values. It's not clear whether you meant to contradict that here.
 

Reassociation is still valid. These examples are just cases where there would be a clear optimization benefit to be had. I'll probably add in a general expression to clarify.


[...]
-fp-contract=<value>
  I'm not too familiar with this option, but I recommend that 'all' turn on the
  'F' bit for all FP instructions, that 'default' do so only when following the
  pragma, and that 'off' never do so. This option should still be passed to the backend.

Please coordinate with Lang and others who have already done a fair amount of work on FP_CONTRACT.

I will, thanks.

 

(Optional)
I propose adding the below flags:

-ffinite-math-only
  Allow optimizations to assume that floating point arguments and results are
  not NaNs or +/-Inf. This may produce incorrect results, and so should be used with
  care.

  This would set the 'I' and 'N' bits on all generated floating point instructions.

-fno-signed-zeros
  Allow optimizations to ignore the signedness of zero. This may produce
  incorrect results, and so should be used with care.

  This would set the 'S' bit on all FP instructions.

These are established flags in GCC. Do you know if there are any semantic differences between your proposed semantics and the semantics of these flags in GCC? If so, it would be good to either change to match them, or document the differences.


I don't know of any differences, but I'll have to look into GCC's behavior more.

Dan


Thanks for the feedback!

Michael Ilseman

Oct 30, 2012, 5:25:43 PM
to llv...@cs.uiuc.edu
Here's a new version of the RFC, incorporating and addressing the feedback from Krzysztof, Eli, Duncan, and Dan.


Revision 1 changes:
* Removed Fusion flag from all sections
* Clarified and changed descriptions of remaining flags:
* Make 'N' and 'I' flags be explicitly concerning values of operands, and
producing undef values if a NaN/Inf is provided.
* 'S' is now only about distinguishing between +/-0.
* LangRef changes updated to reflect flags changes
* Updated Question section given the now simpler set of flags
* Optimizations changed to reflect 'N' and 'I' describing operands and not
results
* Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
* Mention that this could be solved with metadata, and open the debate

Introduction
---

LLVM IR currently does not have any support for specifying fine-grained control
over relaxing floating point requirements for the optimizer. The below is a
proposal to extend floating point IR instructions to support a number of flags
that a creator of IR can use to allow for greater optimizations when
desired. Such changes are sometimes referred to as fast-math, but this proposal
is about finer-grained specifications at a per-instruction level.


What this doesn't address
---

Default behavior is retained, and this proposal is only addressing relaxing
restrictions. LLVM currently by default:
- ignores signaling NaNs
- assumes default rounding mode
- assumes FENV_ACCESS is off

Discussion on changing the default behavior of LLVM or allowing for more
restrictive behavior is outside the scope of this proposal. This proposal does
not address behavior of denormals, which is more of a backend concern.

Specifying exact precision control or requirements is outside the scope of this
proposal, and can probably be handled with the existing metadata implementation.

This proposal covers changes to and optimizations over LLVM IR, and changes to
codegen are outside the scope of this proposal. The flags described in the next
section exist only at the IR level, and will not be propagated into codegen or
the SelectionDAG.


Flags
---
no NaNs (N)
- The optimizer is allowed to optimize under the assumption that the operands'
values are not NaN. If one of the operands is NaN, the value of the result
is undefined.

no Infs (I)
- The optimizer is allowed to optimize under the assumption that the operands'
values are not +/-Inf. If one of the operands is +/-Inf, the value of the
result is undefined.

no signed zeros (S)
- The optimizer is allowed to not distinguish between -0 and +0 for the
purposes of optimizations.

unsafe algebra (A)
- The optimizer is allowed to perform algebraically equivalent transformations
that may dramatically change results in floating point. (e.g.
reassociation)

Throughout I'll refer to these options in their short-hand, e.g. 'A'.
Internally, these flags are to reside in SubclassData.


======
Question:

Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below lattice of sensible relations:
A > S > N
A > I > N
Meaning that 'A' implies all the others, 'S' implies 'N', etc.

It might be desirable to simplify this into just being a fast-math level.
======


Changes to LangRef
---

Change the definitions of the floating point arithmetic operations; below is how
fadd will change:

'fadd' Instruction
Syntax:

<result> = fadd {flag}* <ty> <op1>, <op2> ; yields {ty}:result
...
Semantics:
...
flag can be one of the following optimizer hints to enable otherwise unsafe
floating point optimizations:
N: no NaNs - The optimizer is allowed to optimize under the assumption that
the operands' values are not NaN. If one of the operands is NaN, the value
of the result is undefined.
I: no infs - The optimizer is allowed to optimize under the assumption that
the operands' values are not +/-Inf. If one of the operands is +/-Inf, the
value of the result is undefined.
S: no signed zeros - The optimizer is allowed to not distinguish between -0
and +0 for the purposes of optimizations.
A: unsafe algebra - The optimizer is allowed to perform algebraically
equivalent transformations that may dramatically change results in floating
point. (e.g. reassociation)


Changes to optimizations
---

Optimizations should be allowed to perform unsafe optimizations provided the
instructions involved have the corresponding restrictions relaxed. When
combining instructions, optimizations should do what makes sense to not remove
restrictions that previously existed (commonly, a bitwise-AND of the flags).

Below are some example optimizations that could be allowed with the given
relaxations.

N - no NaNs
x == x ==> true

S - no signed zeros
x - 0 ==> x
0 - (x - y) ==> y - x

NIS - no signed zeros AND no NaNs AND no Infs
x * 0 ==> 0

NI - no infs AND no NaNs
x - x ==> 0

A - unsafe-algebra
Reassociation
(x + y) + z ==> x + (y + z)
(Optional)
I propose adding the below flags:

-ffinite-math-only
Allow optimizations to assume that floating point arguments and results are
not NaNs or +/-Inf. This may produce incorrect results, and so should be used with
care.

This would set the 'I' and 'N' bits on all generated floating point instructions.

-fno-signed-zeros
Allow optimizations to ignore the signedness of zero. This may produce
incorrect results, and so should be used with care.

This would set the 'S' bit on all FP instructions.


Changes to llvm cli tools
---
opt and llc already have the command line options
-enable-unsafe-fp-math: Enable optimizations that may decrease FP precision
to those efforts. While these properties could still be expressed as metadata,
the proposed flags are analogous to nsw/nuw and are inherent properties of the
IR instructions themselves that all transformations should respect. There is
still some debate on what form, metadata vs flags, should be used.

Evan Cheng

Oct 30, 2012, 6:11:26 PM
to Duncan Sands, llv...@cs.uiuc.edu

On Oct 30, 2012, at 1:46 AM, Duncan Sands <bald...@free.fr> wrote:

FYI. We've already had extensive discussion with Chris on this. He has made it clear this *must* be implemented with subclass data bits, not with metadata.

Evan

Evan Cheng

Oct 30, 2012, 6:13:54 PM
to Michael Ilseman, llv...@cs.uiuc.edu

On Oct 30, 2012, at 9:36 AM, Michael Ilseman <mils...@apple.com> wrote:

>
> On Oct 30, 2012, at 1:46 AM, Duncan Sands <bald...@free.fr> wrote:
>
>> Hi Michael,
>>
>>> Flags
>>> ---
>>> no NaNs (N)
>>> - ignore the existence of NaNs when convenient
>>> no Infs (I)
>>> - ignore the existence of Infs when convenient
>>> no signed zeros (S)
>>> - ignore the existence of negative zero when convenient
>>
>> while the above flags make perfect sense for me, the other two seem more
>> dubious:
>>
>>> allow fusion (F)
>>> - fuse FP operations when convenient, despite possible differences in rounding
>>> (e.g. form FMAs)
>>> unsafe algebra (A)
>>> - allow for algebraically equivalent transformations that may dramatically
>>> change results in floating point. (e.g. reassociation)
>>
>> They don't seem to be capturing a clear concept, they seem more like a grab-bag
>> of "everything else" (A) or "here's a random thing that is important today so
>> let's have a flag for it" (F).
>>
>
> 'A' is certainly a bit of a grab-bag, but I had difficulty breaking it apart into finer-grained pieces that a user would want to pick and choose between. I'd be interested in any suggestions you might have along these lines.

There is a cost to modeling each property. Unless there are uses for the individual fine-grained properties, we shouldn't go overboard.

Evan

Dan Gohman

Oct 30, 2012, 7:19:17 PM
to Michael Ilseman, llv...@cs.uiuc.edu
Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and -enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and they all say they apply to results as well as arguments. Do you have a good reason for varying from existing practice here?

Phrasing these from the perspective of the optimizer is a little confusing here. Also, "The optimizer is allowed to [not care about X]" read literally means that the semantics for X are unconstrained, which would be Undefined Behavior. For I and N here you have a second sentence which says only the result is undefined, but for S you don't. Also, even when you do have the second sentence, it seems to contradict the first sentence.


unsafe algebra (A)
  - The optimizer is allowed to perform algebraically equivalent transformations
     that may dramatically change results in floating point. (e.g.
     reassociation)

Throughout I'll refer to these options in their short-hand, e.g. 'A'.
Internally, these flags are to reside in SubclassData.


======
Question:

Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below lattice of sensible relations:
  A > S > N
  A > I > N
Meaning that 'A' implies all the others, 'S' implies 'N', etc.

Why does S still imply N?

Also, I'm curious if there's a specific motivation to have I imply N. LLVM CodeGen's existing options for these are independent.


It might be desirable to simplify this into just being a fast-math level.

What would make this desirable?
I'm still confused by what you mean in this sentence. Why are you talking about constants, if you intend these optimizations to be valid for non-constants? And, it's not clear what you're trying to say about denormal values here.

Dan

Michael Ilseman

Oct 30, 2012, 11:28:12 PM
to Dan Gohman, llv...@cs.uiuc.edu
The primary example I was trying to simplify with that change was x * 0 ==> 0. It can be performed if you assume NIS inputs, or NS inputs and N outputs. This is because Inf * 0 is NaN. In hindsight, this is all making things more confusing, so I think I'll go back to "arguments and results" and allow this optimization for NS. GCC gets around this by lumping Inf and NaN under the same command line option.

Phrasing these from the perspective of the optimizer is a little confusing here.

I think it might be clearer to change "The optimizer is allowed to …" to "Allow optimizations to …" and clean up the wording a bit.

Also, "The optimizer is allowed to [not care about X]" read literally means that the semantics for X are unconstrained, which would be Undefined Behavior. For I and N here you have a second sentence which says only the result is undefined, but for S you don't.

'S' shouldn't have any undefined behavior, it just allows optimizations to not distinguish between +/-0. It's perfectly legal for the operation to receive a negative zero, the operation just might treat it exactly the same as a positive zero. I would rather have that than undefined behavior.

This is similar to how gcc defines -fno-signed-zeros:
"Allow optimizations for floating point arithmetic that ignore the signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with -ffinite-math-only). This option implies that the sign of a zero result isn't significant."

I'll revise my description to also mention that the sign of a zero result isn't significant. 

Also, even when you do have the second sentence, it seems to contradict the first sentence.


Why does it contradict the first sentence? I meant it as a clarification or reinforcement of the first, not a contradiction.


unsafe algebra (A)
  - The optimizer is allowed to perform algebraically equivalent transformations
     that may dramatically change results in floating point. (e.g.
     reassociation)

Throughout I'll refer to these options in their short-hand, e.g. 'A'.
Internally, these flags are to reside in SubclassData.


======
Question:

Not all combinations make sense (e.g. 'A' pretty much implies all other flags).

Basically, I have the below lattice of sensible relations:
  A > S > N
  A > I > N
Meaning that 'A' implies all the others, 'S' implies 'N', etc.

Why does S still imply N?

Also, I'm curious if there's a specific motivation to have I imply N. LLVM CodeGen's existing options for these are independent.


It might be desirable to simplify this into just being a fast-math level.

What would make this desirable?
 

I think this "Question" I had no longer makes too much sense, so I'm going to delete this section.
I was mentioning denormals for one of the optimizations. I think it would be more clear to say something like:
 Reciprocal
   x / C ==> x * (1/C)  when (1/C) is not denormal

I was mostly trying to say that the optimizations are not blindly applied, but are applied when they are still legal. I think the sentence is more confusing than helpful, though.

Dan


Thanks!

Joshua Cranmer

Oct 31, 2012, 12:11:13 AM
to llv...@cs.uiuc.edu
On 10/30/2012 10:28 PM, Michael Ilseman wrote:

On Oct 30, 2012, at 4:19 PM, Dan Gohman <dan4...@gmail.com> wrote:

On Tue, Oct 30, 2012 at 2:25 PM, Michael Ilseman <mils...@apple.com> wrote:

no signed zeros (S)
  - The optimizer is allowed to not distinguish between -0 and +0 for the
    purposes of optimizations.

Ok, I checked LLVM CodeGen's existing -enable-no-infs-fp-math and -enable-no-nans-fp-math flags, and GCC's -ffinite-math-only flag, and they all say they apply to results as well as arguments. Do you have a good reason for varying from existing practice here?


The primary example I was trying to simplify with that change was x * 0 ==> 0. It can be performed if you assume NIS inputs, or NS inputs and N outputs. This is because Inf * 0 is NaN. In hindsight, this is all making things more confusing, so I think I'll go back to "arguments and results" and allow this optimization for NS. GCC gets around this by lumping Inf and NaN under the same command line option.

Phrasing these from the perspective of the optimizer is a little confusing here.

I think it might be clearer to change "The optimizer is allowed to …" to "Allow optimizations to …" and clean up the wording a bit.

Also, "The optimizer is allowed to [not care about X]" read literally means that the semantics for X are unconstrained, which would be Undefined Behavior. For I and N here you have a second sentence which says only the result is undefined, but for S you don't.

'S' shouldn't have any undefined behavior, it just allows optimizations to not distinguish between +/-0. It's perfectly legal for the operation to receive a negative zero, the operation just might treat it exactly the same as a positive zero. I would rather have that than undefined behavior.

I'm not an expert in writing specifications, but I think defining the S flag in this manner would be preferable:
no signed zeros (S) - If present, then the result of a floating point operation with -0.0 or +0.0 as an operand is either the result of the operation with the original specified values or the result of the operation with the +0.0 or -0.0 replaced with its opposite sign.

As a side note, it's never explicitly stated in the language reference how much of IEEE 754 semantics floating point operations must follow.

-- 
Joshua Cranmer
News submodule owner
DXR coauthor

Chris Lattner

Oct 31, 2012, 1:50:33 AM
to Evan Cheng, llv...@cs.uiuc.edu
On Oct 30, 2012, at 3:11 PM, Evan Cheng <evan....@apple.com> wrote:
>> Disadvantages of using subclass data bits:
>>
>> - Can only represent flags. Thus you might end up with a mix of flags and
>> metadata for floating point math, with the metadata holding the non-flag
>> info, and subclass data holding the flags. In which case it might be better
>> to just have it all be metadata in the first place
>> - Only a limited number of bits (but hey)
>>
>> Hopefully Chris will weigh in with his opinion.
>
> FYI. We've already had extensive discussion with Chris on this. He has made it clear this *must* be implemented with subclass data bits, not with metadata.

More specifically, I reviewed the proposal and I agree with its general design: I think it makes sense to use subclass data for these bits even though fpprecision doesn't. It follows the analogy of NSW/NUW bits which have worked well. I also think it makes a lot of sense to separate out the "relaxing FP math" part of the FP problem from orthogonal issues like modeling rounding modes, trapping operations (SNANs), etc.

That said, I agree that the individual proposed bits (e.g. "A") could use some refinement. I think it is really important to accurately model the concepts that GCC exposes, but it may make sense to decompose them into finer-grained concepts than what GCC exposes. Also, infer-ability is an important aspect of this: we already have stuff in LLVM that tries to figure out things like "this can never be negative zero". I'd like it if we can separate the inference of this property from the clients of it.

At a (ridiculous) limit, we could take everything in "A" and see what optimizations we want to permit, and add a separate bit for every suboptimization that it would enable. Hopefully from that list we can find natural clusters that would make sense to group together.

-Chris

Krzysztof Parzyszek

Oct 31, 2012, 10:01:45 AM
to llv...@cs.uiuc.edu
On 10/30/2012 11:36 AM, Michael Ilseman wrote:
> On Oct 30, 2012, at 1:46 AM, Duncan Sands <bald...@free.fr> wrote:
>
> 'A' is certainly a bit of a grab-bag, but I had difficulty breaking it apart into finer-grained pieces that a user would want to pick and choose between. I'd be interested in any suggestions you might have along these lines.

For reference (or ideas), here's how the IBM XL compiler breaks down the
floating point options (look for -qstrict if the link doesn't take you there):

http://www.spec.org/cpu2006/flags/IBM-XL.20110613.html#user_F-qstrict

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Dan Gohman

Nov 1, 2012, 6:08:03 PM
to Michael Ilseman, llv...@cs.uiuc.edu
On Tue, Oct 30, 2012 at 8:28 PM, Michael Ilseman <mils...@apple.com> wrote:
 
This is similar to how gcc defines -fno-signed-zeros:
"Allow optimizations for floating point arithmetic that ignore the signedness of zero. IEEE arithmetic specifies the behavior of distinct +0.0 and -0.0 values, which then prohibits simplification of expressions such as x+0.0 or 0.0*x (even with -ffinite-math-only). This option implies that the sign of a zero result isn't significant."

I'll revise my description to also mention that the sign of a zero result isn't significant. 

Ok, I see what you're saying here now.
 

Also, even when you do have the second sentence, it seems to contradict the first sentence.


Why does it contradict the first sentence? I meant it as a clarification or reinforcement of the first, not a contradiction.

Suppose I'm writing a backend for a target which has an instruction that traps on any kind of NaN. Assuming I care about NaNs, I can't use such an instruction for regular floating-point operations. However, would it be ok to use it when the N flag is set?

If the "optimizer" may truly ignore the possibility of NaNs under the N flag, this would seem to be ok. However, a trap is outside the boundaries of "undefined result". So, which half is right?

Dan

Krzysztof Parzyszek

Nov 1, 2012, 6:56:20 PM
to llv...@cs.uiuc.edu
On 11/1/2012 5:08 PM, Dan Gohman wrote:
>
> Suppose I'm writing a backend for a target which has an instruction that
> traps on any kind of NaN. Assuming I care about NaNs, I can't use such
> an instruction for regular floating-point operations. However, would it
> be ok to use it when the N flag is set?

I've lost track here. What kind of instruction are you talking about?

-Krzysztof


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Michael Ilseman

Nov 1, 2012, 7:38:45 PM
to Dan Gohman, llv...@cs.uiuc.edu
That makes sense, I was thinking of things only in terms of the optimizer and not in terms of instruction selection. Which do you think is better, Undefined Behavior or that instruction selection should disregard those bits? I'd lean towards undefined behavior, but I don't have a good feel for LLVM's overall design for undefined behavior, poison values, etc.

Dan


Krzysztof Parzyszek

Nov 1, 2012, 9:41:30 PM
to llv...@cs.uiuc.edu
On 11/1/2012 6:38 PM, Michael Ilseman wrote:
>
> On Nov 1, 2012, at 3:08 PM, Dan Gohman <dan4...@gmail.com
> <mailto:dan4...@gmail.com>> wrote:
>>
>> If the "optimizer" may truly ignore the possibility of NaNs under the
>> N flag, this would seem to be ok. However, a trap is outside the
>> boundaries of "undefined result". So, which half is right?
>>
>
> That makes sense, I was thinking of things only in terms of the
> optimizer and not in terms of instruction selection. Which do you think
> is better, Undefined Behavior or that instruction selection should
> disregard those bits? I'd lean towards undefined behavior, but I don't
> have a good feel for LLVM's overall design for undefined behavior,
> poison values, etc.

I still don't understand Dan's concerns, but a target that traps on a
quiet NaN is not IEEE compliant. IEEE754 requires that quiet NaNs get
propagated through operands to the results without causing exceptions.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Michael Ilseman

Nov 2, 2012, 12:53:58 PM
to Krzysztof Parzyszek, llv...@cs.uiuc.edu

On Nov 1, 2012, at 6:41 PM, Krzysztof Parzyszek <kpar...@codeaurora.org> wrote:

> On 11/1/2012 6:38 PM, Michael Ilseman wrote:
>>
>> On Nov 1, 2012, at 3:08 PM, Dan Gohman <dan4...@gmail.com
>> <mailto:dan4...@gmail.com>> wrote:
>>>
>>> If the "optimizer" may truly ignore the possibility of NaNs under the
>>> N flag, this would seem to be ok. However, a trap is outside the
>>> boundaries of "undefined result". So, which half is right?
>>>
>>
>> That makes sense, I was thinking of things only in terms of the
>> optimizer and not in terms of instruction selection. Which do you think
>> is better, Undefined Behavior or that instruction selection should
>> disregard those bits? I'd lean towards undefined behavior, but I don't
>> have a good feel for LLVM's overall design for undefined behavior,
>> poison values, etc.
>
> I still don't understand Dan's concerns, but a target that traps on a quiet NaN is not IEEE compliant. IEEE754 requires that quiet NaNs get propagated through operands to the results without causing exceptions.
>

I think Dan was making two points with his example. Dan, correct me if I misrepresent your example, but imagine a situation where a target has two instructions to choose between in order to perform the operation. The first is IEEE compliant, but the second isn't compliant in how it operates over NaNs (quiet or otherwise). For whatever reason, the second is preferred when we know inputs are not NaN.

The first point is that I didn't specify if the N bit would allow the target to choose the non-compliant operation.

The second point is that my specifying "undefined value" isn't enough. What if the non-compliant instruction's behavior on NaN were to trap? It's not just an invalid/random bit pattern, but actual behavioral differences.

My question is whether to approach this by saying that the flags are for the optimizer only, and targets must ignore them, or to say that targets are free to use them and the operation has undefined behavior over, e.g., NaN.

Michael Ilseman

Nov 2, 2012, 12:58:53 PM
to Chris Lattner, llv...@cs.uiuc.edu
I should separate out Reciprocal from the rest of "A", as I believe that's pretty separable and safer than allowing the other transforms.

One very desired transform from "A" is to allow the reassociation/canonicalization of floating point operations similarly to how the reassociation pass operates over integer operations. I'll think about whether there are other transforms that would be sufficiently distinct from this one remaining in "A" that would make sense to separate out.

> -Chris

Thanks for the feedback!

Krzysztof Parzyszek

Nov 2, 2012, 1:02:50 PM
to Michael Ilseman, llv...@cs.uiuc.edu
On 11/2/2012 11:53 AM, Michael Ilseman wrote:
>
>
> I think Dan was making two points with his example. Dan, correct me if I misrepresent your example, but imagine a situation where a target has two instructions to choose between in order to perform the operation. The first is IEEE compliant, but the second isn't compliant in how it operates over NaNs (quiet or otherwise). For whatever reason, the second is preferred when we know inputs are not NaN.
>
> The first point is that I didn't specify if the N bit would allow the target to choose the non-compliant operation.
>
> The second point is that my specifying "undefined value" isn't enough. What if the non-compliant instruction's behavior on NaN was to trap. It's not just an invalid/random bit pattern, but actual behavioral differences.

I see. The situation is that the user tells the compiler to "ignore"
NaNs, and yet the program does produce a NaN. The compiler generates
the trapping instruction (expecting that the trap won't happen), but
because of the NaN, the trap does occur.

My definition of the N flag would be that it instructs the compiler that
the computations do not involve or produce NaNs. In other words, when,
as a programmer, I use the N flag, I'm telling the compiler that I don't
expect NaNs to ever appear in the computations. If a NaN did appear and
produced a trap, it would be just as unexpected as seeing a NaN in the
output without a trap.

The same would apply to infinities.

Duncan Sands

Nov 2, 2012, 1:33:54 PM
to llv...@cs.uiuc.edu
Hi Michael,

> I should separate out Reciprocal from the rest of "A", as I believe that's
pretty separable and safer than allowing the other transforms.

I think forming the reciprocal only introduces a bounded number of ULPs of
error, so you could use 'fpmath' metadata for this, by giving it a big enough
value. It might be nice to allow 'bounded' as a value for fpmath metadata,
which would allow anything which introduces at most a bounded number of ULPs
of error (unlike reassociation which can introduce an unbounded amount of
inaccuracy) like this or for example -0.0 -> 0.0.

Ciao, Duncan.

Dan Gohman

Nov 2, 2012, 4:07:37 PM
to Krzysztof Parzyszek, llv...@cs.uiuc.edu
On Fri, Nov 2, 2012 at 10:02 AM, Krzysztof Parzyszek <kpar...@codeaurora.org> wrote:
On 11/2/2012 11:53 AM, Michael Ilseman wrote:


I think Dan was making two points with his example. Dan, correct me if I misrepresent your example, but imagine a situation where a target has two instructions to choose between in order to perform the operation. The first is IEEE compliant, but the second isn't compliant in how it operates over NaNs (quiet or otherwise). For whatever reason, the second is preferred when we know inputs are not NaN.

The first point is that I didn't specify if the N bit would allow the target to choose the non-compliant operation.

The second point is that my specifying "undefined value" isn't enough. What if the non-compliant instruction's behavior on NaN was to trap. It's not just an invalid/random bit pattern, but actual behavioral differences.

That's right.
 

I see.  The situation is that the user tells the compiler to "ignore" NaNs, and yet the program does produce a NaN.  The compiler generates the trapping instruction (expecting that the trap won't happen), but because of the NaN, the trap does occur.

My definition of the N flag would be that it instructs the compiler that the computations do not involve or produce NaNs.  In other words, when, as a programmer, I use the N flag, I'm telling the compiler that I don't expect NaNs to ever appear in the computations.  If a NaN did appear and produced a trap, it would be just as unexpected as seeing a NaN in the output without a trap.

Just so you know, what you're describing sounds like Undefined Behavior.

I don't currently have a suggestion for what the best choice is, or of how important it is.

Dan

Krzysztof Parzyszek

Nov 2, 2012, 5:10:07 PM
to llv...@cs.uiuc.edu
On 11/2/2012 3:07 PM, Dan Gohman wrote:
>
>
> Just so you know, what you're describing sounds like Undefined Behavior.
>
> I don't currently have a suggestion for what the best choice is, or of
> how important it is.
>

I believe that the "undefined behavior" approach is clearer, and easier
to understand (for the users who will be providing such flags).

Eli Friedman

Nov 2, 2012, 5:21:34 PM
to Dan Gohman, llv...@cs.uiuc.edu
It's worth noting that choosing "undefined behavior" could force
passes like LICM to strip the flags.

-Eli

Michael Ilseman

Nov 9, 2012, 5:34:51 PM
to llv...@cs.uiuc.edu
Revision 2

Revision 2 changes:
* Add in separate Reciprocal flag
* Clarified wording of flags, specified undefined values, not behavior
* Removed some confusing language
* Mentioned optimizations/analyses adding in flags due to inferred knowledge

Revision 1 changes:
* Removed Fusion flag from all sections
* Clarified and changed descriptions of remaining flags:
* Make 'N' and 'I' flags be explicitly concerning values of operands, and
producing undef values if a NaN/Inf is provided.
* 'S' is now only about distinguishing between +/-0.
* LangRef changes updated to reflect flags changes
* Updated Question section given the now simpler set of flags
* Optimizations changed to reflect 'N' and 'I' describing operands and not
results
* Be explicit on what LLVM's default behavior is (no signaling NaNs, etc)
* Mention that this could be alternatively solved with metadata, and open the
debate

Flags
---
LLVM IR instructions will have the following flags that can be set by the
creator of the IR.

no NaNs (N)
- Allow optimizations that assume the arguments and result are not NaN. Such
optimizations are required to retain defined behavior over NaNs, but the
value of the result is undefined.

no Infs (I)
- Allow optimizations that assume the arguments and result are not
+/-Inf. Such optimizations are required to retain defined behavior over
+/-Inf, but the value of the result is undefined.

no signed zeros (S)
- Allow optimizations to treat the sign of a zero argument or result as
insignificant.

allow reciprocal (R)
- Allow optimizations to use the reciprocal of an argument instead of dividing.

unsafe algebra (A)
- The optimizer is allowed to perform algebraically equivalent transformations
that may dramatically change results in floating point. (e.g.
reassociation).

Throughout I'll refer to these options in their short-hand, e.g. 'A'.
Internally, these flags are to reside in SubclassData.

Setting the 'A' flag implies the setting of all the others ('N', 'I', 'S', 'R').
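
For concreteness, here is a minimal sketch of one way these five bits could be
modeled internally as a bitmask, with 'A' implying the rest. The type and
member names are purely illustrative assumptions, not the actual
implementation:

#include <cstdint>

struct ProposedFPFlags {
  enum : uint8_t {
    NoNaNs          = 1 << 0, // 'N'
    NoInfs          = 1 << 1, // 'I'
    NoSignedZeros   = 1 << 2, // 'S'
    AllowReciprocal = 1 << 3, // 'R'
    UnsafeAlgebra   = 1 << 4  // 'A'
  };

  uint8_t Bits = 0;

  // Setting 'A' also sets 'N', 'I', 'S', and 'R', per the proposal.
  void setUnsafeAlgebra() {
    Bits |= UnsafeAlgebra | NoNaNs | NoInfs | NoSignedZeros | AllowReciprocal;
  }

  bool noNaNs() const { return Bits & NoNaNs; }
  bool unsafeAlgebra() const { return Bits & UnsafeAlgebra; }
};

With a representation like this, querying noNaNs() after setUnsafeAlgebra()
returns true, matching the implication stated above.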


Changes to LangRef
---

Change the definitions of the floating point arithmetic operations; below is
how fadd will change:

'fadd' Instruction
Syntax:

<result> = fadd {flag}* <ty> <op1>, <op2> ; yields {ty}:result
...
Semantics:
...
flag can be one of the following optimizer hints to enable otherwise unsafe
floating point optimizations:
N: no NaNs - Allow optimizations that assume the arguments and result are not
NaN. Such optimizations are required to retain defined behavior over NaNs,
but the value of the result is undefined.
I: no infs - Allow optimizations that assume the arguments and result are not
+/-Inf. Such optimizations are required to retain defined behavior over
+/-Inf, but the value of the result is undefined.
S: no signed zeros - Allow optimizations to treat the sign of a zero argument
or result as insignificant.
A: unsafe algebra - The optimizer is allowed to perform algebraically
equivalent transformations that may dramatically change results in floating
point. (e.g. reassociation).

fdiv will also mention that 'R' allows the fdiv to be replaced by a
multiply-by-reciprocal.


Changes to optimizations
---

Optimizations should be allowed to perform unsafe transformations provided the
instructions involved have the corresponding restrictions relaxed. When
combining instructions, optimizations should take care not to remove
restrictions that previously existed (commonly, by taking the bitwise AND of
the flags).
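
As a rough sketch of that combining rule (hypothetical helper, assuming a
bitmask representation like the one sketched earlier):

#include <cstdint>

// A flag survives only if every instruction being folded already had it.
uint8_t combinedFlags(uint8_t LHSFlags, uint8_t RHSFlags) {
  return LHSFlags & RHSFlags;
}

For example, folding an instruction carrying 'N', 'I', and 'S' with one
carrying only 'N' would leave just 'N' on the result.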

Below are some example optimizations that could be allowed with the given
relaxations.

N - no NaNs
x == x ==> true

S - no signed zeros
x - 0 ==> x
0 - (x - y) ==> y - x

NIS - no signed zeros AND no NaNs AND no Infs
x * 0 ==> 0

NI - no infs AND no NaNs
x - x ==> 0

R - reciprocal
x / y ==> x * (1/y)

A - unsafe-algebra
Reassociation
(x + y) + z ==> x + (y + z)
(x + C1) + C2 ==> x + (C1 + C2)
Redistribution
(x * C) + x ==> x * (C+1)
(x * C) + (x + x) ==> x * (C + 2)
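
To show how one of these folds might be gated on the flags, here is a sketch
of an instsimplify-style check for "x - x ==> 0", which is only valid with
both 'N' and 'I' set (NaN - NaN and Inf - Inf do not yield 0). All names and
types here are stand-ins for illustration, not the real pass code:

#include <cstdint>
#include <optional>

constexpr uint8_t FlagN = 1 << 0; // 'N': no NaNs
constexpr uint8_t FlagI = 1 << 1; // 'I': no Infs

struct FSubInst {
  int LHS, RHS;    // stand-ins for SSA value identifiers
  uint8_t Flags;   // per-instruction fast-math bits
};

// Returns 0.0 when the fold applies; otherwise the instruction is left alone.
std::optional<double> simplifyFSub(const FSubInst &I) {
  bool HasNI = (I.Flags & FlagN) && (I.Flags & FlagI);
  if (I.LHS == I.RHS && HasNI)
    return 0.0;           // x - x ==> 0, justified by 'N' and 'I'
  return std::nullopt;
}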

I propose to expand -instsimplify and -instcombine to perform these kinds of
optimizations. -reassociate will be expanded to reassociate floating point
operations when allowed. Similar to existing behavior regarding integer
wrapping, -early-cse will not CSE FP operations with mismatched flags, while
-gvn will (conservatively). This allows later optimizations to optimize the
expressions independently between runs of -early-cse and -gvn.
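
A sketch of how that CSE policy might look; the types and helper names are
made up for illustration and are not the actual pass implementations:

#include <cstdint>

struct FPKey {
  unsigned Opcode;
  int LHS, RHS;    // stand-ins for value identifiers
  uint8_t Flags;   // fast-math bits
};

// EarlyCSE-style check: mismatched flags make two instructions non-equal,
// so neither is folded into the other.
bool earlyCSEEquivalent(const FPKey &A, const FPKey &B) {
  return A.Opcode == B.Opcode && A.LHS == B.LHS && A.RHS == B.RHS &&
         A.Flags == B.Flags;
}

// GVN-style merge: the instructions may still be unified, but the survivor
// keeps only the flags both had, so no relaxation is introduced.
uint8_t gvnSurvivorFlags(const FPKey &A, const FPKey &B) {
  return A.Flags & B.Flags;
}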

Optimizations and analyses that are able to infer certain properties of
instructions are allowed to set relevant flags. For example, if some analysis
has determined that the arguments and result of an instruction are not NaNs or
Infs, then it may set the 'N' and 'I' flags, allowing every other optimization
and analysis to benefit from this inferred knowledge.
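
A minimal sketch of such an inference hook, assuming a bitmask representation
(all names hypothetical):

#include <cstdint>

constexpr uint8_t FlagN = 1 << 0; // 'N': no NaNs
constexpr uint8_t FlagI = 1 << 1; // 'I': no Infs

struct FPInst {
  uint8_t Flags = 0;
};

// ProvedFiniteNonNaN stands in for whatever value/range analysis is in use;
// adding 'N' and 'I' only relaxes restrictions, never tightens them, so every
// later pass can rely on the new bits.
void refineFlags(FPInst &I, bool ProvedFiniteNonNaN) {
  if (ProvedFiniteNonNaN)
    I.Flags |= FlagN | FlagI;
}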

Changes to frontends
---

Frontends are free to generate code with flags set as they desire. Frontends
should continue to call llc with their desired options, as the flags apply only
at the IR level and not at codegen or the SelectionDAGs.

The intention behind the flags is to allow the IR creator to say something
along the lines of:
"If this operation is given a NaN, or the result is a NaN, then I don't care
what answer I get back. However, I expect my program to otherwise behave
properly."
Frontend options would then map onto these flags; for example:

-freciprocal-math
Allow optimizations to use the reciprocal of an argument instead of using
division. This may produce less precise results, and so should be used with
care.

This would set the 'R' bit on all relevant FP instructions.

Krzysztof Parzyszek

unread,
Nov 12, 2012, 12:21:00 PM11/12/12
to llv...@cs.uiuc.edu
Overall this looks good to me, but I'd rephrase this particular sentence
to make it clearer what the intent is:

> Such optimizations are required to retain defined behavior over NaNs,
> but the value of the result is undefined.

"When the N flag is present, having NaNs as arguments is valid, and the
program is well-formed, but the result is unspecified."

In particular, I would avoid the word "undefined" and use "unspecified"
or "implementation-defined" instead.

Similarly for infinities.

-Krzysztof
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Joe Abbey

unread,
Nov 12, 2012, 1:39:22 PM11/12/12
to Michael Ilseman, llv...@cs.uiuc.edu
Michael,

Since you won't be using metadata to store this information and are augmenting the IR, I'd recommend incrementing the bitcode version number. The current version is stored in a local variable in BitcodeWriter.cpp:1814.*

I would suspect then you'll also need to provide additional logic for reading:

      switch (module_version) {
      default: return Error("Unknown bitstream version!");
      case 2:
        EncodesFastMathIR = true;
        // fall through
      case 1:
        UseRelativeIDs = true;
        break;
      case 0:
        UseRelativeIDs = false;
        break;
      }

Joe

(*TODO: Put this somewhere else).

Chris Lattner

unread,
Nov 12, 2012, 8:42:16 PM11/12/12
to Joe Abbey, llv...@cs.uiuc.edu
On Nov 12, 2012, at 10:39 AM, Joe Abbey <jab...@arxan.com> wrote:

Michael,

Since you won't be using metadata to store this information and are augmenting the IR, I'd recommend incrementing the bitcode version number.  The current version stored in a local variable in BitcodeWriter.cpp:1814*  

I would suspect then you'll also need to provide additional logic for reading:

      switch (module_version) {
      default: return Error("Unknown bitstream version!");
      case 2:
        EncodesFastMathIR = true;
        // fall through
      case 1:
        UseRelativeIDs = true;
        break;
      case 0:
        UseRelativeIDs = false;
        break;
      }

Couldn't this be handled by adding an extra operand to the binary operators?

-Chris

Michael Ilseman

unread,
Nov 14, 2012, 3:28:39 PM11/14/12
to Chris Lattner, Joe Abbey, llv...@cs.uiuc.edu
I think I missed what problem we're trying to solve here.

I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straight-forward from here.

Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.

Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.

Krzysztof Parzyszek

unread,
Nov 14, 2012, 3:43:03 PM11/14/12
to llv...@cs.uiuc.edu
On 11/14/2012 2:28 PM, Michael Ilseman wrote:
> I think I missed what problem we're trying to solve here.
>

I'm guessing that the actual encoding may change. If we know that there
are certain unused bits, we don't need to store them. If the bits are
used, we have to keep them. I think someone was working on some sort of
bitcode compression scheme, and in that context the difference may be
significant.

-Krzysztof


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Chris Lattner

unread,
Nov 14, 2012, 3:47:57 PM11/14/12
to Michael Ilseman, Joe Abbey, llv...@cs.uiuc.edu

On Nov 14, 2012, at 12:28 PM, Michael Ilseman <mils...@apple.com> wrote:

> I think I missed what problem we're trying to solve here.
>
> I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straight-forward from here.
>
> Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.

Yes, this is the right thing, just make the sense of the bits in the bitcode file be "1" for cases that differ from the old default.

> Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.

How, specifically, are you proposing that these bits be encoded?

-Chris

Michael Ilseman

unread,
Nov 14, 2012, 4:39:14 PM11/14/12
to Chris Lattner, Joe Abbey, llv...@cs.uiuc.edu

On Nov 14, 2012, at 12:47 PM, Chris Lattner <clat...@apple.com> wrote:

>
> On Nov 14, 2012, at 12:28 PM, Michael Ilseman <mils...@apple.com> wrote:
>
>> I think I missed what problem we're trying to solve here.
>>
>> I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straight-forward from here.
>>
>> Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.
>
> Yes, this is the right thing, just make the sense of the bits in the bitcode file be "1" for cases that differ from the old default.
>

Will do!

>> Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.
>
> How, specifically, are you proposing that these bits be encoded?
>

I'm new to the bitcode so let me know if this doesn't make sense. I was going to look at the encoding for nuw (OBO_NO_UNSIGNED_WRAP) and follow how it is encoded/decoded in the bitcode. I would then specify some kind of fast-math enum and encode it in a similar fashion.

After I go down this path a little more I might be able to give you a more intelligent answer.

Thanks!

Michael Ilseman

unread,
Nov 14, 2012, 5:13:38 PM11/14/12
to Chris Lattner, Joe Abbey, llv...@cs.uiuc.edu
I attached a working patch of changes to the bitcode reader and writer. This patch references other local changes I have to other parts of the code (e.g. "FastMathFlags"), but shows the general idea I'm going for. When I've ironed out all the bugs, I'll attach a series of patches for all the other content.

bitcode_example.patch

Joe Abbey

unread,
Nov 14, 2012, 8:04:00 PM11/14/12
to Michael Ilseman, llv...@cs.uiuc.edu


On Nov 14, 2012, at 5:13 PM, Michael Ilseman <mils...@apple.com> wrote:

> I attached a working patch of changes to the bitcode reader and writer. This patch references other local changes I have to other parts of the code (e.g. "FastMathFlags"), but shows the general idea I'm going for. When I've ironed out all the bugs, I'll attach a series of patches for all the other content.
>
> <bitcode_example.patch>
>
> Does this patch make sense, or am I still missing the main concern?
>

Michael,

Ah I see. So this just becomes a matter of interpretation of bits in the optimization flags. Shouldn't need to promote the CurVersion.

Nitpick: 80-cols in BitcodeReader.cpp

Since Instruction::FastMathFlags is a class, seems like the constructor could take in Record[OpNum] and assign the flags.

Looking forward to the full patch.

Joe

Michael Ilseman

unread,
Nov 14, 2012, 8:23:43 PM11/14/12
to Joe Abbey, llv...@cs.uiuc.edu

On Nov 14, 2012, at 5:04 PM, Joe Abbey <jab...@arxan.com> wrote:

>
>
> On Nov 14, 2012, at 5:13 PM, Michael Ilseman <mils...@apple.com> wrote:
>
>> I attached a working patch of changes to the bitcode reader and writer. This patch references other local changes I have to other parts of the code (e.g. "FastMathFlags"), but shows the general idea I'm going for. When I've ironed out all the bugs, I'll attach a series of patches for all the other content.
>>
>> <bitcode_example.patch>
>>
>> Does this patch make sense, or am I still missing the main concern?
>>
>
> Michael,
>
> Ah I see. So this just becomes a matter of interpretation of bits in the optimization flags. Shouldn't need to promote the CurVersion.
>
> Nitpick: 80-cols in BitcodeReader.cpp
>
> Since Instruction::FastMathFlags is a class, seems like the constructor could take in Record[OpNum] , and assign the flags.
>

I like the intent, but unfortunately Record[OpNum] is just a uint64_t. The agreement of which bit means what is in LLVMBitCodes.h, and I'd prefer not having an implicit handshake between the bitcode and the rest of LLVM. However, I'll try to find ways to factor more convenience into shared code.

> Looking forward to the full patch.
>
> Joe


Thanks for the feedback!

Joe Abbey

unread,
Nov 14, 2012, 8:28:05 PM11/14/12
to Michael Ilseman, llv...@cs.uiuc.edu

>> Since Instruction::FastMathFlags is a class, seems like the constructor could take in Record[OpNum] , and assign the flags.
>>
>
> I like the intent, but unfortunately Record[OpNum] is just a uint64_t. The agreement of which bit means what is in LLVMBitCodes.h, and I'd prefer not having an implicit handshake between the bitcode and the rest of LLVM. However, I'll try to find ways to factor more convenience into shared code.
>


Works for me.

Joe

Michael Ilseman

unread,
Nov 14, 2012, 10:19:17 PM11/14/12
to Chris Lattner, Joe Abbey, llv...@cs.uiuc.edu
On Nov 14, 2012, at 12:28 PM, Michael Ilseman <mils...@apple.com> wrote:

I think I missed what problem we're trying to solve here.

I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straight-forward from here.

Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.

Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.


I see now that it's only binary operators that have OptimizationFlags reserved for them in the bitcode. Adding fast-math flags for only binary ops is straight-forward, but adding them for other ops might require a more involved bitcode change.

I think that there might be some benefit to having flags for other kinds of ops, but those seem a bit more far-fetched and less common. For example, "fcmp N oeq x, x ==> true" or "bitcast N (bitcast N i32 %foo to float) to i32 ==> i32 %foo" seem more contrived than optimizations over binary ops. Comparisons are already sort of their own beast, as they ignore the sign of zero and have ordered and unordered versions.

Given all that, I think it makes sense to add support for fast-math flags only to binary ops in this iteration, and think about adding it to other operations in the future. Thoughts?

Chandler Carruth

unread,
Nov 14, 2012, 10:37:38 PM11/14/12
to Michael Ilseman, Joe Abbey, llv...@cs.uiuc.edu
On Wed, Nov 14, 2012 at 7:19 PM, Michael Ilseman <mils...@apple.com> wrote:
> I see now that it's only binary operators that have OptimizationFlags
> reserved for them in the bitcode. Adding fast-math flags for only binary ops
> is straight-forward, but adding them for other ops might require a more
> involved bitcode change.
...
> Given all that, I think it makes sense to add support for fast-math flags
> only to binary ops in this iteration, and think about adding it to other
> operations in the future. Thoughts?

I'm really not trying to rehash a discussion in too much depth, but I
have to wonder with all of this -- why not use metadata as the
*encoding* mechanism for these flags?

Just to be clear, I have no strong feelings about any of this, but it
feels like metadata at the bitcode level provides a nice extensible
encoding scheme. At the same time, convenient accessor methods on the C++
instruction APIs, much like the existing flags, seem really valuable.
But I don't see why we can't have the best of both worlds.
Essentially, put the flags in "metadata", but still provide nice
first-class APIs etc so that they're significantly easier to use.

:: shrug :: just an idea that might simplify modeling this stuff.

Krzysztof Parzyszek

unread,
Nov 14, 2012, 11:01:01 PM11/14/12
to llv...@cs.uiuc.edu
On 11/14/2012 9:37 PM, Chandler Carruth wrote:
>
> I'm really not trying to rehash a discussion in too much depth, but I
> have to wonder with all of this -- why not use metadata as the
> *encoding* mechanism for these flags?

I'm not familiar with the decoding algorithm, but the first thing that
comes to my mind is: wouldn't it make decoding of an instruction a
two-step process? I.e.
(1) decode the instruction,
(2) update the instruction after metadata has been decoded.

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Evan Cheng

unread,
Nov 15, 2012, 1:55:36 AM11/15/12
to Michael Ilseman, Joe Abbey, llv...@cs.uiuc.edu


On Nov 14, 2012, at 7:19 PM, Michael Ilseman <mils...@apple.com> wrote:


On Nov 14, 2012, at 12:28 PM, Michael Ilseman <mils...@apple.com> wrote:

I think I missed what problem we're trying to solve here.

I'm looking at implementing the bitcode now. I have code to successfully read and write out the LLVM IR textual format (LLParser, etc.) and set the corresponding SubclassOptionalData bits. Looking at LLVMBitCodes.h, I'm seeing where these bits reside in the bitcode, so I believe that things should be pretty straight-forward from here.

Joe, what are the reasons for me to increment the IR version number? My understanding is that I'll just be using existing bits that were previously ignored. Ignoring these bits is still valid, just conservative. I believe these flags would be zeroed out in old IR (correct me if I'm wrong), which is the intended default.

Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.


I see now that it's only binary operators that have OptimizationFlags reserved for them in the bitcode. Adding fast-math flags for only binary ops is straight-forward, but adding them for other ops might require a more involved bitcode change.

I think that there might be some benefit to having flags for other kinds of ops, but those seem a bit more far-fetched and less common. For example, "fcmp N oeq x, x ==> true" or "bitcast N (bitcast N i32 %foo to float) to i32 ==> i32 %foo" seem more contrived than optimizations over binary ops. Comparisons are already sort of their own beast, as they ignore the sign of zero and have ordered and unordered versions.

Given all that, I think it makes sense to add support for fast-math flags only to binary ops in this iteration, and think about adding it to other operations in the future. Thoughts?

I agree. We are not trying to solve all the problems at this time.

Evan

Chris Lattner

unread,
Nov 15, 2012, 4:27:38 PM11/15/12
to Michael Ilseman, Joe Abbey, llv...@cs.uiuc.edu
On Nov 14, 2012, at 1:39 PM, Michael Ilseman <mils...@apple.com> wrote:
>
>>> Chris, what problem could be solved by adding extra operands to binary ops? I'm trying to avoid those sorts of modifications, as the fast-math flags could make sense applied to a variety of operations, e.g. comparisons and casts.
>>
>> How, specifically, are you proposing that these bits be encoded?
>>
>
> I'm new to the bitcode so let me know if this doesn't make sense. I was going to look at the encoding for nuw (OBO_NO_UNSIGNED_WRAP) and follow how it is encoded/decoded in the bitcode. I would then specify some kind of fast-math enum and encode it in a similar fashion.

Right, this is what I was suggesting. NSW and friends are handled with this code:

  assert(isa<BinaryOperator>(I) && "Unknown instruction!");
  Code = bitc::FUNC_CODE_INST_BINOP;
  if (!PushValueAndType(I.getOperand(0), InstID, Vals, VE))
    AbbrevToUse = FUNCTION_INST_BINOP_ABBREV;
  pushValue(I.getOperand(1), InstID, Vals, VE);
  Vals.push_back(GetEncodedBinaryOpcode(I.getOpcode()));
  uint64_t Flags = GetOptimizationFlags(&I);
  if (Flags != 0) {
    ...
    Vals.push_back(Flags);
  }

Basically, the Flags operand is added to the end of the operand list (before the bitcode record is written) if present. That's what I meant by an "extra operand", but I see how that was probably *really* confusing :)
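
For illustration, a corresponding reader-side sketch (not the actual
BitcodeReader code): after the two operands and the opcode have been consumed,
a binop record may carry one optional trailing element holding the flag word,
and its absence means no flags are set (the old default). The bit names below
are hypothetical; the real assignments would be pinned down in LLVMBitCodes.h,
independent of the in-memory IR enums.

#include <cstdint>
#include <vector>

enum FastMathRecordBits : uint64_t {
  BC_FMF_NO_NANS          = 1ULL << 0,
  BC_FMF_NO_INFS          = 1ULL << 1,
  BC_FMF_NO_SIGNED_ZEROS  = 1ULL << 2,
  BC_FMF_ALLOW_RECIPROCAL = 1ULL << 3,
  BC_FMF_UNSAFE_ALGEBRA   = 1ULL << 4
};

// OpNum points just past the operands and opcode already consumed.
uint64_t readOptionalFlagWord(const std::vector<uint64_t> &Record,
                              unsigned OpNum) {
  return OpNum < Record.size() ? Record[OpNum] : 0;
}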

Chris Lattner

unread,
Nov 15, 2012, 4:29:02 PM11/15/12
to Michael Ilseman, Joe Abbey, llv...@cs.uiuc.edu
On Nov 14, 2012, at 5:23 PM, Michael Ilseman <mils...@apple.com> wrote:
>> Ah I see. So this just becomes a matter of interpretation of bits in the optimization flags. Shouldn't need to promote the CurVersion.
>>
>> Nitpick: 80-cols in BitcodeReader.cpp
>>
>> Since Instruction::FastMathFlags is a class, seems like the constructor could take in Record[OpNum] , and assign the flags.
>>
>
> I like the intent, but unfortunately Record[OpNum] is just a uint64_t. The agreement of which bit means what is in LLVMBitCodes.h, and I'd prefer not having an implicit handshake between the bitcode and the rest of LLVM. However, I'll try to find ways to factor more convenience into shared code.

It's also a current design policy to explicitly enumerate the bitcode codes separately from the internal IR codes. This is because there should be no binary compatibility concerns with renumbering internal IR enums, but if the bitcode used them directly, we'd have a problem.

-Chris

Chris Lattner

unread,
Nov 15, 2012, 4:30:37 PM11/15/12
to Chandler Carruth, Joe Abbey, llv...@cs.uiuc.edu

On Nov 14, 2012, at 7:37 PM, Chandler Carruth <chan...@google.com> wrote:

> On Wed, Nov 14, 2012 at 7:19 PM, Michael Ilseman <mils...@apple.com> wrote:
>> I see now that it's only binary operators that have OptimizationFlags
>> reserved for them in the bitcode. Adding fast-math flags for only binary ops
>> is straight-forward, but adding them for other ops might require a more
>> involved bitcode change.
> ...
>> Given all that, I think it makes sense to add support for fast-math flags
>> only to binary ops in this iteration, and think about adding it to other
>> operations in the future. Thoughts?

Michael: yes, starting with binary operators first makes perfect sense to me.

> I'm really not trying to rehash a discussion in too much depth, but I
> have to wonder with all of this -- why not use metadata as the
> *encoding* mechanism for these flags?

Chandler: Because it is heinously inefficient for something that gets applied to every FP operation, and because it violates the pattern established by NSW/NUW etc. This was discussed in depth on the previous email thread.

-Chris