Per C and C++, integer division by 0 is undefined. That means, if it
happens, the compiler is free to do whatever it wants. It is perfectly
legal for LLVM to define r to be, say, 42 in this code; it is not
required to preserve the fact that the idiv instruction on x86 and
x86-64 will trap.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Fri, Apr 5, 2013 at 2:40 PM, Joshua Cranmer 🐧 <Pidg...@gmail.com> wrote:
> Per C and C++, integer division by 0 is undefined. That means, if it
> happens, the compiler is free to do whatever it wants. It is perfectly
> legal for LLVM to define r to be, say, 42 in this code; it is not
> required to preserve the fact that the idiv instruction on x86 and
> x86-64 will trap.

This is quite a conundrum to me. Yes, I agree with you on the C/C++ Standards interpretation. However, the x86-64 expectations are orthogonal. I find that other compilers, including GCC, will trap by default at high optimization levels on x86-64 for this test case. Hardly scientific, but every other compiler on our machines issues a trap.

We take safety seriously in our compiler. Our other components go to great lengths not to constant fold an integer division by zero. We would also like LLVM to do the same. Are there others in the community who feel this way?

I can envision an option which preserves faults during constant folding, e.g. an integer version of -enable-unsafe-fp-math or gfortran's -ffpe-trap. More accurately, this hypothetical option would suppress the folding of unsafe integer expressions altogether. Would an option such as this benefit the community at large?
Clang's -fsanitize=integer-divide-by-zero option seems to provide what you want. See the Clang User Manual: http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
> This is quite a conundrum to me. Yes, I agree with you on the C/C++
> Standards interpretation. However, the x86-64 expectations are orthogonal.
> I find that other compilers, including GCC, will trap by default at high
> optimization levels on x86-64 for this test case. Hardly scientific, but
> every other compiler on our machines issues a trap.
The platform is irrelevant; division by zero is undefined, and compilers are allowed to optimize as if it can't happen.
Both gcc and clang will lift the division out of this loop, for example, causing the hardware trap to happen in the "wrong" place if bar is passed zero:

void foo(int x, int y);

void bar(int x) {
    for (int i = 0; i < 100; ++i) {
        foo(i, 200 / x);
    }
}
I'm less concerned about "where" the trap happens than about "if" it happens. For example, a Fortran program with a division by zero is, by the Standard, non-conforming; pragmatically, it is not a Fortran program at all. Rather than wrong answers, I would like to see a hard error indicating that a program does not conform to the Standard.
As I've pointed out, clang does provide such functionality as an opt-in feature through its -fsanitize options. A hypothetical Fortran frontend could do the same, and even make it an opt-out feature if it chose. I'm sorry if its implementation mechanism doesn't match exactly what you want it to be, but it's not like nobody else has thought about this problem. They have, and they've designed and shipped a solution!

Side note: even if the -fsanitize option introduces a branch around the division (which I haven't verified), it's quite unlikely to cause a significant performance regression. The branch to the error block should be perfectly predicted on any CPU made in the last 25 years.
All kidding aside, an instruction is an instruction. We have the functionality in hardware for free; it would be a shame not to use it. I should also mention that for our team, performance is job #1. Extra instructions, however insignificant, don't go over well without solid justification.
Presumably the optimizer benefits from taking advantage of the
undefined behavior, but to get a consistent result you need to check
for both zero and this case, which is an awful lot of checks. Yes they
will branch predict well, but this still can't be good, for code size
if nothing else. How much performance can you really get by constant
folding -9223372036854775808/-1 to an unspecified value?
A division intrinsic with defined behavior on all arguments would be
awesome! Thanks for considering this.
Can't front-ends implement this themselves using the "select" constant expression? For example, rather than dividing by "x", you can divide by "x == 0 ? 1 : x".
If you can find a way to implement -fsanitize=undefined to use the FP
trap instead of a branch when checking floating division by 0, I
suspect such a patch would stand a good chance of being accepted.
(Although I'm not intimately familiar with the implementation, so
there could be some subtlety that would prevent it.)
You might need to
do this in the processor-specific backend to avoid other
undefined-behavior-based optimizations—that is, recognize "if (x == 0)
goto err_handler; else y/x;" and replace it with
"register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".
If you want a trap you are going to have to (IMO) output IR
instructions rather than a constant expression. For example you can output the
equivalent of
if (x == 0)
  llvm.trap() // or maybe some special trap routine
y = whatever/x
... use y ...
rather than the constant expression "whatever/x". If the optimizers can prove
that x is never zero then this will be sunk back into a constant expression (if
"whatever" is a constant expression).
If the performance penalty is unclear to you, that means you haven't
measured it. Until you measure, you have absolutely no business
complaining about a potential performance problem. Measure, and then
come back with numbers.
> Although, I've been contemplating x86-64's behaviour for this case when
> floating point traps are disabled. Ideally, the compiler should preserve
> that behaviour, which might make this software implementation messy.
> Especially if different processors have different implementations. The
> simplest solution... let the hardware behave as it should.

To be clear, you're asking to turn off a set of optimizations. That is, you're asking to make code in general run slower, so that you can get a particular behavior on some CPUs in unusual cases.
>> You might need to
>> do this in the processor-specific backend to avoid other
>> undefined-behavior-based optimizations—that is, recognize "if (x == 0)
>> goto err_handler; else y/x;" and replace it with
>> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".
>
> I believe that the constant folder would remove the constant division by
> zero and conditional before the backend could have its say. We would be left
> with only the jump to the error handler. That may complicate things.

If the compiler can prove x == 0, then yes, you'd be left with just a jump to the error handler. That's more efficient than handling a hardware trap, so it's what you ought to want.
> To be clear, you're asking to turn off a set of optimizations. That
> is, you're asking to make code in general run slower, so that you can
> get a particular behavior on some CPUs in unusual cases.

I respectfully disagree. I am asking for an *option* to turn off a set of optimizations, not to turn off optimizations in general. I would like to make it easy for a compiler implementor to choose the desired behaviour. I whole-heartedly believe that both behaviours (undefined and trap) have merit.
I think this entire conversation is going a bit off the rails. Let's try to stay focused on the specific request, and why there may be problems with it. I think you're both misconstruing what this would involve.

On Sun, Apr 7, 2013 at 11:50 AM, Cameron McInally <cameron....@nyu.edu> wrote:
>> To be clear, you're asking to turn off a set of optimizations. That
>> is, you're asking to make code in general run slower, so that you can
>> get a particular behavior on some CPUs in unusual cases.
>
> I respectfully disagree. I am asking for an *option* to turn off a set of
> optimizations, not to turn off optimizations in general. I would like to
> make it easy for a compiler implementor to choose the desired behaviour. I
> whole-heartedly believe that both behaviours (undefined and trap) have merit.

You're actually asking for the formal model of the LLVM IR to be *parameterized*. In one mode, an instruction would produce undefined behavior on division by zero, and in another mode it would produce a trap. Then you are asking for the optimizer stack to support either semantic model and produce efficient code regardless.

This is completely intractable for LLVM to support. It would make both the optimizers and the developers of LLVM crazy to have deep parameterization of the fundamental semantic model for integer division.

The correct way to support *exactly* reproducing the architectural peculiarities of the x86-64 integer divide instruction is to add a target-specific intrinsic which does this. It will have defined behavior (of trapping in some cases) as you want, and you can emit it in your FE easily. However, even this risks incurring a high maintenance burden. If you want much in the way of optimizations of this intrinsic, you'll have to go through the optimizer and teach each pass about it. Some of these will be easy, but some will be hard, and there will be a *lot* of them. =/

Cameron, you (and others interested) will certainly need to provide all of the patches and work to support this if you think it is an important use case, as the existing developers have found other trade-offs and solutions. And even then, if it requires really substantial changes to the optimizer, I'm not sure it's worth pursuing in LLVM. My primary concerns are two-fold.
First, I think that the amount of work required to recover the optimizations which could theoretically apply to both of these operations will be massive. Second, I fear that after having done this work, you will immediately find the need to remove some other undefined behavior from the IR which happens to have defined behavior on x86-64.
Fundamentally, the idea of undefined behavior is at the core of the design of LLVM's optimizers. It is leveraged everywhere. Without it, many algorithms that are fast would become slow, transformations that are cheap would become expensive, and passes that operate locally would be forced to operate across ever-growing scopes in order to be certain that an optimization applies in a specific case. Trying to remove undefined behavior from LLVM seems unlikely to be a productive pursuit.
More productive (IMO) is to emit explicit guards against the undefined behavior in your language, much as -fsanitize does for undefined behavior in C++. Then work to build a mode where a specific target can take advantage of target-specific trapping behaviors to emit these guards more efficiently. This allows LLVM's optimizers to continue to function in the world they were designed for, with a set of rules that we know how to build efficient optimizers around, while your source programs operate in a world with checked behavior rather than undefined behavior. As a useful side-effect, you can defer the target-specific optimizations until you have benchmarks (internally is fine!) and can demonstrate the performance problems, if any.
Cameron, you may disagree, but honestly if you were to convince folks here I think it would have happened already. I'm not likely to continue the theoretical debate about whether LLVM's stance on UB (as I've described above) is a "good" or "bad" stance. Not that I wouldn't enjoy the debate (especially at a bar some time), but because I fear it isn't a productive way to spend the time of folks on this list. So let's try to stick to the technical discussion of strategies, costs, and tradeoffs.
> More productive (IMO) is to emit explicit guards against the undefined
> behavior in your language, much as -fsanitize does for undefined behavior in
> C++. Then work to build a mode where a specific target can take advantage of
> target specific trapping behaviors to emit these guards more efficiently.
> This will allow LLVM's optimizers to continue to function in the world they
> were designed for, and with a set of rules that we know how to build
> efficient optimizers around, and your source programs can operate in a world
> with checked behavior rather than undefined behavior. As a useful
> side-effect, you can defer the target-specific optimizations until you have
> benchmarks (internally is fine!) and can demonstrate the performance
> problems (if any).

Regrettably, this implementation does not suit my needs. The constant folding would still occur, and I would like to produce the actual division, since the trap the instruction raises is non-maskable on x86.
On Apr 8, 2013 8:42 AM, "Duncan Sands" <bald...@free.fr> wrote:
>
> Hi Cameron,
>
>
> On 07/04/13 19:42, Cameron McInally wrote:
>>
>> On Sun, Apr 7, 2013 at 1:23 PM, Duncan Sands <bald...@free.fr> wrote:
>> ...
>>
>> If you want a trap you are going to have to (IMO) output IR
>> instructions rather than a constant expression. For example you can output the
>> equivalent of
>> if (x == 0)
>> llvm.trap() // or maybe some special trap routine
>> y = whatever/x
>> ... use y ...
>> rather than the constant expression "whatever/x". If the optimizers can prove
>> that x is never zero then this will be sunk back into a constant expression (if
>> "whatever" is a constant expression).
>>
>>
>> Ah, that's super interesting. I'll have to give it some more thought. Thanks for
>> this, Duncan.
>>
>> My knee-jerk reaction is that emulating hardware behaviour in software should be
>> avoided. At least in this specific case.
>
>
> I reckon it shouldn't be too hard to teach the code generators to turn this IR
> sequence into "y = target-divide whatever/x" on targets for which dividing by 0
> traps in a satisfactory way, so it turns into something efficient.
>
If the application is safety-related, then the "if (x == 0)" check should be there in the source code anyway.
If the target lowering happens to be able to optimize it into the bare div instruction, it can do so then.
If a target CPU does not trap on div, the target lowering would at least maintain the safety at the expense of an extra branch.
James.
On Sun, Apr 7, 2013 at 9:28 PM, Cameron McInally <cameron....@nyu.edu> wrote:
>> More productive (IMO) is to emit explicit guards against the undefined
>> behavior in your language, much as -fsanitize does for undefined behavior
>> in C++. Then work to build a mode where a specific target can take
>> advantage of target specific trapping behaviors to emit these guards more
>> efficiently. This will allow LLVM's optimizers to continue to function in
>> the world they were designed for, and with a set of rules that we know how
>> to build efficient optimizers around, and your source programs can operate
>> in a world with checked behavior rather than undefined behavior. As a
>> useful side-effect, you can defer the target-specific optimizations until
>> you have benchmarks (internally is fine!) and can demonstrate the
>> performance problems (if any).
>
> Regrettably, this implementation does not suit my needs.

I'm curious why you think so... I think it would very closely match your needs:
> The constant folding would still occur

No? At least, I don't see why. I don't know what your frontend is doing (my impression is that it is not Clang? If I'm wrong, let me know...), but the common idiom is to emit nothing as a constant other than immediate values, and let LLVM's optimizers figure it out. A consequence of this pattern, combined with inserting guards, is that (at most) the optimizer will fold directly to the trap. You should be able to control the exact structure in the FE though.
> and I would like to produce the actual division, since the instruction is
> non-maskable on x86.

Yes, and this is the point I was driving at with the comments about a target-specific optimization. It is reasonable for the x86 backend to recognize the pattern of testing for a zero divisor and trapping if it occurs, and to transform that into an unconditional divide relying on the hardware trap instead. I think it is not unreasonable to have this strategy result in one of two generated code patterns in the overwhelming majority of cases:
1) An unconditional trap, because the optimizer proved the divide by zero always occurs, or
2) A direct divide instruction, which relies on the x86 trapping behavior.
> I reckon it shouldn't be too hard to teach the code generators to turn this IR
> sequence into "y = target-divide whatever/x" on targets for which dividing by 0
> traps in a satisfactory way, so it turns into something efficient.
I was just writing Chandler about a similar implementation. With my current understanding of the problem, my constant division will end up as either a trap call or a load of an undefined value in the IR, depending on what the FE chooses to do. It's not clear to me, at least with my current knowledge of LLVM, how to recreate the actual division instruction in the target backend from that information (trap or undefined value). Hopefully there is a way. ;)
*** IR Dump Before Deduce function attributes ***

define i32 @main() #0 {
entry:
  call void @__ubsan_handle_divrem_overflow(i8* bitcast ({ { i8*, i32, i32 }, { i16, i16, [6 x i8] }* }* @2 to i8*), i64 6, i64 0) #3
  %call1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i64 0, i64 0), i32 undef) #3
  ret i32 0
}
> As I've pointed out, clang does provide such functionality as an
> opt-in feature through its -fsanitize options. A hypothetical Fortran
> frontend could do the same, and even make it an opt-out feature if it
> chose. I'm sorry if its implementation mechanism doesn't match
> exactly what you want it to be, but it's not like nobody else has
> thought about this problem. They have, and they've designed and
> shipped a solution!
The problem with a clang implementation is that not everyone uses clang.
It sure would be nice to have an option that tells the optimizer not to
constant fold traps away. Then it would be easy for implementors to
set that option if they want.
This is an optimization/codegen issue so it seems to me the fix ought to
be there, not in the frontend.
Is there great harm in adding an option here?
-David