[LLVMdev] Integer divide by zero


Cameron McInally

Apr 5, 2013, 2:23:39 PM
to llv...@cs.uiuc.edu
Hey guys,

I'm learning that LLVM does not preserve faults during constant folding. I realize that this is an architecture dependent problem, but I'm not sure if it's safe to constant fold away a fault on x86-64.

A little testcase:

#include <stdio.h>

int foo(int j, int d) {
  return j / d ;
}

int bar (int k, int d) {
  return foo(k + 1, d);
}

int main( void ) {
  int r = bar(5, 0);
  printf( " r = %d\n", r);
}

At execution, on an x86-64 processor, this optimized code should fault with a divide-by-zero. At -O2, LLVM 3.1 folds the divide by zero into a constant load. This is, of course, undefined by the C standard, so the folding makes sense on that front. But on x86-64, I don't think this is okay. On x86-64, an integer divide by zero is non-maskable, meaning it will always trap and has no result of any type.

My first question is, what's the rationale behind not preserving faults on constant integer folding? Second, is this a bug? With my current understanding, this is most definitely a bug for me on x86-64. But, I'm wondering if there is a hole in my understanding.

Thanks in advance,
Cameron


Joshua Cranmer 🐧

Apr 5, 2013, 2:40:12 PM
to llv...@cs.uiuc.edu
On 4/5/2013 1:23 PM, Cameron McInally wrote:
> Hey guys,
>
> I'm learning that LLVM does not preserve faults during constant
> folding. I realize that this is an architecture dependent problem, but
> I'm not sure if it's safe to constant fold away a fault on x86-64.
>
> A little testcase:
>
> #include <stdio.h>
>
> int foo(int j, int d) {
> return j / d ;
> }
>
> int bar (int k, int d) {
> return foo(k + 1, d);
> }
>
> int main( void ) {
> int r = bar(5, 0);
> printf( " r = %d\n", r);
> }
>
> At execution, on an x86-64 processor, this optimized code should fault
> with a divide-by-zero. At -O2, LLVM 3.1 folds the divide by zero into
> a constant load. This is, of course, undefined by the C standard, so
> the folding makes sense on that front. But on x86-64, I don't think
> this is okay. On x86-64, an integer divide by zero is non-maskable,
> meaning it will always trap and has no result of any type.

Per C and C++, integer division by 0 is undefined. That means, if it
happens, the compiler is free to do whatever it wants. It is perfectly
legal for LLVM to define r to be, say, 42 in this code; it is not
required to preserve the fact that the idiv instruction on x86 and
x86-64 will trap.

--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Renato Golin

Apr 5, 2013, 3:08:02 PM
to Joshua Cranmer 🐧, LLVM Dev
On 5 April 2013 19:40, Joshua Cranmer 🐧 <Pidg...@gmail.com> wrote:
Per C and C++, integer division by 0 is undefined. That means, if it happens, the compiler is free to do whatever it wants. It is perfectly legal for LLVM to define r to be, say, 42 in this code; it is not required to preserve the fact that the idiv instruction on x86 and x86-64 will trap.

Undefined behaviour aside, it would be good to have a warning, I think. Does the static analyser have something like it?

cheers,
--renato

Cameron McInally

Apr 5, 2013, 4:42:54 PM
to Pidg...@gmail.com, llv...@cs.uiuc.edu
On Fri, Apr 5, 2013 at 2:40 PM, Joshua Cranmer 🐧 <Pidg...@gmail.com> wrote:
... 
Per C and C++, integer division by 0 is undefined. That means, if it happens, the compiler is free to do whatever it wants. It is perfectly legal for LLVM to define r to be, say, 42 in this code; it is not required to preserve the fact that the idiv instruction on x86 and x86-64 will trap.

This is quite a conundrum to me. Yes, I agree with you on the C/C++ Standards interpretation. However, the x86-64 expectations are orthogonal. I find that other compilers, including GCC, will trap by default at high optimization levels on x86-64 for this test case. Hardly scientific, but every other compiler on our machines issues a trap.

We take safety seriously in our compiler. Our other components go to great lengths not to constant fold an integer division by zero. We would also like LLVM to do the same. Are there others in the community who feel this way?

I can envision an option which preserves faults during constant folding. E.g. an integer version of -enable-unsafe-fp-math or gfortran's -ffpe-trap. More accurately, this hypothetical option would suppress the folding of unsafe integer expressions altogether. Would an option such as this benefit the community at large?

To be complete, I've also explored the idea of generating a __builtin_trap() call for such expressions before the IR level. However, I have not yet convinced myself that this will generate the same fault as the actual sdiv/udiv instruction would. Things to do.
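Concretely, that rewrite would look something like this (a rough sketch only):

int bar_folded(int k) {
  (void)k;            /* the operands are already folded away */
  /* Stands in for the folded (k + 1) / 0. Note that __builtin_trap()
     typically lowers to ud2 on x86, which raises #UD rather than the #DE
     fault a real idiv would raise -- exactly the doubt above. */
  __builtin_trap();
}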

-Cameron

Owen Anderson

Apr 5, 2013, 5:06:44 PM
to cameron....@nyu.edu, Pidg...@gmail.com, llv...@cs.uiuc.edu

On Apr 5, 2013, at 1:42 PM, Cameron McInally <cameron....@nyu.edu> wrote:

On Fri, Apr 5, 2013 at 2:40 PM, Joshua Cranmer 🐧 <Pidg...@gmail.com> wrote:
... 
Per C and C++, integer division by 0 is undefined. That means, if it happens, the compiler is free to do whatever it wants. It is perfectly legal for LLVM to define r to be, say, 42 in this code; it is not required to preserve the fact that the idiv instruction on x86 and x86-64 will trap.

This is quite a conundrum to me. Yes, I agree with you on the C/C++ Standards interpretation. However, the x86-64 expectations are orthogonal. I find that other compilers, including GCC, will trap by default at high optimization levels on x86-64 for this test case. Hardly scientific, but every other compiler on our machines issues a trap.

We take safety seriously in our compiler. Our other components go to great lengths not to constant fold an integer division by zero. We would also like LLVM to do the same. Are there others in the community who feel this way?

I can envision an option which preserves faults during constant folding. E.g. an integer version of -enable-unsafe-fp-math or gfortran's -ffpe-trap. More accurately, this hypothetical option would suppress the folding of unsafe integer expressions altogether. Would an option such as this benefit the community at large?

Clang's -fsanitize=integer-divide-by-zero option seems to provide what you want.  See the Clang User Manual: http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
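For reference, the check that option inserts is conceptually something like this (a sketch only; the real instrumentation calls into the ubsan runtime and passes source-location data):

#include <stdlib.h>

/* Hypothetical stand-in for the ubsan runtime's diagnostic entry point. */
static void report_div_by_zero(void) { abort(); }

/* Roughly the guard -fsanitize=integer-divide-by-zero places before a signed
   division; the INT_MIN / -1 case is covered by the separate signed-overflow
   check. */
static int checked_sdiv(int j, int d) {
  if (d == 0)
    report_div_by_zero();
  return j / d;
}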

--Owen

Cameron McInally

Apr 5, 2013, 5:26:20 PM
to Owen Anderson, Joshua Cranmer, llv...@cs.uiuc.edu
On Fri, Apr 5, 2013 at 5:06 PM, Owen Anderson <resi...@mac.com> wrote:
...
Clang's -fsanitize=integer-divide-by-zero option seems to provide what you want.  See the Clang User Manual: http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation


That looks pretty close. I assume there are performance penalties for the runtime checks?

I would really like to produce the actual sdiv/udiv instruction and allow the processor to fault on it. That seems like the right fix to me.

-Cameron

Joe Groff

Apr 5, 2013, 6:49:14 PM
to cameron....@nyu.edu, Pidg...@gmail.com, llv...@cs.uiuc.edu
On Fri, Apr 5, 2013 at 1:42 PM, Cameron McInally <cameron....@nyu.edu> wrote:
This is quite a conundrum to me. Yes, I agree with you on the C/C++ Standards interpretation. However, the x86-64 expectations are orthogonal. I find that other compilers, including GCC, will trap by default at high optimization levels on x86-64 for this test case. Hardly scientific, but every other compiler on our machines issues a trap.

The platform is irrelevant; division by zero is undefined, and compilers are allowed to optimize as if it can't happen. Both gcc and clang will lift the division out of this loop, for example, causing the hardware trap to happen in the "wrong" place if bar is passed zero:

void foo(int x, int y);

void bar(int x) {
  for (int i = 0; i < 100; ++i) {
    foo(i, 200/x);
  }
}

If your program relies on the exact semantics of the 'idiv'/'udiv' hardware instruction, you might want to use an asm volatile statement.
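For example, something along these lines keeps the hardware division around (an untested sketch for x86-64 with GCC/Clang inline asm):

/* Forces a real hardware idiv so the #DE fault fires even for constant
   operands; the asm volatile keeps the optimizer from folding it away. */
static inline int hw_sdiv(int num, int den) {
  int quot;
  __asm__ volatile("cltd\n\t"       /* sign-extend eax into edx:eax */
                   "idivl %2"       /* quotient -> eax, remainder -> edx */
                   : "=a"(quot)
                   : "a"(num), "r"(den)
                   : "edx", "cc");
  return quot;
}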

-Joe 

Cameron McInally

Apr 5, 2013, 11:02:30 PM
to Joe Groff, Joshua Cranmer, llv...@cs.uiuc.edu
On Fri, Apr 5, 2013 at 6:49 PM, Joe Groff <arc...@gmail.com> wrote:
...

The platform is irrelevant; division by zero is undefined, and compilers are allowed to optimize as if it can't happen. 
 
Understood.

I don't mean to nag, but I'm not arguing which is [more] correct: the Standards or the platform. Both have their own merits, at least to me. The choice should be up to the compiler implementor. And, I would like to see LLVM make that choice easy for the compiler implementor. Maybe an anachronism, but there is some history of this option being built into other major compilers: GNU, Sun, HP, IBM, SGI, Compaq, etc.

But, if others can not (or will not) benefit from this option, I will not force my individual needs on the whole.

Both gcc and clang will lift the division out of this loop, for example, causing the hardware trap to happen in the "wrong" place if bar is passed zero:

void foo(int x, int y);

void bar(int x) {
  for (int i = 0; i < 100; ++i) {
    foo(i, 200/x);
  }
}

I'm less concerned about "where" the trap happens as I am about "if" it happens. For example, a Fortran program with division-by-zero is, by the Standard, non-conforming. Pragmatically, not a Fortran program. Rather than wrong answers, I would like to see a hard error indicating that a program is non-conforming to the Standard.
-Cameron

Owen Anderson

Apr 6, 2013, 1:18:48 AM
to cameron....@nyu.edu, Joshua Cranmer, llv...@cs.uiuc.edu

On Apr 5, 2013, at 8:02 PM, Cameron McInally <cameron....@nyu.edu> wrote:

I'm less concerned about "where" the trap happens as I am about "if" it happens. For example, a Fortran program with division-by-zero is, by the Standard, non-conforming. Pragmatically, not a Fortran program. Rather than wrong answers, I would like to see a hard error indicating that a program is non-conforming to the Standard.

As I've pointed out, clang does provide such functionality as an opt-in feature through its -fsanitize options.  A hypothetical Fortran frontend could do the same, and even make it an opt-out feature if it chose.  I'm sorry if its implementation mechanism doesn't match exactly what you want it to be, but it's not like nobody else has thought about this problem.  They have, and they've designed and shipped a solution!

Side note: even if the -fsanitize option introduces a branch around the division (which I haven't verified), it's quite unlikely to cause a significant performance regression.  The branch to the error block should be perfectly predicted on any CPU made in the last 25 years.

--Owen

Jeff Bezanson

Apr 6, 2013, 4:01:25 AM
to llv...@cs.uiuc.edu
I'm also not fully happy with LLVM's behavior here. There is another
undefined case too, which is the minimum integer divided by -1. In
Julia I can get "random" answers by doing:

julia> sdiv_int(-9223372036854775808, -1)
87106304

julia> sdiv_int(-9223372036854775808, -1)
87108096

In other contexts where the arguments are not constant, this typically
gives an FPE trap.
More than insisting on a particular behavior, I'd like it to be
consistent. I know the result is undefined, so LLVM's behavior here is
valid, but I find that to be an overly lawyerly interpretation.
Presumably the optimizer benefits from taking advantage of the
undefined behavior, but to get a consistent result you need to check
for both zero and this case, which is an awful lot of checks. Yes they
will branch predict well, but this still can't be good, for code size
if nothing else. How much performance can you really get by constant
folding -9223372036854775808/-1 to an unspecified value?

Cameron McInally

Apr 6, 2013, 4:07:13 AM
to Owen Anderson, Joshua Cranmer, llv...@cs.uiuc.edu
Hey Owen,

Thanks for your reply. I definitely understand your perspective. Hopefully I can change it.  }:-)

On Sat, Apr 6, 2013 at 1:18 AM, Owen Anderson <resi...@mac.com> wrote:
... 
As I've pointed out, clang does provide such functionality as an opt-in feature through its -fsanitize options.  A hypothetical Fortran frontend could do the same, and even make it an opt-out feature if it chose.  I'm sorry if its implementation mechanism doesn't match exactly what you want it to be, but it's not like nobody else has thought about this problem.  They have, and they've designed and shipped a solution!

Side note: even if the -fsanitize option introduces a branch around the division (which I haven't verified), it's quite unlikely to cause a significant performance regression.  The branch to the error block should be perfectly predicted on any CPU made in the last 25 years.


Even on a Xeon Phi? Just kidding around!

All kidding aside, an instruction is an instruction. We have the functionality in hardware for free; it would be a shame not to use it. I should also mention that for our team, performance is job #1. Extra instructions, however insignificant, don't go over well without solid justification.

Please don't misunderstand, I'm not trying to be unreasonable about this behaviour change. I do believe that there are a few strong arguments for this behaviour that we haven't touched upon yet. I also believe that this option would be an improvement to LLVM overall. But, I also haven't heard anyone in the community voicing interest. So, if no one speaks up and would like to discuss further, I'll be happy to keep such a change in my local branch.

-Cameron

Joe Groff

Apr 6, 2013, 9:35:14 AM
to Cameron McInally, llv...@cs.uiuc.edu, Joshua Cranmer
On Saturday, April 6, 2013, Cameron McInally wrote:
All kidding aside, an instruction is an instruction. We have the functionality in hardware for free; it would be a shame not to use it. I should also mention that for our team, performance is job #1. Extra instructions, however insignificant, don't go over well without solid justification.

A compromise might be to add an llvm.div.or.trap intrinsic, similar to the llvm.add.with.overflow intrinsics used to implement Clang's optional integer overflow checks, which could be emitted efficiently on x86 or other platforms where division by zero or INT_MAX/-1 traps natively.
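In C terms, the semantics such an intrinsic might carry would be roughly the following (purely illustrative; no such intrinsic exists today):

#include <limits.h>

/* A hypothetical "divide or trap": defined to trap on the two problem
   cases, so a backend where idiv already traps (e.g. x86) could lower it
   to the bare instruction with no explicit checks. */
static int div_or_trap(int a, int b) {
  if (b == 0 || (a == INT_MIN && b == -1))
    __builtin_trap();
  return a / b;
}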

-Joe

Joe Groff

Apr 6, 2013, 11:27:01 AM
to Jeff Bezanson, llv...@cs.uiuc.edu
On Saturday, April 6, 2013, Jeff Bezanson wrote:

Presumably the optimizer benefits from taking advantage of the
undefined behavior, but to get a consistent result you need to check
for both zero and this case, which is an awful lot of checks. Yes they
will branch predict well, but this still can't be good, for code size
if nothing else. How much performance can you really get by constant
folding -9223372036854775808/-1 to an unspecified value?

Constant folding undefined expressions is sort of silly, but I appreciate that it makes undefined behavior problems in frontends immediately apparent with trivial cases before they creep up on you in more complicated optimized code. After all, even if the backend makes practical concessions to trivial cases, the underlying semantic problem is still there and will bite you eventually. For high-level languages like Julia that want to provide efficiency but also give defined behavior to all user-exposed cases, I think adding an LLVM intrinsic to represent division that's defined to trap on division by zero or overflow would be the best solution. Since the trap is a side effect, it would stifle optimization of code that used the intrinsic, but the intrinsic could still be lowered to a single hardware instruction and avoid the branching and code bloat of manual checks on hardware where division by zero natively traps.

-Joe

Krzysztof Parzyszek

Apr 6, 2013, 12:16:27 PM
to llv...@cs.uiuc.edu
On 4/6/2013 3:07 AM, Cameron McInally wrote:
>
> Please don't misunderstand, I'm not trying to be unreasonable about this
> behaviour change. I do believe that there are a few strong arguments for
> this behaviour that we haven't touched upon yet. I also believe that
> this option would be an improvement to LLVM overall. But, I also haven't
> heard anyone in the community voicing interest. So, if no one speaks up
> and would like to discuss further, I'll be happy to keep such a change
> in my local branch.

The IBM XLC compiler has a -qcheck option, which takes arguments
specifying what type of runtime checks are performed. For example,
-qcheck=divzero would cause the program to abort with a SIGTRAP if a
division by zero was detected (the compiler would insert a conditional
trap instruction).

Producing a diagnostic message is one way to address this[1], but
generating some sort of an exception has its merits as well---it all
depends on the specific situation. I don't think that the existence of
one precludes the other.

[1] I've never used fsanitize---this is based on what I've read in the
clang's manual.

-Krzysztof


--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Jeff Bezanson

Apr 6, 2013, 3:22:24 PM
to Joe Groff, llv...@cs.uiuc.edu
A division intrinsic with defined behavior on all arguments would be
awesome! Thanks for considering this.

Cameron McInally

Apr 6, 2013, 4:52:40 PM
to Jeff Bezanson, llv...@cs.uiuc.edu
On Sat, Apr 6, 2013 at 3:22 PM, Jeff Bezanson <jeff.b...@gmail.com> wrote:
A division intrinsic with defined behavior on all arguments would be
awesome! Thanks for considering this.

'Tis a good compromise. If there are no objections/concerns, I would like to move forward with it. 

Thanks, Joe!

-Cameron

 

Duncan Sands

Apr 7, 2013, 11:22:55 AM
to llv...@cs.uiuc.edu
Hi Cameron,
can't front-ends implement this themselves using the "select" constant
expression? For example, rather than dividing by "x" you can divide by
"x == 0 ? 1 : x".

Ciao, Duncan.

Cameron McInally

Apr 7, 2013, 12:20:01 PM
to Duncan Sands, llv...@cs.uiuc.edu
Hey Duncan,

On Sun, Apr 7, 2013 at 11:22 AM, Duncan Sands <bald...@free.fr> wrote:
... 
can't front-ends implement this themselves using the "select" constant
expression?  For example, rather than dividing by "x" you can divide by
"x == 0 ? 1 : x".

I'm not sure that I follow. I would like to see the division by zero execute and trap at runtime.

For example, x = 2/0. It looks like your suggestion would generate the expression 2/1, which wouldn't trap as expected. 

Am I misinterpreting your suggestion?

Thanks,
Cameron

P.S. Is it your intention to hide the simple constant expression from the constant folder? I.e. (x == 0) ? 0 : x. 
 

Jeffrey Yasskin

Apr 7, 2013, 12:31:04 PM
to Cameron McInally, llv...@cs.uiuc.edu, Joshua Cranmer
On Sat, Apr 6, 2013 at 10:07 AM, Cameron McInally
<cameron....@nyu.edu> wrote:
> Hey Owen,
>
> Thanks for your reply. I definitely understand your perspective. Hopefully I
> can change it. }:-)
>
> On Sat, Apr 6, 2013 at 1:18 AM, Owen Anderson <resi...@mac.com> wrote:
> ...
>>
>> As I've pointed out, clang does provide such functionality as an opt-in
>> feature through its -fsanitize options. A hypothetical Fortran frontend
>> could do the same, and even make it an opt-out feature if it chose. I'm
>> sorry if its implementation mechanism doesn't match exactly what you want it
>> to be, but it's not like nobody else has thought about this problem. They
>> have, and they've designed and shipped a solution!
>>
>> Side note: even if the -fsanitize option introduces a branch around the
>> division (which I haven't verified), it's quite unlikely to cause a
>> significant performance regression. The branch to the error block should be
>> perfectly predicted on any CPU made in the last 25 years.
>
> All kidding aside, an instruction is an instruction. We have the
> functionality in hardware for free; it would be a shame not to use it. I
> should also mention that for our team, performance is job #1. Extra
> instructions, however insignificant, don't go over well without solid
> justification.

If you can find a way to implement -fsanitize=undefined to use the FP
trap instead of a branch when checking floating division by 0, I
suspect such a patch would stand a good chance of being accepted.
(Although I'm not intimately familiar with the implementation, so
there could be some subtlety that would prevent it.) You might need to
do this in the processor-specific backend to avoid other
undefined-behavior-based optimizations—that is, recognize "if (x == 0)
goto err_handler; else y/x;" and replace it with
"register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".

Jeffrey

Duncan Sands

Apr 7, 2013, 1:23:37 PM
to cameron....@nyu.edu, llv...@cs.uiuc.edu
Hi Cameron,

On 07/04/13 18:20, Cameron McInally wrote:
> Hey Duncan,
>
> On Sun, Apr 7, 2013 at 11:22 AM, Duncan Sands <bald...@free.fr
> <mailto:bald...@free.fr>> wrote:
> ...
>
> can't front-ends implement this themselves using the "select" constant
> expression? For example, rather than dividing by "x" you can divide by
> "x == 0 ? 1 : x".
>
>
> I'm not sure that I follow. I would like to see the division by zero execute and
> trap at runtime.
>
> For example, x = 2/0. It looks like your suggestion would generate the
> expression 2/1, which wouldn't trap as expected.
>
> Am I misinterpreting your suggestion?

I didn't realize you wanted a trap, I just thought you wanted to avoid undefined
behaviour. If you want a trap you are going to have to (IMO) output IR
instructions rather than a constant expression. For example you can output the
equivalent of
  if (x == 0)
     llvm.trap() // or maybe some special trap routine
  y = whatever/x
  ... use y ...
rather than the constant expression "whatever/x". If the optimizers can prove
that x is never zero then this will be sunk back into a constant expression (if
"whatever" is a constant expression).

Ciao, Duncan.

>
> Thanks,
> Cameron
>
> P.S. Is it your intention to hide the simple constant expression from the
> constant folder? I.e. (x == 0) ? 0 : x.

Cameron McInally

Apr 7, 2013, 1:24:59 PM
to Jeffrey Yasskin, llv...@cs.uiuc.edu, Joshua Cranmer
Hey Jeffrey,

Thanks for the suggestion. A few comments...

On Sun, Apr 7, 2013 at 12:31 PM, Jeffrey Yasskin <jyas...@googlers.com> wrote:
...
If you can find a way to implement -fsanitize=undefined to use the FP
trap instead of a branch when checking floating division by 0, I
suspect such a patch would stand a good chance of being accepted.
(Although I'm not intimately familiar with the implementation, so
there could be some subtlety that would prevent it.)

This would work well for the constant expression division case, since the division by zero expression would be folded away, leaving only a call to trap. The performance penalty for using -fsanitize on other, non-constant division expressions is still unclear to me, and concerning.

Although, I've been contemplating x86-64's behaviour for this case when floating point traps are disabled. Ideally, the compiler should preserve that behaviour, which might make this software implementation messy. Especially if different processors have different implementations. The simplest solution... let the hardware behave as it should.
 
You might need to
do this in the processor-specific backend to avoid other
undefined-behavior-based optimizations—that is, recognize "if (x == 0)
goto err_handler; else y/x;" and replace it with
"register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".

I believe that the constant folder would remove the constant division by zero and conditional before the backend could have its say. We would be left with only the jump to the error handler. That may complicate things.

Please correct me if I am misunderstanding.

Thanks again,
Cameron

Jeffrey Yasskin

Apr 7, 2013, 1:39:48 PM
to cameron....@nyu.edu, llv...@cs.uiuc.edu, Joshua Cranmer
On Sun, Apr 7, 2013 at 7:24 PM, Cameron McInally
<cameron....@nyu.edu> wrote:
> Hey Jeffrey,
>
> Thanks for the suggestion. A few comments...
>
> On Sun, Apr 7, 2013 at 12:31 PM, Jeffrey Yasskin <jyas...@googlers.com>
> wrote:
> ...
>>
>> If you can find a way to implement -fsanitize=undefined to use the FP
>> trap instead of a branch when checking floating division by 0, I
>> suspect such a patch would stand a good chance of being accepted.
>> (Although I'm not intimately familiar with the implementation, so
>> there could be some subtlety that would prevent it.)
>
>
> This would work well for the constant expression division case, since the
> division by zero expression would be folded away leaving only a call to
> trap. The performance penalty for using -fsanitize on other non-constant
> division expressions is still unclear to me; and concerning.

If the performance penalty is unclear to you, that means you haven't
measured it. Until you measure, you have absolutely no business
complaining about a potential performance problem. Measure, and then
come back with numbers.

> Although, I've been contemplating x86-64's behaviour for this case when
> floating point traps are disabled. Ideally, the compiler should preserve
> that behaviour, which might make this software implementation messy.
> Especially if different processors have different implementations. The
> simplest solution... let the hardware behave as it should.

To be clear, you're asking to turn off a set of optimizations. That
is, you're asking to make code in general run slower, so that you can
get a particular behavior on some CPUs in unusual cases.

>> You might need to
>> do this in the processor-specific backend to avoid other
>> undefined-behavior-based optimizations—that is, recognize "if (x == 0)
>> goto err_handler; else y/x;" and replace it with
>> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".
>
>
> I believe that the constant folder would remove the constant division by
> zero and conditional before the backend could have its say. We would be left
> with only the jump to the error handler. That may complicate things.

If the compiler can prove x==0, then yes, you'd be left with just a
jump to the error handler. That's more efficient than handling a
hardware trap, so it's what you ought to want. In cases of
non-constant division by zero, the branch wouldn't usually get
removed, and the backend would get a chance to optimize it.

Cameron McInally

Apr 7, 2013, 1:42:11 PM
to Duncan Sands, llv...@cs.uiuc.edu
On Sun, Apr 7, 2013 at 1:23 PM, Duncan Sands <bald...@free.fr> wrote:
... 
If you want a trap you are going to have to (IMO) output IR

instructions rather than a constant expression.  For example you can output the
equivalent of
  if (x == 0)
     llvm.trap() // or maybe some special trap routine
  y = whatever/x
  ... use y ...
rather than the constant expression "whatever/x".  If the optimizers can prove
that x is never zero then this will be sunk back into a constant expression (if
"whatever" is a constant expression).

Ah, that's super interesting. I'll have to give it some more thought. Thanks for this, Duncan.

My knee-jerk reaction is that emulating hardware behaviour in software should be avoided. At least in this specific case.

Thanks again,
Cameron

Cameron McInally

Apr 7, 2013, 2:50:53 PM
to Jeffrey Yasskin, llv...@cs.uiuc.edu, Joshua Cranmer
Hey again,

Thank you for your opinions. I will take them into consideration. A few comments...

On Sun, Apr 7, 2013 at 1:39 PM, Jeffrey Yasskin <jyas...@google.com> wrote:
...

If the performance penalty is unclear to you, that means you haven't
measured it. Until you measure, you have absolutely no business
complaining about a potential performance problem. Measure, and then
come back with numbers.

Unfortunately, I am restricted from publicly sharing performance results without going through an extensive, expensive legal process. Not fun! 

Some thoughts though...

In order to test the performance of this Clang feature, I would have to build it into my frontend. That's not cost effective for me for the following reason.

It seems to me, a priori, that the code currently generated by Clang would indeed have a performance penalty on an in-order processor without branch prediction. Take Xeon Phi, for example. Albeit a small penalty. Please correct me if my assumptions are incorrect.
 
Our team's culture dictates that "an instruction is an instruction", hence a performance problem. I understand that "performance problem" will have different definitions among different tribes. 


> Although, I've been contemplating x86-64's behaviour for this case when
> floating point traps are disabled. Ideally, the compiler should preserve
> that behaviour, which might make this software implementation messy.
> Especially if different processors have different implementations. The
> simplest solution... let the hardware behave as it should.

To be clear, you're asking to turn off a set of optimizations. That
is, you're asking to make code in general run slower, so that you can
get a particular behavior on some CPUs in unusual cases.

I respectfully disagree. I am asking for an *option* to turn off a set of optimizations; not turn off optimizations in general. I would like to make it easy for a compiler implementor to choose the desired behaviour. I whole-heartedly believe that both behaviours (undefined and trap) have merit.

To digress in the interest of light-heartedness, this reminds me of the old joke "my program's performance improved 20x!, but the results aren't correct". :)
 
>> You might need to
>> do this in the processor-specific backend to avoid other
>> undefined-behavior-based optimizations—that is, recognize "if (x == 0)
>> goto err_handler; else y/x;" and replace it with
>> "register-pc-in-fp-handler-map(); turn-on-fp-traps(); y/x;".
>
>
> I believe that the constant folder would remove the constant division by
> zero and conditional before the backend could have its say. We would be left
> with only the jump to the error handler. That may complicate things.

If the compiler can prove x==0, then yes, you'd be left with just a
jump to the error handler. That's more efficient than handling a
hardware trap, so it's what you ought to want. 

I would like a trap. I.e. x86-64's expected behaviour. 

I would also not like a branch on non-constant integer divisions. As a reminder, this discussion originated in the constant folder. The non-constant behaviour works just fine.

Thanks again,
Cameron

Chandler Carruth

Apr 7, 2013, 6:23:04 PM
to cameron....@nyu.edu, Joshua Cranmer, Jeffrey Yasskin, llv...@cs.uiuc.edu
I think this entire conversation is going a bit off the rails. Let's try to stay focused on the specific request, and why there (may) be problems with it.

On Sun, Apr 7, 2013 at 11:50 AM, Cameron McInally <cameron....@nyu.edu> wrote:
To be clear, you're asking to turn off a set of optimizations. That
is, you're asking to make code in general run slower, so that you can
get a particular behavior on some CPUs in unusual cases.

I respectfully disagree. I am asking for an *option* to turn off a set of optimizations; not turn off optimizations in general. I would like to make it easy for a compiler implementor to choose the desired behaviour. I whole-heartedly believe that both behaviours (undefined and trap) have merit.

I think you're both misconstruing what this would involve.

You're actually asking for the formal model of the LLVM IR to be *parameterized*. In one mode, an instruction would produce undefined behavior on division, and in another mode it would produce a trap. Then you are asking for the optimizer stack to support either semantic model, and produce efficient code regardless.

This is completely intractable for LLVM to support. It would make both the optimizers and the developers of LLVM crazy to have deep parameterization of the fundamental semantic model for integer division.

The correct way to support *exactly* reproducing the architectural peculiarities of the x86-64 integer divide instruction is to add a target-specific intrinsic which does this. It will have defined behavior (of trapping in some cases) as you want, and you can emit this in your FE easily. However, even this has the risk of incurring a high maintenance burden. If you want much in the way of optimizations of this intrinsic, you'll have to go through the optimizer and teach each pass about your intrinsic. Some of these will be easy, but some will be hard and there will be a *lot* of them. =/


Cameron, you (and others interested) will certainly need to provide all of the patches and work to support this if you think this is an important use case, as the existing developers have found other trade-offs and solutions. And even then, if it requires really substantial changes to the optimizer, I'm not sure it's worth pursuing this in LLVM. My primary concerns are two-fold. First, I think that the amount of work required to recover the optimizations which could theoretically apply to both of these operations will be massive. Second, I fear that after having done this work, you will immediately find the need to remove some other undefined behavior from the IR which happens to have defined behavior on x86-64.


Fundamentally, the idea of undefined behavior is at the core of the design of LLVM's optimizers. It is leveraged everywhere, and without it many algorithms that are fast would become slow, transformations that are cheap would become expensive, passes that operate locally would be forced to operate across ever growing scopes in order to be certain the optimizations applied in this specific case. Trying to remove undefined behavior from LLVM seems unlikely to be a productive pursuit.

More productive (IMO) is to emit explicit guards against the undefined behavior in your language, much as -fsanitize does for undefined behavior in C++. Then work to build a mode where a specific target can take advantage of target specific trapping behaviors to emit these guards more efficiently. This will allow LLVM's optimizers to continue to function in the world they were designed for, and with a set of rules that we know how to build efficient optimizers around, and your source programs can operate in a world with checked behavior rather than undefined behavior. As a useful side-effect, you can defer the target-specific optimizations until you have benchmarks (internally is fine!) and can demonstrate the performance problems (if any).

Cameron, you may disagree, but honestly if you were to convince folks here I think it would have happened already. I'm not likely to continue the theoretical debate about whether LLVM's stance on UB (as I've described above) is a "good" or "bad" stance. Not that I wouldn't enjoy the debate (especially at a bar some time), but because I fear it isn't a productive way to spend the time of folks on this list. So let's try to stick to the technical discussion of strategies, costs, and tradeoffs.

Cameron McInally

Apr 8, 2013, 12:28:48 AM
to Chandler Carruth, Joshua Cranmer, Jeffrey Yasskin, llv...@cs.uiuc.edu
Well put, Chandler!

On Sun, Apr 7, 2013 at 6:23 PM, Chandler Carruth <chan...@google.com> wrote:
I think this entire conversation is going a bit off the rails. Let's try to stay focused on the specific request, and why there (may) be problems with it.

On Sun, Apr 7, 2013 at 11:50 AM, Cameron McInally <cameron....@nyu.edu> wrote:
To be clear, you're asking to turn off a set of optimizations. That
is, you're asking to make code in general run slower, so that you can
get a particular behavior on some CPUs in unusual cases.

I respectfully disagree. I am asking for an *option* to turn off a set of optimizations; not turn off optimizations in general. I would like to make it easy for a compiler implementor to choose the desired behaviour. I whole-heartedly believe that both behaviours (undefined and trap) have merit.

I think you're both misconstruing what this would involve.

You're actually asking for the formal model of the LLVM IR to be *parameterized*. In one mode, an instruction would produce undefined behavior on division, and in another mode it would produce a trap. Then you are asking for the optimizer stack to support either semantic model, and produce efficient code regardless.

This is completely intractable for LLVM to support. It would make both the optimizers and the developers of LLVM crazy to have deep parameterization of the fundamental semantic model for integer division.

The correct way to support *exactly* reproducing the architectural peculiarities of the x86-64 integer divide instruction is to add a target-specific intrinsic which does this. It will have defined behavior (of trapping in some cases) as you want, and you can emit this in your FE easily. However, even this has the risk of incurring a high maintenance burden. If you want much in the way of optimizations of this intrinsic, you'll have to go through the optimizer and teach each pass about your intrinsic. Some of these will be easy, but some will be hard and there will be a *lot* of them. =/


Cameron, you (and others interested) will certainly need to provide all of the patches and work to support this if you think this is an important use case, as the existing developers have found other trade-offs and solutions. And even then, if it requires really substantial changes to the optimizer, I'm not sure it's worth pursuing this in LLVM. My primary concerns are two-fold. First, I think that the amount of work required to recover the optimizations which could theoretically apply to both of these operations will be massive. Second, I fear that after having done this work, you will immediately find the need to remove some other undefined behavior from the IR which happens to have defined behavior on x86-64.

Alas, I must have been shortsighted. For my purposes, I had envisioned using this target-specific intrinsic only when undefined behaviour was imminent. That information is available before the IR is generated, and it would work around the constant folder. I did not anticipate needing optimizations around that intrinsic, since it would ultimately trap.

Supporting the intrinsic as a proper alternative to the integer division operator(s) sounds like a lot of work. I do not believe that the reward is worth the effort, at least for my purposes. Others may feel different.
 
Fundamentally, the idea of undefined behavior is at the core of the design of LLVM's optimizers. It is leveraged everywhere, and without it many algorithms that are fast would become slow, transformations that are cheap would become expensive, passes that operate locally would be forced to operate across ever growing scopes in order to be certain the optimizations applied in this specific case. Trying to remove undefined behavior from LLVM seems unlikely to be a productive pursuit.

Fair enough.
 
More productive (IMO) is to emit explicit guards against the undefined behavior in your language, much as -fsanitize does for undefined behavior in C++. Then work to build a mode where a specific target can take advantage of target specific trapping behaviors to emit these guards more efficiently. This will allow LLVM's optimizers to continue to function in the world they were designed for, and with a set of rules that we know how to build efficient optimizers around, and your source programs can operate in a world with checked behavior rather than undefined behavior. As a useful side-effect, you can defer the target-specific optimizations until you have benchmarks (internally is fine!) and can demonstrate the performance problems (if any).

Regrettably, this implementation does not suit my needs. The constant folding would still occur and I would like to produce the actual division, since the instruction is non-maskable on x86. Others may have a better use for this implementation though, so I don't want to shoot the idea down for everyone.
 
Cameron, you may disagree, but honestly if you were to convince folks here I think it would have happened already. I'm not likely to continue the theoretical debate about whether LLVM's stance on UB (as I've described above) is a "good" or "bad" stance. Not that I wouldn't enjoy the debate (especially at a bar some time), but because I fear it isn't a productive way to spend the time of folks on this list. So let's try to stick to the technical discussion of strategies, costs, and tradeoffs.

Oh, no. Your analysis was thorough and I can sympathize with it. The seams of C/C++ and the x86 architecture are foggy. I understand that my interpretation of their interactions is not gospel.

Thanks again for the thoughtful reply!

-Cameron

Chandler Carruth

Apr 8, 2013, 1:34:45 AM
to cameron....@nyu.edu, Joshua Cranmer, Jeffrey Yasskin, llv...@cs.uiuc.edu
On Sun, Apr 7, 2013 at 9:28 PM, Cameron McInally <cameron....@nyu.edu> wrote:
More productive (IMO) is to emit explicit guards against the undefined behavior in your language, much as -fsanitize does for undefined behavior in C++. Then work to build a mode where a specific target can take advantage of target specific trapping behaviors to emit these guards more efficiently. This will allow LLVM's optimizers to continue to function in the world they were designed for, and with a set of rules that we know how to build efficient optimizers around, and your source programs can operate in a world with checked behavior rather than undefined behavior. As a useful side-effect, you can defer the target-specific optimizations until you have benchmarks (internally is fine!) and can demonstrate the performance problems (if any).

Regrettably, this implementation does not suit my needs.

I'm curious why you think so... I think it would very closely match your needs:
 
The constant folding would still occur

No? At least, I don't see why.

I don't know what your frontend is doing (but my impression is that it is not Clang? If I'm wrong, let me know...) but the common idiom is to emit nothing as a constant other than immediate values, and let LLVM's optimizers figure it out. A consequence of this pattern combined with inserting guards is that (at most) the optimizer will fold directly to the trap. You should be able to control the exact structure in the FE though.

and I would like to produce the actual division, since the instruction is non-maskable on x86.

Yes, and this is the point I was driving at with the comments about a target specific optimization. It is reasonable for the x86 backend to recognize the pattern of testing for a zero divisor and trapping if it occurs, and transform that into an unconditional divide relying on the hardware trap instead. I think it is not unreasonable to have this strategy result in one of two generated code patterns in the overwhelming majority of cases:

1) An unconditional trap because the optimizer proved divide by zero
2) A direct divide instruction which relies on the x86 trapping behavior.

That said, it remains a fairly significant amount of work to arrive at this place. Just trying to point out a reasonable approach to arriving there.

Duncan Sands

Apr 8, 2013, 3:39:32 AM
to cameron....@nyu.edu, llv...@cs.uiuc.edu
Hi Cameron,
I reckon it shouldn't be too hard to teach the code generators to turn this IR
sequence into "y = target-divide whatever/x" on targets for which dividing by 0
traps in a satisfactory way, so it turns into something efficient.

Ciao, Duncan.

James Courtier-Dutton

Apr 8, 2013, 5:31:28 AM
to Duncan Sands, llv...@cs.uiuc.edu


On Apr 8, 2013 8:42 AM, "Duncan Sands" <bald...@free.fr> wrote:
>
> Hi Cameron,
>
>
> On 07/04/13 19:42, Cameron McInally wrote:
>>
>> On Sun, Apr 7, 2013 at 1:23 PM, Duncan Sands <bald...@free.fr
>> <mailto:bald...@free.fr>> wrote:
>> ...
>>
>>     If you want a trap you are going to have to (IMO) output IR
>>     instructions rather than a constant expression.  For example you can output the
>>     equivalent of
>>        if (x == 0)
>>           llvm.trap() // or maybe some special trap routine
>>        y = whatever/x
>>        ... use y ...
>>     rather than the constant expression "whatever/x".  If the optimizers can prove
>>     that x is never zero then this will be sunk back into a constant expression (if
>>     "whatever" is a constant expression).
>>
>>
>> Ah, that's super interesting. I'll have to give it some more thought. Thanks for
>> this, Duncan.
>>
>> My knee-jerk reaction is that emulating hardware behaviour in software should be
>> avoided. At least in this specific case.
>
>
> I reckon it shouldn't be too hard to teach the code generators to turn this IR
> sequence into "y = target-divide whatever/x" on targets for which dividing by 0
> traps in a satisfactory way, so it turns into something efficient.
>

If the application is safety related, then the if (x == 0) should be there in the source code anyway.
If the backend is able to optimize it into the bare div instruction at target lowering, then it can do so there.
If a target CPU does not trap on the div, then target lowering would at least maintain the safety check at the expense of an extra branch.

James.

Cameron McInally

Apr 8, 2013, 11:45:16 AM
to Chandler Carruth, Joshua Cranmer, Jeffrey Yasskin, llv...@cs.uiuc.edu
On Mon, Apr 8, 2013 at 1:34 AM, Chandler Carruth <chan...@google.com> wrote:
On Sun, Apr 7, 2013 at 9:28 PM, Cameron McInally <cameron....@nyu.edu> wrote:
More productive (IMO) is to emit explicit guards against the undefined behavior in your language, much as -fsanitize does for undefined behavior in C++. Then work to build a mode where a specific target can take advantage of target specific trapping behaviors to emit these guards more efficiently. This will allow LLVM's optimizers to continue to function in the world they were designed for, and with a set of rules that we know how to build efficient optimizers around, and your source programs can operate in a world with checked behavior rather than undefined behavior. As a useful side-effect, you can defer the target-specific optimizations until you have benchmarks (internally is fine!) and can demonstrate the performance problems (if any).

Regrettably, this implementation does not suit my needs.

I'm curious why you think so... I think it would very closely match your needs:

It's my current understanding that the actual division would be constant folded away and I would be left with only the guard. Later, the guard would be proved always true. I could, of course, be mistaken.

I checked out Clang with -fsanitize=integer-divide-by-zero this weekend and saw something similar in the assembly. I also noticed that the division operands are still around in the assembly. If it were possible to recreate the div instructions, I would be thrilled.

 
 
The constant folding would still occur

No? At least, I don't see why.

I don't know what your frontend is doing (but my impression is that it is not Clang? If I'm wrong, let me know...)

It is not Clang. Our compiler has proprietary frontends, optimizer, vectorizer, and more. You may have noticed that I don't often share LLVM IR on the list. Our IR has tons of proprietary changes to it. But, I digress.

 
but the common idiom is to emit nothing as a constant other than immediate values, and let LLVM's optimizers figure it out.

Our pre-LLVM optimizer may be a culprit here. With the original test case I provided, our inliner will inline both foo(...) and bar(...). In main(), we end up calling CreateSDiv in the IRBuilder with two constants, i.e. (6/0).

Would Clang snap these constants into temporaries? Or, does Clang maintain the original source structure? Sorry for these simple questions, I have no insight to Clang at all.

 
A consequence of this pattern combined with inserting guards is that (at most) the optimizer will fold directly to the trap. You should be able to control the exact structure in the FE though. 

and I would like to produce the actual division, since the instruction is non-maskable on x86.

Yes, and this is the point I was driving at with the comments about a target specific optimization. It is reasonable for the x86 backend to recognize the pattern of testing for a zero divisor and trapping if it occurs, and transform that into an unconditional divide relying on the hardware trap instead. I think it is not unreasonable to have this strategy result in one of two generated code patterns in the overwhelming majority of cases:

I'm going to be nitpicky over this. It's really not my intention to be unreasonable. I hope you can understand.
 
1) An unconditional trap because the optimizer proved divide by zero

This isn't desirable for me. I am afraid an unconditional trap in the assembly will just confuse a user. I suspect that a user will trigger this code and immediately report a compiler bug that we've inserted a rogue trap. 

 
2) A direct divide instruction which relies on the x86 trapping behavior.

This would be great. Would I be able to accurately recreate the actual constant divide that was folded away? Originally, I had suspected that the operands would be long gone or at least intractable. Again, this may sound unreasonable, but I would like to keep the division intact so that a user can see where their code went wrong in the assembly. Sorry in advance if I'm misunderstanding.

Thanks again, Chandler.

-Cameron 

Cameron McInally

Apr 8, 2013, 12:01:48 PM
to Duncan Sands, llv...@cs.uiuc.edu
Hey Duncan,

On Mon, Apr 8, 2013 at 3:39 AM, Duncan Sands <bald...@free.fr> wrote:
... 
I reckon it shouldn't be too hard to teach the code generators to turn this IR

sequence into "y = target-divide whatever/x" on targets for which dividing by 0
traps in a satisfactory way, so it turns into something efficient.

I was just writing Chandler about a similar implementation. With my current understanding of the problem, my constant division will end up as either a trap call or a load of an undefined value in the IR, depending on what the FE chooses to do. It's not clear to me, at least with my current knowledge of LLVM, how to recreate the actual division instruction in the target backend from that information (trap or undefined value). Hopefully there is a way. ;)

Thanks again, Duncan.

-Cameron

Cameron McInally

Apr 8, 2013, 2:07:48 PM
to Duncan Sands, llv...@cs.uiuc.edu
Hey again,

I have a nagging thought that my past comments may not have enough meat to them. So...

On Mon, Apr 8, 2013 at 12:01 PM, Cameron McInally <cameron....@nyu.edu> wrote:
... 
I was just writing Chandler about a similar implementation. With my current understanding of the problem, my constant division will end up as either a trap call or a load of an undefined value in the IR, depending on what the FE chooses to do. It's not clear to me, at least with my current knowledge of LLVM, how to recreate the actual division instruction in the target backend from that information (trap or undefined value). Hopefully there is a way. ;)

I've tried my little test case with an unmodified Clang and -fsanitize=integer-divide-by-zero. After LLVM runs the Function Integration/Inlining Pass, I end up with this IR:

*** IR Dump Before Deduce function attributes ***
define i32 @main() #0 {
entry:
  call void @__ubsan_handle_divrem_overflow(i8* bitcast ({ { i8*, i32, i32 }, { i16, i16, [6 x i8] }* }* @2 to i8*), i64 6, i64 0) #3
  %call1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i64 0, i64 0), i32 undef) #3
  ret i32 0

The problem I'm facing is how to safely convert this __ubsan_handle_divrem_overflow call back to the actual integer division in the target backend. Is it possible and I'm missing it? Any insight would be warmly welcomed.

On a related note, this investigation into Clang has proven insightful. My compiler's usage of LLVM differs from Clang's in many ways. So, it could be that the issues I face are peculiar to my compiler. Without delving into details, my input LLVM IR is heavily optimized.

-Cameron

Cameron McInally

Apr 8, 2013, 2:37:37 PM
to Duncan Sands, llv...@cs.uiuc.edu
On Mon, Apr 8, 2013 at 2:07 PM, Cameron McInally <cameron....@nyu.edu> wrote:
... 

*** IR Dump Before Deduce function attributes ***
define i32 @main() #0 {
entry:
  call void @__ubsan_handle_divrem_overflow(i8* bitcast ({ { i8*, i32, i32 }, { i16, i16, [6 x i8] }* }* @2 to i8*), i64 6, i64 0) #3
  %call1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i64 0, i64 0), i32 undef) #3
  ret i32 0

On second thought, I do see the idiv operands. So that will work!

I suppose I just have to disambiguate between div and mod for this to be complete. That could be done at my interface.

Thanks for the help, guys.

-Cameron 

d...@cray.com

Apr 16, 2013, 12:35:58 PM
to Owen Anderson, Joshua Cranmer, llv...@cs.uiuc.edu
Owen Anderson <resi...@mac.com> writes:

> As I've pointed out, clang does provide such functionality as an
> opt-in feature through its -fsanitize options.  A hypothetical Fortran
> frontend could do the same, and even make it an opt-out feature if it
> chose.  I'm sorry if its implementation mechanism doesn't match
> exactly what you want it to be, but it's not like nobody else has
> thought about this problem.  They have, and they've designed and
> shipped a solution!

The problem with a clang implementation is that not everyone uses clang.
It sure would be nice to have an option that could be set to not
constant fold traps away. Then it would be easy for implementors to
just set that option if they want.

This is an optimization/codegen issue so it seems to me the fix ought to
be there, not in the frontend.

Is there great harm in adding an option here?

-David

d...@cray.com

Apr 16, 2013, 12:41:14 PM
to Joe Groff, llv...@cs.uiuc.edu
Joe Groff <arc...@gmail.com> writes:

> Constant folding undefined expressions is sort of silly, but I
> appreciate that it makes undefined behavior problems in frontends
> immediately apparent with trivial cases before they creep up on you in
> more complicated optimized code. After all, even if the backend makes
> practical concessions to trivial cases, the underlying semantic
> problem is still there and will bite you eventually.

Not necessarily. It's not uncommon for Fortran programmers to do
something like this (C-tran pseudo code):

if (error) then
   temp = x / 0 ; Force a trap to get into the debugger
end if

The user is relying on trapping behavior to help debug code. There is
no bad behavior to bite one later. Optimizing traps away is unfriendly
in cases like this.

> For high-level languages like Julia that want to provide efficiency
> but also give defined behavior to all user-exposed cases, I think
> adding an LLVM intrinsic to represent division that's defined to trap
> on division by zero or overflow would be the best solution. Since the
> trap is a side effect, it would stifle optimization of code that used
> the intrinsic, but the intrinsic could still be lowered to a single
> hardware instruction and avoid the branching and code bloat of manual
> checks on hardware where division by zero natively traps.

What about the case where other optimization exposes a constant divide
by zero later on? There is no way for a frontend to know to emit the
intrinsic _a priori_. We don't want to force all divides to use an
intrinsic because it will kill optimization.

I think to handle the general case we will need a flag to preserve
traps.

-David

d...@cray.com

Apr 16, 2013, 12:49:31 PM
to Jeffrey Yasskin, Joshua Cranmer, llv...@cs.uiuc.edu
Jeffrey Yasskin <jyas...@google.com> writes:

>> This would work well for the constant expression division case, since the
>> division by zero expression would be folded away leaving only a call to
>> trap. The performance penalty for using -fsanitize on other non-constant
>> division expressions is still unclear to me; and concerning.
>
> If the performance penalty is unclear to you, that means you haven't
> measured it. Until you measure, you have absolutely no business
> complaining about a potential performance problem. Measure, and then
> come back with numbers.

Oh come on. He's expressing concern, not saying it's a definite problem
or complaining.

>> Although, I've been contemplating x86-64's behaviour for this case when
>> floating point traps are disabled. Ideally, the compiler should preserve
>> that behaviour, which might make this software implementation messy.
>> Especially if different processors have different implementations. The
>> simplest solution... let the hardware behave as it should.
>
> To be clear, you're asking to turn off a set of optimizations. That
> is, you're asking to make code in general run slower, so that you can
> get a particular behavior on some CPUs in unusual cases.

No. He's asking to turn off a particular optimization case for a
certain _known_ constant expression. Furthermore, he's asking for an
option, not to blindly turn it off for everyone.

> If the compiler can prove x==0, then yes, you'd be left with just a
> jump to the error handler. That's more efficient than handling a
> hardware trap, so it's what you ought to want.

Not necessarily. See my example of using traps to aid debugging. A
jump to a handler would disrupt machine state. Perhaps not greatly but
it is an extra complication users don't expect.

-David

d...@cray.com

Apr 16, 2013, 12:58:42 PM
to Chandler Carruth, Joshua Cranmer, Jeffrey Yasskin, llv...@cs.uiuc.edu
Chandler Carruth <chan...@google.com> writes:

> You're actually asking for the formal model of the LLVM IR to be
> *parameterized*. In one mode, an instruction would produce undefined
> behavior on division, and in another mode it would produce a trap.
> Then you are asking for the optimizer stack to support either semantic
> model, and produce efficient code regardless.
>
> This is completely intractable for LLVM to support. It would make both
> the optimizers and the developers of LLVM crazy to have deep
> parameterization of the fundamental semantic model for integer
> division.

You're making this *way* too complicated. We have plenty of examples of
options to turn off optimizations. Most compilers provide options to
preserve traps.

The division would still semantically have undefined behavior. The
implementation would simply make the behavior a trap instead of
returning some random garbage value. Sure, it *may* change how other
code is optimized but that is the choice the implementor makes when
choosing to use the option to preserve traps. LLVM developers not
concerned with preserving traps need not give it a second thought.

> The correct way to support *exactly* reproducing the architectural
> peculiarities of the x86-64 integer divide instruction is to add a
> target-specific intrinsic which does this. It will have defined
> behavior (of trapping in some cases) as you want, and you can emit
> this in your FE easily.

You can't do that if the FE doesn't see the constant expression.
Optimization may reveal it later.

> Cameron, you (and others interested) will certainly need to provide
> all of the patches and work to support this if you think this is an
> important use case, as the existing developers have found other
> trade-offs and solutions.

Certainly.

A software test+trap is theoretically possible and the target optimizer
could theoretically get rid of it, but I share Cameron's concern about
the work required to turn theory into reality. He's not asking to
redefine the LLVM IR. He's asking for an option to control the
implementation of that IR.

Preserving traps is a real-world need. LLVM itself doesn't currently
provide a way to do it. It seems like it should have one and not rely
on particular frontends. It's supposed to be an independent set of
libraries, right?

-David

d...@cray.com

Apr 16, 2013, 1:01:59 PM
to Chandler Carruth, Joshua Cranmer, Jeffrey Yasskin, llv...@cs.uiuc.edu
Chandler Carruth <chan...@google.com> writes:

> Yes, and this is the point I was driving at with the comments about a
> target specific optimization. It is reasonable for the x86 backend to
> recognize the pattern of testing for a zero divisor and trapping if it
> occurs, and transform that into an unconditional divide relying on the
> hardware trap instead. I think it is not unreasonable to have this
> strategy result in one of two generated code patterns in the
> overwhelming majority of cases:
>
> 1) An unconditional trap because the optimizer proved divide by zero
> 2) A direct divide instruction which relies on the x86 trapping
> behavior.
>
> That said, it remains a fairly significant amount of work to arrive at
> this place. Just trying to point out a reasonable approach to arriving
> there.

I agree that this is potentially a good approach. I also agree that it
is a significant amount of work.

Given infinite resources, I think your solution makes the most sense.
Given limited resources, I'm not sure. But Cameron has said he won't
push anything upstream that people aren't comfortable with.

-David