
Re: undefined vs implementation-specified behavior


David Brown

Oct 17, 2017, 11:16:22 AM

On 17/10/17 16:56, Stefan Ram wrote:
> In several cases of UB, one also could imagine a
> hypothetical language, where the behavior would be merely
> implementation specified.
>
> For example, the division by 0 is undefined behavior, but it
> could as well have been defined to be implementation
> specified.
>
> What are the reasons one sometimes prefers undefined
> behavior to implementation-specified behavior?
>

If it is implementation-defined, then the compiler (more accurately, the
compiler authors) has to pick a specific behaviour and document it. If
it is undefined behaviour, then the compiler can produce different
results each time, and it can also assume that the effect never occurs
in order to get better optimisation.

Taking your example, if the specs said that "division by 0 has
implementation-defined behaviour", then the compiler would have to give
documented behaviour. This could be to cause a processor trap, to
return 0, to return +inf (for floating point), or to return x for x/0.
The compiler authors can pick whichever behaviour is most convenient
for the hardware, but it must be specified.

However, if the hardware has a "div" instruction that is defined as "div
rA, rB, rC" with behaviour "if rB != 0, then rC is set to rA/rB" then
there is a problem. When trying to divide by 0, the behaviour of the
cpu itself is undefined. Perhaps it traps, perhaps it returns a random
value in rC, perhaps it leaves rC untouched. Perhaps the behaviour
depends on cpu flags, or varies between versions of the processor
microcode.

The only way the compiler can generate code with definable behaviour is
to add an extra check for divide by 0 first.
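
In other words, on such a cpu the generated code for an
implementation-defined division would have to behave roughly like this
sketch (returning 0 here is an arbitrary choice - just one result the
implementation could choose to document):

// Sketch of what the compiler would effectively have to emit on such
// a cpu to give a documented result for division by zero.
int checked_div(int a, int b) {
    if (b == 0) {
        return 0;      // one possible documented, implementation-defined result
    }
    return a / b;      // otherwise maps directly onto the hardware "div"
}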


Also, when something is undefined behaviour, the compiler "knows" it
cannot happen. So given this:

#include <cstdio>
#include <cstdlib>

int foo(int x) {
    int y = 100000 / x;   // undefined behaviour if x == 0
    if (x == 0) {         // the compiler may assume this can never be true here
        printf("You are dividing by 0!\n");
        exit(1);
    }
    return y;
}

the compiler can omit the "if (x == 0)" clause altogether. (It is
different if the check comes /before/ the division, of course.)
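
For contrast, here is a sketch with the check moved before the
division (the name foo_checked is just for illustration). The compiler
must keep this test, because removing it would change the behaviour of
a perfectly well-defined call such as foo_checked(0):

#include <cstdio>
#include <cstdlib>

int foo_checked(int x) {
    if (x == 0) {             // checked before any division takes place
        printf("You are dividing by 0!\n");
        exit(1);
    }
    return 100000 / x;        // only reached when x != 0
}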

Some people think such optimisations are a good thing, others think they
are a bad thing - but they are clearly allowed by the way C and C++ work.


Finally, note that even though the standards may say something is
"undefined behaviour", other things - including compiler documentation,
or additional standards such as POSIX, or a particular target cpu or OS
- are free to define the behaviour. A compiler may, if its authors
desire, define the behaviour of division by 0.
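
For example, GCC and Clang provide a -fsanitize=integer-divide-by-zero
option which, while not "defining" the behaviour in the standard's
sense, at least turns an integer division by zero into a runtime
diagnostic rather than leaving it completely undefined:

// Sketch only - compile with something like
//   g++ -fsanitize=integer-divide-by-zero demo.cpp
// and the sanitizer reports the division by zero at run time instead
// of the program silently doing "anything".
#include <cstdio>

int main() {
    volatile int zero = 0;        // volatile, so the division really happens at run time
    printf("%d\n", 100 / zero);   // diagnosed here when the sanitizer is enabled
    return 0;
}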


Chris Vine

Oct 17, 2017, 12:05:18 PM

On 17 Oct 2017 14:56:58 GMT
r...@zedat.fu-berlin.de (Stefan Ram) wrote:
> In several cases of UB, one also could imagine a
> hypothetical language, where the behavior would be merely
> implementation specified.
>
> For example, the division by 0 is undefined behavior, but it
> could as well have been defined to be implementation
> specified.
>
> What are the reasons one sometimes prefers undefined
> behavior to implementation-specified behavior?

§3.12, 3.28 and 3.29 of the C++17 standard define three behaviors of a
program or parts of a program that cannot automatically be deduced from
the standard itself - implementation-defined behavior, undefined
behavior and unspecified behavior.

Unspecified behavior seems to me to be an oddity. It is similar to
implementation-defined behavior except that the implementation does not
have to document what the behavior is. A program with unspecified
behavior is still "well formed": unspecified behavior is said to be
"behavior, for a well-formed program construct and correct data, that
depends on the implementation", and a note suggests that "the range of
possible behaviors is usually delineated by this International
Standard". I do not fully understand the purpose of having well formed
code which is not required to make known how it will behave before it
is first run.
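
A classic instance of unspecified behavior is the order in which
function arguments are evaluated: the call is well formed and each
argument is evaluated exactly once, but which argument is evaluated
first is unspecified, and the implementation need not document (or even
be consistent about) its choice. A minimal sketch:

#include <cstdio>

int trace(int id) {
    printf("evaluating argument %d\n", id);   // side effect exposes the order
    return id;
}

int add(int a, int b) { return a + b; }

int main() {
    // Well formed, and the result is always 3, but whether "argument 1"
    // or "argument 2" is printed first is unspecified.
    printf("%d\n", add(trace(1), trace(2)));
    return 0;
}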

As to why undefined behavior is sometimes preferred to implementation-
defined behavior, I guess it is probably a combination of how useful it
would be to require a non-mandatory behavior to be documented, the
difficulty of implementing a particular behavior, and whether requiring
a specified behavior would cause inefficiency on some target platforms.

Probably, speaking to a compiler writer would give you more insight into
the choices made in the standard. Part of it may just be historical.

Chris

James R. Kuyper

Oct 17, 2017, 12:05:50 PM

On 2017-10-17 10:56, Stefan Ram wrote:
> In several cases of UB, one also could imagine a
> hypothetical language, where the behavior would be merely
> implementation specified.
>
> For example, the division by 0 is undefined behavior, but it
> could as well have been defined to be implementation
> specified.
>
> What are the reasons one sometimes prefers undefined
> behavior to implementation-specified behavior?

Whenever the behavior is unspecified, the standard is supposed to, at
least implicitly, provide a set of different possible behaviors to
choose from. The main reason for preferring undefined behaviour is when
the range of possible behaviors gets too big and complicated to describe
correctly.

Example: overwriting the end of an array. On most systems, there are
three main possibilities. Such an overwrite might end up writing to
protected memory, causing your program to abort(). That's not
particularly problematic to describe. An unlikely possibility is that
it will simply be a no-op, which is also easy to describe.

However, if memory protection is not triggered, such an attempt will
generally cause an arbitrary piece of memory to be filled in with a
representation of the value you provided. That piece of memory might
or might not be currently in use by some other part of your program,
or by some other program. However, that can't be used as the
description of the behavior. The description must be in terms of what
actually happens as a result of that write, and what happens depends
upon how that piece of memory is being used. For instance, it might
contain the code address that the current function will jump to when
it exits, in which case the function will jump to a location that
depends upon what you wrote to that piece of memory. Or it might
contain one of the other variables in your program, causing that
variable to unexpectedly change value. Or it might contain errno, or
argv[0]. Listing all of the possible consequences of writing to an
arbitrary location in memory is simply impossible. The easiest thing
that covers all of those possibilities is "undefined behavior".
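
A tiny sketch of that third case, purely for illustration - what
actually gets clobbered depends entirely on how the implementation
happens to lay out memory:

#include <cstdio>

int main() {
    int before = 111;
    int buf[4];
    int after = 222;

    // Undefined behavior: writes one element past the end of buf.
    // Depending on the layout the implementation chose, this might
    // overwrite 'before', 'after', a saved register, a return address,
    // or nothing that is ever observable.
    for (int i = 0; i <= 4; ++i) {
        buf[i] = 0;
    }

    printf("%d %d\n", before, after);   // any output (or none) is possible
    return 0;
}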


Chris Vine

Oct 17, 2017, 12:13:23 PM

On reflection I think that the question is in a sense the wrong way
around. Undefined behavior is behavior for which the standard imposes
no requirements. That will always be with us: not everything for which
the standard mandates no particular behavior can be made
implementation-specified. So I guess the question is rather why
implementation-defined behavior is sometimes preferred in the standard
to undefined behavior.

James R. Kuyper

Oct 17, 2017, 1:19:16 PM

When something is unspecified but not implementation-defined, that
serves as a warning that your code should be prepared to deal with all
of the permitted possibilities. For instance:

"All objects with static storage duration shall be initialized (set to
their initial values) before program startup. The manner and timing of
such initialization are otherwise unspecified." (5.1.2p1). That
statement serves to warn you that you should not write code which
depends upon that initialization occurring in a particular manner or timing.
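
A related C++ illustration (a sketch, not part of the passage quoted
above): the relative order of dynamic initialization of objects in
different translation units is likewise left open, so the following
depends on something the standard does not pin down:

// a.cpp (hypothetical file)
int compute_default() { return 42; }
int global_a = compute_default();   // dynamically initialized before main()

// b.cpp (hypothetical file)
extern int global_a;
int global_b = global_a + 1;        // may run before or after a.cpp's
                                    // initializer: global_b is either 1 or 43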

"If the return type [of main] is not compatible with int, the
termination status returned to the host environment is unspecified."
(5.1.2.2.3p1). This means that if an implementation chooses to support a
main() that doesn't return int, and if you write a program that takes
advantage of that fact, you should not have any particular expectations
about whether that function will return a successful or unsuccessful
termination status to the host environment. For instance, if you're
using a scripting system that stops executing command lines after the
first command line that returns an unsuccessful status, you should be
equally unsurprised if the script halts, or doesn't halt, after
executing your program. If that would be a problem for your program,
don't write it that way.

Chris Vine

Oct 17, 2017, 2:08:22 PM

Those are helpful examples, thank you. Doing a grep on the standard
brings up many similar ones.

The common thread seems to be that something reasonable will happen, but
it is not practicable, or not helpful, to say what. I see that these
are described elsewhere in the standard as "the nondeterministic
aspects of the abstract machine".
