On 29/10/2019 17:41, Jakob Bohm wrote:
> On 29/10/2019 14:32, David Brown wrote:
>> On 28/10/2019 10:42, Jakob Bohm wrote:
>>
>>>
>>> For clarity, a number of existing compilers have the following known
>>> warning behavior:
>>>
>>> In places where empty statements are likely to be typos etc., they warn
>>> about an actual empty statement (such as a lone semicolon), but accept
>>> an empty block as an explicit request that nothing be done, without that
>>> warning.
>>>
>>> Typical examples include:
>>>
>>> if (x > 3) ;   // Warns about possible typo
>>>     puts("x is greater than 3");
>>> if (x > 3) {}  // No warning
>>>     puts("Hello world");
>>>
>>
>
> Note: The above was an example of actual good behavior by some actual
> current compilers, given as a counterexample to the proposal for a
> "nocode" keyword.
>
Yes, I agree - both that such compiler warnings are useful, and that
they show a "nocode" keyword is unnecessary.
Ah, okay. I think it would be difficult to specify these things
accurately enough for a document like that, but I now understand more
about what you mean. Perhaps it would be better expressed as a kind of
test suite - a set of example snippets that you expect "quality"
compilers to emit warnings on, plus other snippets that you /don't/ want
warnings on. People could rate compilers (or, more specifically,
compiler + flag combinations) on the percentage of matches on the tests.
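To make that concrete, a suite entry might look like this (the EXPECT
annotations, and the snippet itself, are purely hypothetical
illustrations of the format):

#include <stdio.h>

int main(void)
{
    int busy = 3;

    if (busy > 3) ;                /* EXPECT: warning (probable typo) */
        puts("busy is large");
    while (busy-- > 0) {}          /* EXPECT: no warning (deliberate empty loop) */
    return 0;
}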
>
>>>
>>> * Warn about empty controlled statements
>>> * Diagnostics should point to the exact code location and mention
>>> the items involved.
>>> * Diagnostics involving include files should state both the include
>>> file and where it was included.
>>> * Diagnostics involving macros should state both the location of the
>>> invocation and the specific location within the macro definition.
>>> * Diagnostics involving continuation lines should point to actual
>>> source file lines, not abstract decoded lines.
>>
>> These are all just compiler warnings.
>
> The last 4 also apply to error diagnostics. Over the years I have
> encountered a number of otherwise good compilers failing each of these
> rules, making it very difficult to identify what exactly triggered that
> "syntax error" diagnostic.
>
I am afraid you will continue to be disappointed, because you are asking
for a solution to an unsolvable problem. Compilers can get better at this - and in
my experience, they /are/ getting better - but they can never get it
fully right. If you write "int colour;" in one part of a file, and
later write "color = 2;" in another part, which line is the mistake?
Compilers - especially combined with a good IDE - can make it easier to
find the source of diagnostics, but they won't ever get it perfect.
>
>>
>>
>>> * Unless otherwise stated, basic types should expose whatever
>>> behavior
>>> above and beyond the language standard that is provided by the actual
>>> machine definition. (For example, unless otherwise stated, if the
>>> machine is one that is defined to handle integer overflow in a specific
>>> way, that behavior should be implementation defined to be that machine
>>> behavior, not "anything goes" undefined).
>>
>> This is a terrible idea in terms of portability of C code, and
>> optimisation of C code.
>>
>> If a compiler wants to provide additional documented semantics for
>> things that the C standards do not define, such as integer overflow
>> behaviour, then it can do so. Even better, compilers can provide
>> choices to control this behaviour (I would strongly prefer such choices
>> to be made in the code, such as by pragmas, rather than command-line
>> switches).
>
> The ability to choose different semantics for compiler specific reason
> is why I wrote "unless otherwise stated" twice.
>
Yes - but I strongly disagree with your wish to make these extra
semantics the default. Change it to be "compilers should have a way to
expose stronger semantics of integer arithmetic convenient to the
underlying machine", and I would be a lot happier. Even better would be
for compilers to provide types like "int_mod32_t" which would be a
32-bit signed integer with modular overflow behaviour. "int32_t" (and
"int") would continue to have undefined overflow behaviour. Then the
programmer would be stating exactly what they want.
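In the meantime, the nearest portable approximation is to route the
arithmetic through unsigned types, which the standard does define as
modular. A minimal sketch (the name "add_mod32" is mine; note that the
final conversion back to int32_t is only implementation-defined in
C11/C18, although every two's complement implementation defines it the
obvious way):

#include <stdint.h>

/* Wrapping 32-bit signed addition: unsigned arithmetic is defined to
   wrap modulo 2^32, so the detour gives modular behaviour. */
static inline int32_t add_mod32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}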
>>
>> But there are many disadvantages in having such extra behaviour - it
>> reduces optimisation opportunities, it reduces static error checking,
>> and it reduces run-time diagnostics and checks. And of course it
>> encourages code to be unnecessarily tied to specific platforms.
>>
>
> In practice, the vast majority of machines share some common semantics
> such as "2s complement ints" or "ints stored as straight little or big
> endian, nothing weird".
You are mixing up representation and operational semantics. Two's
complement storage is a matter of representation, and it is certainly the case that
for modern processors, two's complement with big or little endian
ordering and no padding is universal. (There are still systems in use
for which this does not apply, but they are unlikely to have compilers
that support future C standards.) Sizes are not as universal as some
people think - there are devices in use with 16-bit or 32-bit char, or
with 24-bit int.
I have nothing against simplifying the C standards to require two's
complement representation and ban padding bits (except in _Bool). C++20
has this, and it is a proposed addition to the next C standard.
However, representation is not overflow behaviour. There are very good
reasons for /not/ wanting two's complement overflow behaviour even when
the underlying cpu supports it.
(There are some operations, especially with shifts, where it would be
reasonable to tighten the behaviour definitions if the representation is
fixed with two's complement. And things like conversions from unsigned
to signed types could be given fully defined behaviour rather than
implementation defined behaviour.)
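For instance, the unsigned-to-signed conversion mentioned above is only
implementation-defined today, even though every two's complement
implementation gives the same answer:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t u = 4294967295u;   /* UINT32_MAX - out of range for int32_t */
    int32_t s = (int32_t)u;     /* implementation-defined in C11/C18 */
    printf("%d\n", (int)s);     /* prints -1 on two's complement targets */
    return 0;
}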
>
>> I can appreciate that some people want specific behaviour for integer
>> overflow - I am convinced that many of these do not appreciate the
>> negative consequences of that behaviour. And many people specifically
>> do not want such behaviour. So it certainly should not be mandated by
>> the standards.
>>
>
> I have yet to see anyone outside academia and compiler vendors actively
> wanting the bizarro optimizations found e.g. in some versions of gcc.
>
Well, that's changed now. I write code for small embedded systems,
mainly in C (with some C++), and I want signed overflow to be undefined
behaviour, and I want the compiler to optimise based on that knowledge.
I want my compiler to transform "x + 1 > 0" into "x >= 0". I want it to
transform "x * 20 / 5" into "x * 4". I want it to inform me if I've
written "int x = 1000000 * 1000000;". I want it to be able to run with
"-fsanitize=integer-overflow" and tell me when it sees a mistake in my code.
Overflowing your types is almost always an error in the code. (Not
always, but almost always. I am happy to go out of my way to deal with
cases where I actually want wrapping behaviour.) I want my compiler to
be able to tell me if it sees these errors, and I want it to be able to
optimise on the assumption that those errors don't occur.
> I have seen those that want explicit overflow semantics such as
> saturation to MAXINT/MININT (for the actual type),
Yes, sometimes you want saturation behaviour.
> or program
> termination with a diagnostic, as either ensures that out-of-range
> values do not cause unexpected malfunctions (think Ariane rocket
> incident).
Making integer overflow defined behaviour makes errors like this /more/
likely - they are still logical errors in the code, but now they are
allowed by the language, and the tools have no way to distinguish them
from intentional wrapping.
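For the saturation variant mentioned above, the key point is that the
range check must happen /before/ the operation, so the overflow itself
never occurs; a minimal sketch ("add_sat" is my name for it):

#include <limits.h>

/* Saturating signed addition: clamp to INT_MAX/INT_MIN rather than
   overflowing.  Neither comparison can itself overflow. */
static int add_sat(int a, int b)
{
    if (a > 0 && b > INT_MAX - a)
        return INT_MAX;
    if (a < 0 && b < INT_MIN - a)
        return INT_MIN;
    return a + b;
}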
>
> The point is to state that it is a horribly bad (even if compliant)
> compiler that does what some gcc versions reportedly did (optimizing
> away a loop condition because the loop would assign an out-of-range
> value to a variable which would not be read after assigning that invalid
> value).
>
Let's be clear here - the famous case of gcc optimising away a loop in
a SPEC benchmark was due to a bug in the SPEC code, and it happened with
a pre-release test version of gcc. Running such benchmark suites is part
of the testing done before new compiler versions are released, and the
whole thing was only publicised because it was interesting that a bug
had been found in the old SPEC code.
When you write code with undefined behaviour, you cannot expect the
compiler to guess what you meant to write. If you think that is the
case, C is not the language for you - it is that simple.
Having said that, compilers can always be better at informing the user
about such cases. They cannot tell you every time they optimise based on
the assumption that undefined behaviour does not occur - that would make
perfectly normal code cause a flood of warnings. But they can always improve.
>
>>
>>> * With very few clearly implementation defined cases, "undefined
>>> behavior" is not a license to introduce semantics that bear no
>>> resemblance to the source code. (A typical implementation defined case
>>> would be a function-like keyword that explicitly tells the compiler to
>>> ignore certain cases in its interpretation of the program). It can
>>> however be a license to arbitrarily choose between different meaningful
>>> interpretations, such as performing memory accesses in either of
>>> multiple permitted orders or optimizing an expression in ways that are
>>> not correct for invalid inputs, provided this does not trigger mistaken
>>> optimization assumptions outside the expression.
>>>
>>
>> Undefined behaviour has, by its definition, no meaningful behaviour. In
>> some cases it might be possible to guess what the programmer intended,
>> though this can be very difficult to do in a deterministic way in a
>> compiler - what is "obvious" to a human reader in a simple sample case
>> may be very much harder to spot in general cases in a tool.
>>
>
> Indeed, that is how the compiler community has chosen to read it, while
> historic C compilers (from around the time of the C89 definition) would
> interpret it as simply "not defined by the standard, do something sane
> and preferably documented".
No, compilers have never done that - it's a common myth and
misunderstanding. Lack of optimisations in earlier compilers was due to
more limited knowledge of compilers and optimisation techniques, and
more limited resources on hosts. And even 30 years ago, compilers /did/
optimise on the assumption that integer overflow did not occur - such as
strength-reduction of "x * 20 / 5" to "x * 4". And there are also many
cases where compilers have said "this is not defined by the standard,
but we choose to define it ourselves".
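You can see why that strength reduction needs the no-overflow assumption
by evaluating the same expressions with explicitly wrapping (unsigned)
arithmetic:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t x = UINT32_C(1) << 28;
    /* With wrapping, the two expressions differ, so "x * 20 / 5" may
       only be rewritten as "x * 4" when overflow cannot happen. */
    printf("%" PRIu32 "\n", x * 20 / 5);   /* prints 214748364 */
    printf("%" PRIu32 "\n", x * 4);        /* prints 1073741824 */
    return 0;
}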
>
>> It is reasonable to want a compiler that does not knowingly exacerbate
>> problems due to undefined behaviour - that is a quality of
>> implementation issue. But it is unreasonable (indeed, impossible) to
>> attempt to legislate this in general.
>>
>
> The somewhat obtuse misreadings of the standard by all major compiler
> vendors in recent years is at least reason to explicitly make such
> misreadings invalid.
Every compiler vendor I know of would want to hear about any misreadings
they have made. Have you considered that they in fact /have/ read the
standards correctly, rather than reading in additional ideas that are
not there? If you can point to the wording in the
published C standards (C90, C99, C11 or C18) showing where integer
overflow behaviour is defined to be wrapping two's complement behaviour
on two's complement cpus, then you would have a strong point.
There are many challenges involved in writing good, efficient, and
portable C code. One of them is that some code can be written either in
a way that is undefined behaviour - efficient on weaker compilers, but
liable to fail on stronger ones - or in a way that is fully defined -
efficient on strong compilers, but very inefficient on weaker ones.
Which way should you write such code? The answer is, of course, "it
depends".
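The classic instance of that dilemma is type punning, sketched here as
reading the bits of a float (the function names are mine):

#include <stdint.h>
#include <string.h>

/* Undefined behaviour (a strict-aliasing violation), but even a naive
   compiler emits a single load: */
uint32_t bits_by_cast(float f)
{
    return *(uint32_t *)&f;
}

/* Fully defined behaviour; a strong compiler optimises the memcpy away
   to the same single load, while a weak one may emit a real call: */
uint32_t bits_by_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}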
What you cannot reasonably do, however, is place artificial limits on
good, modern compilers in order to support older code with undefined
behaviours - older code that is not, in fact, correct C code at all.
The way modern compilers handle this is to provide switches to give
specific additional semantics that match certain behaviours of weaker
tools that might be assumed by code - classic examples in gcc being
"-fwrapv" to get wrapping integer overflow, and "-fno-strict-aliasing"
to disable type-based alias analysis. In addition, code effectively
gets these semantics by default, because the optimisations that exploit
undefined behaviour are disabled unless optimisation is explicitly
enabled.
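A small illustration of what such a switch changes:

/* Under the standard semantics a compiler may fold this function to
   "return 1"; under gcc's -fwrapv it must keep the comparison, since
   x + 1 wraps to INT_MIN when x == INT_MAX. */
int always_greater(int x)
{
    return x + 1 > x;
}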
In other words, these compilers go out of their way to support incorrect
(but reasonable at the time of writing) C code, and to support people
who want the C language to be defined in a different way. And yet
people like you /still/ complain!
>
>> There may be scope for changing certain /specific/ cases of undefined
>> behaviour in the standards into fully defined or implementation specific
>> behaviour. But it would be for specific cases only, not as a
>> generalisation.
>>
>
> The proposal is to change the default meaning of "undefined behavior" to
> "implementation specific" in all but a few clearly delineated cases.
That won't work.
First, /everything/ is undefined behaviour except in the cases where the
C standards explicitly define it. You could turn some specific
undefined behaviours into implementation defined behaviour, but you
cannot possibly expect implementations to specify and document
everything that is not written in the C standards. If your cat sits on
your keyboard, the behaviour is undefined by the C standards - do you
really expect compiler vendors to pick a specific deterministic and
documented behaviour for that event?
Secondly, undefined behaviour is a /good/ thing. It is /better/ in many
cases that things are undefined behaviour, rather than implementation
defined behaviour. I have explained above why I actively prefer integer
overflow to be undefined behaviour.
Thirdly, in a great many cases of undefined behaviour as listed in Annex
J.2 of the standard, there is no sensible choice of what implementation
defined behaviour should be.
And even in cases where you apparently think there is an "obvious"
choice for defining the behaviour, it is often not obvious at all, or
not the only option. For signed integer overflow, people might want two's
complement wrapping. But they might want saturation, or trapping, or
optimisations based on ignoring the possibility of the overflow.
I invite you to read through J.2 of C11 or C18 (you have read the C
standards that you criticise, and criticise compiler vendors for
misreading, haven't you?) and list the implementation-defined behaviour
you would like to see for your favourite cpu target.