Synopsis: If a human can point out a conformance problem with
some code, then a compiler SHOULD point out that
problem; possibly, if a human can answer a question
about the Standard, then a compiler should also be able
to answer that question.
Perhaps, there is a stricter statement: Every undefined
behavior is run-time behavior; at compile-time, there
is no such thing as undefined behavior. If this is true
(or often true), this draws a distinction that may be
helpful in identifying where future revisions of the
Standard must define behavior for what may currently
be considered an invocation of undefined behavior.
For instance, if you define a variable in namespace
`std', your compiler should at least warn that
you're invoking undefined behavior; or, maybe, that
behavior should be better defined by the Standard.
It is common for a programmer to treat a compiler as being the
ultimate Language Lawyer, someone who will gleefully peruse every
nook and cranny of a program in order to report even the minutest
contravention of the Standard. Alas, in practice, a compiler does
not fulfill this role, potentially misleading the programmer to
confidently write code that is non-portable or that invokes
undefined behavior, and that causes much gnashing of teeth.
Instead, the Standard should impose the following requirement:
Within some practical, standardized limits, if every compiler
necessarily has the answer to a question, then every compiler
must provide a standardized means by which to ask that question
and to receive the definite answer.
Howard E. Hinnant emphasized this very principle as far as it
applies to one particular case: What is the endianness of the
execution environment? The "ancient, time-honored tradition" of
asking this question is considered in Howard's proposal for
`std::endian':
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0463r1.html
https://howardhinnant.github.io/endian.html
In that proposal, Howard eloquently identifies the special
position of the compiler in providing the answer:
    There are many hacks that work most of the time. None of them
    seem bullet proof. Few give the answer as a compile-time
    constant. And in every single case:

        THE COMPILER KNOWS THE ANSWER!
I, for one, burst out laughing when I read that insight, because
it struck me that every compiler necessarily knows so many
answers, and yet keeps them quietly hidden away like the guarded
trade secrets of the guilds of yore.
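For concreteness, here is a minimal sketch of the kind of run-time hack that the quoted passage alludes to. It is not taken from Howard's proposal; the function name `is_little_endian_at_run_time' is hypothetical, and the sketch assumes that the platform provides `std::uint32_t'. Note that the answer is computed at run time, so it cannot be used as a compile-time constant in C++17:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (an assumption for illustration, not part of any
// proposal): copy the first byte of a known 32-bit value's object
// representation and check whether it is the least significant byte.
inline bool is_little_endian_at_run_time()
{
    const std::uint32_t probe = 0x01020304u;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);

    // Little-endian storage puts 0x04 first. Anything else (big-endian or
    // some mixed order) is reported as "not little-endian".
    return first_byte == 0x04;
}
```

The sketch cannot distinguish big-endian from mixed-endian layouts, which is one reason such hacks only "work most of the time".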
Indeed, Howard's proposal led me to my own question. You see, I
badly wanted to use `std::endian' in a program, but there is a
problem: My code is targeting C++17, and Howard's newfangled
feature is slated for support only in C++20.
No matter, I thought. I'll just "backport" it; I'll just include
in my code an extended version of Howard's sample implementation,
conditionally compiled to avoid a future conflict:
    #if __cplusplus >= 202000L
    // C++20 as published provides std::endian in <bit>
    // (P0463R1 had proposed <type_traits>).
    #include <bit>
    #else
    namespace std
    {
        enum class endian
        {
    #ifdef _WIN32
            little = 0,
            big    = 1,
            native = little
    #elif (defined __ORDER_LITTLE_ENDIAN__) && (defined __ORDER_BIG_ENDIAN__) && (defined __BYTE_ORDER__)
            little = __ORDER_LITTLE_ENDIAN__,
            big    = __ORDER_BIG_ENDIAN__,
            native = __BYTE_ORDER__
    #else
    #error "This platform has no implementation for C++20's `std::endian'."
    #endif
        };
    }
    #endif
If that code exists in the includable source file `compat.h',
then one might write the following program:
    #include <iostream>
    #include "compat.h"

    int main()
    {
        if (std::endian::native == std::endian::little)
            std::cout << "The execution environment is little-endian.\n";
    }
If that program exists in the source file `endian.cpp', then one
might compile it with GCC's `g++', and run it as follows:
    $ g++ -std=c++17 endian.cpp -o endian
    $ ./endian
    The execution environment is little-endian.
Great! The code works as expected. However... should one actually
expect the code to work? On second thought, it seems like a
suspicious intrusion of the Standard's sacred namespace. Well, by
default, C++ implementations tend to play fast and loose, or at
least they often provide nonstandard extensions; surely, this
question can be resolved by requesting the compiler to be a
little more unforgiving:
    $ be_unforgiving='-pedantic-errors -Wall -Wextra -Werror'
    $ g++ -std=c++17 $be_unforgiving endian.cpp -o endian
Huh. Hmmm. No problems. How about LLVM's compiler?
    $ clang++ -std=c++17 -Weverything -Wno-c++98-compat endian.cpp
Nothing. Sigh... Fine. Let's go where few have gone before and
consult the Standard. Consider section `[namespace.std]':
http://eel.is/c++draft/namespace.std (generated on 2018-04-15)
https://github.com/cplusplus/draft/blob/99325f3d8975075d27e40d8548919f70fc7824b8/source/lib-intro.tex#L2202
According to that, the future C++20 Standard will state something
similar to the following:
    Unless otherwise specified, the behavior of a C++ program is
    undefined if it adds declarations or definitions to namespace
    std or to a namespace within namespace std.
Finally! Clarity. Sort of. I mean, I'm not going to go looking
for where it might be "otherwise specified", so practicality
demands that I assume my usage of namespace `std' exists
[inexplicably] outside the purview of the Standard. That is, the
aforementioned program invokes undefined behavior, and I wish I
had never been able to compile it (without any warning); I feel
so betrayed; I feel as if the world is a house of cards built
atop a foundation of sand; I feel the need to gnash my teeth.
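For what it is worth, one well-known case that is "otherwise specified" is the specialization of a standard library class template for a program-defined type, `std::hash' being the usual example. The following sketch contrasts that permitted addition with the kind of brand-new declaration used in the backport; the type `Point' and the hash combination are hypothetical and exist only for illustration:

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical program-defined type, used only for illustration.
struct Point
{
    int x;
    int y;
};

inline bool operator==(const Point& a, const Point& b)
{
    return a.x == b.x && a.y == b.y;
}

namespace std
{
    // Permitted: an explicit specialization of a standard class template
    // that depends on a program-defined type.
    template <>
    struct hash<Point>
    {
        std::size_t operator()(const Point& p) const noexcept
        {
            // Simple illustrative combination; not a high-quality hash.
            return std::hash<int>{}(p.x) * 31u + std::hash<int>{}(p.y);
        }
    };

    // Not covered by any such exception: a brand-new declaration, such as
    // the backported enumeration. Left commented out on purpose.
    // enum class endian { little, big, native };
}
```

With the specialization in place, `std::unordered_set<Point>' works as usual, whereas nothing comparable appears to license a program-provided `std::endian'.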
With this new knowledge, there naturally percolates in the mind a
better-defined solution, namely to avoid naming namespace `std':
    #if __cplusplus >= 202000L
    // C++20 as published provides std::endian in <bit>
    // (P0463R1 had proposed <type_traits>).
    #include <bit>
    using std::endian;
    #else
    enum class endian
    {
    #ifdef _WIN32
        little = 0,
        big    = 1,
        native = little
    #elif (defined __ORDER_LITTLE_ENDIAN__) && (defined __ORDER_BIG_ENDIAN__) && (defined __BYTE_ORDER__)
        little = __ORDER_LITTLE_ENDIAN__,
        big    = __ORDER_BIG_ENDIAN__,
        native = __BYTE_ORDER__
    #else
    #error "This platform has no implementation for C++20's `std::endian'."
    #endif
    };
    #endif
Now, the program in question can be rewritten thus:
    #include <iostream>
    #include "compat.h"

    int main()
    {
        if (endian::native == endian::little)
            std::cout << "The execution environment is little-endian.\n";
    }
However, this solution is actually quite unsatisfying, because
I'd rather it be written in terms of namespace `std', and why
shouldn't it be? Why shouldn't I have a well-defined way by which
to backport fully-accepted library additions, even if only
incompletely? After all, that kind of dangerous access is
precisely what draws experts to C++; some low-level, unpalatable
hack can be hammered into place, and then covered with a safe,
compatible, clean abstraction for everyday use.
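One possible middle ground, sketched here under stated assumptions, is to hide the hack behind a project-owned namespace and to let a feature-test macro decide when the real thing is available. The namespace `compat' and the constant `is_little_endian' are hypothetical names; the header `<version>' and the macro `__cpp_lib_endian' come from C++20 as published, which places `std::endian' in `<bit>', and are not part of the discussion above:

```cpp
// Pull in the library's feature-test macros when the header exists.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_endian)
#  include <bit>   // Published C++20 provides std::endian here.
#endif

namespace compat   // Hypothetical project namespace; never namespace std.
{
#if defined(__cpp_lib_endian)
    using std::endian;
#elif defined(_WIN32)
    enum class endian
    {
        little = 0,
        big    = 1,
        native = little
    };
#elif defined(__ORDER_LITTLE_ENDIAN__) && defined(__ORDER_BIG_ENDIAN__) && defined(__BYTE_ORDER__)
    enum class endian
    {
        little = __ORDER_LITTLE_ENDIAN__,
        big    = __ORDER_BIG_ENDIAN__,
        native = __BYTE_ORDER__
    };
#else
#  error "This platform has no implementation for C++20's `std::endian'."
#endif
}

// The answer is a compile-time constant on either path.
constexpr bool is_little_endian =
    compat::endian::native == compat::endian::little;

static_assert(compat::endian::little != compat::endian::big,
              "the two byte orders must be distinguishable");
```

Call sites would then spell the name `compat::endian', which is still not `std::endian', but the behavior is defined and the switch to the standard facility happens without touching them.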
And, you'll note that I employed the term "a well-defined way"
rather than the term "a well-formed way". Consider the Standard's
section `[defns.well.formed]':
http://eel.is/c++draft/defns.well.formed
https://github.com/cplusplus/draft/blob/99325f3d8975075d27e40d8548919f70fc7824b8/source/intro.tex#L331
It states that a "well-formed program" is defined as a:
    C++ program constructed according to the syntax rules,
    diagnosable semantic rules, and the one-definition rule
Well, even though the initial version of the program has
undefined behavior, it still manages to tick all those boxes, and
is thus a well-formed program. Gah! I suppose that's the loophole
for the experts, but I must say it looks more like a noose.
Recall to mind the rule that riles:
    Unless otherwise specified, the behavior of a C++ program is
    undefined if it adds declarations or definitions to namespace
    std or to a namespace within namespace std.
By establishing "undefined behavior", that sentence in the
Standard not only allows, but even invites, an implementation to
utterly ignore that entire sentence; effectively, it is as though
that sentence doesn't even exist, because pretending that it
doesn't exist is certainly within the realm of allowed behavior.
Why is that sentence even there? How many more such ghostly
sentences exist within the Standard? Quite a few, I imagine, as
implied by GCC's documentation:
    $ info '(gcc)Warning Options'
According to that, the GCC project isn't even much interested in
what the Standard does explicitly prohibit:
    A feature to report any failure to conform to ISO C might be
    useful in some instances, but would require considerable
    additional work and would be quite different from '-Wpedantic'.
    We don't have plans to support such a feature in the near
    future.
Hey. Maybe that's a good thing. If our compilers actually cared,
we'd have nothing to talk about in the forums.
Surely, though, this is not the case here; surely, that which is
a matter of local compile-time analysis could benefit from rules
with well-defined behavior. Were a Language Lawyer to happen upon
the program in question, that lawyer would feel morally compelled
to belittle the blunder; shouldn't the compiler chastise the
same? Let the Standard state instead something similar to this:
    Namespace std shall be distinguished from any other namespace
    in only the following way: If a C++ program adds a declaration
    or definition to namespace std (or to a namespace within
    namespace std), then that declaration or definition shall
    be treated as an implementation-specific extension (4.1) of
    this International Standard. [Note: Such an extension is not
    allowed to alter the behavior of any well-formed program;
    thus, if such an extension does alter the behavior of any
    well-formed program, the extended implementation is essentially
    non-conforming, and the behavior is undefined. ---end note]
As referenced, this rule depends on section `[intro.compliance]':
http://eel.is/c++draft/intro.compliance
https://github.com/cplusplus/draft/blob/99325f3d8975075d27e40d8548919f70fc7824b8/source/intro.tex#L441
That section states:
    A conforming implementation may have extensions (including
    additional library functions), provided they do not alter
    the behavior of any well-formed program. Implementations are
    required to diagnose programs that use such extensions that are
    ill-formed according to this document. Having done so, however,
    they can compile and execute such programs.
The proposed rule also seems to have the benefit of standardizing
the existing behavior of at least two major compilers, `g++' and
`clang++'.
Where else in the Standard might compile-time behavior be
separated from run-time behavior, and thereby be recast as at
least a diagnosable rule?
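One place where the Standard already appears to draw this line is constant evaluation (`[expr.const]'): an expression whose evaluation would have undefined behavior under the core language rules is not a constant expression, so using it where a constant is required is ill-formed and must be diagnosed. The following sketch illustrates the contrast; the function names are hypothetical, and the offending line is commented out so that the example compiles:

```cpp
#include <limits>

// Hypothetical helper for illustration: signed addition, which overflows
// (undefined behavior) when x is the largest representable int.
constexpr int increment(int x)
{
    return x + 1;
}

// Evaluated at compile time with a harmless argument: well-formed.
constexpr int fine = increment(41);
static_assert(fine == 42, "increment(41) is a constant expression");

// Evaluated at compile time with an overflowing argument: the initializer
// is not a constant expression, so the declaration is ill-formed and a
// conforming compiler must diagnose it. Uncomment to see the error.
// constexpr int broken = increment(std::numeric_limits<int>::max());

// The same call at run time: undefined behavior if x is the largest int,
// and no diagnostic is required.
inline int increment_at_run_time(int x)
{
    return increment(x);
}
```

In other words, the very same overflow is a mandatory diagnostic at compile time and silent undefined behavior at run time.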
Sincerely,
Michael Witten