"undefined behaviour" vs "ill-formed & no diagnostic required" ?


andrew...@gmail.com

May 7, 2013, 4:28:53 PM
to std-dis...@isocpp.org
In some places in the standard it says:

    A. If <blah> the program has undefined behavior.

In other places it says:

    B. If <blah> the program is ill-formed. No diagnostic required.

What is the semantic difference, if any, between these two statements?

That is, what could an implementation do differently (or would be required to do differently) if an occurrence of the consequent of A was replaced with the consequent of B (or vice versa)?

If there is no difference, why not simplify by replacing all occurrences of one with the other?

Daniel Krügler

May 7, 2013, 4:37:31 PM
to std-dis...@isocpp.org
2013/5/7 <andrew...@gmail.com>:
> In some places in the standard it says:
>
> A. If <blah> the program has undefined behavior.
>
> In other places it says:
>
> B. If <blah> the program is ill-formed. No diagnostic required.
>
> What is the semantic difference, if any, between these two statements?

To the best of my knowledge there is no difference between these two
phrasings, the common link being [intro.compliance] p2 b3:

"— If a program contains a violation of a rule for which no diagnostic
is required, this International
Standard places no requirement on implementations with respect to that program."

compared to the definition of undefined behavior:

"undefined behavior
behavior for which this International Standard imposes no requirements"

> That is, what could an implementation do differently (or would be required
> to do differently) if an occurrence of the consequent of A was replaced with
> the consequent of B (or vice versa)?

I don't see any different choice, except that speculating upon the
effects of undefined behavior has limited success ;-)

> If there is no difference, why not simplify by replacing all occurrences of
> one with the other?

I think the main reason here is that in some parts the focus is on
ill-formedness with required diagnostics being a flavour in regard to
the specification, but sometimes not. But I cannot give a very good
reason otherwise.

- Daniel

Johannes Schaub

May 7, 2013, 4:48:16 PM
to std-dis...@isocpp.org


Am 07.05.2013 22:37 schrieb "Daniel Krügler" <daniel....@gmail.com>:
>
> 2013/5/7  <andrew...@gmail.com>:

> > If there is no difference, why not simplify by replacing all occurrences of
> > one with the other?
>
> I think the main reason here is that in some parts the focus is on
> ill-formedness with required diagnostics being a flavour in regard to
> the specification, but sometimes not. But I cannot give a very good
> reason otherwise.
>

I have heard the statement that ill-formed, NDR is for cases that an implementation could in principle diagnose if it worked hard enough (if it had whole-program optimization or a sufficiently intelligent linker, for instance).

And that undefined behavior is used for the remaining cases and for the "no explicit behavior defined" cases (for example, there is no behavior defined for the case when a resource limit is exceeded).

Lawrence Crowl

May 7, 2013, 6:19:37 PM
to std-dis...@isocpp.org
In other words, ill-formed-NDR is a static property of the program,
independent of the data it processes. Undefined behavior is
everything else.

--
Lawrence Crowl
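
A minimal sketch of the static-versus-data-dependent distinction, not taken from the thread. The names answer, checked_div, a.cpp and b.cpp are hypothetical. The commented part shows a one-definition rule violation of the "no diagnostic required" kind, as I read the rule; the compiled part shows an operation whose behavior is undefined only for certain runtime values.

```cpp
#include <climits>

// Ill-formed, no diagnostic required (shown as comments because it needs
// two translation units): a non-inline function defined twice.
//
//   // a.cpp
//   int answer() { return 1; }
//   // b.cpp
//   int answer() { return 2; }   // second definition of the same function
//
// The violation is a property of the program text alone, so a linker could
// report it without the program ever being run.

// Undefined behavior that depends on the data: a / b is undefined when
// b == 0 and when the quotient is not representable (INT_MIN / -1).
// checked_div (hypothetical helper) rejects both cases before dividing.
bool checked_div(int a, int b, int& quotient) {
    if (b == 0) return false;                    // division by zero
    if (a == INT_MIN && b == -1) return false;   // quotient would overflow
    quotient = a / b;
    return true;
}
```

Whether the plain expression a / b misbehaves cannot be decided from the source alone when a and b come from input, which is why a runtime check like the one above is needed to rule it out.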

andrew...@gmail.com

May 7, 2013, 11:49:08 PM
to std-dis...@isocpp.org, schaub....@googlemail.com
If this were true, then you would expect that the ~50 cases that are designated undefined behaviour would be in general harder to diagnose than the ~40 cases designated ill-formed NDR.  Reading through each case, however, this trend does not seem to be present.  There seems to be no correlation between the difficulty of diagnosis and the selection of the two phrasings.

I suspect the more likely explanation is that the two phrasings evolved independently, are in fact synonymous, and have been decided between more or less randomly by contributors.

Jens Maurer

May 8, 2013, 2:24:10 AM
to std-dis...@isocpp.org
On 05/08/2013 05:49 AM, andrew...@gmail.com wrote:
> If this were true, then you would expect that the ~50 cases that are
> designated undefined behaviour would be in general harder to diagnose
> than the ~40 cases designated ill-formed NDR. Reading through each
> case, however, this trend does not seem to be present. There seems to
> be no correlation between the difficulty of diagnosis and the
> selection of the two phrasings.

I always considered "ill-formed, no diagnostic required" something that
could (in principle) be checked at translation time, e.g. during
linking.

Example: one-definition rule in 3.2p4


In contrast, "undefined behavior" would require (potentially very costly)
runtime instrumentation of your program to detect.

Example: Data races cause undefined behavior in 1.10p21. I think
these are undecidable (in the theoretical computer science sense)
at compile time.


Do you have specific examples where this translation time vs.
runtime differentiation is violated? Those should be fixed
in the standard.

Jens

Daniel Krügler

May 8, 2013, 2:43:53 AM
to std-dis...@isocpp.org
2013/5/8 Jens Maurer <Jens....@gmx.net>:
Aren't the following cases at least theoretically testable during compile-time:

a) [lex.phases] p1 b2:

"If, as a result, a character sequence that matches the syntax of a
universal-character-name is produced, the behavior is undefined."

b) ditto b4:

"If a character sequence that matches the syntax of a
universal-character-name is produced by token concatenation (16.3.3),
the behavior is undefined."

c) [lex.pptoken] p2:

"The categories of preprocessing token are: header names, identifiers,
preprocessing numbers, character literals
(including user-defined character literals), string literals
(including user-defined string literals), preprocessing
operators and punctuators, and single non-white-space characters that
do not lexically match the other
preprocessing token categories. If a ’ or a " character matches the
last category, the behavior is undefined."

d) [basic.def.odr] p6:

"If the definitions of D do not satisfy these requirements, then the
behavior is undefined."

e) [temp.dep.candidate] p1:

"If the call would be ill-formed or would find a better match had the
lookup within the associated namespaces
considered all the function declarations with external linkage
introduced in those namespaces in all
translation units, not just considering those declarations found in
the template definition and template
instantiation contexts, then the program has undefined behavior."

- Daniel

Fernando Cacciola

May 8, 2013, 9:29:14 AM
to std-dis...@isocpp.org
On Wed, May 8, 2013 at 3:43 AM, Daniel Krügler <daniel....@gmail.com> wrote:
2013/5/8 Jens Maurer <Jens....@gmx.net>:
> On 05/08/2013 05:49 AM, andrew...@gmail.com wrote:
>> If this were true, then you would expect that the ~50 cases that are
>> designated undefined behaviour would be in general harder to diagnose
>> than the ~40 cases designated ill-formed NDR.  Reading through each
>> case, however, this trend does not seem to be present.  There seems to
>> be no correlation between the difficulty of diagnosis and the
>> selection of the two phrasings.
>
> I always considered "ill-formed, no diagnostic required" something that
> could (in principle) be checked at translation time, e.g. during
> linking.
>

Me too, though Daniel seems to be right.

But isn't there a real, well-defined and well-identifiable difference between the (undefined) behaviour of a program containing an error in the source code, for which no diagnostic is required simply to avoid the burden on the implementation, and the undefined behavior due to a runtime property of an otherwise correct program? I think there is.

If indeed there is, shouldn't those two phrases mean what many of us thought they meant, even if the text needs to be corrected?

Best

--
Fernando Cacciola
SciSoft Consulting, Founder
http://www.scisoft-consulting.com

Tomalak Geret'kal

May 8, 2013, 10:16:13 AM
to std-dis...@isocpp.org
Hi Andrew,

I see the difference as the following:

*undefined behaviour* - "the standard has no business making
any judgements on whether or not you should be able to do
this, nor what should happen when you do it, and will just
leave it well alone"

*ill-formed, no diagnostic required* - "the standard
fervently believes that this is /wrong/ and that you should
not be allowed to do it (although, unfortunately, diagnosing
it is a right pain and we can't realistically require an
implementation to be so kind)"

I guess the problem is that there is arguably no /practical/
difference between the two, but you asked for semantics...

Tom

Jens Maurer

May 8, 2013, 2:44:44 PM
to std-dis...@isocpp.org
On 05/08/2013 08:43 AM, Daniel Krügler wrote:
> Aren't the following cases at least theoretically testable during compile-time:
>
> a) [lex.phases] p1 b2:
>
> "If, as a result, a character sequence that matches the syntax of a
> universal-character-name is produced, the behavior is undefined."

Yes.

> b) ditto b4:
>
> "If a character sequence that matches the syntax of a
> universal-character-name is produced by token concatenation (16.3.3),
> the behavior is undefined."

Yes.

> c) [lex.pptoken] p2:
>
> "The categories of preprocessing token are: header names, identifiers,
> preprocessing numbers, character literals
> (including user-defined character literals), string literals
> (including user-defined string literals), preprocessing
> operators and punctuators, and single non-white-space characters that
> do not lexically match the other
> preprocessing token categories. If a ’ or a " character matches the
> last category, the behavior is undefined."

Yes.

> d) [basic.def.odr] p6:
>
> "If the definitions of D do not satisfy these requirements, then the
> behavior is undefined."

Yes.

> e) [temp.dep.candidate] p1:
>
> "If the call would be ill-formed or would find a better match had the
> lookup within the associated namespaces
> considered all the function declarations with external linkage
> introduced in those namespaces in all
> translation units, not just considering those declarations found in
> the template definition and template
> instantiation contexts, then the program has undefined behavior."

Yes.

It seems all your examples should be "ill-formed, no diagnostic
required". Could you please send e-mail to Mike Miller to open
a core issue?

Thanks,
Jens


Jens Maurer

May 8, 2013, 2:47:11 PM
to std-dis...@isocpp.org
On 05/08/2013 04:16 PM, Tomalak Geret'kal wrote:
> I see the difference as the following:
>
> *undefined behaviour* - "the standard has no business making
> any judgements on whether or not you should be able to do
> this, nor what should happen when you do it, and will just
> leave it well alone"

These days, optimizers will actually exploit undefined
behavior. For example, if the optimizer can prove that a code
path reads an uninitialized variable, it can remove all
dependent calculations. I would be surprised if a compiler
did that for the "ill-formed, no diagnostic required" category.

Thanks,
Jens
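
A rough illustration of an optimizer exploiting undefined behavior. The message above mentions uninitialized reads; this sketch uses signed overflow instead, a closely related and commonly cited case, and the function names are hypothetical. Because overflow of int is undefined, an optimizer is permitted to assume it never happens.

```cpp
#include <climits>

// For an int x, x + 1 overflows only when x == INT_MAX, and signed overflow
// is undefined behavior. A compiler may therefore assume the overflow never
// happens and reduce this whole function to "return true".
bool next_is_greater(int x) {
    return x + 1 > x;
}

// Overflow-free variant (hypothetical): defined for every int value,
// including INT_MAX, for which it reports that no greater int exists.
bool next_is_greater_checked(int x) {
    return x != INT_MAX;
}
```

Calling next_is_greater(INT_MAX) is the undefined case, so the two functions may or may not agree there; for every other argument both return true.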

Johannes Schaub

May 8, 2013, 3:37:00 PM
to std-dis...@isocpp.org


Am 08.05.2013 20:44 schrieb "Jens Maurer" <Jens....@gmx.net>:


>
> On 05/08/2013 08:43 AM, Daniel Krügler wrote:
> > Aren't the following cases at least theoretically testable during compile-time:
> >
> > a) [lex.phases] p1 b2:
> >
> > "If, as a result, a character sequence that matches the syntax of a
> > universal-character-name is produced, the behavior is undefined."
>
> Yes.
>
> > b) ditto b4:
> >
> > "If a character sequence that matches the syntax of a
> > universal-character-name is produced by token concatenation (16.3.3),
> > the behavior is undefined."
>
> Yes.
>
> > c) [lex.pptoken] p2:
> >
> > "The categories of preprocessing token are: header names, identifiers,
> > preprocessing numbers, character literals
> > (including user-defined character literals), string literals
> > (including user-defined string literals), preprocessing
> > operators and punctuators, and single non-white-space characters that
> > do not lexically match the other
> > preprocessing token categories. If a ’ or a " character matches the
> > last category, the behavior is undefined."
>
> Yes.
>
> > d) [basic.def.odr] p6:
> >
> > "If the definitions of D do not satisfy these requirements, then the
> > behavior is undefined."
>
> Yes.
>
> > e) [temp.dep.candidate] p1:
> >
> > "If the call would be ill-formed or would find a better match had the
> > lookup within the associated namespaces
> > considered all the function declarations with external linkage
> > introduced in those namespaces in all
> > translation units, not just considering those declarations found in
> > the template definition and template
> > instantiation contexts, then the program has undefined behavior."
>
> Yes.
>
> It seems all your examples should be "ill-formed, no diagnostic
> required".  Could you please send e-mail to Mike Miller to open
> a core issue?
>

And I recommend adding an issue report about a missing (non normative) note about the difference.

Otherwise we will find ourselves with the same discussion in 2016 and we will showcase other counterexamples added by C++14 wording.

> Thanks,
> Jens

Daniel Krügler

May 8, 2013, 3:51:01 PM
to std-dis...@isocpp.org
2013/5/8 Johannes Schaub <schaub....@googlemail.com>:
>
> And I recommend adding an issue report about a missing (non normative) note
> about the difference.

Good idea. I have added this suggestion to the email I have sent to Mike.

- Daniel

Lawrence Crowl

May 9, 2013, 2:21:43 PM
to std-dis...@isocpp.org
IIUC, for Tomalak's meaning, the standard uses "unspecified
behavior".

--
Lawrence Crowl

Tomalak Geret'kal

May 10, 2013, 2:50:24 AM
to std-dis...@isocpp.org
No, I don't think so.

Unspecified behaviour is the "nor what should happen when you do it" part, but not the "whether or not you should be able to do this" part. Unspecified behaviour requires some behaviour, it simply does not specify what that behaviour is.

Whereas undefined behaviour sort of requires no behaviour at all.

I agree with Jens.

Tom
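
To make the unspecified-versus-undefined point concrete, here is a small sketch (all names are hypothetical, not from the thread). The order in which function arguments are evaluated is unspecified: the implementation must choose some order, so the program has a definite result, but the standard does not say which order is chosen.

```cpp
#include <string>

// Records the order in which the two argument expressions were evaluated.
std::string trace;

int first()  { trace += 'a'; return 1; }
int second() { trace += 'b'; return 2; }

int add(int x, int y) { return x + y; }

// The order in which first() and second() run is unspecified. The sum is 3
// either way; only the contents of trace can differ between implementations.
int run() {
    trace.clear();
    return add(first(), second());
}
```

After a call to run(), trace holds either "ab" or "ba". Both outcomes are valid behavior of a well-formed program, which is the contrast with undefined behavior, where no outcome is required at all.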