It is not always possible to detect this situation at compile-time.
    extern void init(double *);

    int main(void)
    {
        double d;
        init(&d);
        return d; /* reading d, but is d initialized? */
    }
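One way a programmer can make the question answerable at run time is to have the initialising function report whether it wrote anything. The following is only a sketch: try_init() and read_if_initialised() are hypothetical names, and try_init() is defined in the same file (rather than in another translation unit, as init() is above) purely so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for init(): writes *out only when have_value is
   nonzero, and returns 1 if it wrote, 0 if it left *out untouched. */
int try_init(double *out, int have_value)
{
    assert(out != NULL);
    if (have_value) {
        *out = 42.0;
        return 1;
    }
    return 0;
}

/* Reads d only on the path where try_init() reported a write. */
int read_if_initialised(int have_value, int fallback)
{
    double d;                 /* indeterminate until try_init() writes it */

    if (try_init(&d, have_value)) {
        return (int)d;
    }
    return fallback;
}
```

The compiler still cannot prove anything about a function it cannot see, but the caller no longer depends on that proof: the read of d is guarded by the status value.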
Let's take your example. It's a good one.
You have already been shown why a read before initialisation can't be
diagnosed with 100% reliability at compilation time. But hey, we could
fix that, right? We could simply ensure that *all* data are initialised
(to 0, presumably). After all, this is /already/ the case for statics
and externs, right? So why not write that into the language spec?
It turns out, however, that this isn't such a smart idea after all.
One-time zero-initialisation of statics and externs isn't that big a
cost, and it's typical for the compiler to write the zeros into the
program image so that they get loaded into memory at the same time as
the rest of the program - cost, effectively zero. But auto objects are
being created and destroyed all the time, typically from a pool of
memory that is being re-used over and over again. To insist that all
auto objects are zeroed before use, when - a large portion of the time,
at least - the zero value won't actually get used, is to impose an
overhead on the system *for the sake of people who can't initialise
objects properly*, while giving no benefit to those people who /can/ and
/do/ initialise objects properly.
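The difference between the two storage durations can be sketched as follows. The names count_call() and sum_first() are made up for illustration. The static object is zero-initialised once without any code in the function; the auto object needs its initialiser executed on every call.

```c
#include <assert.h>
#include <stddef.h>

/* Static storage duration: zero-initialised once, before program startup. */
static int calls_so_far;

/* Returns how many times it has been called before this call. */
int count_call(void)
{
    return calls_so_far++;
}

/* Automatic storage duration: 'total' is created afresh on every call,
   so its initialiser is executed every time the function runs. */
int sum_first(const int *values, int n)
{
    int total = 0;   /* without "= 0", the first "+=" reads an indeterminate value */

    assert(values != NULL || n <= 0);
    for (int i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}
```

Here the "= 0" on total is actually needed, so nothing is wasted; the overhead argument above is about the objects whose first use is a write.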
So, to avoid imposing a hidden runtime penalty, C doesn't zero out auto
scope objects by default, and it places the burden on you the programmer
to get it right. Whenever you see the word "undefined" in the spec, read
it as "oops, I do not want to go there - so it's up to *me* not to screw
this up".
"Trust the programmer", runs the old C adage. If the programmer doesn't
feel up to honouring that trust, then perhaps they need to switch to a
B&D language. But for those with sufficient hubris to take on the burden
of trust, the payoffs can be considerable.
--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within
Usually because there are machines on which the most natural behavior
is some kind of trap or interrupt, and avoiding this would be EXTREMELY
expensive.
> Granted when pointers are
> involved, you're often at the mercy of the system itself, but for
> something like reading a variable before it is initialized... seems to
> me that this could be easily standardized as a compile-time error.
No, it couldn't. Halting problem.
    int i, n = 0;
    scanf("%d", &n);
    if (n != 1) {
        i = 0;
    }
    i; /* do we read i before it is initialized? */
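For comparison, here is a sketch of the same shape with the doubt removed. It assumes n arrives as a parameter instead of through scanf(), and it uses -1 as an arbitrary, hypothetical marker for "the branch did not set i"; neither choice comes from the fragment above.

```c
#include <assert.h>

/* Same control flow as the fragment above, but i has an initialiser,
   so it holds a definite value on both paths. */
int read_after_branch(int n)
{
    int i = -1;      /* hypothetical marker: "not set by the branch" */

    if (n != 1) {
        i = 0;
    }
    assert(i == 0 || i == -1);   /* postcondition: i was assigned on every path */
    return i;
}
```

Whether the final read is valid no longer depends on the run-time value of n, which is exactly the property the compiler could not establish for the original.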
> Your thoughts?
In general, undefined behavior occurs when you do something fundamentally
incoherent, and the cost of expecting a compiler to check for it or deal
with it is very large, and the cost of telling you not to do that is small.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
> On 2010-06-05, xlar54 <scott....@gmail.com> wrote:
<snip asking about why some things are undefined in C>
>> Granted when pointers are
>> involved, you're often at the mercy of the system itself, but for
>> something like reading a variable before it is initialized... seems to
>> me that this could be easily standardized as a compile-time error.
>
> No, it couldn't. Halting problem.
Yes, though your example might confuse someone expecting to see how the
halting problem relates to this question.
> int i, n = 0;
> scanf("%d", &n);
> if (n != 1) {
> i = 0;
> }
> i; /* do we read i before it is initialized? */
This example is ironic since it introduces another source of UB: the
scanf call can produce UB when otherwise well-formed input can't be
represented as an int. That does not matter to the point you are
making, but something entirely well-defined like a simple getchar call
would have avoided the irony.
The irony is significant in that the OP is wondering why so many things
are UB in C and this is one of the most infuriating examples with, in my
opinion, the weakest justification. It means you can't use any of the
scanf family for numeric input if you take UB and corner cases
seriously. You can cripple the input with a length limit (%9ld for
example) but that is hardly satisfactory.
[Aside: my preference would be for a correctly formatted but
unrepresentable input to be classed as a matching failure.]
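A common way to sidestep the scanf problem is to read the text first and convert it with strtol(), whose out-of-range behaviour is defined (it returns LONG_MAX or LONG_MIN and sets errno to ERANGE). The sketch below treats an unrepresentable value as a plain failure, roughly in the spirit of the aside above; parse_int() is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts the decimal text in s to an int.
   Returns 1 and stores the value in *out on success.
   Returns 0, leaving *out untouched, if s has no digits, has trailing
   characters, or names a value that does not fit in an int. */
int parse_int(const char *s, int *out)
{
    char *end;
    long value;

    assert(s != NULL && out != NULL);
    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s || *end != '\0') {
        return 0;                    /* nothing converted, or junk after the number */
    }
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return 0;                    /* out of range for long or for int */
    }
    *out = (int)value;
    return 1;
}
```

The same pattern is one of the "relatively simple solutions" available in place of atoi().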
>> Your thoughts?
>
> In general, undefined behavior occurs when you do something fundamentally
> incoherent, and the cost of expecting a compiler to check for it or deal
> with it is very large, and the cost of telling you not to do that is
> small.
Ack "in general", but in the specific case you introduced I don't think
the cost would be very large, and the benefit would be significant; but
maybe I am missing the reason for this specific UB. (I'd alter atoi to
be well-defined as well, though there are relatively simple solutions
for that function.)
--
Ben.
There've been several responses on the specific issue of
using uninitialized variables, but on the wider question of "Why
is some behavior left undefined?" here are a few quotes from
Section 3 of the Rationale:
"The terms unspecified behavior, undefined behavior, and
implementation-defined behavior are used to categorize the
result of writing programs whose properties the Standard
does not, or cannot, completely describe. The goal of adopting
this categorization is to allow a certain variety among
implementations which permits quality of implementation to be
an active force in the marketplace as well as to allow certain
popular extensions, without removing the cachet of conformance
to the Standard."
"Undefined behavior gives the implementor license not to catch
certain program errors that are difficult to diagnose. It also
identifies areas of possible conforming language extension:
the implementor may augment the language by providing a
definition of the officially undefined behavior."
I'd read this as saying there are multiple reasons to leave some
behaviors undefined. Here are my paraphrases of a few:
- Some errors are difficult to detect (most of the responses about
uninitialized variables have mentioned this aspect), so the
Standard places the burden for their detection on the programmer
rather than on the compiler. Looking at it another way, getting
the program to run correctly is a shared responsibility, and the
compiler shouldn't have to shoulder all of it unaided.
- Leaving some behaviors undefined may lead to higher-quality
implementations of defined behaviors. For example, strcpy()
may be able to use high-speed in-line instruction sequences
that would be unsuitable if it had to worry about predictable
behavior when source and destination overlap. By leaving the
behavior on overlap undefined, the Standard permits strcpy()
implementations that are faster than they might be otherwise.
- Leaving some behaviors undefined allows opportunity for language
and library extensions. If *all* behaviors were nailed down,
extensions would be impossible. (It is interesting, although
discouraging, to note that some of the more virulent anti-Standard
posters to this forum are the same people who make extensive use
of the freedoms the Standard grants them.)
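The strcpy() point can be illustrated with the case where source and destination do overlap. The sketch below uses memmove(), which the Standard specifies for overlapping objects; drop_first_char() is a hypothetical name chosen for the example.

```c
#include <assert.h>
#include <string.h>

/* Removes the first character of the string s in place.
   Source (s + 1) and destination (s) overlap, so strcpy(s, s + 1) would be
   undefined; memmove() is specified to handle overlapping objects. */
void drop_first_char(char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (len > 0) {
        memmove(s, s + 1, len);   /* len bytes: the tail plus its terminating '\0' */
    }
}
```

The programmer who needs overlap pays for it by calling memmove(); everyone else gets a strcpy() that is free to skip the check.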
Finally, there's a further argument for undefined behavior, one
that neither the Standard nor the Rationale appears to state out loud:
It's *really* *hard* to define everything precisely! If the writers
had withheld the Standard until every single corner had been smoothed,
primed, and varnished, we would still be waiting for the first version.
--
Eric Sosman
eso...@ieee-dot-org.invalid
I don't see the problem.
If someone wants to eliminate this UB, it is easy:
"double d;" could mean "double d=0.0;"
in every compiler => no UB in this case.
Except that the current standards do not support such behaviour. If, in a
*new* standard, it was specified that explicitly uninitialized automatic
variables take on a zero (or floating-point zero, or NULL, or zero-like (for
structures and unions)) value, then your plan would work. But, right now,
the standards say (for instance, in C99, Section 6.7.8, paragraph 10)
"If an object that has automatic storage duration is not initialized
explicitly, its value is indeterminate."
Your
double d;
declares d to be an object of type double, with automatic storage duration,
not initialized to any set value. And, thus 6.7.8 #10 applies, and the
value of d is indeterminate /as far as the standard is concerned/, no
matter /what/ individual compilers do.
--
Lew Pitcher
Master Codewright & JOAT-in-training | Registered Linux User #112576
Me: http://pitcher.digitalfreehold.ca/ | Just Linux: http://justlinux.ca/
---------- Slackware - Because I know what I'm doing. ------
Wrong: this is not UB, though it could be a programmer error.
Would it be possible to have a warning for passing a function
the address of an uninitialized variable?
> >> It is not always possible to detect this situation at compile-time.
> >>
> >> extern init(double*);
> >>
> >> int main(void)
> >> {
> >> double d;
> >> init(&d);
> >> return d; /* reading d, but is d initialized? */
> >> }
> >
> > i not find the problem
> > if someone want to eliminate this UB it is easy
> > "double d;" could mean "double d=0.0;"
> > in each compiler => no UB in this case
>
> wrong this is not UB, could be one error of the programmer.
> is it possible one warnign for pass to one function
> one address of one not initializated variable?
That's not a good idea.
The str* functions in <stdlib.h>
are commonly used with a pointer to an uninitialised object
as an argument.
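Assuming the functions meant here are the strto* family (strtol(), strtod() and friends), the usual calling pattern looks like the sketch below: the address of an uninitialised pointer is handed to the function, which then fills it in. chars_consumed() is a hypothetical wrapper written for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns how many leading characters of s strtol() consumed as a base-10
   number. 'end' is deliberately left uninitialised: strtol() writes it. */
size_t chars_consumed(const char *s)
{
    char *end;                 /* no initialiser; only its address is passed */

    assert(s != NULL);
    (void)strtol(s, &end, 10);
    return (size_t)(end - s);
}
```

A warning on every "address of an uninitialised object" argument would fire on perfectly correct code like this.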
--
pete
Which is in contrast to your own policy of initialising
all objects explicitly. Given that modern compilers can
detect uninitialised objects in most cases, it seems
reasonable to believe they can detect unnecessary
initialisation with equal success.
I can understand how the initial standards didn't want to
impose burdens on implementations on small systems, but
people these days almost exclusively use cross compilers
on large systems when targeting embedded platforms.
--
Peter
Yes. I like everything to start with a known value, because I find that
it makes debugging easier. But I don't insist that C should change its
ways just to suit my particular preferences. If I want to have
everything Just So, it's my job, not C's job, to make it so.
<snip>
> >>>I've lurked a bit, always reading and learning (thank you all).
> >>>Regarding undefined behaviors... in some cases I can understand, but in
> >>>others I don't fully get it. Why would the standards committee allow
> >>>an undefined behavior? Why not define it? Granted when pointers are
> >>>involved, you're often at the mercy of the system itself, but for
> >>>something like reading a variable before it is initialized... seems to
> >>>me that this could be easily standardized as a compile-time error.
It couldn't be a compile-time error; in general it's too hard to
detect (it runs into the Halting Problem).
> >> It is not always possible to detect this situation at compile-time.
>
> >> extern init(double*);
>
> >> int main(void)
> >> {
> >> double d;
> >> init(&d);
> >> return d; /* reading d, but is d initialized? */
> >> }
>
> > i not find the problem
> > if someone want to eliminate this UB it is easy
> > "double d;" could mean "double d=0.0;"
> > in each compiler => no UB in this case
That wouldn't help you detect the error at compile time, which is what
you suggested. I'm guessing they didn't do this because it has a
slight cost.
> Except that the current standards do not support such behaviour.
since he's asking why the standard is the way it is, this is a bit of
a daft answer
> If, in a
> *new* standard, it was specified that explicitly uninitialized automatic
> variables take on a zero (or floating-point zero, or NULL, or zero-like (for
> structures and unions)) value, then your plan would work. But, right now,
> the standards say (for instance, in C99, Section 6.7.8, paragraph 10)
> "If an object that has automatic storage duration is not initialized
> explicitly, its value is indeterminate."
>
> Your
> double d;
> declares d to be an object of type double, with automatic storage duration,
> not initialized to any set value. And, thus 6.7.8 #10 applies, and the
> value of d is indeterminate /as far as the standard is concerned/, no
> matter /what/ individual compilers do.
--
Avoid hyperbole at all costs,
it's the most destructive argument on the planet.
- Mark McIntyre in comp.lang.c
> On 6/5/2010 3:57 AM, xlar54 wrote:
>[why is there undefined behavior]
>
> There've been several responses on the specific issue of
> using uninitialized variables, but on the wider question of "Why
> is some behavior left undefined?" here are a few quotes from
> Section 3 of the Rationale:
>
> [various good reasons given, including some by ES]
>
> Finally, there's a further argument for undefined behavior, one
> that neither the Standard nor the Rationale appears to state out loud:
> It's *really* *hard* to define everything precisely! [snip]
I don't buy this argument. It might be hard to identify
where the line is, but it's the Standard's job to draw that
line sharply and precisely. Once the line is drawn, it's
very easy to say "everything on <side X> of the line causes
the program to terminate immediately; all actions
before have been done, all actions following have not been
started." It's easy to give a definition. What makes the
question hard is /what/ definition to give -- that's why there
is undefined behavior, to avoid having to be pinned down to a
single answer.
(I should add that the rest of Eric's comments were spot on.)
That may be true but it's irrelevant to the point I was making.
> That's a major reason why we have undefined behaviour. It doesn't
> exist in Java, because you have no platform dependence.
That specific form of UB (null pointer handling) doesn't exist in Java,
but other forms of UB do.
One form of UB which will be found in any real language is the amount of
memory available. No real language is going to specify that allocating N
bytes of memory (in total) must succeed while allocating N+1 bytes must
fail.
Similarly, any language which provides the equivalent of time() is going
to admit UB through execution times; no real language is going to specify
that a given code fragment must take N seconds (or, at least, must
*appear* to take N seconds according to time()).
N must be some number. Evaluating N
would be characterized as "unspecified behavior" in C.
--
pete
Right.
> One form of UB which will be found in any real language is the amount of
> memory available. No real language is going to specify that allocating N
> bytes of memory (in total) must succeed while allocating N+1 bytes must
> fail.
That seems to stretch "undefined" beyond its useful elasticity.
In the language of the C Standard "implementation-defined" or perhaps
"unspecified" would cover it better than "undefined." Follow this
route a bit further and you'll call `printf("Hello, world!\n")'
undefined because of the possibility of I/O error. That way lies
madness.
> Similarly, any language which provides the equivalent of time() is going
> to admit UB through execution times; no real language is going to specify
> that a given code fragment must take N seconds (or, at least, must
> *appear* to take N seconds according to time()).
C dodges this particular bullet by not treating elapsed time
as a "behavior" in the first place. The definition "external
appearance or action" (3.4p1) is over-broad, I'd say: It includes,
for example, the fragrance of fopen() and the sound of setjmp().
Still, I think we can exclude elapsed time from consideration
because it is not listed among the attributes the Standard claims
to govern (1p1).
--
Eric Sosman
eso...@ieee-dot-org.invalid
In C 'undefined' means "anything can happen". So ptr = malloc(N);
if(!ptr) exit(EXIT_FAILURE); is defined, even though it may not be
possible to predict whether the branch will be taken. ptr = malloc(N);
*ptr = 1; is however undefined if malloc() returns null. The result of
writing to the null pointer could be anything from an error message to
the failure of the keyboard to another function in a seemingly
unrelated part of the program returning the wrong result.
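The defined version of that pattern can be sketched as below. alloc_filled() and first_element() are hypothetical helpers; the point is only that every write goes through a pointer that has already been compared against NULL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates an array of count ints, each set to value.
   Returns NULL if count is 0, the size would overflow, or malloc() fails;
   the caller owns the result and must free() it. */
int *alloc_filled(size_t count, int value)
{
    int *p;

    if (count == 0 || count > SIZE_MAX / sizeof *p) {
        return NULL;
    }
    p = malloc(count * sizeof *p);
    if (p == NULL) {
        return NULL;           /* defined: we never write through a null pointer */
    }
    for (size_t i = 0; i < count; i++) {
        p[i] = value;
    }
    return p;
}

/* Precondition: the caller has already checked p against NULL. */
int first_element(const int *p)
{
    assert(p != NULL);
    return p[0];
}
```

Which branch is taken still depends on the system, but both branches have defined behaviour.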
> >> > Finally, there's a further argument for undefined behavior, one
> >> > that neither the Standard nor the Rationale appears to state out loud:
> >> > It's *really* *hard* to define everything precisely! [snip]
>
> >> I don't buy this argument.
I'm semi in agreement with you. I think C left much behaviour
undefined because it was expensive to compute. C traded off absolute
safety for speed and simplicity of implementation.
> >> It might be hard to identify
> >> where the line is, but it's the Standard's job to draw that
> >> line sharply and precisely.
I'm not sure I agree. A standard may do this but I don't think it's
under any obligation to do so.
> >> Once the line is drawn, it's
> >> very easy to say "everything on <side X> of the line causes
> >> the program to stop termination immediately;
but not necessarily easy to implement
> >> all actions
> >> before have been done, all actions following have not been
> >> started."
doesn't the freedom of action between sequence points bugger this up?
> >> It's easy to give a definition.
I bet it isn't. Writing standards is hard.
> >> What makes the
> >> question hard is /what/ definition to give -- that's why there
> >> is undefined behavior, to avoid having to be pinned down to a
> >> single answer.
I don't think this is why C is the way it is
> > On some systems writing to a null pointer will trigger a hardware
> > trap, on others it will place a byte at position zero in memory.
> > Mandating a behaviour would put a burden on one compiler, essentially
> > involving an if statement at every pointer write, so it's easier to
> > say 'the behaviour is undefined'.
>
> That may be true but it's irrelevant to the point I was making.
--
Most Ada programmers would consider going out of your way to
construct an Ada program that had a potential buffer overflow not as
a challenge, but as a kind of pornography.
Once again that may be true but it's irrelevant to the point
I was making.
> On 20 June, 09:36, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Malcolm McLean <malcolm.mcle...@btinternet.com> writes:
>> > On Jun 20, 4:24 am, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> >> > Finally, there's a further argument for undefined behavior, one
>> >> > that neither the Standard nor the Rationale appears to state out loud:
>> >> > It's *really* *hard* to define everything precisely! [snip]
>>
>> >> I don't buy this argument.
>
> I'm semi in agreement with you. I think C left much behaviour
> undefined because it was expensive to compute. C traded off absolute
> safety for speed and simplicity of implementation.
>
>> >> It might be hard to identify
>> >> where the line is, but it's the Standard's job to draw that
>> >> line sharply and precisely.
>
> I'm not sure I agree. A standard may do this but I don't think it's
> under any obligation to do so.
When I say "draw the line" what I mean is to identify which
behaviors are defined and which behaviors are undefined.
(Also, which are unspecified, etc.) My position is (still)
that it is /absolutely/ the job of the Standard to do this.
If someone can't tell after reading the Standard whether
behavior X is defined or undefined, it has failed to fulfill
(at least one of) its primary function(s).
>> >> Once the line is drawn, it's
>> >> very easy to say "everything on <side X> of the line causes
>> >> the program to stop termination immediately;
>
> but not necessarily easy to implement
Very true.
>> >> all actions
>> >> before have been done, all actions following have not been
>> >> started."
>
> don't the freedom of action between sequence points bugger this up?
Obviously this needs to be taken into account, but I don't
think it presents any significant difficulties. Remember,
we only have to say what the behavior will be, we don't
have to write a compiler that provides that behavior.
>> >> It's easy to give a definition.
>
> I bet it isn't. Writing standards is hard.
Sure it is; just try it:
"Execution of any statement whose behavior is not defined by
this Standard shall cause the computer it's running on to
catch fire."
"Execution of any statement whose behavior is not defined by
this Standard shall issue launch codes to all armed nuclear
missiles."
"Execution of any statement whose behavior is not defined by
this Standard shall initiate entering the Hobart Phase where
time flows backwards instead of forwards."
It's only if we want the definition to be agreeable to potential
implementors that it gets hard.
>> >> What makes the
>> >> question hard is /what/ definition to give -- that's why there
>> >> is undefined behavior, to avoid having to be pinned down to a
>> >> single answer.
>
> I don't think this is why C is the way it is
I think I see the point you're making, and I believe I agree with it,
at least partly. What I meant by the statement wasn't expressed very
well. I wasn't trying to explain why C has undefined behavior _at
all_; that's historical plus a lot of other different things. But
when considering some particular aspect, and deciding whether its
behavior will be defined or undefined (and ignoring for the moment
some other possibilities such as implementation defined), it's often
true that "undefined behavior" simply means we don't want to be pinned
down to a single answer. It isn't that a choice can't be made, or
even that a choice can't be made that's reasonably cheap to implement;
but rather that we have decided /not to make a choice at all/ -- to
leave the freedom of choice (for that aspect) open to other factors.