Why does rewind() ignore errors?

Keith Thompson

May 26, 2006, 3:34:42 PM

The rewind() function is defined as follows (C99 7.19.9.5):

Synopsis
#include <stdio.h>
void rewind(FILE *stream);
Description
The rewind function sets the file position indicator for the
stream pointed to by stream to the beginning of the file. It
is equivalent to
(void)fseek(stream, 0L, SEEK_SET)
except that the error indicator for the stream is also cleared.
Returns
The rewind function returns no value.

C90 has the same wording.

On an error, fseek() returns a non-zero result and sets the error
indicator for the stream. rewind() deliberately discards this
information.

The Rationale (V5.10, 7.19.9.5) says:

Resetting the end-of-file and error indicators was added to the
specification of rewind to make the specification more logically
consistent.

Why does rewind() discard error information? Wouldn't it have been
both simpler and more useful for rewind() to return the same int
result as fseek() and to set the error indicator when appropriate?

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Douglas A. Gwyn

May 26, 2006, 5:27:09 PM

Keith Thompson wrote:
> Why does rewind() discard error information? Wouldn't it have been
> both simpler and more useful for rewind() to return the same int
> result as fseek() and to set the error indicator when appropriate?

It's a legacy interface. If you need the status information
you should use fseek.

jacob navia

May 26, 2006, 6:27:30 PM

Douglas A. Gwyn wrote:

Wouldn't it be better to get rid of it?

I have proposed it here before but again:

Obsolete functions:

The new standard (C 2009) would declare
gets()
rewind()
trigraphs

and other stuff as OBSOLETE. Those names would no longer appear
in the standard but in Appendix: "obsolete constructs". That (non
normative) Appendix would specify all those functions and would
say that in the next revision (C 2019) they will disappear.

People would have 10 years to change their code, or longer, if they
use the "compatibility with C99" module as described in the Appendix
for obsolete stuff.

Keith Thompson

May 26, 2006, 7:09:02 PM

Pre-ANSI implementations of rewind() obviously would not have been
declared to return void (since void hadn't been invented yet). Most
likely they would have implicitly returned int. Did the actual
implementations just not return a value?

In any case, adding a requirement for rewind() to return a meaningful
value would not have broken existing code, which could have continued
to quietly ignore any result.

My advice from now on will be: never use rewind(), use fseek()
instead.
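
A minimal sketch of that advice, assuming only the standard library. The helper name checked_rewind is hypothetical and not part of any standard. It repositions with fseek(), reports failure to the caller, and clears the indicators itself so that the success path behaves like rewind().

```c
#include <stdio.h>

/* Hypothetical helper: rewind a stream but report failure.
 * Returns 0 on success, -1 if fseek() could not reposition. */
int checked_rewind(FILE *stream)
{
    if (fseek(stream, 0L, SEEK_SET) != 0) {
        return -1;          /* caller decides how to handle the failure */
    }
    clearerr(stream);       /* like rewind(): clear the error indicator */
    return 0;
}
```

A caller would write something like `if (checked_rewind(fp) != 0) { /* handle it */ }` where it previously called rewind(fp).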

Brian Inglis

May 29, 2006, 3:24:35 AM

If you consider the spec of rewind() as being to reset the file
position, error, and eof indicators, no error is possible.
It is then the responsibility of the next read/write/get/put... I/O
operation to establish the actual file position by performing a
rewind, seek, or nop, depending on the actual I/O device involved.

--
Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

Brian....@CSi.com (Brian[dot]Inglis{at}SystematicSW[dot]ab[dot]ca)
fake address use address above to reply

Douglas A. Gwyn

May 30, 2006, 3:12:14 PM

jacob navia wrote:
> Douglas A. Gwyn wrote:

> > It's a legacy interface. ...

> Wouldn't it be better to get rid of it?

No. There is strictly conforming code currently using it,
so continued documentation of the interface is a useful
service; and implementation is not a problem (given fseek).

> I have proposed it here before but again:
> Obsolete functions:
> The new standard (C 2009) would declare
> gets()
> rewind()
> trigraphs
> and other stuff as OBSOLETE. Those names would no longer appear
> in the standard but in Appendix: "obsolete constructs". That (non
> normative) Appendix would specify all those functions and would
> say that in the next revision (C 2019) they will disappear.

Feel free to propose it during the next round of active
revision. Actually I have a much better counterproposal
for gets() that would make it safe as well as useful.
Also, if trigraph processing is changed then some currently
correct code will be affected, for example
#if some_condition
??=error "This produces an error even with a vintage compiler"
#endif

If you really want a new language that meets your stylistic
preferences, by all means invent one. C is too important
to mess with its established behavior, unless you can show
that that is the best way to address some clearly identified
problem AND that the change won't cause more damage than benefit.

Douglas A. Gwyn

May 30, 2006, 3:29:04 PM

> Pre-ANSI implementations of rewind() obviously would not have been
> declared to return void (since void hadn't been invented yet). Most
> likely they would have implicitly returned int. Did the actual
> implementations just not return a value?

Actually void predated the C standard; it was introduced into AT&T
C compilers not long after 7th Edition Unix (say, around 1979).

Before the void type was introduced, there was a lot of muddle about
whether (default-int) functions returned a value or not.
If one examines the 7th Edition Unix C library source code, one
sees that in fact nothing was (intentionally) returned by rewind,
so code that used rewind must have treated it the same as we now
consider "returning void". In later versions of Unix, rewind's
return type was explicitly stated to be void.

The worst case I recall of confusion about return type was when
somebody involved with BSD noticed that sprintf unintentionally
happened to always return a pointer to its buffer upon success,
and added that property to their version of the specs. Meanwhile,
in the AT&T thread of evolution it was understood that sprintf
didn't have a spec for what it returned, so it (as well as printf
and fprintf) was changed to officially return an int denoting
how many bytes were transmitted. The latter, being in the base
library, was adopted for the C standard, at which point there
were complaints from the BSD community that had done their own
(different) thing with that interface.

> In any case, adding a requirement for rewind() to return a meaningful
> value would not have broken existing code, which could have continued
> to quietly ignore any result.

If any code uses an explicit pointer to the function, a change
in the type signature could break the code.
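
To illustrate the function-pointer concern, here is a small sketch. The names stream_reset_fn and first_byte_after_reset are hypothetical, invented for the example. The typedef matches the current void-returning declaration of rewind(), so passing rewind compiles today. If rewind() were redeclared to return int, the same call would become a pointer-type mismatch.

```c
#include <stdio.h>

/* Pointer type matching today's declaration: void rewind(FILE *). */
typedef void (*stream_reset_fn)(FILE *);

/* Hypothetical helper: apply a reset function, then read the first byte. */
int first_byte_after_reset(FILE *stream, stream_reset_fn reset)
{
    reset(stream);
    return fgetc(stream);
}
```

Typical use would be `first_byte_after_reset(fp, rewind)`.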

> My advice from now on will be: never use rewind(), use fseek()
> instead.

Or, check errno.

Wojtek Lerch

May 30, 2006, 4:02:20 PM

"Douglas A. Gwyn" <DAG...@null.net> wrote in message
news:447C9D00...@null.net...

>> My advice from now on will be: never use rewind(), use fseek()
>> instead.
>
> Or, check errno.

The C standard doesn't require that rewind() or fseek() must set errno on
error, does it? Were you perhaps thinking about POSIX?

jacob navia

May 30, 2006, 5:01:36 PM

Douglas A. Gwyn wrote:
> jacob navia wrote:
>
>>Douglas A. Gwyn wrote:
>>
>>>It's a legacy interface. ...
>>
>>Wouldn't it be better to get rid of it?
>
>
> No. There is strictly conforming code currently using it,
> so continued documentation of the interface is a useful
> service; and implementation is not a problem (given fseek).
>
>
>>I have proposed it here before but again:
>>Obsolete functions:
>>The new standard (C 2009) would declare
>>gets()
>>rewind()
>>trigraphs
>>and other stuff as OBSOLETE. Those names would no longer appear
>>in the standard but in Appendix: "obsolete constructs". That (non
>>normative) Appendix would specify all those functions and would
>>say that in the next revision (C 2019) they will disappear.
>
>
> Feel free to propose it during the next round of active
> revision. Actually I have a much better counterproposal
> for gets() that would make it safe as well as useful.

OK, but then *todays* gets() would be obsolete isn't it?

I mean obsolete things appear, and they must go.
Why can't C use a standard way of cleaning up the language?

> Also, if trigraph processing is changed then some currently
> correct code will be affected, for example
> #if some_condition
> ??=error "This produces an error even with a vintage compiler"
> #endif
>

Yes, that would no longer be standard C, but it would be supported by compilers
that want to support obsolete constructs for compatibility reasons.

This means that most compilers would still support it, until the next
standard appears (2019).

> If you really want a new language that meets your stylistic
> preferences, by all means invent one.

??? Making obsolete gets() is "inventing a new language"???
Ahh, of course. C is frozen forever, and the bugs of the past
must be faithfully reproduced AD NAUSEAM!!

> C is too important
> to mess with its established behavior, unless you can show
> that that is the best way to address some clearly identified
> problem AND that the change won't cause more damage than benefit.

You haven't noticed that gets() has a small problem maybe?
Do I have to show it to you *again*?

Why deny the evidence?

You yourself want to change it, (see above). Why when I propose
the same thing I am "inventing a new language"?

Why not change this attitude a bit and accept that gets()
wasn't a very bright idea? That trigraphs are just nonsense that
got into the standard because I do not know what danish terminal
manufacturers problem... as discussed in this group years
ago.

The same with asctime() and the zero terminated strings.

jacob

Douglas A. Gwyn

May 30, 2006, 6:00:00 PM

Wojtek Lerch wrote:
> The C standard doesn't require that rewind() or fseek() must set errno on
> error, does it? Were you perhaps thinking about POSIX?

I had in mind that if one wanted to change the rewind() spec,
requiring it (and probably fseek) to set errno upon error
(as I think POSIX does) would be a reasonable approach.
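
A sketch of the errno approach, assuming a POSIX-like library in which rewind() sets errno on failure. ISO C makes no such promise, so on other implementations a zero result proves nothing. The name rewind_errno is hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: returns 0 if rewind() left errno untouched,
 * otherwise the errno value observed after the call.
 * Assumption: the library sets errno when the underlying seek fails. */
int rewind_errno(FILE *stream)
{
    errno = 0;          /* library functions never reset errno to zero */
    rewind(stream);
    return errno;
}
```

Because rewind() also clears the stream's error indicator, errno is the only channel left for detecting the failure after the call.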

Douglas A. Gwyn

May 30, 2006, 6:44:33 PM

jacob navia wrote:
> Douglas A. Gwyn wrote:
> > ... Actually I have a much better counterproposal
> > for gets() that would make it safe as well as useful.
> OK, but then *todays* gets() would be obsolete isn't it?

Not with my particular proposal.

> Why can't C use a standard way of cleaning up the language?

Because once something is documented and widely used, there
is a substantial cost in changing it. Only if you can show
that the benefits would clearly outweigh the costs should
such a change be made. There is almost no cost for leaving
some existing but "obsolete" feature alone.

> ??? Making obsolete gets() is "inventing a new language"???

If you make enough such changes, especially changing the
scanner, then yes you have in effect a different language.

My point is that you have given insufficient justification
for meddling with such things. Every programmer on the
planet has his own idea about what a PL "ought" to look
like, so just wanting something changed just because you
don't happen to like it is a non-starter.

> You yourself want to change it, (see above). Why when I propose
> the same thing I am "inventing a new language"?

My change would not affect the usage of the current feature,
except to render some instances of undefined behavior well
defined.

> ... trigraphs are just nonsense that got into the standard
> because I do not know what danish terminal manufacturers
> problem...

The problem was actually one for Scandinavian
*programmers*, among others. Due to their having
larger linguistic character sets than English has, but
sharing general dimensions for keyboards (an economic
constraint), they didn't have some of the special
punctuation characters available. Since C was
originally designed in an environment where keyboards
did have all those special characters, in those other
environments it was hard to efficiently create C source
code. So there was always a genuine problem, brought
to light when internationalizing the specification.

The ability to express programs in the universal
invariant character set is not "nonsense". The only
available alternatives were to change C's operators to
different characters, or to define escape sequences
(the latter being what we did, as the most acceptable
and least obtrusive for people in more benign
environments).

What about the later C++-inspired digraphs? They actually
are worse from a design standpoint, since they're applied
too far downstream in the phases of translation.

Personally I don't think the C standard should have said
any more on the subject than that the full source
character set is supported somehow, and left the details
up to implementations.

However, trigraphs are now a defined part of the language.
I could support adding another defined trigraph, mapped
to the ? character; a proper escape sequence should from
the outset have included a way to escape the escape prefix
(the lack of which in some contexts is difficult to work
around). Note that even such an improvement could affect
*some* existing code; e.g. if "???" were picked, the
example in the current standard would behave differently.

Keith Thompson

May 30, 2006, 7:32:24 PM

"Douglas A. Gwyn" <DAG...@null.net> writes:

> jacob navia wrote:
>> Douglas A. Gwyn wrote:
>> > ... Actually I have a much better counterproposal
>> > for gets() that would make it safe as well as useful.
>> OK, but then *todays* gets() would be obsolete isn't it?
>
> Not with my particular proposal.

I'd be interested in the details of your proposal, or even a brief
overview. Is it published anywhere?

[...]

>> ... trigraphs are just nonsense that got into the standard
>> because I do not know what danish terminal manufacturers
>> problem...

I understand why trigraphs were introduced. In hindsight, I think I
would have done them differently. Specifically, enable them only if
there's a directive of some kind within the source file that
specifically requests them. The obvious form of the directive would
be:

??=pragma STDC TRIGRAPHS ON

with a special-case rule to allow this particular instance of "??=" to
be mapped to "#" even if trigraphs are disabled. Or something like
that. The committee would also have had to decide whether this
applies across #included files. And of course a compiler could choose
to provide a command-line option that controls the default setting.

But it's too late to do this if we aren't allowed to break existing
code. (On the other hand, I wonder how much existing code actually
uses trigraphs; I'm tempted to suggest that it might be worth the
cost.)

[...]

> What about the later C++-inspired digraphs? They actually
> are worse from a design standpoint, since they're applied
> too far downstream in the phases of translation.

In my opinion, this makes digraphs *better* in some ways. They can be
used as alternative names for certain punctuation marks without
messing up string literals and character constants. They aren't a
complete solution to the problem of a keyboard with no '#' key,
though.
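
A small sketch of that difference, assuming a compiler that accepts the C95 digraphs (not strict C90 mode). The function names are hypothetical. The first function is spelled with digraphs in place of braces and brackets; the second shows that the same spellings inside a string literal are left untouched, because digraphs are tokens rather than a textual replacement.

```c
#include <stdio.h>

/* Written with digraphs: <% %> stand for braces, <: :> for brackets. */
int second_element(void)
<%
    int values<:3:> = <% 10, 20, 30 %>;
    return values<:1:>;
%>

/* Inside a string literal the digraph spellings are ordinary characters. */
const char *digraph_text(void)
{
    return "<: :> <% %>";
}
```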

Charlie Gordon

May 30, 2006, 8:49:07 PM

"Keith Thompson" <ks...@mib.org> wrote in message
news:lnk683y...@nuthaus.mib.org...

> "Douglas A. Gwyn" <DAG...@null.net> writes:
> > jacob navia wrote:
> >> Douglas A. Gwyn wrote:
> >> > ... Actually I have a much better counterproposal
> >> > for gets() that would make it safe as well as useful.
> >> OK, but then *todays* gets() would be obsolete isn't it?
> >
> > Not with my particular proposal.
>
> I'd be interested in the details of your proposal, or even a brief
> overview. Is it published anywhere?

Let me guess: do you suggest limiting the amount of input gets() stores into
the buffer to some magical constant such as BUFSIZ?

> [...]
> >> ... trigraphs are just nonsense that got into the standard
> >> because I do not know what danish terminal manufacturers
> >> problem...
>
> I understand why trigraphs were introduced. In hindsight, I think I
> would have done them differently. Specifically, enable them only if
> there's a directive of some kind within the source file that
> specifically requests them. The obvious form of the directive would
> be:
>
> ??=pragma STDC TRIGRAPHS ON
>
> with a special-case rule to allow this particular instance of "??=" to
> be mapped to "#" even if trigraphs are disabled. Or something like
> that. The committee would also have had to decide whether this
> applies across #included files. And of course a compiler could choose
> to provide a command-line option that controls the default setting.

Please no! piling more trash on top of this mess just makes it worse.

> But it's too late to do this if we aren't allowed to break existing
> code. (On the other hand, I wonder how much existing code actually
> uses trigraphs; I'm tempted to suggest that it might be worth the
> cost.)

There is no use defending trigraphs at this point: the only sensible way to
handle them is what gcc does with warnings enabled: issue a warning and ignore
them.

x = parse_from_template(&state, "??/??/??");

trigraphs.c:219:27: warning: trigraph ??/ ignored
trigraphs.c:219:30: warning: trigraph ??/ ignored

> [...]
> > What about the later C++-inspired digraphs? They actually
> > are worse from a design standpoint, since they're applied
> > too far downstream in the phases of translation.
>
> In my opinion, this makes digraphs *better* in some ways. They can be
> used as alternative names for certain punctuation marks without
> messing up string literals and character constants. They aren't a
> complete solution to the problem of a keyboard with no '#' key,
> though.

If you cannot work around this trivial shortcoming, I suggest you dump both the
keyboard and the OS it is talking to at the Computer Museum.

Chqrlie.

Charlie Gordon

May 30, 2006, 9:30:40 PM

"Douglas A. Gwyn" <DAG...@null.net> wrote in message

news:447CCAD1...@null.net...

> jacob navia wrote:
> > Douglas A. Gwyn wrote:
> > > ... Actually I have a much better counterproposal
> > > for gets() that would make it safe as well as useful.
> > OK, but then *todays* gets() would be obsolete isn't it?
>
> Not with my particular proposal.

Please enlighten us, what is your proposal ?

> > Why can't C use a standard way of cleaning up the language?
>
> Because once something is documented and widely used, there
> is a substantial cost in changing it. Only if you can show
> that the benefits would clearly outweigh the costs should
> such a change be made. There is almost no cost for leaving
> some existing but "obsolete" feature alone.

It is not just obsolete, it is a blatant flaw lurking in the dark corners of
unmaintained code.

> > ??? Making obsolete gets() is "inventing a new language"???
>
> If you make enough such changes, especially changing the
> scanner, then yes you have in effect a different language.
>
> My point is that you have given insufficient justification
> for meddling with such things. Every programmer on the
> planet has his own idea about what a PL "ought" to look
> like, so just wanting something changed just because you
> don't happen to like it is a non-starter.

But not every programmer is a compiler developer...
I guess that's the way C++ got non-started ;-)

> > You yourself want to change it, (see above). Why when I propose
> > the same thing I am "inventing a new language"?
>
> My change would not affect the usage of the current feature,
> except to render some instances of undefined behavior well
> defined.

So would this:

#define gets(buf) (fputs("gets invoked!\n", stderr), abort(), (char *)0)

> > ... trigraphs are just nonsense that got into the standard
> > because I do not know what danish terminal manufacturers
> > problem...
>
> The problem was actually one for Scandinavian
> *programmers*, among others. Due to their having
> larger linguistic character sets than English has, but
> sharing general dimensions for keyboards (an economic
> constraint), they didn't have some of the special
> punctuation characters available. Since C was
> originally designed in an environment where keyboards
> did have all those special characters, in those other
> environments it was hard to efficiently create C source
> code. So there was always a genuine problem, brought
> to light when internationalizing the specification.

So you let the Scandinavians have their way, while the rest of Europe's non
English speakers coped with the very same issue without much pain. They had
basically 3 solutions:
- get used to typing funny local letters instead of { } [ ] etc. It takes some
getting used to reading æ*argvÆ1Å='Ø0';å as {*argv[1]='\0';} but many people
did, and the trigraph version is even worse.
- get a QWERTY keyboard, the only sensible solution for unix programmers anyway.
- get an 8 bit character set with a full ASCII subset: that was a strikingly
good idea of IBM's to include that in their PC from day 1 in 1981... it took a
while for printers to catch up, but by 1989 who still needed to cope with
ISO-646 ?

> The ability to express programs in the universal
> invariant character set is not "nonsense". The only
> available alternatives were to change C's operators to
> different characters, or to define escape sequences
> (the latter being what we did, as the most acceptable
> and least obtrusive for people in more benign
> environments).

The alternative was to specify the required character set and forget about
environments that didn't support it.

What about <iso646.h> ? What need was there to pollute the standard with such
useless crap ?

> What about the later C++-inspired digraphs? They actually
> are worse from a design standpoint, since they're applied
> too far downstream in the phases of translation.

At least they do not introduce side effects.

> Personally I don't think the C standard should have said
> any more on the subject than that the full source
> character set is supported somehow, and left the details
> up to implementations.

I agree !

> However, trigraphs are now a defined part of the language.
> I could support adding another defined trigraph, mapped
> to the ? character; a proper escape sequence should from
> the outset have included a way to escape the escape prefix
> (the lack of which in some contexts is difficult to work
> around). Note that even such an improvement could affect
> *some* existing code; e.g. if "???" were picked, the
> example in the current standard would behave differently.

Exactly ! That's the problem with trigraphs: they DO break existing code and
cause hard to find bugs such as :

const char *template = "??/??/??";
printf("Enter a date using the template %s: ", template);

Many programmers will be unable to debug what they will diagnose as data
corruption.
Please save us from such an "improvement".
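
One way to write such a template so that it survives whether or not the compiler performs trigraph replacement is sketched here. It relies only on the standard \? escape sequence; the names date_template and get_date_template are hypothetical.

```c
#include <stdio.h>

/* The \? escape yields a plain question mark, and it keeps any two
 * question marks from being adjacent in the source, so there is
 * nothing for trigraph replacement to act on. */
static const char date_template[] = "?\?/?\?/?\?";

const char *get_date_template(void)
{
    return date_template;
}
```

Printing get_date_template() with %s gives the eight characters the programmer intended, two question marks, a slash, and so on, under any setting of the trigraph option.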

--
Chqrlie.

Jordan Abel

May 30, 2006, 10:08:08 PM

2006-05-30 <lnk683y...@nuthaus.mib.org>, Keith Thompson wrote:
> I understand why trigraphs were introduced. In hindsight, I think I
> would have done them differently. Specifically, enable them only if
> there's a directive of some kind within the source file that
> specifically requests them. The obvious form of the directive would
> be:
>
> ??=pragma STDC TRIGRAPHS ON

How about enable them if the first character of the source file is ?,
just like in pre-ansi compilers preprocessing was enabled by the first
character being #.

Charlie Gordon

May 31, 2006, 2:13:35 AM

"Jordan Abel" <ran...@random.yi.org> wrote in message
news:slrne7pur4...@random.yi.org...

Do you mean the first non-white character, ignoring comments ?

jacob navia

May 31, 2006, 5:54:02 AM

Douglas A. Gwyn wrote:

Yes. That would be reasonable.

Error specification is an *essential* part of the standard. This has
been neglected too much. C has become an example of "anything goes".

Charlie Gordon

May 31, 2006, 7:00:47 AM

"jacob navia" <ja...@jacob.remcomp.fr> wrote in message
news:447d67ba$0$18334$8fcf...@news.wanadoo.fr...

I agree, but using a pseudo global variable for error checking is an original
design flaw of the standard library. It is non-functional, non-reentrant,
requires a hack for multi-threaded programs... the semantics are poorly
understood if not altogether obscure :

7.5 paragraph 3: "The value of errno is zero at program startup, but is never
set to zero by any library function. The value of errno may be set to nonzero
by a library function call whether or not there is an error, provided the use
of errno is not documented in the description of the function in this
International Standard."

How many newbies will set errno to 0 before the call to strtol() and friends ?
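
For reference, the usual pattern looks roughly like this. It is a sketch using only standard calls; the helper name parse_long and its return convention are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 long.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;                        /* strtol never clears errno itself */
    value = strtol(text, &end, 10);

    if (end == text) return -1;       /* no digits were consumed */
    if (*end != '\0') return -1;      /* trailing characters after the number */
    if (errno == ERANGE) return -1;   /* value did not fit in a long */

    *out = value;
    return 0;
}
```

Without the `errno = 0` line, a stale ERANGE left by some earlier call would make a perfectly valid input look like an overflow.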

--
Chqrlie.

SuperKoko

May 31, 2006, 12:30:33 PM

Keith Thompson wrote:
> Why does rewind() discard error information? Wouldn't it have been
> both simpler and more useful for rewind() to return the same int
> result as fseek() and to set the error indicator when appropriate?
>

Probably, yes...
But, I think that there is a good reason for that.

Just imagine the situation at the time when the C standard had not
yet been published.
There were a lot of (all?) non-compliant implementations...
And I think that many of them had a rewind() function which returned an
int... And this returned value was garbage.

If the C standard had said that it returned the same value as fseek,
then new C89 programmers would have used it, without thinking that it
was dangerous...
And when porting the program to a not-yet-perfectly-compliant compiler,
it would have *silently* changed the meaning of their code, introducing
a bug hard (or impossible) to diagnose.
I think that this was deemed too dangerous to be accepted.

In some way, we can say that there was a "reference" before the ANSI
standard.
It was the common subset of all compilers (which was more or less
described by the K&R book).
The committee doesn't like silent changes, and in the previous
"reference", rewind was returning a value without meaning.
Specifying the value would be problematic, as I already said.

But specifying that it returns a value that nobody should use is ugly
(and I think that the rationale is here).
The most sensible way to do that was to say that it returns void.

Now that everybody (or almost) uses a C89-compliant implementation, we
may change the specification of rewind to make it return a meaningful
value...
However, I don't think it is worth the change.
It's better to use fseek now.

Douglas A. Gwyn

May 31, 2006, 3:50:23 PM

Keith Thompson wrote:
> I'd be interested in the details of your proposal, or even a brief
> overview. Is it published anywhere?

I put out a feeler on the C standards mailing list, to see if
there was enough interest and support to justify filing a DR
so that we could address the issue.

The idea is that gets would consume input through newline
(or up until EOF, whichever comes first) just as currently
specified, but no more than the first BUFSIZ bytes would be
transferred to the buffer. That doesn't magically fix *all*
existing usage (protecting against buffer overrun), but it
does fix a *lot* of it, and provides an entirely safe way to
use gets that happens to match the typical way it has been
used historically.

One could also require errno setting upon (avoided) overrun,
but I'm not proposing that initially. (Required errno
setting ought to be considered in a more general context,
which would include among other things the rewind() spec.)
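
The bounded behaviour described above could be sketched as follows. This is not the text of any actual proposal: the name bounded_gets is hypothetical, the explicit stream parameter is added only so the sketch can be tried without stdin, and it assumes the BUFSIZ limit includes the terminating null character.

```c
#include <stdio.h>

/* Hypothetical sketch: read one line, consuming input through the
 * newline (or EOF), but store at most BUFSIZ - 1 characters.
 * buf must point to at least BUFSIZ bytes. */
char *bounded_gets(char *buf, FILE *stream)
{
    size_t stored = 0;
    int c;

    while ((c = getc(stream)) != EOF && c != '\n') {
        if (stored < (size_t)BUFSIZ - 1) {
            buf[stored++] = (char)c;  /* keep the leading part of the line */
        }
        /* anything beyond the limit is consumed and discarded */
    }
    if (c == EOF && (stored == 0 || ferror(stream))) {
        return NULL;                  /* nothing read, or a read error */
    }
    buf[stored] = '\0';
    return buf;
}
```

The safety argument only holds when the caller's array really is BUFSIZ bytes or larger, which is the convention the proposal leans on.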

> [Digraphs] aren't a complete solution to the problem of a
> keyboard with no '#' key, though.

Indeed. My preference is for any source-character
transliteration to be performed at translation phase 0,
where the problem actually lies, rather than embedding the
process within later translation phases.

Douglas A. Gwyn

May 31, 2006, 3:51:26 PM

Charlie Gordon wrote:
> There is no use defending trigraphs at this point: the only sensible way to
> handle them is what gcc does with warnings enabled: issue a warning and ignore
> them.

It's even more sensible to warn about them and also translate them
properly in accordance with the language spec.

Douglas A. Gwyn

May 31, 2006, 4:34:27 PM

Charlie Gordon wrote:
> #define gets(buf) (fputs("gets invoked!\n", stderr), abort(), (char *)0)

No, that breaks working code in benign environments.

> So you let the Scandinavians have their way, ...
> ... by 1989 who still needed to cope with ISO-646 ?

The decision was made around 1987, and the input from those
participating was that it was still enough of a problem for
their clients that at least one national body would vote
against ratification of the standard if it did not address
the issue. There was considerable debate and discussion
before the final decision. You may call that "politics",
but there was a process involved that required lots of
negotiations all along the way. We didn't have a C "czar",
and even if there had been one it isn't obvious that his
spec would have met actual user requirements as well,
overall, as the one reached through the ISO process.

A lot of people who think something is obvious to them
don't realize that there are factors they haven't considered,
or they dismiss those factors as unimportant, though they
might be very important for others. That's why a process
involving discussion and consensus is important for the
overall acceptability of the end result.

> What about <iso646.h> ? What need was there to pollute the
> standard with such useless crap ?

That was a stage in the further evolution of the battle over
trigraphs etc. once the C++ standards group got involved.
Again, it was essentially a political compromise in order to
gain a sufficient degree of acceptance among all involved.
If you don't want to use <iso646.h> you're free not to use it.

> const char *template = "??/??/??";
> printf("Enter a date using the template %s: ", template);
> Many programmers will be unable to debug what they will diagnose as data
> corruption.
> Please save us from such an "improvement".

I think the confusion would be due mainly to some people
constantly urging programmers to ignore trigraphs. As a
standard part of the language, they need to be understood.
Certainly other parts of C are much more complicated.

Douglas A. Gwyn

May 31, 2006, 4:39:04 PM

jacob navia wrote:
> Error specification is an *essential* part of the standard. This has
> been neglected too much. C has become an example of "anything goes".

Not really. Note that when a standard function interface
had been invented by the committee, error indication would
be part of its spec. It is only for "legacy" interfaces
that we were often stuck with merely documenting rather
than improving. Even adding an errno-setting requirement
could be controversial, since errno is really a bad model
for error reporting. (But it's one of the things that
the legacy left us with, so it might be acceptable in
conjunction with legacy functions such as rewind().)

C is about programming freedom. If the programmer abuses
the freedom, it's not C's fault.

Douglas A. Gwyn

May 31, 2006, 4:40:21 PM

Charlie Gordon wrote:
> How many newbies will set errno to 0 before the call to strtol() and friends ?

How many newbies will do anything right? Proper use of errno
is one of the things that need to be learned/taught.

Keith Thompson

May 31, 2006, 5:14:52 PM

"Douglas A. Gwyn" <DAG...@null.net> writes:

> Keith Thompson wrote:
>> I'd be interested in the details of your proposal, or even a brief
>> overview. Is it published anywhere?
>
> I put out a feeler on the C standards mailing list, to see if
> there was enough interest and support to justify filing a DR
> so that we could address the issue.
>
> The idea is that gets would consume input through newline
> (or up until EOF, whichever comes first) just as currently
> specified, but no more than the first BUFSIZ bytes would be
> transferred to the buffer. That doesn't magically fix *all*
> existing usage (protecting against buffer overrun), but it
> does fix a *lot* of it, and provides an entirely safe way to
> use gets that happens to match the typical way it has been
> used historically.

Which requires anyone using it to know that the target string has to
be at least BUFSIZ bytes long. Most programmers who use gets() are
more likely to just assume that nobody would ever type more than 80,
or 200, characters on a line, assuming they think about it at all.

And it would break any existing code that uses gets() with a buffer
bigger than BUFSIZ bytes.

Anyone clever enough to use your proposed revised gets() safely is
more than clever enough to avoid using gets() at all.

gets() should be removed from the language, not "fixed". Its only
advantage over fgets() is that it removes the trailing newline; if
that's an important feature, provide another version of fgets() that
does that. Or, better yet, standardize one of the several safe
arbitrary-length line input functions that are floating around, such
as ggets().
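
A version of fgets() that drops the newline is short enough to sketch. The following is one possible shape, not an existing standard function; the name `read_line` is made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from stream into buf, which holds size bytes.
 * Works like fgets(), except that the trailing newline, if any, is removed.
 * Returns buf, or NULL on end-of-file or error with nothing read. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* overwrite the newline if present */
    return buf;
}
```

Unlike gets(), the caller has to pass the buffer size. A line longer than the buffer is still split across successive calls, exactly as with fgets().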

jacob navia

May 31, 2006, 5:28:13 PM

Keith Thompson wrote:

> "Douglas A. Gwyn" <DAG...@null.net> writes:
>
>>Keith Thompson wrote:
>>
>>>I'd be interested in the details of your proposal, or even a brief
>>>overview. Is it published anywhere?
>>
>>I put out a feeler on the C standards mailing list, to see if
>>there was enough interest and support to justify filing a DR
>>so that we could address the issue.
>>
>>The idea is that gets would consume input through newline
>>(or up until EOF, whichever comes first) just as currently
>>specified, but no more than the first BUFSIZ bytes would be
>>transferred to the buffer. That doesn't magically fix *all*
>>existing usage (protecting against buffer overrun), but it
>>does fix a *lot* of it, and provides an entirely safe way to
>>use gets that happens to match the typical way it has been
>>used historically.
>
>
> Which requires anyone using it to know that the target string has to
> be at least BUFSIZ bytes long. Most programmers who use gets() are
> more likely to just assume that nobody would ever type more than 80,
> or 200, characters on a line, assuming they think about it at all.
>
> And it would break any existing code that uses gets() with a buffer
> bigger than BUFSIZ bytes.
>
> Anyone clever enough to use your proposed revised gets() safely is
> more than clever enough to avoid using gets() at all.
>

I agree with this. Fixing gets() is kind of a hack.

I repeat:

Experience teaches us that certain functions are MISTAKES.

As everybody knows, the committee is not PERFECT and has made
(and will make) mistakes. The solution is not to try to keep the bugs
forever but to CORRECT them by eliminating the bugs from the
standard.

Therefore we will have an OBSOLESCENT FEATURES Appendix, where
some features like gets() belong. Once there, they can rest there
for quite a long time, or maybe indefinitely.

The OBSOLESCENT appendix is NOT normative but many compilers will
follow it to remain compatible with very old code.

I have been programming in C for more than 20 years now, and I have YET
to see a program that uses trigraphs. I have never been shown one
and I doubt that any such program exists.

AND IF IT DOES, it is *TRIVIAL* to eliminate the trigraphs isn't it?

I am sure Scandinavian programmers have NO NEED NOW for any such
construct, so trigraphs belong in the OBSOLESCENT Appendix.

> gets() should be removed from the language, not "fixed". Its only
> advantage over fgets() is that it removes the trailing newline; if
> that's an important feature, provide another version of fgets() that
> does that. Or, better yet, standardize one of the several safe
> arbitrary-length line input functions that are floating around, such
> as ggets().
>

EXACTLY.

This is such a simple task, why should we get stuck with such a wart
FOREVER ???

jacob

jacob navia

May 31, 2006, 5:37:24 PM

Douglas A. Gwyn wrote:
>

>>const char *template = "??/??/??";
>>printf("Enter a date using the template %s: ", template);
>>Many programmers will be unable to debug what they will diagnose as data
>>corruption.
>>Please save us from such an "improvement".
>
>
> I think the confusion would be due mainly to some people
> constantly urging programmers to ignore trigraphs. As a
> standard part of the language, they need to be understood.
> Certainly other parts of C are much more complicated.

Yes, but since it goes against ALL "natural experience" of
the language, it is very difficult to see.

We KNOW that the compiler doesn't look into comments or
into character strings. Comments are IGNORED and we do NOT
search for bugs there. So when the program fails because somebody wrote
a comment line
//?????/ What is this crap!

it is NOT OBVIOUS to anyone, even to experienced programmers
like me, what is going on. Luckily I had the source
of the compiler and preprocessor, and discovered it. But it
surely wasn't on my mind.

What bothers me is that this goes against all those basic rules that we
instinctively KNOW about C:

1) String literals are translated AS IS into data. The compiler doesn't
look into them nor the preprocessor.

2) Comments are COMMENTS and ignored by the compiler.

WHY KEEP THIS ?????

Why must we remain compatible with the bugs of the past?

Let's improve the language for a change. And improving means
FIXING the bugs, i.e. getting rid of them. I just do not understand
why somebody sensible and intelligent like you Mr Gwyn would be
so attached to this kind of warts and wage endless discussions
about features that you perfectly know are just historical errors.

jacob

Jordan Abel

May 31, 2006, 6:22:55 PM

No. I meant the first character of the first line of the file. That's
how it worked for # and preprocessing Back In The Day, based on what
i've read. Something like

??=

/* Comments go here */

??=include <stdio.h>

int main(int argc, char *argv??(??)) ??<
printf("Hello, world!");
return 0;
??>

Douglas A. Gwyn

May 31, 2006, 6:22:16 PM

Keith Thompson wrote:
> Which requires anyone using it to know that the target string has to
> be at least BUFSIZ bytes long. Most programmers who use gets() are
> more likely to just assume that nobody would ever type more than 80,
> or 200, characters on a line, assuming they think about it at all.

First, evidence on hand is that BUFSIZ is already quite commonly
used for such buffers. And *every* facility requires that
programmers know how to use it.

> And it would break any existing code that uses gets() with a buffer
> bigger than BUFSIZ bytes.

Not unless input lines were bigger than BUFSIZ. Evidence on
hand also indicates that there is nearly no code using gets that
legitimately uses a buffer larger than BUFSIZ.

The purpose is not to fix every possible existing (broken) usage
of the facility; it's to specify a safe and convenient way to use
the facility in the future, that incidentally happens to already
match much of the existing usage.

> Anyone clever enough to use your proposed revised gets() safely is
> more than clever enough to avoid using gets() at all.

gets performs a common useful function more conveniently than
fgets, is reportedly used in many tutorials (with BUFSIZ buffers),
and is part of the long-standardized legacy interface. Fixing
its overrun problem without interfering with its utility would
be a positive development. In particular it would be nice to
have official sanction for implementations to impose the transfer
limit, automatically fixing the overrun problem in many of the
dynamically linked apps that happen to be lurking around
awaiting hacker discovery and exploitation. If you are actually
concerned about buffer overrun you should appreciate that.
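
To make the proposed semantics concrete, here is a rough sketch of a line reader that consumes the whole line but stores only a bounded prefix. It is not the text of any actual proposal: the name `bounded_gets`, the explicit stream parameter (used so it can be exercised on a file rather than stdin), and the choice to reserve one of the BUFSIZ bytes for the terminating null are all assumptions of this example.

```c
#include <stdio.h>

/* The caller's buffer must hold at least BUFSIZ bytes.
 * Input is consumed through the newline or end-of-file, but only the first
 * BUFSIZ - 1 characters are stored; the rest of the line is discarded.
 * Returns buf, or NULL if end-of-file is hit before any character is read
 * or if a read error occurs. */
char *bounded_gets(char *buf, FILE *stream)
{
    size_t n = 0;
    int c;

    while ((c = getc(stream)) != EOF && c != '\n') {
        if (n < (size_t)BUFSIZ - 1)
            buf[n++] = (char)c;     /* store only while there is room */
    }
    if (c == EOF && n == 0)
        return NULL;                /* nothing read: leave buf untouched */

    buf[n] = '\0';
    return ferror(stream) ? NULL : buf;
}
```

A caller that declares `char line[BUFSIZ];` can never be overrun by this function, which is the case the proposal aims at. A caller with a smaller buffer is no safer than before.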

Douglas A. Gwyn

May 31, 2006, 6:38:57 PM

jacob navia wrote:
> Experience teaches us that certain functions are MISTAKES.

Many of the legacy interfaces are imperfect. Perfection
is not the only goal of the standard.

> ... The solution is not to try to keep the bugs forever but
> to CORRECT them by eliminating the bugs from the standard.

There is an effective process for addressing actual defects.
Not agreeing with your particular prejudices is not a bug!

> I have been programming in C for more than 20 years now, and I have YET
> to see a program that uses trigraphs. I have never been shown one

I posted an example recently in this newsgroup.
It was based on actual code.

> AND IF IT DOES, it is *TRIVIAL* to eliminate the trigraphs isn't it?

You can always run C source code through a filter that maps
trigraph sequences to the corresponding characters in the
codeset used on your platform. Indeed, it might have been
a better design to mandate something like trigraph encoding
as a *source exchange* format, rather than embedding it into
the phases of translation during actual compilation.

However, you keep missing the point: making *any* change
has an impact, and you need to *justify* the change in terms
of how much its benefits would outweigh its adverse effects.
So far I haven't heard any rational justification, other
than perhaps that programmers who don't know the language
might be surprised by trigraph processing. That's not a
very strong argument in favor of a change.

> This is such a simple task, ...

So is setting fire to your home. That doesn't mean that
you should do it.

Douglas A. Gwyn

May 31, 2006, 7:13:46 PM

jacob navia wrote:
> //?????/ What is this crap!

I use compilers that correctly process trigraphs, and have never
run into such a problem with either my own code or code I've
imported from elsewhere. There were actually more (similar)
problems when enabling support for //-comments, but I don't hear
complaints about that.

> instinctively KNOW about C:
> 1) String literals are translated AS IS into data. The compiler doesn't
> look into them nor the preprocessor.

Wrong. String literals and character constants can include
escape sequences other than trigraphs. E.g. "\n".

> 2) Comments are COMMENTS and ignored by the compiler.

Wrong. Comments are turned into single spaces and can have
an effect. E.g. foo/**/bar is not a single identifier, and
each part may be individually macro-expanded.
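
Both points can be checked with a few lines of standard C. This is a small sketch; the macro `STR` and the two function names are illustrative only.

```c
#include <string.h>

#define STR(x) #x

/* The literal is written with two source characters between the quotes,
 * but the escape sequence is translated to a single newline character. */
size_t newline_literal_length(void)
{
    return strlen("\n");
}

/* The empty comment between foo and bar is replaced by one space before
 * macro processing, so the argument is two separate tokens and the
 * stringized result contains a space between them. */
const char *stringized_tokens(void)
{
    return STR(foo/**/bar);
}
```

If the comment were simply deleted, the second function would return the single identifier spelling with no space in it.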

> Why must we remain compatible with the bugs of the past?

They're not bugs, they're intentional parts of the established
specification. Programmers have rightly relied on these
guarantees for decades. If you change such things, you
adversely impact some unknown amount of existing *carefully
written* code, in order to obtain nearly no compensating
benefit.

A change like removal of implicit int in declarations had
sufficient compensating advantage that it was approved by
committee and welcomed by most of the client community,
despite having substantial impact on existing code (but
that was ameliorated by being able to issue a warning and
continue to accept such code with unchanged semantics).
So changes are made, when clearly justified. What you
haven't shown is sufficient justification for your
proposed changes.

> ... I just do not understand
> why somebody sensible and intelligent like you Mr Gwyn would be
> so attached to this kind of warts and wage endless discussions
> about features that you perfectly know are just historical errors.

I have explained that, more than once, but you don't seem
to be hearing me. Why I spend time responding is to reach
people who, not hearing the whole story, might otherwise be
tricked into agreeing with you, potentially causing a lot
of problems downstream.

In the early days of the C standard, we had a lot of
trouble with GCC developers who thought they knew better
than the standards committee, and insisted on not conforming
to the standard in various ways. Since then, however, the
marketplace seems to have more or less straightened that
out.

It's one thing to have opinions; it's another to insist
that everybody else must agree with you. The standards
process involves the weighing of conflicting points of
view to arrive at a consensus. Among the purposes of
having a *process* for language standardization is
*stability* of the base specification. Unnecessary
changes are *bad* (practically/economically).

Douglas A. Gwyn

May 31, 2006, 7:20:19 PM

Jordan Abel wrote:
> >> How about enable them if the first character of the source file is ?,
> >> just like in pre-ansi compilers preprocessing was enabled by the first
> >> character being #.

> ... I meant the first character of the first line of the file. That's
> how it worked for # and preprocessing Back In The Day, based on what
> i've read.

By 7th Edition Unix (1978), the C compiler always invoked the
preprocessor. (It did behave as you describe in some earlier
versions.)

When Stroustrup created "C with classes", the C preprocessor
reported a special exit status if it had encountered any
#class directive, at which point the compiler driver would
next invoke the classes preprocessor, before the C language
translator passes.

I'm not a fan of your proposal at this point, although it
might have been a good idea if something along those lines
had been included in the original C standard.

Jordan Abel

May 31, 2006, 9:32:06 PM

2006-05-31 <447E24B3...@null.net>, Douglas A. Gwyn wrote:
> Jordan Abel wrote:
>> >> How about enable them if the first character of the source file is ?,
>> >> just like in pre-ansi compilers preprocessing was enabled by the first
>> >> character being #.
>> ... I meant the first character of the first line of the file. That's
>> how it worked for # and preprocessing Back In The Day, based on what
>> i've read.
>
> By 7th Edition Unix (1978), the C compiler always invoked the
> preprocessor. (It did behave as you describe in some earlier
> versions.)

Trigraphs don't really deserve any more than this, though.

> When Stroustrup created "C with classes", the C preprocessor
> reported a special exit status if it had encountered any
> #class directive, at which point the compiler driver would
> next invoke the classes preprocessor, before the C language
> translator passes.
>
> I'm not a fan of your proposal at this point, although it
> might have been a good idea if something along those lines
> had been included in the original C standard.

Eh.

It would probably break exactly _no_ real code, though, to only process
trigraphs starting with the first one that appears outside of a comment
or string literal. I bet it would break very little to process them as
tokens like digraphs (i.e. ??= ??=??= ??- ??-= ??! ??!= and so on)

James Dennett

Jun 1, 2006, 1:19:13 AM

Douglas A. Gwyn wrote:
> Keith Thompson wrote:
>> Which requires anyone using it to know that the target string has to
>> be at least BUFSIZ bytes long. Most programmers who use gets() are
>> more likely to just assume that nobody would ever type more than 80,
>> or 200, characters on a line, assuming they think about it at all.
>
> First, evidence on hand is that BUFSIZ is already quite commonly
> used for such buffers.

It is? Every single use of gets() I've ever seen has used
a smaller buffer than this (and every single one has been
a defect). What's the evidence that BUFSIZ is common?

> And *every* facility requires that
> programmers know how to use it.

But some make it more obvious that there's something you need
to know; just as it's easy for many to ignore the need to set
errno to 0 before making a call for which you might need to
check errno afterwards (as that's non-local knowledge, not
even hinted at in the function's prototype), so a function
which has a buffer size as an argument is more obvious to
most users than one which has a (documented) magic number for
its buffer size.

>> And it would break any existing code that uses gets() with a buffer
>> bigger than BUFSIZ bytes.
>
> Not unless input lines were bigger than BUFSIZ.

The biggest problem of gets() is, of course, input that is
deliberately crafted to create buffer overruns. Such input
will commonly be larger than BUFSIZ.

> Evidence on
> hand also indicates that there is nearly no code using gets that
> legitimately uses a buffer larger than BUFSIZ.
>
> The purpose is not to fix every possible existing (broken) usage
> of the facility; it's to specify a safe and convenient way to use
> the facility in the future, that incidentally happens to already
> match much of the existing usage.

If a goal is to reduce the cost of security breaches caused
by buffer overruns (and that's a valuable goal in monetary
terms), then this doesn't seem to be an effective solution.

>> Anyone clever enough to use your proposed revised gets() safely is
>> more than clever enough to avoid using gets() at all.
>
> gets performs a common useful function more conveniently than
> fgets, is reportedly used in many tutorials (with BUFSIZ buffers),
> and is part of the long-standardized legacy interface. Fixing
> its overrun problem without interfering with its utility would
> be a positive development. In particular it would be nice to
> have official sanction for implementations to impose the transfer
> limit, automatically fixing the overrun problem in many of the
> dynamically linked apps that happen to be lurking around
> awaiting hacker discovery and exploitation. If you are actually
> concerned about buffer overrun you should appreciate that.

Good linkers already warn about use of gets, which has had
some beneficial effect. Eliminating it would be more effective.
The cost of the change is a factor, but in this case the cost
saving associated with it is quite possibly very large.

Keeping easily misused functions in C will be taken by many,
rightly or wrongly, as evidence that we don't care enough
about buffer overruns. And in my experience, programmers
often fail to make correct use of functions which require
buffers of a specified minimum size unless they are required
to pass the actual buffer size to the function.

-- James

have documented

Richard Tobin

Jun 1, 2006, 5:03:36 AM

In article <nNufg.102283$iU2.24494@fed1read01>,
James Dennett <jden...@cox.net> wrote:

>>> And it would break any existing code that uses gets() with a buffer
>>> bigger than BUFSIZ bytes.

>> Not unless input lines were bigger than BUFSIZ.

>The biggest problem of gets() is, of course, input that is
>deliberately crafted to create buffer overruns. Such input
>will commonly be larger than BUFSIZ.

Breaking that code is exactly what we want!

-- Richard

Ivan A. Kosarev

Jun 1, 2006, 5:18:39 AM

"Douglas A. Gwyn" <DAG...@null.net> wrote in message

news:447E232A...@null.net...

>> instinctively KNOW about C:
>> 1) String literals are translated AS IS into data. The compiler doesn't
>> look into them nor the preprocessor.
>
> Wrong. String literals and character constants can include
> escape sequences other than trigraphs. E.g. "\n".

Besides, other characters are often a subject of actual translation to
execution character set. Such a translation may involve multibyte character
handling on both input and output sides of the process. Moreover, wide and
narrow string literals may have different translation mechanisms.

Another note is that the preprocessor does look into string literals when
stringizing. The intention is to preserve original form of stringized tokens
which seems to be pragmatic.
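
The stringizing behaviour is easy to observe. In this small sketch (the macro `SPELLING` and the function name are illustrative), the preprocessor inserts a backslash before each quote and backslash of the string-literal argument, so the result spells the original source token rather than its translated value.

```c
#include <string.h>

#define SPELLING(x) #x

/* The argument is the source token  "a\n"  (five source characters,
 * counting both quotes). Stringizing it yields a string whose contents
 * are exactly those five characters: quote, a, backslash, n, quote.
 * The literal on its own would be just two characters: a and newline. */
const char *stringized_literal(void)
{
    return SPELLING("a\n");
}
```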

Should we eliminate these complications to make C more "instinctive"?

--
Unicals Group -- Embedded C++ for Your IP Cores
http://www.unicals.com

Richard Bos

Jun 1, 2006, 5:29:27 AM

ric...@cogsci.ed.ac.uk (Richard Tobin) wrote:

I'd prefer it that any code that uses gets() gets broken. If you really
want a hole in your system (or your head) you can always dig up a
pre-ISO compiler somewhere.

Richard

Charlie Gordon

Jun 1, 2006, 5:46:02 AM

"Douglas A. Gwyn" <DAG...@null.net> wrote in message

news:447DF3BE...@null.net...

I disagree: most warnings get ignored by lazy programmers.
Trigraphs have a very small probability of being there on purpose; ignoring them
but providing the warning will generate the intended behaviour in most cases.
If trigraphs are present on purpose, they will occur in places where ignoring
them will yield compiler errors: the programmer will quickly understand the
issue and turn trigraph support on to fix the problem.

gcc's approach is just pragmatic and efficient. The standard is bogus and error
prone on this issue.

Chqrlie.

Charlie Gordon

Jun 1, 2006, 5:50:08 AM

"Douglas A. Gwyn" <DAG...@null.net> wrote in message

news:447DFF35...@null.net...

Yes, but the C language makes it especially hard to become a learnt programmer
that never forgets about all these subtleties.

Chqrlie.

Francis Glassborow

Jun 1, 2006, 6:27:24 AM

In article <nNufg.102283$iU2.24494@fed1read01>, James Dennett

<jden...@cox.net> writes

>It is? Every single use of gets() I've ever seen has used
>a smaller buffer than this (and every single one has been
>a defect). What's the evidence that BUFSIZ is common?

It was certainly common in a number of introductory books on C written
in the middle part of the last decade because I noticed it when
reviewing those books.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Francis Glassborow

Jun 1, 2006, 6:32:28 AM

In article <447eb850$0$12762$636a...@news.free.fr>, Charlie Gordon
<ne...@chqrlie.org> writes

>Yes, but the C language makes it especially hard to become a learnt programmer
>that never forgets about all these subtleties.

You mean like English makes it especially difficult to become a
competent writer who never forgets the subtle rules of its grammar and
syntax?

David R Tribble

Jun 1, 2006, 12:29:17 PM

Douglas A. Gwyn wrote:
> The purpose is not to fix every possible existing (broken) usage
> of the facility; it's to specify a safe and convenient way to use
> the facility in the future, that incidentally happens to already
> match much of the existing usage.
>

> gets performs a common useful function more conveniently than
> fgets, is reportedly used in many tutorials (with BUFSIZ buffers),
> and is part of the long-standardized legacy interface.

Existing code was broken by C99, specifically the elimination of
implicit int in variable and function declarations. So it's not like
ISO C has not been down this road before.

Requesting that gets() and other "unsafe" functions be marked as
obsolescent in the next revision of the C library is not unreasonable.
Deprecating them does not actually eliminate them outright; they
remain valid for at least one more revision, which means that any
function so marked could not disappear from ISO C for at least
another 15 to 20 years. And even if functions did disappear from
ISO C, there is no reason to believe that vendors would not continue
to provide them as long as customers needed them (even for very
old legacy code).

The library continues to evolve, hopefully for the better, and
ISO C has a proven track record for dealing with obsolescent
features. I personally don't think that the reasons for keeping
gets() outweigh the reasons for deprecating it.

-drt

Douglas A. Gwyn

Jun 1, 2006, 12:07:20 PM

James Dennett wrote:
> ... Every single use of gets() I've ever seen has used
> a smaller buffer than this (and every single one has been
> a defect). What's the evidence that BUFSIZ is common?

Examples in tutorial books. Legacy code.

There are two general gets programming scenarios:
(1) need a "sufficiently large" buffer: BUFSIZ is the obvious
choice, and the one I've made myself (when using gets in
"throw-away" quick apps).
(2) assume input won't exceed some human-interface oriented
line width, 80 being probably the most obvious choice
(until windowing systems became widespread): whatever
buffer size is assumed, *even if fgets has been used
the programmer would not be checking for overflow*, as
that is inconsistent with his assumption.

Nothing can be done about case (2), not even banishing gets.
Case (1) is what I want to address.
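
Case (2) can be illustrated with fgets(): it never overruns, but a caller who assumes lines always fit has to add a check to notice when one did not. The helper below is a sketch of such a check; the name `line_was_truncated` is invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Call after a successful fgets(buf, size, stream).
 * A complete line ends in a newline. A missing newline means either that
 * the line did not fit in buf or that the last line of the stream has no
 * terminator; feof() tells the two apart in most cases.
 * Returns 1 if the line appears to have been cut short, 0 otherwise. */
int line_was_truncated(const char *buf, FILE *stream)
{
    return strchr(buf, '\n') == NULL && !feof(stream);
}
```

With an 8-byte buffer and the input line "0123456789", the first fgets() call stores "0123456" and the next call silently returns the tail as if it were a separate line. One edge case is not handled: a final unterminated line that exactly fills the buffer is reported as truncated.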

> >> And it would break any existing code that uses gets() with a buffer
> >> bigger than BUFSIZ bytes.
> > Not unless input lines were bigger than BUFSIZ.
> The biggest problem of gets() is, of course, input that is
> deliberately crafted to create buffer overruns. Such input
> will commonly be larger than BUFSIZ.

You didn't understand the point to which I was responding.
The immediate issue was what legitimate usage could be
broken by failing to transmit more than BUFSIZ characters,
not what illegitimate exploitation would be prevented.

> If a goal is to reduce the cost of security breaches caused
> by buffer overruns (and that's a valuable goal in monetary
> terms), then this doesn't seem to be an effective solution.

It's very cost-effective in the (unknown quantity of) cases
where existing vulnerabilities would automatically be removed
when the DLLs were updated to impose the limit.

It's slightly cost-effective for new code, in that there
would be a standard safe way to read the contents of a line
from stdin without having every time to scan for a new-line
and remove it.

The goal of the one proposal is not to fix every possible
problem, just to address one particularly notorious one in
a more intelligent way than dropping the spec for a long-
established standard facility.

> Good linkers already warn about use of gets, ...

Really? How does a special-case hack for political purposes
make a linker "good"? In my universe, good linkers do the
well-defined job of link editing, and don't make noises
unrelated to problems in performing that job.

> Keeping easily misused functions in C will be taken by many,
> rightly or wrongly, as evidence that we don't care enough
> about buffer overruns.

Hardly, when the proposed change explicitly addresses
exactly that issue.

The general issue of buffer overruns involves much more
than can even begin to be addressed by removing gets.
Indeed, there is a TC in the works now that addresses the
issue more comprehensively. But no technical solution
will matter if the educational issue isn't addressed: the
problem lies in how some programmers *think* (or fail to),
not in the language as such (which can be safely used by
other programmers). From the educational point of view,
gets performs a great service by being a standard point
of reference when discussing the buffer overrun issue.

David R Tribble

Jun 1, 2006, 12:49:55 PM

>> Pre-ANSI implementations of rewind() obviously would not have been
>> declared to return void (since void hadn't been invented yet). Most
>> likely they would have implicitly returned int. Did the actual
>> implementations just not return a value?
>

Douglas A. Gwyn wrote:
> Actually void predated the C standard; it was introduced into AT&T
> C compilers not long after 7th Edition Unix (say, around 1979).
>
> Before void type was introduced, there was a lot of muddle about
> whether or not (default-int) functions returned a value or not.
> If one examines the 7th Edition Unix C library source code, one
> sees that in fact nothing was (intentionally) returned by rewind,
> so code that used rewind must have treated it the same as we now
> consider "returning void". In later versions of Unix, rewind's
> return type was explicitly stated to be void.

I would guess that other implementations did something like:

#define rewind(fp) fseek(fp, 0L, SEEK_SET)

which is similar to the specification in C89 but without the cast
to void. Since fseek() returns an int, old implementations of
rewind() probably also returned an int (but which was ignored
by most programs anyway).

At any rate, IMHO all the standard I/O functions should return an
error indication, and there are three ways to do this:
- explicit return value
- setting errno
- setting the stream error flag

Functions that do none of these should be changed to do so.
In the case of rewind(), the second option is the way to go.
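
Until rewind() reports anything, a caller who needs the status can get it from fseek() directly, as suggested earlier in the thread. Here is a minimal sketch of such a wrapper; the name `checked_rewind` is hypothetical and the function is not part of any standard.

```c
#include <stdio.h>

/* Reposition stream at the beginning of the file, as rewind() does, but
 * report failure to the caller.
 * Returns 0 on success, non-zero if fseek() failed.
 * The error indicator is cleared only when the seek succeeded. */
int checked_rewind(FILE *stream)
{
    int rc = fseek(stream, 0L, SEEK_SET);
    if (rc == 0)
        clearerr(stream);
    return rc;
}
```

This uses the first option in the list above, an explicit return value, rather than errno, because that is what fseek() already provides.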

-drt

Douglas A. Gwyn

Jun 1, 2006, 3:25:35 PM

David R Tribble wrote:
> Existing code was broken by C99, specifically the elimination of
> implicit int in variable and function declarations.

Yes, I discussed that previously in this discussion. It was
an example of how changes that can adversely affect existing
code *are* (rarely) made, but only when the expected benefit
overwhelms the expected costs. (Actually I was the one who
volunteered to act as proponent for removal of implicit int,
although nobody was more surprised than I was when there was
so little objection to that proposal.)

> Requesting that gets() and other "unsafe" functions be marked as
> obsolescent in the next revision of the C library is not unreasonable.
> Deprecating them does not actually eliminate them outright; they
> remain valid for at least one more revision, which means that any
> function so marked could not disappear from ISO C for at least
> another 15 to 20 years. And even if functions did disappear from
> ISO C, there is no reason to believe that vendors would not continue
> to provide them as long as customers needed them (even for very
> old legacy code).

So, little would be gained, in exchange for making support
for gets eventually depend on whims of the implementors.

> features. I personally don't think that the reasons for keeping
> gets() outweigh the reasons for deprecating it.

The default for *any* standardized feature is to retain it.
It actually takes sufficient specific justification to make
a change.

Douglas A. Gwyn

Jun 1, 2006, 3:31:54 PM

David R Tribble wrote:
> At any rate, IMHO all the standard I/O functions should return an
> error indication, and there are three ways to do this:
> - explicit return value
> - setting errno
> - setting the stream error flag

All except those that merely manipulate EOF/error indicators
for the streams.

I'm one of the few programmers I know of who check whether
fclose() (or close() under POSIX) succeeds; many aren't aware
of the fact that failure is a very real possibility.
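
For readers who have not seen the check written out, here is a minimal sketch of a file writer that treats a failed fclose() as a failed write. The function name `save_text` and its return convention are assumptions made for this example.

```c
#include <stdio.h>

/* Write text to the named file.
 * Returns 0 only if the open, the write, and the close all succeeded.
 * Buffered data may not reach the file until fclose(), so that call is
 * where a full disk or an I/O error can first become visible. */
int save_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int write_failed = (fputs(text, f) == EOF);
    int close_failed = (fclose(f) == EOF);   /* always close, then check */

    return (write_failed || close_failed) ? -1 : 0;
}
```

The stream is closed even when the write has already failed, so the file handle is not leaked on the error path.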

> Functions that do none of these should be changed to do so.

Changing any standard function's type signature causes too many
problems.

> In the case of rewind(), the second option is the way to go.

And although it's an unfortunate mechanism, it's consistent
with rewind() being a legacy function anyway.

kuy...@wizard.net

Jun 1, 2006, 3:58:12 PM

Douglas A. Gwyn wrote:
...

> So, little would be gained, in exchange for making support
> for gets eventually depend on whims of the implementors.

Making it unreliably available would tend to discourage its use -
which is precisely the effect desired by the people you're arguing
with, so it doesn't count as a disadvantage.

Keith Thompson

Jun 1, 2006, 4:09:46 PM

"Douglas A. Gwyn" <DAG...@null.net> writes:

> David R Tribble wrote:
[...]

>> features. I personally don't think that the reasons for keeping
>> gets() outweigh the reasons for deprecating it.
>
> The default for *any* standardized feature is to retain it.
> It actually takes sufficient specific justification to make
> a change.

In my opinion, there is more than enough sufficient specific
justification for removing gets() from the standard, or at least
deprecating it.

As far as I can tell, it's also the opinion of everyone who's
discussed it except you.

Michal Necasek

Jun 1, 2006, 4:43:10 PM

Keith Thompson wrote:

> As far as I can tell, it's also the opinion of everyone who's
> discussed it except you.
>

I agree with Douglas Gwyn (just to disprove your point) ;)

For the life of me, I can't figure out what would be gained by
removing gets() from the standard (as opposed to providing better
alternatives). Is anyone holding a gun to people's heads, forcing them
to use gets()? Are C programmers children who cannot think for
themselves and can't decide if gets() is something they should be using
or not? Do they need parents who "know better"?

Either gets() is so horrible and useless that no one in their right
mind would ever use it. In that case, it won't be used regardless of
whether it's available or not (need I point out that trigraphs, despite
being widely available, aren't being used?). Or gets() does actually
have some marginal utility, and in that case, I really can't see how its
removal would be justified. Which is it?

Michal

Keith Thompson

Jun 1, 2006, 5:01:08 PM

Michal Necasek <mic...@scitechsoft.com> writes:
[...]

> Either gets() is so horrible and useless that no one in their right
> mind would ever use it. In that case, it won't be used regardless of
> whether it's available or not (need I point out that trigraphs,
> despite being widely available, aren't being used?). Or gets() does
> actually have some marginal utility, and in that case, I really can't
> see how its removal would be justified. Which is it?

gets() is so horrible and useless that no one in their right mind
would ever use it. People use it anyway. Having it in the standard
encourages this.

kuy...@wizard.net

Jun 1, 2006, 5:43:02 PM

Michal Necasek wrote:
...

> Either gets() is so horrible and useless that no one in their right
> mind would ever use it.

You're assuming a) that everyone who writes C code is in their right
mind and b) that everyone who is in their right mind and writes C code
is aware of the reasons why using gets() is a bad idea. Neither of
those assumptions is very accurate.

> In that case, it won't be used regardless of
> whether it's available or not (need I point out that trigraphs, despite
> being widely available, aren't being used?). Or gets() does actually
> have some marginal utility, and in that case, I really can't see how its
> removal would be justified. Which is it?

The marginal utility of gets() is negative. There are, in principle,
rare and highly non-portable contexts where gets() can be used safely,
but the belief that any particular use of gets() has occurred in such a
context is far more likely to be the result of self-delusion than a
correct awareness of reality. Even in those contexts, it has no
advantage over any of its alternatives. Its mere existence gives
people the opportunity to make the mistake of using it. Code that would
fail to compile as a result of removing gets() from the standard
library is generally code that makes us safer by failing to compile.

The fact that it's part of the standard is incorrectly perceived by
many people as an endorsement. Removing it from the standard would be a
lot cheaper than the re-education required to correct that
misperception.

Michal Necasek

Jun 1, 2006, 5:54:36 PM

Keith Thompson wrote:

> gets() is so horrible and useless that no one in their right mind
> would ever use it. People use it anyway. Having it in the standard
> encourages this.
>

Your statement is self-contradictory. If people use gets(), it is not
useless.

Essentially, what I have a big problem with is your assertion that you
know better than everyone else. Such claims are very rarely true. What's
next? Pointers have to go because they're dangerous and too many people
can't use them properly?

I would much prefer providing a safer alternative to gets() (something
like or identical to gets_s()) and amending gets() documentation with
explanation of its dangers and a pointer to the better solution.

From a practical standpoint, if I was a compiler maintainer, I would
not remove gets() from the runtime library since doing so would render
it non-compliant with C89 and C99, and that would not be appreciated. I
could remove gets() prototype from the headers in C09 (hypothetically
speaking) mode, which would likely produce a warning during compilation,
but that's not very likely to stop the sort of people who are using
gets() in the first place. I don't think building separate runtime libs
for C09 and older standards would be worth the trouble.

What would you do with the gets identifier? Would you make it reserved
somehow?

If I really thought I knew better, I would make the compiler bitch &
moan every time someone even thinks of using gets() - but guess what, a
compiler can do that already (and some do so), with no change to the
standard required!

To sum up, I can see the pain of modifying the standard, but not the
gain. Yeah, gets() is evil, but removing it would create more problems
than it'd solve. There are several crappy interfaces in the C library
(like all the functions with hidden static buffers), but removing them
is more trouble than it's worth.

Michal

Keith Thompson

Jun 1, 2006, 6:12:46 PM

Michal Necasek <mic...@scitechsoft.com> writes:
> Keith Thompson wrote:
>
>> gets() is so horrible and useless that no one in their right mind
>> would ever use it. People use it anyway. Having it in the standard
>> encourages this.
>>
> Your statement is self-contradictory. If people use gets(), it is
> not useless.

That doesn't follow.

> Essentially, what I have a big problem with is your assertion that
> you know better than everyone else. Such claims are very rarely
> true. What's next? Pointers have to go because they're dangerous and
> too many people can't use them properly?

I don't assert that I know better than everyone else. I assert that
the majority who think that gets() should be removed make a better
case than the minority who don't.

> I would much prefer providing a safer alternative to gets()
> (something like or identical to gets_s()) and amending gets()
> documentation with explanation of its dangers and a pointer to the
> better solution.

I certainly have no problem with providing a safer alternative to
gets(). Actually, a number of such alternatives have already been
provided; standardizing one of them would probably be a good thing.
For that matter, we already have fgets().

[...]

> What would you do with the gets identifier? Would you make it
> reserved somehow?

That's an interesting question. I'm not sure what the best answer is.

Michal Necasek

Jun 1, 2006, 6:31:34 PM

Keith Thompson wrote:

> I certainly have no problem with providing a safer alternative to
> gets(). Actually, a number of such alternatives have already been
> provided; standardizing one of them would probably be a good thing.
>

I'm all for that!

> For that matter, we already have fgets().
>

Yes, but there's that thing with '\n'. If it wasn't for that, no one
would probably be using gets() anymore...
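
The "thing with '\n'" is that fgets() stores the newline in the buffer, while gets() drops it. A minimal sketch of a wrapper that gives gets()-like results with a bounded buffer follows; the name read_line is hypothetical, and the sketch makes no attempt to detect over-long lines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line like fgets(), then drops the
 * trailing '\n' if one was stored. Returns buf, or NULL on EOF/error. */
char *read_line(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return NULL;

    /* strcspn() yields the index of the first '\n', or the string
     * length if there is none, so this assignment is always in range. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A call such as read_line(buf, (int)sizeof buf, stdin) can then be compared against "exit" directly, with no newline to strip at each call site.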

>> What would you do with the gets identifier? Would you make it
>> reserved somehow?
>
> That's an interesting question. I'm not sure what the best answer is.
>

What?? You're convinced that removing gets() from the C Standard is the
best thing since sliced bread, and you haven't even thought it through?
Tsk, tsk ;-)

Now seriously. I agree that gets() is evil, shouldn't be used, etc.
etc. I do not agree that removing gets() from the standard library is
the best solution to that problem.

I am also of the opinion that if people insist on shooting themselves
in the foot, they should be allowed to do so. It tends to be a valuable
learning experience.

Michal

David R Tribble

Jun 1, 2006, 7:14:24 PM

Michal Necasek wrote:
> Now seriously. I agree that gets() is evil, shouldn't be used, etc.
> etc. I do not agree that removing gets() from the standard library is
> the best solution to that problem.
>
> I am also of the opinion that if people insist on shooting themselves
> in the foot, they should be allowed to do so. It tends to be a valuable
> learning experience.

Even if it's your foot they shoot? Case in point, most of the buffer
overrun bugs in Microsoft Windows, which require service patches
on an ongoing basis. Is that kind of experience valuable to you?

I made a proposal for adding new functions that do not use
global modifiable data (http://david.tribble.com/text/c9xthr.txt),
but to no avail. Note that I did not advocate removing the old
functions, but rather adding new safer ones and deprecating
the old ones.

-drt

Douglas A. Gwyn

Jun 1, 2006, 6:48:15 PM

Michal Necasek wrote:
> Your statement is self-contradictory. If people use gets(), it is not
> useless.

Indeed, with the fix I proposed it would be even more useful.

Douglas A. Gwyn

Jun 1, 2006, 6:58:10 PM

kuy...@wizard.net wrote:
> The fact that it's part of the standard is incorrectly perceived by
> many people as an endorsement. Removing it from the standard would be a
> lot cheaper than the re-education required to correct that
> misperception.

But the actual problem isn't gets, as witnessed by the large
number of buffer overrun exploits reported that have not
involved gets. Education is what is actually needed (perhaps
augmented by a readily available reliable buffer-handling
library to make compliance with the enlightenment easier).

As with most attempts to fix social ills via legislation, the
anti-gets fanatics divert attention and resources away from a
proper solution addressing the actual cause of the problem,
into ineffective "feel-good" activity that merely delays any
real solution (while the problem may just get worse).

Wojtek Lerch

Jun 1, 2006, 7:18:03 PM

"Douglas A. Gwyn" <DAG...@null.net> wrote in message

news:447F10B8...@null.net...

> There are two general gets programming scenarios:
> (1) need a "sufficiently large" buffer: BUFSIZ is the obvious
> choice, and the one I've made myself (when using gets in
> "throw-away" quick apps).

I have to admit that it's not obvious to me at all. Until I checked
recently, I didn't even realize that BUFSIZ is guaranteed to be at least
256 -- if I had to pick a value, I'd probably be more comfortable with a
"reasonable" explicit number such as, I don't know, maybe 256? :)
Apparently, I hadn't read (or remembered) any of those old books that use
BUFSIZ for this purpose, and never noticed before that the minimum values
for BUFSIZ and for the line length that text files must handle are similar
and specified in the same paragraph in the C standard.

To me, the only obvious choice is LINE_MAX; but unfortunately, it's POSIX,
not C. The funny thing is that POSIX requires LINE_MAX to be at least 2048
but allows BUFSIZ to be only 256 -- in a POSIX system, BUFSIZ is likely to
be the wrong choice. I would expect implementors to pick a value for
BUFSIZ based on their idea of the tradeoff between speed and size rather
than on the maximum length of a line of text in the environment.
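
To make the BUFSIZ versus LINE_MAX comparison concrete, here is a small sketch that prefers LINE_MAX where <limits.h> provides it and falls back to BUFSIZ otherwise. The function name line_buffer_size is hypothetical, and the fallback is only a guess at a reasonable default. No choice of size makes gets() itself safe.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: suggests a size for a text-line buffer.
 * POSIX systems normally define LINE_MAX (at least 2048) in <limits.h>;
 * plain ISO C only promises BUFSIZ, which is at least 256. */
size_t line_buffer_size(void)
{
#ifdef LINE_MAX
    return LINE_MAX;
#else
    return BUFSIZ;
#endif
}
```

Whichever value is chosen, it still has to be passed to a bounded reader such as fgets(); the size is a convenience, not a guarantee about the input.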

Keith Thompson

Jun 1, 2006, 7:25:38 PM

Michal Necasek <mic...@scitechsoft.com> writes:
[...]

> Now seriously. I agree that gets() is evil, shouldn't be used,
> etc. etc. I do not agree that removing gets() from the standard
> library is the best solution to that problem.
>
> I am also of the opinion that if people insist on shooting
> themselves in the foot, they should be allowed to do so. It tends to
> be a valuable learning experience.

Unfortunately, the bullet is very likely to go through my foot on the
way to its target.

Keith Thompson

Jun 1, 2006, 8:09:01 PM

"Douglas A. Gwyn" <DAG...@null.net> writes:

> kuy...@wizard.net wrote:
>> The fact that it's part of the standard is incorrectly perceived by
>> many people as an endorsement. Removing it from the standard would be a
>> lot cheaper than the re-education required to correct that
>> misperception.
>
> But the actual problem isn't gets, as witnessed by the large
> number of buffer overrun exploits reported that have not
> involved gets. Education is what is actually needed (perhaps
> augmented by a readily available reliable buffer-handling
> library to make compliance with the enlightenment easier).

Speaking only for myself, I'm not attempting to fix "the" actual
problem. I'm very well aware that removing gets() from the language
will not fix all buffer overflow problems. It will fix one small
problem: the existence of gets(), which is a uniquely dangerous
function.

> As with most attempts to fix social ills via legislation, the
> anti-gets fanatics divert attention and resources away from a
> proper solution addressing the actual cause of the problem,
> into ineffective "feel-good" activity that merely delays any
> real solution (while the problem may just get worse).

I am not a fanatic. There is no reason that removing or deprecating
gets() should prevent or delay any other work on the standard, any
more than any other useful change should do so.

It has become obvious yet again that this discussion will never get
anywhere.

Michal Necasek

Jun 1, 2006, 8:12:44 PM

David R Tribble wrote:

> Even if it's your foot they shoot?
>

How is that avoided by removing gets() from some future standard? You
are assuming that there is either existing exploitable code using gets()
that will get magically fixed because its maintainers will switch to a
new C09 compiler, or that some not yet written code will use gets(), and
it will magically become widespread (else it can't be my foot) despite
being written by totally incompetent programmers. Both assumptions seem
pretty tenuous to me.

The most likely danger to my own foot comes from (bad) existing code
built with existing tools. But there I'm SOL anyway because changing
some future C standard does exactly nothing for this case.

> Case in point, most of the buffer
> overrun bugs in Microsoft Windows, which require service patches
> on an ongoing basis. Is that kind of experience valuable to you?
>

How many of those buffer overruns were caused by gets()? Was it
actually more than zero? (Microsoft Windows is not known for its
affinity to command line oriented interfaces)

I don't really use Windows, so I couldn't tell you if said
hypothetical experience would be useful. I expect it would be, in that I
would think very hard about whether I wanted to continue using Windows.

I am *not* talking about buffer overruns in general. I am talking only
about gets().

> I made a proposal for adding new functions that do not use
> global modifiable data (http://david.tribble.com/text/c9xthr.txt),
> but to no avail.
>

These functions were all standardized by POSIX, correct?

> Note that I did not advocate removing the old
> functions, but rather adding new safer ones and deprecating
> the old ones.
>

I consider that to be far preferable to removing previously
standardized functions.

If someone really wants to stop others from using gets(), I would
think that the most productive course would be:

1) standardize a safe and easy to use replacement, and

2) lobby compiler vendors to warn about gets() use

in that order. It is important to realize that programmers are by nature
lazy (else they'd be out in the sun doing some real work) and if the
change isn't easy to make, they'll just avoid the new tools or keep
using them in C99 mode or whatever.

Michal

James Dennett

Jun 1, 2006, 11:53:07 PM

Douglas A. Gwyn wrote:
> James Dennett wrote:
>> ... Every single use of gets() I've ever seen has used
>> a smaller buffer than this (and every single one has been
>> a defect). What's the evidence that BUFSIZ is common?
>
> Examples in tutorial books. Legacy code.

That's rather non-specific; is there no concrete evidence
which can be objectively verified?

> There are two general gets programming scenarios:
> (1) need a "sufficiently large" buffer: BUFSIZ is the obvious
> choice, and the one I've made myself (when using gets in
> "throw-away" quick apps).
> (2) assume input won't exceed some human-interface oriented
> line width, 80 being probably the most obvious choice
> (until windowing systems became widespread): whatever
> buffer size is assumed, *even if fgets has been used
> the programmer would not be checking for overflow*, as
> that is inconsistent with his assumption.
>
> Nothing can be done about case (2), not even banishing gets.

Banishing gets() so obviously *does* do something about
case 2 that I think we may be speaking different languages.
If gets() were eliminated and fgets() were used, the
programmer would specify the buffer size (fgets() makes
it obvious that this needs to be done), and overflow would
not occur. Input may be left in the buffer, but that's an
entirely different class of problem.
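
The "input left in the buffer" case can be handled explicitly. The following is a sketch, assuming the caller wants to know about truncation and wants the rest of an over-long line discarded; the name read_line_checked and its return codes are invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if a whole line was stored in buf,
 * 0 if the line was too long (the excess is discarded),
 * and -1 on end-of-file or read error. */
int read_line_checked(char *buf, int size, FILE *stream)
{
    if (fgets(buf, size, stream) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* complete line: drop the newline */
        return 1;
    }

    /* No newline was stored: either the line did not fit or the input
     * ended without one. Consume the rest of the line so the next call
     * starts at a line boundary. */
    int c;
    int truncated = 0;
    while ((c = getc(stream)) != EOF && c != '\n')
        truncated = 1;

    return truncated ? 0 : 1;
}
```

A line that exactly fills the buffer still counts as complete, because the only character left to consume is its newline.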

> Case (1) is what I want to address.

(2) is the important issue in my experience. Addressing
(1) could have negative consequences for (2), by keeping
gets() on life support.

>>>> And it would break any existing code that uses gets() with a buffer
>>>> bigger than BUFSIZ bytes.
>>> Not unless input lines were bigger than BUFSIZ.
>> The biggest problem of gets() is, of course, input that is
>> deliberately crafted to create buffer overruns. Such input
>> will commonly be larger than BUFSIZ.
>
> You didn't understand the point to which I was responding.

Please do not assume that you know what I understand. My
understanding may well differ from yours; that being the
case, we can use a forum (say, this one) to discuss the
issue. Deciding unilaterally that someone does not
understand is not a constructive form of discussion.

> The immediate issue was what legitimate usage could be
> broken by failing to transmit more than BUFSIZ characters,
> not what illegitimate exploitation would be prevented.

*My* point is that any discussion that focuses only on this
aspect is problematic, as the meat of the issue lies in the
security problems.

>> If a goal is to reduce the cost of security breaches caused
>> by buffer overruns (and that's a valuable goal in monetary
>> terms), then this doesn't seem to be an effective solution.
>
> It's very cost-effective in the (unknown quantity of) cases
> where existing vulnerabilities would automatically be removed
> when the DLLs were updated to impose the limit.

For those cases where the injected code is blocked by the
BUFSIZ limit, it could help. For the rest of cases, it
doesn't help as much as eliminating gets() would.

> It's slightly cost-effective for new code, in that there
> would be a standard safe way to read the contents of a line
> from stdin without having every time to scan for a new-line
> and remove it.

Adding a new function would achieve that same goal, without
being a "quiet change".

> The goal of the one proposal is not to fix every possible
> problem, just to address one particularly notorious one in
> a more intelligent way than dropping the spec for a long-
> established standard facility.

Dismissing a choice advocated by many intelligent people as
being a priori "less intelligent" than your option seems to
be putting the cart before the horse.

>> Good linkers already warn about use of gets, ...
>
> Really? How does a special-case hack for political purposes
> make a linker "good"?

Helping to prevent a class of serious bugs which have
major financial impact isn't done for political purposes,
unless your definition of "political purposes" is very
different than my own. It's there to prevent a technical
problem -- buffer overruns leading to security vulnerabilities.
(But I suspect you know that, so I'm intrigued as to why you
label this "political".)

> In my universe, good linkers do the
> well-defined job of link editing, and don't make noises
> unrelated to problems in performing that job.

In my universe, security is a multi-faceted thing which can
be helped by language/library standards, by quality
implementations of those standards, by external tools which
warn of dubious practice, by process, by education,
and by discipline.

If a smart editor can help people to avoid security problems,
that's also fine by me. If Lint does it, that's just great.
If the compiler does it without needing an external linting
tool, so much the better.

>> Keeping easily misused functions in C will be taken by many,
>> rightly or wrongly, as evidence that we don't care enough
>> about buffer overruns.
>
> Hardly, when the proposed change explicitly addresses
> exactly that issue.

It addresses it incompletely, when a more complete alternative
exists and is well known. Hence, it suggests that the choice
has been made not to address the issue as thoroughly as was
possible.

> The general issue of buffer overruns involves much more
> than can even begin to be addressed by removing gets.

Certainly true.

> Indeed, there is a TC in the works now that addresses the
> issue more comprehensively. But no technical solution
> will matter if the educational issue isn't addressed: the
> problem lies in how some programmers *think* (or fail to),
> not in the language as such (which can be safely used by
> other programmers).

Programming languages can't stop people writing bad code,
but they can make it easier for people to write good code.
It's not a black and white issue.

> From the educational point of view,
> gets performs a great service by being a standard point
> of reference when discussing the buffer overrun issue.

An educational purpose which would be enhanced, in my opinion,
if it were to become the first function to be removed from
ISO C as an unacceptable security risk.

-- James

James Dennett

Jun 2, 2006, 12:01:00 AM

Michal Necasek wrote:
> Keith Thompson wrote:
>
>> As far as I can tell, it's also the opinion of everyone who's
>> discussed it except you.
>>
> I agree with Douglas Gwyn (just to disprove your point) ;)
>
> For the life of me, I can't figure out what would be gained by removing
> gets() from the standard (as opposed to providing better alternatives).
> Is anyone holding a gun to people's heads, forcing them to use gets()?

In my experience most programming is not done at the business
end of a gun.

> Are C programmers children who cannot think for themselves and can't
> decide if gets() is something they should be using or not? Do they need
> parents who "know better"?

No, but many C programmers are adults who are not qualified
to make a sound decision about whether they should be using
gets(). There are millions of people programming in C, and
on average they're... average. Many are really not very good
at programming.

> Either gets() is so horrible and useless that no one in their right
> mind would ever use it. In that case, it won't be used regardless of
> whether it's available or not (need I point out that trigraphs, despite
> being widely available, aren't being used?). Or gets() does actually
> have some marginal utility, and in that case, I really can't see how its
> removal would be justified. Which is it?

That's a false dichotomy -- but the truth is close to the
first case.

gets() in its current form is so horrible that nobody who is
aware of its flaws and values security would use it. However,
many people aren't aware of the problems, and many people are
not greatly motivated by security, so it is used, particularly
by amateurs who copy from examples they find around. gets()
has a seductively simple interface.

gets() can be used safely only if you have complete control
of the standard input stream somehow -- but that sort of
assumption is often brittle and hard to verify.

-- James

kuy...@wizard.net

Jun 2, 2006, 12:08:10 AM

Michal Necasek wrote:
> Keith Thompson wrote:
>
> > gets() is so horrible and useless that no one in their right mind
> > would ever use it. People use it anyway. Having it in the standard
> > encourages this.
> >
> Your statement is self-contradictory. If people use gets(), it is not
> useless.

He didn't say that it was useless. He only said that using it was a bad
idea. The fact that people use it is NOT inconsistent with it being a
bad idea. People routinely do things that are bad ideas.

> Essentially, what I have a big problem with is your assertion that you
> know better than everyone else. Such claims are very rarely true.

True, but he made no such claim. He only claimed to know better than
the people who made the mistake of using gets(). "Everyone else"
includes all of the other competent programmers who are in agreement
with Keith, that using gets() is a bad idea.

> What's
> next? Pointers have to go because they're dangerous and too many people
> can't use them properly?

You're missing the point - pointers can be used properly. gets() can't
be. If you know with a certainty that your input doesn't contain any
over-length lines, it's safe, but such certainty usually reflects
self-delusion, not reality. In reality, a program that fails
catastrophically when handed input that doesn't match its input
specifications is usually defective; it should fail cleanly, not
catastrophically. If the requirements specification for the program
endorses the catastrophic failure, then the requirements are defective.
There is one exception: some software is deliberately designed to
damage the system it is run on - catastrophic failure is an inefficient
but possibly acceptable means toward achieving that goal.

James Dennett

Jun 2, 2006, 12:10:29 AM

Michal Necasek wrote:
> Keith Thompson wrote:
>
>> I certainly have no problem with providing a safer alternative to
>> gets(). Actually, a number of such alternatives have already been
>> provided; standardizing one of them would probably be a good thing.
>>
> I'm all for that!

I'd agree with the goal. Being able to read a line
without fear of buffer overflow and without having
to truncate a '\n' is something that should be simple.

>> For that matter, we already have fgets().
>>
> Yes, but there's that thing with '\n'. If it wasn't for that, no one
> would probably be using gets() anymore...
>
>>> What would you do with the gets identifier? Would you make it
>>> reserved somehow?
>>
>> That's an interesting question. I'm not sure what the best answer is.
>>
> What?? You're convinced that removing gets() from the C Standard is the
> best thing since sliced bread, and you haven't even thought it through?
> Tsk, tsk ;-)
>
> Now seriously. I agree that gets() is evil, shouldn't be used, etc.
> etc. I do not agree that removing gets() from the standard library is
> the best solution to that problem.
>
> I am also of the opinion that if people insist on shooting themselves
> in the foot, they should be allowed to do so. It tends to be a valuable
> learning experience.

The problem is that they're shooting other people in the foot
by leaving security holes in software. Standards bodies can
help by warning that gets() is a Bad Thing.

(Next, to hop over to comp.std.c++ and lobby for the deprecation
of std::basic_istream<T>::operator>>(const char *), which is too
unsafe even though you can precede it with a function call to
limit the size of input that will be read...)

-- James

James Dennett

Jun 2, 2006, 12:16:58 AM

Douglas A. Gwyn wrote:
> kuy...@wizard.net wrote:
>> The fact that it's part of the standard is incorrectly perceived by
>> many people as an endorsement. Removing it from the standard would be a
>> lot cheaper than the re-education required to correct that
>> misperception.
>
> But the actual problem isn't gets, as witnessed by the large
> number of buffer overrun exploits reported that have not
> involved gets. Education is what is actually needed (perhaps
> augmented by a readily available reliable buffer-handling
> library to make compliance with the enlightenment easier).

Education is important, but realistically it will not be
sufficient in itself.

> As with most attempts to fix social ills via legislation, the
> anti-gets fanatics divert attention and resources away from a
> proper solution addressing the actual cause of the problem,
> into ineffective "feel-good" activity that merely delays any
> real solution (while the problem may just get worse).

There are many aspects to the cause, but bad interface design
is a very large part of it. Eliminating gets() would help to
send a message that interfaces which make assumptions about
sizes of buffers are bad, hence addressing the *cause* of the
problem in some way.

Eliminating gets() is just a tiny step forward. Beyond
education and a change which would (a) make some uses of
gets() safe and (b) eliminate some code injection attacks,
what steps would you like to take to reduce the problem?

-- James

jacob navia

Jun 2, 2006, 1:09:18 AM

James Dennett a écrit :

>
>
> Adding a new function would achieve that same goal, without
> being a "quiet change".
>

This is what bothers me the most in the proposal of Mr Gwyn.
Code that uses gets now looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void getinput(void)
{
    char buf[80]; // Nobody will type more than this

    gets(buf);
    if (!strcmp(buf, "exit"))
        exit(0);
    // ...
}

This will stay as is, and will continue to provoke a buffer overflow
when anyone types more than 80 characters.

Making gets obsolete would provoke a warning, or better, a link error.

jacob

Rudolf

Jun 2, 2006, 1:15:33 AM

In article <447fc7fd$0$20174$ba4a...@news.orange.fr>,
jacob navia <ja...@jacob.remcomp.fr> wrote:

What does any of that have to do with rewind?

Charlie Gordon

Jun 2, 2006, 3:15:43 AM

"Francis Glassborow" <fra...@robinton.demon.co.uk> wrote in message
news:zvbjldX8...@robinton.demon.co.uk...
> In article <447eb850$0$12762$636a...@news.free.fr>, Charlie Gordon
> <ne...@chqrlie.org> writes
> >Yes, but the C language makes it especially hard to become a learnt
> >programmer that never forgets about all these subtleties.
>
> You mean like English makes it especially difficult to become a
> competent writer who never forgets the subtle rules of its grammar and
> syntax?

You are so righteous! Let's see you cope with my French.

(Translated from the French:) The fact that some of us express ourselves
imperfectly in the language of Shakespeare is no proof that what we say is
inane. Subtlety is no doubt part of a language's charm, but gratuitous
complication narrows the circle of its speakers, and constant evolution is a
necessary condition for a language to stay alive: I would even say that its
success depends on its adaptability. The C language is evolving so slowly that
it will soon be a dead language, the preserve of a few hieratic specialists
confined to the boundary between software and hardware, and of course of the
women computer scientists who will go on compiling C for a long time yet.

Chqrlie.

Keith Thompson

Jun 2, 2006, 3:36:37 AM

"Douglas A. Gwyn" <DAG...@null.net> writes:

[...]

> From the educational point of view,
> gets performs a great service by being a standard point
> of reference when discussing the buffer overrun issue.

I have to say that's one of the most absurd arguments I've ever seen.

Charlie Gordon

Jun 2, 2006, 3:39:12 AM

"Keith Thompson" <ks...@mib.org> wrote in message
news:lnodxcw...@nuthaus.mib.org...

> Michal Necasek <mic...@scitechsoft.com> writes:
> [...]
> > Either gets() is so horrible and useless that no one in their right
> > mind would ever use it. In that case, it won't be used regardless of
> > whether it's available or not (need I point out that trigraphs,
> > despite being widely available, aren't being used?). Or gets() does
> > actually have some marginal utility, and in that case, I really can't
> > see how its removal would be justified. Which is it?
>
> gets() is so horrible and useless that no one in their right mind
> would ever use it. People use it anyway. Having it in the standard
> encourages this.

gets() is not useless; on the contrary, its semantics are more convenient than
those of fgets().
Yet the API is broken.
A good solution to this flaw is to add the correct prototype for gets() and
deprecate the current one, when the time comes to support some form of function
overloading in the language.

Chqrlie.

Charlie Gordon

Jun 2, 2006, 4:07:32 AM

"Keith Thompson" <ks...@mib.org> wrote in message

news:lny7wgt...@nuthaus.mib.org...

> "Douglas A. Gwyn" <DAG...@null.net> writes:
> [...]
> > From the educational point of view,
> > gets performs a great service by being a standard point
> > of reference when discussing the buffer overrun issue.
>
> I have to say that's one of the most absurd arguments I've ever seen.

Ubuesque! Like most of the case for gets(), right down to the very title of
this thread ;-)

Chqrlie.

Keith Thompson

Jun 2, 2006, 4:43:08 AM

Rudolf <rth...@bigfoot.com> writes:
> In article <447fc7fd$0$20174$ba4a...@news.orange.fr>,
> jacob navia <ja...@jacob.remcomp.fr> wrote:

[...]

>> Making gets obsolete would provoke a warning, or better, a link error.
>

> What does any of that have to do with rewind?

The topic drifted, as is typical.

Francis Glassborow

Jun 2, 2006, 5:09:28 AM

In article <CCOfg.102337$iU2.99893@fed1read01>, James Dennett
<jden...@cox.net> writes

>For those cases where the injected code is blocked by the
>BUFSIZ limit, it could help. For the rest of the cases, it
>doesn't help as much as eliminating gets() would.

However, as every implementation will continue to provide gets() for the
foreseeable future, changing it to make it safer would seem a positive
step. Perhaps we could change it along the lines discussed earlier this
year and also deprecate it.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

Francis Glassborow

Jun 2, 2006, 5:13:25 AM

In article <447fc7fd$0$20174$ba4a...@news.orange.fr>, jacob navia

<ja...@jacob.remcomp.fr> writes

>Making gets obsolete would provoke a warning, or better, a link error.

It will do neither of those things. Compilers are already free to warn
and I would rather not see link technology perverted to do things like
this (modern linkers have lots of general things to do without adding
special cases, and if you were thinking of not defining gets(), be
realistic, no implementer is going to break their clients' legacy code
that way.)

--
Francis Glassborow ACCU
Author of 'You Can Do It!' and "You Can Program in C++"

Francis Glassborow

Jun 2, 2006, 5:30:00 AM

In article <447fe5a0$0$20747$636a...@news.free.fr>, Charlie Gordon

<ne...@chqrlie.org> writes
>"Francis Glassborow" <fra...@robinton.demon.co.uk> wrote in message
>news:zvbjldX8...@robinton.demon.co.uk...
>> In article <447eb850$0$12762$636a...@news.free.fr>, Charlie Gordon
>> <ne...@chqrlie.org> writes
>> >Yes, but the C language makes it especially hard to become a learnt
>> >programmer that never forgets about all these subtleties.
>>
>> You mean like English makes it especially difficult to become a
>> competent writer who never forgets the subtle rules of its grammar and
>> syntax?
>
>You are so righteous! Let's see you cope with my French.

Which makes my point. Every language, both human and computing, has
subtle rules but that does not prevent reasonable people from using them
reasonably. C's rules are no more subtle than most (and a good deal less
than many) but they are well aired because there is a very large body of
programmers stressing the language.

jacob navia

Jun 2, 2006, 5:52:22 AM

Keith Thompson wrote:
>
> It has become obvious yet again that this discussion will never get
> anywhere.
>

Yes, we have no means of changing what the standards committee does.

We have discussed this extensively MANY times and the committee's answer
is always the same:

We will not change anything since that would cause maintenance problems
in old programs that could use trigraphs or gets() and we do not want
any old programs to be changed.

I tried to reach the local standards committee here in France but they ask
for too much money to do anything, so I am stuck, probably as you are.

Arguments do not count for anything here.

gets() will stay that way (it has been years since we discussed this
here, and nothing has been done).

Jordan Abel

Jun 2, 2006, 6:47:31 AM

2006-06-02 <+dGA5av1...@robinton.demon.co.uk>, Francis Glassborow wrote:
> In article <447fc7fd$0$20174$ba4a...@news.orange.fr>, jacob navia
> <ja...@jacob.remcomp.fr> writes
>>Making gets obsolete would provoke a warning, or better, a link error.
>
> It will do neither of those things. Compilers are already free to warn
> and I would rather not see link technology perverted to do things like
> this (modern linkers have lots of general things to do without adding
> special cases, and if you were thinking of not-defining gets(), be
> realistic, no implementer is going to break their client's legacy code
> that way.)

Modern linkers have, among other features, a general facility for
marking a symbol in order to cause it to produce a "this function is
unsafe" warning. Some libraries use this for gets(), tmpnam(), and
others. How is this a 'special case'? You can define a symbol like this
in your own library if you want.

Hallvard B Furuseth

Jun 2, 2006, 9:55:40 AM

James Dennett writes:

>Michal Necasek wrote:
>
>> Are C programmers children who cannot think for themselves and can't
>> decide if gets() is something they should be using or not? Do they
>> need parents who "know better"?
>
> No, but many C programmers are adults who are not qualified
> to make a sound decision about whether they should be using
> gets().

And in most cases their problem is that they should not be using C in
the first place, but some higher-level language which fits them better.
Bad use of gets() will be one among many other bugs in their
code.

> gets() in its current form is so horrible that nobody who is
> aware of its flaws and values security would use it.

I'm aware of its flaws. I value security. I use it. Only in brief
throw-away programs for my own use, of course. It saves 3-4 lines of
code, plus the jump up and back down in the editor to declare a variable
for strchr(,'\n').
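
For comparison, the fgets() version that those extra lines refer to might
look like the sketch below. The helper name read_line is an assumption made
for illustration; it comes neither from this thread nor from any standard
library, and the cast assumes the buffer size fits in an int.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a standard function): reads one line from fp
   into buf, which holds size bytes, and strips the trailing newline.
   Returns buf, or NULL on end-of-file or read error. A line longer than
   size - 1 characters is split across successive calls. */
char *read_line(char *buf, size_t size, FILE *fp)
{
    char *nl;   /* the extra variable needed for the strchr() result */

    if (size == 0 || fgets(buf, (int)size, fp) == NULL)
        return NULL;
    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';
    return buf;
}
```

A loop such as while (read_line(line, sizeof line, stdin)) then behaves much
like the gets() loop, except that an over-long line is split rather than
written past the end of the array.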

> gets() can be used safely only if you have complete control
> of the standard input stream somehow -- but that sort of
> assumption is often brittle and hard to verify.

No, for me it's very easy to verify.

--
Hallvard

Hallvard B Furuseth

Jun 2, 2006, 10:26:30 AM

I think that's a misfeature, though making gets() safer somehow is a
good idea. It will still be a decade or more before one can rely on
gets() - but people will begin to do so quicker, once e.g. glibc
implements and documents it.

You could #define some other constant than BUFSIZ, e.g. _GETS_BUFSIZ
which must be >= BUFSIZ. Programs can then safely declare
buf[_GETS_BUFSIZ] and use gets(). If the safe variant is not supported,
the program will not compile.

I suppose a lesser change could be something like "it is unspecified
whether the implementation transfers the entire line or just at most
BUFSIZ characters. It is recommended that the implementation does the
latter, though conforming programs cannot rely on this." If a safe
but equally convenient function is standardized at the same time, maybe
that would prevent an increased use of unsafe gets(). I kind of doubt
it though.
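
As a rough illustration of the "at most BUFSIZ characters" behaviour being
discussed, here is a minimal sketch. The name bounded_gets and the FILE *
parameter are assumptions for the example, and so is the choice to read and
discard the excess part of an over-long line; the thread does not say what
should happen to it.

```c
#include <stdio.h>

/* Hypothetical sketch, not a standard function: behaves like gets() on fp,
   but never stores more than BUFSIZ - 1 characters plus the terminator.
   The caller must supply an array of at least BUFSIZ bytes. */
char *bounded_gets(char *buf, FILE *fp)
{
    size_t n = 0;
    int c;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (n < (size_t)BUFSIZ - 1)
            buf[n++] = (char)c;
        /* otherwise the character is consumed and dropped (an assumption) */
    }
    if (c == EOF && (n == 0 || ferror(fp)))
        return NULL;
    buf[n] = '\0';
    return buf;
}
```

With this shape, char line[BUFSIZ]; followed by bounded_gets(line, stdin)
cannot overrun the array, at the price of silently truncating long lines.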

--
Hallvard

Wojtek Lerch

Jun 2, 2006, 10:31:02 AM

"Hallvard B Furuseth" <h.b.fu...@usit.uio.no> wrote in message
news:hbf.2006...@bombur.uio.no...

> I suppose a lesser change could be something like "it is unspecified
> whether the implementation transfers the entire line or just at most
> BUFSIZ characters. It is recommended that the implementation does the
> latter, though conforming programs cannot rely on this."

If at least one implementation does it, then some conforming programs can
rely on it. ;-)

Hallvard B Furuseth

Jun 2, 2006, 10:52:56 AM

OK, then I don't know the right standardese. But it ought to be
possible to say somehow.

--
Hallvard

Dr A. N. Walker

Jun 2, 2006, 10:55:08 AM

In article <1149221290.2...@f6g2000cwb.googlegroups.com>,
<kuy...@wizard.net> wrote:
[...]

> If you know with a certainty that your input doesn't contain any
>over-length lines, it's safe, but such certainty usually reflects
>self-delusion, not reality.

Whoa! In your reality, perhaps; but not so in mine; and
I'd guess that my reality matches that of at least a significant
proportion, and perhaps a large majority, of C code being written
[not *used*, but *written*]. ...

> In reality, a program that fails
>catastrophically when handed input that doesn't match its input
>specifications is usually defective; [...]

... Ditto. ...

>There is one exception: some software is deliberately designed to
>damage the system it is run on [...]

... No, there are *lots* of exceptions. Most of my
programs have no security, safety or commercial implications,
likewise those of my colleagues or students. We want to type
in some coefficients, or a matrix, or the start position of
some game, or a list of exam marks, and have the computer solve
something or report some outcome. If two lines get run together
or the input is otherwise botched or garbled, no-one will get a
bill for a zillion dollars, no missile will be launched, no cancer
patient will die. If the program *does* crash, this is even
better than producing possibly wrong results, for at least the
user [probably the programmer!] will notice, and go back to edit
the input.

If input validation *is* required, what happened to the
notion of a validation *phase*? In Unix-like systems, this is
even the philosophically-preferred solution:

validate | dosomecomputation | printordisplayanswer

Again, if there are security/safety/commercial implications, then
the rest of the pipeline should be somewhat paranoid, but in most
research/academic/domestic applications?
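
A separate validation stage of the kind described could be as small as the
sketch below. The function name validate_lines and the policy of stopping at
the first over-length line are assumptions for illustration only; a real
"validate" program would wrap this around stdin and stdout and turn the
result into an exit status.

```c
#include <stdio.h>

/* Hypothetical validation stage: copies in to out for as long as every line
   holds at most max_len characters (newline excluded). Returns 0 if the
   whole input passed, otherwise the 1-based number of the first over-length
   line. Copying stops there, so at most max_len characters of the offending
   line have been passed on. */
long validate_lines(FILE *in, FILE *out, size_t max_len)
{
    long line = 1;
    size_t len = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '\n') {
            line++;
            len = 0;
        } else if (++len > max_len) {
            return line;
        }
        putc(c, out);
    }
    return 0;
}
```

A later stage that reads into an array of max_len + 1 bytes then never sees a
line it cannot hold, which is the division of labour the pipeline suggests.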

None of this is meant as an excuse for sloppy coding or
for bad teaching. In particular, it has nothing to do with the
"gets()" problem. It's just a reminder that many of us work in
different environments, where the priorities differ.

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.
a...@maths.nott.ac.uk

kuy...@wizard.net

Jun 2, 2006, 12:02:55 PM

Dr A. N. Walker wrote:
> In article <1149221290.2...@f6g2000cwb.googlegroups.com>,
> <kuy...@wizard.net> wrote:
> [...]
> > If you know with a certainty that your input doesn't contain any
> >over-length lines, it's safe, but such certainty usually reflects
> >self-delusion, not reality.
>
> Whoa! In your reality, perhaps; but not so in mine; and
> I'd guess that my reality matches that of at least a significant
> proportion, and perhaps a large majority, of C code being written
> [not *used*, but *written*]. ...

I don't know where your reality is; mine includes computers connected
to the internet that cannot, despite the best available security
systems, be kept perfectly clean at all times from unwanted intrusions.
However, while malicious intruders who deliberately misuse a program
are a problem, a far more serious threat (assuming the system has
decent security software that is being used properly) are the ordinary
users who through either stupidity or carelessness or badly written
documentation invoke a program in the wrong context.

> > In reality, a program that fails
> >catastrophically when handed input that doesn't match its input
> >specifications is usually defective; [...]
>
> ... Ditto. ...
>
> >There is one exception: some software is deliberately designed to
> >damage the system it is run on [...]
>
> ... No, there are *lots* of exceptions. Most of my
> programs have no security, safety or commercial implications,
> likewise those of my colleagues or students.

The consequence of buffer overruns isn't just a possible security
violation. It's undefined behaviour. To me, undefined behavior is
serious - as far as the C standard is concerned, there are no limits to
the damage that can occur when the behavior is undefined. Reality is a
little kinder than that, but not a lot. Basically, anything your C
program could have done deliberately by means of code with defined
behavior is something that could occur when the behavior is undefined,
and in most contexts that includes at least one behavior that I REALLY
don't want to have happen; usually, it includes a very large number of
possibilities that I never want to have happen. There are relatively few
contexts, for instance, in which it would be acceptable for a program
to malfunction by deleting or overwriting every file that you have
permission to write to.

> If input validation *is* required, what happened to the
> notion of a validation *phase*? In Unix-like systems, this is

In my code, I use the standard that the fact that a given piece of code
has defined behavior must be verifiable simply by looking at other code
in the same translation unit; I insert validation code wherever needed
to make that possible. For calls to a function defined in another TU,
this requirement only applies to the arguments and return value of that
function; that the other function works properly is something to be
determined by looking at the TU that it's defined in. Despite the fact
that I break my code into fairly short files (typically no more than a
few hundred lines), I've not found this a burdensome requirement.

> Again, if there are security/safety/commercial implications, then
> the rest of the pipeline should be somewhat paranoid, but in most
> research/academic/domestic applications?

Most research/academic/domestic applications that I'm aware of are run
in contexts where plausible forms of "undefined behavior" include at
least one completely unacceptable behavior, and usually a LOT more than
one.

Wojtek Lerch

Jun 2, 2006, 12:30:24 PM

"Hallvard B Furuseth" <h.b.fu...@usit.uio.no> wrote in message
news:hbf.2006...@bombur.uio.no...
> Wojtek Lerch writes:
>> "Hallvard B Furuseth" <h.b.fu...@usit.uio.no> wrote in message
>> news:hbf.2006...@bombur.uio.no...
>>> I suppose a lesser change could be something like "it is unspecified
>>> whether the implementation transfers the entire line or just at most
>>> BUFSIZ characters. It is recommended that the implementation does the
>>> latter, though conforming programs cannot rely on this."
>>
>> If at least one implementation does it, then some conforming programs can
>> rely on it. ;-)

Actually, I was wrong -- if it's unspecified, then even if an implementation
happens to do it, programs written for that implementation might not be
able to rely on it unless the implementation's documentation promises to do
it. For some reason, my brain had read "unspecified" as
"implementation-defined". Maybe it's because making it unspecified doesn't
really change anything: there's no difference between "it's unspecified
whether the behaviour is defined or not" and "the behaviour is undefined".

> OK, then I don't know the right standardese. But it ought to be
> possible to say somehow.

Saying that it's unspecified is enough. You don't need to separately
explain that programs cannot rely on unspecified behaviour -- the definition
of "unspecified" takes care of that already.

Dr A. N. Walker

Jun 2, 2006, 2:18:06 PM

In article <1149264175.5...@i40g2000cwc.googlegroups.com>,

<kuy...@wizard.net> wrote:
>> > If you know with a certainty that your input doesn't contain any
>> >over-length lines, it's safe, but such certainty usually reflects
>> >self-delusion, not reality.

>> Whoa! In your reality, perhaps; but not so in mine; [...]

>I don't know where your reality is; mine includes computers connected
>to the internet that cannot, despite the best available security
>systems, be kept perfectly clean at all times from unwanted intrusions.

Sure, but if an unwanted intruder has gained sufficient
access to be able to run a random program of mine, then I already
have much worse things to worry about than whether he can feed
that program an overlength line. And on a computer that has no
relevant security/safety/commercial interest, is it not far more
likely that the intruder is looking for ways to utilise the mail
or internet stuff for spam than for ways to make a program that
averages some exam marks print rude messages?

>However, while malicious intruders who deliberately misuse a program
>are a problem, a far more serious threat (assuming the system has
>decent security software that is being used properly) are the ordinary
>users who through either stupidity or carelessness or badly written
>documentation invoke a program in the wrong context.

Again, *you* are talking about a particular context. In
*my* context, I write thousands of lines of code that have no
documentation or formal specs at all, and for which the only
user is me. And thousands of students write hundreds of thousands
of lines whose sole purpose is to solve exercise 3.1, after which
they will be thrown away. Or programs whose sole purpose is to
calculate some numbers that will be used to draw a graph in their
projects. Programs that will never be used by anyone else, and
[before people get on high horses] are not part of a "how to write
commercial software" module. Of course, I *also* sometimes write
code that is less ephemeral; in that case, it starts being worth
my while to start worrying about how the code will react to all
the possible things that might go wrong with the input.

>The consequence of buffer overruns isn't just a possible security

>violation. It's undefined behaviour. [...]

Sure, again. But I'm talking about programs with a typical
structure something like

    while read a number and it's positive
       do calculate some function of that number,
          and print the result

so that I can look at some typical values and perhaps draw a graph.
Or that read in a matrix, apply some algorithm to it and report.
It's *much* more likely that any such program has off-by-one errors
or "you used a[i] when you meant a[j]" errors [until it has been
debugged] than that I am going to sit there typing incredibly long
lines at it until nasal demons emerge. If we have to worry that
any UB program is going to erase the hard disc, then we can never
dare to run a program that has not yet been fully debugged. Or
any program, such as a compiler, operating system, browser, mailer,
... that may contain bugs. Oops. Life is too short ....
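
For concreteness, a loop of that shape might look like the sketch below in
C. The function name tabulate_squares and the use of x * x as the "some
function" are placeholders chosen for the example, not anything described
in the post.

```c
#include <stdio.h>

/* Hypothetical throwaway program body: reads numbers from in while they are
   positive, prints each one with its square to out, and returns how many
   were processed. fscanf() converts directly into a double, so no line
   buffer is involved at all. */
int tabulate_squares(FILE *in, FILE *out)
{
    double x;
    int count = 0;

    while (fscanf(in, "%lf", &x) == 1 && x > 0) {
        fprintf(out, "%g %g\n", x, x * x);
        count++;
    }
    return count;
}
```

The loop stops at the first non-positive value, at end of input, or at
anything that does not parse as a number.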

>Most research/academic/domestic applications that I'm aware of are run
>in contexts where plausible forms of "undefined behavior" include at
>least one completely unacceptable behavior, and usually a LOT more than
>one.

The give-away is that word "applications". An "application"
is different from a "program". Different contexts, again. [And,
again, this is not a plea that I personally want to use "gets()"
or to write sloppy code; it's merely to do with the assumption
that nearly all C code is required to be paranoid about validating
its data, or else the sky will fall in.]

Michal Necasek

Jun 2, 2006, 4:05:04 PM

Thanks for pointing out that yes, gets() does have some utility. I too
was thinking of throwaway programs that can't realistically cause any
damage no matter how insecure they are, and where trying to make them
secure would be a complete waste of effort. As always, it's about the
right tool for the job, and sometimes gets() is that tool, warts and all.

When I wish to build and try some example from a programming book,
such as this one

http://www.basepath.com/aup/ex/x3a_8c-source.html

the last thing I want to do is fight with missing gets() and start
rewriting the code. Much pain for no gain.

I am struck by the religious righteousness of the anti-gets()
crusaders, the sort of people who are convinced that programmers should
have their fingers chopped off (at the very least) for even
contemplating the use of gets(). They talk about insecurity but so far
haven't provided any proof that there have been (recent) security
breaches due to gets() use. I wonder if that might be because the sort
of software that's most likely to be exploited is not, in fact, likely
to have any need for gets() at all.

I know my history; people who know what's best for me might be led by
the purest and most altruistic motives, yet in the end they'll always
cause a disaster. 100% certainty is typically attainable only if not all
aspects of a problem have been considered. It's the same in politics and
in programming.

This is probably outside of the scope of standard C, but it might be
useful to devise a standard way to mark functions (maybe other symbols
too) as insecure, and leave it up to the user to decide if they want to
hear about such potential security problems or not.

Something along the lines of

#pragma unsafe(bad_interface[,suggested_safe_interface])
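
No such pragma exists in standard C. The closest widely available mechanism
is a compiler extension: GCC and Clang accept a "deprecated" attribute that
carries a message. The sketch below wraps it in a macro; UNSAFE, copy_line
and copy_line_n are hypothetical names made up for this example.

```c
#include <string.h>

/* UNSAFE(msg) is a hypothetical marker. With GCC or Clang it maps onto the
   "deprecated" attribute, so each use of the marked function draws a
   compile-time warning quoting msg; elsewhere it expands to nothing. */
#if defined(__GNUC__)
#define UNSAFE(msg) __attribute__((deprecated(msg)))
#else
#define UNSAFE(msg)
#endif

/* Hypothetical unbounded interface, marked as unsafe. */
UNSAFE("no bound on dst; consider copy_line_n")
char *copy_line(char *dst, const char *src);

/* Hypothetical bounded replacement: truncates to fit, always terminates. */
char *copy_line_n(char *dst, size_t size, const char *src)
{
    size_t n;

    if (size == 0)
        return NULL;
    n = strlen(src);
    if (n > size - 1)
        n = size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return dst;
}

/* Defining the marked function does not itself trigger the warning. */
char *copy_line(char *dst, const char *src)
{
    return strcpy(dst, src);
}
```

Whether to hear about such uses is then left to the user through the
compiler's own warning options, which is roughly the opt-in behaviour the
pragma was meant to give.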

I use C in large part because it lets me do more or less everything I
need, and make my own decisions on my own responsibility. I'd prefer to
keep it that way.

Michal

Douglas A. Gwyn

Jun 2, 2006, 3:47:14 PM

Wojtek Lerch wrote:
> "Douglas A. Gwyn" <DAG...@null.net> wrote ...
> > (1) need a "sufficiently large" buffer: BUFSIZ is the obvious
> > choice, and the one I've made myself (when using gets in
> > "throw-away" quick apps).
> I have to admit that it's not obvious to me at all. ...

Given that a <stdio.h> function is involved, BUFSIZ comes
naturally to mind. I've seen it used a lot for gets and
similar "don't want to think about it too hard" text-line
buffers. Anyway, if it were the specified size you'd use
it (one hopes).

Douglas A. Gwyn

Jun 2, 2006, 4:05:42 PM

James Dennett wrote:
> Adding a new function would achieve that same goal, without
> being a "quiet change".

There is in fact a proposal for such a new function,
as well as several others, in a TC currently in the works.
Dealing with deficiencies in existing facilities can
proceed separately from adding new facilities.

> > The goal of the one proposal is not to fix every possible
> > problem, just to address one particularly notorious one in
> > a more intelligent way than dropping the spec for a long-
> > established standard facility.
> Dismissing a choice advocated by many intelligent people as
> being a priori "less intelligent" ...

Would you rather I had called it "ill-considered" or
"irresponsible"? The point is that the anti-gets faction
is so fanatical in their belief that they don't care about
the adverse impact on existing gets usages. Since I do
care, as in general the C committee does, I prefer a more
careful approach.

Consider what the anti-gets camp might have believed
instead, had gets been carefully specified all along
as having a BUFSIZ transfer limit. They might have
decried the kludgy interface as opposed to one that
allows the programmer to pick his own buffer size,
but it seems most unlikely that they would have been
screaming for stamping out gets altogether as they
do now. So there seems to be inertia in their belief
even when considering a potentially different situation.
It seems to qualify as a "religion".

> (But I suspect you know that, so I'm intrigued as to why you
> label this "political".)

Because it doesn't actually achieve the stated
"technical" goal, and would not have happened at all
had it not been for a particular cult having developed.

Douglas A. Gwyn

Jun 2, 2006, 4:12:41 PM

Keith Thompson wrote:
> I don't assert that I know better than everyone else. I assert that
> the majority who think that gets() should be removed make a better
> case than the minority who don't.

(1) A majority of what? Posters on the subject in newsgroups
aren't necessarily representative of everybody potentially
affected by any proposed change. There is a general social
phenomenon that members of special-interest groups are more
vocal and visible than the "silent majority".

(2) If decisions about the language standard were subject to
a majority vote of this kind, we'd have a wildly different
language, and almost certainly one that even you would deem
to be of lower quality.

Douglas A. Gwyn

Jun 2, 2006, 4:19:21 PM

Frankly, I find any notion that gets poses a significant
hazard for future code to be rather far-fetched. Surely,
any programmer who has heard anything at all about the
evils of allowing buffer overruns has also heard about
gets as the prime example; and other programmers are too
oblivious to the issue to think that their products will
be safe in this regard.

Again, the *real* problem in this regard is not the
existence of gets.

Douglas A. Gwyn

Jun 2, 2006, 4:30:39 PM

Hallvard B Furuseth wrote:
> You could #define some other constant than BUFSIZ, e.g. _GETS_BUFSIZ

Yes, but I'm trying to keep it simple and in line with widespread
common usage. Generally those are good guidelines for proposals.

> ... If the safe variant is not supported, the program will not compile.

Of course if that is your main concern, you can complicate the
proposal. This is the kind of detail that gets hashed out in
committee discussion.

> I suppose a lesser change could be something like "it is unspecified
> whether the implementation transfers the entire line or just at most
> BUFSIZ characters. It is recommended that the implementation does the
> latter, though conforming programs cannot rely on this."

I don't think that is any better than saying "don't use gets".

My main purpose in this is to fix the problem (well enough to
make a conforming implementation of gets safe to use even in
hostile environments), not to change it into a different problem.

Keith Thompson

Jun 2, 2006, 4:47:22 PM

"Douglas A. Gwyn" <DAG...@null.net> writes:
[...]

> Consider what the anti-gets camp might have believed
> instead, had gets been carefully specified all along
> as having a BUFSIZ transfer limit. They might have
> decried the kludgy interface as opposed to one that
> allows the programmer to pick his own buffer size,
> but it seems most unlikely that they would have been
> screaming for stamping out gets altogether as they
> do now. So there seems to be inertia in their belief
> even when considering a potentially different situation.
> It seems to qualify as a "religion".

So you're speculating about what you think the "anti-gets camp" would
believe in some hypothetical situation, and using that speculation as
a basis for claiming that we're religious fanatics.

Doug, though I disagree with you, I respect your opinion. Your
failure to reciprocate is a large part of why this debate isn't going
anywhere.

My advocacy for deprecating gets() or removing it from the standard is
based on consideration of a number of factors, *including* the
existence of code that uses gets(). I understand that removing gets()
would have a non-zero cost (just as removing implicit int had a
non-zero cost). It is my considered opinion that it would be worth
it. I am sick and tired of being labeled a religious fanatic because
of that.

Douglas A. Gwyn

Jun 2, 2006, 4:45:53 PM

The fact is that gets is quite useful, therefore not "useless".
It is not *safe* to use in some contexts, but in others it is
(even as currently specified) sufficiently safe.
    while (gets(line))
        process(line);
is exactly what the programmer wants to be able to say. Now, if
one had a standard support library for opaque buffer objects,
such code could be written in essentially that form using those
functions (although some operations with the buffers might be
less convenient.) However, lacking any such standard support,
gets is the closest thing available. fgets can be used, but it
takes more work that in many contexts isn't of any value.
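
To make the "opaque buffer object" idea concrete, here is a minimal sketch
of a growable line buffer. No such facility exists in the standard library;
struct linebuf and linebuf_gets are names invented for this example, and the
doubling growth policy is just one plausible choice.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable line buffer; initialise with { NULL, 0 }. */
struct linebuf {
    char  *data;   /* NUL-terminated contents, or NULL before first use */
    size_t cap;    /* bytes currently allocated for data */
};

/* Reads one line from fp into lb, growing the buffer as needed and dropping
   the newline. Returns lb->data, or NULL on end-of-file with nothing read,
   on a read error, or when memory cannot be allocated. */
char *linebuf_gets(struct linebuf *lb, FILE *fp)
{
    size_t len = 0;
    int c;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= lb->cap) {            /* keep room for the terminator */
            size_t ncap = lb->cap ? lb->cap * 2 : 64;
            char *p = realloc(lb->data, ncap);
            if (p == NULL)
                return NULL;
            lb->data = p;
            lb->cap = ncap;
        }
        lb->data[len++] = (char)c;
    }
    if (c == EOF && (len == 0 || ferror(fp)))
        return NULL;
    if (lb->cap == 0) {                      /* empty line on first use */
        lb->data = malloc(1);
        if (lb->data == NULL)
            return NULL;
        lb->cap = 1;
    }
    lb->data[len] = '\0';
    return lb->data;
}
```

The calling loop keeps the shape of the gets() version: declare
struct linebuf lb = { NULL, 0 }; then while (linebuf_gets(&lb, stdin))
process(lb.data); and finally free(lb.data).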

So it is not surprising that gets would be used for quick hacks
or personal-use apps. It would be of benefit overall if such
usage were also made perfectly safe against buffer overrun,
since sometimes quick hacks and personal code eventually find
their way into different contexts where the original assumptions
no longer hold.

Douglas A. Gwyn

Jun 2, 2006, 4:48:28 PM

"Dr A. N. Walker" wrote:

> ... And thousands of students write hundreds of thousands

> of lines whose sole purpose is to solve exercise 3.1, after which
> they will be thrown away.

Right. And the anti-gets fanatics want each of them to have
to do more work for every single instance.

Douglas A. Gwyn

Jun 2, 2006, 4:56:26 PM

jacob navia wrote:
> Yes, we have no means of changing what the standards committee does.
> We have discussed this extensively MANY times and the committee's answer
> is always the same:

Excuse me, but there are means of influencing standards committee
actions, and furthermore this issue has not been brought before
the C standards committee "many" times. (I seem to recall that
it was addressed in response to a comment received during public
review of a draft, but it was posed as an all-or-nothing issue,
not in terms of possible ways to fix the problematic loophole.)

> We will not change anything since that would cause maintenance problems
> in old programs that could use trigraphs or gets() and we do not want
> any old programs to be changed.

The effects of such changes must be *weighed*, and the benefits
seen to outweigh the costs. Any other approach would be irrational.

Douglas A. Gwyn

Jun 2, 2006, 5:07:58 PM

James Dennett wrote:
> ... Beyond
> education and a change which would (a) make some uses of
> gets() safe and (b) eliminate some code injection attacks,
> what steps would you like to take to reduce the problem?

Actually I don't see (a) and (b) as a significant part of
any solution (just as I don't see gets itself as a
significant part of the actual problem). Besides proper
programmer education in a variety of matters, including
the need to avoid potential buffer overruns and how they
may arise, it seems that the most urgently needed action
is for some well-designed package of robust buffer-object
management functions to become sufficiently widely
available. That would allow the trained programmers to
put their knowledge into practice with minimal fuss and
expense.

Of course, some other PLs more or less already have that.
And I'd recommend use of higher-level PLs than C for many
application purposes, when feasible. (Keeping in mind
that the education is still needed; one can, in some
contexts, overrun buffers in *any* sufficiently general
PL, although some PLs make it harder to do so by accident,
and the consequences of doing so may be harder to exploit
for nefarious purposes.)

Douglas A. Gwyn

Jun 2, 2006, 5:10:56 PM

James Dennett wrote:
> No, but many C programmers are adults who are not qualified
> to make a sound decision about whether they should be using

> gets(). There are millions of people programming in C, ...

If you're using network-accessible apps produced by such
people, you should expect a lot of quality and security
problems having nothing to do with gets.

C was never meant for novices, nor for people too lazy to
learn before doing.

Jordan Abel

Jun 2, 2006, 6:19:40 PM

2006-06-02 <44809A16...@null.net>, Douglas A. Gwyn wrote:
> James Dennett wrote:
>> Adding a new function would achieve that same goal, without
>> being a "quiet change".
>
> There is in fact a proposal for such a new function,
> as well as several others, in a TC currently in the works.
> Dealing with deficiencies in existing facilities can
> proceed separately from adding new facilities.
>
>> > The goal of the one proposal is not to fix every possible
>> > problem, just to address one particularly notorious one in
>> > a more intelligent way than dropping the spec for a long-
>> > established standard facility.
>> Dismissing a choice advocated by many intelligent people as
>> being a priori "less intelligent" ...
>
> Would you rather I had called it "ill-considered" or
> "irresponsible"? The point is that the anti-gets faction
> is so fanatical in their belief that they don't care about
> the adverse impact on existing gets usages. Since I do
> care, as in general the C committee does, I prefer a more
> careful approach.

This "adverse impact" is mostly imagined. Deprecating gets in the standard
won't magically make it disappear from existing implementations.

Allowing a run-time diagnostic to stderr at the first call to gets while
having it otherwise work would be one possible course of action, and
some implementations do this _now_ in violation of the standard.
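
A rough sketch of what such a one-time run-time diagnostic could look like inside a library. The names warn_once and noisy_gets and the wording of the message are assumptions made for this illustration, not taken from any particular implementation.

```c
#include <stdio.h>

/* Prints the diagnostic on the first call only.
   Returns 1 if it printed, 0 on every later call. */
int warn_once(void)
{
    static int warned = 0;

    if (warned)
        return 0;
    warned = 1;
    fputs("warning: this program uses gets(), which is unsafe\n", stderr);
    return 1;
}

/* Hypothetical stand-in for a library's gets(): the same unbounded
   read as always, preceded by the one-time warning. */
char *noisy_gets(char *buf)
{
    char *start = buf;
    int c;

    warn_once();
    while ((c = getchar()) != EOF && c != '\n')
        *buf++ = (char)c;              /* still no bounds check */
    if (c == EOF && (buf == start || ferror(stdin)))
        return NULL;
    *buf = '\0';
    return start;
}
```

The function still works as before for existing callers; the only visible change is one extra line on stderr per process.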

Requiring a compile-time diagnostic if there is an external reference to
the symbol "gets" is another possibility.

Even if "gets" isn't mentioned in the standard at all, that doesn't mean
implementations won't provide it.

And even if an implementation does withdraw it, all it takes is one file
added to your program.

---gets.c---
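#include <stdio.h>

/* Reads a line from stdin into buf with no bounds check
   (the overflow is inherent to the gets interface). */
char *gets(char *buf)
{
    char *tmp = buf;   /* remember the start so it can be returned */
    int c;

    /* Copy characters up to, but not including, the newline or EOF. */
    while ((c = getchar()) != EOF && c != '\n')
        *buf++ = c;
    /* Null return: EOF before any character was read, or a read error. */
    if (c == EOF && (buf == tmp || ferror(stdin)))
        return 0;
    *buf = 0;          /* terminate the string */
    return tmp;
}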
---end---

I hereby release the above code into the public domain. As far as
I know, it contains no errors except for the buffer overflow, and is
a conforming implementation of the gets function.

Jordan Abel

Jun 2, 2006, 6:19:43 PM

2006-06-02 <hbf.2006...@bombur.uio.no>, Hallvard B Furuseth wrote:
> I suppose a lesser change could be something like "it is unspecified
> whether the implementation transfers the entire line or just at most
> BUFSIZ characters. It is recommended that the implementation does the
> latter, though conforming programs cannot rely on this." If a safe
> but equally convenient function is standardized at the same time, maybe
> that would prevent an increased use of unsafe gets(). I kind of doubt
> it though.

If it only transfers BUFSIZ (BUFSIZ-1?) characters, will it leave the
rest of the line on stdin, or will it discard it? Either way, how do you
tell that happened?
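
For comparison, here is a sketch of how a caller can tell with a bounded interface, built on fgets rather than on a modified gets. The function name read_line and its truncated flag are hypothetical, and this version takes the "discard the rest" option; it assumes the input contains no embedded null characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical bounded line reader (size must be at least 2).
   Returns buf, or NULL at end-of-file or on error with nothing read.
   Sets *truncated to 1 when part of the line did not fit; the excess
   is read and discarded so the next call starts on a fresh line. */
char *read_line(FILE *fp, char *buf, int size, int *truncated)
{
    size_t len;
    int c;

    *truncated = 0;
    if (size < 2 || fgets(buf, size, fp) == NULL)
        return NULL;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';           /* the whole line fit */
        return buf;
    }
    /* No newline was stored: the line was too long, or the input
       ended without one. Skip whatever is left of this line. */
    while ((c = getc(fp)) != EOF && c != '\n')
        *truncated = 1;
    return buf;
}
```

The missing newline is what signals an incomplete line; a gets that silently stopped at BUFSIZ would give the caller no comparable indication, since gets never stores the newline.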