A good gets()


CBFalconer

Jun 22, 2002, 7:52:15 PM
Since people have the urge to use gets because of the simplicity
of the call, here is a version that is perfectly safe. Note the
simple file copying loop in the test program.

Have at it. splint complains about puts and also has invalid
complaints about using buffer after realloc fails.

[1] c:\c\ggets>splint -linelength 65 *.c
Splint 3.0.1.6 --- 11 Feb 2002

ggets.c: (in function fggets)
ggets.c(49,17): Variable buffer used after being released
Memory is used after it has been released (either by passing
as an only param or assigning to an only global). (Use
-usereleased to inhibit warning)
ggets.c(47,35): Storage buffer released
ggets.c(65,14): Variable buffer used after being released
ggets.c(64,32): Storage buffer released
tggets.c: (in function main)
tggets.c(15,13): Return value (type int) ignored: puts(line)
Result returned by function call is not used. If this is
intended, can cast result to (void) to eliminate message. (Use
-retvalint to inhibit warning)
tggets.c(20,4): Return value (type int) ignored: puts("Usage:
tgg...

Finished checking --- 4 code warnings
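
The "used after being released" warnings seem to arise because splint treats the pointer handed to realloc as released. In standard C a failed realloc returns NULL and leaves the original block allocated and unchanged, so using buffer on that path is legitimate. A minimal sketch of the idiom follows; grow_buffer is a hypothetical helper for illustration, not part of ggets.c.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of ggets.c: try to resize *bufp to
   newsize bytes. On failure the old block is untouched and still
   owned by the caller, so *bufp stays valid.
   Returns 1 on success, 0 on failure. */
static int grow_buffer(char **bufp, size_t newsize)
{
    char *temp = realloc(*bufp, newsize);

    if (temp == NULL)
        return 0;       /* *bufp still points at the old block */
    *bufp = temp;       /* overwrite only once realloc succeeded */
    return 1;
}
```

Assigning the result of realloc straight back to the only copy of the pointer would leak the old block on failure, which is why the code goes through temp.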

I haven't really thrashed this yet, so am using the Microsoft
style testbed. No warranties, express or implied.


/* File ggets.h - goodgets is a safe alternative to gets */
/* By C.B. Falconer. Public domain 2002-06-22 */
/* attribution appreciated. */

/* fggets and ggets [which is fggets(stdin)] return a buffer
   filled with the next complete line from the text stream f.
   The storage has been allocated within fggets, and is
   normally reduced to be an exact fit. The trailing \n has
   been removed, so the resultant line is ready for dumping
   with puts. If an allocation or file error occurs fggets
   returns the NULL pointer. The buffer will be as large as
   is required to hold the complete line.

   Freeing of assigned storage is the caller's responsibility
*/

#ifndef ggets_h_
# define ggets_h_

# ifdef __cplusplus
extern "C" {
# endif

/*@NULL@*/ /* <-- for splint */
char *fggets(FILE *f);

#define ggets() fggets(stdin)

# ifdef __cplusplus
}
# endif
#endif
/* END ggets.h */

================================

/* File ggets.h - goodgets is a safe alternative to gets */
/* By C.B. Falconer. Public domain 2002-06-22 */
/* attribution appreciated. */

/* fggets and ggets [which is fggets(stdin)] return a buffer
   filled with the next complete line from the text stream f.
   The storage has been allocated within fggets, and is
   normally reduced to be an exact fit. The trailing \n has
   been removed, so the resultant line is ready for dumping
   with puts. If an allocation or file error occurs fggets
   returns the NULL pointer. The buffer will be as large as
   is required to hold the complete line.

   Freeing of assigned storage is the caller's responsibility
*/

#include <stdio.h>
#include <string.h> /* strchr */
#include <stdlib.h>
#include "ggets.h"

#define INITSIZE 112 /* power of 2 minus 16, helps malloc */
#define DELTASIZE (INITSIZE + 16)

/*@NULL@*/ /* <-- for splint */
char *fggets(FILE *f)
{
   int cursize, rdsize;
   char *buffer, *temp, *rdpoint;

   if (NULL == (buffer = malloc(INITSIZE)))
      return NULL;
   cursize = rdsize = INITSIZE;
   rdpoint = buffer;

   if (NULL == fgets(rdpoint, rdsize, f)) {
      free(buffer);
      return NULL;
   }
   /* initial read succeeded, now decide about expansion */
   while (NULL == (temp = strchr(rdpoint, '\n'))) {
      /* line is not completed, expand */

      /* set up cursize, rdpoint and rdsize, expand buffer */
      rdsize = DELTASIZE + 1; /* allow for a final '\0' */
      cursize += DELTASIZE;
      if (NULL == (temp = realloc(buffer, (size_t)cursize))) {
         /* ran out of memory */
         return buffer; /* partial line, next call will fail */
      }
      buffer = temp;
      /* Read into the '\0' up */
      rdpoint = buffer + (cursize - DELTASIZE - 1);

      /* get the next piece of this line */
      if (NULL == fgets(rdpoint, rdsize, f)) {
         free(buffer); /* fouled, read error */
         return NULL;
      }
   } /* while line not complete */

   *temp = '\0'; /* mark line end, strip \n */
   rdsize = temp - buffer;
   if (NULL == (temp = realloc(buffer, (size_t)rdsize + 1))) {
      return buffer; /* without reducing it */
   }

   return temp;
} /* fggets */
/* End of ggets.c */

================================

/* file tggets.c - testing ggets() */

#include <stdio.h>
#include <stdlib.h>
#include "ggets.h"

int main(int argc, char **argv)
{
   FILE *infile;
   char *line;

   if (argc == 2)
      if ((infile = fopen(argv[1], "r"))) {
         while ((line = fggets(infile))) {
            puts(line);
            free(line);
         }
         return 0;
      }
   puts("Usage: tggets filetodisplay");
   return EXIT_FAILURE;
} /* main */
/* End of tggets.c */

--
Chuck F (cbfal...@yahoo.com) (cbfal...@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

CBFalconer

Jun 23, 2002, 2:05:19 AM
CBFalconer wrote:
>
> Since people have the urge to use gets because of the simplicity
> of the call, here is a version that is perfectly safe. Note the
> simple file copying loop in the test program.
>
> Have at it. splint complains about puts and also has invalid
> complaints about using buffer after realloc fails.

I have thrashed this further, and am satisfied it functions as
advertised. At any rate I have mounted it, together with the
testing programs, at:

<http://cbfalconer.home.att.net/download>

Pat Foley

Jun 24, 2002, 11:32:54 PM

CBFalconer <cbfal...@yahoo.com> writes:

> Since people have the urge to use gets because of the simplicity
> of the call, here is a version that is perfectly safe. Note the
> simple file copying loop in the test program.

Well, maybe it's safe, but is it actually useful?

> ================================
>
> /* File ggets.h - goodgets is a safe alternative to gets */

          ^
This is ggets.c I'm quoting, not ggets.h...

Would you really use this?

Here's a library function that, when told to go get a string, will
happily slurp up all the memory available to it. The caller is given
no control whatsoever over storage allocation. And in this particular
version, fggets will claw its way up to, say, 80MB, 128 bytes at a
time. That would be roughly 640K calls to realloc().
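
The count is easy to check with a few lines of C. This sketch takes 80MB to mean 80 * 1024 * 1024 bytes and uses the INITSIZE (112) and DELTASIZE (128) values from ggets.c; the function names are made up for illustration, and the doubling variant is there only for comparison.

```c
/* Number of realloc calls needed to grow from init bytes to at
   least target bytes when each call adds delta bytes. */
static unsigned long reallocs_linear(unsigned long init,
                                     unsigned long delta,
                                     unsigned long target)
{
    if (target <= init)
        return 0;
    return (target - init + delta - 1) / delta;  /* ceiling division */
}

/* Same, when each call doubles the current size (init must be > 0). */
static unsigned long reallocs_doubling(unsigned long init,
                                       unsigned long target)
{
    unsigned long n = 0;

    while (init < target) {
        init *= 2;
        n++;
    }
    return n;
}
```

With init 112, delta 128 and a target of 80 * 1024 * 1024, the linear scheme needs 655,360 calls and doubling needs 20.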

I don't think this really does what we want. This is how I see it:

1. gets() is bad, because it does no bounds-checking;

2. fgets() is good because it does, but

3. there's the '\n' inconvenience to deal with, and,
more importantly

4. sometimes legitimate input might go slightly over the
limit imposed by the caller, so we want to be able
to dynamically expand our storage.

I don't think this leads to expanding endlessly. I think the caller
should still be able to put an absolute upper bound on how much
storage it's willing to throw at a given string. We could just call
fgets() with the upper limit (maybe 32K or something) and resize
downward on success, but that seems awfully wasteful. So how about
something like this:

char *ggets(size_t abs_max);

Then we use your seed & grow mechanism, just making sure we stay under
abs_max. At that point we go ahead & truncate as fgets would do
because by now we're dealing with unreasonable input. (I'm also
thinking maybe we should try to get to abs_max faster, maybe doubling
each time we expand instead of adding a fixed number of bytes, so we
have fewer calls to realloc(), but maybe I'm wrong about that.) And of
course we resize downward on the way out.
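
A rough sketch of what such a bounded reader might look like. Everything here is an assumption made for illustration: the name fggets_max, the FILE * parameter, the 16-byte seed, and the use of getc rather than fgets. It doubles the buffer, never exceeds abs_max bytes, truncates like fgets when the cap is reached, and shrinks to fit on the way out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bounded variant. Reads one line from f into a
   malloc'd buffer of at most abs_max bytes, '\0' included. A longer
   line is truncated and the rest stays in the stream. Returns NULL
   at EOF with nothing read, on allocation failure, or if
   abs_max < 2. */
char *fggets_max(FILE *f, size_t abs_max)
{
    size_t cap = 16, len = 0;       /* 16 is an arbitrary seed size */
    char *buf, *temp;
    int ch = EOF;

    if (abs_max < 2)
        return NULL;
    if (cap > abs_max)
        cap = abs_max;
    if ((buf = malloc(cap)) == NULL)
        return NULL;

    while (len + 1 < abs_max && (ch = getc(f)) != EOF && ch != '\n') {
        if (len + 1 == cap) {       /* full: double, capped at abs_max */
            cap = (cap > abs_max / 2) ? abs_max : cap * 2;
            if ((temp = realloc(buf, cap)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = temp;
        }
        buf[len++] = (char)ch;
    }
    if (len == 0 && ch == EOF) {    /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if ((temp = realloc(buf, len + 1)) != NULL)  /* shrink to fit */
        buf = temp;
    return buf;
}
```

A caller that hits the cap gets the first abs_max - 1 characters and sees the remainder of the line on the next call, which is the same kind of truncation fgets performs.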

What do you think?

Pat

CBFalconer

Jun 25, 2002, 12:48:36 AM
Pat Foley wrote:
> CBFalconer <cbfal...@yahoo.com> writes:
>
> > Since people have the urge to use gets because of the simplicity
> > of the call, here is a version that is perfectly safe. Note the
> > simple file copying loop in the test program.
>
> Well, maybe it's safe, but is it actually useful?
>
> > /* fggets and ggets [which is fggets(stdin)] return a buffer
> > filled with the next complete line from the text stream f.
> > The storage has been allocated within fggets, and is
> > normally reduced to be an exact fit. The trailing \n has
> > been removed, so the resultant line is ready for dumping
> > with puts. If an allocation or file error occurs fggets
> > returns the NULL pointer. The buffer will be as large as
> > is required to hold the complete line.
> >
> > Freeing of assigned storage is the caller's responsibility
> > */
> > char *fggets(FILE *f)
>
... snip code ...

To make it grow indefinitely the input has to supply something
that doesn't include a newline and yet continues indefinitely.
For stdin input I suspect that is actually impossible on most
installations. That leaves input from a disk file. I think it
would be a challenge to find such a file that fails to have a \n
within its first 80 megabytes.

I deliberately used the linear growth mechanism, because this is
expected to deal with text, and I expect that the most prevalent
inputs will require no expansion. It might have an interesting
effect on crackers who spend their time carefully typing in long
lines to find the point at which the system crashes, thus
detecting an unsafe program.

The fgets \n inconvenience is more than trivial. It requires the
user be aware, which most aren't. It then requires he take steps,
to leave the input stream in a consistent state (i.e. how and when
to flush the input line).
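
Written out with plain fgets, those steps look roughly like this. It is a sketch of the usual idiom rather than code from this thread; read_line_fixed is a hypothetical name, and size is assumed to be at least 2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes) with
   fgets, strip the '\n', and if the line did not fit, discard the
   remainder so the next read starts on a fresh line.
   Returns 1 for a complete line, 0 if characters were discarded,
   EOF when nothing could be read. */
int read_line_fixed(char *buf, int size, FILE *f)
{
    char *nl;
    int ch, lost = 0;

    if (fgets(buf, size, f) == NULL)
        return EOF;
    if ((nl = strchr(buf, '\n')) != NULL) {
        *nl = '\0';                 /* complete line: drop the '\n' */
        return 1;
    }
    if (strlen(buf) + 1 < (size_t)size)
        return 1;                   /* EOF ended an unterminated line */
    /* Buffer is full with no '\n': flush the rest of the line. */
    while ((ch = getc(f)) != EOF && ch != '\n')
        lost = 1;
    return lost ? 0 : 1;
}
```

Every caller of fgets that wants whole lines ends up writing some version of this, which is the burden being described.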

Use of fgets (and gets) requires that the unaware user supply a
suitable buffer. Many don't. Just scan this newsgroup for
plaintive beginner questions about how my absolutely perfect code
crashes sometimes. For fgets they also have to know the size of
the buffer, which often results in taking the sizeof a pointer.
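
The sizeof mistake is easy to reproduce. In this illustrative sketch (the function names are invented), the same 100-byte buffer reports two different sizes depending on where sizeof is applied.

```c
#include <stddef.h>

/* sizeof applied to an array gives the array's size in bytes. */
size_t size_seen_in_caller(void)
{
    char line[100];

    return sizeof line;             /* 100 */
}

/* Once the buffer has been passed to a function it is only a
   pointer, so sizeof gives the size of the pointer itself
   (commonly 4 or 8), not the size of the buffer. */
size_t size_seen_in_callee(char *line)
{
    return sizeof line;             /* sizeof(char *) */
}
```

Inside size_seen_in_callee, a call such as fgets(line, sizeof line, stdin) would read at most sizeof(char *) - 1 characters, whatever the real buffer size.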

So my attitude is 'do away with all that'. I admit that you have
found a way to make it hog everything available, but an infinite
loop can do that quite nicely too.

A further point is that, if implemented at the system level, with
full knowledge of what is going on under the hood, it can be very
efficient. Input lines are normally limited to something to allow
the input line editor to function, and this probably happens
within some sort of input interrupt routine. The underlying
system knows the size, and can happily allocate only one receive
buffer for the edited line in one fell swoop. At the user level
we don't have this ability.

Even without this a good malloc system will simply expand the
input buffer, without doing any data moving, on most of the
reallocs. I know the one in my nmalloc code will.

Experiment at your console. Just hold down some key for input,
and see how long a line you can generate from the keyboard. I am
morally certain you will find a limit.

/* FILE longline.c */
#include <stdio.h>
#include <string.h>
#include "ggets.h"

int main(void)
{
   char * ln;

   ln = ggets();
   printf("\nlinelength was %d\n", (int)strlen(ln));
   return 0;
}

On my machine this spat out 127. First there was much beeping and
input was rejected until I finally hit return. From a file things
can get much longer. When I made a file with 530 a's and a
newline, I got back 530 with "longline <longline.txt". I also
tried it on 4 binaries, and never got anything over 5.

Thanks for the comments. I hope I have answered satisfactorily.
NOW what do you think?

Pat Foley

Jun 25, 2002, 3:15:29 PM
CBFalconer <cbfal...@yahoo.com> writes:

> To make it grow indefinitely the input has to supply something
> that doesn't include a newline and yet continues indefinitely.
> For stdin input I suspect that is actually impossible on most
> installations. That leaves input from a disk file. I think it
> would be a challenge to find such a file that fails to have a \n
> within its first 80 megabytes.

No, no, "80MB" was just an example. Your function will use whatever
memory is _available_ to the process calling it, and that might be a
whole lot less than 80MB. Big is not really the problem. Using all of
whatever's available is the problem.

> I deliberately used the linear growth mechanism, because this is
> expected to deal with text, and I expect that the most prevalent
> inputs will require no expansion. It might have an interesting

Yes. I think we agree what the usual case would be and we intend not
to punish the usual case.

> effect on crackers who spend their time carefully typing in long
> lines to find the point at which the system crashes, thus
> detecting an unsafe program.

Um, do you actually know what that effect would be? (See below.)

> The fgets \n inconvenience is more than trivial. It requires the
> user be aware, which most aren't. It then requires he take steps,
> to leave the input stream in a consistent state (i.e. how and when
> to flush the input line).

See, I dunno -- you're trying to devise a library function that's
idiot-proof? No way. I think you've just replaced one POLA violation
with another: gets() will cheerfully overrun the buffer passed to it;
your ggets() will cheerfully use whatever memory it can get its hands
on. When a simple library function does that I think something's
wrong.

[snip]


> Experiment at your console. Just hold down some key for input,
> and see how long a line you can generate from the keyboard. I am
> morally certain you will find a limit.
>
> /* FILE longline.c */
> #include <stdio.h>
> #include <string.h>
> #include "ggets.h"
>
> int main(void)
> {
> char * ln;
>
> ln = ggets();
> printf("\nlinelength was %d\n", (int)strlen(ln));
> return 0;
> }
>
> On my machine this spat out 127. First there was much beeping and
> input was rejected until I finally hit return. From a file things
> can get much longer. When I made a file with 530 a's and a
> newline, I got back 530 with "longline <longline.txt". I also
> tried it on 4 binaries, and never got anything over 5.

Dude, tell me this is not how you've tested this?! It makes me wonder
if you've ever actually executed the code for when memory runs
out...

> Thanks for the comments. I hope I have answered satisfactorily.

Not really.

Let me ask you this: have you done any tests which would have failed
if, instead of using ggets() you had used bad old gets() with a buffer
of say 1024 bytes?

You see, if your argument is that it'll almost always work, that's
true of gets() too.

If your argument is that there's almost always a newline at some
point, all you need is to pass a big enough buffer to gets().

Or if you really like yours better but you're convinced the
deviant cases just won't arise "in practice" or something, why not
just do something like "#define ABSOLUTE_MAX 32768" right in the
source, check against it, document it, and assume it'll never kick in?

But that doesn't seem right does it? If the caller knows what kind of
input it's going to deal with & knows it has the resources, why not
let it read strings as big as it wants? (Of course that's what my
version would do...) (And of course you could let the calling program
dick around with ABSOLUTE_MAX, but why not just give in and pass it as
a parameter?)

I've compiled it & tested it a little, but don't have time at the
moment to do a proper job. Did you test on files or input with no
newlines at all? I got no output back at all from tggets. None at
all. But figuring out why will have to wait until later tonight...

(I'll also test on input bigger than available memory, so we can
compare to other approaches.)

All for now,

Pat

E. Gibbons

Jun 25, 2002, 4:43:26 PM
In article <3D17F587...@yahoo.com>,

CBFalconer <cbfal...@worldnet.att.net> wrote:
>Pat Foley wrote:
>> CBFalconer <cbfal...@yahoo.com> writes:
>>
>> > Since people have the urge to use gets because of the simplicity
>> > of the call, here is a version that is perfectly safe. Note the
>> > simple file copying loop in the test program.
>>
>> Well, maybe it's safe, but is it actually useful?
>>
[...]

>
>To make it grow indefinitely the input has to supply something
>that doesn't include a newline and yet continues indefinitely.
>For stdin input I suspect that is actually impossible on most
>installations. That leaves input from a disk file. I think it
>would be a challenge to find such a file that fails to have a \n
>within its first 80 megabytes.

Counterexamples:

./your_program < /dev/zero # stdin, no '\n', continues indefinitely

dd if=/dev/zero of=file bs=1 count=80M # big disk file with no '\n'

The latter is actually of practical use sometimes, even. The former might
occur in testing (in fact, it's one of the first things I'd try if I was
"stress-testing" a program which claimed to handle indefinite input).

--Ben

--

CBFalconer

Jun 25, 2002, 6:42:25 PM
"Douglas A. Gwyn" wrote:

> CBFalconer wrote:
>
> > Since people have the urge to use gets because of the simplicity
> > of the call, here is a version that is perfectly safe.
>
> I applaud your making this available, but I have to take issue
> with the characterization as "safe":

>
> > Note the simple file copying loop in the test program.
> > while (temp = ggets()) {
> > puts(temp);
> > free(temp);
> > }
>
> What happens if ggets() cannot allocate the buffer? The loop
> terminates the same as if EOF had been encountered, thereby
> silently truncating the copy of the file.

Well, safe in that it cannot write wildly into forbidden areas.

>
> I would prefer that ggets() have the added feature of throwing
> an exception upon error, or at least change the interface so
> as to have more than a 1-bit status return:
> while (status = ggets(&temp)) { // no tuples in C, alas
> puts(temp);
> free(temp);
> }
> if (status != EOF )
> handle_error(status);
>
> The idea that malloc()ing functions should be passed a handle
> so that failure to test for success will be flagged by lint etc.
> is a useful one, although it reduces slightly the compiler's
> ability to make best use of registers.

My initial reaction was that that disturbed the simplicity of the
call, and thus encouraged misuse. On second thoughts I think you
are right, and I will change it accordingly in the next few days.
However I expect to have it return 0 on success, EOF for EOF, and
something else, positive, for allocation problems or other errors.

Thus the proposed parent function prototype will be:

int fggets(char* *buffer, FILE *f);

It has the added advantage of not losing the returned pointer by
accident. Unfortunately it doesn't read quite as well from the
point of view of the tyro.

That would make the condition above:

while (0 == (status = ggets(&temp))) { ...
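
A rough sketch of how an interface along these lines might be written and called. This is not the revised ggets.c, which is not shown in this thread. The name fggets_status, the error value GG_NOMEM, building it on getc, and returning an unterminated final line as a normal line are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

#define GG_NOMEM 1   /* hypothetical positive error code */

/* Sketch only. Sets *ln to a malloc'd line without its '\n'.
   Returns 0 on success, EOF at end of file, GG_NOMEM when memory
   runs out. On any nonzero return *ln is NULL. A read error is
   reported as EOF here; a fuller version would check ferror(f). */
int fggets_status(char **ln, FILE *f)
{
    size_t cap = 128, len = 0;
    char *buf, *temp;
    int ch;

    *ln = NULL;
    if ((buf = malloc(cap)) == NULL)
        return GG_NOMEM;
    while ((ch = getc(f)) != EOF && ch != '\n') {
        if (len + 1 == cap) {           /* keep room for the '\0' */
            if ((temp = realloc(buf, cap + 128)) == NULL) {
                free(buf);
                return GG_NOMEM;
            }
            buf = temp;
            cap += 128;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {
        free(buf);
        return EOF;
    }
    buf[len] = '\0';
    *ln = buf;
    return 0;
}

/* Copy loop in the style discussed above: a copy cut short by an
   allocation failure is no longer mistaken for end of file.
   Returns 0 if the whole input was copied, else the error code. */
int copy_lines(FILE *in, FILE *out)
{
    char *line;
    int status;

    while (0 == (status = fggets_status(&line, in))) {
        fputs(line, out);
        putc('\n', out);
        free(line);
    }
    return (status == EOF) ? 0 : status;
}
```

The caller can no longer drop the pointer by accident, and the loop's exit status says why it stopped.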

I don't think exceptions are a good idea here.

I am amused by the idea of a cracker trying to find the overflow
point by entering ever lengthening lines at the terminal.
Especially if the system echoes back his completed input lines, so
that he continues to look for the end of the buffer.


--
comp.lang.c.moderated - moderation address: cl...@plethora.net

CBFalconer

Jun 25, 2002, 9:36:42 PM
Pat Foley wrote:
>
> CBFalconer <cbfal...@yahoo.com> writes:
>
... snip ...

>
> See, I dunno -- you're trying to devise a library function that's
> idiot-proof? No way. I think you've just replaced one POLA violation
> with another: gets() will cheerfully overrun the buffer passed to it;
> your ggets() will cheerfully use whatever memory it can get its hands
> on. When a simple library function does that I think something's
> wrong.

There is no such thing. Idiots can always find a way :-)

>
> [snip]
> > Experiment at your console. Just hold down some key for input,
> > and see how long a line you can generate from the keyboard. I am
> > morally certain you will find a limit.
> >
> > /* FILE longline.c */
> > #include <stdio.h>
> > #include <string.h>
> > #include "ggets.h"
> >
> > int main(void)
> > {
> > char * ln;
> >
> > ln = ggets();
> > printf("\nlinelength was %d\n", (int)strlen(ln));
> > return 0;
> > }
> >
> > On my machine this spat out 127. First there was much beeping and
> > input was rejected until I finally hit return. From a file things
> > can get much longer. When I made a file with 530 a's and a
> > newline, I got back 530 with "longline <longline.txt". I also
> > tried it on 4 binaries, and never got anything over 5.
>
> Dude, tell me this is not how you've tested this?! It makes me wonder
> if you've ever actually executed the code for when memory runs out...

Of course not. That was just something I threw together to
respond to you. The first level of testing is the design. After
that - well you can see what I ran if you download the package -
see my website in the sig, download section.

I am planning to change the interface slightly - see Doug Gwyn's
comments in c.l.c.moderated. This will give more detailed error
status and remove the temptation to lose the returned pointer.

int fggets(char* *line, FILE *f);

with a normal 0 returned, otherwise EOF or positive error
indicator.

... snip ...


>
> If your argument is that there's almost always a newline at some
> point, all you need is to pass a big enough buffer to gets().

No, that WILL use up all the memory and fail at points that could
have been handled. There is no such thing as a big enough gets
buffer. Any machinery is finite and will fail at some point - the
point is how will it fail and what will it take with it.

... snip ...


>
> I've compiled it & tested it a little, but don't have time at the
> moment to do a proper job. Did you test on files or input with no
> newlines at all? I got no output back at all from tggets. None at
> all. But figuring out why will have to wait until later tonight...
>
> (I'll also test on input bigger than available memory, so we can
> compare to other approaches.)

See above on the tests. I welcome the thrashings.

CBFalconer

Jun 25, 2002, 9:36:44 PM

Can't try those here on this system - please do and let me know
what happens. Are text streams allowed to include the \0
character? (not that it affects the argument).

CBFalconer

Jun 26, 2002, 2:27:36 AM
CBFalconer wrote:
>
... snip ...

>
> I am planning to change the interface slightly - see Doug Gwyn's
> comments in c.l.c.moderated. This will give more detailed error
> status and remove the temptation to lose the returned pointer.
>
> int fggets(char* *line, FILE *f);
>
> with a normal 0 returned, otherwise EOF or positive error
> indicator.

This has now been done, and the mounted zip file updated at

<http://cbfalconer.home.att.net/download/ggets.zip>

Pat Foley

Jun 26, 2002, 2:48:30 AM
CBFalconer <cbfal...@yahoo.com> writes:

> I am planning to change the interface slightly - see Doug Gwyn's
> comments in c.l.c.moderated. This will give more detailed error
> status and remove the temptation to lose the returned pointer.
>
> int fggets(char* *line, FILE *f);
>
> with a normal 0 returned, otherwise EOF or positive error indicator.

Well I'll wait on the new version before looking at this anymore, but
while you're reworking it you have to change the main loop:

[quoting from ggets.c]


> while (NULL == (temp = strchr(rdpoint, '\n'))) {
> /* line is not completed, expand */

Nope. You can't assume not finding '\n' means the line is
incomplete. You don't test for EOF anywhere.

> /* set up cursize, rdpoint and rdsize, expand buffer */
> rdsize = DELTASIZE + 1; /* allow for a final '\0' */
> cursize += DELTASIZE;
> if (NULL == (temp = realloc(buffer, (size_t)cursize))) {
> /* ran out of memory */
> return buffer; /* partial line, next call will fail */
> }
> buffer = temp;
> /* Read into the '\0' up */
> rdpoint = buffer + (cursize - DELTASIZE - 1);
>
> /* get the next piece of this line */
> if (NULL == fgets(rdpoint, rdsize, f)) {

fgets() might return NULL here because we hit EOF without a newline...

> free(buffer); /* fouled, read error */

so you're throwing away the last line of the file if it doesn't have a
final newline. (And thus the whole file if it doesn't have a single
newline.)

> return NULL;
> }
> } /* while line not complete */

Or EOF.

ttyp0:~/src/bin/tggets% echo "This line has a newline." >test.data
ttyp0:~/src/bin/tggets% echo -n "This line doesn't.">>test.data
ttyp0:~/src/bin/tggets% cat test.data
This line has a newline.
This line doesn't.ttyp0:~/src/bin/tggets% ./tggets test.data
This line has a newline.
ttyp0:~/src/bin/tggets%

Oops.

That's why when I tested on input with no newlines at all I got
nothing back.
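
One possible way to keep that last line is to look at how the fgets loop ended. The sketch below is a hypothetical rewrite, not Falconer's code: once the first fgets has succeeded, running out of input is treated as the end of an unterminated final line rather than as an error. The name fggets_keep_last and the 128-byte chunk size are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 128   /* arbitrary growth step for this sketch */

/* Returns a malloc'd line without its '\n', or NULL at EOF with
   nothing read, on a read error, or when memory runs out. */
char *fggets_keep_last(FILE *f)
{
    size_t cap = CHUNK, len;
    char *buf, *temp, *nl;

    if ((buf = malloc(cap)) == NULL)
        return NULL;
    if (fgets(buf, (int)cap, f) == NULL) {  /* EOF or error, no data */
        free(buf);
        return NULL;
    }
    while ((nl = strchr(buf, '\n')) == NULL) {
        len = strlen(buf);
        if (len + 1 < cap)          /* buffer not full: EOF ended line */
            break;
        if ((temp = realloc(buf, cap + CHUNK)) == NULL) {
            free(buf);
            return NULL;
        }
        buf = temp;
        cap += CHUNK;
        if (fgets(buf + len, (int)(cap - len), f) == NULL) {
            if (feof(f))
                break;              /* EOF exactly at a chunk boundary */
            free(buf);              /* genuine read error */
            return NULL;
        }
    }
    if (nl != NULL)
        *nl = '\0';                 /* strip the '\n' if there was one */
    return buf;
}
```

Run against the test.data file above, a reader built this way would return both lines before reporting end of file.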

Pat

Pat Foley

Jun 26, 2002, 11:45:28 AM
CBFalconer <cbfal...@yahoo.com> writes:

> /* Revised 2002-06-26 New prototype.
> */
>
> /* fggets and ggets [which is fggets(ln, stdin)] set *ln to
> a buffer filled with the next complete line from the text
> stream f. The storage has been allocated within fggets,
> and is normally reduced to be an exact fit. The trailing
> \n has been removed, so the resultant line is ready for
> dumping with puts. The buffer will be as large as is
> required to hold the complete line.
>
> Note: this means a final file line without a \n terminator
> is considered an error, because EOF occurs within the read.

An "error"? That's bogus. You always have to check for the final
newline. What's more, I think this is a serious POLA violation because
your wrapper dramatically changes the behavior of
gets()/fgets(). Here's the definition from n869:

7.19.7.2 The fgets function

Synopsis

[#1]

#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);

Description

[#2] The fgets function reads at most one less than the
number of characters specified by n from the stream pointed
to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is
retained) or after end-of-file. A null character is written
immediately after the last character read into the array.

Returns

[#3] The fgets function returns s if successful. If end-of-
file is encountered and no characters have been read into
the array, the contents of the array remain unchanged and a
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
null pointer is returned. If a read error occurs during the
operation, the array contents are indeterminate and a null
pointer is returned.

Your wrapper (I think) should be mimicking this behaviour:

1. call to fgets()

2. Oops! My buffer must've been too small...

3. go back to the same point in the stream and call fgets()
again, but with a bigger buffer

(4. repeat until you come up with a big enough n)

You see what I mean?

> If no error occurs fggets returns 0. If an EOF occurs on
> the input file, EOF is returned. For memory allocation
> errors some positive value is returned. In this case *ln
> may point to a partial line. For other errors memory is
> freed and *ln is set to NULL.
>
> Freeing of assigned storage is the caller's responsibility
> */

Pat

CBFalconer

Jun 26, 2002, 5:45:36 PM
Pat Foley wrote:
> CBFalconer <cbfal...@yahoo.com> writes:
>
... snip ...

> > /* get the next piece of this line */
> > if (NULL == fgets(rdpoint, rdsize, f)) {
>
> fgets() might return NULL here because we hit EOF without a newline...
>
> > free(buffer); /* fouled, read error */
>
> so you're throwing away the last line of the file if it doesn't have a
> final newline. (And thus the whole file if it doesn't have a single
> newline.)

Yes. Quite justified by the standard, which says in effect that
text file lines without a final \n have implementation (or un)
defined effects.

Pat Foley

Jun 27, 2002, 1:10:59 AM
CBFalconer <cbfal...@yahoo.com> writes:

> Pat Foley wrote:
> > CBFalconer <cbfal...@yahoo.com> writes:
> >
> ... snip ...
>
> > > /* get the next piece of this line */
> > > if (NULL == fgets(rdpoint, rdsize, f)) {
> >
> > fgets() might return NULL here because we hit EOF without a newline...
> >
> > > free(buffer); /* fouled, read error */
> >
> > so you're throwing away the last line of the file if it doesn't have a
> > final newline. (And thus the whole file if it doesn't have a single
> > newline.)
>
> Yes. Quite justified by the standard, which says in effect that
> text file lines without a final \n have implementation (or un)
> defined effects.

Okay this is definitely not the answer I was expecting. Maybe I need
some help here (Dan? Ben?) but I don't think that's what the Standard
says at all...

From n869 (which is all I have at the moment):

7.19.2 Streams

[#1] Input and output, whether to or from physical devices
such as terminals and tape drives, or whether to or from
files supported on structured storage devices, are mapped
into logical data streams, whose properties are more uniform
than their various inputs and outputs. Two forms of mapping
are supported, for text streams and for binary streams.209)

[#2] A text stream is an ordered sequence of characters
composed into lines, each line consisting of zero or more
characters plus a terminating new-line character. Whether
the last line requires a terminating new-line character is
implementation-defined. Characters may have to be added,
altered, or deleted on input and output to conform to
differing conventions for representing text in the host
environment. [...]
____________________

209) An implementation need not distinguish between text
streams and binary streams. In such an implementation,
there need be no new-line characters in a text stream nor
any limit to the length of a line.

I think that says that if you rely on the last line of a text-file
having a terminating new-line, then your program is not strictly
conforming. Have I got that right?

And even if I'm wrong there, the fact remains that your fggets() loses
data that gets() and fgets() both preserve. Why on earth would you
want to do that?

Pat

CBFalconer

Jun 27, 2002, 1:45:57 AM
Pat Foley wrote:
> CBFalconer <cbfal...@yahoo.com> writes:
>
> > /* Revised 2002-06-26 New prototype.
> > */
> >
> > /* fggets and ggets [which is fggets(ln, stdin)] set *ln to
> > a buffer filled with the next complete line from the text
> > stream f. The storage has been allocated within fggets,
> > and is normally reduced to be an exact fit. The trailing
> > \n has been removed, so the resultant line is ready for
> > dumping with puts. The buffer will be as large as is
> > required to hold the complete line.
> >
> > Note: this means a final file line without a \n terminator
> > is considered an error, because EOF occurs within the read.
>
> An "error"? That's bogus. You always have to check for the final
> newline. What's more, I think this is a serious POLA violation because
> your wrapper dramatically changes the behavior of
> gets()/fgets(). Here's the definition from n869:

It is not a wrapper. It is a function which happens to have been
implemented in a portable manner by using fgets. It could be
implemented in other ways.

For a final line without a \n there is no point in looking for a
\n because there isn't one and won't ever be one, if EOF is
encountered. The choice is whether or not to manufacture one out
of air. Since fgets doesn't guarantee the buffer when an early
EOF is encountered, there is no choice remaining. If I built it
out of fgetc I could make some guarantees.

C stream text lines end with a \n. Every one. The lack is an
error.

Ben Pfaff

Jun 27, 2002, 2:05:48 AM
Pat Foley <pfo...@earthlink.net> writes:

> CBFalconer <cbfal...@yahoo.com> writes:
> > Yes. Quite justified by the standard, which says in effect that
> > text file lines without a final \n have implementation (or un)
> > defined effects.

C99 7.19.2 says that whether a text stream requires a new-line at
the end of the last line is implementation-defined:

2 A text stream is an ordered sequence of characters composed
into lines, each line consisting of zero or more characters
plus a terminating new-line character. Whether the last line
requires a terminating new-line character is
implementation-defined.

> Okay this is definitely not the answer I was expecting. Maybe I need
> some help here (Dan? Ben?) but I don't think that's what the Standard
> says at all...

[...snip pretty much what I quoted...]

> I think that says that if you rely on the last line of a text-file
> having a terminating new-line, then your program is not strictly
> conforming. Have I got that right?

That's so. For instance, under Unix, there is no difference
between text and binary streams and thus no requirement that a
text file end in a new-line.

There are two ways to look at this being implementation-defined:

* On input, you cannot depend on a text file ending in a
new-line.

* On output, you cannot assume that the system supports
writing a text file that does not end in a new-line.

In short, this is a case where, for the widest portability, you
must be liberal in what you accept, strict in what you generate.
If CBFalconer's code doesn't properly handle text files that
don't end in a new-line, it's not going to work everywhere or in
every situation.
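
A minimal sketch of the "liberal in what you accept" side, assuming the input is read with fgets into a fixed buffer. The names strip_newline and count_lines are hypothetical and are not part of ggets; the point is only that text after the last new-line is treated as a line rather than as an error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Remove one trailing '\n' from s, if there is one.
   Returns 1 if a new-line was removed, 0 if s had none. */
int strip_newline(char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
        return 1;
    }
    return 0;
}

/* Count the lines in fp; a final line with no '\n' still counts.
   A line longer than the buffer arrives from fgets in pieces, so a
   line is counted only when its closing piece has been seen. */
size_t count_lines(FILE *fp)
{
    char buf[64];
    size_t lines = 0;
    int pending = 0;    /* nonzero while inside an unfinished line */

    assert(fp != NULL);
    while (fgets(buf, (int)sizeof buf, fp) != NULL) {
        if (strip_newline(buf)) {
            lines++;
            pending = 0;
        } else {
            pending = 1;
        }
    }
    if (pending) {
        lines++;        /* text after the last '\n': accept it */
    }
    return lines;
}
```

On the output side the matching habit is simply to end every line written, including the last, with '\n'.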
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Ben Pfaff

unread,
Jun 27, 2002, 2:07:27 AM6/27/02
to
CBFalconer <cbfal...@yahoo.com> writes:

> C stream text lines end with a \n. Every one. The lack is an
> error.

Not true. For portability, text streams that a C program *writes*
must end in a new-line. But there is no guarantee that a text
stream *read* by a C program will end in a new-line.

Dan Pop

unread,
Jun 27, 2002, 6:20:39 AM6/27/02
to
In <868z517...@sparky.fbsd.home> Pat Foley <pfo...@earthlink.net> writes:

>I think that says that if you rely on the last line of a text-file
>having a terminating new-line, then your program is not strictly
>conforming. Have I got that right?

If your program is doing file input, it is strictly conforming only if
its output is not affected by the failure of any <stdio.h> input function.

This basically rules out any useful program that needs input.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

CBFalconer

unread,
Jun 27, 2002, 7:35:51 AM6/27/02
to

I am basing my decision on the guaranteed action of fgets, below:

[#1]

#include <stdio.h>
char *fgets(char * restrict s, int n,
FILE * restrict stream);

Description

[#2] The fgets function reads at most one less than the
number of characters specified by n from the stream pointed
to by stream into the array pointed to by s. No additional
characters are read after a new-line character (which is
retained) or after end-of-file. A null character is written
immediately after the last character read into the array.

Returns

[#3] The fgets function returns s if successful. If end-of-
file is encountered and no characters have been read into
the array, the contents of the array remain unchanged and a
null pointer is returned. If a read error occurs during the
operation, the array contents are indeterminate and a null
pointer is returned.

____________________

232) An end-of-file and a read error can be distinguished by
use of the feof and ferror functions.

Which, as I read it, guarantees nothing about the state of the
buffer when it hits EOF, which is what it will do on reading an
unterminated line. I can't even assume a terminal \0 is in
there. If I had prezeroed the buffer I could make some likely
guesses, but still nothing guaranteed.

If I was doing the reading with fgetc I could know something about
that final buffer. But that would normally be a horrendous
efficiency hit.

CBFalconer

unread,
Jun 27, 2002, 7:35:53 AM6/27/02
to

fgets() by your criterion doesn't work either. At any rate, all
this is criticism of my particular implementation of ggets. The
most important part is the interface and underlying action. It is
always possible to build one that does not have the unterminated
line bug, meanwhile it is documented.

As a reminder, this is intended to have the convenience of gets()
without the insecurities. I think it fills that bill.

BTW the proper place to fix the read action on an unterminated
line is in the file system proper, not the application level.
Anything else is throwing patch after patch.

Richard Heathfield

unread,
Jun 27, 2002, 11:13:42 AM6/27/02
to
CBFalconer wrote:
>
<snip>

>
> I am planning to change the interface slightly - see Doug Gwyn's
> comments in c.l.c.moderated. This will give more detailed error
> status and remove the temptation to lose the returned pointer.
>
> int fggets(char* *line, FILE *f);
>
> with a normal 0 returned, otherwise EOF or positive error
> indicator.
>
<snip>

> >
> > (I'll also test on input bigger than available memory, so we can
> > compare to other approaches.)
>
> See above on the tests. I welcome the thrashings.

Another problem with the interface is that there is no way to re-use an
existing buffer. The caller must manage every single return with a
matching free() at some point.

I suggest two extra parameters:

1) a pointer to size_t, which tracks the current size of the buffer, so
that the existing memory can be re-used, with realloc only cutting in
when necessary, and

2) a size_t, which indicates the very highest amount of memory the user
is prepared to tolerate being used up (with 0, perhaps, indicating "go
all the way to the newline and damn the consequences").

--
Richard Heathfield : bin...@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton

Jeremy Yallop

unread,
Jun 27, 2002, 11:29:01 AM6/27/02
to
Richard Heathfield wrote:

> CBFalconer wrote:
> >
> > int fggets(char* *line, FILE *f);
>
> I suggest two extra parameters:
>
> 1) a pointer to size_t, which tracks the current size of the buffer, so
> that the existing memory can be re-used, with realloc only cutting in
> when necessary, and

Interestingly, at this point you have largely re-invented glibc's
getline():

[from the glibc documentation]

ssize_t getline (char **LINEPTR, size_t *N, FILE *STREAM)

This function reads an entire line from STREAM, storing the text
(including the newline and a terminating null character) in a
buffer and storing the buffer address in `*LINEPTR'.

Before calling `getline', you should place in `*LINEPTR' the
address of a buffer `*N' bytes long, allocated with `malloc'. If
this buffer is long enough to hold the line, `getline' stores the
line in this buffer. Otherwise, `getline' makes the buffer bigger
using `realloc', storing the new buffer address back in `*LINEPTR'
and the increased size back in `*N'.

If you set `*LINEPTR' to a null pointer, and `*N' to zero, before
the call, then `getline' allocates the initial buffer for you by
calling `malloc'

> 2) a size_t, which indicates the very highest amount of memory the user
> is prepared to tolerate being used up (with 0, perhaps, indicating "go
> all the way to the newline and damn the consequences").

getline() doesn't have this, of course.
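
A short usage sketch of getline() as documented above, reusing one buffer across the whole file. Assumptions: a system that declares getline in <stdio.h> (it was a GNU extension at the time of this thread and was later adopted by POSIX.1-2008, hence the feature-test macro), and longest_line is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX.1-2008 getline */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the length of the longest line in fp, not counting '\n'.
   The same malloc'd buffer is handed back to getline on every call,
   so realloc only happens when a line outgrows it. */
size_t longest_line(FILE *fp)
{
    char *line = NULL;  /* NULL and 0: getline allocates for us */
    size_t cap = 0;
    ssize_t len;
    size_t longest = 0;

    assert(fp != NULL);
    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n') {
            len--;      /* getline keeps the new-line; ignore it */
        }
        if ((size_t)len > longest) {
            longest = (size_t)len;
        }
    }
    free(line);         /* one free at the end, not one per line */
    return longest;
}
```

Because getline keeps the new-line, a final line without one is still returned; it just has no '\n' to strip.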

Jeremy.

Ben Pfaff

unread,
Jun 27, 2002, 11:27:19 AM6/27/02
to
CBFalconer <cbfal...@yahoo.com> writes:

I really don't know how you're managing to misread the above
description. Please read it again. What it says is that fgets()
reads a line up to a new-line character or end-of-file, and then
adds a null terminator. The only case where it "guarantees
nothing" is when it hits end-of-file and *no characters have yet
been read*; in other words, when the stream was at end-of-file
before fgets() was called. Notice also that hitting end-of-file
is not a read error.

> I can't even assume a terminal \0 is in
> there. If I had prezeroed the buffer I could make some likely
> guesses, but still nothing guaranteed.

???

> If I was doing the reading with fgetc I could know something about
> that final buffer. But that would normally be a horrendous
> efficiency hit.

???

Ben Pfaff

unread,
Jun 27, 2002, 11:30:12 AM6/27/02
to
CBFalconer <cbfal...@yahoo.com> writes:

> > There are two ways to look at this being implementation-defined:
> >
> > * On input, you cannot depend on a text file ending in a
> > new-line.
> >
> > * On output, you cannot assume that the system supports
> > writing a text file that does not end in a new-line.
> >
> > In short, this is a case where, for the widest portability, you
> > must be liberal in what you accept, strict in what you generate.
> > If CBFalconer's code doesn't properly handle text files that
> > don't end in a new-line, it's not going to work everywhere or in
> > every situation.
>
> fgets() by your criterion doesn't work either.

Yes it does. Re-read the description of fgets() in the
standard. You are misreading it somehow.

> At any rate, all
> this is criticism of my particular implementation of ggets. The
> most important part is the interface and underlying action. It is
> always possible to build one that does not have the unterminated
> line bug, meanwhile it is documented.

The unterminated line bug is unacceptable IMO. I would never use
such a function in my own program. You are free to use it in
your programs of course, but you're free to get the user
complaints, too.

> As a reminder, this is intended to have the convenience of gets()
> without the insecurities. I think it fills that bill.

The unterminated line bug is a big insecurity from my point of
view at least.

> BTW the proper place to fix the read action on an unterminated
> line is in the file system proper, not the application level.
> Anything else is throwing patch after patch.

So you think that the file system should clean up after your
bugs? I can see Alexander Viro and Ted Ts'o just *jumping* to
add this code to VFS and/or ext2.
--
"Am I missing something?"
--Dan Pop

Pat Foley

unread,
Jun 27, 2002, 11:57:01 AM6/27/02
to
CBFalconer <cbfal...@yahoo.com> writes:

Yes it does, and it doesn't rely on implementation-defined behaviour.


[...]

> BTW the proper place to fix the read action on an unterminated
> line is in the file system proper, not the application level.
> Anything else is throwing patch after patch.

An implementation may choose not to require the last line of a
text-file to have a terminating new-line and still be conforming.

Pat

Pat Foley

unread,
Jun 27, 2002, 11:57:05 AM6/27/02
to
CBFalconer <cbfal...@yahoo.com> writes:

No. Encountering end-of-file is not a read error. Not when it comes
first (fgets() returns a null pointer) and not when it comes last
(fgets() returns s).

Because the standard discusses them separately, it would have to say
explicitly that encountering end-of-file is a read error. It does not.

> I can't even assume a terminal \0 is in there.

Yes you can.

Pat

Pat Foley

unread,
Jun 27, 2002, 12:11:05 PM6/27/02
to
Ben Pfaff <b...@cs.stanford.edu> writes:

ITYM it guarantees nothing on a read error. For immediate end-of-file
fgets() returns a null pointer but promises not to mess with the
buffer passed to it at all.

Pat

CBFalconer

unread,
Jun 27, 2002, 12:33:30 PM6/27/02
to
Richard Heathfield wrote:
> CBFalconer wrote:
> >
> <snip>
> >
> > I am planning to change the interface slightly - see Doug Gwyn's
> > comments in c.l.c.moderated. This will give more detailed error
> > status and remove the temptation to lose the returned pointer.
> >
> > int fggets(char* *line, FILE *f);
> >
> > with a normal 0 returned, otherwise EOF or positive error
> > indicator.
> >
> <snip>
> > >
> > > (I'll also test on input bigger than available memory, so we can
> > > compare to other approaches.)
> >
> > See above on the tests. I welcome the thrashings.
>
> Another problem with the interface is that there is no way to re-use
> an existing buffer. The caller must manage every single return with
> a matching free() at some point.

I look on that as a plus. The user doesn't have to remember
something different depending on whatever. I have already
complicated the interface for better error reporting.

>
> I suggest two extra parameters:
>
> 1) a pointer to size_t, which tracks the current size of the buffer, so
> that the existing memory can be re-used, with realloc only cutting in
> when necessary, and

This would be pointless, because the function doesn't return until
it is done. Or am I missing something? Once you start adding
parameters you make the user think, which is a grand opportunity
to go wrong. It is already hard enough just to remember which end
of the parameter list holds the FILE* :-) (at least for me).

>
> 2) a size_t, which indicates the very highest amount of memory the user
> is prepared to tolerate being used up (with 0, perhaps, indicating "go
> all the way to the newline and damn the consequences").

We could wire in some large limit. This would have the effect, at
the user end, of effectively splitting long lines into multiple
lines, with no indication that this had occurred. I don't like
it. The subjective effect is to force a \n into the input stream
every so many characters. This smells like MS Word.

If it does actually use all the memory (which I consider extremely
unlikely) there are two possibilities.

a) it has a complete line, and the problem will show up later when
the user wants to allocate something more and

b) it did not complete the line, when it has already freed that
memory and reported an error. Apart from the state of the input
stream the system is in the same state as it would have been
without the ggets call.

More comments welcomed.

Ben Pfaff

unread,
Jun 27, 2002, 12:35:17 PM6/27/02
to
Pat Foley <pfo...@earthlink.net> writes:

[fgets()]

> ITYM it guarantees nothing on a read error. For immediate end-of-file
> fgets() returns a null pointer but promises not to mess with the
> buffer passed to it at all.

True; sorry about that. At any rate, end-of-file does not cause
fgets() to do anything strange or undefined or even
implementation-defined.
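
The cases discussed above can be told apart with a few lines of standard C. This is only a sketch: read_piece and the enum names are hypothetical, and the comments restate the fgets description quoted earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum read_result {
    READ_LINE,      /* a complete line, '\n' retained */
    READ_PARTIAL,   /* text with no '\n': buffer full, or EOF cut it short */
    READ_EOF,       /* end-of-file before any character: buf untouched */
    READ_ERROR      /* read error: buf contents indeterminate */
};

/* Call fgets once and report which of the four outcomes occurred. */
enum read_result read_piece(char *buf, int n, FILE *fp)
{
    assert(buf != NULL && n > 1 && fp != NULL);
    if (fgets(buf, n, fp) == NULL) {
        /* NULL means either immediate end-of-file or a read error */
        return ferror(fp) ? READ_ERROR : READ_EOF;
    }
    /* fgets succeeded, so buf is a properly terminated string */
    return strchr(buf, '\n') != NULL ? READ_LINE : READ_PARTIAL;
}
```

For a stream holding "ab\ncd", three calls give READ_LINE, READ_PARTIAL (with "cd" null-terminated in the buffer) and READ_EOF, and the buffer still holds "cd" after the last call.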

CBFalconer

unread,
Jun 27, 2002, 1:21:52 PM6/27/02
to
Ben Pfaff wrote:
> CBFalconer <cbfal...@yahoo.com> writes:
>
... snip ...

It looks as if the consensus is that I am misinterpreting the exit
conditions of fgets() and that I should rework my behavior on
premature EOF.

The result will be that a file copied with ggets/puts may have a
terminal \n appended to it. Obviously people consider that a
better ending than losing the final line.

Simon Biber

unread,
Jun 27, 2002, 4:33:31 PM6/27/02
to
"CBFalconer" <cbfal...@yahoo.com> wrote:
> It is already hard enough just to remember which end
> of the parameter list holds the FILE* :-) (at least for me).

My general rule is:

If it's a variadic function (or the "v" form of one), the FILE*
goes at the front. Otherwise the FILE* goes at the end.

However that doesn't work for:
setbuf
setvbuf
fgetpos
fsetpos
fseek
Thankfully I don't often use any of those exceptions.

I have used fseek once, while reading a 1.3 GB graphics file
that I wanted to read only part of. It took too long to
"for(i=0;i<n;i++)getc(fp);" my way through it!
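
For illustration, the rule and one of its exceptions side by side, plus the fseek shortcut. The helper names write_twice and byte_at are hypothetical; the library calls are standard.

```c
#include <assert.h>
#include <stdio.h>

/* Write s twice, once through each argument-order convention. */
int write_twice(FILE *fp, const char *s)
{
    assert(fp != NULL && s != NULL);
    if (fprintf(fp, "%s", s) < 0) {     /* variadic: FILE* comes first */
        return -1;
    }
    if (fputs(s, fp) == EOF) {          /* not variadic: FILE* comes last */
        return -1;
    }
    return 0;
}

/* Return the byte at offset pos, or EOF on failure. One fseek call
   replaces a loop of pos getc calls. */
int byte_at(FILE *fp, long pos)
{
    assert(fp != NULL && pos >= 0);
    if (fseek(fp, pos, SEEK_SET) != 0) {    /* exception: FILE* first */
        return EOF;
    }
    return getc(fp);
}
```

Byte offsets like this are only dependable on binary streams; on a text stream the standard restricts fseek offsets to zero or a value previously returned by ftell.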

--
Simon.


CBFalconer

unread,
Jun 27, 2002, 4:47:47 PM6/27/02
to

OK, you guys have caused significant improvement. ggets now
returns a final unterminated line, effectively injecting a
terminal \n if one is missing.

The result, dated 2002-06-27, is mounted (with added tests) as:

<http://cbfalconer.home.att.net/download/ggets.zip>
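
For readers without the zip file, here is a rough sketch of how the revised interface and the new end-of-file behavior could fit together on top of fgets. It is not the code in ggets.zip: the name fggets_sketch, the GG_CHUNK step and the particular positive error values (1 for no memory, 2 for a read error) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GG_CHUNK 128    /* hypothetical growth step */

/* Read one line into freshly allocated storage, '\n' removed.
   Returns 0 with *line set, EOF at end of input, 1 if memory ran
   out, 2 on a read error. *line is NULL unless 0 is returned. */
int fggets_sketch(char **line, FILE *f)
{
    size_t cap = GG_CHUNK;
    size_t len = 0;
    char *buf, *tmp;

    assert(line != NULL && f != NULL);
    *line = NULL;
    buf = malloc(cap);
    if (buf == NULL) {
        return 1;
    }
    for (;;) {
        if (fgets(buf + len, (int)(cap - len), f) == NULL) {
            if (ferror(f) || len == 0) {
                int status = ferror(f) ? 2 : EOF;
                free(buf);
                return status;
            }
            break;      /* EOF after some text: unterminated last line */
        }
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[--len] = '\0';
            break;      /* complete line, new-line stripped */
        }
        tmp = realloc(buf, cap + GG_CHUNK);     /* no '\n' yet: grow */
        if (tmp == NULL) {
            free(buf);
            return 1;
        }
        buf = tmp;
        cap += GG_CHUNK;
    }
    tmp = realloc(buf, len + 1);                /* trim to an exact fit */
    *line = (tmp != NULL) ? tmp : buf;
    return 0;
}
```

A copy loop then reads `while (0 == fggets_sketch(&line, f)) { puts(line); free(line); }`, and puts supplies the '\n' that a final unterminated line was missing.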

Richard Heathfield

unread,
Jun 27, 2002, 4:54:55 PM6/27/02
to
Jeremy Yallop wrote:
>
> Richard Heathfield wrote:
> > CBFalconer wrote:
> > >
> > > int fggets(char* *line, FILE *f);
> >
> > I suggest two extra parameters:
> >
> > 1) a pointer to size_t, which tracks the current size of the buffer, so
> > that the existing memory can be re-used, with realloc only cutting in
> > when necessary, and
>
> Interestingly, at this point you have largely re-invented glibc's
> getline():

Well, there ya go. I didn't know this.

>
> [from the glibc documentation]
>
> ssize_t getline (char **LINEPTR, size_t *N, FILE *STREAM)
>
> This function reads an entire line from STREAM, storing the text
> (including the newline and a terminating null character) in a
> buffer and storing the buffer address in `*LINEPTR'.

This seems to be the only important difference - my version (I say "my"
because one of the changes I suggested would have brought Chuck's
function into line with my own fgetline function, posted here some weeks
ago) removes the newline.

<snip>


>
> > 2) a size_t, which indicates the very highest amount of memory the user
> > is prepared to tolerate being used up (with 0, perhaps, indicating "go
> > all the way to the newline and damn the consequences").
>
> getline() doesn't have this, of course.

Neither does fgetline, but I am considering adding it.

Richard Heathfield

unread,
Jun 27, 2002, 4:59:45 PM6/27/02
to
CBFalconer wrote:
>
> Richard Heathfield wrote:

<snip>


> >
> > Another problem with the interface is that there is no way to re-use
> > an existing buffer. The caller must manage every single return with
> > a matching free() at some point.
>
> I look on that as a plus. The user doesn't have to remember
> something different depending on whatever. I have already
> complicated the interface for better error reporting.

I don't think three parameters are all that much to ask. The fourth is a
matter of how paranoid you are, and should perhaps be provided in a
similar but separate function.

>
> >
> > I suggest two extra parameters:
> >
> > 1) a pointer to size_t, which tracks the current size of the buffer, so
> > that the existing memory can be re-used, with realloc only cutting in
> > when necessary, and
>
> This would be pointless, because the function doesn't return until
> it is done. Or am I missing something?

If you don't want to re-use the buffer, it would indeed be pointless.
But consider the constant mallocing and freeing that a simple loop
through a file would involve if you /don't/ re-use the buffer.

> Once you start adding
> parameters you make the user think, which is a grand opportunity
> to go wrong. It is already hard enough just to remember which end
> of the parameter list holds the FILE* :-) (at least for me).

<shrug> It can be analogous to the parameter list to fgets, which is
burned into my synapses. I have far more problems with fread and fwrite
(because the two middle parameters are of the same type).

>
> >
> > 2) a size_t, which indicates the very highest amount of memory the user
> > is prepared to tolerate being used up (with 0, perhaps, indicating "go
> > all the way to the newline and damn the consequences").
>
> We could wire in some large limit. This would have the effect, at
> the user end, of effectively splitting long lines into multiple
> lines, with no indication that this had occurred. I don't like
> it. The subjective effect is to force a \n into the input stream
> every so many characters. This smells like MS Word.

No, it could simply be a failure condition: "if the line is longer than
maxbuf bytes, fail the call", perhaps with 0 as an override value.
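
A sketch of what such an interface might look like, with the buffer and its size passed in for re-use and maxbuf acting as a hard ceiling (0 meaning no ceiling). Everything here is hypothetical: the name read_line_capped, the RL_ codes and the doubling strategy are assumptions for illustration, not code from fgetline or ggets.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { RL_OK = 0, RL_EOF = 1, RL_NOMEM = -1, RL_TOOLONG = -2, RL_IOERR = -3 };

#define RL_INITIAL 16   /* hypothetical starting size */

/* Read one line (without its '\n') into *line, growing the buffer as
   needed but never beyond maxbuf bytes unless maxbuf is 0. *line may
   be NULL on entry; otherwise *size must be its allocated size. */
int read_line_capped(char **line, size_t *size, size_t maxbuf, FILE *fp)
{
    size_t count = 0;
    int result = RL_OK;
    int ch;

    assert(line != NULL && size != NULL && fp != NULL);
    if (*line == NULL) {
        size_t initial = RL_INITIAL;
        if (maxbuf != 0 && initial > maxbuf) {
            initial = maxbuf;
        }
        *line = malloc(initial);
        if (*line == NULL) {
            return RL_NOMEM;
        }
        *size = initial;
    }
    assert(*size > 0);

    while ((ch = fgetc(fp)) != EOF && ch != '\n') {
        if (count + 2 > *size) {        /* no room for ch plus '\0' */
            size_t newsize = (*size > (size_t)-1 / 2) ? (size_t)-1 : *size * 2;
            char *tmp;

            if (maxbuf != 0 && newsize > maxbuf) {
                newsize = maxbuf;
            }
            if (newsize < count + 2) {
                result = RL_TOOLONG;    /* ceiling reached: fail the call */
            } else if ((tmp = realloc(*line, newsize)) == NULL) {
                result = RL_NOMEM;
            } else {
                *line = tmp;
                *size = newsize;
            }
            if (result != RL_OK) {
                ungetc(ch, fp);         /* leave the rest of the line unread */
                break;
            }
        }
        (*line)[count++] = (char)ch;
    }
    (*line)[count] = '\0';              /* always leave a valid string */
    if (result == RL_OK && ch == EOF) {
        result = ferror(fp) ? RL_IOERR : (count == 0 ? RL_EOF : RL_OK);
    }
    return result;
}
```

On RL_TOOLONG nothing is silently split: the caller gets an explicit failure, the buffer holds the part read so far, and the remainder of the line is still in the stream.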

<snip>

Quinn

unread,
Jun 27, 2002, 9:04:14 PM6/27/02
to
In article <3D1B7B9F...@eton.powernet.co.uk>,
bin...@eton.powernet.co.uk says...

> Neither does fgetline, but I am considering adding it.
>

Could you please post it again ?
I had built one too but using fgets (rather than fgetc for performance
reasons) and it's a little bit messy...
Yours seems to be a lot more reliable and I'll be glad to be a beta tester
;-)

Oh by the way : is yours free to use ?

Thanks

CBFalconer

unread,
Jun 28, 2002, 4:18:48 AM6/28/02
to
Quinn wrote:
>
... snip ...

>
> Could you please post it again ?
> I had built one too but using fgets (rather than fgetc for
> performance reasons) and it's a little bit messy...
> Yours seems to be a lot more reliable and I'll be glad to be
> a beta tester ;-)
>
> Oh by the way : is yours free to use ?

You can get it at:

<http://cbfalconer.home.att.net/download/ggets.zip>

I have made it public domain. Use as you wish.

Dan Pop

unread,
Jun 28, 2002, 6:26:03 AM6/28/02
to

>In article <3D1B7B9F...@eton.powernet.co.uk>,
>bin...@eton.powernet.co.uk says...
>> Neither does fgetline, but I am considering adding it.
>
>Could you please post it again ?
>I had built one too but using fgets (rather than fgetc for performance
>reasons) and it's a little bit messy...

Implementing such a function is inherently messy. That's why it should
have been provided by the standard library.

Richard Heathfield

unread,
Jun 28, 2002, 3:15:28 PM6/28/02
to
Quinn wrote:
>
> In article <3D1B7B9F...@eton.powernet.co.uk>,
> bin...@eton.powernet.co.uk says...
> > Neither does fgetline, but I am considering adding it.
> >
>
> Could you please post it again ?

Yes but.

Yes, but please be aware that I'm planning on making some changes to it
quite soon, which will involve an interface change.

> I had built one too but using fgets (rather than fgetc for performance
> reasons) and it's a little bit messy...
> Yours seems to be a lot more reliable and I'll be glad to be a beta tester
> ;-)

Gee thanks - I just /love/ bug reports. :-)

>
> Oh by the way : is yours free to use ?

Almost free - you can use it for whatever you like without charge, but
you must retain the authorship comment blocks.


Okay, here it is, such as it is.


Two files: fgetline.h and fgetline.c - I think you can probably work out
where the division is, but I've marked it anyway, /* ------ */ like
that.

/* fgetline.h - header for the fgetline function.
* fgetline() was written by Richard Heathfield,
* who can be reached at bin...@eton.powernet.co.uk.
* The version documented here is not the final version.
*
* fgetline reads a line from fp into a dynamically
* allocated string pointed to by *line. Note: it *is*
* acceptable to do char *p = NULL and pass &p; but it is
* *not* acceptable to do this: char s[10]; and pass &s!!
*
* If you pass the address of a NULL pointer, you needn't
* set size before passing &size. Otherwise, it must be
* set to the number of bytes pointed to by the pointer
* whose address you pass in.
*
* The string will be dynamically resized if need be, so as
* to be able to store the whole line.
*
* It is the caller's responsibility to release any memory
* allocated by this function.
*
* flags:
* FGL_REDUCE : after the line is captured, reduce the
* amount of memory to the minimum necessary
* to hold the string.
*
* The function returns 0 on success, 1 on EOF, or a negative
* number if something else went wrong (almost certainly a
* lack of memory, but could be a stream error).
*/
#ifndef FGETLINE_H_
#define FGETLINE_H_ 1

#include <stdio.h>

#define FGL_BUFSIZ BUFSIZ /* adjust to taste */
#define FGL_REDUCE 1

int fgetline(char **line, size_t *size, FILE *fp, unsigned int flags);

#endif

/* ------ */


/* fgetline.c - source for the fgetline function.
* fgetline() was written by Richard Heathfield,
* who can be reached at bin...@eton.powernet.co.uk.
* The version documented here is not the final version.
*
* fgetline reads a line from fp into a dynamically
* allocated string pointed to by *line. Note: it *is*
* acceptable to do char *p = NULL and pass &p; but it is
* *not* acceptable to do this: char s[10]; and pass &s!!
*
* If you pass the address of a NULL pointer, you needn't
* set size before passing &size. Otherwise, it must be
* set to the number of bytes pointed to by the pointer
* whose address you pass in.
*
* The string will be dynamically resized if need be, so as
* to be able to store the whole line.
*
* It is the caller's responsibility to release any memory
* allocated by this function.
*
* flags:
* FGL_REDUCE : after the line is captured, reduce the
* amount of memory to the minimum necessary
* to hold the string.
*
* The function returns 0 on success, 1 on EOF, or a negative
* number if something else went wrong (almost certainly a
* lack of memory, but could be a stream error).
*
* Assertions:
* If NDEBUG is not defined, the following program bugs will
* fire an assertion failure: line is NULL, size is NULL, fp is NULL.
*/

#include <assert.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "fgetline.h"

int fgetline(char **line, size_t *size, FILE *fp, unsigned int flags)
{
  int Result = 0;
  size_t count = 0;
  int ch = 0;
  char *tmp = NULL;

  assert(fp != NULL);
  assert(line != NULL);
  assert(size != NULL);

  if(NULL == *line)
  {
    *line = malloc(FGL_BUFSIZ);
    if(*line != NULL)
    {
      *size = FGL_BUFSIZ;
    }
    else
    {
      Result = -1;
    }
  }

  while(0 == Result && (ch = fgetc(fp)) != '\n' && ch != EOF)
  {
    if(count + 2 >= *size)
    {
      /* This realloc strategy is subject to revision */
      tmp = realloc(*line, FGL_BUFSIZ + *size);
      if(NULL == tmp)
      {
        Result = -2;
      }
      else
      {
        *line = tmp;
        tmp = NULL;
        *size += FGL_BUFSIZ;
      }
    }
    if(0 == Result)
    {
      (*line)[count++] = ch;
    }
  }
  if(0 == Result)
  {
    (*line)[count] = '\0';
  }

  if(0 == Result)
  {
    if(flags & FGL_REDUCE)
    {
      tmp = realloc(*line, strlen(*line) + 1);
      if(tmp != NULL)
      {
        *line = tmp;
      }
    }
  }

  if(feof(fp))
  {
    Result = 1;
  }
  else if(ferror(fp))
  {
    Result = -3;
  }

  return Result;
}


/* test driver */
#ifdef TEST_IT
int main(void)
{
  char *s = NULL;
  size_t bufsize = 0;

  while(0 == fgetline(&s, &bufsize, stdin, 0))
  {
    printf("[%s]\n", s); /* adjust to taste */
  }

  free(s);

  return 0;
}
#endif

Quinn

unread,
Jun 28, 2002, 6:29:29 PM6/28/02
to
In article <3D1BDE64...@yahoo.com>, cbfal...@yahoo.com says...

> > Could you please post it again ?
> > I had built one too but using fgets (rather than fgetc for
> > performance reasons) and it's a little bit messy...
> > Yours seems to be a lot more reliable and I'll be glad to be
> > a beta tester ;-)
> >
> > Oh by the way : is yours free to use ?
>
> You can get it at:
>
> <http://cbfalconer.home.att.net/download/ggets.zip>
>
> I have made it public domain. Use as you wish.
>

Sorry I was talking to Richard Heathfield for his 'fgetline'.
For performance reasons I need to re-use the buffer allocated by the
'dynamic fgets'...

Thanks anyway, yours seems to be good enough for most people !

Quinn

unread,
Jun 28, 2002, 7:35:28 PM6/28/02
to
In article <3D1CB5D0...@eton.powernet.co.uk>,
bin...@eton.powernet.co.uk says...

> > I had built one too but using fgets (rather than fgetc for performance
> > reasons) and it's a little bit messy...
> > Yours seems to be a lot more reliable and I'll be glad to be a beta tester
> > ;-)
>
> Gee thanks - I just /love/ bug reports. :-)
>

Bug report ! bug report ! ;-)
First : your fgetline is very neat (a lot more than mine), and I
especially like your 'resize' option.

The only bug I found so far is in the 'resize' option :
I think that you should add :


if(flags & FGL_REDUCE)
{
  tmp = realloc(*line, strlen(*line) + 1);
  if(tmp != NULL)
  {
    *line = tmp;

    /* line added */
    *size = strlen(*line) + 1;
  }
}
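
To show why the extra line matters, here is the same idea pulled out into a standalone helper. This is a sketch only: shrink_to_fit is a hypothetical name, not part of fgetline. If the recorded size still claimed the old, larger allocation, the next call that re-used the buffer would believe it had more room than it really does and could write past the end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shrink *line to exactly fit the string it holds and record the new
   allocation size in *size. Returns 0 on success, -1 if realloc
   failed (the original block and *size are then left untouched). */
int shrink_to_fit(char **line, size_t *size)
{
    size_t needed;
    char *tmp;

    assert(line != NULL && *line != NULL && size != NULL);
    needed = strlen(*line) + 1;     /* the string plus its '\0' */
    tmp = realloc(*line, needed);
    if (tmp == NULL) {
        return -1;
    }
    *line = tmp;
    *size = needed;                 /* keep the bookkeeping in step */
    return 0;
}
```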

Regards,

CBFalconer

unread,
Jun 29, 2002, 12:15:02 AM6/29/02
to
Quinn wrote:
> <3D1BDE64...@yahoo.com>, cbfal...@yahoo.com says...
>
... snip ...

> >
> > You can get it at:
> >
> > <http://cbfalconer.home.att.net/download/ggets.zip>
> >
> > I have made it public domain. Use as you wish.
> >
>
> Sorry I was talking to Richard Heathfield for his 'fgetline'.
> For performance reasons I need to re-use the buffer allocated by the
> 'dynamic fgets'...
>
> Thanks anyway, yours seems to be good enough for most people !

The following profile shows negligible time spent in the
malloc/realloc/free group. This was a run of tggets copying the
text of the C standard. It effectively reuses the same storage
each time, but there is no guarantee of that. Virtually all the
fggets time is spent in fgets.

Bear in mind that this is on a 486/80 under DJGPP/W98, which some
consider slow and antiquated machinery :-)

This was also linked with my own malloc package for DJGPP, because
it gives added detail. The system package had essentially the same
overall results.

Flat profile:

Each sample counts as 0.0555556 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 48.97     14.50    14.50                             __dpmi_int
  8.44     17.00     2.50                             _doprnt
  5.44     18.61     1.61                             putchar
  4.50     19.94     1.33                             fgets
  4.13     21.17     1.22                             mcount
  2.44     21.89     0.72                             _write
  2.44     22.61     0.72                             putc
  2.06     23.22     0.61    39846     0.02     0.03  free
  2.06     23.83     0.61                             puts
  1.88     24.39     0.56                             __movedata
  1.88     24.94     0.56                             fprintf
  1.50     25.39     0.44    79733     0.01     0.01  split
  1.50     25.83     0.44                             fflush
  1.31     26.22     0.39                             __dj_movedata
  1.13     26.56     0.33    79684     0.00     0.00  extractfree
  1.13     26.89     0.33    39894     0.01     0.02  malloc
  1.13     27.22     0.33    39841     0.01     0.02  realloc
  1.13     27.56     0.33                             _flsbuf
  0.94     27.83     0.28   119544     0.00     0.00  mv2freelist
  0.94     28.11     0.28    79684     0.00     0.01  combinehi
  0.75     28.33     0.22    39846     0.01     0.02  dofree
  0.75     28.56     0.22        1   222.22  3276.59  main
  0.75     28.78     0.22                             strlen
  0.56     28.94     0.17    39842     0.00     0.04  fggets
  0.56     29.11     0.17                             strchr
  0.38     29.22     0.11                             __FSEXT_get_function
  0.38     29.33     0.11                             init
  0.38     29.44     0.11                             memchr
  0.19     29.50     0.06    39894     0.00     0.00  searchfree
  0.19     29.56     0.06                             __dosmemput
  0.19     29.61     0.06                             localeconv
  0.00     29.61     0.00        5     0.00     0.00  extendsbrk
================================================


Call graph

granularity: each sample hit covers 4 byte(s) for 0.20% of 28.39 seconds

index % time self children called name
<spontaneous>
[1] 51.1 14.50 0.00 __dpmi_int [1]
-----------------------------------------------
<spontaneous>
[2] 11.5 0.00 3.28 __crt1_startup [2]
0.22 3.05 1/1 main [3]
0.00 0.00 1/39846 free [8]
-----------------------------------------------
0.22 3.05 1/1 __crt1_startup [2]
[3] 11.5 0.22 3.05 1 main [3]
0.17 1.55 39842/39842 fggets [5]
0.61 0.72 39841/39846 free [8]
-----------------------------------------------
<spontaneous>
[4] 8.8 2.50 0.00 _doprnt [4]
-----------------------------------------------
0.17 1.55 39842/39842 main [3]
[5] 6.1 0.17 1.55 39842 fggets [5]
0.33 0.47 39842/39894 malloc [9]
0.33 0.42 39841/39841 realloc [10]
0.00 0.00 1/39846 free [8]
-----------------------------------------------
<spontaneous>
[6] 5.7 1.61 0.00 putchar [6]
-----------------------------------------------
<spontaneous>
[7] 4.7 1.33 0.00 fgets [7]
-----------------------------------------------
0.00 0.00 1/39846 __putenv [33]
0.00 0.00 1/39846 __crt0_load_environment_file [35]
0.00 0.00 1/39846 __crt1_startup [2]
0.00 0.00 1/39846 __glob [36]
0.00 0.00 1/39846 fggets [5]
0.61 0.72 39841/39846 main [3]
[8] 4.7 0.61 0.72 39846 free [8]
0.22 0.50 39846/39846 dofree [13]
-----------------------------------------------
0.00 0.00 1/39894 __crt0_load_environment_file [35]
0.00 0.00 1/39894 __glob [36]
0.00 0.00 1/39894 atexit [41]
0.00 0.00 1/39894 __alloc_file [39]
0.00 0.00 1/39894 _filbuf [40]
0.00 0.00 1/39894 _flsbuf [22]
0.00 0.00 2/39894 calloc [38]
0.00 0.00 2/39894 add [37]
0.00 0.00 5/39894 c1xmalloc [34]
0.00 0.00 9/39894 __putenv [33]
0.00 0.00 28/39894 setup_environment [32]
0.33 0.47 39842/39894 fggets [5]
[9] 2.8 0.33 0.47 39894 malloc [9]
0.22 0.00 39892/79733 split [20]
0.09 0.10 39855/119544 mv2freelist [16]
0.06 0.00 39894/39894 searchfree [31]
0.00 0.00 5/5 extendsbrk [42]
-----------------------------------------------
0.33 0.42 39841/39841 fggets [5]
[10] 2.6 0.33 0.42 39841 realloc [10]
0.22 0.00 39841/79733 split [20]
0.09 0.10 39841/119544 mv2freelist [16]
-----------------------------------------------
<spontaneous>
[11] 2.5 0.72 0.00 _write [11]
-----------------------------------------------
<spontaneous>
[12] 2.5 0.72 0.00 putc [12]
-----------------------------------------------
0.22 0.50 39846/39846 free [8]
[13] 2.5 0.22 0.50 39846 dofree [13]
0.14 0.17 39843/79684 combinehi [14]
0.09 0.10 39846/119544 mv2freelist [16]
-----------------------------------------------
0.14 0.17 39841/79684 mv2freelist [16]
0.14 0.17 39843/79684 dofree [13]
[14] 2.2 0.28 0.33 79684 combinehi [14]
0.33 0.00 79684/79684 extractfree [23]
-----------------------------------------------
<spontaneous>
[15] 2.2 0.61 0.00 puts [15]
-----------------------------------------------
0.00 0.00 2/119544 extendsbrk [42]
0.09 0.10 39841/119544 realloc [10]
0.09 0.10 39846/119544 dofree [13]
0.09 0.10 39855/119544 malloc [9]
[16] 2.1 0.28 0.31 119544 mv2freelist [16]
0.14 0.17 39841/79684 combinehi [14]
-----------------------------------------------
<spontaneous>
[17] 2.0 0.56 0.00 __movedata [17]
-----------------------------------------------
<spontaneous>
[18] 2.0 0.56 0.00 fprintf [18]
-----------------------------------------------
<spontaneous>
[19] 1.6 0.44 0.00 fflush [19]
-----------------------------------------------
0.22 0.00 39841/79733 realloc [10]
0.22 0.00 39892/79733 malloc [9]
[20] 1.6 0.44 0.00 79733 split [20]
-----------------------------------------------
<spontaneous>
[21] 1.4 0.39 0.00 __dj_movedata [21]
-----------------------------------------------
<spontaneous>
[22] 1.2 0.33 0.00 _flsbuf [22]
0.00 0.00 1/39894 malloc [9]
-----------------------------------------------
0.33 0.00 79684/79684 combinehi [14]
[23] 1.2 0.33 0.00 79684 extractfree [23]
-----------------------------------------------
<spontaneous>
[24] 0.8 0.22 0.00 strlen [24]
-----------------------------------------------
<spontaneous>
[25] 0.6 0.17 0.00 strchr [25]
-----------------------------------------------
<spontaneous>
[26] 0.4 0.11 0.00 __FSEXT_get_function [26]
-----------------------------------------------
<spontaneous>
[27] 0.4 0.11 0.00 init [27]
-----------------------------------------------
<spontaneous>
[28] 0.4 0.11 0.00 memchr [28]
-----------------------------------------------
<spontaneous>
[29] 0.2 0.06 0.00 __dosmemput [29]
-----------------------------------------------
<spontaneous>
[30] 0.2 0.06 0.00 localeconv [30]
-----------------------------------------------
0.06 0.00 39894/39894 malloc [9]
[31] 0.2 0.06 0.00 39894 searchfree [31]
-----------------------------------------------
<spontaneous>
[32] 0.0 0.00 0.00 setup_environment [32]
0.00 0.00 28/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[33] 0.0 0.00 0.00 __putenv [33]
0.00 0.00 9/39894 malloc [9]
0.00 0.00 1/39846 free [8]
-----------------------------------------------
<spontaneous>
[34] 0.0 0.00 0.00 c1xmalloc [34]
0.00 0.00 5/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[35] 0.0 0.00 0.00 __crt0_load_environment_file [35]
0.00 0.00 1/39846 free [8]
0.00 0.00 1/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[36] 0.0 0.00 0.00 __glob [36]
0.00 0.00 1/39846 free [8]
0.00 0.00 1/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[37] 0.0 0.00 0.00 add [37]
0.00 0.00 2/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[38] 0.0 0.00 0.00 calloc [38]
0.00 0.00 2/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[39] 0.0 0.00 0.00 __alloc_file [39]
0.00 0.00 1/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[40] 0.0 0.00 0.00 _filbuf [40]
0.00 0.00 1/39894 malloc [9]
-----------------------------------------------
<spontaneous>
[41] 0.0 0.00 0.00 atexit [41]
0.00 0.00 1/39894 malloc [9]
-----------------------------------------------
0.00 0.00 5/5 malloc [9]
[42] 0.0 0.00 0.00 5 extendsbrk [42]
0.00 0.00 2/119544 mv2freelist [16]
-----------------------------------------------


Index by function name

[26] __FSEXT_get_function    [23] extractfree (nmalloc.c)  [28] memchr
[21] __dj_movedata           [19] fflush                   [16] mv2freelist (nmalloc.c)
[29] __dosmemput             [7] fgets                     [12] putc
[1] __dpmi_int               [5] fggets                    [6] putchar
[17] __movedata              [18] fprintf                  [15] puts
[4] _doprnt                  [8] free                      [10] realloc
[22] _flsbuf                 [27] init (nmalloc.c)         [31] searchfree (nmalloc.c)
[11] _write                  [30] localeconv               [20] split (nmalloc.c)
[14] combinehi (nmalloc.c)   [3] main                      [25] strchr
[13] dofree (nmalloc.c)      [9] malloc                    [24] strlen
[42] extendsbrk (nmalloc.c)  (221) mcount

Richard Heathfield

Jun 29, 2002, 2:29:46 PM

Blech. Good spot. You can have my job if you like.
