
using fgets() -- dynamically allocating string size


Steve Sanyal

Feb 11, 1999, 3:00:00 AM
Hi,

I'm writing fgrep as an assignment for a systems programming course.

I am using fgets() to read in a string from a file stream.

What I want to
do is make sure that fgets() does in fact read the entire line. So, if
the last character in the string is not a newline character, I want to
expand the size of my string, and try reading it again, and repeating
this until I do read the whole line.

I've been using the following method:

- malloc a buffer (ie: a string) with the default buffer length (256
characters let's say)

LOOP

- read from a file stream using fgets()
- if the last character read is not a new line, then realloc the buffer
to double its capacity
- try reading the string again (by remembering the offset in the
stream), and keep reallocing until realloc fails, or the newline is read

- at the end of each iteration, I always free the buffer, and then
malloc it again (however, I have taken these statements out, and it
makes no difference)

END LOOP

What is happening, however, is that fgets() only reads from the stream
once, and on the second time, it hangs.

Is there a problem using fgets() multiple times with a string that is
malloced?

Here is my relevant code

(Don't concern yourself with the contents of the loop to realloc the
buffer size, because in my test run, it never enters this loop):


char * readBuffer;
char * readBuffIndex;
FILE * tempptr;
long offset;
int BUFFERLENGTH = 20; /* I made this small on purpose for this
                          testfile */

/* set up file stream */
tempptr = fopen (tmpName, "r");


/* allocate memory for input buffer */
readBuffer = (char *)malloc (BUFFERLENGTH);

assert (readBuffer != NULL);

while (1)
{

    /* output current offset in file */
    offset = ftell(tempptr);

    printf ("offset is %ld\n", offset);

    /* read next line from file */
    if (fgets(readBuffer, BUFFERLENGTH, tempptr) == NULL) {
        printf("\nEND OF STREAM\n");
        break;
    }

    /********* hangs here on the second run through ************/

    printf("successfully read using fgets.");

    readBuffIndex = readBuffer + strlen(readBuffer)-1;
    printf("\nmodified buffer index.");

    /* check if last character in line is a new line character */

    /* NOTE: in the test run I have given, this if block is not executed
       because the newline character
       IS encountered */

    if (strcmp(readBuffIndex, "\n") != 0) {

        int n;

        printf("No new line encountered");

        /* loop - reallocate storage until the newline is encountered */

        for (n = 2;;n++);
        {

            printf("Buffer length needs to be modified");

            /* reallocate the space for the string -- increasing it by
               a factor of BUFFERLENGTH */
            /* the size needs to grow since the newline character was
               not encountered */
            realloc(readBuffer, n*BUFFERLENGTH);

            assert (readBuffer != NULL);
            readBuffIndex = readBuffer;

            /* reread the string using the previous value of offset */
            fseek(tempptr, offset, SEEK_SET);

            /* reread the line, with the expanded string */
            if (fgets(readBuffer, n*BUFFERLENGTH, tempptr) == NULL) {
                printf("\nEND OF STREAM\n");
                break;
            }

            /* can quit the loop now if the last character is a new
               line*/
            readBuffIndex += strlen(readBuffer-1);
            if (strcmp(readBuffIndex, "\n") == 0) break;

        }

    }

    printf("%s", readBuffer);

    /* whether i free and malloc every time or not, it still hangs */
    free(readBuffer);
    readBuffer = (char *)malloc(BUFFERLENGTH);

}


When I try running this code, the program hangs... prior to the
beginning of
the code I showed here.

The input for this program is as follows:

Test line 1
Test line 2 is longer than 1, to see if a long string can be
successfully realloced.
Test line 3.


The output is as follows:

offset is 0
read using fgets
modified buffer index
offset is 12


any help would be greatly appreciated, and if you can respond by email,
that would be great.

Thanks

Steve

Andrew Gierth

Feb 11, 1999, 3:00:00 AM
>>>>> "Steve" == Steve Sanyal <san...@home.com> writes:

Steve> I've been using the following method:

Steve> - malloc a buffer (ie: a string) with the default buffer
Steve> length (256 characters let's say)

Steve> LOOP

Steve> - read from a file stream using fgets()
Steve> - if the last character read is not a new line, then realloc
Steve> the buffer to double its capacity

Steve> - try reading the string again (by remembering the offset in
Steve> the stream)

Why fiddle with the offsets? fgets() will leave the stream correctly
positioned for a subsequent call.
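
A small sketch of that behaviour, assuming a deliberately tiny 8-byte chunk so that a single line needs several calls. The helper name print_line_in_chunks is made up for illustration. Each fgets() call resumes where the previous one stopped, with no ftell() or fseek() involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line in pieces of at most 7 characters,
 * report the size of each piece, and return how many fgets() calls the
 * line needed (0 if the stream was already at end-of-file). */
static int print_line_in_chunks(FILE *fp)
{
    char chunk[8];
    int calls = 0;

    assert(fp != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t len = strlen(chunk);

        calls++;
        printf("call %d read %lu characters\n", calls, (unsigned long)len);

        /* A trailing newline means the line is complete. */
        if (len > 0 && chunk[len - 1] == '\n')
            break;
        /* Otherwise the next call continues from the current position. */
    }
    return calls;
}
```

With the 12-character line "Test line 1\n" the first call stores 7 characters and the second call picks up the remaining 5.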

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>
or <URL: http://www.whitefang.com/unix/>

Steve Sanyal

Feb 11, 1999, 3:00:00 AM
Andrew Gierth wrote:

> >>>>> "Steve" == Steve Sanyal <san...@home.com> writes:
>
> Steve> I've been using the following method:
>
> Steve> - malloc a buffer (ie: a string) with the default buffer
> Steve> length (256 characters let's say)
>
> Steve> LOOP
>
> Steve> - read from a file stream using fgets()
> Steve> - if the last character read is not a new line, then realloc
> Steve> the buffer to double its capacity
>
> Steve> - try reading the string again (by remembering the offset in
> Steve> the stream)
>
> Why fiddle with the offsets? fgets() will leave the stream correctly
> positioned for a subsequent call.
>

fgets() will leave the string positioned for the *NEXT* read. But what I need
to do is repeat my previous read, because the whole line was not read, and
I've had to realloc() space for the larger sized line.

However, this issue is not one that is of concern regarding why the code I
provided does not seem to work.

ALTERNATIVELY -- is there a value that states what is the maximum size a
character array can be? ANSI specifies it has to be at least 512 characters,
but I think in UNIX it's 4096. However, I'd rather set this as an operating
system limit, rather than a straight integer value.

Regards

Steve

Dan Mercer

Feb 11, 1999, 3:00:00 AM
In article <36C310CB...@home.com>,

Steve Sanyal <san...@home.com> writes:
> Andrew Gierth wrote:
>
>> >>>>> "Steve" == Steve Sanyal <san...@home.com> writes:
>>
>> Steve> I've been using the following method:
>>
>> Steve> - malloc a buffer (ie: a string) with the default buffer
>> Steve> length (256 characters let's say)
>>
>> Steve> LOOP
>>
>> Steve> - read from a file stream using fgets()
>> Steve> - if the last character read is not a new line, then realloc
>> Steve> the buffer to double its capacity
>>
>> Steve> - try reading the string again (by remembering the offset in
>> Steve> the stream)
>>
>> Why fiddle with the offsets? fgets() will leave the stream correctly
>> positioned for a subsequent call.
>>
>
> fgets() will leave the string positioned for the *NEXT* read. But what I need
> to do is repeat my previous read, because the whole line was not read, and
> I've had to realloc() space for the larger sized line.
>

But your original data remains in the buffer - if realloc has to move
the block of data, it copies the old data to the new buffer up to the
limit of the size of the smaller buffer (you can realloc a smaller buffer).
Personally, I would do reads of PIPE_BUF size and avoid stdio calls
completely.

> However, this issue is not one that is of concern regarding why the code I
> provided does not seem to work.
>
> ALTERNATIVELY -- is there a value that states what is the maximum size a
> character array can be? ANSI specifies it has to be at least 512 characters,
> but I think in UNIX it's 4096. However, I'd rather set this as an operating
> system limit, rather than a straight integer value.
>

Have no idea what you're talking about here - there are limits to total
process size, but they are quite large and config dependent.

--
Dan Mercer
dame...@uswest.net


> Regards
>
> Steve
>
>

Opinions expressed herein are my own and may not represent those of my employer.


Andrew Gierth

Feb 11, 1999, 3:00:00 AM
>>>>> "Steve" == Steve Sanyal <san...@home.com> writes:

>> Why fiddle with the offsets? fgets() will leave the stream correctly
>> positioned for a subsequent call.

Steve> fgets() will leave the string positioned for the *NEXT* read.

The next character read will be the one immediately after the last one
stored in the string by the first call.

Steve> But what I need to do is repeat my previous read, because the
Steve> whole line was not read, and I've had to realloc() space for
Steve> the larger sized line.

You don't want to *repeat* your previous read, you want to *continue* it.

The right way to do this is something along these lines:

/* read a line of arbitrary length from FP and return it in allocated
 * space which the caller must free.
 */

char *read_line(FILE *fp)
{
    int linelen = 256;
    int maxread = linelen;
    char *line = malloc(linelen);
    char *pos = line;

    *line = 0;

    while (fgets(pos, maxread, fp))
    {
        int rlen = strlen(pos);
        if (feof(fp) || pos[rlen-1] == '\n')
            return line;

        linelen *= 2;
        line = realloc(line, linelen);
        rlen = strlen(line);
        pos = line + rlen;
        maxread = linelen - rlen;
    }

    /* might have an incomplete line at EOF - return it */
    if (!ferror(fp) && strlen(line))
        return line;

    free(line);
    return NULL;
}

Steve> However, this issue is not one that is of concern regarding
Steve> why the code I provided does not seem to work.

Actually it may be. Are you reading from a real file, or a pipe or
tty? In the latter case, you can't reposition the input anyway.

Steve> ALTERNATIVELY -- is there a value that states what is the
Steve> maximum size a character array can be?

That is not a meaningful question. A char array can be as large as you
want, within the limits of available memory.

Steve> ANSI specifies it has to be at least 512 characters, but I
Steve> think in UNIX it's 4096. However, I'd rather set this as an
Steve> operating system limit, rather than a straight integer value.

Code that can't handle arbitrary line lengths is broken.

POSIX, however, only requires that text-based utilities handle lines of
length LINE_MAX or less, defined to be at least 2048.
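
For the "operating system limit rather than a straight integer" part of the question, POSIX systems report LINE_MAX at run time through sysconf(). A minimal sketch, assuming a POSIX <unistd.h>. The helper name line_max_limit and the hard-coded 2048 fallback (the POSIX minimum mentioned above) are illustrative choices. The value is only what text utilities are required to handle, not a cap on how large a char array can be.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: ask the system for LINE_MAX, falling back to the
 * POSIX minimum of 2048 when sysconf() cannot give a definite answer. */
static long line_max_limit(void)
{
    long n = sysconf(_SC_LINE_MAX);   /* -1 means "no definite limit" or error */

    if (n <= 0)
        n = 2048;                     /* assumption: use the POSIX minimum */
    assert(n > 0);
    return n;
}
```

A program could use this as an initial buffer size and still grow the buffer for longer lines.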

Floyd Davidson

Feb 11, 1999, 3:00:00 AM

Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:
>
>You don't want to *repeat* your previous read, you want to *continue* it.
>
>The right way to do this is something along these lines:
>
>/* read a line of arbitrary length from FP and return it in allocated
> * space which the caller must free.
> */
>
>char *read_line(FILE *fp)
>{
> int linelen = 256;
> int maxread = linelen;
> char *line = malloc(linelen);
> char *pos = line;
>
> *line = 0;
>
> while (fgets(pos, maxread, fp))
> {
> int rlen = strlen(pos);
> if (feof(fp) || pos[rlen-1] == '\n')
> return line;

Why put a call to feof() there? If it can be true,
then fgets() will have returned NULL and this point
will not be reached?

>
> linelen *= 2;
> line = realloc(line, linelen);
> rlen = strlen(line);

What happens here if the call to realloc() failed?

> pos = line + rlen;
> maxread = linelen - rlen;
> }
>
> /* might have an incomplete line at EOF - return it */
> if (!ferror(fp) && strlen(line))
> return line;

Hmmm... this doesn't allow for a zero length file to be read.
I don't know if that is a feature or a bug... :-)

> free(line);
> return NULL;
>}

Floyd


--
Floyd L. Davidson fl...@ptialaska.net
Ukpeagvik (Barrow, Alaska) fl...@barrow.com
Pictures of the North Slope at <http://www.ptialaska.net/~floyd>

Andrew Gierth

Feb 11, 1999, 3:00:00 AM
>>>>> "Floyd" == Floyd Davidson <fl...@tanana.polarnet.com> writes:

>> while (fgets(pos, maxread, fp))
>> {
>> int rlen = strlen(pos);
>> if (feof(fp) || pos[rlen-1] == '\n')
>> return line;

Floyd> Why put a call to feof() there? If it can be true,
Floyd> then fgets() will have returned NULL and this point
Floyd> will not be reached?

fgets() can return a non-null result if EOF is reached after reading a
partial line.

>> linelen *= 2;
>> line = realloc(line, linelen);
>> rlen = strlen(line);

Floyd> What happens here if the call to realloc() failed?

The program crashes. Add your own error checking.

>> pos = line + rlen;
>> maxread = linelen - rlen;
>> }
>>
>> /* might have an incomplete line at EOF - return it */
>> if (!ferror(fp) && strlen(line))
>> return line;

Floyd> Hmmm... this doesn't allow for a zero length file to be read.
Floyd> I don't know if that is a feature or a bug... :-)

Huh? Applied to a zero-length file, the first call will return NULL
signifying end-of-file, which is the correct behaviour.

Floyd Davidson

Feb 12, 1999, 3:00:00 AM
Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:
>>>>>> "Floyd" == Floyd Davidson <fl...@tanana.polarnet.com> writes:
>
> >> while (fgets(pos, maxread, fp))
> >> {
> >> int rlen = strlen(pos);
> >> if (feof(fp) || pos[rlen-1] == '\n')
> >> return line;
>
> Floyd> Why put a call to feof() there? If it can be true,
> Floyd> then fgets() will have returned NULL and this point
> Floyd> will not be reached?
>
>fgets() can return a non-null result if EOF is reached after reading a
>partial line.

Fgets() will *always* return non-NULL in that case (assuming no
read error). But feof() will not report EOF either...

However, on the next iteration when *no characters are read* then
fgets() will return NULL and feof() will also report EOF.

> >> linelen *= 2;
> >> line = realloc(line, linelen);
> >> rlen = strlen(line);
>
> Floyd> What happens here if the call to realloc() failed?
>
>The program crashes. Add your own error checking.

Exactly. The structure needs to be changed a little so
that the original pointer can be saved if any kind of
recovery is intended, otherwise at least check to see if
realloc() has failed and gracefully exit rather than just
allow strlen(NULL) to crash.
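
The pattern described here, keeping the old pointer until realloc() is known to have succeeded, might look like the sketch below. grow_buffer is a hypothetical helper name, and doubling the capacity is just one possible growth policy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: double the capacity of *buf.
 * Returns 1 on success. Returns 0 if realloc() fails or the new size
 * would overflow; in that case *buf and *cap are left untouched, so the
 * caller still owns a valid block and can recover or free it. */
static int grow_buffer(char **buf, size_t *cap)
{
    char *tmp;
    size_t newcap;

    assert(buf != NULL && *buf != NULL && cap != NULL && *cap > 0);

    if (*cap > (size_t)-1 / 2)
        return 0;                 /* doubling would overflow size_t */
    newcap = *cap * 2;

    tmp = realloc(*buf, newcap);  /* result goes to a temporary first */
    if (tmp == NULL)
        return 0;                 /* old block is still allocated */

    *buf = tmp;
    *cap = newcap;
    return 1;
}
```

Assigning the result of realloc() straight back to the only copy of the pointer loses the original block on failure, which is why the temporary is used.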

> >> pos = line + rlen;
> >> maxread = linelen - rlen;
> >> }
> >>
> >> /* might have an incomplete line at EOF - return it */
> >> if (!ferror(fp) && strlen(line))
> >> return line;
>
> Floyd> Hmmm... this doesn't allow for a zero length file to be read.
> Floyd> I don't know if that is a feature or a bug... :-)
>
>Huh? Applied to a zero-length file, the first call will return NULL
>signifying end-of-file, which is the correct behaviour.

Sure, that is from fgets(), but is that necessarily the correct
behavior for this function overall? It might be equally useful
to return a null string rather than a NULL pointer. Kinda depends
on what one wants to do, eh? Either way might be useful.

Andrew Gierth

Feb 12, 1999, 3:00:00 AM
>>>>> "Floyd" == Floyd Davidson <fl...@tanana.polarnet.com> writes:

>> fgets() can return a non-null result if EOF is reached after
>> reading a partial line.

Floyd> Fgets() will *always* return non-NULL in that case (assuming
Floyd> no read error). But feof() will not report EOF either...

Yes, it will.

Floyd> However, on the next iteration when *no characters are read* then
Floyd> fgets() will return NULL and feof() will also report EOF.

fgets() will indeed return NULL on a following call. But I don't let it
get that far, because it would be expanding the buffer unnecessarily.

>> Huh? Applied to a zero-length file, the first call will return NULL
>> signifying end-of-file, which is the correct behaviour.

Floyd> Sure, that is from fgets(), but is that necessarily the correct
Floyd> behavior for this function overall?

The function returns NULL on end-of-file or error. Therefore it is correct
for it to return NULL on the first call if the file contains 0 bytes.
If you want different behaviour you can write your own function.

Floyd Davidson

Feb 12, 1999, 3:00:00 AM
[emailed and posted]

Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:
>>>>>> "Floyd" == Floyd Davidson <fl...@tanana.polarnet.com> writes:
>
> >> fgets() can return a non-null result if EOF is reached after
> >> reading a partial line.
>
> Floyd> Fgets() will *always* return non-NULL in that case (assuming
> Floyd> no read error). But feof() will not report EOF either...
>
>Yes, it will.

Per the ISO/ANSI C Standard Section 7.9.7.2, the fgets()
function will not read past the end of the file when characters
are returned; therefore, the end-of-file indicator will not be
set and feof() will not return true. Specifically it says:

"No additional characters are read after a new-line character
(which is retained) or after end-of-file."

Per Section 7.9.10.2, feof() returns non-zero if and only if the
end-of-file indicator is set.

Hence I would be interested in seeing a demonstration that it
works the way you suggest. For example, on a Linux system using
gcc it does not.

Steven M. Gallo

Feb 12, 1999, 3:00:00 AM
In article <36C306DB...@home.com>, Steve Sanyal <san...@home.com> wrote:
>Hi,
>
>I'm writing fgrep as an assignment for a systems programming course.
>
>I am using fgets() to read in a string from a file stream.
>
>What I want to
>do is make sure that fgets() does in fact read the entire line. So, if
>the last character in the string is not a newline character, I want to
>expand the size of my string, and try reading it again, and repeating
>this until I do read the whole line.


Why don't you try something like this:

(1) Allocate a default buffer (say 256 bytes).
(2) Make the call to fgets().
(3) Copy the data from the call into your buffer.
(4) While (no newline read)
(4a) Make the call to fgets().
(4b) Reallocate your buffer to increase size and Copy the data from the
call into your new buffer.

Now your buffer is also larger, so you'll have less of a chance of not
getting a newline next time. You might also want to have a max size
for your buffer, but that gets complicated.
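
The steps above can be sketched roughly as follows, assuming a fixed 32-byte scratch buffer for each fgets() call and a result buffer that grows by exactly the amount read. The name read_line_chunked and the chunk size are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: read one line of any length by appending fixed-size chunks.
 * Returns a malloc'ed string the caller must free, or NULL if nothing
 * could be read (end-of-file, read error, or out of memory). A partial
 * last line without a newline is returned as-is. */
static char *read_line_chunked(FILE *fp)
{
    char chunk[32];
    char *line = NULL;
    size_t used = 0;              /* characters stored so far, excluding '\0' */

    assert(fp != NULL);
    while (fgets(chunk, sizeof chunk, fp) != NULL) {
        size_t n = strlen(chunk);
        char *tmp = realloc(line, used + n + 1);

        if (tmp == NULL) {        /* keep the old pointer until we know */
            free(line);
            return NULL;
        }
        line = tmp;
        memcpy(line + used, chunk, n + 1);   /* copies the '\0' as well */
        used += n;

        if (n > 0 && chunk[n - 1] == '\n')
            break;                /* whole line collected */
    }
    return line;
}
```

The extra copy per chunk costs a little, but the stream is never repositioned, so this also works on pipes and terminals.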

Steve


Andrew Gierth

Feb 12, 1999, 3:00:00 AM
[xposting to comp.std.c]

>>>>> "Floyd" == Floyd Davidson <fl...@tanana.polarnet.com> writes:

AG> fgets() can return a non-null result if EOF is reached after
AG> reading a partial line.

Floyd> Fgets() will *always* return non-NULL in that case (assuming
Floyd> no read error). But feof() will not report EOF either...

AG> Yes, it will.

[note to standards weenies: I am here describing the observed
behaviour of certain systems, without reference to what the standard
might say.]

Floyd> Per the ISO/ANSI C Standard Section 7.9.7.2, the fgets()
Floyd> function will not read past the end of the file when characters
Floyd> are returned; therefore, the end-of-file indicator will not be
Floyd> set and feof() will not return true. Specifically it says:

Floyd> "No additional characters are read after a new-line character
Floyd> (which is retained) or after end-of-file."

Floyd> Per Section 7.9.10.2, feof() returns non-zero if and only if the
Floyd> end-of-file indicator is set.

Floyd> Hence I would be interested in seeing a demonstration that it
Floyd> works the way you suggest. For example, on a Linux system
Floyd> using gcc it does not.

FreeBSD 2.2 and 3-STABLE with gcc, and Solaris 2.6 with gcc, using the
following test program:

#include <stdio.h>
#include <string.h>

int main()
{
    char buf[1024];
    char *ptr;
    *buf = 0;
    if ((ptr = fgets(buf, 1024, stdin)) && feof(stdin))
        fprintf(stderr, "eof found: ptr=%p len=%d buf='%s'\n",
                ptr, (int)strlen(ptr), buf);
    return 0;
}

$ echo -n foo | ./a.out
eof found: ptr=804791c len=3 buf='foo'

So are these systems violating the standard?

Nick Maclaren

Feb 12, 1999, 3:00:00 AM
In article <87g18bq...@erlenstar.demon.co.uk>,

No. And gcc version 2.7.2.1 on my Linux system does the same. It is
erroneous for feof to return 0. The reason is that fgets says that
no further characters are returned AFTER a newline character or
end of file.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

Floyd Davidson

Feb 12, 1999, 3:00:00 AM

Nick Maclaren <nm...@cus.cam.ac.uk> wrote:

>Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:
>>>>>>> "Floyd" == Floyd Davidson <fl...@tanana.polarnet.com> writes:
>>
>>#include <stdio.h>
>>
>>int main()
>>{
>> char buf[1024];
>> char *ptr;
>> *buf = 0;
>> if ((ptr = fgets(buf, 1024, stdin)) && feof(stdin))
>> fprintf(stderr, "eof found: ptr=%p len=%d buf='%s'\n",
>> ptr, (int)strlen(ptr), buf);
>> return 0;
>>}
>>
>>$ echo -n foo | ./a.out
>>eof found: ptr=804791c len=3 buf='foo'
>>
>>So are these systems violating the standard?
>
>No. And gcc version 2.7.2.1 on my Linux system does the same. It is
>erroneous for feof to return 0. The reason is that fgets says that
>no further characters are returned AFTER a newline character or
>end of file.

In fact gcc on a Linux system does exactly as described above when
input is from stdin, but will do the opposite if input is from a
disk file. Hence:

$ echo -n foo > bar; ./a.out < bar

Will not indicate that eof was found by feof().

Noone Really

Feb 12, 1999, 3:00:00 AM
I am replying from comp.lang.c to which this is crossposted:

Floyd Davidson wrote:
> >: >>#include <stdio.h>


> >: >>
> >: >>int main()
> >: >>{
> >: >> char buf[1024];
> >: >> char *ptr;
> >: >> *buf = 0;

Huh? Why?

> >: >> if ((ptr = fgets(buf, 1024, stdin)) && feof(stdin))


> >: >> fprintf(stderr, "eof found: ptr=%p len=%d buf='%s'\n",
> >: >> ptr, (int)strlen(ptr), buf);
> >: >> return 0;
> >: >>}
> >: >>
> >: >>$ echo -n foo | ./a.out
> >: >>eof found: ptr=804791c len=3 buf='foo'
> >: >>
> >: >>So are these systems violating the standard?
> >: >
> >: >No. And gcc version 2.7.2.1 on my Linux system does the same. It is
> >: >erroneous for feof to return 0. The reason is that fgets says that
> >: >no further characters are returned AFTER a newline character or
> >: >end of file.
> >
> >: In fact gcc on a Linux system does exactly as described above when
> >: input is from stdin, but will do the opposite if input is from a
> >: disk file. Hence:
> >
> >: $ echo -n foo > bar; ./a.out < bar
> >
> >: Will not indicate that eof was found by feof().
> >

> >It may well be that stdin is adding a terminator. However, the standard
> >says that the need for a terminating newline on the last line of a text
> >file is implementation defined, so the program's output may vary between
> >systems.
>
> Will, can you expand on that a little (and provide a reference
> to the x89 Standard too). The original discussion concerned
> whether fgets(), upon reading the last line of a text file which
> is _not_ concluded with a newline, would necessarily set the
> end-of-file indicator such that feof() would return true. My
> assumption was that it would require another read by fgets(),
> which would return a NULL and provide no input characters, to
> set the end-of-file indicator.
>
> Andrew and I exchanged a couple of "does too/does not" examples.
> As stated above, on Linux the test code indicated one result when
> reading stdin and another when reading from a file. On Linux,
> reading from a file does not set the indicator, and yet Andrew
> finds that on two other systems the indicator is indeed set
> reading from a file or from stdin.
>
> (My apologies for this having landed in comp.std.c, it belongs
> in comp.lang.c instead, which I've added.)

7.9.2 `Whether the last line requires a terminating new-line character
is implementation-defined.'

Note that stdin is a text stream, so the above quote applies. If the
implementation defines that it *needs* a new line, then of course
anything can happen if you do not provide one.

Moreover, data that a C program sees when reading a text stream is
explicitly stated as being possibly different from what the operating
system sees. The only condition is that a C program can write out data
and be assured of getting the same data back, except:
1) for a text stream, the data in question is made of printable
   characters, tab and new line; no space appears immediately preceding
   a new line; and the last character is a new line.
2) for a binary stream, null characters can get appended at the
   discretion of, and only if documented by the implementation.
As you can see, even this does not guarantee anything about the last
line not ending in a newline: the C library can silently provide one.

However, if fgets terminates *due to* meeting end of file, it must set
the end-of-file flag. NULL is returned only if EOF happens *before* any
characters are read. So, one of the five conditions below must hold:
1) buf contains a '\n\0', characters in buf after that remain
   unchanged, and feof is false.
   (i.e. a '\n' was silently appended)
2) All 1024 characters of buf are modified, the last character is
   '\0', and feof is false.
   (i.e. a whole pad of arbitrary, maybe '\0', characters were
   produced)
3) No elements of the array are changed, NULL is returned by fgets and
   feof is true
   (i.e. the whole incomplete line was discarded)
4) NULL was returned by fgets, the array became anything whatsoever,
   and ferror is true
   (i.e. reading a file with incomplete last line in text mode is
   an error)
5) buf contains a '\0' not preceded by a newline and feof is true
   (i.e. reading was terminated by the eof after some characters.)
All these behaviours are allowed, though they need documentation by the
implementation if the file itself was produced by a C program.

So, you have to change your program to check whether any of these
possibilities is taking place. My check on my linux box is giving me
the `eof found' message when I do it from a redirect as well: so I do
not know what the confusion is about.

>uname -a
Linux localhost.localdomain 2.0.36 #1 Sat Nov 28 09:22:28 MST 1998 i686
unknown
>gcc --version
2.7.2.3


>echo -n foo > bar ; ./a.out < bar

eof found: ptr=0xbffff4d8 len=3 buf='foo'

---


Miles Davies

Feb 12, 1999, 3:00:00 AM
to comp.unix.programmer

Andrew Gierth wrote in message <87ogn0r...@erlenstar.demon.co.uk>...

>>>>>> "Steve" == Steve Sanyal <san...@home.com> writes:
>
> >> Why fiddle with the offsets? fgets() will leave the stream correctly
> >> positioned for a subsequent call.
>


[ snip Andrew's very helpful reply and on to the pedantry]

>
>You don't want to *repeat* your previous read, you want to *continue* it.
>
>The right way to do this is something along these lines:
>
>/* read a line of arbitrary length from FP and return it in allocated
> * space which the caller must free.
> */
>
>char *read_line(FILE *fp)
>{
> int linelen = 256;
> int maxread = linelen;
> char *line = malloc(linelen);
> char *pos = line;
>
> *line = 0;
>

> while (fgets(pos, maxread, fp))
> {
> int rlen = strlen(pos);
> if (feof(fp) || pos[rlen-1] == '\n')
> return line;
>

> linelen *= 2;
> line = realloc(line, linelen);

/* is this length not just the sum of all the rlen's so far ? */

> rlen = strlen(line);


> pos = line + rlen;
> maxread = linelen - rlen;
> }
>
> /* might have an incomplete line at EOF - return it */
> if (!ferror(fp) && strlen(line))
> return line;
>

> free(line);
> return NULL;
>}
>
[snip a bit more]

Never mind the question of whether or not to check for feof() in the
loop or rely on the next loop catching it.....

What about all those calls to strlen() !! You have the information you
need to work out the total string size without calling strlen on what
could be a very long string indeed each time round the loop.
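
One way to act on that, as a sketch rather than a drop-in replacement for the function quoted above: keep a running count of characters stored, so strlen() only ever scans the piece the latest fgets() call produced. The name read_line_tracked and the small starting capacity of 16 are illustrative assumptions. This version also checks realloc() and does not depend on feof(): if fgets() returned without a newline and without filling the buffer, the input must have ended.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: read one line of any length (below INT_MAX); caller frees.
 * Returns NULL at end-of-file with nothing read, on read error, or on
 * allocation failure. */
static char *read_line_tracked(FILE *fp)
{
    size_t cap = 16;              /* small on purpose, to exercise growth */
    size_t used = 0;              /* characters stored, excluding '\0' */
    char *line;

    assert(fp != NULL);
    line = malloc(cap);
    if (line == NULL)
        return NULL;
    line[0] = '\0';

    while (fgets(line + used, (int)(cap - used), fp) != NULL) {
        char *tmp;

        used += strlen(line + used);      /* scans only the new piece */
        if (used > 0 && line[used - 1] == '\n')
            return line;                  /* complete line */
        if (used + 1 < cap)
            return line;                  /* buffer not full: input ended */

        if (cap > (size_t)INT_MAX / 2) {  /* fgets() takes an int size */
            free(line);
            return NULL;
        }
        tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }

    /* fgets() returned NULL: hand back a partial last line if there is one */
    if (used > 0 && !ferror(fp))
        return line;
    free(line);
    return NULL;
}
```

The total work per line stays proportional to the line length, instead of rescanning the whole buffer after every doubling.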

Will Rose

Feb 13, 1999, 3:00:00 AM
Floyd Davidson (fl...@tanana.polarnet.com) wrote:

: Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
: >Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:
: >>>>>>> "Floyd" == Floyd Davidson <fl...@tanana.polarnet.com> writes:

: >>


: >>#include <stdio.h>
: >>
: >>int main()
: >>{
: >> char buf[1024];
: >> char *ptr;
: >> *buf = 0;

: >> if ((ptr = fgets(buf, 1024, stdin)) && feof(stdin))
: >> fprintf(stderr, "eof found: ptr=%p len=%d buf='%s'\n",
: >> ptr, (int)strlen(ptr), buf);
: >> return 0;
: >>}
: >>
: >>$ echo -n foo | ./a.out
: >>eof found: ptr=804791c len=3 buf='foo'
: >>
: >>So are these systems violating the standard?
: >
: >No. And gcc version 2.7.2.1 on my Linux system does the same. It is
: >erroneous for feof to return 0. The reason is that fgets says that
: >no further characters are returned AFTER a newline character or
: >end of file.

: In fact gcc on a Linux system does exactly as described above when
: input is from stdin, but will do the opposite if input is from a
: disk file. Hence:

: $ echo -n foo > bar; ./a.out < bar

: Will not indicate that eof was found by feof().

It may well be that stdin is adding a terminator. However, the standard
says that the need for a terminating newline on the last line of a text
file is implementation defined, so the program's output may vary between
systems.


Will
c...@crash.cts.com


Floyd Davidson

Feb 13, 1999, 3:00:00 AM
[Note added crosspost to comp.lang.c]

Will Rose <c...@cts.com> wrote:
>Floyd Davidson (fl...@tanana.polarnet.com) wrote:
>: Nick Maclaren <nm...@cus.cam.ac.uk> wrote:
>: >Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:
>: >>

Will, can you expand on that a little (and provide a reference


to the x89 Standard too). The original discussion concerned
whether fgets(), upon reading the last line of a text file which
is _not_ concluded with a newline, would necessarily set the
end-of-file indicator such that feof() would return true. My
assumption was that it would require another read by fgets(),
which would return a NULL and provide no input characters, to
set the end-of-file indicator.

Andrew and I exchanged a couple of "does too/does not" examples.
As stated above, on Linux the test code indicated one result when
reading stdin and another when reading from a file. On Linux,
reading from a file does not set the indicator, and yet Andrew
finds that on two other systems the indicator is indeed set
reading from a file or from stdin.

(My apologies for this having landed in comp.std.c, it belongs
in comp.lang.c instead, which I've added.)

Floyd

Will Rose

Feb 13, 1999, 3:00:00 AM

Floyd Davidson (fl...@tanana.polarnet.com) wrote:
[...]
: >It may well be that stdin is adding a terminator. However, the standard

: >says that the need for a terminating newline on the last line of a text
: >file is implementation defined, so the program's output may vary between
: >systems.

: Will, can you expand on that a little (and provide a reference
: to the x89 Standard too). The original discussion concerned
: whether fgets(), upon reading the last line of a text file which
: is _not_ concluded with a newline, would necessarily set the
: end-of-file indicator such that feof() would return true. My
: assumption was that it would require another read by fgets(),
: which would return a NULL and provide no input characters, to
: set the end-of-file indicator.

In the C89 standard (I don't have a later version) 4.9.2 deals
with streams, and says "[In text streams] Whether the last line


requires a terminating new-line character is implementation-
defined."

: Andrew and I exchanged a couple of "does too/does not" examples.


: As stated above, on Linux the test code indicated one result when
: reading stdin and another when reading from a file. On Linux,
: reading from a file does not set the indicator, and yet Andrew
: finds that on two other systems the indicator is indeed set
: reading from a file or from stdin.

fgets() reads up to newline or EOF, and returns a NULL pointer
if EOF is encountered and no characters have been read in. Thus
on a system which requires a newline as the last character of
a file, (fgets(buff, sizeof(buff), fp) && feof(fp)) will always
be false, but on a system which does not require such a newline
it will vary depending on whether fgets() hits a newline or EOF.

Note also that some editors automatically add a newline to the
end of a text file, and in some cases even an additional EOF
symbol, so you have to take care when testing. It may be that
the shell is doing something similar, adding a newline to your
text stream which isn't generated by the usual stdio functions.
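
One way to keep editors and shells out of the test is to have the program write its own input. The following is only a minimal sketch under that assumption: the helper name eof_set_after_fgets is hypothetical, and tmpfile() opens a binary update stream, so this says nothing about what a text stream does with a missing final newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `text` to a temporary file, read it back
 * with one fgets() call, and report the end-of-file indicator.
 * Returns 1 if feof() is true afterwards, 0 if it is false, and -1 if
 * the file could not be set up or fgets() returned a null pointer. */
int eof_set_after_fgets(const char *text)
{
    char buf[64];
    FILE *fp;
    int result;

    /* Keep the text short enough that the array can never fill up. */
    assert(text != NULL && strlen(text) < sizeof buf - 1);

    fp = tmpfile();                 /* binary stream, removed on close */
    if (fp == NULL)
        return -1;

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    if (fgets(buf, sizeof buf, fp) == NULL)
        result = -1;
    else
        result = feof(fp) != 0;

    fclose(fp);
    return result;
}
```

Calling it once with a string that ends in '\n' and once with a string that does not gives two results that can be compared directly between systems, with no editor involved.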


Will
c...@crash.cts.com

Lawrence Kirby

Feb 13, 1999

In article <7a2ucu$5...@enews1.newsguy.com>
fl...@ptialaska.net "Floyd Davidson" writes:

...

>Will, can you expand on that a little (and provide a reference
>to the x89 Standard too). The original discussion concerned
>whether fgets(), upon reading the last line of a text file which
>is _not_ concluded with a newline, would necessarily set the
>end-of-file indicator such that feof() would return true. My
>assumption was that it would require another read by fgets(),
>which would return a NULL and provide no input characters, to
>set the end-of-file indicator.

fgets() will stop reading input and return when one of the following
conditions is encountered:

1. There is only one character left in the caller-supplied array (which
will have a null character written to it). A pointer to the buffer
is returned.

2. A new-line character is encountered in the input stream. A pointer to
the array is returned.

3. End-of-file has been encountered on the input stream and one or
more characters have been read into the array. A pointer to the array
is returned.

4. End-of-file has been encountered on the input stream and no characters
have been read into the array. A null pointer is returned.

5. An error condition has been encountered on the input stream. A null
pointer is returned.
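
As a rough illustration, a caller can tell the five cases apart from the return value, the string length and ferror(). This is only a sketch: the enum and the wrapper name fgets_classified are made up for the example, it assumes the input contains no null characters, and when a new-line happens to land in the last free slot of the array it reports case 2 rather than case 1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical labels, numbered to match the list above. */
enum fgets_outcome {
    FGETS_BUFFER_FULL = 1,  /* 1: array filled before new-line or EOF */
    FGETS_NEWLINE,          /* 2: new-line read                       */
    FGETS_EOF_PARTIAL,      /* 3: EOF after one or more characters    */
    FGETS_EOF_EMPTY,        /* 4: EOF before any character            */
    FGETS_ERROR             /* 5: read error                          */
};

/* Call fgets() and say which of the five conditions made it return. */
enum fgets_outcome fgets_classified(char *buf, int size, FILE *fp)
{
    size_t len;

    assert(buf != NULL && size > 1 && fp != NULL);

    if (fgets(buf, size, fp) == NULL)
        return ferror(fp) ? FGETS_ERROR : FGETS_EOF_EMPTY;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return FGETS_NEWLINE;
    if (len == (size_t)size - 1)
        return FGETS_BUFFER_FULL;

    /* Not full and no new-line: only end-of-file can have stopped it. */
    return FGETS_EOF_PARTIAL;
}
```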

The C90 standard section 7.9.3 says

"All input takes place as if characters were read by successive
calls to fgetc()"

Section "7.9.7.1 The fgetc function" says

"The fgetc function returns the next character from the input stream
pointed to by stream. If the stream is at end-of-file, the end-of-file
indicator for the stream is set and fgetc returns EOF. If a read error
occurs, the error indicator for the stream is set and fgetc returns EOF."
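
Read together, the two quotations suggest a simple mental model of fgets() as a loop over fgetc(). The sketch below is that model only, not any library's actual implementation; the name model_fgets is invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative model of fgets() built on successive fgetc() calls. */
char *model_fgets(char *s, int n, FILE *stream)
{
    int c = 0;
    int i = 0;

    assert(s != NULL && n > 0 && stream != NULL);

    while (i < n - 1) {
        c = fgetc(stream);      /* sets the EOF or error indicator itself */
        if (c == EOF)
            break;
        s[i++] = (char)c;
        if (c == '\n')
            break;
    }

    /* Null pointer: end-of-file with nothing read, or a read error. */
    if (c == EOF && (i == 0 || ferror(stream)))
        return NULL;

    s[i] = '\0';
    return s;
}
```

In this model the only way to stop short of a full array without a new-line is for fgetc() to have returned EOF, and by then fgetc() has already set the end-of-file indicator.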

We're interested in condition 3. A program can detect this condition
by testing that a) the return value of fgets() is not null, b) the
array is not full (we must assume that there are no null characters in
the input stream), and c) the string in the array contains no new-line
character. Under these conditions fgets() must act as if it had
encountered an end-of-file condition on calling fgetc(); therefore it
*must* set the end-of-file indicator for the stream.

>Andrew and I exchanged a couple of "does too/does not" examples.
>As stated above, on Linux the test code indicated one result when
>reading stdin and another when reading from a file. On Linux,
>reading from a file does not set the indicator, and yet Andrew
>finds that on two other systems the indicator is indeed set
>reading from a file or from stdin.

Unix systems, including Linux, make no distinction between text and binary
streams, so we can sidestep text stream issues in that case by
considering the stream as a binary stream. If the Linux system fails to
set the end-of-file indicator under the conditions I describe above, it is
in clear violation of the standard.

>(My apologies for this having landed in comp.std.c, it belongs
>in comp.lang.c instead, which I've added.)

Since we're getting down to interpreting the text of the standard,
comp.std.c is a reasonable newsgroup for this.

--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------


Floyd Davidson

Feb 13, 1999

Lawrence Kirby <fr...@genesis.demon.co.uk> wrote:

>fl...@ptialaska.net "Floyd Davidson" writes:
>>Will, can you expand on that a little (and provide a reference
>>to the x89 Standard too). The original discussion concerned
>>whether fgets(), upon reading the last line of a text file which
>>is _not_ concluded with a newline, would necessarily set the
>>end-of-file indicator such that feof() would return true. My
>>assumption was that it would require another read by fgets(),
>>which would return a NULL and provide no input characters, to
>>set the end-of-file indicator.

...

Thank you!

What I was missing was the (section 7.9.3) "All input takes
place as if characters were read by successive calls to
fgetc()", and the clear significance that has for when
fgets() will cause the end-of-file indicator to be set.

Linux gcc version 2.7.0 violates the standard when reading from
a file, but not when reading from stdin. Others have stated that
more recent versions do not have the same problem.

David R. Conrad

Feb 14, 1999

Floyd Davidson wrote:
>What I was missing was the (section 7.9.3) "All input takes
>place as if characters were read by successive calls to
>fgetc()", and the clear significance that has for when
>fgets() will cause the end-of-file indicator to be set.
>
>Linux gcc version 2.7.0 violates the standard when reading from
>a file, but not when reading from stdin. Others have stated that
>more recent versions do not have the same problem.

I'd just like to point out that it may have more to do with which
version of libc you are running than with the version of gcc, although
I don't know for certain that that is the case.

FWIW, gcc version egcs-2.91.60 19981201 (egcs-1.1.1 release) with
libc.so.5.4.46 sets end-of-file in both cases on my linux 2.2.1 box.

--
David R. Conrad <d...@adni.net> PGP keys (0x1993E1AE and 0xA0B83D31):
DSS Fingerprint20 = 9942 E27C 3966 9FB8 5058 73A4 83CE 62EF 1993 E1AE
RSA Fingerprint16 = 1D F2 F3 90 DA CA 35 5D 91 E4 09 45 95 C8 20 F1

Nate Eldredge

Feb 16, 1999

Floyd Davidson wrote:

> What I was missing was the (section 7.9.3) "All input takes
> place as if characters were read by successive calls to
> fgetc()", and the clear significance that has for when
> fgets() will cause the end-of-file indicator to be set.
>
> Linux gcc version 2.7.0 violates the standard when reading from
> a file, but not when reading from stdin. Others have stated that
> more recent versions do not have the same problem.

Just to mention: it's probably not the compiler itself (gcc) that's at
fault, but the libc it's used with. If you are actually having this
problem, it might be more useful to compare libc versions with those who
don't see it.
--

Nate Eldredge
na...@cartsys.com

Clive D.W. Feather

Feb 17, 1999

In article <7a2ucu$5...@enews1.newsguy.com>, Floyd Davidson
<fl...@tanana.polarnet.com> writes

>Will, can you expand on that a little (and provide a reference
>to the x89 Standard too). The original discussion concerned
>whether fgets(), upon reading the last line of a text file which
>is _not_ concluded with a newline, would necessarily set the
>end-of-file indicator such that feof() would return true.

If it is a text stream, then there is a lot of implementation freedom as
to what happens when there isn't a newline at the end of the file.

If we ignore this, for example by saying that text and binary streams
are equivalent, then fgets acts as if it keeps calling fgetc until
filling the buffer, reading a newline, or reaching end of file. In the
last case the feof flag will be set, even though fgets returns a
bufferload of text (by returning a non-null pointer).
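
For the question that started this thread, this means a line reader with a growing buffer can treat a non-null return with feof() set as the final, unterminated line. The following is a sketch only: read_whole_line is a hypothetical name, it appends each chunk after the previous one instead of seeking back and re-reading as the original poster did, and it assumes lines short enough that the remaining space fits in an int.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one whole line of any length.
 * Returns a malloc'd string (the caller frees it), or NULL at end of
 * file, on a read error, or when memory runs out. */
char *read_whole_line(FILE *fp)
{
    size_t cap = 16;    /* small on purpose so that growth is exercised */
    size_t len = 0;
    char *line;

    assert(fp != NULL);

    line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* Each fgets() call appends after what is already stored. */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        char *bigger;

        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;            /* complete line */
        if (feof(fp))
            return line;            /* last line, no new-line */

        bigger = realloc(line, cap * 2);
        if (bigger == NULL) {
            free(line);
            return NULL;
        }
        line = bigger;
        cap *= 2;
    }

    /* fgets() returned a null pointer. Text gathered earlier is still
     * valid if this was plain end-of-file rather than a read error. */
    if (len > 0 && !ferror(fp))
        return line;

    free(line);
    return NULL;
}
```

The fallback after the loop covers the case where the last line exactly filled the buffer, so that end-of-file is only noticed on the next call.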

>Andrew and I exchanged a couple of "does too/does not" examples.
>As stated above, on Linux the test code indicated one result when
>reading stdin and another when reading from a file.

Then Linux is wrong.

>(My apologies for this having landed in comp.std.c, it belongs
>in comp.lang.c instead, which I've added.)

No, it belongs in comp.std.c.

--
Clive D.W. Feather | Director of | Work: <cl...@demon.net>
Tel: +44 181 371 1138 | Software Development | Home: <cl...@davros.org>
Fax: +44 181 371 1037 | Demon Internet Ltd. | Web: <http://www.davros.org>
Written on my laptop; please observe the Reply-To address
