If I am doing this from scratch, what is the best practice for allocating
a buffer size for the input line?
I guess open the file, scan once to determine the buffer size, then rewind and
start reading. Has this already been done or do I need to code this from
scratch?
(My project is open source, so I can utilize GPL licensed code, if necessary.)
C89-compatible code is preferred.
Mark.
--
Mark Hobley
Linux User: #370818 http://markhobley.yi.org/
Well, if you know how big your lines are, or know a reasonable
maximum, you can just use:
char buffer[1024];
fgets(buffer, sizeof buffer, file);
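In loop form, reading a whole file might look like this (a minimal
sketch; the file name is only an example):

#include <stdio.h>

int main(void)
{
    char buffer[1024];
    FILE *file = fopen("input.txt", "r");   /* example file name */

    if (file == NULL)
        return 1;
    while (fgets(buffer, sizeof buffer, file))
        fputs(buffer, stdout);   /* each call gets at most one line */
    fclose(file);
    return 0;
}

Lines longer than 1023 characters simply come back in pieces.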
> C89-compatible code is preferred.
>
Otherwise, Chuck Falconer has a function called ggets() on his
website that handles memory allocation and all that. I don't
remember the link, but Google will find it.
Richard Heathfield also has such a beast, according to the
comments in Chuck's code. Given that Richard is still around
and Chuck is not, you may be better off with that.
In either case, they're very easy functions to use.
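From memory, ggets() reads from stdin, allocates the line for you,
and strips the trailing newline; usage is roughly as below. (The
interface here is recalled, not checked, so verify it against the
actual ggets.h before relying on it.)

#include <stdio.h>
#include <stdlib.h>
#include "ggets.h"   /* assumed header name */

int main(void)
{
    char *ln;

    while (ggets(&ln) == 0) {   /* 0 on success, if memory serves */
        puts(ln);               /* line comes back without its '\n' */
        free(ln);               /* caller frees each line */
    }
    return 0;
}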
--
Andrew Poelstra
http://www.wpsoftware.net/andrew
> I want to read a text file a line at a time from within a C program. Are there
> some available functions or code already written that does this or do I need
> to code from scratch?
<snip>
> (My project is open source, so I can utilize GPL licensed code, if necessary.)
glibc (the GNU C library, which gcc normally links against) includes
getline(). If you can't use gcc and link against glibc you might be
able to use the source (though extracting parts of the library might
be fiddly).
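For reference, the getline() interface is used like this (POSIX
rather than C89, but minimal where it is available):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *line = NULL;   /* getline() allocates and grows this */
    size_t cap = 0;      /* current capacity, updated by getline() */

    while (getline(&line, &cap, stdin) != -1)
        fputs(line, stdout);
    free(line);
    return 0;
}

(On some systems you may need to define _POSIX_C_SOURCE 200809L or
similar before the includes to get the declaration.)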
<snip>
--
Ben.
There are some.
> If I am doing this from scratch, what is the best practice for allocating
> a buffer size for the input line?
Good question!
> I guess open the file, scan once to determine the buffer size, then rewind and
> start reading. Has this already been done or do I need to code this from
> scratch?
That's a very expensive way to do it. Reading is usually much more expensive
than, say, copying in memory. If you can make reasonable guesses about buffer
sizes, you should be able to do pretty well.
Have a look at fgets(), which reads a string of at most a given
length. If a line is too long for it, you can call fgets() again
to get the rest of the line.
Do you need to keep multiple lines in memory, or do you just need to look
at each one? A typical strategy I'll use for "look at each item in turn"
is basically this:
size_t line_len = 256;
char *line_data = malloc(line_len);

while (fgets(line_data, line_len, stdin)) {
    char *s;
    size_t this_line_len = strlen(line_data);

    while (this_line_len > 0 && line_data[this_line_len - 1] != '\n') {
        /* no newline yet: double the buffer and read some more,
           starting on top of the old terminating '\0' */
        s = malloc(line_len * 2);
        memcpy(s, line_data, this_line_len + 1);
        free(line_data);
        line_data = s;
        if (!fgets(line_data + this_line_len, line_len + 1, stdin))
            break;               /* EOF: last line had no '\n' */
        line_len *= 2;
        this_line_len = strlen(line_data);
    }
}
This omits quite a bit of error checking, but the basic idea is, you
pick a buffer size, and use it, and if it's not big enough, you increase
the buffer size, reallocate, then keep using that larger buffer. In
most cases, you'll probably never even reallocate once.
-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
> If I am doing this from scratch, what is the best practice for allocating
> a buffer size for the input line?
The simplest method is to start with a guess for the length of the
longest line and allocate that much. Now you use fgets() to read in
a line and check if it ends in a '\n' - if it does, everything is
ok, but if it doesn't, the line was too long to fit into the buffer
you started off with. In that case you increase the size of the
buffer, e.g. by doubling it, using realloc(), and try to read the
rest of the line by calling fgets() again (but with the first
argument pointing into the buffer where the last try stopped).
Then repeat the test for the final '\n' and keep increasing the
buffer size as necessary. If you don't run out of memory you end
up with a buffer that contains the complete line.
The only special case you may have to consider is that the last
line of a file may not end with a '\n', in which case what fgets()
reads in won't contain that character either - but if you then try
to read at the very end of the file fgets() will return NULL, so
it's possible to check for that condition.
> I guess open the file, scan once to determine the buffer size, then rewind
> and start reading.
I guess reading the file twice just to find out the length of the
longest line is too much work.
> Has this already been done or do I need to code this from
> scratch?
Probably everyone who has been faced with the problem of reading
lines of arbitrary length has written such a function at least
once;-) Here's something I found looking through my files (although
with quite a number of changes to the original, so be wary, I may
have broken it!):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEN_GUESS 128

int
read_line( FILE * fp,
           char ** line )
{
    static char *buf = NULL;
    static size_t buf_len = LEN_GUESS;
    char *p = buf;
    size_t rem_len = buf_len;

    if ( ! fp || ! line )
        return -1;                 /* bad argument(s) */

    if ( ! buf && ! ( buf = p = malloc( buf_len ) ) )
        return -1;                 /* running out of memory */

    *buf = '\0';

    while ( 1 )
    {
        size_t len;
        size_t off;
        char *tmp;

        if ( ! fgets( p, rem_len, fp ) )
        {
            if ( ferror( fp ) )
                return -1;         /* read failure */
            break;                 /* EOF, possibly without final '\n' */
        }

        len = strlen( p );
        if ( len > 0 && p[ len - 1 ] == '\n' )
            break;                 /* got a complete line */

        /* Line so far didn't fit: double the buffer. The offset at
           which to continue reading must be taken before the
           realloc(), since realloc() may move the buffer. */
        off = ( p - buf ) + len;
        if ( ! ( tmp = realloc( buf, 2 * buf_len ) ) )
            return -1;             /* running out of memory */

        buf = tmp;
        p = buf + off;
        rem_len += buf_len - len;
        buf_len *= 2;
    }

    *line = buf;
    return feof( fp ) ? 1 : 0;     /* indicate if EOF has been reached */
}
Note that it's, of course, not thread-safe. And when you call it
again the last line returned will be overwritten. When you don't
need to call the function anymore you should free() the returned
pointer.
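A sketch of how it might be called (my example, not part of the
original code):

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    FILE *fp = fopen( "input.txt", "r" );   /* example file name */
    char *line = NULL;
    int r;

    if ( ! fp )
        return 1;
    while ( ( r = read_line( fp, &line ) ) >= 0 )
    {
        fputs( line, stdout );
        if ( r == 1 )        /* EOF reached */
            break;
    }
    free( line );            /* free the (static) buffer once, at the end */
    fclose( fp );
    return 0;
}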
> (My project is open source, so I can utilize GPL licensed code, if
> necessary.) C89-compatible code is preferred.
Use it for whatever you want if it fits your needs (but better check
carefully that it works; it's not my tested version, I just checked
that it compiles!). And, of course, there are quite a number of ways
it could be improved; it's more meant to give you a better idea of
how it could be done.
Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de
It is not uncommon for C programmers
to write their own getline function.
Mine is called get_line.
int get_line(char **lineptr, size_t *n, FILE *stream);
--
pete
I just use a fixed size, big enough for text files that are line-oriented.
I've just checked and I'm using a 2KB buffer, but it could be much higher if
memory allows.
If the lines are longer than that sort of size, the file probably isn't
line-oriented and could do with a different approach. (Or might use a
different newline convention from that expected. Either way, you have a file
that is not in the right format.)
> I guess open the file, scan once to determine the buffer size, then rewind
> and
> start reading. Has this already been done or do I need to code this from
> scratch?
For files that might work (although pedants might say that by the second
read, someone could have written a longer line to the file). For devices
such as consoles I'm not sure it would work.
--
Bartc
--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---
"still around" meaning that Richard still posts here in comp.lang.c;
Chuck used to, but hasn't lately.
> In either case, they're very easy functions to use.
--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
> "Mark Hobley" <markh...@hotpop.donottypethisbit.com> wrote in message
> news:i29l77-...@neptune.markhobley.yi.org...
>>I want to read a text file a line at a time from within a C program. Are
>>there
>> some available functions or code already written that does this or do I
>> need
>> to code from scratch?
>>
>> If I am doing this from scratch, what is the best practice for allocating
>> a buffer size for the input line?
>
> I just use a fixed size, big enough for text files that are line-oriented.
>
> I've just checked and I'm using a 2KB buffer, but it could be much higher if
> memory allows.
>
> If the lines are longer than that sort of size, the file probably isn't
> line-oriented and could do with a different approach. (Or might use a
> different newline convention from that expected. Either way, you have a file
> that is not in the right format.)
I have two CSV files I'm using at the moment whose longest lines have
2201 and 2306 bytes, and one old one with a 10155-byte line. It's hard
to put an upper limit on what is reasonable. Today's absurd is
tomorrow's "pah!".
<snip>
--
Ben.
Yes, I wrote a piece of code to do just that and incorporated in it
helpful input from other people on comp.lang.c.
http://codewiki.wikispaces.com/xbuf.c
The section on reading lines shows what you are looking for and also
why the code was needed, i.e. problems with other solutions.
James
and what does your program do?
> > I guess open the file, scan once to determine the buffer size, then rewind
> > and
> > start reading. Has this already been done or do I need to code this from
> > scratch?
>
> For files that might work (although pedants might say that by the second
> read, someone could have written a longer line to the file). For devices
> such as consoles I'm not sure it would work.
>
> --
> Bartc
>
> --- news://freenews.netfront.net/ - complaints: n...@netfront.net ---
>> I've just checked and I'm using a 2KB buffer, but it could be much higher
>> if
>> memory allows.
>>
>> If the lines are longer than that sort of size, the file probably isn't
>> line-oriented and could do with a different approach. (Or might use a
>> different newline convention from that expected. Either way, you have a
>> file
>> that is not in the right format.)
>
> and what does your program do?
On input of:
abcdefghijklmnopqrstuvwxyz
and a buffer size (for fgets()) of 10 characters, it just splits the lines:
abcdefghi
jklmnopqr
stuvwxyz
After adding a few more lines of code, it truncates to:
abcdefghi
which seems better (line-oriented doesn't mean free-format). Perhaps it
should also signal when a truncation has occurred.
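Something along these lines would do the signalling (a sketch; the
function and variable names are just illustrative):

#include <stdio.h>
#include <string.h>

/* Reads one line into a fixed buffer; discards the rest of an
   over-long line and reports the truncation via *truncated.
   Returns 0 at EOF (or error), 1 otherwise. */
int get_fixed_line(char *buf, int size, FILE *fp, int *truncated)
{
    size_t len;
    int c;

    *truncated = 0;
    if (!fgets(buf, size, fp))
        return 0;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] != '\n' && !feof(fp)) {
        *truncated = 1;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;   /* swallow the remainder of the line */
    }
    return 1;
}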
What seems wrong is to let the input file dictate to you some ridiculous
'line length' of perhaps a billion characters, and to go along with that.
--
Bartc
--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---
>> I've just checked and I'm using a 2KB buffer, but it could be much higher
>> if
>> memory allows.
>>
>> If the lines are longer than that sort of size, the file probably isn't
>> line-oriented and could do with a different approach.
> I have two CSV files I'm using at the moment whose longest lines have
> 2201 and 2306 bytes and one old one with a 10155 byte line. It's hard
> to put an upper limit on what is reasonable. Today's absurd is
> tomorrow's "pah!".
The text file format is being abused then. This sounds like an export from a
database or spreadsheet. It's not text, unless you're used to reading pages
60 feet wide.
If you already have code for a flexible getline(), then just use it.
Otherwise the next step up from a hard-coded size is a one-time allocated
buffer which remains the same size. Bung 20KB (or 200KB) in there, and have
done with it. But there is always going to be some file or other which is
going to cause a problem.
You may find the getfline function in
http://home.tiac.net/~cri_a/san/source_code/utl/
useful. getfline.c is in the src directory and getfline.h is in
the include directory. The code is ANSI C89 and has a BSD-style
license.
Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
It's not much to ask of the universe that it be fair;
it's not much to ask but it just doesn't happen.
What seems wrong to me is to let limitations in the program impose
some arbitrary limit on line length, when the input format you're
trying to process imposes no such limit.
If a file format specifies a maximum line length, then by all means go
with that (and ideally report an error for any line that exceeds the
limit, unless the format specification says that characters past the
maximum are quietly ignored). If it doesn't, then handling
arbitrarily long lines is better than imposing *any* limit other than
what's imposed by available memory.
And if the file format doesn't impose a maximum length but you're
unwilling to handle very long lines, IMHO you should at least report
an internal error if you see a line longer than you can handle.
> "Ben Bacarisse" <ben.u...@bsb.me.uk> wrote in message
> news:0.e23765669a04066f4532.2010...@bsb.me.uk...
>> "bartc" <ba...@freeuk.com> writes:
>>
>>> "Mark Hobley" <markh...@hotpop.donottypethisbit.com> wrote in message
>>> news:i29l77-...@neptune.markhobley.yi.org...
>>>>I want to read a text file a line at a time from within a C program. Are
>
>>> I've just checked and I'm using a 2KB buffer, but it could be much
>>> higher if
>>> memory allows.
>>>
>>> If the lines are longer than that sort of size, the file probably isn't
>>> line-oriented and could do with a different approach.
>
>> I have two CSV files I'm using at the moment whose longest lines have
>> 2201 and 2306 bytes and one old one with a 10155 byte line. It's hard
>> to put an upper limit on what is reasonable. Today's absurd is
>> tomorrow's "pah!".
>
> The text file format is being abused then. This sounds like an export
> from a database or spreadsheet. It's not text, unless you're used to
> reading pages 60 feet wide.
The structure is line-oriented. It should be read in text mode and a
line ends when you see '\n'. I call that a text file.
> If you already have code for a flexible getline(), then just
> it. Otherwise the next step up from a hard-coded size is a one-time
> allocated buffer which remains the same size. Bung 20KB (or 200KB) in
> there, and have done with it.
These solutions work, of course. I was just disputing the idea that
there is some maximum line length beyond which something stops being a
text file.
<snip>
--
Ben.
No need for limits.
1) Read the entire file into one buffer using fread(), realloc()ing
when needed.
2) Make a second pass over the buffer: find the line endings, handle
\r\n, replace them by '\0', and save the beginnings of the lines in an
array of pointers, realloc()ing when needed.
3) Make a third pass: process each line, searching for commas,
replacing them by '\0' and saving pointers to the beginnings,
realloc()ing when needed.
Steps 2 and 3 need to take care of quoting / escaping.
Steps 1, 2 and 3 _can_ be combined into one state machine.
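For step 1, something like this works (a sketch; quoting/escaping and
the later passes are left out, and the names are mine):

#include <stdio.h>
#include <stdlib.h>

/* Read the whole stream into one buffer, doubling it as needed. */
char *slurp(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap), *tmp;

    if (!buf)
        return NULL;
    while ((n = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += n;
        if (len == cap) {   /* buffer full: double it */
            if (!(tmp = realloc(buf, cap *= 2))) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}

The second pass then walks buf, replacing each '\n' (and any '\r'
before it) with '\0' and recording where the lines start.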
HTH,
AvK
OK, but then be prepared for your getline() function to actually need to be
a getfile() function with some input, and to potentially grab most of the
memory in your system, or even to bring down the program (if a giant file
uses the wrong newline format for example).
--
Bartc
Or don't read an entire line into memory at a time. For example,
if you're reading an XML file -- well, you should be using an
XML parser that somebody else has already written. But if you're
writing an XML parser for some reason, it might make more sense to
read and store input until you see a '<' or '>' rather than '\n'.
I've seen XML files with extremely long lines, but not with extremely
long tag names.
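A fixed-buffer sketch of that kind of delimiter-driven reading (the
function name is illustrative; a growing-buffer version would follow
the same doubling pattern shown earlier in the thread):

#include <stdio.h>

/* Like fgets(), but stops at 'delim' instead of '\n'; the delimiter
   is kept in the buffer. Returns the length read, or -1 at EOF with
   nothing read. */
int read_until(char *buf, int size, int delim, FILE *fp)
{
    int c = EOF;
    int n = 0;

    while (n < size - 1 && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == delim)
            break;
    }
    buf[n] = '\0';
    return (n == 0 && c == EOF) ? -1 : n;
}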
But yes, sometimes it does make sense to read entire lines into memory
at once, even if they might be inordinately long.
>>> What seems wrong to me is to let limitations in the program impose
>>> some arbitrary limit on line length, when the input format you're
>>> trying to process imposes no such limit.
>>
>> OK, but then be prepared for your getline() function to actually need
>> to be a getfile() function with some input
> Or don't read an entire line into memory at a time. For example,
> if you're reading an XML file -- well, you should be using an
> XML parser that somebody else has already written. But if you're
> writing an XML parser for some reason, it might make more sense to
> read and store input until you see a '<' or '>' rather than '\n'.
> I've seen XML files with extremely long lines, but not with extremely
> long tag names.
I think XML is one of those text formats (like C source files and
HTML) which are not really line-oriented; newline is just another
whitespace character.
In that case, if you don't use a dedicated file reader as you've suggested,
you can't really use simple line-input.
--
Bartc
Quibble: C preprocessor directives are line-oriented. And a C
compiler is allowed to impose a maximum line length on source files.
> In that case, if you don't use a dedicated file reader as you've
> suggested, you can't really use simple line-input.
Sure you can, as long as your simple line-input can handle arbitrarily
long lines (and you have enough memory to store them). Admittedly
it might not be the ideal solution.