
Reading a file line by line


Charles Reitzel

Jul 18, 1997, 3:00:00 AM

/* Just say no to C++ iostreams. They are slow and cumbersome.
** Although I like using a string class for many things, I find
** parsing type code is actually easier to write and read using
** a straight C idiom. This is true whenever you're pulling out quoted
** strings, searching for field delimiters, etc., etc. It is always
** more efficient as well.
*/

#include <stdio.h>

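#define MAX_BUFFER_SIZE 4096 /* or bigger, if needed. */
char buf[ MAX_BUFFER_SIZE ];

int main( int argc, char* argv[] )
{
    FILE *fpin;

    if ( argc < 2 )          /* no file name given */
        return 1;
    fpin = fopen( argv[1], "r" );   /* text mode turns CR/LF into '\n' */
    if ( !fpin )
        return 1;

    /* fgets() keeps the '\n' and stops after sizeof(buf) - 1 chars,
    ** so a longer line arrives in several pieces. */
    while ( fgets( buf, sizeof(buf), fpin ) != NULL )
    {   /* do whatever you want to do here. */
    }
    fclose( fpin );
    return 0;
}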

Alex Oren <alexo@---filter---bigfoot.com> wrote in article
<33ce3365.951165623@neptune>...
> I'm trying to read lines of text (CR/LF delimited) from a file using
> Win32 APIs (CreateFile, ReadFile, etc.)
>
> What will be a simple and efficient way to do it?
> Reentrancy and thread safety is a concern too.
>
> It seems that the amount of buffering done by NT is minimal at best.
> reading a file a character at the time is two orders of magnitude slower
> than using large buffers.


Alex Oren

Jul 20, 1997, 3:00:00 AM

"Charles Reitzel" <do...@spam.me> wrote:

} /* Just say no to C++ iostreams. They are slow and cumbersome.
} ** Although I like using a string class for many things, I find
} ** parsing type code is actually easier to write and read using
} ** a straight C idiom. This is true whenever you're pulling out quoted
} ** strings, searching for field delimiters, etc., etc. It is always
} ** more efficient as well.
} */
}
} #include <stdio.h>

<code snipped>

I know how to use fgets(), thank you.

I want to avoid using *any* run-time library (whether C or C++).
I asked about reading a line using *only* native Win32 APIs in an efficient,
reentrant and thread-safe way.

} Alex Oren <alexo@---filter---bigfoot.com> wrote in article
} <33ce3365.951165623@neptune>...
} > I'm trying to read lines of text (CR/LF delimited) from a file using
} > Win32 APIs (CreateFile, ReadFile, etc.)
} >
} > What will be a simple and efficient way to do it?
} > Reentrancy and thread safety is a concern too.
} >
} > It seems that the amount of buffering done by NT is minimal at best.
} > reading a file a character at the time is two orders of magnitude slower
} > than using large buffers.


Have fun,
Alex.

---------------------------------------------------------------
My email address is intentionally mangled to foil spambots.
Please remove the "---filter---" from the address for replying.
Sorry for the inconvenience.
---------------------------------------------------------------

Richard Sanders

Jul 20, 1997, 3:00:00 AM

On Sun, 20 Jul 1997 06:52:48 GMT, alexo@---filter---bigfoot.com (Alex
Oren) wrote:

>"Charles Reitzel" <do...@spam.me> wrote:
>
>} /* Just say no to C++ iostreams. They are slow and cumbersome.
>} ** Although I like using a string class for many things, I find
>} ** parsing type code is actually easier to write and read using
>} ** a straight C idiom. This is true whenever you're pulling out quoted
>} ** strings, searching for field delimiters, etc., etc. It is always
>} ** more efficient as well.
>} */
>}
>} #include <stdio.h>
><code snipped>
>
>I know how to use fgets(), thank you.
>
>I want to avoid using *any* run-time library (whether C or C++).
>I asked about reading a line using *only* native Win32 APIs in an efficient,
>reentrant and thread-safe way.

LIBCMT.LIB for
Multithread static library, retail version

MSVCRT.LIB for
Import library for MSVCRT.DLL, retail version


This is in the manual.

Alex Oren

Jul 21, 1997, 3:00:00 AM

ric...@stardate.bc.ca (Richard Sanders) wrote:

Again <sigh>

I DO NOT WANT TO USE THE C RUNTIME LIBRARIES OR DLLS.
I NEED TO USE ONLY THE BARE WIN32 APIS.

That means:
No LIBCMT!!!
No MSVCRT!!!
And no open(), fopen(), fgets(), fscanf() and their ilk!!!

Sorry for shouting.

Chris Marriott

Jul 21, 1997, 3:00:00 AM

In article <33d71f2d.6639827@neptune>, Alex Oren <alexo@---filter---bigfoot.com> writes

>Again <sigh>
>
>I DO NOT WANT TO USE THE C RUNTIME LIBRARIES OR DLLS.
>I NEED TO USE ONLY THE BARE WIN32 APIS.
>
>That means:
>No LIBCMT!!!
>No MSVCRT!!!
>And no open(), fopen(), fgets(), fscanf() and their ilk!!!
>
>Sorry for shouting.

A suggestion, Alex, why not look at the source code for "fgets" and see
how it does it; the C RTL source code is supplied on the VC++ CD-ROM.

Chris

----------------------------------------------------------------
Chris Marriott, Microsoft Certified Solution Developer.
SkyMap Software, U.K. e-mail: ch...@skymap.com
Visit our web site at http://www.skymap.com

Robert Schlabbach

Jul 21, 1997, 3:00:00 AM

Alex Oren <alexo@---filter---bigfoot.com> wrote in article
<33d71f2d.6639827@neptune>...

> I DO NOT WANT TO USE THE C RUNTIME LIBRARIES OR DLLS.
> I NEED TO USE ONLY THE BARE WIN32 APIS.

The bare Win32 API only has read functions that DON'T look at the data, so
they can't provide any "line-by-line" function. You'll have to do what
fgets() does - ReadFile() chunks from the file into a large enough buffer and
then scan through it for CR/LFs. Use nicely optimized assembly for maximum
speed :).
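A minimal sketch of that chunk-and-scan approach, under one stated assumption: the raw reads go through a hypothetical `read_fn` callback, so the buffering and CR/LF scanning can be shown (and tried) without Windows. In a Win32 program the callback would wrap `ReadFile(h, dst, (DWORD)cap, &got, NULL)` and return `got`. The names `line_reader`, `lr_getline` and the memory-backed `mem_source` are illustrative only, not an existing API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback standing in for ReadFile(): fill `dst` with up to
 * `cap` bytes and return the count (0 means end of file). A Win32 version
 * would call ReadFile(h, dst, (DWORD)cap, &got, NULL) and return `got`. */
typedef size_t (*read_fn)(void *ctx, char *dst, size_t cap);

typedef struct {
    read_fn read_cb;    /* source of raw bytes */
    void   *ctx;        /* passed through to read_cb */
    char    buf[4096];  /* chunk buffer, filled by one large read at a time */
    size_t  len;        /* bytes currently valid in buf */
    size_t  pos;        /* index of the next unread byte in buf */
} line_reader;

void lr_init(line_reader *lr, read_fn fn, void *ctx) {
    lr->read_cb = fn;
    lr->ctx = ctx;
    lr->len = 0;
    lr->pos = 0;
}

/* Next byte as 0..255, refilling the buffer when it runs dry; -1 at EOF. */
static int lr_getc(line_reader *lr) {
    if (lr->pos == lr->len) {
        lr->len = lr->read_cb(lr->ctx, lr->buf, sizeof lr->buf);
        lr->pos = 0;
        if (lr->len == 0)
            return -1;
    }
    return (unsigned char)lr->buf[lr->pos++];
}

/* Copies one line into `out` (cap >= 1 bytes, NUL included), consuming the
 * LF and dropping CR bytes so CR/LF and bare LF both work. Characters past
 * cap - 1 are read and discarded. Returns the number of characters stored,
 * or -1 when no lines remain. */
long lr_getline(line_reader *lr, char *out, size_t cap) {
    size_t n = 0;
    int ch = lr_getc(lr);
    if (ch == -1)
        return -1;
    while (ch != -1 && ch != '\n') {
        if (ch != '\r' && n + 1 < cap)
            out[n++] = (char)ch;
        ch = lr_getc(lr);
    }
    out[n] = '\0';
    return (long)n;
}

/* Memory-backed source for trying the reader without a file; it hands out
 * at most `chunk` bytes per call to exercise the refill path. */
typedef struct { const char *data; size_t size, off, chunk; } mem_source;

size_t mem_read(void *ctx, char *dst, size_t cap) {
    mem_source *m = ctx;
    size_t n = m->size - m->off;
    if (n > cap) n = cap;
    if (n > m->chunk) n = m->chunk;
    memcpy(dst, m->data + m->off, n);
    m->off += n;
    return n;
}
```

All state lives in the `line_reader` struct rather than in statics, so the code is reentrant: threads that each own a reader need no locking. Only a reader shared between threads would need a lock, such as a Win32 CRITICAL_SECTION around `lr_getline`.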

Regards,
--
Robert Schlabbach
e-mail: rob...@powerstation.isdn.cs.TU-Berlin.DE
Technical University of Berlin, Germany


Alfons Hoogervorst

Jul 21, 1997, 3:00:00 AM

Lo alexo@---filter---bigfoot.com (Alex Oren):

[zoom, Mr. Reitzel showed how to read a file using fgets()]


>I know how to use fgets(), thank you.
>
>I want to avoid using *any* run-time library (whether C or C++).
>I asked about reading a line using *only* native Win32 APIs in an efficient,
>reentrant and thread-safe way.

You're defeating the purpose of the run-time library. Anyway, read
chunks of data from the file using ReadFile() and let your code watch
the read data for line breaks.

Bye.

+- Conceived through intercalation and juxtaposition -+
| systems programmer / word player avant la lettre |
| proteus <hosted at> worldaccess nl |
+-----------------------------------------------------+

dave porter

Jul 21, 1997, 3:00:00 AM

OK, if you don't want to use the C RTL, then
you must read in lumps of bytes using the Win32
ReadFile call, and look for \r and \n, and determine
line boundaries that way.

The file system is not line-oriented.
ReadFile is not line-oriented.

The C RTL file functions provide buffered line-oriented
I/O on top of the Win32 API. If you don't want to use
them, you'll have to write pretty much the same thing
yourself.

Actually, I don't usually use ReadFile at all. I just
map the file - you still have to scan for \r and \n but
at least you don't have to worry about intermediate
buffering.
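For the mapped-file approach, the scan can run directly over the mapped bytes without copying. A minimal sketch, assuming the view's base pointer and byte count were already obtained (in Win32, typically via CreateFileMapping, MapViewOfFile and GetFileSize); `view_cursor` and `next_line` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Cursor over a read-only block of bytes such as a mapped file view. */
typedef struct {
    const char *base;  /* start of the view */
    size_t      size;  /* length of the view in bytes */
    size_t      off;   /* offset of the next unread byte */
} view_cursor;

/* Finds the next line without copying it. On success *line points into
 * the view and *len excludes the LF and any CR just before it. Returns 0
 * once the view is exhausted. The view is not NUL-terminated, so every
 * scan is bounded by `size`. */
int next_line(view_cursor *c, const char **line, size_t *len) {
    const char *start, *nl;
    size_t rest;
    if (c->off >= c->size)
        return 0;
    start = c->base + c->off;
    rest = c->size - c->off;
    nl = memchr(start, '\n', rest);
    *len = nl ? (size_t)(nl - start) : rest;   /* last line may lack LF */
    c->off += *len + (nl ? 1 : 0);             /* step past the LF too */
    if (*len > 0 && start[*len - 1] == '\r')
        --*len;                                /* trim CR of a CR/LF pair */
    *line = start;
    return 1;
}
```

Each line comes back as pointer plus length because the mapped data cannot be NUL-terminated in place. Also note that a zero-length file cannot be mapped, so check the file size before creating the mapping.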

dave
--
for email: remove the dollar sign from my address. sorry.


Alex Oren <alexo@---filter---bigfoot.com> wrote in article
<33d71f2d.6639827@neptune>...

> ric...@stardate.bc.ca (Richard Sanders) wrote:
>
> } On Sun, 20 Jul 1997 06:52:48 GMT, alexo@---filter---bigfoot.com (Alex
> } Oren) wrote:
> }

> } >I know how to use fgets(), thank you.
> } >
> } >I want to avoid using *any* run-time library (whether C or C++).
> } >I asked about reading a line using *only* native Win32 APIs in an efficient,
> } >reentrant and thread-safe way.
> }

> } LIBCMT.LIB for
> } Multithread static library, retail version
> }
> } MSVCRT.LIB for
> } Import library for MSVCRT.DLL, retail version
> }
> } This is in the manual.
>
> Again <sigh>
>

> I DO NOT WANT TO USE THE C RUNTIME LIBRARIES OR DLLS.
> I NEED TO USE ONLY THE BARE WIN32 APIS.
>

Jerry Coffin

Jul 22, 1997, 3:00:00 AM

In article <33dcb4bc.1180884712@neptune>, alexo@---filter---bigfoot.com says...

[ ... ]

> I want to avoid using *any* run-time library (whether C or C++).
> I asked about reading a line using *only* native Win32 APIs in an
> efficient, reentrant and thread-safe way.

In that case, you'll need to write a fairly close simulation of what
the C library already does for you. You'll want to start with a
structure to hold information about the file:

/* BTW, this may look a bit like C, but it's really meant more as
* pseudo-code - it's not really intended to compile as-is. I'm
* leaving out quite a bit of stuff like parameters to Win32
* functions.
*/
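typedef struct {
    char *buffer;
    DWORD buf_size;          /* capacity of buffer */
    DWORD cur_size;          /* bytes read; ReadFile wants a DWORD * */
    DWORD pos;               /* index of the next byte to return */
    CRITICAL_SECTION sect;
    HANDLE file;
} file_desc;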

Then you'll need to be able to open a file and fill in its
information:

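file_desc *open_file(char *name, DWORD mode) {

    file_desc *retval;

    retval = HeapAlloc(GetProcessHeap(), 0, sizeof(*retval));
    if (retval == NULL)
        return NULL;
    retval->file = CreateFile(name, ... );
    retval->buffer = HeapAlloc(GetProcessHeap(), 0, some_size);
    retval->pos = 0;
    retval->buf_size = some_size;
    retval->cur_size = 0;
    InitializeCriticalSection(&retval->sect);  /* must precede any use */

    return retval;
}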

Obviously you'll also need to be able to close the file and free its
memory:

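void close_file(file_desc *file) {

    EnterCriticalSection(&file->sect);
    CloseHandle(file->file);             /* there is no CloseFile() */
    HeapFree(GetProcessHeap(), 0, file->buffer);
    file->buf_size = 0;
    file->cur_size = 0;
    file->pos = 0;
    LeaveCriticalSection(&file->sect);
    DeleteCriticalSection(&file->sect);
    HeapFree(GetProcessHeap(), 0, file);
}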

Then a utility function to read a buffer full of info from the file
into the buffer:

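#ifndef EOF
#define EOF (-1)    /* normally from <stdio.h>, which we're avoiding */
#endif

int fill_buffer(file_desc *file) {

    int ch = EOF;

    EnterCriticalSection(&file->sect);
    file->pos = 0;
    if (!ReadFile(file->file, file->buffer, file->buf_size,
                  &file->cur_size, NULL))
        file->cur_size = 0;             /* treat a read error as EOF */
    if (file->cur_size != 0) {
        ch = (unsigned char)file->buffer[0];
        file->pos = 1;
    }
    LeaveCriticalSection(&file->sect);  /* released on every path */
    return ch;
}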

and some code to read one character from the file:

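/* Fast path returns the next buffered byte; otherwise refill.
 * Note that the fast path does not take the critical section. */
#define get_character(file) \
    ((file)->pos < (file)->cur_size ? \
     (unsigned char)(file)->buffer[(file)->pos++] : \
     fill_buffer(file))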

finally, code to read a whole line from the file:

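char *get_line(char *buffer, size_t max_len, file_desc *file) {

    int ch;
    size_t current = 0;

    if (max_len < 2 || EOF == (ch = get_character(file)))
        return NULL;

    /* Store characters until '\n' or EOF, stopping early when only the
     * slot for the NUL is left. The '\n' is consumed but not stored; a
     * CR before it is kept. */
    while (ch != '\n') {
        buffer[current++] = (char)ch;
        if (current == max_len - 1 ||
            EOF == (ch = get_character(file)))
            break;
    }
    buffer[current] = '\0';
    return buffer;
}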

As-is, closing a file isn't completely thread-safe - you can close a
file from one thread and still attempt to access the (now freed)
memory of the file descriptor from another thread. About the cleanest
way to handle this is to create a zombie file descriptor structure
that any other thread can detect, but which still leaves some valid
memory at which the pointer can point. This complicates things enough
I've left it out for now, but hopefully the description is enough to
allow you to implement it if necessary.

I've also left write buffering out of the code for the moment. Write
buffering is almost identical to read buffering, but of course adds
more code. Allowing seeking on the file is a bit more difficult.
Generally you want to keep track of the absolute position of the start
of the file buffer. Seeks within the range of the current buffer can
be handled by simply modifying `pos'. Seeks outside that range have
to be handled by re-reading from the file. If a seek is to a point
just before the beginning of the buffer, you can simply move memory
within the buffer, and do a partial read, but this is likely more
trouble than it's worth.
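The seek scheme described above can be sketched as follows, assuming a read buffer that also records the absolute file offset of its first byte (a field the `file_desc` above does not have). `buf_state` and `buffered_seek` are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for a read buffer: `buf_start` is the absolute
 * file offset of buffer[0], `cur_size` the number of valid bytes and
 * `pos` the index of the next byte to hand out. */
typedef struct {
    uint64_t buf_start;
    size_t   cur_size;
    size_t   pos;
} buf_state;

/* Moves to absolute offset `target`. Returns 1 if the target lies within
 * the buffered range, in which case only `pos` changes. Landing exactly at
 * the end of the buffer also counts, since the OS file pointer already
 * sits there and the next read simply refills. Otherwise returns 0 after
 * marking the buffer empty and re-basing it at `target`; the caller must
 * then move the OS file pointer before the next refill. */
int buffered_seek(buf_state *b, uint64_t target) {
    if (target >= b->buf_start && target - b->buf_start <= b->cur_size) {
        b->pos = (size_t)(target - b->buf_start);
        return 1;
    }
    b->buf_start = target;
    b->cur_size = 0;
    b->pos = 0;
    return 0;
}
```

When `buffered_seek` returns 0, the real file position still has to be moved (for example with SetFilePointerEx). For the recorded offset to stay correct, every refill must also advance `buf_start` by the previous `cur_size` before reading.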

Obviously there's a lot more that you might need in a complete I/O
library. However, I don't feel like writing and posting a complete
standard I/O library today...

--
Later,
Jerry.

Technical Support

Jul 22, 1997, 3:00:00 AM

On Thu, 17 Jul 1997 15:05:35 GMT, alexo@---filter---bigfoot.com (Alex
Oren) wrote:

>
>Hello.


>
>I'm trying to read lines of text (CR/LF delimited) from a file using Win32 APIs
>(CreateFile, ReadFile, etc.)
>
>What will be a simple and efficient way to do it?
>Reentrancy and thread safety is a concern too.
>
>It seems that the amount of buffering done by NT is minimal at best. reading a
>file a character at the time is two orders of magnitude slower than using large
>buffers.

Create a memory mapped file then navigate that using whatever method
you find best for your data format.


======================================================================
| WideOcean Ltd              | Image enabling for mainframe systems |
| in...@wideocean.co.uk      | Fax routing via e-mail               |
| http://www.wideocean.co.uk | Watermark imaging integrators        |
======================================================================

Scott Holland Settlemier

Jul 24, 1997, 3:00:00 AM

Read the overview on file mapping. What you need to do is get
away from treating the file as something "over there" that
is parsed by manually shuttling stuff yourself. Map the file
into your address space and let Windows do the disk operations
it's really good at; then you can treat the file just as you
would some other entity on the heap and parse through it
with some minimal routines of your own. Oh yeah, this
is all bare Win32.

In article <33d71f2d.6639827@neptune>, alexo@---filter---bigfoot.com (Alex
Oren) wrote:
>ric...@stardate.bc.ca (Richard Sanders) wrote:
>

>} On Sun, 20 Jul 1997 06:52:48 GMT, alexo@---filter---bigfoot.com (Alex
>} Oren) wrote:
>}
>} >I know how to use fgets(), thank you.
>} >

>} >I want to avoid using *any* run-time library (whether C or C++).
>} >I asked about reading a line using *only* native Win32 APIs in an efficient,
>} >reentrant and thread-safe way.
>}

>} LIBCMT.LIB for
>} Multithread static library, retail version
>}
>} MSVCRT.LIB for
>} Import library for MSVCRT.DLL, retail version
>}
>} This is in the manual.
>
>Again <sigh>
>
>I DO NOT WANT TO USE THE C RUNTIME LIBRARIES OR DLLS.
>I NEED TO USE ONLY THE BARE WIN32 APIS.
>
>That means:
>No LIBCMT!!!
>No MSVCRT!!!
>And no open(), fopen(), fgets(), fscanf() and their ilk!!!
>
>Sorry for shouting.
>
>
