ReadFile vs fread

James Williams

Jan 8, 2009, 8:42:07 AM
Hello again,

I was wondering whether ReadFile has better performance than the CRT's
fread. I would like to get away from the C runtime. Also, does Windows
provide equivalents to malloc and realloc? And if so, will they provide
better performance?

Thanks.


James


Victor Bazarov

Jan 8, 2009, 9:04:26 AM
James Williams wrote:
> I was wondering whether ReadFile has better performance than the CRT's
> fread. I would like to get away from the C runtime. Also, does Windows
> provide equivalents to malloc and realloc? And if so, will they provide
> better performance?

IME there is no significant difference between those. The reason is
that on Windows the CRT is actually implemented in terms of the Windows
API, so 'fread' most likely uses 'ReadFile' behind the scenes. I did,
however, run into some differences between fseek and lseek with large
files.

That said, it sounds like you're concerned with performance. I/O needs
to be looked at, no doubt. And memory allocation can be taxing on your
app's speed as well. But replacing the functions you think misbehave
with other functions without looking at what actually causes the
slowdown is not a good approach.

If memory management is the issue, take a look at solutions involving
pooled memory managers. If file access is a problem, consider reducing
the overall number of times you access the files, or organize the access
in such a way that the files are read sequentially, in chunks, for
example. Windows has file I/O caching, which can help you unless you
work against it somehow (like reading a byte or two every megabyte or so
from your file).
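To illustrate the "sequentially, in chunks" advice, here is a minimal sketch in portable C++. The function name count_bytes_sequential and the 64 KB default chunk size are assumptions made up for this example, not tuned values or anything from the thread; the loop only counts bytes where a real program would process them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads a whole file front to back in fixed-size chunks and returns the
// number of bytes seen. Returns -1 if the file cannot be opened or read.
// `chunkSize` is an illustrative parameter, not a tuned value.
long long count_bytes_sequential(const char* path,
                                 std::size_t chunkSize = 64 * 1024)
{
    assert(chunkSize > 0);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;

    std::vector<unsigned char> chunk(chunkSize);
    long long total = 0;
    std::size_t got;
    while ((got = std::fread(chunk.data(), 1, chunk.size(), f)) > 0) {
        // Process chunk[0 .. got) here; this sketch only counts bytes.
        total += static_cast<long long>(got);
    }
    const bool failed = std::ferror(f) != 0;
    std::fclose(f);
    return failed ? -1 : total;
}
```

Reading front to back like this is the pattern the OS file cache is most likely to help with; whether 64 KB is a good size for a given disk is something to measure rather than assume.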

In any case, performance needs to be addressed, but (a) it has to be the
last issue you tackle after the application is fully functional, (b) you
need to measure, not guess, before trying to improve anything, and (c)
do gather some information on how others dealt with performance, what
methods they used, so you don't just poke at your application hoping it
will get off its ass and start moving.

Good luck!

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

James Williams

Jan 8, 2009, 9:32:04 AM
I have the application running now. I was just hoping to get away from the
C runtime because differences in how things are handled between compilers
have caused porting issues, whereas the Windows API is uniform across
compilers. Normally I use new, but new has no realloc equivalent for
resizing arrays without creating a whole new one and copying everything
over. Disk I/O is part of the bottleneck, but I have handled that by using
multiple buffers. Each buffer has its own thread of execution, which calls
the heavy processing algorithm on the data in that buffer. In this way I
have increased performance on a single file by over 400%. Again, though,
there are differences in fread and stati64 between compilers: some handle
st_mode properly and some don't, and some support S_IFSOCK in the mode
while others don't. Hence the API. But I want to be assured that the
Windows API would not hinder performance if I switch to using CreateFile,
ReadFile, GetFileAttribu ....

As for malloc, new, and the like: I assume each has its own memory
manager, so the question is which has better performance. I know the
Windows API provides heap memory routines; how do they compare to malloc?

Thanks.

"Victor Bazarov" <v.Aba...@comAcast.net> wrote in message
news:gk5159$bk0$1...@news.datemas.de...

Giovanni Dicanio

Jan 8, 2009, 9:37:51 AM
James Williams wrote:

> Normally, I use new. But new does not have a realloc for
> resizing of arrays without having to just create a whole new one and then
> copy everything over.

But C++ offers std::vector, much better than new and realloc (e.g. it is
exception safe, etc.).
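As a rough illustration of std::vector doing the jobs realloc is usually asked to do, here is a small sketch. The function name grow_then_trim and the reserve size of 16 are hypothetical choices for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends `count` values and then trims the array to `keep` elements.
// The vector reallocates and moves its elements as needed.
std::vector<int> grow_then_trim(std::size_t count, std::size_t keep)
{
    assert(keep <= count);
    std::vector<int> v;
    v.reserve(16);  // optional: pre-size to limit early reallocations
    for (std::size_t i = 0; i < count; ++i)
        v.push_back(static_cast<int>(i));  // grows automatically
    v.resize(keep);  // changes the logical size; existing values are kept
    return v;
}
```

Note that a vector still allocates a new block and moves the elements when it outgrows its capacity, so it does not avoid the copy; it amortizes the cost by growing geometrically and cleans up after itself if an exception is thrown.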

If you need to write your own memory allocator (e.g. a pool allocator)
and use it from STL containers, you may want to read the post by Stephan
(a developer on the VC++ team) on the Visual C++ Team blog:

http://blogs.msdn.com/vcblog/archive/2008/08/28/the-mallocator.aspx

Giovanni

Giovanni Dicanio

Jan 8, 2009, 9:45:24 AM
James Williams wrote:

> Hence API. But I want to be
> assured that the windows API would not hinder performance if I switch to
> using CreateFile ReadFile, GetFileAttribu ....

I believe that if you use the Win32 APIs directly (and if you use them
wisely) you get top performance, better than the CRT (which I think
calls the Win32 APIs in its implementation code).

So, I believe that if you switch to CreateFile, ReadFile, etc. you will
be fine (assuming that you want to build Win32-only code).

I suggest you read this very interesting optimization series on
Raymond Chen's blog (when he optimizes his code, he directly uses Win32
API functions like CreateFile, file-mapping, etc.)

Loading the dictionary, part 1: Starting point
http://blogs.msdn.com/oldnewthing/archive/2005/05/10/415991.aspx

Loading the dictionary, part 2: Character conversion
http://blogs.msdn.com/oldnewthing/archive/2005/05/11/416430.aspx

Loading the dictionary, part 3: Breaking the text into lines
http://blogs.msdn.com/oldnewthing/archive/2005/05/13/417183.aspx

Loading the dictionary, part 4: Character conversion redux
http://blogs.msdn.com/oldnewthing/archive/2005/05/16/417865.aspx

Loading the dictionary, part 5: Avoiding string copying
http://blogs.msdn.com/oldnewthing/archive/2005/05/18/419130.aspx

Loading the dictionary, part 6: Taking advantage of our memory
allocation pattern
http://blogs.msdn.com/oldnewthing/archive/2005/05/19/420038.aspx


Giovanni

Tom Serface

Jan 8, 2009, 10:34:45 AM
I've had better performance using CreateFile/ReadFile/WriteFile, but it is
more difficult to implement and the options to CreateFile are kind of
confusing. Still, I tend to use it now since I have more control over what
I'm doing and I got about 10% better performance using that paradigm.

Tom

"James Williams" <Jim_L_W...@hotmail.com> wrote in message
news:OH4yAcZc...@TK2MSFTNGP05.phx.gbl...

Doug Harrison [MVP]

Jan 8, 2009, 11:57:36 AM
On Thu, 8 Jan 2009 07:42:07 -0600, "James Williams"
<Jim_L_W...@hotmail.com> wrote:

>Hello again,
>
>I was wondering whether ReadFile has better performance than the CRT's fread?

For the sorts of things you can do with fread, I expect ReadFile would
never be significantly better. OTOH, I expect it would be significantly
worse for frequent small reads, since ReadFile is a kernel call, while the
stdio function fread is buffered by default and only calls ReadFile to fill
the buffer. Note that to get the best performance out of fread for
single-threaded access to a file, especially with frequent small reads, you
need to use _fread_nolock and friends when using the multithreaded CRT.
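A small portable sketch of the buffering point: many one-byte fread calls are mostly served from the stdio buffer, and setvbuf controls that buffer's size. The function name sum_bytes_small_reads and the 64 KB default are assumptions for illustration. The sketch uses plain fread; _fread_nolock is specific to the Microsoft CRT and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Sums a file's bytes using one-byte fread calls. stdio serves most calls
// from its in-memory buffer; `bufSize` (illustrative) sets that buffer's
// size. Returns -1 on failure.
long long sum_bytes_small_reads(const char* path,
                                std::size_t bufSize = 64 * 1024)
{
    assert(bufSize > 0);
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1;
    // setvbuf must be called before any other operation on the stream.
    if (std::setvbuf(f, nullptr, _IOFBF, bufSize) != 0) {
        std::fclose(f);
        return -1;
    }
    long long sum = 0;
    unsigned char b;
    while (std::fread(&b, 1, 1, f) == 1)
        sum += b;
    const bool failed = std::ferror(f) != 0;
    std::fclose(f);
    return failed ? -1 : sum;
}
```

Doing the same loop with one unbuffered OS read per byte would cost a kernel call each time, which is the case where a direct ReadFile would be expected to lose.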

>I would like to get away from the C runtime. Also, does Windows provide
>equivalents to malloc and realloc? And if so, will they provide better
>performance?

The CRT directly uses OS routines on Win2K and later. MS believes using the
OS-level "process heap" is at least as good as using the CRT's small-block
heap for most purposes, but this article documents a scenario in which it
is worse:

You may experience a C-Runtime heap performance problem in a Visual C++
application that is running on Windows 2000 or on Windows XP
http://support.microsoft.com/kb/323635

See also the function that allows you to use the CRT small-block heap:

_set_sbh_threshold
http://msdn.microsoft.com/en-us/library/a6x53890.aspx

--
Doug Harrison
Visual C++ MVP

Tom Serface

Jan 8, 2009, 12:26:46 PM
That makes a lot of sense. The places where I use ReadFile I'm typically
reading 16K from the file myself and buffering in my program.

Tom

"Doug Harrison [MVP]" <d...@mvps.org> wrote in message
news:kvacm4lfcp7eejt0c...@4ax.com...



James Williams

Jan 8, 2009, 9:10:45 PM
What I do with the file is read a minimum of 32 KB at a time. Each read is
aligned to the disk sector size. I have X threads operating on the same
file with the same file handle. Each thread is given a section of the file
to work with, e.g. thread 1 gets the first 10 MB and thread 2 gets the next
10 MB plus any slop due to uneven division of the file size by the number
of buffers.

The sections are then processed in parallel. I have been using plain fread,
with a critical section around the read and the fsetpos that repositions
the file for the given thread. This seems to be working well. However, I
would like to get away from the CRT as much as possible.

Does this seem like a reasonable way of doing this?
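The partitioning described above could be sketched roughly as follows. The Section struct, the split_file name, and the alignment parameter are hypothetical names for this example; the actual sector size would have to be queried from the system rather than assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Section {
    std::uint64_t offset;  // first byte of the section
    std::uint64_t length;  // number of bytes in the section
};

// Splits [0, fileSize) into `parts` sections whose start offsets fall on
// multiples of `alignment`. The last section absorbs the leftover bytes.
std::vector<Section> split_file(std::uint64_t fileSize, unsigned parts,
                                std::uint64_t alignment)
{
    assert(parts > 0 && alignment > 0);
    // Base section size, rounded down to a multiple of the alignment.
    const std::uint64_t base = (fileSize / parts) / alignment * alignment;
    std::vector<Section> out;
    std::uint64_t offset = 0;
    for (unsigned i = 0; i < parts; ++i) {
        const bool last = (i + 1 == parts);
        const std::uint64_t len = last ? fileSize - offset : base;
        out.push_back({offset, len});
        offset += len;
    }
    return out;
}
```

For a 1000-byte file, 3 parts, and a 64-byte alignment this yields sections of 320, 320, and 360 bytes. For files smaller than parts times alignment the early sections come out empty and the last one gets everything, which a real program would probably want to special-case.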

James


"Doug Harrison [MVP]" <d...@mvps.org> wrote in message
news:kvacm4lfcp7eejt0c...@4ax.com...

Scott McPhillips [MVP]

Jan 8, 2009, 10:55:32 PM
"James Williams" <Jim_L_W...@hotmail.com> wrote in message
news:uWpfW%23fcJ...@TK2MSFTNGP06.phx.gbl...

> What I do with the file is read a minimum of 32 KB at a time. Each read is
> aligned to the disk sector size. I have X threads operating on the same
> file with the same file handle. Each thread is given a section of the
> file to work with, e.g. thread 1 gets the first 10 MB and thread 2 gets
> the next 10 MB plus any slop due to uneven division of the file size by
> the number of buffers.
>
> The sections are then processed in parallel. I have been using plain
> fread, with a critical section around the read and the fsetpos that
> repositions the file for the given thread. This seems to be working well.
> However, I would like to get away from the CRT as much as possible.
>
> Does this seem like a reasonable way of doing this?
>
> James

It seems unlikely that the disk read would go faster by doing it from
multiple threads, since you're adding overhead with the seek, and of course
the disk can't read from multiple places concurrently. But testing is the
only way to be sure.

You can improve read performance by using the FILE_FLAG_NO_BUFFERING option
in CreateFile, although it imposes very specific requirements on how you can
allocate the buffers and read. And further performance increases can be had
with FILE_FLAG_OVERLAPPED, which will let your code start processing the
first buffer while the second buffer is being read.

--
Scott McPhillips [VC++ MVP]

James Williams

Jan 9, 2009, 12:41:31 AM
Testing is what I did. The disk is not being read concurrently. I have a
global critical section which prevents such an event. The critical section
protects the file handle so that only one thread may read at any given time.
The performance is gained because the algorithm that processes the data in
the buffer takes longer than the read operation that fills the buffer,
hence I can work with multiple threads.
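A portable sketch of that locking pattern, with the seek and the read done under one lock and the processing left outside it. This substitutes std::mutex for the Win32 critical section and fseek for fsetpos; SharedFile and read_section are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <mutex>

// A FILE* shared by several worker threads, plus the lock that guards it.
struct SharedFile {
    std::FILE* file = nullptr;
    std::mutex lock;
};

// Reads up to `size` bytes at `offset` into `buffer`. The seek and the
// read happen under one lock so another thread cannot move the file
// position in between. Returns the bytes read (0 if the seek fails).
std::size_t read_section(SharedFile& sf, long offset, void* buffer,
                         std::size_t size)
{
    assert(sf.file != nullptr && buffer != nullptr);
    std::lock_guard<std::mutex> guard(sf.lock);
    if (std::fseek(sf.file, offset, SEEK_SET) != 0)
        return 0;
    return std::fread(buffer, 1, size, sf.file);
}
```

Each worker would call read_section to fill its own buffer and then run the processing algorithm without holding the lock, so only the I/O is serialized. Since fseek takes a long, this sketch is limited to offsets that fit in a long; larger files need a 64-bit seek.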

In testing, going from a single thread/buffer to multiple threads/buffers
gave me over a 400% increase in performance. If I open a file that is on a
network share, I only get about 200%, because the read operation is limited
by the 1 Gbit LAN.

Does the API provide file pointer repositioning for ReadFile, the way fread
has fgetpos and fsetpos? What is the API equivalent if I choose to use
ReadFile?

Thanks,


James


"Scott McPhillips [MVP]" <org-dot-mvps-at-scottmcp> wrote in message
news:e69Qg4gc...@TK2MSFTNGP02.phx.gbl...

Scott McPhillips [MVP]

Jan 9, 2009, 12:47:50 AM
"James Williams" <Jim_L_W...@hotmail.com> wrote in message
news:egVvH0hc...@TK2MSFTNGP04.phx.gbl...

> Does the API provide file pointer repositioning for ReadFile, the way fread
> has fgetpos and fsetpos? What is the API equivalent if I choose to use
> ReadFile?
>
> Thanks,


Of course it does. The runtime lib wraps the APIs to do all its file
handling.

SetFilePointer(...)

Norbert Unterberg

Jan 9, 2009, 5:02:10 AM

James Williams wrote:


> Testing is what I did. The disk is not being read concurrently. I have a
> global critical section which prevents such an event. The critical section
> protects the file handle so that only one thread may read at any given time.
> The performance is gained because the algorithm that processes the data in
> the buffer takes longer than the read operation that fills the buffer,
> hence I can work with multiple threads.

By using overlapped I/O with ReadFile, you get true concurrency without the
need to lock the file handle. The Offset and OffsetHigh members of the
OVERLAPPED structure specify the file position where the read starts, and
hEvent tells your worker thread when the read has finished and processing
can start. When you issue multiple concurrent ReadFile() calls from
different threads, you give the OS the chance to order the requests for
maximum I/O performance.

Norbert

Alexander Grigoriev

Jan 9, 2009, 9:18:07 AM
You can also do random reads with a non-overlapped handle by supplying an
OVERLAPPED structure and putting the file offset into Offset and OffsetHigh.

"Norbert Unterberg" <nunte...@newsgroups.nospam> wrote in message
news:eHJIZFkc...@TK2MSFTNGP06.phx.gbl...
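A hedged sketch of the positional read described in the last post: a synchronous handle plus an OVERLAPPED structure that carries the offset. The function name read_at is hypothetical, and opening the file on every call is only for brevity; a real program would keep the handle open. The non-Windows branch is a stand-in using POSIX pread so the sketch also builds elsewhere, and is not part of the Win32 technique. No event or asynchronous completion is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

// Reads up to `size` bytes starting at `offset` without relying on a
// shared file position. Returns the byte count, or -1 on error.
long long read_at(const char* path, std::uint64_t offset, void* buffer,
                  std::size_t size)
{
    assert(buffer != nullptr);
#ifdef _WIN32
    // Synchronous (non-overlapped) handle: no FILE_FLAG_OVERLAPPED.
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) return -1;
    OVERLAPPED ov = {};
    ov.Offset = static_cast<DWORD>(offset & 0xFFFFFFFFu);  // low 32 bits
    ov.OffsetHigh = static_cast<DWORD>(offset >> 32);       // high 32 bits
    DWORD got = 0;
    BOOL ok = ReadFile(h, buffer, static_cast<DWORD>(size), &got, &ov);
    // A read at or past end of file may be reported as ERROR_HANDLE_EOF.
    if (!ok && GetLastError() == ERROR_HANDLE_EOF) { ok = TRUE; got = 0; }
    CloseHandle(h);
    return ok ? static_cast<long long>(got) : -1;
#else
    // POSIX stand-in: pread takes the offset directly.
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    ssize_t got = pread(fd, buffer, size, static_cast<off_t>(offset));
    close(fd);
    return got < 0 ? -1 : static_cast<long long>(got);
#endif
}
```

Because the offset travels with each request, several threads can read different parts of the file through the same handle without a lock around a seek-then-read pair.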
