
MapViewOfFile v ReadFile experiments and results


ne...@mail.adsl4less.com
Jan 18, 2006, 7:33:22 AM
I think that this is the right NG to post to. :)

I've been experimenting with MapViewOfFile v ReadFile for performance.
It's on a PC (XP SP2) with 1 GB of RAM and a 1.5 GB pagefile. The file I'm
reading is 347 MB and I'm simply reading the file sequentially and
doing some simple number crunching (checksumming, if you're interested)
on the data.

For the MapViewOfFile test, I map the whole file (based on the
assumption that it'll be faster than mapping and unmapping in smaller
chunks). When I run the test on a freshly booted PC, it reads the whole
file in 23.3s. If I repeatedly run the test (to cache the file in
"memory") it takes 10.7s.

For the ReadFile test, I read the file in 16KB chunks. When I run the
test on a freshly booted PC, it reads the whole file in 12.3s. If I
repeatedly run the test (to cache the file in the file cache) it takes
10.4s.
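In outline, the two tests look roughly like this (a sketch only: error handling is omitted, and Checksum() is a stand-in for the real number crunching):

```c
#include <windows.h>

/* Stand-in for the real per-buffer number crunching. */
static unsigned Checksum(const unsigned char *p, DWORD n, unsigned sum)
{
    while (n--) sum += *p++;
    return sum;
}

/* Test 1: map the whole file (size 0 == entire file) and walk it. */
unsigned ChecksumMapped(const char *path)
{
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    HANDLE map  = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    const unsigned char *view = MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0);
    DWORD size = GetFileSize(file, NULL);   /* file is well under 4 GB */

    unsigned sum = Checksum(view, size, 0);

    UnmapViewOfFile(view);
    CloseHandle(map);
    CloseHandle(file);
    return sum;
}

/* Test 2: ReadFile in 16 KB chunks. */
unsigned ChecksumRead(const char *path)
{
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    unsigned char buf[16 * 1024];
    DWORD got;
    unsigned sum = 0;

    while (ReadFile(file, buf, sizeof buf, &got, NULL) && got != 0)
        sum = Checksum(buf, got, sum);

    CloseHandle(file);
    return sum;
}
```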

Having read that mapping files is generally faster than ReadFile,
should I be surprised that in my test, it's the other way round in both
a freshly booted PC and when run repeatedly? Any ideas how I might
speed things up even more? (Reading bigger chunks in ReadFile makes
negligible difference and if anything seems to slow things slightly.)

Cheers
Mark

ne...@mail.adsl4less.com
Jan 18, 2006, 9:01:16 AM
FYI, I did try also mapping the file in 16KB chunks rather than the
whole file at once. On a freshly booted PC, it reads the whole
file in 36.9s. If I repeatedly run the test (to cache the file in
"memory") it takes 10.5s.

Seems that once the file is cached (in whatever cache) the results are
much the same. The big difference is running the tests on a freshly
booted PC: Using ReadFile in 16KB chunks is much faster than
MapViewOfFile (whole file mapped), which is much faster than
MapViewOfFile (using 16KB chunks).

Interesting.

Tom Widmer [VC++ MVP]
Jan 18, 2006, 9:48:59 AM
ne...@mail.adsl4less.com wrote:
> I think that this is the right NG to post to. :)
>
> I've been experimenting with MapViewOfFile v ReadFile for performance.
> It's on a PC (XP SP2) with 1Gb RAM and a 1.5Gb pagefile. The file I'm
> reading is 347 MB and I'm simply reading the file sequentially and
> doing some simple number crunching (checksumming, if you're interested)
> on the data.
>
> For the MapViewOfFile test, I map the whole file (based on the
> assumption that it'll be faster than mapping and unmapping in smaller
> chunks). When I run the test on a freshly booted PC, it reads the whole
> file in 23.3s. If I repeatedly run the test (to cache the file in
> "memory") it takes 10.7s.
>
> For the ReadFile test, I read the file in 16KB chunks. When I run the
> test on a freshly booted PC, it reads the whole file in 12.3s. If I
> repeatedly run the test (to cache the file in the file cache) it takes
> 10.4s.
>
> Having read that mapping files is generally faster than ReadFile,
> should I be surprised that in my test, it's the other way round in both
> a freshly booted PC and when run repeatedly?

Memory mapped files are often faster for random access, but never for
sequential access AFAIK.

> Any ideas how I might
> speed things up even more? (Reading bigger chunks in ReadFile makes
> negligible difference and if anything seems to slow things slightly.)

You might try disabling buffering. Read up on FILE_FLAG_NO_BUFFERING -
it places various constraints on how you call ReadFile. I don't know
whether it will actually speed things up though.
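Those constraints are roughly: the buffer address, the file offset and the read length must all be multiples of the volume sector size. A minimal sketch of what that looks like (error handling omitted; VirtualAlloc conveniently returns page-aligned memory, which satisfies the address requirement):

```c
#include <windows.h>

/* Read a file with the system cache bypassed. The chunk size must be a
   multiple of the sector size; 256 KB is safely so on typical volumes. */
DWORD ReadUnbuffered(const char *path, void (*consume)(const void *, DWORD))
{
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    const DWORD chunk = 256 * 1024;
    void *buf = VirtualAlloc(NULL, chunk, MEM_COMMIT, PAGE_READWRITE);
    DWORD got, total = 0;

    while (ReadFile(file, buf, chunk, &got, NULL) && got != 0) {
        consume(buf, got);          /* process this chunk */
        total += got;
    }
    /* The final ReadFile still returns the tail of the file in 'got'
       even though the file size need not be a sector multiple. */
    VirtualFree(buf, 0, MEM_RELEASE);
    CloseHandle(file);
    return total;
}
```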

Tom

Lars Diehl
Jan 18, 2006, 12:41:35 PM
Mark,

>
> Any ideas how I might
> speed things up even more? (Reading bigger chunks in ReadFile makes
> negligible difference and if anything seems to slow things slightly.)
>

IMHO the fastest way to read a whole file is a buffered sequential read.
Memory mapping has advantages for random access only.

You might consider using multiple threads, so that you can immediately read
the next chunk while processing the current one.
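One way to get that overlap without managing your own threads is Win32 overlapped (asynchronous) I/O with two buffers: kick off the read for chunk N+1, crunch chunk N while the disk works, then wait. A sketch under those assumptions (error handling omitted; crunch() stands in for the processing):

```c
#include <windows.h>

/* Double-buffered overlapped read: while one chunk is being processed,
   the next is already in flight. */
void ProcessOverlapped(const char *path, void (*crunch)(const void *, DWORD))
{
    HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    enum { CHUNK = 256 * 1024 };
    static unsigned char buf[2][CHUNK];
    OVERLAPPED ov[2] = {0};
    LARGE_INTEGER pos = {0};
    int cur = 0;

    ov[0].hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);
    ov[1].hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

    /* Prime the pipeline with the first read. */
    ov[cur].Offset = pos.LowPart; ov[cur].OffsetHigh = pos.HighPart;
    ReadFile(file, buf[cur], CHUNK, NULL, &ov[cur]);

    for (;;) {
        DWORD got;
        /* Wait for the read in flight; fails with ERROR_HANDLE_EOF at end. */
        if (!GetOverlappedResult(file, &ov[cur], &got, TRUE) || got == 0)
            break;
        pos.QuadPart += got;

        /* Issue the next read before crunching the current chunk. */
        int nxt = 1 - cur;
        ov[nxt].Offset = pos.LowPart; ov[nxt].OffsetHigh = pos.HighPart;
        ReadFile(file, buf[nxt], CHUNK, NULL, &ov[nxt]);

        crunch(buf[cur], got);
        cur = nxt;
    }
    CloseHandle(ov[0].hEvent);
    CloseHandle(ov[1].hEvent);
    CloseHandle(file);
}
```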

--
Lars Diehl


ne...@mail.adsl4less.com
Jan 18, 2006, 1:01:08 PM

Tom Widmer [VC++ MVP] very kindly replied:

> ne...@mail.adsl4less.com wrote:
>
> Memory mapped files are often faster for random access, but never for
> sequential access AFAIK.

I didn't know that - but I do now. :)

>
> > Any ideas how I might
> > speed things up even more? (Reading bigger chunks in ReadFile makes
> > negligible difference and if anything seems to slow things slightly.)
>
> You might try disabling buffering. Read up on FILE_FLAG_NO_BUFFERING -
> it places various constraints on how you call ReadFile. I don't know
> whether it will actually speed things up though.

Yes, I played with this option many moons ago, but only with ReadFile -
it killed performance! However, I'm now wondering what it might do to
MapViewOfFile. I'm guessing that in theory it should make no
difference. Hmm - I might give that a go. If I do, I'll re-post the
results.

I suspect, despite all this, that I'm at the limits of the speed of the
disk subsystem. Not that that has ever stopped my trying. :)

Cheers
Mark

Tony Proctor
Jan 18, 2006, 1:11:15 PM
The size of the page file isn't really of interest Mark. The system file
cache is also memory-mapping the file, and this explains why your times are
so close. I suspect the slight difference is due to the granularity of the
mappings. Your explicit memory mapping is constraining itself to 16k chunks,
whereas the file cache may be mapping the whole file (it depends on the
availability of resources on your system).

Tony Proctor

<ne...@mail.adsl4less.com> wrote in message
news:1137587602....@g44g2000cwa.googlegroups.com...

Lawrence Rust
Jan 18, 2006, 1:34:23 PM
<ne...@mail.adsl4less.com> wrote
[snip]

> > You might try disabling buffering. Read up on FILE_FLAG_NO_BUFFERING -
> > it places various constraints on how you call ReadFile. I don't know
> > whether it will actually speed things up though.
>
> Yes, I played with this option many moons ago, but only with ReadFile -
> it killed performance! However, I'm now wondering what it might do to
> MapViewOfFile. I'm guessing that in theory it should make no
> difference. Hmm - I might give that a go. If I do, I'll re-post the
> results.

Setting FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED when opening the
file to be mapped gives maximum performance for MapViewOfFile.


> I suspect, despite all this, that I'm at the limits of the speed of the
> disk subsystem. Not that that has ever stopped my trying. :)

Have you tried benchmarking the disk I/O using perfmon? 347 MB in 23.3s
(16.9 MB/s) is well short of current disk performance. Back in 1999 I was
seeing a steady 8 MB/s writing video to a Seagate Medallist disk on NT4.

-- Lawrence Rust, Software Systems, www.softsystem.co.uk


ne...@mail.adsl4less.com
Jan 18, 2006, 6:11:55 PM

Tony Proctor wrote:

> The size of the page file isn't really of interest Mark. The system file
> cache is also memory-mapping the file,

Does the system file cache do this the very first time the file is read
as well (that is, when it isn't yet cached)?

> and this explains why your times are
> so close. I suspect the slight difference is due to the granularity of the
> mappings.

The slight difference is only apparent once the read operation has been
done at least once. If I understand correctly, what you're saying is
that once the whole file has been read at least once, all subsequent
reads will ultimately use file mapping anyway? (On a freshly-booted PC,
the difference in timings is big.)

> Your explicit memory mapping is constraining itself to 16k chunks,

Even though I tell it to map the whole file? Everything I've read tells
me that if I use 0 as the size of the mapping, the whole file will be
mapped.

> whereas the file cache may be mapping the whole file (it depends on the
> availability of resources on your system).
>
> Tony Proctor
>

[snip]

ne...@mail.adsl4less.com
Jan 18, 2006, 6:15:09 PM

Lawrence Rust wrote:

>
> Have you tried benchmarking the disk I/O using perfmon? 347 MB in 23.3s
> (16.9 MB/s) is well short of current disk performance. Back in 1999 I was
> seeing a steady 8 MB/s writing video to a Seagate Medallist disk on NT4.
>

Quite so, but this isn't a standardised file I/O benchmark. :) It's a
comparative analysis of different file read methods on my system only.
The reason for the delay is the number crunching that the app does, which
is independent of the file read method, so don't take the numbers per
se too literally. (The app actually does two different checksums on the
read data.)

ne...@mail.adsl4less.com
Jan 18, 2006, 7:01:18 PM

Lars Diehl wrote:

[snip]


> >
>
> IMHO the fastest way to read a whole file is a buffered sequential read.
> Memory mapping has advantages for random access only.

This certainly seems to be the consensus. I can happily say that I've
learnt something in this NG thread. :)

>
> You might consider using multiple threads, so that you can immediately read
> the next chunk while processing the current one.

Nice idea. I might do this if it turns out that I'm wasting a
significant amount of time during the number crunching which could be
spent reading data at the same time. Cheers.

>
> --
> Lars Diehl

Slava M. Usov
Jan 18, 2006, 7:37:07 PM
<ne...@mail.adsl4less.com> wrote in message
news:1137626395.9...@o13g2000cwo.googlegroups.com...

>
> Lars Diehl wrote:
>
> [snip]
>> >
>>
>> IMHO the fastest way to read a whole file is a buffered sequential read.
>> Memory mapping has advantages for random access only.
>
> This certainly seems to be the consensus.

Not quite. Non-buffered reads are still faster when they are large enough.
Looking through your messages, I believe you have never tried reads bigger
than 16K, buffered or non-buffered -- in which case buffered IO wins because
the OS internally maintains a 128K read-ahead buffer. Try 1-2M non-buffered.

[...]

>> You might consider using multiple threads, so that you can immediately
>> read the next chunk while processing the current one.
>
> Nice idea. I might do this if it turns out that I'm wasting a
> significant amount of time during the number crunching which could be
> spend reading data at the same time. Cheers.

I doubt you will gain much from this with buffered reads [due to their
read-ahead property]. Non-buffered reads might fare better.

S


Tony Proctor
Jan 19, 2006, 5:53:02 AM
I don't know the specific algorithm used by the file cache under Windows,
Mark. However, there would be a similar delay when ReadFile first accesses
the file. Thereafter, these algorithms usually depend on available
resources, system loading, the number of concurrent accessors to the file,
the frequency of accesses to the same file, etc.

You may also find that the system caching is more tuned to the de-blocking
necessary for record access via ReadFile.

Tony Proctor

<ne...@mail.adsl4less.com> wrote in message
news:1137625915....@f14g2000cwb.googlegroups.com...

Tom Widmer [VC++ MVP]
Jan 19, 2006, 6:51:19 AM
ne...@mail.adsl4less.com wrote:
> Tom Widmer [VC++ MVP] very kindly replied:
>
>
>>ne...@mail.adsl4less.com wrote:
>>
>>Memory mapped files are often faster for random access, but never for
>>sequential access AFAIK.
>
>
> I didn't know that - but I do now. :)

Well, it's a complex issue that I don't understand too well. Windows has
quite complex caching and paging, and it all depends on the exact flags
passed to the Create* functions. I might be wrong in some situations.

>> Any ideas how I might
>>
>>>speed things up even more? (Reading bigger chunks in ReadFile makes
>>>negligible difference and if anything seems to slow things slightly.)
>>
>>You might try disabling buffering. Read up on FILE_FLAG_NO_BUFFERING -
>>it places various constraints on how you call ReadFile. I don't know
>>whether it will actually speed things up though.
>
>
> Yes, I played with this option many moons ago, but only with ReadFile -
> it killed performance! However, I'm now wondering what it might do to
> MapViewOfFile. I'm guessing that in theory it should make no
> difference. Hmm - I might give that a go. If I do, I'll re-post the
> results.

FILE_FLAG_NO_BUFFERING has problems with certain request sizes I think.
Try exactly 256 * 1024 bytes.

> I suspect, despite all this, that I'm at the limits of the speed of the
> disk subsystem. Not that that has ever stopped my trying. :)

You might also try:
FILE_FLAG_SEQUENTIAL_SCAN
if you are reading sequentially.
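That flag goes on the CreateFile call when the file is opened, along these lines (a sketch):

```c
#include <windows.h>

/* Open a file with a hint to the cache manager that it will be read
   front to back, which increases read-ahead and lets already-read
   cache pages be recycled sooner. */
HANDLE OpenForSequentialRead(const char *path)
{
    return CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
}
```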

Tom

Tom Widmer [VC++ MVP]
Jan 19, 2006, 7:57:39 AM
ne...@mail.adsl4less.com wrote:
> Tom Widmer [VC++ MVP] very kindly replied:
>
>
>>ne...@mail.adsl4less.com wrote:
>>
>>Memory mapped files are often faster for random access, but never for
>>sequential access AFAIK.
>
>
> I didn't know that - but I do now. :)

Actually, it doesn't appear to necessarily be true anyway (sorry):

http://www.microsoft.com/technet/prodtechnol/windows2000serv/maintain/optimize/wperfch7.mspx


ReadFile on a buffered file has to copy the data from memory handled by
the cache manager, whereas a mapped file has the file read directly into
the virtual address space of your process.

Tom

Tony Proctor
Jan 19, 2006, 9:20:04 AM
The copy operation for a cached file is only from the system area of your
process (which is shared across all processes) to the process-specific area,
Tom. In other words, the virtual storage used by the file cache is already
part of the process.

Tony Proctor

"Tom Widmer [VC++ MVP]" <tom_u...@hotmail.com> wrote in message
news:#elgufPH...@TK2MSFTNGP09.phx.gbl...

Tony Proctor
Jan 19, 2006, 9:25:45 AM
...or maybe I'm thinking of the 'Fast Copy' mechanism.

Tony Proctor

"Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
news:...

Tom Widmer [VC++ MVP]
Jan 19, 2006, 11:35:53 AM
Tony Proctor wrote:
> The copy operation for a cached file is only from the system area of your
> process (which is shared across all processes) to the process-specific area
> Tom. In other words, the virtual storage used by the file cache is already
> part of the process

Got ya. Presumably that's the 1-2GB of virtual address space reserved
for the system for each process? It maps to the same "physical memory"
for each process?

Tom

Tony Proctor
Jan 19, 2006, 12:20:01 PM
I think so Tom (sorry, my previous reply went astray and ended up at the end
of the thread). I may be thinking of the 'Fast Copy' mechanism. Maybe
someone else can confirm whether the file cache makes all/some of its
virtual memory available to processes via the 'system' area.

Tony Proctor

"Tom Widmer [VC++ MVP]" <tom_u...@hotmail.com> wrote in message

news:uuYkqZRH...@TK2MSFTNGP10.phx.gbl...

ne...@mail.adsl4less.com
Jan 19, 2006, 5:59:21 PM

Slava M. Usov wrote:

> <ne...@mail.adsl4less.com> wrote in message


>
> Not quite. Non-buffered reads are still faster when they are large enough.
> Looking through your messages, I believe you have never tried reads bigger
> than 16K, buffered or non-buffered -- in which case buffered IO wins because
> the OS internally maintains a 128K read-ahead buffer. Try 1-2M non-buffered.
>

I did try up to 64K buffered, but the differences between 16K and 64K
were negligible. From what you're saying this is to be expected. I'll
give 1MB+ a go tomorrow.

ne...@mail.adsl4less.com
Jan 19, 2006, 6:05:48 PM

Tony Proctor wrote:

> I don't know the specific algorithm used by the file cache under Windows
> Mark. However, there would be a similar delay when ReadFile first accesses
> the file. Thereafter, these algorithms usually depend on available
> resources, system loading, the number of concurrent accessors to the file,
> the frequency of accesses to the same file, etc
>
> You may also find that the system caching is more tuned to the de-blocking
> necessary for record access via ReadFile
>

Yes, this makes sense. Interestingly, I still haven't found file
mapping to be any faster than ReadFile with sequential reads (and to
clarify an earlier post, when I say sequential reads, I don't only mean
reading chunk by chunk, I also mean using the FILE_FLAG_SEQUENTIAL_SCAN
flag, which as I understand it increases the read-ahead caching). Also,
I still suspect that I've hit the limit of the disk subsystem and that
I can probably only improve the speed of non-buffered reads (which is
what I'm mainly interested in) with multiple threads, though in
practice that may be overkill!
