mmap() vs read() performance on Solaris

Rich Teer

Aug 24, 2002, 2:00:31 PM
Hi all,

The accepted wisdom is that mmapping a file is more efficient
than using read. Is this true for all circumstances, e.g.,
low memory situations?

If not, when would using read be preferable to using mmap?
Google wasn't much help, alas...

TIA,

--
Rich Teer

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-online.net

Jack Hammer

Aug 24, 2002, 5:49:17 PM
No, I don't think it's worth it in all cases. For example, if you need to
read only one line from a file, it would be better to use read().

Also, I'm not sure that all files -can- be mapped (devices or pipes, for
example). I don't know for certain, though, so don't take my word for it.
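
The question of which files can be mapped is easy to test. Here is a minimal sketch, assuming a POSIX system; the helper names can_mmap_fd and can_mmap_path are made up for illustration. It asks mmap() for a single byte and reports whether the request was accepted. On the systems I know of, a pipe is refused (typically with ENODEV) while a regular file is accepted.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: 1 if one byte of fd can be mapped read-only, else 0. */
int can_mmap_fd(int fd)
{
    void *p = mmap(NULL, 1, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 0;               /* errno holds the reason */
    munmap(p, 1);
    return 1;
}

/* Same check for a path name; returns -1 if the file cannot be opened. */
int can_mmap_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int ok = can_mmap_fd(fd);
    close(fd);
    return ok;
}
```

A program that must accept input from a pipe as well as from a file could try the mapping first and fall back to read() when it is refused.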

Rich Teer wrote:

--
email is actually int0x80@hotmail =D

Rich Teer

Aug 24, 2002, 6:21:29 PM
On Sat, 24 Aug 2002 Cyphe...@nyc.rr.com wrote:

> In comp.unix.solaris Rich Teer <ri...@rite-group.com> wrote:
> >
> > The accepted wisdom is that mmaping a file is more efficient
> > than using read. Is this true for all circumstances, e.g.,
> > low memory situations?
>

> Remember a decade ago when 4GLs were going to eliminate programmers? ;-)

I remember a program called "The Last One" which was supposedly
a program to automatically write programs. Obviously, it didn't
work too well... :-)

Rich Teer

Aug 25, 2002, 1:22:11 AM
On Sat, 24 Aug 2002 Cyphe...@nyc.rr.com wrote:

> Hey, Rich, didn't you already write your book?

It's not finished yet. 420+ pages, though, so it's
getting there... Yes, I'm a slow writer! :-)

> Is this possible errata gathering? ;-)

Not quite errata, but one of my reviewers raised a question
that I'm not 100% sure of the answer to.

phil-new...@ipal.net

Aug 25, 2002, 1:17:25 AM
In comp.unix.programmer Cyphe...@nyc.rr.com wrote:
| In comp.unix.solaris Rich Teer <ri...@rite-group.com> wrote:
|>
|> The accepted wisdom is that mmaping a file is more efficient
|> than using read. Is this true for all circumstances, e.g.,
|> low memory situations?
|
| Remember a decade ago when 4GLs were going to eliminate programmers? ;-)

Now it's XML!

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN | Dallas | http://linuxhomepage.com/ |
| phil-...@ipal.net | Texas, USA | http://ka9wgn.ham.org/ |
-----------------------------------------------------------------

Jimmo

Aug 25, 2002, 4:24:47 AM

"Rich Teer" <ri...@rite-group.com> wrote:
>
> The accepted wisdom is that mmaping a file is more efficient
> than using read. Is this true for all circumstances, e.g.,
> low memory situations?

The answer lies in the internal implementation of the two mechanisms.

To read from a file, you need to open it, initiate an I/O request for
a given block (or multiple blocks) from a given offset and return the
result to the caller (naturally this is summarised heavily but you
get the idea).

However, for mmap, you still need to do all of these things because
the file still needs to be opened and, sooner or later, the data needs
to be faulted in when the page data is accessed. You can't avoid the
I/O, although you may defer it until the fault.

So you can see that single or infrequent file access may not benefit
from mmap, should that be what the programmer is doing.

Finally, you should also consider any difference in flushing modified
data. For example, a sequence of open/read|write/close versus one
persistent mapping - does the programmer need msync() to flush out
modified pages? You get the idea...
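
To illustrate the flushing question, here is a minimal sketch, assuming a POSIX system; the helper name poke_first_byte is made up. It changes one byte through a shared mapping and then calls msync() with MS_SYNC, which is the mapping-side counterpart of write() followed by fsync().

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first byte of `path` through a shared
 * mapping and flush it. Returns 0 on success, -1 on any failure. */
int poke_first_byte(const char *path, char c)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps the file referenced */
    if (p == MAP_FAILED)
        return -1;

    p[0] = c;                   /* dirties the page; no system call here */

    /* MS_SYNC blocks until the modified pages have been written back. */
    int rc = msync(p, len, MS_SYNC);
    munmap(p, len);
    return rc;
}
```

Without the msync() the change would still reach the file eventually, but the program would have no say in when.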

Cheers
Jimmo


Andrew Gierth

Aug 25, 2002, 6:33:33 AM
>>>>> "Rich" == Rich Teer <ri...@rite-group.com> writes:

Rich> Hi all,
Rich> The accepted wisdom is that mmaping a file is more efficient
Rich> than using read. Is this true for all circumstances,

no.

Rich> e.g., low memory situations?

Rich> If not, when would using read be preferable to using mmap?

The most obvious case is certain types of heavy random I/O where using
mmap() can cause the system to make sub-optimal choices of how much
data to read at one time from the underlying filesystem. For example,
suppose you want to read a large chunk (hundreds of kbytes) from the
middle of a large file which is stored on a large disk array
subsystem; a read() call may well end up requesting the whole thing
in one transaction, whereas mmap() will result in the first few pages
that are accessed being read piecemeal, with the amount of read-ahead
being largely guessed at by the system and dependent on the access
pattern (and the use of madvise() etc.).
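
The two access styles can be put side by side. This is a minimal sketch, assuming a POSIX system with pread() and madvise(); the function names are made up. Both functions sum the same chunk from the middle of a file. The first tells the kernel the full extent up front; the second only faults pages in, with MADV_WILLNEED as the sole hint about how much is wanted.

```c
#define _DEFAULT_SOURCE         /* exposes pread()/madvise() on glibc in strict modes */
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Sum `len` bytes at offset `off`, asking for the whole range explicitly. */
long sum_chunk_pread(int fd, off_t off, size_t len)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    size_t done = 0;
    while (done < len) {        /* pread() may return fewer bytes than asked */
        ssize_t n = pread(fd, buf + done, len - done, off + (off_t)done);
        if (n <= 0) {
            free(buf);
            return -1;
        }
        done += (size_t)n;
    }
    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    free(buf);
    return sum;
}

/* Same sum through a mapping; the kernel sees page faults plus one hint. */
long sum_chunk_mmap(int fd, off_t off, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t start = off - (off % page);   /* mmap offsets must be page-aligned */
    size_t skip = (size_t)(off - start);
    unsigned char *p = mmap(NULL, len + skip, PROT_READ, MAP_SHARED, fd, start);
    if (p == MAP_FAILED)
        return -1;
    madvise(p, len + skip, MADV_WILLNEED);  /* advisory; result ignored */
    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[skip + i];
    munmap(p, len + skip);
    return sum;
}
```

Which one is faster depends on the file system and the storage underneath, so timing both on the target system is the only reliable answer.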

--
Andrew.

comp.unix.programmer FAQ: see <URL: http://www.erlenstar.demon.co.uk/unix/>

Rich Teer

Aug 25, 2002, 1:50:00 PM
On Sun, 25 Aug 2002, Jimmo wrote:

> Finally, you should also consider any difference in flushing modified
> data. For example, a sequence of open/read|write/close versus one
> persistent mapping - does the programmer need msync() to flush out
> modified pages? You get the idea...

Thanks!

Rich Teer

Aug 25, 2002, 1:50:13 PM
On 25 Aug 2002, Andrew Gierth wrote:

> subsystem; a read() call may well end up requesting the whole thing
> in one transaction, whereas mmap() will result in the first few pages
> that are accessed being read piecemeal, with the amount of read-ahead
> being largely guessed at by the system and dependent on the access
> pattern (and the use of madvise() etc.).

Thanks!

Rich Teer

Aug 25, 2002, 1:51:14 PM
On Sun, 25 Aug 2002, Jimmo wrote:

> Cheers
> Jimmo

BTW, are you the Jim Moore that works (or used to work) for Sun?

DINH Viet Hoa

Aug 28, 2002, 6:08:50 AM
> The accepted wisdom is that mmaping a file is more efficient
> than using read. Is this true for all circumstances, e.g.,
> low memory situations?
>
> If not, when would using read be preferable to using mmap?
> Google wasn't much help, alas...

What I understood from reading the excellent book Solaris Internals
by Jim Mauro & Richard McDougall is that even in the case of a read(),
the system does file caching, and in Solaris the file caching
mechanism works the same way as mmap() calls in user mode:
the file is mapped into kernel memory.
So, in the case of read(), an additional mmap and a copy to user
mode are done.

You can check the performance by writing some tests but, currently, I
believe that the read() system call is less efficient in all cases.

--
DINH V. Hoa,
libEtPan! - a mail library - http://libetpan.sourceforge.net

"Quand tu discutes avec un chasseur, ton QI est au moins divisé par 2"

Joshua Jones

Aug 28, 2002, 8:34:28 AM
In comp.unix.programmer DINH Viet Hoa <dinh-coeur.viet-fle...@free-sourire2.fr> wrote:

> You may check the performances by writing some tests but, currently, I
> believe that the read() system call is in all case less efficient.

"In all cases" covers, well, everything. If read() were less efficient
_in all cases_, we likely wouldn't have a read() call, or we would
implement read() as mmap(). There's hardly ever an _always_.

--
Joshua Jones
josh(at)homemail.com | jonesjos(at)us.ibm.com

DINH Viet Hoa

Aug 28, 2002, 9:16:22 AM
> > You may check the performances by writing some tests but, currently, I
> > believe that the read() system call is in all case less efficient.
>
> "In all cases" covers, well, everything. If read() were less efficient
> _in all cases_, we likely wouldn't have a read() call, or we would
> implement read() as mmap(). There's hardly ever an _always_.

mmap() appeared after read()
and read() is of course easier to use.

but if you have any clue about the efficiency ...

Andrew Gierth

Aug 28, 2002, 10:07:57 AM
>>>>> "DINH" == DINH Viet Hoa <dinh-coeur.viet-fle...@free-sourire2.fr> writes:

DINH> You may check the performances by writing some tests but,
DINH> currently, I believe that the read() system call is in all case
DINH> less efficient.

believe what you like, but it isn't true.

The biggest difference between mmap() and read() on systems that have
unified VM systems (as Solaris does) is that in the case of read()
calls, the system has more information available to use for
optimisation. (Specifically, it knows _how much_ data needs to be read
_now_.) Ideally, one hopes (though one is often disappointed), this
will be used to make better decisions about how to dispatch I/O
requests to the underlying physical device.

When the application uses mmap(), all the I/O activity looks, to the
kernel, like a series of page-sized requests, and it has to deduce
for itself, with only some minor hints available for the application
to supply, how much data to read at each step.

(It has to be noted, though, that sometimes bad read-ahead choices by
the kernel can result in read() being substantially _less_ efficient.
This can depend on the underlying filesystem.)

Chris Thompson

Aug 28, 2002, 10:25:34 AM
In article <8765xv1...@erlenstar.demon.co.uk>,
Andrew Gierth <and...@erlenstar.demon.co.uk> wrote:
[...]

>
>The biggest difference between mmap() and read() on systems that have
>unified VM systems (as Solaris does) is that in the case of read()
>calls, the system has more information available to use for
>optimisation. (Specifically, it knows _how much_ data needs to be read
>_now_.) Ideally, one hopes (though one is often disappointed), this
>will be used to make better decisions about how to dispatch I/O
>requests to the underlying physical device.
>
>When the application uses mmap(), all the I/O activity looks, to the
>kernel, like a series of page-sized requests, and it has to deduce
>for itself, with only some minor hints available for the application
>to supply, how much data to read at each step.
>
>(It has to be noted, though, that sometimes bad read-ahead choices by
>the kernel can result in read() being substantially _less_ efficient.
>This can depend on the underlying filesystem.)

Theoretically the system can recover a *little* of the information it
would have in the read() case if the program uses madvise() correctly.
Certainly cat(1), for example, uses madvise(MADV_SEQUENTIAL) when it
mmaps files.
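
For readers who have not used madvise(), here is a minimal sketch of a sequential scan over a mapped file, assuming a system that provides madvise() and MADV_SEQUENTIAL; the helper name count_newlines is made up. The hint is purely advisory, so the code ignores its return value.

```c
#define _DEFAULT_SOURCE         /* exposes madvise() on glibc in strict modes */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' bytes in a file through a mapping.
 * Returns -1 on error. */
long count_newlines(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* a zero-length mapping is an error */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Hint only: the kernel may read ahead more aggressively and free
     * pages behind the scan sooner. */
    madvise((void *)p, len, MADV_SEQUENTIAL);

    long count = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            count++;

    munmap((void *)p, len);
    return count;
}
```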

Chris Thompson
Email: cet1 [at] cam.ac.uk

Barry Margolin

Aug 28, 2002, 5:23:01 PM
In article <akig0k$q6m$2...@solaria.cc.gatech.edu>,

Joshua Jones <jaj...@cc.gatech.edu> wrote:
>In comp.unix.programmer DINH Viet Hoa
><dinh-coeur.viet-fle...@free-sourire2.fr> wrote:
>
>> You may check the performances by writing some tests but, currently, I
>> believe that the read() system call is in all case less efficient.
>
>"In all cases" covers, well, everything. If read() were less efficient
>_in all cases_, we likely wouldn't have a read() call, or we would
>implement read() as mmap(). There's hardly ever an _always_.

That's how we did it on Multics, the system that inspired Unix 30 years
ago.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Sony Antony

Aug 28, 2002, 11:30:21 PM
> Theoretically the system can recover a *little* of the information it
> would have in the read() case if the program uses madvise() correctly.
> Certainly cat(1), for example, uses madvise(MADV_SEQUENTIAL) when it
> mmap's files.
>

Running strace on Linux and truss on Solaris shows cat implemented as
open()/read()/write()/read()/write()/../close(). So the question of
madvise() doesn't arise at all.

--sony

Rich Teer

Aug 29, 2002, 12:14:31 AM
On 28 Aug 2002, Sony Antony wrote:

> using strace/truss in linux/solaris, cat is implemented as
> open()/read()/write()/read()/write()/../close(). So the question of
> madvise() doesn t arise at all.

Solaris 8 cat definitely uses mmap & madvise - I've just checked the
source code.

Casper H.S. Dik

Aug 29, 2002, 4:35:50 AM
>> > You may check the performances by writing some tests but, currently, I
>> > believe that the read() system call is in all case less efficient.
>>
>> "In all cases" covers, well, everything. If read() were less efficient
>> _in all cases_, we likely wouldn't have a read() call, or we would
>> implement read() as mmap(). There's hardly ever an _always_.

>mmap() appeared after read()
>and read() is of course easier to use.

and mmap() doesn't work for all types of devices

mmap() does try to keep track of sequential reads.

Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.

Joerg Schilling

Aug 29, 2002, 5:58:40 AM
In article <3eb007f1.02082...@posting.google.com>,

So you checked something else, but definitely not cat!

cat has used mmap() since 1987.


--
EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni) If you don't have iso-8859-1
schi...@fokus.gmd.de (work) chars I am J"org Schilling
URL: http://www.fokus.gmd.de/usr/schilling ftp://ftp.fokus.gmd.de/pub/unix

Chris Thompson

Aug 29, 2002, 11:45:18 AM
In article <Pine.GSO.4.44.02082...@grover.rite-group.com>,

Rich Teer <ri...@rite-group.com> wrote:
>On 28 Aug 2002, Sony Antony wrote:
>
>> using strace/truss in linux/solaris, cat is implemented as
>> open()/read()/write()/read()/write()/../close(). So the question of
>> madvise() doesn t arise at all.
>
>Solaris 8 cat definately uses mmap & madvise - I've just checked the
>source code.

Source not necessary! Just:

$ truss cat /etc/motd >/dev/null
execve("/usr/bin/cat", 0xFFBEFC34, 0xFFBEFC40) argc = 2
[... lots of shared library stuff snipped ...]
brk(0x00022A18) = 0
brk(0x00024A18) = 0
fstat64(1, 0xFFBEFB38) = 0
open64("/etc/motd", O_RDONLY) = 3
fstat64(3, 0xFFBEFAA0) = 0
llseek(3, 0, SEEK_CUR) = 0
mmap64(0x00000000, 310, PROT_READ, MAP_SHARED, 3, 0) = 0xFF380000
read(3, " S", 1) = 1
memcntl(0xFF380000, 310, MC_ADVISE, MADV_SEQUENTIAL, 0, 0) = 0
write(1, " S o l a r i s 8 [ d".., 310) = 310
llseek(3, 310, SEEK_SET) = 310
munmap(0xFF380000, 310) = 0
llseek(3, 0, SEEK_CUR) = 310
close(3) = 0
close(1) = 0

I'm not entirely sure why it's doing a 1-byte read() from the source
file, though!

Sony Antony

Aug 29, 2002, 3:06:21 PM
j...@cs.tu-berlin.de (Joerg Schilling) wrote in message news:<akkr8g$it1$1...@news.cs.tu-berlin.de>...

> In article <3eb007f1.02082...@posting.google.com>,
> Sony Antony <sonya...@hotmail.com> wrote:
> >> Theoretically the system can recover a *little* of the information it
> >> would have in the read() case if the program uses madvise() correctly.
> >> Certainly cat(1), for example, uses madvise(MADV_SEQUENTIAL) when it
> >> mmap's files.
> >>
> >
> >using strace/truss in linux/solaris, cat is implemented as
> >open()/read()/write()/read()/write()/../close(). So the question of
> >madvise() doesn t arise at all.
>
> So you checked somthing else but definitely not cat!
>
> Cat uses mmap() since 1987.

Actually, I was wrong about Solaris. I only looked at the first line, which
is a read() (I don't know why it does that). And Linux definitely uses read()s.

//solaris 2.8
open64("a.c", O_RDONLY) = 3
fstat64(3, 0xFFBEF480) = 0
llseek(3, 0, SEEK_CUR) = 0
mmap64(0x00000000, 391, PROT_READ, MAP_SHARED, 3, 0) = 0xFF380000
read(3, " #", 1) = 1 //WHAT IS THIS
memcntl(0xFF380000, 391, MC_ADVISE, MADV_SEQUENTIAL, 0, 0) = 0
write(1, " # i n c l u d e < s t r".., 391) = 391
llseek(3, 391, SEEK_SET) = 391
munmap(0xFF380000, 391) = 0
llseek(3, 0, SEEK_CUR) = 391
close(3) = 0
close(1) = 0

--sony

Dragan Cvetkovic

Aug 29, 2002, 3:31:44 PM
sonya...@hotmail.com (Sony Antony) writes:
> Actually I was wrong on solaris. I only checked the first line ( which
> I dont know why it does a read() ). And linux definitely uses read()s.

According to the comment in cat of Solaris 8 src code, it has something to
do with NFSv2 and root access. If you want to find out more, get Solaris
source code :-)

Bye, Dragan

--
Dragan Cvetkovic,

To be or not to be is true. G. Boole No it isn't. L. E. J. Brouwer

Casper H.S. Dik

Aug 29, 2002, 3:46:41 PM
Dragan Cvetkovic <d1r2a3g4a...@soli99ton.com> writes:

>sonya...@hotmail.com (Sony Antony) writes:
>> Actually I was wrong on solaris. I only checked the first line ( which
>> I dont know why it does a read() ). And linux definitely uses read()s.

>According to the comment in cat of Solaris 8 src code, it has something to
>do with NFSv2 and root access. If you want to find out more, get Solaris
>source code :-)


Because cat used to fail with "write error": write() wasn't able to
page in the mapped page because of read permission problems on the server,
but the result showed up as a failure of the write() system call, which
was confusing for some.
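
The sequence in the truss output can be sketched roughly as follows. This is not the Solaris source, only a reconstruction from the system calls shown earlier in the thread, and the function name cat_file and its return codes are made up. The point of the one-byte read() is that an unreadable input is reported as an input error before the mapping is ever handed to write().

```c
#define _DEFAULT_SOURCE         /* exposes madvise() on glibc in strict modes */
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch: copy a file to descriptor `out` via a mapping.
 * Returns 0 on success, -1 for an input problem, -2 for an output problem. */
int cat_file(const char *path, int out)
{
    int in = open(path, O_RDONLY);
    if (in < 0)
        return -1;

    struct stat st;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }
    if (st.st_size == 0) {      /* nothing to copy, nothing to map */
        close(in);
        return 0;
    }
    size_t len = (size_t)st.st_size;

    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, in, 0);
    if (p == MAP_FAILED) {
        close(in);
        return -1;
    }

    /* Probe: a plain read() fails cleanly if the data cannot be fetched,
     * so the failure is attributed to the input, not the output. */
    char probe;
    if (read(in, &probe, 1) != 1) {
        munmap(p, len);
        close(in);
        return -1;
    }

    madvise(p, len, MADV_SEQUENTIAL);   /* advisory; result ignored */

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(out, p + done, len - done);
        if (n < 0) {
            munmap(p, len);
            close(in);
            return -2;
        }
        done += (size_t)n;
    }

    munmap(p, len);
    close(in);
    return 0;
}
```

A failure on a later page of the file could still surface inside write(), so the probe narrows the ambiguity rather than removing it.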

Dragan Cvetkovic

Aug 29, 2002, 3:51:29 PM
Casper H.S. Dik <Caspe...@Sun.COM> writes:

> Dragan Cvetkovic <d1r2a3g4a...@soli99ton.com> writes:
>
> >sonya...@hotmail.com (Sony Antony) writes:
> >> Actually I was wrong on solaris. I only checked the first line ( which
> >> I dont know why it does a read() ). And linux definitely uses read()s.
>
> >According to the comment in cat of Solaris 8 src code, it has something to
> >do with NFSv2 and root access. If you want to find out more, get Solaris
> >source code :-)
>
>
> Because cat used to fail with "write error" (write wasn't able to
> page in the page because of read permission problems on the server;
> but the result showed up as a failure of the "write" system call;
> confusing for some)
>
> Casper

Thanks Casper. I could see that from the source code, but wasn't sure how
much I was allowed to disclose in this newsgroup. However, you are from Sun,
so you know what you are doing (I hope so) :-)

Geoff Clare

Aug 30, 2002, 9:17:02 AM
cet1-...@cam.ac.uk.invalid (Chris Thompson) writes:

>$ truss cat /etc/motd >/dev/null

[snip]


>mmap64(0x00000000, 310, PROT_READ, MAP_SHARED, 3, 0) = 0xFF380000
>read(3, " S", 1) = 1
>memcntl(0xFF380000, 310, MC_ADVISE, MADV_SEQUENTIAL, 0, 0) = 0
>write(1, " S o l a r i s 8 [ d".., 310) = 310

>I'm not entirely sure why it's doing a 1-byte read() from the source
>file, though!

Probably so that the st_atime value for the file will be updated.

--
Geoff Clare <nos...@gclare.org.uk>

DINH Viet Hoa

Aug 30, 2002, 12:24:02 PM

Or to check that the file is readable, so that a failure shows up in a
read() system call instead of as a SIGBUS?

--
DINH V. Hoa,
libEtPan! - a mail library - http://libetpan.sourceforge.net

"les gens faut cash leur dire qu'on s'en branle de leur problème avec word"

dlc=u...@cs.cmu.edu

Sep 1, 2002, 1:07:43 AM
In article <etPan.3d6ca132.131a74ea.13b5@bart>,
>> The accepted wisdom is that mmaping a file is more efficient
>> than using read. Is this true for all circumstances, e.g.,
>> low memory situations?
>>
>> If not, when would using read be preferable to using mmap?
>> Google wasn't much help, alas...
>
>What I understood from my reading of the excellent book Solaris Internals
>by Jim Mauro & Richard Mc Dougall is that in the case of a read(),
>the system is however doing file caching and in Solaris, the file caching
>mechanism work the same way as the mmap() calls in user mode:
>the file is mapped in the kernel memory.
>Then, in the case of the read, an additionnal mmap and a copy to user
>mode will be done.

I will not get into performance issues since others have made valid points.
The following are issues to consider when deciding whether mmap() is the
best choice where portability and software longevity matter, or where
changing the model to use read() may become necessary in the future.
Today, mmap() behaves more uniformly across systems than it used to, but
consider the following issues.

* Only recently (Open Group Issue 6, maybe earlier) has mmap() been
standardized. I would not expect all vendors to have equally valid
implementations. In the past, the API and semantics have been
sufficiently different between vendors that mmap() was not a
good choice where portability was desired.

* Would it be reasonable for data sources other than a file to be
used? You cannot mmap() devices, pipes, FIFOs, etc.

* While it is valid to close the descriptor after mapping the file, whether
this works properly may depend upon the underlying file system, which
may implement this semantic incorrectly.

* If you read() the file and there is an error (I/O error, timeout, etc.),
there is a good chance that an error indication will be reported to
your application and you get a chance to do something. A failed page
fault does not give you much of a chance to do anything; you probably
will get a BUS error, and portably doing something with this is bound
to hurt.

* If you expect different applications to mix and match read() and mmap()
for the same file, don't be surprised if some situation arises where
things don't work as well as you would like. Unified buffer caches
make this less likely, but I have developed a certain paranoia over
the years.

* Some implementations have had the following deficiencies:

- Limits on the number of mapped segments
- Mapped files could not be unlinked
- Mapped files and allocated memory (heap, stack, etc.) can lead to
VM policy choices such that one or the other suffers.

Much of this is not as true today as it used to be, but mmap() is still a
more bleedy interface than good old read().
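
The error-reporting point can be shown with a minimal sketch, assuming a POSIX system where touching a mapped page that lies wholly beyond the end of the file raises SIGBUS; the function name sum_mapped_file is made up. It catches the signal with sigsetjmp()/siglongjmp() and turns it into an error return. The single global jump buffer means this is not thread-safe, which is part of why doing this in real code is painful.

```c
#define _DEFAULT_SOURCE         /* exposes sigsetjmp()/sigaction() in strict modes */
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;      /* one global buffer: not thread-safe */

static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);     /* unwind out of the faulting access */
}

/* Hypothetical helper: map `len` bytes of `path` and sum them. `len` is what
 * the caller believes the file size to be. Returns -1 on error, including
 * a page that could not be faulted in. */
long sum_mapped_file(const char *path, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGBUS, &sa, &old) < 0) {
        munmap(p, len);
        return -1;
    }

    volatile long sum = 0;      /* volatile: must survive the siglongjmp */
    long result;
    if (sigsetjmp(bus_env, 1) == 0) {
        for (size_t i = 0; i < len; i++)
            sum += p[i];
        result = sum;
    } else {
        result = -1;            /* the I/O error, delivered as a signal */
    }

    sigaction(SIGBUS, &old, NULL);  /* restore the previous handler */
    munmap(p, len);
    return result;
}
```

With read() the same condition would be a short count or a -1 return, with no signal handling at all.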

I am sure that I am forgetting some other bit of pain from the past, but
it is late and I can not think of anything else.

--
Daryl Clevenger
dlc=use...@cs.cmu.edu

Jimmo

Sep 1, 2002, 2:21:21 AM

<dlc=use...@cs.cmu.edu> wrote:
>
> * Would it be reasonable for data sources other than a file to be
> used. You can not mmap() devices [...]

Well, that's not entirely true. You can map devices, if they are
mappable :) Xsun maps the framebuffer. In fact, mmapping a file is
ultimately mapping [data on] a disk device. The kernel maps devices
in pretty much the same way - allocate a va range and map it to the pa.

> * While it is valid to close the descriptor after mapping the file, whether
> this works properly may depend upon the underlying file system, which
> may implement this semantic incorrectly.

True, although a vnode reference should still be held due to the mapping
which is why the fd can be closed. (I'm just talking Solaris here).
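
In code, closing the descriptor right after the mapping is made looks like this. A minimal sketch, assuming a POSIX system and a local file system that implements the semantics correctly; the helper name map_file_ro is made up.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and return its address,
 * or NULL on failure (including an empty file). The length is stored in
 * *len_out on success. */
const char *map_file_ro(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);

    /* The descriptor is no longer needed: the mapping holds its own
     * reference to the file until munmap() or process exit. */
    close(fd);

    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller releases the mapping with munmap() on the returned address and length. On the systems I have tried, the data stays readable even if the file is unlinked in the meantime.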

Cheers
Jimmo


Casper H.S. Dik

Sep 1, 2002, 6:23:08 AM
dlc=use...@cs.cmu.edu writes:

>* While it is valid to close the descriptor after mapping the file, whether
> this works properly may depend upon the underlying file system, which
> may implement this semantic incorrectly.

The file is still "open" for the kernel so I don't see how this can be
a problem. Do you have an example of an operating system where this
doesn't work?

>* If you expect different applications to mix and match read() and mmap()
> for the same file, don't be surprised if some situation arises where
> thinks don't work as well as you would like. Unified buffer caches
> make this less likely, but I have developed a certain paranoia over
> the years.

It didn't work on Linux and HP-UX (not sure if they fixed that)

Sony Antony

Sep 1, 2002, 12:17:25 PM
> >According to the comment in cat of Solaris 8 src code, it has something to
> >do with NFSv2 and root access. If you want to find out more, get Solaris
> >source code :-)
>
>
> Because cat used to fail with "write error" (write wasn't able to
> page in the page because of read permission problems on the server;
> but the result showed up as a failure of the "write" system call;
> confusing for some)

But in that case open() itself will fail with EACCES, right? So why
did they have to do this?

--sony

Sony Antony

Sep 1, 2002, 12:33:40 PM
Casper H.S. Dik <Caspe...@Sun.COM> wrote in message news:<akspqc$og6$3...@news1.xs4all.nl>...

> dlc=use...@cs.cmu.edu writes:
>
> >* While it is valid to close the descriptor after mapping the file, whether
> > this works properly may depend upon the underlying file system, which
> > may implement this semantic incorrectly.
>
> The file is still "open" for the kernel so I don't see how this can be
> a problem. Do you have an example of an operating system where this
> doesn't work?
>
> >* If you expect different applications to mix and match read() and mmap()
> > for the same file, don't be surprised if some situation arises where
> > thinks don't work as well as you would like. Unified buffer caches
> > make this less likely, but I have developed a certain paranoia over
> > the years.

This sounded like an interesting point, but unfortunately it's not very
clear to me.
Are you saying that if an application mmap()s a file and then also
open()s it for writing, the mmap()ed memory will not reflect what
was written through the opened file descriptor?

1. What was the reason for Unices to have separate (rather than unified)
buffer caches?
2. Does mmap() also use the buffer cache? If it does, then in the above
scenario, even if an fsync() is done on the file descriptor used for
write(), will the mmap()ed memory still not reflect the change?
3. Will the changes be reflected if the file descriptor for mmap() is
fsync()ed?

--sony

David Schwartz

Sep 1, 2002, 2:28:57 PM
DINH Viet Hoa wrote:

> or check if the file is readable and fail on a read() system call
> instead of a SIGBUS ?

If you think about how 'cat' would likely be implemented, it wouldn't
fail with a SIGBUS because the program is unlikely to try to access that
information from user-space. On the other hand, 'grep' would be a
different story.

Right after the 'mmap', one would expect 'cat' to 'write' the mmap-ed
space to its standard output. This would cause the 'write' to fail,
leaving the program with a challenge to figure out whether the input was
unreadable or the output was unwritable.

DS

Casper H.S. Dik

Sep 1, 2002, 2:40:26 PM
sonya...@hotmail.com (Sony Antony) writes:

Not in the case of NFSv2; there's no "open()" call for NFS so
permissions for open are checked locally on the client. Root
always gets read access. Then, when the actual read happens
the server intervenes, mapping root to nobody and refusing access.

With NFSv3, the problem does not occur.

Casper H.S. Dik

Sep 1, 2002, 2:43:35 PM
sonya...@hotmail.com (Sony Antony) writes:

>This sounded like an interesting point. But unfortunately not very
>clear to me.
>Are you saying that if an application mmap() a file and then also
>open() s it for writing, the mmap() ed memory will not reflect what
>was written through the opened file descriptor.

On some systems that is the case: the file I/O is done through
the buffer cache whereas mmap I/O is not (not a problem on
SunOS 4.x/Solaris 2+)

>1. what was the reason for unices to have different buffer cache. (
>other than unified ).

The traditional implementation has a buffer cache for I/O and
a different mechanism for paging; the unified buffer cache
merges the two; it's just "more work" and there are some
things that work awkwardly with a unified system.

>2. does mmap() also use buffer cache. If it does, in the above
>scenario, even if an fsync() is done on the file descriptor for
>write(), the mmap() memory still wont reflect the change.

It's not clear that mmap'ed pages are directly marked as updated;
they might be. You may need to call msync() first.

>3. will the changes be reflected, if the file descriptor for mmap() is
>fsync() ed

I think that that is implementation dependent as well.
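
Whether a given system keeps the two paths coherent can be probed directly. This is a minimal sketch, assuming a POSIX system; the function name write_shows_in_mapping is made up. It maps a byte, overwrites it with write(), and looks at the mapping again. On a system with a unified cache I would expect it to return 1; a system with a separate buffer cache could return 0.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if a write() made after mmap() shows up in
 * the mapping, 0 if the mapping still shows the old data, -1 on error.
 * The file at `path` is created or truncated. */
int write_shows_in_mapping(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "A", 1) != 1) {
        close(fd);
        return -1;
    }

    /* volatile so that every p[0] below is a real load from the mapping */
    volatile char *p = mmap(NULL, 1, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    char before = p[0];         /* faults the page in; should be 'A' */

    /* Overwrite the same byte through the descriptor, not the mapping. */
    int ok = lseek(fd, 0, SEEK_SET) == 0 && write(fd, "B", 1) == 1;

    char after = p[0];          /* what the mapping shows now */

    munmap((void *)p, 1);
    close(fd);
    if (!ok || before != 'A')
        return -1;
    return after == 'B';
}
```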

Geoff Clare

Sep 2, 2002, 9:02:48 AM
dlc=use...@cs.cmu.edu writes:

>* Only recently (Opengroup Issue 6, maybe earlier) has mmap() been
> standardized.

It was in the first Single Unix Specification, published in 1994.

--
Geoff Clare <nos...@gclare.org.uk>

Ed L Cashin

Sep 6, 2002, 7:14:21 PM
Rich Teer <ri...@rite-group.com> writes:

> Hi all,


>
> The accepted wisdom is that mmaping a file is more efficient
> than using read. Is this true for all circumstances, e.g.,
> low memory situations?
>
> If not, when would using read be preferable to using mmap?
> Google wasn't much help, alas...

I am a bit late, but I think there's something that hasn't been
discussed explicitly yet. The memory that mmap maps the file into has
to be backed up with virtual memory, so a program that wants to
minimize its use of virtual memory should avoid mmap when a read
buffer would be smaller than the size of the mmapped file.

Hopefully the gurus will correct or comment on the following example
if it's incorrect: That means that if a program running on a machine
with 500M of virtual memory uses mmap on a 300M file when 200M of VM
are already in use, the system will start refusing malloc requests,
even though physical memory may still be available.

--
--Ed L Cashin | PGP public key:
eca...@uga.edu | http://noserose.net/e/pgp/

Drazen Kacar

Sep 6, 2002, 7:30:48 PM
Ed L Cashin wrote:

> Hopefully the gurus will correct or comment on the following example
> if its incorrect: That means that if a program running on a machine
> with 500M of virtual memory uses mmap on a 300M file when 200M of VM
> are already in use, the system will start refusing malloc requests,
> even though physical memory may still be available.

Depends on how you mmap the file. If you use the MAP_SHARED flag, then the
changes your program might make will be reflected in the backing file. In
that case swap is not reserved by the mmap call. If you use MAP_PRIVATE, then
I think swap will be reserved, unless you also supply MAP_NORESERVE.
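
The difference between the two flags shows up as soon as the mapping is written to. A minimal sketch, assuming a POSIX system; the function name is made up, and MAP_NORESERVE is treated as optional because not every system defines it. A store into a MAP_PRIVATE mapping goes to a private copy of the page, which is what needs swap or memory, and the file itself is left alone.

```c
#define _DEFAULT_SOURCE         /* exposes MAP_NORESERVE on glibc in strict modes */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0         /* assumption: fall back to plain MAP_PRIVATE */
#endif

/* Hypothetical helper: scribble on the first byte of a private mapping and
 * return what the file itself holds afterwards, or -1 on error. */
int first_byte_after_private_write(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* PROT_WRITE on a read-only descriptor is allowed with MAP_PRIVATE:
     * stores never reach the file. */
    char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_NORESERVE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Copy-on-write happens here. With MAP_NORESERVE this store can raise
     * a signal if no swap is available at this moment. */
    p[0] = '!';
    munmap(p, 1);

    char c;
    ssize_t n = read(fd, &c, 1);    /* read the byte back from the file */
    close(fd);
    return n == 1 ? (unsigned char)c : -1;
}
```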

--
.-. .-. I don't work here. I'm a consultant.
(_ \ / _)
| da...@willfork.com
|

Kjetil Torgrim Homme

Sep 6, 2002, 7:51:39 PM
[Ed L Cashin]:

>
> The memory that mmap maps the file into has to be backed up with
> virtual memory, so a program that wants to minimize its use of
> virtual memory should avoid mmap when a read buffer would be
> smaller than the size of the mmapped file.

look into PROT_READ and MAP_NORESERVE in mmap(2).

--
Kjetil T. ==. ,,==. ,,==. ,,==. ,,==. ,,==
::://:::://:::://:::://:::://::::
=='' `=='' `=='' `=='' `=='' `== http://folding.stanford.edu

Dragan Cvetkovic

Sep 6, 2002, 7:55:57 PM
Ed L Cashin <eca...@uga.edu> writes:
>
> I am a bit late, but I think there's something that hasn't been
> discussed explicitly yet. The memory that mmap maps the file into has
> to be backed up with virtual memory, so a program that wants to
> minimize its use of virtual memory should avoid mmap when a read
> buffer would be smaller than the size of the mmapped file.

You can always pass the MAP_NORESERVE flag to mmap() so that no swap space
is reserved for it. See the Solaris man page for mmap().

But then, it's a mixed blessing. To quote the mmap man page:

A write into a MAP_NORESERVE mapping produces
results which depend on the current availability of swap
space in the system. If space is available, the write
succeeds and a private copy of the written page is created;
if space is not available, the write fails and a SIGBUS or
SIGSEGV signal is delivered to the writing process.

Richard L. Hamilton

Sep 6, 2002, 9:15:30 PM
In article <1rheh2e...@fimm.ifi.uio.no>,

Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> writes:
> [Ed L Cashin]:
>>
>> The memory that mmap maps the file into has to be backed up with
>> virtual memory, so a program that wants to minimize its use of
>> virtual memory should avoid mmap when a read buffer would be
>> smaller than the size of the mmapped file.
>
> look into PROT_READ and MAP_NORESERVE in mmap(2).

Surely, swap wouldn't be used for a read-only page, such a page
could simply be abandoned to free up memory and re-fetched from the
file when needed again.

--
mailto:rlh...@mindwarp.smart.net http://www.smart.net/~rlhamil

dlc=u...@cs.cmu.edu

unread,
Sep 6, 2002, 11:10:03 PM9/6/02
to
In article <akspqc$og6$3...@news1.xs4all.nl>,

Casper H.S. Dik <Caspe...@Sun.COM> wrote:
>dlc=use...@cs.cmu.edu writes:
>
>>* While it is valid to close the descriptor after mapping the file, whether
>> this works properly may depend upon the underlying file system, which
>> may implement this semantic incorrectly.
>
>The file is still "open" for the kernel so I don't see how this can be
>a problem. Do you have an example of an operating system where this
>doesn't work?

The problem was with the file system, in this case AFS, and not necessarily
a problem in the kernel per se. AFS used the close() as the mechanism
to send the changes from the client cache back to the file server.
Trying to get AFS to do the right thing after the file was closed, but
there was still a reference because of the mapping, was tricky. I do not
know if AFS does it right or not.

--
Daryl Clevenger
dlc=use...@cs.cmu.edu

Casper H.S. Dik

unread,
Sep 7, 2002, 4:23:10 AM9/7/02
to
Ed L Cashin <eca...@uga.edu> writes:

>Hopefully the gurus will correct or comment on the following example
>if it's incorrect: That means that if a program running on a machine
>with 500M of virtual memory uses mmap on a 300M file when 200M of VM
>are already in use, the system will start refusing malloc requests,
>even though physical memory may still be available.

No, the system will use the file itself as backing store unless it
is mapped privately (MAP_PRIVATE).

I.e., there's no effect on virtual memory use.
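
For illustration only, a sketch of the case described above: a read-only
MAP_SHARED mapping, whose pages can always be re-fetched from the file.
The function name count_newlines_shared() is invented for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical example: count newlines through a shared, read-only
 * mapping. Returns the count, or -1 on error.
 */
long count_newlines_shared(const char *path)
{
    struct stat st;
    const char *p;
    void *map;
    size_t i, len;
    long n = 0;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;
    if (len == 0) {             /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    /* Shared and read-only: the file itself backs these pages. */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    p = map;
    for (i = 0; i < len; i++)
        if (p[i] == '\n')
            n++;

    munmap(map, len);
    return n;
}
```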

Casper H.S. Dik

unread,
Sep 7, 2002, 5:07:07 AM9/7/02
to
dlc=use...@cs.cmu.edu writes:

>The problem was with the file system, in this case AFS, and not necessarily
>a problem in the kernel per se. AFS used the close() as the mechanism
>to send the changes from the client cache back to the file server.
>Trying to get AFS to do the right thing after the file was closed, but
>there was still a reference because of the mapping, was tricky. I do not
>know if AFS does it right or not.


Well, it looks like a kernel problem or a bug in AFS; you should not
perform such actions on last close() but rather on the last reference
going away; if the kernel provides that information to AFS, then AFS
is at fault. If the kernel does not provide sufficient information to AFS
then it is at fault.
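
To make the sequence under discussion concrete, here is a sketch that
closes the descriptor right after mmap() and only then modifies the file
through the mapping. The name overwrite_first_byte() is invented for the
example. msync() with MS_SYNC is the standard way to ask for the change to
be written back; whether a given file system (AFS in the case above)
honours it after close() is exactly the open question, and this sketch
does not answer it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical example: change the first byte of a file through a
 * shared mapping after the descriptor has been closed.
 * Returns 0 on success, -1 on error.
 */
int overwrite_first_byte(const char *path, char c)
{
    struct stat st;
    void *map;
    size_t len;
    int fd, rc;

    assert(path != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* last close(); the mapping still refers to the file */
    if (map == MAP_FAILED)
        return -1;

    ((char *)map)[0] = c;       /* modify the file via the mapping */

    rc = msync(map, len, MS_SYNC);  /* request write-back before unmapping */
    munmap(map, len);
    return rc == 0 ? 0 : -1;
}
```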

Kjetil Torgrim Homme

unread,
Sep 7, 2002, 6:45:13 AM9/7/02
to
[Richard L. Hamilton]:

>
> Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> writes:
> > look into PROT_READ and MAP_NORESERVE in mmap(2).
>
> Surely, swap wouldn't be used for a read-only page, such a page
> could simply be abandoned to free up memory and re-fetched from
> the file when needed again.

yes. I guess I should have said "or" rather than "and" or something
to make that clearer, but it seemed like poor English.

does anyone know whether mappings are more or less aggressively cached
than normal file reads? I assume "more". calling madvise(2) with
MADV_DONTNEED occasionally could solve the problem, but your process may
not know that a different process will need the same file a second
later.
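
a rough sketch of the madvise idea, for what it's worth. the advice value
is spelled MADV_DONTNEED in <sys/mman.h>. the function name, the chunk
size of 256 pages and the _DEFAULT_SOURCE define (for glibc in strict
modes) are my own assumptions, and the call is only a hint, so its result
is ignored:

```c
#define _DEFAULT_SOURCE     /* glibc: expose madvise() in strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical example: scan a file through a read-only shared mapping,
 * telling the kernel after each chunk that its pages are no longer needed.
 * Returns the number of newlines, or -1 on error.
 */
long count_newlines_release(const char *path)
{
    struct stat st;
    const char *p;
    void *map;
    size_t len, chunk, off, end, i;
    long n = 0, page;
    int fd;

    assert(path != NULL);

    page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    chunk = (size_t)page * 256;     /* page multiple keeps offsets aligned */

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    len = (size_t)st.st_size;
    if (len == 0) {
        close(fd);
        return 0;
    }

    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    p = map;
    for (off = 0; off < len; off = end) {
        end = (len - off > chunk) ? off + chunk : len;
        for (i = off; i < end; i++)
            if (p[i] == '\n')
                n++;
        /* Advisory only: a failure here does not affect correctness. */
        (void)madvise((char *)map + off, end - off, MADV_DONTNEED);
    }

    munmap(map, len);
    return n;
}
```

the mapping is read-only and shared, so dropping pages loses nothing; they
are simply read from the file again if touched later.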

Ed L Cashin

unread,
Sep 7, 2002, 10:24:59 AM9/7/02
to
Dragan Cvetkovic <d1r2a3g4a...@soli99ton.com> writes:

> Ed L Cashin <eca...@uga.edu> writes:
> >
> > I am a bit late, but I think there's something that hasn't been
> > discussed explicitly yet. The memory that mmap maps the file into has
> > to be backed up with virtual memory, so a program that wants to
> > minimize its use of virtual memory should avoid mmap when a read
> > buffer would be smaller than the size of the mmapped file.
>
> You can always pass the MAP_NORESERVE flag to mmap() so that no swap space
> is reserved for the mapping. See the Solaris man page for mmap().
>
> But then, it's a mixed blessing. To quote the mmap man page:
>
> A write into a MAP_NORESERVE mapping produces
> results which depend on the current availability of swap
> space in the system. If space is available, the write
> succeeds and a private copy of the written page is created;
> if space is not available, the write fails and a SIGBUS or
> SIGSEGV signal is delivered to the writing process.

Then the answer for the OP with regard to VM is that read is
preferable to mmap when you can't afford the somewhat lower
portability of MAP_NORESERVE, or when you can't afford to have the
program segfault when VM is low and you want to allocate as little VM
as possible to the program in question.

Ed L Cashin

unread,
Sep 7, 2002, 10:36:42 AM9/7/02
to
Casper H.S. Dik <Caspe...@Sun.COM> writes:

> Ed L Cashin <eca...@uga.edu> writes:
>
> >Hopefully the gurus will correct or comment on the following example
> >if it's incorrect: That means that if a program running on a machine
> >with 500M of virtual memory uses mmap on a 300M file when 200M of VM
> >are already in use, the system will start refusing malloc requests,
> >even though physical memory may still be available.
>
> No, the system will use the file itself as backing store unless it
> is mapped privately (MAP_PRIVATE).
>
> I.e., there's no effect on virtual memory use.

OK, then the only circumstance when the VM concern would make read
preferable to mmap would be when all these are true:

* you don't want MAP_SHARED (meaning that now you must use
MAP_NORESERVE to avoid reserving swap space)

* you can't afford the somewhat lower portability of MAP_NORESERVE
or you can't afford to have the program segfault when VM is low

* you want to allocate as little VM as possible to the program in
question.

--

Philip Brown

unread,
Sep 20, 2002, 6:34:52 PM9/20/02
to
On 7 Sep 2002 08:23:10 GMT, Caspe...@Sun.COM wrote:
>...

>No, the system will use the file itself as backing store unless it
>is mapped privately (MAP_PRIVATE).
>
>I.e., there's no effect on virtual memory use.


unless perhaps,
(size of currently allocated virtual memory) + (size of file) >
addressable memory? ;-)

not so much a problem on fully 64-bit systems, but potentially an issue on
32-bit systems, perhaps.

--
[Trim the no-bots from my address to reply to me by email!]
[ Do NOT email-CC me on posts. Pick one or the other.]
S.1618 http://thomas.loc.gov/cgi-bin/bdquery/z?d105:SN01618:@@@D
http://www.spamlaws.com/state/ca1.html

Richard L. Hamilton

unread,
Sep 20, 2002, 8:58:10 PM9/20/02
to
In article <slrnaon8ut....@bolthole.com>,

phi...@bolthole.no-bots.com (Philip Brown) writes:
> On 7 Sep 2002 08:23:10 GMT, Caspe...@Sun.COM wrote:
>>...
>>No, the system will use the file itself as backing store unless it
>>is mapped privately (MAP_PRIVATE).
>>
>>I.e., there's no effect on virtual memory use.
>
>
> unless perhaps,
> (size of currently allocated virtual memory) + (size of file) >
> addressable memory? ;-)
>
> not so much a problem on fully 64bit systems, but potentially an issue on
> 32-bit systems, perhaps.

I have mmap'd a portion of a file
larger than RAM once (that is, even the portion was larger than RAM),
and written out the mmap'd portion with a single write(). Is that the
sort of thing you meant? I don't recall if there was any effect on
swap space, but for the system (a Voyager, with 80 MB), it worked ok.

Ryan Younce

unread,
Sep 20, 2002, 11:05:25 PM9/20/02
to

He said larger than addressable memory, not larger than the amount of
physical memory.

Ryan

Richard L. Hamilton

unread,
Sep 23, 2002, 2:47:21 PM9/23/02
to
In article <VlRi9.33554$jF4.2...@twister.southeast.rr.com>,

Ok then, from mmap(2):

When MAP_FIXED is set and the requested address is the same
as previous mapping, the previous address is unmapped and
the new mapping is created on top of the old one.

When MAP_FIXED is not set, the system uses addr to arrive at
pa. The pa so chosen will be an area of the address space
that the system deems suitable for a mapping of len bytes to
the file. The mmap() function interprets an addr value of 0
as granting the system complete freedom in selecting pa,
subject to constraints described below. A non-zero value of
addr is taken to be a suggestion of a process address near
which the mapping should be placed. When the system selects
a value for pa, it will never place a mapping at address 0,
nor will it replace any extant mapping, nor map into areas
considered part of the potential data or stack "segments".
[...]
ENOMEM
The MAP_FIXED option was specified and the range
[addr, addr + len) exceeds that allowed for the
address space of a process.

The MAP_FIXED option was not specified and there is
insufficient room in the address space to effect the
mapping.

The mapping could not be locked in memory, if required
by mlockall(3C), because it would require more space
than the system is able to supply.

The composite size of len plus the lengths obtained
from all previous calls to mmap() exceeds RLIMIT_VMEM
(see getrlimit(2)).
[...]
USAGE
Use of mmap() may reduce the amount of memory available to
other memory allocation functions.


So MAP_FIXED replaces any existing mapping for the range, and without
MAP_FIXED, if there's not enough room, the attempt fails. So one
couldn't have mappings exceeding the addressable memory, which in the
case of the full 32 (64) bit space is obvious (no way to refer to it
syntactically), and in the case of lesser limits (subtractions for
reserved portions of the address space) would obviously be enforced
by the OS. D'oh.
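
In practice that means a program should be ready for mmap() to fail and
fall back to read(). A minimal sketch, with invented names (sum_file,
sum_bytes) and an arbitrary 8 KB buffer; the try_mmap flag exists only so
that both paths can be exercised:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Add n bytes starting at p to the running total acc. */
static unsigned long sum_bytes(const unsigned char *p, size_t n,
                               unsigned long acc)
{
    while (n--)
        acc += *p++;
    return acc;
}

/*
 * Hypothetical example: sum the bytes of a file, preferring mmap() and
 * falling back to read() if the mapping cannot be made (for instance
 * ENOMEM because the address space has no room for it).
 * Returns 0 on success, -1 on error; *used_mmap reports the path taken.
 */
int sum_file(const char *path, int try_mmap,
             unsigned long *sum_out, int *used_mmap)
{
    unsigned char buf[8192];
    struct stat st;
    unsigned long sum = 0;
    ssize_t got;
    int fd;

    assert(path != NULL && sum_out != NULL && used_mmap != NULL);
    *used_mmap = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Only attempt the mapping if the size fits in a size_t at all. */
    if (try_mmap && fstat(fd, &st) == 0 && st.st_size > 0 &&
        (off_t)(size_t)st.st_size == st.st_size) {
        size_t len = (size_t)st.st_size;
        void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        if (map != MAP_FAILED) {
            *sum_out = sum_bytes(map, len, 0);
            *used_mmap = 1;
            munmap(map, len);
            close(fd);
            return 0;
        }
        /* mmap() failed: fall through to the read() loop. */
    }

    while ((got = read(fd, buf, sizeof buf)) > 0)
        sum = sum_bytes(buf, (size_t)got, sum);
    close(fd);
    if (got < 0)
        return -1;

    *sum_out = sum;
    return 0;
}
```

The read() path needs only the fixed buffer, so it keeps working however
large the file is relative to the free address space.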
