MMAP without flushing to disk?

Aurel Balmosan

Jan 25, 1998, 3:00:00 AM

Hi,
Is there a way to share memory between processes
with mmap without the corresponding file being
continuously updated? The in-memory representation
doesn't need to be flushed because all changes to it
are explicitly journaled and dumped. Currently I use
SYSV IPC shared memory, but on Linux there is a strict
size limit on it. I would also like to use mprotect
and so on.

My tests show that, even while at least one other process
still has a file mmapped, an exiting process flushes all
changes to that file out to disk.

My test: one process makes random changes forever.
A second one makes just one change.

The result: the second process flushed all dirty pages
out to the file, even though it had not changed those pages.
The first process, running forever, didn't cause a single
flush to the file while running.

The problem: I have many processes that run forever and
change the mmap regions heavily, and some processes that
perform a single action in the mmap regions and then exit.
Those short-lived processes trigger an mmap flush, which
is totally unnecessary.

The question: how can I prevent the mmap flush for the
single-action processes?

Bye,
Aurel.
--
=========================================================
Aurel Balmosan |au...@xylo.owl.de, a...@orga.com
http://gaia.owl.de/~aurel/|Xylo: The way things are going
=========================================================

Richard Jones

Jan 26, 1998, 3:00:00 AM

Aurel Balmosan <au...@xylo.owl.de> wrote:

> Hi,
> Is there a way to share memory between processes
> with mmap without the corresponding file being
> continuously updated? The in-memory representation
> doesn't need to be flushed because all changes to it
> are explicitly journaled and dumped. Currently I use
> SYSV IPC shared memory, but on Linux there is a strict
> size limit on it. I would also like to use mprotect
> and so on.

If the processes are forked from a single parent, then
try mapping /dev/zero.

If they're not, then there doesn't seem to be a way of doing
this without a simple kernel patch. The basic plan is as follows:
the processes map <some file>, and the last process to map
it also calls unlink(<some file>). Now the file is mapped
into everyone's address space, but doesn't exist as a directory
entry. So far so good, but you'll still find that dirty
disk blocks get written out every time a process like
update calls sync(). So you need to patch the kernel so
that sync() ignores inodes which have i_nlink == 0. (Since
they're not linked on the disk, it doesn't matter if they're
sync'd or not -- they get deleted next time you reboot
anyway).

You'll still find that if you run out of memory, Linux will
swap your memory into the file, but that is obviously
unavoidable. At least update won't flush the disk blocks
out every few seconds.

Rich.

--
Richard Jones rjo...@orchestream.com Tel: +44 171 460 6141 Fax: .. 4461
Orchestream Ltd. 262a Fulham Rd. London SW10 9EL. "you'll write in
PGP: www.four11.com telegraphic, or you won't write at all" [Céline]
Copyright © 1998 Richard W.M. Jones

A.van.Kessel

Jan 26, 1998, 3:00:00 AM
to au...@xylo.owl.de

Aurel Balmosan <au...@xylo.owl.de> wrote:
>
>Hi,
>Is there a way to share memory between processes
>with mmap without the corresponding file being
>continuously updated? The in-memory representation
>doesn't need to be flushed because all changes to it
>are explicitly journaled and dumped. Currently I use
>SYSV IPC shared memory, but on Linux there is a strict
>size limit on it. I would also like to use mprotect
>and so on.
>
>My tests show that, even while at least one other process
>still has a file mmapped, an exiting process flushes all
>changes to that file out to disk.
>
>My test: one process makes random changes forever.
>A second one makes just one change.
>
>The result: the second process flushed all dirty pages
>out to the file, even though it had not changed those pages.
>The first process, running forever, didn't cause a single
>flush to the file while running.
>
>The problem: I have many processes that run forever and
>change the mmap regions heavily, and some processes that
>perform a single action in the mmap regions and then exit.
>Those short-lived processes trigger an mmap flush, which
>is totally unnecessary.
>
>The question: how can I prevent the mmap flush for the
>single-action processes?
>
>Bye,
> Aurel.

IMHO, what you want is impossible. With mmap() you have no control
over the flushing behavior of the underlying fs. The OS may or may not write
your buffers to disk, in any order it chooses. Multiple
programs mapping the same file makes it totally unpredictable, as you
reported.
The msync() system call is available on some systems, but the semantics of the
mmap() family are still too fuzzy to be usable in a synchronized
multiple-processes-mapping-the-same-file scheme.

Sigh. Just redesign your program(s): split off one program
that does all the disk I/O and let the others communicate with it via
shm/IPC/streams/sockets, or whatever. If you really want to keep total control
of synchronisation/flushing, mmap() cannot be used at all.
That's a pity, because I'm fond of mmap() too. Its usability is still
limited to trivial (parser, pager, copy, symbol table, sort) programs.

--
Happy hacking,

Adriaan van Kessel.
Ingres DBA, C/Unix hacker
Email: Adriaan.v...@NotThere.rivm.nl
(remove NotThere. from the above address)
*** Dutch-language software is a pain in the arse ***


Stephen C. Tweedie

Jan 28, 1998, 3:00:00 AM

"A.van.Kessel" <Adriaan.v...@NotThere.rivm.nl> writes:

> Aurel Balmosan <au...@xylo.owl.de> wrote:

> >Is there a way to share memory between processes
> >with mmap without the corresponding file being
> >continuously updated?

> >My tests show that, even while at least one other process
> >still has a file mmapped, an exiting process flushes all
> >changes to that file out to disk.

Yes, we do an msync() on mapping exit for the whole file.

> IMHO, what you want is impossible. With mmap() you have no control
> over the flushing behavior of the underlying fs. The OS may or may
> not write your buffers to disk, in any order it chooses. Multiple
> programs mapping the same file makes it totally unpredictable, as
> you reported.

True. You can force the data out to disk at any point with msync() or
fsync(), but you cannot _prevent_ disk flushes with the current set of
system calls for mmap().

> The msync() system call is available on some systems, but the
> semantics of the mmap() family are still too fuzzy to be usable in a
> synchronized multiple-processes-mapping-the-same-file scheme.

This is simply not at all true! msync() (which IS available on Linux)
simply forces the flushing of memory-mapped file data out to disk.
The semantics of mmap() in a multi-user, multi-processor system are
*perfectly* clear --- all processes get the same physical memory when
they access the same file data, regardless of whether they access it
via file reads/writes or via the mmap() call. The physical memory
used to cache the file is exactly the same memory that gets mapped
into every process's address space on mmap().

The only thing which is not precisely defined is the writing of that
data back to disk, but the same is true of normal file data anyway
(which is subject to the usual Unix 30-second writeback delay).

> Sigh. Just redesign your program(s): split off one program that
> does all the disk I/O and let the others communicate with it via
> shm/IPC/streams/sockets, or whatever. If you really want to keep
> total control of synchronisation/flushing, mmap() cannot be used at
> all.

I don't know where you got this idea from! mmap()ed files give
*exactly* the same inter-process synchronisation semantics as the
shared memory you suggest as a replacement. Only synchronisation
between the file's logical contents in memory and its hardened
representation on disk is at issue, and that can be forced up to date at
any time under application control.

> That's a pity, because I'm fond of mmap() too. Its usability is
> still limited to trivial (parser, pager, copy, symbol table, sort)
> programs.

I don't agree. There are many rather substantial programs which use
writable memory mapped files to great effect in complex ways. innd,
the internet news software, uses memory mapping for all of its history
file access, read and write, for example. (Use of mmap in innd is a
compile-time configuration option, but it is enabled in all of the
precompiled innd binaries on up-to-date Linux distributions.)

Cheers,
Stephen.

Linus Torvalds

Jan 29, 1998, 3:00:00 AM

In article <m33ei8d...@bennevis.vmse.edo.dec.com>,
Stephen C. Tweedie <twe...@bennevis.vmse.edo.dec.com> wrote:
>
>True. You can force the data out to disk at any point with msync() or
>fsync(), but you cannot _prevent_ disk flushes with the current set of
>system calls for mmap().

Sure you can. You can use msync(MS_INVALIDATE) on your area, and it
will essentially throw out the page tables for that process: including
the dirty state. Obviously the kernel might have written the pages out
earlier in an attempt to reclaim some memory, but if you have enough
memory then it should be a fairly good way of getting rid of some
unnecessary disk IO.

Essentially, you _should_ be able to just do the MS_INVALIDATE just
before unmapping, and then you shouldn't see the flurry of disk activity
of syncing the area to disk.

Of course, it may not actually work, I haven't really used it..

Linus

Aurel Balmosan

Jan 29, 1998, 3:00:00 AM

Stephen C. Tweedie <twe...@bennevis.vmse.edo.dec.com> wrote:
> Only synchronisation
> between the file's logical contents in memory and its hardened
> representation on disk is at issue, and that can be forced up to date at
> any time under application control.

And that's what I don't need for a high-speed database application. We
work on Alpha and Sun with IPC shared memory of about 2 GB. Some
processes are continuously doing something, and others are started,
e.g., from shell scripts. With 2 GB, shmat() takes a long time. So
my idea/hope was that mmap actually maps only the pages really
needed (and, of course, isn't interested in the other pages).
Flushing out all dirty pages for a program started several times
by a shell script is definitely more overhead than shmat() for 2 GB.

What I'm trying now is to use SIGSEGV to mmap only the pages currently
needed. But for that it would be best if I could reserve a linear
memory region per process where I can map the pages, so that
the same offsets relative to the start of the mapping can be used in
all processes. Does anyone have an idea?

Bye,
Aurel.

BTW: Can I also use mprotect on IPC shared memory?

Stephen C. Tweedie

Jan 30, 1998, 3:00:00 AM

torv...@transmeta.com (Linus Torvalds) writes:

> In article <m33ei8d...@bennevis.vmse.edo.dec.com>,
> Stephen C. Tweedie <twe...@bennevis.vmse.edo.dec.com> wrote:
> >
> >True. You can force the data out to disk at any point with msync() or
> >fsync(), but you cannot _prevent_ disk flushes with the current set of
> >system calls for mmap().
>
> Sure you can. You can use msync(MS_INVALIDATE) on your area, and it
> will essentially throw out the page tables for that process: including
> the dirty state. Obviously the kernel might have written the pages out
> earlier in an attempt to reclaim some memory, but if you have enough
> memory then it should be a fairly good way of getting rid of some
> unnecessary disk IO.

Yep, that will work, but it's not quite what I was trying to say ---
basically, if you modify a MAP_SHARED page then there is no way that
you can make any guarantee that the data stays off disk. The initial
poster's remark was that:

> With mmap() you have no control over the flushing behavior of the
> underlying fs. The OS may or may not write your buffers to disk, in
> any order it chooses.

The point is really that this is just the same as writing through the
filesystem; mmap() provides no more control than write() does over the
ordering guarantees of disk flushes. If you want a barrier, you need
to make one explicitly by using a synchronisation call: fsync, O_SYNC
or msync as appropriate.

The MS_INVALIDATE trick should certainly help to reduce disk IO in the
case where you are not actually writing through the mmaped region.

--Stephen

Pavankumar S V

Mar 8, 2023, 2:02:04 AM

On Friday, 30 January 1998 at 13:30:00 UTC+5:30, Stephen C. Tweedie wrote:
> torv...@transmeta.com (Linus Torvalds) writes:
> > In article <m33ei8d...@bennevis.vmse.edo.dec.com>,
> > Stephen C. Tweedie <twe...@bennevis.vmse.edo.dec.com> wrote:
> > >

> Yep, that will work, but it's not quite what I was trying to say ---
> basically, if you modify a MAP_SHARED page then there is no way that
> you can make any guarantee that the data stays off disk. The initial
> poster's remark was that:
> > With mmap() you have no control over the flushing behavior of the
> > underlying fs. The OS may or may not write your buffers to disk, in
> > any order it chooses.

I am in the same situation: I need to avoid flushing a memory-mapped region to disk for some time.
From my analysis and some Google searching, the kernel parameter "dirty_writeback_centisecs" controls how often the kernel threads that perform the flushing are woken up. By setting "dirty_writeback_centisecs" to 0 for a while and then changing it back to the default value, I expected that no flushing would happen while it was 0. But it works only partially, i.e. the flushing sometimes happens even though its value is 0.

I also learned that flushing happens not only periodically but also whenever the number of dirty pages in memory exceeds a threshold (set by the kernel parameters "dirty_background_ratio" and "dirty_ratio").
In combination with "dirty_writeback_centisecs", I also tried increasing "dirty_background_ratio" and "dirty_ratio" to large values, so that more dirty pages can be kept in memory before being flushed to disk. I also tried setting "dirty_expire_centisecs" to a large value.
But this also works only partially.

Note: I'm memory-mapping the device file "mtdblock0" into the virtual address space of one of my processes, and I am assuming that mmap() syncing works for device files the same way as for regular files.
I'm also assuming that the same threads that flush the page cache are also responsible for flushing the memory-mapped region to the underlying file.

Please let me know if I'm missing something or if my understanding of the flushing concepts is wrong.

Thanks in advance.