
mmap & msync


James K. Lowden

Aug 10, 2012, 6:06:05 PM
What do I have to do to be assured that data written through a
pointer returned by mmap(2) were actually, really and truly written to
disk?

What exactly am I guaranteed -- and what can I really expect -- from
msync(2)? What is its relationship to fsync(2)?

The target is a Linux system, but I'm also interested in what the
standard intends.

I assume the flags passed to open(2) matter, and that O_SYNC is my
friend, and O_DIRECT is not.

Then I call mmap(2), then memcpy(3), then msync(2). Now what?

http://pubs.opengroup.org/onlinepubs/009695399/functions/msync.html

says

"the msync() function shall ensure that all write operations
are completed as defined for synchronized I/O data integrity
completion."

which isn't much of a commitment; "synchronized I/O data integrity"
says nothing about rusting bits on the disk. It just says that a read
initiated after a write reads what the write wrote. Yes, you read that
right.

My intention is to write to the file only via the pointer returned by
mmap(2), and to call msync(2) at unit-of-work boundaries. Am I also
bound to call fsync(2)?

If I do call fsync(2), that's apparently not enough;
http://linux.die.net/man/2/fsync says

"Calling fsync() does not necessarily ensure that the entry in
the directory containing the file has also reached disk. For that an
explicit fsync() on a file descriptor for the directory is also needed."

But what could that mean? I don't have a descriptor on the directory
because I wouldn't know what to do with it if I did; that's a job for
opendir(3). If I did open it, I wouldn't write to it, so how could
syncing an unmodified descriptor have any effect?

--jkl
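A minimal C sketch of the open/mmap/memcpy/msync sequence described above, assuming a POSIX.1-2008 system. The helper name `write_via_mmap` is hypothetical, and the sketch only shows where each call goes; it cannot settle what msync(2) promises about the physical platter, which is what the rest of the thread discusses. If the file is newly created, the replies below suggest its directory entry additionally needs an fsync(2) on the directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): store `len` bytes from `buf`
 * at the start of the file at `path` through a shared mapping, then
 * msync() with MS_SYNC before returning. `len` must be > 0, because a
 * zero-length mmap() fails. The file is resized to exactly `len` bytes.
 * Returns 0 on success, -1 on failure. */
int write_via_mmap(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    /* Size the file first: storing into a mapped page that lies entirely
     * beyond end-of-file raises SIGBUS instead of growing the file. */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, buf, len);

    /* MS_SYNC: block until the dirty pages have been written out as far
     * as the OS can tell; what the drive's own cache does is separate. */
    int rc = msync(p, len, MS_SYNC);

    if (munmap(p, len) == -1)
        rc = -1;
    return rc;
}
```

At a unit-of-work boundary the same msync(p, len, MS_SYNC) call would be made on the live mapping instead of unmapping it.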

Rainer Weikusat

Aug 10, 2012, 6:45:44 PM
"James K. Lowden" <jklo...@speakeasy.net> writes:

[...]

> What exactly am I guaranteed -- and what can I really expect -- from
> msync(2)? What is its relationship to fsync(2)?

Inherently, none. On systems which don't have a 'unified page cache',
fsync may be needed so that data written via write becomes visible in
mmapped pages and vice versa (OTOH, I've heard that even the HP-UX
people have meanwhile - grudgingly - accepted that this really
doesn't make any sense :-).

[...]


> Then I call mmap(2), then memcpy(3), then msync(2). Now what?
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/msync.html
>
> says
>
> "the msync() function shall ensure that all write operations
> are completed as defined for synchronized I/O data integrity
> completion."
>
> which isn't much of a commitment; "synchronized I/O data integrity"
> says nothing about rusting bits on the disk. It just says that a read
> initiated after a write reads what the write wrote.

It doesn't say that and the property you mention is part of the
definition of read and write, without any 'sync operation'. The actual
definition is

For write, when the operation has been completed or diagnosed
if unsuccessful. The write is complete only when the data
specified in the write request is successfully transferred and
all file system information required to retrieve the data is
successfully transferred.

> My intention is to write to the file only via the pointer returned by
> mmap(2), and to call msync(2) at unit-of-work boundaries. Am I also
> bound to call fsync(2)?

No, except possibly for directories.

>
> If I do call fsync(2), that's apparently not enough;
> http://linux.die.net/man/2/fsync says
>
> "Calling fsync() does not necessarily ensure that the entry in
> the directory containing the file has also reached disk. For that an
> explicit fsync() on a file descriptor for the directory is also needed."
>
> But what could that mean?

That a directory is a file and that fsyncs done on file A -
unsurprisingly - don't affect file B. It's just that writes to
directory files happen automatically as a side effect of writes to other
files and traditionally, these 'metadata writes' are not treated
specially by 'Linux filesystems'. If this concerns you, just do as the
manpage suggests - open a file descriptor to the directory and fsync
it.
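A minimal sketch of what the manpage suggests, assuming Linux-like behaviour where fsync(2) accepts a read-only descriptor on a directory (not every system or filesystem promises this). The helper name `fsync_dir` is hypothetical. It also illustrates why syncing an "unmodified" descriptor can matter: nothing is written through the descriptor, but the directory itself may have pending changes (for example a new entry from open(..., O_CREAT)) that fsync() asks the kernel to flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fsync() the directory `dirpath` so that entries
 * created or renamed in it reach the disk. Nothing is written through
 * this descriptor; the pending changes belong to the directory itself,
 * and fsync() requests that they be flushed.
 * Returns 0 on success, -1 on failure with errno from open() or fsync(). */
int fsync_dir(const char *dirpath)
{
    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;

    int rc = fsync(dfd);
    int saved_errno = errno;
    close(dfd);
    errno = saved_errno; /* keep close() from clobbering fsync()'s errno */
    return rc;
}
```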

Barry Margolin

Aug 10, 2012, 7:11:14 PM
In article <87wr16f...@sapphire.mobileactivedefense.com>,
The directory is only relevant if you're creating or renaming a file.
All the other metadata for a file resides in the inode, not the
directory.

--
Barry Margolin, bar...@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

James K. Lowden

Aug 10, 2012, 7:33:48 PM
On Fri, 10 Aug 2012 23:45:44 +0100
Rainer Weikusat <rwei...@mssgmbh.com> wrote:

Hi Rainer,

Thanks for the clarification on the nonrelationship between msync and
fsync. And I wouldn't have thought that simply calling open and fsync
on a directory, with no intervening write, would update it, but OK.

I was a bit confused by the docs, see below. But I think I get it now:
msync(2) is a write operation governed by "synchronized I/O data
integrity completion", which states the data are safely on disk unless
an error is returned.

> > "synchronized I/O data integrity" says nothing about rusting bits
> > on the disk. It just says that a read initiated after a write
> > reads what the write wrote.
>
> It doesn't say that

Well, it does and it doesn't! I was referring to
http://pubs.opengroup.org/onlinepubs/007904975/xrat/xbd_chap03.html

"Synchronized I/O Data (and File) Integrity Completion

"These terms specify that for synchronized read operations,
pending writes must be successfully completed before the read operation
can complete. "

You pointed me to

> For write, when the operation has been completed or diagnosed
> if unsuccessful. The write is complete only when the data
> specified in the write request is successfully transferred and
> all file system information required to retrieve the data is
> successfully transferred.

which is to be found at
http://pubs.opengroup.org/onlinepubs/007908799/xbd/glossary.html

Thanks.

--jkl

Nobody

Aug 11, 2012, 11:25:38 PM
On Fri, 10 Aug 2012 19:33:48 -0400, James K. Lowden wrote:

> And I wouldn't have thought that simply calling open and fsync
> on a directory, with no intervening write, would update it, but OK.

Creating a new file with open(..., O_CREAT) will modify the directory.
Initially, it will only modify the in-memory copy; you need to fsync() the
directory if you want to ensure that the on-disc copy has been updated.

Rainer Weikusat

Aug 12, 2012, 12:08:48 PM
The situation where this probably matters most is when replacing
files. The common create-temp-file/rename sequence needs an fsync on
the temporary file after the new contents were written and an fsync
on the directory after the rename before the replacement is
persistent.
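The steps Rainer lists, written out as a C sketch under POSIX.1-2008 assumptions. `replace_file` and `write_all` are hypothetical names, and the fixed "<name>.tmp" temporary name is a simplification; concurrent writers would need unique names (e.g. via mkstemp(3)).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace dir/name with `len` bytes from `buf` using
 * the temp-file/rename sequence. The temporary name "<name>.tmp" is an
 * illustrative assumption. Returns 0 on success, -1 on failure. */
int replace_file(const char *dir, const char *name, const void *buf, size_t len)
{
    char tmp[4096], dst[4096];
    int n1 = snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
    int n2 = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp || (size_t)n2 >= sizeof dst) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* 1. Write the new contents to a temporary file in the same directory. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* 2. fsync the temporary file so its data is durable before the rename. */
    if (write_all(fd, buf, len) == -1 || fsync(fd) == -1) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) == -1) {
        unlink(tmp);
        return -1;
    }

    /* 3. rename() atomically replaces any existing dir/name. */
    if (rename(tmp, dst) == -1) {
        unlink(tmp);
        return -1;
    }

    /* 4. fsync the directory so the rename itself is persistent. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd == -1)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

Because rename(2) replaces the destination atomically, a reader sees either the old or the new contents; the two fsync() calls are what are meant to make the new state survive a crash, subject to what the drive does with its own write cache.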

joshua...@gmail.com

Aug 13, 2012, 6:02:12 PM
On Friday, August 10, 2012 3:06:05 PM UTC-7, James K. Lowden wrote:
> What do I have to do to be assured that data written through a
> pointer returned by mmap(2) were actually, really and truly written to
> disk?

(Trying new Google Groups API again. Sorry if the line breaks don't work. Let's see...)

I've looked into this topic only a little, and the short answer is "there is no answer". The answer depends on your hard drive, file system driver, and operating system - assuming there even is a way. For example, some hard drives think they know better than you and will fail to write to non-volatile storage even when issued the appropriate commands.

Rich Gray

Aug 14, 2012, 9:22:40 PM
How would one fsync() a directory after a rename()? Open() it
just for the purpose of fsync()?

- Rich

Rainer Weikusat

Aug 15, 2012, 6:36:58 AM
On a sufficiently 'current' system, I would already have opened the
directory in order to use it in the openat and renameat calls prior to the
fsync. Otherwise, yes.

NB: This level of caution is often not necessary. It's just that the
programs I have to deal with most are supposed to run continuously on
computers distributed all around the globe, and in order to minimize
the chances of user-visible defects (and phone calls at odd hours :-),
it is prudent to enable them to cope with all difficulties which could
befall them wherever this is possible.
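A sketch of the variant Rainer mentions, where the directory descriptor is opened once, used for openat(2) and renameat(2), and then fsync()ed. It assumes the POSIX.1-2008 *at() functions; `replace_at` is a hypothetical name, and the caller is responsible for opening and closing `dfd`.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: temp-file/rename replacement where every name is
 * resolved relative to `dfd`, a directory descriptor the caller opened
 * once (e.g. with O_RDONLY | O_DIRECTORY). The same descriptor is reused
 * for the final fsync(), so no extra open() of the directory is needed.
 * A short write is treated as failure to keep the sketch brief.
 * Returns 0 on success, -1 on failure. */
int replace_at(int dfd, const char *tmpname, const char *name,
               const void *buf, size_t len)
{
    int fd = openat(dfd, tmpname, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, buf, len);
    if (n == -1 || (size_t)n != len || fsync(fd) == -1) {
        close(fd);
        unlinkat(dfd, tmpname, 0);
        return -1;
    }
    if (close(fd) == -1) {
        unlinkat(dfd, tmpname, 0);
        return -1;
    }

    if (renameat(dfd, tmpname, dfd, name) == -1) {
        unlinkat(dfd, tmpname, 0);
        return -1;
    }
    return fsync(dfd); /* make the rename itself durable */
}
```

Keeping `dfd` open also means later renames in the same directory resolve against the same directory even if its path changes in the meantime.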