
does fsync() imply msync()?


yirgster

8 May 2010, 05:59:52
[pls forgive if I've posted this (inadvertently) more than once.]

Does fsync(fd) imply that any areas of that file that were only mmap'd
are also synced, or is a separate msync() required?

I know the OS can determine which pages in the process belong to the
file. My question is: must it do this whenever fsync() is invoked?

The spec says: "all data for the open file descriptor named by fildes
is to be transferred to the storage device associated with the file
described by fildes" so I think the answer is 'yes', but I have that
worm of doubt that it's only talking about areas of the file that have
been read() and not those mmap()'d. In any case, I want to be sure.

For example:

int fd = open("my_file", O_RDWR);
void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

/* do work on mmap'd area */
[ work ]

/* now want to sync the file */

msync(0, len); /* is this necessary if I do the fsync() below? */

fsync(fd); /* is this sufficient by itself? */

yirgster

8 May 2010, 06:21:25

I mean, msync(addr, len, MS_SYNC);

Suppose that the only access to the file data in the process has
been through the mmap. Then if you did msync(addr, len, MS_SYNC) and then
the fsync(fd), the OS would now see (by looking at the page table
entries) that all the pages are clean (as a result of the msync) and
so would not actually write any of the data to disk again. True (my
choice)? False? Depends?

Rainer Weikusat

9 May 2010, 14:37:36
yirgster <yirg....@gmail.com> writes:
> [pls forgive if I've posted this (inadvertently) more than once.]
>
> Does fsync(fd) imply that any areas of that file that were mmap'd only
> are also synced,

Theoretically, no.

> or is a separate msync() required.

Theoretically, yes.

Practically, this depends on the implementation. If a so-called
unified file/page cache is used, fsync will necessarily sync all
mmapped areas, too, since only one copy of each file page exists in the
kernel. Historically, this wasn't always the case because the file
caching facilities in the UNIX(*) kernel already existed by the time
virtual memory support (including mmap) was added. As far as I know,
the 'lone holdout' in this respect used to be HP-UX but even that has
reportedly meanwhile been fixed.
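A minimal sketch (not from the thread) of what a unified page cache means in practice, assuming a POSIX system with a writable /tmp: a store made only through a MAP_SHARED mapping is read back with pread() before any msync(). On Linux this typically returns 1, but POSIX does not promise it, so treat the result as a probe of one implementation rather than a portability guarantee. The function name is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memcpy, memcmp */
#include <sys/mman.h>   /* mmap, munmap */
#include <unistd.h>     /* pread, pwrite, unlink, close */

/* Hypothetical probe: returns 1 if a store made only through a MAP_SHARED
   mapping is already visible to pread() before any msync(), 0 if it is
   not, and -1 on error. */
int shared_store_visible_to_read(void)
{
    char path[] = "/tmp/unified-cache-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file persists while fd is open */

    int result = -1;
    if (pwrite(fd, "aaaa", 4, 0) == 4) {
        char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            char buf[4];
            memcpy(p, "bbbb", 4);      /* modify through the mapping only */
            /* Deliberately no msync() before reading back. */
            if (pread(fd, buf, 4, 0) == 4)
                result = memcmp(buf, "bbbb", 4) == 0;
            munmap(p, 4);
        }
    }
    close(fd);
    return result;
}
```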

Chris Friesen

10 May 2010, 11:46:40
On 05/09/2010 12:37 PM, Rainer Weikusat wrote:
> yirgster <yirg....@gmail.com> writes:
>> [pls forgive if I've posted this (inadvertently) more than once.]
>>
>> Does fsync(fd) imply that any areas of that file that were mmap'd only
>> are also synced,
>
> Theoretically, no.
>
>> or is a separate msync() required.
>
> Theoretically, yes.
>
> Practically, this depends on the implementation. If a so-called
> unified file/page cache is used, fsync will necessarily sync all
> mmapped areas, too, since only one copy of each file page exists in the
> kernel.

I don't think that's correct. Last I checked fsync() didn't walk the
page tables looking for dirty pages to flush, while msync() does.

Chris

Chris Friesen

10 May 2010, 11:57:30
On 05/10/2010 09:46 AM, Chris Friesen wrote:

> I don't think that's correct. Last I checked fsync() didn't walk the
> page tables looking for dirty pages to flush, while msync() does.

Looks like I was wrong, and fsync() now does that.

Chris

yirgster

10 May 2010, 15:01:18

But, from a "legal" standpoint,

(1) this isn't required behavior by posix, that is, that fsync(fd)
syncs all mmap'd(fd) memory too.

But it won't be much of a performance hit, because if it does, then

(2) the fsync() following the msync(MS_SYNC) will find the PTEs clean
and hence will NOT rewrite the page again.

Note that on Windows you MUST do both analogous operations. From MSDN:

"Flushing a range of a mapped view initiates writing of dirty pages
within that range to the disk. Dirty pages are those whose contents
have changed since the file view was mapped. The FlushViewOfFile
function does not flush the file metadata, and it does not wait to
return until the changes are flushed from the underlying hardware disk
cache and physically written to disk. To flush all the dirty pages
plus the metadata for the file and ensure that they are physically
written to disk, call FlushViewOfFile and then call the
FlushFileBuffers function."

http://msdn.microsoft.com/en-us/library/aa366563%28VS.85%29.aspx

Chris Friesen

10 May 2010, 15:41:34
On 05/10/2010 01:01 PM, yirgster wrote:

> But, from a "legal" standpoint,
>
> (1) this isn't required behavior by posix, that is, that fsync(fd)
> syncs all mmap'd(fd) memory too.

Contrary to Rainer, I think it actually might be implied by posix, and
that's why the various OS's have changed their behaviour.

The posix language reads "all data for the open file descriptor named by
fildes is to be transferred to the storage device associated with the
file described by fildes." Arguably, memory ranges mmap'd from a file
are "data for the open file descriptor".

However, we know that historically this hasn't been the case so it would
be silly to rely on this behaviour to be portable.

> but it won't be much of a performance hit because if it does then
>
> (2) the fsync() following the msync(MS_SYNC) will find the PTEs clean
> and hence will NOT rewrite the page again.

True.

Chris

yirgster

10 May 2010, 16:04:12
On May 10, 12:41 pm, Chris Friesen <cbf...@mail.usask.ca> wrote:
> On 05/10/2010 01:01 PM, yirgster wrote:
>
> > But, from a "legal" standpoint,
>
> > (1) this isn't required behavior by posix, that is, that fsync(fd)
> > syncs all mmap'd(fd) memory too.
>
> Contrary to Rainer, I think it actually might be implied by posix, and
> that's why the various OS's have changed their behaviour.
>
> The posix language reads "all data for the open file descriptor named by
> fildes is to be transferred to the storage device associated with the
> file described by fildes."  Arguably, memory ranges mmap'd from a file
> are "data for the open file descriptor".

Chris, as you say: "arguably". This is a case where, imo, the spec
needs to be explicitly explicit, especially due to historically known
behavior. So, wrt the spec I think "not guaranteed", even if
guaranteed is the current intent.

Are there means to get such language made more explicit? Is there a
"grand committee" that can be appealed to?

>
> However, we know that historically this hasn't been the case so it would
> be silly to rely on this behaviour to be portable.
>
> > but it won't be much of a performance hit because if it does then
>
> > (2) the fsync() following the msync(MS_SYNC) will find the PTEs clean
> > and hence will NOT rewrite the page again.
>
> True.

So this is what I'm going to do, especially as the need to do it
arises very infrequently in our application.

>
> Chris
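The approach settled on here, msync(MS_SYNC) followed by fsync(), can be sketched as below. This assumes a POSIX system; commit_mapping() and demo_commit() are hypothetical names, and the temporary file under /tmp is only for illustration.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memcpy, memcmp */
#include <sys/mman.h>   /* mmap, msync, munmap */
#include <unistd.h>     /* fsync, pread, pwrite, unlink, close */

/* Hypothetical helper: write back pages dirtied through a MAP_SHARED
   mapping with msync(MS_SYNC), then fsync() the descriptor, so the result
   does not depend on whether fsync() alone covers mapped pages.
   addr must be the page-aligned address returned by mmap().
   Returns 0 on success, -1 with errno set on failure. */
int commit_mapping(void *addr, size_t len, int fd)
{
    if (msync(addr, len, MS_SYNC) == -1)
        return -1;
    return fsync(fd);   /* mapped pages should already be clean here */
}

/* Usage sketch: returns 1 if the committed bytes read back correctly,
   0 if not, -1 if the temporary file cannot be created. */
int demo_commit(void)
{
    char path[] = "/tmp/commit-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int ok = 0;
    if (pwrite(fd, "xxxxx", 5, 0) == 5) {   /* file must cover the mapping */
        char *p = mmap(NULL, 5, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            char buf[5];
            memcpy(p, "hello", 5);
            if (commit_mapping(p, 5, fd) == 0 && pread(fd, buf, 5, 0) == 5)
                ok = memcmp(buf, "hello", 5) == 0;
            munmap(p, 5);
        }
    }
    close(fd);
    return ok;
}
```

If fsync() does walk the page tables, the second call finds nothing dirty, so the extra cost is small.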

Chris Friesen

10 May 2010, 19:05:23
On 05/10/2010 02:04 PM, yirgster wrote:
> On May 10, 12:41 pm, Chris Friesen <cbf...@mail.usask.ca> wrote:

>> The posix language reads "all data for the open file descriptor named by
>> fildes is to be transferred to the storage device associated with the
>> file described by fildes." Arguably, memory ranges mmap'd from a file
>> are "data for the open file descriptor".
>
> Chris, as you say: "arguably". This is a case where, imo, the spec
> needs to be explicitly explicit, especially due to historically known
> behavior. So, wrt the spec I think "not guaranteed", even if
> guaranteed is the current intent.
>
> Are there means to get such language made more explicit? Is there a
> "grand committee" that can be appealed to?

I think that would be these folks:

http://www.opengroup.org/austin/

Chris

Rainer Weikusat

12 May 2010, 15:33:23
Chris Friesen <cbf...@mail.usask.ca> writes:
> On 05/10/2010 01:01 PM, yirgster wrote:
>
>> But, from a "legal" standpoint,
>>
>> (1) this isn't required behavior by posix, that is, that fsync(fd)
>> syncs all mmap'd(fd) memory too.
>
> Contrary to Rainer, I think it actually might be implied by posix, and
> that's why the various OS's have changed their behaviour.
>
> The posix language reads "all data for the open file descriptor named by
> fildes is to be transferred to the storage device associated with the
> file described by fildes." Arguably, memory ranges mmap'd from a file
> are "data for the open file descriptor".

The situation isn't that simple, e.g., it is legal to close a file
descriptor after it was used to establish a memory mapping and to
continue using the mapping. Assuming that the file is later reopened,
is whatever the existing memory mapping contains necessarily 'data for
the new file descriptor' (or only if the implementation happens to
have a unified cache)?
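Rainer's scenario can be sketched in C, assuming a POSIX system: the descriptor used for mmap() is closed, the mapping is modified and msync()ed, and the change is then read through a newly opened descriptor. msync() needs only the address range, not a descriptor. mapping_outlives_fd() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>      /* open, O_RDONLY */
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memcpy, memcmp */
#include <sys/mman.h>   /* mmap, msync, munmap */
#include <unistd.h>     /* read, write, close, unlink */

/* Hypothetical demo: returns 1 if a descriptor opened after the change
   sees the data written through a mapping whose original descriptor was
   already closed, 0 if not, -1 on error. */
int mapping_outlives_fd(void)
{
    char path[] = "/tmp/map-after-close-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int result = -1;
    char *p = MAP_FAILED;
    if (write(fd, "old!", 4) == 4)
        p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping stays valid */

    if (p != MAP_FAILED) {
        memcpy(p, "new!", 4);           /* no descriptor involved any more */
        if (msync(p, 4, MS_SYNC) == 0) {
            int fd2 = open(path, O_RDONLY);     /* a different descriptor */
            if (fd2 >= 0) {
                char buf[4];
                if (read(fd2, buf, 4) == 4)
                    result = memcmp(buf, "new!", 4) == 0;
                close(fd2);
            }
        }
        munmap(p, 4);
    }
    unlink(path);
    return result;
}
```

Whether an fsync(fd2) on the new descriptor would have covered the mapped pages without the msync() is exactly the open question.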

Chris Friesen

12 May 2010, 15:53:25

I agree that the wording is a bit unclear, but they left it that way on
purpose. From the posix rationale:

"The fsync() function is intended to force a physical write of data from
the buffer cache, and to assure that after a system crash or other
failure that all data up to the time of the fsync() call is recorded on
the disk. Since the concepts of "buffer cache", "system crash",
"physical write", and "non-volatile storage" are not defined here, the
wording has to be more abstract."

Based on the above, I see no reason to treat data modified via memory
mappings any differently from data written by a write() syscall.

That said, if _POSIX_SYNCHRONIZED_IO is not defined, the spec explicitly
allows a null implementation of fsync()...but it must be documented in
the conformance document.
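A hedged sketch of checking the Synchronized I/O option from C, assuming a POSIX <unistd.h>. Per POSIX option conventions, a value of -1 means the option is unsupported, 0 means it must be queried with sysconf(), and a positive value means it is always supported. has_synchronized_io() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <unistd.h>     /* _POSIX_SYNCHRONIZED_IO, sysconf, _SC_SYNCHRONIZED_IO */

/* Hypothetical helper: returns 1 if the implementation claims the POSIX
   Synchronized Input and Output option, 0 otherwise.  Without the
   option, POSIX permits a (documented) null implementation of fsync(). */
int has_synchronized_io(void)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return 1;                                 /* always supported */
#elif defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO == -1
    return 0;                                 /* never supported */
#else
    return sysconf(_SC_SYNCHRONIZED_IO) > 0;  /* decided at run time */
#endif
}
```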

Chris

yirgster

12 May 2010, 16:29:47

With msync(MS_SYNC), the data would have had to make it out to disk, so
it will be seen by any subsequent open() and file access.

I've assumed all along that we've been talking about mmap(...
MAP_SHARED ...)

Ersek, Laszlo

12 May 2010, 18:12:47

I don't think so.

POSIX very carefully distinguishes file descriptor from file description
from file. The language quoted above is "all data for the open file
descriptor". Ie. the distinction is made on the most specific (least
shared) level. If you dup()licate a file descriptor, you get a new
descriptor referring to the same open file description [0] [1]. But
fsync() only needs to synchronize changes made through the exact file
descriptor that is passed to it.
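The shared-description point can be checked with a small sketch, assuming a POSIX system: a write() through the original descriptor moves the file offset seen through its dup(), because both refer to one open file description. offset_seen_through_dup() is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>     /* mkstemp */
#include <unistd.h>     /* dup, write, lseek, close, unlink */

/* Hypothetical demo: returns the file offset observed through the
   duplicate after writing 3 bytes through the original descriptor
   (3 if they share one open file description), or -1 on error. */
long offset_seen_through_dup(void)
{
    char path[] = "/tmp/dup-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    long off = -1;
    int fd2 = dup(fd);
    if (fd2 >= 0) {
        if (write(fd, "abc", 3) == 3)              /* advances the shared offset */
            off = (long)lseek(fd2, 0, SEEK_CUR);   /* observe it via the dup */
        close(fd2);
    }
    close(fd);
    return off;
}
```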

If the spec went a single level deeper, ie. to file description, that
would require an fsync() call issued by process A to synchronize changes
made by process B with write(), for which B used a descriptor that it
inherited from A through a series of fork()s and exec()s, or one that it
received over a UNIX domain socket with SCM_RIGHTS.

(Btw, I found only one mention of "SCM_RIGHTS" in SUSv4 [2], and it only
"Indicates that the data array contains the access rights to be sent or
received." The Linux manual is more specific [3]: it not only mentions
that the "access rights" are file descriptors, but it also states that
SCM_RIGHTS is effectively a cross-process dup().)
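A sketch of SCM_RIGHTS as a cross-process dup(), assuming Linux or another system with socketpair() and the CMSG_* macros. For brevity both ends live in one process; send_fd(), recv_fd(), and offset_seen_through_passed_fd() are hypothetical names. The received descriptor sees the sender's file offset, just as a dup() would.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>       /* mkstemp */
#include <string.h>       /* memcpy, memset */
#include <sys/socket.h>   /* socketpair, sendmsg, recvmsg, CMSG_* */
#include <sys/uio.h>      /* struct iovec */
#include <unistd.h>       /* write, lseek, close, unlink */

/* Send descriptor fd over UNIX-domain socket sock. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                      /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                               /* correctly aligned control buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;
    memset(&ctl, 0, sizeof ctl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL)
        return -1;
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock. Returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctl;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Demo: returns the offset seen through the received descriptor after
   3 bytes are written through the original (3 expected), or -1. */
long offset_seen_through_passed_fd(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    long off = -1;
    char path[] = "/tmp/scm-rights-XXXXXX";
    int fd = mkstemp(path);
    if (fd >= 0) {
        unlink(path);
        if (send_fd(sv[0], fd) == 0) {
            int got = recv_fd(sv[1]);     /* normally in another process */
            if (got >= 0) {
                if (write(fd, "abc", 3) == 3)
                    off = (long)lseek(got, 0, SEEK_CUR);
                close(got);
            }
        }
        close(fd);
    }
    close(sv[0]);
    close(sv[1]);
    return off;
}
```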

Therefore, it seems to me, once you close a file descriptor, you may lose
any opportunity to fsync() the changes made through it.

I can't imagine that fsync() -- being permitted to ignore any changes made
through a different file descriptor -- would be *required* to care about
modifications performed through something that is not even a file
description.

In closing, if you don't mind, I'll quote myself; it seems relevant to
some extent.

----v----
Date: Fri, 2 Apr 2010 20:58:22 +0200
From: "Ersek, Laszlo" <la...@caesar.elte.hu>
Newsgroups: comp.programming.threads, comp.unix.programmer,
comp.os.linux.development.system, comp.os.linux.development.apps
Subject: Re: IPC based on name pipe FIFO and transaction log file
Message-ID: <Pine.LNX.4.64.10...@login01.caesar.elte.hu>

[snip]

Would anybody please validate the following table?

+-------------+----------------------------------------------------------------+
| change made | change visible via                                             |
| through     +----------------------------+-------------+---------------------+
|             | MAP_SHARED                 | MAP_PRIVATE | read()              |
+-------------+----------------------------+-------------+---------------------+
| MAP_SHARED  | yes                        | unspecified | depends on MS_SYNC, |
|             |                            |             | MS_ASYNC, or normal |
|             |                            |             | system activity     |
+-------------+----------------------------+-------------+---------------------+
| MAP_PRIVATE | no                         | no          | no                  |
+-------------+----------------------------+-------------+---------------------+
| write()     | depends on MS_INVALIDATE,  | unspecified | yes                 |
|             | or the system's read/write |             |                     |
|             | consistency                |             |                     |
+-------------+----------------------------+-------------+---------------------+

----^----

Cheers,
lacos

[0] http://www.opengroup.org/onlinepubs/9699919799/functions/dup.html
[1] http://www.opengroup.org/onlinepubs/9699919799/functions/fcntl.html
[2] http://www.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html
[3] http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html
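The MAP_PRIVATE row of the table can be illustrated with a short sketch, assuming a POSIX system: a store through a private mapping stays in a private copy, and POSIX states that msync() on a MAP_PRIVATE mapping does not write the modified data to the underlying file. private_store_invisible_to_read() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memcpy, memcmp */
#include <sys/mman.h>   /* mmap, msync, munmap */
#include <unistd.h>     /* pread, pwrite, unlink, close */

/* Hypothetical demo: returns 1 if read() still sees the original bytes
   after a store through a MAP_PRIVATE mapping, 0 if not, -1 on error. */
int private_store_invisible_to_read(void)
{
    char path[] = "/tmp/map-private-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int result = -1;
    if (pwrite(fd, "orig", 4, 0) == 4) {
        char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            char buf[4];
            memcpy(p, "priv", 4);          /* copy-on-write: private page */
            (void)msync(p, 4, MS_SYNC);    /* must not write private data back */
            if (pread(fd, buf, 4, 0) == 4)
                result = memcmp(buf, "orig", 4) == 0;
            munmap(p, 4);
        }
    }
    close(fd);
    return result;
}
```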

yirgster

12 May 2010, 19:41:31

I still don't think it's proven since "all data up to the time of
fsync()" seems conditioned on the preceding phrase "physical write of
data from the buffer cache." So, we're back to the unified buffer
cache issue.

yirgster

12 May 2010, 20:22:41
lacos writes:

> [ snip ]


> POSIX very carefully distinguishes file descriptor from file description
> from file. The language quoted above is "all data for the open file
> descriptor". Ie. the distinction is made on the most specific (least
> shared) level. If you dup()licate a file descriptor, you get a new
> descriptor referring to the same open file description [0] [1]. But
> fsync() only needs to synchronize changes made through the exact file
> descriptor that is passed to it.

I agree with this reading. You know, looking at the discussion this
issue has engendered, and assuming yours is an absolutely correct
reading based on the writing (as I think it is), it still should have
been more explicitly clarified in the doc, e.g., "It doesn't apply to
other fd's even in the same process." I mean, the purpose is
understanding and clarity, no? Not Talmudic scholarship.

> [snip socket stuff -- I have no idea]

> Therefore, it seems to me, once you close a file descriptor, you may lose
> any opportunity to fsync() the changes made through it.

Yes, I agree with this too.

> I can't imagine that fsync() -- being permitted to ignore any changes made
> through a different file descriptor -- would be *required* to care about
> modifications performed through something that is not even a file
> description.

Sounds correct. But, it's not relevant to the issue of mmap() of a
file description being implied by fsync of the same fd.

> In closing, if you don't mind, I'll quote myself; it seems relevant to
> some extent.

I rarely mind advertisements for myself. Even from others. I do it all
the time.

> Would anybody please validate the following table?

Validate your table? I am sufficiently trustworthy (forget
knowledgeable)?

> +-------------+----------------------------------------------------------------+
> | change made | change visible via                                             |
> | through     +----------------------------+-------------+---------------------+
> |             | MAP_SHARED                 | MAP_PRIVATE | read()              |
> +-------------+----------------------------+-------------+---------------------+
> | MAP_SHARED  | yes                        | unspecified | depends on MS_SYNC, |
> |             |                            |             | MS_ASYNC, or normal |
> |             |                            |             | system activity     |
> +-------------+----------------------------+-------------+---------------------+
> | MAP_PRIVATE | no                         | no          | no                  |
> +-------------+----------------------------+-------------+---------------------+
> | write()     | depends on MS_INVALIDATE,  | unspecified | yes                 |
> |             | or the system's read/write |             |                     |
> |             | consistency                |             |                     |
> +-------------+----------------------------+-------------+---------------------+

Well, I'm not sure I understand your table completely. But here goes:

Under MAP_PRIVATE, 2nd row, I don't understand the qualifications. It
simply seems to me: unspecified. From the mmap() page: "It is
unspecified whether modifications to the underlying object done after
the MAP_PRIVATE mapping is established are visible through the
MAP_PRIVATE mapping." So what would MS_SYNC, MS_ASYNC, have to do with
it?

MS_INVALIDATE: there's a reality problem here, I believe. From reading
other posts on this subject back around 2002-2004, I gather that it's
basically a no-op in some of the OSes (Linux? I can't look at the
source now.) Also, it would be pretty hard to test, no? Isn't it the
same race condition as between, say, the store buffers and memory cache
consistency, from the recent discussion in the threads group?

Speaking of reality (but why should this interfere with our thinking),
I know--i.e., actually seen, I'm not talking theoretically--a case in
which linux (at that time at least) did not in one instance meet the
posix spec. I keep thinking it was in zeroing out the last page of
the file correctly. But this seems too obvious. Whatever it was, it
worked properly on Solaris, AIX, and HP. I saw it.

Well, got to show some motion at work. Hope you're not so unfortunate.

Ersek, Laszlo

12 May 2010, 21:24:49
On Wed, 12 May 2010, yirgster wrote:

> lacos writes:
>
>> I can't imagine that fsync() -- being permitted to ignore any changes
>> made through a different file descriptor -- would be *required* to care
>> about modifications performed through something that is not even a file
>> description.
>
> Sounds correct. But, it's not relevant to the issue of mmap() of a file
> description being implied by fsync of the same fd.

Of course it's relevant. There is no "same fd". mmap() uses the file
descriptor temporarily to get to the file description, and then to the
regular file. Once the mapping is established, that is, the necessary
kernel structures are set up to translate addresses to file offsets
transparently and whatever else, there is no file descriptor or file
description involved anymore, even if you didn't yet happen to close the
file descriptor you originally passed to mmap(). The file descriptor in
question is not required to have any memory of it ever being used to
establish any mapping, and the mapping is free not to remember the fd it
was established via.


>> In closing, if you don't mind, I'll quote myself; it seems relevant to
>> some extent.
>
> I rarely mind advertisements for myself. Even from others. I do it all
> the time.
>
>> Would anybody please validate the following table?
>
> Validate your table? I am sufficiently trustworthy (forget
> knowledgeable)?

:)

I do appreciate your input, but the outermost quote was Rainer's :)


>> +-------------+----------------------------------------------------------------+
>> | change made | change visible via                                             |
>> | through     +----------------------------+-------------+---------------------+
>> |             | MAP_SHARED                 | MAP_PRIVATE | read()              |
>> +-------------+----------------------------+-------------+---------------------+
>> | MAP_SHARED  | yes                        | unspecified | depends on MS_SYNC, |
>> |             |                            |             | MS_ASYNC, or normal |
>> |             |                            |             | system activity     |
>> +-------------+----------------------------+-------------+---------------------+
>> | MAP_PRIVATE | no                         | no          | no                  |
>> +-------------+----------------------------+-------------+---------------------+
>> | write()     | depends on MS_INVALIDATE,  | unspecified | yes                 |
>> |             | or the system's read/write |             |                     |
>> |             | consistency                |             |                     |
>> +-------------+----------------------------+-------------+---------------------+
>
> Well, I'm not sure I understand your table completely. But here goes:
>
> Under MAP_PRIVATE, 2nd row, I don't understand the qualifications. It
> simply seems to me: unspecified. From the mmap() page: "It is
> unspecified whether modifications to the underlying object done after
> the MAP_PRIVATE mapping is established are visible through the
> MAP_PRIVATE mapping." So what would MS_SYNC, MS_ASYNC, have to do with
> it?

That language is actually reflected in the MAP_PRIVATE *column* and the
write() *row*. The "modification[] to the underlying object" is done
through write(), and whether the change is visible via a pre-existent
MAP_PRIVATE mapping is unspecified.


+-------------+-----------------------------------------+
| change made | change visible via                      |
| through     +-------------+-------------+-------------+
|             | CVV_1       | CVV_2       | CVV_3       |
+-------------+-------------+-------------+-------------+
| CMT_1       | sentence_11 | sentence_12 | sentence_13 |
+-------------+-------------+-------------+-------------+
| CMT_2       | sentence_21 | sentence_22 | sentence_23 |
+-------------+-------------+-------------+-------------+
| CMT_3       | sentence_31 | sentence_32 | sentence_33 |
+-------------+-------------+-------------+-------------+

"It is <sentence_ij> whether a change made through CMT_i is a change
visible via (a pre-existent) CVV_j."

Example: i=3, j=1: "In order to make a change, effected through a write()
syscall, visible to a pre-existent MAP_SHARED mapping, the system's
read/write consistency is sufficient, or an msync(..., MS_INVALIDATE) call
issued after the write() is sufficient."
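That example cell (write() row, MAP_SHARED column) as a sketch, assuming a POSIX system: the file is changed with pwrite() after a shared mapping exists, and msync() with MS_INVALIDATE is issued before looking through the mapping. MS_SYNC is combined with it because callers normally pass exactly one of MS_ASYNC or MS_SYNC. On a unified page cache such as Linux's, the mapping typically shows the new bytes even without the msync(). write_visible_via_shared_map() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memcmp */
#include <sys/mman.h>   /* mmap, msync, munmap */
#include <unistd.h>     /* pwrite, unlink, close */

/* Hypothetical demo: returns 1 if a change made with write() after the
   MAP_SHARED mapping was established is visible through the mapping,
   0 if not, -1 on error. */
int write_visible_via_shared_map(void)
{
    char path[] = "/tmp/write-vs-map-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    int result = -1;
    if (pwrite(fd, "aaaa", 4, 0) == 4) {
        char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            /* The mapping already exists; now change the file via write(). */
            if (pwrite(fd, "zzzz", 4, 0) == 4 &&
                msync(p, 4, MS_SYNC | MS_INVALIDATE) == 0)
                result = memcmp(p, "zzzz", 4) == 0;
            munmap(p, 4);
        }
    }
    close(fd);
    return result;
}
```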

Cheers,
lacos
