> As per my understanding, I can explain the working of mmap() briefly
> like this: when a process (let's call it process1) calls mmap() on a
> regular file, that file is first copied to the page cache. Then the
> region of the page cache which contains the file is mapped into the
> virtual address space of process1 (this memory region is called a
> memory-mapped file). If another process (let's call it process2) calls
> mmap() on the same file, the same page-cache pages that were mapped
> into process1 get mapped into the virtual address space of process2.
> When a process wants to access the file, it simply accesses this
> memory-mapped region, which can be much faster. Also, data modified by
> process1 can be seen by process2.
It *may* be faster. But address-space manipulations, the page faults
taken while populating parts of a process's virtual address space, and
cache and TLB misses are all expensive operations, so it may well not
be.
> I have a query here. Please clarify it:
> When the process1 wants to write some data to the file, it will write
> to this memory mapped file. Then these dirty pages that are private to
> the process1 should be copied to the page cache. When will the kernel
> do this copying to page cache and how frequently?
Not at all. If the mapping is done with MAP_PRIVATE, the process gets
its own copy of each page as soon as it first writes to it
(copy-on-write); those private copies are never written back to the
file. For MAP_SHARED mappings, all processes mapping the same file, as
well as the kernel's page cache, share the same pages, so stores go
straight into the page cache.