extending an mmap'd file

Bryan White

Apr 28, 1998, 3:00:00 AM

I want to memory map a mostly static file using mmap. The only changes
to the file will be new entries appended to the end.

The file is shared among multiple processes but I can handle arbitration
separately.

The documentation I have found for mmap indicates I cannot extend it by
writing beyond the EOF in the mapping. That's fine but they give no
hint as to other techniques that may or may not work.

In particular, what I am thinking of doing is:
1) open the original file with open().
2) seek to the end
3) append the new entry
4) close() the file

Will this work? Is there some sync operation (msync or madvise) that
needs to be done? Would this work better if I kept the original file
descriptor (the one given to mmap) around and used that for the append
operation?

Is there a method for a program to determine the valid size of the
mapped file as opposed to the size of the memory region passed to
mmap()?

--
Bryan White
ArcaMax Inc
Yorktown, VA

Kaz Kylheku

Apr 28, 1998, 3:00:00 AM

In article <354619B8...@visi.net>, Bryan White <bryw...@visi.net> wrote:
>I want to memory map a mostly static file using mmap. The only changes
>to the file will be new entries appended to the end.

This question belongs in comp.unix.programmer.

The way you extend a mapped file is by doing an lseek and a write.

[ One thing you can do, if you are daring, is trap a SIGBUS signal and
extend the file in the signal handler. That way, if you access a part of the
mapping that does not correspond to the underlying file, you generate a bus
error, which your handler catches and corrects by extending the file,
allowing the access to be retried. ]
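The lseek-and-write extension described above can be sketched as follows. This is a minimal sketch under stated assumptions: `append_record` is a hypothetical helper name, the descriptor is assumed to be open for writing, and cross-process arbitration is assumed to be handled elsewhere (as Bryan says it will be), because the seek and the write are two separate calls.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes from buf at the current end of
 * the file open on fd (e.g. the descriptor that was passed to mmap()).
 * Existing mappings are not changed by this call.
 * Returns the offset where the record starts, or -1 on error. */
off_t append_record(int fd, const void *buf, size_t len)
{
    off_t where = lseek(fd, 0, SEEK_END);   /* move to the current EOF */
    if (where == (off_t)-1)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {                      /* write() may be partial */
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return where;
}
```

The returned offset tells the caller where in the file (and therefore where in a sufficiently long mapping) the new record begins.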

>The file is shared among multiple processes but I can handle arbitration
>separately.
>
>The documentation I have found for mmap indicates I cannot extend it by
>writing beyond the EOF in the mapping. That's fine but they give no

EOF is a special value used to indicate the failure of getchar() and
other functions.

You cannot access memory beyond the mapping, that much is true. But you can
map beyond the physical end of the file, in which case your mapped region
contains an area at the end which triggers a bus error when accessed.
(Also, since protection works at page granularity, you can access the bytes
between the end of the file and the end of its last page without a fault:
watch out.)
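The page-granularity caveat can be observed directly. The sketch below assumes a non-empty file shorter than one page; `byte_in_first_page` is a hypothetical helper. POSIX specifies that the remainder of a partial last page reads as zero, while touching a page lying wholly past EOF raises SIGBUS, so the helper deliberately stays inside the first page.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the first page of the file on fd and return the
 * byte at offset off. The file must be non-empty and off must lie in the
 * first page. If the file is shorter than a page and off is past EOF, the
 * read does not fault and yields 0. Returns -1 on error. */
int byte_in_first_page(int fd, size_t off)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    if (page <= 0 || off >= (size_t)page)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return -1;            /* a page wholly past EOF would SIGBUS */

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    int value = p[off];
    munmap(p, (size_t)page);
    return value;
}
```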

>hint as to other techniques that may or may not work.
>
>In particular what I am thinking of doing is:
>1) open the original file with open().
>2) seek to the end
>3) append the new entry
>4) close() the file

No need to open and close if you retain the original file descriptor
that was used to do the map.

>Is there a method for a program to determine the valid size of the
>mapped file as opposed to the size of the memory region passed to
>mmap()?

An lseek() of zero bytes relative to SEEK_END; the return value is the file size.
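A minimal sketch of the lseek() approach, assuming a hypothetical helper name `file_size_by_seek`. Seeking moves the file offset, so this version saves and restores it; if the next operation is an append anyway, the restore can be dropped.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size of the file on fd found with
 * lseek(fd, 0, SEEK_END). The original offset is restored afterwards.
 * Returns -1 on error. */
off_t file_size_by_seek(int fd)
{
    off_t cur = lseek(fd, 0, SEEK_CUR);     /* remember where we were */
    if (cur == (off_t)-1)
        return -1;
    off_t end = lseek(fd, 0, SEEK_END);     /* offset of EOF = file size */
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, cur, SEEK_SET) == (off_t)-1)
        return -1;
    return end;
}
```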

Kaz Kylheku

Apr 28, 1998, 3:00:00 AM

In article <kbp11.1338$971.3...@newsgate.direct.ca>,

Kaz Kylheku <k...@cafe.net> wrote:
>>Is there a method for a program to determine the valid size of the
>>mapped file as opposed to the size of the memory region passed to
>>mmap()?
>
>An lseek() of zero bytes relative to SEEK_END; the return value is the file size.

Or do an fstat() on the file descriptor and retrieve the size from
the st_size field.
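The fstat() variant answers Bryan's original question (how much of the mapped region is backed by the file) without touching the file offset. A minimal sketch, assuming the mapping was made at file offset 0; `valid_mapped_bytes` is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: how many bytes at the start of a mapping of map_len
 * bytes (made at file offset 0) currently hold file data, using fstat()'s
 * st_size. Unlike lseek(), this leaves the file offset alone.
 * Returns -1 on error. */
off_t valid_mapped_bytes(int fd, size_t map_len)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size < (off_t)map_len)
        return st.st_size;   /* file is shorter than the mapping */
    return (off_t)map_len;   /* mapping is fully backed */
}
```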

Paul Cifarelli

Apr 29, 1998, 3:00:00 AM


Bryan White wrote:

> In particular what I am thinking of doing is:
> 1) open the original file with open().
> 2) seek to the end
> 3) append the new entry
> 4) close() the file
>

> Will this work?

Well, it will add to the file, but it certainly won't change the mapping in
any way. An attempt to access the new record will result in SIGBUS, as always.


> Is there some sync operation(msync, or madvise) that
> needs to be done?

You need to remap the region. On Solaris and most Unix systems, this is done by
calling mmap again, specifying the new length. The OS will replace the original
mapping with the new one. I have never tried it in Linux myself, though. Worst
case, you could unmap and then mmap again.

The important thing to remember when you remap is that the virtual address
of the new mapped area need not be the same as the original.
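A minimal sketch of the unmap-and-mmap-again route. Note that calling mmap() a second time without MAP_FIXED generally creates an additional mapping rather than replacing the first one, which is why this sketch unmaps the old view explicitly (Linux also offers the non-portable mremap()). `remap_whole_file` is a hypothetical helper; it assumes a read-only shared mapping made at offset 0, and, as Paul warns, every pointer into the old region must be refreshed afterwards.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: replace a read-only shared mapping of fd (offset 0,
 * old_len bytes at old_addr) with one covering the whole current file.
 * The new mapping may land at a different address. On success stores the
 * new length in *new_len and returns the new address; on failure returns
 * MAP_FAILED and leaves the old mapping untouched. */
void *remap_whole_file(int fd, void *old_addr, size_t old_len, size_t *new_len)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return MAP_FAILED;                /* mmap() rejects length 0 */
    size_t len = (size_t)st.st_size;

    /* Create the new view first so a failure keeps the old one usable. */
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    munmap(old_addr, old_len);            /* then drop the old view */
    *new_len = len;
    return p;
}
```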


> Would this work better if I kept the original file
> descriptor(the one given to mmap) around and used that for the append
> operation.
>

It obviously saves you the trouble of reopening the file, but it has no other
advantage.

> Is there a method for a program to determine the valid size of the
> mapped file as opposed to the size of the memory region passed to
> mmap()?
>

It is true that the mapping will occur in multiples of the page size, but
that doesn't stop the OS from delivering you a SIGBUS if you go beyond the
range that you asked for. If records are rarely added, as you say, then
I would just remap the file.


Regards,
Paul Cifarelli
pa...@ilx.com
p...@pipeline.com


Steve Peltz

Apr 29, 1998, 3:00:00 AM

In article <kbp11.1338$971.3...@newsgate.direct.ca>,
Kaz Kylheku <k...@cafe.net> wrote:
>The way you extend a mapped file is by doing an lseek and a write.

A simpler way is to use ftruncate().
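A minimal sketch of growing a file with ftruncate(). `grow_file` is a hypothetical helper that refuses to shrink, since (as noted below) truncating a mapped file makes the cut-off part of the mapping fault. The added bytes read as zero and, on filesystems that support sparse files, typically have no disk blocks allocated until written.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the file on fd to at least new_size bytes with
 * ftruncate(). Never shrinks the file. Returns 0 on success, -1 on error. */
int grow_file(int fd, off_t new_size)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size >= new_size)
        return 0;                 /* already large enough */
    return ftruncate(fd, new_size);
}
```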

>You cannot access memory beyond the mapping, that much is true. But you can
>map beyond the physical end of the file, in which case your mapped region
>contains an area at the end which triggers a bus error when accessed.

At least under Linux, you can extend the underlying file, and if you
originally mapped it long enough, the "new" part of the file will become
accessible without having to remap it. Similarly, truncating the file
will make accesses to the truncated part of the mapping signal an error
(either SIGBUS or SIGSEGV, depending on the version of Linux, according
to Linus).
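The "map it long enough up front" technique Steve describes for Linux might look like this. It is a sketch under assumptions: both helper names are hypothetical, the descriptor must be open read/write, and the behavior shown is what Linux does; other systems may differ. The mapping is made once with spare pages, and the file is later grown with ftruncate() into that headroom, with no remap and no change of address.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `pages` pages of fd read/write, even if the file
 * is shorter. Pages wholly beyond EOF fault until the file covers them. */
void *map_with_headroom(int fd, size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages == 0)
        return MAP_FAILED;
    return mmap(NULL, pages * (size_t)page, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
}

/* Hypothetical helper: extend the file so it covers page `index` of a
 * headroom mapping (index must be < the number of pages mapped), then store
 * `value` at the start of that page through the existing mapping.
 * Returns 0 on success, -1 on error. */
int grow_and_store(int fd, unsigned char *map, size_t index, unsigned char value)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    if (page <= 0 || fstat(fd, &st) != 0)
        return -1;
    off_t needed = (off_t)((index + 1) * (size_t)page);
    if (st.st_size < needed && ftruncate(fd, needed) != 0)
        return -1;                      /* extend, never shrink */
    map[index * (size_t)page] = value;  /* same address, no remap */
    return 0;
}
```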

Note that you have to be careful to either always use mapped files, or
use msync() appropriately. In some operating systems, write() to the
underlying file is immediately reflected in a shared map, regardless
of whether the pages are memory resident or not; I'm not sure what
Linux does.
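One direction of "use msync() appropriately" is sketched below: after storing through a shared mapping, msync(MS_SYNC) is the portable way to make sure a later read() or pread() on the descriptor sees the data, whether or not the system shares one cache between the two paths. `store_and_sync` is a hypothetical helper, and `addr` is assumed to be page-aligned as returned by mmap().

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy n bytes from src to offset off of a shared
 * mapping of map_len bytes at addr, then flush the mapping to the file with
 * msync(MS_SYNC). Returns 0 on success, -1 on error. */
int store_and_sync(void *addr, size_t map_len, size_t off,
                   const void *src, size_t n)
{
    if (off > map_len || n > map_len - off)
        return -1;                          /* would run past the mapping */
    memcpy((char *)addr + off, src, n);
    return msync(addr, map_len, MS_SYNC);   /* wait for write-back */
}
```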

Steve Peltz

Apr 29, 1998, 3:00:00 AM

In article <35476205...@ilx.com>, Paul Cifarelli <pa...@ilx.com> wrote:
>Well, it will add to the file, but it certainly won't change the mapping in
>any way. An attempt to access the new record will result in SIGBUS, as always.

If it is outside the mapped region, then it will be a SIGSEGV. If it is
inside the mapped region, the newly extended parts of the file will be
accessible. You MAY need to do an msync() with an MS_INVALIDATE request
if the data was put there using write().
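The opposite direction, data put there with write() and then viewed through the mapping, might look like the sketch below. `write_then_invalidate` is a hypothetical helper; it uses pwrite() so the file offset is not disturbed and treats a short write as an error. As Steve observes, Linux does not appear to need the MS_INVALIDATE call, but POSIX allows systems where it matters.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write n bytes at file offset off, then ask for
 * cached copies of the shared mapping (map, map_len) to be invalidated so
 * later loads through the mapping see the new data. Returns 0 or -1. */
int write_then_invalidate(int fd, void *map, size_t map_len,
                          off_t off, const void *src, size_t n)
{
    if (pwrite(fd, src, n, off) != (ssize_t)n)
        return -1;                               /* error or short write */
    return msync(map, map_len, MS_INVALIDATE);   /* refresh mapped view */
}
```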

I note that msync doesn't seem to do much of anything, however, and that
Linux doesn't appear to need it to keep write() and mmap() in sync (which
doesn't excuse it not returning the memory when requested!). Also, blocks
don't actually get allocated in a file when you write to an unallocated
area, only when you unmap the area, which is also when all (dirtied)
blocks get written out as well (even if there are other processes which
still have the pages mapped, and even if the pages have been returned
by some other process since your process dirtied them). It seems odd
that it doesn't keep track of which pages are dirty and which aren't,
independent of which process got them that way, and allow them to be
returned asynchronously instead of only doing it during the unmap (which
typically can make it take a long time to exit a process).

With a shared mmap() of a 32M file (created with ftruncate(), so no blocks
allocated), after writing to each page but before doing an msync(MS_SYNC),
the file has 0 blocks allocated; after the msync, it has 8 blocks
allocated; after unmapping, it has 32897 blocks allocated.
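A rough way to repeat this measurement is sketched below; it is not Steve's original program. `measure_block_allocation` is a hypothetical helper, the size is a caller-chosen parameter rather than the 32M used above, and st_blocks is reported in 512-byte units. Results depend heavily on the kernel and filesystem, and a modern Linux will not necessarily reproduce the 1998 numbers.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a sparse file of `size` bytes at path with
 * ftruncate(), dirty every page through a shared mapping, and record
 * st_blocks after writing, after msync(MS_SYNC), and after munmap() in
 * blocks[0..2]. Returns 0 on success, -1 on error. */
int measure_block_allocation(const char *path, size_t size, long long blocks[3])
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    unsigned char *m = MAP_FAILED;
    int rc = -1;

    if (page <= 0 || size == 0 || ftruncate(fd, (off_t)size) != 0)
        goto out;
    m = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (m == MAP_FAILED)
        goto out;
    for (size_t off = 0; off < size; off += (size_t)page)
        m[off] = 1;                               /* dirty every page */

    if (fstat(fd, &st) != 0)
        goto out;
    blocks[0] = (long long)st.st_blocks;          /* after writing */
    if (msync(m, size, MS_SYNC) != 0 || fstat(fd, &st) != 0)
        goto out;
    blocks[1] = (long long)st.st_blocks;          /* after msync */
    if (munmap(m, size) != 0)
        goto out;
    m = MAP_FAILED;
    if (fstat(fd, &st) != 0)
        goto out;
    blocks[2] = (long long)st.st_blocks;          /* after munmap */
    rc = 0;
out:
    if (m != MAP_FAILED)
        munmap(m, size);
    close(fd);
    return rc;
}
```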
