
POSIX shm_open() vs. mmap(MAP_ANON|MAP_SHARED)....


Greg A. Woods

Jul 9, 2003, 2:18:16 PM
[ On Wednesday, July 9, 2003 at 17:43:30 (+0100), Christoph Hellwig wrote: ]
> Subject: Re: fsync performance hit on 1.6.1
>
> Not trying to defend IEEE here, but there is some sense at least behind
> shm_open. Given that for shm you really want an object that's not
> backed by permanent storage (= a normal filesystem), you need to know
> where to look for a tmpfs-lookalike or, in the case you mentioned above
> something outside the normal filesystem namespace (yuck!).

You don't need such a concept for mmap(MAP_ANON|MAP_SHARED) -- the
filename is simply a key to the anonymous memory so that multiple
processes can map the same anonymous memory and thus share it.
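A minimal sketch of a shared anonymous mapping, assuming a modern BSD or Linux system. On those systems fd is passed as -1 and two processes share the region by inheriting it across fork(), not through the fd-as-key scheme described above. The function name anon_shared_roundtrip() is made up for illustration.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANON on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS   /* BSD spelling vs. the newer spelling */
#endif

/* Map one anonymous, shared int, let a forked child store `value` into it,
 * and return what the parent sees afterwards (or -1 on failure). */
int anon_shared_roundtrip(int value)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_SHARED, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {              /* child: the mapping is inherited, not copied */
        *shared = value;
        _exit(0);
    }

    int status;
    int seen = -1;
    if (waitpid(pid, &status, 0) != -1 && WIFEXITED(status))
        seen = *shared;          /* parent observes the child's store */
    munmap(shared, sizeof *shared);
    return seen;
}
```

Called from main(), anon_shared_roundtrip(42) should return 42, because the child's store lands in the same page the parent reads. With MAP_PRIVATE instead of MAP_SHARED the child would write to its own copy and the parent would still see 0.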

> As IEEE
> isn't into the filesystem namespace business shm_open is an okay wrapper
> for leaving this to the implementation.

Oh, I agree there's some sense behind shm_open() -- just so long as you
ignore the MAP_ANON jumping up and down and waving its hands and
shouting at you from over in _front_ of the curtain over there.... :-)

> Why the heck they specified shm_unlink is completely unclear to me,
> though.

That one's easy! ;-)

shm_unlink(), like unlink(), takes a pathname parameter, so given the
fact shm_open() names are strictly outside the normal visible filesystem
space then you need a matching unlink() interface to work in this
private, invisible, namespace. (or at least you do so long as you don't
also have something like a funlink() call that takes an open file
descriptor as its parameter :-)
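For comparison, the POSIX.4 sequence looks roughly like this: shm_open() a name, ftruncate() it to size (new objects start at length 0), mmap() it, and finally shm_unlink() the name. This is a sketch assuming a POSIX system. The object name and shm_named_roundtrip() are hypothetical, and older glibc versions need -lrt for shm_open(). Linux happens to expose these names under /dev/shm, but portably they live in their own namespace.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* O_* flags */
#include <stddef.h>
#include <stdio.h>      /* snprintf */
#include <sys/mman.h>   /* shm_open, shm_unlink, mmap */
#include <sys/stat.h>   /* mode bits */
#include <sys/types.h>
#include <unistd.h>     /* ftruncate, close, getpid */

/* Create a named shared memory object, store `value` through one mapping,
 * then reopen the object by name (as an unrelated process would) and read
 * it back through a second mapping. Returns the value read, or -1. */
int shm_named_roundtrip(int value)
{
    char name[32];
    /* Hypothetical name: a leading '/' and no other '/' is the portable form. */
    snprintf(name, sizeof name, "/shm-demo-%ld", (long)getpid());

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1)
        return -1;
    int *writer = MAP_FAILED;
    if (ftruncate(fd, sizeof(int)) == 0)
        writer = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                                   /* mappings outlive the fd */

    int result = -1;
    if (writer != MAP_FAILED) {
        *writer = value;
        int fd2 = shm_open(name, O_RDONLY, 0);   /* look it up by name again */
        if (fd2 != -1) {
            int *reader = mmap(NULL, sizeof(int), PROT_READ, MAP_SHARED, fd2, 0);
            close(fd2);
            if (reader != MAP_FAILED) {
                result = *reader;
                munmap(reader, sizeof(int));
            }
        }
        munmap(writer, sizeof(int));
    }
    /* The name belongs to shm_open()'s namespace, so it is removed with
     * shm_unlink(), not unlink(). */
    shm_unlink(name);
    return result;
}
```

Counting the calls makes the point of the post concrete: shm_open(), ftruncate(), mmap() and shm_unlink() do the job that a single mmap(MAP_ANON|MAP_SHARED) does when the sharing processes are related by fork().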

> Just because it was known, that doesn't mean it should be standardized.

Having to invent a dozen new API signatures to work around the lack of
MAP_ANON, only to end up with the same functionality it would have
provided, is in fact a very good reason to standardize the far simpler
API instead. That's why I say there must have been some very strong
politics influencing the committee members. Normally these committees
are loath to invent new APIs, and the mere fact
that they started down that road when they thought they could do without
MAP_ANON should have suggested to them that they were going in the wrong
direction. "Oops! We're inventing something! Let's go back to that
last fork in the road we took to get here!" (Of course POSIX.4 seems to
be mostly cut from whole cloth so maybe they didn't share that same
desire to avoid invention in the standardization process.)

> And MAP_ANON really doesn't fit into the SunOS4/SVR4 VM that wants a backing
> vnode for each memory object unlike the Mach VM.

I don't buy that argument at all. SysVr4 VM has the concept of
anonymous memory and the swap layer provides the backing store for
anonymous pages. I suspect forcing anonymous pages to always have the
MAP_PRIVATE attribute was their downfall. Anonymous pages could have
been made sharable simply by associating a vnode from an ordinary file
descriptor with them -- i.e. there's a vnode but it's not what's mapped,
anonymous memory is mapped and thus the swap layer continues to provide
the backing store. That's essentially how mmap(MAP_ANON|MAP_SHARED)
works, IIUC -- the filename, if given via an open file descriptor,
simply allows two independent processes to locate and attach the same
anonymous memory object and thus share it (i.e. the kernel does the
equivalent of an ftok() mapping to the object resource ID internally).
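The "kernel does the equivalent of an ftok() mapping" idea has a direct user-space counterpart in SysV SHM. This sketch assumes an XSI system: ftok() turns a pathname plus a project byte into a key, and any process that computes the same key can shmget() the same segment. The scratch file path and ftok_shm_roundtrip() are hypothetical, and this is only an analogy, not how any kernel implements MAP_ANON.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>     /* mkstemp */
#include <sys/ipc.h>    /* ftok, IPC_* */
#include <sys/shm.h>    /* shmget, shmat, shmdt, shmctl */
#include <sys/types.h>
#include <unistd.h>

/* Use a filename as the rendezvous key: create a segment under the key,
 * look it up again by the same key, and read `value` back through the
 * second attachment. Returns the value read, or -1 on error. */
int ftok_shm_roundtrip(int value)
{
    char path[] = "/tmp/ftok-demo-XXXXXX";      /* hypothetical key file */
    int tfd = mkstemp(path);
    if (tfd == -1)
        return -1;
    close(tfd);

    int result = -1;
    key_t key = ftok(path, 'A');                /* pathname + project id -> key */
    if (key != (key_t)-1) {
        int id = shmget(key, sizeof(int), IPC_CREAT | IPC_EXCL | 0600);
        if (id != -1) {
            int *writer = shmat(id, NULL, 0);
            /* A second lookup by key, as an unrelated process would do it. */
            int id2 = shmget(key, 0, 0);
            int *reader = (id2 != -1) ? shmat(id2, NULL, SHM_RDONLY) : (void *)-1;
            if (writer != (void *)-1 && reader != (void *)-1) {
                *writer = value;
                result = *reader;               /* same pages, different address */
            }
            if (reader != (void *)-1)
                shmdt(reader);
            if (writer != (void *)-1)
                shmdt(writer);
            shmctl(id, IPC_RMID, NULL);         /* destroy the segment */
        }
    }
    unlink(path);
    return result;
}
```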

In fact SysV SHM is implemented in SysVr4, IIUC, using anonymous pages
that are tagged MAP_SHARED, and which have a reference to the anonymous
object (/dev/zero in brain-dead implementations), but the anonymous
object does not provide their backing store, the swap layer does
instead, as with all anonymous pages. A trivial implementation of
mmap(MAP_ANON) could have been done with internal calls to shmget() and
shmat() along with a simple additional reference-count table to keep
track of the next available unique ID to use where the mmap() call
did not supply an open file descriptor (i.e. mmap() calls with fd=-1)
and such that shmctl(IPC_RMID) could be called when the process exited.
Such a trivial implementation may end up only allowing 255 truly
anonymous (fd=-1) mappings in the whole system if it were to guarantee
to stay out of the ftok() namespace for any possible filename mapping,
but I think this certainly shows the possibility is/was there to
implement mmap(MAP_ANON) in SysVr4.
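The "trivial implementation" can be approximated in user space. This sketch assumes an XSI system and swaps the proposed unique-ID table for IPC_PRIVATE, which always creates a fresh segment and so stays out of the ftok() key space. It calls shmctl(IPC_RMID) right after shmat(), so the segment disappears once the last attached process detaches or exits. Both function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Rough stand-in for mmap(NULL, len, ..., MAP_ANON | MAP_SHARED, -1, 0). */
void *sysv_anon_shared(size_t len)
{
    int id = shmget(IPC_PRIVATE, len, 0600);   /* always a brand-new segment */
    if (id == -1)
        return NULL;
    void *p = shmat(id, NULL, 0);
    /* Mark for removal now; it survives while any process is attached. */
    shmctl(id, IPC_RMID, NULL);
    return p == (void *)-1 ? NULL : p;
}

/* Same fork() round trip as with MAP_ANON: returns the value the parent
 * sees after the child's store, or -1 on failure. */
int sysv_anon_roundtrip(int value)
{
    int *shared = sysv_anon_shared(sizeof *shared);
    if (shared == NULL)
        return -1;
    if (*shared != 0) {                        /* new segments start zeroed */
        shmdt(shared);
        return -1;
    }
    pid_t pid = fork();                        /* child inherits the attachment */
    if (pid == -1) {
        shmdt(shared);
        return -1;
    }
    if (pid == 0) {
        *shared = value;
        _exit(0);
    }
    int status;
    int seen = -1;
    if (waitpid(pid, &status, 0) != -1 && WIFEXITED(status))
        seen = *shared;
    shmdt(shared);                             /* last detach frees the segment */
    return seen;
}
```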

> Thus the horrible
> mmap() of /dev/zero hack, btw..

Hmmm.... yes. What a stupid idea that was. :-) (A NULL vnode pointer
was apparently supposed to suffice such that a /dev/zero vnode was
unnecessary.)
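For comparison, the /dev/zero idiom looks roughly like this: open the device and map it MAP_PRIVATE. Every page starts out zero, and writes go to private copies rather than to the device. Not every system still allows mapping /dev/zero, so the sketch reports failure instead of assuming it works. devzero_pages_ok() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of /dev/zero privately. Returns 1 if every byte reads as
 * zero and a store succeeds, 0 if not, -1 if the mapping is unavailable. */
int devzero_pages_ok(size_t len)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd == -1)
        return -1;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping no longer needs the descriptor */
    if (p == MAP_FAILED)
        return -1;

    int ok = 1;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0) {
            ok = 0;
            break;
        }
    p[0] = 0xff;                /* private copy: nothing reaches /dev/zero */
    ok = ok && p[0] == 0xff;
    munmap(p, len);
    return ok;
}
```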

--
Greg A. Woods

+1 416 218-0098; <g.a....@ieee.org>; <wo...@robohack.ca>
Planix, Inc. <wo...@planix.com>; VE3TCP; Secrets of the Weird <wo...@weird.com>

Greywolf

Jul 9, 2003, 4:01:44 PM
Thus spake Greg A. Woods ("GAW> ") sometime Today...

GAW> shm_unlink(), like unlink(), takes a pathname parameter, so given the
GAW> fact shm_open() names are strictly outside the normal visible filesystem
GAW> space then you need a matching unlink() interface to work in this
GAW> private, invisible, namespace. (or at least you do so long as you don't
GAW> also have something like a funlink() call that takes an open file
GAW> descriptor as its parameter :-)

Um, slightly off-topic, but wouldn't funlink() be somewhat disastrous in
practice? (I presume that's the reason for the smiley.) That would require a
file-system cleaner process or a routine that knew instantly how to map inode
numbers back to pathnames (as it was explained to me, "The kernel routine is
called namei() for a reason. You will note that there is no converse
routine, since while name -> ino-dev yields exactly one ino-dev for each name,
the reverse is untrue -- consider /foo/bar/.. and /foo, for example...").
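The /foo/bar/.. example is easy to check. The sketch below assumes a writable /tmp and uses a hypothetical scratch directory: it creates dir/sub and confirms with stat() that dir and dir/sub/.. report the same (st_dev, st_ino) pair, i.e. one inode reachable through more than one name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>      /* snprintf */
#include <stdlib.h>     /* mkdtemp */
#include <sys/stat.h>   /* stat, mkdir */
#include <unistd.h>     /* rmdir */

/* Returns 1 if "<dir>" and "<dir>/sub/.." name the same inode, 0 if not,
 * -1 on error. */
int dotdot_same_inode(void)
{
    char dir[] = "/tmp/namei-demo-XXXXXX";    /* hypothetical scratch dir */
    if (mkdtemp(dir) == NULL)
        return -1;

    char sub[sizeof dir + 8], up[sizeof dir + 16];
    snprintf(sub, sizeof sub, "%s/sub", dir);
    snprintf(up, sizeof up, "%s/sub/..", dir);

    int result = -1;
    struct stat a, b;
    if (mkdir(sub, 0700) == 0) {
        if (stat(dir, &a) == 0 && stat(up, &b) == 0)
            result = (a.st_dev == b.st_dev && a.st_ino == b.st_ino);
        rmdir(sub);
    }
    rmdir(dir);
    return result;
}
```

Going from the inode back to "the" name has no single answer here, which is the problem an fd-based funlink() would run into.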

GAW> > Thus the horrible
GAW> > mmap() of /dev/zero hack, btw..
GAW>
GAW> Hmmm.... yes. What a stupid idea that was. :-) (A NULL vnode pointer
GAW> was apparently supposed to suffice such that a /dev/zero vnode was
GAW> unnecessary.)

Wow, a (vno_t *) NULL was supposed to allow one to create pre-cleared
pages in memory?

Despite its ugliness, /dev/zero has other uses, such as creating
arbitrarily large filespaces (for, e.g., swap (don't go there.)) without
having to rewrite a program to handle it -- one can use dd for it,
though I wouldn't have minded a 'mkfile' program to do the same thing
(thus avoiding the need for /dev/zero).
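A mkfile-style tool needs no /dev/zero at all. This is a minimal sketch with a hypothetical make_zero_file(): it writes a zeroed calloc() buffer in chunks, so the result has real blocks rather than holes. A sparse file made with ftruncate() would not do for swap; Linux's swapon, for one, rejects swap files with holes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>     /* calloc, free, mkstemp */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` holding `len` zero bytes, written out as real blocks.
 * Returns 0 on success, -1 on error. */
int make_zero_file(const char *path, off_t len)
{
    enum { CHUNK = 64 * 1024 };
    char *buf = calloc(1, CHUNK);             /* calloc() returns zeroed memory */
    if (buf == NULL)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) {
        free(buf);
        return -1;
    }

    int rc = 0;
    while (len > 0) {
        size_t n = len < CHUNK ? (size_t)len : CHUNK;
        ssize_t w = write(fd, buf, n);
        if (w <= 0) {
            rc = -1;
            break;
        }
        len -= w;
    }
    if (close(fd) == -1)
        rc = -1;
    free(buf);
    return rc;
}
```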

I have a question regarding mmap()ing /dev/zero:

Purportedly this was used by crt0.o and/or ld.so to create blank spots
into which to load the dynamic libraries. Surely the same thing could
have been accomplished with *alloc() and a clear routine, or
mmap() could just pre-zero whatever pages it maps. Was /dev/zero
*truly* necessary? Its sudden disappearance once or thrice on
SunOS was the cause of some concern (more so after it was discovered
that mknod(8) was dynamically linked (genius--;)).

--*greywolf;
--
I pushed my DeLorean to 88 MPH, and all I got was this stupid speeding ticket.

Greg A. Woods

Jul 10, 2003, 12:37:57 PM
[ On Wednesday, July 9, 2003 at 13:00:03 (-0700), Greywolf wrote: ]
> Subject: Re: POSIX shm_open() vs. mmap(MAP_ANON|MAP_SHARED)....

>
> Wow, a (vno_t *) NULL was supposed to allow one to create pre-cleared
> pages in memory?

Apparently. There is only one anonymous object in the system -- why
would you need a "fake" pathname to represent it? (there's no backing
store in that kind of device file :-)

> I have a question regarding mmap()ing /dev/zero:
>
> Purportedly this was used by crt0.o and/or ld.so to create blank spots
> into which to load the dynamic libraries. Surely the same thing could
> have been accomplished with *alloc() and a clear routine,

That would be _very_ expensive in terms of VM for every process since
every page would fault and copy-on-write every time!

> or
> mmap() could just pre-zero whatever pages it maps.

Indeed mmap(MAP_ANON) memory is zero-filled (and must be, or else the
garbage it revealed might violate someone's privacy, because that garbage
would very likely have come from some other process).
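The zero-fill rule can be poked at from user space. This sketch assumes a BSD or Linux MAP_ANON: it dirties an anonymous mapping, unmaps it, maps a fresh one of the same size and checks that none of the old bytes show through. The kernel may or may not hand back the same physical pages; either way they must read as zero. It only exercises one process, so it illustrates the rule rather than proving anything about other processes' data. anon_refill_is_zero() is a hypothetical name.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANON on glibc under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

static int all_zero(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Returns 1 if a fresh anonymous mapping is entirely zero after an earlier
 * one of the same size was filled with junk and unmapped, 0 if not, -1 on
 * error. */
int anon_refill_is_zero(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xa5, len);        /* leave "garbage" behind */
    munmap(p, len);

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int ok = all_zero(p, len);
    munmap(p, len);
    return ok;
}
```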
