
double mmap calls


phil-new...@ipal.net

Jan 18, 2001, 6:39:00 PM
Does Linux support double mmap calls where a 2nd mmap() can be used
to overlay an existing memory map to another address? I tired of
following all the calls through from sys_mmap() to see if it does
or not (got pretty deep, but nothing obvious one way or the other).

--
-----------------------------------------------------------------
| Phil Howard - KA9WGN | Dallas | http://linuxhomepage.com/ |
| phil-...@ipal.net | Texas, USA | http://phil.ipal.org/ |
-----------------------------------------------------------------

Arthur H. Gold

Jan 19, 2001, 12:32:17 AM
phil-new...@ipal.net wrote:
>
> Does Linux support double mmap calls where a 2nd mmap() can be used
> to overlay an existing memory map to another address? I tired of
> following all the calls through from sys_mmap() to see if it does
> or not (got pretty deep, but nothing obvious one way or the other).
>
> --
I'm not sure I understand you, but could you be thinking of
mremap()?
HTH,
--ag
--
Artie Gold, Austin, TX (finger the cs.utexas.edu account
for more info)
mailto:ag...@bga.com or mailto:ag...@cs.utexas.edu
--
A: Yes I would. But not enough to put it out.

phil-new...@ipal.net

Jan 19, 2001, 11:47:28 PM
On Thu, 18 Jan 2001 23:32:17 -0600 Arthur H. Gold <ag...@bga.com> wrote:

| phil-new...@ipal.net wrote:
|>
|> Does Linux support double mmap calls where a 2nd mmap() can be used
|> to overlay an existing memory map to another address? I tired of
|> following all the calls through from sys_mmap() to see if it does
|> or not (got pretty deep, but nothing obvious one way or the other).
|>
|> --
| I'm not sure I understand you, but could you be thinking of
| mremap()?

No.

I'm trying to make 2 virtual addresses point to the same real address
and do so on adjacent pages.

John Reiser

Jan 20, 2001, 1:40:00 AM
> |> Does Linux support double mmap calls where a 2nd mmap() can be used
> |> to overlay an existing memory map to another address? I tired of
> |> following all the calls through from sys_mmap() to see if it does
> |> or not (got pretty deep, but nothing obvious one way or the other).
> |>
> |> --
> | I'm not sure I understand you, but could you be thinking of
> | mremap()?
>
> No.
>
> I'm trying to make 2 virtual addresses point to the same real address
> and do so on adjacent pages.


#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/user.h>
#include <unistd.h>

main()
{
    /* needs error checking, of course! */
    char const filename[] = "/tmp/DoubleMapXXXXXX";
    char *const fn = (char *)malloc(1+ sizeof(filename));
    char *const junk = strcpy(fn, filename);
    int const fd = mkstemp(fn);
    int const result = lseek(fd, PAGE_SIZE, SEEK_CUR);
    char *const addr1 = mmap(0, PAGE_SIZE,
        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *const addr2 = mmap(addr1 + PAGE_SIZE, PAGE_SIZE,
        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

--
John Reiser, jre...@BitWagon.com

John Reiser

Jan 20, 2001, 1:43:24 AM
Also use MAP_FIXED in the second call of mmap,
although the posted version seems to work.

--
John Reiser, jre...@BitWagon.com

Kasper Dupont

Jan 20, 2001, 7:38:51 AM

The file may be removed with unlink as soon as it has been
opened, and it may be closed as soon as it has been mapped.
If you don't want to use a file at all you can use a shared
memory segment. See man pages for shmget, shmat and shmctl.

--
Kasper Dupont

Linus Torvalds

Jan 20, 2001, 8:07:58 PM
In article <3A6932C0...@BitWagon.com>,

John Reiser <jre...@BitWagon.com> wrote:
>#include <stdlib.h>
>#include <string.h>
>#include <sys/mman.h>
>#include <sys/user.h>
>#include <unistd.h>
>
>main()
>{
> /* needs error checking, of course! */
> char const filename[] = "/tmp/DoubleMapXXXXXX";
> char *const fn = (char *)malloc(1+ sizeof(filename));
> char *const junk = strcpy(fn, filename);
> int const fd = mkstemp(fn);

I appreciate the carefulness of this, but why not just do

static char filename[] = "/tmp/DoubleMapXXXXXX";
int fd = mkstemp(filename);

which is rather simpler and does much less work (the reason for using
"static char filename[]" at all is obviously so that the string will be
writable rather than a string constant).

Simplicity is a virtue too.

Remove the "static" part if you want the string contents re-initialized
each time (it's slower and uses more memory, but if your function gets
called multiple times you need to do this).

> int const result = lseek(fd, PAGE_SIZE, SEEK_CUR);

This lseek should be a "ftruncate()", I assume.

> char *const addr1 = mmap(0, PAGE_SIZE,
> PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> char *const addr2 = mmap(addr1 + PAGE_SIZE, PAGE_SIZE,
> PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

Use MAP_SHARED | MAP_FIXED for the second mmap. Otherwise, depending on
the phase of the moon and other things, it won't work.

Also, to be strictly safe, what you _should_ do is actually make the
first mmap() be 2 pages in size - so that the system finds an empty
virtual area of 2 pages for you. The second mmap() then over-mmap's the
second page.

If you don't do it that way, it's possible that the first mmap just
finds a gap in the VM space that is exactly one page in size. The second
mmap would then over-mmap something _else_, and you'd have some really
hard-to-debug problems.

(The above really is nit-picking: it's a damn unlikely scenario. But
it's the unlikely scenarios that bite you really badly just when you
don't need them - and if the above is part of a suid application it
could be a security issue with the user being able to take advantage of
it some way by causing a certain pattern of mmap's with special input).

So make it something like

ftruncate(fd, PAGE_SIZE);
addr1 = mmap(0, 2*PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
addr2 = mmap(addr1+PAGE_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);

(with error checking, of course, otherwise my "security" issues are
totally moot)

Linus
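
The three-line fragment above can be fleshed out with the error checking it asks for. What follows is a minimal sketch, not anyone's posted code: the helper name double_map_page() is hypothetical, the page size is taken from sysconf() rather than PAGE_SIZE, and the file is unlinked right after mkstemp() as suggested earlier in the thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: maps one page of a temporary file at two adjacent
 * virtual addresses. Returns the lower address, or NULL on failure. */
char *double_map_page(void)
{
    char path[] = "/tmp/DoubleMapXXXXXX";   /* writable array for mkstemp() */
    long page = sysconf(_SC_PAGESIZE);
    int fd;
    char *base, *second;

    assert(page > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);                           /* the name is not needed again */

    /* Give the file exactly one page of backing store. */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return NULL;
    }

    /* Let the kernel pick a free two-page range; the first page of that
     * range maps file offset 0. */
    base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return NULL;
    }

    /* Replace the second page of the range with another view of offset 0.
     * MAP_FIXED is safe here because the range is already ours. */
    second = mmap(base + page, (size_t)page, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    if (second == MAP_FAILED) {
        munmap(base, 2 * (size_t)page);
        return NULL;
    }
    return base;
}
```

With p = double_map_page(), a store to p[0] should be readable at p[page], and the reverse, since both pages refer to the same file page.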

phil-new...@ipal.net

Jan 22, 2001, 11:26:08 AM
On Sat, 20 Jan 2001 13:38:51 +0100 Kasper Dupont <kas...@daimi.au.dk> wrote:

| The file may be removed with unlink as soon as it has been
| opened, and it may be closed as soon as it has been mapped.
| If you don't want to use a file at all you can use a shared
| memory segment. See man pages for shmget, shmat and shmctl.

I can always mmap() on /dev/zero for anonymous space. But if
there is yet another way with shmget and friends, I'd at least
be curious how to set that up in this peculiar way. Since the
shm* functions have always confused me because documentation
all seems to lack some piece of crucial information I do not
know, maybe your example might even clear that up.

phil-new...@ipal.net

Jan 22, 2001, 11:51:46 AM
On 20 Jan 2001 17:07:58 -0800 Linus Torvalds <torv...@penguin.transmeta.com> wrote:

| In article <3A6932C0...@BitWagon.com>,
| John Reiser <jre...@BitWagon.com> wrote:
|>#include <stdlib.h>
|>#include <string.h>
|>#include <sys/mman.h>
|>#include <sys/user.h>
|>#include <unistd.h>
|>
|>main()
|>{
|> /* needs error checking, of course! */
|> char const filename[] = "/tmp/DoubleMapXXXXXX";
|> char *const fn = (char *)malloc(1+ sizeof(filename));
|> char *const junk = strcpy(fn, filename);
|> int const fd = mkstemp(fn);
|
| I appreciate the carefulness of this, but why not just do
|
| static char filename[] = "/tmp/DoubleMapXXXXXX";
| int fd = mkstemp(filename);
|
| which is rather simpler and does much less work (the reason for using
| "static char filename[]" at all is obviously so that the string will be
| writable rather than a string constant).
|
| Simplicity is a virtue too.

I'd agree. But I was wondering why not make it even simpler like:

fd = open("/dev/zero",O_RDWR,0);

Since I don't need persistence in a file and I don't need convergence
from other processes, and given that separate opens of /dev/zero don't
actually share, this would seem the way to go.

Maybe John was assuming a broader need than I had.


| Remove the "static" part if you want the string contents re-initialized
| each time (it's slower and uses more memory, but if your function gets
| called multiple times you need to do this).
|
|> int const result = lseek(fd, PAGE_SIZE, SEEK_CUR);
|
| This lseek should be a "ftruncate()", I assume.
|
|> char *const addr1 = mmap(0, PAGE_SIZE,
|> PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
|> char *const addr2 = mmap(addr1 + PAGE_SIZE, PAGE_SIZE,
|> PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
|
| Use MAP_SHARED | MAP_FIXED for the second mmap. Otherwise, depending on
| the phase of the moon and other things, it won't work.

I shouldn't need all this (except for MAP_FIXED) when using /dev/zero,
right?


| Also, to be strictly safe, what you _should_ do is actually make the
| first mmap() be 2 pages in size - so that the system finds an empty
| virtual area of 2 pages for you. The second mmap() then over-mmap's the
| second page.
|
| If you don't do it that way, it's possible that the first mmap just
| finds a gap in the VM space that is exactly one page in size. The second
| mmap would then over-mmap something _else_, and you'd have some really
| hard-to-debug problems.

What I apparently missed was that when mmap(,,,MAP_FIXED,,) is called
on a page that was already mapped, that it would replace the old mapping
on a page-by-page basis.

Doing all the space I need in the first mmap() call, then replacing the
2nd half of the space in the second mmap() call, clears up the issues I
had in this.


| (The above really is nit-picking: it's a damn unlikely scenario. But
| it's the unlikely scenarios that bite you really badly just when you
| don't need them - and if the above is part of a suid application it
| could be a security issue with the user being able to take advantage of
| it some way by causing a certain pattern of mmap's with special input).

Exactly my concerns.

The program could run fine until the next kernel (changes the order some
mapping is done) or the next libc (changes some of the stuff it maps), etc.
And that one would be quite a challenge to debug after it was running since
one would focus on apparent incompatibility.


| So make it something like
|
| ftruncate(fd, PAGE_SIZE);
| addr1 = mmap(0, 2*PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
| addr2 = mmap(addr1+PAGE_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
|
| (with error checking, of course, otherwise my "security" issues are
| totally moot)

So what about using /dev/zero in this case since I only need to use
this in the same process (or a child of this process forked after
all this is set up)?

Linus Torvalds

Jan 22, 2001, 3:09:55 PM
In article <t6op92t...@news.supernews.com>,

<phil-new...@ipal.net> wrote:
>|
>| Simplicity is a virtue too.
>
>I'd agree. But I was wondering why not make it even simpler like:
>
> fd = open("/dev/zero",O_RDWR,0);

/dev/zero won't do what you want it to do - you want to map the _same_
physical page twice, right?

/dev/zero will give you different physical pages: they'll be shared
after a fork(), but they won't be shared across multiple mmap's. Think
of each "mmap()" as opening a separate file, that is "shared" only in
the sense that fork() won't do the copy-on-write thing.

Also, if you actually want just /dev/zero, you might as well forget
about open() altogether, and do

mmap( ,,, MAP_SHARED | MAP_ANONYMOUS, -1, 0)

because you don't need a file descriptor to get an anonymous mapping.
But I bet you want the tmp-file backing store - it acts as an anchor to
make your mmap's truly shared.

(Alternatively, you can just use SysV shared memory, of course, and use
shmat to map it in multiple places.)

Linus
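
The two claims above (separate anonymous shared mappings get different pages, while one such mapping stays shared across fork()) can be checked with a small experiment. This is a sketch under the assumption that MAP_ANONYMOUS is available; both function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a store through one anonymous shared mapping shows up in a
 * second, separately created one; 0 if it does not; -1 on error. */
int anon_mappings_alias(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char *a, *b;
    int aliased;

    assert(page > 0);
    a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return -1;
    b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (b == MAP_FAILED) {
        munmap(a, (size_t)page);
        return -1;
    }
    a[0] = 'x';                 /* fresh anonymous pages start out zeroed */
    aliased = (b[0] == 'x');
    munmap(a, (size_t)page);
    munmap(b, (size_t)page);
    return aliased;
}

/* Returns 1 if the parent sees a store made by a forked child through the
 * same anonymous shared mapping; 0 if not; -1 on error. */
int anon_mapping_shared_after_fork(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char *p;
    pid_t pid;
    int seen;

    assert(page > 0);
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    pid = fork();
    if (pid < 0) {
        munmap(p, (size_t)page);
        return -1;
    }
    if (pid == 0) {             /* child: write and leave */
        p[0] = 'c';
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0) {
        munmap(p, (size_t)page);
        return -1;
    }
    seen = (p[0] == 'c');
    munmap(p, (size_t)page);
    return seen;
}
```

If the description above holds on the system under test, the first function returns 0 and the second returns 1.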

Dragan Cvetkovic

Jan 22, 2001, 3:38:45 PM
torv...@penguin.transmeta.com (Linus Torvalds) writes:
> Also, if you actually want just /dev/zero, you might as well forget
> about open() altogether, and do
>
> mmap( ,,, MAP_SHARED | MAP_ANONYMOUS, -1, 0)
>
> because you don't need a file descriptor to get an anonymous mapping.

Just a word of warning that MAP_ANONYMOUS is not universally available
(which might be a non-issue if you are just developing for Linux and/or
BSD); e.g., SVR4 systems lack this feature.

Bye, Dragan

--
Dragan Cvetkovic,

To be or not to be is true. G. Boole

Linus Torvalds

Jan 22, 2001, 4:59:36 PM
In article <lmelxvw...@landru.tor.soliton.com>,

Dragan Cvetkovic <dcvet...@mailcity.com> wrote:
>
>Just a word of warning that MAP_ANONYMOUS is not universally available
>(which might be a non-issue if you are just developing for Linux and/or
>BSD); e.g., SVR4 systems lack this feature.

In many cases you can do

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

which gets you working on a number of systems.

Still not everything, I agree.

Linus
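
One way to wrap that fallback in a function is sketched below. The helper name anon_shared_map() is hypothetical, and the /dev/zero branch is only an assumption about what a system without either macro might accept. A compile-time check like this says nothing about whether the flag actually works at run time.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Accept the BSD spelling where only MAP_ANON exists. */
#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical helper: returns len bytes of zero-filled shared memory,
 * or MAP_FAILED on error. */
void *anon_shared_map(size_t len)
{
    assert(len > 0);
#ifdef MAP_ANONYMOUS
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
#else
    /* Neither macro is defined: fall back to mapping /dev/zero. */
    int fd = open("/dev/zero", O_RDWR);
    void *p;

    if (fd < 0)
        return MAP_FAILED;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping does not need the descriptor */
    return p;
#endif
}
```

The caller releases the memory with munmap() in either case.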

Dragan Cvetkovic

Jan 22, 2001, 5:08:21 PM
torv...@penguin.transmeta.com (Linus Torvalds) writes:
> >
> >Just a word of warning that MAP_ANONYMOUS is not universally available
> >(which might be a non-issue if you are just developing for Linux and/or
> >BSD); e.g., SVR4 systems lack this feature.
>
> In many cases you can do
>
> #ifndef MAP_ANONYMOUS
> #define MAP_ANONYMOUS MAP_ANON
> #endif
>
> which gets you working on a number of systems.

A notable exception is Solaris. I wish they would implement this
feature, so that I don't have to ifdef my code any more :-(

Pete Zaitcev

Jan 22, 2001, 5:40:43 PM
> > >Just a word of warning that MAP_ANONYMOUS is not universally available
> > >(which might be a non-issue if you are just developing for Linux and/or
> > >BSD); e.g., SVR4 systems lack this feature.
> >
> > In many cases you can do
> >
> > #ifndef MAP_ANONYMOUS
> > #define MAP_ANONYMOUS MAP_ANON
> > #endif
> >
> > which gets you working on a number of systems.

> A notable exception is Solaris. I wish they would implement this
> feature, so that I don't have to ifdef my code any more :-(
>
> Dragan

How long ago did you try? Solaris has had anonymous mmap for
quite some time. Solaris 8 adds a define just like the above
specifically for compatibility with Linux.

-- Pete

phil-new...@ipal.net

Jan 22, 2001, 6:13:53 PM
On 22 Jan 2001 15:38:45 -0500 Dragan Cvetkovic <dragan...@99.soliton.com> wrote:
| torv...@penguin.transmeta.com (Linus Torvalds) writes:
|> Also, if you actually want just /dev/zero, you might as well forget
|> about open() altogether, and do
|>
|> mmap( ,,, MAP_SHARED | MAP_ANONYMOUS, -1, 0)
|>
|> because you don't need a file descriptor to get an anonymous mapping.
|
| Just a word of warning that MAP_ANONYMOUS is not universally available
| (which might be a non-issue if you are just developing for Linux and/or
| BSD); e.g., SVR4 systems lack this feature.

So tell me what the universal way is ... what one way works on all
the systems you know (and including Linux, of course).

Linus Torvalds

Jan 22, 2001, 6:36:16 PM
In article <t6pflhk...@news.supernews.com>,

<phil-new...@ipal.net> wrote:
>
>So tell me what the universal way is ... what one way works on all
>the systems you know (and including Linux, of course).

There is no universal way, of course. The gcc lists had an interesting
discussion about some SCO UnixWare thing (or something) that actually
has the MAP_ANON #defines in the headers, so the program will compile.
When you actually _run_ it, it turns out that MAP_ANON doesn't actually
work (silently).

And with Linux, it for the longest time was true that MAP_ANONYMOUS did
_not_ work with MAP_SHARED (and /dev/zero did the same), but worked fine
with normal private mappings. So we shouldn't throw stones in glass
houses in this particular area ;)

The SysV shared memory thing is probably the most portable approach, but
it's rather less flexible than mmap (it has other advantages, of
course).

Linus

Andi Kleen

Jan 22, 2001, 6:54:17 PM
torv...@penguin.transmeta.com (Linus Torvalds) writes:

> There is no universal way, of course. The gcc lists had an interesting
> discussion about some SCO UnixWare thing (or something) that actually
> has the MAP_ANON #defines in the headers, so the program will compile.
> When you actually _run_ it, it turns out that MAP_ANON doesn't actually
> work (silently).

Sounds like MAP_SHARED for IA32 programs when you run it in IA64 Linux
compiled with page sizes > 4k @)


-Andi

phil-new...@ipal.net

Jan 22, 2001, 9:03:23 PM
On 22 Jan 2001 12:09:55 -0800 Linus Torvalds <torv...@penguin.transmeta.com> wrote:
| In article <t6op92t...@news.supernews.com>,
| <phil-new...@ipal.net> wrote:
|>|
|>| Simplicity is a virtue too.
|>
|>I'd agree. But I was wondering why not make it even simpler like:
|>
|> fd = open("/dev/zero",O_RDWR,0);
|
| /dev/zero won't do what you want it to do - you want to map the _same_
| physical page twice, right?
|
| /dev/zero will give you different physical pages: they'll be shared
| after a fork(), but they won't be shared across multiple mmap's. Think
| of each "mmap()" as opening a separate file, that is "shared" only in
| the sense that fork() won't do the copy-on-write thing.

So even if it is the same file descriptor, this isn't treated as the
same file?


| Also, if you actually want just /dev/zero, you might as well forget
| about open() altogether, and do
|
| mmap( ,,, MAP_SHARED | MAP_ANONYMOUS, -1, 0)
|
| because you don't need a file descriptor to get an anonymous mapping.

Obviously, this doesn't give a means for 2 different mmap() calls to
refer to the same piece of memory.


| But I bet you want the tmp-file backing store - it acts as an anchor to
| make your mmap's truly shared.

I want to avoid an actual file if at all possible. Supposedly I can
unlink the file after doing the 2 mmap() calls. But I don't want the
space being stored on that filesystem. If the space gets swapped out,
it should be to the swap space. If there is no swap space at all, then
it can't be swapped out. Now if that happens due to copy on write,
that would be fine, but I don't know if that would work with the data
still shared between the 2 adjacently located mappings. Hopefully a
very minimum of physical I/O to the /tmp filesystem will happen as a
result of the open/close/unlink and none from writing to the space.


| (Alternatively, you can just use SysV shared memory, of course, and use
| shmat to map it in multiple places.)

Well, I would like a universal way. Maybe that's not possible, and
could explain the lack of a solution when I asked the same question
on comp.unix.programmer.

phil-new...@ipal.net

Jan 22, 2001, 10:04:23 PM
On 22 Jan 2001 15:36:16 -0800 Linus Torvalds <torv...@penguin.transmeta.com> wrote:

| The SysV shared memory thing is probably the most portable approach, but
| it's rather less flexible than mmap (it has other advantages, of
| course).

Well, I know SysV systems like Solaris do it. I just checked OpenBSD 2.8
and it has the man page, so I assume it does it. I'm installing FreeBSD
4.2 right now to see if it does. That should cover most of the major
platforms, which is already OT for this group :-)

Dragan Cvetkovic

Jan 23, 2001, 9:16:45 AM
zai...@yahoo.com (Pete Zaitcev) writes:
> > > #ifndef MAP_ANONYMOUS
> > > #define MAP_ANONYMOUS MAP_ANON
> > > #endif
> > >
> > > which gets you working on a number of systems.
>
> > A notable exception is Solaris. I wish they would implement this
> > feature, so that I don't have to ifdef my code any more :-(
> >
> > Dragan
>
> How long ago did you try? Solaris has had anonymous mmap for
> quite some time. Solaris 8 adds a define just like the above
> specifically for compatibility with Linux.
>
> -- Pete

Well it certainly doesn't work in Solaris 2.6 and Solaris 7. You
are right about Solaris 8.
Bye, Dragan

P.S. I am not sure about the "specifically for compatibility with Linux" bit,
since MAP_ANON existed in BSD 4.3 (cf. Stevens APUE, p. 470), long before
Linux existed. Besides, Linux defines MAP_ANON in terms of
MAP_ANONYMOUS. OK, I will stop now.

bill davidsen

Jan 23, 2001, 3:09:34 PM
In article <t6ppjbp...@news.supernews.com>,
<phil-new...@ipal.net> wrote:

| Obviously, this doesn't give a means for 2 different mmap() calls to
| refer to the same piece of memory.
|
|
| | But I bet you want the tmp-file backing store - it acts as an anchor to
| | make your mmap's truly shared.
|
| I want to avoid an actual file if at all possible. Supposedly I can
| unlink the file after doing the 2 mmap() calls. But I don't want the
| space being stored on that filesystem. If the space gets swapped out,
| it should be to the swap space. If there is no swap space at all, then
| it can't be swapped out. Now if that happens due to copy on write,
| that would be fine, but I don't know if that would work with the data
| still shared between the 2 adjacently located mappings. Hopefully a
| very minimum of physical I/O to the /tmp filesystem will happen as a
| result of the open/close/unlink and none from writing to the space.
|
|
| | (Alternatively, you can just use SysV shared memory, of course, and use
| | shmat to map it in multiple places.)
|
| Well, I would like a universal way. Maybe that's not possible, and
| could explain the lack of a solution when I asked the same question
| on comp.unix.programmer.

I like the suggestion of SysV shared memory. There are still a fair
number of o/s which don't have mmap() quite right. At least with SysV
shmat(), if it is there at all it seems to work. I haven't seen an o/s
which had a broken implementation, although I haven't tried all of them.
I have an IPC program which tries various methods to time token passing
through IPC, and shmat() has worked everywhere I've tested it.

--
bill davidsen <davi...@tmr.com> CTO, TMR Associates, Inc
"I am lost. I am out looking for myself. If I should come back before I
return, please ask me to wait." -seen in a doctor's office

Linus Torvalds

Jan 23, 2001, 4:29:24 PM
>On 22 Jan 2001 12:09:55 -0800 Linus Torvalds <torv...@penguin.transmeta.com> wrote:
>|
>| /dev/zero will give you different physical pages: they'll be shared
>| after a fork(), but they won't be shared across multiple mmap's. Think
>| of each "mmap()" as opening a separate file, that is "shared" only in
>| the sense that fork() won't do the copy-on-write thing.
>
>So even if it is the same file descriptor, this isn't treated as the
>same file?

Nope. Not under Linux, at least. If you can find an OS that does it that
way (ie "one open, one mapping"), please holler. Not very many people
use shared writable mmap's under Linux, simply because historically it
didn't work at all (only private mappings worked).

>
>| Also, if you actually want just /dev/zero, you might as well forget
>| about open() altogether, and do
>|
>| mmap( ,,, MAP_SHARED | MAP_ANONYMOUS, -1, 0)
>|
>| because you don't need a file descriptor to get an anonymous mapping.
>
>Obviously, this doesn't give a means for 2 different mmap() calls to
>refer to the same piece of memory.

Yes. Under Linux, the behaviour of the two is exactly the same.

>| But I bet you want the tmp-file backing store - it acts as an anchor to
>| make your mmap's truly shared.
>
>I want to avoid an actual file if at all possible. Supposedly I can
>unlink the file after doing the 2 mmap() calls.

Yes. Just unlink() and close the fd. The inode (and associated
storage) will still be kept around for the mmap's, but when the mmaps go
away, so will the inode. Which means that your app doesn't need to do
any book-keeping.

> But I don't want the
>space being stored on that filesystem.

Does it really matter? If it does, then you either have to use ramfs
(stays in memory) or you had better use SysV IPC.

Linus
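
The unlink-then-close behaviour described above can be tried out in isolation. This is a minimal sketch with a hypothetical function name; it maps a single page only, to keep the focus on the lifetime of the backing inode.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the mapping is still usable once the file name and the
 * descriptor are both gone, 0 if not, -1 if the setup fails. */
int mapping_outlives_file_name(void)
{
    char path[] = "/tmp/DoubleMapXXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    char *p;
    int fd, ok;

    assert(page > 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, page) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* Drop the name and the descriptor; only the mapping now keeps the
     * inode and its storage alive. */
    unlink(path);
    close(fd);

    ok = (stat(path, &st) != 0);    /* the name should no longer resolve */
    p[0] = 'k';                     /* the memory should still be usable */
    ok = ok && (p[0] == 'k');

    /* Removing the last mapping lets the kernel free the inode. */
    munmap(p, (size_t)page);
    return ok;
}
```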

John Reiser

Jan 23, 2001, 5:51:18 PM
> I like the suggestion of SysV shared memory. There are still a fair
> number of o/s which don't have mmap() quite right. At least with SysV
> shmat(), if it is there at all it seems to work. I haven't seen an o/s
> which had a broken implementation, although I haven't tried all of them.
> I have an IPC program which tries various methods to time token passing
> through IPC, and shmat() has worked everywhere I've tested it.

Please post some code which compiles and runs. Documentation and
explanation of SysV shmget, shmat, etc., is lacking. At least with
code we can ask pointed questions :-)

--
John Reiser, jre...@BitWagon.com

phil-new...@ipal.net

Jan 23, 2001, 8:40:02 PM
On 23 Jan 2001 20:09:34 GMT bill davidsen <davi...@tmr.com> wrote:

| I like the suggestion of SysV shared memory. There are still a fair
| number of o/s which don't have mmap() quite right. At least with SysV
| shmat(), if it is there at all it seems to work. I haven't seen an o/s
| which had a broken implementation, although I haven't tried all of them.
| I have an IPC program which tries various methods to time token passing
| through IPC, and shmat() has worked everywhere I've tested it.

One of the problems with SysV shared segments is that there is no
mechanism for making sure that you have a contiguous block twice as
big as the basic size needed (so there can be two together). If
the first falls into a gap, as Linus mentioned, then the next one
cannot be created at all. You can't create one first twice the
size then using the same segment create one normal size, since the
size is set at shmget() time instead of shmat() time.

Still, I've decided to use SysV shared segments the following way.
I first do an anonymous mmap() for twice the size, then I munmap()
it and do 2 calls to shmat() where the mmap() found enough space.
This is still exposed to systems where mmap() is implemented wrong
or not at all.

phil-new...@ipal.net

Jan 23, 2001, 11:06:31 PM
On 23 Jan 2001 13:29:24 -0800 Linus Torvalds <torv...@penguin.transmeta.com> wrote:

| In article <t6ppjbp...@news.supernews.com>,
| <phil-new...@ipal.net> wrote:
|>On 22 Jan 2001 12:09:55 -0800 Linus Torvalds <torv...@penguin.transmeta.com> wrote:
|>|
|>| /dev/zero will give you different physical pages: they'll be shared
|>| after a fork(), but they won't be shared across multiple mmap's. Think
|>| of each "mmap()" as opening a separate file, that is "shared" only in
|>| the sense that fork() won't do the copy-on-write thing.
|>
|>So even if it is the same file descriptor, this isn't treated as the
|>same file?
|
| Nope. Not under Linux, at least. If you can find an OS that does it that
| way (ie "one open, one mapping"), please holler. Not very many people
| use shared writable mmap's under Linux, simply because historically it
| didn't work at all (only private mappings worked).

I don't know of any. That's probably because I've never tried to do this
kind of thing until now, and even Stevens didn't get specific enough to
dispel (or confirm) my [wrong] assumptions.


|>| But I bet you want the tmp-file backing store - it acts as an anchor to
|>| make your mmap's truly shared.
|>
|>I want to avoid an actual file if at all possible. Supposedly I can
|>unlink the file after doing the 2 mmap() calls.
|
| Yes. Just unlink() and close the fd. The inode (and associated
| storage) will still be kept around for the mmap's, but when the mmaps go
| away, so will the inode. Which means that your app doesn't need to do
| any book-keeping.

Unfortunately, I cannot count on /tmp having all the space for as many
of these that might exist. As long as I can get it into swap space,
then at least if it runs out of space, it is the expected issue (more
processes need more swap space). Someone could fill up /tmp space and
affect things quite unexpectedly. OTOH, if swap is full, the reason is
what one expects.


| Does it really matter? If it does, then you either have to use ramfs
| (stays in memory) or you had better use SysV IPC.

Yes, it does matter, at least to me.

I've decided to code up using the following approach:

1. addr=mmap(,size*2,,MAP_ANONYMOUS,,) to find address space big enough
2. munmap() and hope no same process threads take memory here
3. shmget(IPC_PRIVATE,size,0) this should get swap backing store
4. shmat(id,addr,0); shmat(id,addr+size,0)

I haven't tested it yet, since there are other parts of the project yet
to be coded to make it workable. The project is a pseudo-circular buffer
library which allows the caller to get a peek address into the buffer to
see if there is enough data available to do something with it (a line
of text, for example). I've previously done this without the memory
tricks, and the interface has always been awkward due to the need to
view data as one piece which is otherwise wrapped around.
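
The four steps above might look roughly like the sketch below. It rests on several assumptions that are not in the post: the helper name shm_double_map() is hypothetical, size must be a multiple of SHMLBA, the probe mapping is padded so the attach address can be rounded up to an SHMLBA boundary, the segment is created with mode 0600 (a mode of 0 may leave even the owner unable to attach), and the segment is marked for removal once both attaches are done. The window between munmap() and shmat() is still open to other threads, as step 2 admits.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* Hypothetical helper: attaches one SysV segment of `size` bytes at two
 * adjacent addresses. Returns the lower address, or NULL on failure. */
char *shm_double_map(size_t size)
{
    size_t lba = (size_t)SHMLBA;
    size_t span = 2 * size + lba;   /* extra room for alignment */
    char *hole, *addr;
    void *lo, *hi;
    int id;

    assert(size > 0 && size % lba == 0);

    /* Step 1: find a free stretch of address space that is big enough. */
    hole = mmap(NULL, span, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (hole == MAP_FAILED)
        return NULL;
    addr = (char *)(((uintptr_t)hole + lba - 1) / lba * lba);

    /* Step 2: give it back and hope nothing else grabs it meanwhile. */
    munmap(hole, span);

    /* Step 3: create a private segment, readable and writable by owner. */
    id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id < 0)
        return NULL;

    /* Step 4: attach the same segment twice, back to back. */
    lo = shmat(id, addr, 0);
    hi = (lo == (void *)-1) ? (void *)-1 : shmat(id, addr + size, 0);

    /* Mark the segment for destruction once the last attach goes away. */
    shmctl(id, IPC_RMID, NULL);

    if (hi == (void *)-1) {
        if (lo != (void *)-1)
            shmdt(lo);
        return NULL;
    }
    return addr;
}
```

Calling shm_double_map(SHMLBA) and writing through the lower half should make the same bytes visible in the upper half; shmdt() on each half releases it.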

<OT>

BTW, I love ramfs. It's made it easy to have a larger bunch of files
in a CD booted, no HD, system. And pivot_root() was a big plus there,
letting me have / on the ramfs. I've got a two stage program which
runs as init that sets it all up and loads a tarball from the CDROM
and finally runs the real init. It's part of a package that includes
a bootable ISO construction kit script. The first alpha is here:
http://phil.ipal.org/freeware/bick/bick-0.4.0-alpha.tar.bz2 (11.2m)
and rescue CDs built with it are in the same directory.

</OT>

bill davidsen

Jan 25, 2001, 7:24:37 PM
In article <t6sl67b...@news.supernews.com>,

<phil-new...@ipal.net> wrote:
| On 23 Jan 2001 13:29:24 -0800 Linus Torvalds <torv...@penguin.transmeta.com> wrote:

| I've decided to code up using the following approach:
|
| 1. addr=mmap(,size*2,,MAP_ANONYMOUS,,) to find address space big enough
| 2. munmap() and hope no same process threads take memory here
| 3. shmget(IPC_PRIVATE,size,0) this should get swap backing store
| 4. shmat(id,addr,0); shmat(id,addr+size,0)
|
| I haven't tested it yet, since there are other parts of the project yet
| to be coded to make it workable. The project is a pseudo-circular buffer
| library which allows the caller to get a peek address into the buffer to
| see if there is enough data available to do something with it (a line
| of text, for example). I've previously done this without the memory
| tricks, and the interface has always been awkward due to the need to
| view data as one piece which is otherwise wrapped around.

Lack of sleep has caught up to me. I've done some number of circular
buffer implementations in shared memory, back to a Z80 and 8088 sharing
memory between two processors in the old DEC "Rainbow" system. I don't
see why you need more than one mapping, or why if you use two they must
adjoin. After reading your posts, I still can't decide if you're doing
something subtle or trying to save some programming effort with a
memory mapping which I am not sure will work on other than Linux and
depends on your point two. I assume subtlety ;-)

| <OT>
|
| BTW, I love ramfs. It's made it easy to have a larger bunch of files
| in a CD booted, no HD, system. And pivot_root() was a big plus there,
| letting me have / on the ramfs. I've got a two stage program which
| runs as init that sets it all up and loads a tarball from the CDROM
| and finally runs the real init. It's part of a package that includes
| a bootable ISO construction kit script. The first alpha is here:
| http://phil.ipal.org/freeware/bick/bick-0.4.0-alpha.tar.bz2 (11.2m)
| and rescue CDs built with it are in the same directory.
|
| </OT>

Gotta try this, I want one for reinstall backups of a bunch of little
machines!

phil-new...@ipal.net

Jan 25, 2001, 9:13:37 PM
On 26 Jan 2001 00:24:37 GMT bill davidsen <davi...@tmr.com> wrote:

| Lack of sleep has caught up to me. I've done some number of circular
| buffer implementations in shared memory, back to a Z80 and 8088 sharing
| memory between two processors in the old DEC "Rainbow" system. I don't
| see why you need more than one mapping, or why if you use two they must
| adjoin. After reading your posts, I still can't decide if you're doing
| something subtle or trying to save some programming effort with a
| memory mapping which I am not sure will work on other than Linux and
| depends on your point two. I assume subtlety ;-)

I need to be able to access any stretch of the buffered data, up to
everything currently in the buffer, as one linear block (i.e. contiguous
and in the proper order). The purposes for this include being
able to peek ahead at the buffered data for sentinels that indicate
the data is complete enough to take out of the buffer and fully
act on it (e.g. strchr() finds a newline when I'm looking for a
complete line) ... and being able to call sendto() on the data knowing
that it is being sent as a whole datagram. There are various other reasons.

I have done circular buffers other ways. I've done them more than once
where they just wrap around from back to front. I've also done them
where they keep reshifting so the beginning of the data content is at
the beginning of the space. I also did one with variable sized chunks
in a circularly linked list. What I'm trying to do is get the benefits
in not only a single implementation, but also in each instance.

It seems to be working. Code will be released soon. I am doing it as
a library so all the "mess" will be encapsulated, leaving one with an
API that allows things like peeking ahead.
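
To make the peek-ahead idea concrete, here is a minimal sketch of a ring buffer whose storage is mapped twice back to back. It is not the library described above: the struct and the ring_* names are hypothetical, and for brevity it uses an unlinked temporary file as backing store rather than the SysV segment preferred in this thread.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical ring buffer with doubly mapped storage. */
struct ring {
    char  *base;    /* start of the first mapping */
    size_t size;    /* bytes of real storage, a multiple of the page size */
    size_t head;    /* read offset, always < size */
    size_t used;    /* bytes currently stored */
};

/* Returns 0 on success, -1 on failure. */
int ring_init(struct ring *r, size_t size)
{
    char path[] = "/tmp/RingXXXXXX";
    char *base, *hi;
    int fd;

    assert(size > 0 && size % (size_t)sysconf(_SC_PAGESIZE) == 0);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    /* Reserve 2*size of address space, then overlay the upper half with
     * a second view of the same file contents. */
    base = mmap(NULL, 2 * size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }
    hi = mmap(base + size, size, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    if (hi == MAP_FAILED) {
        munmap(base, 2 * size);
        return -1;
    }
    r->base = base;
    r->size = size;
    r->head = 0;
    r->used = 0;
    return 0;
}

/* Appends len bytes; returns 0 on success, -1 if they do not fit. */
int ring_put(struct ring *r, const void *data, size_t len)
{
    size_t tail;

    if (len > r->size - r->used)
        return -1;
    tail = (r->head + r->used) % r->size;
    memcpy(r->base + tail, data, len);  /* may spill into the second view */
    r->used += len;
    return 0;
}

/* Returns all buffered data as one contiguous block of *len bytes. */
const char *ring_peek(const struct ring *r, size_t *len)
{
    *len = r->used;
    return r->base + r->head;
}

/* Discards len bytes from the front of the buffer. */
void ring_drop(struct ring *r, size_t len)
{
    assert(len <= r->used);
    r->head = (r->head + len) % r->size;
    r->used -= len;
}
```

Neither ring_put() nor ring_peek() needs a wrap-around branch: data that crosses the end of the storage is simply written to, or read from, the start of the second view, so a memchr() for a newline over the peeked block works even when the line straddles the wrap point.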

And no, I didn't think of it. Someone on comp.unix.programmer suggested
it to me a couple years ago, and I just now got around to doing it. I
wish I could recall who that was.

| | <OT>
| |
| | BTW, I love ramfs. It's made it easy to have a larger bunch of files
| | in a CD booted, no HD, system. And pivot_root() was a big plus there,
| | letting me have / on the ramfs. I've got a two stage program which
| | runs as init that sets it all up and loads a tarball from the CDROM
| | and finally runs the real init. It's part of a package that includes
| | a bootable ISO construction kit script. The first alpha is here:
| | http://phil.ipal.org/freeware/bick/bick-0.4.0-alpha.tar.bz2 (11.2m)
| | and rescue CDs built with it are in the same directory.
| |
| | </OT>
|
| Gotta try this, I want one for reinstall backups of a bunch of little
| machines!

Well, go download it :-)

Sorry, no documentation on it, yet. I'm still working on getting it to
do both Intel and Sparc in the same CD (should work since each platform
boots in a different way, and from there I can symlink different parts
of the ISO filesystem).
