My point wasn't to claim that SunOS worked better, but rather
that SunOS worked according to the information in both systems' manuals.
>However, in the case of mmap it may be a bug in Linux, as this part is
>not a complete implementation and there have been problems before.
>I would say: write a simpler test case that you can post to show a
>pertinent problem.
Well, I did run it under the debugger and strace several times. Now,
looking at the source code to munmap reveals that it _can_ fail without
generating an error. However, there's no feasible way for me to
check these cases out. In any event, gdb reveals the same problem when
tracing my program. That is, I will have gdb maintaining a pointer to
the region which is about to be munmapped. When the location is munmapped
the gdb pointer typically becomes invalid because it's now pointing to a
region of memory which it can't dereference. After this particular
munmap, however, the dereference works fine, but the memory page that gdb
points to has been cleared to all zeros (i.e., not even the original data
that was there before munmapping remains). This pretty clearly indicates
to me that the munmap is proceeding but not finishing correctly. I
know better than to jump on the "my program doesn't work so it must be a
bug" bandwagon, but in this particular case I think there is sufficient
cause. I used strace to verify that munmap was being called with the
proper parameters, that it was not being called twice, that the region
wasn't being remapped somehow in the interim, etc. It would be hard to
write a much simpler case, as "most" of the munmaps succeed. All of them
point to previously mmapped memory, all of them are munmapping the same
size (one page exactly, the same size as what was mapped). All of them
are mmapping to a fixed address and all of the mmaps are succeeding. For
now I will just increase the cache size and put some detection of the
problem in place. I looked at the munmap code myself, but it will take a
bit of perusal before I fully understand how it works. If it turns out
to be a bug of my own making despite all of the above, well, I only
wanted to point out what could be a potential problem in case anybody
with more expertise in the way the kernel works felt like glancing over
what it does.
What case is that?
J
>What case is that?
If the result of page_aligning the length gives a result of zero, it will
return 0 (the same as the return for the success case). If it finds no page
to free it apparently also returns 0. When it completes successfully, it
does return 0 as well. I wonder if it would be valid to have the previous
two cases return -EINVAL, like the case where address and len are put through
basic validity tests at the very beginning. My suspicion is that the list
of pages is getting munged or the mechanism behind finding the page is
missing it somehow and it is actually returning from one of these failed
cases. It is clearing the page, so that should help me at least determine
partly how far I'm getting. (Hmmm, odd that it would clear the page if it
didn't find it, so perhaps it is finding it and for some wacky reason the
page isn't getting made invalid.) Oh well, I shall continue to muddle my
way through for the moment. :)
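For what it's worth, a test program along these lines shows the
silent-success returns I mean (a hypothetical sketch with modern flags;
note that POSIX now mandates EINVAL for a zero length, so that result
will differ on current kernels):

    #include <stdio.h>
    #include <errno.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long pg = sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        printf("munmap(mapped)     = %d\n", munmap(p, pg)); /* 0: real unmap */
        printf("munmap(same again) = %d\n", munmap(p, pg)); /* 0: no page
                                                               found, still
                                                               "success" */
        /* len of 0: reportedly also 0 on the kernel discussed here */
        errno = 0;
        printf("munmap(len 0)      = %d, errno %d\n", munmap(p, 0), errno);
        return 0;
    }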
Hmmm, spent some more time browsing and I can't even locate where it
clears the page. Not exactly the most straightforward section of the
kernel. *sigh* Then again, I'm not a memory free list writing, virtual
memory handling guru either. Perhaps this is a system feature invoked when
the entry is invalidated? I'm going to have to brush up on my knowledge of
x86 virtual memory I guess.
Shawn> If the result of page_aligning the length gives a result of
Shawn> zero, it will return 0 (the same as the return for the success
Shawn> case). If it finds no page to free it apparently also returns
Shawn> 0. When it completes successfully, it does return 0 as well.
Shawn> I wonder if it would be valid to have the previous two cases
Shawn> return -EINVAL, like the case where address and len are put
Shawn> through basic validity tests at the very beginning.
Someone gave me a patch for Linux/68k which flags this as EINVAL.
>jer...@sour.sw.oz.au (Jeremy Fitzhardinge) writes:
>>In <scarrowC...@netcom.com> sca...@netcom.com (Shawn L. Baird) writes:
>>>Well, I did run it under the debugger and strace several times. Now,
>>>looking at the source code to munmap reveals that it _can_ fail without
>>>generating an error.
>
>>What case is that?
>
>If the result of page_aligning the length gives a result of zero, it will
>return 0 (the same as the return for the success case). If it finds no page
>to free it apparently also returns 0. When it completes successfully, it
>does return 0 as well. I wonder if it would be valid to have the previous
>two cases return -EINVAL, like the case where address and len are put through
>basic validity tests at the very beginning.
Unmapping a non-existent range of memory is a valid operation, so if it
has nothing to do, it did what you asked for successfully. If it has to
return an error in this case, what should it return if the range you give
it is partially mapped: a partial error?
What does SunOS do if you unmap an already unmapped range, or 0 bytes?
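To make the partially-mapped case concrete, here's a hypothetical little
test: punch a hole in a mapping, then unmap a range spanning both the
mapped pages and the hole. Linux frees what is there, skips the hole,
and reports overall success:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long pg = sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, 4 * pg, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        munmap(p + pg, 2 * pg);      /* leave only pages 0 and 3 mapped */

        /* the 4-page range is now only partially mapped, and yet: */
        printf("munmap(partial) = %d\n", munmap(p, 4 * pg));   /* 0 */
        return 0;
    }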
>My suspicion is that the list
>of pages is getting munged or the mechanism behind finding the page is
>missing it somehow and it is actually returning from one of these failed
>cases.
You should probably stick some printk()s into the code to see what's
happening. I'd be surprised if it were getting munged because you wouldn't
get the subtle effects you're seeing. When I wrote the code I tested it
with some thoroughness, but I did assume the basic unmapping mechanism
worked (unmap_page_range()).
>It is clearing the page, so that should help me at least determine
>partly how far I'm getting. (Hmmm, odd that it would clear the page if it
>didn't find it, so perhaps it is finding it and for some wacky reason the
>page isn't getting made invalid.) Oh well, I shall continue to muddle my
>way through for the moment. :)
> Hmmm, spent some more time browsing and I can't even locate where it
>clears the page. Not exactly the most straightforward section of the
>kernel. *sigh* Then again, I'm not a memory free list writing, virtual
>memory handling guru either. Perhaps this is a system feature invoked when
>the entry is invalidated? I'm going to have to brush up on my knowledge of
>x86 virtual memory I guess.
It never clears the page explicitly. It should eventually call
unmap_page_range() which should change the relevant PTEs to invalidate
the pages.
J
The enigma's solution: He was mmapping a range 0x40000000..0x40003fff, then
unmapping the last page 0x40003000..0x40003fff and then accessing this page.
Since there was no memory between 0x40003000 and the stack [there were
no shared libraries in the range 0x60000000..0x7fffffff since the program
was linked with -g, which implies -static], the do_page_fault handler
thought he was extending the stack. Just as if he had allocated
a big array on the stack and was beginning to fill it. Linux thus extended
the stack down to 0x40003000.
There is no bug in mmap() or munmap().
He should set the stack size limit to 1 GB if he wants to intercept SIGSEGV.
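Here is a hypothetical reconstruction of the scenario (4 KB pages and
modern mmap flags assumed; on the kernels in question the final store
was silently treated as stack growth instead of faulting):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* map the range 0x40000000..0x40003fff at a fixed address */
        char *base = mmap((void *)0x40000000, 4 * 4096,
                          PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (base == MAP_FAILED) return 1;

        /* unmap the last page, 0x40003000..0x40003fff */
        munmap(base + 3 * 4096, 4096);

        /* Expected: SIGSEGV.  Observed then: with nothing mapped between
           here and the stack, do_page_fault treated the store as stack
           growth and handed back a fresh zeroed page. */
        base[3 * 4096] = 1;
        printf("no fault; page reads back %d\n", base[3 * 4096]);
        return 0;
    }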
Bruno Haible
hai...@ma2s2.mathematik.uni-karlsruhe.de
: I think it would be better if the init(1) process would set the RLIMIT_STACK
: soft limit to 1 GB. This would have effect on all processes, unless it is
: explicitly overwritten by a larger value.
: Any volunteer for this?
Well, I'll add it to SysVinit 1.60. Maybe a line like:
op::options:umask 022,ulimit -s 1024
What do other people think of this? Any suggestions?
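(Note that ulimit -s counts kilobytes, so a full 1 GB would be
ulimit -s 1048576.) Inside init itself, the C equivalent would be
roughly this sketch, assuming the usual <sys/resource.h> interface:

    #include <sys/resource.h>

    /* Sketch of what init(1) could do before spawning anything:
       raise the soft stack limit, leaving the hard limit alone. */
    static int set_default_stack_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) < 0)
            return -1;
        rl.rlim_cur = 1024L * 1024L * 1024L;    /* 1 GB soft limit */
        if (rl.rlim_cur > rl.rlim_max)
            rl.rlim_cur = rl.rlim_max;          /* soft can't exceed hard */
        return setrlimit(RLIMIT_STACK, &rl);
    }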
Mike.
--
| Miquel van Smoorenburg -- miq...@drinkel.ow.org |
| "Life's too short to use Windows" |
I don't think so because it forces the compiler or the program writer to
add dummy references in case a big allocation from the stack is required.
For example, on many 68000 systems, the single-stack-allocation maximum
is 64 KB (or 32 KB?). You can't have a 100x100 array of 'double' on the stack.
On OS/2 the single-stack-allocation maximum is one page. In consequence,
you need to patch gcc so that __builtin_alloca() references each allocated
stack page once.
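For illustration, the probing such a patched __builtin_alloca() has to
do amounts to something like this sketch (4 KB pages assumed; fill() is
just a stand-in for real work). Probing from the end nearest the
already-committed stack means each touch extends the stack by at most
one page:

    #define PAGE_SIZE 4096

    void fill(double (*m)[100]);    /* hypothetical consumer of the array */

    void compute(void)
    {
        double m[100][100];     /* ~80 KB: far beyond a one-page redzone */
        char *p = (char *)m;
        long off;

        /* touch one byte per page, highest address first, so the stack
           is extended one recognizable page at a time */
        for (off = (long)sizeof m - 1; off >= 0; off -= PAGE_SIZE)
            p[off] = 0;

        fill(m);                /* now safe to use the whole array */
    }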
I think it would be better if the init(1) process would set the RLIMIT_STACK
soft limit to 1 GB. This would have effect on all processes, unless it is
explicitly overwritten by a larger value.
Any volunteer for this?
Bruno Haible
hai...@ma2s2.mathematik.uni-karlsruhe.de
>Should there be a limit to how much the stack will extend due to a single page
>fault?
NO! At least no limit that isn't easily set by a user. As a
mainframe unix compiler developer, I can tell you from the
brutal experience of my kernel development peers that system
programming / hacking folks may not really understand other
programming problem sets. Real world scientific applications
from problem domains such as computational chemistry, high
energy physics and many more deal with mind boggling hunks
of data. 100s and 1000s of megabytes are not unusual today.
To the extent that the problem can be solved on a given machine,
it is quite reasonable to use the stack for array storage.
For temporary work areas, it avoids fragmentation and provides
very natural continued reuse of the same 'hot' pages of memory.
Array operations in F90 have the potential for requiring many
more temporary storage locations for arrays due to the built-in
array language. Without pointers in Fortran77, I did not
observe a significant number of problems (can't actually recall
any) where a wild memory reference looked like a stack extension.
We did see a bunch of problems related to silly stack size limits
in default kernels, etc. We also saw problems when successive
development builds of the kernel changed the limits, breaking test
cases, etc.
We had three linkage architecture considerations which may have
limited the problems associated with false stack growth:
a) We were a non-hardware-stack machine and dedicated a specific
register as the stack pointer. The only instruction allowed
to extend the stack was a store at 4 bytes displacement from
the stack register.
b) The redzone which detected growth stores was only guaranteed
to be a single page in size --- one must be wary of a
stack increase (legitimate from a program logic perspective)
where the probe store would reach across the dead area and
touch the bss/data area and not be caught.
c) ALL compilers (and alloca) handled stack growth greater than
1 page by calling a highly tuned stack check routine which
would grow the stack if possible or abort the process (don't
ask how ... didja ever notice the lack of a stack overflow
in the standard unix signal list?) Part of the theory here
was that a routine with a large stack frame was expected
to do a bunch of work, and a few extra instructions in the
call linkage weren't a big deal.
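In rough C, the check in (c) had roughly this shape (entirely
illustrative; __stack_limit and grow_stack() stand in for whatever the
real OS interface was):

    #include <stdlib.h>

    extern char *__stack_limit;          /* lowest committed stack byte */
    extern int grow_stack(char *new_sp); /* hypothetical OS interface   */

    /* Called by compiled code before moving SP by more than a page. */
    void __stkchk(char *new_sp)
    {
        if (new_sp >= __stack_limit)
            return;                  /* frame fits the committed stack */
        if (!grow_stack(new_sp))
            abort();                 /* no stack-overflow signal in the
                                        standard list, so just die */
        __stack_limit = new_sp;
    }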
A similar kind of approach would deal with most unrecognized
unintended stack growth without placing unexpected limits on
user stack size.
Dave Morris