My point wasn't to claim that SunOS worked better, but rather
that SunOS worked according to the information in both systems' manuals.
>However, in the case of mmap it may be a bug in Linux, as this part is
>not a complete implementation and there have been problems before.
>I would say: write a simpler test case that you can post to show a
>pertinent problem.
Well, I did run it under the debugger and strace several times. Now,
looking at the source code to munmap reveals that it _can_ fail without
generating an error. However, there's no feasible way for me to
check these cases out. In any event, gdb reveals the same problem when
tracing my program. That is, I will have gdb maintaining a pointer to
the region which is about to be munmapped. When the location is munmapped
the gdb pointer typically becomes invalid because it's now pointing to a
region of memory which it can't dereference. After this particular
munmap, however, the dereference works fine, but the memory page that gdb
points to has been cleared to all zeros (i.e., not even the original data
that was there before munmapping remains). This pretty clearly indicates
to me that the munmap is proceeding but not finishing correctly. I
know better than to jump on the "my program doesn't work so it must be a
bug" bandwagon, but in this particular case I think there is sufficient
cause. I used strace to verify that munmap was being called with the
proper parameters, that it was not being called twice, that the region
wasn't being remapped somehow in the interim, etc. It would be hard to
write a much simpler case, as "most" of the munmaps succeed. All of them
point to previously mmapped memory, all of them are munmapping the same
size (one page exactly, the same size as what was mapped). All of them
are mmapping to a fixed address and all of the mmaps are succeeding. For
now I will just increase the cache size and put some detection of the
problem in place. I looked at the munmap code myself, but it will take a
bit of perusal before I fully understand how it works. If it turns out
to be a bug of my own making despite all of the above, well, I only
wanted to point out what could be a potential problem in case anybody
with more expertise in the way the kernel works felt like glancing over
what it does.
What case is that?
J
>What case is that?
If the result of page_aligning the length gives a result of zero, it will
return 0 (the same as the return for the success case). If it finds no page
to free it apparently also returns 0. When it completes successfully, it
does return 0 as well. I wonder if it would be valid to have the previous
two cases return -EINVAL, like the case where address and len are put through
basic validity tests at the very beginning. My suspicion is that the list
of pages is getting munged or the mechanism behind finding the page is
missing it somehow and it is actually returning from one of these failed
cases. It is clearing the page, so that should help me at least determine
partly how far I'm getting. (Hmmm, odd that it would clear the page if it
didn't find it, so perhaps it is finding it and for some wacky reason the
page isn't getting made invalid.) Oh well, I shall continue to muddle my
way through for the moment. :)
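For what it's worth, a test program along these lines shows the
silent-success returns I mean (a hypothetical sketch with modern flags;
note that POSIX now mandates EINVAL for a zero length, so that result
will differ on current kernels):

    #include <stdio.h>
    #include <errno.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long pg = sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        printf("munmap(mapped)     = %d\n", munmap(p, pg)); /* 0: real unmap */
        printf("munmap(same again) = %d\n", munmap(p, pg)); /* 0: no page
                                                               found, still
                                                               "success" */
        /* len of 0: reportedly also 0 on the kernel discussed here */
        errno = 0;
        printf("munmap(len 0)      = %d, errno %d\n", munmap(p, 0), errno);
        return 0;
    }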
Hmmm, spent some more time browsing and I can't even locate where it
clears the page. Not exactly the most straightforward section of the
kernel. *sigh* Then again, I'm not a memory free list writing, virtual
memory handling guru either. Perhaps this is a system feature invoked when
the entry is invalidated? I'm going to have to brush up on my knowledge of
x86 virtual memory I guess.
Shawn> If the result of page_aligning the length gives a result of
Shawn> zero, it will return 0 (the same as the return for the success
Shawn> case). If it finds no page to free it apparently also returns
Shawn> 0. When it completes successfully, it does return 0 as well.
Shawn> I wonder if it would be valid to have the previous two cases
Shawn> return -EINVAL, like the case where address and len are put
Shawn> through basic validity tests at the very beginning.
Someone gave me a patch for Linux/68k which flags this as EINVAL.
>jer...@sour.sw.oz.au (Jeremy Fitzhardinge) writes:
>>In <scarrowC...@netcom.com> sca...@netcom.com (Shawn L. Baird) writes:
>>>Well, I did run it under the debugger and strace several times. Now,
>>>looking at the source code to munmap reveals that it _can_ fail without
>>>generating an error.
>
>>What case is that?
>
>If the result of page_aligning the length gives a result of zero, it will
>return 0 (the same as the return for the success case). If it finds no page
>to free it apparently also returns 0. When it completes successfully, it
>does return 0 as well. I wonder if it would be valid to have the previous
>two cases return -EINVAL, like the case where address and len are put through
>basic validity tests at the very beginning.
Unmapping a non-existent range of memory is a valid operation, so if it
has nothing to do, it did what you asked for successfully. If it has to
return an error in this case, what should it return if the range you give
it is partially mapped: a partial error?
What does SunOS do if you unmap an already unmapped range, or 0 bytes?
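To make the partially-mapped case concrete, here's a hypothetical little
test: punch a hole in a mapping, then unmap a range spanning both the
mapped pages and the hole. Linux frees what is there, skips the hole,
and reports overall success:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        long pg = sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, 4 * pg, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        munmap(p + pg, 2 * pg);      /* leave only pages 0 and 3 mapped */

        /* the 4-page range is now only partially mapped, and yet: */
        printf("munmap(partial) = %d\n", munmap(p, 4 * pg));   /* 0 */
        return 0;
    }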
>My suspicion is that the list
>of pages is getting munged or the mechanism behind finding the page is
>missing it somehow and it is actually returning from one of these failed
>cases.
You should probably stick some printk()s into the code to see what's
happening. I'd be surprised if it were getting munged because you wouldn't
get the subtle effects you're seeing. When I wrote the code I tested it
with some thoroughness, but I did assume the basic unmapping mechanism
worked (unmap_page_range()).
>It is clearing the page, so that should help me at least determine
>partly how far I'm getting. (Hmmm, odd that it would clear the page if it
>didn't find it, so perhaps it is finding it and for some wacky reason the
>page isn't getting made invalid.) Oh well, I shall continue to muddle my
>way through for the moment. :)
> Hmmm, spent some more time browsing and I can't even locate where it
>clears the page. Not exactly the most straightforward section of the
>kernel. *sigh* Then again, I'm not a memory free list writing, virtual
>memory handling guru either. Perhaps this is a system feature invoked when
>the entry is invalidated? I'm going to have to brush up on my knowledge of
>x86 virtual memory I guess.
It never clears the page explicitly. It should eventually call
unmap_page_range() which should change the relevant PTEs to invalidate
the pages.
J
The enigma's solution: He was mmapping a range 0x40000000..0x40003fff, then
unmapping the last page 0x40003000..0x40003fff and then accessing this page.
Since there was no memory between 0x40003000 and the stack [there were
no shared libraries in the range 0x60000000..0x7fffffff since the program
was linked with -g, which implies -static], the do_page_fault handler
thought he was extending the stack. Just as if he had allocated
a big array on the stack and was beginning to fill it. Linux thus extended
the stack down to 0x40003000.
There is no bug in mmap() or munmap().
He should set the stack size limit to 1 GB if he wants to intercept SIGSEGV.
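Here is a hypothetical reconstruction of the scenario (4 KB pages and
modern mmap flags assumed; on the kernels in question the final store
was silently treated as stack growth instead of faulting):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* map the range 0x40000000..0x40003fff at a fixed address */
        char *base = mmap((void *)0x40000000, 4 * 4096,
                          PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (base == MAP_FAILED) return 1;

        /* unmap the last page, 0x40003000..0x40003fff */
        munmap(base + 3 * 4096, 4096);

        /* Expected: SIGSEGV.  Observed then: with nothing mapped between
           here and the stack, do_page_fault treated the store as stack
           growth and handed back a fresh zeroed page. */
        base[3 * 4096] = 1;
        printf("no fault; page reads back %d\n", base[3 * 4096]);
        return 0;
    }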
Bruno Haible
hai...@ma2s2.mathematik.uni-karlsruhe.de
: I think it would be better if the init(1) process would set the RLIMIT_STACK
: soft limit to 1 GB. This would have effect on all processes, unless it is
: explicitly overwritten by a larger value.
: Any volunteer for this?
Well, I'll add it to SysVinit 1.60. Maybe a line like:
op::options:umask 022,ulimit -s 1024
What do other people think of this? Any suggestions?
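(Note that ulimit -s counts kilobytes, so a full 1 GB would be
ulimit -s 1048576.) Inside init itself, the C equivalent would be
roughly this sketch, assuming the usual <sys/resource.h> interface:

    #include <sys/resource.h>

    /* Sketch of what init(1) could do before spawning anything:
       raise the soft stack limit, leaving the hard limit alone. */
    static int set_default_stack_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) < 0)
            return -1;
        rl.rlim_cur = 1024L * 1024L * 1024L;    /* 1 GB soft limit */
        if (rl.rlim_cur > rl.rlim_max)
            rl.rlim_cur = rl.rlim_max;          /* soft can't exceed hard */
        return setrlimit(RLIMIT_STACK, &rl);
    }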
Mike.
--
| Miquel van Smoorenburg -- miq...@drinkel.ow.org |
| "Life's too short to use Windows" |
I don't think so because it forces the compiler or the program writer to
add dummy references in case a big allocation from the stack is required.
For example, on many 68000 systems, the single-stack-allocation maximum
is 64 KB (or 32 KB?). You can't have a 100x100 array of 'double' on the stack.
On OS/2 the single-stack-allocation maximum is one page. In consequence,
you need to patch gcc so that __builtin_alloca() references each allocated
stack page once.
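For illustration, the probing such a patched __builtin_alloca() has to
do amounts to something like this sketch (4 KB pages assumed; fill() is
just a stand-in for real work). Probing from the end nearest the
already-committed stack means each touch extends the stack by at most
one page:

    #define PAGE_SIZE 4096

    void fill(double (*m)[100]);    /* hypothetical consumer of the array */

    void compute(void)
    {
        double m[100][100];     /* ~80 KB: far beyond a one-page redzone */
        char *p = (char *)m;
        long off;

        /* touch one byte per page, highest address first, so the stack
           is extended one recognizable page at a time */
        for (off = (long)sizeof m - 1; off >= 0; off -= PAGE_SIZE)
            p[off] = 0;

        fill(m);                /* now safe to use the whole array */
    }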
I think it would be better if the init(1) process would set the RLIMIT_STACK
soft limit to 1 GB. This would have effect on all processes, unless it is
explicitly overwritten by a larger value.
Any volunteer for this?
Bruno Haible
hai...@ma2s2.mathematik.uni-karlsruhe.de
>Should there be a limit to how much the stack will extend due to a single page
>fault?
NO! At least no limit that isn't easily set by a user. As a
mainframe unix compiler developer, I can tell you from the
brutal experience of my kernel development peers that system
programming / hacking folks may not really understand other
programming problem sets. Real world scientific applications
from problem domains such as computational chemistry, high
energy physics and many more deal with mind boggling hunks
of data. 100s and 1000s of megabytes are not unusual today.
To the extent that the problem can be solved on a given machine,
it is quite reasonable to use the stack for array storage.
For temporary work areas, it avoids fragmentation and provides
very natural continued reuse of the same 'hot' pages of memory.
Array operations in F90 have the potential for requiring many
more temporary storage locations for arrays due to the built-in
array language. Without pointers in Fortran77, I did not
observe a significant number of problems (can't actually recall
any) where a wild memory reference looked like a stack extension.
We did see a bunch of problems related to silly stack size limits
in default kernels, etc. We also saw problems when successive
development builds of the kernel changed the limits, breaking test
cases, etc.
We had three linkage architecture considerations which may have
limited the problems associated with false stack growth:
a) We were a non-hardware-stack machine and dedicated a specific
register as the stack pointer. The only instruction allowed
to extend the stack was a store at 4 bytes displacement from
the stack register.
b) The redzone which detected growth stores was only guaranteed
to be a single page in size --- one must be wary of a
stack increase (legitimate from a program logic perspective)
where the probe store would reach across the dead area and
touch the bss/data area and not be caught.
c) ALL compilers (and alloca) handled stack growth greater than
1 page by calling a highly tuned stack check routine which
would grow the stack if possible or abort the process (don't
ask how ... didja ever notice the lack of a stack overflow
in the standard unix signal list?) Part of the theory here
was that a routine with a large stack frame was expected
to do a bunch of work, and a few extra instructions in the
call linkage weren't a big deal.
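In rough C, the check in (c) had roughly this shape (entirely
illustrative; __stack_limit and grow_stack() stand in for whatever the
real OS interface was):

    #include <stdlib.h>

    extern char *__stack_limit;          /* lowest committed stack byte */
    extern int grow_stack(char *new_sp); /* hypothetical OS interface   */

    /* Called by compiled code before moving SP by more than a page. */
    void __stkchk(char *new_sp)
    {
        if (new_sp >= __stack_limit)
            return;                  /* frame fits the committed stack */
        if (!grow_stack(new_sp))
            abort();                 /* no stack-overflow signal in the
                                        standard list, so just die */
        __stack_limit = new_sp;
    }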
A similar kind of approach would deal with most unrecognized
unintended stack growth without placing unexpected limits on
user stack size.
Dave Morris