Using 64k pages is even worse. I can't even run '/bin/ls' with a 1MB
stack (ulimit -s 1024; /bin/ls). Hence, it seems new kernels are too
restrictive, rather than the old kernels being too liberal.
I've not tested with any other architectures.
Bisecting, I found that this is the culprit (which is in 2.6.32)
commit fc63cf237078c86214abcb2ee9926d8ad289da9b
Author: Anton Blanchard <an...@samba.org>
exec: setup_arg_pages() fails to return errors
Looking at the patch, it's probably just unmasking a preexisting issue.
The error path for expand_stack() (and others) was modified to:
---
ret = expand_stack(vma, stack_base);
if (ret)
ret = -EFAULT;
out_unlock:
up_write(&mm->mmap_sem);
- return 0;
+ return ret;
}
EXPORT_SYMBOL(setup_arg_pages);
---
So previously expand_stack errors were not returned correctly by
setup_arg_pages, but now they are.
Any clues how to fix this?
Mikey
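To make the unmasking concrete, here is a minimal user-space sketch of the control flow in the hunk above. `fake_expand_stack()` is a hypothetical stand-in for the kernel's `expand_stack()`, and the error value it returns is an assumption; only the difference between `return 0` and `return ret` is taken from the patch.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for expand_stack(): fails when the requested
 * reservation does not fit within the stack limit. The -ENOMEM value is
 * an assumption made for this sketch.
 */
static int fake_expand_stack(unsigned long want, unsigned long limit)
{
	return want > limit ? -ENOMEM : 0;
}

/* Before the patch: the error is computed and then dropped. */
static int setup_before(unsigned long want, unsigned long limit)
{
	int ret = fake_expand_stack(want, limit);

	if (ret)
		ret = -EFAULT;
	(void)ret;	/* never reaches the caller */
	return 0;
}

/* After the patch: the same error is propagated to the caller. */
static int setup_after(unsigned long want, unsigned long limit)
{
	int ret = fake_expand_stack(want, limit);

	if (ret)
		ret = -EFAULT;
	return ret;
}
```

With an 80kB request against a 64kB limit, `setup_before()` reports success while `setup_after()` reports `-EFAULT`, which matches the observation that the older kernels only appeared to be more liberal.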
> On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
> now more restrictive than it was before. On 2.6.31 with 4k pages I
> could run 'ulimit -s 16; /usr/bin/test' without a problem. Now with
> mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.
>
> Using 64k pages is even worse. I can't even run '/bin/ls' with a 1MB
> stack (ulimit -s 1024; /bin/ls). Hence, it seems new kernels are too
> restrictive, rather than the old kernels being too liberal.
It looks like this is causing it:
#define EXTRA_STACK_VM_PAGES 20 /* random */
...
#ifdef CONFIG_STACK_GROWSUP
stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
#else
stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
#endif
Which got added back in 2005 in a memory overcommit patch. It only took 5
years for us to go back and review that random setting :)
The comment from Andries explains the purpose:
(1) It reserves a reasonable amount of virtual stack space (amount
randomly chosen, no guarantees given) when the process is started, so
that the common utilities will not be killed by segfault on stack
extension.
This explains why 64kB is much worse. The extra stack reserve should be in kB
and we also need to be careful not to ask for more than our rlimit.
Anton
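The page-size effect described above is plain arithmetic. As a small sketch (the helper name is hypothetical; the constant is the one quoted from fs/exec.c):

```c
/* Value quoted from fs/exec.c in the thread. */
#define EXTRA_STACK_VM_PAGES 20UL

/* Hypothetical helper: bytes reserved at exec time for a given page size. */
static unsigned long stack_reserve_bytes(unsigned long page_size)
{
	return EXTRA_STACK_VM_PAGES * page_size;
}
```

`stack_reserve_bytes(4096)` is 81920 (80kB) and `stack_reserve_bytes(65536)` is 1310720 (1280kB), so the same binary asks for 16 times more reserve on a 64kB-page kernel.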
Cool, thanks. The following is based on this and fixes the problem for
me on PPC64, i.e. the !CONFIG_STACK_GROWSUP case.
Mikey
[PATCH] Restrict stack space reservation to rlimit
When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.
Also, reserve the same stack size independent of page size.
This fixes a bug unmasked by fc63cf237078c86214abcb2ee9926d8ad289da9b
Signed-off-by: Michael Neuling <mi...@neuling.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: sta...@kernel.org
---
fs/exec.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
Index: clone1/fs/exec.c
===================================================================
--- clone1.orig/fs/exec.c
+++ clone1/fs/exec.c
@@ -554,7 +554,7 @@ static int shift_arg_pages(struct vm_are
return 0;
}
-#define EXTRA_STACK_VM_PAGES 20 /* random */
+#define EXTRA_STACK_VM_SIZE 81920UL /* randomly 20 4K pages */
/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}
+ stack_base = min(EXTRA_STACK_VM_SIZE,
+ current->signal->rlim[RLIMIT_STACK].rlim_cur) -
+ PAGE_SIZE;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_base;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_base;
#endif
ret = expand_stack(vma, stack_base);
if (ret)
> Cool, thanks. The following is based on this and fixes the problem for
> me on PPC64 ie. the !CONFIG_STACK_GROWSUP case.
Thanks! Seeing the original setting of EXTRA_STACK_VM_PAGES is more or
less random, I wonder if we should round EXTRA_STACK_VM_SIZE up to 128kB
(or even down to 64kB) so it operates better with > 4kB pages.
But in the end it's probably of little use for the default OVERCOMMIT_GUESS
setting, so the main thing is we don't terminate processes incorrectly.
Acked-by: Anton Blanchard <an...@samba.org>
Anton
Also, reserve the same stack size independent of page size.
This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
"mm: variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b
"exec: setup_arg_pages() fails to return errors".
Signed-off-by: Michael Neuling <mi...@neuling.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: sta...@kernel.org
---
Update commit message to include patch name and SHA1 of related
patches.
> akpm, linus: this or something like it needs to go into 2.6.33 (& 32) to
> fix 'ulimit -s'.
"fix ulimit -s" is too terse an explanation ;-)
We are not ESPers. Please describe what the bug actually is.
> Mikey
>
> [PATCH] Restrict stack space reservation to rlimit
>
> When reserving stack space for a new process, make sure we're not
> attempting to allocate more than rlimit allows.
>
> Also, reserve the same stack size independent of page size.
Why do we need a page-size-independent stack size? It seems to carry a
risk of breaking compatibility.
> Why do we need page size independent stack size? It seems to have
> compatibility breaking risk.
I don't think so. The current behaviour is clearly wrong: we don't need a
16x larger stack just because you went from a 4kB to a 64kB base page
size. The user application stack usage is the same in both cases.
Anton
I wasn't discussing which behavior is better. Michael said he wants to apply
his patch to 2.6.32 & 2.6.33, and the stable tree never accepts patches that
break compatibility.
Your answer doesn't explain why we can't wait until the next merge window.
Btw, personally, I like a page-size-independent stack size, but I'm not sure
how making the stack size independent is related to the bug fix.
I tend to agree.
Below is just the bug fix to limit the reservation size based on rlimit.
We still reserve different stack sizes based on the page size as
before (unless we hit rlimit of course).
Mikey
Restrict stack space reservation to rlimit
When reserving stack space for a new process, make sure we're not
attempting to allocate more than rlimit allows.
This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
"mm: variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b
"exec: setup_arg_pages() fails to return errors".
Signed-off-by: Michael Neuling <mi...@neuling.org>
Cc: Anton Blanchard <an...@samba.org>
Cc: sta...@kernel.org
---
fs/exec.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Index: linux-2.6-ozlabs/fs/exec.c
===================================================================
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}
+ stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
+ current->signal->rlim[RLIMIT_STACK].rlim_cur -
+ PAGE_SIZE);
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_base;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_base;
#endif
ret = expand_stack(vma, stack_base);
if (ret)
--
> I didn't discuss which behavior is better. Michael said he want to apply
> his patch to 2.6.32 & 2.6.33. stable tree never accept the breaking
> compatibility patch.
>
> Your answer doesn't explain why can't we wait it until next merge window.
>
>
> btw, personally, I like page size indepent stack size. but I'm not sure
> why making stack size independency is related to bug fix.
OK sorry, I misunderstood your initial mail. I agree fixing the bit that
regressed in 2.6.32 is the most important thing. The difference in page size is
clearly wrong but since it isn't a regression we could probably live with it
until 2.6.34.
Anton
Thanks.
I agree with most of your patch, but I have a few requests.
> Mikey
>
> Restrict stack space reservation to rlimit
>
> When reserving stack space for a new process, make sure we're not
> attempting to allocate more than rlimit allows.
>
> This fixes a bug cause by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
> "mm: variable length argument support" and unmasked by
> fc63cf237078c86214abcb2ee9926d8ad289da9b
> "exec: setup_arg_pages() fails to return errors".
Your initial mail described the following problem case. Please add it
to the patch description.
On recent ppc64 kernels, limiting the stack (using 'ulimit -s blah') is
now more restrictive than it was before. On 2.6.31 with 4k pages I
could run 'ulimit -s 16; /usr/bin/test' without a problem. Now with
mainline, even 'ulimit -s 64; /usr/bin/test' gets killed.
>
> Signed-off-by: Michael Neuling <mi...@neuling.org>
> Cc: Anton Blanchard <an...@samba.org>
> Cc: sta...@kernel.org
> ---
> fs/exec.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> Index: linux-2.6-ozlabs/fs/exec.c
> ===================================================================
> --- linux-2.6-ozlabs.orig/fs/exec.c
> +++ linux-2.6-ozlabs/fs/exec.c
> @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
> goto out_unlock;
> }
>
> + stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
> + current->signal->rlim[RLIMIT_STACK].rlim_cur -
> + PAGE_SIZE);
It is a bit unclear why "- PAGE_SIZE" is necessary on this line.
Personally, I prefer something like the following, with an explicit comment.
stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);
/* Initial stack must not cause stack overflow. */
if (stack_expand + PAGE_SIZE > stack_lim)
stack_expand = stack_lim - PAGE_SIZE;
Note: accessing rlim_cur requires ACCESS_ONCE.
Thoughts?
Thanks!
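The suggested clamp can be tried out in user space. The following is only a sketch under stated assumptions: `stack_lim` stands for the RLIMIT_STACK soft limit in bytes, the page size is passed in as a parameter rather than taken from PAGE_SIZE, and the early return for limits below one page is an addition of this sketch so that the subtraction cannot wrap around.

```c
#define EXTRA_STACK_VM_PAGES 20UL

/*
 * Sketch of the suggested clamp: reserve EXTRA_STACK_VM_PAGES pages unless
 * that, plus the one page the stack already occupies, exceeds the limit.
 */
static unsigned long clamp_stack_expand(unsigned long stack_lim,
					unsigned long page_size)
{
	unsigned long stack_expand = EXTRA_STACK_VM_PAGES * page_size;

	/* Guard added for this sketch: nothing can be reserved. */
	if (stack_lim < page_size)
		return 0;

	/* Initial stack must not cause stack overflow. */
	if (stack_expand + page_size > stack_lim)
		stack_expand = stack_lim - page_size;

	return stack_expand;
}
```

With 4k pages and `ulimit -s 64`, this yields 61440 bytes instead of the full 81920, so the reservation plus the existing page fits exactly within the limit.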
It's better to use the helper function: rlimit().
AFAIK, the stable tree doesn't have rlimit(). But yes, making two patches
(one for mainline and one for stable) is a good idea.
Ok, I'll add this info in.
>
> >
> > Signed-off-by: Michael Neuling <mi...@neuling.org>
> > Cc: Anton Blanchard <an...@samba.org>
> > Cc: sta...@kernel.org
> > ---
> > fs/exec.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6-ozlabs/fs/exec.c
> > ===================================================================
> > --- linux-2.6-ozlabs.orig/fs/exec.c
> > +++ linux-2.6-ozlabs/fs/exec.c
> > @@ -627,10 +627,13 @@ int setup_arg_pages(struct linux_binprm
> > goto out_unlock;
> > }
> >
> > + stack_base = min(EXTRA_STACK_VM_PAGES * PAGE_SIZE,
> > + current->signal->rlim[RLIMIT_STACK].rlim_cur -
> > + PAGE_SIZE);
>
> This line is a bit unclear why "- PAGE_SIZE" is necessary.
This is because the stack is already 1 page in size. I'm going to
change that code to make it clearer... hopefully :-)
> personally, I like following likes explicit comments.
>
> stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> stack_lim = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur);
>
> /* Initial stack must not cause stack overflow. */
> if (stack_expand + PAGE_SIZE > stack_lim)
> stack_expand = stack_lim - PAGE_SIZE;
>
> note: accessing rlim_cur require ACCESS_ONCE.
>
>
> Thought?
Thanks, looks better/clearer to me too. I'll change, new patch coming....
Mikey
This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba "mm:
variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b "exec: setup_arg_pages() fails
to return errors". This bug means that when limiting the stack to less than
20*PAGE_SIZE (eg. 80K on 4K pages or 'ulimit -s 79') all processes will
be killed before they start. This is particularly bad with 64K pages,
where a ulimit below 1280K will kill every process.
Signed-off-by: Michael Neuling <mi...@neuling.org>
Cc: sta...@kernel.org
---
Attempts to answer comments from Kosaki Motohiro.
Tested on PPC only, hence !CONFIG_STACK_GROWSUP. Someone should
probably ACK for an arch with CONFIG_STACK_GROWSUP.
As noted, stable needs the same patch, but 2.6.32 doesn't have the
rlimit() helper.
fs/exec.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
Index: linux-2.6-ozlabs/fs/exec.c
===================================================================
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -555,6 +555,7 @@ static int shift_arg_pages(struct vm_are
}
#define EXTRA_STACK_VM_PAGES 20 /* random */
+#define ALIGN_DOWN(addr,size) ((addr)&(~((size)-1)))
/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
@@ -570,7 +571,7 @@ int setup_arg_pages(struct linux_binprm
struct vm_area_struct *vma = bprm->vma;
struct vm_area_struct *prev = NULL;
unsigned long vm_flags;
- unsigned long stack_base;
+ unsigned long stack_base, stack_expand, stack_expand_lim, stack_size;
#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size to 1GB */
@@ -627,10 +628,24 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}
+ stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_size = vma->vm_end - vma->vm_start;
+ if (rlimit(RLIMIT_STACK) < stack_size)
+ stack_expand_lim = 0; /* don't shrink the stack */
+ else
+ /*
+ * Align this down to a page boundary as expand_stack
+ * will align it up.
+ */
+ stack_expand_lim = ALIGN_DOWN(rlimit(RLIMIT_STACK) - stack_size,
+ PAGE_SIZE);
+ /* Initial stack must not cause stack overflow. */
+ if (stack_expand > stack_expand_lim)
+ stack_expand = stack_expand_lim;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_end + stack_expand;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_base = vma->vm_start - stack_expand;
#endif
ret = expand_stack(vma, stack_base);
if (ret)
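The limit calculation in the patch above can be modelled outside the kernel. This is a sketch, not the kernel code: the function name is hypothetical, `rlim` stands for the value `rlimit(RLIMIT_STACK)` would return, and the page size is a parameter assumed to be a power of two.

```c
#define EXTRA_STACK_VM_PAGES 20UL
/* size must be a power of two */
#define ALIGN_DOWN(addr, size) ((addr) & ~((size) - 1))

/* Returns how many bytes the initial stack would be expanded by. */
static unsigned long initial_stack_expand(unsigned long rlim,
					  unsigned long stack_size,
					  unsigned long page_size)
{
	unsigned long stack_expand = EXTRA_STACK_VM_PAGES * page_size;
	unsigned long stack_expand_lim;

	if (rlim < stack_size)
		stack_expand_lim = 0;	/* don't shrink the stack */
	else
		/* Align down, since expand_stack will align up. */
		stack_expand_lim = ALIGN_DOWN(rlim - stack_size, page_size);

	/* Initial stack must not cause stack overflow. */
	if (stack_expand > stack_expand_lim)
		stack_expand = stack_expand_lim;

	return stack_expand;
}
```

For `ulimit -s 15` with a one-page (4k) initial stack, 11264 bytes of headroom remain, which aligns down to 8192, so only two extra pages are requested.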
Umm.. It looks correct, but the nested if statement seems a bit ugly.
Instead, how about the following?
Note: it's untested.
===============
From: Michael Neuling <mi...@neuling.org>
Subject: Restrict initial stack space expansion to rlimit
When reserving stack space for a new process, make sure we're not
attempting to expand the stack by more than rlimit allows.
This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba "mm:
variable length argument support" and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b "exec: setup_arg_pages() fails
to return errors". This bug means that when limiting the stack to less than
20*PAGE_SIZE (eg. 80K on 4K pages or 'ulimit -s 79') all processes will
be killed before they start. This is particularly bad with 64K pages,
where a ulimit below 1280K will kill every process.
[kosaki....@jp.fujitsu.com: cleanups]
Signed-off-by: Michael Neuling <mi...@neuling.org>
Signed-off-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Cc: sta...@kernel.org
---
Attempts to answer comments from Kosaki Motohiro.
Tested on PPC only, hence !CONFIG_STACK_GROWSUP. Someone should
probably ACK for an arch with CONFIG_STACK_GROWSUP.
As noted, stable needs the same patch, but 2.6.32 doesn't have the
rlimit() helper.
diff --git a/fs/exec.c b/fs/exec.c
index 6f7fb0c..325bad4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -573,6 +573,9 @@ int setup_arg_pages(struct linux_binprm *bprm,
struct vm_area_struct *prev = NULL;
unsigned long vm_flags;
unsigned long stack_base;
+ unsigned long stack_size;
+ unsigned long stack_expand;
+ unsigned long rlim_stack;
#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size to 1GB */
@@ -629,10 +632,27 @@ int setup_arg_pages(struct linux_binprm *bprm,
goto out_unlock;
}
+ stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_size = vma->vm_end - vma->vm_start;
+ /*
+ * Align this down to a page boundary as expand_stack
+ * will align it up.
+ */
+ rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
+ if (rlim_stack < stack_size)
+ rlim_stack = stack_size;
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack) {
+ stack_base = vma->vm_start + rlim_stack;
+ } else {
+ stack_base = vma->vm_end + stack_expand;
+ }
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack) {
+ stack_base = vma->vm_end - rlim_stack;
+ } else {
+ stack_base = vma->vm_start - stack_expand;
+ }
I don't like the duplicated code in the #ifdef/else but I can live with it.
> note: it's untested.
Works for me on ppc64 with 4k and 64k pages. Thanks!
I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
as well.
Mikey
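For readers following the address arithmetic in the reworked patch, here is a user-space sketch of the !CONFIG_STACK_GROWSUP branch. The function name and the explicit `page_size` parameter are assumptions of this sketch; the logic follows the hunk as posted, with `~(page_size - 1)` standing in for PAGE_MASK.

```c
#define EXTRA_STACK_VM_PAGES 20UL

/* Grows-down case; page_size must be a power of two. */
static unsigned long stack_base_growsdown(unsigned long vm_start,
					  unsigned long vm_end,
					  unsigned long rlim,
					  unsigned long page_size)
{
	unsigned long stack_expand = EXTRA_STACK_VM_PAGES * page_size;
	unsigned long stack_size = vm_end - vm_start;
	/* Align down to a page boundary, as expand_stack aligns up. */
	unsigned long rlim_stack = rlim & ~(page_size - 1);

	if (rlim_stack < stack_size)
		rlim_stack = stack_size;	/* never go below the current size */

	if (stack_size + stack_expand > rlim_stack)
		return vm_end - rlim_stack;	/* expand only up to the limit */
	return vm_start - stack_expand;		/* full reservation fits */
}
```

With a one-page stack and `ulimit -s 15`, the base lands 12288 bytes below `vm_end`; with a limit smaller than the existing stack, the base equals `vm_start` and nothing is added.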
> > > + /* Initial stack must not cause stack overflow. */
> > > + if (stack_expand > stack_expand_lim)
> > > + stack_expand = stack_expand_lim;
> > > #ifdef CONFIG_STACK_GROWSUP
> > > - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > > + stack_base = vma->vm_end + stack_expand;
> > > #else
> > > - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
> > > + stack_base = vma->vm_start - stack_expand;
> > > #endif
> > > ret = expand_stack(vma, stack_base);
> > > if (ret)
> >
> > Umm.. It looks correct. but the nested complex if statement seems a bit ugly.
> > Instead, How about following?
>
> I don't like the duplicated code in the #ifdef/else but I can live with it.
cleanup the cleanup:
--- a/fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit-cleanup-cleanup
+++ a/fs/exec.c
@@ -637,20 +637,17 @@ int setup_arg_pages(struct linux_binprm
* will align it up.
*/
rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
- if (rlim_stack < stack_size)
- rlim_stack = stack_size;
+ rlim_stack = min(rlim_stack, stack_size);
#ifdef CONFIG_STACK_GROWSUP
- if (stack_size + stack_expand > rlim_stack) {
+ if (stack_size + stack_expand > rlim_stack)
stack_base = vma->vm_start + rlim_stack;
- } else {
+ else
stack_base = vma->vm_end + stack_expand;
- }
#else
- if (stack_size + stack_expand > rlim_stack) {
+ if (stack_size + stack_expand > rlim_stack)
stack_base = vma->vm_end - rlim_stack;
- } else {
+ else
stack_base = vma->vm_start - stack_expand;
- }
#endif
ret = expand_stack(vma, stack_base);
if (ret)
_
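One thing worth double-checking in the hunk above: the removed `if` raises `rlim_stack` to at least `stack_size`, which reads as a `max()`, whereas the replacement line uses `min()`. The user-space sketch below (all helper names are hypothetical) compares the three forms. It is a reading of the diff as posted, not a statement about what was eventually merged.

```c
/* The removed lines: raise rlim_stack to at least stack_size. */
static unsigned long clamp_with_if(unsigned long rlim_stack,
				   unsigned long stack_size)
{
	if (rlim_stack < stack_size)
		rlim_stack = stack_size;
	return rlim_stack;
}

/* What the removed lines are equivalent to. */
static unsigned long clamp_with_max(unsigned long rlim_stack,
				    unsigned long stack_size)
{
	return rlim_stack > stack_size ? rlim_stack : stack_size;
}

/* What the added line computes. */
static unsigned long clamp_with_min(unsigned long rlim_stack,
				    unsigned long stack_size)
{
	return rlim_stack < stack_size ? rlim_stack : stack_size;
}
```

With `min()`, `rlim_stack` can never exceed `stack_size`, so `stack_size + stack_expand > rlim_stack` holds whenever `stack_expand` is nonzero and, as far as this sketch goes, no extra space would be reserved.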
> > note: it's untested.
>
> Works for me on ppc64 with 4k and 64k pages. Thanks!
>
> I'd still like someone with a CONFIG_STACK_GROWSUP arch to test/ACK it
> as well.
There's only one CONFIG_STACK_GROWSUP arch - parisc.
Guys, here's the rolled-up patch. Could someone please test it on
parisc?
err, I'm not sure what one needs to do to test it, actually.
Presumably it involves setting an unusual `ulimit -s'. Can someone
please suggest a test plan?
From: Michael Neuling <mi...@neuling.org>
When reserving stack space for a new process, make sure we're not
attempting to expand the stack by more than rlimit allows.
This fixes a bug caused by b6a2fea39318e43fee84fa7b0b90d68bed92d2ba ("mm:
variable length argument support") and unmasked by
fc63cf237078c86214abcb2ee9926d8ad289da9b ("exec: setup_arg_pages() fails
to return errors").
This bug means that when limiting the stack to less than 20*PAGE_SIZE (eg.
80K on 4K pages or 'ulimit -s 79') all processes will be killed before
they start. This is particularly bad with 64K pages, where a ulimit below
1280K will kill every process.
Signed-off-by: Michael Neuling <mi...@neuling.org>
Cc: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
Cc: Americo Wang <xiyou.w...@gmail.com>
Cc: Anton Blanchard <an...@samba.org>
Cc: Oleg Nesterov <ol...@redhat.com>
Cc: James Morris <jmo...@namei.org>
Cc: Ingo Molnar <mi...@elte.hu>
Cc: Serge Hallyn <se...@us.ibm.com>
Cc: Benjamin Herrenschmidt <be...@kernel.crashing.org>
Cc: <sta...@kernel.org>
fs/exec.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff -puN fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit fs/exec.c
--- a/fs/exec.c~fs-execc-restrict-initial-stack-space-expansion-to-rlimit
+++ a/fs/exec.c
@@ -571,6 +571,9 @@ int setup_arg_pages(struct linux_binprm
struct vm_area_struct *prev = NULL;
unsigned long vm_flags;
unsigned long stack_base;
+ unsigned long stack_size;
+ unsigned long stack_expand;
+ unsigned long rlim_stack;
#ifdef CONFIG_STACK_GROWSUP
/* Limit stack size to 1GB */
@@ -627,10 +630,24 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}
+ stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_size = vma->vm_end - vma->vm_start;
+ /*
+ * Align this down to a page boundary as expand_stack
+ * will align it up.
+ */
+ rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
+ rlim_stack = min(rlim_stack, stack_size);
#ifdef CONFIG_STACK_GROWSUP
- stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack)
+ stack_base = vma->vm_start + rlim_stack;
+ else
+ stack_base = vma->vm_end + stack_expand;
#else
- stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ if (stack_size + stack_expand > rlim_stack)
+ stack_base = vma->vm_end - rlim_stack;
+ else
+ stack_base = vma->vm_start - stack_expand;
#endif
ret = expand_stack(vma, stack_base);
if (ret)
_
FYI the rolled up patch still works fine on PPC64. Thanks.
> Could someone please test it on parisc?
>
> err, I'm not sure what one needs to do to test it, actually.
> Presumably it involves setting an unusual `ulimit -s'. Can someone
> please suggest a test plan?
How about doing:
'ulimit -s 15; ls'
before and after the patch is applied. Before it's applied, 'ls' should
be killed. After the patch is applied, 'ls' should no longer be killed.
I'm suggesting a stack limit of 15KB since it's small enough to fall below
20*PAGE_SIZE. Also, 15KB is not a multiple of PAGE_SIZE, which is a trickier
case to handle correctly with this code.
4K pages on parisc should be fine to test with.
Mikey
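The same test can be driven from C instead of a shell. The helper below is a hypothetical sketch that approximates `ulimit -s N; cmd` using standard POSIX calls; unlike a typical shell `ulimit -s`, it changes only the soft limit, and it clamps the request to the hard limit.

```c
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] with the RLIMIT_STACK soft limit set to
 * limit_kb kilobytes. Returns the raw wait status of the child, or -1 if
 * fork() or waitpid() fails.
 */
static int run_with_stack_limit(unsigned long limit_kb, char *const argv[])
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		struct rlimit rl;

		if (getrlimit(RLIMIT_STACK, &rl) != 0)
			_exit(126);
		rl.rlim_cur = (rlim_t)limit_kb * 1024;
		if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max)
			rl.rlim_cur = rl.rlim_max;	/* cannot exceed the hard limit */
		if (setrlimit(RLIMIT_STACK, &rl) != 0)
			_exit(126);
		execv(argv[0], argv);
		_exit(127);	/* execv() returned, so the exec itself failed */
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return status;
}
```

A caller would pass something like `{ "/bin/ls", NULL }` with a limit of 15 and then inspect the status with `WIFSIGNALED()` and `WIFEXITED()` to tell a killed child from one that ran to completion.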
I did.
> How about doing:
> 'ulimit -s 15; ls'
> before and after the patch is applied. Before it's applied, 'ls' should
> be killed. After the patch is applied, 'ls' should no longer be killed.
>
> I'm suggesting a stack limit of 15KB since it's small enough to trigger
> 20*PAGE_SIZE. Also 15KB not a multiple of PAGE_SIZE, which is a trickier
> case to handle correctly with this code.
>
> 4K pages on parisc should be fine to test with.
Mikey, thanks for the suggested test plan.
I'm not sure your patch handles the parisc (stack-grows-up) case correctly.
I tested your patch on a 4k pages kernel:
root@c3000:~# uname -a
Linux c3000 2.6.33-rc7-32bit #221 Tue Feb 9 23:17:06 CET 2010 parisc GNU/Linux
Without your patch:
root@c3000:~# ulimit -s 15; ls
Killed
-> correct.
With your patch:
root@c3000:~# ulimit -s 15; ls
Killed
_or_:
root@c3000:~# ulimit -s 15; ls
Segmentation fault
-> ??
Any idea?
Helge
>> + rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
>> + rlim_stack = min(rlim_stack, stack_size);
>> #ifdef CONFIG_STACK_GROWSUP
>> - stack_base = vma->vm_end + EXTRA_STACK_VM_PAGES * PAGE_SIZE;
>> + if (stack_size + stack_expand > rlim_stack)
>> + stack_base = vma->vm_start + rlim_stack;
>> + else
>> + stack_base = vma->vm_end + stack_expand;
>> #else
>> - stack_base = vma->vm_start - EXTRA_STACK_VM_PAGES * PAGE_SIZE;
>> + if (stack_size + stack_expand > rlim_stack)
>> + stack_base = vma->vm_end - rlim_stack;
>> + else
>> + stack_base = vma->vm_start - stack_expand;
>> #endif
>> ret = expand_stack(vma, stack_base);
>> if (ret)
My x86_64 box also gets a segmentation fault. I think "ulimit -s 15" is too small a stack for ls.
"ulimit -s 27; ls" works perfectly fine.
Arrh. I asked Helge offline earlier to check what used to work on parisc
on 2.6.31.
I guess PPC has a nice clean non-bloated ABI :-D
Mikey
Hi Mikey,
I tested again, and it works for me with "ulimit -s 27" as well (on a 4k, 32bit kernel).
Still, I'm not 100% sure if your patch is correct.
Anyway, it seems to work.
But what makes me wonder is, why EXTRA_STACK_VM_PAGES is defined in pages at all.
You wrote in your patch description:
> This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
> 80K on 4K pages or 'ulimit -s 79') all processes will be killed before
> they start. This is particularly bad with 64K pages, where a ulimit below
> 1280K will kill every process.
Wouldn't it make sense to define and use EXTRA_STACK_VM_SIZE instead (e.g. as 20*4096 = 80k)?
This extra stack reservation should IMHO be independent of the actual kernel page size.
Helge
Thanks for retesting.
Did "ulimit -s 27" fail before you applied the patch?
> Anyway, it seems to work.
>
> But what makes me wonder is, why EXTRA_STACK_VM_PAGES is defined in pages at all.
> You wrote in your patch description:
> > This bug means that when limiting the stack to less the 20*PAGE_SIZE (eg.
> > 80K on 4K pages or 'ulimit -s 79') all processes will be killed before
> > they start. This is particularly bad with 64K pages, where a ulimit below
> > 1280K will kill every process.
>
> Wouldn't it make sense to define and use EXTRA_STACK_VM_SIZE instead
> (e.g. as 20*4096 = 80k)? This extra stack reservation should IMHO be
> independend of the actual kernel page size.
If you look back through this thread, that has already been noted but
it's a separate issue to this bug, so that change will be deferred till
2.6.34.
Mikey
This makes the initial stack reservation independent of PAGE_SIZE.
It also bumps up the number of 4k pages allocated from 20 to 32, to
align with 64K page systems.
Signed-off-by: Michael Neuling <mi...@neuling.org>
---
This is the second half of my original patch. This can be targeted for
2.6.34 as it's just a cleanup.
Tested on PPC64 with 4k and 64k pages.
fs/exec.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Index: linux-2.6-ozlabs/fs/exec.c
===================================================================
--- linux-2.6-ozlabs.orig/fs/exec.c
+++ linux-2.6-ozlabs/fs/exec.c
@@ -554,8 +554,6 @@ static int shift_arg_pages(struct vm_are
return 0;
}
-#define EXTRA_STACK_VM_PAGES 20 /* random */
-
/*
* Finalizes the stack vm_area_struct. The flags and permissions are updated,
* the stack is optionally relocated, and some extra space is added.
@@ -630,7 +628,7 @@ int setup_arg_pages(struct linux_binprm
goto out_unlock;
}
- stack_expand = EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+ stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
stack_size = vma->vm_end - vma->vm_start;
/*
* Align this down to a page boundary as expand_stack
I don't think this is enough explanation. In a past mail, you described
why the page size dependency is harmful. I hope you will add that to the patch
description.
IOW, we don't need to change unnecessary-but-harmless behavior.
>
> This creates this initial stack independent of the PAGE_SIZE.
>
> It also bumps up the number of 4k pages allocated from 20 to 32, to
> align with 64K page systems.
Why do we need page alignment? Do you mean this code doesn't work on
systems with 128K (or larger) pages?
I don't think it's harmful, it's just irrelevant. Stack size is
independent of page size.
> IOW, we don't need to change the unnecessary-but-non-harmful behavior.
>
> >
> > This creates this initial stack independent of the PAGE_SIZE.
> >
> > It also bumps up the number of 4k pages allocated from 20 to 32, to
> > align with 64K page systems.
>
> Why do we need page-aligning? Do you mean this code doesn't works on
> 128K (or more larger) page systems?
If the "random" setting is not a common multiple of the 4k and 64k
pages, they will end up getting aligned differently, hence causing what
we are trying to avoid in the first place with this patch.
Should I add this as a comment in the code?
Mikey
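The alignment argument can be checked with a little arithmetic. The helper below is a hypothetical user-space model of rounding a reservation up to a page boundary, which is what the patch comments say expand_stack does; it is not kernel code.

```c
/* Round size up to a multiple of page_size (which must be a power of two). */
static unsigned long page_align_up(unsigned long size, unsigned long page_size)
{
	return (size + page_size - 1) & ~(page_size - 1);
}
```

The old 81920-byte value stays at 81920 with 4k pages but rounds up to 131072 with 64k pages, so the two configurations would still differ. 131072 is already a multiple of both page sizes and comes out the same on each.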
ok.
> > IOW, we don't need to change the unnecessary-but-non-harmful behavior.
> >
> > >
> > > This creates this initial stack independent of the PAGE_SIZE.
> > >
> > > It also bumps up the number of 4k pages allocated from 20 to 32, to
> > > align with 64K page systems.
> >
> > Why do we need page-aligning? Do you mean this code doesn't works on
> > 128K (or more larger) page systems?
>
> If the "random" setting is not a common multiple of the 4k and 64k
> pages, they will end up getting aligned differently, hence causing what
> we are trying to avoid in the first place with this patch.
I see. ok,
Reviewed-by: KOSAKI Motohiro <kosaki....@jp.fujitsu.com>
>
> I should probably add this as a comment in the code comment?
Probably. It's not a big deal.
Thanks :)