[PATCH] update fix X86_64 procfs provide stack information for threads

Stefani Seibold

Nov 3, 2009, 2:40:02 AM

This patch fixes two issues in the procfs stack information on x86_64
Linux.

The 32-bit loader compat_do_execve() did not store the stack start (this was
figured out by Alexey).

The stack information on an x86_64 kernel always shows 0 kB of stack
usage, because of a broken KSTK_ESP macro which always returns
-1. The new implementation returns the correct value.

The patch is against 2.6.32-rc5-git5.

Andrew, would you be so kind as to apply this patch?

Greetings,
Stefani

Signed-off-by: Stefani Seibold <ste...@seibold.net>
---
arch/x86/include/asm/processor.h | 8 +++++++-
arch/x86/kernel/process_64.c | 8 ++++++++
fs/compat.c | 2 ++
3 files changed, 17 insertions(+), 1 deletion(-)

--- linux-2.6.32-rc5/fs/compat.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/fs/compat.c 2009-11-02 09:00:52.871909633 +0100
@@ -1532,6 +1532,8 @@
if (retval < 0)
goto out;

+ current->stack_start = current->mm->start_stack;
+
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
--- linux-2.6.32-rc5/arch/x86/include/asm/processor.h 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/include/asm/processor.h 2009-11-02 10:39:47.177909657 +0100
@@ -1000,7 +1001,13 @@
#define thread_saved_pc(t) (*(unsigned long *)((t)->thread.sp - 8))

#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
-#define KSTK_ESP(tsk) -1 /* sorry. doesn't work for syscall. */
+
+#ifdef CONFIG_IA32_EMULATION
+extern unsigned long KSTK_ESP(struct task_struct *task);
+#else
+#define KSTK_ESP(task) ((task)->thread.usersp)
+#endif
+
#endif /* CONFIG_X86_64 */

extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
--- linux-2.6.32-rc5/arch/x86/kernel/process_64.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/kernel/process_64.c 2009-11-02 10:48:23.614936810 +0100
@@ -664,3 +669,11 @@
return do_arch_prctl(current, code, addr);
}

+#ifdef CONFIG_IA32_EMULATION
+unsigned long KSTK_ESP(struct task_struct *task)
+{
+ return (test_tsk_thread_flag(task, TIF_IA32)) ? \
+ (task_pt_regs(task)->sp) : \
+ ((task)->thread.usersp);
+}
+#endif
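
Both the old macro and the new KSTK_ESP() rely on task_pt_regs(), whose definition is in the context lines above. The minimal user-space model below shows only its pointer arithmetic: thread.sp0 marks the top of the task's kernel stack, and the saved register frame occupies the last sizeof(struct pt_regs) bytes below it. The structs are hypothetical, cut-down stand-ins for the kernel's struct pt_regs, struct thread_struct and struct task_struct; this is a sketch of the layout, not kernel code.

```c
/* Hypothetical, cut-down stand-ins for the kernel structures;
 * the real ones carry many more fields. */
struct pt_regs {
	unsigned long ip;
	unsigned long sp;
};

struct thread_struct {
	unsigned long sp0;	/* address just past the top of the kernel stack */
};

struct task_struct {
	struct thread_struct thread;
};

/* Same expression as in processor.h: "->" binds tighter than the cast,
 * so sp0 is converted to a struct pt_regs pointer and then moved back
 * by one whole struct, i.e. by sizeof(struct pt_regs) bytes. */
#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
```

With a stand-in stack whose final bytes hold the frame, task_pt_regs() lands exactly on that frame. Whether its sp slot holds a current user stack pointer depends on how the task entered the kernel, which is what the rest of this thread is about.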

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Ingo Molnar

Nov 3, 2009, 3:30:03 AM

That's quite ugly. The KSTK_ESP() function should be unconditional and
the #ifdef should be eliminated. If CONFIG_IA32_EMULATION is turned off
(which is rare) then TIF_IA32 won't be set, so the function should work
fine.

Thanks,

Ingo

Stefani Seibold

Nov 3, 2009, 4:10:02 AM

Hi Ingo,

Come on, that's not fair. This would not be the only piece of ugly code
in the x86_64 implementation. It is much better than the previous hack,
where KSTK_ESP always returns a wrong hard-coded value. That is really
ugly!!!!

It took me 6 hours to analyze the x86_64 code, most of it written in
assembler. I think it is a first solution which makes the procfs stack
information work on this architecture, and that was the goal.

I will remove the #ifdefs and repost the patch. Please accept this
patch, which makes KSTK_ESP on x86_64 better than before.

I am not an x86_64 hacker; I do not have the knowledge to make a perfect
solution for this architecture. Also, I am not a full-time kernel hacker;
I have customers who are waiting for their projects.

Greetings,
Stefani

Ingo Molnar

Nov 3, 2009, 1:20:02 PM

* Stefani Seibold <ste...@seibold.net> wrote:

> Come on, that's not fair. [...]

(Hacking the kernel is rarely 'fair' in the way you seem to be defining
it.)

> [...] This would not be the only piece of ugly code in the x86_64
> implementation. It is much better than the previous hack, where
> KSTK_ESP always returns a wrong hard-coded value. That is really
> ugly!!!!
>
> It took me 6 hours to analyze the x86_64 code, most of it written in
> assembler. I think it is a first solution which makes the procfs
> stack information work on this architecture, and that was the goal.
>
> I will remove the #ifdefs and repost the patch. Please accept this
> patch, which makes KSTK_ESP on x86_64 better than before.
>
> I am not an x86_64 hacker; I do not have the knowledge to make a
> perfect solution for this architecture. Also, I am not a full-time
> kernel hacker; I have customers who are waiting for their projects.

The cleanup isn't really that hard at all; writing your mail probably took
more time. I didn't see you complain when we merged your original procfs
patch that introduced/exposed this whole issue:

d899bf7: procfs: provide stack information for threads

Fixing followup issues is standard part of the work-with-upstream
fairness equation.

Ingo

Andi Kleen

Nov 4, 2009, 6:20:01 AM

Stefani Seibold <ste...@seibold.net> writes:
>
> +#ifdef CONFIG_IA32_EMULATION
> +unsigned long KSTK_ESP(struct task_struct *task)
> +{
> + return (test_tsk_thread_flag(task, TIF_IA32)) ? \
> + (task_pt_regs(task)->sp) : \
> + ((task)->thread.usersp);

Usersp is only set for system calls, but not when the process is blocked
in an interrupt.

In general, if you really want a reliable implementation of this,
you would need to stop the task and grab the stack pointer;
otherwise it can be arbitrarily outdated anyway.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.

Stefani Seibold

Nov 4, 2009, 7:00:02 AM

On Wednesday, 04.11.2009, at 12:17 +0100, Andi Kleen wrote:
> Stefani Seibold <ste...@seibold.net> writes:
> >
> > +#ifdef CONFIG_IA32_EMULATION
> > +unsigned long KSTK_ESP(struct task_struct *task)
> > +{
> > + return (test_tsk_thread_flag(task, TIF_IA32)) ? \
> > + (task_pt_regs(task)->sp) : \
> > + ((task)->thread.usersp);
>
> Usersp is only set for system calls, but not when the process is blocked
> in an interrupt.
>
> In general, if you really want a reliable implementation of this,
> you would need to stop the task and grab the stack pointer;
> otherwise it can be arbitrarily outdated anyway.
>

This is true, but I think it is better to get an outdated value than a
completely wrong value like -1.

The truth is that KSTK_ESP always returns an outdated value on a multi-core
system if the process never makes a system call.

But we can work on the next step and try to implement it better ;-)

Question: is task_pt_regs(task)->sp set in 64-bit mode when the process
is blocked in an interrupt? If so, we could add two additional assembly
instructions to the system call handler and store the stack pointer
there. Then KSTK_ESP would again be a simple macro like

#define KSTK_ESP(task) (task_pt_regs(task)->sp)

The drawback is that this will cost a little performance for a little
more accuracy.

Stefani

Andi Kleen

Nov 4, 2009, 7:10:01 AM

> This is true, but I think it is better to get an outdated value than a
> completely wrong value like -1.

-1 means "I don't know". I don't think "completely wrong"
is the correct term to describe that.

> The truth is that KSTK_ESP always returns an outdated value on a multi-core
> system if the process never makes a system call.

I think not supporting updates on interrupts at least is very poor.
Unfortunately, there's no good fast-path way to detect this that I know of
(that is why I originally added -1 here).

> Question: is task_pt_regs(task)->sp set in 64-bit mode when the process
> is blocked in an interrupt? If so, we could add two additional assembly
> instructions to the system call handler and store the stack pointer
> there. Then KSTK_ESP would again be a simple macro like

You want to add instructions to one of the hottest kernel paths
for this hyper-obscure application? Bad idea.

> The drawback is that this will cost a little performance for a little
> more accuracy.

As far as I can figure out, this whole proc hack is never accurate
anyway, because it can report arbitrarily outdated (or completely bogus,
if the process never did any system calls/interrupts) information.

My recommendation would be to just deprecate this proc field
and if anyone really wants that information they can use
a trivial ptrace() based user program.

-Andi

Stefani Seibold

Nov 4, 2009, 7:30:02 AM

On Wednesday, 04.11.2009, at 13:00 +0100, Andi Kleen wrote:
> > This is true, but I think it is better to get an outdated value than a
> > completely wrong value like -1.
>
> -1 means "I don't know". I don't think "completely wrong"
> is the correct term to describe that.
>
> > The truth is that KSTK_ESP always returns an outdated value on a multi-core
> > system if the process never makes a system call.
>
> I think not supporting updates on interrupts at least is very poor.
> Unfortunately, there's no good fast-path way to detect this that I know of
> (that is why I originally added -1 here).
>

I am sorry, I did not know that was your code. But anyway.

>
> > Question: is task_pt_regs(task)->sp set in 64-bit mode when the process
> > is blocked in an interrupt? If so, we could add two additional assembly
> > instructions to the system call handler and store the stack pointer
> > there. Then KSTK_ESP would again be a simple macro like
>
> You want to add instructions to one of the hottest kernel paths
> for this hyper-obscure application? Bad idea.
>

You complained that the value is outdated, and I told you how you could
get a more accurate value. I agree that this is a bad idea.

> My recommendation would be to just deprecate this proc field
> and if anyone really wants that information they can use
> a trivial ptrace() based user program.
>

I spent a lot of time on this; it would be nice to give it a chance and
fix the KSTK_ESP macro. It is not used only by my code. It would be
great if we could do this together.

You have the knowledge, so I will ask my questions again:
Is task_pt_regs(task)->sp set in 64-bit mode when the process is blocked
in an interrupt?
Is there a way to detect whether a process is blocked by an interrupt?

If the answer to both is yes, then I can fix KSTK_ESP without a performance
penalty for the rest of the system.

Stefani

Stefani Seibold

Nov 4, 2009, 10:50:02 AM

On Wednesday, 04.11.2009, at 13:00 +0100, Andi Kleen wrote:

> > This is true, but I think it is better to get an outdated value than a
> > completely wrong value like -1.
>
> -1 means "I don't know". I don't think "completely wrong"
> is the correct term to describe that.
>
> > The truth is that KSTK_ESP always returns an outdated value on a multi-core
> > system if the process never makes a system call.
>
> I think not supporting updates on interrupts at least is very poor.
> Unfortunately, there's no good fast-path way to detect this that I know of
> (that is why I originally added -1 here).

This is a first draft for supporting interrupts:

unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	/* high bit used in ret_from_ code */
	unsigned vector = ~regs->orig_ax;
	unsigned irq;

	exit_idle();
	irq_enter();

	/* >>>>>>>> update usersp */
	current->thread.usersp = regs->sp;

	irq = __get_cpu_var(vector_irq)[vector];

	if (!handle_irq(irq, regs)) {
		ack_APIC_irq();

		if (printk_ratelimit())
			pr_emerg("%s: %d.%d No irq handler for vector (irq %d)\n",
				__func__, smp_processor_id(), vector, irq);
	}

	irq_exit();

	set_irq_regs(old_regs);
	return 1;
}

This works in my environment, but I lack the overview to tell whether it
works under all circumstances. We also need a similar line in the timer
interrupt.

Greetings,
Stefani

Stefani Seibold

Nov 5, 2009, 6:40:56 AM

Hi Andi,

What do you think about this? The following patch implements a more
accurate KSTK_ESP. The task's usersp is updated in the device
and apic_timer interrupts. It would be easy to do the same in other
interrupts too. The performance penalty should be tiny.

--- linux-2.6.32-rc5.old/arch/x86/include/asm/processor.h 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/include/asm/processor.h 2009-11-04 23:02:53.705275836 +0100
@@ -1000,7 +1000,7 @@
#define thread_saved_pc(t) (*(unsigned long *)((t)->thread.sp - 8))

#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
-#define KSTK_ESP(tsk) -1 /* sorry. doesn't work for syscall. */
+extern unsigned long KSTK_ESP(struct task_struct *task);
#endif /* CONFIG_X86_64 */

extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
@@ -1052,4 +1052,12 @@
return ratio;
}

+#define update_usersp(regs) \
+({ \
+ unsigned long __stk__ = (unsigned long)task_stack_page(current); \
+ unsigned long __stkp__ = (regs)->sp; \
+ if ((__stkp__ < __stk__) || (__stkp__ >= __stk__ + THREAD_SIZE)) \
+ current->thread.usersp = __stkp__; \
+})
+
#endif /* _ASM_X86_PROCESSOR_H */
--- x/linux-2.6.32-rc5/arch/x86/kernel/process_64.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5/arch/x86/kernel/process_64.c 2009-11-03 10:11:11.202957393 +0100
@@ -664,3 +664,9 @@
return do_arch_prctl(current, code, addr);
}

+unsigned long KSTK_ESP(struct task_struct *task)
+{
+ return (test_tsk_thread_flag(task, TIF_IA32)) ? \
+ (task_pt_regs(task)->sp) : \
+ ((task)->thread.usersp);
+}
--- linux-2.6.32-rc5.old/arch/x86/kernel/irq_64.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/kernel/irq_64.c 2009-11-04 22:29:55.762951577 +0100
@@ -53,6 +53,7 @@
struct irq_desc *desc;

stack_overflow_check(regs);
+ update_usersp(regs);

desc = irq_to_desc(irq);
if (unlikely(!desc))
--- linux-2.6.32-rc5.old/arch/x86/kernel/apic/apic.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/kernel/apic/apic.c 2009-11-04 23:12:32.805086991 +0100
@@ -831,6 +831,9 @@
{
struct pt_regs *old_regs = set_irq_regs(regs);

+#ifndef CONFIG_X86_32
+ update_usersp(regs);
+#endif
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.

This works in my environment under load, but I lack the overview to tell
whether it works under all circumstances.
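
The filter in update_usersp() records regs->sp only when it lies outside the current task's kernel stack. A small user-space model of that range test makes the boundary cases explicit; MODEL_THREAD_SIZE is a made-up stand-in for THREAD_SIZE, and the function name is hypothetical. Note that the test only establishes "not this task's kernel stack": a stack pointer on a separate interrupt or exception stack passes it as well, which is the gap Andi points out in his reply.

```c
#include <stdbool.h>

/* Hypothetical value for illustration; the real THREAD_SIZE depends on
 * the architecture and kernel configuration. */
#define MODEL_THREAD_SIZE 8192UL

/* Mirrors the condition in update_usersp(): true when sp is NOT inside
 * the current task's kernel stack [stack_base, stack_base + size). */
static bool sp_outside_task_stack(unsigned long stack_base, unsigned long sp)
{
	return sp < stack_base || sp >= stack_base + MODEL_THREAD_SIZE;
}
```

The upper bound is exclusive, so sp_outside_task_stack(base, base + MODEL_THREAD_SIZE) is true, while any address from base up to the last byte below that limit counts as inside.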

Stefani

Andi Kleen

Nov 5, 2009, 6:41:31 AM

> +void update_usersp(struct pt_regs *regs)
> +{
> + unsigned long stk = (unsigned long)task_stack_page(current);
> + unsigned long stkp = (regs)->sp;
> +
> + if (((stkp < stk) || (stkp >= stk + THREAD_SIZE))
> + && regs->ip < PAGE_OFFSET)
> + percpu_write(old_rsp, stkp);

This does not handle interrupt and exception stacks correctly.

Also, regs->ip is never a safe check for running in user space,
because a program can set the IP to an arbitrary value for a
one-instruction window.

The larger problem is that if the kernel moves to no-tick-for-non-idle
(which I guess will happen sooner or later), your method won't
work anyway, or will again be arbitrarily inaccurate. Even today, a 10 ms
worst-case inaccuracy for HZ=100 is rather bad; there can be a lot of stack
allocations in that time. And adding new dependencies on a regular
timer when everything else is moving away from that doesn't seem right.

Also I suspect this method won't work on preempt-rt without
additional tweaks.

-Andi

Stefani Seibold

Nov 5, 2009, 6:41:56 AM

Hi,

this is an RFC for a more accurate KSTK_ESP implementation for the x86_64
architecture.

Because usersp is only updated on a context switch, this value
is outdated most of the time. This patch also updates the per-CPU variable
old_rsp in the device and timer interrupts.

In my opinion this can safely be done if the current stack pointer is
outside the kernel stack of the current task and the instruction pointer
is not inside the kernel.

The old_rsp value will be stored in usersp in case of a context switch.

KSTK_ESP takes the value from old_rsp if the task is the
current task; otherwise it reads usersp.
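
The selection rule just described can be modelled in user space as below. This is a sketch, not the kernel code: model_task, model_current and model_old_rsp are hypothetical stand-ins for struct task_struct, current and the per-CPU old_rsp, and only the three-way decision of the proposed KSTK_ESP() is reproduced.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state KSTK_ESP() consults in the RFC. */
struct model_task {
	bool ia32;			/* TIF_IA32 is set (32-bit compat task) */
	unsigned long pt_regs_sp;	/* task_pt_regs(task)->sp */
	unsigned long usersp;		/* thread.usersp, saved at context switch */
};

/* Per-CPU state of "this" CPU, modelled as plain globals. */
static struct model_task *model_current;	/* stands in for current */
static unsigned long model_old_rsp;		/* stands in for old_rsp */

/* Same three-way decision as the RFC's KSTK_ESP(). */
static unsigned long model_kstk_esp(const struct model_task *task)
{
	if (task->ia32)
		return task->pt_regs_sp;
	if (task != model_current)
		return task->usersp;
	return model_old_rsp;
}
```

For the running task, the model ignores its possibly stale usersp and returns model_old_rsp instead, mirroring the RFC's reasoning that usersp is only refreshed at a context switch.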

I know about the performance cost, which is why I am asking for comments.

Stefani

Signed-off-by: Stefani Seibold <ste...@seibold.net>

include/asm/processor.h | 4 +++-
kernel/apic/apic.c | 3 +++
kernel/irq_64.c | 1 +
kernel/process_64.c | 20 ++++++++++++++++++++
4 files changed, 27 insertions(+), 1 deletion(-)

--- linux-2.6.32-rc5.old/arch/x86/include/asm/processor.h 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/include/asm/processor.h 2009-11-05 08:28:23.765300812 +0100
@@ -1000,7 +1000,7 @@
#define thread_saved_pc(t) (*(unsigned long *)((t)->thread.sp - 8))

#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
-#define KSTK_ESP(tsk) -1 /* sorry. doesn't work for syscall. */
+extern unsigned long KSTK_ESP(struct task_struct *task);
#endif /* CONFIG_X86_64 */

extern void start_thread(struct pt_regs *regs, unsigned long new_ip,
@@ -1052,4 +1052,6 @@
return ratio;
}

+extern void update_usersp(struct pt_regs *regs);
+
#endif /* _ASM_X86_PROCESSOR_H */

--- linux-2.6.32-rc5.old/arch/x86/kernel/process_64.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/kernel/process_64.c 2009-11-05 08:52:39.965227285 +0100
@@ -664,3 +664,23 @@
return do_arch_prctl(current, code, addr);
}

+void update_usersp(struct pt_regs *regs)
+{
+ unsigned long stk = (unsigned long)task_stack_page(current);
+ unsigned long stkp = (regs)->sp;
+
+ if (((stkp < stk) || (stkp >= stk + THREAD_SIZE))
+ && regs->ip < PAGE_OFFSET)
+ percpu_write(old_rsp, stkp);
+}
+
+unsigned long KSTK_ESP(struct task_struct *task)
+{
+ if (test_tsk_thread_flag(task, TIF_IA32))
+ return task_pt_regs(task)->sp;
+
+ if (task != current)
+ return task->thread.usersp;
+
+ return percpu_read(old_rsp);
+}
--- linux-2.6.32-rc5.old/arch/x86/kernel/irq_64.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/kernel/irq_64.c 2009-11-04 22:29:55.762951577 +0100
@@ -53,6 +53,7 @@
struct irq_desc *desc;

stack_overflow_check(regs);
+ update_usersp(regs);

desc = irq_to_desc(irq);
if (unlikely(!desc))
--- linux-2.6.32-rc5.old/arch/x86/kernel/apic/apic.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/arch/x86/kernel/apic/apic.c 2009-11-04 23:12:32.805086991 +0100
@@ -831,6 +831,9 @@
{
struct pt_regs *old_regs = set_irq_regs(regs);

+#ifndef CONFIG_X86_32
+ update_usersp(regs);
+#endif
/*
* NOTE! We'd better ACK the irq immediately,
* because timer handling can be slow.

Stefani Seibold

Nov 5, 2009, 7:20:01 AM

On Thursday, 05.11.2009, at 12:08 +0100, Andi Kleen wrote:
> > +void update_usersp(struct pt_regs *regs)
> > +{
> > + unsigned long stk = (unsigned long)task_stack_page(current);
> > + unsigned long stkp = (regs)->sp;
> > +
> > + if (((stkp < stk) || (stkp >= stk + THREAD_SIZE))
> > + && regs->ip < PAGE_OFFSET)
> > + percpu_write(old_rsp, stkp);
>
> This does not handle interrupt and exception stacks correctly.
>
> Also, regs->ip is never a safe check for running in user space,
> because a program can set the IP to an arbitrary value for a
> one-instruction window.
>

I think this doesn't matter, because I only want to detect whether it is
safe to write the old_rsp value. There are only three places where this
value is written, and all of them are in the kernel. So checking whether
the ip is in the kernel and whether the stack pointer points outside the
kernel stack of the current task should be enough. Or am I wrong?

> The larger problem is that if the kernel moves to no-tick-for-non-idle
> (which I guess will happen sooner or later), your method won't
> work anyway, or will again be arbitrarily inaccurate. Even today, a 10 ms
> worst-case inaccuracy for HZ=100 is rather bad; there can be a lot of stack
> allocations in that time. And adding new dependencies on a regular
> timer when everything else is moving away from that doesn't seem right.

The value is correct on a uniprocessor system. On a multi-core
system it is a good approximation. A quick look at other
architectures shows that this is not an x86_64-specific issue. All
architectures give you only a snapshot.

But this is good enough for ps or /proc usage. If someone wants an exact
value, then it is necessary to stop the task.

>
> Also I suspect this method won't work on preempt-rt without
> additional tweaks.
>

That is another issue. Let us first agree on and fix the basics.

Stefani

Stefani Seibold

Nov 5, 2009, 8:10:02 AM

This patch fixes a small issue with the stack start value in /proc/<pid>/stat.
For a kernel thread (which has no mm), the printed value should be 0.

Signed-off-by: Stefani Seibold <ste...@seibold.net>
---

array.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.32-rc5.old/fs/proc/array.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/fs/proc/array.c 2009-11-05 13:46:58.770599144 +0100
@@ -571,7 +571,7 @@
rsslim,
mm ? mm->start_code : 0,
mm ? mm->end_code : 0,
- (permitted) ? task->stack_start : 0,
+ (permitted && mm) ? task->stack_start : 0,
esp,
eip,
/* The signal information here is obsolete.

Ingo Molnar

Nov 8, 2009, 6:40:02 AM

* Stefani Seibold <ste...@seibold.net> wrote:

Cleanliness: no need for that parenthesis.

> +
> + if (((stkp < stk) || (stkp >= stk + THREAD_SIZE))
> + && regs->ip < PAGE_OFFSET)
> + percpu_write(old_rsp, stkp);
> +}

That check for regs->ip looks imprecise; why don't you use
user_mode_vm()?

It's true that the value itself is statistical, but we still don't want
to leak a kernel-space regs->sp value; that would be an information leak.
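
As I understand the headers of this kernel generation, user_mode_vm() on x86_64 ends up as user_mode(), which decides from the requested privilege level in the low two bits of the saved CS selector rather than from the instruction pointer; on 32-bit it additionally accounts for vm86 mode. The sketch below models that test with a hypothetical cut-down register struct and the selector values x86_64 Linux normally uses (0x33 for 64-bit user code, 0x10 for kernel code). It also shows that a user-mode frame whose instruction pointer lies in the kernel range is still classified as user mode, the one-instruction window Andi mentions.

```c
#include <stdbool.h>

/* Hypothetical, cut-down view of the saved register frame. */
struct model_regs {
	unsigned long cs;	/* saved code segment selector */
	unsigned long ip;	/* saved instruction pointer */
	unsigned long sp;	/* saved stack pointer */
};

/* Models the x86_64 user_mode() test: a non-zero privilege level in
 * the low two bits of the saved CS means the CPU was running user code. */
static bool model_user_mode(const struct model_regs *regs)
{
	return (regs->cs & 3) != 0;
}
```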

Cleanliness: please eliminate this #ifdef by defining update_usersp() on
32-bit as well, as an empty inline function.

But I don't like this patch, because it adds overhead to the IRQ
fast path.

I'd suggest a completely different method: why don't you use an IPI to
sample the SP whenever someone wants to read it from /proc and we see
that the task is running on a CPU right now?

Ingo

Ingo Molnar

Nov 8, 2009, 8:00:06 AM

> Sounds like a challenge, I like the idea. I will have a look at it...

It's not a fastpath, so smp_function_call() ought to do the trick.

Stefani Seibold

Nov 8, 2009, 8:00:04 AM

On Sunday, 08.11.2009, at 12:35 +0100, Ingo Molnar wrote:
> * Stefani Seibold <ste...@seibold.net> wrote:
>
> > +
> > + if (((stkp < stk) || (stkp >= stk + THREAD_SIZE))
> > + && regs->ip < PAGE_OFFSET)
> > + percpu_write(old_rsp, stkp);
> > +}
>
> That check for regs->ip looks imprecise; why don't you use
> user_mode_vm()?
>
> It's true that the value itself is statistical, but we still don't want
> to leak a kernel-space regs->sp value; that would be an information leak.
>

Good idea. Much better ;-)

> > --- linux-2.6.32-rc5.old/arch/x86/kernel/irq_64.c 2009-10-16 02:41:50.000000000 +0200
> > +++ linux-2.6.32-rc5.new/arch/x86/kernel/irq_64.c 2009-11-04 22:29:55.762951577 +0100
> > @@ -53,6 +53,7 @@
> > struct irq_desc *desc;
> >
> > stack_overflow_check(regs);
> > + update_usersp(regs);
> >
> > desc = irq_to_desc(irq);
> > if (unlikely(!desc))
> > --- linux-2.6.32-rc5.old/arch/x86/kernel/apic/apic.c 2009-10-16 02:41:50.000000000 +0200
> > +++ linux-2.6.32-rc5.new/arch/x86/kernel/apic/apic.c 2009-11-04 23:12:32.805086991 +0100
> > @@ -831,6 +831,9 @@
> > {
> > struct pt_regs *old_regs = set_irq_regs(regs);
> >
> > +#ifndef CONFIG_X86_32
> > + update_usersp(regs);
> > +#endif
>
> Cleanliness: please eliminate this #ifdef by defining update_usersp() on
> 32-bit as well, as an empty inline function.
>
> But I don't like this patch, because it adds overhead to the IRQ
> fast path.
>

Agreed, but I saw no other way.

> I'd suggest a completely different method: why don't you use an IPI to
> sample the SP whenever someone wants to read it from /proc and we see
> that the task is running on a CPU right now?
>

Sounds like a challenge, I like the idea. I will have a look at it...

Stefani

Stefani Seibold

Nov 8, 2009, 9:10:02 AM

On Sunday, 08.11.2009, at 13:55 +0100, Ingo Molnar wrote:
> * Stefani Seibold <ste...@seibold.net> wrote:
>
> > > I'd suggest a completely different method: why don't you use an IPI to
> > > sample the SP whenever someone wants to read it from /proc and we
> > > see that the task is running on a CPU right now?
> >
> > Sounds like a challenge, I like the idea. I will have a look at it...
>
> It's not a fastpath, so smp_function_call() ought to do the trick.
>
> Ingo

There is no function smp_function_call()...

H. Peter Anvin

Nov 8, 2009, 11:40:01 AM

On 11/08/2009 06:00 AM, Stefani Seibold wrote:
> On Sunday, 08.11.2009, at 13:55 +0100, Ingo Molnar wrote:
>> * Stefani Seibold <ste...@seibold.net> wrote:
>>
>>>> I'd suggest a completely different method: why don't you use an IPI to
>>>> sample the SP whenever someone wants to read it from /proc and we
>>>> see that the task is running on a CPU right now?
>>>
>>> Sounds like a challenge, I like the idea. I will have a look at it...
>>
>> It's not a fastpath, so smp_function_call() ought to do the trick.
>
> There is no function smp_function_call()...
>

smp_call_function().

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Andi Kleen

Nov 8, 2009, 2:40:02 PM

Ingo Molnar <mi...@elte.hu> writes:
>
> I'd suggest a completely different method: why don't you use an IPI to
> sample the SP whenever someone wants to read it from /proc and we see
> that the task is running on a CPU right now?

Unfortunately, most of /proc tends to turn into a fast path when you run
the right monitoring tools, which sometimes poll /proc at quite high
frequencies.

I suspect you'll slow something down significantly with this approach.

The only good way to avoid that would be to use a separate file, but again,
if someone really wants it, they can just as well use a ptrace()-based
program from user space.

-Andi

Stefani Seibold

Nov 13, 2009, 3:10:02 AM

This patch fixes a small issue with the stack start value in /proc/<pid>/stat.

For a kernel thread (which has no mm), the printed value should be 0.

Signed-off-by: Stefani Seibold <ste...@seibold.net>
---

array.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- linux-2.6.32-rc5.old/fs/proc/array.c 2009-10-16 02:41:50.000000000 +0200
+++ linux-2.6.32-rc5.new/fs/proc/array.c 2009-11-05 13:46:58.770599144 +0100
@@ -571,7 +571,7 @@
rsslim,
mm ? mm->start_code : 0,
mm ? mm->end_code : 0,
- (permitted) ? task->stack_start : 0,
+ (permitted && mm) ? task->stack_start : 0,
esp,
eip,
/* The signal information here is obsolete.