I get kernel crashes each time I use perf with callchains
on sparc 64.
It triggers with a simple:
perf record -a -f -g sleep 1
I'm attaching two different crash logs, as it seems to happen
randomly, and also my config.
Tell me everything else you need.
Thanks.
> I get kernel crashes each time I use perf with callchains
> on sparc 64.
>
> It triggers with a simple:
>
> perf record -a -f -g sleep 1
>
> I'm attaching two different crash logs, as it seems to happen
> randomly, and also my config.
Is your 'perf' a 64-bit or 32-bit binary? How about the
'sleep' binary?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Both are 32-bit binaries.
> It triggers with a simple:
>
> perf record -a -f -g sleep 1
I can reproduce this. Thanks for the report, I'm fixing it now.
> I get kernel crashes each time I use perf with callchains
> on sparc 64.
>
> It triggers with a simple:
>
> perf record -a -f -g sleep 1
This should fix it, thanks again.
sparc64: Properly truncate pt_regs framepointer in perf callback.
For 32-bit processes, we save the full 64-bits of the regs in pt_regs.
But unlike when the userspace actually does load and store
instructions, the top 32-bits don't get automatically truncated by the
cpu in kernel mode (because the kernel doesn't execute with PSTATE_AM
address masking enabled).
So we have to do it by hand.
Reported-by: Frederic Weisbecker <fwei...@gmail.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
arch/sparc/kernel/perf_event.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 9f2b2ba..610112e 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1337,7 +1337,7 @@ static void perf_callchain_user_32(struct pt_regs *regs,
 	callchain_store(entry, PERF_CONTEXT_USER);
 	callchain_store(entry, regs->tpc);
-	ufp = regs->u_regs[UREG_I6];
+	ufp = regs->u_regs[UREG_I6] & 0xffffffffUL;
 	do {
 		struct sparc_stackf32 *usf, sf;
 		unsigned long pc;
--
1.7.0.3
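The masking done by the patch above can be illustrated outside the kernel. The following is only a minimal user-space sketch, not the kernel code: the helper name `truncate_to_32()` is hypothetical, and it simply assumes a saved 64-bit register value whose upper half may contain stale bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): keep only the low 32 bits
 * of a register value that was saved as a full 64-bit quantity.
 * A 32-bit process only ever sees the low half, because the CPU masks
 * addresses for it; code running without that masking has to do the
 * same thing by hand before using the value as a user address.
 */
static uint64_t truncate_to_32(uint64_t saved_reg)
{
	return saved_reg & UINT64_C(0xffffffff);
}
```

For example, a made-up saved value of 0xdeadbeefffcbf008 would be reduced to 0xffcbf008, which is the address the 32-bit process itself would have used.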
I merged your tree on latest -git and it works well.
Thanks!
Sorry, I have another bug report.
While building perf tools, or the kernel, or whatever, I often
get the following error in the middle:
gcc: Internal error: Segmentation fault (program as)
And this in the logs:
[ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
error 30001 in as[10000+40000]
My gcc / as and everything else in userspace is 32-bit, but the kernel is 64-bit.
My config is the same as before.
Again, tell me everything you need to help debug this.
Thanks.
> While building perf tools, or the kernel, or whatever, I often
> get the following error in the middle:
>
> gcc: Internal error: Segmentation fault (program as)
>
> And this in the logs:
>
> [ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
> error 30001 in as[10000+40000]
What distribution and binutils are you using?
It's Debian lenny, with binutils 2.18.1~cvs20080103-7.
> On Mon, Mar 29, 2010 at 02:01:31PM -0700, David Miller wrote:
>> From: Frederic Weisbecker <fwei...@gmail.com>
>> Date: Mon, 29 Mar 2010 22:49:33 +0200
>>
>> > While building perf tools, or the kernel, or whatever, I often
>> > get the following error in the middle:
>> >
>> > gcc: Internal error: Segmentation fault (program as)
>> >
>> > And this in the logs:
>> >
>> > [ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
>> > error 30001 in as[10000+40000]
>>
>> What distribution and binutils are you using?
>
> It's Debian lenny, with binutils 2.18.1~cvs20080103-7.
I'm using the same here on some boxes. What kind of machine is this?
It's a Niagara 2 based one.
> It's a Niagara 2 based one.
Strange, that's what I do all of my main sparc64 kernel
work on too. I've never seen these spurious 'as' crashes.
Hmmmm, what does "ldd /usr/bin/as" give you?
Thanks.
$ ldd /usr/bin/as
libopcodes-2.18.0.20080103.so => /usr/lib/libopcodes-2.18.0.20080103.so (0xf7ec4000)
libbfd-2.18.0.20080103.so => /usr/lib/libbfd-2.18.0.20080103.so (0xf7e14000)
libc.so.6 => /lib/libc.so.6 (0xf7ca0000)
/lib/ld-linux.so.2 (0xf7efc000)
The last kernel I know of that doesn't have such problems is 2.6.31-rc6.
Maybe I should bisect?
> $ ldd /usr/bin/as
> libopcodes-2.18.0.20080103.so => /usr/lib/libopcodes-2.18.0.20080103.so (0xf7ec4000)
> libbfd-2.18.0.20080103.so => /usr/lib/libbfd-2.18.0.20080103.so (0xf7e14000)
> libc.so.6 => /lib/libc.so.6 (0xf7ca0000)
> /lib/ld-linux.so.2 (0xf7efc000)
Ok, same here.
> The last kernel I know of that doesn't have such problems is 2.6.31-rc6.
> Maybe I should bisect?
Hmmm, since you know a good and bad point, yes a bisect
might be the best way to proceed here.
It might be quicker if you first test 2.6.32 and 2.6.33
and then use the results of that to guide your bisect.
Anyways, if you narrow it down to a commit I should be
able to fix this quickly.
Thanks!
I actually can't. It works well on a backup 2.6.31-rc6 kernel
but when I build a new one of this same version, the problem
happens again. And I don't have the config of the one that works
(and no /proc/config.gz as well).
So I suspect this is something that happens with some specific
configs only.
Anyway, once I get more clues about this, I'll tell you.
Thanks.
> I actually can't. It works well on a backup 2.6.31-rc6 kernel
> but when I build a new one of this same version, the problem
> happens again. And I don't have the config of the one that works
> (and no /proc/config.gz as well).
>
> So I suspect this is something that happens with some specific
> configs only.
>
> Anyway, once I get more clues about this, I'll tell you.
I was going to ask you if any of your compiler tools changed
recently...
Check the gcc version printed by the working kernel at the top of the
dmesg logs and compare to what you end up using now.
They are exactly the same :)
gcc version 4.3.2 (Debian 4.3.2-1.1)
Really, I think I need to dig further, as I don't have useful
clues to provide. I need to check whether the segfault always happens
in the same place, etc...
It seems to happen with ld as well btw (not sure this is related
though):
[ 3366.005962] ld[19041]: segfault at 10 ip 000000007010248c (rpc 00000000701023f8) sp 00000000ffda87c8 error 30001
in libbfd-2.18.0.20080103.so[700d8000+a0000]
> It seems to happen with ld as well btw (not sure this is related
> though):
>
> [ 3366.005962] ld[19041]: segfault at 10 ip 000000007010248c (rpc 00000000701023f8) sp 00000000ffda87c8 error 30001
> in libbfd-2.18.0.20080103.so[700d8000+a0000]
My guess is that it's data corruption coming either from the kernel or
from something malfunctioning in libc, more likely the kernel.
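To check whether the faults always land in the same place, the segfault lines quoted in this thread can be reduced to an offset inside the mapped object. The sketch below assumes the bracketed pair in the log (for example `as[10000+40000]`) is the mapping's base address and size in hex; the struct and function names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the "[base+size]" pair from the log line. */
struct mapping {
	uint64_t base;	/* start address of the mapping */
	uint64_t size;	/* length of the mapping in bytes */
};

/*
 * If ip lies inside [base, base + size), store its offset from the
 * start of the mapping in *off and return true; otherwise return false
 * and leave *off untouched.
 */
static bool ip_offset_in_mapping(uint64_t ip, struct mapping m, uint64_t *off)
{
	if (ip < m.base || ip - m.base >= m.size)
		return false;
	*off = ip - m.base;
	return true;
}
```

With the values reported above, ip 0x20690 in `as[10000+40000]` gives offset 0x10690, and ip 0x7010248c in `libbfd-2.18.0.20080103.so[700d8000+a0000]` gives offset 0x2a48c. Comparing such offsets across several crashes would show whether the fault site is stable or random.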