
[BUG] fault while using perf callchains in sparc64


Frederic Weisbecker

Mar 28, 2010, 12:40:01 AM
Hi,

I get kernel crashes each time I use perf with callchains
on sparc 64.

It triggers with a simple:

perf record -a -f -g sleep 1


I'm attaching two different crashlogs, as it seems to happen
randomly, and also my config.

Tell me whatever else you need.

Thanks.

crashlog1
crashlog2
sparc-config

David Miller

Mar 28, 2010, 10:10:02 PM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Sun, 28 Mar 2010 06:34:49 +0200

> I get kernel crashes each time I use perf with callchains
> on sparc 64.
>
> It triggers with a simple:
>
> perf record -a -f -g sleep 1
>
> I'm attaching two different crashlogs, as it seems to happen
> randomly, and also my config.

Is your 'perf' a 64-bit or 32-bit binary? How about the
'sleep' binary?

Frederic Weisbecker

Mar 28, 2010, 11:40:01 PM
On Sun, Mar 28, 2010 at 07:02:35PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fwei...@gmail.com>
> Date: Sun, 28 Mar 2010 06:34:49 +0200
>
> > I get kernel crashes each time I use perf with callchains
> > on sparc 64.
> >
> > It triggers with a simple:
> >
> > perf record -a -f -g sleep 1
> >
> > I'm attaching two different crashlogs, as it seems to happen
> > randomly, and also my config.
>
> Is your 'perf' a 64-bit or 32-bit binary? How about the
> 'sleep' binary?


Both are 32-bit binaries.

David Miller

Mar 29, 2010, 4:00:01 PM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Sun, 28 Mar 2010 06:34:49 +0200

> It triggers with a simple:
>
> perf record -a -f -g sleep 1

I can reproduce this. Thanks for the report, I'm fixing it now.

David Miller

Mar 29, 2010, 4:10:02 PM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Sun, 28 Mar 2010 06:34:49 +0200

> I get kernel crashes each time I use perf with callchains
> on sparc 64.
>
> It triggers with a simple:
>
> perf record -a -f -g sleep 1

This should fix it, thanks again.

sparc64: Properly truncate pt_regs framepointer in perf callback.

For 32-bit processes, we save the full 64 bits of the regs in pt_regs.

But unlike when userspace actually executes load and store
instructions, the top 32 bits don't get automatically truncated by the
CPU in kernel mode (because the kernel doesn't execute with PSTATE_AM
address masking enabled).

So we have to do it by hand.

Reported-by: Frederic Weisbecker <fwei...@gmail.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
---
arch/sparc/kernel/perf_event.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/sparc/kernel/perf_event.c b/arch/sparc/kernel/perf_event.c
index 9f2b2ba..610112e 100644
--- a/arch/sparc/kernel/perf_event.c
+++ b/arch/sparc/kernel/perf_event.c
@@ -1337,7 +1337,7 @@ static void perf_callchain_user_32(struct pt_regs *regs,
 	callchain_store(entry, PERF_CONTEXT_USER);
 	callchain_store(entry, regs->tpc);

-	ufp = regs->u_regs[UREG_I6];
+	ufp = regs->u_regs[UREG_I6] & 0xffffffffUL;
 	do {
 		struct sparc_stackf32 *usf, sf;
 		unsigned long pc;
--
1.7.0.3

Frederic Weisbecker

Mar 29, 2010, 4:50:02 PM
On Mon, Mar 29, 2010 at 01:09:31PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fwei...@gmail.com>
> Date: Sun, 28 Mar 2010 06:34:49 +0200
>
> > I get kernel crashes each time I use perf with callchains
> > on sparc 64.
> >
> > It triggers with a simple:
> >
> > perf record -a -f -g sleep 1
>
> This should fix it, thanks again.


I merged your tree on latest -git and it works well.

Thanks!

Sorry, I have another bug report.

While building perf tools, or the kernel, or whatever, I often
get the following error in the middle:

gcc: Internal error: Segmentation fault (program as)

And this in the logs:

[ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
error 30001 in as[10000+40000]

My gcc, as, and everything else in userspace are 32-bit, but the kernel is 64-bit.

My config is the same as before.

Again, tell me whatever you need to help debug this.

Thanks.

David Miller

Mar 29, 2010, 5:10:02 PM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Mon, 29 Mar 2010 22:49:33 +0200

> While building perf tools, or the kernel, or whatever, I often
> get the following error in the middle:
>
> gcc: Internal error: Segmentation fault (program as)
>
> And this in the logs:
>
> [ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
> error 30001 in as[10000+40000]

What distribution and binutils are you using?

Frederic Weisbecker

Mar 29, 2010, 5:20:01 PM
On Mon, Mar 29, 2010 at 02:01:31PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fwei...@gmail.com>
> Date: Mon, 29 Mar 2010 22:49:33 +0200
>
> > While building perf tools, or the kernel, or whatever, I often
> > get the following error in the middle:
> >
> > gcc: Internal error: Segmentation fault (program as)
> >
> > And this in the logs:
> >
> > [ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
> > error 30001 in as[10000+40000]
>
> What distribution and binutils are you using?


It's Debian lenny, with binutils 2.18.1~cvs20080103-7.

David Miller

Mar 29, 2010, 5:20:02 PM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Mon, 29 Mar 2010 23:11:50 +0200

> On Mon, Mar 29, 2010 at 02:01:31PM -0700, David Miller wrote:
>> From: Frederic Weisbecker <fwei...@gmail.com>
>> Date: Mon, 29 Mar 2010 22:49:33 +0200
>>
>> > While building perf tools, or the kernel, or whatever, I often
>> > get the following error in the middle:
>> >
>> > gcc: Internal error: Segmentation fault (program as)
>> >
>> > And this in the logs:
>> >
>> > [ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
>> > error 30001 in as[10000+40000]
>>
>> What distribution and binutils are you using?
>
> It's Debian lenny, with binutils 2.18.1~cvs20080103-7.

I'm using the same here on some boxes, what kind of machine is this?

Frederic Weisbecker

Mar 29, 2010, 5:30:03 PM
On Mon, Mar 29, 2010 at 02:19:20PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fwei...@gmail.com>
> Date: Mon, 29 Mar 2010 23:11:50 +0200
>
> > On Mon, Mar 29, 2010 at 02:01:31PM -0700, David Miller wrote:
> >> From: Frederic Weisbecker <fwei...@gmail.com>
> >> Date: Mon, 29 Mar 2010 22:49:33 +0200
> >>
> >> > While building perf tools, or the kernel, or whatever, I often
> >> > get the following error in the middle:
> >> >
> >> > gcc: Internal error: Segmentation fault (program as)
> >> >
> >> > And this in the logs:
> >> >
> >> > [ 1429.477049] as[2658]: segfault at 4054dfa8 ip 0000000000020690 (rpc 00000000700adcf4) sp 00000000ffcbf008
> >> > error 30001 in as[10000+40000]
> >>
> >> What distribution and binutils are you using?
> >
> > It's Debian lenny, with binutils 2.18.1~cvs20080103-7.
>
> I'm using the same here on some boxes, what kind of machine is this?


It's a Niagara 2 based one.

David Miller

Mar 29, 2010, 6:10:02 PM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Mon, 29 Mar 2010 23:28:42 +0200

> It's a Niagara 2 based one.

Strange, that's what I do all of my main sparc64 kernel
work on too. I've never seen these spurious 'as' crashes.

Hmmmm, what does "ldd /usr/bin/as" give you?

Thanks.

Frederic Weisbecker

Mar 29, 2010, 6:30:03 PM
On Mon, Mar 29, 2010 at 03:02:53PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fwei...@gmail.com>
> Date: Mon, 29 Mar 2010 23:28:42 +0200
>
> > It's a Niagara 2 based one.
>
> Strange, that's what I do all of my main sparc64 kernel
> work on too. I've never seen these spurious 'as' crashes.
>
> Hmmmm, what does "ldd /usr/bin/as" give you?
>
> Thanks.


$ ldd /usr/bin/as
libopcodes-2.18.0.20080103.so => /usr/lib/libopcodes-2.18.0.20080103.so (0xf7ec4000)
libbfd-2.18.0.20080103.so => /usr/lib/libbfd-2.18.0.20080103.so (0xf7e14000)
libc.so.6 => /lib/libc.so.6 (0xf7ca0000)
/lib/ld-linux.so.2 (0xf7efc000)

The last kernel I know of that doesn't have such problems is 2.6.31-rc6.
Maybe I should bisect?

David Miller

Mar 29, 2010, 6:40:01 PM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Tue, 30 Mar 2010 00:21:34 +0200

> $ ldd /usr/bin/as
> libopcodes-2.18.0.20080103.so => /usr/lib/libopcodes-2.18.0.20080103.so (0xf7ec4000)
> libbfd-2.18.0.20080103.so => /usr/lib/libbfd-2.18.0.20080103.so (0xf7e14000)
> libc.so.6 => /lib/libc.so.6 (0xf7ca0000)
> /lib/ld-linux.so.2 (0xf7efc000)

Ok, same here.

> The last kernel I know of that doesn't have such problems is 2.6.31-rc6.
> Maybe I should bisect?

Hmmm, since you know a good and bad point, yes a bisect
might be the best way to proceed here.

It might be quicker if you first test 2.6.32 and 2.6.33
and then use the results of that to guide your bisect.

Anyways, if you narrow it down to a commit I should be
able to fix this quickly.

Thanks!

Frederic Weisbecker

Apr 1, 2010, 4:10:01 AM
On Mon, Mar 29, 2010 at 03:32:08PM -0700, David Miller wrote:
> From: Frederic Weisbecker <fwei...@gmail.com>
> Date: Tue, 30 Mar 2010 00:21:34 +0200
>
> > $ ldd /usr/bin/as
> > libopcodes-2.18.0.20080103.so => /usr/lib/libopcodes-2.18.0.20080103.so (0xf7ec4000)
> > libbfd-2.18.0.20080103.so => /usr/lib/libbfd-2.18.0.20080103.so (0xf7e14000)
> > libc.so.6 => /lib/libc.so.6 (0xf7ca0000)
> > /lib/ld-linux.so.2 (0xf7efc000)
>
> Ok, same here.
>
> > The last kernel I know of that doesn't have such problems is 2.6.31-rc6.
> > Maybe I should bisect?
>
> Hmmm, since you know a good and bad point, yes a bisect
> might be the best way to proceed here.
>
> It might be quicker if you first test 2.6.32 and 2.6.33
> and then use the results of that to guide your bisect.
>
> Anyways, if you narrow it down to a commit I should be
> able to fix this quickly.
>
> Thanks!


I actually can't. It works well on a backup 2.6.31-rc6 kernel
but when I build a new one of this same version, the problem
happens again. And I don't have the config of the one that works
(and no /proc/config.gz either).

So I suspect this is something that happens with some specific
configs only.

Anyway, once I get more clues about this, I'll tell you.

Thanks.

David Miller

Apr 1, 2010, 4:10:01 AM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Thu, 1 Apr 2010 11:06:11 +0200

> I actually can't. It works well on a backup 2.6.31-rc6 kernel
> but when I build a new one of this same version, the problem
> happens again. And I don't have the config of the one that works
> (and no /proc/config.gz either).
>
> So I suspect this is something that happens with some specific
> configs only.
>
> Anyway, once I get more clues about this, I'll tell you.

I was going to ask you if any of your compiler tools changed
recently...

Check the gcc version printed by the working kernel at the top of the
dmesg logs and compare to what you end up using now.

Frederic Weisbecker

Apr 1, 2010, 4:50:02 AM
On Thu, Apr 01, 2010 at 01:09:03AM -0700, David Miller wrote:
> From: Frederic Weisbecker <fwei...@gmail.com>
> Date: Thu, 1 Apr 2010 11:06:11 +0200
>
> > I actually can't. It works well on a backup 2.6.31-rc6 kernel
> > but when I build a new one of this same version, the problem
> > happens again. And I don't have the config of the one that works
> > (and no /proc/config.gz either).
> >
> > So I suspect this is something that happens with some specific
> > configs only.
> >
> > Anyway, once I get more clues about this, I'll tell you.
>
> I was going to ask you if any of your compiler tools changed
> recently...
>
> Check the gcc version printed by the working kernel at the top of the
> dmesg logs and compare to what you end up using now.


They are exactly the same :)

gcc version 4.3.2 (Debian 4.3.2-1.1)

Really, I think I need to dig further as I don't have useful
clues to provide. I need to check if the segfault always happens
in the same place, etc.

It seems to happen with ld as well btw (not sure this is related
though):

[ 3366.005962] ld[19041]: segfault at 10 ip 000000007010248c (rpc 00000000701023f8) sp 00000000ffda87c8 error 30001
in libbfd-2.18.0.20080103.so[700d8000+a0000]

David Miller

Apr 1, 2010, 5:10:01 AM
From: Frederic Weisbecker <fwei...@gmail.com>
Date: Thu, 1 Apr 2010 11:38:33 +0200

> It seems to happen with ld as well btw (not sure this is related
> though):
>
> [ 3366.005962] ld[19041]: segfault at 10 ip 000000007010248c (rpc 00000000701023f8) sp 00000000ffda87c8 error 30001
> in libbfd-2.18.0.20080103.so[700d8000+a0000]

My guess is that it's data corruption coming either from the kernel
or from something malfunctioning in libc, more likely the kernel.
