
Re: 2.6.34-rc2 - crash on shutdown


Clemens Ladisch

Mar 23, 2010, 8:10:02 AM
David R wrote:
> I recently upgraded my kernel to the .34 rc, and I'm getting the
> following crash on shutdown (see attached console image)

Me too, also in amd_pmu_cpu_offline().

The only pointer access in this function is cpuhw->amd_nb, but
I don't see any obvious bugs.


Regards,
Clemens
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Stephane Eranian

Mar 23, 2010, 9:40:02 AM
On Tue, Mar 23, 2010 at 1:02 PM, Clemens Ladisch <cle...@ladisch.de> wrote:
> David R wrote:
>> I recently upgraded my kernel to the .34 rc, and I'm getting the
>> following crash on shutdown (see attached console image)
>
> Me too, also in amd_pmu_cpu_offline().
>
> The only pointer access in this function is cpuhw->amd_nb, but
> I don't see any obvious bugs.
>
>
I reported a problem with the AMD initialization just last week.
There is an issue with amd_pmu_cpu_online() which gets called
too early, and thus fails. That leaves some bogus state and causes
a crash in amd_pmu_cpu_offline().

I proposed a fix which was rejected. The alternative involves moving
some of the CPU initialization code (on AMD) to an earlier position, i.e.,
one where it would be executed before the CPU_STARTED notifier. Nobody
has proposed anything else so far.


> Regards,
> Clemens
>

--
Stephane Eranian  | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00

Clemens Ladisch

Mar 23, 2010, 10:00:02 AM
Stephane Eranian wrote:
> On Tue, Mar 23, 2010 at 1:02 PM, Clemens Ladisch <cle...@ladisch.de> wrote:
> > The only pointer access in this function is cpuhw->amd_nb, but
> > I don't see any obvious bugs.
>
> I reported a problem with the AMD initialization just last week.
> There is an issue with amd_pmu_cpu_online() which gets called
> too early, and thus fails. That leaves some bogus state and causes
> a crash in amd_pmu_cpu_offline().
>
> I proposed a fix which was rejected. The alternative involves moving
> some of the CPU initialization code (on AMD) to an earlier position, i.e.,
> one where it would be executed before the CPU_STARTED notifier. Nobody
> has proposed anything else so far.

I don't know about the early bootmem stuff, but regardless of this issue,
if amd_pmu_cpu_online() can fail, then amd_pmu_cpu_offline() must be able
to handle this without blowing up. Something like this (untested):

Signed-off-by: Clemens Ladisch <cle...@ladisch.de>

--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -324,17 +324,17 @@ static void amd_pmu_cpu_online(int cpu)
 	if (boot_cpu_data.x86_max_cores < 2)
 		return;

+	cpu1 = &per_cpu(cpu_hw_events, cpu);
+	cpu1->amd_nb = NULL;
+
 	/*
 	 * function may be called too early in the
 	 * boot process, in which case nb_id is bogus
 	 */
 	nb_id = amd_get_nb_id(cpu);
 	if (nb_id == BAD_APICID)
 		return;

-	cpu1 = &per_cpu(cpu_hw_events, cpu);
-	cpu1->amd_nb = NULL;
-
 	raw_spin_lock(&amd_nb_lock);

 	for_each_online_cpu(i) {
@@ -370,7 +370,7 @@ static void amd_pmu_cpu_offline(int cpu)

 	raw_spin_lock(&amd_nb_lock);

-	if (--cpuhw->amd_nb->refcnt == 0)
+	if (cpuhw->amd_nb && --cpuhw->amd_nb->refcnt == 0)
 		kfree(cpuhw->amd_nb);

 	cpuhw->amd_nb = NULL;
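
For readers who want to see the idea outside the kernel, here is a rough user-space sketch of the same two changes: clear the pointer before any early return in the online path, and guard the refcount drop in the offline path. Everything in it (struct nb_model, struct cpu_model, model_online(), model_offline(), run_scenario()) is a made-up stand-in for illustration, not the real perf_event_amd.c code, and it ignores locking entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct nb_model {
	int refcnt;
};

struct cpu_model {
	struct nb_model *nb;	/* plays the role of cpuhw->amd_nb */
};

/*
 * Models the online path with the reordering from the patch: clear the
 * pointer before any early return, so a failed call leaves NULL behind.
 */
static void model_online(struct cpu_model *c, struct nb_model *shared, int id_valid)
{
	c->nb = NULL;
	if (!id_valid)
		return;		/* "called too early": nothing gets attached */
	c->nb = shared;
	shared->refcnt++;
}

/* Models the offline path with the NULL guard; returns 1 if it freed the node. */
static int model_offline(struct cpu_model *c)
{
	int freed = 0;

	if (c->nb && --c->nb->refcnt == 0) {
		free(c->nb);
		freed = 1;
	}
	c->nb = NULL;
	return freed;
}

/* One CPU fails to come online, two share a node; returns the number of frees. */
int run_scenario(void)
{
	struct nb_model stale = { .refcnt = 0 };	/* leftover state, must never be freed */
	struct nb_model *shared = calloc(1, sizeof(*shared));
	struct cpu_model cpu0 = { .nb = &stale }, cpu1 = { NULL }, cpu2 = { NULL };
	int frees = 0;

	if (!shared)
		return -1;

	model_online(&cpu0, shared, 0);	/* fails, pointer is reset to NULL */
	model_online(&cpu1, shared, 1);
	model_online(&cpu2, shared, 1);
	assert(cpu0.nb == NULL && shared->refcnt == 2);

	frees += model_offline(&cpu0);	/* no-op instead of a bad dereference */
	frees += model_offline(&cpu1);
	frees += model_offline(&cpu2);	/* last user releases the node */
	return frees;
}
```

In this model the CPU whose online step bailed out goes through the offline step as a no-op, and the shared node is released exactly once by the last CPU that actually attached to it. It only shows why the offline path stops blowing up; it says nothing about whether counters work on a CPU that never attached.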

Rafael J. Wysocki

Mar 23, 2010, 6:20:02 PM
On Tuesday 23 March 2010, Clemens Ladisch wrote:
> Stephane Eranian wrote:
> > On Tue, Mar 23, 2010 at 1:02 PM, Clemens Ladisch <cle...@ladisch.de> wrote:
> > > The only pointer access in this function is cpuhw->amd_nb, but
> > > I don't see any obvious bugs.
> >
> > I reported a problem with the AMD initialization just last week.
> > There is an issue with amd_pmu_cpu_online() which gets called
> > too early, and thus fails. That leaves some bogus state and causes
> > a crash in amd_pmu_cpu_offline().
> >
> > I proposed a fix which was rejected. The alternative involves moving
> > some of the CPU initialization code (on AMD) to an earlier position, i.e.,
> > one where it would be executed before the CPU_STARTED notifier. Nobody
> > has proposed anything else so far.
>
> I don't know about the early bootmem stuff, but regardless of this issue,
> if amd_pmu_cpu_online() can fail, then amd_pmu_cpu_offline() must be able
> to handle this without blowing up. Something like this (untested):

I guess we handle that already:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a90110c61073eab95d1986322693c2b9a8a6a5f6

Rafael

Stephane Eranian

Mar 23, 2010, 6:50:02 PM
On Tue, Mar 23, 2010 at 11:18 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> On Tuesday 23 March 2010, Clemens Ladisch wrote:
>> Stephane Eranian wrote:
>> > On Tue, Mar 23, 2010 at 1:02 PM, Clemens Ladisch <cle...@ladisch.de> wrote:
>> > > The only pointer access in this function is cpuhw->amd_nb, but
>> > > I don't see any obvious bugs.
>> >
>> > I reported a problem with the AMD initialization just last week.
>> > There is an issue with amd_pmu_cpu_online() which gets called
>> > too early, and thus fails. That leaves some bogus state and causes
>> > a crash in amd_pmu_cpu_offline().
>> >
>> > I proposed a fix which was rejected. The alternative involves moving
>> > some of the CPU initialization code (on AMD) to an earlier position, i.e.,
>> > one where it would be executed before the CPU_STARTED notifier. Nobody
>> > has proposed anything else so far.
>>
>> I don't know about the early bootmem stuff, but regardless of this issue,
>> if amd_pmu_cpu_online() can fail, then amd_pmu_cpu_offline() must be able
>> to handle this without blowing up.  Something like this (untested):
>
> I guess we handle that already:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a90110c61073eab95d1986322693c2b9a8a6a5f6
>
Ok, the fix avoids the crash but perf_events support for AMD is still broken.

The root of the problem is elsewhere as I pointed out last week. Peter proposed
a patch today and I think this would be enough to avoid the crash and have
perf_events working again on AMD.

Rafael J. Wysocki

Mar 23, 2010, 7:30:02 PM
On Tuesday 23 March 2010, Stephane Eranian wrote:
> On Tue, Mar 23, 2010 at 11:18 PM, Rafael J. Wysocki <r...@sisk.pl> wrote:
> > On Tuesday 23 March 2010, Clemens Ladisch wrote:
> >> Stephane Eranian wrote:
> >> > On Tue, Mar 23, 2010 at 1:02 PM, Clemens Ladisch <cle...@ladisch.de> wrote:
> >> > > The only pointer access in this function is cpuhw->amd_nb, but
> >> > > I don't see any obvious bugs.
> >> >
> >> > I reported a problem with the AMD initialization just last week.
> >> > There is an issue with amd_pmu_cpu_online() which gets called
> >> > too early, and thus fails. That leaves some bogus state and causes
> >> > a crash in amd_pmu_cpu_offline().
> >> >
> >> > I proposed a fix which was rejected. The alternative involves moving
> >> > some of the CPU initialization code (on AMD) to an earlier position, i.e.,
> >> > one where it would be executed before the CPU_STARTED notifier. Nobody
> >> > has proposed anything else so far.
> >>
> >> I don't know about the early bootmem stuff, but regardless of this issue,
> >> if amd_pmu_cpu_online() can fail, then amd_pmu_cpu_offline() must be able
> >> to handle this without blowing up. Something like this (untested):
> >
> > I guess we handle that already:
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a90110c61073eab95d1986322693c2b9a8a6a5f6
> >
> Ok, the fix avoids the crash but perf_events support for AMD is still broken.
>
> The root of the problem is elsewhere as I pointed out last week. Peter proposed
> a patch today and I think this would be enough to avoid the crash and have
> perf_events working again on AMD.

Yes, I saw Peter's patch.

Thanks,
Rafael
