Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms

William Dauchy

Apr 2, 2014, 5:00:01 AM

On Wed, Apr 2, 2014 at 9:55 AM, tip-bot for Chen, Gong <tip...@zytor.com> wrote:
> Commit-ID: 27f6c573e0f77f7d1cc907c1494c99a61e48b7d8
> Gitweb: http://git.kernel.org/tip/27f6c573e0f77f7d1cc907c1494c99a61e48b7d8
> Author: Chen, Gong <gong...@linux.intel.com>
> AuthorDate: Thu, 27 Mar 2014 21:24:36 -0400
> Committer: Tony Luck <tony...@intel.com>
> CommitDate: Fri, 28 Mar 2014 13:40:16 -0700
>
> x86, CMCI: Add proper detection of end of CMCI storms
>
> When a CMCI storm persists for a long time (at least beyond the predefined
> threshold, currently 30 seconds), we can see a new CMCI storm
> detected immediately after the previous one is reported as subsided.
>
> ...
> Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode
> Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode
> Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode
> Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode
> ...
>
> The problem is that our logic that determines that the storm has
> ended is incorrect. We announce the end, re-enable interrupts and
> realize that the storm is still going on, so we switch back to
> polling mode. Rinse, repeat.
>
> When a storm happens we disable signaling of errors via CMCI and begin
> polling machine check banks instead. If we find any logged errors,
> then we need to set a per-cpu flag so that our per-cpu tests that
> check whether the storm is ongoing will see that errors are still
> being logged independently of whether mce_notify_irq() says that the
> error has been fully processed.
>
> cmci_clear() is not the right tool to disable a bank. It disables the
> interrupt for the bank as desired, but it also clears the bit for
> this bank in "mce_banks_owned" so we will skip the bank when polling
> (so we fail to see that the storm continues because we stop looking).
> New cmci_storm_disable_banks() just disables the interrupt while
> allowing polling to continue.
>
> Reported-by: William Dauchy <wda...@gmail.com>

Could you use the following address instead?
Reported-by: William Dauchy <wil...@gandi.net>

Thanks,

> Signed-off-by: Chen, Gong <gong...@linux.intel.com>
> Signed-off-by: Tony Luck <tony...@intel.com>

--
William
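
The failure mode described in the commit message can be modelled in a few lines of userspace C. This is only a toy sketch under stated assumptions: struct cpu_model and the model_* functions are hypothetical names invented for illustration, and the logic mirrors the commit description rather than the actual kernel code. It shows why dropping a bank from the owned set makes the poller stop looking, so the storm appears to end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one CPU's CMCI state; every name here is hypothetical. */
struct cpu_model {
	uint32_t banks_owned;	/* banks this CPU polls (cf. "mce_banks_owned") */
	uint32_t irq_enabled;	/* banks whose CMCI interrupt is enabled */
	bool polled_error;	/* set when a poll finds a logged error */
};

/* Analog of the cmci_clear() behaviour described above: ownership is lost too. */
static void model_disable_with_clear(struct cpu_model *c)
{
	assert(c);
	c->irq_enabled = 0;
	c->banks_owned = 0;
}

/* Analog of cmci_storm_disable_banks(): interrupt off, ownership kept. */
static void model_storm_disable_banks(struct cpu_model *c)
{
	assert(c);
	c->irq_enabled = 0;
}

/* Poll only the owned banks; 'logged' is a bitmask of banks holding an error. */
static void model_poll(struct cpu_model *c, uint32_t logged)
{
	assert(c);
	if (c->banks_owned & logged)
		c->polled_error = true;
}

/* Read and reset the flag; true means the storm looks finished. */
static bool model_storm_subsided(struct cpu_model *c)
{
	bool seen;

	assert(c);
	seen = c->polled_error;
	c->polled_error = false;
	return !seen;
}
```

With bank 3 still logging errors, the model_disable_with_clear() path followed by model_poll() leaves the flag unset, so model_storm_subsided() reports the end of a storm that is still running. The model_storm_disable_banks() path keeps the bank in the polled set and the flag is raised as expected.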

Borislav Petkov

Apr 2, 2014, 5:10:02 AM

On Wed, Apr 02, 2014 at 10:51:49AM +0200, William Dauchy wrote:
> Could you use the following address instead?
> Reported-by: William Dauchy <wil...@gandi.net>

It is too late for that now as the patch is in -tip already... Unless
Ingo can still amend it, that is.

But, we're working on a real solution for the storm issue and there
we'll be asking you to test stuff anyway so we'll make sure to use this
mail address then, ok?

:-)

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

William Dauchy

Apr 2, 2014, 5:20:02 AM

On Apr02 11:01, Borislav Petkov wrote:
> It is too late for that now as the patch is in -tip already... Unless
> Ingo can still amend it, that is.
>
> But, we're working on a real solution for the storm issue and there
> we'll be asking you to test stuff anyway so we'll make sure to use this
> mail address then, ok?

ack
--
William


Ingo Molnar

Apr 2, 2014, 6:50:01 AM

* Borislav Petkov <b...@alien8.de> wrote:

> On Wed, Apr 02, 2014 at 10:51:49AM +0200, William Dauchy wrote:
> > Could you use the following address instead?
> > Reported-by: William Dauchy <wil...@gandi.net>
>
> It is too late for that now as the patch is in -tip already... Unless
> Ingo can still amend it, that is.

There are already patches on top of it, so it's not possible - but
even if it was the last one, since it got committed by Tony I cannot
rebase or amend it.

Thanks,

Ingo

Borislav Petkov

Apr 2, 2014, 7:00:02 AM

On Wed, Apr 02, 2014 at 12:46:12PM +0200, Ingo Molnar wrote:
> There are already patches on top of it, so it's not possible - but
> even if it was the last one, since it got committed by Tony I cannot
> rebase or amend it.

Ah right, you pulled it from him, sure.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

H. Peter Anvin

Apr 11, 2014, 1:10:01 PM

On 04/02/2014 12:55 AM, tip-bot for Chen, Gong wrote:
> @@ -614,6 +618,8 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> if (!(m.status & MCI_STATUS_VAL))
> continue;
>
> + v = &get_cpu_var(mce_polled_error);
> + set_bit(0, v);
> /*
> * Uncorrected or signalled events are handled by the exception
> * handler when it is enabled, so don't process those here.
> @@ -1278,10 +1284,18 @@ static unsigned long mce_adjust_timer_default(unsigned long interval)
> static unsigned long (*mce_adjust_timer)(unsigned long interval) =
> mce_adjust_timer_default;
>
> +static int cmc_error_seen(void)
> +{
> + unsigned long *v = &__get_cpu_var(mce_polled_error);
> +
> + return test_and_clear_bit(0, v);
> +}
> +

Please use this_cpu_*() wherever possible instead of __get_cpu_var().
Since this is not actually a bitmask, this_cpu_xchg() can be used at the end.

In fact, using set_bit() is completely wasteful.

I'll push this onward since it is a bit late, but please submit a
cleanup patch.

-hpa
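
The point that a flag holding only 0 or 1 can be read and reset with an exchange, rather than a bit operation, can be checked in userspace. A minimal sketch, assuming C11 atomics as a stand-in: model_test_and_clear_bit0() and model_xchg_zero() are hypothetical helpers, not kernel APIs, and the kernel's this_cpu_xchg() works on the current CPU's instance of a per-CPU variable rather than on an arbitrary atomic object.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for test_and_clear_bit(0, v): clear bit 0, report whether it was set. */
static int model_test_and_clear_bit0(atomic_ulong *v)
{
	unsigned long old;

	assert(v);
	old = atomic_fetch_and(v, ~1UL);
	return (old & 1UL) != 0;
}

/* Stand-in for an exchange with 0: store 0, report whether the old value was set. */
static int model_xchg_zero(atomic_ulong *v)
{
	assert(v);
	return atomic_exchange(v, 0UL) != 0;
}
```

As long as the variable only ever holds 0 or 1, both helpers return the same result and leave the same value behind, which is why the bit operation adds nothing here.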

Chen, Gong

Apr 14, 2014, 5:10:03 AM

Following Peter's suggestion, use this_cpu_* instead of
__get_cpu_var. Also remove the bitmask ops to avoid unnecessary
overhead.

Signed-off-by: Chen, Gong <gong...@linux.intel.com>
Suggested-by: H. Peter Anvin <h...@zytor.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 26eaf3b..a44506e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -402,7 +402,7 @@ static u64 mce_rdmsrl(u32 msr)

if (offset < 0)
return 0;
- return *(u64 *)((char *)&__get_cpu_var(injectm) + offset);
+ return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
}

if (rdmsrl_safe(msr, &v)) {
@@ -424,7 +424,7 @@ static void mce_wrmsrl(u32 msr, u64 v)
int offset = msr_to_offset(msr);

if (offset >= 0)
- *(u64 *)((char *)&__get_cpu_var(injectm) + offset) = v;
+ *(u64 *)((char *)this_cpu_ptr(&injectm) + offset) = v;
return;
}
wrmsrl(msr, v);
@@ -480,7 +480,7 @@ static DEFINE_PER_CPU(struct mce_ring, mce_ring);
/* Runs with CPU affinity in workqueue */
static int mce_ring_empty(void)
{
- struct mce_ring *r = &__get_cpu_var(mce_ring);
+ struct mce_ring *r = this_cpu_ptr(&mce_ring);

return r->start == r->end;
}
@@ -492,7 +492,7 @@ static int mce_ring_get(unsigned long *pfn)

*pfn = 0;
get_cpu();
- r = &__get_cpu_var(mce_ring);
+ r = this_cpu_ptr(&mce_ring);
if (r->start == r->end)
goto out;
*pfn = r->ring[r->start];
@@ -506,7 +506,7 @@ out:
/* Always runs in MCE context with preempt off */
static int mce_ring_add(unsigned long pfn)
{
- struct mce_ring *r = &__get_cpu_var(mce_ring);
+ struct mce_ring *r = this_cpu_ptr(&mce_ring);
unsigned next;

next = (r->end + 1) % MCE_RING_SIZE;
@@ -528,7 +528,7 @@ int mce_available(struct cpuinfo_x86 *c)
static void mce_schedule_work(void)
{
if (!mce_ring_empty())
- schedule_work(&__get_cpu_var(mce_work));
+ schedule_work(this_cpu_ptr(&mce_work));
}

DEFINE_PER_CPU(struct irq_work, mce_irq_work);
@@ -553,7 +553,7 @@ static void mce_report_event(struct pt_regs *regs)
return;
}

- irq_work_queue(&__get_cpu_var(mce_irq_work));
+ irq_work_queue(this_cpu_ptr(&mce_irq_work));
}

/*
@@ -619,7 +619,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
continue;

v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ *v = 1;
put_cpu_var(mce_polled_error);

/*
* Uncorrected or signalled events are handled by the exception

@@ -1053,7 +1053,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)

mce_gather_info(&m, regs);

- final = &__get_cpu_var(mces_seen);
+ final = this_cpu_ptr(&mces_seen);
*final = m;

memset(valid_banks, 0, sizeof(valid_banks));
@@ -1287,14 +1287,14 @@ static unsigned long (*mce_adjust_timer)(unsigned long interval) =

static int cmc_error_seen(void)
{
- unsigned long *v = &__get_cpu_var(mce_polled_error);
+ unsigned long *v = this_cpu_ptr(&mce_polled_error);

- return test_and_clear_bit(0, v);
+ return this_cpu_xchg(*v, 0);
}

static void mce_timer_fn(unsigned long data)
{
- struct timer_list *t = &__get_cpu_var(mce_timer);
+ struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned long iv;
int notify;

@@ -1302,7 +1302,7 @@ static void mce_timer_fn(unsigned long data)

if (mce_available(__this_cpu_ptr(&cpu_info))) {
machine_check_poll(MCP_TIMESTAMP,
- &__get_cpu_var(mce_poll_banks));
+ this_cpu_ptr(&mce_poll_banks));
mce_intel_cmci_poll();
}

@@ -1332,7 +1332,7 @@ static void mce_timer_fn(unsigned long data)
*/
void mce_timer_kick(unsigned long interval)
{
- struct timer_list *t = &__get_cpu_var(mce_timer);
+ struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned long when = jiffies + interval;
unsigned long iv = __this_cpu_read(mce_next_interval);

@@ -1668,7 +1668,7 @@ static void mce_start_timer(unsigned int cpu, struct timer_list *t)

static void __mcheck_cpu_init_timer(void)
{
- struct timer_list *t = &__get_cpu_var(mce_timer);
+ struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned int cpu = smp_processor_id();

setup_timer(t, mce_timer_fn, cpu);
@@ -1711,8 +1711,8 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(c);
__mcheck_cpu_init_timer();
- INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
- init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb);
+ INIT_WORK(this_cpu_ptr(&mce_work), mce_process_work);
+ init_irq_work(this_cpu_ptr(&mce_irq_work), &mce_irq_work_cb);
}

/*
@@ -1964,7 +1964,7 @@ static struct miscdevice mce_chrdev_device = {
static void __mce_disable_bank(void *arg)
{
int bank = *((int *)arg);
- __clear_bit(bank, __get_cpu_var(mce_poll_banks));
+ __clear_bit(bank, *this_cpu_ptr(&mce_poll_banks));
cmci_disable_bank(bank);
}

--
1.9.0

Chen, Gong

Apr 14, 2014, 5:10:03 AM

This issue was introduced in commit 27f6c573e0: I forgot to
call put_cpu_var() after get_cpu_var().

Signed-off-by: Chen, Gong <gong...@linux.intel.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..26eaf3b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -620,6 +620,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)

 		v = &get_cpu_var(mce_polled_error);
 		set_bit(0, v);
+		put_cpu_var(mce_polled_error);

 		/*
 		 * Uncorrected or signalled events are handled by the exception
 		 * handler when it is enabled, so don't process those here.
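
Why the missing put_cpu_var() matters: get_cpu_var() disables preemption and put_cpu_var() re-enables it, so the two must be paired. The effect can be sketched in userspace with a plain counter. Everything below is a hypothetical model (preempt_depth, model_get_cpu_var(), model_put_cpu_var() and the two mark_error_* functions are invented for illustration), not kernel code.

```c
#include <assert.h>

static int preempt_depth;		/* stand-in for the preemption count */
static unsigned long polled_error;	/* stand-in for the per-CPU variable */

/* Analog of get_cpu_var(): "disable preemption" and hand out the variable. */
static unsigned long *model_get_cpu_var(void)
{
	preempt_depth++;
	return &polled_error;
}

/* Analog of put_cpu_var(): "re-enable preemption". */
static void model_put_cpu_var(void)
{
	assert(preempt_depth > 0);	/* a put without a matching get is a bug */
	preempt_depth--;
}

/* Shape of the code before the fix: get without put. */
static void mark_error_unbalanced(void)
{
	*model_get_cpu_var() = 1;
}

/* Shape of the code after the fix: get and put are paired. */
static void mark_error_balanced(void)
{
	*model_get_cpu_var() = 1;
	model_put_cpu_var();
}
```

Each call to mark_error_unbalanced() leaves the counter one higher than before, which in this model corresponds to preemption staying disabled after the function returns.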

H. Peter Anvin

Apr 14, 2014, 12:20:01 PM

On 04/14/2014 01:39 AM, Chen, Gong wrote:
> @@ -619,7 +619,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> continue;
>
> v = &get_cpu_var(mce_polled_error);
> - set_bit(0, v);
> + *v = 1;
> put_cpu_var(mce_polled_error);
> /*
> * Uncorrected or signalled events are handled by the exception

The amazing thing is that you managed to miss the one place where you
could actively elide a pointer.

The above should simply be:

this_cpu_write(mce_polled_error, 1);

-hpa

H. Peter Anvin

Apr 14, 2014, 12:30:03 PM

On 04/14/2014 01:39 AM, Chen, Gong wrote:

> @@ -1287,14 +1287,14 @@ static unsigned long (*mce_adjust_timer)(unsigned long interval) =
>
> static int cmc_error_seen(void)
> {
> - unsigned long *v = &__get_cpu_var(mce_polled_error);
> + unsigned long *v = this_cpu_ptr(&mce_polled_error);
>
> - return test_and_clear_bit(0, v);
> + return this_cpu_xchg(*v, 0);
> }
>

Here you produce a pointer and *then* pass it through a this_cpu_
function... this is actively wrong.

It should simply be:

return this_cpu_xchg(mce_polled_error, 0);

-hpa

H. Peter Anvin

Apr 14, 2014, 12:30:05 PM

Please read Documentation/this_cpu_ops.txt for reference for how to use
these functions. However, you want to avoid forming a pointer if you
can; it is relatively expensive to do so.

-hpa
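
The two corrections in this subthread share one idea: name the variable and let the accessor reach this CPU's copy, instead of building a pointer first. A loose userspace analogy, assuming C11 _Thread_local as a stand-in for a per-CPU variable (it is not the same thing, and it says nothing about the relative cost hpa mentions). The model_* names are hypothetical.

```c
#include <assert.h>

/* Stand-in for the per-CPU unsigned long flag; here there is one copy per thread. */
static _Thread_local unsigned long model_polled_error;

/* Analog of this_cpu_write(mce_polled_error, 1): by name, no pointer formed. */
static void model_mark_polled_error(void)
{
	model_polled_error = 1;
}

/* Analog of this_cpu_xchg(mce_polled_error, 0): return the old value, leave 0. */
static int model_cmc_error_seen(void)
{
	unsigned long old = model_polled_error;

	assert(old <= 1);	/* the flag is only ever 0 or 1, never a bitmask */
	model_polled_error = 0;
	return (int)old;
}
```

The poller would call model_mark_polled_error() whenever it finds a logged error, and the timer path would call model_cmc_error_seen() once per interval to decide whether the storm is still going.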

Chen, Gong

Apr 14, 2014, 11:00:02 PM

On Mon, Apr 14, 2014 at 09:23:08AM -0700, H. Peter Anvin wrote:
> Please read Documentation/this_cpu_ops.txt for reference for how to use
> these functions. However, you want to avoid forming a pointer if you
> can; it is relatively expensive to do so.
>

:-) I only found your comment yesterday, so I was eager to finish it.
I should do more homework, as you said, and send it again. Thanks very much
for your patience.
