Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH] x86: add workaround monitor bug

81 views
Skip to first unread message

Jacob Pan

unread,
Jul 6, 2016, 1:03:46 PM7/6/16
to LKML, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown, Jacob Pan
From: Peter Zijlstra <pet...@infradead.org>

Monitored cached line may not wake up from mwait on certain
Goldmont based CPUs. This patch will avoid calling
current_set_polling_and_test() and thereby not set the TIF_ flag.
The result is that we'll always send IPIs for wakeups.

Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Signed-off-by: Jacob Pan <jacob....@linux.intel.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/mwait.h | 2 +-
arch/x86/kernel/cpu/intel.c | 5 +++++
arch/x86/kernel/process.c | 2 +-
4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 78dbd28..197a3f4 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -304,6 +304,7 @@
#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */
#define X86_BUG_NULL_SEG X86_BUG(9) /* Nulling a selector preserves the base */
#define X86_BUG_SWAPGS_FENCE X86_BUG(10) /* SWAPGS without input dep on GS */
+#define X86_BUG_MONITOR X86_BUG(11) /* IPI required to wake up remote cpu */


#ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 0deeb2d..f37f2d8 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -97,7 +97,7 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
*/
static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
- if (!current_set_polling_and_test()) {
+ if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
mb();
clflush((void *)&current_thread_info()->flags);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 6e2ffbe..77c6b3e 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -13,6 +13,7 @@
#include <asm/msr.h>
#include <asm/bugs.h>
#include <asm/cpu.h>
+#include <asm/intel-family.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -509,6 +510,10 @@ static void init_intel(struct cpuinfo_x86 *c)
(c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR);

+ if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) &&
+ ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT)))
+ set_cpu_bug(c, X86_BUG_MONITOR);
+
#ifdef CONFIG_X86_64
if (c->x86 == 15)
c->x86_cache_alignment = c->x86_clflush_size * 2;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 96becbb..59f68f1 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -404,7 +404,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
if (c->x86_vendor != X86_VENDOR_INTEL)
return 0;

- if (!cpu_has(c, X86_FEATURE_MWAIT))
+ if (!cpu_has(c, X86_FEATURE_MWAIT) || static_cpu_has_bug(X86_BUG_MONITOR))
return 0;

return 1;
--
1.9.1

Ingo Molnar

unread,
Jul 8, 2016, 4:55:29 AM7/8/16
to Jacob Pan, LKML, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown
Hm, this might be suboptimal: if MONITOR/MWAIT is implemented by setting the
exclusive flag for the monitored memory address and then snooping for cache
invalidation requests for that cache line, then not modifying the ->flags value
with TIF_POLLING_NRFLAG makes MWAIT not wake up - only the IPI would wake it up.

I think a better approach would be to still optimistically modify the ->flags
value _AND_ to also send an IPI, to make sure the wakeup is not lost. This means
that the woken CPU will wake up much faster (no IPI latency).

(The system will still bear the ovehread of sending and receiving the IPI, but
that cost is unavoidable if there's no other workaround for this erratum.)

Thanks,

Ingo

Peter Zijlstra

unread,
Jul 8, 2016, 7:46:16 AM7/8/16
to Ingo Molnar, Jacob Pan, LKML, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown
On Fri, Jul 08, 2016 at 10:55:15AM +0200, Ingo Molnar wrote:

> > static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> > {
> > - if (!current_set_polling_and_test()) {
> > + if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
>
> Hm, this might be suboptimal: if MONITOR/MWAIT is implemented by setting the
> exclusive flag for the monitored memory address and then snooping for cache
> invalidation requests for that cache line, then not modifying the ->flags value
> with TIF_POLLING_NRFLAG makes MWAIT not wake up - only the IPI would wake it up.

Confused.. POLLING_NRFLAGS is not used to wake up ever. It is only used
to determine if we want to send IPIs or not.

And since we _must_ send an IPI in this case, because the monitor is
busted, we cannot set this.

> I think a better approach would be to still optimistically modify the ->flags
> value _AND_ to also send an IPI, to make sure the wakeup is not lost. This means
> that the woken CPU will wake up much faster (no IPI latency).

This is exactly what is done. See resched_curr()'s use of
set_nr_and_not_polling(). That does:

if (!(fetch_or(&flags, NEED_RESCHED) & POLLING_NRFLAG))
smp_send_reschedule(cpu);

So we unconditionally set NEED_RESCHED, if, when we set that, POLLING
was set, we skip the IPI.

So again, since monitor is busted, simply setting NEED_RESCHED will not
wake us, we must send the IPI, this is achieved by not setting
POLLING_NRFLAG.

Ingo Molnar

unread,
Jul 8, 2016, 8:07:33 AM7/8/16
to Peter Zijlstra, Jacob Pan, LKML, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown

* Peter Zijlstra <pet...@infradead.org> wrote:

> On Fri, Jul 08, 2016 at 10:55:15AM +0200, Ingo Molnar wrote:
>
> > > static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
> > > {
> > > - if (!current_set_polling_and_test()) {
> > > + if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
> >
> > Hm, this might be suboptimal: if MONITOR/MWAIT is implemented by setting the
> > exclusive flag for the monitored memory address and then snooping for cache
> > invalidation requests for that cache line, then not modifying the ->flags value
> > with TIF_POLLING_NRFLAG makes MWAIT not wake up - only the IPI would wake it up.
>
> Confused.. POLLING_NRFLAGS is not used to wake up ever. It is only used
> to determine if we want to send IPIs or not.

I called the IPI the 'wakeup' - it's the 'CPU wakeup' :-)

> And since we _must_ send an IPI in this case, because the monitor is
> busted, we cannot set this.
>
> > I think a better approach would be to still optimistically modify the ->flags
> > value _AND_ to also send an IPI, to make sure the wakeup is not lost. This means
> > that the woken CPU will wake up much faster (no IPI latency).
>
> This is exactly what is done. See resched_curr()'s use of
> set_nr_and_not_polling(). That does:
>
> if (!(fetch_or(&flags, NEED_RESCHED) & POLLING_NRFLAG))
> smp_send_reschedule(cpu);
>
> So we unconditionally set NEED_RESCHED, if, when we set that, POLLING
> was set, we skip the IPI.

Ah, indeed, we set NEED_RESCHED in the same memory address that __monitor() is
watching so all is good.

> So again, since monitor is busted, simply setting NEED_RESCHED will not
> wake us, we must send the IPI, this is achieved by not setting
> POLLING_NRFLAG.

Yeah, so I got the impression that it might be broken in only certain
circumstances, or is it completely busted?

Thanks,

Ingo

Jacob Pan

unread,
Jul 8, 2016, 3:14:48 PM7/8/16
to Ingo Molnar, Peter Zijlstra, LKML, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown, jacob....@linux.intel.com
That is right, monitor is only partially broken not completely busted. I
don't have the statistics but it is not rare to miss wakeup. The
typical symptom is random slowness and fails to boot, without this
patch.

So doing both can speed up wake up in some cases.

Jacob
> Thanks,
>
> Ingo

[Jacob Pan]

Jacob Pan

unread,
Jul 18, 2016, 2:43:07 PM7/18/16
to LKML, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown, Jacob Pan
From: Peter Zijlstra <pet...@infradead.org>

Monitored cached line may not wake up from mwait on certain
Goldmont based CPUs. This patch will avoid calling
current_set_polling_and_test() and thereby not set the TIF_ flag.
The result is that we'll always send IPIs for wakeups.

Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Signed-off-by: Jacob Pan <jacob....@linux.intel.com>
---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/mwait.h | 2 +-
arch/x86/kernel/cpu/intel.c | 5 +++++
arch/x86/kernel/process.c | 2 +-
4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index c64b1e9..c5097a7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -310,5 +310,5 @@
#endif
#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */
#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */
-
+#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote cpu */
#endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 0deeb2d..f37f2d8 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -97,7 +97,7 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
*/
static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
- if (!current_set_polling_and_test()) {
+ if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
mb();
clflush((void *)&current_thread_info()->flags);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f380b61..fcd484d 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -13,6 +13,7 @@
#include <asm/msr.h>
#include <asm/bugs.h>
#include <asm/cpu.h>
+#include <asm/intel-family.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -508,6 +509,10 @@ static void init_intel(struct cpuinfo_x86 *c)
(c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR);

+ if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) &&
+ ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT)))
+ set_cpu_bug(c, X86_BUG_MONITOR);
+
#ifdef CONFIG_X86_64
if (c->x86 == 15)
c->x86_cache_alignment = c->x86_clflush_size * 2;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 61b703c..62c0b0e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -405,7 +405,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
if (c->x86_vendor != X86_VENDOR_INTEL)
return 0;

- if (!cpu_has(c, X86_FEATURE_MWAIT))
+ if (!cpu_has(c, X86_FEATURE_MWAIT) || static_cpu_has_bug(X86_BUG_MONITOR))
return 0;

return 1;
--
2.7.4

Jacob Pan

unread,
Jul 18, 2016, 2:47:19 PM7/18/16
to Jacob Pan, Ingo Molnar, Peter Zijlstra, LKML, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown
BTW, I just rebased and sent out v2 to avoid conflict with the commit
below. Could you take this patch for v4.8? it is a critical fix for
affected CPUs.

commit 8709ed4d4b0eab04561c1ec9e6ea50fd1e3897ff
Author: Dave Hansen <dave....@linux.intel.com>
Date: Fri Jun 17 17:15:03 2016 -0700

x86/cpu: Fix duplicated X86_BUG(9) macro



Thanks,

Jacob

Brian Gerst

unread,
Jul 18, 2016, 2:50:21 PM7/18/16
to Jacob Pan, LKML, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown
It would be simpler to just force clear X86_FEATURE_MWAIT.

--
Brian Gerst

Jacob Pan

unread,
Jul 18, 2016, 3:11:14 PM7/18/16
to Brian Gerst, LKML, Peter Zijlstra, Ingo Molnar, H. Peter Anvin, X86 Kernel, Arjan van de Ven, Len Brown, jacob....@linux.intel.com
On Mon, 18 Jul 2016 14:49:56 -0400
Brian Gerst <brg...@gmail.com> wrote:

> It would be simpler to just force clear X86_FEATURE_MWAIT.

Then we cannot use intel_idle driver. In addition, ACPI idle driver
needs BIOS support which may or may not be there for all platforms.

Thanks,

Jacob

tip-bot for Peter Zijlstra

unread,
Jul 20, 2016, 6:42:12 AM7/20/16
to linux-ti...@vger.kernel.org, le...@kernel.org, h...@zytor.com, brg...@gmail.com, torv...@linux-foundation.org, dvla...@redhat.com, lu...@kernel.org, pet...@infradead.org, b...@alien8.de, mi...@kernel.org, jpoi...@redhat.com, ar...@linux.intel.com, tg...@linutronix.de, jacob....@linux.intel.com, linux-...@vger.kernel.org
Commit-ID: 08e237fa56a1d95c1372033bc29c4a2517b3c0fa
Gitweb: http://git.kernel.org/tip/08e237fa56a1d95c1372033bc29c4a2517b3c0fa
Author: Peter Zijlstra <pet...@infradead.org>
AuthorDate: Mon, 18 Jul 2016 11:41:10 -0700
Committer: Ingo Molnar <mi...@kernel.org>
CommitDate: Wed, 20 Jul 2016 09:48:40 +0200

x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUs

Monitored cached line may not wake up from mwait on certain
Goldmont based CPUs. This patch will avoid calling
current_set_polling_and_test() and thereby not set the TIF_ flag.
The result is that we'll always send IPIs for wakeups.

Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Signed-off-by: Jacob Pan <jacob....@linux.intel.com>
Cc: Andy Lutomirski <lu...@kernel.org>
Cc: Arjan van de Ven <ar...@linux.intel.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brg...@gmail.com>
Cc: Denys Vlasenko <dvla...@redhat.com>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Josh Poimboeuf <jpoi...@redhat.com>
Cc: Len Brown <le...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Link: http://lkml.kernel.org/r/1468867270-18493-1-git-...@linux.intel.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/mwait.h | 2 +-
arch/x86/kernel/cpu/intel.c | 5 +++++
arch/x86/kernel/process.c | 2 +-
4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index c64b1e9..19ecc6e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -310,5 +310,5 @@
#endif
#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */
#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */
-
+#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */
#endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 0deeb2d..f37f2d8 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -97,7 +97,7 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
*/
static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
{
- if (!current_set_polling_and_test()) {
+ if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
mb();
clflush((void *)&current_thread_info()->flags);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index c1a89bc..abf6012 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -13,6 +13,7 @@
#include <asm/msr.h>
#include <asm/bugs.h>
#include <asm/cpu.h>
+#include <asm/intel-family.h>

#ifdef CONFIG_X86_64
#include <linux/topology.h>
@@ -508,6 +509,10 @@ static void init_intel(struct cpuinfo_x86 *c)
(c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47))
set_cpu_bug(c, X86_BUG_CLFLUSH_MONITOR);

+ if (c->x86 == 6 && boot_cpu_has(X86_FEATURE_MWAIT) &&
+ ((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT)))
+ set_cpu_bug(c, X86_BUG_MONITOR);
+
#ifdef CONFIG_X86_64
if (c->x86 == 15)
c->x86_cache_alignment = c->x86_clflush_size * 2;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 96becbb..59f68f1 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -404,7 +404,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
0 new messages