[ANNOUNCE] 3.12.6-rt9

Sebastian Andrzej Siewior

Dec 23, 2013, 6:00:01 PM

Dear RT folks!

I'm pleased to announce the v3.12.6-rt9 patch set.

Changes since v3.12.6-rt8
- ARM's mach-sti is now using rawlock as boot_lock (like the other
mach-*)
- There was a call path to rcu_preempt_qs() with interrupts enabled. Tiejun
Chen posted a patch to call it with interrupts disabled like we always
do.
- A patch from Paul E. McKenney to not activate RCU core on NO_HZ_FULL
CPUs
- A patch from Thomas Gleixner not to raise the timer softirq
unconditionally (only if a timer is pending)

There is also a patch in the queue from Paul E. McKenney to move RCU
processing from softirq into its own thread. After Mike Galbraith
reported a few RCU stalls I decided to keep it disabled for now until I
have some time to look at it.

Known issues:

- bcache is disabled.

- Brian Silverman reported a BUG (via Debian BTS) where gdb's
record command does something nasty and causes a double fault on
x86-64 kernel with 32bit userland (the debugged application).
Pure 32bit and pure 64bit setups are not affected. The
problem is limited to x86.

- Sami Pietikäinen reported a crash in __ip_make_skb(). Nicholas
Mc Guire is preparing a patch for it.

The delta patch against v3.12.6-rt8 is appended below and can be found
here:
https://www.kernel.org/pub/linux/kernel/projects/rt/3.12/incr/patch-3.12.6-rt8-rt9.patch.xz

The RT patch against 3.12.6 can be found here:

https://www.kernel.org/pub/linux/kernel/projects/rt/3.12/patch-3.12.6-rt9.patch.xz

The split quilt queue is available at:

https://www.kernel.org/pub/linux/kernel/projects/rt/3.12/patches-3.12.6-rt9.tar.xz

Sebastian

diff --git a/arch/arm/mach-sti/platsmp.c b/arch/arm/mach-sti/platsmp.c
index dce50d9..c05b764 100644
--- a/arch/arm/mach-sti/platsmp.c
+++ b/arch/arm/mach-sti/platsmp.c
@@ -35,7 +35,7 @@ static void write_pen_release(int val)
outer_clean_range(__pa(&pen_release), __pa(&pen_release + 1));
}

-static DEFINE_SPINLOCK(boot_lock);
+static DEFINE_RAW_SPINLOCK(boot_lock);

void sti_secondary_init(unsigned int cpu)
{
@@ -50,8 +50,8 @@ void sti_secondary_init(unsigned int cpu)
/*
* Synchronise with the boot thread.
*/
- spin_lock(&boot_lock);
- spin_unlock(&boot_lock);
+ raw_spin_lock(&boot_lock);
+ raw_spin_unlock(&boot_lock);
}

int sti_boot_secondary(unsigned int cpu, struct task_struct *idle)
@@ -62,7 +62,7 @@ int sti_boot_secondary(unsigned int cpu, struct task_struct *idle)
* set synchronisation state between this boot processor
* and the secondary one
*/
- spin_lock(&boot_lock);
+ raw_spin_lock(&boot_lock);

/*
* The secondary processor is waiting to be released from
@@ -93,7 +93,7 @@ int sti_boot_secondary(unsigned int cpu, struct task_struct *idle)
* now the secondary core is starting up let it run its
* calibrations, then wait for it to finish
*/
- spin_unlock(&boot_lock);
+ raw_spin_unlock(&boot_lock);

return pen_release != -1 ? -ENOSYS : 0;
}
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 79a7a35..bdbf77db 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -461,9 +461,8 @@ extern int schedule_hrtimeout_range_clock(ktime_t *expires,
unsigned long delta, const enum hrtimer_mode mode, int clock);
extern int schedule_hrtimeout(ktime_t *expires, const enum hrtimer_mode mode);

-/* Soft interrupt function to run the hrtimer queues: */
+/* Called from the periodic timer tick */
extern void hrtimer_run_queues(void);
-extern void hrtimer_run_pending(void);

/* Bootup initialization: */
extern void __init hrtimers_init(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index c383841..7aa442e 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1694,30 +1694,6 @@ static void run_hrtimer_softirq(struct softirq_action *h)
}

/*
- * Called from timer softirq every jiffy, expire hrtimers:
- *
- * For HRT its the fall back code to run the softirq in the timer
- * softirq context in case the hrtimer initialization failed or has
- * not been done yet.
- */
-void hrtimer_run_pending(void)
-{
- if (hrtimer_hres_active())
- return;
-
- /*
- * This _is_ ugly: We have to check in the softirq context,
- * whether we can switch to highres and / or nohz mode. The
- * clocksource switch happens in the timer interrupt with
- * xtime_lock held. Notification from there only sets the
- * check bit in the tick_oneshot code, otherwise we might
- * deadlock vs. xtime_lock.
- */
- if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
- hrtimer_switch_to_hres();
-}
-
-/*
* Called from hardirq context every jiffy
*/
void hrtimer_run_queues(void)
@@ -1730,6 +1706,13 @@ void hrtimer_run_queues(void)
if (hrtimer_hres_active())
return;

+ /*
+ * Check whether we can switch to highres mode.
+ */
+ if (tick_check_oneshot_change(!hrtimer_is_hres_enabled())
+ && hrtimer_switch_to_hres())
+ return;
+
for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
base = &cpu_base->clock_base[index];
if (!timerqueue_getnext(&base->active))
diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 10365be..f4f61bb 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -204,7 +204,12 @@ static void rcu_preempt_qs(int cpu);

void rcu_bh_qs(int cpu)
{
+ unsigned long flags;
+
+ /* Callers to this function, rcu_preempt_qs(), must disable irqs. */
+ local_irq_save(flags);
rcu_preempt_qs(cpu);
+ local_irq_restore(flags);
}
#else
void rcu_bh_qs(int cpu)
@@ -2674,6 +2679,10 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
/* Check for CPU stalls, if enabled. */
check_cpu_stall(rsp, rdp);

+ /* Is this CPU a NO_HZ_FULL CPU that should ignore RCU? */
+ if (rcu_nohz_full_cpu(rsp))
+ return 0;
+
/* Is the RCU core waiting for a quiescent state from this CPU? */
if (rcu_scheduler_fully_active &&
rdp->qs_pending && !rdp->passed_quiesce) {
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index c36d59a..eb4fe67 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -563,6 +563,7 @@ static void rcu_sysidle_report_gp(struct rcu_state *rsp, int isidle,
unsigned long maxj);
static void rcu_bind_gp_kthread(void);
static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp);
+static bool rcu_nohz_full_cpu(struct rcu_state *rsp);

#endif /* #ifndef RCU_TREE_NONCORE */

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 05bcc6f..c1735a1 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2801,3 +2801,23 @@ static void rcu_sysidle_init_percpu_data(struct rcu_dynticks *rdtp)
}

#endif /* #else #ifdef CONFIG_NO_HZ_FULL_SYSIDLE */
+
+/*
+ * Is this CPU a NO_HZ_FULL CPU that should ignore RCU so that the
+ * grace-period kthread will do force_quiescent_state() processing?
+ * The idea is to avoid waking up RCU core processing on such a
+ * CPU unless the grace period has extended for too long.
+ *
+ * This code relies on the fact that all NO_HZ_FULL CPUs are also
+ * CONFIG_RCU_NOCB_CPUs.
+ */
+static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
+{
+#ifdef CONFIG_NO_HZ_FULL
+ if (tick_nohz_full_cpu(smp_processor_id()) &&
+ (!rcu_gp_in_progress(rsp) ||
+ ULONG_CMP_LT(jiffies, ACCESS_ONCE(rsp->gp_start) + HZ)))
+ return 1;
+#endif /* #ifdef CONFIG_NO_HZ_FULL */
+ return 0;
+}
diff --git a/kernel/timer.c b/kernel/timer.c
index b06c647..46467be 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1443,8 +1443,6 @@ static void run_timer_softirq(struct softirq_action *h)
irq_work_run();
#endif

- hrtimer_run_pending();
-
if (time_after_eq(jiffies, base->timer_jiffies))
__run_timers(base);
}
@@ -1454,8 +1452,27 @@ static void run_timer_softirq(struct softirq_action *h)
*/
void run_local_timers(void)
{
+ struct tvec_base *base = __this_cpu_read(tvec_bases);
+
hrtimer_run_queues();
- raise_softirq(TIMER_SOFTIRQ);
+ /*
+ * We can access this lockless as we are in the timer
+ * interrupt. If there are no timers queued, nothing to do in
+ * the timer softirq.
+ */
+ if (!spin_do_trylock(&base->lock)) {
+ raise_softirq(TIMER_SOFTIRQ);
+ return;
+ }
+ if (!base->active_timers)
+ goto out;
+
+ /* Check whether the next pending timer has expired */
+ if (time_before_eq(base->next_timer, jiffies))
+ raise_softirq(TIMER_SOFTIRQ);
+out:
+ rt_spin_unlock_after_trylock_in_irq(&base->lock);
+
}

#ifdef __ARCH_WANT_SYS_ALARM
diff --git a/localversion-rt b/localversion-rt
index 700c857..22746d6 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt8
+-rt9
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mike Galbraith

Dec 24, 2013, 10:50:02 AM

On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote:
> Dear RT folks!
>
> I'm pleased to announce the v3.12.6-rt9 patch set.
>
> Changes since v3.12.6-rt8
> - ARM's mach-sti is now using rawlock as boot_lock (like the other
> mach-*)
> - There was a callpath to rcu_preempt_qs() with interrupts enabled. Tiejun
> Chen posted a patch to call it with interrupt disabled like we always
> do.
> - A patch from Paul E. McKenney to not activate RCU core on NO_HZ_FULL
> CPUs
> - A patch from Thomas Gleixner not to raise the timer softirq
> unconditionally (only if a timer is pending)
>
>
> There is also a patch in the queue from Paul E. McKenney to move RCU
> processing from softirq into its own thread. After Mike Galbraith
> reported a few RCU stalls I decided to keep it disabled for now until I
> have some time to look at it.

I built this kernel with Paul's patch and NO_HZ_FULL enabled again on a 64
core box. I haven't seen RCU gripe yet, but I just checked on it after
3.5 hours into this boot/beat (after fixing crash+kdump setup), and
found it in the process of dumping.

crash> bt
PID: 508 TASK: ffff8802739ba340 CPU: 16 COMMAND: "ksoftirqd/16"
#0 [ffff880276806a40] machine_kexec at ffffffff8103bc07
#1 [ffff880276806aa0] crash_kexec at ffffffff810d56b3
#2 [ffff880276806b70] panic at ffffffff815bf8b0
#3 [ffff880276806bf0] watchdog_overflow_callback at ffffffff810fed3d
#4 [ffff880276806c10] __perf_event_overflow at ffffffff81131928
#5 [ffff880276806ca0] perf_event_overflow at ffffffff81132254
#6 [ffff880276806cb0] intel_pmu_handle_irq at ffffffff8102078f
#7 [ffff880276806de0] perf_event_nmi_handler at ffffffff815c5825
#8 [ffff880276806e10] nmi_handle at ffffffff815c4ed3
#9 [ffff880276806ea0] default_do_nmi at ffffffff815c5063
#10 [ffff880276806ed0] do_nmi at ffffffff815c5388
#11 [ffff880276806ef0] end_repeat_nmi at ffffffff815c4371
[exception RIP: _raw_spin_trylock+48]
RIP: ffffffff815c3790 RSP: ffff880276803e28 RFLAGS: 00000002
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000002
RDX: ffff880276803e28 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff815c3790 R8: ffffffff815c3790 R9: 0000000000000018
R10: ffff880276803e28 R11: 0000000000000002 R12: ffffffffffffffff
R13: ffff880273a0c000 R14: ffff8802739ba340 R15: ffff880273a03fd8
ORIG_RAX: ffff880273a03fd8 CS: 0010 SS: 0018
--- <RT exception stack> ---
#12 [ffff880276803e28] _raw_spin_trylock at ffffffff815c3790
#13 [ffff880276803e30] rt_spin_lock_slowunlock_hirq at ffffffff815c2cc8
#14 [ffff880276803e50] rt_spin_unlock_after_trylock_in_irq at ffffffff815c3425
#15 [ffff880276803e60] get_next_timer_interrupt at ffffffff810684a7
#16 [ffff880276803ed0] tick_nohz_stop_sched_tick at ffffffff810c5f2e
#17 [ffff880276803f50] tick_nohz_irq_exit at ffffffff810c6333
#18 [ffff880276803f70] irq_exit at ffffffff81060065
#19 [ffff880276803f90] smp_apic_timer_interrupt at ffffffff810358f5
#20 [ffff880276803fb0] apic_timer_interrupt at ffffffff815cbf9d
--- <IRQ stack> ---
#21 [ffff880273a03b28] apic_timer_interrupt at ffffffff815cbf9d
[exception RIP: _raw_spin_lock+50]
RIP: ffffffff815c3642 RSP: ffff880273a03bd8 RFLAGS: 00000202
RAX: 0000000000008b49 RBX: ffff880272157290 RCX: ffff8802739ba340
RDX: 0000000000008b4a RSI: 0000000000000010 RDI: ffff880273a0c000
RBP: ffff880273a03bd8 R8: 0000000000000001 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff810927b5
R13: ffff880273a03b68 R14: 0000000000000010 R15: 0000000000000010
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#22 [ffff880273a03be0] rt_spin_lock_slowlock at ffffffff815c2591
#23 [ffff880273a03cc0] rt_spin_lock at ffffffff815c3362
#24 [ffff880273a03cd0] run_timer_softirq at ffffffff81069002
#25 [ffff880273a03d70] handle_softirq at ffffffff81060d0f
#26 [ffff880273a03db0] do_current_softirqs at ffffffff81060f3c
#27 [ffff880273a03e20] run_ksoftirqd at ffffffff81061045
#28 [ffff880273a03e40] smpboot_thread_fn at ffffffff81089c31
#29 [ffff880273a03ec0] kthread at ffffffff810807fe
#30 [ffff880273a03f50] ret_from_fork at ffffffff815cb28c
crash> gdb list *0xffffffff815c2591
0xffffffff815c2591 is in rt_spin_lock_slowlock (kernel/rtmutex.c:109).
104 }
105 #endif
106
107 static inline void init_lists(struct rt_mutex *lock)
108 {
109 if (unlikely(!lock->wait_list.node_list.prev))
110 plist_head_init(&lock->wait_list);
111 }
112
113 /*
crash> gdb list *0xffffffff815c2590
0xffffffff815c2590 is in rt_spin_lock_slowlock (kernel/rtmutex.c:744).
739 struct rt_mutex_waiter waiter, *top_waiter;
740 int ret;
741
742 rt_mutex_init_waiter(&waiter, true);
743
744 raw_spin_lock(&lock->wait_lock);
745 init_lists(lock);
746
747 if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
748 raw_spin_unlock(&lock->wait_lock);
crash> gdb list *0xffffffff815c2cc8
0xffffffff815c2cc8 is in rt_spin_lock_slowunlock_hirq (kernel/rtmutex.c:851).
846 {
847 int ret;
848
849 do {
850 ret = raw_spin_trylock(&lock->wait_lock);
851 } while (!ret);
852
853 __rt_spin_lock_slowunlock(lock);
854 }
855

Dang, Santa might have delivered a lock pick set in a few more hours.

Pavel Vasilyev

Dec 24, 2013, 11:40:01 AM

24.12.2013 19:47, Mike Galbraith wrote:

> On Mon, 2013-12-23 at 23:50 +0100, Sebastian Andrzej Siewior wrote:

> crash> bt
> PID: 508 TASK: ffff8802739ba340 CPU: 16 COMMAND: "ksoftirqd/16"

YES!!! And ARM code broke :)

--

Pavel.

Mike Galbraith

Dec 24, 2013, 10:30:02 PM

And NO_HZ_TICK config survived for only 4.5 hours.

PID: 6948 TASK: ffff880272d1f1c0 CPU: 29 COMMAND: "tbench"
#0 [ffff8802769a6a40] machine_kexec at ffffffff8103bc07
#1 [ffff8802769a6aa0] crash_kexec at ffffffff810d3e93
#2 [ffff8802769a6b70] panic at ffffffff815bce70
#3 [ffff8802769a6bf0] watchdog_overflow_callback at ffffffff810fd51d
#4 [ffff8802769a6c10] __perf_event_overflow at ffffffff8112f1f8
#5 [ffff8802769a6ca0] perf_event_overflow at ffffffff8112fb14
#6 [ffff8802769a6cb0] intel_pmu_handle_irq at ffffffff8102078f
#7 [ffff8802769a6de0] perf_event_nmi_handler at ffffffff815c2de5
#8 [ffff8802769a6e10] nmi_handle at ffffffff815c2493
#9 [ffff8802769a6ea0] default_do_nmi at ffffffff815c2623
#10 [ffff8802769a6ed0] do_nmi at ffffffff815c2948
#11 [ffff8802769a6ef0] end_repeat_nmi at ffffffff815c1931
[exception RIP: preempt_schedule+36]
RIP: ffffffff815be944 RSP: ffff8802769a3d98 RFLAGS: 00000002
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000002
RDX: ffff8802769a3d98 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff815be944 R8: ffffffff815be944 R9: 0000000000000018
R10: ffff8802769a3d98 R11: 0000000000000002 R12: ffffffffffffffff
R13: ffff880273f74000 R14: ffff880272d1f1c0 R15: ffff880269cedfd8
ORIG_RAX: ffff880269cedfd8 CS: 0010 SS: 0018
--- <RT exception stack> ---
#12 [ffff8802769a3d98] preempt_schedule at ffffffff815be944
#13 [ffff8802769a3db0] _raw_spin_trylock at ffffffff815c0d6e
#14 [ffff8802769a3dc0] rt_spin_lock_slowunlock_hirq at ffffffff815c0288
#15 [ffff8802769a3de0] rt_spin_unlock_after_trylock_in_irq at ffffffff815c09e5
#16 [ffff8802769a3df0] run_local_timers at ffffffff81068025
#17 [ffff8802769a3e10] update_process_times at ffffffff810680ac
#18 [ffff8802769a3e40] tick_sched_handle at ffffffff810c3a92
#19 [ffff8802769a3e60] tick_sched_timer at ffffffff810c3d2f
#20 [ffff8802769a3e90] __run_hrtimer at ffffffff8108471d
#21 [ffff8802769a3ed0] hrtimer_interrupt at ffffffff8108497a
#22 [ffff8802769a3f70] local_apic_timer_interrupt at ffffffff810349e6
#23 [ffff8802769a3f90] smp_apic_timer_interrupt at ffffffff810358ee
#24 [ffff8802769a3fb0] apic_timer_interrupt at ffffffff815c955d
--- <IRQ stack> ---
#25 [ffff880269ced848] apic_timer_interrupt at ffffffff815c955d
[exception RIP: _raw_spin_lock+53]
RIP: ffffffff815c0c05 RSP: ffff880269ced8f8 RFLAGS: 00000202
RAX: 0000000000000b7b RBX: 0000000000000282 RCX: ffff880272d1f1c0
RDX: 0000000000000b7d RSI: ffff880269ceda38 RDI: ffff880273f74000
RBP: ffff880269ced8f8 R8: 0000000000000001 R9: 00000000b54d13a4
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880269ced910
R13: ffff880276d32170 R14: ffffffff810c9030 R15: ffff880269ced8b8
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#26 [ffff880269ced900] rt_spin_lock_slowlock at ffffffff815bfb51
#27 [ffff880269ced9e0] rt_spin_lock at ffffffff815c0922
#28 [ffff880269ced9f0] lock_timer_base at ffffffff81067f92
#29 [ffff880269ceda20] mod_timer at ffffffff81069bcb
#30 [ffff880269ceda70] sk_reset_timer at ffffffff814d1e57
#31 [ffff880269ceda90] inet_csk_reset_xmit_timer at ffffffff8152d4a8
#32 [ffff880269cedac0] tcp_rearm_rto at ffffffff8152d583
#33 [ffff880269cedae0] tcp_ack at ffffffff81534085
#34 [ffff880269cedb60] tcp_rcv_established at ffffffff8153443d
#35 [ffff880269cedbb0] tcp_v4_do_rcv at ffffffff8153f56a
#36 [ffff880269cedbe0] __release_sock at ffffffff814d3891
#37 [ffff880269cedc10] release_sock at ffffffff814d3942
#38 [ffff880269cedc30] tcp_sendmsg at ffffffff8152b955
#39 [ffff880269cedd00] inet_sendmsg at ffffffff8155350e
#40 [ffff880269cedd30] sock_sendmsg at ffffffff814cea87
#41 [ffff880269cede40] sys_sendto at ffffffff814cebdf
#42 [ffff880269cedf80] tracesys at ffffffff815c8b09 (via system_call)
RIP: 00007f0441a1fc35 RSP: 00007fffdea86130 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: ffffffff815c8b09 RCX: ffffffffffffffff
RDX: 000000000000248d RSI: 0000000000607260 RDI: 0000000000000004
RBP: 000000000000248d R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffdea86a10
R13: 00007fffdea86414 R14: 0000000000000004 R15: 0000000000607260
ORIG_RAX: 000000000000002c CS: 0033 SS: 002b

Nicholas Mc Guire

Dec 27, 2013, 3:10:02 PM

On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:

> Dear RT folks!
>
> I'm pleased to announce the v3.12.6-rt9 patch set.
>
> Changes since v3.12.6-rt8

<snip>

> - A patch from Thomas Gleixner not to raise the timer softirq
> unconditionally (only if a timer is pending)
>

This one seems to deadlock early in the boot sequence on x86
(i3/i7/Phenom-4x here and Carsten Emde also had boot failures)

After dropping this patch with:
patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
(only ran for a few minutes idle and 1h with load on an i3).

The main problem with this patch, though, is procedural: the commit note -
which is a mail exchange - does not actually explain what the rationale for
the changes is (...well, I don't understand the logic of run_local_timers -
if someone can explain, please do), and notably:

from timers-do-not-raise-softirq-unconditionally.patch
<snip>
well, that very same problem is in mainline if you add "threadirqs" to
the command line. But we can be smart about this. The untested patch
^^^^^^^^^^^^^^^^^^
below should address that issue. If that works on mainline we can
adapt it for RT (needs a trylock(&base->lock) there).
<snip>

This does make me wonder why this went into -rt9.
It also fails to build with CONFIG_PREEMPT_RT_FULL not set.

With this patch, systems that booted just fine with 3.12.5-rt7 don't
even boot (at least my 3 x86 test boxes here did not). This raises some
questions regarding the process of getting patches into -rtX - are
we going too fast here?

I would prefer it if such patches went out with a request for testing
or at least a "might blow up your system" note in them...

thx!
hofrat

Mike Galbraith

Dec 27, 2013, 10:40:01 PM

On Fri, 2013-12-27 at 21:00 +0100, Nicholas Mc Guire wrote:
> On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:
>
> > Dear RT folks!
> >
> > I'm pleased to announce the v3.12.6-rt9 patch set.
> >
> > Changes since v3.12.6-rt8
> <snip>
> > - A patch from Thomas Gleixner not to raise the timer softirq
> > unconditionally (only if a timer is pending)
> >
>
> This one seems to deadlock early in the boot sequence on x86
> (i3/i7/Phenom-4x here and Carsten Emde also had boot failures)
>
> After dropping this patch with:
> patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
> 3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
> (only ran for a few minutes idle and 1h with load on an i3).
>
> The main problem with this patch, though, is procedural: the commit note -
> which is a mail exchange - does not actually explain what the rationale for
> the changes is

Raising the timer softirq unconditionally wakes ksoftirqd at every tick,
so the only time the no_hz_full "one and only one task is runnable" tick
shutdown criteria can be met is when the box has zero other runnable
tasks.. i.e. when box is idle.
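
The effect described above can be sketched with a small user-space model. This is only an illustration under stated assumptions: `struct cpu_model`, `tick()`, `run_ksoftirqd()` and `can_stop_tick()` are hypothetical names, not kernel APIs, and the only rule modelled is the "one and only one task is runnable" criterion quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model; none of these names exist in the kernel. */
struct cpu_model {
	int  user_tasks;          /* runnable tasks other than ksoftirqd */
	bool ksoftirqd_runnable;  /* set when the timer softirq was raised */
	bool timer_pending;       /* is there an expired timer to process? */
};

/* One timer tick. 'unconditional' selects the behaviour before the patch. */
static void tick(struct cpu_model *cpu, bool unconditional)
{
	assert(cpu->user_tasks >= 0);
	/* Models raise_softirq(TIMER_SOFTIRQ) waking the softirq thread. */
	if (unconditional || cpu->timer_pending)
		cpu->ksoftirqd_runnable = true;
}

/* ksoftirqd runs, handles whatever was pending and goes back to sleep. */
static void run_ksoftirqd(struct cpu_model *cpu)
{
	cpu->timer_pending = false;
	cpu->ksoftirqd_runnable = false;
}

/* The "one and only one task is runnable" criterion. */
static bool can_stop_tick(const struct cpu_model *cpu)
{
	return cpu->user_tasks + (cpu->ksoftirqd_runnable ? 1 : 0) == 1;
}
```

With one user task, calling `tick(&cpu, true)` leaves two runnable tasks and `can_stop_tick()` reports false. After `run_ksoftirqd()` and a conditional `tick(&cpu, false)` with no timer pending, only the user task is runnable and the tick may stop.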

Here, patch works fine boot wise, and no_hz_full tick shutdown works as
well, but there are a couple spots where taking an interrupt is a bad
idea as things sit. Watchdog barked at two such spots, and there's a
"you _will_ hit this warning in -rt" spot as well.

With bandaids on the sore spots, my 64 core box survives.

-Mike

(Less than wonderful changelogs probably come from the fact that
maintaining -rt out of tree is time consuming as all hell. Everybody
gets to break it, a couple guys get to fix it up again and again.)

Mike Galbraith

Dec 27, 2013, 10:50:01 PM

On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:

> (Less than wonderful changelogs probably come from the fact that
> maintaining -rt out of tree is time consuming as all hell. Everybody
> gets to break it, a couple guys get to fix it up again and again.)

P.S. try rolling your tree forward to master or tip for entertainment,
you'll see what I mean. Hi Peter, Rik.. other breakers of worlds :)

Mike Galbraith

Dec 27, 2013, 11:40:01 PM

On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:

> Watchdog barked at two such spots..

btw, lockdep doesn't grumble about that (didn't stare at annotation,
don't speak lockdep well). I fixed it up to not take its toys and go
home in a snit at boot (rt_mutex debug offends it methinks), but it
didn't gripe.

Nicholas Mc Guire

Dec 28, 2013, 2:50:01 AM

On Sat, 28 Dec 2013, Mike Galbraith wrote:

> On Sat, 2013-12-28 at 04:30 +0100, Mike Galbraith wrote:
>
> > (Less than wonderful changelogs probably come from the fact that
> > maintaining -rt out of tree is time consuming as all hell. Everybody
> > gets to break it, a couple guys get to fix it up again and again.)
>
> P.S. try rolling your tree forward to master or tip for entertainment,
> you'll see what I mean. Hi Peter, Rik.. other breakers of worlds :)
>

Protesting external breakage by amending -rt with home-made landmines
does sound like an optimized entertainment strategy...

This type of blowup will not help to go mainline (referring to 3.12.X here,
3.4/6/8/10 is a different story).

thx!
hofrat

Mike Galbraith

Dec 28, 2013, 9:00:02 AM

On Sat, 2013-12-28 at 08:43 +0100, Nicholas Mc Guire wrote:

> This type of blowup will not help to go mainline (referring to 3.12.X here,
> 3.4/6/8/10 is a different story).

Nah. Breakage is a vital sign. When breakage stops, bury it.

-Mike

Joakim Hernberg

Jan 11, 2014, 3:30:01 PM

On Fri, 27 Dec 2013 21:00:24 +0100
Nicholas Mc Guire <der....@hofr.at> wrote:

> On Mon, 23 Dec 2013, Sebastian Andrzej Siewior wrote:
>
> > Dear RT folks!
> >
> > I'm pleased to announce the v3.12.6-rt9 patch set.
> >
> > Changes since v3.12.6-rt8
> <snip>
> > - A patch from Thomas Gleixner not to raise the timer softirq
> > unconditionally (only if a timer is pending)
> >
>
> This one seems to deadlock early in the boot sequence on x86
> (i3/i7/Phenom-4x here and Carsten Emde also had boot failures)

This patch seems to frequently make the kernel hang hard early in the
boot process on my i7-2600k too. Reverting
timers-do-not-raise-softirq-unconditionally.patch appears to fix the
problem.

--

Joakim

Sebastian Andrzej Siewior

Jan 17, 2014, 11:20:01 AM

* Nicholas Mc Guire | 2013-12-27 21:00:24 [+0100]:

>> - A patch from Thomas Gleixner not to raise the timer softirq
>> unconditionally (only if a timer is pending)
>>
>
>This one seems to deadlock early in the boot sequence on x86
>(i3/i7/Phenom-4x here and Carsten Emde also had boot failures)
>
>After dropping this patch with:
>patch -p1 -R < ../paches/timers-do-not-raise-softirq-unconditionally.patch
>3.12.6-rt9 boots up fine. cyclictest seems to be back to what it was before
>(only ran for a few minutes idle and 1h with load on an i3).
>
>The main problem with this patch, though, is procedural: the commit note -
>which is a mail exchange - does not actually explain what the rationale for
>the changes is (...well, I don't understand the logic of run_local_timers -
>if someone can explain, please do), and notably:
>
>from timers-do-not-raise-softirq-unconditionally.patch
><snip>
>well, that very same problem is in mainline if you add "threadirqs" to
>the command line. But we can be smart about this. The untested patch
> ^^^^^^^^^^^^^^^^^^
>below should address that issue. If that works on mainline we can
>adapt it for RT (needs a trylock(&base->lock) there).
><snip>
>
> This does make me wonder why this went into -rt9.

It was on the mailing list for a few weeks. My understanding was that
Mike Galbraith tested it on mainline and then I added the RT specific
pieces and added it to the tree.

> It also fails to build with CONFIG_PREEMPT_RT_FULL not set.

I will add a non-RT based config to my compile tests.

> With this patch, systems that booted just fine with 3.12.5-rt7 don't
> even boot (at least my 3 x86 test boxes here did not). This raises some
> questions regarding the process of getting patches into -rtX - are
> we going too fast here?
>
> I would prefer it if such patches went out with a request for testing
> or at least a "might blow up your system" note in them...

I didn't expect that much trouble. In general I try to avoid adding
explosives unless marked as such.

>thx!
>hofrat

Sebastian

Sebastian Andrzej Siewior

Jan 17, 2014, 12:10:02 PM

* Mike Galbraith | 2013-12-24 16:47:47 [+0100]:

>I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
>core box. I haven't seen RCU grip yet, but I just checked on it after
>3.5 hours into this boot/beat (after fixing crash+kdump setup), and
>found it in the process of dumping.

So you also have the timers-do-not-raise-softirq-unconditionally.patch?

I have a small problem with understanding this…

|#24 [ffff880273a03cd0] run_timer_softirq at ffffffff81069002

Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
init_lists() before the apic timer kicks in. So we have the wait_lock.
In the hard interrupt triggered by the apic timer we get to
get_next_timer_interrupt() and go again for the same wait_lock. Here we
have the try_lock so we avoid this deadlock.
The odd part: we get the lock. It should be the same lock because both use

| struct tvec_base *base = __this_cpu_read(tvec_bases);

to get it. And we shouldn't get it because the lock is already held.
We get into trouble in the unlock path where we spin forever:

|#14 [ffff880276803e50] rt_spin_unlock_after_trylock_in_irq at ffffffff815c3425

|#12 [ffff880276803e28] _raw_spin_trylock at ffffffff815c3790

which releases the lock with a trylock in order to keep lockdep happy.
My understanding was that we should be able to obtain the wait_lock here
since we were able to obtain it in the lock path and in irq off context
there is nothing that could take the lock in the meantime.

Sebastian

Mike Galbraith

Jan 17, 2014, 10:20:02 PM

On Fri, 2014-01-17 at 18:00 +0100, Sebastian Andrzej Siewior wrote:
> * Mike Galbraith | 2013-12-24 16:47:47 [+0100]:
>
> >I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
> >core box. I haven't seen RCU grip yet, but I just checked on it after
> >3.5 hours into this boot/beat (after fixing crash+kdump setup), and
> >found it in the process of dumping.
>
> So you also have the timers-do-not-raise-softirq-unconditionally.patch?

Oh dear, there's holidays, vacation, and massive turkey overdose between
then and now, but I'm almost positive that the tree was virgin $subject,
with only Paul's patch enabled, that being what I wanted to beat on.

> I have a small problem with understanding this…
>
> |#24 [ffff880273a03cd0] run_timer_softirq at ffffffff81069002
>
> Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
> init_lists() before the apic timer kicks in. So we have the wait_lock.

gdb fibs a little, we're acquiring.

>--- <IRQ stack> ---
> >#21 [ffff880273a03b28] apic_timer_interrupt at ffffffff815cbf9d
> > [exception RIP: _raw_spin_lock+50]

> In the hard interrupt triggered by the apic timer we get to
> get_next_timer_interrupt() and go again for the same wait_lock. Here we
> have the try_lock so we avoid this deadlock.
> The odd part: we get the lock. It should be the same lock because both use
> | struct tvec_base *base = __this_cpu_read(tvec_bases);
> to get it. And we shouldn't get it because the lock is already held.
> We get into trouble in the unlock path where we spin forever:
>
> |#14 [ffff880276803e50] rt_spin_unlock_after_trylock_in_irq at ffffffff815c3425
> |#12 [ffff880276803e28] _raw_spin_trylock at ffffffff815c3790
>
> which releases the lock with a trylock in order to keep lockdep happy.
> My understanding was that we should be able to obtain the wait_lock here
> since we were able to obtain it in the lock path and in irq off context
> there is nothing that could take the lock in the meantime.

IIRC, we were endlessly trying, but with an un-punched ticket under us,
and no Xen like evilness to save the day.
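
The "un-punched ticket" situation can be sketched in user space. This is a simplified model and not the kernel's arch_spinlock_t: it assumes wait_lock behaves like a ticket lock, and every name below (`struct ticket_lock`, `ticket_draw()`, `irq_spin_on_trylock()` and so on) is hypothetical. Whether this is exactly the state captured in the dump cannot be confirmed from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified ticket lock; a model only. */
struct ticket_lock {
	unsigned short head;  /* ticket currently being served */
	unsigned short tail;  /* next ticket to hand out */
};

/* First half of a blocking lock: draw a ticket. */
static unsigned short ticket_draw(struct ticket_lock *l)
{
	return l->tail++;
}

/* Second half: the lock is ours once our ticket is being served. */
static bool ticket_is_served(const struct ticket_lock *l, unsigned short ticket)
{
	return l->head == ticket;
}

/* trylock only succeeds when no ticket is outstanding, i.e. head == tail. */
static bool ticket_trylock(struct ticket_lock *l)
{
	if (l->head != l->tail)
		return false;
	l->tail++;
	return true;
}

static void ticket_unlock(struct ticket_lock *l)
{
	l->head++;
}

/*
 * Model of the "do { ret = trylock; } while (!ret);" loop in the unlock
 * path, bounded so the sketch terminates. Returns true if the lock was taken.
 */
static bool irq_spin_on_trylock(struct ticket_lock *l, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++)
		if (ticket_trylock(l))
			return true;
	return false;
}
```

If the task context calls `ticket_draw()` and is then interrupted on the same CPU, `irq_spin_on_trylock()` fails no matter how many tries it gets: head never equals tail while the drawn ticket is outstanding, and the interrupted task cannot run to use and release it.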

I've since cleaned out my crashdump directory and moved on to frolicking
with hotplug gremlins, so don't have that one to revisit, but the don't
unconditionally raise timer softirq patch is the bad guy.

-Mike

Steven Rostedt

Jan 20, 2014, 9:20:01 PM

On Sat, 18 Jan 2014 04:15:29 +0100
Mike Galbraith <bitb...@online.de> wrote:

> > So you also have the timers-do-not-raise-softirq-unconditionally.patch?
>

People have been complaining that the latest 3.12-rt does not boot on
Intel i7 boxes. And by reverting this patch, it boots fine.

I happen to have an i7 box to test on, and sure enough, the latest
3.12-rt locks up on boot and, after reverting the
timers-do-not-raise-softirq-unconditionally.patch, it boots fine.

Looking into it, I made this small update, and the box boots. Seems
checking "active_timers" is not enough to skip raising softirqs. I
haven't looked at why yet, but I would like others to test this patch
too.

I'll leave why this lets i7 boxes boot as an exercise for Thomas ;-)

-- Steve

Signed-off-by: Steven Rostedt <ros...@goodmis.org>

diff --git a/kernel/timer.c b/kernel/timer.c
index 46467be..8212c10 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1464,13 +1464,11 @@ void run_local_timers(void)
raise_softirq(TIMER_SOFTIRQ);
return;
}
- if (!base->active_timers)
- goto out;

/* Check whether the next pending timer has expired */
if (time_before_eq(base->next_timer, jiffies))
raise_softirq(TIMER_SOFTIRQ);
-out:
+
rt_spin_unlock_after_trylock_in_irq(&base->lock);

Joe Korty

Jan 21, 2014, 10:50:01 AM

On Tue, Jan 21, 2014 at 01:39:10AM -0500, Muli Baron wrote:

> While this might fix booting on i7 machines it kind of defeats the
> original purpose of this patch, which was to let NO_HZ_FULL work
> properly with threaded interrupts. With the active_timers check removed
> the timer interrupt keeps firing even though there is only one task
> running on a specific processor, since it can't shut down the tick
> because the ksoftirqd thread keeps getting scheduled (see the previous
> thread "CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo" for the full
> discussion).
>
> -- Muli

Would something like this work? This would get us past boot, which has
always been this strange, half initialized thing one has to tiptoe around.

- if (!base->active_timers)
+ if (!base->active_timers && system_state == SYSTEM_RUNNING)
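
As a rough user-space sketch of how that one-line change would alter the decision: `struct base_model`, `enum model_state` and `should_raise_timer_softirq()` are hypothetical stand-ins for the tvec_base fields, system_state and the logic in run_local_timers(); locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's system_state; only two states are modelled. */
enum model_state { MODEL_BOOTING, MODEL_RUNNING };

/* Stand-in for the two tvec_base fields the check looks at. */
struct base_model {
	unsigned long active_timers;
	unsigned long next_timer;
};

/* Wraparound-safe "next <= now", in the spirit of time_before_eq(). */
static bool expired(unsigned long next, unsigned long now)
{
	return (long)(now - next) >= 0;
}

static bool should_raise_timer_softirq(const struct base_model *base,
				       unsigned long now,
				       enum model_state state)
{
	assert(base);
	/* Proposed gate: trust active_timers == 0 only after boot. */
	if (!base->active_timers && state == MODEL_RUNNING)
		return false;
	return expired(base->next_timer, now);
}
```

During boot the function falls through to the expiry check even with `active_timers == 0`, which matches the behaviour of the patch with the `active_timers` test removed. Once running, it skips the softirq as the rt9 code does.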

Joe

Joakim Hernberg

Jan 22, 2014, 4:30:02 PM

On Mon, 20 Jan 2014 21:17:36 -0500
Steven Rostedt <ros...@goodmis.org> wrote:

> I happen to have an i7 box to test on, and sure enough, the latest
> 3.12-rt locks up on boot and, after reverting the
> timers-do-not-raise-softirq-unconditionally.patch, it boots fine.

> Signed-off-by: Steven Rostedt <ros...@goodmis.org>
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 46467be..8212c10 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1464,13 +1464,11 @@ void run_local_timers(void)
> raise_softirq(TIMER_SOFTIRQ);
> return;
> }
> - if (!base->active_timers)
> - goto out;
>
> /* Check whether the next pending timer has expired */
> if (time_before_eq(base->next_timer, jiffies))
> raise_softirq(TIMER_SOFTIRQ);
> -out:
> +
> rt_spin_unlock_after_trylock_in_irq(&base->lock);
>
> }

This fixes the problem on my i7-2600k.

--

Joakim

Sebastian Andrzej Siewior

Jan 24, 2014, 6:30:01 AM

On 01/21/2014 03:17 AM, Steven Rostedt wrote:
> Signed-off-by: Steven Rostedt <ros...@goodmis.org>
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 46467be..8212c10 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -1464,13 +1464,11 @@ void run_local_timers(void)
> raise_softirq(TIMER_SOFTIRQ);
> return;
> }
> - if (!base->active_timers)
> - goto out;
>
> /* Check whether the next pending timer has expired */
> if (time_before_eq(base->next_timer, jiffies))
> raise_softirq(TIMER_SOFTIRQ);

Hmmm. If active_timers is 0 and "time_before_eq(base->next_timer,
jiffies)" is true then that timer should have been initialized with
init_timer_deferrable() or we have a serious bug here where
active_timers isn't properly synchronized anymore.

Now. If there is really just a deferrable timer that expired and nothing
else then this would explain it.

> -out:
> +
> rt_spin_unlock_after_trylock_in_irq(&base->lock);
>
> }
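
The deferrable-timer hypothesis can be illustrated with a user-space model. It assumes that active_timers and next_timer are only updated for non-deferrable timers. `struct wheel_model` and the three functions below are hypothetical names for illustration, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

struct wheel_model {
	unsigned long active_timers;  /* non-deferrable timers only (assumption) */
	unsigned long queued_timers;  /* everything queued, for illustration */
	unsigned long next_timer;     /* earliest non-deferrable expiry seen */
};

/* Queue a timer; deferrable ones do not touch active_timers/next_timer. */
static void wheel_add_timer(struct wheel_model *w, unsigned long expires,
			    bool deferrable)
{
	assert(w);
	w->queued_timers++;
	if (!deferrable) {
		if ((long)(expires - w->next_timer) < 0)
			w->next_timer = expires;
		w->active_timers++;
	}
}

/* Check as in -rt9: bail out early when active_timers is 0. */
static bool rt9_raises_softirq(const struct wheel_model *w, unsigned long now)
{
	if (!w->active_timers)
		return false;
	return (long)(now - w->next_timer) >= 0;
}

/* Check with the active_timers test removed. */
static bool patched_raises_softirq(const struct wheel_model *w,
				   unsigned long now)
{
	return (long)(now - w->next_timer) >= 0;
}
```

With only a deferrable timer queued, `active_timers` stays 0, so the rt9 variant never raises the softirq and the timer is never run from this path. The patched variant raises it as soon as the (possibly stale) `next_timer` is in the past.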

Sebastian