After I added some printks to show all elements in clockevent_devices, I
found kernel hangs when I tried to resume from s2ram.
In clockevents_register_device, clockevents_do_notify ADD is always followed
by clockevents_notify_released. Although clockevents_do_notify ADD will use
tick_check_new_device to add new devices and replace old devices to the
clockevents_released list, clockevents_notify_released add them back to
clockevent_devices list.
My system is Quad-Core x86_64, with apic and hpet enables, after boot up,
the elements in clockevent_devices list is :
clockevent_device->lapic(3)->hpet5(3)->lapic(2)->hpet4(2)->lapic(1)->hpet3(1)-
->lapic(0)->hpet2(0)->hpet(0)
* () means cpu id
But active clock_event_device is hpet2,hpet3,hpet4,hpet5. Then at s2ram stage,
cpu 1,2,3 is down, then notify CLOCK_EVT_NOTIFY_CPU_DEAD will calls tick_shutdown,
then hpet2,hpet3,hpet4,hpet5 was deleted from clockevent_device list.
So after s2ram, elements in clockevent_device list is:
clockevent_device->lapic(3)->lapic(2)->lapic(1)->lapic(0)->hpet2(0)->hpet(0)
Then at resume stage, cpu 1,2,3 is up, it will register lapic again, and then
perform list_add lapic on clockevent_device list, e.g. list_add lapic(1) on
above list, lapic will move to the clockevent_device->next, but lapic(2)->next
is still point to lapic(1), the list is circular and corrupted then.
This patchset aims to fixes above behaviour by:
- on clockevents_register_device, if notify ADD success, move new devices
to the clockevent_devices list, otherwise move to clockevents_released
list.
- on clockevents_notify_released, same behaviour as above.
- on clockevents_notify CPU_DEAD, remove related devices on dead cpu from
clockevents_released list.
It makes sure that only active devices on each cpu is on clockevent_devices list.
With this patchset, the list_del corruption disappeared, and suspend/resume, cpu
hotplug works fine on my system.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Great detective work !
> This patchset aims to fixes above behaviour by:
> - on clockevents_register_device, if notify ADD success, move new devices
> to the clockevent_devices list, otherwise move to clockevents_released
> list.
> - on clockevents_notify_released, same behaviour as above.
> - on clockevents_notify CPU_DEAD, remove related devices on dead cpu from
> clockevents_released list.
>
> It makes sure that only active devices on each cpu is on clockevent_devices list.
> With this patchset, the list_del corruption disappeared, and suspend/resume, cpu
> hotplug works fine on my system.
I'm not happy about that churn. Why don't we simply scan the
clockevent_devices list for leftovers of the dead CPU ?
Untested patch below solves the same problem.
Thanks,
tglx
----
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index 20a8920..5dd857f 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -238,8 +238,9 @@ void clockevents_exchange_device(struct clock_event_device *old,
*/
void clockevents_notify(unsigned long reason, void *arg)
{
- struct list_head *node, *tmp;
+ struct clock_event_device *dev, *tmp;
unsigned long flags;
+ int cpu;
spin_lock_irqsave(&clockevents_lock, flags);
clockevents_do_notify(reason, arg);
@@ -250,8 +251,19 @@ void clockevents_notify(unsigned long reason, void *arg)
* Unregister the clock event devices which were
* released from the users in the notify chain.
*/
- list_for_each_safe(node, tmp, &clockevents_released)
- list_del(node);
+ list_for_each_entry_safe(dev, tmp, &clockevents_released, list)
+ list_del(&dev->list);
+ /*
+ * Now check whether the CPU has left unused per cpu devices
+ */
+ cpu = *((int *)arg);
+ list_for_each_entry_safe(dev, tmp, &clockevent_devices, list) {
+ if (cpumask_test_cpu(cpu, dev->cpumask) &&
+ cpumask_weight(dev->cpumask) == 1) {
+ BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
+ list_del(&dev->list);
+ }
+ }
break;
default:
break;
>
> I'm not happy about that churn. Why don't we simply scan the
> clockevent_devices list for leftovers of the dead CPU ?
>
> Untested patch below solves the same problem.
Hi, I have 3 bug reports about a clockevent failure while halting the system (2.6.31.11 kernel).
It exactly pinpoints to the line 262 which got changed with this patch merged to 2.6.31.10 (and also 2.6.32.3):
[4406.986777] kernel BUG at kernel/time/clockevents.c:262!
[4406.986777] invalid opcode: 0000 [#1] SMP
[4406.986777] last sysfs file: /sysfs/module/ip_tables/initstate
All 3 systems have Pentium 4 processors with different clock speeds.
Here's a screenshot of the full stacktrace:
http://bugs.pardus.org.tr/attachment.cgi?id=4820
and the relevant bugzilla report:
http://bugzilla.kernel.org/show_bug.cgi?id=15073
Thanks,
Ozan Caglayan
Pardus Linux -- http://www.pardus.org.tr/eng
I think this one is duplicated of
http://bugzilla.kernel.org/show_bug.cgi?id=15037
Thomas is working on the fix. I've sent a workaround patch on
http://patchwork.kernel.org/patch/71537/
> On 01/17/2010 05:28 PM, Ozan Çağlayan wrote:
> > Thomas Gleixner wrote:
> >
> > >
> > > I'm not happy about that churn. Why don't we simply scan the
> > > clockevent_devices list for leftovers of the dead CPU ?
> > >
> > > Untested patch below solves the same problem.
> >
> >
> > Hi, I have 3 bug reports about a clockevent failure while halting the system
> > (2.6.31.11 kernel).
> > It exactly pinpoints to the line 262 which got changed with this patch
> > merged to 2.6.31.10 (and also 2.6.32.3):
> >
> > [4406.986777] kernel BUG at kernel/time/clockevents.c:262!
> > [4406.986777] invalid opcode: 0000 [#1] SMP
> > [4406.986777] last sysfs file: /sysfs/module/ip_tables/initstate
>
> I think this one is duplicated of
> http://bugzilla.kernel.org/show_bug.cgi?id=15037
> Thomas is working on the fix. I've sent a workaround patch on
> http://patchwork.kernel.org/patch/71537/
I just applied your patch, but kept the cpuweight check. This is the
least intrusive solution for now. The logic needs an overhaul, but
thats neither rc4 nor stable material.
Thanks,
tglx