[PATCH 0/23] reboot-fixes

0 views
Skip to first unread message

Eric W. Biederman

unread,
Jul 26, 2005, 1:21:46 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

The reboot code paths seems to be suffering from 15 years of people
only looking at the code when it breaks. The result is there
are several code paths in which different callers expect different
semantics from the same functions, and a fair amount of imperfect
inline replication of code.

For a year or more every time I fix one bug in the bug fix reveals yet
another bug. In an attempt to end the cycle of bug fixes revealing
yet more bugs I have generated a series of patches to clean up
the semantics along the reboot path.

With the callers all agreeing on what to expect from the functions
they call it should at least be possible to kill bugs without
more showing up because of the bug fix.

My primary approach is to factor sys_reboot into several smaller
functions and provide those functions for the general kernel
consumers instead of the architecture dependent restart and
halt hooks.

I don't expect this to noticeably fix any bugs along the
main code paths but magic sysrq and several of the more obscure
code paths should work much more reliably.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Eric W. Biederman

unread,
Jul 26, 2005, 1:23:42 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

In the recent addition of device_suspend calls into
sys_reboot two code paths were missed.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---

kernel/sys.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

5f0fb00783b94248b5a76c161f1c30a033fce4d3
diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -391,6 +391,7 @@ asmlinkage long sys_reboot(int magic1, i
case LINUX_REBOOT_CMD_RESTART:
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
device_shutdown();
printk(KERN_EMERG "Restarting system.\n");
machine_restart(NULL);
@@ -452,6 +453,7 @@ asmlinkage long sys_reboot(int magic1, i
}
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
device_shutdown();
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();

Eric W. Biederman

unread,
Jul 26, 2005, 1:31:09 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

Because the factors of sys_reboot don't exist people calling
into the reboot path duplicate the code badly, leading to
inconsistent expectations of code in the reboot path.

This patch should is just code motion.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---

include/linux/reboot.h | 9 ++++
kernel/sys.c | 106 +++++++++++++++++++++++++++++-------------------
2 files changed, 73 insertions(+), 42 deletions(-)

bbc94d1109869fe5054f7cd697c978d494edeb62
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -55,6 +55,15 @@ extern void machine_shutdown(void);
struct pt_regs;
extern void machine_crash_shutdown(struct pt_regs *);

+/*
+ * Architecture independent implemenations of sys_reboot commands.
+ */
+
+extern void kernel_restart(char *cmd);
+extern void kernel_halt(void);
+extern void kernel_power_off(void);
+extern void kernel_kexec(void);
+
#endif

#endif /* _LINUX_REBOOT_H */


diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c

@@ -361,6 +361,62 @@ out_unlock:
return retval;
}

+void kernel_restart(char *cmd)
+{
+ notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
+ system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
+ device_shutdown();
+ if (!cmd) {
+ printk(KERN_EMERG "Restarting system.\n");
+ } else {
+ printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
+ }
+ printk(".\n");
+ machine_restart(cmd);
+}
+EXPORT_SYMBOL_GPL(kernel_restart);
+
+void kernel_kexec(void)
+{
+#ifdef CONFIG_KEXEC
+ struct kimage *image;
+ image = xchg(&kexec_image, 0);
+ if (!image) {
+ return;
+ }
+ notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
+ system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
+ device_shutdown();
+ printk(KERN_EMERG "Starting new kernel\n");
+ machine_shutdown();
+ machine_kexec(image);
+#endif
+}
+EXPORT_SYMBOL_GPL(kernel_kexec);
+
+void kernel_halt(void)
+{
+ notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
+ system_state = SYSTEM_HALT;
+ device_suspend(PMSG_SUSPEND);
+ device_shutdown();
+ printk(KERN_EMERG "System halted.\n");
+ machine_halt();
+}
+EXPORT_SYMBOL_GPL(kernel_halt);
+
+void kernel_power_off(void)
+{
+ notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
+ system_state = SYSTEM_POWER_OFF;
+ device_suspend(PMSG_SUSPEND);
+ device_shutdown();
+ printk(KERN_EMERG "Power down.\n");
+ machine_power_off();
+}
+EXPORT_SYMBOL_GPL(kernel_power_off);

/*
* Reboot system call: for obvious reasons only root may call it,
@@ -389,12 +445,7 @@ asmlinkage long sys_reboot(int magic1, i
lock_kernel();
switch (cmd) {
case LINUX_REBOOT_CMD_RESTART:
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);
- device_shutdown();
- printk(KERN_EMERG "Restarting system.\n");
- machine_restart(NULL);
+ kernel_restart(NULL);
break;

case LINUX_REBOOT_CMD_CAD_ON:
@@ -406,23 +457,13 @@ asmlinkage long sys_reboot(int magic1, i
break;

case LINUX_REBOOT_CMD_HALT:
- notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
- system_state = SYSTEM_HALT;
- device_suspend(PMSG_SUSPEND);
- device_shutdown();
- printk(KERN_EMERG "System halted.\n");
- machine_halt();
+ kernel_halt();
unlock_kernel();
do_exit(0);
break;

case LINUX_REBOOT_CMD_POWER_OFF:
- notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
- system_state = SYSTEM_POWER_OFF;
- device_suspend(PMSG_SUSPEND);
- device_shutdown();
- printk(KERN_EMERG "Power down.\n");
- machine_power_off();
+ kernel_power_off();
unlock_kernel();
do_exit(0);
break;
@@ -434,33 +475,14 @@ asmlinkage long sys_reboot(int magic1, i
}
buffer[sizeof(buffer) - 1] = '\0';

- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, buffer);
- system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);
- device_shutdown();
- printk(KERN_EMERG "Restarting system with command '%s'.\n", buffer);
- machine_restart(buffer);
+ kernel_restart(buffer);
break;

-#ifdef CONFIG_KEXEC
case LINUX_REBOOT_CMD_KEXEC:
- {
- struct kimage *image;
- image = xchg(&kexec_image, 0);
- if (!image) {
- unlock_kernel();
- return -EINVAL;
- }
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);
- device_shutdown();
- printk(KERN_EMERG "Starting new kernel\n");
- machine_shutdown();
- machine_kexec(image);
- break;
- }
-#endif
+ kernel_kexec();
+ unlock_kernel();
+ return -EINVAL;
+
#ifdef CONFIG_SOFTWARE_SUSPEND
case LINUX_REBOOT_CMD_SW_SUSPEND:
{

Pavel Machek

unread,
Jul 26, 2005, 4:10:29 PM7/26/05
to Eric W. Biederman, Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org
Hi!

> The reboot code paths seems to be suffering from 15 years of people
> only looking at the code when it breaks. The result is there
> are several code paths in which different callers expect different
> semantics from the same functions, and a fair amount of imperfect
> inline replication of code.
>
> For a year or more every time I fix one bug in the bug fix reveals yet
> another bug. In an attempt to end the cycle of bug fixes revealing
> yet more bugs I have generated a series of patches to clean up
> the semantics along the reboot path.
>
> With the callers all agreeing on what to expect from the functions
> they call it should at least be possible to kill bugs without
> more showing up because of the bug fix.
>
> My primary approach is to factor sys_reboot into several smaller
> functions and provide those functions for the general kernel
> consumers instead of the architecture dependent restart and
> halt hooks.
>
> I don't expect this to noticeably fix any bugs along the
> main code paths but magic sysrq and several of the more obscure
> code paths should work much more reliably.

It looks good to me. Good ammount of cruft really accumulated there...

Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Eric W. Biederman

unread,
Jul 26, 2005, 3:30:15 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

It is obvious we wanted to call kernel_restart here
but since we don't have it the code was expanded inline and hasn't
been correct since sometime in 2.4.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---

kernel/sys.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

252e08b37dc29ce42a0aef357b75ec1882eb634c


diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c

@@ -502,8 +502,7 @@ asmlinkage long sys_reboot(int magic1, i

static void deferred_cad(void *dummy)
{
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- machine_restart(NULL);
+ kernel_restart(NULL);
}

/*

Andrew Morton

unread,
Jul 27, 2005, 6:02:06 AM7/27/05
to Eric W. Biederman, torv...@osdl.org, linux-...@vger.kernel.org

My fairly ordinary x86 test box gets stuck during reboot on the
wait_for_completion() in ide_do_drive_cmd():


(gdb) thread 73
[Switching to thread 73 (Thread 2906)]#0 ide_do_drive_cmd (drive=0xc072dd10, rq=0x0,
action=ide_wait) at drivers/ide/ide-io.c:1671
1671 rq->waiting = NULL;
(gdb) bt
#0 ide_do_drive_cmd (drive=0xc072dd10, rq=0x0, action=ide_wait) at drivers/ide/ide-io.c:1671
#1 0xc02e64c0 in ide_diag_taskfile (drive=0xc072dd10, args=0xcc0c1e20, data_size=0, buf=0x0)
at drivers/ide/ide-taskfile.c:503
#2 0xc02e64e6 in ide_raw_taskfile (drive=0x0, args=0x0, buf=0x0) at drivers/ide/ide-taskfile.c:508
#3 0xc02eab6d in do_idedisk_flushcache (drive=0x0) at drivers/ide/ide-disk.c:831
#4 0xc02eb232 in ide_cacheflush_p (drive=0xc072dd10) at drivers/ide/ide-disk.c:1027
#5 0xc02eb2e4 in ide_device_shutdown (dev=0xc072ddf4) at drivers/ide/ide-disk.c:1083
#6 0xc02af75c in device_shutdown () at drivers/base/power/shutdown.c:45
#7 0xc0128bfe in kernel_restart (cmd=0x0) at kernel/sys.c:375
#8 0xc0128dea in sys_reboot (magic1=-18751827, magic2=672274793, cmd=19088743, arg=0x0)
at kernel/sys.c:484
#9 0xc0102ba3 in sysenter_past_esp () at arch/i386/kernel/semaphore.c:177

Eric W. Biederman

unread,
Jul 27, 2005, 12:02:04 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
ebie...@xmission.com (Eric W. Biederman) writes:

> Andrew Morton <ak...@osdl.org> writes:
>
>> My fairly ordinary x86 test box gets stuck during reboot on the
>> wait_for_completion() in ide_do_drive_cmd():
>

> Hmm. The only thing I can think of is someone started adding calls
> to device_suspend() before device_shutdown(). Not understanding
> where it was a good idea I made certain the calls were in there
> consistently.
>
> Andrew can you remove the call to device_suspend from kernel_restart
> and see if this still happens?
>
> I would suspect interrupts of being disabled but it looks like
> kgdb is working and I think that requires an interrupt to notice
> new characters.

Looking at it the device_suspend calls should be safe but
in case we need to follow it up the device_suspend calls in sys_reboot
were initially introduced in:


commit 620b03276488c3cf103caf1e326bd21f00d3df84
Author: Pavel Machek <pa...@ucw.cz>
Date: Sat Jun 25 14:55:11 2005 -0700

[PATCH] properly stop devices before poweroff

Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.

Signed-off-by: Pavel Machek <pa...@suse.cz>
Signed-off-by: Andrew Morton <ak...@osdl.org>
Signed-off-by: Linus Torvalds <torv...@osdl.org>

Eric

Eric W. Biederman

unread,
Jul 27, 2005, 12:49:52 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Andrew Morton <ak...@osdl.org> writes:

> My fairly ordinary x86 test box gets stuck during reboot on the
> wait_for_completion() in ide_do_drive_cmd():

Hmm. The only thing I can think of is someone started adding calls


to device_suspend() before device_shutdown(). Not understanding
where it was a good idea I made certain the calls were in there
consistently.

Andrew can you remove the call to device_suspend from kernel_restart
and see if this still happens?

I would suspect interrupts of being disabled but it looks like
kgdb is working and I think that requires an interrupt to notice
new characters.

Eric

Andrew Morton

unread,
Jul 27, 2005, 1:46:14 PM7/27/05
to Eric W. Biederman, torv...@osdl.org, linux-...@vger.kernel.org, Pavel Machek
ebie...@xmission.com (Eric W. Biederman) wrote:
>
> Andrew Morton <ak...@osdl.org> writes:
>
> > My fairly ordinary x86 test box gets stuck during reboot on the
> > wait_for_completion() in ide_do_drive_cmd():
>
> Hmm. The only thing I can think of is someone started adding calls
> to device_suspend() before device_shutdown(). Not understanding
> where it was a good idea I made certain the calls were in there
> consistently.
>
> Andrew can you remove the call to device_suspend from kernel_restart
> and see if this still happens?

yup, that fixes it.

--- devel/kernel/sys.c~a 2005-07-27 10:36:06.000000000 -0700
+++ devel-akpm/kernel/sys.c 2005-07-27 10:36:26.000000000 -0700
@@ -371,7 +371,6 @@ void kernel_restart(char *cmd)
{
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);


system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);

device_shutdown();
if (!cmd) {


printk(KERN_EMERG "Restarting system.\n");

_


Presumably it unfixes Pavel's patch?


From: Pavel Machek <pa...@ucw.cz>

Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.

Signed-off-by: Pavel Machek <pa...@suse.cz>
Signed-off-by: Andrew Morton <ak...@osdl.org>

---

include/linux/pm.h | 33 +++++++++++++++++++++------------
kernel/sys.c | 3 +++
2 files changed, 24 insertions(+), 12 deletions(-)

diff -puN kernel/sys.c~properly-stop-devices-before-poweroff kernel/sys.c
--- 25/kernel/sys.c~properly-stop-devices-before-poweroff 2005-06-25 14:47:00.000000000 -0700
+++ 25-akpm/kernel/sys.c 2005-06-25 14:50:53.000000000 -0700
@@ -405,6 +405,7 @@ asmlinkage long sys_reboot(int magic1, i
case LINUX_REBOOT_CMD_HALT:
notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);


system_state = SYSTEM_HALT;
+ device_suspend(PMSG_SUSPEND);

device_shutdown();


printk(KERN_EMERG "System halted.\n");

machine_halt();
@@ -415,6 +416,7 @@ asmlinkage long sys_reboot(int magic1, i
case LINUX_REBOOT_CMD_POWER_OFF:
notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);


system_state = SYSTEM_POWER_OFF;
+ device_suspend(PMSG_SUSPEND);

device_shutdown();


printk(KERN_EMERG "Power down.\n");

machine_power_off();
@@ -431,6 +433,7 @@ asmlinkage long sys_reboot(int magic1, i

notifier_call_chain(&reboot_notifier_list, SYS_RESTART, buffer);


system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);

device_shutdown();


printk(KERN_EMERG "Restarting system with command '%s'.\n", buffer);

machine_restart(buffer);
diff -puN include/linux/pm.h~properly-stop-devices-before-poweroff include/linux/pm.h
--- 25/include/linux/pm.h~properly-stop-devices-before-poweroff 2005-06-25 14:47:00.000000000 -0700
+++ 25-akpm/include/linux/pm.h 2005-06-25 14:47:00.000000000 -0700
@@ -103,7 +103,8 @@ extern int pm_active;
/*
* Register a device with power management
*/
-struct pm_dev __deprecated *pm_register(pm_dev_t type, unsigned long id, pm_callback callback);
+struct pm_dev __deprecated *
+pm_register(pm_dev_t type, unsigned long id, pm_callback callback);

/*
* Unregister a device with power management
@@ -190,17 +191,18 @@ typedef u32 __bitwise pm_message_t;
/*
* There are 4 important states driver can be in:
* ON -- driver is working
- * FREEZE -- stop operations and apply whatever policy is applicable to a suspended driver
- * of that class, freeze queues for block like IDE does, drop packets for
- * ethernet, etc... stop DMA engine too etc... so a consistent image can be
- * saved; but do not power any hardware down.
- * SUSPEND - like FREEZE, but hardware is doing as much powersaving as possible. Roughly
- * pci D3.
+ * FREEZE -- stop operations and apply whatever policy is applicable to a
+ * suspended driver of that class, freeze queues for block like IDE
+ * does, drop packets for ethernet, etc... stop DMA engine too etc...
+ * so a consistent image can be saved; but do not power any hardware
+ * down.
+ * SUSPEND - like FREEZE, but hardware is doing as much powersaving as
+ * possible. Roughly pci D3.
*
- * Unfortunately, current drivers only recognize numeric values 0 (ON) and 3 (SUSPEND).
- * We'll need to fix the drivers. So yes, putting 3 to all diferent defines is intentional,
- * and will go away as soon as drivers are fixed. Also note that typedef is neccessary,
- * we'll probably want to switch to
+ * Unfortunately, current drivers only recognize numeric values 0 (ON) and 3
+ * (SUSPEND). We'll need to fix the drivers. So yes, putting 3 to all different
+ * defines is intentional, and will go away as soon as drivers are fixed. Also
+ * note that typedef is neccessary, we'll probably want to switch to
* typedef struct pm_message_t { int event; int flags; } pm_message_t
* or something similar soon.
*/
@@ -222,11 +224,18 @@ struct dev_pm_info {

extern void device_pm_set_parent(struct device * dev, struct device * parent);

-extern int device_suspend(pm_message_t state);
extern int device_power_down(pm_message_t state);
extern void device_power_up(void);
extern void device_resume(void);

+#ifdef CONFIG_PM
+extern int device_suspend(pm_message_t state);
+#else
+static inline int device_suspend(pm_message_t state)
+{
+ return 0;
+}
+#endif

#endif /* __KERNEL__ */

_

Eric W. Biederman

unread,
Jul 27, 2005, 2:18:18 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org, Pavel Machek
Andrew Morton <ak...@osdl.org> writes:

> ebie...@xmission.com (Eric W. Biederman) wrote:
>>
>> Andrew Morton <ak...@osdl.org> writes:
>>
>> > My fairly ordinary x86 test box gets stuck during reboot on the
>> > wait_for_completion() in ide_do_drive_cmd():
>>
>> Hmm. The only thing I can think of is someone started adding calls
>> to device_suspend() before device_shutdown(). Not understanding
>> where it was a good idea I made certain the calls were in there
>> consistently.
>>
>> Andrew can you remove the call to device_suspend from kernel_restart
>> and see if this still happens?
>
> yup, that fixes it.
>
> --- devel/kernel/sys.c~a 2005-07-27 10:36:06.000000000 -0700
> +++ devel-akpm/kernel/sys.c 2005-07-27 10:36:26.000000000 -0700
> @@ -371,7 +371,6 @@ void kernel_restart(char *cmd)
> {
> notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> system_state = SYSTEM_RESTART;
> - device_suspend(PMSG_FREEZE);
> device_shutdown();
> if (!cmd) {
> printk(KERN_EMERG "Restarting system.\n");
> _
>
>
> Presumably it unfixes Pavel's patch?

Good question. I'm not certain if Pavel intended to add
device_suspend(PMSG_FREEZE) to the reboot path. It was
there in only one instance. Pavel comments talk only about
the suspend path.

My gut feel is the device_suspend calls are the right direction
as it allows us to remove code from the drivers and possible
kill device_shutdown completely.

But this close to 2.6.13 I'm not certain what the correct solution
is. With this we have had issues with both ide and the e1000.
But those are among the few drivers that do anything in either
device_shutdown() or the reboot_notifier.

The e1000 has been fixed. Can we fix the ide driver?

Looking at it more closely the code is confusing because
FREEZE and SUSPEND are actually the same message, and in
addition to what shutdown does they place the device in
a low power state. That worries me because last I checked
the e1000 driver could not bring itself out of the low power
state.

Are the semantic differences to great to make this interesting?

Is this additional suspend what is causing some of the oddball
power off bug reports?

My gut feel is that device_suspend(PMSG_FREEZE) should be
removed from kernel_restart until is a different message
from PMSG_SUSPEND at which point it should be equivalent
to device_shutdown and we can remove that case.

We really need to get a consistent set of requirements
for the device driver authors so they have a chance of
catching up.

Eric

Eric W. Biederman

unread,
Jul 27, 2005, 2:25:13 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org, Pavel Machek
ebie...@xmission.com (Eric W. Biederman) writes:

> We really need to get a consistent set of requirements
> for the device driver authors so they have a chance of
> catching up.

I meant to say. We really need to get a simple and
stable set of requirement for the device driver authors so they
have a chance of catching up and implementing it correctly.

Andrew Morton

unread,
Jul 27, 2005, 2:41:00 PM7/27/05
to Eric W. Biederman, torv...@osdl.org, linux-...@vger.kernel.org, pa...@ucw.cz

By "fixed" do you mean Tony Luck's patch which I added to rc3-mm2? If so,
do you think that's needed for 2.6.13? Getting patches into e100[0] is a
bit of an exercise :(

Eric W. Biederman

unread,
Jul 27, 2005, 2:54:52 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org, pa...@ucw.cz
Andrew Morton <ak...@osdl.org> writes:

> ebie...@xmission.com (Eric W. Biederman) wrote:
>
> By "fixed" do you mean Tony Luck's patch which I added to rc3-mm2? If so,
> do you think that's needed for 2.6.13? Getting patches into e100[0] is a
> bit of an exercise :(

That is what I was referring to.

Eric

Pavel Machek

unread,
Jul 27, 2005, 6:55:29 PM7/27/05
to Eric W. Biederman, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Hi!

> >> > My fairly ordinary x86 test box gets stuck during reboot on the
> >> > wait_for_completion() in ide_do_drive_cmd():
> >>
> >> Hmm. The only thing I can think of is someone started adding calls
> >> to device_suspend() before device_shutdown(). Not understanding
> >> where it was a good idea I made certain the calls were in there
> >> consistently.
> >>
> >> Andrew can you remove the call to device_suspend from kernel_restart
> >> and see if this still happens?
> >
> > yup, that fixes it.
> >
> > --- devel/kernel/sys.c~a 2005-07-27 10:36:06.000000000 -0700
> > +++ devel-akpm/kernel/sys.c 2005-07-27 10:36:26.000000000 -0700
> > @@ -371,7 +371,6 @@ void kernel_restart(char *cmd)
> > {
> > notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> > system_state = SYSTEM_RESTART;
> > - device_suspend(PMSG_FREEZE);
> > device_shutdown();
> > if (!cmd) {
> > printk(KERN_EMERG "Restarting system.\n");
> > _
> >
> >
> > Presumably it unfixes Pavel's patch?
>
> Good question. I'm not certain if Pavel intended to add
> device_suspend(PMSG_FREEZE) to the reboot path. It was
> there in only one instance. Pavel comments talk only about
> the suspend path.

Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.

> My gut feel is the device_suspend calls are the right direction
> as it allows us to remove code from the drivers and possible
> kill device_shutdown completely.
>
> But this close to 2.6.13 I'm not certain what the correct solution
> is. With this we have had issues with both ide and the e1000.
> But those are among the few drivers that do anything in either
> device_shutdown() or the reboot_notifier.

.


> Looking at it more closely the code is confusing because
> FREEZE and SUSPEND are actually the same message, and in
> addition to what shutdown does they place the device in

Not in -mm; I was finally able to fix that one.

> My gut feel is that device_suspend(PMSG_FREEZE) should be
> removed from kernel_restart until is a different message
> from PMSG_SUSPEND at which point it should be equivalent
> to device_shutdown and we can remove that case.

PMSG_FREEZE != PMSG_SUSPEND in current -mm, but I'm not sure if we can
push that to 2.6.13...


Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Linus Torvalds

unread,
Jul 27, 2005, 6:58:07 PM7/27/05
to Pavel Machek, Eric W. Biederman, Andrew Morton, linux-...@vger.kernel.org

On Thu, 28 Jul 2005, Pavel Machek wrote:
>
> Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.

Considering how many device drivers that are likely broken, I disagree.
Especially since Andrew seems to have trivially found a machine where it
doesn't work.

Linus

Pavel Machek

unread,
Jul 27, 2005, 6:59:45 PM7/27/05
to Linus Torvalds, Eric W. Biederman, Andrew Morton, linux-...@vger.kernel.org
Hi!

> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>
> Considering how many device drivers that are likely broken, I disagree.
> Especially since Andrew seems to have trivially found a machine where it
> doesn't work.

I'm not sure if we want to do that for 2.6.13, but long term, we
should just tell drivers to FREEZE instead of inventing reboot
notifier lists and similar uglynesses...

Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Pavel Machek

unread,
Jul 27, 2005, 7:02:04 PM7/27/05
to Andrew Morton, ebie...@xmission.com, torv...@osdl.org, linux-...@vger.kernel.org
Hi!

> > > Good question. I'm not certain if Pavel intended to add
> > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > there in only one instance. Pavel comments talk only about
> > > the suspend path.
> >

> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>

> Why?

Many bioses are broken; if you leave hardware active during reboot,
they'll hang during reboot. It is so common problem that I think that
only sane solution is make hardware quiet before reboot.

Andrew Morton

unread,
Jul 27, 2005, 7:11:58 PM7/27/05
to Pavel Machek, ebie...@xmission.com, torv...@osdl.org, linux-...@vger.kernel.org
Pavel Machek <pa...@ucw.cz> wrote:
>
> > Good question. I'm not certain if Pavel intended to add
> > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > there in only one instance. Pavel comments talk only about
> > the suspend path.
>
> Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.

Why?

Eric W. Biederman

unread,
Jul 27, 2005, 7:14:00 PM7/27/05
to Pavel Machek, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Pavel Machek <pa...@ucw.cz> writes:

> Hi!


>
> Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>
>> My gut feel is the device_suspend calls are the right direction
>> as it allows us to remove code from the drivers and possible
>> kill device_shutdown completely.
>>
>> But this close to 2.6.13 I'm not certain what the correct solution
>> is. With this we have had issues with both ide and the e1000.
>> But those are among the few drivers that do anything in either
>> device_shutdown() or the reboot_notifier.

> ..


>> Looking at it more closely the code is confusing because
>> FREEZE and SUSPEND are actually the same message, and in
>> addition to what shutdown does they place the device in
>
> Not in -mm; I was finally able to fix that one.

Cool.

>> My gut feel is that device_suspend(PMSG_FREEZE) should be
>> removed from kernel_restart until is a different message
>> from PMSG_SUSPEND at which point it should be equivalent
>> to device_shutdown and we can remove that case.
>
> PMSG_FREEZE != PMSG_SUSPEND in current -mm, but I'm not sure if we can
> push that to 2.6.13...

Currently device_suspend(PMSG_FREEZE) in the reboot path breaks
the e1000 and the ide driver. Which is common enough hardware it
effectively breaks reboots in 2.6.13 despite the fact that nearly
everything thing else works.

To make device_suspend(PMSG_FREEZE) solid in the reboot path I think
it would take pushing and stabilizing all of PMSG_FREEZE != PMSG_SUSPEND.
It will certainly take a bunch of digging to make certain reboots keep
working. Since the number of drivers that implement either a reboot
notifier or a shutdown method is small the it is conceivable we could
track down all of the serious issues before 2.6.13. However I'm
not ready to take point on the bug hunt.

So unless you are really ambitious I'd like to take
device_suspend(PMSG_FREEZE) out of the reboot path for
2.6.13, put in -mm where people can bang on it for a bit
and see that it is coming and delay the merge with the stable
branch until the bugs with common hardware have been sorted out.

Eric

Eric W. Biederman

unread,
Jul 27, 2005, 7:27:22 PM7/27/05
to Pavel Machek, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:

> Hi!
>
>> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>>
>> Considering how many device drivers that are likely broken, I disagree.
>> Especially since Andrew seems to have trivially found a machine where it
>> doesn't work.
>
> I'm not sure if we want to do that for 2.6.13, but long term, we
> should just tell drivers to FREEZE instead of inventing reboot
> notifier lists and similar uglynesses...

Then as part of the patch device_shutdown should disappear. It
is silly to have two functions that want to achieve the same
thing.

Right now the device driver model is ugly and over complicated in
that case and it needs to be simplified so driver writers have
a chance of getting it correct.

device_suspend(PMSG_FREEZE) feels more reusable than device_shutdown
so long term it feels right.

But a simple question is why can't you simply use the shutdown
method instead of extending the suspend method in the drivers.

Eric

Nigel Cunningham

unread,
Jul 27, 2005, 8:58:48 PM7/27/05
to Eric W. Biederman, Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, Linux-pm mailing list
Hi.

Could you please send PMSG_* related patches to linux-pm at
lists.osdl.org as well?

Thanks!

Nigel

--
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.

Eric W. Biederman

unread,
Jul 27, 2005, 9:14:29 PM7/27/05
to ncunn...@cyclades.com, Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, Linux-pm mailing list
Nigel Cunningham <ncunn...@cyclades.com> writes:

> Hi.
>
> Could you please send PMSG_* related patches to linux-pm at
> lists.osdl.org as well?

I'll try. My goal was not to add or change not functionality but to
make what the kernel was already doing be consistent.

It turns out the device_suspend(PMSG_FREEZE) is a major pain
sitting in the reboot path and I will be submitting a patch to
remove it from the reboot path in 2.6.13 completely.

At the very least the ide driver breaks, and the e1000 driver
is affected.

And there is of course the puzzle of why there exists simultaneously
driver shutdown() and suspend(PMSG_FREEZE) methods as I believed they
are defined to do exactly the same thing.

Eric

dav...@pacbell.net

unread,
Jul 27, 2005, 10:23:35 PM7/27/05
to ncunn...@cyclades.com, ebie...@xmission.com, torv...@osdl.org, linu...@lists.osdl.org, linux-...@vger.kernel.org, ak...@osdl.org
> And there is of course the puzzle of why there exists simultaneously
> driver shutdown() and suspend(PMSG_FREEZE) methods as I believed they
> are defined to do exactly the same thing.

No puzzle; that's not how they're defined. Both of them cause the
device to quiesce that device's activity. But:

- shutdown() is a "dirty shutdown OK" heads-up for some level
of restart/reboot; hardware will be completely re-initialized
later, normally with hardware level resets and OS rebooting.

- freeze() is a "clean shutdown only" that normally sees hardware
state preserved, and is followed by suspend() or resume().

Or so I had understood. That does suggest why having FREEZE in the
reboot path could be trouble: you could be rebooting because that
hardware won't do a clean shutdown!

- Dave

Shaohua Li

unread,
Jul 27, 2005, 10:46:01 PM7/27/05
to Eric W. Biederman, ncunn...@cyclades.com, Andrew Morton, Linus Torvalds, Linux-pm mailing list, Linux Kernel Mailing List
On Wed, 2005-07-27 at 19:12 -0600, Eric W. Biederman wrote:
> Nigel Cunningham <ncunn...@cyclades.com> writes:
>
> > Hi.
> >
> > Could you please send PMSG_* related patches to linux-pm at
> > lists.osdl.org as well?
>
> I'll try. My goal was not to add or change not functionality but to
> make what the kernel was already doing be consistent.
>
> It turns out the device_suspend(PMSG_FREEZE) is a major pain
> sitting in the reboot path and I will be submitting a patch to
> remove it from the reboot path in 2.6.13 completely.
>
> At the very least the ide driver breaks, and the e1000 driver
> is affected.
>
> And there is of course the puzzle of why there exists simultaneously
> driver shutdown() and suspend(PMSG_FREEZE) methods as I believed they
> are defined to do exactly the same thing.
I would expect more driver breakage and for the shutdown either. In
current stage, suspend(PMSG_FREEZE) might put devices into D3 state. How
can a shutdown() be done again?

Thanks,
Shaohua

Pavel Machek

unread,
Jul 28, 2005, 3:47:09 AM7/28/05
to Eric W. Biederman, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Hi!

> >> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> >>
> >> Considering how many device drivers that are likely broken, I disagree.
> >> Especially since Andrew seems to have trivially found a machine where it
> >> doesn't work.
> >
> > I'm not sure if we want to do that for 2.6.13, but long term, we
> > should just tell drivers to FREEZE instead of inventing reboot
> > notifier lists and similar uglynesses...
>
> Then as part of the patch device_shutdown should disappear. It
> is silly to have two functions that want to achieve the same
> thing.

I always thought that device_shutdown is different phase -- the one
with interrupts disabled...


Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Pavel Machek

unread,
Jul 28, 2005, 3:48:51 AM7/28/05
to Eric W. Biederman, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Hi!

> So unless you are really ambitious I'd like to take
> device_suspend(PMSG_FREEZE) out of the reboot path for
> 2.6.13, put in -mm where people can bang on it for a bit
> and see that it is coming and delay the merge with the stable
> branch until the bugs with common hardware have been sorted out.

Works for me. I may feel ambitious, but I don't feel like debugging
every reboot problem or every strange architecture for 2.6.13 :-).

Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Eric W. Biederman

unread,
Jul 28, 2005, 10:59:06 AM7/28/05
to Pavel Machek, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:

> I always thought that device_shutdown is different phase -- the one
> with interrupts disabled...

Nope. device_shutdown runs before interrupts are disabled.

Eric

Eric W. Biederman

unread,
Jul 28, 2005, 11:16:03 PM7/28/05
to Pavel Machek, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:

> Hi!
>
>> So unless you are really ambitious I'd like to take
>> device_suspend(PMSG_FREEZE) out of the reboot path for
>> 2.6.13, put in -mm where people can bang on it for a bit
>> and see that it is coming and delay the merge with the stable
>> branch until the bugs with common hardware have been sorted out.
>
> Works for me. I may feel ambitious, but I don't feel like debugging
> every reboot problem or every strange architecture for 2.6.13 :-).

Curses. Foiled again!

I have failed to pass the buck.

Eric

Eric W. Biederman

unread,
Jul 28, 2005, 11:18:58 PM7/28/05
to Pavel Machek, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:


> I always thought that device_shutdown is different phase -- the one
> with interrupts disabled...

At the end of device_shutdown interrupts are disabled because we
shutdown the interrupt controllers.

I don't think we have a phase where the interrupts are disabled,
except for the code that runs after device_shutdown, and the bootup
code that happens before interrupts are enabled.

Eric

Nigel Cunningham

unread,
Aug 3, 2005, 11:25:40 PM8/3/05
to Pavel Machek, Andrew Morton, ebie...@xmission.com, Linus Torvalds, Linux Kernel Mailing List
Hi.

On Thu, 2005-07-28 at 08:54, Pavel Machek wrote:
> Hi!
>
> > > > Good question. I'm not certain if Pavel intended to add
> > > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > > there in only one instance. Pavel comments talk only about
> > > > the suspend path.
> > >
> > > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> >
> > Why?
>
> Many bioses are broken; if you leave hardware active during reboot,
> they'll hang during reboot. It is so common problem that I think that
> only sane solution is make hardware quiet before reboot.

Sorry for my slow reply.

If I remember correctly PMSG_FREEZE was intended solely for stopping
activity when suspend to disk implementations are about to do their
atomic copies. I thought that ide reacts to this message by putting a
hold on queues, but doesn't otherwise do anything to prepare a drive for
a restart. If that's true, using FREEZE here isn't going to stop drives
from doing their emergency shutdown actions. Don't we need PMSG_SUSPEND
instead?

Regards,

Nigel


--
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.

-

Pavel Machek

unread,
Aug 4, 2005, 5:56:45 PM8/4/05
to Nigel Cunningham, Andrew Morton, ebie...@xmission.com, Linus Torvalds, Linux Kernel Mailing List
Hi!

> > > > > Good question. I'm not certain if Pavel intended to add
> > > > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > > > there in only one instance. Pavel comments talk only about
> > > > > the suspend path.
> > > >
> > > > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> > >
> > > Why?
> >
> > Many bioses are broken; if you leave hardware active during reboot,
> > they'll hang during reboot. It is so common problem that I think that
> > only sane solution is make hardware quiet before reboot.
>
> Sorry for my slow reply.
>
> If I remember correctly PMSG_FREEZE was intended solely for stopping
> activity when suspend to disk implementations are about to do their

Well, I think that PMSG_FREEZE can be handy when we want to stop
activity for other reasons, too...

> atomic copies. I thought that ide reacts to this message by putting a
> hold on queues, but doesn't otherwise do anything to prepare a drive for
> a restart. If that's true, using FREEZE here isn't going to stop drives
> from doing their emergency shutdown actions. Don't we need PMSG_SUSPEND
> instead?

Spinning disk down is not neccessary for reboot. Users will be angry
if we do it before reboot...
Pavel
--
if you have sharp zaurus hardware you don't need... you know my address

Nigel Cunningham

unread,
Aug 4, 2005, 6:23:56 PM8/4/05
to Pavel Machek, Andrew Morton, ebie...@xmission.com, Linus Torvalds, Linux Kernel Mailing List
Hi.

On Fri, 2005-08-05 at 07:45, Pavel Machek wrote:
> Hi!
>
> > > > > > Good question. I'm not certain if Pavel intended to add
> > > > > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > > > > there in only one instance. Pavel comments talk only about
> > > > > > the suspend path.
> > > > >
> > > > > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> > > >
> > > > Why?
> > >
> > > Many bioses are broken; if you leave hardware active during reboot,
> > > they'll hang during reboot. It is so common problem that I think that
> > > only sane solution is make hardware quiet before reboot.
> >
> > Sorry for my slow reply.
> >
> > If I remember correctly PMSG_FREEZE was intended solely for stopping
> > activity when suspend to disk implementations are about to do their
>
> Well, I think that PMSG_FREEZE can be handy when we want to stop
> activity for other reasons, too...
>
> > atomic copies. I thought that ide reacts to this message by putting a
> > hold on queues, but doesn't otherwise do anything to prepare a drive for
> > a restart. If that's true, using FREEZE here isn't going to stop drives
> > from doing their emergency shutdown actions. Don't we need PMSG_SUSPEND
> > instead?
>
> Spinning disk down is not neccessary for reboot. Users will be angry
> if we do it before reboot...

Yes, but I understood (perhaps wrongly) that we were discussing the
shutdown path. Nevertheless, for rebooting, you don't want to simply
freeze the queue - you want to flush the queue and tell the drive to
flush too. For freeze, you may well flush the queue, but you might not
necessarily force the drive to flush its queue too.

Regards,

Nigel
--
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.

-

Eric W. Biederman

unread,
Aug 7, 2005, 8:50:36 AM8/7/05