Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 0/23] reboot-fixes

0 views
Skip to first unread message

Eric W. Biederman

unread,
Jul 26, 2005, 1:21:46 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

The reboot code paths seems to be suffering from 15 years of people
only looking at the code when it breaks. The result is there
are several code paths in which different callers expect different
semantics from the same functions, and a fair amount of imperfect
inline replication of code.

For a year or more every time I fix one bug in the bug fix reveals yet
another bug. In an attempt to end the cycle of bug fixes revealing
yet more bugs I have generated a series of patches to clean up
the semantics along the reboot path.

With the callers all agreeing on what to expect from the functions
they call it should at least be possible to kill bugs without
more showing up because of the bug fix.

My primary approach is to factor sys_reboot into several smaller
functions and provide those functions for the general kernel
consumers instead of the architecture dependent restart and
halt hooks.

I don't expect this to noticeably fix any bugs along the
main code paths but magic sysrq and several of the more obscure
code paths should work much more reliably.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Eric W. Biederman

unread,
Jul 26, 2005, 1:23:42 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

In the recent addition of device_suspend calls into
sys_reboot two code paths were missed.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---

kernel/sys.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

5f0fb00783b94248b5a76c161f1c30a033fce4d3
diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -391,6 +391,7 @@ asmlinkage long sys_reboot(int magic1, i
case LINUX_REBOOT_CMD_RESTART:
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
device_shutdown();
printk(KERN_EMERG "Restarting system.\n");
machine_restart(NULL);
@@ -452,6 +453,7 @@ asmlinkage long sys_reboot(int magic1, i
}
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
device_shutdown();
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();

Eric W. Biederman

unread,
Jul 26, 2005, 1:31:09 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

Because the factors of sys_reboot don't exist people calling
into the reboot path duplicate the code badly, leading to
inconsistent expectations of code in the reboot path.

This patch should is just code motion.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---

include/linux/reboot.h | 9 ++++
kernel/sys.c | 106 +++++++++++++++++++++++++++++-------------------
2 files changed, 73 insertions(+), 42 deletions(-)

bbc94d1109869fe5054f7cd697c978d494edeb62
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -55,6 +55,15 @@ extern void machine_shutdown(void);
struct pt_regs;
extern void machine_crash_shutdown(struct pt_regs *);

+/*
+ * Architecture independent implemenations of sys_reboot commands.
+ */
+
+extern void kernel_restart(char *cmd);
+extern void kernel_halt(void);
+extern void kernel_power_off(void);
+extern void kernel_kexec(void);
+
#endif

#endif /* _LINUX_REBOOT_H */


diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c

@@ -361,6 +361,62 @@ out_unlock:
return retval;
}

+void kernel_restart(char *cmd)
+{
+ notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
+ system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
+ device_shutdown();
+ if (!cmd) {
+ printk(KERN_EMERG "Restarting system.\n");
+ } else {
+ printk(KERN_EMERG "Restarting system with command '%s'.\n", cmd);
+ }
+ printk(".\n");
+ machine_restart(cmd);
+}
+EXPORT_SYMBOL_GPL(kernel_restart);
+
+void kernel_kexec(void)
+{
+#ifdef CONFIG_KEXEC
+ struct kimage *image;
+ image = xchg(&kexec_image, 0);
+ if (!image) {
+ return;
+ }
+ notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
+ system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
+ device_shutdown();
+ printk(KERN_EMERG "Starting new kernel\n");
+ machine_shutdown();
+ machine_kexec(image);
+#endif
+}
+EXPORT_SYMBOL_GPL(kernel_kexec);
+
+void kernel_halt(void)
+{
+ notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
+ system_state = SYSTEM_HALT;
+ device_suspend(PMSG_SUSPEND);
+ device_shutdown();
+ printk(KERN_EMERG "System halted.\n");
+ machine_halt();
+}
+EXPORT_SYMBOL_GPL(kernel_halt);
+
+void kernel_power_off(void)
+{
+ notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
+ system_state = SYSTEM_POWER_OFF;
+ device_suspend(PMSG_SUSPEND);
+ device_shutdown();
+ printk(KERN_EMERG "Power down.\n");
+ machine_power_off();
+}
+EXPORT_SYMBOL_GPL(kernel_power_off);

/*
* Reboot system call: for obvious reasons only root may call it,
@@ -389,12 +445,7 @@ asmlinkage long sys_reboot(int magic1, i
lock_kernel();
switch (cmd) {
case LINUX_REBOOT_CMD_RESTART:
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);
- device_shutdown();
- printk(KERN_EMERG "Restarting system.\n");
- machine_restart(NULL);
+ kernel_restart(NULL);
break;

case LINUX_REBOOT_CMD_CAD_ON:
@@ -406,23 +457,13 @@ asmlinkage long sys_reboot(int magic1, i
break;

case LINUX_REBOOT_CMD_HALT:
- notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
- system_state = SYSTEM_HALT;
- device_suspend(PMSG_SUSPEND);
- device_shutdown();
- printk(KERN_EMERG "System halted.\n");
- machine_halt();
+ kernel_halt();
unlock_kernel();
do_exit(0);
break;

case LINUX_REBOOT_CMD_POWER_OFF:
- notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
- system_state = SYSTEM_POWER_OFF;
- device_suspend(PMSG_SUSPEND);
- device_shutdown();
- printk(KERN_EMERG "Power down.\n");
- machine_power_off();
+ kernel_power_off();
unlock_kernel();
do_exit(0);
break;
@@ -434,33 +475,14 @@ asmlinkage long sys_reboot(int magic1, i
}
buffer[sizeof(buffer) - 1] = '\0';

- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, buffer);
- system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);
- device_shutdown();
- printk(KERN_EMERG "Restarting system with command '%s'.\n", buffer);
- machine_restart(buffer);
+ kernel_restart(buffer);
break;

-#ifdef CONFIG_KEXEC
case LINUX_REBOOT_CMD_KEXEC:
- {
- struct kimage *image;
- image = xchg(&kexec_image, 0);
- if (!image) {
- unlock_kernel();
- return -EINVAL;
- }
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);
- device_shutdown();
- printk(KERN_EMERG "Starting new kernel\n");
- machine_shutdown();
- machine_kexec(image);
- break;
- }
-#endif
+ kernel_kexec();
+ unlock_kernel();
+ return -EINVAL;
+
#ifdef CONFIG_SOFTWARE_SUSPEND
case LINUX_REBOOT_CMD_SW_SUSPEND:
{

Pavel Machek

unread,
Jul 26, 2005, 4:10:29 PM7/26/05
to Eric W. Biederman, Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org
Hi!

> The reboot code paths seems to be suffering from 15 years of people
> only looking at the code when it breaks. The result is there
> are several code paths in which different callers expect different
> semantics from the same functions, and a fair amount of imperfect
> inline replication of code.
>
> For a year or more every time I fix one bug in the bug fix reveals yet
> another bug. In an attempt to end the cycle of bug fixes revealing
> yet more bugs I have generated a series of patches to clean up
> the semantics along the reboot path.
>
> With the callers all agreeing on what to expect from the functions
> they call it should at least be possible to kill bugs without
> more showing up because of the bug fix.
>
> My primary approach is to factor sys_reboot into several smaller
> functions and provide those functions for the general kernel
> consumers instead of the architecture dependent restart and
> halt hooks.
>
> I don't expect this to noticeably fix any bugs along the
> main code paths but magic sysrq and several of the more obscure
> code paths should work much more reliably.

It looks good to me. Good ammount of cruft really accumulated there...

Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Eric W. Biederman

unread,
Jul 26, 2005, 3:30:15 PM7/26/05
to Andrew Morton, Linus Torvalds, linux-...@vger.kernel.org

It is obvious we wanted to call kernel_restart here
but since we don't have it the code was expanded inline and hasn't
been correct since sometime in 2.4.

Signed-off-by: Eric W. Biederman <ebie...@xmission.com>
---

kernel/sys.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

252e08b37dc29ce42a0aef357b75ec1882eb634c


diff --git a/kernel/sys.c b/kernel/sys.c
--- a/kernel/sys.c
+++ b/kernel/sys.c

@@ -502,8 +502,7 @@ asmlinkage long sys_reboot(int magic1, i

static void deferred_cad(void *dummy)
{
- notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
- machine_restart(NULL);
+ kernel_restart(NULL);
}

/*

Andrew Morton

unread,
Jul 27, 2005, 6:02:06 AM7/27/05
to Eric W. Biederman, torv...@osdl.org, linux-...@vger.kernel.org

My fairly ordinary x86 test box gets stuck during reboot on the
wait_for_completion() in ide_do_drive_cmd():


(gdb) thread 73
[Switching to thread 73 (Thread 2906)]#0 ide_do_drive_cmd (drive=0xc072dd10, rq=0x0,
action=ide_wait) at drivers/ide/ide-io.c:1671
1671 rq->waiting = NULL;
(gdb) bt
#0 ide_do_drive_cmd (drive=0xc072dd10, rq=0x0, action=ide_wait) at drivers/ide/ide-io.c:1671
#1 0xc02e64c0 in ide_diag_taskfile (drive=0xc072dd10, args=0xcc0c1e20, data_size=0, buf=0x0)
at drivers/ide/ide-taskfile.c:503
#2 0xc02e64e6 in ide_raw_taskfile (drive=0x0, args=0x0, buf=0x0) at drivers/ide/ide-taskfile.c:508
#3 0xc02eab6d in do_idedisk_flushcache (drive=0x0) at drivers/ide/ide-disk.c:831
#4 0xc02eb232 in ide_cacheflush_p (drive=0xc072dd10) at drivers/ide/ide-disk.c:1027
#5 0xc02eb2e4 in ide_device_shutdown (dev=0xc072ddf4) at drivers/ide/ide-disk.c:1083
#6 0xc02af75c in device_shutdown () at drivers/base/power/shutdown.c:45
#7 0xc0128bfe in kernel_restart (cmd=0x0) at kernel/sys.c:375
#8 0xc0128dea in sys_reboot (magic1=-18751827, magic2=672274793, cmd=19088743, arg=0x0)
at kernel/sys.c:484
#9 0xc0102ba3 in sysenter_past_esp () at arch/i386/kernel/semaphore.c:177

Eric W. Biederman

unread,
Jul 27, 2005, 12:02:04 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
ebie...@xmission.com (Eric W. Biederman) writes:

> Andrew Morton <ak...@osdl.org> writes:
>
>> My fairly ordinary x86 test box gets stuck during reboot on the
>> wait_for_completion() in ide_do_drive_cmd():
>

> Hmm. The only thing I can think of is someone started adding calls
> to device_suspend() before device_shutdown(). Not understanding
> where it was a good idea I made certain the calls were in there
> consistently.
>
> Andrew can you remove the call to device_suspend from kernel_restart
> and see if this still happens?
>
> I would suspect interrupts of being disabled but it looks like
> kgdb is working and I think that requires an interrupt to notice
> new characters.

Looking at it the device_suspend calls should be safe but
in case we need to follow it up the device_suspend calls in sys_reboot
were initially introduced in:


commit 620b03276488c3cf103caf1e326bd21f00d3df84
Author: Pavel Machek <pa...@ucw.cz>
Date: Sat Jun 25 14:55:11 2005 -0700

[PATCH] properly stop devices before poweroff

Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.

Signed-off-by: Pavel Machek <pa...@suse.cz>
Signed-off-by: Andrew Morton <ak...@osdl.org>
Signed-off-by: Linus Torvalds <torv...@osdl.org>

Eric

Eric W. Biederman

unread,
Jul 27, 2005, 12:49:52 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Andrew Morton <ak...@osdl.org> writes:

> My fairly ordinary x86 test box gets stuck during reboot on the
> wait_for_completion() in ide_do_drive_cmd():

Hmm. The only thing I can think of is someone started adding calls


to device_suspend() before device_shutdown(). Not understanding
where it was a good idea I made certain the calls were in there
consistently.

Andrew can you remove the call to device_suspend from kernel_restart
and see if this still happens?

I would suspect interrupts of being disabled but it looks like
kgdb is working and I think that requires an interrupt to notice
new characters.

Eric

Andrew Morton

unread,
Jul 27, 2005, 1:46:14 PM7/27/05
to Eric W. Biederman, torv...@osdl.org, linux-...@vger.kernel.org, Pavel Machek
ebie...@xmission.com (Eric W. Biederman) wrote:
>
> Andrew Morton <ak...@osdl.org> writes:
>
> > My fairly ordinary x86 test box gets stuck during reboot on the
> > wait_for_completion() in ide_do_drive_cmd():
>
> Hmm. The only thing I can think of is someone started adding calls
> to device_suspend() before device_shutdown(). Not understanding
> where it was a good idea I made certain the calls were in there
> consistently.
>
> Andrew can you remove the call to device_suspend from kernel_restart
> and see if this still happens?

yup, that fixes it.

--- devel/kernel/sys.c~a 2005-07-27 10:36:06.000000000 -0700
+++ devel-akpm/kernel/sys.c 2005-07-27 10:36:26.000000000 -0700
@@ -371,7 +371,6 @@ void kernel_restart(char *cmd)
{
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);


system_state = SYSTEM_RESTART;
- device_suspend(PMSG_FREEZE);

device_shutdown();
if (!cmd) {


printk(KERN_EMERG "Restarting system.\n");

_


Presumably it unfixes Pavel's patch?


From: Pavel Machek <pa...@ucw.cz>

Without this patch, Linux provokes emergency disk shutdowns and
similar nastiness. It was in SuSE kernels for some time, IIRC.

Signed-off-by: Pavel Machek <pa...@suse.cz>
Signed-off-by: Andrew Morton <ak...@osdl.org>

---

include/linux/pm.h | 33 +++++++++++++++++++++------------
kernel/sys.c | 3 +++
2 files changed, 24 insertions(+), 12 deletions(-)

diff -puN kernel/sys.c~properly-stop-devices-before-poweroff kernel/sys.c
--- 25/kernel/sys.c~properly-stop-devices-before-poweroff 2005-06-25 14:47:00.000000000 -0700
+++ 25-akpm/kernel/sys.c 2005-06-25 14:50:53.000000000 -0700
@@ -405,6 +405,7 @@ asmlinkage long sys_reboot(int magic1, i
case LINUX_REBOOT_CMD_HALT:
notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);


system_state = SYSTEM_HALT;
+ device_suspend(PMSG_SUSPEND);

device_shutdown();


printk(KERN_EMERG "System halted.\n");

machine_halt();
@@ -415,6 +416,7 @@ asmlinkage long sys_reboot(int magic1, i
case LINUX_REBOOT_CMD_POWER_OFF:
notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);


system_state = SYSTEM_POWER_OFF;
+ device_suspend(PMSG_SUSPEND);

device_shutdown();


printk(KERN_EMERG "Power down.\n");

machine_power_off();
@@ -431,6 +433,7 @@ asmlinkage long sys_reboot(int magic1, i

notifier_call_chain(&reboot_notifier_list, SYS_RESTART, buffer);


system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);

device_shutdown();


printk(KERN_EMERG "Restarting system with command '%s'.\n", buffer);

machine_restart(buffer);
diff -puN include/linux/pm.h~properly-stop-devices-before-poweroff include/linux/pm.h
--- 25/include/linux/pm.h~properly-stop-devices-before-poweroff 2005-06-25 14:47:00.000000000 -0700
+++ 25-akpm/include/linux/pm.h 2005-06-25 14:47:00.000000000 -0700
@@ -103,7 +103,8 @@ extern int pm_active;
/*
* Register a device with power management
*/
-struct pm_dev __deprecated *pm_register(pm_dev_t type, unsigned long id, pm_callback callback);
+struct pm_dev __deprecated *
+pm_register(pm_dev_t type, unsigned long id, pm_callback callback);

/*
* Unregister a device with power management
@@ -190,17 +191,18 @@ typedef u32 __bitwise pm_message_t;
/*
* There are 4 important states driver can be in:
* ON -- driver is working
- * FREEZE -- stop operations and apply whatever policy is applicable to a suspended driver
- * of that class, freeze queues for block like IDE does, drop packets for
- * ethernet, etc... stop DMA engine too etc... so a consistent image can be
- * saved; but do not power any hardware down.
- * SUSPEND - like FREEZE, but hardware is doing as much powersaving as possible. Roughly
- * pci D3.
+ * FREEZE -- stop operations and apply whatever policy is applicable to a
+ * suspended driver of that class, freeze queues for block like IDE
+ * does, drop packets for ethernet, etc... stop DMA engine too etc...
+ * so a consistent image can be saved; but do not power any hardware
+ * down.
+ * SUSPEND - like FREEZE, but hardware is doing as much powersaving as
+ * possible. Roughly pci D3.
*
- * Unfortunately, current drivers only recognize numeric values 0 (ON) and 3 (SUSPEND).
- * We'll need to fix the drivers. So yes, putting 3 to all diferent defines is intentional,
- * and will go away as soon as drivers are fixed. Also note that typedef is neccessary,
- * we'll probably want to switch to
+ * Unfortunately, current drivers only recognize numeric values 0 (ON) and 3
+ * (SUSPEND). We'll need to fix the drivers. So yes, putting 3 to all different
+ * defines is intentional, and will go away as soon as drivers are fixed. Also
+ * note that typedef is neccessary, we'll probably want to switch to
* typedef struct pm_message_t { int event; int flags; } pm_message_t
* or something similar soon.
*/
@@ -222,11 +224,18 @@ struct dev_pm_info {

extern void device_pm_set_parent(struct device * dev, struct device * parent);

-extern int device_suspend(pm_message_t state);
extern int device_power_down(pm_message_t state);
extern void device_power_up(void);
extern void device_resume(void);

+#ifdef CONFIG_PM
+extern int device_suspend(pm_message_t state);
+#else
+static inline int device_suspend(pm_message_t state)
+{
+ return 0;
+}
+#endif

#endif /* __KERNEL__ */

_

Eric W. Biederman

unread,
Jul 27, 2005, 2:18:18 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org, Pavel Machek
Andrew Morton <ak...@osdl.org> writes:

> ebie...@xmission.com (Eric W. Biederman) wrote:
>>
>> Andrew Morton <ak...@osdl.org> writes:
>>
>> > My fairly ordinary x86 test box gets stuck during reboot on the
>> > wait_for_completion() in ide_do_drive_cmd():
>>
>> Hmm. The only thing I can think of is someone started adding calls
>> to device_suspend() before device_shutdown(). Not understanding
>> where it was a good idea I made certain the calls were in there
>> consistently.
>>
>> Andrew can you remove the call to device_suspend from kernel_restart
>> and see if this still happens?
>
> yup, that fixes it.
>
> --- devel/kernel/sys.c~a 2005-07-27 10:36:06.000000000 -0700
> +++ devel-akpm/kernel/sys.c 2005-07-27 10:36:26.000000000 -0700
> @@ -371,7 +371,6 @@ void kernel_restart(char *cmd)
> {
> notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> system_state = SYSTEM_RESTART;
> - device_suspend(PMSG_FREEZE);
> device_shutdown();
> if (!cmd) {
> printk(KERN_EMERG "Restarting system.\n");
> _
>
>
> Presumably it unfixes Pavel's patch?

Good question. I'm not certain if Pavel intended to add
device_suspend(PMSG_FREEZE) to the reboot path. It was
there in only one instance. Pavel comments talk only about
the suspend path.

My gut feel is the device_suspend calls are the right direction
as it allows us to remove code from the drivers and possible
kill device_shutdown completely.

But this close to 2.6.13 I'm not certain what the correct solution
is. With this we have had issues with both ide and the e1000.
But those are among the few drivers that do anything in either
device_shutdown() or the reboot_notifier.

The e1000 has been fixed. Can we fix the ide driver?

Looking at it more closely the code is confusing because
FREEZE and SUSPEND are actually the same message, and in
addition to what shutdown does they place the device in
a low power state. That worries me because last I checked
the e1000 driver could not bring itself out of the low power
state.

Are the semantic differences to great to make this interesting?

Is this additional suspend what is causing some of the oddball
power off bug reports?

My gut feel is that device_suspend(PMSG_FREEZE) should be
removed from kernel_restart until is a different message
from PMSG_SUSPEND at which point it should be equivalent
to device_shutdown and we can remove that case.

We really need to get a consistent set of requirements
for the device driver authors so they have a chance of
catching up.

Eric

Eric W. Biederman

unread,
Jul 27, 2005, 2:25:13 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org, Pavel Machek
ebie...@xmission.com (Eric W. Biederman) writes:

> We really need to get a consistent set of requirements
> for the device driver authors so they have a chance of
> catching up.

I meant to say. We really need to get a simple and
stable set of requirement for the device driver authors so they
have a chance of catching up and implementing it correctly.

Andrew Morton

unread,
Jul 27, 2005, 2:41:00 PM7/27/05
to Eric W. Biederman, torv...@osdl.org, linux-...@vger.kernel.org, pa...@ucw.cz

By "fixed" do you mean Tony Luck's patch which I added to rc3-mm2? If so,
do you think that's needed for 2.6.13? Getting patches into e100[0] is a
bit of an exercise :(

Eric W. Biederman

unread,
Jul 27, 2005, 2:54:52 PM7/27/05
to Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org, pa...@ucw.cz
Andrew Morton <ak...@osdl.org> writes:

> ebie...@xmission.com (Eric W. Biederman) wrote:
>
> By "fixed" do you mean Tony Luck's patch which I added to rc3-mm2? If so,
> do you think that's needed for 2.6.13? Getting patches into e100[0] is a
> bit of an exercise :(

That is what I was referring to.

Eric

Pavel Machek

unread,
Jul 27, 2005, 6:55:29 PM7/27/05
to Eric W. Biederman, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Hi!

> >> > My fairly ordinary x86 test box gets stuck during reboot on the
> >> > wait_for_completion() in ide_do_drive_cmd():
> >>
> >> Hmm. The only thing I can think of is someone started adding calls
> >> to device_suspend() before device_shutdown(). Not understanding
> >> where it was a good idea I made certain the calls were in there
> >> consistently.
> >>
> >> Andrew can you remove the call to device_suspend from kernel_restart
> >> and see if this still happens?
> >
> > yup, that fixes it.
> >
> > --- devel/kernel/sys.c~a 2005-07-27 10:36:06.000000000 -0700
> > +++ devel-akpm/kernel/sys.c 2005-07-27 10:36:26.000000000 -0700
> > @@ -371,7 +371,6 @@ void kernel_restart(char *cmd)
> > {
> > notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
> > system_state = SYSTEM_RESTART;
> > - device_suspend(PMSG_FREEZE);
> > device_shutdown();
> > if (!cmd) {
> > printk(KERN_EMERG "Restarting system.\n");
> > _
> >
> >
> > Presumably it unfixes Pavel's patch?
>
> Good question. I'm not certain if Pavel intended to add
> device_suspend(PMSG_FREEZE) to the reboot path. It was
> there in only one instance. Pavel comments talk only about
> the suspend path.

Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.

> My gut feel is the device_suspend calls are the right direction
> as it allows us to remove code from the drivers and possible
> kill device_shutdown completely.
>
> But this close to 2.6.13 I'm not certain what the correct solution
> is. With this we have had issues with both ide and the e1000.
> But those are among the few drivers that do anything in either
> device_shutdown() or the reboot_notifier.

.


> Looking at it more closely the code is confusing because
> FREEZE and SUSPEND are actually the same message, and in
> addition to what shutdown does they place the device in

Not in -mm; I was finally able to fix that one.

> My gut feel is that device_suspend(PMSG_FREEZE) should be
> removed from kernel_restart until is a different message
> from PMSG_SUSPEND at which point it should be equivalent
> to device_shutdown and we can remove that case.

PMSG_FREEZE != PMSG_SUSPEND in current -mm, but I'm not sure if we can
push that to 2.6.13...


Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Linus Torvalds

unread,
Jul 27, 2005, 6:58:07 PM7/27/05
to Pavel Machek, Eric W. Biederman, Andrew Morton, linux-...@vger.kernel.org

On Thu, 28 Jul 2005, Pavel Machek wrote:
>
> Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.

Considering how many device drivers that are likely broken, I disagree.
Especially since Andrew seems to have trivially found a machine where it
doesn't work.

Linus

Pavel Machek

unread,
Jul 27, 2005, 6:59:45 PM7/27/05
to Linus Torvalds, Eric W. Biederman, Andrew Morton, linux-...@vger.kernel.org
Hi!

> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>
> Considering how many device drivers that are likely broken, I disagree.
> Especially since Andrew seems to have trivially found a machine where it
> doesn't work.

I'm not sure if we want to do that for 2.6.13, but long term, we
should just tell drivers to FREEZE instead of inventing reboot
notifier lists and similar uglynesses...

Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Pavel Machek

unread,
Jul 27, 2005, 7:02:04 PM7/27/05
to Andrew Morton, ebie...@xmission.com, torv...@osdl.org, linux-...@vger.kernel.org
Hi!

> > > Good question. I'm not certain if Pavel intended to add
> > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > there in only one instance. Pavel comments talk only about
> > > the suspend path.
> >

> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>

> Why?

Many bioses are broken; if you leave hardware active during reboot,
they'll hang during reboot. It is so common problem that I think that
only sane solution is make hardware quiet before reboot.

Andrew Morton

unread,
Jul 27, 2005, 7:11:58 PM7/27/05
to Pavel Machek, ebie...@xmission.com, torv...@osdl.org, linux-...@vger.kernel.org
Pavel Machek <pa...@ucw.cz> wrote:
>
> > Good question. I'm not certain if Pavel intended to add
> > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > there in only one instance. Pavel comments talk only about
> > the suspend path.
>
> Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.

Why?

Eric W. Biederman

unread,
Jul 27, 2005, 7:14:00 PM7/27/05
to Pavel Machek, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Pavel Machek <pa...@ucw.cz> writes:

> Hi!


>
> Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>
>> My gut feel is the device_suspend calls are the right direction
>> as it allows us to remove code from the drivers and possible
>> kill device_shutdown completely.
>>
>> But this close to 2.6.13 I'm not certain what the correct solution
>> is. With this we have had issues with both ide and the e1000.
>> But those are among the few drivers that do anything in either
>> device_shutdown() or the reboot_notifier.

> ..


>> Looking at it more closely the code is confusing because
>> FREEZE and SUSPEND are actually the same message, and in
>> addition to what shutdown does they place the device in
>
> Not in -mm; I was finally able to fix that one.

Cool.

>> My gut feel is that device_suspend(PMSG_FREEZE) should be
>> removed from kernel_restart until is a different message
>> from PMSG_SUSPEND at which point it should be equivalent
>> to device_shutdown and we can remove that case.
>
> PMSG_FREEZE != PMSG_SUSPEND in current -mm, but I'm not sure if we can
> push that to 2.6.13...

Currently device_suspend(PMSG_FREEZE) in the reboot path breaks
the e1000 and the ide driver. Which is common enough hardware it
effectively breaks reboots in 2.6.13 despite the fact that nearly
everything thing else works.

To make device_suspend(PMSG_FREEZE) solid in the reboot path I think
it would take pushing and stabilizing all of PMSG_FREEZE != PMSG_SUSPEND.
It will certainly take a bunch of digging to make certain reboots keep
working. Since the number of drivers that implement either a reboot
notifier or a shutdown method is small the it is conceivable we could
track down all of the serious issues before 2.6.13. However I'm
not ready to take point on the bug hunt.

So unless you are really ambitious I'd like to take
device_suspend(PMSG_FREEZE) out of the reboot path for
2.6.13, put in -mm where people can bang on it for a bit
and see that it is coming and delay the merge with the stable
branch until the bugs with common hardware have been sorted out.

Eric

Eric W. Biederman

unread,
Jul 27, 2005, 7:27:22 PM7/27/05
to Pavel Machek, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:

> Hi!
>
>> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
>>
>> Considering how many device drivers that are likely broken, I disagree.
>> Especially since Andrew seems to have trivially found a machine where it
>> doesn't work.
>
> I'm not sure if we want to do that for 2.6.13, but long term, we
> should just tell drivers to FREEZE instead of inventing reboot
> notifier lists and similar uglynesses...

Then as part of the patch device_shutdown should disappear. It
is silly to have two functions that want to achieve the same
thing.

Right now the device driver model is ugly and over complicated in
that case and it needs to be simplified so driver writers have
a chance of getting it correct.

device_suspend(PMSG_FREEZE) feels more reusable than device_shutdown
so long term it feels right.

But a simple question is why can't you simply use the shutdown
method instead of extending the suspend method in the drivers.

Eric

Nigel Cunningham

unread,
Jul 27, 2005, 8:58:48 PM7/27/05
to Eric W. Biederman, Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, Linux-pm mailing list
Hi.

Could you please send PMSG_* related patches to linux-pm at
lists.osdl.org as well?

Thanks!

Nigel

--
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.

Eric W. Biederman

unread,
Jul 27, 2005, 9:14:29 PM7/27/05
to ncunn...@cyclades.com, Andrew Morton, Linus Torvalds, Linux Kernel Mailing List, Linux-pm mailing list
Nigel Cunningham <ncunn...@cyclades.com> writes:

> Hi.
>
> Could you please send PMSG_* related patches to linux-pm at
> lists.osdl.org as well?

I'll try. My goal was not to add or change not functionality but to
make what the kernel was already doing be consistent.

It turns out the device_suspend(PMSG_FREEZE) is a major pain
sitting in the reboot path and I will be submitting a patch to
remove it from the reboot path in 2.6.13 completely.

At the very least the ide driver breaks, and the e1000 driver
is affected.

And there is of course the puzzle of why there exists simultaneously
driver shutdown() and suspend(PMSG_FREEZE) methods as I believed they
are defined to do exactly the same thing.

Eric

dav...@pacbell.net

unread,
Jul 27, 2005, 10:23:35 PM7/27/05
to ncunn...@cyclades.com, ebie...@xmission.com, torv...@osdl.org, linu...@lists.osdl.org, linux-...@vger.kernel.org, ak...@osdl.org
> And there is of course the puzzle of why there exists simultaneously
> driver shutdown() and suspend(PMSG_FREEZE) methods as I believed they
> are defined to do exactly the same thing.

No puzzle; that's not how they're defined. Both of them cause the
device to quiesce that device's activity. But:

- shutdown() is a "dirty shutdown OK" heads-up for some level
of restart/reboot; hardware will be completely re-initialized
later, normally with hardware level resets and OS rebooting.

- freeze() is a "clean shutdown only" that normally sees hardware
state preserved, and is followed by suspend() or resume().

Or so I had understood. That does suggest why having FREEZE in the
reboot path could be trouble: you could be rebooting because that
hardware won't do a clean shutdown!

- Dave

Shaohua Li

unread,
Jul 27, 2005, 10:46:01 PM7/27/05
to Eric W. Biederman, ncunn...@cyclades.com, Andrew Morton, Linus Torvalds, Linux-pm mailing list, Linux Kernel Mailing List
On Wed, 2005-07-27 at 19:12 -0600, Eric W. Biederman wrote:
> Nigel Cunningham <ncunn...@cyclades.com> writes:
>
> > Hi.
> >
> > Could you please send PMSG_* related patches to linux-pm at
> > lists.osdl.org as well?
>
> I'll try. My goal was not to add or change not functionality but to
> make what the kernel was already doing be consistent.
>
> It turns out the device_suspend(PMSG_FREEZE) is a major pain
> sitting in the reboot path and I will be submitting a patch to
> remove it from the reboot path in 2.6.13 completely.
>
> At the very least the ide driver breaks, and the e1000 driver
> is affected.
>
> And there is of course the puzzle of why there exists simultaneously
> driver shutdown() and suspend(PMSG_FREEZE) methods as I believed they
> are defined to do exactly the same thing.
I would expect more driver breakage and for the shutdown either. In
current stage, suspend(PMSG_FREEZE) might put devices into D3 state. How
can a shutdown() be done again?

Thanks,
Shaohua

Pavel Machek

unread,
Jul 28, 2005, 3:47:09 AM7/28/05
to Eric W. Biederman, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Hi!

> >> > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> >>
> >> Considering how many device drivers that are likely broken, I disagree.
> >> Especially since Andrew seems to have trivially found a machine where it
> >> doesn't work.
> >
> > I'm not sure if we want to do that for 2.6.13, but long term, we
> > should just tell drivers to FREEZE instead of inventing reboot
> > notifier lists and similar uglynesses...
>
> Then as part of the patch device_shutdown should disappear. It
> is silly to have two functions that want to achieve the same
> thing.

I always thought that device_shutdown is different phase -- the one
with interrupts disabled...


Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Pavel Machek

unread,
Jul 28, 2005, 3:48:51 AM7/28/05
to Eric W. Biederman, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Hi!

> So unless you are really ambitious I'd like to take
> device_suspend(PMSG_FREEZE) out of the reboot path for
> 2.6.13, put in -mm where people can bang on it for a bit
> and see that it is coming and delay the merge with the stable
> branch until the bugs with common hardware have been sorted out.

Works for me. I may feel ambitious, but I don't feel like debugging
every reboot problem or every strange architecture for 2.6.13 :-).

Pavel
--
teflon -- maybe it is a trademark, but it should not be.

Eric W. Biederman

unread,
Jul 28, 2005, 10:59:06 AM7/28/05
to Pavel Machek, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:

> I always thought that device_shutdown is different phase -- the one
> with interrupts disabled...

Nope. device_shutdown runs before interrupts are disabled.

Eric

Eric W. Biederman

unread,
Jul 28, 2005, 11:16:03 PM7/28/05
to Pavel Machek, Andrew Morton, torv...@osdl.org, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:

> Hi!
>
>> So unless you are really ambitious I'd like to take
>> device_suspend(PMSG_FREEZE) out of the reboot path for
>> 2.6.13, put in -mm where people can bang on it for a bit
>> and see that it is coming and delay the merge with the stable
>> branch until the bugs with common hardware have been sorted out.
>
> Works for me. I may feel ambitious, but I don't feel like debugging
> every reboot problem or every strange architecture for 2.6.13 :-).

Curses. Foiled again!

I have failed to pass the buck.

Eric

Eric W. Biederman

unread,
Jul 28, 2005, 11:18:58 PM7/28/05
to Pavel Machek, Linus Torvalds, Andrew Morton, linux-...@vger.kernel.org
Pavel Machek <pa...@suse.cz> writes:


> I always thought that device_shutdown is different phase -- the one
> with interrupts disabled...

At the end of device_shutdown interrupts are disabled because we
shutdown the interrupt controllers.

I don't think we have a phase where the interrupts are disabled,
except for the code that runs after device_shutdown, and the bootup
code that happens before interrupts are enabled.

Eric

Nigel Cunningham

unread,
Aug 3, 2005, 11:25:40 PM8/3/05
to Pavel Machek, Andrew Morton, ebie...@xmission.com, Linus Torvalds, Linux Kernel Mailing List
Hi.

On Thu, 2005-07-28 at 08:54, Pavel Machek wrote:
> Hi!
>
> > > > Good question. I'm not certain if Pavel intended to add
> > > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > > there in only one instance. Pavel comments talk only about
> > > > the suspend path.
> > >
> > > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> >
> > Why?
>
> Many bioses are broken; if you leave hardware active during reboot,
> they'll hang during reboot. It is so common problem that I think that
> only sane solution is make hardware quiet before reboot.

Sorry for my slow reply.

If I remember correctly PMSG_FREEZE was intended solely for stopping
activity when suspend to disk implementations are about to do their
atomic copies. I thought that ide reacts to this message by putting a
hold on queues, but doesn't otherwise do anything to prepare a drive for
a restart. If that's true, using FREEZE here isn't going to stop drives
from doing their emergency shutdown actions. Don't we need PMSG_SUSPEND
instead?

Regards,

Nigel


--
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.

-

Pavel Machek

unread,
Aug 4, 2005, 5:56:45 PM8/4/05
to Nigel Cunningham, Andrew Morton, ebie...@xmission.com, Linus Torvalds, Linux Kernel Mailing List
Hi!

> > > > > Good question. I'm not certain if Pavel intended to add
> > > > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > > > there in only one instance. Pavel comments talk only about
> > > > > the suspend path.
> > > >
> > > > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> > >
> > > Why?
> >
> > Many bioses are broken; if you leave hardware active during reboot,
> > they'll hang during reboot. It is so common problem that I think that
> > only sane solution is make hardware quiet before reboot.
>
> Sorry for my slow reply.
>
> If I remember correctly PMSG_FREEZE was intended solely for stopping
> activity when suspend to disk implementations are about to do their

Well, I think that PMSG_FREEZE can be handy when we want to stop
activity for other reasons, too...

> atomic copies. I thought that ide reacts to this message by putting a
> hold on queues, but doesn't otherwise do anything to prepare a drive for
> a restart. If that's true, using FREEZE here isn't going to stop drives
> from doing their emergency shutdown actions. Don't we need PMSG_SUSPEND
> instead?

Spinning disk down is not neccessary for reboot. Users will be angry
if we do it before reboot...
Pavel
--
if you have sharp zaurus hardware you don't need... you know my address

Nigel Cunningham

unread,
Aug 4, 2005, 6:23:56 PM8/4/05
to Pavel Machek, Andrew Morton, ebie...@xmission.com, Linus Torvalds, Linux Kernel Mailing List
Hi.

On Fri, 2005-08-05 at 07:45, Pavel Machek wrote:
> Hi!
>
> > > > > > Good question. I'm not certain if Pavel intended to add
> > > > > > device_suspend(PMSG_FREEZE) to the reboot path. It was
> > > > > > there in only one instance. Pavel comments talk only about
> > > > > > the suspend path.
> > > > >
> > > > > Yes, I think we should do device_suspend(PMSG_FREEZE) in reboot path.
> > > >
> > > > Why?
> > >
> > > Many bioses are broken; if you leave hardware active during reboot,
> > > they'll hang during reboot. It is so common problem that I think that
> > > only sane solution is make hardware quiet before reboot.
> >
> > Sorry for my slow reply.
> >
> > If I remember correctly PMSG_FREEZE was intended solely for stopping
> > activity when suspend to disk implementations are about to do their
>
> Well, I think that PMSG_FREEZE can be handy when we want to stop
> activity for other reasons, too...
>
> > atomic copies. I thought that ide reacts to this message by putting a
> > hold on queues, but doesn't otherwise do anything to prepare a drive for
> > a restart. If that's true, using FREEZE here isn't going to stop drives
> > from doing their emergency shutdown actions. Don't we need PMSG_SUSPEND
> > instead?
>
> Spinning disk down is not neccessary for reboot. Users will be angry
> if we do it before reboot...

Yes, but I understood (perhaps wrongly) that we were discussing the
shutdown path. Nevertheless, for rebooting, you don't want to simply
freeze the queue - you want to flush the queue and tell the drive to
flush too. For freeze, you may well flush the queue, but you might not
necessarily force the drive to flush its queue too.

Regards,

Nigel
--
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.

-

Eric W. Biederman

unread,
Aug 7, 2005, 8:50:36 AM8/7/05
to Pavel Machek

Early in the 2.6.13 process my kexec related patches were introduced
into the reboot path, and under the rule you touch it you fix it
it I have been involved in tracking quite a few regressions
on the reboot path.

Recently with Benjamin Herrenschmidt's removal of
device_suppend(PMSG_SUPPEND) from kernel_power_off(),
it seems the last of the issues went away. I asked
for confirmation that reintroducing the patch below
would break systems and I received it.

The experimental evidence then is that calling
device_suspend(PMSG_SUSPEND) in kernel_power_off
when performing an acpi_power_off is actively harmful.

There as been a fair amount of consensus that calling
device_suspend(...) in the reboot path was inappropriate now, because
the device suspend code was too immature. With this latest
piece of evidence it seems to me that introducing device_suspend(...)
in kernel_power_off, kernel_halt, kernel_reboot, or kernel_kexec
can never be appropriate. I am including as many people as
I can on this so we in our collective wisdom don't forget this
lesson.

What lead us to this situation was a real problem, and a desire
to fix it. Currently linux has a proliferation of interfaces
to place devices in different states. The reboot notifiers,
device_suspend(...), device_shutdown+system_state, remove_one,
module_exit, and probably a few I'm not aware of. Each interface
introduced by a different author wanting to solve a different problem.
Just writing this list of interfaces is confusing. The implementation
task for device driver writers appears even harder as simultaneously
implementing all of these interfaces correctly seems not to happen.

The lesson: fixing this problem is heavy lifting, and that
device_suspend() in the reboot path is not the answer.

Eric


This is the patch that subtly and mysterously broke things.


> diff --git a/kernel/sys.c b/kernel/sys.c
> --- a/kernel/sys.c
> +++ b/kernel/sys.c

> @@ -404,6 +404,7 @@ void kernel_halt(void)
> {


> notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
> system_state = SYSTEM_HALT;
> + device_suspend(PMSG_SUSPEND);
> device_shutdown();
> printk(KERN_EMERG "System halted.\n");
> machine_halt();

> @@ -414,6 +415,7 @@ void kernel_power_off(void)
> {


> notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
> system_state = SYSTEM_POWER_OFF;
> + device_suspend(PMSG_SUSPEND);
> device_shutdown();
> printk(KERN_EMERG "Power down.\n");
> machine_power_off();
>
>

Pavel Machek

unread,
Aug 7, 2005, 3:04:25 PM8/7/05
to Eric W. Biederman
Hi!

> There as been a fair amount of consensus that calling
> device_suspend(...) in the reboot path was inappropriate now, because
> the device suspend code was too immature. With this latest
> piece of evidence it seems to me that introducing device_suspend(...)
> in kernel_power_off, kernel_halt, kernel_reboot, or kernel_kexec
> can never be appropriate.

Code is not ready now => it can never be fixed? Thats quite a strange
conclusion to make.
Pavel
--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

Eric W. Biederman

unread,
Aug 7, 2005, 3:48:13 PM8/7/05
to Pavel Machek
Pavel Machek <pa...@suse.cz> writes:

> Hi!
>
>> There as been a fair amount of consensus that calling
>> device_suspend(...) in the reboot path was inappropriate now, because
>> the device suspend code was too immature. With this latest
>> piece of evidence it seems to me that introducing device_suspend(...)
>> in kernel_power_off, kernel_halt, kernel_reboot, or kernel_kexec
>> can never be appropriate.
>
> Code is not ready now => it can never be fixed? Thats quite a strange
> conclusion to make.

It seems there is an fundamental incompatibility with ACPI power off.
As best as I can tell the normal case of device_suspend(PMSG_SUSPEND)
works reasonably well in 2.6.x.

From what I can tell there are some fairly fundamental semantic
differences, on that code path. The most peculiar problem I tracked
is someone had a machine that would go into power off state and then
wake right back up because of the device_suspend(PMSG_SUSPEND) change.
If acpi power off doesn't expect the hardware to be suspended I don't
see how you can make device_suspend(PMSG_SUSPEND) a default in 2.6.x.

I won't call it impossible to resolve the problems, but the people
doing it must be extremely sensitive to the subtle semantic
differences that exist and the burden of proof for showing things work
better need to be extremely high. And the developer who reintroduces
it gets to own all of the reboot/halt/power off problems in the stable
tree when it gets merged.

Pavel the fact that you did not articulate a possibility that your
change could have caused most of the problems that were seen with
it is what scares me the most. The fairly trivial and obvious
problems were not addressed when the patch was merged much less the
subtle ones. Your initial patch did not even touch all of the code
paths necessary to achieve what it was trying to do.

So yes without a darn good argument as to why it should work. I will
go with the experimental evidence that it fails miserably and
trivially because of semantic incompatibility and can therefore
never be fixed.

Eric

Pavel Machek

unread,
Aug 7, 2005, 5:11:56 PM8/7/05
to Eric W. Biederman
Hi!

> >> There as been a fair amount of consensus that calling
> >> device_suspend(...) in the reboot path was inappropriate now, because
> >> the device suspend code was too immature. With this latest
> >> piece of evidence it seems to me that introducing device_suspend(...)
> >> in kernel_power_off, kernel_halt, kernel_reboot, or kernel_kexec
> >> can never be appropriate.
> >
> > Code is not ready now => it can never be fixed? Thats quite a strange
> > conclusion to make.
>
> It seems there is an fundamental incompatibility with ACPI power off.
> As best as I can tell the normal case of device_suspend(PMSG_SUSPEND)
> works reasonably well in 2.6.x.

Powerdown is going to have the same problems as the powerdown at the
end of suspend-to-disk. Can you ask people reporting broken shutdown
to try suspend-to-disk?

> >From what I can tell there are some fairly fundamental semantic
> differences, on that code path. The most peculiar problem I tracked
> is someone had a machine that would go into power off state and then
> wake right back up because of the device_suspend(PMSG_SUSPEND)
> change.

So something is wrong with ACPI wakeup GPEs. It would hurt in
suspend-to-disk case, too.

> I won't call it impossible to resolve the problems, but the people

Good.

> So yes without a darn good argument as to why it should work. I will
> go with the experimental evidence that it fails miserably and
> trivially because of semantic incompatibility and can therefore
> never be fixed.

I do not think any "semantic" issues exist. We need to pass detailed
info down to the drivers that care, and we need to fix all the bugs in
the drivers. That should be pretty much it.


Pavel
--
if you have sharp zaurus hardware you don't need... you know my address

Eric W. Biederman

unread,
Aug 9, 2005, 1:27:05 PM8/9/05
to Pavel Machek
Pavel Machek <pa...@ucw.cz> writes:

> Hi!
>
>> >> There as been a fair amount of consensus that calling
>> >> device_suspend(...) in the reboot path was inappropriate now, because
>> >> the device suspend code was too immature. With this latest
>> >> piece of evidence it seems to me that introducing device_suspend(...)
>> >> in kernel_power_off, kernel_halt, kernel_reboot, or kernel_kexec
>> >> can never be appropriate.
>> >
>> > Code is not ready now => it can never be fixed? Thats quite a strange
>> > conclusion to make.
>>
>> It seems there is an fundamental incompatibility with ACPI power off.
>> As best as I can tell the normal case of device_suspend(PMSG_SUSPEND)
>> works reasonably well in 2.6.x.
>
> Powerdown is going to have the same problems as the powerdown at the
> end of suspend-to-disk. Can you ask people reporting broken shutdown
> to try suspend-to-disk?

Everyone I know of who is affected has been copied on this thread.
However your request is just nonsense. There is a device_resume in
the code before we get to the device_shutdown so there should be no
effect at all. Are we looking at the same kernel?

>> >From what I can tell there are some fairly fundamental semantic
>> differences, on that code path. The most peculiar problem I tracked
>> is someone had a machine that would go into power off state and then
>> wake right back up because of the device_suspend(PMSG_SUSPEND)
>> change.
>
> So something is wrong with ACPI wakeup GPEs. It would hurt in
> suspend-to-disk case, too.

Something was wrong. I can't possibly see how the suspend-to-disk
case would be affected.

>> I won't call it impossible to resolve the problems, but the people
>
> Good.

Nope. Now that I have read the code I would just call it nonsense.

>> So yes without a darn good argument as to why it should work. I will
>> go with the experimental evidence that it fails miserably and
>> trivially because of semantic incompatibility and can therefore
>> never be fixed.
>
> I do not think any "semantic" issues exist. We need to pass detailed
> info down to the drivers that care, and we need to fix all the bugs in
> the drivers. That should be pretty much it.

Given that acpi and other platform firmware is involved there are
pieces we cannot fix. We either match the spec or we are incorrect.

I haven't a clue how suspend/resume is expected to interact with
things in suspend to disk scenario. Reading through the code
the power message is PMSG_FREEZE not PMSG_SUSPEND (as you
implemented). All of the hardware is actually resumed before
we device_shutdown() is called.

I want to see the correlation between device_suspend(PMSG_FREEZE) and
the code in device_shutdown(), but I don't see it.
device_suspend(...) is all about allowing the state of a device to be
preserved. device_shutdown() is really about stopping it. These are
really quite different operations.

With the pm_suspend_disk calling kernel_power_off it appears that we
currently have complete code reuse of the relevant code on that path.

Currently I see no true redundancy between the two cases at all.
The methods do different things for different purposes. Which is
about the largest semantic difference I can think of. The fact
that the methods at first glance look like they do the same
thing is probably the real surprise.

Calling device_suspend(...) from kernel_power_off, kernel_halt,
kernel_kexec, or kernel_restart seems pointless, useless and silly.

Eric

Nigel Cunningham

unread,
Aug 9, 2005, 5:31:48 PM8/9/05
to Eric W. Biederman
Hi.

On Wed, 2005-08-10 at 03:25, Eric W. Biederman wrote:
> Pavel Machek <pa...@ucw.cz> writes:
>
> > Hi!
> >
> >> >> There as been a fair amount of consensus that calling
> >> >> device_suspend(...) in the reboot path was inappropriate now, because
> >> >> the device suspend code was too immature. With this latest
> >> >> piece of evidence it seems to me that introducing device_suspend(...)
> >> >> in kernel_power_off, kernel_halt, kernel_reboot, or kernel_kexec
> >> >> can never be appropriate.
> >> >
> >> > Code is not ready now => it can never be fixed? Thats quite a strange
> >> > conclusion to make.
> >>
> >> It seems there is an fundamental incompatibility with ACPI power off.
> >> As best as I can tell the normal case of device_suspend(PMSG_SUSPEND)
> >> works reasonably well in 2.6.x.
> >
> > Powerdown is going to have the same problems as the powerdown at the
> > end of suspend-to-disk. Can you ask people reporting broken shutdown
> > to try suspend-to-disk?
>
> Everyone I know of who is affected has been copied on this thread.
> However your request is just nonsense. There is a device_resume in
> the code before we get to the device_shutdown so there should be no
> effect at all. Are we looking at the same kernel?

My poweroff after suspend-to-disk was broken during 2.6.13-rcs, and came
right in rc6.

Agreed here.

> With the pm_suspend_disk calling kernel_power_off it appears that we
> currently have complete code reuse of the relevant code on that path.
>
> Currently I see no true redundancy between the two cases at all.
> The methods do different things for different purposes. Which is
> about the largest semantic difference I can think of. The fact
> that the methods at first glance look like they do the same
> thing is probably the real surprise.

If the suspend to disk code called kernel_power_off, it should be
exactly what it sounds like. We've already written the image and we now
went to simply power down the machine. Just as with a 'normal'
powerdown, we should do everything necessary to ensure all data
submitted to hard drives is really flushed and that emergency head
parking isn't done, and then power down (or reboot). This should, so far
as I can see, be exactly the same in both cases.

Regards,

Nigel

> Calling device_suspend(...) from kernel_power_off, kernel_halt,
> kernel_kexec, or kernel_restart seems pointless, useless and silly.
>
> Eric

--
Evolution.
Enumerate the requirements.
Consider the interdependencies.
Calculate the probabilities.

-

Pavel Machek

unread,
Aug 10, 2005, 9:08:50 AM8/10/05
to Eric W. Biederman
Hi!

> >> > Code is not ready now => it can never be fixed? Thats quite a strange
> >> > conclusion to make.
> >>
> >> It seems there is an fundamental incompatibility with ACPI power off.
> >> As best as I can tell the normal case of device_suspend(PMSG_SUSPEND)
> >> works reasonably well in 2.6.x.
> >
> > Powerdown is going to have the same problems as the powerdown at the
> > end of suspend-to-disk. Can you ask people reporting broken shutdown
> > to try suspend-to-disk?
>
> Everyone I know of who is affected has been copied on this thread.
> However your request is just nonsense. There is a device_resume in
> the code before we get to the device_shutdown so there should be no
> effect at all. Are we looking at the same kernel?

Sorry, my fault. kernel/power/disk.c:power_down(): it calls
device_shutdown even in PM_DISK_PLATFORM case. I thought it would do
device_suspend() to enable drivers doing something more clever.

So only missing piece of puzzle seems to be making sure disks are
properly spinned down in device_shutdown... and even that seems to be
there, not sure why it was broken before.


Pavel
--
if you have sharp zaurus hardware you don't need... you know my address

0 new messages