Package: sysvinit
Version: 2.86.ds1-61
Severity: important
Upgrading to lenny from etch has broken wake on lan after 'poweroff' on our many Dell Optiplex headless /diskless systems. All packages are latest lenny/testing as of 6 October 2008.
Previously the systems would respond to etherwake 'magic packets'. Now, they don't. The adapters are 3c59x or e100 based, one per system. ethtool reports wake on lan is enabled.
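For readers who have not looked inside a 'magic packet': the payload is six 0xFF bytes followed by the target MAC address repeated sixteen times. Below is a minimal sketch in Python, not the etherwake source. As far as I know etherwake sends this payload in a raw Ethernet frame; the sketch sends it as a UDP broadcast instead, which many adapters also accept. The function names, the broadcast address and port 9 are assumptions chosen for illustration.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    hw_addr = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Synchronisation stream (6 x 0xFF) followed by 16 copies of the MAC.
    return b"\xff" * 6 + hw_addr * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the payload as a UDP broadcast (an assumption; etherwake uses a raw frame)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

The adapter only reacts to this payload if its WOL logic was left armed when the machine powered off, which is exactly the part that no longer happens here.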
Looking at the code I see that the latest network drivers always disable WOL after ifup and before ifdown. Then, either at driver unload time or ifdown time they set the WOL bit in hardware according to option. However, when the root filesystem is on NFS, system halt doesn't do ifdown: when ifdown is forced under NFS root, halt hangs waiting for the now-disabled NFS to satisfy calls, which blocks halt operations forever.
So, no ifdown when NFS root means the WOL enablement bits aren't set in the drivers -- breaking WOL.
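To tell whether a given box is in the affected configuration, one can look at what is mounted on '/'. The sketch below parses text in the /proc/mounts format (device, mount point, filesystem type, options, ...). It is only an illustration with made-up helper names, and it makes no claim about how halt or the init scripts themselves decide whether to skip ifdown.

```python
from typing import Optional


def root_fstype(mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the last entry mounted on '/', or None."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/":
            # Later entries shadow earlier ones (e.g. the initial 'rootfs').
            fstype = fields[2]
    return fstype


def root_is_nfs(mounts_text: str) -> bool:
    """True if '/' is mounted with an NFS filesystem type (nfs, nfs4, ...)."""
    return (root_fstype(mounts_text) or "").startswith("nfs")
```

On a running system the input would be the contents of /proc/mounts, for example `root_is_nfs(open("/proc/mounts").read())`.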
The only way I got it to work under etch was to modify the drivers. There was also a problem with the binary halt as distributed not calling ifdown; compiling from the standard source (same version) did work (odd, that).
But the drivers have changed quite a bit since .18, and I think the right place to deal with this is above the driver level. What network driver routine is always called, even on NFS root systems, after it is certain no further kernel or userspace filesystem accesses will occur? Either the halt routine has to change so it operates the same way with NFS root or local root, or net driver writers need to know for sure what the last driver call will be when powering off in an NFS root setup.
I was considering changing halt to create a tiny tmpfs root file system, then doing a pivot_root to it inside halt so ifdown could proceed. But the hack rating was just too high.
Kernel: Linux 2.6.26-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash
Versions of packages sysvinit depends on:
ii  initscripts     2.86.ds1-61  Scripts for initializing and shutt
ii  libc6           2.7-13       GNU C Library: Shared libraries
ii  libselinux1     2.0.65-5     SELinux shared libraries
ii  libsepol1       2.0.30-2     Security Enhanced Linux policy lib
ii  sysv-rc         2.86.ds1-61  System-V-like runlevel change mech
ii  sysvinit-utils  2.86.ds1-61  System-V-like utilities
sysvinit recommends no packages.
sysvinit suggests no packages.
-- no debconf information
-- To UNSUBSCRIBE, email to debian-bugs-dist-REQU...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Discussion subject changed to "Bug#501351: [Pkg-sysvinit-devel] Bug#501351: sysvinit: halt breaks wake on lan / WOL under NFS root /diskless lenny" by Petter Reinholdtsen
> Upgrading to lenny from etch has broken wake on lan after 'poweroff'
> on our many Dell Optiplex headless /diskless systems. All packages
> are latest lenny/testing as of 6 October 2008.
This is sad.
> Looking at the code I see that the latest network drivers always
> disable WOL after ifup and before ifdown. Then, either at driver
> unload time or ifdown time they set the WOL bit in hardware
> according to option.
Is the kernel drivers' behavior with respect to wake-on-lan defined somewhere? As I see this, the kernel drivers have no consistent interface to userspace, and thus make it impossible to solve this in a reliable way in userspace. This makes it the responsibility of the kernel drivers and the kernel space to come up with a solution. If there is a definition of how the kernel drivers should behave, and user space applications can depend on this definition during shutdown, we can discuss how to solve this in sysvinit. Until such a definition shows up, I have no idea how to solve this issue reliably.
Please let us know how the kernel decided to handle wake-on-lan in network drivers, if such a decision exists.
Happy hacking,
--
Petter Reinholdtsen
>> Upgrading to lenny from etch has broken wake on lan after 'poweroff'
>> on our many Dell Optiplex headless /diskless systems. All packages
>> are latest lenny/testing as of 6 October 2008.
> This is sad.
Yes.
>> Looking at the code I see that the latest network drivers always
>> disable WOL after ifup and before ifdown. Then, either at driver
>> unload time or ifdown time they set the WOL bit in hardware
>> according to option.
> Is the kernel drivers' behavior with respect to wake-on-lan defined
> somewhere? As I see this, the kernel drivers have no consistent
> interface to userspace, and thus make it impossible to solve this in a
> reliable way in userspace.
Because 'halt' has kernel-feeling options to block or do 'ifdown' and block or do 'disk sync', its authors must be part of the answer. Long term my suggestion is both those options seem to be things the kernel filesystem and storage drivers should do on its way to orderly ending/rebooting and they properly shouldn't be part of userspace halt.
But since they are part of it at present, and since halt is the program that hangs when doing the right thing (asking for ifdown to set WOL), I am asking halt authors to engage the correct kernel maintainers to decide: either the kernel and network driver authors must take responsibility to down the interface or call shutdown routines just before poweroff happens, even under NFS root, or userspace must pivot_root off the NFS root to a tmpfs system and down the interface before calling for a system poweroff. As it is, network drivers engage WOL options when downing the interface (though I worry some do it in the .shutdown routine).
Anyone can reproduce this problem: set up an NFS root system, then try poweroff with and without halt's ifdown option. With the ifdown option, you will see halt hang just before poweroff, waiting for a response from the NFS server that will never come because ifdown has happened. In this case WOL is set in the adapters, but because halt never actually turns the system off and never exits, having WOL turned on is of no importance. The system is effectively hung until a physical reboot. In the other case, leaving halt's ifdown option off under NFS root allows halt to exit normally, but the network driver routines that set the WOL bit never get called by the kernel, so WOL is broken although the system is turned off.
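The cases above can be written down as a small truth table. This is only a toy model of the behaviour described in this report, not code taken from halt or the kernel; `Outcome`, `poweroff_outcome` and `wakeable` are made-up names, and the model assumes the driver arms WOL only on the ifdown path.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    powered_off: bool  # did the machine actually reach power-off?
    wol_armed: bool    # was the adapter left ready for a magic packet?


def poweroff_outcome(nfs_root: bool, ifdown: bool) -> Outcome:
    """Model the reported behaviour of 'poweroff' with or without ifdown."""
    # Assumption from the report: these drivers set the WOL bit only when
    # the interface is downed, never in their shutdown path.
    wol_armed = ifdown
    # With NFS root, downing the interface leaves halt blocked on NFS
    # requests that can no longer be answered, so power-off is never reached.
    powered_off = not (nfs_root and ifdown)
    return Outcome(powered_off, wol_armed)


def wakeable(outcome: Outcome) -> bool:
    """A machine can be woken only if it is off and WOL is armed."""
    return outcome.powered_off and outcome.wol_armed


if __name__ == "__main__":
    for nfs_root in (False, True):
        for ifdown in (False, True):
            result = poweroff_outcome(nfs_root, ifdown)
            print(f"nfs_root={nfs_root} ifdown={ifdown} -> {result}, wakeable={wakeable(result)}")
```

Under these assumptions neither setting of the ifdown option yields a wakeable machine when root is on NFS, which is the whole complaint.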
> This makes it the responsibility of the
> kernel drivers and the kernel space to come up with a solution. If
> there is a definition of how the kernel drivers should behave, and
> user space applications can depend on this definition during shutdown,
> we can discuss how to solve this in sysvinit. Until such a definition
> shows up, I have no idea how to solve this issue reliably.

> Please let us know how the kernel decided to handle wake-on-lan in
> network drivers, if such a decision exists.
I am asking the halt authors to forward this bug to the correct people since your ideas about who they are must be better than mine.
> Happy hacking,
Not so much as I hope! Upgrades near release like Lenny are not supposed to break basic things. The effect of this problem is wasteful usage of energy keeping systems that have no work to do nevertheless powered up.
Discussion subject changed to "Bug#501351: [Pkg-sysvinit-devel] Bug#501351: Bug#501351: sysvinit: halt breaks wake on lan / WOL under NFS root /diskless lenny" by Henrique de Moraes Holschuh
On Tue, 07 Oct 2008, Harry Coin wrote:
> > Is the kernel drivers' behavior with respect to wake-on-lan defined
> > somewhere? As I see this, the kernel drivers have no consistent
> > interface to userspace, and thus make it impossible to solve this in a
> > reliable way in userspace.
> Because 'halt' has kernel-feeling options to block or do 'ifdown' and
> block or do 'disk sync', its authors must be part of the answer. Long
> term my suggestion is both those options seem to be things the kernel
> filesystem and storage drivers should do on its way to orderly
> ending/rebooting and they properly shouldn't be part of userspace halt.
We have been through a similar problem a while ago: disk shutdown and cache flushing. It caused A LOT of pain.
The short answer is: the kernel has to do its job right on anything related to device setup for suspend/hibernation/power off, and trying to fix it from userspace just causes things to blow up when the kernel finally gets its act together.
This is true for WoL just like it was true for disk shutdown. If WoL is enabled, the kernel driver has to make sure everything that needs to be done to the hardware to make it active when entering S3/S4/S5 is done. There is NO other acceptable solution, because this is the only way to *always* get it right, without weird border conditions.
Since the state of things right now is so utterly sad, we can't do much more than provide both possibilities (good and broken drivers), warn the user that he has to test which one he needs, and keep one eye in the kernel to react quickly if the network drivers get fixed.
> when doing the right thing (asking for ifdown to set WOL) I am asking
> halt authors to engage the correct kernel maintainers to decide either
> the kernel and network driver authors must take responsibility to down
I could do that, yes. Please provide me a list of the drivers you have verified to have retarded WoL behaviour, along with the kernel version. I can at least make sure it is all listed in bugzilla.kernel.org, although I doubt I can fix it myself.
> poweroff. As it is, network drivers engage WOL options when downing
> the interface (though I worry some do it in the .shutdown routine).
They should engage/de-engage WoL in device-model .shutdown and .suspend. And if that means they have to bring the transceiver up/down or even paint the moon blue in order to have the hardware do the right thing, they are to do it.
At that point, userspace (i.e. Debian) is already frozen (for S3/S4) or destroyed (for S5 - shutdown) and doesn't care.
In fact, we should not have to touch the interfaces at all in the userspace shutdown path, just like we don't have to do it anymore for disks (in fact, we must NOT, for disks).
> Not so much as I hope! Upgrades near release like Lenny are not supposed
> to break basic things. The effect of this problem is wasteful usage of
> energy keeping systems that have no work to do nevertheless powered up.
I fear we are not going to touch this anymore for Lenny, except as documentation. If any of the other maintainers want to, I won't stand in the way, nor complain about it... but I doubt anyone will want to touch the shutdown path this late in the freeze.
It is not like we can even fix this crap for good unless the kernel gets its act together, anyway...
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
>>> Is the kernel drivers' behavior with respect to wake-on-lan defined
>>> somewhere? As I see this, the kernel drivers have no consistent
>>> interface to userspace, and thus make it impossible to solve this in a
>>> reliable way in userspace.
>> Because 'halt' has kernel-feeling options to block or do 'ifdown' and
>> block or do 'disk sync', its authors must be part of the answer. Long
>> term my suggestion is both those options seem to be things the kernel
>> filesystem and storage drivers should do on its way to orderly
>> ending/rebooting and they properly shouldn't be part of userspace halt.

> We have been through a similar problem a while ago: disk shutdown and cache
> flushing. It caused A LOT of pain.

> The short answer is: the kernel has to do its job right on anything related
> to device setup for suspend/hibernation/power off, and trying to fix it from
> userspace just causes things to blow up when the kernel finally gets its act
> together.

> This is true for WoL just like it was true for disk shutdown. If WoL is
> enabled, the kernel driver has to make sure everything that needs to be done
> to the hardware to make it active when entering S3/S4/S5 is done. There is
> NO other acceptable solution, because this is the only way to *always* get
> it right, without weird border conditions.

> Since the state of things right now is so utterly sad, we can't do much more
> than provide both possibilities (good and broken drivers), warn the user
> that he has to test which one he needs, and keep one eye in the kernel to
> react quickly if the network drivers get fixed.

>> when doing the right thing (asking for ifdown to set WOL) I am asking
>> halt authors to engage the correct kernel maintainers to decide either
>> the kernel and network driver authors must take responsibility to down

> I could do that, yes. Please provide me a list of the drivers you have
> verified to have retarded WoL behaviour, along with the kernel version. I
> can at least make sure it is all listed in bugzilla.kernel.org, although I
> doubt I can fix it myself.
Thank you.
The drivers that don't enable WOL / wake-on-lan after poweroff under NFS root include e100.c and 3c59x.c as shipped in Debian Lenny/testing. Modified versions of these drivers did function properly in Debian Etch (kernel 2.6.18), but so many changes have happened to them since that my changes no longer apply. I think the problem is that they expect ifdown to happen and enable WOL there; they don't check for it at .shutdown. I think I remember someone hacking a version of e100.c to do WOL on suspend, but as for poweroff: Etch / 2.6.18 yes, Lenny / 2.6.26 no.
>> poweroff. As it is, network drivers engage WOL options when downing
>> the interface (though I worry some do it in the .shutdown routine).

> They should engage/de-engage WoL in device-model .shutdown and .suspend.
> And if that means they have to bring the transceiver up/down or even paint
> the moon blue in order to have the hardware do the right thing, they are to
> do it.

> At that point, userspace (i.e. Debian) is already frozen (for S3/S4) or
> destroyed (for S5 - shutdown) and doesn't care.

> In fact, we should not have to touch the interfaces at all in the userspace
> shutdown path, just like we don't have to do it anymore for disks (in fact,
> we must NOT, for disks).

>> Not so much as I hope! Upgrades near release like Lenny are not supposed
>> to break basic things. The effect of this problem is wasteful usage of
>> energy keeping systems that have no work to do nevertheless powered up.

> I fear we are not going to touch this anymore for Lenny, except as
> documentation. If any of the other maintainers want to, I won't stand in
> the way, nor complain about it... but I doubt anyone will want to touch the
> shutdown path this late in the freeze.

> It is not like we can even fix this crap for good unless the kernel gets its
> act together, anyway...