Bug#501351: sysvinit: halt breaks wake on lan / WOL under NFS root /diskless lenny

Harry Coin

Oct 6, 2008, 4:10:27 PM

Package: sysvinit
Version: 2.86.ds1-61
Severity: important

Upgrading from etch to lenny has broken wake on lan after 'poweroff' on our many headless, diskless Dell Optiplex systems. All packages are the latest lenny/testing as of 6 October 2008.

Previously the systems would respond to etherwake 'magic packets'. Now, they don't. The adapters are 3c59x or e100 based, one per system. ethtool reports wake on lan is enabled.
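
For reference, the 'magic packet' mentioned here has a simple, well-known layout: six 0xFF bytes followed by the target MAC address repeated sixteen times. A minimal Python sketch of building and broadcasting one follows. The function names, the example MAC address, and the choice of a UDP broadcast on port 9 are illustrative assumptions (etherwake itself sends a raw Ethernet frame rather than UDP).

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF synchronisation bytes, then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the payload over UDP (an assumed transport, chosen for simplicity)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))
```

Calling `send_magic_packet("00:11:22:33:44:55")` from another host on the same segment is roughly what the etherwake test above amounts to.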

Looking at the code, I see that the latest network drivers always disable WOL after ifup and before ifdown. Then, at either driver unload time or ifdown time, they set the WOL bit in hardware according to the configured option. However, when the root filesystem is NFS, the system halt doesn't do ifdown: when that is forced on under NFS root, halt hangs forever waiting for the now-disabled NFS to satisfy filesystem calls.

So, no ifdown when NFS root means the WOL enablement bits aren't set in the drivers -- breaking WOL.
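
The dilemma can be made concrete with a small sketch. This is not how sysvinit's scripts actually decide; it is a hypothetical Python helper that inspects text in /proc/mounts format and applies the policy described above (skip ifdown when root is on NFS). The names `root_is_nfs` and `should_run_ifdown` are invented for illustration.

```python
def root_is_nfs(mounts_text: str) -> bool:
    """Return True if the last filesystem mounted on '/' is an NFS type."""
    root_fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == "/":
            root_fstype = fields[2]  # later entries shadow earlier ones
    return root_fstype in ("nfs", "nfs4")


def should_run_ifdown(mounts_text: str) -> bool:
    """Hypothetical policy: skip ifdown on NFS root to avoid the hang."""
    return not root_is_nfs(mounts_text)
```

On a live system the input would be something like `open("/proc/mounts").read()`. With this policy an NFS root machine powers off cleanly, but the drivers never get the ifdown call they rely on to arm WOL.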

The only way I got it to work under etch was to modify the drivers. There was also a problem with the distributed halt binary not calling ifdown; a halt compiled from the standard source (same version) did work (odd, that).

But the drivers have changed quite a bit since 2.6.18, and I think the right place to deal with this is above the driver level. Which network driver routine is always called, even on NFS root systems, once it is certain that no further kernel or userspace filesystem accesses will occur? Either the halt routine has to change so that it operates the same way with NFS root as with local root, or net driver writers need to know for sure what the last driver call will be when powering off in an NFS root setup.

I was considering changing halt to create a tiny tmpfs root file system and then pivot_root to it inside halt so that ifdown could proceed. But the hack rating was just too high.


-- System Information:
Debian Release: lenny/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US, LC_CTYPE=en_US (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages sysvinit depends on:
ii initscripts 2.86.ds1-61 Scripts for initializing and shutt
ii libc6 2.7-13 GNU C Library: Shared libraries
ii libselinux1 2.0.65-5 SELinux shared libraries
ii libsepol1 2.0.30-2 Security Enhanced Linux policy lib
ii sysv-rc 2.86.ds1-61 System-V-like runlevel change mech
ii sysvinit-utils 2.86.ds1-61 System-V-like utilities

sysvinit recommends no packages.

sysvinit suggests no packages.

-- no debconf information


Petter Reinholdtsen

Oct 6, 2008, 5:30:16 PM

[Harry Coin]

> Upgrading to lenny from etch has broken wake on lan after 'poweroff'
> on our many Dell Optiplex headless /diskless systems. All packages
> are latest lenny/testing as of 6 October 2008.

This is sad.

> Looking at the code I see that the latest network drivers always
> disable WOL after ifup and before ifdown. Then, either at driver
> unload time or ifdown time they set the WOL bit in hardware
> according to option.

Is the kernel drivers' behavior with respect to wake-on-lan defined
somewhere? As I see it, the kernel drivers have no consistent
interface to userspace, and thus make it impossible to solve this in a
reliable way in userspace. This makes it the responsibility of the
kernel drivers and the kernel space to come up with a solution. If
there is a definition of how the kernel drivers should behave, and
user space applications can depend on this definition during shutdown,
we can discuss how to solve this in sysvinit. Until such a definition
shows up, I have no idea how to solve this issue reliably.

Please let us know how the kernel has decided to handle wake-on-lan in
network drivers, if such a decision exists.

Happy hacking,
--
Petter Reinholdtsen

Harry Coin

Oct 7, 2008, 12:10:14 PM

Petter Reinholdtsen wrote:
> [Harry Coin]
>
>> Upgrading to lenny from etch has broken wake on lan after 'poweroff'
>> on our many Dell Optiplex headless /diskless systems. All packages
>> are latest lenny/testing as of 6 October 2008.
>>
>
> This is sad.
>
Yes.

>
>> Looking at the code I see that the latest network drivers always
>> disable WOL after ifup and before ifdown. Then, either at driver
>> unload time or ifdown time they set the WOL bit in hardware
>> according to option.
>>
>
> Is the kernel drivers' behavior with respect to wake-on-lan defined
> somewhere? As I see it, the kernel drivers have no consistent
> interface to userspace, and thus make it impossible to solve this in a
> reliable way in userspace.
Because 'halt' has kernel-feeling options to block or do 'ifdown' and
block or do 'disk sync', its authors must be part of the answer. Long
term, my suggestion is that both of those options are things the kernel
filesystem and storage drivers should do on their way to an orderly
shutdown or reboot, and that they properly shouldn't be part of userspace halt.

But since they are part of halt at present, and since halt is the program
that hangs when doing the right thing (asking for ifdown to set WOL), I am
asking the halt authors to engage the correct kernel maintainers to decide
between two options. Either the kernel and network driver authors must take
responsibility to down the interface or call the shutdown routines just
before poweroff, even when the root is NFS; or userspace must pivot_root
off the NFS root to a tmpfs system and down the interface before calling
for a system poweroff. As it is, network drivers engage WOL options when
downing the interface (though I worry some do it in the .shutdown routine).

Anyone can reproduce this problem: set up an NFS root system, then try
poweroff with and without halt's ifdown option. With the ifdown
option, you will see halt hang just before poweroff, waiting for a
response from the NFS server that will never come because ifdown has
happened. In this case WOL is set in the adapters, but because halt
never actually turns the system off and never exits, having WOL turned
on is of no importance. The system is effectively hung until a physical
reboot. In the other case, without halt's ifdown option, halt exits
normally under NFS root, but the network driver routines that set the
WOL bit never get called by the kernel, so WOL is broken although the
system is turned off.
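
The two reproduction cases can be restated as a toy Python model. It encodes only the behaviour reported in this thread, not halt's or any driver's real code. The function names are made up, and the local-root rows are an assumption extrapolated from the description (ifdown arms WOL and nothing blocks).

```python
def poweroff_outcome(nfs_root: bool, ifdown: bool) -> dict:
    """Toy model: what happens at poweroff for a given root type and ifdown choice."""
    # Reported driver behaviour: the WOL bit is only set when ifdown runs.
    wol_armed = ifdown
    # Reported halt behaviour: ifdown on NFS root leaves halt blocked on NFS.
    powered_off = not (nfs_root and ifdown)
    return {"powered_off": powered_off, "wol_armed": wol_armed}


def wakeable(nfs_root: bool, ifdown: bool) -> bool:
    """A machine can be woken later only if it is both off and armed."""
    outcome = poweroff_outcome(nfs_root, ifdown)
    return outcome["powered_off"] and outcome["wol_armed"]
```

In this model neither choice yields a wakeable machine when `nfs_root` is true, which is the whole complaint.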


> This makes it the responsibility of the
> kernel drivers and the kernel space to come up with a solution. If
> there is a definition of how the kernel drivers should behave, and
> user space applications can depend on this definition during shutdown,
> we can discuss how to solve this in sysvinit. Until such a definition
> shows up, I have no idea how to solve this issue reliably.
>
> Please let us know how the kernel has decided to handle wake-on-lan in
> network drivers, if such a decision exists.
>
>

I am asking the halt authors to forward this bug to the correct people
since your ideas about who they are must be better than mine.

> Happy hacking,
>

Not so much as I hope! Upgrades near release like Lenny are not supposed
to break basic things. The effect of this problem is wasteful usage of
energy keeping systems that have no work to do nevertheless powered up.

Henrique de Moraes Holschuh

Oct 7, 2008, 1:10:13 PM

On Tue, 07 Oct 2008, Harry Coin wrote:
> > Is the kernel drivers' behavior with respect to wake-on-lan defined
> > somewhere? As I see it, the kernel drivers have no consistent
> > interface to userspace, and thus make it impossible to solve this in a
> > reliable way in userspace.
> Because 'halt' has kernel-feeling options to block or do 'ifdown' and
> block or do 'disk sync', its authors must be part of the answer. Long
> term my suggestion is both those options seem to be things the kernel
> filesystem and storage drivers should do on its way to orderly
> ending/rebooting and they properly shouldn't be part of userspace halt.

We have been through a similar problem a while ago: disk shutdown and cache
flushing. It caused A LOT of pain.

The short answer is: the kernel has to do its job right on anything related
to device setup for suspend/hibernation/power off, and trying to fix it from
userspace just causes things to blow up when the kernel finally gets its act
together.

This is true for WoL just like it was true for disk shutdown. If WoL is
enabled, the kernel driver has to make sure everything that needs to be done
to the hardware to make it active when entering S3/S4/S5 is done. There is
NO other acceptable solution, because this is the only way to *always* get
it right, without weird border conditions.

Since the state of things right now is so utterly sad, we can't do much more
than provide both possibilities (good and broken drivers), warn the user
that he has to test which one he needs, and keep one eye on the kernel to
react quickly if the network drivers get fixed.
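
A first step in that kind of testing is to record what each interface reports before shutdown. A rough Python sketch that extracts the Wake-on fields from `ethtool <iface>` output follows. It assumes the usual "Supports Wake-on:" and "Wake-on:" lines, and `parse_wol` and `magic_packet_wake_enabled` are hypothetical helper names. As the original report shows, ethtool can report WOL as enabled on a machine that still fails to wake after poweroff, so this confirms configuration only, not behaviour.

```python
def parse_wol(ethtool_output: str) -> dict:
    """Extract the 'Supports Wake-on' and 'Wake-on' flag strings, if present."""
    result = {"supported": None, "current": None}
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a "key: value" line
        if key == "Supports Wake-on":
            result["supported"] = value.strip()
        elif key == "Wake-on":
            result["current"] = value.strip()
    return result


def magic_packet_wake_enabled(ethtool_output: str) -> bool:
    """True if the current Wake-on flags include 'g' (wake on magic packet)."""
    current = parse_wol(ethtool_output)["current"]
    return current is not None and "g" in current
```

The input could come from something like `subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout`, with `eth0` standing in for the real interface name.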

> when doing the right thing (asking for ifdown to set WOL) I am asking
> halt authors to engage the correct kernel maintainers to decide either
> the kernel and network driver authors must take responsibility to down

I could do that, yes. Please provide me a list of the drivers you have
verified to have retarded WoL behaviour, along with the kernel version. I
can at least make sure it is all listed in bugzilla.kernel.org, although I
doubt I can fix it myself.

> poweroff. As it is, network drivers engage WOL options when downing
> the interface (though I worry some do it in the .shutdown routine).

They should engage/de-engage WoL in device-model .shutdown and .suspend.
And if that means they have to bring the transceiver up/down or even paint
the moon blue in order to have the hardware do the right thing, they are to
do it.

At that point, userspace (i.e. Debian) is already frozen (for S3/S4) or
destroyed (for S5 - shutdown) and doesn't care.

In fact, we should not have to touch the interfaces at all in the userspace
shutdown path, just like we don't have to do it anymore for disks (in fact,
we must NOT, for disks).

> Not so much as I hope! Upgrades near release like Lenny are not supposed
> to break basic things. The effect of this problem is wasteful usage of
> energy keeping systems that have no work to do nevertheless powered up.

I fear we are not going to touch this anymore for Lenny, except as
documentation. If any of the other maintainers want to, I won't stand in
the way, nor complain about it... but I doubt anyone will want to touch the
shutdown path this late in the freeze.

It is not like we can even fix this crap for good unless the kernel gets its
act together, anyway...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

Harry Coin

Oct 7, 2008, 2:40:13 PM

Thank you.

The drivers that don't enable WOL / wake-on-lan after poweroff in the
context of NFS root include e100.c and 3c59x.c as shipped in Debian
lenny/testing. Modified versions of these drivers did function
properly in Debian etch (kernel 2.6.18), but so many changes have happened
to them that my changes no longer apply. I think the problem is that
they expect ifdown to happen and enable WOL there; they don't check for
it at .shutdown. I think I remember someone hacking a version of
e100.c to do WOL on suspend, but as for poweroff: etch / 2.6.18 yes,
lenny / 2.6.26 no.

>> poweroff. As it is, network drivers engage WOL options when downing
>> the interface (though I worry some do it in the .shutdown routine).
>>
>
> They should engage/de-engage WoL in device-model .shutdown and .suspend.
> And if that means they have to bring the transceiver up/down or even paint
> the moon blue in order to have the hardware do the right thing, they are to
> do it.
>
> At that point, userspace (i.e. Debian) is already frozen (for S3/S4) or
> destroyed (for S5 - shutdown) and doesn't care.
>
> In fact, we should not have to touch the interfaces at all in the userspace
> shutdown path, just like we don't have to do it anymore for disks (in fact,
> we must NOT, for disks).
>
>
>> Not so much as I hope! Upgrades near release like Lenny are not supposed
>> to break basic things. The effect of this problem is wasteful usage of
>> energy keeping systems that have no work to do nevertheless powered up.
>>
>
> I fear we are not going to touch this anymore for Lenny, except as
> documentation. If any of the other maintainers want to, I won't stand in
> the way, nor complain about it... but I doubt anyone will want to touch the
> shutdown path this late in the freeze.
>
> It is not like we can even fix this crap for good unless the kernel gets its
> act together, anyway...
>
>

