Bug#756903: systemd: Boot hangs if filesystems unavailable

Michael Biebl

Aug 3, 2014, 4:20:02 PM

Control: -1 important

On 03.08.2014 12:21, Tony Green wrote:
> Since my machine recently updated to using systemd, I have experienced a number
> of occasions when it would just hang at a blank screen when booting.
>
> After some searching I managed to work out how to get back to having verbose
> output during the boot process, which showed me that the system was refusing to
> initialise because filesystems specified in /etc/fstab were not available
> (either NFS filesystems, when my network was playing up, or a USB external
> drive which had not spun up fast enough).
>
> It seems that systemd regards ANY missing filesystem as being a fatal error,
> whether or not that filesystem is actually essential to the boot process.
> Although this is certainly valid if vital partitions such as / or /usr can't be
> mounted, it's unhelpful for NFS or external partitions.

That's correct. Systemd waits for 90 seconds for devices listed in
/etc/fstab to show up. Since it doesn't know (unless you tell it) which
devices are essential, it drops you into a rescue shell afterwards to
examine the situation (unfortunately emergency mode currently has some
issues, but we are dealing with that in a separate bug report).

> Given the default result of a blank screen with no information about what's
> gone wrong, this could be very problematic for less experienced or confident
> users if it goes out beyond Debian Testing.

v209 and later will have slightly improved behaviour here.
Even if the default is to boot in quiet mode, systemd will switch into
verbose mode as soon as a unit fails to load or a timeout occurs.

> As a workaround, I have been able to ensure my system boots OK if any of these
> filesystems can't be mounted by adding "noauto" to /etc/fstab and then mounting
> the filesystems via /etc/rc.local instead.

The better alternative in your case (i.e. mount if available but don't
fail otherwise) is to mark the file systems as "nofail". See man fstab.
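As a minimal sketch of that change, the following helper adds "nofail" to the options field of selected fstab entries. The sample entries, device names and mount points are made up for illustration, and the function only handles the plain whitespace-separated layout described in fstab(5).

```python
def add_nofail(fstab_text, mount_points):
    """Return fstab_text with 'nofail' added to the options of the given mount points."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Leave blank lines, comments and incomplete entries untouched.
        if len(fields) < 4 or fields[0].startswith("#"):
            out.append(line)
            continue
        if fields[1] in mount_points:
            options = fields[3].split(",")
            if "nofail" not in options:
                options.append("nofail")
                fields[3] = ",".join(options)
            # Rewritten entries are re-joined with tabs; alignment is not preserved.
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical fstab content, for illustration only.
SAMPLE = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=1234-abcd / ext4 errors=remount-ro 0 1
server:/export /mnt/nfs nfs defaults 0 0
/dev/sdb1 /mnt/usb ext4 defaults 0 2
"""
```

Calling add_nofail(SAMPLE, {"/mnt/nfs", "/mnt/usb"}) leaves the root entry alone and marks only the NFS and USB entries, which matches the "mount if available but don't fail otherwise" case.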

> I feel that a more robust (and set up by default) fix to this is needed before
> systemd goes mainstream on Debian, as this aspect of systemd is somewhat less
> reliable than on the old init system.
>
> Please feel free to ask for any more information if it may help fixing this.

Well, I consider the sysvinit behaviour buggy, and unfortunately it
led to broken fstab configurations in the past.
Imho there is no robust and reliable way to determine which file systems
are essential and which are not (but I'm happy to be proven wrong).
E.g. you might have a MySQL server storing its databases in a
non-standard location. Failing to mount such a partition could lead to
data loss, etc.

I think it's better to simply fix-up /etc/fstab once and be done with that.

We can try to improve the documentation in that regard (wiki, release
notes, README.Debian) and maybe add a preinst check which tries to
detect non-existing devices upon installation.
The latter obviously isn't a foolproof method, as in your case you might
actually have your external USB drive attached during installation but
not during reboot.
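A rough sketch of what such a check might look like, under several assumptions: it is not an existing Debian script, it only looks at /dev/... and UUID= specifications (resolving the latter through /dev/disk/by-uuid/), and it skips network and pseudo filesystems entirely. The exists parameter is there only so the logic can be exercised without real devices.

```python
import os


def risky_entries(fstab_text, exists=os.path.exists):
    """Return a list of (spec, mount_point) pairs that could block the boot.

    An entry is reported when its device node is absent and its options
    contain neither 'nofail' nor 'noauto'.
    """
    risky = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        spec, mount_point, _fstype, options = fields[:4]
        if spec.startswith("UUID="):
            path = "/dev/disk/by-uuid/" + spec[len("UUID="):]
        elif spec.startswith("/dev/"):
            path = spec
        else:
            # NFS exports, LABEL=, tmpfs etc. are not checked in this sketch.
            continue
        opts = set(options.split(","))
        if not exists(path) and not opts & {"nofail", "noauto"}:
            risky.append((spec, mount_point))
    return risky
```

As noted above, a check like this only sees the devices present at the moment it runs, so it can both miss problems and raise false alarms.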

Comments on how to address this are welcome.

Michael

--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?


Cameron Norman

Aug 3, 2014, 6:10:01 PM

On Sun, 03 Aug 2014 22:08:59 +0200 Michael Biebl <bi...@debian.org> wrote:
> Control: -1 important
>
> On 03.08.2014 12:21, Tony Green wrote:
> > Since my machine recently updated to using systemd, I have experienced a number
> > of occasions when it would just hang at a blank screen when booting.
> >
> > After some searching I managed to work out how to get back to having verbose
> > output during the boot process, which showed me that the system was refusing to
> > initialise because filesystems specified in /etc/fstab were not available
> > (either NFS filesystems, when my network was playing up, or a USB external
> > drive which had not spun up fast enough).
> >
> > It seems that systemd regards ANY missing filesystem as being a fatal error,
> > whether or not that filesystem is actually essential to the boot process.
> > Although this is certainly valid if vital partitions such as / or /usr can't be
> > mounted, it's unhelpful for NFS or external partitions.

>
> [snip]

>
> > As a workaround, I have been able to ensure my system boots OK if any of these
> > filesystems can't be mounted by adding "noauto" to /etc/fstab and then mounting
> > the filesystems via /etc/rc.local instead.
>
> The better alternative in your case (i.e. mount if available but don't
> fail otherwise) is to mark the file systems as "nofail". See man fstab.

With mountall/Upstart, there is a supported nobootwait option. I believe its behavior is similar to nofail, except that mountall emits the filesystem event before it has finished mounting the filesystem and does not care whether the mount succeeds or fails. Do you know if systemd supports this? To implement it in systemd, I believe the generator that creates mount units from fstab would simply not add Before=local-fs.target or Before=remote-fs.target when the nobootwait option is used. This would solve the problem that systemd does not know which filesystems are essential.
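To make the idea concrete, here is a toy model of the ordering decision being proposed. It is not systemd's generator code (which is written in C) and says nothing about how systemd actually treats any option; the function name and the is_network flag are invented for this sketch.

```python
def ordering_for(options, is_network=False):
    """Return the Before= targets a generated mount unit would get under the proposal."""
    opts = set(options.split(","))
    if "nobootwait" in opts:
        # Proposed behaviour: do not order the mount before the fs targets,
        # so the rest of the boot does not wait for it.
        return []
    return ["remote-fs.target" if is_network else "local-fs.target"]
```

Under this model, an entry with "defaults,nobootwait" gets no Before= ordering at all, while every other entry keeps holding back local-fs.target or remote-fs.target.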

Best wishes,

--

Cameron Norman

Marco d'Itri

Aug 3, 2014, 6:50:02 PM

On Aug 04, Cameron Norman <cameron...@gmail.com> wrote:

> With mountall/Upstart, there is a nobootwait option supported. I believe the
> behavior is similar to nofail, except that mountall will emit the filesystem
> event before finishing mounting the filesystem as well as not caring about
> success/failure. Do you know if systemd supports this? To implement this in
> systemd I believe you would make the generator for mount units from fstab
> not add Before=local-fs.target or Before=remote-fs.target if the nobootwait
> option is used. This solves the problem that systemd does not know which
> filesystems are essential or not.

Such an option would not be the default, and if you can change your
configuration to use it then you can more easily fix your fstab as well.

--
ciao,
Marco


Tony Green

Aug 3, 2014, 6:50:02 PM

I can see that this is a tricky issue.

I would suggest that at the very least, when systemd is installed to
replace the old init system, the changelogs generated and emailed to the
sysadmin ought to warn of potential problems with remote or removable
filesystems and recommend adding nofail to the /etc/fstab entries.

Thanks.
--
Tony Green
Ipswich, Suffolk, England
web-brewer.co.uk/
beermad.org.uk/


Cameron Norman

Aug 3, 2014, 7:40:01 PM

What do you mean by "fix your fstab"? Adding this option is even beneficial if there is nothing wrong with the fstab, as services can be started before non-essential fs's are up.

Thanks,

--

Cameron

Marco d'Itri

Aug 3, 2014, 8:00:02 PM

If you really want this then it can be arranged with noauto and
a dedicated unit.

--
ciao,
Marco


Jean-Michaël Celerier

Aug 8, 2014, 7:50:02 AM

I tried using nofail and nobootwait, and systemd still failed (which, combined with the emergency infinite loop, is quite frustrating).

Any hints? I don't want to use noauto because I want my disk to be mounted at boot if it is plugged in.

Best regards

Michael Biebl

Aug 8, 2014, 10:00:04 AM

Please post your fstab and a debug log.
To get the latter, please enable the debug shell service:
systemctl enable debug-shell.service [1]
Then boot with systemd.log_level=debug as a boot parameter and attach the
output of journalctl -alb

Thanks


Stefan Monnier

Aug 9, 2014, 4:40:02 PM

> Well, I consider the sysvinit behaviour buggy and unfortunately this
> led to broken fstab configurations in the past.

There are 2 changes here:
1- systemd seems to *wait* for the device to be available, whereas the
old scripts just failed right away if the device was absent.
2- if the mount fails, this is considered by default as a fatal error.

Which of those 2 was "buggy" in sysvinit? The lack of wait or the fact
that it moved on upon failure?

I like the idea of mounting/fscking partitions without blocking other
unrelated boot steps, so I'm OK with adding boot options to help systemd
do a better job, but blocking the boot just because some random fstab
entry has an error is really obnoxious (it took me a while to fix my
machine after installing systemd-sysv because of such an fstab entry
referring to a disk that was not connected. Thank god there is
"break=mount"!).

Here are my suggestions:
- In the absence of an explicit request to "bootwait", systemd should
not wait for a device to appear, or at least not anywhere near 90s,
especially if it is holding back the whole rest of the boot process.
- If a mount fails, keep on booting. And then do your best to try and
bring this problem to the attention of someone (mentioning the
"nofail" option in that same message). Only stop the boot if the
partition is explicitly marked as "critical" or "stoponfail".

The second part is really important, because in 99.9% of the cases, it's
much better to keep on booting, even if the resulting system won't be
fully functional: it's likely to be a lot easier to fix the problem with
a "half booted" machine than a "not booted" one.

E.g. If the /home partition is missing, the machine might be largely
unusable, but I should usually still be able to log into it remotely,
fix up the partition or fstab entry, and reboot. But with the current
"stop on fail" behavior, I don't get such a chance.

Stefan

Zbigniew Jędrzejewski-Szmek

Aug 22, 2014, 12:20:01 AM

On Sat, Aug 09, 2014 at 04:36:50PM -0400, Stefan Monnier wrote:
> > Well, I consider the sysvinit behaviour buggy and unfortunately this
> > led to broken fstab configurations in the past.
>
> There are 2 changes here:
> 1- systemd seems to *wait* for the device to be available, whereas the
> old scripts just failed right away if the device was absent.
> 2- if the mount fails, this is considered by default as a fatal error.
>
> Which of those 2 was "buggy" in sysvinit? The lack of wait or the fact
> that it moved on upon failure?

Both :)

It is not uncommon for systems on SSD to boot to graphical login in ~2s,
which simply is not enough for all hardware, especially USB, to be
reliably detected. Since systemd mounts disks somewhere in the first
second, waiting for devices to show up is simply unavoidable.

You snipped Michael's response on the other issue. There's no way to
guess which filesystems listed in fstab are "important", and which
can be skipped.

> I like the idea of mounting/fscking partitions without blocking other
> unrelated boot steps, so I'm OK with adding boot options to help systemd
> do a better job, but blocking the boot just because some random fstab
> entry has an error is really obnoxious (it took me a while to fix my
> machine after installing systemd-sysv because of such an fstab entry
> referring to a disk that was not connected. Thank god there is
> "break=mount"!).
>
> Here are my suggestions:
> - In the absence of an explicit request to "bootwait", systemd should
> not wait for a device to appear, or at least not anywhere near 90s,
> especially if it is holding back the whole rest of the boot process.
> - If a mount fails, keep on booting. And then do your best to try and
> bring this problem to the attention of someone (mentioning the
> "nofail" option in that same message). Only stop the boot if the
> partition is explicitly marked as "critical" or "stoponfail".

Well, 'fail', which is the default, means just that. Systemd tries
to do the correct and safe thing by default.

Zbyszek

Stefan Monnier

Aug 22, 2014, 11:10:02 AM

>> - If a mount fails, keep on booting. And then do your best to try and
>> bring this problem to the attention of someone (mentioning the
>> "nofail" option in that same message). Only stop the boot if the
>> partition is explicitly marked as "critical" or "stoponfail".
> Well, 'fail', which is the default, means just that.

Not sure what "that" means here. Does your "that" mean "that which you
just described" or does it mean "fail"?

> Systemd tries to do the correct and safe thing by default.

I'd hope so, but here's my case:
- a machine somewhat far away with an old and unimportant fstab entry
that refers to a drive that's rarely connected.
- after upgrading to systemd, the fstab entry caused systemd to stop the
boot (presumably asking for the operator to do something on console).
- with the boot stopped, I (the operator, who is not on console since
it's a remote machine) can't fix the fstab entry.

In which way is it "safe and correct" to interrupt the boot in this case?

I can understand that making the mount wait (rather than just fail
right away) might be made necessary by the fact that systemd changes the
order in which operations are performed. I.e. I understand why
change number 1 might be needed.

But that doesn't explain why change number 2 was needed.
Apparently you think Michael's response explains it, but I failed to
see how.

Stefan

Zbigniew Jędrzejewski-Szmek

Aug 22, 2014, 1:00:03 PM

On Fri, Aug 22, 2014 at 10:30:47AM -0400, Stefan Monnier wrote:
> >> - If a mount fails, keep on booting. And then do your best to try and
> >> bring this problem to the attention of someone (mentioning the
> >> "nofail" option in that same message). Only stop the boot if the
> >> partition is explicitly marked as "critical" or "stoponfail".
> > Well, 'fail', which is the default, means just that.
> Not sure what "that" means here. Does your "that" mean "that which you
> just described" or does it mean "fail"?

Treat all mounts as "critical".

> > Systemd tries to do the correct and safe thing by default.
>
> I'd hope so, but here's my case:
> - a machine somewhat far away with an old and unimportant fstab entry
> that refers to a drive that's rarely connected.
> - after upgrading to systemd, the fstab entry caused systemd to stop the
> boot (presumably asking for the operator to do something on console).
> - with the boot stopped, I (the operator, who is not on console since
> it's a remote machine) can't fix the fstab entry.
>
> In which way is it "safe and correct" to interrupt the boot in this case?

In the way that missing some mounts may indicate a serious problem and
could lead to incorrect behaviour or data loss.

> I can understand that making the mount wait (rather than just fail
> right away) might be made necessary by the fact that systemd changes the
> order in which operations are performed. I.e. I understand why
> change number 1 might be needed.
>
> But that doesn't explain why change number 2 was needed.
> Apparently you think Michael's response explains it, but I failed to
> see how.

Michael's response explains why it is not possible for systemd to
distinguish which filesystems are "critical", and which are
not. Sysvinit documentation is not particularly verbose about what happens
on error. mount(8) describes 'nofail', which implies that the opposite is
the default. I agree that stopping the boot is a change in semantics
to some degree, but this kind of change parallels changes to other
parts of the boot done by systemd, where status of things (services,
swap points, configuration settings) is checked and the boot is
stopped when something important fails. You can configure things
otherwise, but the default is to strictly obey dependencies.

Zbyszek

Stefan Monnier

Aug 22, 2014, 4:10:01 PM

>> In which way is it "safe and correct" to interrupt the boot in this case?
> In the way that missing some mounts may indicate a serious problem and
> could lead to incorrect behaviour or data loss.

Haven't heard many complaints about that over the years, so it shouldn't
be a super-top-priority goal, I think.

In any case, that's not incompatible with the desire to "boot enough for
remote maintenance to be possible".

> stopped when something important fails. You can configure things
> otherwise, but the default is to strictly obey dependencies.

To get the best of both worlds, I suggest that if a problem happens
during boot, systemd doesn't just interrupt the boot, but instead tries
to keep booting up to a "fallback/safe" state, so that maintenance can be
done conveniently over the network, instead of having to be done "on
console" which is all too often completely unworkable.

Stefan

Zbigniew Jędrzejewski-Szmek

Aug 22, 2014, 7:50:02 PM

On Fri, Aug 22, 2014 at 01:51:56PM -0400, Stefan Monnier wrote:
> >> In which way is it "safe and correct" to interrupt the boot in this case?
> > In the way that missing some mounts may indicate a serious problem and
> > could lead to incorrect behaviour or data loss.
>
> Haven't heard many complaints about that over the years, so it shouldn't
> be a super-top-priority goal, I think.

It is one of the main goals of systemd to use the dependencies. In the
end this is what allows us to achieve reliability.

> In any case, that's not incompatible with the desire to "boot enough for
> remote maintenance to be possible".
>
> > stopped when something important fails. You can configure things
> > otherwise, but the default is to strictly obey dependencies.
>
> To get the best of both worlds, I suggest that if a problem happens
> during boot, systemd doesn't just interrupt the boot, but instead tries
> to keep booting up to a "fallback/safe" state, so that maintenance can be
> done conveniently over the network, instead of having to be done "on
> console" which is all too often completely unworkable.

Yes, you can configure such behaviour. You can add OnFailure= and
OnFailureJobMode= to default.target (e.g. in
/etc/systemd/system/default.target.d/failure.conf) to launch some
target you define, and add e.g. sshd.service and other things to this
target. It is hard to do something like this in a general way though.
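A rough sketch of what such a drop-in could contain, written as a small helper that renders the file text. The target name remote-maintenance.target is hypothetical and would have to be defined separately (pulling in e.g. sshd.service). OnFailureJobMode=replace is the documented default and is spelled out here only for clarity.

```python
def failure_dropin(target, job_mode="replace"):
    """Render a [Unit] drop-in that starts `target` when the unit fails."""
    return (
        "[Unit]\n"
        "OnFailure={}\n"
        "OnFailureJobMode={}\n"
    ).format(target, job_mode)


# Hypothetical fallback target; it is not shipped by systemd or Debian.
DROPIN_TEXT = failure_dropin("remote-maintenance.target")
```

The resulting text would be saved as the failure.conf drop-in mentioned above, followed by a systemctl daemon-reload. Whether the fallback target really comes up depends on what it pulls in, which is the part that is hard to generalise.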

You can also add nofail where necessary.

In systemd git, there's a more general setting, StartTimeoutAction= [1],
which makes it possible to configure an action that will also fire
when boot "hangs" waiting for password input or similar.

[1] http://cgit.freedesktop.org/systemd/systemd/commit/?id=2928b0a863

Zbyszek

Stefan Monnier

Aug 23, 2014, 9:40:01 AM

> Yes, you can configure such behaviour.

I already have plenty of ways to configure the behavior I need.
This discussion is about the default behavior.

Stefan

Zbigniew Jędrzejewski-Szmek

Aug 23, 2014, 11:30:03 AM

On Sat, Aug 23, 2014 at 09:33:56AM -0400, Stefan Monnier wrote:
> > Yes, you can configure such behaviour.
>
> I already have plenty of ways to configure the behavior I need.
> This discussion is about the default behavior.

Exactly. I hope the reasoning behind current defaults has been explained
adequately.

That said, it would be great to improve reporting if something like this
happens. Hopefully with #755581 fixed, emergency mode will be entered
successfully.

Zbyszek

Stefan Monnier

Aug 26, 2014, 12:50:02 AM

> Exactly. I hope the reasoning behind current defaults has been explained
> adequately.

Not sure what you mean by "adequately". I understand your argument, but
I disagree with it. Do you understand my argument?

> That said, it would be great to improve reporting if something like this
> happens. Hopefully with #755581 fixed, emergency mode will be entered
> successfully.

Makes no difference to the case where you're not sitting at the console.
The way I see it, this case is very common.

Stefan

Zbigniew Jędrzejewski-Szmek

Aug 26, 2014, 8:10:02 AM

On Tue, Aug 26, 2014 at 12:44:07AM -0400, Stefan Monnier wrote:
> > Exactly. I hope the reasoning behind current defaults has been explained
> > adequately.
>
> Not sure what you mean by "adequately". I understand your argument, but
> I disagree with it. Do you understand my argument?

Yes, I understand your argument. We just disagree on the "right" behaviour.
The behaviour that was the subject of the original bug report is by
design. I don't think there's much to be gained by discussing this
further here.

Zbyszek

Andrei POPESCU

Aug 27, 2014, 2:10:03 PM

On Sat, 23 Aug 14, 17:17:05, Zbigniew Jędrzejewski-Szmek wrote:
> On Sat, Aug 23, 2014 at 09:33:56AM -0400, Stefan Monnier wrote:
> > > Yes, you can configure such behaviour.
> >
> > I already have plenty of ways to configure the behavior I need.
> > This discussion is about the default behavior.
> Exactly. I hope the reasoning behind current defaults has been explained
> adequately.

From mount(8):

nofail Do not report errors for this device if it does not exist.

It's really a bit of a stretch from this to "any mountpoint without
'nofail' should interrupt the boot unconditionally".

Kind regards,
Andrei
--
http://wiki.debian.org/FAQsFromDebianUser
Offtopic discussions among Debian users and developers:
http://lists.alioth.debian.org/mailman/listinfo/d-community-offtopic
http://nuvreauspam.ro/gpg-transition.txt
