Long boot time for "Initialize and mount /rw and /home" unit

Alex

unread,

Jan 31, 2017, 3:55:35 PM1/31/17

to qubes-users

Hi,
I've been having some issues in the past few weeks with a Fedora 25
AppVM (template updated from f24), sometimes the
qubes-mount-dirs.service unit takes ~5 minutes to complete and I'm stuck
waiting for it to finish.

The private.img of this AppVM is ~48GiB, but I have another 20GiB AppVm
that is working just fine.

Since I had some spare time, I've been looking at what is this unit
doing and... I'm quite puzzled.

Please note that I'm not a /bin/sh master, so I may have mis-interpreted
what happens here.

The unit file calls /usr/lib/qubes/init/mount-dirs.sh, which in turn
calls /usr/lib/qubes/init/setup-rwdev.sh.

And in this file, I may be mistaken here, but what I see is:
- the script checks if /dev/xvdb exists
- if it does, it gets the size into $private_size_512
- it then asks dd to produce that many 512-byte-sized blocks, and asks
diff to compare them to /dev/xvdb
- if they match, a filesystem is created

What I don't understand is... is this thing really comparing ~50GiB of
disk on every boot with a stream of 50 billion zeros just to see if a
filesystem exists? It's weird, because if this was the case I would have
to wait a long time on every boot, while this does not happen; on 1 in 3
boots, the VM starts up in ~20 seconds instead of the usual 5 minutes.

Checking in journalctl, I see that this check seems to last less than a
second when a boot succeeds. I'll add some more log lines in the
template and will try to catch the log for a long boot.

Other questions arise. Is it really necessary for the AppVM to try to
perform those steps (one being the zero-check, the other being the
resize2fs that immediately follows in the same script) on every boot?
why not something like /usr/sbin/blkid or /usr/bin/file, i.e. why the
specific all-zero check? And why not place some dot-file in the root of
the filesystem, like /.resize to signal a resize2fs to be done, instead
of having it run on every boot?

Thank you for your time,

--
Alex

signature.asc

Chris Laprise

unread,

Jan 31, 2017, 10:09:20 PM1/31/17

to Alex, qubes-users

On 01/31/2017 03:55 PM, Alex wrote:
> What I don't understand is... is this thing really comparing ~50GiB of
> disk on every boot with a stream of 50 billion zeros just to see if a
> filesystem exists? It's weird, because if this was the case I would have
> to wait a long time on every boot, while this does not happen; on 1 in 3
> boots, the VM starts up in ~20 seconds instead of the usual 5 minutes.

I have VMs with even larger /rw filesystems, and they start in about 7
sec. under Debian 9. I don't think I have ever experienced this issue
with Fedora 23/24 either.

My guess about the dd|diff comparison in 'setup-rwdev.sh' is that diff
is closing its connection with dd as soon as it sees non-zero data. This
should cause both dd and diff to terminate immediately.

Chris

Alex

unread,

Feb 3, 2017, 10:30:14 AM2/3/17

to qubes-users

If this is the case, then I don't understand why the check is made this
way. Why not check only the first 512MiB, for example? No reasonable FS
would leave that much space wasted.

Anyway, I've had this same problem today (Long time for that unit), and
I had the logged proof that the diff does take less than one second to
complete, but after the resize2fs pass, I have an error in the log,
saying "Please run 'e2fsck -f /dev/xvdb' first.".

I'll dig further to inspect why the volume was somehow marked as "not
clean", because last shutdown was trouble-free (no fsck should be
needed), and because the wait for this unit (the fsck took 1'40" to
complete) caused a qrexec-not-connected error, that prevented the
automated completion of the boot process and the launch of some programs.

--
Alex

signature.asc

Chris Laprise

unread,

Feb 3, 2017, 12:52:16 PM2/3/17

to Alex, qubes-users

On 02/03/2017 10:30 AM, Alex wrote:
> On 02/01/2017 04:09 AM, Chris Laprise wrote:
>> On 01/31/2017 03:55 PM, Alex wrote:
>>> What I don't understand is... is this thing really comparing ~50GiB
>>> of disk on every boot with a stream of 50 billion zeros just to see
>>> if a filesystem exists? It's weird, because if this was the case I
>>> would have to wait a long time on every boot, while this does not
>>> happen; on 1 in 3 boots, the VM starts up in ~20 seconds instead of
>>> the usual 5 minutes.
>> I have VMs with even larger /rw filesystems, and they start in about
>> 7 sec. under Debian 9. I don't think I have ever experienced this
>> issue with Fedora 23/24 either.
>>
>> My guess about the dd|diff comparison in 'setup-rwdev.sh' is that
>> diff is closing its connection with dd as soon as it sees non-zero
>> data. This should cause both dd and diff to terminate immediately.
>>
> If this is the case, then I don't understand why the check is made this
> way. Why not check only the first 512MiB, for example? No reasonable FS
> would leave that much space wasted.

I wondered that myself. Probably there was a high confidence that the
pipe would work as predicted, and there would be no large+blank volumes.

> Anyway, I've had this same problem today (Long time for that unit), and
> I had the logged proof that the diff does take less than one second to
> complete, but after the resize2fs pass, I have an error in the log,
> saying "Please run 'e2fsck -f /dev/xvdb' first.".
>
> I'll dig further to inspect why the volume was somehow marked as "not
> clean", because last shutdown was trouble-free (no fsck should be
> needed), and because the wait for this unit (the fsck took 1'40" to
> complete) caused a qrexec-not-connected error, that prevented the
> automated completion of the boot process and the launch of some programs.

Good luck. I have an opposite issue: VMs on newer templates (Debian 9,
Fedora 25) are taking up to 3min to shutdown.

Chris

Alex

unread,

Feb 11, 2017, 5:56:22 AM2/11/17

to qubes-users

Same issue (long boot time) happening again today; a specific appVM
(which has been resized multiple times in the last 2 years) took 2'40"
to run an e2fsck pass. No errors reported from fsck (I added a couple
rows in /usr/lib/qubes/init/setup-rwdev.sh). After it was run, resize2fs
stopped complaining.

I discovered that resize2fs seems to be extremely picky about date and
time, particularly regarding the check about last fsck run: from
http://blog.bastelhalde.de/?p=746 i see
> After an hour of searching I found out that the time settings of your
> system (hardware clock and software clock) have to be correct to let
> rezise2fs check your last run of e2fsck.
So this latency, and the fact that fsck seems to be run nearly on every
boot, may have to do with time being inconsistent (actually, all VM
environments are known for time issues).

On 02/03/2017 06:52 PM, Chris Laprise wrote:
>
> Good luck. I have an opposite issue: VMs on newer templates (Debian
> 9, Fedora 25) are taking up to 3min to shutdown.
>
> Chris

I've had some problems in the past with shutdown because of some issues
in pulseaudio; you can try to investigate with the last lines of a
guest.log from the affected machines. Now I'm not having this situation
anymore.

--
Alex

signature.asc

Reply all

Reply to author

Forward