Template's root volume partition table in Qubes 4.0


Marek Marczykowski-Górecki

Oct 13, 2017, 6:22:35 PM
to qubes-devel, Joanna Rutkowska, Wojciech Porczyk
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi all,

There is a problem with our shiny new templates for Qubes 4.0: the partition
table there makes it hard to resize the root filesystem.

Current partition layout of a template in Qubes 4.0 is:
1. xvda1: root filesystem (almost all available space)
2. xvda2: EFI system partition (empty, prepared for PVHv2 boot with VM-provided kernel)
3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)

This makes resizing the root volume hard, because one needs to move the
xvda[23] data to the (new) end of the disk first. Also, having partitions at
all (de facto required by grub2) makes online resize of the root volume
hard/impossible.

Proposed solution: reorder the partitions to 1. ESP, 2. BIOS boot, 3. root
filesystem. This will not solve the online resize problem, but will allow
an offline one (with a VM restart in the middle).
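
For illustration only, the proposed order could be written down as an sfdisk
script. This is a sketch under assumptions: it assumes a GPT label, borrows
the 200M and 2M sizes from the manual steps further down in this message, and
the function name `proposed_layout` is made up here. It only prints the
script; nothing is written to any disk.

```shell
#!/bin/sh
# Print an sfdisk-style script for the proposed order: ESP, BIOS boot, root.
# Assumptions (not confirmed by the thread): GPT label, 200 MiB ESP,
# 2 MiB BIOS boot, root filesystem taking the rest of the disk.
proposed_layout() {
    cat <<'EOF'
label: gpt
size=200MiB, type=C12A7328-F81F-11D2-BA4B-00A0C93EC93B
size=2MiB, type=21686148-6449-6E6F-744E-656564454649
type=0FC63DAF-8483-4772-8E79-3D69D8477DE4
EOF
}

proposed_layout
```

Piping this output into `sfdisk` on a scratch image would create the layout
(and destroy whatever was on the image), so it is meant for experiments, not
for an existing template.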

The problem is, it is a bit late to change such a fundamental thing...
Affected components:
- template builder[1]
- initrd[2] (responsible for mounting root filesystem)

So, at this stage, changing the partition layout (dropping support for the
current one, adding support for the new one) is IMO not an option.
The only possibility (if any) is to add support for the new layout while
keeping support for the current one, in a way that does not break the
current setup.

Alternatively, we can keep it as is (and change it later, e.g. in Qubes
4.1), and for now document how to extend the root volume manually. Something
like:
1. Shut down the VM (TemplateVM/StandaloneVM)
2. Use `qvm-volume` to extend the root volume
3. Set the VM kernel to some version (if it was set to empty, i.e. VM-provided)
4. Start the VM
5. Launch fdisk, remove all partitions
6. Re-create partition 1 with the new size (almost the whole disk, minus 202M), set its type to "Linux"
7. Re-create partition 2 with size 200M, set its type to "EFI system partition"
8. Re-create partition 3 with size 2M, set its type to "BIOS boot"
9. Restart the VM (to reload the partition table)
10. Re-install grub (`grub2-install /dev/xvda`)
11. Restore the original kernel property (if changed in step 3)
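
The size arithmetic behind the re-create steps can be sketched as below. This
is a rough illustration, not a tested procedure: the function name
`current_layout_sizes` is made up, and the calculation ignores alignment and
the space taken by the partition table itself. The re-created partition 1
would also have to start at the same sector as before, and the filesystem
would still need to be grown afterwards (e.g. with resize2fs).

```shell
#!/bin/sh
# Sizes in MiB for the current layout (root, ESP, BIOS boot) on a disk of
# the given size. Ignores alignment and partition table overhead.
ESP_MIB=200
BIOS_BOOT_MIB=2

current_layout_sizes() {
    disk_mib=$1
    root_mib=$((disk_mib - ESP_MIB - BIOS_BOOT_MIB))
    echo "root=${root_mib} esp=${ESP_MIB} biosboot=${BIOS_BOOT_MIB}"
}

current_layout_sizes 20480   # 20 GiB volume: root=20278 esp=200 biosboot=2
```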

A lot of things can go wrong in the process.

Yet another option would be to automate the above in the initrd, before
mounting the root filesystem, so that the partition table can be reloaded
there without a VM restart. This would require copying the data of partitions
2 & 3. I'm not sure grub would survive that (because the BIOS boot partition
moves). It is a fragile operation, too.

Any preference, or maybe another solution?

Tracking issue:
https://github.com/QubesOS/qubes-issues/issues/3173

[1] https://github.com/QubesOS/qubes-linux-template-builder/blob/master/prepare_image#L59-L76
    (and other places there that assume the root fs is on the first partition)
[2] https://github.com/QubesOS/qubes-linux-utils/tree/master/dracut

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ36tUAAoJENuP0xzK19csGXcH/iwlhVVSURbAxGB9Xam1q7po
RoOSG8SqOTU56zGw2V+3I1ILZcwXDmlUy9qE2Fqpn7ZkeuOpV22mzRh4Bl+yFxaH
s0tTta3SQb7vxkf5GW1/2bScSmmvJ9onLKPcX42PLEMfAzgS/nJ716Oi7HWYB+l3
hO+oYezBGUELJ+LY1sXUwUmCGupFxlYqITwIa4CP0EK2HgtfW4LoiwpVI9pnt6bB
HCFT617MKldszIibKRVSjgRpHexAdKg+IPE7y2lFkJHAAg5dMwjCkDQAJ8BcS50t
dmp5oKWOHgyZEzuvmI9M8qfXH6m7Okfwyf3Cpr1KShQzYuwZ6d0ZpTpoyuWgQT4=
=+a57
-----END PGP SIGNATURE-----

Wojtek Porczyk

Oct 13, 2017, 7:17:59 PM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 12:22:28AM +0200, Marek Marczykowski-Górecki wrote:
> There is a problem with our shiny new templates for Qubes 4.0. Partition
> table there make it hard to resize root filesystem.
>
> Current partition layout of a template in Qubes 4.0 is:
> 1. xvda1: root filesystem (almost all available space)
> 2. xvda2: EFI system partition (empty, prepared for PVHv2 boot with VM-provided kernel)
> 3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)
>
> This makes resizing root volume hard, because one need to move xvda[23]
> data to the (new) end of the disk first. Also, having partitions at all
> (de facto required by grub2), makes online resize of root volume
> hard/impossible.
>
> Proposed solution: reorder partitions to 1. ESP, 2. BIOS boot, 3. root
> filesystem. This will not solve online resize problem, but will allow
> offline one (with VM restart in the middle).

Why won't it solve online resizing? If you're growing the last partition, it
works OK.
pozdrawiam / best regards _.-._
Wojtek Porczyk .-^' '^-.
Invisible Things Lab |'-.-^-.-'|
| | | |
I do not fear computers, | '-.-' |
I fear lack of them. '-._ : ,-'
-- Isaac Asimov `^-^-_>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZ4UmbAAoJEL9r2TIQOiNRuBQP/jmybgdQAS17qc6KE8/Mxdxc
IhpPxSawlDAsH3awlR0Osw79/Dh8wAyf+d+XTqvreQAmXx5rRZXMvxY50Ax0DOXR
ojw7kk29OktzJ9TO7/J9rWrd57CiLVtrLVov8ZoSxbq1odiBGVz9QNZwlibgTYLM
q9Saue9yOUIE4E21PYC8qz46fQB3wKOEUwRLKDjETZNPqKDtWnz/KNy7RP0RBH0y
7WUo62IaNbpYRCumv55tx0TqwaYQI+7Jh2jPIztiC0xBzvftJbrwNu0xRI/2xgew
U2c6Q9ecGbVm7RtMS5fEjiOCbD8s8b++YTSD43gjR+Mwl5MBxQEWoOyPEYA2v51t
CvlwTeH6ReSbkunmbkCqzC8y6XbFt0y/b11VLTV/R9lbBGC6JB24xCseYtr4zGg+
ijmnvXBQobfimKYFYAMaf4A1BviQERjPKb1YiyoWxrowyfloIw13f0il8N1fw7UZ
oElhngZ9BJIYkXuD9Xn8jaZ22AnKmOUICdDPzmoVDzU7AFa1JJpdDS3aEYR6jalT
eMRqLooYwSHIU56rqg+MPVfUO08U9v2jIyyAm8+It51I2RqoeFF7u7yunSu/AMH1
18fGWlclGS9bvNi/sIdyg20zscjjdWQuGxEilfOlvpORPJ0jlYrUy/cgprx1wKk7
ijahj6lT6gWZd8PqX3ZN
=FBgi
-----END PGP SIGNATURE-----

Marek Marczykowski-Górecki

Oct 13, 2017, 7:20:20 PM
to qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 01:17:48AM +0200, Wojtek Porczyk wrote:
> On Sat, Oct 14, 2017 at 12:22:28AM +0200, Marek Marczykowski-Górecki wrote:
> > There is a problem with our shiny new templates for Qubes 4.0. Partition
> > table there make it hard to resize root filesystem.
> >
> > Current partition layout of a template in Qubes 4.0 is:
> > 1. xvda1: root filesystem (almost all available space)
> > 2. xvda2: EFI system partition (empty, prepared for PVHv2 boot with VM-provided kernel)
> > 3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)
> >
> > This makes resizing root volume hard, because one need to move xvda[23]
> > data to the (new) end of the disk first. Also, having partitions at all
> > (de facto required by grub2), makes online resize of root volume
> > hard/impossible.
> >
> > Proposed solution: reorder partitions to 1. ESP, 2. BIOS boot, 3. root
> > filesystem. This will not solve online resize problem, but will allow
> > offline one (with VM restart in the middle).
>
> Why won't it solve online resizing? If you're growing the last partition, it
> works OK.

You can't reload the partition table while it is in use (something from it
is mounted). At least Linux doesn't support it.
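
The "in use" condition mentioned here could be checked with something like
the following. It is a minimal sketch: the helper name `mounted_partitions`
is made up, it does a simple prefix match on the device name (so /dev/loop1
would also match /dev/loop10), and it will not notice device-mapper users of
the disk.

```shell
#!/bin/sh
# List mounted partitions of a disk by scanning a mounts table.
# $1: disk device, e.g. /dev/loop0
# $2: mounts file to read (defaults to /proc/mounts)
mounted_partitions() {
    awk -v disk="$1" 'index($1, disk) == 1 { print $1, $2 }' "${2:-/proc/mounts}"
}
```

Running `mounted_partitions /dev/xvda` inside a VM would print each mounted
partition of xvda together with its mount point, or nothing if none is
mounted directly.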
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ37jdAAoJENuP0xzK19csF4MH/RE9ZON//Yc7YIpVqXNompJq
aKzh5xgcsVjG+BrGymim23yMEFVlwXz0jE/UOb4otefW4MPLxEb8EzuwPjeo4kCY
7gJzzbvxJ3O71iSxCjHoVoCS7NGiuJbjM61mI+y8LqWHRTv5a7gRoTbnim9p1q82
2mVACgvSMTRoYVuoBRw78mb6741Y35OCpdYnpRQOve2VZS6B5GcEev/Xm1W2KVz4
uU6i9CbZ6er3Vf3mNeQrOZaiu3yCkVW9f6hk6ZKG3tSBC2QYWGODW50an04SavPS
ymM3OhwLfZ4DDbLwWAEbEVwjqYIu93BFXBDVxDjSrzgo3ncJF+5t/fGVmIyPKXU=
=1nep
-----END PGP SIGNATURE-----

Wojtek Porczyk

Oct 13, 2017, 7:22:06 PM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 01:20:13AM +0200, Marek Marczykowski-Górecki wrote:
> On Sat, Oct 14, 2017 at 01:17:48AM +0200, Wojtek Porczyk wrote:
> > On Sat, Oct 14, 2017 at 12:22:28AM +0200, Marek Marczykowski-Górecki wrote:
> > > There is a problem with our shiny new templates for Qubes 4.0. Partition
> > > table there make it hard to resize root filesystem.
> > >
> > > Current partition layout of a template in Qubes 4.0 is:
> > > 1. xvda1: root filesystem (almost all available space)
> > > 2. xvda2: EFI system partition (empty, prepared for PVHv2 boot with VM-provided kernel)
> > > 3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)
> > >
> > > This makes resizing root volume hard, because one need to move xvda[23]
> > > data to the (new) end of the disk first. Also, having partitions at all
> > > (de facto required by grub2), makes online resize of root volume
> > > hard/impossible.
> > >
> > > Proposed solution: reorder partitions to 1. ESP, 2. BIOS boot, 3. root
> > > filesystem. This will not solve online resize problem, but will allow
> > > offline one (with VM restart in the middle).
> >
> > Why won't it solve online resizing? If you're growing the last partition, it
> > works OK.
>
> You can't reload partition table while it is used (something from there
> is mounted). At least Linux doesn't support it.

Just tested with a loop device, fdisk, and ext4. Looks like it works. Unless
it works only with loop?

- --
pozdrawiam / best regards _.-._
Wojtek Porczyk .-^' '^-.
Invisible Things Lab |'-.-^-.-'|
| | | |
I do not fear computers, | '-.-' |
I fear lack of them. '-._ : ,-'
-- Isaac Asimov `^-^-_>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZ4UqVAAoJEL9r2TIQOiNRAiYQAJBN38BsvsKM8bC/6uXI2pMe
B0ZhoW2KCSQeEE2yWOarUJ3UHJzYqfxZuO0FMmZwGadNWQy2jSzgNjphMojpu9Me
kjJWr0Xf1W1ZTgtRde5HFgPr06Dgy8rHimljeRlxBPqWK0d4LYaXLmEy9+21Oim4
ayNxWmFsrXVv0/vg8WXMXKoho7saeti/Z1606f9m5sd/KCqSisyk6FvQiLSjhLIa
LHXx1N/gxbkw9/TlX9RShmBr5QpsJx4VC0cBDBz0eIDsfEuy2wynKZdUN9Bh4zI0
ubXwHxRF/6UilJf85pgDp8b4USo5fAckzX9HgRl8gNqegFmFRto+i4vX61zl85wU
SCTYa9lHZUSEdL/urLHOBF1ObsR6p6cWKKF2bIS4+/IV4HcXJiaAqN3EE36GL7Us
HlMp6biWBZSI6lJuBmNDQUbYsv2hJcFFBxSB3di48chRJgLdMpOqcWEMXad/zb1b
u/7nET5TE3AsRF39N6WI6FerKgG6q3bFeWPtvCrnXoTHWGE1etbrZ3o/4WlT+PKZ
kqel6j2vZC/3XVa2ZWDzgDuFJeScvgBfq22s8a7znyj1wJAUoX4aTq0SloXmo+RX
7T2Beg+LTq8gF2IvwJOrIGzPV1/WT+auAcXQVIFRCOX6JiOJrUR1h4NTPlP2CvBC
C0MYu1jM80Gs7T9nKbyt
=lTFr
-----END PGP SIGNATURE-----

Marek Marczykowski-Górecki

Oct 13, 2017, 7:40:52 PM
to qubes-devel, Wojciech Porczyk, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 01:21:58AM +0200, Wojtek Porczyk wrote:
> On Sat, Oct 14, 2017 at 01:20:13AM +0200, Marek Marczykowski-Górecki wrote:
> > You can't reload partition table while it is used (something from there
> > is mounted). At least Linux doesn't support it.
>
> Just tested with loop device, fdisk, and ext4. Looks like it works. Unless it
> works only with loop?

No, it doesn't work, at least not directly:

[root@testvm user]# truncate -s 10G test.img
[root@testvm user]# losetup -f -P --show test.img
/dev/loop0
[root@testvm user]# fdisk /dev/loop0
(...)
Created a new partition 1 of type 'Linux' and of size 10 GiB.

Command (m for help): w
The partition table has been altered.
Syncing disks.

[root@testvm user]# blockdev --rereadpt /dev/loop0
[root@testvm user]# mkfs /dev/loop0p1
(...)
[root@testvm user]# mount /dev/loop0p1 /mnt
[root@testvm user]# truncate -s 20G test.img
[root@testvm user]# losetup -c /dev/loop0
[root@testvm user]# fdisk /dev/loop0
(...)
Command (m for help): p
Disk /dev/loop0: 20 GiB, 21474836480 bytes, 41943040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x75e36dce

Device Boot Start End Sectors Size Id Type
/dev/loop0p1 2048 41943039 41940992 20G 83 Linux

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Device or resource busy

The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).

I wonder under what conditions partprobe would work, and how a mounted
filesystem would react to it.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ372sAAoJENuP0xzK19csCUYH/1Of/xg/fHvXsS1n1KB5YjJR
0qzlghXOouL3rIajw1GX2HmnIdGrDxs7sFF0+IUC5ruPp05svp/HocgJ91o5b33u
RnQBxiwqPrP75Bt/j9wisAF0jIoTtnqDDRGjDLzkxx3n0fq0K/GlrzKImqAe/RU8
k5Vn6G3ek/iP2umB6QOAnxjqPycb1Vr+PJXoUaDznL4jlr/CLaVehbhB/Di/wMb3
4HydEQBcrlTS3y8JuHFgyOtOCmlbl5YoCqo0l2CZkvfNop5RyorBA5/zfewylvSO
CnDHlHH8zFlSUnHF+daqXYngXl2eDPsHRU3LQ0G4YcdnwpyCWI8tuzdYEZk37G4=
=F/oP
-----END PGP SIGNATURE-----

Wojtek Porczyk

Oct 13, 2017, 8:07:00 PM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I'm calling bullshit on that. Coredump follows:

- -- >8 --

Command (m for help): n
Partition type
p primary (2 primary, 0 extended, 2 free)
e extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (3,4, default 3):
First sector (43008-195311, default 43008):
Last sector, +sectors or +size{K,M,G,T,P} (43008-195311, default 195311): +50M

Created a new partition 3 of type 'Linux' and of size 50 MiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Invalid argument

The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).

[root@qubes-dev tmp]# partprobe /dev/loop0
[root@qubes-dev tmp]# ls -ld /dev/loop0*
brw-rw---- 1 root disk 7, 0 Oct 14 01:16 /dev/loop0
brw-rw---- 1 root disk 259, 0 Oct 14 01:16 /dev/loop0p1
brw-rw---- 1 root disk 259, 1 Oct 14 01:16 /dev/loop0p2
brw-rw---- 1 root disk 259, 2 Oct 14 01:16 /dev/loop0p3
[root@qubes-dev tmp]# blockdev --getsize64 /dev/loop0p3
52428800
[root@qubes-dev tmp]# resize2fs /dev/loop0p3
resize2fs 1.43.3 (04-Sep-2016)
Filesystem at /dev/loop0p3 is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/loop0p3 is now 51200 (1k) blocks long.

[root@qubes-dev tmp]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/dmroot 9.5G 4.6G 4.5G 51% /
/dev/xvdd 477M 227M 226M 51% /usr/lib/modules/4.9.45-21.pvops.qubes.x86_64
devtmpfs 145M 0 145M 0% /dev
tmpfs 1.0G 0 1.0G 0% /dev/shm
tmpfs 153M 616K 152M 1% /run
tmpfs 153M 0 153M 0% /sys/fs/cgroup
tmpfs 1.0G 76M 949M 8% /tmp
/dev/xvdb 197G 168G 30G 85% /rw
tmpfs 31M 12K 31M 1% /run/user/1000
/dev/loop0p3 48M 650K 45M 2% /mnt
[root@qubes-dev tmp]# truncate -s 1G hda1
[root@qubes-dev tmp]# losetup -c /dev/loop0
[root@qubes-dev tmp]# blockdev --getsize64 /dev/loop0
1073741824
[root@qubes-dev tmp]# fdisk /dev/loop0

Welcome to fdisk (util-linux 2.28.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): p
Disk /dev/loop0: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x75b4ec65

Device Boot Start End Sectors Size Id Type
/dev/loop0p1 2048 22527 20480 10M 83 Linux
/dev/loop0p2 22528 43007 20480 10M 83 Linux
/dev/loop0p3 43008 145407 102400 50M 83 Linux

Command (m for help): d
Partition number (1-3, default 3):

Partition 3 has been deleted.

Command (m for help): n
Partition type
p primary (2 primary, 0 extended, 2 free)
e extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (3,4, default 3):
First sector (43008-2097151, default 43008):
Last sector, +sectors or +size{K,M,G,T,P} (43008-2097151, default 2097151):

Created a new partition 3 of type 'Linux' and of size 1003 MiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Re-reading the partition table failed.: Invalid argument

The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).

[root@qubes-dev tmp]# partprobe /dev/loop0
[root@qubes-dev tmp]# echo $?
0
[root@qubes-dev tmp]# resize2fs /dev/loop0p3
resize2fs 1.43.3 (04-Sep-2016)
Filesystem at /dev/loop0p3 is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 8
The filesystem on /dev/loop0p3 is now 1027072 (1k) blocks long.

[root@qubes-dev tmp]# cat /etc/os-release
NAME=Fedora
VERSION="25 (Twenty Five)"
ID=fedora
VERSION_ID=25
PRETTY_NAME="Fedora 25 (Twenty Five)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:25"
HOME_URL="https://fedoraproject.org/"
SUPPORT_URL="https://fedoraproject.org/wiki/Communicating_and_getting_help"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=25
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=25
PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy
[root@qubes-dev tmp]# uname -a
Linux qubes-dev 4.9.45-21.pvops.qubes.x86_64 #1 SMP Tue Aug 29 14:50:28 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@qubes-dev tmp]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/dmroot 9.5G 4.6G 4.5G 51% /
/dev/xvdd 477M 227M 226M 51% /usr/lib/modules/4.9.45-21.pvops.qubes.x86_64
devtmpfs 145M 0 145M 0% /dev
tmpfs 1.0G 0 1.0G 0% /dev/shm
tmpfs 153M 616K 152M 1% /run
tmpfs 153M 0 153M 0% /sys/fs/cgroup
tmpfs 1.0G 75M 950M 8% /tmp
/dev/xvdb 197G 168G 30G 85% /rw
tmpfs 31M 12K 31M 1% /run/user/1000
/dev/loop0p3 976M 1.7M 943M 1% /mnt
[root@qubes-dev tmp]#

- -- >8 --

- --
pozdrawiam / best regards _.-._
Wojtek Porczyk .-^' '^-.
Invisible Things Lab |'-.-^-.-'|
| | | |
I do not fear computers, | '-.-' |
I fear lack of them. '-._ : ,-'
-- Isaac Asimov `^-^-_>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZ4VUSAAoJEL9r2TIQOiNRrYoP/125r8sw47QEm2G1dDxFc18Y
f01ulwwzp93H6C8dD005wVovM28spumBm09rSV6iQdY1dZf4PMEK6leLZ6h0O0ef
iYMsrC8qIyFdRqd8UylpZapDKwpdfG+u7G/8aEkk5FcMbhYTZXSZS9IMtAAoE1Be
veiJoU7kOnZU2sWwWKkan8DLoJSyD3B/wpwUEGhaxepaq409ZGfzVyAM63ZdQ3/d
WfOvSycjO/glxcYRP3vtP8BFBktRHUpBkS/2bVqpDOs7i0bfT4Q6BdiaHtso5xM3
Zx3lXGxePxTG8vyeap5ncaaKB3EJiCUqpyJaufhv5gccslkaRQeVb9sDBsjoV9/f
eypA7xmwVLQay8/sN8he0urNNUbbgcimQ6NB0neNRaPnewJAp0Z1YYG126HcXQ58
nVpr/Fmkd4WDniqKgIt+CL/j/i/1DpgiZelucVtYCF9TG4AR8QDcilEpuCZCtmAR
tUxgpTrMBVX89cWKQgjeC3qfAu4AKmEkAN1B/s3SMCuyJAw/B0EfG9gGiweEL/ic
b16mVwgyj/xasAr/tLPXtgca4D22veDSbBit4gQzvVt9azXptKkqjdRM23KLvPYQ
IlfIuAtohmXp3WOmb0AuymstKMKDMTMsyVQIlZtRy3Xm+oWVKcs8joDwn8vPnJ9H
vl5LIisbhFV4owWf6U28
=rTm+
-----END PGP SIGNATURE-----

Wojtek Porczyk

Oct 13, 2017, 8:20:52 PM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 02:06:43AM +0200, Wojtek Porczyk wrote:
> On Sat, Oct 14, 2017 at 01:40:45AM +0200, Marek Marczykowski-Górecki wrote:
> > On Sat, Oct 14, 2017 at 01:21:58AM +0200, Wojtek Porczyk wrote:
> > > On Sat, Oct 14, 2017 at 01:20:13AM +0200, Marek Marczykowski-Górecki wrote:
> > > > You can't reload partition table while it is used (something from there
> > > > is mounted). At least Linux doesn't support it.
> > >
> > > Just tested with loop device, fdisk, and ext4. Looks like it works. Unless it
> > > works only with loop?
> >
> > No, it doesn't work, at least not directly:
> >

(snip)

> > The kernel still uses the old table. The new table will be used at the next reboot or after you run partprobe(8) or kpartx(8).
> >
> > I wonder on what conditions partprobe would work. And how would mounted
> > filesystem react for it.
>
> I'm calling bullshit on that. Coredump follows:

(...)

> [root@qubes-dev tmp]# partprobe /dev/loop0
> [root@qubes-dev tmp]# echo $?
> 0

(...)

Look at strace of blockdev --rereadpt and partprobe. The difference is that
blockdev (and I assume fdisk also, since both are from util-linux) fires
ioctl(fd, BLKRRPART), but partprobe (from GNU parted) does something funny to
/sys, which I didn't try to understand, but seems to work.

My guess is that this is a missing feature in the kernel, which parted works
around.
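
The difference between the two tools suggests a simple fallback, sketched
below. The wrapper name `reread_partition_table` is made up for this example;
only `blockdev --rereadpt` and `partprobe`, both used earlier in this thread,
are real commands. It has not been tried on a Qubes template.

```shell
#!/bin/sh
# Ask the kernel to re-read a partition table: try the BLKRRPART path
# (blockdev --rereadpt) first, then fall back to partprobe.
reread_partition_table() {
    dev=$1
    if blockdev --rereadpt "$dev" 2>/dev/null; then
        echo "re-read via blockdev: $dev"
    elif partprobe "$dev"; then
        echo "re-read via partprobe: $dev"
    else
        echo "could not re-read partition table: $dev" >&2
        return 1
    fi
}
```

Called as root, e.g. `reread_partition_table /dev/loop0`, it reports which
of the two methods succeeded.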


- --
pozdrawiam / best regards _.-._
Wojtek Porczyk .-^' '^-.
Invisible Things Lab |'-.-^-.-'|
| | | |
I do not fear computers, | '-.-' |
I fear lack of them. '-._ : ,-'
-- Isaac Asimov `^-^-_>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZ4VhQAAoJEL9r2TIQOiNRmj4P/2jqcgFLMywBayUXXCrTAVIx
9KC8xQ20PFlLaI4PRU/3SvuN0yZWuuM90uy6pQ9HhNAJU35YPy7KMy9p75+GCJ/i
/NiTPtmvlFY6FoiJx1llbZvl70HJNrKZUNtYmXgQtxXihC43iCYdLWGZtazp7Deb
WBi8aT1W5Q8LclnNUu0OUY+TA6jFZoD6urfva9CKBKAeH6L6bFSZ0TfPxaQlFNz8
Ya7pa6Jg39eepZS8VUjU5snqkZtA164DpvMtkXz7Fi7gjXYBNK3Y5fe6nhJ5/dsu
uGHlbr1CC6WNvX6Yoh26iCkthrX0BkobpIFxqoxQxExpXrc3Px3QPlK9afiM5fn2
x3l8KRlN7FN5f+izteoP6xjeukE72IhTUOhL8c8F/ceY5jnP+hnIgVTmXMkWPTyJ
WnOwGO6cPraLhyxZOlOSphN+OOY0BCR8bZbMuYRzn37prxiDrSZAEAzEixuBahBL
N3a4fapwUG1k9o4Pb1NPqdT26WtNibt0+n2AcL7YjUfYclRC9ET0B80QPsfj7K2k
hEaaBHlok+h0//ZqiGXAzASygG+pO8SyyLeMMjwZuPxhJoK9+tK9LuD/UK2s7Igu
mgdrqnZzYlTII+J6f2CWS/T964svsWWtHy4egO7L8GPOIx2joriha264fC9tImx2
BXhYmOae919Gy9Sg8rT5
=UAwV
-----END PGP SIGNATURE-----

Andrew Clausen

Oct 14, 2017, 12:42:42 AM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
Hi all,

I am a Qubes user, and coincidentally, the original author of partprobe (and parted).  I haven't looked at partprobe/parted since 2005.  The code has changed a lot since then, but let me do my best...

The BLKRRPART ioctl is no good because it can't accommodate busy block devices at all, i.e. resizing partition 1 when partition 2 on the same disk is mounted.

Instead, Parted primarily uses the BLKPG family of ioctls to inform the kernel of partition table changes.  (It also has "new" support for the device mapper -- but that's probably not relevant here.)  You can read about the BLKPG ioctl in /usr/include/linux/blkpg.h.  Since 2012, the Linux kernel has supported a new BLKPG feature for online partition resizing, i.e. telling the kernel to modify a mounted partition.  I think this is what is being used here.

The relevant Parted code is in the function linux_disk_commit(), which calls _disk_sync_part_table() and _blkpg_resize_partition() inside http://git.savannah.gnu.org/cgit/parted.git/tree/libparted/arch/linux.c

If the BLKPG ioctl fails, then partprobe/parted will throw an exception and tell you about it.
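
To make the online resize concrete, here is the sector arithmetic for the
loop device transcript earlier in the thread. It is a sketch: the helper name
`sectors_to_end` is made up, and it assumes a DOS label with nothing after
the partition being grown.

```shell
#!/bin/sh
# Length in 512-byte sectors of a partition that starts at sector START and
# runs to the end of a disk with DISK_SECTORS sectors.
sectors_to_end() {
    start=$1
    disk_sectors=$2
    echo $((disk_sectors - start))
}

# Partition 3 in the transcript starts at sector 43008 on a disk that was
# grown to 2097152 sectors.
sectors_to_end 43008 2097152   # prints 2054144, i.e. 1003 MiB
```

As far as I know, util-linux also ships a `resizepart` command that takes a
device, a partition number and a length in sectors and hands the new length
to the kernel; whether it behaves exactly like the Parted code path
described above is an assumption. It would not rewrite the on-disk table, so
fdisk or a similar tool is still needed for that.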

Wojtek: what part of your shell transcript was unexpected?  It looked like everything worked to me.

Cheers,
Andrew

Wojtek Porczyk

Oct 14, 2017, 9:35:03 AM
to Andrew Clausen, Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Thank you for your insight. I pasted the above to
https://github.com/QubesOS/qubes-issues/issues/3173#issuecomment-336635237.

> Wojtek: what part of your shell transcript was unexpected? It looked like
> everything worked to me.

Yes, it did. The "bullshit" comment was in reply to Marek's statement about
the supposed inability to reread the partition table while any partition is
mounted. As you point out, it can't be reloaded using the BLKRRPART ioctl,
which is what fdisk uses. Marek didn't use partprobe; he took the error
message from fdisk at face value and concluded that we need to reboot.


- --
pozdrawiam / best regards _.-._
Wojtek Porczyk .-^' '^-.
Invisible Things Lab |'-.-^-.-'|
| | | |
I do not fear computers, | '-.-' |
I fear lack of them. '-._ : ,-'
-- Isaac Asimov `^-^-_>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIbBAEBCAAGBQJZ4hJ6AAoJEL9r2TIQOiNRK+cP8QFb36PkY3+hmdlostS1Nqa2
QE1men8VxZmniwMtpUl9XRdWWYZpgsZy3OZ7M+WXncWQGzQNBk+3UIcJUC4chYS7
hR9rTLQpkV5g7fZmP8QCQukgG5SfJFhAcsBvl00WK+b7g46fIeaikWoQ2AgwZuTa
ZNnZLyJPOOp4W/93HxEkeQapgoe/arL+4R0F3SRnPloGI1bac8wHZUpPa5Y69hTO
+evN7Ibkt4VWkRlQm/rjKn6cF070B+VwThvxEvIbQtXc8dMnIifhxoGwjwYCWiWv
q7Tws0Fgqcjr/EaRGrj/pUtPPIJkIiqKg+w6p7mNUyZSm0Ro8eXm5Z4ywJCZjJ55
vC5Z8Nl/zOOcCOIDl2u8rqA5nKR9qN4B7+IgyR6dXzYgqvvRZaNBR/FVQeu5ETMC
VtuI6XgaRNeEC6EVRym+p4wh9ARIiw84QgJF0MJ8GgTXBHvNU10WroQTC68bCJNJ
gSEVHW8jNR5ccUAqvBDfgXcAx7iMEpfeJISXWd7uBYDgKZ1GxjHIJSG3LKWf+uZo
rLwU9IojN6+fu7akreyUEp+BQKAtJYZM29E3jJxAQRKkAPB0nQuIiaXx4K6uNbXZ
7oZLan4Ucnj6O0YI5Jm0RG40rXjLCDrEBaUL3xBf08g48x6kOtGw7rzE+PXHPSvl
g/Hgj6mlAQT1bgjxvE8=
=LFw9
-----END PGP SIGNATURE-----

Marek Marczykowski-Górecki

Oct 14, 2017, 9:36:13 AM
to Andrew Clausen, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 05:42:39AM +0100, Andrew Clausen wrote:
Thanks for the explanation.
This should be enough for the online resize part. Now, back to the main
issue here: the partition to resize isn't the last one, so one needs to move
two other partitions first. Hmm, maybe parted can do that too?

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ4IF0AAoJENuP0xzK19csdS8H/1W/UsGDIIYtkOPr6tyvAYDM
vXPK8MD9pO8EUaYz2iAUJHJyn2crmF0JNiU++ZtAKpxU9QlTc+ZPgJ/2br2NaWKU
jJotIfgtEdJL6jOZjNhbZGb9gtQjalPx8fGj5CR2inxWvY1Q2Twj0yb/os4kdjSM
OtJEft8l5jTfu0GUOaOleohv4VTebQOc6s4Ea1rJvDYBWvj/JE1bOkRoBkk4A2EE
1cGOHLTJpuIjJWhE5prO9DtVB5ppQHTsm2VFslgHdxtO3CorEPnGaltNOZoARffS
yJRAr0FgxaT9KqMRTF0feZjs/DuYf9PzbfRvnBQKRc2p06ASzCLijgNBbgtlMVQ=
=iMtL
-----END PGP SIGNATURE-----

Wojtek Porczyk

Oct 14, 2017, 10:01:22 AM
to Marek Marczykowski-Górecki, Andrew Clausen, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 03:36:05PM +0200, Marek Marczykowski-Górecki wrote:
> Thanks for the explanation.
> This should be enough for the online resize part. Now, back to the main
> issue here: partition to resize isn't the last one, one need to move two
> other partitions first. Hmm, maybe parted can do that too?

Can't we change the partition order and add some compatibility to initrd? We
could support resizing only for "new" templates and leave the instruction to
resize older ones in documentation, with a big scary warning that it's better
to reinstall template than attempt manual resizing.


- --
pozdrawiam / best regards _.-._
Wojtek Porczyk .-^' '^-.
Invisible Things Lab |'-.-^-.-'|
| | | |
I do not fear computers, | '-.-' |
I fear lack of them. '-._ : ,-'
-- Isaac Asimov `^-^-_>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZ4himAAoJEL9r2TIQOiNRKYgP/2SvBiuC5LNv+WbUOwT40m17
zwVMUzdsuYwhXoJd7sMTM/afWEE/UpuyPXgeScENNxFwkNN1O2moNDFsKo/vl+nj
SQocEWuPaXpN5DNdbbSoVyRLo+w6xXi3qNBfXIX13Q0zgVh6hCAn4QdbHh12xxqh
oeRCmsnw8pdXevbsn7S7beJpCkU2kV25HfShh5OHJWtSrNE1anSICqN5kqoe9K61
BsoM47AXmQZQULFFCHGS03YiWdCbNo5n79Bu0+bO5/A0YPBKzft1HK+R51+fNOLZ
iFYWmPBs7OrvsinH62lhvmIGlzjVo295DAd3xXCaUhgNIMjJ9MqSUIC0OADTmX3M
xElF4dvY3cd7ipCm97yFitt60ig83j8qWZqk2e7gXrsuoNgTzGF+STaRdFl5R0WB
cU7v+FWDVyOXhIQ9eEghagHa8YOMXfrj5rZsrQcF9TEhKi+NTA83zh7HO1TpyEA/
iAUeeZ0ak8RoAuFxtIxO6D05VFsqi+5zBR9lkXuSQt10emjlTuqfunNaT26S7G8B
7ZdbJbS5Iz9NwpI1+wPyIgCa90XQ7Do0prqLJDS5jKCuzXhqiJclzKfAZ2gG1qn4
uvaP7gYNBWSWgK9tQHyOnUi7E0eB4NjJhdBNZGNXeSKYqwK1EctCr2K8L6NkDc7h
FPWRJ0JZ+vhEFjZ/Fn7j
=xwUP
-----END PGP SIGNATURE-----

Marek Marczykowski-Górecki

Oct 14, 2017, 10:45:37 AM
to Andrew Clausen, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 04:01:11PM +0200, Wojtek Porczyk wrote:
> On Sat, Oct 14, 2017 at 03:36:05PM +0200, Marek Marczykowski-Górecki wrote:
> > Thanks for the explanation.
> > This should be enough for the online resize part. Now, back to the main
> > issue here: partition to resize isn't the last one, one need to move two
> > other partitions first. Hmm, maybe parted can do that too?
>
> Can't we change the partition order and add some compatibility to initrd? We
> could support resizing only for "new" templates and leave the instruction to
> resize older ones in documentation, with a big scary warning that it's better
> to reinstall template than attempt manual resizing.

This is exactly what this thread is about. The question is what would be
more stable/less fragile: supporting two partition layouts, or resizing
the current layout. This question is especially tricky because there
are multiple ways the initrd is distributed:
- kernel provided by dom0 (so, a dom0 package)
- kernel provided by the VM (so, a VM package)

Those packages may be updated independently and, in the case of a kernel
from dom0, may exist in multiple versions simultaneously. This may lead to
the "funny" effect that some templates will work only with some kernel
versions. This may or may not be a problem.
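
One way an initrd could cope with both layouts is to pick the root partition
by its type rather than by its position. The following is only a sketch: it
assumes a GPT label, assumes the root filesystem is the only partition of
type "Linux filesystem", and the helper name `find_root_partition` is made
up. Whether a tool like lsblk is available in the initrd is not addressed.

```shell
#!/bin/sh
# Print the first partition whose GPT type is "Linux filesystem", reading
# "NAME PARTTYPE" pairs (as printed by `lsblk -rno NAME,PARTTYPE`) on stdin.
LINUX_FS_GUID=0fc63daf-8483-4772-8e79-3d69d8477de4

find_root_partition() {
    awk -v guid="$LINUX_FS_GUID" 'tolower($2) == guid { print $1; exit }'
}

# Proposed order (ESP, BIOS boot, root): prints xvda3.
printf '%s\n' \
    'xvda ' \
    'xvda1 c12a7328-f81f-11d2-ba4b-00a0c93ec93b' \
    'xvda2 21686148-6449-6e6f-744e-656564454649' \
    'xvda3 0fc63daf-8483-4772-8e79-3d69d8477de4' | find_root_partition
```

With the current order (root first) the same function would print xvda1, so
one initrd could serve both kinds of template.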

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ4JG5AAoJENuP0xzK19csPvwH/Rusqx8OMH4iFqa+xcPOwxDL
EBv+PHHO7zg7PewCFSSriRuRNS8sniY0kaJz9mpBsAppd9Y6YALyB91qL3h13ZYJ
WYbzaGFRrZRtBrv+JwEGbEMPHW2CcBO4/3c31iiJGqB1vtva3oNAJ1Z0+e3I0JVU
mLDm7AgNNcrmjDwDTgReaqQawrKEbWEx9lbvVdqs71DZvQOS9GNEUWyxrcRSqe2D
h5lEW4umThbzZOkFXPCaxrOhVsp01jNWf5fZAw0hueaUMONcXtHkE7ge87HkQCIM
SrCBRayJRY6Ey9ZY/EeD0h0Ie3jvdyBJpeGTfVxkYtw7MJuXqfMvoUPG5fUg7jo=
=EI7D
-----END PGP SIGNATURE-----

Andrew Clausen

Oct 14, 2017, 11:29:32 AM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
Hi Marek,

On 14 October 2017 at 15:45, Marek Marczykowski-Górecki <marm...@invisiblethingslab.com> wrote:
> On Sat, Oct 14, 2017 at 04:01:11PM +0200, Wojtek Porczyk wrote:
> > On Sat, Oct 14, 2017 at 03:36:05PM +0200, Marek Marczykowski-Górecki wrote:
> > > Thanks for the explanation.
> > > This should be enough for the online resize part. Now, back to the main
> > > issue here: partition to resize isn't the last one, one need to move two
> > > other partitions first. Hmm, maybe parted can do that too?

It used to be able to.  However, when I retired from maintaining Parted, the new maintainers removed support for resizing file systems.  (It's fairly complicated code, and much less socially important now than when I wrote it in the late 90s.)  After some protest, they added it back into libparted, but it's still not available in the parted command-line tool.  There is another tool, fatresize, but this isn't very helpful because it doesn't allow you to move the start of the partition.  I believe other front-ends such as Parted Magic and GParted maintain the full functionality, but probably not in a scriptable fashion.

I could write a small tool especially for Qubes, if you think this is the best solution.  But see below...

> Can't we change the partition order and add some compatibility to initrd? We
> could support resizing only for "new" templates and leave the instruction to
> resize older ones in documentation, with a big scary warning that it's better
> to reinstall template than attempt manual resizing.

This is exactly what this thread is about. The question is what would be
more stable/less fragile - supporting two partition layouts, or resizing
of the current layout. This question is especially tricky, because there
are multiple ways how initrd is distributed:
 - kernel provided by dom0 (so, dom0 package)
 - kernel provided by VM (so, VM package)

Those packages may be updated independently, and in the case of a kernel
from dom0, may exist in multiple versions simultaneously. This may lead
to the "funny" effect that some templates will work only with some kernel
versions. This may, or may not be a problem.

I'm surprised there isn't a generic initrd standard to avoid this problem.  How disappointing!  For example, initrd could identify the system partition by its name, not its location.

Some other options:

 (1) Delete the old system partition, and create a new one by copying the files across.  This should be fairly quick, because the system partition shouldn't have much stuff on it, right?  (I doubt it would be slower than libparted's resizer.)

 (2) Make the virtual hard disk very large (e.g. 100TB) but store it in a sparse format.  Put the system partition at the very end, so you never have to move it around.

 (3) Use LVM.

I find option (2) most elegant.  The main difficulty would be: how do you enforce space limits?  I worry that an AppVM would have a copy-on-write view of the massive template image, and would be able to fill in the sparse holes to do a denial-of-service attack on the whole system.

Option (1) is probably quite easy to implement and most robust.

Cheers,
Andrew

Marek Marczykowski-Górecki

Oct 14, 2017, 12:07:03 PM
to Andrew Clausen, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 04:29:30PM +0100, Andrew Clausen wrote:
> Hi Marek,
>
> On 14 October 2017 at 15:45, Marek Marczykowski-Górecki <
> marm...@invisiblethingslab.com> wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > On Sat, Oct 14, 2017 at 04:01:11PM +0200, Wojtek Porczyk wrote:
> > > On Sat, Oct 14, 2017 at 03:36:05PM +0200, Marek Marczykowski-Górecki
> > wrote:
> > > > Thanks for the explanation.
> > > > This should be enough for the online resize part. Now, back to the main
> > > > issue here: partition to resize isn't the last one, one need to move
> > two
> > > > other partitions first. Hmm, maybe parted can do that too?
> >
>
> It used to be able to. However, when I retired from maintaining Parted,
> the new maintainers removed support for resizing file systems. (It's
> fairly complicated code, and much less socially important now than when I
> wrote it in the late 90s.) After some protest, they added it back into
> libparted, but it's still not available in the parted command-line tool.
> There is another tool, fatresize, but this isn't very helpful because it
> doesn't allow you to move the start of the partition. I believe other
> front-ends such as Parted Magic and GParted maintain the full
> functionality, but probably not in a scriptable fashion.

It's not about resizing those other partitions, it's about moving them.
For resizing the main filesystem, resize2fs is enough.

> I could write a small tool especially for Qubes, if you think this is the
> best solution. But see below...
>
> > Can't we change the partition order and add some compatibility to initrd?
> > We
> > > could support resizing only for "new" templates and leave the
> > instruction to
> > > resize older ones in documentation, with a big scary warning that it's
> > better
> > > to reinstall template than attempt manual resizing.
> >
> > This is exactly what this thread is about. The question is what would be
> > more stable/less fragile - supporting two partition layouts, or resizing
> > of the current layout. This question is especially tricky, because there
> > are multiple ways how initrd is distributed:
> > - kernel provided by dom0 (so, dom0 package)
> > - kernel privided by VM (so, VM package)
> >
> > Those packages may be updated independently, and in case of kernel from
> > dom0, may exists in multiple versions simultaneously. This may lead to
> > "funny" effect that some templates will work only with some kernel
> > versions. This may, or may not be a problem.
> >
>
> I'm surprised there isn't a generic initrd standard to avoid this problem.
> How disappointing! For example, initrd could identify the system partition
> by its name, not its location.

There are a few issues with using a filesystem label there. The primary
one is that it is ambiguous, because there are two things in an AppVM:
- read-only base image, from VM template
- read-write copy-on-write image, assembled from the one above

Granted, before the second one is assembled (which is done by the
initrd), there is only one. I haven't investigated _partition_ names
available in GPT (as opposed to filesystem labels). Maybe that would be
the way to go?
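
To make the partition-name idea concrete, here is a minimal sketch of
what "find root by name, not by position" could look like. It is only an
illustration: the `Partition` type, the `find_root` helper and the GPT
names used below are hypothetical, not what the templates or the initrd
actually use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    number: int  # position in the partition table (1-based)
    name: str    # GPT partition name (hypothetical values below)


def find_root(partitions: List[Partition],
              wanted_name: str = "Root filesystem") -> Optional[Partition]:
    """Return the single partition with the wanted GPT name.

    Returns None when the name is missing or ambiguous, so the caller
    can fall back to a fixed position instead of guessing.
    """
    matches = [p for p in partitions if p.name == wanted_name]
    return matches[0] if len(matches) == 1 else None


# Current layout: root first. Proposed layout: root last.
old_layout = [Partition(1, "Root filesystem"),
              Partition(2, "EFI System"),
              Partition(3, "BIOS boot partition")]
new_layout = [Partition(1, "EFI System"),
              Partition(2, "BIOS boot partition"),
              Partition(3, "Root filesystem")]

print(find_root(old_layout).number)
print(find_root(new_layout).number)
```

The same lookup works for both layouts, which is the point: an initrd
written this way would not care where the root partition sits.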

> Some other options:
>
> (1) Delete the old system partition, and create a new one by copying the
> files across. This should be fairly quick, because the system partition
> shouldn't have much stuff on it, right? (I doubt it would be slower than
> libparted's resizer.)

Up to 10GB, in the default setup. resize2fs is sufficiently fast, so
there is no need to copy files. But to do that, you need to resize the
partition. And to resize it, you need to move the other two partitions
further toward the end of the disk (in the current partition layout).
As a reminder, the current layout is:

1. xvda1: root filesystem (almost all available space)
2. xvda2: EFI system partition (empty, prepared for PVHv2 boot with VM-provided kernel)
3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)

You don't need to move xvda1. Only xvda2 and xvda3, which are small
(about 200M total).

Changing it to have the root filesystem at the end would ease resizing,
but may introduce the compatibility issues explained before.
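
The difference between the two layouts can be shown with a little
arithmetic. This is a rough sketch only: it ignores the GPT headers and
alignment, and the 200 MiB / 2 MiB sizes are illustrative values chosen
to match the "about 200M total" above. The `layout` and `moved` helpers
are made up for this example.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Illustrative fixed sizes for the two small partitions.
SIZES = {"efi": 200 * MIB, "bios": 2 * MIB}

CURRENT = ["root", "efi", "bios"]   # root first (layout described above)
PROPOSED = ["efi", "bios", "root"]  # root last


def layout(order, disk_size):
    """Place partitions back to back; 'root' takes all remaining space.

    Returns {name: (start, end)} in bytes.
    """
    fixed = sum(SIZES[name] for name in order if name != "root")
    result, pos = {}, 0
    for name in order:
        length = disk_size - fixed if name == "root" else SIZES[name]
        result[name] = (pos, pos + length)
        pos += length
    return result


def moved(before, after):
    """Names of partitions whose start offset changes between layouts."""
    return sorted(n for n in before if before[n][0] != after[n][0])


# Grow the volume from 10 GiB to 20 GiB under each ordering.
print(moved(layout(CURRENT, 10 * GIB), layout(CURRENT, 20 * GIB)))
print(moved(layout(PROPOSED, 10 * GIB), layout(PROPOSED, 20 * GIB)))
```

With root first, both small partitions change their start offset on every
resize. With root last, nothing moves and only the end of the root
partition changes.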

> (2) Make the virtual hard disk very large (e.g. 100TB) but store it in a
> sparse format. Put the system partition at the very end, so you never have
> to move it around.

Well, in dom0 it is stored as such (or on an LVM thin volume, depending
on configuration). But the volume size is limited on purpose - otherwise
_any_ VM could easily fill the whole disk.

> (3) Use LVM.
>
> I find option (2) most elegant. The main difficulty would be: how do you
> enforce space limits? I worry that an AppVM would have a copy-on-write
> view of the massive template image, and would be able to fill in the sparse
> holes to do a denial-of-service attack on the whole system.

Exactly. This is the reason why it is limited. Unfortunately most (all?)
copy-on-write schemes and sparse volumes (either files, or LVM) react
badly when you run out of disk space. If you're very unlucky, even
fsck will not help you clean up all that mess (after providing more
space...). On the other hand, if the filesystem size is the same as the
volume size limit, you get a "No space left on device" error, instead of
various I/O errors and filesystem corruption. Bonus: you don't break
free space reporting.

This does not exclude storing such a VM disk image as a sparse volume in
dom0, so you use even less disk space (than that 10GB).
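
A toy model of the two failure modes, for illustration only. The
`ThinPool` and `Volume` classes are invented for this sketch and do not
correspond to any real storage API; sizes are in arbitrary units.

```python
class ThinPool:
    """Shared backing storage with a hard capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0


class Volume:
    """A sparse volume whose filesystem is `fs_size` units large."""

    def __init__(self, pool, fs_size):
        self.pool = pool
        self.fs_size = fs_size
        self.written = 0

    def write(self, units):
        if self.written + units > self.fs_size:
            # The filesystem itself is full: a clean error inside the VM.
            return "ENOSPC"
        if self.pool.used + units > self.pool.capacity:
            # The pool ran out underneath a filesystem that still
            # believes it has free space: this is the ugly case.
            return "EIO"
        self.written += units
        self.pool.used += units
        return "ok"


# Volume limited to 10 units on a 100-unit pool: fails cleanly.
limited = Volume(ThinPool(100), fs_size=10)
print(limited.write(10), limited.write(1))

# Huge sparse volume on the same size of pool: exhausts the pool.
huge = Volume(ThinPool(100), fs_size=1_000_000)
print(huge.write(100), huge.write(1))
```

In the second case every other volume on that pool would start failing
too, which is the denial-of-service concern raised above.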

> Option (1) is probably quite easy to implement and most robust.
>
> Cheers,
> Andrew

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ4KTPAAoJENuP0xzK19csncwH/jbqDqQFN/htk25gPKIg7N9d
x6LOgTIdGoUuyXOdnw6nxKYEBxmQgl0pCnUEldP3dlI2FpXX8ZnRPydoV0k9tM3+
JzDwscNwnr6FZwvgZzlDm8/8dx7LTzAsISkLatXnJ2C48mZMi7ff7mQ7GvGkDa2Y
QeMPi5ZfywVZx50ziSBcAMKYjQJ2KJhiFcEvk675j48k2xkC/xXUn4Md6yBMABo2
9uvIvaZhzhs19IPIrQzdtqyiUspTGoPcqbkGaY4QTtFu23+DrRAMuHlJUvHR0q2I
j/0bHM79Ht32iROEULmJZhvzkiMmzaHzrd7aLFGU9FW9eNDg/Q8D6REkN3hDDzU=
=YqZb
-----END PGP SIGNATURE-----

Andrew Clausen

Oct 14, 2017, 3:47:15 PM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
Hi Marek,

On 14 October 2017 at 17:06, Marek Marczykowski-Górecki <marm...@invisiblethingslab.com> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 04:29:30PM +0100, Andrew Clausen wrote:
> Hi Marek,
>
> On 14 October 2017 at 15:45, Marek Marczykowski-Górecki <
> marmarek@invisiblethingslab.com> wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > On Sat, Oct 14, 2017 at 04:01:11PM +0200, Wojtek Porczyk wrote:
> > > On Sat, Oct 14, 2017 at 03:36:05PM +0200, Marek Marczykowski-Górecki
> > wrote:
> > > > Thanks for the explanation.
> > > > This should be enough for the online resize part. Now, back to the main
> > > > issue here: partition to resize isn't the last one, one need to move
> > two
> > > > other partitions first. Hmm, maybe parted can do that too?
> >
>
> It used to be able to.  However, when I retired from maintaining Parted,
> the new maintainers removed support for resizing file systems.  (It's
> fairly complicated code, and much less socially important now than when I
> wrote it in the late 90s.)  After some protest, they added it back into
> libparted, but it's still not available in the parted command-line tool.
> There is another tool, fatresize, but this isn't very helpful because it
> doesn't allow you to move the start of the partition.  I believe other
> front-ends such as Parted Magic and GParted maintain the full
> functionality, but probably not in a scriptable fashion.

It's not about resizing those other partitions, it's about moving them.

Sorry for my esoteric choice of words... I think of moving as a special case of resizing when the old and new locations don't overlap.  The modern Parted command line tools can neither move nor resize, although all the code to do so is still present inside libparted (for fat only).
 
For resizing main filesystem resize2fs is enough.

Right.

> > Can't we change the partition order and add some compatibility to initrd?
> > We
> > > could support resizing only for "new" templates and leave the
> > instruction to
> > > resize older ones in documentation, with a big scary warning that it's
> > better
> > > to reinstall template than attempt manual resizing.
> >
> > This is exactly what this thread is about. The question is what would be
> > more stable/less fragile - supporting two partition layouts, or resizing
> > of the current layout. This question is especially tricky, because there
> > are multiple ways how initrd is distributed:
> >  - kernel provided by dom0 (so, dom0 package)
> >  - kernel privided by VM (so, VM package)
> >
> > Those packages may be updated independently, and in case of kernel from
> > dom0, may exists in multiple versions simultaneously. This may lead to
> > "funny" effect that some templates will work only with some kernel
> > versions. This may, or may not be a problem.
> >
>
> I'm surprised there isn't a generic initrd standard to avoid this problem.
> How disappointing!  For example, initrd could identify the system partition
> by its name, not its location.

There are few issues with using filesystem label there. The primary one
is: it is ambiguous, because there are two things in AppVM:
 - read-only base image, from VM template
 - read-write copy-on-write image, assembled from the one above

Ah, I see.  Can't the AppVM's initrd change the label of the copy-on-written file-system image?  For example, the original template might be called "root", and the AppVM's private copy could be renamed to "root-copy" during boot.  I think this can be implemented with tune2fs.  Whether you use partition labels or filesystem labels boils down to what is more convenient, e.g. what is easier inside an initrd environment.
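
A minimal sketch of the relabel step, assuming an ext2/3/4 root
filesystem. It only builds the `tune2fs -L` command line and does not run
anything. The `relabel_command` helper and the device path in the usage
line are hypothetical; the 16-byte limit is the ext volume label length.

```python
MAX_EXT_LABEL_BYTES = 16  # ext2/3/4 volume labels hold at most 16 bytes


def relabel_command(device, label):
    """Build the tune2fs invocation that sets a filesystem label.

    Raises ValueError for labels that would not fit in an ext label,
    rather than relying on the tool to handle an oversized one.
    """
    if len(label.encode("utf-8")) > MAX_EXT_LABEL_BYTES:
        raise ValueError("label too long for ext2/3/4: %r" % label)
    return ["tune2fs", "-L", label, device]


# Hypothetical device path for the AppVM's copy-on-write root.
print(relabel_command("/dev/mapper/dmroot", "root-copy"))
```

An initrd script would run the resulting command against the assembled
copy-on-write device, leaving the template's own label untouched.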

Granted, that before assembling the second one (which is done by initrd),
there is only one. I haven't investigated _partition_ names available
in GPT (as opposed to filesystem labels). Maybe that would be the way to
go?

Yes, although I don't see how this is conceptually different from filesystem labels.  (Neither seems to be a problem -- see above.)

Just to clarify the problem: in the long run, you are happy to require that templates locate the root filesystem at the end of the virtual disk.  But for now, you think it's too big a change for the imminent Qubes 4.0 release.  But you are willing to change the initramfs scripts for all templates?  I'm just trying to understand which solutions are easiest for you to implement.

> Some other options:
>
>  (1) Delete the old system partition, and create a new one by copying the
> files across.  This should be fairly quick, because the system partition
> shouldn't have much stuff on it, right?  (I doubt it would be slower than
> libparted's resizer.)

Up to 10GB, in default setup. resize2fs is sufficiently fast, no need to
copy files. But to do that, you need to resize the partition. And to
resize it, you need to move other two partitions further to the disk end
(in the current partition layout). Let me remind that current layout:

1. xvda1: root filesystem (almost all available space)
2. xvda2: EFI system partition (empty, prepared for PVHv2 boot with VM-provided kernel)
3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)

You don't need to move xvda1. Only xvda2 and xvda3, which are small
(about 200M total).

Changing it to have root filesystem at the end would ease resizing, but
may introduce compatibility issues explained before.

I'm not sure you understood my proposal (which is my fault).  I meant that we could move xvda{2,3} by deleting them and recreating them in a new location.  I think this is the best solution.  The main difficulty is backing up the contents somewhere, but this should be easy -- especially since the contents are small.
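
The backup-and-recreate idea can be sketched on a plain byte image. This
is a toy model under stated assumptions: it moves raw contents only, does
not touch the partition table, and assumes the image has already been
extended. The `move_region` helper is made up for this example.

```python
import io


def move_region(image, old_start, new_start, length):
    """Move `length` bytes from old_start to new_start in a seekable image.

    The contents are backed up in memory first, so overlapping source
    and destination ranges are handled correctly. The old location is
    zeroed before the backup is written to its new place.
    """
    image.seek(old_start)
    backup = image.read(length)
    if len(backup) != length:
        raise ValueError("source region extends past end of image")
    image.seek(old_start)
    image.write(b"\0" * length)
    image.seek(new_start)
    image.write(backup)


# 16-byte toy disk: root (8 bytes), EFI (2), BIOS boot (1), 5 new bytes.
disk = io.BytesIO(b"R" * 8 + b"E" * 2 + b"B" * 1 + b"\0" * 5)

# Move the last partition first, then the one before it.
move_region(disk, old_start=10, new_start=15, length=1)  # BIOS boot
move_region(disk, old_start=8, new_start=13, length=2)   # EFI
print(disk.getvalue())
```

After both moves the freed space sits directly behind the root
partition, so it can be grown and then resize2fs run on it.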

Cheers,
Andrew

Leo Gaspard

Oct 14, 2017, 5:42:27 PM
to Marek Marczykowski-Górecki, Andrew Clausen, qubes-devel, Joanna Rutkowska
On 10/14/2017 06:06 PM, Marek Marczykowski-Górecki wrote:
> Changing it to have root filesystem at the end would ease resizing, but
> may introduce compatibility issues explained before.

Most likely a stupid / already thought of before idea, but... as, to
migrate from Qubes 3.2 to Qubes 4.0, it will be required to make a
backup and import from the backup... maybe just having the
“import-from-backup” tool do the partition shuffling would be a simpler
way to go than relying on the initramfs?

Then, rc's for Qubes 4 are already quite far in the works, so maybe it
is too late.

Just my two cents,
Leo


Andrew Clausen

Oct 14, 2017, 5:51:21 PM
to Leo Gaspard, Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska
Hi Leo,

On 14 October 2017 at 22:42, Leo Gaspard <l...@gaspard.io> wrote:
On 10/14/2017 06:06 PM, Marek Marczykowski-Górecki wrote:
> Changing it to have root filesystem at the end would ease resizing, but
> may introduce compatibility issues explained before.

Most likely a stupid / already thought of before idea, but... as, to
migrate from Qubes 3.2 to Qubes 4.0, it will be required to make a
backup and import from the backup... maybe just having the
“import-from-backup” tool do the partition shuffling would be a simpler
way to go than relying on the initramfs?

I think the problem we're trying to solve is about TemplateVMs.  Backups are for AppVMs, not TemplateVMs.  Specifically, backups are for user data (such as /home), which are stored on separate virtual disks.

Then, rc's for Qubes 4 are already quite far in the works, so maybe it
is too late.

I think the issue is that the tools for building templates would need to be adapted at the last minute.

Cheers,
Andrew

Marek Marczykowski-Górecki

Oct 16, 2017, 9:45:31 AM
to Andrew Clausen, qubes-devel, Joanna Rutkowska
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Sat, Oct 14, 2017 at 04:01:11PM +0200, Wojtek Porczyk wrote:
> On Sat, Oct 14, 2017 at 03:36:05PM +0200, Marek Marczykowski-Górecki wrote:
> > Thanks for the explanation.
> > This should be enough for the online resize part. Now, back to the main
> > issue here: partition to resize isn't the last one, one need to move two
> > other partitions first. Hmm, maybe parted can do that too?
>
> Can't we change the partition order and add some compatibility to initrd? We
> could support resizing only for "new" templates and leave the instruction to
> resize older ones in documentation, with a big scary warning that it's better
> to reinstall template than attempt manual resizing.

We've just had an extensive discussion about the possible options. It
boils down to:

1. Leave the current partition layout but have a more complex resize
operation (more fragile, harder to support - especially when someone
heavily customizes the template). This option would allow us to release
4.0rc2 today (with the possibility of adding a root resize script
later), according to the schedule.

2. Change the partition layout to have a simpler resize operation. This
would also be a more logical layout - boot-related partitions at the
beginning. But this means breaking existing 4.0rc1 installations, or at
least not supporting root volume resize there at all (unless templates
are reinstalled). This also means delaying rc2 by another week, to make
this change, test it, and rebuild all the templates.

Given those two options, we've decided it's better to have a cleaner
situation long term (option 2) at the cost of a less convenient rc
stage, than the other way around. After all, we'll need to support this
for a much longer time than we spend now at release candidates.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ4yanAAoJENuP0xzK19csVFcH/RUKu+YjxvyXz56SF1Gf2ZiW
MF8jC9lagojeDoxMBz/9wDDkUj9xz4tLUffagPtSSyxKrF2CTQfBAPEsHsHbY/sC
EAdTc1jIZl9ea4n551bVA8l41Tntwj5PWkla6TtZF3k0u0XLKCBAk23antBtqN+/
CRvmxjeVp0qrZ0nk9nUyvGvBpvBvfTI6waqLt0PUpdTspEwmgn3rY3d+5T2W0Jhk
SXC+8oQRLAWeOQkiUhikCvUT8kogE/GLV5B2CerzCBgQvf4bw+D2J1/PR3ByKAYw
dmmugv8AS6WPxs/Hu6y82735u4f3bnrSe7qKyZeEn63lCl+gz88chBwM5H89S4o=
=S7wD
-----END PGP SIGNATURE-----

Holger Levsen

Oct 16, 2017, 11:38:10 AM
to Marek Marczykowski-Górecki, Andrew Clausen, qubes-devel, Joanna Rutkowska
On Mon, Oct 16, 2017 at 03:45:24PM +0200, Marek Marczykowski-Górecki wrote:
> Given those two options, we've decided it's better to have cleaner
> situation longterm (option 2) at the cost of less convenient rc stage,
> than the other way around. After all, we'll need to support this for a
> much longer time than we spend now at release candidates.

Thumbs up & thank you! :-)


--
cheers,
Holger

Zrubi

Oct 17, 2017, 7:10:04 AM
to Marek Marczykowski-Górecki, qubes-devel, Joanna Rutkowska, Wojciech Porczyk
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 10/14/2017 12:22 AM, Marek Marczykowski-Górecki wrote:
> Current partition layout of a template in Qubes 4.0 is: 1. xvda1:
> root filesystem (almost all available space) 2. xvda2: EFI system
> partition (empty, prepared for PVHv2 boot with VM-provided kernel)
> 3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)
>
> This makes resizing root volume hard, because one need to move
> xvda[23] data to the (new) end of the disk first. Also, having
> partitions at all (de facto required by grub2), makes online resize
> of root volume hard/impossible.


I just wonder, why not use separate disks instead of partitions?
This approach is much more flexible, even if you have to create a single
partition for compatibility.


- --
Zrubi
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZ5eT9AAoJEBozWgtUjzdkfGAP/2lj5w3xo4u9LM4HYJrDDN/N
j7zBv8ZxHoJAKhN/7DURqVaNLYeap7+Ge9bj6pGhFx5JWXYm3x4VqlQJVD9MdqjP
hYtkR1jKvPPTmPVs/Z0ZSrawoS8hHD6YjTrRbFqaLREt0xGJ/uKJVNbNokOpL33R
IqJVTp8AVYOY6ErPZK38QtsUs2OqS3tND4na7INZUmvaGT4mUGJDCrCkZ+qaEICS
4QnoOnqiPty8n8eIBRxJEPZwGNObei0KdSYla2MrZ0VZZcR8Tkaf2TBApEWTOXuK
31GvTGm0b81AZQsCdgcA6e8Sy13cx9/lrQ2Ic1eILE1MdveOPdJA49WfmqoKmD3w
yvEydk27BtG4GX7K+LJT9gc4y2N79eP1iPVfvuDji8vBIVCQk0CBqVRkr3LXJ53e
RlOS8gGOSuR20IQjv/BmQbEM37r8oh20gCelPLlXS1+Ft6KnRDSHaAk8L5duwk6X
wbP1OGgwnwAGAkcKT/Mrcj5TsftvWldFu4l5ABVCHv/HZotIOdsmqkyNrNlhyEn+
yxDcTSuukbRhZP95lfVO8EZ+h7Ge7iOS6FtPjmWbNRDIgJbiWStJBiIYva32YGhK
l1M4QBfbDE4j2XMc0enuEA/U0BYohSLgXyQLSXXFzmX/p7P9Opxcm+im1VR9f4pr
DQVUpjlI0ndJJ7YBjYo3
=ZHty
-----END PGP SIGNATURE-----

Marek Marczykowski-Górecki

Oct 17, 2017, 7:24:59 AM
to Zrubi, qubes-devel, Joanna Rutkowska, Wojciech Porczyk
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Tue, Oct 17, 2017 at 01:09:58PM +0200, Zrubi wrote:
> On 10/14/2017 12:22 AM, Marek Marczykowski-Górecki wrote:
> > Current partition layout of a template in Qubes 4.0 is: 1. xvda1:
> > root filesystem (almost all available space) 2. xvda2: EFI system
> > partition (empty, prepared for PVHv2 boot with VM-provided kernel)
> > 3. xvda3: BIOS boot (grub2 installed, used when `kernel=''`)
> >
> > This makes resizing root volume hard, because one need to move
> > xvda[23] data to the (new) end of the disk first. Also, having
> > partitions at all (de facto required by grub2), makes online resize
> > of root volume hard/impossible.
>
>
> I just wonder why not using separate disks instead of partitions?
> This approach much more flexible, even if you have to create a single
> partition for compatibility.

While in this particular case it would indeed make things easier, it
would be harder to support different operating systems. Here, Linux
requires two partitions (root fs + EFI or BIOS boot (*)), but other OSes
have different requirements - for example Windows. Having different
(number of) disks depending on the OS installed inside would greatly
complicate things. Here we treat the content of root.img as a black box
from dom0's point of view. Note that the issue is about the
qubes.ResizeDisk service running inside of the VM, which can be
implemented differently depending on the operating system.

(*) We have both to ease migration between those two modes, without the need
to repartition the root volume.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJZ5Fc5AAoJENuP0xzK19csEccH/jiMONC/BVqRkQdh+5bqONR2
s+a6vfxpYkv0qboJaxvAJdq9Ahy1kY3SBw1RxwtEV4Qs67sifh9FCRZ8XRUtSCkE
SbBJ617CJnJtwNikk9rZ+yUwhPgmAdDeweh6968XPnahT/Tk29IJdkMfO83YCvAq
Vp0sG9+X2E24WmIZOolC5m7MxP2mxiKhqGoOfTFMcL6MBtyHjBEnXF5AY4SVr2pz
e6nSIZmOFiwldI47WVrg8Nn8DtKbnYR1PODd8j0Zh6oPfYvup1MF3C4P0oLHNpGr
tHjSqj7MCgETfQl9lXuk7rhdGDPCbWldyuCi0uSU5+ablrpgQ3qsq4y/EJ4SNcA=
=Q62R
-----END PGP SIGNATURE-----