
Bug#989462: set Vagrant box filesystem size to something much larger than 20GB


Hans-Christoph Steiner

Jun 4, 2021, 6:40:03 AM
Package: cloud.debian.org

The Vagrant guidelines say:

"you should create a dynamically resizing drive with a large maximum size. This
causes the actual footprint of the drive to be small initially, but to
dynamically grow towards the max size as disk space is needed, providing the
most flexibility for the end user."
https://www.vagrantup.com/docs/boxes/base

Both the VirtualBox and libvirt (qcow2) disk formats support exactly that. The
official Debian box seems to be set at a fixed 20GB. This makes it very
difficult for F-Droid to use these images as the base box for our Vagrant boxes.
https://gitlab.com/fdroid/fdroid-bootstrap-buildserver/-/issues/10

F-Droid's base box has been using 1TB for a long time with good results.
gitlab.com/fdroid/basebox/

Once this is fixed in Debian's images, F-Droid will no longer need the forked
base box and can use Debian's directly.

Emmanuel Kasper

Jun 4, 2021, 9:40:03 AM
Hi Hans-Christoph
Thanks for your interest in the Debian Vagrant base boxes. Please note
that these boxes do not have "official" status, because they are not
built on a Debian server, but the code to create these boxes is
maintained by Debian Developers.

Now, to go back to your initial bug report: creating a thin-provisioned
image seems like a good idea, but there is some work involved.
Right now we're building a raw disk image which we then convert
to qcow2 and VMDK.
My understanding is that creating a thin-provisioned disk image would
only work if we were already using a disk image format that supports
thin provisioning at the filesystem creation step.
Maybe @Thomas can comment on that.

An alternative would be to use the cloud-init root growfs mechanism in
the disk image, where you can enlarge the disk via qemu-img, and on the
next boot a script in the initrd detects the disk change and resizes the
filesystem accordingly.

IMHO the second option is the best, as the first option would mean we
would need two different builds: one for VMDK, one for qcow2.
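
To show what that second option could look like from the end-user side,
here is a rough, untested sketch. It assumes a vagrant-libvirt version
that supports the machine_virtual_size option and a box whose
initrd/cloud-init grows the root filesystem on boot; the box name and
size below are only placeholders.

```
Vagrant.configure('2') do |config|
  config.vm.box = 'debian/bullseye64'   # placeholder box name
  config.vm.provider :libvirt do |lv|
    # enlarge the box's virtual disk to 100 GB when the domain is created;
    # the guest-side growfs script would then expand the filesystem on boot
    lv.machine_virtual_size = 100
  end
end
```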

Hans-Christoph, a first question: are you sure VirtualBox supports
dynamically resized drives for VMDK? Last time I checked it was only VDI.

Emmanuel

Hans-Christoph Steiner

Jun 4, 2021, 11:40:03 AM

I think at least one of the F-Droid contributors could find some time to
contribute to the Debian images if we have direction on what kind of solution
you'd accept here. Sounds like you've provided some ideas already.

Looks like VMDK is still not resizable, but it is possible to convert to VDI
then back. We're aiming to move entirely to libvirt anyway, so a fix only in
libvirt will still be very helpful.

Emmanuel Kasper

Jun 7, 2021, 4:20:02 AM
Hi Hans

I had a look at the fai-disk-image toolchain, and it seems that nothing
in it prevents us from using a disk image with a larger virtual size
for VirtualBox *and* libvirt, as currently:
- we create a sparse file in raw format using fai-disk-image
- then we convert this disk image to a thin-provisioned VMDK/qcow2

VMDK in VirtualBox 6 does support thin-provisioned disk images; the
current 20GB Vagrant boxes use 1.4 GB on my hard drive.

So the next step for you would be:
- bump the disk size in
https://salsa.debian.org/cloud-team/debian-vagrant-images/-/blob/master/src/debian_cloud_images/cli/build.py#L179
and following

- create a pull request, so that it triggers a build and we can see if
it passes the test suite

In a separate step, we should probably review the settings we use when
creating the qcow2 image in
https://salsa.debian.org/cloud-team/debian-vagrant-images/-/blob/master/utils/vagrant/libvirt/create-vagrant-libvirt-box#L18
to see if we could make use of the recommended settings mentioned at
https://www.jamescoyle.net/how-to/2060-qcow2-physical-size-with-different-preallocation-settings
for performance.

Do you know of any caveats of working with a larger filesystem, say a
512GB disk image? A slower fsck on boot after a crash?

Emmanuel Kasper

Oct 18, 2021, 3:40:03 AM
Hi all
I am following the merge request at
https://salsa.debian.org/cloud-team/debian-vagrant-images/-/merge_requests/11
and wanted to mention something that did not come to my mind earlier.

If the build you're running inside the Vagrant box needs that much room
in the filesystem, what about doing the build inside the /vagrant synced
directory, which is shared with the host?
(https://www.vagrantup.com/docs/synced-folders)
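
Something like this minimal sketch is what I have in mind; the box name,
the ./build path, and the NFS options are only placeholders for
illustration.

```
Vagrant.configure('2') do |config|
  config.vm.box = 'debian/bullseye64'   # placeholder box name
  # run the build in a host directory shared over NFS instead of the
  # box's own 20GB filesystem
  config.vm.synced_folder './build', '/vagrant/build',
    type: 'nfs', nfs_udp: false, nfs_version: 4
end
```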

This way you're only limited by the disk space of the host.
When using NFS, and provided you don't do fsync writes (I doubt the
build tools are doing fsync), you get nearly host-system performance.
I just verified that with fio.

When doing a mount over NFS, you need root permission at the initial
creation of the box to create the NFS export for the guest in /etc/exports.

vagrant will create and manage an entry like this:
# VAGRANT-BEGIN: 1000 9a45b17b-a2b0-4d05-852d-8e5cf1709068
"/home/manu/Projects/vagrenvs/kuberdebian"
192.168.121.91(rw,no_subtree_check,all_squash,anonuid=1000,anongid=1000,fsid=4201683301)
# VAGRANT-END: 1000 9a45b17b-a2b0-4d05-852d-8e5cf1709068


but you only need this root permission *at box creation* and not for
further start/stop.

What do you think? Any reason not to use the shared /vagrant directory
for your builds?

Hans-Christoph Steiner

Nov 10, 2021, 11:10:03 AM

Synced dirs are difficult to manage when the use case is security-sensitive.
In F-Droid production, they are only used during box creation; production
builds do not use them. Also, some Android app builds are literally bigger
than 20GB, and running the build in the synced folder would be painfully slow.

Emmanuel Kasper

Jan 20, 2022, 1:30:07 PM
I did some testing around
https://salsa.debian.org/cloud-team/debian-vagrant-images/-/tree/1TBv2
(not merged into master yet) and I am still reluctant to merge the branch.
I am OK with bumping the default disk size to something like 40GB, but not to 1TB.

The problem with a disk size of 1TB is this: when you do a lot of
write/erase cycles, the deletion of blocks is not propagated to the
backing qcow2 disk image, so even though the OS in the VM reports only
2GB of block usage, the disk image could grow to 1TB without the user
knowing it.
I could reproduce this behavior by running `fio` in the guest in a loop.
I find this behavior dangerous.

At this point I see three possibilities:
- you add to your pull request a change of the virtualized disk
controller from virtio-blk to virtio-scsi, and add the "unmap" option to
the default libvirt Vagrantfile, so that deletions of blocks in the guest
are propagated to host storage
- you're fine with a disk image size of 40, or let's say 80GB
- you use a shared folder for the builds (see the sketch after this
list). I just noticed vagrant-libvirt also has support for virtio-fs,
which according to its author has native host performance. If there are
security concerns, let's discuss that in detail and involve upstream if
needed. virtio-fs is mature enough that it's used in production by Kata
Containers in Kubernetes and OpenShift Sandboxed Containers in the Red
Hat Kubernetes offering.
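
For that third option, here is a minimal sketch of what the Vagrantfile
side could look like, assuming a vagrant-libvirt version with virtio-fs
support; the box name and paths are only placeholders, and virtio-fs
needs shared memory backing for the guest.

```
Vagrant.configure('2') do |config|
  config.vm.box = 'debian/bullseye64'   # placeholder box name
  config.vm.provider :libvirt do |lv|
    # virtio-fs requires shared memory backing between host and guest
    lv.memorybacking :access, :mode => 'shared'
  end
  # share the host build directory into the guest over virtio-fs
  config.vm.synced_folder './build', '/vagrant/build', type: 'virtiofs'
end
```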

Hans-Christoph Steiner

Jan 21, 2022, 6:20:05 AM

Emmanuel Kasper:
> I did some testing around
> https://salsa.debian.org/cloud-team/debian-vagrant-images/-/tree/1TBv2
> (not merged into master yet) and I am still reluctant to merge the branch.
> I am OK with bumping the default disk size to something like 40GB, but not to 1TB.
>
> The problem with a disk size of 1TB is this: when you do a lot of
> write/erase cycles, the deletion of blocks is not propagated to the
> backing qcow2 disk image, so even though the OS in the VM reports only
> 2GB of block usage, the disk image could grow to 1TB without the user
> knowing it.

Yeah, I also thought about that. Would it be possible to ship the images with a
disk/partition size of 1TB but keep the filesystem size at 20GB? It is easy to
expand the filesystem as needed; the hard part is expanding the disk/partition
size.
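
For example, just to sketch the idea: assuming an ext4 root filesystem on
/dev/vda1 that sits inside an already-large partition (the device name is
a placeholder), the end user could grow the filesystem from a provisioner.

```
Vagrant.configure('2') do |config|
  config.vm.box = 'debian/bullseye64'   # placeholder box name
  config.vm.provision 'shell', inline: <<-SHELL
    # grow the 20GB ext4 root filesystem to fill its (much larger) partition
    resize2fs /dev/vda1
  SHELL
end
```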

> I could reproduce this behavior by running `fio` in the guest in a loop.
> I find this behavior dangerous.
>
> At this point I see three possibilities:
> - you add to your pull request a change of the virtualized disk
> controller from virtio-blk to virtio-scsi, and add the "unmap" option to
> the default libvirt Vagrantfile, so that deletions of blocks in the guest
> are propagated to host storage

This sounds like the ideal solution. I have no idea how much work it would be.
Do you?

> - you're fine with a disk image size of 40, or let's say 80GB

Chromium builds can take more than 100GB, so either of those would mean we still
need to make our own basebox.

> - you use a shared folder for the builds. I just noticed vagrant-libvirt
> also has support for virtio-fs, which according to its author has native
> host performance. If there are security concerns, let's discuss that in
> detail and involve upstream if needed. virtio-fs is mature enough that
> it's used in production by Kata Containers in Kubernetes and OpenShift
> Sandboxed Containers in the Red Hat Kubernetes offering.

We like the security isolation of throwing away everything that the build
process has written to, so this option is less appealing, though perhaps
workable. If there were a host-controlled method of resetting the virtio-fs
share to a previous snapshot, then it could work.

.hc

Hans-Christoph Steiner

Feb 3, 2022, 5:10:04 PM
> - you add to your pull request a change of the virtualized disk
> controller from virtio-blk to virtio-scsi, and add the "unmap" option to
> the default libvirt Vagrantfile, so that deletions of blocks in the guest
> are propagated to host storage

I looked into this a bit more. These features weren't added to vagrant-libvirt
until 0.4.0, so no stable Debian package of vagrant-libvirt could support this
yet. Debian testing has 0.7.0, and a backport to stable would be possible. I
guess a lot of people just use `vagrant plugin install ...`. It might be
possible to make it work with older vagrant-libvirt versions by using a hack
like this:

https://github.com/vagrant-libvirt/vagrant-libvirt/pull/692#issuecomment-922329049


There is a little more info here:
https://github.com/vagrant-libvirt/vagrant-libvirt/issues/999#issuecomment-487728207


```
ENV['VAGRANT_EXPERIMENTAL'] = 'typed_triggers'

require 'open3'

Vagrant.configure('2') do |config|
  ...
  config.vm.provider :libvirt do |lv, config|
    lv.storage :file, :size => '3G', :device => 'sda', :bus => 'scsi',
      :discard => 'unmap', :cache => 'unsafe'
    config.trigger.before :'VagrantPlugins::ProviderLibvirt::Action::StartDomain',
                          type: :action do |trigger|
      trigger.ruby do |env, machine|
        stdout, stderr, status = Open3.capture3(
          'virt-xml', machine.id,
          '--edit', 'type=scsi',
          '--controller', 'model=virtio-scsi')
        if status.exitstatus != 0
          raise 'failed to run virt-xml to modify the controller model. ' \
                "status=#{status.exitstatus} stdout=#{stdout} stderr=#{stderr}"
        end
      end
    end
    ...
  end
end
```

Emmanuel Kasper

Feb 5, 2022, 8:10:07 AM
Hi Hans-Christoph

I had a look at this, and it seems that although we can now use the
`disk_driver` options to set `discard` when a SCSI controller is
chosen, it is still not possible to set the SCSI controller `model` to
virtio-scsi via a vagrant-libvirt option. Without a virtio-scsi
controller, I/O will be 10 times slower.
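
In other words, a sketch of the most we can express today with a recent
vagrant-libvirt (the box name is a placeholder):

```
Vagrant.configure('2') do |config|
  config.vm.box = 'debian/bullseye64'   # placeholder box name
  config.vm.provider :libvirt do |lv|
    lv.disk_bus = 'scsi'                # attach the box disk to a SCSI controller
    lv.disk_driver :discard => 'unmap'  # propagate guest block deletions to the image
    # but the controller *model* (virtio-scsi) cannot be selected here,
    # which is the missing piece described above
  end
end
```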

We could actually force the controller model and the discard option
manually in the Vagrantfile shipped by the box, using the hack/workaround
you pasted above. But the :action trigger used there is an experimental
feature, requiring VAGRANT_EXPERIMENTAL="typed_triggers" to be set.
So this is too much of a stretch here at the moment.

When time allows, I will try to use a box with virtio-fs and see how it
compares against NFS, as this seems to be the way forward (and maybe use
virtio-fs instead of NFS by default for libvirt too).

Emmanuel




--
You know an upstream is nice when they even accept m68k patches.
- John Paul Adrian Glaubitz, Debian OpenJDK maintainer

Hans-Christoph Steiner

Jul 13, 2022, 12:10:04 PM
Thanks for mapping it out. Do you have any contact with upstream? If so, could
you request the new release? I can update the package.

Emmanuel Kasper:
> Hi
> I took some time to revisit this bug in light of vagrant-libvirt developments.
> I see vagrant-libvirt upstream has merged virtio-scsi support, which we
> needed for discard
> (https://github.com/vagrant-libvirt/vagrant-libvirt/pull/692/commits)
> So the next steps should be:
> - upstream releases a new version of vagrant-libvirt with virtio-scsi support
> - Debian packages it
> - change the libvirt Vagrant box to use virtio-scsi. It should be enough to set
> the disk bus to scsi; vagrant-libvirt will then set the controller type to
> virtio-scsi by default
> - add the `discard` option so that deleted disk blocks in the VM are also
> deleted in the disk image file
> - finally, bump the disk image size in the build process
>