Minimal backups


Axon

Oct 21, 2013, 3:52:15 AM
to qubes...@googlegroups.com
Regarding the future backup system, I'm wondering how feasible it
would be to further minimize the size of the backup blob. If you're
uploading backups to the cloud and paying (most likely) for both
bandwidth and storage, it would be nice not to be forced to back up
unnecessary data.

Qubes' backup system is already pretty good about this, because in
normal AppVMs, only the home folder uses disk space. (TemplateVMs use
disk space for the whole file system.) However, it could be better. For
example, I have an AppVM which is around 80 MB on disk, but the files
that I actually created, put there, care about, and want to back up are
only around 200 KB. In a different, larger AppVM, the difference is
around 270 MB.

That may not be much for a single backup, but if someone has a large
number of AppVMs and backs them up regularly and often over a long
period of time, it could really add up. (An average total "discrepancy"
of 1 GB, with daily backups for three years, would amount to over a
terabyte of stuff you didn't really want to back up.)
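
The estimate is easy to check. A quick Python sketch, assuming every daily backup is kept for the whole period and counting 1 TB as 1000 GB (both are my assumptions for the sake of the arithmetic):

```python
# Rough check of the estimate above: unwanted data accumulated by daily backups.
# Assumption: every backup is retained, so the waste simply adds up.

def wasted_gb(per_backup_gb: float, backups_per_year: int, years: int) -> float:
    """Total unwanted data (in GB) stored across all retained backups."""
    return per_backup_gb * backups_per_year * years

total_gb = wasted_gb(per_backup_gb=1.0, backups_per_year=365, years=3)
print(f"{total_gb:.0f} GB wasted, i.e. more than 1 TB: {total_gb > 1000}")
```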

Perhaps the simplest remedy would be either a backup whitelist or just
an option to back up only non-hidden files/directories in the home
folder (if there are hidden files/directories you want to back up, you
could just make a non-hidden copy). I imagine either of these would be
on a per-VM basis (e.g., a check box in the Qubes VM Manager next to
"Include in backups by default").

Anyway, this is just an idea, not a feature request or anything like
that. I'm sure others have given this more thought than I have.
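
For what it's worth, the two options (a whitelist, or non-hidden entries only) could be sketched roughly like this in Python. This assumes the selection runs inside the VM itself; the function names and the whitelist format are made up for illustration and are not part of Qubes.

```python
from pathlib import Path

def visible_entries(home: Path) -> list[Path]:
    """Return the top-level entries of `home` whose names do not start with a dot."""
    return sorted(p for p in home.iterdir() if not p.name.startswith("."))

def whitelisted_entries(home: Path, whitelist: list[str]) -> list[Path]:
    """Return the whitelisted paths (relative to `home`) that actually exist."""
    return [home / name for name in whitelist if (home / name).exists()]

if __name__ == "__main__":
    # Hypothetical usage: list what would be backed up from the current home.
    for entry in visible_entries(Path.home()):
        print(entry)
```

Either function only decides what to include; producing and protecting the actual backup blob is a separate step.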


Zrubecz Laszlo

Oct 21, 2013, 4:21:44 AM
to qubes...@googlegroups.com
On 21 October 2013 09:52, Axon <ax...@openmailbox.org> wrote:
> Perhaps the simplest remedy would be either a backup whitelist or just
> an option to back up only non-hidden files/directories in the home
> folder (if there are hidden files/directories you want to back up, you
> could just make a non-hidden copy).

The current 'Qubes backup system' is designed to do a FULL VM backup,
meaning it backs up the whole private image of a given VM.

What you need is file-level backup, where you can pick files instead of
the whole image. There are a lot of tools for that already.



--
Zrubi

Axon

Oct 22, 2013, 3:32:35 AM
to qubes...@googlegroups.com, Zrubecz Laszlo
Yes, but how can a Qubes user do file-level backup/restore securely
(e.g., for the "vault" AppVM)?


Zrubecz Laszlo

Oct 22, 2013, 4:19:27 AM
to qubes...@googlegroups.com
On 22 October 2013 09:32, Axon <ax...@openmailbox.org> wrote:

> Yes, but how can a Qubes user do file-level backup/restore securely
> (e.g., for the "vault" AppVM)?

Well...

- fetch it to dom0
- or just attach a trusted external media and copy your backup. - just
like you do with your full backups in dom0


Of course we can make a more sophisticated backup mechanism (work in
progress, as I see), but I'm not sure it makes any sense.



--
Zrubi

Axon

Oct 22, 2013, 4:59:24 AM
to qubes...@googlegroups.com, Zrubecz Laszlo
On 10/22/13 01:19, Zrubecz Laszlo wrote:
> On 22 October 2013 09:32, Axon <ax...@openmailbox.org> wrote:
>
>> Yes, but how can a Qubes user do file-level backup/restore securely
>> (e.g., for the "vault" AppVM)?
>
> Well...
>
> - fetch it to dom0

Fetch it to dom0? I'm not sure what you mean. Surely you're not
suggesting copying the data from an AppVM to dom0?

> - or just attach a trusted external media and copy your backup. - just
> like you do with your full backups in dom0
>

The problem is that a medium remains "trusted" only if it's not used
with any lower-security-level AppVMs, which means we end up needing a
different trusted medium for each AppVM (or at least for each security
level/label color). This is quite inelegant.

By contrast, full backups in dom0 require only one trusted medium.

Zrubecz Laszlo

Oct 22, 2013, 6:40:55 AM
to qubes...@googlegroups.com
On 22 October 2013 10:59, Axon <ax...@openmailbox.org> wrote:

> Fetch it to dom0? I'm not sure what you mean. Surely you're not
> suggesting copying the data from an AppVM to dom0?

Well, it depends on its security level.
If it's a vault VM (without networking), then there is a good chance
that this is your most trusted VM.


But I'm using the other method right now:

>> - or just attach a trusted external media and copy your backup. - just
>> like you do with your full backups in dom0
>>
>
> The problem is that a medium remains "trusted" only if it's not used
> with any lower-security-level AppVMs, which means we end up needing a
> different trusted medium for each AppVM (or at least for each security
> level/label color). This is quite inelegant.

Well, I don't see this issue as that serious.


Let's see a possible solution:

- You make a file-level backup (signed, encrypted, etc.) inside your AppVM.

- You should copy it somewhere to keep it safe:
* to another AppVM (you can call it backupVM, or just use a disposable VM)
* or to dom0

- Then -> copy the whole thing to an attached external/remote medium.


What should you trust in this case?

* The source VM.
If it is compromised -> your file-level backup can be f*cked anyway.

* The 'another VM'
(No matter whether it's a backup VM for remote backup or a disposable VM
used to make a local copy to an external datastore.)

What if it is somehow compromised? -> No worry, because your backups are
encrypted, signed, etc., so you will see whether the backup has been
tampered with or not.

And if you are using a disposable VM + external media, then it
cannot be compromised unless your template is -> but in that case your
game is already over anyway.

* The external media:
Well.. you should only attach it to the backup VM which can be:
- a dedicated VM,
- or disposable VM,
- or dom0

Here you should not worry about the backups themselves, only about your
backup VM. And if the external media is not used anywhere else... then
why worry at all?

* The remote media
This is untrusted anyway, so you should only care about your backup VM's security.


* the copy mechanism between VMs:

- copy to another VM
- copy from a VM to dom0

Both are a basic part of Qubes, which must be trusted.
But if those are compromised somehow, then you still have the
encrypted, signed, etc. files, and they will tell you whether they have
been tampered with or not.
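
A minimal Python sketch of the tamper-detection part of this flow, to be run inside the source AppVM. It is only an illustration: an HMAC tag stands in for a real signature, encryption is left out entirely (that would be handled by a separate tool), and the function names and key handling are assumptions, not anything Qubes provides.

```python
import hashlib
import hmac
import io
import tarfile
from pathlib import Path

def make_archive(paths: list[Path]) -> bytes:
    """Pack the given files into an in-memory tar.gz archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in paths:
            tar.add(str(p), arcname=p.name)
    return buf.getvalue()

def make_tag(blob: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the archive; the key never leaves the source VM."""
    return hmac.new(key, blob, hashlib.sha256).hexdigest()

def verify(blob: bytes, key: bytes, expected_tag: str) -> bool:
    """Return True only if the archive still matches the tag made at backup time."""
    return hmac.compare_digest(make_tag(blob, key), expected_tag)
```

At restore time the source VM would call verify() on whatever comes back from the backup VM or the external media, and refuse to unpack the archive if it returns False.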



Please correct me if I went wrong somewhere.





--
Zrubi

Joanna Rutkowska

Oct 22, 2013, 7:22:24 AM
to qubes...@googlegroups.com, Axon
The fundamental problem with backing up AppVMs on anything more
fine-grained than the volume level is that we would need to mount the AppVM's
volumes (e.g. the home dir volume, which is kept in the private.img file in
the AppVM directory). But mounting a volume of an untrusted AppVM is a
very risky operation and we really don't want Dom0 code to do that. Just
think how much the potentially compromised AppVM could mess with its
filesystem or partition table, which might then exploit a hypothetical
flaw in the Dom0 kernel filesystem modules, or partition table parsing
code, or somewhere else.

So, no mounting of AppVMs volumes in Dom0! That's the reason, BTW, why
we don't support incremental backups in Qubes.
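
To illustrate the distinction, here is a Python sketch (not the actual Qubes backup code) of what treating a volume as opaque means: the image is read and copied as a plain stream of bytes, and nothing ever interprets its partition table or filesystem. The function name and chunk size are arbitrary choices for the example.

```python
import hashlib
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time; the value is arbitrary

def copy_opaque(src: BinaryIO, dst: BinaryIO) -> str:
    """Copy raw image bytes from src to dst without parsing them.

    Returns the SHA-256 hex digest of the copied data so the copy can be
    checked later. The bytes are never mounted or interpreted.
    """
    digest = hashlib.sha256()
    while True:
        chunk = src.read(CHUNK_SIZE)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()
```

The price of this approach is exactly the one discussed in this thread: since the contents are never looked at, the backup cannot tell which files changed.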

I personally think that splitting home directories into "more useful"
and "less useful" parts for backup is just not worth it. However, one
improvement we could make is to give an option to back up only the
private.img of StandaloneVMs (so ignoring the root.img, which is
normally backed up for StandaloneVMs).

Similarly, we could (re)introduce private.img for our HVMs, allowing
users to install e.g. Windows and use private.img as a disk (D:) for
keeping the User directory. Not sure how easy it is, however, to choose
a non-standard location for the Users directory during installation of Win7
or Win8.

joanna.


Steve Coleman

Oct 22, 2013, 6:52:54 PM
to qubes...@googlegroups.com
On 10/22/13 07:22, Joanna Rutkowska wrote:

> So, no mounting of AppVMs volumes in Dom0! That's the reason, BTW, why
> we don't support incremental backups in Qubes.

I personally would _never_ advocate involving Dom0 mounting an unknown
data filesystem in any backup scheme. But recently I have been wondering
myself if there might be a compromise with regards to AppVM access to
offline storage in general, and in particular for the possibility of
running local incrementals.

I personally had to implement a corporate IT network backup of one AppVM
using proprietary network backup software just to be compliant with IT
policy. Shutting down that one VM, or even multiple VMs, just to back
them up has been somewhat problematic to schedule. The best I have
been able to do is perhaps twice a week. It appears it would be quite
hard to automate a Qubes full backup since it naturally affects almost
everything running on that physical hardware if you want to back up
everything. My 'compliant' incrementals, by contrast, kick off every 15
minutes, but for now they only run in my most important work VM.

Having a way to automate an online scheduled incremental for individual
AppVM's independently as a part of a Qubes infrastructure would be
extremely desirable, and as such there would naturally be tradeoffs in
any design given Qubes' overall security goals. I have therefore been
thinking about whether there is a way of leveraging existing security-aware
software with the Qubes internal network topology with DispVM's, or
possibly even facilities within Qubes itself, to support some form of
incremental backup architecture that might make some sense security
wise. Ideally one that would not introduce too much of an attack
surface. So, let me just toss a few ideas out there to see what others
might think.

What I have been thinking is one of the following:

1) A DispVM based backup service layered as an encrypted network
filesystem (yes, I can sense you all cringing at the very thought of
this already, but give me a chance):

- A specialized disposable backup VM, with an attached physical drive or
RAID volume, which runs only a single sshd-like server. Nothing else
runs here. This server hands a client-unique, server-side encfs-encrypted
directory as a chrooted volume to each connecting AppVM client. The
client VM's connection has no access to anything other than their own
server side encrypted space. No other software runs in this disposable
storage VM, and no interpretation of any data structure or content
occurs here, thus little possibility of data corruption attacks due to
the layering of encryption on both client and server side. The client
session has no shell, and so there is no need of any /usr,/etc/ or
anything else mapped into its virtually allocated system file space
other than what is directly needed for ssh tunneling support.

- Each client VM has a different ssh key/id to identify their connection
to this server, so the authenticity of both the server and specific
client is fairly well vetted. The Qubes firewall may provide an extra
level of protections directing who can talk to what VM and when.

- On the client VM side the connection is fuse session mounted through
sshfs and thus only active during the actual backup/storage session. Any
data written to the local fuse volume should be encrypted first before
being passed on to the backup DispVM session, so nobody on the server
end can read or tamper with the client data.

No client VM can play with any other client's data due to the individual
chroot environments and multiple layers of encryption. There is little
possibility that a circumvented storage VM will permit reading or
tampering of the data because data is double encrypted using different
keys found on each end. Malware might somehow obtain the server's set of
encryption keys but will still be unable to decrypt the client's backup
data. Since the server is a disposable VM there should be no persistent
malware present on the backup-instance-per-client-connection VM which
might be capable of messing with a subsequent restore session, and the
backup data file could be checksummed and verified cryptographically by the
client before its use, so even if it had been tampered with, that could
be detected. For any injected malware that gets a foothold on the DispVM
backup server instance to inject something into the client-side
encrypted file data, it would first need the client's private key, and if
it had that you likely have bigger problems already.

The weakest link might be the ssh protocol itself and its interaction
with the fuse-mount mechanism on the client side, before the client side
decryption. If the backup server instance were circumvented while in
operation it could certainly serve bad ssh data back to the client side
in an attempt to corrupt the client's IP stack, ssh client software, fuse
mount, or decryption module. Still significant, but not exactly easy
considering they have to first take control of a stripped-down single
use DispVM running only one service, and protected by the Qubes
firewall, in order to pull it off.

The benefit of the above is that each client VM would potentially have
access to volumes of much needed offline storage, even beyond simple
backups if desired, which should be reasonably safe from tampering or
data exfiltration by external threats. But this reasonableness is
obviously not quite up to the hardware separation requirements that the
Qubes system as a whole strives for. For some people this trade off
might be suitable, for others maybe not.

2) Or one could also try layering a simple backup capability built on
top of the Qubes domain file qvm-copy-to-vm mechanism, where the client
VM may need to allocate additional space enough to cache an encrypted
backup incremental file of itself, locally, before it is moved out to an
archival VM's physical disk space. The backup DispVM could receive and
keep different client data files separated into differently encrypted
directories based on the source of the client data. This might then be
good enough to prevent clients from corrupting each other's backup data
as they have no opportunity to overwrite other VMs' data.

In this scenario, because qvm-copy-to-vm is generally a push-only
mechanism, it would be up to the client to maintain a catalog of backed
up data, and then would need some way to signal for the deletion of
older datasets when that backup set is no longer needed, as the backup
DispVM would have no clue what is actually in any given backup file so
it could not just do any kind of log-rotate type deletion other than
what can be known by general filename conventions. Since this is simply
a one way copy mechanism, restores could only be performed through
manual interaction through Dom0, only serving as the controlling command
center, to recopy the archived files back to an individual AppVM for a
proper data restore there. In that case Dom0 also needs access to the
backup server's crypto keys to extract the proper backup data from any
given per-client encrypted directory/volume.
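
A client-side catalog of the kind described above could be as simple as a mapping from file path to content hash, kept by the AppVM itself. The following Python sketch is only an illustration under that assumption: the function names and the JSON format are hypothetical, it ignores deleted files, and it leaves out the actual push via qvm-copy-to-vm.

```python
import hashlib
import json
from pathlib import Path

def snapshot(root: Path) -> dict[str, str]:
    """Map each file under `root` (by relative path) to the SHA-256 of its contents."""
    result = {}
    for p in sorted(root.rglob("*")):
        if p.is_file():
            result[p.relative_to(root).as_posix()] = hashlib.sha256(p.read_bytes()).hexdigest()
    return result

def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return files that are new or whose contents differ from the last catalog."""
    return sorted(name for name, digest in new.items() if old.get(name) != digest)

def save_catalog(catalog: dict[str, str], path: Path) -> None:
    """Store the catalog locally; it should live outside the backed-up directory."""
    path.write_text(json.dumps(catalog, indent=2))

def load_catalog(path: Path) -> dict[str, str]:
    """Load the previous catalog, or an empty one before the first backup."""
    return json.loads(path.read_text()) if path.exists() else {}
```

Each incremental would then contain only the files returned by changed_files(), and the catalog is saved again once the push has succeeded.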

The 'double the space' requirement for local backup file caching might
be eliminated if there were such a thing as a disposable virtual USB
drive that could be used to cache the encrypted backup file until it can
be moved over to the backup VM. Once the virtual USB physical space is
unmounted its space could be scrubbed for reuse elsewhere. This might
actually be handy for any VM where temporary disk space is required,
such as during VM dependent software installs or creation of temporary
ISO images.

> I personally think that splitting home directories into "more useful"
> and "less useful" parts for backup is just not worth it. However, one
> improvement we could make is to give an option to back up only the
> private.img of StandaloneVMs (so ignoring the root.img, which is
> normally backed up for StandaloneVMs).

It's very likely that any incremental backup will only be defined to
include user created files that are deemed important enough. Keeping
multiple copies of important documents as you are working on them can be
a life saver if you need to roll back for any reason. If a person were
to accidentally configure an incremental set of mostly static files they
would only be backed up once, as they won't change except when the
template underneath changes. It would be a waste of backup disk space
for sure, but at least you would still have all your email you otherwise
would have lost had that last incremental not fired off just an hour
before your VM's drive failed. These days losing a single unread email
could potentially lose you a job if your particular industry is in
lay-off or downsizing mode. No, things are not like that for me, but
eventually Qubes will be moving into industries where this can and will
happen. Life is full of tradeoffs, and not all of them are always
technical ones.

I believe in the Qubes design goals and I often push others to take a
long and serious look at it for what important security features it can
provide. But at the same time I need to take simple precautions with
backups because I have simply lost way more than my share of disk drives
in the last couple of months. And between disk failures I am still
trying to work out the complexities of getting my actual work done given
this new structure-enforcing paradigm. And I'm definitely sticking with
it. It's worth it. ;)

> joanna.

Thank you for all your hard work and the whole team's dedication. I will
be interested in any comments to the above, as I am sure there are built
in features of Qubes I have yet to learn about.

Steve.



Igor Bukanov

Oct 23, 2013, 4:18:00 AM
to qubes...@googlegroups.com
> In this scenario, because qvm-copy-to-vm is generally a push-only
> mechanism, it would be up to the client to maintain a catalog of backed
> up data,


http://duplicity.nongnu.org/ stores encrypted backups remotely,
assuming very dumb remote filesystems. It uses a local filesystem cache
to store metadata about stored backups so it can generate incremental
backups without accessing the remote part. During the backup it
assumes that the data can be streamed to the remote system, so there is
no need to store a copy of the data locally. I suppose it could be
adapted for push-only operations.

Axon

Oct 22, 2013, 8:29:10 PM
to Joanna Rutkowska, qubes...@googlegroups.com
Ah, I hadn't thought about that, but that makes sense.

> I personally think that splitting home directories into "more useful"
> and "less useful" parts for backup is just not worth it. However, one
> improvement we could make is to give an option to back up only the
> private.img of StandaloneVMs (so ignoring the root.img, which is
> normally backed up for StandaloneVMs).
>

Yes, in light of what you said above, it no longer seems worthwhile to
me to try to do what I was suggesting for non-StandaloneVMs.

cprise

Oct 23, 2013, 9:45:30 PM
to qubes...@googlegroups.com, ax...@openmailbox.org
Why not have each VM back itself up, using something that can do increments like 'rsnapshot'? It's available in the Fedora repo.

If Qubes used disk image bundles that were similar to OS X sparsebundles, then you could do minimal incremental backups on the disk images without the risk of having to mount them.

Joanna Rutkowska

Oct 24, 2013, 4:57:27 AM
to qubes...@googlegroups.com, cprise, ax...@openmailbox.org
Sure, that's possible. Then, whatever blob each VM creates, this blob is
then exposed to Dom0 via a qrexec service. However, one practical problem
with this approach is that it won't automatically work for all HVMs
(because they might not even run Qubes tools inside), so it would be
best for Linux AppVMs, and potentially also for Windows AppVMs (=HVMs
with Qubes Tools installed).

Surely doable, yes.

joanna.


Steve Coleman

Oct 24, 2013, 11:25:30 AM
to qubes...@googlegroups.com
On 10/23/13 21:45, cprise wrote:
> Why not have each VM back itself up, using something that can do
> increments like 'rsnapshot'? It's available in the Fedora repo.

That is exactly what I was attempting in my prior discussion, but while
trying to make use of security features provided by Qubes already.

> If Qubes used disk image bundles that were similar to OS X
> sparsebundles, then you could do minimal incremental backups on the disk
> images without the risk of having to mount them.

You still have the problem of where the data gets stored irrespective of
the file format you are storing. It makes no sense to store it within
that AppVM's own space. Unless you have a dedicated physical block
device to permanently attach to each individual AppVM, you are
subjecting each AppVM to corruption and possible malware threats from
other rogue or subverted AppVMs. It only takes one bad apple to spoil
the whole bunch so to speak.

My attempt earlier in this thread was to create dedicated instances of a
secure DispVM server which then maintains the control of a file space,
but then partitioned that space in a way that no AppVM can corrupt or
tamper with any other AppVM's data. Neither side need trust the other
except for the communications channel between them, which would thus
need to be security audited.

It seems to me that all that is really needed is a 'virtual encrypted
block device' that can be mounted by its client as required. If an AppVM
could simply mount the equivalent of a chrooted directory of a physical
disk, as a predefined virtual block device, and have its contents
physically stored as an encrypted filesystem pre-keyed for that
particular AppVM, then clients would not need to do anything special in
order to support almost any offline storage needs, including
incrementals. Even HVMs should be able to make use of it so long as
it looks like a regular block device.



Steve

cprise

Oct 25, 2013, 1:56:39 PM
to qubes...@googlegroups.com, cprise, ax...@openmailbox.org

Would an HVM even know (or care) about the disk container format in which it resides? I was thinking if sparsebundles were implemented by the hypervisor, then neither the image format nor backup preparations would be of any worry to the VMs. The backup program running in the hypervisor could do increments by comparing the modification dates on each file within the bundles' folders.
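
To make the idea concrete, here is a small Python sketch of a mtime-based incremental over a bundle of band files. Everything in it is an assumption for illustration: a flat directory of band files, a fixed 8 MiB band size, and the function names. It is not how OS X or any existing driver actually does it.

```python
import os
from pathlib import Path

BAND_SIZE = 8 * 1024 * 1024  # assumed fixed band size (8 MiB) for this sketch

def band_for_offset(offset: int, band_size: int = BAND_SIZE) -> int:
    """Index of the band file that holds the given byte offset of the disk image."""
    return offset // band_size

def bands_changed_since(bands_dir: Path, last_backup: float) -> list[Path]:
    """Return band files modified after `last_backup` (seconds since the epoch)."""
    return sorted(
        p for p in bands_dir.iterdir()
        if p.is_file() and os.stat(p).st_mtime > last_backup
    )
```

An incremental backup would copy only the files returned by bands_changed_since(), as opaque blobs, so the image never has to be mounted.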

Maybe there is a Linux block device driver already that creates disk images as bundles of files.

cprise

Oct 30, 2013, 5:34:08 PM
to qubes...@googlegroups.com
(...sorry, where I said "hypervisor" read "Dom0" instead.)

Having searched a bit for a block device handler similar to sparsebundles on OS X, I got a couple of what are probably poor substitutes (fuse-zip with multi-part archives, or a JBOD using many small image files) before stumbling upon 'sparsebundlefs' at github:
https://github.com/torarnv/sparsebundlefs

It's currently read-only, but would make an interesting starting point.

Some thoughts on ticket #703, which has some bearing on this issue (or at least backups in general): I don't think there is any risk associated with Dom0 backing up to a local volume that it both creates and encrypts. It can attempt to unlock a partition only if it sees a particular USB identifier + partition scheme, as Dom0 is already reading this data whenever a storage device is connected anyway. So the question of using another domain to handle backups (which I find convoluted and error-prone) comes down to just how essential a requirement it is to be able to back up over a network in the near term. For now, I'd rather see Dom0 handle the backups locally using only partitions encrypted by the Qubes backup system. Dom0 could also do incremental backups safely if we are able to implement sparsebundle disk images instead of the current disk image format used with VMs.


cprise

Oct 30, 2013, 6:09:21 PM
to qubes...@googlegroups.com

On 10/24/13 11:25, Steve Coleman wrote:
> On 10/23/13 21:45, cprise wrote:
>> Why not have each VM back itself up, using something that can do
>> increments like 'rsnapshot'? It's available in the Fedora repo.
> That is exactly what I was attempting in my prior discussion, but while
> trying to make use of security features provided by Qubes already.
>
>> If Qubes used disk image bundles that were similar to OS X
>> sparsebundles, then you could do minimal incremental backups on the disk
>> images without the risk of having to mount them.
> You still have the problem of where the data gets stored irrespective of
> the file format you are storing. It makes no sense to store it within
> that AppVM's own space. Unless you have a dedicated physical block
> device to permanently attach to each individual AppVM, you are
> subjecting each AppVM to corruption and possible malware threats from
> other rogue or subverted AppVMs. It only takes one bad apple to spoil
> the whole bunch so to speak.

Sorry I did not see your response earlier. FWIW, my first paragraph was
actually an entirely different suggestion... an off-hand "keep it
simple" architecture solution that leaves it up to users to install and
set up backup software for each VM, and attach backup disks to VMs as
needed. I.e., the "do nothing" to Qubes approach.

The second paragraph is suggesting something entirely different. The
message I posted to this thread a little while ago explains my second
idea more clearly. In short, have a sparsebundle block device driver
upon which all VM root and home volumes are based. Then you can have
Dom0 safely do incremental backups the same way Apple does it with
encrypted FileVault home directories.

My view is that it's enough to simply have a backup drive that is
encrypted by the Dom0 backup process. Accessing this volume from Dom0 at
any point in the future would be safe and secure.

> My attempt earlier in this thread was to create dedicated instances of a
> secure DispVM server which then maintains the control of a file space,
> but then partitioned that space in a way that no AppVM can corrupt or
> tamper with any other AppVM's data. Neither side need trust the other
> except for the communications channel between them, which would thus
> need to be security audited.
>
> It seems to me that all that is really needed is a 'virtual encrypted
> block device' that can be mounted by its client as required. If an AppVM
> could simply mount the equivalent of a chrooted directory of a physical
> disk, as a predefined virtual block device, and have its contents
> physically stored as an encrypted filesystem pre-keyed for that
> particular AppVM, then clients would not need to do anything special in
> order to support almost any offline storage needs, including
> incrementals. Even HVMs should be able to make use of it so long as
> it looks like a regular block device.
>
>
>
> Steve
My initial thought is that we don't yet have Qubes VMs which operate
very reliably; chaining them together to execute backups sounds problematic.

