Qubes vm.swappiness=0

Patrick Schleizer

Nov 28, 2016, 5:44:17 PM
to qubes...@googlegroups.com, Patrick Schleizer
Would setting

/etc/sysctl.d/swaplow.conf
vm.swappiness=0

in Qubes by default make sense?
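
For reference, a minimal sketch of applying and checking the value at
runtime (the sysctl.d file above makes it persistent across reboots; the
helper below is only an illustration, not part of any Qubes tooling):

def set_swappiness(value):
    # Equivalent to "sysctl vm.swappiness=<value>"; needs root.
    with open("/proc/sys/vm/swappiness", "w") as f:
        f.write(str(value))
    with open("/proc/sys/vm/swappiness") as f:
        return int(f.read())

print("vm.swappiness is now", set_swappiness(0))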

If not effective at all, why is it not required?

Cheers,
Patrick

drew....@gmail.com

Nov 28, 2016, 7:01:28 PM
to qubes-devel, adre...@riseup.net, patrick-ma...@whonix.org
The question you have to ask yourself: do you have enough RAM to not need swap?

Personally, I have no swap space.
 

Patrick Schleizer

Nov 28, 2016, 7:38:33 PM
to drew....@gmail.com, qubes-devel, Patrick Schleizer, patrick-ma...@whonix.org
drew....@gmail.com:
I should have mentioned before: It doesn't deactivate swap. It just lets
Linux swap as little as possible. Still swaps if really required.

So I was wondering if that makes a good default setting for Qubes.

Personally, I have enough RAM and I am using this. I even use it on my
lower-RAM systems.

Chris Laprise

Nov 29, 2016, 5:52:27 AM
to Patrick Schleizer, qubes...@googlegroups.com, Patrick Schleizer
I think it would depend on the processing profile of your software. If
the system loads a lot of seldom-executed code and data, then default or
higher swappiness may be a benefit. But a leaner system would call for
lower values.

Of course, there is a difference between dom0 and domU swapping... the
latter is much slower.

What is a mystery to me is how the interplay between buffers and swap is
accounted for in Qubes memory management. When memory is full (according
to Xen) the way demand/supply plays out appears suboptimal.

Chris

Patrick Schleizer

Nov 30, 2016, 8:42:10 AM
to Chris Laprise, qubes...@googlegroups.com, Patrick Schleizer
Chris Laprise:
I also forgot to add that I had this in mind for domU only, since memory
assignment is dynamic anyhow.

Vít Šesták

Dec 5, 2016, 2:01:27 AM
to qubes-devel
My experience with swappiness (on non-Qubes): when having high swappiness, it uses swap more than actually needed, slightly slowing the system down. OTOH, when having swappiness=0, it runs smoothly until the RAM limit is reached. Then it suddenly freezes for a few minutes.

Qubes doesn't look like it was designed to use swap well. Unless you have enough RAM to run almost without swap, you are going to have various issues, including the inability to start a new VM. So it makes sense to have enough RAM, with swap used as rarely as possible (i.e. swappiness=0). I don't recommend having no swap at all, because this can lead to frequent OOMs, especially of X11. I remember having some X11 crashes when running without swap, and they were far from rare.

Things might have changed, but as far as I remember, Qubes memory balancing works like this:

memory_required = ram_used + swap_used
k = 1.3
memory_assigned = k * memory_required

This is one of the reasons why Qubes is not good at excessive swapping* and why swappiness > 0 in domU does not make much sense: when you move something to swap, Qubes still tries to assign RAM for it. AFAIR, the argument for this behavior was that qmemman has some latency, so when the system needs more RAM, it might be too late to request it.
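
To make that concrete, a rough sketch of the rule above (illustrative numbers; this mirrors the formula, not the actual qmemman code):

CACHE_MARGIN_FACTOR = 1.3  # the "k" above (cache-margin-factor in qmemman.conf)

def memory_assigned(ram_used_mb, swap_used_mb, k=CACHE_MARGIN_FACTOR):
    # Swapped-out pages still count as "required" memory, so swapping
    # inside a VM frees nothing for other VMs.
    return k * (ram_used_mb + swap_used_mb)

print(memory_assigned(1500, 0))    # 1950.0 MB assigned
print(memory_assigned(1000, 500))  # still 1950.0 MB after swapping 500 MB out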

On one hand, I believe this system could be improved to require less RAM. On the other hand, even a simple modification of the formula would require non-trivial testing under various loads. If someone wants to offer a modification and research it, I can elaborate on my ideas (handling swap separately and adding some caps), but I am not going to spend all the time required for testing and fine-tuning the constants.

*) Well, no system works well under excessive swapping, but some systems are less bad at it.

johny...@sigaint.org

Dec 9, 2016, 12:41:00 PM
to qubes-devel
> On Tuesday, 29 November 2016 09:44:17 UTC+11, Patrick Schleizer wrote:
>>
>> Would setting
>>
>> /etc/sysctl.d/swaplow.conf
>> vm.swappiness=0
>>
>> in Qubes by default make sense?
>>
>> If not effective at all, why is it not required?

Why do you think it is not effective? I've played around with swappiness a
lot. It has the intended effect when I adjust it, both in the VM's and
in dom0. (I wrote a handy little dom0 memory monitor utility that shows
more details on all VM's, which I'll hopefully post before long.)

> Question you have to ask yourself. : Do you have enough RAM to not need
> swap?
>
> Personally, I have no swap space.

I generally agree that is smart, especially in the VM's. Although it's
nice to have swap in case one process gets memory-hungry; it's nice not
to have things randomly killed.

vm.swappiness *does* have an effect. With swappiness=0, swapping will only
happen when it really needs to: when cache/buffers have been reclaimed
and used, and there's just no more RAM to allocate.

With higher swappiness values, infrequently-used data will be swapped out
to disk, even when it isn't necessary. The higher the number, the more
aggressively this happens. With a high number, only pages in active use
stay in memory. Sounds good, but you shouldn't have a lot of inactive
stuff running in your VM's anyway.

With higher swappiness values, since the swapping isn't really necessary,
the result will be extra free ram in a VM, which Linux immediately starts
using for cache/buffers.

On bare hardware, this makes sense. Unused memory is simply wasted, might
as well cache some data and code there until we need it. It can't hurt.

*However*, inside a VM, it's stupid and wasteful to swap stuff out so you
can have more buffers/cache.

In fact, it's stupid and wasteful to even have buffers/cache inside a VM
at all. Any cached data will also be cached in dom0, doing the .img
reads.

And since the template's root.img file (for example) is shared between
multiple domU's, having it cached in dom0 gives you a noticeable
performance boost.

Having a virtual disk's contents *also* cached inside the VM is
redundant, wastes memory and CPU, and makes the whole memory management
thing more awkward. Having a block from root.img cached in dom0, as well
as in every domU that uses that template, is nuts.

Qubes' memory manager deals with things as follows: for each VM, it
allocates what the VM indicates it is using, plus a "fudge factor" of 1.3x
(also known as the cache-margin-factor in /etc/qubes/qmemman.conf, but I
think it's hard-coded elsewhere, ugh).

There's also a fudgey extra amount reserved for dom0, configurable in the
Qubes manager. Adding more memory to this is a roundabout way of
cranking up dom0's cache, if you wish.

The memory manager also takes any leftover memory on top of the VM's
usage (+ fudge factor) and divvies that memory up amongst all the VM's.
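
Roughly, that redistribution looks something like this (a sketch only,
assuming a proportional split; not the actual qmemman code, and the
numbers are made up):

def redistribute(used_mb, host_free_mb, k=1.3):
    # Each VM first gets its reported usage plus the fudge factor...
    base = {vm: k * u for vm, u in used_mb.items()}
    # ...then whatever is left over on the host is divvied up among the VMs.
    total = sum(base.values())
    leftover = host_free_mb - total
    return {vm: b + leftover * b / total for vm, b in base.items()}

print(redistribute({"work": 2000, "sys-net": 300}, host_free_mb=4000))
# {'work': ~3478 MB, 'sys-net': ~522 MB}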

That's not exactly optimal. Giving that extra memory to VM's that don't
really need it doesn't hurt anything memory-allocation-wise; if another VM
suddenly needs memory, VM's will give up the part they don't need. It does
result in a bit of extra, unnecessary memory shuffling and CPU usage
during memory reallocations. I do notice the odd extra pause as memory
gets shuffled.

However, when qmemman bestows that extra memory on a Linux VM, Linux in
the VM will start using all of it for buffers/cache, which is redundant.
The memory would be better used in dom0 caching, or for another VM doing
something useful.

I've cranked my cache-margin-factor down to 1.1, and it helps performance
for me.

The cache-margin-factor also acts as a bit of a pre-allocated margin for
memory growth, without having to request a qubes memory manager reshuffle.
(I'm not super-thrilled with the somewhat implicit allocations of memory
to different key system purposes. It's like controlling the system via a
bunch of loose rubber bands. :) )

On a Virtual machine system, swapping makes very little sense. It's a
major performance killer, especially inside a VM.

The main reason I keep it around is that if something does suddenly
take up memory, I'd rather critical processes not be killed. But if swap is
ever in use, you need to look into it. (I have a dom0 utility which
reports on VM's swap space among other stuff. Will post it at some
point.)

There's an argument that swapping lets unused stuff migrate to disk and
not take memory (like X server data in sys-net, or whatever).

But unless you're intentionally running very bloated AppVM's that start a
bunch of unnecessary stuff (not a great idea), swapping out unused stuff
buys you very little.

I've turned to using small fixed-size VM's for most service VM's (with my
own no-gui service flag, to avoid loading qubes-gui-agent and the X
server). Same for pulseaudio, and other services that really don't need
to be everywhere.

That keeps the service VM's from grabbing/releasing memory which only gets
used as cache anyway. Redundant, as mentioned, due to dom0's caching,
but also mostly unused since sys-net and friends don't do a lot of disk
activity.

I tried cranking up the swappiness in dom0, and it had the result of
swapping out unused stuff in dom0, like unused parts of the X server, idle
Guid and libvirt daemons, etc.

That did result in more memory for active dom0 caching and for VM usage,
which was a performance boost in general. It even let me start more VM's
than I used to be able to. (Although changing the cache-margin-factor
from 1.3 to 1.1 was far more helpful for creating more VM's.)

Of course, when you do access that seldom-used VM's GUI or the little-used
utilities/GUI parts in dom0, you'll notice the delay as things get swapped
back in.

So high swappiness gives you faster performance/more memory for routine
stuff, with a penalty when you do the non-routine stuff. Makes sense.

I also tried running with 0 swappiness in dom0. Ran out of memory more
quickly, and less memory was available for cache/buffers, but other than
that, it was fairly unexciting.

Having swappiness at anything other than 0 for a VM (set in the template)
doesn't make a lot of sense. If you have large but unused programs/data
in your VM's, you're doing it wrong.

I've done some performance testing of buffered vs. non-buffered disk
access in a VM (hdparm), and there is no performance gain from the
buffers/cache inside the VM. In fact, some tests showed the cached
situation slower than non-cached, sometimes significantly so. Let dom0 do
the caching.

I think there are Xen features (such as Intellicache?) for doing domU's
caching in a shared way in dom0. That seems like a win-win. Has Qubes
considered looking at this feature?

In the short term, I'm going to try to find a way to disable cache/buffers
fully in a VM, rather than doing so indirectly by limiting its memory
allocation (cache-margin-factor-like).

Mounting with the "sync" option supposedly does this, but the standard
Qubes template as a device-mapper snapshot in there for root, so I need to
dig around a bit to find the right way to do it. The sync option is only
for certain file system types, too (not ext4 nor btrfs?).

I thought I had read somewhere that the virtual block drivers either could
disable caching, or did it by default, but I'm not sure if that's
actually true. Certainly doesn't seem to be the default. Anyone?

I'm pretty sure LVM lets you control cache-ability, and is the typical
mechanism that Xen installations use to limit VM caching. I think this
is the "proper" way to do it.

With Qubes 4.0 requiring thin LVMs (I think I read), this might be a good
step towards controlling VM cache-ability.


Unfortunately, I expect I will part ways with Qubes by then, hopefully
contributing a few things back soon. Qubes is getting more complex and
its hardware requirements are creeping up more quickly than I'd like.
Complexity is no friend of security.

Layers of libvirt and salt and whatever are more than is required for the
micro-kernelish features of Qubes dom0, and simply add vastly more attack
surface to the system. libvirt/salt/etc. might give additional
portability to different virtualization systems, but there's too much of a
cost in terms of size, complexity, attack surface, performance, and
reliability. I have to do a "systemctl restart libvirtd" about once a
day. Feh.

Qubes has already shown that it will trade off portability (less hardware
support) in achieving additional security. Increasing complexity and more
layers and more components works against that approach, imho.

I've spent a lot of time learning about Qubes, Xen, and brushing up on
Linux, but a lot of the time I still wasn't quite sure what was going
on under Qubes' hood. (Hence writing a few utilities, like the dom*
memory monitor, to get better insight.)

Xen's complexity itself (1/2 the LOC of Linux) doesn't help.

My dom0 currently has 283 processes running; in my ideal setup, dom0
should really have about *one* process running, a Qubes API manager that
provides all of its services in a central place, all via VCHAN.
Everything else, including providing most of those actual services,
should be done by a VM.

Qubes is a MicroKernel-wannabe, let's make it explicit. :)

I have a bit of an essay on "what Qubes is" that I'll post a link to when
I finish it; it kind of grew out of hand. Basically, I see Qubes as
taking Linux's unmatched hardware support, and Linux/Windows unmatched
application support, and making those features run as if on (or part of) a
microkernel desktop system. And presenting it in a nice GUI.

Device drivers, applications, entire operating systems are effectively
reduced to a microkernel-like "user level privilege" through
virtualization, with the Microkernel (Qubes dom0) providing basic
launching, memory, etc., via a simple and robust IPC (message passing)
mechanism, VCHAN. (Which it turns out the Qubes folks invented; who knew?)

It's a round-about way to trick Linux (and Windows) apps and drivers into
running on a Microkernel(-like) architecture, with the associated security
benefits. There are better ultimate ways to achieve this, such as a real
microkernel, but Qubes is a top-notch transitional tool in the meantime.

(L4 instead of Xen would be a huge step in the right direction. Far
simpler, far faster; a real modern Microkernel. Moving Qubes to L4 would
be no small task, although the L4Linux folks have Linux booting under L4
with Networking/graphics. And it boots *fast*. But my digression has
spawned its own digressions.)

I was disappointed that 4.0 likely wouldn't work on any hardware I have;
but it turns out, I didn't have to wait for 4.0 to experience that. A
recent Qubes update is resulting in Xen/Vmlinux crashes. I'm migrating
towards my own setup that borrows from so many great ideas of Qubes, but
is far leaner and less complex. (I will share my experiences with the
list.)

-d

Chris Laprise

Dec 9, 2016, 1:52:39 PM
to johny...@sigaint.org, qubes-devel


On 12/09/2016 12:40 PM, johny...@sigaint.org wrote:
> *However*, inside a VM, it's stupid and wasteful to swap stuff out so you
> can have more buffers/cache.
>
> In fact, it's stupid and wasteful to even have buffers/cache inside a VM
> at all. Any cached data will also be cached in dom0, doing the .img
> reads.
>
> And since the templates root.img file (for example) is shared between
> multiple domU's, having that cached in dom0 gives you a noticable
> performance boost.
>
> Having a virtual disks's contents *also* cached inside the VM, is
> redundant, wastes memory and CPU, and makes the whole memory management
> thing more awkward. Having a block from root.img cached in dom0, as well
> as every domU that uses that template, is nuts.

I have long wondered about this, but assumed that the Xen project would
have worked out the right balance long ago.

However, there is a trade-off. Relying more on dom0 for caching makes
the caching slower as it's now copied between VMs. Conversely, swapping
in domUs is _extra_ expensive.

Adding to the complexity of the situation is that root.img and
private.img will respond very differently to domU-vs-dom0 caching, where
private.img access benefits more from domU cache.

I'm glad all this is being discussed.

But I think limited swapping in vms can be good--if it frees up memory
for running processes in other vms.

>
> About the main reason I keep it around is that if something does suddenly
> take up memory, I'd rather critical processes not be killed. But if it's
> ever in use, you need to look into it. (I have a dom0 utility which
> reports on VM's swap space among other stuff. Will post it at some
> point.)
>
> There's an argument that swapping lets unused stuff migrate to disk and
> not take memory (like X server data in sys-net, or whatever).
>
> But unless you're intentionally running very bloated AppVM's that start a
> bunch of unnecessary stuff (not a great idea), swapping out unused stuff
> buys you very little.

This is where I have a problem. Qubes is a desktop system where most
users aren't expected to manually ensure that all their running
code is "tight". There will have to be plenty of niceties on hand, and
techies often refer to this as "bloat".

> I've turned to using small fixed-size VM's for most service VM's (with my
> own no-gui service flag, to avoid loading qubes-gui-agent and the X
> server). Same for pulseaudio, and other services that really don't need
> to be everywhere.
>
> That keeps the servicevm's from grabbing/releasing memory which only gets
> used as a cache anwyays. Redundant, as mentioned, due to dom0's caching,
> but also mostly unused since sys-net and friends don't do a lot of disk
> activity.

I do this, too. I would even recommend restricting netvms and proxyvms
to 250MB or so.

> I tried cranking up the swappiness in dom0, and it had the result of
> swapping out unused stuff in dom0, like unused parts of the X server, idle
> Guid and libvirt daemons, etc.
>
> That did result in more memory for active dom0 caching and for VM usage,
> which was a performance boost in general. It even let me start more VM's
> that I used to be able to. (Although changing the cache-margin-factor
> from 1.3 to 1.1 was far more helpful for creating more VM's.)

Hmmm.... Where exactly? Can you post a howto?

> Of course, when you do access that seldom-used-VM's GUI or the little used
> utilites/gui parts in dom0, you'll notice the delay as things get swapped
> back in.
>
> So high swappiness gives you faster performance/more memory for routine
> stuff, with a penalty when you do the non-routine stuff. Makes sense.
>
> I also tried running with 0 swappiness in dom0. Ran out of memory more
> quikcly, and less memory was available for cache/buffers but other than
> that, it was fairly unexciting.
>
> Having swappiness at anything other than 0 for a VM (set in the template)
> doesn't make a lot of sense. If you have large but unused programs/data
> in your VM's, you're doing it wrong.

"Unused" by what metric? If something is executed twice in a 10min
session, what then?

>
> I've done some performance testing of buffered vs. non-buffered disk
> access in a VM (hdparm), and there is no performance gain from the
> buffers/cache inside the VM. In fact, some tests showed the cached
> situation slower than non-cached, sometimes significantly so. Let dom0 do
> the caching.

I don't believe this would hold true for private.img.

>
> I think there are Xen features (such as Intellicache?) for doing domU's
> caching in a shared way in dom0. That seems like a win-win. Has Qubes
> considered looking at this feature?

Would be prudent to search for security risks associated with that.

Chris

johny...@sigaint.org

Dec 9, 2016, 6:00:58 PM
to qubes-devel
>> Having a virtual disks's contents *also* cached inside the VM, is
>> redundant, wastes memory and CPU, and makes the whole memory management
>> thing more awkward. Having a block from root.img cached in dom0, as
>> well
>> as every domU that uses that template, is nuts.
>
> I have long wondered about this, but assumed that Xen project would have
> worked-out the right balance long ago.

Well, I think the Xen project has the preferred way of doing this with
LVM, and Qubes broke away from that a bit, using .img files instead of
LVM volumes to host the VM's. But 4.0 is going to require thin LVM, no? Or is
that just for dom0?

I'm a bit of an unusual case, working on a somewhat underpowered machine.
But that is not necessarily a bad thing; stuff that breaks more easily
when memory or other resources are low should be spotted and fixed, as
it is broken regardless and will likely bite others at some point (and
possibly more intermittently). Low memory brings out bugs.

Qubes dom0 gets *very* cranky if it doesn't have enough memory. It
doesn't handle low memory gracefully at all. In the memory manager you
see a fudge factor % added to each VM, and then a constant fudge factor
added on, and a minimum VM size used, and then another fudge factor boost
added to dom0, etc.

It almost feels that the numbers were chosen (extra dom0 memory boost, for
example) by cranking up the number until things worked well, rather than
knowing why things stumbled with a bit less memory. Just my opinion from
a low memory system, and poking around the source a bit.

I found there were places I could save a lot of memory (cache-margin of
1.1 instead of 1.3) without any ill effect, and other areas where, if you
tweaked things (like a very low dom0 memory boost), the system just
stumbled and became unusable.

> However, there is a trade-off. Relying more on dom0 for caching makes
> the caching slower as its now copied between VMs.

You'd think, wouldn't you? But in practice (well, in 10 minutes of
testing), the dom0 cache was as fast as, or faster than, the VM cache.
Perhaps more layers of caching involve more administrative code to run,
and maybe the VBD layer is efficient enough that the additional calls
aren't that big a deal.

> Conversely, swapping in domUs is _extra_ expensive.

Swapping is way worse than anything related to cache/buffers, for sure.
Swapping in a VM hurts way more than (modest) dom0 swapping. But yeah,
when dom0 starts swapping too much, it's a bad thing. Important requests
(such as for more memory, or keyboard/mouse I/O) might get delayed due to
deadlocks from swapping, and the whole system feels terrible.

> Adding to the complexity of the situation is that root.img and
> private.img will respond very differently to domU-vs-dom0 caching, where
> private.img access benefits more from domU cache.

The snapshot thing adds yet another layer of complexity, as well.

Every VM's volatile.img (in a copy-on-write snapshot arrangement) keeps
track of the differences between the template's root.img and the
current state of its root file system.

Say you have 5 VM's all opened based upon debian-8's root.img. They're
all tracking diff's against (a snapshot of) root.img. Thankfully the root
shouldn't change too much during normal operations.

>> On a Virtual machine system, swapping makes very little sense. It's a
>> major performance killer, especially inside a VM.
>
> I'm glad all this is being discussed.
>
> But I think limited swapping in vms can be good--if it frees up memory
> for running processes in other vms.

I kind of bounced around on the issue. Went from no-swapping-allowed! to
trying heavy swapping; hey, if it ain't being used, don't keep it in
memory.

But personally, my VM's aren't long-lived enough to let things swap out
and stabilize. For someone who keeps VM's open for long periods, the
benefit of a higher vm.swappiness could be greater.

(There's something about the templates being so clean and shiny that makes
frequent VM reboots soothing to me, from a security standpoint. That may
just be my autistic side, tho.)

>> But unless you're intentionally running very bloated AppVM's that start
>> a
>> bunch of unnecessary stuff (not a great idea), swapping out unused stuff
>> buys you very little.
>
> This is where I have a problem. Qubes is a desktop system where most
> users aren't expected to be manually ensuring that all their running
> code is "tight". There will have to be plenty of niceties on hand and
> techies often refer to this as "bloat".

I hear ya, for sure. The great thing about Qubes is that it makes setting
up and configuring things a relative breeze. I had never successfully
configured a Xen VM, until I started using Qubes. (Now, I'm crafting my
own, but I'd never be here without Qubes. Xen has a big learning curve on
its own.)

So yes, users shouldn't in general have to worry about how much memory is
going where and such. As simplistic as qmemman's algorithm is, it serves
exactly that purpose, of shuffling the memory around to wherever the
user's applications require it.

However, there are some things we can do to be smart. For example, a
little Sound toggle icon in the manager next to each VM (or in a config
page) would let the user pick which VM's need sound. Most of them don't,
and it's a huge security risk (IMHO) to allow it by default.

Same with all of cups and samba and stuff. They're all installed and
running by default, presenting a security risk (or at least greater attack
surface), and they're not easy to turn off unless you know what you're
doing.

My homebrew "Qubes-Lite" has a very small Linux installations for the
service VM's. I think that's reasonable for service VM's and their
nature.

The template-vm (equivalent) is stocked up fully with many, many packages,
so I can run the programs I want when I want. But I turn off most
systemctl startups. I don't need all those daemons running in every child
VM. It's a waste of memory and a security risk.

>> That keeps the servicevm's from grabbing/releasing memory which only
>> gets
>> used as a cache anwyays. Redundant, as mentioned, due to dom0's
>> caching,
>> but also mostly unused since sys-net and friends don't do a lot of disk
>> activity.
>
> I do this, too. I would even recommend restricting netvms and proxyvms
> to 250MB or so.

Agreed. 256MB is what I use if the VM is going to have X going. 128MB
works fine if not. (But if X then somehow starts up anyway, it gets
unpleasant.)

(One of the few reasons you'd want X ability in sys-net or sys-firewall
might be for the network manager applet, nm-applet. If you're a
command-line guy like me and don't want the GUI, but want the network
manager functionality, check out "nmcli." It's a real gem.)

>> That did result in more memory for active dom0 caching and for VM usage,
>> which was a performance boost in general. It even let me start more
>> VM's
>> that I used to be able to. (Although changing the cache-margin-factor
>> from 1.3 to 1.1 was far more helpful for creating more VM's.)
>
> Hmmm.... Where exactly? Can you post a howto?

I've been meaning to, on a few different topics. But for now, short version:

Obviously, you're voiding your warranty by tampering with the parameters,
so if your system gets stupid, don't complain on the list until you try
with the settings changed back. :)

In /etc/qubes/qmemman.conf, change "cache-margin-factor = 1.3" to "1.1"
instead. Or whatever.

You can restart the memory manager to have this take effect, with "sudo
systemctl restart qubes-qmemman"

Doing stupid things like setting the value to < 1.0 and turning off swap
will guarantee you ill effects.
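
If you want to check what is currently in effect, here's a quick sketch
that assumes the simple "key = value" format shown above (just an
illustration, not a Qubes tool):

def read_cache_margin(path="/etc/qubes/qmemman.conf"):
    # Scan the config for cache-margin-factor; fall back to the default
    # value mentioned in this thread.
    with open(path) as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep and key.strip() == "cache-margin-factor":
                return float(value.strip())
    return 1.3

print(read_cache_margin())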

The Qubes manager only shows the memory actually allocated to the VM, not
how much it has requested/needs. I have a text utility which shows all of
these, and have started turning it into a live dom0/domU memory monitor.
I'll try to post at least the text version in the next day or two.

(I was wrong about that value being hard-coded elsewhere; it was just that
when I used the same Qubes Python API to get at the memory information,
the config value wasn't loaded from the file by the library; I had to do
it myself, like qmemman_server.py does.)

>> Having swappiness at anything other than 0 for a VM (set in the
>> template)
>> doesn't make a lot of sense. If you have large but unused programs/data
>> in your VM's, you're doing it wrong.
>
> "Unused" by what metric? If something is executed twice in a 10min
> session, what then?

I guess it depends upon the person and the system. If they don't mind a
slight delay when they do something every ten minutes, crank up the
swappiness. If you only want interface pauses (from swapping back in)
when absolutely necessary, crank swappiness down. But you might not be
able to launch as many VM's. It's all a balance. :)

I know I have limited memory, so in order to do more simultaneous things,
I'm going to have to accept that infrequently used things might suffer
from the odd delay.

In digging through the memory manager, I actually want it to do *more* for
the user and not less (while being more flexible), so I think we're in
agreement about keeping the user experience as simple as possible.

>> I've done some performance testing of buffered vs. non-buffered disk
>> access in a VM (hdparm), and there is no performance gain from the
>> buffers/cache inside the VM. In fact, some tests showed the cached
>> situation slower than non-cached, sometimes significantly so. Let dom0
>> do
>> the caching.
>
> I don't believe this would hold true for private.img.

Fair enough. But the penalty for dom0 caching even in that case is small
enough that I think it may warrant the simplicity of keeping caching out
of the VM's. An unshared cache page in dom0 takes no more resources than
an unshared cache page in a VM. In fact, dom0 is gonna cache it anyways,
so why duplicate the effort?

(Unless you disable dom0's caching, but that seems like a terrible,
terrible idea. :) )

>> I think there are Xen features (such as Intellicache?) for doing domU's
>> caching in a shared way in dom0. That seems like a win-win. Has Qubes
>> considered looking at this feature?
>
> Would be prudent to search for security risks associated with that.

That was the first thing that crossed my mind, having a shared memory page
accessible by dom0 and one or more VM's. Just opens the door for more
potential border cases and buffer overflows.

I would imagine it certainly adds complexity and configuration issues and
so on.

On the other hand, Qubes (with Xen's help) is indeed solidly in the
business of shuffling memory around between VM's, and making sure it does
so securely.

And it does already hand out shared pages for the X server to blast bits
to the gui daemons. Adding shared cache pages might not be a huge stretch
if done carefully and smartly. I'll have to learn more about Intellicache
and other options.

Qubes devs "rolled their own" (God love them) when it came to a GUI
system, sound, remote execution, rpc, file copying, and even created the
underlying vchan mechanism they're all based upon. So even that's a
possibility.

I'd love to see code pages shared between AppVM instances. Even the
kernel (especially the kernel) could benefit hugely from that. If done
carefully.

Basing all of its services upon its vchan is a key part of Qubes' security
win, using a Microkernel-ish IPC mechanism (mostly with fixed-sized
messages) instead of the TCP/IP stack. It's what lets dom0 go
networkless.

At the end of the day, it's kind of sad how we got here.

Vchan is a win because we can't trust our networking stacks, or the
operating systems they run upon, or the hardware they use, from either
bugs or compromise.

Virtualization and isolation of device drivers and applications is a win
for the same reasons, as well as not being able to trust the applications
themselves from bugs, inappropriate permissions, or intentional
compromise.

If the hardware, drivers, operating systems, and applications were all
designed properly, validated, signed, yadda yadda yadda, Qubes would be
unnecessary, and just a curiosity.

Ideally, it'd be nice to have a secure 64-bit microkernel operating system
that could run all the applications you need and support all the hardware
you need it to. Qubes, kinda, sorta, is that. Or as close as we can come
today.

I could see it evolving towards eliminating some of the awkward layers
(such as whole operating system instances to isolate a questionable app or
device driver) and become more and more of a pure microkernel. Especially
if it ever makes the jump (or forks to) something like L4.

I'm not sure I'll ever have full confidence in anything as complex as Xen.

-d

Marek Marczykowski-Górecki

Dec 9, 2016, 8:43:20 PM
to johny...@sigaint.org, qubes-devel

On Fri, Dec 09, 2016 at 10:57:25PM -0000, johny...@sigaint.org wrote:
> >> Having a virtual disks's contents *also* cached inside the VM, is
> >> redundant, wastes memory and CPU, and makes the whole memory management
> >> thing more awkward. Having a block from root.img cached in dom0, as
> >> well
> >> as every domU that uses that template, is nuts.
> >
> > I have long wondered about this, but assumed that Xen project would have
> > worked-out the right balance long ago.
>
> Well, I think the Xen project has the preferred way of doing this with
> LVM, and Qubes broke away from that a bit using .img files instead of
> lvm's to host the VM's. But 4.0 is going to require thin LVM, no? Or is
> that just for dom0?

4.0 by default will use LVM thin for VM images too.

(...)

> >> On a Virtual machine system, swapping makes very little sense. It's a
> >> major performance killer, especially inside a VM.
> >
> > I'm glad all this is being discussed.
> >
> > But I think limited swapping in vms can be good--if it frees up memory
> > for running processes in other vms.
>
> I kind of bounced around on the issue. Went from no-swapping-allowed! to
> trying heavy swapping; hey, if it ain't being used, don't keep it in
> memory.

Reading the whole thread, I think setting vm.swappiness=0 in VMs by
default makes sense.

Side note: qmemman (or actually meminfo-writer, the process in the VM
reporting memory usage) considers used swap as used memory. So swapping
something out does not free that memory for other VMs, only for other
stuff in the same VM.
This may be considered a bug, but without this, if something lands in
swap, it would stay there forever, and even when you want to use that
data, something else would need to be swapped out. It somewhat amplifies
swapping.

(...)

> So yes, users shouldn't in general have to worry about how much memory is
> going where and such. As simplistic as qmemman's algorithm is, it serves
> exactly that purpose, of shuffling the memory around to wherever the
> user's applications require it.

Reading your previous mail, I think one additional knob would make sense:
whether to redistribute unused memory across all the VMs. On one hand it
lowers the risk of swapping (when some application quickly requests a lot
of memory), but on the other hand, it slows down requests for memory
(because you first need to take such memory from other VMs).

> However, there are some things we can do to be smart. For example, a
> little Sound toggle icon in the manager next to each VM (or in a config
> page) would let the user pick which VM's need sound. Most of them don't,
> and it's a huge security risk (IMHO) to allow it by default.



> Same with all of cups and samba and stuff. They're all installed and
> running by default, presenting a security risk (or at least greater attack
> surface), and they're not easy to turn off unless you know what you'd
> doing.

I think you may like to read the qvm-service manual page. Some of those
services are easy to disable. For example, CUPS is enabled by default
intentionally, to have printing working out of the box (ok, in the default
setup it works only if you configure it in the template, because of the
non-persistency of /etc/cups...). Having it (and others) disabled would
pose another barrier for new users...

BTW, you've mentioned X-less VMs. How (if at all) do you execute anything
there? qvm-run -p bash? xl console?
Care to share a patch? This is something we'd like to include in Qubes
4.0. While sys-net does use an X server (nm-applet), sys-firewall, sys-usb
and a few others may serve their purpose without an X server.
I'd like to get rid of the X server in gpg VMs (I have a lot of them...),
but currently it isn't possible because split-gpg asks for confirmation,
and also shows a notification of key usage.
The latter could be solved by #889[1]. Any thoughts on this, especially the
former? A service similar to #889? I don't like duplicating the whole
desktop environment API in inter-VM services...

> My homebrew "Qubes-Lite" has a very small Linux installations for the
> service VM's. I think that's reasonable for service VM's and their
> nature.

There are minimal templates for this purpose.

(...)

> >> I think there are Xen features (such as Intellicache?) for doing domU's
> >> caching in a shared way in dom0. That seems like a win-win. Has Qubes
> >> considered looking at this feature?
> >
> > Would be prudent to search for security risks associated with that.
>
> That was the first thing that crossed my mind, having a shared memory page
> accessible by dom0 and one or more VM's. Just opens the door for more
> potential border cases and buffer overflows.

Xen does provide something like this, called tmem. And it is explicitly
excluded from security support[2][3], among other things for this reason.

> I would imagine it certainly adds complexity and configuration issues and
> so on.
>
> On the other hand, Qubes (with Xen's help) is indeed solidly in the
> business of shuffling memory around between VM's, and making sure it does
> so securely.
>
> And it does already hand out shared pages for the X server to blast bits
> to the gui daemons. Adding shared cache pages might not be a huge stretch
> if done carefully and smartly. I'll have to learn more about Intellicache
> and other options.

I'm not sure what Intellicache is, but handling any kind of shared
memory should be done very carefully. In the case of the GUI, the worst
thing that can happen is some artifacts on the screen, inside the VM window
(which has actually happened[4]), because both sides (and especially dom0)
treat such memory as untrusted. But using shared memory for any kind of
caches, code pages, etc. is a huge risk, because something is going to
parse that data and use it in a non-trivial way. Any bug in such a
mechanism may be fatal to the security of the whole system.

(...)

[1] https://github.com/QubesOS/qubes-issues/issues/889
[2] https://wiki.xenproject.org/wiki/Xen_Project_Release_Features
[3]
https://lists.xen.org/archives/html/xen-announce/2012-09/msg00006.html
[4] https://github.com/QubesOS/qubes-issues/issues/1028


Chris Laprise

Dec 10, 2016, 12:22:46 AM
to Marek Marczykowski-Górecki, johny...@sigaint.org, qubes-devel
I'm going to try vm.swappiness=15 in my debian vms, and probably leave
dom0 as the default for now. In an 8GB system, I'll probably notice some
difference before long.

Chris

Vít Šesták

Dec 10, 2016, 7:54:17 AM
to qubes-devel
Just an idea: Is there a good reason why the extra RAM assigned to the VM is calculated by a ratio? Why is it not a constant?

AFAIU, the reason for the extra RAM is to allow the VM to use the RAM before qmemman+Xen assign more memory (i.e. to prevent OOM in such a case). The memory management has some latency (AFAIK up to 0.1 s of latency is caused by polling, plus there might be some additional latency), which should be roughly constant. Within this constant time, processes can allocate at most some constant amount of memory, because allocating memory takes time at least linear in the amount allocated (the linearity comes from the need to zero the RAM before actually handing it out). This is the reason for a constant addition instead of a ratio.

I am not sure how large the constant should be. Maybe we cannot reasonably satisfy the worst-case scenario of excessive RAM allocation – with fast RAM in dual channel, it could require even about 0.5 GiB of RAM. (The RAM throughput estimate is based on https://www.quora.com/How-can-we-convert-RAM-speed-into-MBPS , faster RAM frequencies and dual channel.) This is too much. So we should maybe use some more conservative estimate, maybe combined with some hysteresis or with some heuristics (e.g. a different strategy when swap is used).

The issue with fast RAM sounds interesting. In general, having a faster machine could imply trouble caused by allocating memory too fast. It would be ideal to have a kernel module or kernel patch that could handle such a situation well (1. pause all processes, 2. notify dom0, 3. wait for more memory, 4. continue). I know this is non-trivial – not only because of the kernel-mode programming, but also because of the low-level paradigms required when handling low-memory conditions. Maybe someone will suggest some alternative that would bring most of its benefits with a fraction of the effort.

But maybe all such situations would be handled by swap, which is not nice, but somehow works.

A note on the latency: for a given polling rate (currently AFAIK 100 ms), the polling interval is a lower bound for the worst case. When domUs have a little extra memory, the worst-case latency could be closer to that lower bound.

I am not sure if swap has to be handled as a special case provided that we have vm.swappiness=0. For vm.swappiness>0, not counting swap as used memory could be truly disastrous. (When a threshold is reached and some RAM is subsequently proactively swapped, the VM's memory requirements would be lowered, which could trigger proactive swapping again, until the swap is full.) For vm.swappiness=0, I can see no such scenario. The VM would start swapping only if the RAM is truly full. Even if qmemman starts reporting slightly lower memory usage, some extra memory would still be assigned. With vm.swappiness=0, there are AFAIK only two scenarios of swapping:

a. The VM cannot have more memory assigned (either because of its memory limit or because of the memory available). In such a case, the extra memory assigned to the VM would prevent assigning less memory.
b. The VM could have more memory assigned, but the memory management is too slow to do this in time. Less memory usage is reported and less memory is assigned. However, assigning more memory would not help there, as Linux AFAIK never unswaps until the memory is actually used.

Thanks to the extra memory, it also would not harm unswapping. When some more memory is needed, the extra memory is used. meminfo-writer then reports higher memory usage, so more memory is assigned (if allowed and available).

There seem to be just two cases where this approach causes some issues, and both are IMHO acceptable:

a. Assigning less memory has allowed the user to start more VMs. When some VM needs more memory later, the memory is not available. This one is inherent; it comes directly from the added ability to run more VMs.
b. My favourite hack, sudo swapoff -a && sudo swapon -a, would not work well. Maybe some alternative could be found, though.

Regards,
Vít Šesták 'v6ak'

Jean-Philippe Ouellet

Dec 10, 2016, 8:05:29 PM
to johny...@sigaint.org, qubes-devel
On Fri, Dec 9, 2016 at 5:57 PM, <johny...@sigaint.org> wrote:
> I'd love to see code pages shared between AppVM instances. Even the
> kernel (especially the kernel) could benefit hugely from that. If done
> carefully.

I would definitely *NOT* like to see that! Trying to do memory
deduplication safely has been shown by history to be truly a nasty
minefield! Some examples below.


Using mem dedup to massively increase the reliability and
effectiveness of rowhammer-style attacks:
"Flip Feng Shui: Hammering a Needle in the Software Stack"
Abstract: We introduce Flip Feng Shui (FFS), a new exploitation vector
which allows an attacker to induce bit flips over arbitrary physical
memory in a fully controlled way. FFS relies on hardware bugs to
induce bit flips over memory and on the ability to surgically control
the physical memory layout to corrupt attacker-targeted data anywhere
in the software stack. We show FFS is possible today with very few
constraints on the target data, by implementing an instance using the
Rowhammer bug and memory deduplication (an OS feature widely deployed
in production). Memory deduplication allows an attacker to reverse-map
any physical page into a virtual page she owns as long as the page’s
contents are known. Rowhammer, in turn, allows an attacker to flip
bits in controlled (initially unknown) locations in the target page.
We show FFS is extremely powerful: a malicious VM in a practical cloud
setting can gain unauthorized access to a co-hosted victim VM running
OpenSSH. Using FFS, we exemplify end-to-end attacks breaking OpenSSH
public-key authentication, and forging GPG signatures from trusted
keys, thereby compromising the Ubuntu/Debian update mechanism. We
conclude by discussing mitigations and future directions for FFS
attacks.

Paper the above abstract is from:
http://www.cs.vu.nl//~herbertb/download/papers/flip-feng-shui_sec16.pdf
Various other publications at: https://www.vusec.net/projects/flip-feng-shui/


"CAIN: Silently Breaking ASLR in the Cloud"
Abstract excerpt: Memory pages with the same content are merged into
one read-only memory page. Writing to these pages is expensive due to
page faults caused by the memory protection, and this cost can be used
by an attacker as a side-channel to detect whether a page has been
shared. Leveraging this memory side-channel, we craft an attack that
leaks the address space layouts of the neighboring VMs, and hence,
defeats ASLR. Our proof-of-concept exploit, CAIN (Cross-VM ASL
INtrospection) defeats ASLR of a 64-bit Windows Server 2012 victim VM
in less than 5 hours (for 64-bit Linux victims the attack takes
several days). Further, we show that CAIN reliably defeats ASLR,
regardless of the number of victim VMs or the system load.

Full paper: https://www.usenix.org/system/files/conference/woot15/woot15-paper-barresi.pdf


"Memory Deduplication as a Threat to the Guest OS"
Abstract excerpt: Memory deduplication, however, is vulnerable to
memory disclosure attacks, which reveal the existence of an
application or file on another virtual machine. Such an attack takes
advantage of a difference in write access times on deduplicated memory
pages that are re-created by Copy-On-Write. In our experience on KSM
(kernel samepage merging) with the KVM virtual machine, the attack
could detect the existence of sshd and apache2 on Linux, and IE6 and
Firefox on WindowsXP. It also could detect a downloaded file on the
Firefox browser. We describe the attack mechanism in this paper, and
also mention countermeasures against this attack.

Full paper: https://dl.acm.org/citation.cfm?id=1972552
Non-paywall: https://staff.aist.go.jp/k.suzaki/EuroSec2011-suzaki.pdf
Slides: http://www.slideshare.net/suzaki/eurosec2011-slide-memory-deduplication


"Implementation of a Memory Disclosure Attack on Memory Deduplication
of Virtual Machines"
Various infoleak side-channels which would aid in subsequent
exploitation of a different VM.
https://www.researchgate.net/publication/270441749


"Security Implications of Memory Deduplication in a Virtualized Environment"
More analysis of the (rather obvious IMO) covert channel introduced by
mem dedup, (and also describing a less relevant scheme for kernel
integrity monitoring).
http://www.cs.wm.edu/~hnw/paper/memdedup.pdf



If I had veto-power on design decisions I would use it at this point.


> Basing all of its service upon its vchan is a key part of Qubes' security
> win, using a Microkernel-ish IPC mechanism (mostly with fixed-sized
> messages) instead of the TCP/IP stack. It's what lets dom0 go
> networkless.
>
> At the end of the day, it's kind of sad how we got here.
>
> Vchan is a win because we can't trust our networking stacks, or the
> operating systems they run upon, or the hardware that it uses, from either
> bugs or compromise.
>
> Virtualization and isolation of device drivers and applications is a win
> for the same reasons, as well as not being able to trust the applications
> themselves from bugs, inappropriate permissions, or intentional
> compromise.
>
> If the hardware, drivers, operating systems, and applications were all
> designed properly, validated, signed, yadda yadda yadda, Qubes would be
> unnecessary, and just a curiosity.

I find the irony amusing as well :)

> Ideally, it'd be nice to have a secure 64-bit Microkernel operating that
> could run all applications you need and support all the hardware you need
> it to. Qubes, kinda, sorta, is that. Or as close as we can come today.

+1

> I could see it evolving towards eliminating some of the awkward layers
> (such as whole operating system instances to isolate a questionable app or
> device driver) and become more and more of a pure microkernel. Especially
> if it ever makes the jump (or forks to) something like L4.

Man... that would be the day :)

This is 2016! Where's my genode on risc-v already!? Oh wait... all my
proprietary engineering tools only run on x86 linux/windows :(

I think there is really something to be said for the Qubes value
proposition of actually supporting existing real-world workflows. (And
it does it quite well I might add!)

> I'm not sure I'll ever have full confidence in anything as complex as Xen.

+1

Jean-Philippe Ouellet

Dec 10, 2016, 8:09:56 PM
to Vít Šesták, qubes-devel
On Sat, Dec 10, 2016 at 7:54 AM, Vít Šesták
<groups-no-private-mail--con...@v6ak.com>
wrote:
> It would be ideal to have a kernel module or kernel patch that could handle such situation well (1. pause all processes, 2. notify dom0, 3. wait for more memory, 4. continue).

I would be interested to try that too. If only we had infinite free time :)

Marek Marczykowski-Górecki

Dec 11, 2016, 9:10:18 PM
to Vít Šesták, qubes-devel

On Sat, Dec 10, 2016 at 04:54:17AM -0800, Vít Šesták wrote:
> Just an idea: Is there a good reason why the extra RAM assigned to the VM is calculated by a ratio? Why is it not a constant?
>
> AFAIU, the reason for the extra RAM is to allow the VM to use the RAM before qmemman+Xen assign more memory (i.e. to prevent OOM in such case). The memory management has some latency (AFAIK up to 0.1s of latency is caused by polling, plus there might be some additional latency) which should be rather constant. Within this constant time, the processes can allocate at most some constant amount of memory (because allocating memory requires at least linear amount of time). This is the reason for constant addition instead of linearity. Linearity is caused by the need of zeroing the RAM before actually allocating it.

The linear factor is used as a heuristic - if you have 2 VMs, one using
500MB and another using 2500MB, there is a much bigger chance that the
latter one may request more memory (because you're probably running
Firefox/Chrome there ;) ).

In some cases it may make sense to split free memory equally, or even
not assign it at all (keep it unallocated). But I think the current default
is reasonable.
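
For illustration (a made-up sketch, not qmemman code), with k = 1.3 the
bigger VM automatically gets the bigger headroom:

for used_mb in (500, 2500):
    # The margin scales with usage: 650.0 and 3250.0 MB assigned.
    print(used_mb, "->", 1.3 * used_mb)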

> The issue with fast RAM sounds interesting. In general, having a
> faster machine could imply troubles caused by allocating memory too
> fast. It would be ideal to have a kernel module or kernel patch that
> could handle such situation well (1. pause all processes, 2. notify
> dom0, 3. wait for more memory, 4. continue). I know this is
> non-trivial – not only for the kernel-mode programming, but also for
> lowlevel paradigms required when handling lowmem conditions. Maybe
> someone will suggest some alternative to this that would bring most of
> its benefits with a fraction of the effort.

I would say it is highly non-trivial. Especially the part "pause all
processes" (but still be able to communicate with dom0), then "continue"
and not introduce any deadlock or other problem.

> But maybe all such situations would be handled by swap, which is not
> nice, but somehow works.

Yes, this is why VMs have swap at all. But it should be a last resort, so
vm.swappiness=0 makes sense.

> Some note on the latency: For a given polling the rate (currently AFAIK 100ms), the polling rate is lower bound for worst-case. When DomUs have a little extra memory, the worst-case latency could be closer to the lower bound.
>
> I am not sure if swap has to be handled as a special case provided
> that we have vm.swappiness=0. For vm.swappiness>0, not counting swap
> as used memory could be truly disastrous. (When a threshold is reached
> and some RAM is subsequently proactively swapped, VM memory
> requirements would be lowered, which could trigger proactive swapping
> again, until the swap is full.) For vm.swappiness=0, I can see no such
> scenario. The VM would start swapping only if the RAM is trully full.

(...)

AFAIK even if vm.swappiness=0, if something is swapped out, it will
not be moved back to RAM unless needed (and then it may force something
else to be swapped out). So I think this problem still applies to some
degree.

> Even if qmemman starts reporting slightly lower memory usage, some
> extra memory would be still assigned.

(...)

> Due to some extra memory, it also would not harm unswapping. When some
> more memory is needed, the extra memory is used. Memory-writer then
> reports higher memory usage, so more memory is assigned (if allowed
> and available).

Hmm, this may indeed work. Worth some testing.


Reg Tiangha

Dec 11, 2016, 9:20:14 PM
to qubes...@googlegroups.com
On 12/11/2016 07:10 PM, Marek Marczykowski-Górecki wrote:

> Yes, this is why VMs have swap at all. But it should be last resort so
> vm.swappiness=0 makes sense.
>

FWIW, the Whonix templates have had vm.swappiness=0 included in them for
a while. I only noticed it was already there when I was applying the
setting to the rest of my template vms.


Chris Laprise

Dec 12, 2016, 1:22:39 PM
to Marek Marczykowski-Górecki, Vít Šesták, qubes-devel
Just wanted to note there are various warnings against using
swappiness=0 as it can result in killed processes; swappiness=1 is
considered the minimum value to avoid this problem.

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-tunables.html
https://en.wikipedia.org/wiki/Swappiness

Chris

Vít Šesták

Dec 14, 2016, 3:11:05 PM
to qubes-devel, groups-no-private-mail--con...@v6ak.com
Hello,


> Linear factor is used as a heuristic - if you have 2 VMs - one using
> 500MB and another using 2500MB, there is much bigger chance that the
> later one may request more memory (because you're probably running
> Firefox/Chrome there ;) ).

Sounds mostly reasonable; maybe it could require something like req + max(50, min(400, 0.3*req)) MiB instead of 1.3*req. (The constants are just rough guesses.)
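
A quick comparison of that capped margin against the current ratio (a
rough sketch, values in MiB; the constants are only rough guesses as
noted above):

def extra_capped(req):
    # Constant-ish margin: at least 50 MiB, at most 400 MiB, else 30% of req.
    return req + max(50, min(400, 0.3 * req))

def extra_ratio(req, k=1.3):
    # The current behavior: margin grows linearly with usage.
    return k * req

for req in (100, 500, 2500):
    print(req, extra_capped(req), extra_ratio(req))
# 100 MiB  -> 150 vs 130
# 500 MiB  -> 650 vs 650
# 2500 MiB -> 2900 vs 3250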

 
> I would say it is highly non-trivial. Especially the part "pause all
> processes" (but still be able to communicate with dom0), then "continue"
> and not introduce any deadlock or other problem.

I agree.

> AFAIK even if vm.swappiness=0, if something is swapped out, it will
> not be moved back to RAM unless needed

I believe so. It corresponds to my observations, at least. It also makes sense most of the time.

> (and then it may force something else to be swapped out). So I think this problem still applies to some degree.

Why would it do so? If there is some extra memory and vm.swappiness==0, I can't see a scenario for that (except ones unrelated to swap).

Regards,
Vít Šesták 'v6ak'

Vít Šesták

Dec 14, 2016, 3:33:02 PM
to qubes-devel, marm...@invisiblethingslab.com, groups-no-private-mail--con...@v6ak.com, tas...@openmailbox.org

Not nice. So, IIUC,
* we should have vm.swappiness=1 instead of vm.swappiness=0
* we should consider the (even tiny) swappiness in reasoning about the formula for required/assigned memory.

So, having enough memory not to reach the swappiness watermark requires multiplying the memory by 1/(1-swappiness/100). This might work well for swappiness=1 (a factor of about 1.01, i.e. just 1% overhead), but it sounds strange (and counterproductive) for swappiness=50 (a factor of 2, i.e. 100% overhead).

So, when we have a memory-requirements calculation function (in my previous post I suggested f(req) = req + max(50, min(400, 0.3*req)) MiB, but I don't insist on this one), there are two variants:

a) Simple variant: Assume that there is no use case for large vm.swappiness in Qubes: f(usage_mem)/(1-swappiness/100)

b) Complex variant: Assume one might want higher swappiness, so add some upper limit: min(f(usage_mem+usage_swap), f(usage_mem)/(1-swappiness/100))
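
Written out as a sketch (f is the function suggested earlier, swappiness
in percent; names and numbers are illustrative only):

def f(req_mib):
    return req_mib + max(50, min(400, 0.3 * req_mib))

def variant_a(usage_mem, swappiness):
    # Simple: assume nobody wants large vm.swappiness in Qubes.
    return f(usage_mem) / (1 - swappiness / 100.0)

def variant_b(usage_mem, usage_swap, swappiness):
    # Complex: cap the swappiness correction by also counting swap as used.
    return min(f(usage_mem + usage_swap),
               f(usage_mem) / (1 - swappiness / 100.0))

print(variant_a(1000, 1))        # ~1313 MiB
print(variant_b(1000, 200, 50))  # min(1560, 2600) = 1560 MiB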

Does this sound reasonable?

Regards,
Vít Šesták 'v6ak'