Software Rowhammer mitigations

Demi M. Obenour

Jun 18, 2020, 9:49:23 PM
to qubes-devel
Would it be possible to mitigate Rowhammer by ensuring that different VMs don’t use nearby rows?

Sincerely,

Demi


Marek Marczykowski-Górecki

Jun 18, 2020, 10:38:26 PM
to Demi M. Obenour, qubes-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Thu, Jun 18, 2020 at 09:49:14PM -0400, Demi M. Obenour wrote:
> Would it be possible to mitigate Rowhammer by ensuring that different VMs don’t use nearby rows?

I'm pretty sure no hypervisor supports such thing (such specific
physical memory layout control). But also, it would be really hard to do
in a dynamic environment, where you release memory and then possibly
allocate to another VM. This would make memory allocator even more
complex. I guess if this approach would be practical, someone
would have implemented it already - yet, I see many research papers with
similar ideas, some from many years ago, and exactly zero real-life
working implementations.

Anyway, this is rather a question to relevant hypervisor developers (Xen
in this case).

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAl7sJRoACgkQ24/THMrX
1yxOYAf7BCYuY4nPtLHzIBfmjujsEqCG0BDaCiIfS5pgDODHzYwjh/pV2OeYJAhP
Ymtel7/bmqDgjVMJ9CKBlr5fpGLy3AV8xzxr3Rpt6g7Y6WqwXoR4JYOVuw9M0xPh
CSWUBT17KfRh6PZQkZbJXcohqwTc2eiGNPtOUAmuHcboRh37ToPMsb4BxbeMIesn
0/jntaGamAze7BrarbkkV5nshBu8fqjQDn9CyWrS4gVFi05AMgwck9RZmWXzc4aa
ASC2k4PNOpoosr3GsRijRzg+g/VFrhJJhkww+T1OBXb9OjlS6nhuxurAS+gJsJCC
RCt0MPdGV2aCeBFkQPL833sAhzt4Og==
=ZngD
-----END PGP SIGNATURE-----
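
The row-separation idea raised in this exchange can be illustrated with a minimal Python sketch. It assumes a deliberately simplified, hypothetical address mapping in which every consecutive 8 KiB of physical address space forms one DRAM row, with no channel, rank, or bank interleaving; real memory controllers do not map addresses this way. The names `row_of`, `conflicts`, and the VM labels are illustrative only and are not part of Xen or Qubes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PAGE_SIZE = 4096
ROW_SIZE = 8192  # ASSUMPTION: one DRAM row covers 8 KiB of contiguous physical memory


def row_of(frame: int) -> int:
    """Map a physical frame number to a (hypothetical) DRAM row index."""
    return (frame * PAGE_SIZE) // ROW_SIZE


def conflicts(owners: Dict[int, str], guard: int = 1) -> List[Tuple[int, str, int, str]]:
    """List (row_a, vm_a, row_b, vm_b) where different VMs own rows at most
    `guard` rows apart. A row shared by two VMs is reported once."""
    rows = defaultdict(set)
    for frame, vm in owners.items():
        rows[row_of(frame)].add(vm)

    found = []
    for row in sorted(rows):
        for delta in range(guard + 1):
            other = row + delta
            if other not in rows:
                continue
            for a in sorted(rows[row]):
                for b in sorted(rows[other]):
                    # For delta == 0 report each unordered pair only once.
                    if a != b and (delta > 0 or a < b):
                        found.append((row, a, other, b))
    return found


if __name__ == "__main__":
    # Frames 0-1 fill row 0, frames 2-3 fill row 1: the two VMs are neighbours.
    print(conflicts({0: "work", 1: "work", 2: "untrusted", 3: "untrusted"}))
    # Leaving row 1 unused as a guard row removes the conflict.
    print(conflicts({0: "work", 1: "work", 4: "untrusted", 5: "untrusted"}))
```

Even in this toy model, every frame that is released by one VM and later handed to another has to be re-checked against all neighbouring rows, which hints at the allocator complexity described in the message above.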

Demi M. Obenour

Jun 18, 2020, 11:40:14 PM
to Marek Marczykowski-Górecki, qubes-devel
On 2020-06-18 22:38, Marek Marczykowski-Górecki wrote:
> On Thu, Jun 18, 2020 at 09:49:14PM -0400, Demi M. Obenour wrote:
>> Would it be possible to mitigate Rowhammer by ensuring that different VMs don’t use nearby rows?
>
> I'm pretty sure no hypervisor supports such thing (such specific
> physical memory layout control). But also, it would be really hard to do
> in a dynamic environment, where you release memory and then possibly
> allocate to another VM. This would make memory allocator even more
> complex. I guess if this approach would be practical, someone
> would have implemented it already - yet, I see many research papers with
> similar ideas, some from many years ago, and exactly zero real-life
> working implementations.
>
> Anyway, this is rather a question to relevant hypervisor developers (Xen
> in this case).
In the meantime, what does the Qubes Security Team recommend be done
to mitigate the risk of Rowhammer? ECC RAM is not available on most
laptops, and even it is only a partial mitigation. TRRespass showed
that TRR, at least as currently implemented, is not a complete
fix either.

Sincerely,

Demi


Marek Marczykowski-Górecki

Jun 19, 2020, 12:14:09 AM
to Demi M. Obenour, qubes-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

For extremely sensitive tasks you can shutdown/pause other VMs. But
in practice an HVM/PVH VM has very few (if any) ways of
learning/probing physical memory layout to perform the attack.
This is yet another reason to not use PV.
If I'm not mistaken, the only demonstrated cross-VM rowhammer-based
attack relies on memory-deduplication (across VMs), which we don't use.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAl7sO4gACgkQ24/THMrX
1yy14Af/YoV4/X7A/jzo26ttMeL7cqyeTv6Bbg5fXh1vJ6Bvkc07a0M7OaxsITpL
OCvlNrO6sK/+hIQCAW6Zwm8DgwXSNQ5DPDV/FrveXGwH5aft07w6ky1npYZuAPVf
+a1mm8K/9FNq8r+Z8Wb1YnnWkhggpIYv+6w7/EWlCSF3KTceNmIqL7ow1grteeLO
Xt5evCfI7j7Pk4jDXI1DnHkXa/FabQCqmX9TxAXyZZV0OLSAZWXvv+WZz0oO7JOs
+0gM6FoDNyJegz7PYilujc/vY7BB0ZaiuwlDFAKrkM2B45dHJYk/84XdZdrzeaL2
Ifpq3uCheHmKXqliYoWaCRSll4J3mw==
=xoPm
-----END PGP SIGNATURE-----

Demi M. Obenour

Jun 19, 2020, 10:17:08 AM
to Marek Marczykowski-Górecki, qubes-devel
On 2020-06-19 00:14, Marek Marczykowski-Górecki wrote:
> For extremely sensitive tasks you can shutdown/pause other VMs. But
> in practice an HVM/PVH VM has very few (if any) ways of
> learning/probing physical memory layout to perform the attack.
> This is yet another reason to not use PV.
> If I'm not mistaken, the only demonstrated cross-VM rowhammer-based
> attack relies on memory-deduplication (across VMs), which we don't use.

That is good to know!

1. Does that mean that I can safely purchase hardware (such as the
Insurgo PrivacyBeast) that is likely to be vulnerable to Rowhammer?
Do you (personally) consider Rowhammer when purchasing hardware?

2. Will we be able to get rid of PV stubdomains at some point?

Sincerely,

Demi


Jean-Philippe Ouellet

Jun 19, 2020, 7:21:02 PM
to Marek Marczykowski-Górecki, Demi M. Obenour, qubes-devel
On Thu, Jun 18, 2020 at 9:14 PM Marek Marczykowski-Górecki
<marm...@invisiblethingslab.com> wrote:
>
> On Thu, Jun 18, 2020 at 11:40:07PM -0400, Demi M. Obenour wrote:
> > On 2020-06-18 22:38, Marek Marczykowski-Górecki wrote:
> > > On Thu, Jun 18, 2020 at 09:49:14PM -0400, Demi M. Obenour wrote:
> > >> Would it be possible to mitigate Rowhammer by ensuring that different VMs don’t use nearby rows?
> > >
> > > I'm pretty sure no hypervisor supports such thing (such specific
> > > physical memory layout control). But also, it would be really hard to do
> > > in a dynamic environment, where you release memory and then possibly
> > > allocate to another VM. This would make memory allocator even more
> > > complex. I guess if this approach would be practical, someone
> > > would have implemented it already - yet, I see many research papers with
> > > similar ideas, some from many years ago, and exactly zero real-life
> > > working implementations.

IIRC Google deployed something similar in upstream Android, to ensure
pages for DMA from userspace (which are marked uncachable, and thus
are more easily exploitable) don't end up physically adjacent to
kernel pages + I think also some set of pages reserved for processes
which were "sensitive / necessary for system integrity", or something
like that.

> > > Anyway, this is rather a question to relevant hypervisor developers (Xen
> > > in this case).
> > In the meantime, what does the Qubes Security Team recommend be done
> > to mitigate the risk of Rowhammer? ECC RAM is not available on most
> > laptops, and even it is only a partial mitigation. TRRespass showed
> > that TRR, at least as currently implemented, is not a complete
> > fix either.
>
> For extremely sensitive tasks you can shutdown/pause other VMs.

Not sure how helpful pausing would be. I'd expect not much since it'd
still be resident in mem. Even shutting down is suspect if you
consider hypervisor / dom0 integrity an attack vector.

> But
> in practice an HVM/PVH VM has very few (if any) ways of
> learning/probing physical memory layout to perform the attack.

Err, I wouldn't be so sure about that. If one can point a speculative
read gadget at the page tables, then you win, and we should probably
expect a steady stream of speculative read gadgets over the next
however many years. As for finding the page tables: it's probably
possible to find *at least* one reliably-locatable `mov <anything>,
cr3` gadget to speculatively execute, especially on kernels/hypervisors
which have lots of code mapped after a ctx switch into privileged
mode, and perhaps *even* more on architectures which allow unaligned
instructions like x86.

On my system:
$ objdump -D vmlinux | grep '%cr3,' | wc -l
24
^ there are plenty

Furthermore, even if your privileged code is sufficiently hardened
against speculative bullshit, I wouldn't be surprised if there turned
out to be some additional hardware misfeature that could be abused to
leak physical layout directly. We have that already for virtual
addresses [1] (side-channel attacks against ASLR by timing TLB lookups
vs. page walk latency), which makes sense, and while I don't see any
direct analog for physical addresses (would look like a corresponding
reverse-lookup translation in hardware), it might be reasonable to
conjecture the plausibility of something that achieves the same goal
via an unrelated mechanism.

[1]: https://www.vusec.net/projects/anc/

> This is yet another reason to not use PV.
> If I'm not mistaken, the only demonstrated cross-VM rowhammer-based
> attack relies on memory-deduplication (across VMs), which we don't use.

Mmmm, I wouldn't bet on that. It's certainly _easier_ to determine
where the thing you want to flip is - and whether it's been
successfully flipped yet - if that page is also mapped to you, but
that is not strictly a necessary condition for actually flipping it.
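
The `objdump | grep | wc` pipeline quoted in this message counts every disassembled line that contains `%cr3,`; in AT&T syntax that means CR3 is the source operand, i.e. the instruction reads CR3. A rough Python equivalent is sketched below, assuming the output of `objdump -D` has already been saved as text. The function name and the sample lines are hypothetical, and a linear disassembly like this does not reveal gadgets that begin at unaligned offsets.

```python
def count_cr3_reads(disassembly: str) -> int:
    """Count lines containing '%cr3,' (CR3 used as the AT&T source operand).

    This mirrors `grep '%cr3,' | wc -l` on saved `objdump -D` output.
    """
    return sum(1 for line in disassembly.splitlines() if "%cr3," in line)


# Hypothetical sample in the shape objdump prints (address, bytes, mnemonic).
SAMPLE = (
    "ffffffff81001000:\t0f 20 d8\tmov    %cr3,%rax\n"   # reads CR3: counted
    "ffffffff81001003:\t0f 22 d8\tmov    %rax,%cr3\n"   # writes CR3: not counted
    "ffffffff81001006:\t48 89 c3\tmov    %rax,%rbx\n"   # unrelated: not counted
)

if __name__ == "__main__":
    print(count_cr3_reads(SAMPLE))
```

A count obtained this way says only how many such instructions exist in the image, not whether any of them is reachable or usable as a speculative gadget.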

Marek Marczykowski-Górecki

Jun 19, 2020, 7:59:53 PM
to Jean-Philippe Ouellet, Demi M. Obenour, qubes-devel
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Fri, Jun 19, 2020 at 04:20:40PM -0700, Jean-Philippe Ouellet wrote:
> On Thu, Jun 18, 2020 at 9:14 PM Marek Marczykowski-Górecki
> <marm...@invisiblethingslab.com> wrote:
> >
> > On Thu, Jun 18, 2020 at 11:40:07PM -0400, Demi M. Obenour wrote:
> > > On 2020-06-18 22:38, Marek Marczykowski-Górecki wrote:
> > > > On Thu, Jun 18, 2020 at 09:49:14PM -0400, Demi M. Obenour wrote:
> > > >> Would it be possible to mitigate Rowhammer by ensuring that different VMs don’t use nearby rows?
> > > >
> > > > I'm pretty sure no hypervisor supports such thing (such specific
> > > > physical memory layout control). But also, it would be really hard to do
> > > > in a dynamic environment, where you release memory and then possibly
> > > > allocate to another VM. This would make memory allocator even more
> > > > complex. I guess if this approach would be practical, someone
> > > > would have implemented it already - yet, I see many research papers with
> > > > similar ideas, some from many years ago, and exactly zero real-life
> > > > working implementations.
>
> IIRC Google deployed something similar in upstream Android, to ensure
> pages for DMA from userspace (which are marked uncachable, and thus
> are more easily exploitable) don't end up physically adjacent to
> kernel pages + I think also some set of pages reserved for processes
> which were "sensitive / necessary for system integrity", or something
> like that.

Deployed? I've seen a bunch of research papers about similar ideas, but
quick google doesn't confirm existence of such thing in Android or
elsewhere.

> > > > Anyway, this is rather a question to relevant hypervisor developers (Xen
> > > > in this case).
> > > In the meantime, what does the Qubes Security Team recommend be done
> > > to mitigate the risk of Rowhammer? ECC RAM is not available on most
> > > laptops, and even it is only a partial mitigation. TRRespass showed
> > > that TRR, at least as currently implemented, is not a complete
> > > fix either.
> >
> > For extremely sensitive tasks you can shutdown/pause other VMs.
>
> Not sure how helpful pausing would be. I'd expect not much since it'd
> still be resident in mem.

Paused VMs can't really perform this kind of attack.

> Even shutting down is suspect if you
> consider hypervisor / dom0 integrity an attack vector.

Yes, attacking hypervisor / dom0 beforehand would be a risk, but again -
that's quite unlikely given very little knowledge/influence about memory
layout.

This BTW could be a good case for late-launch of DRTM, discussed on
recent Qubes-3mdeb minisummit. You could reload Xen (and perhaps dom0?)
to a known good, measured state and only then start your sensitive VM.
Live update of Xen is something in development already, and if we
disaggregate dom0 enough, reloading dom0 would also become much more
realistic (not simple, but realistic).

> > But
> > in practice an HVM/PVH VM has very few (if any) ways of
> > learning/probing physical memory layout to perform the attack.
>
> Err, I wouldn't be so sure about that. If one can point a speculative
> read gadget at the page tables, then you win, and we should probably
> expect a steady stream of speculative read gadgets over the next
> however many years. As for finding the page tables: it's probably
> possible to find *at least* one reliably-locatable `mov <anything>,
> cr3` gadget to speculatively execute, especially on kernels/hypervisors
> which have lots of code mapped after a ctx switch into privileged
> mode, and perhaps *even* more on architectures which allow unaligned
> instructions like x86.
>
> On my system:
> $ objdump -D vmlinux | grep '%cr3,' | wc -l
> 24
> ^ there are plenty

But you do realize this isn't really helpful for a cross-VM attack,
(unless you use PV), right?
For that, you need EPT tables, and a gadget in Xen. Probably not
impossible, but several orders of magnitude harder. To the point, I
wouldn't consider it a practical risk.

> Furthermore, even if your privileged code is sufficiently hardened
> against speculative bullshit, I wouldn't be surprised if there turned
> out to be some additional hardware misfeature that could be abused to
> leak physical layout directly. We have that already for virtual
> addresses [1] (side-channel attacks against ASLR by timing TLB lookups
> vs. page walk latency), which makes sense, and while I don't see any
> direct analog for physical addresses (would look like a corresponding
> reverse-lookup translation in hardware), it might be reasonable to
> conjecture the plausibility of something that achieves the same goal
> via an unrelated mechanism.
>
> [1]: https://www.vusec.net/projects/anc/
>
> > This is yet another reason to not use PV.
> > If I'm not mistaken, the only demonstrated cross-VM rowhammer-based
> > attack relies on memory-deduplication (across VMs), which we don't use.
>
> Mmmm, I wouldn't bet on that. It's certainly _easier_ to determine
> where the thing you want to flip is - and whether it's been
> successfully flipped yet - if that page is also mapped to you, but
> that is not strictly a necessary condition for actually flipping it.

That attack relied on deduplication not only to get information, but to
actually put the page in the right place. It attacked its own page
(conveniently using other RAM of the same VM) and it gave them anything
only because the same page was mapped _also_ to another VM. Not the
other way around.

- --
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAl7tUXAACgkQ24/THMrX
1yy1Twf/fGmmryw3XYlOUewtNe6AeaCAxR9PLdnyAuzAyzy8/jzhEpvQwp/LmeND
swmvrNL0D3lIfb43dCkepnuRQC64rdJlxbHyfgv6g6FQ6TIl6AwsoiwQk0rGjSFA
H9eNjecs3dwW4JRL+x5bMwnedC3MSMQAi41YPBlEnG9eIK+zvoeAi+w9JJqF2CtI
8Ob5NlyeyZQdE178G3qqi2lEZCMcpXZzGMEdIjBFBeEWKrMCy8r3IgOORtdww+8M
Kk+XW3KkMlmDZF5lB8WqMZSUYt39hUnSecoOYJcqUS2l/QlD8xTsKfFWK2b+2rOq
x3iCiwr4rKHgp3GOmx/sX+rO0EYZjA==
=sWz2
-----END PGP SIGNATURE-----
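
The role deduplication plays in the attack described at the end of this message can be shown with a toy model. The sketch below is not the real attack and not any hypervisor's code: `Host`, its methods, and the merge policy (the earliest-mapped frame survives) are assumptions made purely for illustration. It models only the one property that matters here: a hardware bit flip changes the frame directly, so every mapping that shares the frame sees it.

```python
class Host:
    """Toy host memory: frames hold bytes, VMs map guest pages onto frames."""

    def __init__(self) -> None:
        self.frames = {}      # frame number -> bytearray
        self.maps = {}        # (vm, guest_page) -> frame number
        self._next_frame = 0

    def map_page(self, vm: str, guest_page: int, data: bytes) -> None:
        frame = self._next_frame
        self._next_frame += 1
        self.frames[frame] = bytearray(data)
        self.maps[(vm, guest_page)] = frame

    def deduplicate(self) -> None:
        """Point identical pages at one frame.

        ASSUMPTION: the frame that was mapped first is the one that is kept.
        """
        first_frame_for = {}
        for key, frame in list(self.maps.items()):
            content = bytes(self.frames[frame])
            keeper = first_frame_for.setdefault(content, frame)
            self.maps[key] = keeper

    def flip_bit(self, frame: int, byte: int, bit: int) -> None:
        """Model a Rowhammer flip: it alters the frame, bypassing copy-on-write."""
        self.frames[frame][byte] ^= 1 << bit

    def read(self, vm: str, guest_page: int) -> bytes:
        return bytes(self.frames[self.maps[(vm, guest_page)]])


def victim_sees_flip(dedup: bool) -> bool:
    host = Host()
    host.map_page("attacker", 0, b"trusted-key")
    host.map_page("victim", 0, b"trusted-key")
    if dedup:
        host.deduplicate()
    # The attacker only ever hammers a frame backing its own page.
    host.flip_bit(host.maps[("attacker", 0)], byte=0, bit=0)
    return host.read("victim", 0) != b"trusted-key"


if __name__ == "__main__":
    print("without dedup:", victim_sees_flip(False))
    print("with dedup:   ", victim_sees_flip(True))
```

In this model the victim's data changes only when both pages share a frame, which matches the point made above: without cross-VM deduplication, flipping bits in one's own page gains nothing against another VM.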

Demi M. Obenour

Jun 19, 2020, 8:43:17 PM
to Marek Marczykowski-Górecki, Jean-Philippe Ouellet, qubes-devel
On 2020-06-19 19:59, Marek Marczykowski-Górecki wrote:
> On Fri, Jun 19, 2020 at 04:20:40PM -0700, Jean-Philippe Ouellet wrote:
>> Err, I wouldn't be so sure about that. If one can point a speculative
>> read gadget at the page tables, then you win, and we should probably
>> expect a steady stream of speculative read gadgets over the next
>> however many years. As for finding the page tables: it's probably
>> possible to find *at least* one reliably-locatable `mov <anything>,
>> cr3` gadget to speculatively execute, especially on kernels/hypervisors
>> which have lots of code mapped after a ctx switch into privileged
>> mode, and perhaps *even* more on architectures which allow unaligned
>> instructions like x86.
>
>> On my system:
>> $ objdump -D vmlinux | grep '%cr3,' | wc -l
>> 24
>> ^ there are plenty
>
> But you do realize this isn't really helpful for a cross-VM attack,
> (unless you use PV), right?
> For that, you need EPT tables, and a gadget in Xen. Probably not
> impossible, but several orders of magnitude harder. To the point, I
> wouldn't consider it a practical risk.

I will add that even if such an attack is possible, it would result in
massive unexplained CPU consumption, which may very well be detectable.
In the absence of deduplication, I am not even sure that intra-VM
attacks are possible.

Sincerely,

Demi
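
The distinction drawn in the quoted text, between a guest's own page tables and the EPT tables held by Xen, can be sketched as a two-stage lookup. The tables below are hypothetical single-level dictionaries with made-up frame numbers (real page tables and EPT are multi-level hardware structures). The sketch only illustrates that, for an HVM/PVH guest, reading its own page tables yields guest-physical frames, while the machine frames that determine DRAM placement sit behind a second mapping the guest does not normally see.

```python
# Stage 1: maintained by the guest kernel, readable from inside the guest.
GUEST_PAGE_TABLE = {0x10: 0x2, 0x11: 0x3}   # guest-virtual page -> guest-physical frame

# Stage 2: maintained by the hypervisor (EPT), ASSUMED invisible to the guest.
EPT = {0x2: 0x7A1, 0x3: 0x12C}              # guest-physical frame -> machine frame


def guest_physical(virtual_page: int) -> int:
    """What a guest learns by walking its own page tables."""
    return GUEST_PAGE_TABLE[virtual_page]


def machine_frame(virtual_page: int) -> int:
    """Where the data actually lives; requires the hypervisor-only EPT."""
    return EPT[guest_physical(virtual_page)]


def looks_adjacent(frame_a: int, frame_b: int) -> bool:
    return abs(frame_a - frame_b) == 1


if __name__ == "__main__":
    a, b = 0x10, 0x11
    print("adjacent in guest-physical space:",
          looks_adjacent(guest_physical(a), guest_physical(b)))
    print("adjacent in machine memory:      ",
          looks_adjacent(machine_frame(a), machine_frame(b)))
```

Two pages that look adjacent from inside the guest need not be adjacent in machine memory, so a leak of the guest's page tables alone does not give the layout needed to target another VM. Classic Xen PV guests handle machine frame numbers directly in their page tables, which is presumably why PV is named as the exception.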


Jean-Philippe Ouellet

Jun 19, 2020, 11:42:37 PM
to Marek Marczykowski-Górecki, Demi M. Obenour, qubes-devel
Oops. You're right.

Not sure what I was thinking. Thanks for making me realize I need some sleep :)


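On Fri, Jun 19, 2020 at 4:59 PM Marek Marczykowski-Górecki wrote:
> Deployed? I've seen a bunch of research papers about similar ideas, but
> quick google doesn't confirm existence of such thing in Android or
> elsewhere.
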
That's the impression I was under, probably from following references
found in https://vvdveen.com/publications/dimva2018.pdf

> > > > > Anyway, this is rather a question to relevant hypervisor developers (Xen
> > > > > in this case).
> > > > In the meantime, what does the Qubes Security Team recommend be done
> > > > to mitigate the risk of Rowhammer? ECC RAM is not available on most
> > > > laptops, and even it is only a partial mitigation. TRRespass showed
> > > > that TRR, at least as currently implemented, is not a complete
> > > > fix either.
> > >
> > > For extremely sensitive tasks you can shutdown/pause other VMs.
> >
> > Not sure how helpful pausing would be. I'd expect not much since it'd
> > still be resident in mem.
>
> Paused VMs can't really perform this kind of attack.

You're right, obviously (unless something is seriously broken, lol)

For some reason I was thinking you were suggesting to pause the
sensitive VMs while others were running, and thought "wat, no, that
doesn't protect the ones that are paused from the ones that aren't". I
missed the **other** VMs part :)

> But you do realize this isn't really helpful for a cross-VM attack,
> (unless you use PV), right?
> For that, you need EPT tables, and a gadget in Xen.

Hmm, indeed.

The same gadget does happen to appear in Xen as well:
[user@dom0 boot]$ zcat /boot/xen-4.8.5-18.fc25.gz > /tmp/xen
[user@dom0 boot]$ objdump -D /tmp/xen | grep '%cr3,' | wc -l
11

But yes, VMX and all. You're right. Thanks.

> > Furthermore, even if your privileged code is sufficiently hardened
> > against speculative bullshit, I wouldn't be surprised if there turned
> > out to be some additional hardware misfeature that could be abused to
> > leak physical layout directly. We have that already for virtual
> > addresses [1] (side-channel attacks against ASLR by timing TLB lookups
> > vs. page walk latency), which makes sense, and while I don't see any
> > direct analog for physical addresses (would look like a corresponding
> > reverse-lookup translation in hardware), it might be reasonable to
> > conjecture the plausibility of something that achieves the same goal
> > via an unrelated mechanism.
> >
> > [1]: https://www.vusec.net/projects/anc/
> >
> > > This is yet another reason to not use PV.
> > > If I'm not mistaken, the only demonstrated cross-VM rowhammer-based
> > > attack relies on memory-deduplication (across VMs), which we don't use.
> >
> > Mmmm, I wouldn't bet on that. It's certainly _easier_ to determine
> > where the thing you want to flip is - and whether it's been
> > successfully flipped yet - if that page is also mapped to you, but
> > that is not strictly a necessary condition for actually flipping it.
>
> That attack relied on deduplication not only to get information, but to
> actually put the page in the right place. It attacked its own page
> (conveniently using other RAM of the same VM) and it gave them anything
> only because the same page was mapped _also_ to another VM. Not the
> other way around.

Assuming you're talking about [1], you're right again. I wouldn't rule
out the plausibility of other attacks though.

[1]: https://download.vusec.net/papers/flip-feng-shui_sec16.pdf