A new type of potential security vulnerability

John Dallman

Jun 6, 2022, 7:20:04 AM
I'm starting to wonder if every type of speed-up technique comes with a
security vulnerability. This one is in a prefetcher in Apple's ARM
processors, and is described at https://www.prefetchers.info/

We present a new type of microarchitectural attack that leaks data
at rest: data that is never read into the core architecturally.
This attack technique, Augury, leverages a novel microarchitectural
optimization present in Apple Silicon: a Data Memory-Dependent
Prefetcher (DMP).

John

Andy Valencia

Jun 6, 2022, 9:25:30 AM
j...@cix.co.uk (John Dallman) writes:
> I'm starting to wonder if every type of speed-up technique comes with a
> security vulnerability.
> ...

I remember when I first learned about covert channels in the '70s, and
followed the analysis of how they could be plugged. Dike, meet fingers.
It seems there was indeed something to my very young intuition....

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

Terje Mathisen

Jun 6, 2022, 10:23:06 AM
It does; there are even a few papers that try to be thorough about all
the possible types. MDS is one of them:

https://www.techrepublic.com/article/why-mds-vulnerabilities-present-a-threat-as-serious-as-spectre-and-meltdown/

The key takeaway is that anything that leaves any kind of rubble behind
in the form of non-architectural side effects can and will be exploited.

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Anton Ertl

Jun 6, 2022, 12:16:50 PM
This prefetcher sees the software chasing some pointers, and then
predicts that it will continue to do so, and prefetches from there.
The page gives the following code as an example:

for (i = 0; i < len(arr); i++) {
    *arr[i];
}

Normal prefetchers prefetch the elements of arr; the DMP also prefetches
the items pointed to by arr[i]. The DMP looks to me like a hardware
implementation of what a gadget exploited in a Spectre attack does in
(speculatively executed) software.
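
For concreteness, a compilable version of that pattern might look like
the sketch below (the names are hypothetical; the paper's len() becomes
an explicit count). Any array-of-pointers walk of this shape could
train such a prefetcher:

#include <stddef.h>

/* Hypothetical victim loop: a plain array-of-pointers walk.  A normal
   stride prefetcher fetches the lines holding arr[0..n-1]; a DMP also
   dereferences the pointer values it sees and prefetches *arr[i] for
   entries the program has not reached yet. */
long sum_targets(long **arr, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += *arr[i];  /* load arr[i], then load *arr[i] */
    return sum;
}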

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

EricP

Jun 6, 2022, 12:46:46 PM
Anton Ertl wrote:
> j...@cix.co.uk (John Dallman) writes:
>> I'm starting to wonder if every type of speed-up technique comes with a
>> security vulnerability. This one is in a prefetcher in Apple's ARM
>> processors, and is described at https://www.prefetchers.info/
>>
>> We present a new type of microarchitectural attack that leaks data
>> at rest: data that is never read into the core architecturally.
>> This attack technique, Augury, leverages a novel microarchitectural
>> optimization present in Apple Silicon: a Data Memory-Dependent
>> Prefetcher (DMP).
>
> This prefetcher sees the software chasing some pointers, and then
> predicts that it will continue to do so, and prefetches from there.
> The page gives the following code as an example:
>
> for (i = 0; i < len(arr); i++) {
>     *arr[i];
> }
>
> Normal prefetchers prefetch the elements of arr; the DMP also prefetches
> the items pointed to by arr[i]. The DMP looks to me like a hardware
> implementation of what a gadget exploited in a Spectre attack does in
> (speculatively executed) software.
>
> - anton

Note that the addresses in the cache are virtual, so this prefetcher
has to forward them to the TLB and then back to the cache for the prefetch.

Possibly a prefetcher like this could also work for linked lists.
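
For example (a hypothetical node type; a sketch, not anything from the
paper): a stride prefetcher gains nothing on a walk like this, because
consecutive nodes have no address relationship, while a prefetcher that
dereferences loaded values could run ahead of it:

#include <stddef.h>

struct node {
    struct node *next;
    long payload;
};

/* A textbook list walk: each load's address comes from the previous
   load's data, so only a data-dependent prefetcher can run ahead. */
long sum_list(const struct node *p) {
    long sum = 0;
    for (; p != NULL; p = p->next)
        sum += p->payload;
    return sum;
}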

This might negatively impact SMP shared objects by chasing pointers and
prefetching objects that other nodes are holding Exclusive or Modified.
A prefetch might cause the line to downgrade to Shared only to have
the owning node upgrade it back to Exclusive when it writes again.
So you might want an off switch for heavily shared environments like an OS.

MitchAlsup

Jun 6, 2022, 1:26:45 PM
On Monday, June 6, 2022 at 11:46:46 AM UTC-5, EricP wrote:
> Anton Ertl wrote:
> > j...@cix.co.uk (John Dallman) writes:
> >> I'm starting to wonder if every type of speed-up technique comes with a
> >> security vulnerability. This one is in a prefetcher in Apple's ARM
> >> processors, and is described at https://www.prefetchers.info/
> >>
> >> We present a new type of microarchitectural attack that leaks data
> >> at rest: data that is never read into the core architecturally.
> >> This attack technique, Augury, leverages a novel microarchitectural
> >> optimization present in Apple Silicon: a Data Memory-Dependent
> >> Prefetcher (DMP).
> >
> > This prefetcher sees the software chasing some pointers, and then
> > predicts that it will continue to do so, and prefetches from there.
> > The page gives the following code as an example:
> >
> > for (i = 0; i < len(arr); i++) {
> >     *arr[i];
> > }
> >
> > Normal prefetchers prefetch the elements of arr; the DMP also prefetches
> > the items pointed to by arr[i]. The DMP looks to me like a hardware
> > implementation of what a gadget exploited in a Spectre attack does in
> > (speculatively executed) software.
> >
> > - anton
> Note that the addresses in the cache are virtual, so this prefetcher
> has to forward them to the TLB and then back to the cache for the prefetch.
>
> Possibly a prefetcher like this could also work for linked lists.
<
From page 3 of the paper:
<
[To prefetch entries of] B for future k*(i+delta), the DMP must use the data and
addresses it has seen so far to infer &B[0]. Note, A[k*i] is
an offset into an array in the program’s virtual address space,
whereas &B[A[k*i]] is likely a physical address. So, the
DMP must autonomously perform virtual to physical address
translations to identify data-address correspondences.
<
So it seems that the prefetcher has a TLB to solve the VA->PA problem.
>
> This might negatively impact SMP shared objects by chasing pointers and
> prefetching objects that other nodes are holding Exclusive or Modified.
<
Nothing was mentioned about performing cache coherence checks early,
just about removing latency from DRAM access.
<
> A prefetch might cause nodes to downgrade to Shared only to have
> the owning node upgrade back to Exclusive when it writes again.
> So you might want an off switch for heavily shared environments like OS.
<
But what intrigues me is the consideration that ANY autonomous prefetching
strategy might alter the run time in such a way that a covert channel is opened.
<
Obviously, putting the data in a cache close to the CPU opens such a channel.
But what would happen if the DRAM was accessed and the data never left the
DRAM controller? It seems to me that the latency of a later access to that
data would be lessened, and this in and of itself would be observable -- EVEN WHEN
no CPU-visible microarchitectural state has been altered!!
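<
A minimal sketch of how one might look for that effect, in the spirit of
DRAMA-style row-buffer measurements -- assuming the attacker has already
found two addresses in the same DRAM bank (the hard part, not shown) and
a cycle timer (x86 intrinsics shown only for familiarity):

#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp, _mm_clflush; x86 shown for familiarity */

/* Time one load.  With the line flushed, the load goes to DRAM; a
   row-buffer hit (row left open by the prefetch) completes measurably
   faster than a row conflict. */
static uint64_t time_load(volatile const char *p) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

/* same_row/other_row are hypothetical: both map to the bank the
   prefetch touched; same_row shares its row, other_row does not. */
uint64_t row_open_signal(volatile const char *same_row,
                         volatile const char *other_row) {
    _mm_clflush((const void *)same_row);        /* force both loads to DRAM */
    _mm_clflush((const void *)other_row);
    uint64_t t_hit = time_load(same_row);       /* fast if the row is open */
    uint64_t t_conflict = time_load(other_row); /* close + reopen: slow */
    return t_conflict - t_hit;  /* a large gap suggests the row was open */
}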
<
The implication of this is that microarchitectural state remote from any CPU
is part of the state that must not be altered until the causal instruction is retired
(leading to all sorts of speed-of-light problems in system design).
<
Consider a SATA disk driver that prefetches disk sectors such that no rotational
latency is observed when an actual access transpires. This in and of itself might
open such a covert channel.

Ivan Godard

Jun 6, 2022, 11:38:18 PM
Yes - it's hard to see how it makes an exploitable difference, except in
very contrived code. But if there's no exploitable difference in
practical code, then how can there be a beneficial difference in
practical code either? What code is there that is walking vectors of
pointers - pointers with cold targets worth prefetching - often enough
to get a measurable benefit?

Garbage collection maybe?

Stephen Fuld

Jun 7, 2022, 12:29:20 AM
This is vaguely disquieting. What about the caches within essentially
all disk drives? These work on exactly the same principle: eliminating
rotational latency. Is that exploitable? Even if you turn off the
on-drive cache, could you exploit whether the next I/O was exactly one
rotational latency later, as opposed to requiring a seek? This is
getting very murky, at least to me.



--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Anton Ertl

Jun 7, 2022, 3:02:28 AM
Ivan Godard <iv...@millcomputing.com> writes:
>But if there's no exploitable difference in
>practical code, then how can there be a beneficial difference in
>practical code either?

The practical benefit requires only access to data X. This is not a
vulnerability. To turn it into a vulnerability requires a side channel
based on the value of X. E.g., consider the example

for (i = 0; i < len(arr); i++) {
    *arr[i];
}

where the prefetcher fetches *arr[i] for some unaccessed i. This can
be used to reveal the value of arr[i] through a cache side-channel,
but not the value of *arr[i].
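
A minimal sketch of such a probe, under the usual flush+reload
assumptions (candidates[], HOT_THRESHOLD, and the x86 timing intrinsics
are all illustrative; Apple Silicon would need a different timer):

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtscp; x86 shown only for familiarity */

enum { NCANDIDATES = 256, HOT_THRESHOLD = 80 /* cycles, machine-dependent */ };

static uint64_t time_load(volatile const char *p) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    (void)*p;
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

/* candidates[] holds guesses for the pointer value stored at arr[i].
   If the DMP dereferenced arr[i], the matching candidate's cache line
   is hot, and only that load comes back under the threshold. */
void probe_candidates(volatile const char *candidates[NCANDIDATES]) {
    for (int c = 0; c < NCANDIDATES; c++)
        if (time_load(candidates[c]) < HOT_THRESHOLD)
            printf("arr[i] likely held %p\n", (void *)candidates[c]);
}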

So traditional non-pointer-chasing hardware prefetchers that are
trained only on architectural accesses do not reveal data at rest
(they can reveal architectural access patterns, but secret keys and
the like should not affect architectural access patterns in
competently written code).
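
"Competently written" here means, e.g., replacing secret-indexed table
lookups with scans that touch every entry, in the style of
constant-time crypto code (a sketch, not any particular library):

#include <stdint.h>
#include <stddef.h>

/* Leaky: which cache line is touched depends on the secret index --
   exactly what a cache (or prefetcher) side channel recovers. */
uint8_t lookup_leaky(const uint8_t table[256], uint8_t secret) {
    return table[secret];
}

/* Constant-time style: every entry is touched on every call, so the
   architectural access pattern is independent of the secret. */
uint8_t lookup_ct(const uint8_t table[256], uint8_t secret) {
    uint8_t r = 0;
    for (size_t i = 0; i < 256; i++) {
        uint8_t mask = (uint8_t)(0 - (uint8_t)(i == secret)); /* 0xFF iff match */
        r |= table[i] & mask;
    }
    return r;
}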

>What code is there that is walking vectors of
>pointers - pointers with cold targets worth prefetching - often enough
>to get a measurable benefit?

Walking through an array of objects in, e.g., Java (or lots of other
languages more recent than Fortran).

>Garbage collection maybe?

That, too.

- anton
--

Anton Ertl

Jun 7, 2022, 3:27:57 AM
Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>On 6/6/2022 10:26 AM, MitchAlsup wrote:
>> But what intrigues me is the consideration that ANY autonomous prefetching
>> strategy might alter the run time in such a way that a covert channel is opened.

Prefetching is a side channel for the addresses (not the data) used by
the prefetcher. A linear prefetcher just reveals that there were some
accesses to earlier elements of the linear array, something that is
already revealed more precisely through cache side channels without
the prefetcher.

The special thing about Apple's prefetcher is that it uses prefetched
data as addresses, and thus reveals that data through a cache side
channel.

>> Consider a SATA disk driver that prefetches disk sectors such that no rotational
>> latency is observed when an actual access transpires. This in and of itself might
>> open such a covert channel.

It tells you that one or more of the sectors on that track have been
accessed. Can that be used for revealing important secrets? Not in
competently-written code. Also, can it be used for revealing things
that are not revealed by caching only the sector?

>This is vaguely disquieting. What about the caches within essentially
>all disk drives? These work on exactly the same principle: eliminating
>rotational latency. Is that exploitable?

It reveals as little as the OS's prefetching.

>Even if you turn off the
>on-drive cache, could you exploit whether the next I/O was exactly one
>rotational latency later, as opposed to requiring a seek? This is
>getting very murky, at least to me.

While some amount of paranoia is useful when trying to defend against
potential exploits, it has to be tempered with an analysis of what
data can be revealed, and whether that data is already revealed in
some other, equally exploitable way. Letting your imagination run
wild leads either to panic or to ignoring the topic out of
hopelessness; neither attitude is useful.

- anton
--

Skybuck Flying

Jun 7, 2022, 4:16:08 AM
I kinda see one; I call it "the combo attack".

Hackers using/penetrating multiple subsystems that together hack the system.

Bye,
Skybuck.

EricP

Jun 7, 2022, 9:25:26 AM
MitchAlsup wrote:
> On Monday, June 6, 2022 at 11:46:46 AM UTC-5, EricP wrote:
>>> j...@cix.co.uk (John Dallman) writes:
>>>> I'm starting to wonder if every type of speed-up technique comes with a
>>>> security vulnerability. This one is in a prefetcher in Apple's ARM
>>>> processors, and is described at https://www.prefetchers.info/
>>>>
>> This might negatively impact SMP shared objects by chasing pointers and
>> prefetching objects that other nodes are holding Exclusive or Modified.
> <
> Nothing was mentioned about performing cache coherence checks early,
> just about removing latency from DRAM access.
> <
>> A prefetch might cause the line to downgrade to Shared only to have
>> the owning node upgrade it back to Exclusive when it writes again.
>> So you might want an off switch for heavily shared environments like an OS.

I wasn't considering data leaks, just thinking that this data
prefetch mechanism itself could easily bugger up performance.

I don't see any point to prefetching without saving the cache line,
and in order to do that they have to play by the coherence rules.

The code paths are guarded by mutexes, which use memory barriers
and/or atomic ops, so there should be no premature prefetching
into shared data regions due to execution speculation.

But this data prefetcher knows nothing of the guarding mutexes.
If one node starts reading (Shared) a linked list of objects
while some other node has that list locked for update,
it could cause cache lines to ping-pong between nodes.

It also might be a good idea for this DMP to check whether it just
prefetched a line that was subsequently taken back. Imagine if some
code wound up in a fight with this invisible automatic prefetcher
repeatedly grabbing away ownership.
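
A toy version of the pattern I'm worried about (POSIX threads; the
coherence traffic itself is invisible to C, so the comments just
describe the expected MESI behavior; all names are hypothetical):

#include <pthread.h>
#include <stddef.h>

struct node { struct node *next; long payload; };

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: holds the lock and dirties nodes, keeping their cache
   lines Modified on this core. */
void update_list(struct node *head) {
    pthread_mutex_lock(&list_lock);
    for (struct node *p = head; p != NULL; p = p->next)
        p->payload++;
    pthread_mutex_unlock(&list_lock);
}

/* Reader on another core: architecturally it touches only table[],
   never the nodes.  A DMP that dereferences the pointer values it
   sees could still prefetch the nodes themselves, yanking the
   writer's Modified lines down to Shared behind the mutex's back. */
size_t count_nonnull(struct node **table, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (table[i] != NULL);
    return count;
}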

