Discussion: Safeguards on speculative execution?


Jacob Bachmeyer

Jan 4, 2018, 1:49:15 AM
to RISC-V ISA Dev
The recent mess with bad out-of-order speculative execution in Intel
processors (which, as I understand it, do actually implement the
published x86 ISA) suggests to me that we should examine our ISA for
loopholes that could lead to similar issues in RISC-V implementations.

To start a discussion, I suggest that all instructions have a control
dependency on any preceding instructions in program order that can cause
exceptions (excluding interrupt exceptions, which can occur at any
instruction), until those potential traps are resolved. Implementations
are permitted to speculate through ECALL and into the trap handler,
however, since the ECALL trap is unconditional and therefore resolved
upon decoding ECALL.
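To illustrate the kind of sequence this rule would serialize, here is a
hedged C sketch of the dependent-load pattern (the probe array and
4096-byte stride follow the published Meltdown description; all names
are illustrative, not part of any spec):

```c
#include <stdint.h>
#include <stddef.h>

#define STRIDE 4096                 /* one page per possible byte value */
static uint8_t probe[256 * STRIDE]; /* attacker-observable via cache timing */

/*
 * Sketch of the dependent-load pattern Meltdown exploits.  If the first
 * load can fault (e.g. a user-mode read of a kernel address), the rule
 * proposed above would forbid speculatively issuing the second,
 * value-dependent load before the potential trap is resolved.
 */
uint8_t dependent_load(const volatile uint8_t *p) {
    uint8_t v = *p;                     /* may trap; speculation must stop here */
    return probe[(size_t)v * STRIDE];   /* leaves a value-dependent cache trace */
}
```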


On another side note, Intel TSX apparently allows suppression of page
fault traps, instead simply failing the transaction. We should ensure
that RVA and RVT do not repeat this mistake -- a page fault occurring
during an LR/SC block or transaction must trap (thus breaking atomicity
and failing the transaction) to avoid providing a means to replicate
Meltdown on RISC-V.

Currently, the only concern I see here is the question of whether a
failed SC translates its address. I originally suggested that SC should
always translate its address, but Cesar Eduardo Barros gave (in
message-id <064caf54-b612-195d...@cesarb.eti.br>
<URL:https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/064caf54-b612-195d-5fce-4efb98a8f934%40cesarb.eti.br>)
a good argument for permitting certain failing SCs to ignore page
faults. Can we use that suggestion or does it now raise concerns about
another Meltdown vector?


Spectre produces more serious concerns. I suggest a HINT for
security-sensitive branches (or possibly the other way around: a HINT
for branches that are permitted to use dynamic branch prediction) be
added. Indirect branch target buffers should be either keyed on an ASID
column (and ASID-selectively flushed when an ASID's root PPN changes) or
flushed entirely upon context-switch. Could flushing an indirect target
buffer upon xRET be sufficient if separate indirect target buffers are
maintained for each implemented privilege level?
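As a sketch of what "keyed on an ASID column" could mean (a toy
direct-mapped BTB in C; all sizes and field names are invented for
illustration, not a proposal for actual hardware layout):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define BTB_ENTRIES 256

struct btb_entry {
    bool     valid;
    uint16_t asid;     /* entry only hits for the ASID that trained it */
    uint64_t tag;      /* branch PC */
    uint64_t target;   /* predicted target */
};

static struct btb_entry btb[BTB_ENTRIES];

static size_t btb_index(uint64_t pc) { return (pc >> 2) % BTB_ENTRIES; }

void btb_train(uint16_t asid, uint64_t pc, uint64_t target) {
    btb[btb_index(pc)] = (struct btb_entry){
        .valid = true, .asid = asid, .tag = pc, .target = target };
}

/* Returns true and fills *target only on a same-ASID hit. */
bool btb_predict(uint16_t asid, uint64_t pc, uint64_t *target) {
    struct btb_entry *e = &btb[btb_index(pc)];
    if (!e->valid || e->asid != asid || e->tag != pc)
        return false;
    *target = e->target;
    return true;
}

/* Selective flush when an ASID's root PPN changes. */
void btb_flush_asid(uint16_t asid) {
    for (size_t i = 0; i < BTB_ENTRIES; i++)
        if (btb[i].valid && btb[i].asid == asid)
            btb[i].valid = false;
}
```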

Generally, flushing branch prediction buffers on xRET should be low-cost
or even an improvement, since the branch buffer contains irrelevant
information immediately after a context switch. Alternately,
partitioning branch prediction buffers on privilege level and ASID would
provide isolation to close the side channel entirely.

An architectural guarantee that each hart has its own branch prediction,
return stack, and other performance features would help; lack of such
isolation allows Spectre to cross hyper-threads on Haswell.


Other ideas?


-- Jacob

Christoph Hellwig

Jan 4, 2018, 2:19:25 AM
to jcb6...@gmail.com, isa...@groups.riscv.org
The only really effective fix for Spectre is split supervisor/user
page tables. And I'm about to post actual spec patches for that.

Stefan O'Rear

Jan 4, 2018, 2:31:08 AM
to Christoph Hellwig, jcb6...@gmail.com, isa...@groups.riscv.org
Please do not spread misinformation.

Supervisor/user page tables are not relevant to Spectre; you are
thinking of Meltdown.

Meltdown does not require split page tables; you will notice that AMD
and most of Arm's lineup (except for the A75) are immune to Meltdown
despite unified page tables. The only requirement is that page table
permissions be checked prior to bypassing L1 read results; since the
TLB result is needed for way selection, this is most likely to be an
issue in designs which use a way predictor. It is not an ISA issue
and does not require an ISA fix.

-s

Andrew Waterman

Jan 4, 2018, 2:32:31 AM
to Stefan O'Rear, Christoph Hellwig, jcb6...@gmail.com, isa...@groups.riscv.org
+1

Stefan O'Rear

Jan 4, 2018, 3:10:55 AM
to Jacob Bachmeyer, RISC-V ISA Dev
On Wed, Jan 3, 2018 at 10:49 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> The recent mess with bad out-of-order speculative execution in Intel
> processors (which do actually implement the published x86 ISA as I
> understand) suggests to me that we should examine our ISA for loopholes that
> could lead to similar issues in RISC-V implementations.

1. This isn't primarily an ISA concern. We can provide implementation
guidance at other layers.

2. Nine hours after the first public disclosure is far too early to
discuss permanent mitigations. We can do things but we need to start
from a perspective that those things will prove useless after more
research and need to be *removed* and replaced.

3. Sanctum ( https://eprint.iacr.org/2015/564 ) and related designs
(e.g. Antikernel) are the only end state here; we'll get there in my
lifetime.

> To start a discussion, I suggest that all instructions have a control
> dependency on any preceding instructions in program order that can cause
> exceptions (excluding interrupt exceptions, which can occur at any
> instruction), until those potential traps are resolved. Implementations are
> permitted to speculate through ECALL and into the trap handler, however,
> since the ECALL trap is unconditional and therefore resolved upon decoding
> ECALL.

This is a rather imprecise statement and appears to be using "control
dependency" in a sense other than the standard one associated with
memory models.

> On another side note, Intel TSX apparently allows suppression of page fault
> traps, instead simply failing the transaction. We should ensure that RVA
> and RVT do not repeat this mistake -- a page fault occurring during an LR/SC
> block or transaction must trap (thus breaking atomicity and failing the
> transaction) to avoid providing a means to replicate Meltdown on RISC-V.

Meltdown doesn't require transactions. It does require aggressive
bypassing, which can be avoided at the microarchitectural level.

> Currently, the only concern I see here is the question of whether a failed
> SC translates its address. I originally suggested that SC should always
> translate its address, but Cesar Eduardo Barros gave (in message-id
> <064caf54-b612-195d...@cesarb.eti.br>
> <URL:https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/064caf54-b612-195d-5fce-4efb98a8f934%40cesarb.eti.br>)
> a good argument for permitting certain failing SCs to ignore page faults.
> Can we use that suggestion or does it now raise concerns about another
> Meltdown vector?

I've read most of that (very long) thread and remain baffled as to
what problem is actually being solved.

> Spectre produces more serious concerns. I suggest a HINT for
> security-sensitive branches (or possibly the other way around: a HINT for
> branches that are permitted to use dynamic branch prediction) be added.

That sounds way too similar to the CSDB hint Arm announced today. Be careful.

> Indirect branch target buffers should be either keyed on an ASID column (and
> ASID-selectively flushed when an ASID's root PPN changes) or flushed
> entirely upon context-switch. Could flushing an indirect target buffer upon
> xRET be sufficient if separate indirect target buffers are maintained for
> each implemented privilege level?

This is a very reasonable suggestion for anything in development
today, again, a bit out of scope for an ISA.

> Generally, flushing branch prediction buffers on xRET should be low-cost or
> even an improvement, since the branch buffer contains irrelevant information
> immediately after a context switch. Alternately, partitioning branch
> prediction buffers on privilege level and ASID would provide isolation to
> close the side channel entirely.

Not really low-cost. Syscalls make the user-mode code after them
slower (often 10K+ cycles lost in total) due to evicted pages and
branch information; you are proposing to unconditionally evict all of
the branch information. Partitioning is IMO a better idea, but again,
out of scope for an ISA.

> An architectural guarantee that each hart has its own branch prediction,
> return stack, and other performance features would help; lack of such
> isolation allows Spectre to cross hyper-threads on Haswell.

"Architectural" is generally understood to exclude side channels.
Let's pick a different word.

-s

Jonas Oberhauser

Jan 4, 2018, 8:47:38 AM
to RISC-V ISA Dev
What I don't quite understand is why this is a spec issue. Shouldn't it simply be possible to evict all speculatively read cachelines on misspeculation? In particular, only those which were not already loaded before the speculation. This seems cheap in hardware and would solve the issue.

Is there something I am missing?

Cesar Eduardo Barros

Jan 4, 2018, 9:12:08 AM
to Jonas Oberhauser, RISC-V ISA Dev
On 04-01-2018 11:47, Jonas Oberhauser wrote:
> What I don't quite understand is why this is a spec issue. Shouldn't it simply be possible to evict all speculatively read cachelines on misspeculation? In particular only those which were not already loaded before the speculation. This seems to be cheap in HW and solve the issue.
>
> Is there something I am missing?
>

The speculative load of a cacheline probably evicted something else from
the cache. This second-order effect could probably be used to detect
whether a cacheline had been speculatively loaded, even if the
speculative load is undone.

And there might be other ways to observe the speculative load. Coherence
protocol (the corresponding cacheline on another core changing its
state), bus traffic (a "memory virus" on another core sensing increased
latency), subtle differences on microarchitectural state (leading to
slightly different latencies on following instructions, for instance
some internal buffer getting "misaligned"), ...

The only "safe" way, as far as I can see, is to wait on an L1 cache miss
until the fetch is no longer speculative. And even that might not be
safe, if it changes something in the cache replacement state which leads
to a change in which cacheline gets evicted later (with LRU, on a hit
the cacheline will no longer be the "least" recently used).
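To make this concrete with a toy two-way LRU set (a sketch in C; the
structure is invented for illustration): even if the speculatively
fetched line is invalidated on rollback, the line it displaced is
already gone, and that absence is observable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy 2-way LRU cache set: way[0] is MRU, way[1] is LRU, -1 is empty. */
static int64_t way[2] = { -1, -1 };

bool set_contains(int64_t line) { return way[0] == line || way[1] == line; }

void set_fill(int64_t line) {
    if (set_contains(line)) {          /* hit: promote to MRU */
        if (way[1] == line) { way[1] = way[0]; way[0] = line; }
        return;
    }
    way[1] = way[0];                   /* miss fill: silently evict the LRU line */
    way[0] = line;
}

void set_invalidate(int64_t line) {    /* "undo" a speculative fill */
    if (way[0] == line) { way[0] = way[1]; way[1] = -1; }
    else if (way[1] == line) way[1] = -1;
}
```

Invalidating the speculative line does not bring back its victim, so the
second-order eviction remains visible to a later probe.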

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Cesar Eduardo Barros

Jan 4, 2018, 9:31:08 AM
to jcb6...@gmail.com, RISC-V ISA Dev
On 04-01-2018 04:49, Jacob Bachmeyer wrote:
>
> Spectre produces more serious concerns.  I suggest a HINT for
> security-sensitive branches (or possibly the other way around:  a HINT
> for branches that are permitted to use dynamic branch prediction) be
> added.  Indirect branch target buffers should be either keyed on an ASID
> column (and ASID-selectively flushed when an ASID's root PPN changes) or
> flushed entirely upon context-switch.  Could flushing an indirect target
> buffer upon xRET be sufficient if separate indirect target buffers are
> maintained for each implemented privilege level?

I don't like the HINT idea, sounds too much like "enumerating badness"
to me. A single missed HINT (or a wrongly placed one, for the opposite
idea) could be enough to blow the doors wide open, and defining what is
"security-sensitive" isn't necessarily obvious, especially when an
innocent-looking branch could be used with unexpected values to read
unrelated data.

Separate branch prediction state for each privilege level sounds like an
interesting idea. If you go further with the isolation, you can end up
with a design which works as if each privilege level were on its own
separate and isolated hart. Given how small RISC-V designs can get, it's
a fun thought experiment: what if, instead of several privilege levels
in a single hart, you had two separate harts, a smaller one which ran
exclusively in S-mode, and another one which ran exclusively in U-mode?

>
> Generally, flushing branch prediction buffers on xRET should be low-cost
> or even an improvement, since the branch buffer contains irrelevant
> information immediately after a context switch.  Alternately,
> partitioning branch prediction buffers on privilege level and ASID would
> provide isolation to close the side channel entirely.

The kernel and user modes are not two separate programs, they are part
of the same program. When my program is in a loop doing read()/write()
system calls, it switches into the kernel, does some work, and switches
back; the branch prediction state is still relevant for the next time it
switches into the kernel, and for a predictor which uses branch history,
the history of the branches in user mode can be relevant to predict
where it will branch in the kernel ("if the previous branch was that
loop in user space, then the kernel will jump to SYS_read, otherwise it
will jump to SYS_write").

Of course, that has to be balanced with information leaks, so using
branch history from a different protection level is not that good an
idea, even if, as on my example, it could make the predictions somewhat
better.

Christoph Hellwig

Jan 4, 2018, 9:41:06 AM
to sor...@gmail.com, jcb6...@gmail.com, isa...@groups.riscv.org
On Wed, 2018-01-03 at 23:31 -0800, Stefan O'Rear wrote:
>
> Supervisor/user page tables are not relevant to Spectre; you are
> thinking of Meltdown.

Yes, but I would not call using the wrong code name "misinformation".

> Meltdown does not require split page tables; you will notice that AMD
> and most of Arm's lineup (except for the A75) are immune to Meltdown
> despite unified page tables.  The only requirement is that page table
> permissions be checked prior to bypassing L1 read results; since the
> TLB result is needed for way selection, this is most likely to be an
> issue in designs which use a way predictor.  It is not an ISA issue
> and does not require an ISA fix.

It does not require split page tables, but they are the most clear
architectural barrier. Of course you can always still rely on
verification being absolutely accurate for speculative loads.

Also note that architectures other than Intel's and the Arm A75 aren't
necessarily immune - it is just that no exploits have been shown so
far. I would be very surprised if there weren't other implementations
with similar issues.

Eric McCorkle

Jan 4, 2018, 10:40:53 AM
to isa...@groups.riscv.org
On 01/04/2018 03:10, Stefan O'Rear wrote:

> 2. Nine hours after the first public disclosure is far too early to
> discuss permanent mitigations. We can do things but we need to start
> from a perspective that those things will prove useless after more
> research and need to be *removed* and replaced.

Also, this disclosure strongly suggests a *class* of attacks, based on
observing performance side-effects from implementation decisions. I
would be surprised if it's the last such attack.

Addressing this class of attacks is going to require extensive analysis
of architectural implementation techniques, and will probably go on for
some time.

Michael Clark

Jan 4, 2018, 4:01:13 PM
to Eric McCorkle, RISC-V ISA Dev
To be precise, there are two classes of attacks here:

- Meltdown: timing side effects from speculation past faulting instructions or instructions that bypass permissions
- Spectre: timing side effects from speculation in general, e.g. cache effects from speculation of data-dependent loads after branches

I really liked the Google Project Zero write-up (in addition to the Meltdown and Spectre papers), as it goes into a lot of detail on Spectre, which is a much more general class of attacks:

- https://googleprojectzero.blogspot.co.nz/2018/01/reading-privileged-memory-with-side.html

The Branch Target Buffer attacks are quite interesting, as this is a case where there is an explicit co-mingling of state (address hashes) that crosses privilege boundaries.

Before the disclosure I was speculating about possible vectors and thinking about TLB hit vs. TLB miss latency. However, a TLB by its nature must include all bits relevant to a privilege domain: while it's possible to alias a virtual address from another privilege domain in a virtually indexed, physically tagged L1 cache, as long as there is no timing side channel from such aliasing, it can't be exploited. ASID (aka PCID on x86) reduces aliasing across one boundary and reduces the expense of a TLB flush, by allowing a single domain/ASID to be flushed.

I find the Spectre attacks more interesting, as the researchers have found micro-architectural structures, such as the Branch Target Buffer, that essentially co-mingle state between privilege boundaries: the attack targets the hash compression algorithm used for entries in the BTB, where the XOR of address bits (for compression) creates an explicit side channel between privilege boundaries.

The general Spectre class requires that speculation and other micro-architectural optimisation structures have no visible artefacts across privilege domains. This seems difficult for a fast branch predictor: to secure it, the hashing scheme can no longer co-mingle bits between privilege domains, e.g. between two SMT logical hardware threads running on the same physical core and sharing a BTB. To make the BTB robust, one would need to include the PCID/ASID in entries, but the increased size of entries may impact performance, as the BTB is an optimisation structure.
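As a toy illustration of such aliasing (the XOR-fold below is invented
for illustration; real BTB hashes differ), any two PCs that differ only
in bit positions the fold XORs together map to the same entry, so an
attacker can construct a branch that shares predictor state with a
victim branch in another privilege domain:

```c
#include <stdint.h>

/* Toy BTB tag-compression hash: XOR-fold a 32-bit PC into 16 bits.
 * Flipping bit 16 and bit 0 together leaves the folded value unchanged,
 * so the two distinct PCs collide on the same predictor entry. */
uint16_t btb_hash(uint32_t pc) {
    return (uint16_t)(pc ^ (pc >> 16));
}
```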

The Meltdown class is simply buggy hardware. PREFETCH instructions and TSX should simply fault, and any cache side effects of speculation, even with correct permissions, should be rolled back. One can also imagine attacks that exploit speculation of data-dependent loads after branches, whether against naturally vulnerable code or code generated via the eBPF JIT: cache ways are shared between both privilege domains, so if an eviction on one side of the boundary reveals which way a branch was taken in an exploitable system call on the other side, there is an info leak.

Shared caches are problematic. Spectre is in the same class of attacks as FLUSH+RELOAD.

I can see there will be a lot of emphasis on constant-time operation, where the timing of operations is not influenced by other privilege domains. Features such as explicit cache control and cache partitioning for security-sensitive code are going to become important. One imagines that if code in a given ASID is not influenced by the L1 and L2 cache effects of other ASIDs and chooses its loads and stores wisely, then it can avoid such info leaks, but this requires that all micro-architectural state is separate between domains, apart from the memory and registers for the message being passed and returned.

Jacob Bachmeyer

Jan 4, 2018, 10:21:06 PM
to Christoph Hellwig, isa...@groups.riscv.org
Christoph Hellwig wrote:
> The only really effective fix for spectre are split supervisor/user
> pagetables. And I'm about to post actual spec patches for that.

Thank you for CC'ing me on those; I have made a proposal "RVSdas" that
addresses the (relatively minor) quibbles I have with that patchset.

I was planning to propose RVSdas again anyway after thinking about these
attacks overnight. We need it now, but there is no rush -- specs to
hardware has months of latency, so getting it right is more important
than getting it fast, unlike in software. :-)

Further, if I understand Spectre correctly, splitting the page tables
only fixes some of the problems. While it protects the supervisor, it
does not help with user programs and split page tables do nothing to
prevent the JavaScript version of the attack from leaking data from
other parts of a browser, for example. RVJ will need to consider these
risks.


-- Jacob

Jacob Bachmeyer

Jan 4, 2018, 10:28:51 PM
to Christoph Hellwig, sor...@gmail.com, isa...@groups.riscv.org
I believe that both of you are mistaken. While split page tables
absolutely stop Meltdown (and numerous other attacks... and can improve
performance... and make more address space available to user programs...
etc.), AMD did get verification on speculative loads correct, so we have
an existence proof for Meltdown-proof systems with unified page tables.
On the other hand, Meltdown *is* an ISA issue -- the vulnerable Intel
processors (as I understand) correctly implement the published x86 ISA,
which *allows* the very speculation that Meltdown exploits.

Put simply, Meltdown is the consequence of a loophole in the x86 ISA.
We should make certain that we do not have similar loopholes in the
RISC-V ISA, which was my point in starting this thread.


-- Jacob

Jacob Bachmeyer

Jan 4, 2018, 11:02:39 PM
to Stefan O'Rear, RISC-V ISA Dev
Stefan O'Rear wrote:
> On Wed, Jan 3, 2018 at 10:49 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> The recent mess with bad out-of-order speculative execution in Intel
>> processors (which do actually implement the published x86 ISA as I
>> understand) suggests to me that we should examine our ISA for loopholes that
>> could lead to similar issues in RISC-V implementations.
>>
>
> 1. This isn't primarily an ISA concern. We can provide implementation
> guidance at other layers.
>

The published x86 ISA (explicitly, as I understand) allows the very
speculation that Meltdown exploits.

> 2. Nine hours after the first public disclosure is far too early to
> discuss permanent mitigations. We can do things but we need to start
> from a perspective that those things will prove useless after more
> research and need to be *removed* and replaced.
>

Perhaps, but a patch set to change the spec has already been written and
people were already asking on hw-dev if Rocket and BOOM are affected.
That the Spectre paper specifically mentioned RISC-V's indirect jump was
not reassuring.

I do not believe that isolation requirements will ever prove useless.
Inconvenient, perhaps, but not useless.

>> To start a discussion, I suggest that all instructions have a control
>> dependency on any preceding instructions in program order that can cause
>> exceptions (excluding interrupt exceptions, which can occur at any
>> instruction), until those potential traps are resolved. Implementations are
>> permitted to speculate through ECALL and into the trap handler, however,
>> since the ECALL trap is unconditional and therefore resolved upon decoding
>> ECALL.
>>
>
> This is a rather imprecise statement and appears to using "control
> dependency" in a sense other than the standard one associated with
> memory models.
>

I may have misused the term, since I am still learning this area. Put
simply, instructions should have dependencies on the "raise exception"
output of all preceding instructions in program order. This should be
sufficient to prevent speculation past an exception, but hopefully not
necessary and a weaker rule can also be sufficient. Meltdown definitely
relies on this rule being violated, and I suspect that at least some
forms of Spectre can be similarly prevented.

>> On another side note, Intel TSX apparently allows suppression of page fault
>> traps, instead simply failing the transaction. We should ensure that RVA
>> and RVT do not repeat this mistake -- a page fault occurring during an LR/SC
>> block or transaction must trap (thus breaking atomicity and failing the
>> transaction) to avoid providing a means to replicate Meltdown on RISC-V.
>>
>
> Meltdown doesn't require transactions. It does require aggressive
> bypassing, which can be avoided at the microarchitectural level.
>

The ability to use Intel TSX to suppress page faults approximately
quadruples the channel capacity that Meltdown exploits. Defense in
depth requires us to avoid replicating Intel's mistake. Transactions
are irrelevant -- what matters is that Intel processors do not trap on
page fault in a TSX group.

>> Currently, the only concern I see here is the question of whether a failed
>> SC translates its address. I originally suggested that SC should always
>> translate its address, but Cesar Eduardo Barros gave (in message-id
>> <064caf54-b612-195d...@cesarb.eti.br>
>> <URL:https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/064caf54-b612-195d-5fce-4efb98a8f934%40cesarb.eti.br>)
>> a good argument for permitting certain failing SCs to ignore page faults.
>> Can we use that suggestion or does it now raise concerns about another
>> Meltdown vector?
>>
>
> I've read most of that (very long) thread and remain baffled as to
> what problem is actually being solved.
>

At base, standardizing an edge case that is currently
implementation-defined. The original motivation was for performance:
if LR "faults in" a page, the supervisor might establish the mapping as
copy-on-write. The page fault ensures that SC fails and the LR/SC loop
iterates. On the next iteration, SC takes a page fault -- the target
page is read-only at hardware, and the supervisor copies the page -- SC
fails again. On the third iteration, the SC finally succeeds. If SC
takes the page fault on the first iteration, even though there has been
an exception and the reservation is lost, SC will succeed on the second
iteration.
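For reference, the loop in question is the usual LR/SC retry idiom; in
portable C11 the analogous compare-and-swap retry loop looks like this
(a sketch only -- C11 atomics typically compile down to LR/SC on
RISC-V, but this is not the LR/SC instruction sequence itself):

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Portable analogue of a RISC-V LR/SC loop (lr.w / addi / sc.w / bnez).
 * Each failed exchange -- like a failed sc.w, whether from a lost
 * reservation or a page fault taken on the first pass -- just sends
 * control back around the loop until the update lands atomically.
 */
uint32_t atomic_increment(_Atomic uint32_t *p) {
    uint32_t old = atomic_load_explicit(p, memory_order_relaxed);
    /* On failure, 'old' is refreshed with the current value; retry. */
    while (!atomic_compare_exchange_weak(p, &old, old + 1))
        ;
    return old + 1;
}
```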

The argument for permitting certain failing SCs to ignore page faults
applies to concurrent programs that can "freeze" an object as part of
memory management. This was admitted to be a contrived situation, but I
believe that real programs in larger concurrent systems with managed
memory might use this scenario eventually if not today. On the other
hand, using TSX to suppress exceptions quadrupled the performance of a
Meltdown attack, and I am asking for people more knowledgeable than I to
examine that issue. I do not know if suppressing page fault traps on
failed SCs can have similar benefits to an attacker.

>> Spectre produces more serious concerns. I suggest a HINT for
>> security-sensitive branches (or possibly the other way around: a HINT for
>> branches that are permitted to use dynamic branch prediction) be added.
>>
>
> That sounds way too similar to the CSDB hint Arm announced today. Be careful.
>

Explain? Restricting branch prediction is an obvious solution to
attacks that rely on causing other code to mispredict branches. And I
posted that yesterday, mere hours after details of an exploit that
relies on branch misprediction were published. I do *not* have inside
information from ARM and the isa-dev archives are public.

>> Indirect branch target buffers should be either keyed on an ASID column (and
>> ASID-selectively flushed when an ASID's root PPN changes) or flushed
>> entirely upon context-switch. Could flushing an indirect target buffer upon
>> xRET be sufficient if separate indirect target buffers are maintained for
>> each implemented privilege level?
>>
>
> This is a very reasonable suggestion for anything in development
> today, again, a bit out of scope for an ISA.
>

The RISC-V ISA spec has suggested implementation details in commentary,
so I believe that these issues are in-scope.

>> Generally, flushing branch prediction buffers on xRET should be low-cost or
>> even an improvement, since the branch buffer contains irrelevant information
>> immediately after a context switch. Alternately, partitioning branch
>> prediction buffers on privilege level and ASID would provide isolation to
>> close the side channel entirely.
>>
>
> Not really low-cost. Syscalls make the user-mode code after them
> slower (often 10K+ cycles lost in total) due to evicted pages and
> branch information; you are proposing to unconditionally evict all of
> the branch information. Partitioning is IMO a better idea, but again,
> out of scope for an ISA.
>

But again, in-scope for implementation guidance, which the RISC-V ISA
spec has in commentary.

>> An architectural guarantee that each hart has its own branch prediction,
>> return stack, and other performance features would help; lack of such
>> isolation allows Spectre to cross hyper-threads on Haswell.
>>
>
> "Architectural" is generally understood to exclude side channels.
> Let's pick a different word.
>

Fair enough; I was using it in the same sense as the architectural
guarantee of forward progress for LR/SC. Any suggestions?


-- Jacob

Jacob Bachmeyer

Jan 4, 2018, 11:09:57 PM
to Eric McCorkle, isa...@groups.riscv.org
I agree; that is why I moved to start a discussion now. Solving
existing problems is one thing, but preventing our efforts from
introducing new problems is even better.

Meltdown exploits speculative execution that the x86 ISA permits, so
there is an ISA issue involved.


-- Jacob

Eric McCorkle

Jan 4, 2018, 11:28:19 PM
to jcb6...@gmail.com, isa...@groups.riscv.org
On 01/04/2018 23:09, Jacob Bachmeyer wrote:

> I agree; that is why I moved to start a discussion now.  Solving
> existing problems is one thing, but preventing our efforts from
> introducing new problems is even better.
>
> Meltdown exploits speculative execution that the x86 ISA permits, so
> there is an ISA issue involved.

I'm working on some countermeasure ideas based on the fact that
speculative execution must still obey dependency relationships, and that
if sensitive data never makes it to the core, then it never makes it to
a side-channel. Specifically, if you can ensure that fetches of
sensitive information depend on completion of address translation, then
the sensitive data never hits the core, and therefore, never makes it
into a side-channel (at least, not with these attacks).

On existing architectures, you ought to be able to defeat these attacks
by keeping sensitive info in non-cached memory ranges, thus fetches
depend on address translation, which detects faults (unless I'm missing
something). There's probably a lot more you can do in this direction
with RISC-V.

Also, keeping sensitive information safe from this kind of thing was a
major rationale for the engines extension proposal.

Eric McCorkle

Jan 4, 2018, 11:41:48 PM
to isa...@groups.riscv.org
If I understand the attack correctly, you basically jump into someone
else's address space having pre-loaded your registers so as to do
computation on some sensitive data, and hopefully soak up enough
information about the sensitive data into your branch predictors that
you can reconstruct it after the fault kicks you out.

To reliably defeat the attack, it seems like you'd have to somehow make
any load of at least the sensitive information depend on address
translation (or something else that detects the fault).

Jacob Bachmeyer

Jan 4, 2018, 11:56:52 PM
to Cesar Eduardo Barros, RISC-V ISA Dev
Cesar Eduardo Barros wrote:
On 04-01-2018 04:49, Jacob Bachmeyer wrote:
>> Spectre produces more serious concerns. I suggest a HINT for
>> security-sensitive branches (or possibly the other way around: a
>> HINT for branches that are permitted to use dynamic branch
>> prediction) be added. Indirect branch target buffers should be
>> either keyed on an ASID column (and ASID-selectively flushed when an
>> ASID's root PPN changes) or flushed entirely upon context-switch.
>> Could flushing an indirect target buffer upon xRET be sufficient if
>> separate indirect target buffers are maintained for each implemented
>> privilege level?
>
> I don't like the HINT idea, sounds too much like "enumerating badness"
> to me. A single missed HINT (or a wrongly placed one, for the opposite
> idea) could be enough to blow the doors wide open, and defining what
> is "security-sensitive" isn't necessarily obvious, especially when an
> innocent-looking branch could be used with unexpected values to read
> unrelated data.

It seemed questionable to me at the time, but I was looking for "first
answers" to Spectre to seed discussion, so went with it anyway. How
about considering all branches where the wrong path can lead to a
program crash (such as bounds checks) sensitive? Then, of course, a
parser that is safe on any input could still benefit from dynamic branch
prediction.

Or should we have a HINT that marks the locations to which indirect
jumps may be predicted, similar to previous proposals on this list to
restrict indirect jumps? A predicted indirect jump not landing at a
"safe landing" HINT would cancel speculative execution, preventing the
abuse of gadgets. An actual indirect jump would proceed regardless of
the absence of the "safe landing" HINT.

> Separate branch prediction state for each privilege level sounds like
> an interesting idea. If you go further with the isolation, you can end
> up with a design which works as if each privilege level were on its
> own separate and isolated hart. Given how small RISC-V designs can
> get, it's a fun thought experiment: what if, instead of several
> privilege levels in a single hart, you had two separate harts, a
> smaller one which ran exclusively in S-mode, and another one which ran
> exclusively in U-mode?

I suggested (message-id <5901720E...@gmail.com>
<URL:https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/5901720E.3090904%40gmail.com>)
a similar solution to someone who was trying to make a large number of
minimal RISC-V processors: have a large number of U-mode-only harts
that halt and feed interrupts to a common control processor instead of
trapping.

The major problem I foresee is how to handle CSRs with effects that
must cross privilege boundaries. In the "grid" case, this is easy
-- the U-mode nodes have control registers in MMIO space for the control
processor. In the "partitioned" case, this could get "thorny". Or are
the CSRs, themselves grouped by privilege level, shared across all
quasi-harts in this scheme?

>> Generally, flushing branch prediction buffers on xRET should be
>> low-cost or even an improvement, since the branch buffer contains
>> irrelevant information immediately after a context switch.
>> Alternately, partitioning branch prediction buffers on privilege
>> level and ASID would provide isolation to close the side channel
>> entirely.
>
> The kernel and user modes are not two separate programs, they are part
> of the same program. When my program is in a loop doing read()/write()
> system calls, it switches into the kernel, does some work, and
> switches back; the branch prediction state is still relevant for the
> next time it switches into the kernel, and for a predictor which uses
> branch history, the history of the branches in user mode can be
> relevant to predict where it will branch in the kernel ("if the
> previous branch was that loop in user space, then the kernel will jump
> to SYS_read, otherwise it will jump to SYS_write").
>
> Of course, that has to be balanced with information leaks, so using
> branch history from a different protection level is not that good an
> idea, even if, as on my example, it could make the predictions
> somewhat better.

But partitioning the branch buffer makes the predictions even better by
avoiding confusion between the two, equivalent to a much larger branch
buffer within the same delay budget. And nearly every syscall should
have its own ECALL instruction, so the S-mode branch predictor can use
sepc as an input. ("if the user program executed _that_ ECALL, the
kernel will jump to SYS_read, if the user program executed _this_ ECALL,
the kernel will jump to SYS_write") Privileged branch predictors could
be further partitioned on *cause codes, to reflect different patterns
associated with handling different traps. The branch prediction state
associated with ECALL, however, risks inter-task leaks, so would need to
be further partitioned by user ASID or simply erased when satp (or suatp
if RVSdas is implemented) is written. Do page faults raise similar
concerns as ECALL?
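A sketch of such a partitioned indirect-branch target buffer, with the privilege level and ASID folded into the tag so one context's training never hits in another (sizes and layout here are illustrative only, not a proposed design):

```c
#include <stdint.h>

#define BTB_ENTRIES 64

struct btb_entry {
    uint64_t tag;     /* PC, privilege level, and ASID folded together */
    uint64_t target;
    int      valid;
};

static struct btb_entry btb[BTB_ENTRIES];

static uint64_t btb_tag(uint64_t pc, unsigned priv, uint16_t asid)
{
    /* toy tag layout: ASID in the low 16 bits so it can be flushed on */
    return (pc << 18) | ((uint64_t)priv << 16) | asid;
}

void btb_train(uint64_t pc, unsigned priv, uint16_t asid, uint64_t target)
{
    struct btb_entry *e = &btb[pc % BTB_ENTRIES];
    e->tag    = btb_tag(pc, priv, asid);
    e->target = target;
    e->valid  = 1;
}

/* Hit only in the same (privilege, ASID) context that trained the entry. */
int btb_predict(uint64_t pc, unsigned priv, uint16_t asid, uint64_t *target)
{
    struct btb_entry *e = &btb[pc % BTB_ENTRIES];
    if (e->valid && e->tag == btb_tag(pc, priv, asid)) {
        *target = e->target;
        return 1;
    }
    return 0;
}

/* ASID-selective flush, e.g. when that ASID's root PPN changes. */
void btb_flush_asid(uint16_t asid)
{
    for (int i = 0; i < BTB_ENTRIES; i++)
        if (btb[i].valid && (btb[i].tag & 0xffff) == asid)
            btb[i].valid = 0;
}
```

In this model a poisoned entry trained in U-mode simply misses when S-mode looks up the same PC, which is the isolation property being discussed.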


-- Jacob

Daniel Lustig

Jan 5, 2018, 12:20:29 AM
to jcb6...@gmail.com, Stefan O'Rear, RISC-V ISA Dev

On 1/4/2018 8:02 PM, Jacob Bachmeyer wrote:
> Stefan O'Rear wrote:
>> On Wed, Jan 3, 2018 at 10:49 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>>> To start a discussion, I suggest that all instructions have a control
>>> dependency on any preceding instructions in program order that can cause
>>> exceptions (excluding interrupt exceptions, which can occur at any
>>> instruction), until those potential traps are resolved.  Implementations are
>>> permitted to speculate through ECALL and into the trap handler, however,
>>> since the ECALL trap is unconditional and therefore resolved upon decoding
>>> ECALL.
>>>    
>>
>> This is a rather imprecise statement and appears to using "control
>> dependency" in a sense other than the standard one associated with
>> memory models.
>>  
>
> I may have misused the term, since I am still learning this area.
> Put simply, instructions should have dependencies on the "raise
> exception" output of all preceding instructions in program order.
> This should be sufficient to prevent speculation past an exception,
> but hopefully not necessary and a weaker rule can also be sufficient.
> Meltdown definitely relies on this rule being violated, and I suspect
> that at least some forms of Spectre can be similarly prevented.

In the memory model world, control dependencies are usually more about
"which path among the multiple options am I going to take?", rather
than "is there a possible trap along this path I'm on?" Traps and
faults are more often than not just considered a separate issue,
for better or for worse.

And fun fact FWIW: control dependencies alone don't prevent load-load
reordering from becoming even architecturally visible under RVWMO (or
under the ARM or Power memory models). In other words, it's legal to
speculatively execute past a branch, perform a later load along the
speculated path, and only then perform an earlier load that resolves
the branch condition, assuming the speculation turned out to be
correct. If it turns out the branch was mispredicted, you'd obviously
have to squash the later load, but that doesn't mean the speculation
itself was illegal for the implementation to have done.

All this isn't meant to be a criticism of your idea, Jacob. It's
just meant to explain a bit more how we use the term "control
dependency" in the memory model world, so that as Stefan says we
can try to avoid any unfortunate terminology clashes.

Dan

Samuel Falvo II

Jan 5, 2018, 12:37:03 AM
to Jacob Bachmeyer, Christoph Hellwig, isa...@groups.riscv.org
On Thu, Jan 4, 2018 at 7:21 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> browser, for example. RVJ will need to consider these risks.

I'm not convinced that this falls within RVJ's purview. It seems to
me that any application which allows 3rd party plugins, even those
natively compiled (e.g., COM-based extensions), runs the risk of a
Spectre-style side-channel attack.

This actually ties into a very early suggestion I made on these
mailing lists (date unknown, it was while privilege spec was still
pre-1.9), which was to not ignore the needs of single address space
operating systems. Any technique that would mitigate Spectre in a
SASOS would absolutely help in a multi-address-space environment as
well. It's not clear to me what those techniques would be; perhaps
it's a good time to revisit options.

--
Samuel A. Falvo II

Jacob Bachmeyer

Jan 5, 2018, 12:56:30 AM
to Eric McCorkle, isa...@groups.riscv.org
That is probably a different variant of the attack -- the attacks
published used cache-based side channels to read back information.

Indirect branch prediction was abused to cause speculative execution in
another process (or another part of the same process; this last attack
seems hardest to prevent) to land at a known "gadget" or series of
"gadgets" to perform some calculation with sensitive data that has
observable effects on the cache. Essentially, the output of the rogue
computation is a memory address that gets loaded into the cache. All
other effects of the rogue computation are canceled when the processor
eventually finds that it mispredicted the indirect jump. The attacker
then determines which of several addresses has been cached and has
leaked some number of bits.
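A toy model of that readback step: the only surviving effect of the misspeculation is which line got cached, and the attacker's probe recovers the byte from it. The explicit per-line flag stands in for the load-latency measurement a real attacker would use:

```c
#include <stdint.h>

enum { CANDIDATES = 256 };
static int line_cached[CANDIDATES];   /* 0 = cold, 1 = in the cache */

/* All architectural effects of the mispredicted gadget are rolled back --
 * except this cache fill at a secret-dependent address. */
void speculated_gadget(uint8_t secret)
{
    line_cached[secret] = 1;
}

/* Attacker probes every candidate line; the one hot line names the byte. */
int attacker_probe(void)
{
    for (int i = 0; i < CANDIDATES; i++)
        if (line_cached[i])
            return i;
    return -1;
}
```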

Essentially, the branch target buffers are "hostile" side-channels used
to affect the execution of "innocent" code that does *not* intend to
receive anything from that channel.

On at least some Intel processors, this can be used to "run" arbitrary
unchecked eBPF bytecode (from a user buffer) by causing a branch in the
kernel to be mispredicted to the eBPF interpreter. This attack probably
combines Spectre and Meltdown, since SMAP does not prevent the bad
bytecode from being speculatively evaluated, even though it would stop
those accesses normally and Project Zero did not demonstrate these on
AMD processors. This was how Project Zero leaked data from a KVM
hypervisor.

On further rereading of Project Zero's post, you are right: they *did*
use the branch buffers to leak hypervisor code addresses, although that
was different from the attacks that manipulated speculative control
flow. Spectre is a whole class of attacks and I think it may be bigger
than anyone has yet realized. Ouch. That defeats KASLR handily.


-- Jacob

Jacob Bachmeyer

Jan 5, 2018, 1:17:54 AM
to Daniel Lustig, Stefan O'Rear, RISC-V ISA Dev
Fair enough; is there a better term for expressing this idea?

> And fun fact FWIW: control dependencies alone don't prevent load-load
> reordering from becoming even architecturally visible under RVWMO (or
> under the ARM or Power memory models). In other words, it's legal to
> speculatively execute past a branch, perform a later load along the
> speculated path, and only then perform an earlier load that resolves
> the branch condition, assuming the speculation turned out to be
> correct. If it turns out the branch was mispredicted, you'd obviously
> have to squash the later load, but that doesn't mean the speculation
> itself was illegal for the implementation to have done.
>

Such that a value can appear in a register that is inconsistent with the
branch state that was resolved? (Example: with X and Y initially 0,
(on A) store 5 -> Y, store 1 -> X, (on B) "if (X) print Y" prints 0?)
If so, we probably need cautions about this in the RISC-V Assembly
Programmer's Handbook or many programmers will get very unpleasant
surprises.

> All this isn't meant to be a criticism of your idea, Jacob. It's
> just meant to explain a bit more how we use the term "control
> dependency" in the memory model world, so that as Stefan says we
> can try to avoid any unfortunate terminology clashes.
>

Criticism of ideas is good; it is the uncritically-accepted ideas that
worry me. :-)


-- Jacob

Jonas Oberhauser

Jan 5, 2018, 1:37:12 AM
to Jacob Bachmeyer, Daniel Lustig, Stefan O'Rear, RISC-V ISA Dev


On Jan 5, 2018 7:17 AM, "Jacob Bachmeyer" <jcb6...@gmail.com> wrote:

Such that a value can appear in a register that is inconsistent with the branch state that was resolved? 

Nope.

Jacob Bachmeyer

Jan 5, 2018, 1:38:01 AM
to Samuel Falvo II, Christoph Hellwig, isa...@groups.riscv.org
Samuel Falvo II wrote:
> On Thu, Jan 4, 2018 at 7:21 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> browser, for example. RVJ will need to consider these risks.
>>
>
> I'm not convinced that this falls within RVJ's purview. It seems to
> me that any application which allows 3rd party plugins, even those
> natively compiled (e.g., COM-based extensions), runs the risk of a
> Spectre-style side-channel attack.
>

The Spectre authors demonstrated a version of the attack in JavaScript
using Chrome's JIT. Spectre does not give native plugins anything new
-- they are already loaded into the program's address space with no
sandbox. It does break the JavaScript sandbox, however, and I
understand that RVJ is supposed to be enhanced JIT support. Since
JavaScript is likely to be a potential attacker's "first landing", and
JITs often include software sandboxing features, I argue that protecting
software sandboxes should be in-scope for RVJ if not the baseline RVI.

The indirect branch poisoning to abuse native code is a different attack
in the Spectre class.

> This actually ties into a very early suggestion I made on these
> mailing lists (date unknown, it was while privilege spec was still
> pre-1.9), which was to not ignore the needs of single address space
> operating systems.

I believe that that was your early opposition to restricting S-mode
instruction fetch, which prompted me to work out a means to use VM
aliasing to allow SuperState()/UserState() syscalls to be implemented,
even though they are deprecated on AROS and insane on POSIX. :-) (I
even wrote them as "SetSuper()/SetUser()" at first! Easily
misremembered names, those are!)

> Any technique that would mitigate Spectre in a
> SASOS would absolutely help in a multi-address-space environment as
> well. It's not clear to me what those techniques would be; perhaps
> it's a good time to revisit options.
>

Agreed. Some applications are effectively SASOS environments
themselves, such as Emacs, or a modern Web browser with JavaScript.


-- Jacob

Jonas Oberhauser

Jan 5, 2018, 1:44:47 AM
to Cesar Eduardo Barros, RISC-V ISA Dev
Great points, thanks!

How does this relate to speculative instruction fetch? E.g., as Jacob hinted, a possible interrupt followed by an indirect jump? I doubt that it is feasible to resolve the interrupt before fetching new instructions.

And in case the OS does have different page tables, how does that help? It looks like the cache line collision is probably determined by some bits of the virtual address, which can thus still be leaked. Did I get that wrong?

Michael Clark

Jan 5, 2018, 3:57:31 AM
to Jonas Oberhauser, Jacob Bachmeyer, Daniel Lustig, Stefan O'Rear, RISC-V ISA Dev
Yes. Nope. Chaos would ensue if one could see speculated results.

So the results are not visible in an “architectural register”; the micro-architecture may nevertheless hold speculated results in “physical registers”, but in ones whose state may never be retired, since it is during retirement that the physical register for an operation is marked as the current version of the architectural register. It’s only during retirement that results actually become visible, and retirement is in-order, even in an out-of-order CPU.

If one reads Chris’s BOOM papers, one can note that an OoO may have an arbitrarily large physical register file to hold the results of in-flight but un-retired operations, and may optionally keep a copy of the retired state in a portion of the register file where architectural registers map one-to-one to physical registers; I believe this is done to aid rollback after a mis-predict, such that one doesn’t have to reverse renames; rather, one can just return the temporary physical registers used by the mis-predicted operations to the free list for the renamer to reallocate once the front-end is re-steered down the correct path.

Mis-predict penalty on a deeply pipelined OoO can be 12-15 cycles or more, i.e. the entire length of the pipeline, and is one of the reasons why so much effort is put into multiple fast branch prediction heuristics. Speed! It’s common for branch predictors to be > 99% accurate, so it pays to speculate.

I haven’t built an OoO so this is just my layman’s understanding so it’s likely approximately correct :-D

I like the idea of a small fully associative L0 cache for the results of speculated loads such that cache line eviction (cache side-effects) for speculated ops only happen / become visible during retirement, just like register side effects. It’s only necessary to use this L0 for loads after unresolved branches until the point that the branch is resolved, i.e. a place to hide the results of mis-predicted loads.

Is that possible?
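The intent, as a toy commit/squash model (structure and sizes are illustrative, not a worked-out design): speculative loads fill a small side buffer; a correct resolution moves the fills into the visible cache, while a squash simply drops them, leaving no cache side effect.

```c
enum { L0_SLOTS = 4, LINES = 64 };

static int cache_present[LINES];              /* visible cache state      */
static int l0_line[L0_SLOTS], l0_used = 0;    /* hidden speculative fills */

/* A load issued under an unresolved branch goes into the L0 buffer. */
void spec_load(int line)
{
    if (!cache_present[line] && l0_used < L0_SLOTS)
        l0_line[l0_used++] = line;
}

void branch_resolved(int correct)
{
    if (correct)                              /* retire: fills become visible */
        for (int i = 0; i < l0_used; i++)
            cache_present[l0_line[i]] = 1;
    l0_used = 0;                              /* squash: just drop them */
}
```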

Stefan O'Rear

Jan 5, 2018, 4:15:13 AM
to Michael Clark, Jonas Oberhauser, Jacob Bachmeyer, Daniel Lustig, RISC-V ISA Dev
On Fri, Jan 5, 2018 at 12:57 AM, Michael Clark <michae...@mac.com> wrote:
> I like the idea of a small fully associative L0 cache for the results of
> speculated loads such that cache line eviction (cache side-effects) for
> speculated ops only happen / become visible during retirement, just like
> register side effects. It’s only necessary to use this L0 for loads after
> unresolved branches until the point that the branch is resolved i.e. a place
> to hide the results of mis-predicted loads.
>
> Is that possible?

That could work reasonably well if you have a single level of cache;
it gets a lot more difficult with a hierarchy, since you'd need a
speculative buffer for each level of cache, and if the system contains
multiple cores or threads you would still need to worry about bank
conflicts leaking address bits from speculative accesses.

-s

Jonas Oberhauser

Jan 5, 2018, 4:18:41 AM
to Michael Clark, Jacob Bachmeyer, Daniel Lustig, Stefan O'Rear, RISC-V ISA Dev


On Jan 5, 2018 09:57, "Michael Clark" <michae...@mac.com> wrote:

I like the idea of a small fully associative L0 cache for the results of speculated loads such that cache line eviction (cache side-effects) for speculated ops only happen / become visible during retirement, just like register side effects. It’s only necessary to use this L0 for loads after unresolved branches until the point that the branch is resolved i.e. a place to hide the results of mis-predicted loads.

Is that possible?

I also thought about it, but I am worried about cache coherence protocols where a remote access can change the state of the local cacheline. I think by measuring write latency of previously M cachelines which became S due to the remote read, you can still see the speculation.
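A toy model of that channel, with made-up states and latencies: core B's speculative read downgrades core A's Modified line, and A observes the downgrade in its next write latency even if B's speculation is later squashed.

```c
enum state { INVALID, SHARED, MODIFIED };
enum { WRITE_HIT_M = 1, WRITE_UPGRADE_S = 20 };   /* arbitrary cycle costs */

static enum state a_line = MODIFIED;   /* core A starts with the line dirty */

/* Remote speculative read on core B: the coherence downgrade it causes on
 * core A's copy is not rolled back when B's speculation is squashed. */
void b_speculative_read(void)
{
    if (a_line == MODIFIED)
        a_line = SHARED;
}

/* Core A times its own write: a hit in M is cheap, an S->M upgrade is not. */
int a_write_latency(void)
{
    int cost = (a_line == MODIFIED) ? WRITE_HIT_M : WRITE_UPGRADE_S;
    a_line = MODIFIED;
    return cost;
}
```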


Cesar Eduardo Barros

Jan 5, 2018, 4:24:18 AM
to jcb6...@gmail.com, RISC-V ISA Dev
Em 05-01-2018 02:56, Jacob Bachmeyer escreveu:
> Cesar Eduardo Barros wrote:
>> Em 04-01-2018 04:49, Jacob Bachmeyer escreveu:
>>> Spectre produces more serious concerns.  I suggest a HINT for
>>> security-sensitive branches (or possibly the other way around:  a
>>> HINT for branches that are permitted to use dynamic branch
>>> prediction) be added.  Indirect branch target buffers should be
>>> either keyed on an ASID column (and ASID-selectively flushed when an
>>> ASID's root PPN changes) or flushed entirely upon context-switch.
>>> Could flushing an indirect target buffer upon xRET be sufficient if
>>> separate indirect target buffers are maintained for each implemented
>>> privilege level?
>>
>> I don't like the HINT idea, sounds too much like "enumerating badness"
>> to me. A single missed HINT (or a wrongly placed one, for the opposite
>> idea) could be enough to blow the doors wide open, and defining what
>> is "security-sensitive" isn't necessarily obvious, especially when an
>> innocent-looking branch could be used with unexpected values to read
>> unrelated data.
>
> It seemed questionable to me at the time, but I was looking for "first
> answers" to Spectre to seed discussion, so went with it anyway.  How
> about considering all branches where the wrong path can lead to a
> program crash (such as bounds checks) sensitive?  Then, of course, a
> parser that is safe on any input could still benefit from dynamic branch
> prediction.

In the kernel, the wrong side of a bounds check most probably won't
crash; instead, it probably will return -EINVAL or similar to the
caller, which will test for negative return values, and in that case
release everything it had allocated and pass the return value to its
caller, and so on, until it reaches userspace, where the C library
stashes the negation of the return value in errno and returns -1 or NULL
depending on the call.
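In C terms, the shape is roughly this (the table and names are made up): the "wrong side" of the bounds check is ordinary, well-defined code that returns an error, so nothing locally marks it as a security check for the processor.

```c
#include <errno.h>
#include <stddef.h>

static const long   table[]    = {10, 20, 30};
static const size_t table_size = sizeof table / sizeof table[0];

/* Kernel-style lookup: out-of-bounds is not a crash, just a negated errno
 * value propagated up the call chain. */
long lookup(size_t idx, long *out)
{
    if (idx >= table_size)
        return -EINVAL;
    *out = table[idx];
    return 0;
}
```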

In a userspace bounds check, some languages might crash, but others like
Rust or Java will throw an exception (Rust's "panic", used by its bounds
checks and other preconditions, by default uses the same exception
mechanism as C++, which unwinds the stack and can be caught).

So the processor, which has a much more local view of the program flow,
can't know it's a wrong path. Worse, when Spectre happens, the processor
is looking at the other side of the branch, the one where it would go
when the bounds check passed, so even if there's an "undefined
instruction" right at the start of the failed path, the processor won't
see it.

> Or should we have a HINT that permits indirect jumps to be predicted to
> its location, similar to previous proposals on this list to restrict
> indirect jumps?  A predicted indirect jump not landing at a "safe
> landing HINT" would cancel speculative execution, preventing the abuse
> of gadgets.  An actual indirect jump would proceed, regardless of the
> absence of the "safe landing" HINT.

This reminds me of another thread where we are talking about
overlapping instructions. Like gadgets, this HINT might show up where
you don't expect. And you're still jumping into an unexpected location,
for instance into a case label of a C switch in the middle of a
function, or into a function expecting different parameters.

Besides, I don't think it would work that well in practice. People (and
compilers) won't do the careful and subtle analysis to know whether it's
ok to speculate into that landing place. They will either add it
nowhere, so the branch target cache is a useless waste of area and
power, or they will add it everywhere without much thought, so you're
back to the initial situation but with extra complexity in the hardware
and junk instructions in the software.

>> Separate branch prediction state for each privilege level sounds like
>> an interesting idea. If you go further with the isolation, you can end
>> up with a design which works as if each privilege level were on its
>> own separate and isolated hart. Given how small RISC-V designs can
>> get, it's a fun thought experiment: what if, instead of several
>> privilege levels in a single hart, you had two separate harts, a
>> smaller one which ran exclusively in S-mode, and another one which ran
>> exclusively in U-mode?
>
> I suggested (message-id <5901720E...@gmail.com>
> <URL:https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/5901720E.3090904%40gmail.com>)
> a similar solution to someone who was trying to make a large number of
> minimal RISC-V processors:  have a large number of U-mode-only harts
> that halt and feed interrupts to a common control processor instead of
> trapping.
>
> The major problem I foresee would be how to handle CSRs with effects
> that must cross privilege boundaries?  In the "grid" case, this is easy
> -- the U-mode nodes have control registers in MMIO space for the control
> processor.  In the "partitioned" case, this could get "thorny".  Or are
> the CSRs, themselves grouped by privilege level, shared across all
> quasi-harts in this scheme?

Like you said, the quasi-harts (I liked that name) on this thought
experiment share the CSRs, and won't run at the same time (so for
instance the ALUs could be shared too). Of course, an even more
interesting thought experiment would use completely separate harts for
each privilege mode, using message passing to communicate between them,
but then you'd need a special mechanism for the S-mode hart to set the
CSRs for the U-mode hart.

I hadn't thought of the delay. For sure, the main concern on a
high-performance design with a bigger first-level cache of any kind is
not area or power, but delay. So using a separate branch target buffer
for each privilege level would add minimal delay (one mux) while
allowing each privilege level to use the whole thing without
interference from the other privilege level. And as a bonus, this
hinders Spectre across privilege levels. Cool idea, though only for when
area isn't a concern (but when it is, you won't be doing out-of-order,
right?)

Hashing the sepc and cause into the branch prediction lookup sounds
interesting, though I'd be wary of allowing attacker-controlled input to
influence it. As long as the attacker knows enough of the hash function,
they can induce collisions. That is, it might be an interesting idea to
get a bit more of a hit rate in the normal case, but might not protect
much against an attack.

Stefan O'Rear

Jan 5, 2018, 4:44:10 AM
to Cesar Eduardo Barros, Jacob Bachmeyer, RISC-V ISA Dev
On Fri, Jan 5, 2018 at 1:24 AM, Cesar Eduardo Barros
<ces...@cesarb.eti.br> wrote:
> I hadn't thought of the delay. For sure, the main concern on a
> high-performance design with a bigger first-level cache of any kind is not
> area or power, but delay. So using a separate branch target buffer for each
> privilege level would add minimal delay (one mux) while allowing each
> privilege level to use the whole thing without interference from the other
> privilege level. And as a bonus, this hinders Spectre across privilege
> levels. Cool idea, though only for when area isn't a concern (but when it
> is, you won't be doing out-of-order, right?)

For a typical "high-performance design" these days delay has less to
do with the number of muxes traversed and more to do with physical
distance in micrometers. So adding more components will necessarily
make everything further apart. It's tricky.

-s

Eric McCorkle

Jan 5, 2018, 8:01:58 AM
to jcb6...@gmail.com, isa...@groups.riscv.org
On 01/05/2018 00:56, Jacob Bachmeyer wrote:

> That is probably a different variant of the attack -- the attacks
> published used cache-based side channels to read back information.

So I misunderstood the attack, and ended up making up another attack?
*sigh* Wonderful.

> Indirect branch prediction was abused to cause speculative execution in
> another process (or another part of the same process; this last attack
> seems hardest to prevent) to land at a known "gadget" or series of
> "gadgets" to perform some calculation with sensitive data that has
> observable effects on the cache.  Essentially, the output of the rogue
> computation is a memory address that gets loaded into the cache.  All
> other effects of the rogue computation are canceled when the processor
> eventually finds that it mispredicted the indirect jump.  The attacker
> then determines which of several addresses has been cached and has
> leaked some number of bits.

Yeah, that's... bad. There are basically no tools for OS developers to
even begin to mitigate this in current architectures. There's no way to
control branch predictors or BTBs.

I'll keep it focused here on "how do we provide the tools to mitigate
this stuff". One obvious tool that comes to mind is *every* cache-like
feature needs to be flushable by some instruction at a minimum, and
going further, probably keyed to some sort of "security domain"
identifier to prevent effects from crossing over. It's the only
absolutely bulletproof way to clear out the side-channels.
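As a sketch of such a domain-keyed flush (a purely hypothetical interface, not any real ISA): every cache-like structure tags its entries with a security-domain identifier, and a targeted flush wipes one domain's state without touching the others.

```c
#include <stdint.h>

enum { ENTRIES = 32 };

struct tagged_entry {
    uint32_t domain;   /* hypothetical "security domain" identifier */
    int      valid;
};

static struct tagged_entry predictor[ENTRIES];

void train(int idx, uint32_t domain)
{
    predictor[idx].domain = domain;
    predictor[idx].valid  = 1;
}

/* Model of a "flush by domain" instruction: invalidate only that
 * domain's entries, leaving other domains' state (and warm-up) intact. */
void flush_domain(uint32_t domain)
{
    for (int i = 0; i < ENTRIES; i++)
        if (predictor[i].valid && predictor[i].domain == domain)
            predictor[i].valid = 0;
}

int entry_live(int idx, uint32_t domain)
{
    return predictor[idx].valid && predictor[idx].domain == domain;
}
```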

Eric McCorkle

Jan 5, 2018, 8:18:19 AM
to isa...@groups.riscv.org
On 01/05/2018 08:01, Eric McCorkle wrote:

> Yeah, that's... bad. There's basically no tools for OS developers to
> even begin to mitigate this in current architectures. There's no way to
> control branch predictors or BTBs.

Actually...

I posted a thing on the FreeBSD lists about storing sensitive data in
non-cacheable memory. Unless I'm missing something, this ends up
defeating Meltdown outright, and it also defeats the "dive into their
code and soak up info into the branch predictors" attack.

It wouldn't be an absolute defense against spectre, but it seems to me
there'd be a very low probability of a speculative execution branch
surviving long enough for data to come in from a non-cached load, which
in turn prevents information about it from reaching side-channels.

Storing things in non-cacheable memory is the best we can do with what
we have, but it suggests a better mechanism: some kind of
non-speculative load, combined with an "all loads from this cache
line/page/whatever are non-speculative" flag.

Jonas Oberhauser

Jan 5, 2018, 8:25:59 AM
to Eric McCorkle, Jacob Bachmeyer, RISC-V ISA Dev
I don't understand how this clears out side channels to other caches, such as M lines being changed to S in a MESI protocol cache, and thus a second core being able to read out the data by analysing write latency. 
Can you explain how it helps or why this situation is not an issue?

Samuel Falvo II

Jan 5, 2018, 10:15:54 AM
to Jacob Bachmeyer, Christoph Hellwig, isa...@groups.riscv.org
On Thu, Jan 4, 2018 at 10:37 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> The Spectre authors demonstrated a version of the attack in JavaScript using
> Chrome's JIT. Spectre does not give native plugins anything new -- they are
> already loaded into the program's address space with no sandbox. It does
> break the JavaScript sandbox, however, and I understand that RVJ is supposed
> to be enhanced JIT support. Since JavaScript is likely to be a potential
> attacker's "first landing", and JITs often include software sandboxing
> features, I argue that protecting software sandboxes should be in-scope for
> RVJ if not the baseline RVI.

Yes, I'm aware of all of this. However, it seems like you didn't
think through what I was trying to say.

There are plenty of native-code sandboxes (e.g., Spring research OS
from Sun, Oberon System's modules compiled w/out importing SYSTEM,
Google's NaCl, pretty much *any* modern dialect of Forth, graphics
engines that "compile" sprites, and so on). My contention is that
this is a *fundamental* issue, not something that is unique to JITs or
dynamic languages.

> I believe that that was your early opposition to restricting S-mode
> instruction fetch, which prompted me to work out a means to use VM aliasing
> to allow SuperState()/UserState() syscalls to be implemented, even though
> they are deprecated on AROS and insane on POSIX. :-) (I even wrote them as
> "SetSuper()/SetUser()" at first! Easily misremembered names, those are!)

Not quite; that discussion was focusing more on supporting legacy
environments. As I recall, I mentioned legacy several times, and gave
AmigaOS and AROS as reasonably popular case studies, both of which
are still maintained (AmigaOS 4 targets PowerPCs, but it's conceivable
that it could be ported to another platform; AROS is already
multi-platform). I am not sure how MorphOS factors into this (another
AmigaOS clone, but one which I know nothing about).

Commodore-Amiga, having implemented a SASOS environment for their
Kickstart operating system, realized that it could not implement
multiple address spaces and retain seamless backward compatibility.
Knowing that process protection was going to be important in the
future, they were going to make their own custom MMU to specifically
support protection in the context of Kickstart. (Of course, then they
went bankrupt and whatever work they had accomplished on paper
vanished.)

I want to know what they were thinking and get these out in the open,
so that these techniques can be debated on technical merits.

> Agreed. Some applications are effectively SASOS environments themselves,
> such as Emacs, or a modern Web browser with JavaScript.

Exactly.

Eric McCorkle

unread,
Jan 5, 2018, 10:21:39 AM1/5/18
to isa...@groups.riscv.org
On 01/05/2018 08:25, Jonas Oberhauser wrote:
>
>
> 2018-01-05 14:01 GMT+01:00 Eric McCorkle <er...@metricspace.net
> <mailto:er...@metricspace.net>>:
Ugh. You're right. Coherence protocols are a whole other side-channel.

Alex Elsayed

unread,
Jan 5, 2018, 1:50:13 PM1/5/18
to Eric McCorkle, RISC-V ISA Dev
The Spectre paper itself warns of _many_ other side channels near the end of section 7:

The practicality of microcode fixes for existing processors is also unknown. It is possible that a patch could disable speculative execution or prevent speculative memory reads, but this would bring a significant performance penalty. Buffering speculatively-initiated memory transactions separately from the cache until speculative execution is committed is not a sufficient countermeasure, since the timing of speculative execution can also reveal information. For example, if speculative execution uses a sensitive value to form the address for a memory read, the cache status of that read will affect the timing of the next speculative operation. If the timing of that operation can be inferred, e.g., because it affects a resource such as a bus or ALU used by other threads, the memory is compromised.

More broadly, potential countermeasures limited to the memory cache are likely to be insufficient, since there are other ways that speculative execution can leak information. For example, timing effects from memory bus contention, DRAM row address selection status, availability of virtual registers, ALU activity, and the state of the branch predictor itself need to be considered. Of course, speculative execution will also affect conventional side channels, such as power and EM. 

Eric McCorkle

unread,
Jan 5, 2018, 5:33:39 PM1/5/18
to isa...@groups.riscv.org
On 01/05/2018 13:50, Alex Elsayed wrote:

> The Spectre paper itself warns of _many_ other side channels near the
> end of section 7:
>
> The practicality of microcode fixes for existing processors is also
> unknown. It is possible that a patch could disable speculative execution
> or prevent speculative memory reads, but this would bring a significant
> performance penalty. Buffering speculatively-initiated memory
> transactions separately from the cache until speculative execution is
> committed is not a sufficient countermeasure, since the timing of
> speculative execution can also reveal information. For example, if
> speculative execution uses a sensitive value to form the address for a
> memory read, the cache status of that read will affect the timing of the
> next speculative operation. If the timing of that operation can be
> inferred, e.g., because it affects a resource such as a bus or ALU used
> by other threads, the memory is compromised.

Yeah, this is only the beginning. The only perfect defense I can come
up with is stopping sensitive data from being used in speculative
execution at all.

I posted a proposal on the FreeBSD lists, which suggests a
countermeasure of keeping sensitive information in non-cacheable memory.
This should defeat the meltdown attack, and should repel the spectre
attack with very high probability. Obviously, this only works if you
correctly store sensitive information in these non-cacheable pages.

The real thing this technique is accomplishing (rather crudely, but I
have to work with what's there) is to prevent certain memory locations
from being used in speculative execution. A better way would be to do
this directly: mark pages as "sensitive" which ends up making its way
into the pipeline as a flag on registers. Then, no operation on any
sensitive value gets launched until it's guaranteed to commit.

> More broadly, potential countermeasures limited to the memory cache are
> likely to be insufficient, since there are other ways that speculative
> execution can leak information. For example, timing effects from memory
> bus contention, DRAM row address selection status, availability of
> virtual registers, ALU activity, and the state of the branch predictor
> itself need to be considered. Of course, speculative execution will also
> affect conventional side channels, such as power and EM.

From a security perspective, the best story would have multiple defense
mechanisms. That way, if one of them fails, or if a new attack is
discovered, hopefully another one will hold.

The engines extension I proposed a while back is one method: keep
sensitive data completely isolated. The "non-speculative" flag on
memory regions might be another mechanism. Hard virtual address space
separation a la SPARC/Power is another.

The trick is figuring out a reasonable set of features which are general
and powerful.

Jacob Bachmeyer

unread,
Jan 5, 2018, 6:07:43 PM1/5/18
to Jonas Oberhauser, Daniel Lustig, Stefan O'Rear, RISC-V ISA Dev
Jonas Oberhauser wrote:
> On Jan 5, 2018 7:17 AM, "Jacob Bachmeyer" <jcb6...@gmail.com
> <mailto:jcb6...@gmail.com>> wrote:
>
>
> Such that a value can appear in a register that is inconsistent
> with the branch state that was resolved?
>
>
> Nope.

Then how can load-load reordering become architecturally visible?


-- Jacob

Jonas Oberhauser

unread,
Jan 5, 2018, 6:10:55 PM1/5/18
to Jacob Bachmeyer, Daniel Lustig, Stefan O'Rear, RISC-V ISA Dev
Thread 1:
t1 = x
if t1 == 1 {
  y
}

Thread 2:
S y 1
fence 
S x 1

In this case thread 1's L y does not have to see S y
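Written out in C11 atomics (a sketch, assuming pthreads; the helper names are mine), the repaired version of this litmus test makes the missing ordering explicit: the writer's fence becomes a release store, and the reader needs a matching acquire load before the stale read of y is ruled out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* C11-atomics version of the litmus test above.  The writer's "fence" is
 * expressed as a release store to x; the point is that the reader needs
 * ordering too -- here an acquire load of x -- or its later load of y is
 * allowed to return the stale 0. */
atomic_int x, y;
int violations;

void *writer_thread(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);    /* S y 1 */
    atomic_store_explicit(&x, 1, memory_order_release);    /* fence; S x 1 */
    return NULL;
}

void *reader_thread(void *arg) {
    (void)arg;
    int t1 = atomic_load_explicit(&x, memory_order_acquire); /* t1 = x */
    if (t1 == 1) {
        /* With the acquire above, this load must observe S y 1. */
        if (atomic_load_explicit(&y, memory_order_relaxed) != 1)
            violations++;
    }
    return NULL;
}

/* Run the two threads n times; returns how often the reader saw the
 * forbidden stale value (always 0 under acquire/release). */
int run_trials(int n) {
    violations = 0;
    for (int i = 0; i < n; i++) {
        pthread_t w, r;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&w, NULL, writer_thread, NULL);
        pthread_create(&r, NULL, reader_thread, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
    }
    return violations;
}
```

With both of the reader's loads relaxed instead, the stale outcome is permitted on weakly ordered machines, which is exactly the reordering in question.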


Jacob Bachmeyer

unread,
Jan 5, 2018, 6:24:21 PM1/5/18
to Cesar Eduardo Barros, RISC-V ISA Dev
Cesar Eduardo Barros wrote:
> Em 05-01-2018 02:56, Jacob Bachmeyer escreveu:
>> Cesar Eduardo Barros wrote:
>>> Em 04-01-2018 04:49, Jacob Bachmeyer escreveu:
>>>> Spectre produces more serious concerns. I suggest a HINT for
>>>> security-sensitive branches (or possibly the other way around: a
>>>> HINT for branches that are permitted to use dynamic branch
>>>> prediction) be added. Indirect branch target buffers should be
>>>> either keyed on an ASID column (and ASID-selectively flushed when
>>>> an ASID's root PPN changes) or flushed entirely upon
>>>> context-switch. Could flushing an indirect target buffer upon xRET
>>>> be sufficient if separate indirect target buffers are maintained
>>>> for each implemented privilege level?
>>>
>>> I don't like the HINT idea, sounds too much like "enumerating
>>> badness" to me. A single missed HINT (or a wrongly placed one, for
>>> the opposite idea) could be enough to blow the doors wide open, and
>>> defining what is "security-sensitive" isn't necessarily obvious,
>>> especially when an innocent-looking branch could be used with
>>> unexpected values to read unrelated data.
> [...]
>
>> Or should we have a HINT that permits indirect jumps to be predicted
>> to its location, similar to previous proposals on this list to
>> restrict indirect jumps? A predicted indirect jump not landing at a
>> "safe landing HINT" would cancel speculative execution, preventing
>> the abuse of gadgets. An actual indirect jump would proceed,
>> regardless of the absence of the "safe landing" HINT.
>
> This reminds me of an other thread where we are talking about
> overlapping instructions. Like gadgets, this HINT might show up where
> you don't expect. And you're still jumping into an unexpected
> location, for instance into a case label of a C switch in the middle
> of a function, or into a function expecting different parameters.
>
> Besides, I don't think it would work that well in practice. People
> (and compilers) won't do the careful and subtle analysis to know
> whether it's ok to speculate into that landing place. They will either
> add it nowhere, so the branch target cache is a useless waste of area
> and power, or they will add it everywhere without much thought, so
> you're back to the initial situation but with extra complexity in the
> hardware and junk instructions in the software.

For example, the start of a function can be expected to be a safe
indirect landing, assuming function pointers exist. Since I expect that
the most common indirect jump is "call through function pointer", a
large fraction of the cases can be met by simply treating the standard
function prologue as a safe landing HINT and holding speculative
execution if an indirect jump does not land at a function prologue. So
an explicit safe landing HINT would only be needed for jump table
destinations, which the compiler should be able to determine reliably.
In the "grid" case, that was an MMIO window, assuming the U-mode nodes
even have a full set of CSRs and do not simply share the control
processor's MMU. (The point of that exercise was to reduce the CSR set
that a U-mode node must implement and therefore the area required for
each node.)
I was suggesting hashing sepc into the branch prediction buffers used
with ECALL, and partitioning the branch prediction buffer on *cause. In
other words, an independent branch prediction buffer for each cause
code, since each cause will go to different handlers after software
dispatch. Since the ECALL prediction buffer could risk inter-task
leaks, it would be further partitioned by user ASID or simply cleared
when the user address space is swapped during user task switch.


-- Jacob

Jacob Bachmeyer

unread,
Jan 5, 2018, 6:32:56 PM1/5/18
to Eric McCorkle, isa...@groups.riscv.org
Eric McCorkle wrote:
> On 01/05/2018 00:56, Jacob Bachmeyer wrote
>> Indirect branch prediction was abused to cause speculative execution in
>> another process (or another part of the same process; this last attack
>> seems hardest to prevent) to land at a known "gadget" or series of
>> "gadgets" to perform some calculation with sensitive data that has
>> observable effects on the cache. Essentially, the output of the rogue
>> computation is a memory address that gets loaded into the cache. All
>> other effects of the rogue computation are canceled when the processor
>> eventually finds that it mispredicted the indirect jump. The attacker
>> then determines which of several addresses has been cached and has
>> leaked some number of bits.
>>
>
> Yeah, that's... bad. There's basically no tools for OS developers to
> even begin to mitigate this in current architectures. There's no way to
> control branch predictors or BTBs.
>

Fortunately, Project Zero was only able to demonstrate this variant on
Intel CPUs; I suspect that issues related to Meltdown are necessary for
the cross-process form of the attack to work.
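The quoted gadget-and-probe flow can be condensed into a toy model (all names are mine; the "cache" is a flag array rather than a timed flush+reload probe, so this illustrates the channel, not a working exploit):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the side channel described above: the rogue computation
 * encodes a secret byte as a memory address, and the attacker recovers
 * it by observing which line became cached. */
#define LINE 4096
unsigned char probe_array[256 * LINE];
int line_cached[256];                 /* modeled cache state, not real */

void model_flush(void) {
    memset(line_cached, 0, sizeof line_cached);
}

/* Speculative gadget (modeled): the secret never reaches architectural
 * state, but the line it selected does get cached. */
void transient_gadget(uint8_t secret) {
    volatile unsigned char sink = probe_array[(size_t)secret * LINE];
    (void)sink;
    line_cached[secret] = 1;
}

/* Attacker's probe loop: the one "fast" (cached) line reveals the byte. */
int recover_byte(void) {
    for (int v = 0; v < 256; v++)
        if (line_cached[v])
            return v;
    return -1;                        /* nothing leaked */
}
```

In a real attack the recovery step times each line's access latency instead of reading a flag, but the information flow is the same.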

> I'll keep it focused here on "how do we provide the tools to mitigate
> this stuff".

That was my intent when starting this thread -- and "how do we prevent
this in the first place?". Prevention is better than mitigation. :-)

> One obvious tool that comes to mind is *every* cache-like
> feature needs to be flushable by some instruction at a minimum, and
> going further, probably keyed to some sort of "security domain"
> identifier to prevent effects from crossing over. It's the only
> absolutely bulletproof way to clear out the side-channels.
>

I would prefer keying those structures to security domains, rather than
expecting explicit flushes. The latter would also require adding many
more instructions that would require microarchitectural knowledge to
correctly use, so explicit flushes are not really an option for RISC-V. :-/


-- Jacob

Jacob Bachmeyer

unread,
Jan 5, 2018, 6:43:33 PM1/5/18
to Eric McCorkle, isa...@groups.riscv.org
Eric McCorkle wrote:
> On 01/05/2018 08:01, Eric McCorkle wrote:
>
>> Yeah, that's... bad. There's basically no tools for OS developers to
>> even begin to mitigate this in current architectures. There's no way to
>> control branch predictors or BTBs.
>>
>
> Actually...
>
> I posted a thing on the FreeBSD lists about storing sensitive data in
> non-cacheable memory. Unless I'm missing something, this ends up
> defeating meltdown outright, and it also defeats the "dive into their
> code and soak up info into the branch predictors" attack.
>

It *might* prevent Meltdown; we do not know what the precondition for
Meltdown actually is. Stacking enough speculation for an uncached read
to complete may be possible: the attacks published so far did not need it.

> It wouldn't be an absolute defense against spectre, but it seems to me
> there'd be a very low probability of a speculative execution branch
> surviving long enough for data to come in from a non-cached load, which
> in turn prevents information about it from reaching side-channels.
>

With respect to the attacks published so far, I believe that you are
correct: the current attacks use a cache miss (either data or
instruction or both) to provide the delay in which speculative execution
occurs. I do not know if there are ways to produce longer speculation
times, or further Meltdown-like issues that would defeat this
mitigation. (Finding that Intel has been speculatively caching
"uncachable" reads and invalidating those lines when the instruction
retires would not surprise me at this point.)

> Storing things in non-cacheable memory is the best we can do with what
> we have, but it suggests a better mechanism- some kind of
> non-speculative load, combined with an "all loads from this cache
> line/page/whatever are non-speculative" flag.
>

In RISC-V, this could be a configurable PMA.


-- Jacob

Jonas Oberhauser

unread,
Jan 5, 2018, 6:47:46 PM1/5/18
to Eric McCorkle, RISC-V ISA Dev


On Jan 5, 2018 23:33, "Eric McCorkle" <er...@metricspace.net> wrote:

I posted a proposal on the FreeBSD lists, which suggests a
countermeasure of keeping sensitive information in non-cacheable memory.
 This should defeat the meltdown attack, and should repel the spectre
attack with very high probability.  Obviously, this only works if you
correctly store sensitive information in these non-cacheable pages.

The real thing this technique is accomplishing (rather crudely, but I
have to work with what's there) is to prevent certain memory locations
from being used in speculative execution.  A better way would be to do
this directly: mark pages as "sensitive" which ends up making its way
into the pipeline as a flag on registers.  Then, no operation on any
sensitive value gets launched until it's guaranteed to commit.

I think this can at least be used to leak which regions are sensitive: access some data, then evict a line using that data as an address. If the line is evicted, the data was not sensitive. By going through it twice, you can probably also get the reverse: if the line is not evicted, the data was sensitive, because non-sensitive data would now be cached.

Jacob Bachmeyer

unread,
Jan 5, 2018, 6:49:20 PM1/5/18
to Jonas Oberhauser, Eric McCorkle, RISC-V ISA Dev
Jonas Oberhauser wrote:
> 2018-01-05 14:01 GMT+01:00 Eric McCorkle <er...@metricspace.net
> <mailto:er...@metricspace.net>>:
It is not 100%, but it is a step in the right direction. At this point,
if we could limit side channels to only those present with independent
processors and a cache hierarchy, we will have made an improvement and
closed off the majority of Spectre-class attacks. Then we can work to
mitigate those side channels.

There is a saying about this: How do you eat an elephant? One bite at
a time.


-- Jacob

Jonas Oberhauser

unread,
Jan 5, 2018, 6:56:05 PM1/5/18
to Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
To be honest, only for the DRAM selection state and the branch predictor do I understand why they are really bad. For the others, the attack window seems too small to do anything useful. Is that assumption wrong?


If all we had to do was prevent effects to the cache, one could speculatively load only in case
1) nothing will be evicted (no cache contention)
2) no other cache will need to make a transition
3) the loaded data is clean

LRs are different; speculative LRs that lock a cacheline can be timed by an attacker.

If a remote operation is observed that will invalidate condition 3) while still speculating, one needs to abort and retry (or wait for the speculation to complete).

I guess the branch predictor can ignore speculative training data (only train at retirement), and if one never speculatively loads from DRAM, that problem can also be avoided.

The difficulty of course lies in the fact that these may be things that happen very often, making the performance penalty unbearable. Every instruction speculates on not being interrupted, and you typically want to fetch before resolving that speculation. I still hope HW interrupts (/IPIs) are not a problem because they are too unpredictable, or at least can be made somewhat unpredictable, or at least that attackers can not normally exploit those interrupts (e.g., time them). I don't know if that hope is realistic.

--
You received this message because you are subscribed to a topic in the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this topic, visit https://groups.google.com/a/groups.riscv.org/d/topic/isa-dev/8ejNKIqFChw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to isa-dev+u...@groups.riscv.org.

To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

Jacob Bachmeyer

unread,
Jan 5, 2018, 6:57:49 PM1/5/18
to Samuel Falvo II, Christoph Hellwig, isa...@groups.riscv.org
Samuel Falvo II wrote:
> On Thu, Jan 4, 2018 at 10:37 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
>> The Spectre authors demonstrated a version of the attack in JavaScript using
>> Chrome's JIT. Spectre does not give native plugins anything new -- they are
>> already loaded into the program's address space with no sandbox. It does
>> break the JavaScript sandbox, however, and I understand that RVJ is supposed
>> to be enhanced JIT support. Since JavaScript is likely to be a potential
>> attacker's "first landing", and JITs often include software sandboxing
>> features, I argue that protecting software sandboxes should be in-scope for
>> RVJ if not the baseline RVI.
>>
>
> Yes, I'm aware of all of this. However, it seems like you didn't
> think through what I was trying to say.
>
> There are plenty of native-code sandboxes (e.g., Spring research OS
> from Sun, Oberon System's modules compiled w/out importing SYSTEM,
> Google's NaCl, pretty much *any* modern dialect of Forth, graphics
> engines that "compile" sprites, and so on ). My contention is that
> this is a *fundamental* issue, not something that is unique to JITs or
> dynamic languages.
>

I stand corrected, then. I still would like to see measures to protect
software sandboxes in baseline RVI, but RVJ, as an extension, will have
more leeway than I expect for changes to baseline RVI at this point. I
also expect to see RVJ widely implemented, so sandbox protection in RVJ
would be widely available.

> [...]
>
> Commodore-Amiga, having implemented a SASOS environment for their
> Kickstart operating system, realized that it could not implement
> multiple address spaces and retain seamless backward compatibility.
> Knowing that process protection was going to be important in the
> future, they were going to make their own custom MMU to specifically
> support protection in the context of Kickstart. (Of course, then they
> went bankrupt and whatever work they had accomplished on paper
> vanished.)
>
> I want to know what they were thinking and get these out in the open,
> so that these techniques can be debated on technical merits.
>

This could be very interesting: a protection scheme usable *within*
user programs. Got any leads on it? Know any people you can ask?


-- Jacob

Eric McCorkle

unread,
Jan 5, 2018, 7:10:42 PM1/5/18
to jcb6...@gmail.com, isa...@groups.riscv.org


On 01/05/2018 18:43, Jacob Bachmeyer wrote:
 
>>
>> Actually...
>>
>> I posted a thing on the FreeBSD lists about storing sensitive data in
>> non-cacheable memory.  Unless I'm missing something, this ends up
>> defeating meltdown outright, and it also defeats the "dive into their
>> code and soak up info into the branch predictors" attack.
>>  
>
> It *might* prevent Meltdown; we do not know what the precondition for
> Meltdown actually is.  Stacking enough speculation for an uncached read
> to complete may be possible:  the attacks published so far did not need it.

This is my reasoning. It might be flawed; we're in uncharted waters here:

Meltdown starts with an access to a kernel page, grabbing sensitive
data, then executing transient ops on it. Assume a virtually-tagged
cache and look at the TLB/Cache hit/miss matrix:

TLB Hit, Cache Hit: TLB reports fault no later than the cache returns a
value (probably sooner). Fault happens before the data shows up.

TLB Miss, Cache Hit: Cache returns data quick, then you're executing
transient ops until the page table walk completes (possibly 1000s of
cycles). This is the only case where meltdown works.

TLB Hit, Cache miss: Cache fill operation strongly depends on address
translation (otherwise, you don't know where to look). TLB lookup
reports a fault.

TLB Miss, Cache miss: Cache fill operation strongly depends on address
translation, operation stalls for page table walk, which reports a fault.

So unless I'm missing something, if the cache misses, the attack fails.
If the cache is physically-indexed, then you need an address
translation, which reports a fault.

So the critical thing about meltdown seems to be that it depends on the
fact that in some cases (hit on a virtually-indexed cache), the
dependence of data access on address translation is erased (by design).
This defense works by restoring that dependence.
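That matrix collapses to a one-line predicate. This sketch only restates the reasoning above, not any real microarchitecture:

```c
#include <assert.h>
#include <stdbool.h>

/* The four TLB/cache cases above as a predicate: the transient window
 * only opens when the cache hands back data while the page-table walk
 * that will eventually fault is still in flight. */
bool meltdown_window(bool tlb_hit, bool cache_hit) {
    if (tlb_hit)
        return false;   /* translation resolves; the fault wins the race */
    return cache_hit;   /* TLB miss + cache hit: data before the fault */
}
```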

> (Finding that Intel has been speculatively caching
> "uncachable" reads and invalidating those lines when the instruction
> retires would not surprise me at this point.)

That would wreck the defense against spectre, but the defense against
meltdown might survive.

Jacob Bachmeyer

unread,
Jan 5, 2018, 7:22:42 PM1/5/18
to Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Jonas Oberhauser wrote:
> On Jan 5, 2018 19:50, "Alex Elsayed" <etern...@gmail.com
> <mailto:etern...@gmail.com>> wrote:
>
> On Jan 5, 2018 7:21 AM, "Eric McCorkle" <er...@metricspace.net
> <mailto:er...@metricspace.net>> wrote:
>
> On 01/05/2018 08:25, Jonas Oberhauser wrote:
> >
> >
> > 2018-01-05 14:01 GMT+01:00 Eric McCorkle
> <er...@metricspace.net <mailto:er...@metricspace.net>
> > <mailto:er...@metricspace.net <mailto:er...@metricspace.net>>>:
The correctness of that assumption will depend on microarchitectural
details, such as maximum speculation depth. Speculating that Intel
processors have the greatest maximum speculation depth is reasonable and
would explain the observation that Intel processors are much more
severely affected.

> If all we had to do was prevent effects to the cache, one could
> speculatively load only in case
> 1) nothing will be evicted (no cache contention)
> 2) no other cache will need to make a transition
> 3) the loaded data is clean
>
> LRs are different; speculative LRs that lock a cacheline can be timed
> by an attacker.
>
> If a remote operation is observed that will invalidate condition 3)
> while still speculating, one needs to abort and retry (or wait for the
> speculation to complete).

Such a remote operation would need to cause all speculation past that
load to be abandoned. Also, any cachelines speculatively loaded would
need to be invalidated if speculation is abandoned. This could require
several additional columns in the cache to store a "speculation
checkpoint" number, to allow partial invalidation of speculated
execution, analogous to transaction savepoints in PostgreSQL.

> I guess the branch predictor can ignore speculative training data
> (only train at retirement), and if one never speculatively loads from
> DRAM, that problem can also be avoided.

Limiting speculative loads to data already cached could be an
interesting option and meets your condition (1) by definition.

> The difficulty of course lies in the fact that these may be things
> that happen very often, making the performance penalty unbearable.
> Every instruction speculates on not being interrupted, and you
> typically want to fetch before resolving that speculation.

I believe that you are mistaken about interrupts in RISC-V: an
interrupt causes some instruction to take an interrupt exception and
trap. The interrupt trap (as "move pc to ?epc") can be inserted into
the decoded instruction stream ahead of (almost) any instruction and the
fetch unit repointed to *tvec. (I say almost any because some
implementations might choose to hold off interrupts in LR/SC
sequences.) The "interrupted" instruction is not executed at all.
Which instruction takes the exception is unspecified, so as long as the
"trap marker" appears correctly in the fetched instruction stream, the
fetch unit can simply start fetching from the address in *tvec some time
after the interrupt arrives.

> I still hope HW interrupts (/IPIs) are not a problem because they are
> too unpredictable, or at least can be made somewhat unpredictable, or
> at least that attackers can not normally exploit those interrupts
> (e.g., time them). I don't know if that hope is realistic.

In high-performance (=low-latency) systems, some interrupt timing will
effectively leak all the way to user space and may provide side channels
associated with storage hardware, for example. On the other hand,
interrupt timing is or has been a source of entropy for the Linux RNG,
so at least the Linux devs seem to expect interrupts to be random. If
interrupts can be made predictable by an attacker, poisoning the state
of the Linux kernel RNG to cause predictable keys to be generated might
be possible.


-- Jacob

Eric McCorkle

unread,
Jan 5, 2018, 7:26:46 PM1/5/18
to jcb6...@gmail.com, isa...@groups.riscv.org
On 01/05/2018 18:32, Jacob Bachmeyer wrote:

>
> Fortunately, Project Zero was only able to demonstrate this variant on
> Intel CPUs; I suspect that issues related to Meltdown are necessary for
> the cross-process form of the attack to work.

Point of fact: there have been attacks against AMD, and ARM reported.
Additionally, Red Hat is reporting Power8 and 9 as being vulnerable.

> I would prefer keying those structures to security domains, rather than
> expecting explicit flushes.  The latter would also require adding many
> more instructions that would require microarchitectural knowledge to
> correctly use, so explicit flushes are not really an option for RISC-V. 
> :-/

That's the better approach, I agree.

Eric McCorkle

unread,
Jan 5, 2018, 7:33:15 PM1/5/18
to Jonas Oberhauser, Alex Elsayed, RISC-V ISA Dev
On 01/05/2018 18:56, Jonas Oberhauser wrote:

> To be honest I understand only for the DRAM selection state and the
> branch predictor why they are really bad. For the others the window of
> attack seems to be too small to do anything useful. Is that assumption
> wrong?

I don't think it is. Weaponized exploits are almost always probabilistic
tools, so they're built to hammer away, collecting data until they get
the full picture.

Even if a tiny attack window translates into, say, a 0.1% chance of
success, I can run the attack 1000 times and still probably take only a
few seconds of compute time.
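The back-of-envelope math here is the complement rule: n independent tries at per-try probability p succeed at least once with probability 1 - (1 - p)^n, so 1000 tries at 0.1% already give about 63%. A minimal sketch:

```c
#include <assert.h>

/* Probability that n independent tries, each succeeding with
 * probability p, yield at least one success: 1 - (1 - p)^n.
 * Computed by a loop to keep the sketch free of libm. */
double p_at_least_one(double p, int n) {
    double q = 1.0;
    for (int i = 0; i < n; i++)
        q *= 1.0 - p;       /* probability that every try fails */
    return 1.0 - q;
}
```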

Persistent attackers going after a valuable secret could potentially
tolerate odds of success approaching 1/million or even 1/billion.

Jacob Bachmeyer

unread,
Jan 5, 2018, 7:51:51 PM1/5/18
to Eric McCorkle, isa...@groups.riscv.org
Eric McCorkle wrote:
> On 01/05/2018 18:43, Jacob Bachmeyer wrote:
>
>
>>> Actually...
>>>
>>> I posted a thing on the FreeBSD lists about storing sensitive data in
>>> non-cacheable memory. Unless I'm missing something, this ends up
>>> defeating meltdown outright, and it also defeats the "dive into their
>>> code and soak up info into the branch predictors" attack.
>>>
>>>
>> It *might* prevent Meltdown; we do not know what the precondition for
>> Meltdown actually is. Stacking enough speculation for an uncached read
>> to complete may be possible: the attacks published so far did not need it.
>>
>
> This is my reasoning. It might be flawed; we're in uncharted waters here:
>
> Meltdown starts with an access to a kernel page, grabbing sensitive
> data, then executing transient ops on it. Assume a virtually-tagged
> cache and look at the TLB/Cache hit/miss matrix:
>
> TLB Hit, Cache Hit: TLB reports fault no later than the cache returns a
> value (probably sooner). Fault happens before the data shows up.
>

As I understand it, the fault is raised, but the instructions using the
bogus data have already been dispatched for speculative parallel
execution and must "complete" before an Intel processor can take the
trap. (This was a performance issue on the Pentium 4: speculatively
executed instructions after a predicted branch must "complete" before
the processor can resume from a branch misprediction. If one of those
instructions is DIV, the branch misprediction penalty can reach over a
hundred cycles. The AMD K8 from the same period would drop its
half-executed instructions "on the floor" after a branch misprediction.
The DIV instruction is about equally slow on both of them, but a DIV
close enough after a branch could severely hurt performance on the P4.
I suspect that page fault handling is similarly different and that these
differences have been carried through to the current microarchitectures.)

> TLB Miss, Cache Hit: Cache returns data quick, then you're executing
> transient ops until the page table walk completes (possibly 1000s of
> cycles). This is the only case where meltdown works.
>

Assuming the TLB has more rows than the cache, this case *should* be
logically impossible.

> TLB Hit, Cache miss: Cache fill operation strongly depends on address
> translation (otherwise, you don't know where to look). TLB lookup
> reports a fault.
>
> TLB Miss, Cache miss: Cache fill operation strongly depends on address
> translation, operation stalls for page table walk, which reports a fault.
>

Speculation is that these cases cause speculatively executed loads to
return zero on Intel processors, but speculative execution continues and
the published attack simply loops if the load returns zero. Amazingly
enough, the researchers found that on Intel processors this loop often
causes the load to eventually return non-zero before the page fault trap
is taken.

> So unless I'm missing something, if the cache misses, the attack fails.
> If the cache is physically-indexed, then you need an address
> translation, which reports a fault.
>

As I understand the published attacks, the problem (and the reason
Meltdown only affects Intel processors) is that Intel processors do not
take a trap until the offending instruction is retired *and* a fault
raised by a speculatively-executed instruction does not halt speculative
execution. If the retirement of the offending load can be delayed by
some trick (possibly using Spectre) for long enough that the page table
walk can complete (and the PTE walk is not aborted upon reaching a
supervisor page while in user mode, another Intel bug if so), then
Meltdown can work even if the data is not cached.

It seems that Meltdown *does* sometimes work even if the data is not
cached. We need to be certain that similar bugs in RISC-V processors
are plainly the vendor's fault -- I understand that Meltdown-vulnerable
Intel processors actually conform to the published x86 ISA. Meltdown is
both a processor bug and an ISA loophole, as I understand.


-- Jacob

Andrew Waterman

unread,
Jan 5, 2018, 8:01:30 PM1/5/18
to jcb6...@gmail.com, Eric McCorkle, isa...@groups.riscv.org
L1 TLBs often have many fewer entries than there are sets in the L1.

That said, straightforward virtually indexed, physically tagged caches can’t ever exhibit this case. It’s not possible to hit in such a cache without a TLB hit.

I’m guessing the Intel design used virtual microtags to reduce cache hit time. The vulnerability would then manifest when the virtual microtag match was a false positive (and presumably also when the TLB missed, but not necessarily).

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.

To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

Jacob Bachmeyer

unread,
Jan 5, 2018, 8:03:11 PM1/5/18
to Eric McCorkle, isa...@groups.riscv.org
Eric McCorkle wrote:
> On 01/05/2018 18:32, Jacob Bachmeyer wrote:
>
>> Fortunately, Project Zero was only able to demonstrate this variant on
>> Intel CPUs; I suspect that issues related to Meltdown are necessary for
>> the cross-process form of the attack to work.
>>
>
> Point of fact: there have been attacks against AMD, and ARM reported.
>

The attacks that I have seen reported against AMD and ARM are the
within-process Spectre bounds-check violation. I think that there are
three attacks in the Spectre class so far:

(1) a bounds-check violation attack (that works everywhere Project
Zero tested it) that is limited to bogus speculative reads within the
same address space, along with an extension of that attack that abused
the Linux eBPF subsystem to read from a 4GiB window in kernel memory
(this attack worked on one of two AMD x86 processors if the eBPF JIT
was enabled, did not work on ARM, and worked with both eBPF JIT and
eBPF interpreted on Intel x86) but required that the execution of the
eBPF program was non-speculative;

(2) a branch-prediction attack that leaks the previous values of %rip,
even out of a hypervisor (this attack only worked on Intel x86); and

(3) an indirect branch prediction poisoning attack that made it
possible to speculatively "execute" unverified eBPF programs using the
hypervisor's eBPF interpreter (which cannot ordinarily be reached from
a KVM guest) that were then used to leak hypervisor memory through a
cache side channel (this last attack also only worked on Intel x86).

> Additionally, Red Hat is reporting Power8 and 9 as being vulnerable.
>

Vulnerable to which of these? The bounds-check violation that breaks
software sandboxes, has worked everywhere else, and does not cross a
hardware security boundary? All of them?


-- Jacob

Jonas Oberhauser

unread,
Jan 5, 2018, 8:20:28 PM1/5/18
to Jacob Bachmeyer, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
On Jan 6, 2018 01:22, "Jacob Bachmeyer" <jcb6...@gmail.com> wrote:
Jonas Oberhauser wrote:

To be honest, I understand why they are really bad only for the DRAM selection state and the branch predictor. For the others, the window of attack seems too small to do anything useful. Is that assumption wrong?

The correctness of that assumption will depend on microarchitectural details, such as maximum speculation depth.  Speculating that Intel processors have the greatest maximum speculation depth is reasonable and would explain the observation that Intel processors are much more severely affected.

That makes sense; but I'm still not sure what the exact effect will be -- assume you had 1000 cycles per iteration to find the data, how many iterations would one need? I wouldn't be surprised if this amounted to leaking one byte per second or something similar, but I'm not deep enough into the technical parts to understand this. I see Eric also replied to this but I haven't had time to read his reply yet; I'll consider it tomorrow.

If all we had to do was prevent effects to the cache, one could speculatively load only in case
1) nothing will be evicted (no cache contention)
2) no other cache will need to make a transition
3) the loaded data is clean

LRs are different; speculative LRs that lock a cacheline can be timed by an attacker.

If a remote operation is observed that will invalidate condition 3) while still speculating, one needs to abort and retry (or wait for the speculation to complete).

Such a remote operation would need to cause all speculation past that load to be abandoned. 

Yes, what I meant by abort was even more severe -- retry the whole speculative sequence. You are right that only that load needs to be retried.

Also, any cachelines speculatively loaded would need to be invalidated if speculation is abandoned.

Exactly, hence condition 3).

This could require several additional columns in the cache to store a "speculation checkpoint" number, to allow partial invalidation of speculated execution, analogous to transaction savepoints in PostgreSQL.

Hm, I didn't think about that -- I thought it would be an all or nothing, but you're right, if a nested inner speculation/speculative load is detected as incorrect first, only those loads have to be invalidated.


I guess the branch predictor can ignore speculative training data (only train at retirement), and if one never speculatively loads from DRAM, that problem can also be avoided.

Limiting speculative loads to data already cached could be an interesting option and meets your condition (1) by definition.

Yes, and I think you (or someone else) suggested something like it before. I'm just not sure what the performance penalty is, so I'm trying to carve out as much space as possible.

The difficulty of course lies in the fact that these may be things that happen very often, making the performance penalty unbearable. Every instruction speculates on not being interrupted, and you typically want to fetch before resolving that speculation.

I believe that you are mistaken about interrupts in RISC-V:  an interrupt causes some instruction to take an interrupt exception and trap.  The interrupt trap (as "move pc to ?epc") can be inserted into the decoded instruction stream ahead of (almost) any instruction and the fetch unit repointed to *tvec.  (I say almost any because some implementations might choose to hold off interrupts in LR/SC sequences.)  The "interrupted" instruction is not executed at all.  Which instruction takes the exception is unspecified, so as long as the "trap marker" appears correctly in the fetched instruction stream, the fetch unit can simply start fetching from the address in *tvec some time after the interrupt arrives.

It's possible that I misunderstood something or used the wrong term but generally speaking in HW, many interrupts cause speculative instruction fetch, e.g., in a pipeline, an interrupt can not be detected in the early stages, but to avoid bubbles, one already fetches from pc, pc+4, pc+8, ... This has an effect on the cache, even if at a later stage the interrupt is detected and those instructions architecturally speaking never have been fetched.

Now using an indirect branch with some register R, one can read out the value of R (e.g., by timing instruction fetches, or by sharing the cachelines in a data cache and observing state changes). If one manages to put a secret value into R before the indirect branch, the attack is complete.

This is definitely true for exact SW interrupts. I now realize HW interrupts can avoid this speculation by draining the pipe. I was under the impression that RISC-V has some exact SW interrupts, but I may be wrong.

The problem is that early on in the pipeline it is often impossible to tell whether the instruction may be of a type that traps or not, so you might potentially have to delay speculative instruction fetch  from DRAM after every instruction. That sounds really bad.

Maybe these attacks are not possible in practice because nobody has such code where an instruction might trap right before an indirect branch in just the right configuration, but I would not count on it. 

I still hope HW interrupts (/IPIs) are not a problem because they are too unpredictable, or at least can be made somewhat unpredictable, or at least that attackers can not normally exploit those interrupts (e.g., time them). I don't know if that hope is realistic.

In high-performance (=low-latency) systems, some interrupt timing will effectively leak all the way to user space and may provide side channels associated with storage hardware, for example.  On the other hand, interrupt timing is or has been a source of entropy for the Linux RNG, so at least the Linux devs seem to expect interrupts to be random.  If interrupts can be made predictable by an attacker, poisoning the state of the Linux kernel RNG to cause predictable keys to be generated might be possible.


That is good to know, thanks!

Jacob Bachmeyer

unread,
Jan 5, 2018, 10:43:33 PM1/5/18
to Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Jonas Oberhauser wrote:
> On Jan 6, 2018 01:22, "Jacob Bachmeyer" <jcb6...@gmail.com
> <mailto:jcb6...@gmail.com>> wrote:
>
> Jonas Oberhauser wrote:
>
>
> To be honest I understand only for the DRAM selection state
> and the branch predictor why they are really bad. For the
> others the window of attack seems to be too small to do
> anything useful. Is that assumption wrong?
>
>
> The correctness of that assumption will depend on
> microarchitectural details, such as maximum speculation depth.
> Speculating that Intel processors have the greatest maximum
> speculation depth is reasonable and would explain the observation
> that Intel processors are much more severely affected.
>
>
> That makes sense; but I'm still not sure what the exact effect will be
> -- assume you had 1000 cycles per iteration to find the data, how many
> iterations would one need? I wouldn't be surprised if this amounted to
> leaking one byte per second or something similar, but I'm not deep
> enough into the technical parts to understand this. I see Eric also
> replied to this but I haven't had time to read his reply yet; I'll
> consider it tomorrow.

The indirect branch poisoning attack on Haswell can speculate far enough
to execute eBPF bytecode using the Linux kernel's eBPF interpreter.
Project Zero used an eBPF program to leak hypervisor memory. (Note that
there is no legitimate way for a KVM guest to cause the host kernel to
evaluate an eBPF program.)

> If all we had to do was prevent effects to the cache, one
> could speculatively load only in case
> 1) nothing will be evicted (no cache contention)
> 2) no other cache will need to make a transition
> 3) the loaded data is clean
>
> LRs are different; speculative LRs that lock a cacheline can
> be timed by an attacker.
>
> If a remote operation is observed that will invalidate
> condition 3) while still speculating, one needs to abort and
> retry (or wait for the speculation to complete).
>
>
> Such a remote operation would need to cause all speculation past
> that load to be abandoned.
>
>
> Yes, what I meant by abort was even more severe -- retry the whole
> speculative sequence. You are right that only that load needs to be
> retried.

I would add that speculative execution should end at this point -- once
any speculated instructions have been canceled, it is certain that a
wrong path has been taken somewhere.

> Also, any cachelines speculatively loaded would need to be
> invalidated if speculation is abandoned.
>
>
> Exactly, hence condition 3).
>
> This could require several additional columns in the cache to
> store a "speculation checkpoint" number, to allow partial
> invalidation of speculated execution, analogous to transaction
> savepoints in PostgreSQL.
>
>
> Hm, I didn't think about that -- I thought it would be an all or
> nothing, but you're right, if a nested inner speculation/speculative
> load is detected as incorrect first, only those loads have to be
> invalidated.

The reason for not making a speculation abort all or nothing is that
some of the speculated instructions might still be able to commit.

> I guess the branch predictor can ignore speculative training
> data (only train at retirement), and if one never
> speculatively loads from DRAM, that problem can also be avoided.
>
>
> Limiting speculative loads to data already cached could be an
> interesting option and meets your condition (1) by definition.
>
>
> Yes, and I think you (or someone else) suggested something like it
> before. I'm just not sure what the performance penalty is, so I'm
> trying to carve out as much space as possible.

Right now the best we can do is discuss and find ideas that can be
evaluated using models or maybe even actual hardware.
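As a rough sketch of one such model in C (the structure, names, and epoch numbering here are my own illustration, not from any real design): a toy direct-mapped cache in which each line records the speculation checkpoint that filled it, so that aborting a checkpoint invalidates only the lines filled at or after it, while fills from earlier, still-valid checkpoints survive to commit.

```c
#include <assert.h>
#include <stdint.h>

#define NLINES 8                /* toy direct-mapped cache */
#define INVALID_TAG UINT32_MAX
#define RETIRED 0               /* epoch 0 marks a committed (non-speculative) fill */

struct line {
    uint32_t tag;               /* tag = addr / NLINES */
    uint32_t epoch;             /* speculation checkpoint that caused the fill */
};

static struct line cache[NLINES];

static void cache_reset(void) {
    for (int i = 0; i < NLINES; i++)
        cache[i].tag = INVALID_TAG;
}

/* Fill a line speculatively under checkpoint `epoch` (epochs start at 1). */
static void spec_fill(uint32_t addr, uint32_t epoch) {
    cache[addr % NLINES].tag = addr / NLINES;
    cache[addr % NLINES].epoch = epoch;
}

static int cache_hit(uint32_t addr) {
    return cache[addr % NLINES].tag == addr / NLINES;
}

/* Abort checkpoint `epoch`: discard every line filled at or after it,
 * leaving fills from earlier (still-valid) checkpoints intact. */
static void spec_abort(uint32_t epoch) {
    for (int i = 0; i < NLINES; i++)
        if (cache[i].tag != INVALID_TAG && cache[i].epoch >= epoch)
            cache[i].tag = INVALID_TAG;
}

/* Retire checkpoints up to and including `epoch`: those fills become permanent. */
static void spec_commit(uint32_t epoch) {
    for (int i = 0; i < NLINES; i++)
        if (cache[i].tag != INVALID_TAG && cache[i].epoch <= epoch)
            cache[i].epoch = RETIRED;
}
```

For example, aborting checkpoint 2 after fills under checkpoints 1 and 2 discards only the checkpoint-2 line, matching the partial-invalidation idea discussed above.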

> The difficulty of course lies in the fact that these may be
> things that happen very often, making the performance penalty
> unbearable. Every instruction speculates on not being
> interrupted, and you typically want to fetch before resolving
> that speculation.
>
>
> I believe that you are mistaken about interrupts in RISC-V: an
> interrupt causes some instruction to take an interrupt exception
> and trap. The interrupt trap (as "move pc to ?epc") can be
> inserted into the decoded instruction stream ahead of (almost) any
> instruction and the fetch unit repointed to *tvec. (I say almost
> any because some implementations might choose to hold off
> interrupts in LR/SC sequences.) The "interrupted" instruction is
> not executed at all. Which instruction takes the exception is
> unspecified, so as long as the "trap marker" appears correctly in
> the fetched instruction stream, the fetch unit can simply start
> fetching from the address in *tvec some time after the interrupt
> arrives.
>
>
> It's possible that I misunderstood something or used the wrong term
> but generally speaking in HW, many interrupts cause speculative
> instruction fetch, e.g., in a pipeline, an interrupt can not be
> detected in the early stages, but to avoid bubbles, one already
> fetches from pc, pc+4, pc+8, ... This has an effect on the cache, even
> if at a later stage the interrupt is detected and those instructions
> architecturally speaking never have been fetched.

Interrupts have latency, so the processor can simply decide that the
interrupt applies to the Nth instruction from now and take the interrupt
trap then.

This is distinct from synchronous exceptions that are expected to cause
precise traps. (user ISA section 1.3 "Exceptions, Traps, and
Interrupts") Synchronous exceptions will require flushing the pipeline,
since execution must resume at the trap handler instead of continuing
after the instruction that raised the exception. On the other hand,
since ECALL *always* raises an exception, an implementation *can*
speculate through ECALL and into the trap handler. (Is it really
speculation if it is always correct?)

Generally, RISC-V instructions fall into three categories:

(1) Always raises exception: ECALL, EBREAK, illegal instructions
(2) Never raises exception: RVI computational instructions
(3) Conditionally raises exception: memory access, privileged instructions

Note that control transfer instructions can be either (2) or (3)
depending on the implementation -- while instruction misalignment and
instruction access faults are possible, whether they occur
(microarchitecturally speaking) at JAL/JALR or on the following
instruction fetch is left to the implementation. (My reading of the
user ISA spec suggests that *epc must point to the offending JAL/JALR
for instruction misalignment faults, but microarchitectures can achieve
that by adjusting the generated link value after the fact.)
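For illustration, this three-way sort can be sketched in C over the 7-bit major opcode (the category encoding is mine; the opcode values follow the RISC-V base opcode map; SYSTEM is left in category (3) here, since separating ECALL/EBREAK from CSR and privileged instructions requires more than the major opcode):

```c
#include <assert.h>
#include <stdint.h>

/* (1) always traps, (2) never traps, (3) may trap. */
enum { ALWAYS = 1, NEVER = 2, MAYBE = 3 };

static int exception_category(uint32_t insn) {
    switch (insn & 0x7f) {      /* major opcode: low 7 bits */
    case 0x37: /* LUI       */
    case 0x17: /* AUIPC     */
    case 0x13: /* OP-IMM    */
    case 0x33: /* OP        */
    case 0x1b: /* OP-IMM-32 */
    case 0x3b: /* OP-32     */
    case 0x0f: /* MISC-MEM (FENCE) */
        return NEVER;
    case 0x03: /* LOAD   */
    case 0x23: /* STORE  */
    case 0x2f: /* AMO    */
    case 0x6f: /* JAL    */
    case 0x67: /* JALR   */
    case 0x63: /* BRANCH */
    case 0x73: /* SYSTEM: needs further decode -- ECALL/EBREAK are (1),
                  CSR and privileged instructions are (3) */
        return MAYBE;
    default:   /* reserved here: raises illegal-instruction */
        return ALWAYS;
    }
}
```

This covers RV32I/RV64I; extensions occupying other major opcodes would need their own rows.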

> Now using an indirect branch with some register R, one can read out
> the value of R (e.g., by timing instruction fetches, or by sharing the
> cachelines in a data cache and observing state changes). If one
> manages to put a secret value into R before the indirect branch, the
> attack is complete.

I think that this is a different type of side channel. How is
speculative execution part of this attack? The Spectre indirect branch
poisoning attack causes the processor to assume an attacker-controlled
value for R and speculate execution from that address. The speculative
execution is canceled when the processor later gets the real value for
R. (Presumably, R is loaded from memory just before the indirect jump
that uses R.)

> This is definitely true for exact SW interrupts. I now realize HW
> interrupts can avoid this speculation by draining the pipe. I was
> under the impression that RISCV has some exact SW interrupts, but I
> may be wrong.

You seem to be confusing terms -- RISC-V does not have exact interrupts,
but _exceptions_ are expected to be exact and are expected to cause
_precise_ traps. Which instruction gets a pending interrupt exception
is not specified, however.

For software interrupts in RISC-V, the ?SIP bit can be set, and an
interrupt occurs at some unspecified future time. Hardware can, of
course, make that future time convenient, for example, after "runahead
fetch" has retrieved the first cacheline of the trap handler. (Or
initiate fetching the trap handler and continue non-speculative
execution until either (1) an I-cache miss or (2) the first cacheline of
the trap handler is ready. In case (1), stall and wait for (2). In
case (2), take the interrupt trap.)
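The policy in the parenthetical above can be written as a trivial per-cycle decision function; a C sketch (all names are mine, purely illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Decision per cycle while a software interrupt is pending and a
 * "runahead fetch" of the trap handler's first cacheline is in flight. */
enum action { KEEP_EXECUTING, STALL_FOR_HANDLER, TAKE_TRAP };

static enum action interrupt_step(bool handler_line_ready, bool icache_miss) {
    if (handler_line_ready)      /* case (2): handler's first line is ready */
        return TAKE_TRAP;
    if (icache_miss)             /* case (1): stall and wait for case (2) */
        return STALL_FOR_HANDLER;
    return KEEP_EXECUTING;       /* continue non-speculative execution */
}
```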

> The problem is that early on in the pipeline it is often impossible to
> tell whether the instruction may be of a type that traps or not, so
> you might potentially have to delay speculative instruction fetch
> from DRAM after every instruction. That sounds really bad.

Fortunately, in RISC-V, the major opcode (for 32-bit instructions) is
sufficient to sort an instruction into category (1)/(2)/(3) and the
memory access instructions are all in one corner of the opcode table.
JAL/JALR also differ by one bit, placing them in another corner of the
table. Another corner of the table holds instructions that can never
raise exceptions: OP, OP-IMM, AUIPC, LUI, OP-IMM-32, OP-32. At first
glance, sorting most 32-bit instructions into "exception risk"
categories seems to require 3~4 gate delays and the opcodes not covered
are either reserved or SYSTEM. All of the legal instructions in
category (1) are in the SYSTEM opcode, as are the privileged
instructions that fall in category (3).

> Maybe these attacks are not possible in practice because nobody has
> such code where an instruction might trap right before an indirect
> branch in just the right configuration, but I would not count on it.

How can speculative instruction fetch itself leak data? Instruction
fetch is linear and therefore completely predictable in the absence of
control-transfer instructions. Program code is assumed to be known to
the attacker, so what can leak?


-- Jacob

Jonas Oberhauser

unread,
Jan 6, 2018, 2:59:30 AM1/6/18
to Jacob Bachmeyer, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev


On Jan 6, 2018 04:43, "Jacob Bachmeyer" <jcb6...@gmail.com> wrote:
Jonas Oberhauser wrote:


        If all we had to do was prevent effects to the cache, one
        could speculatively load only in case
        1) nothing will be evicted (no cache contention)
        2) no other cache will need to make a transition
        3) the loaded data is clean

        LRs are different; speculative LRs that lock a cacheline can
        be timed by an attacker.

        If a remote operation is observed that will invalidate
        condition 3) while still speculating, one needs to abort and
        retry (or wait for the speculation to complete).


    Such a remote operation would need to cause all speculation past
    that load to be abandoned.

Yes, what I meant by abort was even more severe -- retry the whole speculative sequence. You are right that only that load needs to be retried.

I would add that speculative execution should end at this point -- once any speculated instructions have been canceled, it is certain that a wrong path has been taken somewhere.

That is not certain, because the instructions are not cancelled due to a speculation error but due to a "can not hide speculation in cache" event. However the result is the same: as long as the remote cacheline remains dirty, the data can not be loaded into the local cache without potentially measurable side effects, so speculation ends.


Hm, I didn't think about that -- I thought it would be an all or nothing, but you're right, if a nested inner speculation/speculative load is detected as incorrect first, only those loads have to be invalidated.

The reason for not making a speculation abort all or nothing is that some of the speculated instructions might still be able to commit.

I agree.


Yes, and I think you (or someone else) suggested something like it before. I'm just not sure what the performance penalty is, so I'm trying to carve out as much space as possible.

Right now the best we can do is discuss and find ideas that can be evaluated using models or maybe even actual hardware.

I agree.

Jonas Oberhauser

unread,
Jan 6, 2018, 3:23:14 AM1/6/18
to Jacob Bachmeyer, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev


On Jan 6, 2018 04:43, "Jacob Bachmeyer" <jcb6...@gmail.com> wrote:
Jonas Oberhauser wrote:



It's possible that I misunderstood something or used the wrong term but generally speaking in HW, many interrupts cause speculative instruction fetch, e.g., in a pipeline, an interrupt can not be detected in the early stages, but to avoid bubbles, one already fetches from pc, pc+4, pc+8, ... This has an effect on the cache, even if at a later stage the interrupt is detected and those instructions architecturally speaking never have been fetched.

Interrupts have latency, so the processor can simply decide that the interrupt applies to the Nth instruction from now and take the interrupt trap then.

This is distinct from synchronous exceptions that are expected to cause precise traps.  (user ISA section 1.3 "Exceptions, Traps, and Interrupts")  Synchronous exceptions will require flushing the pipeline, since execution must resume at the trap handler instead of continuing after the instruction that raised the exception.  On the other hand, since ECALL *always* raises an exception, an implementation *can* speculate through ECALL and into the trap handler.  (Is it really speculation if it is always correct?)

Generally, RISC-V instructions fall into three categories:

(1) Always raises exception:  ECALL, EBREAK, illegal instructions
(2) Never raises exception:  RVI computational instructions
(3) Conditionally raises exception:  memory access, privileged instructions

Note that control transfer instructions can be either (2) or (3) depending on the implementation -- while instruction misalignment and instruction access faults are possible, whether they occur (microarchitecturally speaking) at JAL/JALR or on the following instruction fetch is left to the implementation.  (My reading of the user ISA spec suggests that *epc must point to the offending JAL/JALR for instruction misalignment faults, but microarchitectures can achieve that by adjusting the generated link value after the fact.)


Thanks for the detailed explanation and disambiguation of the terms.

I guess what I meant then (in the terms you use above) is any exception, not just an interrupt.



Now using an indirect branch with some register R, one can read out the value of R (e.g., by timing instruction fetches, or by sharing the cachelines in a data cache and observing state changes). If one manages to put a secret value into R before the indirect branch, the attack is complete.

I think that this is a different type of side channel.  How is speculative execution part of this attack? 

HW fetched and executed the indirect branch (by that I mean: fetched the jumped-to instruction), which has to be rolled back because an earlier instruction had an exception which was not detected when executing the indirect branch.


The Spectre indirect branch poisoning attack causes the processor to assume an attacker-controlled value for R and speculate execution from that address.  The speculative execution is canceled when the processor later gets the real value for R.  (Presumably, R is loaded from memory just before the indirect jump that uses R.)

No, the attacker controls the exception, not R. HW speculates that no exception is taken and loads R and fetches from the cacheline pointed to by R. 
The exception is later detected and the branch discarded.

Now the attacker measures which cacheline was fetched from by the indirect branch, thereby deducing the content of R.






This is definitely true for exact SW interrupts. I now realize HW interrupts can avoid this speculation by draining the pipe. I was under the impression that RISC-V has some exact SW interrupts, but I may be wrong.

You seem to be confusing terms -- RISC-V does not have exact interrupts, but _exceptions_ are expected to be exact and are expected to cause _precise_ traps.  Which instruction gets a pending interrupt exception is not specified, however.

For software interrupts in RISC-V, the ?SIP bit can be set, and an interrupt occurs at some unspecified future time.  Hardware can, of course, make that future time convenient, for example, after "runahead fetch" has retrieved the first cacheline of the trap handler.  (Or initiate fetching the trap handler and continue non-speculative execution until either (1) an I-cache miss or (2) the first cacheline of the trap handler is ready.  In case (1), stall and wait for (2).  In case (2), take the interrupt trap.)

What do you mean by SW interrupts? I meant what you call synchronous exception, such as page faults, PMAs, system calls, and the like.


The problem is that early on in the pipeline it is often impossible to tell whether the instruction may be of a type that traps or not, so you might potentially have to delay speculative instruction fetch  from DRAM after every instruction. That sounds really bad.

Fortunately, in RISC-V, the major opcode (for 32-bit instructions) is sufficient to sort an instruction into category (1)/(2)/(3) and the memory access instructions are all in one corner of the opcode table. 

Yes but unfortunately the opcode is not available until the instruction passes the decode stage. Until then, subsequent instructions have usually been fetched. 

Besides, category 3 includes common instructions such as stores and loads, and thus you run into a similar problem -- do you want to delay fetching uncached instructions until these loads and stores have been translated and found exception free?

Maybe these attacks are not possible in practice because nobody has such code where an instruction might trap right before an indirect branch in just the right configuration, but I would not count on it.

How can speculative instruction fetch itself leak data?  Instruction fetch is linear and therefore completely predictable in the absence of control-transfer instructions.  Program code is assumed to be known to the attacker, so what can leak?

In case of an indirect branch on register R, you can leak the content of the register R. 

I suppose for conditional branches you can leak the value of the condition.

Cesar Eduardo Barros

unread,
Jan 6, 2018, 5:54:47 AM1/6/18
to jcb6...@gmail.com, Stefan O'Rear, RISC-V ISA Dev
Em 05-01-2018 02:02, Jacob Bachmeyer escreveu:
> Stefan O'Rear wrote:
>> On Wed, Jan 3, 2018 at 10:49 PM, Jacob Bachmeyer <jcb6...@gmail.com>
>> wrote:
>>> To start a discussion, I suggest that all instructions have a control
>>> dependency on any preceding instructions in program order that can cause
>>> exceptions (excluding interrupt exceptions, which can occur at any
>>> instruction), until those potential traps are resolved.
>>> Implementations are
>>> permitted to speculate through ECALL and into the trap handler, however,
>>> since the ECALL trap is unconditional and therefore resolved upon
>>> decoding
>>> ECALL.
>>
>> This is a rather imprecise statement and appears to using "control
>> dependency" in a sense other than the standard one associated with
>> memory models.
>
> I may have misused the term, since I am still learning this area.  Put
> simply, instructions should have dependencies on the "raise exception"
> output of all preceding instructions in program order.  This should be
> sufficient to prevent speculation past an exception, but hopefully not
> necessary and a weaker rule can also be sufficient.  Meltdown definitely
> relies on this rule being violated, and I suspect that at least some
> forms of Spectre can be similarly prevented.

In modern cryptography, there is the concept of a "data-dependent array
index" or a "data-dependent branch", as something to be avoided. The
"data" here is a secret value, like a key or the internal state of a
cipher. A "data-dependent array index" is when you index into an array
using secret data; this leaks the secret data through the cacheline
address. A "data-dependent branch" is when you decide whether or not to
branch based on secret data; this leaks the secret data through timing
(a taken branch and a non-taken branch take different time), and also
through the cache.

Modern cryptography design avoids data-dependent array indexes and
branches: the code flow, and the memory addresses acessed, should either
be exactly the same for every run of the cryptographic algorithm, or
depend only on non-secret data.

Older cryptography was not as careful. For instance, the "S-box"
construct (basically, an array indexed by the algorithm state,
containing values derived from the key to be mixed into the next
algorithm state) used to be popular. Modern implementations attempt to
prevent the leak, through tricks like bit-slicing, using masking instead
of branches, or accessing every cacheline when looking up an S-box.
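The masking trick mentioned above can be made concrete; a minimal C sketch (the function name is mine): instead of branching on the secret condition, build an all-ones or all-zeros mask from it and combine both inputs unconditionally, so the same instructions run and the same data is read regardless of the secret.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free select: returns a if cond is nonzero, else b.
 * Both inputs are always read, so neither the branch predictor
 * nor the data cache observes which one was chosen. */
static uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = (uint32_t)-(uint32_t)(cond != 0); /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}
```

A caveat: real constant-time code must also check that the compiler did not reintroduce a branch when optimizing this pattern.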

A similar approach could work against Spectre. The secret values, in our
case, are "speculated register values". These are non-retired register
values, which were created within the speculative region of code. The
following should never be done with speculated register values, until
the speculation is confirmed as correct:

- Using a speculated register value as a memory address
- Using a speculated register value as a branch condition
- Using a speculated register value as a jump/branch target
- Using a speculated register value in a variable-time instruction

The following are safe:

- Using a non-speculated register value as a memory address
- Using a non-speculated register value as a branch condition
- Using a non-speculated register value as a jump/branch target
- Using a non-speculated register value in a variable-time instruction
- Using a prediction as a branch condition (nested speculation)
- Using a predicted address as a jump/branch target

The first case needs a bit of an explanation. The canonical Spectre
victim does the following:

- check if x is within the bounds
- load into y from an array indexed by x
- load from an array indexed by y

Here, while speculating whether the bounds check passed, y is a
"speculated register value", so the load from y (which is what causes
the leak) has to wait until the speculation ends. But what if the
compiler reorders the load into y to before the bounds check? Luckily, it
can't, because unless x is within the bounds, there's no guarantee that
there's memory at that array location. The compiler can hoist the array
index calculation to before the bounds check, but the actual memory
access has to be after it, so y becomes a "speculated register value".
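The victim sequence above can be written out; a C sketch of the widely published bounds-check gadget (the array names follow the convention of the Spectre write-ups; the cache-line-sized stride spreads the secondary index across distinct cache lines):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 64

static uint8_t array1[16];
static size_t  array1_size = 16;
static uint8_t array2[256 * CACHE_LINE];

/* In-bounds calls behave normally.  Under a mistrained branch
 * predictor, an out-of-bounds x lets y = array1[x] read past the
 * array speculatively, and the array2[y * CACHE_LINE] access
 * encodes y into which cache line gets filled -- the leak the
 * text describes. */
static uint8_t victim(size_t x) {
    uint8_t t = 0;
    if (x < array1_size) {           /* speculation point */
        uint8_t y = array1[x];       /* y is a "speculated register value" */
        t = array2[y * CACHE_LINE];  /* speculated value used as an address */
    }
    return t;
}
```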

So I conjecture, but have not proved, that the rules above would make
speculative execution a lot safer, while still allowing for a good
amount of speculation.
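These rules amount to tracking one taint bit per register; a toy C model (all names mine) that such a conjecture could be evaluated against, with the simplification that resolving the enclosing speculation clears all taint at once:

```c
#include <assert.h>
#include <stdbool.h>

#define NREGS 32
/* Taint: the value was produced under still-unresolved speculation. */
static bool speculated[NREGS];

/* A load executed under speculation yields a speculated register value. */
static void spec_load(int rd) { speculated[rd] = true; }

/* Computation propagates taint from sources to destination. */
static void alu_op(int rd, int rs1, int rs2) {
    speculated[rd] = speculated[rs1] || speculated[rs2];
}

/* The dangerous uses from the list above must stall on tainted inputs. */
static bool may_use_as_address(int rs)   { return !speculated[rs]; }
static bool may_use_as_condition(int rs) { return !speculated[rs]; }
static bool may_use_as_target(int rs)    { return !speculated[rs]; }

/* When the speculation is confirmed correct, the values are no longer
 * secret-bearing in this sense and taint clears. */
static void speculation_resolved(void) {
    for (int i = 0; i < NREGS; i++) speculated[i] = false;
}
```

In the canonical victim above, the load into y would call spec_load, so the dependent array access fails may_use_as_address until the bounds check resolves.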

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Jacob Bachmeyer

unread,
Jan 6, 2018, 7:37:20 PM1/6/18
to Cesar Eduardo Barros, Stefan O'Rear, RISC-V ISA Dev
Any suggestions for how progress towards a security proof can be made?
Obviously, modeling speculation is needed for measuring the performance
impact.


-- Jacob

Jacob Bachmeyer

Jan 6, 2018, 8:19:46 PM
to Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Jonas Oberhauser wrote:
> On Jan 6, 2018 04:43, "Jacob Bachmeyer" <jcb6...@gmail.com
Those are not my terms: they are from the RISC-V ISA spec. These
particular nuances of "exception", "interrupt", and "trap" should
probably be considered unique to RISC-V, although the ISA spec mentions
in commentary that "exception" and "trap" align with IEEE-754.

> Now using an indirect branch with some register R, one can
> read out the value of R (e.g., by timing instruction fetches,
> or by sharing the cachelines in a data cache and observing
> state changes). If one manages to put a secret value into R
> before the indirect branch, the attack is complete.
>
>
> I think that this is a different type of side channel. How is
> speculative execution part of this attack?
>
>
> HW fetched and executed the indirect branch (by that I mean: fetched
> the jumped-to instruction), which has to be rolled back because an
> earlier instruction had an exception which was not detected when
> executing the indirect branch.

This is not one of the published Spectre attacks, then, but possibly
something new. ("Fun"... the second time in this thread a possible
attack has been found "near" the recently-published attacks.) In this
case, the indirect branch target is known non-speculatively, but the
indirect branch itself is in the "shadow" of something that prevents it
from actually being executed (either a Meltdown-like delayed exception
or a mispredicted conditional branch). This could be an "if
(object->flags.use_vtable) { object->ops->some_method(...) } else {
global_version_of_some_method(...) }" construction, but that would be
very strange code and could only leak whatever the slot in whatever
replaces "ops" in objects that lack local method tables actually holds.

Or is this another means to exploit the indirect branch poisoning
attack: cause the hardware to speculate an indirect jump destination
that points to another indirect jump that uses the register that happens
to contain a sensitive value. Of course, odds are that the sensitive
value will not point to an executable page, so forbidding speculation
past an access violation kills this attack dead except for leaking valid
control flow. (Presumably, a PTE walk that results in a page fault does
not affect the TLB state.)

> The Spectre indirect branch poisoning attack causes the processor
> to assume an attacker-controlled value for R and speculate
> execution from that address. The speculative execution is
> canceled when the processor later gets the real value for R.
> (Presumably, R is loaded from memory just before the indirect jump
> that uses R.)
>
>
> No, the attacker controls the exception, not R. HW speculates that no
> exception is taken and loads R and fetches from the cacheline pointed
> to by R.
> The exception is later detected and the branch discarded.
>
> Now the attacker measures which cacheline was fetched from by the
> indirect branch, thereby deducing the content of R.

I still see no way for an attacker to gain from this. The attacker gets
the contents of R, I grant, but R must be a valid code pointer, since
the attacked program jumps to it (and does not crash in the case where
the attacker has not arranged an unexpected diversion of control flow).
Assuming the attacker knows the attacked program code, how does this
leak anything useful? It only breaks ASLR if the attacker already has
access to the attacked program's address space, to observe which
cacheline was fetched.

> This is definitely true for exact SW interrupts. I now realize
> HW interrupts can avoid this speculation by draining the pipe.
> I was under the impression that RISCV has some exact SW
> interrupts, but I may be wrong.
>
>
> You seem to be confusing terms -- RISC-V does not have exact
> interrupts, but _exceptions_ are expected to be exact and are
> expected to cause _precise_ traps. Which instruction gets a
> pending interrupt exception is not specified, however.
>
> For software interrupts in RISC-V, the ?SIP bit can be set, and an
> interrupt occurs at some unspecified future time. Hardware can,
> of course, make that future time convenient, for example, after
> "runahead fetch" has retrieved the first cacheline of the trap
> handler. (Or initiate fetching the trap handler and continue
> non-speculative execution until either (1) an I-cache miss or (2)
> the first cacheline of the trap handler is ready. In case (1),
> stall and wait for (2). In case (2), take the interrupt trap.)
>
>
> What do you mean by SW interrupts? I meant what you call synchronous
> exception, such as page faults, PMAs, system calls, and the like.

RISC-V has both synchronous exceptions (like "page fault") and software
interrupts; the latter are caused by setting the "Software Interrupt
Pending" bit in the *ip CSR for the appropriate privilege level. IPIs
are expected to be delivered as software interrupts.

> The problem is that early on in the pipeline it is often
> impossible to tell whether the instruction may be of a type
> that traps or not, so you might potentially have to delay
> speculative instruction fetch from DRAM after every
> instruction. That sounds really bad.
>
>
> Fortunately, in RISC-V, the major opcode (for 32-bit instructions)
> is sufficient to sort an instruction into category (1)/(2)/(3) and
> the memory access instructions are all in one corner of the opcode
> table.
>
>
> Yes but unfortunately the opcode is not available until the
> instruction passes the decode stage. Until then, subsequent
> instructions have usually been fetched.

The major opcode is available to the fetch unit, as long as the fetch
unit knows when it is looking at the first parcel of an instruction.
Further, the subsequent instructions fetched prior to decode are always
a completely predictable linear advance, since control-transfer
instructions are not known until decode either. How can this "run-ahead
fetch" leak information usable to an attacker?

> Besides, category 3 includes common instructions such as stores and
> loads, and thus you run into a similar problem -- do you want to delay
> fetching uncached instructions until these loads and stores have been
> translated and found exception free?

There is no need to delay linear "run-ahead fetch" if it cannot leak
useful information. The mere fact that a given load was executed at all
is enough to ensure a certain "run-ahead fetch" will occur.

The key is that at most one exception can be yet-to-be resolved -- since
instructions that affect the cache are (or can be, for jumps) in
category (3), speculative execution must hold when a load is encountered
in the "exception shadow" of another pending load. Relaxing this to
"instructions with a data dependency on instructions that can raise
exceptions must not be speculatively executed until the preceding
instruction (the target of the dependency) is known not to raise an
exception" may also be sufficient and would allow multiple independent
loads to proceed in parallel.

> Maybe these attacks are not possible in practice because
> nobody has such code where an instruction might trap right
> before an indirect branch in just the right configuration, but
> I would not count on it.
>
>
> How can speculative instruction fetch itself leak data?
> Instruction fetch is linear and therefore completely predictable
> in the absence of control-transfer instructions. Program code is
> assumed to be known to the attacker, so what can leak?
>
>
> In case of an indirect branch on register R, you can leak the content
> of the register R.
>
> I suppose for conditional branches you can leak the value of the
> condition.

Both of these cases can only leak when a branch is *executed*, perhaps
speculatively. But branches cannot be executed until after they have
been decoded, at which time the dependencies on previous possible
exceptions become known.


-- Jacob

Michael Chapman

Jan 7, 2018, 5:23:56 AM
to jcb6...@gmail.com, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev

On 07-Jan-18 02:19, Jacob Bachmeyer wrote:
>
> The major opcode is available to the fetch unit, as long as the fetch
> unit knows when it is looking at the first parcel of an instruction. 
> Further, the subsequent instructions fetched prior to decode are
> always a completely predictable linear advance, since control-transfer
> instructions are not known until decode either.  How can this
> "run-ahead fetch" leak information usable to an attacker?

Some (most?) processors perform branch prediction before anything is
decoded. So this is not a linear advance before decode.

Jonas Oberhauser

Jan 7, 2018, 7:15:57 AM
to Jacob Bachmeyer, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev


2018-01-07 2:19 GMT+01:00 Jacob Bachmeyer <jcb6...@gmail.com>:

Jonas Oberhauser wrote:

Thanks for the detailed explanation and disambiguation of the terms.

I guess what I meant then (in the terms you use above) is any exception, not just an interrupt.

Those are not my terms:  they are from the RISC-V ISA spec.  These particular nuances of "exception", "interrupt", and "trap" should probably be considered unique to RISC-V, although the ISA spec mentions in commentary that "exception" and "trap" align with IEEE-754.

I understand that :) I didn't mean to imply that they are your terms.

        Now using an indirect branch with some register R, one can
        read out the value of R (e.g., by timing instruction fetches,
        or by sharing the cachelines in a data cache and observing
        state changes). If one manages to put a secret value into R
        before the indirect branch, the attack is complete.


    I think that this is a different type of side channel.  How is
    speculative execution part of this attack? 

HW fetched and executed the indirect branch (by that I mean: fetched the jumped-to instruction), which has to be rolled back because an earlier instruction had an exception which was not detected when executing the indirect branch.

This is not one of the published Spectre attacks, then, but possibly something new.  ("Fun"... the second time in this thread a possible attack has been found "near" the recently-published attacks.) 

I think this is subjective, but I just see it as part of the spectre class.
 
In this case, the indirect branch target is known non-speculatively, but the indirect branch itself is in the "shadow" of something that prevents it from actually being executed (either a Meltdown-like delayed exception or a mispredicted conditional branch).  This could be an "if (object->flags.use_vtable) { object->ops->some_method(...) } else { global_version_of_some_method(...) }" construction, but that would be very strange code and could only leak whatever the slot in whatever replaces "ops" in objects that lack local method tables actually holds.

Like I said before, it's not clear to me that there is vulnerable code in the wild, but I wouldn't be surprised.
 
Or is this another means to exploit the indirect branch poisoning attack:  cause the hardware to speculate an indirect jump destination that points to another indirect jump that uses the register that happens to contain a sensitive value.

One could say so. The exception is just one way to get to the second branch, but what I'm saying is that the exception is probably harder to fix in HW without ruining performance.

Of course, odds are that the sensitive value will not point to an executable page, so forbidding speculation past an access violation kills this attack dead except for leaking valid control flow. 
 
I assume the attacker has a means to make his pages executable. The victim only needs to be able to jump into pages of the attacker, which for same address space victims should be possible.
 
(Presumably, a PTE walk that results in a page fault does not affect the TLB state.)

I don't think that is a reasonable assumption. Our MMUs (in MIPS86) have always buffered non-faulting PTEs read during walking even if the walk ends in a page fault, and they now also buffer the faulty walk for performance reasons.

With the RISCV MMU model that we have discussed (i.e., the one that you believe to be the intended spec, with a remote PTE write only being visible after an IPI and SFENCE) it seems like RISCV implementations can also buffer the faulty translations (and of course any partial walks that lead to the fault).

    The Spectre indirect branch poisoning attack causes the processor
    to assume an attacker-controlled value for R and speculate
    execution from that address.  The speculative execution is
    canceled when the processor later gets the real value for R.
    (Presumably, R is loaded from memory just before the indirect jump
    that uses R.)


No, the attacker controls the exception, not R. HW speculates that no exception is taken and loads R and fetches from the cacheline pointed to by R. The exception is later detected and the branch discarded.

Now the attacker measures which cacheline was fetched from by the indirect branch, thereby deducing the content of R.

I still see no way for an attacker to gain from this.  The attacker gets the contents of R, I grant, but R must be a valid code pointer, since the attacked program jumps to it (and does not crash in the case where the attacker has not arranged an unexpected diversion of control flow).  Assuming the attacker knows the attacked program code, how does this leak anything useful?  It only breaks ASLR if the attacker already has access to the attacked program's address space, to observe which cacheline was fetched.

Yes, R needs to be a valid code pointer, but the odds for that are good especially if the attacker can make pages executable. There are two ways to deduce bits from R: either the victim needs to be able to branch into the attackers address space (but not necessarily the reverse), or the instruction and data cache must be the same (i.e., one cache for data and instructions) and so evictions by the instruction cache are visible in the data cache.

Cesar Eduardo Barros

Jan 7, 2018, 4:40:45 PM
to jcb6...@gmail.com, Stefan O'Rear, RISC-V ISA Dev
Em 06-01-2018 22:37, Jacob Bachmeyer escreveu:
> Cesar Eduardo Barros wrote:
>>
In my opinion, the base of the security proof would be: the program
already leaks things through cache/instruction timing all the time;
adding speculation should not leak more than the program would already
leak without speculation.

So, two things have to be defined: what's the "secret data" which should
not be leaked during the speculated execution, and what is the set of
"leaking operations" which should never be done on that secret data.

For the set of leaking operations, I'd start with the four I've listed
above (using the data as: memory address, branch condition, branch
target, or input to a variable-time instruction like some div/mul
implementations).

Defining what's the "secret data" is the sticking part. Define too much
as "secret data", and you can do nearly nothing while speculating.
Define too little as "secret data", and you are still vulnerable to
leaks from speculation.

Thinking more about it today, it dawned on me that there might be not
one, but two separate definitions of "secret data", depending on which
Spectre variant you're defending against.

Variant 1 is what I'd call "imaginary execution". The code is there, but
its execution is a fantasy dreamed by the CPU, since an earlier branch
will have taken a different path. Variant 2 is what I'd call "imaginary
code", since there's no path leading to that code, and it might even not
be code at all; the CPU dreamed not only the execution, but also the
address it should be executing.

For variant 1 (predicting taken/not-taken), I'm still a bit fuzzy on the
details. Data speculatively loaded from memory certainly should be
treated as secret data, but should the value of registers from outside
the speculative region (and anything derived from them through simple
arithmetic) be treated as secret? In the canonical Spectre variant 1
example, they don't have to be, since they're attacker-controlled
anyway, but I'm not sure if that would always be the case.

For variant 2 (predicting the branch target), I believe we have to be
stricter: the secret data is everything. All the registers, all the
memory. This means that, while speculating the target of an indirect
jump, we can only do non-leaking operations. This is because, for
variant 2, the attacker can hijack any indirect jump and run nearly
arbitrary speculated code, while for variant 1, the attacker is more
restricted on what can be done during the speculation. So for variant 2,
there's a greater chance that what the attacker is after is a register
value, not a value to be speculatively loaded from memory.

Jacob Bachmeyer

Jan 7, 2018, 9:01:25 PM
to Cesar Eduardo Barros, Stefan O'Rear, RISC-V ISA Dev
Cesar Eduardo Barros wrote:
> Thinking more about it today, it dawned on me that there might be not
> one, but two separate definitions of "secret data", depending on which
> Spectre variant you're defending against.
>
> Variant 1 is what I'd call "imaginary execution". The code is there,
> but its execution is a fantasy dreamed by the CPU, since an earlier
> branch will have taken a different path. Variant 2 is what I'd call
> "imaginary code", since there's no path leading to that code, and it
> might even not be code at all; the CPU dreamed not only the execution,
> but also the address it should be executing.
>
> For variant 1 (predicting taken/not-taken), I'm still a bit fuzzy on
> the details. Data speculatively loaded from memory certainly should be
> treated as secret data, but should the value of registers from outside
> the speculative region (and anything derived from them through simple
> arithmetic) be treated as secret? In the canonical Spectre variant 1
> example, they don't have to be, since they're attacker-controlled
> anyway, but I'm not sure if that would always be the case.

Variant 1 in a nutshell: speculative execution ignores software
bounds-checks. This variant seems the hardest to prevent, since the
attacker and target are in the same address space and no
hardware-enforced security boundaries are crossed. (The variation on
this attack that Project Zero described using Linux eBPF to read from
kernel memory fits in this category because the kernel has access to its
own address space.)

Since variant 1 does not cross hardware security boundaries, is it
really an attack against our current models or does it expose a loophole
in our current security models?

> For variant 2 (predicting the branch target), I believe we have to be
> stricter: the secret data is everything. All the registers, all the
> memory. This means that, while speculating the target of an indirect
> jump, we can only do non-leaking operations. This is because, for
> variant 2, the attacker can hijack any indirect jump and run nearly
> arbitrary speculated code, while for variant 1, the attacker is more
> restricted on what can be done during the speculation. So for variant
> 2, there's a greater chance that what the attacker is after is a
> register value, not a value to be speculatively loaded from memory.

"An ounce of prevention is worth a pound of cure."

Could simply requiring that branch prediction be isolated
per-address-space (and cleared when an address space is redefined by
changing the root page table PPN associated with an ASID) eliminate this
attack or at least reduce it to "same-process only"? Project Zero was
only able to make this attack work on the Intel processor they tested; I
suspect, given some of the advice in AMD's K8 optimization guide, that
AMD stores branch prediction data in the L1 I-cache, thus (mostly)
isolating branch prediction to each process.

We of course do not care where RISC-V implementations actually store
their branch prediction state, only that branch history observed in one
process cannot be used to predict branches in other processes (or
otherwise cross hardware-enforced security boundaries -- even if two
processes share code through VM aliasing, they must not share branch
prediction state). (Two processes sharing a page of code does not mean
that they are related; Linux has a feature to merge physical pages that
happen to contain the same data as copy-on-write, even across VMs.)


-- Jacob


Cesar Eduardo Barros

Jan 7, 2018, 9:47:11 PM
to jcb6...@gmail.com, Stefan O'Rear, RISC-V ISA Dev
What scares me about variant 1 is that I can imagine it being remotely
exploitable. A remote attacker could in theory feed a network server
carefully constructed data to first prime the branch prediction, then
cause the misprediction, and finally expose the resulting timing
difference. That certainly is crossing a security boundary.

We can't prevent misprediction; it can even happen on its own. I don't
see how to prevent the timing difference from being detected. That only
leaves us with preventing the timing difference from being created in
the first place.

The simplest (and safest) approach would be to "forbid all side effects"
while speculating, but that could restrict speculation too much. A more
targeted approach would be to track when a register is loaded from
memory "during speculation", and until the speculation finishes, forbid
side effects involving that register. The simplest implementation of
that (adding no extra state to the pipeline) would be to forbid
side-effects from non-retired registers. But that could restrict
speculation more than necessary.

The canonical variant 1 example (bounds check, then load from memory)
would, as far as I can see, be stopped by the simple "no loading from an
address from a non-retired register" rule (since the register can't be
retired until the speculation ends). I'd extend it to the four
restrictions I described earlier (no loads/stores, no using as branch
condition, no using as branch target, no using in variable-timing
instructions like division). But what if instead of a bounds check, it's
something like a permission check, and the initial load was before it?
Would that be enough to lead to an attack?
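
The rule sketched above (forbid leaking operations on values from
non-retired registers, with taint surviving simple arithmetic) can be
modeled in a few lines of C. This is a toy illustration, not any real
pipeline design; the function names are invented:

```c
#include <assert.h>

#define NREGS 32

/* 1 = register holds a value produced under still-unresolved
 * speculation ("non-retired" in the simplest scheme). */
static unsigned char tainted[NREGS];

/* A load (or any result) produced under speculation taints rd. */
void spec_write(int rd)
{
    if (rd != 0)              /* x0 is hardwired to zero, never tainted */
        tainted[rd] = 1;
}

/* Simple arithmetic does not launder a speculated value: the result
 * inherits taint from both sources. */
void alu_op(int rd, int rs1, int rs2)
{
    if (rd != 0)
        tainted[rd] = tainted[rs1] | tainted[rs2];
}

/* When the speculation resolves and the instructions retire, taint
 * clears and the held operations may issue. */
void retire_all(void)
{
    for (int i = 0; i < NREGS; i++)
        tainted[i] = 0;
}

/* Gate for the four leaking uses (memory address, branch condition,
 * branch target, variable-time operand): issue only if untainted. */
int may_leak_from(int rs)
{
    return tainted[rs] == 0;
}
```

A real implementation would track this per physical register alongside
the existing ready bits, so the check adds no new wakeup paths.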

As for security models, I think Spectre went outside of them: they
consider only what happens in the "real" execution, not the "imaginary"
one which Spectre exploits. The security models will have to be
augmented to also consider what can and what cannot happen during
speculation.

And this is where cooperation between the hardware and the software is
necessary. If the ISA can give guarantees like "this won't leak to the
cache, even if speculated", or anti-guarantees like "this might leak to
the cache due to speculation", the software side can order its
operations and/or add a few fences to make things go smoothly. Absent
guarantees from the hardware, the only way left for the software is to
put heavy barriers everywhere.

Yeah, I had just thought of that a few minutes ago: the best way to
defend against variant 2 is to make the branch target history "ASID
tagged", like a L1 cache often is "physical tagged". (Of course, if more
than one privilege level can share the same ASID, the tag should be the
concatenation of the ASID and the privilege level.)

The tag could even be compared with the ASID in parallel with the
instruction being decoded (the decoding step should be harmless), so it
wouldn't be in the critical path. And this should preserve the hit rate
of the branch target history even across context switches (and perhaps
even reducing the chance of accidental useless speculation).
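
The ASID-tagged branch target history described above can be sketched
as a direct-mapped BTB whose hit test compares an (ASID, privilege) tag
as well as the branch PC. A toy model (sizes and the index function are
invented):

```c
#include <assert.h>
#include <stdint.h>

#define BTB_ENTRIES 256

/* Each prediction carries the (ASID, privilege) pair that trained it,
 * i.e. the concatenation suggested above. */
typedef struct {
    uint64_t pc;
    uint64_t target;
    uint16_t asid;
    uint8_t  priv;
    uint8_t  valid;
} btb_entry_t;

static btb_entry_t btb[BTB_ENTRIES];

static unsigned btb_index(uint64_t pc)
{
    return (unsigned)(pc >> 2) & (BTB_ENTRIES - 1);
}

/* Record a taken branch observed under the current (asid, priv). */
void btb_train(uint64_t pc, uint64_t target, uint16_t asid, uint8_t priv)
{
    btb_entry_t *e = &btb[btb_index(pc)];
    e->pc = pc; e->target = target; e->asid = asid; e->priv = priv;
    e->valid = 1;
}

/* A hit requires the PC *and* the (asid, priv) tag to match, so one
 * address space's training can never steer another's speculation. */
int btb_predict(uint64_t pc, uint16_t asid, uint8_t priv, uint64_t *target)
{
    const btb_entry_t *e = &btb[btb_index(pc)];
    if (e->valid && e->pc == pc && e->asid == asid && e->priv == priv) {
        *target = e->target;
        return 1;
    }
    return 0;
}
```

The tag compare sits beside the existing PC-tag compare, which is why
it stays off the critical path as claimed above.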

This simple solution to variant 2 frees us to focus on variant 1.

Jacob Bachmeyer

Jan 7, 2018, 10:53:46 PM
to Michael Chapman, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
How can the processor know that there is a branch at location X until
the instruction at location X has been decoded?


-- Jacob

Christopher Celio

Jan 7, 2018, 11:24:51 PM
to jcb6...@gmail.com, Michael Chapman, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Many high-performance processors use a Branch Target Buffer to predict the instruction type well before decode has occurred. You can even let the Fetch Unit feed its own predictions into itself, fetching instructions entirely on its own, never actually seeing the instruction bits it is fetching.

-Chris

Jacob Bachmeyer

Jan 7, 2018, 11:38:42 PM
to Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Jonas Oberhauser wrote:
> 2018-01-07 2:19 GMT+01:00 Jacob Bachmeyer <jcb6...@gmail.com
> <mailto:jcb6...@gmail.com>>:
>
> Jonas Oberhauser wrote:
>
> Now using an indirect branch with some register R, one can
> read out the value of R (e.g., by timing instruction
> fetches,
> or by sharing the cachelines in a data cache and observing
> state changes). If one manages to put a secret value
> into R
> before the indirect branch, the attack is complete.
>
>
> I think that this is a different type of side channel. How is
> speculative execution part of this attack?
>
> HW fetched and executed the indirect branch (by that I mean:
> fetched the jumped-to instruction), which has to be rolled
> back because an earlier instruction had an exception which was
> not detected when executing the indirect branch.
>
>
> This is not one of the published Spectre attacks, then, but
> possibly something new. ("Fun"... the second time in this thread
> a possible attack has been found "near" the recently-published
> attacks.)
>
>
> I think this is subjective, but I just see it as part of the spectre
> class.

Spectre generally is an entire class of attacks, but the authors of the
Spectre paper forgot to describe it that way, causing much confusion.

> In this case, the indirect branch target is known
> non-speculatively, but the indirect branch itself is in the
> "shadow" of something that prevents it from actually being
> executed (either a Meltdown-like delayed exception or a
> mispredicted conditional branch). This could be an "if
> (object->flags.use_vtable) { object->ops->some_method(...) } else
> { global_version_of_some_method(...) }" construction, but that
> would be very strange code and could only leak whatever the slot
> in whatever replaces "ops" in objects that lack local method
> tables actually holds.
>
>
> Like I said before, it's not clear to me that there is vulnerable code
> in the wild, but I wouldn't be surprised.

JITs exist -- if there is no vulnerable code in the wild already, the
attacker can make some vulnerable code. You suggested a delayed
exception, but a mispredicted conditional branch works just as well and
can sneak past verifiers like the one used with Linux's eBPF facility.

> Or is this another means to exploit the indirect branch poisoning
> attack: cause the hardware to speculate an indirect jump
> destination that points to another indirect jump that uses the
> register that happens to contain a sensitive value.
>
>
> One could say so. The exception is just one way to get to the second
> branch, but what I'm saying is that the exception is probably harder
> to fix in HW without ruining performance.

If anything, the delayed exception is easier to fix -- delayed
exceptions are (as far as we know right now) the root cause of the
Meltdown vulnerability, which (so far) is unique to Intel x86
processors. Even AMD x86 implementations do not have the issue.

> Of course, odds are that the sensitive value will not point to an
> executable page, so forbidding speculation past an access
> violation kills this attack dead except for leaking valid control
> flow.
>
>
> I assume the attacker has a means to make his pages executable. The
> victim only needs to be able to jump into pages of the attacker, which
> for same address space victims should be possible.

How does the attacker know *which* pages to make executable if the goal
is to leak R?

> (Presumably, a PTE walk that results in a page fault does not
> affect the TLB state.)
>
>
> I don't think that is a reasonable assumption. Our MMUs (in MIPS86)
> have always buffered non-faulting PTEs read during walking even if the
> walk ends in a page fault, and they now also buffer the faulty walk
> for performance reasons.
>
> With the RISCV MMU model that we have discussed (i.e., the one that
> you believe to be the intended spec, with a remote PTE write only
> being visible after an IPI and SFENCE) it seems like RISCV
> implementations can also buffer the faulty translations (and of course
> any partial walks that lead to the fault).

I have proposed exactly that kind of paging TLB to store partial
translations previously, and there is no problem with storing part of a
non-speculative walk, even if it ends in page fault. The problem arises
when the TLB provides a side-channel. I suggest that the TLB
speculation side-channel can be closed by requiring the TLBs to track
entries that are products of speculative execution and erase those
entries when speculative execution is abandoned. (Of course, when the
speculated trace commits, the TLB entries it produced are no longer
speculative.)
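
The scheme in the paragraph above — tag TLB entries filled under
speculation, promote them on commit, erase them on abort — can be
sketched as follows. This is a toy model with invented names, not any
real MMU:

```c
#include <assert.h>
#include <stdint.h>

#define TLB_ENTRIES 64

typedef struct {
    uint64_t vpn, ppn;
    uint8_t  valid;
    uint8_t  speculative;   /* filled by a not-yet-committed access */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

static unsigned tlb_index(uint64_t vpn)
{
    return (unsigned)vpn & (TLB_ENTRIES - 1);
}

/* Fill an entry, remembering whether the walk ran under speculation. */
void tlb_fill(uint64_t vpn, uint64_t ppn, int speculative)
{
    tlb_entry_t *e = &tlb[tlb_index(vpn)];
    e->vpn = vpn; e->ppn = ppn; e->valid = 1;
    e->speculative = (speculative != 0);
}

/* On commit, the speculative entries become ordinary entries. */
void tlb_commit(void)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        tlb[i].speculative = 0;
}

/* On abort, erase every entry the abandoned trace brought in, so the
 * TLB no longer records which addresses the speculation touched. */
void tlb_abort(void)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].speculative)
            tlb[i].valid = 0;
}

int tlb_lookup(uint64_t vpn, uint64_t *ppn)
{
    const tlb_entry_t *e = &tlb[tlb_index(vpn)];
    if (e->valid && e->vpn == vpn) {
        *ppn = e->ppn;
        return 1;
    }
    return 0;
}
```

Note this closes only the "which entry was inserted" channel; the
eviction channel is the separate concern addressed next.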

The fact that other TLB entries may have been evicted to accommodate the
new speculative entries is less of a concern with this attack, since
that only leaks that *some* TLB entries were replaced, not *which*
address was loaded. This can become a problem if the TLB is large
enough, but for TLBs that store only a tiny fraction of the address
space, all an attacker gains from detecting a TLB eviction is that R was
not in the set of addresses covered by the TLB. Unless the attacker can
freely actually map nearly the entire address space, very little
information can be gained.

Implementations can close the second side-channel by defining a
high-water threshold and evicting entries from the TLB until some
minimum number of free TLB slots are available upon speculating any
memory access. This speculative TLB eviction must only occur when the
TLB does not have some minimum number of free slots. Speculative
execution must halt if the TLB reaches capacity again during speculative
execution.

I think that you still misunderstand: the victim does not crash after
the attack, which means that the value leaked in R must be a valid code
pointer chosen by the *victim* and part of the victim's normal control
flow. How does this leak useful information to an attacker who already
has access to the victim's address space?



-- Jacob


PS: Even if leaking indirect branch targets proves useless, this
discussion is valuable for shining a spotlight on TLB speculation side
channels.

Jacob Bachmeyer

Jan 7, 2018, 11:49:27 PM
to Christopher Celio, Michael Chapman, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Christopher Celio wrote:
> Many high-performance processors use a Branch Target Buffer to predict the instruction type well before decode has occurred. You can even let the Fetch Unit feed its own predictions into itself, fetching instructions entirely on its own, never actually seeing the instruction bits it is fetching.
>

Perhaps speculating instruction decode is simply going too far? Or at
least, we need to ensure that the Branch Target Buffer is partitioned
along hardware-enforced security boundaries, so that process A cannot
influence predictions in process B; otherwise Jonas Oberhauser's
indirect branch target leak may prove to be quite useful after all --
and may not actually require an indirect branch at the targeted
location in the target process. (*OUCH!*)
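As a sketch of what "partitioned along hardware-enforced security boundaries" could mean, here is a toy model of a BTB whose entries are tagged with a security context (process, privilege level, or domain). The class and names are mine, purely illustrative:

```python
# Toy tagged BTB: a lookup only ever matches entries created in the same
# security context, so process A cannot steer speculation in process B.

class TaggedBTB:
    def __init__(self, size=16):
        self.size = size
        self.table = {}            # (context, pc) -> predicted target

    def train(self, context, pc, target):
        if len(self.table) >= self.size:
            # Evict an arbitrary entry; real hardware would use set-indexed
            # replacement, which does not change the isolation property.
            self.table.pop(next(iter(self.table)))
        self.table[(context, pc)] = target

    def predict(self, context, pc):
        return self.table.get((context, pc))
```

Training in context "A" is invisible to context "B": the tag is part of the match, not merely stored alongside the entry.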

I have a vague outline of such an attack, where an attacker influences
branch prediction of "phantom" branches to alter the timing of another
VM in a cloud. If the attacker can trick the branch predictor to use
the register of the attacker's choice instead of an actual instruction
in the target, the speculated branch could leak sensitive information
after all. The primary mitigating factor I see in that attack is that
the indirect branch target leaked must be part of the target's normal
control flow... but if the "leaky" branch does not have to actually
exist in the target, that no longer applies.


-- Jacob

Christopher Celio

unread,
Jan 7, 2018, 11:56:59 PM1/7/18
to jcb6...@gmail.com, Michael Chapman, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Yes, going forward, BTBs will have to be flushed or tagged to prevent an attacker training a victim's predictor. That by itself mitigates any Spectre attack, save the scenario where the attacker is the victim (e.g., a sandboxed JIT running with supervisor permissions).

-Chris

Jonas Oberhauser

unread,
Jan 8, 2018, 4:44:19 AM1/8/18
to Christopher Celio, Jacob Bachmeyer, Michael Chapman, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
2018-01-08 5:56 GMT+01:00 Christopher Celio <ce...@eecs.berkeley.edu>:
Yes, going forward, BTBs will have to be flushed or tagged to prevent an attacker training a victim's predictor.  That by itself mitigates any Spectre attack, save the scenario where the attacker is the victim (e.g., a sandboxed JIT running with supervisor permissions)

I don't think it helps very much, because
1) there will still be mispredicted branches where the attacker can predict the misprediction, just not the ones that are trained by the attacker, and 
2) synchronous exceptions in particular will probably be mispredicted

Jose Renau

unread,
Jan 8, 2018, 2:11:01 PM1/8/18
to Jonas Oberhauser, Christopher Celio, Jacob Bachmeyer, Michael Chapman, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev

 A couple of things that we may want to consider for RISC-V:

 -Capacity to use some bits in the address space to mark the "domain" in which the application runs. This is more for single address space protection.
Currently, the CPU has no way to know if the code in thread one should not leak to thread two.

 The CPU has a way to know if it is in user vs system mode, or in APP1 vs APP2, but not whether it is in app1-thread0 or app1-thread1. The latter is something
that may happen in JS browsers.


 For example, we can say that the upper 10 bits in the virtual address space identify a "domain" and that information should not leak across domains.
(Notice that this only applies to PCs. If the library is mapped with a different domain, it should not have side-channel leaks outside the domain)

 If the CPU has "domain" information, it can decide to flush/tag resources accordingly. There may be a performance hit, but at least it can isolate the domains if required.
If there is no "domain" information, there is no chance to isolate.

 I would not create new instructions to flush the BTB or X or Y, because some simple cores do not need to do anything else, and adding these extra instructions
would be unnecessary overhead. Using the upper bits in the VA adds no overhead or fragmentation.
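To illustrate the domain idea (with made-up parameters, not anything from the RISC-V spec), the check could look like the following: the upper 10 bits of a 64-bit virtual address name the domain, and microarchitectural state tagged by the domain of the PC that created it may only be consumed by the same domain.

```python
# Illustrative only: bit positions and widths are assumptions.
DOMAIN_BITS = 10
VA_BITS = 64

def domain_of(va):
    # Extract the "domain" from the upper DOMAIN_BITS of a virtual address.
    return va >> (VA_BITS - DOMAIN_BITS)

def may_share_state(pc_a, pc_b):
    # Predictor/TLB state created at pc_a may be used at pc_b only if both
    # PCs fall in the same domain.
    return domain_of(pc_a) == domain_of(pc_b)
```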

 -The 2nd thing that we may want to consider is to decrease the quality of the timers in user mode as a way to mitigate detection of timing changes.
This paper talks about "fuzzy" time as a way to mitigate (not solve, but mitigate) such attacks.
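A toy model of what such a "fuzzy" user-mode timer might do: quantize the cycle counter to a coarse granularity and optionally add jitter within the bucket. The granularity and jitter here are invented for illustration; real parameters would need careful analysis.

```python
import random

def fuzzy_cycles(raw_cycles, granularity=1024, jitter=True):
    # Round the raw cycle count down to a multiple of `granularity`, so
    # fine-grained differences (e.g., cache hit vs miss) are hidden.
    quantized = (raw_cycles // granularity) * granularity
    if jitter:
        # Optionally add a random offset within the bucket so repeated
        # reads do not reveal the quantization boundary.
        quantized += random.randrange(granularity)
    return quantized
```

As Paul notes below, this only degrades the leak rate (an attacker can build a software clock), but degrading it far enough can still make an attack impractical.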

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

Allen J. Baum

unread,
Jan 9, 2018, 1:02:13 AM1/9/18
to Christopher Celio, jcb6...@gmail.com, Michael Chapman, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
At 8:56 PM -0800 1/7/18, Christopher Celio wrote:
>Yes, going forward, BTBs will have to be flushed or tagged to prevent an attacker training a victim's predictor. That by itself mitigates any Spectre attack, save the scenario where the attacker is the victim (e.g., a sandboxed JIT running with supervisor permissions).

Or we need to mark changes to BTB entries as themselves speculative and be able to roll them back?



--
**************************************************
* Allen Baum tel. (908)BIT-BAUM *
* 248-2286 *
**************************************************

Jonas Oberhauser

unread,
Jan 9, 2018, 3:19:12 AM1/9/18
to Allen J. Baum, Christopher Celio, Jacob Bachmeyer, Michael Chapman, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
On Jan 9, 2018 07:02, "Allen J. Baum" <allen...@esperantotech.com> wrote:
At 8:56 PM -0800 1/7/18, Christopher Celio wrote:
>Yes, going forward, BTBs will have to be flushed or tagged to prevent an attacker training a victim's predictor.  That by itself mitigates any Spectre attack, save the scenario where the attacker is the victim (e.g., a sandboxed JIT running with supervisor permissions).

Or we need to mark changes to BTB entries as themselves speculative and be able to roll them back?

I think the problem is with 
1) the attacker training the BTB during non-speculative execution to increase the chance of successful attack, 

2) the attacker reading out the kernel BTB 

but not with the BTB leaking data during speculative execution -- although that might be another (weak?) attack vector (you might be able to read out whether a branch was reached during speculative execution -- but it's hard to read out the value of earlier branch conditions that led you there, since they may also have been mispredicted)

Rolling back changes to the BTB (or more simply, only training the BTB when instructions retire, as suggested earlier) is not enough to mitigate the first two problems; a per-thread BTB (e.g., by resetting the BTB on thread switch as suggested above) would be.



Paul Miranda

unread,
Jan 9, 2018, 8:35:27 AM1/9/18
to RISC-V ISA Dev, s9jo...@gmail.com, ce...@eecs.berkeley.edu, jcb6...@gmail.com, michael.c...@gmail.com, etern...@gmail.com, er...@metricspace.net, re...@ucsc.edu

On Monday, January 8, 2018 at 1:11:01 PM UTC-6, Jose Renau wrote:
 -Capacity to use some bits in the address space to mark the "domain" in which the application runs. This is more for single address space protection.
Currently, the CPU has no way to know if the code in thread one should not leak to thread two.

Seems easier to just use ASID?
 
 
 -The 2nd thing that we may want to consider is to decrease the quality of the timers at user mode as a way to mitigate timing changes detection. This paper talks about "fuzzy" time as a way to mitigate (not solve, but mitigate)

The attacker can just use software-timed loops. As stated before, it doesn't have to work all the time since you can repeat the attack many times.
 

Jonas Oberhauser

unread,
Jan 9, 2018, 10:22:02 AM1/9/18
to Paul Miranda, RISC-V ISA Dev, Christopher Celio, Jacob Bachmeyer, Michael Chapman, Alex Elsayed, Eric McCorkle, Jose Renau
2018-01-09 14:35 GMT+01:00 Paul Miranda <paulcm...@gmail.com>:

On Monday, January 8, 2018 at 1:11:01 PM UTC-6, Jose Renau wrote:
 -Capacity to use some bits in the address space to mark the "domain" in which the application runs. This is more for single address space protection.
Currently, the CPU has no way to know if the code in thread one should not leak to thread two.

Seems easier to just use ASID?

We also thought about this, but came to the conclusion that in many cases ASIDs are probably too small (our ASIDs are only 8 bits, though the ones in the RISC-V spec range from 9 to 16 bits, so that may be ok). More importantly, one would want to save TLB space by sharing translations between multiple threads that use the same address space but do not have the same detailed permissions. For that you would need something like ASIDs, but not quite ASIDs. The alternative is of course to allow ASIDs to share translations, e.g., by putting masks into the page tables ("translations made with these PTEs can be used for ASIDs whose first 7 bits equal ... and whose remaining bits are whatever") or into a CSR, a little like a subnet.
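A toy sketch of that subnet-style match (field widths and names are invented): a cached translation carries an ASID value and a mask, and any ASID that matches under the mask may use it.

```python
def translation_usable(entry_asid, entry_mask, current_asid):
    # entry_mask has 1-bits where the ASID must match exactly and 0-bits
    # where any value is accepted -- exactly like an IP subnet mask.
    return (entry_asid & entry_mask) == (current_asid & entry_mask)
```

With a mask of 0b11111110, two threads whose ASIDs differ only in the last bit share the translation; with a full mask, the match degenerates to an ordinary exact ASID compare.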
 
 -The 2nd thing that we may want to consider is to decrease the quality of the timers at user mode as a way to mitigate timing changes detection. This paper talks about "fuzzy" time as a way to mitigate (not solve, but mitigate)

The attacker can just use software-timed loops. As stated before, it doesn't have to work all the time since you can repeat the attack many times.

I agree in this case but in general, if you reduce data flow enough, the attack may become worthless. Who cares once it takes 60 years to read out the kernel memory?

Jose Renau

unread,
Jan 9, 2018, 12:23:18 PM1/9/18
to Allen Baum, Christopher Celio, Jacob Bachmeyer, Michael Chapman, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Our BTB is small; a one-bit tag (OS vs user) is enough (2 bits if we also have a hypervisor).

Rollback is tricky to implement because the BTB is an SRAM.

One thing is that tagging protects from Spectre but not against information leaks, because displaced BTB entries could be visible (a previously correct BTB prediction becomes an incorrect one).

Partitioning or flushing is more secure.

Maybe a reason for doing a 0-cycle uBTB. It has to be much smaller to meet timing, so a flush/partition may be better. E.g., reserve one entry for the OS; if more are needed, flush 50% and use those entries for the OS.

On return from the OS, just clear them.

On Jan 8, 2018 10:02 PM, "Allen J. Baum" <allen...@esperantotech.com> wrote:
At 8:56 PM -0800 1/7/18, Christopher Celio wrote:
>Yes, going forward, BTBs will have to be flushed or tagged to prevent an attacker training a victim's predictor.  That by itself mitigates any Spectre attack, save the scenario where the attacker is the victim (e.g., a sandboxed JIT running with supervisor permissions).

Or we need to mark changes to BTB entries as themselves speculative and be able to roll them back?


Jacob Bachmeyer

unread,
Jan 9, 2018, 10:56:34 PM1/9/18
to Paul Miranda, RISC-V ISA Dev, s9jo...@gmail.com, ce...@eecs.berkeley.edu, michael.c...@gmail.com, etern...@gmail.com, er...@metricspace.net, re...@ucsc.edu
Paul Miranda wrote:
>
> On Monday, January 8, 2018 at 1:11:01 PM UTC-6, Jose Renau wrote:
>
> -Capacity to use some bits in the address space to mark the
> "domain" in which the application runs. This is more for single
> address space protection.
> Currently, the CPU has no way to know if the code in thread one
> should not leak to thread two.
>
>
> Seems easier to just use ASID?

Logically, ASIDs denote distinct address spaces, while this "domain"
concept is intended to partition a single user address space. I like
the idea, since, unlike ASIDs, domains could be within the control of a
user application.


-- Jacob

Jacob Bachmeyer

unread,
Jan 9, 2018, 11:01:14 PM1/9/18
to Jose Renau, Allen Baum, Christopher Celio, Michael Chapman, Jonas Oberhauser, Alex Elsayed, Eric McCorkle, RISC-V ISA Dev
Jose Renau wrote:
> Our btb is small, one bit tag (os vs user) is enough. (2 if we also
> have hypervisor)
>
> Roll back is tricky to implement because it is an sram.
>
> One thing is that tagging protects from Spectre but not against
> information leak because displaced btb entries could be visible (a
> previous correct btb prediction becomes an incorrect one)
>
> Partitioning or flushing is more secure.
>
> Maybe a reason for doing 0 cycle uBTB. It has to be way smaller to
> meet timing, so a flush/partition may be better. Eg reserve one entry
> for os, if more needed flush 50% and use those entries for os.
>
> At return from os, just clear them

This touches on a concept that I mentioned earlier: partitioning branch
prediction allows effectively larger BTBs to meet tighter timing
requirements -- if you have one BTB per privilege level and implement
four privilege levels, you effectively have a 4x BTB compared to mixing
them all together. You also gain isolation between privilege levels and
still meet the original timing.


-- Jacob
