Meltdown and Spectre Vulnerabilities

Christopher Celio

Jan 8, 2018, 2:07:25 AM
to riscv-boom
Hey everyone,

Since the Meltdown and Spectre vulnerabilities have been a very popular topic for the last few days, and I've seen a lot of misinformation going around regarding them, I thought I'd make a post to clarify a few things from BOOM's point of view.

The background information can be found at:


First, I am still digesting all of this information, and it may take a while for us as a community to fully process the consequences of these attacks and what the best mitigation techniques are. With that said, here are my (preliminary) thoughts on this topic.


BOOM is not susceptible to Meltdown. Meltdown appears to rely on bypassing load data that failed a permissions check. As BOOM checks the TLB as part of the dcache access pipeline, the permission violation is detected immediately and load-data bypass and write-back are suppressed. There is no additional speculative cache access using the privileged data as its address.
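To make the mechanism concrete, here is a minimal sketch of the Meltdown access pattern in C (the names and constants are illustrative, and a real PoC must also survive the fault, e.g. with a signal handler):

#include <stdint.h>

extern uint8_t probe[256 * 4096];  /* attacker-controlled, flushed beforehand */

void meltdown_transmit(const uint8_t *kernel_ptr)
{
    /* on a vulnerable core, the faulting load below still forwards its
       data to the dependent load before the fault is taken, leaving a
       secret-dependent line in the cache.  BOOM suppresses the bypass,
       so the dependent access never sees a secret-derived address. */
    uint8_t secret = *kernel_ptr;                     /* permission check fails */
    (void) *(volatile uint8_t *) &probe[secret * 4096];
}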


Spectre is more complex, but it appears that both disclosed variants ("bounds check bypass" and "branch target injection") rely on two components to work: 1) a way for an attacker thread to force a victim thread to speculatively execute an existing "gadget", and 2) a covert channel to gain information from the gadget. 

In other words, Spectre relies on a malicious thread injecting information into a shared BTB/BPD structure. BOOM is currently susceptible to this, but a number of relatively simple, low-impact changes to the BTB/BPD structure (such as flushing or tagging) can guard against Spectre.
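Both variants ultimately read the secret back out through a cache-timing covert channel: the gadget speculatively touches one line of a probe array indexed by the secret, and the attacker afterwards times an access to each line. A rough sketch of the receive side, where read_cycles() stands in for whatever cycle counter the platform provides (e.g. rdcycle on RISC-V); it is an assumed helper, not a real API:

#include <stdint.h>

extern uint8_t probe[256 * 4096];
extern uint64_t read_cycles(void);  /* assumed platform-specific helper */

int recover_byte(void)
{
    int best = -1;
    uint64_t best_time = UINT64_MAX;
    for (int i = 0; i < 256; i++) {
        uint64_t t0 = read_cycles();
        (void) *(volatile uint8_t *) &probe[i * 4096];
        uint64_t dt = read_cycles() - t0;
        if (dt < best_time) {   /* fastest access = cached = the leaked byte */
            best_time = dt;
            best = i;
        }
    }
    return best;
}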

However, there is one form of Spectre that is confounding --- when the attacker thread and the victim thread are one and the same. In this scenario, there is no way to flush the BTB/BPD between the attacker setting up the misdirection and the victim speculatively executing it. 

I contend in this scenario that we have a software bug --- the software is attempting to enforce its own domain protections without leveraging the existing protection mechanisms provided by the hardware (think of a sandboxed JIT that is running untrusted code with supervisor permissions). In this scenario, any act of speculation (not just speculative cache allocation) leaks information.

So far, I have seen a few ideas that have been proposed:
* allow SW to flush the BTB/BPD --- I'm not sure this will work as even a flushed BPD makes predictions, and a "not-taken" prediction is all that is required to force the leak.
* allow SW to insert speculation fences --- I'm concerned this is only a temporary patch, as it only protects known gadgets from attacks.
* force SW to move protected information to a protected hw domain --- I'm not sure how tenable this is, particularly in the short-term. Long-term, I suspect this might be the likely end-game.


The nice thing about open-source is that somebody can perform attacks on an understood system, somebody else can make the changes to mitigate the attack, and a third person can verify the mitigation. =)


-Chris

Tapabrata Ghosh

Jan 8, 2018, 2:22:26 AM
to riscv-boom
>> "* allow SW to insert speculation fences --- I'm concerned this is only a temporary patch, as it only protects known gadgets from attacks."

This is the proposed solution in the Spectre paper, IIRC: JIT in lfence or mfence instructions, at a potentially severe performance penalty.
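Concretely, the fence sits between the bounds check and the dependent accesses, so the variant-1 gadget would become something like the following sketch (x86 intrinsic; the variable names follow the paper's example):

#include <stdint.h>
#include <immintrin.h>

extern uint8_t array1[], array2[];
extern unsigned array1_size;
uint8_t temp;

void victim_function(unsigned x)
{
    if (x < array1_size) {
        _mm_lfence();  /* stop speculative execution from running ahead
                          of the bounds check */
        temp = array2[array1[x] * 512];
    }
}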

In the short term it seems like the painful retpoline is the only mitigation. 

BTW, I have a solution for the case where the attacker thread and the victim thread are the same, and I don't think such a situation would necessarily qualify as a software bug --- the victim code could be as simple as a password check. 


sor...@gmail.com

Jan 8, 2018, 2:35:07 AM
to riscv-boom
On Sun, Jan 7, 2018 at 11:07 PM, Christopher Celio wrote:
> In other words, Spectre relies on a malicious thread injecting information
> into a shared BTB/BPD structure. BOOM is currently susceptible to this, but
> a number of relatively simple, low-impact changes to the BTB/BPD structure
> (such as flushing or tagging) can guard against Spectre.

This is accurate for variant 2 but not for variant 1.  Variant 1
attacks can use *any* misprediction, so it is not necessary to inject
information into the BPD; it is sufficient to wait for a natural
misprediction.


> However, there is one form of Spectre that is confounding --- when the
> attacker thread and the victim thread are one and the same. In this
> scenario, there is no way to flush the BTB/BPD between the attacker setting
> up the misdirection and the victim speculatively executing it.

Variant 1 attacks can either attack a target in the same
thread/privilege domain or a target in a different thread/privilege
domain.  Since no injection is required by the attack, no BPD-level
countermeasure is possible.


> I contend in this scenario that we have a software bug --- the software is
> attempting to enforce its own domain protections and not leveraging the
> existing protection mechanisms provided by the hardware (think of a
> sandboxed JIT that is running untrusted code with supervisor permissions).
> In this scenario, any act of speculative (not just speculative cache
> allocations) leaks information.

Per the above, I do not think this is accurate.


> So far, I have seen a few ideas that have been proposed:
> * allow SW to flush the BTB/BPD --- I'm not sure this will work as even a
> flushed BPD makes predictions, and a "not-taken" prediction is all that is
> required to force the leak.
> * allow SW to insert speculation fences --- I'm concerned this is only a
> temporary patch, as it only protects known gadgets from attacks.

The only watertight solution I see is to disable execution of loads,
stores, divides, and integer multiplies (if no IMul) behind an
unresolved branch.  It is fairly straightforward to argue that the
timing-channel leakage with that modification is the same as the
timing-channel leakage of a non-pipelined processor.  Memory
dependence speculation (both positive and negative) raises
substantially identical problems and would need to be disabled.
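As a rough software model of that gating rule (the structures here are
illustrative; they are not BOOM's actual scheduler types):

#include <stdbool.h>

typedef struct {
    unsigned seq;      /* program-order sequence number */
    bool is_branch;
    bool resolved;
} Inst;

typedef struct {
    const Inst *entries;
    int count;
} Window;

/* a load, store, divide, or multiply may issue only once every older
   branch in the window has resolved */
bool can_issue_unsafe_op(const Inst *op, const Window *w)
{
    for (int i = 0; i < w->count; i++) {
        const Inst *in = &w->entries[i];
        if (in->is_branch && !in->resolved && in->seq < op->seq)
            return false;  /* hold the op behind the unresolved branch */
    }
    return true;
}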

Rich Felker proposed "Loads whose address depend on another
speculative load should never be executed speculatively."  This is
superficially attractive, but NOT a complete fix: when using tagged
pointers (and possibly in other cases), a misspeculation can cause
the *first* load after a branch to use, as an address, a value that
should not be leaked.
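A sketch of that tagged-pointer case (the tagging scheme is
illustrative): a word whose low bit is set is a small integer,
otherwise it is a pointer, so the very first load after the tag check
dereferences the value itself:

#include <stdint.h>

typedef struct Object { struct Object *field; } Object;

uintptr_t use_value(uintptr_t v)
{
    if (v & 1) {
        return v >> 1;  /* tagged small integer */
    } else {
        /* if this branch is mispredicted while v actually holds secret
           (non-pointer) data, the *first* speculative load already uses
           the secret as an address; no second, dependent load is needed */
        return (uintptr_t) ((Object *) v)->field;
    }
}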

With speculative loads hindered in this way it is not necessary to
clear BTBs on privilege change, although it may be attractive as a
defense in depth.

All of these concerns are immaterial in a batch-processing environment
where a single confidentiality domain occupies the entire machine (or
a robust partition of the machine) for a sufficiently long computation
to prevent adversarial timing.  So it may make sense to have a "batch
mode" which permits more speculation but clears caches on entry and
exit …


> * force SW to move protected information to a protected hw domain --- I'm
> not sure how tenable this is, particularly in the short-term. Long-term, I
> suspect this might be the likely end-game.

> The nice thing about open-source is that somebody can perform attacks on an
> understood system, somebody else can make the changes to mitigate the
> attack, and a third person can verify the mitigation. =)

Agreed.

-s

Christopher Celio

Jan 8, 2018, 4:27:06 AM
to riscv-boom
Thanks S, 

So to summarize our IRC discussion, a malicious user thread can use a syscall to invoke the gadget directly with the desired inputs in an attack against the kernel (or IPC to attack user-level threads). Flushing the BTB/BPD does not defend against this despite the change in priv mode.

An example is a malicious user calling into the OS via a close() syscall (somewhat pseudo-code):

int close(int fd) {
  if (fd < process.file_num) {  // bounds check: speculatively bypassable
    process.files[fd].vtable.close(process.files[fd]);  // dependent vtable load
    process.files[fd] = 0;
    return 0;
  } else {
    return -EBADF;
  }
}


The first thing close() does after fetching the file object is to load a vtable pointer from it. With an attacker-supplied out-of-bounds fd, that dependent vtable fetch leaks bits of the out-of-bounds read.


Apparently the current fix is to use static-analysis tools to find untrusted inputs that drive data-dependent loads, and to insert speculation fences: http://lkml.iu.edu/hypermail/linux/kernel/1801.0/04201.html
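For reference, a related pattern (array_index_nospec() in the Linux kernel's include/linux/nospec.h) avoids the fence entirely by clamping the index through a data dependency. A simplified userspace sketch --- the real helper uses inline asm to stop the compiler turning the comparison back into a branch:

#include <stddef.h>

/* force an index to zero on the out-of-bounds path via a data
   dependency, so even a mispredicted bounds check cannot form an
   out-of-bounds address */
static inline size_t index_nospec(size_t idx, size_t size)
{
    size_t mask = (size_t) 0 - (idx < size);  /* all-ones iff in bounds */
    return idx & mask;
}

In the close() example above, fd would be clamped with index_nospec(fd, process.file_num) before the files[] accesses.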


RISC-V does not have any dependable mechanism that I'm aware of to insert a speculation barrier. Blocking speculative cache allocations is likely only a bandaid, as other mechanisms can leak speculative information.



-Chris

lk...@lkcl.net

Jan 17, 2019, 5:05:53 PM
to riscv-boom
hi christopher, so a bit of background: i've spent the past 2-3 intensive months designing and spec'ing a Vector Micro-architecture that happens to rely on dropping element-based operations into the instruction queue of a multi-issue out-of-order engine.  consequently, thinking about and dealing with spectre timing attacks was... kinda high on the radar :)
 
On Monday, January 8, 2018 at 7:07:25 AM UTC, Christopher Celio wrote:

> However, there is one form of Spectre that is confounding --- when the attacker thread and the victim thread are one and the same. In this scenario, there is no way to flush the BTB/BPD between the attacker setting up the misdirection and the victim speculatively executing it.
>
> I contend in this scenario that we have a software bug --- the software is attempting to enforce its own domain protections without leveraging the existing protection mechanisms provided by the hardware (think of a sandboxed JIT that is running untrusted code with supervisor permissions). In this scenario, any act of speculation (not just speculative cache allocation) leaks information.


an example would be a single-threaded javascript implementation in a web browser: this is how firefox does things.  but even chrome, which runs a separate process for each page's javascript engine, is still executing arbitrary untrusted source code.

so i concur that it should be regarded as a "software bug", not a hardware one, that "untrusted" code is being executed in the same process/thread as "trusted" code.
 
> So far, I have seen a few ideas that have been proposed:
> * allow SW to flush the BTB/BPD --- I'm not sure this will work as even a flushed BPD makes predictions, and a "not-taken" prediction is all that is required to force the leak.
> * allow SW to insert speculation fences --- I'm concerned this is only a temporary patch, as it only protects known gadgets from attacks.

this is the simplest solution --- the one whose performance penalties and software complexity are not so extreme that it gets rejected by both hardware *and* software engineers alike.

it does mean a lot of time needs to be spent, reviewing *ALL* software, patching it to add the fences at critical points.

that also in turn means that, realistically, they need to be "hints", *NOT* mandatory new instructions.  it's far too late now to add special opcodes: in-order systems (and basically all existing hardware) would be flat-out incompatible with binaries containing them, which means distributions (debian, fedora) would outright reject the modifications.

so, they have to be hints, basically.
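as a concrete illustration (the specific encoding below is invented): the RISC-V ISA already reserves integer-computational instructions that write x0 as HINT encoding space, so a speculation hint could be spelled as one of those, and every existing core would simply execute it as a NOP:

static inline void spec_hint(void)
{
#if defined(__riscv)
    /* hypothetical hint in the rd=x0 HINT space; a core that does not
       implement the hint executes this as a plain NOP */
    __asm__ volatile ("addi x0, x0, 16" ::: "memory");
#endif
}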

 
> * force SW to move protected information to a protected hw domain --- I'm not sure how tenable this is, particularly in the short-term. Long-term, I suspect this might be the likely end-game.


without those speculation fences, it's... yeah.
 

> The nice thing about open-source is that somebody can perform attacks on an understood system, somebody else can make the changes to mitigate the attack, and a third person can verify the mitigation. =)


 :)

 the idea we are mulling over for the Libre RISC-V SoC is to have two types of speculation-fence instruction:

 * one that says "for the next 16-or-so instructions, please don't allocate any more resources to speculation, maybe you could drop down to single-issue".
 * another that says "save all outstanding commits to registers, wait for outstanding write hazards (etc.), then, when completed, cancel all outstanding speculative instructions".

the first is important to call just before e.g. an upcoming system call, because you *know* that the speculation is going to be cancelled (within the next 5-16 instructions)... so why have the processor waste time, resources and power doing speculation (etc.) when it's just going to get cancelled almost immediately?
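to make that first case concrete, a hypothetical syscall stub might issue the throttle hint immediately before the trap (again, the hint encoding is invented purely for illustration, reusing the rd=x0 HINT space from the sketch above; riscv-only):

static inline long syscall1(long nr, long arg0)
{
    register long a7 __asm__("a7") = nr;
    register long a0 __asm__("a0") = arg0;
    __asm__ volatile (
        "addi x0, x0, 16\n\t"  /* hypothetical hint: stop allocating
                                  speculative resources --- the ecall
                                  below will cancel them anyway */
        "ecall"
        : "+r"(a0)
        : "r"(a7)
        : "memory");
    return a0;
}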

the second action is, we feel, important to *always* do (unconditionally) on an ECALL, or an interrupt, or an exception.  the idea is that any time that there is a transition from untrusted to trusted code (system calls) you absolutely have to have the internal state cleared back to a known quiescent state.  hence the need for ECALL protection, because otherwise an untrusted process may make arbitrary system calls and ascertain timing-related information about what the kernel is doing.

this is horribly, horribly complex, and absolutely impossible to deal with in a sane way in hardware (alone).  the only way we could think of to deal with this properly in hardware is to basically shackle the OoO engine so that it becomes, in effect, no different from an in-order single-issue processor --- it's that appallingly bad.

it also can't be dealt with automatically in the linux kernel: linus torvalds has got so pissed off with intel's spectre mitigation patches that he's started rejecting them, so drastically do they affect performance and complexity.

speculation fences are what we (independently) came up with as well.  however, we do not see them as a temporary patch: we perceive them to be *the* long-term solution, one that may take years to complete --- that's just what it's going to take.

luckily, a lot of the high-security entry points may easily be identified automatically: SELinux function calls, use of the PAM security library, and any instance where an openssl function is called.  all of these are pretty obvious high-priority candidates for patching.

funfunfun :)

l.

p.s. a shared DIV unit across multiple cores is an obviously bad idea that cannot be dealt with by any number of speculation fences, as the contention is core-to-core.  the speculation fence (hint) deals with *same-core* timing attacks, specifically same-process ones.

lk...@lkcl.net

Jan 18, 2019, 2:32:33 AM
to riscv-boom


On Monday, January 8, 2018 at 7:35:07 AM UTC, sor...@gmail.com wrote:
 
> With speculative loads hindered in this way it is not necessary to
> clear BTBs on privilege change, although it may be attractive as a
> defense in depth.

the keyword there being "hindered", and that's just one type of attack.  [there are others that have not yet been discovered, and, this is not a joke: the only way to ensure that all these past *and future* attacks will not succeed is to basically make the architecture no different from a single-issue in-order one].

in the analysis that we did, we noted that a timing attack becomes possible whenever there is resource starvation.  in-order processors [almost] never stall on instruction issue: the register file has enough bandwidth (enough ports), the operand-forwarding bus has enough bandwidth, and the in-order pipelines [almost] never cause another instruction to stall.

the Mill Architecture, fascinatingly, is also completely immune to spectre timing attacks.

so whilst you _could_ massively over-allocate resources (massive over-porting of the register file, a massively over-expanded operand-forwarding bus), throttle back the number of instructions issued per cycle, and then do a formal analysis to make absolutely, absolutely sure that under no circumstances will any one instruction cause resource starvation, the cost of that over-allocation is so high that the design would be rejected on the basis of its uncompetitive power consumption.

we concluded that this basically leaves speculation fences (hints) as the only real (sane) workable option.

so that in turn means that the next phase is to propose which hints should be allocated, as the first "official" hints, to be added to the RISC-V Specification.

l.

lkcl

Feb 3, 2019, 4:12:23 AM
to riscv-boom
Christopher, I note that you are cc'd in the following discussion:
https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/3jvLrCKpfag/QrRKS0oKGwAJ

There has been no response from anyone at Berkeley, and no one has come forward from the RISC-V Foundation to take responsibility for leading a critical speculation initiative.

Why?

lk...@lkcl.net

Feb 21, 2019, 4:39:16 PM
to riscv-boom
https://arxiv.org/abs/1902.05178

the above paper shows that it is absolutely critical to have intra-process timing-attack protection, as it is completely impractical for all software (world-wide) to be redesigned around an inter-process security model.  FastCGI as a paradigm would have to be utterly abandoned.

i note that there are responses in other topics.  why is there no response on this one?  is the BOOM group aware of --- and has it thought through --- the full implications of failing to respond, given that these discussions are public and archived irrevocably and permanently, for everyone across the world to read?

fixing intra-process timing attacks clearly cannot be ignored.  why is the BOOM Group ignoring the issue?


Christopher Celio

Feb 21, 2019, 5:41:22 PM
to riscv-boom
Thank you for sharing that paper. 

Abstract: "As a result of our work, we now believe that speculative vulnerabilities on today’s hardware defeat all language-enforced confidentiality with no known comprehensive software mitigations [...] In the face of this reality, we have shifted the security model of the Chrome web browser and V8 to process isolation."

I think the paper reinforces my earlier assertion that I suspect the SW will have to hide private information behind HW protection, as it is difficult/impossible for SW to protect itself from itself. Of course this upends everything (JIT, sandboxes, etc.). And I still stand by the belief that speculation fences are a poor bandaid solution that will not catch all holes, but I know many other people are pursuing this approach using extant hardware and I eagerly await their results! If that worked, it would be the easiest way out of this disaster by far.

> fixing intra-process timing attacks clearly cannot be ignored.  why is the BOOM Group ignoring the issue?

I don't know how you can leap from a lack of satisfactory email traffic to ignoring security. We have already demonstrated spectre attacks on BOOM and open-sourced these efforts so that others can measure, extend, and attempt to mitigate the attacks. But solutions will not come quickly. We are excited for people to bring their own ideas and extend BOOM to test them out. If people have any questions about how BOOM works and how one might implement any particular mitigation strategy, then I think we can quickly give advice on a path forward. If people are looking for the One Correct Solution to all things Spectre, that will come much, much slower.
 

-Chris

lk...@lkcl.net

Feb 22, 2019, 12:26:55 AM
to riscv-boom


On Thursday, February 21, 2019 at 10:41:22 PM UTC, Christopher Celio wrote:
> Thank you for sharing that paper.
>
> Abstract: "As a result of our work, we now believe that speculative vulnerabilities on today’s hardware defeat all language-enforced confidentiality with no known comprehensive software mitigations [...] In the face of this reality, we have shifted the security model of the Chrome web browser and V8 to process isolation."
>
> I think the paper reinforces my earlier assertion that I suspect the SW will have to hide private information behind HW protection, as it is difficult/impossible for SW to protect itself from itself.

jose's slides show that there's a common paradigm of different security "domains" (timing domains) in-process, and he refers to these as "API" domains.

we discussed the idea of having hardware-level instructions that could be called at entry to and exit from such areas of code (within the "trusted" context of a given single process, of course), in a way exactly analogous to an (in-process) hardware context-switch, to *achieve* the exact same privilege-fencing opportunity *as* a hardware-level (process-to-process) context-switch.

however it occurred to me that there is no actual need to differentiate between the entry and exit "fence".

> Of course this upends everything (JIT, sandboxes, etc.).

it does.  it's a massive, 25-year-delayed paradigm shift that will require acceptance across the entire computing industry.  it's going to need something idiotic like getting a high-profile individual such as linus torvalds to first understand and accept it, and then talk publicly about it.

fortunately, for the most part, the majority of software such as openssl, PAM etc. can simply have the call to the speculation fence added, covering a significant proportion of cases.  JITs and sandboxes, however, are more involved, and interpreted programming languages will need a new library through which users of the language (python, perl, php) may call the speculation-fence code, to protect single-process applications from themselves.

ultimately it needs to sink into standard computing design at the level of an extremely boring "HOWTO" that comes up on stackoverflow as often as "how do i write a helloworld program" or "how do i do network sockets".

> And I still stand by the belief that speculation fences are a poor bandaid solution that will not catch all holes,

speculation fences are not enough. it's *timing* fences that are needed, where speculation is a *subset* of timing attack mitigation.

the requirement is that the timing-protection fence *be* unequivocally and absolutely as strong, in EVERY respect, as a hardware-level process-to-process context-switch's timing protection.  if an implementor disregards that, and fails to provide the exact same strong timing-attack quiescing as they put into place in a hardware-level context-switch, then they have failed as badly as if they had failed to solve timing attacks *at* a context-switch.

now, there may be cases where they actually, genuinely fail to understand quite how serious this is *even* on hardware-level context-switching --- the attack surface is so insanely large.

the more i investigate this, the more shocked i become at quite how problematic this really is.  it's not just speculation, it's not just out-of-order systems: it's *IN-ORDER* ones as well.  everywhere and anywhere there is a pipeline stall or resource contention that results in delays, that is the *definition* of an opportunity for a timing attack.

> but I know many other people are pursuing this approach using extant hardware and I eagerly await their results! If that worked, it would be the easiest way out of this disaster by far.

>> fixing intra-process timing attacks clearly cannot be ignored.  why is the BOOM Group ignoring the issue?

> I don't know how you can leap from a lack of satisfactory email traffic to ignoring security.

six to eight messages over an eight to ten week period on numerous forums, asking who is going to take responsibility, and not receiving a response?  unfortunately, it's a logical implication that fits the available evidence.

it's good to have a response and have the misunderstanding cleared up.  i have however still not *actually* received a response as to who is going to take responsibility within the RISC-V Foundation for leading a *public* (non-secretive) initiative, and interacting with and listening to the needs of *everyone* (not just those people and organisations who have signed up for RISC-V Membership).

> We have already demonstrated spectre attacks on BOOM and open-sourced these efforts so that others can measure, extend, and attempt to mitigate the attacks. But solutions will not come quickly. We are excited for people to bring their own ideas and extend BOOM to test them out. If people have any questions about how BOOM works and how one might implement any particular mitigation strategy, then I think we can quickly give advice on a path forward. If people are looking for the One Correct Solution to all things Spectre, that will come much, much slower.

it's a huge cross-industry screwup / oversight of at least 25 years' standing, and it's an issue that the entire industry faces.  just as that paper points out, every single processor that has any kind of optimisation (basically, anything involving resource contention) at *any* level is affected.  just about the only current design that is not vulnerable to timing attacks is a single-core, single-issue, in-order, no-pipeline-stalling design with very, very special attention paid to the design of the TLB and the L1 and L2 caches (or one that does not have any at all, such as ECs).

yes it's really that serious.  it's so serious that the entire computing industry hasn't fully grasped it, and much of the industry will be in complete denial for a long, long time.  i've already encountered several levels of denial, in various quarters.

we're also going to face severe marketing-department-driven backlash, due to the amount of money invested in processor design (and the latency involved: design teams start work a good couple of years before a processor hits the market).  no processor company wants to hear that the processor they've spent $50m on *so far* is full of *new*, gaping timing-related holes.

l.
