Jose Renau Professor, Computer Science & Engineering |
On Fri, Aug 3, 2018 at 3:54 PM Jose Renau <re...@ucsc.edu> wrote:

> You can do speculation and be Spectre safe as long as there are no side effects. Current cores did not do it because they did not care. Check the last RISC-V workshop talk from Chris: it shows how to solve the issue. Or check this longer talk.

so it's amazing how much difference a few months can make, as, thanks to mitch alsup and others on comp.arch, i did an intensive in-depth study of out-of-order systems. this is an extremely useful presentation, jose, as it helps categorise the domains in which timing attacks occur.

the long and short of it is: OoO is f*****d, big-time. yet we can't "give up" and go back to in-order single-issue, as the benefits of the increased performance dramatically outweigh the security risks in the majority of use-cases.

the area that is hardest to protect against *in hardware only* is the same-process one (which is covered by the "APIs" category in your presentation, jose).

inter-core attacks and attacks across kernelspace-userspace boundaries can be dealt with in hardware. inter-core: instead of a shared DIV unit, have one per core. across kernelspace-userspace boundaries (exceptions, basically): the processor has an atomic hardware event at which the engine may be paused until it reaches a quiescent uniform state.

same-process timing attacks (the "API" category) however simply cannot be dealt with - they cannot even be detected - without an *actual* instruction being called, by which the hardware knows "this is an API call, we have to quiesce the speculation internal state, right now". these instructions need to be done as hints, because they also need to be ignorable by in-order systems.

*we need consensus on what to do* as a group, here. this cannot be left for just one RISC-V implementor to proceed without a discussion, as it involves modifying software right across the board, in userspace *and* kernelspace *and* firmware *and* bootloaders.

who is going to step forward and take responsibility for leading the discussion?

l.
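the shared-DIV-unit channel mentioned above can be illustrated with a toy discrete-time model (purely hypothetical numbers and function names, for illustration only): with one divider shared between two cores, core B's completion time depends on whether core A happens to be dividing; with a per-core unit it does not.

```python
# Toy model of the shared-divider timing channel: a divide issued while
# the (shared) unit is busy is delayed, and that delay is observable.

def div_completion_time(busy_until, issue_cycle, latency=32):
    """Cycle at which a divide issued at issue_cycle completes, given that
    the division unit is busy until cycle busy_until (0 if idle)."""
    start = max(issue_cycle, busy_until)
    return start + latency

# Shared unit: core A issues a divide at cycle 0, core B at cycle 5.
a_done = div_completion_time(0, 0)              # A occupies the unit until 32
b_done_shared = div_completion_time(a_done, 5)  # B must wait behind A

# Per-core units: B's own divider is idle regardless of A's activity.
b_done_private = div_completion_time(0, 5)

# B can infer A's activity from the difference -- that *is* the channel,
# and duplicating the unit per core removes it.
leak = b_done_shared - b_done_private
```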
On Tue, Jan 29, 2019 at 6:55 AM 'Jose Renau' via RISC-V ISA Dev <isa...@groups.riscv.org> wrote:
> My main "requirement" is to allow hardware to handle these spectre leaks efficiently. I mean, if we add instructions, hardware that does not leak should be able to perform them as NOPs, and try to avoid gazillions of NOPs.
that's why i advocate them to be "hints". these need to be the first "official" hints, otherwise the first implementor that publicly distributes "custom" hints (a public world-wide release of upstream patches to gcc and other software that requires the hints) will fragment the RISC-V ecosystem.
> My other "want" is being able to mark "time domains", i.e. when it is possible to have a time leak between two groups and when it is not.
that's same-process, right? do you envisage that to still result in quiescing of the internal state, except perhaps each time domain having the equivalent of an ASID? (an identifier per time-domain?)
> E.g: it is OK to have time leaks between threads in a PARSEC application, but not OK between threads in a web browser.
would that be based on an assumption that web browsers cannot trigger spectre-style timing attacks? or have i misunderstood? the reason i ask is that it is known that javascript may be used to trigger spectre-style timing attacks:
> There should be some RISC-V way to mark this efficiently.
hints [hints are operations that have no effect, that one microarchitecture may take to mean "something", whilst they are NOPs on others].

l.
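a sketch of how such a hint could sit in the existing encoding space (the choice of codepoint here is entirely made up; only the general mechanism is from the ISA spec): the base ISA reserves integer-computational instructions that write x0 as HINT codepoints, so e.g. ADDI x0, x0, imm with imm != 0 executes as a plain NOP on any core that does not recognise it, which is exactly the "ignored by in-order systems" property needed.

```python
# Encode a hypothetical "quiesce speculation" hint in the RISC-V
# ADDI-rd=x0 HINT space.  imm = 1 is an invented, illustrative codepoint.

OPCODE_OP_IMM = 0x13  # I-type integer-immediate opcode
FUNCT3_ADDI   = 0x0

def encode_addi(rd, rs1, imm):
    """Encode an RV32I/RV64I ADDI instruction (imm is a 12-bit field)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (FUNCT3_ADDI << 12) \
           | (rd << 7) | OPCODE_OP_IMM

NOP = encode_addi(0, 0, 0)              # canonical NOP: addi x0, x0, 0
SPEC_FENCE_HINT = encode_addi(0, 0, 1)  # hypothetical hint codepoint

def is_hint(insn):
    """True for ADDI instructions in the rd == x0 HINT space (imm != 0)."""
    rd  = (insn >> 7) & 0x1F
    imm = (insn >> 20) & 0xFFF
    return (insn & 0x707F) == OPCODE_OP_IMM and rd == 0 and imm != 0
```

a core that implements the hint quiesces its speculative state on seeing it; every other core decodes the very same bit pattern as a NOP, so no fragmentation of the encoding space occurs.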
On Tuesday, January 29, 2019, 'Jose Renau' via RISC-V ISA Dev <isa...@groups.riscv.org> wrote:

> See inline

Ack.
> I meant the opposite. In Javascript we need protection between different threads ("time domains" == threads that need time side-channel isolation) because threads do not share data. In PARSEC, we have threads, but we do not need protection because they share pointers and data.
Right, got it. I did wonder :)

In SE/Linux the security boundary is exec (not even fork, because fork can share sockets). In JS and other interpreters (python, java) there will be assumptions involving mutexes and so on...

This is a massive deal: it's a huge paradigm shift in how programming needs to be done. Just as people are taught how to do sockets, when to call the time-speculation fence hint will need to become just as prevalent, with tutorials and example code online.

dang.
> I agree that extending the ASIDs may be a way to deal with it. Now, different ASIDs are used for different processes; if we have different ASIDs for different threads, we could use this to mark time domains.
That in turn implies that, realistically, there needs to be a "start of domain" hint, not just a "transition" hint. Effectively similar to how mutexes work, or how the Windows API calls EnterCriticalSection and LeaveCriticalSection work. Without both a start and an end hint to mark the critical section, the risk is that the program may continue to run after a transition, thinking that it is in a given time domain when in fact it is not.

Also... deep breath: the TDID (time domain id) needs to be saved and restored on context switch.

Ah... is the TDID actually something that needs to be pushed on the stack? Argh, I think it is. A function call may need to be temporarily in one time domain, and switch back to the former on exit.

Are TDIDs to be shared across processes? Don't know the answer to that one. Doesn't sound to me like they should be. To make them unique, therefore, at the hardware level they would need to be concatenated with the ASID from the TLB.

Are TDIDs to be shared across cores? No: standard hardware spectre timing mitigation is supposed to take care of that boundary.

This is really very involved, and there is not a lot of choice in the matter, if OoO and speculation (and the associated performance) are to be kept.
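the "push on the stack, restore on exit" behaviour above can be sketched as a per-thread stack of TDIDs (all names here - tdid_enter, tdid_exit, in_domain - are invented for this sketch; a real implementation would be hardware state plus a context-switch save/restore):

```python
# Per-thread TDID stack: entering a domain pushes, exiting pops, so a
# function can temporarily switch domains and the caller's domain is
# restored on exit -- the mutex/critical-section analogy from the text.

import threading

_state = threading.local()

def current_tdid():
    return getattr(_state, "stack", [0])[-1]    # TDID 0 = default domain

def tdid_enter(tdid):
    """Start-of-domain: push the new TDID (would emit the 'enter' hint)."""
    if not hasattr(_state, "stack"):
        _state.stack = [0]
    _state.stack.append(tdid)

def tdid_exit():
    """End-of-domain: restore the caller's TDID (would emit the 'exit' hint)."""
    _state.stack.pop()

def in_domain(tdid, fn, *args):
    """Run fn inside time domain tdid, restoring the previous domain after,
    even if fn raises."""
    tdid_enter(tdid)
    try:
        return fn(*args)
    finally:
        tdid_exit()
```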
> A time domain is a collection of threads that can share data, within which it is OK to have time leaks. It is not OK to have time leaks across time domains.
> I think that the Time Domain ID can be fairly low overhead. E.g: if we use bits 62 to 53 of the physical address space (10 bits), we can have 1024 different IDs at the same time. The hypervisor can assign different Time Domain IDs (TDIDs) and use the upper physical bits as the ID. It should be transparent for software: only if they want to use it do they need to make sure to have different upper physical bits (a request to the OS, and the OS maps to available time domain IDs).
>
> I think that it should be possible extending the ASID concept, but I have not gone over the details.
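the proposed bit layout, sketched out (illustrative only; the helper names are invented): a 10-bit TDID carried in bits 62..53 of the physical address, leaving the low 53 bits as the "real" physical address.

```python
# TDID carried in physical-address bits 62..53, as proposed above.

TDID_SHIFT = 53
TDID_BITS  = 10
TDID_MASK  = (1 << TDID_BITS) - 1           # 0x3FF -> 1024 domains

def tag_paddr(paddr, tdid):
    """Fold a TDID into the upper physical-address bits."""
    assert 0 <= tdid <= TDID_MASK
    assert paddr < (1 << TDID_SHIFT)        # must fit below the TDID field
    return (tdid << TDID_SHIFT) | paddr

def tdid_of(paddr):
    """Extract the TDID the hypervisor assigned."""
    return (paddr >> TDID_SHIFT) & TDID_MASK

def untagged(paddr):
    """The real physical address, with the TDID field stripped."""
    return paddr & ((1 << TDID_SHIFT) - 1)
```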
[apologies to cc recipients who may have already received it: this message has not shown up on the isa-dev mailing list: reposting]

On Tue, Jan 29, 2019 at 5:36 PM 'Jose Renau' via RISC-V ISA Dev <isa...@groups.riscv.org> wrote:

> I think that the Time Domain ID can be fairly low overhead. E.g: if we use bits 62 to 53 of the physical address space (10 bits), we can have 1024 different IDs at the same time. The hypervisor can assign different Time Domain IDs (TDIDs) and use the upper physical bits as the ID. It should be transparent for software: only if they want to use it do they need to make sure to have different upper physical bits (a request to the OS, and the OS maps to available time domain IDs).
>
> I think that it should be possible extending the ASID concept, but I have not gone over the details.

allow me to take a step back, make an assertion, and then (haha) do some speculative branch-prediction of the conversation.

the hypothesis is that the TDID is not needed (nor the invasive paradigm shift in computing), on the basis that a uniform quiescence of the OoO engine to a known state is all that is needed when switching from one time domain to another.

would you agree with that? if not, please do ignore the branch-predicted path of the conversation that follows :)
some background: a way to state spectre-based timing attacks is that a given (untrusted) instruction may affect the completion time of past *or future* instructions, the timing being potentially affected through shared-resource bottlenecks of numerous different types.

in-order systems are [typically] immune to timing attacks precisely because they are specifically designed never to stall the pipeline(s). any given instruction *always* [typically] completes in a fixed time independent of past [and future] instructions, because that's just the way that the pipelines, register ports, caches (and TLB?) are set up.

put over-simply: in an in-order system there *is* no speculation by which a stall may be caused (which by definition *is* a timing attack).

so this is why (in other threads) i described that OoO systems may be made immune to timing attacks by *massively* over-resourcing the number of ports on the register file, as well as the bandwidth on the operand forwarding bus (if one exists), massively over-resourcing the number of Function Units (if a scoreboard design is utilised), and backing down the amount of branch-prediction and instruction-issue to the point where it can be *formally proven* that any given OoO instruction *will* complete in a guaranteed time.

augmentations to that include permitting resource-consuming speculation on the proviso that, if the resources being used for speculation are required for a *non*-speculative instruction, the non-speculative instruction takes absolute guaranteed precedence *in the same instruction cycle*.

basically, it's hell to implement and takes up huge numbers of gates, hence the need for the alternative solutions.

so the idea is that as long as one group of instructions (in one time domain) has no way to determine any information about a group of instructions in another time domain, we're ok.

my point is: that does *not* necessarily mean that it is necessary to assign an ID *to* any given Time Domain.
we *only* need to guarantee a means of separation *between* them.

now, if it were the case that there was some sort of special instruction usage (a restricted subset of instructions, or of features of instructions) that would guarantee that certain *TYPES* of spectre-style timing attacks were known NEVER to occur (across any given Time Domain transition), THEN it would be useful to assign TDIDs to groups of instructions, and, in a similar fashion to memory FENCE instructions, use the change of TDID to identify which spectre-related resources needed to be quiesced, thus, we reason, reducing latency, i.e. the amount of time needed to wait for the processor to quiesce to a known-good (uniform) state.

an example would be that it was known (guaranteed and formally declared by the application writer) that a given Time Domain was not going to use any DIV instructions. thus, the TDID-FENCE instruction could declare "This TDID does not use DIV", and, consequently, on switching from one TDID to another, if during the transition there happen to be some outstanding DIV operations, they need not be quiesced.

clearly, if the Time Domain violated that constraint (by then actually using a DIV operation when it had formally declared that it was not going to), an exception would need to be raised. which means in turn that one of the primary advantages of having Time Domains is even more complex than formerly envisaged.

my assertion is: in the case of spectre-style timing attacks, unlike memory FENCE instructions, i do not believe that there *are* any (safe) subdivisions of the types of attacks.
the whole basis of immunity against spectre *is* that the processor returns to a known-good quiescent state in which it is *guaranteed* that no instruction to be executed in the immediate future will be short of resources due to past ones still within the system.

or, more to the point: it is far, far too early, and too little research has yet been done, to be able to deploy such fine-grained Time-Domain-related security strategies. which leaves a blanket, uniform "*everything* is quiesced" speculative fence instruction as the safest, simplest, most pragmatic option.

in other words, it is fortunate that a uniform quiescent state is what's needed, and it so happens that it doesn't matter what the domain is: all that matters *is* that the internal state is quiesced [1] at the transition point.

does that sound reasonable?
l.

[1] full quiescence may *or may not* be required. remember that the actual requirement is that the subsequent instructions have 100% available resources, such that they are guaranteed not to be affected by the past instructions already in the system. so whilst on first analysis it may appear that a full commit to the register file is needed, a full cancellation of all speculative operations, etc. etc., this may not actually be the case. it is however up to the architectural implementor to determine that, *not* the specification of the proposed FENCE hint itself.
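the guarantee above can be shown with a deliberately tiny toy model (not any real microarchitecture; all names invented): after the blanket fence, an instruction's issue latency is the same no matter what ran before it, which is precisely the "uniform quiescent state" property.

```python
# Toy OoO engine: speculative ops hold issue ports, so an instruction's
# latency depends on prior (possibly untrusted) activity -- until a
# blanket speculative fence restores the uniform state.

class ToyOoOEngine:
    def __init__(self, ports=4):
        self.ports = ports
        self.in_flight = 0          # speculative ops currently holding ports

    def speculate(self, n):
        """Prior code kicks off n speculative operations."""
        self.in_flight = min(self.ports, self.in_flight + n)

    def issue_latency(self):
        """1 cycle if a port is free, else 2 (wait for one to drain).
        The difference is the observable timing channel."""
        return 1 if self.in_flight < self.ports else 2

    def spec_fence(self):
        """Blanket quiesce: all speculative state resolved.  Per footnote
        [1], an implementation may do less, as long as subsequent
        instructions see 100% available resources."""
        self.in_flight = 0
```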
On Tuesday, January 29, 2019 at 12:55:30 AM UTC-6, Jose Renau wrote:

> My main "requirement" is to allow hardware to handle these spectre leaks efficiently. I mean, if we add instructions, hardware that does not leak should be able to perform them as NOPs, and try to avoid gazillions of NOPs.
>
> My other "want" is being able to mark "time domains", i.e. when it is possible to have a time leak between two groups and when it is not. E.g: it is OK to have time leaks between threads in a PARSEC application, but not OK between threads in a web browser. There should be some RISC-V way to mark this efficiently.

It is time to start designing microarchitectures that are not subject to Spectre- and Meltdown-style attacks. These attacks observe microarchitectural state that is not defined at the architecture level.

The first requirement to avoid the attacks is not to allow microarchitectural state to leak into architectural state, and the prime way of doing this is to completely avoid modification of any support structure until the Write stage of the pipeline (the ROB in OoO design points). This includes I and D Caches, I and D TLBs, I and D tablewalkers (when present), along with CR writes and register writes. Doing this costs fairly little and utilizes already-existent HW features, but may add some additional pressure on those resources.
At a microarchitectural level, one is going to have to tag data with an "I'm not real" bit and not allow a subsequent calculation to use an operand so tagged. This is the integer and AGEN version of the FP NaN, but it needs to use a bit that is not part of the operand/result.
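a minimal sketch of that tagging scheme (the class and function names are invented for illustration): each value carries its "I'm not real" bit alongside, not inside, the operand, and the issue rule refuses any calculation that would consume a tagged operand.

```python
# Out-of-band poison tagging: the speculative bit travels with the value
# but occupies no bits of the value itself (unlike an FP NaN encoding).

class Tagged:
    """An operand plus a separate "I'm not real" tag bit."""
    def __init__(self, value, poisoned=False):
        self.value = value
        self.poisoned = poisoned    # out-of-band bit, not stolen from value

def can_issue(*operands):
    """The rule from the text: no subsequent calculation may use a tagged
    operand."""
    return not any(op.poisoned for op in operands)

def retire(t):
    """Clear the tag once the producing instruction is known to be
    non-speculative (i.e. at the Write/ROB-commit stage)."""
    t.poisoned = False
    return t
```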
Multithreading is replete with leaks from one thread to its neighboring threads. MT probably has to die, especially if there is access to an accurate real-time clock.

RISC-V is still in design; let us not repeat mistakes of the past in the future.
> by the time things like multi-issue have been added in, and L1 and L2 caches and TLBs, it's no longer called a barrel processor, it's called Hyper-Threading.
I'm showing my ignorance here -- I thought hyperthreading was
opportunistic -- e.g., switch contexts only when the current
hyperthread is to block on something. A barrel design is rigid in its
timing. It's like the difference between common Ethernet and SONET:
both rely on time-division multiplexing, but the former is
opportunistic (I can use the network as long as I don't hear anyone
else using it) while the latter is rigidly defined by atomic clocks
(I'm not allowed to transmit this next fixed-size buffer until 125
microseconds from ..... now).
There are at least a half dozen ways to do multithreaded CPUs. It is all based on decisions made at/around FETCH time and at/around DECODE time in the pipeline.

A barrel processor has a strict DIV-N timing: thread k gets a FETCH cycle or a DECODE cycle when clock MOD N = k.
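the strict DIV-N schedule is a one-liner (sketch only):

```python
# Barrel-processor slot assignment: thread k owns the FETCH (or DECODE)
# slot exactly when clock MOD N == k, regardless of stalls.  That
# rigidity is what denies one thread timing information about another.

def barrel_thread(clock, n_threads):
    """Which hardware thread gets the fetch slot on this cycle."""
    return clock % n_threads

# With N = 4 the slot rotates 0,1,2,3,0,1,2,3,... forever.
schedule = [barrel_thread(c, 4) for c in range(8)]
```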
A power-saving MT design might change threads only when it sees a Cache/TLB miss, switching to another thread to occupy the pipe.
A higher-perf design might have several threads fetch and decode instructions, and when the master thread cannot use a function unit, some other thread lobs an instruction in its direction, thus keeping the function units busy. {Have not seen this one implemented.}
In the guise of the immediately above, one could have multiple fetch and decode units that are BW-limited into their register file (each one individually, not en masse). As long as the aggregate RF BW is sufficient, perf remains good.
That sounds like the fuzzing approach used in GPS to decrease accuracy in non-military receivers.