It doesn't seem very pleasant to write the glue to handle these cases on RISC-V. In particular, the exception entry code needs to save away all of the control state (sepc, scause, sbadaddr, etc.), and if an exception nests before this is completed, then the outer exception's state is irretrievably lost. I can see a way to work around this (save all the state to a non-stack location, then set some flag so that a nested exception will know that it's nested, then save again to the stack), but in my opinion it would be quite nice to have some form of hardware support.
x86 avoids this issue entirely by pushing everything to the stack in microcode, but I can see why RISC-V would want to avoid this.
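The workaround sketched above (save to a non-stack location, set a flag, then save to the stack) can be illustrated with a toy model. This is only a sketch in Python, not real entry code: the `Hart` class, the `in_prologue` flag, the `emergency` save area, and the addresses and cause number are all hypothetical, and the model ignores a trap that arrives before the first save completes.

```python
from dataclasses import dataclass, field


@dataclass
class Hart:
    """Toy model of the supervisor trap CSRs on one hart (hypothetical)."""
    sepc: int = 0
    scause: int = 0
    sbadaddr: int = 0
    in_prologue: bool = False                      # software flag, not a real CSR bit
    emergency: dict = field(default_factory=dict)  # fixed non-stack save area
    stack: list = field(default_factory=list)      # frames saved on the kernel stack

    def take_trap(self, pc, cause, badaddr=0):
        """Hardware side: unconditionally overwrite the trap CSRs."""
        self.sepc, self.scause, self.sbadaddr = pc, cause, badaddr

    def csrs(self):
        return {"sepc": self.sepc, "scause": self.scause, "sbadaddr": self.sbadaddr}


def entry(hart, stack_store_faults=False):
    """Software side: the two-step save described in the message.

    Returns "ok" for a normal entry, or "nested" if a trap hit during the prologue.
    """
    if hart.in_prologue:
        # A trap arrived while the previous entry was still saving state.
        # The outer state is still intact in the emergency area.
        return "nested"
    hart.emergency = hart.csrs()   # step 1: save to a non-stack location
    hart.in_prologue = True        # step 2: mark the vulnerable window as open
    if stack_store_faults:
        # Step 3 faults (e.g. the stack is full): hardware clobbers the CSRs
        # and the handler is re-entered. Values here are illustrative only.
        hart.take_trap(pc=0x80000040, cause=7, badaddr=0xDEAD0000)
        return entry(hart)
    hart.stack.append(dict(hart.emergency))  # step 3: save again to the stack
    hart.in_prologue = False
    return "ok"
```

In the faulting case the CSRs describe the inner trap, but the outer trap's `sepc` survives in `emergency`, which is the point of the extra save.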
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/d7bbe596-7aea-4839-ae8e-152bb17b1857%40groups.riscv.org.
On Tue, Oct 4, 2016 at 1:20 PM, Andrew Lutomirski <aml...@gmail.com> wrote:
> Hi all-
>
> Background: I've never written a line of RISC-V anything in my life, but I
> have recently rewritten a large fraction of Linux's x86 entry code, and I
> thought I'd give my two cents on the privileged spec 1.9.
Thank you for your time; I am very pleased with the attention
RISC-V S-mode has been getting recently.
> When a synchronous exception is delivered, it seems to me that it would be
> very helpful to record the faulting instruction into a CSR. This way faults
> can be handled (and instructions can be emulated if needed) by
> higher-privileged code without needing to re-fetch the instruction. This
> avoids races as well as any need to worry about memory protection that would
> allow, say, a user-executable instruction that is not readable by supervisor
> code without fiddling with the memory protection control bits. Both VMX and
> SVM on x86 can do something like this for VM monitors only, and it's quite
> handy. It seems like it would be straightforward for RISC-V to support it
> for all modes, especially given that instruction words are short.
RISC-V core instructions are short but the ISA allows (does not
require) arbitrarily long instructions. As such it's probably
necessary to have a mechanism to fetch whatever part of the
instruction is not captured in the buffer; there are already
mechanisms for S-mode to fake a U-mode data access, but not to fake a
U-mode instruction access, which is a problem since RISC-V supports
execute-only pages.
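To make the length issue concrete, here is a small Python sketch of how the length of an instruction follows from its first 16-bit parcel under the base ISA's variable-length encoding scheme, and whether a fixed-size capture buffer would hold all of it. The function names and the 4-byte buffer size are assumptions for illustration; no such capture CSR exists in the spec under discussion, and encodings longer than 64 bits are simply reported as `None` here.

```python
def insn_length_bytes(first_parcel):
    """Return the instruction length in bytes from its first 16-bit parcel,
    or None for encodings longer than 64 bits (not decoded in this sketch)."""
    p = first_parcel & 0xFFFF
    if p & 0b11 != 0b11:
        return 2              # 16-bit (compressed) encoding
    if p & 0b11100 != 0b11100:
        return 4              # standard 32-bit encoding
    if p & 0b111111 == 0b011111:
        return 6              # 48-bit encoding
    if p & 0b1111111 == 0b0111111:
        return 8              # 64-bit encoding
    return None               # longer or reserved encodings


def captured_fully(first_parcel, buffer_bytes=4):
    """True if a hypothetical capture buffer of buffer_bytes holds the whole instruction."""
    n = insn_length_bytes(first_parcel)
    return n is not None and n <= buffer_bytes
```

A handler using such a buffer would still need a fallback fetch path whenever `captured_fully` is false, which is where the execute-only page problem comes back.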
On Tue, Oct 4, 2016 at 4:18 PM, Andrew Lutomirski <aml...@gmail.com> wrote:
> That might be worth adding. In general, if I were working on a RISC-V
> kernel or hypervisor, I think I'd be okay if only a bounded number of
> instruction bytes were reported. The really long instructions are probably
> much less useful to emulate anyway.
> As a very minimal proposal, what if there were just a few more scratch
> registers along with a move-CSR-to-CSR instruction (which may already exist
> -- I haven't paid enough attention to the encoding)? Then the page fault
> entry could just stash the CSRs it cares about (which, in practice, might be
> just SEPC, but kernels could do whatever they need) into the extra scratch
> CSRs along with a flag saying "I'm in the page fault prologue". Then a
> nested page fault could notice this flag and handle the stack overflow.
Yes, part of our problem is that the very beginning of the trap
handler is constrained by a lack of free registers. sscratch isn't
truly free because it's used to locate the spill area IIRC.
One potential complication is that we might want to minimize the
number of CSR accesses - BOOM treats all CSR accesses as serializing
instructions. There's speculation in the ISA spec that
implementations might be able to rename sscratch and mscratch, but I
don't know how that will work in practice, and I imagine renaming
stvec is out of the question.
I'm not sure a CSR-CSR move is needed (it definitely does not exist);
you can move through the GPR file once you've moved one GPR to the
spill area.
If a hardware interrupt arrives when the kernel stack is full, how do
you process the stack overflow without losing the interrupt whose
entry code faulted? The entry code presumably needs to look at scause
before deciding which stack to use.
> Alternatively, the STVEC register has the two low bits free. What if one of
> those bits meant "don't update SEPC, etc on entry" or perhaps "write to
> SEPC2, etc on entry" where SEPC2 was an extra CSR? Then the page fault
> entry could switch STVEC to a special nested-entry handler with that bit set
> and then restore it back to normal once it's done saving all of its state
> away.
I think only one of those bits is free if the chip implements RVC or
another odd-length instruction set.
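As a rough illustration of the bit-counting argument, the Python sketch below computes how many low address bits are always zero for a given alignment and packs a single flag into the lowest one. The flag's meaning, the names, and the idea that hardware would interpret it are all hypothetical, taken from the proposal quoted above rather than from the spec.

```python
NESTED_FLAG = 0b1  # hypothetical: "write SEPC2 etc. instead of SEPC on entry"


def free_low_bits(alignment_bytes):
    """Number of always-zero low bits in an address aligned to alignment_bytes."""
    if alignment_bytes <= 0 or alignment_bytes & (alignment_bytes - 1):
        raise ValueError("alignment must be a power of two")
    return alignment_bytes.bit_length() - 1


def pack_stvec(base, nested, alignment_bytes=4):
    """Combine an aligned handler address with the hypothetical flag bit."""
    if base & (alignment_bytes - 1):
        raise ValueError("handler address is not sufficiently aligned")
    if free_low_bits(alignment_bytes) < 1:
        raise ValueError("no free low bit at this alignment")
    return base | (NESTED_FLAG if nested else 0)


def unpack_stvec(value, alignment_bytes=4):
    """Split a packed value back into (handler address, flag)."""
    return value & ~(alignment_bytes - 1), bool(value & NESTED_FLAG)
```

With 4-byte alignment `free_low_bits` gives 2, and with the 2-byte alignment that 16-bit instructions allow it gives 1, so a one-bit flag fits either way.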
-s
Chris can comment more on whether we should be concerned. I have no
idea and it's clearly more "punted" than "impossible".
How much complexity does the second write-port of csrrw t0,sscratch,t0 add?
On Tue, Oct 4, 2016 at 5:05 PM, Stefan O'Rear <sor...@gmail.com> wrote:
> On Tue, Oct 4, 2016 at 4:56 PM, Andrew Lutomirski <aml...@gmail.com> wrote:
>> That sounds unfortunate. Even the current Linux code does quite a few CSR
>> accesses when handling traps (read all the interesting state on entry and
>> write it back on exit), and if serializing instructions are anywhere near as
>> expensive as they are on x86, this will destroy exception handling
>> performance.
>
> Chris can comment more on whether we should be concerned. I have no
> idea and it's clearly more "punted" than "impossible".
It's very possible -- as Andrew points out, Intel's x86 implementations
can, in certain circumstances, transition between privilege modes
without serializing, and they've got quite a bit more baggage to deal
with in such an operation. It's a mere matter of microarchitectural
complexity.
I think you'd get a lot of the benefit from renaming *scratch and
pattern-matching a few *status writes that don't have visible side
effects (e.g., no need to flush the pipeline when twiddling
interrupt-enables, if you're careful). Also, no need to serialize
when reading *epc/*cause, as long as you serialize on exceptions and
*epc/*cause writes.
By comparison, avoiding serializing on privilege transfers would
provide less improvement than the above, and would be substantially
more complex. Still doable, though.
On 06 Oct 2016 02:18, "Jacob Bachmeyer" <jcb6...@gmail.com> wrote:
> An easy solution to the memory-protection issue would be
> for all user pages to be implicitly read-write in S-mode. (Note that
> this does not permit the supervisor to execute from a user page,
> but I believe that the most common use for S-mode execution
> from a user page is exploiting the supervisor.)
This would work, but you don't want the kernel to be able to read certain pages except during code sections that copy from/to userspace, in order to block attacks such as return-oriented programming. So this would require a bit such as SPRV (mimicking MPRV) in sstatus.
Paolo
Andrew Lutomirski wrote:
> When a synchronous exception is delivered, it seems to me that it
> would be very helpful to record the faulting instruction into a CSR.
> This way faults can be handled (and instructions can be emulated if
> needed) by higher-privileged code without needing to re-fetch the
> instruction. This avoids races as well as any need to worry about
> memory protection that would allow, say, a user-executable instruction
> that is not readable by supervisor code without fiddling with the
> memory protection control bits.
An easy solution to the memory-protection issue would be for all user
pages to be implicitly read-write in S-mode. (Note that this does not
permit the supervisor to execute from a user page, but I believe that
the most common use for S-mode execution from a user page is exploiting
the supervisor.)
Race conditions could be a bit harder to address, but what would the
S-mode parsing of a user instruction on the same hart race with?
> It seems like it would be straightforward for RISC-V to support it for
> all modes, especially given that instruction words are short.
This is the fly in the ointment: currently instruction words are short,
but the instruction length is extensible and can exceed XLEN as others
have noted.
Another limitation is that this would require trap handlers to either be
idempotent or to run with interrupts enabled
On Oct 6, 2016, at 09:18, Michael Chapman <michael.c...@gmail.com> wrote:
Is there any particular reason for specifying the result of an integer divide by zero?
(Other than specifying that the operation terminates and produces an undefined bit pattern in the register).
In particular, a signed value divided by zero using signed division is specified as giving the result of -1 for the quotient.
In at least one standard implementation method, a signed divide of a negative number by 0 yields +1 if no additional logic is spent on detecting this case.
If we are going to the trouble of implementing a specific value for these cases, a slightly more natural value than -1 would be either the minimum possible signed value (i.e. 0x80000000 for a 32-bit operation) or the maximum possible signed value (i.e. 0x7fffffff for a 32-bit operation), depending on the sign of the dividend.
Specifying -1 for the result of a signed division of a number by 0 does not seem logical.
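For reference, the behaviour being questioned can be written down as a small Python model for XLEN=32. The `*_spec` functions follow the M extension as specified: division by zero returns a quotient with all bits set and a remainder equal to the dividend, and the signed overflow case returns the dividend as quotient with remainder zero. `div_saturating` is a sketch of the alternative suggested in the message above; mapping a negative dividend to the minimum and anything else (including zero) to the maximum is an assumption, since the message does not spell that out.

```python
XLEN = 32
MASK = (1 << XLEN) - 1
INT_MIN = -(1 << (XLEN - 1))
INT_MAX = (1 << (XLEN - 1)) - 1


def to_signed(x):
    """Reinterpret an XLEN-bit pattern as a signed integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def div_spec(a, b):
    """Signed DIV as specified (operands and result as signed Python ints)."""
    if b == 0:
        return -1                      # all bits set
    if a == INT_MIN and b == -1:
        return INT_MIN                 # overflow: quotient equals the dividend
    q = abs(a) // abs(b)               # round toward zero
    return -q if (a < 0) != (b < 0) else q


def rem_spec(a, b):
    """Signed REM as specified; the sign of the result follows the dividend."""
    if b == 0:
        return a
    if a == INT_MIN and b == -1:
        return 0
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def divu_spec(a, b):
    """Unsigned DIVU as specified (operands and result as XLEN-bit patterns)."""
    a &= MASK
    b &= MASK
    return MASK if b == 0 else a // b


def div_saturating(a, b):
    """Alternative from the message above, NOT the specified behaviour."""
    if b == 0:
        return INT_MIN if a < 0 else INT_MAX
    return div_spec(a, b)
```

Note that the specified signed and unsigned results for division by zero are the same bit pattern, which the saturating alternative would give up.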
0. In particular, in the context of a processor ISA, *everything* should be defined by the spec. Leaving things undefined means it *will* be defined by implementations and not necessarily consistently. Poorly defined behavior leads to misunderstandings, leading to bugs, leading to security issues.
1. Exactly *what* the behavior should be is a different issue. Personally, I prefer the principle of least surprise, followed by programming convenience, and only last, implementation convenience.
> sorear2's solution looks fairly good, although I expect it will be slow
> unless sscratch access is well-optimized by the hardware implementation.
>
> Here's one more straw-man proposal: have trap entry set a bit in
> sstatus and make traps that happen while that bit is still set be
> called "double faults" and enter via a different vector. At the very
> least, this would let the kernel print a nice error instead of
> infinite looping if something goes wrong very early in exception entry.
That is the "trap acknowledged" flag I suggested, but with different
effects and an additional trap vector. What happens if the double fault
handler causes a fault?
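A toy Python model of the straw-man makes the question easier to see. Everything here is hypothetical: the `in_entry` bit, the second vector, and in particular the choice that a double fault leaves `sepc` untouched are assumptions made for the sketch, not something the proposal states.

```python
from dataclasses import dataclass


@dataclass
class TrapModel:
    stvec: int                # normal trap vector
    double_fault_vec: int     # hypothetical second vector
    in_entry: bool = False    # hypothetical sstatus bit, set by hardware on trap entry
    sepc: int = 0

    def trap(self, pc):
        """Hardware trap delivery; returns the address execution continues at."""
        if self.in_entry:
            # Trap while the previous entry is still unacknowledged.
            # Assumption: sepc is left alone so the outer state survives.
            return self.double_fault_vec
        self.in_entry = True
        self.sepc = pc
        return self.stvec

    def acknowledge(self):
        """Software clears the bit once it has saved all of its state."""
        self.in_entry = False
```

In this model a fault inside the double-fault handler simply re-enters the double-fault vector, because the bit is still set, so the infinite loop is moved rather than removed unless that handler is written to be fault-free.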
Andrew Lutomirski wrote:
> I think this would cause problems. In Linux, at least, there are two
> or three modes for access to user pages that the kernel would want to use:
>
> 1. No access to user pages at all. For kernel hardening, this should
> be the normal state of affairs. When the kernel wants to access user
> memory, it will change the mode. (On very new x86, this is available
> using SMAP. ARM and ARM64 have similar mechanisms.) Switching in and
> out of this mode would ideally be very fast.
PUM currently provides this.
> 2. Access using the same restrictions that user code uses. When the
> kernel wants to access user variables (__get_user(), __put_user(),
> copy_from_user(), copy_to_user(), etc), it wants normal user access
> semantics to apply. This may result in page faults, and the kernel
> will handle those page faults appropriately. A user should not be
> able to call a function like gettimeofday() with a pointer to
> read-only memory as the output argument and get that read-only memory
> overwritten.
The kernel should be able to verify addresses using its own
memory-tracking structures. I admit that these checks would not benefit
from the TLB. Could a "verify user address" instruction be worth adding
even though it would not have encoding space to support an immediate offset?
> 3. (special case, rare) The kernel occasionally wants some way to read
> user instruction memory. This isn't very common and could be done by
> manually walking the page tables, but it would be somewhat nice if the
> kernel could just do it. Maybe a CSR bit could be set causing reads
> to user memory to use instruction fetch semantics. For x86, this
> hasn't mattered much in the past because reads were more permissive
> than instruction fetches, but memory protection keys (PKRU) changed
> this and it's a minor mess now.
I suggest that the supervisor always be able to read all user memory.
You have missed the more common case of the kernel needing to *write*
user instruction memory as when paging in program text.
The problem is that there are only a few bits left in mstatus, and they
may end up needed for nested trap handling. Since the supervisor
presumably loaded an X,!R page, I think that S-mode should always be
able (if PUM is clear) to read user pages, but never be able to execute
from user pages. You have convinced me that user-read-only mappings
should also be supervisor-read-only.
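The rules in that last paragraph can be summarized as a small Python predicate. This is a sketch of the position stated above, not of the 1.9 spec's actual behaviour; the function name and argument layout are made up for illustration, and `pum` stands for the PUM bit being set.

```python
def s_mode_user_access(kind, r, w, x, pum):
    """Whether S-mode may perform `kind` ("read", "write" or "exec") on a user
    page whose user permissions are r/w/x, under the rules proposed above."""
    if kind not in ("read", "write", "exec"):
        raise ValueError("unknown access kind: %r" % (kind,))
    if kind == "exec":
        return False      # never execute from user pages
    if pum:
        return False      # PUM set: no S-mode data access to user pages
    if kind == "read":
        return True       # readable even when the page is execute-only (X, !R)
    return w              # user-read-only pages are supervisor-read-only too
```

Under these rules an execute-only user page can be inspected by the supervisor when PUM is clear, while a call such as gettimeofday() with a pointer into read-only memory would still fault on the store.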