There is a similar issue on the return from sbreak, where I think
a separate sret flavor will be required if debugging or single-
stepping through an interrupt service return is desired.
Why not be more obvious? "NRE" (Non-Resumable Exception).
--
Cesar Eduardo Barros
ces...@cesarb.eti.br
----- Original Message -----
From: "Cesar Eduardo Barros" <ces...@cesarb.eti.br>
To: "Monte Dalrymple" <mon...@systemyde.com>; "Andrew Waterman"
<and...@sifive.com>
Cc: "isa-dev" <isa...@lists.riscv.org>
Sent: Thursday, February 04, 2016 1:57 PM
Subject: Re: NMI and mepc
Although they have occasionally been used as a way of providing a
higher-priority interrupt, this is really abusing the concept.
Krste
https://lwn.net/Articles/484932/
http://www.linaro.org/blog/core-dump/debugging-arm-kernels-using-nmifiq/
The main points:
* watchdog interrupt - needs to be able to trigger while interrupts
are disabled
* profiling interrupt - needs to measure time while interrupts are
disabled
* non-maskable IPI - need to wake up a hart that has interrupts disabled.
The first two sound like non-maskable timer interrupts:
The third is used for sampling backtraces in CONFIG_DEBUG_SPINLOCK and
likely has other uses
Observations:
* The current IPI in the privileged spec is a software interrupt in the
'sip' register (supervisor)
* The IPI CSR in the code is 'mipi' (machine) - this CSR is not in
privileged-spec-v1.7
Ideas:
* Should the IPI CSRs be sip/sipi or mip/mipi to be consistent?
* Should there be nmtie for non maskable timer interrupts?
This extra complexity would mean RISC-V would need to support
recoverable NMIs.
~mc
http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
The other thoughts are related to receiving an NMI while processing an
NMI. NMIs should be masked while handling an NMI if we want them to be
recoverable. Fault handling during an NMI also needs to be clearly
specified. Page faults can occur in machine mode (MPRV bit set) and there are
a number of other faults that could occur during NMI processing
(privileged-spec-v1.7 Table 3.7: Machine cause register (mcause)
values). It's probably best to be able to 'unmask' NMIs rather than add
some silly mechanism like unmasking NMIs on the first eret (which gets
broken by any subsequent fault). Machine mode can be considered as
micro-code level so we should be able to do magic things like unmask
NMIs. The other thing to consider is the so-called 'unmaskable' timer
interrupts and if they are only available in machine mode then we would
be requiring an ecall to the HAL to do things like enable watchdogs and
profiling timers (CONFIG_DEBUG_SPINLOCK) and to bust spinlocks
(mipi/mip,sipi/sip). Two-level IPI would let you distinguish masking
between Machine and Supervisor mode for IPIs. There is also overlap with
Debug mode to consider. I haven't read the Debug Mode Specification to
consider the implications. There are also virtual NMIs to consider for
HV. I'm not sure if adding more eret flavours is the way to go. Also
does adding a reg to eret imply a clobbered register? It's an I-type.
It's like there needs to be a pseudo stack of CSRs that mirrors the
privilege level vector in xstatus. i.e. eret n (imm) which is xepc0 + n,
(xscratch0...n, xepc0...n) and an immediate for the eret for the CSR
vector offset. eret is already I-type. I see the CSR vector format
(uarch0...15). This kind of change would break binary compat for
privileged code as the CSR space would need to be re-arranged (this
looks like it's going to have to happen anyway) however user code binary
compat would not be affected. Also eret without zero immediate would be
xepc0 + 0 which is status quo. Also what vector size? Magically
'unmasking' NMIs sounds like an interesting idea for machine mode. Don't
like the eret ra idea due to the register clobber. That's why we have
xscratch in the first place.
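
To make the 'eret n' idea above concrete, here is a minimal C model of a
per-level xepc/xscratch vector. It is only a sketch under stated assumptions:
the names csr_file, trap_enter and eret_n, the depth of 4, and the choice that
'eret n' also pops the nesting level back to n are all made up for
illustration. None of this is in privileged-spec-v1.7.

```c
#include <assert.h>
#include <stdint.h>

#define TRAP_DEPTH 4  /* hypothetical vector size; the question above is still open */

/* Hypothetical per-level trap CSRs: xepc0..n and xscratch0..n. */
struct csr_file {
    uint64_t xepc[TRAP_DEPTH];
    uint64_t xscratch[TRAP_DEPTH];
    unsigned level;   /* number of traps currently nested */
    uint64_t pc;
};

/* Taking a trap saves the interrupted pc in the next free xepc slot,
 * so a nested trap never overwrites the slot of the outer one. */
static void trap_enter(struct csr_file *c, uint64_t handler)
{
    assert(c->level < TRAP_DEPTH);
    c->xepc[c->level++] = c->pc;
    c->pc = handler;
}

/* "eret n": return through xepc0 + n and drop back to nesting level n.
 * n = 0 behaves like the existing single-xepc eret. */
static void eret_n(struct csr_file *c, unsigned n)
{
    assert(n < c->level);
    c->pc = c->xepc[n];
    c->level = n;
}
```

In this model an NMI taken inside an interrupt handler lands in slot 1, so a
further fault inside the NMI handler would use slot 2 and leave the return
path in slots 0 and 1 intact, without clobbering a general-purpose register.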
I need to re-read Monte's email...
We would like to reserve NMI to be used for bad things happening to
the hardware platform itself. The current RISC-V architecture already
supports nested, recoverable hardware interrupts to support many of
the use cases you describe below. In particular, machine-mode
interrupts cannot be masked by supervisor-mode code, so they handle the use
cases below just fine, and I'd argue more preferably than handling
them at supervisor level.
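
The masking rule this relies on can be sketched as a small C predicate. This
is one reading of the statement above, not text taken from
privileged-spec-v1.7: it assumes that an interrupt targeting a higher
privilege mode is always taken while running in a lower mode, and that an
interrupt targeting a lower mode is held off while running in a higher one.
The function name interrupt_taken and its parameters are hypothetical.

```c
#include <stdbool.h>

/* Privilege level encodings (U = 0, S = 1, M = 3). */
enum priv { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Is an interrupt that targets mode `target` taken while the hart runs in
 * mode `cur`? `ie` is the interrupt-enable bit of the current mode. */
static bool interrupt_taken(enum priv cur, enum priv target, bool ie)
{
    if (target > cur)
        return true;    /* higher mode: lower-mode code cannot mask it */
    if (target < cur)
        return false;   /* lower mode: held off while in a higher mode */
    return ie;          /* same mode: gated by that mode's enable bit */
}
```

Under that assumption a kernel running with its own interrupts disabled
(cur = PRIV_S, ie = false) still takes a machine-mode interrupt, which is the
property the watchdog and profiling cases need.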
>>>>> On Fri, 05 Feb 2016 20:59:32 +1300, Michael Clark <michae...@mac.com> said:
| This is worth reading regarding recoverable NMI handling in the Linux
| kernel:
| https://lwn.net/Articles/484932/
| http://www.linaro.org/blog/core-dump/debugging-arm-kernels-using-nmifiq/
| The main points:
| * watchdog interrupt - needs to be able to trigger while interrupts
| are disabled
This represents something gone wrong in your OS kernel (or possibly
underlying hardware), and is ideally handled by a machine-mode
interrupt. Watchdog timer reset would be an SBI call. This would
allow these to be cleanly handled under virtualization also.
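
A rough C simulation of that split, with the watchdog owned by machine mode
and the kernel resetting it through a call. Everything named here is a
stand-in: sbi_watchdog_kick is not an actual SBI function, and the struct and
tick routine are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical M-mode watchdog state; none of these names come from the SBI. */
struct watchdog {
    uint32_t timeout;    /* ticks allowed between kicks */
    uint32_t remaining;  /* ticks left before the watchdog fires */
    bool     fired;      /* set when M-mode decides the kernel has stalled */
};

/* Stand-in for an SBI call: the kernel asks M-mode to reset the watchdog. */
static void sbi_watchdog_kick(struct watchdog *w)
{
    w->remaining = w->timeout;
}

/* M-mode timer tick. It deliberately takes no supervisor interrupt-enable
 * argument: the tick runs whether or not the kernel has interrupts disabled. */
static void m_mode_timer_tick(struct watchdog *w)
{
    assert(w->timeout > 0);
    if (w->remaining > 0)
        w->remaining--;
    if (w->remaining == 0)
        w->fired = true;
}
```

A kernel that keeps calling sbi_watchdog_kick never lets remaining reach
zero. A kernel stuck with interrupts disabled stops kicking, and after
timeout ticks the machine-mode handler sees fired set and can act on it.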
| * profiling interrupt - needs to measure time while interrupts are
| disabled
Machine-mode can provide profiling support more cleanly than adding it to
the kernel (since you perturb the thing you're measuring less). SBI calls
can initiate and report back results of profiling.
| * non-maskable IPI - need to wake up a hart that has interrupts disabled.
Not quite sure what this use case is, but again, machine-mode code on
one hart can always interrupt another hart.
| The first two sound like non-maskable timer interrupts:
| The third is used for sampling backtraces in CONFIG_DEBUG_SPINLOCK and
| likely has other uses
| Observations:
| * The current IPI in the privileged spec is a software interrupt in the
| 'sip' register (supervisor)
| * The IPI CSR in the code is 'mipi' (machine) - this CSR is not in
| privileged-spec-v1.7
| Ideas:
| * Should the IPI CSRs be sip/sipi or mip/mipi to be consistent?
| * Should there be nmtie for non maskable timer interrupts?
| This extra complexity would mean RISC-V would need to support
| recoverable NMIs.
I believe our separation of M-mode from S-mode provides the needed
functionality. Please let us know if there's something we're missing.
Krste
Thanks for taking the time for a detailed reply. I think the problem is more with my understanding than yours. I will read your emails (a few times) and digest. As I understand it, the (N)MI use cases are solvable within the current architecture. I won't comment further until I have digested your emails.
Thanks again,
Michael