NMI and mepc

Monte Dalrymple

Feb 4, 2016, 3:25:05 PM

to isa...@lists.riscv.org

NMI has its own entry in the vector table, but I believe that it also
needs its own mepc register. Consider this case: a trap service
routine has just started and before the state can be saved an NMI
is taken, overwriting the mepc register. There may be a similar
issue with the mstatus register. A byproduct of this is that a
separate sret instruction will be needed (or perhaps a parameter
in the sret) to select the correct return register. If multiple NMIs
are allowed, as suggested in the spec, a separate register for
each NMI will be needed.
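
The clobbering scenario above can be sketched with a toy model. This is only an illustration under stated assumptions: the ToyHart class, the addresses, and the method names are hypothetical and model nothing beyond "a trap copies pc into a single mepc".

```python
class ToyHart:
    """Hypothetical toy hart with a single mepc register (not the spec)."""

    def __init__(self, pc):
        self.pc = pc
        self.mepc = 0

    def take_trap(self, handler_pc):
        # Hardware saves the interrupted pc in mepc, then jumps to the handler.
        self.mepc = self.pc
        self.pc = handler_pc

    def trap_return(self):
        # Return to whatever mepc currently holds.
        self.pc = self.mepc


# Hypothetical addresses, chosen only for illustration.
USER_PC, TRAP_HANDLER, NMI_HANDLER = 0x1000, 0x8000, 0x9000

hart = ToyHart(USER_PC)
hart.take_trap(TRAP_HANDLER)  # ordinary trap: mepc = USER_PC
# The NMI arrives before the trap handler has saved mepc to memory.
hart.take_trap(NMI_HANDLER)   # mepc is overwritten with TRAP_HANDLER
hart.trap_return()            # NMI handler returns into the trap handler
hart.trap_return()            # trap handler "returns" to itself
lost_return = hart.pc != USER_PC  # the original return address is gone
```

In this model the second return lands back at TRAP_HANDLER, because the only copy of USER_PC was overwritten when the NMI was taken.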

There is a similar issue on the return from sbreak, where I think
a separate sret flavor will be required if debugging or single-
stepping through an interrupt service return is desired.

Andrew Waterman

Feb 4, 2016, 3:59:47 PM

to Monte Dalrymple, isa-dev

Our intent has been to reserve NMIs for non-resumable events, in which
case clobbering mepc and friends is OK. It's also not obvious that
another set of exception-handling registers suffices; the NMI itself
could be interrupted by an NMI. (If that weren't true, it wouldn't
actually be non-maskable...)
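
A toy sketch of why one extra register set may not be enough, assuming a hypothetical dedicated NMI return register (called nmepc here; it is not a real CSR, and the class and addresses are made up for illustration):

```python
class ToyHart:
    """Hypothetical toy hart with mepc plus one dedicated NMI register."""

    def __init__(self, pc):
        self.pc = pc
        self.mepc = 0
        self.nmepc = 0  # hypothetical NMI-only return register

    def take_trap(self, handler_pc):
        self.mepc = self.pc
        self.pc = handler_pc

    def take_nmi(self, handler_pc):
        # NMIs save the interrupted pc in their own register.
        self.nmepc = self.pc
        self.pc = handler_pc

    def nmi_return(self):
        self.pc = self.nmepc


USER_PC, TRAP_HANDLER, NMI_HANDLER = 0x1000, 0x8000, 0x9000

hart = ToyHart(USER_PC)
hart.take_trap(TRAP_HANDLER)  # mepc = USER_PC
hart.take_nmi(NMI_HANDLER)    # nmepc = TRAP_HANDLER, mepc is untouched
hart.pc = NMI_HANDLER + 4     # the NMI handler executes one instruction
hart.take_nmi(NMI_HANDLER)    # second NMI: nmepc now points into the handler
hart.nmi_return()             # back inside the first NMI handler
hart.nmi_return()             # cannot get back to TRAP_HANDLER any more
```

The extra register protects mepc from the first NMI, but a nested NMI overwrites nmepc in exactly the same way, so the problem only moves up one level.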

Monte Dalrymple

Feb 4, 2016, 4:24:41 PM

to Andrew Waterman, isa-dev

If that is the case then I submit that the "NMI" label should be
replaced by something else, since Interrupt implies resumable.
Perhaps "FE" (Fatal Exception) or something similar.

Cesar Eduardo Barros

Feb 4, 2016, 4:57:21 PM

to Monte Dalrymple, Andrew Waterman, isa-dev

On 04-02-2016 19:24, Monte Dalrymple wrote:
> If that is the case then I submit that the "NMI" label should be
> replaced by something else, since Interrupt implies resumable.
> Perhaps "FE" (Fatal Exception) or something similar.

Why not be more obvious? "NRE" (Non-Resumable Exception).

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Monte Dalrymple

Feb 4, 2016, 5:03:05 PM

to Andrew Waterman, Cesar Eduardo Barros, isa-dev

That was actually my first thought, but I am used to seeing NRE
used as shorthand for Non-Recurring Engineering Cost.

----- Original Message -----
From: "Cesar Eduardo Barros" <ces...@cesarb.eti.br>
To: "Monte Dalrymple" <mon...@systemyde.com>; "Andrew Waterman"
<and...@sifive.com>
Cc: "isa-dev" <isa...@lists.riscv.org>
Sent: Thursday, February 04, 2016 1:57 PM
Subject: Re: NMI and mepc

kr...@berkeley.edu

Feb 5, 2016, 2:16:44 AM

to Monte Dalrymple, Andrew Waterman, Cesar Eduardo Barros, isa-dev

NMI _is_ the standard name for a non-resumable interrupt, usually used
to indicate something is broken in the machine, requiring a fail-stop
to prevent further data corruption.

Although they have occasionally been used as a way of providing a
higher-priority interrupt, this is really abusing the concept.

Krste

Michael Clark

Feb 5, 2016, 2:59:32 AM

to kr...@berkeley.edu, Monte Dalrymple, Andrew Waterman, Cesar Eduardo Barros, isa-dev

This is worth reading regarding recoverable NMI handling in the Linux
kernel:

https://lwn.net/Articles/484932/
http://www.linaro.org/blog/core-dump/debugging-arm-kernels-using-nmifiq/

The main points:

* watchdog interrupt - needs to be able to trigger while interrupts
are disabled
* profiling interrupt - needs to measure time while interrupts are
disabled
* non-maskable IPI - need to wake up a hart that has interrupts disabled.

The first two sound like non-maskable timer interrupts.

The third is used for sampling backtraces in CONFIG_DEBUG_SPINLOCK and
likely has other uses.

Observations:

* The current IPI in the privileged spec is a software interrupt in the
'sip' register (supervisor)
* The IPI CSR in the code is 'mipi' (machine) - this CSR is not in
privileged-spec-v1.7

Ideas:

* Should the IPI CSRs be sip/sipi or mip/mipi to be consistent?
* Should there be nmtie for non-maskable timer interrupts?

This extra complexity would mean RISC-V would need to support
recoverable NMIs.

~mc

Michael Clark

Feb 5, 2016, 8:03:37 AM

to kr...@berkeley.edu, Monte Dalrymple, Andrew Waterman, Cesar Eduardo Barros, isa-dev

The kernel guys would love to see spinlocks in sampling profiler
backtraces. I've wanted to see them on multiple occasions myself when
trying to debug Linux/FreeBSD performance bottlenecks.

http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

The other thoughts are related to receiving an NMI while processing an
NMI. NMIs should be masked while handling an NMI if we want them to be
recoverable. Fault handling during an NMI also needs to be clearly
specified. Page faults can occur in machine mode (MPRV bit set) and there
are a number of other faults that could occur during NMI processing
(privileged-spec-v1.7 Table 3.7: Machine cause register (mcause)
values). It's probably best to be able to 'unmask' NMIs rather than add
some silly mechanism like unmasking NMIs on the first eret (which gets
broken by any subsequent fault). Machine mode can be considered as
micro-code level so we should be able to do magic things like unmask
NMIs.

The other thing to consider is the so-called 'unmaskable' timer
interrupts: if they are only available in machine mode then we would
be requiring an ecall to the HAL to do things like enable watchdogs and
profiling timers (CONFIG_DEBUG_SPINLOCK) and to bust spinlocks
(mipi/mip, sipi/sip). Two-level IPI would let you distinguish masking
between Machine and Supervisor mode for IPIs. There is also overlap with
Debug mode to consider; I haven't read the Debug Mode Specification to
consider the implications. There are also virtual NMIs to consider for
HV.

I'm not sure if adding more eret flavours is the way to go. Also,
does adding a reg to eret imply a clobbered register? It's an I-type.
It's like there needs to be a pseudo stack of CSRs that mirrors the
privilege level vector in xstatus, i.e. eret n (imm), which returns via
xepc0 + n, with CSR vectors (xscratch0...n, xepc0...n) and an immediate
on the eret giving the CSR vector offset. eret is already I-type. I see
the CSR vector format (uarch0...15). This kind of change would break
binary compat for privileged code as the CSR space would need to be
re-arranged (this looks like it's going to have to happen anyway);
however, user code binary compat would not be affected. Also, eret with
a zero immediate would be xepc0 + 0, which is the status quo. Also, what
vector size? Magically 'unmasking' NMIs sounds like an interesting idea
for machine mode. I don't like the eret ra idea due to the register
clobber. That's why we have xscratch in the first place.
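
One possible reading of the "eret n" idea, as a toy sketch. Everything here is an assumption made for illustration: the xepc list stands in for the xepc0...n CSR vector, the slot argument plays the role of the eret immediate, and the choice of slot 1 for NMIs is arbitrary.

```python
class ToyHart:
    """Hypothetical hart with a small vector of return-address CSRs."""

    def __init__(self, pc, slots=2):
        self.pc = pc
        self.xepc = [0] * slots  # stands in for xepc0 ... xepc(n)

    def take_trap(self, handler_pc, slot=0):
        # Each trap class saves the interrupted pc in its own slot.
        self.xepc[slot] = self.pc
        self.pc = handler_pc

    def eret(self, slot=0):
        # "eret n": return via xepc0 + n; slot 0 matches today's behaviour.
        self.pc = self.xepc[slot]


USER_PC, TRAP_HANDLER, NMI_HANDLER = 0x1000, 0x8000, 0x9000

hart = ToyHart(USER_PC)
hart.take_trap(TRAP_HANDLER, slot=0)  # ordinary trap uses slot 0
hart.take_trap(NMI_HANDLER, slot=1)   # NMI uses slot 1, slot 0 survives
hart.eret(1)                          # back in the trap handler
pc_after_nmi = hart.pc
hart.eret(0)                          # back at the interrupted code
```

In this model a second NMI taken inside the NMI handler would still overwrite slot 1, which is why the sketch only helps if NMIs are masked while one is being handled.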

I need to re-read Monte's email...

kr...@berkeley.edu

Feb 10, 2016, 3:13:13 PM

to Michael Clark, kr...@berkeley.edu, Monte Dalrymple, Andrew Waterman, Cesar Eduardo Barros, isa-dev

Hi Michael,

We would like to reserve NMI to be used for bad things happening to
the hardware platform itself. The current RISC-V architecture already
supports nested, recoverable hardware interrupts to support many of
the use cases you describe below. In particular, machine-mode
interrupts cannot be masked by supervisor-mode code, so they handle the
use cases below just fine, and I'd argue preferably to handling
them at supervisor level.
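
A minimal sketch of the masking rule described above, assuming a simplified model in which each privilege mode has a single global interrupt-enable bit and per-interrupt enables are ignored. The function name and the dictionary layout are hypothetical.

```python
# Privilege level encodings (U=0, S=1, M=3).
U, S, M = 0, 1, 3


def interrupt_taken(current_priv, target_priv, ie):
    """Return True if an interrupt targeting target_priv is taken now.

    ie maps a privilege level to its global interrupt-enable bit.
    Simplified assumption: per-interrupt enable bits are not modelled.
    """
    if target_priv > current_priv:
        # Interrupts for a higher privilege mode cannot be masked from below.
        return True
    if target_priv == current_priv:
        # Same mode: honour that mode's own enable bit.
        return ie[target_priv]
    # Interrupts for a lower privilege mode are held off.
    return False


# Supervisor code has disabled its interrupts (e.g. while holding a spinlock).
ie = {U: False, S: False, M: False}
watchdog_fires = interrupt_taken(S, M, ie)      # M-mode interrupt gets through
supervisor_timer = interrupt_taken(S, S, ie)    # S-mode interrupt is masked
```

Under these assumptions a watchdog or profiling interrupt routed to machine mode still fires while the supervisor has its own interrupts disabled, which is the behaviour the NMI use cases were asking for.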

>>>>> On Fri, 05 Feb 2016 20:59:32 +1300, Michael Clark <michae...@mac.com> said:
| This is worth reading regarding recoverable NMI handling in the Linux
| kernel:
| https://lwn.net/Articles/484932/
| http://www.linaro.org/blog/core-dump/debugging-arm-kernels-using-nmifiq/
| The main points:
| * watchdog interrupt - needs to be able to trigger while interrupts
| are disabled

This represents something gone wrong in your OS kernel (or possibly
underlying hardware), and is ideally handled by a machine-mode
interrupt. Watchdog timer reset would be an SBI call. This would
also allow these to be cleanly handled under virtualization.

| * profiling interrupt - needs to measure time while interrupts are
| disabled

Machine-mode can provide profiling support more cleanly than adding it to
the kernel (since you perturb the thing you're measuring less). SBI calls
can initiate and report back results of profiling.

| * non-maskable IPI - need to wake up a hart that has interrupts disabled.

Not quite sure what this use case is, but again, machine-mode code on
one hart can always interrupt another hart.

| The first two sound like non-maskable timer interrupts:

| The third is used for sampling backtraces in CONFIG_DEBUG_SPINLOCK and
| likely has other uses

| Observations:

| * The current IPI in the privileged spec is a software interrupt in the
| 'sip' register (supervisor)
| * The IPI CSR in the code is 'mipi' (machine) - this CSR is not in
| privileged-spec-v1.7

| Ideas:

| * Should the IPI CSRs be sip/sipi or mip/mipi to be consistent?
| * Should there be nmtie for non-maskable timer interrupts?

| This extra complexity would mean RISC-V would need to support
| recoverable NMIs.

I believe our separation of M-mode from S-mode provides the needed
functionality. Please let us know if there's something we're missing,

Krste

kr...@berkeley.edu

Feb 10, 2016, 3:15:16 PM

to Michael Clark, kr...@berkeley.edu, Monte Dalrymple, Andrew Waterman, Cesar Eduardo Barros, isa-dev

I couldn't parse this email. It would be great if you could recast the
proposal after reading my last email regarding using machine-mode to provide
recoverable non-maskable interrupts for supervisor-mode.
Thanks,
Krste

Michael Clark

Feb 10, 2016, 5:34:32 PM

to kr...@berkeley.edu, Monte Dalrymple, Andrew Waterman, Cesar Eduardo Barros, isa-dev

Hi Krste,

Thanks for taking the time for a detailed reply. I think the problem is more with my understanding than yours. I will read your emails (a few times) and digest. As I understand it, the (N)MI use cases are solvable within the current architecture. I won't comment further until I have digested your emails.