Allen Baum wrote:
> Taking exceptions (synchronous exceptions) is nastier - your handler
> M-mode code should be written so that it doesn't take an exception, or
> at least not before it's had time to save a few CSRs somewhere (e.g.
> saving the few CSRs better not trap!). Primarily, that means access to
> the physical address of the handler should be guaranteed, and access to
> the save area should be guaranteed.
This was hashed out previously in the "nested trap" discussions: *all*
trap handlers must have a context-save area that can be accessed without
incurring a horizontal trap. This issue also hits the supervisor. It
is more than just the CSRs: the entire general register file must be saved.
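
To make the size of that save area concrete, here is a rough sketch of what one per-hart slot might hold. It assumes RV64; the names (trap_frame, monitor_save_area, MAX_HARTS) and the particular set of CSRs are made up for illustration and are not taken from any specification. Whether the array can really be reached without trapping depends on where it is linked and how PMP is configured, which the sketch does not address.

```c
#include <stdint.h>

/* Hypothetical hart count, chosen only for illustration. */
#define MAX_HARTS 4

/* One context-save slot: the whole general register file plus a few
 * trap CSRs. The CSR selection here is an assumption. */
struct trap_frame {
    uint64_t gpr[31];   /* x1..x31; x0 is hardwired to zero */
    uint64_t mepc;      /* resume address */
    uint64_t mcause;    /* trap cause */
    uint64_t mtval;     /* trap value */
    uint64_t mstatus;   /* includes MPIE/MPP at entry */
};

/* Statically allocated so no allocator runs on the trap path. */
static struct trap_frame monitor_save_area[MAX_HARTS];
```

That is 35 doublewords per hart, which is small enough to keep in on-chip memory.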
> Having said that, it is still the case that an NMI could come in -
> that's basically fatal in this scenario, and should be confined to HW
> error conditions that are fatal anyway.
Does this mean that MPIE in the NMI handler is effectively the
"recoverable NMI" flag? If MPIE is set when the NMI handler is entered,
nothing has been lost (since the monitor was prepared for an interrupt)
and the monitor can resume execution after handling the NMI, probably
after software-delegating the NMI to a "machine check" handler
previously registered by the supervisor using some to-be-defined SBI
call. If MPIE is clear when the NMI handler is entered, the NMI
occurred while entering the monitor trap handler and the resume point
has been destroyed; the only path forwards is to reset.
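
If that reading is right, the decision at the top of the NMI handler could be sketched roughly as follows. This is only an illustration of the proposal above, not defined behavior: it looks at MPIE alone, and the names nmi_action and nmi_classify are hypothetical. The only fixed fact used is that MPIE is bit 7 of mstatus.

```c
#include <stdint.h>

/* mstatus.MPIE is bit 7 in the privileged architecture. */
#define MSTATUS_MPIE (UINT64_C(1) << 7)

/* Hypothetical outcomes for the NMI handler. */
enum nmi_action {
    NMI_RESUME,  /* monitor was interruptible: nothing lost, may resume */
    NMI_RESET    /* resume point destroyed: reset is the only way forward */
};

/* Classify an NMI from the mstatus value read on handler entry.
 * Assumption: MPIE set means the interrupted code was prepared for
 * an interrupt, as proposed in the text. */
static enum nmi_action nmi_classify(uint64_t mstatus_at_entry)
{
    return (mstatus_at_entry & MSTATUS_MPIE) ? NMI_RESUME : NMI_RESET;
}
```

In the NMI_RESUME case the handler would go on to forward the event to whatever "machine check" handler the supervisor registered; that SBI call is still to be defined, so it is left out here.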
High-reliability systems can avoid these problems by placing the monitor
entry point and context save areas in internal (multi-port) SRAM, with
extensive ECC on that SRAM and dedicated ECC scrubbing hardware with its
own SRAM port. (If *that* fails, the main registers probably cannot be
trusted to hold values either and the system will crash no matter what.)
-- Jacob