Sean Halle wrote:
> If I could verify my understanding, the statement "There's no leeway
> to remove standard U-mode CSRs from the perspective of U-mode
> programs, as their presence is mandated by the ABI." Seems to imply
> that all U mode CSRs are, indeed, required -- none are optional.. but
> you implement some in such a way that when they are accessed they trap
> to firmware or to the OS, which then implements their behavior in
> software. Did I understand that correctly?
In section 2.8 of the RISC-V User ISA spec: "We define the full set of
CSR instructions here, although in the standard user-level base ISA,
only a handful of read-only counter CSRs are accessible." That leads to
the RDCYCLE, RDTIME, and RDINSTRET instructions. So RVI does not
actually require user-visible CSRs as such, only that those three
pseudo-instructions work.
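
To make "pseudo-instructions" concrete: each of the three is just a
CSRRS that reads one counter CSR with x0 as the source register. A
minimal sketch of the encoding follows. The function and enum names
are mine, for illustration only; the CSR numbers (0xC00 to 0xC02) and
the field layout are taken from the spec as I read it.

```c
#include <assert.h>
#include <stdint.h>

/* Read-only user counter CSR numbers. */
enum { CSR_CYCLE = 0xC00, CSR_TIME = 0xC01, CSR_INSTRET = 0xC02 };

/* Encode "CSRRS rd, csr, x0", i.e. a pure read of `csr` into `rd`.
 * Layout: csr[31:20] | rs1[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0].
 * (Hypothetical helper name; not from any toolchain.) */
uint32_t encode_csr_read(unsigned rd, unsigned csr)
{
    assert(rd < 32 && csr < 4096);
    return ((uint32_t)csr << 20)  /* CSR number               */
         | (0u << 15)             /* rs1 = x0: set no bits    */
         | (2u << 12)             /* funct3 = 010 (CSRRS)     */
         | ((uint32_t)rd << 7)    /* destination register     */
         | 0x73u;                 /* SYSTEM major opcode      */
}
```

So "rdcycle a0" is encode_csr_read(10, CSR_CYCLE), and anything that
makes those three encodings behave correctly satisfies the user ISA,
whether or not a real CSR file sits behind them.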
> Does this imply that there is effectively no hardware cost to the
> non-implemented CSRs, because they just generate a trap? Does that
> imply that _all_ CSRs that are optional can be successfully handled
> this way? That may not be possible.. consider, for example,
> implementing performance counters via traps.. performance counters
> are part of U mode CSRs, as far as the spec seems to indicate.
Only "cycle", "time", and "instret" are part of the baseline user ISA.
> One feeling is that there may be some unwitting contradiction in
> play.. on the one hand "presence is mandated by the ABI" on the other
> hand "most CSRs are nice-to-haves and are optional" and on a third
> hand "defer implementation to software" and on the fourth hand
> "implement CSR by wiring it to zero" -- these statements, from
> various emails, seem to contradict each other. For example, in one
> place there is mention of implementing non-critical CSRs by tying bits
> to zero, but in other places there is mention of trapping and
> implementing by software.. Perhaps there is more information behind
> the statements that hasn't been shared yet?
>
> What I'm hoping for is a clear vision of how all these requirements
> can all be simultaneously met:
> 1] Presence of U mode CSRs is required
Only "cycle", "time", and "instret" are present in the RVI baseline ISA;
all the rest are actually defined by various extensions or by the
privileged ISA (which can be swapped out /in toto/ if I understand
correctly). Even these are optional in RV32E, so there could be some
precedent for omitting them entirely.
> 2] Stripped down compute engines need total CSR area to be a fraction
> of the area of the scalar register file (say 1/4 the area)
> -] CSR state is typically implemented as flip-flops (much larger
> area per bit of state), making this more difficult
A simple model: each compute engine is at all times either running in
U-mode or halted; when an engine is halted, its context (scalar register
file and program counter) is accessible in a memory-mapped region for an
associated control processor. All external interrupts are taken on the
control processor and halting a compute engine causes an interrupt to
the control processor (probably via the PLIC). Compute engines halt
instead of trapping. This would require a slightly modified supervisor,
but as long as the workload is predominately in U-mode, several compute
engines should be able to share a control processor with minimal
contention. In this model, compute engines could halt upon executing
any SYSTEM instruction. As I understand it, this is fully compliant
with the user ISA (since the control processor could emulate
RDCYCLE/RDTIME/RDINSTRET) and amounts to a non-standard privileged
ISA, which RISC-V allows. (Or you could push to standardize an
alternate privileged ISA for systems that have U-mode-only harts.)
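
A rough sketch of what the control processor's side of that emulation
could look like, assuming RV32 compute engines. Everything named here
(struct engine_ctx, struct counters, emulate_counter_read) is
hypothetical: the layout of the memory-mapped context and the way the
control processor obtains the counter values would be up to the
implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a halted engine's memory-mapped context. */
struct engine_ctx { uint32_t x[32]; uint32_t pc; };

/* Hypothetical counter values kept by the control processor. */
struct counters { uint64_t cycle, time, instret; };

/* Called when an engine halts on the SYSTEM instruction `insn`.
 * Returns 0 if the instruction was a counter read and was emulated,
 * -1 if the supervisor must handle it some other way (ECALL, etc.). */
int emulate_counter_read(struct engine_ctx *ctx, uint32_t insn,
                         const struct counters *c)
{
    uint32_t opcode = insn & 0x7f;
    uint32_t rd     = (insn >> 7) & 0x1f;
    uint32_t funct3 = (insn >> 12) & 0x7;
    uint32_t rs1    = (insn >> 15) & 0x1f;
    uint32_t csr    = insn >> 20;
    uint32_t value;

    assert(ctx != 0 && c != 0);

    /* Only "CSRRS rd, csr, x0" (a pure read) is emulated here. */
    if (opcode != 0x73 || funct3 != 2 || rs1 != 0)
        return -1;

    switch (csr) {
    case 0xC00: value = (uint32_t)c->cycle;            break; /* cycle    */
    case 0xC01: value = (uint32_t)c->time;             break; /* time     */
    case 0xC02: value = (uint32_t)c->instret;          break; /* instret  */
    case 0xC80: value = (uint32_t)(c->cycle >> 32);    break; /* cycleh   */
    case 0xC81: value = (uint32_t)(c->time >> 32);     break; /* timeh    */
    case 0xC82: value = (uint32_t)(c->instret >> 32);  break; /* instreth */
    default:    return -1;
    }

    if (rd != 0)          /* x0 stays hardwired to zero */
        ctx->x[rd] = value;
    ctx->pc += 4;         /* resume after the emulated instruction */
    return 0;
}
```

After a successful return the control processor would simply restart
the engine; on -1 it falls back to whatever the modified supervisor
does for system calls and illegal instructions.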
> 3] Performance counters likely cannot be implemented by software, yet
> are mandated because they are part of U mode CSRs
> -] (also, other CSRs may similarly be barred from being implemented
> by software)
Generally, a CSR that can be fully implemented without hardware support
is probably a waste of a CSR slot. On the other hand, CSRs may control
things that minimal compute engines may or may not actually have. If
your compute engines do not have paging support, then the satp CSR is
useless and can be omitted entirely. (This is plausible in the above
"simple model" if a group of compute engines shares a single virtual
address space mapped using a shared PMMU, possibly even shared with the
control processor, or if the PMMU control registers are also
memory-mapped for the control processor.) If your compute engines are
U-mode-only, then *all* of the privileged CSRs vanish in a puff of
smoke. With no interrupts at all (allowed in the current privileged
spec, which permits interrupts to be routed to specific harts) and traps
handled by halting (and sending an interrupt to the control processor),
the only CSRs you could even have are the cycle, time, and
instructions-retired counters--and fcsr if you implement floating-point.
> 4] Software -- both user level and OS -- will, over time, come to
> expect particular CSRs to be functioning
> -] If hardware has not implemented the expected CSRs, or has done so
> in a crippled way, then the software will have consequences
The specs set fairly tight limits on what portable software can expect.
For example, the other 29 performance counters are almost completely
unspecified. Portable software cannot rely on them.
> 5] Many different kinds of hardware need to be supported
> -] The approach to CSRs will be different for different categories
> of hardware -- micro-controllers will likely just not implement any
> CSRs at all, not even tying them to 0 -- stripped down compute engines
> will only implement the ones critical to OS functionality (it looks
> like roughly 6 or 7 total) -- full scale server cores will implement
> many, or perhaps all, directly.
RV32E permits exactly that (omitting all CSRs), but some of the functions
CSRs provide will still be needed (and would probably be
memory-mapped). As I suggested above, a truly minimal compute engine
could execute exclusively in U-mode, and pass all traps off to a nearby
control processor as interrupts.
> we need to also satisfy these, at the same time:
> 6] Before downloading a particular binary, a means is needed to check
> its compatibility with the hardware -- not just functional, but
> performance related (IE, software CSRs may be unacceptable to some
> binaries)
> -] Checking at run time, for which CSRs are implemented in each
> particular fashion is too late -- the binary has already been selected
> and installed
Unfortunately, this is a general and not-entirely-solved problem. I
suspect that the practical solution, if your compute engines are
specialized enough, would be that special binaries for "Intensivate
FooGrid 9000" (product name made-up for an example) would be produced
and distributed. Labeling those binaries in a machine-readable way is a
different problem.
> 7] Coders are often poorly informed about hardware features, and make
> bad assumptions, and bake those into their code
>
> The main point for me, personally, is implications for software --
> programmers often don't spend the time to understand the hardware
> subtleties, if they see something in the spec they quickly jump to an
> intuition about it and go with that. Hence, warning that CSRs may not
> be fully implemented won't sink in for many coders. We can feel
> fairly certain that popular software packages will misunderstand, and
> assume that all CSRs are fully implemented, and make their code rely
> on the presence of all CSRs -- despite the fine print embedded into
> the spec that warns against this.
How about the fact that the user spec only mentions "time", "cycle", and
"instret"? If user-mode programmers rely on CSRs defined only in the
privileged ISA spec in portable programs, then that is PEBKAC, and there
is no technical solution other than finding better programmers.
> My proposal is designed to make it obvious to coders, who are in a
> hurry, that some CSRs are optional, and to provide a simple
> _automated_ way to handle matching binaries with hardware. Tools need
> to be involved with the process, so that the tool creators are the
> ones who read the spec, and understand, and then add checks to the
> tools, that then inform the coders. More importantly, the tools need
> to automate the matching of binaries to hardware. Think about
> downloading on the web -- we need the tools to know, before download,
> which binaries are compatible with the target processor. To
> accomplish this, it isn't enough for certain things to be _possible_.
> Instead, a system needs to be in place that makes the default way of
> doing things be a good way of doing things. Unless we make the
> easiest, path of least resistance, be one that has good outcomes, then
> software will accumulate worst case behaviors, which will become the
> norm. That creates de-facto standards that we didn't intend. Then
> hardware has to implement things that are harmful, even though they
> are optional, in order to support the bad habits baked into popular
> software. This kind of legacy issue has a long history of repeating
> itself. I'm proposing that we do something to head off the problem
> before it arises.
Speaking of repeating legacy issues, I think that I understand your
frustration--I have been trying to get SUM changed to never permit
S-mode instruction fetch from a user page mapping since the first
message I sent to this list (back when it had the opposite sense and was
called "PUM") and my effort still seems to fall on deaf ears.
-- Jacob