RISC-V ISA subsets and the ISA interface "contract"

Alex Bradbury

Jan 23, 2017, 4:35:59 AM

to RISC-V ISA Dev

From a recent mailing list discussion, it seemed clear to me there may
be an opportunity to further clarify the base RISC-V manual,
especially when it comes to M-mode-only implementations and the
precise meaning of the RISC-V interface "contract".

Let's say I'm implementing an M-mode-only RISC-V core that I want to
claim as RV32IMF. There are a number of choices I might make:
* Whether misaligned memory access is supported in hardware or causes a trap
* Whether rdcycle just reads mcycle or whether it traps (ref:
https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/jHfgKgwXey0/VGv71dU9AQAJ)
* In fact, an almost arbitrary subset of the RV32IMF instructions
might not be supported in hardware and instead rely on traps

For any of the above features not supported in hardware, there is the
option of either:

a) supporting that feature through code in a ROM. Trap handlers don't
need to be supplied as part of the program to be executed.
b) requiring appropriate trap handlers to be supplied, e.g. a
core-specific library that is linked in to the resulting binary. To be
fully general, this library would need to provide handlers for
misaligned access, any RV32I instructions that might reasonably be
handled by trapping and all RV32MF instructions. As code size is
normally at a premium, it seems logical that a library providing only
the necessary handlers for a given core implementation and
configuration would be used.
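As a rough illustration of what the misaligned-access part of such a handler library might do, the sketch below emulates a 32-bit load and store using only byte accesses, which can never be misaligned. The function names are hypothetical and little-endian byte order is assumed. A real handler would also have to decode the faulting instruction, obtain the faulting address, and update the saved register file, none of which is shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulate a 32-bit little-endian load from an
 * arbitrarily aligned location using four byte loads.
 * `mem` stands in for the faulting address. */
uint32_t emulate_misaligned_lw(const uint8_t *mem)
{
    assert(mem != NULL);
    return (uint32_t)mem[0]
         | ((uint32_t)mem[1] << 8)
         | ((uint32_t)mem[2] << 16)
         | ((uint32_t)mem[3] << 24);
}

/* Hypothetical helper: the matching 32-bit store, one byte at a time,
 * least significant byte first. */
void emulate_misaligned_sw(uint8_t *mem, uint32_t value)
{
    assert(mem != NULL);
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(value >> (8 * i));
}
```

Whether code like this lives in a ROM (option a) or in a linked-in library (option b) does not change the code itself, only who is responsible for supplying it.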

In discussions at RISC-V workshops and the like we often talk about a
'core' being compliant with a given ISA description, but if an
accepted approach is to rely on trapping for some functionality then
just talking about the 'core' doesn't make sense. Instead it makes
sense to talk about what is supported by the combination of a core and
a (vendor-supplied?) runtime library. This does seem to be consistent
with the description in chapter 11 of the v2.1 spec: "This chapter
describes the RISC-V ISA subset naming scheme that is used to
concisely describe the set of instructions present in a hardware
implementation, or the set of instructions used by an application
binary interface (ABI)."
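To make the naming scheme concrete, here is a minimal sketch of how tooling might decompose a name such as "RV32IMF" into a base width and a set of single-letter extensions. The struct and function names are my own invention, and the sketch deliberately ignores shorthand such as "G" and any multi-letter extension names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Parsed form of a simple ISA name such as "RV32IMF" (hypothetical type). */
struct isa_desc {
    int xlen;          /* 32, 64 or 128 */
    unsigned ext_mask; /* bit (letter - 'a') set for each extension letter */
};

/* Parse "RV<xlen><letters>"; returns false if the name is malformed. */
bool parse_isa_string(const char *s, struct isa_desc *out)
{
    assert(s != NULL && out != NULL);
    if (tolower((unsigned char)s[0]) != 'r' ||
        tolower((unsigned char)s[1]) != 'v' ||
        !isdigit((unsigned char)s[2]))
        return false;

    char *end;
    long xlen = strtol(s + 2, &end, 10);
    if (xlen != 32 && xlen != 64 && xlen != 128)
        return false;

    out->xlen = (int)xlen;
    out->ext_mask = 0;
    for (; *end != '\0'; end++) {
        int c = tolower((unsigned char)*end);
        if (c < 'a' || c > 'z')
            return false;
        out->ext_mask |= 1u << (c - 'a');
    }
    return true;
}

/* True if the single-letter extension `ext` was named in the ISA string. */
bool isa_has(const struct isa_desc *d, char ext)
{
    int c = tolower((unsigned char)ext);
    assert(d != NULL && c >= 'a' && c <= 'z');
    return (d->ext_mask >> (c - 'a')) & 1u;
}
```

Note that a parser like this says nothing about whether "M" or "F" is implemented in hardware or by trap-and-emulate, which is exactly the ambiguity under discussion.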

The specification also states "The base ISA supports misaligned
accesses, but these might run extremely slowly depending on the
implementation." Perhaps there is an opportunity to clarify that a
RISC-V "implementation" is this combination of hardware + support
library (i.e. misaligned memory access trap handlers need not be in
ROM).

I suppose the key point I'm looking for clarity on is this: If I have
a core that says it's an RV32IMF M-mode-only RISC-V implementation
and:
* Some of the RV32IMF instructions are handled by trapping
* Misaligned memory accesses are handled by trapping
Then is it fair to conclude that the SDK must provide these
necessary handlers for it to be considered an RV32IMF platform?

The only problem with agreeing with the above is that describing something
as 'RV32IMF' then conveys relatively little meaning, as any RV32I core
that allows a trap handler for illegal instructions to be installed
and has sufficient program memory to support emulating the
instructions can claim the same.

Thanks,

Alex

Bruce Hoult

Jan 23, 2017, 7:02:28 AM

to Alex Bradbury, RISC-V ISA Dev

I think it would be ok in this situation to say that application code using the RV32IMF instruction set is supported and therefore the platform is RV32IMF.

Potential users would no doubt also be interested in seeing performance numbers on particular workloads of interest.

An implementation is equally RV32IMF-compliant whether it runs your code at 1 MIPS or at 1 GIPS. Running is running, and running your unchanged binary software is infinitely preferable to not running it; there are use-cases for both ends of the performance spectrum.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CA%2BwH294jfXbeHHkJg%2BdMeQLaOC%2Bm1A4-vQmxuf4AL6Zz4SH3Gw%40mail.gmail.com.

Alex Bradbury

Jan 23, 2017, 9:17:10 AM

to Bruce Hoult, RISC-V ISA Dev

On 23 January 2017 at 12:02, Bruce Hoult <br...@hoult.org> wrote:
> I think it would be ok in this situation to say that application code using
> the RV32IMF instruction set is supported and therefore the platform is
> RV32IMF.
>
> Potential users would no doubt also be interested in seeing performance
> numbers on particular workloads of interest.
>
> An implementation is equally RV32IMF-compliant whether it runs your code
> at 1 MIPS or at 1 GIPS. Running is running, and running your unchanged
> binary software is infinitely preferable to not running it; there are
> use-cases for both ends of the performance spectrum.

Hi Bruce. I certainly agree with your reasoning here. I think the key
question is whether any compliant core+SDK is required to supply the
necessary software support to trap and emulate instructions not supported
by the hardware and (if necessary) misaligned memory accesses. I had
thought this was definitely the case, but the discussion about rdcycle
this weekend made me think it is worth checking that this is a shared
understanding.

Best,

Alex

Bruce Hoult

Jan 23, 2017, 9:36:10 AM

to Alex Bradbury, RISC-V ISA Dev

As someone using a standard 3rd party library that used rdcycle internally, I'd be a little bit unhappy if it ran slowly on my m-mode only system, but I'd be much more upset if it didn't run at all.

Alas, I don't yet have real hardware to test this kind of thing on (it's now exactly one month and counting since my HiFive1 shipped from Crowd Supply -- others in Europe are reporting receiving theirs in recent days, so I'm hopeful it will be soon).

I recently did an exercise where I wanted a custom "instruction" on Aarch64 and I modified the standard linux svc trap handler to implement it. On an A53 (Odroid C2) I found it took about 30 cycles just to get into my handler and back.

As a RISC-V hardware implementer, do you have a feel for the minimum time to get to a trap handler and back?

Alex Bradbury

Jan 24, 2017, 4:40:42 AM

to Bruce Hoult, RISC-V ISA Dev

On 23 January 2017 at 14:36, Bruce Hoult <br...@hoult.org> wrote:
> As someone using a standard 3rd party library that used rdcycle internally,
> I'd be a little bit unhappy if it ran slowly on my m-mode only system, but
> I'd be much more upset if it didn't run at all.
>
> Alas, I don't yet have real hardware to test this kind of thing on (it's now
> exactly one month and counting since my HiFive1 shipped from Crowd Supply --
> others in Europe are reporting receiving theirs in recent days, so I'm
> hopeful it will be soon).
>
> I recently did an exercise where I wanted a custom "instruction" on Aarch64
> and I modified the standard linux svc trap handler to implement it. On an
> A53 (Odroid C2) I found it took about 30 cycles just to get into my handler
> and back.
>
> As a RISC-V hardware implementer, do you have a feel for the minimum time to
> get to a trap handler and back?

I haven't really looked at the minimum achievable time for a
horizontal M-mode trap on an M-mode-only system. I think in practice
an important factor is going to be how many registers you have to
save/restore and the number of cycles it takes in the main trap
routine to determine the cause and dispatch the appropriate handler.

Thinking some more, I believe I can restate the main point of this
thread more concisely:

Suppose I author libfoo and ensure it is compatible with RV32IMF. A
user sends me a bug report saying they tried to use it for a vendor's
RV32IMF microcontroller, compiling it using their SDK. They report it
fails to run due to unhandled traps for divide and misaligned memory
accesses. I should be able to tell them to file a bug with the vendor
because the core+SDK they provide is failing to provide an
RV32IMF-compatible environment.
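For concreteness, the divide case in this scenario could be covered by something like the following sketch of an illegal-instruction handler body. It decodes an R-type DIV (opcode 0x33, funct7 1, funct3 4) and applies the M-extension rules for division by zero and signed overflow. The function names and the idea of passing a saved register file as a plain array are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RV32M DIV result, including the two special cases the ISA defines:
 * division by zero yields -1, and INT32_MIN / -1 yields INT32_MIN. */
int32_t rv32_div(int32_t dividend, int32_t divisor)
{
    if (divisor == 0)
        return -1;
    if (dividend == INT32_MIN && divisor == -1)
        return INT32_MIN;
    return dividend / divisor; /* C truncates toward zero, as DIV does */
}

/* Hypothetical handler body: if `insn` is DIV, emulate it on the saved
 * register file `regs` and return true; otherwise return false so that
 * another handler can try. */
bool emulate_div(uint32_t insn, int32_t regs[32])
{
    assert(regs != NULL);
    uint32_t opcode = insn & 0x7f;
    uint32_t rd     = (insn >> 7) & 0x1f;
    uint32_t funct3 = (insn >> 12) & 0x7;
    uint32_t rs1    = (insn >> 15) & 0x1f;
    uint32_t rs2    = (insn >> 20) & 0x1f;
    uint32_t funct7 = insn >> 25;

    if (opcode != 0x33 || funct7 != 0x01 || funct3 != 0x4)
        return false;
    if (rd != 0) /* x0 is hardwired to zero and must not be written */
        regs[rd] = rv32_div(regs[rs1], regs[rs2]);
    return true;
}
```

A complete handler would also advance mepc past the emulated instruction before returning. If nothing of this kind is present in either ROM or the SDK, the libfoo user sees exactly the unhandled trap described above.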

I had thought the above was a straightforward conclusion from the
RISC-V spec and something we agree on, but it seems that might not be
the case. The RISC-V community could of course vote to standardise new
subsets of RV32I/E. If people feel this is an important gap I would
urge that they prepare a proposal as soon as possible so we can get
the disruption to the RISC-V compatibility story out of the way.

As an additional concrete recommendation, I suggest that the next
revision of the ISA specification should make a note that some
low-level code (such as trap handlers) must be written with knowledge
of the underlying implementation choices made by the target. One place
such a note might be useful is on page 18, when describing how
misaligned accesses are supported.

Best,

Alex

Bruce Hoult

Jan 24, 2017, 6:22:25 AM

to Alex Bradbury, RISC-V ISA Dev

On Tue, Jan 24, 2017 at 12:40 PM, Alex Bradbury <a...@asbradbury.org> wrote:

> On 23 January 2017 at 14:36, Bruce Hoult <br...@hoult.org> wrote:
> > As someone using a standard 3rd party library that used rdcycle internally,
> > I'd be a little bit unhappy if it ran slowly on my m-mode only system, but
> > I'd be much more upset if it didn't run at all.
> >
> > Alas, I don't yet have real hardware to test this kind of thing on (it's now
> > exactly one month and counting since my HiFive1 shipped from Crowd Supply --
> > others in Europe are reporting receiving theirs in recent days, so I'm
> > hopeful it will be soon).
> >
> > I recently did an exercise where I wanted a custom "instruction" on Aarch64
> > and I modified the standard linux svc trap handler to implement it. On an
> > A53 (Odroid C2) I found it took about 30 cycles just to get into my handler
> > and back.
> >
> > As a RISC-V hardware implementer, do you have a feel for the minimum time to
> > get to a trap handler and back?
>
> I haven't really looked at the minimum achievable time for a
> horizontal M-mode trap on an M-mode-only system. I think in practice
> an important factor is going to be how many registers you have to
> save/restore and the number of cycles it takes in the main trap
> routine to determine the cause and dispatch the appropriate handler.

Sure, but the trap handler code is visible to and under the control of the programmer, and probably doesn't vary much in execution time from core to core (assuming for example 1-wide in order).

I see, for example, that Cortex M3 and M4 have a trap entry latency of 12 clock cycles and return latency of 10 cycles, assuming suitable memory interfaces. That includes saving/restoring status, PC, LR, R0-R3 and R12 (8 words), and in parallel fetching the correct trap handler vector and fetching the first instruction of the handler (or the interrupted code, on return). As I'm sure you know, Cortex M have the feature that you can just bung the address of a C function with the standard ABI into the trap vector, so it automatically saves/restores a lot more than machines where you need to write the handler in assembler and maybe only the status and PC are saved (and maybe saved to special registers, not to memory).

Ilan Pardo

Jan 26, 2017, 6:39:20 AM

to RISC-V ISA Dev

The problem is more than academic. SW (at any privilege level) needs a way to query the "implementation" (either HW or a HW/FW combination) about its capabilities.

This covers both functional capability (do I have this ISA?) and the performance range it will get.

SW can then decide whether to use the ISA or to emulate it using the supported ISA.

For this, any "significant" deviation from the performance at the core support level should be stated explicitly.

I.e. if I emulate the whole ISA at 20 cycles per instruction or so, it is OK not to bother, but if a specific behavior such as unaligned access differs significantly from aligned access, it should be advertised to the running SW somehow.

See the x86 CPUID instruction for a (not so clean) example.

A mechanism for such an indication is missing in RISC-V.

Ilan

Bruce Hoult

Jan 26, 2017, 7:59:06 AM

to Ilan Pardo, RISC-V ISA Dev

Looks like you could do a lot worse than emulate CPUID (and most do)

https://en.wikipedia.org/wiki/CPUID

It should be virtualizable so you can lie to a user process. Probably ok if it's slow.

I liked how it was handled in original MacOS. That was OS software, but simple enough that it could have been hardware.

https://en.wikipedia.org/wiki/Gestalt_(Mac_OS)

It's not clear from that description, but while Gestalt selectors were 32 bit integers, they in fact all (?) took the form of easily remembered 4 character ASCII strings e.g.

'bclk', 'ram ', 'rom ', 'fpu ', 'pgsz'

An archive of many Gestalt selectors is here: http://www.rgaros.nl/gestalt/index.html
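A tiny sketch of how such four-character selectors map onto 32-bit integers, assuming the classic Mac OS convention that the first character lands in the most significant byte. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a 4-character ASCII code such as "fpu " into a 32-bit selector,
 * first character in the most significant byte (assumed convention). */
uint32_t gestalt_selector(const char *code)
{
    assert(code != NULL && strlen(code) == 4);
    return ((uint32_t)(unsigned char)code[0] << 24)
         | ((uint32_t)(unsigned char)code[1] << 16)
         | ((uint32_t)(unsigned char)code[2] << 8)
         |  (uint32_t)(unsigned char)code[3];
}
```

With this packing, `gestalt_selector("fpu ")` is 0x66707520, so a selector stays readable in a hex dump while still being a single register-sized value to compare or index on.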


Michael Clark

Jan 26, 2017, 12:44:40 PM

to Bruce Hoult, Alex Bradbury, RISC-V ISA Dev

I measured the instruction count for an SBI call that makes an ecall to an MCALL_MHARTID implemented with a save/restore of only two scratch registers. The ecall part of the implementation, excluding the SBI wrapper, takes 28 instructions. There would likely be a similar minimum for any other Gestalt-style upcall that uses ecall to query platform information.

- M-mode scratch stack in mscratch is used to save the two temp registers and they are restored before mret to avoid info leak.

- The mtvec handler does a less-than-zero comparison on the cause to check whether the trap is an interrupt

- then jumps into a cause vectored jump table which leads to the machine_ecall vector entry

- advancing mepc takes 3 instructions.

- a7 the ecall selector is then compared against the ecall table size and left shifted for another a7 vectored jump table

- a7 is clobbered as there is no info leak and the ABI allows it

- leads to the MCALL_MHARTID vector entry

- the actual function is run which reads a CSR

- mret
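The dispatch sequence listed above can be modelled in portable C roughly as follows. This is only a host-side sketch and not the actual handler: the struct, the table sizes (12 causes and 16 calls, read off the two slti instructions in the trace below), and selector 0 for MCALL_MHARTID are assumptions, and the register save/restore through mscratch is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAUSE_MACHINE_ECALL 11u /* environment call from M-mode */
#define NUM_CAUSES          12u /* assumed cause table size */
#define NUM_MCALLS          16u /* assumed ecall table size */
#define MCALL_MHARTID        0u /* assumed selector value */

/* Toy model of the hart state the handler touches. */
struct hart_state {
    uint64_t mcause, mepc, mhartid;
    uint64_t a0, a7;
};

typedef void (*handler_fn)(struct hart_state *);

static void mcall_mhartid(struct hart_state *h) { h->a0 = h->mhartid; }
static void mcall_unknown(struct hart_state *h) { h->a0 = (uint64_t)-1; }

/* Second-level table, indexed by the ecall selector in a7. */
static const handler_fn mcall_table[NUM_MCALLS] = {
    [MCALL_MHARTID] = mcall_mhartid,
};

static void machine_ecall(struct hart_state *h)
{
    h->mepc += 4; /* resume after the 4-byte ecall instruction */
    handler_fn fn = (h->a7 < NUM_MCALLS) ? mcall_table[h->a7] : NULL;
    (fn != NULL ? fn : mcall_unknown)(h);
}

/* First-level table, indexed by the exception cause. */
static const handler_fn cause_table[NUM_CAUSES] = {
    [CAUSE_MACHINE_ECALL] = machine_ecall,
};

/* Models the code at mtvec. Returns 0 if handled, -1 otherwise. */
int trap_entry(struct hart_state *h)
{
    assert(h != NULL);
    if (h->mcause >> 63)            /* top bit set: interrupt, not modelled */
        return -1;
    if (h->mcause >= NUM_CAUSES || cause_table[h->mcause] == NULL)
        return -1;
    cause_table[h->mcause](h);
    return 0;                       /* real code would restore and mret */
}
```

Each level of the model corresponds to one bounds check plus one table-indexed jump in the real handler, which is where most of the measured instructions go.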

BTW It is also interesting to note that gcc didn’t compact the call to the SBI interface which is reachable in one instruction; possibly because I defined the SBI as a pointer to function. That said it’s not volatile so gcc should be able to compact the instructions.

unsigned long(*sbi_hart_id)(void) = (void*)-2048;

It likely needs a linker script or inline asm to get the one instruction x0 relative jalr call. e.g.

jalr ra, x0, -2048 /* jalr -2048(x0) */

0000000000000000028 core-0 :000000000001022c (80000793) addi a5, zero, -2048
0000000000000000029 core-0 :0000000000010230 (000780e7) jalr ra, a5, 0

0000000000000000030 core-0   :fffffffffffff800 (00000893) mv   a7, zero
0000000000000000031 core-0   :fffffffffffff804 (0000   ) ecall
TRAP     :machine_ecall pc:0xfffffffffffff804 badaddr:0xfffffffffffff804
0000000000000000031 core-0   :0000000000001074 (34011173) csrrw     sp, mscratch, sp
0000000000000000032 core-0   :0000000000001078 (00513023) sd   t0, 0(sp)
0000000000000000033 core-0   :000000000000107c (00613423) sd   t1, 8(sp)
0000000000000000034 core-0   :0000000000001080 (34202373) csrrs     t1, mcause, zero
0000000000000000035 core-0   :0000000000001084 (04035e63) bgez   t1, pc + 92
0000000000000000036 core-0   :00000000000010e0 (00c32293) slti   t0, t1, 12
0000000000000000037 core-0   :00000000000010e4 (04028463) beqz   t0, pc + 72
0000000000000000038 core-0   :00000000000010e8 (00000297) auipc     t0, pc + 0
0000000000000000039 core-0   :00000000000010ec (1b428293) addi   t0, t0, 436
0000000000000000040 core-0   :00000000000010f0 (00231313) slli   t1, t1, 2
0000000000000000041 core-0   :00000000000010f4 (006282b3) add     t0, t0, t1
0000000000000000042 core-0   :00000000000010f8 (0002a283) lw   t0, 0(t0)
0000000000000000043 core-0   :00000000000010fc (00028067) jr   t0
0000000000000000044 core-0   :0000000000001100 (0108a293) slti   t0, a7, 16
0000000000000000045 core-0   :0000000000001104 (02028463) beqz   t0, pc + 40
0000000000000000046 core-0   :0000000000001108 (00000297) auipc     t0, pc + 0
0000000000000000047 core-0   :000000000000110c (1c428293) addi   t0, t0, 452
0000000000000000048 core-0   :0000000000001110 (00289893) slli   a7, a7, 2
0000000000000000049 core-0   :0000000000001114 (011282b3) add     t0, t0, a7
0000000000000000050 core-0   :0000000000001118 (0002a283) lw   t0, 0(t0)
0000000000000000051 core-0   :000000000000111c (34102373) csrrs     t1, mepc, zero
0000000000000000052 core-0   :0000000000001120 (00430313) addi   t1, t1, 4
0000000000000000053 core-0   :0000000000001124 (34131073) csrrw     zero, mepc, t1
0000000000000000054 core-0   :0000000000001128 (00028067) jr   t0
0000000000000000055 core-0   :0000000000001138 (f1402573) csrrs     a0, mhartid, zero
0000000000000000056 core-0   :000000000000113c (00013283) ld   t0, 0(sp)
0000000000000000057 core-0   :0000000000001140 (00813303) ld   t1, 8(sp)
0000000000000000058 core-0   :0000000000001144 (34011173) csrrw     sp, mscratch, sp
0000000000000000059 core-0   :0000000000001148 (30200073) mret
0000000000000000060 core-0   :fffffffffffff808 (00008067) ret

Andrew Waterman

Jan 26, 2017, 3:10:03 PM

to Michael Clark, Bruce Hoult, Alex Bradbury, RISC-V ISA Dev

We'd ordinarily do this with a symbol and a linker relaxation. But
yes, GCC could be improved to understand that absolute addresses
[-2048,2046] are legal JALR targets.
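The range follows from JALR's encoding: a sign-extended 12-bit immediate is added to rs1 and bit 0 of the result is cleared, so with rs1 = x0 the reachable targets are the even addresses from -2048 to 2046. A small sketch of the check and of the I-type encoding, with function names that are purely illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `target` can be reached by a single `jalr rd, imm(x0)`. */
bool reachable_via_x0_jalr(int64_t target)
{
    return target >= -2048 && target <= 2046 && (target & 1) == 0;
}

/* Encode `jalr rd, imm(rs1)`: I-type, opcode 0x67, funct3 0. */
uint32_t encode_jalr(unsigned rd, unsigned rs1, int32_t imm)
{
    assert(rd < 32 && rs1 < 32 && imm >= -2048 && imm <= 2047);
    return (((uint32_t)imm & 0xfffu) << 20)
         | ((uint32_t)rs1 << 15)
         | ((uint32_t)rd << 7)
         | 0x67u;
}
```

As a cross-check against the trace earlier in the thread, `encode_jalr(1, 15, 0)` gives 0x000780e7, the `jalr ra, a5, 0` that GCC emitted, while the one-instruction form `jalr ra, x0, -2048` would encode as 0x800000e7.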


Jacob Bachmeyer

Jan 26, 2017, 6:29:20 PM

to Ilan Pardo, RISC-V ISA Dev

Ilan Pardo wrote:
> The problem is more than academic. SW (at any privilege level) needs a
> way to query the "implementation" (either HW or a HW/FW combination)
> about its capabilities.
> This covers both functional capability (do I have this ISA?) and the
> performance range it will get.
> SW can then decide whether to use the ISA or to emulate it using the
> supported ISA.
> For this, any "significant" deviation from the performance at the core
> support level should be stated explicitly.
> I.e. if I emulate the whole ISA at 20 cycles per instruction or so, it
> is OK not to bother, but if a specific behavior such as unaligned
> access differs significantly from aligned access, it should be
> advertised to the running SW somehow.
> See the x86 CPUID instruction for a (not so clean) example.
>
> A mechanism for such an indication is missing in RISC-V.

You mean the configuration string? I would expect that Linux (for
example) would export the configuration string (or some modified
version) in /proc/cpuinfo or perhaps even a new
/proc/riscv_config_string. Other supervisors can use similar approaches
that make sense in the context of their own APIs.

-- Jacob

Andrew Waterman

Jan 26, 2017, 6:39:06 PM

to Jacob Bachmeyer, Ilan Pardo, RISC-V ISA Dev

Agreed, that approach is cleaner and more extensible than a
CPUID-style instruction.


Alex Bradbury

Jan 31, 2017, 7:01:55 AM

to RISC-V ISA Dev

On 24 January 2017 at 09:40, Alex Bradbury <a...@asbradbury.org> wrote:
> Thinking some more, I believe I can restate the main point of this
> thread more concisely:
>
> Suppose I author libfoo and ensure it is compatible with RV32IMF. A
> user sends me a bug report saying they tried to use it for a vendor's
> RV32IMF microcontroller, compiling it using their SDK. They report it
> fails to run due to unhandled traps for divide and misaligned memory
> accesses. I should be able to tell them to file a bug with the vendor
> because the core+SDK they provide is failing to provide an
> RV32IMF-compatible environment.
>
> I had thought the above was a straightforward conclusion from the
> RISC-V spec and something we agree on, but it seems that might not be
> the case. The RISC-V community could of course vote to standardise new
> subsets of RV32I/E. If people feel this is an important gap I would
> urge that they prepare a proposal as soon as possible so we can get
> the disruption to the RISC-V compatibility story out of the way.
>
> As an additional concrete recommendation, I suggest that the next
> revision of the ISA specification should make a note that some
> low-level code (such as trap handlers) must be written with knowledge
> of the underlying implementation choices made by the target. One place
> such a note might be useful is on page 18, when describing how
> misaligned accesses are supported.

Seeing as this part of the thread seems to have gone unanswered over
the past week I thought I'd re-highlight the above. Do we have a
common compiler and programmer target, or are all bets off when
targeting M-mode-only microcontroller implementations?

Best,

Alex

Alex Bradbury

Feb 7, 2017, 2:08:29 PM

to RISC-V ISA Dev

I'm surprised there seems to be no interest in this question.
Fundamentally, it seems very unclear whether RISC-V is forking into
incompatible implementation subsets before we really get started.

In an attempt to move the conversation forwards, I suggest the following:

* The section in the spec on misaligned accesses clarifies that
misaligned trap handlers do not need to be provided in a ROM, but
should be provided by the SDK for a core that claims to be RISC-V

* We differentiate between a RISC-V core and a RISC-V execution
environment. An RV32IMF execution environment can be handled even on
very minimal cores by trapping and emulating. However, to say a core
is (for instance) RV32IMF it must support all I instructions in
hardware (other than the SYSTEM opcode, though again the SDK should
provide handlers), and support all M and F instructions in hardware. A
conforming, but slow, implementation could meet this requirement by
handling unimplemented M and F instructions in a handler in a ROM.
However this is still a useful distinction, as the details of this are
hidden from any programs being compiled and run on the device.

The above would be consistent with the publicly available cores
designed so far, ensures application portability (see my libfoo
question above), and avoids a breaking change in what has been
marketed extensively as a stable specification.

Best,

Alex

Andrew Waterman

Feb 7, 2017, 2:44:19 PM

to Alex Bradbury, RISC-V ISA Dev

Hi Alex,

I'm also interested in this issue.

I agree that the capability of emulating any missing instructions
should be a requirement for claiming compatibility with an extension.

Nevertheless, I maintain that not supporting misaligned accesses in
M-mode is not a breaking change, since only the user spec has ever
been claimed to be frozen, and the text doesn't imply that M-mode
provides a superset of the user-mode functionality. (I would say the
commentary on emulating misaligned accesses and the SYSTEM opcode
implies the opposite.)

It is clear that the privileged architecture specification needs to
resolve this debate, setting the parameters for what constitutes a
conformant M-mode implementation.

Krste and I have been thinking about defining the notion of an M-mode
"profile," which specifies what the hardware natively implements.
This will be the assembly programming/compiler target necessary to
implement the emulation libraries, or to program the machine directly
to avoid the code size/performance cost of emulation.

Andrew


Stefan O'Rear

Feb 7, 2017, 3:07:10 PM

to Alex Bradbury, RISC-V ISA Dev

On Tue, Feb 7, 2017 at 11:08 AM, Alex Bradbury <a...@asbradbury.org> wrote:

> * The section in the spec on misaligned accesses clarifies that
> misaligned trap handlers do not need to be provided in a ROM, but
> should be provided by the SDK for a core that claims to be RISC-V

What's a "ROM"? What's an "SDK"? Both of these terms are at the wrong
semantic level to be used in an ISA spec.

-s

Bruce Hoult

Feb 7, 2017, 3:55:12 PM

to Stefan O'Rear, Alex Bradbury, RISC-V ISA Dev

I completely agree with you about "SDK".

The ISA spec describes the hardware interface as seen by machine code loaded by a customer. Certain things should be guaranteed to work as expected. Performance is not part of the ISA spec, but is a quality of implementation detail that of course is of interest to customers :-)

*How* the ISA spec is implemented is somewhat irrelevant, but it could be useful to include non-normative text that makes clear that "hardware" isn't necessarily random logic but can also cover microcode or normal RV machine code executing out of gate ROM or mask ROM or One-Time-Programmable/EPROM. The important point being, I think, that at power-on the full RISC-V ISA interface is set up and available by the time you get to the first user-written instruction.

Jacob Bachmeyer

Feb 7, 2017, 6:00:48 PM

to Bruce Hoult, Stefan O'Rear, Alex Bradbury, RISC-V ISA Dev

It may be wise to require that "hardware" be fixed and unchangeable, as
this is the common expectation. That rules out OTP/EPROM and the like,
but leaves an option for any type of fabrication-programmed ROM. I am
concerned about possible security issues here, and immutability is the
best defense against some very nasty low-level sabotage.

Requiring the full ISA to be available at the first user-written
instruction makes sense for a "full" system, with a supervisor, but to
think of the RISC-V privilege levels as a hierarchy with machine mode
most powerful at the top is wrong. Better to think of them as
"implementation levels", with machine mode at the bottom--M-mode may
very well be responsible for providing functionality expected by
higher-level modes H, S, U. For the higher levels, misaligned accesses
"just work", for example.

A microcontroller that does not *have* higher levels is then an
interesting case. I suggest that some aspects of user mode, like
misaligned accesses, may or may not be available in machine mode. Yes,
this means that M-mode-only implementations would be exempt from the
"misaligned access must be supported" requirement, but any reasonable
implementation will document support for misaligned access if present
and generate misaligned access traps to mtvec otherwise.

Alex's proposal to distinguish between a core and an execution
environment (from earlier on this thread) is an interesting solution to
this problem. Essentially, it is a requirement that support libraries
for RISC-V implementations be documented and at least semi-transparent.
I like the idea, but am uncertain whether it is within the logical scope
of an ISA spec.

-- Jacob

Alex Bradbury

Feb 9, 2017, 2:11:09 PM

to Andrew Waterman, RISC-V ISA Dev

On 7 February 2017 at 19:43, Andrew Waterman <wate...@eecs.berkeley.edu> wrote:
> Hi Alex,
>
> I'm also interested in this issue.
>
> I agree that the capability of emulating any missing instructions
> should be a requirement for claiming compatibility with an extension.
>
> Nevertheless, I maintain that not supporting misaligned accesses in
> M-mode is not a breaking change, since only the user spec has ever
> been claimed to be frozen, and the text doesn't imply that M-mode
> provides a superset of the user-mode functionality. (I would say the
> commentary on emulating misaligned accesses and the SYSTEM opcode
> implies the opposite.)

"The base ISA supports misaligned accesses". Unless we're talking
about a new ISA or a new "base", I don't really see how this can be
avoided. Either way, it's probably more productive for me to try and
better understand the motivation for this subsetting.

RV32[I,E]/RV64I/RV128I have consistently been explained as offering a
"base ISA". I hope you can see why this talk of further subsetting
this "base" seems surprising to me. In fact the existence of this
common base has been put forward as a key RISC-V design requirement in
order to ensure software compatibility. Failing to provide the full
semantics described in the RV32I specification in an M-mode-only
implementation (e.g. supporting rdcycle/rdtime/rdinstret, and
misaligned memory accesses) creates a fork in the software ecosystem.
My routines which were written against the RV32I "base" all of a
sudden may fail to work on an M-mode-only implementation. These
minimal implementations are going to represent, by a huge margin, the
majority of RISC-V cores in existence. To me that's a strong argument
for ensuring compatibility unless there is a compelling reason not to.

As I think my emails make clear, I personally am concerned about the
downside to further subsetting the base RISC-V ISA. What I'm
struggling to understand is the upside that you see? Providing default
trap routines for misaligned access and transparently handling
rdcycle/rdtime/rdinstret is surely at worst just going to mean RISC-V
code size grows a couple of hundred bytes in a naive comparison vs
ARM. Those bytes can always be clawed back with a simple compiler flag
if someone feels happy moving away from the common base.

Stefan, Bruce, Jacob - I totally accept your point that there may be a
better place to insert the requirement I'm suggesting. I think what
I'm trying to achieve is clear anyway. The way I see it, the ISA is a
"contract" defining an interface between hardware and software. I hope
we all agree that wherever possible we want to minimise the number of
different such interfaces. However, it would be consistent with
existing RISC-V implementations and approaches taken in other
proposals for RISC-V (e.g. the SBI) to allow part of this "contract"
to be fulfilled by low-level support code. It has to be fulfilled by
something though.

> Krste and I have been thinking about defining the notion of an M-mode
> "profile," which specifies what the hardware natively implements.
> This will be the assembly programming/compiler target necessary to
> implement the emulation libraries, or to program the machine directly
> to avoid the code size/performance cost of emulation.

I definitely like the idea of M-mode profiles in general, though more
for conveying information about non-core functionality like debug or
tuning information (e.g. if you div, it's going to trap and be SLOW).
You could even associate standard instruction latencies with a M-mode
profile, meaning the compiler can have a scheduling model matching
that profile rather than a specific implementation.

Best,

Alex

Allen J. Baum

Feb 10, 2017, 1:03:34 AM

to Alex Bradbury, Andrew Waterman, RISC-V ISA Dev

At 7:11 PM +0000 2/9/17, Alex Bradbury wrote:
><stuff>...

>I definitely like the idea of M-mode profiles in general, though more
>for conveying information about non-core functionality like debug or
>tuning information (e.g. if you div, it's going to trap and be SLOW).
>You could even associate standard instruction latencies with a M-mode
>profile, meaning the compiler can have a scheduling model matching
>that profile rather than a specific implementation.

And I'm sure it's obvious that if you have an M-mode profile that says DIV will trap and be slow, then it's not a stretch to say the same about unaligned accesses. The requirement here is that there is some built-in facility to implement that trap and emulate the instruction.

From a security perspective, I don't like that guarantee to involve anything on a separate chip, but your mileage may vary.

All of which to me implies ROM at some vector that can't be overridden - unless someone can come up with some other way to guarantee to honor that contract.

That, in turn, implies that the standard vector entry point be in that ROM (and that it falls through to a handler that may be modified), or that a hidden vector is taken instead that is separate from the normal handler (maybe all of the unimplemented instructions trap there and fall through to the normal handler if hidden handler decides it's not its business to handle it).

That is just another way of introducing microcode, and I'd be perfectly happy with that solution (it's rather time and product tested, after all) though I'm not sure how you squeeze that into the architectural spec. From a programmer's perspective, all that's visible is carving out a (very) small chunk of physical address space and making some of it execute only. That's not a large overhead, even in a very minimal implementation (and one that isn't minimal probably won't need it).
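
To make the "ROM entry point that falls through to a replaceable handler" idea a little more concrete, here is a minimal host-buildable sketch of the dispatch logic. It assumes the ROM has already fetched the faulting instruction word; the function names, the counter and the choice of which traps the ROM claims are all invented for illustration. Only the mcause codes (2 = illegal instruction, 4 = load address misaligned, 6 = store/AMO address misaligned) and the M-extension encoding come from the RISC-V specs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard mcause exception codes. */
enum {
    CAUSE_ILLEGAL_INSTRUCTION = 2,
    CAUSE_MISALIGNED_LOAD     = 4,
    CAUSE_MISALIGNED_STORE    = 6
};

typedef void (*trap_handler_t)(uint32_t mcause, uint32_t mepc);

/* Bookkeeping for the sketch: counts traps the ROM did not claim. */
static unsigned forwarded_traps;

static void default_user_handler(uint32_t mcause, uint32_t mepc)
{
    (void)mcause;
    (void)mepc;
    forwarded_traps++;
}

/* The replaceable handler; ordinary software may overwrite this pointer. */
static trap_handler_t user_handler = default_user_handler;

/* Hypothetical ROM policy: claim misaligned accesses and M-extension
 * instructions (opcode 0x33 with funct7 == 1), leave everything else. */
static bool rom_try_emulate(uint32_t mcause, uint32_t insn)
{
    switch (mcause) {
    case CAUSE_MISALIGNED_LOAD:
    case CAUSE_MISALIGNED_STORE:
        return true;   /* a real ROM would redo the access byte by byte */
    case CAUSE_ILLEGAL_INSTRUCTION:
        return (insn & 0x7f) == 0x33 && (insn >> 25) == 0x01;
    default:
        return false;
    }
}

/* Fixed entry point: emulate if possible, otherwise fall through. */
static void rom_trap_entry(uint32_t mcause, uint32_t mepc, uint32_t insn)
{
    if (rom_try_emulate(mcause, insn))
        return;        /* a real ROM would advance mepc and mret here */
    user_handler(mcause, mepc);
}
```

With this arrangement a trapped div (for example 0x02b54533, div a0,a0,a1) never reaches the user handler, while an environment call or an unknown opcode does.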
--
**************************************************
* Allen Baum tel. (908)BIT-BAUM *
* 248-2286 *
**************************************************

Alex Bradbury

Feb 10, 2017, 2:07:25 AM

to Allen J. Baum, Andrew Waterman, RISC-V ISA Dev

On 10 February 2017 at 06:03, Allen J. Baum <allen...@esperantotech.com> wrote:
> At 7:11 PM +0000 2/9/17, Alex Bradbury wrote:
>><stuff>...

>>I definitely like the idea of M-mode profiles in general, though more
>>for conveying information about non-core functionality like debug or
>>tuning information (e.g. if you div, it's going to trap and be SLOW).
>>You could even associate standard instruction latencies with a M-mode
>>profile, meaning the compiler can have a scheduling model matching
>>that profile rather than a specific implementation.
>

> And I'm sure it's obvious that if you have an M-mode profile that says DIV will trap and be slow, then it's not a stretch to say the same about unaligned accesses. The requirement here is that there is some built-in facility to implement that trap and emulate the instruction.

Hi Allen, I think we're talking along similar lines. The existing ISA
spec makes it clear misaligned accesses can be handled with a software
routine. As you say, an M-mode profile could indicate whether this is
the case (and so you can expect it to be slow). I'm concerned about
the idea of an M-mode profile that makes misaligned accesses illegal
(i.e. where software can't expect such a routine to exist), as this is
either introducing a backwards-incompatible change to the spec or else
a new base ISA - pretty huge steps that should be strongly motivated
IMHO.
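
For reference, the core of such a software routine is small. The following
is a minimal sketch, not taken from any existing SDK, of how a handler
could perform a misaligned 32-bit load or store using only byte accesses
(RISC-V is little-endian). The function names are hypothetical, and the
surrounding work of decoding the trapped instruction and updating the
register file is left out.

```c
#include <stdint.h>

/* Load a 32-bit little-endian word from an arbitrarily aligned address
 * using only byte loads, which can never be misaligned. */
static uint32_t emulate_misaligned_lw(const uint8_t *addr)
{
    return (uint32_t)addr[0]
         | ((uint32_t)addr[1] << 8)
         | ((uint32_t)addr[2] << 16)
         | ((uint32_t)addr[3] << 24);
}

/* Store counterpart: write the word one byte at a time. */
static void emulate_misaligned_sw(uint8_t *addr, uint32_t value)
{
    addr[0] = (uint8_t)(value);
    addr[1] = (uint8_t)(value >> 8);
    addr[2] = (uint8_t)(value >> 16);
    addr[3] = (uint8_t)(value >> 24);
}
```

The cost is four byte accesses plus shifts per word, on top of the trap
entry and exit, which is why a profile flagging "emulated, expect it to be
slow" would be useful information.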

Best,

Alex

Allen J. Baum

Feb 10, 2017, 3:02:19 AM

to Alex Bradbury, Andrew Waterman, RISC-V ISA Dev

Oh, I agree. But I see the errors of my ways now.
I am expecting that, if a part has a profile that requires DIV, but doesn't implement it, then the bit indicates either the presence of HW, or the presence of "microcode" to perform the operation.

Since all profiles require misaligned access, the "misaligned-access bit" is always present, and indicates HW vs SW. In no case does it indicate "not supported" - that is out of spec compliance. And, if it indicates SW, but that SW isn't available for any reason (I would argue that means it must be on-die or on-package), ditto.
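
One way to picture those bits, purely as a sketch: no such profile encoding exists in the spec, and every name below (the feature numbering, the struct, the two helper functions) is hypothetical. The point it illustrates is that a feature required by the profile is either in hardware or emulated, never "not supported", and that misaligned access is required by every profile.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature numbering for this sketch. */
enum feature { FEAT_MISALIGNED = 0, FEAT_MUL = 1, FEAT_DIV = 2, FEAT_FP = 3 };

enum impl_kind { IMPL_HARDWARE, IMPL_EMULATED, IMPL_NOT_IN_PROFILE };

struct mmode_profile {
    uint32_t required; /* bit n set: feature n is part of the profile   */
    uint32_t in_hw;    /* bit n set: done in hardware; clear: emulated  */
};

/* How is a feature provided? A required feature is never "unsupported". */
static enum impl_kind feature_impl(const struct mmode_profile *p,
                                   enum feature f)
{
    uint32_t bit = UINT32_C(1) << f;
    if (!(p->required & bit))
        return IMPL_NOT_IN_PROFILE;
    return (p->in_hw & bit) ? IMPL_HARDWARE : IMPL_EMULATED;
}

/* Every profile must require misaligned access, and hardware bits may
 * only be set for features the profile actually requires. */
static bool profile_is_valid(const struct mmode_profile *p)
{
    uint32_t misaligned = UINT32_C(1) << FEAT_MISALIGNED;
    return (p->required & misaligned) != 0 &&
           (p->in_hw & ~p->required) == 0;
}
```

A compiler or library could query feature_impl() to decide, say, whether to avoid div in hot loops, without ever needing a "does this exist at all" check.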

Bruce Hoult

Feb 10, 2017, 3:33:02 AM

to Alex Bradbury, Allen J. Baum, Andrew Waterman, RISC-V ISA Dev

I can tell you that, as an example, misaligned accesses on the HiFive1 cause your program to crash, whether in the Arduino environment or using the freedom-e-sdk.

E.g. using freedom-e-sdk (if you allow it to inline then it optimises to "return 0x05040302")

======================
__attribute__((noinline))
int deref(char *p){
    return *(int*)p;
}

const char v[] = {1, 2, 3, 4, 5};

int main(){
    return deref(v+1);
}
======================
20400070 <main>:
20400070: 85118513 addi a0,gp,-1967 # 800008f9 <v+0x1>
20400074: a065 j 2040011c <deref>

2040011c <deref>:
2040011c: 4108 lw a0,0(a0)
2040011e: 8082 ret
======================
core freq at 257913651 Hz
trap
Progam has exited with code:0x00000005
======================

Remove the +1 and you get:

======================
core freq at 257871052 Hz
Progam has exited with code:0x04030201
======================

Probably you could add your own trap handler. I haven't tried that. But for sure the environment, as delivered, does not add one for you.
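
As a rough sketch of the first thing such a handler would have to do: the faulting load in the disassembly above is the 16-bit compressed encoding 0x4108, so the handler needs to decode C.LW as well as the 32-bit LW before it can redo the access byte by byte. The following is an illustration only and has not been run on a HiFive1; the type and function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a compressed load word. */
typedef struct {
    bool     is_c_lw;
    uint32_t rd;      /* destination register number (x8..x15)  */
    uint32_t rs1;     /* base register number (x8..x15)         */
    uint32_t offset;  /* zero-extended byte offset, multiple of 4 */
} c_lw_t;

/* Decode a 16-bit instruction as C.LW (quadrant 0, funct3 = 010). */
static c_lw_t decode_c_lw(uint16_t insn)
{
    c_lw_t r = { false, 0, 0, 0 };

    if ((insn & 0x3) != 0x0 || ((insn >> 13) & 0x7) != 0x2)
        return r;

    r.is_c_lw = true;
    r.rd  = 8 + ((insn >> 2) & 0x7);           /* rd'  in bits 4:2  */
    r.rs1 = 8 + ((insn >> 7) & 0x7);           /* rs1' in bits 9:7  */
    r.offset = (((insn >> 10) & 0x7) << 3)     /* uimm[5:3], bits 12:10 */
             | (((insn >> 6) & 0x1) << 2)      /* uimm[2],   bit 6      */
             | (((insn >> 5) & 0x1) << 6);     /* uimm[6],   bit 5      */
    return r;
}
```

Decoding 0x4108 this way gives rd = x10 (a0), rs1 = x10 (a0) and offset 0, matching the "lw a0,0(a0)" shown by objdump.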

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.

To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.

To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CA%2BwH297xY0%3DOTjGAkxXNZJ3%3DAbvyqYDTYQkcTjA4atu-pkLsHg%40mail.gmail.com.

Bruce Hoult

Feb 10, 2017, 3:43:35 AM

to RISC-V ISA Dev, a...@asbradbury.org, allen...@esperantotech.com, wate...@eecs.berkeley.edu, br...@hoult.org

To clarify ... obviously there *is* a trap handler ... it catches it and prints a nice message to stdout (USB serial) and exits. That's better than some of the alternatives. But it doesn't fix up the misaligned access for you.

p.s. I find it's nicer to leave the Arduino "Serial Monitor" window open as the terminal (set to 115200, the freedom-e-sdk default), rather than gnu screen or something, even when using gcc & openocd directly. You need to leave some sketch source code open, but you can minimize/ignore/hide that.

Clifford Wolf

Feb 10, 2017, 4:05:52 AM

to Alex Bradbury, Allen J. Baum, Andrew Waterman, RISC-V ISA Dev

Hi,

On Fri, Feb 10, 2017 at 07:07:21AM +0000, Alex Bradbury wrote:
> Hi Allen, I think we're talking along similar lines. The existing ISA
> spec makes it clear misaligned accesses can be handled with a software
> routine. As you say, an M-mode profile could indicate whether this is
> the case (and so you can expect it to be slow). I'm concerned about
> the idea of an M-mode profile that makes misaligned accesses illegal
> (i.e. where software can't expect such a routine to exist), as this is
> either introducing a backwards-incompatible change to the spec or else
> a new base ISA - pretty huge steps that should be strongly motivated
> IMHO.

IMO the current spec only says that misaligned accesses are legal in user
mode, since it describes the "User-Level ISA". Really nothing is said in
the 2.1 spec Vol. 1 about other modes than U mode. Thus, making misaligned
accesses illegal for e.g. M-mode does not introduce a backwards-incompatible
change.

But by the same argument it would be possible to disable the addi
instruction for M-mode without introducing a backwards-incompatible
change. Obviously that would be a very bad idea.

So I think it is a mistake that the spec does not mention other modes and I
think several paragraphs should be added to the spec:

- Somewhere in the introduction it should mention that modes other than
user mode exist, and that all parts of the spec apply to all modes
unless mentioned otherwise.

- In the "Timers and Counters" section it should be mentioned that these
are U-mode CSRs that should only be used in code running in U-mode,
that there are other timer/counter CSRs for other modes, and that
M-mode-only machines don't implement these U-mode CSRs.

- In "Load and Store Instructions" it should be clarified if misaligned
memory access is something that is supported in other modes than U-mode.

- A comment should be added to "Calling Convention" clarifying that even
though the hard-float calling convention requires RV32G/RV64G, that
does not imply that the hardware must support this ISA natively. Thus
there is for example utility in building a binary for RV32IMF when this
is what your CPU actually can do, even when the OS running on the core
provides a full RV32G user land.

I disagree with Alex and do *not* think the ISA spec should require support
for misaligned memory access in other modes than U-mode. Here is why:

The spec gives two reasons why misaligned memory access is supported by the
user-land ISA: porting legacy code, and packed-SIMD performance.

Let me address packed-SIMD performance first: Right now there is no
packed-SIMD extension. When there is one, it would be really stupid to
generate packed-SIMD code with misaligned memory access and then have the
misaligned memory accesses handled by ISRs. The code would very likely
be significantly slower than without packed-SIMD. Therefore it would
probably be a good idea to make fast misaligned memory access a requirement
for the packed-SIMD extension. In this case it would be reasonable to also
require it to work in all modes, not just U-mode.

Regarding porting legacy code: Is this really such a big issue for code
that is neither going to run in U-mode nor within a large OS kernel? (An OS
kernel is of course free to implement a trap handler for misaligned mem
access and support it as an OS-specific extension for its drivers.
However, I would suggest that most OS developers would prefer performance
over convenience in this regard, i.e. opt to not add a kernel-mode trap
handler to implement misaligned mem access from within the kernel code.)

Clearly, the legacy code the spec is talking about is programs. Practically
all existing cross-platform kernel code (drivers, network stacks, etc.)
probably already avoids misaligned memory access because that is something
that's not available cross-platform.
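
As an illustration of how portable code typically sidesteps the issue, here
is a minimal sketch of the usual memcpy idiom for reading and writing a
32-bit value at an arbitrary address. The helper names are hypothetical.
The compiler is free to turn the memcpy into a single load where the target
allows misaligned access, or into byte accesses where it does not, so no
trap handler is ever needed.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in host byte order from any address. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value in host byte order to any address. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Unlike casting a char pointer to int pointer, this is also well-defined C
regardless of what the hardware does with misaligned accesses.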

And what about M-mode only microcontrollers programmed on the bare-metal
level without any OS? I think it would be ridiculous to have a situation
where the system I'm writing code for is a RISC-V if and only if I
implement a trap handler for misaligned memory access within my own code.

That's why I think misaligned memory access should only be required to work
in U-mode. It is something that the OS code provides for userland, but not
something that necessarily the OS has to provide for itself in order to
make the environment it runs in RISC-V compliant. And when there is no OS
(such as in the case of M-mode only systems), then there is no guarantee that
misaligned memory access is supported.

regards,
- clifford

Alex Bradbury

Feb 10, 2017, 9:25:40 AM

to Clifford Wolf, Allen J. Baum, Andrew Waterman, RISC-V ISA Dev

On 10 February 2017 at 09:05, Clifford Wolf <clif...@clifford.at> wrote:
> Hi,
>
> On Fri, Feb 10, 2017 at 07:07:21AM +0000, Alex Bradbury wrote:
>> Hi Allen, I think we're talking along similar lines. The existing ISA
>> spec makes it clear misaligned accesses can be handled with a software
>> routine. As you say, an M-mode profile could indicate whether this is
>> the case (and so you can expect it to be slow). I'm concerned about
>> the idea of an M-mode profile that makes misaligned accesses illegal
>> (i.e. where software can't expect such a routine to exist), as this is
>> either introducing a backwards-incompatible change to the spec or else
>> a new base ISA - pretty huge steps that should be strongly motivated
>> IMHO.
>
> IMO the current spec only says that misaligned accesses are legal in user
> mode, since it describes the "User-Level ISA". Really nothing is said in
> the 2.1 spec Vol. 1 about other modes than U mode. Thus, making misaligned
> accesses illegal for e.g. M-mode does not introduce a backwards-incompatible
> change.
>
> But by the same argument it would be possible to disable the addi
> instruction for M-mode without introducing a backwards-incompatible
> change. Obviously that would be a very bad idea.

Hi Clifford, thanks for your well-reasoned description of the way you
view the issue.

I think the RISC-V community definitely could choose to introduce a
new "base", which is effectively what RV32I without misaligned access
would be. It just means that to be fully portable software must target
the new base (let's call it RVm32I). This doesn't feel like an
insignificant change, with it getting on for 3 years since the
release of v2.0 of the current base. If that's the way people decide
to go, I think your suggested edits would really help to clarify
things. I will just note my personal concern isn't so much legacy
code, but programmers knowing what RISC-V subset to target and what
level of portability to expect.

> I think it would be ridiculous to have a situation
> where the system I'm writing code for is a RISC-V if and only if I
> implement a trap handler for misaligned memory access within my own code.

On this issue, I was suggesting this be provided by the default SDK
rather than being something you need to explicitly insert in your
code.

Ultimately, I think although it's clear I have an opinion on this
issue, I'd also like to emphasise that I'm not religious about it.
Clarifying what the proposed change is and why is my main aim - I also
think there is an advantage in pinning this down one way or another as
soon as possible. Perhaps we can agree that in the ideal world the 2.0
spec would have made it clear that the RISC-V "base" that's common
across RISC-V targets doesn't include a guarantee of access to
counters or the presence of support for misaligned access. The
question is whether it's better to stick to the old baseline + allow
cores to provide that interface with low-level support software, or
else whether to introduce this new subset.

Best,

Alex

David Horner

Feb 10, 2017, 3:03:45 PM

to RISC-V ISA Dev, a...@asbradbury.org, allen...@esperantotech.com, wate...@eecs.berkeley.edu

On Friday, 10 February 2017 04:05:52 UTC-5, clifford wrote:

> Hi,
>
> ....
>
> And when there is no OS
> (such as in the case of M-mode only systems), then there is no guarantee that
> misaligned memory access is supported.

IMO the relevant situation is when there is no U-Mode.
As in the case of an M-mode only system, the U-Mode guarantees (e.g. non-aligned data access) are irrelevant.

I agree with all your other comments and conclusions, including: that there must be some guarantees stipulated for the other Modes, especially M-mode.

David Horner

Feb 13, 2017, 7:38:35 PM

to RISC-V ISA Dev, a...@asbradbury.org, allen...@esperantotech.com, wate...@eecs.berkeley.edu

An example of my take on providing meaningful representation of the standard is:

if U-mode is supported, the RV32IA designation is valid even if all the atomic instructions trap and are emulated in M-mode [and, even if system-wide serializing all memory access is the reason for the atomic nature of the emulation].

However, for the same implementation without U-mode (i.e. M-mode only) the RV32IA designation would not be valid, even with trapping and emulation used in M-Mode.
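
For concreteness, the kind of trap-and-emulate atomic being discussed might look like the sketch below. It assumes a single hart with no other bus masters, and relies on the fact that taking a trap clears mstatus.MIE, so the read-modify-write inside the M-mode handler cannot be interrupted. The function names are hypothetical and the decode and register write-back steps are omitted.

```c
#include <stdint.h>

/* Emulate amoadd.w from inside an M-mode trap handler: return the old
 * memory value and store old + value. Atomic only under the stated
 * single-hart, interrupts-disabled assumption. */
static uint32_t emulate_amoadd_w(volatile uint32_t *addr, uint32_t value)
{
    uint32_t old = *addr;
    *addr = old + value;
    return old;
}

/* Emulate amoswap.w under the same assumption. */
static uint32_t emulate_amoswap_w(volatile uint32_t *addr, uint32_t value)
{
    uint32_t old = *addr;
    *addr = value;
    return old;
}
```

On a multi-hart system this would need real serialization of memory access across harts, which is the "system-wide serializing" case mentioned above.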
