unimp instructions

Luís Marques

Nov 29, 2018, 6:49:07 AM

to sw-...@groups.riscv.org

Dear all,

GNU binutils defines a de facto standard unimplemented instruction, unimp. The binary form of that instruction depends on whether compressed instructions are enabled or not (e.g., directly by using .option norvc, or indirectly by setting the -march switch). Here's the current support matrix:

unimp | .option norvc | c0001073
unimp | .option rvc   | 0000
c.unimp | .option norvc | error
c.unimp | .option rvc   | error

A few comments:

1) Since all compressed instructions are in the form c.op, I believe it would make sense to support an explicit c.unimp.

2) One advantage of the current unimp design is that it's possible to distinguish between 16-bit and 32-bit unimp forms. A few other designs would be possible, but that's probably something that we would want to preserve. If that were unimportant, we could simply use 0000_0000 as a 32-bit unimp, which also has its advantages.

3) The ISA spec doesn't specify 0000 as a canonical unimplemented instruction, but it does specify it as an invalid instruction. Therefore, if an implementation traps on invalid instructions, then it's guaranteed to trap. So keeping 0000 as a 16-bit unimp seems perfect. It would probably be worth documenting its use as a canonical 16-bit unimp in the spec, though.

4) The ISA spec doesn't have anything to say about the instruction c0001073. This is basically a variant of the ECALL/EBREAK/etc. family of instructions. The RISC-V specs also don't seem to guarantee that an implementation traps on unimplemented instructions. Even ignoring that, theoretically there's no guarantee that c0001073 isn't implemented, since it's not reserved for that purpose. Ideally we would use something like an all ones instruction as a 32-bit unimp, but the ISA spec pretty much rejects that possibility:

"Defining a 32-bit word of all ones as illegal was also considered, as all machines must support a 32-bit instruction size, but this requires the instruction-fetch unit on machines with ILEN>32 report an illegal instruction exception rather than access fault when such an instruction borders a protection boundary, complicating variable-instruction-length fetch and decode."

So, if we really have to reserve a legal instruction pattern for a 32-bit unimplemented instruction, we might as well choose the current c0001073. I would just recommend that the ISA spec documents this. Preferably as something actually determined by the spec, but if that's not possible or agreed upon then, IMO, the spec should at least mention that it's a de facto standard practice.

Assuming we want to keep the current 0000 / c0001073 design, and an explicit c.unimp is accepted, then this would be the result:

unimp | .option norvc | c0001073
unimp | .option rvc   | 0000
c.unimp | .option norvc | error
c.unimp | .option rvc   | 0000
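
The proposed matrix can be restated as a small lookup, which may be handy when writing assembler tests. This is only a sketch of the behaviour described in this message, not the binutils or LLVM implementation; the names `ENCODINGS` and `assemble` are hypothetical.

```python
# Proposed behaviour from the matrix above, keyed by (mnemonic, rvc_enabled).
# Combinations missing from the table are the ones marked "error".
ENCODINGS = {
    ("unimp", False): "c0001073",   # .option norvc: 32-bit form
    ("unimp", True): "0000",        # .option rvc: 16-bit form
    ("c.unimp", True): "0000",      # explicit compressed form (proposed)
}


def assemble(mnemonic, rvc):
    """Return the hex encoding for the mnemonic, or raise ValueError on 'error' rows."""
    try:
        return ENCODINGS[(mnemonic, rvc)]
    except KeyError:
        raise ValueError(f"{mnemonic} is rejected when rvc={rvc}") from None


print(assemble("unimp", False))
print(assemble("unimp", True))
print(assemble("c.unimp", True))
```

Under this sketch, `assemble("c.unimp", False)` raises `ValueError`, matching the "error" row for `.option norvc`.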

I have an LLVM patch that adds support for unimp (c0001073) and c.unimp (0000), pending discussion of this issue.

Do you agree on keeping (and documenting) the current unimp design, plus support for an explicit c.unimp?

Best,
Luís

Andrew Waterman

Nov 29, 2018, 7:19:42 AM

to luism...@lowrisc.org, RISC-V SW Dev

Hi Luis,

Thanks for the clear and thoughtful note. My responses are in-line below.

On Thu, Nov 29, 2018 at 3:49 AM Luís Marques <luism...@lowrisc.org> wrote:

> Dear all,
>
> GNU binutils defines a de facto standard unimplemented instruction, unimp. The binary form of that instruction depends on whether compressed instructions are enabled or not (e.g., directly by using .option norvc, or indirectly by setting the -march switch). Here's the current support matrix:
>
> unimp | .option norvc | c0001073
> unimp | .option rvc | 0000
> c.unimp | .option norvc | error
> c.unimp | .option rvc | error
>
> A few comments:
>
> 1) Since all compressed instructions are in the form c.op, I believe it would make sense to support an explicit c.unimp.

Concurred. Omitting c.unimp from binutils was just an oversight on my part.

> 2) One advantage of the current unimp design is that it's possible to distinguish between 16-bit and 32-bit unimp forms. A few other designs would be possible, but that's probably something that we would want to preserve. If that were unimportant, we could simply use 0000_0000 as a 32-bit unimp, which also has its advantages.

I reasoned that the 32-bit UNIMP unambiguously decoding as a 32-bit-long instruction was the most important concern. If not for that consideration, 0x00000000 would've been the obvious choice.

> 3) The ISA spec doesn't specify 0000 as a canonical unimplemented instruction, but it does specify it as an invalid instruction. Therefore, if an implementation traps on invalid instructions, then it's guaranteed to trap. So keeping 0000 as a 16-bit unimp seems perfect. It would probably be worth documenting its use as a canonical 16-bit unimp in the spec, though.

The statement in the spec that "Encodings with bits [15:0] all zeros are defined as illegal instructions" is tantamount to stating that "0x0000 is the canonical illegal instruction." My preference is to take no action, but we could alternatively add some commentary to the spec.

> 4) The ISA spec doesn't have anything to say about the instruction c0001073. This is basically a variant of the ECALL/EBREAK/etc. family of instructions. The RISC-V specs also don't seem to guarantee that an implementation traps on unimplemented instructions. Even ignoring that, theoretically there's no guarantee that c0001073 isn't implemented, since it's not reserved for that purpose. Ideally we would use something like an all ones instruction as a 32-bit unimp, but the ISA spec pretty much rejects that possibility:

This is the instruction CSRRW x0, cycle, x0. As the specification states, cycle is a read-only CSR; so, whether or not the implementation provides the cycle CSR, this instruction is guaranteed to trap.
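
To make that decoding concrete, here is a small sketch that pulls the standard I-type/SYSTEM fields out of c0001073. The helper names `decode_system` and `csr_is_read_only` are made up for illustration; the read-only test assumes the privileged spec's convention that CSR address bits [11:10] equal to 0b11 mark a read-only register.

```python
def decode_system(insn):
    """Split a 32-bit instruction word into its I-type fields."""
    return {
        "opcode": insn & 0x7F,          # bits [6:0]
        "rd": (insn >> 7) & 0x1F,       # bits [11:7]
        "funct3": (insn >> 12) & 0x7,   # bits [14:12]
        "rs1": (insn >> 15) & 0x1F,     # bits [19:15]
        "csr": (insn >> 20) & 0xFFF,    # bits [31:20]
    }


def csr_is_read_only(csr):
    """Assumed convention: csr[11:10] == 0b11 marks a read-only CSR."""
    return (csr >> 10) & 0b11 == 0b11


fields = decode_system(0xC0001073)
print({name: hex(value) for name, value in fields.items()})
print("read-only CSR:", csr_is_read_only(fields["csr"]))
```

The fields come out as opcode 0x73 (SYSTEM), funct3 1 (CSRRW), rd and rs1 both x0, and CSR address 0xc00 (cycle), so the word is a write to a read-only CSR.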

> "Defining a 32-bit word of all ones as illegal was also considered, as all machines must support a 32-bit instruction size, but this requires the instruction-fetch unit on machines with ILEN>32 report an illegal instruction exception rather than access fault when such an instruction borders a protection boundary, complicating variable-instruction-length fetch and decode."
>
> So, if we really have to reserve a legal instruction pattern for a 32-bit unimplemented instruction, we might as well choose the current c0001073. I would just recommend that the ISA spec documents this. Preferably as something actually determined by the spec, but if that's not possible or agreed upon then, IMO, the spec should at least mention that it's a de facto standard practice.

IMO this is not an ISA issue, but an ABI/assembler issue, so it should be documented here: https://github.com/riscv/riscv-asm-manual including my note about why CSRRW x0, cycle, x0 is guaranteed to be illegal.

> Assuming we want to keep the current 0000 / c0001073 design, and an explicit c.unimp is accepted, then this would be the result:
>
> unimp | .option norvc | c0001073
> unimp | .option rvc | 0000
> c.unimp | .option norvc | error
> c.unimp | .option rvc | 0000
>
> I have an LLVM patch that adds support for unimp (c0001073) and c.unimp (0000), pending discussion of this issue.
>
> Do you agree on keeping (and documenting) the current unimp design, plus support for an explicit c.unimp?

Yep.

> Best,
> Luís

--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/CAPhv3np5TOijeWDhYqrBZsH-f-zHAo5EzMDKjNK5ohfYM%2BN5Yw%40mail.gmail.com.

Alex Bradbury

Nov 29, 2018, 8:27:49 AM

to Andrew Waterman, Luís Marques, sw-...@groups.riscv.org

On Thu, 29 Nov 2018 at 12:19, Andrew Waterman
<wate...@eecs.berkeley.edu> wrote:
> On Thu, Nov 29, 2018 at 3:49 AM Luís Marques <luism...@lowrisc.org> wrote:
>> 4) The ISA spec doesn't have anything to say about the instruction c0001073. This is basically a variant of the ECALL/EBREAK/etc. family of instructions. The RISC-V specs also don't seem to guarantee that an implementation traps on unimplemented instructions. Even ignoring that, theoretically there's no guarantee that c0001073 isn't implemented, since it's not reserved for that purpose. Ideally we would use something like an all ones instruction as a 32-bit unimp, but the ISA spec pretty much rejects that possibility:
>
>
> This is the instruction CSRRW x0, cycle, x0. As the specification states, cycle is a read-only CSR; so, whether or not the implementation provides the cycle CSR, this instruction is guaranteed to trap.

Though instructions such as CSRRW are optional in the specification
proposed for ratification. By my understanding that means an
implementation that doesn't provide Zicsr may either:
1) Re-use that encoding space for something else, or
2) Fail to trap on unimplemented instructions

Of course it would make a lot of sense for any platform spec / execution
environment spec to mandate that unimp traps, but right now all the
compiler has to go on is the -march=rv... string. I'm fully in favour
of adding c.unimp and unimp to the asm manual (and thanks Luís for
summarising the proposal so clearly!), just noting that we may need to
look again at this in the future.

Best,

Alex

Andrew Waterman

Nov 29, 2018, 8:43:34 AM

to Alex Bradbury, luism...@lowrisc.org, RISC-V SW Dev

On Thu, Nov 29, 2018 at 5:27 AM Alex Bradbury <a...@lowrisc.org> wrote:

> On Thu, 29 Nov 2018 at 12:19, Andrew Waterman
> <wate...@eecs.berkeley.edu> wrote:
>> On Thu, Nov 29, 2018 at 3:49 AM Luís Marques <luism...@lowrisc.org> wrote:
>>> 4) The ISA spec doesn't have anything to say about the instruction c0001073. This is basically a variant of the ECALL/EBREAK/etc. family of instructions. The RISC-V specs also don't seem to guarantee that an implementation traps on unimplemented instructions. Even ignoring that, theoretically there's no guarantee that c0001073 isn't implemented, since it's not reserved for that purpose. Ideally we would use something like an all ones instruction as a 32-bit unimp, but the ISA spec pretty much rejects that possibility:
>>
>> This is the instruction CSRRW x0, cycle, x0. As the specification states, cycle is a read-only CSR; so, whether or not the implementation provides the cycle CSR, this instruction is guaranteed to trap.
>
> Though instructions such as CSRRW are optional in the specification
> proposed for ratification. By my understanding that means an
> implementation that doesn't provide Zicsr may either:
> 1) Re-use that encoding space for something else, or

Reusing part of the standard encoding space is non-conforming. We don't directly support non-conforming targets in Binutils.

> 2) Fail to trap on unimplemented instructions

If the implementation fails to trap on unimplemented instructions, then it doesn't matter which opcode we pick for this purpose...

Luís Marques

Nov 29, 2018, 9:19:25 AM

to wate...@eecs.berkeley.edu, Alex Bradbury, sw-...@groups.riscv.org

On Thu, Nov 29, 2018 at 1:43 PM Andrew Waterman <wate...@eecs.berkeley.edu> wrote:

> If the implementation fails to trap on unimplemented instructions, then it doesn't matter which opcode we pick for this purpose...

It might not make a real-world difference, but I guess it could be argued that:

1) That implementation could still trap on *invalid* instructions, so picking something like all zeroes instead would matter;

2) We could have an instruction which the base spec mandates must trap (i.e., effectively just document the 32-bit unimp as part of RV32I), even if unimplemented instructions in general do not trap.

Jim Wilson

Nov 29, 2018, 4:46:21 PM

to luism...@lowrisc.org, Andrew Waterman, Alex Bradbury, sw-...@groups.riscv.org

I checked in a binutils patch to add the missing c.unimp.
https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html

Jim

Andrew Waterman

Nov 29, 2018, 5:08:14 PM

to Jim Wilson, Alex Bradbury, luism...@lowrisc.org, sw-...@groups.riscv.org

Thanks, Jim.

Bruce Hoult

Nov 29, 2018, 7:54:35 PM

to luism...@lowrisc.org, Andrew Waterman, Alex Bradbury, sw-...@groups.riscv.org

I don't quite understand why 0x00000000 is a bad value for a 32-bit unimp.

If C is implemented then it hits c.unimp.

If C is not implemented then it's an invalid encoding, which will trap.

I guess the only risk is someone implementing something other than C using the 16-bit encodings, which runs into the policy quoted earlier: "Reusing part of the standard encoding space is non-conforming. We don't directly support non-conforming targets in Binutils." It's probably not hard for anyone who does want to reuse 16-bit opcodes for a non-conforming extension to keep 0x0000 unused anyway.

Andrew Waterman

Nov 29, 2018, 8:08:55 PM

to Bruce Hoult, Alex Bradbury, luism...@lowrisc.org, sw-...@groups.riscv.org

On Thu, Nov 29, 2018 at 4:54 PM Bruce Hoult <bruce...@sifive.com> wrote:

> I don't quite understand why 0x00000000 is a bad value for a 32-bit unimp.
>
> If C is implemented then it hits c.unimp.
>
> If C is not implemented then it's an invalid encoding, which will trap.

That’s all true. It’s about disassembly more than anything else - 00000000 won’t be disassembled as a 32-bit instruction but as two 16-bit ones. Not that this is fundamentally wrong; it’s just confusing.
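
The disassembly point can be illustrated with a rough sketch of how a disassembler might decide instruction lengths from the low bits of the first 16-bit parcel. It assumes the standard RISC-V length encoding (low two bits other than 0b11 mean a 16-bit instruction; low two bits 0b11 with bits [4:2] other than 0b111 mean a 32-bit one) and a little-endian byte stream. The function names are hypothetical and this is not how any particular disassembler is written.

```python
def insn_length(first_parcel):
    """Return the instruction length in bytes from its first 16-bit parcel."""
    if first_parcel & 0b11 != 0b11:
        return 2                      # compressed (16-bit) encoding space
    if first_parcel & 0b11100 != 0b11100:
        return 4                      # standard 32-bit encoding
    raise ValueError("encodings longer than 32 bits are not handled in this sketch")


def split_instructions(code):
    """Split a little-endian byte stream into hex strings, one per instruction."""
    out = []
    i = 0
    while i < len(code):
        parcel = int.from_bytes(code[i:i + 2], "little")
        n = insn_length(parcel)
        # Reverse the bytes so each word prints most-significant byte first.
        out.append(code[i:i + n][::-1].hex())
        i += n
    return out


print(split_instructions(bytes.fromhex("00000000")))
print(split_instructions((0xC0001073).to_bytes(4, "little")))
```

With these rules the four zero bytes come back as two 16-bit items, while c0001073 stays a single 32-bit word.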

Bruce Hoult

Nov 29, 2018, 8:11:57 PM

to Andrew Waterman, Alex Bradbury, luism...@lowrisc.org, sw-...@groups.riscv.org

Seeing c.unimp twice in a row isn't awful. You could always special-case the disassembler :-)

Liviu Ionescu

Nov 29, 2018, 11:52:29 PM

to Andrew Waterman, Bruce Hoult, Alex Bradbury, luism...@lowrisc.org, sw-...@groups.riscv.org

> On 30 Nov 2018, at 03:08, Andrew Waterman <wate...@eecs.berkeley.edu> wrote:
>
>
> ... It’s about disassembly more than anything else - 00000000 ...

Sorry if I missed the details, but does this definition affect only the disassembler, or does it also affect the instruction the compiler generates for cases of undefined behaviour?

I ask because the current solution for these cases (EBREAK) is quite convenient: it always breaks to the debugger without the developer having to do anything special. 0x0000, by contrast, would require a proper trap handler and manual inspection of system registers to determine the trap cause, which is definitely more tedious, at least for microcontroller applications.

regards,

Liviu

Andrew Waterman

Nov 30, 2018, 12:16:33 AM

to Liviu Ionescu, Alex Bradbury, Bruce Hoult, luism...@lowrisc.org, sw-...@groups.riscv.org