RiscV assembler - request for new feature

Dave Williams

May 21, 2017, 6:32:16 PM

to RISC-V SW Dev

While working with the RiscV assembler (binutils version 2.28.51) - I discovered a 'feature'. My HDL test bench (Verilog HDL) revealed that the following RiscV base instructions can be invoked without all register operands.

SLT

SLTU

ADD

XOR

OR

AND

SLL

SRL

SRA

The assembler will convert the above instructions to the immediate version of the base instruction if the operands are NOT all registers. I stumbled upon this feature when I made a typo in my assembly source file.

For example, the RiscV assembly command

    slt x10,x11,12

will be assembled as

    slti x10,x11,12

with no warnings or errors thrown. So, if you intended to code

    slt x10,x11,x12

and you did not press the 'x' key for the 'x12' operand, you will get machine code for SLTI. A strict interpretation of the base instruction SLT in the ISA spec indicates that it requires all register operands. Of course, if you intended to use the immediate operand, the processor will execute this instruction seamlessly. The assembler appears to 'infer' that the immediate version was intended, changes the instruction from SLT to SLTI, and writes machine code for SLTI even though the source file used SLT. I tested all of the base instructions listed above and verified this by inspecting the machine code output.
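To make the difference concrete, here is a minimal Python sketch that packs both instruction words by hand. The field layouts are the R-type and I-type formats from the RV32I base ISA; the helper names `encode_r_type` and `encode_i_type` are made up for this illustration and are not part of binutils.

```python
# Field layouts follow the RV32I base ISA: R-type for SLT, I-type for SLTI.
OPCODE_OP = 0b0110011      # register-register ALU operations
OPCODE_OP_IMM = 0b0010011  # register-immediate ALU operations
FUNCT3_SLT = 0b010         # shared by SLT and SLTI


def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack an R-type instruction word (illustrative helper)."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack an I-type instruction word; imm is a signed 12-bit value."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


slt_word = encode_r_type(0, 12, 11, FUNCT3_SLT, 10, OPCODE_OP)    # slt  x10,x11,x12
slti_word = encode_i_type(12, 11, FUNCT3_SLT, 10, OPCODE_OP_IMM)  # slti x10,x11,12

print(f"slt  x10,x11,x12 -> 0x{slt_word:08x}")
print(f"slti x10,x11,12  -> 0x{slti_word:08x}")
print(f"differing bits   -> 0x{slt_word ^ slti_word:08x}")
```

For this particular pair the two words differ only in one opcode bit, because register number 12 and immediate 12 land in the same bit positions. That is part of why the substitution is easy to miss in a listing.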

My proposal is that for RiscV assembler, a command line option be added that would turn off this assembler feature of making these conversions. The existing functionality of the assembler would remain untouched and the assembler would default to the current operation of making these types of conversions. But, a new RiscV command line option would be added that enforces strict operand usage.

So if this new option were in effect, SLT regX,regY,imm or the example

    slt x10,x11,12

would either fail or at least warn that the operands are not technically correct.
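The requested behaviour can be sketched outside the assembler as a small lint pass. The Python below is only an illustration under stated assumptions: `check_strict` is a hypothetical name, only the numeric register names x0..x31 are recognised, and nothing here reflects how gas parses operands.

```python
import re

# Mnemonics from the list above that the base ISA defines with three register operands.
REG_REG_MNEMONICS = {"slt", "sltu", "add", "xor", "or", "and", "sll", "srl", "sra"}
# Accept only the numeric register names x0..x31 to keep the sketch small.
REGISTER = re.compile(r"x([0-9]|[12][0-9]|3[01])")


def check_strict(line):
    """Return None if the line passes the strict check, else a diagnostic string."""
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    if mnemonic.lower() not in REG_REG_MNEMONICS:
        return None  # not covered by this sketch
    for op in operands:
        if not REGISTER.fullmatch(op):
            return f"operand '{op}' of '{mnemonic}' is not a register"
    return None


for source_line in ("slt x10,x11,x12", "slt x10,x11,12", "slti x10,x11,12"):
    print(source_line, "->", check_strict(source_line) or "ok")
```

A pass like this could run over test-bench sources before assembly, which would give the strict check without any change to binutils.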

My motivations for proposing this change are as follows. When implementing and verifying a synthesizable RiscV core, I find it essential, initially, to work as close to the base instructions as possible. Adding more abstractions - even pseudoinstructions - only adds to the difficulty of this task (I am not advocating any changes to the specified pseudoinstructions). Initially I verify instruction decode logic at the machine level and lean heavily on my assembly output listing. When I discovered this assembler feature, I found it to be a distraction and not of much value to me. Basically, I was left wondering: why doesn't the assembler enforce the correct operands for these base instructions?

For those who want to see more details - GitHub riscv/riscv-binutils-gdb #79 contains discussion and executable test cases, as well as a test case revealing a bug in 'objcopy' that should be resolved in #80. I had list file output that contained the mnemonic SLT paired alongside SLTI machine code. Palmer and Andrew jumped on this hitch in 'objcopy', and after I reviewed the diff it looks like this problem should be resolved in the next branch. When I proposed giving the user the option of turning off this feature in the assembler, Palmer suggested I post my new feature request to this list.

This appears to be a carryover feature from MIPS and there may be some legacy reason why the assembler works this way. But speaking as a core implementer, or as an implementer of any executable model of a RiscV core, where I am using assembly to create test stimulus, I see no value in this feature. Initially, my mantra is "Keep-It-Simple". So, I am soliciting feedback and comments.

Dave

Jacob Bachmeyer

May 21, 2017, 6:56:00 PM

to Dave Williams, RISC-V SW Dev

Dave Williams wrote:
> While working with the RiscV assembler (binutils version 2.28.51) - I
> discovered a 'feature'. My HDL test bench (Verilog HDL) revealed that
> the following RiscV base instructions can be invoked without all
> register operands.
>

> [...]

I would suggest a further step in typo-proofing: require immediates to
be prefixed with either # (for decimal) or $ (for hexadecimal).
Register names in RISC-V are systematic enough that a % prefix is
probably unneeded, but might also be advisable.

If backwards compatibility prevents this change in all cases, I propose
adding an "-mstrict-operands" option to the assembler. When
"-mstrict-operands" (or an equivalent ".set strict-operands"?) is in
force, the assembler would require prefix characters on registers and
immediates. CSR names would be treated as symbols, rather than
registers, because they are unambiguous due to the special CSR access
instructions, but referencing a CSR by number would be considered an
immediate. The assembler would always accept prefix characters, but
would only require them if "strict-operands" is in effect.
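As a rough illustration of the prefix rule described above, the following Python sketch classifies a single operand. It is an assumption-laden toy: `classify_operand` and its `strict` flag are invented for this example, only x0..x31 register names are handled, and CSR names are not covered at all.

```python
import re

REGISTER = re.compile(r"x([0-9]|[12][0-9]|3[01])")


def classify_operand(text, strict=False):
    """Classify one operand as ("reg", n) or ("imm", value).

    Prefixes are always accepted; with strict=True they are required.
    """
    if text.startswith("%"):
        body = text[1:]
        if not REGISTER.fullmatch(body):
            raise ValueError(f"bad register name: {text}")
        return ("reg", int(body[1:]))
    if text.startswith("#"):
        return ("imm", int(text[1:], 10))   # '#' marks a decimal immediate
    if text.startswith("$"):
        return ("imm", int(text[1:], 16))   # '$' marks a hexadecimal immediate
    if strict:
        raise ValueError(f"operand '{text}' needs a %, # or $ prefix")
    if REGISTER.fullmatch(text):
        return ("reg", int(text[1:]))
    return ("imm", int(text, 0))


print(classify_operand("%x12", strict=True))
print(classify_operand("#12", strict=True))
print(classify_operand("12"))
```

With `strict=True` a bare `12` is rejected outright, so a dropped `x` can no longer turn a register into an immediate.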

-- Jacob

Michael Clark

May 21, 2017, 7:09:43 PM

to Dave Williams, Jacob Bachmeyer, RISC-V SW Dev

On 22 May 2017, at 10:55 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> Dave Williams wrote:
>> While working with the RiscV assembler (binutils version 2.28.51) - I discovered a 'feature'. My HDL test bench (Verilog HDL) revealed that the following RiscV base instructions can be invoked without all register operands.
>>
>> [...]
>> My proposal is that for RiscV assembler, a command line option be added that would turn off this assembler feature of making these conversions. The existing functionality of the assembler would remain untouched and the assembler would default to the current operation of making these types of conversions. But, a new RiscV command line option would be added that enforces strict operand usage.

I noticed this too. I can’t remember if I emailed about it some time ago. Interestingly gcc emits “add” without the i for the immediate form, so a “canonical mode” would be incompatible with the current GCC port.

I have an experimental assembler on another branch in the binary translator I am working on that has the GCC/gas compatible behaviour:

https://github.com/rv8-io/rv8/blob/assembler-experiment/src/app/rv-asm.cc#L1107-L1116


--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/59221AFD.5080408%40gmail.com.

Bruce Hoult

May 22, 2017, 9:24:31 AM

to Dave Williams, RISC-V SW Dev

This behaviour is inherited from MIPS assembly where "add $2,$4,13" produces the same opcode 2482000d as "addi $2,$4,13" (or "addi v0,a0,13").

Actually, it works elsewhere with gnu as too:

arm: add r0, r0, 13 => f100 000d add.w r0, r0, #13 (omitting the #)

arm64: add w0, w0, 13 => 11003400 add w0, w0, #0xd

I suspect asking RISC-V gnu as to omit this handy and common shortcut is unlikely to gain much traction. Assembly language is a sharp knife, use carefully :-)


Dave Williams

May 22, 2017, 11:23:25 AM

to Bruce Hoult, RISC-V SW Dev

On Mon, May 22, 2017 at 7:24 AM, Bruce Hoult <br...@hoult.org> wrote:

> This behaviour is inherited from MIPS assembly where "add $2,$4,13" produces the same opcode 2482000d as "addi $2,$4,13" (or "addi v0,a0,13").
>
> Actually, it works elsewhere with gnu as too:
>
> arm: add r0, r0, 13 => f100 000d add.w r0, r0, #13 (omitting the #)
> arm64: add w0, w0, 13 => 11003400 add w0, w0, #0xd
>
> I suspect asking RISC-V gnu as to omit this handy and common shortcut is unlikely to gain much traction. Assembly language is a sharp knife, use carefully :-)

Hi Bruce

Yes - and I cut my hand on this 'feature' ;)

Perhaps handy for the software developer, but not too handy for the person implementing/verifying the core. I believe that over many years assembly language has been used less and less for general purpose SW development and more as a verification tool - for HW as well as SW, as when you disassemble your C code to check the implementation. And on the HW side, having the assembler create a shortcut for the SW developer does not necessarily help the person verifying the executable RiscV core. Yes, legacy is important. But the R in RiscV stands for 'reduced'. So with a concise, small instruction set, what value is there in saving the assembly language programmer the burden of using the correct operands? When you are developing for a processor ISA that has a large number of base instructions, this 'feature' may have more value. But I seem to recall MIPS has over 100 base instructions, and doesn't RV32I have fewer than 50?

You could always write a macro if you can't keep your operands straight.

This shortcut has one 'gotcha' that you have to remember. There is no SUBI in RiscV. So, you cannot invoke SUB regX,regY,imm without an error. I realize that SUB is probably not used often - when you can use ADD/ADDI with a 2's complement operand.
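The ADDI workaround can be checked with a little arithmetic. This Python sketch (the `encode_addi` helper is hypothetical, written only for this example) packs `addi a0,a0,-13`, which has the effect a `subi a0,a0,13` would have had, and shows the 12-bit two's complement immediate in the upper bits.

```python
OPCODE_OP_IMM = 0b0010011  # I-type ALU opcode in RV32I
FUNCT3_ADDI = 0b000


def encode_addi(rd, rs1, imm):
    """Pack an ADDI instruction word; imm must fit the signed 12-bit field."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 bits")
    # Masking with 0xFFF yields the two's complement bit pattern for negatives.
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (FUNCT3_ADDI << 12) | (rd << 7) | OPCODE_OP_IMM


A0 = 10  # a0 is the ABI name for x10
word = encode_addi(A0, A0, -13)
print(f"addi a0,a0,-13 -> 0x{word:08x} (imm field 0x{word >> 20:03x})")
```

One asymmetry follows from the field range of -2048 to 2047: subtracting 2048 can be written as `addi` with -2048, but subtracting -2048 cannot, because +2048 does not fit.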

Dave


Bruce Hoult

May 22, 2017, 11:34:10 AM

to Dave Williams, RISC-V SW Dev

Yes, subi does not exist, and you get an error message, as appropriate:

Error: illegal operands `sub a0,a0,13'

Error: unrecognized opcode `subi a0,a0,13'

I'd probably support adding these as aliases for addi rather than removing the add alias for addi :-)


Dave Williams

May 22, 2017, 5:10:29 PM

to RISC-V SW Dev, dave.will...@gmail.com, jcb6...@gmail.com

Yes - great idea on requiring prefix characters for all operands, so that raw numbers cannot be used.
And when I think back - aren't # and $ prefixes required with some of the old Moto processors (e.g. 68K)?

-Dave

Michael Clark

May 22, 2017, 5:42:07 PM

to Dave Williams, RISC-V SW Dev, jcb6...@gmail.com

$ for numeric literals and % for register names are required by current gas (gnu assembler) in the default AT&T mode for x86-64.

The main concern will be compatibility between the compiler backends GCC/Clang/LLVM and binutils mainly because GCC and binutils/gas are maintained separately. I mention Clang/LLVM as they can be configured to either use the host assembler or the builtin integrated assembler.

Just to point out, C semantics interpret tokens starting with [0-9] as numbers and [a-zA-Z] as identifiers, so the current assembler semantics is no different to C semantics. I don’t think it should be changed.

Given the assembler already handles the canonical immediate opcodes, I would suggest the minimum change is actually in GCC and that is for it to emit the “i” version for instructions with immediate operands, then the assembly output would match the specification. This change would not require any change in binutils, rather it would require a GCC change. It’s a nit, but it would make the assembler output match the spec. I suspect LLVM may use the canonical form, as its assembler does not need to accept GCC output.


Michael Clark

May 22, 2017, 5:46:25 PM

to Dave Williams, RISC-V SW Dev, jcb6...@gmail.com

The aliases for immediate versions are unfortunate as they prevent an assembler that reflects on specification metadata, as I tried to do. Even if GCC is changed to emit the “canonical form” for the immediate mode instructions, the aliases will need to be maintained in binutils for some time, perhaps forever.

Jacob Bachmeyer

May 22, 2017, 6:42:35 PM

to Michael Clark, Dave Williams, RISC-V SW Dev

Michael Clark wrote:
> The aliases for immediate versions are unfortunate as it prevents an
> assembler that reflects on specification metadata, as I tried to do.
> Even if GCC is changed to emit the “canonical form” for the immediate
> mode instructions, the aliases will need to be maintained in binutils
> for some time, perhaps forever.

This is why I proposed a "strict-operands" assembler flag to select the
new behavior.

-- Jacob


Dave Williams

May 23, 2017, 11:20:25 AM

to jcb6...@gmail.com, Michael Clark, RISC-V SW Dev

Can someone explain aliases in this context of base instructions and how they are used in binutils/GCC? An overview would be helpful.

-Dave


Michael Clark

May 23, 2017, 11:50:09 AM

to Dave Williams, jcb6...@gmail.com, RISC-V SW Dev

Hi Dave,

- base instructions – the instructions in the Base ISA

- pseudo instructions – the pseudo instructions in the Base ISA that expand to one instruction

- macros – the pseudo instructions in the Base ISA that expand to more than one instruction

- regular aliases – not currently listed in the Base ISA but used internally by GCC

- name overloaded aliases - not currently listed in the Base ISA but used internally by GCC

“pseudo instructions” are synthesised from real instructions using implicit operands. They are listed in the specification in the “pseudo instructions” section. Here is metadata for the main pseudo instructions and descriptions for instructions from the specification:

- https://github.com/rv8-io/rv8/blob/master/meta/pseudos

- https://github.com/rv8-io/rv8/blob/master/meta/opcode-fullnames

e.g. “BNEZ rs1, disp” expands to “BNE rs1, x0, disp” where x0 is the hardwired zero register

Almost all “pseudo instructions" have a 1:1 mapping with base instructions however LA, LI, CALL and TAIL are exceptions. These are what I would describe as “macro instructions”.

Regular aliases are just other names for instructions, pseudo instructions or macros, e.g. “MOVE” is an alias of the “MV” pseudo instruction, which expands to the Base ISA instruction “ADDI rd, rs1, 0”.
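The categories above can be pictured as a small lookup table. The Python sketch below uses only the two expansions quoted in this message (BNEZ and MV) plus the MOVE alias; the table layout and the `expand` function are assumptions made for illustration, not the rv8 metadata format or the binutils opcode table.

```python
# Each entry maps a pseudo instruction to (base mnemonic, operand template).
PSEUDOS = {
    "bnez": ("bne", ["{0}", "x0", "{1}"]),   # bnez rs1, disp -> bne rs1, x0, disp
    "mv":   ("addi", ["{0}", "{1}", "0"]),   # mv rd, rs1     -> addi rd, rs1, 0
}
# A regular alias is just another name for an existing entry.
ALIASES = {"move": "mv"}


def expand(mnemonic, operands):
    """Resolve aliases, then expand a 1:1 pseudo instruction to its base form."""
    mnemonic = ALIASES.get(mnemonic, mnemonic)
    if mnemonic not in PSEUDOS:
        return mnemonic, list(operands)      # already a base instruction
    base, template = PSEUDOS[mnemonic]
    return base, [slot.format(*operands) for slot in template]


print(expand("bnez", ["a0", "loop"]))
print(expand("move", ["a0", "a1"]))
```

Macros such as LA, LI, CALL and TAIL would not fit this table, since they can expand to more than one instruction.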

It seems evident that the register immediate instructions have aliases with the same name as the register register variants. i.e. the register immediate versions have “name overloaded aliases“ with the canonical register register operand versions:

- sll slli
- srl srli
- sra srai
- add addi
- addw addiw
- and andi
- xor xori
- or ori
- slt slti
- sltu sltiu

As far as I can tell, the instructions in the list above are the only ones that are name overloaded, i.e. the same name is used for two different instructions. These are aliases that alias an existing instruction name.
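One way to picture a name overloaded alias is as a mnemonic whose meaning depends on the type of its last operand. The Python sketch below encodes the list above as a table and picks the canonical mnemonic. `canonical_mnemonic` and `is_immediate` are hypothetical helpers, and treating only numeric literals as immediates is a simplification (symbols and expressions are ignored).

```python
# Register-register name -> canonical register-immediate name, from the list above.
OVERLOADED = {
    "sll": "slli", "srl": "srli", "sra": "srai",
    "add": "addi", "addw": "addiw",
    "and": "andi", "xor": "xori", "or": "ori",
    "slt": "slti", "sltu": "sltiu",
}


def is_immediate(operand):
    """Treat anything int() can parse (decimal, 0x.., negative) as an immediate."""
    try:
        int(operand, 0)
        return True
    except ValueError:
        return False


def canonical_mnemonic(mnemonic, operands):
    """Return the spec name for the instruction the assembler would emit."""
    if mnemonic in OVERLOADED and operands and is_immediate(operands[-1]):
        return OVERLOADED[mnemonic]
    return mnemonic


print(canonical_mnemonic("slt", ["x10", "x11", "12"]))
print(canonical_mnemonic("slt", ["x10", "x11", "x12"]))
print(canonical_mnemonic("sub", ["a0", "a0", "13"]))
```

`sub` is deliberately absent from the table, matching the observation earlier in the thread that there is no SUBI to fall back on.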

This is the link Megan provided, which shows the binutils source code, which is authoritative from the perspective of GCC. GCC has metadata for the instructions it emits in riscv.md, and it appears that it uses the name overloaded aliases to select the immediate operand versions of the instructions I’ve listed above.

- https://github.com/riscv/riscv-binutils-gdb/blob/riscv-next/opcodes/riscv-opc.c

Michael.


Dave Williams

May 23, 2017, 1:03:54 PM

to Michael Clark, Jacob Bachmeyer, RISC-V SW Dev

Michael

Thanks for your detailed reply. I feel guilty that I was not more specific in my original question - I am very familiar with base instructions, pseudoinstructions, and macros. I have written my share of C/C++/assembler on various embedded processors over many years. What I do not understand is why binutils is dealing with the "name overloaded aliases" idea. These instructions are paired by their common high level function, e.g. ADD/ADDI are both adding numbers. Why would an assembler need to group instructions by their common high level function?

Hmm... perhaps optimization? GCC can do all manner of optimizations - but an assembler? The only quasi-optimization I can think of for a RISC assembler is adding a NOP under a branch instruction, i.e. in the branch delay slot, which exists due to the pipelined implementation of the core. All the assembler would do is recognize a branch and add the NOP as the next instruction (you can usually turn this on/off with an assembler pragma). Sure, an HLL compiler can utilize this branch delay slot to stuff in another unrelated instruction - and therefore it would have to know how instructions work - but I have not seen that done at the assembler level. Typical thinking is that you drop down into assembler to avoid any HLL optimizations.

Anyway, this alias grouping/pairing - or call it a data structure idea - may be essential for GCC, but I am left to wonder why it is maintained in the low level utilities found in binutils, since it only increases the maintenance effort. Case in point is the recent bug I found in objcopy, which appears to be the result of this alias conversion. Looking at the diff, it looked like SLT aliased SLTI, which in some situations did not work. And the bug was that you had SLTI machine code paired with the SLT mnemonic in the list file.

And now it appears that binutils is constrained severely - there is pushback on changing the assembler because you may have conflicts with the "name overloaded aliases" used in GCC. It may be a huge effort to purge these instruction aliases used in binutils - I'm just trying to understand why they are there.

-Dave


Michael Clark

May 23, 2017, 1:23:47 PM

to Dave Williams, Jacob Bachmeyer, RISC-V SW Dev

Hi Dave,

Okay. No worries. I can fix typos/grammar and reuse my description for the non alias related stuff for an assembly language guide.

Regarding the aliases, one would assume that GCC instruction selection is explicit for register register versus register immediate instructions, and that perhaps for historical reasons the name overloaded versions of the register immediate instructions are used in the GCC instruction metadata.

I think the appropriate step then would be to first make a change in GCC (riscv.md) to use the canonical register immediate instruction names. The current binutils accepts them so this wouldn't cause any problems. Then after some deprecation period the "name overloaded aliases" could potentially be removed. It is a notable difference from the specification, and one would assume the spec is the canon in this case.

Michael


Benjamin Herrenschmidt

May 23, 2017, 6:52:01 PM

to Dave Williams, RISC-V SW Dev

On Mon, 2017-05-22 at 18:34 +0300, Bruce Hoult wrote:
> Yes, subi does not exist, and you get an error message, as appropriate:
>
> Error: illegal operands `sub a0,a0,13'
> Error: unrecognized opcode `subi a0,a0,13'
>
> I'd probably support adding these as aliases for addi rather than removing the add alias for addi :-)

I VERY strongly disagree.

(Bruce Hoult, I had to take you out of the CC list, I'm getting
timeouts trying to reach your domain).

Having maintained an architecture in Linux for years and written more
assembly than I can remember in my life, I have come to despise those
little shortcuts.

The problem is that we do make mistakes. Even the best of us do. The
resulting bugs however can be extremely sneaky and hard to debug, and can
hit you in the field years after you wrote them. They are also hard to
"spot" by reading the assembly.

Call me an old fart, but I have grown over time to be very strongly in
favor of static checking, and that means having *precise* semantics,
and failing at build time whenever possible.

Thus for what it's worth, I cast my vote for at least having an option
to forbid all those fancy aliases, and to use it for at least the Linux
kernel and glibc.

Cheers,
Ben.

Benjamin Herrenschmidt

May 23, 2017, 6:55:00 PM

to Michael Clark, Dave Williams, RISC-V SW Dev, jcb6...@gmail.com

On Tue, 2017-05-23 at 09:41 +1200, Michael Clark wrote:
> Given the assembler already handles the canonical immediate opcodes,
> I would suggest the minimum change is actually in GCC and that is for
> it to emit the “i” version for instructions with immediate operands,
> then the assembly output would match the specification. This change
> would not require any change in binutils, rather it would require a
> GCC change. It’s a nit, but it would make the assembler output match
> the spec. I suspect LLVM may use the canonical form, as its assembler
> does not need to accept GCC output.

I would still strongly advocate changing binutils to (at least
optionally) reject an "add" with an immediate.

See my other message on this. That sort of typo happens and is very,
very hard to debug; we should be as strict as possible in assembly.

Cheers,
Ben.

Michael Clark

May 23, 2017, 8:17:08 PM

to Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev, jcb6...@gmail.com

I believe this is the complete set of register immediate aliases with register register “name overloads”:

- sll slli
- srl srli
- sra srai
- add addi
- addw addiw
- and andi
- xor xori
- or ori
- slt slti
- sltu sltiu

This alias could perhaps be removed too (as “mv” is the canonical pseudo instruction):

- move mv
- jr jalr

I have absolutely no objection to removing these aliases or selectively enabling them via an option (-mdeprecated-aliases), as that makes the assembler precisely match the ISA specification.

There are some other more subtle argument overloads where implicit operands are filled. I believe jal and jalr are accepted without the link register, which is then implicitly set to “ra”.

- jal
- jalr
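The implicit-operand case can be sketched the same way. The Python below fills in the link register when it is omitted, following the description above; `fill_link_register` is a hypothetical helper, and the exact set of operand forms that gas accepts is not verified here.

```python
def fill_link_register(mnemonic, operands):
    """Return the operand list with an implicit link register made explicit."""
    if mnemonic == "jal" and len(operands) == 1:
        # jal offset -> jal ra, offset
        return ["ra", operands[0]]
    if mnemonic == "jalr" and len(operands) == 1:
        # jalr rs -> jalr ra, rs, 0
        return ["ra", operands[0], "0"]
    return list(operands)  # nothing to fill in


print(fill_link_register("jal", ["target"]))
print(fill_link_register("jalr", ["t0"]))
print(fill_link_register("jal", ["x0", "target"]))
```

Unlike the name overloaded aliases, these forms do not change which instruction is emitted, only how many operands the programmer has to write.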

However, as I mentioned earlier, the first change needs to be made in GCC riscv.md as it is emitting the register register aliases with immediate values when it should really be using the “i” suffixed versions. If there is an option in the assembler, then the specs could be modified to add -mdeprecated-aliases in the interim or ideally the metadata should be updated to emit the canonical versions.

It would be nice for the assembler to precisely match the spec!

Michael.

Michael Clark

May 23, 2017, 8:25:14 PM

to Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev, jcb6...@gmail.com

On 24 May 2017, at 12:17 PM, Michael Clark <michae...@mac.com> wrote:

> This alias could perhaps be removed too (as “mv” is the canonical pseudo instruction):
>
> - move mv
> - jr jalr

Sorry, /these aliases/; I noticed “jr” while writing the email and missed editing the preceding context.

Andrew Waterman

May 23, 2017, 8:40:06 PM

to Michael Clark, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev, Jacob Bachmeyer

To maintain backwards compatibility, we (the gas port maintainers)
won't change gas to disable these aliases by default. Optionally
disabling them with -mno-aliases or somesuch is a viable alternative,
but we don't have plans to add that feature.

I understand the argument for not inferring the 'i' forms of the
instructions by default, but RISC-V's gas port is far from alone in
supporting such shortcuts. My preference is to bring the assembler
into compliance by defining those mnemonics in the assembly
programmer's manual. They could still be disabled with -mno-aliases
when they are not desired.

> --
> You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
> To post to this group, send email to sw-...@groups.riscv.org.
> Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.

> To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/04E9C2E1-9F5A-4185-9774-2F32CE006A21%40mac.com.

Benjamin Herrenschmidt

May 23, 2017, 9:14:06 PM

to Andrew Waterman, Michael Clark, Dave Williams, RISC-V SW Dev, Jacob Bachmeyer

On Tue, 2017-05-23 at 17:39 -0700, Andrew Waterman wrote:
> To maintain backwards compatibility, we (the gas port maintainers)
> won't change gas to disable these aliases by default. Optionally
> disabling them with -mno-aliases or somesuch is a viable alternative,
> but we don't have plans to add that feature.

I would strongly recommend that you add such a plan. I understand
the need for backward compatibility for the short and medium terms, but
there is real value in enforcing strong checking.

> I understand the argument for not inferring the 'i' forms of the
> instructions by default, but RISC-V's gas port is far from alone in
> supporting such shortcuts.

Trust my experience here, all those shortcuts will bring you in the
long run is to let the assembler silently assembly typos which will
result if very nasty and hard to find bugs in production code.

> My preference is to bring the assembler
> into compliance by defining those mnemonics in the assembly
> programmer's manual. They could still be disabled with -mno-aliases
> when they are not desired.

I would go further than that. I would deprecate the aliases, i.e. add an
-mno-aliases now, then, maybe in 2 years, make it the default.

With the amount of effort going across the board to build mechanisms
to catch bugs at build time (static checking, gcc plugins, etc.),
this is a rather blatant hole ;-)

Cheers,
Ben.

Benjamin Herrenschmidt

May 23, 2017, 9:20:12 PM

to Andrew Waterman, Michael Clark, Dave Williams, RISC-V SW Dev, Jacob Bachmeyer

On Wed, 2017-05-24 at 11:13 +1000, Benjamin Herrenschmidt wrote:
> Trust my experience here, all those shortcuts will bring you in the
> long run is to let the assembler silently assembly typos which will
> result if very nasty and hard to find bugs in production code.

2 typos in that sentence alone, looks like I need more coffee..

Ben.

Dave Williams

May 23, 2017, 10:07:19 PM

to Benjamin Herrenschmidt, Andrew Waterman, Michael Clark, RISC-V SW Dev, Jacob Bachmeyer

Ben

We are of the same mind. Up until now, this alias feature has been an undocumented feature of the assembler. As mentioned previously, I stumbled into it when I made a typo, and I had to experiment with test cases to determine which instructions have this alias behavior. And don't forget that SUB does not have a corresponding SUBI.

FYI, we had a very similar debate on GitHub riscv/riscv-binutils-gdb #79, and one outcome was that this feature needed to be documented. So, 'sorear' opened an issue on documenting this alias feature in riscv/riscv-isa-manual #60.

I am confused by Andrew's last post: he said there are no plans to add -mno-aliases, but then later said the aliases could still be disabled with -mno-aliases? I would appreciate some clarification.

Dave

Jacob Bachmeyer

May 23, 2017, 11:28:59 PM

to Benjamin Herrenschmidt, Andrew Waterman, Michael Clark, Dave Williams, RISC-V SW Dev

Benjamin Herrenschmidt wrote:
> On Tue, 2017-05-23 at 17:39 -0700, Andrew Waterman wrote:
>
>> To maintain backwards compatibility, we (the gas port maintainers)
>> won't change gas to disable these aliases by default. Optionally
>> disabling them with -mno-aliases or somesuch is a viable alternative,
>> but we don't have plans to add that feature.
>>
>
> I would strongly recommend that you add such a plan. I understand
> the need for backward compatibility for the short and medium terms, but
> there is real value in enforcing strong checking.
>
>
>> I understand the argument for not inferring the 'i' forms of the
>> instructions by default, but RISC-V's gas port is far from alone in
>> supporting such shortcuts.
>>
>
> Trust my experience here, all those shortcuts will bring you in the
> long run is to let the assembler silently assembly typos which will
> result if very nasty and hard to find bugs in production code.
>
>
>> My preference is to bring the assembler
>> into compliance by defining those mnemonics in the assembly
>> programmer's manual. They could still be disabled with -mno-aliases
>> when they are not desired.
>>
>
> I would go further than that. I would deprecate the aliases, i.e. add an
> -mno-aliases now, then, maybe in 2 years, make it the default.
>
> With the amount of effort going across the board to build mechanisms
> to catch bugs at build time (static checking, gcc plugins, etc.),
> this is a rather blatant hole ;-)
>

I still advocate a "strict-operands" option, preferably available as
".option strict-operands" and -mstrict-operands that would require the
use of prefix characters for immediates and possibly registers. This
would at least make accidentally turning a register-register instruction
into a register-immediate instruction harder.

-- Jacob

Michael Clark

May 24, 2017, 12:19:12 AM

to jcb6...@gmail.com, Benjamin Herrenschmidt, Andrew Waterman, Dave Williams, RISC-V SW Dev

It's not an issue if the aliases are not present (-mno-aliases) as all instructions taking an immediate operand have an "i" suffix.

Normalisation of the tools with the spec over a reasonable period of time or documentation of the aliases seem to be the available options.

There is a lot of asm out there already, and most of it is written based on what is documented in the spec rather than on these undocumented aliases, which is why they come as a surprise.

Changing the assembler immediate format at this stage is not a reasonable option.

Samuel Falvo II

May 24, 2017, 12:31:27 AM

to Michael Clark, Jacob Bachmeyer, Benjamin Herrenschmidt, Andrew Waterman, Dave Williams, RISC-V SW Dev

Why not just put some effort into a new assembler, something along the
lines of NASM or FASM, but with RISC-V syntax? I've written my own
assembler in Python that I use for my Kestrel development, and it
wasn't unbearably hard to write. It does not offer the syntax
checking you are looking for (registers are just numeric constants, so
slli x0, x0, x0 is valid syntax since "x0 equ 0" elsewhere in the
source listing), but it is not hard to adapt.

At the end of the day, we must remember that gas is *not* intended for
human consumption. It's a compiler target. For human programming,
one should use an assembler designed with human needs in mind.

I've been thinking about rewriting the assembler in Shen Lisp, taking
advantage of its built-in parser support, and as well perhaps letting
me re-use Shen's support for macros. If I undertook such an endeavor,
I would definitely take the time to alter the syntax a bit to make
finding common errors easier.

--
Samuel A. Falvo II

Andrew Waterman

May 24, 2017, 12:51:48 AM

to Jacob Bachmeyer, Benjamin Herrenschmidt, Michael Clark, Dave Williams, RISC-V SW Dev

I've prototyped the -mstrict-operands option:
https://github.com/riscv/riscv-binutils-gdb/pull/81

(I did not add a .option directive because we are trying to avoid
those in general.)

Jacob Bachmeyer

May 24, 2017, 1:20:14 AM

to Andrew Waterman, Benjamin Herrenschmidt, Michael Clark, Dave Williams, RISC-V SW Dev

Andrew Waterman wrote:
> I've prototyped the -mstrict-operands option:
> https://github.com/riscv/riscv-binutils-gdb/pull/81
>
> (I did not add a .option directive because we are trying to avoid
> those in general.)
>

While that is not the "strict-operands" option I proposed, I agree that
it is a more appropriate meaning for "strict-operands" and revise my
previous proposal to use "operand-prefixes" instead.

Since the prototype "strict-operands" does not actually change the
syntax, an .option directive to enable a source file to document the
intended mode is unneeded for "strict-operands". The "operand-prefixes"
I propose would need an .option directive and I am unsure if it is still
relevant if "strict-operands" is added.

(I still advocate at least optional register and immediate prefixes, but
"strict-operands" seems to solve the worst of the problem. A
one-character typo can no longer convert ADD to ADDI.)
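
To make the difference concrete, here is a minimal Python model of the two behaviors discussed above. It is only a sketch, not the code from the prototype: the function name assemble_operands, the strict parameter, and the restriction to x0..x31 register names are all assumptions made for illustration. The mnemonic pairs come from the alias lists posted in this thread.

```python
import re

# Register-register mnemonics and the immediate forms that the assembler
# silently substitutes (pairs taken from the alias lists in this thread).
IMMEDIATE_FORM = {
    "add": "addi", "addw": "addiw", "and": "andi", "xor": "xori",
    "or": "ori", "slt": "slti", "sltu": "sltiu",
    "sll": "slli", "srl": "srli", "sra": "srai",
}

# Assumption: only the numeric register names x0..x31 are recognized,
# to keep the sketch short (no ABI names such as a0 or sp).
REGISTER = re.compile(r"x(?:[12]?\d|3[01])")


def assemble_operands(line, strict=False):
    """Return the mnemonic that would be encoded for a three-operand line."""
    mnemonic, operands = line.split(None, 1)
    last = operands.split(",")[-1].strip()
    if mnemonic not in IMMEDIATE_FORM or REGISTER.fullmatch(last):
        return mnemonic  # operands already match the mnemonic
    if strict:
        # Hypothetical strict mode: reject instead of guessing.
        raise ValueError(f"{mnemonic}: operand 3 must be a register, got {last!r}")
    return IMMEDIATE_FORM[mnemonic]  # lenient mode: infer the 'i' form
```

With the default settings, assemble_operands("slt x10,x11,12") quietly returns "slti", which is the typo scenario from the start of the thread; with strict=True the same line raises an error instead.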

-- Jacob

Michael Clark

May 24, 2017, 5:27:15 PM

to Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev

Looks good!

I’m not fussed about the option name. -mstrict-operands is semantically correct.

It would be really nice if GCC was also emitting canonical RISC-V assembly. The nice thing is that the assembler accepts the canonical form so it will be a backwards compatible change.

I have a question about the GCC instruction selection metadata. I see expressions like this:

(define_insn "addsi3"
  [(set (match_operand:SI          0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" " r,r")
                 (match_operand:SI 2 "arith_operand"    " r,I")))]
  ""
  { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
  [(set_attr "type" "arith")
   (set_attr "mode" "SI")])

Is “arith_operand” a more general type that matches either a “register_operand” or “immediate_operand”?

It seems then that we would need to split these “arith_operand” definitions into two definitions that specify either “register_operand” or “immediate_operand” instead of “arith_operand”?

(define_insn "addsi3"
  [(set (match_operand:SI          0 "register_operand" "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand" " r,r")
                 (match_operand:SI 2 "register_operand" " r,r")))]
  ""
  { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
  [(set_attr "type" "arith")
   (set_attr "mode" "SI")])

(define_insn "addsi3"
  [(set (match_operand:SI          0 "register_operand"  "=r,r")
        (plus:SI (match_operand:SI 1 "register_operand"  " r,r")
                 (match_operand:SI 2 "immediate_operand" " r,I")))]
  ""
  { return TARGET_64BIT ? "addiw\t%0,%1,%2" : "addi\t%0,%1,%2"; }
  [(set_attr "type" "arith")
   (set_attr "mode" "SI")])

I see “immediate_operand” used in some specs files (though not in riscv.md), and I see “nonimmediate_operand” in riscv.md. In which case would we use “nonimmediate_operand” instead of “register_operand”?

I guess it needs some experimentation to check the codegen.

If someone gives me some guidance, I don’t mind coming up with a patch and testing it with GCC torture.

Andrew Waterman

May 24, 2017, 5:40:57 PM

to Michael Clark, Jacob Bachmeyer, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev

On Wed, May 24, 2017 at 2:27 PM, Michael Clark <michae...@mac.com> wrote:
> Looks good!
>
> I’m not fussed about the option name. -mstrict-operands is semantically correct.
>
> It would be really nice if GCC was also emitting canonical RISC-V assembly. The nice thing is that the assembler accepts the canonical form so it will be a backwards compatible change.
>
> I have a question about the GCC instruction selection metadata. I see expressions like this:
>
> (define_insn "addsi3"
> [(set (match_operand:SI 0 "register_operand" "=r,r")
> (plus:SI (match_operand:SI 1 "register_operand" " r,r")
> (match_operand:SI 2 "arith_operand" " r,I")))]
> ""
> { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
> [(set_attr "type" "arith")
> (set_attr "mode" "SI")
> ])
>
> Is “arith_operand” a more general type that matches either a “register_operand” or “immediate_operand”?

Yeah, see predicates.md.

>
> It seems then that we would need to split these “arith_operand” definitions into two definitions that specify either “register_operand” or “immediate_operand” instead of “arith_operand”?

We certainly do not want to split them up.

You can avoid doing so by adding a mode modifier, i, which prints the
letter 'i' if the corresponding operand is a constant. You'd change
the patterns to things like "add%i2\t%0,%1,%2", where %i2 means "print
the letter 'i' if operand 2 is a constant." The mode modifier is
implemented by adding another case to riscv_print_operand in riscv.c.
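
As a rough illustration of the substitution described above, the following Python sketch models how a template containing %i2 would expand. It is a toy model, not GCC code: expand_template is a hypothetical name, and the convention of representing constants as Python ints and registers as strings is an assumption made for the example.

```python
import re


def expand_template(template, operands):
    """Expand %N and %iN placeholders in an output template.

    operands is a list whose entries are strings (register names)
    or ints (constant operands); this representation is an assumption.
    """
    def substitute(match):
        modifier, index = match.group(1), int(match.group(2))
        operand = operands[index]
        if modifier == "i":
            # %iN: print the letter 'i' only when operand N is a constant.
            return "i" if isinstance(operand, int) else ""
        return str(operand)

    return re.sub(r"%(i?)(\d)", substitute, template)


# One pattern now covers both the register and the immediate alternative.
print(expand_template("add%i2\t%0,%1,%2", ["a0", "a1", "a2"]))
print(expand_template("add%i2\t%0,%1,%2", ["a5", "a5", 11]))
```

For the word-sized variants the letter has to land before the "w" suffix, so in this model the template would be written "add%i2w\t%0,%1,%2", which expands to addiw when operand 2 is a constant.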

>
> (define_insn "addsi3"
> [(set (match_operand:SI 0 "register_operand" "=r,r")
> (plus:SI (match_operand:SI 1 "register_operand" " r,r")
> (match_operand:SI 2 "register_operand" " r,r")))]
> ""
> { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
> [(set_attr "type" "arith")
> (set_attr "mode" "SI")])
>
> (define_insn "addsi3"
> [(set (match_operand:SI 0 "register_operand" "=r,r")
> (plus:SI (match_operand:SI 1 "register_operand" " r,r")
> (match_operand:SI 2 "immediate_operand" " r,I")))]
> ""
> { return TARGET_64BIT ? "addiw\t%0,%1,%2" : "addi\t%0,%1,%2"; }
> [(set_attr "type" "arith")
> (set_attr "mode" "SI")])
>
> I see the use of “immediate_operand” (not in riscv.md) bit in some specs files and I see “nonimmediate_operand” in riscv.md. In which case would we use “nonimmediate_operand” instead of “register_operand”?

nonimmediate_operand also includes memory_operands.

Michael Clark

May 24, 2017, 5:46:07 PM

to Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev

On 25 May 2017, at 9:40 AM, Andrew Waterman <and...@sifive.com> wrote:

> On Wed, May 24, 2017 at 2:27 PM, Michael Clark <michae...@mac.com> wrote:
>> Looks good!
>>
>> I’m not fussed about the option name. -mstrict-operands is semantically correct.
>>
>> It would be really nice if GCC was also emitting canonical RISC-V assembly. The nice thing is that the assembler accepts the canonical form so it will be a backwards compatible change.
>>
>> I have a question about the GCC instruction selection metadata. I see expressions like this:
>>
>> (define_insn "addsi3"
>>   [(set (match_operand:SI          0 "register_operand" "=r,r")
>>         (plus:SI (match_operand:SI 1 "register_operand" " r,r")
>>                  (match_operand:SI 2 "arith_operand"    " r,I")))]
>>   ""
>>   { return TARGET_64BIT ? "addw\t%0,%1,%2" : "add\t%0,%1,%2"; }
>>   [(set_attr "type" "arith")
>>    (set_attr "mode" "SI")])
>>
>> Is “arith_operand” a more general type that matches either a “register_operand” or “immediate_operand”?
>
> Yeah, see predicates.md.
>
>> It seems then that we would need to split these “arith_operand” definitions into two definitions that specify either “register_operand” or “immediate_operand” instead of “arith_operand”?
>
> We certainly do not want to split them up.
>
> You can avoid doing so by adding a mode modifier, i, which prints the
> letter 'i' if the corresponding operand is a constant. You'd change
> the patterns to things like "add%i2\t%0,%1,%2", where %i2 means "print
> the letter 'i' if operand 2 is a constant." The mode modifier is
> implemented by adding another case to riscv_print_operand in riscv.c.

Fair enough. That makes sense.

It’s a matter of knowing how these GCC RTL expressions generate target instructions. If the “i” can be added via substitution using one pattern then it keeps the metadata concise.

Dave Williams

May 24, 2017, 6:35:20 PM

to Michael Clark, Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, RISC-V SW Dev

On Wed, May 24, 2017 at 3:45 PM, Michael Clark <michae...@mac.com> wrote:
> On 25 May 2017, at 9:40 AM, Andrew Waterman <and...@sifive.com> wrote:
>> On Wed, May 24, 2017 at 2:27 PM, Michael Clark <michae...@mac.com> wrote:
>>> Looks good!
>>>
>>> I’m not fussed about the option name. -mstrict-operands is semantically correct.
>>>
>>> It would be really nice if GCC was also emitting canonical RISC-V assembly.

When you say canonical RISC-V assembly, do you mean emitting base instructions only? (I am trying to sync up with your terminology.) When I build with the RiscV GCC C compiler, I see both base instructions and pseudoinstructions in the list file output. Here's a snippet cut from a list file that contains several pseudoinstructions.

fc:   93070000             li   x15,0
100:   13850700             mv   x10,x15
104:   8320c134             lw   x1,844(x2)
108:   03248134             lw   x8,840(x2)
10c:   13010135             addi   x2,x2,848
110:   67800000             ret

Of course the pseudoinstructions like 'mv' will decompose into a base instruction.
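
The decomposition can be checked by hand from the machine code in the listing. The Python sketch below is an illustration under two assumptions: that the hex column shows the four instruction bytes in memory (little-endian) order, which is what makes the words line up with the mnemonics shown, and that only the non-shift OP-IMM instructions need decoding. The function name decode_op_imm is made up for this example.

```python
def decode_op_imm(hex_bytes):
    """Decode one OP-IMM instruction given as listing bytes in memory order."""
    word = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    opcode = word & 0x7F
    if opcode != 0x13:
        raise ValueError(f"not an OP-IMM instruction: opcode {opcode:#x}")
    rd = (word >> 7) & 0x1F
    funct3 = (word >> 12) & 0x7
    rs1 = (word >> 15) & 0x1F
    imm = word >> 20
    if imm & 0x800:  # sign-extend the 12-bit immediate
        imm -= 0x1000
    # funct3 values 1 and 5 are the shift-immediate forms, left out here.
    names = {0: "addi", 2: "slti", 3: "sltiu", 4: "xori", 6: "ori", 7: "andi"}
    if funct3 not in names:
        raise ValueError(f"shift-immediate form not handled: funct3 {funct3}")
    return f"{names[funct3]} x{rd},x{rs1},{imm}"


print(decode_op_imm("93070000"))  # the word listed as 'li x15,0'
print(decode_op_imm("13850700"))  # the word listed as 'mv x10,x15'
```

Under those assumptions, the word listed as mv x10,x15 decodes as addi x10,x15,0 and the one listed as li x15,0 decodes as addi x15,x0,0, i.e. both pseudoinstructions are plain ADDI encodings.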

Dave

Andrew Waterman

May 24, 2017, 6:42:03 PM

to Dave Williams, Michael Clark, Jacob Bachmeyer, Benjamin Herrenschmidt, RISC-V SW Dev

The canonical set includes sanctioned pseudoinstructions (like mv) but
excludes the lazy, unsanctioned ones (like add x0, x0, 1).

Michael Clark

May 24, 2017, 6:42:22 PM

to Dave Williams, Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, RISC-V SW Dev

On 25 May 2017, at 10:35 AM, Dave Williams <dave.will...@gmail.com> wrote:

> On Wed, May 24, 2017 at 3:45 PM, Michael Clark <michae...@mac.com> wrote:
>> On 25 May 2017, at 9:40 AM, Andrew Waterman <and...@sifive.com> wrote:
>>> On Wed, May 24, 2017 at 2:27 PM, Michael Clark <michae...@mac.com> wrote:
>>>> Looks good!
>>>>
>>>> I’m not fussed about the option name. -mstrict-operands is semantically correct.
>>>>
>>>> It would be really nice if GCC was also emitting canonical RISC-V assembly.
>
> When you say canonical RISC-V assembly do you mean emit base instructions only? (I am trying to synch up with your terminology). When I build with the RiscV GCC C compiler - I see base instructions and pseudoinstructions in the list file output. Here's a snippet cut from a list file that contains many pseudoinstructions.
>
> fc:   93070000             li   x15,0
> 100:   13850700             mv   x10,x15
> 104:   8320c134             lw   x1,844(x2)
> 108:   03248134             lw   x8,840(x2)
> 10c:   13010135             addi   x2,x2,848
> 110:   67800000             ret
>
> Of course the pseudoinstructions like 'mv' will decompose into a base instruction.

Canonical refers to the canon, which is the specification, i.e. the RISC-V ISA Manual.

That is objdump output, so you are not seeing what the compiler is emitting. The compiler is using the name-overloaded aliases, which are not documented in the specification, i.e. non-canonical RISC-V assembly.

You need to compile with -S to see what the compiler is emitting, e.g.:

$ cat foo.c
int foo(int a)
{
    return a + 11;
}
$ riscv64-unknown-elf-gcc -S foo.c -o foo.s
$ cat foo.s
        .file   "foo.c"
        .option nopic
        .text
        .align  1
        .globl  foo
        .type   foo, @function
foo:
        add     sp,sp,-32
        sd      s0,24(sp)
        add     s0,sp,32
        mv      a5,a0
        sw      a5,-20(s0)
        lw      a5,-20(s0)
        addw    a5,a5,11   # <-- notice the addw with immediate instead of addiw
        sext.w  a5,a5
        mv      a0,a5
        ld      s0,24(sp)
        add     sp,sp,32
        jr      ra
        .size   foo, .-foo
        .ident  "GCC: (GNU) 7.0.1 20170321 (experimental)"
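
Spotting these lines by eye gets tedious in larger files. A small script along the following lines could flag them in compiler output; this is only a sketch, the name find_immediate_aliases is invented here, and it assumes that an immediate shows up as a literal decimal or hex integer in the last operand (symbolic immediates would be missed).

```python
import re

# Register-register mnemonics that the assembler also accepts with an immediate.
R_TYPE = {"add", "addw", "and", "or", "xor", "slt", "sltu", "sll", "srl", "sra"}

# Assumption: immediates appear as literal decimal or hex integers.
INTEGER = re.compile(r"[-+]?(?:0x[0-9a-fA-F]+|\d+)")


def find_immediate_aliases(asm_text):
    """Return (line number, instruction) pairs that use an R-type mnemonic
    with an integer literal as the last operand."""
    hits = []
    for number, line in enumerate(asm_text.splitlines(), start=1):
        code = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(code) != 2 or code[0] not in R_TYPE:
            continue
        last_operand = code[1].split(",")[-1]
        if INTEGER.fullmatch(last_operand):
            hits.append((number, " ".join(code)))
    return hits
```

Run over the foo.s listing above, this would report the three add lines that adjust sp and s0, plus the addw line.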

Dave Williams

May 24, 2017, 8:16:13 PM

to Michael Clark, Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, RISC-V SW Dev

I did not find a list of canonical instructions in the RiscV ISA Manual v2.2. I searched for 'canonical' and found a few instances, none of which referenced specific instructions; I only found the canonical NOP and canonical NaN references. Sorry, but keep in mind I am not trying to distract from the -mstrict-operands review.

Michael - your code example is excellent, and the 'addw' line really underscores this alias discussion. It further motivates the need to document this feature (nice work - hard to beat the value of a good example). I invoked GCC with the -c flag, which produces ELF output, and I forgot about the -S flag.

Dave

Tommy Murphy

May 28, 2017, 5:41:05 AM

to RISC-V SW Dev, dave.will...@gmail.com

Me too.

This is what I get:

Delivery incomplete

There was a temporary problem delivering your message to br...@hoult.org. Gmail will retry for 44 more hours. You'll be notified if the delivery fails permanently.

The response was:

DNS Error: 4815150 DNS type 'mx' lookup of hoult.org responded with code SERVFAIL

Bruce Hoult

May 28, 2017, 5:51:06 AM

to Tommy Murphy, RISC-V SW Dev, Dave Williams

Yeah :-( The site itself is up if you put it in your hosts file, but the 3rd party DNS has disappeared. I'm looking at how to replace that today -- long time since I had to play with that stuff. I'm reachable at bruce...@gmail.com if anyone cares.

Bruce Hoult

May 28, 2017, 8:24:28 AM

to Tommy Murphy, RISC-V SW Dev, Dave Williams

Ok, appear to have unborked it by going to free DNS service at the name registry instead of my (MIA) friend's server. Sorry for the bounces.

On Sun, May 28, 2017 at 12:51 PM, Bruce Hoult <br...@hoult.org> wrote:

Yeah :-( The site itself is up if you put it in your hosts file, but the 3rd party DNS has disappeared. I'm looking at how to replace that today -- long time since I had to play with that stuff. I'm reachable at bruce...@gmail.com if anyone cares.

On Sun, May 28, 2017 at 12:41 PM, Tommy Murphy <tommy_...@hotmail.com> wrote:

Me too.
This is what I get:
Delivery incomplete
There was a temporary problem delivering your message to br...@hoult.org. Gmail will retry for 44 more hours. You'll be notified if the delivery fails permanently.
The response was:
DNS Error: 4815150 DNS type 'mx' lookup of hoult.org responded with code SERVFAIL

On Tuesday, 23 May 2017 23:52:01 UTC+1, Benjamin Herrenschmidt wrote:
(Bruce Hoult, I had to take you out of the CC list, I'm getting
timeouts trying to reach your domain).

Michael Clark

May 29, 2017, 9:48:49 PM

to Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev

Hi Andrew,

I have tested a patch that changes gcc to emit addi using the approach you suggested:

- https://gist.github.com/michaeljclark/58fb3be4d6da7964d5683ec5e4fc7ed0

It works, but it’s only a test at this stage, as we would need to handle the other instructions. slt and sltu can be changed in the same way as add. However, sll, srl, sra, and, xor and or do not have explicit definitions like the add and slt instructions; rather, they are defined like this:

;; <insn> expands to the name of the insn that implements a particular code.
(define_code_attr insn [(ashift "sll")
                        (ashiftrt "sra")
                        (lshiftrt "srl")
                        (div "div")
                        (mod "rem")
                        (udiv "divu")
                        (umod "remu")
                        (ior "or")
                        (xor "xor")
                        (and "and")
                        (plus "add")
                        (minus "sub")])

Any idea how we would handle slli, srli, srai, andi, xori and ori?

Michael.

Andrew Waterman

May 29, 2017, 10:49:40 PM

to Michael Clark, Jacob Bachmeyer, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev

On Mon, May 29, 2017 at 6:48 PM, Michael Clark <michae...@mac.com> wrote:
> Hi Andrew,
>
> I have tested a patch that changes gcc to emitting addi using the approach
> you suggested:
>
> - https://gist.github.com/michaeljclark/58fb3be4d6da7964d5683ec5e4fc7ed0
>
> It works, but it’s only a test at this stage, as we would need to implement
> the other instructions. slt and sltu can be changed in the same way as add,
> however sll, srl, sra, and, xor and or do not have explicit definitions like
> the add and slt instructions, rather they are defined like this:
>
> ;; <insn> expands to the name of the insn that implements a particular code.
> (define_code_attr insn [(ashift "sll")
> (ashiftrt "sra")
> (lshiftrt "srl")
> (div "div")
> (mod "rem")
> (udiv "divu")
> (umod "remu")
> (ior "or")
> (xor "xor")
> (and "and")
> (plus "add")
> (minus "sub")])
>
>
> Any idea how we would handle slli, srli, srai, andi, xori and ori?

If I understand the issue correctly, the same approach should work.
For example, in the define_insn for andi/xor/ori, line 940, something
like "<insn>%i2\t%0,%1,%2" should do the right thing.

Michael Clark

May 29, 2017, 10:52:29 PM

to Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev

Okay. Got it. I didn’t see that substitution.

I’ll try and spin a complete patch.

Michael Clark

May 30, 2017, 12:39:22 AM

to Andrew Waterman, Jacob Bachmeyer, Benjamin Herrenschmidt, Dave Williams, RISC-V SW Dev

I’ve compiled and tested this change:

- https://github.com/riscv/riscv-gcc/pull/75

It should be complete. I spun a first rev of the change, compiled it, checked the asm output of a few programs, and ran a small benchmark suite compiled with this toolchain. I noticed that the zero-extension pattern (move_type "andi,load") was emitting "and" instead of "andi", so I’ve updated this too. Interestingly, the comments use andi.

(define_insn "zero_extendqi<SUPERQI:mode>2"
  [(set (match_operand:SUPERQI 0 "register_operand"    "=r,r")
        (zero_extend:SUPERQI
          (match_operand:QI 1 "nonimmediate_operand" " r,m")))]
  ""
  "@
   andi\t%0,%1,0xff
   lbu\t%0,%1"
  [(set_attr "move_type" "andi,load")
   (set_attr "mode" "<SUPERQI:MODE>")])

After this change, GCC should be emitting the same instruction names as the ISA manual.

Michael.
