RISC-V Compressed Instruction Set Manual Version 1.9 - C.J, C.JAL Immediate format


Michael Clark

Jan 13, 2016, 4:04:20 AM
to RISC-V ISA Specification Discussion
Hello,

I'm slowly working away on an instruction decoder / disassembler based on the RISC-V specifications and am cross checking it against output from objdump in the riscv-gnu-toolchain. I am through the main ISA and am now working on RVC. I have been having issues with C.J and C.JAL immediate decoding. Just sharing observations as I come across them...

See Section: 1.5 Control Transfer Instructions

    RISC-V Compressed Instruction Set Manual Version 1.7

        C.J, C.JAL imm11  offset[11:6|4:1|5]

    RISC-V Compressed Instruction Set Manual Version 1.9

        C.J, C.JAL imm11  offset[11|4|9:8|10|6|7|3:1|5]

    https://github.com/riscv/riscv-gnu-toolchain (current code)

        C.J, C.JAL imm11  offset[11|7|9:8|5|6|1|10|4:2]


I created the last tabulation by hand by reversing the macro in riscv-gnu-toolchain. Please bear with me, as I may have made mistakes in my analysis. There appears to be a discrepancy: the lowest field extracted by the macro is a range of 3 bits, whereas in the 1.9 spec there is a single bit in the rightmost position with a range of 3 bits to its left. The rest of my tabulation needs cross checking.

I will write some automated tests to cross check my implementation against the riscv header decoder as an independent check. Ideally I would use a template<> that takes an encoding similar to the specification encoding, so that we can verify the spec.


Here is an extract from the binutils source for the RVC CJ format decoder:

    https://github.com/riscv/riscv-gnu-toolchain/blob/master/binutils/include/opcode/riscv.h

#define RV_X(x, s, n) (((x) >> (s)) & ((1 << (n)) - 1))

#define EXTRACT_RVC_J_IMM(x) \
  ((RV_X(x, 3, 3) << 1) | \
   (RV_X(x, 11, 1) << 4) | \
   (RV_X(x, 2, 1) << 5) | \
   (RV_X(x, 7, 1) << 6) | \
   (RV_X(x, 6, 1) << 7) | \
   (RV_X(x, 9, 2) << 8) | \
   (RV_X(x, 8, 1) << 10) | \
   (-RV_X(x, 12, 1) << 11))

As I mentioned, I am attempting to write code from the spec (except in rare cases where I have needed to defer to the code). I am using a single shift and mask approach, and eliminating redundant shifts, which is a little different to the macros in the riscv headers. I have looked at the gcc -O3 assembler output, and the single shift approach produces more compact code. The gcc optimizer can't seem to fold the shifts given the bitwise-and in between. I'm especially interested in fast software decode performance as I plan to use the decoder in a binary translator (RISCV -> RISCV). When I get time I'll write some tests that loop over the full range of immediate values for each of the encodings, profile and cross check my implementation with the riscv implementation, and when I am far enough along I will be happy to share code (BSD or Dual BSD/GPLv2).

Here is how I have represented the CJ immediate decode:

template <typename T, unsigned B>
inline T sign_extend(const T x)
{
    struct {
        T x:B;
    } s;
    return s.x = x;
}

imm11 =
  ((inst >> 2) & 0b10110100000) |
  ((inst << 3) & 0b01000000000) |
  ((inst >> 0) & 0b00001000000) |
  ((inst << 2) & 0b00000010000) |
  ((inst >> 8) & 0b00000001000) |
  ((inst >> 3) & 0b00000000111);

imm11 = sign_extend<signed int,11>(imm11) << 1;
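The cross check described above (looping over every encoding and comparing a single shift decoder against the riscv header macro) could be sketched in C roughly as follows. This is only an illustration: the names cj_imm_single_shift and check_all_encodings are made up for the example, and the masks are written in hex rather than binary. Unlike the snippet above, which takes decoded bit 9 via a left shift by 3 (instruction bit 6), this sketch takes offset[10] from instruction bit 8 (a left shift by 1), since that is what the v1.9 field layout and the macro both appear to indicate.

```c
#include <assert.h>
#include <stdint.h>

#define RV_X(x, s, n) (((x) >> (s)) & ((1 << (n)) - 1))

#define EXTRACT_RVC_J_IMM(x) \
  ((RV_X(x, 3, 3) << 1) | \
   (RV_X(x, 11, 1) << 4) | \
   (RV_X(x, 2, 1) << 5) | \
   (RV_X(x, 7, 1) << 6) | \
   (RV_X(x, 6, 1) << 7) | \
   (RV_X(x, 9, 2) << 8) | \
   (RV_X(x, 8, 1) << 10) | \
   (-RV_X(x, 12, 1) << 11))

/* Hypothetical single shift decoder: builds offset >> 1 in the low 11 bits,
   sign-extends it, then scales by 2. */
static int32_t cj_imm_single_shift(uint32_t inst)
{
    uint32_t imm11 =
        ((inst >> 2) & 0x5a0) |  /* inst[12], inst[10:9], inst[7] -> offset[11], [9:8], [6] */
        ((inst << 1) & 0x200) |  /* inst[8]   -> offset[10]  */
        ((inst >> 0) & 0x040) |  /* inst[6]   -> offset[7]   */
        ((inst << 2) & 0x010) |  /* inst[2]   -> offset[5]   */
        ((inst >> 8) & 0x008) |  /* inst[11]  -> offset[4]   */
        ((inst >> 3) & 0x007);   /* inst[5:3] -> offset[3:1] */

    /* Sign-extend from bit 10 by subtracting 2048 when that bit is set. */
    int32_t v = (int32_t)imm11 - (int32_t)((imm11 & 0x400) << 1);
    return v * 2;
}

/* Compare both decoders over every 16-bit instruction word. The macro is
   fed an unsigned value so the negation and shift stay well defined. */
static void check_all_encodings(void)
{
    for (uint32_t inst = 0; inst <= 0xffff; inst++)
        assert((int32_t)EXTRACT_RVC_J_IMM(inst) == cj_imm_single_shift(inst));
}
```

Calling check_all_encodings() from main() would abort on the first instruction word where the two decoders disagree, which is one way to catch a wrong shift or mask early.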

Just out of curiosity, below is the gcc -O3 x86_64 assembly output for the two approaches. Not a criticism. A disassembler doesn't need the decode performance of a binary translator. In fact CJ format will be quite tough for binary translators! :-) The 1.7 spec encoding was much easier in software :-(

single shift macro (20 instructions)

	mov	eax, edi
	shr	eax, 2
	and	eax, 1440
	mov	ecx, edi
	and	ecx, 64
	lea	esi, [8*rcx]
	lea	r8d, [4*rdi]
	and	r8d, 16
	mov	edx, edi
	shr	edx, 8
	and	edx, 8
	shr	edi, 3
	and	edi, 7
	or	esi, ecx
	or	esi, eax
	or	esi, r8d
	or	esi, edx
	or	esi, edi
	shl	esi, 21
	sar	esi, 20


riscv-gnu-toolchain macro (29 instructions)

	mov	eax, edi
	shr	eax, 2
	and	eax, 14
	mov	ecx, edi
	shr	ecx, 7
	and	ecx, 16
	or	ecx, eax
	lea	eax, [8*rdi]
	and	eax, 32
	or	eax, ecx
	mov	ecx, edi
	shr	ecx
	mov	edx, ecx
	and	edx, 64
	or	eax, edx
	mov	edx, edi
	and	edx, 64
	shl	edx, 1
	or	eax, edx
	and	ecx, 768
	or	eax, ecx
	lea	esi, [4*rdi]
	and	esi, 1024
	or	esi, eax
	shr	edi, 12
	and	edi, 1
	neg	edi
	shl	edi, 11
	or	esi, edi

Regards,
Michael

Michael Clark

Jan 13, 2016, 8:44:24 AM
to Samuel Falvo II, RISC-V ISA Specification Discussion
Thanks. Yes, it's interesting. I can experiment. Using the bitfield sign extension template, gcc decides to do this:


shl	esi, 21
sar	esi, 20

I'll try on clang x86 and with gcc and clang on riscv and see what the code gen does...

I found the bitfield method here: https://graphics.stanford.edu/~seander/bithacks.html#FixedSignExtend

On 13/1/16 10:56 pm, Samuel Falvo II wrote:

Not sure if this helps or is relevant, but here's how I implement sign extension in my emulator. If I want to sign extend x from bit 11, then I'd write x | -(x & 2048). This should expand to at most three instructions (maybe 4 on RISC-V).
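For illustration, the expression x | -(x & 2048) could be wrapped in a small C helper like the one below. The function name sext12 is made up for this sketch, and it assumes the bits of x above bit 11 are already zero.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend x from bit 11 (mask 2048).
   Assumes bits 12 and above of x are zero on entry. */
static int32_t sext12(uint32_t x)
{
    /* (x & 2048) is either 0 or 2048; negating it as an unsigned value
       gives 0 or 0xFFFFF800, which fills every bit from 11 upward. */
    return (int32_t)(x | -(x & 2048u));
}
```

With this helper, sext12(0x7ff) stays 2047 while sext12(0x800) becomes -2048.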

Michael Clark

Jan 13, 2016, 8:57:17 AM
to Samuel Falvo II, RISC-V ISA Specification Discussion
On 14/1/16 2:44 am, Michael Clark wrote:
> Thanks. yes it's interesting. I can experiment. Using the bitfield
> sign extension template gcc decides to do this.
>
> shl esi, 21
> sar esi, 20

Although I had the imm11 in the lower 11 bits and gcc has folded the << 1 (for CJ format) into the shl/sar for the sign extension.
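As a rough C equivalent of that shl 21 / sar 20 pair, a sketch might look like this. The name sext11_times2 is invented here, and the code relies on right shifts of negative signed values being arithmetic, which gcc and clang provide but the C standard leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend an 11-bit value held in the low bits
   and multiply by 2, using one left shift and one right shift. */
static int32_t sext11_times2(uint32_t imm11)
{
    /* Shifting left by 21 moves bit 10 into bit 31 (the sign bit).
       Shifting right by 20 instead of 21 leaves the value scaled by 2. */
    return (int32_t)(imm11 << 21) >> 20;
}
```

Shifting right by one less than the left shift is what lets the compiler absorb the << 1 into the sign extension.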

Andrew Waterman

Jan 14, 2016, 5:20:37 PM
to Michael Clark, RISC-V ISA Specification Discussion
I am confused. By my reading, the spec (v1.9) is consistent with the
macro that you quoted. Spike matches, too.

Michael Clark

Jan 14, 2016, 8:16:46 PM
to Andrew Waterman, isa...@lists.riscv.org

C.J, C.JAL imm11 offset[11|4|9:8|10|6|7|3:1|5]


I see where my confusion is now. It's with my reading of the notation. Based on reading the macro, I was expecting a range of 3 bits in the right hand position, i.e. 0b1110.

Now I understand the convention. I didn't have this problem with the notation in the ISA Spec. It's a little confusing, as I was interpreting the offset in the spec encoding as if it were in the physical bit location, i.e. reading right to left, with bit 5 as the LSB.

It does actually say 3:1; only, in my reading, it was in position 4:2 and bit 5 was in the LSB (rather than the rightmost field carrying bit offset 5).


The Compressed Spec could do with some text on the immediate offset encoding notation (or I have overlooked it). Feedback: I found it much harder to read than the main ISA spec. If I can come up with some concise text describing the notation I will share it.


I follow now and my decoder is working.  Thanks for taking the time.


$ more immj.c
#include <stdio.h>

#define RV_X(x, s, n) (((x) >> (s)) & ((1 << (n)) - 1))

#define EXTRACT_RVC_J_IMM(x) \
  ((RV_X(x, 3, 3) << 1) | \
   (RV_X(x, 11, 1) << 4) | \
   (RV_X(x, 2, 1) << 5) | \
   (RV_X(x, 7, 1) << 6) | \
   (RV_X(x, 6, 1) << 7) | \
   (RV_X(x, 9, 2) << 8) | \
   (RV_X(x, 8, 1) << 10) | \
   (-RV_X(x, 12, 1) << 11))

int main()
{
    printf("%04x\n", (RV_X(0xffff, 3, 3) << 1));
}
$ gcc immj.c -o immj
$ ./immj
000e

Michael Clark

Jan 14, 2016, 9:20:49 PM
to and...@sifive.com, isa...@lists.riscv.org
I mean my mistake was interpreting |5] as bit 5 of the encoded immediate going to bit position 0 of the decoded immediate. The correct reading is that the rightmost field is bit 0 relative to the range the encoding notation labels, i.e. bit 0 + m of the encoded immediate, where m is the starting physical bit offset of that range, and 5 is the bit offset in the decoded immediate.

This is how I would describe the immediate encoding notation:

n          m
[d|a-b|c]

* n and m are the physical bit offsets bounding a range of bits in the encoded immediate
* The numeric values of a, b, c and d are bit offsets within the decoded immediate
* The right to left position represents the bit offset within the encoded immediate, relative to the physical bit range that the encoding labels
* a-b represents a range of contiguous bits within the encoded immediate; its encoded bit offset follows from its right to left position, being the cumulative sum, from right to left and relative to m, of the number of bits in each range (one for an individual bit)

11   ....   8
[3|5-6|4]

* 4 ; bit 8 of the encoded immediate is in bit position 4 of the decoded immediate
* 5-6 ; bits 9 and 10 of the encoded immediate are in bit positions 5 and 6 of the decoded immediate
* 3 ; bit 11 of the encoded immediate is in bit position 3 of the decoded immediate
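The notation described above can also be applied mechanically. Below is a minimal C sketch of that idea, not taken from any existing code: the struct imm_field, the function decode_fields and the two tables are hypothetical names introduced for this example. Fields are listed left to right as they appear in the brackets, each giving the high and low decoded bit offsets (equal for a single bit), and top is the physical bit offset n of the leftmost bit.

```c
#include <assert.h>
#include <stdint.h>

/* One field of the bracket notation: decoded bits hi..lo (hi == lo for a single bit). */
struct imm_field { unsigned hi, lo; };

/* Walk the fields left to right, consuming physical bits downward from `top`. */
static uint32_t decode_fields(uint32_t inst, unsigned top,
                              const struct imm_field *f, unsigned count)
{
    uint32_t imm = 0;
    unsigned pos = top + 1;                 /* one past the next physical bit */
    for (unsigned i = 0; i < count; i++) {
        unsigned width = f[i].hi - f[i].lo + 1;
        pos -= width;                       /* field sits at inst[pos+width-1 : pos] */
        uint32_t bits = (inst >> pos) & ((1u << width) - 1);
        imm |= bits << f[i].lo;             /* place at its decoded bit offset */
    }
    return imm;
}

/* The example above: physical bits 11..8 labelled [3|5-6|4]. */
static const struct imm_field example_fields[] = { {3, 3}, {6, 5}, {4, 4} };

/* CJ format from the v1.9 spec: bits 12..2 labelled [11|4|9:8|10|6|7|3:1|5]. */
static const struct imm_field cj_fields[] = {
    {11, 11}, {4, 4}, {9, 8}, {10, 10}, {6, 6}, {7, 7}, {3, 1}, {5, 5}
};

/* Spot checks matching the three bullet points of the example. */
static void check_example(void)
{
    assert(decode_fields(0x100, 11, example_fields, 3) == 0x10); /* bit 8  -> 4 */
    assert(decode_fields(0x600, 11, example_fields, 3) == 0x60); /* 9,10   -> 5,6 */
    assert(decode_fields(0x800, 11, example_fields, 3) == 0x08); /* bit 11 -> 3 */
}
```

This sketch returns the immediate without sign extension, so for C.J and C.JAL the result would still need to be sign-extended from bit 11.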