It sounds like you are well on your way now. However, there are tables here in C, although some of the decoding is done with switch statements:
I just fixed a bug with reserved encodings (compressed instructions that have non-zero immediate constraints). I have booted RVC Linux using the riscv-meta compression metadata, although that was in the Privileged ISA v1.9.1 timeframe. The boot should not have been affected by this bug because the reserved encodings are not generated by the assembler. They are still useful test cases, because the knock-out depends on the immediate decode, which in turn depends on the opcode or type. I had to add code to translate these encodings to illegal after decompression to the Base ISA opcode.
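To illustrate the ordering (opcode first, then immediate, then the reserved check), here is a minimal sketch in C. It is not the riscv-meta code: the enum tags and the function name are made up for illustration, and for brevity the check is keyed on the compressed opcode rather than on the decompressed Base ISA opcode. It only covers three encodings that the RVC spec marks as reserved when the immediate is zero (C.ADDI4SPN, C.ADDI16SP and C.LUI).

```c
#include <stdint.h>

/* Hypothetical opcode tags, for illustration only (not riscv-meta names). */
typedef enum {
    RVC_ILLEGAL,
    RVC_ADDI4SPN,
    RVC_ADDI16SP,
    RVC_LUI,
    RVC_ADDI
} rvc_op;

/*
 * Knock out reserved encodings. `imm` must already be decoded using the
 * immediate format that belongs to `op`, which is why this check can only
 * run after the opcode (or type) is known.
 */
static rvc_op rvc_knock_out_reserved(rvc_op op, int32_t imm)
{
    switch (op) {
    case RVC_ADDI4SPN:  /* nzuimm == 0 is reserved */
    case RVC_ADDI16SP:  /* nzimm  == 0 is reserved */
    case RVC_LUI:       /* nzimm  == 0 is reserved */
        return imm == 0 ? RVC_ILLEGAL : op;
    default:
        /* No non-zero immediate constraint modelled for this opcode. */
        return op;
    }
}
```

An opcode without such a constraint in this sketch (for example RVC_ADDI) passes through unchanged even when its immediate is zero.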
Also note there is a pseudo lifting stage, which uses another set of constraints to map the Base ISA instruction after decompression to the set of simple single-instruction pseudo-instructions (not call, tail, li or la, as they require a state machine across more than one instruction). A pipeline may have move detection in its rename stage, so a mv uop might be useful (the lifting stage does this for both Base and Compressed).
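A constraint-based lift of this kind could look roughly like the following. This is only a sketch under my own assumptions: the enum tags and the function are hypothetical, and it covers just four of the standard single-instruction pseudo-instructions (nop, mv, not, j).

```c
#include <stdint.h>

/* Hypothetical tags: three Base ISA opcodes and four pseudo-instructions. */
typedef enum {
    PS_ADDI, PS_XORI, PS_JAL,     /* Base ISA opcodes */
    PS_NOP, PS_MV, PS_NOT, PS_J   /* single-instruction pseudos */
} ps_op;

/*
 * Lift a decoded Base ISA instruction to a pseudo-instruction when its
 * operands satisfy the pseudo's constraints; otherwise return it unchanged.
 */
static ps_op lift_pseudo(ps_op op, uint8_t rd, uint8_t rs1, int32_t imm)
{
    switch (op) {
    case PS_ADDI:
        if (rd == 0 && rs1 == 0 && imm == 0)
            return PS_NOP;            /* addi x0, x0, 0  -> nop        */
        if (imm == 0)
            return PS_MV;             /* addi rd, rs, 0  -> mv rd, rs  */
        break;
    case PS_XORI:
        if (imm == -1)
            return PS_NOT;            /* xori rd, rs, -1 -> not rd, rs */
        break;
    case PS_JAL:
        if (rd == 0)
            return PS_J;              /* jal x0, offset  -> j offset   */
        break;
    default:
        break;
    }
    return op;
}
```

The more specific constraint (nop) is tested before the more general one (mv), since both match addi with a zero immediate.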
The interesting case is 0x0000, which has a natural decompression and lift to “mv s0, sp” (inst[1:0]=0b00, funct3=0, op=c.addi4spn, rs2=sp (implicit), rs1=compressed-reg-0 or x8). This case is further obfuscated by the move pattern.
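For anyone who wants to check this by hand, here is a small sketch of a naive C.ADDI4SPN decompression that deliberately has no reserved-encoding check. The struct and function names are my own for this example; the bit positions follow the CIW format in the RVC spec (inst[12:5] holds nzuimm[5:4|9:6|2|3], inst[4:2] holds rd').

```c
#include <stdint.h>

/* Operands of the expanded `addi rd, rs1, imm` (hypothetical struct). */
typedef struct {
    uint8_t  rd;
    uint8_t  rs1;
    uint32_t imm;
} addi_fields;

/*
 * Naively decompress C.ADDI4SPN to `addi rd', x2, nzuimm`.
 * No reserved check is done here, so 0x0000 comes out as
 * `addi x8, x2, 0`, which a pseudo lift would then show as `mv s0, sp`.
 */
static addi_fields decompress_c_addi4spn(uint16_t inst)
{
    addi_fields f;
    f.rd  = (uint8_t)(8 + ((inst >> 2) & 0x7)); /* rd' maps to x8..x15  */
    f.rs1 = 2;                                  /* sp (x2) is implicit  */
    f.imm = ((inst >> 7) & 0x030u)              /* inst[12:11] -> [5:4] */
          | ((inst >> 1) & 0x3c0u)              /* inst[10:7]  -> [9:6] */
          | ((inst >> 4) & 0x004u)              /* inst[6]     -> [2]   */
          | ((inst >> 2) & 0x008u);             /* inst[5]     -> [3]   */
    return f;
}
```

With this, 0x0000 yields rd=8 (s0), rs1=2 (sp), imm=0, while a legal encoding such as 0x1000 yields rd=8, rs1=2, imm=32.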
Executing 0 data would be interesting if any silicon had this bug. Also of note is that the Base ISA and Compressed ISA have different move patterns. Andrew Waterman added a note regarding this as it affects move detection.
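To make the two move patterns concrete: the Base ISA mv pseudo-instruction is addi rd, rs, 0, whereas c.mv expands to add rd, x0, rs2. A move detector therefore has to match both forms. The following is a sketch with hypothetical names, not code from any particular pipeline or from riscv-meta.

```c
#include <stdint.h>

/* Hypothetical tags for the two opcodes involved in register copies. */
typedef enum { MV_ADD, MV_ADDI, MV_OTHER } mv_op;

/*
 * Return the source register if the instruction is a register copy in one
 * of the two canonical forms, or -1 if it is not.
 */
static int move_source(mv_op op, uint8_t rs1, uint8_t rs2, int32_t imm)
{
    if (op == MV_ADDI && imm == 0)
        return rs1;   /* Base ISA form:   addi rd, rs1, 0  */
    if (op == MV_ADD && rs1 == 0)
        return rs2;   /* Compressed form: add  rd, x0, rs2 */
    return -1;
}
```

This sketch only recognises those two canonical forms; other encodings that happen to copy a register (for example add rd, rs1, x0) are left alone.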
I have another repo ‘rv8’ that has an object model for the riscv-meta metadata along with a generator framework for machine generation. Here is a LaTeX PDF generated from the metadata. It’s quite similar to the ISA spec:
I’m curious about your J bug, as I should double-check riscv-meta against your findings. I actually made the same mistake with sign-extended vs zero-extended immediate forms, because the Base ISA immediates are all sign-extended. Since then Andrew Waterman and others have revised this area of the spec to be more precise. We could move to a notation that is completely unambiguous, as this was a slight usability issue for the spec.
It might be nice to have a sign-extend(x) / zero-extend(x) style notation, although it depends on what will fit in the tables. In my own notation I use sx(x) and ux(x) for sign-extended and zero-extended respectively. The immediate names for some of the opcodes could be somewhat improved, as z has different potential interpretations. In fact, with the generator it’s fairly easy to make these sorts of changes safely once the metadata has been verified.
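As a concrete reading of that notation, here is how sx and ux might be written as C helpers. The explicit width parameter is my addition for this sketch (in the notation the width is implied by the field), and both helpers assume 1 <= bits <= 63.

```c
#include <stdint.h>

/* ux(x): zero-extend the low `bits` bits of x. Assumes 1 <= bits <= 63. */
static uint64_t ux(uint64_t x, unsigned bits)
{
    return x & ((UINT64_C(1) << bits) - 1);
}

/* sx(x): sign-extend the low `bits` bits of x. Assumes 1 <= bits <= 63. */
static int64_t sx(uint64_t x, unsigned bits)
{
    uint64_t sign = UINT64_C(1) << (bits - 1);  /* position of the sign bit */
    uint64_t v = x & ((sign << 1) - 1);         /* keep only the field bits */
    /* Flip the sign bit, then subtract it; this propagates it upwards. */
    return (int64_t)((v ^ sign) - sign);
}
```

For a 6-bit field holding 0x3f, ux gives 63 and sx gives -1, which is exactly the difference between a zero-extended immediate such as the one in c.addi4spn and a sign-extended one such as the one in c.addi.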
Indeed, there is the potential to create new formats of the docs and diagrams using PGF/TikZ. I must review the PDF to see if there is anything we could improve.