Hi,
a big update:
I've cleaned it up a lot, removed the bits relating to the "study" document, converted everything to LaTeX,
and moved most of the things that don't need to be part of the "spec" part to a separate chapter.
I've also added a temporary encoding using NSE (non-standard extension) opcode space so that
implementers and compiler writers can target XBitmanip for further evaluation of the extension.
Furthermore, I've added the GREV instruction, because on multiple occasions when discussing the
extension with people over the last months I discovered use cases where GREVI is insufficient
(or where one would need a jump table to 32 different GREVI instructions), and having a proper GREV
instruction is actually not very expensive.
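
To illustrate the difference, here is a minimal Python model of a 32-bit generalized reverse. This is
only a sketch: the function name grev32 is made up for this mail, and it assumes the usual butterfly
definition, where each bit of the 5-bit control value enables one swap stage. With GREVI the control
value k is fixed in the instruction; with GREV it comes from a register, so one instruction covers all
32 modes at run time.

```python
def grev32(x, k):
    """Model of a 32-bit generalized reverse; k is the 5-bit control value."""
    x &= 0xFFFFFFFF
    if k & 1:   # swap adjacent bits
        x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1)
    if k & 2:   # swap adjacent bit pairs
        x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2)
    if k & 4:   # swap adjacent nibbles
        x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4)
    if k & 8:   # swap adjacent bytes
        x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)
    if k & 16:  # swap the two halfwords
        x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16)
    return x


# k = 31 is a full bit reversal, k = 24 is a byte swap.
print(hex(grev32(0x00000001, 31)))
print(hex(grev32(0x12345678, 24)))
```

Without GREV, software that gets k at run time would have to dispatch to one of 32 GREVI
instructions, which is the jump table mentioned above.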
I've also removed the immediate from CLZ. (I'm actually unsure if that was only there as a result of an
editing error. This one instruction would have used up an encoding space 8x larger than all the rest of
XBitmanip combined!) CLZ+ADDI should simply be a two-instruction sequence that can be fused if an
implementation wants to squeeze out that extra bit of performance.
I've added compressed c.not, c.neg, c.clz, and c.pcnt for implementations that already have C support.
It fits nicely in the brownfield in c.lui/c.addi16sp (i.e. no fragmentation of the opcode space) and only
uses 0.13% of the RVC opcode space.
The main motivation here is that those instructions occur again and again in sequences that are
prime candidates for macro-op fusion on higher-end processors, and keeping those sequences short
is key to making macro-op fusion work (because otherwise we'll starve the pipeline because we can't
fetch new instructions fast enough). For example x86 "bsr" is just the 64-bit sequence "c.clz, c.neg,
addi" on RISC-V (two 16-bit compressed instructions plus one 32-bit addi).
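
A small Python model of that three-instruction sequence, as a sketch only: XLEN = 32 and the helper
names are assumptions made for this mail, and clz is assumed to return XLEN for a zero input. The
result for zero (-1 here) is only an artifact of the model, since x86 leaves the bsr destination
undefined in that case.

```python
XLEN = 32  # assumed register width for this sketch


def clz(x):
    """Count leading zeros of an XLEN-bit value (returns XLEN for x == 0)."""
    x &= (1 << XLEN) - 1
    return XLEN - x.bit_length()


def bsr(x):
    """Index of the highest set bit, computed as clz, neg, addi."""
    t = clz(x)             # c.clz
    t = -t                 # c.neg
    return t + (XLEN - 1)  # addi with immediate XLEN-1


print(bsr(1), bsr(0x00010000), bsr(0x80000000))
```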
regards,
- clifford