Another approach is sharing the GREV functional unit for bit reverse and byte swap with the immediate multiplexed in hardware. The area is similar and opcode space could be conserved.
That is, substitute BREV for GREV and add BSWAP to this block diagram, then multiplex the GREV immediate depending on the function (CLZ/BSWAP/BREV):
https://github.com/rv8-io/rv8/blob/master/doc/src/bitmanip.png
The question is how often GREVI will be used where k is not 31 or 24. I guess it could exist as a distinct instruction even if there are opcodes for BREV and BSWAP (which could also be pseudo-instructions for the GREVI values 31 and 24 respectively). My big question is: do we make CLZ efficient, given its popularity, assuming we can re-use the functional units? The equivalent machine code sequence is too complex to macro-fuse given the need for a temporary register. I do hope we have CLZ/CTZ instructions and not 5 cycle instruction sequences.
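To make the k values concrete, here is a minimal C sketch of a 32-bit generalized reverse, assuming the usual butterfly formulation where each bit of k enables one swap stage. The function names (`grev32`, `brev32`, `bswap32`, `ctz32`, `clz32_via_brev`) are mine, not from any spec, and the CLZ-from-BREV-plus-CTZ composition is only my reading of how the shared unit could be used, not a description of the block diagram.

```c
#include <stdint.h>

/* Generalized reverse on a 32-bit word (illustrative sketch).
   Each set bit in k enables one stage that swaps adjacent groups
   of 1, 2, 4, 8 or 16 bits. */
static inline uint32_t grev32(uint32_t x, unsigned k)
{
    if (k & 1)  x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    if (k & 2)  x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    if (k & 4)  x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    if (k & 8)  x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    if (k & 16) x = (x << 16) | (x >> 16);
    return x;
}

/* k = 31 enables every stage: full bit reverse. */
static inline uint32_t brev32(uint32_t x)  { return grev32(x, 31); }

/* k = 24 enables only the 8-bit and 16-bit stages: byte swap. */
static inline uint32_t bswap32(uint32_t x) { return grev32(x, 24); }

/* Portable count-trailing-zeros, returning 32 for a zero input. */
static inline unsigned ctz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
}

/* Assumed composition: leading zeros of x are trailing zeros of its reverse. */
static inline unsigned clz32_via_brev(uint32_t x) { return ctz32(brev32(x)); }
```

Any other k gives a partial reverse, for example k = 7 reverses the bits within each byte, which is the kind of use GREVI would need to justify itself as a distinct instruction.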
However, we shouldn’t be presupposing implementation approaches, but at the same time they should be considered.
The first 9 instructions in the set below map precisely to compiler builtins and will make a lot of existing code faster. With the exception of the last 3, all of them have builtins or lifts.
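As a rough illustration of the builtins and lifts meant here, the sketch below uses GCC/Clang builtins and the common rotate idiom that compilers recognise. It assumes a 32-bit `unsigned int`; the wrapper names are hypothetical, and the zero checks are there because `__builtin_clz` and `__builtin_ctz` are undefined for a zero argument. Bit reverse is left out because builtin support for it varies between compilers.

```c
#include <stdint.h>

/* Rotate idioms that compilers typically lift to a rotate instruction. */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << (n & 31)) | (x >> (-n & 31));
}

static inline uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> (n & 31)) | (x << (-n & 31));
}

/* Count leading zeros; the builtin is undefined for 0, so handle it here. */
static inline unsigned clz32(uint32_t x)
{
    return x ? (unsigned)__builtin_clz(x) : 32u;
}

/* Count trailing zeros, same caveat for 0. */
static inline unsigned ctz32(uint32_t x)
{
    return x ? (unsigned)__builtin_ctz(x) : 32u;
}

/* Population count. */
static inline unsigned popcnt32(uint32_t x)
{
    return (unsigned)__builtin_popcount(x);
}

/* Byte swap. */
static inline uint32_t bswap32(uint32_t x)
{
    return __builtin_bswap32(x);
}
```

Without dedicated instructions, each of these wrappers expands to a multi-instruction sequence on the target, which is the cost the candidate list is trying to remove.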
## Candidate instructions for B extension (Bit Manipulation)
| Opcode | Long Name | Description |
|:--|:--|:--|
| RLL[W] rd,rs1,rs2 | Rotate Left Logical | Rotate bits in rs1 left by the amount in rs2 |
| RRL[W] rd,rs1,rs2 | Rotate Right Logical | Rotate bits in rs1 right by the amount in rs2 |
| RLLI[W] rd,rs1,shamt | Rotate Left Logical Immediate | Rotate bits in rs1 left by the immediate |
| RRLI[W] rd,rs1,shamt | Rotate Right Logical Immediate | Rotate bits in rs1 right by the immediate |
| BCLZ[W] rd,rs1 | Bit Count Leading Zeros | Count leading zero bits in rs1 |
| BCTZ[W] rd,rs1 | Bit Count Trailing Zeros | Count trailing zero bits in rs1 |
| BCNT[W] rd,rs1 | Bit Count | Count number of bits set in rs1 |
| BREV[W] rd,rs1 | Bit Reverse | Reverse bits in rs1 |
| BSWAP[W] rd,rs1 | Byte Swap | Swap byte order in rs1 |
| BEXT[W] rd,rs1,rs2 | Bit Extract | Gather LSB justified bits to rd from rs1 using extract mask in rs2 |
| BDEP[W] rd,rs1,rs2 | Bit Deposit | Scatter LSB justified bits from rs1 to rd using deposit mask in rs2 |
| GREVI[W] rd,rs1,imm7 | Generalized Reverse | Generalized Reverse of bits in rs1 according to mask in imm7 |
One could possibly remove a rotate direction at the expense of a subtract for the non-immediate form (which is certainly less frequent), but this list is already pretty minimal compared to the large variety of rarely used bit manipulation instructions that are out there on other ISAs.
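The rotate trade-off can be sketched in C as follows, assuming a 32-bit word and that only a rotate left is available in register form. The names `rol32` and `ror32_via_rol` are illustrative only.

```c
#include <stdint.h>

/* Stand-in for a register-form rotate-left instruction. */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32u - n) & 31));
}

/* Rotate right by n expressed as rotate left by (32 - n) mod 32.
   The subtraction is the extra instruction paid for dropping the
   second rotate direction. */
static inline uint32_t ror32_via_rol(uint32_t x, unsigned n)
{
    return rol32(x, (32u - (n & 31)) & 31);
}
```

For the immediate form the assembler can fold the subtraction into the encoded shift amount, so the extra instruction only appears when the rotate amount is in a register.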
I’m mostly interested in the first 9 as a lot of code is already using the builtins/lifts…
The only field extract is BEXT. Field extract instructions all suffer from the same problem, which is the limited space for an immediate. BEXT/BDEP are most useful in loops where the masks are loaded into registers.
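For reference, here is a minimal bit-serial C sketch of what BEXT and BDEP compute on 32-bit values, following the gather/scatter descriptions in the table. The function names and the loop formulation are my own and say nothing about how hardware would implement them.

```c
#include <stdint.h>

/* Gather: collect the bits of src selected by mask into the low bits of the result. */
static inline uint32_t bext32(uint32_t src, uint32_t mask)
{
    uint32_t out = 0, pos = 1;
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);  /* isolate lowest set mask bit */
        if (src & lowest) out |= pos;
        pos <<= 1;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return out;
}

/* Scatter: place the low bits of src at the positions selected by mask. */
static inline uint32_t bdep32(uint32_t src, uint32_t mask)
{
    uint32_t out = 0, pos = 1;
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);
        if (src & pos) out |= lowest;
        pos <<= 1;
        mask &= mask - 1;
    }
    return out;
}
```

For example, `bext32(word, 0x00FF0000)` pulls out an 8-bit field and `bdep32(field, 0x00FF0000)` puts it back. In a loop the mask constant is loaded into a register once and reused on every iteration, which is where the register-mask form pays off.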