Hi,
I have now added a chapter to the spec proposing an alternative to BEXT/BDEP. It's chapter 3 in the current draft:
In a nutshell, this proposes 5 (RV32) or 6 (RV64) grevmN instructions that can be used to efficiently build butterfly
and Beneš networks and that can share almost all of their hardware resources with GREV.
It also proposes a ZIP instruction and its inverse, UNZIP. In the example applications I experimented with, this is the
most common operation that would benefit from BEXT/BDEP, and it is extremely simple to implement as a dedicated
instruction. Combined with grevmN, the ZIP and UNZIP instructions can also be used to efficiently build omega and
flip networks.
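As an illustration of what ZIP and UNZIP compute, the following C sketch implements a 32-bit perfect outer shuffle and its inverse using the well-known shift-and-mask network from Hacker's Delight. It assumes (this is an assumption, not necessarily the draft's exact definition) that ZIP moves bit i of the lower half to bit 2i and bit i of the upper half to bit 2i+1. The names zip32 and unzip32 are hypothetical:

```c
#include <stdint.h>

/* Perfect outer shuffle: lower-half bit i -> bit 2i, upper-half bit i -> bit 2i+1.
 * Each line swaps the two middle quarters of every 32-, 16-, 8- and 4-bit block. */
uint32_t zip32(uint32_t x)
{
    x = (x & 0xFF0000FFu) | ((x << 8) & 0x00FF0000u) | ((x >> 8) & 0x0000FF00u);
    x = (x & 0xF00FF00Fu) | ((x << 4) & 0x0F000F00u) | ((x >> 4) & 0x00F000F0u);
    x = (x & 0xC3C3C3C3u) | ((x << 2) & 0x30303030u) | ((x >> 2) & 0x0C0C0C0Cu);
    x = (x & 0x99999999u) | ((x << 1) & 0x44444444u) | ((x >> 1) & 0x22222222u);
    return x;
}

/* Inverse shuffle: every step above is its own inverse, so apply them in reverse order. */
uint32_t unzip32(uint32_t x)
{
    x = (x & 0x99999999u) | ((x << 1) & 0x44444444u) | ((x >> 1) & 0x22222222u);
    x = (x & 0xC3C3C3C3u) | ((x << 2) & 0x30303030u) | ((x >> 2) & 0x0C0C0C0Cu);
    x = (x & 0xF00FF00Fu) | ((x << 4) & 0x0F000F00u) | ((x >> 4) & 0x00F000F0u);
    x = (x & 0xFF0000FFu) | ((x << 8) & 0x00FF0000u) | ((x >> 8) & 0x0000FF00u);
    return x;
}
```

For example, zip32(0x0000FFFF) yields 0x55555555: the set lower-half bits land on the even positions. A dedicated instruction replaces these four shift-and-mask steps with a fixed wire permutation.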
This proposal would not get rid of the BEXT/BDEP instruction encodings. Instead, it would make BEXT/BDEP an
optional feature that may be emulated in software, implemented in microcode, or implemented using another slow
multi-cycle method. An optimizing compiler would know how fast BEXT/BDEP is on the target hardware and act
accordingly (similar to deciding whether to use MUL or DIV by a constant, or to perform the same operation using a
longer sequence of much simpler operations).
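For reference, a straightforward software emulation of BEXT (gather the bits of x selected by mask into the low end of the result) and BDEP (scatter the low bits of x into the positions selected by mask) could look like the sketch below. The names bext32 and bdep32 are hypothetical; the loop runs once per set bit in the mask, which gives a feel for the cost of a slow fallback:

```c
#include <stdint.h>

/* Bit extract: the k-th lowest set bit of mask selects which bit of x goes to result bit k. */
uint32_t bext32(uint32_t x, uint32_t mask)
{
    uint32_t r = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        if (x & mask & -mask)   /* mask & -mask isolates the lowest set bit of mask */
            r |= bit;
        mask &= mask - 1;       /* clear that lowest set bit */
    }
    return r;
}

/* Bit deposit: result bit at the k-th lowest set bit of mask is bit k of x. */
uint32_t bdep32(uint32_t x, uint32_t mask)
{
    uint32_t r = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        if (x & bit)
            r |= mask & -mask;
        mask &= mask - 1;
    }
    return r;
}
```

For example, bext32(0xF0F0, 0xFF00) returns 0xF0 and bdep32(0xF0, 0xFF00) returns 0xF000.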
Many of the applications that would require a very fast BEXT/BDEP implementation would instead be covered by the new
instructions, leaving BEXT/BDEP for the genuinely hard cases where even a multi-cycle implementation of BEXT/BDEP
would be a great advantage. The proposal gives an example of such an application (calculating the index of the Nth
set bit in a word).
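One common way to compute the index of the Nth set bit with BDEP is to deposit the one-hot value 1 << N into the set-bit positions of the word and then count trailing zeros. The C sketch below shows that idea; it is not necessarily the exact formulation used in the draft. N is assumed to be 0-based, the names nth_set_bit and bdep32 are hypothetical, and bdep32 is a simple software stand-in for the instruction:

```c
#include <stdint.h>

/* Software stand-in for BDEP: scatter low bits of x into the set positions of mask. */
static uint32_t bdep32(uint32_t x, uint32_t mask)
{
    uint32_t r = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        if (x & bit)
            r |= mask & -mask;  /* lowest remaining set bit of mask */
        mask &= mask - 1;
    }
    return r;
}

/* Index of the n-th (0-based) set bit of x, or -1 if x has n or fewer set bits. */
int nth_set_bit(uint32_t x, unsigned n)
{
    if (n >= 32)
        return -1;
    uint32_t onehot = bdep32(1u << n, x);  /* keeps only the n-th set bit of x */
    if (onehot == 0)
        return -1;
    int idx = 0;
    while ((onehot & 1u) == 0) {           /* count trailing zeros */
        onehot >>= 1;
        idx++;
    }
    return idx;
}
```

For x = 0x16 (binary 10110), n = 0, 1 and 2 give indices 1, 2 and 4; with a hardware BDEP and CTZ this is just two instructions.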
Regarding encoding space, this would add 5 or 6 R-type opcodes for the grevmN instructions and two more unary
opcodes (for ZIP and UNZIP) that could be encoded similarly to CLZ and CLZW. On RV32, this would increase the number
of required code points by approximately 40%.
Please have a look at the proposal and let me know what you think.
regards,
- clifford