So this makes no sense. One of those (integer multiply) is incredibly important, and a fast implementation simplifies so much code; the other (div/rem) should basically be software-only because it's both so complicated and so rarely used, let alone in the hot path.
Note that every single div-by-constant can always be implemented as a wide multiply (plus a shift), which is another reason for my suggestion: I strongly think mul should be core. If not, then at least break up the addition so it's clearer that you can implement mul without also having to implement div/rem.
-Peter
A minimal standalone 32-bit hardware divide/remainder unit is very
simple: fewer than 100 flops. Implementation tricks can further cut the
additional cost, depending on your existing microarchitecture.
Even without hardware support, trap and emulate ensures you stay
compatible with the ABI.
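
[A minimal sketch of the emulation half of trap-and-emulate, assuming an
RV32 illegal-instruction handler that has already saved the integer
registers into an array. The function name emulate_divrem and the regs[]
layout are hypothetical; the field decoding and the divide-by-zero and
overflow results follow the M extension encoding. On a core with no
divider, the C / and % operators below would themselves compile to
library calls.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: emulate one RV32M DIV/DIVU/REM/REMU instruction.
   regs[] holds the trapped context's x0..x31.
   Returns false if insn is not one of those four instructions. */
static bool emulate_divrem(uint32_t insn, uint32_t regs[32])
{
    uint32_t opcode = insn & 0x7f;
    uint32_t rd     = (insn >> 7) & 0x1f;
    uint32_t funct3 = (insn >> 12) & 0x7;
    uint32_t rs1    = (insn >> 15) & 0x1f;
    uint32_t rs2    = (insn >> 20) & 0x1f;
    uint32_t funct7 = insn >> 25;

    /* OP opcode, MULDIV funct7, funct3 4..7 selects div/divu/rem/remu. */
    if (opcode != 0x33 || funct7 != 0x01 || funct3 < 4)
        return false;

    uint32_t a = regs[rs1], b = regs[rs2], result;
    /* Signed overflow case: most negative value divided by -1. */
    bool overflow = (a == 0x80000000u) && (b == 0xffffffffu);

    switch (funct3) {
    case 4: /* DIV */
        if (b == 0)        result = 0xffffffffu;
        else if (overflow) result = a;
        else               result = (uint32_t)((int32_t)a / (int32_t)b);
        break;
    case 5: /* DIVU */
        result = b ? a / b : 0xffffffffu;
        break;
    case 6: /* REM */
        if (b == 0)        result = a;
        else if (overflow) result = 0;
        else               result = (uint32_t)((int32_t)a % (int32_t)b);
        break;
    default: /* 7: REMU */
        result = b ? a % b : a;
        break;
    }

    if (rd != 0)          /* x0 stays hardwired to zero */
        regs[rd] = result;
    return true;
}
```

[A real handler would also advance the saved pc by 4 before returning.]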
The compiler will usually translate simple div-by-constant to other
sequences anyway, as divide is usually much costlier.
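
[To illustrate the kind of sequence meant here, a sketch of unsigned
32-bit division by the constant 10 using one widening multiply and a
shift. The names div10 and rem10 are made up for this example; the
constant 0xCCCCCCCD is ceil(2^35 / 10). Other divisors need their own
constant and shift, and some need an extra fix-up step.]

```c
#include <stdint.h>

/* x / 10 for any 32-bit unsigned x, with no divide instruction.
   The 64-bit product is what a MUL/MULHU pair produces on RV32. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* x % 10, recovered from the quotient with one more multiply. */
static uint32_t rem10(uint32_t x)
{
    return x - div10(x) * 10u;
}
```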
We didn't want to make multiply/divide part of base, as neither is
needed in many low-end applications, or if you're building an
accelerator that has multiply/divide superpowers.
Add to that the desire to avoid too many ABIs, and that's why we made
divide a standard part of the M extension.
There's nothing to prevent you defining your own ABI with hardware
multiply but software divide, but I'd recommend against that if you
want to run a substantial amount of software.
Krste
Would it be possible to create a compiler flag that inserts software routines for division but hardware instructions for multiplication? A function call is much cheaper than an illegal instruction exception. Or can someone point me in the right direction to fork riscv-gnu-toolchain?
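
[For reference, a sketch of what such a software routine might look like: a plain shift-and-subtract unsigned divide that produces one quotient bit per iteration. The name soft_udivrem32 is hypothetical and this is not the toolchain's actual library code. The divide-by-zero result is chosen here to match what the M extension's DIVU/REMU return. GCC's RISC-V options include -mno-div, documented as disabling use of the hardware integer division instructions, which looks like the flag being asked for; check it against your toolchain version.]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software divide: returns n / d and, if rem is non-NULL,
   stores n % d through it. */
static uint32_t soft_udivrem32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;

    if (d == 0) {                 /* mirror DIVU/REMU: all ones, dividend */
        if (rem != NULL)
            *rem = n;
        return 0xffffffffu;
    }

    for (int i = 31; i >= 0; i--) {
        /* Bring down the next dividend bit. r stays below 2^31 before
           this shift, so the shift cannot overflow. */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {             /* divisor fits: subtract, set quotient bit */
            r -= d;
            q |= 1u << i;
        }
    }

    if (rem != NULL)
        *rem = r;
    return q;
}
```

[Called directly, this costs a function call and up to 32 loop iterations, with no trap entry or instruction decode.]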
I realize FPGAs have hardware support for multiply and not division,
but these other solutions don't seem that much better in the FPGA
context.
Krste