I have been wondering: if I want to add hardware support for an operation, say a matrix transformation or something similar, for which RISC-V has no dedicated instruction, how can I perform that operation efficiently?
I am trying to build an application-specific processor based on the RISC-V ISA for an ML application, and the goal is to minimize the time that row operations take compared with a general-purpose processor.
Any suggestions or resources would be highly appreciated.