Using RISC-V for double-precision matrix operations

Muhammad Ali Akhtar

Feb 13, 2018, 9:07:04 AM

to RISC-V HW Dev

Hello All,

I want to propose RISC-V to my boss as an alternative for our application.

Our current solution is discrete FPGA + DSP.

Soft processors (Nios II / MicroBlaze) are too slow for our application.

The Arm Cortex-A9 (as in the Zynq) is good but expensive.

We need to perform inversion, squaring, and multiplication of three 70x70 matrices in approximately 5 ms. All values are double-precision floating point.

Any idea which RISC-V variant (if any) can do the job?

FPGA area is not the problem at the moment.

Thanks and regards.

--

Muhammad Ali Akhtar
Principal Design Engineer
http://www.linkedin.com/in/muhammadakhtar

高野茂幸

Feb 14, 2018, 2:29:45 AM

to Muhammad Ali Akhtar, RISC-V HW Dev

Hi,

Is power consumption an issue for you or not?
If it is not, a high clock frequency with an ordinary program would help. Also keep in mind that the processor is a load/store architecture with sequential execution, so you cannot expect the parallel performance of a dedicated implementation in an FPGA or ASIC.

Write the program, compile it, and look at the assembly code; then you can assess how many cycles are needed for your data size. After that you can estimate the clock cycle time needed to meet your latency constraint, and then decide whether or not to go with RISC-V.
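
As a rough illustration of that last step, the sketch below turns a cycle count into a clock requirement. The function names and the 2,000,000-cycle figure are made up for the example (the real number would come from counting instructions in your compiled code); only the 5 ms deadline comes from the original post.

```python
def min_clock_hz(cycles: int, deadline_us: int) -> int:
    """Lowest clock frequency (Hz) that fits `cycles` into `deadline_us` microseconds."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-cycles * 1_000_000 // deadline_us)


def cycle_budget(clock_hz: int, deadline_us: int) -> int:
    """Number of whole clock cycles available before the deadline at `clock_hz`."""
    return clock_hz * deadline_us // 1_000_000


if __name__ == "__main__":
    DEADLINE_US = 5_000          # 5 ms, from the original post
    ASSUMED_CYCLES = 2_000_000   # hypothetical placeholder, not a measurement

    print(min_clock_hz(ASSUMED_CYCLES, DEADLINE_US))  # clock needed for that count
    print(cycle_budget(100_000_000, DEADLINE_US))     # cycles available at 100 MHz
```

This assumes one fixed cycle count per run; cache misses and memory stalls would push the real number up.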

Best,
S.Takano

Dr Jonathan Kimmitt

Feb 14, 2018, 3:34:43 AM

to hw-...@groups.riscv.org

Dear Muhammad,

If you check out the lowRISC project (http://www.lowrisc.org/docs/ethernet-v0.5/), it has a compiler, so you can quickly try out your algorithm in embedded Linux.

This will give an upper-bound estimate of the run time. To improve on it, you could use dedicated on-chip memory to hold your matrices (DDR memory access could be a bottleneck).
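
To get a feel for how much on-chip memory that would take, here is a small back-of-the-envelope sketch. The helper name is invented for the example; the 70x70 size and the count of three matrices are taken from the original question, and 8 bytes per value assumes IEEE 754 double precision.

```python
BYTES_PER_DOUBLE = 8  # IEEE 754 binary64


def matrix_bytes(n: int, count: int = 1) -> int:
    """Storage in bytes for `count` dense n x n matrices of doubles."""
    return count * n * n * BYTES_PER_DOUBLE


if __name__ == "__main__":
    total = matrix_bytes(70, count=3)
    print(total, "bytes, about", round(total / 1024, 1), "KiB")
```

This counts the three input matrices only; result matrices and any scratch space for the inversion would add to it, and whether it fits depends on the block RAM of the FPGA you pick.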

We only support one inexpensive board at the moment (Nexys4-DDR), but a different FPGA could potentially be used.

Regards,

Jonathan

Anton Krug

Feb 14, 2018, 5:12:29 AM

to RISC-V HW Dev, jr...@cam.ac.uk

Hi Jonathan, Muhammad,

I don't know enough about lowRISC, but is it Rocket Chip based (I noticed it included in the repository)? I couldn't find which extensions it is built with: does it have the D extension, or would the math be implemented in software? I think Rocket Chip still has no 32-bit variant with D, so you would need a 64-bit RISC-V if you want hardware double precision, or else hope that a software implementation is fast enough. You could cheat a bit: you said FPGA size is not a factor, and it looks like the 3 matrices are independent, so could you synthesize 3 completely independent cores, each working on one matrix?

The other recommendation, porting part of the algorithm from your current setup to RISC-V to see how heavy the generated assembly is, could give a pretty good estimate without needing any RISC-V hard or soft core at hand. You could switch the "arch" to see how an rv32 soft-float core would perform, or how an rv64g would perform, which would give you some guidance.

Anton

Dr Jonathan Kimmitt

Feb 14, 2018, 8:07:31 AM

to Anton Krug, RISC-V HW Dev

Yes, lowRISC is based on an old version of 64-bit Rocket. It does not have the compressed instruction extension, but it does have a hardware floating-point unit. In fact, this unit is on the timing-critical path.

You should be able to tell how many floating-point operations your algorithm requires. If it uses LAPACK, then you would need to port the operations you need to RISC-V. This can all be done before you go anywhere near any hardware.
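
As a sketch of what such an operation count might look like: the matrix-multiply count below is exact for the naive triple loop, while the inverse uses the textbook leading-order figure of about 2n^3 for an LU-based inversion. The workload mix (one inverse, one square, and one multiply per matrix) is only my reading of the original question, and all function names are invented for the example.

```python
def matmul_flops(n: int) -> int:
    """Exact flops for a naive n x n matrix product: n*n outputs, n muls + (n-1) adds each."""
    return n * n * (2 * n - 1)


def inverse_flops_approx(n: int) -> int:
    """Leading-order estimate (~2n^3) for inverting an n x n matrix via LU factorisation."""
    return 2 * n ** 3


def workload_flops(n: int, matrices: int = 3) -> int:
    """ASSUMED mix per matrix: one inverse, one square (a product), one more product."""
    per_matrix = inverse_flops_approx(n) + 2 * matmul_flops(n)
    return matrices * per_matrix


def required_flops_per_second(flops: int, deadline_us: int) -> int:
    """Sustained flop rate needed to finish `flops` within `deadline_us` microseconds."""
    return -(-flops * 1_000_000 // deadline_us)  # integer ceiling division


if __name__ == "__main__":
    total = workload_flops(70)
    print(total)
    print(required_flops_per_second(total, 5_000))
```

Under those assumptions the total is about 6.1 million flops, which works out to roughly 1.2 GFLOP/s sustained to finish in 5 ms, before counting loads, stores, and loop overhead.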

If you need some kind of vector unit to get the performance, then different software would be needed. You could consider Vivado HLS, for example, which offers a variety of area/performance trade-offs.
