Prize announcement and aftermath


Antti Lukats

Dec 6, 2018, 8:43:57 AM
to RISC-V Soft CPU Discussion
Hi


There will be some more official info coming too; at the summit there was just a quick announcement.

My own promise for the aftermath: I will clean up the mess I submitted and write up the documentation that I wanted. The other contest entries had so much better documentation and build scripts that I feel terrible for the mess I pushed in the rush.

There was also a small promise of a contest next year, but that remains to be seen.

I hope there was some fun for everyone, despite the many issues. For me the actual work on the softcore only begins now.

g
Antti

Nelson Ribeiro

Dec 6, 2018, 9:56:06 AM
to Antti Lukats, softcpu...@riscv.org
Big congrats to Charles, Antti and Olof! :)

I was hoping to see a design with better performance than the one that Charles did... 
Don't get me wrong, his work is very good and a good reference for us all, I just think there is more juice and performance to be achieved.
Some of them were discussed in this forum...

I will have some vacation days between Christmas and New Year festivities, I hope I can finish my design by then.

Nelson

--
You received this message because you are subscribed to the Google Groups "RISC-V Soft CPU Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to softcpu-discu...@riscv.org.
To post to this group, send email to softcpu...@riscv.org.
Visit this group at https://groups.google.com/a/riscv.org/group/softcpu-discuss/.
To view this discussion on the web visit https://groups.google.com/a/riscv.org/d/msgid/softcpu-discuss/41b25fd8-57d1-44f2-912a-c236c7721943%40riscv.org.
For more options, visit https://groups.google.com/a/riscv.org/d/optout.

Antti Lukats

Dec 6, 2018, 10:14:47 AM
to RISC-V Soft CPU Discussion, antti....@gmail.com


On Thursday, 6 December 2018 15:56:06 UTC+1, Nelson Ribeiro wrote:
Big congrats to Charles, Antti and Olof! :)

Thanks, but you forgot Reindeer; it has really beautiful documentation, really cool.
 

I was hoping to see a design with better performance than the one that Charles did... 
 
Getting the maximum out of two very different FPGA architectures was really a mission impossible within the contest deadline. It would only have worked with a full team working 12 hours a day, 7 days a week, or if everything had been ready before the contest start date.

 
Don't get me wrong, his work is very good and a good reference for us all, I just think there is more juice and performance to be achieved.

His work is very impressive; the Verilator stuff is really cool.

Some of them were discussed in this forum...

I will have some vacation days between Christmas and New Year festivities, I hope I can finish my design by then.

Nelson

have FUN!

Antti 

Charles Papon

Dec 6, 2018, 10:40:27 AM
to RISC-V Soft CPU Discussion, antti....@gmail.com
So, honestly, I had quite an "unfair" advantage from the beginning of the contest:
- I already had a working CPU
- I already had a bit of experience with those specific FPGA flows
- I had a friend quite familiar with Zephyr to help me pick it up
- I was only employed at 40%.

So all of that is just to say GG to everybody. Reading the spec, implementing the CPU and managing the hardware/software stuff was really a lot of work for that small amount of time. I would probably not have been able to do it from scratch within the contest timeline.

About the performance of VexRiscv, I would say it is a really regular pipelined CPU netlist generated from a non-regular hardware description. So yes, there are probably ways to get faster/stronger/better. Maybe one day we will get a competitive multiple-issue RISC-V softcore on FPGA, or even more exotic things :D ?

Nelson Ribeiro

Dec 6, 2018, 4:53:43 PM
to Antti Lukats, softcpu...@riscv.org
Well, I only congratulated people who are "active" here in the forum.

I don't remember seeing Changyi Gu post anything in this forum, but searching for "Reindeer" I did now find one post related to that design...

Do you know if they actually used metrics for the awards (area and performance), or did they assign points in a more ad hoc style?

The results are published now, but they did not publish a list of entries, or even the scores of the designs that got the awards... what a lack of transparency...

Microsemi distributed 50 boards to contestants, so at least 50 entries were to be expected. I did not request one myself because I was not sure I would be able to finish in time.

Nelson

Nelson Ribeiro

Dec 6, 2018, 5:34:34 PM
to Charles Papon, softcpu...@riscv.org, Antti Lukats

Well Charles, I have to try your design in a Spartan-6 device (I have an Atlys board) or in a Zynq (Artix-7 based fabric) to get a real feeling for the comparison. At work I also make designs for Lattice MachXO/2/3 and Intel Stratix IV/V, but there I don't get to play with soft CPU cores. I only do that at home as a hobby. And at home I only own Xilinx-based boards, so I have a better feel for their device families. (Well, now I also own a Gnarly Grey UPDuinoBoard V2, which I bought to use in the contest.)

Your DMIPS/MHz is ok/good, but the maximum working frequency for the Lattice device seems odd. I was also targeting 1.40 DMIPS/MHz, but with a working frequency near 40 MHz; I don't know if that was achievable or not. I made a few experiments with a (pipelined) 32x32 multiplier built from the iCE40 UltraPlus DSPs, and I did not like the results... I need to check your implementation. That is why I suggested a serial implementation with a sort of table for the values used in the benchmark.

Nelson

Eric Smith

Dec 6, 2018, 5:35:39 PM
to softcpu...@riscv.org
On Thu, Dec 6, 2018 at 2:53 PM Nelson Ribeiro <ngrr.r...@gmail.com> wrote:
Microsemi distributed 50 boards to contestants, so at least 50 entries were to be expected. I did not request one myself because I was not sure I would be able to finish in time.

I thought I had a reasonable chance at completing my area-optimized entry, even though I started from scratch. I requested and received one Microsemi SmartFusion2 board and one Gnarly Grey board, for which I am grateful to the sponsors. I feel bad that I didn't get a complete entry submitted, but I did get my core, "Glacial", working in Verilator just a day before the deadline. Unfortunately I didn't get the Zephyr demos working. I put my core on github:


I am quite impressed that others did manage to complete all the requirements on time.

Best regards,
Eric

Eric Smith

Dec 6, 2018, 5:40:38 PM
to softcpu...@riscv.org


On Thu, Dec 6, 2018 at 3:34 PM Nelson Ribeiro <ngrr.r...@gmail.com> wrote:
[Charles'] DMIPS/MHz is ok/good, but the maximum working frequency for Lattice device seems odd. I was targeting also 1.40  DMIPS/MHz but a working frequency near 40MHz.

My Glacial core runs at around 50MHz in the Lattice, but its DMIPS/MHz must be near zero. (I haven't measured it.) Glacial is microcoded and requires on the order of a thousand clock cycles per RISC-V instruction. It was designed to optimize for low resource utilization above any other consideration. I constantly had to stop myself from adding minor improvements to the microarchitecture that would significantly increase performance.
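
To put "near zero" into rough numbers, here is a back-of-the-envelope sketch. The only firm constant is the usual Dhrystone convention that 1 DMIPS corresponds to 1757 Dhrystone iterations per second. The instruction count per iteration (350) is purely an illustrative assumption, not a measurement of Glacial or of any compiler, and the 1000 cycles per instruction is only the order of magnitude mentioned above.

```python
# 1 DMIPS is defined as 1757 Dhrystone iterations per second (VAX 11/780 reference).
DHRYSTONES_PER_SECOND_PER_DMIPS = 1757


def dmips_per_mhz(cycles_per_iteration: float) -> float:
    """DMIPS/MHz of a core that needs `cycles_per_iteration` clocks per Dhrystone loop."""
    iterations_per_second_at_1mhz = 1_000_000 / cycles_per_iteration
    return iterations_per_second_at_1mhz / DHRYSTONES_PER_SECOND_PER_DMIPS


# Hypothetical figure chosen only for illustration, not measured.
ASSUMED_INSTRUCTIONS_PER_ITERATION = 350
# Order of magnitude quoted in the post for a microcoded core.
CYCLES_PER_INSTRUCTION = 1000

estimate = dmips_per_mhz(ASSUMED_INSTRUCTIONS_PER_ITERATION * CYCLES_PER_INSTRUCTION)
print(f"roughly {estimate:.4f} DMIPS/MHz under these assumptions")
```

Under these assumptions the result lands in the thousandths of a DMIPS/MHz, which is consistent with "near zero" but should not be read as a benchmark result.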

Eric


Charles Papon

Dec 6, 2018, 5:49:10 PM
to RISC-V Soft CPU Discussion, charles....@gmail.com, antti....@gmail.com
@Nelson

So about the DMIPS, I made some trade-offs to get a better FMax, mainly delaying paths such as branch/load and adding an extra fetch stage. If all the options were set for maximum DMIPS/MHz, it would have been 1.57 DMIPS/MHz.
One way to unlock the maximum frequency on the IGLOO2 design would have been to incorporate the dual-port main memory into the memory pipeline. One issue I had is the relative fullness of the UP5K, which probably decreases the FMax; but also, the UP5K is reaaaaaaly slow, nearly 3 times slower than an iCE40 HP. Anyway, let me know if you get some numbers :)

So the UP5K multiply implementation that I used is exactly the same as for the IGLOO2:
https://github.com/SpinalHDL/VexRiscv/blob/6334f430fe1bed302733c6ea6c44f8b514f3e2c6/src/main/scala/vexriscv/plugin/MulPlugin.scala#L74
It is split over 3 stages. Stage 1 does 4 partial multiplications; stages 2 and 3 sum the results.
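
As a rough software model of that structure (not a transcription of MulPlugin.scala), the sketch below builds an unsigned 32x32 product from four 16x16 partial products and then sums them in two steps. The function name `mul32_via_partials` is hypothetical, and how the additions are divided between stage 2 and stage 3 is an assumption made for illustration.

```python
MASK16 = 0xFFFF


def mul32_via_partials(a: int, b: int) -> int:
    """Unsigned 32x32 -> 64-bit product built from four 16x16 partial products."""
    assert 0 <= a < (1 << 32) and 0 <= b < (1 << 32)
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    # Stage 1: four partial multiplications.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Stage 2: first half of the summation (this split is an assumption).
    low_sum = ll + (lh << 16)
    high_sum = (hh << 32) + (hl << 16)

    # Stage 3: final addition.
    return low_sum + high_sum


print(hex(mul32_via_partials(0x12345678, 0x9ABCDEF0)))
```

In hardware each stage would be separated by pipeline registers; here the stages are just consecutive statements.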

Nelson Ribeiro

Dec 6, 2018, 6:11:57 PM
to Charles Papon, softcpu...@riscv.org, Antti Lukats
Hmm, you only used a 32-bit x 32-bit signed multiplier (pipelined and split into 4 smaller 16-bit multiplies: UxU, UxS, SxU, SxS).
I thought we needed a 33 x 33-bit signed multiplier because of the need to support MULH, MULHSU and MULHU in the same ISA.

How do you support the MULHSU and MULHU instructions?

Nelson

Charles Papon

Dec 6, 2018, 6:24:18 PM
to RISC-V Soft CPU Discussion, charles....@gmail.com, antti....@gmail.com
No no, you are right, you need a 33*33. The code (after constant simplification) does a 16*16 mul for lowlow, two 17*16 for highlow and lowhigh, and one 17*17 for highhigh. But apparently the synthesis tool was happy with that, and inferred the extra bit as a regular LUT adder :)

Basically, this implementation was made for Xilinx/Altera FPGAs, which have 18*18-bit multipliers; I was too lazy to implement a dedicated iCE40 UP one XD
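
The 33*33 idea can be sketched in software as follows. Each 32-bit operand is sign- or zero-extended to a 33-bit signed value depending on the instruction, then split into an unsigned low 16 bits and a signed high 17 bits, which gives the 16*16, 17*16 and 17*17 partial products described above. All function names here are hypothetical, and this is a behavioural model only, not the VexRiscv code.

```python
def extend33(x: int, signed: bool) -> int:
    """Turn a 32-bit register value into a 33-bit signed integer."""
    x &= 0xFFFFFFFF
    if signed and (x >> 31):
        return x - (1 << 32)  # sign-extend: bit 32 copies bit 31
    return x                  # zero-extend: bit 32 is 0


def mul_high(rs1: int, rs2: int, rs1_signed: bool, rs2_signed: bool) -> int:
    """Upper 32 bits of the 64-bit product, via 16/17-bit partial products."""
    a = extend33(rs1, rs1_signed)
    b = extend33(rs2, rs2_signed)

    # Low halves are unsigned 16-bit; high halves are signed 17-bit.
    # Python's >> is an arithmetic shift, so the sign stays in the high half.
    a_lo, a_hi = a & 0xFFFF, a >> 16
    b_lo, b_hi = b & 0xFFFF, b >> 16

    ll = a_lo * b_lo  # 16*16, unsigned
    lh = a_lo * b_hi  # 16*17
    hl = a_hi * b_lo  # 17*16
    hh = a_hi * b_hi  # 17*17, signed

    product = ll + ((lh + hl) << 16) + (hh << 32)
    return (product >> 32) & 0xFFFFFFFF


def mulh(rs1: int, rs2: int) -> int:
    return mul_high(rs1, rs2, True, True)


def mulhsu(rs1: int, rs2: int) -> int:
    return mul_high(rs1, rs2, True, False)


def mulhu(rs1: int, rs2: int) -> int:
    return mul_high(rs1, rs2, False, False)


print(hex(mulhu(0xFFFFFFFF, 0xFFFFFFFF)), hex(mulhsu(0xFFFFFFFF, 0xFFFFFFFF)))
```

The single extra bit per operand is what lets one signed multiplier serve all three instructions.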

Nelson Ribeiro

Dec 6, 2018, 6:37:18 PM
to Charles Papon, softcpu...@riscv.org, Antti Lukats
2 minutes after posting my comment I realized my mistake. Thank you for the confirmation.

Attached is my own version of this type of multiplier, written in VHDL.

Nelson

hmip_mul_unit_64b_rtl.vhd

Nelson Ribeiro

Dec 6, 2018, 6:59:45 PM
to Charles Papon, softcpu...@riscv.org, Antti Lukats
Actually, your implementation is perfect for the iCE40, because you can have 16 (signed/unsigned) x 16 (signed/unsigned) multiplies on iCE40 UltraPlus devices.
For Xilinx we only have signed values at the input of the DSP48, which is why we need the extra bit if the multiplication is unsigned. So I think I was skewing my tests by not using, for example, the lowlow 16x16 unsigned x unsigned multiplier of the iCE40, but instead the 18x18 signed multiplier as I do with Xilinx devices.

Note: the unsigned or signed property of an input value of the iCE40's DSP is a pre-synthesis parameter of the primitive itself.
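
A small sketch of why a signed-only multiplier needs the extra bit for unsigned operands. The function `signed_multiply` is a hypothetical, simplified model of a multiplier that always reads its inputs as two's complement; it does not model the actual DSP48 or iCE40 primitives.

```python
def signed_multiply(a: int, b: int, width: int) -> int:
    """Model of a signed-only multiplier: inputs are read as `width`-bit two's complement."""
    def as_signed(v: int) -> int:
        v &= (1 << width) - 1
        return v - (1 << width) if v & (1 << (width - 1)) else v

    return as_signed(a) * as_signed(b)


a, b = 0xFFFF, 0x0002  # intended as unsigned 16-bit values

# With 16-bit signed inputs, 0xFFFF is read as -1, so the product is wrong.
too_narrow = signed_multiply(a, b, 16)

# With one extra bit, the zero-extended value keeps its unsigned meaning.
wide_enough = signed_multiply(a, b, 17)

print(too_narrow, wide_enough)
```

A multiplier whose inputs can be configured as unsigned avoids this widening, which is the point made above about the 16x16 case.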

Always learning....

Nelson 