It's dead, Jim!


Eric Smith

Nov 26, 2018, 5:51:00 PM
to softcpu...@riscv.org
With regret I find I must declare defeat. I won't have my submission ready by the deadline. My core passes all RV32I compliance tests, but despite advice and even published code from several other participants, I have been unable to get Zephyr built, after about 12 hours spent fighting it, and consequently haven't even gotten to the point of building FPGA bitstreams. I am accustomed to FPGAs from the Big FPGA Vendors, and had a fair bit of difficulty getting the Lattice iCEcube2 installed and working. I was never able to get Microsemi Libero working at all. In both cases, the problems were primarily with license management. I used to have license management problems with Big FPGA Vendor toolchains, but they seem to have made these issues a lot more manageable in recent years.

At the outset I thought I had a good chance of getting first place in both low-resource categories. I still think that my core will likely be smaller than most of the contest entries. I chose a vertically microcoded microarchitecture, which I call Glacial. It is, in fact, _so_ vertical compared to real-world vertically-microcoded processors that I think there needs to be a new term to describe it, "skyscraper microcode".

It was an interesting experience designing a core to optimize size only, with literally NO consideration given for performance. I constantly had to restrain myself from adding features that would improve performance but add LUTs. During development, I tried to migrate my original Verilog code, which was written in an extremely behavioral style, to a somewhat more structural style. That partially worked, but some of the changes unexpectedly caused Synplify to require significantly more LUTs. The result is that I still have a ridiculous casez statement over an 8-bit subfield of the microinstruction, where the four LSBs are "????" for all cases. Using casez on only the important 4 bits, or even on 5-7 bits with some ? bits, causes significantly higher LUT utilization, as does rewriting the casez to use explicit decode for the outputs it generates.
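
For readers unfamiliar with casez, here is a minimal Python sketch of how a pattern such as "0001????" matches an 8-bit subfield while ignoring the four LSBs. The decode table and operation names are hypothetical and only illustrate the matching rule; they are not taken from the Glacial microcode.

```python
def casez_match(pattern: str, value: int) -> bool:
    """Return True if value matches a casez-style pattern (MSB first).

    '?' or 'z' in the pattern marks a don't-care bit position.
    """
    width = len(pattern)
    bits = format(value & ((1 << width) - 1), f"0{width}b")
    return all(p in "?zZ" or p == b for p, b in zip(pattern, bits))


# Hypothetical decode table: the upper nibble selects the operation,
# the lower nibble is "????" in every case, as described in the post.
DECODE = [
    ("0000????", "nop"),
    ("0001????", "load"),
    ("0010????", "store"),
]


def decode(subfield: int) -> str:
    """Return the first matching entry, like a casez with a default arm."""
    for pattern, name in DECODE:
        if casez_match(pattern, subfield):
            return name
    return "default"
```

Logically this is the same as decoding only the upper nibble, which is why it is surprising that the synthesis results differ.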

Without any specific memory interface, and without the SPI interface, Glacial uses 227 LUTs in the Lattice part. Adding the SPI interface (as yet untested) adds about 10 LUTs. The datapath is 8 bits, and the core uses a single memory address space (also 8 bits wide) to contain the microcode, scratchpad, and RISC-V address space, similar to the IBM System/360 Model 25. The microcode and scratchpad memory take up a little under 2.5 KiB of memory. The "Y" register, which is the memory pointer, is currently limited to 16 bits, restricting the RISC-V memory to a maximum of 61.5 KiB, sufficient for the Zephyr demos. The Y register can be easily expanded to 24 or 32 bits, but that requires additional LUTs. Rather than use a UART in the FPGA fabric, requiring LUTs for both the UART and for address decode, the core has an optional microcode bit-banged UART output, which is included in the LUT count above. A RISC-V "custom0" instruction is used to output a character.
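
The 61.5 KiB figure follows directly from the Y register width. A small sketch of the arithmetic, assuming the microcode and scratchpad region is rounded up to exactly 2.5 KiB (the post says "a little under"); the function name is made up for illustration.

```python
KIB = 1024


def riscv_memory_kib(y_bits: int, reserved_bytes: int = int(2.5 * KIB)) -> float:
    """KiB left for the RISC-V address space after the reserved region.

    reserved_bytes is the microcode + scratchpad area; 2.5 KiB is an
    assumption rounded up from "a little under 2.5 KiB".
    """
    return ((1 << y_bits) - reserved_bytes) / KIB


print(riscv_memory_kib(16))  # 16-bit Y register: 64 KiB - 2.5 KiB
print(riscv_memory_kib(24))  # what a 24-bit Y register would allow
```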

In practice, because neither vendor's FPGAs support initialization of large RAMs from the FPGA bitstream, on the SmartFusion2 I intended to use the Cortex-M3 to load the microcode and RISC-V program, while on the Lattice I intended to put a small portion of the microprogram in one EBR RAM and have it boot the rest from the SPI flash.

The core alone will run at around 50 MHz on the Lattice part; maybe somewhat lower once RAM is interfaced. However, each microinstruction takes four clock cycles to execute, and each RISC-V instruction takes hundreds of microinstructions. The name "Glacial" is an understatement.
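
To put rough numbers on that: a back-of-the-envelope sketch, where the 50 MHz clock and four cycles per microinstruction come from the post, and 300 microinstructions per RISC-V instruction is purely an assumed stand-in for "hundreds".

```python
def uinstr_per_sec(clock_hz: float, clocks_per_uinstr: int = 4) -> float:
    """Microinstructions executed per second."""
    return clock_hz / clocks_per_uinstr


def riscv_instr_per_sec(clock_hz: float,
                        clocks_per_uinstr: int = 4,
                        uinstr_per_instr: int = 300) -> float:
    """RISC-V instructions per second.

    uinstr_per_instr=300 is an assumption; the real average was not measured.
    """
    return uinstr_per_sec(clock_hz, clocks_per_uinstr) / uinstr_per_instr


print(uinstr_per_sec(50e6))       # microinstructions per second
print(riscv_instr_per_sec(50e6))  # RISC-V instructions per second
```

Under these assumptions the core would execute on the order of tens of thousands of RISC-V instructions per second.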

The tasks I _have_ accomplished are:
* defined a microarchitecture
* wrote a microassembler (in Python)
* wrote a microarchitecture simulator (in Python)
* wrote microcode
* debugged microcode (passes RV32I tests)
* wrote Verilog core
* debugged Verilog core (passes RV32I tests)

Writing the microassembler was somewhat easier than doing it from scratch, because I had previously written an assembler for the Intel 8089 (yes, 808_9_) in Python. I was able to adapt it, though a fair bit of rewrite and new code was nevertheless required.

In my opinion, the timeline of this contest, from announcement to deadline, was unrealistically short, even for a project less ambitious than mine. I estimate that I would have needed another week to complete my entry with the Zephyr samples running and bitstreams for both FPGAs.

For what it's worth, the core is now on GitHub:

Frank Buss

Nov 26, 2018, 6:26:39 PM
to RISC-V Soft CPU Discussion
On Monday, November 26, 2018 at 11:51:00 PM UTC+1, Eric Smith wrote:
At the outset I thought I had a good chance of getting first place in both low-resource categories. I still think that my core will likely be smaller than most of the contest entries. I chose a vertically microcoded microarchitecture, which I call Glacial. It is, in fact, _so_ vertical compared to real-world vertically-microcoded processors that I think there needs to be a new term to describe it, "skyscraper microcode".

It was an interesting experience designing a core to optimize size only, with literally NO consideration given for performance.

Actually there is a limitation for performance. I simulated the required Zephyr examples, and if the core runs below about 10 kHz, it just stops, no matter how long it runs (at least with my C emulator). I guess the reason is that all the time is then spent in the interrupt routine called for the scheduler, and no time is spent in the actual threads. At 50 kHz you can already see slower output of the characters.

That's the reason I gave up as well, because I designed a very minimal microcode, but it would have needed probably more than 1,000 microcode instructions per RISC-V instruction, so would have been too slow, and I didn't have enough time to finish it anyway.

jruiz.ryanair

Nov 26, 2018, 6:35:18 PM
to RISC-V Soft CPU Discussion
Ditto to the license issues, the synthesis mysteries and the inconvenient deadline complaint (I wish I could +1 this post).

(And congrats on this beautiful project!)

Eric Smith

Nov 26, 2018, 6:55:22 PM
to Frank Buss, softcpu...@riscv.org
On Mon, Nov 26, 2018 at 4:26 PM Frank Buss <programmer...@gmail.com> wrote:
Actually there is a limitation for performance. I simulated the required Zephyr examples, and if the core runs below about 10 kHz, it just stops, no matter how long it runs (at least with my C emulator). I guess the reason is that all the time is then spent in the interrupt routine called for the scheduler, and no time is spent in the actual threads. At 50 kHz you can already see slower output of the characters.

I expected that, and planned to set the timer interrupt rate _really_ low, and maybe change whatever scheduling parameters I could find, to ensure that the interrupt and scheduling doesn't consume the entire CPU. But I haven't done it yet.
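
The starvation effect can be sketched with a simple ratio: the fraction of CPU time consumed by the timer tick. All numbers below are hypothetical; in particular the per-tick cost of 200 instructions is an assumption, not a measured Zephyr figure. In Zephyr the tick rate is normally set through the CONFIG_SYS_CLOCK_TICKS_PER_SEC Kconfig option, which is presumably one of the parameters that would be lowered.

```python
def tick_overhead(instr_per_sec: float,
                  ticks_per_sec: float,
                  instr_per_tick: float) -> float:
    """Fraction of CPU time spent servicing the timer tick, capped at 1.0.

    A result of 1.0 means the tick handler uses all available time and
    the application threads never get to run.
    """
    return min(1.0, ticks_per_sec * instr_per_tick / instr_per_sec)


# Hypothetical slow core: 10,000 RISC-V instructions per second,
# and an assumed 200 instructions of interrupt + scheduler work per tick.
print(tick_overhead(10_000, ticks_per_sec=100, instr_per_tick=200))  # saturated
print(tick_overhead(10_000, ticks_per_sec=10, instr_per_tick=200))   # mostly free
```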

I designed a very minimal microcode, but it would have needed probably more than 1,000 microcode instructions per RISC-V instruction, so would have been too slow, and I didn't have enough time to finish it anyway.

I have 1154 16-bit microinstructions, and I haven't even measured an average microinstruction per RISC-V instruction ratio, but it's certainly in the hundreds.

and I didn't have enough time to finish it anyway.

I'm extremely frustrated because I was able to finish that, and get my core passing all 55 RV32I compliance tests, and ran out of time to get Zephyr working.
I feel like I got what should have been the hard part done, and got stuck on what should have been easy.

Tommy Thorn

Nov 26, 2018, 8:23:09 PM
to Eric Smith, Frank Buss, softcpu...@riscv.org
n_yes ++;

95% of my time has been spent dealing with non-core issues (Libero, Microsemi, Zephyr, Dhrystone, and Compliance).
About a week ago I gave on winning due to time constraints, but I still wanted to enter as I have committed to by receiving
a board from Microsemi.  The final straw was when late last night my rig refused to program my board.  I don't know what
the issue is but it sees no programmer and I don't have time to debug that.  So, I might still enter a non-entry as I pass
all compliance tests in Verilator (and Icarus) simulations.  (Short a couple of things from running the Zephyr binaries).

Tommy



--
You received this message because you are subscribed to the Google Groups "RISC-V Soft CPU Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to softcpu-discu...@riscv.org.
To post to this group, send email to softcpu...@riscv.org.
Visit this group at https://groups.google.com/a/riscv.org/group/softcpu-discuss/.
To view this discussion on the web visit https://groups.google.com/a/riscv.org/d/msgid/softcpu-discuss/CAFrGgTQ9q%3DXXURW_f5eW%3D3Pdyr1CgXOdn4vNJ0nS6ADOBBmY1w%40mail.gmail.com.
For more options, visit https://groups.google.com/a/riscv.org/d/optout.

Tommy Thorn

Nov 26, 2018, 8:29:54 PM
to Eric Smith, Frank Buss, softcpu...@riscv.org
Sorry, I'm very sleep deprived (guess why) and I MEANT to write

"About a week ago I gave *up all aspiration of* winning".

FWIW, I expect the winner in the performance category to use binary translation because
you can completely destroy the moronic Dhrystone benchmark with that.  However,
eight weeks isn't enough to do that *and* all the other things that are required.  Certainly
not for people with lives/jobs/work/family.

Tommy

Rahul Behl

Nov 26, 2018, 11:28:15 PM
to to...@thorn.ws, spac...@gmail.com, programmer...@gmail.com, softcpu...@riscv.org
Same here! Had to give up on the competition fairly early due to significant difficulties with just the Libero setup! I couldn't even get the tool installed, and only managed to write a simple single-cycle Verilog design without any area budget considerations. :(

Rahul



--
Rahul Behl, 
BE(Hons) Electronics & Instrumentation Engineering
 
Birla Institute of Technology & Science, Pilani
KK Birla Goa Campus


Antti Lukats

Nov 26, 2018, 11:40:05 PM
to RISC-V Soft CPU Discussion, programmer...@gmail.com
Does the 1154x16 include full support to run Zephyr? It is pretty low anyway; I was expecting more.

The MF8A18 core engine behind engine-V uses 486 words x 16 bits to run Zephyr. I had to add 2 LUTs to make it fit into 512 words.

If you pass the RV32I tests, getting Zephyr running is actually easy - if you have time (two weeks at least).

Antti

Eric Smith

Nov 27, 2018, 12:02:03 AM
to Antti Lukats, softcpu...@riscv.org, Frank Buss
On Mon, Nov 26, 2018 at 9:40 PM Antti Lukats <antti....@gmail.com> wrote:
Does the 1154x16 include full support to run Zephyr? It is pretty low anyway; I was expecting more.

Although I haven't gotten Zephyr running, as far as I know the 1154x16 microcode includes everything Zephyr needs, including support for mtime, mtimecmp, and the timer interrupt. It does not include the support for the SPI bootloader for the Lattice FPGA, which requires about another 32 words of microcode and 10 LUTs.
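
For context, the timer behaviour being supported is small: per the RISC-V privileged specification, the machine timer interrupt is pending whenever mtime is greater than or equal to mtimecmp (unsigned comparison), and writing a later value to mtimecmp clears it. A minimal Python model of that rule follows; the class and method names are made up for illustration and say nothing about how the microcode implements it.

```python
MASK64 = (1 << 64) - 1


class MachineTimer:
    """Toy model of the RISC-V mtime/mtimecmp pair (both 64-bit)."""

    def __init__(self) -> None:
        self.mtime = 0
        self.mtimecmp = MASK64  # effectively "never" until software sets it

    def tick(self, n: int = 1) -> None:
        """Advance mtime by n counts, wrapping at 64 bits."""
        self.mtime = (self.mtime + n) & MASK64

    @property
    def interrupt_pending(self) -> bool:
        """Timer interrupt is pending while mtime >= mtimecmp."""
        return self.mtime >= self.mtimecmp


timer = MachineTimer()
timer.mtimecmp = 3
timer.tick(3)
print(timer.interrupt_pending)  # pending once mtime reaches mtimecmp
timer.mtimecmp = 10             # handler schedules the next tick
print(timer.interrupt_pending)  # no longer pending
```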

If you pass the RV32I tests, getting Zephyr running is actually easy - if you have time (two weeks at least).

That much? Wow!

I'm stuck in Zephyr Kconfig hell. I've written the UART driver for my core, and I've created/modified the various Kconfig files etc., but when I try to do a build, I get a ton of error messages about configuration variables being set to one thing but actually having another value. I think if I could just get past that, actually debugging the code wouldn't be so bad. But I could be wrong. I'm spoiled by how easy it is to do things using FreeRTOS.


Antti Lukats

Nov 27, 2018, 12:09:00 AM
to RISC-V Soft CPU Discussion, antti....@gmail.com, programmer...@gmail.com


On Tuesday, 27 November 2018 06:02:03 UTC+1, Eric Smith wrote:
On Mon, Nov 26, 2018 at 9:40 PM Antti Lukats <antti....@gmail.com> wrote:
Does the 1154x16 include full support to run Zephyr? It is pretty low anyway; I was expecting more.

Although I haven't gotten Zephyr running, as far as I know the 1154x16 microcode includes everything Zephyr needs, including support for mtime, mtimecmp, and the timer interrupt. It does not include the support for the SPI bootloader for the Lattice FPGA, which requires about another 32 words of microcode and 10 LUTs.

If you pass the RV32I tests, getting Zephyr running is actually easy - if you have time (two weeks at least).

That much? Wow!

Assuming 1 hour a day, and working under overload/frustration.

With a clear head and some helpful hints, it's 1 day.
 
I'm stuck in Zephyr Kconfig hell. I've written the UART driver for my core, and I've created/modified the various Kconfig files etc., but when I try to do a build, I get a ton of error messages about configuration variables being set to one thing but actually having another value. I think if I could just get past that, actually debugging the code wouldn't be so bad. But I could be wrong. I'm spoiled by how easy it is to do things using FreeRTOS.


Clone mine and replace the UART driver. A friend tried it out and got it working on a remote PC within hours; he has 0 previous Zephyr knowledge.

Antti 

Eric Smith

Nov 27, 2018, 12:14:42 AM
to Antti Lukats, softcpu...@riscv.org, Frank Buss
On Mon, Nov 26, 2018 at 10:09 PM Antti Lukats <antti....@gmail.com> wrote:
On Tuesday, 27 November 2018 06:02:03 UTC+1, Eric Smith wrote:
On Mon, Nov 26, 2018 at 9:40 PM Antti Lukats <antti....@gmail.com> wrote:
If you pass the RV32I tests, getting Zephyr running is actually easy - if you have time (two weeks at least).
That much? Wow!
Assuming 1 hour a day, and working under overload/frustration.
With a clear head and some helpful hints, it's 1 day.

OK, that seems quite believable.

Clone mine and replace the UART driver. A friend tried it out and got it working on a remote PC within hours; he has 0 previous Zephyr knowledge.

Will try that. Thanks!

Antti Lukats

Nov 27, 2018, 12:32:01 AM
to RISC-V Soft CPU Discussion, antti....@gmail.com, programmer...@gmail.com
cd philosophers
mkdir build
cd build

Create do.bat with:

set ZEPHYR_BASE=X:\GIT\riscv-contest\zephyr\1.13\zephyr
set BOARD_DIR=X:\GIT\riscv-contest\zephyr\1.13\zephyr\boards

set BOARD=m2gl025_ev

set ARCH=riscv
set TOOLCHAIN_VENDOR=none
set ZEPHYR_TOOLCHAIN_VARIANT=zephyr

cmake -GNinja ..
ninja
ninja menuconfig

Then run do.bat.

It should be similar on Linux.
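
For anyone not on Windows, the same steps could be wrapped in a small platform-neutral Python script. This is only a sketch: the environment variables and commands are copied from do.bat, while the function names and any path passed in are placeholders. The interactive `ninja menuconfig` step is left out and would be run by hand.

```python
import os
import subprocess


def zephyr_env(zephyr_base: str, board: str = "m2gl025_ev") -> dict:
    """Return a copy of the environment with the variables set in do.bat."""
    env = dict(os.environ)
    env.update({
        "ZEPHYR_BASE": zephyr_base,
        "BOARD_DIR": os.path.join(zephyr_base, "boards"),
        "BOARD": board,
        "ARCH": "riscv",
        "TOOLCHAIN_VENDOR": "none",
        "ZEPHYR_TOOLCHAIN_VARIANT": "zephyr",
    })
    return env


# Same two non-interactive commands as in do.bat.
BUILD_COMMANDS = [["cmake", "-GNinja", ".."], ["ninja"]]


def build(zephyr_base: str, build_dir: str) -> None:
    """Create build_dir inside the sample directory and run the build there."""
    os.makedirs(build_dir, exist_ok=True)
    env = zephyr_env(zephyr_base)
    for cmd in BUILD_COMMANDS:
        subprocess.run(cmd, cwd=build_dir, env=env, check=True)
```

Calling `build("<path to your zephyr checkout>", "philosophers/build")` would then mirror running do.bat, assuming cmake and ninja are on the PATH.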

Antti










 

Antti Lukats

Nov 27, 2018, 1:49:18 AM
to RISC-V Soft CPU Discussion, antti....@gmail.com, programmer...@gmail.com
If you do, please read the readme.pdf from engine-V first.

I submitted it 1:30 before the deadline, but AFTER the contest organizers fetched the repo, so I may possibly get removed from the contest because of the lack of a README.

Antti
 

Tommy Thorn

Nov 27, 2018, 2:01:55 AM
to Antti Lukats, RISC-V Soft CPU Discussion, programmer...@gmail.com
I'm out.  Bummer.

Tommy

Olof Kindgren

Nov 27, 2018, 3:16:13 AM
to RISC-V Soft CPU Discussion, antti....@gmail.com, programmer...@gmail.com
I didn't manage to complete either, but submitted anyway.

It was fun, but a lot of work. I also have some complaints and lessons learned.

First of all, I'm tremendously thankful for Project IceStorm. I had exactly zero problems with licensing since I could use a fully open source toolchain. I did however install iCEcube2 last week to see if that would make any difference. It does seem that Synplify does a slightly better job than Yosys, but it's hard to say since it also adds a global buffer that causes the iCEcube2 placer to fail, so I can't even produce a bitstream with the commercial tools. Going for the Microsemi tools was a no-go from day one, since I have had horrible experiences with them when working on OpenRISC some years ago. I had a feeling there would be endless trouble with tools, and judging from what others report, it seems this was a correct guess :)

My biggest complaint would be the complete lack of communication from the organizers. We had several questions that were never answered, and then suddenly the rules just changed. My big issue here is the "RAM initialization data is not considered free". This makes a huge difference for microcoded designs. Is a 500 LUT solution that uses 2 block RAMs smaller than a 550 LUT version that only uses one block RAM? Is there a difference if you only use part of a block RAM? For example, I'm planning to redesign the register file (some time in the future) so that it can use the same block RAM as the code memory. That would mean my CPU goes from using a full RAM to using 128 bytes of a RAM. Does that count as a difference?

I'm also curious to know if anyone got a Lattice board. I sent a mail asking for one but received no reply. I didn't hear back regarding the Microsemi board either, but at least I have seen that others received their boards.

So, to the organizers: please see this as constructive criticism. There were clearly a lot of people excited about this contest, but many of us got really frustrated with things that seem to have zero to do with designing a small or fast CPU.

//Olof


Federico Tula Rovaletti

Nov 27, 2018, 3:58:13 AM
to RISC-V Soft CPU Discussion, antti....@gmail.com, programmer...@gmail.com
Here is our story... 

The project was done by my students: 
- Santiago Abbate (graduate)
- Nicolas Bertolo
- Leandro Jalil
- Tomas Kromer

while they were taking my course on Computer Architectures in UNRN (Argentina) as an experiment. 
They have not finished, but they were really close... They initially built a basic single-cycle core (basically an emulator) that passed all tests and Zephyr samples, and then aimed for minimum area by redesigning it with microcoding and a bit-serial implementation.
I personally congratulate them for their work and commitment to this contest, considering they started learning Verilog/SystemVerilog and most of the fundamental concepts of Computer Architecture (ISA, single-cycle, pipeline, cache, virtual memory, etc.) and RISC-V just four months ago. This is their first time using Verilator, Zephyr (they have possibly found a major bug that will be reported once fully analyzed), Microsemi and Lattice tools... and they have done all that, from scratch, without any help and within their limited spare time.
This contest has helped us to initiate a RISC-V implementation as part of a research project targeting FPGAs so hopefully I will have more of these exceptional students in the future for the following contests.