Thoughts on a Zrfinram extension

Olof Kindgren

Jan 3, 2023, 9:27:14 AM

to RISC-V ISA Dev

Hey there isa-devils!

I have this idea for a new extension intended for the smallest of cores in the most deeply embedded systems for replacing 8-bit CPUs, complex FSMs and such with RISC-V. The general idea of the extension is to memory-map the register file in RAM (hence Z RF in RAM). I'm not a toolchain guy though, so I thought I'd throw it out here to get some feeling for the general interest and if this makes any sense at all.

For some background, I'm the designer of the award-winning SERV, the world's smallest RISC-V CPU (https://github.com/olofk/serv). SERV is around 2.1kGE or 100-200 LUTs on various FPGA implementations. This means that memory size is the dominating factor here. On FPGAs, which have fixed-size SRAMs of typically 1kB-8kB, we can save a whole SRAM by using a slice of the data/instruction memory for the RF (to give you an idea of how many SERV cores can fit into various FPGAs this way, you can look at https://corescore.store/). For ASICs we get some savings as well because we don't need additional control logic if we use the same memory for RF and instructions/data. And in both cases we can easily reclaim 64 bytes of RAM by just compiling our code for RV32E and using the storage normally used for x16-x31. For the kind of applications I'm envisioning this could be a significant amount of memory. If we have a hand-written piece of assembler that only uses 4 regs then we have a whopping 112 bytes of additional RAM at our disposal.
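
As a quick sanity check of the numbers above, here is a minimal sketch of the arithmetic. It assumes the RF occupies a full 32 x 4-byte slice of RAM and that a program only touches the lowest-numbered registers; the name `reclaimable_bytes` is made up for illustration.

```python
# Sketch: bytes of a memory-mapped RV32 register file left over when a
# program only uses the lowest-numbered registers.
XLEN_BYTES = 4    # RV32: each register is 32 bits wide
RV32I_REGS = 32   # x0-x31


def reclaimable_bytes(regs_used: int, total_regs: int = RV32I_REGS) -> int:
    """Bytes of RF storage not needed when only `regs_used` registers are used."""
    if not 0 <= regs_used <= total_regs:
        raise ValueError("regs_used out of range")
    return (total_regs - regs_used) * XLEN_BYTES


print(reclaimable_bytes(16))  # RV32E (x0-x15): 64
print(reclaimable_bytes(4))   # hand-written code using 4 regs: 112
```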

None of the above requires a new extension. We just need to make sure SW doesn't touch the memory used by the RF, but I'm wondering whether to take this one step further. I'm not a compiler guy though, so I have no idea if any of these things make sense or would be terribly complicated.

First thing I'm thinking of is whether we could/should add something to the compiler/linker that automatically finds out which is the highest-numbered register used and then allows the rest to be used as RAM. Perhaps a flag to set the maximum reg number? I have recently learned about -ffixed-reg, but also that it might not be so easy to use with an ABI. Should this be discussed as part of the effort to ratify the RV32E extension perhaps? Can we do something clever by using sh/sb to change the value of a half/quarter register?

The second thing is that if we have memory-mapped RF, someone mentioned we could also consider adding a CSR to indicate the base address. Having that writable could e.g. enable shadow register files for fast context switching. Again, does that make sense? Would it just be terribly complicated to implement support for?

I briefly discussed this during my presentation at the RISC-V Summit last week (https://award-winning.me/serv-32-bit-is-the-new-8-bit) but never found the time to discuss it in detail with anyone there so I'm hoping to get some comments this way instead. So....what do you think?

//Olof

Allen Baum

Jan 3, 2023, 12:16:06 PM

to Olof Kindgren, RISC-V ISA Dev

Lots of early computers used this technique; the DEC PDP-10 comes to mind.

There was an option you could buy that implemented the GPRs separately in logic instead of main memory to increase performance, but you could still address them as main memory (so RR ops actually used memory addresses; I don't recall that there were separate RR ops).

One trick used to increase performance was to load small, short loops into the register file and execute out of it!

I think early low-end IBM 360s might have done this also.

But, in this case, you want to essentially remove the "addresses" of the register file from the address map. That shouldn't be difficult: e.g. the SiFive memory map has holes at the bottom of memory to catch null pointer references, so SW that tries to access that range with a load/store would trap. You'd need to suppress that trap if HW were accessing it as a register, as opposed to a load/store or instruction fetch, though.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/867c9715-bae4-4ae8-88c1-c427656bd4d7n%40groups.riscv.org.

Michael Zoran

Jan 3, 2023, 1:41:53 PM

to RISC-V ISA Dev, Allen Baum, RISC-V ISA Dev, olof.k...@gmail.com

I've seen this used on 8-bit microcontrollers such as the ones from Microchip. Not really sure if it fits in with the RISC-V instruction design, because that means 3 memory accesses for most instructions plus the instruction fetch itself.

olof.k: It would be interesting to know which FPGAs are using that resource count. I've noticed that the number can vary a lot even between different models from the same company.

Olof Kindgren

Jan 3, 2023, 2:14:46 PM

to Allen Baum, RISC-V ISA Dev

On Tue, Jan 3, 2023 at 18:16, Allen Baum <allen...@esperantotech.com> wrote:

> Lots of early computers used this technique; the DEC PDP-10 comes to mind.
> There was an option you could buy that implemented the GPRs separately in logic instead of main memory to increase performance
> - but you could still address them as main memory (so RR ops actually used memory addresses; I don't recall that there were separate RR ops)
> One trick used to increase performance was to load small, short loops into the register file and execute out of it!

Interesting. I can definitely see the case for systems with a small fast mem for RF+hot loops and a larger slower mem.

> I think early low-end IBM 360s might have done this also.
>
> But, in this case, you want to essentially remove the "addresses" of the register file from the address map. That shouldn't be difficult,
> e.g. the SiFive memory map has holes at the bottom of memory to catch null pointer references,
> so SW that tries to access that range with a load/store would trap. You'd need to suppress that trap if HW were accessing it as a register, as opposed to a load/store or instruction fetch, though.

I'm not sure we want to trap. Partly because I believe that systems small enough to benefit from this wouldn't necessarily be able to do something meaningful on an illegal access and partly because it would allow doing byte accesses on registers...if that would be of any use. Or maybe not.

Olof Kindgren

Jan 3, 2023, 2:28:02 PM

to Michael Zoran, RISC-V ISA Dev, Allen Baum

On Tue, Jan 3, 2023 at 19:41, Michael Zoran <mic...@michaelzoran.net> wrote:

> I've seen this used on 8-bit microcontrollers such as the ones from Microchip. Not really sure if it fits in with the RISC-V instruction design, because that means 3 memory accesses for most instructions plus the instruction fetch itself.

I wonder if there's a misunderstanding here. There wouldn't be any more accesses. It's basically just a mux in front of a single SRAM instead of two separate memories for RF and program memory respectively. I do that in the Subservient SoC https://github.com/olofk/subservient/

> olof.k: It would be interesting to know which FPGAs are using that resource count. I've noticed that the number can vary a lot even between different models from the same company.

Sure. You can check some numbers for popular families in my latest presentation. The lowest ones are currently for AMD 7-series. It's easy to provide numbers for most FPGA families if there's any in particular you're interested in.

Michael Zoran

Jan 3, 2023, 2:48:22 PM

to RISC-V ISA Dev, olof.k...@gmail.com, RISC-V ISA Dev, Allen Baum, Michael Zoran

I'm thinking of how an instruction like "add t0, t1, t2" would work. I would think it would be 3 accesses through your dbus and 1 access through your ibus for the instruction itself. Both funnel through your serving_arbiter. So I think that would be 4 SRAM accesses for this instruction.

Olof Kindgren

Jan 3, 2023, 3:01:55 PM

to Michael Zoran, RISC-V ISA Dev, Allen Baum

On Tue, Jan 3, 2023 at 20:48, Michael Zoran <mic...@michaelzoran.net> wrote:

> I'm thinking of how an instruction like "add t0, t1, t2" would work. I would think it would be 3 accesses through your dbus and 1 access through your ibus for the instruction itself. Both funnel through your serving_arbiter. So I think that would be 4 SRAM accesses for this instruction.

In Serving and Subservient, the RF accesses don't go through the DBUS but get muxed in closer to the memory (see serving_ram.v if you're interested). It all works out very nicely in the sense that ibus, dbus and RF never want to access memory during the same cycle. I think the way I do it in SERV is mostly an implementation detail though and there are many ways to accomplish this, but whichever way you want to do it requires fetching the instructions and operand regs from somewhere as well as storing the result. Also, there's nothing saying the instruction can't be in ROM or Flash.
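
To make the "mux in front of a single SRAM" idea concrete, here is a minimal behavioural sketch. It is not SERV's actual timing or RTL: the function `sram_mux`, the RF base address `RF_BASE` and the four-step schedule for `add t0, t1, t2` are all assumptions made up for illustration. The only property it demonstrates is that the three requesters take turns on one memory port.

```python
# Sketch: one SRAM port shared by instruction fetch, data access and RF access.
# Names, addresses and the schedule are illustrative assumptions, not SERV's
# real implementation.

RF_BASE = 0x380  # hypothetical: RF slice in the top 128 bytes of a 1 kB SRAM


def reg_addr(n):
    """Byte address of register xN inside the shared SRAM."""
    return RF_BASE + 4 * n


def sram_mux(ibus=None, dbus=None, rf=None):
    """Return (source, address) for the single requester active this cycle."""
    active = [(name, addr)
              for name, addr in (("ibus", ibus), ("dbus", dbus), ("rf", rf))
              if addr is not None]
    if len(active) > 1:
        raise RuntimeError("two requesters in the same cycle")
    return active[0] if active else None


# Hypothetical access sequence for `add t0, t1, t2` (t0=x5, t1=x6, t2=x7)
schedule = [
    sram_mux(ibus=0x100),       # fetch the instruction
    sram_mux(rf=reg_addr(6)),   # read rs1
    sram_mux(rf=reg_addr(7)),   # read rs2
    sram_mux(rf=reg_addr(5)),   # write rd
]
print(len(schedule))  # 4 accesses, one per step, never two at once
```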

//Olof

MitchAlsup

Jan 3, 2023, 5:04:57 PM

to RISC-V ISA Dev, olof.k...@gmail.com

On Tuesday, January 3, 2023 at 8:27:14 AM UTC-6 olof.k...@gmail.com wrote:

> Hey there isa-devils!
>
> I have this idea for a new extension intended for the smallest of cores in the most deeply embedded systems for replacing 8-bit CPUs, complex FSMs and such with RISC-V. The general idea of the extension is to memory-map the register file in RAM (hence Z RF in RAM). I'm not a toolchain guy though, so I thought I'd throw it out here to get some feeling for the general interest and if this makes any sense at all.
>
> For some background, I'm the designer of the award-winning SERV, the world's smallest RISC-V CPU (https://github.com/olofk/serv). SERV is around 2.1kGE or 100-200 LUTs on various FPGA implementations. This means that memory size is the dominating factor here. On FPGAs, which have fixed-size SRAMs of typically 1kB-8kB, we can save a whole SRAM by using a slice of the data/instruction memory for the RF (to give you an idea of how many SERV cores can fit into various FPGAs this way, you can look at https://corescore.store/). For ASICs we get some savings as well because we don't need additional control logic if we use the same memory for RF and instructions/data. And in both cases we can easily reclaim 64 bytes of RAM by just compiling our code for RV32E and using the storage normally used for x16-x31. For the kind of applications I'm envisioning this could be a significant amount of memory. If we have a hand-written piece of assembler that only uses 4 regs then we have a whopping 112 bytes of additional RAM at our disposal.
>
> None of the above requires a new extension. We just need to make sure SW doesn't touch the memory used by the RF, but I'm wondering whether to take this one step further. I'm not a compiler guy though, so I have no idea if any of these things make sense or would be terribly complicated.

You need to make sure SW cannot touch RF memory ("doesn't" is not strong enough). No address generated by SW can access the locations that the HW thinks hold its RF registers.

> First thing I'm thinking of is whether we could/should add something to the compiler/linker that automatically finds out which is the highest-numbered register used and then allows the rest to be used as RAM. Perhaps a flag to set the maximum reg number? I have recently learned about -ffixed-reg, but also that it might not be so easy to use with an ABI. Should this be discussed as part of the effort to ratify the RV32E extension perhaps? Can we do something clever by using sh/sb to change the value of a half/quarter register?

You will find this unworkable: almost any big subroutine will end up using a majority of the registers. This also interacts with the ABI and how registers are used to pass arguments to and return results from subroutines. At best you will have several holes in the register mapping.

> The second thing is that if we have memory-mapped RF, someone mentioned we could also consider adding a CSR to indicate the base address. Having that writable could e.g. enable shadow register files for fast context switching. Again, does that make sense? Would it just be terribly complicated to implement support for?

I was going to mention that you should pursue memory-mapping CSRs too to save space, keeping only the barest and most essential CSRs in flip-flops.

> I briefly discussed this during my presentation at the RISC-V Summit last week (https://award-winning.me/serv-32-bit-is-the-new-8-bit) but never found the time to discuss it in detail with anyone there so I'm hoping to get some comments this way instead. So....what do you think?

My guess is that you can routinely get rid of ½ of the RF read accesses by detecting forwarding and capturing the register in a flip-flop for the next instruction. You can probably get rid of ¼ of the RF writes by eliding writes that will be overwritten in the depth of the pipeline.

> //Olof
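
The forwarding idea in the reply above can be illustrated with a small counting sketch. It does not reproduce the ½ or ¼ estimates, which depend on real instruction traces; it only shows the mechanism of skipping an RF read when the source register was the previous instruction's destination. The trace format and the function name `count_rf_reads` are assumptions for illustration.

```python
# Sketch: count how many RF reads could be served from a one-entry capture
# flip-flop holding the previous instruction's result. Illustrative only.

def count_rf_reads(trace):
    """trace: list of (rd, rs1, rs2) register numbers; None means unused."""
    total = forwarded = 0
    prev_rd = None
    for rd, rs1, rs2 in trace:
        for rs in (rs1, rs2):
            if rs is None or rs == 0:
                continue          # assumption: x0 never needs an RF read
            total += 1
            if rs == prev_rd:
                forwarded += 1    # value is still in the capture flip-flop
        prev_rd = rd if rd else None  # x0 or no destination: nothing captured
    return total, forwarded


# Three dependent instructions: x5 = x6 op x7; x8 = x5 op x9; x10 = x8 op x8
trace = [(5, 6, 7), (8, 5, 9), (10, 8, 8)]
total, forwarded = count_rf_reads(trace)
print(total, forwarded)  # 6 3
```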

Bruce Hoult

Jan 3, 2023, 5:51:31 PM

to Michael Zoran, RISC-V ISA Dev, olof.k...@gmail.com, Allen Baum

SERV instructions generally take either 32 or 64 clock cycles to execute, so accessing SRAM four times in that period is not a problem!


Samuel Falvo II

Jan 3, 2023, 7:24:18 PM

to Michael Zoran, RISC-V ISA Dev, Allen Baum, olof.k...@gmail.com

On Tue, Jan 3, 2023, 10:41 AM Michael Zoran <mic...@michaelzoran.net> wrote:

> I've seen this used on 8-bit microcontrollers such as the ones from Microchip. Not really sure if it fits in with the RISC-V instruction design, because that means 3 memory accesses for most instructions plus the instruction fetch itself.

If the memory subsystem can keep up, not an issue.

The TMS9900 processor, used among other things in the Texas Instruments TI-99/4 family, only has three internal registers. One of them is the "workspace pointer", which points to a small buffer in RAM which holds the current state of the "general purpose registers".

There is definitely prior art that vindicates the idea.


Bruce Hoult

Jan 3, 2023, 8:52:59 PM

to Samuel Falvo II, Michael Zoran, RISC-V ISA Dev, Allen Baum, olof.k...@gmail.com

Where minimal cost is the goal, not performance, yes. There is a long history of making huge performance compromises to get a low-cost entry point into a large software ecosystem, going back to the original Instruction Set ARCHITECTURE, System/360, with the very slow full-featured Model 30 (microcoded 8-bit CPU) and the Model 20, which was a further 5-10 times slower and was stripped back to 16-bit registers and 37 instructions, with no memory protection and no user/supervisor modes.


Allen Baum

Jan 4, 2023, 1:19:30 AM

to Olof Kindgren, RISC-V ISA Dev

> I'm not sure we want to trap. Partly because I believe that systems small enough to benefit from this
> wouldn't necessarily be able to do something meaningful on an illegal access and partly because
> it would allow doing byte accesses on registers...if that would be of any use. Or maybe not.

The PMA can define what kind of accesses are allowed on any address range, so it could allow only whole-word accesses, or allow no access (in which case register fetches bypass the PMA entirely). Effectively your (say) 1KB SRAM becomes 896B of addressable SRAM that Load/Store/Branch can use (minus however many you can use for CSRs).

I don't think that this requires any changes to software. You just need to pass the current SRAM bounds in the DeviceTree or whatever you use. SW needs to know the memory map regardless of how you implement the register file, so passing a slightly different number should have no consequences.
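
A minimal sketch of the "allow no access" option described above, assuming the RF slice sits at the top of a 1 KB SRAM and takes a full 32 x 4 bytes. The names `ADDRESSABLE_BYTES` and `pma_allows` are hypothetical; a real PMA check would be a hardware property, and the placement of the RF slice is an implementation choice.

```python
# Sketch: load/store/fetch see only the SRAM below the RF slice; register
# accesses are assumed to bypass this check entirely.
SRAM_BYTES = 1024
RF_BYTES = 32 * 4                           # full RV32I register file
ADDRESSABLE_BYTES = SRAM_BYTES - RF_BYTES   # 896


def pma_allows(addr: int, size: int) -> bool:
    """True if a software access of `size` bytes at `addr` avoids the RF slice."""
    return addr >= 0 and addr + size <= ADDRESSABLE_BYTES


print(ADDRESSABLE_BYTES)      # 896
print(pma_allows(892, 4))     # True: last addressable word
print(pma_allows(896, 1))     # False: first byte of the RF slice
```

The value of `ADDRESSABLE_BYTES` is the "slightly different number" that software would be told about as the SRAM size.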

Iztok Jeras

Jan 4, 2023, 9:36:39 AM

to Allen Baum, Olof Kindgren, RISC-V ISA Dev

Ciao,

I implemented one such CPU this summer, but I did not advertise it yet:

The CPU is written in simple SystemVerilog as a single file.

https://github.com/jeras/rp32/blob/master/hdl/rtl/mouse/r5p_mouse.sv

This is some documentation I wrote before doing the implementation, so it is not accurate, but it might be better than nothing.

https://github.com/jeras/rp32/blob/master/doc/MOUSE.md

For a quick overview, it only supports the RV32I base.

No traps or interrupts are implemented, and no CSR.

The C extension could be added without too much trouble.

But for now my C extension code is written in rather advanced SystemVerilog RTL (not supported by all tools), and would not be the best coding style fit.

In each clock period the system bus addresses either an instruction, load or store, or a GPR.

The system bus is pipelined and optimized for normal SRAM with 1 clock cycle read delay.

As far as I remember, there are no idle cycles; the system bus is always occupied.

This results in a CPI (clocks per instruction) of about 3.5.

The CPU was tested with riscv-arch-test and is passing all tests.

https://github.com/jeras/rp32/blob/master/hdl/tbn/r5p_mouse_riscv_tb.sv

I synthesized it with Vivado for an Artix FPGA. I do not remember the results well, but it was something like 30 MHz and 300 cells.

I did not spend much time optimizing the code for FPGA synthesis, but this is my second RISC-V CPU (I optimized the first one), so it probably has no more than 20-40% overhead.

https://github.com/jeras/rp32/blob/master/hdl/rtl/soc/r5p_mouse_soc_top.sv

I never ran any SW (except for ISA tests) on the SoC top or on the FPGA, so expect bugs there (wrong GPR address parameter).

If anybody is interested, I can clean up the testbench and Vivado synthesis in a couple of days.

Regards,

Iztok Jeras


Jeff Scott

Jan 4, 2023, 10:06:35 AM

to Iztok Jeras, Allen Baum, Olof Kindgren, RISC-V ISA Dev

I’m curious why you would want to move the X registers to sram?

What is the goal?

I agree the core size is smaller (remove core flops vs. add sram bytes – significant in chip size??), but the increase in power consumption from accessing memory more often would seem to be a bigger concern.

Jeff


Olof Kindgren

Jan 4, 2023, 4:16:05 PM

to Jeff Scott, Iztok Jeras, Allen Baum, RISC-V ISA Dev

On Wed, Jan 4, 2023 at 16:06, Jeff Scott <jeff....@nxp.com> wrote:

> I’m curious why you would want to move the X registers to SRAM?
>
> What is the goal?
>
> I agree the core size is smaller (remove core flops vs. add SRAM bytes – significant in chip size??), but the increase in power consumption from accessing memory more often would seem to be a bigger concern.

I believe things are getting conflated here so let's back up a bit. The end goal is to reduce size to make RISC-V more attractive for deeply embedded applications normally associated with 8-bit CPUs with 224 bytes of RAM or custom FSMs. I probably don't need to convince too many people on this list about the benefit of using RISC-V here with a solid and modern ecosystem and tooling. Using GCC or LLVM, or even programming in Rust or running a proper RTOS like Zephyr has the potential to save a lot of time over using outdated proprietary languages or tools normally found in this space.

Now, there are several things I believe can help accomplish this goal. Let's assume we need a bit less than 1kB of RAM, which isn't unthinkable for these kinds of applications. Now, SERV, which is admittedly at the far end of the spectrum when it comes to size, is just above 2kGE. This is of course highly process dependent, but the 992 FFs needed for a 31x32 RF would probably be five times larger or so. Using alternative storage, like SRAM, is much more manageable for this case.

Then I mentioned reserving 128 bytes of the previously mentioned RAM for the RF storage to avoid two sets of control logic for the data memory and the RF. That also saves a bit of space, and SERV (or other small CPUs) tend to be used for these applications exactly because area is the biggest concern.

However, there is nothing in my proposal that requires the whole memory map to use one type of backing storage. We could very well use a small and fast FF RAM for one part and bigger/slower SRAM for the rest. Also, note that there is nothing so far that would require an extension as long as we hide the RF addresses from the normal memory access ops.

The key thing I am after is to easily reclaim unused registers as memory. Compiling programs as rv32e would directly free up 64 bytes of RAM and this still wouldn't need a new extension. However, we would need an extension if we want to tell the compiler to further minimize register usage and be aware of how many have been used, so an application that only uses five registers would have an extra 108 bytes of memory. But it's here that my poor knowledge of compilers comes in. Does this make any sense to do? Is it even possible? Would anyone want this?

The second thing that _could_ be enabled with memory-mapped RF is to have a CSR that holds the address of the RF. Again, this in itself doesn't need an extension, but if we then want to change this address at runtime, e.g. for fast context switching, then I suspect the tools need to be made aware of this. Again, does this make sense? Is it useful for anyone? Will it be too complicated to implement properly?
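
A behavioural sketch of the writable base-address idea, under stated assumptions: `rf_base` stands in for the hypothetical CSR, the class `TinyCore` and its methods are invented for illustration, and registers are stored little-endian at `rf_base + 4*n`. It only shows why changing the base gives a context switch without copying registers.

```python
# Sketch: register file living in RAM at an address held in a writable base
# register. `rf_base` models the hypothetical CSR; nothing here is a
# ratified RISC-V feature.
REG_BYTES = 4
NUM_REGS = 32


class TinyCore:
    def __init__(self, ram_bytes=1024):
        self.ram = bytearray(ram_bytes)
        self.rf_base = 0  # hypothetical CSR: byte address of x0's slot

    def _slot(self, n):
        return self.rf_base + REG_BYTES * n

    def read_reg(self, n):
        if n == 0:
            return 0  # x0 always reads as zero
        s = self._slot(n)
        return int.from_bytes(self.ram[s:s + REG_BYTES], "little")

    def write_reg(self, n, value):
        if n == 0:
            return  # writes to x0 are discarded
        s = self._slot(n)
        self.ram[s:s + REG_BYTES] = (value & 0xFFFFFFFF).to_bytes(REG_BYTES, "little")


core = TinyCore()
core.write_reg(10, 0x1234)             # context A writes a0
core.rf_base = NUM_REGS * REG_BYTES    # switch to a shadow RF for context B
core.write_reg(10, 0xBEEF)             # context B writes its own a0
core.rf_base = 0                       # switch back to context A
print(hex(core.read_reg(10)))          # 0x1234
```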

The third thing, which is probably not terribly useful, is that we could use sb/sh operations to overwrite parts of a register without and/or masking. That would also require the compiler to be aware.
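
To illustrate the sb/sh point, here is a small sketch that models the RF as little-endian bytes in RAM (RISC-V is little-endian) and compares a byte store into a register's slot with the usual and/or masking sequence. The RF base address of 0 and the helper names are assumptions for illustration.

```python
# Sketch: with the RF in RAM, a byte or halfword store aimed at a register's
# address replaces part of that register. RF base of 0 is an assumption.
ram = bytearray(128)  # 32 registers x 4 bytes


def reg_addr(n):
    return 4 * n


def store(addr, value, size):
    """Model sb (size=1), sh (size=2) or sw (size=4), little-endian."""
    mask = (1 << (8 * size)) - 1
    ram[addr:addr + size] = (value & mask).to_bytes(size, "little")


def read_reg(n):
    a = reg_addr(n)
    return int.from_bytes(ram[a:a + 4], "little")


store(reg_addr(5), 0x11223344, 4)        # x5 = 0x11223344
store(reg_addr(5), 0xAA, 1)              # "sb" into the low byte of x5
via_store = read_reg(5)
via_mask = (0x11223344 & ~0xFF) | 0xAA   # equivalent and/or sequence
print(hex(via_store), hex(via_mask))     # 0x112233aa 0x112233aa

store(reg_addr(5) + 2, 0xBEEF, 2)        # "sh" into the upper half of x5
print(hex(read_reg(5)))                  # 0xbeef33aa
```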

p.s.

People have mentioned memory-mapping CSRs and this is something I already do in SERV, which memory-maps 4 of the 7 supported CSRs. Standardizing an address mapping for CSRs however would probably be impossible.

//Olof

Jeff Scott

Jan 4, 2023, 5:30:47 PM

to Olof Kindgren, Iztok Jeras, Allen Baum, RISC-V ISA Dev

Hi Olof,

I watched your presentation. Really great work.

In many super low end applications that I have experience with, power is more important than area. I can see now your use case is area is more important than power, so many more accesses to sram is not an issue.

In these super low end applications I have experience with, I have seen RISC-V ISA used with 8 (7 really due to X0) implemented X-registers built with flip flops. I assume they were programming in pure assembly for this use case. It would be nice if the compiler could handle this, but have my doubts it would ever happen.

Jeff

Olof Kindgren

Jan 4, 2023, 6:12:48 PM

to Jeff Scott, Iztok Jeras, Allen Baum, RISC-V ISA Dev

On Wed, Jan 4, 2023 at 11:30 PM Jeff Scott <jeff....@nxp.com> wrote:

> Hi Olof,
>
> I watched your presentation. Really great work.

Thank you! :)

> In many super low end applications that I have experience with, power is more important than area. I can see now your use case is area is more important than power, so many more accesses to sram is not an issue.
>
> In these super low end applications I have experience with, I have seen RISC-V ISA used with 8 (7 really due to X0) implemented X-registers built with flip flops. I assume they were programming in pure assembly for this use case. It would be nice if the compiler could handle this, but have my doubts it would ever happen.

I know of at least two ASIC implementations of SERV that do exactly this. And perhaps that's the area/power/usability trade-off that most people prefer. I did ask around a year or two ago if there was any interest in an 8-register ABI (RV32ε for lack of a better name) but there were no takers at the time, with most people citing the massive work required for an ABI and the limited usability. This Zrfinram (or perhaps Zmmrf would be a better name to clarify it's the memory mapping that's the key here) proposal was intended as something in-between, a best-effort saving to more conveniently reclaim a few bytes of RF storage that would otherwise go unused. But it's not something I would be able to drive myself so it would require active interest from other parties to pursue further.

//Olof

K. York

Jan 5, 2023, 2:05:08 AM

to Olof Kindgren, Jeff Scott, Iztok Jeras, Allen Baum, RISC-V ISA Dev

I don't think there are any notable cases where a compiled procedure would prefer a fixed-size, temporally isolated memory buffer over an additional register.

First, let's observe that for leaf functions, this extra memory area obtained from the unused register file is isomorphic to stack space.

Then we look at how compilers do register allocation -- stack space is where the overflows from register allocation are "spilled" to. (Note the Cranelift blog posts on regalloc2, which contemplate applying register allocation to the stack.)

In what situation would a memory location be preferable to registers? I can only think of transmute operations (store as data type A, immediately load as data type B).

For non leaf functions, the extra memory is unusable because they must assume that an unknown function call may use the entire register file, whether as registers or as memory. Functions that only call functions with known register footprints are identical to leaf functions in this analysis.


Robert Lipe

Jan 5, 2023, 9:11:45 AM

to RISC-V ISA Dev, kane...@gmail.com, Jeff Scott, Iztok Jeras, Allen Baum, RISC-V ISA Dev, olof.k...@gmail.com

It's really hard to not overquote posts here. Sorry I can't whack more of Kane's article.

Olof K said:
> goal is to reduce size to make RISC-V more attractive for deeply embedded
> applications normally associated with 8-bit CPUs with 224 bytes of RAM or

Please remember that we already have a RISC-V member that is shipping a 2 KB RAM part that has RV32E AND supports exceptions, interrupts, compressed mode, and all those things that "normal" RISC-V developers expect to see. (Well, except JTAG, but that's another beef.) The part is on Ali and LCSC for USD $0.10 "in quantity".

The parts you're setting out to build sound barely recognizable as RISC-V and would have to beat that existing part in price and some other way to be compelling. I know there's space "under a dime" to fight over but a lot of integrators will choose a part that runs recognizable RV32E opcoded in a normal way, even if it costs $.10 instead of .7 ... because in three years, it's going to cost $.6 anyway. It's just really hard to win a sustained race to the bottom of pin count, gate count, price, or whatever. Be careful what you wish for. A RISC-V part that had memory registers in RAM sounds like a really bad dream for any non-trivial code and more so for the existing toolchains. That momentum should be part of our collective elevator pitch by now.

I'm not saying that everyone should give up on a market because there's already a player there, but WCH's CH32V003 line seems to be the incumbent in the market you're trying to enter. Buy a couple and study them, as best as your non-competes & IP advisors let you.

Low-volume samples are at:
https://www.aliexpress.us/item/3256804850399956.html
Eval boards + their proprietary JTAG probe (sigh) + loose chips:
https://www.aliexpress.us/item/3256804709476544.html (Post bottom has Datasheets, Ref Manuals, and Code Samples for both parts, the advertised RV32E and the more traditional RV32IMAC part that's closer to an STM103.)

Patrick Yang can be found on Twitter and Discord if you want to get hold of them directly.

I'd like to see more working together and less of everyone reinventing everything within RVI...

RJL

Tommy Murphy

Jan 5, 2023, 9:28:37 AM

to Robert Lipe, RISC-V ISA Dev, kane...@gmail.com, Jeff Scott, Iztok Jeras, Allen Baum, RISC-V ISA Dev, olof.k...@gmail.com

Apologies if veering off-topic but...

> Please remember that we already have a RISC-V member that is shipping a 2KRam part that has RV32E

I presume that's actually "some draft version of RV32E"?

Or was it ratified in time for this silicon?

If not, then yet another tools mess like the draft V extension in silicon to come, I presume? :⁠-⁠(

Robert Lipe

Jan 5, 2023, 9:59:21 AM

to Tommy Murphy, RISC-V ISA Dev, kane...@gmail.com, Jeff Scott, Iztok Jeras, Allen Baum, olof.k...@gmail.com

On Thu, Jan 5, 2023 at 8:28 AM Tommy Murphy <tommy_...@hotmail.com> wrote:

> Apologies if veering off-topic but...

Seems fair ground to me. It's the meat and potatoes of the RISC-V chip business: seeing what vendors actually BUILD and SHIP with these specs.

> Please remember that we already have a RISC-V member that is shipping a 2KRam part that has RV32E

> I presume that's actually "some draft version of RV32E"?

Of course. RV32E 1.9 as referenced in Unpriv V20191213 is still Draft.

> Or was it ratified in time for this silicon?

Nope. While I'm sure that WCH is going to sprinkle these parts everywhere and quickly become a volume leader with them, there are other companies that claim to have RV32E parts. I have no idea what degree of interop was even attempted, let alone demonstrated or promised.

> If not, then yet another tools mess like the draft V extension in silicon to come, I presume? :⁠-⁠(

Very much like that. This month brought us production 0.7.1 V in the BL808 and maybe this or probably next month will bring us more via the 910 cores in TH1520. We'll have millions of 0.7.1 devices while production 1.0 silicon in the mass market is still non-existent. (Ditto for PTE reserved bits.) Also, very much like that fiasco, the vendor is keeping their own branches of GDB, GCC, Binutils, and friends and making it super difficult to get source and don't seem to be bothered by either the GPL violation or in upstreaming their respective changes. Just like the Win Chip Head RV32E will be the dominant pre-release strain, T-Head/Alibaba has a helluva head start in shipping volumes of V units. They just happen to be the ones that shouldn't have ever shipped, 0.7.1.

That's a pretty sickening side of all the newest chips across my bench. Bouffalo BL702/706 and JH-7110 all have SiFive cores that are pretty well upstreamed, but there are still a lot of drivers missing for OX64 and M1S Dock with BL808 and siblings Snow64 and Vision Five 2, both using JH-7110 (no V unit at all). Binary blobs for radios and graphics are still the norm in all four of those chip lines. BL808 needs the T-Head compilers to access the NPU, the vector unit, and the "P" extension.

It's still early days, but it's not a great sign that we're already fragmenting and siloing things like we are, IMO.

RJL
