Proposal: Many-Core Forth CPU for FPGA development.

Christopher Lozinski

Mar 12, 2023, 3:07:47 AM

FPGAs are great for building parallel applications, and there is huge growth in the market for FPGAs in data centres, but they are a nightmare to develop for. A grid of Forth cores would be much more fun.

I had no idea how bad it was until I recently started my master's degree in Electrical Engineering (Digital Design) in Katowice, Poland. I thought Verilog and VHDL looked easy, but I completely failed to understand how they work under the hood. One has to study EE to understand how the tools convert a design into digital circuitry (synthesis), how to write good code, and how to recognize disastrous code. Large designs take forever to create. The simulators do not always work correctly, so the simulator can say a design works when in reality it does not, and then debugging on a live FPGA changes the system. Successful projects have to spend at least twice as much on verification as on design. What a nightmare!

I think it would be a lot more fun to develop prototypes on a grid of Forth CPUs. Changes could be tested instantly. Once the application was developed, optimizations could be made.

Why not just use a multicore processor running 8th?
Because then I cannot add in custom logic blocks. Also, multicore processors are optimized for the speed of each individual processor, not for cooperation between processors. The guys who made Linux run on multicore had a very difficult time. There are caches that can become inconsistent, and messaging between cores is limited. In contrast, with a grid of Forth cores on an FPGA, each core gets its own memory area and can talk to its neighbors. Conceptually much simpler. KISS.

In an ideal world, each Forth CPU would have access to its own external memory, to maximize external memory bandwidth. But in the real world, an FPGA board has just one or two memory banks, so it is good to pick something like vision processing, where one image frame gets loaded and shared with all of the processors. Indeed, the Lattice boards even have interfaces for digital cameras.

So let us talk about a particular vision application: vision processing in a moving vehicle, or even with a camera that is being rotated. The vision processing develops a model of the scene being observed: here are the straight edges, here are the flat surfaces, here are the curved edges. As the image moves, the model needs to be handed over to the adjacent processor to be updated, at least to calculate velocities. There is a lot of communication between adjacent nodes, which is hard to do on a multicore CPU.
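A minimal Python sketch of that hand-over, assuming each core owns one square tile of the frame and keeps a private list of tracked features. All names here (Feature, Core, Grid, step) are hypothetical and only illustrate the message-passing pattern, not any existing Forth CPU API. Each frame, a core moves its features, and any feature that leaves its tile is posted to the neighbour's inbox instead of being written into shared memory.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Feature:
    # A tracked edge or surface: position in pixels, velocity in pixels per frame.
    ident: int
    x: float
    y: float
    vx: float
    vy: float


@dataclass
class Core:
    # One (hypothetical) Forth core: owns one image tile and a private feature list.
    row: int
    col: int
    features: list = field(default_factory=list)
    inbox: deque = field(default_factory=deque)


class Grid:
    def __init__(self, rows, cols, tile):
        self.rows, self.cols, self.tile = rows, cols, tile
        self.cores = [[Core(r, c) for c in range(cols)] for r in range(rows)]

    def owner(self, f):
        # Tile (row, col) containing the feature, clamped to the frame edges.
        r = min(max(int(f.y // self.tile), 0), self.rows - 1)
        c = min(max(int(f.x // self.tile), 0), self.cols - 1)
        return r, c

    def step(self):
        # Phase 1: each core advances its own features; a feature that left the
        # tile is posted to the owning core's inbox (a message, not shared state).
        # Assumes features move less than one tile per frame, so the owner is a neighbour.
        for row in self.cores:
            for core in row:
                kept = []
                for f in core.features:
                    f.x += f.vx
                    f.y += f.vy
                    r, c = self.owner(f)
                    if (r, c) == (core.row, core.col):
                        kept.append(f)
                    else:
                        self.cores[r][c].inbox.append(f)
                core.features = kept
        # Phase 2: each core takes ownership of the features handed to it.
        for row in self.cores:
            for core in row:
                while core.inbox:
                    core.features.append(core.inbox.popleft())
```

For example, a feature at x=8 in a 10-pixel tile moving right by 3 pixels per frame ends up owned by the core to the right after one step. The two-phase update mirrors a synchronous design: every core computes, then every core receives.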

Next question: Which Forth core should I use?

There are so many Forth FPGA CPUs, and so many versions of Forth, that it is hard to choose which one to use. Many Forth CPUs are optimized to fit on an FPGA; many Forth virtual machines are optimized to be implemented in C, or to run on an Intel CPU. The one person who worked equally on both FPGAs and C compilers was Dr. Ting. I am sorry that he is now gone. But I believe that he optimized the eForth language and CPU to work best together. So I think that the EP32/24/16 with eForth is the right way to go. There is a large stack of eForth software out there. He also did a brilliant job documenting his designs, which makes them very accessible for beginners. For vision with three 8-bit color channels, the 24-bit EP24 makes sense.

Let me make a few more observations. It is very difficult for a single Forth CPU to compete against mainstream commercial register machines. On the other hand, lots of small Forth cores should be able to outperform a single RISC-V CPU on parallelizable applications. In the Forth Days 2022 video, Dan Golding reports that their Forth CPU is 1/10 the size of a RISC-V CPU. I suspect that it is more than 1/10th as fast. So potentially 10 Forth cores (the area of one RISC-V core), or even 5, could outperform a single RISC-V core.
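The area argument can be made explicit with a little arithmetic. This Python sketch takes the 1/10 area figure from the post; the per-core speed and the parallel efficiency are unknowns, and the 1/5 speed used in the usage note is a purely hypothetical value. The function names are illustrative only.

```python
from fractions import Fraction


def cores_per_area(riscv_area, forth_area):
    # How many Forth cores fit in the area of one RISC-V core (integer division).
    return riscv_area // forth_area


def aggregate_throughput(n_cores, per_core_speed, efficiency=Fraction(1)):
    # Throughput of n Forth cores relative to one RISC-V core (RISC-V = 1).
    # efficiency: fraction of the ideal parallel speedup the application achieves.
    return n_cores * per_core_speed * efficiency


def break_even_speed(n_cores, efficiency=Fraction(1)):
    # Per-core speed (relative to one RISC-V core) at which n cores just tie with it.
    return 1 / (n_cores * efficiency)
```

With 10 cores per RISC-V area, break_even_speed(10) is 1/10, which is the "more than 1/10th as fast" condition above; with only 5 cores each must exceed 1/5. The catch is efficiency: if each core were 1/5 as fast, 10 cores give 2x with perfect parallelism, but only break even if the application achieves 50% of the ideal speedup.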

Clearly a dedicated FPGA application will outperform anything we can build on a grid of Forth CPUs, but I also believe that a lot of small developers will be able to release many innovative FPGA applications quickly, and then, if the market is there, potentially migrate them to a more optimised implementation.

I was very discouraged that Dr. Ting passed away before I had a chance to work with him. I am very mindful that at the recent Forth2020 video conference there was a lot of grey hair. I hope that we can find a project that grows this community, before all of the expertise disengages.

And of course the dream is to build up a large enough community around such a platform, that it eventually gets built in silicon.

Jurgen Pitaske

Mar 12, 2023, 4:47:13 AM

There is one project I saw that might fit in
https://www.facebook.com/groups/1304548976637542/?multi_permalinks=1639420426483727&notif_id=1678535445817102&notif_t=group_activity&ref=notif

Christopher Lozinski

Mar 12, 2023, 6:24:47 AM

Jurgen Pitaske wrote:
> There is one project I saw that might fit in
> https://www.facebook.com/groups/1304548976637542/?multi_permalinks=1639420426483727&notif_id=1678535445817102&notif_t=group_activity&ref=notif

I am of course quite aware of them, and in touch with one of the authors.

So how does this differ from what they are doing?
Officially they are more focused on building a tablet to make Forth accessible to more people. I am more interested in something like the GA144, a large number of tiny stack machines working together to solve computationally intensive problems.
They are Forth advocates. I am less interested in Forth itself, and more interested in advocating for lots of tiny stack processors versus fewer larger register machines. Of course the stack processors will be running Forth, but they could conceivably run something else.

They have a primary processor, maybe some more for an AI application. I am more interested in very parallel applications divided among cores.
They are building a new processor, maybe a new Forth. I want to use existing cores, with an existing eForth user base.
They are in Verilog, EP24 is in VHDL.

And of course, their project exists; this is just vaporware.

Does that explain the difference?

Marcel Hendrix

Mar 12, 2023, 7:02:12 AM

On Sunday, March 12, 2023 at 11:24:47 AM UTC+1, Christopher Lozinski wrote:
[..]

> So how does this differ from what they are doing?
> Officially they are more focused on building a tablet to make Forth accessible to
> more people. I am more interested in something like the GA144, a large number
> of tiny stack machines working together to solve computationally intensive problems.
> They are Forth advocates. I am less interested in Forth itself, and more interested in
> advocating for lots of tiny stack processors versus fewer larger register machines.
> Of course the stack processors will be running Forth, but they could conceivably run
> something else.
> They have a primary processor, maybe some more for an AI application. I am more
> interested in very parallel applications divided among cores.
> They are building a new processor, maybe a new Forth. I want to use existing cores,
> with an existing eForth user base.
> They are in Verilog, EP24 is in VHDL.

[..]

Epiphany multicore chips from Adapteva. The Parallella dev. board can be had for 200$
or so. ( https://www.digikey.nl/en/products/filter/evaluation-boards-embedded-mcu-dsp/786?s=N4IgTCBcDaIA4EMBOCA2qCm6EgLoF8g )

I think "stack processor" is a red herring. As any optimizing Forth compiler will show you,
stacks can be emulated on a register machine.
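One common way an optimizing compiler does this is to keep a compile-time stack of register names, so the data stack disappears from the generated code. The Python sketch below is a toy version of that idea (close in spirit to static stack caching), handling only straight-line code and a handful of words; the three-address output format is made up for illustration.

```python
def compile_forth(words):
    """Translate straight-line Forth into register code by keeping a
    compile-time stack of virtual register names (no run-time stack)."""
    code, stack, next_reg = [], [], 0

    def fresh():
        nonlocal next_reg
        reg = f"r{next_reg}"
        next_reg += 1
        return reg

    binops = {"+": "add", "-": "sub", "*": "mul"}
    for w in words:
        if w.lstrip("-").isdigit():          # literal
            r = fresh()
            code.append(f"{r} = {int(w)}")
            stack.append(r)
        elif w in binops:
            b, a = stack.pop(), stack.pop()  # a is second item, b is top
            r = fresh()
            code.append(f"{r} = {binops[w]} {a}, {b}")
            stack.append(r)
        elif w == "dup":
            stack.append(stack[-1])          # no instruction: same register, new name
        elif w == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]  # also free at run time
        elif w == "over":
            stack.append(stack[-2])
        elif w == "drop":
            stack.pop()
        else:
            raise ValueError(f"unsupported word: {w}")
    return code, stack
```

For `2 3 + dup *` this yields `r0 = 2`, `r1 = 3`, `r2 = add r0, r1`, `r3 = mul r2, r2`, with the result left in r3. Note that DUP and SWAP generate no instructions at all, which is the sense in which the stack is only a notation here.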

The important insight is that Forth may have decisive advantages on parallel hardware.
You will realize that when you have to debug a parallel program on parallel, networked,
hardware.

I almost bought the Epiphany boards but found an idiotic excuse: it needs a weird power
supply :--) I have a few other Forth projects that I really want to do first.

Remember, the only projects worth doing are the ones that experts tell you
are an impossible dead end. Let me tell you, eForth is not suitable.

-marcel

Christopher Lozinski

Mar 12, 2023, 8:19:35 AM

Marcel Hendrix wrote:
>
> Epiphany multicore chips from Adapteva. The Parallella dev. board can be had for 200$
> or so. ( https://www.digikey.nl/en/products/filter/evaluation-boards-embedded-mcu-dsp/786?s=N4IgTCBcDaIA4EMBOCA2qCm6EgLoF8g )
>

Hugely interesting. A friend wrote Lisp for the device. I thought Adapteva had closed down, but now they have been bought by a company in stealth mode, and the boards are once again available. HUGELY tempting.

> I think "stack processor" is a red herring.

The CORE-1 team reports that their stack processor is 1/10th the size of a RISC-V core, meaning that I can fit a lot more of them on a device. And a Forth instruction executes in a single clock cycle. So why is it a red herring?

> Let me tell you, eForth is not suitable.

I would love to know why. I am still learning. The CORE-1 guys said the EP16 did not have an I/O opcode. They started out porting the EP16 to Altera, then ditched the design. I still have not figured out why. I wonder whether Dr. Ting was an electrical engineer who knew what he was doing or not. It is not at all intuitive how synthesis works.

none albert

Mar 12, 2023, 8:45:27 AM

In article <395ef5de-14b9-46fd...@googlegroups.com>,

You misunderstood. eForth is not suitable -> the experts saying don't do it ->
that is the project you should embark on.

(Unless I misunderstood.)
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

Marcel Hendrix

Mar 12, 2023, 9:27:12 AM

On Sunday, March 12, 2023 at 1:45:27 PM UTC+1, none albert wrote:
[..]

> You misunderstood. eForth is not suitable -> the experts saying don't do it ->
> that is the project you should embark in.
>
> (Unless I misunderstood.)

Also remember the way ants build a bridge, and how to become
a millionaire.

-marcel

dxforth

Mar 13, 2023, 5:42:07 AM

An expert is one who has learnt how to get others to take all the risk.

Jurgen Pitaske

Mar 13, 2023, 6:33:50 AM

This might be your approach in your business.

I am in our business more used to the fact,
that the expert understands the issues,
puts them all on the table for the ones involved,
and helps with expert advice to move the project forward.

Anton Ertl

Mar 13, 2023, 6:56:48 AM

Jurgen Pitaske <jpit...@gmail.com> writes:
>On Monday, 13 March 2023 at 09:42:07 UTC, dxforth wrote:
>> An expert is one who has learnt how to get others to take all the risk.
>
>This might be your approach in your business.
>
>I am in our business more used to the fact,
>that the expert understands the issues,

>puts them all on the table for the ones involved,
>and helps with expert advice to move the project forward.

The sentence from dxforth appeared to be a typical dxforthism,
although this appeared particularly nonsensical even among
dxforthisms. Your explanation made clear to me what I had missed,
resulting in a dxforthism with the usual portion of nonsense. What I
had missed was that dxforth probably had a setting in mind where a
decision maker seeks the advice of one or more experts to help him decide.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

dxforth

Mar 13, 2023, 8:02:48 AM

On 13/03/2023 9:46 pm, Anton Ertl wrote:
> Jurgen Pitaske <jpit...@gmail.com> writes:
>> On Monday, 13 March 2023 at 09:42:07 UTC, dxforth wrote:
>>> An expert is one who has learnt how to get others to take all the risk.
>>
>> This might be your approach in your business.
>>
>> I am in our business more used to the fact,
>> that the expert understands the issues,
>> puts them all on the table for the ones involved,
>> and helps with expert advice to move the project forward.
>
> The sentence from dxforth appeared to be a typical dxforthisms,
> although this appeared particularly nonsensical even among

We are sensitive today. As you made no comment upon them, can
we assume you took seriously the comments to which I responded?

Matthias Koch

Mar 13, 2023, 9:36:59 AM

Hi Christopher,

before you embark on the huge challenge, maybe try small single core designs to dive in and get a feeling for FPGAs!

You can try Mecrisp-Ice (custom stack processor) or Mecrisp-Quintus (RISC-V processor), and you should really have a look at the tutorial by Bruno Levy that shows you the path from a simple blinky to a RISC-V processor design.

https://github.com/brunolevy/learn-fpga
https://mecrisp.sourceforge.net/

Matthias

Christopher Lozinski

Mar 16, 2023, 4:40:26 PM

On Monday, March 13, 2023 at 2:36:59 PM UTC+1, Matthias Koch wrote:

>
> You can try Mecrisp-Ice (custom stack processor)

That is a great idea.
I am getting educated about these issues. The EP16/24/32 seem to have two issues.
The stack uses a very large shift register, instead of some memory.
It uses latches for registers, advised against in my class.
Both issues are red flags for me.

There must be a reason the J1 CPU is so popular. I need to read more about it.
I guess I am in the process of exploring these different stack machines.

The work on the CORE-1 is also very interesting. I particularly like how they do not use a cross compiler, they just put the Forth code in a memory region to start off with.

Thank you
Chris

Lorem Ipsum

Mar 16, 2023, 6:55:26 PM

On Thursday, March 16, 2023 at 4:40:26 PM UTC-4, Christopher Lozinski wrote:
> On Monday, March 13, 2023 at 2:36:59 PM UTC+1, Matthias Koch wrote:
>
> >
> > You can try Mecrisp-Ice (custom stack processor)
> That is a great idea.
> i am getting educated about these issues. The EP16/24/32 seem to have two issues.
> The stack uses a very large shift register, instead of some memory.
> It uses latches for registers, advised against in my class.
> Both issues are red flags for me.

I don't find any evidence of either of these being true. I can't find anything that indicates the stacks are shift registers. Looking at ep16.vhd, here is the code for the entire CPU register set.

sync: process(clk,clr)
begin
  if clr='1' then -- master reset
    inten <='0';
    slot <= 0;
    sp <= "00000";
    sp1 <= "00001";
    rp <= "00000";
    rp1 <= "00001";
    t <= (others => '0');
    r <= (others => '0');
    a <= (others => '0');
    p <= (others => '0');
    i <= (others => '0');
    for ii in s_stack'range loop
      s_stack(ii) <= (others => '0');
      r_stack(ii) <= (others => '0');
    end loop;
  elsif (clk'event and clk='1') then
    if reset='1' or slot=3 then
      slot <= 0;
    else slot <= slot+1;
    end if;
    if intload='1' then
      inten <= intset;
    end if;
    if iload='1' then
      i <= data_i(width-1 downto 0);
    end if;
    if pload='1' then
      p <= p_in;
    end if;
    if tload='1' then
      t <= t_in;
    end if;
    if rload='1' then
      r <= r_in;
    end if;
    if aload='1' then
      a <= a_in;
    end if;
    if spush='1' then
      s_stack(conv_integer(sp1)) <= t;
      sp <= sp+1;
      sp1 <= sp1+1;
    elsif spopp='1' then
      sp <= sp-1;
      sp1 <= sp1-1;
    end if;
    if rpush='1' then
      r_stack(conv_integer(rp1)) <= r;
      rp <= rp+1;
      rp1 <= rp1+1;
    elsif rpopp='1' then
      rp <= rp-1;
      rp1 <= rp1-1;
    end if;
  end if;
end process sync;

Google Groups will strip the multiple spaces, so it will be a bit hard to read. I'll present the relevant bits as I talk about them.

sync: process(clk,clr)
begin
  if clr='1' then -- master reset
    ...
  elsif (clk'event and clk='1') then
    ...

This is the part that indicates the registers are registers, and not latches. I expect you read the document ep16inVHDL.pdf and saw all the uses of the verb latch. He is speaking generally, without distinguishing between registers or latches. He is not saying he used latches over clocked registers. Latches are hard to find in FPGAs. They have to be constructed from combinational logic. Virtually no one uses them.

Here are the bits that show the stack is a memory, not a shift register. The memories are addressed by sp1 and rp1. I didn't dig into the code to tell why it has both sp1 and sp. Maybe sp is for reading the stack. The code assigns the s_stack contents to 's', which is a terrible variable name as it is so hard to search on, so I didn't bother. There are also no comments to explain any of the variables used.

if spush='1' then
  s_stack(conv_integer(sp1)) <= t;
  sp <= sp+1;
  sp1 <= sp1+1;
  ...
end if;
if rpush='1' then
  r_stack(conv_integer(rp1)) <= r;
  rp <= rp+1;
  rp1 <= rp1+1;
  ...
end if;

If it were a shift register, it would operate without an address, only having access to the top items on the stack through ports.
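To make the contrast concrete, here is a minimal Python behavioral model (not VHDL, and not taken from the EP16 sources) of the data-stack half of the sync process quoted above. T is a separate register, older items sit in a 32-entry memory addressed by 5-bit pointers, and a push writes exactly one word at address sp1. Two assumptions are labeled in the code: that the second item 's' is read from s_stack(sp), and that the new T value (tload/t_in) comes from logic not shown in the post. Keeping sp1 equal to sp+1 is presumably done so the write address needs no adder, but that is a guess.

```python
class EP16DataStack:
    """Behavioral sketch of the spush/spopp logic in the quoted sync process."""
    DEPTH = 32  # 5-bit pointers, as in sp <= "00000"

    def __init__(self):
        self.t = 0                    # top of stack, a separate register
        self.mem = [0] * self.DEPTH   # s_stack
        self.sp, self.sp1 = 0, 1      # reset values from the clr branch

    @property
    def s(self):
        # Assumption: the second item 's' is read from s_stack(sp).
        return self.mem[self.sp]

    def push(self, value):
        # spush: s_stack(sp1) <= t; sp <= sp+1; sp1 <= sp1+1.
        # Assumption: tload loads the new value into T in the same cycle.
        self.mem[self.sp1] = self.t
        self.sp = (self.sp + 1) % self.DEPTH
        self.sp1 = (self.sp1 + 1) % self.DEPTH
        self.t = value

    def pop(self):
        # spopp: both pointers decrement; assumption: T is reloaded from s.
        value = self.t
        self.t = self.s
        self.sp = (self.sp - 1) % self.DEPTH
        self.sp1 = (self.sp1 - 1) % self.DEPTH
        return value
```

Only one memory word is written per push and none per pop; the rest of the stack never moves, which is the defining difference from a shift register.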

> There must be a reason the J1 CPU is so popular. I need to read more about it.
> I guess I am in the process of exploring these different stack machines.

It is small, both in lines of code and in size in the FPGA. It's also pretty well documented.

> The work on the CORE-1 is also very interesting. I particularly like how they do not use a cross compiler, they just put the Forth code in a memory region to start off with.

That is standard operating procedure for many MCUs and Forths. Cross compilers are messy and a PITA.

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209

Christopher Lozinski

Mar 17, 2023, 1:09:51 AM

On Thursday, March 16, 2023 at 11:55:26 PM UTC+1, Lorem Ipsum wrote:
>> The EP16/24/32 seem to have two issues.
> > The stack uses a very large shift register, instead of some memory.
> > It uses latches for registers, advised against in my class.

> I don't find any evidence of either of these being true. I can't find anything that indicates the stacks are shift registers. Looking at ep16.vhd, here is the code for the entire CPU register set.

Thank you for correcting me. I got it not from the code, but from the documentation.

> "In this design, the CPU latches all data into appropriate registers and stacks on the rising edge of a single phase master clock. Such a synchronous design ensures that all instructions are executed quickly and reliably in a single clock cycle. When the master clock is held steady, the microprocessor retains all data in registers, stacks and memory, consuming very little power. It is thus possible to further reduce its power consumption by reducing the clock rate, or stopping the clock completely."

I can't find the line where it said that the stacks are implemented as shift registers.

"Read the code" is good advice.

> You can try Mecrisp-Ice (custom stack processor)

I did take a look at the Mecrisp-Ice. It runs on the J1a cpu.
In the past I did look at the J1 CPU. I looked again. It now has a 32-bit version, Python simulators, a Verilator (Verilog-to-C++) build, and even a VHDL version. But it has two problems, one specific, one abstract. If I recall correctly, the specific problem is that it has no interrupts. The abstract problem is that it was designed to squeeze into a tiny space for a commercial app.

I am less interested in building an end-user application, and more interested in building a wonderful development environment: a grid of Forth CPUs talking to each other, maybe even interrupting each other. So I think interrupts are critical.

In other news, I have been learning Verilog, and am most unhappy with it.

Here is a large list of Verilog gotchas.
https://lcdm-eng.com/papers/snug06_Verilog%20Gotchas%20Part1.pdf

From the "Zen of Python"
Explicit is better than implicit.

I am not a full time EE, so an explicit language makes much more sense to me.
I think I much prefer VHDL. Of course the grass is always greener on the other side of the fence.

Thank you everyone for all of the great advice. I feel like you care about these issues. Out of caution, I do not even talk about it at school.

Lorem Ipsum

Mar 17, 2023, 1:47:49 AM

On Friday, March 17, 2023 at 1:09:51 AM UTC-4, Christopher Lozinski wrote:
> On Thursday, March 16, 2023 at 11:55:26 PM UTC+1, Lorem Ipsum wrote:
> >> The EP16/24/32 seem to have two issues.
> > > The stack uses a very large shift register, instead of some memory.
> > > It uses latches for registers, advised against in my class.
> > I don't find any evidence of either of these being true. I can't find anything that indicates the stacks are shift registers. Looking at ep16.vhd, here is the code for the entire CPU register set.
> Thank you for correcting me. I got it not from the code, but from the documentation.
>
> > "In this design, the CPU latches all data into appropriate registers and stacks on the rising edge of a single phase master clock. Such a synchronous design ensures that all instructions are executed quickly and reliably in a single clock cycle. When the master clock is held steady, the microprocessor retains all data in registers, stacks and memory, consuming very little power. It is thus possible to further reduce its power consumption by reducing the clock rate, or stopping the clock completely."
>
> I can't find the line where it said that the stacks are implemented as shift registers.

"The T register connects parameter stack and return stack as a giant shift register.
Data can be shifted towards the return stack by a PUSH instruction, and shifted
towards the parameter stack by a POP instruction."

I suspect this is what you read. He is simply describing how the >R and R> instructions (which he calls PUSH and POP) move data back and forth. When you move a data value, it causes the rest of the data stack to move data in the same direction as well.
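Taken literally, the quoted sentence describes the data stack, T and the return stack as one chain of registers. The Python sketch below models that literal reading; the cell layout and the depths are assumptions for illustration, and what gets shifted in at either end is also assumed. Note that every cell is rewritten on each PUSH or POP, whereas ep16.vhd writes a single word of s_stack at address sp1.

```python
class GiantShiftRegister:
    """Literal reading of 'the T register connects parameter stack and return
    stack as a giant shift register'. Layout (hypothetical depths):
    cells[0 .. d-1] = data stack (cells[d-1] is the second item S),
    cells[d] = T, cells[d+1 .. d+r] = return stack (cells[d+1] is R)."""

    def __init__(self, data_depth=4, return_depth=4):
        self.d = data_depth
        self.cells = [0] * (data_depth + 1 + return_depth)

    @property
    def t(self):
        return self.cells[self.d]

    @property
    def r(self):
        return self.cells[self.d + 1]

    def push(self):
        # PUSH (>R): every cell takes its left neighbour's value, so T moves into R
        # and S moves into T. The bottom data cell keeps its value (assumption) and
        # the bottom return-stack cell falls off the end.
        self.cells = [self.cells[0]] + self.cells[:-1]

    def pop(self):
        # POP (R>): every cell takes its right neighbour's value, so R moves into T.
        self.cells = self.cells[1:] + [self.cells[-1]]
```

With data stack [1, 2] and T = 3, a PUSH leaves T = 2 and R = 3, and a POP restores the original contents.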

But even if he was using a shift register, why would that matter?

> "Read the code" is good advice.
> > You can try Mecrisp-Ice (custom stack processor)

I think you munged the attributions. This was from Matthias, not me.

> I did take a look at the Mecrisp-Ice. It runs on the J1a cpu.
> In the past I did look at the J1 CPU. I looked again. It now has 32 bit version, and Python simulators, and verilator to C++ compiler, and even a VHDL version. But it has two problems, one specific, one abstract. If I recall correctly, the specific problem, is that it has no interrupts. The abstract problem is that it was designed to squeeze into a tiny space for a commercial app.

A CPU in an FPGA often does not need interrupts. My CPU design has an interrupt. I tried to optimize my design for ease of calculating speeds, so every instruction is one clock cycle, including the interrupt. It is just a forced call instruction that also pushes the processor status word onto the data stack. It allows very fast servicing of interrupts for hard real time apps.
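Rick's design is not posted, so the following is only a hypothetical Python model of the idea he describes: an accepted interrupt replaces the normal instruction fetch with a forced CALL that also pushes the processor status word onto the data stack, so it costs exactly one clock like any other instruction. The field names, the vector address 0x10, the interrupt-enable bit, and the "reti" instruction are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class StackCPU:
    # Hypothetical minimal state; names are illustrative, not from Rick's design.
    pc: int = 0
    psw: int = 0              # processor status word; bit 0 = interrupt enable (assumed)
    data: list = field(default_factory=list)
    rstack: list = field(default_factory=list)
    irq_vector: int = 0x10    # assumed interrupt vector address
    cycles: int = 0

    IE = 0x1                  # class constant, not a dataclass field

    def step(self, irq, program):
        """Execute one clock cycle. An accepted interrupt is a forced CALL that
        also saves the PSW, so it takes one cycle like any other instruction."""
        self.cycles += 1
        if irq and self.psw & self.IE:
            self.data.append(self.psw)    # save status on the data stack
            self.rstack.append(self.pc)   # return address, as for any CALL
            self.psw &= ~self.IE          # mask further interrupts (assumed policy)
            self.pc = self.irq_vector
            return "irq"
        op = program.get(self.pc, ("nop",))
        self.pc += 1
        if op[0] == "lit":
            self.data.append(op[1])
        elif op[0] == "reti":             # hypothetical return-from-interrupt
            self.psw = self.data.pop()    # restore status, re-enabling interrupts
            self.pc = self.rstack.pop()
        return op[0]
```

Because entry and exit are single instructions, the worst-case interrupt latency is one cycle plus whatever the handler itself does, which is what makes timing easy to calculate for hard real-time work.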

> I am less interested in building an end user application, and more interested in building a wonderful development environment. A grid of Forth cpu's talking to each other, maybe even interrupting each other. So i think interrupts are critical.

Chuck Moore would not agree with you. The GA144 has no interrupts. It has processors that are dedicated to tasks, so they stop and wait for data or even just synchronization.
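The "stop and wait for data" style can be sketched in Python with threads standing in for nodes and a blocking channel standing in for a comm port. This is only an analogy, loosely like a GA144 port and not how the hardware is built: write() returns only after the reader has taken the word, and a node that reads from an empty port simply sleeps, so there is no interrupt and no polling. The Port class, the end-of-stream marker, and the producer/consumer roles are hypothetical.

```python
import queue
import threading


class Port:
    """Blocking one-word channel between two neighbouring nodes (a rendezvous)."""

    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def write(self, word):
        self._data.put(word)
        self._ack.get()            # block until the reader has consumed the word

    def read(self):
        word = self._data.get()    # sleep until a word arrives
        self._ack.put(None)
        return word


def producer(out, n):
    for i in range(n):
        out.write(i * i)
    out.write(None)                # end-of-stream marker (sketch convention)


def consumer(inp, result):
    total = 0
    while (w := inp.read()) is not None:
        total += w
    result.append(total)


def run(n):
    link, result = Port(), []
    nodes = [threading.Thread(target=producer, args=(link, n)),
             threading.Thread(target=consumer, args=(link, result))]
    for t in nodes:
        t.start()
    for t in nodes:
        t.join()
    return result[0]
```

Calling run(4) has one node stream 0, 1, 4, 9 to its neighbour, which sums them to 14; each node is synchronized purely by the port blocking.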

> In other news, I have been learning Verilog, and am most unhappy with it.
>
> Here is a large list of Verilog Gotcha's.
> https://lcdm-eng.com/papers/snug06_Verilog%20Gotchas%20Part1.pdf
>
> From the "Zen of Python"
> Explicit is better than implicit.

Not sure what that means, other than I guess you are not happy with the fact that Verilog has many assumptions that you can override... if you know they are there. I've never learned Verilog, because I've never found a book that explains all the gotchas.

> I am not a full time EE, so an explicit language makes much more sense to me.
> I think I much prefer VHDL. Of course the grass is always greener on the other side of the fence.

VHDL is *very* wordy. It has relaxed a bit in the last decade or two, with many new features that make life easier. I noticed the EP16 code uses (clk'event and clk='1') rather than (rising_edge(clk)). Very old school. It is deprecated, because it can trigger on odd events, such as a std_logic transition from 'H' (weak high) to '1'. But it works normally.

There are still gotchas. VHDL uses delta delays (think infinitesimal time intervals) to order events that happen while simulation time has not advanced. Without them, you could have several FFs toggle on the same clock edge, with some being updated and feeding into others that are evaluated next. But if you run a clock through a buffer, it adds a delta delay, so the downstream FFs will receive data that was already updated on the previous delta cycle. Oops! So watch out for any assignments (buffers) in clock paths.
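The clock-buffer hazard can be reproduced with a toy delta-cycle scheduler. The Python sketch below is not a VHDL simulator; it hard-codes three hypothetical processes: a concurrent assignment clk_buf <= clk (the "buffer"), FF1 (q1 <= a on the rising edge of clk), and FF2 (q2 <= q1 on the rising edge of either clk or clk_buf). Each loop iteration is one delta cycle: apply all scheduled signal values at once, then run the processes whose inputs changed, scheduling their outputs for the next delta.

```python
def simulate_edge(ff2_clock, a=1):
    """One rising clock edge with VHDL-style delta cycles.
    Returns (q2 after the edge, index of the last delta cycle)."""
    sig = {"clk": 0, "clk_buf": 0, "a": a, "q1": 0, "q2": 0}
    pending = {"clk": 1}          # the rising edge itself, applied in delta cycle 0
    last_delta = 0
    while True:
        # Update phase: apply every scheduled value at once, record what changed.
        changed = {s for s, v in pending.items() if sig[s] != v}
        rose = {s for s in changed if pending[s] == 1}
        sig.update(pending)
        pending = {}
        # Evaluate phase: processes read current values; their assignments are
        # scheduled for the next delta cycle (VHDL signal semantics).
        if "clk" in changed:
            pending["clk_buf"] = sig["clk"]   # clk_buf <= clk;  (the "buffer")
        if "clk" in rose:
            pending["q1"] = sig["a"]          # FF1: q1 <= a on rising edge of clk
        if ff2_clock in rose:
            pending["q2"] = sig["q1"]         # FF2: q2 <= q1 on its clock's edge
        if not pending:
            return sig["q2"], last_delta
        last_delta += 1
```

With FF2 on clk, both flip-flops sample in delta 0 and q2 gets the old q1 (0), as a two-stage shift register should. With FF2 on clk_buf, its edge arrives in delta 1, after q1 has already taken the new value, so q2 becomes 1 and a pipeline stage silently disappears in simulation.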

> Thank you everyone for all of the great advice. I feel like you care about these issues. Out of caution, I do not even talk about it at school.

LOL I was in a presentation from a vendor for a new ARM MCU (back when not everyone was selling ARM MCUs). They asked about our applications and I mentioned Forth. The presenter actually laughed and said he didn't think anyone still used it! I just find it more usable than dealing with the tools of C or other HLLs.

I use VHDL, but I guess I'm used to that. Writing code for hardware design is not like writing software... exactly. I think in terms of what I want built, rather than only what it does. VHDL has code that is executed sequentially, but mostly it's concurrent... in parallel. That's the part people have trouble getting used to when they switch from software.

There are many who play/work in Forth because they like it. I guess that's why you are using it for your project. You just like it.

--

Rick C.

Jan Coombs

Mar 17, 2023, 2:18:27 PM

On Thu, 16 Mar 2023 22:09:49 -0700 (PDT)
Christopher Lozinski <caloz...@gmail.com> wrote:

> In other news, I have been learning Verilog, and am most unhappy with it.
>
> Here is a large list of Verilog Gotcha's.
> https://lcdm-eng.com/papers/snug06_Verilog%20Gotchas%20Part1.pdf

I have a few stack processors translated into MyHDL[1] if you'd like
another starting point. This Python library can simulate as fast[2]
as other free simulators, export wave traces to aid debug, and export
Verilog and VHDL for hardware synthesis.

One of the reasons for its existence is to avoid the gotchas and/or
verbosity of other development environments[3][4].

Jan Coombs
--

[1] MyHDL - From Python to Silicon!
https://myhdl.org/

[2] Performance
https://myhdl.org/docs/performance.html

[3] Why MyHDL?
https://myhdl.org/start/why.html

[4] What MyHDL is not
https://myhdl.org/start/whatitisnot.html

Matthias Koch

Mar 19, 2023, 10:42:56 AM

Keep in mind that while Mecrisp-Ice (which comes in 16, 32 and 64 bits) is a direct descendant of Swapforth/J1 by James Bowman, I continued development, and it certainly has interrupts. There are also different variants of the processor, and you can use either shift registers or BRAMs for the stacks. The shift register stacks help when one is short on RAM blocks, and also allow higher maximum frequency in some cases.

Lorem Ipsum

Mar 19, 2023, 2:04:59 PM

On Sunday, March 19, 2023 at 10:42:56 AM UTC-4, Matthias Koch wrote:
> Keep in mind that while Mecrisp-Ice (which comes in 16, 32 and 64 bits) is a direct descendant of Swapforth/J1 by James Bowman, I continued development, and it certainly has interrupts. There are also different variants of the processor, and you can use either shift registers or BRAMs for the stacks. The shift register stacks help when one is short on RAM blocks, and also allow higher maximum frequency in some cases.

When you say "shift registers", what feature in the FPGA are you referring to? Some other brands of FPGAs can use the LUT as a (memory based) shift register, by adding a counter to point to the input and output. At least I think that's what they do. It's been a long time since I've looked at Xilinx devices. Others have this as well, but I'm not sure who. Maybe the Lattice (non-iCE) parts. But the iCE40 devices do *not* use the LUTs as memory or shift registers. I see their "General Purpose FPGAs" have "distributed" RAM, which is using the LUTs. But I see no mention of shift registers.

So, are you using the fabric FFs for the stack memory, if not the BRAMs?

In devices that can use the LUTs as RAM, it's not a bad fit to the stacks. A 4LUT gives you a 16 deep stack and a 6LUT gives you a 64 deep stack. I think the 6LUT can be used as 32 x 2 bits, but don't quote me on that.
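The depth arithmetic follows from a k-input LUT being a 2^k x 1-bit memory when used as RAM, with the stack pointer as the address. This small Python helper, a sketch whose name and parameters are illustrative, computes depth and LUT count per stack; bits_per_lut > 1 models a fractured LUT such as the 6-LUT-as-32x2 case Rick mentions (which he flags as unverified). The pointer counter and any output register are not counted.

```python
def lutram_stack_cost(lut_inputs, width_bits, bits_per_lut=1):
    """Depth and LUT count for one stack built from LUT RAM (sketch).
    A k-input LUT used as RAM stores 2**k x 1 bits, addressed by the stack
    pointer. bits_per_lut > 1 models a fractured LUT, which halves the depth."""
    depth = 2 ** lut_inputs // bits_per_lut
    luts = -(-width_bits // bits_per_lut)   # ceiling division
    return depth, luts
```

For a 16-bit stack: lutram_stack_cost(4, 16) gives (16, 16), lutram_stack_cost(6, 16) gives (64, 16), and the fractured case lutram_stack_cost(6, 16, 2) gives (32, 8). Compare that with one LUT+FF pair per stored bit for a flip-flop stack of the same depth.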

Using the features of a given line of parts can save on resource utilization, but ties the design to that architecture. For some, that's not a problem.

--

Rick C.

Matthias Koch

Mar 21, 2023, 1:06:19 AM

> When you say "shift registers", what feature in the FPGA are you referring to?

The normal flipflops available in fabric.

> But the iCE40 devices do *not* use the LUTs as memory or shift registers. I see their "General Purpose FPGAs" have "distributed" RAM, which is using the LUTs. But I see no mention of shift registers.

True.

> So, are you using the fabric FFs for the stack memory, if not the BRAMs?

Fabric FFs.

> Using the features of a given line of parts can save on resource utilization, but ties the design to that architecture. For some, that's not a problem.

Fabric FFs as shift register stacks are portable and run on Lattice iCE40, ECP5 and on Altera/Intel MAX10, the families I have used so far. The LUTs are not used, as Lattice iCE40 does not support the "lutram" design.

You correctly point out that using fabric FFs as memory is a waste of resources, but the smallest targets of Mecrisp-Ice come with very limited BRAM. HX1K offers 8 KB of dual-port BRAM, UP5K offers 15 KB of BRAM (plus 128 KB in total over four fixed-configuration single-port memories), and HX8K offers 16 KB of BRAM.

In these configurations I decided to give Forth the full amount of dualport BRAM memory available, and use 16 elements * 16 bits * 2 stacks = 512 LUT+FF for stack elements in the smallest HX1K, and 32*16*2=1024 LUT+FF for UP5K and HX8K.
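The same arithmetic as a short Python check, with the result expressed as a share of each device. The stack depths and the 16-bit width come from the post; the logic-cell counts are the totals listed in Lattice's iCE40 datasheets (verify against the current datasheet before relying on them).

```python
def stack_ff_cost(depth, width, stacks=2):
    # One LUT+FF pair per stored stack bit (the LUT implements the shift mux).
    return depth * width * stacks


# Logic-cell totals per device (from the iCE40 datasheets; double-check).
LOGIC_CELLS = {"HX1K": 1280, "UP5K": 5280, "HX8K": 7680}
# Stack depths used by Mecrisp-Ice on each part, as described above.
STACK_DEPTH = {"HX1K": 16, "UP5K": 32, "HX8K": 32}


def stack_share(part, width=16):
    # Fraction of the device's logic cells spent on the two shift-register stacks.
    return stack_ff_cost(STACK_DEPTH[part], width) / LOGIC_CELLS[part]


for part in LOGIC_CELLS:
    print(f"{part}: {stack_ff_cost(STACK_DEPTH[part], 16)} LUT+FF, "
          f"{stack_share(part):.0%} of logic cells")
```

On the HX1K the two 16-deep stacks take 512 of 1280 cells (40%), which shows how much fabric is traded to keep all of the BRAM for Forth; on the UP5K and HX8K the 32-deep stacks take roughly 19% and 13%.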

The ports to larger FPGAs came later in time, and for these, you decide if you like shift registers, or want to use BRAMs as stacks.

Lorem Ipsum

Mar 21, 2023, 1:52:02 AM

Still, when you say shift registers, you literally turn the FFs into shift registers? Yeah, I guess that's what a stack is when it comes down to it. It shifts in two directions, so each stack word needs a 2:1 mux on the input, and Bob's your uncle. Each FF has a LUT to make up the mux, so it fits well.

I don't recall ever actually using an iCE40 part in a design that came to fruition. I looked at them a lot, but mostly for their very low power. In the end, the FPGA approach used more power than other ways of solving the problem, mostly MCUs. They can be very low power when not clocked. The iCE65 line from SiliconBlue had idle currents in the low double-digit uA, but the iCE40 line ended up being more like 100 uA. I don't know if SiliconBlue was being overly aggressive, or if Lattice decided it was not an important feature, and that the parts were easier to produce when specified at a higher number.

I believe someone told me there are more recent members of the iCE40 lines that are in the lower double digit uA again.

Have you looked at the Gowin parts? They are real and shipping in many varieties. Word is they are essentially a spin-off from Lattice, but there's only so much similarity. The docs do suffer a bit from the language barrier. I was going to use one of their parts in a design I'm doing now, but my client sells a lot to the US Government, and Gowin had been put on a US military list of companies considered too close to the Chinese military. So I'm planning to use an Efinix part.

Once I get the design done, I may spend some time rolling a stack processor for this FPGA. But I might also be so burnt out that I swear off electronics forever!

--

Rick C.

Matthias Koch

Mar 21, 2023, 2:04:43 AM

> Still, when you say shift registers, you literally turn the FFs into shift registers? Yeah, I guess that's what a stack is when it comes down to it. It shifts in two directions, so each stack word needs a 2:1 mux on the input, and Bob's your uncle. Each FF has a LUT to make up the mux, so it fits well.

Exactly, yes. On some targets I use a 3:1 MUX for the data stack instead, to drop two elements at once, which is useful, for example, in the store opcode.
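A behavioral Python sketch of the flip-flop stack described in this exchange: cell 0 is the top, and each cell's input mux selects either its upper neighbour (push) or its lower neighbour (pop), with "hold" handled by the flip-flop's clock enable. The 3:1 variant adds a third input, the cell two below, so two items can be dropped in one cycle as in a store. This is not Mecrisp-Ice source; the class name and the value shifted in at the bottom are assumptions.

```python
class ShiftStack:
    """Fabric-FF stack, cell 0 = top. Each cell has a small input mux:
    push: cell[i] <- cell[i-1] (new value enters cell 0)     (2:1 mux input A)
    pop:  cell[i] <- cell[i+1]                               (2:1 mux input B)
    pop2: cell[i] <- cell[i+2]  (third input of the 3:1 mux variant)
    Holding is done with the flip-flop clock enable, so it needs no mux input."""

    def __init__(self, depth=16):
        self.cells = [0] * depth

    def push(self, value):
        self.cells = [value] + self.cells[:-1]   # bottom cell falls off (overflow)

    def pop(self):
        top = self.cells[0]
        # Assumption: the bottom cell refills with its own old value.
        self.cells = self.cells[1:] + [self.cells[-1]]
        return top

    def pop2(self):
        # Drop two items in one cycle, e.g. address and value for a store.
        top, second = self.cells[0], self.cells[1]
        self.cells = self.cells[2:] + [self.cells[-1]] * 2
        return top, second
```

After pushing 1, 2 and 3, a single pop2 returns (3, 2) and leaves 1 on top. Every cell updates on every push or pop, which is why each stack bit costs one LUT (for the mux) plus one FF.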

> Have you looked at the Gowin parts?

Might be interesting in future, I am following the progress of Project Apicula: https://github.com/YosysHQ/apicula

> But I might also be so burnt out that I swear off electronics forever!

I am sorry to read that, and I wish you the best!