zealot_small on Lattice ICE40

Nick Destrycker

May 7, 2015, 4:50:24 AM
to zyli...@googlegroups.com

Hi,

I'm currently working on a project that needs a PWM generator, a frequency counter, and a soft processor with a UART for uploading code (some kind of bootloader). The frequency counter and PWM generator are done, but now I need a very small, open-source, portable soft processor.

It all needs to stay below 1000 LUTs. When using the zealot_small I noticed it uses true dual-port block RAM for ROM, RAM and stack. Unfortunately the Lattice iCE40 doesn't have true dual-port block RAMs, only 2-port block RAM, as I noticed in the datasheet and when synthesizing (the RAM was not inferred correctly and used tons of logic).

zealot_medium doesn't use this dual-port configuration, so I decided to emulate every instruction I can. After removing all hardware-implemented instructions and scrapping some more logic, I got it synthesized at 1175 LUTs, which is still too large.

I saw that zealot_medium uses a single-port ROM and a top-of-stack register, and I think this is the major difference in LUT count.

Any ideas on how to use the zealot_small or zpu_small without true dual-port RAM while staying around 600, and below 700, LUTs?

Thanks in advance

Björn Berglöf

May 7, 2015, 5:39:59 AM
to zyli...@googlegroups.com

Just an idea... If you run the RAM at twice the frequency of the
core, you can build a 2-port RAM out of a 1-port RAM.

    / Mr Bear

--
You received this message because you are subscribed to the Google Groups "zylin-zpu" group.
To unsubscribe from this group and stop receiving emails from it, send an email to zylin-zpu+...@googlegroups.com.
To post to this group, send email to zyli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/zylin-zpu/890bb80d-67bf-4967-84be-4344d9f68410%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nick Destrycker

May 7, 2015, 6:06:05 AM
to zyli...@googlegroups.com
Mr Bear,
Can you be a little more specific? I am able to infer a 2-port RAM like this:

-- Assumes ieee.std_logic_1164 and ieee.std_logic_unsigned (for conv_integer)
-- are in scope, and that mem is declared as an array signal.
process (wclk) -- Write port.
begin
    if rising_edge(wclk) then
        if write_en = '1' then
            mem(conv_integer(waddr)) <= din; -- Uses the write address bus.
        end if;
    end if;
end process;

process (rclk) -- Read port.
begin
    if rising_edge(rclk) then
        dout <= mem(conv_integer(raddr)); -- Uses the read address bus.
    end if;
end process;

but not "true" dual-port RAM. I cannot read/write on both ports at the same time, so I guess a 2-port RAM is not what I need?

On Thursday, May 7, 2015 at 11:39:59 UTC+2, bjorn.berglof wrote:

Björn Berglöf

May 7, 2015, 10:50:32 AM
to zyli...@googlegroups.com

Emulated 2-port RAM is an old standard method, used in the days when
logic was slow and RAMs were often quite fast. The idea is to divide the
clock cycle into two phases (periods of a faster clock). Phase A always
writes, phase B does the reading. Thus it appears as if the RAM has two
ports.

How to handle the clocks, the mux on the address inputs, and the latching
of read data depends on several things, so it is probably not a good idea
unless you are up to speed in hardware design.

It is sometimes called “emulated dual port”, “emulated 2-port”, or
“pseudo dual-port RAM”. Lattice seems to have some support for this;
see pages 22-25 in http://www.latticesemi.com/view_document?document_id=8521
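
A minimal VHDL sketch of the two-phase scheme described above. It is not taken from any Lattice reference design. The entity name ram_2phase, the port names, and the phase input are all assumptions made for illustration; phase is assumed to toggle on every clk2x cycle and to be aligned with the core clock by logic that is not shown here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one single-port memory shared by a write port and a
-- read port by alternating phases of a clock running at twice the core rate.
entity ram_2phase is
    generic (
        ADDR_WIDTH : natural := 8;
        DATA_WIDTH : natural := 32
    );
    port (
        clk2x    : in  std_logic;  -- twice the core clock frequency
        phase    : in  std_logic;  -- '0' = phase A (write), '1' = phase B (read)
        write_en : in  std_logic;
        waddr    : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        din      : in  std_logic_vector(DATA_WIDTH-1 downto 0);
        raddr    : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        dout     : out std_logic_vector(DATA_WIDTH-1 downto 0)
    );
end entity ram_2phase;

architecture rtl of ram_2phase is
    type mem_t is array (0 to 2**ADDR_WIDTH-1)
        of std_logic_vector(DATA_WIDTH-1 downto 0);
    signal mem  : mem_t;
    signal addr : std_logic_vector(ADDR_WIDTH-1 downto 0);
begin
    -- Address mux: the single memory port sees waddr in phase A
    -- and raddr in phase B.
    addr <= waddr when phase = '0' else raddr;

    process (clk2x)
    begin
        if rising_edge(clk2x) then
            if phase = '0' then
                -- Phase A: perform the write, if one is requested.
                if write_en = '1' then
                    mem(to_integer(unsigned(addr))) <= din;
                end if;
            else
                -- Phase B: registered read; dout is only updated here.
                dout <= mem(to_integer(unsigned(addr)));
            end if;
        end if;
    end process;
end architecture rtl;
```

Because dout is only updated in phase B, it stays stable for a full core-clock period. How clk2x and phase are generated, and whether the tools map this onto a block RAM, depends on the device and the toolchain.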

Good luck!

    / MrBear

Rick Collins

May 7, 2015, 1:02:25 PM
to zyli...@googlegroups.com
I'm not sure what you mean when you say the RAM you are inferring is not  "true" dual port.   The iCE40 data sheet refers to it as "pseudo dual port", but that is to indicate the ports are not the same so that you can't perform two reads or two writes at the same time.  I don't see a problem with the read and write being simultaneous and I can find no restrictions indicated in the data books.

That said, I don't know what operations the various CPUs require of the block RAMs, but within the obvious restrictions imposed by having separate read and write ports and only one of each, I don't think there is anything the iCE40 block RAMs can't do.  If you need to perform two simultaneous reads on the block RAM you can write the code exactly that way and the tools should infer the use of two block RAMs sharing a common write port and separate read ports emulating a dual read RAM.  If the tools don't figure this out then you will have to manually code this up.  Two independent write ports to the same memory can't be emulated in a similar manner.
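
A minimal VHDL sketch of the two-RAM arrangement described above (one shared write port, two separate read ports), written the "manual" way with two explicit memory arrays. The entity name ram_1w2r and all port names are assumptions for illustration, and whether a given tool maps each array onto one block RAM is not guaranteed.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: dual-read RAM built from two copies of the data.
-- Every write goes to both copies, so each copy only needs one write port
-- and one read port.
entity ram_1w2r is
    generic (
        ADDR_WIDTH : natural := 8;
        DATA_WIDTH : natural := 32
    );
    port (
        clk      : in  std_logic;
        write_en : in  std_logic;
        waddr    : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        din      : in  std_logic_vector(DATA_WIDTH-1 downto 0);
        raddr_a  : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        dout_a   : out std_logic_vector(DATA_WIDTH-1 downto 0);
        raddr_b  : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        dout_b   : out std_logic_vector(DATA_WIDTH-1 downto 0)
    );
end entity ram_1w2r;

architecture rtl of ram_1w2r is
    type mem_t is array (0 to 2**ADDR_WIDTH-1)
        of std_logic_vector(DATA_WIDTH-1 downto 0);
    signal mem_a : mem_t;  -- copy read through port A
    signal mem_b : mem_t;  -- copy read through port B
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Common write port: keep both copies identical.
            if write_en = '1' then
                mem_a(to_integer(unsigned(waddr))) <= din;
                mem_b(to_integer(unsigned(waddr))) <= din;
            end if;
            -- Two independent registered read ports.
            dout_a <= mem_a(to_integer(unsigned(raddr_a)));
            dout_b <= mem_b(to_integer(unsigned(raddr_b)));
        end if;
    end process;
end architecture rtl;
```

In simulation, reading an address that is being written in the same cycle returns the old value; what the block RAM does in that case should be checked against the data sheet. This costs twice the block RAM and still provides only one write port.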

To help clarify the dual port RAM method Mr Bear is talking about, this is a way to use a single ported RAM as if it were dual ported.  You run the RAM at twice the speed of the rest of the circuit with a 2x clock and a multiplexor on the various inputs. The iCE40 block RAMs can perform simultaneous reads and writes so using the split phase approach does not buy you anything unless this helps with the timing in your particular design. 

I hope this helps.

Rick

Michael Schnell

May 12, 2015, 4:13:44 AM
to zyli...@googlegroups.com
On 05/07/2015 10:50 AM, Nick Destrycker wrote:
> Currently working on a project which needs pwm generator, freq
> counter, soft processor with uart for uploading code (some kind of
> bootloader). freq counter and pwm generator is done but now i need a
> very small open source portable soft processor

Is anything so fast that you can't use a processor chip for that
purpose? E.g., a PIC32 has hardware for PWM and frequency counting, and
for rather fast "queer" stuff it has a second register set (some parts
have even more) to allow extremely fast interrupts without saving and
restoring the registers of the main-line code. Running at 80 or 120 MHz
(the normal "MX" series), you can get a reaction to a timer or a
hardware event within some 100 ns. And there is a 200 MHz "MZ" series.

-Michael

Nick Destrycker

May 20, 2015, 2:59:57 AM
to zyli...@googlegroups.com, msch...@lumino.de
Well, the iCE40 UltraLite is currently the smallest FPGA available (1.4 mm x 1.4 mm). The smallest uC I can find is the KL03 from Freescale, measuring 1.6 mm x 2.0 mm. Size needs to be as small as possible because of the application (body prosthesis). The FPGA will probably dissipate more power, but that is something to evaluate afterwards.

With true dual-port RAM, two address ports are available for read or write operations (two read/write ports). In this mode, you can write to or read from the address of port A or port B, and the data read appears on the output of the corresponding port. With simple dual-port RAM, reading is done through one port and writing through the other.

I might be seeing this wrong, but simple dual-port RAM (Lattice defines this as their dual-port RAM) is not what I need with the ZPU, because stack operations require writing as well as reading through one port. The other port is used for fetching instructions and writing data to the RAM.

Would someone correct me if I'm wrong? I'm having a hard time understanding this soft processor :p

Grtz Nick




On Tuesday, May 12, 2015 at 10:13:44 UTC+2, Michael Schnell wrote:

Michael Schnell

May 20, 2015, 7:10:03 AM
to zyli...@googlegroups.com
On 05/20/2015 08:59 AM, Nick Destrycker wrote:
> Well the ice40 ultralight is currently the smallest fpga
> available (1.4mm x 1.4mm). The smallest uC i can find is the KL03 from
> freescale measuring 1.6mm x 2.0mm. Size need to be as small as possible...

Maybe "chip on board" direct bonding might be an option.

-Michael

Rick Collins

May 20, 2015, 8:51:55 AM
to zyli...@googlegroups.com
At 02:59 AM 5/20/2015, you wrote:
> Well, the iCE40 UltraLite is currently the smallest FPGA available (1.4 mm x 1.4 mm). The smallest uC I can find is the KL03 from Freescale, measuring 1.6 mm x 2.0 mm. Size needs to be as small as possible because of the application (body prosthesis). The FPGA will probably dissipate more power, but that is something to evaluate afterwards.
>
> With true dual-port RAM, two address ports are available for read or write operations (two read/write ports). In this mode, you can write to or read from the address of port A or port B, and the data read appears on the output of the corresponding port. With simple dual-port RAM, reading is done through one port and writing through the other.
>
> I might be seeing this wrong, but simple dual-port RAM (Lattice defines this as their dual-port RAM) is not what I need with the ZPU, because stack operations require writing as well as reading through one port. The other port is used for fetching instructions and writing data to the RAM.
>
> Would someone correct me if I'm wrong? I'm having a hard time understanding this soft processor :p

You can emulate true dual port RAM by using two RAMs with a common write bus and two separate read busses.  This at least gives you multiple read ports. 

I don't recall the need for dual port RAM in the ZPU.  You would have to construct the data flow of each instruction and see what data was moving  where and when.  I see no  reason why the ZPU couldn't be designed without dual port RAM.  The dual port RAM allows concurrent operations but more importantly allows the elimination of multiplexors.  So not using dual port RAM would likely need more logic elements.  But it certainly can be done.

Rick




Hieronymus vanWontz

Jun 24, 2015, 9:21:58 AM
to zyli...@googlegroups.com, gnuar...@arius.com
Hi,

This is an interesting one. The ZPU small generates quite a bit of traffic on the DPRAM (shared for program/data/stack). Most of the traffic is caused by the stack, so if you can isolate the stack memory and use wait cycles, you should be able to run it the "pseudo" way. If your stack is not very deep, you might even use distributed RAM.
The only reason to use real dual-ported RAM is when you use a pipelined ZPU variant. Isolating the stack memory and multiplexing to separate busses works pretty well and does not introduce relevant incompatibilities; plus, your stack won't trash your program code when running into infinite recursion (although I believe there is already some protection from guard values in program memory).
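
One possible reading of the stack-isolation idea above, as a rough VHDL sketch. It is not taken from any ZPU source. The entity name mem_split, the address map (top address bit selects the stack), and the sizes are all assumptions for illustration, and the wait-cycle handling mentioned above is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one CPU-side memory bus steered to either a small
-- stack memory or the main program/data memory by an address decode.
-- Requires STACK_BITS <= ADDR_WIDTH-1.
entity mem_split is
    generic (
        ADDR_WIDTH : natural := 10;  -- word address width seen by the CPU
        STACK_BITS : natural := 5;   -- stack depth = 2**STACK_BITS words
        DATA_WIDTH : natural := 32
    );
    port (
        clk      : in  std_logic;
        write_en : in  std_logic;
        addr     : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
        din      : in  std_logic_vector(DATA_WIDTH-1 downto 0);
        dout     : out std_logic_vector(DATA_WIDTH-1 downto 0)
    );
end entity mem_split;

architecture rtl of mem_split is
    subtype word_t is std_logic_vector(DATA_WIDTH-1 downto 0);
    type main_t  is array (0 to 2**(ADDR_WIDTH-1)-1) of word_t;
    type stack_t is array (0 to 2**STACK_BITS-1) of word_t;
    signal main_mem    : main_t;
    signal stack_mem   : stack_t;
    signal main_q      : word_t;
    signal stack_q     : word_t;
    signal sel_stack   : std_logic;
    signal sel_stack_q : std_logic;
begin
    -- Assumed address map: top address bit set means "stack region".
    sel_stack <= addr(ADDR_WIDTH-1);

    process (clk)
    begin
        if rising_edge(clk) then
            if write_en = '1' then
                if sel_stack = '1' then
                    stack_mem(to_integer(unsigned(addr(STACK_BITS-1 downto 0)))) <= din;
                else
                    main_mem(to_integer(unsigned(addr(ADDR_WIDTH-2 downto 0)))) <= din;
                end if;
            end if;
            -- Both memories are read every cycle; the select is delayed by
            -- one cycle so that it lines up with the registered read data.
            main_q      <= main_mem(to_integer(unsigned(addr(ADDR_WIDTH-2 downto 0))));
            stack_q     <= stack_mem(to_integer(unsigned(addr(STACK_BITS-1 downto 0))));
            sel_stack_q <= sel_stack;
        end if;
    end process;

    -- Output mux back onto the single CPU-side read bus.
    dout <= stack_q when sel_stack_q = '1' else main_q;
end architecture rtl;
```

With this split, each memory needs only one write port and one read port, and a stack that overflows wraps within its own memory instead of overwriting program code.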

Greetings,

- Martin
