Forth in VHDL

Don Golding

Jul 12, 2004, 11:00:08 AM

I am building a simple 16 bit Forth Processor in VHDL which will be embedded
in my future VHDL projects as the high level processor core. I want to
build a very simple and elegant Forth, not a screamer like the P16 or other
Novix-like variations.

I will be posting questions from time to time regarding this project. If
other people have done this already, I would like to hear about it.

Thanks,
Don Golding
dgol...@sbcglobal.net

Bernd Paysan

Jul 12, 2004, 11:55:57 AM

Don Golding wrote:

> I am building a simple 16 bit Forth Processor in VHDL which will be
> embedded
> in my future VHDL projects as the high level processor core. I want to
> build a very simple and elegant Forth, not a screamer like the P16 or
> other Novix like variations
>
> I will be posting questions from time to time regarding this project, if
> other people have done this already I would like to hear about it.

Have a look at my b16 (www.b16-cpu.de).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Don Golding

Jul 12, 2004, 12:22:28 PM

I will take a look at what you have done, thanks for the feedback.

Without looking at your source, it seems you are building a high performance
Forth Processor like Ting's P16.

I want to build a KISS Forth Processor to be used as the "Master Control
Program", with high-speed problems being solved in VHDL.

Once you code a VHDL solution, you just add it to the Forth Processor as a
new opcode, which it can then call, receiving the result on the stack.

I think a simple Forth Processor used this way would use very little
resources (less than 10 percent in a Flex 70 device) and allow this device
to become an application specific Forth Processor using VHDL for speed and
Forth for programmability.
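
As a rough illustration of the "add a VHDL block as a new opcode" idea, here
is a small Python model (a sketch only, not the actual design). The class
name StackMachine, the add_opcode helper, and the "mac" opcode are all
hypothetical; a plain Python function stands in for the VHDL block.

```python
class StackMachine:
    """Toy stack machine whose opcode table can be extended at build time."""

    def __init__(self, width=16):
        self.mask = (1 << width) - 1
        self.stack = []
        self.ops = {"dup": self._dup, "add": self._add}

    def _dup(self):
        self.stack.append(self.stack[-1])

    def _add(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append((a + b) & self.mask)

    def add_opcode(self, name, arity, func):
        """Register a custom opcode: pops `arity` operands, pushes one result."""
        def op():
            args = [self.stack.pop() for _ in range(arity)][::-1]
            self.stack.append(func(*args) & self.mask)
        self.ops[name] = op

    def run(self, program):
        for word in program:
            if isinstance(word, int):
                self.stack.append(word & self.mask)   # literal
            else:
                self.ops[word]()                      # opcode
        return self.stack


def multiply_accumulate(acc, x, y):
    # Stand-in for a fast hardware block written in VHDL (hypothetical).
    return acc + x * y


m = StackMachine()
m.add_opcode("mac", 3, multiply_accumulate)
result = m.run([10, 3, 4, "mac", "dup", "add"])
print(result)
```

The Forth side only sees one more word that consumes operands from the stack
and leaves a result there, which is the property the scheme relies on.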

Thanks,
Don

"Bernd Paysan" <bernd....@gmx.de> wrote in message
news:dtoas1-...@miriam.mikron.de...

Guy Macon

Jul 12, 2004, 12:40:38 PM

Don Golding <dgol...@sbcglobal.net> says...

>I am building a simple 16 bit Forth Processor in VHDL which will be embedded
>in my future VHDL projects as the high level processor core. I want to
>build a very simple and elegant Forth, not a screamer like the P16 or other
>Novix like variations

When you are finished, I would very much like to buy ten sample chips
from you. My contact info is at http://www.guymacon.com/

Charles Steinkuehler

Jul 12, 2004, 2:45:29 PM

to Don Golding, comp.la...@ada-france.org

Don Golding wrote:

> I will take a look at what you have done, thanks for the feedback.
>
> Without looking at your source, it seems you are building a high performance
> Forth Processor like Ting's P16.
>
> I want to build a KISS Forth Processor to be used as the "Master Control
> Program" and high speed problems being solved in VHDL.
>
> Once you code a VHDL solution you just add it to the Forth Processor as a
> new opcode it can then call and receive the result on the stack.
>
> I think a simple Forth Processor used this way would use very little
> resources (less than 10 percent in a Flex 70 device) and allow this device
> to become an application specific Forth Processor using VHDL for speed and
> Forth for programmability.

I've recently done quite a bit of "back-of-napkin" analysis of building
a *SIMPLE* and *TINY* forth processor targeted for an FPGA. I was
targeting Xilinx parts (I'm forced to use Altera 10K and earlier parts
at my day job, and let's just say I'm not a big fan of their architecture).

Anyway, I was targeting a core that would be smaller than the pico-blaze
and run faster (1 clock/instruction). If you remove the speed
constraint, it might be possible to make the design smaller, but I found
myself gaining an appreciation for Chuck Moore's direction of high-speed
*SIMPLE* processors, as it's very easy to balloon the design size
(especially the control logic) by adding 'simple' features.

The key to a small design (for FPGA logic, at least) seemed to be
aggressive reduction of the multiplexer width for the various logic
paths. If you're willing to implement something a bit larger than the
pico-blaze, it looks like a 12 to 16 bit machine would be the most
practical and efficient use of resources. An 8-bit system suffers from
the virtually required disparity between the address and data path
widths, while more than 16 bits is likely overkill.

With a multi-clock per instruction design, you can time-share logic
(potentially reducing size), but you're also increasing control
complexity (multiple instruction states to keep track of, which means
larger instruction decode logic), plus you probably have more
multiplexers to route the various data around (potentially increasing
logic size).

It looks to me like a Forth engine optimized for an FPGA would do as
much as possible in 1-2 layers of logic, making maximum use of the
'free' 4-input logic in front of the flops used for state storage (i.e.,
typically you can fit a 2-1 mux + add/sub or loadable up/down counter,
or combinations like 2-1 mux + add + down counter). Using a 'flat'
design tweaked for high-speed operation (1 or 2 instruction states, max)
would minimize the size of the control and decode logic.
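
To make the "2-1 mux + add in one 4-input LUT" point concrete, here is a
bit-level Python sketch. It assumes a LUT-plus-dedicated-carry-chain
structure of the kind found in many FPGA families; the function names
lut4_propagate and mux_add are made up for the illustration.

```python
from itertools import product


def lut4_propagate(x, y0, y1, sel):
    """One 4-input LUT per bit: XOR of x with the mux-selected y input."""
    return x ^ (y1 if sel else y0)


def mux_add(x, y0, y1, sel, width=16):
    """Compute x + (y1 if sel else y0) using one LUT per bit plus a carry chain."""
    carry, total = 0, 0
    for i in range(width):
        xi = (x >> i) & 1
        p = lut4_propagate(xi, (y0 >> i) & 1, (y1 >> i) & 1, sel)
        total |= (p ^ carry) << i      # sum bit: LUT output XOR carry-in
        carry = carry if p else xi     # carry mux: propagate carry-in, else xi
    return total


# Four inputs per bit, so the whole function fits a 16-row truth table.
truth_table = [lut4_propagate(*bits) for bits in product((0, 1), repeat=4)]
print(len(truth_table), mux_add(1000, 234, 5000, 0), mux_add(1000, 234, 5000, 1))
```

The mux costs no extra logic level here because the select and both mux
inputs share the LUT with the adder operand.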

Feel free to bounce ideas around on the list, or e-mail me directly. I
have both Xilinx and Altera design environments available, and while I
don't have a lot of free time, I have 13 years of experience with FPGA's
(going back to the Xilinx 2K series!) and I'd be interested in helping
if you're doing an open design that can be shared.

--
Charles Steinkuehler
cst...@newtek.com

Albert van der Horst

Jul 13, 2004, 3:41:19 AM

In article <mailman.15.108965795...@ada-france.org>,

Charles Steinkuehler <cst...@newtek.com> wrote:
>The keys to small design (for FPGA logic, at least) seemed to be
>aggressive reduction of the multiplexer width for the various logic
>paths. If you're willing to implement something a bit larger than the
>pico-blaze, it looks like a 12 to 16 bit machine would be the most
>practical and efficient use of resources. An 8-bit system suffers from
>the virtually required disparity between the address and data path
>widths, while more than 16 bits is likely overkill.
>

Would a 32 bit design be impossible for the larger FPGAs?
As far as I can see changing an implementation to 32 bits need not
change much in the instruction set for a stack machine.

>It looks to me like a forth engine optimized for an FPGA would do as
>much as possible in 1-2 layers of logic, making maximum use of the
>'free' 4-input logic in front of the flops used for state storage (ie:
>typically you can fit a 2-1 mux + add/sub or loadable up/down counter
>(or combinations, like 2-1 mux + add + down counter). Using a 'flat'
>design tweaked for high-speed operation (1 or 2 instruction states, max)
>would minimize the size of the control and decode logic.
>
>Feel free to bounce ideas around on the list, or e-mail me directly. I
>have both Xilinx and Altera design environments available, and while I
>don't have a lot of free time, I have 13 years of experience with FPGA's
>(going back to the Xilinx 2K series!) and I'd be interested in helping
>if you're doing an open design that can be shared.

So you are the right person to ask.
What I would like to have is called the junk yard computer.
This could come extremely cheap if you bother to get your connectors
from an old mother board.

It is a 32 bit Forth engine on an FPGA with:
An interface for a standard keyboard integrated into the FPGA.
Then an interface for those 68 pin memory chips that are lying
around by the truck load. Then an interface to a character generator
ROM in a cheap EPROM with a VGA compatible output. Then an IDE
interface.

This would be a much nicer single board computer than those fed
through a serial line to a relatively expensive XP machine.
An alternative would be to interface to a TV.
It would make a spectacular demo on Forth stands.

Is this feasible? As far as I can tell, the burden could be
split over several people. Maybe a DRAM memory interface is a standard
component?

>
>--
>Charles Steinkuehler
>cst...@newtek.com

Groetjes Albert

--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
One man-hour to invent,
One man-week to implement,
One lawyer-year to patent.

Guy Macon

Jul 14, 2004, 12:22:33 PM

Albert van der Horst <alb...@spenarnc.xs4all.nl> says...

>What I would like to have is called the junk yard computer.
>This could come extremely cheap if you bother to get your connectors
>from an old mother board.
>
>It is a 32 bit Forth engine on a FPGA

I would instead specify the best Forth engine that fits on a FPGA
costing less than $N.

>with:
>An interface for a standard keyboard integrated into the fpga.

One day soon you will find it hard to find a "standard keyboard"
just as it is currently hard to find an IBM PC (NOT AT or PS/2)
keyboard. USB is the wave of the future here.

>Then an interface for those 68 pin memory chips that are lying
>around by the truck load. Then an interface to a character generator
>Rom in a cheap EPROM with a VGA compatible output. Then an ide
>interface.

A USB keychain memory would be able to replace the memory chip and
the EPROM. They are cheap and getting cheaper.

>This would be a much nicer single board computer, than those fed
>through a serial line to a relative expensive xp.
>Alternatively would be to interface to a TV.

I wonder how cheaply one could make a B+W NTSC composite signal?
I wonder how cheaply one could make a 640x480+8bit VGA signal?

>It would make a spectacular demo on Forth stands.

Yes. It certainly would.

Charles Steinkuehler

Jul 14, 2004, 1:21:02 PM

to Albert van der Horst, comp.la...@ada-france.org

Albert van der Horst wrote:

> In article <mailman.15.108965795...@ada-france.org>,
> Charles Steinkuehler <cst...@newtek.com> wrote:
>>The keys to small design (for FPGA logic, at least) seemed to be
>>aggressive reduction of the multiplexer width for the various logic
>>paths. If you're willing to implement something a bit larger than the
>>pico-blaze, it looks like a 12 to 16 bit machine would be the most
>>practical and efficient use of resources. An 8-bit system suffers from
>>the virtually required disparity between the address and data path
>>widths, while more than 16 bits is likely overkill.
>>
> Would a 32 bit design impossible for the larger FPGA's?
> As far as I can see changing an implementation to 32 bits need not
> change much in the instruction set for a stack machine.

The instruction set and corresponding control logic wouldn't change
much (if at all) when going from 16 to 32 bits, but in a truly tiny
FPGA Forth engine, the bulk of the logic is in the datapath (i.e., ALU,
rotate logic, etc.). That would basically double in size.

That doesn't mean a 32-bit Forth engine won't fit in an FPGA, or even
that it would be inappropriate in an FPGA, but it does mean that there's
no reason to throw the extra gates into the core unless you really need
32-bit performance.

>>It looks to me like a forth engine optimized for an FPGA would do as
>>much as possible in 1-2 layers of logic, making maximum use of the
>>'free' 4-input logic in front of the flops used for state storage (ie:
>>typically you can fit a 2-1 mux + add/sub or loadable up/down counter
>>(or combinations, like 2-1 mux + add + down counter). Using a 'flat'
>>design tweaked for high-speed operation (1 or 2 instruction states, max)
>>would minimize the size of the control and decode logic.
>>
>>Feel free to bounce ideas around on the list, or e-mail me directly. I
>>have both Xilinx and Altera design environments available, and while I
>>don't have a lot of free time, I have 13 years of experience with FPGA's
>>(going back to the Xilinx 2K series!) and I'd be interested in helping
>>if you're doing an open design that can be shared.
>
> So you are the right person to ask.
> What I would like to have is called the junk yard computer.
> This could come extremely cheap if you bother to get your connectors
> from an old mother board.
>
> It is a 32 bit Forth engine on a FPGA with:
> An interface for a standard keyboard integrated into the fpga.
> Then an interface for those 68 pin memory chips that are lying
> around by the truck load.

??? What 68-pin memory chips are you referring to?

> Then an interface to a character generator
> Rom in a cheap EPROM with a VGA compatible output. Then an ide
> interface.
>
> This would be a much nicer single board computer, than those fed
> through a serial line to a relative expensive xp.
> Alternatively would be to interface to a TV.
> It would make a spectacular demo on Forth stands.
>
> Is this feasible? As far as I can tell, the burden could be
> split over people. Maybe a DRAM memory interface a standard
> component?

Feasible, yes...all it would take is design time. You wouldn't even
need a particularly large or expensive FPGA. You'd probably need a USB
phy chip of some sort, but pretty much everything else could be done
directly by the FPGA itself (IDE, DRAM interface, display, etc).

You'll need an external D/A for the video interface, but at NTSC/PAL
composite video rates (or 640x480 VGA), you can get by with CMOS gates
from the FPGA driving an R2R ladder with maybe a couple transistors for
buffering, gain, and/or ESD protection (total cost a few cents in
volume, or probably about a quarter for 3-channel output in low qty).
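
For a feel of how little logic the display side needs, here is a Python
sketch of the two counters and comparators behind a 640x480 sync generator,
plus the ideal output of a resistor ladder. The timing numbers are the
commonly published 640x480 @ 60 Hz values and are an assumption here, as are
the 3-bit ladder and the 3.3 V drive level; none of them come from this
thread.

```python
# Commonly published 640x480 @ 60 Hz timing (assumed, not from the thread).
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
PIXEL_CLOCK_HZ = 25_175_000

H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK   # pixel clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   # lines per frame


def vga_signals(x, y):
    """Return (hsync, vsync, visible) for counter values x, y; syncs active low."""
    hsync = not (H_VISIBLE + H_FRONT <= x < H_VISIBLE + H_FRONT + H_SYNC)
    vsync = not (V_VISIBLE + V_FRONT <= y < V_VISIBLE + V_FRONT + V_SYNC)
    visible = x < H_VISIBLE and y < V_VISIBLE
    return hsync, vsync, visible


def r2r_level(code, bits=3, vref=3.3):
    """Ideal unloaded R-2R ladder output voltage for a digital code."""
    return vref * code / (1 << bits)


refresh_hz = PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)
print(H_TOTAL, V_TOTAL, round(refresh_hz, 2))
```

In hardware this would be two counters and a handful of comparators; the
ladder model ignores the load of the monitor input, so real levels would
need checking.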

The hardware design for something like this would be very simple, but
the software to make it do anything useful is a pretty monumental task,
especially if you don't have a core to start from...

You *COULD* try to port something like Linux to the hardware, but what's
the point of making a stack-based Forth processor only to try to
shoe-horn a bunch of C code into it? Just go buy a PPC/ARM/whatever
that's been optimized to run C, comes with the I/O you're wanting, and
already has support for Linux in the main kernel source tree.

--
Charles Steinkuehler
cst...@newtek.com

Charles Steinkuehler

Jul 14, 2004, 1:29:44 PM

to http://www.guymacon.com, comp.la...@ada-france.org

Guy Macon wrote:

> Albert van der Horst <alb...@spenarnc.xs4all.nl> says...
>

>>This would be a much nicer single board computer, than those fed
>>through a serial line to a relative expensive xp.
>>Alternatively would be to interface to a TV.
>
> I wonder how cheaply one could make a B+W NTSC composite signal?
> I wonder how cheaply one could make a 640x480+8bit VGA signal?

If there's a fairly modern FPGA that includes a stack-based processor
core, you could add a composite/VGA output for the cost of the I/O pins,
a few thousand gates, and a handful of external discretes...in other
words, essentially free.

>>It would make a spectacular demo on Forth stands.
>
> Yes. It certainly would.

What exactly would you do with such a system, other than run a demo?

Would this be a video/vga version of "Look honey, I made the LED blink!"?

NOTE: I'm not trying to be sarcastic...I'm seriously interested in
working on an HDL version of a stack-based machine, and have the design
experience to glue on video, DRAM, and IDE interfaces (been there, done
that, at least with non-DMA versions of IDE). What I don't have is a
lot of free time or a commercial reason to work on such a beast. But if
it's enough fun (and there are others interested in contributing), it
would be worth working on anyway. :)

--
Charles Steinkuehler
cst...@newtek.com

Don Golding

Jul 14, 2004, 1:45:07 PM

I am currently using the Altera University Board which includes: VGA Monitor
Port, Standard PC Keyboard port, two hexadecimal LED displays and a Flex 70
FPGA. It costs about $100 from Altera or through University bookstores.
The tool you want to use is Quartus II which is a free download from their
website as well. It comes with the libraries for the keyboard and VGA ports
that allow you to implement them no sweat. It shouldn't be a problem to
implement a 32 bit Forth on this device.

The Forth Processor I am designing in VHDL uses generics to specify the data
width and other attributes such as stack depth for easy customization.
Theoretically, you only need to change these values to go from 16 bit to 32
bit architecture. That is the goal. The current 16 bit design will use
less than 15 to 20 percent of the FPGA, leaving plenty of room for
specialized VHDL processes when you need the fastest speeds possible. VHDL
syntax is a lot like Pascal's, with some specific differences such as
signals vs. variables.
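
The effect of parameterizing the core can be modelled in a few lines of
Python (a sketch, not the VHDL itself). The names ForthCore, data_width and
stack_depth are hypothetical stand-ins for the generics, and raising an
error on a full stack is just one possible choice for the model.

```python
from dataclasses import dataclass, field


@dataclass
class ForthCore:
    data_width: int = 16    # plays the role of a VHDL generic
    stack_depth: int = 8    # plays the role of a VHDL generic
    stack: list = field(default_factory=list)

    def push(self, value):
        if len(self.stack) >= self.stack_depth:
            raise OverflowError("data stack full")
        # Values wrap to the configured word size.
        self.stack.append(value & ((1 << self.data_width) - 1))

    def add(self):
        b, a = self.stack.pop(), self.stack.pop()
        self.push(a + b)
        return self.stack[-1]


core16 = ForthCore(data_width=16)
core32 = ForthCore(data_width=32)
for core in (core16, core32):
    core.push(0xFFFF)
    core.push(1)
sum16, sum32 = core16.add(), core32.add()
print(sum16, sum32)
```

The same program wraps to zero on the 16 bit instance and carries on the
32 bit one, with nothing changed but the two parameters.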

Don

"Charles Steinkuehler" <cst...@newtek.com> wrote in message
news:mailman.17.108982568...@ada-france.org...

rickman

Jul 14, 2004, 2:33:18 PM

Let me add my two cents worth. I have a similar background (even though
I never worked with the Xilinx 2k family) however I cut my eye teeth on
microprogrammed processor design back in the early 80's.

I am about 90% through (been there for several days now :) an FPGA Forth
CPU design and I discovered (or rediscovered if you consider Chuck's and
Jeff's work) some of the same issues you found. I started with a 4 bit
instruction set since the internal memory of an FPGA doesn't map well to
non powers of two. My goals were the same as yours: to keep it as small
as possible and to make it run 1 instruction per clock. This makes the
design a lot simpler. The main thing I discovered was that the
multiplexors are what drives the design size. So I have focused on
limiting extraneous multiplexors that appear when you try to directly
implement Forth code.

I'm not sure how much to tell about what I have come up with. Since I plan
to use this in a design (both Altera and Xilinx parts) I have not
decided if I will release the work or not. But I am happy to discuss
the design issues.

I will say that two things I decided may help others. I found there is
a lot of value in the address sources used in machineForth. They can
use either a special A register or the R reg (top of return stack) with
auto increment capability. I was planning to use a literal construction
register similar to the Transputers (each literal instruction shifts the
data into it). While trying to optimize the instruction set to improve
do loop performance I ended up combining the literal reg with the return
stack and found I also had the logic to do autoincrement for free. So
my instruction set provides fetch and store as well as ftchpls and
storpls which don't drop the address and add 1 or 2 depending on the
byte access flag. In the end it seems the return stack is a natural
place to provide addresses.

The other "discovery" I made was in the opcode size. I was seeing slow
performance (~50 MHz) partly because the instruction decode logic was
pretty complex. I had been "extending" the 4 bit opcode with a prefix
instruction and so had to decode 7 bits. When I had to go to a larger
opcode to allow for more efficient coding I realized that by using the
full 8 bits from the memory for an effective 5 bit opcode, I could
"pre-decode" the instruction somewhat. I use the msb to flag a literal
vs. other instructions. For the 7 bits that are left, I use a 3 of 7
encoding, which provides 35 more instructions. That means any one
instruction can be decoded with just 4 bits of the 8 bit opcode. By
judiciously picking my opcode encoding to provide bits that directly
control various functions, I can greatly reduce the amount of decoding
needed. For example, I use opcode bit 0 to select pop/push for the
return stack and bit 4 for the same function on the data stack. Other
bits are used directly to select the ALU function. Decoding the jumps
and calls is done with just two bits.
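
The 3 of 7 arithmetic can be checked with a short Python sketch. The bit
layout (bit 7 set for a literal, a 3 of 7 code in bits 6..0 otherwise) is
read off the description above; the numbering of the instructions and the
helper names are invented for the example.

```python
from itertools import combinations

# Assumed layout: bit 7 = 1 marks a literal; else bits 6..0 hold a 3-of-7 code.
OPCODES = {}
for index, bits in enumerate(combinations(range(7), 3)):
    code = sum(1 << b for b in bits)
    OPCODES[code] = (index, bits)


def matches(opcode, bits):
    """Test one instruction by looking at only 4 bits: the msb and its 3 set bits."""
    return ((opcode >> 7) & 1) == 0 and all((opcode >> b) & 1 for b in bits)


def decode(opcode):
    """Return the instruction index for a valid non-literal opcode, else None."""
    hits = [idx for idx, bits in OPCODES.values() if matches(opcode, bits)]
    return hits[0] if len(hits) == 1 else None


print(len(OPCODES))
```

Because every valid code has exactly three bits set, finding the three bits
of one instruction set rules out all the others, so each decoder never needs
the remaining four bits.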

One final comment which may be obvious. An FPGA design can run from
internal memory at very high speeds. External memory is very slow by
comparison (again, Jeff and Chuck saw this issue even more pronounced in
their ASIC designs at 500 MHz). My design is intended to be a
controller for low level chip functions and so can run completely from
on chip memory. Trying to make a general CPU this way will require a
very complex external memory interface (cache and pipelining) to get any
sort of speed compared to commercial CPUs.

--

Rick "rickman" Collins

rick.c...@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX

Charles Steinkuehler

Jul 14, 2004, 3:02:10 PM

to jo...@bluepal.net, comp.la...@ada-france.org

rickman wrote:

Interesting...I'd looked at shifting literals into the TOS register,
but that required 2 instructions (load literal into TOS, and
shift/append literal to TOS) and increased the size of the multiplexer
in front of the TOS, which was already by far the biggest mux in the
design. I hadn't thought of using the return stack...

> The other "discovery" I made was in the opcode size. I was seeing slow
> performance (~50 MHz) partly because the instruction decode logic was
> pretty complex. I had been "extending" the 4 bit opcode with a prefix
> instruction and so had to decode 7 bits. When I had to go to a larger
> opcode to allow for more efficient coding I realized that by using the
> full 8 bits from the memory for an effective 5 bit opcode, I could
> "pre-decode" the instruction somewhat. I use the msb to flag a literal
> vs. other instructions. There are 7 bits left which I use 3 of 7
> encoding to provide 35 more instructions. That means any one
> instruction can be decoded with just 4 bits of the 8 bit opcode. By
> judiciously picking my opcode encoding to provide bits that directly
> control various functions, I can greatly reduce the amount of decoding
> needed. For example, I use opcode bit 0 to select pop/push for the
> return stack and bit 4 for the same function on the data stack. Other
> bits are used directly to select the ALU function. Decoding the jumps
> and calls are done with just two bits.

Sounds very nifty! I started with the 3 packed 5-bit instructions of
the P-series cores and was headed towards 6-bit partially pre-decoded
instructions (3 of which would fit in the new memories of the latest
FPGA's, which include parity bits on the internal block rams), but it
seemed like the additional problems caused by leaving "natural" byte (or
at least nibble) sized data chunks weren't worth it, and the control
logic still looked like it would get pretty big (I still had lots of
instructions that required more than 4 bits for decode). I like the
trade-off of dropping one instruction per 16-bits and using an
instruction set designed for 4-bit decodes.

> One final comment which may be obvious. An FPGA design can run from
> internal memory at very high speeds. External memory is very slow by
> comparison (again, Jeff and Chuck saw this issue even more pronounced in
> their ASIC designs at 500 MHz). My design is intended to be a
> controller for low level chip functions and so can run completely from
> on chip memory. Trying to make a general CPU this way will require a
> very complex external memory interface (cache and pipelining) to get any
> sort of speed compared to commercial CPUs.

My intent was to make something that would run entirely out of the FPGA
internals, keeping everything running at 1 instruction per clock with
the possible exception of I/O accesses which might be able to insert
wait-states.

If you decide to make your design openly available, I'd love to take a
look at it. If not, I really appreciate the info above...it's already
saved me a lot of time should I ever get enough free time to roll a
design of this sort for fun (or a good enough reason to implement it on
someone else's dime).

--
Charles Steinkuehler
cst...@newtek.com

Albert van der Horst

Jul 14, 2004, 1:34:49 PM

In article <10fandu...@corp.supernews.com>,

Guy Macon <http://www.guymacon.com> wrote:
>
>Albert van der Horst <alb...@spenarnc.xs4all.nl> says...
>
>>What I would like to have is called the junk yard computer.
>>This could come extremely cheap if you bother to get your connectors
>>from an old mother board.
>>
>>It is a 32 bit Forth engine on a FPGA
>
>I would instead specify the best Forth engine that fits on a FPGA
>costing less than $N.

Why? Who is interested in saving 10 euro in a project that
takes all the free time of a long winter? Not me!

Who talks about costs? It is about getting some use out of junk
components.

>
>>with:
>>An interface for a standard keyboard integrated into the fpga.
>
>One day soon you will find it hard to find a "standard keyboard"
>just as it is currently hard to find a IBM PC (NOT AT or PS/2)
>keyboard. USB is the wave of the future here.

It isn't hard to find. How dare you say that? Every other week
someone asks to leave one of those with me. The weeks in
between I find one alongside the street.
I have a dozen lying around. I could have bought one brand new
(the original IBM AT, heavier than a laptop) for one euro.

USB is only the "wave of the future" for those who want to keep
technology out of our hands. You need to pay license fees for
even using an address.

rickman

Jul 14, 2004, 6:13:44 PM

I'm not sure what you mean by "2 instructions". Are you saying that you
would need two opcodes? I am using just one and use a state variable to
distinguish between the first and contiguous literal instructions. The
first LIT does a load with sign extension and others that immediately
follow do a load with shift.
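
A Python sketch of that single-opcode scheme, under stated assumptions: an
8 bit opcode whose msb flags a literal and whose low 7 bits are the payload,
and a 16 bit register. Both widths and the name run_literals are
illustrative only.

```python
WIDTH = 16          # assumed data width
PAYLOAD_BITS = 7    # assumed: 8-bit opcode with the msb flagging a literal
MASK = (1 << WIDTH) - 1


def run_literals(opcodes):
    """Return the literal register value after executing a list of opcodes."""
    reg, in_literal = 0, False
    for op in opcodes:
        if op & 0x80:                        # literal opcode
            payload = op & 0x7F
            if not in_literal:               # first LIT: load, sign extended
                if payload & 0x40:
                    payload -= 1 << PAYLOAD_BITS
                reg = payload & MASK
            else:                            # following LITs: shift in 7 bits
                reg = ((reg << PAYLOAD_BITS) | payload) & MASK
            in_literal = True
        else:
            in_literal = False               # any other opcode resets the state
    return reg


print(hex(run_literals([0x80 | 0x7F])), hex(run_literals([0x81, 0x80 | 0x23])))
```

The one bit of state is what lets a single opcode cover both the load and
the shift cases.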

The return stack worked out for me not so much because of the TOS mux
issue. Mine is also very large, but I was trying to make this the focus
and eliminate muxes elsewhere when I could. In the Altera parts they
have a cascade chain that can implement the mux OR gate with 0.1 ns
delay. The newer Xilinx parts can do very wide muxes (16 inputs I
think) in a single CLB. So the TOS mux will only be 1 level of logic in
either case. Of course this optimization does not reduce the number of
LUTs needed. Adders are pretty cheap (in LUTs) compared to muxes. An
adder is the same as a 2 input mux. So it often is better to use more
adders and fewer data path muxes if you can.

In my original 4 bit ISA, some 8 or so clock cycles were needed to
implement the LOOP function. I thought this was pretty poor. So I
added an adder/subtractor to the return stack to allow LOOP to be done
in two clock cycles. Then I realized how this could help with address
increments as well.

The purpose of the LIT reg was to provide addressing for CALLs with
minimum overhead. So in essence this was like Chuck's and Jeff's A
register. Once I had an adder on the return stack, I saw that I could
move the LIT reg function to the return stack and actually *save* logic
(once I had the adder).

I dropped the instruction packing concept because it required an extra
layer of mux on the instruction register the way I would do it. My
speed critical paths are a toss-up between the fetch and the stack ops.
Removing this mux will help with the overall speed since instruction
fetch is part of *every* cycle.

> > One final comment which may be obvious. An FPGA design can run from
> > internal memory at very high speeds. External memory is very slow by
> > comparison (again, Jeff and Chuck saw this issue even more pronounced in
> > their ASIC designs at 500 MHz). My design is intended to be a
> > controller for low level chip functions and so can run completely from
> > on chip memory. Trying to make a general CPU this way will require a
> > very complex external memory interface (cache and pipelining) to get any
> > sort of speed compared to commercial CPUs.
>
> My intent was to make something that would run entirely out of the FPGA
> internals, keeping everything running at 1 instruction per clock with
> the possible exception of I/O accesses which might be able to insert
> wait-states.
>
> If you decide to make your design openly available, I'd love to take a
> look at it. If not, I really appreciate the info above...it's already
> saved me a lot of time should I ever get enough free time to roll a
> design of this sort for fun (or a good enough reason to implement it on
> someone else's dime).

I will email you my block diagram (PDF). It clearly shows my first
lesson, which Jeff and Chuck had already figured out: separate,
dedicated memories allow a simplification of the address generation and
therefore better speed.

I assume your email address is valid?

Guy Macon

Jul 14, 2004, 9:53:11 PM

Albert van der Horst <alb...@spenarnc.xs4all.nl> says...

>More rambling contrary to the idea of a junk yard computer deleted.

>How dare you

>My ass.

This is an overly aggressive reply to some quite civil comments
concerning the features one might desire. If you wish to have
a discussion that is only about your vision for what such a
device should be, I suggest that you stop posting to Usenet and
start emailing yourself.

Guy Macon

Jul 14, 2004, 10:27:27 PM

Don Golding <dgol...@sbcglobal.net> says...

>I am currently using the Altera University Board which includes: VGA Monitor
>Port, Standard PC Keyboard port, two hexadecimal LED displays and a Flex 70
>FPGA. It costs about $100 from Altera or through University bookstores.

http://www.usna.edu/EE/ee461/Lab/Altera.html
http://search.altera.com/query.html?qt=up1
http://search.altera.com/query.html?qt=up2

rickman

Jul 15, 2004, 12:58:13 AM

Relax. I can see where you might take offense at his post. But I think
he meant it to be much lighter than it came off. Let's face it, a 72
pin DRAM module will never take the place of an 8 pin MSOP serial
RAM/EEPROM. So he must be at least half kidding!

Besides, what good is there to getting ticked off about something posted
on the Internet?

Albert van der Horst

Jul 15, 2004, 3:53:13 AM

In article <40F60EE5...@yahoo.com>, rickman <jo...@bluepal.net> wrote:
<SNIP>

>
>Relax. I can see where you might take offense at his post. But I think
>he meant it to be much lighter than it came off. Let's face it, a 72
>pin DRAM module will never take the place of an 8 pin MSOP serial
>RAM/EEPROM. So he must be at least half kidding!

Are you referring to the dogma that you need at least some ROM in a
system for booting?
This is of course also true for a Forth computer on FPGA with integrated IDE.
However, a couple of hundred bytes on chip (Forth code!) should be amply
sufficient to boot from IDE. The IDE interface could connect to a flash
ROM drive or a real one. During development a flash drive would be
ideal. A system with 8 Mbyte could initialise a hard disk from a flash
drive in no time; two disks on IDE is just software.

I stand by the following connectors :
72 pin DRAM
IDE
VGA
5 pin keyboard

The board is empty except for the D/A converter (a few resistors)
and an FPGA.
A prototype board would need an interface to program the device.

Later on we could add some jumpers and get the remaining I/O pins out
to do some useful work.

>Besides, what good is there to getting ticked off about something posted
>on the Internet?

It is no good. Once in three years it happens.

>Rick "rickman" Collins

Charles Steinkuehler

Jul 15, 2004, 7:32:00 AM

to jo...@bluepal.net, comp.la...@ada-france.org

rickman wrote:

> Charles Steinkuehler wrote:
>>
>> Interesting...I'd looked at shifting literals into the TOS register,
>> but that required 2 instructions (load literal into TOS, and
>> shift/append literal to TOS) and increased the size of the multiplexer
>> in front of the TOS, which was already by far the biggest mux in the
>> design. I hadn't thought of using the return stack...
>
> I'm not sure what you mean by "2 instructions". Are you saying that you
> would need two opcodes? I am using just one and use a state variable to
> distinguish between the first and contiguous literal instructions. The
> first LIT does a load with sign extension and others that immediately
> follow do a load with shift.

Ah...another optimization I hadn't discovered yet!
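
The single-opcode literal scheme rickman describes can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the 7-bit literal field (LIT_BITS), the class name LiteralUnit, and the method names are made up for illustration, since the thread does not give the actual field width or encoding.

```python
WORD_BITS = 16   # data path width discussed in the thread
LIT_BITS = 7     # ASSUMPTION: literal field width is not given in the thread
WORD_MASK = (1 << WORD_BITS) - 1
LIT_MASK = (1 << LIT_BITS) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


class LiteralUnit:
    """Hypothetical model: one LIT opcode plus a 'previous was LIT' state bit."""

    def __init__(self):
        self.stack = []
        self.prev_was_lit = False

    def lit(self, field):
        field &= LIT_MASK
        if self.prev_was_lit:
            # A LIT that immediately follows a LIT: shift TOS and append.
            tos = self.stack.pop()
            self.stack.append(((tos << LIT_BITS) | field) & WORD_MASK)
        else:
            # First LIT of a run: push the field with sign extension.
            self.stack.append(sign_extend(field, LIT_BITS) & WORD_MASK)
        self.prev_was_lit = True

    def other(self):
        """Any non-LIT instruction clears the state bit."""
        self.prev_was_lit = False
```

With these assumptions, a small constant costs one instruction and a full 16-bit constant costs three, without needing a second opcode.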

> The return stack worked out for me not so much because of the TOS mux
> issue. Mine is also very large, but I was trying to make this the focus
> and eliminate muxes elsewhere when I could. In the Altera parts they
> have a cascade chain that can implement the mux OR gate with 0.1 ns
> delay. The newer Xilinx parts can do very wide muxes (16 inputs I
> think) in a single CLB. So the TOS mux will only be 1 level of logic in
> either case. Of course this optimization does not reduce the number of
> LUTs needed. Adders are pretty cheap (in LUTs) compared to muxes. An
> adder is the same as a 2 input mux. So it often is better to use more
> adders and fewer data path muxes if you can.

I did a fair amount of playing around with what could be squeezed into
the single 4-input LUT in front of a flip-flop. It looked like I could
combine math function(s) (add/sub/inc/dec) with muxing fairly
arbitrarily. It looks like constructs that mixed muxing (a 2-1 mux) and
math functions were the most efficient use of the chunky FPGA logic.

> I dropped the instruction packing concept because it required an extra
> layer of mux on the instruction register the way I would do it. My
> speed critical paths are a toss between the fetch and the stack ops.
> Removing this mux will help with the overall speed since instruction
> fetch is part of *every* cycle.

I was looking at shifting instructions using a 2-1 mux in front of
instruction register. I still had 1 clock to read the next instruction
from memory (kind of a pseudo instruction that runs when the I register
is empty), but was looking at how to get rid of it. It sounds like the
slightly larger instructions are an overall win anyway.
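
The packed-instruction idea, with a shifting instruction register and a fetch that runs when the register is empty, can be sketched as a clock-counting model. Everything specific here is an assumption for illustration: the 5-bit slot width, three slots per 16-bit word, and opcode 0 standing for "empty" are not taken from Charles's design.

```python
SLOT_BITS = 5        # ASSUMPTION: opcode width
SLOTS_PER_WORD = 3   # ASSUMPTION: three 5-bit slots in a 16-bit word
SLOT_MASK = (1 << SLOT_BITS) - 1
I_MASK = (1 << (SLOT_BITS * SLOTS_PER_WORD)) - 1


def pack(*opcodes):
    """Pack up to three opcodes into one word, first opcode in the top slot."""
    if len(opcodes) > SLOTS_PER_WORD:
        raise ValueError("too many opcodes for one word")
    padded = list(opcodes) + [0] * (SLOTS_PER_WORD - len(opcodes))
    word = 0
    for op in padded:
        word = (word << SLOT_BITS) | (op & SLOT_MASK)
    return word


def run(program):
    """Return (opcodes executed, clocks used) for a list of packed words."""
    executed, clocks, pc, ireg = [], 0, 0, 0
    while True:
        if ireg == 0:
            # Empty instruction register: the pseudo instruction spends
            # one clock fetching the next word from program memory.
            if pc >= len(program):
                break
            ireg = program[pc] & I_MASK
            pc += 1
            clocks += 1
            continue
        # Take the top slot, then shift the remaining slots up
        # (the 2-1 mux in front of the register selects shift or load).
        op = ireg >> (SLOT_BITS * (SLOTS_PER_WORD - 1))
        ireg = (ireg << SLOT_BITS) & I_MASK
        executed.append(op)   # an op of 0 in a non-empty word acts as a no-op
        clocks += 1
    return executed, clocks
```

In this model four opcodes spread over two words take six clocks; the two extra clocks are the per-word fetch that Charles mentions wanting to remove.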

>> If you decide to make your design openly available, I'd love to take a
>> look at it. If not, I really appreciate the info above...it's already
>> saved me a lot of time should I ever get enough free time to roll a
>> design of this sort for fun (or a good enough reason to implement it on
>> someone else's dime).
>
> I will email you my block diagram (PDF). It clearly shows my first
> lesson which Jeff and Chuck already figured out which is that separate,
> dedicated memories allow a simplification of the address generation and
> therefore better speed.

What are you using for the program memory and the stacks? I was
thinking about putting the program and data stack memory in the same
block of ram (using dual-port access to avoid speed problems), and use a
separate memory (perhaps distributed RAM in Xilinx) for the return stack.

> I assume your email address is valid?

Yup...and I've got the spam to prove it!

--
Charles Steinkuehler
cst...@newtek.com

rickman

Jul 15, 2004, 8:19:42 PM7/15/04

to

Albert van der Horst wrote:
>

Now I am getting ticked off. Your Karma ran over my dogma!!!

rickman

Jul 15, 2004, 8:47:00 PM7/15/04

to

Charles Steinkuehler wrote:
>
> rickman wrote:

> > The return stack worked out for me not so much because of the TOS mux
> > issue. Mine is also very large, but I was trying to make this the focus
> > and eliminate muxes elsewhere when I could. In the Altera parts they
> > have a cascade chain that can implement the mux OR gate with 0.1 ns
> > delay. The newer Xilinx parts can do very wide muxes (16 inputs I
> > think) in a single CLB. So the TOS mux will only be 1 level of logic in
> > either case. Of course this optimization does not reduce the number of
> > LUTs needed. Adders are pretty cheap (in LUTs) compared to muxes. An
> > adder is the same as a 2 input mux. So it often is better to use more
> > adders and fewer data path muxes if you can.
>
> I did a fair amount of playing around with what could be squeezed into
> the single 4-input LUT in front of a flip-flop. It looked like I could
> combine math function(s) (add/sub/inc/dec) with muxing fairly
> arbitrarily. It looks like constructs that mixed muxing (a 2-1 mux) and
> math functions were the most efficient use of the chunky FPGA logic.

I ended up separating the add/sub, the logical and the shift functions.
I am trying to be chip independent and Altera LUTs don't have a separate
carry chain. They chain the carry, but they split the 4-input LUT into
two 3-input LUTs, one for add, one for carry. So you need a second LUT
to invert one addend into a subtrahend (are those the right terms?).
Anyway, I ended up with an arithmetic ALU and a logical ALU and a third
path for the right shifts. The left shift is done with a DUP ADD pair
rather than burning more LUTs. This uses a total of 3n LUTs (n is the bit
width) plus n/2 LUTs for additional inputs to the TOS mux. It also
makes the opcode decoding simpler since I can pick opcodes that let
me provide opcode bits to the ADD/SUB and logical functions without
decoding.
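
Two points from the paragraph above are easy to check with a small Python model: a left shift by one is the same as DUP followed by ADD, and the quoted LUT budget is 3n + n/2. The 16-bit width and the function names are assumptions for illustration only; rickman's actual data path width is not stated here.

```python
MASK16 = 0xFFFF  # ASSUMPTION: 16-bit data path


def dup(stack):
    stack.append(stack[-1])


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) & MASK16)


def shift_left_one(stack):
    """Left shift built from existing instructions: x DUP ADD gives 2*x."""
    dup(stack)
    add(stack)


def lut_estimate(n):
    """LUT count quoted in the post: 3n for the three paths plus n/2 for the TOS mux."""
    return 3 * n + n // 2
```

For n = 16 the estimate comes to 56 LUTs, and the DUP ADD pair wraps around at the word size exactly as a hardware left shift would.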

> > I dropped the instruction packing concept because it required an extra
> > layer of mux on the instruction register the way I would do it. My
> > speed critical paths are a toss between the fetch and the stack ops.
> > Removing this mux will help with the overall speed since instruction
> > fetch is part of *every* cycle.
>
> I was looking at shifting instructions using a 2-1 mux in front of
> instruction register. I still had 1 clock to read the next instruction
> from memory (kind of a pseudo instruction that runs when the I register
> is empty), but was looking at how to get rid of it. It sounds like the
> slightly larger instructions are an overall win anyway.

I'm not sure what you mean about having 1 clock to read. Don't you do
the entire fetch and execute in one clock cycle? Is your machine
pipelined?

> > I will email you my block diagram (PDF). It clearly shows my first
> > lesson which Jeff and Chuck already figured out which is that separate,
> > dedicated memories allow a simplification of the address generation and
> > therefore better speed.
>
> What are you using for the program memory and the stacks? I was
> thinking about putting the program and data stack memory in the same
> block of ram (using dual-port access to avoid speed problems), and use a
> separate memory (perhaps distributed RAM in Xilinx) for the return stack.

I started with a single address space. But the address muxing was
cumbersome. The more I looked at it the more I realized that in the
FPGA it made little sense to combine memories since you ended up needing
address muxes that slowed everything a lot. So each memory is separate
and is one block for now. The part I am using first is an Altera
EP1K50 with 10 4k bit blocks. The CPU will use at least 4 leaving 6 for
IO buffers. I may have to add one or two more to the program store.
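
As a rough check on the memory budget above, the block arithmetic can be written out. The block count and size come from the post; the 16-bit word width passed in below is an assumption, since the post does not state the width of each memory.

```python
BLOCK_BITS = 4096    # "4k bit blocks" from the post
TOTAL_BLOCKS = 10    # blocks in the EP1K50, per the post
CPU_BLOCKS = 4       # minimum the CPU is expected to use, per the post


def words_per_block(word_bits):
    """How many words of the given width fit in one memory block."""
    return BLOCK_BITS // word_bits


def blocks_left_for_io(cpu_blocks=CPU_BLOCKS):
    """Blocks remaining for IO buffers after the CPU takes its share."""
    return TOTAL_BLOCKS - cpu_blocks
```

If the memories were 16 bits wide, each block would hold 256 words, which helps explain why the program store may need one or two more blocks.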

Dr. Bruce R. McFarling

Jul 16, 2004, 3:01:13 AM7/16/04

to

alb...@spenarnc.xs4all.nl (Albert van der Horst) wrote in message news:<I0vv8q.7Dt...@spenarnc.xs4all.nl>...

> In article <40F60EE5...@yahoo.com>, rickman <jo...@bluepal.net> wrote:
> <SNIP>

> >Relax. I can see where you might take offense at his post. But I think
> >he meant it to be much lighter than it came off. Let's face it, a 72
> >pin DRAM module will never take the place of an 8 pin MSOP serial
> >RAM/EEPROM. So he must be at least half kidding!

> Are you referring to the dogma that you need at least some ROM in a
> system for booting?

> This is of course also true for a Forth computer on an FPGA with integrated IDE.
> However, a couple of hundred bytes on chip (Forth code!) should be largely
> sufficient to boot from IDE.

...

> I stand by the following connectors :
> 72 pin DRAM
> IDE
> VGA
> 5 pin keyboard

Of course, both sides of the argument are right ... a 72 pin DRAM
module will never take the place of an 8 pin MSOP serial RAM/EEPROM.

OTOH, a boot from IDE process on-chip could. So you "only" need the
DRAM because something ELSE is taking the place of the 8-pin MSOP
serial RAM/EEPROM.

Don Golding

Jul 16, 2004, 11:02:42 AM7/16/04

to

This is a very informative dialog and I want to thank all of you for
participating in this discussion. It is helping me immensely.

I seem to have a very different vision of the best way to utilize Forth,
and VHDL on FPGA.

Big picture:

Forth is for the highest level coding and VHDL is used for the lowest
levels and for speed. After all, VHDL is faster than Forth, obviously.
One exception is where speed is not important but the VHDL code would use a
lot of resources and Forth makes more sense.

Think of it this way, Forth is the operating system and VHDL is the
assembler.

Opcodes for Forth are the basic Forth Words not the normal primitives used
to build Forth.

E.g.: BEGIN, AGAIN, IF, ELSE, THEN, +, -, DROP, SWAP, >R, R>, LITERAL, OR,
AND, etc.

An eight bit opcode is assigned to each of these high level Forth words. So
the code prom, on chip, will hold this code. 8x1K is the size.

Also there is another RAM memory which is 16 bits wide: 16x1K.

Whenever you develop a new VHDL routine, you create a new opcode for it and
just call it like any other Forth word.
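
The opcode-per-word scheme can be sketched as a small software model. This is an illustration only, not Don's VHDL: the opcode numbers, the two-byte big-endian literal encoding, the ForthCore class, and the SQUARE "hardware" word are all assumptions invented for the example.

```python
MASK = 0xFFFF          # 16-bit data, as in the post

# ASSUMPTION: these opcode numbers are invented for the example.
LITERAL, PLUS, DROP, SWAP = 0x01, 0x02, 0x03, 0x04
SQUARE = 0x80          # hypothetical opcode standing in for a new VHDL block


class ForthCore:
    """Byte-wide code memory, 16-bit stack, one handler per 8-bit opcode."""

    def __init__(self):
        self.stack = []
        self.ops = {}

    def define(self, opcode, handler):
        if not 0 <= opcode <= 0xFF:
            raise ValueError("opcode must fit in 8 bits")
        if opcode in self.ops or opcode == LITERAL:
            raise ValueError("opcode already in use")
        self.ops[opcode] = handler

    def run(self, code):
        pc = 0
        while pc < len(code):
            opcode = code[pc]
            pc += 1
            if opcode == LITERAL:
                # ASSUMPTION: the literal follows as two bytes, high byte first.
                self.stack.append((code[pc] << 8) | code[pc + 1])
                pc += 2
            else:
                self.ops[opcode](self.stack)


def op_plus(s):
    b = s.pop()
    a = s.pop()
    s.append((a + b) & MASK)


def op_drop(s):
    s.pop()


def op_swap(s):
    s[-1], s[-2] = s[-2], s[-1]


def op_square(s):
    # Stand-in for a result handed back on the stack by a hardware unit.
    x = s.pop()
    s.append((x * x) & MASK)


def make_core():
    core = ForthCore()
    core.define(PLUS, op_plus)
    core.define(DROP, op_drop)
    core.define(SWAP, op_swap)
    core.define(SQUARE, op_square)   # adding a word means claiming one more opcode
    return core
```

Running the bytes for `3 4 + SQUARE` on this model leaves 49 on the stack. The point of the sketch is only that the new word is called exactly like the built-in ones.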

Speed is the last priority on the list. The goal is to basically create a
Forth Computer on one chip! with application extensions written in VHDL!

More power Scotty...
Don

"rickman" <spamgo...@yahoo.com> wrote in message
news:40F72584...@yahoo.com...

Guy Macon

Jul 16, 2004, 11:24:51 AM7/16/04

to

Don Golding <dgol...@sbcglobal.net> says...

>I seem to have a very different vision of the best way to
>utilize Forth, and VHDL on FPGA.

My preference would be simple, simple, simple, running on the
lowest-cost and most widely available programmable device
available, and with a very clean implementation that minimizes
the number of Forth words that the processor knows and maximizes
the number of Forth words that are made up out of other Forth
words. My gut feeling is that if you make it really, *really*
simple, you will end up with one-clock instructions and a fast
clock rate. Sort of a "ForthRISC" type of design.
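
The "few hardware words, many defined words" idea can be illustrated with a toy model. The particular primitive set (DUP, DROP, SWAP, +, NAND) and the definitions built on it are assumptions chosen for the example, not a proposal from the thread.

```python
MASK = 0xFFFF  # ASSUMPTION: 16-bit cells


def p_dup(s):
    s.append(s[-1])


def p_drop(s):
    s.pop()


def p_swap(s):
    s[-1], s[-2] = s[-2], s[-1]


def p_add(s):
    b = s.pop()
    a = s.pop()
    s.append((a + b) & MASK)


def p_nand(s):
    b = s.pop()
    a = s.pop()
    s.append(~(a & b) & MASK)


# Words the "processor" knows directly.
PRIMITIVES = {"DUP": p_dup, "DROP": p_drop, "SWAP": p_swap,
              "+": p_add, "NAND": p_nand}

# Words made up out of other words (integers are literals).
DEFINED = {
    "INVERT": ["DUP", "NAND"],
    "NEGATE": ["INVERT", 1, "+"],
    "-": ["NEGATE", "+"],
    "2*": ["DUP", "+"],
}


def execute(word, stack):
    if isinstance(word, int):
        stack.append(word & MASK)
    elif word in PRIMITIVES:
        PRIMITIVES[word](stack)
    else:
        for w in DEFINED[word]:
            execute(w, stack)
```

Here subtraction costs no hardware at all; it is five primitive steps instead of one, which is the trade being discussed.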

Another path would be to make something that is blazingly fast,
and there is a place in the world for such a design, but it seems
like a lot more work to design such a device, and I personally
wouldn't be as interested in it as I would be interested in a
simple device.

--
Guy Macon, Electronics Engineer & Project Manager for hire.
Remember Doc Brown from the _Back to the Future_ movies? Do you
have an "impossible" engineering project that only someone like
Doc Brown can solve? My resume is at http://www.guymacon.com/

rickman

Jul 16, 2004, 11:29:16 AM7/16/04

to

I am sorry I missed your call yesterday. I am taking off for a week and
won't be back until next Thursday. So it will have to wait a while.

From your description above, I would guess that you are a software
person rather than a hardware designer. Your abstract model would not
be bad, I believe it is basically a Forth machine emulation. But the
1-to-1 mapping you are assuming between VHDL modules and Forth word
definitions does not result in a good hardware design in the general
case.

A CPU to optimally execute Forth is still a CPU. It has registers,
logic and accesses memories. The method I chose to design my CPU was to
pick the basic functions that I wanted to implement in hardware and then
design optimal hardware to do that. I don't design in VHDL at all. I
write VHDL to describe my hardware. After all, VHDL stands for VHSIC
Hardware Description Language. Trying to describe your solution in VHDL
without first designing your hardware does not, in general, produce a
good solution.

See you next week!

Charles Steinkuehler

Jul 16, 2004, 11:42:34 AM7/16/04

to http://www.guymacon.com, comp.la...@ada-france.org

Guy Macon wrote:

> Don Golding <dgol...@sbcglobal.net> says...
>
>>I seem to have a very different vision of the best way to
>>utilize Forth, and VHDL on FPGA.
>
> My preference would be simple, simple, simple, running on the
> lowest-cost and most widely available programmable device
> available, and with a very clean implementation that minimizes
> the number of Forth words that the processor knows and maximizes
> the number of Forth words that are made up out of other Forth
> words. My gut feeling is that if you make it really, *really*
> simple, you will end up with one-clock instructions and a fast
> clock rate. Sort of a "ForthRISC" type of design.
>
> Another path would be to make something that is blazingly fast,
> and there is a place in the world for such a design, but it seems
> like a lot more work to design such a device, and I personally
> wouldn't be as interested in it as I would be interested in a
> simple device.

As has (I thought) been made somewhat apparent by this thread, when
talking about smallish forth-type stack processors, there's sort of a
'zen' truth that simple=fast=simple.

If you want to make a tiny, simple forth machine, it winds up running
fast (almost despite itself), as attempts to make the simplest possible
circuitry wind up resulting in a small instruction set, single-clock per
instruction architecture. But you don't want 1-clock instruction
execution, you'd rather have a slower, simpler design? Keeping track of
multiple instruction states tends to *ADD* complexity at the small sizes
possible for a simple stack-based CPU, so by making a small system,
you're making it fast.

Conversely, if you try to optimize for speed, you can wind up in pretty
much the same place, paring down to the bare minimum hardware to keep
propagation delays minimal. The other option is to wind up with
multi-million gate designs like most modern CPUs, buying speed with
complexity.

It's all really the antithesis of the huge processors that are the
current rage in the desktop market, which build speed by adding
complexity, rather than removing it.

The whole exercise has given me a much greater appreciation of where
Chuck Moore went with machine forth (and I may be getting dangerously
close to understanding color forth if I'm not careful! :-).

--
Charles Steinkuehler
cst...@newtek.com

Don Golding

Jul 16, 2004, 11:55:00 AM7/16/04

to

Actually I have been an electronics hardware designer for 30 years, digital,
analog and microprocessor. In the mid eighties, I discovered Forth and have
never been the same since.

All of my processor based projects, mostly intelligent robots now, use
Forth. I currently use New Micros Forth and hardware for my base platforms
at present.

I also have a very strong software background, programming Windows
applications in another "out-of-the-mainstream" language called Clarion. I
have taken numerous courses in: Pascal, Data Structures, Neural Networks, and
two courses in VHDL. So I am a competent programmer too.

With both a strong software and hardware background, I envision a Forth SOC,
System On a Chip, using both Forth and VHDL to their fullest capabilities.
I want to use Forth as the master control program and VHDL to create fast IO
specific co-processors whose functions are "called" by the Forth Engine as
words if you like. The high level application coding will be in Forth and
both the application Forth code in ROM and application variables will be
kept in RAM, all on the chip.

As for the hardware design using this system, you only need to design the IO
circuits and connect them to the chip. Write the mixture of Forth and VHDL
to add the new devices to the system, and ship it...

I do have a unique perspective on things, I'll give you that...

Thanks,
Don

See my projects, history at: http://www.angelusresearch.com/military. My
resume is there too.

"rickman" <spamgo...@yahoo.com> wrote in message

news:40F7F44C...@yahoo.com...

rickman

Jul 16, 2004, 12:17:57 PM7/16/04

to

Guy Macon wrote:
>
> Don Golding <dgol...@sbcglobal.net> says...
>
> >I seem to have a very different vision of the best way to
> >utilize Forth, and VHDL on FPGA.
>
> My preference would be simple, simple, simple, running on the
> lowest-cost and most widely available programmable device
> available, and with a very clean implementation that minimizes
> the number of Forth words that the processor knows and maximizes
> the number of Forth words that are made up out of other Forth
> words. My gut feeling is that if you make it really, *really*
> simple, you will end up with one-clock instructions and a fast
> clock rate. Sort of a "ForthRISC" type of design.

That is exactly what I am trying to produce. All of my instructions,
(except reading from instruction memory... can't do two reads at once
from the same ram) are just one clock cycle with no pipelining. I tried
to keep it as small as possible without requiring the instructions to be
too limited. My current estimate in the Altera 1K parts is 400 LEs for
everything but instruction decode (which should be small). But so far
the synthesis results don't compare well with my estimates and are about
50% larger. I much prefer to infer logic rather than instantiate it.
We'll see if I can "coax" a better result from the compiler. As long as
I get a decent speed I will still be happy.

> Another path would be to make something that is blazingly fast,
> and there is a place in the world for such a design, but it seems
> like a lot more work to design such a device, and I personally
> wouldn't be as interested in it as I would be interested in a
> simple device.

To have a faster clock cycle requires pipelining with all the attendant
complications. It can be worth the effort if you have a need for a new
CPU design. But that would be hard to justify.

Brad Eckert

Jul 16, 2004, 1:16:36 PM7/16/04

to

alb...@spenarnc.xs4all.nl (Albert van der Horst) wrote in message news:<I0vv8q.7Dt...@spenarnc.xs4all.nl>...

> However, a couple of hundred bytes on chip (Forth code!) should be largely
> sufficient to boot from IDE. The IDE interface could connect to a flash
> ROM drive or a real one. During development a flash drive would be
> ideal. A system with 8 Mbyte could initialise a hard disk from a flash
> drive in no time; two disks on IDE is just software.
>
> I stand by the following connectors :
> 72 pin DRAM
> IDE
> VGA
> 5 pin keyboard
>
> The board is empty except for the D/A converter (a few resistors)
> and an FPGA.
> A prototype board would need an interface to program the device.

Xilinx's Parallel Cable IV (which is something like $99) can load a
bitstream into the FPGA for development. After that, you replace it
with a serial boot chip. I don't know if you can use the same tool
to program their serial flash devices.

A bitstream can be loaded into the FPGA from a parallel flash EEPROM.
You need a small CPLD to control the process. I've used a XC9572 CPLD
(about $2.60) to do this job and allow programming of the flash
through a standard PC parallel port. The FPGA boots up with the flash
in byte mode, then switches to word mode so that a soft CPU can use it
as x16 program storage.

I suppose the PCB would have mounting holes that line up with threaded
holes on a standard hard drive.

All of this is fun stuff, but what do you use it for? People buy
hardware to run applications. FPGAs are good at DSP intensive stuff
like MP3 playback, DVD players, etc.

--Brad

Don Golding

Jul 16, 2004, 5:24:31 PM7/16/04

to

With the tools today like Quartus II and the Xilinx Navigator counterpart,
entire systems can be implemented into a single chip increasing speed and
reducing cost, simultaneously. In the past, we picked a processor, added
RAM, ROM, and specialized IO parts, then designed a circuit board to make
systems. Now most of the external parts can be implemented inside the FPGA
and everything becomes software as opposed to hardware. We can finally use
Forth processors in our designs without third party tools and licenses.
Costs are greatly reduced and performance increases exponentially.

FPGA's and VHDL are the cutting edge technology for developing "systems on a
chip" which are inherently parallel processing machines. Add Forth to the
mix and Forth gets back its "cutting edge" status...

Don

"Brad Eckert" <nospaa...@tinyboot.com> wrote in message
news:7d4cc56.04071...@posting.google.com...

Guy Macon

Jul 17, 2004, 3:14:47 AM7/17/04

to

Don Golding <dgol...@sbcglobal.net> says...

>With the tools today like Quartus II and the Xilinx Navigator counterpart,
>entire systems can be implemented into a single chip increasing speed and
>reducing cost, simultaneously. In the past, we picked a processor, added
>RAM, ROM, and specialized IO parts, then designed a circuit board to make
>systems. Now most of the external parts can be implemented inside the FPGA
>and everything becomes software as opposed to hardware. We can finally use
>Forth processors in our designs without third party tools and licenses.
>Costs are greatly reduced and performance increases exponentially.
>
>FPGA's and VHDL are the cutting edge technology for developing "systems on a
>chip" which are inherently parallel processing machines. Add Forth to the
>mix and Forth gets back its "cutting edge" status...

I had the pleasure of having lunch with Don Golding today (he had
steak and kidney pie, I had bangers and mash) and saw the system he
describes above. I was very impressed, and I have not been impressed
by much of the VHDL code I see on the net. His design seems to me to
have the clarity of design that first attracted me to Forth, and I am
looking forward to seeing how this project turns out.

Steinar Knutsen

Jul 17, 2004, 7:52:32 AM7/17/04

to

On Fri, 16 Jul 2004, Don Golding wrote:

> With both a strong software and hardware background, I envision a
> Forth SOC, System On a Chip, using both Forth and VHDL to their
> fullest capabilities. I want to use Forth as the master control
> program and VHDL to create fast IO specific co-processors whose
> functions are "called" by the Forth Engine as words if you like. The
> high level application coding will be in Forth and both the
> application Forth code in ROM and application variables will be kept
> in RAM, all on the chip.
>
> As for the hardware design using this system, you only need to design
> the IO circuits and connect them to the chip. Write the mixture of
> Forth and VHDL to add the new devices to the system, and ship it...

Seems like a nice way to get a series of somewhat customized small
systems running? "I need a controller for that, a controller for that
and a controller for that, and an ALU in the middle." A little like the
"Lego approach" SOC-vendors like to tell people they offer the customers
with thick wallets.
--
Steinar

Don Golding

Jul 17, 2004, 3:53:20 PM7/17/04

to

As I mentioned in an earlier post, this is developing into a Parallel
Processing System with a CISC (Complex Instruction Set Computer)
architecture. Forth is the operating system and also the high level
application programming environment used to extend the language/system in
order to perform the functions required by the specific application.

Didn't we already kill CISC in favor of RISC (Reduced Instruction Set
Computer) years ago? Technology has a habit of coming full circle when
it evolves and opens new solution spaces which weren't possible
before. We will use VHDL primarily to create very high level instructions
which automatically become new words/instructions for use by the Forth
Processor.

In the single chip mode where there is a limited amount of on-chip ROM, we
most likely will concentrate on using Forth for the very highest level of
the application code. If we add a single external RAM and ROM, then the
chip becomes a more general purpose processor which can be used for large
applications requiring a lot of code.

We can easily parallel these chips together in order to build massively
parallel computers if desired.

How about a wireless handheld computer to browse the Internet any time, any
where, with speech recognition and synthesis that really works. I have a
TREO 600, which is a step in the right direction but doesn't give me the
computer of my dreams yet. (I do have Quartus Forth on it). Microsoft
Windows I last heard is up to 200 million lines of code and Explorer is such
a Hodge-podge of code it will never be secure. I don't know about the rest
of the world, but I am ready for something different.

Sorry about the rambling, but sometimes it feels good to share ideas and
dreams to stimulate creativity in a discussion such as this.

The current system:

It can have up to 256 opcodes which perform simple or complex tasks written
in VHDL or Forth, depending on which is best for the task at hand.

I am currently using the Altera University board with VGA monitor port,
Keyboard port, 2 seven-segment LED displays, 4 switches (cost: $100).
Quartus II is free from Altera.

Once the design is completed, all or part can be used on even less expensive
FPGA's. I think the EPF10K70RC240-4 FPGA which I use now is about $15 in
low quantities.

IO ports: 54 lines now, but this could be increased, since another 90 pins or so
are unassigned. I may want to add another 16 bit address bus for external
memory and/or other external devices (memory mapped IO perhaps), and possibly
a 4 bit chip select bus for external devices, which would allow up
to 16 devices, 16 bits wide, to be added to the system easily.
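
The 4 bit chip select bus works out to 16 devices because each 4-bit code picks exactly one select line. A minimal sketch of that decode, with function names invented for the example and active-high one-hot select lines assumed:

```python
SELECT_BITS = 4
DEVICE_COUNT = 1 << SELECT_BITS   # 2**4 = 16 devices


def decode_chip_select(select):
    """Turn a 4-bit select code into a one-hot pattern of 16 select lines."""
    if not 0 <= select < DEVICE_COUNT:
        raise ValueError("select code must fit in 4 bits")
    return 1 << select


def selected_device(lines):
    """Inverse of the decoder: which device a one-hot pattern enables."""
    if lines <= 0 or lines & (lines - 1):
        raise ValueError("exactly one select line must be active")
    return lines.bit_length() - 1
```

Real select lines are often active low; that detail is left out here because the post does not specify it.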

One 16 bit main system bus
VGA monitor Port-working
PC Keyboard Port-working
Forth Processor-60% done

Current Statistics:

Part: Altera Flex10K EPF10K70RC240-4
LE'S: 218 5% used, 95% free
Pins used: 97, 51% free
Memory used: 18,432, 50% free

Forth Processor

16 bit Data Bus
16 bit Address Bus
Stack Depth: 128

Instructions/cycle: most will complete in 1 cycle, but a few might take
2 cycles. I don't care; using the least amount of resources is the most
important design consideration, not speed. All instructions are written in
VHDL, so they will be fast enough.

The Future:

An Interpreter will be developed as the last and final step, giving a
truly complete System On a Chip (SOC). With the Interpreter, Forth
and VHDL code can be debugged interactively, which is one of the major
productivity features Forth based systems have today.

Since these Forth Processors are very small in size, multiple units can be
incorporated in the single chip design if necessary.

This is the first time I have taken on the task to actually develop a Forth
Processor from scratch and I consider myself a pretty creative person. I
have tried some things already that were more traditional ways of building
processors with more registers and the opcodes to use them. The LE usage
was piling up fast. They were up to 12% with many more functions left to
go. I dropped that idea in favor of a more pure Forth approach and
remarkably the LE usage dropped back down to 5% from 12%.

Chuck has described the most efficient and powerful processor implementation
possible which is Forth, period, end of report.

Thanks Chuck!

Don

Albert van der Horst

Jul 18, 2004, 11:12:14 AM7/18/04

to

In article <YHxIc.90959$m25....@newssvr29.news.prodigy.com>,
Don Golding <dgol...@sbcglobal.net> wrote:
>I am building a simple 16 bit Forth Processor in VHDL which will be embedded
>in my future VHDL projects as the high level processor core. I want to
>build a very simple and elegant Forth, not a screamer like the P16 or other
>Novix like variations

I wonder about the following.
Once you have programmed the Forth into a FPGA chip, is it non-volatile?
Or does one need a mechanism to load the 'P' of the FPGA each time
power is supplied to the board.

>
>I will be posting questions from time to time regarding this project, if
>other people have done this already I would like to hear about it.
>
>Thanks,
>Don Golding
>dgol...@sbcglobal.net

Guy Macon

Jul 19, 2004, 2:09:25 AM7/19/04

to

Albert van der Horst <alb...@spenarnc.xs4all.nl> says...

>Once you have programmed the Forth into a FPGA chip, is it non-volatile?

>Or does one need a mechanism to load the 'P' of the FPGA each time
>power is supplied to the board.

http://en.wikipedia.org/wiki/FPGA

Dr. Bruce R. McFarling

Jul 19, 2004, 5:27:38 AM7/19/04

to

alb...@spenarnc.xs4all.nl (Albert van der Horst) wrote in message news:<I11zKF.19I...@spenarnc.xs4all.nl>...

> In article <YHxIc.90959$m25....@newssvr29.news.prodigy.com>,
> Don Golding <dgol...@sbcglobal.net> wrote:
> >I am building a simple 16 bit Forth Processor in VHDL which will be embedded
> >in my future VHDL projects as the high level processor core. I want to
> >build a very simple and elegant Forth, not a screamer like the P16 or other
> >Novix like variations
>
> I wonder about the following.
> Once you have programmed the Forth into a FPGA chip, is it non-volatile?
> Or does one need a mechanism to load the 'P' of the FPGA each time
> power is supplied to the board.

If what you have done is build a Forth processor in an FPGA, then it's
non-volatile. You wouldn't HAVE to load the "P" in FPGA ... "field
programmable" means after it left the factory, not every power up.

Like any other microprocessor, whether the code it is running is
non-volatile depends on whether you stored it in a non-volatile place.

Bernd Paysan

Jul 19, 2004, 6:37:21 AM7/19/04

to

Dr. Bruce R. McFarling wrote:
> If what you have done is build a Forth processor in an FPGA, then it's
> non-volatile. You wouldn't HAVE to load the "P" in FPGA ... "field
> programmable" means after it left the factory, not every power up.

The FPGAs I know don't have non-volatile memory inside (unlike PLDs). They
need a "config device", which is a serial E(E)PROM capable of loading the
FPGA at power-up. The type of serial EEPROM depends on the manufacturer:
Xilinx uses expensive "active" parts, which provide a clock, and Altera now
uses cheap SPI-based passive parts, where the FPGA generates the clock.

> Like any other microprocessor, whether the code it is running is
> non-volatile depends on whether you stored it in a non-volatile place.

You can initialize the block RAMs inside the FPGA with the FPGA config
device, too; if the program resides there, it's available after the FPGA
has been configured.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Don Golding

Jul 19, 2004, 9:52:46 AM7/19/04

to

"Albert van der Horst" <alb...@spenarnc.xs4all.nl> wrote in message
news:I11zKF.19I...@spenarnc.xs4all.nl...

> In article <YHxIc.90959$m25....@newssvr29.news.prodigy.com>,
> Don Golding <dgol...@sbcglobal.net> wrote:
> >I am building a simple 16 bit Forth Processor in VHDL which will be embedded
> >in my future VHDL projects as the high level processor core. I want to
> >build a very simple and elegant Forth, not a screamer like the P16 or other
> >Novix like variations
>
> I wonder about the following.
> Once you have programmed the Forth into a FPGA chip, is it non-volatile?
> Or does one need a mechanism to load the 'P' of the FPGA each time
> power is supplied to the board.
>

Check this page out for a very good explanation of configuration devices.

http://www.altera.com/products/devices/serialcfg/features/scg-adv_features.html#lowcost

There are several different techniques to load the FPGA with the configuration
code.

1) A small configuration chip downloads the code each time the system is
turned on.
2) Some parts can be preprogrammed by the manufacturer when you order the
parts like an EPROM or PROM.
3) You can take your code which was developed on an FPGA and have ASIC
parts made. This is a very popular technique for high volume applications.

Don

Brad Eckert

Jul 19, 2004, 6:52:29 PM7/19/04

to

"Don Golding" <dgol...@sbcglobal.net> wrote in message news:<OmQKc.93826$7R.6...@newssvr29.news.prodigy.com>...

>
> Check this page out for a very good explanation of configuration devices.
>
> http://www.altera.com/products/devices/serialcfg/features/scg-adv_features.html#lowcost
>

Unless you use a flash or antifuse based FPGA. Then it's instant on.
Quicklogic has a free prototype service for its antifuse FPGAs. Their
QL4009 would make a nice Forth chip. Costs around $10. I'd bet the
on-chip RAM is big enough to hold an IDE boot loader. I think the RAM
powers up to a user defined state, but I'm not sure.

Actel makes the flash based parts. Lattice Semiconductor also has
flash based FPGAs but they are high end parts.

--Brad

Brad Eckert

Jul 20, 2004, 11:43:32 PM7/20/04

to

Charles Steinkuehler <cst...@newtek.com> wrote in message news:<mailman.26.108999257...@ada-france.org>...

>
> As has (I thought) been made somewhat apparent by this thread, when
> talking about smallish forth-type stack processors, there's sort of a
> 'zen' truth that simple=fast=simple.
>
> If you want to make a tiny, simple forth machine, it winds up running
> fast (almost despite itself), as attempts to make the simplest possible
> circuitry wind up resulting in a small instruction set, single-clock per
> instruction architecture. But you don't want 1-clock instruction
> execution, you'd rather have a slower, simpler design? Keeping track of
> multiple instruction states tends to *ADD* complexity at the small sizes
> possible for a simple stack-based CPU, so by making a small system,
> you're making it fast.

Let me relate my experience from this weekend. I was fed up with
trying to get a RISC-like DSP design to go faster, so in the morning I
sketched the basics of what I really needed: A couple of registers
feeding a MAC. Then I drew paths from memory based on the application
requirements and added some registers for indexing. It became obvious
I could use the two MAC input registers as the top two elements on the
data stack. I used a small block RAM for the data stack and for
holding filter coefficients. The main goal was to get all of the
MAC-and-fetch steps to execute in one cycle.
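
A minimal Python model of that datapath, as a sketch only: the two MAC input registers double as the top two data-stack elements (called `t` and `n` here), and each loop iteration stands for one MAC-and-fetch cycle. The function name and the plain-integer arithmetic are assumptions made for illustration, not details of the actual design.

```python
def mac_fir(coeffs, samples):
    """Dot product modeled as one fetch plus one multiply-accumulate per step."""
    t = n = 0   # top two data-stack elements, also the MAC inputs
    acc = 0     # MAC accumulator
    for c, s in zip(coeffs, samples):
        t, n = c, s      # fetch: both operands arrive in the same step
        acc += t * n     # multiply-accumulate in that same step
    return acc


if __name__ == "__main__":
    print(mac_fir([1, 2, 3], [4, 5, 6]))
```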

Okay, I was on a roll. I decided to use 18-bit cells, with 6-bit MISC
instructions. By that afternoon I had a clean synthesis. Probably
buggy but basically designed. Sure was easy compared to the other
design.
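
For anyone wondering how 6-bit instructions sit in an 18-bit cell, here is a small Python sketch of one possible packing. It assumes three instruction slots per cell with the first slot in the most significant bits; that ordering and the helper names `pack` and `unpack` are hypothetical and may not match the real design.

```python
SLOT_BITS = 6                      # width of one MISC instruction
SLOTS = 3                          # assumed: three slots fill an 18-bit cell
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack(ops):
    """Pack three 6-bit opcodes into one 18-bit cell, first opcode highest."""
    if len(ops) != SLOTS:
        raise ValueError("expected exactly three opcodes")
    cell = 0
    for op in ops:
        if not 0 <= op <= SLOT_MASK:
            raise ValueError("opcode does not fit in 6 bits")
        cell = (cell << SLOT_BITS) | op
    return cell


def unpack(cell):
    """Split an 18-bit cell back into its three 6-bit opcodes."""
    return [(cell >> shift) & SLOT_MASK for shift in (12, 6, 0)]
```

With 64 possible opcodes per slot there is plenty of room for application-specific instructions.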

It is much faster than the old design and much simpler. The main
improvement was removing the block RAM from the critical path. The
block RAM in Xilinx (and Altera) FPGAs has to be faked into providing
an asynchronous read, which slows things down.
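
To make the read-timing point concrete, here is a tiny Python model of a clocked (synchronous) RAM read, as a sketch under the assumption that the data output is registered: the value for an address only shows up after the next clock edge. The class name `SyncRam` is hypothetical and this is a behavioral illustration, not a description of any particular vendor primitive.

```python
class SyncRam:
    """Behavioral model of a RAM whose read data is registered."""

    def __init__(self, contents):
        self.mem = list(contents)
        self.q = 0  # registered data output, visible to downstream logic

    def clock(self, addr):
        """One clock edge: return the output seen before the edge,
        then latch mem[addr] so it is visible afterwards."""
        before = self.q
        self.q = self.mem[addr]
        return before


if __name__ == "__main__":
    ram = SyncRam([10, 20, 30])
    print(ram.clock(1))  # output before the first edge
    print(ram.clock(2))  # data for address 1 arrives one edge later
```

Logic that needs the data in the same cycle as the address has to work around that one-edge delay, which is why keeping the RAM off the critical path helps.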

At least for compute intensive applications, I'm totally sold on MISC
style processors. It's okay for the instruction set to vary slightly
from one application to the next because the tools are easy to change.

--
Brad