More than just headed in the direction of MISC. I designed a MISC processor
somewhere around 2002 that I used in a design for a DMA controller/bus
interface. The CPU didn't need to do a lot, but it needed to be fast
enough and responsive. The MISC processor did the job well.
One point I want to make about the instruction set, without getting
into it totally: in a MISC, the goal is nominally to minimize the
hardware.
When implementing a stack CPU it is not realistic to implement all of
the Forth primitives because of the data paths they require. ROT is a
good example. DUP, SWAP, etc. involve just TOS and NOS; ROT involves
the top three stack items. That requires more hardware for the muxes,
and in this case even an extra register, since the stack itself can be
implemented in block RAM or distributed RAM. In my design only
the TOS is an actual register.
This means Forth primitives which are not instructions have to be
built up from the ones that are. When used in code these built-up
primitives can be inefficient, depending on what surrounds them, so an
optimizer still has room to work.
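For example, when ROT is not a native instruction, the classic way to
build it from two-item operations plus the return stack (a sketch in
standard Forth, not the actual code from my machine) is:

   \ ( a b c -- b c a )  only TOS/NOS operations and the return stack
   : ROT   >R SWAP R> SWAP ;

That is four instructions where a wider datapath could do it in one,
and it is exactly the kind of sequence an optimizer can sometimes
shorten or avoid entirely in context.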
> Then, after seeing the repeated remarks from the other thread, about how good optimising Forth compilers for register machines essentially unroll stack operations by doing a 're-labelling' on registers, which has led to your explorations of that design as an alternative possibility.
I'm not sure the connection is quite that linear, but I guess it got
me thinking: what would a MISC register-based design look like? I had
never explored that before, because the typical register CPU has a
16-bit opcode to allow lots of flexibility in the instruction
functionality. But that is not useful in my approach to a CPU design,
because I don't want complex instructions, not even the
limited-complexity RISC-type instructions. So I started thinking about
just how far a register instruction set could be minimized. That is
what I've been doing the last couple of days. I think I have something
that is comparable to my stack-based design in flexibility and is a
bit more efficient in code density, but there is a bit more work to
do.
> So far hopefully correct (albeit perhaps mangled in trying to simplify)?
>
> My take-away from the previous thread was that by building a *register based* MISC you'd still require a good middle layer -- i.e. you'd still need a reasonably smart compiler to transform the human Forth code, unroll the stack operations, and produce the efficient register based machine code.
I'm a hardware guy, I don't deal with compilers much, either writing or
even using, at least not for these MISC machines. I mostly do
something more like an assembler and leave it at a few macros. You'll
have to
ask someone else about middle layers.
> Whereas if you went with the stack-based MISC design, the compiler layer would be much simpler.
Perhaps, but there is still room for an optimizer. If the CPU maps so
well to Forth that you don't need an optimizer, I think you will be
leaving a lot on the table in terms of efficiency at the compute level.
That is, there may be redundancies in the actual data movements. For
example, in my machine a literal is created on the return stack because
most literals are actually addresses and addresses are used on the
return stack. Think of the return stack as a memory address generator.
So *literal* Forth that puts literals on the data stack will require
extra movements. But then I'm not sure how often literals really end
up being used as addresses in Forth; I guess it depends on the style
of Forth used. IMHO STC (subroutine-threaded code) is the only way to
use a stack machine for Forth.
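Going back to the data-movement point, the pattern I have in mind is
the common case where a literal exists only to be used as an address
(the register name and address below are made up for illustration):

   HEX
   40001000 CONSTANT STATUS-REG   \ hypothetical device register address
   : READY?   STATUS-REG @ 1 AND ;

In conventional Forth that literal lands on the data stack and is
immediately consumed by @ as an address. On a machine that generates
memory addresses from the return stack, routing it through the data
stack first is just an extra data movement.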
I'm not sure all this is clear. I don't have a lot of time to write
tonight so my writing may not be well organized.
> For me, one of the key points emerging out of the previous thread was that it's not just a question of optimising between two elements -- hardware and programming language, but rather there's an 'iron triangle' between hardware / compiler / and programming language.
I would agree with that. In fact, this is a good point to mention the
ZPU. This is a stack CPU design with very different goals than most
MISC you will find. They specifically designed an instruction set
architecture that could span a range of implementation complexities
(and therefore speeds) while staying 100% compatible. They wanted a
low-end implementation small enough to fit even small FPGAs, and they
got it. Their design is actually a bit smaller than mine! The kicker
is... they wanted it to run C code and wrote a back end to produce
code for it. I believe someone has even used this CPU for a commercial
app. The support is a mailing list, and I haven't seen much traffic
lately, so I'm not sure how much it is being used just now. Still,
this is rather interesting: a stack CPU to implement C code... who'da
thunk it?
> So the trade-offs are between all three layers: the simplicity/complexity of the processor and its opcodes, the necessity/complexity of the optimising compiler as middle-ware, and the simplicity/expressiveness of the language that the human being programs in.
I would agree with that, adding only that everyone has different
goals for "optimizing".
> If you take the ideal programming language for your processor design as Forth, then really you're down to the trade-off between machine closely matching the human language -- and reducing the need for middleware, or machine diverging from human language -- in which case the middleware becomes more necessary.
I don't agree that this will be optimal in many ways. I guess it
makes the compiler "optimal", since it becomes an assembler, right?
> The greater the divergence, the more the middleware has to be clever.
>
> I've pulled out some of what I thought were key quotes from the previous thread that are hopefully also relevent in this discussion.
>
>
> AKE wrote:
> "So what would be the situation if one were writing Forth code for a Forth chip? Is it correct that in this case, each of the ordinary operations (rot, swap, dup, over, etc.) would be atomic operations, i.e. take one machine cycle? And would this remove the ambiguity of cost-counting by essentially removing the compiler layer? [So] is it correct to say that IF one were using a 'simple' Forth chip, then one would have similar ability to predict precisely how any Forth implementation of an algorithm would cost to run?"
If your compiler is so simple it is an assembler, then that is certainly
easy to do. Otherwise I think there are still ways to optimize Forth
code even on a Forth-ish CPU. So there will be some complexity to the
middle-ware.
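For example, even a near-assembler compiler leaves easy peephole wins
on the table; the word names below are made up for illustration, and
whether a rewrite pays off depends on what the target provides as
single instructions:

   \ as written (and as a near-assembler compiler would emit it)
   : DROP-SECOND    ( a b -- b )   SWAP DROP ;   \ 2 instructions
   \ what a peephole pass could emit, if NIP exists as one instruction
   : DROP-SECOND2   ( a b -- b )   NIP ;         \ 1 instruction
   \ other classic rewrites:  OVER OVER -> 2DUP   0 = -> 0=   SWAP SWAP -> (gone)

Recognizing and substituting those sequences is still middle-ware
work, even when the CPU itself is Forth-ish.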
> Brad Eckert wrote:
> "That's exactly the case. The language is close enough to the instruction set that a simple compiler does a good job. That means a bug-free compiler and easy to understand object code. In fact, since the chip is so simple (look up the J1 for an example) you can write a simulation in Forth or C to make exact measurements."
Yes, I agree. In my machine you don't even need tools to count the
cycles other than perhaps an adder. It is *always* one clock per
instruction.
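To illustrate (assuming a simple compiler that emits one instruction
per primitive, plus a call and a return for any word that isn't
inlined; these are not actual figures from my machine):

   \ one clock per instruction, so cost is just the instruction count
   : ROT    >R SWAP R> SWAP ;   \ 4 instructions -> 4 clocks, plus call/return
   : 2DUP   OVER OVER ;         \ 2 instructions -> 2 clocks, plus call/return

Counting clocks for a definition is just adding up the instructions.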
> Stephen Pelc wrote:
> "In a private email, someone commented to me that stack machine hardware is a substitute for a good code generator."
Again I think you can still use good compilers to optimize even Forth
code on a MISC CPU.
> Andrew Haley wrote:
> "That's true: by closing the semantic gap betwen language and machine,
> you make the need for a complex layer of software go away."
Certainly minimized.
> Stephen Pelc wrote:
> "At the cost of writing another one. The choice is to buy an off the
> shelf chip and write a complex compiler, or to write a simple compiler
> and a CPU."
:^)
> Bernd Paysan wrote:
> "Then you can compare: The cost of an off-the-shelf 8051 is between $10k and
> $100k. The cost of integrating it into a chip is 3 months of Bernd's
> salery, and, due to the bugs in the $100k industry standard stuff, you need
> three mask spins to get it right ($300k), and are one year late to market
> (another year for struggling with the 8051, to get the software right). The
> cost of writing a Forth CPU is one week of Bernd's salery, and you save 10
> cents die size cost per chip you sell, and we did have first-time-right
> silicons. Plus you don't need to write a good code generator for the 8051,
> which is for sure in the "priceless" category."
Yes, I remember Bernd writing about this adventure in some detail in
another thread I believe. He has done a couple of internal chip CPUs.
My hat is off to him.
--
Rick