Zero Instruction Computing?

gnuarm.del...@gmail.com

Nov 24, 2020, 9:49:02 PM

I want to say zero instructions because I am envisioning a rather limited design that is essentially a calculator using a multiply followed by an add with little in the way of operations to specify. Essentially data will need to be steered into the device and data will need to be collected, so a stack seems optimal.

Implemented in an FPGA with dedicated DSP elements, the MAC would be of value. So every operation would be of the form

A <= B * C + D

There would be some steering of data, so TOS goes to C and D with the option of zeroing out either one. NOS goes to B muxed with input data from the various sensors.

I'm thinking of making the output register on the DSP unit the TOS, with a block RAM stack for NOS. I believe this provides a path from A to D to make it a MAC rather than just an adder. The number representation is fixed point, most likely with no integer component, which minimizes normalization on multiplies. Adds will require normalization by one bit.
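
As a rough illustration of the arithmetic described above, here is a minimal Python sketch of the A <= B * C + D operation on unsigned fractional words. The 18-bit word width, the helper names (to_fix, mac, narrow), and the choice to truncate rather than round are assumptions made for the example, not details of the actual design.

```python
FRAC_BITS = 18                   # assumed word width: value = raw / 2**18, no integer bits
MASK = (1 << FRAC_BITS) - 1


def to_fix(x):
    """Convert a real number in [0, 1) to an 18-bit unsigned fraction (hypothetical helper)."""
    return min(int(round(x * (1 << FRAC_BITS))), MASK)


def mac(b, c, d):
    """Return B * C + D as a wide accumulator value scaled by 2**36.

    The product of two 18-bit fractions is a 36-bit fraction, so D is
    shifted left by 18 bits to line up its binary point with the product.
    """
    return b * c + (d << FRAC_BITS)


def narrow(acc, halve=False):
    """Truncate the accumulator back to an 18-bit word.

    With halve=True the result is shifted right by one extra bit, which is
    the one-bit normalization an add needs so the sum cannot overflow.
    """
    shift = FRAC_BITS + (1 if halve else 0)
    return (acc >> shift) & MASK
```

With these definitions, 0.5 * 0.5 + 0.25 narrows to the word for 0.5 with no shift of the binary point, while 0.75 * 0.75 + 0.75 (which is 1.3125 and would overflow) narrows with halve=True to the word for 0.65625, so the caller has to remember the extra factor of two.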

What does come into play is selecting the many constants required for the calculations. Some sensors need only simple calculations with offset and scale adjustment. Others have much more complex math with divides and square roots and many, many intermediates.

I recall on the GA144 one of the "tricks" that can be done with the stack is to treat the lower 8 items as a circular file, so they could be constants or even data on a conveyor belt. I'll need to detail the calculations to see if a similar approach could be used.
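
The circular-stack idea can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this post (reading an item does not consume it, and the eight slots come around again); the class name and the use of collections.deque are choices made for the example, not a description of the GA144 hardware.

```python
from collections import deque


class CircularStack:
    """Eight-slot ring that hands out its items in order, forever."""

    def __init__(self, items):
        # Keep at most eight entries, matching the eight slots mentioned above.
        self.ring = deque(items, maxlen=8)

    def pop(self):
        """Return the current top item and rotate it to the back of the ring."""
        top = self.ring[0]
        self.ring.rotate(-1)
        return top
```

Loading the ring with eight constants and popping ten times returns the eight constants in order followed by the first two again, which is the conveyor-belt behaviour.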

No real instructions, but data addressing may not be so simple.

--

Rick C.

- Get 1,000 miles of free Supercharging
- Tesla referral code - https://ts.la/richard11209

gnuarm.del...@gmail.com

Dec 28, 2020, 1:59:07 AM

I've made some progress on my most recent CPU design. It has a stack and an address register. I've tried not to include immediate addresses in the instruction, but I might just do that. It doesn't need much address space, and the instruction memory width jumps to 36 bits once 18 bits is not enough. The program will not be large, so one block RAM will be adequate.

The main issue is the nature of the "ALU" that is provided in the FPGA. The multiplier has two 18 bit inputs forming a 36 bit output. This is added into a 55 bit accumulator while a 54 bit third input is subtracted. I have yet to figure out the optimal data paths. Presently my thinking is to use numbers as Q1.17 format with no sign bit since everything is positive. This is a way of doing math I've not encountered before. It requires a lot of attention to scale factors and every equation you want to compute has to be redone to normalize the intermediate data. There is a temperature compensation equation that requires a quadratic approximation which requires adjustment to both coefficients by different amounts.

The programming won't look much like Forth. It is a VLIW format with individual control points. The only encoded fields are the input multiplexer controls. It won't be programmed from Forth or anything like it. Each control word will just be an OR of the active control points directly stored into the block RAM through VHDL. Not easy to read or program.
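
To make the "OR of the active control points" idea concrete, here is a small Python sketch that builds such a control word. Every name and bit position below is hypothetical; the post does not say which control points exist or where they sit in the word.

```python
# Hypothetical control points, one bit each.
PUSH_STACK = 1 << 0   # push TOS onto the block RAM stack
ZERO_C     = 1 << 1   # force the C operand to zero
ZERO_D     = 1 << 2   # force the D operand to zero
LOAD_TOS   = 1 << 3   # latch the DSP output into TOS

# Hypothetical encoded field: a 2-bit input multiplexer select at bits 4..5.
MUX_SHIFT = 4


def control_word(*points, mux=0):
    """OR the active control points together and place the mux select field."""
    word = mux << MUX_SHIFT
    for point in points:
        word |= point
    return word
```

For example, control_word(ZERO_D, LOAD_TOS, mux=2) yields the bit pattern 101100. A table of such words could then be formatted as the initial contents of the block RAM in the VHDL source.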

--

Rick C.


Paul Rubin

Dec 28, 2020, 4:27:24 AM

"gnuarm.del...@gmail.com" <gnuarm.del...@gmail.com> writes:
> I've made some progress on my most recent CPU design.

What is its purpose, if it has one?

> The multiplier has two 18 bit inputs forming a 36 bit output. This is
> added into a 55 bit accumulator while a 54 bit third input is
> subtracted.

This sounds intended for doing long convolutions without overflowing the
accumulator. That calls for applications that can use this operation.

> It requires a lot of attention to scale factors and every equation you
> want to compute has to be redone to normalize the intermediate data.

> There is a temperature compensation equation...

I've never had to deal much with scaled arithmetic so don't know if
there are particular techniques that aren't obvious. It's probably
easiest to use a processor with floating point, or use simulated
floating point unless you really can't afford the speed hit. Twiddling
temperature compensation coefficients doesn't sound like something that
has to be done very fast though.

minf...@arcor.de

Dec 28, 2020, 4:53:39 AM

AFAIK the common way is to store a precalculated table and to interpolate later.
No big math involved.
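
A minimal Python sketch of the table-plus-interpolation approach, using integer arithmetic only. The function name, the evenly spaced table layout, and the sample values in the usage note are assumptions made for illustration.

```python
def interpolate(table, x, step):
    """Linearly interpolate in a precalculated table.

    table[i] holds the function value at input i * step, and x is expected
    to lie between 0 and (len(table) - 1) * step inclusive.
    """
    i = min(x // step, len(table) - 2)   # index of the segment containing x
    frac = x - i * step                  # distance into that segment
    return table[i] + (table[i + 1] - table[i]) * frac // step
```

With table = [0, 10, 40, 90] and step = 10, an input of 15 falls halfway between the entries 10 and 40 and returns 25. The cost per lookup is one subtract, one multiply, one divide by a constant (a shift if step is a power of two), and one add.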

gnuarm.del...@gmail.com

Dec 28, 2020, 1:12:56 PM

On Monday, December 28, 2020 at 4:27:24 AM UTC-5, Paul Rubin wrote:
> "gnuarm.del...@gmail.com" <gnuarm.del...@gmail.com> writes:
> > I've made some progress on my most recent CPU design.
> What is its purpose, if it has one?

This is intended to process various sensor inputs to generate data for display and compare to limits.

> > The multiplier has two 18 bit inputs forming a 36 bit output. This is
> > added into a 55 bit accumulator while a 54 bit third input is
> > subtracted.
> This sounds intended for doing long convolutions without overflowing the
> accumulator. That calls for applications that can use this operation.

Or not. Just because the bits are there doesn't mean they have to be used. Oddly enough it is the lsbs that get tossed first. By using Q1.17 format only 1 msb of the product gets tossed, and 17 of the lsbs. The accumulator has another set of msbs that can be ignored. However, when performing subtractions the msbs should all flip indicating a negative result which must be captured for flag purposes (A < B or A < 0).
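
The bit bookkeeping above can be checked with a short Python sketch. It assumes unsigned Q1.17 words (raw value = real value * 2**17, held in 18 bits) and the 55-bit accumulator mentioned earlier; the function names are made up for the example.

```python
Q = 17                      # fractional bits in a Q1.17 word
WORD_BITS = 18
ACC_BITS = 55               # accumulator width quoted earlier in the thread


def q_mul(a, b):
    """Multiply two Q1.17 words and return a Q1.17 word.

    The raw product is a 36-bit Q2.34 value. Shifting right by 17 discards
    the 17 lsbs, and masking to 18 bits discards the single extra msb.
    """
    product = a * b
    return (product >> Q) & ((1 << WORD_BITS) - 1)


def sub_flag(a, b):
    """Subtract in a 55-bit accumulator and report whether the result went negative.

    When a < b the difference wraps around, so all the upper bits flip to 1
    and the top bit of the accumulator can serve as the A < B flag.
    """
    diff = (a - b) & ((1 << ACC_BITS) - 1)
    negative = bool(diff >> (ACC_BITS - 1))
    return diff, negative
```

For instance, 1.5 * 0.5 in Q1.17 is q_mul(196608, 65536), which returns 98304, the word for 0.75. Note that the mask silently drops the msb, so a product of 2.0 or more would wrap; the scaling work done on paper is what keeps that from happening.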

> > It requires a lot of attention to scale factors and every equation you
> > want to compute has to be redone to normalize the intermediate data.
> > There is a temperature compensation equation...
>
> I've never had to deal much with scaled arithmetic so don't know if
> there are particular techniques that aren't obvious. It's probably
> easiest to use a processor with floating point, or use simulated
> floating point unless you really can't afford the speed hit. Twiddling
> temperature compensation coefficients doesn't sound like something that
> has to be done very fast though.

"Very fast"??? That is done on paper prior to the design. In fact one of the things holding me back is the lack of white board space... none. I've never found a computer app that is as facile as a white board. I do have patio doors and need to make use of them.

--

Rick C.


gnuarm.del...@gmail.com

Dec 28, 2020, 1:20:03 PM

It's not a complicated formula. Two coefficients, one on the X term and one on the X² term.
: TC ( X -- TC ) dup coeffx2 * coeffx1 + * ; \ coeffx1 and coeffx2 are constants

No big math involved.
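
For readers who do not use Forth, the TC word above evaluates x * (x * coeffx2 + coeffx1), that is, coeffx2 * x² + coeffx1 * x in Horner form. Below is a Python sketch of the same calculation in unsigned Q1.17 fixed point. The scaling after each multiply and the coefficient values in the usage note are assumptions; the real coefficients and their individual scale adjustments are not given in the thread.

```python
Q = 17  # assumed Q1.17 scaling: raw value = real value * 2**17


def fix(x):
    """Convert a non-negative real number to a Q1.17 raw value (hypothetical helper)."""
    return int(round(x * (1 << Q)))


def tc(x, coeffx1, coeffx2):
    """Fixed-point equivalent of: dup coeffx2 * coeffx1 + *

    Each multiply produces a Q2.34 product, so it is shifted right by 17
    to return to Q1.17 before the next step.
    """
    inner = ((x * coeffx2) >> Q) + coeffx1   # x * c2 + c1
    return (x * inner) >> Q                  # x * (x * c2 + c1)
```

With the made-up values x = 0.5, coeffx1 = 0.25 and coeffx2 = 0.5, tc(fix(0.5), fix(0.25), fix(0.5)) returns fix(0.25), matching 0.5 * 0.25 + 0.5 * 0.5 * 0.5. The whole evaluation is two passes through a multiply-add unit.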

--

Rick C.


Paul Rubin

Dec 28, 2020, 4:04:50 PM

"gnuarm.del...@gmail.com" <gnuarm.del...@gmail.com> writes:
> This is intended to process various sensor inputs to generate data for
> display and compare to limits.

I'm having trouble understanding why use a custom CPU for this.

> one of the things holding me back is the lack of white board
> space... none. I've never found a computer app that is as facile as a
> white board.

Pen and paper can work pretty well. If you really want to flex, you can
use cocktail napkins, but yellow pads are easier to use if you have any
around.

gnuarm.del...@gmail.com

Dec 28, 2020, 5:12:37 PM

On Monday, December 28, 2020 at 4:04:50 PM UTC-5, Paul Rubin wrote:
> "gnuarm.del...@gmail.com" <gnuarm.del...@gmail.com> writes:
> > This is intended to process various sensor inputs to generate data for
> > display and compare to limits.
> I'm having trouble understanding why use a custom CPU for this.

I expect so. But that is not important to the technical issues. It's not really intended to be a "CPU". It needs to do some basic math with saturating arithmetic (which is easiest to do with tests for wraparound, aka sign changes) and set outputs based on the results. If there were less of it I would use dedicated circuits. In this case it is entirely reasonable to reuse the one limited resource, the multiplier/adder unit.
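
A small Python sketch of saturating arithmetic driven by a wraparound test. It assumes unsigned 18-bit words and uses one extra bit above the word as the overflow or borrow indicator; the function names and the word width are choices made for the example.

```python
WORD_BITS = 18
MAX_WORD = (1 << WORD_BITS) - 1


def sat_add(a, b):
    """Add two unsigned words, clamping to the maximum if the sum carries out."""
    total = a + b
    return MAX_WORD if total >> WORD_BITS else total


def sat_sub(a, b):
    """Subtract two unsigned words, clamping to zero if the result wraps negative."""
    diff = (a - b) & ((1 << (WORD_BITS + 1)) - 1)   # keep one extra bit as the sign
    return 0 if diff >> WORD_BITS else diff
```

Here sat_add(MAX_WORD, 1) stays at MAX_WORD instead of wrapping to zero, and sat_sub(1, 2) returns 0 instead of a large positive number. In hardware the same test is a single bit of the adder output steering a multiplexer.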

One issue with the ALU that I have not figured out is the accumulator. It has an internal feedback path that recirculates the output back to one of the adder inputs, with a control that disables it. I assume that means it sets all of those inputs to zero. The output can be registered or the register can be bypassed. If the register is bypassed, the accumulator feedback becomes an asynchronous path that keeps adding the output at the rate of the logic delays. I guess you can't use the accumulator if you don't use the internal register.

> > one of the things holding me back is the lack of white board
> > space... none. I've never found a computer app that is as facile as a
> > white board.
> Pen and paper can work pretty well. If you really want to flex, you can
> use cocktail napkins, but yellow pads are easier to use if you have any
> around.

Yeah, hard to erase paper. That's the beauty of white boards, so easy to adapt rather than start over on a new page.

--

Rick C.
