Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

About that mulstep hardware ...

34 views
Skip to first unread message

smallpond

unread,
Jan 12, 2008, 12:46:07 PM1/12/08
to
What would be the ultimate computer hack?

Would it not be doubling the speed of a 25-year old
computer with a soldering iron and a few programmable
parts?

History

Perq 1 had a 4K x 48-bit microstore. When this was
deemed inadequate it was decided that the Perq Too
would get 16K (as though you could ever satisfy
software guys with enough memory). Another new feature
was that the mulstep instruction was designed, which
was basically a conditional add (1-bit multiply)
improving the multiply speed from its former
shift-and-add done in microcode. This required little
additional hardware since there was already an adder
and shifter in the CPU data path. The main piece of
new hardware was the multiplier register, formed from
two 8-bit shift registers, which held the multiplier
and stored the result.

The basic algorithm was to look at one bit of the
multiplier and change the ALU operation based on
the value.

Mul ALU
---------
0 A
1 A+B

On the first operation, A is 0. On subsequent
operations, A is the output of the previous operation
shifted right one bit. After 16 steps, the output of
the shifter is the high 16 bits of the product and the
multipler register holds the low 16 bits.

Divstep does the complement of this operation, doing a
conditional add or subtract and shift per clock cycle.
The shifters run in the opposite direction and the
carry bit shifts into the multipler register to
generate the quotient.

Sparc used the same mulstep/divstep approach for speed.
A full 32-bit multiplier requires several levels of
adders, so it couldn't complete in one clock cycle.

The Awakening

At the time, it was believed that too much new hardware
would be required on the already stuffed full CPU card
to do a multi-bit multiplier.

Recently, in fact just after my second cup of coffee
this morning, I realized that we could have made the
Perq mulstep do 2 bits at a time with the same
hardware.

- Set the main shifter to shift right two bits instead
of one.

- Rewire the two multiplier shift registers to hold odd
and even bits, thus shifting two bits at a time
instead of one.

- Use excess-3 style encoding: two bits don't have to
mean 0,1,2,3, It turns out to be very easy to
multiply by -2,-1,0,1 or 2.

- The B-Mux needs to be able to be able to supply Y<<1
(AKA B*2) to the ALU.

- The control logic looks at the low two bits of the
multiplier (MUL) and a new M carry bit on each clock
according to the following table:

MUL Min ALU Mout Q
--------------------------
00 0 A 0 NA
01 0 A+B 0 0
10 0 A+(B*2) 0 1
11 0 A-B 1 0
00 1 A+B 0 0
01 1 A+(B*2) 0 1
10 1 A-B 1 0
11 1 A-(B*2) 1 1

The A-MUX presents the previous result shifted by two
on each cycle.

Now, how to get B-MUX to supply B*2 without adding any
chips? Complement the least significant address bit of
the Y register address and prefill this register with
B*2. This means that to multiply foo * baz you need to
load 3 registers instead of 2:

Reg0 := baz
Reg1 := baz<<1
mult := foo

The Q column above indicates the cases where we have to
complement the Y address bit during the multiply.

Dividing two bits at a time is kind of tricky and left
as an exercise to the reader. Meaning basically that I
have no clue. I think that probably it still needs to
get done in one-bit mode, so the multiplier register
has to be a little bit more complex. In fact, to
reduce the amount of rewiring, I think we replace the
two shift registers with two PALs, which lets us shift
left by one, or right by two and also hold the other
new M & Q logic that we need. Yeah, two 22V10s would
be just about right.

Anyway, if anyone wants the title of ultimate Perq
uber-geek, you could implement this hack and have the
world's fastest Perq.

Of course, you would also have to rewrite some
microcode, which requires sacrificing a goat and
self-flagellation.

Just food for thought...

-- Steve Clark, Mgr. Perq 2 Hardware Emeritus

Skeezics

unread,
Sep 23, 2014, 7:52:03 PM9/23/14
to
On Saturday, January 12, 2008 9:46:07 AM UTC-8, smallpond wrote:
> What would be the ultimate computer hack?
>
> Would it not be doubling the speed of a 25-year old
> computer with a soldering iron and a few programmable
> parts?
>
[snip]
> Anyway, if anyone wants the title of ultimate Perq
> uber-geek, you could implement this hack and have the
> world's fastest Perq.
>
> Of course, you would also have to rewrite some
> microcode, which requires sacrificing a goat and
> self-flagellation.
>
> Just food for thought...
>
> -- Steve Clark, Mgr. Perq 2 Hardware Emeritus


Six years might be a little late for a follow-up, but a couple of months ago I worked out an implementation of the Mul/Div hardware for PERQemu, to complete our 16K CPU (20-bit) emulation. As it turns out, the self-flagellation and goat sacrifice comment was pretty spot on!

Anyway, I've been thinking about this post for a while now. Perhaps I'll try out the 2-bits-at-a-time idea in software as a proof-of-concept, y'know, for when we revive the new PERQ-2 in FPGAs...

:-)



0 new messages