Multiply with F21

Johannes Teich

Jul 16, 1995, 3:00:00 AM

Hi folks!

The instruction set of the F21 controller ...
 _______________________________________
| @A   @A+  !A   !A+  A@   A!           |  (address register)
| @R+  !R@  push pop                    |  (return stack)
| dup  over drop nop                    |  (data stack)
| #    com  2*   2/   -or  and  +    +* |  (arithmetic & logic)
| call ;    else T=0  C=0               |  (jumps)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
... contains a rather funny instruction: +* ("if bit 0 of T(OS)
is one, add a copy of S(econd) to T(OS)"). After some brainstorming,
here is what I suppose it could be good for:

    \ F21 style multiplication on a 16-bit engine (using +*)

    HEX  VARIABLE CARRY  ( 1 | 0 )

    : +C   ( n n -- n )     0 TUCK D+ CARRY ! ;
    : 2U/  ( u -- u/2 )     2/ 7FFF AND ;

    : +*   ( n1 n2 -- n1 n2 | n1 n1+n2 )
       0 CARRY !  DUP 1 AND IF OVER +C THEN ;   \ clear any stale carry first

    : UMUL ( 8bit 8bit -- 16bit )               \ 10x10bit on F21
       8 0 DO 2* LOOP  SWAP $FF AND
       8 0 DO +* 2U/ CARRY @ IF 8000 OR THEN LOOP  NIP ;

    DECIMAL                                     \ 255 below is meant as decimal
    : test ( -- ) 255 255 UMUL U. ;             \ test -> 65025

I don't know if there are any other instructions (apart from + and +*)
affecting the carry bit. And I suppose that the only instruction that
tests the carry bit is C=0 ("jump if T20 is zero"). Am I right or wrong?

Two more questions: I think that the 20-bit inline value after the
opcode for {#} must be 20-bit-aligned in code memory. Does this mean
that the 5-bit opcode must sit in slot 3? What happens if this is not
the case? And how can the opcode slots be handled in 8-bit memory?

Cheers Johannes Teich | Internet: __ __c
--Hannes 10052...@compuserve.com -- _~\<,_
_________________________________ CompuServe: 100522,135 ____(_)/_(_)_
(Murnau Bavaria/Germany)

Jeff Fox

Jul 16, 1995, 3:00:00 AM

In article <3008e8...@kbbs.org>

Nice arrangement. Also jumps are 10 bit, 14 bit, or home page type.

>... contains a rather funny instruction: +* ("if bit 0 of T(OS)
>is one, add a copy of S(econd) to T(OS)"). After some brainstorming,
>here is what I suppose it could be good for:

Yes, +* is a multiply step instruction as in:

: +* ( n1 n2 -- n1 n2 | n1 n1+n2 )   \ multiply step
   DUP 1 AND IF OVER + THEN ;

\ n1 should be left shifted n times for n steps. It is a conditional
\ non-destructive add if T.0 is true.

+* is a multiply step instruction for multiplies with less than 20
bit results. N is left shifted n times and then +* 2/ is executed
n times. As in:

: 8*8 ( n1 n2 -- n1*n2 )   \ 16 bit output
   push 2*   2*   2*       \ left shift n1 8 times
   2*   2*   2*   2*
   2*   pop  +*   2/       \ no ripple at all
   +*   2/   nop  nop      \ conservative use of nop for fast memory
   +*   2/   nop  nop
   +*   2/   nop  nop
   +*   2/   nop  nop
   +*   2/   nop  nop
   +*   2/   nop  nop
   +*   2/   push drop
   pop  ;

The number of nops will depend on the number of bits through which
carry will ripple in a clock cycle (not yet known) and the speed
of the memory from which the instructions are being fetched. The
simulator says the upper limit for this sequence is about 150ns.
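
For anyone without the simulator, the same sequence can be tried out in ordinary Forth. This is only a rough sketch, not F21 code: it assumes a host with cells of 32 bits or more, masks everything to 21 bits, and the names M21, T2*, T2/ and T+* are made up for the purpose. Bit 20 never gets set in an 8x8 multiply, so what 2/ leaves in T.20 does not matter here.

```forth
\ Model of the F21 multiply step on a host with cells of 32 bits or more.
1 21 LSHIFT 1- CONSTANT M21          \ mask for a 21-bit register

: T2*  ( t -- t' )    2* M21 AND ;    \ T.19 moves into T.20
: T2/  ( t -- t' )    1 RSHIFT ;      \ T.20 moves into T.19
: T+*  ( s t -- s t' )                \ add S to T when T.0 is set
   DUP 1 AND IF OVER + M21 AND THEN ;

: 8*8  ( n1 n2 -- n1*n2 )
   >R  8 0 DO T2* LOOP  R>            \ n1 shifted left 8 times, n2 back on top
   8 0 DO T+* T2/ LOOP                \ eight multiply steps
   NIP ;                              \ same job as  push drop pop

255 255 8*8 .                         \ prints 65025
```

The two loops stand in for the unrolled 2* and +* 2/ lines; the nops are a timing matter on the chip and have no counterpart here.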

>    \ F21 style multiplication on a 16-bit engine (using +*)
>
>    HEX  VARIABLE CARRY  ( 1 | 0 )
>
>    : +C   ( n n -- n )     0 TUCK D+ CARRY ! ;
>    : 2U/  ( u -- u/2 )     2/ 7FFF AND ;
>
>    : +*   ( n1 n2 -- n1 n2 | n1 n1+n2 )
>       0 CARRY !  DUP 1 AND IF OVER +C THEN ;   \ clear any stale carry first
>
>    : UMUL ( 8bit 8bit -- 16bit )               \ 10x10bit on F21
>       8 0 DO 2* LOOP  SWAP $FF AND
>       8 0 DO +* 2U/ CARRY @ IF 8000 OR THEN LOOP  NIP ;
>
>    DECIMAL                                     \ 255 below is meant as decimal
>    : test ( -- ) 255 255 UMUL U. ;             \ test -> 65025

Yes this is the way it works. It is useful for multiplies of
limited precision.

>I don't know if there are any other instructions (apart from + and +*)
>affecting the carry bit. And I suppose that the only instruction that
>tests the carry bit is C=0 ("jump if T20 is zero"). Am I right or wrong?

Carry (T.20) is set to 0 by any memory fetch since memory is 20 bits
wide and the registers and stacks are 21 bits wide. 2* sets T.20
to the previous contents of T.19. + and +* may set T.20 as a
result of addition. COM changes T.20.
Memory addressing uses carry to select SRAM address. The home page
branch instructions do permit branching into the SRAM address
space (where the carry bit is set in the address) within a 13 bit
range. 2/ sets T.19 to the previous value in T.20.

ALU instructions work on 21 bits, and the registers and stacks are
21 bits wide, so even DROP can change carry.
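
The carry rules above can be played with in ordinary Forth as well. Again only a sketch under assumptions: a host with cells of 32 bits or more, a 21-bit value kept in one cell, and invented names (FETCHED, T2*, TCOM, T+, CARRY?). It covers only the fetch, 2*, COM and + cases.

```forth
1 20 LSHIFT        CONSTANT CARRY-BIT   \ T.20
CARRY-BIT 1-       CONSTANT M20         \ the 20 data bits, T.19..T.0
CARRY-BIT M20 OR   CONSTANT M21         \ all 21 bits

: FETCHED ( x -- t )     M20 AND ;          \ memory is 20 bits wide: T.20 = 0
: T2*     ( t -- t' )    2* M21 AND ;       \ T.19 moves into T.20
: TCOM    ( t -- t' )    INVERT M21 AND ;   \ flips all 21 bits, T.20 included
: T+      ( s t -- t' )  + M21 AND ;        \ a carry out of T.19 lands in T.20
: CARRY?  ( t -- flag )  CARRY-BIT AND 0<> ;

-1 FETCHED CARRY? .            \ 0   a fetched value never has carry set
M20 FETCHED 1 T+ CARRY? .      \ -1  the add overflowed 20 bits
0 FETCHED TCOM CARRY? .        \ -1  COM of a fetched value sets T.20
1 19 LSHIFT T2* CARRY? .       \ -1  T.19 was shifted into T.20
```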

>Two more questions: I think that the 20-bit inline value after the
>opcode for {#} must be 20-bit-aligned in code memory. Does this mean
>that the 5-bit opcode must sit in slot 3? What happens if this is not
>the case? And how can the opcode slots be handled in 8-bit memory?

Up to four # opcodes may occur in a word. So # # # # is a legal opcode.
It would need four inline 20 bit words in memory following it.

About 50% of Chuck's MuP21 code looks like this:

# call

where # puts a 20 bit literal on the stack and then an onpage subroutine
call is made in the same opcode. The # loads the word after the #_call
so the call pushes the address after the literal loaded by #.

In 8 bit memory you only get slot 3 (i.e. the last slot). When executing
instructions in 8 bit memory only the least significant 5 bits are
decoded as an instruction. Literals are only 8 bits wide, and since
the branch opcodes appear in the lower five bits of the address
field that they use, each of the branch instructions can only access
8 different locations on a page.
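
The arithmetic behind the 8 locations, as a small sketch in ordinary Forth. It assumes the whole byte serves as the address field, so the 5 opcode bits are fixed and only the upper 3 bits vary. NTH-TARGET and TARGETS are made-up names, and 5 is just an arbitrary example value, not a real F21 opcode number.

```forth
: NTH-TARGET ( opcode k -- addr )   \ k = 0..7 picks the upper three bits
   32 * SWAP 31 AND + ;

: TARGETS ( opcode -- )             \ list the eight reachable addresses
   8 0 DO DUP I NTH-TARGET . LOOP DROP ;

5 TARGETS    \ prints 5 37 69 101 133 165 197 229
```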

Jeff Fox
Ultra Technology
2510 10th St.
Berkeley CA 94710
(510) 848-2149
jf...@netcom.com
http://www.dnai.com/~jfox

Johannes Teich

Jul 18, 1995, 3:00:00 AM

In article <jfoxDBt...@netcom.com>
Jeff Fox <jf...@netcom20.netcom.com> wrote:

>> | # com 2* 2/ -or and + +* | (arithmetic & logic)
>> | call ; else T=0 C=0 | (jumps)
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>Nice arrangement. Also jumps are 10 bit, 14 bit, or home page type.

Thanks for your prompt reply. I can only hope that writing info about F21
is at least half as much fun as reading it. :)

Hm, still curious: How is the distinction between 10 and 14 bit?
{else} and {call} 14 bit, {T=0} and {C=0} 10 bit? Or somehow flexible
via the C register?

>+* is a multiply step instruction for multiplies with less than 20
>bit results. N is left shifted n times and then +* 2/ is executed
>n times. As in:
>
>: 8*8 ( n1 n2 -- n1*n2 ) \ 16 bit output
> push 2* 2* 2* \ left shift n1 8 times
> 2* 2* 2* 2*
> 2* pop +* 2/ \ no ripple at all
> +* 2/ nop nop \ conservative use of nop for fast memory
> +* 2/ nop nop
> +* 2/ nop nop
> +* 2/ nop nop
> +* 2/ nop nop
> +* 2/ nop nop
> +* 2/ push drop
> pop ;

I see, there's no need for looping as I did...

>>10| : UMUL ( 8bit 8bit -- 16bit ) \ 10x10bit on F21
>>11| 8 0 DO 2* LOOP SWAP $FF AND

>>12| 8 0 DO +* 2U/ CARRY @ IF $8000 OR THEN LOOP NIP ;

>Yes this is the way it works. It is useful for multiplies of
>limited precision.

I'm sure higher precision multiplies can be based on it as well.
Any group of bits (e.g. 8 or 10) can be considered to be one digit in
a number system with adequate base (256 or 1024, resp.).
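
To illustrate the digit idea with base 256, here is a sketch in ordinary Forth that builds a 16x16 multiply out of four 8x8 partial products. It assumes a host with cells of 32 bits or more. The 8*8 below is only a stand-in built on the host's own * so the example is self-contained; LO, HI and 16*16 are names made up here.

```forth
: 8*8  ( u1 u2 -- u )   255 AND SWAP 255 AND * ;   \ stand-in for the step version
: LO   ( u -- digit )   255 AND ;                  \ low base-256 digit
: HI   ( u -- digit )   8 RSHIFT 255 AND ;         \ high base-256 digit

: 16*16 ( a b -- a*b )
   2DUP LO SWAP LO 8*8                 >R      \ a0*b0
   2DUP LO SWAP HI 8*8  8 LSHIFT  R> + >R      \ a1*b0 * 256
   2DUP HI SWAP LO 8*8  8 LSHIFT  R> + >R      \ a0*b1 * 256
        HI SWAP HI 8*8 16 LSHIFT  R> + ;       \ a1*b1 * 65536

1234 5678 16*16 .    \ prints 7006652
```

With 10-bit digits the same scheme applies with 1023 as the mask and shifts of 10 and 20.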

>Carry (T.20) is set to 0 by any memory fetch since memory is 20 bits
>wide and the registers and stacks are 21 bits wide. 2* sets T.20
>to the previous contents of T.19. + and +* may set T.20 as a
>result of addition. COM changes T.20.
>Memory addressing uses carry to select SRAM address. The home page
>branch instructions do permit branching into the SRAM address
>space (where the carry bit is set in the address) within a 13 bit
>range.

I thought 14 bit...?

>2/ sets T.19 to the previous value in T.20.

Ah, {2/} does not double the "sign bit" as ANS Forth {2/} does. I
understand there is no need for a sign bit at all. Thanks to 2's
complement there is one {+} for signed and unsigned numbers, and even
a single precision multiply (with single precision result) makes no
distinction between signed and unsigned operation (as we all know :).
Bit 19 can be tested as bit 20.

>ALU instructions work on 21 bits, and the registers and stacks are
>21 bits wide, so even DROP can change carry.

>Up to four # opcodes may occur in a word. So # # # # is a legal

>opcode. It would need four inline 20 bit words in memory following
>it.

All in all a very clever concept. F21, 1st quarter '96?

(Somewhere on my disk there resides a picture of Mr. Moore in some
queer graphic format. Hopefully I will somehow manage to unpack it
before the next crash. :)

Cheers Johannes Teich | Internet: __ __c
--Hannes 10052...@compuserve.com -- _~\<,_
_________________________________ CompuServe: 100522,135 ____(_)/_(_)_

(Murnau, Bavaria/Germany)

Jeff Fox

Jul 18, 1995, 3:00:00 AM

In article <300bd4...@kbbs.org>
Johannes Teich <Johanne...@kbbs.org> writes:

>Hm, still curious: How is the distinction between 10 and 14 bit?
>{else} and {call} 14 bit, {T=0} and {C=0} 10 bit? Or somehow flexible
>via the C register?

MuP21 is limited to a ten bit argument in branch instructions, but
when a branch instruction appears in slot 0 on the F21 it will take
a 15 bit argument. If the most significant bit is 0 then this is a
14 bit jump. The lower 14 bits form the address within a page, and the
upper 7 address bits select the 14 bit sized page in memory. If the most
significant bit of the 15 bit argument field is a 1 then it is interpreted
as a home page branch. Home page is defined in the configuration register.
The reason for this is that it permits setting home page to SRAM so that
code in DRAM can branch to code in SRAM in a one word instruction. So
you get either a 14 bit paged branch from the page you are on, or you
get a 13 bit home page branch where home page is defined in the config
register.
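
Roughly, in ordinary Forth, the decoding of the 15 bit argument looks like the sketch below. It assumes a host with cells of 32 bits or more. HOME-PAGE is a made-up variable standing in for the home page field of the configuration register, TARGET and the mask names are invented, and the sketch simply ignores the one argument bit that is left over in a home page branch.

```forth
1 14 LSHIFT     CONSTANT HOME-FLAG   \ top bit of the 15-bit argument
HOME-FLAG 1-    CONSTANT M14         \ 14-bit in-page address
1 13 LSHIFT 1-  CONSTANT M13         \ 13-bit home page offset
1 21 LSHIFT 1-  CONSTANT M21         \ full 21-bit address

VARIABLE HOME-PAGE                   \ upper address bits for home page branches

: TARGET ( pc arg -- addr )
   DUP HOME-FLAG AND IF
      NIP M13 AND  HOME-PAGE @ OR               \ 13-bit home page branch
   ELSE
      M14 AND  SWAP M14 INVERT AND M21 AND  OR  \ 14-bit branch on the current page
   THEN ;

1 20 LSHIFT HOME-PAGE !              \ home page in SRAM: carry bit set in the address
100000 5 TARGET .                    \ prints 98309    (same 14-bit page as pc)
100000 HOME-FLAG 5 + TARGET .        \ prints 1048581  (offset 5 in the home page)
```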

On MuP21 to jump to SRAM from DRAM you have to load a literal, complement
it (to set carry to address SRAM), push it to the return stack and execute
a return. # COM PUSH ; This sequence takes two words of memory for a
jmp from DRAM to SRAM. To do a call to SRAM on MuP21 you have to add
another # PUSH which now results in a four word long macro, or you can
use # CALL (on page call) to a routine that does PUSH ; This will only
be two inlined words (although it does call a one word subroutine somewhere
on the same page). In any event using two or four word macros for branches
that are larger than 10 bits on MuP21 causes code to expand. We wanted a
mechanism on F21 to perform branching on more than 10 bits, ideally
between SRAM and DRAM (which would normally take a 21 bit address), that
would fit into one word instructions. Thus the home page.

Of course in DRAM 10 bit branches are faster since they are on-page. You
can adjust the memory timing for on-page branching in DRAM from 25ns to
about 75ns. Doing off page branching in DRAM is slower since you have to
wait for the page setup time. A 14 bit branch in DRAM can be set from
about 90ns to about 200ns. Home page timing will depend on whether
home page is set in DRAM or SRAM. F21 only addresses 13 bits worth of high
speed 20 bit wide SRAM. SRAM timing can be set from about 15ns to 40ns.

>>+* is a multiply step instruction for multiplies with less than 20
>>bit results. N is left shifted n times and then +* 2/ is executed
>>n times. As in:
>>
>>: 8*8 ( n1 n2 -- n1*n2 ) \ 16 bit output
>> push 2* 2* 2* \ left shift n1 8 times
>> 2* 2* 2* 2*
>> 2* pop +* 2/ \ no ripple at all
>> +* 2/ nop nop \ conservative use of nop for fast memory
>> +* 2/ nop nop
>> +* 2/ nop nop
>> +* 2/ nop nop
>> +* 2/ nop nop
>> +* 2/ nop nop
>> +* 2/ push drop
>> pop ;

>I see, there's no need for looping as I did...
>
>10| : UMUL ( 8bit 8bit -- 16bit ) \ 10x10bit on F21
>11| 8 0 DO 2* LOOP SWAP $FF AND
>12| 8 0 DO +* 2U/ CARRY @ IF $8000 OR THEN LOOP NIP ;

The unrolled and inlined code will be faster since being linear it allows
use of instruction prefetch to speed up execution. The idea is the same
in the code. Also remember that a compiler may unroll a loop and inline
the code in some systems.

>>Carry (T.20) is set to 0 by any memory fetch since memory is 20 bits
>>wide and the registers and stacks are 21 bits wide. 2* sets T.20
>>to the previous contents of T.19. + and +* may set T.20 as a
>>result of addition. COM changes T.20.
>>Memory addressing uses carry to select SRAM address. The home page
>>branch instructions do permit branching into the SRAM address
>>space (where the carry bit is set in the address) within a 13 bit
>>range.

>I thought 14 bit...?

10 bit fast on-page branches (1Mx4 DRAM chips have fast 10 bit pages)
14 bit (14 bit page) branches (slower since DRAMs do not have 14 bit pages)
13 bit home page branches (upper bits come from a configuration register)

Also note that F21 supports 256kx4 DRAM chips as well. These chips only have
9 bit pages. When a bit is set in the configuration register on F21 it
will assume the DRAMs have 9 bit pages rather than 10 bit pages.

So branch instructions can span 14 bits of DRAM address and 13 bits of
DRAM or SRAM address in a one word branch instruction.

>>2/ sets T.19 to the previous value in T.20.

>Ah, {2/} does not double the "sign bit" as ANS Forth {2/} does. I
>understand there is no need for a sign bit at all. Thanks to 2's
>complement there is one {+} for signed and unsigned numbers, and even
>a single precision multiply (with single precision result) makes no
>distinction between signed and unsigned operation (as we all know :).
>Bit 19 can be tested as bit 20.

In MuP21, 2/ did not change bit 20 or bit 19: a sort of 20 bit 2/. On
F21 bit 20 goes into bit 19. This is a 21 bit 2/, or you can think of
it as a 20 bit 2/ if you do a 2* 2/ first. It is a little more powerful
for extended precision math this way.
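
The difference between the two shifts, sketched in ordinary Forth on a 21-bit value held in a wider cell (a host with cells of 32 bits or more is assumed). The names are made up. What the F21 leaves in T.20 after 2/ is not spelled out above; leaving it unchanged is an assumption of this sketch.

```forth
1 20 LSHIFT CONSTANT B20            \ T.20
1 19 LSHIFT CONSTANT B19            \ T.19

: MUP21-2/ ( t -- t' )              \ T.20 and T.19 stay put, T.19..T.1 move down
   DUP B20 B19 OR AND
   SWAP B20 1- AND 1 RSHIFT  OR ;

: F21-2/   ( t -- t' )              \ T.20 is copied into T.19 (T.20 kept: assumed)
   DUP B20 AND
   SWAP 1 RSHIFT  OR ;

B20 MUP21-2/ B20 = .                \ -1  carry did not reach T.19
B20 F21-2/ B20 B19 OR = .           \ -1  carry went into T.19
```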

You can use bit 20 for carry or sign depending on how you want to do
math (20 or 21 bits). It is a little strange since carry is really no
different than the other bits for most alu operations.

>>ALU instructions work on 21 bits, and the registers and stacks are
>>21 bits wide, so even DROP can change carry.

>>Up to four # opcodes may occur in a word. So # # # # is a legal
>>opcode. It would need four inline 20 bit words in memory following
>>it.

>All in all a very clever concept. F21, 1st quarter '96?

That is the plan. Prototypes this fall, and volume production to follow.
