>Terje Mathisen wrote:
> >Jerome H. Fine wrote:
>
>> Is there an algorithm that will perform an Unsigned Integer Divide
>> for a 48-Bit Value using ONLY 16 bit instructions PLUS the
>> Signed Integer Divide (is is DIV or IDIV?) instruction which
>> divides DX,AX by a 16 bit value? OR maybe the Signed Integer
>> Multiply (is it MUL or IMUL?) instruction could be substituted
>> which multiplies two Signed 16-Bit Integers to produce a 32-Bit
>> Integer?
>
> Yes!
>
> The easiest is to split the input into chunks based on the largest
> power of ten that fits in 16 bits, i.e. 10000:
>
> Assuming SI points to 48-bit input variable, DI points to low end of 4
> 16-bit variables to receive the remainders, each to be less than 10000:
>
> mov bx,10000
> mov cx, 3
> next:
> xor dx,dx
> mov ax,[si+4]
> div bx
> mov [si+4],ax
> mov ax,[si+2]
> div bx
> mov [si+2],ax
> mov ax,[si]
> div bx
> mov [di], dx
> add di,2
> dec cx
> jnz next
> stosw ; The most significant chunk ends up here.
>
> The code above does more divisions than required, since after the
> first two iterations the top word has to be zero.
I believe that I have managed to improve the efficiency of the
code a bit by using the cx register in a somewhat different manner.
This took a bit of testing, but I think the code on my actual
hardware has now settled down enough to suggest this variation
for 8086 code:
----------------------------------------------------------------
ENCODE:
push di
mov bx,10000
next:
xor dx,dx
mov ax,[si+4]
test ax,ax ;probably omit this code
jz skip1 ; on faster hardware
div bx
mov [si+4],ax
skip1:
mov ax,[si+2]
mov cx,ax ;probably omit
or cx,dx ; this code on
jz skip2 ; faster hardware
div bx
mov [si+2],ax
skip2:
mov ax,[si]
mov cx,ax ;probably omit
or cx,dx ; this code on
jz skip3 ; faster hardware
div bx
mov [si],ax
skip3:
mov [di],dx
add di,2
mov cx.[si+2]
or cx.[si+4]
jnz next
xchg dx,ax
test dx,dx
jnz finish
xchg dx,ax
dec dx,2
finish:
pop bx ;suggested code
lea cx,[di] ; to determine
sub cx,bx ; the number of
shr cx,1 ; remainders in memory
makeascii:
call encode ;value to be encoded is in dx
loop:
cmp bx,di
jz done
dec di,2
mov dx,[di]
call encode ;value to be encoded is in dx
jmp loop
done:
ret
----------------------------------------------------------------
Note that the extra code to omit the "div bx" when any dividend
is zero assumes that the divide instruction is so slow that the extra
instructions and the branches will be faster than performing the
actual divide instruction. That will obviously be dependent on
the CPU being used.
The four instructions which follow finish "should" produce the number
of additional remainders (used from the bottom up) which are
to be encoded. I assume that the "shr cx,1" is a divide by two.
If the user wishes to pad the encoded value with blanks to
produce the same total number of characters each time, cx
can be used to determine the number of characters used for
the encode the first time. The maximum value of an Unsigned
48-Bit Integer is 281,474,976,710,655 which usually requires
at least 16 characters (with a leading blank). If cx is zero,
then use all 16 characters for the first (and final) encode.
Otherwise, multiply cx by 4 and subtract that from the number
of characters used for the first encode. Use 4 characters for
remaining encodes and remember to ZERO fill after the first
encode.
At makeascii, dx holds the final Unsigned 16-Bit Integer which
may be as large as 65535. cx can be used to determine how
many additional 16-Bit Integers have been produced. They
are removed from higher to lower until di equals the original
address. Check cx BEFORE (cx may be zero) removing any
additional remainders which may have been placed there since
the above code may have produced ONLY one value that is
less than 10000. Otherwise, if there are two or more values
(with the additional values being in memory), the first value in dx
may be as large as 65535.
NOTE that the above code has been neither assembled NOR even
tested, so there may be mistakes as I changed from my actual
hardware to 8086 instructions. There are probably faster versions
of the code as well. Mainly, I just wanted to reduce the number
of "div" instructions and provide some suggestions for anyone who
is still interested.
For those who are curious, I have included the code from my actual
hardware that is replaced by "Call Divide" from between skip 1 and
skip3.
On my actual hardware, the instructions are able to "auto-decrement"
a register during the execution of a Mov instruction. In addition,
there was an extreme need to conserve code space. Consequently,
on my actual hardware. the (repeated) code between "skip1" and
"skip3" was placed into a subroutine. Also note that since my
actual hardware has ONLY a SIGNED "Div" instruction, it was
necessary to divide by 20,000 and convert the result: Note that
R3 initially holds the address of the following value.
----------------------------------------------------------------
; NOTE: The following code supports a 48-Bit Unsigned Integer up to
; 281,474,976,710,655 and has been modified from 8086
; code supplied by Terje Mathisen
; NOTE: Since the maximum dividend can be 655,359,999 and the PDP-11
; Div Instruction is a SIGNED Divide Instruction, 20,000 is
used as the
; Divisor instead of 10,000 - which produces a maximum quotient
; of 32767 instead of 65537 and prevents overflow from occuring
; NOTE: The instructions which have been commented out are
; useful if the "Div" instruction takes an extremely long time.
; Otherwise, the additional overhead is not cost-effective.
Divide: Call (PC) ;Execute the Subroutine Twice
Mov R1,R0 ;Branch if the Divide Instruction
;;; Bne 10$ ; MUST be executed
;;; Mov -(R3),R1 ;Branch if the Divide Instruction
;;; Beq 50$ ; NEED not be executed
;;; Cmp R1,#20000. ;Branch if the Divide
;;; Bhis 20$ ; MUST be executed
;;; Br 30$ ;Continue with Common Code
10$: Mov -(R3),R1 ;Get the Low Order Portion of the
Dividend
20$: Div #20000.,R0 ;Divide by 20,000
Asl R0 ;Change from
30$: Cmp R1,#10000. ; Divide by
Blo 40$ ; 20,000
Sub #10000.,R1 ; to Divide
Inc R0 ; by 10,000
40$: Mov R0,(R3) ;Save the Quotient
50$: Return
----------------------------------------------------------------
THANK YOU MANY, MANY TIMES to Terje Mathisen
for the basic original algorithm.
Jerome Fine