; divide number between 0 and 800 by 10
; hl = dividend
; returns:
; a = quotient
; h = remainder
div10:
xor a ; init quo
ld b,#7 ; loop 7 times
ld de,#-640d ; 10 shifted left 6 times
divloop:
add hl,de ; trial subtraction
jp c,div1 ; test if too small
sbc hl,de ; then restore
or a ; clear carry from restore ( not sure if needed? )
div1:
rl a ; carry to bit0 and shift quo
add hl,hl ; shift for next trial
djnz divloop ; 7 times
add hl,hl ; align remainder
ret
I'm not sure if the 'or a' in the conditional part is needed
but I think it is because of the SBC operation.
Thanks
Dwight
> Hi
> I'm writing some code to deal with separating track and sector
> for a 5.25 inch disk. The disk has 10 sectors per track and
> I'm given a value of 0 to 799 as an index into the disk.
> In anothe group, we've been optimizing this calculation for
> the Z80 processor. I think we've just about hit the optimal
> calculation for size as well as speed ( secondary ).
> It seems to me that there might be some way to use the
> DAA instruction but so far it has alluded me. Here is the code
> we have so far. Any suggestions for additional optimization
> is greatly appreciated.
More obvious to me, multiply by 205 and shift right 11 bits.
205 is X'CD', or binary 11001101. There should be some
optimal combinations of shift, add, and store that can
do that.
-- glen
Hi Glen
That gives me the quotient but not the remainder. I need both.
The quotient is the track while the remainder is the sector.
Both are required.
Also, the calculation exceeds 16 bits and doesn't fit into
a register pair ( 800 x 205 = 164000 ).
If it didn't over flow, it might look like:
ld e,l
ld d,h
add hl,hl
add hl,hl
add hl,de
add hl,hl
add hl,hl
add hl,hl
add hl,de
ld e,i
ld d,h
add hl,hl
add hl,hl
add hl,de
shift 11 times here to get quotient
But, as I said, this is only half the problem. I still need the
remainder that would require me to multiply by 10 and then subtract.
Dwight
> ; divide number between 0 and 800 by 10
> ; hl = dividend
> ; returns:
> ; a = quotient
> ; h = remainder
Joseph carr "z-80 microcomputer handbook" p. 190. Put
the 16-bit value in HL, put the 8 bit value in C and zero in B.
Then shift HL left and subtract BC successively. In effect you
are shifting BC right relative to HL.and doing "long division"
Feed the results into the lower bits of HL as you shift HL to the
left.
divide8:
;B has 8 bit divider, HL the "16-bit" dividend
;quotient in H, remainder in L
;successive shift and subtract operations
LD C,0
LD D,8 ;shift counter
LOOP ADD HL,HL ;shift HL left
XOR A,A ;clear carry
SBC HL,BC ;subtact divisor from UPPER byte
INC HL ;presume result "good"
JP P,JUMP1 ;jump if positive result
ADD HL,BC ;if negative, restore first
JUMP1: RES 0,L ;and clear bit 0 in L
DEC D ;count down 8 shifts
JR NZ,LOOP ;and loop until none
Note from Herb. I don't think this works if HL has a value
above 32K decimal. Looks to me like the first "ADD HL,HL" effectively
left-shifts bit 15 to oblivion. But for the required value of 0 to
800,
that's not a problem. Like any code in a book, check it throughly
for problems and marginal conditions; make sure you avoid "feeding"
it values it won't handle. (What happens if B=0?)
Herb Johnson
"don't try to be a programmer, Herb" - French Luser
retrotechnology.com
Hello, Dwight
Here is a divide routine which I dug out of my archives which should
work for up to 255 sectors and allow up to 65535 records. It is about
the same size but use of some Z80 instructions might shrink it a little.
Jeffrey W. Shook
;...........................................................
;
; Divide 16 bit number by 8 bit number - unsigned
;
; REGISTER USAGE - Those not shown are unchanged
;
; ENTRY: DURING: RETURN:
;
; CY ??? ... div/0 error
; A ??? remainder remainder
; B ??? loop count ???
; C divisor divisor divisor
; HL dividend div/quo quotient
;
; 15 instructions, 20 bytes, 860 cycles max, ~ 215 usec at 4 MHz
;
; By Jeffrey W. Shook, circa 25JAN82, excerpt from UTIL.ASM,
; part of PRINTF.ASM package. STRUCT.LIB constructs removed.
;
DIV8: ; Cycles, 8080
XRA A ; Clear remainder 4
CMP C ; Check for divide by 0 4
STC ; Return CY set on error 4
RZ ; 11
MVI B, 16 ; Set loop bit count 7
DIV8B:
DAD H ; Shift dividend left 10
RAL ; into remainder 4
CMP C ; Compare remainder to divisor 4
JC DIV8C ; If greater or equal 10
SUB C ; Then subtract divisor 4
INX H ; Shift 1 into least bit of quotient 5
DIV8C:
DCR B ; Count bit 4
JNZ DIV8B ; 10
DIV8D:
ORA A ; Clear carry since DCR doesn't 4
RET ; 10
END
Hi
Made a few more tweeks. I do like Herb's but I think
this is close to optimal.
divide number between 0 and 800 by 10
; hl = dividend
; returns:
; a = quotient
; h = remainder
div10:
ld a,#2 ; loop flag 7 times
ld de,#--1280d ; 10 shifted left 6 times
divloop:
add hl,hl
add hl,de ; trial subtraction
jp c,div1 ; test if too small
sbc hl,de ; then restore
div1:
rl a ; carry to bit0 and shift quo
jr c,divloop
add hl,hl ; align remainder
ret
Dwight
Hi Herb
I'd still need to load B. That would push it to 20 bytes ( If I count
right ). It is clearly better than mine for numbers larger than
1023. Still, see my latest post. I have the count down to 16
bytes and fewer clocks. Yours is 20 bytes, including the ret
and ld bc,0a00h.
Thanks
Dwight
Hi Jeffrey
This is also a good general solution. I still count loading
the divisor because it is a fixed value. I also don't need the
full range so I can deal with a shorter loop. Take a look at
my latest post. I think I'm approaching optimal.
Dwight
Hi Herb
Results are backwards. L is the quotient and H is the remainder.
At least it won't hang if B=0 but it could check for that.
It would be more optimal if the ADD HL,BC and SBC HL,BC were
swapped and use the carry flag like I use. SBC uses 15 clock
cycles compared to 11. It is better to have the conditional do the
extra clocks than to do it for each loop. Worst case is the same
but best case is better.
Dwight
> On Jan 22, 2:36 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
(snip)
>>More obvious to me, multiply by 205 and shift right 11 bits.
>>205 is X'CD', or binary 11001101. There should be some
>>optimal combinations of shift, add, and store that can
>>do that.
> That gives me the quotient but not the remainder. I need both.
> The quotient is the track while the remainder is the sector.
> Both are required.
> Also, the calculation exceeds 16 bits and doesn't fit into
> a register pair ( 800 x 205 = 164000 ).
Yes, I tried ones that would fit, but at some point it
rounds the wrong way.
> If it didn't over flow, it might look like:
(snip of 16 bit product code)
> But, as I said, this is only half the problem. I still need the
> remainder that would require me to multiply by 10 and then subtract.
It still might be faster, but maybe not smaller.
A loop subtracting 10 each time while counting up might be
smaller, though slower.
-- glen
>Hi
> I'm writing some code to deal with separating track and sector
>for a 5.25 inch disk. The disk has 10 sectors per track and
>I'm given a value of 0 to 799 as an index into the disk.
> In anothe group, we've been optimizing this calculation for
>the Z80 processor. I think we've just about hit the optimal
>calculation for size as well as speed ( secondary ).
> It seems to me that there might be some way to use the
>DAA instruction but so far it has alluded me. Here is the code
>we have so far. Any suggestions for additional optimization
>is greatly appreciated.
Ok we're talking an 80 track disk with tracks number 0 to 79 [0-4Fh].
There are 10 sectors to the track logially 0-9 [0-9h].
Simplest approach is a straight line divide by 10.
It is assumed the value passed is binary and the results will be
binary.
First pass thought would be something like this.
if N>10 then
Shift right ; /2
shift right ;/2
subtract ;
shift right ; results in /10
; The result will be depending on how you organize
0-79 (0-799/10) in 16 bit register
0-9 (remainder) in 8 bit register.
else
N=sector on first track
The only constant is the compare to see if the index is less then 10
as no need to do math then.
NS* dos, CPM on NS* are examples of system that need to do this alot
as NS* disk is 10 sectors adn depending on the media 35, 40 or 80
tracks (possibly two sides as well).
If the values are BCD already then there is no work. Consider
converting N [0-799] to BCD and the it's a matter of what nibble is
needed. Likely the program has or will need a Binary to BCD output
routine for IO that can do this. Also there is likely a BCD to
binary input routine as well. If both are general your already there.
Doing it fast is a preference as disk access is slow enough and any
thing around the code to speed it up doesn't hurt as it will be done
often.
Allison
========
Michael Pearce (mi...@anet-dfw.com) wrote:
: Does anyone have any z80 assembly code for unsigned multiplication?
: preferably 8 bit * b bit, but 16 * 16 would also be fine.
Below are two routines for unsigned 16x16 multiply and divide. Signed
operations can use these with some wrappers to do the accounting for
the sign bits. These public domain routines will also work on the 8080
and 8085 (no Z80 specific instructions). Note that there is no error
checking for overflow, and that arguments are passed on the stack.
============================= cut here
================================
_div16u
; set up
pop hl
pop bc ; divisor to bc
ex (sp),hl ; dividend to hl
ex de,hl ; dividend to de
; initialize
ld hl,17 ; init loop counter to N + 1
push hl
ld l,h ; init quotient
push hl
; decrement counter
div_top
div_shift_ct equ 4
IF 1
push hl ; 12 (debugging comments?)
ld hl,div_shift_ct ; 10
add hl,sp ; 10
dec (hl) ; 10
pop hl ; 10 --> 52
ELSE
inc sp ; 6
ex (sp),hl ; 16
dec h ; 4
ex (sp),hl ; 16
dec sp ; 6 --> 48
ENDI
jp z,finish
; update quotient
ex (sp),hl ; get quotient
add hl,hl ; update quotient
ex (sp),hl ; and put it back
; rotate dividend into partial dividend
add hl,hl ; shift pd
ex de,hl ; get dividend
add hl,hl ; shift dividend
ex de,hl ; get back pd
jp nc,skip_03
inc hl ; "rotate" carry bit in
skip_03
; compare pd and divisor -- subtract divisor from pd if pd > divisor
ld a,h
cp b
jp c,div_top
jp nz,div_top ; not greater or equal so skip
; compare low bytes
ld a,l
cp c
jp c,div_top
; do subtraction
ld a,l ; get low byte
sub c
ld l,a ; save result
ld a,h ; get high byte
sbc a,b
ld h,a ; save result
; update quotient
ex (sp),hl
inc hl
ex (sp),hl
jp div_top
finish
pop bc ; quotient in bc, remainder in hl
inc sp
inc sp
ret
; ******************************************************************
; unsigned _mul16( int, int );
; returns hl
_mul16u
; initialize
pop hl
pop bc
ex (sp),hl
ex de,hl
ld hl,0
ld a,c
ld c,8
; do lower half
_lower_half rra
jp nc,_Lfore
add hl,de
_LL ex de,hl
add hl,hl ; shift
ex de,hl
dec c
jp nz,_lower_half
; do upper half
ld a,b
ld c,8
_upper_half rra
jp nc,_Lfore
add hl,de
_LL ex de,hl
add hl,hl ; shift
ex de,hl
dec c
jp nz,_upper_half
ret
Found in my archives.
========
Michael Pearce (mi...@anet-dfw.com) wrote:
: Does anyone have any z80 assembly code for unsigned multiplication?
: preferably 8 bit * b bit, but 16 * 16 would also be fine.
Below are two routines for unsigned 16x16 multiply and divide. Signed
operations can use these with some wrappers to do the accounting for
the sign bits. These public domain routines will also work on the 8080
and 8085 (no Z80 specific instructions). Note that there is no error
checking for overflow, and that arguments are passed on the stack.
============================= cut here
================================
_div16u
; set up
pop hl
pop bc ; divisor to bc
ex (sp),hl ; dividend to hl
ex de,hl ; dividend to de
; initialize
ld hl,17 ; init loop counter to N + 1
push hl
ld l,h ; init quotient
push hl
; decrement counter
div_top
div_shift_ct equ 4
IF 1
push hl ; 12 (debugging comments?)
ld hl,div_shift_ct ; 10
add hl,sp ; 10
dec (hl) ; 10
pop hl ; 10 --> 52
ELSE
inc sp ; 6
ex (sp),hl ; 16
dec h ; 4
ex (sp),hl ; 16
dec sp ; 6 --> 48
ENDI
jp z,finish
; update quotient
ex (sp),hl ; get quotient
add hl,hl ; update quotient
ex (sp),hl ; and put it back
; rotate dividend into partial dividend
add hl,hl ; shift pd
ex de,hl ; get dividend
add hl,hl ; shift dividend
ex de,hl ; get back pd
jp nc,skip_03
inc hl ; "rotate" carry bit in
skip_03
; compare pd and divisor -- subtract divisor from pd if pd > divisor
ld a,h
cp b
jp c,div_top
jp nz,div_top ; not greater or equal so skip
; compare low bytes
ld a,l
cp c
jp c,div_top
; do subtraction
ld a,l ; get low byte
sub c
ld l,a ; save result
ld a,h ; get high byte
sbc a,b
ld h,a ; save result
; update quotient
ex (sp),hl
inc hl
ex (sp),hl
jp div_top
finish
pop bc ; quotient in bc, remainder in hl
inc sp
inc sp
ret
; ******************************************************************
; unsigned _mul16( int, int );
; returns hl
_mul16u
; initialize
pop hl
pop bc
ex (sp),hl
ex de,hl
ld hl,0
ld a,c
ld c,8
; do lower half
_lower_half rra
jp nc,_Lfore
add hl,de
_LL ex de,hl
add hl,hl ; shift
ex de,hl
dec c
jp nz,_lower_half
; do upper half
ld a,b
ld c,8
_upper_half rra
jp nc,_Lfore
add hl,de
_LL ex de,hl
add hl,hl ; shift
ex de,hl
dec c
jp nz,_upper_half
ret
Hi
Thanks but I think these are all to large for what I need. I have
only limited
additional space in the EPROM to work with. The one, I last proposed,
fits
into 16 bytes and solves the problem. I'm not even sure if I'll be
able to fit
this in with only using 16 bytes. I may have to be even more creative
in other
parts of the code.
My requirments are size first and speed second. Gineric algorithms
are way to large. Even the additional test for less then 10 would be
excessive
for what I need. I'm looking for more optimal solutions, not generic
solutions.
Count my bytes and count my clock cycles. If your not less than 16
bytes,
it does me little good. The generic solution you show is greater than
50 bytes!
Dwight
Dwight,
No problem. It was only suggestion and someone else may find it
useful for their archive.
However, the requirement that the code fit in a limited space is a new
dimension. An understandable one but till now not one I'd seen
stated. I'd seriously look at other routines for space. Most Z80
instructions with prefixes are larger in most cases. But there are a
few that can really help over base 8080. This is one time where hand
crafted code can be very small.
Allison
>Hi
> I'm writing some code to deal with separating track and sector
>for a 5.25 inch disk. The disk has 10 sectors per track and
>I'm given a value of 0 to 799 as an index into the disk.
> In anothe group, we've been optimizing this calculation for
>the Z80 processor. I think we've just about hit the optimal
>calculation for size as well as speed ( secondary ).
> It seems to me that there might be some way to use the
>DAA instruction but so far it has alluded me. Here is the code
>we have so far. Any suggestions for additional optimization
>is greatly appreciated.
You do understand that DAA is base ten only because D=Decimal,
right? I'm pretty sure it's a two-byte instruction. One byte is 0A hex
The fact is that you can divide (multiply, whatever) by ANY base if
you force the ''constant'' to be something other than 0A (hex). Most
every assembler I've seen chokes on it, but you can hard-code it ok.
Besides being useful to the spooks community (for generating odd
ciphers) this functionality might be useful in what you're proposing.
If you REALLY gotta be efficient.
Bill
Hi
From another group, it is the 8088 AAM that is the two byte
with the 0A that can be changed, not the Z80.
> Besides being useful to the spooks community (for generating odd
> ciphers) this functionality might be useful in what you're proposing.
>
The 8088/86 instruction would be just what one needs.
I regret it is not the same as the DAA. The DAA was intended
to recreate a BCD number after a binary arithmatic on
BCD numbers.
The AAM would be useful in creating BCD numbers from
a larger binary number.
Still, it seem like there is some magic in the DAA instruction
that could be used.
Dwight
Dwight
Hi
My mistake. I thought I'd stated that I was interested in small size.
I've posted to two other groups so I wasn't sure what I'd said but
thought I'd stated that size was a primary criteria.
I agree that posting these is a good idea.
It looks like variation on straight division such as I've been
proposing
is about as tight as one can get. I've thought about non-restoring
division which I think would be better in for the more general cases
but it puts too many conditional paths into the minimal code types.
For those that don't understand non-restoring, one
combines the restore and next subrtaction into one operation.
In both cases, one keeps two values for the divisor. One that
is half the other and opposite sign.
When done, one needs to check the sign of the remainder. If
it is negative then you add the divisor back into the remainder
and increment the quotient. It is fastest but does require more
code.
Hardware typically does it this way.
Dwight
Hi Glen
For speed, one might use a smaller number and expect the result
to be off some. One could treat this as a trial quotient.
Calculate the remainder using the multiply by 10 and then check
the size of the remainder. If the error was over small enough range,
one could do maybe 2 or 3 compares of the remainder to determine
how far off it was.
Of course, one could first do a trial of all of the values to
see how big the maximum error was over the range of numbers. This
would
simplify the testing.
This has the advantage that one only needs maybe 2 or 3
conditionals and the rest is all inline, as I described. No need for
time consuming loops.
I'll think about this over the weekend and post what I come up with.
I think the error will always be that the remainder is too large so
testing should be simple.
Dwight
Hi Glen
I just did a quick check. It looks like a multiply by 51d, instead
of 205. It will work if I shift the answer in HL right by 1.
I just checked it for the 800 case and it is only off by 10. That is
only
one compare. I'll have to run some test cases to try all the values
from 0 to 799. If worst case is only off by 1 in the quotient, this
would be a really fast way to do this calculation.
Dwight
> My mistake. I thought I'd stated that I was interested in small size.
> I've posted to two other groups so I wasn't sure what I'd said but
> thought I'd stated that size was a primary criteria.
> I agree that posting these is a good idea.
Did you try a subtract 10 each time loop?
It will loop up to 79 times, but should be small. It should
still be faster than one step of the floppy drive head.
Otherwise, there might be a routine already in the ROM that
you could use to reduce the size.
-- glen
A fellow did this on another group and it produced
the smallest code. Only 11 bytes. It was really slow
for larger numbers ( over 2000 clocks ) still useful
if I need the additional 5 bytes.
My main purpose is to, one time, read the disk images and I
intend to transfer the data over a serial line. Such a
calcultion is not out of line for that purpose. Since
I want to do the disk serially, I might just make a
simple increment routine and not divide at all.
Dwight
Hi Glen
Forgot to mention, the ROM is setup for 32 hard sectored
8 inch disk. I want to do the least modifications to use it
for 5.25 10 sectored. I'll use much of the code for
the 8 inch with a few minor tweaks for a small change in
format, sector size and sector count. I've already modified
the board with a crystal change and a few capacitors to
get it to run at the 5.25 data rate. The PLL is descrete
parts and was easy to modify.
It is from a Polymorphic system using System88 OS.
I have a working system using single density disk but
I have archive disk that are double that I wish to get the
data from. I had a card for the 8 inch 32 sectored double
density disk.
Dwight
Hi Glen
I worked on the idea of multiplying and here is my result.
It is about 30 bytes and runs in only 230 clocks. It also
has no conditional code!
; divides 0-799 by 10
; hl is input dividend
; a= qoutient
; l=remainder
Div10fast:
ld d,h
ld e,l
add hl,hl
add hl,hl
add hl,de
add hl,hl
ld b,h
ld c,l
add hl,hl
add hl,hl
add hl,bc
add hl,de ; dividend *51
ld a,h ; add little more accuracy
add a,#16d ; tweak value
ld c,ald b,#0
add hl,bc ; add corrections for truncation
ld a,h ; quo*2
rra ; quo=a but still need remainder
ld l,a
ld c,a
ld h,b
add hl,hl
add hl,hl
add hl,bc
add hl,hl ; quo*10 to calc Remainder
ex de,hl
sbc hl,de ; l= remainder
ret