6502 Print U16 as decimal

John Brooks

Jul 7, 2017, 10:09:22 PM

In 2015 I made very compact assembly routines to convert binary words to decimal.

The 65816 binary-to-BCD conversion routine is $11 bytes and is posted here:

https://groups.google.com/forum/#!searchin/comp.sys.apple2.programmer/woz$20hextodec%7Csort:relevance/comp.sys.apple2.programmer/NpfRXsf2T0s/9swgn8FMCwAJ

I found that the 6502 version was much larger and more complex than the 65816 version due to the inefficiency of handling BCD-packed digits.

I ended up avoiding BCD and created a compact 6502 routine which could be configured to:
1) display all digits
2) left justify (by skipping leading zeroes)
3) right justify (by replacing leading zeroes with spaces)
4) print 2 to 5 digit numbers

The configurations range between 53 & 61 bytes. I've also added a 4 byte 'Demo' to each config which prints #$1234 as decimal 4660.
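
For anyone who wants to check the three output modes without an emulator, here is a minimal Python sketch of the same table-driven idea (repeated subtraction of powers of ten). The function name and the mode strings are made up for illustration, and it assumes the value fits in the requested number of digits; it models the behavior, not the byte-for-byte 6502 code.

```python
POWERS = [10, 100, 1000, 10000]  # same values as the Power10L/Power10H tables


def dec_print_table(value, digits=5, mode="zeros"):
    """Model of the table-driven print.

    mode: "zeros" prints every digit, "left" skips leading zeroes,
    "right" replaces leading zeroes with spaces.
    Assumes value < 10**digits (a hypothetical restriction of this sketch).
    """
    value &= 0xFFFF
    out = []
    seen_nonzero = False
    for power in reversed(POWERS[:digits - 1]):
        digit = 0
        while value >= power:        # repeated subtraction, like the sbc loop
            value -= power
            digit += 1
        if digit != 0:
            seen_nonzero = True
        if seen_nonzero or mode == "zeros":
            out.append(chr(ord("0") + digit))
        elif mode == "right":
            out.append(" ")          # leading zero becomes a space
        # mode == "left": a leading zero prints nothing
    out.append(chr(ord("0") + value))  # the last digit always prints
    return "".join(out)
```

With the demo value, `dec_print_table(0x1234)` gives the zero-padded form, and the "left" and "right" modes differ only in what happens to the leading zero.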

len=$35 Print leading zeroes
0800:A2 12 A9 34 48 8A A2 04 A0 FF 84 3D C8 85 3F 68
0810:85 3E 38 FD 30 08 48 A5 3F FD 34 08 B0 EE 68 98
0820:D0 00 49 B0 20 ED FD A0 00 A5 3E CA F0 F4 10 E2
0830:60 0A 64 E8 10 00 00 03 27

len=$3B Left justify (skip leading zeroes)
0800:A2 12 A9 34 48 8A A2 04 A0 FF 84 3D C8 85 3F 68
0810:85 3E 38 FD 36 08 48 A5 3F FD 3A 08 B0 EE 68 98
0820:D0 04 E6 3D 10 07 49 B0 20 ED FD A0 00 C6 3D A5
0830:3E CA F0 F2 10 DC 60 0A 64 E8 10 00 00 03 27

len=$3D Right justify (spaces instead of leading zeroes)
0800:A2 12 A9 34 48 8A A2 04 A0 FF 84 3D C8 85 3F 68
0810:85 3E 38 FD 38 08 48 A5 3F FD 3C 08 B0 EE 68 98
0820:D0 06 E6 3D 30 02 A9 10 49 B0 20 ED FD A0 00 C6
0830:3D A5 3E CA F0 F2 10 DA 60 0A 64 E8 10 00 00 03 27

I'm interested in making the code smaller if anyone has ideas. Code golf anyone? qkumba?

Below is the source, compatible with Merlin 8, Merlin 16, and Merlin32:

-JB
@JBrooksBSI

1 *-------------------------------
2 * DecPrint - 6502 print 16 bits
3 *
4 * 11/8/2015 by John Brooks
5 *-------------------------------
6 org $800
7
8 Demo ldx #$12 ;X=H
9 lda #$34 ;A=L
10 * Fall into DecPrintU16
11
12 *-------------------------------
13
14 DEC_SKIP0 = 1 ;Set to 1 to skip leading zeroes (left justify)
15 DEC_SPACE0 = 0 ;Set to 1 to print leading spaces (right justify)
16 DEC_DIGITS = 5 ;# of digits to print (2-5)
17 DEC_VARS = $3D
18
19 dum DEC_VARS ;Uses 3 temp bytes (ZP or ABS)
20 DecCtr ds 1 ;Leading zero ctr
21 DecWord ds 2 ;U16 being printed
22 dend
23
24 RomCOut = $FDED
25
26 *-------------------------------
27 * Print U16 as decimal via COUT
28 * IN: X=hi, A=Lo
29 * OUT: X=$FF, Y=$00
30 *-------------------------------
31 DecPrintU16
32 pha
33 txa
34 DecModLen = *+1
35 :MOD ldx #DEC_DIGITS-1
36 ldy #-1
37 sty DecCtr
38
39 :Loop iny
40 sta DecWord+1
41 pla
42 sta DecWord
43
44 :DoDigit sec
45 sbc Power10L-1,x
46 pha
47 lda DecWord+1
48 sbc Power10H-1,x
49 bcs :Loop
50
51 :GotDigit
52 pla
53 tya
54 bne :PrDigit ;Print all non-zero digits
55
56 do DEC_SKIP0
57 inc DecCtr
58 bpl :NoDigit ;Skip leading zeroes
59 else
60 do DEC_SPACE0
61 inc DecCtr
62 bmi :PrDigit ;Print digit if we've seen a non-zero
63 lda #$10 ;Else print a space
64 fin
65 fin
66
67 :PrDigit eor #"0"
68 jsr RomCOut
69 ldy #0
70 :NoDigit
71 do DEC_SKIP0+DEC_SPACE0
72 dec DecCtr
73 fin
74 lda DecWord
75 dex
76 beq :PrDigit
77 bpl :DoDigit
78 rts
79
80 Power10L db <10,<100,<1000,<10000
81 Power10H db >10,>100,>1000,>10000
82
83 lst off

Michael 'AppleWin Debugger Dev'

Jul 7, 2017, 11:01:36 PM

On Friday, July 7, 2017 at 7:09:22 PM UTC-7, John Brooks wrote:
> In 2015 I made very compact assembly routines to convert binary words to decimal.

Nice hat trick!
qkumba is going to have a very hard time optimizing that table-driven approach.

Might want to add two comments for readability:

Line 63

lda #$10 ;Else print a space: $10 ^ $30 = $20

Line 76
beq :PrDigit ;Print last digit
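
The XOR identity in the suggested comment is easy to check. The sketch below is only an illustration (the names are mine); it also covers the high-bit-set character codes, since the hex dumps assemble `eor #"0"` as `49 B0`, so on the Apple II the space comes out as $A0 rather than $20.

```python
ZERO_LOW, SPACE_LOW = 0x30, 0x20     # plain ASCII "0" and " "
ZERO_HIGH, SPACE_HIGH = 0xB0, 0xA0   # same characters with the high bit set


def to_char_code(a, zero=ZERO_HIGH):
    """Model of `eor #"0"`: A holds a digit 0-9, or $10 to request a space."""
    return a ^ zero
```

Digits 0-9 have no bits in common with $30 or $B0, so the EOR behaves like an ORA for them, and $10 flips exactly the one bit that separates "0" from " ".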

qkumba

Jul 7, 2017, 11:54:32 PM

> qkumba is going to have very hard time optimizing that table-driven approach.

And yet...

:PrDigit eor #"0"
jsr RomCOut

->

:PrDigit jsr PRHEXZ ;FDE5

qkumba

Jul 8, 2017, 12:02:45 AM

> :PrDigit jsr PRHEXZ ;FDE5

Not for the right-justified version, though.
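
A small model shows why. This is a sketch based on my reading of the monitor listing (ORA #$B0, CMP #$BA, BCC COUT, ADC #$06); the function name is hypothetical. Because PRHEXZ ORs in $B0 instead of EORing it, the $10 "space request" used by the right-justified config would print as "0".

```python
def prhexz(a):
    """Character code the monitor's PRHEXZ entry would pass to COUT for A."""
    a |= 0xB0                        # ORA #$B0
    if a >= 0xBA:                    # CMP #$BA / BCC COUT
        a = (a + 0x06 + 1) & 0xFF    # ADC #$06 with carry set by the CMP
    return a
```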

plin...@gmail.com

Jul 8, 2017, 12:07:38 AM

~35 bytes with leading zeros - can't test it, but I think it's good, albeit very inefficient. It works by putting the low byte in, counting down the binary while BCD-ing up the dec. I store the dec in 1, 2, 3 from least to most significant so when I print I can use a countdown loop with a bne without a compare.

tax
stx $00
lda #0
sta $01
sta $02
sta $03
loop:
cld
ldx $00
bne .1
cpy #0
beq printer
.1 dex
bne .2
dey
.2 sed
stx $00
ldx #1
addloop:
clc
lda $01,x
adc #1
bcc loop
inx
cpx #4
beq loop
bne addloop
printer:
ldy #3
.1 lda $00,y
jsr $fdda
dey
bne .1
rts
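
The idea described above (count the binary value down to zero while counting a packed-BCD value up) can be sketched in Python as follows. This models the approach only, not the posted bytes, and the function names are invented for the example.

```python
def bcd_increment(bcd_bytes):
    """Add 1 to a little-endian list of packed-BCD bytes, rippling the carry."""
    for i in range(len(bcd_bytes)):
        lo = (bcd_bytes[i] & 0x0F) + 1
        hi = bcd_bytes[i] >> 4
        if lo > 9:
            lo = 0
            hi += 1
        if hi > 9:
            bcd_bytes[i] = 0         # 99 -> 00, carry into the next byte
            continue
        bcd_bytes[i] = (hi << 4) | lo
        return


def count_to_bcd(value):
    """Return six decimal digits by incrementing BCD once per binary count."""
    bcd = [0, 0, 0]                  # least significant byte first
    for _ in range(value & 0xFFFF):  # "counting down the binary"
        bcd_increment(bcd)
    return "".join(f"{b:02X}" for b in reversed(bcd))
```

The cost is one BCD increment per unit of the input, so up to 65535 passes, which is why it is compact but very slow.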

qkumba

Jul 8, 2017, 12:14:55 AM

This one does not work for me. It prints all zeroes, no matter the input.
Also the expected input is not X:A as the others. It looks like Y:A but even so...

barrym95838

Jul 8, 2017, 1:16:49 AM

On Friday, July 7, 2017 at 9:14:55 PM UTC-7, qkumba wrote:
> This one does not work for me. It prints all zeroes, no matter the input.
> Also the expected input is not X:A as the others. It looks like Y:A but even so...

How about:

300:86 A0 85 A1 A9 00 48 A9 00 A2 10 C9 05 90 02 E9
310:05 26 A0 26 A1 2A CA D0 F2 09 B0 48 A5 A0 05 A1
320:D0 E5 68 20 ED FD 68 D0 FA 60

Left justified, 42 bytes, source available on request.

Mike B.

John Brooks

Jul 8, 2017, 1:18:39 AM

On Friday, July 7, 2017 at 9:02:45 PM UTC-7, qkumba wrote:
> > :PrDigit jsr PRHEXZ ;FDE5
>
> Not for the right-justified version, though.

I cleaned up the source, added comments and moved the configuration-specific code into macros as Bredon intended (shout out to Big Mac).

The macro refactor saved a few bytes in the smallest config, and qkumba's mod saved 2 bytes in the two smallest configs:

len=$2F Print leading zeroes
0800:A2 12 A9 34 48 8A A0 FF A2 04 C8 85 3F 68 85 3E
0810:38 FD 2A 08 48 A5 3F FD 2E 08 B0 EE 68 98 20 E5
0820:FD A0 00 A5 3E CA F0 F6 10 E6 60 0A 64 E8 10 00
0830:00 03 27

len=$39 Left justify (skip leading zeroes)
0800:A2 12 A9 34 48 8A A0 FF 84 3D A2 04 C8 85 3F 68
0810:85 3E 38 FD 34 08 48 A5 3F FD 38 08 B0 EE 68 98
0820:D0 04 E6 3D 10 05 20 E5 FD A0 00 C6 3D A5 3E CA
0830:F0 F4 10 DE 60 0A 64 E8 10 00 00 03 27

len=$3D Right justify (spaces instead of leading zeroes)
0800:A2 12 A9 34 48 8A A0 FF 84 3D A2 04 C8 85 3F 68
0810:85 3E 38 FD 38 08 48 A5 3F FD 3C 08 B0 EE 68 98
0820:D0 06 E6 3D 30 02 A9 10 49 B0 20 ED FD A0 00 C6
0830:3D A5 3E CA F0 F2 10 DA 60 0A 64 E8 10 00 00 03 27

Revised, commented source below.

-JB
@JBrooksBSI

*-------------------------------
* DecPrint - 6502 print 16 bits
*
* 11/8/2015 by John Brooks
*-------------------------------
            org $800

Demo        ldx #$12  ;X=H
            lda #$34  ;A=L
* Fall into DecPrintU16

*-------------------------------

DEC_SKIP0   = 0  ;Set to 1 to skip leading zeroes (left justify)
DEC_SPACE0  = 0  ;Set to 1 to print leading spaces (right justify)
DEC_DIGITS  = 5  ;# of digits to print (1-5)
DEC_VARS    = $3D

            dum DEC_VARS  ;Uses 3 temp bytes (ZP or ABS)
DecCtr      ds 1  ;Leading zero ctr
DecWord     ds 2  ;U16 being printed
            dend

RomPrHexZ   = $FDE5
RomCOut     = $FDED

DEC_INIT    mac
            do DEC_SKIP0+DEC_SPACE0
            sty DecCtr  ;-1 means no non-zeroes seen yet
            fin
            <<<

DEC_ZERO    mac
            do DEC_SKIP0
            bne :PrDigit  ;Print all non-zero digits
            inc DecCtr  ;Mark that a zero was found
            bpl :NoDigit  ;Skip leading zeroes
            else

            do DEC_SPACE0
            bne :PrDigit  ;Print all non-zero digits
            inc DecCtr  ;Mark that a zero was found
            bmi :PrDigit  ;Print digit if we've seen a non-zero
            lda #$10  ;Else print a space. $10="0" EOR " "
            fin
            fin
            <<<

DEC_PRINT   mac
            do DEC_SPACE0
            eor #"0"  ;Print 0-9 or space
            jsr RomCOut
            else
            jsr RomPrHexZ  ;qkumba opt
            fin
            <<<

DEC_CTR     mac
            do DEC_SKIP0+DEC_SPACE0
            dec DecCtr  ;Mark that a digit was printed
            fin
            <<<

*-------------------------------
* Print U16 as decimal via COUT
* IN: X=hi, A=Lo
* OUT: X=$FF, Y=$00
*-------------------------------
DecPrintU16
            pha  ;ArgL
            txa  ;ArgH
            ldy #0-1  ;Start digit is 0 (-1 for extra iny)
            DEC_INIT  ;Init leading zero ctr if needed
DecModLen   = *+1
:MOD        ldx #DEC_DIGITS-1  ;# of digits to print

:Loop       iny  ;Inc current digit 0-9
            sta DecWord+1  ;Store remainderH
            pla
            sta DecWord  ;Store remainderL

:DoDigit    sec  ;Try next larger digit
            sbc :Pow10L-1,x  ;Subtract power-of-10 lo
            pha  ;Save in case digit is good
            lda DecWord+1
            sbc :Pow10H-1,x  ;Subtract power-of-10 hi
            bcs :Loop  ;If sub didn't borrow, try higher digit

:GotDigit
            pla  ;Subtract failed. Discard remainder low
            tya  ;A=digit 0-9
            DEC_ZERO  ;Handle leading zero case if needed
:PrDigit    DEC_PRINT  ;Print digit 0-9 (or space if right-justify)

            ldy #0  ;Next smaller digit starts at 0
:NoDigit
            DEC_CTR  ;Update leading-zero ctr if needed
            lda DecWord  ;Get remainderL (prepare for subtract)
            dex  ;Now calc value of smaller digit
            beq :PrDigit  ;Last digit prints as-is (no leading-zero logic)
            bpl :DoDigit  ;Calc digits 2-DEC_DIGITS
            rts

:Pow10L     db <10,<100,<1000,<10000
:Pow10H     db >10,>100,>1000,>10000

            lst off

John Brooks

Jul 8, 2017, 2:53:14 AM

Very cool Mike. It looks like the routine calculates low-digit to high-digit, storing digits on the stack. I don't get how the constant ROLing preserves upper digits while the lower digits are being calculated though.

I'm interested in the source and more info on the algorithm. I don't think I've run across this approach before.

-JB
@JBrooksBSI

barrym95838

Jul 8, 2017, 3:17:17 AM

On Friday, July 7, 2017 at 11:53:14 PM UTC-7, John Brooks wrote:
> I'm interested in the source and more info on the algorithm.
> I don't think I've run across this approach before.
>
> -JB
> @JBrooksBSI

; Output 16-bit unsigned integer to stdout
; by Michael T. Barry 2017.07.07. Free to
; copy, use and modify, but without warranty
org 768
iout:
stx $a0 ; low-order half
sta $a1 ; high-order half
lda #0 ; null delimiter for print
pha ; repeat {
iout2: ; divide by 10
lda #0 ; remainder
ldx #16 ; loop counter
iout3:
cmp #5 ; partial remainder >= 10 (/2)?
bcc iout4
sbc #5 ; yes: update partial
; remainder, set carry
iout4:
rol $a0 ; gradually replace dividend
rol $a1 ; with the quotient
rol ; A is gradually replaced
dex ; with the remainder
bne iout3 ; loop 16 times
ora #$b0 ; convert remainder to ASCII
pha ; stack digits in ascending
lda $a0 ; order ('0' for zero)
ora $a1
bne iout2 ; } until quotient is 0
pla
iout5:
jsr $fded ; print digits in descending
pla ; order until delimiter is
bne iout5 ; encountered
rts

See http://forum.6502.org/viewtopic.php?f=2&t=3051
for some background info. I might be wrong, but I
think that I stumbled into an original idea there.

Mike B.

qkumba

Jul 8, 2017, 11:43:11 AM

I confess that I don't understand how it works, but it's beautiful.
Now let's do 40 bytes:

iout:
stx $a0 ; low-order half
sta $a1 ; high-order half

ldy #0 ; counter for print

; repeat {
iout2: ; divide by 10
lda #0 ; remainder
ldx #16 ; loop counter
iout3:
cmp #5 ; partial remainder >= 10 (/2)?
bcc iout4
sbc #5 ; yes: update partial
; remainder, set carry
iout4:
rol $a0 ; gradually replace dividend
rol $a1 ; with the quotient
rol ; A is gradually replaced
dex ; with the remainder
bne iout3 ; loop 16 times

pha ; stack digits in ascending

iny ; increase digits counter

lda $a0 ; order ('0' for zero)
ora $a1
bne iout2 ; } until quotient is 0

iout5:
pla
jsr $fde5 ; print digits
dey ; in descending order
bne iout5 ; while count remains
rts

John Brooks

Jul 8, 2017, 12:28:46 PM

It looks like a divide with dual outputs - the remainder is in A and the quotient replaces the original input.

Running it repeatedly effectively strips digits from low to high.
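
To make the "dual output" concrete, here is a bit-level Python sketch of one pass of the loop, with the corresponding instructions in the comments. The function names are mine. The trick, as far as I can tell, is that the loop effectively divides (input >> 1) by 5, and the final rol brings the input's low bit back into the remainder, which gives input mod 10.

```python
def div10_step(word):
    """One pass of the 16-iteration loop: returns (word // 10, word % 10)."""
    rem = 0                              # lda #0
    for _ in range(16):                  # ldx #16 ... dex / bne
        carry = 0                        # cmp #5 clears carry when rem < 5
        if rem >= 5:                     # cmp #5 / bcc
            rem -= 5                     # sbc #5 (carry stays set)
            carry = 1
        word = (word << 1) | carry       # rol lo / rol hi: quotient bit in
        carry, word = word >> 16, word & 0xFFFF   # top bit of input falls out
        rem = (rem << 1) | carry         # rol: input bit into the remainder
    return word, rem


def peel_digits(word):
    """Strip decimal digits from low to high by repeating the pass."""
    digits = []
    while True:
        word, digit = div10_step(word)
        digits.append(digit)
        if word == 0:
            break
    return digits
```

So `peel_digits(0x1234)` yields the digits of 4660 lowest first, which is why the 6502 versions stack them before printing.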

Nicely done Mike!

So ironically, the smallest 6502 decimal conversion so far is a clever implementation of the canonical 6502 divide:

int iter=0;
while (input)
{
    digits[iter++] = input % 10;
    input = input / 10;
}

:)

-JB
@JBrooksBSI

Michael 'AppleWin Debugger Dev'

Jul 8, 2017, 12:47:09 PM

On Saturday, July 8, 2017 at 9:28:46 AM UTC-7, John Brooks wrote:
> the canonical 6502 divide:
>
> int iter=0;
> while (input)
> {
> digits[iter++] = input % 10;
> input = input / 10;
> }

That doesn't handle the edge case when input == 0. :-/
Trivial enough to fix. :-)

#include <stdio.h>
#include <stdlib.h>

void printu16( unsigned x )
{
    char digits[6];
    int len = 0;

    x &= 0xFFFF; /* keep to 16 bits so digits[] cannot overflow */

    do
    {
        digits[len++] = x % 10;
        x /= 10;
    }
    while( x );

    while( len )
        putchar( '0' | digits[ --len ] );
}

int main( const int nArg, const char *aArg[] )
{
    printu16( nArg > 1 ? strtol(aArg[1],NULL,16) : 0x1234 );
    putchar( '\n' );
    return 0;
}

Michael 'AppleWin Debugger Dev'

Jul 8, 2017, 12:50:28 PM

On Saturday, July 8, 2017 at 9:47:09 AM UTC-7, Michael 'AppleWin Debugger Dev' wrote:

P.S.
I prefer to write the output like this -- just to confuse the C noobs who are left wondering what operator '-->' is. :-)

while( len --> 0 )
putchar( '0' | digits[ len ] );

John Brooks

Jul 8, 2017, 2:02:43 PM

On Saturday, July 8, 2017 at 8:43:11 AM UTC-7, qkumba wrote:

40 bytes is too easy. Let's do 39 bytes! (+4 for demo)

0300:A9 12 A2 34 86 A0 85 A1 A9 00 48 A9 00 A8 A2 10
0310:C9 05 90 03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 09
0320:B0 88 10 E6 20 ED FD 68 D0 FA 60

Merlin src below.

-JB
@JBrooksBSI

*-------------------------------

* DecPrint - 6502 print 16 bits

*
* by Mike B (barrym95838) 7/7/2017
*
* Disassembly, comments, and
* optimization by John Brooks 7/8/2017
*-------------------------------
lst off

org $0300

Demo lda #$12
ldx #$34
* Fall into DecPrintU16

*-------------------------------

DEC_VARS = $A0

dum DEC_VARS ;Uses 2 temp bytes (ZP or ABS)

DecWord ds 2 ;U16 being printed

dend

RomCOut = $FDED

*-------------------------------

* Print U16 as decimal via COUT

* IN: A=hi, X=lo
* OUT: A=$00, X=$00, Y=$FF
*-------------------------------
DecPrintU16
stx DecWord
sta DecWord+1
lda #0 ;Flag end-of-digits
:DoDigit pha ;Save previous low-order digit
lda #0 ;Remainder=0
tay ;DivDone=true
ldx #16 ;16-bit divide
:Loop cmp #10/2 ;Calc DecWord/10
bcc :Mul2
sbc #10/2 ;Remove high-order digit & shift 1 into DecWord
iny ;DivDone=false
:Mul2 rol DecWord ;Shift /10 result into DecWord
rol DecWord+1
rol ;Shift bits of input into acc (input mod 10)
dex
bne :Loop ;Continue 16-bit divide
ora #"0" ;Convert mod 10 result to ascii
dey
bpl :DoDigit ;If result of /10 was not zero, do next digit
:Print jsr RomCOut ;Print highest digit
pla ;Pull next-highest ascii digit
bne :Print ;If not end-of-digits $00, print more
rts

lst off
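
For reference, here is a Python sketch of the control flow in the listing above: a $00 marker goes on the stack first, each finished digit is pushed only when another pass is needed, and Y counts subtractions so that "no subtractions" means the quotient is zero. The function name is made up, and the high bit is stripped only so the result is readable as an ordinary string.

```python
def dec_print_sentinel(word):
    """Model of the 39-byte flow: digits stacked above a $00 end marker."""
    word &= 0xFFFF
    stack = []
    a = 0x00                              # lda #0: end-of-digits flag
    while True:
        stack.append(a)                   # :DoDigit pha
        rem, subtracts = 0, 0             # lda #0 / tay
        for _ in range(16):               # ldx #16
            carry = 0
            if rem >= 5:                  # cmp #10/2 / bcc :Mul2
                rem -= 5                  # sbc #10/2
                subtracts += 1            # iny
                carry = 1
            word = (word << 1) | carry    # rol DecWord / rol DecWord+1
            carry, word = word >> 16, word & 0xFFFF
            rem = (rem << 1) | carry      # rol
        a = rem | 0xB0                    # ora #"0"
        if subtracts == 0:                # dey / bpl :DoDigit not taken
            break
    out = []
    while a != 0:                         # :Print ... pla / bne :Print
        out.append(chr(a & 0x7F))         # jsr RomCOut
        a = stack.pop()
    return "".join(out)
```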

qkumba

Jul 8, 2017, 2:24:35 PM

> 40 bytes is too easy. Let's do 39 bytes! (+4 for demo)

37.

0300:A9 12 A2 34 86 A0 85 A1 A9 FF 48 A9 00 A8 A2 10
0310:C9 05 90 03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 88
0320:10 E8 20 E5 FD 68 10 FA 60

gid...@sasktel.net

Jul 8, 2017, 3:35:59 PM

I have to say that this one bears a few similarities to Woz's last screen calculator.

ASL
TAY
AND #$F0
BPL *+2
ORA #5
BCC *+2
ORA #$A
ASL
ASL
STA $26
TYA
AND #$E
ADC #$10
ASL $26
ROL
STA $27
RTS

Michael 'AppleWin Debugger Dev'

Jul 8, 2017, 4:09:45 PM

Topic is U16, but easy enough to extend to S16 version:

0300:A9 FE A2 00 C9 80 90 12
0308:A8 A9 AD 20 ED FD 8A 49
0310:FF 18 69 01 AA 98 49 FF
0318:69 00 86 A0 85 A1 A9 FF
0320:48 A9 00 A8 A2 10 C9 05
0328:90 03 E9 05 C8 26 A0 26
0330:A1 2A CA D0 F1 88 10 E8
0338:20 E5 FD 68 10 FA 60

org $0300

Demo lda #$FE ; 65536 - 65024 = (-)512
ldx #$00

* Fall into DecPrintS16

*-------------------------------

DecWord = $A0
PRHEXZ = $FDE5

RomCOut = $FDED

*-------------------------------
* Print S16 as decimal via COUT
* IN: A=hi, X=lo

*-------------------------------
DecPrintS16
cmp #$80 ;HiLo
bcc DecPrintU16 ;C=(Hi > $7F)
tay ;C=1
lda #"-"
jsr RomCOut ; but destroys C
txa ;Calc 2's complement
eor #$FF
clc
adc #1
tax
tya
eor #$FF
adc #0
; *** Intentional fall into DecPrintU16

*-------------------------------
* Print U16 as decimal via COUT
* IN: A=hi, X=lo
* OUT: A=$00, X=$00, Y=$FF
*-------------------------------
DecPrintU16
stx DecWord
sta DecWord+1

lda #$FF ;Flag end-of-digits

:DoDigit pha ;Save previous low-order digit
lda #0 ;Remainder=0
tay ;DivDone=true
ldx #16 ;16-bit divide
:Loop cmp #10/2 ;Calc DecWord/10
bcc :Mul2
sbc #10/2 ;Remove high-order digit & shift 1 into DecWord
iny ;DivDone=false
:Mul2 rol DecWord ;Shift /10 result into DecWord
rol DecWord+1
rol ;Shift bits of input into acc (input mod 10)
dex
bne :Loop ;Continue 16-bit divide
; ora #"0" ;Convert mod 10 result to ascii
dey
bpl :DoDigit ;If result of /10 was not zero, do next digit

:Print jsr PRHEXZ ;Print highest digit

pla ;Pull next-highest ascii digit

bpl :Print ;If not end-of-digits $FF, print more
rts
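
The sign handling can be checked with a short Python sketch that follows the same byte-wise two's complement steps. The function name is hypothetical, and `str()` stands in for the unsigned print routine that the 6502 code falls into.

```python
def dec_print_s16(hi, lo):
    """Model of DecPrintS16: print '-' and negate when the high byte >= $80."""
    out = ""
    if hi >= 0x80:                          # cmp #$80 / bcc DecPrintU16
        out = "-"                           # lda #"-" / jsr RomCOut
        lo = (lo ^ 0xFF) + 1                # eor #$FF / clc / adc #1
        carry, lo = lo >> 8, lo & 0xFF
        hi = ((hi ^ 0xFF) + carry) & 0xFF   # eor #$FF / adc #0
    return out + str((hi << 8) | lo)        # falls into DecPrintU16
```

For the demo input $FE00 this gives "-512", matching the comment on the Demo line.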

Michael J. Mahon

Jul 8, 2017, 4:55:05 PM

So while we're at it, drop the initial "cmp #$80" and move the
conditional branch after the "tay" and change it to a "bpl".

That saves two bytes and two cycles in DecPrintS16.

--

-michael

NadaNet 3.1 for Apple II parallel computing!
Home page: http://michaeljmahon.com

"The wastebasket is our most important design
tool--and it's seriously underused."

John Brooks

Jul 8, 2017, 5:51:09 PM

Now 35 bytes (+4 for demo code):

0300:A9 12 A2 34 86 A0 85 A1 A9 00 A8 A2 10 C9 05 90
0310:03 E9 05 C8 26 A0 26 A1 2A CA D0 F1 48 A9 FD 48
0320:A9 E1 48 88 10 E2 60

-JB
@JBrooksBSI

*-------------------------------
* DecPrint - 6502 print 16 bits
*
* by Mike B (barrym95838) 7/7/2017
*

* Optimized by J.Brooks & qkumba 7/8/2017

*-------------------------------
lst off
org $0300

dum $A0 ;Uses 2 temp bytes (ZP or ABS)

DecWord ds 2 ;U16 being printed
dend

RomPlaPrHex = $FDE2 ;PLA then PrHexZ

*-------------------------------

Demo lda #$12
ldx #$34

* 4 byte demo falls into DecPrintU16

*-------------------------------
* Print U16 as decimal via COUT
* IN: A=hi, X=lo

* OUT: X=$00, Y=$FF

*-------------------------------
DecPrintU16
stx DecWord
sta DecWord+1

:DoDigit lda #0 ;Remainder=0

tay ;DivDone=true
ldx #16 ;16-bit divide

:Div10 cmp #10/2 ;Calc DecWord/10
bcc :Under10

sbc #10/2 ;Remove high-order digit & shift 1 into DecWord
iny ;DivDone=false

:Under10 rol DecWord ;Shift /10 result into DecWord

rol DecWord+1
rol ;Shift bits of input into acc (input mod 10)
dex

bne :Div10 ;Continue 16-bit divide
pha ;Push low digit 0-9 to print
lda #>RomPlaPrHex-1
pha ;Push address of ROM nibble print
lda #<RomPlaPrHex-1
pha
dey ;Chk DivDone > 0

bpl :DoDigit ;If result of /10 was not zero, do next digit

rts
lst off
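
The stack trick may be easier to follow in a model. In this Python sketch (names and the caller address are hypothetical), each digit is pushed together with the address-minus-one of the ROM's "PLA, then print nibble" code, so the final rts and each ROM rts after it pull the next print call, highest digit first, until the original caller's return address comes up. The built-in `divmod` stands in for the shift/subtract loop.

```python
ROM_PLA_PRHEX = 0xFDE2   # address from the post: PLA, then print a hex nibble
CALLER_RETURN = 0x0803   # hypothetical address the caller resumes at


def dec_print_rts_chain(word):
    """Return (printed text, address where execution finally resumes)."""
    word &= 0xFFFF
    # A JSR pushes (return address - 1), high byte first.
    stack = [(CALLER_RETURN - 1) >> 8, (CALLER_RETURN - 1) & 0xFF]
    while True:
        word, digit = divmod(word, 10)            # stands in for :Div10
        stack.append(digit)                       # pha: digit 0-9
        stack.append((ROM_PLA_PRHEX - 1) >> 8)    # lda #>RomPlaPrHex-1 / pha
        stack.append((ROM_PLA_PRHEX - 1) & 0xFF)  # lda #<RomPlaPrHex-1 / pha
        if word == 0:
            break
    out = []
    while True:
        lo = stack.pop()                  # RTS pulls low byte, then high byte
        hi = stack.pop()
        pc = ((hi << 8) | lo) + 1         # and resumes at address + 1
        if pc != ROM_PLA_PRHEX:
            return "".join(out), pc       # back in the original caller
        nibble = stack.pop() & 0x0F       # PLA / AND #$0F
        out.append("0123456789ABCDEF"[nibble])
```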

sandraal...@gmail.com

Jul 8, 2017, 6:10:16 PM

You know what makes it extra useful is that you should be able to leverage the same routine to print octal and binary, just by changing the divisor, or by keeping it (Radix/2) in zero-page somewhere. :-)
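
As a quick sanity check of that idea, here is a Python sketch of the same loop with the compare value taken as radix/2. It only works for even radixes, since the low bit of the input is folded back into the remainder by the final shift. The helper names and the digit alphabet are assumptions for illustration.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def divmod_even_radix(word, radix):
    """Shift/subtract pass with the compare value set to radix // 2."""
    if radix % 2 or not 2 <= radix <= 36:
        raise ValueError("radix must be even and between 2 and 36")
    half = radix // 2
    rem = 0
    for _ in range(16):
        carry = 0
        if rem >= half:                  # cmp #radix/2
            rem -= half                  # sbc #radix/2
            carry = 1
        word = (word << 1) | carry       # rol the 16-bit word
        carry, word = word >> 16, word & 0xFFFF
        rem = (rem << 1) | carry         # rol A
    return word, rem


def to_radix(word, radix=10):
    """Peel digits low to high, then reverse them for display."""
    word &= 0xFFFF
    digits = []
    while True:
        word, rem = divmod_even_radix(word, radix)
        digits.append(DIGITS[rem])
        if word == 0:
            break
    return "".join(reversed(digits))
```

For example, `to_radix(0x1234, 8)` and `to_radix(0x1234, 2)` give the octal and binary forms of the demo value.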

John Brooks

Jul 8, 2017, 7:41:51 PM

Another byte bites the dust.

Now 34 bytes (+4 demo):

0300:A2 12 A9 34 20 4A FF A9 00 A8 A2 10 C9 05 90 03
0310:E9 05 C8 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9
0320:E1 48 88 10 E2 60

-JB
@JBrooksBSI

roger....@gmail.com

Jul 8, 2017, 11:23:56 PM

On Saturday, July 8, 2017 at 4:51:09 PM UTC-5, John Brooks wrote:
> pha ;Push low digit 0-9 to print
> lda #>RomPlaPrHex-1
> pha ;Push address of ROM nibble print
> lda #<RomPlaPrHex-1
> pha

Hehe...this is an excellent way of getting out of the right to left display.

Also nice abuse of an unlabeled monitor address. (I would have gone with prhex-2.)

Impressive - and tacky - at the same time!

John Brooks

Jul 8, 2017, 11:37:09 PM

It was only a question of whether I found it before qkumba. He is quite the code golfer.

Did you see my 34 byte version? It has another unorthodox use of the monitor ROM to save a byte.

-JB
@JBrooksBSI

barrym95838

Jul 9, 2017, 12:27:45 AM

On Saturday, July 8, 2017 at 8:37:09 PM UTC-7, John Brooks wrote:
> ... It was only a question of whether I found it before qkumba.

> He is quite the code golfer.

> ...

I am not in the same league, but I'm proud that I found a fresh
path which was deserving of your golfing attention. That super-
compact UM/MOD just popped up during one of my rare bursts of
inspiration. If only they weren't so rare and random ... I wish
I could save a few of those up, and use them at a time and place
of my choosing once in a while ...

Mike B.

John Brooks

Jul 9, 2017, 1:51:57 AM

As a lifelong programmer ('79 to present), I've written a ton of decimal print routines and have never come across this exact method.

I've seen calls to DIV/MOD routines which are typically huge and slow. The go-to strategies are typically BCD on older CPUs and power-of-10 lookups on newer CPUs (or hardware divide).

I think I may have seen and possibly written a similar in-place-result 'DIV & MOD' loop in the 1980s when I was writing 3D math routines for my flight simulators on the //e & IIGS (Tomahawk), but I just considered the in-place-result to be a minor space savings rather than a compelling feature.

IMO the dual-result 'DIV & MOD' shift/sub loop is not nearly as well known in computer science as the dual-result 'Sin & Cos' commonly used in shader programming.

For the application of printing decimal numbers, your approach fits like a glove. The ability to peel off low digits while retaining the high digits allows a compact recursive-like reprocessing of the input number.

It's not a speed-demon, but I don't see how it can be beat for compact code size.

Nicely done!

-JB
@JBrooksBSI

John Brooks

Jul 9, 2017, 2:04:26 AM

On Saturday, July 8, 2017 at 9:27:45 PM UTC-7, barrym95838 wrote:

Here is my last optimization idea: use the overflow flag to determine when to conclude the divide.

So now it's 33 bytes (+4 for demo):

300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60

*-------------------------------
* DecPrint - 6502 print 16 bits

* Merlin 8/16/32 assembler
*
* by Michael T. Barry 2017.07.07. Free to
* copy, use and modify, but without warranty

*
* Optimized by J.Brooks & qkumba 7/8/2017
*-------------------------------
lst off
org $0300

ZpDecWord = $45 ;U16 being printed

RomPlaPrHex = $FDE2 ;PLA then PrHexZ

RomSave = $FF4A ;A->$45, X->$46, Y->$47

*-------------------------------

Demo ldx #$12
lda #$34

* 4 byte demo falls into DecPrintU16

*-------------------------------
* Print U16 as decimal via COUT
* IN: X=hi, A=lo
* OUT: X=$00 (Y is not used)
*-------------------------------
DecPrintU16

jsr RomSave ;Save A,X to $45,$46
:DoDigit lda #0 ;Remainder=0
clv ;V=0 means div result = 0
ldx #16 ;16-bit divide
:Div10 cmp #10/2 ;Calc ZpDecWord/10
bcc :Under10
sbc #10/2+$80 ;Remove digit & set V=1 to show div result > 0
sec ;Shift 1 into div result
:Under10 rol ZpDecWord ;Shift /10 result into ZpDecWord
rol ZpDecWord+1

rol ;Shift bits of input into acc (input mod 10)
dex
bne :Div10 ;Continue 16-bit divide

pha ;Push low digit 0-9 to print
lda #>RomPlaPrHex-1
pha ;Push address of ROM nibble print
lda #<RomPlaPrHex-1
pha

bvs :DoDigit ;If V=1, result of /10 was > 0 & do next digit
rts
lst off
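
The `sbc #10/2+$80` step is the subtle part, so here is a Python sketch of 6502 binary-mode SBC (function name mine) to show what it does for the only values A can hold at that point, 5 through 9: the result is the usual A-5 with bit 7 set, carry comes back clear (hence the explicit sec), V is set, and the following rol shifts the stray bit 7 out again.

```python
def sbc(a, operand, carry=1):
    """6502 SBC in binary mode: returns (result, carry_out, overflow)."""
    inverted = operand ^ 0xFF            # SBC adds the one's complement
    total = a + inverted + carry
    result = total & 0xFF
    carry_out = total >> 8
    overflow = ((a ^ result) & (inverted ^ result) & 0x80) != 0
    return result, carry_out, overflow


def rol(a, carry_in):
    """8-bit ROL: returns (result, carry_out)."""
    total = (a << 1) | carry_in
    return total & 0xFF, total >> 8
```

Since cmp and rol leave V alone and every sbc on this path overflows, V stays set once any subtraction has happened in the pass, which is all the final bvs needs.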

barrym95838

Jul 9, 2017, 3:54:51 AM

On Saturday, July 8, 2017 at 11:04:26 PM UTC-7, John Brooks wrote:
> So now it's 33 bytes (+4 for demo):
>
> 300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
> 310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60
>

You didn't even trash Y, AFAICT. I'm not sure, but I think that it
might be the ultimate in 6502 coding bad-assery when it's not even
remotely obvious how to improve an optimization by throwing another
register at it.

Did you put the high-half in $45 and the low half in $46? If so,
do you need to "rol ZpDecWord+1" before you "rol ZpDecWord", or is
it just way past my bed-time?

Mike B.

Antoine Vignau

Jul 9, 2017, 3:56:33 AM

You, guys, are crazy!

Why write such a small routine when we have 64KB of RAM space ;-)

Antoine

Michael 'AppleWin Debugger Dev'

Jul 9, 2017, 9:37:48 AM

On Sunday, July 9, 2017 at 12:56:33 AM UTC-7, Antoine Vignau wrote:
> You, guys, are crazy!
> Why write a so small routine when we have 64KB of RAM space ;-)

/sarcasm Oh hush with your 32 GB of system ram. Oh wait, that's my i7 dev box. =P :-)

Does this mean we can coin a new quote?

"64KB aught to be enough for anybody" -- Antoine

:-)

vi...@pianoman.cluster.toy

Jul 9, 2017, 4:13:44 PM

If anyone wants to continue optimizing 6502 code for size, I should
point out my "ll_asm" code density project.

http://www.deater.net/weave/vmwprod/asm/ll/ll.html

where I optimize the same program for size in assembly language on 30+
architectures. The code most of interest is LZSS decompression, but
integer printing and strcat() are involved as well.

I get a lot of complaints about my lack of assembly skills. Now that
I finally got m68k reoptimized (after a lot of badgering by
Amiga/m68k diehards) the next most complained about arch is 6502.

I also have some partially done 65c816 code, but I was developing it
on a SNES not on a GS.

Vince

gid...@sasktel.net

Jul 9, 2017, 4:55:33 PM

Personally I am confused by the whole thing, since ROM routines are being used:

300:A2 34 A9 12 4C 24 ED

300:LDX #$34
LDA #$12
JMP $ED24

does exactly the same thing in only 3 bytes (+4 for demo)

gid...@sasktel.net

Jul 9, 2017, 5:04:46 PM

For anyone wanting a quick 3-way conversion, hex, decimal, binary.

Here is my 16-bit ampersand utility.

Just launch at the applesoft prompt, then type:

]&$FFFF or,
]&#65535 or,
]&%1111111111111111

9000: a9 10 8d f6 03 a9 90 8d f7 03 60 00 00 00 00 00
9010: a2 01 a0 02 c9 24 d0 1d bd 00 02 f0 0b 09 80 9d
9020: 00 02 20 b1 00 e8 d0 f0 20 a7 ff a6 3e a5 3f 20
9030: 24 ed 4c 50 90 c9 23 d0 3c 20 b1 00 a9 a4 20 ed
9040: fd 20 67 dd 20 52 e7 85 3f a6 50 86 3e 20 41 f9
9050: a2 01 20 8e fd a2 11 06 3e 26 3f b0 03 a9 b0 2c
9060: a9 b1 2c a9 a0 20 ed fd ca e0 09 f0 f6 e8 ca d0
9070: e6 4c 95 d9 60 c9 25 d0 fb a9 00 85 3e 85 3f a2
9080: 02 bd 00 02 f0 0f 18 29 01 f0 01 38 26 3e 26 3f
9090: e8 e0 20 90 ec 20 8e fd a9 a3 20 ed fd a6 3e a5
90a0: 3f 20 24 ed 20 8e fd a9 a4 20 ed fd a6 3e a5 3f
90b0: 20 41 f9 20 8e fd 4c 95 d9 00 00 00 00 00 00 00

Antoine Vignau

Jul 9, 2017, 5:25:29 PM

Yeah, you got it right! Thank you, I am so honored :-)
Antoine

Anthony Lawther

Jul 9, 2017, 5:59:41 PM

Michael,

I grit my teeth in silence at your use of then instead of than when used
for comparison, but I can't let this slide: there are multiple definitions
of aught but not one of them means the same as ought.

barrym95838

Jul 9, 2017, 6:01:39 PM

On Sunday, July 9, 2017 at 1:55:33 PM UTC-7, gid...@sasktel.net wrote:
> Personally I am confused by the whole thing when since ROM routines are being used:
>
> 300:A2 34 A9 12 4C 24 ED
>
> 300:LDX #$34
> LDA #$12
> JMP $ED24
>
> does exactly the same thing in only 3 bytes (+4 for demo)

I can't speak for the others, but I suffer from a very specific
form of mental illness, and even though I'm not very good at it,
6502 code golfing temporarily relieves some of my symptoms.

I offer the following compromise, in 38 bytes, using only COUT
and two bytes of zero-page. It should be easily adaptable to
any 65xx-based system which doesn't have that fat hog $ED24 or
any cool undocumented monitor entry points. It can be easily
adapted to print in bases 2, 4, 6, and 8 as well, and any even
numbered base from 12 to 36 with just a few more bytes of code.

300:86 A0 85 A1 A9 00 48 A9 00 B8 A0 10 C9 05 90 03
310:E9 85 38 26 A0 26 A1 2A 88 D0 F1 09 B0 70 E7 20
320:ED FD 68 D0 FA 60

Mike B.

Michael 'AppleWin Debugger Dev'

Jul 9, 2017, 7:59:40 PM

On Sunday, July 9, 2017 at 2:59:41 PM UTC-7, Anthony Lawther wrote:
> > "64KB aught to be enough for anybody" -- Antoine
>

> Michael,
> I grit my teeth in silence at your use of then instead of than when used
> for comparison,

I don't see any use of "then" or "than" in any of my posts in _this_ thread ... but if I have buggered them up in _other_ threads then yup, guilty as charged -- I can never seem to remember which is which -- even though I keep looking at the Oatmeal's cheatsheet from time to time.
http://theoatmeal.com/comics/misspelling

> there are multiple definitions
> of aught but not one of them means the same as ought.

Ah, TIL'd about ought vs aught. I always assumed they meant the same thing. Mea culpa. :-/

Bloody English and its billion homonyms already. :-)

qkumba

Jul 9, 2017, 10:24:28 PM

> Personally I am confused by the whole thing when since ROM routines are being used:
>
> 300:A2 34 A9 12 4C 24 ED
>
> 300:LDX #$34
> LDA #$12
> JMP $ED24
>
> does exactly the same thing in only 3 bytes (+4 for demo)

Except when it doesn't.
You are assuming that Applesoft is present. We are not.

Michael 'AppleWin Debugger Dev'

Jul 10, 2017, 12:06:12 AM

On Sunday, July 9, 2017 at 1:55:33 PM UTC-7, gid...@sasktel.net wrote:

> Personally I am confused by the whole thing when since ROM routines are being used:
>
> 300:A2 34 A9 12 4C 24 ED
>
> 300:LDX #$34
> LDA #$12
> JMP $ED24
>
> does exactly the same thing in only 3 bytes (+4 for demo)

Except ...

1. It requires Applesoft to have been initialized,
2. Which in turn stomps over memory all over the place,
3. Ergo its memory usage is non-deterministic
4. Calling Applesoft COLD start or WARM start @ $E000 or $E003 respectively _doesn't_ return to the caller -- HOW does one initialize Applesoft properly and yet allow LINPRT to be called safely?
5. How could you call this in a boot sector?
6. It's D-O-G slow, and inefficient.

Digit peeling doesn't have any of these disadvantages and "just works."
The goal here is to minimize coupling and maximize locality.

It is always good to have different, orthogonal tools in the proverbial toolbox.

For a different problem LINPRT might make more sense.

gid...@sasktel.net

Jul 10, 2017, 1:52:29 AM

Applesoft does not have to be initialized, just the I/O vectors ($36.39) set up to use $ED24, which answers questions 1, 4 and 5.

As far as speed goes, I am trying to find a use for this highly efficient tool that requires a lot of converting hex digits and printing them to the screen in decimal format where speed would be a factor.

What are some general use scenarios that could be used with this utility?

Michael J. Mahon

Jul 10, 2017, 3:06:08 AM

Michael 'AppleWin Debugger Dev' <michael....@gmail.com> wrote:

The way you retain control after jumping to the Applesoft cold-start
address is to set the COUT vector to branch to your return point.

When Applesoft completes its initialization, it tries to print a "]"
prompt, and that's where you grab control back.

--
-michael - NadaNet 3.1 and AppleCrate II: http://michaeljmahon.com

Anthony Lawther

Jul 10, 2017, 9:14:45 AM

Michael 'AppleWin Debugger Dev' <michael....@gmail.com> wrote:

I'm a self confessed spelling and grammar nerd, but I'm often reminded that
I'm not supposed to hold others to the same high standard.

I'm glad I helped you learn today, and you didn't take offense.

Now back to 6502 assembly discussions :-)

Harry Potter

Jul 10, 2017, 10:54:03 AM

On Sunday, July 9, 2017 at 2:04:26 AM UTC-4, John Brooks wrote:
> So now it's 33 bytes (+4 for demo):
>
> 300:A2 12 A9 34 20 4A FF A9 00 B8 A2 10 C9 05 90 03
> 310:E9 85 38 26 45 26 46 2A CA D0 F1 48 A9 FD 48 A9 E1 48 70 E3 60
>

Uhh...may I use your code in my CBM/Apple2SimpleIO libraries? I tested it out and got good results. If I may, how would you like to be credited?

James Davis

Jul 10, 2017, 3:30:36 PM

On Monday, July 10, 2017 at 6:14:45 AM UTC-7, Anthony Lawther wrote:
> I'm a self confessed spelling and grammar nerd, ....

Ditto that! So am I.

For all those who have trouble remembering whether to use 'then' or 'than,' think: "'then' it will happen, rather 'than,' 'then' it will not happen." And: "this rather 'than' that." In other words, equate 'then' with something 'happening' and 'than' with 'rather than' (comparing two things).

Definitions:

Then:

1. then - adv
1 : at that time
2 : soon after that : next
3 : in addition : besides
4 : in that case
5 : consequently

2. then - n : that time <since ~>

3. then - adj : existing or acting at that time
<the then attorney general>

Than:

1. than - conj
1 — used after a comparative adjective or adverb to introduce the second part of a comparison expressing inequality <older than I am>
2 — used after other or a word of similar meaning to express a difference of kind, manner, or identity <adults other than parents>

2. than - prep : in comparison with <older than me>

(c)2000 Zane Publishing, Inc. and Merriam-Webster, Incorporated. All rights reserved

Michael 'AppleWin Debugger Dev'

Jul 10, 2017, 4:55:16 PM

On Monday, July 10, 2017 at 12:30:36 PM UTC-7, James Davis wrote:
> On Monday, July 10, 2017 at 6:14:45 AM UTC-7, Anthony Lawther wrote:
> > I'm a self confessed spelling and grammar nerd, ....
>
> Ditto that! So am I.

TL;DR:

then - compare time
than - compare things

James Davis

Jul 10, 2017, 6:32:53 PM

OK, Michael. So now that you really understand it, you won't make that mistake again, right?

John Brooks

Jul 10, 2017, 8:29:14 PM

Yes, it is a public release with no restrictions. There are multiple authors, so credit Mike, qkumba, and me, or credit this CSA2 thread "6502 Print U16 as decimal" so interested readers can learn more.

-JB
@JBrooksBSI

barrym95838

Jul 10, 2017, 8:43:14 PM

On Monday, July 10, 2017 at 5:29:14 PM UTC-7, John Brooks wrote:
> Yes, it is a public release with no restrictions. There are multiple
> authors, so credit Mike, qkumba, and me, or credit this CSA2 thread
> "6502 Print U16 as decimal" so interested readers can learn more.
>
> -JB
> @JBrooksBSI

I second that.

Mike B.

Michael 'AppleWin Debugger Dev'

Jul 10, 2017, 10:46:40 PM

On Monday, July 10, 2017 at 3:32:53 PM UTC-7, James Davis wrote:
> OK, Michael. So now that you really understand it, you won't make that mistake again, right?

Rather than make promises that I have no hope of living up to I will remain silent then to give false impressions that I am cured. :-)

And yes, it is ironic if I made any above. :-)

Harry Potter

Jul 11, 2017, 9:29:17 AM

On Monday, July 10, 2017 at 8:29:14 PM UTC-4, John Brooks wrote:
> Yes, it is a public release with no restrictions. There are multiple authors, so credit Mike, qkumba, and me, or credit this CSA2 thread "6502 Print U16 as decimal" so interested readers can learn more.
>

Done! :)

Harry Potter

Jul 11, 2017, 10:02:42 AM

Does anybody here want me to post *my* version of the U16Print routine here? :)

Michael 'AppleWin Debugger Dev'

Jul 11, 2017, 2:30:04 PM

On Tuesday, July 11, 2017 at 7:02:42 AM UTC-7, Harry Potter wrote:
> Does anybody here want me to post *my* version of the U16Print routine here? :)

No.

barrym95838

Jul 13, 2017, 11:23:46 AM

On Sunday, July 9, 2017 at 1:13:44 PM UTC-7, vi...@pianoman.cluster.toy wrote:
> If anyone wants to continue optimizing 6502 code for size, I should
> point out my "ll_asm" code density project.
>
> http://www.deater.net/weave/vmwprod/asm/ll/ll.html
>
> where I optimize the same program for size in assembly language on 30+
> architectures. The code most of interest is LZSS decompression, but
> integer printing and strcat() are involved as well.
>

> ...

Thanks for reminding me, Vince. There was a murmur about you over on
6502.org about a year or so ago, and I started to do a rewrite. I was
making decent progress, but multiple distractions caused me to push it
to the back burner, and it finally fell off the back of my stove. I
will try to retrieve it and finish, but the distractions keep piling
up at an alarming rate, so I can't even begin to guess about an ETA.

http://forum.6502.org/viewtopic.php?f=1&t=3044&hilit=weaver

Mike B.

John Brooks

Jul 13, 2017, 1:08:34 PM

Hi Vince. I took a quick look at the linux logo project and there's a lot of room to improve it, as Mike mentioned.

I see that the current smallest exe is 68k. Around 1988, while working at Datasoft, we were creating the same game on 6502 and 68k computers and ran friendly competitions among the assembly programs to see which version would have the smallest code.

We found that for game logic (if/else heavy) and ascii/byte operations, the 6502 code would be 0.75x to 0.5x the size of 68k code.

For parts of the games which required 16 bit or higher math or ptrs, the 6502 would be 1.5x to 4x larger than 68k.

Anyway, I'm pretty sure the 6502 version of ll could be much smaller than the current 68k version, if coded 'the 6502 way'.

I'm pretty sure the ascii art of the logo and the 6502 code to unpack and print it as text on the Apple II would fit in about 256 bytes if the compression was switched to RLE and an efficient 6502 implementation was used.

If I get time I'll make an example as these refactors are easier to see than describe.

-JB

Michael 'AppleWin Debugger Dev'

Jul 14, 2017, 12:12:41 PM

On Thursday, July 13, 2017 at 10:08:34 AM UTC-7, John Brooks wrote:
> Hi Vince. I took a quick look at the linux logo project and there's a lot of room to improve it, as Mike mentioned.

That's a bit of an understatement. :-)

The whole HGR y to address calculation takes up $40 bytes which can be simplified down to $1B bytes.

File: no-os/6502_apple/ll_6502.s
Func: y_to_addr
Lines: 585-642

Replace with Woz's code:
ASL ;A--BCDEFGH0
TAX ;TAX...TXA could be TAY...TYA
AND #$F0 ;A--BCDE0000
BPL _1 ;B=0
ORA #$05 ;A--BCDE0B0B
_1 BCC _2 ;A=0
ORA #$0A ;A--BCDEABAB
_2 ASL ;B--CDEABAB0
ASL ;C--DEABAB00
STA YADDRL
TXA ;C--BCDEFGH0
AND #$0E ;C--0000FGH0
ADC #$10 ;0--00xxFGHC ; HPAG2 = $10 for base $2000, $20 for base $4000
ASL YADDRL ;D--00xxFGHC GBASL=EABAB000
ROL ;0--0xxFGHCD
STA YADDRH
RTS

= How it works =

The pseudocode to convert a y coordinate to an HGR address is:

return 0x2000 + (y&7)*0x400 + ((y/8)&7)*0x80 + (y/64)*0x28;

That is, given:

y = 0 ... 191

Or in binary

y = abcdefgh

Determine the start of the HGR scanline address it corresponds to.

We can break the terms of the address calculation into 3 parts:

(y/64)*0x28
y = abcdefgh
y/64 = 000000ab
y>>6 = 000000ab
ab * $28 = 0aba b000
00 = $00 = 0000_0000
01 = $28 = 0010_1000
10 = $50 = 0101_0000
11 = $78 = 0111_1000

(y%8)*0x400
y = abcdefgh
y%8 = 00000fgh
y&7 = 00000fgh
fgh * $400 = 000f gh00 00000000
000 = $0000 = 0000_0000 00000000
001 = $0400 = 0000_0100 00000000
010 = $0800 = 0000_1000 00000000
011 = $0C00 = 0000_1100 00000000
100 = $1000 = 0001_0000 00000000
101 = $1400 = 0001_0100 00000000
110 = $1800 = 0001_1000 00000000
111 = $1C00 = 0001_1100 00000000

((y/8)&7)*0x80;
y = abcdefgh
y/8 = 000abcde
&7 = 111
= 00000cde
cde * $80 = cd_e000_0000
000 = $000 = 00_0000_0000
001 = $080 = 00_1000_0000
010 = $100 = 01_0000_0000
011 = $180 = 01_1000_0000
100 = $200 = 10_0000_0000
101 = $280 = 10_1000_0000
110 = $300 = 11_0000_0000
111 = $380 = 11_1000_0000

Our address is of the form:
addr = 0000_0000 0aba b000 ((y/64) )*0x028
addr = 000f gh00 0000_0000 ((y%8) )*0x400
addr = 0000 00cd e000 0000 ((y/8)&7)*0x080
addr = 000f ghcd eaba b000
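
For anyone who wants to check the bit shuffling without an emulator, here is a rough Python sketch. It assumes nothing beyond the formula and the listing above: `hgr_addr_formula` is the pseudocode as written, and `hgr_addr_bits` walks the instructions one at a time with `a` as the accumulator and `c` as the carry. The function names are mine, not from any repository.

```python
def hgr_addr_formula(y, base=0x2000):
    """Direct form of the pseudocode: three terms added to the page base."""
    return base + (y & 7) * 0x400 + ((y // 8) & 7) * 0x80 + (y // 64) * 0x28


def hgr_addr_bits(y, hpag=0x10):
    """Follow the listing step by step; `a` is the accumulator, `c` the carry."""
    c, a = y >> 7, (y << 1) & 0xFF      # ASL
    x = a                               # TAX
    a &= 0xF0                           # AND #$F0
    if a & 0x80:                        # BPL skips the ORA when bit 7 is clear
        a |= 0x05                       # ORA #$05
    if c:                               # BCC skips the ORA when carry is clear
        a |= 0x0A                       # ORA #$0A
    c, a = a >> 7, (a << 1) & 0xFF      # ASL
    c, a = a >> 7, (a << 1) & 0xFF      # ASL
    lo = a                              # STA YADDRL
    a = x & 0x0E                        # TXA / AND #$0E
    total = a + hpag + c                # ADC #$10 (the carry is added as well)
    c, a = total >> 8, total & 0xFF
    c, lo = lo >> 7, (lo << 1) & 0xFF   # ASL YADDRL
    a = ((a << 1) | c) & 0xFF           # ROL
    return (a << 8) | lo                # STA YADDRH


for y in (0, 1, 8, 64, 191):
    print(y, hex(hgr_addr_formula(y)), hex(hgr_addr_bits(y)))
```

Both functions should agree for every y in 0..191; passing `hpag=0x20` to the second one (and `base=0x4000` to the first) covers HGR page 2.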

Reference:
https://github.com/AppleWin/AppleWin/wiki

Michael 'AppleWin Debugger Dev'

Jul 14, 2017, 12:17:49 PM

Also, the old code, y_to_addr:, requires Y to be zero prior to calling due to the STY:

;==================================================
; y_to_addr - convert y value to address in mem
;==================================================
; this is needlessly complicated. Blame Steve Wozniak
; apparently it was a clever hack to avoid the need
; for dedicated memory refresh circuitry
...
sty YADDRL

Michael J. Mahon

Jul 14, 2017, 12:39:33 PM

Michael 'AppleWin Debugger Dev' <michael....@gmail.com> wrote:

It's a lot more than refresh circuitry--he also avoided all DRAM refresh
interference with the processor, which is crucial in permitting
deterministic processor timing.

sandraal...@gmail.com

Jul 14, 2017, 3:21:26 PM

On Thursday, July 13, 2017 at 10:08:34 AM UTC-7, John Brooks wrote:

To provide a "fair" comparison, your "refactoring" should use the same LZSS input as the versions for the other CPUs.

John Brooks

Jul 14, 2017, 4:57:53 PM

I would if LZSS was an efficient codec for this use-case, but unfortunately it is not. The ascii art logo is short-length-RLE in nature which is ill-suited to LZSS's design of handling large files of english text with repeating words and phrases.

My philosophy is rather than do garbage-in, garbage-out, solve the problem the best way possible and then let the other platforms see how much they can improve to match or beat the 'optimal' version.

-JB
@JBrooksBSI

Michael J. Mahon

Jul 14, 2017, 6:41:03 PM

Exactly.

When doing cross-platform, cross-architecture comparisons, only the
function need remain constant. The best algorithms and data
representations are architecture-specific (unless the external data
representation is a part of the specification, e.g.: GIF).

Michael 'AppleWin Debugger Dev'

Jul 17, 2017, 4:26:41 AM

On Friday, July 14, 2017 at 1:57:53 PM UTC-7, John Brooks wrote:
> I would if LZSS was an efficient codec for this use-case, but unfortunately it is not. The ascii art logo is short-length-RLE in nature which is ill-suited to LZSS's design of handling large files of english text with repeating words and phrases.

Indeed. LZSS is horribly inefficient for this data set.

I replaced the bloated LZSS + data (283 bytes) with simple 2-bit per character data (70 chars * 12 rows) = 210 bytes packed.

For decompression unpack 2 bits to 4 bits (2 pixels)
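
The size arithmetic can be checked with a small Python sketch of 2-bit packing. It assumes the logo needs at most four distinct symbols (which is what 2 bits per character implies). The bit order (first symbol in the low bits) and the placeholder data are my own choices, not necessarily what linuxlogo.s does.

```python
def pack_2bit(symbols):
    """Pack values 0..3 four to a byte, first symbol in the low bits."""
    out = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for shift, sym in enumerate(symbols[i:i + 4]):
            byte |= (sym & 3) << (2 * shift)
        out.append(byte)
    return bytes(out)


def unpack_2bit(packed, count):
    """Inverse of pack_2bit; `count` trims any padding in the last byte."""
    symbols = []
    for byte in packed:
        for shift in range(4):
            symbols.append((byte >> (2 * shift)) & 3)
    return symbols[:count]


# Placeholder 70x12 grid of symbol indices (hypothetical, not the real logo).
logo = [(row + col) % 4 for row in range(12) for col in range(70)]
packed = pack_2bit(logo)
print(len(logo), "symbols ->", len(packed), "bytes")
```

With 70 * 12 = 840 symbols at four per byte, the packed length comes out to the 210 bytes quoted above.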

I'm pretty sure the decompression could be optimized. Maybe if qkumba is bored he can take a look. :-)
https://github.com/Michaelangel007/6502_linux_logo/blob/master/linuxlogo.s#L233-L334

David Schmenk

Jul 17, 2017, 10:06:09 AM

This was the same approach I took for the Apple 1 30th birthday slideshow: 10 40x23 slides for hours of entertainment in only 3.5 K. No doubt it could have been smaller, but you can't beat the simplicity.

qkumba

Jul 17, 2017, 12:59:14 PM

> I'm pretty sure the decompression could be optimized. Maybe if qkumba is bored he can take a look. :-)

Without focusing on the algorithm in Unpack2Bits, here are some quick suggestions:

lda zUnpackBits
and #3 ; A=000000ba
pha
asl ; A=00000ba0
asl ; A=0000ba00
sta zMask
pla
ora zMask ; A=0000baba

->

lda zUnpackBits
asl ; A=00000ba0
asl ; A=0000ba00
eor zUnpackBits ;A=0000xxyy
and #$0c ; A=0000xx00
eor zUnpackBits ;A=0000baba

--

NoShiftSherlock
cmp #$80 ; msb of byte0 set?
rol zMask ; shift in to lsb of byte1

ldx zDstShift ; x={0,1,2} + 4 < 7
cpx #3 ; all bits fit into dest byte?

ldx zSaveX
ora #$80

->

NoShiftSherlock
asl ; msb of byte0 set?
rol zMask ; shift in to lsb of byte1

sec
ror

ldx zDstShift ; x={0,1,2} + 4 < 7
cpx #3 ; all bits fit into dest byte?

ldx zSaveX

--

Draw8Rows
ldy #0
CopyScanLine
lda UnpackAddr,Y
sta (zHgrPtr),Y

cpx #0 ; Clear source on last scanline copy
bne CopyNextByte
txa
sta UnpackAddr,Y
CopyNextByte
iny
cpy #40 ; 280/7 = 40 bytes/scanline
bne CopyScanLine

->

Draw8Rows
ldy #39
CopyScanLine
lda UnpackAddr,Y
sta (zHgrPtr),Y

txa ; Clear source on last scanline copy
bne CopyNextByte
sta UnpackAddr,Y
CopyNextByte
dey ; 280/7 = 40 bytes/scanline
bpl CopyScanLine

--

dex
bpl Draw8Rows

ldy zSaveY

lda zCursorY
cmp #$14 ; Y=$40 .. $A0, Rows $8..$13 (inclusive)
bcs OuputDone

ldx #0
stx zSaveX
beq LineNotDone

->

stx zSaveX
dex
bpl Draw8Rows

ldy zSaveY

lda zCursorY
cmp #$14 ; Y=$40 .. $A0, Rows $8..$13 (inclusive)
bcs OuputDone

bcc LineNotDone

--

LineNotDone
stx zDstShift
;;clc ;clear already in all paths

vi...@pianoman.cluster.toy

Jul 17, 2017, 5:14:10 PM

On 2017-07-14, John Brooks <jbr...@blueshiftinc.com> wrote:

> I would if LZSS was an efficient codec for this use-case, but
> unfortunately it is not. The ascii art logo is short-length-RLE in
> nature which is ill-suited to LZSS's design of handling large files of
> english text with repeating words and phrases.

Interesting. When I originally started on this project 10+ years ago,
I used RLE but it turned out that LZSS won by a large margin, at least
on x86. The decompression code was smaller for RLE, but the
LZSS-compressed input data was much smaller.

Maybe that doesn't hold on 6502. This is code I wrote a while ago and
my 6502 assembly skills have never been that great, things fall apart
anytime I need to deal with values wider than 8-bits.

Although the current top size decrease is amazing, the purpose of
the project long ago has morphed from "how small can I print an ascii
art logo" to "how small can I run lzss, followed by some text parsing
of a file read from disk".

Vince

Michael 'AppleWin Debugger Dev'

Jul 17, 2017, 5:17:25 PM

On Monday, July 17, 2017 at 9:59:14 AM UTC-7, qkumba wrote:
> > I'm pretty sure the decompression could be optimized. Maybe if qkumba is bored he can take a look. :-)
>
> Without focusing on the algorithm in Unpack2Bits, here are some quick suggestions:
> lda zUnpackBits

> asl ; A=00000ba0
> asl ; A=0000ba00
> eor zUnpackBits ;A=0000xxyy
> and #$0c ; A=0000xx00
> eor zUnpackBits ;A=0000baba

Sorry, my comments were incomplete!
This needs a trailing AND #$F
I've updated the annotation of A reg

lda zUnpackBits
asl ; A=?????ba0
asl ; A=????ba00
eor zUnpackBits ; A=????xxyy
and #%00001100 ; A=????xx00
eor zUnpackBits ; A=????baba
and #%00001111 ; A=0000baba
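
A quick way to convince yourself that the EOR/AND/EOR merge really duplicates the low two bits is to run it over all 256 input values. This Python sketch mirrors the sequence above on an 8-bit value; the function names are just for illustration.

```python
def dup_low_two_bits(value):
    """Emulate the ASL/ASL/EOR/AND/EOR/AND sequence on an 8-bit accumulator."""
    a = (value << 2) & 0xFF   # ASL, ASL
    a ^= value                # EOR zUnpackBits
    a &= 0b00001100           # AND #%00001100
    a ^= value                # EOR zUnpackBits
    a &= 0b00001111           # AND #%00001111
    return a


def dup_reference(value):
    """The plain way: isolate ba, then OR it with a copy shifted left by 2."""
    low = value & 3
    return low | (low << 2)


mismatches = [v for v in range(256) if dup_low_two_bits(v) != dup_reference(v)]
print("mismatches:", mismatches)
```

The first EOR/AND pair keeps only the bits where the shifted copy and the original differ in positions 2 and 3, so the second EOR replaces exactly those two bits and leaves the rest of the original byte alone.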

> --
> NoShiftSherlock

Nice compact way to set the MSB :-)
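
For the curious, a rough Python model of why ASL / ROL zMask / SEC / ROR can stand in for CMP #$80 / ROL zMask / ORA #$80. It only tracks A, zMask and the carry, and ignores the LDX/CPX instructions in between; the function names are mine.

```python
def old_sequence(a, mask):
    """CMP #$80 / ROL zMask / ORA #$80."""
    carry = 1 if a >= 0x80 else 0               # CMP #$80 sets carry when A >= $80
    mask = ((mask << 1) | carry) & 0xFF         # ROL zMask
    return a | 0x80, mask                       # ORA #$80


def new_sequence(a, mask):
    """ASL / ROL zMask / SEC / ROR."""
    carry, a = a >> 7, (a << 1) & 0xFF          # ASL moves the msb into carry
    mask = ((mask << 1) | carry) & 0xFF         # ROL zMask
    carry = 1                                   # SEC
    carry, a = a & 1, (carry << 7) | (a >> 1)   # ROR rotates the set carry into bit 7
    return a, mask, carry


same = all(
    new_sequence(a, m)[:2] == old_sequence(a, m)
    for a in range(256)
    for m in range(256)
)
print("equivalent:", same)
```

The ROR also leaves the carry clear, because the bit it rotates out is the zero that the ASL shifted in.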

> --
>
> Draw8Rows

Ah, yes, count down instead of up to save a redundant CPY

> --

> stx zSaveX
> dex
> bpl Draw8Rows

Sweet trick of setting X=0 on last iteration.
Definitely going to have to remember that one.

> --
>
> LineNotDone
> stx zDstShift

We need to restore X, so we need:

dex
bpl Draw8Rows
inx
:
ldx zSaveX

> ;;clc ;clear already in all paths

Nice eyes!

Down to 696 bytes -- again ;-)
(I added a pretty-print of the models.)

Michael 'AppleWin Debugger Dev'

Jul 17, 2017, 5:32:56 PM

On Monday, July 17, 2017 at 2:14:10 PM UTC-7, vi...@pianoman.cluster.toy wrote:
> Although the current top size decrease is amazing, the purpose of
> the project long ago has morphed from "how small can I print an ascii
> art logo" to "how small can I run lzss, followed by some text parsing
> of a file read from disk".
>
> Vince

1. Since we're using compression / packing anyway, is it "cheating" if we store packed 2-bit characters instead of the ANSI text? :-)

As Mike Acton would say in his 3 Big Lies -- Lie #1 Software is the Platform.
https://youtu.be/rX0ItVEVjHc?t=1041

No, the *Platform* IS the problem. The problem is:
"How do we store in an efficient manner and display the Linux Logo on an Apple 2?"

The first issue we run into is that there is no 80-column mode on the Apple ][ and ][+.

The second issue is: how do we store the HGR data efficiently?

This is a different problem from an x86 where no one cares if we waste 80*12 = 960 bytes. Storing the data in an optimal format per platform is a valid solution. Why waste time+space storing it compressed in ASCII when we will only show in HGR mode on the Apple 2??

2. I added 48K / 64K detection. Feel free to borrow.

i.e.
detect_langcard
sta RAMIN ; Detect 16K RAM / Language Card
sta RAMIN ; Read RAM

lda $D000
eor #$FF
sta $D000
cmp $D000
bne apple_ii_48K
eor #$FF
sta $D000

RAM_64K
ldx #"6"
ldy #"4"
bne RAM_size
apple_ii_48K
ldx #"4"
ldy #"8"
RAM_size

done_detecting
sta ROMIN ; Turn off Language Card
sta ROMIN ; if it was probed
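
The probe is the usual write-and-read-back test. Here is a minimal Python sketch of the idea, using a hypothetical `Bank` class to stand in for the byte at $D000 (RAM keeps writes, ROM ignores them). It does not model the RAMIN/ROMIN soft switches at all.

```python
class Bank:
    """Hypothetical stand-in for $D000: RAM keeps writes, ROM ignores them."""

    def __init__(self, value, writable):
        self.value = value
        self.writable = writable

    def read(self):
        return self.value

    def write(self, value):
        if self.writable:
            self.value = value & 0xFF


def detect_ram(bank):
    """Return "64" if the location behaves like RAM, else "48"."""
    flipped = bank.read() ^ 0xFF      # LDA $D000 / EOR #$FF
    bank.write(flipped)               # STA $D000
    if bank.read() != flipped:        # CMP $D000 / BNE apple_ii_48K
        return "48"
    bank.write(flipped ^ 0xFF)        # EOR #$FF / STA $D000 restores the byte
    return "64"


print(detect_ram(Bank(0x4C, writable=True)), detect_ram(Bank(0x4C, writable=False)))
```

Flipping every bit guarantees the written value differs from whatever was there, and the second EOR puts the original byte back so the probe is non-destructive.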

3. I dropped the disk read because on x86 we could simply call INT 13h to read a Track/Sector which isn't exactly "fair."
https://en.wikipedia.org/wiki/INT_13H

It wouldn't be _that_ hard to re-use the P5 firmware $C600 code to read a Sector of data for the Apple 2 version.

Michael J. Mahon

Jul 17, 2017, 6:15:22 PM

Of course, the compression ratio does not depend on the processor, but only
on the input data and the compression algorithm.

I suspect that the statistics of the input data have changed significantly
from when you did your early experiments.

There is no such thing as a universally "best" compression algorithm. As
the input changes, the best compression algorithm changes.
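
A small Python sketch of that point, under loud assumptions: the RLE format here is a naive (count, value) pair scheme of my own, and zlib's DEFLATE merely stands in for "an LZ-family codec". It is neither the LZSS from ll nor LZ4, and the two sample inputs are made up.

```python
import zlib


def rle_encode(data):
    """Byte-oriented RLE: (run length 1..255, value) pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


runs = b"#" * 70                                            # one row of identical characters
text = b"the quick brown fox jumps over the lazy dog " * 20  # repeating phrases

for name, data in (("runs", runs), ("text", text)):
    print(name, len(data), len(rle_encode(data)), len(zlib.compress(data)))
```

On the run-heavy row the naive RLE wins easily, while on the repeated-phrase text it doubles the size and the LZ-style codec wins, which is the sense in which the "best" algorithm follows the input.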

gid...@sasktel.net

Jul 18, 2017, 1:22:21 AM

> There is no such thing as a universally "best" compression algorithm. As
> the input changes, the best compression algorithm changes.

Is it not also possible that the "best compression algorithm" can also change according to the input data size?

i.e. when using LZ4 to compress text, I found it was not worth compressing text if the input size was 10 blocks or less, but at 20 blocks I was getting about 40% compression, at 30 blocks was averaging over 50%, and one 60 block text file got close to 75% compression.

So the larger the file the better the compression, whereas at smaller sizes, another compressor would be better.

Since I started playing with LZ4, it has become pretty close to a universal compressor for me.

I have tried it on system files, hi-res and dbl hi-res graphics, and text files, all with some pretty good results.

Michael 'AppleWin Debugger Dev'

Jul 18, 2017, 4:40:12 PM

On Monday, July 17, 2017 at 9:59:14 AM UTC-7, qkumba wrote:

> Without focusing on the algorithm in Unpack2Bits, here are some quick suggestions:

Thanks for all the help Peter!

I'm constantly impressed and amazed at your ability to squeeze out a few more bytes. I guess that's one for the CV's. Hobbies: 6502 optimization / minification. :-)

Down to 681 bytes after juggling the X and Y regs in Unpack2Bits. I'm very happy with the size. Since I don't see any more obvious minification / code golf opportunities (the few I tried didn't pan out) and need to get to my other projects this is "good enough."

Michael

gid...@sasktel.net

Jul 18, 2017, 4:59:32 PM

On Monday, July 17, 2017 at 11:22:21 PM UTC-6, gid...@sasktel.net wrote:
> > There is no such thing as a universally "best" compression algorithm. As
> > the input changes, the best compression algorithm changes.

Never mind, just re-read that and it had a different meaning when I read it the first time. :)

qkumba

Jul 19, 2017, 11:46:39 AM

> Thanks for all the help Peter!

you're welcome.

> this is "good enough."

In case you return to it:

pha
cmp #$38 ;
bne apple_iiplus

apple_ii
pla
jsr IB_HGR ; HGR on ][
beq apple_ii_normal ; always, ends with BNE $D01B RTS

apple_iiplus

->

cmp #$38 ;
bne apple_iiplus

apple_ii
jsr IB_HGR ; HGR on ][
beq apple_ii_normal ; always, ends with BNE $D01B RTS

apple_iiplus
pha

--

RAM_64K
;; ldx #"6" ;x is already "6"

--

lda #0
tay
tax ; SrcShift=0

->

ldx #0
ldy #0
A is destroyed right after the STAs anyway.

--

ldy zDstShift ; which 280 px column is next pixel writing to?
beq NoShiftSherlock
MakeShiftMask
asl
rol zMask
dey
bne MakeShiftMask

NoShiftSherlock

asl ; msb of byte0 set?
rol zMask ; shift in to lsb of byte1

->

ldy zDstShift ; which 280 px column is next pixel writing to?
MakeShiftMask

asl ; msb of byte0 set?
rol zMask ; shift in to lsb of byte1

dey
bpl MakeShiftMask

--

lda zDstShift ; x={0,1,2} + 4 < 7
;; clc ;carry is cleared by ror after asl above
adc #4
cmp #7 ; all bits fit into dest byte?
bcc FitSameByte

Michael 'AppleWin Debugger Dev'

Jul 19, 2017, 1:15:32 PM

On Wednesday, July 19, 2017 at 8:46:39 AM UTC-7, qkumba wrote:
> In case you return to it:

:-)

> cmp #$38 ;
> bne apple_iiplus
> apple_ii
> jsr IB_HGR ; HGR on ][
> beq apple_ii_normal ; always, ends with BNE $D01B RTS
>
> apple_iiplus
> pha

Ah, yes, no point doing PHA before it is needed.

> --
>
> RAM_64K
> ;; ldx #"6" ;x is already "6"

NAK.

Code path for //e (original) is:

apple_iiplus
apple_iie
- branches back up to RAM_64K.

I tried moving the LDX #6 above but the IB_HGR and AS_HGR trash X -- need to see if Y is available ...

> --

> ldx #0
> ldy #0
> A is destroyed right after the STAs anyway.

Same size -- but ooooh, the LDX #0 can be completely removed now since it is set in NextSrcShift !

> ->
>
> ldy zDstShift ; which 280 px column is next pixel writing to?
> MakeShiftMask
> asl ; msb of byte0 set?
> rol zMask ; shift in to lsb of byte1
> dey
> bpl MakeShiftMask

Ah, yes! That extra ASL ROL was bugging me. Don't know why I didn't think of the BPL for an extra loop iteration.
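
To spell out the loop-count argument, here is a rough Python model of both versions. It treats A and zMask as a 16-bit shift register and assumes Y starts small (zDstShift is a pixel shift, so well under 128); the function names are made up for the sketch.

```python
def shift_old(a, mask, y):
    """BEQ guard, Y-count loop, then one extra ASL/ROL after the loop."""
    while y != 0:                                  # BEQ skips the loop when Y is 0
        a, mask = (a << 1) & 0xFF, ((mask << 1) | (a >> 7)) & 0xFF
        y -= 1                                     # DEY / BNE
    a, mask = (a << 1) & 0xFF, ((mask << 1) | (a >> 7)) & 0xFF
    return a, mask


def shift_new(a, mask, y):
    """Single loop; BPL keeps looping until Y goes negative, so it runs Y+1 times."""
    while True:
        a, mask = (a << 1) & 0xFF, ((mask << 1) | (a >> 7)) & 0xFF
        y -= 1                                     # DEY
        if y < 0:                                  # BPL falls through once Y is negative
            break
    return a, mask


same = all(
    shift_old(a, m, y) == shift_new(a, m, y)
    for a in range(256)
    for m in (0x00, 0x55, 0xFF)
    for y in range(7)
)
print("equivalent:", same)
```

Both versions shift Y+1 times in total; the BPL form just folds the trailing ASL/ROL into the loop body and drops the BEQ.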

> --
> lda zDstShift ; x={0,1,2} + 4 < 7
> ;; clc ;carry is cleared by ror after asl above

Nice eyes!

667 bytes now.