This is kind of an unForthy thing I'm about to do, so I'm seeking opinions and comments. I'm squeezing things down, making things faster, and playing code golf all over my 6502 retro Forth, PETTIL:
https://github.com/chitselb/pettil
From the beginning I integrated Sweet16 into the code, because it has great code density vs. 6502 assembler when dealing with 16-bit addresses and such. The idea is that the programmer can intertwingle Forth, 6502, and Sweet16 code fluidly, choosing the best approach for each job. On a 6502, zero page is prime real estate, and here are 32 bytes of Sweet16 registers that are otherwise unused. How can I get more value from that gorgeous page zero real estate? Maybe use it as local storage in primitives, and maybe don't even bother putting things back on the stack when I'm in the middle of doing that? Maybe. Words that work on the dictionary seem like good candidates to use this area for communication. David Schmenk's PLASMA describes the benefits of this:
https://github.com/dschmenk/PLASMA#a-new-approach
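To make the idea a bit more concrete, here is a rough Python model (not PETTIL source) of two primitives handing a value to each other through a Sweet16 register instead of the data stack. The register addresses follow the zero page map in this post (R0 at $00, two bytes per register, low byte first). Everything else is an assumption for illustration: the helper names `reg_addr`, `poke16`, and `peek16`, the toy primitives `find_word` and `word_address`, the choice of R5 as the hand-off register, and the made-up dictionary addresses.

```python
# Toy model of 6502 zero page. Illustration only, NOT PETTIL source.
zp = bytearray(256)

def reg_addr(n):
    """Zero-page address of Sweet16 register Rn (R0 at $00, two bytes each)."""
    if not 0 <= n <= 15:
        raise ValueError("Sweet16 has registers R0..R15")
    return 2 * n

def poke16(addr, value):
    """Store a 16-bit value low byte first, as the 6502 does."""
    zp[addr] = value & 0xFF
    zp[addr + 1] = (value >> 8) & 0xFF

def peek16(addr):
    """Fetch a 16-bit value stored low byte first."""
    return zp[addr] | (zp[addr + 1] << 8)

HANDOFF = 5  # hypothetical choice: R5 carries a dictionary address

def find_word(name, dictionary):
    """Hypothetical primitive: leave the found address in R5, return a flag."""
    addr = dictionary.get(name, 0)
    poke16(reg_addr(HANDOFF), addr)
    return addr != 0

def word_address():
    """Hypothetical follow-up primitive: read R5 directly, no stack traffic."""
    return peek16(reg_addr(HANDOFF))

dictionary = {"DUP": 0x1234, "SWAP": 0x1250}  # made-up addresses
found = find_word("SWAP", dictionary)
print(found, hex(word_address()))
```

The point of the sketch is only that the second primitive never touches the stack. The price is a hidden contract between the two words about who owns R5, which is exactly the unForthy part.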
There are 32 bytes of Sweet16 registers, the current value and limit of the inner loop index I, a pointer to the Forth User variables, and the 7-byte, 15-clock NEXT routine. Here's my zero page map, to make things clearer [fixed width font]:
[ $00 $02 $04 $06 $08 $0A $0C $0E ]
[ R0 R1 R2 R3 R4 R5 R6 R7 ] Sweet16 register
[ ACC TOS N ] Assembler name
[ N0 N1 N2 N3 N4 N5 ] Forth "N" scratch area
-------------------------------------------
[ $10 $12 $14 $16 $18 $1A $1C $1E ]
[ R8 R9 R10 R11 R12 R13 R14 R15 ] Sweet16 register
[ EXT SP CPR ST PC ] Assembler name
[ N6 N7 N8 N9 N11 N12 N13 ] Forth "N" scratch area
-------------------------------------------
[ $20 ... ]
[ low bytes of split stack ] STACKL
-------------------------------------------
[ $50 ... ]
[ high bytes of split stack ] STACKH
-------------------------------------------
[ $80 $82 $84 $86 ... $8B $8D ..]
[ zi zlim up inc 8B IP TIME ] I loop index/limit; NEXT routine;
inc 8B Commodore system jiffy clock
jmp (008B)
[ $90 ... ] rest of zero page for kernel use
And I realize that stack juggling is the antipattern that teaches me to factor my Forth code better. BUT this would be faster, and on a 1 MHz 6502 I will take all the faster I can find.
FIND */MOD UM* CREATE FORGET <-- a few of the words which will probably become faster and smaller when it's all over.
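To put rough numbers on "faster", here is a back-of-the-envelope tally in Python using standard NMOS 6502 instruction sizes and cycle counts. The NEXT sequence is the one shown in the map (two `inc` plus an indirect `jmp`). The pop and push sequences are my own guess at what moving a cell between the split stack and a scratch register might look like, not code taken from PETTIL.

```python
# (bytes, cycles) for the NMOS 6502 instructions used below
COST = {
    "inc zp":    (2, 5),
    "jmp (abs)": (3, 5),
    "lda zp":    (2, 3),
    "sta zp":    (2, 3),
    "lda zp,x":  (2, 4),
    "sta zp,x":  (2, 4),
    "inx":       (1, 2),
    "dex":       (1, 2),
}

def tally(seq):
    """Return (total bytes, total cycles) for a list of instructions."""
    return (sum(COST[i][0] for i in seq), sum(COST[i][1] for i in seq))

# NEXT as drawn in the zero page map
NEXT = ["inc zp", "inc zp", "jmp (abs)"]

# Assumed sequences: copy a cell from the split stack into a scratch
# register, and later copy it back onto the stack.
POP_TO_SCRATCH = ["lda zp,x", "sta zp", "lda zp,x", "sta zp", "inx"]
PUSH_FROM_SCRATCH = ["dex", "lda zp", "sta zp,x", "lda zp", "sta zp,x"]

next_bytes, next_cycles = tally(NEXT)
round_trip_cycles = tally(POP_TO_SCRATCH)[1] + tally(PUSH_FROM_SCRATCH)[1]

print("NEXT:", next_bytes, "bytes,", next_cycles, "cycles")
print("pop + push round trip:", round_trip_cycles, "cycles")
# At 1 MHz one cycle is one microsecond.
print("round trip at 1 MHz:", round_trip_cycles, "microseconds")
```

Under those assumptions, each value that stays in a scratch register instead of making a round trip through the stack saves about two NEXTs' worth of cycles.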