I am all for that approach as well. Explore.
To show a very old example of what could be done, with counted strings,
this is very old code that I ported to Gforth6 a decade ago for my own
amusement.
Play with it but build something much better please.
\ STRINGS.FS provides re-entrant string functions for GForth6 BFox
\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\ Change History
\ Original TIForth: Brian Fox London Ontario Canada 08Oct87
\ Ported to HSForth 14Sep88
\ Ported to Win32Forth & sped up string stack 09Oct04
\ Test under VFX to compare speed 30Oct04
\ Ported to Gforth v6 15Feb05
\
\ Explanation
\ This file extends Forth to create counted strings. The original
\ objective was to demostrate string handling in Forth could be as easy as BASIC
\ I believe the objective was met.
\
\ The principal is simple. All string functions that potentially alter a string
\ move their output to a string stack and return the address of that new string.
\ The String stack is created in temporary space above PAD in a location called
\ TOP$. By creating intermediate output strings each function can be
\ "strung together" until the final result is obtained. The final result can be
\ stored back to another string or printed. Printing or storing a string
\ collapses the string stack automatically.
\ General Naming Convention: (some exceptions)
\ 1. WORDs that end in '$' leave results on stack in TOP$
\ 2. WORDs that start with $ expect a string argument on the stack
\
\ Features Include:
\ Compile time size checking. ( for novice users, can be commented out )
\ MaxLen byte compiled each into string for run time overflow checks.
\ String stack of fixed width for speed
\ TOP$ is the top of a stack of "PADs" for multiple string operations
\ Normal "BASIC" functions are re-entrant (LEFT$ RIGHT$ MID$)
\ $. and $! collapse the string stack on completion.
\ +$ allows multiple concatenation with run time size checking
\ :=" for easy assignment of string literals at compile time
\ :="" for string clear routine.
\ $POS finds position of a character within a string.
\ MAXLEN returns the maximum capacity of the string
\ $.R prints flush right text with leading blanks
\ $.LEFT prints flush left text with trailing blanks
\ -trailing$ reomves trailing spaces
\ -leading$ removes leading spaces
\ push$ pushes a string onto the stack (allocates space first)
\ -----------------------------------------------------------------------------
\ string stack lexicon
HEX
pad value top$ \ contains address of top of string stack
100 constant ss-width \ string stack width
ss-width 1- constant max$len \ biggest string we handle
: +ssp \ incr. string stack pointer
top$ ss-width + to top$ ;
: clrssp
pad to top$ ; \ reset top of string stack to pad
: ($!) ( $addr1 $addr2 -- ) \ no size checking!! be careful
>r dup c@ 1+ r> swap cmove ;
: $clip ( $addr n -- $addr n )
max over c@ min ;
: ?stringsize ( n -- )
max$len > abort" string too big" ;
: new.top$ ( -- top$ ) \ create a new string on the string stack of maximum length
+ssp \ create the space
top$
0FF over 1- ! ; \ set length to 0, maxlen to 255
: >top$ ( str -- ) new.top$ ($!) ; \ push str onto string stack
\ primitives ( some of these come from concepts in PYTHON)
\ return the address of of the nth char in a string
: ]$ ( adr$ n -- adr$[n] ) 1+ + ; \ usage: A$ 5 ]$ C@ returns 5th char
: chr$ ( ascii# -- top$ ) \ convert ascii char to $
new.top$ 0 ]$ c! \ store the ascii# at NEW top$[0]
1 top$ c! \ set the char count to 1
top$ ;
: push$ ( str -- top$) \ push str onto the string stack
>top$ top$ ; \ found this phrase was used a lot.
decimal
\ string variable size byte for error checking
\ this version also uses compile time size checking with an abort.
: $variable ( #bytes -- )
create dup ?stringsize \ remove if you don't like compile time check
dup c, 0 c, allot
does> 1+ ;
\ string primitive operations
: len ( adr$ - n ) c@ ;
: maxlen ( str -- maximum length ) 1- len ;
: pack$ ( adr cnt -- $ ) over 1- c! 1- ;
: !len ( n adr -- adr ) swap over c! ;
\ String constant control characters
HEX
1 $variable cr$ 0d01 cr$ !
1 $variable eof$ 1a01 eof$ !
1 $variable "$ 2201 "$ !
1 $variable bl$ 2001 bl$ !
1 $variable tab$ 0901 tab$ !
1 $variable null$ 0 null$ !
DECIMAL
: chr$ ( ascii# -- str ) 256 * 1+ top$ ! top$ ;
: asc ( adr$ -- ascii# ) 1+ c@ ;
\ $TEXT is like text but much handier since you can get multiple
\ inputs that simple stack up automatically. ( BL $TEXT BL $TEXT etc...)
: $TEXT ( delimit-char -- ) WORD push$ ;
\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\ STRING I/O WORDS:
\ $! $. I refer to as string I/O words. This type of word
\ must clear the string stack when complete. Since these operators always
\ DO something with end product string this does not pose much restriction
\ to the FORTH programmer and abstracts the details allowing the Forth
\ programmer to write complex phrases of string language with less concern
\ about the mechanism at work.
: $! ( $adr1 $adr2 -- ) ($!) clrssp ;
\ useful if you need error checking
: $move ( $adr1 , $adr2 -- )
2dup maxlen swap len < abort" string to big"
$! ;
: $. ( $adr -- ) count type clrssp ; \ $. clears string stack after printing.
: $.left ( $adr,n -- ) \ prints str, n chars wide
over $. \ print the string
swap c@ - 0 max spaces ; \ print len(str)-n spaces
: $.r ( $adr,n -- ) over len - 0 max spaces $. ; \ print right justified
: .top$ ( $adr -- ) top$ count type ; \ view top$ but do not collapse the string stack
\ these are syntax candy
: :=" ( $addr -- <text> ) [char] " word swap $! ; \ usage: name :=" Brian Fox"
: :="" ( $addr -- ) dup maxlen 0 fill ; \ usage: name :=""
: $xchg ( $adr1,$adr2 -- ) \ does run time size checking
dup >top$
over swap ($!) ( don't collapse the stack. we still need it)
top$ swap $move ;
: +$ ( $adr1,$adr2 -- top$ )
2dup swap ( $adr1) >top$
( $adr2) count top$ count + swap cmove
len swap len + dup ?stringsize
top$ !len ;
: left$ ( adr$ #char -- top$ )
swap >top$ top$ !len ;
: right$ ( adr$ #char -- top$ )
+ssp
0 $clip >r count r@ - + r@ top$ c!
top$ 1+ r> cmove top$ ;
: mid$ ( adr$ start #char -- top$ )
+ssp
0 max >r 1 $clip
2dup swap c@ - negate 1+ r> min
top$ c! + top$ 1+ top$ c@ cmove top$ ;
: str$ ( n -- top$ )
+ssp 0 <# #s #> dup top$ c!
top$ 1+ swap cmove top$ ;
: $val ( adr$ - n ) \ can't accept commas or periods
number? 0=
if
abort" val cannot convert the string to a number"
then
drop drop ;
: -trailing$ ( $ -- $ ) \ removes trailing blanks, results in Top$
push$ count -trailing ( adr cnt ) pack$ ;
: -LEADING ( adr cnt -- adr cnt ) \ dank je wel Albert Van der Horst
BEGIN
over c@ bl = over 0= 0= and
WHILE
1- SWAP 1+ SWAP
REPEAT ;
: -leading$ ( $ -- $ ) push$ count -leading ( adr cnt ) pack$ ;
: clean$ ( $ -- $ ) -leading$ -trailing$ ;
: $compare ( adr$1 adr$2 -- -n:0:n ) count rot count compare ;
: $< ( adr$1 adr$2 -- ? ) $compare 1 = ;
: $> ( adr$1 adr$2 -- ? ) $compare -1 = ;
: $= ( adr$1 adr$2 -- ? ) $compare 0= ;
: $pos ( addr$ char -- position ) \ returns 0 if not found
over >r
>r dup count
r> scan drop swap -
r> c@
over <=
if drop 0 then ;
true [if]
decimal
\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\ String test suite
\ test strings
255 $variable q q :=" 12345678901234567890123456789012345678901234567890 "
255 $variable w w :=" ABCDEFGHIJKLMNOPQRSTUVWXY:abcdefghijklmnopqrstuvwxy "
100 $variable x x :=" string X" ( smaller to check for errors)
255 $variable y y :=" string Y"
32 $variable anumber$
200 $variable cut$
: stringmuncher
clrssp
w clean$ q clean$ +$ y $!
y 100 left$ 60 right$ 2 50 mid$ x $!
9999 str$ anumber$ $!
\ Delimit the string w at the ":" position
w clean$ dup [char] : $POS left$ cut$ $! ( "abcdefghijklmnopqrstuvwxy:")
;
: $DEMO
page
cr ." Testing 1,000,000 iterations of Stringmuncher ..."
cr ." Input strings"
cr ." Q = " q $.
cr ." W = " w $.
cr ." X = " x $.
cr ." Y = " y $.
cr
100 0 do [char] . emit
10000 0 do
stringmuncher
loop
loop
cr
cr ." Results:"
cr ." X = " x $.
cr ." Y = " y $.
cr ." anumber$=" anumber$ $.
cr ." CUT$ = " cut$ $.
cr ." *COMPLETE*" ;
[then]