Questions about strings

Chris Curl

Oct 17, 2015, 9:47:05 AM

Assuming that the count is at the beginning of a string, is the count the same size as the characters in the string?

And are the characters "wide" (16 bits), or are they 8-bit entities?

In other words, to get the length of a string, would I use C@ or @?

I know I could use COUNT NIP, but I want to understand what is going on behind the scenes so that my implementation behaves like others would.

Perhaps I should go rogue and implement them as standard BSTRs ...

http://bytecomb.com/vba-internals-string-variables-and-pointers-in-depth/

rickman

Oct 17, 2015, 11:25:26 AM

Have you read the ANS Forth spec?

"3.1.3.4 Counted strings
A counted string in memory is identified by the address (c-addr) of its
length character. The length character of a counted string shall
contain a binary representation of the number of data characters,
between zero and the implementation-defined maximum length for a counted
string. The maximum length of a counted string shall be at least 255."

Does that answer your question?
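
A minimal sketch of the layout that paragraph describes, assuming a system where the length character is a single byte. The name GREETING is made up for illustration.

```forth
\ A counted string: one length character followed by the data characters.
create greeting  5 c,  char h c,  char e c,  char l c,  char l c,  char o c,

greeting c@ .          \ C@ on the string's address fetches the length: prints 5
greeting count type    \ COUNT ( c-addr1 -- c-addr2 u ) yields first char and length: prints hello
```

So for a counted string the length is fetched with C@, and COUNT converts the counted form into the c-addr u form.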

--

Rick

Ron Aaron

Oct 17, 2015, 12:30:40 PM

On 10/17/15 16:46, Chris Curl wrote:
> Assuming that the count is at the beginning of a string, is the count the same size as the characters in the string?
>
> And are the characters "wide" (16 bits), or are they 8-bit entities?
>
> In other words, to get the length of a string, would I use C@ or @?

I'll just interject with an 8th answer.

In 8th, a string contains its length (but not in the same byte buffer as
the text), and all strings are UTF-8. That means that the character
length of a string and its byte-count are not necessarily the same.

Therefore there are two words: "s:len", which gives the number of
*characters*, and "s:size" which gives the number of *bytes*
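
The same distinction can be sketched in standard Forth. This is not 8th code: UTF8-LEN is a hypothetical word written only to illustrate the point, and it assumes the input is well-formed UTF-8. It counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```forth
\ Hypothetical word: number of UTF-8 characters in a byte string.
: utf8-len ( c-addr u -- n )
  0 rot rot                          \ running count kept under the string
  over + swap ?do                    \ loop over each byte address
    i c@ $C0 and $80 <> if 1+ then   \ count bytes that start a character
  loop ;

create e-acute  $C3 c, $A9 c,        \ UTF-8 encoding of U+00E9: two bytes
e-acute 2 utf8-len .                 \ prints 1: one character stored in two bytes
```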

Anton Ertl

Oct 17, 2015, 1:16:21 PM

Chris Curl <ccur...@gmail.com> writes:
>Assuming that the count is at the beginning of a string, is the count the same size as the characters in the string?

If you use a COUNTed string, the count is a character. Most words
dealing with strings take and produce strings represented as c-addr u,
where u specifies the length in characters, and c-addr points to the
first character.

>And are the characters "wide" (16 bits), or are they 8-bit entities?

In practice, 8-bit entities on byte-addressed machines.

>In other words, to get the length of a string, would I use C@ or @?

Most of the time, you would use NIP.
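
A short sketch of that point, using a made-up word name: with the c-addr u representation the length is already on the stack, so nothing has to be fetched from memory.

```forth
: show-length ( -- )
  s" hello, world"     \ S" leaves c-addr u: address of the first character, then the length
  nip . ;              \ drop the address, print the length

show-length            \ prints 12
```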

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2015: http://www.rigwit.co.uk/EuroForth2015/

Jason Damisch

Oct 17, 2015, 1:41:43 PM

On Saturday, October 17, 2015 at 6:47:05 AM UTC-7, Chris Curl wrote:
> Assuming that the count is at the beginning of a string,
> is the count the same size as the characters in the string?

You have a command line at your disposal, so I suggest that you poke
around with it. It's OK to fail at first and succeed later. Don't
be afraid of making mistakes.

But, I'll answer. In regular Forth the strings have a byte count
and that first byte can be reached with C@. The strings can
therefore not be longer than 255 characters long. The regular
Forth strings just use ASCII. Try using C@ and EMIT to play
around with some Forth strings. Also S"

Jason

Chris Curl

Oct 17, 2015, 1:43:03 PM

yes, thanks. is there a downloadable version of the spec somewhere? then it would be easier looking for stuff in it.

Anton Ertl

Oct 17, 2015, 1:54:17 PM

Chris Curl <ccur...@gmail.com> writes:
>yes, thanks. is there a downloadable version of the spec somewhere?

http://www.forth200x.org/documents/forth-2012.pdf

For easy online lookup, you can go for

http://forth-standard.org/standard/words

Elizabeth D. Rather

Oct 17, 2015, 1:56:47 PM

A link to the Forth2012 Release Candidate and other useful Forth links
may be found at http://www.forth.com/resources/ as well as other places.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

rickman

Oct 17, 2015, 2:07:17 PM

Unlike many specs (VHDL is nearly impossible to read) the Forth spec is
not too bad. The level 1 stuff is pretty clear most of the time. The
deeper stuff is harder to pull out, but that may be simply because it is
harder to understand by definition. I know I never fully follow many
conversations here about things like what a system should do with ' if
. But it has lots of useful info in the appendix.

Just adopt a positive attitude and look for the info. It is usually there.

--

Rick

HAA

Oct 17, 2015, 9:51:12 PM

Chris Curl wrote:
>
> is there a downloadable version of the spec somewhere? then it would be
> easier looking for stuff in it.

For ANS-Forth (aka Forth-94) there's a html version I like using:
ftp://ftp.taygeta.com/pub/Forth/Literature/dpans6v3_3.zip

Brad Eckert wrote a handy 'word index' for it:
ftp://ftp.lab.unb.br/pub/hardware/opencores/cd16/Doc/AnsDraft/quick.htm

foxaudio...@gmail.com

Oct 18, 2015, 12:42:00 AM

I am all for that approach as well. Explore.
To show a very old example of what could be done, with counted strings,
this is very old code that I ported to Gforth6 a decade ago for my own
amusement.

Play with it but build something much better please.

\ STRINGS.FS provides re-entrant string functions for GForth6 BFox

\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\ Change History
\ Original TIForth: Brian Fox London Ontario Canada 08Oct87
\ Ported to HSForth 14Sep88
\ Ported to Win32Forth & sped up string stack 09Oct04
\ Test under VFX to compare speed 30Oct04
\ Ported to Gforth v6 15Feb05
\

\ Explanation
\ This file extends Forth to create counted strings. The original
\ objective was to demonstrate that string handling in Forth could be as easy
\ as in BASIC. I believe the objective was met.
\
\ The principle is simple. All string functions that potentially alter a string
\ move their output to a string stack and return the address of that new string.
\ The String stack is created in temporary space above PAD in a location called
\ TOP$. By creating intermediate output strings each function can be
\ "strung together" until the final result is obtained. The final result can be
\ stored back to another string or printed. Printing or storing a string
\ collapses the string stack automatically.

\ General Naming Convention: (some exceptions)
\ 1. WORDs that end in '$' leave results on stack in TOP$
\ 2. WORDs that start with $ expect a string argument on the stack
\
\ Features Include:
\ Compile time size checking. ( for novice users, can be commented out )
\ MaxLen byte compiled into each string for run time overflow checks.
\ String stack of fixed width for speed
\ TOP$ is the top of a stack of "PADs" for multiple string operations
\ Normal "BASIC" functions are re-entrant (LEFT$ RIGHT$ MID$)
\ $. and $! collapse the string stack on completion.
\ +$ allows multiple concatenation with run time size checking
\ :=" for easy assignment of string literals at compile time
\ :="" for string clear routine.
\ $POS finds position of a character within a string.
\ MAXLEN returns the maximum capacity of the string
\ $.R prints flush right text with leading blanks
\ $.LEFT prints flush left text with trailing blanks
\ -trailing$ removes trailing spaces
\ -leading$ removes leading spaces
\ push$ pushes a string onto the stack (allocates space first)
\ -----------------------------------------------------------------------------

\ string stack lexicon
HEX
pad value top$ \ contains address of top of string stack

100 constant ss-width \ string stack width
ss-width 1- constant max$len \ biggest string we handle

: +ssp \ incr. string stack pointer
top$ ss-width + to top$ ;

: clrssp
pad to top$ ; \ reset top of string stack to pad

: ($!) ( $addr1 $addr2 -- ) \ no size checking!! be careful
>r dup c@ 1+ r> swap cmove ;

: $clip ( $addr n -- $addr n )
max over c@ min ;

: ?stringsize ( n -- )
max$len > abort" string too big" ;

: new.top$ ( -- top$ ) \ create a new string on the string stack of maximum length
+ssp \ create the space
top$
0FF over 1- ! ; \ set length to 0, maxlen to 255

: >top$ ( str -- ) new.top$ ($!) ; \ push str onto string stack

\ primitives ( some of these come from concepts in PYTHON)

\ return the address of the nth char in a string
: ]$ ( adr$ n -- adr$[n] ) 1+ + ; \ usage: A$ 5 ]$ C@ returns 5th char

: chr$ ( ascii# -- top$ ) \ convert ascii char to $
new.top$ 0 ]$ c! \ store the ascii# at NEW top$[0]
1 top$ c! \ set the char count to 1
top$ ;

: push$ ( str -- top$) \ push str onto the string stack
>top$ top$ ; \ found this phrase was used a lot.

decimal
\ string variable size byte for error checking
\ this version also uses compile time size checking with an abort.

: $variable ( #bytes -- )
create dup ?stringsize \ remove if you don't like compile time check
dup c, 0 c, allot
does> 1+ ;

\ string primitive operations
: len ( adr$ - n ) c@ ;
: maxlen ( str -- maximum length ) 1- len ;
: pack$ ( adr cnt -- $ ) over 1- c! 1- ;
: !len ( n adr -- adr ) swap over c! ;

\ String constant control characters
HEX
1 $variable cr$ 0d01 cr$ !
1 $variable eof$ 1a01 eof$ !
1 $variable "$ 2201 "$ !
1 $variable bl$ 2001 bl$ !
1 $variable tab$ 0901 tab$ !
1 $variable null$ 0 null$ !

DECIMAL
: chr$ ( ascii# -- str ) 256 * 1+ top$ ! top$ ;
: asc ( adr$ -- ascii# ) 1+ c@ ;

\ $TEXT is like text but much handier since you can get multiple
\ inputs that simply stack up automatically. ( BL $TEXT BL $TEXT etc...)

: $TEXT ( delimit-char -- ) WORD push$ ;

\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\ STRING I/O WORDS:
\ $! $. I refer to as string I/O words. This type of word
\ must clear the string stack when complete. Since these operators always
\ DO something with end product string this does not pose much restriction
\ to the FORTH programmer and abstracts the details allowing the Forth
\ programmer to write complex phrases of string language with less concern
\ about the mechanism at work.

: $! ( $adr1 $adr2 -- ) ($!) clrssp ;

\ useful if you need error checking
: $move ( $adr1 , $adr2 -- )
2dup maxlen swap len < abort" string too big"
$! ;

: $. ( $adr -- ) count type clrssp ; \ $. clears string stack after printing.

: $.left ( $adr,n -- ) \ prints str, n chars wide
over $. \ print the string
swap c@ - 0 max spaces ; \ print n-len(str) trailing spaces

: $.r ( $adr,n -- ) over len - 0 max spaces $. ; \ print right justified

: .top$ ( -- ) top$ count type ; \ view top$ but do not collapse the string stack

\ these are syntax candy
: :=" ( $addr -- <text> ) [char] " word swap $! ; \ usage: name :=" Brian Fox"

: :="" ( $addr -- ) dup maxlen 0 fill ; \ usage: name :=""

: $xchg ( $adr1,$adr2 -- ) \ does run time size checking
dup >top$
over swap ($!) ( don't collapse the stack. we still need it)
top$ swap $move ;

: +$ ( $adr1,$adr2 -- top$ )
2dup swap ( $adr1) >top$
( $adr2) count top$ count + swap cmove
len swap len + dup ?stringsize
top$ !len ;

: left$ ( adr$ #char -- top$ )
swap >top$ top$ !len ;

: right$ ( adr$ #char -- top$ )
+ssp
0 $clip >r count r@ - + r@ top$ c!
top$ 1+ r> cmove top$ ;

: mid$ ( adr$ start #char -- top$ )
+ssp
0 max >r 1 $clip
2dup swap c@ - negate 1+ r> min
top$ c! + top$ 1+ top$ c@ cmove top$ ;

: str$ ( n -- top$ )
+ssp 0 <# #s #> dup top$ c!
top$ 1+ swap cmove top$ ;

: $val ( adr$ - n ) \ can't accept commas or periods
number? 0=
if
abort" val cannot convert the string to a number"
then
drop drop ;

: -trailing$ ( $ -- $ ) \ removes trailing blanks, results in Top$
push$ count -trailing ( adr cnt ) pack$ ;

: -LEADING ( adr cnt -- adr cnt ) \ dank je wel Albert Van der Horst
BEGIN
over c@ bl = over 0= 0= and
WHILE
1- SWAP 1+ SWAP
REPEAT ;

: -leading$ ( $ -- $ ) push$ count -leading ( adr cnt ) pack$ ;

: clean$ ( $ -- $ ) -leading$ -trailing$ ;

: $compare ( adr$1 adr$2 -- -n:0:n ) count rot count compare ;

: $< ( adr$1 adr$2 -- ? ) $compare 1 = ;
: $> ( adr$1 adr$2 -- ? ) $compare -1 = ;
: $= ( adr$1 adr$2 -- ? ) $compare 0= ;

: $pos ( addr$ char -- position ) \ returns 0 if not found
over >r
>r dup count
r> scan drop swap -
r> c@
over <=
if drop 0 then ;

true [if]

decimal
\ \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
\ String test suite

\ test strings
255 $variable q q :=" 12345678901234567890123456789012345678901234567890 "
255 $variable w w :=" ABCDEFGHIJKLMNOPQRSTUVWXY:abcdefghijklmnopqrstuvwxy "
100 $variable x x :=" string X" ( smaller to check for errors)
255 $variable y y :=" string Y"
32 $variable anumber$
200 $variable cut$

: stringmuncher
clrssp
w clean$ q clean$ +$ y $!
y 100 left$ 60 right$ 2 50 mid$ x $!
9999 str$ anumber$ $!
\ Delimit the string w at the ":" position
w clean$ dup [char] : $POS left$ cut$ $! ( "abcdefghijklmnopqrstuvwxy:")
;

: $DEMO
page
cr ." Testing 1,000,000 iterations of Stringmuncher ..."
cr ." Input strings"
cr ." Q = " q $.
cr ." W = " w $.
cr ." X = " x $.
cr ." Y = " y $.
cr
100 0 do [char] . emit
10000 0 do
stringmuncher
loop
loop
cr
cr ." Results:"
cr ." X = " x $.
cr ." Y = " y $.
cr ." anumber$=" anumber$ $.
cr ." CUT$ = " cut$ $.
cr ." *COMPLETE*" ;

[then]

Johan Kotlinski

Oct 18, 2015, 6:40:42 PM

On Saturday, October 17, 2015 at 7:41:43 PM UTC+2, Jason Damisch wrote:
> But, I'll answer. In regular Forth the strings have a byte count
> and that first byte can be reached with C@. The strings can
> therefore not be longer than 255 characters long.

Hi!
It happened to me that I hit the limit, that s" was given too many characters. Does anyone have nice tricks or workarounds for that problem?

What I want in the end is c-addr u representation, so the exact intermediate storage form is not important.

Cheers
Johan

Elizabeth D. Rather

Oct 18, 2015, 7:41:10 PM

Whether you need long strings depends entirely on the application. A
simple approach would be to simply modify your programs to use a cell
instead of a byte for the count, and use @ instead of C@ to get it. You
can keep the standard counted strings for standard words that depend on
them such as ." and use other strategies for your long strings. Words
that take c-addr u arguments (MOVE, ERASE, etc.) will work fine with
both, the only difference being how you fetch and store the lengths.
That is why the arguments are specified as c-addr u instead of c-addr c.
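
A minimal sketch of that approach. The names L, and LCOUNT are hypothetical, not standard words, and LONG-MSG is just a demonstration string. The length goes into a full cell and is fetched with @ instead of C@.

```forth
\ Hypothetical cell-counted strings: a one-cell length followed by the characters.
: l, ( c-addr u -- )              \ compile a cell-counted string at HERE
  align dup ,                     \ store the length in a full cell
  here swap dup chars allot       \ reserve room for the characters
  move align ;                    \ copy them in, keep HERE aligned afterwards

: lcount ( addr -- c-addr u )     \ cell-counted analogue of COUNT
  dup cell+ swap @ ;

create long-msg  s" a string that could run well past 255 characters" l,
long-msg lcount type
```

Once LCOUNT has produced c-addr u, every word that takes c-addr u (TYPE, MOVE, COMPARE and so on) works unchanged.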

Anton Ertl

Oct 19, 2015, 3:50:43 AM

Johan Kotlinski <kotl...@gmail.com> writes:
>On Saturday, October 17, 2015 at 7:41:43 PM UTC+2, Jason Damisch wrote:
>> But, I'll answer. In regular Forth the strings have a byte count
>> and that first byte can be reached with C@. The strings can
>> therefore not be longer than 255 characters long.
>
>Hi!
>It happened to me that I hit the limit, that s" was given too many characters. Does anyone have nice tricks or workarounds for that problem?

You mean that S" was limited by using a COUNTed string for internal
storage? In that case, get a better Forth system. E.g., Gforth has
no such limit. The user input device input buffer is limited to 1024
bytes, but you can have longer input lines in files (I just tried it
with an S" containing 5402 bytes).

Or you can tell your system implementor that his system is lacking and
that other systems do it better. Otherwise they'll continue to say
things along the lines of: "Counted strings are a great representation
for storing strings. Longer strings are not needed: Our users have
not complained about the limited length."

Chris Curl

Oct 19, 2015, 8:03:42 AM

Re using the command line, I suppose I could do that ... if I had a Forth system installed. I have instead chosen to implement my own. But that is a good suggestion ... I assume there is a free version of Forth out there to play with, and it is good advice to use that for reference.

HAA

Oct 19, 2015, 8:26:29 AM

Anton Ertl wrote:
>
> Or you can tell your system implementor that his system is lacking and
> that other systems do it better. Otherwise they'll continue to say
> things along the lines of: "Counted strings are a great representation
> for storing strings. Longer strings are not needed: Our users have
> not complained about the limited length."

Or this:

"The counted string as an internal format in Forth is extremely
practical. It is designed primarily for word names and program-internal
strings such as S" and ." which are virtually never longer than 256
bytes. So, what's to be embarrassed about?

To be sure, there are applications in which longer strings are
necessary, but like all application needs, they are most appropriately
handled in the application or possibly in optional wordsets."

Johan Kotlinski

Oct 19, 2015, 10:58:06 AM

On Monday, October 19, 2015 at 1:41:10 AM UTC+2, Elizabeth D. Rather wrote:
> Whether you need long strings depends entirely on the application. A
> simple approach would be to simply modify your programs to use a cell
> instead of a byte for the count, and use @ instead of C@ to get it.

Hi Elizabeth!
Thanks for your answer.

What made me wonder is the "simply". To me it doesn't seem simple to write a new s", actually it feels a bit complicated. Is there maybe some good trick I don't know about that would make it simple?

All the best,
Johan

Elizabeth D. Rather

Oct 19, 2015, 2:37:02 PM

These are all very short, simple definitions. Look at the current
definitions of ." S" and similar. They may involve some
implementation-dependent words (that know how your dictionary and data
space is managed). Some examples:

: STRING ( char -- ) PARSE STRING, ;

: SLITERAL ( c-addr1 u -- ) ( -- c-addr1 u )
POSTPONE (S") STRING, ; IMMEDIATE

: ," ( -- ) [CHAR] " STRING ;

: S" ( "ccc"<"> -- )
STATE @ IF POSTPONE (S") ," EXIT THEN
[CHAR] " PARSE >QPAD COUNT ; IMMEDIATE

In the above definitions, STRING, compiles a string into data space, and
>QPAD is a buffer. Those are the only system-dependent words, and are
themselves pretty simple.

I'm not convinced, however, that S" and ," are the best way to handle
really long strings. I would consider different data structures whose
design would depend on how the strings are being acquired and what needs
to be done with them.

Elizabeth D. Rather

Oct 19, 2015, 2:38:31 PM

On 10/19/15 2:03 AM, Chris Curl wrote:
> Re using the command line, I suppose I could do that ... if I had a Forth system installed. I have instead chosen to implement my own. But that is a good suggestion ... I assume there is a free version of Forth out there to play with, and it is good advice to use that for reference.
>

There are many free Forths, including free evaluation versions of
SwiftForth and VFX. You really should be learning to use Forth before
setting out to write one. It will make it a lot easier for you.

Chris Curl

Oct 19, 2015, 5:28:36 PM

I will install the SwiftForth eval version tonight and start playing with it.

rickman

Oct 20, 2015, 2:47:52 PM

I'm a bit confused. I think Jason meant to say "counted strings"
instead of just "strings". s" uses a cell pair to specify a string of
length up to the max number represented by a cell. I don't see any
other limit to what can be represented by s". Maybe this was a limit of
the internal functionality of the forth when parsing the string? I
don't find anything limiting this in the standard. I also don't find
anything where the system has to describe this limitation. Should it be
required?

--

Rick

Elizabeth D. Rather

Oct 20, 2015, 3:49:02 PM

You are correct, neither Forth94 nor Forth2012 limits the size of the
string in S". It is likely that some implementations use a byte for the
length, as that was standard practice for many years, but it seems to me
that that would be non-compliant. Maybe someone in the Forth 200x
committee would like to clarify that.

rickman

Oct 20, 2015, 11:25:45 PM

The standard requires the value on the stack to be a cell, but does that
require that s" be capable of working with string lengths longer than
will fit in a byte if there is some internal limitation of the
implementation?

--

Rick

HAA

Oct 21, 2015, 12:12:21 AM

rickman wrote:
>
> The standard requires the value on the stack to be a cell, but does that
> require that s" be capable of working with string lengths longer than
> will fit in a byte if there is some internal limitation of the
> implementation?

AFAIK it doesn't say. If it did it would need to specify a minimum as with:
- The maximum length of a counted string shall be at least 255.
- The size of the region identified by WORD shall be at least 33 characters.
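
A system can be asked for its own counted-string limit through the standard environment query /COUNTED-STRING. A minimal sketch follows; the word name .COUNTED-MAX is made up, and a system is allowed to answer that it does not know the attribute.

```forth
: .counted-max ( -- )
  s" /COUNTED-STRING" environment? if
    ." max counted string length: " .       \ the standard requires at least 255
  else
    ." /COUNTED-STRING is not known to this system"
  then ;

.counted-max
```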

Anton Ertl

Oct 21, 2015, 2:26:34 AM

"Elizabeth D. Rather" <era...@forth.com> writes:
>You are correct, neither Forth94 nor Forth2012 limits the size of the
>string in S". It is likely that some implementations use a byte for the
>length, as that was standard practice for many years, but it seems to me
>that that would be non-compliant. Maybe someone in the Forth 200x
>committee would like to clarify that.

There is no limit specified, so the length of the string is only
limited by other limits, not by the standard; other limits in this
context are the available memory, and the input line length (the
standard guarantees that the TIB is at least 80 characters, file input
lines can be 128 bytes long (11.3.5), blocks can be 1024 characters
long, and an EVALUATEd string is only limited by available memory).

For interpretive S" and S\", Forth-2012 guarantees that each buffer
can hold 80 characters.

So, a standard-compliant compiled S" is able to deal with strings
longer than 255 characters, because such usage of S" can occur in
blocks and EVALUATEd strings. Given that one usually builds S" on top
of SLITERAL, and the strings passed to SLITERAL are only limited by
memory, this just means that system implementors should implement
SLITERAL properly, and S" will be catered for.
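
A minimal sketch of SLITERAL handed a string longer than 255 characters. The names STARS and LONG-ONE are made up for illustration, and the result depends on the system's SLITERAL being limited only by memory, as described above.

```forth
create stars  300 chars allot        \ a 300-character buffer
stars 300 char * fill                \ filled with asterisks

: long-one ( -- c-addr u )
  [ stars 300 ] sliteral ;           \ compile the 300 characters into the definition

long-one nip .                       \ prints 300 where SLITERAL has no 255-character limit
```

Because the string never passes through a counted string, no byte-sized length is involved at any point.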

Alfred Singlestone

Oct 21, 2015, 7:21:40 PM

On Monday, October 19, 2015 at 2:38:31 PM UTC-4, Elizabeth D. Rather wrote:

> There are many free Forths, including free evaluation versions of
> SwiftForth and VFX. You really should be learning to use Forth before
> setting out to write one. It will make it a lot easier for you.

I switch between learning Forth, improving the code that underlies it, and learning more about Intel assembly language. The last one might be the easiest for me, because I've been programming in mainframe assembly language for thirty-odd years.

At any rate, I find a nice feedback loop in shifting among them. Ideas beget ideas.

HAA

Oct 21, 2015, 8:08:40 PM

Producing a forth of any quality isn't easy and of course it's a never-ending project.
The danger is tiring of it before really getting to know the language - which IMO is
only done through writing applications (lots of them).

HAA

Oct 21, 2015, 8:21:49 PM

Correction.

3.4.1 Parsing
Unless otherwise noted, the number of characters parsed may be from
zero to the implementation-defined maximum length of a counted string.

Elizabeth D. Rather

Oct 22, 2015, 12:31:10 AM

Good catch! Indeed, the description of S" includes this:

Compilation: ( “ccchquotei” – – )
*Parse* ccc delimited by " (double-quote). Append the run-time semantics
given below to the current definition.

...so it is limited by 3.4.1. If I were on the current standards
committee I'd look for a way to be more explicit about that, although I
now see that 6.1.2165 S" does include a cross-reference to 3.4.1.

I still am of the opinion that S" isn't very useful for managing really
long strings.

Anton Ertl

Oct 22, 2015, 1:54:09 AM

"Elizabeth D. Rather" <era...@forth.com> writes:

>On 10/21/15 3:21 PM, HAA wrote:
>> 3.4.1 Parsing
>> Unless otherwise noted, the number of characters parsed may be from
>> zero to the implementation-defined maximum length of a counted string.
>
>Good catch! Indeed, the description of S" includes this:
>
>Compilation: ( "ccc<quote>" -- )
>*Parse* ccc delimited by " (double-quote). Append the run-time semantics
>given below to the current definition.
>
>...so it is limited by 3.4.1. If I were on the current standards
>committee I'd look for a way to be more explicit about that

You are free to submit an RfD, but I think that it would be better to
move this limitation to WORD where it probably comes from. There is
no technical reason to have such a limitation for PARSE-NAME or PARSE.

HAA

Oct 22, 2015, 9:09:17 AM

If named strings don't present a problem you can use PARSE.

: $" ( " ccc<"> name )
[char] " parse here swap dup chars allot 2dup 2constant cmove ;

$" imagine this is a very long string" msg1

HAA

Oct 23, 2015, 9:44:55 PM

Elizabeth D. Rather wrote:
> ...
> Compilation: ( "ccc<quote>" -- )

> *Parse* ccc delimited by " (double-quote). Append the run-time semantics
> given below to the current definition.
>
> ...so it is limited by 3.4.1. If I were on the current standards
> committee I'd look for a way to be more explicit about that, although I
> now see that 6.1.2165 S" does include a cross-reference to 3.4.1.

PARSE has the same cross-reference which implies retrieved strings
are limited to 255 chars for a standard program. 3.4.1. has an "unless
otherwise noted" proviso but I could find no such exemption for PARSE.

Anton Ertl

Oct 25, 2015, 6:22:59 AM

There is no technical reason for that, because PARSE is specified to
produce an address in the input buffer, so it must not use a counted
string as intermediate storage. A.6.2.2008 PARSE says:

|2)
| 3) The count character limits the length of the string returned by
| WORD to 255 characters (longer strings can easily be stored in
| blocks!). This limitation does not exist for PARSE.

So it was obviously not the intent of the Forth-94 committee to have
this limitation for PARSE, so it probably is a bug in Forth-94 (and
-2012).

HAA

Oct 26, 2015, 12:15:16 AM

Anton Ertl wrote:
> "HAA" <som...@microsoft.com> writes:
> >Elizabeth D. Rather wrote:
> >> ...
> >> Compilation: ( "ccc<quote>" -- )
> >> *Parse* ccc delimited by " (double-quote). Append the run-time semantics
> >> given below to the current definition.
> >>
> >> ...so it is limited by 3.4.1. If I were on the current standards
> >> committee I'd look for a way to be more explicit about that, although I
> >> now see that 6.1.2165 S" does include a cross-reference to 3.4.1.
> >
> >PARSE has the same cross-reference which implies retrieved strings
> >are limited to 255 chars for a standard program. 3.4.1. has an "unless
> >otherwise noted" proviso but I could find no such exemption for PARSE.
>
> There is no technical reason for that, because PARSE is specified to
> produce an address in the input buffer, so it must not use a counted
> string as intermediate storage.

That's more than 3.4.1 states. It places a limit on any parsed strings in
a standard program 'unless otherwise noted' but doesn't say why.

> A.6.2.2008 PARSE says:
>
> |2)
> | 3) The count character limits the length of the string returned by
> | WORD to 255 characters (longer strings can easily be stored in
> | blocks!). This limitation does not exist for PARSE.
>
> So it was obviously not the intent of the Forth-94 committee to have
> this limitation for PARSE,

It's suggestive, though I wouldn't consider promotion of a new word in
the annex of a standard as authoritative or binding.

> so it probably is a bug in Forth-94 (and
> -2012).

If the entire input buffer is accessible to the programmer, then a
standard program should be able to code PARSE from scratch using
SOURCE and >IN without any length limit. Maybe it's not guaranteed.

Anton Ertl

Oct 26, 2015, 7:23:37 AM

"HAA" <som...@microsoft.com> writes:

>Anton Ertl wrote:
>> >PARSE has the same cross-reference which implies retrieved strings
>> >are limited to 255 chars for a standard program. 3.4.1. has an "unless
>> >otherwise noted" proviso but I could find no such exemption for PARSE.
>>
>> There is no technical reason for that, because PARSE is specified to
>> produce an address in the input buffer, so it must not use a counted
>> string as intermediate storage.
>
>That's more than 3.4.1 states. It places a limit on any parsed strings in
>a standard program 'unless otherwise noted' but doesn't say why.
>
>> A.6.2.2008 PARSE says:
>>
>> |2)
>> | 3) The count character limits the length of the string returned by
>> | WORD to 255 characters (longer strings can easily be stored in
>> | blocks!). This limitation does not exist for PARSE.
>>
>> So it was obviously not the intent of the Forth-94 committee to have
>> this limitation for PARSE,
>
>It's suggestive, though I wouldn't consider promotion of a new word in
>the annex of a standard as authoritative or binding.

Not sure what you mean with "promotion", but yes, the rationale is not
authoritative. It does give us insight into the intent of the
committee, though.

>> so it probably is a bug in Forth-94 (and
>> -2012).
>
>If the entire input buffer is accessible to the programmer, then a
>standard program should be able to code PARSE from scratch using
>SOURCE and >IN without any length limit.

Yes, you can write PARSE using SOURCE, >IN and other standard
words.
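
One way that could look, as a sketch only. MY-PARSE is a hypothetical name, the code assumes one character occupies one address unit, and it returns an address inside the input buffer just as PARSE is specified to do, so no counted string is involved.

```forth
\ Hypothetical PARSE written with SOURCE, >IN and /STRING.
: my-parse ( char "ccc<char>" -- c-addr u )
  >r                              \ keep the delimiter on the return stack
  source >in @ /string            \ ( c-addr0 u0 ) unparsed rest of the input buffer
  over swap                       \ ( c-addr0 c-addr1 u1 ) scan pointer and chars left
  begin  dup while                \ stop when the buffer is exhausted
    over c@ r@ <> while           \ or when the delimiter is found
    1 /string                     \ step past one character
  repeat then
  r> drop
  0<> 1 and >r                    \ 1 if a delimiter was found and must be skipped
  over -                          \ ( c-addr0 u ) length of the parsed text
  dup r> + >in +! ;               \ advance >IN past the text and the delimiter

char ) my-parse no 255-character limit applies here) type
```

The only bound on the result is the length of the input buffer reported by SOURCE.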

Lars Brinkhoff

Oct 27, 2015, 6:30:22 AM

"HAA" <som...@microsoft.com> writes:
> Producing a forth of any quality isn't easy and of course it's a
> never-ending project. The danger is tiring of it before really
> getting to know the language - which IMO is only done through writing
> applications (lots of them).

Agreed. I'm keeping an eye on Forth repositories at GitHub, and I see a
lot of half-finished toy Forths (and very few applications or
libraries). Nothing wrong with that, but it doesn't seem to be a good
way to learn programming in Forth.

Chris Curl

Oct 27, 2015, 9:14:59 AM

Guilty as charged! :D
I am a software developer by trade, and as such, I spend 8-10 hours a day writing code. So I doubt I will be writing "lots" of Forth programs in my spare time. I am one of those people who has many interests. This is just one of my current interests, competing with hockey, vortex-based math, stock trending analysis, woodworking and working on my car. As it turns out, having my own implementation of Forth may play a role in many of those as well, because you can also use it to control a CNC machine, and an engine management system. Then there is that whole machine learning and robotics area that also interests me.

Damn, I need to win the lottery or something. A couple of hours at night is not nearly enough time.

Chris Curl

Oct 27, 2015, 10:01:34 AM

On Tuesday, October 27, 2015 at 6:30:22 AM UTC-4, Lars Brinkhoff wrote:

> I'm keeping an eye on Forth repositories at GitHub, and I see a
> lot of half-finished toy Forths (and very few applications or
> libraries). Nothing wrong with that, but it doesn't seem to be a good
> way to learn programming in Forth.

There may be another one up there soon ... this one being an x86 assembler version I am working on. It has about 50 primitives, and a table of vectors instead of a big switch statement to execute the primitives. It builds an exe that is under 40K.

I don't know why, but I feel like implementing my own Forth is a rite of passage of sorts. If I am no longer able to implement my own Forth compiler/interpreter, then I am not worthy and should not be using Forth.

For some reason though, I don't feel the same way about the other languages, just Forth. Maybe it's because of my history, and the fact that my first Forth implementation was in the 1980s on a 6502. I guess this takes me back to my youth.

Lars Brinkhoff

Oct 27, 2015, 10:04:28 AM

Chris Curl <ccur...@gmail.com> writes:
>> Agreed. I'm keeping an eye on Forth repositories at GitHub, and I see a
>> lot of half-finished toy Forths (and very few applications or
>> libraries). Nothing wrong with that, but it doesn't seem to be a good
>> way to learn programming in Forth.
> Guilty as charged! :D

So am I! :-) :-(

Lars Brinkhoff

Oct 27, 2015, 10:04:55 AM

Chris Curl <ccur...@gmail.com> writes:
> I don't know why, but I feel like implementing my own Forth is a rite
> of passage of sorts. If I am no longer able to implement my own Forth
> compiler/interpreter, then I am not worthy and should not be using
> Forth.

Shouldn't it be self-hosting too? ;-)

Chris Curl

Oct 27, 2015, 5:26:19 PM

On Tuesday, October 27, 2015 at 10:04:55 AM UTC-4, Lars Brinkhoff wrote:

Heh ... yeah .. probably ... that might be a ways off ... :D

Elizabeth D. Rather

Oct 27, 2015, 5:45:32 PM

In my experience, it's easier to build a standalone Forth than one under
an OS.

Lars Brinkhoff

Oct 28, 2015, 2:44:19 AM

Elizabeth D. Rather wrote:

> Chris Curl wrote:
>> Lars Brinkhoff wrote:

>>> Chris Curl wrote:
>>>> I feel like implementing my own Forth is a rite of passage of
>>>> sorts. If I am no longer able to implement my own Forth compiler/
>>>> interpreter, then I am not worthy and should not be using Forth.
>>> Shouldn't it be self-hosting too? ;-)
>> Heh ... yeah .. probably ... that might be a ways off ... :D
> In my experience, it's easier to build a standalone Forth than one
> under an OS.

"Self-hosting" means that the language is implemented in itself, and can
generate new versions of itself without any other language being
involved. E.g. a metacompiled Forth.

HAA

Oct 30, 2015, 9:20:53 AM

Anton Ertl wrote:
> "HAA" <som...@microsoft.com> writes:
> >Anton Ertl wrote:
> >> >PARSE has the same cross-reference which implies retrieved strings
> >> >are limited to 255 chars for a standard program. 3.4.1. has an "unless
> >> >otherwise noted" proviso but I could find no such exemption for PARSE.
> >>
> >> There is no technical reason for that, because PARSE is specified to
> >> produce an address in the input buffer, so it must not use a counted
> >> string as intermediate storage.
> >
> >That's more than 3.4.1 states. It places a limit on any parsed strings in
> >a standard program 'unless otherwise noted' but doesn't say why.
> >
> >> A.6.2.2008 PARSE says:
> >>
> >> |2)
> >> | 3) The count character limits the length of the string returned by
> >> | WORD to 255 characters (longer strings can easily be stored in
> >> | blocks!). This limitation does not exist for PARSE.
> >>
> >> So it was obviously not the intent of the Forth-94 committee to have
> >> this limitation for PARSE,
> >
> >It's suggestive, though I wouldn't consider promotion of a new word in
> >the annex of a standard as authoritative or binding.
>
> Not sure what you mean with "promotion", but yes, the rationale is not
> authoritative. It does give us insight into the intent of the
> committee, though.

Intent? Or opinion of one or more members? Elizabeth might know.

> >> so it probably is a bug in Forth-94 (and
> >> -2012).
> >
> >If the entire input buffer is accessible to the programmer, then a
> >standard program should be able to code PARSE from scratch using
> >SOURCE and >IN without any length limit.
>
> Yes, you can write PARSE using SOURCE, >IN and other standard
> words.

This passage would appear to confirm it.

"A program may directly examine the input buffer using its address and length
as returned by SOURCE"

> - anton
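
To make the point concrete, here is a rough C sketch of a PARSE-like routine built only from what SOURCE and >IN provide. The `source_t` and `span_t` types and the `parse` function are hypothetical stand-ins invented for this illustration, not code from any Forth system. Because the result is an address inside the input buffer plus a full-width length, no count byte is involved and nothing caps the result at 255 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for what SOURCE ( -- c-addr u ) and >IN provide. */
typedef struct {
    const char *addr;   /* start of the input buffer */
    size_t len;         /* number of characters in the buffer */
    size_t in;          /* offset of the parse area, like the contents of >IN */
} source_t;

/* A (c-addr, u) pair; the length is a size_t, not a count byte. */
typedef struct {
    const char *addr;
    size_t len;
} span_t;

/* Return the text up to the next delim (or to the end of the buffer),
   pointing into the input buffer itself, and advance the offset. */
span_t parse(source_t *src, char delim) {
    assert(src->in <= src->len);
    const char *start = src->addr + src->in;
    size_t remaining = src->len - src->in;
    const char *hit = memchr(start, delim, remaining);
    span_t result = { start, hit ? (size_t)(hit - start) : remaining };
    /* Step past the delimiter if one was found. */
    src->in += result.len + (hit ? 1 : 0);
    return result;
}
```

An empty parse area simply yields a zero-length result.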

Elizabeth D. Rather

Oct 31, 2015, 1:07:01 AM

Historically, the reason WORD moved its string was that overwhelmingly
the most searches occur when a program is being compiled, and that is
the process being optimized overall. WORD moved the string to the next
available location in the dictionary, leaving it positioned such that,
if the program is going to construct a definition for it, it doesn't
have to be moved again; all that needs to be done is to fill in other
elements of the new definition. I am not sure that the thought of
leaving the string in the input buffer was actually considered by Chuck.
I don't recall it being debated.
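
The behaviour described above can be sketched in C roughly as follows. This is an illustration under assumptions, not the historical code: the `dictionary` array, the `here` offset and the `word` function are hypothetical names, and details such as a trailing blank are left out. The token is copied to the next free dictionary location as a counted string, and the free-space pointer is left alone so a defining word can build the rest of the header around the name without moving it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned char dictionary[4096];  /* hypothetical dictionary space */
static size_t here = 0;                 /* offset of the next free byte, like HERE */

/* Skip leading delimiters in input[*in .. len), then copy the token to the
   dictionary at `here` as a counted string. `here` is not advanced. */
unsigned char *word(const char *input, size_t len, size_t *in, char delim) {
    while (*in < len && input[*in] == delim) (*in)++;
    size_t start = *in;
    while (*in < len && input[*in] != delim) (*in)++;
    size_t n = *in - start;
    /* The count byte is where the 255-character limit comes from. */
    assert(n <= 255 && here + 1 + n <= sizeof dictionary);
    dictionary[here] = (unsigned char)n;
    memcpy(&dictionary[here + 1], input + start, n);
    if (*in < len) (*in)++;             /* step past the trailing delimiter */
    return &dictionary[here];
}
```

Each call overwrites the previous token, which is harmless as long as the caller either consumes it or advances `here` past it first.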

Certainly we never felt the 255-char to be a serious limitation. In the
late 70's, Chuck wrote an extensive word processing application for
FORTH, Inc. documentation. The polyFORTH Reference Manual, the first ed.
of Starting Forth, and countless other documents were composed using it.
That was probably the most string-intensive program we ever worked on,
but it treated strings as data in ways that were never limited by that
parameter.

As for invoking that limitation in PARSE, it does appear to me now that
it was an oversight, and you might want to fix it.

Cheers,
Elizabeth

>>>> so it probably is a bug in Forth-94 (and
>>>> -2012).
>>>
>>> If the entire input buffer is accessible to the programmer, then a
>>> standard program should be able to code PARSE from scratch using
>>> SOURCE and >IN without any length limit.
>>
>> Yes, you can write PARSE using SOURCE, >IN and other standard
>> words.
>
> This passage would appear to confirm it.
>
> "A program may directly examine the input buffer using its address and length
> as returned by SOURCE"
>
>> - anton
>
>

Paul Rubin

Oct 31, 2015, 2:27:54 AM

"Elizabeth D. Rather" <era...@forth.com> writes:

> In the late 70's, Chuck wrote an extensive word processing application
> for FORTH, Inc. documentation. The polyFORTH Reference Manual, the
> first ed. of Starting Forth, and countless other documents were
> composed using it.

Is that program still in use? Any chance of releasing it? It sounds
neat, and I doubt that having it in the wild would interfere with Forth
Inc. revenue.

Elizabeth D. Rather

Oct 31, 2015, 3:28:29 AM

It certainly wouldn't impact revenue, but I have no idea whether Leon
(who has all the old files, discs, mostly unreadable, etc.) could
resurrect it enough to be useful to you. We ceased using it 30 years
ago! Its design was somewhat like what (little) I understand of LaTeX:
in other words, commands relating to what to do with the text that
followed. Ask him.

Cheers,
Elizabeth

Paul Rubin

Oct 31, 2015, 3:37:28 AM

"Elizabeth D. Rather" <era...@forth.com> writes:
> It certainly wouldn't impact revenue, but I have no idea whether Leon
> (who has all the old files, discs, mostly unreadable, etc.) could
> resurrect it enough to be useful to you. We ceased using it 30 years
> ago!

Yes, it's quite understandable if exhuming the code is too much hassle
to be worthwhile just for curiosity's sake. I don't have any great
desire to actually use it beyond trying it out a little. It's more an
interest in seeing how something like that was written.

> Its design was somewhat like what (little) I understand of LaTeX: in
> other words, commands relating to what to do with the text that
> followed.

That's about what I'd guessed, so I was mostly interested in seeing what
the commands looked like, the implementation technique, what using it
was like, etc. Was it done with blocks and screens, using a block
editor to write the input document?

> Ask him.

I've never met him and he doesn't post here, but maybe someday.

Thanks!

Elizabeth D. Rather

Oct 31, 2015, 3:48:42 AM

Yes. polyFORTH was entirely block-oriented, even after we started
keeping blocks in DOS files.

>> Ask him.
>
> I've never met him and he doesn't post here, but maybe someday.

le...@forth.com should work. He is an active participant in Forth20xx,
but doesn't usually check in to c.l.f.

Chris Curl

Oct 31, 2015, 1:48:37 PM

I have decided that in my own personal Forth, strings will be both counted and NULL terminated. The count is in the first BYTE and does NOT include the NULL terminator.

So "ABC" would use 5 bytes and look like this:

3 65 66 67 0

This makes it easy to work with the strings using either approach. And I find that some activities are more straightforward when stopping at NULL as opposed to keeping a count and stopping when that count exceeds the count byte.

If it turns out that I have a need to deal with longer strings in the future, then I will cross that bridge when I get to it.
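
A small C sketch of that layout, assuming ASCII and a one-byte count. The helper names `store_counted_z`, `counted_len` and `as_cstring` are made up for this example; they are not from any existing Forth or library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store text at dst as: count byte, data characters, NUL terminator.
   dst must have room for strlen(text) + 2 bytes. Returns the bytes used. */
size_t store_counted_z(unsigned char *dst, const char *text) {
    size_t len = strlen(text);
    assert(len <= 255);             /* the count must fit in one byte */
    dst[0] = (unsigned char)len;    /* count does NOT include the terminator */
    memcpy(dst + 1, text, len);
    dst[1 + len] = 0;               /* terminator follows the last character */
    return len + 2;
}

/* Counted view: the length is read directly from the first byte. */
size_t counted_len(const unsigned char *s) {
    return s[0];
}

/* C-string view: skip the count byte and rely on the terminator. */
const char *as_cstring(const unsigned char *s) {
    return (const char *)(s + 1);
}
```

Storing "ABC" this way uses 5 bytes, and both views report a length of 3.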

Elizabeth D. Rather

Oct 31, 2015, 3:57:55 PM

On 10/31/15 7:48 AM, Chris Curl wrote:
> I have decided that in my own personal Forth, strings will be both counted and NULL terminated. The count is in the first BYTE and does NOT include the NULL terminator.
>
> So "ABC" would use 5 bytes and look like this:
>
> 3 65 66 67 0
>
> This makes it easy to work with the strings using either approach. And I find that some activities are more straightforward when stopping at NULL as opposed to keeping a count and stopping when that count exceeds the count byte.

Which operations are those? Most string operations such as FILL, MOVE
and CMOVE are usually implemented as code primitives on an ITC Forth.
The only real use I find for null-terminated strings is for issuing
calls to an OS or C library that requires them (and can be handled with
a small set of special operators). Otherwise, runtime counting is
grossly inefficient, and the effort to place nulls at the ends of your
strings is usually unnecessary.

> If it turns out that I have a need to deal with longer strings in the future, then I will cross that bridge when I get to it.
>

That's a good plan. And when the situation arises, you can add specific
words to do what you need, rather than re-do all your strings and
string-handling words.

Chris Curl

Oct 31, 2015, 4:34:18 PM

On Saturday, October 31, 2015 at 3:57:55 PM UTC-4, Elizabeth D. Rather wrote:
> Which operations are those? Most string operations such as FILL, MOVE
> and CMOVE are usually implemented as code primitives on an ITC Forth.
> The only real use I find for null-terminated strings is for issuing
> calls to an OS or C library that requires them (and can be handled with
> a small set of special operators). Otherwise, runtime counting is
> grossly inefficient, and the effort to place nulls at the ends of your
> strings is usually unnecessary.
>
> > If it turns out that I have a need to deal with longer strings in the future, then I will cross that bridge when I get to it.
> >
>
> That's a good plan. And when the situation arises, you can add specific
> words to do what you need, rather than re-do all your strings and
> string-handling words.
>
> Cheers,
> Elizabeth

Specifically, the routine that parses the input line. I find it easier to just keep going until I hit a NULL, which of course means I am at the end of the line.

And yes, I was planning on writing new words for long strings if need be.

Elizabeth D. Rather

Oct 31, 2015, 6:06:52 PM

Indeed, many implementers have found terminating the input stream with a
null is convenient. I don't think that generalizes to all strings, though.

Rod Pemberton

Nov 2, 2015, 2:34:41 AM

BTW, NULL is used by C for a pointer to non-C-object area,
but NUL is a character of all bits zero for ASCII and EBCDIC,
probably Unicode too. Or, NUL is a C-sized byte of all bits
zero for C where C's byte may be larger than the character
size in bits.

Some issues you might run into are which one to use and
what to do if the length of the nul-terminated string
exceeds what can be stored in the count byte.

I implemented null terminated strings in my Forth, as have
some others here. I've not noticed any serious issues yet.

IIRC, you only need to change one Forth word to work
slightly differently than intended by the specifications,
in a way that isn't in violation of them.

I had planned to insert a byte for the count since null
terminated strings won't pass Hayes Core tests as they're
currently implemented, but I have yet to do so as it will
require reformatting my dictionary code.

Having programmed heavily in both C and PL/1 (a PL/I variant),
one of which uses nul-terminated strings and the other counted
strings, I'm of the strong opinion that nul-terminated strings
are the better solution. However, saying so seems to consistently
incite some flames here.

As Ms. Rather noted, some of Forth's string words are
typically implemented as primitives, many of which almost
map directly onto various C string functions.

CMOVE      strncpy()
ENCLOSE    strchr() or strpbrk()
COMPARE    strcmp()
COUNT      strlen()
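
These pairings are approximate. As a small C sketch of the last row, COUNT reads a stored length byte while strlen() scans for the terminator. The `count` function and the `addr_len` type are hypothetical stand-ins written for this example, not library functions. On a string that is both counted and nul-terminated the two agree, but they differ as soon as the data contains a zero byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An (address, length) pair, as COUNT leaves on the stack. */
typedef struct {
    const char *addr;
    size_t len;
} addr_len;

/* Hypothetical C stand-in for COUNT ( c-addr1 -- c-addr2 u ):
   the length comes from the first byte, with no scanning. */
addr_len count(const unsigned char *counted) {
    assert(counted != NULL);
    addr_len r = { (const char *)(counted + 1), counted[0] };
    return r;
}

/* strlen() on the same data stops at the first zero byte instead. */
size_t scanned_len(const unsigned char *counted) {
    return strlen((const char *)(counted + 1));
}
```

For the bytes 3 65 66 67 0 both give 3; for 3 65 0 67 0 the count byte still says 3 while the scan stops after 1.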

Rod Pemberton