1 /STRING vs SNIP


foxaudio...@gmail.com

Oct 18, 2016, 10:31:17 PM
I noticed a lot of code uses the phrase 1 /STRING so I made a word I call SNIP.

: snip ( addr len -- addr' len' ) 1- swap 1+ swap ;

It's not rocket science, but in my ITC (indirect-threaded code) system SNIP was 3 times faster than
1 /STRING. Seems like a Forthy thing to do if 1 /STRING is used a lot.
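
For readers who want to try the comparison, here is a minimal sketch of SNIP as posted, next to a guarded variant. SAFE-SNIP is a hypothetical name that does not appear in the thread; it is included only because neither SNIP nor 1 /STRING checks for an empty string. The sketch assumes a byte-addressed system (so 1+ steps one character) and a Forth that accepts S" at the interpreter prompt.

```forth
\ SNIP exactly as posted above.
: snip ( addr len -- addr' len' )  1- swap 1+ swap ;

\ SAFE-SNIP is an illustrative name, not from the thread:
\ it leaves an empty string untouched instead of producing len = -1.
: safe-snip ( addr len -- addr' len' )  dup if snip then ;

s" abc" snip type        \ has the same effect as:  s" abc" 1 /string type
```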

BF

(I was going to call it MOHEL, but thought better of it.)
:-)

Ron Aaron

Oct 19, 2016, 12:24:54 AM


On 19/10/2016 5:31, foxaudio...@gmail.com wrote:
> I noticed a lot of code uses the phrase 1 /STRING so I made a word I call SNIP.
>
> : snip ( addr len -- addr' len' ) 1- swap 1+ swap ;

Nice; I also have 'trim' etc for removing whitespace.

>
> (I was going to call it MOHEL, but thought better of it.)
> :-)

Oy ;-)

Anton Ertl

Oct 19, 2016, 3:59:33 AM
foxaudio...@gmail.com writes:
>I noticed a lot of code uses the phrase 1 /STRING so I made a word I call SNIP.
>
>: snip ( addr len -- addr' len' ) 1- swap 1+ swap ;

63 occurrences of "1 /string" in 62393 lines of code. Too little IMO for
having to remember yet another word.

>It's not rocket science but in my ITC system SNIP was 3 times faster than
>1 /STRING.

In VFX:

VFX Forth for Linux IA32 Version: 4.72 [build 0555]
: snip 1- swap 1+ swap ; ok
: foo1 snip ; ok
: foo2 1 /string ; ok
see foo1
FOO1
( 080C0AB0 4B ) DEC EBX
( 080C0AB1 8B5500 ) MOV EDX, [EBP]
( 080C0AB4 42 ) INC EDX
( 080C0AB5 895500 ) MOV [EBP], EDX
( 080C0AB8 C3 ) NEXT,
( 9 bytes, 5 instructions )
ok
see foo2
FOO2
( 080C0AE0 83C3FF ) ADD EBX, -01
( 080C0AE3 8B5500 ) MOV EDX, [EBP]
( 080C0AE6 83C201 ) ADD EDX, 01
( 080C0AE9 895500 ) MOV [EBP], EDX
( 080C0AEC C3 ) NEXT,
( 13 bytes, 5 instructions )

Probably same speed as long as the code working set fits in the cache.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2016: http://www.euroforth.org/ef16/

HAA

Oct 19, 2016, 5:20:18 AM
Is your /STRING written in machine code?




foxaudio...@gmail.com

Oct 19, 2016, 8:12:36 AM
No. I was comparing Forth to Forth.

It is impressive to see what VFX turns /STRING into! Wow. That's tight code.

I will optimize /string.

BF

hughag...@gmail.com

Oct 19, 2016, 7:58:49 PM
On Wednesday, October 19, 2016 at 12:59:33 AM UTC-7, Anton Ertl wrote:
> foxaudio...@gmail.com writes:
> >I noticed a lot of code uses the phrase 1 /STRING so I made a word I call SNIP.
> >
> >: snip ( addr len -- addr' len' ) 1- swap 1+ swap ;
>
> 63 occurrences of "1 /string" in 62393 lines of code. Too little IMO for
> having to remember yet another word.

If Anton Ertl is opposed to your SNIP because you have added another word for the user to remember, then he must absolutely hate my string-stack.4th package! I add about two dozen words for the user to remember! OMG! How would any human being ever learn all these words? It would take Anton Ertl a lifetime to learn how string-stack.4th works, even if he had Bernd Paysan helping him --- learning string-stack.4th is like learning rocket-science; it requires far more intelligence and perseverance than any ANS-Forth programmer could be expected to have! Even learning your SNIP is beyond the ability of most ANS-Forth programmers, and my string-stack.4th is at least an order of magnitude more complicated. Arg! ;-)

In this case, my DISCARD-LEFT$ would do the trick. I don't much like your SNIP because it is too specific in what it does. It only works with 1 char (by comparison; DISCARD-LEFT$ allows the programmer to specify any number of chars). Most importantly however, you don't have any way to manage your strings. You might have the original string still available, and also have the SNIP substring (I use the term "derivative" in my string-stack.4th) available, at the same time --- if you modify one of these strings, you will modify the other one also --- you are forcing the programmer to keep track of which strings are derivative of which other strings --- by comparison, in my string-stack.4th I have "unique" and "derivative" strings, but the system keeps track of which are which, so the programmer can write his code as if every string were unique and not worry about the optimization going on under the hood.

My string-stack.4th package is written entirely in ANS-Forth and should run on any ANS-Forth system. I'm not an ANS-Forth programmer however. Elizabeth Rather says:
-----------------------------------------------------------------------------
...in Forth it's so easy to build data structures that are exactly right for
the particular application at hand that worrying about what pre-built
structures you have and how to use them is just not worth the bother.
-----------------------------------------------------------------------------
She demands that every ANS-Forth program be written using only the raw ANS-Forth words, without any extensions at all. She demands that you use
1 /STRING
in your application program. Anton Ertl (appointed by Elizabeth Rather to be the chair-person of the Forth-200x committee) is parroting Elizabeth Rather here. He is opposed to your SNIP because this causes a programmer of your system to have to worry about what pre-built words are available, which is just not worth the bother. Anton is worried that the reader of your code would encounter SNIP and think:
"What the hell is that??? SNIP is not listed in the ANS-Forth document! Arg! Looking up this SNIP word in the programmer's own documentation is just not
worth the bother --- I will forget about this program --- learning how it
works is like rocket-science!"
In my string-stack.4th I'm treating strings as a "pre-built structure" and providing code that will work in any program that does text processing. In ANS-Forth, we have /STRING (and you have provided SNIP which is a slight upgrade), all of which treat the string as a kind of array, and require the user to directly access the chars in this array. By comparison, in my string-stack.4th, the programmer would never directly access the chars in the strings, and I don't actually provide any way to do so. I could easily upgrade my string-stack.4th to use UTF-8 or UTF-32 or whatever, rather than ascii, and all the code written that uses string-stack.4th would continue to work without modification because it doesn't access chars directly and hence doesn't assume that the chars are any particular format or size.

I'm all about writing general-purpose Forth code. This is why Elizabeth Rather and her committee appointees Anton Ertl and Bernd Paysan, and all of their sycophants, say that I'm not an ANS-Forth programmer.

The following is an excerpt from my documentation. These are just the words used for extracting sub-strings from strings. I also have a lot more words used for searching in strings. Oftentimes, you will search first to find your substring, then extract a substring or substrings based on that information.

Anyway, here is the excerpt:
-----------------------------------------------------------------------------

LEN$ ( -- length ) \ string: a --
This returns the length of the string on the data-stack. This consumes the
string on the string-stack (Forth functions traditionally consume their
arguments), so if this is used and you still need the string, then DUP$ or
OVER$ or whatever should be used to keep a copy on the string-stack.

MID$ ( start-index length -- ) \ string: a -- b
The B string is a substring in the middle of the A string.

ANTI-MID$ ( start-index length -- ) \ string: a -- b
Returns the string with the middle part extracted (what MID$ would have
returned is not returned, but instead the edge parts concatenated together are
returned).

INNER$ ( start-index limit-index -- ) \ string: a -- b
This is like MID$ except that it uses a LIMIT-INDEX rather than a LENGTH (this
is somewhat like Mark Wills' MID$ and, to the best of my recollection, like
the QBASIC MID$). Note that the LIMIT-INDEX is 1 beyond the middle-part that
is kept (LIMIT-INDEX minus START-INDEX equals length).

ANTI-INNER$ ( start-index limit-index -- ) \ string: a -- b
Returns the string with the middle part extracted (what INNER$ would have
returned is not returned, but instead the edge parts concatenated together are
returned). Note that the LIMIT-INDEX is 1 beyond the middle-part that is
extracted (LIMIT-INDEX minus START-INDEX equals length).

LEFT$ ( length -- ) \ string: a -- b
This provides a substring of length LENGTH from the left side of the string.

RIGHT$ ( length -- ) \ string: a -- b
This provides a substring of length LENGTH from the right side of the string.

DISCARD-LEFT$ ( length -- ) \ string: a -- b
This discards a substring of length LENGTH from the left side of the string.

DISCARD-RIGHT$ ( length -- ) \ string: a -- b
This discards a substring of length LENGTH from the right side of the string.

FILL$ ( length char -- ) \ string: -- a
This produces a string filled with CHAR of length LENGTH.

BLANK$ ( length -- ) \ string: -- a
This produces a string filled with blanks of length LENGTH.

LPAD$ ( length -- ) \ string: a -- b
This pads the string with blanks on the left side so the total length is
LENGTH --- if the length of A is less than LENGTH nothing is done.

RPAD$ ( length -- ) \ string: a -- b
This pads the string with blanks on the right side so the total length is
LENGTH --- if the length of A is less than LENGTH nothing is done.

LTRIM$ ( -- ) \ string: a -- b
This trims the whitespace from the left side of the string.

RTRIM$ ( -- ) \ string: a -- b
This trims the whitespace from the right side of the string.

TRIM$ ( -- ) \ string: a -- b
This trims the whitespace from the left and right sides of the string.

BLACKEN$ ( -- ) \ string: a -- b
This removes all the whitespace from the entire string.

HAA

Oct 21, 2016, 1:02:41 AM
hughag...@gmail.com wrote:
> On Wednesday, October 19, 2016 at 12:59:33 AM UTC-7, Anton Ertl wrote:
> > foxaudio...@gmail.com writes:
> > >I noticed a lot of code uses the phrase 1 /STRING so I made a word I call SNIP.
> > >
> > >: snip ( addr len -- addr' len' ) 1- swap 1+ swap ;
> >
> > 63 occurrences of "1 /string" in 62393 lines of code. Too little IMO for
> > having to remember yet another word.
>
> If Anton Ertl is opposed to your SNIP because you have added another word for the user
> to remember, then he must absolutely hate my string-stack.4th package! I add about two
> dozen words for the user to remember!
> ...
> In this case, my DISCARD-LEFT$ would do the trick. I don't much like your SNIP because
> it is too specific in what it does. It only works with 1 char (by comparison;
> DISCARD-LEFT$ allows the programmer to specify any number of chars). Most importantly
> however, you don't have any way to manage your strings.

/STRING isn't intended for that. It's a low-level primitive whose scope and uses
extend beyond string packages.




hughag...@gmail.com

Oct 21, 2016, 3:51:39 AM
Well, memory management is important --- this is why languages with GC are popular --- the memory-management in my string-stack.4th is simpler than GC, but it does relieve the programmer from the error-prone tedium of allocating and freeing memory for strings.

For Forth to be useful, we have to get past using low-level primitives directly in programs --- this makes our programs primitive --- it is extremely frustrating to me that ANS-Forth holds us at this 1970s level forever.

My string-stack.4th was written in ANS-Forth, but I knew that it would never be used by ANS-Forth programmers --- it was written mostly as something that would go into Straight Forth.

Because the string-stack is a standardized part of Straight Forth, I can dedicate a register to be the string-stack pointer; plus we have the PTR and CNT registers that can represent an array --- this should boost the efficiency of the string-stack a lot --- text-processing is a big part of what desktop computers do, so every language needs to provide support (lack of support for text processing is one reason why C has lost popularity).

Note that application programmers don't have to mess with PTR and CNT --- they can leave that low-level primitive stuff to systems programmers --- application programmers are strongly discouraged from accessing chars directly, but are encouraged to only work with strings on the string-stack.

Everybody criticizes me for taking a long time to write Straight Forth. I don't think that is fair. Now it is almost a quarter of a century since ANS-Forth became the standard, and ANS-Forth is still not useful --- the ANS-Forth programmers are still dinking around with /STRING and other low-level primitives --- it doesn't matter when I come out with Straight Forth, because it will be the first-ever useful Forth system when it does come out.

Forth-200x has no chance of being useful at all --- it lacks quotations, which are the basis for all general-purpose data-structures --- it is a complete waste of time.

hughag...@gmail.com

Oct 21, 2016, 3:57:37 AM
On Friday, October 21, 2016 at 12:51:39 AM UTC-7, hughag...@gmail.com wrote:
> ...text-processing is a big part of what desktop computers do, so every language needs to provide support (lack of support for text processing is one reason why C has lost popularity).

Just as a fun experiment, I might rewrite string-stack.4th in C. It will never be used by ANS-Forth programmers, as they absolutely hate it --- it might be accepted by C programmers though --- how ironic would that be?

trans

Oct 21, 2016, 6:49:55 AM
Don't quite get anti-mid vs anti-inner. Also their names seem odd.

P.S. I know what it is to feel like the outsider, having strong opinions that differ from the mainstream group. It can be very frustrating! But chin up, I for one like these little libs. Why reinvent the wheel?

trans

Oct 21, 2016, 7:07:20 AM
Forth APIs seem very confusing, as to which ones work with which Forths. And varying conventions.

Anyway this looked interesting:
http://www-personal.umich.edu/~williams/archive/forth/strings/fstrings/fstrings.txt

trans

Oct 21, 2016, 8:20:33 AM
On Friday, October 21, 2016 at 1:02:41 AM UTC-4, HAA wrote:

> /STRING isn't intended for that. It's a low-level primitive whose scope and uses
> extend beyond string packages.

Then why such a name?

HAA

Oct 21, 2016, 9:26:48 AM
Why not such a name? The stack picture and specification makes clear it's dealing
with an arbitrary array of characters somewhere in memory. The OTA project spec
has a string concatenation primitive +STRING similarly intended for general use.
I use these functions all the time. The following example is from a previous posting.
The message it displays is, I think, particularly apt.

\ SKIP SCAN exist in most Forths as primitives
\ but can be coded from scratch

: +STRING ( a1 u1 a2 u2 -- a2 u3)
  2swap swap 2over + 2 pick cmove + ;

: SPLIT ( a u char -- a2 u2 a3 u3 )
  >r 2dup r> scan 2swap 2 pick - ;

: /WORD ( a u -- a1 u1 a2 u2 )
  bl skip bl split 2swap ;

: +WORD ( a1 u1 a2 u2 -- a u )
  +string s"  " 2swap +string ;

: REVERSE-WORDS ( src len dest -- dest len )
  >r 0 begin >r dup while /word r> 1+ repeat 2drop
  2r> 0 tuck ?do +word loop -trailing ;

: .rvs ( a u -- ) pad reverse-words cr type ;

: plug ( -- ) cr
  s" words user-created few a with things do often can Forth" .rvs
  s" accomplish. to libraries require languages other that" .rvs ;

plug



Albert van der Horst

Oct 21, 2016, 9:41:15 AM
In article <9c78f98e-09a2-491e...@googlegroups.com>,
Sensible and solid piece of work.
However.

After decades of working with my 5-word ministring package, I have never
encountered a situation where I would rather use Williams' package.

I'm now fully occupied with BASIC, Lisp and Pascal compilers. For these I'm
inclined to go special-purpose, and get dynamic strings tightly
coupled to my ALLOCATE (including "Hugh's" SIZE).

Groetjes Albert
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

trans

Oct 21, 2016, 12:13:51 PM
On Friday, October 21, 2016 at 9:26:48 AM UTC-4, HAA wrote:

> Why not such a name?

B/c the day you finally come into modernity and use UTF-8, your word name isn't going to jibe.

Although I suppose you could redefine the meaning of the word "string", the rest of the programming world be damned.

Anton Ertl

Oct 21, 2016, 12:31:23 PM
trans <tran...@gmail.com> writes:
>B/c the day you finally come into modernity and use UTF-8, your word name isn't going to jibe.

/STRING works nicely for stepping by n bytes in an UTF-8 string.

And if you want to step by 1 code point in an UTF-8 string, there's
+X/STRING.
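
To make the byte-versus-code-point distinction concrete, here is a portable sketch that steps over one UTF-8 code point using only /STRING. It is not the implementation of +X/STRING; UTF8-CONT? and UTF8-STEP are hypothetical names, and the sketch assumes byte-addressed memory, well-formed UTF-8 input, and the Forth-2012 `$` hex prefix.

```forth
\ Illustrative names; assumes well-formed UTF-8 in byte-addressed memory.
: utf8-cont? ( c -- flag )  $C0 and $80 = ;   \ continuation byte 10xxxxxx?

: utf8-step ( addr u -- addr' u' )   \ skip one code point
  dup 0= if exit then                \ empty string: nothing to skip
  1 /string                          \ drop the lead byte
  begin
    dup if over c@ utf8-cont? else false then
  while
    1 /string                        \ drop each continuation byte
  repeat ;
```

On a system that provides the extended-character wordset, the standard word named above would normally be preferred over a hand-written loop like this one.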

Albert van der Horst

Oct 21, 2016, 1:57:31 PM
In article <2016Oct2...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>trans <tran...@gmail.com> writes:
>>B/c the day you finally come into modernity and use UTF-8, your word name isn't going to jibe.
>
>/STRING works nicely for stepping by n bytes in an UTF-8 string.
>
>And if you want to step by 1 code point in an UTF-8 string, there's
>+X/STRING.

C@ C! CMOVE /STRING COMPARE have all gradually moved from meaning strings
to meaning bytes. It would be weird if all implementations change
because of change in English language habits.

E.g. all operators in ciforth starting with $ mean rows of bytes.
Normally in this primitive Forth you would have one ASCII character
per byte, but you're in no way restricted in what you store in them.
A compact serial representation of a factorisation that accommodates
unrestricted-precision big numbers comes to mind. (In a transputer
demonstration program.)

>
>- anton

trans

Oct 21, 2016, 5:19:59 PM
On Friday, October 21, 2016 at 1:57:31 PM UTC-4, Albert van der Horst wrote:
> In article <2016Oct2...@mips.complang.tuwien.ac.at>,
> Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> >trans <tran...@gmail.com> writes:
> >>B/c the day you finally come into modernity and use UTF-8, your word name isn't going to jibe.
> >
> >/STRING works nicely for stepping by n bytes in an UTF-8 string.
> >
> >And if you want to step by 1 code point in an UTF-8 string, there's
> >+X/STRING.
>
> C@ C! CMOVE /STRING COMPARE have all gradually moved from meaning strings
> to meaning bytes. It would be weird if all implementations change
> because of change in English language habits.

Ah, I see. So that has already happened. I'm learning a lot about the history and culture of Forth. Thanks.

HAA

Oct 21, 2016, 10:47:45 PM
Albert van der Horst wrote:
> ...
> C@ C! CMOVE /STRING COMPARE have all gradually moved from meaning strings
> to meaning bytes.

They were 'bytes' from the very beginning. From Forth-79 ...

C! n addr -- 219 "c-store"
Store the least significant 8-bits of n at addr.

C@ addr -- byte 156 "c-fetch"
Leave on the stack the contents of the byte at addr (with
higher bits zero, in a 16-bit field).

CMOVE addr1 addr2 n -- 153 "c-move"
Move n bytes beginning at address addr1 to addr2. The
contents of addr1 is moved first proceeding toward high
memory. If n is zero nothing is moved.




Hans Bezemer

Oct 22, 2016, 6:25:33 AM
foxaudio...@gmail.com wrote:

> I noticed a lot of code uses the phrase 1 /STRING so I made a word I call
> SNIP.
>
> : snip ( addr len -- addr' len' ) 1- swap 1+ swap ;
>
> It's not rocket science but in my ITC system SNIP was 3 times faster than
> 1 /STRING. Seems like a Forthy thing to do if 1 /STRING is used a lot.
4tH has got a similar word called CHOP. I use it more regularly than /STRING
(can't even remember when I used that one for the last time. Must have been
while I was porting something). IMHO it is a useful abstraction.

Hans Bezemer

Stephen Pelc

Oct 22, 2016, 7:34:14 AM
On Fri, 21 Oct 2016 20:06:08 +0200 (CEST),
alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:

>C@ C! CMOVE /STRING COMPARE have all gradually moved from meaning strings
>to meaning bytes. It would be weird if all implementations change
>because of change in English language habits.

This is formalised in the Forth 2012 standard. Unless otherwise
qualified in the text, the term 'character' means a 'primitive
character' (pchar), which can contain any value. On most CPUs,
a pchar is a byte.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

Anton Ertl

Oct 22, 2016, 9:52:25 AM
ste...@mpeforth.com (Stephen Pelc) writes:
>On Fri, 21 Oct 2016 20:06:08 +0200 (CEST),
>alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
>
>>C@ C! CMOVE /STRING COMPARE have all gradually moved from meaning strings
>>to meaning bytes. It would be weird if all implementations change
>>because of change in English language habits.
>
>This is formalised in the Forth 2012 standard. Unless otherwise
>qualified in the text, the term 'character' means a 'primitive
>character' (pchar), which can contain any value. On most CPUs,
>a pchar is a byte.

And at the last meeting we standardized "1 chars = 1". So if the
address unit is a byte, /STRING works on bytes.

hughag...@gmail.com

Oct 22, 2016, 11:02:03 PM
On Friday, October 21, 2016 at 3:49:55 AM UTC-7, trans wrote:
> Don't quite get anti-mid vs anti-inner. Also their names seem odd.

They are simple and obvious. Here is the excerpt:
--------------------------------------------------------------------
MID$ ( start-index length -- ) \ string: a -- b
The B string is a substring in the middle of the A string.

ANTI-MID$ ( start-index length -- ) \ string: a -- b
Returns the string with the middle part extracted (what MID$ would have
returned is not returned, but instead the edge parts concatenated together are
returned).

INNER$ ( start-index limit-index -- ) \ string: a -- b
This is like MID$ except that it uses a LIMIT-INDEX rather than a LENGTH (this
is somewhat like Mark Wills' MID$ and, to the best of my recollection, like
the QBASIC MID$). Note that the LIMIT-INDEX is 1 beyond the middle-part that
is kept (LIMIT-INDEX minus START-INDEX equals length).

ANTI-INNER$ ( start-index limit-index -- ) \ string: a -- b
Returns the string with the middle part extracted (what INNER$ would have
returned is not returned, but instead the edge parts concatenated together are
returned). Note that the LIMIT-INDEX is 1 beyond the middle-part that is
extracted (LIMIT-INDEX minus START-INDEX equals length).
--------------------------------------------------------------------

MID$ takes a start-index and a length
INNER$ takes a start-index and a limit-index
These two extract a sub-string from the middle of a string.
This is how INNER$ is defined:

: inner$ ( start-index limit-index -- ) \ string: a -- b
over - mid$ ;

ANTI-MID$ and ANTI-INNER$ are like MID$ and INNER$ except that instead of returning the sub-string in the middle, they discard the middle part and return the rest of the string (the two edge sub-strings concatenated together).
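
The difference can be sketched on plain ( addr len ) strings. This is not the string-stack.4th code: MID, INNER, ANTI-MID and ANTI-INNER are hypothetical stand-ins, indices are assumed to be 0-based, there is no bounds checking, and the caller supplies a destination buffer for the concatenated result. +STRING is HAA's definition from earlier in the thread.

```forth
\ HAA's +STRING: append a1 u1 to the end of a2 u2.
: +string ( a1 u1 a2 u2 -- a2 u3 )
  2swap swap 2over + 2 pick cmove + ;

\ Keep the middle part.
: mid   ( addr len start n -- addr' n )      >r /string drop r> ;
: inner ( addr len start limit -- addr' n )  over - mid ;

\ Drop the middle part; join the two edges in the buffer at DEST.
: anti-mid ( addr len start n dest -- dest len' )
  >r over + >r          \ R: dest start+n
  >r over r>            \ addr len addr start      left edge
  2swap r> /string      \ left right               right edge
  2swap r> 0 +string    \ right dest start         left copied to dest
  +string ;             \ dest len'                right appended

: anti-inner ( addr len start limit dest -- dest len' )
  >r over - r> anti-mid ;

s" abcdef" 2 3 mid type              \ cde
s" abcdef" 2 3 pad anti-mid type     \ abf
```

In both pairs the only difference between the two words is whether the second index is a length or a limit.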

What don't you get?

hughag...@gmail.com

Oct 22, 2016, 11:13:37 PM
On Friday, October 21, 2016 at 3:49:55 AM UTC-7, trans wrote:
> P.S. I know what it is to feel like the outsider, having strong opinions that differ from the mainstream group. It can be very frustrating! But chin up, I for one like these little libs. Why reinvent the wheel?

I'm not reinventing the wheel. The memory-management system in string-stack.4th is my own invention. Nobody else has ever done anything like this.

The history here is that Mark Wills suggested to me that I write a string-stack package that used the heap. My original plan was pretty simple --- every time that a string is duplicated (with DUP$ OVER$ etc.) I would ALLOCATE a new string and copy the old string to the new string --- this way, any changes made to one string won't affect the other string.

This original plan is actually quite slow however, so it wasn't acceptable to me. My invention was the "unique" and "derivative" strings. A derivative string is just a reference to the unique string, so ALLOCATE is not necessary, and copying the string is not necessary. Only if the derivative string is going to change does it get converted into a unique string. Also, if a unique is going to change, then all of its derivatives get converted into uniques so they don't get affected. A lot of what you do with strings doesn't change them, so in many cases it is not necessary to convert derivatives into uniques. For example, printing out a string doesn't change it.
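
The copy-on-change idea described above can be sketched roughly as follows. This is not the string-stack.4th implementation, which is not shown in this thread. The three-cell descriptor ( addr len owned? ) and the names $LEN, $OWNED, DERIVE and UNIQUIFY are all assumptions made for illustration, and the sketch omits the other half of the scheme (converting every derivative when its unique string is about to change).

```forth
\ Hypothetical descriptor: cell 0 = addr, cell 1 = len, cell 2 = owned? flag.
: $len   ( desc -- a-addr )  cell+ ;
: $owned ( desc -- a-addr )  2 cells + ;

\ Make NEW a derivative of SRC: share the text, no ALLOCATE, no copy.
: derive ( src-desc new-desc -- )
  over @ over !                  \ same address
  over $len @ over $len !        \ same length
  false swap $owned !  drop ;    \ but not the owner of the text

\ Call before modifying: a derivative gets its own private copy.
: uniquify ( desc -- )
  dup $owned @ if drop exit then
  dup $len @ allocate throw      \ desc new-addr
  over @ over 3 pick $len @ move \ copy the shared text
  over !                         \ point the descriptor at the copy
  true swap $owned ! ;
```

Read-only operations such as TYPE would never call UNIQUIFY, which is where the saving over copying on every duplication would come from.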

I've never looked at any other string packages so I don't really know what they are doing, but I feel confident that my string-stack.4th is far superior to them --- I think up ideas, and they work --- nobody else is doing any thinking at all; I don't need to look at their stuff.

trans

Oct 23, 2016, 12:09:18 AM
In that case disregard everything I said.

HAA

Oct 23, 2016, 8:15:56 PM
Anton Ertl wrote:
> ste...@mpeforth.com (Stephen Pelc) writes:
> >On Fri, 21 Oct 2016 20:06:08 +0200 (CEST),
> >alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
> >
> >>C@ C! CMOVE /STRING COMPARE have all gradually moved from meaning strings
> >>to meaning bytes. It would be weird if all implementations change
> >>because of change in English language habits.
> >
> >This is formalised in the Forth 2012 standard. Unless otherwise
> >qualified in the text, the term 'character' means a 'primitive
> >character' (pchar), which can contain any value. On most CPUs,
> >a pchar is a byte.
>
> And at the last meeting we standardized "1 chars = 1". So if the
> address unit is a byte, /STRING works on bytes.
>
> - anton

That option already existed under Forth-94. It was only necessary to
state the assumption. All the 200x decision does is encourage
sloppiness - code that under Forth-94 would be considered poorly
written. Presumably code that uses CHARS in some places and
not others, CHAR+ in some places and 1+ in others, in a willy-nilly
fashion is now perfectly legal under 200x ?

CHARS and CHAR+ are still there (and need to be) along with all the
operators that deal with chars. Like 0= vs NOT they add clarity and
readability to code even if on most machines they're just synonyms.



Albert van der Horst

Oct 23, 2016, 8:52:16 PM
In article <nujjrn$579$1...@gioia.aioe.org>, HAA <som...@microsoft.com> wrote:
>Anton Ertl wrote:
>> ste...@mpeforth.com (Stephen Pelc) writes:
>> >On Fri, 21 Oct 2016 20:06:08 +0200 (CEST),
>> >alb...@cherry.spenarnc.xs4all.nl (Albert van der Horst) wrote:
>> >
>> >>C@ C! CMOVE /STRING COMPARE have all gradually moved from meaning strings
>> >>to meaning bytes. It would be weird if all implementations change
>> >>because of change in English language habits.
>> >
>> >This is formalised in the Forth 2012 standard. Unless otherwise
>> >qualified in the text, the term 'character' means a 'primitive
>> >character' (pchar), which can contain any value. On most CPUs,
>> >a pchar is a byte.
>>
>> And at the last meeting we standardized "1 chars = 1". So if the
>> address unit is a byte, /STRING works on bytes.
>>
>> - anton
>
>That option already existed under Forth-94. It was only necessary to
>state the assumption. All the 200x decision does is encourage
>sloppiness - code that under Forth-94 would be considered poorly
>written. Presumably code that uses CHARS in some places and
>not others, CHAR+ in some places and 1+ in others, in a willy-nilly
>fashion is now perfectly legal under 200x ?

If I use a byte array, e.g. for the famous sieve of Eratosthenes,
because it only needs to contain flags, I'm using C@ C!
These are byte operators to me. If I add CHARS here and there,
that would suggest that we are dealing with characters.
Address units should be fundamental. Then you document how many
chars you can store in an address unit, or how many address units
it takes to store a character. (And how many address units a cell is.)

>
>CHARS and CHAR+ are still there (and need to be) along with all the
>operators that deal with chars. Like 0= vs NOT they add clarity and
>readability to code even if on most machines they're just synonyms.

I never can tell MOVE and CMOVE apart.

NOT to invert a flag is better than 0= because the latter suggests
arithmetic, not logic. Too bad we can't use it anymore, so
I have to start all my programs with
: NOT 0= ;

hughag...@gmail.com

Oct 23, 2016, 9:36:54 PM
I have said that I want Straight Forth to be a grassroots effort, and that I am willing to listen to input from Forth programmers.

I said above: "I don't need to look at their stuff." What I meant here, is that I want to avoid a situation like we have in Forth-200x. On the Forth-200x mailing-list, there are a lot of people who seem to know very little about Forth, but just think that it is pretty awesome that they get to participate in the design of the Forth-200x standard. The result is a lot of tedious discussions about nonsense. I want input on Straight Forth --- but only if the person is an actual programmer --- only if the person has something worthwhile to say, and isn't just playing big-expert make-believe.

Another bad thing about Forth-200x is that ultimately Leon Wagner just decides what will be Forth-200x, and he is mostly focused on keeping SwiftForth legacy code going. For example, locals have been defined with { ... } since the late 1980s (the Johns Hopkins format), but Leon Wagner killed this in Forth-200x because SwiftForth uses { ... } for multi-line comments. I don't want to be like Leon Wagner and pretend to be interested in people's input, but to not actually care in the slightest. If a person has input that contradicts my ideas, I'm willing to change Straight Forth if this is shown to be better. It is a waste of everybody's time to solicit input and then ignore it, the way that Leon Wagner does.

I think that the whole purpose of the Forth-200x mailing list is just to collect names of "contributors." Even if a person withdraws from the mailing list and withdraws his support of Forth-200x, the Forth-200x committee will still list that person's name as a contributor. They just want people's names, so they can list them as contributors (which is assumed to mean supporters), but they don't actually want the people's input.

I do want people's input --- but only if they have given some thought to the subject --- if a person just wants to have fun being a big expert, but has nothing to contribute except idiot blather, then the person should stay with Forth-200x because I have no patience for this.

HAA

Oct 23, 2016, 10:30:17 PM
Albert van der Horst wrote:
> ...
> I never can tell MOVE and CMOVE apart.

You should because they're very different functions. MOVE handles
overlaps (an extra test) and on some implementations can actually
be faster. CMOVE is one-directional and explicitly char-at-a-time.

And I didn't even need to look it up :)
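
A small sketch of the difference, using an overlapping copy one character to the right (destination = source + 1). BUF1 and BUF2 are hypothetical buffer names, and the sketch assumes a byte-addressed system so that 1+ steps one character.

```forth
create buf1  char a c, char b c, char c c, char d c,
create buf2  char a c, char b c, char c c, char d c,

buf1 dup 1+ 3 cmove   \ low-to-high, char at a time: the first char propagates
buf2 dup 1+ 3 move    \ overlap-safe: the source is copied intact

buf1 4 type   \ aaaa
buf2 4 type   \ aabc
```

The propagating behaviour of CMOVE is sometimes used deliberately as a fill idiom, which is one reason the two words are not interchangeable.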



Anton Ertl

Oct 24, 2016, 7:36:57 AM
"HAA" <som...@microsoft.com> writes:
>Anton Ertl wrote:
>> And at the last meeting we standardized "1 chars = 1". So if the
>> address unit is a byte, /STRING works on bytes.
...
>That option already existed under Forth-94.

Yes, it was an option, and all popular systems implemented this
option, and many programs rely on it. I.e., it is common practice,
and that's why we standardized it.

>It was only necessary to
>state the assumption. All the 200x decision does is encourage
>sloppiness - code that under Forth-94 would be considered poorly
>written.

There are some people who consider such code poorly written, and many
that don't (as evidenced by the very common practice of writing such
code). And that's pretty independent of the standard. E.g., there
are people who consider code poorly written if it is not written to be
independent of the canonical flag representation, while every Forth
standard has specified a canonical flag representation.

>Presumably code that uses CHARS in some places and
>not others, CHAR+ in some places and 1+ in others, in a willy-nilly
>fashion is now perfectly legal under 200x ?

Not sure what you mean by willy-nilly. Forth-94 and Forth-2012
programs can also use 1+, CHAR+, and CHARS in the same program. In
the next standard a standard program can use 1+ where a Forth-2012
program should have used CHAR+ and use nothing where a Forth-2012 program
should have used CHARS or 1 CHARS /.

>CHARS and CHAR+ are still there (and need to be)

They need to be there for compatibility with standard Forth-2012
programs; they also allow people to use them who feel the urge to use
these words for whatever reasons.
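
To make the point concrete, here is a small sketch of the two styles under discussion. The word names SKIP-CHAR-94 and SKIP-CHAR-BYTE are hypothetical, chosen only for this illustration:

```forth
\ Hypothetical names; both words drop the first character of a string.
: skip-char-94   ( c-addr u -- c-addr' u' )  1- swap char+ swap ;   \ Forth-94 style
: skip-char-byte ( c-addr u -- c-addr' u' )  1- swap 1+    swap ;   \ relies on 1 CHARS = 1

1 chars .   \ prints 1 on systems where a char occupies one address unit
```

On a system where 1 CHARS is 1, the two definitions compute the same result, which is also what 1 /STRING gives.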

hughag...@gmail.com

Oct 24, 2016, 10:44:53 PM
On Monday, October 24, 2016 at 4:36:57 AM UTC-7, Anton Ertl wrote:
> "HAA" <som...@microsoft.com> writes:
> >Presumably code that uses CHARS in some places and
> >not others, CHAR+ in some places and 1+ in others, in a willy-nilly
> >fashion is now perfectly legal under 200x ?
>
> Not sure what you mean by willy-nilly. Forth-94 and Forth-2012
> programs can also use 1+, CHAR+, and CHARS in the same program. In
> the next standard a standard program can use 1+ where a Forth-2012
> program should have used CHAR+ and use nothing where a Forth-2012 program
> should have used CHARS or 1 CHARS /.
>
> >CHARS and CHAR+ are still there (and need to be)
>
> They need to be there for compatibility with standard Forth-2012
> programs; they also allow people to use them who feel the urge to use
> these words for whatever reasons.

ANS-Forth and Forth-200x are both products of the 1970s. ANS-Forth's /STRING is to programming what bell-bottom jeans are to fashion --- cool in its day, but not something that anybody cares about now.

In Straight Forth the programmer will almost never directly access chars. You only work with strings --- some strings may have only one char in them --- this is the same as with my string-stack.4th that is written in ANS-Forth.

The char in Straight Forth could be an 8-bit ASCII char, or a 32-bit UTF-32 char --- the user doesn't care if they are 1 byte or 4 bytes, because he doesn't directly access the chars (I'm actually using ASCII right now, although later on I will upgrade to UTF-32).

Cells in Straight Forth are 64-bit (8 bytes), with numbers being fixed-point with an assumed unity of 2^32. All arrays have to have an element size that is a multiple of the cell size (the cell size is called W rather than CELL). This includes strings. Chars are UTF-32. So, a string element is a cell with only the integer part containing data but nothing in the fractional part --- this is wasteful of memory, as only 32 bits are needed for UTF-32, but 64 bits are being used --- I don't care because modern computers have plenty of memory.
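
As a rough illustration of arithmetic on such numbers, here is a sketch in standard Forth, not Straight Forth code. It assumes a system with 64-bit cells, and the names UNITY, INT>FX, FX* and FX/ are hypothetical:

```forth
\ Sketch only, assuming 64-bit cells; all names here are hypothetical.
1 32 lshift constant unity                 \ the value 1.0 when unity is 2^32

: int>fx ( n -- x )    unity * ;           \ integer to fixed-point
: fx*    ( x y -- z )  unity */ ;          \ */ keeps a double-cell intermediate product
: fx/    ( x y -- z )  unity swap */ ;     \ computes x * unity / y

3 int>fx 2 int>fx fx/  unity 2/ 3 * = .    \ 3.0 / 2.0 equals 1.5, prints -1 (true)
```

The point of using */ is that the intermediate product can exceed one cell without losing bits.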

I will provide functions to compress strings into UTF-8 for compact storage, and to uncompress them back into the format described above for working with them. That could be useful for programs that work with a large amount of text --- a text-editor for example --- it might not be needed though, as we do have virtual memory nowadays.

Almost all of the design decisions of ANS-Forth assume the PDP-11, which had only eight 16-bit registers and only 64KB memory total. Charles Moore wrote Forth for the PDP-11 because it was state of the art at the time, and Elizabeth Rather has been carrying his design decisions onward for decades --- she acts like she is Moses and these design decisions were written in stone by the finger of God --- this is despite the fact that these design decisions became obsolete at about the same time that Charles Moore got kicked out of Forth Inc. (1982).

hughag...@gmail.com

Oct 25, 2016, 7:38:14 PM
I skimmed over that article FSTRINGS.TXT --- he isn't doing any memory-management, so it is not directly interesting to me --- I might be willing to switch over to his naming convention though; mine is based on QBASIC which he didn't like.

He has some functions that I don't have. Rotating strings, for example --- I could implement that if there were a demand --- I don't know what it is for.

He has some functions for treating strings as sets --- I could implement that too --- I'm also unclear on what that is for, but it seems more likely.

David N. Williams

Oct 26, 2016, 10:50:51 AM
On 10/25/16 7:38 PM, hughag...@gmail.com wrote:
> On Friday, October 21, 2016 at 4:07:20 AM UTC-7, trans wrote:
>> [...]
>>
>> Anyway this looked interesting:
>> http://www-personal.umich.edu/~williams/archive/forth/strings/
>> fstrings/fstrings.txt
>>
>
> I skimmed over that article FSTRINGS.TXT --- he isn't doing any
> memory-management, so it is not directly interesting to me --- I might
> be willing to switch over to his naming convention though; mine is
> based on QBASIC which he didn't like.

Just in case there's any confusion, the article is not by me, but rather
by Marcus Gabriel, as explained here:

http://www.umich.edu/~williams/archive/forth/strings/fstrings/readme.txt

My own dynamic-strings package,

http://www.umich.edu/~williams/archive/forth/strings/dstrings.html

with string stack and garbage collection, was apparently too elaborate
to have been taken up by Forthists, although I am fairly proud of it.

My more conventional ANS Forth string library

http://www.umich.edu/~williams/archive/forth/strings/mstrings.html

works with the data stack and buffers allocated by the user. It is lean,
to me at least.

-- David W.

David N. Williams

Oct 26, 2016, 10:53:02 AM
On 10/26/16 10:50 AM, David N. Williams wrote:
>
> Just in case there's any confusion, the article is not by me, but rather
> by Marcus Gabriel, as explained here:

Oops! It's by George Hawkins!

-- David W.

Albert van der Horst

Oct 26, 2016, 11:24:04 AM
In article <nuqfri$k4m$1...@dont-email.me>,
I see with respect the neatness and the amount of work that has gone
into this. However, if you're honest, can you testify that all
those words have been used at least once in your life, or by someone else
who uses the package?

If I look at the way I handle strings, I always use a fixed buffer,
and merely 5 words that I can remember easily.
Lately I've added .format that uses %xxx with xxx in a user extendable
wordlist.

The only thing I miss is the word $\ (string-slash-from-behind)
analogous to $/ (string slash) :
This would do
"/home/scan/mimi/test090910.tif" &/ $\ TYPE TYPE

/home/scan/mimi test090910.tif
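
A possible sketch of such a word in standard Forth, going only by the behaviour shown in the example above. LAST-INDEX and SPLIT-LAST are hypothetical names, and the handling of a missing delimiter is an assumption:

```forth
\ Sketch only: LAST-INDEX and SPLIT-LAST are hypothetical names.

\ Index of the last occurrence of char c in the string, or -1 if absent.
: last-index ( c-addr u c -- i )
   >r
   begin dup 0> while
      1-  2dup chars + c@ r@ = if  nip r> drop exit  then
   repeat
   2drop r> drop -1 ;

\ Split at the last c; the part before it ends up on top, ready for TYPE.
\ Assumption: if c is absent, the head is empty and the tail is the whole string.
: split-last ( c-addr u c -- tail-addr tail-u head-addr head-u )
   >r 2dup r> last-index >r      ( c-addr u ) ( R: i )
   over r@ 1+ chars +            ( c-addr u tail-addr )
   swap r@ - 1-                  ( c-addr tail-addr tail-u )
   rot r> 0 max ;                ( tail-addr tail-u c-addr head-u )

: demo ( -- )
   s" /home/scan/mimi/test090910.tif" [char] / split-last
   type space type cr ;

demo   \ prints /home/scan/mimi test090910.tif
```

Nothing is copied: both results are just address/length pairs pointing into the original string.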

Now I'm thinking about how to turn a Pascal program (file size long string)
into a Forth program (idem). My conclusion is that it needs special
string handling, closely coupled to an ALLOCATE implementation.
This results in a very limited set of words, and very little copying
and also very little generality.
I'm always surprised at the ease with which one introduces string concatenation
$+ that can easily lead to quadratic run time in e.g. the above case
and immediately confronts one with problems like memory recovery/
garbage collection.

>
>-- David W.

foxaudio...@gmail.com

Oct 26, 2016, 10:52:48 PM
On Wednesday, October 19, 2016 at 3:59:33 AM UTC-4, Anton Ertl wrote:
> foxaudio...@gmail.com writes:
> >I noticed a lot of code uses the phrase 1 /STRING so I made a word I call SNIP.
> >
> >: snip ( addr len -- addr' len' ) 1- swap 1+ swap ;
>
> 63 occurences "1 /string" in 62393 lines of code. Too little IMO for
> having to remember yet another word.
>
> >It's not rocket science but in my ITC system SNIP was 3 times faster than
> >1 /STRING.
>
> In VFX:
>
> VFX Forth for Linux IA32 Version: 4.72 [build 0555]
> : snip 1- swap 1+ swap ; ok
> : foo1 snip ; ok
> : foo2 1 /string ; ok
> see foo1
> FOO1
> ( 080C0AB0 4B ) DEC EBX
> ( 080C0AB1 8B5500 ) MOV EDX, [EBP]
> ( 080C0AB4 42 ) INC EDX
> ( 080C0AB5 895500 ) MOV [EBP], EDX
> ( 080C0AB8 C3 ) NEXT,
> ( 9 bytes, 5 instructions )
> ok
> see foo2
> FOO2
> ( 080C0AE0 83C3FF ) ADD EBX, -01
> ( 080C0AE3 8B5500 ) MOV EDX, [EBP]
> ( 080C0AE6 83C201 ) ADD EDX, 01
> ( 080C0AE9 895500 ) MOV [EBP], EDX
> ( 080C0AEC C3 ) NEXT,
> ( 13 bytes, 5 instructions )
>
> Probably same speed as long as the code working set fits in the cache.

Just for fun, it's only three instructions on the TMS9900 with TOS in a register.
Memory to memory architecture was cool for coding but sloooooooow.

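CODE: /STRING ( c-addr1 u1 n -- c-addr2 u2 )
TOS *SP SUB, \ u2 = u1 - n (n is in TOS, u1 is the top memory cell)
TOS 2 (SP) ADD, \ c-addr2 = c-addr1 + n
*SP+ TOS MOV, \ refill TOS with u2 and pop the memory stack
NEXT,
END-CODE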
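
For comparison, the same stack effect can be sketched in high-level standard Forth. The name MY/STRING is hypothetical, used here only to avoid redefining the system's own /STRING:

```forth
\ Sketch only: MY/STRING is a hypothetical name for a high-level /STRING.
: my/string ( c-addr1 u1 n -- c-addr2 u2 )
   dup >r -            \ u2 = u1 - n
   swap r> chars +     \ c-addr2 = c-addr1 + n chars
   swap ;

: demo ( -- )  s" hello" 2 my/string type cr ;
demo   \ prints llo
```

SNIP from the start of the thread is then the special case 1 MY/STRING.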

Anton Ertl

Oct 27, 2016, 5:46:55 AM
...
>Just for fun it's only three instructions on the TMS9900 with TOS in a register.

It can be reduced to two instructions on IA-32, as SwiftForth demonstrates:

see foo2
8083C4F 1 # 0 [EBP] ADD 83450001
8083C53 1 # EBX SUB 83EB01
8083C56 RET C3 ok

Apparently SwiftForth has an optimization rule for n /STRING; whether
this is faster than the VFX code depends on the actual processor. The
SNIP code is bigger and probably slower on SwiftForth, however:

see snip
8083BFF EBX DEC 4B
8083C00 0 [EBP] EAX MOV 8B4500
8083C03 EBX 0 [EBP] MOV 895D00
8083C06 EAX EBX MOV 8BD8
8083C08 EBX INC 43
8083C09 0 [EBP] EAX MOV 8B4500
8083C0C EBX 0 [EBP] MOV 895D00
8083C0F EAX EBX MOV 8BD8
8083C11 RET C3 ok

alex

Oct 27, 2016, 9:07:49 AM
The technique is very well known. It even has a name; COW. Copy On Write.

--
Alex

foxaudio...@gmail.com

Oct 27, 2016, 9:46:30 AM
Is the rule just for 1 /string ?

My example is for an arbitrary parameter value.

BF

David N. Williams

Oct 27, 2016, 10:16:06 AM
On 10/26/16 11:33 AM, Albert van der Horst wrote:
> In article <nuqfri$k4m$1...@dont-email.me>,
> David N. Williams <will...@umich.edu> wrote:
>> [...]
>>
>> My own dynamic-strings package,
>>
>> http://www.umich.edu/~williams/archive/forth/strings/dstrings.html
>>
>> with string stack and garbage collection, was apparently too elaborate
>> to have been taken up by Forthists, although I am fairly proud of it.
>>
>> My more conventional ANS Forth string library
>>
>> http://www.umich.edu/~williams/archive/forth/strings/mstrings.html
>>
>> works with the data stack and buffers allocated by the user. It is lean,
>> to me at least.
>
> I see with respect the neatness and the amount of work that has gone
> into this. However, if you're honest, can you testify that all
> those words have been used at least once in your life, or by someone else
> who uses the package?

I can't say that I've used them all, beyond testing; but I have used a
fair number of them. I can't speak for other users. I actually know of
only one (if memory serves), and that was experimental.

> If I look at the way I handle strings, I always use a fixed buffer,
> and merely 5 words that I can remember easily.

I understand the appeal of doing much with few words. However, I'm
firmly in the library camp. To me, a library is a collection of words
organized for a purpose for which there is a more or less natural
logical structure. I find it more of a conceptual load to omit words
that logically "ought" to be there than to include them. Coming from
there, I admit that I probably get seduced into over-elaboration. But
actually I'm not so sure it's unjustified. I guess it's not for me to
say. It goes without saying that a library needs to be well documented
and tested.

Even worse, I'm in the multiple namespace and multiple stack camps. :-)

> Lately I've added .format that uses %xxx with xxx in a user extendable
> wordlist.
>
> The only thing I miss is the word $\ (string-slash-from-behind)
> analogous to $/ (string slash) :
> This would do
> "/home/scan/mimi/test090910.tif" &/ $\ TYPE TYPE
>
> /home/scan/mimi test090910.tif

Nice word!

> Now I'm thinking about how to turn a Pascal program (file size long string)
> into a Forth program (idem). My conclusion is that it needs special
> string handling, closely coupled to an ALLOCATE implementation.
> This results in a very limited set of words, and very little copying
> and also very little generality.
> I'm always surprised at the ease with which one introduces string concatenation
> $+ that can easily lead to quadratic run time in e.g. the above case
> and immediately confronts one with problems like memory recovery/
> garbage collection.

It was not *the* design purpose, but this is an example of the kind of
thing the dstrings library was designed for. Before the project got
scooped by the open sourcing of Vermaseren's FORM and the demise of the
ppc, I was well along in a project that was using it to write a text
translator for Veltman's Schoonschip M680x0 assembly language source to
ppc assembly language.

There was also a Forth to pfe C project (which I don't guarantee to work
with the latest dstrings.):

http://www.umich.edu/~williams/archive/forth/hatforth/

A simpler example, which may even still work, is a text translator I
wrote to translate C library binding descriptions to pfe C bindings:

http://www.umich.edu/~williams/archive/forth/gmpfr/#pfebindings

This was used with the forth-gmpfr bignum package.

-- David

Coos Haak

Oct 27, 2016, 11:00:46 AM
Op Wed, 26 Oct 2016 17:33:08 +0200 (CEST) schreef Albert van der Horst:

<snip>
> The only thing I miss is the word $\ (string-slash-from-behind)
> analogous to $/ (string slash) :
> This would do
> "/home/scan/mimi/test090910.tif" &/ $\ TYPE TYPE
>
> /home/scan/mimi test090910.tif
>
>
> Groetjes Albert

I have two words for that, meant only for path+filename strings; they look
only for slashes (both kinds) and dots.

ident type f:/bin/f.exe
ident basename type f.exe
ident dirname type f:/bin

(DOS accepts forward slashes, I like them better)

groet Coos

Doug Hoffman

Oct 27, 2016, 5:01:33 PM
On 10/27/16 10:16 AM, David N. Williams wrote:

> It was not *the* design purpose, but this is an example of the kind of
> thing the dstrings library was designed for. Before the project got
> scooped by the open sourcing of Vermaseren's FORM and the demise of the
> ppc, I was well along in a project that was using it to write a text
> translator for Veltman's Schoonschip M680x0 assembly language source to
> ppc assembly language.

That's interesting. Long ago when Macs were 680x0 I invested
considerable time and effort learning (Forth) assembler for the same. I
had a few non-trivial routines written that way including a fast
Boyer-Moore text search. But all that effort was negated when Macs moved
to the PPC. It was then that I vowed to stick with high level standard
Forth and I haven't looked back. No regrets. Optimizing compilers like
VFX do an excellent job.

-Doug

hughag...@gmail.com

Oct 27, 2016, 10:12:28 PM
I had never heard of Copy On Write --- I looked it up right now though and it does seem to be pretty much the same as what I'm doing.

If this is so "well known," why hasn't it ever been done in Forth before?

The reason why this technique, and other good ideas, have never been done in Forth before is that ANS-Forth is the "Standard." There aren't going to be any good ideas in Forth-200x either, because Forth-200x is mandated to be 100% compatible with ANS-Forth, and because all of the committee members are appointed by Elizabeth Rather.

The whole idea with Straight Forth is to escape from the dumb-as-dirt mentality that permeates Elizabeth Rather's toy language. I'm really never going to become an ANS-Forth or Forth-200x programmer --- that is just shameful --- that is like voluntarily signing up for the Special-Ed class.

There are a lot of good ideas in computer science. Quotations are one of them. This COW for strings is another one. There are others. I'm not saying that I can think up everything myself --- I'm saying that in Straight Forth good ideas will be made a part of the Standard.

Here is an excerpt from my Straight Forth document:
-----------------------------------------------------------------------------
There are seven types of data:

These are the 3 primary types of data:
1.) number          a signed 64-bit fixed-point with unity assumed to be 2^32 --- this includes pointers, etc.
                    data-stack
2.) string          this is a one-cell struct containing a pointer and a count
                    string-stack or data-stack
3.) xt              an "execution-token" --- this is a one-cell struct --- these are given to EXECUTE
                    data-stack

These are 4 more types of data, all of which are optional extensions:
4.) double-number   a 128-bit fixed-point with unity assumed to be 2^64
                    double-stack
5.) ratio           two 64-bit integers (unity assumed to be 1), which are a numerator and denominator
                    double-stack
6.) BFI             "big f'ing integer" --- a 128-bit integer with unity assumed to be 1
                    double-stack
7.) float           an IEEE-754 64-bit float
                    float-stack
-----------------------------------------------------------------------------

The only use for BFI numbers that I can think of, is Nathaniel Grossman's continued-fraction program. The Float data-type is mostly only used for low-level numeric functions (SIN COS etc.), but all of the obvious ones are built-in anyway. Both of these can be ignored by almost all users.

The Double and Ratio data-types are only useful for cases in which a Number lacks adequate precision and/or range. Both of these can be ignored by most users.

Most programs will only use the Number, String and XT data-types --- the two "killer apps" are cross-compiling to micro-controllers (primarily Forth, but other languages as well), and CAM (generating gcode for CNC machines) --- these three basic data types should be adequate.

The two built-in data-structures are the Array and the Chain --- these (mostly the Chain) should be adequate for 99% of all Straight Forth programs.

I really really really hate ANS-Forth; I'm never going to become an ANS-Forth programmer --- the Straight Forth that I'm proposing is pretty simple --- support for the Number and String data-types, and the Array and Chain data-structures, seems obvious and non-controversial to me.

Stephen Pelc

Oct 28, 2016, 3:31:47 AM
On Thu, 27 Oct 2016 09:42:20 GMT, an...@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

>It can be reduced to two instructions on IA-32, as SwiftForth demonstrates:
>
>see foo2
>8083C4F 1 # 0 [EBP] ADD 83450001
>8083C53 1 # EBX SUB 83EB01
>8083C56 RET C3 ok
>
>Apparently SwiftForth has an optimization rule for n /STRING; whether
>this is faster than the VFX code depends on the actual processor.

At some stage in the past (I forget for which x86 CPU),
memory/immediate instructions became very slow. At that point,
we removed their use by the VFX code generator. Ho, hum, time
to read up on current CPUs.

jim

Oct 28, 2016, 6:35:16 AM
Stephen

These days it's hard to find a processor. Just a processor.
If you want to implement a high speed application
(say a radar imaging job) you end up having to look
at video processors - it seems nobody is making "just a
processor" anymore. We now seem to have acquired the
concept of "hardware bloat".
It's not exactly conducive to stack machines.

> In article <5812fe2f....@news.eternal-september.org>,
ste...@mpeforth.com says...

jasmijn Bonting

Oct 28, 2016, 7:09:54 AM
Op vrijdag 28 oktober 2016 09:31:47 UTC+2 schreef Stephen Pelc:
> On Thu, 27 Oct 2016 09:42:20 GMT, an...@mips.complang.tuwien.ac.at
> (Anton Ertl) wrote:
>
> >It can be reduced to two instructions on IA-32, as SwiftForth demonstrates:
> >
> >see foo2
> >8083C4F 1 # 0 [EBP] ADD 83450001
> >8083C53 1 # EBX SUB 83EB01
> >8083C56 RET C3 ok
> >
> >Apparently SwiftForth has an optimization rule for n /STRING; whether
> >this is faster than the VFX code depends on the actual processor.
>
> At some stage in the past (I forget for which x86 CPU),
> memory/immediate instructions became very slow. At that point,
> we removed their use by the VFX code generator. Ho, hum, time
> to read up on current CPUs.
>
> Stephen
>

It is still slow, since the total instruction for large immediates is longer than 7 bytes and has to be pre-decoded before the next instruction.
The instruction by itself is a 4 micro cycle instruction and cannot be
combined with another non 1 micro opcode instruction.

Tessa

Stephen Pelc

Oct 28, 2016, 8:41:20 AM
On Fri, 28 Oct 2016 04:09:53 -0700 (PDT), jasmijn Bonting
<tessa....@gmail.com> wrote:

>It is still slow, since the total instruction for large immediates is
>longer than 7 bytes and has to be pre-decoded before the next instruction.
>The instruction by itself is a 4 micro cycle instruction and cannot be
>combined with another non 1 micro opcode instruction.

Thanks for the reminder and telling me that the problem still exists.

Anton Ertl

Oct 28, 2016, 1:09:47 PM
I have now done a benchmark: ten copies of the "1 /STRING" code, but
with different memory locations for the second stack items to avoid
the dependence-through-memory slowdown that would probably swamp all
other effects. Here are two copies:

VFX-like                SF-like
movq (%rdi), %rax       addq $1, 0(%rdi)
addq $1, %rax           subq $1, %rsi
movq %rax, (%rdi)       addq $1, 8(%rdi)
subq $1, %rsi           subq $1, %rsi
movq 8(%rdi), %rax
addq $1, %rax
movq %rax, 8(%rdi)
subq $1, %rsi

100M iterations of these ten copies take the following number of cycles:

  VFX-like     SF-like
1537853028  1167890963  Phenom II X2 560
1105969806  1021509951  Core i7 4690K (Haswell)

So the SwiftForth-like code is faster than the VFX-like code on both
the Phenom II and the Haswell.
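
The raw counts are easier to read per executed copy. A small sketch of the arithmetic, assuming cells of at least 32 bits; MCYCLES/COPY is a hypothetical name:

```forth
\ 100M iterations x 10 copies = 10^9 executions of "1 /STRING", so
\ cycles / 10^6 gives thousandths of a cycle per execution.
\ Assumes cells of at least 32 bits. MCYCLES/COPY is a hypothetical name.
: mcycles/copy ( cycles -- n )  1000000 / ;

1537853028 mcycles/copy .   \ VFX-like, Phenom II: 1537
1167890963 mcycles/copy .   \ SF-like,  Phenom II: 1167
1105969806 mcycles/copy .   \ VFX-like, Haswell:   1105
1021509951 mcycles/copy .   \ SF-like,  Haswell:   1021
```

That is roughly 1.54 versus 1.17 cycles per copy on the Phenom II and 1.11 versus 1.02 on the Haswell. Converting to seconds needs the clock rate, which is not given here; at a purely hypothetical 3 GHz, the Phenom II difference of 369962065 cycles would be about 0.12 s over the whole run.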

Chris Curl

Oct 28, 2016, 1:59:31 PM
Sure, the numbers are smaller for the SF-like version, but those numbers
are so big that they don't tell me a clear story. Can you put the results
into something a human can wrap his head around, as in "the difference
results in the VFX version taking <X> seconds|minutes|hours longer to
execute on a data size of <Y>"?

If it is only a second or two for 100 MILLION iterations, then ... sure it
is faster, but the difference PER ITERATION is so small that the difference will
never be noticed in practice.

At the end of the day though, knowing which is faster IS valuable, because
it never hurts to do things as efficiently as possible.

alex

Oct 28, 2016, 2:39:11 PM
On 28/10/16 03:12, hughag...@gmail.com wrote:
> On Thursday, October 27, 2016 at 6:07:49 AM UTC-7, alex wrote:
>> On 23/10/16 04:13, hughag...@gmail.com wrote:
>>> On Friday, October 21, 2016 at 3:49:55 AM UTC-7, trans wrote:
>>>> P.S. I know what it is to feel like the outsider, having strong opinions that differ from the mainstream group. It can be very frustrating! But chin up, I for one like this little lib. Why reinvent the wheel?
>>>
>>> I'm not reinventing the wheel. The memory-management system in string-stack.4th is my own invention. Nobody else has ever done anything like this.
>>>
>>> The history here is that Mark Wills suggested to me that I write a string-stack package that used the heap. My original plan was pretty simple --- every time that a string is duplicated (with DUP$ OVER$ etc.) I would ALLOCATE a new string and copy the old string to the new string --- this way, any changes made to one string won't affect the other string.
>>>
>>> This original plan is actually quite slow however, so it wasn't acceptable to me. My invention was the "unique" and "derivative" strings. A derivative string is just a reference to the unique string, so ALLOCATE is not necessary, and copying the string is not necessary. Only if the derivative string is going to change does it get converted into a unique string. Also, if a unique is going to change, then all of its derivatives get converted into uniques so they don't get affected. A lot of what you do with strings doesn't change them, so in many cases it is not necessary to convert derivatives into uniques. For example, printing out a string doesn't change it.
>>>
>>> I've never looked at any other string packages so I don't really know what they are doing, but I feel confident that my string-stack.4th is far superior to them --- I think up ideas, and they work --- nobody else is doing any thinking at all; I don't need to look at their stuff.
>>>
>>
>> The technique is very well known. It even has a name; COW. Copy On Write.
>
> I had never heard of Copy On Write --- I looked it up right now though and it does seem to be pretty much the same as what I'm doing.
>
> If this is so "well known," why hasn't it ever been done in Forth before?
>
> The reason why this technique, and other good ideas, have never been done in Forth before is that ANS-Forth is the "Standard." There aren't going to be any good ideas in Forth-200x either, because Forth-200x is mandated to be 100% compatible with ANS-Forth, and because all of the committee members are appointed by Elizabeth Rather.
>
> The whole idea with Straight Forth is to escape from the dumb-as-dirt mentality that permeates Elizabeth Rather's toy language. I'm really never going to become an ANS-Forth or Forth-200x programmer --- that is just shameful --- that is like voluntarily signing up for the Special-Ed class.
>

You are confusing implementation with definition. It's a common problem
in standards work; it's not the job of the standard to tell you how to
do things. Straight Forth appears to do quite a bit of telling you how
to do things.

> There are a lot of good ideas in computer science. Quotations are one
> of them. This COW for strings is another one. There are others. I'm not
> saying that I can think up everything myself --- I'm saying that in
> Straight Forth good ideas will be made a part of the Standard.
>

You don't read widely enough. On several occasions you have claimed to
have discovered or invented something, generally a technique or idea
that has been known for many years.

HAA

Oct 28, 2016, 8:18:14 PM
Anton Ertl wrote:
> "HAA" <som...@microsoft.com> writes:
> >Anton Ertl wrote:
> >> And at the last meeting we standardized "1 chars = 1". So if the
> >> address unit is a byte, /STRING works on bytes.
> ...
> >That option already existed under Forth-94.
>
> Yes, it was an option, and all popular systems implemented this
> option, and many programs rely on it. I.e., it is common practice,
> and that's why we standardized it.

No. It may be a fact that on most systems "1 chars = 1" but that doesn't
give carte blanche to programmers to write 'standard programs' in a
manner that disregards datatype. ANS was explicit that certain functions
should handle characters, while others cells, address units and so on.

I don't write professionally so none of this affects me. I don't even bother
to document code for my own benefit. I get away with it because chances
are nobody will read my code and should anyone find it necessary, they will
eventually work it out. But that isn't what Forth Standards are about, is it?
A 'Standard Forth Program' is meant to be professional and readable as
well as portable. ANS went to pains to get Forth programmers to write code
that is clear, consistent and informative. I think it's a mistake to wind back
the clock.



hughag...@gmail.com

Oct 28, 2016, 11:26:31 PM
On Friday, October 28, 2016 at 11:39:11 AM UTC-7, alex wrote:
> On 28/10/16 03:12, hughag...@gmail.com wrote:
> > The reason why this technique, and other good ideas, have never been done in Forth before is that ANS-Forth is the "Standard." There aren't going to be any good ideas in Forth-200x either, because Forth-200x is mandated to be 100% compatible with ANS-Forth, and because all of the committee members are appointed by Elizabeth Rather.
> >
> > The whole idea with Straight Forth is to escape from the dumb-as-dirt mentality that permeates Elizabeth Rather's toy language. I'm really never going to become an ANS-Forth or Forth-200x programmer --- that is just shameful --- that is like voluntarily signing up for the Special-Ed class.
> >
>
> You are confusing implementation with definition. It's a common problem
> in standards work; it's not the job of the standard to tell you how to
> do things. Straight Forth appears to do quite a bit of telling you how
> to do things.

I know that a standard is supposed to standardize behavior rather than implementation. In my document, I don't say anything about the unique/derivative strings --- I just say that the string-stack is supposed to behave as if every string on the stack were unique. Similarly with chains I use a made-up name "chain" rather than specify how it is implemented (it is an AVL tree with an UP pointer) --- I take pains to document the behavior rather than the implementation --- chains could be implemented with doubly-linked lists too (this would make LEFT and RITE etc. faster, but SEARCH and INSERT etc. slower).

As for telling people what data-structures to use, I don't see a problem with that. Factor has "sequences," Lisp has lists --- Straight Forth has chains --- I learned Factor and I was impressed by how much readability was improved when everybody used sequences. In Straight Forth novice programmers use chains and don't implement their own data-structures, but intermediate programmers are also encouraged to use chains and not implement their own data-structures (although, being intermediate, they know how to write HOFs so they can implement their own data-structures). There are no advanced programmers in Straight Forth --- not even me; I use chains too --- I don't think it is useful to have advanced programmers because they would write programs that nobody else could understand (the other side of the coin is that most advanced programmers don't think high-level languages are useful, but they just program in assembly-language).

Readability is also improved by standardizing. In ANS-Forth there are multiple string-stack implementations (I have only looked at Mark Wills'), but all of them are different because there is no standard. I described MID$ and INNER$ above. Are these the best names for these functions? I don't care! They are reasonable names, so I'm going with that. The important point is that every Straight Forth implementation uses the same names --- even if they were goofy names such as Lisp's CAR and CDR it is still better to standardize something, rather than have everybody do everything different from one program to the next. Maybe red, amber and green are not the best colors for stop-lights and some people argue that green, purple and orange would be better, but we have to standardize on something or chaos will result. Also, we should strive to be reasonably compatible with everybody else in the world --- we don't want to be like the English who purposely make their standards opposite from the rest of the world in a misguided effort to prove that they are smarter than everybody else (for example: the rifling in an Enfield barrel turns in the opposite direction of everybody else's rifles, although a bullet is going to fly just as straight whether it is rotating clock-wise or counter-clock-wise).

In addition to improving readability, having certain data-structures built-in and standardized allows them to be implemented efficiently. All the words associated with chains are written in assembly-language. Similarly, the string-stack is written in assembly-language (btw: I have gotten rid of the REG and ACC user-registers; the two registers I had used for that are now dedicated to being the ptr and count for the top value of the string-stack).

> > There are a lot of good ideas in computer science. Quotations are one
> of them. This COW for strings is another one. There are others. I'm not
> saying that I can think up everything myself --- I'm saying that in
> Straight Forth good ideas will be made a part of the Standard.
> >
>
> You don't read widely enough. On several occasions you have claimed to
> have discovered or invented something, generally a technique or idea
> that has been known for many years.

I don't think there are "several occasions." You may be referring to when John Passaniti was telling everybody that my symtab.4th was a Splay Tree, but it isn't.

It is true though that I don't have any education. I remember when I was 19 or 20 that I invented De Morgan's Theorem --- I was very proud of myself --- only much later did I find out that this has been well known since the late Middle Ages and it is described in every novice-level book on probability. I'm still interested in probability, but I've read some books now so I have my terminology consistent with everybody else --- my LowDraw.4th program was my first-ever ANS-Forth program --- I thought I could become an ANS-Forth programmer when I wrote that, but it was only later when I wrote my second ANS-Forth program (symtab.4th) that Elizabeth Rather and Bernd Paysan told me that I could never be an ANS-Forth programmer (they consider John Passaniti to be a true ANS-Forth programmer, despite the fact that he has never written any ANS-Forth code and knows almost nothing about computer-science).

I may be ignorant of some computer-science, but I'm not an ignoramus. I don't make utterly stupid statements such as:
----------------------------------------------------------------------------
Truly, I don't see the point of all this paranoia about reentrancy, particularly in
systems without a multitasker. VARIABLE was *designed* for things that would be used in
multiple definitions. I remember the horror of (*gasp*) global variables back in the 1960's
in Fortran. But, frankly, in 30+ years of programming in Forth I never found them a problem.
I think this is all about mythical dragons.
----------------------------------------------------------------------------

So, I may be ignorant of some things (I had never heard of this COW idea before I thought it up myself) --- but I'm willing to learn --- there is a big difference between being ignorant and being an ignoramus.

I still have hope that Forth can become successful --- Elizabeth Rather must be kicked out of the Forth community --- after that, respect in the programming world becomes possible.

Straight Forth is designed for writing programs. You have to have data-structures in order to write programs. If you don't have data-structures, then you are limited to writing toy programs such as appeared in the "Starting Forth" book. Note that all the Forth code Elizabeth Rather has posted on comp.lang.forth was copied directly out of "Starting Forth." She says:

----------------------------------------------------------------------------
People are much too phobic about [global] variables. The cry, "They aren't re-entrant"
simply means you have to look at the whole application and how it is organized.
----------------------------------------------------------------------------

Looking at the "whole application" only works for toy programs that are small enough to be glanced over quickly (less than 40 lines of code) --- also, glancing over the "whole application" only works if you have all of the source-code, which means that you can't have any code-libraries.

ANS-Forth is a toy language --- it isn't any good for writing programs --- this is why everybody abandoned Forth in 1994.

With Straight Forth though, the Forth language can be resurrected.

Andrew Haley

Oct 29, 2016, 5:09:54 AM
HAA <som...@microsoft.com> wrote:
> Anton Ertl wrote:
>> "HAA" <som...@microsoft.com> writes:
>> >Anton Ertl wrote:
>> >> And at the last meeting we standardized "1 chars = 1". So if the
>> >> address unit is a byte, /STRING works on bytes.
>> ...
>> >That option already existed under Forth-94.
>>
>> Yes, it was an option, and all popular systems implemented this
>> option, and many programs rely on it. I.e., it is common practice,
>> and that's why we standardized it.
>
> No. It may be a fact that on most systems "1 chars = 1" but that doesn't
> give carte blanche to programmers to write 'standard programs' in a
> manner that disregards datatype.

It does now. "CHARS" was almost always nonsensical anyway.

> ANS was explicit that certain functions should handle characters,
> while others cells, address units and so on.

Time moves on: an address unit is a byte, and a byte is large enough
to hold a primitive character.

> I don't write professionally so none of this affects me. I don't
> even bother to document code for my own benefit. I get away with it
> because chances are nobody will read my code and should anyone find
> it necessary, they will eventually work it out. But that isn't what
> Forth Standards are about, is it. A 'Standard Forth Program' is
> meant to be professional and readable as well as portable. ANS went
> to pains to get Forth programmers to write code that is clear,
> consistent and informative.

Perhaps, but I don't believe that's relevant. Forth has always been
about simplicity and economy. Sprinkling no-op words like CHARS over
programs never went any way towards that goal. I believe that words
which (in most cases) literally do nothing are antithetical to Forth.
Leave the syntactic sugar to languages whose users like that stuff.

Andrew.

Anton Ertl

Oct 29, 2016, 6:49:07 AM
Chris Curl <ccur...@gmail.com> writes:
>On Friday, October 28, 2016 at 1:09:47 PM UTC-4, Anton Ertl wrote:
>> I have now done a benchmark: ten copies of the "1 /STRING" code, but
>> with different memory locations for the second stack items to avoid
>> the dependence-through-memory slowdown that would probably swamp all
>> other effects. Here are two copies:
>>
>> VFX-like               SF-like
>> movq (%rdi), %rax      addq $1, 0(%rdi)
>> addq $1, %rax          subq $1, %rsi
>> movq %rax, (%rdi)      addq $1, 8(%rdi)
>> subq $1, %rsi          subq $1, %rsi
>> movq 8(%rdi), %rax
>> addq $1, %rax
>> movq %rax, 8(%rdi)
>> subq $1, %rsi
>>
>> 100M iterations of these ten copies take the following number of cycles:

Now including Skylake:

  VFX-like     SF-like
1537853028  1167890963  Phenom II X2 560
1105969806  1021509951  Core i7 4690K (Haswell)
1102112502  1010674749  Core i7 6700K (Skylake)

>> So the SwiftForth-like code is faster than the VFX-like code on both
>> the Phenom II and the Haswell.
...
>Sure, the numbers are smaller for the SF-like version, but those numbers
>are so big that they don't tell me a clear story. Can you put the results
>into something a human can wrap his head around, as in "the difference
>results in the VFX version taking <X> seconds|minutes|hours longer to
>execute on a data size of <Y>"?

It depends on how /STRING is used in the application. An application
that calls 1 /STRING 1e12 times in a way that is similar (wrt
dependences and resource consumption) to the microbenchmark will run
22s faster on a Core i7-6700K in turbo mode (4.2GHz) and 112s faster
on the Phenom II X2 560 (3.3GHz) with the SF-like variant than with
the VFX-like variant. I doubt that is really very informative, though.

Another way to look at it is that if you ignore this kind of
difference everywhere else, your programs will run slower by a factor
of 1.08 (Haswell) ... 1.32 (Phenom II) than if you always go for the
faster code.
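
For reference, the seconds quoted above can be re-derived from the cycle counts in the table. This is only a sketch: the word name SAVED-SECONDS is made up for this illustration, and it assumes a system with the floating-point word set and BASE set to decimal. Each table entry covers 100M iterations of ten copies, i.e. 1e9 executions of 1 /STRING, so scaling to 1e12 executions is a factor of 1000.

```forth
\ Hypothetical helper (not from the original post): seconds saved over
\ 1e12 executions of "1 /STRING", given the clock rate in Hz and the two
\ cycle counts from the table (each measured over 1e9 executions).
: saved-seconds ( F: hz vfx-cycles sf-cycles -- seconds )
  f-          \ cycles saved per 1e9 executions
  1E3 f*      \ scale to 1e12 executions
  fswap f/ ;  \ divide by the clock rate in Hz

4.2E9 1102112502E 1010674749E saved-seconds f.  \ Skylake: about 22
3.3E9 1537853028E 1167890963E saved-seconds f.  \ Phenom II: about 112
```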

>If it is only a second or two for 100 MILLION iterations, then ... sure it
>is faster, but the difference PER CYCLE is so small that the difference will
>never be noticed in practice.

It's more like 0.01s on Phenom II and less on the Intel chips.
However, if you have a 1 /STRING in an inner loop (as can easily
happen), you can easily get more than 100M executions in a program,
and you can get a noticeable (Phenom II) or at least measurable
difference even in an application.
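
As an illustration of such an inner loop (a made-up example, not code from the benchmark), here is a word that counts the occurrences of a character by stepping through a string with 1 /STRING once per character. COUNT-CHAR is a hypothetical name.

```forth
\ Illustrative only: count how often CHAR occurs in the string c-addr u.
: count-char ( c-addr u char -- n )
  >r 0 rot rot                \ n c-addr u        R: char
  begin dup while             \ loop while characters remain
    over c@ r@ = if           \ current character matches?
      rot 1+ rot rot          \ increment n, restore n c-addr u
    then
    1 /string                 \ drop the first character of the string
  repeat
  2drop r> drop ;             \ leave only n

s" banana" char a count-char .  \ prints 3
```

With a long input string the 1 /STRING in this loop runs once per character, so its cost is multiplied by the string length.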

>At the end of the day though, knowing which is faster IS valuable, because
>it never hurts to do things as efficiently as possible.

Sometimes it has its cost, but knowing which is faster by how much
lets you make an informed decision on whether the speedup is worth
the cost.

Anton Ertl

Oct 29, 2016, 12:49:41 PM
"HAA" <som...@microsoft.com> writes:
>Anton Ertl wrote:
>> "HAA" <som...@microsoft.com> writes:
>> >Anton Ertl wrote:
>> >> And at the last meeting we standardized "1 chars = 1". So if the
>> >> address unit is a byte, /STRING works on bytes.
>> ...
>> >That option already existed under Forth-94.
>>
>> Yes, it was an option, and all popular systems implemented this
>> option, and many programs rely on it. I.e., it is common practice,
>> and that's why we standardized it.
>
>No. It may be a fact that on most systems "1 chars = 1" but that doesn't
>give carte blanche to programmers to write 'standard programs' in a
>manner that disregards datatype.

Certainly these programs were not standard programs; they could be
labeled as "standard programs with an environmental dependency on 1
chars = 1". With the next standard, these programs will be standard
programs.
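
A program with this dependency can make it explicit at load time instead of leaving it implicit. A minimal sketch, assuming the system provides [IF] and [THEN] from the programming-tools extensions; the word name CHARS-ARE-AUS? is invented for this example.

```forth
\ Hypothetical check, not taken from the standard document:
\ true if one character occupies exactly one address unit.
: chars-are-aus? ( -- flag )  1 chars 1 = ;

chars-are-aus? 0= [if]
  .( This program has an environmental dependency on 1 CHARS = 1 ) cr
  abort
[then]
```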

>ANS was explicit that certain functions
>should handle characters, while others cells, address units and so on.

Yes. And it turned out that differentiating between chars and address
units was a freedom for systems that no widely-used system exploited;
and on the program side, most programmers chose to ignore this
theoretical difference. But even those that did not cannot be sure
that their program does not have that environmental dependency (in
Forth-94 and -2012), because they have no way to test their programs
on a system that does not satisfy the environmental dependency.

>A 'Standard Forth Program' is meant to be professional and readable as
>well as portable. ANS went to pains to get Forth programmers to write code
>that is clear, consistent and informative.

I don't see any particular move in Forth-94 towards requiring
additional words for informative purposes only.

The introduction of CHARS and CHAR+ did not happen for informative
purposes, but for abstracting the character size in the same way that
CELLS and CELL+ abstract cell size (which became necessary due to the
move from 16-bit systems towards larger cell sizes). The reasons why
they thought that we would need to abstract the character size were:

1) Nybble-addressed machines, where 8-bit chars would take 2 aus.

2) Unicode, which around 1990 looked to be a 2-byte fixed-width
character set (UCS-2).

However, looking at it 22 years later, no Forth-94 implementation for
nybble-addressed machines is known to me, and it's unlikely that we
will ever see one for the next standard; and it turned out that UCS-2
is not good enough for Unicode, that we can work with Unicode nicely
using 8-bit units, and as a result, the move towards wider chars that
was the reason for introducing CHARS and CHAR+ has not happened in
Forth systems.
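
The analogy between the two pairs of words can be sketched as follows. The table and word names are invented for illustration; on a system where 1 CHARS = 1 the CHARS in CHAR-ITEM is a no-op, while CELLS still scales by the cell size.

```forth
\ Illustrative names only; none of these are standard words.
create cell-table 4 cells allot   \ room for 4 cells
create char-table 4 chars allot   \ room for 4 characters

: cell-item ( n -- a-addr )  cells cell-table + ;  \ address of cell n
: char-item ( n -- c-addr )  chars char-table + ;  \ address of char n

1000 2 cell-item !      2 cell-item @ .       \ prints 1000
char x 3 char-item c!   3 char-item c@ emit   \ prints x
```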

Anyway, if you feel the need to use these unnecessary abstractions in
your programs, they are still there, and nobody has proposed removing
them, so you are free to use these words to write what you consider
professional, readable, clear, consistent and informative code.

hughag...@gmail.com

Oct 29, 2016, 8:11:41 PM
On Saturday, October 29, 2016 at 9:49:41 AM UTC-7, Anton Ertl wrote:
> The introduction of CHARS and CHAR+ did not happen for informative
> purposes, but for abstracting the character size in the same way that
> CELLS and CELL+ abstract cell size (which became necessary due to the
> move from 16-bit systems towards larger cell sizes). The reasons why
> they thought that we would need to abstract the character size were:
>
> 1) Nybble-addressed machines, where 8-bit chars would take 2 aus.
>
> 2) Unicode, which around 1990 looked to be a 2-byte fixed-width
> character set (UCS-2).
>
> However, looking at it 22 years later, no Forth-94 implementation for
> nybble-addressed machines is known to me, and it's unlikely that we
> will ever see one for the next standard; and it turned out that UCS-2
> is not good enough for Unicode, that we can work with Unicode nicely
> using 8-bit units, and as a result, the move towards wider chars that
> was the reason for introducing CHARS and CHAR+ has not happened in
> Forth systems.

This is a typical example of how the ANS-Forth and Forth-200x committees get bogged down in nonsense.

In Straight Forth the programmer doesn't generally have access to chars. The programmer only works with strings. Why would anybody want to have access to a char? What does this do for you that a string with length of 1 does not do? If you are working only with strings, you have a lot of functions that do useful things --- there are no functions that do anything with chars.

A Straight Forth system would normally use UTF-32. It would be possible to have a Straight Forth system that uses one of the 8-bit ascii extensions though, and it would have to declare this fact. Programs should compile and run just fine under all of these systems --- the only problem would be if one of your literal strings in the source-code contains a char that is not represented in the underlying character scheme (for example: pi is not in ascii).

> Anyway, if you feel the need to use these unnecessary abstractions in
> your programs, they are still there, and nobody has proposed removing
> them, so you are free to use these words to write what you consider
> professional, readable, clear, consistent and informative code.

Nobody on the ANS-Forth committee knows what "abstraction" means. CHARS was just nonsense. The Forth-200x committee doesn't know either, because they are requiring 8-bit chars on the theory that this is what most users need, and to hell with those who need 16-bit or 32-bit chars.

Straight Forth abstracts away the character format (UTF-32, UTF-8, extended-ascii, etc.) by only making strings available to the user --- the string is an abstraction of the concept: "array of chars" --- this is how abstraction works!

It is possible in Straight Forth for a programmer to load a string into PTR and CNT and directly access the chars, in which case the programmer obviously needs to know what format the chars are, and needs to know how big the chars are. Programmers are strongly discouraged from writing such low-level code unless absolutely necessary --- for ordinary application programs, the programmer just works with strings and doesn't worry about what the underlying character format is or how big the chars are --- leave low-level code to the code-library writers (code libraries have dependencies).

HAA

Oct 31, 2016, 4:15:25 AM
Anton Ertl wrote:
> ...
> The introduction of CHARS and CHAR+ did not happen for informative
> purposes,

How do you know? It can't be ruled out.

> but for abstracting the character size in the same way that
> CELLS and CELL+ abstract cell size (which became necessary due to the
> move from 16-bit systems towards larger cell sizes). The reasons why
> they thought that we would need to abstract the character size were:
>
> 1) Nybble-addressed machines, where 8-bit chars would take 2 aus.
>
> 2) Unicode, which around 1990 looked to be a 2-byte fixed-width
> character set (UCS-2).

I'm sure these were issues. The practical consequence of Forth-94
introducing mnemonics for the two most used datatypes in Forth was
portability and readability. Forth had been dogged by complaints of
'write-only' code and ANS helped address that.

> Anyway, if you feel the need to use these unnecessary abstractions in
> your programs, they are still there, and nobody has proposed removing
> them, so you are free to use these words to write what you consider
> professional, readable, clear, consistent and informative code.

The fact is CHAR+ CHARS are necessary and must remain in CORE so
systems can read/write portable code. 200x needs to explain to users
what it hopes to gain by allowing coding such as this:

: nextword \ ad -- ad'
  dup c@ + 1+ ;

Forth-94 would require it be written:

: nextword \ ad -- ad'
  dup c@ 1+ CHARS + ;

It's not just the addition of CHARS that's different, the logic has changed
for the better.

(no offence to the original author intended)



Anton Ertl

Oct 31, 2016, 5:02:26 AM
"HAA" <som...@microsoft.com> writes:
>Anton Ertl wrote:
>> ...
>> The introduction of CHARS and CHAR+ did not happen for informative
>> purposes,
>
>How do you know? It can't be ruled out.

It says so in the Forth-94 document. From E.2.3:

|However, there are cases where a character is larger than an address
|unit. Examples include (1) systems with small address units (e.g.,
|bit- and nibble-addressed systems), and (2) systems with large
|character sets (e.g., 16-bit characters on a byte-addressed
|machine). CHAR+ and CHARS operators, analogous to CELL+ and CELLS are
|available to allow maximum portability.

>> Anyway, if you feel the need to use these unnecessary abstractions in
>> your programs, they are still there, and nobody has proposed removing
>> them, so you are free to use these words to write what you consider
>> professional, readable, clear, consistent and informative code.
>
>The fact is CHAR+ CHARS are necessary and must remain in CORE so
>systems can read/write portable code.

Systems don't write code. And programs using CHAR+ CHARS are not any
more portable in practice than code using 1+ and nothing.
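
To see this on a given system, both NEXTWORD variants from this thread can be applied to the same data. A small sketch; the names NEXTWORD-AU, NEXTWORD-CHARS and NAMES are made up here, and the two results are only guaranteed to agree where 1 CHARS = 1.

```forth
: nextword-au    ( ad -- ad' )  dup c@ + 1+ ;        \ address-unit arithmetic
: nextword-chars ( ad -- ad' )  dup c@ 1+ chars + ;  \ Forth-94 style

\ Two counted strings laid out back to back: "foo" then "hi".
create names  3 c, char f c, char o c, char o c,
              2 c, char h c, char i c,

names nextword-au  names nextword-chars  = .  \ -1 (true) where 1 CHARS = 1
names nextword-chars c@ .                     \ 2, the count of "hi"
```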

>200x needs to explain to users
>what it hopes to gain by allowing coding such as this:
>
>: nextword \ ad -- ad'
> dup c@ + 1+ ;

More standard programs; another benefit is that this reflects the
mainstream of Forth usage in the standard, so that system implementors
are not misled into thinking that it's a good idea to choose UTF-16 as
char encoding on byte-addressed machines.