Naming for parsing words


Ruvim

May 17, 2022, 12:23:57 PM
In my post [1] in ForthHub I compare the different variants for naming
parsing words.

Namely, I mean the words for which compilation semantics include
scanning/parsing the input stream, and, if interpretation semantics are
defined for the word, they include parsing too. Especially, when the
word parses one or several lexemes.
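
To make the definition concrete, here is a minimal sketch of such a word in standard Forth, using Forth-2012 PARSE-NAME. The name `len:` is hypothetical and only illustrates one of the suffix conventions under discussion; the STATE-smart implementation is an assumption made for brevity, not a recommendation.

```forth
\ len: is a hypothetical parsing word. It parses one lexeme from the
\ input stream in both interpretation and compilation state and
\ yields the length of that lexeme.
: len: ( "name" -- u )            \ when compiling: ( "name" -- ), run-time ( -- u )
  parse-name nip                  \ keep only the length of the lexeme
  state @ if postpone literal then
; immediate

len: hello .                      \ interpreted: prints 5
: t ( -- u ) len: hello ;         \ compiled: the literal 5 is compiled into t
t .                               \ prints 5
```

In both states the lexeme after `len:` is consumed by the word itself and never reaches the text interpreter, which is exactly the property a naming convention would have to signal to the reader.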

Can we have a convention for naming parsing words?

What are your considerations?


[1] Naming for parsing words
https://github.com/ForthHub/discussion/discussions/112

--
Ruvim

S Jack

May 17, 2022, 2:13:01 PM
On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
> Can we have a convention for naming parsing words?
>
> What are your considerations?
>
>
> [1] Naming for parsing words
> https://github.com/ForthHub/discussion/discussions/112

Bah to the idea that a parsing word with an immediate parameter is 'better readable' than
a postfix operator. Most definitions contain postfix words, and mixing in
parsing words is a style conflict, so I avoid parsing words. That may mean I miss some
peephole-optimizing opportunities, but that doesn't impact me. If I had
to have a parsing word, I would choose the form 'foo( parm )', and I wouldn't
confuse it with a comment.
--
me

none albert

May 17, 2022, 3:47:33 PM
In article <t60i6r$7m0$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>In my post [1] in ForthHub I compare the different variants for naming
>parsing words.
>
>Namely, I mean the words for which compilation semantics include
>scanning/parsing the input stream, and, if interpretation semantics are
>defined for the word, they include parsing too. Especially, when the
>word parses one or several lexemes.
>
>Can we have a convention for naming parsing words?
>
>What are your considerations?

I am a proponent of outlawing parsing words, with an exception for
denotations, say constants.
E.g 'DROP or "AAP" are parsed by ' and " and generate a constant,
independent of interpretation or compilation mode.
That made it possible to restrict the inspection of STATE to
where a word is interpreted or compiled.
The compilation STATE decides whether to compile it,
adding LIT or FLIT, or leave it on the stack.
Don't get upset, I don't propose it to the standard.

Parsing words can be handy in special purpose languages,
to honour expectations from users. That is a different matter.
A simple convention is to end parsing words with ':'.

FROM floating-point IMPORT: F* F** FLOG PI

>
>[1] Naming for parsing words
>https://github.com/ForthHub/discussion/discussions/112
>
>--
>Ruvim

Groetjes Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

dxforth

May 17, 2022, 11:00:14 PM
I've tended to use /xxx to mean 'extract' e.g. /STRING /SIGN

Any confusion with file path is just the nature of forth and the former
should be in quotes anyway.

Hans Bezemer

May 18, 2022, 7:38:50 AM
On Wednesday, May 18, 2022 at 5:00:14 AM UTC+2, dxforth wrote:
> On 18/05/2022 02:23, Ruvim wrote:
> > In my post [1] in ForthHub I compare the different variants for naming
> > parsing words.
> > Can we have a convention for naming parsing words?
> > What are your considerations?
Well, I think you mean by parsing - cutting up the TIB. In that case,
we already have PARSE, PARSE-NAME (and 4tH has got its own "PARSE-WORD").

So, when designing this lib I took that into account:
https://sourceforge.net/p/forth-4th/code/HEAD/tree/trunk/4th.src/lib/parsing.4th

BTW, if you try to compile it - it's 4tH, not Forth, so your mileage may vary.

Hans Bezemer

Ruvim

May 18, 2022, 10:30:45 AM
On 2022-05-17, S Jack wrote:
> On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
>> Can we have a convention for naming parsing words?
>>
>> What are your considerations?
>>
>>
>> [1] Naming for parsing words
>> https://github.com/ForthHub/discussion/discussions/112
>
> Bah, that a parsing word with immediate parameter is 'better readable' than
> a postfix operator. Most definitions contain postfix words and mixing in
> parsing words is style conflict so I avoid parsing words.

Do you avoid the standard parsing words?
For example: "[']" "postpone" 's"' 'abort"'
And what about defining words?

I wonder why people continue to use parsing words if they dislike them.

Why not introduce new words like:

:def ( sd.name -- ) ( C: -- colon-sys )

does-created ( xt sd.name -- )

To use them as:

`foo :def 123 . ;

[: @ . ;] `bar does-created 123 ,

foo bar \ prints "123 123"


And after that, what to do with "[if]" and "[undefined]"?



> May mean I miss some
> peephole optimizing opportunities but that doesn't impact me. If I had
> to have a parsing word I would choose the form 'foo( parm )' and I won't be
> confusing it as a comment.


123 constant( foo )
create( bar ) 456 ,

:( baz ) ( -- x ) foo postpone( foo bar ) ;

:( test-baz ) :( baz2 ) baz drop postpone( @ ; ) ;

t{ test-baz baz2 -> 123 456 }t


Hm.. why not.


--
Ruvim

Anton Ertl

May 19, 2022, 11:23:39 AM
Ruvim <ruvim...@gmail.com> writes:
>Do you avoid the standard parsing words?
>For example: "[']" "postpone" 's"' 'abort"'

I avoid these unless there is some reason not to. In particular:

Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
interpretive code.

Instead of POSTPONE FOO, I write ]] FOO [[. Especially nice for
multiple words.
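
A minimal sketch of the difference, assuming a system that provides `]]` and `[[` (for example Gforth); these two words are not part of standard Forth. The names `2dup-a`, `2dup-b`, `t1` and `t2` are hypothetical.

```forth
\ Assumption: ]] ... [[ is available (e.g. in Gforth); it is not standard.
\ Both macros compile OVER OVER into the definition that uses them.
: 2dup-a ( -- ) postpone over postpone over ; immediate   \ one POSTPONE per word
: 2dup-b ( -- ) ]] over over [[ ; immediate               \ one bracket pair for all

: t1 ( a b -- a b a b ) 2dup-a ;
: t2 ( a b -- a b a b ) 2dup-b ;
```

With two words the saving is small; it grows with the length of the postponed sequence.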

Instead of S" BLA", I write "BLA".

I don't use ABORT", not the least because I always have to look up the
direction of the flag. Instead, I use THROW. If I need a new ball, I
create it with

"new ball" exception constant new-ball
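
A sketch of how such a ball might be thrown and caught, assuming Gforth's EXCEPTION ( c-addr u -- n ), which is not a standard word. S" is used here instead of the "..." string syntax so that no string recognizer is required; `serve` and `rally` are hypothetical names.

```forth
\ Assumption: EXCEPTION ( c-addr u -- n ) as in Gforth; it returns a
\ previously unused throw code.
s" new ball" exception constant new-ball

: serve ( -- ) new-ball throw ;                 \ hypothetical word that fails
: rally ( -- f )                                \ true if serve threw new-ball
  ['] serve catch new-ball = ;

rally .                                         \ prints -1
```

Unlike ABORT", no flag is involved at the point of failure, so there is no flag direction to remember.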

>And what about defining words?

I tend to use these. They have default compilation semantics and one
rarely wants to copy-paste them between compiled and interpreted code.
Of course, in those rare cases (i.e., when debugging a defining word),
I wish that they took their name argument from the stack.

>I wonder why people continue to use parsing words if they dislike them.

In the four cases above, I do it when writing code that should work on
Forth systems that do not understand the better idioms, or when
demonstrating something to an audience that may not be familiar with
the better idioms, and these idioms would distract from the point I am
trying to demonstrate.

>Why not introduce new words like:
>
> :def ( sd.name -- ) ( C: -- colon-sys )
>
> does-created ( xt sd.name -- )

The question is if the benefit is worth the cost in these cases.
Cost: additional words (because we don't want to destandardize all
existing code). Benefit: rare, as mentioned above.

>To use them as:
>
> `foo :def 123 . ;
>
> [: @ . ;] `bar does-created 123 ,
>
> foo bar \ prints "123 123"

Why `FOO, not "FOO"?

>And after that, what to do with "[if]" and "[undefined]"?

And \ and (.

> 123 constant( foo )
> create( bar ) 456 ,
>
> :( baz ) ( -- x ) foo postpone( foo bar ) ;
>
> :( test-baz ) :( baz2 ) baz drop postpone( @ ; ) ;
>
> t{ test-baz baz2 -> 123 456 }t
>
>
>Hm.. why not.

Why yes? And please explain the definition of TEST-BAZ and BAZ2.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

S Jack

May 19, 2022, 1:20:59 PM
On Wednesday, May 18, 2022 at 9:30:45 AM UTC-5, Ruvim wrote:
> Do you avoid the standard parsing words?

No. For new words where the choice is to make it parsing or postfix I choose
postfix. I've re-defined some existing parsing words to be postfix such as
FORGET and SEE:
' foo FORGET
' foo SEE
But I don't go for purity which usually leads to abominations. Note in
above tick is acceptable. It's a matter of using exceptions sparingly and
where most effective. That's the art and of course not everyone is going
to agree on the choices.
But back to your original question of what should be the standard convention
for parsing word syntax; my view:

General choices
1) foo bar bat
No syntax
One must know what foo bar and bat are.
2) foo: bar bat
Syntax indicates foo: a parsing word with bar as immediate parameter
but bat is undetermined, could be a second parameter to foo or an
operator.
3) foo( bar bat )
Syntax indicates foo( is parsing word and has two immediate parameters
bar and bat.

Choice (1) should be preferable to the Forth purist (DX, the.Bee).
Don't waste time worrying over syntax schemes.

Choice (2) proposed by Albert which works for me is a simple syntax for
parsing words, sufficient since our use of parsing words will be limited.

Choice (3) is totally explicit and should fit well in a more formal Forth, which
is what standard Forth is.

I think choice (3) is best for the standard. I'll probably be using choice
(2) but that doesn't mean I'm changing ' to ': .
--
me

dxforth

May 20, 2022, 1:14:22 AM
On 19/05/2022 00:30, Ruvim wrote:
> On 2022-05-17, S Jack wrote:
>>
>> Bah, that a parsing word with immediate parameter is 'better readable' than
>> a postfix operator. Most definitions contain postfix words and mixing in
>> parsing words is style conflict so I avoid parsing words.
>
> Do you avoid the standard parsing words?
> For example: "[']" "postpone" 's"' 'abort"'
> And what about defining words?
>
> I wonder why people continue to use parsing words if they dislike them.

But do they [beyond the few that already exist]? A few may enjoy creating
new parsing words (like the few that enjoy creating macros) but I wouldn't
say either was intrinsic to Forth, or even popular. If creating new parsing
words were popular, wouldn't there already be a convention for it?

https://pastebin.com/qpZLFc6h

none albert

May 20, 2022, 3:48:46 AM
In article <t62vui$1mi$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>On 2022-05-17, S Jack wrote:
>> On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
>>> Can we have a convention for naming parsing words?
>>>
>>> What are your considerations?
>>>
>>>
>>> [1] Naming for parsing words
>>> https://github.com/ForthHub/discussion/discussions/112
>>
>> Bah, that a parsing word with immediate parameter is 'better readable' than
>> a postfix operator. Most definitions contain postfix words and mixing in
>> parsing words is style conflict so I avoid parsing words.
>
>Do you avoid the standard parsing words?
>For example: "[']" "postpone" 's"' 'abort"'
>And what about defining words?
>
>I wonder why people continue to use parsing words if they dislike them.
>
>Why not introduce new words like:
>
> :def ( sd.name -- ) ( C: -- colon-sys )
>
> does-created ( xt sd.name -- )
>
>To use them as:
>
> `foo :def 123 . ;
>
> [: @ . ;] `bar does-created 123 ,
>
> foo bar \ prints "123 123"

Then I would prefer:
[: "hello world" TYPE ;] : hello
or even c++/java/.. compatible:
{ "hello world" TYPE } : hello

<SNIP>
>
> t{ test-baz baz2 -> 123 456 }t

Test words benefit from 2 separate code sequences plugged in.
I use
REGRESS test-baz baz2 S: 123 456 <EOL>
>
>
>Hm.. why not.
>
>--
>Ruvim

Hans Bezemer

May 20, 2022, 7:22:59 AM
On Thursday, May 19, 2022 at 5:23:39 PM UTC+2, Anton Ertl wrote:

Nice to see that "S Jack" puts me in the realm of "Forth purists", where
4tH cannot be classified as "pure" by any measure - including the words
it supports - which are in part just there to facilitate the particular 4tH
architecture, but also stuff like ">ZERO" ( n -- 0), "STOW" ( n1 n2 -- n1 n1 n2),
"EXCEPT" (like: 0= WHILE), "UNLESS" (like: 0= IF) and ";THEN" (like: EXIT THEN).

But I'm a purist inasmuch as I'd like to keep things simple and clear -
even a little bit less abstract.

E.g. I'd like to keep the "three rule engine" intact, which says:
(1) If it's a word, execute it;
(2) If it's not a word, convert it to a number;
(3) If it's not a number either, it's an error.

> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
> interpretive code.
> Instead of S" BLA", I write "BLA".
.. which (like prefixed numbers) violate the rule "keep it simple", since
it requires me to evaluate what I've parsed before I pull the trigger.

I literally try to find the word. FAIL: I literally convert it. FAIL: I throw
an exception. I don't have to go looking for prefixed ', ", #, $ or whatever.
"Simplicity" means "maintainability". "Maintainability" means "less bugs".
(See: "Does Software Decay").

> Instead of POSTPONE FOO, I write ]] FOO [[. Especially nice for
> multiple words.
I think that it's a beautiful solution - although due to 4tH's architecture,
it does carry very little significance to that particular project.

> >Why not introduce new words like:
> Why yes? And please explain the definition of TEST-BAZ and BAZ2.
I think this comes from a desire to "beautify the language" - maybe
by even looking at other "beautiful languages", without really appreciating
Forth's inherent philosophy. And that's KISS. And IMHO every generally
accepted proposition HAS to adhere to that philosophy.

I have nothing against coming up with new stuff that make the language
more useful or readable - but the starting point must always be:
- What problem am I exactly solving here;
- How do I express (implement) it in a Forth-like way ;
- How does this actual solution improve the way we're reading and writing
Forth programs.

Starting with a (ripped-off) idea about "syntactic sugar" and then trying to
squeeze it by brute force into the most horrible Forth code is IMHO the very
worst starting point - even from an engineering point of view.

P.S. I know I'm reacting here to different messages from different persons.
But I'm not addressing particular persons here, but particular ideas.

Hans Bezemer

Hans Bezemer

May 20, 2022, 7:47:02 AM
On Wednesday, May 18, 2022 at 4:30:45 PM UTC+2, Ruvim wrote:
> Do you avoid the standard parsing words?
> For example: "[']" "postpone" 's"' 'abort"'
> And what about defining words?
In my programming: no - they're just too handy. Imagine not having
S" and having to poke in characters into a string one by one - no
matter what mechanism you throw at it. I even confiscated C"
for that reason in 4tH to avoid too many C, ;-)

But I can tell you they're a pain in 4tH without a preprocessor. In
that particular variant ALL parsing words have to be hardcoded.

And then there are those STANDARD "parsing words" that are so
braindead that the only way I WANT to support them is by only
supporting them in that preprocessor.

.. like ACTION-OF (needless: DEFER@ can do that);
.. like BEGIN-STRUCTURE (overly complex and VERY unForth-like
compared to "0 .. CONSTANT").
.. like SYNONYM (it's much easier to look up the behavior first and,
if that succeeds, make the proper dictionary entry - instead of
making a dictionary entry and then fail at the most crucial moment).

The latter could even have been better if they followed the ALIAS rule

' FOO ALIAS BAR

Because that is in essence what you're doing and IMHO clearer than

SYNONYM BAR FOO

4tH supports a parsing AKA which does EXACTLY that:

AKA FOO BAR

Although (also from readability), this is IMHO much clearer:

' FOO AKA BAR

But the former is found at least in SOME Forths, so almost COMUS. ;-)
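
For comparison, a minimal sketch of the postfix form in standard Forth. The name `xt-alias` is hypothetical, chosen to avoid clashing with an ALIAS a system may already provide, and this simple version does not preserve the immediacy of the original word.

```forth
\ xt-alias is a hypothetical name. The behaviour comes from the stack as
\ an xt; only the NEW name is parsed, as with other defining words.
: xt-alias ( xt "name" -- )
  create ,                        \ store the xt in the new word's body
  does> ( i*x -- j*x ) @ execute  \ run-time: fetch the stored xt and run it
;

' dup xt-alias my-dup
3 my-dup * .                      \ prints 9
```

Because ' runs first, a missing FOO is reported before any dictionary entry for the new name is created, which is the failure order argued for above.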

Hans Bezemer



none albert

May 20, 2022, 9:01:57 AM
In article <8e02adcd-d183-4d15...@googlegroups.com>,
Hans Bezemer <the.bee...@gmail.com> wrote:
>On Wednesday, May 18, 2022 at 4:30:45 PM UTC+2, Ruvim wrote:
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>> And what about defining words?
>In my programming: no - they're just too handy. Imagine not having
>S" and having to poke in characters into a string one by one - no
>matter what mechanism you throw at it. I even confiscated C"
>for that reason in 4tH to avoid too many C, ;-)
>
>But I can tell you they're a pain in 4tH without a preprocessor. In
>that particular variant ALL parsing words have to be hardcoded.
>
>And then there are those STANDARD "parsing words" that are so
>braindead that the only way I WANT to support them is by only
>supporting them in that preprocessor.
>
>.. like ACTION-OF (needless: DEFER@ can do that);
>.. like BEGIN-STRUCTURE (overly complex and VERY unForth-like
>compared to "0 .. CONSTANT").
>.. like SYNONYM (it's much easier to look up the behavior first and,
>if that succeeds, make the proper dictionary entry - instead of
>making a dictionary entry and then fail at the most crucial moment).
>
>The latter could even have been better if they followed the ALIAS rule
>
>' FOO ALIAS BAR

Totally agreed. But in my book:
'FOO ALIAS BAR
' is a normal word, but it has the prefix flag.
'FOO is the same constant in compilation and interpretation mode.
In this way there is no exception with numbers,
1 is the prefix that understands all numbers starting with 1,
2 with 2 etc.
(I invented this when I refused to add $ in my kernel, and
I wanted to load Marcel Hendrix programs. The solution is to
make $ a loadable extension, and the PREFIX idea was born.).
Interestingly the lookup of words in the dictionary is shorter.
Formerly (adr len adr1 len1 ) COMPARE
Now (pp adr len ) CORA (simpler compare)
Who cares what the length of the word is, the comparison fails
anyway, normally.

Every word say
DROP 1178 $189 'FOO "We gaan naar rome"
is found in the dictionary. No exceptions.

(I have demonstrated that with adding a few characters to
terminators, not only space and tab, you can parse Pascal.)

>
>Because that is in essence what you're doing and IMHO clearer than
>
>SYNONYM BAR FOO

I hate that too

>
>4tH supports a parsing AKA which does EXACTLY that:
>
>AKA FOO BAR
>
>Although (also from readability), this is IMHO much clearer:
>
>' FOO AKA BAR

The rule is that you add a word (BAR) to the dictionary, the
preceding word (AKA) does the action, consuming some data.

DEFER BAR
'FOO IS BAR
is also an abomination.
This must be
'FOO 'BAR TRANSFER-EXECUTION-BEHAVIOUR
Neither FOO nor BAR is executed, so their dea ("name tokens")
should be used.

>
>But the former is found at least in SOME Forths, so almost COMUS. ;-)

>
>Hans Bezemer

Anton Ertl

May 20, 2022, 10:57:51 AM
albert@cherry.(none) (albert) writes:
>DEFER BAR
>'FOO IS BAR
>is also an abomination.
>This must be
>'FOO 'BAR TRANSFER-EXECUTION-BEHAVIOUR

A standard variant of this code:

' foo ' bar defer!

Anton Ertl

May 20, 2022, 12:05:05 PM
Hans Bezemer <the.bee...@gmail.com> writes:
>But I'm a purist inasmuch as I'd like to keep things simple and clear -
>even a little bit less abstract.
>
>E.g. I'd like the "three rule engine" intact, which says:
>(1) If it's a word, execute it;

No compilation?

>(2) If it's not a word, convert it to a number;

No compilation?

>(3) If it's not a number either, it's an error.

I guess you want a traditional text interpreter rather than the
interpret-only text interpreter you outlined above. Let's take a look
at a very traditional one: This is taken from Ting's System's Guide
to fig-Forth:

: INTERPRET
  BEGIN
    -FIND
    IF
      STATE @ <
      IF CFA ,
      ELSE
        CFA
        EXECUTE
      ENDIF
      ?STACK
    ELSE
      HERE
      NUMBER
      DPL @ 1+
      IF
        [COMPILE]
        DLITERAL
      ELSE
        DROP
        [COMPILE]
        LITERAL
      ENDIF
      ?STACK
    ENDIF
  AGAIN
;

Hmm, triple-nested control structure, 1 BEGIN, 3 IFs, and 3 ELSEs, not
the paragon of simplicity. Interestingly, I don't find your (3) in
there. It must be hidden in NUMBER (checking, it is). It's also not
obvious how INTERPRET terminates; that's performed with the X trick,
but I don't discuss this here further.

How about simplifying it?

1) Eliminating doubles would get rid of one IF.

2) Eliminating numbers would get rid of another IF (you now have to
hide the error handling in -FIND).

3) Eliminating compile state would eliminate the third IF.

The result would be much simpler and especially clearer (well, apart
from the error-hiding and X tricks):

: INTERPRET
  BEGIN
    -FIND
    CFA
    EXECUTE
    ?STACK
  AGAIN
;

>"Simplicity" means "maintainability". "Maintainability" means "less bugs".
>(See: "Does Software Decay").

Then you should be delighted by this new INTERPRET.

However, there is a cost to this implementation simplicity. Where in
fig-Forth you write

: star 42 emit ;

you now have to write something like

: star num 42 literal [compile] emit ;

but that's a sacrifice you love to make in the name of "simplicity",
clarity, "maintainability" and "less bugs", no?

If not, please explain why you find the complexity of the traditional
INTERPRET to be acceptable, but not

: INTERPRET
  BEGIN
    PARSE-NAME DUP
  WHILE
    FORTH-RECOGNIZER RECOGNIZE
    STATE @ IF RECTYPE>COMP ELSE RECTYPE>INT THEN
    EXECUTE
    ?STACK \ simple housekeeping
  REPEAT 2DROP
;

(from <https://forth-standard.org/proposals/recognizer#contribution-142>)

The problem with the traditional interpreter is that

* It does not recognize floats.

* Without number prefixes we get bugs in the code from using (sticky)
HEX.

* Without the tick-recognizer I get bugs from writing ' where ['] is
appropriate and vice versa, and testing compiled code interpretively
is cumbersome.

* Without the string-recognizer, we need to use S\" (or S"), and
implementing these words properly is apparently too complex for most
Forth systems.
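
As an illustration of the number-prefix point, a small sketch using the prefixes specified by Forth-2012 (`#` decimal, `$` hexadecimal, `%` binary). The word names are hypothetical.

```forth
decimal
\ Without a prefix, the value compiled depends on BASE at compile time,
\ so the base has to be switched and switched back by hand.
: sixteen-a ( -- n ) [ hex ] 10 [ decimal ] ;

\ With a prefix, the text means the same number whatever BASE is.
: sixteen-b ( -- n ) $10 ;        \ $ prefix: always hexadecimal
: ten       ( -- n ) #10 ;        \ # prefix: always decimal
: two       ( -- n ) %10 ;        \ % prefix: always binary

sixteen-a sixteen-b = .           \ prints -1
```

Forgetting the `[ decimal ]` in the first form is the kind of sticky-HEX bug meant above: every later unprefixed number would silently be read in base sixteen.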

Hans Bezemer

May 21, 2022, 6:07:29 AM
On Friday, May 20, 2022 at 3:01:57 PM UTC+2, none albert wrote:
> DEFER BAR
> 'FOO IS BAR
> is also an abomination.
> This must be
> 'FOO 'BAR TRANSFER-EXECUTION-BEHAVIOUR
> Neither FOO nor BAR is executed, so their dea ("name tokens")
> should be used.
You're kidding, right? I desperately hope so. Because IS is just another
TO. By seriously promoting this, you say:

23 ' FOO !

I don't think this helps readability - on the contrary.

Hans Bezemer

Hans Bezemer

May 21, 2022, 6:33:55 AM
On Friday, May 20, 2022 at 6:05:05 PM UTC+2, Anton Ertl wrote:
> Hans Bezemer <the.bee...@gmail.com> writes:
> No compilation?
Compilation sets STATE and enters the compiler. ";" compiles EXIT,
SMUDGEs the latest definition and reenters the interpreter.

> The problem with the traditional interpreter is that
> * It does not recognize floats.
IMHO it should not even recognize doubles. I consider it to be an add-on.
I know it's a quite radical idea and most certainly would break a LOT
of code, but I consider it to be architecturally more solid. You may have
another opinion.

> * Without number prefixes we get bugs in the code from using (sticky)
> HEX.
Gee, never happened to me. Maybe a programmer discipline issue? IMHO
that enables people to write Forth at all - since they don't deplete or overflow
the stack with every IF or BEGIN..REPEAT.

Neat antidote:

: base&exec base @ >r base ! execute r> base ! ;

Which allows for cool code like:

r@ [char] u = if u>d ['] (.number) 10 base&exec then
r@ [char] o = if u>d ['] (.number) 8 base&exec then
r@ [char] x = if u>d ['] (.number) 16 base&exec then

You know - Factor has got some cool ideas!
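
A self-contained usage sketch of the word above. The definition is repeated with a stack comment added; `.hex` is a hypothetical helper that is not part of the original post.

```forth
decimal
\ base&exec as posted, with a stack comment added.
: base&exec ( i*x xt base -- j*x ) base @ >r base ! execute r> base ! ;

\ .hex is a hypothetical helper: print u in hexadecimal without
\ changing BASE for the caller.
: .hex ( u -- ) ['] u. 16 base&exec ;

255 .hex                          \ prints FF
base @ .                          \ prints 10: BASE was restored
```

Note that BASE is only restored if the executed xt returns normally; if it THROWs, the old base stays on the return stack frame that is discarded, so a variant wrapped in CATCH would be needed for that case.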

> * Without the tick-recognizer I get bugs from writing ' where ['] is
> appropriate and vice versa, and testing compiled code interpretively
> is cumbersome.
Gee, never happened to me. Maybe a programmer discipline issue? IMHO
that enables people to write Forth at all - since they don't deplete or overflow
the stack with every IF or BEGIN..REPEAT.

> * Without the string-recognizer, we need to use S\" (or S"), and
> implementing these words properly is apparently too complex for most
> Forth systems.
Never had that problem. Could be an architecture issue - idunno:

/*
This function compiles '."'.
*/

#ifndef ARCHAIC
static void DoDotQuote (void)
#else
static void DoDotQuote ()
#endif

{
CompileString (PRINT);
}


/*
This function compiles a ,".
*/

#ifndef ARCHAIC
static void DoCommaQuote (void)
#else
static void DoCommaQuote ()
#endif

{
CompileString (STRINGD);
}


/*
This function compiles a S".
*/

#ifndef ARCHAIC
static void DoSQuote (void)
#else
static void DoSQuote ()
#endif

{
CompileString (SQUOTE);
}

The reason I'm posting this is: I can't comment on the rationale of
other people's implementations. I can - however - comment on
mine. If I'd done Ting's compiler, I'd probably have done it differently.

E.g. I'd have implemented a HIDE, so after implementing IF, I could
discard helper words from the search chain. Gee, I even think
I implemented such a thing in 4tH, I'm not sure. But I vaguely
remember it..

Hans Bezemer


Anton Ertl

May 21, 2022, 6:52:51 AM
Hans Bezemer <the.bee...@gmail.com> writes:
>On Friday, May 20, 2022 at 3:01:57 PM UTC+2, none albert wrote:
>> DEFER BAR
>> 'FOO IS BAR
>> is also an abomination.

When I made the DEFER proposal, I proposed IS because of common
practice, DEFER! as a non-parsing alternative, and DEFER@ in order to
get the current contents of the deferred word. Stephen Pelc suggested
a parsing alternative to DEFER@, and it became ACTION-OF.

I used to think that ACTION-OF and IS are unnecessary with DEFER@ and
DEFER!, but a few years ago we introduced defer-flavoured locals, and
with locals the idioms "['] foo defer@" and "( xt ) ['] foo defer!"
don't work, so ACTION-OF and IS are actually necessary in this
context.
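
The four words side by side, as a minimal sketch in Forth-2012 (global deferred words only; the defer-flavoured locals mentioned above are not shown). The names `greet`, `hello` and `goodbye` are hypothetical.

```forth
defer greet                         \ hypothetical deferred word
: hello   ( -- ) ." hello " ;
: goodbye ( -- ) ." goodbye " ;

' hello is greet                    \ parsing form: IS takes the name from the input
greet                               \ prints hello
' goodbye ' greet defer!            \ non-parsing form: both xts come from the stack
greet                               \ prints goodbye
' greet defer@ ' goodbye = .        \ non-parsing read: prints -1
action-of greet ' goodbye = .       \ parsing read: prints -1
```

For a global deferred word the two pairs are interchangeable; the parsing forms only become necessary where no xt of the deferred name can be obtained, as with locals.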

Concerning VALUEs, GForth does not allow ADDR for them, so you have to
use IS to change them. And that's good, because it means that @ and !
don't access the value.
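
For reference, a minimal sketch of changing a VALUE through its parsing word. The post speaks of IS; the sketch uses TO, which is what Forth-2012 specifies for VALUEs. The names `buf-size` and `grow` are hypothetical.

```forth
512 value buf-size                  \ hypothetical name
buf-size .                          \ prints 512
1024 to buf-size                    \ TO parses the name of the value
buf-size .                          \ prints 1024

: grow ( -- ) buf-size 2* to buf-size ;   \ TO also works inside a definition
grow buf-size .                     \ prints 2048
```

Since the only access paths are the name itself and TO, the system is free to decide where the value actually lives.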

Anton Ertl

May 21, 2022, 2:11:49 PM
Hans Bezemer <the.bee...@gmail.com> writes:
>On Friday, May 20, 2022 at 6:05:05 PM UTC+2, Anton Ertl wrote:
>> * It does not recognize floats.
>IMHO it should not even recognize doubles. I consider it to be an add-on.

But your three rule engine had no place for add-ons.

>I know it's a quite radical idea and most certainly would break a LOT
>of code, but I consider it to be architecturally more solid. You may have
>another opinion.

No, it actually is the direction I prefer: have an extensible text
interpreter. And once you have that, you can make your rule (1) and
(2) extensions, too.

>Gee, never happened to me. Maybe a programmer discipline issue?
[...]
>Gee, never happened to me. Maybe a programmer discipline issue?

You can burden the programmers with some tasks, claim that you have a
simple system that supposedly reduces bugs, and dismiss all bugs
arising from that as programmer discipline issues (which does not make
them go away).

I prefer to avoid such issues by giving the programmers less
burdensome ways to express the programs, e.g., number prefixes and the
tick-recognizer.

>> * Without the string-recognizer, we need to use S\" (or S"), and
>> implementing these words properly is apparently too complex for most
>> Forth systems.
>Never had that problem. Could be an architecture issue - idunno:

[C code skipped]

What do you want to tell me by showing some C fragments?

>The reason I'm posting this is: I can't comment on the rationale of
>other people's implementations. I can - however - comment on
>mine.

But you posted the C fragments without any comment relevant to the
discussion at hand.

And if you studied other implementations, you could

1) learn from them.

2) comment on them.

Your comments leave the impression that you are convinced that your
implementation is the greatest, or at least close to it, but you
actually don't know other implementations.

Hans Bezemer

May 21, 2022, 5:33:42 PM
On Saturday, May 21, 2022 at 8:11:49 PM UTC+2, Anton Ertl wrote:
> You can burden the programmers with some tasks, claim that you have a
> simple system that supposedly reduces bugs, and dismiss all bugs
> arising from that as programmer discipline issues (which does not make
> them go away).
>
> I prefer to avoid such issues by giving the programmers less
> burdensome ways to express the programs, e.g., number prefixes and the
> tick-recognizer.
Try Java. You'll find it impressive. It's built EXACTLY from that perspective.
Dijkstra LOVED it.

Hans Bezemer

dxforth

May 22, 2022, 2:42:31 AM
On 21/05/2022 20:26, Anton Ertl wrote:
> ...
> Concerning VALUEs, GForth does not allow ADDR for them, so you have to
> use IS to change them. And that's good, because it means that @ and !
> don't access the value.

So assembler routines can't access VALUEs ?

#512 value #OUTBUF \ buffer size

code WRITECHAR ( char -- )
addr #outbuf ) ax mov outsiz ) ax cmp 1 $ jnz
c: (flushwrite) 4 ?ferror ;c
1 $: ax pop outptr ) di mov al 0 [di] mov
outptr ) inc outsiz ) inc next end-code

Anton Ertl

May 22, 2022, 3:23:12 AM
dxforth <dxf...@gmail.com> writes:
>On 21/05/2022 20:26, Anton Ertl wrote:
>> ...
>> Concerning VALUEs, GForth does not allow ADDR for them, so you have to
>> use IS to change them. And that's good, because it means that @ and !
>> don't access the value.
>
>So assembler routines can't access VALUEs ?

If they know where to find them, they can. The value might be in a
register, however; it also might be in a register in some parts of the
code, and in memory in other parts.

Anton Ertl

May 22, 2022, 4:42:16 AM
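Hans Bezemer <the.bee...@gmail.com> writes:
>Try Java. You'll find it impressive. It's built EXACTLY from that perspective.

In my experience Java is quite burdensome.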

>Dijkstra LOVED it.

I actually took the effort to check your claim, and what I found
indicates that it is a blatant lie:
<https://www.cs.utexas.edu/users/EWD/transcriptions/OtherDocs/Haskell.html>

Some quotes from this letter by Dijkstra:

|their undergraduate curriculum has not recovered from the transition
|from Pascal to something like C++ or Java.

|Haskell, though not perfect, is of a quality that is several orders of
|magnitude higher than Java, which is a mess

Hans Bezemer

May 22, 2022, 9:06:37 AM
On Sunday, May 22, 2022 at 10:42:16 AM UTC+2, Anton Ertl wrote:
> Hans Bezemer <the.bee...@gmail.com> writes:
> >On Saturday, May 21, 2022 at 8:11:49 PM UTC+2, Anton Ertl wrote:
> >> I prefer to avoid such issues by giving the programmers less
> >> burdensome ways to express the programs, e.g., number prefixes and the
> >> tick-recognizer.
> >Try Java. You'll find it impressive. It's built EXACTLY from that perspective.
> In my experience Java is quite burdensome.
>
> >Dijkstra LOVED it.
>
> I actually took the effort to check your claim, and what I found
> indicates that it is a blatant lie:
> <https://www.cs.utexas.edu/users/EWD/transcriptions/OtherDocs/Haskell.html>
>
> Some quotes from this letter by Dijkstra:
>
> |their undergraduate curriculum has not recovered from the transition
> |from Pascal to something like C++ or Java.

Understanding sarcasm is a skill, I suppose. If you'd read his writings about e.g. Ada, you would have understood it was sarcasm.

HB

Paul Rubin

May 22, 2022, 4:21:37 PM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> |Haskell, though not perfect, is of a quality that is several orders of
> |magnitude higher than Java, which is a mess

From Dijkstra's 1972 Turing Award lecture:

With a few very basic principles at its foundation, it [LISP] has
shown a remarkable stability. Besides that, LISP has been the
carrier for a considerable number of in a sense our most
sophisticated computer applications. LISP has jokingly been
described as “the most intelligent way to misuse a computer”. I
think that description a great compliment because it transmits the
full flavour of liberation: it has assisted a number of our most
gifted fellow humans in thinking previously impossible
thoughts.

dxforth

May 22, 2022, 10:50:48 PM
On 22/05/2022 17:16, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 21/05/2022 20:26, Anton Ertl wrote:
>>> ...
>>> Concerning VALUEs, GForth does not allow ADDR for them, so you have to
>>> use IS to change them. And that's good, because it means that @ and !
>>> don't access the value.
>>
>>So assembler routines can't access VALUEs ?
>
> If they know where to find them, they can. The value might be in a
> register, however; it also might be in a register in some parts of the
> code, and in memory in other parts.

The choice whether a VALUE or a VARIABLE is used should have nothing to do
with whether one is using high-level or assembler. In making the former
harder to use, one is saying VALUEs for forth and VARIABLEs for assembler.

Anton Ertl

May 23, 2022, 1:49:36 AM
dxforth <dxf...@gmail.com> writes:
>On 22/05/2022 17:16, Anton Ertl wrote:
>>>So assembler routines can't access VALUEs ?
>>
>> If they know where to find them, they can. The value might be in a
>> register, however; it also might be in a register in some parts of the
>> code, and in memory in other parts.
>
>The choice whether a VALUE or a VARIABLE is used should have nothing to do
>with whether one is using high-level or assembler.

If you feel this way, you as a system designer can devise the
interface between Forth and assembly language accordingly; or as a
user of a third-party Forth system, you can choose a Forth system that
satisfies your requirement.

dxforth

May 23, 2022, 5:25:40 AM
On 23/05/2022 15:40, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 22/05/2022 17:16, Anton Ertl wrote:
>>>>So assembler routines can't access VALUEs ?
>>>
>>> If they know where to find them, they can. The value might be in a
>>> register, however; it also might be in a register in some parts of the
>>> code, and in memory in other parts.
>>
>>The choice whether a VALUE or a VARIABLE is used should have nothing to do
>>with whether one is using high-level or assembler.
>
> If you feel this way, you as a system designer can devise the
> interface between Forth and assembly language accordingly; or as a
> user of a third-party Forth system, you can choose a Forth system that
> satisfies your requirement.

Gforth seems to be the 'odd man out' here. A user may well ask why
it has made access to VALUEs so difficult.

Anton Ertl

May 23, 2022, 7:44:26 AM
dxforth <dxf...@gmail.com> writes:
>Gforth seems to be the 'odd man out' here. A user may well ask why
>it has made access to VALUEs so difficult.

Difficult? Let's phrase the question in a less loaded way:

Q: Why does Gforth not support ADDR on value-flavoured words?

A: E.g., because without ADDR a future version of Gforth can keep the
value V in a register in the loop

?do ... v ... @ ... to v ... loop

whereas with ADDR V that's not generally possible, and those cases that
are possible require a lot of compiler complexity.

Q: But I want to port code that uses ADDR V to Gforth?

A: Either define V as a VARUE (like a VALUE, but supports ADDR), or
just prepend the code with

: VALUE VARUE ;

dxforth

May 23, 2022, 9:17:05 PM
On 23/05/2022 21:23, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>Gforth seems to be the 'odd man out' here. A user may well ask why
>>it has made access to VALUEs so difficult.
>
> Difficult? Let's phrase the question in a less loaded way:
>
> Q: Why does Gforth not support ADDR on value-flavoured words?
>
> A: E.g., because without ADDR a future version of Gforth can keep the
> value V in a register in the loop
>
> ?do ... v ... @ ... to v ... loop
>
> whereas with ADDR V that's not generally possible, and those cases that
> are possible require a lot of compiler complexity.

Keeping things in registers usually refers to constants or locals.
My understanding of VALUEs is that they're read far more often than
written and, as such, your use above would appear to be something of
an anomaly. FWIW I don't expect ADDR to be used much either - which
isn't to say I can do without it. I definitely want to be able to
access VALUEs via assembler. To me that's more important than
'keeping it in a register'.

>
> Q: But I want to port code that uses ADDR V to Gforth?
>
> A: Either define V as a VARUE (like a VALUE, but supports ADDR), or
> just prepend the code with
>
> : VALUE VARUE ;

I have a hard enough time justifying VALUEs and VARIABLEs.

Anton Ertl

May 24, 2022, 7:50:41 AM
dxforth <dxf...@gmail.com> writes:
>On 23/05/2022 21:23, Anton Ertl wrote:
>> Q: Why does Gforth not support ADDR on value-flavoured words?
>>
>> A: E.g., because without ADDR a future version of Gforth can keep the
>> value V in a register in the loop
>>
>> ?do ... v ... @ ... to v ... loop
>>
>> whereas with ADDR V that's not generally possible, and those cases that
>> are possible require a lot of compiler complexity.
>
>Keeping things in registers usually refers to constants or locals.

Sure, if you support ADDR, you cannot keep values in registers in many
situations (and the remaining situations are probably so few that you
just keep them in memory all the time).

But there are also those who advocate not using locals, and to use
variables or values instead. Aren't you one of them? Anyway, for
those usages, among others, it would be beneficial to keep values in
registers.

>My understanding of VALUEs is that they're read far more often than
>written and, as such, your use above would appear to be something of
>an anomaly.

Possibly. When a value is both read and written in a loop, allocating
it in a register is particularly beneficial. Here's an example:

: foo1 0 begin 2dup u> while 1+ repeat 2drop ;
0 value x
: foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;

[/tmp:130361] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo1 bye"
[/tmp:130362] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo2 bye"

On a Skylake this produces:

    cycles instructions
1012951207   4010090522 foo1
5060982449   6010156213 foo2

So allocating x in a register (what lxf does for stack items) is 5
times faster than allocating it in memory (what lxf does for values).

>FWIW I don't expect ADDR to be used much either

Exactly. But the possible future use of ADDR on a value means that it
always has to be kept in memory. So if you support ADDR on a value,
you have to pay for it even if you don't use it (on that value, or at
all).

And that's why Gforth does not support ADDR on values. If you want to
use ADDR on a word, you can define this particular word with VARUE.

Anton Ertl

May 24, 2022, 12:53:55 PM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>When a value is both read and written in a loop, allocating
>it in a register is particularly beneficial. Here's an example:
>
>: foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>0 value x
>: foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>
>[/tmp:130361] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo1 bye"
>[/tmp:130362] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo2 bye"
>
>On a Skylake this produces:
>
> cycles instructions
> 1012951207 4010090522 foo1
> 5060982449 6010156213 foo2
>
>So allocating x in a register (what lxf does for stack items) is 5
>times faster than allocating it in memory (what lxf does for values).

Sorry for the confusion, I used both lxf and vfxlin 4.72 for this,
with similar results.

And just to show that it's not as bad for read-only or write-only
memory accesses:

: foo1 0 begin 2dup u> while 1+ repeat 2drop ;
0 value x
0 value y
: foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
: foo3 to y 0 begin y over u> while 1+ repeat drop ;
: foo4 0 begin 2dup u> while 1+ dup to y repeat 2drop ;

    cycles instructions vfxlin 4.72
1012848419   4009410376 foo1 registers
5060035003   6010211435 foo2 read+write
1900018814   4009598660 foo3 read-only
1012666250   5009454583 foo4 write-only

Here are the inner loops:

foo1            foo2                 foo3                 foo4
CMP EBX, [EBP]  CMP EBX, [080A3440]  CMP EBX, [080A3444]  CMP EBX, [EBP]
JNB 080C0ACC    JBE 080C0B7B         JNB 080C0BBF         JNB 080C0C12
INC EBX         MOV EDX, [080A3440]  INC EBX              INC EBX
JMP 080C0AC0    INC EDX              JMP 080C0BB0         MOV [080A3444], EBX
                MOV [080A3440], EDX                       JMP 080C0C00
                JMP 080C0B60

Interestingly, foo1 and foo4 read one stack item from memory (likewise
for lxf).

For lxf:

    cycles instructions lxf
1002385171   4000942415 foo1 registers
5137955357   7001690803 foo2 read+write
1002416848   5000977891 foo3 read-only
2002758293   5001179157 foo4 write-only

foo1            foo2                 foo3                 foo4
cmp [ebp], ebx  mov eax, [08389CF8]  mov eax, [08389CFC]  cmp [ebp], ebx
jbe "0804FBFB"  cmp ebx, eax         cmp eax, ebx         jbe "0804FC80"
inc ebx         jbe "0804FC3A"       jbe "0804FC5C"       inc ebx
jmp "0804FBEF"  mov eax, [08389CF8]  inc ebx              mov [08389CFC], ebx
                inc eax              jmp "0804FC4C"       jmp "0804FC6E"
                mov [08389CF8], eax
                jmp "0804FC20"

Interestingly, on lxf FOO3 is faster and FOO4 slower than on vfxlin,
even though FOO4 has the same code as on VFX; maybe a code alignment
issue. Anyway, you can see that the variant with memory read+write is
a lot slower than the other variants.

- anton

Marcel Hendrix

May 24, 2022, 4:36:44 PM
On Tuesday, May 24, 2022 at 6:53:55 PM UTC+2, Anton Ertl wrote:
[..]
> And just to show that it's not as bad for read-only or write-only
> memory accesses:
> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
> 0 value x
> 0 value y
> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
> : foo3 to y 0 begin y over u> while 1+ repeat drop ;
> : foo4 0 begin 2dup u> while 1+ dup to y repeat 2drop ;
>
> cycles instructions vfxlin 4.72
> 1012848419 4009410376 foo1 registers
> 5060035003 6010211435 foo2 read+write
> 1900018814 4009598660 foo3 read-only
> 1012666250 5009454583 foo4 write-only

Hmm, weird. On iForth it is permutated:

FORTH> #1000000000 TO #times TEST
foo1 0.212 seconds elapsed.
foo2 0.221 seconds elapsed.
foo3 0.213 seconds elapsed.
foo4 0.425 seconds elapsed. ok

FORTH> ' foo1 idis ' foo2 idis ' foo3 idis ' foo4 idis
$0133D880 : foo1
$0133D88A xor rbx, rbx
$0133D88D mov rax, rax
$0133D890 pop rdi
$0133D891 cmp rbx, rdi
$0133D894 push rdi
$0133D895 jae $0133D8A3 offset NEAR
$0133D89B lea rbx, [rbx 1 +] qword
$0133D89F jmp $0133D890 offset SHORT
$0133D8A1 push rbx
$0133D8A2 pop rbx
$0133D8A3 pop rdi
$0133D8A4 ;
$01340940 : foo2
$0134094A mov $01340500 qword-offset, 0 d#
$01340955 pop rbx
$01340956 nop
$01340957 nop
$01340958 mov rax, $01340500 qword-offset
$0134095F cmp rax, rbx
$01340962 jae $0134097E offset NEAR
$01340968 mov rax, $01340500 qword-offset
$0134096F lea rcx, [rax 1 +] qword
$01340973 mov $01340500 qword-offset, rcx
$0134097A jmp $01340958 offset SHORT
$0134097C push rbx
$0134097D pop rbx
$0134097E ;
$01340A00 : foo3
$01340A0A pop rbx
$01340A0B mov $01340520 qword-offset, rbx
$01340A12 xor rbx, rbx
$01340A15 mov rax, rax
$01340A18 mov rax, $01340520 qword-offset
$01340A1F cmp rbx, rax
$01340A22 jae $01340A30 offset NEAR
$01340A28 lea rbx, [rbx 1 +] qword
$01340A2C jmp $01340A18 offset SHORT
$01340A2E push rbx
$01340A2F pop rbx
$01340A30 ;
$01340A80 : foo4
$01340A8A xor rbx, rbx
$01340A8D mov rax, rax
$01340A90 pop rdi
$01340A91 cmp rbx, rdi
$01340A94 push rdi
$01340A95 jae $01340AAE offset NEAR
$01340A9B lea rcx, [rbx 1 +] qword
$01340A9F mov $01340520 qword-offset, rcx
$01340AA6 lea rbx, [rbx 1 +] qword
$01340AAA jmp $01340A90 offset SHORT
$01340AAC push rbx
$01340AAD pop rbx
$01340AAE pop rdi
$01340AAF ;

The stack variable ( rdi ) is causing the slowdown.

-marcel

dxforth

May 24, 2022, 8:59:47 PM
On 24/05/2022 20:48, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>> ...
>>Keeping things in registers usually refers to constants or locals.
>
> Sure, if you support ADDR, you cannot keep values in registers in many
> situations (and the remaining situations are probably so few that you
> just keep them in memory all the time).
>
> But there are also those who advocate not using locals, and to use
> variables or values instead. Aren't you one of them? Anyway, for
> those usages, among others, it would be beneficial to keep values in
> registers.
>
>>My understanding of VALUEs is that they're read far more often than
>>written and, as such, your use above would appear to be something of
>>an anomaly.
>
> Possibly. When a value is both read and written in a loop, allocating
> it in a register is particularly beneficial. Here's an example:
>
> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
> 0 value x
> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;

That's where I would use a VARIABLE - which optimizing compilers can
keep in a register if they wish. The example code I previously
posted has one VALUE referenced once and two VARIABLEs each referenced
twice. If I were going to prioritize anything for register use, it
would be VARIABLEs as they're used more often.

VALUEs in forth were conceived as self-fetching VARIABLEs. Which was
fine until one had to write to them. They're in CORE-EXT because they
didn't replace anything nor add anything. Syntactic sugar with pros
and cons is how I view them. I'll use them - but not when VARIABLEs
are the better choice.

Anton Ertl

May 25, 2022, 1:05:15 AM
Marcel Hendrix <m...@iae.nl> writes:
>On Tuesday, May 24, 2022 at 6:53:55 PM UTC+2, Anton Ertl wrote:
>[..]
>> And just to show that it's not as bad for read-only or write-only
>> memory accesses:
>> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>> 0 value x
>> 0 value y
>> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>> : foo3 to y 0 begin y over u> while 1+ repeat drop ;
>> : foo4 0 begin 2dup u> while 1+ dup to y repeat 2drop ;
>>
>> cycles instructions vfxlin 4.72
>> 1012848419 4009410376 foo1 registers
>> 5060035003 6010211435 foo2 read+write
>> 1900018814 4009598660 foo3 read-only
>> 1012666250 5009454583 foo4 write-only
>
>Hmm, weird. On iForth it is permutated:
>
>FORTH> #1000000000 TO #times TEST
>foo1 0.212 seconds elapsed.
>foo2 0.221 seconds elapsed.
>foo3 0.213 seconds elapsed.
>foo4 0.425 seconds elapsed. ok

So that's ~1/1/1/2 cycles per iteration if your CPU runs slightly
below 5GHz. My guess is that's because you are running it on a Zen3
CPU, where the hardware has special optimizations for avoiding the
memory dependence latency we see on the Skylake.

Here's the cycles/iteration that I see on more recent CPUs than the
Skylake (using lxf):

foo1 foo2 foo3 foo4
   1    8    1    1  Zen2 (Ryzen 3900X)
   1    1    1    1  Zen3 (Ryzen 5800X)
   1  1.8    1    1  Rocket Lake (Xeon W1370P)

So Rocket Lake obviously optimizes this case, too, but apparently
there is some residue.

At some point, CPUs will optimize so well that Python will run as fast
as well-written assembly language. Of course, by then software
developers will have switched to a language that's 100 times slower
even on that hardware:-).

Anton Ertl

May 25, 2022, 1:42:58 AM
dxforth <dxf...@gmail.com> writes:
>On 24/05/2022 20:48, Anton Ertl wrote:
>> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>> 0 value x
>> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>
>That's where I would use a VARIABLE

I certainly used to think that variables are the Forth way, and that
one should use values only for almost-constants that are not changed
in compiled code. But values (without ADDR) have a nice property, as
discussed in this thread.

>which optimizing compilers can
>keep in a register if they wish.

Forget it. This optimization is as hard for variables as for varues.
It is easy for values. A Forth meme is that you should not burden the
compiler with jobs that the programmer can perform. And as it
happens, values make the compiler's job easier than varues or
variables, so if you want this optimization, use values!

Another way to satisfy the meme is to fetch the variable at the start,
keep it in, e.g., a local in the word, and store it back at the end.
E.g.,

variable xx
: foo2a 0 {: x :} begin dup x u> while x 1+ to x repeat drop x xx ! ;
\ or
: foo2b 0 begin {: x :} dup x u> while x 1+ repeat x xx ! drop ;


>The example code I previously
>posted has one VALUE referenced once and two VARIABLEs each referenced
>twice. If I were going to prioritize anything for register use, it
>would be VARIABLEs as they're used more often.

And then you find that you have to prove that nothing else stores to
or fetches from the address between the accesses to the variable.
Hundreds of papers have been written about this problem (alias
analysis); it takes a lot of work (both compiler writer time and
compile time) to perform alias analysis, and it still only partially
solves the problem. As mentioned above, IMO that's not the Forth way.

We already have some Forth ways:

1) Use values.

2) Keep the value of the variable explicitly in locals, the return
stack, or the data stack in the code fragment where you want to give
the compiler the opportunity to keep it in a register.
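
A minimal sketch of way 2 using only the data stack, in standard Forth. The names COUNTER and COUNT-UP are made up for this illustration and appear nowhere else in the thread; the loop body is the same as in FOO1 above.

```forth
variable counter

\ Count COUNTER up to LIMIT, touching memory once on entry and once on exit.
: count-up ( limit -- )
  counter @                        \ fetch the variable once: limit x
  begin 2dup u> while 1+ repeat    \ the loop works on stack items only
  counter ! drop ;                 \ store the result back once, discard limit
```

Inside the loop there is no @ or !, so a compiler that keeps stack items in registers has no aliasing question to answer.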

dxforth

May 25, 2022, 3:15:57 AM
On 25/05/2022 15:05, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 24/05/2022 20:48, Anton Ertl wrote:
>>> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>>> 0 value x
>>> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>>
>>That's where I would use a VARIABLE
>
> I certainly used to think that variables are the Forth way, and that
> one should use values only for almost-constants that are not changed
> in compiled code.

Does that make a VARUE an 'almost-variable' :) How many languages
do you know have "almost constants"? Sorry, a VALUE is a variable.

> But values (without ADDR) have a nice property, as
> discussed in this thread.

Good luck in the real world.

>
>>which optimizing compilers can
>>keep in a register if they wish.
>
> Forget it. This optimization is as hard for variables as for varues.
> It is easy for values. A Forth meme is that you should not burden the
> compiler with jobs that the programmer can perform. And as it
> happens, values make the compiler's job easier than varues or
> variables, so if you want this optimization, use values!

A better meme is don't break common practice - especially when one
is promoting portability between systems, standards and the like.

Hans Bezemer

May 25, 2022, 11:04:28 AM
On Sunday, May 22, 2022 at 10:21:37 PM UTC+2, Paul Rubin wrote:
The advantage of being Dutch is that lots more of his work is available to you.
Dijkstra didn't believe the myth that "the language will save you" - that's a
recurring theme in his work:

"For example, to the desperate manager, it is a startlingly comforting thought
that the programming language in use is the source of all his misery. The wretch
clings to the dream of the programming language, in which programming is so
easy that everything will work itself out. The new programming languages as
panacea are sold in the quack stall like fresh bread from the baker. Notorious in
this regard is the IBM ad in Datamation, 1968, in which a radiant Susie Meyer
—in colors!— declares that the conversion to PL/I was the end of all her programming
troubles. Unfortunately, history does not record what poor Susie Meyer looked like a
few years later, but one can guess, because the miracle cure did not work, of course.
Anyone who reads the propaganda literature for Ada—the programming language
promoted by the US Department of Defense—must see that the world has changed
little in 14 years".

And:

"Once the feeling prevails among mathematicians that computers are not worth
bothering with, it acts as a so-called "self-fulfilling prophecy": if the intellectually
best equipped ignore the subject, the territory is occupied by second- and third-rate
people, and after a while it is even more difficult for the really clever boy to
imagine that there is a task for him there (..) By talking the essentially
mathematical nature of the entire usage problem under the table with a large-scale
campaign, by presenting programming as something that anyone can learn in a
three-week course, in short, by declaring so obstinately that the gold of the
promised mountains is solid that enough people believe it. In the years that
followed, recruitment for the profession of programmer was completely uncritical.
Here in this country MULO (Highschool) was more than enough, but in other
countries it was no better. The result can be imagined: of the half-million
professional programmers the world now counts, the majority are of an incompetence
that defies description. But herein lies an explanation for the tenacity of the
software crisis: an incompetent labor army of half a million isn't replaced overnight".

Hans Bezemer

Original:

Voor de wanhopige manager is het bijvoorbeeld een ontstellend geruststellende gedachte,
dat de gebezigde programmeertaal de bron van al zijn ellende is. De stakker klampt zich
vast aan de droom van de programmeertaal, waarin programmeren zo makkelijk is, dat het
allemaal vanzelf goedkomt. De nieuwe programmeertalen als panacee gaan in de
kwakzalverskraam over de toonbank als verse broodjes bij de bakker. In dit verband berucht
is de IBM-advertentie in Datamation, 1968, waarin een stralende Susie Meyer —in kleuren!—
verklaart, dat de bekering tot PL/I het einde van al haar programmeerproblemen was. Hoe de
arme Susie Meyer er een paar jaar later uitzag, vermeldt de historie helaas niet, maar het laat
zich raden, want het wondermiddel heeft natuurlijk niet gewerkt. Wie de propagandaliteratuur
voor Ada —de programmeertaal, die door het Amerikaanse ministerie van defensie wordt
gepousseerd— leest, moet constateren, dat de wereld in 14 jaar weinig is veranderd.

Overheerst bij wiskundigen eenmaal het gevoel dat computers niet de moeite waard
zijn om je mee bezig te houden, dan werkt dat vervolgens als een z.g.
"self-fulfilling prophecy": als de intellectueel het best geequipeerden het vak links laten
liggen, wordt het gebied bezet door tweede- en derderangs mensen en na enige tijd kan
de echt knappe jongen zich nog moeilijker voorstellen dat daar een taak voor hem ligt (..)
Door met een groots opgezette campagne het wezenlijk wiskundige karakter van de hele
gebruiksproblematiek onder tafel te praten, programmeren voor te stellen als iets dat
iedereen in een drieweekse cursus kan leren, kortom door zo hardnekking te verklaren
dat het goud der beloofde bergen massief is, dat genoeg mensen het geloven. In de jaren
die daar op volgden is er voor het vak van programmeur volledig kritiekloos geronseld.
Hier in den lande was MULO ruimschoots genoeg, maar in andere landen was het geen
haar beter. Het resultaat laat zich denken: van de half-millioen professionele programmeurs,
die de wereld inmiddels telt is het merendeel van een incompetentie die elke beschrijving tart.
Maar hier ligt wel een verklaring van het taaie leven van de software crisis: een incompetent
arbeidsleger van een half millioen ververs je niet een-twee-drie.


Anton Ertl

May 25, 2022, 1:30:39 PM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>My guess is that's because you are running it on a Zen3
>CPU, where the hardware has special optimizations for avoiding the
>memory dependence latency we see on the Skylake.
>
>Here's the cycles/iteration that I see on more recent CPUs than the
>Skylake (using lxf):
>
>foo1 foo2 foo3 foo4
>1 8 1 1 Zen2 (Ryzen 3900X)
>1 1 1 1 Zen3 (Ryzen 5800X)
>1 1.8 1 1 Rocket Lake (Xeon W1370P)
>
>So Rocket Lake obviously optimizes this case, too, but apparently
>there is some residue.

Of course, the question is how well this works in other cases. A
major difference between gforth and gforth-fast is that gforth keeps
the TOS in memory and gforth-fast keeps it in a register. And if we
disable the additional optimizations of gforth-fast with
"--ss-number=0 --ss-states=0", that's even more so. In particular, a
lot of the performance difference on CPUs without this optimization is
due to this difference. So on CPUs with this optimization the
performance difference should be much smaller. Let's see whether that
works out. I do

for i in gforth-fast gforth; do LC_NUMERIC=en_US perf stat -e cycles -e instructions -e branches $i --ss-number=0 --ss-states=0 ../gforth/onebench.fs; done

on six different CPUs; the numbers are cycles, except for the first
line.

gforth-fast gforth
3,460,908,004 4,737,564,166 instructions (all)
2,269,234,025 4,104,828,209 Goldmont (2016 efficiency)
1,714,076,717 3,495,857,543 Sandy Bridge (2011)
1,584,890,277 3,419,614,950 Zen (2017)
1,342,756,453 2,706,876,924 Zen2 (2019)
1,110,655,337 2,424,136,035 Zen3 (2020)
1,189,653,919 2,106,179,585 Rocket Lake (2021)

So, gforth-fast --ss-number=0 --ss-states=0 gets roughly a factor of 2
across the board. The memory dependence optimization does not
appear to benefit gforth more than gforth-fast. It's unclear to me
whether that is because it is not that effective for gforth, or there
are also important memory dependencies in gforth-fast for which it
helps (but I don't know of the latter).

Hans Bezemer

May 25, 2022, 4:23:39 PM
On Wednesday, May 25, 2022 at 2:59:47 AM UTC+2, dxforth wrote:
> That's where I would use a VARIABLE - which optimizing compilers can
> keep in a register if they wish. The example code I previously
> posted has one VALUE referenced once and two VARIABLEs each referenced
> twice. If I were going to prioritize anything for register use, it
> would be VARIABLEs as they're used more often.
>
> VALUEs in forth were conceived as self-fetching VARIABLEs. Which was
> fine until one had to write to them. They're in CORE-EXT because they
> didn't replace anything nor add anything. Syntactic sugar with pros
> and cons is how I view them. I'll use them - but not when VARIABLEs
> are the better choice.

As far as 4tH is concerned - I started out like you. Sure, it's in the extended
core and people tend to use them in programs, so it's handy if you add them
to the vocabulary. But its use largely overlaps VARIABLE. Not supporting
LOCALS helps too ;-)

Nowadays the difference (in 4tH) has blurred even further, since the optimizer
changes VARIABLEs to VALUEs when the address of the variable is known
at compile time. +TO changes VALUEs to VARIABLEs so it won't take up another
token and +! can simply be used.

And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
like VARIABLE in Forth-79.

HB

dxforth

May 26, 2022, 1:28:53 AM
On 26/05/2022 06:23, Hans Bezemer wrote:
> ...
> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
> like VARIABLE in Forth-79.

Given VALUEs will change (else one would use CONSTANT) it might have been
better to omit the initial value. Or would that have made VALUEs too much
like VARIABLEs ? In the app below every VALUE was defined with a dummy.

\ Program constants
0 value #TERMS \ number of terminals in DTA file
0 value TERM \ working terminal#
$95 constant TLEN \ length of each term definition
$100 constant CHUNK \ in-file chunk to get
20 constant ISIZ \ size input buffer / terminal name
200 constant TMAX \ max #terminals

\ Storage areas allocated at run-time
here value IN-BASE ( -- a ) \ in-file
here value DTA-BASE ( -- a ) \ dta-file
here value TBUF ( -- a ) \ temp terminal buffer
here value SBUF ( -- a ) \ swap/work buffer
here value IBUF ( -- a ) \ console input
here value XBUF ( -- a ) \ terminal index

Anton Ertl

May 26, 2022, 2:28:46 AM
dxforth <dxf...@gmail.com> writes:
>On 26/05/2022 06:23, Hans Bezemer wrote:
>> ...
>> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
>> like VARIABLE in Forth-79.

From Forth-79:
|VARIABLE 227
| A defining word executed in the form:
| VARIABLE <name>
| to create a dictionary entry for <name> and allot two bytes
| for storage in the parameter field. The application must
| initialize the stored value.

In fig-Forth VARIABLE took the initial value from the stack. I miss
it every time I use VARIABLE.

>Given VALUEs will change (else one would use CONSTANT) it might have been
>better to omit the initial value.

No. You need to initialize a variable (or value) before reading its
value. So the way to ensure this is to initialize it right after
definition:

variable where-index -1 where-index !

This is much more cumbersome than fig-Forth's

-1 variable where-index

At least they got it right for VALUE.
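
For anyone who misses the fig-Forth behaviour, a sketch of an initialized variable in standard Forth. IVARIABLE is a hypothetical name chosen for this example (so that the standard VARIABLE is left alone); it is not a word from any standard or from the systems discussed here.

```forth
\ IVARIABLE is a made-up name; it mimics fig-Forth's "n VARIABLE name".
: ivariable ( x "name" -- )
  create , ;          \ reserve one cell and store the initial value in it

-1 ivariable where-index
where-index @ .       \ fetch and print the initial value
```

The created word returns an address like any VARIABLE, so @ and ! work on it unchanged.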

>In the app below every VALUE was defined with a dummy.

A defined dummy helps avoid Heisenbugs, and can help find bugs.

>\ Program constants
>0 value #TERMS \ number of terminals in DTA file

You have more than 0 terminals at the start?

>\ Storage areas allocated at run-time
>here value IN-BASE ( -- a ) \ in-file
>here value DTA-BASE ( -- a ) \ dta-file
>here value TBUF ( -- a ) \ temp terminal buffer
>here value SBUF ( -- a ) \ swap/work buffer
>here value IBUF ( -- a ) \ console input
>here value XBUF ( -- a ) \ terminal index

I work on systems where accessing memory near 0 produces an exception,
so initializing addresses as 0 helps find bugs quickly where the
allocation has not happened before the first read. Without
initialization, the bug might go unnoticed for longer, making it
harder to find.

dxforth

May 26, 2022, 4:51:24 AM
On 26/05/2022 16:12, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 26/05/2022 06:23, Hans Bezemer wrote:
>>> ...
>>> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
>>> like VARIABLE in Forth-79.
>
> From Forth-79:
> |VARIABLE 227
> | A defining word executed in the form:
> | VARIABLE <name>
> | to create a dictionary entry for <name> and allot two bytes
> | for storage in the parameter field. The application must
> | initialize the stored value.
>
> In fig-Forth VARIABLE took the initial value from the stack. I miss
> it every time I use VARIABLE.
>
>>Given VALUEs will change (else one would use CONSTANT) it might have been
>>better to omit the initial value.
>
> No. You need to initialize a variable (or value) before reading its
> value. So the way to ensure this is to initialize it right after
> definition:

I do that in a function (e.g. INIT ) so that when a program is re-run
VALUEs and anything else that requires predictable initial values will
be correctly set. Defining x VALUE FOO is misleading as it suggests
FOO will be x whenever the program is run. It's such assumptions that
lead to your Heisenbugs.

>
> variable where-index -1 where-index !
>
> This is much more cumbersome than fig-Forth's
>
> -1 variable where-index
>
> At least they got it right for VALUE.
>
>>In the app below every VALUE was defined with a dummy.
>
> A defined dummy helps avoid Heisenbugs, and can help find bugs.

Or not. I recently ported an old program which presumed VARIABLE
initialized to zero. It helped that my VARIABLE initialized to a
random value else I might still be looking for the bug.

>
>>\ Program constants
>>0 value #TERMS \ number of terminals in DTA file
>
> You have more than 0 terminals at the start?
>
>>\ Storage areas allocated at run-time
>>here value IN-BASE ( -- a ) \ in-file
>>here value DTA-BASE ( -- a ) \ dta-file
>>here value TBUF ( -- a ) \ temp terminal buffer
>>here value SBUF ( -- a ) \ swap/work buffer
>>here value IBUF ( -- a ) \ console input
>>here value XBUF ( -- a ) \ terminal index
>
> I work on systems where accessing memory near 0 produces an exception,
> so initializing addresses as 0 helps find bugs quickly where the
> allocation has not happened before the first read. Without
> initialization, the bug might go unnoticed for longer, making it
> harder to find.

My preferred solution was:

: INIT ( -- )
altered off \ clear
isiz reserve to ibuf \ console input
tlen reserve to tbuf \ temp terminal buffer
chunk reserve to sbuf \ swap/work buffer
tmax cells reserve to xbuf \ terminal index
;

( INIT)

\ Main
: INSTALL ( -- )
cls title init
open-target read-dta catalog
menu
close-target
;

Paul Rubin

May 26, 2022, 1:23:34 PM
dxforth <dxf...@gmail.com> writes:
> Defining x VALUE FOO is misleading as it suggests FOO will be x
> whenever the program is run.

It only suggests that FOO is x until it gets updated. FOO should (if
possible) have a legitimate and meaningful value at all times, rather
than a dummy value.

P Falth

May 26, 2022, 4:48:37 PM
Anton, this is an interesting and somewhat unexpected development.
Can you share some more information on how you plan to implement it?

I can see some problems in that you can have more values defined than
you have registers available. In lxf I have 5 registers available for the code
generator, the rest are used by the system (stackpointers etc). Even if I only
use 1 register for values it will hurt the code generator. And what VALUE
would I dedicate that register to?

On a CPU with many registers like ARM64 I could think of something like

1234 REGISTER-VALUE R1 test

to make test a value stored in register R1. This could be very useful
in some cases.

BR
Peter Fälth

dxforth

May 26, 2022, 7:54:09 PM
On 26/05/2022 16:12, Anton Ertl wrote:
>>On 26/05/2022 06:23, Hans Bezemer wrote:
>>> ...
>>> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
>>> like VARIABLE in Forth-79.
>
> From Forth-79:
> |VARIABLE 227
> | A defining word executed in the form:
> | VARIABLE <name>
> | to create a dictionary entry for <name> and allot two bytes
> | for storage in the parameter field. The application must
> | initialize the stored value.
>
> In fig-Forth VARIABLE took the initial value from the stack. I miss
> it every time I use VARIABLE.

AFAIK the 'n VARIABLE name' syntax originated with Moore. It is present
in Kitt Peak Forth and Forth Inc's microForth (from which fig-Forth likely
got it). It also appears in Forth-77 (which FD called a "standardized
glossary" - also based on Kitt Peak Forth). According to FD, Forth-79
was the first serious attempt at a standard and portability. I'm not
aware of any official explanation as to why the initializing value of VARIABLE
was dropped in Forth-79 (it also seems to have disappeared by the time of
polyForth and Starting FORTH). The change didn't bother me. If anything,
I thought they were fixing a loop-hole.
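
For anyone who misses it, the fig-Forth behaviour is easy to recover in
standard Forth with CREATE. A minimal sketch; the name IVARIABLE is
hypothetical and not taken from any of the systems mentioned:

```forth
\ IVARIABLE is a hypothetical name: a variable initialized from the stack
: ivariable ( x "name" -- ) create , ;

-1 ivariable where-index
where-index @ .   \ prints -1
```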

So where did VALUE come from? All I know is TO first appeared in FD V1N4.
According to the author it was based on a suggestion by Moore. The
intention was:

n VARIABLE foo foo @ ( n)

would be replaced by:

VARIABLE foo n TO foo foo ( n)

I don't know who first came up with VALUE or when. The earliest I've
seen it implemented was in F-PC. I suspect however it was in use before
that. Perhaps someone knows?
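
For comparison, the two styles in Forth-94/Forth-2012 terms. This is only
a sketch with made-up names (FOO-VAR, FOO-VAL):

```forth
variable foo-var   5 foo-var !   \ VARIABLE: no initial value, explicit store
foo-var @ .                       \ prints 5

5 value foo-val                   \ VALUE: initial value taken from the stack
foo-val .                         \ prints 5
7 to foo-val                      \ TO replaces the stored value
foo-val .                         \ prints 7
```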

dxforth

May 26, 2022, 8:32:02 PM
It's not the case for DEFER. I would very much have preferred that VALUE
required no initial value; then I wouldn't need to include a dummy.
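
A sketch of the contrast in Forth-2012 terms (GREET and #ITEMS are made-up
names): DEFER takes nothing from the stack and gets its behaviour later with
IS, while VALUE demands an initial value even when it is only a placeholder.

```forth
defer greet                      \ no initial value needed
0 value #items                   \ 0 here is only a placeholder

:noname ( -- ) ." hello" ; is greet
3 to #items

greet        \ prints hello
#items .     \ prints 3
```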

Andy Valencia

May 27, 2022, 4:25:54 PM
Paul Rubin <no.e...@nospam.invalid> writes:
> It only suggests that FOO is x until it gets updated. FOO should (if
> possible) have a legitimate and meaningful value at all times, rather
> than a dummy value.

Indeed, thus almost every language has initializers for things which can
subsequently have their value changed.

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

dxforth

May 27, 2022, 11:51:57 PM
On 28/05/2022 06:24, Andy Valencia wrote:
> Paul Rubin <no.e...@nospam.invalid> writes:
>> It only suggests that FOO is x until it gets updated. FOO should (if
>> possible) have a legitimate and meaningful value at all times, rather
>> than a dummy value.
>
> Indeed, thus almost every language has initializers for things which can
> subsequently have their value changed.

Yes, but what sense does it make to initialize them at compile-time
when it's execution-time that matters?

Ruvim

May 28, 2022, 7:32:28 AM
On 2022-05-19 18:47, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>
> I avoid these unless there is some reason not to. In particular:
>
> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
> interpretive code.

To quote a word I prefer the form 'FOO (i.e. Tick vs Backtick) for the
following reasons:
- it is closer to "[']" and "'" (so it's already associated with
quoting a word in Forth);
- it is also used for quoting in some other languages (e.g., in Lisp,
to quote a list).

Possible disadvantages of this choice are as follows.

- Sometimes Tick is an actual part of a name. But those who use names
starting with a Tick probably will not use recognizers, but parsing words.

- Tick is used for character literals in Gforth. But it was a
suboptimal choice, I think.



[...]
>> To use them as:
>>
>> `foo :def 123 . ;
>>
>
> Why `FOO, not "FOO"?

For better readability.

I use Backtick for a space delimited string [1]. This form tells me that
the string is a name (a name of something, not necessarily the name of a
definition). And it's better for readability. Also, this form is far
more concise than S" FOO", which was important before a recognizer for
"FOO" was introduced.




I think we should find some conventions for these forms (and maybe some
other), and give them names. After that, to avoid conflicts, a Forth
source code file (or code block) that relies on a convention, should
start with a declaration that mentions the convention's name.

The format of such a declaration probably should be standardized.

As an example, JavaScript uses "use strict" string literal as the first
item of a code block to declare strict mode.

We could use a list of recognizer names in the declaration. The scope of
this declaration (where these recognizers are in effect) should be
limited in an obvious way.



[1] Not only I, see for example:
https://github.com/rufig/spf4-utf8/blob/9eaeb371ee2824082464cc39552390bf74308451/devel/%7Eygrek/lib/xmltag.f#L95

--
Ruvim

Ruvim

May 28, 2022, 9:55:51 AM
On 2022-05-19 18:47, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:

[...]

>> 123 constant( foo )
>> create( bar ) 456 ,
>>
>> :( baz ) ( -- x ) foo postpone( foo bar ) ;
>>
>> :( test-baz ) :( baz2 ) baz drop postpone( @ ; ) ;
>>
>> t{ test-baz baz2 -> 123 456 }t
>>
>>

> And please explain the definition of TEST-BAZ and BAZ2.

It's just a test case.
TEST-BAZ creates the word BAZ2, that is equivalent to the following
definition:

: baz2 [ baz drop postpone( @ ; ) ] [

: baz2 [ baz drop ] @ ;

: baz2 [ foo postpone( foo bar ) drop ] @ ;

: baz2 [ foo ] foo bar [ drop ] @ ;

: baz2 [ 123 ] foo bar [ drop ] @ ;

: baz2 [ 123 drop ] foo bar @ ;

: baz2 foo bar @ ;

: baz2 123 bar @ ;




Actually, the form
:( foo )
is not very readable because of the confusing ":("

A variant:
def( foo )
looks slightly better. But then the block should end with "end-def".

Maybe:

:def( foo ) ... ;

Just for reference, an implementation of ':def(' follows.

\ Data type symbols:
\ "sd" is a pair ( c-addr u )
\ "t" is a tuple ( i*x )

\ The code relies on the "quoted-word-by-tick" recognizer

: noop ;
: tt-dual ( t xt.compil xt.interp -- t )
state @ if drop else nip then execute
;
: tt-slit ( sd -- sd | ) 'slit, 'noop tt-dual ;
: tt-xt ( t xt -- t ) 'compile, 'execute tt-dual ;
: def ( sd.name -- ) ( C: -- colon-sys ) ': execute-parsing ;

: :def( \ "ccc )"
')' parse [: ( -- sd.name )
parse-name parse-name nip abort" unexpected immediate argument"
;] execute-parsing ( sd.name | c-addr 0 )
dup if tt-slit else 2drop then 'def tt-xt
; immediate


It defines such a behavior that the following lines of code are equivalent:

:def( foo ) :def( bar ) ... ;

:def( foo ) "bar" :def( ) ... ;

: foo "bar" ': execute-parsing ... ;


Well, I dislike this variant too, since the elided fragment is not the
body of the "bar" definition.



--
Ruvim

Ruvim

May 28, 2022, 9:56:37 AM
On 2022-05-19 18:47, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>
> I avoid these unless there is some reason not to. In particular:
>
> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
> interpretive code.

> Instead of POSTPONE FOO, I write ]] FOO [[. Especially nice for
> multiple words.
>
> Instead of S" BLA", I write "BLA".
>
> I don't use ABORT", not the least because I always have to look up the
> direction of the flag.
> Instead, I use THROW. If I need a new ball, I
> create it with
>
> "new ball" exception constant new-ball



I use ABORT" for the examples that should be short and standard
compliant. And I still have to use S" BLA", ['] and POSTPONE for
strictly standard compliant modules.




>> And what about defining words?
>
> I tend to use these. They have default compilation semantics and one
> rarely wants to copy-paste them between compiled and interpreted code.
> Of course, in those rare cases (i.e., when debugging a defining word),
> I wish that they took their name argument from the stack.


It's confusing that in one context they are followed by an immediate
argument, but in another context they are not.

: foo : bar ;

"foo" is an immediate argument of the first colon, but "bar" is not an
immediate argument of the second colon. It's very confusing. (1)
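
The same effect shows up with any defining word. A small sketch in standard
Forth (TEN-CONSTANT is a made-up name): inside the definition, CONSTANT is
not followed by its argument; the name is parsed only when TEN-CONSTANT runs.

```forth
\ TEN-CONSTANT is hypothetical; CONSTANT parses nothing while this compiles
: ten-constant ( "name" -- ) 10 constant ;

ten-constant x     \ "x" is consumed by the CONSTANT inside TEN-CONSTANT
x .                \ prints 10
```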


Actually, some such words cannot be avoided due to Forth reflection.
E.g. the phrase:
parse-name test type
produces different effects when it's interpreted and when it's executed
(after compilation to a word).
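
A sketch of that difference, assuming a Forth-2012 system with PARSE-NAME;
TEST and DEMO below are made-up definitions:

```forth
parse-name test type cr     \ interpreted: PARSE-NAME consumes "test"; prints test

\ Compiled: TEST is now looked up as a word, and PARSE-NAME takes
\ whatever follows DEMO in the input at run time.
: test ( c-addr u -- c-addr u ) ." [" ;
: demo ( "name" -- ) parse-name test type ." ]" ;

demo hello                  \ prints [hello]
```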

But "parse-name" is a “plumbing” word, and ":" is a “porcelain” word
(after Git's terminology). Puzzles are OK for plumbing.





>> I wonder why people continue to use parsing words if they dislike them.
>
> In the four cases above, I do it when writing code that should work on
> Forth systems that do not understand the better idioms, or when
> demonstrating something to an audience that may not be familiar with
> the better idioms, and these idioms would distract from the point I am
> trying to demonstrate.

I would like to get rid of the burden of those old idioms.



>> Why not introduce new words like:
>>
>> :def ( sd.name -- ) ( C: -- colon-sys )
>>
>> does-created ( xt sd.name -- )
>
> The question is if the benefit is worth the cost in these cases.
> Cost: additional words (because we don't want to destandardize all
> existing code). Benefit: rare, as mentioned above.


And I would like to avoid the problem (1) in some lexical scopes at least.



>
>> And after that, what to do with "[if]" and "[undefined]"?
>
> And \ and (.

If they are with us in any case, we should not avoid them, but use them
properly (safely and readably).



--
Ruvim

Anton Ertl

May 28, 2022, 10:32:30 AM
Ruvim <ruvim...@gmail.com> writes:
>On 2022-05-19 18:47, Anton Ertl wrote:
>> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
>> interpretive code.
>
>To quote a word I prefer the form 'FOO (i.e. Tick vs Backtick) for the
>following reasons:
> - it is closer to "[']" and "'" (so it's already associated with
>quoting a word in Forth);
> - it is also used for quoting in some other languages (e.g., in Lisp,
>to quote a list).
>
>Possible disadvantages of this choice are as follows.
>
> - Sometimes Tick is an actual part of a name. But those who use names
>starting with a Tick probably will not use recognizers, but parsing words.

Gforth currently has the following words starting with ':

'-error ' 'quit 'cold 'image 'clean-maintask

and Gforth has recognizers. Also, I have seen several programs that
define words with names starting with ' (and often paired with a word
with a name without ', such as 'QUIT and QUIT). For ` I have yet to
see someone write a word that starts with that. As soon as someone
loads that program in a system with a '-recognizer, there are the
following potential problems:

* The user may be unaware of the existence of 'QUIT, and writes 'QUIT
with the intention of getting the xt of QUIT, but gets something
else because the word-recognizer precedes the '-recognizer.

* If, OTOH, the '-recognizer preceded the word-recognizer, you get the
converse problem: If you want to get at the word 'QUIT, you get the
xt of QUIT instead.

We also would have preferred to use 'FOO for the xt of FOO, but to
avoid these problems, we chose `.

> - Tick is used for character literals in Gforth. But it was a
>suboptimal choice, I think.

The usage 'c' is standardized in Forth-2012. The usage 'c is legacy
in Gforth, and not the major reason why we decided against 'FOO,
although a user writing, e.g., '# might be unpleasantly surprised that
the result is the ASCII value of # instead of the xt of #.
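
For reference, a sketch of the standardized character-literal form next to
the older parsing words (HASH is a made-up name; the 'c' form assumes a
system that implements the Forth-2012 number syntax):

```forth
'#' .                        \ Forth-2012 character literal; prints 35
char # .                     \ interpretive parsing word; prints 35
: hash ( -- c ) [char] # ;   \ compile-time parsing word
hash .                       \ prints 35
```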

There is a word I' in Gforth, so 'I' would be a conflict with
non-legacy syntax. I think that given the fact that Gforth by default
tries to recognize a character before an xt, most users who want the
xt of I' probably would see 'I' and immediately see the conflict.

In any case, while some problems are less serious than others, all of
these problems are avoided by using ` in the tick-recognizer, which is
why we are using that.

>I think we should find some conventions for these forms (and maybe some
>other), and give them names. After that, to avoid conflicts, a Forth
>source code file (or code block) that relies on a convention, should
>start with a declaration that mentions the convention's name.
>
>The format of such a declaration probably should be standardized.
>
>As an example, JavaScript uses "use strict" string literal as the first
>item of a code block to declare strict mode.
>
>We could use a list of recognizer names in the declaration. The scope of
>this declaration (where these recognizers are in effect) should be
>limited in an obvious way.

I think that, on the contrary, we should use one common set of words
and syntax instead of introducing ways to declare idiosyncrasy.

E.g., do I think that the alignment handling in struct.fs is superior
to the Forth-2012 field words? Yes. Still, when writing new code, I
use the Forth-2012 words. And that's despite the fact that struct.fs
is a standard program, so you can use the struct.fs words fully
portably. It's just that the cost of idiosyncrasy outweighs the
benefits of the superior alignment handling.

We have also had ways to specify which wordset a program uses: wordset
queries with ENVIRONMENT?. A number of systems did not support that
in a useful way, few programs used it, and eventually we decided to
make them obsolescent (and refer only to Forth-94 while they are still
there). So I expect that a new way to specify which dialect a program
uses will also receive little love.
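
For readers who have not seen them, a wordset query looked roughly like
this. A sketch only (.FLOATING? is a made-up name): per Forth-94,
ENVIRONMENT? returns false for an unknown query, and a flag plus true for a
known one.

```forth
\ Report how the system answers the (obsolescent) FLOATING wordset query.
: .floating? ( -- )
  s" FLOATING" environment? if
    if ." floating wordset present" else ." floating wordset absent" then
  else
    ." query not recognized"
  then cr ;

.floating?
```

The three possible answers are part of why such queries were awkward to
rely on: "not recognized" says nothing about whether the words are there.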

That being said, once we have standardized configurable recognizers,
nothing prevents you from adding a recognizer that uses ' for xt
literals and another recognizer that uses ` for string. So your
convention might be something like

require ruvim.4th ruvim-convention{
... \ code that uses ruvim-convention stuff
}ruvim-convention

I just hope that, like I do for field words, you will use the
(hopefully standardized) string syntax "string" (which already has a
lot of mindshare), and that we reach a consensus for a syntax for xt
literals, and all use that then.

- anton

Ruvim

May 28, 2022, 10:41:21 AM
On 2022-05-19 21:20, S Jack wrote:
> On Wednesday, May 18, 2022 at 9:30:45 AM UTC-5, Ruvim wrote:
>> Do you avoid the standard parsing words?
>
> No. For new words where the choice is to make it parsing or postfix I choose
> postfix. I've re-defined some existing parsing words to be postfix such as
> FORGET and SEE:
> ' foo FORGET
> ' foo SEE
> But I don't go for purity which usually leads to abominations. Note in
> above tick is acceptable. It's a matter of using exceptions sparingly and
> where most effective. That's the art and of course not everyone is going
> to agree on the choices.

My choice is a recognizer for Tick, e.g. 'foo

And I think, we can avoid exceptions in some scopes.


> But back to your original what should be standard convention for parsing
> word syntax, my view:
>
> General choices
> 1) foo bar bat
> No syntax
> One must know what foo bar and bat are.
> 2) foo: bar bat
> Syntax indicates foo: a parsing word with bar as immediate parameter
> but bat is undetermined, could be a second parameter to foo or an
> operator.
> 3) foo( bar bat )
> Syntax indicates foo( is parsing word and has two immediate parameters
> bar and bat.
>
> Choice (1) should be preferable to the Forth purist (DX, the.Bee).
> Don't waste time worrying over syntax schemes.
>
> Choice (2) proposed by Albert which works for me is a simple syntax for
> parsing words, sufficient since our use of parsing words will be limited.
>
> Choice (3) is totally explicit and should fit well in a more formal Forth which
> is standard Forth.
>
> I think choice (3) is best for the standard. I'll probably be using choice
> (2) but that doesn't mean I'm changing ' to ': .


Of these three variants, (3) looks better to me too.

But applying this rule to Colon seems not so pretty:

:( foo ) ... ;

(as I already mentioned in a previous message).


If you want a Colon-like word that parses its body, what could it look like?

I mean something like:

def( foo ){ ... }def

I.e., the word should parse both the name and the whole definition body.
What could the syntax be for such a case?


--
Ruvim

Marcel Hendrix

May 28, 2022, 11:19:13 AM
On Saturday, May 28, 2022 at 4:32:30 PM UTC+2, Anton Ertl wrote:
[..]
> For ` I have yet to
> see someone write a word that starts with that.

\ Try to talk to the Forth inside the server (If it's not C).

ALSO ENVIR

SERVER >UPC 'N'
<> [IF]
: ` &' PARSE \ #<string of server commands>#
{{
IFORTH!
DUP BOOTLINK @ CHANNEL-!
BOOTLINK @ CHANNEL-SEND
}} ;
[ELSE] : ` CR ." Cannot execute: ``" &' PARSE TYPE ." ''" ;
[THEN]

PREVIOUS

-marcel

S Jack

May 28, 2022, 11:54:07 AM
On Saturday, May 28, 2022 at 9:41:21 AM UTC-5, Ruvim wrote:
>
> Of these three variants, (3) looks better to me too.
>
> But applying this rule to Colon seems not so pretty:
>
> :( foo ) ... ;
>

Like PostScript, make name assignment postfix:
: .... ; =( foo )

Colon is a :noname and "=(" assigns a name if wanted. ":NONAME" is no longer needed.

Not going for purity one could do:
: .... ;\ foo
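
Something close to this can already be had in standard Forth. A rough
sketch, with NAMED as a made-up word; it still parses the name, but after
the body rather than before it:

```forth
\ NAMED is hypothetical: give a name to an xt after the fact
: named ( xt "name" -- ) create , does> ( -- ) @ execute ;

:noname ( -- ) ." hello" ; named hello
hello    \ prints hello
```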

--
me

Ruvim

May 28, 2022, 12:00:06 PM
On 2022-05-20 11:48, albert wrote:
> In article <t62vui$1mi$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>> On 2022-05-17, S Jack wrote:
>>> On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
>>>> Can we have a convention for naming parsing words?
>>>>
>>>> What is your considerations?
>>>>
>>>>
>>>> [1] Naming for parsing words
>>>> https://github.com/ForthHub/discussion/discussions/112
>>>
>>> Bah, that a parsing word with immediate parameter is 'better readable' than
>>> a postfix operator. Most definitions contain postfix words and mixing in
>>> parsing words is style conflict so I avoid parsing words.
>>
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>> And what about defining words?
>>
>> I wonder why people continue to use parsing words if they dislike them.
>>
>> Why not introduce new words like:
>>
>> :def ( sd.name -- ) ( C: -- colon-sys )
>>
>> does-created ( xt sd.name -- )
>>
>> To use them as:
>>
>> `foo :def 123 . ;
>>
>> [: @ . ;] `bar does-created 123 ,
>>
>> foo bar \ prints "123 123"
>
> Then I would prefer:
> [: "hello world" TYPE ;] : hello
> or even c++/java/.. compatible:
> { "hello world" TYPE } : hello

So this ":" is still a parsing word.


In the STOIC language [1,2] it's written as:

'hello : "hello world" type ;

123 'bar constant


Interestingly, it uses two forms of string literals [3] — blank
separated, and double-quoted:

Examples of strings are:

'ABC_DEF -- note termination by a space
(can use a tab also)

"This one has spaces in it as well as ^&*()"



But STOIC didn't avoid inconspicuous immediate arguments.
For example, the word DISPATCH
Compilation: ( "ccc" -- )
Run-time: ( x -- x | )

A usage example:

'C.F : % F-command, needs second character
CLI.GNB IF
CONVERT_TO_UPPER
DISPATCH 'I C.FI
DISPATCH 'O C.FO
THEN DROP ERR.INVCOM
;



> <SNIP>
>>
>> t{ test-baz baz2 -> 123 456 }t
>
> Test words benefit from 2 separate code sequences plugged in.
> I use
> REGRESS test-baz baz2 S: 123 456 <EOL>

There are actually two separate code sequences, they are just separated
not by "S:", but by "->".

In general, you can write:

t{ test-baz baz2 -> 123 dup 333 + }t




[1] STOIC, 1976; it's very close to Forth
https://en.wikipedia.org/wiki/STOIC

[2] Sources and some applications
http://www.decuslib.com/decus/vax85c/saostoic/

[3] STOIC manual
http://www.decuslib.com/decus/vax85c/saostoic/gm/newstoic.mem

--
Ruvim

S Jack

unread,
May 28, 2022, 12:08:29 PMMay 28