Naming for parsing words

Ruvim

May 17, 2022, 12:23:57 PM
In my post [1] in ForthHub I compare the different variants for naming
parsing words.

Namely, I mean the words for which compilation semantics include
scanning/parsing the input stream, and, if interpretation semantics are
defined for the word, they include parsing too. Especially, when the
word parses one or several lexemes.

Can we have a convention for naming parsing words?

What are your considerations?


[1] Naming for parsing words
https://github.com/ForthHub/discussion/discussions/112

--
Ruvim
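
To make the definition above concrete, here is a minimal sketch in Python (a toy, not Forth and not any real system) contrasting a word that parses its argument from the input stream with a postfix word that takes it from the stack. The word name `constant-named` and the backtick handling are assumptions made for illustration; only `constant` corresponds to a standard word.

```python
# Toy interpreter: a hypothetical illustration, not a real Forth system.
# A "parsing word" pulls its argument from the input stream;
# a postfix word takes it from the data stack.

def run(source):
    tokens = source.split()      # the "input stream", split into lexemes
    pos = 0
    stack = []
    constants = {}               # names created by the defining words

    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == "constant":            # parsing:  123 constant foo
            name = tokens[pos]           # scans one lexeme from the input
            pos += 1
            constants[name] = stack.pop()
        elif tok == "constant-named":    # postfix:  123 `foo constant-named
            name = stack.pop()           # the name is already on the stack
            constants[name] = stack.pop()
        elif tok.startswith("`"):        # `foo pushes the name as data
            stack.append(tok[1:])
        elif tok in constants:
            stack.append(constants[tok])
        else:
            stack.append(int(tok))
    return stack
```

With the parsing form, the lexeme after `constant` is never looked up or executed; with the postfix form, every lexeme is handled by the same loop.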

S Jack

May 17, 2022, 2:13:01 PM
On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
> Can we have a convention for naming parsing words?
>
> What is your considerations?
>
>
> [1] Naming for parsing words
> https://github.com/ForthHub/discussion/discussions/112

Bah to the idea that a parsing word with an immediate parameter is 'better readable' than
a postfix operator. Most definitions contain postfix words, and mixing in
parsing words is a style conflict, so I avoid parsing words. That may mean I miss some
peephole optimizing opportunities, but that doesn't impact me. If I had
to have a parsing word I would choose the form 'foo( parm )', and I wouldn't
confuse it with a comment.
--
me

none albert

May 17, 2022, 3:47:33 PM
In article <t60i6r$7m0$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>In my post [1] in ForthHub I compare the different variants for naming
>parsing words.
>
>Namely, I mean the words for which compilation semantics include
>scanning/parsing the input stream, and, if interpretation semantics are
>defined for the word, they include parsing too. Especially, when the
>word parses one or several lexemes.
>
>Can we have a convention for naming parsing words?
>
>What is your considerations?

I am a proponent of outlawing parsing words, with an exception for
denotations, say constants.
E.g. 'DROP or "AAP" are parsed by ' and " and generate a constant,
independent of interpretation or compilation mode.
That made it possible to restrict the inspection of STATE to
where a word is interpreted or compiled.
The compilation STATE decides whether to compile it,
adding LIT or FLIT, or leave it on the stack.
Don't get upset, I don't propose it to the standard.

Parsing words can be handy in special purpose languages,
to honour expectations from users. That is a different matter.
A simple convention is to end parsing words with ':'.

FROM floating-point IMPORT: F* F** FLOG PI

>
>[1] Naming for parsing words
>https://github.com/ForthHub/discussion/discussions/112
>
>--
>Ruvim

Groetjes Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

dxforth

May 17, 2022, 11:00:14 PM
I've tended to use /xxx to mean 'extract' e.g. /STRING /SIGN

Any confusion with file path is just the nature of forth and the former
should be in quotes anyway.

Hans Bezemer

May 18, 2022, 7:38:50 AM
On Wednesday, May 18, 2022 at 5:00:14 AM UTC+2, dxforth wrote:
> On 18/05/2022 02:23, Ruvim wrote:
> > In my post [1] in ForthHub I compare the different variants for naming
> > parsing words.
> > Can we have a convention for naming parsing words?
> > What is your considerations?
Well, I think you mean by parsing - cutting up the TIB. In that case,
we already have PARSE, PARSE-NAME (and 4tH has got its own "PARSE-WORD").

So, when designing this lib I took that into account:
https://sourceforge.net/p/forth-4th/code/HEAD/tree/trunk/4th.src/lib/parsing.4th

BTW, if you try to compile it - it's 4tH, not Forth, so your mileage may vary.

Hans Bezemer

Ruvim

May 18, 2022, 10:30:45 AM
On 2022-05-17, S Jack wrote:
> On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
>> Can we have a convention for naming parsing words?
>>
>> What is your considerations?
>>
>>
>> [1] Naming for parsing words
>> https://github.com/ForthHub/discussion/discussions/112
>
> Bah, that a parsing word with immediate parameter is 'better readable' than
> a postfix operator. Most definitions contain postfix words and mixing in
> parsing words is style conflict so I avoid parsing words.

Do you avoid the standard parsing words?
For example: "[']" "postpone" 's"' 'abort"'
And what about defining words?

I wonder why people continue to use parsing words if they dislike them.

Why not introduce new words like:

:def ( sd.name -- ) ( C: -- colon-sys )

does-created ( xt sd.name -- )

To use them as:

`foo :def 123 . ;

[: @ . ;] `bar does-created 123 ,

foo bar \ prints "123 123"


And after that, what to do with "[if]" and "[undefined]"?



> May mean I miss some
> peephole optimizing opportunities but that doesn't impact me. If I had
> to have a parsing word I would choose the form 'foo( parm )' and I won't be
> confusing it as a comment.


123 constant( foo )
create( bar ) 456 ,

:( baz ) ( -- x ) foo postpone( foo bar ) ;

:( test-baz ) :( baz2 ) baz drop postpone( @ ; ) ;

t{ test-baz baz2 -> 123 456 }t


Hm.. why not.


--
Ruvim

Anton Ertl

May 19, 2022, 11:23:39 AM
Ruvim <ruvim...@gmail.com> writes:
>Do you avoid the standard parsing words?
>For example: "[']" "postpone" 's"' 'abort"'

I avoid these unless there is some reason not to. In particular:

Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
interpretive code.

Instead of POSTPONE FOO, I write ]] FOO [[. Especially nice for
multiple words.

Instead of S" BLA", I write "BLA".

I don't use ABORT", not least because I always have to look up the
direction of the flag. Instead, I use THROW. If I need a new ball, I
create it with

"new ball" exception constant new-ball

>And what about defining words?

I tend to use these. They have default compilation semantics and one
rarely wants to copy-paste them between compiled and interpreted code.
Of course, in those rare cases (i.e., when debugging a defining word),
I wish that they took their name argument from the stack.

>I'm wondered why people continue to use parsing words if they dislike them.

In the four cases above, I do it when writing code that should work on
Forth systems that do not understand the better idioms, or when
demonstrating something to an audience that may not be familiar with
the better idioms, and these idioms would distract from the point I am
trying to demonstrate.

>Why not introduce new words like:
>
> :def ( sd.name -- ) ( C: -- colon-sys )
>
> does-created ( xt sd.name -- )

The question is if the benefit is worth the cost in these cases.
Cost: additional words (because we don't want to destandardize all
existing code). Benefit: rare, as mentioned above.

>To use them as:
>
> `foo :def 123 . ;
>
> [: @ . ;] `bar does-created 123 ,
>
> foo bar \ prints "123 123"

Why `FOO, not "FOO"?

>And after that, what to do with "[if]" and "[undefined]"?

And \ and (.

> 123 constant( foo )
> create( bar ) 456 ,
>
> :( baz ) ( -- x ) foo postpone( foo bar ) ;
>
> :( test-baz ) :( baz2 ) baz drop postpone( @ ; ) ;
>
> t{ test-baz baz2 -> 123 456 }t
>
>
>Hm.. why not.

Why yes? And please explain the definition of TEST-BAZ and BAZ2.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

S Jack

May 19, 2022, 1:20:59 PM
On Wednesday, May 18, 2022 at 9:30:45 AM UTC-5, Ruvim wrote:
> Do you avoid the standard parsing words?

No. For new words where the choice is to make it parsing or postfix I choose
postfix. I've re-defined some existing parsing words to be postfix such as
FORGET and SEE:
' foo FORGET
' foo SEE
But I don't go for purity which usually leads to abominations. Note in
above tick is acceptable. It's a matter of using exceptions sparingly and
where most effective. That's the art and of course not everyone is going
to agree on the choices.
But back to your original question of what the standard convention for parsing
word syntax should be, my view:

General choices
1) foo bar bat
No syntax
One must know what foo bar and bat are.
2) foo: bar bat
Syntax indicates foo: a parsing word with bar as immediate parameter
but bat is undetermined, could be a second parameter to foo or an
operator.
3) foo( bar bat )
Syntax indicates foo( is parsing word and has two immediate parameters
bar and bat.

Choice (1) should be preferable to the Forth purist (DX, the.Bee).
Don't waste time worrying over syntax schemes.

Choice (2) proposed by Albert which works for me is a simple syntax for
parsing words, sufficient since our use of parsing words will be limited.

Choice (3) is totally explicit and should fit well in a more formal Forth which
is standard Forth.

I think choice (3) is best for the standard. I'll probably be using choice
(2) but that doesn't mean I'm changing ' to ': .
--
me
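
The practical difference between the three choices is how much a reader (or a tool) can tell about where the immediate parameters end. A small Python sketch follows; the rules are hypothetical and based on one reading of the list above, in particular the assumption that a `name:` word takes exactly one lexeme.

```python
def immediate_args(tokens, i):
    """Return (args, next_index) for the word at tokens[i].

    Hypothetical rules, following the three conventions above:
      name(  -> all lexemes up to the closing ")"
      name:  -> exactly one following lexeme (an assumption; the
                convention itself does not say how many are taken)
      other  -> nothing can be inferred from the name alone
    """
    word = tokens[i]
    if word.endswith("(") and len(word) > 1:
        close = tokens.index(")", i + 1)   # raises ValueError if unclosed
        return tokens[i + 1:close], close + 1
    if word.endswith(":") and len(word) > 1:
        return tokens[i + 1:i + 2], i + 2
    return [], i + 1
```

Only the `name(` form tells the reader, without knowing the word, that both `bar` and `bat` belong to it.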

dxforth

May 20, 2022, 1:14:22 AM
On 19/05/2022 00:30, Ruvim wrote:
> On 2022-05-17, S Jack wrote:
>>
>> Bah, that a parsing word with immediate parameter is 'better readable' than
>> a postfix operator. Most definitions contain postfix words and mixing in
>> parsing words is style conflict so I avoid parsing words.
>
> Do you avoid the standard parsing words?
> For example: "[']" "postpone" 's"' 'abort"'
> And what about defining words?
>
> I'm wondered why people continue to use parsing words if they dislike them.

But do they [beyond the few that already exist]? A few may enjoy creating
new parsing words (like the few that enjoy creating macros) but I wouldn't
say either was intrinsic to Forth, or even popular. If creating new parsing
words were popular, wouldn't there already be a convention for it?

https://pastebin.com/qpZLFc6h

none albert

May 20, 2022, 3:48:46 AM
In article <t62vui$1mi$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>On 2022-05-17, S Jack wrote:
>> On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
>>> Can we have a convention for naming parsing words?
>>>
>>> What is your considerations?
>>>
>>>
>>> [1] Naming for parsing words
>>> https://github.com/ForthHub/discussion/discussions/112
>>
>> Bah, that a parsing word with immediate parameter is 'better readable' than
>> a postfix operator. Most definitions contain postfix words and mixing in
>> parsing words is style conflict so I avoid parsing words.
>
>Do you avoid the standard parsing words?
>For example: "[']" "postpone" 's"' 'abort"'
>And what about defining words?
>
>I'm wondered why people continue to use parsing words if they dislike them.
>
>Why not introduce new words like:
>
> :def ( sd.name -- ) ( C: -- colon-sys )
>
> does-created ( xt sd.name -- )
>
>To use them as:
>
> `foo :def 123 . ;
>
> [: @ . ;] `bar does-created 123 ,
>
> foo bar \ prints "123 123"

Then I would prefer:
[: "hello world" TYPE ;] : hello
or even c++/java/.. compatible:
{ "hello world" TYPE } : hello

<SNIP>
>
> t{ test-baz baz2 -> 123 456 }t

Test words benefit from 2 separate code sequences plugged in.
I use
REGRESS test-baz baz2 S: 123 456 <EOL>
>
>
>Hm.. why not.
>
>--
>Ruvim

Hans Bezemer

May 20, 2022, 7:22:59 AM
On Thursday, May 19, 2022 at 5:23:39 PM UTC+2, Anton Ertl wrote:

Nice to see that "S Jack" puts me in the realm of "Forth purists", where
4tH cannot be classified as "pure" by any measure - including the words
it supports - which are in part just there to facilitate the particular 4tH
architecture, but also stuff like ">ZERO" ( n -- 0), "STOW" ( n1 n2 -- n1 n1 n2),
"EXCEPT" (like: 0= WHILE), "UNLESS" (like: 0= IF) and ";THEN" (like: EXIT THEN).

But I'm inasmuch a purist where I'd like to keep things simple and clear -
even a little bit less abstract.

E.g. I'd like the "three rule engine" intact, which says:
(1) If it's a word, execute it;
(2) If it's not a word, convert it to a number;
(3) If it's not a number either, it's an error.

> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
> interpretive code.
> Instead of S" BLA", I write "BLA".
.. which (like prefixed numbers) violate the rule "keep it simple", since
it requires me to evaluate what I've parsed before I pull the trigger.

I literally try to find the word. FAIL: I literally convert it. FAIL: I throw
an exception. I don't have to go looking for prefixed ', ", #, $ or whatever.
"Simplicity" means "maintainability". "Maintainability" means "less bugs".
(See: "Does Software Decay").
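
As a rough illustration of the three rules just described (find the word, else convert to a number, else fail), here is a minimal interpret-only sketch in Python. The `words` dictionary and the use of `LookupError` are assumptions for the example; no compilation state is modelled.

```python
def interpret(source, words, stack=None):
    """Interpret-only 'three rule' loop (a sketch; `words` is a hypothetical
    dict mapping names to Python callables that act on the stack)."""
    stack = [] if stack is None else stack
    for token in source.split():
        if token in words:              # (1) a known word: execute it
            words[token](stack)
            continue
        try:
            stack.append(int(token))    # (2) otherwise: try a number
        except ValueError:
            raise LookupError(f"undefined word: {token}")   # (3) error
    return stack

words = {
    "+": lambda s: s.append(s.pop() + s.pop()),
    "dup": lambda s: s.append(s[-1]),
}
```

Note that nothing in this loop inspects the token's spelling before the dictionary lookup, which is the property being defended here.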

> Instead of POSTPONE FOO, I write ]] FOO [[. Especially nice for
> multiple words.
I think that it's a beautiful solution - although due to 4tH's architecture,
it does carry very little significance to that particular project.

> >Why not introduce new words like:
> Why yes? And please explain the definition of TEST-BAZ and BAZ2.
I think this comes from a desire to "beautify the language" - maybe
even by looking at other "beautiful languages", without really appreciating
Forth's inherent philosophy. And that's KISS. And IMHO every generally
accepted proposition HAS to adhere to that philosophy.

I have nothing against coming up with new stuff that makes the language
more useful or readable - but the starting point must always be:
- What problem am I exactly solving here;
- How do I express (implement) it in a Forth-like way ;
- How does this actual solution improve the way we're reading and writing
Forth programs.

Starting with (a ripped off) idea about "syntactic sugar" and then try to
squeeze it brute force in the most horrible Forth code, is IMHO the very
worst starting point - even from an engineering point of view.

P.S. I know I'm reacting here to different messages from different persons.
But I'm not addressing particular persons here, but particular ideas.

Hans Bezemer

Hans Bezemer

May 20, 2022, 7:47:02 AM
On Wednesday, May 18, 2022 at 4:30:45 PM UTC+2, Ruvim wrote:
> Do you avoid the standard parsing words?
> For example: "[']" "postpone" 's"' 'abort"'
> And what about defining words?
In my programming: no - they're just too handy. Imagine not having
S" and having to poke in characters into a string one by one - no
matter what mechanism you throw at it. I even confiscated C"
for that reason in 4tH to avoid too many C, ;-)

But I can tell you they're a pain in 4tH without a preprocessor. In
that particular variant ALL parsing words have to be hardcoded.

And then there are those STANDARD "parsing words" that are so
braindead that the only way I WANT to support them is by only
supporting them in that preprocessor.

.. like ACTION-OF (needless: DEFER@ can do that);
.. like BEGIN-STRUCTURE (overly complex and VERY unForth-like
compared to "0 .. CONSTANT).
.. like SYNONYM (it's much easier to look up the behavior first and,
if that succeeds, make the proper dictionary entry - instead of
making a dictionary entry and then fail at the most crucial moment).

The latter could even have been better if they followed the ALIAS rule

' FOO ALIAS BAR

Because that is in essence what you're doing and IMHO clearer than

SYNONYM BAR FOO

4tH supports a parsing AKA which does EXACTLY that:

AKA FOO BAR

Although (also from readability), this is IMHO much clearer:

' FOO AKA BAR

But the former is found at least in SOME Forths, so almost COMUS. ;-)

Hans Bezemer



none albert

May 20, 2022, 9:01:57 AM
In article <8e02adcd-d183-4d15...@googlegroups.com>,
Hans Bezemer <the.bee...@gmail.com> wrote:
>On Wednesday, May 18, 2022 at 4:30:45 PM UTC+2, Ruvim wrote:
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>> And what about defining words?
>In my programming: no - they're just too handy. Imagine not having
>S" and having to poke in characters into a string one by one - no
>matter what mechanism you throw at it. I even confiscated C"
>for that reason in 4tH to avoid too many C, ;-)
>
>But I can tell you they're a pain in 4tH without a preprocessor. In
>that particular variant ALL parsing words have to be hardcoded.
>
>And then there are those STANDARD "parsing words" that are so
>braindead that the only way I WANT to support them is by only
>supporting them in that preprocessor.
>
>.. like ACTION-OF (needless: DEFER@ can do that);
>.. like BEGIN-STRUCTURE (overly complex and VERY unForth-like
>compared to "0 .. CONSTANT).
>.. like SYNONYM (it's much easier to look up the behavior first and,
>if that succeeds, make the proper dictionary entry - instead of
>making a dictionary entry and then fail at the most crucial moment).
>
>The latter could even have been better if they followed the ALIAS rule
>
>' FOO ALIAS BAR

Totally agreed. But in my book:
'FOO ALIAS BAR
' is a normal word, but it has the prefix flag.
'FOO is the same constant in compilation and interpretation mode.
In this way there is no exception with numbers,
1 is the prefix that understands all numbers starting with 1,
2 with 2 etc.
(I invented this when I refused to add $ in my kernel, and
I wanted to load Marcel Hendrix programs. The solution is to
make $ a loadable extension, and the PREFIX idea was born.).
Interestingly the lookup of words in the dictionary is shorter.
Formerly (adr len adr1 len1 ) COMPARE
Now (pp adr len ) CORA (simpler compare)
Who cares what the length of the word is, the comparison fails
anyway, normally.

Every word say
DROP 1178 $189 'FOO "We gaan naar rome"
is found in the dictionary. No exceptions.
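
The prefix lookup described above can be sketched in Python roughly as follows. This is only an illustration of the idea as stated in this post, not the actual ciforth code; the `(name, is_prefix)` list layout is an assumption.

```python
def find(token, dictionary):
    """Return (entry_name, rest_of_token) or None.

    `dictionary` is a hypothetical list of (name, is_prefix) pairs,
    searched in order. An ordinary entry must match the whole token;
    a prefix entry only has to match the start of the token.
    """
    for name, is_prefix in dictionary:
        if token == name:
            return name, ""
        if is_prefix and token.startswith(name):
            return name, token[len(name):]
    return None

dictionary = [("DROP", False), ("'", True), ("$", True), ("1", True)]
```

Here `find("1178", dictionary)` matches the prefix entry `1`, which would then be responsible for converting the whole number, so numbers need no special case outside the dictionary.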

(I have demonstrated that with adding a few characters to
terminators, not only space and tab, you can parse Pascal.)

>
>Because that is in essence what you're doing and IMHO clearer than
>
>SYNONYM BAR FOO

I hate that too

>
>4tH supports a parsing AKA which does EXACTLY that:
>
>AKA FOO BAR
>
>Although (also from readability), this is IMHO much clearer:
>
>' FOO AKA BAR

The rule is that you add a word (BAR) to the dictionary, the
preceding word (AKA) does the action, consuming some data.

DEFER BAR
'FOO IS BAR
is also an abomination.
This must be
'FOO 'BAR TRANSFER-EXECUTION-BEHAVIOUR
Neither FOO nor BAR is executed, so their dea ("name tokens")
should be used.

>
>But the former is found at least in SOME Forths, so almost COMUS. ;-)

>
>Hans Bezemer

Anton Ertl

May 20, 2022, 10:57:51 AM
albert@cherry.(none) (albert) writes:
>DEFER BAR
>'FOO IS BAR
>is also an abomination.
>This must be
>'FOO 'BAR TRANSFER-EXECUTION-BEHAVIOUR

A standard variant of this code:

' foo ' bar defer!
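
For readers less familiar with deferred words, a minimal Python model of the non-parsing form may help. The class and function names are hypothetical; the stack effects in the comments are those of the standard words DEFER! and DEFER@.

```python
class Deferred:
    """A deferred word: calling it runs whatever xt is currently stored."""
    def __init__(self):
        self.xt = None

    def __call__(self, stack):
        if self.xt is None:
            raise RuntimeError("deferred word has no action yet")
        self.xt(stack)

def defer_store(xt, deferred):   # analogue of DEFER!  ( xt2 xt1 -- )
    deferred.xt = xt

def defer_fetch(deferred):       # analogue of DEFER@  ( xt1 -- xt2 )
    return deferred.xt

def foo(stack):
    stack.append(42)

bar = Deferred()          # DEFER BAR
defer_store(foo, bar)     # ' foo ' bar defer!
```

Both arguments are passed as ordinary data, so neither `foo` nor `bar` is executed while the action is being set.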

Anton Ertl

May 20, 2022, 12:05:05 PM
Hans Bezemer <the.bee...@gmail.com> writes:
>But I'm inasmuch a purist where I'd like to keep things simple and clear -
>even a little bit less abstract.
>
>E.g. I'd like the "three rule engine" intact, which says:
>(1) If it's a word, execute it;

No compilation?

>(2) If it's not a word, convert it to a number;

No compilation?

>(3) If it's not a number either, it's an error.

I guess you want a traditional text interpreter rather than the
interpret-only text interpreter you outlined above. Let's take a look
at a very traditional one. This is taken from Ting's Systems Guide
to fig-Forth:

: INTERPRET
  BEGIN
    -FIND
    IF
      STATE @ <
      IF   CFA ,
      ELSE
        CFA
        EXECUTE
      ENDIF
      ?STACK
    ELSE
      HERE
      NUMBER
      DPL @ 1+
      IF
        [COMPILE]
        DLITERAL
      ELSE
        DROP
        [COMPILE]
        LITERAL
      ENDIF
      ?STACK
    ENDIF
  AGAIN
;

Hmm, triple-nested control structure, 1 BEGIN, 3 IFs, and 3 ELSEs, not
the paragon of simplicity. Interestingly, I don't find your (3) in
there. It must be hidden in NUMBER (checking, it is). It's also not
obvious how INTERPRET terminates; that's performed with the X trick,
but I don't discuss this here further.

How about simplifying it?

1) Eliminating doubles would get rid of one IF.

2) Eliminating numbers would get rid of another IF (you now have to
hide the error handling in -FIND).

3) Eliminating compile state would eliminate the third IF.

The result would be much simpler and especially clearer (well, apart
from the error-hiding and X tricks):

: INTERPRET
  BEGIN
    -FIND
    CFA
    EXECUTE
    ?STACK
  AGAIN
;

>"Simplicity" means "maintainability". "Maintainability" means "less bugs".
>(See: "Does Software Decay").

Then you should be delighted by this new INTERPRET.

However, there is a cost to this implementation simplicity. Where in
fig-Forth you write

: star 42 emit ;

you now have to write something like

: star num 42 literal [compile] emit ;

but that's a sacrifice you love to make in the name of "simplicity",
clarity, "maintainability" and "less bugs", no?

If not, please explain why you find the complexity of the traditional
INTERPRET to be acceptable, but not

: INTERPRET
  BEGIN
    PARSE-NAME DUP
  WHILE
    FORTH-RECOGNIZER RECOGNIZE
    STATE @ IF RECTYPE>COMP ELSE RECTYPE>INT THEN
    EXECUTE
    ?STACK \ simple housekeeping
  REPEAT 2DROP
;

(from <https://forth-standard.org/proposals/recognizer#contribution-142>)

The problem with the traditional interpreter is that

* It does not recognize floats.

* Without number prefixes we get bugs in the code from using (sticky)
HEX.

* Without the tick-recognizer I get bugs from writing ' where ['] is
appropriate and vice versa, and testing compiled code interpretively
is cumbersome.

* Without the string-recognizer, we need to use S\" (or S"), and
implementing these words properly is apparently too complex for most
Forth systems.
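
The structure of the recognizer-based INTERPRET quoted above can be sketched in Python as follows. This is a loose model under stated assumptions: `RECOGNIZERS`, `rec_word`, `rec_number` and the `st` dictionary are hypothetical names, and the sketch does not reproduce the recognizer proposal's actual API.

```python
def rec_word(token, words):
    """Recognize a dictionary word; return (interpret, compile) actions."""
    if token in words:
        xt = words[token]
        return (lambda st: xt(st["stack"]),          # interpret: execute
                lambda st: st["code"].append(xt))    # compile: append xt
    return None

def rec_number(token, words):
    """Recognize an integer literal; return (interpret, compile) actions."""
    try:
        n = int(token)
    except ValueError:
        return None
    push = lambda stack: stack.append(n)
    return (lambda st: push(st["stack"]),
            lambda st: st["code"].append(push))

RECOGNIZERS = [rec_word, rec_number]     # extensible: append more here

def interpret(source, words, st):
    for token in source.split():
        for rec in RECOGNIZERS:
            actions = rec(token, words)
            if actions is not None:
                break
        else:
            raise LookupError(f"not recognized: {token}")
        interp, comp = actions
        (comp if st["compiling"] else interp)(st)   # STATE picks the action
    return st
```

The loop itself has a single state test; floats, prefixed numbers, or quoted strings would be added as further entries in the recognizer list rather than as extra branches.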

Hans Bezemer

May 21, 2022, 6:07:29 AM
On Friday, May 20, 2022 at 3:01:57 PM UTC+2, none albert wrote:
> DEFER BAR
> 'FOO IS BAR
> is also an abomination.
> This must be
> 'FOO 'BAR TRANSFER-EXECUTION-BEHAVIOUR
> Neither FOO nor BAR is executed, so their dea ("name tokens")
> should be used.
You're kidding, right? I desperately hope so. Because IS is just another
TO. By seriously promoting this, you say:

23 ' FOO !

I don't think this helps readability - on the contrary.

Hans Bezemer

Hans Bezemer

May 21, 2022, 6:33:55 AM
On Friday, May 20, 2022 at 6:05:05 PM UTC+2, Anton Ertl wrote:
> Hans Bezemer <the.bee...@gmail.com> writes:
> No compilation?
Compilation sets STATE and enters the compiler. ";" compiles EXIT,
SMUDGEs the latest definition and reenters the interpreter.

> The problem with the traditional interpreter is that
> * It does not recognize floats.
IMHO it should not even recognize doubles. I consider it to be an add-on.
I know it's a quite radical idea and most certainly would break a LOT
of code, but I consider it to be architecturally more solid. You may have
another opinion.

> * Without number prefixes we get bugs in the code from using (sticky)
> HEX.
Gee, never happened to me. Maybe a programmer discipline issue? IMHO
that enables people to write Forth at all - since they don't deplete or overflow
the stack with every IF or BEGIN..REPEAT.

Neat antidote:

: base&exec base @ >r base ! execute r> base ! ;

Which allows for cool code like:

r@ [char] u = if u>d ['] (.number) 10 base&exec then
r@ [char] o = if u>d ['] (.number) 8 base&exec then
r@ [char] x = if u>d ['] (.number) 16 base&exec then

You know - Factor has got some cool ideas!
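
The same "set the base, run something, put the base back" pattern can be modelled in Python. The `state` dictionary, `to_base` and `dot_number` are hypothetical stand-ins for BASE and a number-printing word. One deliberate difference: this sketch restores the base even if the called code raises, which the one-line Forth definition above does not do when an exception is thrown.

```python
DIGITS = "0123456789ABCDEF"

def to_base(n, base):
    """Format a non-negative integer in the given base (2..16)."""
    if n == 0:
        return "0"
    out = ""
    while n > 0:
        out = DIGITS[n % base] + out
        n //= base
    return out

def dot_number(state):
    """Pop a number and 'print' it in the current base."""
    state["out"].append(to_base(state["stack"].pop(), state["base"]))

def base_and_exec(state, xt, base):
    """Run xt(state) with state['base'] temporarily set to `base`."""
    saved = state["base"]
    state["base"] = base
    try:
        xt(state)
    finally:
        state["base"] = saved   # restored even if xt raises
```

Because the base change is scoped to one call, a sticky HEX cannot leak into later code.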

> * Without the tick-recognizer I get bugs from writing ' where ['] is
> appropriate and vice versa, and testing compiled code interpretively
> is cumbersome.
Gee, never happened to me. Maybe a programmer discipline issue? IMHO
that enables people to write Forth at all - since they don't deplete or overflow
the stack with every IF or BEGIN..REPEAT.

> * Without the string-recognizer, we need to use S\" (or S"), and
> implementing these words properly is apparently too complex for most
> Forth systems.
Never had that problem. Could be an architecture issue - idunno:

/*
This function compiles '."'.
*/

#ifndef ARCHAIC
static void DoDotQuote (void)
#else
static void DoDotQuote ()
#endif

{
CompileString (PRINT);
}


/*
This function compiles a ,".
*/

#ifndef ARCHAIC
static void DoCommaQuote (void)
#else
static void DoCommaQuote ()
#endif

{
CompileString (STRINGD);
}


/*
This function compiles a S".
*/

#ifndef ARCHAIC
static void DoSQuote (void)
#else
static void DoSQuote ()
#endif

{
CompileString (SQUOTE);
}

The reason I'm posting this is: I can't comment on the rationale of
other people's implementations. I can - however - comment on
mine. If I'd done Ting's compiler, I'd probably have done it differently.

E.g. I'd have implemented a HIDE, so after implementing IF, I could
discard helper words from the search chain. Gee, I even think
I implemented such a thing in 4tH, I'm not sure. But I vaguely
remember it..

Hans Bezemer


Anton Ertl

May 21, 2022, 6:52:51 AM
Hans Bezemer <the.bee...@gmail.com> writes:
>On Friday, May 20, 2022 at 3:01:57 PM UTC+2, none albert wrote:
>> DEFER BAR
>> 'FOO IS BAR
>> is also an abomination.

When I made the DEFER proposal, I proposed IS because of common
practice, DEFER! as a non-parsing alternative, and DEFER@ in order to
get the current contents of the deferred word. Stephen Pelc suggested
a parsing alternative to DEFER@, and it became ACTION-OF.

I used to think that ACTION-OF and IS are unnecessary with DEFER@ and
DEFER!, but a few years ago we introduced defer-flavoured locals, and
with locals the idioms "['] foo defer@" and "( xt ) ['] foo defer!"
don't work, so ACTION-OF and IS are actually necessary in this
context.

Concerning VALUEs, GForth does not allow ADDR for them, so you have to
use IS to change them. And that's good, because it means that @ and !
don't access the value.

Anton Ertl

May 21, 2022, 2:11:49 PM
Hans Bezemer <the.bee...@gmail.com> writes:
>On Friday, May 20, 2022 at 6:05:05 PM UTC+2, Anton Ertl wrote:
>> * It does not recognize floats.
>IMHO it should not even recognize doubles. I consider it to be an add-on.

But your three rule engine had no place for add-ons.

>I know it's a quite radical idea and most certainly would break a LOT
>of code, I consider it to be architectural more solid. You may have
>another opinion.

No, it actually is the direction I prefer: have an extensible text
interpreter. And once you have that, you can make your rule (1) and
(2) extensions, too.

>Gee, never happened to me. Maybe a programmer discipline issue?
[...]
>Gee, never happened to me. Maybe a programmer discipline issue?

You can burden the programmers with some tasks, claim that you have a
simple system that supposedly reduces bugs, and dismiss all bugs
arising from that as programmer discipline issues (which does not make
them go away).

I prefer to avoid such issues by giving the programmers less
burdensome ways to express the programs, e.g., number prefixes and the
tick-recognizer.

>> * Without the string-recognizer, we need to use S\" (or S"), and
>> implementing these words properly is apparently too complex for most
>> Forth systems.
>Never had that problem. Could be an architecture issue - idunno:

[C code skipped]

What do you want to tell me by showing some C fragments?

>The reason I'm posting this is: I can't comment on the rationale
>others peoples implementations. I can - however - comment on
>mine.

But you posted the C fragments without any comment relevant to the
discussion at hand.

And if you studied other implementations, you could

1) learn from them.

2) comment on them.

Your comments leave the impression that you are convinced that your
implementation is the greatest, or at least close to it, but you
actually don't know other implementations.

Hans Bezemer

May 21, 2022, 5:33:42 PM
On Saturday, May 21, 2022 at 8:11:49 PM UTC+2, Anton Ertl wrote:
> You can burden the programmers with some tasks, claim that you have a
> simple system that supposedly reduces bugs, and dismiss all bugs
> arising from that as programmer discipline issues (which does not make
> them go away).
>
> I prefer to avoid such issues by giving the programmers less
> burdensome ways to express the programs, e.g., number prefixes and the
> tick-recognizer.
Try Java. You'll find it impressive. It's built EXACTLY from that perspective.
Dijkstra LOVED it.

Hans Bezemer

dxforth

May 22, 2022, 2:42:31 AM
On 21/05/2022 20:26, Anton Ertl wrote:
> ...
> Concerning VALUEs, GForth does not allow ADDR for them, so you have to
> use IS to change them. And that's good, because it means that @ and !
> don't access the value.

So assembler routines can't access VALUEs ?

#512 value #OUTBUF \ buffer size

code WRITECHAR ( char -- )
addr #outbuf ) ax mov outsiz ) ax cmp 1 $ jnz
c: (flushwrite) 4 ?ferror ;c
1 $: ax pop outptr ) di mov al 0 [di] mov
outptr ) inc outsiz ) inc next end-code

Anton Ertl

May 22, 2022, 3:23:12 AM
dxforth <dxf...@gmail.com> writes:
>On 21/05/2022 20:26, Anton Ertl wrote:
>> ...
>> Concerning VALUEs, GForth does not allow ADDR for them, so you have to
>> use IS to change them. And that's good, because it means that @ and !
>> don't access the value.
>
>So assembler routines can't access VALUEs ?

If they know where to find them, they can. The value might be in a
register, however; it also might be in a register in some parts of the
code, and in memory in other parts.

Anton Ertl

May 22, 2022, 4:42:16 AM
In my experience Java is quite burdensome.

>Dijkstra LOVED it.

I actually took the effort to check your claim, and what I found
indicates that it is a blatant lie:
<https://www.cs.utexas.edu/users/EWD/transcriptions/OtherDocs/Haskell.html>

Some quotes from this letter by Dijkstra:

|their undergraduate curriculum has not recovered from the transition
|from Pascal to something like C++ or Java.

|Haskell, though not perfect, is of a quality that is several orders of
|magnitude higher than Java, which is a mess

Hans Bezemer

May 22, 2022, 9:06:37 AM
On Sunday, May 22, 2022 at 10:42:16 AM UTC+2, Anton Ertl wrote:
> Hans Bezemer <the.bee...@gmail.com> writes:
> >On Saturday, May 21, 2022 at 8:11:49 PM UTC+2, Anton Ertl wrote:
> >> I prefer to avoid such issues by giving the programmers less
> >> burdensome ways to express the programs, e.g., number prefixes and the
> >> tick-recognizer.
> >Try Java. You'll find it impressive. It's built EXACTLY from that perspective.
> In my experience Java is quite burdensome.
>
> >Dijkstra LOVED it.
>
> I actually took the effort to check your claim, and what I found
> indicates that it is a blatant lie:
> <https://www.cs.utexas.edu/users/EWD/transcriptions/OtherDocs/Haskell.html>
>
> Some quotes from this letter by Dijkstra:
>
> |their undergraduate curriculum has not recovered from the transition
> |from Pascal to something like C++ or Java.

Understanding sarcasm is a skill, I suppose. If you'd read his writings about e.g. Ada, you would have understood it was sarcasm.

HB

Paul Rubin

May 22, 2022, 4:21:37 PM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> |Haskell, though not perfect, is of a quality that is several orders of
> |magnitude higher than Java, which is a mess

From Dijkstra's 1972 Turing Award lecture:

With a few very basic principles at its foundation, it [LISP] has
shown a remarkable stability. Besides that, LISP has been the
carrier for a considerable number of in a sense our most
sophisticated computer applications. LISP has jokingly been
described as “the most intelligent way to misuse a computer”. I
think that description a great compliment because it transmits the
full flavour of liberation: it has assisted a number of our most
gifted fellow humans in thinking previously impossible
thoughts.

dxforth

May 22, 2022, 10:50:48 PM
On 22/05/2022 17:16, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 21/05/2022 20:26, Anton Ertl wrote:
>>> ...
>>> Concerning VALUEs, GForth does not allow ADDR for them, so you have to
>>> use IS to change them. And that's good, because it means that @ and !
>>> don't access the value.
>>
>>So assembler routines can't access VALUEs ?
>
> If they know where to find them, they can. The value might be in a
> register, however; it also might be in a register in some parts of the
> code, and in memory in other parts.

The choice whether a VALUE or a VARIABLE is used should have nothing to do
with whether one is using high-level or assembler. In making the former
harder to use, one is saying VALUEs for forth and VARIABLEs for assembler.

Anton Ertl

unread,
May 23, 2022, 1:49:36 AM5/23/22
to
dxforth <dxf...@gmail.com> writes:
>On 22/05/2022 17:16, Anton Ertl wrote:
>>>So assembler routines can't access VALUEs ?
>>
>> If they know where to find them, they can. The value might be in a
>> register, however; it also might be in a register in some parts of the
>> code, and in memory in other parts.
>
>The choice whether a VALUE or a VARIABLE is used should have nothing to do
>with whether one is using high-level or assembler.

If you feel this way, you as a system designer can devise the
interface between Forth and assembly language accordingly; or as a
user of a third-party Forth system, you can choose a Forth system that
satisfies your requirement.

dxforth

unread,
May 23, 2022, 5:25:40 AM5/23/22
to
On 23/05/2022 15:40, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 22/05/2022 17:16, Anton Ertl wrote:
>>>>So assembler routines can't access VALUEs ?
>>>
>>> If they know where to find them, they can. The value might be in a
>>> register, however; it also might be in a register in some parts of the
>>> code, and in memory in other parts.
>>
>>The choice whether a VALUE or a VARIABLE is used should have nothing to do
>>with whether one is using high-level or assembler.
>
> If you feel this way, you as a system designer can devise the
> interface between Forth and assembly language accordingly; or as a
> user of a third-party Forth system, you can choose a Forth system that
> satisfies your requirement.

Gforth seems to be the 'odd man out' here. A user may well ask why has
it made access to VALUEs so difficult.

Anton Ertl

unread,
May 23, 2022, 7:44:26 AM5/23/22
to
dxforth <dxf...@gmail.com> writes:
>Gforth seems to be the 'odd man out' here. A user may well ask why has
>it made access to VALUEs so difficult.

Difficult? Let's phrase the question in a less loaded way:

Q: Why does Gforth not support ADDR on value-flavoured words?

A: E.g., because without ADDR a future version of Gforth can keep the
value V in a register in the loop

?do ... v ... @ ... to v ... loop

whereas with ADDR V that's not generally possible, and those cases that
are possible require a lot of compiler complexity.

Q: But I want to port code that uses ADDR V to Gforth?

A: Either define V as a VARUE (like a VALUE, but supports ADDR), or
just prepend the code with

: VALUE VARUE ;
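
To make the loop pattern from the first answer concrete, here is a minimal sketch in standard Forth. The names DATA, P and SUM are made up for illustration and are not from any real program. The point is that, on a system without ADDR, the @ inside the loop cannot refer to the storage of P, so a compiler would be free to keep P in a register:

```forth
create data  1 , 2 , 3 ,      \ three cells to walk over (illustrative data)
0 value p                     \ pointer kept in a VALUE; only TO changes it

: sum ( -- n )
  data to p                   \ start at the first cell
  0                           \ running total
  3 0 ?do
    p @ +                     \ this @ reads DATA, not the storage of P
    p cell+ to p              \ advance the pointer
  loop ;

sum .   \ prints 6
```

If P were a VARUE and ADDR P were allowed, the same @ could in principle be handed the address of P, and the compiler would have to rule that out first.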

dxforth

unread,
May 23, 2022, 9:17:05 PM5/23/22
to
On 23/05/2022 21:23, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>Gforth seems to be the 'odd man out' here. A user may well ask why has
>>it made access to VALUEs so difficult.
>
> Difficult? Let's phrase the question in a less loaded way:
>
> Q: Why does Gforth not support ADDR on value-flavoured words?
>
> A: E.g., because without ADDR a future version of Gforth can keep the
> value V in a register in the loop
>
> ?do ... v ... @ ... to v ... loop
>
> whereas with ADDR V that's not generally possible, and those cases that
> are possible require a lot of compiler complexity.

Keeping things in registers usually refers to constants or locals.
My understanding of VALUEs is that they're read far more often than
written and, as such, your use above would appear to be something of
an anomaly. FWIW I don't expect ADDR to be used much either - which
isn't to say I can do without it. I definitely want to be able to
access VALUEs via assembler. To me that's more important than
'keeping it in a register'.

>
> Q: But I want to port code that uses ADDR V to Gforth?
>
> A: Either define V as a VARUE (like a VALUE, but supports ADDR), or
> just prepend the code with
>
> : VALUE VARUE ;

I have a hard enough time justifying VALUEs and VARIABLEs.

Anton Ertl

unread,
May 24, 2022, 7:50:41 AM5/24/22
to
dxforth <dxf...@gmail.com> writes:
>On 23/05/2022 21:23, Anton Ertl wrote:
>> Q: Why does Gforth not support ADDR on value-flavoured words?
>>
>> A: E.g., because without ADDR a future version of Gforth can keep the
>> value V in a register in the loop
>>
>> ?do ... v ... @ ... to v ... loop
>>
>> whereas with ADDR V that's not generally possible, and those cases that
>> are possible require a lot of compiler complexity.
>
>Keeping things in registers usually refers to constants or locals.

Sure, if you support ADDR, you cannot keep values in registers in many
situations (and the remaining situations are probably so few that you
just keep them in memory all the time).

But there are also those who advocate not using locals, and to use
variables or values instead. Aren't you one of them? Anyway, for
those usages, among others, it would be beneficial to keep values in
registers.

>My understanding of VALUEs is that they're read far more often than
>written and, as such, your use above would appear to be something of
>an anomaly.

Possibly. When a value is both read and written in a loop, allocating
it in a register is particularly beneficial. Here's an example:

: foo1 0 begin 2dup u> while 1+ repeat 2drop ;
0 value x
: foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;

[/tmp:130361] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo1 bye"
[/tmp:130362] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo2 bye"

On a Skylake this produces:

    cycles  instructions
1012951207    4010090522  foo1
5060982449    6010156213  foo2

So allocating x in a register (what lxf does for stack items) is 5
times faster than allocating it in memory (what lxf does for values).

>FWIW I don't expect ADDR to be used much either

Exactly. But the possible future use of ADDR on a value means that it
always has to be kept in memory. So if you support ADDR on a value,
you have to pay for it even if you don't use it (on that value, or at
all).

And that's why Gforth does not support ADDR on values. If you want to
use ADDR on a word, you can define this particular word with VARUE.

Anton Ertl

unread,
May 24, 2022, 12:53:55 PM5/24/22
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>When a value is both read and written in a loop, allocating
>it in a register is particularly beneficial. Here's an example:
>
>: foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>0 value x
>: foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>
>[/tmp:130361] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo1 bye"
>[/tmp:130362] perf stat -e cycles -e instructions vfxlin "include xxx.fs 1000000000 foo2 bye"
>
>On a Skylake this produces:
>
>     cycles  instructions
> 1012951207    4010090522  foo1
> 5060982449    6010156213  foo2
>
>So allocating x in a register (what lxf does for stack items) is 5
>times faster than allocating it in memory (what lxf does for values).

Sorry for the confusion, I used both lxf and vfxlin 4.72 for this,
with similar results.

And just to show that it's not as bad for read-only or write-only
memory accesses:

: foo1 0 begin 2dup u> while 1+ repeat 2drop ;
0 value x
0 value y
: foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
: foo3 to y 0 begin y over u> while 1+ repeat drop ;
: foo4 0 begin 2dup u> while 1+ dup to y repeat 2drop ;

    cycles  instructions  vfxlin 4.72
1012848419    4009410376  foo1 registers
5060035003    6010211435  foo2 read+write
1900018814    4009598660  foo3 read-only
1012666250    5009454583  foo4 write-only

Here are the inner loops:

foo1            foo2                 foo3                 foo4
CMP EBX, [EBP]  CMP EBX, [080A3440]  CMP EBX, [080A3444]  CMP EBX, [EBP]
JNB 080C0ACC    JBE 080C0B7B         JNB 080C0BBF         JNB 080C0C12
INC EBX         MOV EDX, [080A3440]  INC EBX              INC EBX
JMP 080C0AC0    INC EDX              JMP 080C0BB0         MOV [080A3444], EBX
                MOV [080A3440], EDX                       JMP 080C0C00
                JMP 080C0B60

Interestingly, foo1 and foo4 read one stack item from memory (likewise
for lxf).

For lxf:

    cycles  instructions  lxf
1002385171    4000942415  foo1 registers
5137955357    7001690803  foo2 read+write
1002416848    5000977891  foo3 read-only
2002758293    5001179157  foo4 write-only

foo1            foo2                 foo3                 foo4
cmp [ebp], ebx  mov eax, [08389CF8]  mov eax, [08389CFC]  cmp [ebp], ebx
jbe "0804FBFB"  cmp ebx, eax         cmp eax, ebx         jbe "0804FC80"
inc ebx         jbe "0804FC3A"       jbe "0804FC5C"       inc ebx
jmp "0804FBEF"  mov eax, [08389CF8]  inc ebx              mov [08389CFC], ebx
                inc eax              jmp "0804FC4C"       jmp "0804FC6E"
                mov [08389CF8], eax
                jmp "0804FC20"

Interestingly, on lxf FOO3 is faster and FOO4 slower than on vfxlin,
even though FOO4 has the same code as on VFX; maybe a code alignment
issue. Anyway, you can see that the variant with memory read+write is
a lot slower than the other variants.

- anton

Marcel Hendrix

unread,
May 24, 2022, 4:36:44 PM5/24/22
to
On Tuesday, May 24, 2022 at 6:53:55 PM UTC+2, Anton Ertl wrote:
[..]
> And just to show that it's not as bad for read-only or write-only
> memory accesses:
> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
> 0 value x
> 0 value y
> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
> : foo3 to y 0 begin y over u> while 1+ repeat drop ;
> : foo4 0 begin 2dup u> while 1+ dup to y repeat 2drop ;
>
>     cycles  instructions  vfxlin 4.72
> 1012848419    4009410376  foo1 registers
> 5060035003    6010211435  foo2 read+write
> 1900018814    4009598660  foo3 read-only
> 1012666250    5009454583  foo4 write-only

Hmm, weird. On iForth it is permutated:

FORTH> #1000000000 TO #times TEST
foo1 0.212 seconds elapsed.
foo2 0.221 seconds elapsed.
foo3 0.213 seconds elapsed.
foo4 0.425 seconds elapsed. ok

FORTH> ' foo1 idis ' foo2 idis ' foo3 idis ' foo4 idis
$0133D880 : foo1
$0133D88A xor rbx, rbx
$0133D88D mov rax, rax
$0133D890 pop rdi
$0133D891 cmp rbx, rdi
$0133D894 push rdi
$0133D895 jae $0133D8A3 offset NEAR
$0133D89B lea rbx, [rbx 1 +] qword
$0133D89F jmp $0133D890 offset SHORT
$0133D8A1 push rbx
$0133D8A2 pop rbx
$0133D8A3 pop rdi
$0133D8A4 ;
$01340940 : foo2
$0134094A mov $01340500 qword-offset, 0 d#
$01340955 pop rbx
$01340956 nop
$01340957 nop
$01340958 mov rax, $01340500 qword-offset
$0134095F cmp rax, rbx
$01340962 jae $0134097E offset NEAR
$01340968 mov rax, $01340500 qword-offset
$0134096F lea rcx, [rax 1 +] qword
$01340973 mov $01340500 qword-offset, rcx
$0134097A jmp $01340958 offset SHORT
$0134097C push rbx
$0134097D pop rbx
$0134097E ;
$01340A00 : foo3
$01340A0A pop rbx
$01340A0B mov $01340520 qword-offset, rbx
$01340A12 xor rbx, rbx
$01340A15 mov rax, rax
$01340A18 mov rax, $01340520 qword-offset
$01340A1F cmp rbx, rax
$01340A22 jae $01340A30 offset NEAR
$01340A28 lea rbx, [rbx 1 +] qword
$01340A2C jmp $01340A18 offset SHORT
$01340A2E push rbx
$01340A2F pop rbx
$01340A30 ;
$01340A80 : foo4
$01340A8A xor rbx, rbx
$01340A8D mov rax, rax
$01340A90 pop rdi
$01340A91 cmp rbx, rdi
$01340A94 push rdi
$01340A95 jae $01340AAE offset NEAR
$01340A9B lea rcx, [rbx 1 +] qword
$01340A9F mov $01340520 qword-offset, rcx
$01340AA6 lea rbx, [rbx 1 +] qword
$01340AAA jmp $01340A90 offset SHORT
$01340AAC push rbx
$01340AAD pop rbx
$01340AAE pop rdi
$01340AAF ;

The stack variable ( rdi ) is causing the slowdown.

-marcel

dxforth

unread,
May 24, 2022, 8:59:47 PM5/24/22
to
On 24/05/2022 20:48, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>> ...
>>Keeping things in registers usually refers to constants or locals.
>
> Sure, if you support ADDR, you cannot keep values in registers in many
> situations (and the remaining situations are probably so few that you
> just keep them in memory all the time).
>
> But there are also those who advocate not using locals, and to use
> variables or values instead. Aren't you one of them? Anyway, for
> those usages, among others, it would be beneficial to keep values in
> registers.
>
>>My understanding of VALUEs is that they're read far more often than
>>written and, as such, your use above would appear to be something of
>>an anomaly.
>
> Possibly. When a value is both read and written in a loop, allocating
> it in a register is particularly beneficial. Here's an example:
>
> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
> 0 value x
> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;

That's where I would use a VARIABLE - which optimizing compilers can
keep in a register if they wish. The example code I previously
posted has one VALUE referenced once and two VARIABLEs each referenced
twice. If I were going to prioritize anything for register use, it
would be VARIABLEs as they're used more often.

VALUEs in forth were conceived as self-fetching VARIABLEs. Which was
fine until one had to write to them. They're in CORE-EXT because they
didn't replace anything nor add anything. Syntactic sugar with pros
and cons is how I view them. I'll use them - but not when VARIABLEs
are the better choice.

Anton Ertl

unread,
May 25, 2022, 1:05:15 AM5/25/22
to
Marcel Hendrix <m...@iae.nl> writes:
>On Tuesday, May 24, 2022 at 6:53:55 PM UTC+2, Anton Ertl wrote:
>[..]
>> And just to show that it's not as bad for read-only or write-only
>> memory accesses:
>> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>> 0 value x
>> 0 value y
>> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>> : foo3 to y 0 begin y over u> while 1+ repeat drop ;
>> : foo4 0 begin 2dup u> while 1+ dup to y repeat 2drop ;
>>
>>     cycles  instructions  vfxlin 4.72
>> 1012848419    4009410376  foo1 registers
>> 5060035003    6010211435  foo2 read+write
>> 1900018814    4009598660  foo3 read-only
>> 1012666250    5009454583  foo4 write-only
>
>Hmm, weird. On iForth it is permutated:
>
>FORTH> #1000000000 TO #times TEST
>foo1 0.212 seconds elapsed.
>foo2 0.221 seconds elapsed.
>foo3 0.213 seconds elapsed.
>foo4 0.425 seconds elapsed. ok

So that's ~1/1/1/2 cycles per iteration if your CPU runs slightly
below 5GHz. My guess is that's because you are running it on a Zen3
CPU, where the hardware has special optimizations for avoiding the
memory dependence latency we see on the Skylake.
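
The arithmetic behind that estimate can be sketched in integer Forth. The 4700 MHz clock is only an assumption standing in for "slightly below 5GHz", and CYCLES/ITER is a made-up helper name. For 10^9 iterations, milliseconds times MHz divided by 1000 gives thousandths of a cycle per iteration:

```forth
\ ms  = measured run time in milliseconds for 1e9 iterations
\ mhz = assumed clock frequency in MHz (4700 is a guess, not a measurement)
: cycles/iter ( ms mhz -- millicycles )  1000 */ ;

212 4700 cycles/iter .   \ prints 996, about 1 cycle per iteration (foo1)
425 4700 cycles/iter .   \ prints 1997, about 2 cycles per iteration (foo4)
```

A different assumed clock shifts the numbers proportionally, but the 1/1/1/2 pattern stays the same.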

Here's the cycles/iteration that I see on more recent CPUs than the
Skylake (using lxf):

foo1 foo2 foo3 foo4
   1    8    1    1  Zen2 (Ryzen 3900X)
   1    1    1    1  Zen3 (Ryzen 5800X)
   1  1.8    1    1  Rocket Lake (Xeon W1370P)

So Rocket Lake obviously optimizes this case, too, but apparently
there is some residue.

At some point, CPUs will optimize so well that Python will run as fast
as well-written assembly language. Of course, by then software
developers will have switched to a language that's 100 times slower
even on that hardware:-).

Anton Ertl

unread,
May 25, 2022, 1:42:58 AM5/25/22
to
dxforth <dxf...@gmail.com> writes:
>On 24/05/2022 20:48, Anton Ertl wrote:
>> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>> 0 value x
>> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>
>That's where I would use a VARIABLE

I certainly used to think that variables are the Forth way, and that
one should use values only for almost-constants that are not changed
in compiled code. But values (without ADDR) have a nice property, as
discussed in this thread.

>which optimizing compilers can
>keep in a register if they wish.

Forget it. This optimization is as hard for variables as for varues.
It is easy for values. A Forth meme is that you should not burden the
compiler with jobs that the programmer can perform. And as it
happens, values make the compiler's job easier than varues or
variables, so if you want this optimization, use values!

Another way to satisfy the meme is to fetch the variable at the start,
keep it in, e.g., a local in the word, and store it back in the end.
E.g.,

variable xx
: foo2a 0 {: x :} begin dup x u> while x 1+ to x repeat drop x xx ! ;
\ or
: foo2b 0 begin {: x :} dup x u> while x 1+ repeat drop x xx ! ;


>The example code I previously
>posted has one VALUE referenced once and two VARIABLEs each referenced
>twice. If I were going to prioritize anything for register use, it
>would be VARIABLEs as they're used more often.

And then you find that you have to prove that nothing else stores to
or fetches from the address between the accesses to the variable.
Hundreds of papers have been written about this problem (alias
analysis); it takes a lot of work (both compiler writer time and
compile time) to perform alias analysis, and it still only partially
solves the problem. As mentioned above, IMO that's not the Forth way.
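
A small sketch of why the compiler needs that proof, using made-up names (V, W, POKE, DEMO) in standard Forth. Inside DEMO nothing tells the compiler whether the address it was handed is V itself, so it cannot simply reuse the 1 it just stored:

```forth
variable v
variable w

: poke ( x addr -- )  ! ;      \ stores through whatever address it is given

: demo ( addr -- x )
  1 v !                        \ v is now 1
  2 swap poke                  \ may or may not overwrite v
  v @ ;                        \ so v has to be re-read from memory here

v demo .   \ prints 2: addr aliased v
w demo .   \ prints 1: addr was a different cell
```

With a VALUE (and no ADDR) there is no address to pass to POKE in the first place, which is the property under discussion.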

We already have some Forth ways:

1) Use values.

2) Keep the value of the variable explicitly in locals, the return
stack, or the data stack in the code fragment where you want to give
the compiler the opportunity to keep it in a register.
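
As an illustration of option 2 with the data stack instead of a local, a variant along these lines should do (FOO2C is a made-up name, XX is the variable from the earlier example, redeclared here so the sketch stands alone):

```forth
variable xx

: foo2c ( u -- )
  0                            \ counter lives on the data stack
  begin 2dup u> while 1+ repeat
  xx !                         \ a single store after the loop
  drop ;

10 foo2c  xx @ .   \ prints 10
```

The loop body is the same as in FOO1; the variable is touched only once, outside the loop.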

dxforth

unread,
May 25, 2022, 3:15:57 AM5/25/22
to
On 25/05/2022 15:05, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 24/05/2022 20:48, Anton Ertl wrote:
>>> : foo1 0 begin 2dup u> while 1+ repeat 2drop ;
>>> 0 value x
>>> : foo2 0 to x begin dup x u> while x 1+ to x repeat drop ;
>>
>>That's where I would use a VARIABLE
>
> I certainly used to think that variables are the Forth way, and that
> one should use values only for almost-constants that are not changed
> in compiled code.

Does that make a VARUE an 'almost-variable' :) How many languages
do you know have "almost constants"? Sorry, a VALUE is a variable.

> But values (without ADDR) have a nice property, as
> discussed in this thread.

Good luck in the real world.

>
>>which optimizing compilers can
>>keep in a register if they wish.
>
> Forget it. This optimization is as hard for variables as for varues.
> It is easy for values. A Forth meme is that you should not burden the
> compiler with jobs that the programmer can perform. And as it
> happens, values make the compiler's job easier than varues or
> variables, so if you want this optimization, use values!

A better meme is don't break common practice - especially when one
is promoting portability between systems, standards and the like.

Hans Bezemer

unread,
May 25, 2022, 11:04:28 AM5/25/22
to
On Sunday, May 22, 2022 at 10:21:37 PM UTC+2, Paul Rubin wrote:
The advantage of being Dutch is that lots more of his work is available to you.
Dijkstra didn't believe the myth that "the language will save you" - that's a
recurring theme in his work:

"For example, to the desperate manager, it is a startlingly comforting thought
that the programming language in use is the source of all his misery. The wretch
clings to the dream of the programming language, in which programming is so
easy that everything will work itself out. The new programming languages as
panacea are sold in the quack stall like fresh bread from the baker. Notorious in
this regard is the IBM ad in Datamation, 1968, in which a radiant Susie Meyer
—in colors!— declares that the conversion to PL/I was the end of all her programming
troubles. Unfortunately, history does not record what poor Susie Meyer looked like a
few years later, but one can guess, because the miracle cure did not work, of course.
Anyone who reads the propaganda literature for Ada—the programming language
promoted by the US Department of Defense—must see that the world has changed
little in 14 years".

And:

"Once the feeling among mathematicians that computers are not worth bothering
with takes hold, it acts as a so-called "self-fulfilling prophecy": if the intellectually
best equipped ignore the subject, the territory is occupied by second- and third-rate
people, and after a while it is even more difficult for the really clever boy to
imagine that there is a task waiting for him there (..) By sweeping the essentially
mathematical nature of the entire usage problem under the rug with a large-scale campaign,
by presenting programming as something that anyone can learn in a three-week course, in
short, by declaring so obstinately that the gold of the promised mountains is solid that
enough people believe it. In the years that followed, recruitment for the profession of
programmer was completely uncritical. Here in this country MULO (high school)
was more than enough, but in other countries it was no better. The result can be
imagined: of the half-million professional programmers the world now counts,
the majority is of an incompetence that defies description. But here lies an
explanation for the tenacious life of the software crisis: an incompetent labor army
of half a million isn't replaced overnight".

Hans Bezemer

Original:

Voor de wanhopige manager is het bijvoorbeeld een ontstellend geruststellende gedachte,
dat de gebezigde programmeertaal de bron van al zijn ellende is. De stakker klampt zich
vast aan de droom van de programmeertaal, waarin programmeren zo makkelijk is, dat het
allemaal vanzelf goedkomt. De nieuwe programmeertalen als panacee gaan in de
kwakzalverskraam over de toonbank als verse broodjes bij de bakker. In dit verband berucht
is de IBM-advertentie in Datamation, 1968, waarin een stralende Susie Meyer —in kleuren!—
verklaart, dat de bekering tot PL/I het einde van al haar programmeerproblemen was. Hoe de
arme Susie Meyer er een paar jaar later uitzag, vermeldt de historie helaas niet, maar het laat
zich raden, want het wondermiddel heeft natuurlijk niet gewerkt. Wie de propagandaliteratuur
voor Ada —de programmeertaal, die door het Amerikaanse ministerie van defensie wordt
gepousseerd— leest, moet constateren, dat de wereld in 14 jaar weinig is veranderd.

Overheerst bij wiskundigen eenmaal het gevoel dat computers niet de moeite waard
zijn om je mee bezig te houden, dan werkt dat vervolgens als een z.g.
"self-fulfilling prophecy": als de intellectueel het best geequipeerden het vak links laten
liggen, wordt het gebied bezet door tweede- en derderangs mensen en na enige tijd kan
de echt knappe jongen zich nog moeilijker voorstellen dat daar een taak voor hem ligt (..)
Door met een groots opgezette campagne het wezenlijk wiskundige karakter van de hele
gebruiksproblematiek onder tafel te praten, programmeren voor te stellen als iets dat
iedereen in een drieweekse cursus kan leren, kortom door zo hardnekking te verklaren
dat het goud der beloofde bergen massief is, dat genoeg mensen het geloven. In de jaren
die daar op volgden is er voor het vak van programmeur volledig kritiekloos geronseld.
Hier in den lande was MULO ruimschoots genoeg, maar in andere landen was het geen
haar beter. Het resultaat laat zich denken: van de half-millioen professionele programmeurs,
die de wereld inmiddels telt is het merendeel van een incompetentie die elke beschrijving tart.
Maar hier ligt wel een verklaring van het taaie leven van de software crisis: een incompetent
arbeidsleger van een half millioen ververs je niet een-twee-drie.


Anton Ertl

unread,
May 25, 2022, 1:30:39 PM5/25/22
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>My guess is that's because you are running it on a Zen3
>CPU, where the hardware has special optimizations for avoiding the
>memory dependence latency we see on the Skylake.
>
>Here's the cycles/iteration that I see on more recent CPUs than the
>Skylake (using lxf):
>
>foo1 foo2 foo3 foo4
>   1    8    1    1  Zen2 (Ryzen 3900X)
>   1    1    1    1  Zen3 (Ryzen 5800X)
>   1  1.8    1    1  Rocket Lake (Xeon W1370P)
>
>So Rocket Lake obviously optimizes this case, too, but apparently
>there is some residue.

Of course, the question is how well this works in other cases. A
major difference between gforth and gforth-fast is that gforth keeps
the TOS in memory and gforth-fast keeps it in a register. And if we
disable the additional optimizations of gforth-fast with
"--ss-number=0 --ss-states=0", that's even more so. In particular, a
lot of the performance difference on CPUs without this optimization is
due to this difference. So on CPUs with this optimization the
performance difference should be much smaller. Let's see whether that
works out. I do

for i in gforth-fast gforth; do LC_NUMERIC=en_US perf stat -e cycles -e instructions -e branches $i --ss-number=0 --ss-states=0 ../gforth/onebench.fs; done

on six different CPUs; the numbers are cycles, except for the first
line.

  gforth-fast         gforth
3,460,908,004  4,737,564,166  instructions (all)
2,269,234,025  4,104,828,209  Goldmont (2016 efficiency)
1,714,076,717  3,495,857,543  Sandy Bridge (2011)
1,584,890,277  3,419,614,950  Zen (2017)
1,342,756,453  2,706,876,924  Zen2 (2019)
1,110,655,337  2,424,136,035  Zen3 (2020)
1,189,653,919  2,106,179,585  Rocket Lake (2021)

So, gforth-fast --ss-number=0 --ss-states=0 gets roughly a factor of 2
across the board. The memory dependence optimization does not
appear to benefit gforth more than gforth-fast. It's unclear to me
whether that is because it is not that effective for gforth, or there
are also important memory dependencies in gforth-fast for which it
helps (but I don't know of the latter).

Hans Bezemer

unread,
May 25, 2022, 4:23:39 PM5/25/22
to
On Wednesday, May 25, 2022 at 2:59:47 AM UTC+2, dxforth wrote:
> That's where I would use a VARIABLE - which optimizing compilers can
> keep in a register if they wish. The example code I previously
> posted has one VALUE referenced once and two VARIABLEs each referenced
> twice. If I were going to prioritize anything for register use, it
> would be VARIABLEs as they're used more often.
>
> VALUEs in forth were conceived as self-fetching VARIABLEs. Which was
> fine until one had to write to them. They're in CORE-EXT because they
> didn't replace anything nor add anything. Syntactic sugar with pros
> and cons is how I view them. I'll use them - but not when VARIABLEs
> are the better choice.

As far as 4tH is concerned - I started out like you. Sure, it's in the extended
core and people tend to use them in programs, so it's handy if you add them
to the vocabulary. But its use largely overlaps VARIABLE. Not supporting
LOCALS helps too ;-)

Nowadays the difference (in 4tH) has blurred even further, since the optimizer
changes VARIABLEs to VALUEs when the address of the variable is known
at compile time. +TO changes VALUEs to VARIABLEs so it won't take up another
token and +! can simply be used.

And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
like VARIABLE in Forth-79.

HB

dxforth

unread,
May 26, 2022, 1:28:53 AM5/26/22
to
On 26/05/2022 06:23, Hans Bezemer wrote:
> ...
> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
> like VARIABLE in Forth-79.

Given VALUEs will change (else one would use CONSTANT) it might have been
better to omit the initial value. Or would that have made VALUEs too much
like VARIABLEs ? In the app below every VALUE was defined with a dummy.

\ Program constants
0 value #TERMS \ number of terminals in DTA file
0 value TERM \ working terminal#
$95 constant TLEN \ length of each term definition
$100 constant CHUNK \ in-file chunk to get
20 constant ISIZ \ size input buffer / terminal name
200 constant TMAX \ max #terminals

\ Storage areas allocated at run-time
here value IN-BASE ( -- a ) \ in-file
here value DTA-BASE ( -- a ) \ dta-file
here value TBUF ( -- a ) \ temp terminal buffer
here value SBUF ( -- a ) \ swap/work buffer
here value IBUF ( -- a ) \ console input
here value XBUF ( -- a ) \ terminal index

Anton Ertl

unread,
May 26, 2022, 2:28:46 AM5/26/22
to
dxforth <dxf...@gmail.com> writes:
>On 26/05/2022 06:23, Hans Bezemer wrote:
>> ...
>> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
>> like VARIABLE in Forth-79.

From Forth-79:
|VARIABLE 227
| A defining word executed in the form:
| VARIABLE <name>
| to create a dictionary entry for <name> and allot two bytes
| for storage in the parameter field. The application must
| initialize the stored value.

In fig-Forth VARIABLE took the initial value from the stack. I miss
it every time I use VARIABLE.

>Given VALUEs will change (else one would use CONSTANT) it might have been
>better to omit the initial value.

No. You need to initialize a variable (or value) before reading its
value. So the way to ensure this is to initialize it right after
definition:

variable where-index -1 where-index !

This is much more cumbersome than fig-Forth's

-1 variable where-index

At least they got it right for VALUE.
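
For those who miss the fig-Forth behaviour, an initialized variable can be sketched in standard Forth with CREATE. IVARIABLE is a hypothetical name chosen here so as not to redefine the standard VARIABLE:

```forth
\ IVARIABLE is a made-up name, not a standard word
: ivariable ( x "name" -- )  create , ;   \ allot one cell holding x

-1 ivariable where-index
where-index @ .   \ prints -1
```

The created word returns its address like a VARIABLE does, so @ and ! work on it as usual.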

>In the app below every VALUE was defined with a dummy.

A defined dummy helps avoid Heisenbugs, and can help find bugs.

>\ Program constants
>0 value #TERMS \ number of terminals in DTA file

You have more than 0 terminals at the start?

>\ Storage areas allocated at run-time
>here value IN-BASE ( -- a ) \ in-file
>here value DTA-BASE ( -- a ) \ dta-file
>here value TBUF ( -- a ) \ temp terminal buffer
>here value SBUF ( -- a ) \ swap/work buffer
>here value IBUF ( -- a ) \ console input
>here value XBUF ( -- a ) \ terminal index

I work on systems where accessing memory near 0 produces an exception,
so initializing addresses as 0 helps find bugs quickly where the
allocation has not happened before the first read. Without
initialization, the bug might go unnoticed for longer, making it
harder to find.

dxforth

unread,
May 26, 2022, 4:51:24 AM5/26/22
to
On 26/05/2022 16:12, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 26/05/2022 06:23, Hans Bezemer wrote:
>>> ...
>>> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
>>> like VARIABLE in Forth-79.
>
> From Forth-79:
> |VARIABLE 227
> | A defining word executed in the form:
> | VARIABLE <name>
> | to create a dictionary entry for <name> and allot two bytes
> | for storage in the parameter field. The application must
> | initialize the stored value.
>
> In fig-Forth VARIABLE took the initial value from the stack. I miss
> it every time I use VARIABLE.
>
>>Given VALUEs will change (else one would use CONSTANT) it might have been
>>better to omit the initial value.
>
> No. You need to initialize a variable (or value) before reading its
> value. So the way to ensure this is to initialize it right after
> definition:

I do that in a function (e.g. INIT ) so that when a program is re-run
VALUEs and anything else that requires predictable initial values will
be correctly set. Defining x VALUE FOO is misleading as it suggests
FOO will be x whenever the program is run. It's such assumptions that
lead to your Heisenbugs.

>
> variable where-index -1 where-index !
>
> This is much more cumbersome than fig-Forth's
>
> -1 variable where-index
>
> At least they got it right for VALUE.
>
>>In the app below every VALUE was defined with a dummy.
>
> A defined dummy helps avoid Heisenbugs, and can help find bugs.

Or not. I recently ported an old program which presumed VARIABLE
initialized to zero. It helped that my VARIABLE initialized to a
random value else I might still be looking for the bug.

>
>>\ Program constants
>>0 value #TERMS \ number of terminals in DTA file
>
> You have more than 0 terminals at the start?
>
>>\ Storage areas allocated at run-time
>>here value IN-BASE ( -- a ) \ in-file
>>here value DTA-BASE ( -- a ) \ dta-file
>>here value TBUF ( -- a ) \ temp terminal buffer
>>here value SBUF ( -- a ) \ swap/work buffer
>>here value IBUF ( -- a ) \ console input
>>here value XBUF ( -- a ) \ terminal index
>
> I work on systems where accessing memory near 0 produces an exception,
> so initializing addresses as 0 helps find bugs quickly where the
> allocation has not happened before the first read. Without
> initialization, the bug might go unnoticed for longer, making it
> harder to find.

My preferred solution was:

: INIT ( -- )
altered off \ clear
isiz reserve to ibuf \ console input
tlen reserve to tbuf \ temp terminal buffer
chunk reserve to sbuf \ swap/work buffer
tmax cells reserve to xbuf \ terminal index
;

( INIT)

\ Main
: INSTALL ( -- )
cls title init
open-target read-dta catalog
menu
close-target
;

Paul Rubin

unread,
May 26, 2022, 1:23:34 PM5/26/22
to
dxforth <dxf...@gmail.com> writes:
> Defining x VALUE FOO is misleading as it suggests FOO will be x
> whenever the program is run.

It only suggests that FOO is x until it gets updated. FOO should (if
possible) have a legitimate and meaningful value at all times, rather
than a dummy value.

P Falth

May 26, 2022, 4:48:37 PM
Anton, this is an interesting and somewhat unexpected development.
Can you share some more information on how you plan to implement it?

I can see some problems in that you can have more values defined than
you have registers available. In lxf I have 5 registers available for the code
generator, the rest are used by the system (stackpointers etc). Even if I only
use 1 register for values it will hurt the code generator. And what VALUE
would I dedicate that register to!

On a CPU with many registers like ARM64 I could think of something like

1234 REGISTER-VALUE R1 test

to make test a value stored in register R1. This could be very useful
in some cases.

BR
Peter Fälth

dxforth

May 26, 2022, 7:54:09 PM
On 26/05/2022 16:12, Anton Ertl wrote:
>>On 26/05/2022 06:23, Hans Bezemer wrote:
>>> ...
>>> And I find that I tend to do VALUEs often when I can initialize a VARIABLE -
>>> like VARIABLE in Forth-79.
>
> From Forth-79:
> |VARIABLE 227
> | A defining word executed in the form:
> | VARIABLE <name>
> | to create a dictionary entry for <name> and allot two bytes
> | for storage in the parameter field. The application must
> | initialize the stored value.
>
> In fig-Forth VARIABLE took the initial value from the stack. I miss
> it every time I use VARIABLE.

AFAIK the 'n VARIABLE name' syntax originated with Moore. It is present
in Kitt Peak Forth and Forth Inc's microForth (from which fig-Forth likely
got it). It also appears in Forth-77 (which FD called a "standardized
glossary" - also based on Kitt Peak Forth). According to FD, Forth-79
was the first serious attempt at a standard and portability. I'm not
aware of any official explanation as to why the initializing value of VARIABLE
was dropped in Forth-79 (it also seems to have disappeared by the time of
polyForth and Starting FORTH). The change didn't bother me. If anything,
I thought they were fixing a loop-hole.
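
For reference, the fig-Forth behaviour is easy to recreate on a standard
system. A minimal sketch, where the name IVARIABLE is made up for
illustration and is not part of any standard or of the systems mentioned
above:

```forth
\ IVARIABLE is a hypothetical name; it mimics the "n VARIABLE name" form.
: ivariable ( x "name" -- )  create , ;

-1 ivariable where-index
where-index @ .   \ prints: -1
```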

So where did VALUE come from? All I know is TO first appeared in FD V1N4.
According to the author it was based on a suggestion by Moore. The
intention was:

n VARIABLE foo foo @ ( n)

would be replaced by:

VARIABLE foo n TO foo foo ( n)

I don't know who first came up with VALUE or when. The earliest I've
seen it implemented was in F-PC. I suspect however it was in use before
that. Perhaps someone knows?

dxforth

May 26, 2022, 8:32:02 PM
It's not the case for DEFER. I would very much have preferred that VALUE
required no initial value; then I wouldn't need to include a dummy.

Andy Valencia

May 27, 2022, 4:25:54 PM
Paul Rubin <no.e...@nospam.invalid> writes:
> It only suggests that FOO is x until it gets updated. FOO should (if
> possible) have a legitimate and meaningful value at all times, rather
> than a dummy value.

Indeed, thus almost every language has initializers for things which can
subsequently have their value changed.

Andy Valencia
Home page: https://www.vsta.org/andy/
To contact me: https://www.vsta.org/contact/andy.html

dxforth

May 27, 2022, 11:51:57 PM
On 28/05/2022 06:24, Andy Valencia wrote:
> Paul Rubin <no.e...@nospam.invalid> writes:
>> It only suggests that FOO is x until it gets updated. FOO should (if
>> possible) have a legitimate and meaningful value at all times, rather
>> than a dummy value.
>
> Indeed, thus almost every language has initializers for things which can
> subsequently have their value changed.

Yes, but what sense does it make to initialize them at compile-time
when it's execution-time that matters?

Ruvim

May 28, 2022, 7:32:28 AM
On 2022-05-19 18:47, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>
> I avoid these unless there is some reason not to. In particular:
>
> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
> interpretive code.

For quoting a word I prefer the form 'FOO (i.e. Tick vs Backtick) for the
following reasons:
- it is closer to "[']" and "'" (so it's already associated with
quoting a word in Forth);
- it is also used for quoting in some other languages (e.g., in Lisp,
to quote a list).

Possible disadvantages of this choice are as follows.

- Sometimes Tick is an actual part of a name. But those who use names
starting with a Tick probably will not use recognizers, but parsing words.

- Tick is used for character literals in Gforth. But it was a
suboptimal choice, I think.



[...]
>> To use them as:
>>
>> `foo :def 123 . ;
>>
>
> Why `FOO, not "FOO"?

For better readability.

I use Backtick for a space delimited string [1]. This form tells me that
the string is a name (a name of something, not necessarily the name of a
definition). And it's better for readability. Also, this form is far
more concise than S" FOO", which was important before a recognizer for
"FOO" was introduced.




I think we should find some conventions for these forms (and maybe some
other), and give them names. After that, to avoid conflicts, a Forth
source code file (or code block) that relies on a convention, should
start with a declaration that mentions the convention's name.

The format of such a declaration probably should be standardized.

As an example, JavaScript uses "use strict" string literal as the first
item of a code block to declare strict mode.

We could use a list of recognizer names in the declaration. The scope of
this declaration (where these recognizers are in effect) should be
limited in an obvious way.



[1] Not only I, see for example:
https://github.com/rufig/spf4-utf8/blob/9eaeb371ee2824082464cc39552390bf74308451/devel/%7Eygrek/lib/xmltag.f#L95

--
Ruvim

Ruvim

May 28, 2022, 9:55:51 AM
On 2022-05-19 18:47, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:

[...]

>> 123 constant( foo )
>> create( bar ) 456 ,
>>
>> :( baz ) ( -- x ) foo postpone( foo bar ) ;
>>
>> :( test-baz ) :( baz2 ) baz drop postpone( @ ; ) ;
>>
>> t{ test-baz baz2 -> 123 456 }t
>>
>>

> And please explain the definition of TEST-BAZ and BAZ2.

It's just a test case.
TEST-BAZ creates the word BAZ2, that is equivalent to the following
definition:

: baz2 [ baz drop postpone( @ ; ) ] [

: baz2 [ baz drop ] @ ;

: baz2 [ foo postpone( foo bar ) drop ] @ ;

: baz2 [ foo ] foo bar [ drop ] @ ;

: baz2 [ 123 ] foo bar [ drop ] @ ;

: baz2 [ 123 drop ] foo bar @ ;

: baz2 foo bar @ ;

: baz2 123 bar @ ;




Actually, the form
:( foo )
is not quite readable due to the confusing ":("

A variant:
def( foo )
looks slightly better. But then the block should end with "end-def".

Maybe:

:def( foo ) ... ;

Just for reference, an implementation of ':def(' follows.

\ Data type symbols:
\ "sd" is a pair ( c-addr u )
\ "t" is a tuple ( i*x )

\ The code relies on the "quoted-word-by-tick" recognizer

: noop ;
: tt-dual ( t xt.compil xt.interp -- t )
state @ if drop else nip then execute
;
: tt-slit ( sd -- sd | ) 'slit, 'noop tt-dual ;
: tt-xt ( t xt -- t ) 'compile, 'execute tt-dual ;
: def ( sd.name -- ) ( C: -- colon-sys ) ': execute-parsing ;

: :def( \ "ccc )"
')' parse [: ( -- sd.name )
parse-name parse-name nip abort" unexpected immediate argument"
;] execute-parsing ( sd.name | c-addr 0 )
dup if tt-slit else 2drop then 'def tt-xt
; immediate


It defines a behavior such that the following lines of code are equivalent:

:def( foo ) :def( bar ) ... ;

:def( foo ) "bar" :def( ) ... ;

: foo "bar" ': execute-parsing ... ;


Well, I dislike this variant too, since the elided fragment ("...") is not
the body of the "bar" definition.



--
Ruvim

Ruvim

May 28, 2022, 9:56:37 AM
On 2022-05-19 18:47, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>
> I avoid these unless there is some reason not to. In particular:
>
> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
> interpretive code.

> Instead of POSTPONE FOO, I write ]] FOO [[. Especially nice for
> multiple words.
>
> Instead of S" BLA", I write "BLA".
>
> I don't use ABORT", not the least because I always have to look up the
> direction of the flag. Instead, I use THROW. If I need a new ball, I
> create it with
>
> "new ball" exception constant new-ball



I use ABORT" for the examples that should be short and standard
compliant. And I still have to use S" BLA", ['] and POSTPONE for
strictly standard compliant modules.




>> And what about defining words?
>
> I tend to use these. They have default compilation semantics and one
> rarely wants to copy-paste them between compiled and interpreted code.
> Of course, in those rare cases (i.e., when debugging a defining word),
> I wish that they took their name argument from the stack.


It's confusing that in one context they are followed by an immediate
argument, but in another context — are not.

: foo : bar ;

"foo" is an immediate argument of the first colon, but "bar" is not an
immediate argument of the second colon. It's very confusing. (1)


Actually, some such words cannot be avoided due to Forth reflection.
E.g. the phrase:
parse-name test type
produces different effects when it's interpreted and when it's executed
(after compilation to a word).
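
A small sketch of that difference, using only the standard PARSE-NAME
(the name SHOW-NAME is made up for illustration):

```forth
\ Interpreted: PARSE-NAME takes the next lexeme from this very line.
parse-name test type cr      \ prints: test

\ Compiled: the lexeme is taken from the input at the time SHOW-NAME runs,
\ not from the text that follows PARSE-NAME inside the definition.
: show-name ( "name" -- )  parse-name type cr ;
show-name hello              \ prints: hello
```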

But "parse-name" is a “plumbing” word, and ":" is a “porcelain” word
(after Git's terminology). Puzzles are OK for plumbing.





>> I'm wondered why people continue to use parsing words if they dislike them.
>
> In the four cases above, I do it when writing code that should work on
> Forth systems that do not understand the better idioms, or when
> demonstating something to an audience that may not be familiar with
> the better idioms, and these idioms would distract from the point I am
> trying to demonstrate.

I would like to get rid of the burden of those old idioms.



>> Why not introduce new words like:
>>
>> :def ( sd.name -- ) ( C: -- colon-sys )
>>
>> does-created ( xt sd.name -- )
>
> The question is if the benefit is worth the cost in these cases.
> Cost: additional words (because we don't want to destandardize all
> existing code). Benefit: rare, as mentioned above.


And I would like to avoid the problem (1) in some lexical scopes at least.



>
>> And after that, what to do with "[if]" and "[undefined]"?
>
> And \ and (.

If they are with us in any case, we should not avoid them, but use them
properly (safely and readably).



--
Ruvim

Anton Ertl

May 28, 2022, 10:32:30 AM
Ruvim <ruvim...@gmail.com> writes:
>On 2022-05-19 18:47, Anton Ertl wrote:
>> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
>> interpretive code.
>
>To quoting a word I prefer the form 'FOO (i.e. Tick vs Backtick) for the
>following reasons:
> - it is closer to "[']" and "'" (so it's already connotated with
>quoting a word in Forth);
> - it is also used for quoting in some other languages (e.g., in Lisp,
>to quote a list).
>
>Possible disadvantage of this choice are as follows.
>
> - Sometimes Tick is an actual part of a name. But those who use names
>starting with a Tick probably will not use recognizers, but parsing words.

Gforth currently has the following words starting with ':

'-error ' 'quit 'cold 'image 'clean-maintask

and Gforth has recognizers. Also, I have seen several programs that
define words with names starting with ' (and often paired with a word
with a name without ', such as 'QUIT and QUIT). For ` I have yet to
see someone write a word that starts with that. As soon as someone
loads that program in a system with a '-recognizer, there are the
following potential problems:

* The user may be unaware of the existence of 'QUIT, and writes 'QUIT
with the intention of getting the xt of QUIT, but gets something
else because the word-recognizer precedes the '-recognizer.

* If, OTOH, the '-recognizer preceded the word-recognizer, you get the
converse problem: If you want to get at the word 'QUIT, you get the
xt of QUIT instead.

We also would have preferred to use 'FOO for the xt of FOO, but to
avoid these problems, we chose `.

> - Tick is used for character literals in Gforth. But is was a
>suboptimal choice, I think.

The usage 'c' is standardized in Forth-2012. The usage 'c is legacy
in Gforth, and not the major reason why we decided against 'FOO,
although a user writing, e.g., '# might be unpleasantly surprised that
the result is the ASCII value of # instead of the xt of #.

There is a word I' in Gforth, so 'I' would be a conflict with
non-legacy syntax. I think that given the fact that Gforth by default
tries to recognize a character before an xt, most users who want the
xt of I' probably would see 'I' and immediately see the conflict.

In any case, while some problems are less serious than others, all of
these problems are avoided by using ` in the tick-recognizer, which is
why we are using that.

>I think we should find some conventions for these forms (and maybe some
>other), and give them names. After that, to avoid conflicts, a Forth
>source code file (or code block) that relies on a convention, should
>start with a declaration that mentions the convention's name.
>
>The format of such a declaration probably should be standardized.
>
>As an example, JavaScript uses "use strict" string literal as the first
>item of a code block to declare strict mode.
>
>We could use a list of recognizer names in the declaration. The scope of
>this declaration (where this recognizers are in effect) should be
>limited in obvious way.

I think that, on the contrary, we should use one common set of words
and syntax instead of introducing ways to declare idiosyncrasy.

E.g., do I think that the alignment handling in struct.fs is superior
to the Forth-2012 field words? Yes. Still, when writing new code, I
use the Forth-2012 words. And that's despite the fact that struct.fs
is a standard program, so you can use the struct.fs words fully
portably. It's just that the cost of idiosyncrasy outweighs the
benefits of the superior alignment handling.

We have also had ways to specify which wordset a program uses: wordset
queries with ENVIRONMENT?. A number of systems did not support that
in a useful way, few programs used it, and eventually we decided to
make them obsolescent (and refer only to Forth-94 while they are still
there). So I expect that a new way to specify which dialect a program
uses will also receive little love.

That being said, once we have standardized configurable recognizers,
nothing prevents you from adding a recognizer that uses ' for xt
literals and another recognizer that uses ` for string. So your
convention might be something like

require ruvim.4th ruvim-convention{
... \ code that uses ruvim-convention stuff
}ruvim-convention

I just hope that, like I do for field words, you will use the
(hopefully standardized) string syntax "string" (which already has a
lot of mindshare), and that we reach a consensus for a syntax for xt
literals, and all use that then.

- anton

Ruvim

May 28, 2022, 10:41:21 AM
On 2022-05-19 21:20, S Jack wrote:
> On Wednesday, May 18, 2022 at 9:30:45 AM UTC-5, Ruvim wrote:
>> Do you avoid the standard parsing words?
>
> No. For new words where the choice is to make it parsing or postfix I choose
> postfix. I've re-defined some existing parsing words to be postfix such as
> FORGET and SEE:
> ' foo FORGET
> ' foo SEE
> But I don't go for purity which usually leads to abominations. Note in
> above tick is acceptable. It's a matter of using exceptions sparingly and
> where most effective. That's the art and of course not everyone is going
> to agree on the choices.

My choice is a recognizer for Tick, e.g. 'foo

And I think, we can avoid exceptions in some scopes.


> But back to your original what should be standard convention for parsing
> word syntax, my view:
>
> General choices
> 1) foo bar bat
> No syntax
> One must know what foo bar and bat are.
> 2) foo: bar bat
> Syntax indicates foo: a parsing word with bar as immediate parameter
> but bat is undetermined, could be a second parameter to foo or an
> operator.
> 3) foo( bar bat )
> Syntax indicates foo( is parsing word and has two immediate parameters
> bar and bat.
>
> Choice (1) should be preferable to the Forth purist (DX, the.Bee).
> Don't waste time worrying over syntax schemes.
>
> Choice (2) proposed by Albert which works for me is a simple syntax for
> parsing words, sufficient since our use of parsing words will be limited.
>
> Choice (3) is total explicit and should fit well in a more formal Forth which
> is standard Forth.
>
> I think choice (3) is best for the standard. I'll probably be using choice
> (2) but that doesn't mean I'm changing ' to ': .


From these three variants, (3) looks better to me too.

But applying this rule to Colon seems not so pretty:

:( foo ) ... ;

(as I already mentioned in a previous message).


If you want a Colon-like word that parses its body, what could it look like?

I mean something like:

def( foo ){ ... }def

I.e., the word should parse both the name and the whole definition body.
What could the syntax be for such a case?


--
Ruvim

Marcel Hendrix

May 28, 2022, 11:19:13 AM
On Saturday, May 28, 2022 at 4:32:30 PM UTC+2, Anton Ertl wrote:
[..]
> For ` I have yet to
> see someone write a word that starts with that.

\ Try to talk to the Forth inside the server (If it's not C).

ALSO ENVIR

SERVER >UPC 'N'
<> [IF]
: ` &' PARSE \ #<string of server commands>#
{{
IFORTH!
DUP BOOTLINK @ CHANNEL-!
BOOTLINK @ CHANNEL-SEND
}} ;
[ELSE] : ` CR ." Cannot execute: ``" &' PARSE TYPE ." ''" ;
[THEN]

PREVIOUS

-marcel

S Jack

May 28, 2022, 11:54:07 AM
On Saturday, May 28, 2022 at 9:41:21 AM UTC-5, Ruvim wrote:
>
> From these three variants, (3) looks better for me too.
>
> But applying this rule to Colon seems not so pretty:
>
> :( foo ) ... ;
>

Like PostScript, make name assignment postfix:
: .... ; =( foo )

Colon is a :noname and "=(" assigns a name if wanted. ":NONAME" no longer needed.

Not going for purity one could do:
: .... ;\ foo

--
me

Ruvim

May 28, 2022, 12:00:06 PM
On 2022-05-20 11:48, albert wrote:
> In article <t62vui$1mi$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>> On 2022-05-17, S Jack wrote:
>>> On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
>>>> Can we have a convention for naming parsing words?
>>>>
>>>> What is your considerations?
>>>>
>>>>
>>>> [1] Naming for parsing words
>>>> https://github.com/ForthHub/discussion/discussions/112
>>>
>>> Bah, that a parsing word with immediate parameter is 'better readable' than
>>> a postfix operator. Most definitions contain postfix words and mixing in
>>> parsing words is style conflict so I avoid parsing words.
>>
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>> And what about defining words?
>>
>> I'm wondered why people continue to use parsing words if they dislike them.
>>
>> Why not introduce new words like:
>>
>> :def ( sd.name -- ) ( C: -- colon-sys )
>>
>> does-created ( xt sd.name -- )
>>
>> To use them as:
>>
>> `foo :def 123 . ;
>>
>> [: @ . ;] `bar does-created 123 ,
>>
>> foo bar \ prints "123 123"
>
> Then I would prefer:
> [: "hello world" TYPE ;] : hello
> or even c++/java/.. compatible:
> { "hello world" TYPE } : hello

So this ":" is still a parsing word.


In the STOIC language [1,2] it's written as:

'hello : "hello world" type ;

123 'bar constant


Interestingly, it uses two forms of string literals [3] — blank
separated, and double-quoted:

Examples of strings are:

'ABC_DEF -- note termination by a space
(can use a tab also)

"This one has spaces in it as well as ^&*()"



But STOIC didn't avoid inconspicuous immediate arguments.
For example, the word DISPATCH
Compilation: ( "ccc" -- )
Run-time: ( x -- x | )

A usage example:

'C.F : % F-command, needs second character
CLI.GNB IF
CONVERT_TO_UPPER
DISPATCH 'I C.FI
DISPATCH 'O C.FO
THEN DROP ERR.INVCOM
;



> <SNIP>
>>
>> t{ test-baz baz2 -> 123 456 }t
>
> Test words benefit from 2 separate code sequences plugged in.
> I use
> REGRESS test-baz baz2 S: 123 456 <EOL>

There are actually two separate code sequences, they are just separated
not by "S:", but by "->".

In general, you can write:

t{ test-baz baz2 -> 123 dup 333 + }t




[1] STOIC, 1976, — it's very close to Forth
https://en.wikipedia.org/wiki/STOIC

[2] Sources and some applications
http://www.decuslib.com/decus/vax85c/saostoic/

[3] STOIC manual
http://www.decuslib.com/decus/vax85c/saostoic/gm/newstoic.mem

--
Ruvim

S Jack

May 28, 2022, 12:08:29 PM
On Tuesday, May 17, 2022 at 11:23:57 AM UTC-5, Ruvim wrote:
>
> Can we have a convention for naming parsing words?

Not a suggestion but an intriguing thought:

Instead of naming conventions what if the standard specifies only behavior
with given id's and not names. Programs can map a set of names to the id's
when compiled thereby using names that fit the application. If one is looking
at someone else's code and doesn't like the naming, he just maps it to the
name set he prefers and re-compiles.

In the good old days of HTML, before the prevalence of JavaScript, Firefox provided
a user default CSS. The user could disable a web page's CSS and use his
default to have a much improved view of the page.

--
me

Anton Ertl

May 28, 2022, 12:27:06 PM
Marcel Hendrix <m...@iae.nl> writes:
>On Saturday, May 28, 2022 at 4:32:30 PM UTC+2, Anton Ertl wrote:
>[..]
>> For ` I have yet to
>> see someone write a word that starts with that.
...
> : ` &' PARSE \ #<string of server commands>#

No problem, because the tick-recognizer does not recognize a solitary
"`".

Ruvim

May 28, 2022, 12:41:48 PM
On 2022-05-20 09:14, dxforth wrote:
> On 19/05/2022 00:30, Ruvim wrote:
>> On 2022-05-17, S Jack wrote:
>>>
>>> Bah, that a parsing word with immediate parameter is 'better readable' than
>>> a postfix operator. Most definitions contain postfix words and mixing in
>>> parsing words is style conflict so I avoid parsing words.
>>
>> Do you avoid the standard parsing words?
>> For example: "[']" "postpone" 's"' 'abort"'
>> And what about defining words?
>>
>> I'm wondered why people continue to use parsing words if they dislike them.
>
> But do they [beyond the few that already exist]?

Yes, they/we do. For example, see the use of "parse-name" in Forth code on
GitHub [1]. There are many cases of standard word definitions, but also
many cases of user-defined parsing words.
Obviously, in the set of distinct parsing words, the user-defined words
are a major part.


> A few may enjoy creating
> new parsing words (like the few that enjoy creating macros) but I wouldn't
> say either was intrinsic to Forth, or even popular. If creating new parsing
> words were popular, wouldn't there already be a convention for it?
>
> https://pastebin.com/qpZLFc6h

The old convention was: don't use any special naming convention for
porcelain parsing words; they should look just like ordinary words.

For example: "[COMPILE]" and ASCII in Forth-79, POSTPONE in Forth-94


[1] Search results in GitHub for "language:forth parse-name"
https://github.com/search?q=language%3Aforth+parse-name&type=code

--
Ruvim

Ruvim

May 28, 2022, 1:16:31 PM
On 2022-05-17 23:47, albert wrote:
> In article <t60i6r$7m0$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>> In my post [1] in ForthHub I compare the different variants for naming
>> parsing words.
>>
>> Namely, I mean the words for which compilation semantics include
>> scanning/parsing the input stream, and, if interpretation semantics are
>> defined for the word, they include parsing too. Especially, when the
>> word parses one or several lexemes.
>>
>> Can we have a convention for naming parsing words?
>>
>> What is your considerations?
>
> I am a proponent of outlawing parsing words, with an exception for
> denotations, say constants.
> E.g 'DROP or "AAP" are parsed by ' and " and generate a constant,
> independant of interpretation of compilation mode.
> That made it possible to restrict the inspection of STATE to
> where a word is interpreted or compiled.
> The compilation STATE decides whether to compile it,
> adding LIT or FLIT, or leave it on the stack.
> Don't get upset, I don't propose it to the standard.
>
> Parsing words can be handy in special purpose languages,
> to honour expectations from users. That is a different matter.
> A simple convention to end the parsing words with ':'.
>
> FROM floating-point IMPORT: F* F** FLOG PI

It seems "FROM" is also a parsing word, so why is a trailing ':'
missing from its name?

And what do you think with regard to "constant", "variable", ":", and
other defining words? Would you like to outlaw them?


My position is that the parsed part (the immediate arguments), if any,
should be explicitly marked and easily readable, without reading manuals.

And if a porcelain parsing word is allowed to be used in compilation
state and in interpretation state, it should accept immediate arguments
in both cases (or explicitly show that an argument is passed via the
stack instead).

By "to be used" I mean to be encountered by the Forth text interpreter.



--
Ruvim

Ruvim

May 28, 2022, 2:17:32 PM
On 2022-05-17 20:23, Ruvim wrote:
> In my post [1] in ForthHub I compare the different variants for naming
> parsing words.
>
> Namely, I mean the words for which compilation semantics include
> scanning/parsing the input stream, and, if interpretation semantics are
> defined for the word, they include parsing too. Especially, when the
> word parses one or several lexemes.
>
> Can we have a convention for naming parsing words?
>
> What is your considerations?
>
>
Concerning the defining words in Forth — I think, they do parsing just
for better readability.

For example, the code:

123 constant foo

: bar "hello world" type ;

is probably more readable than:

123 "foo" const

{ "hello world" type } "bar" def

or even

123 s" foo" const


But then, for better readability, why not parse both arguments:

constant foo 123

The question is rhetorical. It's a compromise between readability and
the ability to calculate or pass this argument from other words.
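
The postfix form can be sketched without writing a new parser, assuming a
system that provides EXECUTE-PARSING ( i*x c-addr u xt -- j*x ) as Gforth
does. CONST is just the illustrative name from the example above, not a
standard word:

```forth
\ CONST takes the value and the name from the stack and lets the
\ ordinary CONSTANT parse the name out of the supplied string.
: const ( x c-addr u -- )  ['] constant execute-parsing ;

123 s" foo" const
foo .   \ prints: 123
```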



Let's have a look at the WebAssembly language [1]. It's a stack-based
language, and it employs s-expressions in the source code by the
following rule [2]:

"(X ... )" is equivalent to "... X"

or, vice versa:

" ... X" is equivalent to "(X ... )"



Let the word "const ( x sd.name -- )" create a constant from an x and a
name string on the stack, and the word "def ( xt sd.name -- )" create
a word from an xt and a name string on the stack, and let "{ ... }" be
equivalent to "[: ... ;]" that works in interpretation state too.

Using the above rule for s-expressions, we will get:

123 "foo" const

123 (const "foo")

(const 123 "foo")

(const ("foo" 123))


And for a word definition:

{ "test passed" type } "foo" def

(def { "test passed" type } "foo" )

(def ("foo" { "test passed" type }))


NB: the words "def" and "const" don't parse anything by themselves.
In Forth it can be implemented as a recognizer for "(X" that returns "X"
and a translator that also parses the input stream according to the
parentheses.

This approach cannot be used when you need to analyze an immediate
argument at compile time. It can only provide better readability in
some cases.

One more example:

\ !h ( fileid|0 -- )
\ h ( -- fileid|0 )

(open-file "path/to/file.txt" r/o) throw !h
(file-size h) throw
(reposition-file h) throw
(write-file (h "(test passed)" )) throw
(close-file h) throw 0 !h


Nice, isn't it?
We see the main command at the start of each line. And the explicit
arguments do not get in the way before the command. And the implicit
arguments on the stack also flow from command to command.



[1] WebAssembly in Wikipedia
https://en.wikipedia.org/wiki/WebAssembly
[2] Folded Instructions (specification)
https://webassembly.github.io/spec/core/text/instructions.html#folded-instructions

--
Ruvim

P Falth

May 28, 2022, 4:34:30 PM
One of the first things that struck me when I learned about the Forth language
was the complete freedom to name words whatever you wanted.
I do not want to change that!
What you have presented might be very logical but it looks ugly and does not
read well. You can make recommendations, but no standardization please!

BR
Peter

Ruvim

May 28, 2022, 5:25:06 PM
On 2022-05-29 00:34, P Falth wrote:
> On Saturday, 28 May 2022 at 18:41:48 UTC+2, Ruvim wrote:
>> On 2022-05-20 09:14, dxforth wrote:

>>> A few may enjoy creating
>>> new parsing words (like the few that enjoy creating macros) but I wouldn't
>>> say either was intrinsic to Forth, or even popular. If creating new parsing
>>> words were popular, wouldn't there already be a convention for it?
>>>
>>> https://pastebin.com/qpZLFc6h
>>
>> The old convention was: don't use any special naming convention for
>> porcelain parsing word, they should look just like ordinary words.
>>
>> For example: "[COMPILE]" and ASCII in Forth-79, POSTPONE in Forth-94
>
> One of the first things that struck me when I learned about the forth language
> was the complete freedom to name words whatever you wanted.
> I do not want to change that!
> What you have presented might be very logical but it looks ugly and does not
> read well.


OK. Could you please order by ugliness (from higher to lower) the
following definitions for the word "ip-in-subnet ( x.ip x.ip-mask
x.ip-net -- flag )":

: ip-in-subnet >r and r> = ;
: ip-in-subnet ['] and dip = ;
: ip-in-subnet 'and dip = ; \ Tick is for quoting a word
: ip-in-subnet `and dip = ; \ Backtick is for quoting a word
: ip-in-subnet dip' and = ;
: ip-in-subnet dip:and = ;
: ip-in-subnet dip( and ) = ;
: ip-in-subnet dip{ and } = ;

See definitions for "dip", etc at
https://github.com/ForthHub/discussion/discussions/112
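
For readers who do not follow the link, a minimal sketch of what "dip" is
assumed to do here (the definition in the linked discussion may differ),
together with the second variant from the list:

```forth
\ Assumed behaviour: hide x, execute xt on the rest of the stack, restore x.
: dip ( i*x x xt -- j*x x )  swap >r execute r> ;

: ip-in-subnet ( x.ip x.ip-mask x.ip-net -- flag )  ['] and dip = ;

hex
C0A80105 FFFFFF00 C0A80100 ip-in-subnet .   \ 192.168.1.5 in 192.168.1.0/24: prints -1
C0A80205 FFFFFF00 C0A80100 ip-in-subnet .   \ 192.168.2.5 is outside: prints 0
decimal
```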


> you can do recommendations but no standardization please!

Yes, naming conventions are not about standardization at all.



--
Ruvim

Hans Bezemer

May 28, 2022, 6:28:57 PM
On Saturday, May 28, 2022 at 11:25:06 PM UTC+2, Ruvim wrote:
This is Forth
> : ip-in-subnet >r and r> = ;

Entering Factor territory here
> : ip-in-subnet ['] and dip = ;

This is Anton Forth
> : ip-in-subnet 'and dip = ; \ Tick is for quoting a word
> : ip-in-subnet `and dip = ; \ Backtick is for quoting a word

Night of the Living Dead
> : ip-in-subnet dip' and = ;

Texas Chainsaw Massacre
> : ip-in-subnet dip:and = ;

Cannibal Holocaust
> : ip-in-subnet dip( and ) = ;

A Serbian Film
> : ip-in-subnet dip{ and } = ;

Hans Bezemer

none albert

May 29, 2022, 3:25:44 AM
In article <t6t189$k0b$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>On 2022-05-19 18:47, Anton Ertl wrote:
>> Ruvim <ruvim...@gmail.com> writes:
>>> Do you avoid the standard parsing words?
>>> For example: "[']" "postpone" 's"' 'abort"'
>>
>> I avoid these unless there is some reason not to. In particular:
>>
>> Instead of ['] FOO, I write `FOO. The latter can be copy-pasted into
>> interpretive code.
>
>To quoting a word I prefer the form 'FOO (i.e. Tick vs Backtick) for the
>following reasons:
> - it is closer to "[']" and "'" (so it's already connotated with
>quoting a word in Forth);
> - it is also used for quoting in some other languages (e.g., in Lisp,
>to quote a list).
>
>Possible disadvantage of this choice are as follows.
>
> - Sometimes Tick is an actual part of a name. But those who use names
>starting with a Tick probably will not use recognizers, but parsing words.
>
> - Tick is used for character literals in Gforth. But is was a
>suboptimal choice, I think.

I agree with this disadvantage. It is however no worse than
2SWAP, which at first glance seems to be an integer.
'DROP is a number (token/address) that is not to be found if 'DROP
is actually in the dictionary.
I've been using 'DROP instead of `` ' DROP '' or `` ['] DROP ''
since 2001. My Forth doesn't get confused by an occasional
program by Marcel Hendrix who is fond of using ' to indicate
deferred execution.

<SNIP>

>--
>Ruvim
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Anton Ertl

unread,
May 29, 2022, 3:31:58 AM5/29/22
to
P Falth <peter....@gmail.com> writes:
[Allocating VALUEs to registers]
>Can you share some more information on how you plan to implement it?
>
>I can see some problems in that you can have more values defined than
>you have registers available. In lxf I have 5 registers available for the code
>generator, the rest are used by the system (stackpointers etc). Even if I only
>use 1 register for values it will hurt the code generator. And what VALUE
>would I dedicate that register to!

I have no concrete plans, but the general idea is to allocate values
to registers locally, not globally. E.g., if you have a loop where V
is only read, load it into a register before the loop, and only use
the register in the loop. Or if it is read and written in the loop,
load it before the loop, keep it in the register during the loop, and
store it afterwards.
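
At the source level, the read-only case corresponds roughly to hoisting the fetch of V out of the loop by hand. The following is only an illustration of the idea under that assumption, not a description of any system's code generator; the names v, sum-v and sum-v-hoisted are made up for this sketch.

```forth
0 value v

\ Straightforward version: V is fetched on every iteration.
: sum-v ( n -- sum )
  0 swap 0 ?do  v +  loop ;

\ Hand-hoisted version: V is fetched once before the loop and kept on
\ the data stack, which a native-code compiler may hold in a register.
: sum-v-hoisted ( n -- sum )
  v 0 rot 0 ?do  over +  loop  nip ;
```

After 3 TO V, both 4 sum-v and 4 sum-v-hoisted leave 12. Local allocation would aim to get the effect of the second definition from the source of the first.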

>On a CPU with many register like ARM64 I could think of something like
>
>1234 REGISTER-VALUE R1 test
>
>to make test a value stored in register R1. This could be very useful
>in some cases.

Something like that may be useful for dealing with things that are
used frequently in a lot of places, e.g., for implementing THIS in an
object-oriented extension. If you name the register, it becomes at
least machine-specific and probably also system-specific. The
approach used by early C compilers for the "register" storage class is
probably better: keep the first n register-values in registers, where
n is a machine- and system-specific number, and the rest is treated
like normal values.

- anton

none albert

unread,
May 29, 2022, 3:40:44 AM5/29/22
to
In article <445b821c-4ee2-4655...@googlegroups.com>,
Do you still use the recognizer/prefix & ?
We were there first in 1993/2001 with that prefix.
I think &' is superior to ''' in view of ' and '' that also
exists.
I think &A is more Forth like, than a parser that not until
the second ' knows what 'A' is supposed to mean.

'A' is the worst c-compatibility.
I would love to see 0X1A for hex instead of $1A , freeing
$HOME for environment variables. That would be a beneficial
c-compatibility.

>
>-marcel

Groetjes Albert

none albert

unread,
May 29, 2022, 3:56:03 AM5/29/22
to
I hate [UNDEFINED].

"socket-server" [UNDEFINED] [IF]
\ You really mean you can insert a portable definition of `socket-server?
\ Even if you succeed, you uglify your code beyond redemption.
[THEN]

has to be replaced by

WANT socket-server

Groetjes Albert

>--
>Ruvim

none albert

unread,
May 29, 2022, 3:59:59 AM5/29/22
to
That is just solution 1, with a small code convention that
draws attention to parsing ahead.

>> 3) foo( bar bat )
>> Syntax indicates foo( is parsing word and has two immediate parameters
>> bar and bat.
>>
>> Choice (1) should be preferable to the Forth purist (DX, the.Bee).
>> Don't waste time worrying over syntax schemes.
>>
>> Choice (2) proposed by Albert which works for me is a simple syntax for
>> parsing words, sufficient since our use of parsing words will be limited.
>>
>> Choice (3) is totally explicit and should fit well in a more formal Forth which
>> is standard Forth.
>>
>> I think choice (3) is best for the standard. I'll probably be using choice
>> (2) but that doesn't mean I'm changing ' to ': .

none albert

unread,
May 29, 2022, 4:19:02 AM5/29/22
to
No it isn't. If the namespace/vocabulary word is pushing it
is void, else ALSO.

>
>And what do you think in regards to "constant", "variable", ":", and
>other defining words? Would you like to outlaw them?
That is the normal use case for getting a name from the input
stream. That goes without saying.

>
>
>My position is that the parsed part (the immediate arguments), if any,
>should be explicitly marked and easy readable, without reading manuals.

You can study my programs ciasdis and tmanx. Both are user defined
languages and use parsing words freely, to mimic conventions used
in assembler and musical scores.
You can not use these powerful programs without reading manuals,
but most falls in place automatically.

>
>And if a porcelain parsing word is allowed to be used in compilation
>state and in interpretation state, it should accept immediate arguments
>in both cases (or explicitly show that an argument is passed via the
>stack instead).

A user of my reverse engineering assembler would use "porcelain"
words only. These are a layer separated from reality, and can
be used only via meticulous documentation.
You are spreading an illusion of git-people, that porcelain words
are intuitive. That makes git all but unusable for the casual
user.

>--
>Ruvim

Groetjes Albert

P Falth

unread,
May 29, 2022, 4:57:38 AM5/29/22
to
On Saturday, 28 May 2022 at 23:25:06 UTC+2, Ruvim wrote:
> On 2022-05-29 00:34, P Falth wrote:
> > On Saturday, 28 May 2022 at 18:41:48 UTC+2, Ruvim wrote:
> >> On 2022-05-20 09:14, dxforth wrote:
>
> >>> A few may enjoy creating
> >>> new parsing words (like the few that enjoy creating macros) but I wouldn't
> >>> say either was intrinsic to Forth, or even popular. If creating new parsing
> >>> words were popular, wouldn't there already be a convention for it?
> >>>
> >>> https://pastebin.com/qpZLFc6h
> >>
> >> The old convention was: don't use any special naming convention for
> >> porcelain parsing word, they should look just like ordinary words.
> >>
> >> For example: "[COMPILE]" and ASCII in Forth-79, POSTPONE in Forth-94
> >
> > One of the first things that struck me when I learned about the forth language
> > was the complete freedom to name words whatever you wanted.
> > I do not want to change that!
> > What you have presented might be very logical but it looks ugly and does not
> > read well.
> OK. Could you please order by ugliness (from higher to lower) the
> following definitions for the word "ip-in-subnet ( x.ip x.ip-mask
> x.ip-net -- flag )":

This is how I would write it
> : ip-in-subnet >r and r> = ;


This I can understand after looking up what dip does
> : ip-in-subnet ['] and dip = ;

The rest are just ugly and difficult to read. ' ´ ` all look the same for me at a quick view
> : ip-in-subnet 'and dip = ; \ Tick is for quoting a word
> : ip-in-subnet `and dip = ; \ Backtick is for quoting a word
> : ip-in-subnet dip' and = ;
> : ip-in-subnet dip:and = ;
> : ip-in-subnet dip( and ) = ;
> : ip-in-subnet dip{ and } = ;

What I do not understand is this drive to pass an XT as argument. For sure it can be
powerful in certain cases, but you do not need to rewrite everything in that style.

For this example "ip-in-subnet" I would probably just use it without a definition
x.ip x.ip-mask and x.ip-net =

BR
Peter

Stephen Pelc

unread,
May 29, 2022, 5:33:36 AM5/29/22
to
On 29 May 2022 at 00:28:55 CEST, "Hans Bezemer" <the.bee...@gmail.com>
wrote:
Hear, hear. Simplify and add lightness.

--
Stephen Pelc, ste...@vfxforth.com
MicroProcessor Engineering, Ltd. - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com - free VFX Forth downloads

Marcel Hendrix

unread,
May 29, 2022, 6:36:02 AM5/29/22
to
On Sunday, May 29, 2022 at 11:33:36 AM UTC+2, Stephen Pelc wrote:
[..]
> > On Saturday, May 28, 2022 at 11:25:06 PM UTC+2, Ruvim wrote:
> > This is Forth
> >> : ip-in-subnet >r and r> = ;
> >
[..]
> Hear, hear. Simplify and add lightness.

Yeah, like

: dip ( x quot -- x ) swap [ call ] dip ;

-marcel

Ruvim

unread,
May 29, 2022, 10:23:02 AM5/29/22
to
As a user, I would prefer that the tick-recognizer precede the
word-recognizer, since I know that I always use a leading tick either to
quote a word or for a character literal (when the lexeme ends with a
tick and consists of 3 xchars), and don't use a tick in names at all.

But another user, or the system implementer, who wants to use names
starting with tick (and probably doesn't use additional recognizers at
all), would prefer the word-recognizer precedes the tick-recognizer.

And only this latter option (where the word-recognizer has the highest
precedence) is backward compatible. The former option (where the
tick-recognizer precedes the word-recognizer) destandardizes some
standard programs, so it is not acceptable.

But with the backward-compatible option I cannot safely rely on the
tick-recognizer, since I cannot be sure that 'QUIT always returns the xt of
QUIT. (1)
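
The two orderings can be sketched in standard Forth without any recognizer machinery. Everything here is an assumption for illustration: the names tick-lexeme?, find-word, rec-tick, rec-word-first and rec-tick-first are hypothetical, the lookup is restricted to FORTH-WORDLIST for brevity, and no actual recognizer proposal is implemented. The sketch relies on SEARCH-WORDLIST (Search-Order word set) and /STRING (String word set).

```forth
\ Hypothetical helpers, not part of any standard or proposal.

: tick-lexeme? ( c-addr u -- flag )   \ longer than one char, starts with '
  dup 1 > if  drop c@ [char] ' =  else  2drop false  then ;

: find-word ( c-addr u -- xt true | false )   \ looks only in FORTH-WORDLIST
  forth-wordlist search-wordlist  dup if  drop true  then ;

: rec-tick ( c-addr u -- xt true | false )    \ 'NAME gives the xt of NAME
  2dup tick-lexeme? if  1 /string find-word  else  2drop false  then ;

\ Backward-compatible order: a word named 'QUIT hides the quotation 'QUIT
: rec-word-first ( c-addr u -- xt true | false )
  2dup find-word if  nip nip true  else  rec-tick  then ;

\ Order preferred above: the quotation hides a word named 'QUIT
: rec-tick-first ( c-addr u -- xt true | false )
  2dup rec-tick if  nip nip true  else  find-word  then ;
```

If some library has defined a word named 'QUIT in FORTH-WORDLIST, rec-word-first applied to the lexeme 'QUIT would be expected to return that word's xt, while rec-tick-first would return the xt of QUIT. That difference is exactly problem (1).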


>
> We also would have preferred to use 'FOO for the xt of FOO, but to
> avoid these problems, we chose `.
[...]
> In any case, while some problems are less serious than others, all of
> these problems are avoided by using ` in the tick-recognizer, which is
> why we are using that.


Using a backtick, you don't avoid the problem (1), but only reduce the
chance of encountering this problem. Eventually, you either effectively
destandardize names starting with a backtick (except this name itself),
or make the backtick-recognizer unsafe (and so unusable).

I cannot rely on the backtick-recognizer at all, if I have even a little
chance that it returns not what I expect.

At the same time, many people in the Forth community will rightly resist
any new restrictions on names.


Nota bene — this problem is a general problem for recognizers.

For example, if the implementer of a standard system, or some third
party library, provides a word "'M'" in the FORTH-WORDLIST, the system
is still standard (even having the included library).

And a standard program that uses a character literal 'M' is still a
standard program. But this standard program will work incorrectly on
this standard system, since the system will interpret "'M'" as its word
name.




So I see a declaration of a recognizer (that is the perceptor) as the
only solution of this general problem, and the problem (1) particularly.
It can be actually specified as an ordered sequence of recognizers, or a
single name of a well known sequence of recognizers.

Then, if I declare the used recognizers in any case, and it affects only
my code, and doesn't affect third-party code, why give up on the
preferred tick-recognizer?





>> I think we should find some conventions for these forms (and maybe some
>> other), and give them names. After that, to avoid conflicts, a Forth
>> source code file (or code block) that relies on a convention, should
>> start with a declaration that mentions the convention's name.
>>
>> The format of such a declaration probably should be standardized.
>>
>> As an example, JavaScript uses "use strict" string literal as the first
>> item of a code block to declare strict mode.
>>
>> We could use a list of recognizer names in the declaration. The scope of
>> this declaration (where these recognizers are in effect) should be
>> limited in an obvious way.
>
> I think that, on the contrary, we should use one common set of words
> and syntax instead of introducing ways to declare idiosyncracy.

I don't interpret it as a way to declare idiosyncrasy (although it can be
used for that). Otherwise any DSL, or even a set of helper words that a
user includes into the search order, would be a declaration of idiosyncrasy.

I see it as a backward-compatible way to introduce a well-known set of
new features that is connected with additional *restrictions* on
programs (those that rely on these features).

I.e., for backward-compatibility, a handful of new words is better than
new *unconditional* restrictions on all word names.


> E.g., do I think that the alignment handling in struct.fs is superior
> to the Forth-2012 field words? Yes. Still, when writing new code, I
> use the Forth-2012 words. And that's despite the fact that struct.fs
> is a standard program, so you can use the struct.fs words fully
> portably. It's just that the cost of idiosyncrasy outweighs the
> benefits of the superior alignment handling.

I see. But now we are in the position as if the Forth-2012 field words
are not standardized yet. And we can choose a better set of words (and
their names).



> We have also had ways to specify which wordset a program uses: wordset
> queries with ENVIRONMENT?. A number of systems did not support that
> in a useful way, few programs used it, and eventually we decided to
> make them obsolescent (and refer only to Forth-94 while they are still
> there). So I expect that a new way to specify which dialect a program
> uses will also receive little love.

Don't you think the solution is not well designed for the problem?

The "ENVIRONMENT?" word just doesn't fit the problem of including
external libraries (external definitions for a word set).

Using "ENVIRONMENT?", a program must choose what to do if a word set is
missing. Usually a program chooses simply not to work (not load, abort,
throw an error, etc).

But then why use "ENVIRONMENT?" and analyze its results, if the
program will be aborted in any case when a word is not found?
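
For illustration, a Forth-94 style wordset query typically ends in the same abort that a missing word would cause anyway. This is a minimal sketch: FLOATING is the Forth-94 query string, the message text is made up, and the interpretive use of S" assumes the File word set is present.

```forth
\ ENVIRONMENT? returns false for an unknown query, or "flag true".
s" FLOATING" environment? 0= [if] false [then]   ( -- flag )
0= [if]
  .( Floating-point word set is required. ) cr
  abort
[then]
```

Without these lines, the first use of a missing floating-point word would stop the load as well, only with a less specific message.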


So this example doesn't support your position, I think.



> That being said, once we have standardized configurable recognizers,
> nothing prevents you from adding a recognizer that uses ' for xt
> literals and another recognizer that uses ` for string. So your
> convention might be something like
>
> require ruvim.4th ruvim-convention{
> ... \ code that uses ruvim-convention stuff
> }ruvim-convention
>
> I just hope that, like I do for field words, you will use the
> (hopefully standardized) string syntax "string" (which already has a
> lot of mindshare), and that we reach a consensus for a syntax for xt
> literals, and all use that then.



forth-2023 !perception{

... \ code that uses a standard set of recognizers
... \ including 'FOO `name wordlist1::wordlist2::name etc
... \ that precede the word-recognizer (!)
}

or

forth-2023 !perception// \ up to the end of file


(the naming is under construction)


--
Ruvim

Anton Ertl

unread,
May 29, 2022, 10:58:31 AM5/29/22
to
albert@cherry.(none) (albert) writes:
>Do you still use the recognizer/prefix & ?
>We were there first in 1993/2001 with that prefix.

Gforth has supported 'A in 0.2.1 (1996).

But that's water down the river. 'A' has been standardized.

>I would love to see 0X1A for hex

A number of systems support 0x1a, e.g., Gforth:

0x1a . 26 ok

It's just too annoying to have to replace the "0x" from gdb with "$".

>instead of $1A , freeing
>$HOME for environment variables.

In development Gforth:

require rec-env.fs ok
$HOME type /home/anton ok
$1a . 26 ok

No need for freeing. If you have an environment variable that would
be a valid hex number (my normal environment does not), you could
arrange for REC-ENV to be in front of REC-NUM (and write the hex
number with a leading 0), or write "ABC" getenv.

>That would be a beneficial c-compatibility.

C does not support $HOME, so why would that be any kind of
C-compatibility?

And is there any other criterion for "worst" and "beneficial" than
your personal preference?

dxforth

unread,
May 29, 2022, 12:18:12 PM5/29/22
to
On 29/05/2022 17:56, albert wrote:
> In article <t6t9mi$buj$2...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>>On 2022-05-19 18:47, Anton Ertl wrote:
>>>> And after that, what to do with "[if]" and "[undefined]"?
>>>
>>> And \ and (.
>>
>>If they are with us in any case, we should not avoid them, but properly
>>(safely and readable) use them.
>
> I hate [UNDEFINED].
>
> "socket-server" [UNDEFINED] [IF]
> \ You really mean you can insert a portable definition of `socket-server?
> \ Even if you succeed, you uglify your code beyond redemption.
> [THEN]
>
> has to be replaced by
>
> WANT socket-server

I don't care for its length but at least [UNDEFINED] is unambiguous.
Were shortness paramount I'd have chosen LACK. But hey, we need to
keep Forth respectable and starched shirts sell better.

Hans Bezemer

unread,
May 29, 2022, 12:53:49 PM5/29/22
to
On Sunday, May 29, 2022 at 6:18:12 PM UTC+2, dxforth wrote:
> > I hate [UNDEFINED].
> I don't care for its length but at least [UNDEFINED] is unambiguous.
> Were shortness paramount I'd have chosen LACK. But hey, we need to
> keep Forth respectable and starched shirts sell better.

I love [UNDEFINED] for the same reason I like

VARIABLE x
VALUE y
CONSTANT z
: a
CHAR b

You parse the thing, look it up in the symboltable and leave a 1 or a 0 - which [IF]
can pick up easily. Ok, you can discuss the name as far as I'm concerned, but that's
it.

I'd hate to see:

S" x" VARIABLE
20 S" y" VALUE
10 S" z" CONSTANT
S" a" DEF{ }
S" b" CHAR

Hans Bezemer



Ruvim

unread,
May 29, 2022, 2:23:42 PM5/29/22
to
On 2022-05-29 20:18, dxforth wrote:
> On 29/05/2022 17:56, albert wrote:
>> In article <t6t9mi$buj$2...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>>> On 2022-05-19 18:47, Anton Ertl wrote:
>>>>> And after that, what to do with "[if]" and "[undefined]"?
>>>>
>>>> And \ and (.
>>>
>>> If they are with us in any case, we should not avoid them, but properly
>>> (safely and readable) use them.

By "they" I don't mean these particular words, but porcelain words that
do parsing in general.

>>
>> I hate [UNDEFINED].
>>
>> "socket-server" [UNDEFINED] [IF]
>> \ You really mean you can insert a portable definition of `socket-server?
>> \ Even if you succeed, you uglify your code beyond redemption.
>> [THEN]
>>
>> has to be replaced by
>>
>> WANT socket-server
>
> I don't care for its length but at least [UNDEFINED] is unambiguous.
> Were shortness paramount I'd have chosen LACK. But hey, we need to
> keep Forth respectable and starched shirts sell better.


lack( foo ) [if]
...
[then]

Nice!

Concerning compilation state, [lack]( foo ) looks overloaded.
So just use [ lack( foo ) ]

: bar
...
[ lack( foo ) ] [if]
...
[then]

[ glut( baz ) ] [if] \ "baz" is available
... baz
[then]
...
;

And to check availability of "foo" at run-time:

: foobar lack( foo ) if ... else ... then ;



--
Ruvim

Ruvim

unread,
May 29, 2022, 2:30:33 PM5/29/22
to
But having shorter string literals it looks better:
"x" var
or
`x var
10 `x const


Anyway, what can you suggest if I need to create a definition (e.g. a
constant) programmatically, and its name is passed as an argument?

Having "const" from the above it could look as:

: foo ( sd.name -- ) ... 10 -rot const ... ;
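
One portable way to get such a "const" today is to build the text "constant <name>" and pass it to EVALUATE. This is only a sketch under stated assumptions: const-buf is a hypothetical scratch buffer, the name must contain no spaces and fit the buffer, and CONSTANT is assumed to have its standard meaning in the search order when const runs.

```forth
\ Hypothetical plumbing counterpart of CONSTANT (the name is taken from above).
create const-buf 64 chars allot        \ scratch buffer: "constant " + name

: const ( x c-addr u -- )
  dup 55 u> abort" name too long"      \ 9 + u must fit into 64 chars
  s" constant " const-buf swap move    \ copy the 9-char prefix
  >r  const-buf 9 chars +  r@ move     \ append the name
  const-buf  r> 9 +  evaluate ;        \ run "constant <name>" with x on the stack
```

With this definition, 10 s" ten" const defines a constant named ten. Some systems offer more direct routes (Gforth has EXECUTE-PARSING, for example), but those are system-specific.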



--
Ruvim

Ruvim

unread,
May 29, 2022, 3:23:44 PM5/29/22
to
On 2022-05-29 12:57, P Falth wrote:
> On Saturday, 28 May 2022 at 23:25:06 UTC+2, Ruvim wrote:
>> On 2022-05-29 00:34, P Falth wrote:
>>> On Saturday, 28 May 2022 at 18:41:48 UTC+2, Ruvim wrote:
>>>> On 2022-05-20 09:14, dxforth wrote:
>>
>>>>> A few may enjoy creating
>>>>> new parsing words (like the few that enjoy creating macros) but I wouldn't
>>>>> say either was intrinsic to Forth, or even popular. If creating new parsing
>>>>> words were popular, wouldn't there already be a convention for it?
>>>>>
>>>>> https://pastebin.com/qpZLFc6h
>>>>
>>>> The old convention was: don't use any special naming convention for
>>>> porcelain parsing word, they should look just like ordinary words.
>>>>
>>>> For example: "[COMPILE]" and ASCII in Forth-79, POSTPONE in Forth-94
>>>
>>> One of the first things that struck me when I learned about the forth language
>>> was the complete freedom to name words whatever you wanted.
>>> I do not want to change that!
>>> What you have presented might be very logical but it looks ugly and does not
>>> read well.
>> OK. Could you please order by ugliness (from higher to lower) the
>> following definitions for the word "ip-in-subnet ( x.ip x.ip-mask
>> x.ip-net -- flag )":
>
> This is how I would write it
>> : ip-in-subnet >r and r> = ;


This example is taken for simplicity. Instead, just take your real life
case when you pass an xt as an argument, or pass a word (or several) as
an immediate argument.


> This I can understand after looking up what dip does
>> : ip-in-subnet ['] and dip = ;
>
> The rest are just ugly and difficult to read. ' ´ ` all look the same for me at a quick view
>> : ip-in-subnet 'and dip = ; \ Tick is for quoting a word
>> : ip-in-subnet `and dip = ; \ Backtick is for quoting a word
>> : ip-in-subnet dip' and = ;
>> : ip-in-subnet dip:and = ;
>> : ip-in-subnet dip( and ) = ;
>> : ip-in-subnet dip{ and } = ;
>
> What I do not understand is this drive to pass an XT as argument.

No xt is passed in the last four code examples.


> For sure it can be powerful in certain cases,
> but you do not need to rewrite everything in that style.

Actually, it's my position too.

Have a look at [1]: I started from a combinator (a word that accepts an xt
on the stack), and then introduced an equivalent parsing word:

| Now let's consider a variant of the "dip" combinator
| that accepts a word to be executed as an immediate argument
| and generates a more efficient code. Also, this variant can
| be better readable when the word is known in advance.

I provided two reasons for that:

1. Potentially, more efficient code.
2. Potentially, better readability.
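
A minimal sketch of both forms in standard Forth follows. It is not the code from [1]: the parsing variant is a naive version that assumes the parsed word is an ordinary, non-immediate word, and the names ip-in-subnet-1 and ip-in-subnet-2 exist only for this comparison.

```forth
\ Plumbing: the word to run is passed as an xt on the data stack.
: dip ( i*x x xt -- j*x x )
  swap >r execute r> ;

\ Porcelain: the word to run is parsed from the input stream.
\ For use inside definitions only.
: dip' ( "name" -- )
  postpone >r  ' compile,  postpone r> ; immediate

: ip-in-subnet-1 ( x.ip x.ip-mask x.ip-net -- flag )  ['] and dip = ;
: ip-in-subnet-2 ( x.ip x.ip-mask x.ip-net -- flag )  dip' and = ;
```

The second definition compiles the same sequence as the hand-written ">r and r> =", which is where the potential efficiency gain comes from: no xt literal and no EXECUTE at run time.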


In general, I talk not only about passing xt, but passing any argument
(for example, a name string).

Accepting immediate arguments can be a good choice in many cases. But a
common problem is that authors don't mark these arguments in the source
code explicitly, and that worsens readability.


So for words having immediate arguments, I want the boundaries of
these arguments to be easily visible. And at the same time I want to have the
corresponding ("plumbing") words that accept arguments from the data
stack (or, reluctantly, a technique like "execute-parsing" at least).




[1] https://github.com/ForthHub/discussion/discussions/112


--
Ruvim

P Falth

unread,
May 29, 2022, 4:29:12 PM5/29/22
to
To make them visible I have a completely different solution.
I have written a context sensitive Forth colorizer for the text editor I am using (SciTe)

After I add parsing words to the right section they will color their arguments differently
than ordinary text. It is a great help for readability.

Peter

Ruvim

unread,
May 29, 2022, 4:48:31 PM5/29/22
to
I don't imply the idea of intuitiveness at all.

In Git, plumbing commands are meant "to be used as building blocks for new tools
and custom scripts" [1]; "these commands are primarily for scripted use"
[2].

Porcelain commands are high-level commands [3]. These commands are
intended for direct use, and their output format may change over time (to make
it more readable by humans).


Using this terminology, "parse-name" is a plumbing word in Forth. The
word "constant" is a porcelain word (which is intended for direct use).
But we don't have a corresponding plumbing word to create a constant.



[1] Git Internals - Plumbing and Porcelain
https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain

[2] Low-level commands (plumbing)
https://git.github.io/htmldocs/git.html#_low_level_commands_plumbing

[3] High-level commands (porcelain)
https://git.github.io/htmldocs/git.html#_high_level_commands_porcelain


--
Ruvim

Ruvim

unread,
May 29, 2022, 5:11:02 PM5/29/22
to
On 2022-05-30, P Falth wrote:
> On Sunday, 29 May 2022 at 21:23:44 UTC+2, Ruvim wrote:
[...]
>> So for words having immediate arguments, I want the boundaries of
>> these arguments to be easily visible. And at the same time I want to have the
>> corresponding ("plumbing") words that accept arguments from the data
>> stack (or, reluctantly, a technique like "execute-parsing" at least).
>
> To make them visible I have a completely different solution.
> I have written a context sensitive Forth colorizer for the text editor I am using (SciTe)
>
> After I add parsing words to the right section they will color their arguments differently
> than ordinary text. It is a great help for readability.


I see. Probably, this solution is acceptable for individuals or very
small collectives.

But this solution is not scalable on the whole Forth ecosystem. It is
not suitable for shared code and distributed development.


--
Ruvim

none albert

unread,
May 30, 2022, 2:28:46 AM5/30/22
to
In ciforth the question is moot. Recognizers (prefixes) are present
in the dictionary in wordlists. Normal precedence rules apply.
If a recognizer is present in a VOCABULARY, you can switch it off
by not adding that vocabulary to the search order.
The exception position for numbers (and other denotations) was a
major design flaw in the original Forth.
Note that Moore himself moved away from that in colorforth.

S Jack

unread,
May 30, 2022, 3:41:32 AM5/30/22
to
On Sunday, May 29, 2022 at 3:29:12 PM UTC-5, P Falth wrote:
> I have written a context sensitive Forth colorizer for the text editor I am using (SciTe)
>

VIM has a color setup for each language including Forth and user can
modify, add to or change color, as needed. My Forth is ever evolving and I
haven't changed the color setting in last 10 years so in effect I have little
reliance on color. But that's just me :)
--
me

dxforth

unread,
May 30, 2022, 3:56:50 AM5/30/22
to
There's not much Moore hasn't moved away from. Having left Forth Inc
there was nothing to hold him back. Whenever he announced a new forth
there'd be a flurry of interest only for it to dissipate. For all the
rhetoric, it seems few forthers are willing to venture 'beyond the fields
we know' to borrow Dunsany's phrase. We're ready to accept - provided
Standard Forth approves.

Anton Ertl

unread,
May 30, 2022, 7:02:59 AM5/30/22
to
Why not?

Sure, you may need to add the parsing words to multiple editors, but
that's doable. And you don't add that many parsing words.

I guess that with the language server protocol you can add the parsing
words to several editors at once.

Of course, with a naming convention you would just add the convention
to the editor, no need to add every word separately, so a large
project might want to introduce such a convention.

One very general way would be to load the project into a Forth system,
and use the system's parsing and recognizing for coloring: If a piece
of source code is parsed with something other than the text
interpreter's PARSE-NAME or one of a list of pre-defined parsing words
(e.g., S", (, \), you give the parsing-word-parsed colour to the piece
of source code. You colour stuff parsed by the text interpreter based
on the recognizer that recognizes it, and stuff parsed by S", (, \,
and maybe a few others separately. You have to find some way to
continue parsing after an error to make this approach usable.

minf...@arcor.de

unread,
May 30, 2022, 8:26:21 AM5/30/22
to
Aargh... non-immediate state-dependent parsing words,
camouflaged as recognizers, back-ticked for short-hand notation,
colorized with respect to daltonians, are certainly Forth's future.

Ruvim

unread,
May 30, 2022, 9:43:42 AM5/30/22
to
On 2022-05-30 14:24, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2022-05-30, P Falth wrote:
>>> On Sunday, 29 May 2022 at 21:23:44 UTC+2, Ruvim wrote:
>> [...]
>>>> So for words having immediate arguments, I want the boundaries of
>>>> these arguments to be easily visible. And at the same time I want to have the
>>>> corresponding ("plumbing") words that accept arguments from the data
>>>> stack (or, reluctantly, a technique like "execute-parsing" at least).
>>>
>>> To make them visible I have a completely different solution.
>>> I have written a context sensitive Forth colorizer for the text editor I am using (SciTe)
>>>
>>> After I add parsing words to the right section they will color their arguments differently
>>> than ordinary text. It is a great help for readability.
>>
>>
>> I see. Probably, this solution is acceptable for individuals or very
>> small collectives.
>>
>> But this solution is not scalable on the whole Forth ecosystem. It is
>> not suitable for shared code and distributed development.
>
> Why not?
>
> Sure, you may need to add the parsing words to multiple editors, but
> that's doable. And you don't add that many parsing words.


1. Parsing words of one project can conflict with parsing words of another
project.

Do you really think that every "dip_ x" can be globally reserved as a
parsing word? (taking into account that no naming convention is used,
and parsing words are named like other ordinary words).

2. This solution forces one to use only editors that can highlight text via
a known protocol. But some people don't use highlighting at all.



> I guess that with the language server protocol you can add the parsing
> words to several editors at once.
>
> Of course, with a naming convention you would just add the convention
> to the editor, no need to add every word separately, so a large
> project might want to introduce such a convention.

Yes. And a system of reusable Forth packages (that I keep in mind) is a
large project in this sense.


> One very general way would be to load the project into a Forth system,
> and use the system's parsing and recognizing for coloring: If a piece
> of source code is parsed with something other than the text
> interpreter's PARSE-NAME or one of a list of pre-defined parsing words
> (e.g., S", (, \), you give the parsing-word-parsed colour to the piece
> of source code. You colour stuff parsed by the text interpreter based
> on the recognizer that recognizes it, and stuff parsed by S", (, \,
> and maybe a few others separately. You have to find some way to
> continue parsing after an error to make this approach usable.


I used this approach for correct programs that are loaded into the Forth
system. I changed the Forth text interpreter in such a way that it
generates XML during translation of source code.

After that I generated XHTML (from XML) with code highlighting
(including parsed parts or immediate arguments, immediate words,
numbers, etc), links to other files (that were subjects of INCLUDED),
cross-references (i.e., places where a word is used), and links to the
definition for each word.

I didn't need to specially handle standard words like S", (, \, etc.

All text that was read from the input stream during translation of a
word was considered as an immediate argument of this word.
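
The core of that rule can be sketched by comparing >IN before and after a word runs. This is a simplified illustration, not the XML-generating interpreter itself: parsed-by is a hypothetical helper, and it assumes the executed word does not REFILL or otherwise switch the input source.

```forth
\ Hypothetical helper: run xt and return the text it consumed from the
\ current input buffer.
: parsed-by ( i*x xt -- j*x c-addr u )
  >in @ >r                      \ remember where parsing starts
  execute
  source drop r@ chars +        \ start of the consumed text
  >in @ r> - ;                  \ its length in chars
```

For example, ' char parsed-by A leaves the character code produced by CHAR plus a string covering the part of the line that CHAR consumed. A tool could emit that string as the immediate argument of CHAR.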

Concerning errors: I think continuing to load after an error could be
dangerous, and it makes little sense because the highlighting becomes more
and more incorrect: an error in the compilation of one word (especially a
defining word) produces errors in the compilation of dependent words like an
avalanche, recursively.

I can also imagine this approach in a Forth live coding system,
conceptually something like Holon, but with less restriction on the
source code structure, and having robust mapping to the text files of
source code.


--
Ruvim

Ruvim

unread,
May 30, 2022, 10:24:05 AM5/30/22
to
On 2022-05-29 18:22, Ruvim wrote:
> On 2022-05-28 17:22, Anton Ertl wrote:
[...]

>> I just hope that, like I do for field words, you will use the
>> (hopefully standardized) string syntax "string" (which already has a
>> lot of mindshare), and that we reach a consensus for a syntax for xt
>> literals, and all use that then.


For all: if you use additional recognizers (beyond words and numbers),
then which recognizers would you prefer to be in the sequence that a
standard system should provide on request? (I mean not initially, but
only after being asked.)

Ruvim

unread,
May 30, 2022, 10:44:00 AM5/30/22
to
On 2022-05-29 22:23, Ruvim wrote:
> On 2022-05-29 20:18, dxforth wrote:

>> I don't care for its length but at least [UNDEFINED] is unambiguous.
>> Were shortness paramount I'd have chosen LACK.  But hey, we need to
>> keep Forth respectable and starched shirts sell better.
>
>
>   lack( foo ) [if]
>      ...
>   [then]
>
> Nice!


Probably, the readability of the following

"foo" lack [if] ... [then]

is no worse.
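
Both forms can be sketched on top of the standard FIND. This is an assumption-laden illustration: lack-buf is a hypothetical scratch buffer, the "foo" string syntax is replaced by the standard s" foo", and the lack( variant works in interpretation state only and does not check that the closing lexeme really is ")".

```forth
\ Hypothetical plumbing word: true if the name is not found in the search order.
create lack-buf 256 chars allot         \ room for a counted string

: lack ( c-addr u -- flag )
  dup 255 u> abort" name too long"
  dup lack-buf c!                       \ store the count
  lack-buf char+ swap move              \ copy the name after the count
  lack-buf find nip 0= ;                \ FIND leaves "c-addr 0" when not found

\ Porcelain variant with an explicitly delimited immediate argument.
: lack( ( "name" ")" -- flag )
  parse-name lack  parse-name 2drop ;   \ second PARSE-NAME discards ")"
```

With these definitions, s" foo" lack [if] ... [then] and lack( foo ) [if] ... [then] should behave the same way when interpreting.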




--
Ruvim

Stephen Pelc

unread,
May 30, 2022, 12:14:16 PM5/30/22
to
On 29 May 2022 at 23:10:58 CEST, "Ruvim" <ruvim...@gmail.com> wrote:

> On 2022-05-30, P Falth wrote:
>> On Sunday, 29 May 2022 at 21:23:44 UTC+2, Ruvim wrote:
> [...]
>>> So for the words having immediate arguments, I want the boundaries of
>>> these arguments to be easily visible. And at the same time I want to
>>> have the corresponding ("plumbing") words that accept arguments from
>>> the data stack (or, reluctantly, a technique like "execute-parsing"
>>> at least).
>>
>> To make them visible I have a completely different solution.
>> I have written a context sensitive Forth colorizer for the text editor I am
>> using (SciTe)
>>
>> After I add parsing words to the right section, they will color their
>> arguments differently than ordinary text. It is a great help for
>> readability.
>
> I see. Probably, this solution is acceptable for individuals or very
> small collectives.

Why not large collectives?

> But this solution is not scalable on the whole Forth ecosystem. It is
> not suitable for shared code and distributed development.

Why not? Syntax colouring editors have been around for a long time.

Stephen

S Jack

unread,
May 30, 2022, 12:58:27 PM5/30/22
to
On Monday, May 30, 2022 at 11:14:16 AM UTC-5, Stephen Pelc wrote:

> Why not? Syntax colouring editors have been around for a long time.
>

Unsure if editor syntax coloring is robust enough. Forth has little syntax so
each word will need to be listed. May be sufficient for the standard words but
when additional word-sets are loaded, the editor's syntax coloring will need
to be updated. Is this a problem? Color Forth wouldn't have this problem as
each word has embedded color information (I assume). But then will need some
means of having the Forth interact with the editor display. If Forth words
have a consistent syntax, then editor syntax coloring would have little
problem.
One solution is to have syntax, even if no one likes it, and the editor
displays the word without the syntax but in color based on the syntax
the editor sees. I do that now with comment tags where the pager (less)
converts the tags to color escape codes. The comment text is shown in
color without the tag.
--
me

S Jack

unread,
May 30, 2022, 3:01:12 PM5/30/22
to
I have a word LSWT to list Forth words by type, for example the following
lists constants:

consts lswt
CURRENT and CONTEXT are FORTH
SEETABLE SEETABLECount BIT6 MASK5BITS DEADBEEF SEEK_END SEEK_CUR
SEEK_SET RWRR RW WO RO zlitbl zhitbl \T '/' ',' '_' ')' '(' EOT ETX STX
SOH FAM:RW ascR ERRIOR ERRFIG MPAD:END MPAD:MASK MPAD:LIMIT CST.BTREK
CST.START CS0 CSI FIN \N RAWIO SYSARGS SYSCMDL SYSOPTION SHELL PARMS4
DO2AVAL DOAVAL DODOE DOUSR DO2VAR DO2VAL DO2CON DOVAR DOVAL DOCON DOCOL
CST_IF SMAX CHAR STDERR STDOUT STDIN EPIPE EINTR O_NONBLOCK O_APPEND
O_TRUNC O_EXCL O_CREAT O_RDWR O_WRONLY O_RDONLY SYS_FORK SYS_EXECVE
SYS_WAITPID SYS_BRK SYS_CREAT SYS_WRITE SYS_READ SYS_CLOSE SYS_OPEN
SYS_EXIT REC/BLK BFMAGIC IORMAP #BUFF DOVDO DOVEC 0x80 B/SCR B/BUF
TAREAZ TAREA MPADZ MPADB SIBZ SIBB IPCOBZ IPCOB IPCIBZ IPCIB FBUFZ FBUF
MACBZ MACB EM LIMIT FIRST WNMAX C/L BL FALSE TRUE CELL-1 -CELL CELL
FOUR THREE TWO MONE ONE ZERO ok

The word type is determined by its codeword and for defining word
offspring an additional address. As a Forth word's type can be
determined by codeword and an address, any Forth has information
that it can use for coloring.
--
me

Ruvim

unread,
May 30, 2022, 3:08:27 PM5/30/22
to
On 2022-05-30 20:58, S Jack wrote:
> On Monday, May 30, 2022 at 11:14:16 AM UTC-5, Stephen Pelc wrote:
>
>> Why not? Syntax colouring editors have been around for a long time.
>>
>
> Unsure if editor syntax coloring is robust enough. Forth has little syntax so
> each word will need to be listed. May be sufficient for the standard words but
> when additional word-sets are loaded, the editor's syntax coloring will need
> to be updated. Is this a problem?

> Color Forth wouldn't have this problem as
> each word has embedded color information (I assume). But then will need some
> means of having the Forth interact with the editor display.


In colorForth, color *is* the syntax. It is not syntax highlighting
applied by an editor; it is the syntax itself.

A definition

: foo ( flag -- ) if 123 . then ;

is written as

red:foo

white:(
white:flag
white:--
white:)

green:if
green:123
green:.
green:then

green:;



NB: "Green words like if are compiled, but because they are in the macro
wordlist and act like immediate words, they are executed at compile time
like words written explicitly in yellow." [1]




> If Forth words
> have a consistent syntax, then editor syntax coloring would have little
> problem.
> One solution is to have syntax, even if no one likes it, and the editor
> displays the word without the syntax but in color based on the syntax
> the editor sees. I do that now with comment tags where the pager (less)
> converts the tags to color escape codes. The comment text is shown in
> color without the tag.
> --
> me



[1] The colorForth Editor
http://www.greenarraychips.com/home/documents/greg/cf-editor.htm

[2] colorForth
https://colorforth.github.io/cf.htm

[3] Pre-parsed words
https://colorforth.github.io/parsed.html


--
Ruvim

none albert

unread,
May 30, 2022, 3:10:36 PM5/30/22
to
In ciforth the basic dictionary search is FOUND:
"foo" FOUND
leaves an address, or NULL if not found.
So [DEFINED] is not needed, and neither are [IF] and [THEN]:

"2DROP" FOUND 0=
" : 2DROP DROP DROP ; " ROT AND EVALUATE

Hallmark of a sound design?
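
The ROT AND step works because a true flag has all bits set, so ANDing
it with the string length keeps the length, while a false flag turns the
length into 0 and EVALUATE then has nothing to do. The same pattern can
be sketched for systems without FOUND. This assumes Gforth's FIND-NAME;
DEFINE-IF-MISSING is a hypothetical name.

```forth
\ DEFINE-IF-MISSING is a hypothetical name; FIND-NAME is assumed (Gforth).
\ Evaluates the source string only when the name is not found.
: define-if-missing ( c-addr1 u1 c-addr2 u2 -- )  \ name, then source text
  2swap find-name 0=   \ true (all bits set) if the name is missing
  and                  \ keep the length if missing, else make it 0
  evaluate ;           \ evaluating an empty string does nothing

s" 2DROP" s" : 2DROP DROP DROP ;" define-if-missing
```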

Hans Bezemer

unread,
May 30, 2022, 3:14:31 PM5/30/22
to
On Sunday, May 29, 2022 at 8:30:33 PM UTC+2, Ruvim wrote:
> But having shorter string literals it looks better:
> "x" var
> or
> `x var
> 10 `x const

That is a very personal metric. I see strings - not declarations. And I think prefixed
words are b--t ugly.

> Anyway, what can you suggest if I need to create a definition (e.g. a
> constant) programmatically, and its name is passed as an argument?

> Having "const" from the above it could look as:
>
> : foo ( sd.name -- ) ... 10 -rot const ... ;

Try "CREATE". It has been there for a long time.

Hans Bezemer

Stephen Pelc

unread,
May 30, 2022, 4:55:17 PM5/30/22
to
On 30 May 2022 at 18:58:26 CEST, "S Jack" <sdwj...@gmail.com> wrote:

> On Monday, May 30, 2022 at 11:14:16 AM UTC-5, Stephen Pelc wrote:
>
>> Why not? Syntax colouring editors have been around for a long time.
>>
>
> Unsure if editor syntax coloring is robust enough.

Well, find out, otherwise this proposal is in the class "I want someone else
to do all the work".

Stephen Pelc

unread,
May 30, 2022, 4:57:32 PM5/30/22
to
On 30 May 2022 at 21:01:10 CEST, "S Jack" <sdwj...@gmail.com> wrote:
>
> The word type is determined by its codeword and for defining word
> offspring an additional address. As a Forth word's type can be
> determined by codeword and an address, any Forth has information
> that it can use for coloring.

Does that apply to Native Code Compilers?