Copy-pasteable Forth

Anton Ertl

May 18, 2018, 11:06:36 AM

Recent discussions have inspired me to think about how a
(non-standard) Forth system would look that is designed to be as
copy-pasteable as possible, i.e., where you could copy and paste code
from interpreted to compiled code and vice versa; and what would not
be possible. And ideally we would manage to do that without
multiple-behaviour words.

I look at the Forth-2012 core words as a departure point. I expect
that all problems are already present in the core words.

Non-parsing words with default interpretation and compilation
semantics are already copy-pasteable, with the exception of ]:

! #
#> #S * */ */MOD + +! , - . / /MOD 0< 0= 1+ 1- 2! 2* 2/ 2@ 2DROP 2DUP
2OVER 2SWAP < <# = > >BODY >NUMBER ?DUP @ ABORT ABS ACCEPT ALIGN
ALIGNED ALLOT AND BASE BL C! C, C@ CELL+ CELLS CHAR+ CHARS COUNT CR
DECIMAL DEPTH DROP DUP EMIT ENVIRONMENT? EVALUATE EXECUTE FILL FIND
FM/MOD HERE HOLD I IMMEDIATE INVERT J KEY LSHIFT M* MAX MIN MOD MOVE
NEGATE OR OVER QUIT ROT RSHIFT S>D SIGN SM/REM SOURCE SPACE SPACES
STATE SWAP TYPE U. U< UM* UM/MOD XOR

Parsing words with default compilation semantics: ' : CHAR CONSTANT
CREATE VARIABLE WORD. These are not copy-pasteable. By converting them
from parsing words to non-parsing words that take a string, most of
them can be made copy-pasteable; e.g.:

5 "foo" constant

To make this practical, we need literal strings, which explains why
Forth has traditionally used parsing words instead. For the literal
strings, since we don't want the multiple-behaviour S", and want
copy-pasteability, literal strings have to be recognized by the text
interpreter.
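
As a rough illustration, a string-taking CONSTANT can be approximated
in standard Forth by building the text "constant <name>" and passing it
to EVALUATE. This is only a sketch: $CONSTANT, CMD-BUF, CMD-LEN,
CMD-RESET and CMD+ are made-up names, and S" stands in for the literal
strings discussed above (its interpretive use assumes the File wordset
is present).

```forth
\ Sketch only: all names defined here are hypothetical.
128 constant /cmd
create cmd-buf /cmd chars allot
variable cmd-len

: cmd-reset ( -- )  0 cmd-len ! ;

: cmd+ ( c-addr u -- )  \ append a string to CMD-BUF
  dup cmd-len @ + /cmd > abort" command too long"
  tuck  cmd-buf cmd-len @ chars +  swap move
  cmd-len +! ;

: $constant ( x c-addr u -- )  \ define a constant named by the string
  cmd-reset  s" constant " cmd+  cmd+
  cmd-buf cmd-len @ evaluate ;

5 s" foo" $constant                       \ interpreted
: def-bar ( -- )  7 s" bar" $constant ;   \ the same phrase, compiled
def-bar
foo . bar .
```

The phrase `x s" name" $constant` reads the same in interpreted and
compiled code, which is the point of the conversion; only the string
literal itself still relies on a dual-behaviour word here.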

The conversion above makes no sense for WORD. Many uses of WORD are
for defining parsing words and these words should be changed to
take the string(s) from the stack. For the uses in user-defined
text-interpreters, a way to make it copy-pasteable to the regular text
interpreter would be to have another input stream from which WORD
takes its input (no, I have not worked this idea out).

Code using >IN also may behave differently when copy-and-pasted, and
the additional input stream may solve that, too.

For ":" the conversion is not sufficient to make it copy-pasteable,
but I'll discuss that later.

( parses, but is copy-pasteable.

Words where the compilation semantics parse: ." ABORT" S" ['] [CHAR].
Most of these can be converted into normal words that take strings on
the stack. E.g., ." foo" becomes "foo" TYPE. S" becomes
unnecessary, because the text interpreter recognizes literal strings.
And we can eliminate some words in the process; not only do we not
need .", but "x" CHAR and "x" [CHAR] do the same, so [CHAR] is
unnecessary. Of course, given that Forth-2012 has 'x', CHAR is
unnecessary, too.

There is, however, a difference between what standard ['] FOO does and
what "FOO" ' would do: standard ['] looks the string up during
compilation, ' during run-time, when the search order and the contents
of the wordlists may be different. The simplest solution is to let
the text interpreter recognize literal xts, e.g., `FOO.
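
The difference can be seen in standard Forth with a run-time lookup
built on FIND. This is a sketch under assumptions: $' and NAME-BUF are
made-up names, and S" again stands in for a recognized string literal.

```forth
\ Sketch only: $' and NAME-BUF are hypothetical names.
create name-buf 32 chars allot   \ scratch space for a counted string

: $' ( c-addr u -- xt )  \ run-time lookup of a name given as a string
  dup 31 > abort" name too long"
  dup name-buf c!  name-buf char+ swap move
  name-buf find 0= abort" undefined word" ;

: foo ( -- n ) 1 ;
: early ( -- n )  ['] foo execute ;     \ FOO looked up while compiling EARLY
: late  ( -- n )  s" foo" $' execute ;  \ FOO looked up each time LATE runs
: foo ( -- n ) 2 ;                      \ redefinition (may produce a warning)

early .   \ prints 1
late .    \ prints 2
```

EARLY keeps the xt of the first FOO, while LATE finds whatever FOO is
visible when it runs; a literal xt recognized by the text interpreter
would behave like EARLY in both interpreted and compiled code.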

Return-address words: >R R> R@. They can be given default
interpretation semantics (in addition to the default compilation
semantics); in order to work, the return stack must not be used by the
text interpreter across the interpretation of words.

Control structure words: +LOOP BEGIN DO ELSE IF LEAVE LOOP REPEAT THEN
UNTIL WHILE.

One existing solution is to turn on compilation when encountering a
control structure start, and revert back to interpretation upon
control structure end. This works much better in the present context,
because the copy-pasteability reduces the differences that you would
normally encounter with this approach. The compilation needs to be to
a separate dictionary section, so that we can perform dictionary
allocation inside the control structure.

Another solution is to behave somewhat like the control structures do
at run-time, during interpretation time.

Neither of these solutions is satisfactory with respect to the
requirement to avoid dual-behaviour words.

Another solution is the Postscript way: replace the control structure words
with words that take xts, e.g.:

5 [: dup 0> ;] [: dup . 1- ;] while

where WHILE performs the first xt, and if it returns true, performs
the second xt, and then repeats the process. LEAVE needs to be
adapted to deal with the next enclosing LOOP (somewhat like THROW for
CATCH). Locals are interesting in connection with this solution.
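
A minimal sketch of such an xt-taking loop in standard Forth follows.
It is called XT-WHILE here (a made-up name) because WHILE is already
taken, and :NONAME is used in place of [: ... ;] so that it does not
depend on quotations; POSITIVE? and PRINT-DEC are also just
illustrative names.

```forth
\ Sketch only: XT-WHILE, POSITIVE? and PRINT-DEC are hypothetical names.
: xt-while ( i*x xt-cond xt-body -- j*x )
  >r >r                       \ R: xt-body xt-cond
  begin  r@ execute  while    \ run the condition, which leaves a flag
    2r@ drop execute          \ run the body
  repeat
  2r> 2drop ;

:noname ( n -- n flag )  dup 0> ;   constant positive?
:noname ( n -- n' )      dup . 1- ; constant print-dec

5 positive? print-dec xt-while drop      \ interpreted: prints 5 4 3 2 1
: countdown ( n -- )  positive? print-dec xt-while drop ;
3 countdown                              \ compiled: prints 3 2 1
```

Both xts are kept on the return stack while they run, so the condition
and the body see only the caller's data stack items.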

None of these solutions allows copying and pasting incomplete control
structures, but I don't think that that's a solvable problem.

DOES> ... ; can be replaced by [: ... ;] SET-DOES>, which is already
copy-pasteable in Gforth.

For words that assume a surrounding colon definition, such as EXIT and
RECURSE (and UNLOOP, which only makes sense if followed by an EXIT), I
don't see a sensible way to make them copy-pasteable.

[ and ] could be changed to reduce/increase the compilation level. In
particular, when you have ] FOO [ inside a colon definition, FOO is
POSTPONEd; when you copy it into interpreted code, FOO is compiled.
Sub-0 compilation levels are just interpreter instances with a
separate context (I have not worked this out in detail). This
behaviour of [ and ] leaves the framework of compilation and
interpretation semantics introduced by Forth-94. They always perform
this behaviour, no matter what the state.

LITERAL moves a number from one compilation level to the next higher
one. In particular, for sub-0 levels, it copies the number from the
data stack at that level to the data stack at the next level (not
worked out, either).

POSTPONE is unnecessary, thanks to ] ... [.

[: and ;] (not core, but needed above) nest an anonymous colon
definition inside another, or produce an anonymous colon definition
when used interpretively. This does not satisfy the requirement of no
dual-behaviour words.

To make : and ; work copy-pasteably, they would have to behave like [:
and ;]. Or we just replace

: foo ... ;

with

"foo" [: ... ;] alias

and forget about : and ;. However, then EXIT and RECURSE are not very
useful in connection with xt-based control structures.

Overall, there are a number of problems to be solved, but a wholly
copy-pasteable Forth would also simplify some issues that we have in
standard Forth.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2018: September 12/14-17 near Edinburgh, Scotland

JennyB

May 18, 2018, 11:36:36 AM

Interesting. It looks quite like Factor, or STOIC of many years ago, which did something with nested compilation instead of DOES> that I never understood.

Alex McDonald

May 18, 2018, 1:52:16 PM

On 18-May-18 09:00, Anton Ertl wrote:
> Recent discussions have inspired me to think about how a
> (non-standard) Forth system would look that is designed to be as
> copy-pasteable as possible, i.e., where you could copy and paste code
> from interpreted to compiled code and vice versa; and what would not
> be possible. And ideally we would manage to do that without
> multiple-behaviour words.

There are lots of interesting ideas in this.

> Control structure words: +LOOP BEGIN DO ELSE IF LEAVE LOOP REPEAT THEN
> UNTIL WHILE.
>
>

> Another solution is the Postscript way: replace the control structure words
> with words that take xts, e.g.:
>
> 5 [: dup 0> ;] [: dup . 1- ;] while
>
> where WHILE performs the first xt, and if it returns true, performs
> the second xt, and then repeats the process. LEAVE needs to be
> adapted to deal with the next enclosing LOOP (somewhat like THROW for
> CATCH). Locals are interesting in connection with this solution.
>

(I'm sure you're aware of this but...)

Hello block structured Forth. It's possible to do it right now in
standard Forth.

: ifTrue ( i*x cond xtT -- i*y )
  swap if execute else drop then ;

: ifElse ( i*x cond xtT xtF -- i*y )
  rot if drop else nip then
  execute ;

10 value n

n 20 > [: "too big" ;]
[: n {: test :}
"just right"
test 10 < [: 2drop "too small" ;] ifTrue
;] ifElse

This has a very Smalltalk feel to it.

> All these solutions don't allow to copy-and-paste incomplete control
> structures, but I don't think that that's a solvable problem.

Cut&paste works fine on my system (although without the locals in the
example shown, since I have a borked locals implementation). There's
nothing on the control (which is the data) stack.

--
Alex

JennyB

May 30, 2018, 4:34:28 AM

Another useful literal prefix might return the data
address for words that have mutable internal data.

Thus:

"foo" value "bar" 2value "baz" defer
23 @_foo ! foo foo @_bar 2! 'foo @_baz !

a...@littlepinkcloud.invalid

May 30, 2018, 4:42:29 AM

I think there should be strong pushback against any language changes
which are no more than a "new" way of writing something that is
already easily expressed in the standard language.

Andrew.

JennyB

Jun 1, 2018, 5:52:07 AM

Of course. I'm merely contemplating how different Forth might have looked if it had had literal recognizers from the start.

minf...@arcor.de

Jun 1, 2018, 9:38:38 AM

The obvious solution would be intermediate compilation and execution of
interpreted commands. But one would then need some sort of dataspace or macro
manager to clean intermediate stuff that is no longer needed from memory.

Unsurprisingly standard Forth already needs such beasts for compilation of
locals or intermediate string data storage for SUBSTITUTE et al.

A more comfortable FILE wordset implementation could also unroll all definitions
in case of exceptions during INCLUDED by intermediate storage of system snapshots.

a...@littlepinkcloud.invalid

Jun 1, 2018, 9:56:00 AM

JennyB <jenny...@googlemail.com> wrote:
> On Wednesday, 30 May 2018 09:42:29 UTC+1, a...@littlepinkcloud.invalid wrote:
>> JennyB <jenny...@googlemail.com> wrote:
>> > Another useful literal prefix might return the data
>> > address for words that have mutable internal data.
>> >
>> > Thus:
>> >
>> > "foo" value "bar" 2value "baz" defer
>> > 23 @_foo ! foo foo @_bar 2! 'foo @_baz !
>>
>> I think there should be strong pushback against any language changes
>> which are no more than a "new" way of writing something that is
>> already easily expressed in the standard language.
>

> Of course.

Please forgive me, but clearly it's not obvious to everyone. Some of
the discussion around the standard over the last few years has been
precisely about new ways to write things that are already easily
expressed. Number prefixes and deferred words, for example.

> I'm merely contemplating how different Forth might have looked if it
> had had literal recognizers from the start.

Fair enough.

Andrew.

Anton Ertl

Jun 1, 2018, 1:54:44 PM

There are two different issues here:

1) Do we want to take the address of VALUEs and such? I don't want
that, because without it, value-flavoured things have the nice
property of not aliasing with anything else, which makes various
compiler transformations possible or at least much easier; e.g., if
you have a VALUE X, and the code

( addr n ) TO X ( addr ) @

you can move the @ before the TO X:

( addr n ) swap @ swap to x

This may not look very attractive for a threaded-code implementation,
but it can speed up an analytical native-code implementation on a CPU
with in-order execution. For a more substantial example, consider
this loop containing the value X:

do ... X ... ! loop

If we know that the ! cannot change X (because the address of X is not
taken), we load X into a register before the loop, and only use the
register inside.

In short: If you want the address, use a VARIABLE.

2) If, despite 1), we want to take the address of VALUEs, then yes,
having a recognizer is a better alternative to having a parsing word
ADDR. The syntax is something we can have a long bike-shedding
session on, though:-).
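
To illustrate the aliasing argument in 1): with a VARIABLE in place of
the VALUE, moving the @ before the store is not safe in general,
because the address on the stack may be that of the variable itself.
A small sketch (V, STORE-FETCH and FETCH-STORE are made-up names):

```forth
\ Sketch only: V, STORE-FETCH and FETCH-STORE are hypothetical names.
variable v

: store-fetch ( addr n -- x )  v !  @ ;            \ original order
: fetch-store ( addr n -- x )  swap @ swap  v ! ;  \ @ moved before the store

1 v !  v 7 store-fetch .   \ prints 7: the store changed what @ sees
1 v !  v 7 fetch-store .   \ prints 1: the reordering changed the result
```

With a VALUE whose address cannot be taken, no ADDR can refer to its
storage, so the two orders always give the same result.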

- anton

EuroForth 2018: http://www.euroforth.org/ef18/

jean-...@buechi-schmitt.ch

Jun 2, 2018, 6:55:50 PM

Hello everybody,
I did not understand the subject you are discussing.
Just to tell you what I did to solve some of my problems.
I replaced the word ( with the word S( , so stack comments are now
written like this: S( ---- ).

The word ( now has the same function as the word S" . Then I created an
alphanumeric data stack on which character strings are stored.
The information on this stack is marked with tags that start with #,
e.g. #FileName C:\SwiftForth\----
To access a tag, write:

(#FileName) List>

This returns on the Forth stack an addr, len pair for the string that follows the tag.

This procedure allows you to manipulate any kind of information.
Here we make a very clear distinction between a function and the information it has to deal with.
For example, creating a database will be as follows:

( #Dbase MydataBase #Size 1024 )NewDbase

Of course we can put this in a definition:

: Test ( #Dbase MydataBase #Size 1024 )NewDbase \ Create Dbase: 1024 * 4096 bytes
(MydataBase) OpenBase
Etc ...
;
I think the Forth machine is very well designed; one only needs to add
an extra layer that allows objects to be manipulated without being declared.

Regards
Jean-Pierre Schmitt