Poll: strings notation

Ruvim

Dec 5, 2020, 10:06:29 AM
# Strings notation

## Introduction in normative terminology (convoluted)

### Standard strings

The standard representation for a character string data object [1] is a
cell-pair data object [2], where the first cell is the length of the
string, and the second cell is the starting address of the string. NB:
the standard numbers the cells of a cell-pair on the stack from
the top of the stack downward.

A cell-pair that represents a character string is denoted in the stack
notation by the individual components of this cell-pair, namely by their
data type symbols, "c-addr" (character-aligned address), and "u"
(unsigned number): "( c-addr u )".
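
As a minimal sketch of this notation (the word name GREETING is made up for illustration, and NIP comes from the Core extension word set), a word returning a string leaves the address below and the length on top:

```forth
\ S" compiles a string; at run time it leaves the cell-pair ( c-addr u ),
\ with the length u on top of the stack.
: GREETING ( -- c-addr u )  S" hello" ;

GREETING TYPE     \ displays: hello
GREETING NIP .    \ displays: 5
```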

Just for comparison, double numbers (that are also cell-pairs [3]) are
usually denoted in the stack notation by the dedicated data type symbol
"d" (i.e., without referring to the components of the corresponding cell-pair).


In some cases, people also introduce a symbol to denote a cell-pair that
represents a character string. For example, I sometimes use the 'd-txt'
symbol (with suffixes). Using this symbol the stack notation for the
SEARCH word can be expressed as ( d-txt.1 d-txt.2 -- d-txt.3 flag ).
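
For illustration only, here is how the three cell-pairs show up when SEARCH (from the optional String word set) is actually called. HAYSTACK and NEEDLE are hypothetical helper names; the strings are wrapped in colon definitions so that neither relies on a transient interpretation buffer.

```forth
: HAYSTACK ( -- c-addr u )  S" stack notation" ;   \ the string searched in
: NEEDLE   ( -- c-addr u )  S" not" ;              \ the string searched for

\ SEARCH ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 flag )
HAYSTACK NEEDLE SEARCH   \ flag is true; c-addr3 u3 starts at the match
DROP TYPE                \ displays: notation
```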



### Custom strings

Many Forth systems or libraries provide a set of operations on strings
as opaque data objects that are identified by single-cell values (just
to be clear, a single-cell value is also a data object).

An example: FFL Dynamic text string module [4]. In this module
documentation the symbol "str" is used to denote a string data object
identifier in the stack notation.

E.g.: str-get ( str -- c-addr u )



## References

[1] 3.1.4.2 Character strings
https://forth-standard.org/standard/usage#subsection.3.1.4.2

[2] 3.1.4 Cell-pair types
https://forth-standard.org/standard/usage#subsection.3.1.4

[3] 3.1.4.1 Double-cell integers
https://forth-standard.org/standard/usage#subsection.3.1.4.1

[4] FFL, Dynamic text string
http://irdvo.nl/FFL/docs/str.html





## Poll


1. What symbols (if any) do you use to denote a *single-cell* value that
identifies a string? (NB: it doesn't matter what particular library is
used).


2. What symbols (if any) do you use to denote a *cell-pair* that
represents a character string? (NB: without referring to the individual
components).



--
Ruvim

NN

Dec 5, 2020, 1:10:37 PM
1. I use $ to indicate that the address refers to a string.
I use it for both byte-counted and long-counted strings.
I rarely mix the two. However, for my own clarity I might
add C (char) and L (cell).

2. No symbols, because the stack signature usually says :: adr len
and I know it's a cell pair. And the comments indicate
whether it's a string or a number vector.

Trebor English

Dec 5, 2020, 9:59:01 PM
On Saturday, December 5, 2020 at 10:06:29 AM UTC-5, Ruvim wrote:
> # Strings notation
>
> ## Introduction in normative terminology (convoluted)
>
> ### Standard strings
>
> The standard representation for a character string data object [1] is a
> cell-pair data object [2], where the first cell is the length of the
> string, and the second cell is the starting address of the string. NB:
> standard numbering of the cells in a cell-pair on a stack is going from
> the top to the bottom of the stack.
>
. . . snip . . .
>
> Just for comparison, double numbers (that are also cell-pairs [3]) are
> usually denoted in the stack notation by the dedicated data type symbol
> "d" (i.e., without referring the components of the corresponding cell-pair).
>
. . . snip . . .

Double numbers make sense when you use words like D+ to add one double integer to another double integer. It would not make sense to add one address and count to another address and count. They are not double numbers. They only occupy the same number of bits as a double integer.

A word like TYPE takes two parameters, the address of a string of characters and the number of characters to send to the screen.

CMOVE and CMOVE> take three parameters that are also adjacent on the stack and related. Like c-addr u for TYPE they remain separate parameters. Three cells could be described as a pair and half a pair. I think of them and I type
( FROM-addr TO-addr count -- )

A single cell pointer to the count byte of a counted string is just a character pointer. I suppose it could point to a null terminated string but that would be surprising in most Forth code. Since before FIG Forth the word COUNT has been used to take the address of a counted string and produce a c-addr u.
: COUNT ( c-addr -- c-addr u ) DUP 1+ SWAP C@ ;
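
A small usage sketch, assuming a system where COUNT is already defined as above (it is a standard Core word): GREETING$ is a hypothetical counted string laid down by hand, with the count byte first and the characters after it.

```forth
\ A counted string: one count byte followed by that many characters.
CREATE GREETING$  5 C,  CHAR h C,  CHAR e C,  CHAR l C,  CHAR l C,  CHAR o C,

GREETING$ COUNT TYPE   \ displays: hello
```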

It is my preference to use a single cell address of the count byte of a string if that is possible. Sometimes a text string is in memory without a count byte such as in blocks or files. It is easy enough to convert a counted string address to a c-addr u when necessary. Inside the word TYPE the address advances and the count counts. They must be separated.

From a thing like d-txt I would infer that the d means it is somehow related to a double integer. Other people may draw other inferences. If you used 2-txt I would be more likely to infer that it was a two cell thing like with 2DUP, 2OVER, or 2! rather than a double integer.

>
> 2. What symbols (if any) do you use to denote a *cell-pair* that
> represents a character string? (NB: without referring the individual
> components).
>

Why do you want to not specify the individual parameters?

Ruvim

Dec 6, 2020, 7:13:37 AM
On 2020-12-06 05:58, Trebor English wrote:
[...]
> From a thing like d-txt I would infer that the d means it is
> somehow related to a double integer. Other people may draw
> other inferences.

Yes, 'd' is a suboptimal choice. A better variant would be 'xd-txt',
since 'xd' is the standard symbol for the unspecified cell pair data type.


> If you used 2-txt I would be more likely to infer that it was
> a two cell thing like with 2DUP, 2OVER, or 2! rather than a double
> integer.

I see. But I would prefer something nearer to the standard notation.

The standard stack notation is based on the data type symbols: each
parameter on the stack is denoted by a data type symbol, with possible
suffix. See sections 3.1 and 2.2.2.

In the standard only digits are used as suffixes:

| Multiple instances of the same type in the description
| of a definition are suffixed with a sequence digit
| subscript to distinguish them


A generalization of this approach can be the following format:

<data-type-symbol><separator><suffix>



>> 2. What symbols (if any) do you use to denote a *cell-pair* that
>> represents a character string? (NB: without referring the individual
>> components).
>>
>
> Why do you want to not specify the individual parameters?

Just to be more concise in stack notation and documentation.
Otherwise the same suffix is repeated for both components of a cell-pair,
and when you use sensible suffixes, it becomes even longer.

For example, using 'sd' symbol for cell pairs that represent strings,
we can write

MATCH-HEAD ( sd-src sd-key -- sd-tail true | sd-src false )

instead of a rather longer variant:

MATCH-HEAD ( c-addr-src u-src c-addr-key u-key
-- c-addr-tail u-tail true | c-addr-src u-src false )
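
MATCH-HEAD is only given as a stack comment here. One plausible reading, sketched under the assumption that it tests whether the source string starts with the key and, if so, returns the remainder, could look like this (PICK, TRUE and FALSE are Core extension words; COMPARE and /STRING come from the String word set; DEMO is a made-up name):

```forth
: MATCH-HEAD ( c-addr1 u1 c-addr2 u2 -- c-addr3 u3 true | c-addr1 u1 false )
  2 PICK OVER U< IF  2DROP FALSE EXIT  THEN  \ key longer than source: no match
  DUP >R                 \ remember the key length
  3 PICK R@ COMPARE      \ compare the key with the head of the source
  R> SWAP                ( c-addr1 u1 u2 n )
  IF  DROP FALSE         \ heads differ: leave the source untouched
  ELSE  /STRING TRUE     \ heads match: skip the key, leaving the tail
  THEN ;

: DEMO ( -- )
  S" key=value" S" key=" MATCH-HEAD
  IF  TYPE  ELSE  2DROP ." no match"  THEN ;

DEMO   \ displays: value
```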



--
Ruvim

none albert

Dec 6, 2020, 7:40:09 AM
In article <rqg7li$f3i$1...@dont-email.me>, Ruvim <ruvim...@gmail.com> wrote:
>## Poll
>
>
>1. What symbols (if any) do you use to denote a *single-cell* value that
>identifies a string? (NB: it doesn't matter what particular library is
>used).
>
>
>2. What symbols (if any) do you use to denote a *cell-pair* that
>represents a character string? (NB: without referring the individual
>components).

ciforth (lina wina)

An address length pair is a string constant, and I call it sc.
There is no special name for an address where you could store a
string constant, not any more than for an address where you could
store a floating point number. You can name such a buffer
a float variable or a string variable, but a buffer only becomes
that by storing an fp or string there.
"
$!

STACKEFFECT: sc addr ---

DESCRIPTION:

Store a string constant sc in the string variable at address addr.
"

Why would the stack description be different from

"
!

STACKEFFECT: n addr ---

DESCRIPTION:

Store all 64 bits of n at addr .
"

>--
>Ruvim

Groetjes Albert
--
This is the first day of the end of your life.
It may not kill you, but it does make your weaker.
If you can't beat them, too bad.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

David N. Williams

Dec 6, 2020, 9:50:28 AM
On 12/5/20 10:06 AM, Ruvim wrote:
> # Strings notation
> [...]
> ## Poll
>
>
> 1. What symbols (if any) do you use to denote a *single-cell* value
> that identifies a string? (NB: it doesn't matter what particular
> library is used).

I use "$" prefixes for the single cell identifiers on my garbage-collected
string stack, e.g.:

($: $1 -- $2 )

These are actually the addresses of counted strings in the string stack
space. The count field is byte-sized, half cell-sized, or cell-sized,
depending on the string library configuration, with cell as the default
size.

Words that operate on such strings generally have "$" in their name,
e.g.:

$VARIABLE date$ $" 6-Dec-2020" date$ $!

> 2. What symbols (if any) do you use to denote a *cell-pair* that
> represents a character string? (NB: without referring the individual
> components).

I use "s" prefixes or suffixes, e.g. in the form:

( s -- s' flag )
( s1 -- s2 flag )
( -- date.s )

Or sometimes:

( -- date-s )

-- David

dxforth

Dec 6, 2020, 6:54:44 PM
It appears ANS (COMPARE SEARCH) considered such situations rare enough to
question the value of a dedicated symbol. It's no more burden numbering
adr/len pairs than numbering n u d ud etc. which one can explain in the
comment that will likely accompany the function.

NN

Dec 7, 2020, 5:23:50 AM
On Saturday, 5 December 2020 at 15:06:29 UTC, Ruvim wrote:
I am not sure about the purpose of this poll, but you missed one.

There are 3 types of string representations

1. cell-pair (-- adr len )
2. length counted either as byte-counted or cell-counted.
3. null terminated strings ( c-strings )
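
A rough sketch of how representations 2 and 3 reduce to representation 1: the standard word COUNT handles a byte-counted string, and a word like the one below handles a null-terminated one. ZCOUNT and Z$ are hypothetical names used only for this example, and the sketch assumes one character occupies one address unit.

```forth
\ Convert the address of a zero-terminated string into ( c-addr u ).
: ZCOUNT ( c-addr -- c-addr u )
  DUP                                      \ keep the start address
  BEGIN  DUP C@  WHILE  CHAR+  REPEAT      \ advance to the terminating zero
  OVER - ;                                 \ length = end address - start address

CREATE Z$  CHAR h C,  CHAR i C,  0 C,      \ the text "hi" followed by a zero byte

Z$ ZCOUNT TYPE   \ displays: hi
```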

Ruvim

Dec 7, 2020, 5:55:42 PM
On 2020-12-07 13:23, NN wrote:
> On Saturday, 5 December 2020 at 15:06:29 UTC, Ruvim wrote:
[...]
>> ## Poll
>>
>>
>> 1. What symbols (if any) do you use to denote a *single-cell* value that
>> identifies a string? (NB: it doesn't matter what particular library is
>> used).
>>
>>
>> 2. What symbols (if any) do you use to denote a *cell-pair* that
>> represents a character string? (NB: without referring the individual
>> components).
>>
>>
>
> I am not sure about the purpose of this poll,

I would like to introduce a common symbol for cell pairs that represent
character strings. And this symbol should not conflict with other known
symbols.


> but you missed one.
>
> There are 3 types of string representations

> 1. cell-pair (-- adr len )
> 2. length counted either as byte-counted or cell-counted.
> 3. null terminated strings ( c-strings )

In cases 2 and 3 a string is identified by an address (see
3.1.3.4), and this address is a single-cell value that identifies the
string.

My item 1 above covers all the cases of single-cell identifiers, and so
these two cases too. In each case you can use a different symbol (since
these cases correspond to the different data types), and all of them may
be mentioned in your answer.


--
Ruvim

NN

Dec 8, 2020, 4:43:01 AM
On Monday, 7 December 2020 at 22:55:42 UTC, Ruvim wrote:
> On 2020-12-07 13:23, NN wrote:
> > On Saturday, 5 December 2020 at 15:06:29 UTC, Ruvim wrote:
> [...]
> >> ## Poll
> >>
> >>
> >> 1. What symbols (if any) do you use to denote a *single-cell* value that
> >> identifies a string? (NB: it doesn't matter what particular library is
> >> used).
> >>
> >>
> >> 2. What symbols (if any) do you use to denote a *cell-pair* that
> >> represents a character string? (NB: without referring the individual
> >> components).
> >>
> >>
> >
> > I am not sure about the purpose of this poll,
> I would like to introduce a common symbol for cell pairs that represent
> character strings. And this symbol should not conflict with other known
> symbols.
> > but you missed one.
> >

Is the purpose of this common symbol as notation in comments or in the
code ?

If it's within the code would you not consider it a form of typing ?


[...snipped...]

Stephen Pelc

Dec 8, 2020, 6:52:58 AM
On Tue, 8 Dec 2020 01:55:35 +0300, Ruvim <ruvim...@gmail.com>
wrote:

>I would like to introduce a common symbol for cell pairs that represent
>character strings. And this symbol should not conflict with other known
>symbols.

What's wrong with
string
?

During the character discussions, it became apparent that the notation
previously called "primitive character" was being called "p-char" in
speech.

I can say "string" and "caddr len" easily. Similarly "c-string"
(counted string) and "z-string" (zero terminated).

IMHO the caddr len string is there to construct higher levels
of abstraction. Any commercial package will soon develop a higher
level package in which the construction of strings is hidden.
These strings will then be represented by single-cell references.

Stephen

--
Stephen Pelc, ste...@vfxforth.com <<< NEW
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612
web: http://www.mpeforth.com - free VFX Forth downloads

Ruvim

Dec 8, 2020, 7:38:45 AM
On 2020-12-08 12:42, NN wrote:
> On Monday, 7 December 2020 at 22:55:42 UTC, Ruvim wrote:
>> On 2020-12-07 13:23, NN wrote:
>>> On Saturday, 5 December 2020 at 15:06:29 UTC, Ruvim wrote:
>> [...]
>>>> ## Poll
>>>>
>>>>
>>>> 1. What symbols (if any) do you use to denote a *single-cell* value that
>>>> identifies a string? (NB: it doesn't matter what particular library is
>>>> used).
>>>>
>>>>
>>>> 2. What symbols (if any) do you use to denote a *cell-pair* that
>>>> represents a character string? (NB: without referring the individual
>>>> components).
>>>>
>>>>
>>>
>>> I am not sure about the purpose of this poll,
>> I would like to introduce a common symbol for cell pairs that represent
>> character strings. And this symbol should not conflict with other known
>> symbols.

> Is the purpose of this common symbol as notation in comments or in the
> code ?

In comments, including stack comments.

However, tidy stack comments can be used by some third-party type
checkers. But I'm not aware of any particular implementation yet [1].




> If it's within the code would you not consider it a form of typing ?

Actually, a form of typing is a substantial part of Forth.


If we take the following definition [2]

| A language is typed if the specification
| of every operation defines types of data
| to which the operation is applicable.

then Forth is of course a typed language. And then it's incorrect to
claim that Forth is an untyped language. (NB: it may be wrong in another
terminology/definitions)

But data-type checking in Forth is only optional:

| No data-type checking is required of a system.
| An ambiguous condition exists if an incorrectly
| typed data object is encountered.
( 3.1 Data types [3])


So in Forth the programmer should himself ensure that all data objects
belong to the data types expected by operations and their context.

| Forth rarely explicitly imposes data-type restrictions.
| Still, data types implicitly do exist, and discipline
| is required, particularly if portability of programs
| is a goal. In Forth, it is incumbent upon the programmer
| (rather than the compiler) to determine that data are
| accurately typed.
( A.3.1 Data types [4])





[1] https://stackoverflow.com/a/40321950/1300170
[2] https://w.wiki/pgo
[3] https://forth-standard.org/standard/usage#usage:data
[4] https://forth-standard.org/standard/rationale#rat:types



--
Ruvim

NN

Dec 8, 2020, 8:27:50 AM
I am not against type checking because catching silly errors can
only be a good thing in writing correct software.

If I recall correctly, Factor (programming language) does type checking
from stack signatures. The checker is used to good effect in identifying
errors at an early stage. It does not seem to have hampered progress
within Factor. ( I can't remember whose blog I read this on - apologies )

Ruvim

Dec 8, 2020, 3:47:57 PM
On 2020-12-08 14:52, Stephen Pelc wrote:
> On Tue, 8 Dec 2020 01:55:35 +0300, Ruvim <ruvim...@gmail.com>
> wrote:
>
>> I would like to introduce a common symbol for cell pairs that represent
>> character strings. And this symbol should not conflict with other known
>> symbols.
>
> What's wrong with
> string
> ?

Looking at Table 3.1: Data types, the "string" symbol seems too long,
and the name "string" seems too ambiguous: which particular kind of
string, and why this one?


> During the character discussions, it became apparent that the notation
>
> previously called "primitive character" was being called "p-char" in
> speech.
>
> I can say "string" and "caddr len" easily. Similarly "c-string"
> (counted string) and "z-string" (zero terminated).
>
> IMHO the caddr len string is there to construct higher levels
> of abstraction. Any commercial package will soon develop a higher
> level package in which the construction of strings is hidden.
> These strings will then be represented by single-cell references.


What about the following variant?



Symbol | Data type | Size on stack
-------|-------------------|---------------
sd | primitive string | 2 cells


Symbol: sd
Data type: primitive string
Size on stack: 2 cells



Rationale

All standard symbols for subtypes of the cell-pair (and it itself) end
with 'd'. So 'sd' says that it is a string and a cell-pair.

"Primitive string" name says that it's the most basic and general
representation for a character string.


--
Ruvim

jan4comp....@murray-microft.co.uk

Dec 8, 2020, 5:40:53 PM
On Tue, 8 Dec 2020 15:38:42 +0300
Ruvim <ruvim...@gmail.com> wrote:

> However, tidy stack comments can be used by some third-party type
> checkers. But I'm not aware of any particular implementation yet [1].

"StrongForth [1] is a programming language that is very close to ANS
Forth. One of the biggest differences is that it includes strong
static type-checking."

Some object oriented Forths (Oforth[2] 8th[3]) must also have some way
of tagging different data types to select run time actions, so likely
have a way to make this clear in program source.

Jan Coombs
--

[1] StrongForth Homepage
https://www.stephan-becher.de/strongforth/

[2] The Oforth Programming Language
"Oforth is dynamically typed : all items on the stack are objects and
each object has a type."
http://www.oforth.com/

[3] "8th differs from more traditional Forths in a number of ways.
First of all, it is strongly typed and has a plethora of useful types
(dynamic strings, arrays, maps, queues and more)."
https://community.arm.com/developer/ip-products/system/b/embedded-blog/posts/8th-a-gentle-introduction-to-a-modern-forth
"One Effort, Multiple Platforms"
https://8th-dev.com/


Stephen Pelc

Dec 8, 2020, 6:02:24 PM
On Tue, 8 Dec 2020 23:47:55 +0300, Ruvim <ruvim...@gmail.com>
wrote:

>"Primitive string" name says that it's the most basic and general
>representation for a character string.

On the principle of least surprise and by reference to pchar, let's
use pstring.

Personally, in a quick glance at a stack comment, I count the symbols.
Hence, except for Dxxxx and 2yyyy, I would prefer to reference
primitive strings as caddr len or caddr/len, which is only two
characters longer than pstring. Clarity is much more important
than saving a few keystrokes.

Ruvim

Dec 9, 2020, 4:39:08 AM
On 2020-12-09 02:02, Stephen Pelc wrote:
> On Tue, 8 Dec 2020 23:47:55 +0300, Ruvim <ruvim...@gmail.com>
> wrote:
>
>> "Primitive string" name says that it's the most basic and general
>> representation for a character string.
>
> On the principle of least surprise and by reference to pchar, let's
> use pstring.
>
> Personally, in a quick glance at a stack comment, I count the symbols.
> Hence, except for Dxxxx and 2yyyy, I would prefer to reference
> primitive strings as caddr len or caddr/len, which is only two
> characters longer than pstring. Clarity is much more important
> than saving a few keystrokes.


Using standard data type symbols, "." as a separator, and sensible
suffixes, the stack notation for the MATCH-HEAD word can be expressed as
the following:

MATCH-HEAD
( c-addr.src u.src c-addr.key u.key -- c-addr.tail u.tail true |
c-addr.src u.src false )

The main issue of this variant is that the same long suffix is repeated
twice, for both components of a cell-pair.


Using "pstring" symbol:

MATCH-HEAD
( pstring.src pstring.key -- pstring.tail true | pstring.src false )


Using "sd" symbol:

MATCH-HEAD
( sd.src sd.key -- sd.tail true | sd.src false )

In this variant it's clearer that 'sd' is a cell pair.




As an idea, a hierarchical notation for data types:

MATCH-HEAD
( xd.ps.src xd.ps.key -- xd.ps.tail true | xd.ps.src false )


'xd' — is for an unspecified cell pair
'ps' — for primitive string


--
Ruvim

Ruvim

Dec 9, 2020, 6:04:16 AM
On 2020-12-09 01:40, jan4comp....@murray-microft.co.uk wrote:
> On Tue, 8 Dec 2020 15:38:42 +0300
> Ruvim <ruvim...@gmail.com> wrote:
>
>> However, tidy stack comments can be used by some third-party type
>> checkers. But I'm not aware of any particular implementation yet [1].
>
> "StrongForth [1] is a programming language that is very close to ANS
> Forth. One of the biggest differences is that it includes strong
> static type-checking."
>
> Some object oriented Forths (Oforth[2] 8th[3]) must also have some way
> of tagging different data types to select run time actions, so likely
> have a way to make this clear in program source.


Thanks for the links.

By a third-party type checker I mean a module that can be loaded into a
standard Forth system and then it validates a program, or a system that
can validate a standard Forth program.

It looks like an interesting challenge.


--
Ruvim

David N. Williams

Dec 9, 2020, 12:15:09 PM
On 12/9/20 4:39 AM, Ruvim wrote:
> [...]
>
> Using "sd" symbol:
>
> MATCH-HEAD
> ( sd.src sd.key -- sd.tail true | sd.src false )
>
> In this variant it's more clear that 'sd' is a cell pair.

I find this more readable:

MATCH-HEAD
( src.s key.s -- tail.s true | src.s false )

To me, the trailing .<something> notation signals that <something> is a
data type. I have no problem remembering that "s" is the fundamental
Forth string type consisting of ( addr u), without needing a "d"
indicator.

For some reason, in this case I find having the type indicator in front
to be an extra conceptual load. On the other hand, I would certainly
write ( s1 s2 -- s3 flag ), with "s" in front, when I only need to
indicate generic strings, and don't care about any particular function.

I don't get the "s" in these cases confused with "s" for "data stack" in
cases like

(f: x y -- s: flag )

All a personal reaction, of course.

-- David

Anton Ertl

Dec 14, 2020, 9:47:11 AM
NN <novembe...@gmail.com> writes:
>If I recall correctly, Factor (programming language) does type checking
>from stack signatures. The checker is used to good effect in identifying
>errors at an early stage. It does not seem to have hampered progress
>within Factor. ( I can't remember whose blog I read this on - apologies )

My experience with Factor's type checking is that at Forth-Tagung 2012
we had a workshop on Factor. Everybody had to write a small Factor
program with a twist that is trivial in Forth, but requires getting
the typing right in Factor. I failed to get the typing right; my
mental model of how the typing in factor works was obviously wrong.
Others got it right. Anyway, my takeaway from the workshop was that
Factor's typing gets in the way rather than having a good effect.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2020: https://euro.theforth.net/2020

Marcel Hendrix

Dec 15, 2020, 2:21:10 AM
On Monday, December 14, 2020 at 3:47:11 PM UTC+1, Anton Ertl wrote:
> Anyway, my takeaway from the workshop was that
> Factor's typing gets in the way rather than having a good effect.

Do you remember any motivating examples of serious errors that are
caught by such a checker?

My (very likely, flawed) recollection of e.g. StrongForth discussions
is that even in that case this IMO important question was never
resolved or even addressed.

-marcel

Anton Ertl

Dec 15, 2020, 4:25:37 AM
Marcel Hendrix <m...@iae.nl> writes:
>On Monday, December 14, 2020 at 3:47:11 PM UTC+1, Anton Ertl wrote:
>> Anyway, my takeaway from the workshop was that
>> Factor's typing gets in the way rather than having a good effect.
>
>Do you remember any motivating examples of serious errors that are
>caught by such a checker?

I have had difficulties in debugging code where a lot of addresses of
structures are flying around. It's not easy to see which address is
which, and if the structures contain other addresses, I don't see it
when looking at what the addresses point to, either.

There are classic Forth approaches like testing each word individually
that I did not do sufficiently for that code, IIRC because quite a bit
of data needs to be built up before the words can be fully tested.

I have been thinking about adding type knowledge to structures and
structure access words in order to facilitate debugging such code, but
have not taken action yet.

In any case, a Forth type checker would have to know about structures
to help in such cases.

Other than that, type errors have not been a problem in Forth in my
experience. I think the reason is that Forthers have a variety of
ways to avoid such problems (e.g., testing individual words); people
coming from languages with more typechecking may miss their type
checker in Forth because they have not yet acquired these skills.

Gerry Jackson

Dec 15, 2020, 4:28:30 AM
On 14/12/2020 14:39, Anton Ertl wrote:
> NN <novembe...@gmail.com> writes:
>> If I recall correctly, Factor (programming language) does type checking
>> from stack signatures. The checker is used to good effect in identifying
>> errors at an early stage. It does not seem to have hampered progress
>> within Factor. ( I can't remember whose blog I read this on - apologies )
>
> My experience with Factor's type checking is that at Forth-Tagung 2012
> we had a workshop on Factor. Everybody had to write a small Factor
> program with a twist that is trivial in Forth, but requires getting
> the typing right in Factor. I failed to get the typing right; my
> mental model of how the typing in factor works was obviously wrong.
> Others got it right. Anyway, my takeaway from the workshop was that
> Factor's typing gets in the way rather than having a good effect.
>

It seems to me that some languages have a typing system so complicated
that it confuses programmers into creating the very problems that the
type checker then finds. Like a self-fulfilling prophecy.

--
Gerry

minf...@arcor.de

Dec 15, 2020, 5:11:36 AM
Types make more sense in languages with a higher degree of abstraction
(Forth has practically none even when you count in locals). Ada and others
can give the compiler additional information about value ranges. I don't know
whether there are languages that could prevent undefined operations, like
adding temperature to length.

All that makes some sense for writing safety-critical applications. But it
is not perfect, costs a lot of coding effort, and catches 'perhaps' only a certain
class of programming mistakes.

For Forth it would be overkill IMO. It is a stack-based language where
the stack implies 'some' type information. Floating-point numbers reside
on the fp-stack, operators are not overloaded (except that evil TO), but
mistaking stack elements is only caught late, at runtime.

Beefing this up with additional type information for a Forth compiler would still
be without much effect, because e.g. SWAP cannot be made type-aware.

NN

Dec 15, 2020, 6:07:17 AM
On Tuesday, 15 December 2020 at 10:11:36 UTC, minf...@arcor.de wrote:
> Gerry Jackson schrieb am Dienstag, 15. Dezember 2020 um 10:28:30 UTC+1:
> > On 14/12/2020 14:39, Anton Ertl wrote:
> > > NN <novembe...@gmail.com> writes:
> > >> If I recall correctly, Factor (programming language) does type checking
> > >> from stack signatures. The checker is used to good effect in identifying
> > >> errors at an early stage. It does not seem to have hampered progress
> > >> within Factor. ( I can't remember whose blog I read this on - apologies )
> > >
> > > My experience with Factor's type checking is that at Forth-Tagung 2012
> > > we had a workshop on Factor. Everybody had to write a small Factor
> > > program with a twist that is trivial in Forth, but requires getting
> > > the typing right in Factor. I failed to get the typing right; my
> > > mental model of how the typing in factor works was obviously wrong.
> > > Others got it right. Anyway, my takeaway from the workshop was that
> > > Factor's typing gets in the way rather than having a good effect.
> > >
> > It seems to me that some languages have such a complicated typing system
> > that confuses programmers into creating the sort of problems that the
> > typing system checks find. Like a self-fulfilling prophecy.
> >
> Types make more sense in languages with a higher degree of abstraction
> (Forth has practically none even when you count in locals). Ada and others
> can give the compiler additional information about value ranges. I don't know
> whether there are languages that could prevent undefined operations, like
> adding temperature to length.
>

F# has something called units of measure, just to catch things like adding
temperature to length.

NN

Dec 15, 2020, 6:33:42 AM
Types exist to catch or stop a certain class of errors, when used correctly.

Seems to me that blaming the type system is an easy way out for the 'confused'
programmer. He should not be allowed to use it as a get-out-of-jail card.

NN

Dec 15, 2020, 6:43:18 AM
Workshops are great at providing overviews, but I am not sure they cover
the same ground as tutorials.

Despite your poor experience, I am sure you will agree it does not
diminish the value of typing.

Have there been any other Factor workshops since 2012?



none albert

Dec 15, 2020, 8:28:11 AM
In article <adaf007a-c1a7-4908...@googlegroups.com>,
minf...@arcor.de <minf...@arcor.de> wrote:
>Gerry Jackson schrieb am Dienstag, 15. Dezember 2020 um 10:28:30 UTC+1:
>> On 14/12/2020 14:39, Anton Ertl wrote:
>> > NN <novembe...@gmail.com> writes:
>> >> If I recall correctly, Factor (programming language) does type checking
>> >> from stack signatures. The checker is used to good effect in identifying
>> >> errors at an early stage. It does not seem to have hampered progress
>> >> within Factor. ( I can't remember whose blog I read this on - apologies )
>> >
>> > My experience with Factor's type checking is that at Forth-Tagung 2012
>> > we had a workshop on Factor. Everybody had to write a small Factor
>> > program with a twist that is trivial in Forth, but requires getting
>> > the typing right in Factor. I failed to get the typing right; my
>> > mental model of how the typing in factor works was obviously wrong.
>> > Others got it right. Anyway, my takeaway from the workshop was that
>> > Factor's typing gets in the way rather than having a good effect.
>> >
>> It seems to me that some languages have such a complicated typing system
>> that confuses programmers into creating the sort of problems that the
>> typing system checks find. Like a self-fulfilling prophecy.
>>
>
>Types make more sense in languages with a higher degree of abstraction
>(Forth has practically none even when you count in locals). Ada and others
>can give the compiler additional information about value ranges. I don't know
>whether there are languages that could prevent undefined operations, like
>adding temperature to length.

There may not be such languages, but I added such a feature to Java in a
control program for chips baking machines. There was a team of 20 working
on those.
Having the physical dimension (length, power, energy) stored with each entity
made the display much less error-prone.
(Shell wants oil output not in cubic metres but in kilobarrels/day, as is the
natural way to count oil.)
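
A minimal Python sketch of the idea of storing a physical dimension with each value. This is not the Java code described above; the `Quantity` class and the `UNITS` table are hypothetical names invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical unit table: unit name -> (dimension, factor to the base unit).
UNITS = {
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
    "K": ("temperature", 1.0),
}


@dataclass(frozen=True)
class Quantity:
    value: float      # always stored in the base unit of its dimension
    dimension: str

    @classmethod
    def of(cls, amount, unit):
        dimension, factor = UNITS[unit]
        return cls(amount * factor, dimension)

    def __add__(self, other):
        if not isinstance(other, Quantity):
            return NotImplemented
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {other.dimension} to {self.dimension}")
        return Quantity(self.value + other.value, self.dimension)

    def to(self, unit):
        """Return the magnitude expressed in the requested display unit."""
        dimension, factor = UNITS[unit]
        if dimension != self.dimension:
            raise TypeError(f"{unit} is not a unit of {self.dimension}")
        return self.value / factor


total = Quantity.of(2, "km") + Quantity.of(500, "m")
print(total.to("km"))   # 2.5
```

Adding `Quantity.of(300, "K")` to a length raises `TypeError` at the point of the addition, and the display unit is chosen only when the value is shown.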

Anton Ertl

Dec 15, 2020, 9:45:38 AM
"minf...@arcor.de" <minf...@arcor.de> writes:
>I don't know
>whether there are languages that could prevent undefined operations, like
>adding temperature to length.

And if you do, you should support units as well.

I have heard that Fortress does this, but it has been discontinued.

Given that writing numbers with units is de rigueur in
engineering and natural sciences, and there have been high-profile
failures due to erroneous treatment of units, it is surprising that
languages that support units have failed to become popular, while in
the area of typing many different kinds of languages are popular.

Anton Ertl

Dec 15, 2020, 10:01:40 AM
NN <novembe...@gmail.com> writes:
>On Monday, 14 December 2020 at 14:47:11 UTC, Anton Ertl wrote:
>> My experience with Factor's type checking is that at Forth-Tagung 2012
>> we had a workshop on Factor. Everybody had to write a small Factor
>> program with a twist that is trivial in Forth, but requires getting
>> the typing right in Factor. I failed to get the typing right; my
>> mental model of how the typing in factor works was obviously wrong.
>> Others got it right. Anyway, my takeaway from the workshop was that
>> Factor's typing gets in the way rather than having a good effect.
...
>Workshops are great at providing overviews but I am not sure they overlap
>the same set as tutorials

Actually, looking at the announcement
<https://alt.forth-ev.de/article.php/201112161129336>, it says "Factor
Schulung". I leave it to you whether you translate it into
"tutorial" or "training workshop".

>Despite your poor experience, I am sure you will agree it does not
>diminish the value of typing.

It is unclear to me what exactly you want me to agree on.

Anyway, I think that a seasoned Factor user would have done the
assignment in a snap, and would not even have noticed any limitation
coming from the type checker, just like a seasoned Forth programmer
knows how to use Forth in a way that does not need type checking, how
a seasoned Modula-2 programmer does not write newbie code like "x>y
AND y>z" and knows what to make of the type error reported by the
compiler for this code, like a seasoned Haskell programmer knows how
to prevent or deal with multi-line type errors coming from type
inference that fail to pinpoint where the error is, etc.

But there is a cost to Factor type checking, and it showed up in my
experience; just like there is a cost to no type checking, and it
shows up in the experience of Forth neophytes.

>Have there been any other factor workshops since 2012.

No.

dxforth

Dec 15, 2020, 7:39:58 PM
On 15/12/2020 21:11, minf...@arcor.de wrote:
> ...
> All that makes some sense for writing safety-critical applications. But it
> is not perfect, costs a lot of coding effort, and catches 'perhaps' only a certain
> class of programming mistakes.

The nearest thing Forth has to type-checking is compiler security. When
the compiler catches a syntax error, we pat it on the back, grateful at
how it avoided a potential catastrophe :)

Doug Hoffman

Dec 16, 2020, 6:21:45 AM
On 12/15/20 7:39 PM, dxforth wrote:

> The nearest thing forth has to type-checking is compiler security.  When
> the compiler catches a syntax error, we pat it on the back, grateful at
> how it avoided a potential catastrophe :)

True. However I get a limited but often useful form of type checking
when I use an objects extension. Judiciously placed stack dumps show
what normally would be hard to understand addresses. But if they are
objects I can send the addresses messages to determine the class of the
object and the contents of the object. Not always useful, but on
occasion it has helped a lot.

-Doug

Doug Hoffman

Dec 16, 2020, 8:13:54 AM
I forgot to mention that when an object is sent a message that it does
not understand then that is a "type error" and the program halts nicely
at that point. Not foolproof, but works well enough to be useful to me.

-Doug

minf...@arcor.de

Dec 16, 2020, 10:09:38 AM
Always better than
1.E 2.E + <-- stack underflow

Doug Hoffman

Dec 16, 2020, 10:27:36 AM
My objects extension, and this may be typical for most or all of the
many Forth object extensions available, may not catch this. I definitely
do not implement the "everything is an object" paradigm.

However Franck Bensusan has shown the way, quite elegantly IMO, to do
exactly that with his Oforth. Oforth does not enforce type checking the
way StrongForth and Factor do, which I see as a big plus for Oforth. But
like a standard Forth object extension, Oforth will halt when a message
like "+" is sent to an object that does not respond to +.

-Doug

Anton Ertl

Dec 16, 2020, 1:12:00 PM
Doug Hoffman <dhoff...@gmail.com> writes:
>I forgot to mention that when an object is sent a message that it does
>not understand then that is a "type error" and the program halts nicely
>at that point.

There have been discussions about supporting the following scheme:

If an object does not understand a message selector, a
message-not-understood message is sent to the object, and the object
can implement a method for that selector to delegate the original
selector to some other object (maybe one that is contained in the
object).

I expect that this does not make errors harder to find and understand:
If the delegated message is not understood either, that shows up very close
to the original invocation. If it is understood, there probably is no error.
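
As an illustration only, Python's `__getattr__` hook behaves much like the message-not-understood scheme: it runs only when normal lookup fails. The `Car` and `Engine` classes are hypothetical, and this is a sketch in a different language, not a proposal for a Forth object extension.

```python
class Engine:
    def start(self):
        return "engine started"


class Car:
    """Delegates any message it does not understand to its engine."""

    def __init__(self):
        self.engine = Engine()

    def honk(self):
        return "beep"

    def __getattr__(self, selector):
        # Called only when normal lookup fails: the message-not-understood hook.
        return getattr(self.engine, selector)


car = Car()
print(car.honk())    # understood directly by Car
print(car.start())   # not understood by Car, delegated to Engine
```

A selector that neither object understands, such as `car.fly()`, raises `AttributeError` right at the call site, which matches the expectation that such errors stay close to the original invocation.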

none albert

Dec 16, 2020, 1:32:01 PM
In article <098fef11-936b-467e...@googlegroups.com>,
In his iterations Charles Moore apparently starts from scratch and
restarts the build, check cycle. Check, not debug, because the steps
are too small for him to make mistakes .. most of the time.
That doesn't work for me, because I do not program fast enough, I have
a psychological barrier against starting from scratch, and I make too
many mistakes even in small steps.
But lately I have changed my style of work, mimicking Charles Moore somewhat,
which means having a build, debug cycle for each separate word.
(Debug instead of check because I make more mistakes.)
Each time I do not debug, or skip adding a test for a simple word, I regret
it. It works by adding a REGRESS to every word, even "trivial" words,
like so.

\ For angle in degrees put sine on the fp stack.
: sine S>F 180 S>F F/ PI F* FSIN ;
REGRESS 90 sine 1E0 F=~ S: TRUE

Now contrary to Charles Moore I don't need to start from scratch for a
redesign, because I kept my tests. Remember the difference is that I
run my tests dozens of times. Charles Moore (and maybe others) just
checks one time and then it runs. Even for a fairly monumental design change,
I can change rapidly, because all problems will be trapped, during the
loading of the redesigned program.

Debugging for e.g. a stack error is just a monumental waste of time,
there should never be a situation where you do not know immediately
where it occurs. In short I'm done with stack errors and the like.
Yes it requires a lot of discipline, and the programs no longer
look like a poem, more like a juridical contract.

For Anton:
I put setting up data structures in the REGRESS too. Like so.

\ Return the cardinality of solutions represented in the table.
: count-table sort-table 0 >R
0 last 1- BEGIN 2DUP <= WHILE (count-table) R> + >R REPEAT
2DROP R> ;
REGRESS init-table 0.9E0 F, 1.E0 F, 2.E0 F, 2.E0 F, 2.E0 F, 2.E0 F, S:
REGRESS fini-table count-table S: 1

In a production run, I do NO-REGRESS and no dummy data structures are
generated.
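
The same habit can be approximated in Python for readers who do not use ciforth. This is a rough sketch, not a port of REGRESS: the `regress` decorator and the `REGRESS_ENABLED` switch are hypothetical names, with the switch standing in for NO-REGRESS.

```python
import math

REGRESS_ENABLED = True   # hypothetical switch; False plays the role of NO-REGRESS


def regress(*args, expect):
    """Run the decorated function once at definition time and check the result."""
    def check(func):
        if REGRESS_ENABLED:
            result = func(*args)
            assert math.isclose(result, expect), f"{func.__name__}{args}: got {result}"
        return func
    return check


@regress(90, expect=1.0)
def sine(degrees):
    """For an angle in degrees, return its sine."""
    return math.sin(degrees * math.pi / 180)
```

Because the check runs while the module loads, a redesign that breaks `sine` fails at load time, at the definition that broke.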

Doug Hoffman

Dec 17, 2020, 2:56:19 AM
On 12/16/20 1:06 PM, Anton Ertl wrote:
> Doug Hoffman <dhoff...@gmail.com> writes:
>> I forgot to mention that when an object is sent a message that it does
>> not understand then that is a "type error" and the program halts nicely
>> at that point.
>
> There have been discussions about supporting the following scheme:
>
> If an object does not understand a message selector, a
> message-not-understood message is sent to the object, and the object
> can implement a method for that selector to delegate the original
> selector to some other object (maybe one that is contained in the
> object).

As usual, your thinking is 2-3 steps ahead of mine. That's an
interesting idea that I'll have to ponder for awhile to see a use case.

-Doug

Anton Ertl

Dec 17, 2020, 5:35:35 AM
Doug Hoffman <dhoff...@gmail.com> writes:
>On 12/16/20 1:06 PM, Anton Ertl wrote:
>> If an object does not understand a message selector, a
>> message-not-understood message is sent to the object, and the object
>> can implement a method for that selector to delegate the original
>> selector to some other object (maybe one that is contained in the
>> object).
>
>As usual, your thinking is 2-3 steps ahead of mine. That's an
>interesting idea that I'll have to ponder for awhile to see a use case.

It's somebody else's idea (from Smalltalk?), and IIRC Andrew Haley
presented it here.

As for use cases, there are probably cool software pattern names that
I don't know, but one use case is to have a general wrapper object that
logs the message and then delegates it to the contained object.

Another use case could be a default behaviour for container objects:
delegate all unknown selectors to the contained objects. In this case
you would have the problem that with multiple calls you have to know
exactly how many stack items are consumed. Maybe closures can help,
but I am not sure that we can make the same closure work for the
message-not-understood case and for the contained objects.
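
The logging wrapper can be sketched in Python as follows. The `Logged` class is a hypothetical name, and because Python passes arguments explicitly, the sketch sidesteps the question of how many stack items a delegated call consumes.

```python
class Logged:
    """General wrapper: logs each message, then delegates it to the wrapped object."""

    def __init__(self, target):
        self._target = target
        self.log = []

    def __getattr__(self, selector):
        # Reached only for selectors that Logged itself does not understand.
        attr = getattr(self._target, selector)
        if not callable(attr):
            return attr

        def logged_call(*args, **kwargs):
            self.log.append((selector, args))
            return attr(*args, **kwargs)
        return logged_call


items = Logged([])
items.append(3)
items.append(1)
items.sort()
print(items.log)   # [('append', (3,)), ('append', (1,)), ('sort', ())]
```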

Marcel Hendrix

Dec 18, 2020, 3:55:05 AM
On Tuesday, December 15, 2020 at 10:25:37 AM UTC+1, Anton Ertl wrote:
> Marcel Hendrix <m...@iae.nl> writes:
> >On Monday, December 14, 2020 at 3:47:11 PM UTC+1, Anton Ertl wrote:
> >> Anyway, my takeaway from the workshop was that
> >> Factor's typing gets in the way rather than having a good effect.
> >
> >Do you remember any motivating examples of serious errors that are
> >caught by such a checker?
> I have had difficulties in debugging code where a lot of addresses of
> structures are flying around. It's not easy to see which address is
> which, and if the structures contain other addresses, I don't see it
> when looking at what the addresses point to, either.
>
> There are classic Forth approaches like testing each word individually
> that I did not do sufficiently for that code, IIRC because quite a bit
> of data needs to be built up before the words can be fully tested.

Actually, that is a real problem I encounter a lot with more ambitious
and longer-lived Forth projects (like my SPICE simulator). After a few
iterations data structures get modified to add new features, and
inevitably bugs are introduced. At that point it is usually not obvious
anymore when and how a data structure comes into existence.
(An example would be an internal list of circuit nodes with their attributes,
versus a list of the symbolic names of these nodes as used by the netlist.)

What I usually do is put a break at a point after such a structure is initialized,
and then debug on the command line. It can be quite frustrating when
it takes a long time for the program to generate the necessary data,
especially when the Forth dies immediately after the bug manifests itself.

> I have been thinking about adding type knowledge to structures and
> structure access words in order to facilitate debugging such code, but
> have not taken action yet.

I don't see how these errors would be caught more easily when types
are introduced. My experience is that the problem most often has to do
with time: the program wants to use data that is not created yet, or wants
to store data in a not yet created structure, or storing or retrieving depends
on yet another structure with an implicit temporal behavior.

Thinking about this suggests that something could be done with a sort
of dynamic 'validation' of structures, or by giving them an 'uninitialized'
property. I am not sure this would qualify as 'typing.'
What I see myself do for bigger programs is adding lots of seemingly
superfluous tests that give detailed error reports (file, line number, name
of executing word, content of stack(s), suggestion for action) and that try
to prevent crashing before any meaningful output is generated.
And I couldn't survive without the word ^^ , which single-steps the code
showing the active file and line number (my text editor can show line
numbers).

> In any case, a Forth type checker would have to know about structures
> to help in such cases.
>
> Other than that, type errors have not been a problem in Forth in my
> experience. I think the reason is that Forthers have a variety of
> ways to avoid such problems (e.g., testing individual words); people
> coming from languages with more typechecking may miss their type
> checker in Forth because they have not yet acquired these skills.

-marcel

Doug Hoffman

Dec 18, 2020, 4:59:52 AM
On 12/15/20 4:15 AM, Anton Ertl wrote:
> Marcel Hendrix <m...@iae.nl> writes:
>> On Monday, December 14, 2020 at 3:47:11 PM UTC+1, Anton Ertl wrote:
>>> Anyway, my takeaway from the workshop was that
>>> Factor's typing gets in the way rather than having a good effect.
>>
>> Do you remember any motivating examples of serious errors that are
>> caught by such a checker?
>
> I have had difficulties in debugging code where a lot of addresses of
> structures are flying around. It's not easy to see which address is
> which, and if the structures contain other addresses, I don't see it
> when looking at what the addresses point to, either.

> I have been thinking about adding type knowledge to structures and
> structure access words in order to facilitate debugging such code, but
> have not taken action yet.

They're called objects. You already have your own extension.

-Doug

Anton Ertl

Dec 18, 2020, 12:05:01 PM
Marcel Hendrix <m...@iae.nl> writes:
>On Tuesday, December 15, 2020 at 10:25:37 AM UTC+1, Anton Ertl wrote:
>> There are classic Forth approaches like testing each word individually
>> that I did not do sufficiently for that code, IIRC because quite a bit
>> of data needs to be built up before the words can be fully tested.
>
>Actually, that is a real problem I encounter a lot with more ambitious
>and longer living Forth projects (like my SPICE simulator). After a few
>iterations datastructures get modified to add new features, and
>inevitably bugs are introduced. At that point it is usually not obvious
>anymore when and how a datastructure comes into existence.
>(An example would be an internal list of circuit nodes with their attributes,
>versus a list of the symbolic names of these nodes as used by the netlist.)
>
>What I usually do is put a break at a point after such a structure is initialized,
>and then debug on the commandline. It can be quite frustrating when
>it takes a long time for the program to generate the necessary data,
>especially when the Forth dies immediately after the bug manifests itself.

That was not a problem in my case. The problem was that the resulting
data structures were full of 64-bit addresses pointing to other
structures which were full of 64-bit addresses. It's like being in a
maze where every room looks the same.

>> I have been thinking about adding type knowledge to structures and
>> structure access words in order to facilitate debugging such code, but
>> have not taken action yet.
>
>I don't see how these errors would be easier caught when types
>are introduced. My experience is that the problem most often has to do
>with time: the program wants to use data that is not created yet, or wants
>to store data in a not yet created structure,

My thinking was to use run-time checking with field-addressing words
that (in debugging mode) check a tag field at the start of the
structure. That should usually catch both of these cases.
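
A small simulation of that idea in Python, assuming cell-addressed memory modelled as a list, where a zero cell means "never initialised". The names `field`, `make_struct`, and the tag values are all hypothetical, and nothing here is taken from an existing Forth system.

```python
DEBUG = True
memory = [0] * 64          # simulated memory; a 0 tag means "never initialised"

NODE_TAG, NAME_TAG = 0x4E4F4445, 0x4E414D45   # arbitrary, hypothetical tag values


def field(tag, offset):
    """Build a field-addressing function that checks the structure's tag first."""
    def address_of(struct_addr):
        if DEBUG and memory[struct_addr] != tag:
            raise TypeError(
                f"address {struct_addr}: expected tag {tag:#x}, "
                f"found {memory[struct_addr]:#x}"
            )
        return struct_addr + offset
    return address_of


def make_struct(addr, tag):
    memory[addr] = tag      # the tag occupies the first cell of the structure
    return addr


node_voltage = field(NODE_TAG, 1)
name_length = field(NAME_TAG, 1)

node = make_struct(8, NODE_TAG)
memory[node_voltage(node)] = 5   # fine: node really is a node
```

Calling `name_length(node)` (wrong structure) or `node_voltage(16)` (structure not created yet) raises `TypeError`; with `DEBUG` set to `False` the accessors reduce to a plain offset addition.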

>or storing or retrieving depends
>on yet another structure with an implicit temporal behavior.

Not sure if what I am thinking of would help in these cases.

>Thinking about this suggests that something could be done with a sort
>of dynamic 'validation' of structures, or by giving them an 'uninitialized'
>property.

Yes, one could do that, too.

>I am not sure this would qualify as 'typing.'

Well, if you look at Rust, it tries to use static type checking to
ensure that the problems you experience don't happen; so a dynamic
variant of that checking could also be called type checking.

Anton Ertl

Dec 18, 2020, 12:11:47 PM
Doug Hoffman <dhoff...@gmail.com> writes:
>> I have been thinking about adding type knowledge to structures and
>> structure access words in order to facilitate debugging such code, but
>> have not taken action yet.
>
>They're called objects. You already have your own extension.

My own extension does not check. Also, the idea here is to optionally
check for structures that are not otherwise used in an object-oriented
way.

Alex McDonald

Dec 18, 2020, 1:41:08 PM
On 18-Dec-20 16:45, Anton Ertl wrote:


> The problem was that the resulting
> data structures were full of 64-bit addresses pointing to other
> structures which were full of 64-bit addresses. It's like being in a
> maze where every room looks the same.
>

This is a huge human debugging problem, and I have experienced it too.
Windows has an address randomisation feature that makes visual address
matching even harder, since the addresses returned for allocated memory
shift about from run to run. There are a few (not all satisfactory)
solutions:

Split the address, in this example by dotting at 8 nibbles
(xxxxxxxx.yyyyyyyy) or every 4 (wwww.xxxx.yyyy.zzzz) or some combination
like 4.4.8. This seems to help a little; it allows for easier pattern
matching.

Inspect addresses before dumping and replace common prefixes with
characters; so A.xxxxxxxx B.yyyyyyyy A.zzzzzzzz. This dispenses with
inspecting & memorizing the prefix, and (surprising to me) much
allocation under Windows for reasonable sizes of memory results in the
same leading 8 nibbles (and sometimes more).

It requires a smart dump that remembers prefixes between dumps; it is no
good having A.40205020 on one dump and A.40203030 on the next when
the two As stand for completely different prefixes. Perhaps a simple substitution:

$7f301000 "A" dump-prefix blockA
blockA 400 dump-smart

and have a smart dump match all $7f301000 to the string "A".
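
A sketch of the formatting side of this in Python, assuming 64-bit addresses and treating the registered value as the leading 8 nibbles. `dump_prefix` here is a hypothetical function named after the word suggested above, and `show` is invented for illustration.

```python
PREFIXES = {}   # upper 32 bits of an address -> short label


def dump_prefix(prefix, label):
    """Register a label for every address whose leading 8 nibbles equal prefix."""
    PREFIXES[prefix] = label


def show(addr):
    """Format a 64-bit address as LABEL.low32 if the prefix is known, else hi32.low32."""
    high, low = addr >> 32, addr & 0xFFFFFFFF
    head = PREFIXES.get(high, f"{high:08x}")
    return f"{head}.{low:08x}"


dump_prefix(0x7F301000, "A")
print(show(0x7F30100040205020))   # A.40205020
print(show(0x00007FFE12345678))   # 00007ffe.12345678
```

Keeping the table outside the dump routine is what lets a label mean the same prefix from one dump to the next.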


--
Alex

none albert

Dec 19, 2020, 11:34:52 AM
In article <2020Dec1...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>Doug Hoffman <dhoff...@gmail.com> writes:
>>On 12/16/20 1:06 PM, Anton Ertl wrote:
>>> If an object does not understand a message selector, a
>>> message-not-understood message is sent to the object, and the object
>>> can implement a method for that selector to delegate the original
>>> selector to some other object (maybe one that is contained in the
>>> object).
>>
>>As usual, your thinking is 2-3 steps ahead of mine. That's an
>>interesting idea that I'll have to ponder for awhile to see a use case.
>
>It's somebody else's idea (from Smalltalk?), and IIRC Andrew Haley
>presented it here.
>
>As for use cases, there are probably cool software pattern names that
>I don't know, but one use case is to have a general wrapper object that
>logs the message and then delegates it to the contained object.

There is a category error here. An object is a data structure and
cannot log anything.
An object's method is an action, and it can log its input and output.

The design pattern to add debugging is called a decorator.

Decorators and classes/objects are both present in ciforth, and are
independent features. Methods in a class execute xt's. Every high
level xt (non-code definition) can be decorated, then later
undecorated.

Using it looks like:

' decorator ' method decorated
' method undecorated

Both the decorated and undecorated words are of course "carnal" sins.

In ciforth decorators are more powerful thanks to the existence of the `` CO ''
coroutine call. In the above example
: decorator .S CO .S ;
prints the stack before and after `` method '' runs.
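
For comparison, a loose Python analogue, in which a generator's `yield` stands in for CO and a dictionary stands in for the Forth dictionary. This is only an illustration of the pattern, not how ciforth implements it; `words`, `stack`, `trace`, `decorated`, and `undecorated` are hypothetical stand-ins.

```python
words = {}    # hypothetical stand-in for the dictionary: name -> callable
stack = []
trace = []    # records stack snapshots instead of printing them


def method():
    stack.append(stack.pop() * 2)


def decorator():
    trace.append(list(stack))   # like .S before the word runs
    yield                       # like CO: the decorated word runs here
    trace.append(list(stack))   # like .S after the word has run


def decorated(deco, name):
    original = words[name]

    def wrapper():
        gen = deco()
        next(gen)               # run the decorator up to its yield
        original()
        next(gen, None)         # resume the decorator after the yield
    wrapper.original = original
    words[name] = wrapper


def undecorated(name):
    words[name] = getattr(words[name], "original", words[name])


words["method"] = method
decorated(decorator, "method")
stack.append(21)
words["method"]()
undecorated("method")
print(trace)   # [[21], [42]]
```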

<SNIP>

>
>- anton

Andy Valencia

Dec 19, 2020, 1:54:25 PM
albert@cherry.(none) (albert) writes:
> ' decorator ' method decorated
> ' method undecorated

How do you name the method? The instance of a distinct class
will probably execute a distinct method, even if the method name
is implemented across multiple classes.

FWIW in OO for ForthOS, there's a dict "oo" with class:method names
in it. Since ForthOS also allows direct naming of a dict with
"dictname." prefix:

' oo.MyClass:myMethod

will pick up the xt for the method for this class.

Andy

Ron AARON

Dec 20, 2020, 3:33:43 AM
Because there have been requests over the years for OOP in 8th, I'm
finally implementing an object type which is the basis for any OOP the
user wants to build.

The current implementation is similar to Javascript's, in that a new
object is created from an existing one (its 'super'), and if a name is
given, then a new class is also created (based on the super).

One of my users asked if I'll have 'multiple-inheritance', and my
initial response is that that's not in the current implementation. My
further response is that years of experience with C++ and Java have not
led me to believe that MI is particularly useful or clear (in practice).

However, I'm trying to think through how I would implement MI without
making my life difficult. Having multiple classes as a superclass could
be easily done with a linked-list (for instance) or an array. But having
superclasses A and B both implementing a 'foo' method leads to some
extra joy.

My instinct is that 'foo' should be resolved by the first superclass
which implements it. That leaves open the question of how best to allow
invocation of a specific superclass' 'foo'.

Further, as currently implemented, the word 'super' invokes the
superclass's version of whatever method is currently being invoked. If
there are multiple superclasses, that further complicates matters.

Have any of you implemented MI in your Forths?
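
For reference, Python resolves this case the way the instinct above suggests: the first listed superclass that implements the method wins, and a specific superclass's version is reached by naming that class explicitly. The classes below are hypothetical, and this shows Python's behaviour, not anything in 8th.

```python
class A:
    def foo(self):
        return "A.foo"


class B:
    def foo(self):
        return "B.foo"

    def bar(self):
        return "B.bar"


class C(A, B):
    def foo_from_b(self):
        # Explicit naming picks one specific superclass's version.
        return B.foo(self)


c = C()
print(c.foo())          # A.foo: the first superclass in the list wins
print(c.bar())          # B.bar: only B implements it
print(c.foo_from_b())   # B.foo
print([k.__name__ for k in C.__mro__])   # ['C', 'A', 'B', 'object']
```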

none albert

Dec 20, 2020, 6:45:03 AM
In article <160840385332.30483....@media.vsta.org>,
Andy Valencia <van...@vsta.org> wrote:
>albert@cherry.(none) (albert) writes:
>> ' decorator ' method decorated
>> ' method undecorated
>
>How do you name the method? The instance of a distinct class
>will probably execute a distinct method, even if the method name
>is implemented across multiple classes.
>
>FWIW in OO for ForthOS, there's a dict "oo" with class:method names
>in it. Since ForthOS also allows direct naming of a dict with
>"dictname." prefix:
>
> ' oo.MyClass:myMethod
>

You solved the question yourself. There must be an xt associated
with the method, and that depends on the oo package you use.
In ciforth in particular there cannot be two methods for different
objects with the same name, so there is no hurdle here.

In general of course we can have two words with the same name. That is
solved with wordlists. So if one requires same name methods for different
classes, in my book a wordlist has to be associated with a class.
That is not hard, but I will build that bridge when there is a river.

>
>Andy

Doug Hoffman

Dec 20, 2020, 8:40:50 AM
On 12/20/20 3:33 AM, Ron AARON wrote:

> Because there have been requests over the years for OOP in 8th, I'm
> finally implementing an object type which is the basis for any OOP the
> user wants to build.

> One of my users asked if I'll have 'multiple-inheritance', and my
> initial response is that that's not in the current implementation. My
> further response is that years of experience with C++ and Java have not
> led me to believe that MI is particularly useful or clear (in practice).

Considering the Forth minimalist mindset, that belief is correct IMO.

> However, I'm trying to think through how I would implement MI without
> making my life difficult. Having multiple classes as a superclass could
> be easily done with a linked-list (for instance) or an array. But having
> superclasses A and B both implementing a 'foo' method leads to some
> extra joy.
>
> My instinct is that 'foo' should be resolved by the first superclass
> which implements it. That leaves open the question of how best to allow
> invocation of a specific superclass' 'foo'.
>
> Further, as currently implemented, the word 'super' invokes the
> superclass's version of whatever method is currently being invoked. If
> there are multiple superclasses, that further complicates matters.
>
> Have any of you implemented MI in your Forths?

Yes. Based on Mike Hore's Mops model. You have correctly identified most
of the implementation issues. It is doable.

The bigger problem is understanding the behavior of objects that have
deeply nested MI at various places in the chain of superclasses.

While MI can be elegant, so is a meta object protocol (MOP). I have
dropped MI for my own use and have no idea how to do a MOP.

IMO the more important features to support are polymorphism with duck
typing as in Oforth. But do it however you want, classic Forthers do. :-)

-Doug

Paul Rubin

Dec 20, 2020, 12:42:46 PM
Ron AARON <c...@8th-dev.com> writes:
> One of my users asked if I'll have 'multiple-inheritance', and my
> initial response is that that's not in the current implementation. My
> further response is that years of experience with C++ and Java have
> not led me to believe that MI is particularly useful or clear (in
> practice).

You might look at CLOS (Common Lisp Object System) which I believe is
mostly descended from Flavors (the MIT Lisp Machine System). These (or
at least Flavors) let you specify how method combination is supposed to
work. That said, I've never used either in practice. I think I
understood Flavors from reading the manual and implementing a subset of
it in Emacs, but I never made that much sense of CLOS.

When I tried using MI in Python, things got confusing very fast. These
days I'm not much of a believer in OO in general. It has gone out of
style in C++ as well, with people preferring template generics. In 8th
you'd use runtime dispatch and that would avoid the C++ code bloat, at
the cost of a little bit of speed.

Ron AARON

Dec 21, 2020, 2:13:26 AM
Right. Polymorphism is handled in my implementation. I'll not be doing
MI, it's just too messy and there's little advantage.

The current implementation is pretty solid; now to see if anyone
actually wants to use it after having asked for it...

Ron AARON

Dec 21, 2020, 2:16:18 AM
Yes. Runtime dispatch also makes it easy to override behavior without
resorting to deferred-words.

Regarding OOP in C++, the trend in recent years has been to make shallow
hierarchies. Templates are also popular, as you point out; I've always
found them confusing as hell and prone to breakage (and very (!)
difficult to debug).

Paul Rubin

Dec 21, 2020, 3:34:11 AM
Ron AARON <c...@8th-dev.com> writes:
> Regarding OOP in C++, the trend in recent years has been to make
> shallow hierarchies. Templates are also popular, as you point out;
> I've always found them confusing as hell and prone to breakage (and
> very (!) difficult to debug).

The Concepts extension should help with the error messages somewhat.
It's too late for C++ but there are other approaches to generics as well. C++ went
for templates because it's built around the idea of zero-cost
abstraction so they don't want a generic call to involve any type of
runtime dispatch.

I thought this old article comparing generics in a bunch of different
languages (C++, OCaml, Haskell, etc.) was pretty good, but it was far
away from how Forth could sanely do things:

https://www.semanticscholar.org/paper/An-extended-comparative-study-of-language-support-Garcia-J%C3%A4rvi/4329c6bb865abdf916d157a918605f1b76425cd8

It might be worth taking a closer look at CLOS generics though.

Ron AARON

Dec 21, 2020, 3:36:33 AM
I'll take a look, thanks.

Lars Brinkhoff

Dec 21, 2020, 9:14:58 AM
Paul Rubin wrote:
> You might look at CLOS (Common Lisp Object System) which I believe is
> mostly descended from Flavors (the MIT Lisp Machine System).

I have a vague memory that CLOS came from CommonLoops, which in turn came
from Loops for Interlisp-D. Or maybe CLOS was a merger of that
and New Flavors.

Anton Ertl

Dec 21, 2020, 1:18:23 PM
Alex McDonald <al...@rivadpm.com> writes:
>On 18-Dec-20 16:45, Anton Ertl wrote:
>
>
>> The problem was that the resulting
>> data structures were full of 64-bit addresses pointing to other
>> structures which were full of 64-bit addresses. It's like being in a
>> maze where every room looks the same.
>>
>
>This is a huge human debugging problem, and I have experienced it too.
>Windows has an address randomisation feature that makes visual address
>matching even harder, since the addresses returned for allocated memory
> shift about from run to run.

Linux has the same feature (?), but you can turn it off (as root) with

echo 0 >/proc/sys/kernel/randomize_va_space

But in the case I was thinking of, the maze problem was also present
within a single session.

>Inspect addresses before dumping and replace common prefixes with
>characters; so A.xxxxxxxx B.yyyyyyyy A.zzzzzzzz. This dispenses with
>inspecting & memorizing the prefix, and (surprising to me) much
>allocation under Windows for reasonable sizes of memory results in the
>same leading 8 nibbles (and sometimes more).

You can invoke Gforth (development version) with

gforth --map-32bit

and on AMD64 this will put the stuff allocated by Gforth directly with
mmap (in particular, the dictionary and the stacks) into the first 4GB
if possible. This does not help for ALLOCATEd memory, though.

>It requires a smart dump, and remembering prefixes between dumps; all
>very well having A.40205020 on one dump and A.40203030 on the next where
>the As are completely different. Perhaps a simple substitution;
>
>$7f301000 "A" dump-prefix blockA
>blockA 400 dump-smart
>
>and have a smart dump match all $7f301000 to the string "A".

Sounds like a good idea.

Gforth now also has ... (a "smart" variant of .s), which guesses what
it is looking at and shows the data accordingly. In particular, if
the cell looks like an address in the dictionary, it shows it as:

* as FOO, the name of the word, for body addresses of created words and variables

* as <FOO>, for body addresses of other words

* as `FOO, for the xt of words

* as ``BAR, for the nt of synonyms

* as <FOO+$5f>, for other addresses in the dictionary.

I.e.:

Input:
create foo ok
synonym bar foo
bar ``bar >body ' bar ``bar ``bar 1- ...

prints:
<5> foo <bar> `foo ``bar <foo+$1F>

This does not help for ALLOCATEd data. Defining anchors for allocated
data as you suggest may be a way to deal with that.

Anton Ertl

Dec 21, 2020, 1:33:41 PM
Ron AARON <c...@8th-dev.com> writes:
>One of my users asked if I'll have 'multiple-inheritance', and my
>initial response is that that's not in the current implementation. My
>further response is that years of experience with C++ and Java have not
>led me to believe that MI is particularly useful or clear (in practice).

Stroustrup wrote that he did not see the case for multiple
inheritance but that he added it to C++ to allow implementing mixins.
From what I understand about mixins, they only combine methods, so
something like Java's interfaces or Smalltalk-like duck typing may
be good enough for that.

>But having
>superclasses A and B both implementing a 'foo' method leads to some
>extra joy.
>
>My instinct is that 'foo' should be resolved by the first superclass
>which implements it.

If the method implementation is not the same, IMO this should be an error.

>Further, as currently implemented, the word 'super' invokes the
>superclass's version of whatever method is currently being invoked. If
>there are multiple superclasses, that further complicates matters.

Instead of "super", use explicit naming.
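
A minimal sketch of what explicit naming could look like in plain Forth, without any OO package. All names (a.foo, b.foo, c.foo) and the one-cell "object" layout are assumptions made for illustration; this is not how 8th or any particular Forth OO extension does it.

```forth
\ Hypothetical: an "object" is just the address of one cell holding a number.
: a.foo ( obj -- n )  @ 1+ ;   \ foo as implemented by superclass A
: b.foo ( obj -- n )  @ 2* ;   \ foo as implemented by superclass B

\ A class inheriting from both names the implementations it wants,
\ so there is nothing left for a "super" to guess.
: c.foo ( obj -- n )  dup a.foo  swap b.foo  + ;

\ Usage
variable obj  5 obj !
obj c.foo .    \ prints 16
```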

>Have any of you implemented MI in your Forths?

I have not.

Anton Ertl

Dec 21, 2020, 1:53:37 PM
Paul Rubin <no.e...@nospam.invalid> writes:
>It's too late for C++ but there are other approaches to generics as well. C++ went
>for templates because it's built around the idea of zero-cost
>abstraction so they don't want a generic call to involve any type of
>runtime dispatch.
>
>I thought this old article comparing generics in a bunch of different
>languages (C++, OCaml, Haskell, etc.) was pretty good, but it was far
>away from how Forth could sanely do things:
>
>https://www.semanticscholar.org/paper/An-extended-comparative-study-of-language-support-Garcia-J%C3%A4rvi/4329c6bb865abdf916d157a918605f1b76425cd8

Generics exist to allow implementing stuff like containers (e.g.,
lookup tables, sets, etc.) in a sane way in statically type-checked
languages; e.g., the paper you link to mentions only statically
type-checked languages AFAIK (I don't know Cn). Languages with
dynamic type-checking like Smalltalk implement containers just fine
without needing generics. Forth does not have type-checking, so you
do not need generics to allow putting a variety of types into
containers. If you want to make an array of vehicles, an array of
cells will do just fine, and you can then put addresses of car
objects, lorry objects, and bicycle objects in it; no type-checker
will complain.
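
A minimal sketch of such an array in standard Forth. The three "objects" are just created data areas of different sizes, and the names vehicles and vehicle are made up for this example.

```forth
\ Three unrelated objects; sizes chosen arbitrarily for illustration.
create car      4 cells allot
create lorry    8 cells allot
create bicycle  2 cells allot

\ An array of three cells, each holding the address of one object.
create vehicles  car , lorry , bicycle ,

\ Fetch the address stored in slot i (0-based).
: vehicle ( i -- addr )  cells vehicles + @ ;

\ Usage
1 vehicle lorry = .    \ prints -1 (true)
```

Nothing checks what the stored addresses point to; that is left to the code that uses the array.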

>It might be worth taking a closer look at CLOS generics though.

Looking at

https://en.wikipedia.org/wiki/Generic_function#In_Common_Lisp_Object_System

it seems that a CLOS generic function (the only use of "generic" in
<https://en.wikipedia.org/wiki/CLOS>) is a method selector, and has
nothing to do with generics in Java or templates in C++.

It's interesting that Forth followed a similar path as Lisp wrt
syntax: Early Forth OO extensions (e.g., Neon) had a
Smalltalk-inspired syntax, while later ones look more like Forth, and
it seems to me that we now have a consensus that the latter is better.

Paul Rubin

Dec 22, 2020, 4:14:28 AM
Lars Brinkhoff <lars...@nocrew.org> writes:
> I have a vague memory CLOS came from CommonLoops, which in turn came
> from Loops for Interlisp-D.

You might be right, and that would explain why I didn't understand much
of it when I tried to read about it back in the day. I never used more
than the simplest features of it.

Flavors made reasonable sense though there was a perception that SEND
was a hack and CLOS's approach was better. But, the motivation for MI
in Flavors (and maybe Loops) was mixins, and in retrospect it doesn't
seem to have been a great way to do those.

I had thought C++'s MI didn't support mixins the way Flavors did.
E.g. you can't have multiple superclasses with automatic method
combination. Flavors needed that, because the driving example was a
window system with superclasses like window-with-borders,
window-with-scrollbars, etc. You'd pick the features you wanted your
window to have, and inherit from that set of superclasses.

Hans Bezemer

Jan 19, 2021, 9:00:47 AM
On Saturday, December 5, 2020 at 4:06:29 PM UTC+1, Ruvim wrote:
> 1. What symbols (if any) do you use to denote a *single-cell* value that
> identifies a string? (NB: it doesn't matter what particular library is
> used).
NONE. The only thing you can do with a single cell string value is either STORE something there (using (+)PLACE) or retrieve something (using COUNT).
Hence, the use of these values is quite limited in 4tH.

> 2. What symbols (if any) do you use to denote a *cell-pair* that
> represents a character string? (NB: without referring the individual
> components).
NONE. Those values live on the stack for most of their lifetime, or appear as string constants, and are thus quite easy to spot.
They are rarely stored in a 2VARIABLE.

There is a dynamic string library in 4tH, which essentially turns a cell or array of cells into dynamic strings. The complete API is below:

\ ds.free v --
\ ds.init v --
\ ds.build v n --
\ ds.destroy --
\ ds.place a n v --
\ ds.count v -- a n
\ ds+place a n v --

Any string manipulation is done after the string has been turned into its addr/count representation by DS.COUNT.

Hans Bezemer