
ALLOT?


dxforth

Aug 30, 2021, 1:14:51 AM
On 30/08/2021 12:22, Krishna Myneni wrote:

> : allot? ( u -- a ) here swap allot ;

IIRC you've had this forever. I never thought much about it at
the time. Now a convert, I find uses for it all the time e.g.

$95  constant TLEN      \ length of each term definition
$100 constant CHUNK     \ in-file chunk to get
20   constant ISIZ      \ size input buffer / terminal name
200  constant TMAX      \ max #terminals

\ Storage areas allocated at run-time
here value TBUF     ( -- a )  \ temp terminal buffer
here value SBUF     ( -- a )  \ swap/work buffer
here value IBUF     ( -- a )  \ console input
here value XBUF     ( -- a )  \ terminal index

: INIT ( -- )
  altered off                 \ clear
  isiz       reserve to ibuf  \ console input
  tlen       reserve to tbuf  \ temp terminal buffer
  chunk      reserve to sbuf  \ swap/work buffer
  tmax cells reserve to xbuf  \ terminal index
;

Krishna Myneni

Aug 30, 2021, 8:26:31 AM
On 8/30/21 12:14 AM, dxforth wrote:
> On 30/08/2021 12:22, Krishna Myneni wrote:
>
>> : allot?  ( u -- a ) here swap allot ;
>
> IIRC you've had this forever.  I never thought much about it at
> the time.  Now a convert, I find uses for it all the time e.g.
>

ALLOT? was originally called ?ALLOT in kForth, but that word is
deprecated now in kForth, in favor of ALLOT?, following a suggestion on
word naming by Ruvim. ALLOT? is essential for making use of the dynamic
dictionary model implemented in kForth, which has no HERE.

> $95  constant TLEN      \ length of each term definition
> $100 constant CHUNK     \ in-file chunk to get
> 20   constant ISIZ      \ size input buffer / terminal name
> 200  constant TMAX      \ max #terminals
>
> \ Storage areas allocated at run-time
> here value TBUF     ( -- a )  \ temp terminal buffer
> here value SBUF     ( -- a )  \ swap/work buffer
> here value IBUF     ( -- a )  \ console input
> here value XBUF     ( -- a )  \ terminal index
>
> : INIT ( -- )
>   altered off                 \ clear
>   isiz       reserve to ibuf  \ console input
>   tlen       reserve to tbuf  \ temp terminal buffer
>   chunk      reserve to sbuf  \ swap/work buffer
>   tmax cells reserve to xbuf  \ terminal index
> ;

RESERVE is a synonym of ALLOT? .
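
A minimal sketch of the idiom under discussion, assuming a system that still provides HERE (kForth, per the above, does not and supplies ALLOT? itself). The names `scratch` and `init-scratch` are hypothetical, chosen only for illustration.

```forth
\ Sketch: RESERVE as a synonym of ALLOT? on a system with HERE.
: reserve ( u -- a )  here swap allot ;

0 value scratch                      \ hypothetical: address filled in at run time

: init-scratch ( -- )
  20 reserve to scratch              \ claim 20 address units, remember the start
  scratch 20 [char] * fill ;         \ initialise the region

init-scratch
scratch 3 type                       \ displays three asterisks
```

The point of the word is that the address is only known after INIT-style code has run, so the dictionary image stays small until then.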

--
Krishna

Ruvim

Aug 30, 2021, 10:11:44 AM
On 2021-08-30 15:26, Krishna Myneni wrote:
> On 8/30/21 12:14 AM, dxforth wrote:
>> On 30/08/2021 12:22, Krishna Myneni wrote:
>>
>>> : allot?  ( u -- a ) here swap allot ;
>>
>> IIRC you've had this forever.  I never thought much about it at
>> the time.  Now a convert, I find uses for it all the time e.g.
>>
>
> ALLOT? was originally called ?ALLOT in kForth, but that word is
> deprecated now in kForth, in favor of ALLOT?, following a suggestion on
> word naming by Ruvim. ALLOT? is essential for making use of the dynamic
> dictionary model implemented in kForth, which has no HERE.

AFAIR, I only criticized the "?ALLOT" name for this word, and I didn't
suggest "ALLOT?" as a better variant.

Usually the question mark at the end (like "X?") means that this word
returns a *flag* at the top; usually there is also a variant of the word
without the question mark (like "X") that doesn't return a flag.

For example, "KEY ( -- char )" returns a character,
and "KEY? ( -- flag )" returns only a flag about the availability of a
character. Ditto for "EKEY" and "EKEY?", "XKEY" and "XKEY?".

The word "XC!+?" returns a flag, while "XC!+" doesn't.

The word "ENVIRONMENT?" doesn't have a pair, but it also returns a flag
at the top.

The idea is that if a name ends with the question mark, then the word
returns a flag at the top (NB: the converse does not hold).

Obviously, the name "ALLOT?" for ( u -- addr ) breaks this convention.




A possible variant for your word is "ALLOTED ( u -- addr )"

But I would prefer "ALLOTED ( u -- addr u )".
And I used "ALLOCATED ( u -- addr u )" for allocating in the heap,
since usually my APIs allow setting a buffer of arbitrary size.

For example:

256 1024 * ALLOCATED DATASPACE!

buf-size-default ALLOCATED buf1!
buf-size-default ALLOCATED buf2!

where
DATASPACE! ( addr u -- )

buf1 ( -- addr u )
buf1! ( addr u -- )

buf2 ( -- addr u )
buf2! ( addr u -- )
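
A sketch of how words with these stack effects might be defined. The definitions are guesses that match the stack comments above, not the actual implementation; `buf1-slot` is a hypothetical name, and ALLOCATE and THROW come from the optional Memory-Allocation and Exception word sets.

```forth
\ Sketch only: definitions inferred from the stack comments.
: alloted   ( u -- addr u )  here over allot swap ;      \ dictionary space
: allocated ( u -- addr u )  dup allocate throw swap ;   \ heap; THROWs on failure

\ A hypothetical (addr u) slot with a setter and a getter, like buf1! and buf1.
create buf1-slot  2 cells allot
: buf1! ( addr u -- )  buf1-slot 2! ;
: buf1  ( -- addr u )  buf1-slot 2@ ;

64 allocated buf1!
buf1 nip .                           \ displays 64
```

Returning the size along with the address lets the result be passed straight to any word that takes an ( addr u ) pair.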



Among the standard words we have: "ALIGN ( -- )"
and "ALIGNED ( addr1 -- addr2 )"




>
>> $95  constant TLEN      \ length of each term definition
>> $100 constant CHUNK     \ in-file chunk to get
>> 20   constant ISIZ      \ size input buffer / terminal name
>> 200  constant TMAX      \ max #terminals
>>
>> \ Storage areas allocated at run-time
>> here value TBUF     ( -- a )  \ temp terminal buffer
>> here value SBUF     ( -- a )  \ swap/work buffer
>> here value IBUF     ( -- a )  \ console input
>> here value XBUF     ( -- a )  \ terminal index
>>
>> : INIT ( -- )
>>    altered off                 \ clear
>>    isiz       reserve to ibuf  \ console input
>>    tlen       reserve to tbuf  \ temp terminal buffer
>>    chunk      reserve to sbuf  \ swap/work buffer
>>    tmax cells reserve to xbuf  \ terminal index
>> ;
>
> RESERVE is a synonym of ALLOT? .

Obviously, yes.

"RESERVE" is better, but I'm not sure that this name is good enough to
recommend it for the FORTH word list.


--
Ruvim

Brian Fox

Aug 30, 2021, 10:45:05 AM
On 2021-08-30 10:11 AM, Ruvim wrote:
>
> Obviously, yes.
>
> "RESERVE" is better, but I'm not sure that this name is good enough to
> recommend it for the FORTH word list.
>
>
> --
> Ruvim

Although not a proper English word, ALLOC might be a name for this
thing.

Anton Ertl

Aug 30, 2021, 11:15:11 AM
Yes, it's not a good name.

Similarly, a "?" at the start means that the word takes a flag and/or
that the word may or may not perform an action: e.g., ?DUP ?EXIT
?BRANCH.
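
As a small illustration of the leading-"?" convention, ?DUP duplicates the top stack item only when it is non-zero. The word `.nonzero` below is a hypothetical name used just for this sketch.

```forth
\ ?DUP ( x -- 0 | x x ): conditional duplication.
: .nonzero ( n -- )  ?dup if . then ;

5 .nonzero      \ displays 5
0 .nonzero      \ displays nothing
```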

Incidentally, I just needed such a word a few days ago; there is
already a word (originally for a special purpose) in Gforth that does
that: SMALL-ALLOT, so I used that. But the word and its name are not
appropriate for general-purpose usage (it is limited to allocations
smaller than a page).

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2021: https://euro.theforth.net/2021

Doug Hoffman

Aug 30, 2021, 12:19:46 PM
issue
assign
supply
provide
give
grant
furnish
impart
bestow
extend
reserve

Ruvim

Aug 30, 2021, 2:25:14 PM
From this list I like "provide" and "impart".

OTOH, it's desirable to have an association with "allot", I think.

A possible variant:

allotted-head ( u -- addr )


--
Ruvim

Krishna Myneni

Aug 30, 2021, 2:59:26 PM
On 8/30/21 9:11 AM, Ruvim wrote:
> On 2021-08-30 15:26, Krishna Myneni wrote:
>> On 8/30/21 12:14 AM, dxforth wrote:
>>> On 30/08/2021 12:22, Krishna Myneni wrote:
>>>
>>>> : allot?  ( u -- a ) here swap allot ;
>>>
>>> IIRC you've had this forever.  I never thought much about it at
>>> the time.  Now a convert, I find uses for it all the time e.g.
>>>
>>
>> ALLOT? was originally called ?ALLOT in kForth, but that word is
>> deprecated now in kForth, in favor of ALLOT?, following a suggestion
>> on word naming by Ruvim. ALLOT? is essential for making use of the
>> dynamic dictionary model implemented in kForth, which has no HERE.
>
> AFAIR, I only criticized the "?ALLOT" name for this word, and I didn't
> suggest "ALLOT?" as a better variant.
>

Ok. I may be misremembering. I may have just changed the name to ALLOT?
after considering your criticism of ?ALLOT to be justified. In any case,
I'm not willing to tinker with it further. ALLOT? does return something,
though it is not a flag.


Krishna

NN

Aug 30, 2021, 4:19:04 PM
Why do we need a new word when BUFFER: exists?

https://forth-standard.org/standard/core/BUFFERColon

Doug Hoffman

Aug 30, 2021, 4:33:44 PM
On 8/30/21 2:25 PM, Ruvim wrote:

> ... it's desirable to have an association with "allot", I think.

I agree.

> A possible variant:
>
> allotted-head ( u -- addr )

That would work.

-Doug

P Falth

Aug 30, 2021, 5:12:39 PM
Exactly my thought also!

$95 constant TLEN
TLEN BUFFER: TBUF

works very well

BR
Peter

> https://forth-standard.org/standard/core/BUFFERColon

Hans Bezemer

Aug 30, 2021, 5:57:50 PM
On Monday, August 30, 2021 at 7:14:51 AM UTC+2, dxforth wrote:
I agree, these are bad names. I think I'd do something like this:

200 constant #terms
256 constant /chunk
aka chars units

#terms cells assign to xbuf
/chunk chars assign to sbuf

Hans Bezemer

Paul Rubin

Aug 30, 2021, 6:49:33 PM
NN <novembe...@gmail.com> writes:
> Why do we need a new word when BUFFER: exists
> https://forth-standard.org/standard/core/BUFFERColon

Why is that in CORE? CREATE foo 20 ALLOT is what I always used.
ALLOT? is for when you don't want to create a name.
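
To make the distinction concrete, here is a sketch of the three approaches side by side. The buffer names and the four-entry table are hypothetical; BUFFER: requires a system with the Forth 2012 Core extension words.

```forth
20 buffer: buf-a                     \ BUFFER: names an uninitialised region
create buf-b 20 allot                \ the older CREATE ... ALLOT idiom

: allot? ( u -- a )  here swap allot ;   \ as defined at the top of the thread

\ Unnamed regions: a hypothetical table of four 20-unit buffers.
create bufs  4 cells allot
: init-bufs ( -- )  4 0 do  20 allot?  bufs i cells + !  loop ;
init-bufs
```

The first two forms bind a name at definition time; with ALLOT? the address is just a stack value that can be stored wherever the program wants it.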

Paul Rubin

Aug 30, 2021, 6:51:20 PM
Ruvim <ruvim...@gmail.com> writes:
> allotted-head ( u -- addr )

How about ALLOT>, based on things like R> plopping an address onto the stack?

Ruvim

Aug 30, 2021, 8:50:29 PM
In ">r", "r>" (and similar), ">" means moving an item from/to another place.

This case is a different one.


Actually, the symbol ">" is used for many meanings.

Among the standard words there are (with some examples):
  - the operation "greater-than": ">", "u>", "0>"
  - as a part of the pictogram "not-equals": "<>", "0<>"
  - an angle parenthesis: "<#" "#>"
  - as a graphical alternative to "to":
    - moving an item to/from another place (usually a stack):
      ">r", "n>r", "nr>"
    - conversion of a type: "d>s", "d>f"
    - extracting details (with flag): "ekey>char", "ekey>fkey"
    - extracting details (without flag): "name>string"
    - switching to another field: "name>", ">body"
    - conversion of a form (representation): ">number", ">float"
  - some very strange (if not ugly) variants: "cmove>", "does>", ">in"


It's not a good idea to enlarge the last group of very strange
cases, I think. Moreover, the words in this group can be deprecated and
replaced.



Some additional comments.

In the case of the words:

name> ( nt -- xt )
>body ( xt -- addr )

they actually imply "xt", i.e. "name>(xt)" and "(xt)>body".


The words ">number", ">float" imply "text", i.e. "text to number
(integer)" and "text to float number". Despite the similar names, their
signatures are not isomorphic.


In "cmove>", the symbol ">" is just a hint for "from higher addresses to
lower addresses", but it's very *confusing*, since ">" can also mean the
normal direction from left to right (and then, from lower to higher
addresses).

In "does>", the symbol ">" looks like a temporary thoughtless solution
(perhaps "in front") that became permanent.
Taking into account the words ";code" and ":noname", "does>" should be
named ";:does", e.g.:

: foo create ... ;:does ... ;


">in" is generally a museum exhibit.
I can guess, ">" means "a pointer in", and "in" means "input buffer"
See also: https://forth-standard.org/standard/core/toIN#contribution-110
I would like to find an alternative to this word.

--
Ruvim

dxforth

Aug 30, 2021, 10:45:48 PM
On 31/08/2021 00:11, Ruvim wrote:
>
> "RESERVE" is better, but I'm not sure that this name is good enough to
> recommend it for the FORTH word list.

What Forth word list - the standard? No. I'm simply drawing
attention to a function I appear to have unwittingly reinvented.

The name RESERVE comes from the language XPL0, in which I had some
involvement. Simulating it in Forth required HERE SWAP ALLOT,
which got tedious.

Regarding naming, how many variations of ALLOT ALLOCATE can a
language stomach? Forth isn't C :) IMO using a different name
breaks the monotony, is actually easier to remember and reads
better.

dxforth

Aug 30, 2021, 11:15:55 PM
?xxx typically meant conditionally do something. So if ?ALLOT first
checked available memory one could argue it was appropriate relative to
ALLOT where it hasn't been a requirement. OTOH the similarity in names
can work against one.
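
A sketch of what such a checking variant could look like, using UNUSED from the Core extension words. The name `checked-allot` is hypothetical (chosen to avoid clashing with kForth's deprecated ?ALLOT), and the error message is only an example.

```forth
: allot? ( u -- a )  here swap allot ;

\ Refuse the request when it exceeds the remaining data space.
: checked-allot ( u -- a )
  dup unused u> abort" not enough data space"
  allot? ;

100 checked-allot drop               \ succeeds when at least 100 units remain
```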

dxforth

Aug 30, 2021, 11:48:39 PM
Likely the name of the buffer has been defined but there's reason
to postpone allocation, e.g. to keep executables small. Simpler than
ALLOCATE.

dxforth

Aug 31, 2021, 1:17:18 AM
On 31/08/2021 10:50, Ruvim wrote:
>
> In "does>", the symbol ">" looks like a temporary thoughtless solution
> (perhaps "in front"), that became permanent.
> Taking into account the words ";code" and ":noname", "does>" should be
> named as ";:does", e.g.:
>
> : foo create ... ;:does ... ;

Originally it was <BUILDS DOES>

Before that there was ;: though I'm uncertain of the usage. It doesn't
appear to have used CREATE etc., which may have been its Achilles' heel and
why it was abandoned.

Anton Ertl

Aug 31, 2021, 2:15:12 AM
Paul Rubin <no.e...@nospam.invalid> writes:
It's in CORE EXT (the EXT part is not obvious in general on
Forth-standard.org). Why not?

Anton Ertl

Aug 31, 2021, 3:47:22 AM
Ruvim <ruvim...@gmail.com> writes:
>In "cmove>", the symbol ">" is just a hint for "from higher addresses to
>lower addresses", but it's very *confused*. Since ">" can also mean the
>normal direction from left to right (and then, from lower to higher
>addresses).

Note the pronunciation: "c-move-up". It means: cmove the memory
block from lower to higher addresses; i.e., the name describes the
usage (what), not the how. The how is overspecified for the
application, though. Fortunately, we have MOVE now, making both CMOVE
and CMOVE> unnecessary.

>In "does>", the symbol ">" looks like a temporary thoughtless solution
>(perhaps "in front"), that became permanent.

The idea that I have read was that you would write the defining word
like:

: const <builds , does> @ ;

So <BUILDS points to the name of the new defining word, while DOES>
points to the code executed when the defined word runs.

One can also see the < and > as angle brackets including the code
performed at defining time.

Of course, later <BUILDS was replaced by CREATE, leaving DOES> without
counterpart.

Still, people have used > to end other return-address-using words;
e.g., Stephen Pelc found the following code in the wild
<4ffffe5e....@192.168.0.50> (AFAIK Bernd Paysan used to use such
words):

\ VFX-specific
: list> ( thread -- element )
  BEGIN @ dup WHILE dup r@ execute
  REPEAT drop r> drop ;

which is used like

: foo bar list> bla blub ;

A list is passed to LIST>, which then executes the "bla blub" part
repeatedly, once for every list element. A more modern (and more
flexible and soon-standard) approach is to put "bla blub" inside a
quotation and call that with EXECUTE for each list element.

So > at the end of the word has become a general indication that the
rest of the word is treated as a separate definition.

dxforth

Aug 31, 2021, 8:23:44 AM
Here's a listing of the naming conventions published in 'Thinking
Forth' and 'Forth Programmers Handbook':

https://pastebin.com/qpZLFc6h

As to the difference between:

!name    Store into name    !DATA

name!    Store into name    B!

I was informed it likely stemmed from when forth names were stored
as count-plus-three characters.

Ruvim

Aug 31, 2021, 8:30:46 AM
On 2021-08-31 10:10, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> In "cmove>", the symbol ">" is just a hint for "from higher addresses to
>> lower addresses", but it's very *confused*. Since ">" can also mean the
>> normal direction from left to right (and then, from lower to higher
>> addresses).
>
> Note the pronunciation: "c-move-up". It means: cmove the memory
> block from lower to higher addresses; i.e., the name describes the
> usage (what), not the how.
[...]

It's unconvincing. It seems to me that "from lower to higher addresses"
is about "how".

Also, there is a word "cmove" with pronunciation "c-move", without "up",
that actually moves "from lower to higher addresses".


Anyway, the intention was that ">" is pronounced as "up" in this
case. Perhaps, it means "move to upper addresses"?

Well, it's just a lesson that for new names their etymology should be
written in their rationale.

An interesting thing is that in Forth-79 this word had the name "<CMOVE"
with the pronunciation "reverse-c-move".

https://www.complang.tuwien.ac.at/forth/fth79std/FORTH-79.TXT



>> In "does>", the symbol ">" looks like a temporary thoughtless solution
>> (perhaps "in front"), that became permanent.
>
> The idea that I have read was that you would write the defining word
> like:
>
> : const <builds , does> @ ;

In Forth-79 both ways were already available: via "<builds" (like
above), and via "create":

: const create , does> @ ;

What the difference was is unclear.

"<builds" was a compile-only word (attribute "C"), but not an immediate
word (no attribute "I").
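
For reference, a short usage sketch of the CREATE ... DOES> form of this defining word; the name `answer` is hypothetical.

```forth
\ Define-time part stores x; run-time part (after DOES>) fetches it.
: const ( x "name" -- )  create ,  does> ( -- x )  @ ;

42 const answer
answer .                             \ displays 42
```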


> So <BUILDS points to the name of the new defining word, while DOES>
> points to the code executed when the defined word runs.


I don't see how "<" in "<BUILDS" says that this word points to the name
of the new definition.

The meaning of "<" and ">" in these names is *not obvious*, we should admit.


> One can also see the < and > as angle brackets including the code
> performed at defining time.

It's far from obvious too. E.g. compare to "<# #>" where they are
really just angle brackets.


> Of course, later <BUILDS was replaced by CREATE, leaving DOES> without
> counterpart.
>
> Still, people have used > to end other return-address-using words;
> e.g., Stephen Pelc found the following code in the wild
> <4ffffe5e....@192.168.0.50> (AFAIK Bernd Paysan used to use such
> words):
>
> \ VFX-specific
> : list> ( thread -- element )
> BEGIN @ dup WHILE dup r@ execute
> REPEAT drop r> drop ;
>
> which is used like
>
> : foo bar list> bla blub ;
>
> A list is passed to LIST>, which then executes the "bla blub" part
> repeatedly, once for every list element.

In the context of BacFORTH, some people (including me) used an
arrow-like pictogram "=>" for that, so the code looked like:

: foo bar list=> bluh blub ;

In BacFORTH it can also be nested, e.g.:

: foo bar list=> bluh buz list=> blub ;

I.e., for each element of the "bar" list it also iterates the "buz" list.

[...]

> So > at the end of the word has become a general indication that the
> rest of the word is treated as separate definition.

I would not encourage such usage of ">", and would suggest something
with more contrast, like "==>", instead (if any).


--
Ruvim

Bernd Linsel

Aug 31, 2021, 9:25:42 AM
On 31.08.2021 09:10, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>
> Note the pronunciation: "c-move-up". It means: cmove the memory
> block from lower to higher addresses; i.e., the name describes the
> usage (what), not the how. The how is overspecified for the
> application, though. Fortunately, we have MOVE now, making both CMOVE
> and CMOVE> unnecessary.
>

And, unfortunately, introducing a run-time decision for the rare case
that source and destination memory areas overlap.

One strength of Forth (and one "10x" of C. Moore's "1000x") is the
ability to do at editing time what can be done at editing time, i.e. if
you know that source and destination do not overlap (or if dest overlaps
the lower source), you use the simpler CMOVE, if dest overlaps upper
part of source, use the equally simple (but cache-unfriendly) CMOVE>.

Still better, alignment would have to be taken into account, in order to
move word-sized (or SIMD-packet-sized) quantities if appropriate.
Admittedly, this would lead to an explosion of similar words, each for
itself with very limited use.

C and Fortran compilers invest a considerable amount into analysis of
moving/copying objects at compile time, and are often, but not always,
able to produce decently performing code. Still, in C the programmer is
bound to provide information, whether he/she intends to avoid trashing
of the source area (by either calling memcpy or memmove). In C,
assignments of potentially aliased pointer targets not specified as
'restrict' have to be compiled analogous to memmove, whereas in Fortran
aliasing is explicitly forbidden.

[snip]

B.

Bernd Linsel

Aug 31, 2021, 9:38:16 AM
There are some examples for the difference between those two:

C!     ( c c-addr -- )  store a character into address
!R     ( x -- )         store x into address on return stack

Or, in Machine Forth (~1997):
!A/@A  store to/load from address in A register
A      copy A to TOS
A!     pop TOS into A register

--
B.

Anton Ertl

Aug 31, 2021, 12:43:29 PM
Ruvim <ruvim...@gmail.com> writes:
>On 2021-08-31 10:10, Anton Ertl wrote:
>> Ruvim <ruvim...@gmail.com> writes:
>>> In "cmove>", the symbol ">" is just a hint for "from higher addresses to
>>> lower addresses", but it's very *confused*. Since ">" can also mean the
>>> normal direction from left to right (and then, from lower to higher
>>> addresses).
>>
>> Note the pronunciation: "c-move-up". It means: cmove the memory
>> block from lower to higher addresses; i.e., the name describes the
>> usage (what), not the how.
>[...]
>
>It's unconvincing. It seems to me that "from lower to higher addresses"
>is about "how".

The "how" for CMOVE> is to process the character with the highest
address first, the one with the lowest address last. So I would call
the "how" for CMOVE> "from high to low addresses".

>Also, there is a word "cmove" with pronunciation "c-move", without "up",
>that actually moves "from lower to higher addresses".

It processes the character with the lower address first. If you use
it for moving to an overlapping address range with a higher address,
you don't get a copy of the original.
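
The overlap behaviour can be seen with a small sketch. It assumes one character per address unit; `buf`, `reset` and `show` are hypothetical names. Each line copies six characters from the start of the buffer to two positions higher.

```forth
create buf 8 allot
: reset ( -- )  s" abcdefgh" buf swap move ;
: show  ( -- )  buf 8 type cr ;

reset  buf buf 2 + 6 cmove   show    \ abababab  (the first two chars propagate)
reset  buf buf 2 + 6 cmove>  show    \ ababcdef  (intact copy)
reset  buf buf 2 + 6 move    show    \ ababcdef  (MOVE picks a safe direction)
```

The propagation in the first line follows from the standard's requirement that CMOVE proceed character by character from lower to higher addresses.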

>In anyway, the intention was that ">" is pronounced as "up" in this
>case. Perhaps, it means "move to upper addresses"?

That's what I was trying to express.

>https://www.complang.tuwien.ac.at/forth/fth79std/FORTH-79.TXT
...
>> The idea that I have read was that you would write the defining word
>> like:
>>
>> : const <builds , does> @ ;
>
>In Forth-79 the both ways already were available — via "<builds" (like
>above), and via "create":
>
> : const create , does> @ ;
>
>What was a different is unclear.

<BUILDS is an uncontrolled word in Forth-79, and it can do anything.
Note that DOES> only mentions usage with CREATE.

A system might implement <BUILDS as

: <BUILDS CREATE ;

or it might implement it in the old way (with an additional cell in
the defined word); how DOES> works in the latter case is an
interesting question.

Or the system might not implement <BUILDS at all.

>> So <BUILDS points to the name of the new defining word, while DOES>
>> points to the code executed when the defined word runs.
>
>
>I don't see how "<" in "<BUILDS" says that this word points to the name
>of the new definition.

In the example above < points to CONST.

>I would not encourage such usage of ">", and would suggest something
>more contrast like "==>" instead (if any).

These days, I suggest passing an xt instead of using return-address
manipulation, and use a quotation if you want the xt inline.
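
A sketch of the xt-passing style in standard Forth. The node layout (link cell followed by value cell) and the names `head`, `push` and `list-each` are assumptions made for this example; :NONAME stands in for a quotation, which on systems that provide `[: ... ;]` could be written inline inside a definition instead.

```forth
\ Node layout (an assumption for this sketch): cell 0 = link, cell 1 = value.
variable head   0 head !
: push ( n -- )  align here  head @ ,  swap ,  head ! ;

: list-each ( i*x xt list -- j*x )   \ xt: ( i*x node -- j*x )
  begin dup while
    2dup 2>r  swap execute  2r>      \ run xt on this node, keep xt and node safe
    @                                \ follow the link
  repeat 2drop ;

1 push  2 push  3 push
:noname ( node -- )  cell+ @ . ;  head @ list-each    \ displays 3 2 1
```

Because the xt and the current node are parked on the return stack during EXECUTE, the called code sees only the caller's data stack plus the node, with no return-address manipulation involved.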

S Jack

Aug 31, 2021, 1:41:31 PM
On Tuesday, August 31, 2021 at 7:30:46 AM UTC-5, Ruvim wrote:
> I would not encourage such usage of ">", and would suggest something
> more contrast like "==>" instead (if any).

Punctuation in word names has little merit. I'm sure you have
noted that I use BUILD DOES as opposed to <BUILDS DOES> . The brackets
don't provide any _vital_ meaning and are problematic in providing a
consistent meaning, as testified by this thread.

It also makes text descriptions difficult because punctuation in names
conflicts with how punctuation is used in text, and it can also be
devastating if the host OS or communications gets a hold of them.

Note that assembly code does not use punctuation in symbols for
good reason: all the problems mentioned above.

Dropping unneeded punctuation from word names IMHO would be a step
in the right direction.

--
me

"In 2525 if programmers are still alive,
Forth has just one word and it's KOO"

Anton Ertl

Aug 31, 2021, 1:53:57 PM
Bernd Linsel <bl1-rem...@gmx.com> writes:
>On 31.08.2021 09:10, Anton Ertl wrote:
>> Ruvim <ruvim...@gmail.com> writes:
>>
>> Note the pronunciation: "c-move-up". It means: cmove the memory
>> block from lower to higher addresses; i.e., the name describes the
>> usage (what), not the how. The how is overspecified for the
>> application, though. Fortunately, we have MOVE now, making both CMOVE
>> and CMOVE> unnecessary.
>>
>
>And, unfortunately, introducing a run-time decision for the rare case
>that source and destination memory areas overlap.
>
>One strength of Forth (and one "10x" of C. Moore's "1000x") is the
>ability to do at editing time what can be done at editing time, i.e. if
>you know that source and destination do not overlap (or if dest overlaps
>the lower source), you use the simpler CMOVE, if dest overlaps upper
>part of source, use the equally simple (but cache-unfriendly) CMOVE>.

CMOVE> is not significantly less cache-friendly than CMOVE. The
problem with both is that for the problem you are suggesting them for,
they are overspecified, and, if you don't introduce a run-time
decision for the overlap case, they are slow. If you introduce a
run-time decision for the overlap case, there is no advantage over
MOVE.

Let's get numbers for this, using the following benchmark:

50 constant bufsize
create a bufsize allot
create b bufsize allot
a bufsize 'a' fill
: bench-move 100000000 0 do a b bufsize move loop ;
: bench-cmove 100000000 0 do a b bufsize cmove loop ;
: bench-cmove> 100000000 0 do a b bufsize cmove> loop ;

Cycles per iteration on a Skylake (Core i5-6600K)
 sf  vfx   gf  lxf
 95   34   36   32  bench-move
100   32   87   32  bench-cmove
 83   33   90  219  bench-cmove>

sf (SwiftForth 3.11) has a pretty straightforward implementation: MOVE
either calls CMOVE or CMOVE>, and CMOVE and CMOVE> contain a simple
loop that copies a byte at a time.
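
The structure described here, a MOVE that dispatches to CMOVE or CMOVE>, can be sketched in standard Forth as follows. This is an illustration of the idea, not SwiftForth's source; `my-move`, `mbuf` and `mreset` are hypothetical names, and the sketch assumes one character per address unit.

```forth
\ Copy high-to-low when the source lies below the destination, else low-to-high.
: my-move ( addr1 addr2 u -- )
  >r 2dup u< if  r> cmove>  else  r> cmove  then ;

create mbuf 8 allot
: mreset ( -- )  s" abcdefgh" mbuf swap move ;

mreset  mbuf mbuf 2 + 6 my-move  mbuf 8 type cr   \ ababcdef
mreset  mbuf 2 + mbuf 6 my-move  mbuf 8 type cr   \ cdefghgh
```

The single address comparison is the run-time decision mentioned earlier in the thread; everything after it is a plain byte loop in one direction or the other.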

For vfx (VFX Forth for Linux IA32 Version: 4.72), the implementations
of the three words are much longer:

bytes
  124  move
   96  cmove
  118  cmove>

MOVE does not call CMOVE or CMOVE>, so the editing-time advice costs
quite a bit of memory and buys hardly any speed on VFX.

In gf (Gforth 0.7.9_20210826), MOVE calls memmove() (from glibc 2.19
in this case), while CMOVE and CMOVE> are byte-by-byte copies, like on
SwiftForth, resulting in similar performance.

In lxf, I don't see how MOVE is implemented. CMOVE uses REP MOVSB
(which seems to be doing ok in this case, somewhat to my surprise);
CMOVE> also uses REP MOVSB (with a different setup), which performs
worse than even I would have expected.

Bottom line: MOVE can be fast. And if you want CMOVE and CMOVE> to be
fast, you need to make them so complicated that their combination is
more complicated than MOVE.

>Still better, alignment would have to be taken into account, in order to
>move word-sized (or SIMD-packet-sized) quantities if appropriate.
>Admittedly, this would lead to an explosion of similar words, each for
>itself with very limited use.

And very limited benefit. See
<2017Sep2...@mips.complang.tuwien.ac.at>

>Still, in C the programmer is
>bound to provide information, whether he/she intends to avoid trashing
>of the source area (by either calling memcpy or memmove).

No, the difference (as interpreted by the current implementors of C) is
that memcpy guarantees that there is no overlap. Just use memmove,
and forget memcpy.

Bernd Linsel

Aug 31, 2021, 3:48:00 PM
On 31.08.2021 18:43, Anton Ertl wrote:
REP MOVSB with CLD is optimized by today's processors so that it even
moves whole cache lines in ideal cases, so nowadays it's okay to use it
(again -- after nearly two decades of deprecation).

It is documented that Core-i up to and including Broadwell and all AMD
cores preceding Zen2 do not bundle REP MOVSx when the direction flag is set.
I'm sorry, but I don't have the appropriate information available for
newer architectures. Obviously, it seems to be valid for your Core i5 as
well.

>
> Bottom line: MOVE can be fast. And if you want CMOVE and CMOVE> to be
> fast, you need to make them so complicated that their combination is
> more complicated than MOVE.
>
>
> And very limited benefit. See
> <2017Sep2...@mips.complang.tuwien.ac.at>
>
>
> No, the difference (as interpreted by the current implementors of C) is
> that memcpy guarantees that there is no overlap. Just use memmove,
> and forget memcpy.

This results just from using memcpy on overlapped regions being UB while
for memmove it is perfectly defined.
If you force clang or gcc into using library code (-fno-builtin-memcpy,
-fno-builtin-memmove) you can see that memmove, in contrast to memcpy,
starts with a check if dest-start is within the bounds [src-start,
src-start + length) and branches to a backwards copy if yes.

However, the C standards folks .oO(#&$!) have their problems with
memcpy/memmove and the like, because they can be used to achieve things
that were UB when implemented straightforwardly...

>
> - anton
>

Ruvim

Aug 31, 2021, 4:39:13 PM
On 2021-08-31 10:10, Anton Ertl wrote:
[...]
> Stephen Pelc found the following code in the wild
> <4ffffe5e....@192.168.0.50> (AFAIK Bernd Paysan used to use such
> words):
>
> \ VFX-specific
> : list> ( thread -- element )
> BEGIN @ dup WHILE dup r@ execute
> REPEAT drop r> drop ;
>
> which is used like
>
> : foo bar list> bla blub ;
>
> A list is passed to LIST>, which then executes the "bla blub" part
> repeatedly, once for every list element.
> A more modern (and more flexible and soon-standard)
> approach is to put "bla blub" inside a quotation
> and call that with EXECUTE for each list element.

Yes, it's more flexible but sometimes it's less convenient.

I thought about this problem before. Why not throw out our conventional
control flow structures for choices and loops and just use the
quotation-based approach? Factor and some other languages go this way.

Source code can look like the following:

[: foo bar ;] 10 times

flag [: 1 . ;] [: 0 . ;] ifelse

[: buz flag ;] while

Among other advantages, the quotation-based control flow can be used
interpretively.

So why not go this way only? A problem with this way is that it
produces worse readability. Have a look.

Let's take the example of BacFORTH-based nested iteration over two lists
(from my previous message):

: foo bar list=> bluh buz list=> blub ;

Using the quotation-based approach this word "foo" will look as:

: foo [: bluh [: blub ;] buz list:forearch ;] bar list:forearch ;

\ where
\ list:forearch ( i*x xt list -- j*x ) \ xt ( i*x item -- j*x )

It's obvious which variant is more readable, I think.

We will have a similar picture with nested loops, choices, etc. The more
complex the structures, the less readable a quotation-based solution will be.


Yes, when an xt is passed to us, it's more convenient to just apply an
iterator to this xt. And when we have an iterator that accepts an xt,
it's usually more convenient to pass a quotation to it, than create a
separate definition.

But if we have both - an iterator that accepts an xt, and a
corresponding structure that wraps a code, it's easier to use this
structure with our regular code, than the iterator with our code in a
quotation.

Both a quotation and a special structure wrap a fragment of source
code. But the quotation doesn't provide any specific information for the
reader, while the special structure provides some additional information,
and therefore it produces better readability.


There is such a concept as the dichotomy of form and content, or markup
and content, in our case of source code. So we can conditionally
attribute a word to markup or to content.

[: ... ;] -- is a markup.
begin ... while ... repeat -- is a markup too.

The latter markup brings more semantics information for the reader, and
therefore it can be read easier.


Going back to the BacFORTH-based example:

: foo bar list=> bluh buz list=> blub ;

A problem with this markup is that the end of the structure is not marked
explicitly.

A better markup could look as follows:

: foo bar list:each{ bluh buz list:each{ blub } } ;

It is even better to have optional explicit endings:

: foo bar list:each{ bluh buz list:each{ blub }list:each }list:each ;

which make sense when nesting different structures.

(Concerning curly brackets - we can use other symbols instead, it
doesn't matter)




One step deeper.
An iterator and a structure can each be defined via the other.

Take a look: the word "list:each" can be defined via the "list:each{ ... }"
structure as:

: list:each ( xt list -- )
list:each{ ( i*x item xt -- j*x xt )
dup >r execute r>
} drop
;

And vice versa:

: list:each{ \ run-time: ( i*x list -- j*x )
\ wrapped-code ( i*x item -- j*x )
postpone [: end{ postpone list:each }
; immediate

This variant cannot be used interpretively, but this problem is
solvable. Actually, I prefer an implementation with active parsing for
such structures, but I need a recognizer API for that (for nesting the
Forth text interpreter).


Well, we can go even deeper.
We can provide a word "list:each" to the system, and the corresponding
structure will be available for us *automatically*, via a general
recognizer for such cases (e.g., for curly brackets).




Bottom line: we don't have to choose one of these approaches; we can use
them all, iterators and structures. For the conventional control flow
structures, the corresponding higher order functions are sometimes very
useful too.


--
Ruvim

Hugh Aguilar

Aug 31, 2021, 9:33:57 PM
On Tuesday, August 31, 2021 at 12:47:22 AM UTC-7, Anton Ertl wrote:
> Still, people have used > to end other return-address-using words;
> e.g., Stepen Pelc found the following code in the wild
> <4ffffe5e....@192.168.0.50> (AFAIK Bernd Paysan used to use such
> words):
>
> \ VFX-specific
> : list> ( thread -- element )
> BEGIN @ dup WHILE dup r@ execute
> REPEAT drop r> drop ;
>
> which is used like
>
> : foo bar list> bla blub ;
>
> A list is passed to LIST>, which then executes the "bla blub" part
> repeatedly, once for every list element. A more modern (and more
> flexible and soon-standard) approach is to put "bla blub" inside a
> quotation and call that with EXECUTE for each list element.
>
> So > at the end of the word has become a general indication that the
> rest of the word is treated as separate definition.

This is crap code.
1.) The first node in the list never gets processed because the @ of the next-node
is done in the front.
2.) The pointer to the next-node after the node being processed is not obtained
prior to processing the node. So, if the node's link field gets modified (for example,
zeroed out if the node is moved to a new list), the @ will not get the next-node.
3.) The pointer to the node being processed is on the data-stack during the processing
of the node, so the processing code can't access the data on the data-stack that it
needs to access to communicate with the parent function.

I think Stephen Pelc was impressed by the trickiness of the function, that it
executes the "bla blub" code --- Stephen Pelc had no intention of ever actually
using his tricky little bug-ridden function, so he didn't notice the bugs.

I don't think that Stephen Pelc understands the concept of general-purpose
data-structures --- Peter Knaggs doesn't either, and Stephen Pelc described
Peter Knaggs as MPE's tool-builder --- it is pathetic that the Forth-200x
committee continue to fail to implement a linked-list --- a linked-list is pretty easy.
https://groups.google.com/g/comp.lang.forth/c/cMa8wV3OiY0/m/INBDVBh0BgAJ

Here is code that works:
-------------------------------------------------------------
\ ******
\ ****** some rquotation code
\ ******

VFX? SwiftForth? or [if] \ these don't work with HumptyDumpty's code because these HOFs have locals of their own.

\ These are for use with REX and rquotations. The | prefix is the naming convention for anything that uses quotations.
\ I am getting rid of that "toucher" term and replacing it with "quotation."

\ In VFX it is okay to use REX on an xt or use EXECUTE on an rq --- so these can be used for everything --- but this doesn't work on SwiftForth.
\ On SwiftForth you can only use REX on an rq and EXECUTE on an xt --- so you have to use both kinds of HOF as appropriate.

: |each ( i*x head rq -- j*x ) \ quotation: i*x node -- j*x
{ rq | next -- }
begin ?dup while \ -- node
dup .fore @ to next
rq rex
next repeat ;

: |find-node ( i*x head rq -- j*x node|false ) \ quotation: i*x node -- j*x flag
{ node rq | next -- node|false }
begin node while
node .fore @ to next
node rq rex if node exit then
next to node repeat
false ;

: |find-prior ( i*x head rq -- j*x -1|node|false ) \ quotation: i*x node -- j*x flag
-1 { node rq prior | next -- prior|false } \ prior is -1, meaning found node was the head
begin node while
node .fore @ to next
node rq rex if prior exit then
node to prior next to node repeat
false ;

[then]
-------------------------------------------------------------

dxforth

Aug 31, 2021, 9:42:50 PM
On 1/09/2021 02:43, Anton Ertl wrote:
> Bernd Linsel <bl1-rem...@gmx.com> writes:
>> ...
>>And, unfortunately, introducing a run-time decision for the rare case
>>that source and destination memory areas overlap.
> ...
> MOVE does not call CMOVE or CMOVE>, so the editing-time advice costs
> quite a bit of memory and buys hardly any speed on VFX.
> ...
> Bottom line: MOVE can be fast. And if you want CMOVE and CMOVE> to be
> fast, you need to make them so complicated that their combination is
> more complicated than MOVE.

MOVE may be faster since, unlike CMOVE and CMOVE>, it's not required to propagate.
Anyone who cares about speed is likely to implement the former in asm
anyway, assuming no optimizer.

The only purpose of the propagation rule seems to have been this from
Fig-Forth:

: FILL ( FILL MEMORY BEGIN-3, QUAN-2, BYTE-1 *)
SWAP >R OVER C! DUP 1+ R> 1 - CMOVE ;

Nor did every Fig-Forth bother; e.g. 8080 Fig did it in asm.

VFX CMOVE has been moving cells contrary to ANS for many years apparently
without issue. I went the same route for my 8086 forth and never looked
back. On that system all moves down are done in cells; only CMOVE> moves
bytes. Consequently I prefer CMOVE. That said, moving a few bytes is
never going to be efficient.

P Falth

Sep 1, 2021, 3:22:34 AM
In lxf, if all parameters are known at compile time, as in this case, MOVE
tests and chooses the appropriate one of cmove and cmove>, which are compiled
inline. In this case your bench-move and bench-cmove produce the same code.
If the parameters are not known at compile time, a move that does the test
at runtime is compiled.

cmove> also uses rep movsb but needs to set up the start addresses and direction
flag. Looking at the results, it is clear that Intel forgot to improve the down direction
when they improved rep movsb.

Peter

Anton Ertl

Sep 1, 2021, 3:56:22 AM
dxforth <dxf...@gmail.com> writes:
>VFX CMOVE has been moving cells contrary to ANS for many years apparently
>without issue.

As far as I can tell, VFX sticks to the standard requirements for
CMOVE. E.g.,

: myfill SWAP >R OVER C! DUP 1+ R> 1 - CMOVE ;
pad 16 'b' fill
pad 16 dump \ output:
08BB:6480 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 bbbbbbbbbbbbbbbb

Looking at the output of SEE CMOVE, I see a loop that moves cells, and
two loops that move bytes. I have not analyzed the control flow
between the loops completely, but it looks like VFX CMOVE is using two
comparisons and conditional branches to check whether the to address
is in the from range, and if it is, select the second byte loop.

Anton Ertl

Sep 1, 2021, 4:40:49 AM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>dxforth <dxf...@gmail.com> writes:
>>VFX CMOVE has been moving cells contrary to ANS for many years apparently
>>without issue.
>
>As far as I can tell, VFX sticks to the standard requirements for
>CMOVE. E.g.,
>
>: myfill SWAP >R OVER C! DUP 1+ R> 1 - CMOVE ;
>pad 16 'b' fill
>pad 16 dump \ output:
>08BB:6480 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 bbbbbbbbbbbbbbbb

This does not prove anything. Here's the proper test:

pad 16 'b' myfill
pad 16 dump \ output:
08BB:6480 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 bbbbbbbbbbbbbbbb

(and yes, I filled pad with something else in between).

dxforth

Sep 1, 2021, 4:44:04 AM
On 1/09/2021 17:44, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>VFX CMOVE has been moving cells contrary to ANS for many years apparently
>>without issue.
>
> As far as I can tell, VFX sticks to the standard requirements for
> CMOVE. E.g.,
>
> : myfill SWAP >R OVER C! DUP 1+ R> 1 - CMOVE ;
> pad 16 'b' fill
> pad 16 dump \ output:
> 08BB:6480 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 62 bbbbbbbbbbbbbbbb
>
> Looking at the output of SEE CMOVE, I see a loop that moves cells, and
> two loops that move bytes. I have not analyzed the control flow
> between the loops completely, but it looks like VFX CMOVE is using two
> comparisons and conditional branches to check whether the to address
> is in the from range, and if it is, select the second byte loop.

Spotted this in the change notes:

20131212 SFP011 Added overlap checks to CMOVE and CMOVE>.
On overlap, always use a byte by byte copy.

Stephen Pelc

Sep 1, 2021, 5:17:28 AM
On 31 Aug 2021 at 18:43:41 CEST, "Anton Ertl" <Anton Ertl> wrote:
>
> For vfx (VFX Forth for Linux IA32 Version: 4.72), the implemenations
> of the three words are much longer:
>
> bytes
> 124 move
> 96 cmove
> 118 cmove>

That's an artefact of x86 32 bit coding. Writing the fastest possible CMOVE
is laborious. For x86_64/x64 the situation is different as most CPUs have
heavily optimised implementations of REP MOVSB. See Agner Fog et al.

For ARM32, a fast CMOVE has to account for the alignment of the
source and destination addresses as well as the length. By measurement,
the "vast but fast" implementation takes 950 bytes, but is 5 times faster
on average than a simplistic CMOVE. I have not tested ARM64 yet.

Stephen

Anton Ertl

Sep 1, 2021, 8:09:19 AM
Stephen Pelc <ste...@vfxforth.com> writes:
>On 31 Aug 2021 at 18:43:41 CEST, "Anton Ertl" <Anton Ertl> wrote:
>>
>> For vfx (VFX Forth for Linux IA32 Version: 4.72), the implemenations
>> of the three words are much longer:
>>
>> bytes
>> 124 move
>> 96 cmove
>> 118 cmove>
>
>That's an artefact of x86 32 bit coding. Writing the fastest possible CMOVE
>is laborious. For x86_64/x64 the situation is different as most CPUs have
>heavily optimised implementations of REP MOVSB. See Agner Fog et al.

For some interpretation of "most CPUs". Here's a Xeon X3460 (Nehalem,
from around 2009) I use every day, and a Ryzen 5800X (Zen 3, from
2021):

cycles/iteration
    Nehalem        Zen 3
vfx64 vfx32    vfx64 vfx32
  212    53      232    24   bench-move
  120    48       27    21   bench-cmove
  201    45      224    21   bench-cmove>

So, REP MOVSB as used in CMOVE is indeed doing ok on the Zen 3, but it
seems that MOVE goes for CMOVE> in the benchmarked case, and REP MOVSB
as used in CMOVE> is slow on both of these CPUs. It's better to only
go for CMOVE> if the to address is in the from range; you can check
for this efficiently with

to from - u u< if from to u cmove> else from to u cmove then

An additional advantage is that the IF is very predictable (overlap is
very rare), whereas the above-or-below test is less predictable.
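
Spelled out as a complete word, the check above could look like the
following sketch. The name MY-MOVE is made up for this example; CMOVE and
CMOVE> are the standard String-wordset words.

```forth
\ my-move ( c-from c-to u -- )
\ Use CMOVE> only when c-to lies inside [c-from, c-from+u); else use CMOVE.
: my-move
  >r 2dup swap -      \ ( from to to-from )  R: u
  r@ u<               \ ( from to flag )     unsigned compare, so to<from is false
  r> swap             \ ( from to u flag )
  if cmove> else cmove then ;
```

When the destination is below the source, the difference wraps around to a
large unsigned number, so the test fails and the plain CMOVE path is taken.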

The code size has benefitted quite a bit from switching to REP MOVSB:

bytes
80 move
36 cmove
48 cmove>

Ruvim

Sep 1, 2021, 11:04:55 AM
On 2021-08-31 19:12, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2021-08-31 10:10, Anton Ertl wrote:
>>> Ruvim <ruvim...@gmail.com> writes:
>>>> In "cmove>", the symbol ">" is just a hint for "from higher addresses to
>>>> lower addresses", but it's very *confused*. Since ">" can also mean the
>>>> normal direction from left to right (and then, from lower to higher
>>>> addresses).
>>>
>>> Note the pronounciation: "c-move-up". It means: cmove the memory
>>> block from lower to higher addresses; i.e., the name describes the
>>> usage (what), not the how.
>> The how is overspecified for the application, though.
>> [...]
>>
>> It's unconvincing. It seems to me that "from lower to higher addresses"
>> is about "how".
>
> The "how" for CMOVE> is to process the character with the highest
> address first, the one with the lowest addess last. So I would call
> the "how" for CMOVE> "from high to low addresses".
>
>> Also, there is a word "cmove" with pronunciation "c-move", without "up",
>> that actually moves "from lower to higher addresses".
>
> It processes the character with the lower address first. If you use
> it for moving to an overlapping address range with a higher address,
> you don't get a copy of the original.
>
>> In anyway, the intention was that ">" is pronounced as "up" in this
>> case. Perhaps, it means "move to upper addresses"?
>
> That's what I was trying to express.

Oh, I see.

The wording "from lower to higher addresses" is almost the same as in
the specification of how "cmove" works: "proceeding
character-by-character from lower addresses to higher addresses"


A problem with this etymology for "cmove>" is that it *can* actually be
used to copy a memory region to a lower address range as well as to a
higher address range.

The only subtle case where "up" plays any role, and where "cmove>"
produces a different (but expected) result than usual, is when the
target range is lower than the source range ("moving down") but still
overlaps the source range.

Ditto for "cmove", but for the "moving up" case.


Concerning overspecification, I think some applications rely on this
specific behavior to fill a memory region with a pattern.

Some examples of filling with a pattern follow.

s" *******A-B-" 2dup 4 /string 3 pick swap cmove> type

output: -B-A-B-A-B-


s" A-B-*******" 2dup 4 /string 3 pick -rot cmove type

output: A-B-A-B-A-B


Obviously, such an effect cannot be achieved via "MOVE":

s" A-B-*******" 2dup 4 /string 3 pick -rot move type

output: A-B-A-B-***
NB: this string shows the first part of the source memory region followed
by the complete target memory region.
The target region content is "A-B-***", which is the same as the source
region content before copying.
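
The propagation in the "cmove" and "cmove>" examples follows directly from
the copy order. As a sketch, here are plain reference definitions (the names
MY-CMOVE and MY-CMOVE> are made up for this example; real systems implement
these words very differently):

```forth
\ my-cmove ( c-from c-to u -- )  copy from the lowest address upwards
: my-cmove
  0 ?do  over i + c@  over i + c!  loop  2drop ;

\ my-cmove> ( c-from c-to u -- )  copy from the highest address downwards
: my-cmove>
  begin dup while
    1- >r                      \ R: index of the character to copy
    over r@ + c@  over r@ + c!
    r>
  repeat drop 2drop ;
```

With overlapping ranges, each of them re-reads characters it has already
written, which is what repeats the pattern.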



Probably, for big regions, such memory propagation is more efficient than
calling "MOVE" for the pattern in a loop.

In any case, this behavior cannot be changed, for reasons of backward
compatibility. Though I haven't yet come across such a use of "CMOVE" or
"CMOVE>".



--
Ruvim

Anton Ertl

Sep 1, 2021, 11:24:44 AM
Ruvim <ruvim...@gmail.com> writes:
>On 2021-08-31 10:10, Anton Ertl wrote:
>> : foo bar list> bla blub ;
>>
>> A list is passed to LIST>, which then executes the "bla blub" part
>> repeatedly, once for every list element.
>> A more modern (and more flexible and soon-standard)
>> approach is to put "bla blub" inside a quotation
>> and call that with EXECUTE for each list element.
>
>Yes, it's more flexible but sometime it's less convenient.
>
>I thought about this problem before. Why not throw out our conventional
>control flow structures for choices and loops and just use the
>quotation-based approach? Factor and some other languages go by this way.

In particular, Postscript.

Why not in Forth? One reason is that you cannot use the return stack
or locals to pass data into or out of quotations. Hugh Aguilar's
version of rquotations would at least allow locals; not sure about
humptydumpty's version.

>Let's take the example of BacFORTH-based nested iteration over two lists
>(from my previous message):
>
> : foo bar list=> bluh buz list=> blub ;
>
>Using the quotation-based approach this word "foo" will look as:
>
> : foo [: bluh [: blub ;] buz list:forearch ;] bar list:forearch ;
>
> \ where
> \ list:forearch ( i*x xt list -- j*x ) \ xt ( i*x item -- j*x )

Maybe the lack of readability comes from the name. The stack effect
is also a bad idea; you seem to be assuming that BAR and BUZ have the
stack effects ( -- x ), and in that case the stack effect does not
show its problems, except maybe in readability. Anyway, here we use a
shorter name and a better stack effect:

\ forlist ( ... list xt -- ... ), xt ( ... element -- ... )
: foo bar [: bluh buz ['] blub forlist ;] forlist ;

Readable enough for my taste. Now consider:

: foo2 bar [: bluh buz ['] blub forlist flip ;] forlist flop ;

How would you code that with LIST=> ?

There is a reason why we do not have IF=>, IFELSE=> (how would that
even work?), DO=> etc.

Instead we have words that start and end a control structure, and
sometimes words in the middle (ELSE, WHILE).

Andrew Haley suggested that over xt-taking words, so for the list case
this might be:

\ using the same approach of starting with @, adjust as necessary
: <list ]] begin @ dup while dup >r [[ ; immediate
: list> ]] r> repeat drop [[ ; immediate

And our examples become:

: foo bar <list bluh buz <list blub list> list> ;
: foo2 bar <list bluh buz <list blub list> flip list> flop ;

OTOH, I have added TRY ... ENDTRY (with some variants) as an
alternative to CATCH in Gforth, but find that I more often use CATCH,
despite TRY having additional functionality.
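
For reference, FORLIST with the stack effect given above could be sketched
like this. The list layout is an assumption made for this sketch only: a
list is the address of its first node, the first cell of a node holds the
link to the next node, and 0 ends the list.

```forth
\ forlist ( ... list xt -- ... )  xt ( ... node -- ... )
: forlist
  swap begin  dup while      \ ( xt node )
    2dup @ 2>r               \ park xt and the next node on the return stack
    swap execute             \ xt sees only the caller's stack items
    2r>                      \ ( xt next )
  repeat  2drop ;
```

The link is fetched before the xt runs, so the xt may modify or unlink the
current node without breaking the traversal.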

Ruvim

Sep 1, 2021, 11:49:27 AM
On 2021-08-31 20:41, S Jack wrote:
> On Tuesday, August 31, 2021 at 7:30:46 AM UTC-5, Ruvim wrote:
>> I would not encourage such usage of ">", and would suggest something
>> more contrast like "==>" instead (if any).
>
> Punctuation in word names has little merit. I'm sure you have
> noted that I use BUILD DOES as oppose to <BUILDS DOES> .

I'm not sure. Where can your code be viewed?


> The brackets don't provide any _vital_ meaning
> and is problematic in providing a
> consistent meaning as testified by this thread.

I disagree, at least partially.

In the case of DOES, a telling meaning can be achieved using the following
markup variants:

: foo ... create ... does{ ... } ... ;

: foo ... create ... [: ... ;] does ... ;


> It also makes text descriptions difficult because punctuation in names
> conflicts with how punctuation is used in text


I use quoting to avoid such conflicts. I write "compile,", or 's"', or
"move" to distinguish a Forth word from the surrounding text and avoid ambiguity.



> and it can be also be devastating
> if the host OS or communications gets a hold of them.

I can agree concerning OS API or similar things.



> Note that assembly code does not use punctuation in symbols for
> good reason, all the above said problems.
>
> Dropping unneeded punctuation from word names IMHO would be a step
> in the right direction.

Historically, many names in Forth already contain punctuation symbols.
And some of them have a quite stable meaning. For example, "," (comma)
always means that something is appended to data and/or code space.
Such words cannot easily be thrown out or replaced.


--
Ruvim

Anton Ertl

Sep 1, 2021, 12:45:31 PM
Ruvim <ruvim...@gmail.com> writes:
>On 2021-08-31 19:12, Anton Ertl wrote:
>> Ruvim <ruvim...@gmail.com> writes:
>>> On 2021-08-31 10:10, Anton Ertl wrote:
>>>> Note the pronounciation: "c-move-up". It means: cmove the memory
>>>> block from lower to higher addresses; i.e., the name describes the
>>>> usage (what), not the how.
...
>The wording "from lower to higher addresses" is almost the same as in
>the specification of how "cmove" works: "proceeding
>character-by-character from lower addresses to higher addresses"

Yes, the difference is between "proceeding" and "move".

>A problem with this etymology for "cmove>" is that it *can* be actually
>used to copy a memory region to a lower address range as well as to a
>higher address range.

The name and its pronunciation reflect the intended use, not all
possible uses.

>The only subtle moment when "up" plays any role, and "cmove>" can
>produce the different (but expected) result than usual, is the case when
>the target range is lower than the source range ("moving down") but
>still has an intersection with the source range.

Yes.

>Concerning overspecifying, — I think some applications rely on this
>specific behavior to fill a memory region by a pattern.

It's an overspecification for the MOVE-like use. Of course, once you
have an overspecification, it can allow other uses.

OTOH, an underspecification like memcpy() in the C standard is even
worse than an overspecification and has led to breakage and long
discussions <https://bugzilla.redhat.com/show_bug.cgi?id=638477>
<https://sourceware.org/bugzilla/show_bug.cgi?id=12518>. The way to
go is a complete specification that can be implemented efficiently.
Forth provides this with MOVE, and C with memmove(), making CMOVE,
CMOVE> only useful for legacy code and the (not-recommended, see
below) pattern case, and memcpy() useful only for legacy code.

>Some examples of filling by a pattern are following.
>
> s" *******A-B-" 2dup 4 /string 3 pick swap cmove> type
>
>output: -B-A-B-A-B-
>
>
> s" A-B-*******" 2dup 4 /string 3 pick -rot cmove type
>
>output: A-B-A-B-A-B
...
>Probably, for big regions such memory propagation is more efficient than
>call "MOVE" for a pattern in a loop.

That's doubtful. This usage produces dependencies from the store of
the second A to the load of it (5-6 cycles on modern CPUs), while
using MOVE multiple times does not. If you want to reduce the loop
and MOVE initialization overhead, first generate a longer pattern, and
then call MOVE with longer patterns. Or use a doubling-at-each step
strategy (or maybe quadrupling at each step); the latency described
above will play a role in the first few iterations, for later
iterations the throughput is the limit.

dxforth

Sep 1, 2021, 1:24:00 PM
On 2/09/2021 01:04, Ruvim wrote:
> ...
> Concerning overspecifying, — I think some applications rely on this
> specific behavior to fill a memory region by a pattern.
>
> Some examples of filling by a pattern are following.
>
> s" *******A-B-" 2dup 4 /string 3 pick swap cmove> type
>
> output: -B-A-B-A-B-
>
>
> s" A-B-*******" 2dup 4 /string 3 pick -rot cmove type
>
> output: A-B-A-B-A-B

ANS delivers novelty patterns. Speed one must find elsewhere :)

Ruvim

Sep 1, 2021, 4:29:53 PM
On 2021-09-01 17:45, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2021-08-31 10:10, Anton Ertl wrote:
>>> : foo bar list> bla blub ;
>>>
>>> A list is passed to LIST>, which then executes the "bla blub" part
>>> repeatedly, once for every list element.
>>> A more modern (and more flexible and soon-standard)
>>> approach is to put "bla blub" inside a quotation
>>> and call that with EXECUTE for each list element.
>>
>> Yes, it's more flexible but sometime it's less convenient.
>>
>> I thought about this problem before. Why not throw out our conventional
>> control flow structures for choices and loops and just use the
>> quotation-based approach? Factor and some other languages go by this way.
>
> In particular, Postscript.
>
> Why not in Forth? One reason is that you cannot use the return stack
> or locals to pass data into or out of quotations.

Do you want to say that the data stack is not sufficient to pass data
to/from quotations? Why not introduce another stack then?


Anyway, the problem I'm trying to describe doesn't depend on these
features. This problem can occur even if closures are available.



> Hugh Aguilar's
> version of rquotations would at least allow locals;

But they cannot be used beyond the run-time of the definition where they
were defined (and so they cannot be returned), can they? Also, they
cannot have their own locals. That's the cost of an easy implementation of
access to the parent's locals.

So, these rquotations are neither closures nor quotations, but some
third, independent type of reference to a code fragment.

> not sure about humptydumpty's version.
>


>> Let's take the example of BacFORTH-based nested iteration over two lists
>> (from my previous message):
>>
>> : foo bar list=> bluh buz list=> blub ;
>>
>> Using the quotation-based approach this word "foo" will look as:
>>
>> : foo [: bluh [: blub ;] buz list:forearch ;] bar list:forearch ;
>>
>> \ where
>> \ list:forearch ( i*x xt list -- j*x ) \ xt ( i*x item -- j*x )
>
> Maybe the lack of readability comes from the name. The stack effect
> is also a bad idea; you seem to be assuming that BAR and BUZ have the
> stack effects ( -- x ), and in that case the stack effect does not
> show its problems, except maybe in readability. Anyway, here we use a
> shorter name and a better stack effect:
>
> \ forlist ( ... list xt -- ... ), xt ( ... element -- ... )
> : foo bar [: bluh buz ['] blub forlist ;] forlist ;

I intentionally wrote "[: blub ;]" instead of "['] blub", since I meant
that several words can be used in place of the "blub" placeholder.


> Readable enough for my taste. Now consider:
>
> : foo2 bar [: bluh buz ['] blub forlist flip ;] forlist flop ;
>
> How would you code that with LIST=> ?

I didn't advocate the "X=>" markup; I just showed its better
readability, and noted its disadvantage: it only ends with the ";" of
the containing word (similar to "does>").

So, in a better markup (which I also showed) I would code it as:

: foo2 bar list:each{ bluh buz 'blub list:each flip } flop ;
or
: foo3 bar list:each{ bluh buz list:each{ blub } flip } flop ;

where
list:each ( ... list xt -- ... )
list:each{ Run-time: ( ... list -- ... )
list:each{ content Run-time: ( ... element -- ... )

Probably, the naming (and namespaces) can be better. But my point is that
the stack effect (whether the xt or the list is on top) doesn't matter for
readability in this case. I'm talking about a different problem.

Compare the following code fragments:

bar [: bluh ...

and

bar list:each{ bluh ...

From the latter fragment of code we can assume (with some confidence)
that "bar" returns a list on the top of the stack, that the curly
bracket starts the fragment of code that handles each element of the
list, and that "bluh" accepts an element from the top of the stack.

What can we assume from the former code fragment? Actually nothing,
except that it uses a quotation, since we don't yet see how this quotation
is used.

So quotation-based code is less readable than code with
structured iterators. This problem is even reminiscent of the "callback
hell" problem.




> There is a reason why we do not have IF=>, IFELSE=> (how would that
> even work?), DO=> etc.
>
> Instead we have words that start and end a control structure, and
> sometimes words in the middle (ELSE, WHILE).


It seems I was misunderstood.

"LIST=> ... ;" is a construct, just like "IF ... ELSE ... THEN".

In a quotation-based approach we use a higher order function instead of
all these constructs.

I'm not considering a move to an "X=>"-like construct (or markup), but
a move from the conventional control flow constructs to the
corresponding higher order functions.

I.e., instead of
if foo else bar then

we can write
'foo 'bar ifelse

Or use quotations as
[: foo buz ;] [: bar qud ;] ifelse

Sometimes such functions (like "ifelse") are very useful, but usually
they lead to less readable code.

Compare above with the following:

if foo buz else bar qud then

or with a curly-based variant:
if{ foo buz }else{ bar qud }

The variant based on "ifelse" and quotations is less readable.


>
> Andrew Haley suggested that over xt-taking words, so for the list case
> this might be:

Is everything all right with Andrew? I haven't seen him here for a long
time.


>
> \ using the same approach of starting with @, adjust as necessary
> : <list ]] begin @ dup while dup >r [[ ; immediate
> : list> ]] r> repeat drop [[ ; immediate
>
> And our examples become:
>
> : foo bar <list bluh buz <list blub list> list> ;
> : foo2 bar <list bluh buz <list blub list> flip list> flop ;

Yes, that's what I have in mind, regardless of the implementation.
This naming choice is not so good, but even this variant is easier to
read than the quotation-based variant:

: foo2 bar [: bluh buz [: blub ;] forlist flip ;] forlist flop ;



> OTOH, I have added TRY ... ENDTRY (with some variants) as an
> alternative to CATCH in Gforth, but find that I more often use CATCH,
> despite TRY having additional functionality.

Are you comparing quotation+CATCH against TRY+ENDTRY?
Only that comparison makes sense in the context of this discussion.






Bottom line.
I'm trying to show that it's not a good idea to *replace* the conventional
control flow structures and structured iterators with higher order
functions. Rationale: code that is based only on higher order
functions and quotations is less readable. That said, HOFs and
quotations are sometimes very useful.


--
Ruvim

Anton Ertl

Sep 1, 2021, 6:20:20 PM
I suggested that using a single CMOVE for filling a memory region with
a pattern is inefficient and that we could use an approach that
doubles the filled buffer size at each step (and without overlaps in
copying, so one could use either CMOVE, MOVE, or CMOVE>, as
convenient). So I wanted to test this.

As it turns out, the resulting word also satisfies the requirements
for CMOVE, so I call it CMOVE, too:

: cmove {: afrom ato u | u1 -- :}
ato afrom - u u< if \ overlapping case
ato afrom - u over + to u1 begin
afrom afrom 2 pick + 2 pick u1 over - min cmove
2* dup u1 u>= until \ compare with u1 (= u + pattern length), so the tail is always filled
drop
else \ non-overlapping case
afrom ato u cmove \ CMOVE takes ( from to u )
then ;

This CMOVE calls the simple CMOVE, because that is the fastest variant
for VFX and lxf.

Eliminating the locals is left as an exercise to the puristic readers.

I benchmarked this with:

: bench {: usize ulength -- :}
usize ulength + allocate throw
ulength 0 ?do 'a' i + over i + c! loop
10000000 0 ?do dup dup ulength + usize cmove loop ;

usize is the size of the yet-unfilled buffer, ulength is the length of
the pattern. I used usize=1000 and ulength from 1..5.

Here are the results in cycles/iteration on Zen 3 (Ryzen 5800X):

    VFX64         lxf          VFX32
 orig   new    orig   new    orig   new   ulength
 7474  1125    7624  1095    7807   423   1
 3360   965    3140   933    4273   420   2
 2367   957    2380   922    3018   411   3
 2068   790    2147   761    2418   400   4
 1645   779    1766   754    2083   426   5

As can be seen, the new CMOVE is substantially faster than the REP
MOVSB based one for this use case. Of course, the question remains
how relevant this use case is; but then, this is the only use case
where CMOVE behaves differently from MOVE, so it may be a
justification for keeping CMOVE.

We see that, the larger ulength is, the faster does the REP MOVSB
perform, because it can copy more bytes during the latency of the
store-load-store chain that determines performance. For the new
CMOVE, we also see a slight improvement with increasing ulength,
because there are then fewer iterations of the loop in the new CMOVE;
the cost of such an iteration at ~160 cycles is surprisingly high;
this may have to do with the cache behaviour of REP MOVSB.

VFX32 uses a simple byte-copying loop rather than REP MOVSB for the
overlapping case (orig), and performs similarly to the REP MOVSB
implementation. VFX32 uses a cell-copying loop in the non-overlapping
case (new), resulting in even faster performance; also, we see that
the number of iterations within the new CMOVE does not have as big an
influence as for VFX64 and lxf.

dxforth

Sep 1, 2021, 8:28:37 PM
On 2/09/2021 07:34, Anton Ertl wrote:
> I suggested that using a single CMOVE for filling a memory region with
> a pattern is inefficient and that we could use an approach that
> doubles the filled buffer size at each step (and without overlaps in
> copying, so one could use either CMOVE, MOVE, or CMOVE>, as
> convenient). So I wanted to test this.
>
> As it turns out, the resulting word also satisfies the requirements
> for CMOVE, so I call it CMOVE, too:
>
> : cmove {: afrom ato u | u1 -- :}
> ato afrom - u u< if \ overlapping case
> ato afrom - u over + to u1 begin
> afrom afrom 2 pick + 2 pick u1 over - min cmove
> 2* dup u u>= until
> drop
> else \ non-overlapping case
> ato afrom u cmove
> then ;

F-PC in the mid-80s saw no such requirement. Any need for propagation in
CMOVE fell away when Forth-79 introduced FILL and made it a core word.

Marcel Hendrix

Sep 2, 2021, 2:34:22 AM
On Thursday, September 2, 2021 at 12:20:20 AM UTC+2, Anton Ertl wrote:
> I suggested that using a single CMOVE for filling a memory region with
> a pattern is inefficient and that we could use an approach that
> doubles the filled buffer size at each step (and without overlaps in
> copying, so one could use either CMOVE, MOVE, or CMOVE>, as
> convenient). So I wanted to test this.
>
> As it turns out, the resulting word also satisfies the requirements
> for CMOVE, so I call it CMOVE, too:

Here I got a bit confused what you meant exactly (I didn't want to
read the whole thread again, sorry). I assume that the new cmove
can do the same as the old cmove, move and cmove> and therefore
triplicated the code with "cmove" replaced by either of the 3 old
words.
[..]
> This CMOVE calls the simple CMOVE, because that is the fastest variant
> for VFX and lxf.
[..]
> : bench {: usize ulength -- :}
> usize ulength + allocate throw
> ulength 0 ?do 'a' i + over i + c! loop
> 10000000 0 ?do dup dup ulength + usize cmove loop ;

This leaves the address of the allocated area on the stack.

[..]
> Here are the results in cycles/iteration on Zen 3 (Ryzen 5800X):
>
>      VFX64        lxf        VFX32
>   orig  new   orig  new   orig  new  ulength
>   7474 1125   7624 1095   7807  423     1
>   3360  965   3140  933   4273  420     2
>   2367  957   2380  922   3018  411     3
>   2068  790   2147  761   2418  400     4
>   1645  779   1766  754   2083  426     5

Below are the results for iForth64. The last three columns are for
the new cmove using either cmove, move, or cmove>.
The results are in ms.

FORTH> TEST
\ size  len   cm   mv  cm>
\ 1000    1  701  623  687
\ 1000    2  665  578  648
\ 1000    3  659  596  642
\ 1000    4  559  462  541
\ 1000    5  656  577  637

It appears length 4 is optimal, and "move" is fastest but
I am not at all sure that this tests the same thing you do.

-marcel

Anton Ertl

Sep 2, 2021, 3:55:08 AM
Marcel Hendrix <m...@iae.nl> writes:
>On Thursday, September 2, 2021 at 12:20:20 AM UTC+2, Anton Ertl wrote:
>> I suggested that using a single CMOVE for filling a memory region with
>> a pattern is inefficient and that we could use an approach that
>> doubles the filled buffer size at each step (and without overlaps in
>> copying, so one could use either CMOVE, MOVE, or CMOVE>, as
>> convenient). So I wanted to test this.
>>
>> As it turns out, the resulting word also satisfies the requirements
>> for CMOVE, so I call it CMOVE, too:
>
>Here I got a bit confused what you meant exactly (I didn't want to
>read the whole thread again, sorry).

The point here is about implementing the pattern-replicating
functionality of CMOVE efficiently. I.e., if you have "ABC" at the
start of BUF, you can fill BUF with "ABCABCABC..." with

buf buf buflen 3 /string cmove
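
As a concrete illustration of that line, here is a minimal sketch. The
12-byte size of BUF is an assumption made for this example, and the
result depends on the system's CMOVE really proceeding character by
character from low to high addresses:

```forth
\ Illustration only: BUF and its 12-byte size are made up for this example.
create buf 12 allot
s" ABC" buf swap move          \ seed the first three bytes with the pattern
buf buf 12 3 /string cmove     \ copy buf -> buf+3, 9 bytes, low to high
buf 12 type                    \ ABCABCABCABC with a char-by-char CMOVE
```

Each byte written at buf+3 and beyond is read back later in the same
copy, which is what replicates the pattern.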

>I assume that the new cmove
>can do the same as the old cmove, move and cmove> and therefore
>triplicated the code with "cmove" replaced by either of the 3 old
>words.

The new CMOVE behaves as CMOVE is required to do (not MOVE, not
CMOVE>). The orig CMOVE it calls in the overlapping case can be
replaced by either MOVE or CMOVE>, because these calls only pass
non-overlapping parameters. Rethinking, for the "non-overlapping"
case you can use CMOVE or MOVE, but not CMOVE>, because this case is
not really non-overlapping, it's just that ato is not in the afrom
range; there is still the possibility of afrom being in the ato range,
and CMOVE> would not work for that. So here's a version with valid
alternative lines in comments:

: cmove {: afrom ato u | u1 -- :}
ato afrom - u u< if \ overlapping case
ato afrom - u over + to u1 begin
afrom afrom 2 pick + 2 pick u1 over - min cmove
\ afrom afrom 2 pick + 2 pick u1 over - min move
\ afrom afrom 2 pick + 2 pick u1 over - min cmove>
2* dup u u>= until
drop
else \ the usual non-pattern case
ato afrom u cmove
\ ato afrom u move
then ;

The non-pattern case is not benchmarked.

>Below are the results for iForth64. The last three columns are for
>the new cmove using either cmove, move, or cmove>.
>The results are in ms.
>
>FORTH> TEST
>\ size  len   cm   mv  cm>
>\ 1000    1  701  623  687
>\ 1000    2  665  578  648
>\ 1000    3  659  596  642
>\ 1000    4  559  462  541
>\ 1000    5  656  577  637
>
>It appears length 4 is optimal, and "move" is fastest but
>I am not at all sure that this tests the same thing you do.

It probably does. The different lengths are different use cases
(different pattern lengths), so they are not replacements for each
other, and it does not make sense to

Your results look similar to the VFX32 "new" results, so I guess you
are not using REP MOVSB for implementing CMOVE, CMOVE>, and MOVE.

The reason that longer patterns tend to be a little faster is that
they reduce the number of iterations in the new CMOVE. I guess that
the reason that pattern length 4 is faster than pattern length 5 for
VFX32 is due to VFX32 using cell (4-byte) loads and stores for much of
the memory block, and byte loads and stores for the rest. With
pattern length 5 the first few iterations perform unaligned cell loads
and stores, and perform some byte loads and stores. Maybe something
along these lines happens for iForth, too.

dxforth

Sep 2, 2021, 5:48:55 AM
Are you saying MOVE will do? Elizabeth's position was that the /only/
reason to include CMOVE and CMOVE> in ANS was propagation. She went
on to say CMOVE> was rarely used for this purpose and could probably be
scrapped. If MOVE will do then both can be scrapped.

Anton Ertl

Sep 2, 2021, 6:54:03 AM
dxforth <dxf...@gmail.com> writes:
>Are you saying MOVE will do? Elizabeth's position was that the /only/
>reason to include CMOVE and CMOVE> in ANS was propagation. She went
>on to say CMOVE> was rarely used for this purpose and could probably be
>scrapped. If MOVE will do then both can be scrapped.

Yes, you can provide MOVE, and for those who use CMOVE for pattern
propagation, provide a file that contains

: cmove {: afrom ato u | u1 -- :}
ato afrom - u u< if \ overlapping case
ato afrom - u over + to u1 begin
afrom afrom 2 pick + 2 pick u1 over - min move
2* dup u u>= until
drop
else \ the usual non-pattern case
ato afrom u move
then ;

Benchmarking that in addition to the orig-CMOVE-based version (again,
on the Ryzen 5800X):

                  cycles/iteration
     VFX64            lxf            VFX32
 orig  new move   orig  new move   orig  new move  ulength
 7474 1125 2903   7624 1095 2240   7807  423  380     1
 3360  965 2722   3140  933 2059   4273  420  373     2
 2367  957 2724   2380  922 2313   3018  411  389     3
 2068  790 2537   2147  761 1871   2418  400  375     4
 1645  779 2544   1766  754 2002   2083  426  370     5

So it depends on the MOVE implementation whether this is a good idea.

For VFX32 the resulting CMOVE is even faster than the orig-CMOVE-based
one.

OTOH, VFX64's and lxf's MOVE use the slow case of REP MOVSB in this
usage, and the result is not so great. It is still better than the
original CMOVE for length<=2 (VFX64) or length<=4 (lxf), but I expect
that in the non-pattern case the new CMOVE will never be faster and
occasionally much slower than the original CMOVE on these systems.

dxforth

Sep 2, 2021, 7:50:16 AM
On 2/09/2021 07:34, Anton Ertl wrote:
>
> Eliminating the locals is left as an exercise to the puristic readers.

: propagate ( afrom ato u -- )
>r swap ( ato afrom) 2dup - r@ u< if \ overlapping case
tuck ( ato afrom) - r@ ( u) over + begin ( u1) >r
over ( afrom) dup 2 pick + 2 pick r@ over - min move
2* r> over r@ ( u) u< not
until r> 2drop 2drop
else \ the usual non-pattern case
r> ( ato afrom u) move
then ;

Ruvim

Sep 2, 2021, 8:59:04 AM
On 2021-09-01 18:58, Anton Ertl wrote:
> Ruvim <ruvim...@gmail.com> writes:
>> On 2021-08-31 19:12, Anton Ertl wrote:
>>> Ruvim <ruvim...@gmail.com> writes:
>>>> On 2021-08-31 10:10, Anton Ertl wrote:
>>>>> Note the pronunciation: "c-move-up". It means: cmove the memory
>>>>> block from lower to higher addresses; i.e., the name describes the
>>>>> usage (what), not the how.
> ...
>> The wording "from lower to higher addresses" is almost the same as in
>> the specification of how "cmove" works: "proceeding
>> character-by-character from lower addresses to higher addresses"
>
> Yes, the difference is between "proceeding" and "move".
>
>> A problem with this etymology for "cmove>" is that it *can* be actually
>> used to copy a memory region to a lower address range as well as to a
>> higher address range.
>
> The name and its pronunciation reflect the intended use, not all
> possible uses.

A possible interpretation of "up" is "start copying from upper addresses".
Though the specification uses the word "higher" for that.
As you show in your tests, "cmove>" can be efficiently implemented at
a high level (compared to straightforward machine code).

OTOH, it seems "cmove>" is used quite rarely.

Would it make sense to move "cmove>" (and maybe "cmove") to the String
extension words?


--
Ruvim

Ruvim

Sep 2, 2021, 9:46:57 AM
On 2021-08-31 15:23, dxforth wrote:
> Here's a listing of the naming conventions published in 'Thinking
> Forth' and 'Forth Programmers Handbook':
>
> https://pastebin.com/qpZLFc6h
>
> As to the difference between:
>
>   !name     Store into name                           !DATA
>
>   name!     Store into name                           B!
>
> I was informed it likely stemmed from when forth names were stored
> as count-plus-three characters.

No standard word follows this convention concerning "!".
E.g.: "C!", "2!", "+!", "DEFER!", etc.

Though in usual practice, if you have the words "name" and "name!", the
latter one stores something into "name". Probably it's better to use
"!name" instead.


Concerning other old conventions: some of them are outdated, and some of
them are not consistent enough.

In general, for new noninternal words it's better to avoid special
symbols, except for the most consistent and conventional variants.


--
Ruvim

Anton Ertl

Sep 2, 2021, 10:12:19 AM
Ruvim <ruvim...@gmail.com> writes:
>A possible interpretation of "up" is "start copying from upper addresses"

Whatever helps you to understand and use it correctly.

>As you show in your tests, "cmove>" can be efficiently implemented at
>a high level (compared to straightforward machine code).

Actually, my tests are about CMOVE, not CMOVE>, but I expect similar
results for CMOVE> (except that the orig CMOVE> does worse than the
orig CMOVE on implementations using REP MOVSB, so maybe the speedups
will be better).

>OTOH, it seems "cmove>" is used quite rarely.
>
>Would it make sense to move "cmove>" (and maybe "cmove") to the String
>extension words?

What would be the benefit? A system (which one?) that does not
implement CMOVE>/CMOVE could change the documentation. New system
implementors (who?) might feel less motivated to implement it. Forth
programmers might be more cautious about using it (although, given
that MOVE is in CORE and the others are in STRING, those who care
about such things probably go for MOVE already).

If these were new words, sure, put them into STRING EXT, but once they
are standardized, I see little benefit in moving them around (except
for moving obsolescent words out of CORE).

Stephen Pelc

Sep 2, 2021, 10:44:50 AM
On 1 Sep 2021 at 23:34:40 CEST, "Anton Ertl" <Anton Ertl> wrote:
>
> Here are the results in cycles/iteration on Zen 3 (Ryzen 5800X):
>
>      VFX64        lxf        VFX32
>   orig  new   orig  new   orig  new  ulength
>   7474 1125   7624 1095   7807  423     1
>   3360  965   3140  933   4273  420     2
>   2367  957   2380  922   3018  411     3
>   2068  790   2147  761   2418  400     4
>   1645  779   1766  754   2083  426     5

Well, that was interesting, if not a little frustrating. My tests have been
performed on a mid-2012 Macbook Pro with a 2.6GHz Core i7. The Forth is:

VFX Forth 64 for Mac OS X x64
© MicroProcessor Engineering Ltd, 1998-2021

Version: 5.20 RC2 [build 0381]
Build date: 1 July 2021

For no special cases, REP MOVSB seems to be the fastest straight copy, but
the limited pattern stuff means that this cannot be used much. Far and away
the best alternative is a transliteration of VFX32's CMOVE> to AMD64. Hence,
the next release will contain CMOVE and CMOVE> as described here, and a new
MOVE with a more sophisticated overlap detector.

These changes will mean that if you want to use CMOVE and CMOVE> to
perform tricksy copies, you still can, but I have no sympathy at the corner
cases. The general case can be handled by MOVE.

My conclusions with respect to AMD64 are that REP MOVSB is only optimised
for CLD (increasing addresses), and that for decreasing addresses you are
better off (at least on old x64 CPUs) writing old-style code without MOVSx.
In particular, all my experiments using REP MOVSQ were failures from a
performance point of view. There are notes online that suggest that the
fastest code depends on knowing the block length.

> As can be seen, the new CMOVE is substantially faster than the REP
> MOVSB based one for this use case. Of course, the question remains
> how relevant this use case is; but then, this is the only use case
> where CMOVE behaves differently from MOVE, so it may be a
> justification for keeping CMOVE.

If you write code that depends on corner cases, you NEED to keep
CMOVE and CMOVE>.

Stephen

--
Stephen Pelc, ste...@vfxforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, +44 (0)78 0390 3612, +34 649 662 974
http://www.mpeforth.com - free VFX Forth downloads

dxforth

Sep 2, 2021, 6:49:46 PM
Hmmm - neither this nor the locals version appear to be completing
the fill. Should I need a propagating move, I'll just define one
as follows:

: pmove ( a a2 u --- ) bounds ?do dup c@ i c! 1+ loop drop ;

For straight memory copying I'll use CMOVE CMOVE> MOVE in that
order of preference as it seems the most logical to me.

dxforth

Sep 2, 2021, 9:32:31 PM
On 2/09/2021 23:46, Ruvim wrote:
> On 2021-08-31 15:23, dxforth wrote:
>> Here's a listing of the naming conventions published in 'Thinking
>> Forth' and 'Forth Programmers Handbook':
>>
>> https://pastebin.com/qpZLFc6h
>>
>> As to the difference between:
>>
>>   !name     Store into name                           !DATA
>>
>>   name!     Store into name                           B!
>>
>> I was informed it likely stemmed from when forth names were stored
>> as count-plus-three characters.
>
> No standard word follows this convention concerning "!".
> E.g.: "C!", "2!", "+!", "DEFER!", etc.
>
> Though in usual practice, if you have the words "name" and "name!", the
> latter one stores something into "name". Probably it's better to use
> "!name" instead.

Were I extending the range of standard operators I'd try to maintain
convention. For operations specific to an application, it probably
doesn't matter and one can use what feels best. If it's at odds with
convention it stands out which can be a good thing.

: !FNAME ( a n -- ) max-path min cf cell+ place ;
: @FNAME ( -- a n ) cf cell+ count ;

: @STR ( ofs -- a u ) >buf count ;
: !STR ( a u ofs -- ) >buf place ;
: @BYT ( ofs -- c ) >buf c@ ;
: !BYT ( c ofs -- ) >buf c! ;

Anton Ertl

Sep 3, 2021, 6:48:38 AM
dxforth <dxf...@gmail.com> writes:
>On 2/09/2021 21:50, dxforth wrote:
>> On 2/09/2021 07:34, Anton Ertl wrote:
>>>
>>> Eliminating the locals is left as an exercise to the puristic readers.
>>
>> : propagate ( afrom ato u -- )
>> >r swap ( ato afrom) 2dup - r@ u< if \ overlapping case
>> tuck ( ato afrom) - r@ ( u) over + begin ( u1) >r
>> over ( afrom) dup 2 pick + 2 pick r@ over - min move
>> 2* r> over r@ ( u) u< not
>> until r> 2drop 2drop
>> else \ the usual non-pattern case
>> r> ( ato afrom u) move
>> then ;
>
>Hmmm - neither this nor the locals version appear to be completing
>the fill.

I tested only the pattern case, and of course the non-pattern case is
broken (to and from interchanged, reported by Peter Faelth) in the
version I posted, and you did not catch my bug when eliminating the
locals. If you have any other case where this CMOVE does not satisfy
the requirements for CMOVE, please provide it as a test case.

So this one should be correct:

: cmove {: afrom ato u | u1 -- :}
ato afrom - u u< if \ pattern replication case
ato afrom - u over + to u1 begin
afrom afrom 2 pick + 2 pick u1 over - min move
2* dup u u>= until
drop
else \ the usual non-pattern case
afrom ato u move
then ;

And adjusting your version for it:

: propagate ( afrom ato u -- )
>r 2dup swap - r@ u< if \ overlapping case
over - r@ ( u) over + begin ( u1) >r
over ( afrom) dup 2 pick + 2 pick r@ over - min move
2* r> over r@ ( u) u< not
until r> 2drop 2drop
else \ the usual non-pattern case
r> ( afrom ato u) move
then ;

Untested.

dxforth

Sep 3, 2021, 10:32:56 PM
On 3/09/2021 20:31, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>
>>Hmmm - neither this nor the locals version appear to be completing
>>the fill.
>
> I tested only the pattern case, and of course the non-pattern case is
> broken (to and from interchanged, reported by Peter Faelth) in the
> version I posted, and you did not catch my bug when eliminating the
> locals. If you have any other case where this CMOVE does not satisfy
> the requirements for CMOVE, please provide it as a test case.

Using Ruvim's test:

s" A-B-*******" 2dup 4 /string 3 pick -rot cmove cr type
A-B-A-B-*** ok
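
For comparison, a byte-by-byte reference copy shows what a compliant
CMOVE should produce for this test. REF-CMOVE and TBUF are names made
up for this sketch; the test string is copied into TBUF first so that
the string literal itself is not overwritten:

```forth
\ Reference semantics: copy one character at a time, low to high addresses.
: ref-cmove ( afrom ato u -- )
  0 ?do  over i + c@  over i + c!  loop  2drop ;

create tbuf 11 allot
s" A-B-*******" tbuf swap move       \ 11-character test string
tbuf tbuf 11 4 /string ref-cmove     \ copy tbuf -> tbuf+4, 7 bytes
cr tbuf 11 type                      \ A-B-A-B-A-B
```

The last three characters are where the output above falls short.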

What continues to remain unanswered however is the necessity to have
CMOVE propagate. 200x has once again made CMOVE and CMOVE> logical
factors of MOVE, and as such, they deserve to be as fast as possible. As
your code demonstrates, propagation and moving data can - and perhaps
should - be de-linked.

Anton Ertl

Sep 4, 2021, 3:51:42 AM
dxforth <dxf...@gmail.com> writes:
>s" A-B-*******" 2dup 4 /string 3 pick -rot cmove cr type
>A-B-A-B-*** ok

Thanks. I hope I got it right this time:

: cmove {: afrom ato u -- :}
ato afrom - u u< if \ pattern propagation case
ato afrom - >r ato u begin
2dup r@ min afrom -rot move
dup r@ u> while
r@ /string r> 2* >r repeat
r> drop 2drop
else \ non-pattern case
afrom ato u move
then ;

Rather than trying to find and fix the thinko, I just rewrote the
overlapping case from scratch. This will not work on systems that
rely on programs complying with the restriction 13.3.3.2 d).

>What continues to remain unanswered however is the necessity to have
>CMOVE propagate. 200x has once again made CMOVE and CMOVE> logical
>factors of MOVE, and as such, they deserve to be as fast as possible.

They are not logical factors at all, exactly because of performance
reasons. CMOVE and CMOVE> have requirements that make straightforward
implementations slow (and MOVE based on it slow, too), while fast
implementations (like the one above) are more complex than required
for implementing MOVE.

You may be thinking of common factors to both. Let's call them

MOVE< ( from to u -- ) \ do not call MOVE< with to in [from,from+u)
MOVE> ( from to u -- ) \ do not call MOVE> with from in [to,to+u)

You could use MOVE< instead of MOVE in the code above.

>As
>your code demonstrates, propagation and moving data can - and perhaps
>should - be de-linked.

They are: We have CMOVE (and CMOVE>) for propagation, and MOVE for moving.

Anton Ertl

Sep 4, 2021, 9:01:51 AM
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>dxforth <dxf...@gmail.com> writes:
>>s" A-B-*******" 2dup 4 /string 3 pick -rot cmove cr type
>>A-B-A-B-*** ok
>
>Thanks. I hope I got it right this time:
>
>: cmove {: afrom ato u -- :}
> ato afrom - u u< if \ pattern propagation case
> ato afrom - >r ato u begin
> 2dup r@ min afrom -rot move
> dup r@ u> while
> r@ /string r> 2* >r repeat
> r> drop 2drop
> else \ non-pattern case
> afrom ato u move
> then ;

Still does not work in some corner cases; here's a more fully tested
version (again rewritten from scratch):

: cmove ( afrom ato u -- )
dup 0= ?exit
begin {: afrom ato u :} ato afrom - {: u1 :}
afrom ato u1 u umin move
u1 1 u within while
afrom ato u u1 /string repeat ;

And here a no-locals version (also tested) for those systems that are
not up to the locals usage exhibited in the code above:

: cmove ( afrom ato u -- )
dup 0= ?exit
begin ( afrom1 ato1 u1 )
over 3 pick - 2>r
2dup 2r@ umin move
2r@ 1 rot within while
2r> /string repeat
2r> 2drop 2drop ;

We recently had a discussion about "confusing stack juggling". Who
has an easier time following the data flow in the no-locals version
compared to the locals version?

The performance of this variant is slightly worse than the old variant
using MOVE (but then, the old version is not correct):

                          cycles/iteration
        VFX64                 lxf                 VFX32
 orig  new move sep4   orig  new move sep4   orig  new move sep4  ulength
 7474 1125 2903 2960   7624 1095 2240 2258   7807  423  380  396     1
 3360  965 2722 2774   3140  933 2059 2071   4273  420  373  386     2
 2367  957 2724 2768   2380  922 2313 2329   3018  411  389  397     3
 2068  790 2537 2583   2147  761 1871 1888   2418  400  375  392     4
 1645  779 2544 2585   1766  754 2002 2024   2083  426  370  381     5

dxforth

Sep 4, 2021, 9:16:36 AM
On 4/09/2021 17:31, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
> ...
>>What continues to remain unanswered however is the necessity to have
>>CMOVE propagate. 200x has once again made CMOVE and CMOVE> logical
>>factors of MOVE, and as such, they deserve to be as fast as possible.
>
> They are not logical factors at all, exactly because of performance
> reasons. CMOVE and CMOVE> have requirements that make straightforward
> implementations slow (and MOVE based on it slow, too), while fast
> implementations (like the one above) are more complex than required
> for implementing MOVE.

What requirement - propagation in CMOVE? I can't recall ever using it.
On the rare occasion pattern fill is required a stand-alone function
would be simpler.

: !pattern ( buf len pat len -- )
2 pick min 2>r
begin dup r@ min 0> while
2dup r@ min 2r@ drop -rot cmove r@ /string \ clip last copy to what is left
repeat 2r> 2drop 2drop ;

> You may be thinking of common factors to both. Let's call them
>
> MOVE< ( from to u -- ) \ do not call MOVE< with to in [from,from+u)
> MOVE> ( from to u -- ) \ do not call MOVE> with from in [to,to+u)
>
> You could use MOVE< instead of MOVE in the code above.
>
>>As
>>your code demonstrates, propagation and moving data can - and perhaps
>>should - be de-linked.
>
> They are: We have CMOVE (and CMOVE>) for propagation, and MOVE for moving.

Pre-ANS, CMOVE and CMOVE> were used for moving. MOVE is for those who are
confused as to which to use.

Anton Ertl

Sep 4, 2021, 11:29:21 AM
dxforth <dxf...@gmail.com> writes:
>On 4/09/2021 17:31, Anton Ertl wrote:
>> CMOVE and CMOVE> have requirements that make straightforward
>> implementations slow (and MOVE based on it slow, too), while fast
>> implementations (like the one above) are more complex than required
>> for implementing MOVE.
>
>What requirement - propagation in CMOVE?

Propagation of patterns is a consequence; the specified requirement
is:

|If u is greater than zero, copy u consecutive characters from the data
|space starting at c-addr1 to that starting at c-addr2, proceeding
|character-by-character from lower addresses to higher addresses.

>I can't recall ever using it.

Then you can simply replace all calls of CMOVE in your code with calls
of MOVE for better performance.

If you then don't need CMOVE, you can remove it from your system.

[dxforth wrote:]
>>>As
>>>your code demonstrates, propagation and moving data can - and perhaps
>>>should - be de-linked.
>>
>> They are: We have CMOVE (and CMOVE>) for propagation, and MOVE for moving.
>
>Pre ANS CMOVE and CMOVE> were used for moving. MOVE is for those who are
>confused as to which to use.

The classical CMOVE (and CMOVE>) with its byte-by-byte implementation
could be used for block-copying (with some limitations) and for
patterns, and it still can. If you want something less limited and
more efficient for block-copying, use MOVE. If you want something
more efficient for patterns, use my CMOVE implementation.

dxforth

Sep 4, 2021, 11:01:34 PM
On 5/09/2021 01:10, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 4/09/2021 17:31, Anton Ertl wrote:
>>> CMOVE and CMOVE> have requirements that make straightforward
>>> implementations slow (and MOVE based on it slow, too), while fast
>>> implementations (like the one above) are more complex than required
>>> for implementing MOVE.
>>
>>What requirement - propagation in CMOVE?
>
> Propagation of patterns is a consequence; the specified requirement
> is:
>
> |If u is greater than zero, copy u consecutive characters from the data
> |space starting at c-addr1 to that starting at c-addr2, proceeding
> |character-by-character from lower addresses to higher addresses.

I know what the spec is. You broke it when you decided the char-by-char
requirement was inefficient.

>
>>I can't recall ever using it.
>
> Then you can simply replace all calls of CMOVE in your code with calls
> of MOVE for better performance.

There's no guarantee MOVE has better performance than CMOVE. It certainly
spends more time deciding - something I can do at compile-time where it
costs nothing.

Anton Ertl

Sep 5, 2021, 6:38:32 AM
dxforth <dxf...@gmail.com> writes:
>On 5/09/2021 01:10, Anton Ertl wrote:
>> |If u is greater than zero, copy u consecutive characters from the data
>> |space starting at c-addr1 to that starting at c-addr2, proceeding
>> |character-by-character from lower addresses to higher addresses.
>
>I know what the spec is. You broke it when you decided the char-by-char
>requirement was inefficient.

From what you wrote, you decided that the requirement is inefficient,
and use cell-by-cell copying instead. Which is fine for copying, but
does not always produce the results that CMOVE is required to produce.
By contrast, my CMOVE implementation is efficient and always produces
the right results.

>There's no guarantee MOVE has better performance than CMOVE. It certainly
>spends more time deciding - something I can do at compile-time where it
>costs nothing.

Certainly? My implementation of CMOVE spends more time on deciding
than a typical implementation of MOVE.

Sure, you can write a dumb CMOVE. It will not spend time on deciding,
but it will also be slow. Or you can use a fast CMOVE, which will
spend time on deciding, but amortizes this time quickly.

You can implement MOVE to call a dumb CMOVE, and that's the only
scenario in which your reasoning holds; but then, if you have
implemented both CMOVE and MOVE to perform slowly, why worry about
eliminating the time that MOVE takes for deciding? Penny-wise and
pound-foolish?

Or you can implement MOVE to call the MOVE< or MOVE> factors mentioned
earlier, which do not give the guarantees that CMOVE and CMOVE> give,
and can therefore be implemented efficiently with fewer decisions.

And if you go for efficient implementations, the overhead of MOVE is
less than that of CMOVE. Here are implementations of both in terms of
MOVE< and MOVE>:

: move ( from to u -- )
over 3 pick - over u< if \ to in [from,from+u)
move>
else
move<
then ;

: cmove ( afrom ato u -- )
dup 0= if exit then
begin ( afrom1 ato1 u1 )
over 3 pick - 2>r
2dup 2r@ umin move<
2r@ 1 rot within while
2r> /string repeat
2r> 2drop 2drop ;
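
The two definitions above rely on MOVE< and MOVE>, which are only
specified by their stack comments earlier in the thread. Here is a
minimal byte-at-a-time sketch of the two factors, meant to pin down
their semantics rather than to be fast (a real system would presumably
copy cells or use machine code here):

```forth
\ Sketch only: slow placeholder definitions for the MOVE< and MOVE> factors.

\ Ascending copy; the caller must not pass to in [from,from+u).
: move< ( from to u -- )
  0 ?do  over i + c@  over i + c!  loop  2drop ;

\ Descending copy; the caller must not pass from in [to,to+u).
: move> ( from to u -- )
  begin  dup while
    1- >r                        \ R: index of the byte to copy next
    over r@ + c@  over r@ + c!
    r>
  repeat  drop 2drop ;
```

With these two loaded first, the MOVE and CMOVE definitions above can
be compiled and tested on any standard system.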

Marcel Hendrix

Sep 5, 2021, 9:52:03 AM
Indeed surprising... The results are reported in cycles/iteration.
With locals times are significantly slower.
I repeated the measurements 4 times, no change in any of the
shown digits.

FORTH> test
AMD Ryzen 7 5800X 8-Core Processor, TICKS-GET uses iTSC at 4192MHz
             original  new-locals  new-stack
\ 1000   1     5,562       310        275
\ 1000   2     3,069       293        266
\ 1000   3     2,103       285        250
\ 1000   4     1,611       229        204
\ 1000   5     1,320       292        260  ok

-marcel

P Falth

Sep 5, 2021, 10:14:30 AM
When I check the standard it is clear that MOVE moves address units
from and to address units. Now that 1 au = 1 char, could that not be simplified to
c-addrs and chars?

Is there any program (except test programs) that relies on cmove propagating?
I did a faster cmove when porting LXF64 to ARM64 but got caught by the test program.
ARM does not have anything like rep movs, so the byte-by-byte copy suffers much
more than on X64.

Peter

Anton Ertl

Sep 5, 2021, 10:40:14 AM
P Falth <peter....@gmail.com> writes:
>When I check the standard it is clear that MOVE moves address units
>from and to address units. Now that 1 au = 1 char, could that not be simplified to
>c-addrs and chars?

Yes.

>Is there any program (except test programs) that relies on cmove propagating?

I don't think I have used this feature of CMOVE in production code,
but then, I don't use CMOVE in code that does not use this feature,
either.

>I did a faster cmove when porting LXF64 to ARM64 but got caught by the test program.
>ARM does not have anything like rep movs, so the byte-by-byte copy suffers much
>more than on X64.

Your "faster CMOVE" might be MOVE<, and you can then use the code I
posted for MOVE and CMOVE.

dxforth

Sep 5, 2021, 10:54:11 AM
On 5/09/2021 19:50, Anton Ertl wrote:
> dxforth <dxf...@gmail.com> writes:
>>On 5/09/2021 01:10, Anton Ertl wrote:
>>> |If u is greater than zero, copy u consecutive characters from the data
>>> |space starting at c-addr1 to that starting at c-addr2, proceeding
>>> |character-by-character from lower addresses to higher addresses.
>>
>>I know what the spec is. You broke it when you decided the char-by-char
>>requirement was inefficient.
>
> From what you wrote, you decided that the requirement is inefficient,
> and use cell-by-cell copying instead. Which is fine for copying, but
> does not always produce the results that CMOVE is required to produce.
> By contrast, my CMOVE implementation is efficient and always produces
> the right results.

I decided a CMOVE faster than MOVE was in my best interests. A CMOVE
that takes up space and that I have no reason to use was not.

>>There's no guarantee MOVE has better performance than CMOVE. It certainly
>>spends more time deciding - something I can do at compile-time where it
>>costs nothing.
>
> Certainly? My implementation of CMOVE spends more time on deciding
> than a typical implementation of MOVE.

MOVE has to make a decision; CMOVE and CMOVE> do not.