Jonesforth and Hayes CORE tests

Richard Russell

Oct 1, 2009, 1:05:22 PM

I'm gradually working my way through the Hayes CORE tests (core.fr),
fixing problems in BB4Wforth (based on Jonesforth) as I find them -
and there have been many! Even Jonesforth's implementation of WHILE
and REPEAT, whilst working correctly in all common circumstances,
proved slightly non-compliant.

Inevitably I've now hit the CREATE...DOES> tests (see results below).
I know that Jonesforth is substantially incompatible with compliant
behaviour, which is why <BUILDS...DOES> has been implemented by a
couple of people instead. What I don't know is just how much work
would be involved in making CREATE...DOES> work properly.

I would value the opinion of the assembled experts on whether it is
worth pursuing this, or whether <BUILDS...DOES> is as good as it gets
with Jonesforth.

Richard.
http://www.rtrussell.co.uk/

BB4Wforth version 0.31 adapted from Jonesforth version 45
Corrections and additions by R.T. Russell, September 2009
244943 cells remaining
OK
S" tester.f" INCLUDED
S" core.f" INCLUDED

TESTING CORE WORDS
TESTING BASIC ASSUMPTIONS
TESTING BOOLEANS: INVERT AND OR XOR
TESTING 2* 2/ LSHIFT RSHIFT
TESTING COMPARISONS: 0= = 0< < > U< MIN MAX
TESTING STACK OPS: 2DROP 2DUP 2OVER 2SWAP ?DUP DEPTH DROP DUP OVER ROT SWAP
TESTING >R R> R@
TESTING ADD/SUBTRACT: + - 1+ 1- ABS NEGATE
TESTING MULTIPLY: S>D * M* UM*
TESTING DIVIDE: FM/MOD SM/REM UM/MOD */ */MOD / /MOD MOD
TESTING HERE , @ ! CELL+ CELLS C, C@ C! CHARS 2@ 2! ALIGN ALIGNED +! ALLOT
TESTING CHAR [CHAR] [ ] BL S"
TESTING ' ['] FIND EXECUTE IMMEDIATE COUNT LITERAL POSTPONE STATE
TESTING IF ELSE THEN BEGIN WHILE REPEAT UNTIL RECURSE
TESTING DO LOOP +LOOP I J UNLOOP LEAVE EXIT
TESTING DEFINING WORDS: : ; CONSTANT VARIABLE CREATE DOES> >BODY

Abort error: T{ CR1
OK

Josh Grams

Oct 1, 2009, 6:37:52 PM

Richard Russell wrote:
> I'm gradually working my way through the Hayes CORE tests (core.fr),
> fixing problems in BB4Wforth (based on Jonesforth) as I find them -
> and there have been many! Even Jonesforth's implementation of WHILE
> and REPEAT, whilst working correctly in all common circumstances,
> proved slightly non-compliant.
>
> Inevitably I've now hit the CREATE...DOES> tests (see results below).
> I know that Jonesforth is substantially incompatible with compliant
> behaviour, which is why <BUILDS...DOES> has been implemented by a
> couple of people instead. What I don't know is just how much work
> would be involved in making CREATE...DOES> work properly.
>
> I would value the opinion of the assembled experts on whether it is
> worth pursuing this, or whether <BUILDS...DOES> is as good as it gets
> with Jonesforth.

Disclaimer: I don't know the history at all, and I'm basing this on just
a quick grep through my forth directory.

Looking at Andrew's <BUILDS DOES> implementation and the Forth-83
standard and a couple of comments in the PFE source, it looks to me like
CREATE is simply the new (ANS) name for the FIG-Forth <BUILDS.

--Josh

Coos Haak

Oct 1, 2009, 7:44:09 PM

On Thu, 01 Oct 2009 22:37:52 GMT, Josh Grams wrote:

But with a difference, DOES> is now IMMEDIATE and <BUILDS compiled an
extra cell ;-)
Is there a <BUILDS in the Forth-83 standard? It was abandoned in
Forth-79 and replaced by CREATE I seem to remember.
--
Coos

CHForth, 16 bit DOS applications
http://home.hccnet.nl/j.j.haak/forth.html

Andrew Haley

Oct 2, 2009, 4:12:22 AM

> On Thu, 01 Oct 2009 22:37:52 GMT, Josh Grams wrote:

> > Richard Russell wrote:
> >> I'm gradually working my way through the Hayes CORE tests (core.fr),
> >> fixing problems in BB4Wforth (based on Jonesforth) as I find them -
> >> and there have been many! Even Jonesforth's implementation of WHILE
> >> and REPEAT, whilst working correctly in all common circumstances,
> >> proved slightly non-compliant.
> >>
> >> Inevitably I've now hit the CREATE...DOES> tests (see results below).
> >> I know that Jonesforth is substantially incompatible with compliant
> >> behaviour, which is why <BUILDS...DOES> has been implemented by a
> >> couple of people instead. What I don't know is just how much work
> >> would be involved in making CREATE...DOES> work properly.

For goodness' sake, I have already provided a compliant version of
CREATE ... DOES> ! The only difference was that I had to rename the
standard word CREATE to <BUILDS because the name CREATE was already
used for something else. If you rename

CREATE -> (create) or somesuch
<BUILDS -> CREATE

you're almost done. All that remains is to point the default code
field of a child of CREATE at the runtime action of VARIABLE .

> > Looking at Andrew's <BUILDS DOES> implementation and the Forth-83
> > standard and a couple of comments in the PFE source, it looks to me like
> > CREATE is simply the new (ANS) name for the FIG-Forth <BUILDS.

Exactly.

Andrew.

Richard Russell

Oct 2, 2009, 4:33:48 AM

On 2 Oct, 09:12, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:

> For goodness' sake, I have already provided a compliant version of
> CREATE ... DOES> !

My conclusion was based on this exchange between Albert van der Horst
and yourself:

> > >: DODOES R> LATEST @ >CFA ! ;
> > >: <BUILDS WORD CREATE 0 , ;
> > >: DOES> ' DODOES , E8 C, (DOES) HERE @ 4+ - , ;
> > A DOES> that allocates spaces is incompatible with ISO.
> > The standard allows DOES> to fill in a pointer, not more.
>
> I think we must agree that, in the case of JonesForth, ISO is the
> pinnacle of irrelevance! But yes, having the data somewhere other
> than at the PFA is certainly an inconvenience.

I'm quite confused. On the one hand you seemed to admit that your
DOES> is not compliant, but now you say that it is!

My other puzzlement is that, as Coos said, your <BUILDS compiles an
extra cell (0 ,) which AIUI CREATE shouldn't do.

> CREATE -> (create) or somesuch
> <BUILDS -> CREATE

If I do that, the Hayes test fails much earlier.

Richard.
http://www.rtrussell.co.uk/
To reply by email change 'news' to my forename.

Richard Russell

Oct 2, 2009, 5:19:00 AM

On 2 Oct, 09:33, Richard Russell <n...@rtrussell.co.uk> wrote:
> My other puzzlement is that, as Coos said, your <BUILDS compiles an
> extra cell (0 ,) which AIUI CREATE shouldn't do.

Scrub that, I'm getting confused with the other implementation of
<BUILDS (which is the one I actually ended up using); I'm not too sure
what Coos meant now. Nevertheless, I still want to know whether your
DOES> is compliant or not.

Andrew Haley

Oct 2, 2009, 5:22:42 AM

Richard Russell <ne...@rtrussell.co.uk> wrote:
> On 2 Oct, 09:12, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
> > For goodness' sake, I have already provided a compliant version of
> > CREATE ... DOES> !

> My conclusion was based on this exchange between Albert van der Horst
> and yourself:

> > > >: DODOES R> LATEST @ >CFA ! ;
> > > >: <BUILDS WORD CREATE 0 , ;
> > > >: DOES> ' DODOES , E8 C, (DOES) HERE @ 4+ - , ;
> > > A DOES> that allocates spaces is incompatible with ISO.
> > > The standard allows DOES> to fill in a pointer, not more.
> >
> > I think we must agree that, in the case of JonesForth, ISO is the
> > pinnacle of irrelevance! But yes, having the data somewhere other
> > than at the PFA is certainly an inconvenience.

> I'm quite confused. On the one hand you seemed to admit that your
> DOES> is not compliant, but now you say that it is!

No, I didn't say that, I said that ISO compliance was irrelevant to
JonesForth, as it stood at the time.

> My other puzzlement is that, as Coos said, your <BUILDS compiles an
> extra cell (0 ,) which AIUI CREATE shouldn't do.

He's wrong. There is no extra cell. That "extra cell" is no such
thing: it's the code field. The problem is that JonesForth's CREATE
doesn't construct the whole header, just the name field. A real
CREATE is something like

: CREATE WORD (JonesCREATE) (variable) , ;

> > CREATE -> (create) or somesuch
> > <BUILDS -> CREATE

> If I do that, the Hayes test fails much earlier.

For a fully compliant CREATE, instead of 0 , you have to do something
like (variable) , where (variable) is the runtime action of VARIABLE .
This is something like:

_variable:
        add $4, %eax    // push the PFA of the defined word
        push %eax
        NEXT

The full set is:

: CREATE WORD (Old JonesCREATE) (variable) , ;
: VARIABLE CREATE 0 , ;

: DOES> ' DODOES , E8 C, (DOES) HERE @ 4+ - , ;

Andrew.

Richard Russell

Oct 2, 2009, 5:59:16 AM

On 2 Oct, 10:22, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:

> No, I didn't say that, I said that ISO compliance was irrelevant to
> JonesForth, as it stood at the time.

You seemed to agree with Albert's assertion that:

> > > A DOES> that allocates spaces is incompatible with ISO.
> > > The standard allows DOES> to fill in a pointer, not more.

That reads to me that your implementation is non-compliant. In the
'alternative' implementation of <BUILDS...DOES> that I ended up using,
the only thing that DOES> does is to fill in a pointer.

The relevant Hayes test is this; do you believe your proposed
CREATE...DOES> will work?

T{ : DOES1 DOES> @ 1 + ; -> }T
T{ : DOES2 DOES> @ 2 + ; -> }T
T{ CREATE CR1 -> }T
T{ CR1 -> HERE }T
T{ ' CR1 >BODY -> HERE }T
T{ 1 , -> }T
T{ CR1 @ -> 1 }T
T{ DOES1 -> }T
T{ CR1 -> 2 }T
T{ DOES2 -> }T
T{ CR1 -> 3 }T

T{ : WEIRD: CREATE DOES> 1 + DOES> 2 + ; -> }T
T{ WEIRD: W1 -> }T
T{ ' W1 >BODY -> HERE }T
T{ W1 -> HERE 1 + }T
T{ W1 -> HERE 2 + }T

> He's wrong. There is no extra cell.

I know; I corrected myself in another message. I convinced myself
that Coos was right by looking at the 'other' implementation of
<BUILDS DOES> which really does create an extra cell:

: <BUILDS WORD (CREATE) dodoes , 0 , ;
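
For readers following along, the first group of tests above can be reproduced by hand on any standard system. The following is only a sketch in standard Forth, not BB4Wforth code; CONST and ANSWER are names invented for the illustration, while DOES1 and CR1 are taken from the Hayes tests.

```forth
\ A defining word: CREATE builds the header, "," stores the data,
\ and DOES> gives each child its run-time action.
: CONST ( n "name" -- )  CREATE ,  DOES> ( -- n ) @ ;
42 CONST ANSWER
ANSWER .            \ displays 42

\ DOES> used on its own changes the action of the latest CREATEd word,
\ so DOES1 has to be defined before CR1 is created.
: DOES1  DOES> @ 1 + ;
CREATE CR1  1 ,
CR1 @ .             \ displays 1 (CR1 still just pushes its data address)
DOES1
CR1 .               \ displays 2
```

The point of the second half is that CR1 is created by a bare CREATE and only later acquires a DOES> action, which is the case a <BUILDS-style implementation has to handle to pass.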

Andrew Haley

Oct 2, 2009, 6:41:19 AM

Richard Russell <ne...@rtrussell.co.uk> wrote:
> On 2 Oct, 10:22, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
> > No, I didn't say that, I said that ISO compliance was irrelevant to
> > JonesForth, as it stood at the time.

> You seemed to agree with Albert's assertion that:

> > > > A DOES> that allocates spaces is incompatible with ISO.
> > > > The standard allows DOES> to fill in a pointer, not more.

> That reads to me that your implementation is non-compliant.

Let's recap, carefully. Albert said that "A DOES> that allocates
spaces is incompatible". He's right, it is. But the runtime action
of my DOES> just fills in the code field with an appropriate pointer.

> In the 'alternative' implementation of <BUILDS...DOES> that I ended
> up using, the only thing that DOES> does is to fill in a pointer.

Yes, but the <BUILDS that works with that DOES> adds an extra field,
which renders it noncompliant.

> The relevant Hayes test is this; do you believe your proposed
> CREATE...DOES> will work?

> T{ : DOES1 DOES> @ 1 + ; -> }T
> T{ : DOES2 DOES> @ 2 + ; -> }T
> T{ CREATE CR1 -> }T
> T{ CR1 -> HERE }T
> T{ ' CR1 >BODY -> HERE }T
> T{ 1 , -> }T
> T{ CR1 @ -> 1 }T
> T{ DOES1 -> }T
> T{ CR1 -> 2 }T
> T{ DOES2 -> }T
> T{ CR1 -> 3 }T

> T{ : WEIRD: CREATE DOES> 1 + DOES> 2 + ; -> }T
> T{ WEIRD: W1 -> }T
> T{ ' W1 >BODY -> HERE }T
> T{ W1 -> HERE 1 + }T
> T{ W1 -> HERE 2 + }T

Yes. You'll have to make sure that >BODY does the right thing.

Andrew.
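
What "the right thing" means for >BODY can be checked directly: the address a CREATEd word pushes and the address >BODY derives from its execution token must agree. A minimal check in standard Forth, with BUF as a name invented for the example:

```forth
CREATE BUF  10 ,
BUF @ .                 \ displays 10
' BUF >BODY @ .         \ displays 10
BUF  ' BUF >BODY  = .   \ displays -1 (true)
```

However many header cells an implementation places between the code field and the data, >BODY has to skip all of them for the last line to hold.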

Coos Haak

Oct 2, 2009, 12:04:55 PM

On Fri, 02 Oct 2009 05:41:19 -0500, Andrew Haley wrote:

There must be some misunderstanding.
When I see <BUILDS mentioned, I regard that as the implementation
described by Figforth, with an extra cell:

: <BUILDS 0 CONSTANT ;
: DOES> R> LATEST PFA ! ;CODE ...code of DoDoes...

I have not looked at JonesForth or Andrew's <BUILDS because I assumed
he used the Figforth one; it turned out that's not the case.

Albert van der Horst

Oct 2, 2009, 4:14:10 PM

In article <QYydneFDft5SRVjX...@supernews.com>,

Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
>Richard Russell <ne...@rtrussell.co.uk> wrote:
>> On 2 Oct, 10:22, Andrew Haley <andre...@littlepinkcloud.invalid>
>> wrote:
>> > No, I didn't say that, I said that ISO compliance was irrelevant to
>> > JonesForth, as it stood at the time.
>
>> You seemed to agree with Albert's assertion that:
>
>> > > > A DOES> that allocates spaces is incompatible with ISO.
>> > > > The standard allows DOES> to fill in a pointer, not more.
>
>> That reads to me that your implementation is non-compliant.
>
>Let's recap, carefully. Albert said that "A DOES> that allocates
>spaces is incompatible". He's right, it is. But the runtime action
>of my DOES> just fills in the code field with an appropriate pointer.

I want to make this somewhat more precise. A DOES> that generates
part of a header can't be compatible. The reason is that a following
DOES> must overwrite the action of a preceding one. You can't generate
a part of the header two times.
In ciforth (an indirect threaded Forth) DOES> merely stores HERE at
an appropriate place. Then the words following DOES> are high
level code that is compiled. So DOES> doesn't change the dictionary
pointer.
In a direct threaded Forth or native code some glue code may be
necessary before high level words can be compiled.
I didn't mean that such glue code would be forbidden.
In that case DOES> does change the dictionary pointer and the
standard has nothing to say that disallows this.
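
The requirement that a later DOES> replaces the action set by an earlier one is what the WEIRD: test exercises. Worked through on a standard system it looks like this (a sketch only; the results are printed as offsets from the body address so the output does not depend on where the dictionary happens to be):

```forth
: WEIRD: ( "name" -- )  CREATE  DOES> 1 +  DOES> 2 + ;
WEIRD: W1
W1  ' W1 >BODY  - .     \ first call:  displays 1 and re-points W1 at "2 +"
W1  ' W1 >BODY  - .     \ later calls: display 2
```

The second DOES> runs while W1 itself is executing, so whatever DOES> stores has to live in a slot that can be overwritten after the header is complete.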

>> In the 'alternative' implementation of <BUILDS...DOES> that I ended
>> up using, the only thing that DOES> does is to fill in a pointer.
>
>Yes, but the <BUILDS that works with that DOES> adds an extra field,
>which renders it noncompliant.

Why? The standard requires that DOES> fills in "something".
This must be filled in "somewhere". So CREATE ( <BUILDS )
has to accommodate that. As long as CREATE works nicely with
>BODY all is well.

>
>> The relevant Hayes test is this; do you believe your proposed
>> CREATE...DOES> will work?
>
>> T{ : DOES1 DOES> @ 1 + ; -> }T
>> T{ : DOES2 DOES> @ 2 + ; -> }T
>> T{ CREATE CR1 -> }T
>> T{ CR1 -> HERE }T
>> T{ ' CR1 >BODY -> HERE }T
>> T{ 1 , -> }T
>> T{ CR1 @ -> 1 }T
>> T{ DOES1 -> }T
>> T{ CR1 -> 2 }T
>> T{ DOES2 -> }T
>> T{ CR1 -> 3 }T
>
>> T{ : WEIRD: CREATE DOES> 1 + DOES> 2 + ; -> }T
>> T{ WEIRD: W1 -> }T
>> T{ ' W1 >BODY -> HERE }T
>> T{ W1 -> HERE 1 + }T
>> T{ W1 -> HERE 2 + }T
>
>Yes. You'll have to make sure that >BODY does the right thing.

When I realised that the pointer filled in by DOES> must be
kept separate from the data found by >BODY, that was the point
I could make a conforming implementation.

>
>Andrew.

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Richard Russell

Oct 2, 2009, 4:35:46 PM

On 2 Oct, 21:14, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:

> >Yes, but the <BUILDS that works with that DOES> adds an extra field,
> >which renders it noncompliant.
> Why? The standard requires that DOES> fills in "something".
> This must be filled in "somewhere". So CREATE ( <BUILDS )
> has to accommodate that. As long as CREATE works nicely
> with >BODY all is well.

I assume the implication is that either of the 'alternative'
implementations of <BUILDS...DOES> (Andrew's and the one from
lisphacker.com) can, in principle, be adapted to become a compliant
CREATE...DOES>. However, although I prefer the latter to Andrew's
(for example it doesn't involve poking a 'call' instruction into data
memory) I don't understand how it works well enough to make the
necessary modification. In particular, it already 'usurps' the code
field of <BUILDS for its own purposes, so I can't simply add the
'variable' code there to give it the required CREATE functionality.

Therefore for practical reasons I'm going with Andrew's code, at least
for the time being. Maybe I'll revisit the issue at some future
point.

Andrew Haley

Oct 3, 2009, 4:43:51 AM

Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:

> >> In the 'alternative' implementation of <BUILDS...DOES> that I ended
> >> up using, the only thing that DOES> does is to fill in a pointer.
> >
> >Yes, but the <BUILDS that works with that DOES> adds an extra field,
> >which renders it noncompliant.

> Why? The standard requires that DOES> fills in "something".
> This must be filled in "somewhere". So CREATE ( <BUILDS )
> has to accommodate that. As long as CREATE works nicely with
> >BODY all is well.

Right. DOES> isn't supposed to put anything into the data field, [*]
that's all I meant. The old fig-Forth <BUILDS DOES> puts an extra
indirection into the word there, as we know, and this renders it
noncompliant.

> >Yes. You'll have to make sure that >BODY does the right thing.

> When I realised that the pointer filled in by DOES> must be
> kept separate from the data found by >BODY, that was the point
> I could make a conforming implementation.

Sure, that's right. With threaded code there are essentially two ways
to do this: either rewrite the code field to point to the code after
DOES> or add a header field for the purpose.

Andrew.

[*] the address that HERE would have returned had it been executed
immediately after the execution of the CREATE that defined xt.

Andrew Haley

Oct 3, 2009, 4:53:39 AM

Richard Russell <ne...@rtrussell.co.uk> wrote:
> On 2 Oct, 21:14, Albert van der Horst <alb...@spenarnc.xs4all.nl>
> wrote:
> > >Yes, but the <BUILDS that works with that DOES> adds an extra field,
> > >which renders it noncompliant.
> > Why? The standard requires that DOES> fills in "something".
> > This must be filled in "somewhere". So CREATE ( <BUILDS )
> > has to accommodate that. As long as CREATE works nicely
> > with >BODY all is well.

> I assume the implication is that either of the 'alternative'
> implementations of <BUILDS...DOES> (Andrew's and the one from
> lisphacker.com) can, in principle, be adapted to become a compliant
> CREATE...DOES>. However, although I prefer the latter to Andrew's
> (for example it doesn't involve poking a 'call' instruction into data
> memory) I don't understand how it works well enough to make the
> necessary modification. In particular, it already 'usurps' the code
> field of <BUILDS for its own purposes, so I can't simply add the
> 'variable' code there to give it the required CREATE functionality.

You'd have to add an extra word to every child of CREATE and fix up
>BODY to step over it. It'd work, but it seems absurdly wasteful when
there is an alternative, especially when you consider that CREATE is
often used without DOES>.

Andrew.

Richard Russell

Oct 4, 2009, 5:46:33 AM

On 3 Oct, 09:53, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:

> You'd have to add an extra word to every child of CREATE and fix
> up >BODY to step over it. It'd work, but it seems absurdly wasteful
> when there is an alternative

I strongly disagree. In my opinion adding an extra cell to every
'child' of CREATE is a small price to pay for eliminating the need to
poke a call instruction into data memory. On modern CPUs, executing
code from data memory is very bad practice. If you do a data write to
an address which happens to be in the instruction cache, the CPU is
forced to invalidate the cache (because it assumes you may be using
'self modifying code') with a potentially large execution-time hit.

This issue is addressed in the Intel Optimization Reference Manual:
"Software should avoid writing to a code page in the same 1 KB subpage
of that is being executed, or fetching code in the same 2 KB subpage
of that is currently being written". Code and (writable) data need to
be separated by at least 2 Kbytes.

Fortunately modifying the 'alternative' <BUILDS...DOES> code to be
compliant with CREATE was straightforward once I'd got my brain around
it:

Assembler code (Intel syntax):

dodoes: cmp dword [eax+4],0     ; Has DOES> been executed?
        jz nodoes               ; No: behave like a plain CREATEd word
        lea ebp,[ebp-4]         ; Yes: push the current IP (esi)
        mov [ebp],esi           ;      onto the return stack
        mov esi,[eax+4]         ; Get pointer stored by DOES>
nodoes: lea eax,[eax+8]
        push eax                ; Push user data area address
        lodsd
        jmp [eax]               ; NEXT

Forth code:

: CREATE (WORD) (CREATE) dodoes , 0 , ;
: DOES> R> LATEST @ >DFA ! ;
: >BODY 2 CELLS + ;

To me, this is nicer than Andrew's version in almost every way.
There's no need to execute code from data memory, no need to patch a
Code Field at run time, no need for DOES> to be IMMEDIATE, only one
assembler routine rather than two, easier to understand. The sole
disadvantage is the extra cell used by CREATEd words, but as Albert
pointed out that's perfectly compliant so long as >BODY is adjusted
accordingly.

On a Sieve of Eratosthenes benchmark, BB4Wforth v0.34 (which uses the
above code) ran about 20% faster than v0.33 (which used Andrew's
code).
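
The layout described above (code field, one cell reserved for DOES>, then the user data) can be modelled in portable Forth for experimentation. This is only a sketch under stated assumptions, not the BB4Wforth source: MY-CREATE, MY-DOES, MY-BODY and LAST-BODY are hypothetical names, an execution token stands in for the pointer that DOES> stores, and the host system's own CREATE and DOES> are used to build the model.

```forth
\ Model only, in standard Forth. All MY- names and LAST-BODY are hypothetical.
VARIABLE LAST-BODY                  \ body address of the latest MY-CREATE child

: MY-CREATE ( "name" -- )
  CREATE  HERE LAST-BODY !          \ remember where the extra cell lives
  0 ,                               \ extra cell: 0 means "no action set yet"
  DOES> ( -- data-addr | results of the action )
    DUP CELL+ SWAP @                \ data address, then the stored action
    ?DUP IF EXECUTE THEN ;          \ run-time test, as in dodoes

: MY-DOES ( xt -- )  LAST-BODY @ ! ;            \ only fills in the extra cell
: MY-BODY ( xt -- data-addr )  >BODY CELL+ ;    \ step over the extra cell

MY-CREATE V1  5 ,
V1 @ .                  \ displays 5
: FETCH+1 ( addr -- n )  @ 1+ ;
' FETCH+1 MY-DOES
V1 .                    \ displays 6
' V1 MY-BODY @ .        \ displays 5
```

The model makes the trade-off visible: every child carries one extra cell and every execution pays for the zero test, in exchange for never writing anything but a plain pointer into the dictionary.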

Andrew Haley

Oct 4, 2009, 6:58:36 AM

Richard Russell <ne...@rtrussell.co.uk> wrote:
> On 3 Oct, 09:53, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:

> > You'd have to add an extra word to every child of CREATE and fix
> > up >BODY to step over it. It'd work, but it seems absurdly
> > wasteful when there is an alternative

> I strongly disagree. In my opinion adding an extra cell to every
> 'child' of CREATE is a small price to pay for eliminating the need
> to poke a call instruction into data memory. On modern CPUs,
> executing code from data memory is very bad practice. If you do a
> data write to an address which happens to be in the instruction
> cache, the CPU is forced to invalidate the cache (because it assumes
> you may be using 'self modifying code') with a potentially large
> execution-time hit.

Sure, but such a write never happens, so it isn't an issue. At least,
it's fairly unlikely, given that the call is in the parent word, not
the child. Do it the other way and you're wasting a cell in the cache
line for every child of CREATE. If the runtime action of DOES> had to
poke a code field that would have a tragic effect on performance, but
of course it doesn't.

The other thing to realize is that CREATE without DOES> is very
common, and it's worth optimizing for that case. In the code below
you're moving the decision about whether DOES> has been used from
compile time to runtime, which is always a bad move. I can't see any
reason not to fix the code field at compile time.

> dodoes: cmp dword [eax+4],0     ; Has DOES> been executed ?
>         jz nodoes
>         lea ebp,[ebp-4]
>         mov [ebp],esi
>         mov esi,[eax+4]         ; Get pointer stored by DOES>
> nodoes: lea eax,[eax+8]
>         push eax                ; Push user data area address
>         lodsd
>         jmp [eax]               ; NEXT

> Forth code:

> : CREATE (WORD) (CREATE) dodoes , 0 , ;
> : DOES> R> LATEST @ >DFA ! ;
> : >BODY 2 CELLS + ;

> To me, this is nicer than Andrew's version in almost every way.
> There's no need to execute code from data memory, no need to patch a
> Code Field at run time,

I'm beginning to wonder if you understand how it works: neither
version does any patching of the code field at runtime, and both
versions have to set a code pointer at compile time.

> On a Sieve of Eratosthenes benchmark, BB4Wforth v0.34 (which uses the
> above code) ran about 20% faster than v0.33 (which used Andrew's
> code).

Interesting, and rather surprising. The original Sieve of
Eratosthenes benchmark in Forth doesn't use DOES> at all. Maybe you
have some other version, I don't know.

Andrew.

Andrew Haley

Oct 4, 2009, 7:15:00 AM

Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
> Richard Russell <ne...@rtrussell.co.uk> wrote:
> > On 3 Oct, 09:53, Andrew Haley <andre...@littlepinkcloud.invalid>
> > wrote:

> > > You'd have to add an extra word to every child of CREATE and fix
> > > up >BODY to step over it. It'd work, but it seems absurdly
> > > wasteful when there is an alternative

> > I strongly disagree. In my opinion adding an extra cell to every
> > 'child' of CREATE is a small price to pay for eliminating the need
> > to poke a call instruction into data memory. On modern CPUs,
> > executing code from data memory is very bad practice. If you do a
> > data write to an address which happens to be in the instruction
> > cache, the CPU is forced to invalidate the cache (because it
> > assumes you may be using 'self modifying code') with a potentially
> > large execution-time hit.

> Sure, but such a write never happens, so it isn't an issue. At
> least, it's fairly unlikely, given that the call is in the parent
> word, not the child.

Actually, that's not quite true: there is a pathological case where it
could happen, something like:

: array create does> cells + ;
array foo 20 cells allot

Here, the data may end up in the same cache line as the defining word,
but of course the same penalty would occur for a nearby CODE
definition. The only real way to fix this in general is to put code
in one space and data in the other. Maybe something like this is
happening in the sieve.

Andrew.

Andrew Haley

Oct 4, 2009, 7:33:22 AM

Andrew Haley <andr...@littlepinkcloud.invalid> wrote:

> : array create does> cells + ;

err, swap cells + . Not enough caffeine, or something.

Andrew.
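
With that correction applied, the array example can be tried on any standard system. A short sketch in standard Forth (the stored values are arbitrary):

```forth
\ The child receives an index and returns the address of that cell.
: ARRAY ( "name" -- )  CREATE  DOES> ( i -- addr ) SWAP CELLS + ;
ARRAY FOO  20 CELLS ALLOT
7 3 FOO !               \ store 7 in element 3
3 FOO @ .               \ displays 7
```

The SWAP is needed because the child's data address arrives on top of the index. Note that the 20 cells are allotted immediately after the definition of ARRAY, which is how data and the code compiled for ARRAY can end up close together in memory.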

Richard Russell

Oct 4, 2009, 11:11:21 AM

On 4 Oct, 11:58, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:

> Do it the other way and you're wasting a cell in the cache
> line for every child of CREATE.

I assume you're talking about the data cache now, not the instruction
cache. Any impact on the execution speed because that extra cell
affects cache efficiency is likely to be minuscule (how many children
of CREATE do you have in your programs anyway?).

> The other thing to realize is that CREATE without DOES> is very
> common, and it's worth optimizing for that case.

That's important, yes, but as you saw execution speed actually
*improved* significantly as a result of the modification.

> In the code below you're moving the decision about whether DOES> has
> been used from compile time to runtime, which is always a bad move.

Again, the change made the program significantly *faster*.

> I'm beginning to wonder if you understand how it works: neither
> version does any patching of the code field at runtime, and both
> versions have to set a code pointer at compile time.

The distinction I was making is simply that yours patches the code
field of the child, whereas mine patches the data field (my use of the
term "run time", meaning everything that happens when you 'run' the
Forth program - including the compilation of the words it comprises -
was imprecise).

> Interesting, and rather surprising.

Indeed. I wouldn't have been surprised if it had been the same or
(very) slightly slower. That it was so significantly faster is a
vindication (if I felt I needed one) of the method I am now using.

There's a more fundamental point to consider, that modern processors
often allow 'data' memory to be flagged as non-executable, resulting
in an exception if any attempt is made to run code there. This is
potentially valuable in improving security. Your method rules out
flagging the dictionary that way because it contains executable code.
Mixing code and (writable) data is quite simply bad practice.

Andrew Haley

Oct 4, 2009, 12:08:48 PM

Richard Russell <ne...@rtrussell.co.uk> wrote:
> On 4 Oct, 11:58, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
> > Do it the other way and you're wasting a cell in the cache
> > line for every child of CREATE.

> I assume you're talking about the data cache now, not the instruction
> cache.

Yes.

> Any impact on the execution speed because that extra cell affects
> cache efficiency is likely to be minuscule (how many children of
> CREATE do you have in your programs anyway?).

> > The other thing to realize is that CREATE without DOES> is very
> > common, and it's worth optimizing for that case.

> That's important, yes, but as you saw execution speed actually
> *improved* significantly as a result of the modification.

Well, no, I didn't see that, you did! But I take your point.

> > In the code below you're moving the decision about whether DOES>
> > has been used from compile time to runtime, which is always a bad
> > move.

> Again, the change made the program significantly *faster*.

Maybe, but it still shouldn't be hard to remove the runtime check
without affecting anything else, and unless we're looking at something
extremely pathological doing so wouldn't make the code slower.

> > Interesting, and rather surprising.

> Indeed. I wouldn't have been surprised if it had been the same or
> (very) slightly slower. That it was so significantly faster is a
> vindication (if I felt I needed one) of the method I am now using.

It's certainly very odd. I wouldn't rule out the possibility of the
kind of pathological cache behaviour you describe.

> There's a more fundamental point to consider, that modern processors
> often allow 'data' memory to be flagged as non-executable, resulting
> in an exception if any attempt is made to run code there. This is
> potentially valuable in improving security. Your method rules out
> flagging the dictionary that way because it contains executable
> code.

Not necessarily: a fairly common technique is to regard all code,
threaded and native, as code, and treat it accordingly. So, all code
goes in the code segment and all data in the data segment. Of course,
threaded code doesn't need the execute bit set, but it doesn't hurt.

But tweaking the run-time performance of an indirect threaded code
Forth seems rather academic, since ITC is almost the worst possible
choice for contemporary high-end processors. Subroutine threading is
faster and often smaller, and not very much harder to implement. Pure
ITC is something of a historical curio.

Andrew.

Richard Russell

Oct 4, 2009, 12:20:25 PM

On 4 Oct, 17:08, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:

> But tweaking the run-time performance of an indirect threaded code
> Forth seems rather academic

I entirely agree. My motivation was not to "tweak the run-time
performance" (I actually expected it to get slightly worse, and was
pleasantly surprised when it didn't). The motivation was to get rid
of code poked into the dictionary, which I think is horrible for all
the reasons I've given (and because it just feels - well - nasty!).
We'll have to agree to differ on that one.

Albert van der Horst

Oct 4, 2009, 4:06:32 PM

In article <a6KdnYuRVMwNVVXX...@supernews.com>,

Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
>
>Not necessarily: a fairly common technique is to regard all code,
>threaded and native, as code, and treat it accordingly. So, all code
>goes in the code segment and all data in the data segment. Of course,
>threaded code doesn't need the execute bit set, but it doesn't hurt.
>
>But tweaking the run-time performance of an indirect threaded code
>Forth seems rather academic, since ITC is almost the worst possible
>choice for contemporary high-end processors. Subroutine threading is
>faster and often smaller, and not very much harder to implement. Pure
>ITC is something of a historical curio.

Indeed tweaking is not a good idea. Still in my opinion ITC can be
a good starting point for optimisation, because ITC tends to
capture quite well the meaning of the code.
The more an optimiser "understands" about code, the better it can
optimise. I have tried to convey that in earlier discussions and
some notes about that can be found on my website (forth lecture 5).

There is also the case of small processors (Atmel, Renesas) with
very small code memory. All code words can fit there and the
high level words in a comparatively large flash.

It may be too early to conclude that ITC is out.

>
>Andrew.

Regards, Albert

Anton Ertl

Oct 4, 2009, 3:02:41 PM

Andrew Haley <andr...@littlepinkcloud.invalid> writes:
[Traditional ITC implementation of CREATE...DOES> and cache
consistency issues:]

>Actually, that's not quite true: there is a pathological case where it
>could happen, something like:
>
>: array create does> cells + ;
>array foo 20 cells allot
>
>Here, the data may end up in the same cache line as the defining word,

Yes. These kinds of problems have plagued various Forth systems since
the instruction cache was separated from the data cache in the
Pentium. E.g., such an issue is the reason why BigForth is about 30
times slower than iForth on cd16sim, and probably also why BigForth is
slower than Gforth on brew and lexex.

You may consider it pathological, but it still occurs in real-world
code, and pretty often. The main programs where it occurs rarely are
the small benchmarks that are often used to evaluate performance.

>but of course the same penalty would occur for a nearby CODE
>definition. The only real way to fix this in general is to put code
>in one space and data in the other.

Yes. But for the DOES> case there are other options:

* let CREATE have a larger code field.

* Use doubly-indirect threaded code (then the jump after DOES> can be
replaced with a plain address).

* As above, but combine it with primitive-centric direct threaded code
for higher performance.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2009: http://www.euroforth.org/ef09/

Anton Ertl

Oct 4, 2009, 3:17:23 PM

Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>But tweaking the run-time performance of an indirect threaded code
>Forth seems rather academic, since ITC is almost the worst possible
>choice for contemporary high-end processors. Subroutine threading is
>faster

On what evidence do you base this claim?

> and often smaller,

And this one? If you are thinking about 64-bit threaded code, that
can deal with program sizes that STC cannot deal with; if limiting
yourself to 2GB or so of program memory is ok, then you can use 32-bit
threaded code even on a 64-bit system and will still be smaller than
STC on most platforms. And on most of those where call instructions
take only 32 bits, the address range of these call instructions is
even smaller (e.g., IIRC 4MB on Alpha).

Elizabeth D Rather

Oct 4, 2009, 8:05:58 PM

Richard Russell wrote:
> On 4 Oct, 17:08, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
>> But tweaking the run-time performance of an indirect threaded code
>> Forth seems rather academic
>
> I entirely agree. My motivation was not to "tweak the run-time
> performance" (I actually expected it to get slightly worse, and was
> pleasantly surprised when it didn't). The motivation was to get rid
> of code poked into the dictionary, which I think is horrible for all
> the reasons I've given (and because it just feels - well - nasty!).
> We'll have to agree to differ on that one.

You should realize that traditional implementations of Forth have always
intermingled code and data space. Most Forths had (and still have) an
assembler for the underlying processor, and code definitions share the
same dictionary space with colon definitions and data objects of all
kinds. This is still the case on many (probably most) implementations,
particularly embedded systems (except, of course, on Harvard
architecture parts).

Some 16-bit implementations on x86 architectures experimented with
multi-segment dictionaries in the 80's, but they were soon superseded by
32-bit implementations that reverted to a single, unified dictionary.

So, culturally, mixing code and data isn't a taboo in Forth (just as
global variables don't have a stigma).

Yes, modern cache schemes alter the tradeoffs if you're really pressing
for runtime performance, but, then, if you really want to reach for
runtime performance you'd go with subroutine threading, inlining small
primitives, and an optimizing compiler. If your need isn't that great,
I see no reason to sweat mixing code and data.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Andrew Haley

Oct 5, 2009, 4:25:41 AM

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
> [Traditional ITC implementation of CREATE...DOES> and cache
> consistency issues:]
> >Actually, that's not quite true: there is a pathological case where it
> >could happen, something like:
> >
> >: array create does> cells + ;
> >array foo 20 cells allot
> >
> >Here, the data may end up in the same cache line as the defining word,

> Yes. These kinds of problems have plagued various Forth systems since
> the instruction cache was separated from the data cache in the
> Pentium. E.g., such an issue is the reason why BigForth is about 30
> times slower than iForth on cd16sim, and probably also why BigForth is
> slower than Gforth on brew and lexex.

> You may consider it pathological, but it still occurs in real-world
> code, and pretty often. The main programs where it occurs rarely are
> the small benchmarks that are often used to evaluate performance.

I would have expected that precisely the reverse was true, since small
benchmarks are where small defining words and their children are
likely to be close enough to be on the same cache line.
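
For concreteness, the defining word under discussion might be exercised
as below. This is only a sketch in standard Forth: a SWAP is added so
that the index, not the body address, gets scaled, and the value stored
is arbitrary. On a system that compiles a machine-code call after
DOES>, the 20 cells of foo would be allotted only a header's distance
from that call inside array, which is the proximity in question.

```forth
\ Defining word: each child takes an index and returns that cell's address.
: array  ( "name" -- )  create  does> ( n body -- addr )  swap cells + ;

\ In a single-dictionary system, foo's data follows soon after array's code.
array foo  20 cells allot

42 3 foo !        \ store into cell 3 of foo
3 foo @ .         \ prints 42
```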

Andrew.

Andrew Haley

Oct 5, 2009, 4:32:56 AM

Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:

> In article <a6KdnYuRVMwNVVXX...@supernews.com>,
> Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
> >
> >But tweaking the run-time performance of an indirect threaded code
> >Forth seems rather academic, since ITC is almost the worst possible
> >choice for contemporary high-end processors. Subroutine threading
> >is faster and often smaller, and not very much harder to implement.
> >Pure ITC is something of a historical curio.

> Indeed tweaking is not a good idea. Still in my opinion ITC can be
> a good starting point for optimisation, because ITC tends to
> capture quite well the meaning of the code.

I don't really understand your meaning. In what way can one form of
threading capture the "meaning" any better than another? I don't
understand how

' foo , ' bar , ' baz ,

captures any more than

foo call bar call baz call

?
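
The two forms can be modelled in standard Forth, as a rough sketch
only: real threading happens below the level the language exposes, so
a table of execution tokens stands in for the threaded list and an
ordinary colon definition stands in for the sequence of calls. The
bodies of foo, bar and baz, and the names xt-table, run-table and
direct, are hypothetical.

```forth
\ Stand-in definitions for the three words in the comparison.
: foo  ( n -- n' )  1+ ;
: bar  ( n -- n' )  2* ;
: baz  ( n -- n' )  3 + ;

\ Threaded form: a list of addresses walked by a small interpreter loop.
create xt-table  ' foo ,  ' bar ,  ' baz ,
: run-table  ( n -- n' )
  xt-table 3 cells over + swap      \ loop limit and start address
  do  i @ execute  1 cells +loop ;  \ fetch each token and execute it

\ Call form: the same sequence compiled as one definition.
: direct  ( n -- n' )  foo bar baz ;

5 run-table .     \ prints 15
5 direct .        \ prints 15
```

Both forms record exactly the same sequence of three words; only the
dispatch mechanism differs.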

Andrew.

Richard Russell

Oct 5, 2009, 4:48:43 AM

On 5 Oct, 01:05, Elizabeth D Rather <erat...@forth.com> wrote:
> You should realize that traditional implementations of Forth have always
> intermingled code and data space.

Oh, I do. For example I don't have the same 'gut objection' to words
defined as CODE (so long as the Forth has a reasonable built-in
assembler), especially as I would have expected it to be possible, in
principle, to locate the actual assembled code somewhere else and
simply point the Code Field at it.

However there's no way I am going to be persuaded that Andrew's code:

: DOES> ( - a) ' DODOES , E8 C, (DOES) HERE @ 4+ - , ;

is tolerable, especially when there's a perfectly good alternative.

> So, culturally, mixing code and data isn't a taboo in Forth

Whatever the 'culture', it's an unavoidable requirement on modern
processors to separate code from writable data, if performance is
important. It may be the case that in traditional threaded Forths
this is difficult or impossible to achieve completely, but if
alternative approaches are available (as in the DOES> case) then I
would definitely say choose the one which doesn't mix code and data!

> (just as global variables don't have a stigma).

I wouldn't necessarily say that global variables deserve to have a
stigma in any language, so long as they're used solely for genuinely
global objects. The problem arises when global variables are used as
a substitute for local variables (or, in Forth, the stacks) out of
laziness or a lack of support for 'information hiding' by the
language.

Anton Ertl

Oct 5, 2009, 5:08:28 AM

Andrew Haley <andr...@littlepinkcloud.invalid> writes:

>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>> You may consider it pathological, but it still occurs in real-world
>> code, and pretty often. The main programs where it occurs rarely are
>> the small benchmarks that are often used to evaluate performance.
>
>I would have expected that precisely the reverse was true, since small
>benchmarks are where small defining words and their children are
>likely to be close enough to be on the same cache line.

Defining data close to the code that deals with it (or vice versa)
seems to be a common pattern in Forth, and the principle of modularity
suggests it.
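
A minimal sketch of that pattern, with hypothetical names: in a
traditional single-dictionary system the writable cell for hits is
allotted only a few bytes before the compiled code of hit, so every
store made by hit lands close to code.

```forth
\ Hypothetical module: a counter defined right beside the words that use it.
variable hits                      \ one writable cell allotted here
0 hits !                           \ VARIABLE contents start out undefined
: hit    ( -- )    1 hits +! ;     \ compiled immediately after that cell
: hits?  ( -- n )  hits @ ;

hit hit  hits? .                   \ prints 2
```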

In a small benchmark you can get lucky and the code and data happen to
be in different consistency regions; or you have a big sieve or large
matrix to deal with so that most accesses are far from the code; or
you compute fibonacci numbers and don't store to memory at all. But
in a larger application there are lots of variables interspersed with
lots of code, and you cannot be lucky every time. And the cache miss
cost is so high that even occasional occurrences of this problem lead
to noticeable performance degradation.

BTW, on the Pentium 4 the cache consistency region is 1KB.

Andrew Haley

Oct 5, 2009, 9:07:49 AM

Richard Russell <ne...@rtrussell.co.uk> wrote:
> On 5 Oct, 01:05, Elizabeth D Rather <erat...@forth.com> wrote:

> > You should realize that traditional implementations of Forth have
> > always intermingled code and data space.

> Oh, I do. For example I don't have the same 'gut objection' to
> words defined as CODE (so long as the Forth has a reasonable
> built-in assembler), especially as I would have expected it to be
> possible, in principle, to locate the actual assembled code
> somewhere else and simply point the Code Field at it.

> However there's no way I am going to be persuaded that Andrew's code:

> : DOES> ( - a) ' DODOES , E8 C, (DOES) HERE @ 4+ - , ;

> is tolerable, especially when there's a perfectly good alternative.

Just for correctness' sake I must point out that this isn't my code:
it's been standard in many Forth systems since about 1980. For
example, here's F83:

T: DOES> (S -- )
[FORWARD] <(;CODE)> HERE-T ( DOES-OP ) 232 C,-T
[[ ASSEMBLER DODOES ]] LITERAL HERE 2+ - ,-T T;

I don't think I ever had the same "gut objection" as you, but it was
nearly thirty years ago, so maybe I've forgotten.

> > So, culturally, mixing code and data isn't a taboo in Forth

> Whatever the 'culture', it's an unavoidable requirement on modern
> processors to separate code from writable data, if performance is
> important.

Sure, but as I pointed out before, that can be achieved by separating
code (all kinds) from read/write data, so it's irrelevant to this
particular case.

Andrew.

Richard Russell

Oct 5, 2009, 10:54:18 AM

On 5 Oct, 14:07, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:

> Sure, but as I pointed out before, that can be achieved by separating
> code (all kinds) from read/write data, so it's irrelevant to this
> particular case.

It's not irrelevant to BB4Wforth. I had a straight choice between
'your' CREATE...DOES> code (which risks the instruction cache problem)
and the 'alternative' code (which doesn't).

By choosing the 'alternative' method I've eliminated any possibility
of writable data and code being in close proximity in out-of-the-box
BB4Wforth. Of course if a user creates a CODE definition then the
issue resurfaces, but then it's *his* problem not mine (and since
BB4Wforth doesn't - currently - have an assembler it's not likely).

If I ever choose to incorporate an assembler in BB4Wforth (and since
BBC BASIC has a built-in assembler that's quite practical) then I
would want to devise a scheme whereby separation between writable data
and a CODE definition is guaranteed.

It seems to me to be a counsel of despair to say that, because it's
difficult to maintain the separation in threaded Forths (and because
there are other types of implementation more suited to separating code
and data), one shouldn't even try. That goes against the
principles I've tried to adhere to in over 30 years of programming.