history and rationale for CREATE vs. <BUILDS

Greg

Mar 14, 2006, 12:55:07 PM

Does anyone know the history/reason behind the switch from <BUILDS
DOES> to the CREATE DOES> form?

What was it that suddenly allowed CREATE to replace <BUILDS?

I suspect it could have something to do with threading mechanisms, but
I could be wrong about that.

Regards,
-- Greg

Andrew Haley

Mar 14, 2006, 2:19:41 PM

Greg <gsch...@ra.rockwell.com> wrote:
> Does anyone know the history/reason behind the switch from <BUILDS
> DOES> to the CREATE DOES> form?

> What was it that suddenly allowed CREATE to replace <BUILDS?

It was a redesign. In the old model, <BUILDS required an extra word
of data beyond that used by CREATE.

I'm not going to attempt ASCII art because if I do I'll be here all
night, but a child of <BUILDS was like this:

CFA: (ptr to <builds runtime action)
PFA: (ptr to DOES> high level code)
... user data

The new model is:

CFA: (ptr to thunk)
... user data

and the thunk is a magic bit of machine code, usually a single
instruction, that is placed in the threaded code immediately before
the DOES> action. This works with any kind of threading.
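
A minimal C sketch of the two layouts, for comparison. Everything in it is an assumption made for illustration: the `cell` union, the tiny data stack, and the names `do_builds`, `thunk_fetch`, and `does_fetch` are hypothetical, and the DOES> action is modeled as an ordinary C function rather than threaded code. In particular, C cannot place code "immediately before" the DOES> action, so the thunk is modeled here as a separate function hard-wired to its action.

```c
#include <assert.h>
#include <stdint.h>

/* One dictionary cell: plain data, a code-field routine, or a DOES> action. */
typedef union cell cell;
union cell {
    intptr_t n;
    void (*code)(cell *word);
    void (*does)(void);
};

/* Tiny data stack, just enough for the sketch (hypothetical). */
enum { STACK_SIZE = 16 };
static intptr_t stack[STACK_SIZE];
static int sp = 0;

static void push(intptr_t x) { assert(sp < STACK_SIZE); stack[sp++] = x; }
static intptr_t pop(void) { assert(sp > 0); return stack[--sp]; }

/* Stand-in for the high-level code after DOES>: ( addr -- n ), like @ */
static void does_fetch(void) {
    intptr_t *addr = (intptr_t *)pop();
    push(*addr);
}

/* Old model: word[0] = shared <BUILDS runtime, word[1] = pointer to the
   DOES> code, user data starts at word[2]. */
static void do_builds(cell *word) {
    push((intptr_t)&word[2].n);
    word[1].does();
}

/* New model: word[0] = thunk belonging to one defining word,
   user data starts at word[1]. No extra cell per child. */
static void thunk_fetch(cell *word) {
    push((intptr_t)&word[1].n);
    does_fetch();
}

/* Build a child in the old layout, execute it, return what it leaves. */
intptr_t run_old_model(void) {
    cell w[3];
    w[0].code = do_builds;
    w[1].does = does_fetch;
    w[2].n = 42;
    w[0].code(w);
    return pop();
}

/* Same child in the new layout: one cell shorter. */
intptr_t run_new_model(void) {
    cell w[2];
    w[0].code = thunk_fetch;
    w[1].n = 42;
    w[0].code(w);
    return pop();
}
```

Both `run_old_model()` and `run_new_model()` leave the stored value on the stack; the only difference is where the child finds its DOES> action and how many cells it occupies.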

Andrew.

Greg

Mar 14, 2006, 2:47:52 PM

Andrew Haley wrote:

> and the thunk is a magic bit of machine code, usually a single
> instruction, that is placed in the threaded code immediately before
> the DOES> action. This works with any kind of threading.
>
> Andrew.

I'm playing around with a subroutine threaded Forth written in C. As
you pointed out, the <BUILDS creates an additional word beyond the CFA.
I'd like to remove that word and I can't see how to do it in 'C'
without resorting to machine code as you mentioned above.

The problem is that 'C' doesn't provide direct access to the machine
stack which is what that little bit of magic code needs to have in
order to push the address of the calling word's PFA onto the data
stack.

Is my understanding correct?

Regards,
-- Greg

Andrew Haley

Mar 15, 2006, 5:33:42 AM

Greg <gsch...@ra.rockwell.com> wrote:

> Andrew Haley wrote:

>> and the thunk is a magic bit of machine code, usually a single
>> instruction, that is placed in the threaded code immediately before
>> the DOES> action. This works with any kind of threading.

> I'm playing around with a subroutine threaded Forth written in C.

Hmm. I can't see how you do this in C. Threaded code works, no
problem, but subroutine threading? How?

> As you pointed out, the <BUILDS creates an additional word beyond
> the CFA. I'd like to remove that word and I can't see how to do it
> in 'C' without resorting to machine code as you mentioned above.

> The problem is that 'C' doesn't provide direct access to the machine
> stack which is what that little bit of magic code needs to have in
> order to push the address of the calling word's PFA onto the data
> stack.

> Is my understanding correct?

It is. You're going to have to do something pretty devious to get
this to work.

Andrew.

Greg

Mar 15, 2006, 8:42:42 AM

[Hmm. I can't see how you do this in C. Threaded code works, no
problem, but subroutine threading? How? ]

I should have been more precise. What I am doing is really not
subroutine threading. I do have an inner interpreter. The code field
contains a pointer to a native C routine and the body contains a list
of CFAs. My compiler supports tail recursion optimization so a long
chain of calls does not use the return stack.
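
A rough C sketch of an interpreter of this general shape, assuming a hypothetical `word` struct whose code field is a pointer to a native C routine and whose body is a NULL-terminated list of CFAs. The names (`do_colon`, `do_lit`, `do_add`, `run_five`) and the `value` field are invented for illustration, and nesting here simply uses the C call stack; it is not the actual implementation described in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct word word;
struct word {
    void (*code)(word *self);  /* code field: native C routine */
    word **body;               /* colon definitions: NULL-terminated list of CFAs */
    intptr_t value;            /* used only by the literal-style primitive below */
};

/* Tiny data stack (hypothetical). */
enum { STACK_SIZE = 32 };
static intptr_t stack[STACK_SIZE];
static int sp = 0;

static void push(intptr_t x) { assert(sp < STACK_SIZE); stack[sp++] = x; }
static intptr_t pop(void) { assert(sp > 0); return stack[--sp]; }

/* Primitives: push a stored value, and add the top two stack items. */
static void do_lit(word *self) { push(self->value); }
static void do_add(word *self) {
    (void)self;
    intptr_t b = pop();
    intptr_t a = pop();
    push(a + b);
}

/* Inner interpreter for colon definitions: walk the body and call each
   word through its code field. Nested calls use the C stack. */
static void do_colon(word *self) {
    for (word **ip = self->body; *ip != NULL; ip++)
        (*ip)->code(*ip);
}

static word two   = { do_lit, NULL, 2 };
static word three = { do_lit, NULL, 3 };
static word plus  = { do_add, NULL, 0 };

/* Roughly  : FIVE 2 3 + ;  */
static word *five_body[] = { &two, &three, &plus, NULL };
static word five = { do_colon, five_body, 0 };

/* Execute FIVE on an empty stack and return the result. */
intptr_t run_five(void) {
    sp = 0;
    five.code(&five);
    return pop();
}
```

Note that every routine receives a pointer to the word being executed, so a runtime routine can locate the word's body without inspecting the machine stack.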

Even so, to implement ;CODE in a way that supports CREATE DOES> would
seem to require dropping down into assembly to make things work. Maybe
that's not even possible as it just occurred to me that with the tail
recursion optimization, the return address wouldn't even be available
on the machine stack.

Regards,
-- Greg

Andrew Haley

Mar 15, 2006, 9:15:03 AM

Greg <gsch...@ra.rockwell.com> wrote:
> [Hmm. I can't see how you do this in C. Threaded code works, no
> problem, but subroutine threading? How? ]

> I should have been more precise. What I am doing is really not
> subroutine threading. I do have an inner interpreter. The code field
> contains a pointer to a native C routine and the body contains a list
> of CFAs.

A list of addresses pointing to code -- that's direct threaded code,
more or less.

> My compiler supports tail recursion optimization so a long chain of
> calls does not use the return stack.

> Even so, to implement ;CODE in a way that supports CREATE DOES> would
> seem to require dropping down into assembly to make things work.

It would, yes.

> Maybe that's not even possible as it just occurred to me that with
> the tail recursion optimization, the return address wouldn't even be
> available on the machine stack.

You could drop into assembler and handle it yourself. It's only a few
instructions, after all.

Andrew.

Anton Ertl

Mar 15, 2006, 12:04:56 PM

"Greg" <gsch...@ra.rockwell.com> writes:
>I'm playing around with a subroutine threaded Forth written in C. As
>you pointed out, the <BUILDS creates an additional word beyond the CFA.
> I'd like to remove that word and I can't see how to do it in 'C'
>without resorting to machine code as you mentioned above.

You might be interested in the discussion of this topic in
<http://www.complang.tuwien.ac.at/papers/ertl02.ps.gz>

>The problem is that 'C' doesn't provide direct access to the machine
>stack which is what that little bit of magic code needs to have in
>order to push the address of the calling word's PFA onto the data
>stack.
>
>Is my understanding correct?

Basically, yes. However, there are ways to work around this problem,
discussed in the paper above:

* Keeping the additional cell, and having it for all words (at least
for all CREATEd words). The additional cell also turned out to be
beneficial when we implemented inlining. This is what Gforth uses.

* Or going with doubly-indirect threading; basically the second
indirection turns the machine-code jump near the DOES> into an
ordinary pointer. To avoid having the cost of the double
indirection for every NEXT, you can use hybrid
direct/doubly-indirect primitive-centric threaded code.
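
The first option can be sketched in C as follows. This is only an illustration under stated assumptions, not Gforth's actual code: the `word` struct, the fixed-size `body`, and the names `create`, `does`, `do_var`, `do_does`, and `fetch` are all hypothetical. The point it shows is that when every word carries the extra cell, the body sits at the same offset in all words, and the DOES> runtime can find its action through an ordinary pointer, with no machine code involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct word word;
struct word {
    void (*code)(word *self);  /* code field */
    void (*does)(void);        /* extra cell: DOES> action, unused after plain CREATE */
    intptr_t body[4];          /* user data, same offset in every word */
};

/* Tiny data stack (hypothetical). */
enum { STACK_SIZE = 16 };
static intptr_t stack[STACK_SIZE];
static int sp = 0;

static void push(intptr_t x) { assert(sp < STACK_SIZE); stack[sp++] = x; }
static intptr_t pop(void) { assert(sp > 0); return stack[--sp]; }

/* Runtime of a plain CREATEd word: push the body address. */
static void do_var(word *self) { push((intptr_t)self->body); }

/* Runtime of a DOES> child: push the body address, then run the action
   found in the extra cell. */
static void do_does(word *self) {
    push((intptr_t)self->body);
    self->does();
}

/* CREATE: set up the header; the extra cell exists but is unused. */
static void create(word *w) { w->code = do_var; w->does = NULL; }

/* DOES>: patch the two header cells of an already CREATEd word. */
static void does(word *w, void (*action)(void)) {
    w->code = do_does;
    w->does = action;
}

/* Stand-in for the code after DOES>: ( addr -- n ), like @ */
static void fetch(void) {
    intptr_t *addr = (intptr_t *)pop();
    push(*addr);
}

/* Roughly  CREATE x 7 ,  DOES> @  followed by executing x. */
intptr_t demo_constant(void) {
    word w;
    sp = 0;
    create(&w);
    w.body[0] = 7;
    does(&w, fetch);
    w.code(&w);
    return pop();
}

/* A word that was only CREATEd pushes its body address; returns 1 if so. */
int demo_plain_create(void) {
    word w;
    sp = 0;
    create(&w);
    w.code(&w);
    return pop() == (intptr_t)w.body;
}
```

The cost is one cell per word whether or not DOES> is ever applied to it; the second option above trades that space for an extra indirection instead.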

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.complang.tuwien.ac.at/forth/ansforth/forth200x.html

Greg

Mar 15, 2006, 9:33:44 PM

Anton Ertl wrote:
> "Greg" <gsch...@ra.rockwell.com> writes:

> >Is my understanding correct?
>
> Basically, yes. However, there are ways to work around this problem,
> discussed in the paper above:

Thanks for the link to your paper.

As you suggest, maybe the difference in word layout isn't as bad as
I thought. After all, I do have a word that returns the PFA correctly
in either case (it inspects the contents of the CFA to tell the two
layouts apart), so the difference is hidden.
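
In C, a lookup of that kind might look roughly like the sketch below. It assumes a hypothetical cell-array layout in which a <BUILDS child keeps its DOES> pointer in the cell after the CFA and a CREATEd word does not; `body_of`, `do_create`, `do_builds`, and the demo helpers are invented names, and the two runtimes are stubs that only need to be distinguishable by address.

```c
#include <assert.h>
#include <stdint.h>

/* One dictionary cell: plain data or a code-field routine. */
typedef union cell cell;
union cell {
    intptr_t n;
    void (*code)(cell *word);
};

/* Stub runtimes; each records that it ran so the two stay distinct. */
static int last_runtime = 0;
static void do_create(cell *word) { (void)word; last_runtime = 1; }
static void do_builds(cell *word) { (void)word; last_runtime = 2; }

/* Return the address of the user data for either layout by checking
   which runtime the code field points at. */
cell *body_of(cell *word) {
    assert(word[0].code == do_create || word[0].code == do_builds);
    return (word[0].code == do_builds) ? &word[2] : &word[1];
}

/* CREATE layout: data starts right after the CFA. */
intptr_t demo_created(void) {
    cell w[2];
    w[0].code = do_create;
    w[1].n = 11;
    return body_of(w)->n;
}

/* <BUILDS layout: one extra cell (placeholder for the DOES> pointer)
   sits between the CFA and the data. */
intptr_t demo_built(void) {
    cell w[3];
    w[0].code = do_builds;
    w[1].n = 0;
    w[2].n = 22;
    return body_of(w)->n;
}
```

As long as all code reaches the data through a helper like this, the rest of the system does not need to know which layout a given word uses.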

As a tangential matter, I have to say that although I like the idea of
uniformity which eliminates the need for <BUILDS, at least
aesthetically, I prefer the antiquated <BUILDS DOES> construct as it
was originally designed to be a balanced pair, much like the
colon/semicolon pair of colon definitions. I suppose that <BUILDS
could simply be defined as an alias for CREATE, but perhaps some might
object to using what may be considered an anachronism.

Anyway, thanks for everyone's insights on this matter.

Regards,
-- Greg