Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Parrot Forth 0.1

0 views
Skip to first unread message

Matt Diephouse

unread,
Oct 14, 2004, 8:26:59 PM10/14/04
to perl6-i...@perl.org
Parrot Forth

Released: 14 October 2004
Version: 0.1
Download: http://matt.diephouse.com/software/parrot-forth-0.1.tar.gz

This is the initial release of my re-implementation of Parrot Forth in
PIR. Code reviews are both welcome and appreciated (PIR is kind of
new, so I may not be doing everything correctly). The two main goals
of this implementation are: to be a complete example of how to
implement a Parrot compiler and to be as clean and readable as
possible.

Current Features:
o 25 Built-In Words (use `words` to see them)
o A Nice Prompt
o Initial Support for ." (printing a string)
o Basic Integer Support
o The Ability to Define New Words (using `: ... ;`)

Still to Come:
o Strings
o Variables
o Constants
o Loops
o Conditionals
o Numeric Bases
o More Words

To use it, untar, cd into the directory and run `parrot forth.pir`. It
needs to be executed from inside this direcotry, so you may need to
run `../../parrot forth.pir` or something similar.

Enjoy!

--
matt diephouse
http://matt.diephouse.com

Michel Pelletier

unread,
Oct 17, 2004, 11:07:11 PM10/17/04
to Matt Diephouse, perl6-i...@perl.org
> Parrot Forth
>
> Released: 14 October 2004
> Version: 0.1
> Download:
> http://matt.diephouse.com/software/parrot-forth-0.1.tar.gz
>
> This is the initial release of my
> re-implementation of Parrot Forth in
> PIR. Code reviews are both welcome and
> appreciated (PIR is kind of
> new, so I may not be doing everything
> correctly). The two main goals
> of this implementation are: to be a complete
> example of how to
> implement a Parrot compiler and to be as clean
> and readable as
> possible.

This is my first chance to take a look at it but
I'm sorry I've nto been able to run it because
I'm on a different machine. I did look at the
code though.

It's no suprise that I too am writing a forth
like language in PIR, but we have gone in some
different directions. Once again I might be
wrong about the below because I didn't get a
good chance to analyze it. There ar some
differences, like I keep the stack in a
register, you keep yours in a global, and you
store your core words in an "operations" global,
and I use a Parrot lexical pad.

I have no idea if storng something in a register
is worse than a global. I have certainly had
problems with IMCC stomping my well known
registers with $P? temp vars. Lexical pads vs
keeping your own hash are probably equivalent,
but perhaps pads have some cross-language
benefit.

This is a good opertunity thought to discuss
some of the things that I have been doing with
Parakeet. I have completely re-written the
parakeet core to be only a simple eval compile
loop that unerstand only one word pattern:

code <name>
... PIR ...
next

This is, of course, standard Forth. With this
micro kernel I can then bootstrap the remaining
Parakeet wordset from an outside file. A
Standard Forth compliant wordset could be
bootstrapped just as easily.

My problem now is that even this minimal
Parakeet micro-kernel, parser, numeric
converter, is still pretty Parakeet specific.
Where the machine state goes (stack, IP (if you
have one), etc) and where names are kept (Forth
would use a "dictionary", Parakeet uses nested
lexical scope).

I propose you and I work together to make a
totally Forth-language agnostic Forth
micro-kernel. This kernel can be very
minimalistic, a stacik, a machine state hash,
and definitions for the words "code", "next",
"word", and "'" (tick) all having standard Forth
behavior, a simple dictionary and a simple eval
loop.

The micro kernel then just goes through some
common stuff (parse args, load files, init
blocks, etc.) and then bootstrap the larger
language from some optionally specified language
file.

At this point, a standard Forth bootstrap file
would contain things like:

code __init_forth
# initialize more specific state for the language
P30 = new .PerlArray # memory buffer
P29 = new .PerlArray # data stacik
P28 = new .PerlArray # return stack
# ... or whatever
next

__init_forth

code @
# PIR definition for fetch
next

code !
# PIR definition for store
next

...

etc. In the init word, you bootstrap the
machine/language spacific stuff that you intend
to use in later words. I use these constant P
vars as an example here, your idea of using
globals is better.

Parakeet can go its other direction defining a
completely different language, but based on some
shared code that we can both maintain.

Some Parakeet ideas might also be used in your
code, for example, it looks to me like your code
does direct threading:

NOT_DOT_QUOTE:
# check to see if it's an operation
$P0 = find_global "operations"
$S0 = downcase word
$I0 = exists $P0[$S0]
if $I0 != 1 goto NOT_AN_OP
pir_code .= "$P0 = ops['"
pir_code .= $S0
pir_code .= "']\n"
pir_code .= "$P0()\n"
goto RETURN

Direct threading is a common Forth
implementation technique, but it was most often
used because it could be implemented portably in
C with only a small bit of asm. For smaller ops
like @ !, math ops, amd many others, it is more
optimal to use direct code generation to
"inline" the PIR code itself instead of
inlineing an invoke to the PIR code compiled as
as sub.

Sor the word:

: square dup * ;

in the direct thread case would become (in
pseudo PIR):

.sub blah
$P0 = ops["dup"]
$P0()
$P0 = ops["*"]
$P0()
.NEXT
.end

but using native code generation would ineline
the definitions for dup and mul:

.sub ncg
.POP
.NOS = .TOS
.TOS = .TOS * .NOS
.PUSH
.NEXT
.end

resulting in a lot less overhead for core words.
NCG was usually either a commercial feature or
rarely seen in Forth because it was
non-portable, being written in ASM, and
expensive to maintain and multiple platforms.
We can kick that problem to the door.

Do you think this is a good idea? I can certain
help along with implementing words that Parakeet
and Forth share. I myself have never
implemented a complete Forth, but I've given a
couple a stab, both direct thread models.

-Michel

Matt Diephouse

unread,
Oct 18, 2004, 1:29:42 PM10/18/04
to mic...@dialnetwork.com, perl6-i...@perl.org
On Sun, 17 Oct 2004 22:07:11 -0500 (CDT), Michel Pelletier
<mic...@dialnetwork.com> wrote:
> This is my first chance to take a look at it but
> I'm sorry I've nto been able to run it because
> I'm on a different machine. I did look at the
> code though.

Thanks for the feedback. I don't have time to respond to everything
right now, but I thought I'd at least send an initial reply.

> good chance to analyze it. There ar some
> differences, like I keep the stack in a
> register, you keep yours in a global, and you
> store your core words in an "operations" global,
> and I use a Parrot lexical pad.

I've renamed this global to "words" since the release. Naming it
"operations" reflects my Forth naivety.

> I have no idea if storng something in a register
> is worse than a global. I have certainly had
> problems with IMCC stomping my well known
> registers with $P? temp vars. Lexical pads vs
> keeping your own hash are probably equivalent,
> but perhaps pads have some cross-language
> benefit.

I think using a global is probably the only way to get around the
register stomping.

I don't think there would be any cross-language benefit from using
lexical pads. In other languages, yes, but since Forth is built around
a stack, another language can't very well call a Forth subroutine. It
makes more sense to eval something like "1 2 sub .".

I'd be interested in knowing how store_global/find_global differ in
terms of speed/implementation. The same goes for the lexical opcodes.
Dan or Leo?

> I propose you and I work together to make a
> totally Forth-language agnostic Forth
> micro-kernel. This kernel can be very
> minimalistic, a stacik, a machine state hash,
> and definitions for the words "code", "next",
> "word", and "'" (tick) all having standard Forth
> behavior, a simple dictionary and a simple eval
> loop.

I'll reply to this portion of your email later, when I get time to
think and to look at the Parakeet code.

> Some Parakeet ideas might also be used in your
> code, for example, it looks to me like your code
> does direct threading:

...

> Direct threading is a common Forth
> implementation technique, but it was most often
> used because it could be implemented portably in
> C with only a small bit of asm. For smaller ops
> like @ !, math ops, amd many others, it is more
> optimal to use direct code generation to
> "inline" the PIR code itself instead of
> inlineing an invoke to the PIR code compiled as
> as sub.

...



> resulting in a lot less overhead for core words.
> NCG was usually either a commercial feature or
> rarely seen in Forth because it was
> non-portable, being written in ASM, and
> expensive to maintain and multiple platforms.
> We can kick that problem to the door.

I'm not sure that's right. I did think about putting the code inline
(and it would be a trivial change to do so), but I'm not convinced it
would be faster. Yes, you wouldn't have to deal with the overhead
involved with making subroutine calls, but IMCC would also have to
re-parse and re-compile the code every time.

I really ought to benchmark it (later, when there's more time, I
guess), but I wouldn't be surprised if calling the subroutines was
faster. I can also investigate using the fastcall pragma since there
are no parameters. Of course, I would need to add in the cost of doing
a hash lookup as well, so you may be right.

Ultimately, I don't care about speed so much for Forth. Maybe I
should. I don't really plan on using it, so this is more of an
exercise than anything else. It's more important to me that the
implementation be clean and readable than for it to be fast. I want
there to be a low learning curve.

--
matt

Michel Pelletier

unread,
Oct 18, 2004, 3:17:59 PM10/18/04
to ma...@diephouse.com, mic...@dialnetwork.com, perl6-i...@perl.org

>> I propose you and I work together to make a
>> totally Forth-language agnostic Forth
>> micro-kernel. This kernel can be very
>> minimalistic, a stacik, a machine state hash,
>> and definitions for the words "code", "next",
>> "word", and "'" (tick) all having standard
>> Forth
>> behavior, a simple dictionary and a simple
>> eval
>> loop.
>
> I'll reply to this portion of your email later,
> when I get time to
> think and to look at the Parakeet code.

Okay, note that the code I mentioned (the
speration of core from core words) is not
checked in right now, but the version in CVS
does do NCG.

But only at compile time or interactive
interpretation time. Not at runtime. Consider
the code typed into the Forth interpreter:

2 dup * .

would of course print '4'. Your correct that
using NCG this would require compiling new PIR
every time it is typed in, but *only* when you
are working interactively. The time it takes to
do this is infinitessimal compared to the time
it takes to type it in.

For an already compiled word being executed,
however, NCG is *much* faster than calling
subroutines. Consider:

: square dup * ;

in psudo-pir, given the definitions of dup and *:

.sub dup
.POP
.NOS = .TOS
.PUSH2
.end

.sub mul:
.POP2


.TOS = .TOS * .NOS

.PUSH2
.end

using direct thrading this would rsult in the
execution of:

find_global $P0, "dup"
invoke $P0
find_global $P0, "mul"
invoke $P0

in NCG it would result in the execution of:

.POP
.NOS = .TOS
.PUSH2 # this can be optimized out
.POP2 # of NCG, but not direct
threading


.TOS = .TOS * .NOS
.PUSH

Now call this word from a loop:
: square_to_thousand
1000 0 do
i square .
loop
;

Using the direct threading model, this does 2000
global lookups and subroutine invokes, which in
turn, do the actual "work" of 1000
multiplications and the associated stack
traffic. The lookups and invokes are pure
inner-loop overhead.

Using NCG this does 1000 multiplications and the
associated stack traffic (which can be optimized
out for the most part) with no lookups or
invokes.

The overhead of diect threading vs. NCG does not
need to be benchmarked, it can be proven by
argument: both methods execute the same code the
same way, but the NCG method does 2000 less
global lookups and invokes.

The "extra" compiler overhead is trivial, and it
only applies to compile-time; generally when a
program is started. At run-time (when all those
lookups and invokes are happening in the direct
thread case) there is no additional compilation
overhead because a word is compiled only once.

Almost all other Forth's that you may see either
direct or indirect thread; this is not because
it is faster (it isn't) or simpler (not much),
but because it is portable and requires no or
little asm. If there were only one assembly
language in the world then NCG would be the
*only* way to write a forth interpreter,
threading of any kind wouldn't make sense.

-Michel

Matt Diephouse

unread,
Oct 18, 2004, 3:58:45 PM10/18/04
to mic...@dialnetwork.com, perl6-i...@perl.org
On Mon, 18 Oct 2004 14:17:59 -0500 (CDT), Michel Pelletier
<mic...@dialnetwork.com> wrote:
> Okay, note that the code I mentioned (the
> speration of core from core words) is not
> checked in right now, but the version in CVS
> does do NCG.

Noted.

> Using the direct threading model, this does 2000
> global lookups and subroutine invokes, which in
> turn, do the actual "work" of 1000
> multiplications and the associated stack
> traffic. The lookups and invokes are pure
> inner-loop overhead.
>
> Using NCG this does 1000 multiplications and the
> associated stack traffic (which can be optimized
> out for the most part) with no lookups or
> invokes.
>
> The overhead of diect threading vs. NCG does not
> need to be benchmarked, it can be proven by
> argument: both methods execute the same code the
> same way, but the NCG method does 2000 less
> global lookups and invokes.

Indeed. Pardon my ignorance. I hadn't thought things all the way through.

> The "extra" compiler overhead is trivial, and it
> only applies to compile-time; generally when a
> program is started.  At run-time (when all those
> lookups and invokes are happening in the direct
> thread case) there is no additional compilation
> overhead because a word is compiled only once.

This still doesn't seem right. The compilation from Forth to PIR only
happens once, yes. But each time the defined word is used, the PIR
code, which is injected, must be compiled to bytecode.

You said earlier that:

> direct thrading this would rsult in the
> execution of:
>
> find_global $P0, "dup"
> invoke $P0
> find_global $P0, "mul"
> invoke $P0
>
> in NCG it would result in the execution of:
>
> .POP
> .NOS = .TOS
> .PUSH2 # this can be optimized out
> .POP2 # of NCG, but not direct threading
> .TOS = .TOS * .NOS
> .PUSH

The second PIR sequence is longer. It will take IMCC more time to
compile that than the first example. As the words become less trivial,
this will become more true.

But like you said, this only happens at (a) compile time or (b) at the
interactive prompt. And optimizing out push/pop combos will speed
things up more, though I'm not sure how to implement that optimization
using PIR.

So it may be programs can fall on either side of the fence of this
issue. Building words in terms of other words will give NCG an
advantage. But using relatively simple words many times will give
direct threading an advantage. But I do believe you when you say that
NCG is fastest overall (read, for most programs).

Furthermore, our two models will behave differently when you redefine
a word. Consider this Forth example:

: inc 1 + ;
: print+ inc . ;
: inc 2 + ;

Should print+ increment by one or by two? gforth increments by one.
I'd be interesting in knowing which was the "correct" behavior.

--
matt

Michel Pelletier

unread,
Oct 18, 2004, 6:31:05 PM10/18/04
to ma...@diephouse.com, mic...@dialnetwork.com, perl6-i...@perl.org
> This still doesn't seem right. The compilation
> from Forth to PIR only
> happens once, yes. But each time the defined
> word is used, the PIR
> code, which is injected, must be compiled to
> bytecode.

RIght.

> The second PIR sequence is longer. It will take
> IMCC more time to
> compile that than the first example. As the
> words become less trivial,
> this will become more true.

But one can't weigh the compile-time overhead
against the run-time overhead like that. What
if your inner loop runs for many days when
compilation of the NCG code takes only a couple
of milliseconds more?

> But like you said, this only happens at (a)
> compile time or (b) at the
> interactive prompt.

Right.

> And optimizing out push/pop
> combos will speed
> things up more, though I'm not sure how to
> implement that optimization
> using PIR.

Well that's a good question because I haven't
dont it yet. The simple hack aproach is to
pattern replace back to back PUSH/POP pairs
right out with string search and replace. I
estimate this could eliminate up to half of the
data stack traffic on average, depending on the
code of course.

The whole-hog way to do it would be to have a
stack data analysis algorithm try to cache as
much of the stack in P registers as possible,
spilling when necessary. This could probably
eliminate almost all stack data traffic except
for extremely pathological cases.

The Python interpreter could use this method too
to really spank CPython, which has implicit
stack traffic that cannot be easily optimized
out.

> Furthermore, our two models will behave
> differently when you redefine
> a word. Consider this Forth example:
>
> : inc 1 + ;
> : print+ inc . ;
> : inc 2 + ;
>
> Should print+ increment by one or by two? gforth
> increments by one.

I"ve made a pretty big mistake so far by calling
indirect threading direct threading. The
lookup/invoke sequence is really indirect
threading. Direct threading would be if we
could somehow compile into PIR a literal
"pointer" to a sub instead of having to look it
up by name. I don't think this can be done.

gforth keeps the old behavior because it uses
direct threading, the pointer never changes
inside the compiled body of print+ even though
the word definition later does.

In the case of words defined with ":" and ";"
even NCG still does indirect threading via a
lookup and invoke, NCG only inlines CORE word
definitions, words that are defned in PIR and
form the basis for all high level words, but the
high level words themselves are indirect
threaded.

I should have mentioned this before, but this
doesn't invalidate my previous example:

: square dup * ;

: square_to_thousand
1000 0 do
i square .
loop
;

1000 lookups and invokes are still required to
find the high level word "square" in either
indirect threading or NCG (and direct threading
still requires 1000 invokes), but 2000 lookups
and invokes are still eliminated from the inner
loop with NCG because "dup" and "*", which are
core words, are inlined.

Whether or not an old definition is retained if
a word is redefined is a different question, in
the case of Parakeet, it will increment by two
because all high level words are looked up by
name at run-time via indirect threading.

> I'd be interesting in knowing which was the
> "correct" behavior.

I suspect it is implementation defined, but
unfortunately taygeta.com is not working for me
right now.

-Michel

Matt Diephouse

unread,
Oct 18, 2004, 8:31:39 PM10/18/04
to mic...@dialnetwork.com, perl6-i...@perl.org
On Mon, 18 Oct 2004 17:31:05 -0500 (CDT), Michel Pelletier
<mic...@dialnetwork.com> wrote:
> > The second PIR sequence is longer. It will take
> > IMCC more time to
> > compile that than the first example. As the
> > words become less trivial,
> > this will become more true.
>
> But one can't weigh the compile-time overhead
> against the run-time overhead like that. What
> if your inner loop runs for many days when
> compilation of the NCG code takes only a couple
> of milliseconds more?

Hence my statement:

> > So it may be programs can fall on either side of the fence of this
> > issue. Building words in terms of other words will give NCG an
> > advantage. But using relatively simple words many times will give
> > direct threading an advantage. But I do believe you when you say that
> > NCG is fastest overall (read, for most programs).

> Well that's a good question because I haven't


> dont it yet. The simple hack aproach is to
> pattern replace back to back PUSH/POP pairs
> right out with string search and replace. I
> estimate this could eliminate up to half of the
> data stack traffic on average, depending on the
> code of course.

Right. I'm not used to thinking without regexes. `index` and `substr`
can be used for this.

> The whole-hog way to do it would be to have a
> stack data analysis algorithm try to cache as
> much of the stack in P registers as possible,
> spilling when necessary. This could probably
> eliminate almost all stack data traffic except
> for extremely pathological cases.

I'm not /that/ interested in speed.

> The Python interpreter could use this method too
> to really spank CPython, which has implicit
> stack traffic that cannot be easily optimized
> out.

Here I'd be interested in it. :) But this assumes the Python
implementation will be stack based. Just because CPython is doesn't
mean Parrot Python will be, does it?

> > Furthermore, our two models will behave
> > differently when you redefine
> > a word. Consider this Forth example:
> >
> > : inc 1 + ;
> > : print+ inc . ;
> > : inc 2 + ;
> >
> > Should print+ increment by one or by two? gforth
> > increments by one.
>
> I"ve made a pretty big mistake so far by calling
> indirect threading direct threading. The
> lookup/invoke sequence is really indirect
> threading. Direct threading would be if we
> could somehow compile into PIR a literal
> "pointer" to a sub instead of having to look it
> up by name. I don't think this can be done.

I think it can. (Sorry for being so argumentative. :-) The
get_integer_keyed method gets the address of an Eval PMC at the
moment, IIRC. (This was used for the previous Forth implementation.)
GC might be a problem though. Emulation could be done too, but I doubt
that's desirable.

> In the case of words defined with ":" and ";"
> even NCG still does indirect threading via a
> lookup and invoke, NCG only inlines CORE word
> definitions, words that are defned in PIR and
> form the basis for all high level words, but the
> high level words themselves are indirect
> threaded.

Ahh... okay, that makes a bit more sense (or at least it did, I'm not
sure I remember why). It would actually be easier to inline all words
though, as you wouldn't have to treat core words differently (and it
makes it easier to redefine them).

> > I'd be interesting in knowing which was the
> > "correct" behavior.
>
> I suspect it is implementation defined, but
> unfortunately taygeta.com is not working for me
> right now.

Gotcha.

--
matt

Leopold Toetsch

unread,
Oct 19, 2004, 5:06:39 AM10/19/04
to mic...@dialnetwork.com, perl6-i...@perl.org
Michel Pelletier <mic...@dialnetwork.com> wrote:
> The Python interpreter could use this method too
> to really spank CPython, which has implicit
> stack traffic that cannot be easily optimized
> out.

That's not need. The translater can easily create register code, even
from Python bytecode, which is stack oriented.

leo

Darryl

unread,
Oct 19, 2004, 7:50:25 AM10/19/04
to perl6-i...@perl.org
michel wrote:
> Whether or not an old definition is retained if
> a word is redefined is a different question, in
> the case of Parakeet, it will increment by two
> because all high level words are looked up by
> name at run-time via indirect threading.

This is an incorrect __Forth__ behaviour. gForth's is correct.
If you want print+ to increment by two, it should be redefined to do that.
A compiled word should not change its behaviour.

Matt Diephouse

unread,
Oct 26, 2004, 12:06:16 PM10/26/04
to mic...@dialnetwork.com, perl6-i...@perl.org
On Sun, 17 Oct 2004 22:07:11 -0500 (CDT), Michel Pelletier
<mic...@dialnetwork.com> wrote:
> I propose you and I work together to make a
> totally Forth-language agnostic Forth
> micro-kernel. This kernel can be very
> minimalistic, a stacik, a machine state hash,
> and definitions for the words "code", "next",
> "word", and "'" (tick) all having standard Forth
> behavior, a simple dictionary and a simple eval
> loop.
>
> The micro kernel then just goes through some
> common stuff (parse args, load files, init
> blocks, etc.) and then bootstrap the larger
> language from some optionally specified language
> file.

. . .

> Do you think this is a good idea? I can certain
> help along with implementing words that Parakeet
> and Forth share. I myself have never
> implemented a complete Forth, but I've given a
> couple a stab, both direct thread models.

It's certainly worth investigating. I'm interested in seeing what your
new code looks like once you've implemented those ideas (moving to a
bootstrapper).

As a first step, I'm going to move to inlining all words (both
built-in and user defined). Doing this (and adding push/pop macros)
will bring our code bases a lot closer. Once this is done I'll make
another release and we can compare code again.

--
matt

Michel Pelletier

unread,
Oct 26, 2004, 12:56:24 PM10/26/04
to ma...@diephouse.com, mic...@dialnetwork.com, perl6-i...@perl.org

> As a first step, I'm going to move to inlining
> all words (both
> built-in and user defined). Doing this (and
> adding push/pop macros)
> will bring our code bases a lot closer. Once
> this is done I'll make
> another release and we can compare code again.

Cool, since the real meat of it is the
definitions of the words anyway, this is a good
way to proceed. We can finalize on the
stichings later.

-Michel

0 new messages