was far from "standard." According to our surveys, the vast majority of
implementations were "83 standard except..." -- e.g., "except for 32-bit cell
size" being the most common, but also lack of blocks, lack of "standard"
dictionary threading, etc. A significant minority were still 79.
It was our desire to enable as many to come into compliance w/ ANS Forth as
possible, with as little cost as possible. Therefore, we tried to create
compromises that could be accommodated to a great extent in "shells" either on
top of a system or under an application.
RE LOCALS: MacForth has used locals for >8 years. With >6000 users, that
represents a significant experience base. Several other TC members had also
used them, tho not as widely.
-----
This message came from GEnie via willett. You *cannot* reply to the author
using e-mail. Please post a follow-up article, or use any instructions
the author may have included (USMail addresses, telephone #, etc.).
Report problems to: d...@willett.pgh.pa.us _or_ {pitt,sei}!willett!dwp
Lynn Barra, X3, c/o Dan Arnold, 75300,2354
If there's not enuf address space to list both names, just use Dan & reference
Lynn in the message. Be sure the message clearly states up front that it is a
public review response to dpANS Forth.
Now that it's so easy, get those comments in! Look forward to hearing from
you!
[NOTE: For those of you that can send Internet mail,
you can reach a Compu$erve account X,Y as:
X...@compuserve.com
Notice that the comma(,) changes to a dot(.) and
compuserve has an 'e' on the end!
-dwp]
_I_ don't. _They_ do. It was the Forth Standards Team that put the label
"Standard" on the 83 document. However, I must point out that -- based on my
experience -- more lip service is paid to the 83 Standard than any other,
which suggests that it's at least the strongest candidate for an existing
standard.
From Elizabeth Rather:
> MacForth has used locals for >8 years. With >6000 users.... *Sigh.* A
three-part response:
a) this is not a valid argument. If you can tell me _how_many_ of those 6000
users _actively_use_ locals in their programming, I'll be more impressed. The
fact that a feature has been included in a package which has been shipped does
not mean that that feature is either useful, or used.
b) you fall into the same trap as Doug Philips: the false dichotomy of "new
technology" vs. "common practice". There is, in fact, a third category:
established (even boring) technology which is nevertheless _not_ common
practice. (Dare I mention multitasking?)
c) I could make a case for locals being "new technology" in the Forth
community....at least insofar as the "standard" solution being offered leaves
problems unsolved. (E.g., multi-cell locals.)
Brad Rodriguez
B.RODRIGUEZ2 on GEnie | brad...@maccs.dcss.mcmaster.ca
"All this is meaningless drivel if no one calls the client!"
- Chris McEwen
> b) you fall into the same trap as Doug Philips: the false dichotomy of "new
> technology" vs. "common practice". There is, in fact, a third category:
> established (even boring) technology which is nevertheless _not_ common
> practice. (Dare I mention multitasking?)
I apologize for presenting my position as being dichotomous. That was not
my intention. As I did not have access to the usage information, all I
could do is post about how much people have written about Locals (Note 1).
There is, in fact, a forth category: established technology in use by the
rest of the Computer Science community but shunned by the Forth community.
I suspect that there are more categories as well.
As far as multitasking in particular, I suspect that the embedded systems
people might care to differ. Of course, you'll just say: But how many
people actually use it? As far as the theoretical considerations go, it
doesn't matter. Round robin systems have been around a long time. That
they may only be applicable to a certain subset of the users of an ANSI
Forth is something the TC has to weigh itself.
> c) I could make a case for locals being "new technology" in the Forth
> community....at least insofar as the "standard" solution being offered leaves
> problems unsolved. (E.g., multi-cell locals.)
Straw argument. You might as well claim that there is no memory management
scheme that will work for all applications, therefore memory management leaves
problems unsolved. This is a pretty common argument against any change:
it isn't perfect, so don't do it.
Do the FILE words provide you with a record locking file
interface? NO!??!?!?! Humph! They are clearly leaving
problems unsolved!!!
Now, if you are claiming that the LOCAL specification as it stands now
prevents a clean upward path to multi-cell LOCALS.... but that isn't
the sense I get from what you said.
-Doug
Note 1: It would seem from what you are saying about LOCALS that the
Forth community loves to write about things it doesn't use. I'm not
sure if you are intending to imply that or not.
---
Preferred: d...@willett.pgh.pa.us Ok: {pitt,sei}!willett!dwp
As the recently posted bibliography attests, locals have been a source of
serious exploration by a number of quite independent people for many years. A
couple of years ago Martin Tracy held a workshop on the subject at FORML, and
reported a lot of use and experimentation and even more interest and desire
for such a feature.
I personally have no great fondness for locals; in my experience, if your
stack handling is so messy you need locals, you should back off and reconsider
some design issues. But I do believe the user demand is genuine enough to
offer something for folks to try, and the experience base on the TC is
sufficient to offer something useful. But, after all, the locals wordset is
optional -- if few systems choose to offer it, and few users demand it, they
at least cost nothing and hurt no one.
This may well be on target. A profusion of hacks have appeared in
the literature over the years, but one hesitates to use them out
of fear that they may be nonorthogonal, that is, mutually incompatible,
that is, use two or three or more at the same time and things go
blooey.
This is one of the things that makes software component design (as
opposed to coding) so difficult.
I am VERY glad that locals will be in an optional word set. It seems to give
comfort to those used to using them in other languages. I came up with a crude
implementation for FPC with them and found I really didn't use them. I
subsequently cut them out of my FPC version. But, to tell a 'C' colleague that
Forth doesn't support locals turns them off and it is nigh on impossible
to convince them that code can still be modular and structured. My two cents
worth.
+This is one of the things that makes software component design (as
+opposed to coding) so difficult.
Indeed. I am intrigued by your use of the characterization "hacks."
My impression is slightly different. I view many of the articles to be
more of a proof-of-concept than polished-product. More along the lines of
"this is how I did it on my system" rather than "here is code you can
simply use." I think part of that stems from the diversity of the various
standards and implementations of Forth, and part from the Forth philosophy
of "grow your own solution (from the core wordset) to fit your problem
exactly" mentality. That attitude is not necessarily wrong, but it
doesn't foster reusable code and portability.
-Doug
>I personally have no great fondness for locals; in my experience, if your
>stack handling is so messy you need locals, you should back off and reconsider
>some design issues. But I do believe the user demand is genuine enough to
>offer something for folks to try, and the experience base on the TC is
>sufficient to offer something useful. But, after all, the locals wordset is
>optional -- if few systems choose to offer it, and few users demand it, they
>at least cost nothing and hurt no one.
Spoken like a true Forth conservative! If I am not mistaken, you are also
opposed to file-handling words, right? Your company makes its money from
embedded systems, right? So you aren't really interested in any Forth
features for people that use Forth in other ways.
Let's consider the following problem: a point is represented on the stack
as integers x y z. I wish to define
: pt+ ( pt1 pt2 -- pt) \ add points together.
Now, tell me how to design and implement this word without locals so that
a) you don't need to do a lot of stack manipulations, and b) the
factorization of the point data structure makes sense.
Writing stack manipulations with DUP, ROT, SWAP, DROP, contributes to the
perception of Forth as a write-only language. You have to work out stack
manipulation for each word, and it consumes time. Good factorization of
words occurs only after you've written a bad factorization, so you need to
work out these manipulations at least once per word.
Locals ease the implementation of Forth words. They are extremely useful in
scientific programming where "medium-sized" data structures ( complex #s,
3D coordinates, etc.) are best implemented by value (i.e., all components
of the data structure appear on the stack when the object is used).
Hadil G. Sabbagh
E-mail: sab...@cs.nyu.edu
Voice: (212) 533-6905
Snail: Courant Institute of Math. Sci.
251 Mercer St.
New York,NY 10012
"Injustice anywhere is a threat to justice everywhere."
- Martin Luther King, Jr.
Disclaimer: This is not a disclaimer.
> a) this is not a valid argument. If you can tell me _how_many_
> of those 6000 [MacForth] users _actively_use_ locals in their
> programming, I'll be more impressed.
Well, I've used them. They're quite handy in certain situations. Locals seem
to allow programmers to handle certain situations faster, easier and with a
more readable result. Locals may also make Forth easier for new Forth
programmers, by avoiding difficult stack manipulations.
Why spend a lot of time figuring out how to manipulate data (which item is
under which other item, and what order do I put them in to access them
efficiently?) when the same thing can be written in an immediately obvious
fashion? I tried a complex math routine in stack form, then rewrote it with
locals. The version using locals was much simpler to write, read and
comprehend.
--
: Pt+ ( pt1 \ pt2 -- pt3 )
We could try a few different factorings:
based on pointers,
: Array+ ( a \ b \ c -- a' \ b' \ c' )
>R @+ >R SWAP @+ >R + R> SWAP R> SWAP R> !+ ;
: Pt+ ( pt1 \ pt2 \ pt3 -- ) 2 FOR Array+ NEXT DROP DROP DROP ;
or with a matrix library,
: Pt+ ( pt1 \ pt2 -- pt3 ) 2 matrix+ ;
or rough it on the stack.
: Pt+ ( pt1 \ pt2 -- pt3 ) >R SWAP >R + R> R> + ;
Rob
> Let's consider the following problem: a point is represented on the stack
> as integers x y z. I wish to define
>
> : pt+ ( pt1 pt2 -- pt) \ add points together.
>
> Now, tell me how to design and implement this word without locals so that
> a) you don't need to do a lot of stack manipulations, and b) the
> factorization of the point data structure makes sense.
>
[ deleted: good argument for locals based on the noise caused by stack
flip, flop, swap, drop words. ]
Instead of locals, another possibility has appeared in the press. That is
a mechanism for specifying before and after stack pictures which cause
stack transformation at runtime. Typically it looks something like:
(sn abcdef --- facdbe ) where the "abcdef" is the before picture and
the "facdbe" is the after picture. The two major problems with this
technique are implementation, and notation.
Usually implementations are slow and non-reentrant because they
are implemented by a stack flush to a linear array and a copy back to
the stack. But this does not worry me, because implementations can
be fixed/improved.
The chosen notation does nothing to improve readability. I would
prefer something like:
(sn ch, ch1, adr, pt --- pt, ch1, ch, adr ) where ch, ch1 are characters,
adr is an address and pt is a three integer data element. I want a
notation that aids readability not hinders it.
just another half-thought out idea from:
--- eric
--
UUCP ...!uunet!wang!harvee!esj e...@harvee.uucp HAM ka1eec
WORK AT&T (617) 577-4068 (w) joha...@athena.polaroid.com
source of the public's fear of the unknown since 1956
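The stack-picture mechanism Eric describes can be sketched quickly. This is a rough illustration in Python (not Forth), using the naive flush-to-array-and-copy-back strategy he criticizes as slow; the function name `rearrange` is invented for illustration, not taken from any published system.

```python
# Sketch of the stack-rearrangement mechanism: the "before" picture
# names the top stack items, and the "after" picture re-emits them
# in a new order.

def rearrange(stack, before, after):
    """Permute the top len(before) items of `stack` in place.

    `before` and `after` are strings of one-letter names; the last
    character of `before` names the top of the stack.
    """
    n = len(before)
    top = stack[-n:]                   # flush the top items to a scratch array
    bindings = dict(zip(before, top))  # bind each name to a value
    del stack[-n:]
    stack.extend(bindings[name] for name in after)  # copy back in new order
    return stack

# ( abcdef --- facdbe ) with 1..6 on the stack, 6 ('f') on top:
print(rearrange([1, 2, 3, 4, 5, 6], "abcdef", "facdbe"))  # [6, 1, 3, 4, 2, 5]
```

Note how the scratch array and copy-back make the naive version non-reentrant, exactly as described; a fixed implementation would compute the permutation at compile time.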
> Concerning a question about defining something like:
> : Pt+ ( pt1 \ pt2 -- pt3 )
> or rough it on the stack.
> : Pt+ ( pt1 \ pt2 -- pt3 ) >R SWAP >R + R> R> + ;
I think the original problem poster is someone who doesn't like ruffin'
things on the stack. And I guess everyone agrees that the definition above
is not instantly obvious. At least when I read it I had my pencil ready for
some stackjuggling.
I think the REAL problem lies in the fact that an abstract datatype
notation (pt) is used in the one and only place where it shouldn't.
Pt+ seems a primitive function on pt's (at least that's what comes to mind
when confronted with a lot of DUP SWAP ROT OVER R> R@ dudes.) that certainly
'knows' about the (x,y)-pair representation of pt. I suppose that
when the operator does not operate on the datatype as a whole,
the stack notation should not present the datatype as a whole.
In my opinion the use of locals makes the thing clear to the 'bit-level'.
To put it in me own just invented L(-local notation:
: Pt+ L( pt1x \ pt1y \ pt2x \ pt2y -- pt3x pt3y )
pt1x pt2x + pt1y pt2y + ;
( Note that the local mechanism only goes as far as the --,
the pt3x and pt3y are just comment. )
The question is whether you will be writing Pt+'s all the time. If your
collection of primitives (ie operations that 'know' of the datatypes
representation) remains small, I wouldn't mind dropping the L and go for some
stackjuggling. The Pt+'s name would give me enuff of an idea of its
function. It is only when the collection of primitives explodes
( PtDUP, PtSWAP PtR> etc) that the lack of locals becomes a nuisance.
Just me fl 0.02 (I'm Dutch;)
Jan Stout,
wsbu...@urc.tue.nl
>> Concerning a question about defining something like:
>> : Pt+ ( pt1 \ pt2 -- pt3 )
>> or rough it on the stack.
>> : Pt+ ( pt1 \ pt2 -- pt3 ) >R SWAP >R + R> R> + ;
>I think the original problem poster is someone who doesn't like ruffin'
>things on the stack. And I guess everyone agrees that the definition above
>is not instantly obvious. At least when I read it I had my pencil ready for
>some stackjuggling.
I am the original poster. Indeed, my intention was to show a "medium-sized"
data structure that would require stack juggling. In fact, I was thinking
of three-dimensional points (x,y,z).
My biggest concern about stack juggling is that it is "distracting". When I
am working on a program, I am thinking about the data structures and
algorithms. I find it easy to concentrate on such details while writing in
Forth; each design concept maps into a "vocabulary" of Forth words.
However, when it comes to implementing a Forth word, I have to jump to a
lower level of detail (i.e., what's on the stack, how to move it around to
satisfy my definition, etc.). This is why I find locals a big win; they
reduce the number of "mental context switches" I need to make. Observe
that I am talking about the prototyping phase; when it comes to
optimization, I might replace locals with stack juggling or even assembly
language.
>I think the REAL problem lies in the fact that an abstract datatype
>notation (pt) is used in the one and only place where it shouldn't.\
>Pt+ seems a primitive function on pt's (at least that's what comes to mind
>when confronted with a lot of DUP SWAP ROT OVER R> R@ dudes.) that certainly
>'knows' about the (x,y)-pair representation of pt. I suppose that
>when the operator does not operate on the datatype as a whole,
>the stack notation should not present the datatype as a whole.
At first I was confused by this. I think you mean that I should have
written the spec as:
: pt+ ( pt1.x pt1.y pt2.x pt2.y -- pt3.x pt3.y)
I agree with you. It was an oversight on my part.
>In my opinion the use of locals makes the thing clear to the 'bit-level'.
>To put it in me own just invented L(-local notation:
> : Pt+ L( pt1x \ pt1y \ pt2x \ pt2y -- pt3x pt3y )
> pt1x pt2x + pt1y pt2y + ;
> ( Note that the local mechanism only goes as far as the --,
> the pt3x and pt3y are just comment. )
>The question is whether you will be writing Pt+'s all the time. If your
>collection of primitives (ie operations that 'know' of the datatypes
>representation) remains small, I wouldn't mind dropping the L and go for some
>stackjuggling. The Pt+'s name would give me enuff of an idea of it's
>function. It is only when the collection of primitives explodes
>( PtDUP, PtSWAP PtR> etc) that the lack of locals becomes a nuisance.
My favorite locals notation looks like:
: word { in1 in2 ... | l1 l2 ... --> out1 out2 ... }
where in1, in2, ... are input arguments; l1, l2, ..., are locals and
out1, out2, ... are output arguments. Thus, we have
: pt+ { x1 y1 x2 y2 --> x3 y3 }
x1 x2 + -> x3
y1 y2 + -> y3
;
All locals are QUAN-type variables. You cannot take addresses; this allows
them to be stored in registers. Note that the definition of pt+ does not
use local locals.
I can't claim that these are original; I have forgotten the author of this
design. I added notation for floating point:
: word { in1 in2 ... f: fin1 fin2 | l1 l2 .. f: fl1 fl2 -> f: out1 out2 }
all names following the f: in the "section" are floating point. This could
be extended to use any other stack types you may have.
I added these extensions to Bradley's C Forth 83; he provides a small
kernel of words that make implementing local variables quite easy.
Elizabeth Rather writes:
ER>I personally have no great fondness for locals; in my experience, if your
ER>stack handling is so messy you need locals, you should back off and reconsider
ER>some design issues.
In general, I agree with the above. However, there comes a point where one
may "over factor" to facilitate tailoring the PROBLEM to the LANGUAGE. Isn't
that the antithesis of forth? The pt+ example is a good one in this regard.
I discovered the value of locals when I took an analysis of algorithms
course and eagerly attempted to code each algorithm in forth. In many
cases, the stack contortions became too complex and distracted me from
the problem I was solving. Keeping more than three things on the stack
invites trouble and obscurity. In many cases, it just didn't make sense
to factor the algorithm. Also, factoring did not reduce the number of
arguments required by the algorithm's highest level word. I always
attempt to write words which operate on no more than three stack items.
However, the answer to the question "How many stack items should any
word have?" ultimately boils down to "As many as required to solve
the problem". IMHO, before I would consider a forth to be a general
purpose language, that forth would have to support locals.
Hadil G. Sabbagh writes:
HGS>Let's consider the following problem: a point is represented on the stack
HGS>as integers x y z. I wish to define
^ ^ ^
Note that a point as defined here consists of three (3) coordinates.
Rob Chapman writes:
RC> or rough it on the stack.
RC>
RC> : Pt+ ( pt1 \ pt2 -- pt3 ) >R SWAP >R + R> R> + ;
This assumes a point consisting of two (2) coordinates, it does NOT
solve the stated problem.
Eric S Johansson writes:
ESJ>Instead of locals, another possibility has appeared in the press. That is
ESJ>a mechanism for specifying before and after stack pictures which cause
ESJ>stack transformation at runtime. typically it looks something like:
ESJ>(sn abcdef --- facdbe ) where the "abcdef" is the before picture and
ESJ>the "facdbe" is the after picture. The two major problems with this
ESJ>technique are implementation, and notation.
When I read about this in FD, it seemed like a workable scheme, but
interpreting strings to effect stack transformation does not strike
me as an elegant (read forthlike) solution to the problem.
Jan Stout writes.
JS>when confronted with a lot of DUP SWAP ROT OVER R> R@ dudes.) that certainly
JS>'knows' about the (x,y)-pair representation of pt. I suppose that
Again, note how the problem has changed to points with two coordinates!
Finally Hadil G. Sabbagh writes:
HGS>where in1, in2, ... are input arguments; l1, l2, ..., are locals and
HGS>out1, out2, ... are output arguments. Thus, we have
HGS>
HGS>: pt+ { x1 y1 x2 y2 --> x3 y3 }
HGS> x1 x2 + -> x3
HGS> y1 y2 + -> y3
HGS>;
Now the original poster is convinced that he is solving a different problem!
-- Greg
--
=============================================================================
Greg Schmidt -> schm...@iccgcc.decnet.ab.com
=============================================================================
"People with nothing to hide have nothing to fear from O.B.I.T"
-- Peter Lomax
-----------------------------------------------------------------------------
Disclaimer: No warranty is expressed or implied. Void where prohibited.
=============================================================================
A header without a code field can be an alias name.
In Open Boot, we have a special bit in the header that says this
is an alias. Headers with that bit set have an extra pointer field
that points to the code field of the word to which they refer.
This alias mechanism is handled at the very lowest level, right in the
guts of FIND . If you try to tick or FIND an alias, you get the
cfa of the word to which it refers. Given that cfa, if you try to
"go backwards" to the header with >NAME or something like that, you
get the nfa of the word's "original" header, not the "alias" header.
This mechanism, in conjunction with a "transient" dictionary mechanism,
is very convenient for creating words whose names may be discarded
after compilation.
Mitch
I don't see any improvement in readability...
What specifies 'ch' and 'ch1' are characters?
What states that 'pt' is a 3-cell data unit? (other than the above text?)
(You're not proposing that the text 'ch' ch1' 'adr' & 'pt' are known
to the compiler, are you?)
IMHO, the existing LOCALs syntaxes are far superior than the above.
For example, in JForth, multi-celled local items are notated thusly...
: FOO { a b 5 c -- }
the 5 before the 'c' indicates that 'c' is 5 cells in size...of
course, provisions exist to acquire the address of the first cell.
Yes, this approach means that locals may not be given names of
solely numeric content.
>HGS>Let's consider the following problem: a point is represented on the stack
>HGS>as integers x y z. I wish to define
> ^ ^ ^
>Note that a point as defined here consists of three (3) coordinates.
>Again, note how the problem has changed to points with two coordinates!
> ...
>Finally Hadil G. Sabbagh writes:
>HGS>where in1, in2, ... are input arguments; l1, l2, ..., are locals and
>HGS>out1, out2, ... are output arguments. Thus, we have
>HGS>
>HGS>: pt+ { x1 y1 x2 y2 --> x3 y3 }
>HGS> x1 x2 + -> x3
>HGS> y1 y2 + -> y3
>HGS>;
>Now the original poster is convinced that he is solving a different problem!
Just going with the flow, Greg. Here is my original problem (and solution,
using locals).
: pt+ { x1 y1 z1 x2 y2 z2 --> x3 y3 z3 }
x1 x2 + -> x3
y1 y2 + -> y3
z1 z2 + -> z3
;
See my previous posting on the locals syntax.
Hadil
>: pt+ { x1 y1 z1 x2 y2 z2 --> x3 y3 z3 }
> x1 x2 + -> x3
> y1 y2 + -> y3
> z1 z2 + -> z3
>;
>See my previous posting on the locals syntax.
>Hadil
I wrote some weeks ago a library for vector arithmetic (not only addition
and subtraction, but dot and cross product and 3d>2d projection). I solved
it with an array of 6 floats (called temp). This is a dirty solution, but
it works well:
CAPS OFF \ case insensitive: writes faster, reads better
Create temp 6 floats allot
DOES> swap floats + ;
: temp! 5 FOR i temp f! NEXT ;
: v+ ( pt1 pt2 -- pt3 )
temp! 3 0 DO i temp f@ i 3 + temp f@ f+ LOOP ;
( and so on)
If I had an FP unit, I would use its registers as temporaries and win lots
of time. The temp array is useful for swap, dup, dot and cross products,
for all the things you do with 3 dimensional vectors. The order of the
vector on stack (x,y,z) or (z,y,x) is only important for the cross product
(left hand or right hand). I like the ( x y z -- ) format, because you type
it in this order. I suppose you do the same, but then tell me one thing:
How does your Forth get the locals in the right order? Every locals scheme I
know turns the order around, so you have C conventions for passing parameters, i.e.:
: GCT ( a b -- gct ) \ has locals
{ b a } ... ;
Bernd Paysan
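Bernd's question about ordering comes down to how the names are bound. A rough Python sketch (names invented for illustration) of the naive one-pop-per-name binding shows why declaring the names in reversed order makes everything come out right:

```python
# Each (LOCAL)-style step pops the top of the stack into the next
# declared name, so the FIRST declared name gets the TOP item.

def bind_locals(stack, names):
    """Pop one stack item per name; the first name gets top-of-stack."""
    return {name: stack.pop() for name in names}

stack = [10, 20]                        # ( a b -- ) with b = 20 on top
env = bind_locals(stack, ["b", "a"])    # declare in reversed order...
print(env)                              # {'b': 20, 'a': 10} -- order comes out right
```

This is exactly the "C convention" Bernd complains about: either the declaration order is reversed relative to the stack comment, or the implementation must buffer all the names first and bind them in a second pass.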
Bernd> I suppose you do the same, but then tell me one thing:
Bernd> How does your Forth get the locals in the right order?
With the syntax: LOCALS| a b c |
The following definition would do:
: LOCALS| \ Compiling: ( "l1 l2 ... ln <|>" -- )
\ Runtime: ( x1 x2 ... xn -- )
0 ( End marker )
BEGIN BL PARSE-WORD ( a # )
OVER C@ [CHAR] | =
UNTIL ( 0 a1 #1 a2 #2 ... an #n )
BEGIN DUP WHILE (LOCAL) REPEAT ( 0 )
0 (LOCAL) ; IMMEDIATE
--
---------------------------------------------------------------------
Lennart Staflin : There's more to life than books, you know
l...@ida.liu.se : but not much more --- The Smiths
Interesting point. Maybe what we are discussing here is really about
the most "forth-like" way to handle stack manipulations when a data
element may consist of more than one stack cell (i.e. a point in 3-space
or a RGB triplet pixel)
Thanks. Looks ingenious. But I have two problems: First the block or the TIB may
change its location between parsing (think of multiprogramming systems).
Second, I can't create headers given a string (I can, but it is difficult).
I found a solution: I stack the contents of >IN, and everything works well.
I have four words for creating locals: <LOCAL starts the definition,
LOCAL: <name> creates a local, LOCAL> ends the definition and LOCAL; ends the
scope of locals. All these words are immediate, so they can work without
adding new definitions. So the solution is:
: { POSTPONE <LOCAL -1 ( end marker, >IN will never be -1)
BEGIN >IN @ BL WORD 1+ C@ [CHAR] | = UNTIL DROP >IN @ >R
BEGIN DUP 0< 0= WHILE >IN ! POSTPONE LOCAL: REPEAT DROP
POSTPONE LOCAL> ; IMMEDIATE
: } POSTPONE LOCAL; ; IMMEDIATE
: TEST { A B | A . B . } ;
and
1 2 TEST
gives
1 2 ok
Hurray!
Bernd Paysan
The discussion about PT+ was intended to raise the issue of the need
for local variables -- but it also brings to light the problem of
handling data structures that take multiple cells.
It can be a severe disadvantage to put bulky data structures of
varying size directly on the parameter stack. Not only can it be hard
to write the programs required to manipulate the data, but potentially
it requires new sets of stack manipulation words. (If vectors occupy 3
cells and integers one cell the user would presumably want words like
VDUP, VSWAP and mixed operations like VISWAP, etc. to allow Forth
flexibility in implementing algorithms which involve the new data
objects. This would become apparent as soon as anyone wants to use a
word like PT+ to actually do something).
In some of the systems I've used for mathematics, there are 6 or more
data types -- each occupying a different number of cells. It has been
found best to have every data object represented by a single cell on
the stack (this representative is usually either an address,
displacement within a segment set aside for the data type, or index in
an array of objects). In this way data objects can be subjected to all
the usual Forth stack operations with no difficulty.
The data objects are equipped with words for accessing their
component parts. Thus for vectors we would want a word [] so
that "v 2 []" gives either the address of the second component of v or
the second component itself (If you choose the latter, you probably
will also want a word []! to store data.) Chuck Moore seems to have
originated the word "TH" for the component selector -- so "v 2
th". There will also be, presumably, words for arithmetic operations
on the coordinates.
A vector package can be constructed in layers -- so that it need not
be just restricted to integer coordinates. Information about the
coordinate domain is communicated to the level that handles vector
operations -- so the same code for vector operations can be used for
a variety of different types of coordinates (here is another reason
to make sure that different types of coordinates do not have different
stack sizes).
The simplest way to do this (a better way is described below) is to
have all operations specify the target of their result. Thus, for
example, a vector addition V+ would have the stack diagram
( v1 v2 v3 -- ) or maybe even ( v1 v2 v3 -- v3 )
where the vi are "vectors" (i.e. the addresses of 3-tuples or the
indices in an array of 3-tuples) and v3 is intended to be the result
of adding v1 to v2. The code for V+ will be written in terms of an
addition operation for components. [If preferred, you can write code
for words like this with free use of ordinary Forth variables -- no
recursive calls are involved, so there is no need for local
variables. Forth variable names can be reused -- and they have a
scope: they last until they are redefined. There is no harm in
using the same variable names for independent procedures.]
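The single-cell-representative scheme with target-form operations can be sketched in Python (not Forth); the names `vectors`, `new_vector`, `th`, and `v_add` are invented for illustration, not John's actual words:

```python
# Each vector lives in a storage table, and only its index (a single
# "cell") travels on the stack, so ordinary SWAP/DUP work unchanged.

vectors = []                      # storage pool for 3-tuples

def new_vector(x, y, z):
    """Allot a vector; return its single-cell handle (an index)."""
    vectors.append([x, y, z])
    return len(vectors) - 1

def th(handle, i):
    """Component selector, like "v 2 th": fetch component i of v."""
    return vectors[handle][i]

def v_add(h1, h2, h3):
    """V+ in target form ( v1 v2 v3 -- ): store v1+v2 into v3."""
    vectors[h3] = [a + b for a, b in zip(vectors[h1], vectors[h2])]

v1 = new_vector(1, 2, 3)
v2 = new_vector(10, 20, 30)
v3 = new_vector(0, 0, 0)          # explicitly named result target
v_add(v1, v2, v3)
print([th(v3, i) for i in range(3)])   # [11, 22, 33]
```

The explicitly named `v3` is the unForth-like part; the temporary-pool scheme below exists precisely to make that result slot appear automatically.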
This approach actually works, but it has the disadvantage that the
syntax for operations is unForth-like. We would like V+ to have the
stack diagram ( v1 v2 -- v3 ) where v3 is not named -- but just
somehow appears. This can be accomplished by creating a set of
temporary storage locations for each data type (I've been using 16
locations, and it seems to work just fine). Each time an operation is
performed, the following happens:
(1) The storage pool is searched for the next available
address
(2) The operation is performed with the result stored at
this address.
(3) The address is put on the stack as the result of the
operation -- and it is marked as "unavailable" in
the pool.
Once all addresses are marked, and no available address is found when
requested, a mini-garbage collection takes place: the stack is
examined to see which of the 16 addresses are still being referenced
in the stack -- all others are unmarked.
This reclamation is very fast -- because only 16 addresses are
involved and they are "used" only if found on the stack -- so no big
search through chains of pointers and variables is required. It is
easily coded in assembly language. [The only caveat about this
approach is that any data of lasting interest which is produced in one
of these temporary locations should be moved to a permanent home. If a
temporary address itself is stored in a variable, it will be
considered available at the next garbage collection.]
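The three-step allocation with stack-scan reclamation can be sketched in Python (slot addresses are modeled as pool indices; all names are invented, and the real version was published in Forth-83 with assembly-coded reclamation):

```python
# A fixed pool of 16 temporary result slots with a mini-GC that
# scans the stack for live references, as described above.

POOL_SIZE = 16
pool = [None] * POOL_SIZE          # result storage
in_use = [False] * POOL_SIZE

def collect(stack):
    """Mini-GC: a slot stays marked only if the stack references it."""
    live = set(stack)
    for i in range(POOL_SIZE):
        in_use[i] = i in live

def alloc(stack):
    """Find the next available slot, collecting once if none is free."""
    for _ in range(2):
        for i in range(POOL_SIZE):
            if not in_use[i]:      # step (1): next available address
                in_use[i] = True   # step (3): mark it unavailable
                return i
        collect(stack)             # none free: reclaim, then retry
    raise RuntimeError("more than 16 live temporaries")

def v_plus(stack):
    """( v1 v2 -- v3 ): the result lands in a fresh temporary."""
    h2, h1 = stack.pop(), stack.pop()
    h3 = alloc(stack)
    pool[h3] = [a + b for a, b in zip(pool[h1], pool[h2])]  # step (2)
    stack.append(h3)               # result handle just "appears"

stack = []
for vec in ([1, 2, 3], [10, 20, 30]):
    h = alloc(stack)
    pool[h] = vec
    stack.append(h)
v_plus(stack)
print(pool[stack[-1]])             # [11, 22, 33]
```

Note the caveat from the text shows up here too: a handle stored anywhere other than the stack is invisible to `collect` and will be reclaimed.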
I've used this approach for strings, several basic coefficient type
objects (like BIGNUM integers and BIGRAT rational numbers), compound
objects (like polynomials with BIGNUM coefficients), etc. It works
great as long as everything is represented on the stack using a single
cell. It allows you to manipulate "gizmos" and "gadgets" without
constantly being aware of what they look like, how big they are, and
features of their internal representation. SWAP swaps a gizmo with a
gadget, an integer with a string, etc.
[The code for this storage scheme for temporaries was written in
Forth-83 using traditional Forth techniques. It takes about 4 screens.
It was published in Journal of Forth Application and Research -- which
seems to be defunct. I'm willing to mail people an electronic copy of
the article if someone can assure me that this would not violate the
JFAR copyright. (I've never published in a journal that went out of
business before, so I don't know the legalities involved.)]
[I also have fast versions for F83 and F-PC with selected words coded
in assembly language.]
John J Wavrik
jjwa...@ucsd.edu Dept of Math C-012
Univ of Calif - San Diego
La Jolla, CA 92093
I like that approach. I assume that you are not talking about
having a typed stack though?
+Each time an operation is performed, the following happens:
+ (1) The storage pool is searched for the next available
+ address
+ (2) The operation is performed with the result stored at
+ this address.
+ (3) The address is put on the stack as the result of the
+ operation -- and it is marked as "unavailable" in
+ the pool.
+Once all addresses are marked, and no available address is found when
+requested, a mini-garbage collection takes place: the stack is
+examined to see which of the 16 addresses are still being referenced
+in the stack -- all others are unmarked.
I take it then that there is only one area of 16 items, regardless of the
"type" of those items... or are you simplifying for the sake of
discussion? And what happens if all 16 are in use? (I assume that can
happen either because some other stack value "looks like" a reference
to one of those cells, or because too many temporaries are
(accidentally?) being used.)
+I've used this approach for strings, several basic coefficient type
+objects (like BIGNUM integers and BIGRAT rational numbers), compound
+objects (like polynomials with BIGNUM coefficients), etc. It works
+great as long as everything is represented on the stack using a single
+cell. It allows you to manipulate "gizmos" and "gadgets" without
+constantly being aware of what they look like, how big they are, and
+features of their internal representation. SWAP swaps a gizmo with a
+gadget, an integer with a string, etc.
Indeed. But again, the programmer still has to know which cell on the
stack is of what type (I'm not saying that that is a bad thing...)
BTW: Upper Deck Forth uses a very similar scheme for handling strings.
But instead of having explicit slots, usage flags, etc., their strategy is
to use an area of memory as a ring buffer. Just keep a pointer to the
next "free" address and a count of remaining bytes. If the new string
object won't fit in the hole left at the end of the buffer, then wrap to
the beginning. For a one K buffer that strategy guarantees at least 4
maximally sized strings simultaneously, and often a lot more. But the
idea is nearly the same. The Upper Deck has the advantage that there is
no GC done whatsoever and that it is totally up to the programmer to make
sure that too many live strings are not "in use" at once.
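Never having seen Upper Deck Forth's source, the following Python sketch is only my guess at the ring-buffer mechanics as described; the buffer size, the counted-string layout, and the wrap rule are assumptions.

```python
BUF_SIZE = 1024

class StringRing:
    """Ring buffer for counted strings; no liveness checking at all."""
    def __init__(self):
        self.buf = bytearray(BUF_SIZE)
        self.next = 0                       # next free offset

    def place(self, s):
        """Copy bytes `s` in as a counted string; return its offset."""
        need = len(s) + 1                   # count byte plus characters
        if self.next + need > BUF_SIZE:     # won't fit in the tail hole:
            self.next = 0                   # wrap to the beginning
        addr = self.next
        self.buf[addr] = len(s)
        self.buf[addr + 1:addr + need] = s
        self.next = addr + need
        return addr

    def fetch(self, addr):
        """Read back the counted string at `addr`."""
        n = self.buf[addr]
        return bytes(self.buf[addr + 1:addr + 1 + n])
```

With 255-byte strings each object takes 256 bytes, so a 1K buffer holds exactly four before the fifth wraps and silently overwrites the first; keeping under that limit is the programmer's problem, as described.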
Could you give a more complete reference to the JFAR article?
Concerning a data management scheme for temporaries,
Doug Philips writes:
> I like that approach. I assume that you are not talking about
> having a typed stack though?
The data stack is not typed, although it could be if all the addresses
for a given type occupy one band of the address space. I generally
like the Forth approach of producing differently named operations for
each data type -- rather than overloading an operator name and having
run time overhead based on typing of operands.
>+Each time an operation is performed, the following happens:
>+ (1) The storage pool is searched for the next available
>+ address
>+ (2) The operation is performed with the result stored at
>+ this address.
>+ (3) The address is put on the stack as the result of the
>+ operation -- and it is marked as "unavailable" in
>+ the pool.
>+Once all addresses are marked, and no available address is found when
>+requested, a mini-garbage collection takes place: the stack is
>+examined to see which of the 16 addresses are still being referenced
>+in the stack -- all others are unmarked.
>
> I take it then that there is only one area of 16 items, regardless
> of the "type" of those items... or are you simplifying for the sake
> of discussion? And what happens if all 16 are in use? (I assume
> that that can happen either because some other stack value "looks
> like" a reference to one of those cells, or because too many
> temporaries are (accidentally?) being used).
Each data type has its own set of 16 temporaries. The code that
manages these is the same for all. A child of the defining word TEMP-
STORAGE installs itself as the current data type. If STRINGS is one
such data type, then STRINGS TEMP returns the address of the next
temporary location for STRINGS. (The word STRINGS sets the current
data type, and TEMP returns the next free address for the current data
type.)
In general all this is made invisible at the top level. The sequence
$" ABC" $" XYZ" OVER $+ $+
will put the (temporary) address of the string ABCXYZABC on top of
the stack. Here $" is state-smart -- but its interpret-time action
is
: $" STRINGS TEMP ASCII " WORD OVER $! ;
where
: $! OVER C@ 1+ CMOVE ;
while $+ concatenates two strings, putting the result in a temporary
and returning the address of the temporary to the stack.
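The claimed result is easy to check with a toy Python model of the interpret-time behaviour of $" and $+ (slot indices stand in for addresses; no reclamation is modelled, and all names are mine):

```python
slots, stack = {}, []            # slot index -> string; stack of indices

def temp():                      # next free temporary (no GC in this toy)
    i = len(slots)
    slots[i] = ""
    return i

def s_quote(text):               # models the interpret action of  $" text"
    i = temp()
    slots[i] = text
    stack.append(i)

def s_plus():                    # models  $+  ( addr1 addr2 -- addr3 )
    b, a = stack.pop(), stack.pop()
    i = temp()
    slots[i] = slots[a] + slots[b]
    stack.append(i)

def over():                      # ( x y -- x y x )
    stack.append(stack[-2])

s_quote("ABC"); s_quote("XYZ"); over(); s_plus(); s_plus()
# the top of the stack now addresses the string ABCXYZABC
```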
If all 16 temporaries are in use, there is an ABORT. Except for
programming errors, this has not occurred in practice. It is easy
to make up examples where 16 locations could be insufficient (if you
want to add 20 things, you could put them all on the stack and then
apply addition 19 times -- but you can also add as you go). It should
be noted that HP calculators have only 4 locations for temporaries --
so 16 is probably more than enough.
>+I've used this approach for strings, several basic coefficient type
>+objects (like BIGNUM integers and BIGRAT rational numbers), compound
>+objects (like polynomials with BIGNUM coefficients), etc. It works
>+great as long as everything is represented on the stack using a single
>+cell. It allows you to manipulate "gizmos" and "gadgets" without
>+constantly being aware of what they look like, how big they are, and
>+features of their internal representation. SWAP swaps a gizmo with a
>+gadget, an integer with a string, etc.
>
> Indeed. But again, the programmer still has to know which cell on
> the stack is of what type (I'm not saying that that is a bad
> thing...)
Yes -- the programmer still has to know what is on the stack and its
type -- but not its size. No additional stack manipulation words need
to be created. The storage management is concealed in the operation
and input words, and so becomes transparent to programs that use
these operations.
Example: for vectors there is an addition V+ and a scalar
multiplication cV* (applied with the scalar on the left). Then
    a b V W -- aV + bW
can be produced by
    ROT SWAP cV* -ROT cV* V+
It doesn't matter which dimension the vector space is, because the
components are not being stored on the stack. As a result, most of the
code in a vector space package is independent of the size of vectors.
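The stack choreography above is easy to verify mechanically; here is a throwaway Python check, with lists as vectors and numbers as scalars (nothing here is from the Forth package):

```python
stack = [2, 3, [1, 0, 0], [0, 1, 0]]      # a b V W  with a=2, b=3

def rot():  stack[-3:] = [stack[-2], stack[-1], stack[-3]]  # ( x y z -- y z x )
def swap(): stack[-2:] = [stack[-1], stack[-2]]
def nrot(): stack[-3:] = [stack[-1], stack[-3], stack[-2]]  # -ROT

def cv_star():                  # cV*  ( c V -- cV ), scalar on the left
    V = stack.pop()
    c = stack.pop()
    stack.append([c * x for x in V])

def v_plus():                   # V+  ( V W -- V+W )
    W = stack.pop()
    V = stack.pop()
    stack.append([v + w for v, w in zip(V, W)])

rot(); swap(); cv_star(); nrot(); cv_star(); v_plus()
# stack is now [[2, 3, 0]], i.e. aV + bW
```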
> BTW: Upper Deck Forth uses a very similar scheme for handling
> strings. But instead of having explicit slots, usage flags, etc.,
> their strategy is to use an area of memory as a ring buffer. Just
> keep a pointer to the next "free" address and a count of remaining
> bytes. If the new string object won't fit in the hole left at the
> end of the buffer, then wrap to the beginning. For a one K buffer
> that strategy guarantees at least 4 maximally sized strings
> simultaneously, and often a lot more. But the idea is nearly the
> same. The Upper Deck has the advantage that there is no GC done
> whatsoever and that it is totally up to the programmer to make sure
> that too many live strings are not "in use" at once.
In this ring buffer scheme, I assume that the addresses and lengths of
strings still in use are tagged somehow, so that if the pointer
circles the ring it will not allow the re-use of memory being used by
an active string. Never having seen the Upper Deck system, I don't
know how it determines if a string is still in active use.
"Garbage Collection" is actually a bit excessive a term for what is
done. In my method it amounts to searching the stack (and perhaps
somewhere else) to find which strings are no longer active -- it is
nowhere near what LISP must do to reclaim storage.
> Could you give a more complete reference to the JFAR article?
"Handling Multiple Data Types in Forth"
by John J Wavrik
JFAR vol 6 number 1 (1990)
I agree. However, having a typed stack doesn't necessarily mean that you
have to use it to dispatch operators (though that is a pretty obvious
thing to do with it). Even the "address range" typing scheme you
mentioned fails because it assumes that everything on the stack is a
pointer.
+In this ring buffer scheme, I assume that the addresses and lengths of
+strings still in use are tagged somehow, so that if the pointer
+circles the ring it will not allow the re-use of memory being used by
+an active string. Never having seen the Upper Deck system, I don't
+know how it determines if a string is still in active use.
There is no check. The premise is that the programmer knows how many
maximum-length (counted) strings can be temporary at once, and so must
not use any more than that. I cannot say that I am philosophically or
practically bothered by their choice, but I probably would have done
things differently.
My guess is that the biggest gotcha in either method is in keeping (or
thinking that you can keep) temporary objects "live" over calls to
non-trivial words.
+"Garbage Collection" is actually a bit excessive a term for what is done.
Indeed. Your method doesn't have the overwrite hole that the Upper Deck
System's method does, but neither method will catch code that copies a
temporary's pointer when it should have copied the data.
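That gotcha fits in a few lines of Python (a contrived illustration of the hazard, not either system's code):

```python
slots = {0: "ABC"}             # one temporary slot holding a string
stack = [0, 0]                 # DUP copied the address, not the data

slots[0] = "XYZ"               # the slot is later reused for a new result
# Both stack entries now see "XYZ"; the "saved" copy of ABC is gone,
# and neither the pool scan nor the ring buffer can notice.
```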
-Doug
P.S. I elided all the stuff that I agreed with (mostly).
-----
> Currently we are thinking of sending simple forth commands to
> each node but we think it is too slow for real use. Have there
> been any discussions about this subject?
Something like this is being done by the PROCIC consortium (PROcess Computer
for computationally Intensive Control). They are using a Network Token
Language which can be compiled from Forth or other HLLs. It was presented at
Rochester '92; for more details you could contact either of the following:
Klaus Schleisiek-Kern, DELTA t GmbH, Uhlenhorsterweg 3,
D-2000 Hamburg 76, Germany Tel: +49-40-2296441 Fax: +49-40-2297205
Stephen F. Pelc, MicroProcessor Engineering Ltd., 133 Hill Lane,
Southampton, GB - SO1 5AF, Great Britain Tel: +44-703-631441
Fax: +44-703-339691
Be advised that this _is_ a commercial (for-profit) venture.
Brad Rodriguez
B.RODRIGUEZ2 on GEnie | brad...@maccs.dcss.mcmaster.ca
So many projects....so little time.