No, SMUDGE (and its associated "smudge-bit") serves no other useful
purpose. No, there's no reason why you couldn't replace it with
something more efficient. It's not part of any Forth standard, and only
exists in a few, mostly very old, Forth systems.
The *functionality*, however, is required by all Forth standards. You
*must* be able to define a word in terms of an older word with the same
name in Forth. But this is more easily accomplished by simply not
linking the current definition into the dictionary until it is complete.
SMUDGE is a kludge, and not a particularly clever one, IMO. Its
presence does not bode well for the brand of Forth in question (whatever
that may be). If it weren't for the rather mediocre, though popular-
in-its-time, Fig-Forth model, SMUDGE would probably have died a well-
deserved death many years ago.
--
Chris Waters | I think, there- | "Never look a gift horse in the mouth"
xt...@netcom.COM| fore I thwim. | --Priam, King of Troy
Being able to see previous versions of a word is important when RECURSING or
when updating a word by incorporating it into a new definition having the
same name. Of course, in order to recurse one wants to see the current definition;
while updating, one wants to see only the previous definition.
Ok, so could you answer the second part of the question which is _why_
is it necessary to be able to define a word in this way i.e. what
practical benefit does it have?
Well, the reason that the standard requires it is because of existing
practice. Why is it existing practice? Well, consider the alternative.
A previous definition *cannot* be found. But what if you *want* to
access the previous definition? You're out of luck. IOW, the practical
benefit is the availability of the feature.
Recursion is available in either case, so that's not an issue.
Another consideration is that an incomplete definition should *not* be
executed. For this reason, it is best not to make the word visible
until the definition is successfully compiled. If it is not visible, an
older version will be. This, in fact, is probably the real reason
behind the existing practice. The fact that it results in a more
flexible and powerful system is a fringe benefit. :-)
Could you not alias the old definition (i.e. ": old-foo foo ;" or a
true alias) and use that to access the old definition?
Another consideration is that an incomplete definition should *not* be
executed. For this reason, it is best not to make the word visible
until the definition is successfully compiled.
This is something more tangible. However, is there no use that could
be made of executing a partially defined word? I can't think of one
off hand, but then I can't think of why I'd want to access an old
definition without renaming either :-)
: In article <xtifrC9...@netcom.com> xt...@netcom.com (Chris Waters) writes:
: Well, the reason that the standard requires it is because of existing
: practice. Why is it existing practice? Well, consider the alternative.
: A previous definition *cannot* be found. But what if you *want* to
: access the previous definition? You're out of luck.
:
: Could you not alias the old definition (i.e. ": old-foo foo ;" or a
: true alias) and use that to access the old definition?
:
Good point. That's the way examples in dpANS are written, since the Standard
says it's implementation-defined whether incomplete words are FINDable or not.
And recursions are to be programmed using RECURSE, easy to remember 8-)
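For instance, with standard RECURSE the recursion is spelled out explicitly
inside the definition (GCD here is purely an illustration):

    : GCD  ( a b -- gcd )  ?DUP IF TUCK MOD RECURSE THEN ;
    48 36 GCD .   \ prints 12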
:
: Another consideration is that an incomplete definition should *not* be
: executed. For this reason, it is best not to make the word visible
: until the definition is successfully compiled.
:
: This is something more tangible. However, is there no use that could
: be made of executing a partially defined word? I can't think of one
: off hand, but then I can't think of why I'd want to access an old
: definition without renaming either :-)
Agreed.
I would never try to execute a partially defined word. It might hang, loop,
run wild, produce garbage, destroy important data on disk, crash the system...
If anybody thinks a partial definition might be useful, why not simply
cut the semantics of that word and properly end it? Let's factor out!
Bye, Heribert (da...@ifk20.mach.uni-karlsruhe.de)
: RECURSE SMUDGE ; IMMEDIATE
(this of course requires you to say the name of the word being recursed to
after saying SMUDGE; if I had known the dictionary structure better I would
probably have done it somewhat differently.)
I think Uniforth is based directly on F83, so I presume that means Laxen-
Parry F83 had SMUDGE.
I agree SMUDGE is a KLUDGE.
--jvn
>My Uniforth for a Definicon board has SMUDGE. It did not have RECURSE.
>So I defined RECURSE as
> : RECURSE SMUDGE ; IMMEDIATE
I would have called this word "RECURSIVE", since it merely marks a
definition as allowing recursion, rather than actually doing the
recursion. In dpANS, "RECURSE" recurses. Thus:
: HANG RECURSIVE HANG ;
is exactly the same as:
: HANG RECURSE ;
Of course, only RECURSE is found in dpANS, since having both is somewhat
redundant. :-)
>(this of course requires you to say the name of the word being recursed to
>after saying SMUDGE; if I had known the dictionary structure better I would
>probably have done it somewhat differently.)
Also, since SMUDGE is a toggle (which makes it even *more* confusing,
IMO), you would have to execute SMUDGE one more time after you complete
the definition, to make the word visible again, since the SMUDGE in
semicolon, which is supposed to reveal the word, will hide it instead.
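As a sketch (assuming fig-Forth's toggling colon and semicolon, and using
FACT purely as an illustration), a definition made with that Uniforth-style
RECURSE would then look like:

    : FACT  ( n -- n! )
       RECURSE  DUP 2 < IF DROP 1 ELSE DUP 1- FACT * THEN ;
    SMUDGE   \ the SMUDGE in ; hid FACT again; this one re-reveals it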
>I think Uniforth is based directly on F83, so I presume that means Laxen-
>Parry F83 had SMUDGE.
Actually, while I don't have it on-line at the moment, I'm fairly sure
that l&P used HIDE and REVEAL, rather than SMUDGE. Not only is it not a
confusing toggle, but the names actually make sense, unlike "SMUDGE"! :-)
>I agree SMUDGE is a KLUDGE.
There's also a minor bug in SMUDGE (which is also a bug in *some*
implementations of IMMEDIATE). If you change vocabularies, you'll end
up SMUDGing the wrong word.
I believe that L&P used a variable, LAST, which pointed to the most
recent definition, irrespective of vocabulary. HIDE would actually
*remove* the word from the current vocabulary, so that nothing in the
system *except* LAST was pointing to it. Note that with this scheme,
there is no smudge bit in the header. F-PC (another system derived from
L&P) works this way. With this scheme, IMMEDIATE will work correctly
because it uses LAST to find the word to make immediate.
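A minimal sketch of that scheme (a single-thread simplification, assuming
LAST holds the newest header's link-field address and that the vocabulary
pointer and the link fields all hold link-field addresses):

    VARIABLE LAST
    : REVEAL  ( -- )  LAST @    CURRENT @ ! ;  \ search now starts at the new word
    : HIDE    ( -- )  LAST @ @  CURRENT @ ! ;  \ search now skips the new word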
And, of course, you can define:
: RECURSIVE REVEAL ; IMMEDIATE
Which doesn't require any tweaking afterwards, or (this is more system
dependent):
: RECURSE LAST @ NAME> , ; IMMEDIATE
OC, since the definition of RECURSE is system dependent, it should be
part of the system, so it's a good thing that it's in the standard! :-)
>I think Uniforth is based directly on F83, so I presume that means Laxen-
>Parry F83 had SMUDGE.
>I agree SMUDGE is a KLUDGE.
Not so quick! SMUDGE is perhaps a less "elegant" solution for controlling the
visibility of a word than what a normal Forth system would use... but it does work *and*
it is a very simple way to control the visibility of local definitions.
For example, one could (with a suitable segmented system) define other definitions
within the space of a colon word while it is still being defined. At this time
it is SMUDGEd out. At the end of the word one can then follow the chain of words back,
toggling the SMUDGE bit on each word, until the colon word is toggled - thus hiding all
locally defined words and making visible the colon word in one fell swoop!
This is the simplest way to do this (as opposed to some kind of linked-list tree, etc).
You may say that SMUDGE is an inelegant hack. I look at it as a simple and unstructured
(read "uncluttered") minimalist solution.
PS. Personally, for *normal* Forth systems, I prefer the more modern solution - it is
faster to run and has less control structures (structure? no flag ;) .
A good implementation of Forth treats the smudge bit as the most
significant bit of the length of the name; that way a smudged word
won't be found as its name seems to be too long. There's no cost in
run time.
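A minimal sketch of that scheme, assuming a fig-Forth-style LATEST that
leaves the name-field (count byte) address of the newest header; the bit
value and names are illustrative:

    HEX
    : SMUDGE  ( -- )  LATEST DUP C@ 80 XOR SWAP C! ;  \ flip the count byte's top bit
    DECIMAL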
SMUDGE is a pretty simple solution to the problem which costs very
little.
Andrew.
>In article <1993Jun29.1...@mdd.comm.mot.com> bick...@mdd.comm.mot.com writes:
>>In <C9CFn...@murdoch.acc.Virginia.EDU> j...@fermi.clas.Virginia.EDU (Julian V.
>> Noble) writes:
>>
>>You may say that SMUDGE is an inelegant hack. I look at it as a simple and
>> unstructured
>>(read "uncluttered") minimalist solution.
>>
>>
>>PS. Personally, for *normal* Forth systems, I prefer the more modern solution -
>> it is faster to run and has less control structures (structure? no flag ;) .
>A good implementation of Forth treats the smudge bit as the most
>significant bit of the length of the name; that way a smudged word
>won't be found as its name seems to be too long. There's no cost in
>run time.
"..as its name seems to be too long..." Seems to whom? This all
requires logic and time to process while compiling. While, normally,
no words are SMUDGEd out in the dic ('cepting the last partial def.), it
is costly to repeatedly check for such while FINDing a word. Having
a word removed from the search chain is more efficient.
I note that if one wanted local def. functionality within the dic., then
using a linked-list solution can get expensive in time and complex in
implementation (compared to just using a SMUDGE bit).
.roger.
>In article <1993Jun29.1...@mdd.comm.mot.com> bick...@mdd.comm.mot.com writes:
>>You may say that SMUDGE is an inelegant hack. I look at it as a simple and
>> unstructured
>>(read "uncluttered") minimalist solution.
Seems like not linking the word into the dictionary is an even simpler
and less cluttered solution. :-)
I admit that bicknell's example of an exotic use for smudge (a *highly*
system dependent use, at that, which depends heavily on how the
dictionary is threaded) was interesting and thought provoking. I don't
necessarily think that it's the best approach to the problem, however.
Especially since it depends just as much on the dictionary threading
structure as any alternative would. Nevertheless, I haven't analyzed it
in detail, so I'm not going to condemn it out of hand. But I think that
it's a rare case in any case.
>>PS. Personally, for *normal* Forth systems, I prefer the more modern solution -
>> it is faster to run and has less control structures (structure? no flag ;) .
And is more readable to boot! :-)
>A good implementation of Forth treats the smudge bit as the most
Hrm, I'm not sure that I'd concede that a "good implementation" of Forth
would even *have* a smudge bit!! :-)
>significant bit of the length of the name; that way a smudged word
>won't be found as its name seems to be too long. There's no cost in
>run time.
*Ahem* It still has to determine that the length is out of range (a
test that it shouldn't even have to make in the first place!), and then
traverse to the next link and follow the link to the next header. That
may not be a *high* run-time cost, but it is most certainly a run-time
cost! And, on an 8-bit embedded machine, it may even be an
unacceptably high cost, small as it is.
>SMUDGE is a pretty simple solution to the problem which costs very
>little.
True, but hardly as simple as unlinking/relinking (the links have to be
built anyway, so the linking code is already in the system), and *far*
more confusing and obscure. SMUDGE offers no discernible benefits, and
has some obvious, though minor, disadvantages. Technical comparisons
aside (production code is unlikely to have any unfinished/smudged words
in the dictionary), SMUDGE is badly named and unpredictable (it either
hides or reveals a word--if you're lucky, it'll do the one you want).
Forth already has enough potential to be hard-to-read and confusing.
SMUDGE just makes this worse. HIDE and REVEAL, on the other hand, are
readable and predictable, and don't depend on exotica such as using
"unused bits in a byte".
>In <741392...@aph.demon.co.uk> a...@aph.demon.co.uk (Andrew Haley) writes:
>>A good implementation of Forth treats the smudge bit as the most
>
>Hrm, I'm not sure that I'd concede that a "good implementation" of Forth
>would even *have* a smudge bit!! :-)
>
>>significant bit of the length of the name; that way a smudged word
>>won't be found as its name seems to be too long. There's no cost in
>>run time.
>
>*Ahem* It still has to determine that the length is out of range (a
>test that it shouldn't even have to make in the first place!), and then
>traverse to the next link and follow the link to the next header.
^^^^^^^^^^^^^^^^^^^^^^^^^
Traversal of the name is only done on really bad Forth systems; in a
good system the links are chained together so that once a match on
the length has failed you can skip directly to the next word.
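As a sketch, the inner loop might look like this (the header layout, a
one-cell link field followed by a count byte and the name, and the word
names are assumptions, not any particular system's; COMPARE is the ANS
String word):

    : (FIND)  ( c-addr u lfa -- nfa | 0 )
       BEGIN  DUP  WHILE                       ( c-addr u lfa )
          2DUP CELL+ C@ = IF                   \ count byte match? (a set smudge
                                               \ bit simply makes this fail)
             >R 2DUP R@ CELL+ CHAR+ OVER COMPARE 0= IF
                2DROP R> CELL+ EXIT            \ found: return the name field
             THEN
             R>
          THEN
          @                                    \ skip directly to the next header
       REPEAT
       NIP NIP ;                               \ end of chain: not found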
>That may not be a *high* run-time cost, but it is most certainly a run-time
>cost! And, on an 8-bit embedded machine, it may even be an
>unacceptably high cost, small as it is.
Oh, I see, you're just talking about the match attempt on the word
currently being defined. OK, not zero, just vanishingly small.
>>SMUDGE is a pretty simple solution to the problem which costs very
>>little.
>
>True, but hardly as simple as unlinking/relinking (the links have to be
>built anyway, so the linking code is already in the system), and *far*
>more confusing and obscure.
Taking the linking code out of CREATE requires an extra word (not too
bad) and an extra USER variable in _every_ terminal task to point to
the word currently being defined (pretty bad). Also, to prevent a
word being linked in twice (with catastrophic results) it's necessary
to check that it's not already linked. Also, the code to unlink a
word (which isn't needed at present) is pretty complex.
It's an increase in complexity for no apparent (to the user)
improvement in usage.
>SMUDGE offers no discernable benefits, and
>has some obvious, though minor, disadvantages. Technical comparisons
>aside (production code is unlikely to have any unfinished/smudged words
>in the dictionary), SMUDGE is badly named and unpredictable (it either
>hides or reveals a word--if you're lucky, it'll do the one you want).
True. That can be a real problem in some circumstances.
>Forth already has enough potential to be hard-to-read and confusing.
>SMUDGE just makes this worse. HIDE and REVEAL, on the other hand, are
>readable and predictable, and don't depend on exotica such as using
>"unused bits in a byte".
Oh, I don't mind HIDE and REVEAL as alternative names to SMUDGE; I've
had to do that myself whilst metacompiling. I just don't see what
the problem is with using a bit in the length byte to do the job.
Any "improvements" to Forth must either offer significant
improvements in performance or be simpler to implement. I don't
think this suggestion offers either.
Andrew.
: In article <xtifrC9...@netcom.com> xt...@netcom.com writes:
:
: >In <741392...@aph.demon.co.uk> a...@aph.demon.co.uk (Andrew Haley) writes:
:
: >>A good implementation of Forth treats the smudge bit as the most
[...]
: >>significant bit of the length of the name; that way a smudged word
: >>won't be found as its name seems to be too long. There's no cost in
: >>run time.
: >
: >*Ahem* It still has to determine that the length is out of range (a
: >test that it shouldn't even have to make in the first place!), and then
: >traverse to the next link and follow the link to the next header.
: ^^^^^^^^^^^^^^^^^^^^^^^^^
: Traversal of the name is only done on really bad Forth systems; in a
: good system the links are chained together so that once a match on
: the length has failed you can skip directly to the next word.
:
: >That may not be a *high* run-time cost, but it is most certainly a run-time
: >cost! And, on an 8-bit embedded machine, it may even be an
: >unacceptably high cost, small as it is.
:
Checking the length first can be faster as it is some sort of hashing, even
on an 8-bit machine.
On a 16-bit machine, e.g. the PDP-11, it is possible to check the length byte and
the name's first character at the same time with only one word instruction.
Then a SMUDGE test, done by reserving a high bit in the length byte,
comes at no extra run-time cost. See PDP-11 FIG Forth.
The results may be even more dramatic on a 32-bit or 64-bit machine!
Remaining characters, if any, are only compared if necessary, of course.
[...]
: Oh, I don't mind HIDE and REVEAL as alternative names to SMUDGE; I've
: had to do that myself whilst metacompiling. I just don't see what
: the problem is with using a bit in the length byte to do the job.
:
Agreed.
: Any "improvements" to Forth must either offer significant
: improvements in performance or be simpler to implement. I don't
: think this suggestion offers either.
:
Or be more portable, but this may not be your criterion.
And if portability is the issue, the complexity problem is up to the implementor,
not the normal user, as portable programs can't make assumptions about the
dictionary structure without becoming environmentally dependent.
Bye, Heribert (da...@ifk20.mach.uni-karlsruhe.de)
How about
: bar foo ;
: foo ... bar ... ;
This is slow and does not look very nice, but then what you are doing
is not very nice either.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
Yes, but there are cases where RECURSIVE works, but RECURSE does not.
Just yesterday I wrote code that looks like
: foo ... ['] foo ... ;
Now I have to figure out how to run this. Using a deferred word is
easy, but ugly.
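A sketch of that workaround (assuming the system provides DEFER and IS;
the countdown example is only an illustration of a word that needs its own
execution token):

    DEFER countdown
    : (countdown)  ( n -- )
       DUP .  DUP IF  1- ['] countdown EXECUTE  ELSE  DROP  THEN ;
    ' (countdown) IS countdown

Here ['] countdown picks up the deferred word's token, which in turn
dispatches back to (countdown), so the word effectively references itself.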
While we are at it, IMO RECURSIVE is also better from the viewpoint of
readability: When I write a recursive word (e.g. sort), I don't think
"Now it should recurse". I think "Now sort this part". And a reader
should think so, too.
Standardizing RECURSIVE, not RECURSE would also eliminate the
headaches people have with using RECURSE in DOES>-code and in
anonymous words (of course, these are the cases where RECURSE can
express things that RECURSIVE cannot).
And it isn't an exact replacement either: it allows continued access to the
original word (foo) via the intermediate word (bar).
+but then what you are doing is not very nice either.
Oh? What about adding more control structure "security" than the
underlying system provides? Or adding more debugging information than the
underlying system provides? It is very elegant to be able to redefine a
word in terms of its old definition, and by doing it the way Forth does,
you can be sure that the old definition will not be accessed when new
words are added to the system.
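For example (a sketch, nobody's production code): a debugging version of !
that warns about unaligned stores, compiled in terms of the old ! precisely
because the new header isn't visible yet:

    : !  ( x a-addr -- )
       DUP ALIGNED OVER <> IF  CR ." unaligned store at " DUP U.  THEN
       ! ;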
-Doug
But the point is that you are still making a test that you _KNOW_, a
priori, will _ALWAYS_ fail. Why bother making such a test at all? I
think the "don't link the word into the dictionary until its definition
is complete" is a much more elegant solution.
-Doug
>In article <xtifrC9...@netcom.com> xt...@netcom.com writes:
>>In <741392...@aph.demon.co.uk> a...@aph.demon.co.uk (Andrew Haley) writes:
>>*Ahem* It still has to determine that the length is out of range (a
>>test that it shouldn't even have to make in the first place!), and then
>>traverse to the next link and follow the link to the next header.
> ^^^^^^^^^^^^^^^^^^^^^^^^^
>Traversal of the name is only done on really bad Forth systems; in a
So you're saying that the name field doubles as the link? :-)
You have to do *something* to get to the link field, even if it's merely
a "1 CELLS -".
>>That may not be a *high* run-time cost, but it is most certainly a run-time
>>cost! And, on an 8-bit embedded machine, it may even be an
>>unacceptably high cost, small as it is.
>>>SMUDGE is a pretty simple solution to the problem which costs very
>>>little.
>>
>>True, but hardly as simple as unlinking/relinking (the links have to be
>>built anyway, so the linking code is already in the system), and *far*
>>more confusing and obscure.
>Taking the linking code out of CREATE requires an extra word (not too
>bad)
To replace SMUDGE, you mean? I'd hardly call that *extra*. :-)
>and an extra USER variable in _every_ terminal task to point to
>the word currently being defined (pretty bad).
Maybe. Most people seem to think it's worth it.
>Also, to prevent a
>word being linked in twice (with catastrophic results) it's necessary
>to check that it's not already linked.
It is? That's funny, I've never seen that done. "LAST @ CURRENT @ !"
will not link twice no matter how many times you call it.
> Also, the code to unlink a
>word (which isn't needed at present) is pretty complex.
Sure didn't seem complex in any of the myriad systems I've seen it in!
"LAST @ @ CURRENT @ !" is pretty typical.
>It's an increase in complexity for no apparent (to the user)
>improvement in usage.
Then why is it found in every Forth system I've seen in the last decade?
It's no more complex than trying to juggle individual bits (which
someone has arbitrarily decided are "unused", a decision I'm *not* sure
I like!) As for improvement in usage, I would argue that that is the
whole *point* of the scheme!
>Oh, I don't mind HIDE and REVEAL as alternative names to SMUDGE; I've
>had to do that myself whilst metacompiling. I just don't see what
>the problem is with using a bit in the length byte to do the job.
Munging bits is always awkward at best, and can be a downright PITA on
some machines. Furthermore, it mangles the length byte! Why have the
length byte not equal the length? That only adds to the confusion and
the difficulty in understanding Forth.
>Any "improvements" to Forth must either offer significant
>improvements in performance or be simpler to implement. I don't
>think this suggestion offers either.
Well, it's hard to argue with someone who has already made his mind up.
But I think it's a trifle arrogant to talk about an "improvement" to
Forth when discussing something that is more common than the alternative
that you're advocating. I haven't seen a Forth system that uses SMUDGE
in nearly a decade.
I might turn the question around, and say, what possible advantages can
accrue from this strange SMUDGE mechanism you're proposing? The way
that Forth works now, with HIDE and REVEAL linking and unlinking the
word, works fine, and doesn't rely on odd concepts like "unused bits"
and bit toggles and strange hacks to (FIND).
>In <741520...@aph.demon.co.uk> a...@aph.demon.co.uk (Andrew Haley) writes:
>
>>Also, to prevent a
>>word being linked in twice (with catastrophic results) it's necessary
>>to check that it's not already linked.
>
>It is? That's funny, I've never seen that done. "LAST @ CURRENT @ !"
>will not link twice no matter how many times you call it.
That doesn't work too well if you want to hide a word that's not the
top of the dictionary. Maybe that's not too important.
>> Also, the code to unlink a
>>word (which isn't needed at present) is pretty complex.
>
>Sure didn't seem complex in any of the myriad systems I've seen it in!
>"LAST @ @ CURRENT @ !" is pretty typical.
Um. Okay, I guess that you're using a rather simpler dictionary
structure than I'm used to where this thing is somewhat easier.
>>It's an increase in complexity for no apparent (to the user)
>>improvement in usage.
>
>Then why is it found in every Forth system I've seen in the last decade?
I've no idea. That's what I'm trying to discover!
>>Any "improvements" to Forth must either offer significant
>>improvements in performance or be simpler to implement. I don't
>>think this suggestion offers either.
>
>Well, it's hard to argue with someone who has already made his mind up.
I haven't. Really. In fact, I'm going to try it to see if it makes
the system any faster or better. The whole point of my line of
argument is to try to get the proponents of this scheme fully to
describe its advantages. I'm not opposed to changing things; when it
comes to dictionary structure, there are some great ways to improve
performance: the best that I know of is hashing the dictionary,
rather than using a linked list.
>But I think it's a trifle arrogant to talk about an "improvement" to
>Forth when discussing something that is more common than the alternative
>that you're advocating. I haven't seen a Forth system that uses SMUDGE
>in nearly a decade.
Ah. That's something I really didn't know. Almost all of my Forth
experience is with polyFORTH, and I assumed that the deletion of the
smudge bit was something that was being proposed rather than
something that was common usage. Mea culpa.
How is the code organized? In the systems I know, CREATE creates a
word and links it in. CREATE is called by : . Presumably, you need
two versions of CREATE, one which does and one which doesn't link in
the word. Or do you have a flag so that CREATE knows what kind of
thing it's creating?
If you have a vectored CREATE, do you need to vector both versions?
Andrew.
Pete.
But the point is that you are still making a test that you _KNOW_, a
priori, will _ALWAYS_ fail. Why bother making such a test at all? I
think the "don't link the word into the dictionary until its definition
is complete" is a much more elegant solution.
Well after asking what I thought was a simple question I'll have to
admit to being confused by most of the replies :-(. So at the risk of
showing even more naivete: Assuming a linear linked dictionary, why
not link the word being searched for to the _end_ of the dictionary, so
that it will always be found? The purpose of this is to avoid the "is
end of dictionary" test after following the link from the previous
header. Once the word is found, it is only necessary to check if the
address is the same as the one at the end of the dictionary to determine
if it is an attempted recursive definition or if a previous definition
has been found. Using this technique, there isn't much point to a
SMUDGE bit, the required functionality is tacked on after the address
comparison. Comments?
>In article <xtifrC9...@netcom.com> xt...@netcom.com writes:
>>In <741520...@aph.demon.co.uk> a...@aph.demon.co.uk (Andrew Haley) writes:
>>
>>>Also, to prevent a
>>>word being linked in twice (with catastrophic results) it's necessary
>>>to check that it's not already linked.
>>
>>It is? That's funny, I've never seen that done. "LAST @ CURRENT @ !"
>>will not link twice no matter how many times you call it.
>That doesn't work too well if you want to hide a word that's not the
>top of the dictionary. Maybe that's not too important.
No more than it is with "CURRENT @ @ DUP C@ $20 XOR SWAP C!" (aka
SMUDGE). :-)
>>> Also, the code to unlink a
>>>word (which isn't needed at present) is pretty complex.
>>
>>Sure didn't seem complex in any of the myriad systems I've seen it in!
>>"LAST @ @ CURRENT @ !" is pretty typical.
>Um. Okay, I guess that you're using a rather simpler dictionary
>structure than I'm used to where this thing is somewhat easier.
It's a little more complex if you hash the vocabulary into multiple
threads, but not greatly. OC, the complexity of SMUDGE is also
increased when you hash the dictionary. It's a matter of finding the
thread that the latest word has been added to. And here, the user
variable LAST gives you a big advantage, since you can just use the
variable to find the thread, instead of comparing the tops of all the
threads to see which one has a higher address (a test that may not be
valid if your threads go through different areas of memory, i.e. you
haven't built your dictionary in a strictly linear fashion, which is a
fairly common thing to do when creating temporary definitions above the
normal dictionary).
For comparison, here are the definitions of HIDE and REVEAL from F-PC:
: HIDE ( -- )
LAST @ DUP N>LINK Y@ SWAP CURRENT @ YHASH ! ;
I think that the "Y@" has to do with the segmented architecture, but
I'm not sure. Note how the presence of the LAST variable makes it easy
to do the hashing to find the appropriate thread. N>LINK is an alias
for 2-, btw.
: REVEAL ( -- )
LAST @ DUP N>LINK SWAP CURRENT @ YHASH ! ;
Now, if we use the same linking scheme, but assume no LAST variable, and
try to implement SMUDGE, we end up with:
: LATEST ( -- alf )
0 CURRENT @ DUP #THREADS CELLS +
SWAP DO I @ UMAX 1 CELLS +LOOP ;
and
: SMUDGE ( -- )
LATEST L>NAME DUP C@ $20 XOR SWAP C! ;
Now, which scheme seems more complex? Note that we can't use the YHASH
function, since that requires that we have the name to hash, which we
don't unless we have a variable like LAST pointing to the most recent
definition. Thus we have to sort through all the threads by hand with a
loop. Granted, all the overhead of LATEST will only be stored in the
dictionary once, while the user variable LAST has one occurrence per
terminal task. And I admit that it hadn't occurred to me that this might
be a big deal (I rarely have more than three or four tasks). But there
are obvious speed and simplicity gains from the variable.
Note that this will also break down completely if I try one of the
temporary definition schemes that basically involve pointing HERE to
another place in memory (usually up high somewhere). Without LAST to
point to the most recent definition, I *cannot* temporarily move the
dictionary pointer up into high memory somewhere without breaking this
definition of LATEST.
So there's the argument for the LAST variable. And once you have that,
the differences between using SMUDGE or HIDE/REVEAL are extremely
minimal (as you can see above). And HIDE/REVEAL don't (as I have
mentioned before) rely on obscure and dubious features like unused (sez
who?) bits in the length byte.
>>>It's an increase in complexity for no apparent (to the user)
>>>improvement in usage.
>>
>>Then why is it found in every Forth system I've seen in the last decade?
>I've no idea. That's what I'm trying to discover!
Well, I'm not the originator of the scheme, so I'm having to do a little
guesswork and analysis to see just what's going on here. And I've never
used a Forth system that had hashed dictionary threads that *didn't* use
the LAST variable. I'm interested in how PolyFORTH defines SMUDGE, and
how it goes about finding the latest definition from the various threads
if it doesn't use LAST.
>I haven't. Really. In fact, I'm going to try it to see if it makes
>the system any faster or better. The whole point of my line of
>argument is to try to get the proponents of this scheme fully to
>describe its advantages.
Hmm, well, I can offer the data above. I have to admit that it never
even occurred to me that there might be systems around that used hashing
but that *didn't* use HIDE and REVEAL. Maybe there's some tricks that
haven't occurred to me. But I'm not so much a proponent of HIDE and
REVEAL as I am a happy user of systems that have them.
BTW, doesn't PolyFORTH use a combination of hashing and threading, much
like the scheme outlined above? I've only ever seen one Forth system
that used hashing *instead* of threading, and that was an experimental
system by Michael McNeil which he gave up on as too rigid and limited
(although it sure did compile fast!) :-)
>>But I think it's a trifle arrogant to talk about an "improvement" to
>>Forth when discussing something that is more common than the alternative
>>that you're advocating. I haven't seen a Forth system that uses SMUDGE
>>in nearly a decade.
>Ah. That's something I really didn't know. Almost all of my Forth
>experience is with polyFORTH, and I assumed that the deletion of the
>smudge bit was something that was being proposed rather than
>something that was common usage. Mea culpa.
Ah, well I didn't know that PolyFORTH still used the SMUDGE bit. As I
say, I didn't think that SMUDGE was compatible with hashing, and I knew
that PolyFORTH used some hashing. So mea culpa to me too. I admit that
I have a lot of respect for Forth Inc., so I'm curious how they
implemented SMUDGE with hashing, and without LAST. (I also thought that
LAST was a PolyFORTH concept, which just shows ta go how much I've kept
up with PolyFORTH, I guess.) :-)
>How is the code organized? In the systems I know, CREATE creates a
>word and links it in. CREATE is called by : . Presumably, you need
>two versions of CREATE, one which does and one which doesn't link in
>the word. Or do you have a flag so that CREATE knows what kind of
>thing it's creating?
Nothing so complex. CREATE creates a word, points LAST to it, then
calls REVEAL. : just calls HIDE to unlink it again.
The LMI UR/FORTH implementations use a totally hashed dictionary.
This is a solid commercial system that has been shipping since 1986
and is available for DOS, OS/2, 80386 32-bit protected mode (using
the Phar Lap DOS Extender), and Windows 3.1. The hashed symbol
table approach actually simplifies many things and dramatically
improves compilation speed.
There is a "header segment" which is just all the symbols and
pointers to the corresponding code and data fields along with
some flags, but it is never searched directly. FIND reaches the
header segment through a hash table, then does a string compare.
Hash code collisions are handled by chaining the headers with
the same hash code together. Even with several thousand words
in the dictionary, FIND does on the average <2 string comparisons.
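As a sketch, such a lookup might read like this (HASH, HASH-TABLE,
NAME-MATCH? and >NEXT-HASH are illustrative helpers, not LMI's actual
words):

    : PROBE  ( c-addr u -- header | 0 )
       2DUP HASH CELLS HASH-TABLE + @        \ head of the collision chain
       BEGIN  DUP  WHILE                     ( c-addr u header )
          >R 2DUP R@ NAME-MATCH?             \ string compare against this header
          IF 2DROP R> EXIT THEN
          R> >NEXT-HASH                      \ next header with the same hash code
       REPEAT
       NIP NIP ;                             \ chain exhausted: not found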
There are some interesting implications of this approach. For
example FORGET is easy - nothing to unlink and relink, just chop
the header segment back to the desired header, then scan the
header segment from the start and rebuild the hash table. Since
FORGETs are very rare, the overhead involved here is irrelevant.
Similarly, to make one word or some contiguous words headerless,
you just chop them out of the header segment, collapse the
header segment by moving later definitions up over the excised
ones as necessary, then rebuild the hash table as for FORGET.
Vocabularies are implemented by appending a vocabulary number byte
to the symbol string before it is hashed (thus, up to 255 vocabularies
are possible in the system). As I said before, hash code collisions
are chained together. As it happens, the most recent definition
in such a chain will be at the front of the chain, so redefinitions
of a name in the same vocabulary will have the expected behavior
(the most recent will be used). Words with the same name in different
vocabularies will never have the same hash code so will always be
findable according to the search order.
Incidentally, per someone's comment that he hadn't seen SMUDGE
used in a decade... I never did like the fig-Forth SMUDGE because
it was toggling a bit -- when you executed SMUDGE you weren't bringing
a header to a known state, you were bringing it to the opposite of
its previous state. So we changed LMI systems a long time ago to
use SMUDGE (which is like HIDE) and UNSMUDGE (which is like REVEAL).
>In article <xtifrC9...@netcom.com> xt...@netcom.com (Chris Waters) writes:
>>... I've only ever seen one Forth system
>>that used hashing *instead* of threading, and that was an experimental
>>system by Michael McNeil which he gave up on as too rigid and limited
>>(although it sure did compile fast!) :-)
>The LMI UR/FORTH implementations use a totally hashed dictionary.
[...]
>There is a "header segment" which is just all the symbols and
>pointers to the corresponding code and data fields along with
>some flags, but it is never searched directly. FIND reaches the
>header segment through a hash table, then does a string compare.
>Hash code collisions are handled by chaining the headers with
>the same hash code together. Even with several thousand words
>in the dictionary, FIND does on the average <2 string comparisons.
Yup, sounds exactly like McNeil's system (which was described in one of
the early FORML conference proceedings). Interesting. I didn't know
that anyone was using it. The rest of the details mentioned (ease of
coding FORGET, how to handle vocabularies, etc.) are all exactly the
same as McNeil described in his paper.
Live and learn. I'm glad to hear that this scheme is successful. I
guess that with newer machines, with more memory, the overhead of the
hash tables isn't as much of a problem as it was back when McNeil first
tried it (back when 64k was all the CPUs could address). Memory was the
reason that he abandoned the scheme.
>Incidentally, per someone's comment that he hadn't seen SMUDGE
>used in a decade...
That was I.
>I never did like the fig-Forth SMUDGE because
>it was toggling a bit -- when you executed SMUDGE you weren't bringing
>a header to a known state, you were bringing it to the opposite of
>its previous state.
Exactly! :-)
>So we changed LMI systems a long time ago to
>use SMUDGE (which is like HIDE) and UNSMUDGE (which is like REVEAL).
That's better, though I still dislike the concept of fiddling with bits.
I realize that standard(s) define 31 characters as the maximum header
length, but I prefer to be able to define much longer names in my system
if I feel like it. But then, I suppose it might be harder to actually
unlink and relink in a fully hashed system. Oh well.
I'm a little dubious about the names as well (something more descriptive
seems like it would be preferable), but again, oh well. :-)
cheers, and thanks for the info, Ray.
I am curious to know what your hashing function is. Is the hash table
fixed size or does it grow/shrink?
+Vocabularies are implemented by appending a vocabulary number byte
+to the symbol string before it is hashed (thus, up to 255 vocabularies
+are possible in the system).
Are you saying then that if there are n vocabularies in the search order
that up to n separate probes of the hash table must be done before
deciding that maybe the word you are trying to find is a number?
-Doug
That's news to me. I don't remember ever seeing McNeil's paper.
But it's possible that Rick Wilton (who did much of the original
implementation of UR/FORTH on DOS) had read it. Or perhaps it's
just convergent evolution!
>>So we changed LMI systems a long time ago to
>>use SMUDGE (which is like HIDE) and UNSMUDGE (which is like REVEAL).
>
>That's better, though I still dislike the concept of fiddling with bits.
>I realize that standard(s) define 31 characters as the maximum header
>length, but I prefer to be able to define much longer names in my system
>if I feel like it. But then, I suppose it might be harder to actually
>unlink and relink in a fully hashed system. Oh well.
But if you have SMUDGE and UNSMUDGE (or HIDE and REVEAL), the
implementation is really irrelevant. There's no reason why the
bit (if it is implemented in a bit flag) has to be embedded in
the symbol string. In UR/FORTH, each header has a separate
"flags byte" which includes the "smudged/unsmudged" bit along
with several others. The full extended character set can
be used in definition names, and names can be as long as 255
characters. (although such names wouldn't be Standard)
>I'm a little dubious about the names as well (something more
>descriptive seems like it would be preferable)...
Yeah, in retrospect, I agree. At the time, however, we felt
that we were presenting our users with enough new concepts to
cope with in UR/FORTH that we didn't want to invent new names
for functionality they were already relatively familiar with.
Live & learn, I guess.
Fixed size, although we make it large enough that there will
be relatively few collisions in a typical application (the hash
table's in its own segment, so memory is not a constraint).
The hash function itself is somewhat ad-hoc -- we tried a
number of different hash functions, using large applications
of our own as testbeds, and picked one that was efficient yet
scattered the name space fairly evenly across the hash codes.
>+Vocabularies are implemented by appending a vocabulary number byte
>+to the symbol string before it is hashed (thus, up to 255 vocabularies
>+are possible in the system).
>
>Are you saying then that if there are n vocabularies in the search order
>that up to n separate probes of the hash table must be done before
>deciding that maybe the word you are trying to find is a number?
Yes. However, we are trying to optimize the common case, and
as far as we have seen search orders deeper than 2 or 3 vocabularies
are quite uncommon. FIND also collapses the search order
if possible before doing anything else, e.g. if CONTEXT and
CURRENT are both the same vocabulary, it will only probe the
hash table once.
In any event, the hashed symbol table approach is always
going to be better than the traditional linked dictionary.
You are indirectly raising the spectre of the pathological
case, an application that contains (say) 255 vocabularies,
and then somehow during compilation sets up a search order
that contains most or all of these vocabularies. (Never
mind the difficulties of documenting and maintaining such
an application!) Well, such an application would undoubtedly
contain many thousands of definitions -- at least, I can't
visualize an application where it would make sense to have
the memory and maintenance overhead of a vocabulary only to
put one or two words in the vocabulary. Then,
probing the hash table 255 times would still be preferable
to performing a linear search down thousands of words
partitioned among 255 vocabulary threads before trying
to convert the string to a number.
>Yes. However, we are trying to optimize the common case, and
>as far as we have seen search orders deeper than 2 or 3 vocabularies
>are quite uncommon. FIND also collapses the search order
>if possible before doing anything else, e.g. if CONTEXT and
>CURRENT are both the same vocabulary, it will only probe the
>hash table once.
I thought that searching CURRENT was removed from both the 83-standard
and dpANS! In fact, I'm not even sure that it was in the 79-standard.
(Although collapsing any duplicate vocabularies pointed to by CONTEXT is
certainly a good idea in any case.)
>In any event, the hashed symbol table approach is always
>going to be better than the traditional linked dictionary.
Yes, substantially, if you're talking about compile speed. It does,
however, require giving up some flexibility. Probably a worthwhile
tradeoff in 97% of the cases however.
I presume that there are also regular threads maintained in the
dictionary? (Used to rebuild the hash-table when needed, as well as to
find the most recent definition in a given thread.)
Definitely sounds like a groovy scheme. If you ever decide to bring
that OS/2 version up-to-date, let me know. (My interest in a 16-bit
OS/2 version is lower than my interest in JCL, and almost as low as my
interest in a Windows version. A 32-bit OS/2 version, OTOH, has at
least one guaranteed sale right off the bat.) :-)
I am using CONTEXT and CURRENT as shorthand for the "first vocabulary
in the search order" and "the compilation vocabulary." VARIABLEs
with the names CONTEXT and CURRENT actually exist in our system, but
they just contain a vocabulary number, not a pointer to a linked
list of headers as in a traditional system. The FIND in our
system is driven by a byte array of vocabulary numbers that is set
up by execution of a vocabulary name and DEFINITIONS and so on.
This byte array defines the search order and can theoretically
be of any length up to 255 bytes, although in our systems the
default search order has 3 tiers: CONTEXT, CURRENT, and lastly
FORTH.
As for Forth-83 and dpANS -- Forth-83 was singularly vague on
the subject of vocabulary implementation, and (the last time
I looked) so is dpANS. In the Forth-83 standard there is nothing
that says the use of system variables named CONTEXT and CURRENT
makes the system nonstandard. Nor is there anything that says
the compilation vocabulary CAN'T be part of the search order.
Perhaps you are confusing the infamous Ragsdale ALSO/ONLY
experimental vocabulary proposal with the Forth-83 standard.
It's true it was included in the Forth-83 Standard document,
but it is not part of the standard and is not required.
We considered, and still consider, the ALSO/ONLY scheme to
be severely brain-damaged and we have never used it, although
it is straightforward to layer an ALSO/ONLY compatibility layer
on top of any of our systems.
Thanks for the info.
+In any event, the hashed symbol table approach is always
+going to be better than the traditional linked dictionary.
+You are indirectly raising the spectre of the pathological
+case, an application that contains (say) 255 vocabularies,
+and then somehow during compilation sets up a search order
+that contains most or all of these vocabularies.
I was not trying to say that hashing wasn't better. I wanted to be
sure I understood the way that you kept the name spaces of the
vocabularies separate. Since your vocabulary byte is "appended" to the
word being looked up, I don't suppose that you compute the hash of the
word itself, then incrementally compute the hash for the vocabulary <n>
version of the word name?
-Doug
I see that I was somewhat vague about where the vocabulary number
resides. The vocabulary number for a given definition is actually
part of the header and follows directly after the symbol string.
The overall structure of a header is something like this:
symbol length
symbol string (n bytes)
vocabulary number
flags byte (smudge, precedence, etc.)
pointer to code field
pointer to parameter field ("body")
pointer to next header with same hash code, or NULL
The helper word that computes a hash code accepts an address and
length on the stack. So the main definition that builds the
hash table only has to add one to the symbol string length to
have the hash function include the vocabulary number into the
hash code.
For lookups, FIND accepts the address of a counted string.
If necessary, it copies the string to a scratch buffer. It
then loops across the byte array that defines the search order.
For each vocabulary number in the search order, it copies the
vocabulary number to the end of the string being looked up,
calls the hash function, then probes the hash table. Sounds
complicated but the amount of code involved is actually very
small.
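A sketch of that loop (NAMEBUF, #ORDER, SEARCH-ORDER and PROBE are
illustrative names, not LMI's actual code; PROBE stands for the
hash-and-walk lookup sketched earlier):

    CREATE NAMEBUF 34 CHARS ALLOT     \ count byte + name + vocabulary number
    : VFIND  ( c-addr u -- header | 0 )
       DUP NAMEBUF C!                 \ remember the length
       NAMEBUF CHAR+ SWAP MOVE        \ copy the name after the count byte
       #ORDER @ 0 ?DO
          SEARCH-ORDER I + C@         \ next vocabulary number in the search order
          NAMEBUF COUNT + C!          \ append it after the name
          NAMEBUF COUNT 1+ PROBE      \ hash name+vocabulary byte and probe
          ?DUP IF UNLOOP EXIT THEN
       LOOP
       0 ;                            \ not found in any vocabulary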
>I think Uniforth is based directly on F83, so I presume that means Laxen-
>Parry F83 had SMUDGE.
No - F83 used HIDE and REVEAL.
-- Mike.
------------------------------------------------------------------------
Mike Hore mi...@kralizec.zeta.org.au (best) _--_|\
CompuServe: 100033,3164 / \
\_.--._x
Insanity is hereditary. You get it from your kids. v
------------------------------------------------------------------------