
What are my options for dictionary hashing?


chitselb

Sep 7, 2010, 12:50:42 AM

I'm writing a Forth (for NMOS 6502, not that it matters) and using a
(mostly) Fig model for my dictionary headers. It's a direct-threaded
implementation because I think that will be faster, and I'm using a
split parameter stack except for TOS. I'm sticking to FORTH-83
because I'm comfortable with it and because I think ANS Forth is a
little overkill for a machine with max 32K RAM (Commodore PET
2001-N). I've never implemented Forth before, so this is my first rodeo,
and I'm having a lot of fun with it so far.

It occurred to me that having FIND search through a single linked list
of all 300+ words before it fails (and we try NUMBER instead) is a
little inefficient. So I implemented a simple XOR hashing function to
crunch the name field down to a 4-bit number. That way, it only has
to search (on average) 6.25% of the dictionary for a name to figure
out that it isn't in there. I also preloaded all the LFA fields with
a dummy value ($DEAD) and wrote a RETHREAD to build the LFAs and a
32-byte hashtable, and an UNTHREAD to set them back to $DEAD. The
UNTHREAD is for FORGET so I could just UNTHREAD, move the DP, then
RETHREAD instead of arduously relinking all those hashed LFAs. Very
clever, I told myself. Then I started to code FIND and -FIND .
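
A minimal C sketch of the kind of XOR hash being described (the
counted-name layout, the 5-bit length mask, and the fold-to-4-bits
step are assumptions for illustration, not the poster's actual code):

/* XOR-fold a fig-style counted name down to a 4-bit chain index. */
unsigned char hash4(const unsigned char *nfa) {
    unsigned char len = nfa[0] & 0x1F;  /* low 5 bits hold the length */
    unsigned char h = len;
    unsigned char i;
    for (i = 1; i <= len; i++)
        h ^= nfa[i];                    /* XOR in each character */
    return (h ^ (h >> 4)) & 0x0F;       /* fold to 4 bits: 16 chains */
}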

So, what happens when the CURRENT vocabulary isn't FORTH ? Suppose
it's EDITOR or ASSEMBLER or some user-created vocabulary? I looked in
"Starting Forth" and "Thinking Forth" and neither book was any help at
all. I'll spend some time wading through all my back-issues of Forth
Dimensions looking for ideas, but here's what I've come up with on my
own.

Plan A)
A1. manually link the LFAs of child vocabularies in the assembler
source, so RETHREAD (and UNTHREAD) wouldn't find them
A2. have FIND hash the word only when the VOC-LINK of the vocabulary
is 0 (meaning it's FORTH)

Plan B)
B1. make the 32-byte hashtable a part of the VOCABULARY data
structure. And by the way, what's that $A081 doing in there? I
couldn't figure out what it's used for.

Plan C)
C1. Ask the geniuses here. There has to be prior art. What are my
options?

I'm leaning toward Plan A.

Andrew Haley

Sep 7, 2010, 4:31:38 AM

chitselb <chit...@gmail.com> wrote:
>
> Plan C)
> C1. Ask the geniuses here. There has to be prior art. What are my
> options?
>
> I'm leaning toward Plan A.

XOR a 4-bit vocabulary number with the 4-bit hash value to select
the linked list to search.

Andrew.

m_l_g3

Sep 7, 2010, 11:45:41 AM

1. Instead of (or together with) hashing, you could use a length-based
approach. Application words are usually long, while system words are
mostly short (6 characters max IIRC). Application words are usually
invoked in a few application words that follow them in the source,
while system words are used through the whole source.

So you split names into two streams: long names (most likely defined
right before the name where they are used) and short names (most
likely frequently used words), and optimize their search times
separately (see the sketch after this list).

2. I can tell you what I had: nested loops over all word lists, all
their threads (linked lists), and all names; while a name is higher
than the new DP, it gets unlinked. (And yes, first of all one has to
unlink the word lists that will themselves be deleted.)

3. Each word list had a mask: M = (2^n - 1) for M+1 threads. So if you
set M to 0 (one sublist), you have the classical single-sublist word
list.

4. For hash functions, the ROL (rotate bits left) instruction is
usually better than SHL (shift bits left).

5. If the Forth is written in assembly, word lists usually start out
as a single linked list. You run Forth, run a Forth program that
re-links the words into multiple sublists, and save the dictionary.
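
A sketch of the two-stream split from point 1, in C (the 6-character
threshold and all names here are assumptions for illustration):

struct node;
struct dict {
    struct node *short_names;  /* mostly system words, searched often */
    struct node *long_names;   /* mostly application words */
};

/* Route a lookup by name length before walking the chosen list. */
struct node *pick_stream(struct dict *d, unsigned char len) {
    return (len <= 6) ? d->short_names : d->long_names;
}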

BruceMcF

Sep 7, 2010, 4:00:46 PM

How many total words are going to be possible?

There are two basic scenarios ... a few big vocabularies, or lots of
little vocabularies.

Lots of little vocabularies are already short, so hashing is not
needed. If it's a big vocabulary, you want more hashes, but sixteen
will likely be overkill for vocabularies other than FORTH.

In operation, put an AND-mask byte into the vocabulary structure, and
mask the hash index when starting the search of each vocabulary. FORTH
is $FF for the full 4-bit index, and a 32-byte table of linked list
pointers, the "big" vocabulary mask is $03, for a 2-bit index and an
eight-byte table of linked list pointers, and the "small" vocabulary
mask is $00, for an index guaranteed to be $00 and a single-cell
pointer.
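
In C, the wordlist structure BruceMcF describes might look like this
(field names and layout are hypothetical):

struct wordlist {
    unsigned char mask;      /* $FF (or $0F): 16 threads; $03: 4; $00: 1 */
    unsigned short thread[]; /* mask+1 chain heads, sized per wordlist */
};

/* Mask the hash index when starting the search of a vocabulary;
   a "small" wordlist always lands on thread[0]. */
unsigned short chain_head(const struct wordlist *wl, unsigned char h) {
    return wl->thread[h & wl->mask];
}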

At the outset, if your WORDLIST sets up a default "little" vocabulary
and your VOCABULARY creates a WORDLIST then extends that to a "big"
vocabulary, that may be all you need. And all "big" wordlist
structures are linked in their own list, to make UNTHREAD and RETHREAD
simple iterative words that only bother with the wordlists that need
it.

A second approach is to drop FORGET and stick to MARKER. If all
wordlist structures are linked together, MARKER can point to the most
recent one, and when the MARKER executes, you can step down each
wordlist in turn, resetting each linked list to point to the first
word in the list under the MARKER. That should be equally workable
with sixteen, four or one linked list per vocabulary. Then there is no
need for UNTHREAD and RETHREAD, just THREAD.

Hugh Aguilar

Sep 7, 2010, 7:00:07 PM

On Sep 6, 10:50 pm, chitselb <chits...@gmail.com> wrote:
> I'm writing a Forth (for NMOS 6502, not that it matters) and using a
> (mostly) Fig model for my dictionary headers. It's a direct-threaded
> implementation because I think that will be faster, and I'm using a
> split parameter stack except for TOS.  

Any kind of threading on a 6502 is a big mistake (arg!) --- use
subroutine threading! Make your compiler generate branch instructions
for loops and IFs and so forth, and your code will be "faster." Also,
you shouldn't put the TOS in a zero-page variable because that won't
help on the 6502 (that only works on processors with 16-bit registers
in which you can put the TOS in a register) --- use the split stack
and keep everything on the stack. The only exception to this is that
literal values can be loaded into A:Y using immediate addressing prior
to calling functions such as + and AND and so forth --- this is a
pretty good optimization.

I've written a 65c02 cross-compiler --- I know what I'm talking about.

> It occurred to me that having FIND search through a single linked list
> of all 300+ words before it fails (and we try NUMBER instead) is a
> little inefficient.

Use my symtab code available at: http://www.forth.org/novice.html

This is the best algorithm available for low-memory computers such as
your PET. It is way better than hash tables for several word-lists
with each word-list being a separate structure, because you don't have
to predict ahead of time how many words there will be and provide
memory for that many --- each word-list grows dynamically. See this
thread for a discussion of memory usage:
http://groups.google.com/group/comp.lang.forth/browse_thread/thread/c37b473ec4da66f1

> C1. Ask the geniuses here.  There has to be prior art.  What are my
> options?

My symtab is "prior art" --- it is pretty artistic code, and I first
wrote it (in IBM370 assembly language) over ten years ago. I invented
the algorithm myself and I'm pretty proud of it. It is possible that
the algorithm has been invented elsewhere too, but I wouldn't know
about that because I never do any research --- I don't really concern
myself with OPC (other people's code) too much. For the most part, I
recommend that you not concern yourself with OPC either, although
using my symtab would be an exception to this rule.

John Passaniti

Sep 8, 2010, 12:37:50 AM

On Sep 7, 7:00 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> This is the best algorithm available for low-memory computers
> such as your PET. It is way better than hash tables for several
> word-lists with each word-list being a separate structure, because
> you don't have to predict ahead of time how many words there will
> be and provide memory for that many --- each word-list grows
> dynamically.

The same is true for linear hashing, which also incrementally builds
the hash table one slot at a time. And not only would later retrieval
be close to O(1), but it would consume precious memory at a slower
rate than symtab will. So it's pretty insane to suggest using symtab
for limited-memory systems when there are other techniques that use
less memory, less CPU, and offer faster access.

> See this thread for a discussion of memory usage:

Excellent suggestion-- because apparently in the messages you refused
to read, you'll see that symtab wastes memory. The OP will certainly
benefit and will avoid symtab.

> It is possible that
> the algorithm has been invented elsewhere too, but I wouldn't know
> about that because I never do any research --- I don't really concern
> myself with OPC (other people's code) too much. For the most part, I
> recommend that you not concern yourself with OPC either, although
> using my symtab would be an exception to this rule.

This is priceless. First, you make a virtue out of willful
ignorance. Then for bonus hypocrisy points, you tell the OP to avoid
other people's code... except for the stunning wonder that is the
memory and CPU hog that is symtab.


chitselb

Sep 8, 2010, 2:55:56 AM

As far as STC vs. DTC vs. ITC, I'm sticking with the DTC approach.
The intangible here is why am I reinventing this wheel (Forth on a
PET? c'mon!) in the first place. I can't explain that any more than I
can explain why Sally Struthers likes kittens and children. The DTC
model is something I've never played with before, while the ITC model
is familiar ground. STC just seems like imposing Forth-like design
patterns on plain old 6502 assembler, also familiar ground.

Here's my chart, according to my perception (where closer to the left
side is "better").

Memory: ITC ~= DTC < STC
Speed: STC > DTC > ITC
Coded: DTC >>> ITC > STC

legend: ~= means "it's a wash", > means "more", < means "less" and
>>> means "way more"

As for symtab, what I got out of this exchange (and examining the
SYMTAB source) is that symtab is some kind of a self-balancing btree
based on the popularity of a word. It's been a while since I read
Knuth, but right now a word header is a length byte, the text of the
word name, and a link. By going DTC I did away with the CFA pointer.
Adding in a left node, a right node, a counter, and other stuff
doesn't seem like it's going to save memory.

In Forth Dimensions FD-V05N3.pdf and FD-V05N2.pdf (which I am pretty
sure I downloaded from http://forth.org ) there's a very nice pair of
articles by Evan Rosen on figForth vocabularies, which even explains
what the $A081 is (as I suspected, a dummy name field for "space", an
otherwise invalid word). Those two articles really cleared up the
conceptual model of chaining and nesting vocabularies. Still Plan A,
which is now to hash and multithread*16 the biggest vocabulary (FORTH)
and single-thread all child vocabularies.

The project in its very pre-alpha state is at
http://github.com/chitselb/pettil . I'm using the xa65 cross-assembler
and VICE emulator packages on Ubuntu to do this on my laptop.

chitselb

Sep 8, 2010, 3:19:49 AM

I just did a mini-benchmark in Blazin' Forth on an emulated C=64.
What this times is performing a worst-case (FIND) 33 times, using the
figForth linked-list approach and about 300 words.

jiffy@ 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
9 jiffy@

jiffy@ is a word which pushes the 32-bit system "jiffy" clock, based
on a 1/60th of a second interrupt. There are 33 "9"s between the two
timer reads (the maximum I could type on an 80-character input
line).

I tried it a few times, with just the FORTH vocabulary (222 jiffies)
and also after typing EDITOR DEFINITIONS (234 jiffies). So for those
two cases it took about four seconds for the cursor to come back.
Compiling blocks of source which typically includes a lot of literal
numbers is the thing I'm trying to juice up a bit here.

Rod Pemberton

Sep 8, 2010, 7:15:47 AM

"chitselb" <chit...@gmail.com> wrote in message
news:670bf137-f903-4046...@i13g2000yqd.googlegroups.com...

> It occurred to me that having FIND search through a single linked list
> of all 300+ words before it fails (and we try NUMBER instead) is a
> little inefficient. So I implemented a simple XOR hashing function to
> crunch the name field down to a 4-bit number. That way, it only has
> to search (on average) 6.25% of the dictionary for a name to figure
> out that it isn't in there.

XOR hash results are really, really poor when using ASCII. There are too
many similarly set bits.

You may want to add some randomness to your hash. Create a 128 byte table
of random values. Use the ASCII char as an index into the table. XOR the
returned ASCII values. This will produce much better results.

> ... down to a 4-bit number ...

It's been almost two decades since I coded anything in 6502... Wouldn't an
8-bit byte do just as well? It would also reduce the hash collisions esp.
if you add some randomness to the hash.


Rod Pemberton


Rod Pemberton

Sep 8, 2010, 7:16:19 AM

"chitselb" <chit...@gmail.com> wrote in message
news:6480f166-3431-4623...@z28g2000yqh.googlegroups.com...

> As far as STC vs. DTC vs. ITC, I'm sticking with the DTC approach.
> The intangible here is why am I reinventing this wheel (Forth on a
> PET? c'mon!) in the first place. I can't explain that anymore than I
> can explain why Sally Struthers likes kittens and children. The DTC
> model is something I've never played with before, while the ITC model
> is familiar ground. STC just seems like imposing Forth-like design
> patterns on plain old 6502 assembler, also familiar ground.
>

Brad Rodriguez has a nice series which covers DTC, ITC, etc., with diagrams,
for his 8-bit Forths for the Z80, 8051, and 6809. Unfortunately, he didn't produce
a 6502 Forth.
http://www.bradrodriguez.com/papers/

> Here's my chart, according to my perception (where closer to the left
> side is "better").
>
> Memory: ITC ~= DTC < STC
> Speed: STC > DTC > ITC
> Coded: DTC >>> ITC > STC
>
> legend: ~= means "it's a wash", > means "more", < means "less" and
> >>> means "way more"
>

DTC uses one address lookup.
ITC uses two address lookups, i.e., indirect addressing.
STC is jsr/rts assembly, instead of address lists.

> By going DTC I did away with the CFA pointer.

Uh... CFA field?...

While the CFA points to the DOCOL routine for most words, it
doesn't for all. By eliminating the CFA, you won't be able to distinguish
between compiled words and low-level routines (aka "primitives") or words
that use a non-DOCOL CFA: dovar, docon, douse, dodoes, etc. Eliminating the
CFA for compiled words is not an issue, but for the other words it is -
since they won't be calling DOCOL. You'd need a way to determine when you'd
reached one of these words instead of a compiled word. I.e., the nesting of
addresses in compiled words is much like traversing a binary tree and a
non-DOCOL CFA is used to tell you when you've reached a node without any
branches...

> In Forth Dimensions FD-V05N3.pdf and FD-V05N2.pdf (which I am pretty
> sure I downloaded from http://forth.org ) There's a very nice pair of
> articles by Evan Rosen on figForth vocabularies, which even explains
> what the $A081 is (as I suspected, a dummy name field for "space", an
> otherwise invalid word). Those two articles really cleared up the
> conceptual model of chaining and nesting vocabularies.

Thanks, I may take a look.


Rod Pemberton


John Passaniti

Sep 8, 2010, 10:06:52 AM

On Sep 8, 2:55 am, chitselb <chits...@gmail.com> wrote:
> As for symtab, what I got out of this exchange (and examining
> the SYMTAB source) is that symtab is some kind of a
> self-balancing btree based on the popularity of a word.  It's
> been a while since I read Knuth, but right now a word header
> is a length byte, the text of the word name, and a link.  By
> going DTC I did away with the CFA pointer. Adding in a left
> node, a right node, a counter, and other stuff doesn't seem
> like it's going to save memory.

Correct. And beyond that is the code for symtab itself, which would
consume space (and significant CPU time). symtab may have some
application, but ironically, for symbol tables, it's pretty poor
(relative to competitive options)-- even on a fast machine with gobs
of memory.

You've stated that your "plan A" is to hash the word to select a list
in the FORTH vocabulary and then search that list. That (or something
like it) is probably best, provided you find a good hash that equally
balances the number of words in each list. Fortunately, since you'll
already know the words in the FORTH vocabulary, you'll be able to test
and tune your hash to be specific for those words.

The second thing you can do to speed things up is to make searching
each list faster. I don't know the structure of your dictionary
header, but often you'll see a byte used to store the immediate flag
and the length. If you restrict the length to something reasonable
(say 5 bits), that means you have two bits left over. Perform a
secondary hash and pick a couple bits from it to store there. Then,
when searching the list you not only can skip words that don't match
the length, but that also don't match the hashed value. Will this
save much time? I don't know-- but modeling it shouldn't be
difficult.
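
For instance, a sketch of that packed byte in C (the exact bit layout
here is an assumption, and names must fit in 5 bits of length):

#define F_IMMED   0x80  /* immediate flag */
#define H2_MASK   0x60  /* two spare bits: secondary hash */
#define LEN_MASK  0x1F  /* name length, up to 31 */

/* Reject most non-matching headers with a single byte compare,
   before touching the name text at all (the immediate bit is
   masked out of the comparison). */
int quick_match(unsigned char flags, unsigned char len, unsigned char h2) {
    return (flags & (H2_MASK | LEN_MASK)) == (((h2 & 3) << 5) | len);
}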

John Passaniti

Sep 8, 2010, 10:13:42 AM

On Sep 8, 7:15 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:

> XOR hash results are really, really poor when using ASCII.  There are too
> many similarly set bits.

True, although I wouldn't suggest this:

> You may want to add some randomness to your hash.  Create a 128 byte table
> of random values.  Use the ASCII char as an index into the table.  XOR the
> returned ASCII values.  This will produce much better results.

That's relatively expensive on a 6502-- you've got the 128 bytes used
up for a table and the cost of indexing into that table. A simpler
solution might be to roll the bits after each exclusive-or with a
character. That's a single instruction, and will provide ample mixing
of bits. Again, as the words being hashed are known (they are the
FORTH vocabulary), it is easy enough to quickly model this and measure
the performance and distribution the hash provides.
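
A quick model of that measurement in C (words[], nwords, and the hash
signature are assumed placeholders for the known FORTH vocabulary):

extern const char *words[];
extern int nwords;

/* Count how many names fall into each of the 16 chains for a given
   candidate hash function; eyeball the result for balance. */
void chain_histogram(unsigned char (*hash)(const char *), int counts[16]) {
    int i;
    for (i = 0; i < 16; i++) counts[i] = 0;
    for (i = 0; i < nwords; i++)
        counts[hash(words[i]) & 0x0F]++;
}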

Paul Rubin

Sep 8, 2010, 10:30:10 AM

John Passaniti <john.pa...@gmail.com> writes:
> Fortunately, since you'll
> already know the words in the FORTH vocabulary, you'll be able to test
> and tune your hash to be specific for those words.

Could also use a PATRICIA tree, stealing a bit or two from each
character to say where the prefix segments start and stop. That might
be smaller than any linked list or hash table, because it compresses out
common prefixes across the different words.

John Passaniti

Sep 8, 2010, 2:10:12 PM

On Sep 8, 10:30 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> Could also use a PATRICIA tree, stealing a bit or two from
> each character to say where the prefix segments start and
> stop.  That might be smaller than any linked list or hash
> table, because it compresses out common prefixes across the
> different words.

I thought of this, but rejected radix trees because my assumption was
that although the FORTH vocabulary is known at compile-time, new
definitions will go there by default. So, the cost of updating the
tree would be high for a slow 6502 and probably not worth the effort.

I wonder if a dictionary approach I took for a Forth-like language
targeting an HCS08 might make sense here. The fundamental idea there
was that I stored the dictionary two different ways. The dictionary
that the user extended was a traditional list, but the dictionary that
had all the base definitions for the language was a sorted array. The
reason for this was that the base definitions were stored in flash and
couldn't easily be updated, so why not store them more efficiently?
This allowed searching the base dictionary with a binary search in
O(log2 N) time.

So the way my dictionary was searched was to search for the word in
the traditional list. If it wasn't found, then try again in the base
words dictionary. This still gives the ability to override words in
the base dictionary (since they are placed in the traditional list and
found first). I don't know if this would help the OP, whose concern
seems to be speed. That would depend on how many definitions he
typically added to the base. If there were a lot, then it probably
wouldn't help all that much since the majority of the time would be
spent searching that list. In my case, the number of words that would
be defined at run-time was typically very small, so it was a big gain.

I guess the OP could take this idea a bit farther and store all
definitions in a sorted array. Build headers and definitions up to
higher memory and extend the sorted array that points to the headers
down to lower memory. Each definition's header then doesn't need a
pointer to the next definition, and that pointer then moves to the
sorted array, so it's a wash in terms of memory. The biggest downside
is that adding definitions means moving at worst N pointers. But
algorithmically, it's pretty trivial to identify the insertion point
and move the affected pointers.
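
A sketch of the static half of that scheme in C (base_name and
base_count are hypothetical accessors for the sorted array):

#include <string.h>

extern const char *base_name(int i);  /* i-th name in the sorted array */
extern int base_count;

/* Binary search of the static base dictionary, tried only after the
   mutable list fails; returns the index, or -1 to fall back to NUMBER. */
int find_base(const char *name) {
    int lo = 0, hi = base_count - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        int c = strcmp(name, base_name(mid));
        if (c == 0) return mid;       /* found in O(log2 N) probes */
        if (c < 0) hi = mid - 1; else lo = mid + 1;
    }
    return -1;
}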

Hugh Aguilar

Sep 8, 2010, 3:59:56 PM

On Sep 8, 12:55 am, chitselb <chits...@gmail.com> wrote:
> As for symtab, what I got out of this exchange (and examining the
> SYMTAB source) is that symtab is some kind of a self-balancing btree
> based on the popularity of a word.  It's been a while since I read
> Knuth, but right now a word header is a length byte, the text of the
> word name, and a link.  By going DTC I did away with the CFA pointer.
> Adding in a left node, a right node, a counter, and other stuff
> doesn't seem like it's going to save memory.

The following is my calculation of memory use of symtab compared to a
hash table:

On Dec 2 2009, 4:21 pm, Hugh Aguilar <hugoagui...@rosycrew.com> wrote:
> My own calculation is that a hash table uses the same amount of memory
> as a symtab when the hash table is 1/2 full. This calculation helps
> the argument for hash tables because hash tables are a lot more
> efficient (fewer collisions) when 1/2 full rather than 2/3 full. Your
> calculation of 2/3 not only didn't make sense, but it hurt your
> argument anyway. According to the wikipedia article
> (http://en.wikipedia.org/wiki/Hash_table), the load factor should not
> exceed .7 because the cost of the collisions becomes excessive. A load
> factor of .67 (2/3) is pretty close to .7. I've never implemented a
> hash table, so I don't have any personal experience with this. AFAIK
> however, almost everybody resizes the hash table when the load factor
> passes the 1/2 threshold.
>
> My calculation works like this: The total memory consumed by a hash
> table is N+N/L words, with N being the number of nodes and L being
> the load factor. I am assuming the use of a linked list for collision
> resolution. Because of this, each node contains a forward link pointer
> to the next node in the linked list, or NIL if there are no more nodes
> in the linked list. This gives us N words of memory (one word per
> node). The load factor L is the fraction of the table that contains
> pointers to linked lists, and 1-L is the fraction of the table that
> contains NIL pointers. N/L is the size of the table given N nodes
> (assuming no collisions at all). I add N to N/L to obtain the total
> amount of memory used, i.e. 1+1/L words per node. If L is 1/2, that
> is 3 words per node. My
> symtab also takes 3 words per node (left, rite and count), so a load
> factor of 1/2 is where hash tables equal symtabs in memory overhead.
> This, coincidentally, is also the load factor where most people resize
> their hash tables. When the load factor is < 1/2, the hash table uses
> more memory per node than a symtab does. When the load factor reaches
> the 1/2 threshold, the hash table is resized. The result is that the
> hash table *always* uses more memory per node than a symtab does. If
> you resize the hash table by doubling its memory usage, the hash
> table's load factor becomes 1/4. Now the hash table uses 5 words
> (1+1/.25) per node. This is 167% of symtab's 3 words per node, which
> is pretty bad.
>
> My calculation above assumes that there are no collisions at all. This
> isn't realistic. Even in a sparsely populated hash table you are
> likely to have a few collisions. When the number of nodes is around
> 1/2 the hash table size however, there aren't too many collisions and
> it is reasonable to assume that there aren't any collisions. As the
> number of nodes increases to 2/3 or more, there are a lot more
> collisions and my equation becomes increasingly inaccurate. This is
> nothing to worry about because, as I said, it is typical to resize the
> hash table at 1/2 anyway.

A good point in regard to the 6502 is that the better hash functions
rely on multiplication, which the 6502 doesn't support. If you try to
construct a hash function out of XOR and shift, it isn't going to be
very random --- you are going to get a lot of collisions.

As I said earlier, the big problem with hash tables is that you have
to predict ahead of time how many elements will be in the hash table,
and allot memory for that many elements. The good news is that there
are only *two* mistakes that you can make --- to predict low and to
predict high. The bad news, is that you are *always* guilty of one
mistake or the other. If you predict low, then your hash table will
bog down due to excessive collisions. If you predict high, then you
waste memory. Hash tables in a low-memory computer that lacks
multiplication are much more complicated than you are imagining.

Resizing a hash table (when you realize that your prediction was too
low) is a major hassle. You have to stop the world and rehash *every*
element in the table. On a 6502, that is going to be slow. Also,
during the rehash, you have to have both the old hash table and the
new hash table in memory simultaneously, and there may not be room for
both on the PET.

Anton Ertl, however, tells us of a technique for avoiding the "stop the
world" scenario:

On Dec 7 2009, 5:16 pm, Hugh Aguilar <hugoagui...@rosycrew.com> wrote:
> On Dec 7, 10:59 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
> wrote:
>
> > Hugh Aguilar <hugoagui...@rosycrew.com> writes:
> > >The
> > >problem with hash tables is that they have to "stop the world" and
> > >resize themselves
>
> > No, if you want to avoid stopping the world, you can always keep a
> > hash table A with size M and a hash table B with size 2M around, and
> > search them both.  Whenever you insert an element, you insert it into
> > B, and also delete an element from A and insert it into B.  When the
> > load factor in B reaches the threshold, A will be empty and can be
> > freed, B becomes A, and you can create a hash table with size 4M that
> > becomes the new B.  In this way the cost of resizing is distributed
> > smoothly across all the insertions.  Of course, the disadvantage is
> > that the lookup becomes more expensive.
>
> That is a good method for avoiding the "stop the world" scenario. I
> hadn't been aware of this. The downside is that you now have to do two
> insertions and one deletion for every word that gets inserted. Also,
> you have to do one or two (average 1.5) lookups for every word that
> gets looked up. It seems unlikely that this is going to be faster than
> symtab. Also, you have two hash tables in use at the same time, which
> is taking up a lot of memory. Previously your hash-table had been
> equal to symtab in memory usage when your load factor reached 1/2 and
> worse than symtab when your load factor was still less than 1/2. Now
> you are *always* worse than symtab. I think that symtab is better than
> your solution in regard to both speed and memory. I think that hash
> tables are only a good choice if you just throw a lot of memory at
> them from the get-go, so that your load factor never approaches the
> 1/2 threshold.

All in all, I think that hash tables on a 6502 would be a huge
mistake. The code for a hash table is simpler than the symtab code,
but you run into a lot of complication in regard to slow speed,
excessive collisions, and the dreaded resizing of the hash table after
the whole thing bogs down to an intolerable level.

The symtab code is fairly complicated --- but I've already written it
--- you can just include it in your compiler and you are good to go.
You can give each word-list ("vocabulary" in Forth-83) its own symtab
tree, which will speed the whole thing up considerably. This is
unrealistic with hash tables because you have to predict the size of
*each* wordlist rather than just the size of the entire dictionary ---
some of those predictions are going to be bad, so you are pretty much
guaranteed to get into trouble. By comparison, symtab trees grow
dynamically --- you never have to predict anything --- you just use
them without worry. You also aren't required to devise any hash
function that is supposed to be random without involving
multiplication, which likely involves more mathematical knowledge than
you have (or anybody on c.l.f., including myself, has). Using symtab is
easy!

John Passaniti

Sep 8, 2010, 4:29:30 PM

On Sep 8, 3:59 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> The following is my calculation of memory use of symtab
> compared to a hash table:

Yep, just as wrong then as you are now. You keep forgetting that
"hash table" isn't a single algorithm but a family of algorithms, each
with different properties. Your calculation is of memory usage for one
particular kind of hash table, not all. And it also suffers from a
fantastically conservative 50% load factor before you resize. Nobody
does that. A much more typical number is 75%, and even then, a
strategy that often works out very well is to resize only when the
average (or maximum) length of a collision chain reaches some
threshold. That often allows deferring a resize until very late.
After all, a collision chain can be traversed faster than a tree.

I know you don't bother to read other people's code, but you could at
least learn about the algorithms you are so quick to comment on.

> A good point in regard to the 6502 is that the better hash
> functions rely on multiplication, which the 6502 doesn't
> support. If you try to construct a hash function out of XOR
> and shift, it isn't going to be very random --- you are
> going to get a lot of collisions.

Funny, I always thought a shift *was* a multiplication (or division).

No matter-- my suggestion for more randomness is to exclusive-or and
then roll, not shift. That will nicely recirculate bits. You could
model this yourself and see... oh, sorry, I forgot. Not only don't
you read other people's code, you never test anything for fear that it
will go counter to your intuition.

> As I said earlier, the big problem with hash tables is that
> you have to predict ahead of time how many elements will be
> in the hash table, and allot memory for that many elements.

Nope. As explained before.

> Resizing a hash table (when you realize that your prediction
> was too low) is a major hassle. You have to stop the world
> and rehash *every* element in the table. On a 6502, that is
> going to be slow. Also, during the rehash, you have to have
> both the old hash table and the new hash table in memory
> simultaneously, and there may not be room for both on the PET.

Nope. There are a variety of hashing algorithms that don't require
rehashing the entire table. Look up "linear hashing", then "extendible
hashing", and finally "consistent hashing." Google is your friend.
Honest!

> The symtab code is fairly complicated --- but I've already
> written it --- you can just include it in your compiler
> and you are good to go.

But why should he? That's other people's code! Oh, I forgot-- you
are the exception to your own rule.

> You also aren't required to devise any hash function that
> is supposed to be random without involving multiplication,
> which likely involves more mathematical knowledge than
> you have (or anybody on c.l.f. including myself have).

That's mostly true-- the math behind selecting a good hash function is
non-trivial. Good thing the rest of us live in a world where we are
free to look up the results of other's research; hash functions have
been studied for years, and one has only to choose one appropriate to
the values being hashed. That's not an option for you, since you
don't look at other people's code, but thankfully the rest of us have
no problem leveraging the knowledge of others.

> Using symtab is easy!

And a huge waste of memory and CPU. It may be good for something, but
it's poor for managing symbol tables.

Paul Rubin

Sep 8, 2010, 7:13:56 PM

John Passaniti <john.pa...@gmail.com> writes:
>> Could also use a PATRICIA tree...

> I thought of this, but rejected radix trees because my assumption was
> that although the FORTH vocabulary is known at compile-time, new
> definitions will go there by default.
> I wonder if a dictionary approach I took for a Forth-like language
> targeting a HCS08 might make sense here. The fundamental idea there
> was that I stored the dictionary two different ways. The dictionary
> that the user extended was a traditional list, but the dictionary that
> had all the base definitions for the language was a sorted array.

Right, look in an updateable dictionary first, and if the word isn't
found in it, then look in the compressed static dictionary. I think the
radix tree might take less space than a sorted array by eliminating
redundant characters shared across keywords. I thought of this approach
during the discussion about your "tester" program where the idea was to
write a Forth subset with minimal memory consumption. It occurred to me
that it might even be possible to generate the radix tree with an
external preprocessor that interleaved the tree constants with the
machine instructions with the Forth primitives, saving space by getting
rid of pointers from the tree to the code by putting the code right
there inside the tree. I don't know whether the space savings for a few
hundred words would be greater than the extra code to deal with the
fancy representation though.

Hugh Aguilar

Sep 8, 2010, 8:26:42 PM

On Sep 8, 5:13 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

Why is a binary-search of a sorted list faster than a balanced binary
tree? That doesn't make any sense!

My symtab is faster than a balanced binary tree because it is *not*
balanced, but rather it has the most common words toward the top.

Paul Rubin

Sep 8, 2010, 9:23:16 PM

Hugh Aguilar <hughag...@yahoo.com> writes:
> Why is a binary-search of a sorted list faster than a balanced binary
> tree? That doesn't make any sense!

I think the idea is that the sorted list saves memory by avoiding
storing any internal pointers. I don't know if searching it would be
any faster than searching a tree.

chitselb

Sep 8, 2010, 10:27:59 PM

I added this:
;~debug
stx n      ; save X
sed        ; decimal mode, for the classic hex-digit-to-ASCII trick:
cmp #10    ;   carry is set when A >= 10
adc #'0'   ;   BCD-adjusted add maps 0-15 to '0'-'9','A'-'F'
cld
jsr CHROUT ; print the digit
ldx n      ; restore X
;~debug


At the end of my dhash primitive. Since all the name headers exist,
these are the actual hashcodes I'll wind up with in the FORTH
vocabulary.

cf75a01122cb20ce23432
4881c0388bdbaad0685512697e4f2301c1a510a8
93dfe2b3279a78429f5bda70e7479fa9facebfaf
befdece15d9c9f89f8143ca89c6c03f9b383442d
e48b311bb3facc3551b22b870bd069beb60c861d
2369bc1273ecc14676976844524cb026fd33628c
540319b4629491b4b10daf0ce6059e8ed98ff9e8
d48cd0a80077edeccfafa55d17616fc0bbe1

So, 297 words. Here's the histogram.
0 ********************
1 *********************
2 *******************
3 ******************
4 ******************
5 *************
6 ****************
7 **************
8 *********************
9 ********************
a ****************
b ***********************
c ************************
d ****************
e ******************
f ********************

I'm saying it works for me and beats the heck out of single chaining:
************************************************************************
************************************************************************
************************************************************************
************************************************************************
*********

I may wind up ordering each of the 16 threads so I can use binary
search on each subchain, if it's still not fast enough. (Where "fast
enough" is defined as spending no more than _x_% of CPU time in FIND
when compiling screens.) I'll measure that vs. single-thread search
when the time comes.

Paul Rubin

Sep 9, 2010, 12:20:26 AM

chitselb <chit...@gmail.com> writes:
> I may wind up ordering each of the 16 threads so I can use binary-
> search on each subchain, if it's still not fast enough.

The binary search (especially if those dicts are updateable so you also
have to maintain them in sorted order) will take extra code, and also
the binary search will add bookkeeping overhead of its own. It might be
comparably fast to use 32 subchains instead of 16, without using extra
memory (you'd use 32 more pointer bytes, but save that extra code).

Do you have a way to profile the execution time (instruction by
instruction) in that simulator?

And wow, I knew the PET/C64 were slow compared to what we use now, but I
never realized it was THAT slow. My hat is off to the programmers who
got those machines doing useful things.

John Passaniti

Sep 9, 2010, 2:10:53 AM

On Sep 8, 7:13 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> I thought of this approach during the discussion
> about your "tester" program where the idea was to
> write a Forth subset with minimal memory consumption.  
> It occurred to me that it might even be possible to
> generate the radix tree with an external preprocessor
> that interleaved the tree constants with the machine
> instructions with the Forth primitives, saving space
> by getting rid of pointers from the tree to the code
> by putting the code right there inside the tree.  

Using an external preprocessor that writes out the final dictionary
that is compiled in the code is exactly what I'm doing. I'm not
encoding it as a radix tree of any sort, but certainly could try it.
Maybe I'll do that and see how much it actually saves.

Of course, if I go down that road, then looking at things like a
"perfect hash" is probably even better, at least for performance. The
issue there would be the algorithmic complexity of the generated hash,
but assuming it's not high, having true O(1) access would win in terms
of performance. I've played with tools to generate perfect hashes
before, but never had a static set of symbols to search. For Tester,
I do.

Paul Rubin

Sep 9, 2010, 4:19:31 AM

chitselb <chit...@gmail.com> writes:
> So, 297 words. Here's the histogram.
> 5 ************* ...
> c ************************

I wonder if you could get better dispersion with 17 chains instead of
16. That is, if you compute an 8-bit hash of the symbol, take the hash
mod 17 instead of xor'ing the two nibbles. You can mod by 17 by
subtracting the high nibble from the low one and adjusting for the sign:

h = 8-bit hash value
a, b = high and low nibbles of h (a=high, b=low)
m = b - a (as an 8-bit signed value)
if m < 0 then m := m + 17

now m == h % 17 unless I missed something. I don't know if the 6502 has
nibble instructions since I've never programmed one though.
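
The same trick in C, for checking (assuming an 8-bit unsigned hash;
since 16 == -1 mod 17, h = 16a + b == b - a mod 17):

/* h % 17 from the two nibbles of an 8-bit hash, per the recipe above */
unsigned char mod17(unsigned char h) {
    int m = (h & 0x0F) - (h >> 4);  /* b - a, as a signed value */
    if (m < 0) m += 17;
    return (unsigned char)m;
}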

chitselb

Sep 9, 2010, 4:19:40 AM

I'm wondering why it is usually done like this:

fig:
[LFAlo|LFAhi] [len]WORDNAM[E] [CFAlo|CFAhi] [parameter field...]

Blazin' Forth moves the LFA to the other side of the NFA, like this.
[len]WORDNAM[E] [LFAlo|LFAhi] [CFAlo|CFAhi] [parameter field...]

So now we don't have to TRAVERSE (read: fetch and examine each byte of
"WORDNAME" until we get to the last byte with the hi-bit set) to get
to LFA.

But what's wrong with this? Given a CFA, NFA, or LFA we know exactly
where the length/flags byte is, and subtracting its contents gives us
the address of the name's first character. I'd leave the hi-bit set
(for reassurance/validation purposes). I can't be the first person to
come up with this idea.

[W]ORDNAME[len] [LFAlo|LFAhi] [CFA] [parameter field...]
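
A sketch of what that layout buys, in C (assuming the low 5 bits of
the length/flags byte hold the length):

/* With [name][len] ordering, the name's first character is found by
   subtracting the length from the length byte's own address. */
const unsigned char *name_start(const unsigned char *len_addr) {
    return len_addr - (*len_addr & 0x1F);
}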

Andrew Haley

Sep 9, 2010, 4:34:11 AM

John Passaniti <john.pa...@gmail.com> wrote:

> On Sep 8, 3:59 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
>> The following is my calculation of memory use of symtab
>> compared to a hash table:
>
> Yep, just as wrong then as you are now. You keep forgetting that
> "hash table" isn't a single algorithm but a family of algorithms, each
> with different properties. Your calculation of memory usage for a
> particular kind of hash table, not all. And it also suffers from a
> fantastically conservative 50% load factor before you resize. Nobody
> does that.

Actually, 50% is about right for open addressing, but then there's no
need for chaining so there's no need for the links.

I'm not sure I understand the point of this discussion any more: using
a hash table to select 16 or 32 lists is obviously the right thing to
do, and only takes a few lines of Forth.

Andrew.

Elizabeth D Rather

Sep 9, 2010, 4:53:46 AM

On 9/8/10 10:19 PM, chitselb wrote:
> I'm wondering why it is usually done like this:
>
> fig:
> [LFAlo|LFAhi] [len]WORDNAM[E] [CFAlo|CFAhi] [parameter field...]

I don't think there exists a "usually". The head really needs to be
arranged to take advantage of the processor it's being implemented on,
which is one reason everyone's stoutly resisted any attempt to
standardize dictionary entry structure.

> Blazin' Forth moves the LFA to the other side of the NFA, like this.
> [len]WORDNAM[E] [LFAlo|LFAhi] [CFAlo|CFAhi] [parameter field...]
>
> So now we don't have to TRAVERSE (read: fetch and examine each byte of
> "WORDNAME" until we get to the last byte with the hi-bit set) to get
> to LFA.
>
> But what's wrong with this? Given a CFA, NFA, or LFA we know exactly
> where the length/flags byte is, and subtracting its contents gives us
> the address of the name's first character. I'd leave the hi-bit set
> (for reassurance/validation purposes). I can't be the first person to
> come up with this idea.
>
> [W]ORDNAME[len] [LFAlo|LFAhi] [CFA] [parameter field...]

Much depends on the particular instruction set, and I'm not familiar
with this one. But on some processors, it's fastest to link to the link
field. You know how big it is (one cell), and if the count/flags and
name follow, you just start comparing with the count byte, and as soon
as something doesn't match you move on. On other processors it's easy
to start with the count/name. It's not necessary to "traverse" the
whole name. In any case, you want to take advantage of whatever address
manipulation the processor offers you. On one early processor (I forget
now which) we had a link actually point to the 2nd byte of the next
link; it saved several cycles.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Rod Pemberton

Sep 9, 2010, 4:54:04 AM

"John Passaniti" <john.pa...@gmail.com> wrote in message
news:64dc3361-2450-4221...@m1g2000yqo.googlegroups.com...

> On Sep 8, 7:15 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
> > XOR hash results are really, really poor when using ASCII. There are too
> > many similarly set bits.
>
> True, although I wouldn't suggest this:
>
> > You may want to add some randomness to your hash. Create a 128 byte
> > table of random values. Use the ASCII char as an index into the
> > table. XOR the
> > returned ASCII values. This will produce much better results.
>
> That's relatively expensive on a 6502--

Relatively expensive? Are you joking? You do know that even the lowly
VIC-20 had at least 3.5KB available for the user's app... Yes? Is 128 bytes
too much for it to handle?

> you've got the 128 bytes used
> up for a table
>

He could put the table into zero page, if his machine has enough free space
there. That would shave a few cycles. He could also reduce the size of the
random table to 64 bytes - just those between space and underscore, i.e.,
uppercase and most graphics chars.

> and the cost of indexing into that table.

Yeah, about 3, 4, maybe 5 or so extra cycles versus potentially far more
hash collisions. More hash collisions mean there will be more string
comparisons to identify the correct word in the dictionary. Each string
comparison is roughly equivalent to one hash sequence using the random
table. I'd argue that the cost associated with an increase in string
comparisons are far more expensive than the cost of a more complex hash. My
tests indicate there could be 8 to 10 collisions on average, since the hash
value is being restricted to 8-bit bytes, peaking at 16 or so, depending on
the algorithm's distribution of hash values. Lower collisions and an
equalized hash distribution from a better algorithm could be worth the
expense.

> A simpler solution might be to roll the bits after each exclusive-or
> with a character. That's a single instruction, and will provide ample mixing
> of bits.

It's definitely simpler. It's definitely fewer cycles. I'm not sure that
it's any better...

> Again, as the words being hashed are known (they are the
> FORTH vocabulary), it is easy enough to quickly model this and measure
> the performance and distribution the hash provides.

Only the Forth words in the initial dictionary are known to him. Are his
users expected to not define any words, variables, or constants of their
own? I.e., an optimal hash for his initial set of Forth words may be a poor
choice once a user defines some words. The user's words could create many
collisions depending on the algorithm, i.e., slowdown. Therefore, fitting
the hash to a static wordset is probably not a good idea. On a modern PC
with GCC, he should be able to test all character combinations up to 7 or 8
ASCII characters in length with various hashes fairly quickly. However,
he'll probably need to learn a bit about hashes before he can develop a
good, and simple, one on his own.


Rod Pemberton


Paul Rubin

Sep 9, 2010, 5:12:14 AM

"Rod Pemberton" <do_no...@notreplytome.cmm> writes:
> Relatively expensive? Are you joking? You do know that even the lowly Vic
> 20 had at least 3.5KB available for the user's app... Yes? Is 128 bytes
> too much for it to handle?

Well, after the 128 byte table there's only 3.37k left... that seems
pretty ugly if it can be avoided.

I thought the usual trick was to rotate the register 1 bit when
computing the hash. Something like:

typedef unsigned char byte;
/* rotate left by one bit, recirculating the top bit */
#define rotate_left(h) ((byte)(((h) << 1) | ((h) >> 7)))

byte hash(byte *str) {
    byte h = 0;
    while (*str)
        h = rotate_left(h) + *str++;
    return h;
}

That separates out common bits between characters.

Paul Rubin

Sep 9, 2010, 5:44:04 AM

John Passaniti <john.pa...@gmail.com> writes:
> Of course, if I go down that road, then looking at things like a
> "perfect hash" is probably even better, at least for performance.

Usual perfect hash generators like gperf burn space to increase speed.
If you want a minimal perfect hash it's still going to probably use
some tables and burn space.

I'd expect space is more important than speed in tester. You're running
on pretty fast cpu's and aren't going to compile massive amounts of
forth code.

Cuckoo hashing looks like a cute trick:

http://en.wikipedia.org/wiki/Cuckoo_hashing

You set up the hash table so that the key you want is always in one of
two cells (computed using two different hash functions) if it's in the
table at all. Then you look in at most two places to check whether a
given key is in the table.
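
A lookup sketch in C (h1, h2, and table are assumed placeholders for
the two independent hash functions and the single table):

#include <string.h>

extern unsigned h1(const char *k), h2(const char *k);
extern const char *table[];

/* A key can only live in one of two slots, so a miss costs at most
   two probes. */
const char *cuckoo_lookup(const char *k, unsigned size) {
    const char *e = table[h1(k) % size];
    if (e && strcmp(e, k) == 0) return e;
    e = table[h2(k) % size];
    if (e && strcmp(e, k) == 0) return e;
    return 0;   /* definitely not present */
}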

Anton Ertl

Sep 9, 2010, 6:15:17 AM

"Rod Pemberton" <do_no...@notreplytome.cmm> writes:
>"chitselb" <chit...@gmail.com> wrote in message
>news:670bf137-f903-4046...@i13g2000yqd.googlegroups.com...
>> It occurred to me that having FIND search through a single linked list
>> of all 300+ words before it fails (and we try NUMBER instead) is a
>> little inefficient. So I implemented a simple XOR hashing function to
>> crunch the name field down to a 4-bit number. That way, it only has
>> to search (on average) 6.25% of the dictionary for a name to figure
>> out that it isn't in there.
>
>XOR hash results are really, really poor when using ASCII. There are too
>many similarly set bits.

It's poor for any data. You want each bit in the input to ideally
influence half of the bits of the hash value on average. With xor one
bit influences only one bit. Addition is significantly better and
almost as cheap on the 6502 (don't CLC after every character; feeding
the carry back in helps distribute the bits).
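
Modeled in C, that add-with-carry-feedback loop might look like this
(a sketch of the idea, not Gforth's actual hash):

/* ADC without an intervening CLC: the carry out of each addition
   feeds back into the next one. */
unsigned char hash_adc(const unsigned char *s, unsigned len) {
    unsigned h = 0, carry = 0;
    while (len--) {
        unsigned t = h + *s++ + carry;
        carry = t >> 8;   /* carry bit from the previous add */
        h = t & 0xFF;
    }
    return (unsigned char)h;
}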

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2010: http://www.euroforth.org/ef10/

Paul Rubin

Sep 9, 2010, 6:29:12 AM

Paul Rubin <no.e...@nospam.invalid> writes:
> I'd expect space is more important than speed in tester. You're running
> on pretty fast cpu's and aren't going to compile massive amounts of
> forth code.

Thinking a little more, the real space burner (and the idea of radix
trees to compress bytes out) is the space used by tokens themselves, so
you can tell if you've missed the dictionary. Maybe the thing to do is
get rid of those with a Bloom filter. I think if you have a few hundred
statically known words, with say 12 bits of Bloom filter per keyword,
you should be able to construct a filter that tells you with perfect
accuracy whether a given string is in the wordlist. Then use a minimal
or near-minimal perfect hash to get from the word to the code.

Bernd Paysan

Sep 9, 2010, 7:55:30 AM

On Thursday, 9 September 2010 12:15, Anton Ertl wrote:
> It's poor for any data. You want each bit in the input to ideally
> influence half of the bits of the hash value on average. With xor one
> bit influences only one bit. Addition is significantly better and
> almost as cheap on the 6502 (don't CLC after every character, feeding
> the carry back helps distributing the bits).

When we implemented hashing in Gforth, I started with adding all the
bytes together. It worked remarkably well; the currently used version
with rotation is only slightly better.

Do a Chi-square test of the results. If it is roughly a standard
distribution with the expected variance, your hash function is good enough.

--
Bernd Paysan
"If you want it done right, you have to do it yourself!"
http://www.jwdt.com/~paysan/

Alex McDonald

Sep 9, 2010, 8:19:28 AM

On 9 Sep, 10:12, Paul Rubin <no.em...@nospam.invalid> wrote:

unsigned long
hash(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

See http://www.cse.yorku.ca/~oz/hash.html.

Quoted; djb2
this algorithm (k=33) was first reported by dan bernstein many years
ago in comp.lang.c. another version of this algorithm (now favored by
bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of
number 33 (why it works better than many other constants, prime or
not) has never been adequately explained.

Longer discussion; http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809

John Passaniti

Sep 9, 2010, 9:53:11 AM

On Sep 9, 4:34 am, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:

> Actually, 50% is about right for open addressing, but then
> there's no need for chaining so there's no need for the links.

You may have different references or have used poorer hash functions
than I have. But most every reference I have and in personal
experience, open-addressed hash tables usually do fine until about 75%
full. And even then, the question isn't how many additional probes it
takes, but if the cost of those probes is greater than for competitive
algorithms.

Regardless, the constant refrain expressed by Hugh is that using hash
tables (again, he sees them all as one single thing) always consumes
more memory than the awesome wonder that is symtab. That's false, if
only because he's only considering a single type of hash table.

> I'm not sure I understand the point of this discussion any
> more: using a hash table to select 16 or 32 lists is
> obviously the right thing to do, and only takes a few lines
> of Forth.

And it's functionally identical to a hash table using chaining for
collision resolution, since it is known that every slot in the hash
will have collisions.

The point of this discussion? Well, that's easy. It started off as
helping the OP with strategies for faster dictionary searches. The
OP's use of a hash function to select a list caused an autonomic
response in Hugh, who upon seeing "hash" found a point in the
discussion he could promote symtab, demote both research and
(implicitly) quantitative test/measurement, and offer his usual set of
bizarre hypocritical statements. I responded before figuring out that
the OP was clearly able to evaluate symtab for himself. But that led
to a side discussion (as it always does in these threads) to
alternatives to the OP's original scheme which may or may not be
useful.

And personally, since these days, I'm targeting platforms that in
terms of memory are even more restricted than the OP's (but faster),
I'm interested in other insights others may have.

Anton Ertl

Sep 9, 2010, 10:26:06 AM

chitselb <chit...@gmail.com> writes:
>I'm wondering why it is usually done like this:
>
>fig:
>[LFAlo|LFAhi] [len]WORDNAM[E] [CFAlo|CFAhi] [parameter field...]
>
>Blazin' Forth moves the LFA to the other side of the NFA, like this.
>[len]WORDNAM[E] [LFAlo|LFAhi] [CFAlo|CFAhi] [parameter field...]

Maybe you confused these two. fig-Forth has the link field between
the name and the code field.

>But what's wrong with this? Given a CFA, NFA, or LFA we know exactly
>where the length/flags byte is, and subtracting its contents gives us
>the address of the name's first character. I'd leave the hi-bit set
>(for reassurance/validation purposes). I can't be the first person to
>come up with this idea.
>
>[W]ORDNAME[len] [LFAlo|LFAhi] [CFA] [parameter field...]

fig-Forth supported storing only the first few characters of the name,
so in general subtracting len (which always stored the full length)
does not give the start of the name.

Also, going back from the LFA or CFA to the name is a relatively rare
operation. OTOH, in Gforth it's not guaranteed to work correctly, and
if we switched to your data structure, it would.

Andrew Haley

Sep 9, 2010, 10:55:38 AM

Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:

> fig-Forth supported storing only the first few characters of the
> name,

No it didn't. fig-FORTH had variable-length names, up to 31
characters, and stored them all up to the limit set by WIDTH.

> so in general subtracting len (which always stored the full length)
> does not give the start of the name.

Andrew.

Anton Ertl

Sep 9, 2010, 10:56:53 AM

Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>
>> fig-Forth supported storing only the first few characters of the
>> name,
>
>No it didn't. fig-FORTH had variable-length names, up to 31
>characters, and stored them all up to the limit set by WIDTH.

Strange, first you contradict, then you agree with me by expanding on
what I wrote.

Andrew Haley

Sep 9, 2010, 11:14:38 AM

John Passaniti <john.pa...@gmail.com> wrote:
> On Sep 9, 4:34 am, Andrew Haley <andre...@littlepinkcloud.invalid>

> wrote:
>> Actually, 50% is about right for open addressing, but then
>> there's no need for chaining so there's no need for the links.
>
> You may have different references or have used poorer hash functions
> than I have. But most every reference I have and in personal
> experience, open-addressed hash tables usually do fine until about 75%
> full.

I guess it depends on what you mean by "fine". When the table is half
full there are only on average 1.5 probes for a successful search and
2.5 probes for an unsuccessful one: great stuff. By 75% it's quite a
lot worse: 2.5 probes for a successful search and 8.5 probes for an
unsuccessful one. This has an especially big impact if you're doing a
lot of insertions.
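
(Those are the standard linear-probing estimates at load factor a:
roughly (1 + 1/(1-a))/2 probes for a successful search and
(1 + 1/(1-a)^2)/2 for an unsuccessful one; a = 1/2 gives 1.5 and 2.5,
and a = 3/4 gives 2.5 and 8.5.)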

> And even then, the question isn't how many additional probes it
> takes, but if the cost of those probes is greater than for
> competitive algorithms.

True. As long as you don't want to do much in the way of deletions, I
really like open addressing. But with MARKER and FORGET you have to
do deletions, and I'd use chaining.

> Regardless, the constant refrain expressed by Hugh is that using hash
> tables (again, he sees them all as one single thing) always consumes
> more memory than the awesome wonder that is symtab. That's false, if
> only because he's only considering a single type of hash table.

I agree.

>> I'm not sure I understand the point of this discussion any
>> more: using a hash table to select 16 or 32 lists is
>> obviously the right thing to do, and only takes a few lines
>> of Forth.
>
> And it's functionally identical to a hash table using chaining for
> collision resolution, since it is known that every slot in the hash
> will have collisions.

Yes. Fast, simple, cheap.

Andrew.

Andrew Haley

unread,
Sep 9, 2010, 11:56:46 AM9/9/10
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>>Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>>
>>> fig-Forth supported storing only the first few characters of the
>>> name,
>>
>>No it didn't. fig-FORTH had variable-length names, up to 31
>>characters, and stored them all up to the limit set by WIDTH.
>
> Strange, first you contradict, then you agree with me by expanding on
> what I wrote.

Ah, I see. LOL! :-)

If I wanted to write that sentence in a way that wasn't guaranteed to
mislead readers whose first language was English, I'd write "fig-Forth
optionally supported storing only the first few characters of the
name."

Andrew.

Paul Rubin

unread,
Sep 9, 2010, 1:45:23 PM9/9/10
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
> If I wanted to write that sentence in a way that wasn't guaranteed to
> mislead readers whose first language was English, I'd write "fig-Forth
> optionally supported storing only the first few characters of the
> name."

I thought early Forths (I don't know if that means fig-Forth) used
the first and last character.

Paul Rubin

unread,
Sep 9, 2010, 1:47:39 PM9/9/10
to
Paul Rubin <no.e...@nospam.invalid> writes:
> I think if you have a few hundred
> statically known words, with say 12 bits of Bloom filter per keyword,
> you should be able to construct a filter that tells you with perfect
> accuracy whether a given string is in the wordlist.

Wait, that's pretty obviously wrong. I'll have to think about this some
more.

Andrew Haley

unread,
Sep 9, 2010, 2:03:07 PM9/9/10
to

Count and first 3 chars, actually. But that wasn't fig-FORTH.

Andrew.

John Passaniti

unread,
Sep 9, 2010, 4:54:26 PM9/9/10
to
On Sep 9, 1:47 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Wait, that's pretty obviously wrong.  I'll have to think
> about this some more.

Maybe this is where you got confused:

There is a somewhat amusing message from Charles Moore in an issue of
Forth Dimensions where he replies to someone about the storage of only
the first three characters and the length by rendering his response
like this:

DID YOU KNO- THA- HUG- WRO-- THE AWE----
WON--- THA- IS SYM--- ?

The point, I guess, was that the name encoding scheme was "good
enough" because given context, you could still pretty much read it.
(Incidentally, it's even more amusing to take his full letter and slap
together a script that takes the dictionary and prints all possible
words that fit the patterns. It's a lot more ambiguous than most
would suspect. But I digress.)

Now, combine that bit of history with the well-known filter that takes
the first and last letter of a word and then scrambles the letters in
between. The same message above might come out as:

DID YOU KONW TAHT HGUH WTORE THE AESMOWE
WNDOER TAHT IS SMTAYB ?

This filter is usually used to demonstrate that after a certain age,
people stop reading words as individual characters, but as entire
words, and that the order of letters matters less. (There are various
variations on this theme, including randomizing vowels after the
first, swapping pairs of consonants, and other related silliness.
Many of these still result in words that can (mostly) be read without
much difficulty.)

So my guess is you've mentally merged the two encodings and came up
with FIG-Forth using the second.


Rod Pemberton

unread,
Sep 9, 2010, 6:51:47 PM9/9/10
to
"Alex McDonald" <bl...@rivadpm.com> wrote in message
news:d6aa3ccb-8f83-4343...@c16g2000vbp.googlegroups.com...

> On 9 Sep, 10:12, Paul Rubin <no.em...@nospam.invalid> wrote:
> > "Rod Pemberton" <do_not_h...@notreplytome.cmm> writes:
> > [snip, hash functions]

>
>See http://www.cse.yorku.ca/~oz/hash.html.
>
> Quoted; djb2
> this algorithm (k=33) was first reported by dan bernstein many years
> ago in comp.lang.c. another version of this algorithm (now favored by
> bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of
> number 33 (why it works better than many other constants, prime or
> not) has never been adequately explained.
>
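
For reference, the xor variant is only a few lines of Forth (a sketch;
DJB2 is my name for it here, and BOUNDS ( addr u -- addr+u addr ) is
assumed):

: DJB2 ( c-addr u -- hash )
  5381 -ROT BOUNDS ?DO
    DUP 5 LSHIFT +   \ hash * 33  =  hash * 32 + hash
    I C@ XOR         \ the xor variant folds in the next character
  LOOP ;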

Those have really bad collisions even when the input is restricted to
printable ASCII chars instead of binary. Since the context is for the 6502,
an algorithm with high collisions may be an acceptable characteristic...
One really needs at least a 16-bit hash to provide a minimum level of
low-collisions for typical English words.

As for magic values improving a hash, yes, certain values can turn a
poor hash function into a decent, but not great, one.

If you want a good hash, you'll implement Austin Appleby's MurmurHash2 (or
perhaps a more recent version). Bob Jenkins has produced hash functions
almost as good, about 10% more collisions, e.g., hashlittle() from
lookup3.c. All the other hash functions I've tested - including those by
others with solid reputations - don't come close to these two, in terms of
low collisions. However, I wouldn't expect someone coding for a 6502 in
assembly to implement either of them due to their size or complexity.

The first part of this message by me lists the many hash functions I've
tested for personal use:
http://groups.google.com/group/comp.lang.misc/msg/a37141ec795b97fb


Rod Pemberton


Hugh Aguilar

unread,
Sep 9, 2010, 7:58:07 PM9/9/10
to
On Sep 9, 4:51 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:

> If you want a good hash, you'll implement Austin Appleby's MurmurHash2 (or
> perhaps a more recent version).  Bob Jenkins has produced hash functions
> almost as good, about 10% more collisions, e.g., hashlittle() from
> lookup3.c.  All the other hash functions I've tested - including those by
> others with solid reputations - don't come close to these two, in terms of
> low collisions.  However, I wouldn't expect someone coding for a 6502 in
> assembly to implement either of them due to their size or complexity.

You could use symtab, which is already written --- leave the hash
table for another year when you have more time. :-)

I originally implemented symtab in IBM370 assembly language. It
shouldn't be too difficult in 6502 assembly. Almost everything is
recursive, so you don't want to use zero-page variables because you'll
be wasting good zero-page space --- and will be pushing and popping
them continuously, which kills the speed. The way I would do it, is to
use the parameter stack for all of your data. Rather than use it in a
Forth-like way though, in which you move data around on the stack, you
should make C-like stack frames and use index,X addressing to access
the data (assembly language doesn't have Forth's limitation of only
having access to the top two values on the stack). Define records to
represent your stack frames; this can be done by hand with EQU if your
assembler doesn't have any mechanism for doing it automatically.

Writing symtab in 6502 assembly should be the work of only a couple
days --- if you had started immediately when I had suggested symtab,
rather than muddle around with the ever-increasing complexity of hash
tables, you would be done by now.

John Passaniti

unread,
Sep 10, 2010, 1:27:11 AM9/10/10
to
On Sep 9, 6:51 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:

> Those have really bad collisions even when the input is restricted to
> printable ASCII chars instead of binary.  Since the context is for the 6502,
> an algorithm with high collisions may be an acceptable characteristic...
> One really needs at least a 16-bit hash to provide a minimum level of
> low-collisions for typical English words.

I think you're missing a key part here-- he is *expecting*
collisions. There are 16 buckets and over 300 words so collisions are
unavoidable. The ideal here isn't to avoid collisions, but to equally
distribute those 300+ words across the 16 buckets so that he has a
reduced list to search.

Bernd Paysan

unread,
Sep 10, 2010, 4:14:39 AM9/10/10
to
On Freitag, 10. September 2010 07:27, John Passaniti wrote:
> I think you're missing a key part here-- he is *expecting*
> collisions. There are 16 buckets and over 300 words so collisions are
> unavoidable. The ideal here isn't to avoid collisions, but to equally
> distribute those 300+ words across the 16 buckets so that he has a
> reduced list to search.

BTW: This also makes calculating the memory overhead straightforward. A
traditional wordlist would have "one bucket", i.e. there is one root pointer
and one chain. This list has 16 buckets, and uses the same pointers in the
dictionary for chaining. Therefore, the overhead is a constant 15 cells (30
bytes on a 16-bit system) per dictionary, plus the code for the hash function.

Alex McDonald

unread,
Sep 10, 2010, 4:48:28 AM9/10/10
to

The djb2 hash is what I now use in Win32Forth. The original used a ROL
to mix the bits, but djb2 is just as quick and produces a better and
more uniform distribution. With so few buckets, it's pretty academic
anyway; perhaps, for example, a simple 4-bit mask on the first letter
xor'd with the length would suffice.
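
Something like this, say (a sketch; BUCKET is my name, not
Win32Forth's):

: BUCKET ( c-addr u -- 0..15 )   \ first letter xor'd with the length
  OVER C@ XOR NIP  15 AND ;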

Rod Pemberton

unread,
Sep 10, 2010, 5:19:38 AM9/10/10
to
"John Passaniti" <john.pa...@gmail.com> wrote in message
news:96765853-5c44-4ac3...@k11g2000vbf.googlegroups.com...

> On Sep 9, 6:51 pm, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
> > Those have really bad collisions even when the input is restricted to
> > printable ASCII chars instead of binary. Since the context is for the
> > 6502, an algorithm with high collisions may be an acceptable
> > characteristic... One really needs at least a 16-bit hash to provide
> > a minimum level of low-collisions for typical English words.
>
> I think you're missing a key part here-- he is *expecting*
> collisions.
...

> There are 16 buckets and over 300 words so collisions are
> unavoidable.

If he decides to stay with a small number of buckets and a relatively large
number of words, then many collisions are inevitable. That's obvious. If
there are an additional 200 user-defined words and he has a hash with a
fairly good distribution, then he could average about 32 words per
"bucket"... If it has a poor distribution, he could peak at, say, 48 or 56
words per "bucket"... Either way, that's a lot of collisions, and a lot of
string comparisons to identify the correct word. If the average word length
is 8 characters, the average number of comparisons to locate the correct
word is 32, and a byte comparison costs about 3 cpu cycles, then he'd need
to compare 256 bytes on average, at 768 cycles, just to identify *one* word
in the dictionary. So maybe you and I and others should be attempting to
change his expectations to something with more rewarding results, instead
of accepting his expectations as is.

> The ideal here isn't to avoid collisions, but to equally
> distribute those 300+ words across the 16 buckets so that he has a
> reduced list to search.
>

The easiest way to get a "reduced list to search" is to reduce the number of
string comparisons needed to determine a word match. The number of string
comparisons is the same as the number of hash collisions. One way to reduce
string comparisons, on average, is to use a hash that is more evenly
distributed, as you've stated. A better way to reduce string comparisons is
to use a hash which has far fewer collisions in the first place, as I
explained previously. That requires more buckets and a good algorithm,
as I stated previously. He can easily go to "256 buckets" (8-bits),
which I asked him about previously. Even with an
8-bit hash, he will likely still see many collisions.


Rod Pemberton


Andrew Haley

unread,
Sep 10, 2010, 7:43:17 AM9/10/10
to

No-one is going to compare the last 6 chars of a name if the first two
chars don't match. The right way to do it is the obvious way: walk
along the list, comparing length and first char. That can be done in
a very tight loop, as you know for certain that every word has at
least one valid character.

>> The ideal here isn't to avoid collisions, but to equally
>> distribute those 300+ words across the 16 buckets so that he has a
>> reduced list to search.
>
> The easiest way to get a "reduced list to search" is to reduce the
> number of string comparisons needed to determine a word match. The
> number of string comparisons is the same as the number of hash
> collisions. One way to reduce string comparisons, on average, is to
> use a hash that is more evenly distributed, as you've stated. A
> better way to reduce string comparisons is to use a hash which has
> far fewer collisions in the first place, as I explained previously.
> That requires more buckets and a good algorithm, as I stated
> previously. He can easily go to "256 buckets" (8-bits), which I
> asked him about previously. Even with an
> 8-bit hash, he will likely still see many collisions.

This is a PET, with (presumably) 32K of RAM, so you're not going to
want a 256-entry hash table for every vocabulary. Hashing a
vocabulary ID with the name would solve that one, and might give the
best results. I suspect we're in diminishing returns, though.
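
E.g., a sketch of that fold (VBUCKET is my name; HASH1 stands for
whatever 4-bit name hash is in use, and wid for a small per-wordlist
id):

: VBUCKET ( c-addr u wid -- 0..15 )
  >R HASH1 R> XOR  15 AND ;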

Andrew.

John Passaniti

unread,
Sep 10, 2010, 8:55:04 AM9/10/10
to
Sorry, I missed your message the first time:

On Sep 9, 4:54 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:


> > That's relatively expensive on a 6502--
>
> Relatively expensive?  Are you joking?  You do know that even
> the lowly Vic 20 had at least 3.5KB available for the user's
> app...  Yes?  Is 128 bytes too much for it to handle?

In terms of addressability, no. In terms of memory consumption that
could be used for other things, yes. Those 128 bytes out of 3.5k
represent roughly 4% of the total memory available. I haven't
programmed on a 6502 for years, but I certainly wouldn't want to give
up that much memory for something that can more easily be accomplished
with a simple roll instruction.

> He could put the table into zero page, if his machine has enough
> free space there.  That would shave a few cycles.  He could also
> reduce the size of the random table to 64 bytes - just those
> between space and underscore, i.e., uppercase and most graphics
> char's.

Zero page on a 6502 is precious. You don't waste it with a 128 byte
lookup table when that table isn't necessary.

> > and the cost of indexing into that table.
>
> Yeah, about 3, 4, maybe 5 or so extra cycles versus potentially
> far more hash collisions.  

Those extra cycles can also be used for a simpler roll or other bit-
mixing instruction sequence, and not incur the overhead of a 128 byte
table.

> More hash collisions means there will be more string
> comparisions to identify the correct word in the dictionary.  

In a hash, yes. And if that's the data structure that was being
discussed, I would agree with you. The OP isn't implementing a hash.
He is instead using a hash function to select one of 16 possible lists
to then search. So the goal here isn't to find a hashing function
that gives unique values for each word. The goal here is to find a
hashing function that equally distributes across the number of
possible lists.

> > A simpler solution might be to roll the bits after each exclusive-or
> > with a character.  That's a single instruction, and will provide ample
> > mixing of bits.
>
> It's definitely simpler.  It's definitely fewer cycles.  I'm not sure
> that it's any better...

It is. You're free to model this yourself and see. And for that
matter, you're free to model your 128 byte table (and whatever pseudo-
random values you would store there) and measure the actual
performance instead of assuming it is naturally superior. Given a
poor pseudo-random number generator populating those 128 values, you
could have exactly the same problem. And just as you (correctly)
point out that generating a good hash function is non-trivial,
generating a good pseudo-random number generator is equally
difficult. It is in fact, the same basic problem.

> Only the Forth words in the initial dictionary are known, by him.  
> Are his users expected to not define any words, variables, or
> constants of their own?  

They are, and they can either use the same hashing function to select
from a list, or their words can be stored in a separate list. There
is nothing in Forth (except tradition) that prevents one from having
more than one storage scheme for a dictionary.

Hugh Aguilar

unread,
Sep 10, 2010, 6:04:14 PM9/10/10
to

Even if your hash function is perfectly rectangular (which it won't
be), you are only reducing your search space by 1/16. That is
comparable to 4 steps into a binary tree. Then you go into a linked
list, which is horribly inefficient. This doesn't make any sense.

You are much better off to use symtab. A binary tree is only slightly
more complicated than a linked list, and much more efficient. Symtab
moves the more common words to the top --- that is why it is more
efficient than a balanced binary tree. Also, assuming that there is no
relation between alphabetic ordering and frequency of use, the symtab
tree will be pretty much balanced.

Symtab has 3 words per node (left pointer, rite pointer and counter),
whereas a linked list has 1 word per node (fore pointer). This isn't
that much of difference in memory. For 300 words, that is 1800 bytes
compared to 600 bytes (1200 bytes difference) --- not a big deal given
32K of memory. If this small amount of memory does end up being
desperately needed at some time in the future (after the guy has
written some gigantic application), the memory can be recovered by
converting the dictionary into a linked list. By that time, you will
have collected a lot of information regarding frequency of usage for
the *specific* application in question (the count values in each
node). When you build your linked list, you can put the more common
words in the front.

All this talk about hash functions just doesn't make any sense at all.
It is very difficult to take comp.lang.forth seriously when I see
everybody spending their time muddling around with such nonsense. I
really get the impression that none of you have worked as professional
programmers; you seem to be the kind of programmers who want to spend
all of your time in meetings rambling on about subjects (pseudo-random
number generators) that you don't know anything about, and generally
avoiding ever actually writing any software. This worry about saving a
few hundred bytes of memory is a classic example of premature
optimization. If I was writing that 6502 Forth system, the dictionary
would have already been completed by now and I would have moved onto
writing the application.

After the application gets finished, if there is a memory shortage
problem, that would be the time to worry about optimization. Actually,
the best way to save memory is to convert the on-board compiler into a
cross-compiler and rebuild the application from a desktop computer ---
in which case *no* memory is used by the dictionary, or the compile-
time code either. By that time, the application is already written, so
the lack of interactive development that you get with a cross-compiler
is no longer an issue.

John Passaniti

unread,
Sep 10, 2010, 7:52:55 PM9/10/10
to
On Sep 10, 6:04 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> You are much better off to use symtab.

No, that doesn't make sense. It's other people's code. And since I
want to be just like you, Hugh (well, except for the whole not being a
professional programmer thing), I'll avoid other people's code.

> Symtab has 3 words per node (left pointer, rite pointer and counter),
> whereas a linked list has 1 word per node (fore pointer). This isn't
> that much of difference in memory. For 300 words, that is 1800 bytes
> compared to 600 bytes (1200 bytes difference) --- not a big deal given
> 32K of memory.  

Wow! symtab really is amazing! Apparently it doesn't take any code
space to implement it! And as it doesn't use up any code space, it
follows that it must be infinitely fast! No wonder you never bothered
to do any actual measurements on symtab-- it's too small and too fast
to measure!

> All this talk about hash functions just doesn't make any sense at all.
> It is very difficult to take comp.lang.forth seriously when I see
> everybody spending their time muddling around with such nonsense.

I know! It's so stupid! Why bother with a data structure that will
usually find elements in O(1) time, when they could use symtab and
get... well... you never really did measure it to determine what kind
of time you can find elements in. But that's okay-- we don't need to
know stupid things like that. All we need to know is that you are
gosh-darn proud of an algorithm that you refuse to measure.

> I really get the impression that none of you have worked as
> professional programmers;

Hey, that's not fair! You know, I wanted to be a professional
programmer, but I never learned how to drive a car. So instead of
following in your career footsteps, I'm now stuck designing software
and writing code for money. And I'm not the only one saddled with a
dead-end job that doesn't involve knowing the fastest route between
the airport and local hotels. There are lots of other people in this
newsgroup who equally suffer the indignity of having to develop
software and get paid for it.

I wish I could be like you, Hugh! Free of the shame of being paid to
write software for money, you're free! Nobody has bought you, and
from the looks of it, nobody ever will.

BruceMcF

unread,
Sep 10, 2010, 11:38:46 PM9/10/10
to
On Sep 8, 7:16 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:
> The CFA, while it typically points to the DOCOL routine for most words, it
> doesn't for all.  By eliminating the CFA, you won't be able to distinguish
> between compiled words and low-level routines (aka "primitives") or words
> that use a non-DOCOL CFA: dovar, docon, douse, dodoes, etc.  Eliminating the
> CFA for compiled words is not an issue, but for the other words it is -
> since they won't be calling DOCOL.  You'd need a way to determine when you'd
> reached one of these words instead of a compiled word.  I.e., the nesting of
> addresses in compiled words is much like traversing a binary tree and a
> non-DOCOL CFA is used to tell you when you've reached a node without any
> branches...

But there's no need to "know" ... you jumped TO the word, so each word
starts with executable machine code, whether the code of the
primitives or JSR ENTER, JSR DOVAR, etc. The word itself "knows" what
to do.

Primitives save two bytes; compiled words, and those that in an ITC
would use a non-DOCOL CFA, cost a byte.

BruceMcF

unread,
Sep 11, 2010, 12:46:51 AM9/11/10
to
On Sep 9, 4:19 am, chitselb <chits...@gmail.com> wrote:
> I'm wondering why it is usually done like this:
>
> fig:
> [LFAlo|LFAhi] [len]WORDNAM[E] [CFAlo|CFAhi] [parameter field...]
>
> Blazin' Forth moves the LFA to the other side of the NFA, like this.
> [len]WORDNAM[E] [LFAlo|LFAhi] [CFAlo|CFAhi] [parameter field...]
>
> So now we don't have to TRAVERSE (read: fetch and examine each byte of
> "WORDNAME" until we get to the last byte with the hi-bit set) to get
> to LFA.

Huh?

LinkFieldAddr Len NAM[E] JSR-ENTER/Code

Target is *whatever* Len NAM[E], (V) pointed to *whatever*
...

FIND-IN-LIST ( c$ list -- c$' TRUE | c$ FALSE )
LDA DL,X
STA W
LDA DH,X
STA W+1
SEC
LDA DL+1,X
SBC #2
STA V
LDA DH+1,X
SBC #0
STA V+1
STX TX
LPNAME:
LDY #2
LDA (W),Y
AND #$1F
CMP (V),Y
BNE NEXTWORD
TAX
LPCHAR:
DEX
BMI FOUND
INY
LDA (W),Y
CMP (V),Y
BEQ LPCHAR
NEXTWORD:
LDY #1
LDA (W),Y
BEQ NOTFOUND ; any $00xx is end of list.
PHA
DEY
LDA (W),Y
STA W
PLA
STA W+1
BNE LPNAME
FOUND:
TXA
LDX TX
STA DL,X
STA DH,X
CLC
LDA W
ADC #2
STA DL+1,X
LDA W+1
ADC #0
STA DH+1,X
JMP NEXT
NOTFOUND:
LDX TX
LDA #0
STA DL,X
STA DH,X
JMP NEXT

> But what's wrong with this?  Given a CFA, NFA, or LFA we know exactly
> where the length/flags byte is, and subtracting its contents gives us
> the address of the name's first character.

But there's no benefit to having to subtract, when pointing to the LFA
of the next word in the list preceding the name allows us to easily
start offset by 2 and compare up, and on failure dropping down to
(LFA),1 and (LFA),0 to move to the next in the list.

BruceMcF

unread,
Sep 11, 2010, 1:09:23 AM9/11/10
to
On Sep 8, 10:13 am, John Passaniti <john.passan...@gmail.com> wrote:
> That's relatively expensive on a 6502-- you've got the 128 bytes used
> up for a table and the cost of indexing into that table.  A simpler

> solution might be to roll the bits after each exclusive-or with a
> character.  That's a single instruction, and will provide ample mixing
> of bits.  Again, as the words being hashed are known (they are the
> FORTH vocabulary), it is easy enough to quickly model this and measure
> the performance and distribution the hash provides.

If the target is a 16-bit index, I wonder whether nybble rolling might
not be better, especially since the most expensive mismatches are
those that are identical until somewhere near the end. On the 6502,
that's quickest as a table routine:
AND #$0F
TAY
LDA NYBROLL,Y
...
NYBROLL:
.BYTE 0,2,4,6,8,$A,$C,$E,1,3,5,7,9,$B,$D,$F

BruceMcF

unread,
Sep 11, 2010, 1:28:51 AM9/11/10
to
On Sep 9, 6:15 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

> >XOR hash results are really, really poor when using ASCII.  There are too
> >many similarly set bits.

> It's poor for any data.  You want each bit in the input to ideally
> influence half of the bits of the hash value on average.  With xor one
> bit influences only one bit.

I don't follow. Is this an XOR hash without rotation?

> Addition is significantly better and almost as cheap on the 6502 (don't CLC after every character, feeding the carry back helps distributing the bits).

Faster, if no rolling is needed. For shorter words, I think I'd want
to sum the nybbles at the end.

CLC
LDY #0
LDA (V),Y ; A = length byte
TAY ; Y = length
HASHLP
ADC (V),Y ; sum the characters, feeding the carry back in
DEY
BNE HASHLP
STA W
LSR A ; fold the high nybble into the low nybble
LSR A
LSR A
LSR A
ADC W
AND #$0F
...

chitselb

unread,
Sep 11, 2010, 5:50:19 AM9/11/10
to
I'm impressed to find other 6502 golfers here.

I liked combining the 4-bit hashkey with the Bloom filter idea. In
this implementation, each of the 16 threads has its own two-byte wide
Bloom filter. For every word in the dictionary, I generate hash1 to
choose which chain&filter to use, and hash2 and hash3 turn on two of
the bits in the filter. Nothing ever turns bits off, so false
positives are possible, false negatives are not. I could elaborate
more, but it's better coming from the wikipedia article than from me.

I'm not a mathematician, so my hash algorithms might not be very
good. I'll probably use that nybble rotate as one of them. I've only
walked through the code once in the debugger, so I don't know what
will happen after I crunch and insert the entire dictionary. If more
than half of the bits in the filter are on, I'll just double the width
of the filter and tweak the hash2 and hash3 to match the new size
until that stops happening. With suitable filter size I can avoid
having to search the dictionary for things that aren't in it (mostly
numbers).

Bruce also suggested ordering the chains by length (shortest word at
the head), which would be a pretty simple thing to implement. It's
not a binary search, but at least FIND can bail out early if the next
word in the chain is longer than what FIND is searching for.


So here's working code (for xa65) based on K=2 and a 16-bit wide
filter:

;--------------------------------------------------------------
latestl .dsb 16,0 ; .dsb declares arg1 bytes of storage filled with arg2
latesth .dsb 16,0
bloom .dsb 32,0 ; bloom filter for Forth vocabulary
power2 .byt $01,$02,$04,$08,$10,$20,$40,$80

; n is the 8-byte work area
; tos is the top of stack
; stackl & stackh are the split data stack
; $e081 is a ROM address where Bill's easter egg
; (http://www.pagetable.com/?p=43) hides its message in PET
; "Upgrade" version ROMs
; CHROUT is the ROM routine to output a PETSCII character
; (print the character in A)

;--------------------------------------------------------------
;
; DHASH ( nfa -- hash3 hash2 hash1 )
;
; input is NFA or counted string. Output is a trio of hashcodes,
; each generated by a different algorithm but all in the range
; $00 to $0f. Used to break the FORTH vocabulary into smaller
; chains for (FIND) and to implement a Bloom filter for speed.
;
dhashlfa .word $adde
.byt (dhash-*-1)|bit7
.asc "DHAS","H"|bit7
dhash ldy #0
lda (tos),y
and #$3f ; turn off 7 and 6, not 5 (smudge)
tay
sta n
sta n+1
sta n+2
sec ; set to a known state
l009 lda (tos),y ; xor the string together
and #$7f ; turn off bit 7 on the final byte
eor n ; just XOR everything
sta n ; hash1
adc $e081,y ; add the ghost of !TFOSORCIM
ror ; and rotate
sta n+1 ; hash2
rol ; do something else.
rol
eor #$a5
rol
sbc n
sta n+2 ; hash3
dey
bne l009
ldy #2
l049 lda n,y
rol
rol
rol
rol
eor n,y
and #$0f ; normalize hash2 & hash3 to 4 bits and push
dex
sta stackl,x
jsr debug ;~debug
lda #0
sta stackh,x
dey
bne l049
lda n
lsr
lsr
lsr
lsr
eor n
and #$0f
pha
jsr debug ;~debug
lda #',' ;~debug
jsr CHROUT ;~debug
pla
jmp put
;~debug
debug php
sty n+4
stx n+3
sed
cmp #10
adc #'0'
cld
jsr CHROUT
ldy n+4
ldx n+3
plp
rts

;--------------------------------------------------------------
;
; DHASH! ( LFAnew hash3 hash2 hash1 -- )
;
; links the current LFA onto the top of this hash' thread
;
; ~ todo - order them by length
;
; When we're done, the word whose LFAnew is on the stack will become
; the new top of this hash thread, and the new word's LFA will link
; to the previous top of the chain.
;
dhashstorelfa .word $adde
.byt (dhashstore-*-1)|bit7
.asc "DHASH","!"|bit7
dhashstore lda #4
jsr popnwords ; N,N+2,N+4 = hashes, N+6 = LFAnew
lda n
asl
sta n+1

lda n+2
and #7
tay
cmp n+2
beq l058
inc n+1
l058 lda power2,y
ldy n+1
ora bloom,y
sta bloom,y

lsr n+1
asl n+1
lda n+4
and #7
tay
cmp n+4
beq l059
inc n+1
l059 lda power2,y
ldy n+1
ora bloom,y
sta bloom,y

ldy n ; get link from hashkey table -> N2
lda latestl,y
sta n+2
lda latesth,y
sta n+3 ; LFAlatest -> N2
bne l045
lda #<executelfa
sta n+2
lda #>executelfa
sta n+3
l045 lda n+6
sta latestl,y
lda n+7
sta latesth,y ; LFAnew -> hashtable(key)
ldy #0
lda n+2
sta (n+6),y
iny
lda n+3
sta (n+6),y ; LFAlatest -> [LFAnew]
jmp next

----

Anton Ertl

unread,
Sep 11, 2010, 8:25:29 AM9/11/10
to
Hugh Aguilar <hughag...@yahoo.com> writes:
>Even if your hash function is perfectly rectangular (which it won't
>be), you are only reducing your search space by 1/16. That is
>comparable to 4 steps into a binary tree. Then you go into a linked
>list, which is horribly inefficient. This doesn't make any sense.
...

>Symtab has 3 words per node (left pointer, rite pointer and counter),
>whereas a linked list has 1 word per node (fore pointer). This isn't
>that much of difference in memory. For 300 words, that is 1800 bytes
>compared to 600 bytes (1200 bytes difference) --- not a big deal given
>32K of memory.

Well, if we are prepared to use 1200 bytes more for the dictionary, we
could use a hash table with 600 buckets. Each bucket will be filled
with half a word on average. A failing search will take 1/2 string
comparison on average, a successful search will take a little over one
string comparison on average.

Albert van der Horst

unread,
Sep 11, 2010, 9:31:46 AM9/11/10
to
In article <i6ct76$937$1...@speranza.aioe.org>,

What is wrong with the expectation: "I have 16 buckets so I expect a
speed up of approximately 16 or somewhat less, given that my
hash is not perfect."
And the conclusion: "That is good enough for me"

For example the 6809 ciforth is sluggish, with a speedup like that it
becomes a perfectly usable Forth.

Groetjes Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

BruceMcF

unread,
Sep 11, 2010, 1:17:48 PM9/11/10
to
On Sep 11, 5:50 am, chitselb <chits...@gmail.com> wrote:
> Bruce also suggested ordering the chains by length (shortest word at
> the head), which would be a pretty simple thing to implement.  It's
> not a binary search, but at least FIND can bail out early if the next
> word in the chain is longer than what FIND is searching for.

If each hashed list is sorted by length, you have a slightly faster
walk down the linked list to get to the run of names of that
length, and then the next length mismatch is NOTFOUND. If shorter
literals are more common, a list sorted short words first will fail
more quickly on average.

# FIND-IN-LIST ( TARGET$ LIST -- NFA TRUE | TARGET$ FALSE )
# CODE
LDA DL,X ; W = head of list
STA W
LDA DH,X
STA W+1
SEC
LDA DL+1,X ; V = TARGET$ - 2, so (V),Y and (W),Y line up
SBC #2
STA V
LDA DH+1,X
SBC #0
STA V+1
STX TX
LDY #2
LDA (V),Y ; target length
STA LEN
CLC
ADC #2 ; offset of the target's last character
STA LEN2
STARTLP:
LDY #2
LDA (W),Y
AND #$1F
CMP LEN ; walk until the lengths match
BEQ CHARLP
DEY
LDA (W),Y
BEQ NOTFOUND ; any $00xx link is end of list
PHA
DEY
LDA (W),Y
STA W
PLA
STA W+1
BNE STARTLP
NAMELP:
LDY #2
LDA (W),Y
AND #$1F
CMP LEN ; sorted by length, so a mismatch ends the search
BNE NOTFOUND
CHARLP:
INY
LDA (W),Y
CMP (V),Y
BNE WORDLP1
CPY LEN2 ; compared the last character?
BNE CHARLP
BEQ FOUND
WORDLP1:
LDY #1
LDA (W),Y
BEQ NOTFOUND
PHA
DEY
LDA (W),Y
STA W
PLA
STA W+1
BNE NAMELP
FOUND:
LDX TX
LDA W
STA DL+1,X
LDA W+1
STA DH+1,X
LDA #$FF
BNE FLAG
NOTFOUND:
LDA #0
FLAG:
LDX TX ; assumed tail, as in the earlier version:
STA DL,X ; the flag replaces LIST on top of stack
STA DH,X
JMP NEXT

Hugh Aguilar

unread,
Sep 11, 2010, 1:42:57 PM9/11/10
to
On Sep 11, 3:50 am, chitselb <chits...@gmail.com> wrote:
> I'm impressed to find other 6502 golfers here.

The 65c02 was the first target that I ever wrote a cross-compiler for.
Is "golfer" slang for "hobbyist?" Well, I am a hobbyist.

> I'm not a mathematician, so my hash algorithms might not be very
> good.  

There aren't *any* mathematicians on comp.lang.forth. All of this talk
about pseudo-random number generator (prng) design is just an effort
to mush all the characters together. For example, Anton Ertl says
this:

> You want each bit in the input to ideally
> influence half of the bits of the hash value on average. With xor one

> bit influences only one bit. Addition is significantly better and


> almost as cheap on the 6502 (don't CLC after every character, feeding
> the carry back helps distributing the bits).

That is not math; that is mush!

Of course, I'm a "mushematician" too --- I invented the LC53 prng
which I included in my novice package. :-)

> Bruce also suggested ordering the chains by length (shortest word at
> the head), which would be a pretty simple thing to implement.  It's
> not a binary search, but at least FIND can bail out early if the next
> word in the chain is longer than what FIND is searching for.

If you count how many times each word in the dictionary is accessed,
you will find over time that there is a lot of variance. The words
that are accessed a lot are "high-priority" and the words that are
rarely accessed are "low-priority." If one word is accessed several
times more often than another word, then the best way to speed up the
compilation of your application is to make access of the high-priority
word several times faster than access of the low-priority word.

1.) Ideally, your hash table should be arranged so the low-priority
words collide with each other, and the high-priority words don't
collide much at all --- so the high-priority words get buried in short
linked-lists and the low-priority words get buried (more deeply) in
long linked-lists. Unfortunately, the hash table doesn't provide a
mechanism for doing this.

2.) Ideally, your hash table's linked-lists should be arranged so the
low-priority words are at the end of the list, and the high-priority
words are toward the front of the list --- so the high-priority words
get located sooner, with fewer string comparisons being done.
Unfortunately, the hash table doesn't provide a mechanism for doing
this.

The primary predictor for whether a word will get fast access or slow
access, is the order in which the words are entered into the
dictionary. Assuming that new words are appended on the ends of the
linked lists, the earliest defined words will tend to be toward the
front of the linked lists and will get faster access. A word at the
front of the linked-list will get an order of magnitude faster access
than a word at the tail of the linked-list if the list is 10 nodes
long. Considering that you've only got 16 "buckets," your linked-lists
are going to be more like 100 nodes long, so the speed difference
between head and tail is actually two orders of magnitude.

Do you think maybe you ought to put the high-priority nodes in the
front of the list? The frequency that a high-priority word gets
accessed compared to a low-priority word, can be one or even two
orders of magnitude difference!

I'm very unimpressed with all this pseudo-scientific mumbo-jumbo about
hash tables --- the performance of the hash-table depends entirely
upon luck. If your high-priority words get put in the front of the
list, you will get good performance. If your high-priority words get
put in the end of the list, you will get bad performance.
Unfortunately, none of these hash-table experts know the difference
between high-priority words and low-priority words, because they
aren't counting how often the words get accessed. How deep a
particular word is buried in its linked list, is entirely a matter of
luck.

You have to actually count how often the words get accessed, to know
if they are high-priority or low-priority. That is what my symtab
does. As I said before, if you are worried about the memory usage of
symtab, you should still implement symtab just for the purpose of
getting a count on word access. You can rebuild your dictionary to use
a simple linked-list later, to save memory --- and you will get
reasonably good performance if you put the high-priority words in
front of the low-priority words.

You are wasting your time mucking around with hash tables --- you
don't know the difference between high-priority words and low-priority
words --- you are working entirely in the dark, in complete ignorance!

If you really are intent on using a hash table, you should make a swag
(guess) as to which words are high-priority and which are low-priority
--- and you should locate the supposed high-priority words ahead of
the low-priority words in your source-code so they get put in the linked-
lists toward the front. This is pretty haphazard, but it is better
than nothing.

Paul Rubin

unread,
Sep 11, 2010, 1:57:17 PM9/11/10
to
chitselb <chit...@gmail.com> writes:
> I liked combining the 4-bit hashkey with the Bloom filter idea. In
> this implementation, each of the 16 threads has its own two-byte wide
> Bloom filter.

Wait, that doesn't sound useful, such a narrow filter will have all of
its bits set. You need 12 bits or so for each token in the thread. So
if you have 300 symbols and 16 threads (about 19 tokens per thread,
i.e. roughly 228 bits), you want about 30 bytes of Bloom filter in
each thread, which may be more than you can afford.

Hugh Aguilar

unread,
Sep 11, 2010, 1:57:44 PM9/11/10
to
On Sep 10, 5:52 pm, John Passaniti <john.passan...@gmail.com> wrote:

> Wow!  symtab really is amazing!  Apparently it doesn't take any code
> space to implement it!  And as it doesn't use up any code space, it
> follows that it must be infinitely fast!  

Actually, I know exactly how much code space symtab takes to implement
--- because I implemented it.

> I wish I could be like you, Hugh!  

It is a little bit late for you to become a man --- most males do that
in their teenage years.

LOL

Note for Elizabeth Rather --- the above remark was both "homophobic"
and "intolerant." It is time (again!) for you to spring into action
and bravely defend the homosexual community's right to spew. That was
pretty amusing the first time that you did it, and I expect to be
amused again.

http://groups.google.com/group/comp.lang.forth/browse_thread/thread/c37b473ec4da66f1

You're inextricably linked to John Passanati. You have committed
yourself to defending him, so you are going to have to do it again and
again and again.

Uno mas!

BruceMcF

unread,
Sep 11, 2010, 2:49:34 PM9/11/10
to
On Sep 11, 1:57 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

That's 480 extra bytes, 512 total; for that you can have 128 hash
buckets with a page to spare.

Brad

unread,
Sep 11, 2010, 4:31:33 PM9/11/10
to
>
> Even if your hash function is perfectly rectangular (which it won't
> be), you are only reducing your search space by 1/16. That is
> comparable to 4 steps into a binary tree. Then you go into a linked
> list, which is horribly inefficient. This doesn't make any sense.
>
A linked list is not inefficient. Traversing 20 links is fast, since
name mismatch occurs quickly. It could be as fast as 5 or 10 binary
search steps since it's a tighter loop.

Binary dictionary search has been considered and rejected by Forth
implementors for 40 years. There must be a reason.

-Brad

John Passaniti

unread,
Sep 11, 2010, 6:41:30 PM9/11/10
to
On Sep 11, 8:25 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

> Well, if we are prepared to use 1200 bytes more for the dictionary,
> we could use a hash table with 600 buckets.  Each bucket will be
> filled with half a word on average.  A failing search will take
> 1/2 string comparison on average, a successful search will take a
> little over one string comparison on average.

But wait, it gets better! If we were to waste as much memory as Hugh,
we should also factor in the amount of memory wasted in *code* for an
excessively complicated algorithm with unknown performance. So it's
not just wasting 1200 bytes for the dictionary, it's the bytes wasted
on the code as well.

Of course, none of this probably matters as the OP's primary concern
seems to be speed. Hugh's algorithm wastes *many* cycles building a
tree and rebalancing it according to frequency, and then wastes even
more cycles then searching the tree. His hope is that more common
symbols will be found faster, but he doesn't bother to characterize
the average or worst case performance. Why he expects anyone to take
symtab seriously without even a minimal investment on his part to
provide such statistics on a real-world data set is beyond me.

In contrast, the OP can get *guaranteed* O(log2 N) worst-case
performance by using a binary search on a dictionary stored as an
array with zero additional memory overhead (since the pointer used for
the link to the next word wouldn't be necessary and could be used to
build the array). He can have a *guarantee* of no more than 9 probes
to find a word, and he'll do it with far less memory (both in code and
data) and far fewer processor cycles than Hugh's symtab. The code is
dead-simple. The only expense would be moving memory to perform an
insertion sort on the array, but the cost of that is *still* going to
be dramatically cheaper than the kind of manipulation needed by Hugh's
symtab.
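
To make that concrete, here is a sketch in Forth (NAMES, #WORDS,
NAME@ and BSEARCH are my names; COMPARE is the ANS String-wordset
word, used for brevity -- on a Forth-83 system you'd roll your own):

CREATE NAMES 300 CELLS ALLOT  \ sorted array of pointers to counted names
VARIABLE #WORDS  0 #WORDS !
VARIABLE LO   VARIABLE HI

: NAME@ ( i -- c-addr u )  CELLS NAMES + @ COUNT ;

: BSEARCH ( c-addr u -- i true | false )
  0 LO !  #WORDS @ HI !
  BEGIN  LO @ HI @ <  WHILE
    LO @ HI @ + 2/ >R          \ probe the midpoint
    2DUP R@ NAME@ COMPARE
    DUP 0= IF  DROP 2DROP R> TRUE EXIT  THEN
    0< IF  R> HI !  ELSE  R> 1+ LO !  THEN
  REPEAT
  2DROP FALSE ;

For 300 words that is at most 9 probes, and an insertion sort at
definition time keeps NAMES ordered.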

And if the OP still finds O(log2 N) worst-case performance to be
intolerable, then hash tables are an obvious choice, and there he
doesn't need to be a mathematician to come up with a good hash
function. Hash tables have been used for *years* in compilers and
there is plenty of research available in hash functions to draw on to
find something that is both simple and well-distributed. Hugh doesn't
know this because he refuses to do any research on the subject, but
that reality doesn't change just because Hugh refuses to observe it.
Hell, you need nothing more than Google. And once one does that
research, they find there are many variations on hash tables that make
them even more appealing-- open addressing, incremental extensibility,
and so on.

John Passaniti

unread,
Sep 11, 2010, 7:07:31 PM9/11/10
to
On Sep 11, 1:57 pm, Hugh Aguilar <hughaguila...@yahoo.com> wrote:
> Actually, I know exactly how much code space symtab takes to
> implement --- because I implemented it.

Really?

Can you tell us (or provide a reasonable estimate) of how much code
space it will take up implemented in either 6502 assembly or a Forth
for the 6502?

Can you tell us what the average and worst-case performance is on a
real-world data set?

Can you tell us the average time taken to insert elements into the
tree for a real-world data set?

Can you give the total time taken to build the tree for that real-
world data set?

> > I wish I could be like you, Hugh!  
>
> It is a little bit late for you to become a man --- most males do
> that in their teenage years.

I'm not sure what you're referring to, but whatever it is, I'm sure
it's just another one of the impossibly low-bars that you were able to
cross off your life list. For you, being a man is apparently not
about words and works, but about what you did when you were a
teenager? Fascinating.

> LOL

Indeed. If this is the best you can manage when I (and others,
including the OP) have demonstrated the many real problems with
symtab, then yes, LOL.

> Note for Elizabeth Rather --- the above remark was both "homophobic"
> and "intolerant."

It was? I just read it as lame.

Funny that you lecture me about being a man, when you can't defend
your own work with integrity and honesty. Apparently, when I bring up
a list of technical issues with symtab (including the flawed
foundational premise it is based on) and you don't have a rational
response, the best you're able to do is distract away by bringing
Elizabeth into this.

John Passaniti

unread,
Sep 11, 2010, 7:16:02 PM9/11/10
to
On Sep 11, 4:31 pm, Brad <hwfw...@gmail.com> wrote:
> Binary dictionary search has been considered and rejected by
> Forth implementors for 40 years. There must be a reason.

Probably for these reasons:

1. They were on platforms where the search time for a linear list was
fast enough that they didn't care.
2. They didn't have enough words in their dictionary that it
mattered.
3. They didn't have large enough applications that the cumulative
search time mattered.
4. They recognized that search time is only when compiling code and
didn't mind making the programmer wait a bit, especially in systems
that cross-compiled to another platform.
5. They didn't consider alternatives and went for the canonical
implementation.

But in this particular case, the OP has cited very long search times
and is on a slow processor. Throwing away a technique like binary
searching merely because other programmers on other platforms
compiling other applications didn't choose it for whatever unknown
reason doesn't seem like a winning strategy for software design. One
of the essential lessons I get from Charles Moore's work is that you
don't have to be bound to tradition; that you can pick what makes
sense for the particular application.


BruceMcF

unread,
Sep 11, 2010, 9:18:00 PM9/11/10
to
On Sep 11, 1:17 pm, BruceMcF <agil...@netscape.net> wrote:

I think this can be made shorter at the cost of 2 cycles per word only
for those words equal in length to the target:

# FIND-IN-LIST ( TARGET$ LIST -- NFA TRUE | TARGET$ FALSE )
# CODE
   LDA DL,X
   STA W
   LDA DH,X
   STA W+1
   SEC
   LDA DL+1,X
   SBC #2
   STA V
   LDA DH+1,X
   SBC #0
   STA V+1
   STX TX
   LDY #2
   LDA (V),Y      ; target length
   STA LEN
   CLC
   ADC #2         ; offset of the target's last character
   STA LEN2
STARTLP:
   LDY #2
   LDA (W),Y
   AND #$1F
   CMP LEN
   BMI NEXTWORD   ; shorter than the target: keep walking
   BNE NOTFOUND   ; longer: the list is sorted, give up early
CHARLP:
   INY
   LDA (W),Y
   CMP (V),Y
   BNE NEXTWORD
   CPY LEN2       ; compared the last character?
   BNE CHARLP
   BEQ FOUND
NEXTWORD:
   LDY #1
   LDA (W),Y
   BEQ NOTFOUND
   PHA
   DEY
   LDA (W),Y
   STA W
   PLA
   STA W+1
   BNE STARTLP
FOUND:

Elizabeth D Rather

unread,
Sep 11, 2010, 10:23:24 PM9/11/10
to

FORTH, Inc. has used binary searches for many things, dating back to the
70's, but dictionary search isn't one of them. The main problem is that
for a binary search your list has to be maintained in order, so searches
are very fast, but insertions are slow. You're doing dictionary
searches mainly at compile time, so slow insertions are death.

We have always used some sort of hash, with multiple chains so that the
hash selects which chain. Wordlists add an offset or 2nd hash before
the chain is selected. This strategy has delivered good performance on
processors ranging from 8080's to contemporary systems (Swiftforth is
supporting >3,000 words in a dozen or so wordlists, depending on
options). Details of the hash, #chains, etc., vary depending on the
platform.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

chitselb

unread,
Sep 12, 2010, 4:51:47 AM9/12/10
to
Numbers in parens are subject to change to improve performance. I'm
just describing the scheme I'm implementing, with the actual
(untuned) values as they stand right now in the bench-tested code that's
actually checked in.

Child vocabularies will be outside this system and instead maintained
as simple, separate, single chains (like figForth) and won't be
considered further, other than to say that they will work in the final
implementation.

The root FORTH vocabulary ("dictionary") is divided into (16) separate
chained linked lists ("threads") They are ordered by ascending name
length (Thanks for the idea, Bruce!) to allow early exit by FIND and
accessed by a 4-bit hash ("hash1")

I had never heard of this Bloom filter thing before, but after
glancing briefly at the diagram in the Wikipedia article, I now
consider myself enough of an expert to code one up for the 6502 and
fiddle around with it for a few days to see if it will help speed
things up. The current implementation uses two (8-bit) hashes
("hash2" and "hash3") and (2 separate) (256-bit) bit fields. Yes, I
know these values are far from optimal. I'm still playing with it
though, because the idea intrigues me.

Here are my thoughts: The ideal hash1 spreads the (300) words stored
initially, and all subsequently added words, perfectly evenly across
the threads. But the ideal hash2/hash3 functions are the opposite,
and have what I believe a mathematician would call a Bernoulli
distribution. Heads or Tails. East Coast / West Coast. I don't want
hash2 or hash3 to come up "Iowa"

Most things presented to FIND fall into two categories
1) A word that exists in the dictionary
2) A literal number
3) typos

Here's FIND pseudocode
1. DHASH generates hash1, hash2, hash3
2. check the Bloom filter. Goto step 7 on failure
3. use hash1 to choose a thread to scan
4. follow the links until we get to something that matches the length
of this name
5. byte-for-byte string comparison until we match or fail
6. and exhaust words of matching length
7. QUIT sends it to NUMBER

Here's hash1 (on entry, Y holds the name length and n its starting
value, set up as in DHASH above).
sec
l009 lda (tos),y
eor n
sta n
rol n
dey
bne l009
lda n
sta n+1 ; <--- here's hash2


lsr
lsr
lsr
lsr
eor n
and #$0f

sta n

I really don't like hash2, it's too normally distributed. And here's
hash3, which I'm hoping will better distinguish letters from
numbers.

ldy #1
lda (tos),y ; get 1st character after length
sec
sbc #' ' ; normalize so printables are 0..63
and #$3f ; mask off low 6-bits
sta n+2
lda n
lsr
ror
ror
ora n+2
sta n+2 ; hash3

And of course, all the application programmer has to do to destroy the
utility of my simple hash3 is : 1ohno ; : 2phooey ; : 3drat ; :
4hosed ; : 5grrrr ; etc... I mostly wanted to see how it looked once
the bit fields were populated. (Too sparse? Too dense? What does
"Too" mean here?).

Once I get it all up and running, I'll throw the first 10,000 integers
at it (encoded as counted strings) and also throw the entire
dictionary at it to get some benchmark times. Then I'll play with it
some more and if I don't come up with something compellingly good, it
might be time to eliminate this Bloom filter stuff entirely.

Hugh, by "6502 Golf" I am referring to the game-like practice of
squeezing every byte and clock cycle out of a given chunk of code,
expending effort far beyond the bounds that reason or good coding
practice would dictate. Self-modifying code? No problem. I recently
saw a Chuck Peddle interview where he talked about the early days.
When he designed the chip, he says he traveled around some and had the
chance to ask rooms full of engineers at other companies, what else
can be removed? So for me 6502 golf expresses an appreciation of the
zen minimalist spirit captured in the architecture of the chip. You
probably wouldn't catch me playing "Perl golf" or "Java golf"

Alex McDonald

unread,
Sep 12, 2010, 11:15:46 AM9/12/10
to

It's normal to insert the token at the head of the list, not the tail.
Later words then mask earlier words with the same name without special
action, and insertions are cheaper. In which case, the *reverse* of
your advice would apply; define the words as close to their use as
possible.

Caching words that are frequently used has been used in Forth
compilers before. Given that most caching has a space and code
overhead that is often greater than a hashed search, it is rarely
worth the effort. Especially for 300 words.

John Passaniti

unread,
Sep 12, 2010, 4:26:55 PM9/12/10
to
On Sep 11, 10:23 pm, Elizabeth D Rather <erat...@forth.com> wrote:
> FORTH, Inc. has used binary searches for many things, dating
> back to the 70's, but dictionary search isn't one of them.  The
> main problem is that for a binary search your list has to be
> maintained in order, so searches are very fast, but insertions
> are slow.  You're doing dictionary searches mainly at compile
> time, so slow insertions are death.

I guess that's something I want to model to convince myself that it
matters. Keeping the list sorted will move at worst N CELLS bytes
worth of memory (where N is the number of words in the dictionary) on
each insertion. That is potentially a lot, but then so is searching a
large linear list for the words that make up that definition. Most
Forth definitions will require several probes into the dictionary to
compile them, so you have to compare the time taken to move memory to
do the insertion versus the cumulative time needed for a linear search
on all the words of the definition.

What ultimately probably makes linear searching practical for many
traditional Forths is that more often, the words the programmer calls
on will be ones they defined and not lower-level system words. So
those will be found faster. But on most every Forth I've ever used,
the most basic words (such as math, control flow, and stack
manipulation) are some of the first definitions and thus last when
searching, so there is still going to be a lot of cumulative time
spent finding them.

Of course, using a hash, either to select from multiple linear lists
or as a hash table, is going to pay off big. And except for
unrealistic pathological cases that don't happen except by engineering
them, hash tables will win over binary searches. Concerns about "bad"
hash functions are valid, but thankfully, most people don't have to
design hash functions. There has been tons of study in this, and
loads of published results on various takes on hash functions that
people can freely use.

Still, I'm not the kind of guy who takes pretty much anything on
faith, so I'll model the behavior of a binary search and see if the
insertion for each definition is indeed "death". Should be fun.

Paul Rubin

unread,
Sep 12, 2010, 4:33:21 PM9/12/10
to
John Passaniti <john.pa...@gmail.com> writes:
> But on most every Forth I've ever used, the most basic words (such as
> math, control flow, and stack manipulation) are some of the first
> definitions and thus last when searching,

One obvious hack is to build the initial linked lists so that the
builtins are ordered by reverse frequency of use as observed by
analyzing a code corpus. So the most frequently used words will
appear first and fastest.

Andreas

unread,
Sep 12, 2010, 5:06:51 PM9/12/10
to
John Passaniti:

> What ultimately probably makes linear searching practical for many
> traditional Forths is that more often, the words the programmer calls
> on will be ones they defined and not lower-level system words. So
> those will be found faster. But on most every Forth I've ever used,
> the most basic words (such as math, control flow, and stack
> manipulation) are some of the first definitions and thus last when
> searching, so there is still going to be a lot of cumulative time
> spent finding them.

In _most_ cases compile-time reduction is not worth the effort.

In most of my cases I use a classic VOCABULARY scheme. Vocabs unused
by the application, plus a HIDDEN vocab, keep many words out of the
search CONTEXT. In this way the search thread is linear and remains
acceptably short.

In other words: K.I.S.S.

Sp...@controlq.com

unread,
Sep 12, 2010, 5:30:23 PM9/12/10
to
On Sat, 11 Sep 2010, Elizabeth D Rather wrote:
>
> FORTH, Inc. has used binary searches for many things, dating back to the
> 70's, but dictionary search isn't one of them. The main problem is that for
> a binary search your list has to be maintained in order, so searches are very
> fast, but insertions are slow. You're doing dictionary searches mainly at
> compile time, so slow insertions are death.

Elizabeth,

I would have thought that compilation time would take a second seat to
execution time in an embedded system, but I suppose for a commercial
product, compilation benchmarks are an important differentiator, not that
a millisecond or two will make a huge difference to an embedded engineer
for each test cycle.

Surely linear search is only an issue when defining words, or running at
the interactive Ok prompt ... execution time of compiled words should be
unaffected by the choice of hash/linear dictionary lookup, no?


> We have always used some sort of hash, with multiple chains so that the hash
> selects which chain. Wordlists add an offset or 2nd hash before the chain is
> selected. This strategy has delivered good performance on processors ranging
> from 8080's to contemporary systems (Swiftforth is supporting >3,000 words in
> a dozen or so wordlists, depending on options). Details of the hash,
> #chains, etc., vary depending on the platform.
>
> Cheers,
> Elizabeth

As I recall, it was once recommended that small integers (eg: 0..9) be
defined as constants so that they would be found more quickly by the
interpreter, and not only after the search fell off the end of the
dictionary, so to speak. Of course this would potentially be affected by
the BASE value ...

Rob Sciuk

Elizabeth D Rather

unread,
Sep 12, 2010, 10:53:43 PM9/12/10
to
On 9/12/10 11:30 AM, Sp...@ControlQ.com wrote:
> On Sat, 11 Sep 2010, Elizabeth D Rather wrote:
>>
>> FORTH, Inc. has used binary searches for many things, dating back to
>> the 70's, but dictionary search isn't one of them. The main problem is
>> that for a binary search your list has to be maintained in order, so
>> searches are very fast, but insertions are slow. You're doing
>> dictionary searches mainly at compile time, so slow insertions are death.
>
> Elizabeth,
>
> I would have thought that compilation time would take a second seat to
> execution time in an embedded system, but I suppose for a commercial
> product, compilation benchmarks are an important differentiator, not
> that a millisecond or two will make a huge difference to an embedded
> engineer for each test cycle.

Sure, runtime is the most important. But compilation efficiency is
important as a programming convenience, because when you're testing
stuff it's easiest to just recompile every time. If you can compile
even a very large program in <1 sec. it's a much pleasanter work
environment.

> Surely linear search is only an issue when defining words, or running at
> the interactive Ok prompt ... execution time of compiled words should be
> unaffected by the choice of hash/linear dictionary lookup, no?

Well, command-line processing is usually so fast you don't notice. Even
in the 70's that was instantaneous. It's really only compilation that
involves lots of dictionary searches. And of course, execution is
utterly unaffected, as you say.

>> We have always used some sort of hash, with multiple chains so that
>> the hash selects which chain. Wordlists add an offset or 2nd hash
>> before the chain is selected. This strategy has delivered good
>> performance on processors ranging from 8080's to contemporary systems
>> (Swiftforth is supporting >3,000 words in a dozen or so wordlists,
>> depending on options). Details of the hash, #chains, etc., vary
>> depending on the platform.
>

> As I recall, it was once recommended that small integers (eg: 0..9) be
> defined as constants so that they would be found more quickly by the
> interpreter, and not only after the search fell off the end of the
> dictionary, so to speak. Of course this would potentially be affected by
> the BASE value ...

We did some studies on that very early on, and concluded there was no
gain for integers >2, and even 2 was marginal. Nowadays, optimizing
compilers take care of them much better anyway.

John Passaniti

unread,
Sep 13, 2010, 1:29:03 AM9/13/10
to
On Sep 12, 4:33 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> One obvious hack is to build the initial linked lists so
> that the builtins are ordered by reverse frequency of use
> as observed by analyzing a code corpus.  So the most
> frequently used words will appear first and fastest.

If you're going to go there, an even more obvious hack would be that
every time a word was found after a search, link it at the start of
the wordlist. This moves words that are actually used closer to the
front, while words that are never used get pushed to the end. Then
you're not stuck with a static set of frequencies that may not
represent your code's frequencies. It's also adaptive, so as the
particular set of words used changes, those words become faster.
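
A small C sketch of that adaptive scheme (my illustration, not
John's code): on every successful search, splice the word out and
relink it at the head.

#include <string.h>

struct word { struct word *link; const char *name; };

/* Search a wordlist; on a hit, move the entry to the front so
   frequently used words migrate toward the start of the list. */
struct word *find_mtf(struct word **head, const char *name) {
    struct word **slot = head;
    for (struct word *w = *slot; w; slot = &w->link, w = *slot) {
        if (strcmp(w->name, name) == 0) {
            *slot = w->link;   /* unlink from current position */
            w->link = *head;   /* relink at the front */
            *head = w;
            return w;
        }
    }
    return 0;
}

Note that in a real Forth the links live in the dictionary headers,
so relinking reorders them and complicates FORGET-style reclamation;
the sketch ignores that.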

John Passaniti

unread,
Sep 13, 2010, 1:40:11 AM9/13/10
to
On Sep 12, 5:06 pm, Andreas <a....@nospam.org> wrote:
> In _most_ cases compile-time reduction is not worth the effort.

If we're talking about compile-time when using Forth interactively,
then I largely agree. A bit of time after you hit return doesn't
usually matter. That statement of course is predicated on how fast
the machine you're on is. The faster the machine, the less any of
this matters. On a modern desktop system, I'd be surprised if even
the least efficient Forth doesn't keep up with the user. But the OP
did mention *seconds* to compile definitions, so putting forward some
minor optimizations is worth the effort. And when using Forth non-
interactively, I always see benefit in reducing the compile time,
since a large Forth program could potentially take a long time.

> In other words: K.I.S.S.

Simplicity is one thing. Meeting performance expectations is
another. Fast machines with gobs of memory make going for the simple
and stupid much easier than targeting a 6502 running at 1MHz.

Elizabeth D Rather

unread,
Sep 13, 2010, 4:06:26 AM9/13/10
to
On 9/12/10 10:26 AM, John Passaniti wrote:
> On Sep 11, 10:23 pm, Elizabeth D Rather<erat...@forth.com> wrote:
>> FORTH, Inc. has used binary searches for many things, dating
>> back to the 70's, but dictionary search isn't one of them. The
>> main problem is that for a binary search your list has to be
>> maintained in order, so searches are very fast, but insertions
>> are slow. You're doing dictionary searches mainly at compile
>> time, so slow insertions are death.
>
> I guess that's something I want to model to convince myself that it
> matters. Keeping the list sorted will move at worst N CELLS bytes
> worth of memory (where N is the number of words in the dictionary) on
> each insertion. That is potentially a lot, but then so is searching a
> large linear list for the words that make up that definition. Most
> Forth definitions will require several probes into the dictionary to
> compile them, so you have to compare the time taken to move memory to
> do the insertion verses the cumulative time needed for a linear search
> on all the words of the definition.
...

>
> Of course, using a hash, either to select from multiple linear lists
> or as a hash table, is going to pay off big. And except for
> unrealistic pathological cases that don't happen except by engineering
> them, hash tables will win over binary searches. Concerns about "bad"
> hash functions are valid, but thankfully, most people don't have to
> design hash functions. There has been tons of study in this, and
> loads of published results on various takes on hash functions that
> people can freely use.

Yes. We have experimented both with different hash algorithms and
different numbers of linear lists (selected by hash, modified by
wordlist index), and are satisfied that what we use now is close to
optimum.

> Still, I'm not the kind of guy who takes pretty much anything on
> faith, so I'll model the behavior of a binary search and see if the
> insertion for each definition is indeed "death". Should be fun.

I'll be interested in your results!

Rod Pemberton

unread,
Sep 13, 2010, 4:37:30 AM9/13/10
to
"John Passaniti" <john.pa...@gmail.com> wrote in message
news:19e450cd-4fe8-4149...@j2g2000vbo.googlegroups.com...
> On Sep 9, 4:54 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
> > That's relatively expensive on a 6502--
>
> > Relatively expensive? Are you joking? You do know that even
> > the lowly Vic 20 had at least 3.5KB available for the user's
> > app... Yes? Is 128 bytes too much for it to handle?
>
> In terms of addressability, no. In terms of memory consumption that
> could be used for other things, yes. Those 128 bytes out of 3.5k
> represents roughly 4% of the total memory available. I haven't
> programmed on a 6502 for years, but I certainly wouldn't want to give
> up that much memory for something that can more easily be accomplished
> with a simple roll instruction.
>

Ludicrous... 128 bytes is minimal, even with only 4KB. He's running a
single application. When was the last time, or ever for that matter, that
you wrote a 4KB application in assembly?

> > He could put the table into zero page, if his machine has enough
> > free space there. That would shave a few cycles. He could also
> > reduce the size of the random table to 64 bytes - just those
> > between space and underscore, i.e., uppercase and most graphics
> > char's.
>
> Zero page on a 6502 is precious.
>

True, but most of it is usually consumed by the OS proper. I.e.,
unavailable to him.

> You don't waste it with a 128 byte
> lookup table when that table isn't necessary.

If his machine has an OS, he cannot fit even the 64 byte table I mentioned
there... Not sure why you assumed it was going there.

> The OP isn't implementing a hash. He is instead using a hash function to
> select one of 16 possible lists to then search.

That is implementing a hash! Who cares whether he stores the hash values on
a per word or per list basis? It's the same thing. He still has a list of
words, distinguished by a hash value, that he must check, via string
comparisons, to determine the correct word. Those comparisons are the
bottleneck. Using a different, better hash can reduce those comparisons.
How many times do I need to say this?

> So the goal here isn't to find a hashing function that gives unique values
> for each word. The goal here is to find a hashing function that equally
> distributes across the number of possible lists.

And, that reduces - in all cases - to the issue I discussed about reducing
string comparisons.

> > > A simpler solution might be to roll the bits after each exclusive-or
> > > with a character. That's a single instruction, and will provide ample
> > > mixing of bits.
> >
> > It's definitely simpler.  It's definitely fewer cycles.  I'm not sure
> > that it's any better...
>
> It is. You're free to model this yourself and see.
>

I did. I mentioned that too... ROL is really poor. At best, it matches
the 128 byte table with good randomized data. Certain initial seed values
make the table method better. I mentioned that too...

(Some time ago, I mentioned to you the apparent memory failures you exhibit.
They are apparent in your responses to me throughout this thread too.)

> And for that matter, you're free to model your 128 byte table (and
> whatever pseudo-random values you would store there) and measure
> the actual performance instead of assuming it is naturally superior.

I did, the hash part, but in C on x86. I mentioned the results in another
post... I.e., "horrid" to do what he's wanting to do... Do you remember
the "average" and "peak" values I posted? No? Why not?

> Given a poor pseudo-random number generator populating
> those 128 values, you could have exactly the same problem.
> And just as you (correctly) point out that generating a good hash function
> is non-trivial, generating a good pseudo-random number generator is
> equally difficult. It is in fact, the same basic problem.
>

Laughable...

Almost no one would code a PRNG to "populate those 128 values" when there
are numerous high quality PRNG's publicly available. This is another insane
claim by you. In fact, I'd call it an attempt at FUD: fear, uncertainty,
doubt. Anyone could easily compile a PRNG such as George Marsaglia's
KISS4691 or Matsumoto and Nishimura's Mersenne Twister on some other
platform, e.g., x86 or AA-64, to generate a precomputed table.


Rod Pemberton


Rod Pemberton

unread,
Sep 13, 2010, 5:06:13 AM9/13/10
to
"BruceMcF" <agi...@netscape.net> wrote in message
news:123d2e09-e393-4d1a...@g10g2000vbc.googlegroups.com...
> On Sep 8, 7:16 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
> wrote:
> > The CFA, while it typically points to the DOCOL routine for most
> > words, it doesn't for all. By eliminating the CFA, you won't be
> > able to distinguish between compiled words and low-level routines
> > (aka "primitives") or words that use a non-DOCOL CFA: dovar,
> > docon, douse, dodoes, etc. Eliminating the CFA for compiled words
> > is not an issue, but for the other words it is - since they won't
> > be calling DOCOL. You'd need a way to determine when you'd reached
> > one of these words instead of a compiled word. I.e., the nesting
> > of addresses in compiled words is much like traversing a binary
> > tree and a non-DOCOL CFA is used to tell you when you've reached a
> > node without any branches...
>
> But there's no need to "know" ... you jumped TO the word, so each word
> starts with executable machine code, whether the code of the
> primitives or JSR ENTER, JSR DOVAR, etc. The word itself "knows" what
> to do.
>

You cannot jump to a compiled word without a CFA... It's not assembly.
It's a list of addresses. Jumping to a compiled word requires a CFA set to
DOCOL. This is the exact opposite of what I just described! It's the
historical model. I described *eliminating* the CFA for DOCOL words.

If the CFA is eliminated for all DOCOL words, some other method of address
interpretation is needed and a method for identification of non-DOCOL words
is necessary. If the DOCOL CFA's are eliminated, compiled Forth words
become a large linked-list of addresses, except for the handful of non-DOCOL
words. I.e., a DOCOL free Forth using linked-list can save some bytes.


Rod Pemberton


Albert van der Horst

unread,
Sep 13, 2010, 8:50:34 AM9/13/10
to
In article <440b6150-9def-46d7...@j19g2000vbh.googlegroups.com>,
John Passaniti <john.pa...@gmail.com> wrote:

>On Sep 11, 4:31 pm, Brad <hwfw...@gmail.com> wrote:
>> Binary dictionary search has been considered and rejected by
>> Forth implementors for 40 years. There must be a reason.
>
>Probably for these reasons:
>
>1. They were on platforms where the search time for a linear list was
>fast enough that they didn't care.
>2. They didn't have enough words in their dictionary that it
>mattered.
>3. They didn't have large enough applications that the cumulative
>search time mattered.
>4. They recognized that search time is only when compiling code and
>didn't mind making the programmer wait a bit, especially in systems
>that cross-compiled to another platform.
>5. They didn't consider alternatives and went for the canonical
>implementation.

6. It makes the redefinition of words harder, especially if you
want FORGET to work.

7. It breaks tricks ^H^H^H^H^H techniques for numbers like the
$.. $.... words in figForth and denotations in ciforth.

There is an expectation of words to be in order of definition.
A hash implementation doesn't break that expectation.

>But in this particular case, the OP has cited very long search times
>and is on a slow processor. Throwing away a technique like binary
>searching merely because other programmers on other platforms
>compiling other applications didn't choose it for whatever unknown
>reason doesn't seem like a winning strategy for software design. One
>of the essential lessons I get from Charles Moore's work is that you
>don't have to be bound to tradition; that you can pick what makes
>sense for the particular application.

Agreed. What Chuck Moore does in colorforth is just use an array
and linear search. So no linking of data structures there.
Adding binary search would be a snap, but replacing a blindingly fast
Pentium string-search instruction that races through a couple of
hundred words apparently isn't worth it.

Albert van der Horst

unread,
Sep 13, 2010, 9:00:38 AM9/13/10
to
In article <f6108d06-b64e-4c1c...@e20g2000vbn.googlegroups.com>,
chitselb <chit...@gmail.com> wrote:
<SNIP>

>Here are my thoughts: The ideal Hash1 has a perfect normal
>distribution for the (300) words stored initially and all subsequently
>added words. But the ideal Hash2/Hash3 functions are the opposite,

As you remark later, hashing doesn't lead to a normal distribution.
The chance that a certain hash occurs -100 times is zero, whereas a
normal distribution assigns a non-zero (small) probability everywhere.
The distribution is also discrete: the chance of a hash occurring 7.3
times is zero.

Paul Rubin

unread,
Sep 13, 2010, 11:07:49 AM9/13/10
to
Albert van der Horst <alb...@spenarnc.xs4all.nl> writes:
>>Here are my thoughts: The ideal Hash1 has a perfect normal
> As you remark later, hashing doesn't lead to a normal distribution,

The term Chitselb wanted is "uniform distribution" (all probabilities
equal). "Normal distribution" is the famous bell-shaped curve.

John Passaniti

unread,
Sep 13, 2010, 11:55:12 AM9/13/10
to
On Sep 13, 4:37 am, "Rod Pemberton" <do_not_h...@notreplytome.cmm>
wrote:

> Ludicrous...  128 bytes is minimal, even with only 4KB.  He's
> running a single application.  When was the last time, or ever
> for that matter, that you wrote a 4KB application in assembly?

Last year. 8k of hand-optimized DSP code. Before that, 6k of
assembly-language code I wrote originally in 1991 for an 8051 that I
was asked to port to an HCS08 processor.

But ignoring that, I don't see what the amount of assembly language I
write has to do with anything. Memory is memory, why does it matter
what the code is written in? What, are the bits thinner when writing
in assembler?

> > Zero page on a 6502 is precious.
>
> True, but, most of it is usually consumed by the OS proper.  I.e.,
> unavailable to him.

Ummm, no. The amount of data used by the "OS" (you do know we're
talking about a Commodore PET, right?) is minimal; most of that is
indeed available to the application, and you're not going to put a
table of 128 bytes there.

> > You don't waste it with a 128 byte
> > lookup table when that table isn't necessary.
>
> If his machine has an OS, he cannot fit even the 64 byte table I mentioned
> there...  Not sure why you assumed it was going there.

It was originally 128 bytes of zero page. Now it's 64 bytes. How
about not use *any* of it? You're hung up on the idea that you need a
table to mix up the bits of the ASCII values. That's certainly one
way, but not the only one. Instead of talking abstractly about this,
you are invited to provide that table and that hash algorithm that
uses that table. I will then gleefully demonstrate any number of hash
routines that are shorter and faster that provide *better* results.

> > The OP isn't implementing a hash.  He is instead using a hash function to
> > select one of 16 possible lists to then search.
>
> That is implementing a hash!  Who cares whether he stores the hash values on
> a per word or per list basis?  It's the same thing.  He still has a list of
> words, distinguished by a hash value, that he must check, via string
> comparisons, to determine the correct word.  Those comparisons are the
> bottleneck.  Using a different, better hash can reduce those comparisons.
> How many times do I need to say this?

The difference is in the design of the hash function. In a hash
table, the goal for the hash function is uniqueness given some number
of slots. In this scheme, the goal is equal distribution of the
values from the hash function. Structurally, what the OP is doing is
a degenerate hash table that is fully loaded and won't ever be
resized. I see that as different in that it places different
requirements on the hash function. If you don't see that as
different, that's fine with me.

> > So the goal here isn't to find a hashing function that gives unique values
> > for each word.  The goal here is to find a hashing function that equally
> > distributes across the number of possible lists.
>
> And, that reduces - in all cases - to the issue I discussed about reducing
> string comparisons.

Exactly, which is why the hash function needs to be different from
what would normally be desired for a hash table; it has different
requirements. In particular, the OP would ideally want a hash
function that resulted in an even distribution, but that further made
subsequent string comparisons quick. That means the hash function
could be designed so words with the same length and starting letters
are placed in different lists. For example, REFILL, REPEAT, and
RESIZE would ideally be in different lists as would FACOS, FALOG,
FALSE, FASIN, and FATAN. No, it's not a requirement the OP had, but
you're right

> > It is.  You're free to model this yourself and see.
>
> I did.  I mentioned that too...  ROL is really poor.  At best, it matches
> the 128 byte table with good randomized data.  Certain initial seed values
> make the table method better.  I mentioned that too...

What a bizarre reply-- ROL is really poor, but at best it *matches*
the 128 byte table. So if it "at best" is no better than a wasteful
table of 128 bytes, why would you choose to waste 128 bytes on the
table and additional bytes to address that table? Wow, when you're
fixated on a particular solution-- even when simpler and smaller ones
that are "at best" equivalent exist-- you don't let go, do you?

> (Some time ago, I mentioned to you the apparent memory failures
> you exhibit. They are apparent in your responses to me throughout
> this thread too.)

My memory is fine. I will admit fatigue at trying to follow your
constantly shifting arguments.

> > And for that matter, you're free to model your 128 byte table (and
> > whatever pseudo-random values you would store there) and measure
> > the actual performance instead of assuming it is naturally superior.
>
> I did, the hash part, but in C on x86.  I mentioned the results in another
> post...  I.e., "horrid" to do what he's wanting to do...  Do you remember
> the "average" and "peak" values I posted?  No?  Why not?

Yep, you gave off-the-cuff numbers that were wrong. So I ignored it.
The OP is trying to store around 300 words. You wrote that a good
distribution would result in 32 words per bucket. Ummm, no. The math
here is kind of tricky, but if you have 300 words and 16 buckets, then
a good distribution would be, gosh, 300/16 or about 19 words per
bucket. Moving on, you wrote that a poor distribution would result in
"say 48 or 56" words per bucket. Now call me old fashioned, but if
you actually wrote code and measured a poor distribution, qualifying
words like "say" probably wouldn't be in your results.

So from this, I concluded that you didn't write any code to test the
distribution and were just pulling numbers out of thin air. Or, if I
wanted to be charitable, you did write code and did do a test, but you
didn't test against what the OP is looking for. Which makes the
numbers useless because you're just offering generic observations, not
something helpful to the OP.

Unlike you, I like to not only test my claims, but show the code for
those tests. So to do that, I started with a relevant dataset-- I
took the 359 unique Forth words in the ANS Standard, which I assume to
be a superset of the words the OP has in his dictionary. I then
slapped together a script that takes several simple hash functions and
tests them. As a testbench, this lets me play with various hash
functions and actually see the results. Here's the code:

http://japanisshinto.com/stuff/hashtest.pl

What I get out of this is a display that shows each algorithm and the
delta of the number of entries in that bucket from the ideal. For
example, here is the result taking a hash function that does an
exclusive OR followed by a roll:

-3 9 -4 0 -1 -1 -1 6 1 2 -5 0 0 -1 -3 2

Not fabulous-- the buckets with 9, 6, and -5 aren't great. But this
actually was the best for 16 buckets of the simple algorithms I
tried. Isn't science fun?!
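
For reference, a C sketch of the kind of exclusive-OR-plus-roll hash
tested above (my reconstruction; the exact rotate amount and seed
used in hashtest.pl may differ):

#include <stdint.h>
#include <stddef.h>

/* XOR each character into an 8-bit accumulator, rotate left one
   bit per character, then keep the low 4 bits to pick 1 of 16
   lists.  Seeding with the length is my assumption. */
static unsigned hash16(const char *name, size_t len) {
    uint8_t h = (uint8_t)len;               /* fold in the length */
    for (size_t i = 0; i < len; i++) {
        h ^= (uint8_t)name[i];
        h = (uint8_t)((h << 1) | (h >> 7)); /* 8-bit ROL */
    }
    return h & 0x0F;                        /* one of 16 buckets */
}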

So now, armed with this, you are invited to give me your best hash
function. I'll plug it in my testbench and we'll actually see if your
hash function is any better. And gosh, I'm even giving you the
dataset I'm testing against, so you're free to engineer the "random"
values to be less than random and tuned for this particular dataset.
Only requirement here is that your hash function does what the OP
wants-- returns a number between 0 and 15.

Time to show me wrong, Rod. Stop talking and start coding.

Brad

unread,
Sep 13, 2010, 12:02:37 PM9/13/10
to

Well, that is just too clever. But simple classic hashing should solve
this search time problem that was prevalent (and solved) 40 years ago.
If it's too slow, use more buckets. If you have enough buckets, it's
faster than binary search.

The OP seems to be under the impression that string comparison takes a
long time. Yeah, but only if the two strings are almost identical.
First of all, the string length probably won't be a match -- a quick
exit. Then the first characters probably won't match. So for almost
all of the string comparisons, you don't get to compare more than one
or two bytes.
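
In C, the comparison Brad describes might look like this sketch
(mine), where the length byte and first character reject most
candidates immediately:

/* Compare two counted strings (length byte first, as in Forth
   headers).  Mismatched lengths exit at once; otherwise the loop
   usually fails on the first character. */
static int name_match(const unsigned char *a, const unsigned char *b) {
    unsigned len = a[0];
    if (len != b[0]) return 0;        /* quick exit on length */
    for (unsigned i = 1; i <= len; i++)
        if (a[i] != b[i]) return 0;   /* quick exit on mismatch */
    return 1;
}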

-Brad

Hugh Aguilar

unread,
Sep 13, 2010, 1:27:57 PM9/13/10
to
On Sep 12, 2:33 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

> John Passaniti <john.passan...@gmail.com> writes:
> > But on most every Forth I've ever used, the most basic words (such as
> > math, control flow, and stack manipulation) are some of the first
> > definitions and thus last when searching,
>
> One obvious hack is to build the initial linked lists so that the
> builtins are ordered by reverse frequency of use as observed by
> analyzing a code corpus.  So the most frequently used words will
> appear first and fastest.

How are you going to analyze a code corpus? (and btw, doesn't "corpus"
refer to a dead body?) Are you planning on writing a lint-like pseudo-
compiler that goes through the motions of compiling but actually just
counts the words? It is easy to think of grandiose ideas like this,
but less easy to implement them --- by comparison, I have implemented
symtab. Wouldn't it be a lot easier just to use symtab? Then you get
an analysis not of a dead body, but of your own application. As I said
earlier, once you get your data you can always switch over to a linked
list later on to save memory. A linked list ordered from high-priority
to low-priority words is a lot more efficient than a linked list
ordered randomly --- but it is not as efficient as a symtab tree with
the high-priority words at the top, because a binary tree cuts off
roughly 1/2 of the words with every comparison, whereas a linked list
cuts off exactly *one* word with every comparison.

On the plus side though, you are at least starting to consider the
importance of putting high-priority words in front of low-priority
words. As I said before, the high-priority words get accessed between
one and two orders of magnitude more often than the low-priority
words. If you can make the high-priority words an order of magnitude
faster than the low-priority words (this would be a symtab tree 10
levels deep), then you are going to get a speed boost of two to three
orders of magnitude over a randomly populated linked list.

On Sep 11, 2:31 pm, Brad <hwfw...@gmail.com> wrote:
> > Even if your hash function is perfectly rectangular (which it won't
> > be), you are only reducing your search space by 1/16. That is
> > comparable to 4 steps into a binary tree. Then you go into a linked
> > list, which is horribly inefficient. This doesn't make any sense.
>
> A linked list is not inefficient. Traversing 20 links is fast, since
> name mismatch occurs quickly. It could be as fast as 5 or 10 binary
> search steps since it's a tighter loop.

Even a simple binary tree is going to be more efficient than a linked
list --- as I said above, with a binary tree you cut out roughly half
of the words with each comparison, compared to a linked list that cuts
out exactly *one* word with each comparison. It is true that a list
comparison is faster than a tree comparison because all you need to do
is check for equivalence rather than greater or less than, but this
speed difference is pretty minor. With a binary tree you have a lot
more information to work with (greater or less than, compared to
simple equal or not equal), and you can do a lot more with this
information --- cut out half the needed comparisons.

A linked list really has no virtue except that it uses less memory
because it has only one link pointer. I wouldn't even describe a
linked list as being easier to implement than a binary tree --- they
are both pretty easy.

> Binary dictionary search has been considered and rejected by Forth
> implementors for 40 years. There must be a reason.

This goes back to what I said earlier about how Luke Skywalker asked
Yoda why something was true, and Yoda replied: "There is no why!" When
Skywalker asks "why," he gets yelled at --- the "force" is not
science, but merely religion. You basically want to be Yoda --- you
want to be the big-shot sensei while I accept my role as grasshopper
(student) --- and you don't want to explain your reasons, because you
don't have any.

Realistically, the "reason" why Chuck Moore used a linked-list
dictionary was that it was just an expedient hack. I doubt that he
expected people to still be grasshoppering this idea in the year 2010.
The linked-list dictionary is cruft.

Sp...@controlq.com

unread,
Sep 13, 2010, 1:28:26 PM9/13/10
to

On Sun, 12 Sep 2010, Elizabeth D Rather wrote:
> On 9/12/10 11:30 AM, Sp...@ControlQ.com wrote:
>>
>> Elizabeth,
>>
>> I would have thought that compilation time would take a second seat to
>> execution time in an embedded system, but I suppose for a commercial
>> product, compilation benchmarks are an important differentiator, not
>> that a millisecond or two will make a huge difference to an embedded
>> engineer for each test cycle.
>
> Sure, runtime is the most important. But compilation efficiency is important
> as a programming convenience, because when you're testing stuff it's easiest
> to just recompile every time. If you can compile even a very large program
> in <1 sec. it's a much pleasanter work environment.

Agreed. This does, of course, bring up FORGET, and its obsolescence. If
you are simply re-compiling a program, how does one return the embedded
system to its pristine state in the absence of a FORGET/FENCE mechanism
(or similar) word? This is particularly important in an embedded
environment ... unless ...

>
>>
>> As I recall, it was once recommended that small integers (eg: 0..9) be
>> defined as constants so that they would be found more quickly by the
>> interpreter, and not only after the search fell off the end of the
>> dictionary, so to speak. Of course this would potentially be affected by
>> the BASE value ...
>
> We did some studies on that very early on, and concluded there was no gain
> for integers >2, and even 2 was marginal. Nowadays, optimizing compilers
> take care of them much better anyway.
>
> Cheers,
> Elizabeth

It seems that frequently comparison and contrast is made between what is
the classic "FIG?" model, and what modern compilers are doing -- usually
with a message which impugns (?) the historical approach.

Is it that modern Forth systems consist entirely of optimizing X-compilers
which produce dense machine code for embedded systems or is an interactive
model still in vogue? Does the VM/interpreter concept still exist or has
it been abstracted away? (I'm making reference here to the embedded sector
primarily, rather than the "hosted" environment).

I'm not sure that I understand what is meant by "modern compilers" in the
context of Forth, and I'm not trying to be facetious here. I'd really
like to understand how the language has evolved -- at least in commercial
variants. Are there pointers to reference materials which would help me
to understand the evolution??

Cheers,
Rob Sciuk

Hugh Aguilar

unread,
Sep 13, 2010, 1:54:26 PM9/13/10
to

I had been assuming that the words were appended to the linked list.
If they are prepended, then the earlier words (often also the more
common primitives) would be last to be found --- that would be
horrible!

You are better off to append rather than prepend. Assuming that the
earlier words are more common (pretty much true), you would pretty
much get high-priority words in front of low-priority words. As I said
before, this is haphazard, but it is better than nothing.

Appending is slightly slower than prepending (because the list has to
be traversed), but this only matters when new words are created, which
only happens once per word --- compared to when words are found by
FIND, which can happen many hundreds of times in an application. Also,
traversing a list is very fast, as all you have to look for is the NIL
link at the end.

> Caching words that are frequently used has been used in Forth
> compilers before. Given that most caching has an space and code
> overhead that is often greater than a hashed search, it is rarely
> worth the effort. Especially for 300 words.

Caching is similar to a splay tree, in that it assumes that recently
accessed words are likely to be accessed again in the near future ---
an assumption that has no basis in regard to compilation. I don't
recommend caching for the same reason that I don't recommend splay
trees --- there is no reason to believe that if you reference a
particular Forth word you are going to access it again very soon. The
exact opposite is true; if your function calls a particular sub-
function, it is not going to call it again immediately.

On Sep 12, 3:30 pm, S...@ControlQ.com wrote:
> On Sat, 11 Sep 2010, Elizabeth D Rather wrote:
>
> > FORTH, Inc. has used binary searches for many things, dating back to the
> > 70's, but dictionary search isn't one of them.  The main problem is that for
> > a binary search your list has to be maintained in order, so searches are very
> > fast, but insertions are slow.  You're doing dictionary searches mainly at
> > compile time, so slow insertions are death.
>
> Elizabeth,
>
> I would have thought that compilation time would take a second seat to
> execution time in an embedded system, but I suppose for a commercial
> product, compilation benchmarks are an important differentiator, not that
> a millisecond or two will make a huge difference to an embedded engineer
> for each test cycle.
>
> Surely linear search is only an issue when defining words, or running at
> the interactive Ok prompt ... execution time of compiled words should be
> unaffected by the choice of hash/linear dictionary lookup, no?

Elizabeth Rather is implying that creating words and inserting them
into the dictionary is done at compile-time, but looking up words is
done at run-time. This is complete nonsense. Words are *both* created
and looked up during compile-time --- none of this happens at run-
time.

Most applications don't even have the dictionary in memory during run-
time, and many commercial compilers (including SwiftForth, which E.R.
should be familiar with), disallow having the dictionary in memory
during run-time to prevent the users from selling a Forth compiler in
competition with the compiler that it was written in. Occasionally,
you will find Forth applications that keep the dictionary in memory at
run-time, and do compilation at run-time, but this is very rare.

As I said above, a word only gets created once, but it can be looked
up by FIND many hundreds of times during the compilation of an
application. The best way to speed up compilation time for an
application, is to make finding the words that get accessed hundreds
of times (the high-priority words) an order of magnitude faster than
finding the words that get accessed once or twice (the low-priority
words). One order of magnitude increase in speed for a word that is
accessed two orders of magnitude more than the others, will result in
a three order of magnitude increase in speed for the compilation.

Elizabeth D Rather

unread,
Sep 13, 2010, 1:54:31 PM9/13/10
to
On 9/13/10 7:28 AM, Sp...@ControlQ.com wrote:
>
> On Sun, 12 Sep 2010, Elizabeth D Rather wrote:
>> On 9/12/10 11:30 AM, Sp...@ControlQ.com wrote:
>>>
>>> Elizabeth,
>>>
>>> I would have thought that compilation time would take a second seat to
>>> execution time in an embedded system, but I suppose for a commercial
>>> product, compilation benchmarks are an important differentiator, not
>>> that a millisecond or two will make a huge difference to an embedded
>>> engineer for each test cycle.
>>
>> Sure, runtime is the most important. But compilation efficiency is
>> important as a programming convenience, because when you're testing
>> stuff it's easiest to just recompile every time. If you can compile
>> even a very large program in <1 sec. it's a much pleasanter work
>> environment.
>
> Agreed. This does, of course bring up FORGET, and its obsolescence. If
> you are simply re-compiling a program, how does one return the embedded
> system to its pristine state in the absence of a FORGET/FENCE mechanism
> (or similar) word? This is particularly important in an embedded
> environment ... unless ...

To restore the dictionary to a predetermined state requires merely
saving a little state information: in our systems, heads of all the
dictionary threads, CONTEXT, CURRENT, and dictionary pointer. MARKER
records a new set. Operationally, though, I rarely use more than the
restore to the state of the precompiled system, since it's so fast to
recompile everything. Last time we went around the FORGET issue a lot
of people said the same thing.

...


>
> It seems that frequently comparison and contrast is made between what is
> the classic "FIG?" model, and what modern compilers are doing -- usually
> with a message which impugns (?) the historical approach.
>
> Is it that modern Forth systems consist entirely of optimizing
> X-compilers which produce dense machine code for embedded systems or is
> an interactive model still in vogue? Does the VM/interpreter concept
> still exist or has it been abstracted away? (I'm making reference here
> to the embedded sector primarily, rather than the "hosted" environment).

The common practice nowadays is interactive cross-compilers, where the
host (PC running Windows or Linux) does all the compiling for the target
and automatically downloads code to the target. A searchable target
dictionary remains in the host, and an umbilical connection enables you
to type a command line on the host which will be executed on the target.
The overall effect is exactly like interactive development in
traditional Forth systems, except that the target isn't burdened with
the overhead of dictionary management and a text interpreter. With the
compiler actually running on the host, it can do the work of generating
optimized machine code, so you get smaller, faster target programs.

> I'm not sure that I understand what is meant by "modern compilers" in
> the context of Forth, and I'm not trying to be facetious here. I'd
> really like to understand how the language has evolved -- at least in
> commercial variants. Are there pointers to reference materials which
> would help me to understand the evolution??

There's an overview here: http://www.forth.com/embedded/index.html. You
can also get a free evaluation version of SwiftX for your favorite
target, limited only in the size of the target program you can generate,
to try it out further. It comes with target source and extensive
documentation.

Hugh Aguilar

unread,
Sep 13, 2010, 2:15:37 PM9/13/10
to
On Sep 11, 6:25 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:

> Hugh Aguilar <hughaguila...@yahoo.com> writes:
> >Even if your hash function is perfectly rectangular (which it won't
> >be), you are only reducing your search space by 1/16. That is
> >comparable to 4 steps into a binary tree. Then you go into a linked
> >list, which is horribly inefficient. This doesn't make any sense.
> ...
> >Symtab has 3 words per node (left pointer, rite pointer and counter),
> >whereas a linked list has 1 word per node (fore pointer). This isn't
> >that much of difference in memory. For 300 words, that is 1800 bytes
> >compared to 600 bytes (1200 bytes difference) --- not a big deal given
> >32K of memory.
>
> Well, if we are prepared to use 1200 bytes more for the dictionary, we
> could use a hash table with 600 buckets.  Each bucket will be filled
> with half a word on average.  A failing search will take 1/2 string
> comparison on average, a successful search will take a little over one
> string comparison on average.
>
> - anton

The flaw in this plan is that your hash function (adding all the
characters together) is only good for 4 or 5 bits (16 or 32 buckets).
Also, the 6502 doesn't have division, so you can't have a hash table
with 600 buckets --- it has to be either 512 or 1024. It seems
unlikely that you are going to upgrade your 4-5 bit hash function into
a 9-10 bit hash function. For one thing, the 6502 is very much an 8-
bit processor --- any upgrade from 8-bit arithmetic to 16-bit
arithmetic involves a huge speed penalty. Also, as I've said many
times, multiplication is the only function that is theoretically
rectangularly-distributed --- and the 6502 doesn't have
multiplication. If you try to use addition, that is mush not math.

Since you seem to be fascinated by hash functions, let me suggest a
simple hash function --- just use the length byte of the string. If
your names are limited to 31 characters, you will have 31 buckets ---
the first bucket will be of one-character words, the second bucket
will be of two-character words, etc.. This won't be very close to a
rectangular distribution, but it won't be much worse than what you are
getting with your addition-based hash function. If you do this, you
will get the following benefits:

1.) The hash will be extremely fast because no calculation is being
done. This will work especially well in Forth because Forth uses
counted strings --- as compared to C which uses zero-terminated
strings and would require a traversal of the string to determine its
length.

2.) You will save memory because you won't have to store the count
byte for every string in the dictionary. You know how long each string
is according to which linked list it is in.

3.) The string comparison will be slightly faster because it won't
have to compare the length bytes of the strings every time, as it will
know ahead of time that the string in the list is the same length as
the target string.
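
A C sketch of the length-bucket idea (my illustration; names and
limits are arbitrary):

#include <string.h>

#define MAX_NAME 31

struct word { struct word *link; char name[MAX_NAME]; };
struct word *chains[MAX_NAME + 1]; /* chains[n]: n-character names */

/* The length alone selects the chain -- no arithmetic at all --
   and the stored names need no count byte of their own. */
struct word *find_by_length(const char *name, unsigned len) {
    for (struct word *w = chains[len]; w; w = w->link)
        if (memcmp(w->name, name, len) == 0) /* lengths match already */
            return w;
    return 0;
}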

Our OP is targeting a Commodore PET --- it is a hobbyist project. The
algorithm I have described above is very easy to implement, which is
important to somebody working on the weekends. This simple algorithm
is also pretty comparable in efficiency to all of the complicated
pseudo-scientific hash algorithms that have been suggested so far
(especially yours).

Stephen Pelc

unread,
Sep 13, 2010, 2:21:17 PM9/13/10
to
On Mon, 13 Sep 2010 13:28:26 -0400, Sp...@ControlQ.com wrote:

>It seems that frequently comparison and contrast is made between what is
>the classic "FIG?" model, and what modern compilers are doing -- usually
>with a message which impugns (?) the historical approach.

Not to impugn, but to remind people that the technical aspects of
Forth have changed (and improved IMHO) since the 1980s.

>Is it that modern Forth systems consist entirely of optimizing X-compilers
>which produce dense machine code for embedded systems or is an interactive
>model still in vogue? Does the VM/interpreter concept still exist or has
>it been abstracted away? (I'm making reference here to the embedded sector
>primarily, rather than the "hosted" environment).

Optimising compilers such as VFX exist in both native (desktop) form
and as cross compilers. You can even have them on (big-enough)
embedded systems.

This is quite separate from whether a target-compiled system contains
an interactive Forth system. Personally, I much prefer debugging
on an open interactive system. Most MPE targets can be interactive
or tethered/Umbilical.

>I'm not sure that I understand what is meant by "modern compilers" in the
>context of Forth, and I'm not trying to be facetious here. I'd really
>like to understand how the language has evolved -- at least in commercial
>variants. Are there pointers to reference materials which would help me
>to understand the evolution??

Modern compilers for recent CPUs generate optimised native code. This
is usually ten or more times faster (we've measured it) than classic
threaded code. A 10:1 change in performance is enabling technology.
It's what permits us to write TCP/IP stacks or USB stacks entirely
in high-level Forth. I haven't written a serious interrupt routine
in Forth in the last ten years or more - the compiler is that good
on an ARM.

Stephen


--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

Sp...@controlq.com

unread,
Sep 13, 2010, 2:30:11 PM9/13/10
to
On Mon, 13 Sep 2010, Elizabeth D Rather wrote:

> There's an overview here: http://www.forth.com/embedded/index.html. You can
> also get a free evaluation version of SwiftX for your favorite target,
> limited only in the size of the target program you can generate, to try it
> out further. It comes with target source and extensive documentation.
>
> Cheers,
> Elizabeth
>

Thanks for the Link, Elizabeth. It appears that the STK-500/AVR board on
my desk here is reported as "not recommended". I'm not sure what this
means. Oh well.

Cheers,
Rob

Hugh Aguilar

unread,
Sep 13, 2010, 2:39:59 PM9/13/10
to
On Sep 13, 11:54 am, Elizabeth D Rather <erat...@forth.com> wrote:

> On 9/13/10 7:28 AM, S...@ControlQ.com wrote:
> > I'm not sure that I understand what is meant by "modern compilers" in
> > the context of Forth, and I'm not trying to be facetious here. I'd
> > really like to understand how the language has evolved -- at least in
> > commercial variants. Are there pointers to reference materials which
> > would help me to understand the evolution??
>
> There's an overview here:http://www.forth.com/embedded/index.html. You
> can also get a free evaluation version of SwiftX for your favorite
> target, limited only in the size of the target program you can generate,
> to try it out further.  It comes with target source and extensive
> documentation.

It is a very bad idea to ask a commercial-Forth salesperson for
"pointers to reference materials which would help me to understand
[Forth's] evolution." That is like asking a GM salesman for
information on the history of automobiles --- his history is unlikely
to mention the fact that GM is bankrupt.

Elizabeth Rather is a salesperson. Her entire involvement on
comp.lang.forth is oriented toward inducing people to spend $500 on
SwiftForth or SwiftX.

I spent the $500 and it was a huge mistake; the product is worthless.
Use your LOCATE function to look at the source code, and you will see
how bad it is. Here is SwiftForth's addition-based hash function
roughly comparable in efficiency to Anton's suggestion of an expedient
solution for the Commodore PET:

{ ------------------------------------------------------------------------
The dictionary strands are hashed to improve search speed. This
routine takes a string and a number of strands and computes which
strand the word will be in. The routine works best for prime N .

: HASH ( c-addr u n -- hash )
   >R 0 -ROT BOUNDS DO
      4 LSHIFT I C@ $20 or +
      DUP $F0000000 AND ?DUP IF 24 RSHIFT XOR $0fffffff and THEN
   LOOP R> ( #STRANDS) MOD ;

HASHED takes a string and a WID, and returns the address of the strand
that the string belongs in.
------------------------------------------------------------------------ }

CODE HASH ( c-addr u n -- hash )
   ESI PUSH
   EBX PUSH
   0 [EBP] EDX MOV      \ and edx is the loop counter
   EBX EBX XOR          \ ebx is the accumulator, init = 0
   EAX EAX XOR          \ eax is the character load & add register
   4 [EBP] ESI MOV
1 L: 4 # CL MOV         \ shift for accumulator
   EBX CL SHL           \ shift left
   LODSB                \ one char
   $20 # AL OR          \ mask to "lower case"
   EAX EBX ADD          \ into the accumulator
   EBX EAX MOV          \ copy for destructive test
   $F0000000 # EAX AND  \ mask for high 4 bits
   2 L# JZ              \ if no high bits were set, continue
   EAX EBX XOR          \ clear high bits
   24 # CL MOV          \ shift 24
   EAX CL SHR           \ to the right
   EAX EBX XOR          \ and xor back into the accumulator
2 L: EDX DEC            \ decrement loop counter
   1 L# JNZ             \ and continue while non zero
   EBX EAX MOV          \ into eax for divide, edx is zero from dec/jnz
   EBX POP
   EBX DIV              \ do division
   EDX EBX MOV          \ return remainder
   EBX 2 # SHL          \ as a cell quantity
   ESI POP
   8 # EBP ADD
   RET END-CODE

Sp...@controlq.com

unread,
Sep 13, 2010, 2:40:34 PM9/13/10
to
On Mon, 13 Sep 2010, Stephen Pelc wrote:

> Date: Mon, 13 Sep 2010 18:21:17 GMT
> From: Stephen Pelc <steph...@mpeforth.com>
> Reply-To: steph...@INVALID.mpeforth.com
> Newsgroups: comp.lang.forth
> Subject: Re: What are my options for dictionary hashing?

Thanks, Stephen. I suppose we've moved on from an embedded Basic as host
OS for PC's, there is no reason why Forth should not have similarly
evolved. It certainly does create a huge chasm between the original FIG
model and today's commercial Forth development systems ...

When you say "serious interrupt routine", do you mean one that does more
than set a flag variable, or stuff a char into a buffer? IMHO, that's all
that interrupts should do ... 8-).

Cheers,
Rob.

Hugh Aguilar

unread,
Sep 13, 2010, 2:44:30 PM9/13/10
to
On Sep 13, 12:21 pm, stephen...@mpeforth.com (Stephen Pelc) wrote:
> I haven't written a serious interrupt routine
> in Forth in the last ten years or more - the compiler is that good
> on an ARM.

In assembly-language, you mean?

Also, you are being a salesperson too now. I think our OP has a 6502,
not an ARM.

Sp...@controlq.com

unread,
Sep 13, 2010, 2:48:35 PM9/13/10
to
On Mon, 13 Sep 2010, Hugh Aguilar wrote:

> It is a very bad idea to ask a commercial-Forth salesperson for
> "pointers to reference materials which would help me to understand
> [Forth's] evolution." That is like asking a GM salesmen for
> information on the history of automobiles --- his history is unlikely
> to mention the fact that GM is bankrupt.
>
> Elizabeth Rather is a salesperson. Her entire involvement on
> comp.lang.forth is oriented toward inducing people to spend $500 on
> SwiftForth or SwiftX.
>

Not to put too fine a point on it, Hugh, I was speaking to the
organ-grinder. If I want to speak to the monkey, I'll pull the string.

Cheers,
Rob.

Andreas

unread,
Sep 13, 2010, 2:52:45 PM9/13/10
to
John Passaniti:

> On Sep 12, 5:06 pm, Andreas<a....@nospam.org> wrote:
>> In _most_ cases compile-time reduction is not worth the effort.
> But the OP
> did mention *seconds* to compile definitions, so putting forward some
> minor optimizations is worth the effort.

Of course I am fishing in the dark here, but I'd not be surprised at all
if those seconds were _not_ caused by dictionary searching.

>> In other words: K.I.S.S.
>
> Simplicity is one thing. Meeting performance expectations is
> another. Fast machines with gobs of memory make going for the simple
> and stupid much easier than targeting a 6502 running at 1MHz.

I agree, being old enough to have marvelled over a SYM-1 board in my
time. :-)
