Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Hashing

5 views
Skip to first unread message

Chewy509

unread,
Mar 2, 2005, 6:15:42 AM3/2/05
to
Hi Everyone,

I've been looking into several hashing algorithms, in particular for use in
a compiler/assembler.

While the tens of texts I have read, all discuss the differences between
them, and appropriate ways to implement them (mainly discribed in C). A few
of the texts indicate that certain hash algorithm's are better suited to
particular domain due to inherited capacity to avoid hashing collisions
within the application domain. This makes quite a bit of sense, as some
would be suitable for authentication (where rare occurances of collisions
can occur without too much grief), while others are more suited for hashing
tokens/labels within a source file (where a single collision can mean the
difference between an assembler/compiler not working). However they don't
indicate what algorithm is most suited to what application.

So, since we have a few people here that have implemented assemblers and/or
compilers (notably Randy and Rene), what hashing algorithms are best suited
for hashing tokens within a source file, that essentially will only contain
7bit ASCII, with tokens (or labels) no larger than 32 characters?

At this stage speed is NOT a requirement, but the inherit ability of the
algorithm to avoid collisions within the domain is extremely important. I
would prefer that the target hash size to be 64bit, but larger would be fine
if I could guarantee that with the algorithm would generate no collisions
within the above domain, eg 7bit ASCII (really just 'a..z', 'A..Z', '0..9',
'_' characters only), with token length no larger than 32 characters.

PS. I still haven't finalised what the largest token/label length should be,
is 32 characters a good size, or should I make this smaller or larger? How
often would people use labels larger than say 16 or 32 characters?
PPS. This is for a mini project of mine to build my own HLA. (don't worry
Randy, I have no desire to rule the HLA niche of assemblers, just to learn
about lexical scanning and parsing in a lot more detail, and obviously learn
the basics of language development).

--
Darran (aka Chewy509)...


Betov

unread,
Mar 2, 2005, 6:33:36 AM3/2/05
to
"Chewy509" <chewy509.doe...@austarnet.com.au> écrivait
news:38llj5F...@individual.net:

> Hi Everyone,
>
> I've been looking into several hashing algorithms, in particular for
> use in a compiler/assembler.

I have also seen many, but none of them seemed to me to
be definitively good. You can see the one i wrote for
RosAsm (at the bottom of the [Assembler] TITLEs, or
take a search, for example, for "CheckSumTable"...).

I keep convinced that the envolved Algo is vey good,
and there is little Tool for viewing the results under
a graphical representation. It shows a square of Pixels,
where each pixel, in the first square, represents one
record, and the pixels drawing a red line at the bottom
(second square down), represent the Linked records.

For a Try&See, i'd suggest you compile a CopyOfRosAsm,
so that a big number of symbols could be represented.

This side-tool enabled me with testing the various
Algos, the easy way: The better the Algo is, the more
the pixels, in the firts square, look like a random
repartition, and the lesser red lines, at the bottom.
Rosasm Algo was, among all the ones i tested, the best.

The source of this implemenation is extreemly easy
to read, and, i suppose, to port, and adapt.

The symbols lenght does not matter, at all, and has
zero relationship with the collisions (has no effect
on the number of Linked Records).


Betov.

< http://rosasm.org/ >


hutch--

unread,
Mar 2, 2005, 8:55:53 PM3/2/05
to
Darran,

What you have to keep in mind is that a hash table for words in by
necessity an imperfect device and it is for a very simple reason. When
you work with decimal numbers, you have 10 variations where when you
work with characters you have with single byte types, 256 possibles and
that accounts for the compromise required in hash table design.

To make a perfect hash using the direct table method would involve
options to the power of 256 which is more memory than you can expect in
computers for a long time to come. The existing "perfect hashes" are
usually tree based and while they are collission free, they are slower
than a normal table based hash.

Most table based hashes have a seperate algo to generate the number
within a specified range from the original word so the result will fit
into the table size and they generally use the remainder of a division
which tends to give a reasonably well behaved distribution for a wide
variety of different words but none of these techniques are collission
free and this is why you need a collision handling technique.

The simplest ones increment the location if it is already used but this
leads to what is called clustering, a phenomenon where a long sequence
of adjoining "buckets" are filled and the technique slows dow with
linear testing of buckets. You can get around this by using variable
length increments and the word length is often useful to do this.

By spacing the change of bucket location on the word length you can
improve the distribution of filled buckets so there is a lot lower
chance of repeatedly testing filled buckets.

You need to a couple of basic things with the algo, it must wrap around
so you can go off the end and back ito the begining again. It is
important with this tecnique that you use a PRIME number for the bucket
count so the wrap around does not keep reading the same set of buckets
and you need to keep track of the number of filled buckets so you know
when the table is starting to fill up.

It is normal that a hash table gets slower as it approaches saturation
so the idea is to use a large enough one that does not get saturated
this way. You still see code in 32 bit that has DOS style assumptions
and tiny hash tables but there is no reason why you cannot make them
with a member count over 100000 or even larger if necessary.

What is important is to ensure that you have an effective collision
handling technique as testing over a long time shows that if the table
has a wide enough distribution without the known defects, the collision
handling code is very fast and the general performance is very good.

There are a few tricks with the method that you use to allocate the
memory for the table, you need two arrays, one is the pointers to the
other which is the main data storage array. If you allocate the memory
as zero filed, it makes testing buckets very fast as you only have to
test the 1st byte to tell if its used or not. Yopu would normally write
the word at the start of the bucket and know where the data strored
with it is located.

It can be in the format something like the following.

If the bucket is for example 32 bytes long and it is advantageous to
have the buckets as intervals of 4 bytes for memory alignment purposes,
you set your own layout for what part holds the word and what part
holds the data.

With the bucket at 32 bytes, if you reserve the 1st 16 bytes for the
word and bytes 16 to the end for the data, if the word matches, you
step another 16 bytes to get the info related to the word. The data can
be whatever you like and this includes being a pointer to another
memory location. Just for example, if you have the API functions as
prototypes, you locate the API while loading the hash, store the word
at the start of the bucket then store a pointer to the actual prototype
in the second half of the bucket.

Regards,

hutch at movsd dot com

Chewy509

unread,
Mar 3, 2005, 5:46:16 AM3/3/05
to
"Betov" wrote in message...
> "Chewy509" écrivait

>
>> Hi Everyone,
>>
>> I've been looking into several hashing algorithms, in particular for
>> use in a compiler/assembler.
>
> I have also seen many, but none of them seemed to me to
> be definitively good.

Hi Betov,

That's not really what I wanted to hear! :(

I'll definately look into the tool you mentioned.

I find it strange, with all the research into hashing and the various algo's
that someone hasn't published a list, that lists each algo and what they (in
there opinion) are suited for... (eg speed, collision potientials within
certain domains, etc). If someone knows of a list, I would be highly
interested in seeing.

PS. I know hashing isn't really about asm, but it seems more on-topic than
the usual stuff that gets discussed here. :P I also was hoping to get an
insight from the guys that write the assemblers we all so dearly love to
use, and what they did to overcome the normal issues that revolve around
hashing symbols. ;)

--
Darran (aka Chewy509)...


Chewy509

unread,
Mar 3, 2005, 5:36:51 AM3/3/05
to
Hutch,

Thanks for the insight, I'll look more into other implementations and how
they handle collisions.

Thinking a bit more about it, I can encode 32 characters into 192 bits (each
character being 6bits, and packed). This would ensure zero collisions...
However, I'm a bit concerned that matching hashes (of 192bits in size) might
get a bit expensive as the number of hashes increases... (Or I could reduce
the maximum symbol length to say 16 characters? which would produce a hash
of only 96bits).

Since speed at this time isn't important, I was thinking a single hash table
of 8,000 entries, and just use a simple list for the table. (I was thinking
since the hash algo and table would be limited to only a few areas, updating
it in the future won't be too difficult). Or am I under-estimating the
importance of the hashing algo and hash table structure within a compiler?

* Since symbols can only contain 'a..z','A..Z','0..9','_', I'm only dealing
with 63 characters, which will easily fit into 6bits. Since max symbol
length is 32 characters, 32 x 6 bits = 192bits.

--
Darran (aka Chewy509)...


̇ęń.com Ray M. Ransom

unread,
Mar 3, 2005, 11:03:19 AM3/3/05
to
"Chewy509" wrote:

> I've been looking into several hashing algorithms, in particular
> for use in a compiler/assembler.

There is no perfect hashing algorithm. Unless you have an infinitely large
symbol table, they all produce collisions.

You have to weigh speed, complexity and size. I've been writing assemblers
for almost 30 years and have evaluated hundreds of hashing schemes. After
all that, it's tough to beat the standard ElfHash algorithm for simplicity,
small size and speed. Below is the symbol hash, table lookup and collision
routine from my current generation assemblers. It hashes the symbol at
SYMBUF, indexes into the table and searches for the symbol. A pointer to the
existing symbol or the next empty location is returned in EDI, with the Z
flag set if the symbol is already in the table. This way, the same routine
is used to store a new symbol or look for an old one. The first byte of each
entry in the table must be cleared during initialization for this routine to
work.

===================================================================

SYMLEN EQU 32 ;Symbol table entry length
MAXSYM EQU SYMLEN-5 ;Maximum symbol length
MaxSymCnt EQU 65536 ;Max symbol count
SYMMSK EQU MaxSymCnt-1 ;Symbol table hash mask
SYMSHF EQU 5 ;2^x = SYMLEN (32)

PUSH ESI ;Save source line pointer
MOV ESI,offset SYMBUF ;Point to symbol buffer
XOR EDI,EDI ;Clear hash accumulator
HASHZ: LODSB ;Get a character
TEST AL,AL ;EOS?
JZ short HSHDON ;Yes
SHL EDI,4
MOVZX EAX,AL ;Extend character to EAX
ADD EDI,EAX ;Accumulate character
MOV EAX,EDI ;Hash value to EAX
AND EAX,0F0000000H ;Top nibble = 0?
JZ short HASH1 ;Yes
MOV EBX,EAX
ROL EBX,4 ;Right justify top nibble
XOR EDI,EBX
HASH1: NOT EAX ;1's complement temporary value
AND EDI,EAX ;Mask hash
JMP short HASHZ
HSHDON: AND EDI,SYMMSK ;Calculate symbol table index
SHL EDI,SYMSHF ;Form symbol table hash offset
ADD EDI,[EBP-VP+SYMBAS] ;Index into symbol table
SCHSM1: CMP byte ptr[EDI],0 ;Symbol at current table location?
JNZ short SCHSM2 ;Yes
INC ESI ;"Not found" (NZ)
SSMDON: POP ESI ;Restore source line pointer
RET
SCHSM2: PUSH EDI ;Save symbol table pointer
XOR ECX,ECX
MOV CL,MAXSYM
MOV ESI,offset SYMBUF ;Point to symbol buffer
REPZ CMPSB ;Symbol matches?
POP EDI ;Restore symbol table pointer
JZ short SSMDON ;Yes
INC [EBP-VP+HASHCL] ;Count hash collision
ADD EDI,SYMLEN ;Point to next symbol
CMP EDI,[EBP-VP+SYMEND] ;Search done?
JB short SCHSM1 ;No
CALL FATERR
DB 'Symbol table overflo','w'+80H

Betov

unread,
Mar 3, 2005, 11:49:02 AM3/3/05
to
"Ray M. Ransom" <ray@mîcôs˙ęń.com> écrivait
news:bzGVd.54074$uc.38241@trnddc03:

> "Chewy509" wrote:
>
>> I've been looking into several hashing algorithms, in particular
>> for use in a compiler/assembler
>

> I've been writing
> assemblers for almost 30 years and have evaluated hundreds of hashing
> schemes.


Congratulations. :)


After implementation, in replacement of RosAsm
Algo, this one produces, - on 7,967 symbols -,
3,532 collisions (instead of 455 for RosAsm Algo),
this is to say 8 times more collisions.

Also, the The Pixels'Square shows 'artifacts', that
mean some significative differences instead of the
expected 'random-like' repartition.


Betov.

< http://rosasm.org/ >

Annie

unread,
Mar 3, 2005, 5:56:07 PM3/3/05
to

On 2005-03-03 be...@free.fr said:

> Ray M. Ransom wrote:
>
> > I've been writing assemblers for almost 30 years and have
> > evaluated hundreds of hashing schemes.
>
> Congratulations. :)
>

> > [ ... code snipped ... ]


>
> After implementation, in replacement of RosAsm
> Algo, this one produces, - on 7,967 symbols -,
> 3,532 collisions (instead of 455 for RosAsm Algo),
> this is to say 8 times more collisions.
>
> Also, the The Pixels'Square shows 'artifacts', that
> mean some significative differences instead of the
> expected 'random-like' repartition.

> _____
> Betov. ((( `\
_ _`\ )
3532 collisions on 7967 symbols? (^ ) )
That doesn't sound right. Are you ~-( )
sure you implemented Ray's algo _'((,,,)))
correctly, Betov? ,-' \_/ `\
( , |
Ray Ransom is a well-known and `-.-'`-.-'/|_|
widely-respected programmer in \ / | |
ASM circles. I've used some of =()=: / ,' aa
his DOS utilities for years.

It's difficult to believe that an algorithm that
Ray recommends would perform so poorly.

Give us an example of the 'symbols' you used in
your test.

NoDot

unread,
Mar 3, 2005, 6:33:55 PM3/3/05
to
Annie wrote:
> Give us an example of the 'symbols' you used in
> your test.

How much you people wnat to bet that it's the RosAsm source code?

--
The above was written by NoDot.

Visit the Website of NoDot:
<www.geocities.com/nodot1989/>

̇ęń.com Ray M. Ransom

unread,
Mar 4, 2005, 12:44:20 AM3/4/05
to
"Annie" wrote:

> Give us an example of the 'symbols' you used in
> your test.

Also, what is the symbol table size and what is the table index mask? Do
they match? And are they the same as for the RosAsm algorithm?

It is also important to note that collision count is not the whole story.
The job of good hashing algorithm is to spread the symbols around. Even if
every symbol generates a collision, the algorithm can still be very efficient
if the next slot is always open. In the real world, we're not trying to
generate a perfect hash that never collides. All we want to do is reduce
search times to an acceptable level, while keeping the code small, simple,
fast and understandable (and, unfortunately, easy to code in C). ElfHash is
not the best or the most elegant, but it offers a good compromise of all
those things.

I've been doing this enough to know that a 50% collision rate is not
unacceptable. What matters is the average number of slots that have to be
searched after each collision. If a hash generates only a few collisions,
but 2000 symbols must be searched to find an open slot after each one, it's
not a good hash.

My guess is that ElfHash can improve symbol table performance by a factor of
20 to 100 over a non-hashed table. Sure, there are algorithms that can do
better, but why bother with the added complexity when everything runs
instantaneously on today's fast computers? Keep it simple. Then, when it
breaks, it can be fixed.

Another thing to consider, so long as we're on the subject of hashing, is
that linkers should use symbol hashing, just like assemblers. The
performance improvement is far more dramatic than for an assembler because
the symbol counts can be so much higher.


Betov

unread,
Mar 4, 2005, 3:12:32 AM3/4/05
to
"Annie" <m...@privacy.net> écrivait news:d084m2$gqh$2...@domitilla.aioe.org:

This is quite easy to test the performances of any such
Algo inside RosAsm itself. We just have to replace the
RosAsm 'CheckSum64' Algo by the tested one, and, if the
CheckSum is a 32 Bit one, to set ebx to zero (after this,
RosAsm builds a 16 Bit CheckSum, from the 64 Bits ones,
for addressing the Table, so that it does not change a thing
for the other parts of the comparison test).

Then, it is quite trivial to re-compile a version of RosAsm,
- that holds enough symbols for viewing a 'talking' Pixels
square -, and to compare the results, as it also can tell
the whole number of Symbols, plus the number of collisions
in the Table.

The exact number of collision depends, first and evidently,
on the Table Size. But, as RosAsm Table Size is the same
in all case, the significative numbers to be compared are
the one i gave for the two collisions amounts: 8 times
less for RosAsm original Algo.

Yes, 3,532 collisions on 7,967 is _much_, but mind you,
many of the recommanded Algos, i found here and there,
were as bad as this one. Not my fault if the others do
not write a validity visual test, and feel happy with
a bad Algo.


Betov.

< http://rosasm.org/ >


Betov

unread,
Mar 4, 2005, 3:46:06 AM3/4/05
to
"Ray M. Ransom" <ray@mîcôs˙ęń.com> écrivait
news:UASVd.21033$QQ3.4365@trnddc02:

> "Annie" wrote:
>
>> Give us an example of the 'symbols' you used in
>> your test.
>
> Also, what is the symbol table size and what is the table index mask?
> Do they match? And are they the same as for the RosAsm algorithm?

This has no effect on the comparison. Your Algo port (i don't
use the same Registers, '0' End become "> LowSigns", and so on,
but, well... here is _your Algo_ port):

CheckSum64:
push edi
mov ebx 0, eax 0, edi 0, ecx 0
While B$esi > LowSigns
lodsb
shl edi 4
movzx eax al
add edi eax
mov eax edi
and eax 0F0000000 | jz L1>
mov ebx eax
rol ebx 4
xor ebx eax
L1: not eax
and edi eax
End_While
mov ebx 0, eax edi
pop edi
ret

... that is 8 times less 'Random-like' than RosAsm one:

CheckSum64:
; esi -> Name
mov eax 0, ebx 0, ecx 0

While B§esi > LowSigns
rol eax 1 | lodsb | mul eax | xor ebx edx | inc ecx
End_While
add ebx ecx
; ebx:eax = CheckSum64
ret

Also, notice that RosAsm complete Algo, for the Table Addressing
makes use of a downward 16 bit computation, that says:

CheckSum16:
; ebx:eax = CheckSum64 (not modified here))
mov ecx eax | xor ecx ebx | mov edx ecx
rol edx 16 | xor cx dx
and ecx 0FFFF | shl ecx 4
; ecx = CheckSum16, to be used as a Displacement to the matching Record
; (To 'CheckSumsRecords' first half part, 16 Bytes per Record)
ret

... in all cases, so that the downward comparisons test
are _valid_, dispiting the difference of size between a
32 Bits and a 64 Bits initial CheckSum: Resumed in a
16 Bits repartition representation, anyway. So... :)


> It is also important to note that collision count is not the whole
> story. The job of good hashing algorithm is to spread the symbols
> around. Even if every symbol generates a collision, the algorithm can
> still be very efficient if the next slot is always open. In the real
> world, we're not trying to generate a perfect hash that never
> collides. All we want to do is reduce search times to an acceptable
> level, while keeping the code small, simple, fast and understandable
> (and, unfortunately, easy to code in C). ElfHash is not the best or
> the most elegant, but it offers a good compromise of all those things.

You are right, here. The _whole_ Method is to be considered
and the CheckSum performances is only one side of the problem.


> I've been doing this enough to know that a 50% collision rate is not
> unacceptable. What matters is the average number of slots that have
> to be searched after each collision. If a hash generates only a few
> collisions, but 2000 symbols must be searched to find an open slot
> after each one, it's not a good hash.

Yes and no. Depends on the _whole_ method.

In RosAsm one, the CheckSum64 is also used as a Member of
the Records, in the Table, as well as the 'LinkedRecord'
Pointers, so that no String comparison is performed before
having found out two identical 64 Bits CheckSums - what
have never been observed, on any big File, up to now... -.

There is only one string comparison, at the end, as a final
absolute security, but it is, pratically, of no use: All big
Files computed by RosAsm would be computed exactly the same
_without_ _any_ String comparison. Just, we well know, that
statiscally talking, this is impossible to guaranty, therefore
the final comparison.


> My guess is that ElfHash can improve symbol table performance by a
> factor of 20 to 100 over a non-hashed table. Sure, there are
> algorithms that can do better, but why bother with the added
> complexity when everything runs instantaneously on today's fast
> computers? Keep it simple. Then, when it breaks, it can be fixed.

RosAsm CheckSum is not more complicated than yours.
It is even a bit simpler, but it also achieves what
it is expected to do, that is to produce numbers
that look like randomaly distributed.

If you like to see all of this by yourself, i will
let your Algo commented out, in the next release of
RosAsm, and a Link to be Right-Clicked, at the first
Pos of the Source (I'll post an annoucement, here...).

For now, if you want to test RosAsm Algo, in your own
things, it is quite trivial to try&see:

CheckSum32:
; esi -> Name
mov eax 0, ebx 0, ecx 0

While B§esi > LowSigns
rol eax 1 | lodsb | mul eax | xor ebx edx | inc ecx
End_While
add ebx ecx
; ebx:eax = CheckSum64
xor eax ebx
; eax = CheckSum32
ret

... that should make "some" difference, whatever downward
method, you effectively use for finaly pointing out the
various Symbols...


Betov.

< http://rosasm.org/ >


hutch--

unread,
Mar 4, 2005, 5:09:36 AM3/4/05
to
I would be interested to see what qualifies as a collision testing
technique becaue its really easy to generate rubbish. The technique I
used was to to grab about 15 thousands variable length unique words and
point them at the hashing routine and then count the collisions.

The easiest way to get a phony high collision count is to pass a large
number of identical words to the procedure as they will all hit the
same location first time up. I found with large counts of unique words
that you could maintain a collision rate of between 5 to 10 percent and
while you could drop that some by chasing a more elaborate hashing
algo, it was slower than a slightly higher collision count as a well
designed collision handler is in itself very fast.

Visual pixel indicators may be "cute" but they are no substitute for a
collision handing count as that will tell you what is happening
properly.

This much I have found, if a hashing algo is done right, the speed
remains nearly constant for a far higher saturation level in the table
size until it gets close to the maximum size. As memory is not a real
consideration on modern hardware in 32 bit, a large sparse table is not
a big deal to implement these days.

Nice small algo Ray.

hutch--

unread,
Mar 4, 2005, 5:27:02 AM3/4/05
to
Darran,

In the MASM32 example code there is a small test hash application
called USTRINGS.ASM that does some of the work you are looking at. I
have played with the idea of loading an array with primes as you can
extract the values when hashing the word quickly enough and I think the
collision handling is reliable.

I have a few others floating around but not in consumer format as they
were special purpose stuff.

\masm32\examples\console\hashapp\ustrings.asm

Chewy509

unread,
Mar 4, 2005, 5:15:59 AM3/4/05
to
"Ray M. Ransom" wrote in message...

> "Chewy509" wrote:
>
>> I've been looking into several hashing algorithms, in particular
>> for use in a compiler/assembler.
>
> There is no perfect hashing algorithm. Unless you have an infinitely
> large
> symbol table, they all produce collisions.
>
> You have to weigh speed, complexity and size. I've been writing
> assemblers
> for almost 30 years and have evaluated hundreds of hashing schemes. After
> all that, it's tough to beat the standard ElfHash algorithm for
> simplicity,
> small size and speed.

The more I read, it appears that through all the attempts to get a
near-perfect hash algo for a particular domain, most don't produce anything
better than some of the tried and true algo's, (such as elfhash you
mentioned).

I was reading an old DDJ article on hashing today at work, and the basic
premise, that it was better to use a standard hash algo (such as HashPJW or
elfhash), and concentrate on collision detection and then use nonlinear
rehash on a collision, (and continue until you find a free slot).

PS. Thanks for the code...

--
Darran (aka Chewy509)...


Chewy509

unread,
Mar 4, 2005, 5:29:03 AM3/4/05
to
"Betov" wrote in message...

>
> CheckSum64:
> ; esi -> Name
> mov eax 0, ebx 0, ecx 0
>
> While B§esi > LowSigns
> rol eax 1 | lodsb | mul eax | xor ebx edx | inc ecx
> End_While
> add ebx ecx
> ; ebx:eax = CheckSum64
> ret
>
Hi Rene,

I can see your hash algo is based on a somewhat classic mutiple technique
with a twist (as opposed to elfhash which is based on addition and bit
shifting). However most hashes that are multiplication based use a prime
instead of previous source input. What inspired you to do this? The way I
see it, using a prime can ensure even distribution, but using the source
input, isn't that considered too random?

Anyway I'll see how this compares to elfhash and hashPJW...

--
Darran (aka Chewy509)...


Chewy509

unread,
Mar 4, 2005, 5:42:36 AM3/4/05
to
"hutch--" wrote in message...

Thanks Hutch, I'll look into it.

PS. It's actually nice to see some real programming talk in this NG! :D

--
Darran (aka Chewy509)...


Betov

unread,
Mar 4, 2005, 6:00:53 AM3/4/05
to
"Chewy509" <chewy509.doe...@austarnet.com.au> écrivait
news:38qrpaF...@individual.net:


What do you mean with "too random". The best Algo is
simply the one that produces the most "random-like"
repartition in the Table. As simple as this.

How i wrote it is quite simple. I first wrote the whished
implementations for _testing_ the validity (how random
it looks like). Then i ported all of the Algos i found
here and there on the Net, that were introduced as very
good ones. They all seemed to me deffective, and i wrote
mine. A quite easy task, when you have the visual Tool
providing a Pixels representation of the Table occupation,
and the counts for amount of Symbols and amount of Linked
Records in the Table.

Comparing what i see in the RosAsm [Tools] / [RosAsm dev tests]
/ [Show Symbols Repartition] Dialog, with a true Random Pixels
drawing Algo, i simply see no difference.

Comparing with any other Algo having a good "reputation",
i _see_ a big difference... :)


Betov.

< http://rosasm.org/ >


Randall Hyde

unread,
Mar 4, 2005, 8:53:30 AM3/4/05
to

"Chewy509" <chewy509.doe...@austarnet.com.au> wrote in message >

> The more I read, it appears that through all the attempts to get a
> near-perfect hash algo for a particular domain, most don't produce
anything
> better than some of the tried and true algo's, (such as elfhash you
> mentioned).
>
> I was reading an old DDJ article on hashing today at work, and the basic
> premise, that it was better to use a standard hash algo (such as HashPJW
or
> elfhash), and concentrate on collision detection and then use nonlinear
> rehash on a collision, (and continue until you find a free slot).
>
> PS. Thanks for the code...
>

One thing to consider is switching algorithms once you have
a collision. Hopefully, the number of collisions to a particular
bucket (versus the total number of collisions) is relatively low,
even as the symbol table increases in size. Often, a simple
binary search can locate an item in a hash bucket chain more
rapidly than rehashing. This depends, of course, on the
number of symbols and their composition, of course.
Just remember, the hash function is not free, and if you
typically have less than 10 collisions in any given bucket,
it may be cheaper to use a different scheme.

Of course, the fact that today's assemblers can process
over 100,000 symbols in far less than one second suggests
that once you have a decent hashing algorithm, you should
really look for performance improvements elsewhere.
Cheers,
Randy Hyde


Betov

unread,
Mar 4, 2005, 9:22:57 AM3/4/05
to
"Chewy509" <chewy509.doe...@austarnet.com.au> écrivait
news:38qrp8F...@individual.net:

> The more I read, it appears that through all the attempts to get a
> near-perfect hash algo for a particular domain, most don't produce
> anything better than some of the tried and true algo's, (such as
> elfhash you mentioned).

The "domain" is defined, here above, as the one of
"Assembler-Compilers" (and for the Assemblers able
to do all of the Linker works, like FASM and RosAsm,
i would add the Linker job...).

In that context, the "perfect" CheckSum is the one
that provides the results the closer to a good Random
Algo.

This point alone _is_ significative, because the lesser
"random-like", the more Collisions, and the more parsing
work to do, whatever it is.


> I was reading an old DDJ article on hashing today at work, and the
> basic premise, that it was better to use a standard hash algo (such as
> HashPJW or elfhash), and concentrate on collision detection and then
> use nonlinear rehash on a collision, (and continue until you find a
> free slot).

The Collisions cases holding is, evidently, the other
important side of the problem.

First, once the repartition is "randomized", the Table
size must be defined as big as required to match with
the biggest requirements (few collisions with more than
significative Apps).

Second, the best method to assume the Collisions cases,
in this context, is to simply have _TWO_ Tables. Each
Record should also hold a Pointer to the second Table,
in cases of collisions, to tell where to go on scanning.
So, the first Table is the "randomaly distributed" Records
Table, whereas the second Table simply hold the 'Linked
Records", registred in order, one after another, but
accessed, so to say, instantly. Just a Pointers switch
and Loop the CheckSum Comparison.

This is one of the reasons why RosAsm compiles more than
1,5 Mega of assembly Source per second (even HLL-Like Style
Sources), on a Celeron 1.3.


Betov.

< http://rosasm.org/ >


̇ęń.com Ray M. Ransom

unread,
Mar 4, 2005, 10:52:06 AM3/4/05
to
"Betov" wrote:

> CheckSum64:
> push edi
> mov ebx 0, eax 0, edi 0, ecx 0
> While B$esi > LowSigns
> lodsb
> shl edi 4
> movzx eax al
> add edi eax
> mov eax edi
> and eax 0F0000000 | jz L1>
> mov ebx eax
> rol ebx 4
> xor ebx eax
> L1: not eax
> and edi eax
> End_While
> mov ebx 0, eax edi
> pop edi
> ret

> CheckSum16:


> ; ebx:eax = CheckSum64 (not modified here))
> mov ecx eax | xor ecx ebx | mov edx ecx
> rol edx 16 | xor cx dx
> and ecx 0FFFF | shl ecx 4
> ; ecx = CheckSum16, to be used as a Displacement to the matching Record
> ; (To 'CheckSumsRecords' first half part, 16 Bytes per Record)
> ret

In combining these two routines, you have created something new. Whatever it
is, it is not ElfHash, and it is not a legitimate test of the ElfHash
algorithm.

The output from the ElfHash routine is already the symbol table index. If
you feed it to CheckSum16, it becomes some weird thing. You have to mask the
output by the maximum number of symbols in your table-1 and shift by x, where
2^x is the size of each table entry. If the symbol table can hold 64K
16-byte symbols, CheckSum16 should be replaced by the following:

and eax,0ffffh ;Mask for 64K entries
shl eax,4 ;Index to 16-byte entry
xchg eax,ecx ;Table index to ECX

And, by the way, ElfHash is not my algorithm. It's been in the public domain
for decades.

The RosAsm algorithm is cute, but, as with so many things RosAsm, it is
specific to its own environment. It is locked to a specific symbol table
entry size and a specific maximum symbol count. For this reason, it is
useless outside its own limited context.


Betov

unread,
Mar 4, 2005, 12:04:35 PM3/4/05
to
"Ray M. Ransom" <ray@mîcôs˙ęń.com> écrivait
news:Gu%Vd.65979$Dc.53811@trnddc06:

> In combining these two routines, you have created something new.
> Whatever it is, it is not ElfHash, and it is not a legitimate test of
> the ElfHash algorithm.
>
> The output from the ElfHash routine is already the symbol table index.
> If you feed it to CheckSum16, it becomes some weird thing. You have
> to mask the output by the maximum number of symbols in your table-1
> and shift by x, where 2^x is the size of each table entry. If the
> symbol table can hold 64K 16-byte symbols, CheckSum16 should be
> replaced by the following:
>
> and eax,0ffffh ;Mask for 64K entries
> shl eax,4 ;Index to 16-byte entry
> xchg eax,ecx ;Table index to ECX

It cannot make any difference: After the CheckSum64 is
build, the RosAsm method 'folds' it, by xoring, the
ebx:eax CheckSum64 into a CheckSum16. So, at the end,
the resulting Word _is_ used as a Table Indice. No
difference at all.

This is a kind of "Two shots tricks", in the original
RosAsm method:

1) It computes the qword, then the Word, from the qWord.

2) Then it write in the Table at the Word Indice, and
also stores the qWord in the Record, for fast searches,
in cases of collision, so that instead of comparing two
Strings, we compared two qWords, in 99,999999999......%
of the cases.


> And, by the way, ElfHash is not my algorithm. It's been in the public
> domain for decades.

Well, i suppose i can say that the one i use is mine,
but it seems to me so trivial that i will not request
it to be called "Betov Hash", as i never use the word
"Hash". :)) :)) :))

The only difficult thing to understand, in this Algo
is that the numbers from 0 to 'A', have the same
probability to come out than any other number for
variable and significative lengths Symbols.


> The RosAsm algorithm is cute, but, as with so many things RosAsm, it
> is specific to its own environment. It is locked to a specific symbol
> table entry size and a specific maximum symbol count. For this
> reason, it is useless outside its own limited context.

All such Algos are to be considered inside their specific
Method. But none are bound to a given Table Size. Whatever
the size of the CheckSum, it can always be changed by xoring,
for matching with the Table Size (unless the Size would be
something not on any boundary... :))

The xor folding keeps the repartition randomization alive,
absolutely (i even think that, for smaller sizes, it would
only _improve_ the randomization, if ever possible).

The only purpose of such a CheckSum is to provide a Numbers
serie, from a given number of Symbols, in such a way that
the numbers repartition (on any concretization) could look
like Randomaly distributed. Nothing less. Nothing more. So,
the CheckSum64 of RosAsm, whatever size convertion to any
CheckSum16, CheckSum8, 4, or whatever doable intermediate,
remains exactely at the same level of Randomization, and
is perfectly portable into any context, and to any usable
Table size. No matter if it is directly, or not directly,
a Table Indice.

The reverse i also true, for the exact same reasons, and
this is why it takes to me one minute to fully verify in
what terms a proposed Algo is good or not.

Specific does not always mean "Locked to a context". It
is also a Programming Style, a life philosophia, a type
of tools, and so on. The name was, before, "Anarchia", but,
as nobody, here, understands what i am talking about with
this, i don't use it any more, either. :))


Betov.

< http://rosasm.org/ >

Octavio

unread,
Mar 4, 2005, 5:08:56 PM3/4/05
to
Betov <be...@free.fr> wrote in message news:<XnF960F658823...@212.27.42.77>...

> "Ray M. Ransom" <ray@mîcôs˙ęń.com> écrivait
> news:UASVd.21033$QQ3.4365@trnddc02:
> CheckSum64:
> push edi
> mov ebx 0, eax 0, edi 0, ecx 0
> While B$esi > LowSigns
> lodsb
> shl edi 4
> movzx eax al
> add edi eax
> mov eax edi
> and eax 0F0000000 | jz L1>
> mov ebx eax
> rol ebx 4
xor ebx eax ;wrong, is xor edi,ebx

Betov

unread,
Mar 5, 2005, 3:43:03 AM3/5/05
to
9331...@terra.es (Octavio) écrivait
news:a3b1e869.05030...@posting.google.com:

> Betov <be...@free.fr> wrote in message
> news:<XnF960F658823...@212.27.42.77>...

>> "Ray M. Ransom" <ray@mîcôsÿêñ.com> écrivait


>> news:UASVd.21033$QQ3.4365@trnddc02:
>> CheckSum64:
>> push edi
>> mov ebx 0, eax 0, edi 0, ecx 0
>> While B$esi > LowSigns
>> lodsb
>> shl edi 4
>> movzx eax al
>> add edi eax
>> mov eax edi
>> and eax 0F0000000 | jz L1>
>> mov ebx eax
>> rol ebx 4
> xor ebx eax ;wrong, is xor edi,ebx

Correct. My fault.

The result is now 632 collisions against 445.
Still less good than RosAsm Algo, but not that
bad as 8 times more collisions. The Randomization
seems now acceptable.

Thanks. Betov.

< http://rosasm.org/ >


webs...@gmail.com

unread,
Mar 6, 2005, 11:15:28 PM3/6/05
to
Chewy509 wrote:
> I've been looking into several hashing algorithms, in particular for
> use in a compiler/assembler.
>
> While the tens of texts I have read, all discuss the differences
> between them, and appropriate ways to implement them (mainly
> discribed in C). A few of the texts indicate that certain hash
> algorithm's are better suited to particular domain due to inherited
> capacity to avoid hashing collisions within the application domain.
> This makes quite a bit of sense, as some would be suitable for
> authentication (where rare occurances of collisions can occur without

> too much grief), while others are more suited for hashing
> tokens/labels within a source file (where a single collision can mean

> the difference between an assembler/compiler not working). However
> they don't indicate what algorithm is most suited to what
> application.

> [...] At this stage speed is NOT a requirement, but the inherit


> ability of the algorithm to avoid collisions within the domain
> is extremely important.

Hoping that there are no collisions is kind of silly, since hashing is
generally an information lossy operation. You cannot avoid them
completely. You can only minimize them. Remember that even in 6
characters (uncluding uppercase, lowercase, and numbers) you can't
avoid at least one collision in a *full* 32 bit output.

So you have to deal with collisions in your *design*. One can only
reasonably make such leaps of faith for either carefully designed
"perfect hashes" (which cannot be constructed if you don't have
complete control over the range of the input) or cryptographic hashes
like SHA-256, or Whirlwind or Tiger something like that. Lets ignore
thiese.

Ok, so there are two main hash table designs:

1. Closed hash table -- This is just a flat table in which each slot is
either "empty" or directly filled will your data. On insert and search
collisions, the entry is first hashed then it is successively and
deterministically "rehashed" to successive hash values until an empty
slot or match is found. A simply way to achieve this is to make the
hash table a prime in size, then use a "secondary" hash function to use
a linear skip step that is just added modulo the hash table size of
successive rehashing (you just have to make sure your don't pick *0* as
a skip step. :) ). Another consideration, of course, is what to do
once the hash table fills -- usually for performance reasons, you only
wait until your hash fills to 73% and then just rebuild the table from
scratch with a larger size (usually about double in size). So this
could require a sequence of prime numbers, however that's not hard to
come by:

http://www.pobox.com/~qed/primeat.zip

Then there is the problem of deletion of entries. You cannot just mark
an entry as "empty" because it might be along a rehash path for some
other entry. One way around this is to have an alternate marking of
"deleted". For inserts, a "deleted" entry can be treated like an empty
entry, and for searching a "deleted" entry should be considered a
no-match against anything. (Keep in mind that if you want to avoid
duplicates, an insert implicitely has to perform a search.)

2. Open hash table -- the values are always hashed to a particular
index into your open hash table, but each index would then point to a
simpler data structure such as a linked list or other similar
vector-like list. So insert/search/delete just reduce to checking the
index, then performing the insert/search/delete on the simpler data
structure. So the point of this is that the algorithms are much
simpler, but the trade off is that you have to do one more level of
indirection and if your table gets very large, then the performance is
limited by the lower level data structure.

Personally, I tend to prefer the Open hash table just because I know
that I can pick the table index to be in the L1 cache, so that first
overhead doesn't really cost me anything anyways.

Ok. Now both variations of hash tables can only perform well if your
input data (in this case, assembler source symbols right?) has a low
collision rate from the hashing function. There are a lot of really
bad ideas out there about how to construct hash functions. And the
texts are not very helpful -- speficially because they tend not to
explore the state of the art. Just to get you past these I'll point
you to two sources that you should consider to be the "at least" lower
bar for good hash functions:

http://www.pobox.com/~qed/hash.html
http://burtleburtle.net/bob/hash/doobs.html

They are written in C, but most can be translated to assembly easily
(at worst, just compile it up, then disassemble it.) Ok, one thing you
should notice is that both Bob Jenkins and I have emphasized the need
for a creation of a "general hash function". That is to say, something
which has low collision regardless of the input data and which produces
a full width 32 bit hash output, but which can be masked and still
deliver a good hash function with minimal collisions.

But there are a couple of reasons that he and I have concentrated on
building full 32 bit hash outputs (something which at first may seem
like overkill.) As a modification to the hash designs described above,
one thing you can do is to augment each entry with the full 32 bit hash
value for that entry. Why would you do this? Because it gives you a
quick test for equality rejection. I.e., you know that an entry you
are searching for cannot be equal to one you are currently examining in
the hash table if the hash values are unequal. By having a minimal
collision rate full 32 bit hash function available for this purpose,
the probability of a false positive (i.e., the hash values are the same
but the entries actually are not) is extremely low.

Now, if we look at the closed hash table design we see that we in fact
may need to generate *two* hash functions for a given entry. However,
the total bits used for the initial index and the linear probe skip
value is almost certainly going to be less than 32. So rather than
generating *two* hash functions, you can just rip off the lower bits
for the initial hash, and use the upper bits for the probe skip value.
This will only work well if you have a really good virtually
collision-free hash function such as "One-at-a-Time" hash or Bob
Jenkin's hash or my Hash function (which I call "SuperFastHash" :) ).

As you can see, all these ideas are completely domain independent.
I.e., its possible to create a good hash design that works well for
just about anything that you are trying to hash. That you are using it
for assembly labels or some such really doesn't matter. I don't
believe that is any specific benefit that can be derived from some
assumption about assembly labels that matters.

> PS. I still haven't finalised what the largest token/label length
> should be, is 32 characters a good size, or should I make this
> smaller or larger? How often would people use labels larger than
> say 16 or 32 characters?

Why does this matter? Personally, I would set the largest size to be
2^31-1. Remember an assembler can be used as the back end for a higher
level language's compiler. That means support for autogenerated labels
is something worth while -- and your HLL designer might be miffed by a
hard limit that they wish was not there.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Betov

unread,
Mar 7, 2005, 2:22:24 AM3/7/05
to
webs...@gmail.com écrivait news:1110168928.746845.189540
@f14g2000cwb.googlegroups.com:

> [Very good post, Paul].

You don't explain how and why _two_ 'hashing' Algo should
be used. I suppose you mean to restart a calculation, _on_
the previous CheckSum, as there is no theorical reason
why a second try alone could not also achieve in a Collision,
and to _loop_ as long as necessary, for pointing out a free
'slot'. Once the Table is, say, 50% occupied, this might slow
the process down, a lot, and this is why I...

Also, prefer the Double Tables Method, with a Linked Record.
This should be way faster, as there is nothing else to be
done, but a simple pointer switch, to go on.

I also do a Comparison one the CheckSums, before the final
String Comparison, but, instead of 32, i use a 64 Bits Record
Member. Those ones have never been observed collising, up to
now, on any big file. This indead saves a lot of time.


Betov.

< http://rosasm.org/ >


Chewy509

unread,
Mar 7, 2005, 4:45:25 AM3/7/05
to
Paul wrote in message...

> Chewy509 wrote:
>> I've been looking into several hashing algorithms, in particular for
>> use in a compiler/assembler.
>>

<snip excellent post>

For the moment (this is just a bit of a progress report for everyone), I'm
using elfhash() for the hash function, and a closed hash table 30,241 in
size, and use linear rehashing on a collision, for initial stage
development. This appears to work well, until I get up to around 20,000
tokens, but for a first alpha version system is acceptable (for me).
However, I've isolated the whole hashing functionality, so that swapping it
out (the algo and hash-table setup) at a later stage won't be too much
hassle.

I did try Rene's (Betov's) hash algorithm he posted which worked quite well
(a bit better than elfhash in distribution, thus lower rate of collisions),
so will keep this in mind for the future use if needed. (Rene, I take it the
code snippet and algo as posted is Public Domain and not GPL?)

As it stands the hashing system (including hash table) works without fault
(just a little slow when getting above 20,000+ labels).

>> PS. I still haven't finalised what the largest token/label length
>> should be, is 32 characters a good size, or should I make this
>> smaller or larger? How often would people use labels larger than
>> say 16 or 32 characters?
>
> Why does this matter? Personally, I would set the largest size to be
> 2^31-1. Remember an assembler can be used as the back end for a higher
> level language's compiler. That means support for autogenerated labels
> is something worth while -- and your HLL designer might be miffed by a
> hard limit that they wish was not there.
>

At the moment, I am storing the token within the hash table, to be used as
part of collision detection and for the moment I take my input, and generate
another assembler listing (very similar to how HLA works), so keeping the
labels the same in the output, sort of helps (in both further source hacking
and debugging). Obvisously having a token 4GB in size wouldn't be too
useful, (just imagine the size of the source file, if you used tokens that
size :D ), so was aiming for a more realistic size was intended. There's no
problem setting the token length larger (just need to change 1 #define in
the source)... But just out of interest, do any other current
compilers/assemblers impose a size limit on the label? (even if it is rather
large, eg 65535 characters).

--
Darran (aka Chewy509)...


Betov

unread,
Mar 7, 2005, 7:17:23 AM3/7/05
to
"Chewy509" <chewy509.doe...@austarnet.com.au> écrivait
news:392m8aF...@individual.net:

> I did try Rene's (Betov's) hash algorithm he posted which worked quite
> well (a bit better than elfhash in distribution, thus lower rate of
> collisions), so will keep this in mind for the future use if needed.
> (Rene, I take it the code snippet and algo as posted is Public Domain
> and not GPL?)

As opposed to most, i do not consider any Algo as
diserving any License right. Any Algo, whatever license,
is _mine_ and _yours_. No matter what "they" claim.

To me, a license must apply to something significative,
and no "Algo" can be significative. If it was, changing
one single instruction would save :) and, as, in all
cases, you _do_ have to adapt to your context...

Though, if you take the _whole_ RosAsm CheckSum Tables
Managements, in this case, it is, evidently GPLed, and
hopefully GPLed, as it saves you from having to even
ask me for any re-use permission. :)


> As it stands the hashing system (including hash table) works without
> fault (just a little slow when getting above 20,000+ labels).

"20,000 and +" is perfectly possible to have (RosAsm
is more than 25,000... :)


Betov.

< http://rosasm.org/ >

Octavio

unread,
Mar 7, 2005, 9:29:24 AM3/7/05
to
webs...@gmail.com wrote in message news:<1110168928.7...@f14g2000cwb.googlegroups.com>...

> 2. Open hash table -- the values are always hashed to a particular
> index into your open hash table, but each index would then point to a
> simpler data structure such as a linked list or other similar
> vector-like list. So insert/search/delete just reduce to checking the
> index, then performing the insert/search/delete
And is also very important to order the list, placing the last accesed
key struct to the begining of the list, this gives better performance
especially when the number of collisions is very high.
And this is my favourite hash function by now, hash codes are not as
god as the bob jenkins hash and the buffer must be aligned ,but is
much faster, and has very few collisions with x86 nemonics.Also the
aligned buffer allows for
faster 32bit comparisons later.

;esi points to a buffer dword aligned
;wich contains the size and the keyword to
search
;padded with zeros for align
#hash push esi,ecx ;esi-> db name_size,'name' align 4
lodsd ebx=eax ;returns hash code in eax,ebx
cl=al shr cl,3 jnc >2
sub esi,4
# add eax,[esi] adc ebx,[esi+4] add esi,8
# rol eax,12 sub eax,ebx
rol ebx,9 sub eax,ebx sub ebx,eax
rol eax,15 sub eax,ebx
dec cl jns <2
pop esi,ecx
ret

Betov

unread,
Mar 8, 2005, 2:54:11 AM3/8/05
to
NoDot <no_...@msn.remove_this.com> écrivait news:3944qnF5uipqhU1
@individual.net:

>> (very similar to how HLA works)
>

> Oh, boy, Betov will kill you for saying that.


???!!!... and why should i ???!!!...

1) I have nothing against High Level Assemblers

2) RosAsm _IS_ an Hight level able Assembler, as
long as "Bottom-Up" includes "Up".

Where i will kill him is if he tells me that Randall
Hyde is an Assembly Programmer, or even a competent
Programmer, whereas he is an unsignificant ass-hole
unable to do anything clean, but for selling himself.
Assembly is a kind of "Truth Language". Therefore,
there is no room for a liar and swindler like Randall
Hyde in the Assembly arena.

What is Randall Hyde "HLA", anyway? This is an HLL
Pre-Parser, that is a very bad HLL, of course, but
why should i fight against "very bad HLL Pre-Parsers"?
No i don't care, at all of "Bad HLL Pre-Parsers". I
just have no choice but fighting for the defense of
Assembly when a swindler means to misslead beginners
and to claim that a very bad HLL Pre-Parser _is_ an
Assembler. It is _NOT_. Period.


Betov.

< http://rosasm.org/ >


webs...@gmail.com

unread,
Mar 8, 2005, 3:37:50 AM3/8/05
to
Betov wrote:
> webs...@gmail.com écrivait:

> You don't explain how and why _two_ 'hashing' Algo should
> be used.

Right ... my post was long enough and I didn't have any relgion or
politics to discuss. :) If you want to know more about hashing you
can read this article:

http://en.wikipedia.org/wiki/Hash_table

I hope people understand that the second hash function is used to
essentially sequence all entries in a hash table in some order in which
test for insert/search in the event that the primary hash fails. So
using a secondary hash value (then adding 1 or doing some other thing
to force it to be non-zero) as a constant offset just successively
added relative to the primary position is one way -- though it has some
clustering problems. The link above suggests quadradic offsets
(remember to use finite differences!) as an improvement to this.

> Also, prefer the Double Tables Method, with a Linked Record.
> This should be way faster, as there is nothing else to be
> done, but a simple pointer switch, to go on.

Right, a variation of the open hashing scheme is fine. As I said, I
prefer the open hashing method myself, since periodic full table
rebuilding just seems like the wrong way to go to me (its a lot of
code, you give up real time response, and I don't think the benefit is
that great in practice).

> I also do a Comparison on the CheckSums, before the final


> String Comparison, but, instead of 32, i use a 64 Bits Record
> Member. Those ones have never been observed collising, up to
> now, on any big file. This indead saves a lot of time.

Ok, I used to do that too -- but I discovered that even 64 bits is very
risky, and the state of the art at the time simply was not up to the
task of making 64 bit achieve probabilistic perfection. Today, using
something like the FNV hash, you could probably do reasonably well,
however you still can fail to a deliberate attack:

http://en.wikipedia.org/wiki/Birthday_Paradox

Basically, it takes O(2^32) tries and space to find a collision for any
hashing function that outputs only 64 bits, which is achieveable in
today's technology. So you're going to want to use a serious crypto
function for real hashing, and you need to output at *least* 160 bits
(2^80 tries is considered out of reach for machines for the forseeable
future.)

However, in the past 6 months, the primary historical choices, namely
MD5 and SHA-1, have both been shown to have serious weakness (a number
of MD5 collision has been found, and a theoretical weakness has
allegedly been shown in to reduce SHA-1 to only O(2^69) in
effectiveness.) So you really *should* use something else like SHA-256
or Tiger or Whirlpool:

http://en.wikipedia.org/wiki/Cryptographic_hash_function

This is all a bit much, so what I generally do instead is just stick
with a fast 32 bit hash, and when it looks like a potential
duplication, I then take the hit and proceed to perform a full check,
since its rare enough anways.

webs...@gmail.com

unread,
Mar 8, 2005, 3:53:05 AM3/8/05
to
Octavio wrote:

> webs...@gmail.com wrote:
> > 2. Open hash table -- the values are always hashed to a
> > particular index into your open hash table, but each index
> > would then point to a simpler data structure such as a linked
> > list or other similar vector-like list. So insert/search/
> > delete just reduce to checking the index, then performing the
> > insert/search/delete
> And is also very important to order the list, placing the last
> accesed key struct to the begining of the list, this gives
> better performance especially when the number of collisions is
> very high.

Yes, a little "LRU" heuristic assistance, I couldn't agree more.

> And this is my favourite hash function by now, hash codes are not

> as god as the bob jenkins hash and the buffer must be aligned,
> but is much faster, and has very few collisions with x86
> mneumonics. Also the aligned buffer allows for faster 32bit


> comparisons later.
>
> ;esi points to a buffer dword aligned
> ;wich contains the size and the keyword to
> search
> ;padded with zeros for align
> #hash push esi,ecx ;esi-> db name_size,'name' align 4
> lodsd ebx=eax ;returns hash code in eax,ebx
> cl=al shr cl,3 jnc >2
> sub esi,4
> # add eax,[esi] adc ebx,[esi+4] add esi,8
> # rol eax,12 sub eax,ebx
> rol ebx,9 sub eax,ebx sub ebx,eax
> rol eax,15 sub eax,ebx
> dec cl jns <2
> pop esi,ecx
> ret

You wierdos and your "new assemblers" ... :) It looks like the main
substance of your mixing is rotates and subtractions. I don't see how
this is avalanche resisitant -- for example, I believe that a sequence
of 24 zero bytes, 48 zero bytes, 72 zero bytes, and so on, will all
hash to the value of 0. Are you sure there isn't a simple mathematical
examination which reveals trivial collisions in just 8 bytes
(apparently your minimum size)?

Chewy509

unread,
Mar 8, 2005, 4:52:44 AM3/8/05
to
"NoDot" wrote in message...

> Chewy509 wrote:
>> (Rene, I take it the code snippet and algo as posted is Public Domain and
>> not GPL?)
>
> Oh, you are *that* new, are you?

Well I do know enough, that there are enough people out there, who will
shout "He stole GPL code! Die, you thief! To hell with you!" at the first
sight of the matching code. Despite the fact that Rene, has just just openly
stated, the pure function (and algo) isolated as is, he considers PD, but
the entire hash table and supporting code is GPL (as coded within Rosasm).
I'd rather ask, than just assume. :) (PS. Thanks Rene ;) ).

And with the current Software Patent nightmare brewing in the US and EU,
better be safe than sorry. ;)

>> (very similar to how HLA works)
>
> Oh, boy, Betov will kill you for saying that.

Why? I just used that (HLA) as a reference to how it works now.

--
Darran (aka Chewy509)...


Octavio

unread,
Mar 8, 2005, 12:08:10 PM3/8/05
to
webs...@gmail.com wrote in message news:<1110271985.4...@l41g2000cwc.googlegroups.com>...

> this is avalanche resisitant -- for example, I believe that a sequence
> of 24 zero bytes, 48 zero bytes, 72 zero bytes, and so on, will all
> hash to the value of 0. Are you sure there isn't a simple mathematical
> examination which reveals trivial collisions in just 8 bytes
> (apparently your minimum size)?
minimum size is 4 bytes, and the purpose of this hash routine is to
hash keywords (leters & digits) not a string of zeros, and do it very
fast rather than avoid trivial collisions.
If you want to hash a string of zeros you can provide a non zero
initial value,or use another algo like crc32.

NoDot

unread,
Mar 8, 2005, 9:15:08 PM3/8/05
to
Betov wrote:
> 1) I have nothing against High Level Assemblers

OK, that surprises me.

> 2) RosAsm _IS_ an Hight level able Assembler, as
> long as "Bottom-Up" includes "Up".

A high level assembler has the mini-HLL built in, but as optional. (I
reiterate that last word: *optional*)RosAsm's language is a low-level
assembly language. Your 'pre-parsers' make is otherwise.

> [snip]

There is a difference between a language and it's implimentation.
Understand (and accept) that, and then we can talk sensibly.

> Betov.
> < http://rosasm.org/ >

--
The above was written by NoDot.

Visit the Website of NoDot:
<www.geocities.com/nodot1989/>

NoDot

unread,
Mar 8, 2005, 9:16:50 PM3/8/05
to
Chewy509 wrote:
> Why? I just used that (HLA) as a reference to how it works now.

He doesn't like HLA. Read his reply to my post for an example of what he
usually says in the HLA vs RosAsm debates.

webs...@gmail.com

unread,
Mar 12, 2005, 1:47:40 PM3/12/05
to
Octavio wrote:

> webs...@gmail.com wrote:
> > this is avalanche resisitant -- for example, I believe that a
> > sequence of 24 zero bytes, 48 zero bytes, 72 zero bytes, and so
> > on, will all hash to the value of 0. Are you sure there isn't
> > a simple mathematical examination which reveals trivial
> > collisions in just 8 bytes (apparently your minimum size)?
> minimum size is 4 bytes, and the purpose of this hash routine is
> to hash keywords (leters & digits) not a string of zeros, and do
> it very fast rather than avoid trivial collisions.
> If you want to hash a string of zeros you can provide a non zero
> initial value,or use another algo like crc32.

Octavio I have written a test for various hash functions here:

http://www.pobox.com/~qed/htest.zip

I apologize in advance for it all being written in C, however I am
going for a wider audience than I could get with assembly language.
I'd also like to point out its not a performance test, but rather a
hash quality test.

In any event, as I am not fammiliar with the dialect of assembler that
you are using, I cannot know for sure if I translated your function
correctly. Do you mind checking it to see? Your function is called:
OctavioHash().

Also if anyone else is interested in testing their hash functions with
this test, just point me at them and I will throw them in there.

toby

unread,
Mar 12, 2005, 4:04:52 PM3/12/05
to
Betov wrote:
> "Chewy509" <chewy509.doe...@austarnet.com.au> écrivait
> news:392m8aF...@individual.net:
>
> > I did try Rene's (Betov's) hash algorithm he posted which worked
quite
> > well (a bit better than elfhash in distribution, thus lower rate of
> > collisions), so will keep this in mind for the future use if
needed.
> > (Rene, I take it the code snippet and algo as posted is Public
Domain
> > and not GPL?)
>
> As opposed to most, i do not consider any Algo as
> diserving any License right. Any Algo, whatever license,
> is _mine_ and _yours_. No matter what "they" claim.

Unless in public domain, *implementations* are subject to copyright and
hence use is governed by a license (note: not patentable). Of course
one cannot copyright an idea but one can copyright the expression of an
idea. Think: text of a novel. I am not opening a debate on this but I
wonder if you would entirely disagree with these points.

>
> [snip]


>
> Though, if you take the _whole_ RosAsm CheckSum Tables
> Managements, in this case, it is, evidently GPLed, and
> hopefully GPLed, as it saves you from having to even
> ask me for any re-use permission. :)
>
>

> ...
>
>
> Betov.
>
> < http://rosasm.org/ >

Betov

unread,
Mar 12, 2005, 4:58:31 PM3/12/05
to
"toby" <to...@telegraphics.com.au> écrivait news:1110661492.704047.251680
@l41g2000cwc.googlegroups.com:

>> As opposed to most, i do not consider any Algo as
>> diserving any License right. Any Algo, whatever license,
>> is _mine_ and _yours_. No matter what "they" claim.
>
> Unless in public domain, *implementations* are subject to copyright and
> hence use is governed by a license (note: not patentable). Of course
> one cannot copyright an idea but one can copyright the expression of an
> idea. Think: text of a novel. I am not opening a debate on this but I
> wonder if you would entirely disagree with these points.

Implementations have poor sense, as, in most cases,
you _do_ have to _adapt_ to a given context. That
is, to _rewrite_.

In Assembly, like in any human language, there
are infinite way of saying one thing.

Now, talking of Asm Algos, if they are small
enough, like a CheckSum64 Algo, that is nothing
but a trivial adjusted multiplication, even the
effective implementation fails to any definition
of copyright, unless MicroSoft would decide that
they have rights on multiplication. :))

Plus, a License makes sense if the beholder of
the license decides to activate the legal coin.
So, for what one writes, this is the beholder who
decides of what is what. In the GPL case, this
is a bit more complicated as, so to say, any GPL
fan is the legal owner of the GPLed stuff. But,
well, i do not expect that the GPL fans and the
FSF will "copyleft" the multiplication either...

:)) :)) :))

Betov.

< http://rosasm.org/ >


Betov

unread,
Mar 12, 2005, 5:04:58 PM3/12/05
to
webs...@gmail.com écrivait news:1110653260.041534.189480
@z14g2000cwz.googlegroups.com:

> Octavio I have written a test for various hash functions here:
>
> http://www.pobox.com/~qed/htest.zip
>
> I apologize in advance for it all being written in C, however I am
> going for a wider audience than I could get with assembly language.
> I'd also like to point out its not a performance test, but rather a
> hash quality test.
>
> In any event, as I am not fammiliar with the dialect of assembler that
> you are using, I cannot know for sure if I translated your function
> correctly. Do you mind checking it to see? Your function is called:
> OctavioHash().
>
> Also if anyone else is interested in testing their hash functions with
> this test, just point me at them and I will throw them in there.

Sorry. I don't talk chineese, and i fail to understand
why i should have to install a demential tool like
CVS for testing a trivial Demo.

I suppose and hope that Octavio is able to propose
something more serious than that nasty bad joke.


Betov. Assembly Programmer. ;)

< http://rosasm.org/ >


toby

unread,
Mar 12, 2005, 6:39:34 PM3/12/05
to
Betov wrote:
> "toby" <to...@telegraphics.com.au> écrivait
news:1110661492.704047.251680
> @l41g2000cwc.googlegroups.com:
>
> >> As opposed to most, i do not consider any Algo as
> >> diserving any License right. Any Algo, whatever license,
> >> is _mine_ and _yours_. No matter what "they" claim.
> >
> > Unless in public domain, *implementations* are subject to copyright
and
> > hence use is governed by a license (note: not patentable). Of
course
> > one cannot copyright an idea but one can copyright the expression
of an
> > idea. Think: text of a novel. I am not opening a debate on this but
I
> > wonder if you would entirely disagree with these points.
>
> Implementations have poor sense, as, in most cases,
> you _do_ have to _adapt_ to a given context. That
> is, to _rewrite_.
>
> In Assembly, like in any human language, there
> are infinite way of saying one thing.

Exactly. An infinite number of different implementations, and the
expression of each can be copyrighted. You can always make another one
*legally* if you start from scratch (i.e. not by copying an existing
one). If you prefer to copy someone else's expression, then you must
abide by their license. That is the reasonable expectation and right of
any author (as you would expect for yourself, even if you choose public
domain). This is the perfectly fair system that has allowed us to
advance the state of the art thus far. (By giving legal weight to
spurious "rights", Patent stupidity distorts and poisons this ecosystem
stone dead.)

>
> Now, talking of Asm Algos, if they are small
> enough, like a CheckSum64 Algo, that is nothing
> but a trivial adjusted multiplication, even the
> effective implementation fails to any definition
> of copyright, unless MicroSoft would decide that
> they have rights on multiplication. :))

Of course. But here you are talking about an "idea" - which I never
claimed was copyrightable. In addition, you are invoking the
mathematical tradition of publishing methods for free use by those who
follow.

>
> Plus, a License makes sense if the beholder of
> the license decides to activate the legal coin.
> So, for what one writes, this is the beholder who
> decides of what is what. In the GPL case, this
> is a bit more complicated as, so to say, any GPL
> fan is the legal owner of the GPLed stuff.

As an author you automatically have copyright no matter what license
you choose (or you can disclaim it and give to public domain). It has
nothing to do with the GPL per se.

> But,
> well, i do not expect that the GPL fans and the
> FSF will "copyleft" the multiplication either...

Of course not. AFAIK the FSF relies upon the meaning of "copyright" as
I give above: protection for a specific expression of an idea. This is
an important protection whether the applicable license is GPL or not.
It's only about asserting one's copyright on the work that one owns:
The program you just spent 400 hours developing, or the novel you spent
3 years writing.

Those who claim spurious ownership (patents, M$, SCO, etc) must not be
tolerated under any fair system: it is theft.

webs...@gmail.com

unread,
Mar 13, 2005, 1:16:48 AM3/13/05
to
Betov wrote:
> Sorry. I don't talk chineese, and i fail to understand
> why i should have to install a demential tool like
> CVS for testing a trivial Demo.

You don't have to install CVS to test the demo. If you can't program
in C, or execute a C compiler then ignore it -- clearly I am not
directing this demo at you.

> Betov. Assembly Programmer. ;)

--
Paul Hsieh. Programmer.

Betov

unread,
Mar 13, 2005, 3:55:59 AM3/13/05
to
webs...@gmail.com écrivait news:1110694608.443852.259460
@f14g2000cwb.googlegroups.com:

> {...]


>
> You don't have to install CVS to test the demo. If you can't program
> in C, or execute a C compiler then ignore it

??? Why on earth would i run a C executable that does
nothing at all, without even knowing how the tests
are done, what is tested and what are the results???

> clearly I am not
> directing this demo at you.

Sure, you were addressing Octavio, namely. Nevertheless,
we are talking at a public News Group inside a thread
where i did post something significative and effective,
you like it or not. So, i am full rights to consider that
your test should also be interresting for me.

Now, when you will be over with this and eventualy post
something readable, - for example, the Asm Code of the
various Algos and their performances -, i will gladly
test them with my own tool, that is written in Assembly,
that outputs graph and number, and that takes 2 minutes
for implementing an Algo to be tested.


"Cheese". Betov.

< http://rosasm.org/ >

Octavio

unread,
Mar 13, 2005, 10:42:44 AM3/13/05
to
webs...@gmail.com wrote in message news:<1110653260.0...@z14g2000cwz.googlegroups.com>...

> I'd also like to point out its not a performance test, but rather a
> hash quality test.

then you should use error checking or encription algorithms, mi hash
routine
is for dictionary search and speed is the most important.

> In any event, as I am not fammiliar with the dialect of assembler that
> you are using,

I don愒 know why you found it so strange, it uses the same instruction
nemonics and parameters order than most assemblers.

> I cannot know for sure if I translated your function
> correctly.

No, there is one 64 bit addition not two 32 bits aditions, and the len
is used in hash computation.


perhaps are you familiar with FASM?

hash:
push ecx
push esi
lodsd
mov ebx,eax
mov cl,al
shr cl,3
jnc l2
sub esi,4
l1:


add eax,[esi]
adc ebx,[esi+4]
add esi,8

l2:


rol eax,12
sub eax,ebx
rol ebx,9
sub eax,ebx
sub ebx,eax
rol eax,15
sub eax,ebx
dec cl

jnl l1
pop esi
pop ecx
ret

Is not posible to make a simply translation to 'C' and i惴 not very
familiar
with 'C' but i will try to do something usable for your test, and also
expect
that you post the results, i don愒 want to install a 'C' compiler and
learn how to use it.

this routine accepts any len value but reads up to 8 bytes after the
end of the string, perhaps your compiler doesn愒 support uint64_t but
you are
probably a better 'C' programmer than me, and you know how to solve
it.

union r{ uint64_t ab; uint32_t a,b; char l; }

uint64_t OctavioHash (const char * s, int len) {
uint64_t tmp;
r.ab=(uint64_t *)s ;
r.ab<<=8 ;
r.l=(char)len;
if (len <8 ){
switch(len){
case 6 : r.b&&=0xffffff ;
case 5 : r.b&&=0xffff ;
case 4 : r.b&&=0xff ;
case 3 : r.b=0
case 2 : r.ab&&=0xffffff ;
case 1 : r.ab&&=0xffff;
case 0 : return 0
}
mix() return r.ab
}
s+=7; len-=7;
while (len > 7) { s+=8; mix();len-=8;r.ab+=(uint64_t *)s; }
mix()
tmp=(uint64_t *)s
switch(len){ case 7 : tmp&&=0xffffffffffffff ;
case 6 : tmp&&=0xffffffffffff ;
case 5 : tmp&&=0xffffffffff ;
case 4 : tmp&&=0xffffffff ;
case 3 : tmp&&=0xffffff ;
case 2 : tmp&&=0xffff ;
case 1 : tmp&&=0xff ;
case 0 : return r.ab
}
r.ab+=tmp mix() return r.ab

}

void mix() {
r.b = rol32 (r.b, 9);
r.a -=r.b;
r.b-=-r.a;
r.a = rol32 (r.a, 15);
r.a-= r.b;
return
}

Octavio

unread,
Mar 13, 2005, 10:44:23 AM3/13/05
to
> I suppose and hope that Octavio is able to propose
> something more serious than that nasty bad joke.
what joke?
what do you mean?

webs...@gmail.com

unread,
Mar 13, 2005, 10:49:48 AM3/13/05
to
Betov wrote:
> webs...@gmail.com:

> > You don't have to install CVS to test the demo. If you can't
program
> > in C, or execute a C compiler then ignore it
>
> ??? Why on earth would i run a C executable that does
> nothing at all, without even knowing how the tests
> are done, what is tested and what are the results???

I'm not suggesting that you do. The key word in there was *ignore*.

> > clearly I am not directing this demo at you.
>
> Sure, you were addressing Octavio, namely. Nevertheless,
> we are talking at a public News Group inside a thread
> where i did post something significative and effective,
> you like it or not. So, i am full rights to consider that
> your test should also be interresting for me.

Yes ok, sorry for the oversight. But everyone is using these "strange"
assemblers -- its been a while since I've looked at other people x86
ASM, and I'm used to MASM and TASM syntax. You can imagine this has
caused a little bit of a barrier for me. I've implemented your
function now -- you're going to wish I hadn't (see below.)

> Now, when you will be over with this and eventualy post
> something readable, - for example, the Asm Code of the
> various Algos and their performances -, i will gladly
> test them with my own tool, that is written in Assembly,
> that outputs graph and number, and that takes 2 minutes
> for implementing an Algo to be tested.

Oh you want a piece of this? ;) Here's the assembly code for my own
hash function (basically straight from the compiler, but I couldn't
resist just some slight optimizations):

---8X-----------------------------------------------------

SuperFastHash_:
push ebx
push ecx
push esi
push edi
mov ecx,eax
mov esi,edx
mov eax,edx
test edx,edx
jle L$3
test ecx,ecx
jne L$4
L$3:
xor ebx,ebx
jmp L$13
L$4:
mov edi,edx
sar esi,0x00000002
and edi,0x00000003
xor edx,edx
cmp edx,esi
jge L$6
L$5:
xor ebx,ebx
mov bx,word ptr [ecx]
add eax,ebx
mov ebx,eax
shl ebx,16
xor eax,ebx
xor ebx,ebx
mov bx,word ptr 0x2[ecx]
shl ebx,11
xor eax,ebx
add ecx,4
mov ebx,eax
shr eax,11
inc edx
add eax,ebx
cmp edx,esi
jl L$5
L$6:
cmp edi,2
jb L$7
jbe L$9
cmp edi,3
je L$8
jmp L$12
L$7:
cmp edi,1
je L$10
jmp L$12
L$8:
xor edx,edx
mov dx,[ecx]
add eax,edx
mov edx,eax
shl edx,16
xor eax,edx
movsx edx,byte ptr 0x2[ecx]
shl edx,18
xor eax,edx
mov edx,eax
shr edx,11
jmp L$11
L$9:
xor edx,edx
mov dx,[ecx]
add eax,edx
mov edx,eax
shl edx,11
xor eax,edx
mov edx,eax
shr edx,17
jmp L$11
L$10:
movsx edx,byte ptr [ecx]
add eax,edx
mov edx,eax
shl edx,10
xor eax,edx
mov edx,eax
shr edx,1
L$11:
add eax,edx
L$12:
lea edx,[eax*8]
xor eax,edx
mov edx,eax
shr edx,5
add eax,edx
lea edx,[eax*4]
xor eax,edx
mov edx,eax
shr edx,15
add eax,edx
mov edx,eax
mov ebx,eax
shl edx,10
xor ebx,edx
L$13:
mov eax,ebx
pop edi
pop esi
pop ecx
pop ebx
ret

---8X-----------------------------------------------------

I trust you can adapt it to your dialect of assembly language.

Now *my* tools is not a performance tool. Its a hash function quality
analysis tool. Betov, I was trying to save you some public
embarassment. I don't have anything against you, but you leave me with
no choice:

Here's my translation of your hash for 32 bits to C:

/* Derived from Betov's CheckSum32 */
uint32_t BetovHash (const char * s, int len) {
int i;
uint32_t a, b;
uint64_t x;

a = b = 0;
i = 0;
while (i < len) {
a = (rol32 (a, 1) & INT32_C(0xffffff00)) | ((uint8_t)(*s));
x = (uint64_t) a;
x = x * x;
b ^= (uint32_t) (x >> 32);
a = (uint32_t) x;
i++;
s++;
}

return a ^ (b + len);
}

This hash has some bad collisions including:

Betov(4)) = Betov(5)) = 0x1bfc093
Betov(1)) = Betov(0)) = 0x149ca93
Betov(41)) = Betov(51)) = 0xbd587d83
Betov(42)) = Betov(52)) = 0x46e83df6
Betov(43)) = Betov(53)) = 0x8eaf52d
Betov(44)) = Betov(54)) = 0x356b680
Betov(45)) = Betov(55)) = 0x2f4190ce
Betov(45)The) = Betov(55)The) = 0xb34c20dd
Betov(46)) = Betov(56)) = 0x99f05825
Betov(48)) = Betov(58)) = 0x18a229f1
Betov(49)) = Betov(59)) = 0x2cbed127
Betov(52) = Betov(42) = 0x1c13dc6
Betov(0))) = Betov(1))) = 0x47f863fd
Betov(14) = Betov(04) = 0x14b5a92
Betov(148)) = Betov(048)) = 0xdae1382a
Betov($)) = Betov(%)) = 0x673a93
Betov(12) = Betov(02) = 0x14b11c6
Betov(1L) = Betov(0L) = 0x14ec692
Betov(11) = Betov(01) = 0x14aed63
Betov(13) = Betov(03) = 0x14b362b
Betov(*)) = Betov())) = 0xad3093
Betov(1990) = Betov(0990) = 0xc6ee101d
Betov(1993) = Betov(0993) = 0x16ed2130
Betov(1F00) = Betov(0F00) = 0x9525cf35
Betov(1F18) = Betov(0F18) = 0xe955f742
Betov(00F6) = Betov(10F6) = 0xc9c89d3a
Betov(1F20) = Betov(0F20) = 0xdd64eb7
Betov(0+i0) = Betov(1+i0) = 0xbb7e365d
Betov(0985) = Betov(1985) = 0x81d43e3f
Betov(47) = Betov(57) = 0x1c211d3
Betov(49) = Betov(59) = 0x1c266b3
Betov(559) = Betov(459) = 0x9fb0b6ee
Betov(439) = Betov(539) = 0x794cff0d
Betov(471) = Betov(571) = 0x7552326b
Betov(474) = Betov(574) = 0x8a7b1f9a
Betov(477) = Betov(577) = 0x9f9c48db

A large part of the problem is that your <rol eax,1 | lodsb>
combination is basically pounding on the low 8 bits of eax, with very
few bits escaping out past the bottom 8 bits. This means that your
upper 24 bits captures less of the input entropy than it should. And
the result is that you have aliasing amongst input that has the same
length, and the same last character.

Doing something as simple as rol eax, 2 instead makes this hash
function actually quite robust for text hashing. See you were supposed
to very quitely make this change when nobody was looking and suggest it
as an "improvement for anomilous cases", then we'd all harumph, but
nevertheless accept the change, and nobody would know of the flaws of
your original function. Alas, you decided to walk a different path ...

Oh and BTW, as long as you are talking about performance, since you are
overwritting the bottom 8 bits anyways, why don't you use SHL instead
of ROL? (Its a little faster to do that on at least some x86s out
there.)

P.S.: An updated version of the test with Betov's original function, as
well as my proposed fix, and a fix for my broken implementation of
Octavio's function which appears to make it robust is available from my
site here:

http://www.pobox.com/~qed/htest.zip

'\o//'annabee

unread,
Mar 13, 2005, 11:42:39 AM3/13/05
to
På 13 Mar 2005 07:49:48 -0800, skrev <webs...@gmail.com>:

" You can imagine this has
caused a little bit of a barrier for me"

> I trust you can adapt it to your dialect of assembly language.

Why would he not ?????
Thats not the question at all. The question is why you couldnt easily do
it the other way :))))))
THAT whats the people will think about, and since you say you cant,
everything you say is to be taken with a *BIG* grain of salt.


> Now *my* tools is not a performance tool. Its a hash function quality
> analysis tool. Betov, I was trying to save you some public
> embarassment. I don't have anything against you, but you leave me with
> no choice:

:)))))))))))) Are you two years old ?

:))))))))))

> Here's my translation of your hash for 32 bits to C:

> /* Derived from Betov's CheckSum32 */
> uint32_t BetovHash (const char * s, int len) {
> int i;
> uint32_t a, b;
> uint64_t x;
>
> a = b = 0;
> i = 0;
> while (i < len) {
> a = (rol32 (a, 1) & INT32_C(0xffffff00)) | ((uint8_t)(*s));
> x = (uint64_t) a;
> x = x * x;
> b ^= (uint32_t) (x >> 32);
> a = (uint32_t) x;
> i++;
> s++;
> }
>
> return a ^ (b + len);
> }

Why on earth would you destroy the readability with this piece of cryptic
gospel ?

:))))))))))))))))))))))))))))))))))))

You're the same kind of "tester" as Master P ????? :)))))))))))))))))))))
:)))))))))))))))))))

> A large part of the problem is that your <rol eax,1 | lodsb>
> combination is basically pounding on the low 8 bits of eax, with very
> few bits escaping out past the bottom 8 bits.

LOL. You havent captured much. Your analyse is very premature. Why didnt
you check the performances on REAL LIFE data, and saved yourself from
looking like a hum hum (::proffesional:))) ?

> This means that your
> upper 24 bits captures less of the input entropy than it should.

:)))))))))))))))))

> And
> the result is that you have aliasing amongst input that has the same
> length, and the same last character.

Bla bla bla. How often does that happen ??????

:-))))))))))))))))))))))))))))))))))))))))))) LOL.

You come of as just amusing Paul. Could you really be this ignorant, after
all ?

> Doing something as simple as rol eax, 2 instead makes this hash
> function actually quite robust for text hashing. See you were supposed
> to very quitely make this change when nobody was looking and suggest it
> as an "improvement for anomilous cases", then we'd all harumph, but
> nevertheless accept the change, and nobody would know of the flaws of
> your original function. Alas, you decided to walk a different path ...

Lol ! :)))))))))))
There are exactly 44 MORE collisons with your changes!! :))))))

> Oh and BTW, as long as you are talking about performance, since you are
> overwritting the bottom 8 bits anyways, why don't you use SHL instead
> of ROL? (Its a little faster to do that on at least some x86s out
> there.)

:-)))) Thats made no changes here. And mind you, it would make NO CHANGES
anywhere else either.

> P.S.: An updated version of the test with Betov's original function, as
> well as my proposed fix, and a fix for my broken implementation of
> Octavio's function which appears to make it robust is available from my
> site here:

Why on earth would a man use C ?????? Its so lame. you don't learn much
about programming using C, you just learn to be ignorant and arrogant,
when you have nothing to support your arroganze and ignorance with than a
cryptic piece of code, to maybe impress a child.

:-))))))))))))))


>
> http://www.pobox.com/~qed/htest.zip

Heres the app used to test you proposed changes:

http://szmyggenpv.com/downloads/MMEXGray.Zip

--
http://TheWannabee.org

webs...@gmail.com

unread,
Mar 13, 2005, 12:01:14 PM3/13/05
to
Octavio wrote:

> webs...@gmail.com wrote:
> > I'd also like to point out its not a performance test, but rather
> > a hash quality test.
>
> then you should use error checking or encription algorithms, mi hash
> routine is for dictionary search and speed is the most important.

Yes, but speed for a hash table is related to quality (because that's
what determines the number of collisions, and rehashes you will need to
do) as well as flat out performance. A good hash function, I would
suggest, needs to do well on both.

> > In any event, as I am not fammiliar with the dialect of assembler
> > that you are using,
>

> I don´t know why you found it so strange, it uses the same


> instruction nemonics and parameters order than most assemblers.

Things like "lodsb ebx=eax" freaked me out, and I assumed that lines
preceed with # were a comment (lots of other programming languages use
that convention.)

> > I cannot know for sure if I translated your function
> > correctly.
>
> No, there is one 64 bit addition not two 32 bits aditions, and
> the len is used in hash computation.

Ok, I fixed the 64 bit addition as well as a bug in the rol32() macro.
The way you are using/extracting the string length suggests to me that
you are hashing Pascal-style strings. Is this correct? Either way,
what I chose to do was just ram the length directly into the a and b
variables (analogous to your EAX and EBX), and rely on the quiescence
of your hash function rather than tricking it into mashing together
part of your length and part of your data at the start.

Either way, this appears to fix the hash function so that I am not able
to find collisions in it. Meaning that your hash function was actually
quite sound from the very beginning. Congratulations. :)

Betov

unread,
Mar 13, 2005, 12:03:34 PM3/13/05
to
webs...@gmail.com écrivait news:1110728988.958820.14030
@g14g2000cwa.googlegroups.com:

> Betov wrote:
>> webs...@gmail.com:
>> > [...]


> I'm not suggesting that you do. The key word in there was *ignore*.

Sure. I am not completly senile, mind you.

Not yet... :))


>> > [...]


>
> Yes ok, sorry for the oversight. But everyone is using these "strange"
> assemblers -- its been a while since I've looked at other people x86
> ASM, and I'm used to MASM and TASM syntax. You can imagine this has
> caused a little bit of a barrier for me.

No, i can't. The actual Assemblers, compared to the
obsolete ones, are way easier and have, _all_, a way
better, more logical and simpler syntax than these
obsolete ones. So, no, i can not understand. Sorry.


> I've implemented your
> function now -- you're going to wish I hadn't (see below.)

If you say so... Let's see this. :))


>> [...]

Are you sure that 94 Instructions for a trivial CheckSum
could be called "SuperFast"? :))


> I trust you can adapt it to your dialect of assembly language.

Sure. By the way, it doesn't work. But well, a detail.
I might have miss-understood the "byte ptr 0x2[ecx]"
thingie. I have supposed this was for "byte[ecx+2]",
but i may be wrong...

Anyway this is not the real problem... that is that
you do not provide the required information for any
port.


> Now *my* tools is not a performance tool. Its a hash function quality
> analysis tool. Betov, I was trying to save you some public
> embarassment. I don't have anything against you, but you leave me with
> no choice:

:)) :)) :))

And how could you "embarass" me? :))

For what reason on earth should i feel "embarassed", if
you had a better Algo to show than mine? Well, of course,
if you had a bigger dig than mine i would feel a bit
jalous. But for an Algo... why on earth should i?

:)) :)) :))


> Here's my translation of your hash for 32 bits to C:
>
> /* Derived from Betov's CheckSum32 */

> [... C shit snipped ...]


>
> This hash has some bad collisions including:
>
> Betov(4)) = Betov(5)) = 0x1bfc093

> Betov(1)) = ...

Well, i am going to tell you. :)) :)) :))

When you mean to compare the number of collisions
coming from two different computations, you have
to explain what test and to tell how many Collisions
you got with Algo1, and how many collisions with Algo2.

Then, in case you would have some success, and would
like to show to others what you have found out, you
have to propose your Source, as you did here, but with
some _exact_ information about what is what. For example
your Asm source begins with:

mov ecx,eax

? What is this eax about ?

Nothing surprising that the computation hangs, here.
I am an Assembly Programmer. Not a magician, mind you.

Plus, as your tests are done in C, for the RosAsm CheckSum
Algo, they have, by definition no room on an Assembly News
Group. If you are unable to do it clean in Assembly, it is
not my fault. So, i cannot comment on your C port, and it
has zero interrest to me, and absolutely no signification.

Once you will have posted the informations i need to do
a port of your CheckSum to RosAsm, i will run a test, and
i will have the correct number of collisions to be compare
with something serious.

For your other comments on how i should do this or that,
you may be right or wrong. For now i have no cue.

The point i seem to beginning to have a cue about, is that
you seem to be a talking head, in the Randall Hyde style,
not to say, an arrogant ass-hole.

Waiting for the required infos...


Betov.

< http://rosasm.org/ >

Betov

unread,
Mar 13, 2005, 12:14:07 PM3/13/05
to
9331...@terra.es (Octavio) écrivait news:a3b1e869.0503130744.6e5473d2
@posting.google.com:


I mean that if a talking head has nothing to show but
a C shit, in a CheckSum discussion, on an Assembly News
Group, he'd better shut up.

I am bored of these arrogant ass-holes who have nothing
to show, and who are absolutely convinced that they
have something to teach to others.

I am bored of these ass-holes who have a problem with the
size of their dick.

I am bored of these ass-holes who never wrote anything
significative in Assembly and who come and play the
expert, at the ony cost of pedantic words.

Aren't you? Sure? :)) :)) :))


Betov.

< http://rosasm.org/ >


̇ęń.com Ray M. Ransom

unread,
Mar 13, 2005, 1:55:54 PM3/13/05
to
<webs...@gmail.com> wrote:

> ...Here's the assembly code for my own hash function...

[code snipped]

Here is an implementation not based on compiler output. I wrote it many days
ago, with the intent to benchmark it, but have been too busy to do that until
today.

This routine is about 40% faster than the ElfHash routine I posed earlier. I
don't know how it compares with regard to collisions and distribution. Of
course, knowing the length of each symbol in advance is a tremendous
advantage. In cases where that is not true, the speed advantage would be
consumed by having to compute symbol lengths.

Also, I notice your compiler-generated routine does sign extension on the two
bytes loaded by themselves before merging them into the hash. Is this
correct? If so, the routine below does not do that. All bytes are unsigned,
which is the way I would expect the algorithm to work.

;[SYMEND] points to symbol end+1
;Returned hash in EDI
SuperFastHash:
MOV ESI,offset SYMBUF ;Point to symbol buffer
XOR EDI,EDI ;Default hash value
MOV ECX,[SYMEND] ;Get symbol end+1 pointer
SUB ECX,ESI ;Calculate symbol length
JZ short SFHRET ;Null length
CMP byte ptr[ESI],0 ;Null string?
JNZ short SFHASH ;No
SFHRET: RET
SFHASH: MOV EDI,ECX ;hash = len;
MOV EBX,ECX
AND EBX,3 ;rem = len & 3;
SHR ECX,2 ;Form dword count
JZ short SFHEND ;Null dword count
SFMAIN: XOR EAX,EAX ;Pre-extend
LODSW ;Get a word
ADD EDI,EAX ;hash += getshort (data);
MOV EDX,EDI
SHL EDX,16
XOR EDI,EDX ;hash ^= hash << 16;
LODSW ;Get a word
SHL EAX,11
XOR EDI,EAX ;hash ^= getshort (data) << 11;
MOV EDX,EDI
SHR EDX,11
ADD EDI,EDX ;hash += hash >> 11;
LOOP SFMAIN
SFHEND: XOR EAX,EAX ;Pre-extend
DEC EBX ;Ends on dword?
JS short SFHAVA ;Yes
JZ short CASE1 ;Mod 1
DEC EBX ;Ends on word?
JZ short CASE2 ;Yes
CASE3: LODSW ;Get a word
ADD EDI,EAX ;hash += getshort (data);
MOV EDX,EDI
SHL EDX,16
XOR EDI,EDX ;hash ^= hash << 16;
XOR EAX,EAX ;Pre-extend
LODSB ;Get last character
SHL EAX,18
XOR EDI,EAX ;hash ^= data[2 * sizeof (char)] <<
18;
MOV EDX,EDI
SHR EDX,11
ADD EDI,EDX ;hash += hash >> 11;
JMP short SFHAVA
CASE2: LODSW ;Get last word
ADD EDI,EAX ;hash += getshort (data);
MOV EDX,EDI
SHL EDX,11
XOR EDI,EDX ;hash ^= hash << 11;
MOV EDX,EDI
SHR EDX,17
ADD EDI,EDX ;hash += hash >> 17;
JMP short SFHAVA
CASE1: LODSB ;Get last character
ADD EDI,EAX ;hash += *data;
MOV EDX,EDI
SHL EDX,10
XOR EDI,EDX ;hash ^= hash << 10;
MOV EDX,EDI
SHR EDX,1
ADD EDI,EDX ;hash += hash >> 1;
SFHAVA: LEA EDX,[EDI*8]
XOR EDI,EDX ;hash ^= hash << 3;
MOV EDX,EDI
SHR EDX,5
ADD EDI,EDX ;hash += hash >> 5;
LEA EDX,[EDI*4]
XOR EDI,EDX ;hash ^= hash << 2;
MOV EDX,EDI
SHR EDX,15
ADD EDI,EDX ;hash += hash >> 15;
MOV EDX,EDI
SHL EDX,10
XOR EDI,EDX ;hash ^= hash << 10;
RET


Octavio

unread,
Mar 13, 2005, 2:22:43 PM3/13/05
to
9331...@terra.es (Octavio) wrote in message
> jnl l1

this is a error,the correct is 'jns l1'

Betov

unread,
Mar 13, 2005, 3:00:48 PM3/13/05
to
"Ray M. Ransom" <ray@mîcôs˙ęń.com> écrivait
news:_00Zd.2442$mq2.995@trnddc08:

> <webs...@gmail.com> wrote:
>
>>[...]


>
> Here is an implementation not based on compiler output. I wrote it
> many days ago, with the intent to benchmark it, but have been too busy
> to do that until today.
>
> This routine is about 40% faster than the ElfHash routine I posed
> earlier. I don't know how it compares with regard to collisions and
> distribution. Of course, knowing the length of each symbol in advance
> is a tremendous advantage. In cases where that is not true, the speed
> advantage would be consumed by having to compute symbol lengths.
>

> [...]


This one is pretty good. It beats the one of RosAsm:

447 Collisions (against 453) on an Auto-Compilation.

I will not take it because so many more intructions are
not really justified by the modest performance gain,
and because it would be way slower for the usage i do
of it (without previoulsy known length of the Symbols).

Good job, nevertheless.


Betov.

< http://rosasm.org/ >

webs...@gmail.com

unread,
Mar 13, 2005, 3:14:56 PM3/13/05
to
Betov wrote:
> webs...@gmail.com:

> > Yes ok, sorry for the oversight. But everyone is using these
> > "strange" assemblers -- its been a while since I've looked at
> > other people x86 ASM, and I'm used to MASM and TASM syntax.
> > You can imagine this has caused a little bit of a barrier for
> > me.
>
> No, i can't. The actual Assemblers, compared to the
> obsolete ones, are way easier and have, _all_, a way
> better, more logical and simpler syntax than these
> obsolete ones. So, no, i can not understand. Sorry.

Alright putz. I have not seriously looked at x86 assembly for a few
years now. For all the great features and bells and whistles your new
assembly languages might have or how easy they are supposed to be to
use, *I HAVE NOT SEEN THEM BEFORE JUST A FEW DAYS AGO*.

> > ---8X-----------------------------------------------------
> > ...


> > ---8X-----------------------------------------------------
>
> Are you sure that 94 Instructions for a trivial CheckSum
> could be called "SuperFast"? :))

Well, its certainly going to be faster than your function. But it
looks like Octavio is likely going to kick both our asses by a country
mile in terms of just flat out performance.

> > I trust you can adapt it to your dialect of assembly language.
>
> Sure. By the way, it doesn't work.

Sorry, I just ripped it from the disassembly of my C compiler. I
actually hand optimized and tested this one:

;---8X-----------------------------------------------------
; esi <- source data.
; ecx <- source data length.
; eax -> 32 bit hash result.

xor ebx, ebx
mov edi, ecx
mov eax, edi
and edi, 3
shr ecx, 2
jz L2

L1:
mov bx, word ptr [esi] ; 0
add eax, ebx ; 1
mov bx, word ptr [esi+2] ; 0
shl ebx, 11 ; 1
xor ebx, eax ; 2
shl eax, 16 ; 2
add esi, 4 ; 0
xor eax, ebx ; 3
xor ebx, ebx ; 1
mov edx, eax ; 4
shr eax, 11 ; 4
add eax, edx ; 5
dec ecx ; 2
jnz L1

L2:

cmp edi, 1
jz lcase1
cmp edi, 2
jz lcase2
cmp edi, 3
jnz L3

lcase3:
mov bx, word ptr [esi]
add eax, ebx
xor ebx, ebx
mov edx, eax
shl eax, 16
xor eax, edx
movsx ebx, byte ptr [esi+2]
shl ebx, 18
xor eax, ebx
mov edx, eax
shr eax, 11
add eax, edx
jmp L3

lcase2:
mov bx, word ptr [esi]
add eax, ebx
mov edx, eax
shl eax, 11
xor eax, edx
mov edx, eax
shr eax, 17
add eax, edx
jmp L3

lcase1:
movsx ebx, byte ptr [esi]
add eax, ebx
mov edx, eax
shl eax, 10
xor eax, edx
mov edx, eax
shr eax, 1
add eax, edx

L3:

mov ebx, eax
shl eax, 3
xor eax, ebx
mov ebx, eax
shr eax, 5
add eax, ebx
mov ebx, eax
shl eax, 2
xor eax, ebx
mov ebx, eax
shr eax, 15
add eax, ebx
mov ebx, eax
shl eax, 10
xor eax, ebx
;---8X-----------------------------------------------------

> For what reason on earth should i feel "embarassed", if
> you had a better Algo to show than mine? Well, of course,
> if you had a bigger dig than mine i would feel a bit
> jalous. But for an Algo... why on earth should i?

Uhh ... no, see my algorithm may be good, but then so is Bob Jenkin's
and apparently Octavio's. So there are many good hash functions. The
real problem is that your algorithm is just *SO BAD*.

> > This hash has some bad collisions including:
> >
> > Betov(4)) = Betov(5)) = 0x1bfc093
> > Betov(1)) = ...
>
> Well, i am going to tell you. :)) :)) :))
>
> When you mean to compare the number of collisions
> coming from two different computations, you have
> to explain what test and to tell how many Collisions
> you got with Algo1, and how many collisions with Algo2.

No you don't understand. What I am telling you is that using your
function, the string "4)" hashes to the same value as "5)" in *all* of
its output bits. Similarly "1990" and "0990" and "477" and "577". In
other words *all* scenarios with those strings will cause a collision,
regardless of the size of your hash table, no matter what. I was
actually using it just on some large 1MB text file of a very technical
nature (like an RFC), but that's not really relevant given the kinds of
collisions that have turned up. Good hash functions (including
Octavio's believe it or not! (my suspicions about his function did not
manifest in this or any other text file I tried)) basically output *0*
such collisions. A good hash function should be *very difficult* to
find collisions for (i.e., you should be forced to brute force search
for them). So I would say that your initial function got disqualified
before we even got to the serious test.

Ok, most reasonably designed hash functions that I have come across
*never* collide in fewer bits than the number of output bits. I.e.,
their mixing functions usually are injective all the way to the output.
That clearly is not the case with your function because you constantly
overwrite the bits where the majority of the entropy exists in your
accumulator.

If you're really interested in collision rates, then ignoring these
problems, we can say look at the fourth Chapter of Darwin's "Origin of
the Species" (which is small enough that it doesn't find problems in
your hash function) for a table of size, lets say 32 entries (just to
force some collisions), and we see:

Maximum # collisions:
SFH: 49
Betov: 51
Octavio: 71

So, when it works, your function is competitive with mine in terms of
table collisions, and Octavio seems to have some kind of chaining
problems.

> Once you will have posted the informations i need to do
> a port of your CheckSum to RosAsm, i will run a test, and
> i will have the correct number of collisions to be compare
> with something serious.

Well, there it is. Have fun.

Betov

unread,
Mar 13, 2005, 4:57:25 PM3/13/05
to
webs...@gmail.com écrivait news:1110744896.334409.17910
@f14g2000cwb.googlegroups.com:

> Betov wrote:
>> webs...@gmail.com:
>> > [...]


>
> Alright putz. I have not seriously looked at x86 assembly for a few
> years now. For all the great features and bells and whistles your new
> assembly languages might have or how easy they are supposed to be to
> use, *I HAVE NOT SEEN THEM BEFORE JUST A FEW DAYS AGO*.

Alright putz. Not my fault.


>> [...]


>
> Well, its certainly going to be faster than your function. But it
> looks like Octavio is likely going to kick both our asses by a country
> mile in terms of just flat out performance.

Octavio Algo can unfortunatly not be tested by me,
as it does not consider symbols smaller than 4 Bytes.


>> [...]

OK. This one works perfectly, with results similar
to the RosAsm original one, and to the Ransom recent
one. Collisions on my test:

RosAsm: 453
This last one: 450
Ronsom one: 447

This is to say all quite similar, close to each others
at 1 or 2%.

Given the exptreeme simplicity of RosAsm one:

------------------------------------------------------
mov eax 0, ebx 0, ecx 0

While B§esi > LowSigns
rol eax 1 | lodsb | mul eax | xor ebx edx | inc ecx
End_While

add ebx ecx

-------------------------------------------------------

i think i will go on being happy with that one, for now.


> [...] no, see my algorithm may be good, but then so is Bob Jenkin's


> and apparently Octavio's. So there are many good hash functions. The
> real problem is that your algorithm is just *SO BAD*.

If you say so, i have to suppose that you have
something more serious than you C joke to argue.

As opposed to what you claim, here, the numbers
i give can be verified by anybody, in two minutes.


>> > [...Blabla...]

Well...


Betov.

< http://rosasm.org/ >

Fullt navn

unread,
Mar 13, 2005, 5:12:37 PM3/13/05
to
På 13 Mar 2005 21:57:25 GMT, skrev Betov <be...@free.fr>:


> OK. This one works perfectly, with results similar
> to the RosAsm original one, and to the Ransom recent
> one. Collisions on my test:
>
> RosAsm: 453
> This last one: 450
> Ronsom one: 447
>
> This is to say all quite similar, close to each others
> at 1 or 2%.

Whats your numbers on MMEX ?
http://szmyggenpv.com/downloads/MMEXBlue.Zip

I get :

374 collisions, time = 1242

RosAsm one : 351. time : 1202

Maybe I did something wrong ?

Heres the code I used :


mov ecx esi
while B§ecx > LowSigns | inc ecx | End_While
sub ecx esi
Gmail.CheckSum
mov ebx 1

The Gmail.CheckSum is exactly as posted (se below)


> Given the exptreeme simplicity of RosAsm one:

>
> ------------------------------------------------------
> mov eax 0, ebx 0, ecx 0
>
> While B§esi > LowSigns
> rol eax 1 | lodsb | mul eax | xor ebx edx | inc ecx
> End_While
>
> add ebx ecx
>
> -------------------------------------------------------
>
> i think i will go on being happy with that one, for now.


>
> Betov.
>
> < http://rosasm.org/ >

[Gmail.CheckSum


xor ebx, ebx
mov edi, ecx
mov eax, edi
and edi, 3
shr ecx, 2
jz L2>

L1:
mov bx, w§esi ; 0
add eax, ebx ; 1
mov bx, w§esi+2 ; 0


shl ebx, 11 ; 1
xor ebx, eax ; 2
shl eax, 16 ; 2
add esi, 4 ; 0
xor eax, ebx ; 3
xor ebx, ebx ; 1
mov edx, eax ; 4
shr eax, 11 ; 4
add eax, edx ; 5
dec ecx ; 2
jnz L1<

L2:

cmp edi, 1
jz L7>
cmp edi, 2
jz L6>


cmp edi, 3
jnz L3>

@LCase3:
mov bx, w§esi


add eax, ebx
xor ebx, ebx
mov edx, eax
shl eax, 16
xor eax, edx

movsx ebx, b§esi+2


shl ebx, 18
xor eax, ebx
mov edx, eax
shr eax, 11
add eax, edx
jmp L3>

L6:
mov bx, w§esi


add eax, ebx
mov edx, eax
shl eax, 11
xor eax, edx
mov edx, eax
shr eax, 17
add eax, edx
jmp L3>

L7:
movsx ebx, b§esi


add eax, ebx
mov edx, eax
shl eax, 10
xor eax, edx
mov edx, eax
shr eax, 1
add eax, edx

L3:

mov ebx, eax
shl eax, 3
xor eax, ebx
mov ebx, eax
shr eax, 5
add eax, ebx
mov ebx, eax
shl eax, 2
xor eax, ebx
mov ebx, eax
shr eax, 15
add eax, ebx
mov ebx, eax
shl eax, 10
xor eax, ebx

]


--

\\?o?//annabee

unread,
Mar 13, 2005, 5:15:53 PM3/13/05
to
På Sun, 13 Mar 2005 23:12:37 +0100, skrev Fullt navn
<faq@@@.@.@.szyggenpv.com>:

> På 13 Mar 2005 21:57:25 GMT, skrev Betov <be...@free.fr>:
>
>
>> OK. This one works perfectly, with results similar
>> to the RosAsm original one, and to the Ransom recent
>> one. Collisions on my test:
>>
>> RosAsm: 453
>> This last one: 450
>> Ronsom one: 447
>>
>> This is to say all quite similar, close to each others
>> at 1 or 2%.
>
> Whats your numbers on MMEX ?
> http://szmyggenpv.com/downloads/MMEXBlue.Zip
>
> I get :
>
> 374 collisions, time = 1242

correction, that was _without_ "mov ebx 1",
its :

367 collisions, time = 1242, with.

--
I am shy

Phil Carmody

unread,
Mar 13, 2005, 5:55:01 PM3/13/05
to
Ah, the signs that one's desperate to escape killfiles.

Hopefully you'll get bored of morphing as swiftly as we got bored
by your posts. You and Annie make a great pair. Don't have kids.

Phil

\\?o?//annabee

unread,
Mar 13, 2005, 6:08:44 PM3/13/05
to
På Mon, 14 Mar 2005 00:55:01 +0200, skrev Phil Carmody
<thefatphi...@yahoo.co.uk>:

> Ah, the signs that one's desperate to escape killfiles.
>
> Hopefully you'll get bored of morphing as swiftly as we got bored
> by your posts. You and Annie make a great pair. Don't have kids.

:) the kind of post you just posted, is the only type you post. How fun do
you think that is dingbat?

>
> Phil

--
I am shy

Octavio

unread,
Mar 13, 2005, 7:18:06 PM3/13/05
to
webs...@gmail.com wrote in message news:<1110732298.7...@o13g2000cwo.googlegroups.com>...

> Yes, but speed for a hash table is related to quality (because that's
> what determines the number of collisions, and rehashes you will need to
> do) as well as flat out performance. A good hash function, I would
> suggest, needs to do well on both.

I would add that the hash table size must be a prime number and that
greater is it better performance due to less collisions.

And now i will explain why i totally disagree with those topics.
Its true that quality hash codes reduces the collisions ,thats good
,but not
so important, if one algorithm has 10% of collisions, and another has
the double , then is only 10% better, but if the second algorithm is
4x faster
then it is 4x better and betwen the various hash algorithms there is
no big differencies in the number of collisions, but there are
important differencies on speed.
Rehashes are used in closed hash algorithms, that only on a few cases
are better than the open method. But they are a waste of time, a
simple pointer increment has the same probabilities of collisions and
is faster.
Collisions is that worry the most programmers, in the closed method
when the array contains the data (not pointers to the data) the number
off collisions should be smaller like 0%-50% and the only way to get
it is using a very large hash table and a decent hash routine. But
there is another method called 'open' wich uses a array of lists, one
avantage of this method is that the table is always big enought. when
a collision happens the new data is added to the list ,and the list
can be very large, but searching a keyword in a list with 1000
keywords would require an average of 500 iterations.
But it happens that in most cases the keyword is just at the begining
of the list, and that's why this method is so faster. On a given
language for example
spanish we can have 1000000 of words but 95% of times we are using a
small subset of about 1000 words ,if those words are placed at the
begining of the lists (lru), with a hash table of 2048 we have a lot
of probabilities of finding the word at the first try, even with 99.9
of collisions. And since the hash table is small ,the cache will make
things to run faster.
And finally the prime number size for hash table, was used in the era
of thermo-ionic valves ,but now ,at least good programmers use powers
of two.
There are also some others algorithms that other people call 'hash
method'
simply because they use hash codes, like binary trees that use a hash
code instead of the word itself to build the tree, it has the
advantage that the tree usually is better balanced than another that
uses the words, for this purpose yes ,a good hash code is important.

> you are hashing Pascal-style strings. Is this correct?

if Pascal-style is something like db 6,'string'
then is correct, y think this is more eficient than asciiz strings
that
can be longer but waste time with calls to strlen,and are limited to
text data.


hmmm... a bit longer and this would be a Beth post. :)

̇ęń.com Ray M. Ransom

unread,
Mar 13, 2005, 9:10:41 PM3/13/05
to
"Betov" wrote:

> OK. This one works perfectly, with results similar
> to the RosAsm original one, and to the Ransom recent
> one. Collisions on my test:
>
> RosAsm: 453
> This last one: 450
> Ronsom one: 447

Actually, the routine I posted is not mine. It is an assembly version of
Paul Hsieh's SuperFastHash. It may be unrecognizable as such because it was
written from scratch, not generated by a compiler. Except for his compiler's
sign extending bytes, mine should be exactly the same as his in terms of
collisions. There should be no differences in output unless the input text
has the high bit set on some characters. It is smaller and faster (and real
assembly), however.

The difference of 3 collisions in your test is puzzling.


Betov

unread,
Mar 14, 2005, 3:08:10 AM3/14/05
to
Fullt navn <faq@@@.@.@.szyggenpv.com> écrivait
news:opsnljzb...@news.braodpark.no:

> get :
>
> 374 collisions, time = 1242
>
> RosAsm one : 351. time : 1202
>
> Maybe I did something wrong ?
>

> Heres the code I used : [...]


I think you did nothing wrong. The results may evidently
vary, depending on what File we choose for the test.

1) When several CheckSums comes to very similar results,
at on 1 ot 2% of differences, the differences are no more
really significative, and selectin another Test File may
as well reverse the results classification.

2) After all, we are doing nothing but trying to emulated
a _Random_ repartition. So, the final words should be to
compare with a very good _Random_ Algo. I suppose that
the result of a Random repartition should give results
very similar to the ones we actually have, what would mean
that we are close to the best possible Algo, with these
3 ones.

In all cases, they will surely never be better than a Random
similation of a CheckSum. I did not made this simulation,
because this is too much work to do, and because it would
immidiately run into one another discussion saying "How good
is your Random Algo?", that not an easy suject, as well... :)


Betov.

< http://rosasm.org/ >


Betov

unread,
Mar 14, 2005, 3:12:21 AM3/14/05
to
"Ray M. Ransom" <ray@mîcôs˙ęń.com> écrivait news:Bo6Zd.4716$wL6.3583
@trnddc03:

> The difference of 3 collisions in your test is puzzling.
>

Nothing important: I have added the Labels when pasting
the various Algos for the tests (done on Auto-Compilations)


Betov.

< http://rosasm.org/ >

\o//annabee

unread,
Mar 14, 2005, 7:27:25 AM3/14/05
to
På 14 Mar 2005 08:08:10 GMT, skrev Betov <be...@free.fr>:

>
> 2) After all, we are doing nothing but trying to emulated
> a _Random_ repartition.

Yes, that occured to me. So this Paul guy who said your algorythm was "*so
bad*" was shitting in his own pants then. :) This was also what I thought.
C people are so easy to come to the wrong conclusions :))

I find it very amusing, to have used RosAsm for a year, using _VERY_
large, and very similar labels, and equates names, and never having any of
the problems he, and master P talks about.

So we now have at least 3 diffrent sourcecodes to prove that RosAsm
symbolscanner is one of the very best, and this people are probably
_still_ going to try to defeat observable facts. With teoretical crap as
usual.

> Betov.
>
> < http://rosasm.org/ >
>
>
>
>

--
nas ne dagoniat

Frank Kotler

unread,
Mar 14, 2005, 8:41:33 AM3/14/05
to
\\\\o//annabee wrote:

> ...this Paul guy...

"This Paul guy" hasn't been active in the asm newsgroups for
several years (I thought we'd driven him off with our bs -
glad it's not so!), so you're excused for not knowing who he
is, being new here. I can assure you that he knows what he's
talking about! Here's one of his pages...

http://www.azillionmonkeys.com/qed/asmexample.html

I suggest you don't offer any coding challenges to him.

Best,
Frank

Betov

unread,
Mar 14, 2005, 9:44:34 AM3/14/05
to
Frank Kotler <fbko...@comcast.net> écrivait
news:4235948D...@comcast.net:

I also knew of Paul's name since many years, but this
is not a valid argument for anything: Master Randall
Hyde is widely known since DOS ages, and this does not
save him from being an unsignificant ass-hole who will
never understand a thing at what Assembly Programming
is.

The case of Paul is a quite usual one, in our area:

He is an HLL Programmer making occasional use of Asm
to enhance his HLL coding, like, say, 99% of MASM users.
These guys may occasionaly be very interresting to discuss
with, when it comes to a subject like this one, where we
are focusing on a very tiny problem, and dispiting the
aggressive tone that this thread came into (without any
reason, and without any of my help), i could appreciate
his contribution if it was valid. It is not really much,
but well, it's always good to try and see...

So, i have nothing, personaly against Paul, that i didn't
knew personaly, but when he comes and piss on me with no
base, in the Randall Hyde manner, i answer what i have to
answer.

Now, generally speacking, these HLLs programmers doing
occasional use of Asm Routines, even if they are able
to write great Routines, that _are_ of interrest, have
no idea of what Assembly programming means. You can, for
example, analyse how an impressionist painter created his
very small painting points for years, this is not the way
for learning how to become an impressionist painter.

If such a guy had any real knowledge of what i call Assembly,
he would simply not be writting his things in C. As simple
as this. Talking heads have yet some efforts to do before
they could impress me with pedantic words and smoking theories.
For example, having some real life Apps that they would have
written in Assembly and made available on their Web Page. :)


Betov.

< http://rosasm.org/ >


\o//annabee

unread,
Mar 14, 2005, 10:40:32 AM3/14/05
to
På Mon, 14 Mar 2005 08:41:33 -0500, skrev Frank Kotler
<fbko...@comcast.net>:

I know 99.9% who he is :))) Just tickle him a little. I learned my first
lesson from him.
And I have a link to his homepage in my favorites, and I read a LOT of it.
I just found it funny, to have him react so vigourously to Betov and to
manage to
post about 3 assumtions that wore evidently wrong, when examined.

:)) You all need to learn to take yourself, not so seriously, programmers.
Me too, btw.

If this is enough to scare him back were he came from, then maybe thats a
better place for him to be. :)) Only he would know, but it seems likly.

On the philosophical side, look at it this way : If I was to act like if I
was special, then someone would tell me, that I wasnt. But if I wore to
tell someone he isnt spesial, then someone else would tell me that he is.
:))) Like for instance you :)))

Can it not be both at the same time. Anyway, I bought his books.... He
took my money, lets see how he can take my freedom as well.

About coding challanges. It should be selfevident, to anyone reading my
posts, that I am a newb, and to anyone reading my code, that theres
nothing remotely optimal there, at all. Its all written brute force and
fast-written, not at all to be the best running code or anything. I have
however tried to think on the strategical side, a bit, but not much. It
would kill me to be so thourough as for instance Wolfgang is, as my aims
are still years ahead. But I try to pick up something here and there.

When it comes to asm, and coding, I consider myself VERY fresh. It is
because of this fact, that I praise Betov so much, because without him, I
would have never seen or been able to experience that even a newbie could
learn to program asm, and think that it is in 80% of cases, easy. More
easy than using a HLL. Also, I learn TONS more now. And it should be a
crying matter to the asm community, that I, as a newbie, using RosAsm, are
able to produce 1000 kb of asm code, at close to no sweat, using Betov
tools, and philosophy (to some degree), and allready have available the
maybe 20-30th largest actuall (opensource) fully assembly app, in the
world, that are in the known.

With 4-5 years from now, assuming health, I will have built the 3nd
largest Free Software asm app, in the world. And this will happen without
much sweat at all. And RosAsm is now the 2nd largest asm app in the world
? At least there no indication, that I am aware of, that this isnt so. I
am not sure though.

So feel free to talk shit about Betov, and RosAsm. It all only reveal that
you do not know what you speak of, at all. (I am talking to "you", not
you, Frank, in particular).
And a newb can do as well as "you" (talking heads) do.

Swallow that thought for a second.

WAKE UP!. RosAsm really is what the guy says it is. It really really IS
the best programming tool out there. And its also the fastest of the
actuall win32 assemblers (speaking in terms of compilation speed,
development speed together).

Maybe someone noticed, how easy it was to test the various algorythms with
RosAsm ? It took me about 2-3 minutes or so for inclusion and testing.

RosAsm is still in rapid developments, and for sure, it has certain bugs.
However, when compared to the upsides, theese small, close to never to be
seen bugs, that has useful and easy workarounds in all cases, do not
matter at all. RosAsm is a superiour tool for programming, in my view, way
better than both C, Delphi and C++. WAY better. And I am not even
mentioning VB, C#, Java and such tools, as theese are for the people,
feeling comfty beeing ignorant, not for programmers, at any rate.


> Best,
> Frank

--
Victory will be achieved with the use of right-click

Octavio

unread,
Mar 14, 2005, 3:35:13 PM3/14/05
to
Betov <be...@free.fr> wrote in message news:<XnF9618EBC8AD...@212.27.42.77>...

> Octavio Algo can unfortunatly not be tested by me,
> as it does not consider symbols smaller than 4 Bytes.
I have tested it below is the source, about 550 collisions on rosasm source
the compile time is the same.

CheckSum64:
[Name_bufer: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?]
push edi
push ebp
push edx
mov edi Name_bufer+1 , ah 0
hash_copy: lodsb | cmp al,lowsigns | jna hash_start
inc ah | stosb | jmp hash_copy
hash_start:
mov d$edi 0 edi name_bufer b$edi ah
dec esi
push esi
mov esi name_bufer b$esi ah
lodsd | mov ebx eax cl al
shr cl 3 | jnc l2>
sub esi 4
l1: add eax d$esi | adc ebx d$esi+4 | add esi 8
l2: rol eax 12 | sub eax ebx
rol ebx 9 | sub eax ebx |sub ebx eax
rol eax 15 | sub eax ebx
dec cl | jns l1<
pop esi
pop edx
pop ebp
pop edi
ret

Betov

unread,
Mar 14, 2005, 5:05:24 PM3/14/05
to
9331...@terra.es (Octavio) écrivait
news:a3b1e869.05031...@posting.google.com:

> Betov <be...@free.fr> wrote in message
> news:<XnF9618EBC8AD...@212.27.42.77>...
>> Octavio Algo can unfortunatly not be tested by me,
>> as it does not consider symbols smaller than 4 Bytes.
> I have tested it below is the source, about 550 collisions on rosasm
> source the compile time is the same.

Nice. Yes, i think that all of these Algos are achieving
something very close to a real Random repartition, in the
Table, and, so forth, we should fail to do much better,
as anyway, we cannot hope "more than random"... :)

Just something i did not understood, in one of your other
Posts: Why do you say that your table must be based on a
_prime_ Number? Why a prime? How do you address the Table?

[As opposed, i do it with a Table Size that can be addressed
directly by a Word].


Betov.

< http://rosasm.org >

PS. It is quite natural that the compile time is the same.
The Algo should be _extreemly_ more slower to have any
impact on the whole Compile time... In fact, the speed
of the CheckSum64 Algo, per se, has no importance at all,
as it is nothing, compared to the time (a real big time,
that we could measure in seconds on such a Compilation),
that is saved by the direct addressing of the Table. When
i implemented this Method (the whole method, not the CheckSum)
RosAsm Auto-Compile time saved, at once, several seconds
(two on four, here, if i recall well).

webs...@gmail.com

unread,
Mar 14, 2005, 8:34:44 PM3/14/05
to
Octavio wrote:

> webs...@gmail.com wrote:
> > Yes, but speed for a hash table is related to quality (because
> > that's what determines the number of collisions, and rehashes you
> > will need to do) as well as flat out performance. A good hash
> > function, I would suggest, needs to do well on both.
>
> I would add that "the hash table size must be a prime number and that
> greater is it better performance due to less collisions".
>
> And now i will explain why i totally disagree with those topics. Its
> true that quality hash codes reduces the collisions, thats good, but
> not so important, if one algorithm has 10% of collisions, and another

> has the double, then is only 10% better, but if the second algorithm


> is 4x faster then it is 4x better and betwen the various hash
> algorithms there is no big differencies in the number of collisions,
> but there are important differencies on speed.

Ok, but I have a different point of view on this. I am trying to
present a hash function that can be used for any purpose, now or later,
on any data, by me or anyone else who want to use the hash function.

Its basically a notion of "reusable code". I.e., its only useful to me
if I am not constantly maintaining it, or adjusting it for a similar,
yet not identical purpose as I am using it for today. So by ensuring
that my function is as collisions-free as possible under *any*
conditions it means that in 10 years from now when I grab it for use
for whatever unknown purpose, I won't be screwed because my "real world
data" has changed.

All this being said, I recognize the value of your function and your
approach even if others seem blind to it. Of course I am skeptical of
its resiliancy to more rigorous stress testing, but that should not
discourage you. More to the point, it is very possible, in fact most
likely that even if your function ends up showing flaws in its present
form today, that it can be fixed with a final fixed cost avalanche
mixer at the end of your code. Look back at the end of my function and
you will see what I am talking about. I have the code at the end which
continues to grind on the hash value even after it has stopped
consuming input data.

It is very likely that, when I get some time, I will set up the much
more powerful "bit avalanche" test into my hash test, and more than
likely this initial version of your function will fail. But if, as I
suspect, your function truly is a lot faster than my function, then I
will pursue the strategy of a "fix up" for your function, and in the
end it will more than likely be the fastest thing out there and at
least as robust as anything else.

So I hope you understand, my criticism of your function is just me
kicking the tires. I'm actually quite intrigued by the design -- using
a 64 bit accumulator seems like a very straightforward method for
capturing a lot of performance, and with the jarring simplicity of your
mixing function I was quite stunned to see it work very effectively on
typical sample text.

> Rehashes are used in closed hash algorithms, that only on a few cases
> are better than the open method. But they are a waste of time, a
> simple pointer increment has the same probabilities of collisions and
> is faster.

Yeah but it has a clustering problem. You actually don't want to do it
this way. If your closed hash table is < 70% full (which you want)
then the probability that you need a rehash on any scan is something
quite reasonable like 30% or so. So concentrating your efforts on
optimizing the rehashing for brute for performance isn't the right way
to go -- you instead want to minimize the probability that you have to
do it, or decrease the large costs coming from walking long rehash
chains.

> [... explanation of closed/open hash tables snipped ...] On a given


> language for example spanish we can have 1000000 of words but 95% of
> times we are using a small subset of about 1000 words ,if those words

> are placed at the begining of the lists (lru), with a hash table of
> 2048 we have a lot of probabilities of finding the word at the first
> try, even with 99.9 of collisions.

Yes, yes, yes. This is all fine, but by doing this you are leaning
very heavily on the characteristics and usage of the input data. In
general, many of the mechanisms beyond the hash function itself will
contribute to the effectiveness of a hash table, especially if you can
characterize the data you are hashing. But this brings me back to my
earlier point, that I am interesting in writing a function and
mechanism that I don't need to keep readjusting or tweaking just
because my input data changes in characteristics.

> [...] And since the hash table is small, the cache will make things


> to run faster. And finally the prime number size for hash table, was

> used in the era of thermo-ionic valves, but now, at least good


> programmers use powers of two.

:) Of course. With a closed hash table, in fact, you can make the
table size a power of two and the "rehash linear skip values" odd
numbers. This allows the rehash to still be able to walk the entire
table. You can in fact improve upon this by making the skip value a
prime (this reduces clustering).

But you should not rush to dismiss all theory. Your aging professor
might not know much about hand optimized assembly, but may still say
something useful in the lectures. The key to deriving value from
"theory", just like programming, is understanding the core material at
its most basic level. This is why I know that power of 2 sized hash
tables, with prime skip values are a good idea. Not because some
professor told me that, but rather because he explained why prime sized
hash tables were a good idea. In understanding the idea, I am able to
find a better way to achieve the same thing.

> [...] There are also some others algorithms that other people call


> 'hash method' simply because they use hash codes, like binary trees
> that use a hash code instead of the word itself to build the tree, it

> has the advantage that the tree usually is better balanced than

> another that uses the words, for this purpose yes, a good hash code
> is important.

Right, a very good point. My goal, however, is to have a function
which is both good *and* fast. Then I don't find myself making a
different choice for different purposes. You see?

There is a natural intuition people have that you sometimes sacrifice
speed for quality. My tendency is to want to have both at the same
time.

> > you are hashing Pascal-style strings. Is this correct?
>
> if Pascal-style is something like db 6,'string' then is correct, I


> think this is more eficient than asciiz strings that can be longer
> but waste time with calls to strlen,and are limited to text data.

I perfectly agree. See: http://bstring.sf.net/ :)

Octavio

unread,
Mar 15, 2005, 12:12:10 PM3/15/05
to
Betov <be...@free.fr> wrote in message news:<XnF9619ED2658...@212.27.42.74>...

> Just something i did not understood, in one of your other
> Posts: Why do you say that your table must be based on a
> _prime_ Number? Why a prime? How do you address the Table?
did you read the whole post ?
Some lines below is explained that the size should be a power of two.

Betov

unread,
Mar 15, 2005, 1:09:47 PM3/15/05
to

> Betov <be...@free.fr> wrote in message

Yes, you wrote:

"...with a hash table of 2048 we have..."

Though, you first complete sentence was saying:

"I would add that the hash table size must be a prime
number and that greater is it better performance due
to less collisions."

Therefore my question, as this sentence does no make
sense to me. Maybe you are talking "here" of the Table
Size, and "there" of the Number of records in the Table.
But, even if this is the number of Records, i see no
reason why it should be a prime, as long as this is not
for a 2n Search, but for a direct access.


Betov.

< http://rosasm.org/ >


Octavio

unread,
Mar 15, 2005, 1:16:43 PM3/15/05
to
webs...@gmail.com wrote in message news:<1110850484.0...@g14g2000cwa.googlegroups.com>...

> It is very likely that, when I get some time, I will set up the much
> more powerful "bit avalanche" test into my hash test, and more than
> likely this initial version of your function will fail. But if, as I
> suspect, your function truly is a lot faster than my function, then I
> will pursue the strategy of a "fix up" for your function, and in the
> end it will more than likely be the fastest thing out there and at
> least as robust as anything else.
This algo was obtained using a brute force method over a known
set of data (x86 nemonics) so i,m not surprised that with another data
it
performs bad, but adding another mixing step at the end reduces the
number of
colisions about 15% (rosasm test), thats because the mixer is faster
but not very good and all the input bits don“t change all the outputs
bits and also some bits are losts. But the brute force method seems
very good for searching
hash routines, the problem is that i only tested with x86 nemonics and
only check about a million of combinations with a very small mixer.
Using brute force with genetic algorithm selections and a similar
scheme
of a loop with a mix plus another mix at the end and a wide set of
test data
should provide what you are searching.

> a 64 bit accumulator seems like a very straightforward method for
> capturing a lot of performance,

Bob jenkins hash uses 3*32 bits accumulators , my algo is like a
simplified version optimized for speed.

> My goal, however, is to have a function
> which is both good *and* fast. Then I don't find myself making a
> different choice for different purposes. You see?

Yes yes i see , but i never said that my hash was the best of the
world, simply
that is my favourite with the only purpose of using it on my assembler
'Octasm' and for this unique purpose it“s very good.

Octavio

unread,
Mar 15, 2005, 5:52:58 PM3/15/05
to
>I would add that the hash table size must be a prime number and that
>greater is it better performance due to less collisions.

>And now i will explain why i totally disagree with those topics.

The first two lines are the wrong topics not my ideas. About the prime number
i'm not talking about using it to rehash but about a old method that uses a
division to calculate a index to a hash table with a prime number size ,that
i think is ineficient.

Betov

unread,
Mar 16, 2005, 3:13:51 AM3/16/05
to
9331...@terra.es (Octavio) écrivait
news:a3b1e869.0503...@posting.google.com:


Ah! Good. (Maybe my News Reader messed the thing...).


Betov.

< http://rosasm.org/ >


Phil Carmody

unread,
Mar 19, 2005, 5:05:09 AM3/19/05
to
On Mon, 14 Mar 2005 08:41:33 -0500, Frank Kotler wrote:
> \\\\o//annabee wrote:
>
>> ...this Paul guy...
>
> "This Paul guy" hasn't been active in the asm newsgroups for
> several years (I thought we'd driven him off with our bs -
> glad it's not so!),

Seconded.

> so you're excused for not knowing who he
> is, being new here. I can assure you that he knows what he's
> talking about! Here's one of his pages...
>
> http://www.azillionmonkeys.com/qed/asmexample.html
>
> I suggest you don't offer any coding challenges to him.

Au contraire!

Phil

Johannes Kroll

unread,
Mar 19, 2005, 3:45:44 PM3/19/05
to
On 15 Mar 2005 10:16:43 -0800
9331...@terra.es (Octavio) wrote:

Interesting. I tried what you described and it works surprisingly good.
I select some pre-assembled opcodes / opcode groups randomly to
add/remove it from my function, while keeping track of the input/output
registers of every opcode. Then I test how good the output is. If it's
better than in the last iteration I keep the modification. Here's a
function that approximates an integer square root, it's rather large
(disassembled with ndisasm)...

mov [0x806018c],ebx
mov [0x8060190],ecx
mov [0x8060194],edx
mov [0x8060198],esi
mov [0x806019c],edi
mov [0x80601a4],ebp
mov esi,[esp+0x4]
mov edx,esi
mov ecx,edx
ror ecx,0x17
shr ecx,0xf
add ecx,byte +0x7
mov edx,ecx
shr edx,0x1
add edx,byte +0x15
add edx,byte +0x7
mov ebx,edx
mov ecx,esi
test ebx,ebx
jz near 0x58
push edx
push eax
mov eax,ecx
xor edx,edx
div ebx
mov ecx,eax
pop eax
pop edx
mov ebp,ebx
test ecx,ecx
jz near 0x6e
push edx
push eax
mov eax,ebp
xor edx,edx
div ecx
mov ebp,eax
pop eax
pop edx
or ecx,ebp
mov edi,ecx
mov ebp,esi
shr ebp,0xb
add edi,byte +0x3
sub edi,ebp
dec edi
mov eax,edi
mov ebp,edi
test ebp,ebp
jz near 0x8f
push edx
xor edx,edx
div ebp
pop edx
mov ebx,ebp
rol ebp,0x3
and ebp,0xa28a28a2
and eax,0x5d75d75d
xor ebp,eax
xor eax,ebp
mov edx,ebx
dec ebx
mov ebp,ebx
add eax,byte +0x4
mov edi,edx
mov ebx,edx
rol eax,0x3
and eax,0xa28a28a2
and ebx,0x5d75d75d
xor eax,ebx
xor ebx,eax
mov ecx,eax
or ebx,ecx
ror edx,0xd
test esi,esi
jz near 0xdc
push edx
push eax
mov eax,edx
xor edx,edx
div esi
mov edx,eax
pop eax
pop edx
mov ecx,esi
add edx,0xfe
add ebx,byte +0x2d
mov eax,ecx
xor ebx,ecx
xor ecx,ebx
and ebx,0xf0f0f0f0
and ecx,0xf0f0f0f
xor ebx,eax
or edi,edx
shl ebx,0x7
mov ecx,esi
or edi,ebx
mov eax,ecx
xor ebx,ecx
xor ebx,edx
xor ebx,esi
xor ebx,edi
xor ebx,ebp
mov eax,ebx
mov ebx,[0x806018c]
mov ecx,[0x8060190]
mov edx,[0x8060194]
mov esi,[0x8060198]
mov edi,[0x806019c]
mov ebp,[0x80601a4]
ret

It's completely randomly generated (except the prologue and epilogue
code of course). Some "opcodes" are not single opcodes but opcode
groups, e.g. mul, div, and some mixing stuff (I thought the XOR+AND with
the immediate operands was good for hashing). When tested with values
from 0 to 5000 in 50 steps, the mean difference to the real sqrt() is
about 1.7. The larger the number the better the output seems to be.

Somehow it doesn't work for string hash functions because of a bug
somewhere. I'm gonna try floating point opcodes next, I think with a lot
of patience an approximation for "almost everything" could be generated
:)

Randall Hyde

unread,
Mar 22, 2005, 12:32:08 AM3/22/05
to
I read all of these posts with a bit of amusement.
People are so concerned about "collisions" in their
hash functions that they often forget that the cost
of computing the hash function itself is often more
expensive than what you save by having no collisions.

Octavio's function, for example, is cheap compared to
many, but I count 12 instructions in the main loop that
does the hash computation (that is, 12 instructions for
*each* character that appears in the string under
consideration). Keeping in mind that you can compare
as many as *four* characters in a string with a pair
of instructions (not using cmps, which is slow), you'll
find that it's often better, *ESPECIALLY WHEN
SEARCHING A FIXED DATA BASE*, to use
a simpler function that produces a few collisions now
and then, and then just use a *linear* search to resolve
the collisions. Of course, when using a fixed lookup table,
you can use a better search strategy, such as a binary
search, to make things even better.

BTW, I've done a *ton* of tests on the HLA scanner
and I've found that all this effort makes almost *zero*
impact on the performance of the lexical analysis phase.
Indeed, even using a cmpsb instruction (much slower than
discrete instructions) had almost no impact on the performance
of the lexical analyzer. Clearly, a decent hashing function
can have an impact on open-ended searches (e.g., through
a symbol table), but for fixed-table searches, you need to
be very careful about the cost of the function itself. HLA's
hashing function isn't particularly good; I've seen up to four
collisions into one hash bucket (in roughly a 4,096-entry
hash table). However, this isn't a common case and I make
up for the collisions by having a *very* fast hash function
and a very fast string comparison (all comparisons are
registers against immediate constants) as well as using
a state-machine driven binary search (i.e., Octavio's
complaint that the lexer is "code-driven" rather than
"table-driven"). Bottom line is that some simple profiling
indicates that separating reserved words from other
identifiers consumes so little time, that any attempts
to improve that performance would be a complete
waste of time. That's not to say that a better hashing
function wouldn't improve HLA v2.0's performance --
there is still the issue of looking up symbols in the
HLA symbol table that could be addressed, but
in terms of separating reserved words from identifiers,
there's not much to be gained at this point.

Cheers,
Randy Hyde


Octavio

unread,
Mar 23, 2005, 12:13:48 PM3/23/05
to
"Randall Hyde" <rand...@earthlink.net> wrote in message news:<s5O%d.876$gI5...@newsread1.news.pas.earthlink.net>...

> Octavio's function, for example, is cheap compared to
> many, but I count 12 instructions in the main loop that
> does the hash computation (that is, 12 instructions for
> *each* character
wrong ,is 12 instructions for 8 characters.

Betov

unread,
Mar 23, 2005, 12:24:29 PM3/23/05
to
9331...@terra.es (Octavio) écrivait
news:a3b1e869.05032...@posting.google.com:


Anyway, the argumentation is utterly stupid:

_YES_ OF COURSE, the time "lost" with the computation
of the CheckSum counts for nope compared to the time
saved 1) with reducing the Collisions Number 2) with
the time saved by the envolved whole management method
(depends on the "problem"...).

The ass-hole does not know what he is talking about.
As usual.


Betov.

< http://rosasm.org/ >


Randall Hyde

unread,
Mar 24, 2005, 12:03:23 AM3/24/05
to

"Octavio" <9331...@terra.es> wrote in message
news:a3b1e869.05032...@posting.google.com...

Oops, my mistake, sorry.
Of course, one problem with this approach is that you have to make
sure your string buffer is a multiple of eight characters long. Probably
not an issue for you in your assembler, but in HLA v2.0 it could be
a problem as I'm reading data from a memory mapped file and going
off the end of the file could be disasterous.

FWIW, the HLA v2.0 hashing function is quite simple:

USIDloop:

// Hash function is:
//
// hash := (hash rol 3) ^ SpreadTable[chr];
//
// Note that "hash" is a 16-bit value.

rol( 3, bx );
add( 1, esi ); // Bump up to next char.
xor( al, bl );

// Quick check for EOF:

cmp( esi, EOF );
jae USIDdone;

mov( [esi], al ); // Get the next char.
mov( SpreadTable[eax], al ); // Spread bits around.
test( al, al ); // Is it a valid ID char?
jnz USIDloop; // Repeat, if valid.

Note that this loop is actually doing quite a bit more than computing
the hash value, it is:

1. Computing the hash function, which is handled by the instructions:

mov( SpreadTable[eax], al );
rol( 3, bx );
xor( al, bl );

2. Converting lower-case and upper-case characters to the same
value (in order to do a case-insensitive reserved word lookup).
This function is handled by:

mov( SpreadTable[eax], al );

3. Fetching successive characters from the input file for lexical
analysis and checking them to see if they are part of a reserved
word or identifier, or if they are something else. This is handled
by the instructions:

add( 1, esi );
cmp( esi, EOF );
jae USIDdone;
mov( [esi], al );
mov( SpreadTable[eax], al );
test( al, al );
jnz USIDloop;


So how many instructions do I execute per character for the hash function?
Well, there are only two that aren't specifically used for anything else:
rol( 3, bx); and xor( al, bl ); All of the other instructions would still
need to
be there for the remainder of the lexical analysis needed by the HLA
scanner.

The important thing to note here, and the thing that makes a big difference,
is
that I merged the hash function computation into the same loop that scans
for identifiers/reserved words and does the upper/lower case conversion
in order to amortize the cost of loop control instructions better.

Strictly speaking, btw, there is *one* additional instruction required for
the hash value computation, after the "jnz USIDloop" instruction above,
I do the following for reserved words:

xor( bh, bl );

This is because HLA uses a eight-bit hash value computed from the
reserved word/identifier when searching through the reserved words
list. The actual reserved word hash table is organized as a two-dimensional
array declared roughly as follows:

rsHashTable: dword[13,256];

where the first (leftmost) dimension is indexed by the lexeme's length
and the second dimension is indexed by the eight-bit hash value.
The complete hash table is *very* sparse, mainly because of the length
component, but by indexing off the length as well as the hash value
I was able to do something very important -- I was able to guarantee
that when doing an actual string comparison I always know exactly
how many characters I've got to compare. This lets me write
straight-line code/immediate data string comparison code for each of
the reserved words (the "code" rather than "data table" comparison
you were complaining about in an earlier message). The four bits
of hash value used for encoding the length of the string does not
help spread the entries around well, compared to a better hashing function,
but it does reduce the cost of doing a string comparison by a *tremendous*
amount when dealing with collisions. Given that at least one string
comparison is *always* necessary, this isn't bad.

HLA has somewhere between 700 and 800 reserved words (IIRC).
The simple hash function described above maps almost 500 of those
reserved words to individual slots (i.e., only one string comparison is
needed to reject or accept the match). About 100 of the slots have
a single collision (two reserved words map to the same entry), meaning
it *could* take as many as two string comparisons to accept or reject
the match. It drops down to something like 20 slots that have three
collisions and four slots that have four collisions. Here is a typical
example of a case where there are three reserved words hashing to
the same slot (and how the collisions are resolved):

// Three identifiers of length four, all mapping to the same hash
// value (240, in this particular case):

Len4_Hash240_0:
cmp( eax, rwstr( "cset", 0, 4 ));
jne Len4_Hash240_1;
mov( tkn_cset, eax );
mov( type_tc, ebx );
jmp rwDone;

Len4_Hash240_1:
cmp( eax, rwstr( "int8", 0, 4 ));
jne Len4_Hash240_2;
mov( tkn_int8, eax );
mov( type_tc, ebx );
jmp rwDone;

Len4_Hash240_2:
cmp( eax, rwstr( "fseg", 0, 4 ));
jne IsAnID;
mov( tkn_fseg, eax );
mov( regseg_tc, ebx );
jmp rwDone;


Note that the "string comparison" consists of *two* instructions.
A single compare and a single conditional jump. While the
collision resolution search is done by a linear search, the bottom
line is that we're talking about, worst case, six machine instructions
to accept or reject one of these reserved words. Granted, three
of those instructions are conditional jumps, which are a bit expensive,
but the bottom line is that this is *much* faster than doing a
traditional string comparison. This efficiency is made possible
by hashing off the length of the identifier (so that I always know
exactly how many characters I've got to compare in one of these
sequences).

BTW, the code above is machine-generated, it is not maintained
by hand. I've simply got a text file containing all the reserved
words that a utility program reads and generates the above
code sequences for me (I guess Rene got a little bent out
of shape about this file :-) ).

The bottom line is that simply writing a fast hashing function
isn't good enough. The way to get a really fast lexer is by
streamlining the whole process of scanning a reserved word,
computing its hash value, and searching for the reserved word
in the data base.

I'm not about to claim that HLA's approach is the fastest. I'm
pretty sure it can be sped up a bit if someone really puts their
mind to work on it, but it does do a pretty good job.

BTW, should the scanned lexeme turn out *not* to be a
reserved word, then HLA uses the 16-bit hash value computed
by the lexer as an index into one of the hash tables for the
symbol table. As it turns out, HLA uses a "symbol tree"
rather than a "symbol table" (to use Beth's terminology, which
is quite appropriate here) because HLA is a block structured
language with scoped variables. Each scope (node in the
symbol tree) has its own little hash table. For global objects
and namespace objects, the hash table is indexed by a
16-bit value (i.e., 65,536 entries). For all other nodes,
the hash table is indexed by an 8-bit value (256 entries).
The fact that there are multiple tables helps spread the
symbols around, and further reduces the effects of collisions
(or, if you prefer, it automatically extends the number of
entries in the overall hash table each time you add a local
context to your source file).

Based on suggestions in this thread, I changed my hash function
a little bit, rotating by three bits instead of one. I also modified
the SpreadTable lookup table to "randomize" the bit patterns
for the legal ID symbols (if you think about it, identifiers only
consume about five bits: 26 alphabetic, 10 decimal, and a
few symbols like "_"; this small number of bits doesn't help
distribute items across the eight-bit character space very well).
That, combined with removing one other instruction from the
hash function I originally started with increased the speed of
HLA v2.0 by roughly 33% in a special benchmark I've got that
does several hundred thousand lookups. Not to bad for
a "two instruction" hashing function (well, three if you count
the table lookup).

In the symbol tree, btw, my collision resolution function
is a binary search rather than a linear search. Thus far
I've been getting away with a very simple binary search
that inserts symbols into the binary tree as it encounters
collisions. This means that if you enter all your symbols
in sorted order (forward or reverse), the whole thing
degenerates to a linear search. In general, however, you
can expect the inputs to be sufficiently random that this
shouldn't be a problem. I would point out, however, that
my "special benchmark" I mentioned above inserts all
the symbols in sorted order, so it's basically a "worst
case" scenerio; typical entries in that source file look
like this:

const
aaaa := 0;
aback := aaaa + 1 - 0;
aback_ := 0;
aback_aaaa := aback+aaaa-0;
abaft := aback + 1 - aaaa;
abaft_ := 0;
abaft_aback := abaft+aback-aaaa;
abandon := abaft + 1 - aback;
abandon_ := 0;
abandon_abaft := abandon+abaft-aback;
abandoned := abandon + 1 - abaft;
abandoned_ := 0;
abandoned_abandon := abandoned+abandon-abaft;
.
.
.

(this is the infamous "dictionary file" that crashes RosAsm within
about 250 statements that Rene claims assemblers shouldn't
have to handle :-); HLA v2.0, of course, handles it just fine.)

The full file is more than 115,000 lines of code like the above.
HLA v2.0 processes this (on a 3GHz PIV) at around 320,000
lines/second. This program, because of its size, is one of the
*slowest* compile times (on a lines/sec basis) I've found for
HLA v2.0. I've never bothered trying to profile the code and
find out where the time is being spent because, quite frankly,
processing 115,000 lines of code in under 1/2 second is
close enough to instantaneous for me.

In any case, I certainly second your point that having a butt-kicking
hash function the reduces the number of collisions to almost zero
is useless if the computation of the hash function is expensive.
Far better to use a very cheap hashing function that generates
a few more collisions. You can pay for a few extra string
comparisons (to resolve collisions) very rapidly if you're using
an expensive hash function (i.e., more than two or three instructions).
I once measured the difference in performance between two versions
of my program: one that used cmpsb (slow) to do a string comparison
(against symbols in the symbol tree) and one that use a discrete
sequence of instruction to do the string comparison (fast). On a file
like the one described above, there was about a 0.5% difference in
performance between the assemblies of the same file using the
two different algorithms (and this was on an input file heavily
skewed to favor the faster comparison algorithm). I immediately
dropped the discrete comparison and went with the cmpsb
version, because it's a whole lot easier to understand and maintain.

Bottom line, keep the hashing function cheap. It *does* make
a difference in the performance of the assembler. If you get a few
more collisions, that's not so bad -- my experiences indicate that
you don't lose a whole lot with the string comparisons needed to
resolve those collisions (even when doing a linear search, as
the test example I used winds up degenerating to a linear search).
Cheers,
Randy Hyde


Octavio

unread,
Mar 24, 2005, 2:33:23 PM3/24/05
to
"Randall Hyde" <rand...@earthlink.net> wrote in message news:<vSr0e.2491$gI5...@newsread1.news.pas.earthlink.net>...

> Of course, one problem with this approach is that you have to make
> sure your string buffer is a multiple of eight characters long.

Perhaps my code is very very very unreadable, but i have already
explained
that the buffer is a multiple of four. Anyway it has the inconvenience
that the word must be copied to this buffer but when done ,the hash
code is calculated very fast,the comparisons are also very fast (4
aligned bytes at a time) and also the word is moved very fast to the
dicctionary.



> The important thing to note here, and the thing that makes a big difference,
> is
> that I merged the hash function computation into the same loop that scans
> for identifiers/reserved

I do the same in octasm v0.1 but in v0.2 i will use the aligned
buffer, i like to experiment :)


> BTW, the code above is machine-generated,

This makes easy to write a million lines of code but big programs also
have big inconveniences for example the CPU will have a lot of cache
misses making the execution of every instruction very slow.
have you used the perfomance counter registers?


> The fact that there are multiple tables helps spread the
> symbols around, and further reduces the effects of collisions

Octasm also has multiple namespaces and multiple hash tables, but hash
tables sizes are proportional to the number of keywords they hold,else
is a waste of memory, also take in account that most programmers and
compilers like to use only one namespace.

> HLA v2.0 by roughly 33% in a special benchmark I've got that

but you are marketing it since year 2001 ,i'm a slow programmer but
you beat me :)
for benchmark would be better to also use real source code ,not just a
dicctionaire, one algorithm can be the best with dictionnaire but not
in real source code, is not the same problem.

Randall Hyde

unread,
Mar 24, 2005, 9:02:59 PM3/24/05
to

>
>
> > BTW, the code above is machine-generated,
>
> This makes easy to write a million lines of code but big programs also
> have big inconveniences for example the CPU will have a lot of cache
> misses making the execution of every instruction very slow.
> have you used the perfomance counter registers?

No, but I've run other profilers on the code.
The lexeme recognition code only consumes a tiny fraction
of the assembly time for a typical file. Any attempt to speed
it up (e.g., by using performance counter registers) would be
fruitless and a waste of time. There are bigger fish to fry.

>
>
> > The fact that there are multiple tables helps spread the
> > symbols around, and further reduces the effects of collisions
> Octasm also has multiple namespaces and multiple hash tables, but hash
> tables sizes are proportional to the number of keywords they hold,else
> is a waste of memory, also take in account that most programmers and
> compilers like to use only one namespace.

I completely disagree with your statement that most programmers and
compilers like to use only one namespace. I would agree that most
*assembler* authors like to use only one namespace because doing
otherwise is a lot more work, but even programmers who use relatively
"flat" (that is, not block structured) languages like C still tend to use
local and global namespaces (standard C, btw, supports four different
namespaces, IIRC).

As for wasting memory, I make no claims about HLA v2.0's efficient
use of memory. Saving space (particularly at the expense of speed)
has never been a consideration to me. Of concern is the amount of
cache space that gets used, of course, but there again the concept
of namespaces and the symbol tree work real well. Programs written
in a block-structured language tend to access local variables within
their context, thus pounding on the local symbols in memory (and allowing
the other symbols to go stale).

As for HLA's "code-driven" approach, don't forget that cache
controllers tend to work quite a bit better for code fetches than
for random data fetches. And the way the lexer generator program
I've written generates comparison sequences, if there is a collision
and the code doesn't match it on the first comparison, chances are
pretty good that the following code will be in cache (spatial locality).


>
> > HLA v2.0 by roughly 33% in a special benchmark I've got that
> but you are marketing it since year 2001 ,i'm a slow programmer but
> you beat me :)
> for benchmark would be better to also use real source code ,not just a
> dicctionaire, one algorithm can be the best with dictionnaire but not
> in real source code, is not the same problem.

"Real source code" would test a lot of irrelevant things.
If you want to test the performance of your symbol table lookup code,
you want source input that has a *lot* of symbols. Note, btw, that
the "dictionary" example I've given does *not* test the mnemonic
lookup, only the effectiveness of the hashing function and the use
of the symbol trees for standard identifiers. The only effect that mnemonic
lookup has on all of this is that all of those symbols are going to fail
to match the mnemonics, so it winds up comparing the symbol against
*every* mnemonic in a hash bucket.

Your emphasis on "special" tends to make it sound like this benchmark
was written in order to make the speed of HLA look good. In fact, just
the opposite is true (and I believe I pointed this out in my earlier post).
The alphabetic organization of the symbols in the benchmark guarantees
that when there are collisions in the hash table, the secondary search
algorithm degenerates to a linear search. The number (and organization)
of the symbols ensures that there will be lots of collisions (I figure an
average
of 3-6 collisions per bucket, in this particular case). As a general rule,
I
tend to choose benchmarks that stress out a code section (i.e., make it
run slower than it would on average) in order to help with the profiling
and determine where the code needs to be sped up. Writing benchmarks
that show how "fast" code code is (as Rene and the RosAsm crowd are
fond of doing) is about as useful as writing testing code that always works.
Cheers,
Randy Hyde

Betov

unread,
Mar 25, 2005, 2:49:45 AM3/25/05
to
"Randall Hyde" <rand...@earthlink.net> écrivait
news:njK0e.3865$H06....@newsread3.news.pas.earthlink.net:

> Writing benchmarks
> that show how "fast" code code is (as Rene and the RosAsm crowd are
> fond of doing) is about as useful as writing testing code that always
> works.
>

Pure lie. We never wrote any benchmark. I always said that
benchmarks are utterly no use for the real Assembly Authors,
and that Randall Hyde, who never wrote any Assembler, does
it for no reason but swindling and propaganda selling.

The only tests done on RosAsm are for compiling significative
Apps, like its own 3 Megas: One and half Mega per Second on a
Celeron 1.3. Period. And this ass-hole perfectly know that he
will never be able to compete with this.


Betov.

< http://rosasm.org/ >

Octavio

unread,
Mar 25, 2005, 1:28:48 PM3/25/05
to
"Randall Hyde" <rand...@earthlink.net> wrote in message news:<njK0e.3865$H06....@newsread3.news.pas.earthlink.net>...

> If you want to test the performance of your symbol table lookup code,
> you want source input that has a *lot* of symbols.
Yes but is not the same a list of keywords with only one occurrence
than a
list were a few keywords are 90% of total ,some algoritms perform
better in the first case and others are better for the second, like a
hash table with ordered lists.
0 new messages