a possible 4-way hash function test


bob_j...@burtleburtle.net

Jul 21, 2008, 12:32:24 PM
to Hash Functions
When writing a hash function for an expression, say "child1 ^
child2" (^ means XOR), you'd like to do
hash = seed;
hash += hash(child1);
hash += hash(child2);
hash += "XOR";
mix(hash);
The order of the inputs won't affect the result, which is good,
because XOR is commutative. If you store the hash in the expression,
then new expressions can find their hash given just the hashes of
their children and constant extra work for their own operation. I've
seen this done.
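
For concreteness, here is a minimal sketch of what that recurrence might
look like in C. The names (node, OP_XOR, mix32) are made up for
illustration, and mix32 stands in for whatever final mix you use:

#include <stdint.h>

/* hypothetical expression node; the names are illustrative only */
enum { OP_XOR = 1 };

typedef struct node {
    int          op;                /* folded in so different operators hash differently */
    struct node *child1, *child2;
    uint32_t     hash;              /* cached, so parents need only constant extra work */
} node;

extern uint32_t mix32(uint32_t h);  /* stand-in for the final mix step */

uint32_t hash_expr(uint32_t seed, const node *e)
{
    uint32_t h = seed;
    h += e->child1->hash;           /* + is commutative, so child order does not matter */
    h += e->child2->hash;
    h += (uint32_t)e->op;           /* e.g. OP_XOR */
    return mix32(h);
}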

But look at this part
hash += hash(child1);
hash += hash(child2);
The hash function for the children was this hash function. What if
you have (a01 XOR a10), and also (a00 XOR a11)? They're different.
But the four subexpressions (a01, a10, a00, a11) differ in only two
bits. The pairs (a01,a10) and (a00, a11) differ the same way in the
first of those bits but differently in the second. If the mix isn't
thorough enough, the sums of those pairs of hashes are likely to be
the same, and you'll be getting different complex expressions with the
same hash value all over the place.

To test whether a hash function is good enough for this use, pick some
random key blocks, run through all pairs of bits in a block, and XOR
all settings of those bits with those keys. For each pair of bits,
for each key, compute (hash(01)+hash(10)) - (hash(00)+hash(11)). (The
equivalent test for XOR is hash(01)^hash(10)^hash(00)^hash(11).) Then
do the normal tests on that output to make sure it is well
distributed. This test is O(n*n) in the number of bits in a block,
and it covers a real use case.
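
A minimal sketch of that test in C, assuming hash_under_test() is the
32-bit hash being examined, collect() feeds whatever distribution tests
you normally run (chi^2, bucket counts, and so on), and KEYLEN is the
block size in bytes; all three names are stand-ins, not part of any
particular implementation:

#include <stdint.h>
#include <string.h>

#define KEYLEN 16                        /* bytes per key block (an assumption) */

extern uint32_t hash_under_test(const uint8_t *key, size_t len);
extern void     collect(uint32_t value); /* feed the usual distribution tests */

static void flip_bit(uint8_t *key, int bit)
{
    key[bit >> 3] ^= (uint8_t)(1u << (bit & 7));
}

/* for every key and every pair of bits (i,j), hash the four settings of
 * those two bits and collect (hash(01)+hash(10)) - (hash(00)+hash(11)) */
void pair_of_bits_test(const uint8_t *keys, size_t nkeys)
{
    for (size_t n = 0; n < nkeys; ++n) {
        for (int i = 0; i < KEYLEN * 8; ++i) {
            for (int j = i + 1; j < KEYLEN * 8; ++j) {
                uint32_t h[4];
                for (int s = 0; s < 4; ++s) {
                    uint8_t k[KEYLEN];
                    memcpy(k, keys + n * KEYLEN, KEYLEN);
                    if (s & 1) flip_bit(k, i);   /* XOR one setting of bit i */
                    if (s & 2) flip_bit(k, j);   /* XOR one setting of bit j */
                    h[s] = hash_under_test(k, KEYLEN);
                }
                collect((h[1] + h[2]) - (h[0] + h[3]));
            }
        }
    }
}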

Andres Valloud

Oct 1, 2008, 4:05:29 AM
to hash-fu...@googlegroups.com
Bob,

Sorry for the delay in answering; I have been extremely busy as of late.

bob_j...@burtleburtle.net wrote:
> When writing a hash function for an expression, say "child1 ^
> child2" (^ means XOR), you'd like to do
> hash = seed;
> hash += hash(child1);
> hash += hash(child2);
> hash += "XOR";
> mix(hash);
> The order of the inputs won't affect the result, which is good,
> because XOR is commutative. If you store the hash in the expression,
> then new expressions can find their hash given just the hashes of
> their children and constant extra work for their own operation. I've
> seen this done.
>
> But look at this part
> hash += hash(child1);
> hash += hash(child2);
> The hash function for the children was this hash function. What if
> you have (a01 XOR a10), and also (a00 XOR a11)? They're different.
> But the four subexpressions (a01, a10, a00, a11) differ in only two
> bits. The pairs (a01,a10) and (a00, a11) differ the same way in the
> first of those bits but differently in the second. If the mix isn't
> thorough enough, the sums of those pairs of hashes are likely to be
> the same, and you'll be getting different complex expressions with the
> same hash value all over the place.
>

I think what you're going after is that the hash function has to
commute. For example, given the expressions

e1 = c1 ^ c2;
e2 = c2 ^ c1;

you'd still want hash(e1) = hash(e2) because e1 = e2 (because ^ itself
commutes). So this means that hash() cannot use the children's order
information, because otherwise it stops being commutative. And as you
point out, commutative hash functions are harder to design than
non-commutative ones precisely because of this.

> To test whether a hash function is good enough for this use, pick some
> random key blocks, run through all pairs of bits in a block, and XOR
> all settings of those bits with those keys. For each pair of bits,
> for each key, compute (hash(01)+hash(10)) - (hash(00)+hash(11)). (The
> equivalent test for XOR is hash(01)^hash(10)^hash(00)^hash(11).) Then
> do the normal tests on that output to make sure it is well
> distributed. This test is O(n*n) in the number of bits in a block,
> and it covers a real use case.
>

I can't quite put my finger on it, at least not completely, but I
suspect that this may not always detect issues. For example, consider
this alternate inner code for the hash function instead.

hash ^= hash(child1);
hash ^= hash(child2);


Let's define hash(leaf child) as a lookup into an s-box. I get the
feeling that this should get around the bit pair test because the s-box
will cause all sorts of bit patterns to change at once. And yet the
s-box values will collide heavily and cancel each other out should there
be equal children in the expression.
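
For what it's worth, a minimal sketch of that alternative, assuming
sBox[] is a table of random 32-bit values indexed by leaf identity (the
names here are made up for illustration):

#include <stdint.h>

extern uint32_t sBox[256];          /* assumed table of random 32-bit values */

uint32_t hash_leaf(uint8_t id)      /* leaf hash: a straight table lookup */
{
    return sBox[id];
}

uint32_t hash_xor_node(uint32_t h_child1, uint32_t h_child2)
{
    return h_child1 ^ h_child2;     /* commutative combination by XOR */
}

/* the cancellation: hash_xor_node(sBox[a], sBox[a]) == 0 for every a, so
 * any expression with equal children collides with any other such
 * expression, no matter how random the table is */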

Something that also concerns me is that even if the test passes, the
strong bit correlation patterns given by something like an s-box may
cause the hash function quality to suffer. In other words, let's say
the combination rule for s-box values is bitwise XOR, and consider the
simplified case of hashing strings character by character, assuming one
cannot use the character sequence information. Then, basically,

'abc' hash => sBox[a] ^ sBox[b] ^ sBox[c]

In the same way,

'def' hash => sBox[d] ^ sBox[e] ^ sBox[f]

But then,

'abcx' hash => 'abc' hash ^ sBox[x]
'defx' hash => 'def' hash ^ sBox[x]

If many keys share the same character, such as x in this case, its
contribution becomes effectively a no-op (or, in other words, it is
equivalent to a distribution-preserving permutation which does not
contribute to the hash quality).

I think what you are trying to detect with the test is that these
no-ops should not be trivial. But, distribution-wise (and particularly
mod p), it should not matter much exactly which permutation is failing
to contribute to the hash values. After all, it is just a permutation
that preserves distribution, so the distribution quality before the
permutation is the same as the distribution quality after it. So,
while the effect may no longer be trivial, it is still there because of
the requirement that the hash function must commute on its inputs.

Well... this may seem to make sense to me, but... maybe I missed something?

Also, I am curious. In your particular case, didn't the standard
quality tests, like chi^2, detect problems with the original hash
function code you posted when measured against a dataset extracted from
practice?

Andres.

aka...@gmail.com

Oct 19, 2008, 2:34:18 AM
to Hash Functions
Hi,
I have a problem. I build a byte message in C# and send it over a
socket to a server, but the server aborts the connection. This is the
message:
`SC`00541.00TJ123456PPC 00000000DLGLGN 00000001TXBEG
Login:user=a, pswd=a 9afb81c7
"9afb81c7" is the checksum of this message. How do I create the
checksum? It is 8 bytes: the negative value of the result of an
exclusive-or operation on the message, in 32-digit format.
Can you help me create this checksum?
Thank you...
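
One possible reading of that description, purely as a guess and not the
actual protocol specification: fold the message into a 32-bit
accumulator with XOR, take the bitwise complement as the "negative
value", and render it as 8 hex digits. A minimal C sketch of that
guess (byte order, tail padding, and which bytes of the frame are
covered are all assumptions):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* guess at the scheme: XOR the message into a 32-bit accumulator four
 * bytes at a time (zero-padding the tail), complement the result, and
 * print it as 8 hex digits; little-endian word assembly is an assumption */
static uint32_t xor_checksum(const uint8_t *msg, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < len; i += 4) {
        uint32_t word = 0;
        for (size_t j = 0; j < 4 && i + j < len; ++j)
            word |= (uint32_t)msg[i + j] << (8 * j);
        acc ^= word;
    }
    return ~acc;                    /* "negative value" read as bitwise complement */
}

int main(void)
{
    const char *msg = "example message";    /* placeholder, not the real frame */
    printf("%08" PRIx32 "\n", xor_checksum((const uint8_t *)msg, strlen(msg)));
    return 0;
}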