hash function Shahaha

Rade

unread,

Mar 21, 2011, 8:27:01 PM3/21/11

to

Shahaha is a hash function and ideas behind Shahaha are discussed
here
http://groups.google.com/group/sci.crypt/browse_thread/thread/118b5fdf66144d64

There is a few improvements over Wolfram's rule 30 hash design:
- a linear dependency cost on a message
- operating on a word level instead on a bit level

Shahaha is implemented as shahaha.h (see below). The genKat.c (google
NIST sha3 competition site) may be used to test / use shahaha function
(just include shahaha.h file). Shahaha conforms to the most of
requirements of NIST sha3 competition excluding the very long message
part of genKat.c. That part is not working because shahaha needs
complete message at once as an imput.

Regards Rade Vuckovac

//shahaha.h

#include <stdint.h>

typedef unsigned char BitSequence;
typedef unsigned long long DataLength; // a typical 64-bit value
typedef enum { SUCCESS = 0, FAIL = 1, BAD_HASHBITLEN = 2, NOMEM = 3 }
HashReturn;

typedef struct
{
int hashbitlen;
int rounds;
uint8_t mixer;
uint8_t carry;
uint64_t initial;
uint64_t asize;
uint8_t * ip;
}
hashState;

HashReturn Init(hashState *state, int hashbitlen)
{
state->hashbitlen = hashbitlen;
state->rounds = 3;
state->mixer = 85; //01010101
state->carry = 0;
state->initial = 256;
state->asize =0;
state->ip = (uint8_t *) calloc(state->initial, sizeof(uint8_t));
if(state->ip == NULL)
{
printf("Error allocating memory\n");
return NOMEM;
}

return SUCCESS;
}

HashReturn Update(hashState *state, const BitSequence *data,
DataLength databitlen)
{
uint64_t i, databytelen;

databytelen = databitlen / 8;
if(databitlen % 8 !=0)
{
databytelen++;
state->ip[0] ^= databitlen % 8;
}

state->asize = state->initial + databytelen;
state->ip = (uint8_t *) realloc(state->ip, (state->asize) *
sizeof(uint8_t));
if(state->ip == NULL)
{
printf("Error (re)allocating memory\n");
return NOMEM;
}

for(i = 0; i < databytelen; i++)
{
state->ip[state->initial + i] = data[i];
}

return SUCCESS;
}

HashReturn Final(hashState *state, BitSequence *hashval)
{
int i;
uint64_t j;

for(i = 0; i < state->rounds; i++)
{
for(j = 0; j < state->asize; j++)
{
uint8_t out = state->ip[j];
const uint8_t in = state->ip[(j + 1) % (state->asize)];
const uint8_t a = state->ip[(j + 2) % (state->asize)];
const uint8_t b = state->ip[(j + 3) % (state->asize)];

if (a > b)
state->carry ^= in;
else
state->carry ^= ~in;

state->ip[j] = (out ^= state->carry);
state->carry += state->mixer;
}
}

for(i = 0; i < (state->hashbitlen / 8); i++)
{
uint8_t out = state->ip[i];
const uint8_t in = state->ip[(i + 1) % (state->asize)];
const uint8_t a = state->ip[(i + 2) % (state->asize)];
const uint8_t b = state->ip[(i + 3) % (state->asize)];

if (a > b)
state->carry ^= in;
else
state->carry ^= ~in;

state->ip[i] = (out ^= state->carry);
state->carry += state->mixer;

hashval[i] = state->ip[i];
}

return SUCCESS;
}

HashReturn Hash(int hashbitlen, const BitSequence *data,
DataLength databitlen, BitSequence *hashval)
{
hashState state;

Init(&state, hashbitlen);

Update(&state, data, databitlen);

Final(&state, hashval);

return SUCCESS;
}

Scott Fluhrer

unread,

Mar 22, 2011, 12:15:31 PM3/22/11

to

Collision in the below hash function (and 256 bit tags; the tag size is
variable, and so I had to pick something):

Preimage1:
1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
39 75 56 cf 1a 22 9e f5 4f 6c 8e 35 6d 22 69 a2 66 32 17 b0 4d eb 53

Preimage 2:
1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
39 75 56 cf 1a 22 9e f5 4f 6c 8e 7a 26 9d 6e 41 11 4d ff b0 4d eb 53

Common Hash Value:
25 5d 77 b1 2a 47 38 21 e4 2a 97 9f 75 5a 34 97 b7 2d 1f a0 32 96 b5 80 8f
88 c7 72 5d f4 cd ee

I'd tell you how this collision works (and how I found it), but I suspect
it'd be more instructive for you to figure it out yourself.

"Rade" <rade.v...@gmail.com> wrote in message
news:7da30d2d-0538-4cf9...@22g2000prx.googlegroups.com...

Tom St Denis

unread,

Mar 22, 2011, 12:43:56 PM3/22/11

to

On Mar 21, 8:27 pm, Rade <rade.vucko...@gmail.com> wrote:
> state->asize = state->initial + databytelen;
> state->ip = (uint8_t *) realloc(state->ip, (state->asize) *
> sizeof(uint8_t));

First and foremost, buffering the entire message before you process it
eliminates your hash from most uses. It's not practical.

Second, the realloc statement is also a source of a memory leak.

Tom

Scott Fluhrer

unread,

Mar 22, 2011, 6:16:23 PM3/22/11

to

"Scott Fluhrer" <sflu...@ix.netcom.com> wrote in message
news:13008104...@sj-nntpcache-3.cisco.com...

> Collision in the below hash function (and 256 bit tags; the tag size is
> variable, and so I had to pick something):
>
> Preimage1:
> 1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
> 39 75 56 cf 1a 22 9e f5 4f 6c 8e 35 6d 22 69 a2 66 32 17 b0 4d eb 53
>
> Preimage 2:
> 1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
> 39 75 56 cf 1a 22 9e f5 4f 6c 8e 7a 26 9d 6e 41 11 4d ff b0 4d eb 53
>
> Common Hash Value:
> 25 5d 77 b1 2a 47 38 21 e4 2a 97 9f 75 5a 34 97 b7 2d 1f a0 32 96 b5 80 8f
> 88 c7 72 5d f4 cd ee
>
>
> I'd tell you how this collision works (and how I found it), but I suspect
> it'd be more instructive for you to figure it out yourself.

I've strengthened the attack to a second-preimage attack. Here's an example
with a 512 bit hash tag:

Preimage 1:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00

Preimage 2:
50 6f 6e 63 68 6f 20 77 61 73 20 68 65 72 65 00 00 00 00 00 00 00 da 2b d0
c4 3e 36 3c 00 00 ff 67 00 00 00

Common Hash Value:
74 6e d5 6d f1 c1 a4 4e 90 12 a9 69 95 f5 90 0a 3c a6 5d a5 29 49 0c 96 e8
ea 81 01 5d bd 48 52 84 fe 85 1d 41 b1 74 5e 60 a2 d9 39 c5 c5 20 1a ec 36
ad d5 99 79 fc e6 98 7a 91 d1 6d 0d 38 a2

Finding this took about 150,000 hash evaluations (rather less than the
2**511 it's supposed to take). Cheers!

Rade

unread,

Mar 23, 2011, 8:34:46 AM3/23/11

to

On Mar 23, 8:16 am, "Scott Fluhrer" <sfluh...@ix.netcom.com> wrote:
> "Scott Fluhrer" <sfluh...@ix.netcom.com> wrote in message

> > "Rade" <rade.vucko...@gmail.com> wrote in message

> >news:7da30d2d-0538-4cf9...@22g2000prx.googlegroups.com...
> >> Shahaha is a hash function and ideas behind Shahaha are discussed
> >> here

> >>http://groups.google.com/group/sci.crypt/browse_thread/thread/118b5fd...

That was always problem with cellular automata complexity, especially
setting initial state to zeros (see previous tread with rule 30). For
rule 30 remedies was to set initial state to expansion of Pi. Before
fiddling with Shahaha and initial state there is one simple idea. The
final hash value may be xored with message. If message is longer than
hash just continue xoring from the begining of the hash value. For
example hash h = 123 (from Shahaha), message m = 12345, then improved
hash ih = (123 xor 123) xor 45. The idea : m1 now needs the specific
hash to match original pairing h xor m = h1 xor m1 ( and if it is
working then that may be even a general medicine for collision
attacks).

Rade

Scott Fluhrer

unread,

Mar 23, 2011, 9:27:09 AM3/23/11

to

"Rade" <rade.v...@gmail.com> wrote in message

news:af68b0bf-92f9-4a01...@f36g2000pri.googlegroups.com...

> That was always problem with cellular automata complexity, especially
> setting initial state to zeros (see previous tread with rule 30). For
> rule 30 remedies was to set initial state to expansion of Pi. Before
> fiddling with Shahaha and initial state there is one simple idea. The
> final hash value may be xored with message. If message is longer than
> hash just continue xoring from the begining of the hash value. For
> example hash h = 123 (from Shahaha), message m = 12345, then improved
> hash ih = (123 xor 123) xor 45. The idea : m1 now needs the specific
> hash to match original pairing h xor m = h1 xor m1 ( and if it is
> working then that may be even a general medicine for collision
> attacks).

Before you do that, it might be wise to study my attack vectors to find out
how they work. If the goal of your fiddling is to disrupt these
demonstrated attacks, it might be wise to be fairly certain that what you're
doing will actually do that. Until you understand it, well, you're just
guessing, and to be honest, your guesses aren't particular accurate.

--
poncho

Rade

unread,

Mar 24, 2011, 9:23:11 AM3/24/11

to

On Mar 23, 11:27 pm, "Scott Fluhrer" <sfluh...@ix.netcom.com> wrote:
> "Rade" <rade.vucko...@gmail.com> wrote in message

Thanks for suggestions. Is that mean that the patch is working?

Rade

Scott Fluhrer

unread,

Mar 24, 2011, 3:06:27 PM3/24/11

to

"Rade" <rade.v...@gmail.com> wrote in message

news:ff84d93e-6fa3-4751...@i39g2000prd.googlegroups.com...

> Thanks for suggestions. Is that mean that the patch is working?

Do you mean whether your guess that xoring in the preimage into the hash
might foil the attack? Well, I suppose that "you're just guessing, and ...
your guesses aren't particular[ly] accurate" could be interpretted as "by
George, you've got it". I'll try to be less ambiguous next time.

On the other hand, no, it doesn't work. To make it more of a challenge, I
tried to find a four-way collision, where the messages collided both with
and without the tweak. Here's what I came up with (with 256 bit tags; each
preimage is a 512 bit value consisting of a 256 bit value repeated twice; if
you do the xor-hack, the two halves will cancel each other out and leave the
hash value unaffected):

Preimage:
94 97 3c b8 65 fd b1 bf c8 f6 50 22 2c 74 15 3f 31 7f d8 01 62 bf 34 dd c9
c5 1a eb d8 19 7e 7b
94 97 3c b8 65 fd b1 bf c8 f6 50 22 2c 74 15 3f 31 7f d8 01 62 bf 34 dd c9
c5 1a eb d8 19 7e 7b

Preimage:
9e a7 42 75 bd ce b9 f2 48 1b 7c ff 44 54 20 82 29 59 47 70 e9 3f 61 9e 06
1d b0 01 38 db 36 43
9e a7 42 75 bd ce b9 f2 48 1b 7c ff 44 54 20 82 29 59 47 70 e9 3f 61 9e 06
1d b0 01 38 db 36 43

Preimage:
d4 91 78 1b fc 99 38 f9 d6 6a 11 b5 28 84 16 a7 32 35 96 74 49 57 6c 43 82
0a b9 22 6a a8 8b ac
d4 91 78 1b fc 99 38 f9 d6 6a 11 b5 28 84 16 a7 32 35 96 74 49 57 6c 43 82
0a b9 22 6a a8 8b ac

Preimage:
7d 76 02 c3 c4 ba ae 46 a2 74 47 64 a4 01 bc 3e 75 98 77 18 66 63 2a 35 6b
99 f1 41 25 b3 57 64
7d 76 02 c3 c4 ba ae 46 a2 74 47 64 a4 01 bc 3e 75 98 77 18 66 63 2a 35 6b
99 f1 41 25 b3 57 64

Common hash value (both with and without the tweak):
16 ac 5f e8 0a fc db 58 86 64 6f f0 2a 6b 95 82 7f 98 04 61 31 5a ae 67 3c
1a aa 0f af eb 05 e2

Now, will you actually take my suggestion, and study your own hash function
to see why I can make collisions so easily???

--
poncho

rossum

unread,

Mar 24, 2011, 8:02:05 PM3/24/11

to

On Thu, 24 Mar 2011 06:23:11 -0700 (PDT), Rade
<rade.v...@gmail.com> wrote:

>Thanks for suggestions. Is that mean that the patch is working?
>
>Rade

Scott has already answered that; your patch is broken. In the past I
proposed my own hash function here, and Scott very kindly tore it
apart and helped me to discover its weaknesses for myself.

You are being offered some very high quality tuition here, for free.
I suggest that you don't waste the opportunity.

Your hash is broken. You need to try to work out how Scott has broken
it. For instance compare the first two preimages he found:

>Preimage1:
>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>39 75 56 cf 1a 22 9e f5 4f 6c 8e 35 6d 22 69 a2 66 32 17 b0 4d eb 53
>
>Preimage 2:
>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>39 75 56 cf 1a 22 9e f5 4f 6c 8e 7a 26 9d 6e 41 11 4d ff b0 4d eb 53
>
>Common Hash Value:
>25 5d 77 b1 2a 47 38 21 e4 2a 97 9f 75 5a 34 97 b7 2d 1f a0 32 96 b5 80 8f
>88 c7 72 5d f4 cd ee

Have a look at the two preimages. Which bytes are the same? Which
bytes are different? Now look at your code and try to work out why
Scott picked those bytes to change while leaving all the others
unchanged. That will get you started on the weaknesses of you
proposal.

rossum

Scott Fluhrer

unread,

Mar 24, 2011, 11:48:27 PM3/24/11

to

"rossum" <ross...@coldmail.com> wrote in message
news:7slno6toc28uq29br...@4ax.com...

> On Thu, 24 Mar 2011 06:23:11 -0700 (PDT), Rade
> <rade.v...@gmail.com> wrote:
>
>>Thanks for suggestions. Is that mean that the patch is working?
>>
>>Rade
> Scott has already answered that; your patch is broken. In the past I
> proposed my own hash function here, and Scott very kindly tore it
> apart and helped me to discover its weaknesses for myself.

Why, thank you!

>
> You are being offered some very high quality tuition here, for free.
> I suggest that you don't waste the opportunity.
>
> Your hash is broken. You need to try to work out how Scott has broken
> it. For instance compare the first two preimages he found:
>
>>Preimage1:
>>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>>39 75 56 cf 1a 22 9e f5 4f 6c 8e 35 6d 22 69 a2 66 32 17 b0 4d eb 53
>>
>>Preimage 2:
>>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>>39 75 56 cf 1a 22 9e f5 4f 6c 8e 7a 26 9d 6e 41 11 4d ff b0 4d eb 53
>>
>>Common Hash Value:
>>25 5d 77 b1 2a 47 38 21 e4 2a 97 9f 75 5a 34 97 b7 2d 1f a0 32 96 b5 80 8f
>>88 c7 72 5d f4 cd ee
>
> Have a look at the two preimages. Which bytes are the same? Which
> bytes are different? Now look at your code and try to work out why
> Scott picked those bytes to change while leaving all the others
> unchanged. That will get you started on the weaknesses of you
> proposal.

Hmmmm, for normal hash functions, that would be a good place to start.
However, Shahaha is not structured like a normal hash function, and so I
suspect that wondering why certain bytes were picked might be misleading.
On my first attempt, I didn't quite fully understand the hash function, and
so I thought certain bytes had to be fixed for the attack to work, when
there is no such requirement. You can hardly be expected to reconstruct my
misunderstanding. If you take a look at my latter efforts, such as my most
recent 4-way collision, the answer to 'which bytes change' is 'all of them'.

Instead, I suspect that a more productive avenue might be to inquire 'what
does the internal hash state look like midway through the hash function?
When you compare the processing of the two preimages, what's the same?'. If
you can answer that (and can also answer *why* it happens), you'll be a lot
closer.

--
poncho

Rade

unread,

Mar 25, 2011, 8:30:16 AM3/25/11

to

On Mar 25, 1:48 pm, "Scott Fluhrer" <sfluh...@ix.netcom.com> wrote:
> "rossum" <rossu...@coldmail.com> wrote in message

Thanks again. I will check a few things and report of Shahaha status.

Rade

unread,

Mar 28, 2011, 11:48:43 PM3/28/11

to

The problem with current design of Shahaha is that some local changes
in initial state not propagate globally (because of linear cost) as it
is not case with rule 30 hash proposal. The fix may be in the form of
processing last state via some sort of Merkle - Damgard construct. But
in the other hand whole Shahaha may be constructed around Merkle -
Damgard.

Rade

Tom St Denis

unread,

Mar 29, 2011, 6:10:32 AM3/29/11

to

You seemed to ignore my reply ... but let me restate. If you don't
make the hash practical it doesn't really matter if it's secure or
not.

Also, fix your memory leak.

Tom

Rade

unread,

Mar 29, 2011, 9:18:32 PM3/29/11

to

As I mentioned Shahaha may be designed using Merkle - Damgard
construct, and I think the further design effort is probably going
that way.

Rade

Bryan

unread,

Apr 1, 2011, 3:29:57 AM4/1/11

to

Rade wrote:
> As I mentioned Shahaha may be designed using Merkle - Damgard
> construct,

No. Shahaha has already been designed. It either follows the Merkle–
Damgård construction or it does not, and unless I'm grossly
misunderstanding the code, it does not. Scott Fluhrer broke Shahaha.

> and I think the further design effort is probably going that way.

The Merkle–Damgård structure still has advocates, but I think "wide
pipe" constructions are the future of hash function design. Either
way, Shahaha is a discredited mistake of the past.

Rade, if you want to get any value from Shahaha, take Scott's advice
and study how to break it. Can you produce similar results? Can you
improve upon them? If your answer is 'yes', can you actually show the
preimages that prove it, as Scott did, or are you merely imagining
that you could?

Less central, but worth noting, can you fix the memory leak in your
code? Tom tried to clue you in, and though I might describe the defect
differently, Tom pointed out real problem. Furthermore, what was the
point of your functions Init(), Update(), and Final() returning a
value of type HashReturn, while your function Hash() calls them but
ignores the return values?

Defining codes for impossible errors is arguably defensible, and so is
ignoring impossible errors, but here we have the indefensible case of
incorrect results returned as success. Init() and Update() can return
NOMEM. Why does Hash() return SUCCESS even when it fails?

The news on Shahaha is not good. The coding is bad, and the cryptology
is worse. The best we can say is that the lessons are there to be
learned.

--
--Bryan

Rade

unread,

Apr 1, 2011, 11:05:00 PM4/1/11

to

My apologies if my naming convention is not coherent. Shahaha for me
is a cellular automata rule which is used in a hash construction. In
posted attempt the rule did not do the job. As I mentioned the problem
with Shahaha is that part of the message (preferably second part of
the message) can be changed and that changes are not propagating
globally (because carry variable is sole carrier of the message change
and as was shown can be manipulated), consequently not changing the
final hash. The Wolfram rule 30 apparently does not have that problem
but the construct is not efficient (see proposal
http://www.google.com/url?sa=D&q=http://groups.google.com/group/sci.crypt/browse_thread/thread/118b5fdf66144d64).
From my point of view the rule behind Shahaha may have future ie be
efficient and secure. The Merkle - Dagmard construct may be the way
to go and the consequent design will be named appropriately. Also I
think that programming issues raised are not primary concern when
actually a hash concept is discussed.

Rade