There is a few improvements over Wolfram's rule 30 hash design:
- a linear dependency cost on a message
- operating on a word level instead on a bit level
Shahaha is implemented as shahaha.h (see below). The genKat.c (google
NIST sha3 competition site) may be used to test / use shahaha function
(just include shahaha.h file). Shahaha conforms to the most of
requirements of NIST sha3 competition excluding the very long message
part of genKat.c. That part is not working because shahaha needs
complete message at once as an imput.
Regards Rade Vuckovac
//shahaha.h
#include <stdint.h>
typedef unsigned char BitSequence;
typedef unsigned long long DataLength; // a typical 64-bit value
typedef enum { SUCCESS = 0, FAIL = 1, BAD_HASHBITLEN = 2, NOMEM = 3 }
HashReturn;
typedef struct
{
int hashbitlen;
int rounds;
uint8_t mixer;
uint8_t carry;
uint64_t initial;
uint64_t asize;
uint8_t * ip;
}
hashState;
HashReturn Init(hashState *state, int hashbitlen)
{
state->hashbitlen = hashbitlen;
state->rounds = 3;
state->mixer = 85; //01010101
state->carry = 0;
state->initial = 256;
state->asize =0;
state->ip = (uint8_t *) calloc(state->initial, sizeof(uint8_t));
if(state->ip == NULL)
{
printf("Error allocating memory\n");
return NOMEM;
}
return SUCCESS;
}
HashReturn Update(hashState *state, const BitSequence *data,
DataLength databitlen)
{
uint64_t i, databytelen;
databytelen = databitlen / 8;
if(databitlen % 8 !=0)
{
databytelen++;
state->ip[0] ^= databitlen % 8;
}
state->asize = state->initial + databytelen;
state->ip = (uint8_t *) realloc(state->ip, (state->asize) *
sizeof(uint8_t));
if(state->ip == NULL)
{
printf("Error (re)allocating memory\n");
return NOMEM;
}
for(i = 0; i < databytelen; i++)
{
state->ip[state->initial + i] = data[i];
}
return SUCCESS;
}
HashReturn Final(hashState *state, BitSequence *hashval)
{
int i;
uint64_t j;
for(i = 0; i < state->rounds; i++)
{
for(j = 0; j < state->asize; j++)
{
uint8_t out = state->ip[j];
const uint8_t in = state->ip[(j + 1) % (state->asize)];
const uint8_t a = state->ip[(j + 2) % (state->asize)];
const uint8_t b = state->ip[(j + 3) % (state->asize)];
if (a > b)
state->carry ^= in;
else
state->carry ^= ~in;
state->ip[j] = (out ^= state->carry);
state->carry += state->mixer;
}
}
for(i = 0; i < (state->hashbitlen / 8); i++)
{
uint8_t out = state->ip[i];
const uint8_t in = state->ip[(i + 1) % (state->asize)];
const uint8_t a = state->ip[(i + 2) % (state->asize)];
const uint8_t b = state->ip[(i + 3) % (state->asize)];
if (a > b)
state->carry ^= in;
else
state->carry ^= ~in;
state->ip[i] = (out ^= state->carry);
state->carry += state->mixer;
hashval[i] = state->ip[i];
}
return SUCCESS;
}
HashReturn Hash(int hashbitlen, const BitSequence *data,
DataLength databitlen, BitSequence *hashval)
{
hashState state;
Init(&state, hashbitlen);
Update(&state, data, databitlen);
Final(&state, hashval);
return SUCCESS;
}
Preimage1:
1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
39 75 56 cf 1a 22 9e f5 4f 6c 8e 35 6d 22 69 a2 66 32 17 b0 4d eb 53
Preimage 2:
1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
39 75 56 cf 1a 22 9e f5 4f 6c 8e 7a 26 9d 6e 41 11 4d ff b0 4d eb 53
Common Hash Value:
25 5d 77 b1 2a 47 38 21 e4 2a 97 9f 75 5a 34 97 b7 2d 1f a0 32 96 b5 80 8f
88 c7 72 5d f4 cd ee
I'd tell you how this collision works (and how I found it), but I suspect
it'd be more instructive for you to figure it out yourself.
"Rade" <rade.v...@gmail.com> wrote in message
news:7da30d2d-0538-4cf9...@22g2000prx.googlegroups.com...
First and foremost, buffering the entire message before you process it
eliminates your hash from most uses. It's not practical.
Second, the realloc statement is also a source of a memory leak.
Tom
I've strengthened the attack to a second-preimage attack. Here's an example
with a 512 bit hash tag:
Preimage 1:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00
Preimage 2:
50 6f 6e 63 68 6f 20 77 61 73 20 68 65 72 65 00 00 00 00 00 00 00 da 2b d0
c4 3e 36 3c 00 00 ff 67 00 00 00
Common Hash Value:
74 6e d5 6d f1 c1 a4 4e 90 12 a9 69 95 f5 90 0a 3c a6 5d a5 29 49 0c 96 e8
ea 81 01 5d bd 48 52 84 fe 85 1d 41 b1 74 5e 60 a2 d9 39 c5 c5 20 1a ec 36
ad d5 99 79 fc e6 98 7a 91 d1 6d 0d 38 a2
Finding this took about 150,000 hash evaluations (rather less than the
2**511 it's supposed to take). Cheers!
That was always problem with cellular automata complexity, especially
setting initial state to zeros (see previous tread with rule 30). For
rule 30 remedies was to set initial state to expansion of Pi. Before
fiddling with Shahaha and initial state there is one simple idea. The
final hash value may be xored with message. If message is longer than
hash just continue xoring from the begining of the hash value. For
example hash h = 123 (from Shahaha), message m = 12345, then improved
hash ih = (123 xor 123) xor 45. The idea : m1 now needs the specific
hash to match original pairing h xor m = h1 xor m1 ( and if it is
working then that may be even a general medicine for collision
attacks).
Rade
> That was always problem with cellular automata complexity, especially
> setting initial state to zeros (see previous tread with rule 30). For
> rule 30 remedies was to set initial state to expansion of Pi. Before
> fiddling with Shahaha and initial state there is one simple idea. The
> final hash value may be xored with message. If message is longer than
> hash just continue xoring from the begining of the hash value. For
> example hash h = 123 (from Shahaha), message m = 12345, then improved
> hash ih = (123 xor 123) xor 45. The idea : m1 now needs the specific
> hash to match original pairing h xor m = h1 xor m1 ( and if it is
> working then that may be even a general medicine for collision
> attacks).
Before you do that, it might be wise to study my attack vectors to find out
how they work. If the goal of your fiddling is to disrupt these
demonstrated attacks, it might be wise to be fairly certain that what you're
doing will actually do that. Until you understand it, well, you're just
guessing, and to be honest, your guesses aren't particular accurate.
--
poncho
Thanks for suggestions. Is that mean that the patch is working?
Rade
Do you mean whether your guess that xoring in the preimage into the hash
might foil the attack? Well, I suppose that "you're just guessing, and ...
your guesses aren't particular[ly] accurate" could be interpretted as "by
George, you've got it". I'll try to be less ambiguous next time.
On the other hand, no, it doesn't work. To make it more of a challenge, I
tried to find a four-way collision, where the messages collided both with
and without the tweak. Here's what I came up with (with 256 bit tags; each
preimage is a 512 bit value consisting of a 256 bit value repeated twice; if
you do the xor-hack, the two halves will cancel each other out and leave the
hash value unaffected):
Preimage:
94 97 3c b8 65 fd b1 bf c8 f6 50 22 2c 74 15 3f 31 7f d8 01 62 bf 34 dd c9
c5 1a eb d8 19 7e 7b
94 97 3c b8 65 fd b1 bf c8 f6 50 22 2c 74 15 3f 31 7f d8 01 62 bf 34 dd c9
c5 1a eb d8 19 7e 7b
Preimage:
9e a7 42 75 bd ce b9 f2 48 1b 7c ff 44 54 20 82 29 59 47 70 e9 3f 61 9e 06
1d b0 01 38 db 36 43
9e a7 42 75 bd ce b9 f2 48 1b 7c ff 44 54 20 82 29 59 47 70 e9 3f 61 9e 06
1d b0 01 38 db 36 43
Preimage:
d4 91 78 1b fc 99 38 f9 d6 6a 11 b5 28 84 16 a7 32 35 96 74 49 57 6c 43 82
0a b9 22 6a a8 8b ac
d4 91 78 1b fc 99 38 f9 d6 6a 11 b5 28 84 16 a7 32 35 96 74 49 57 6c 43 82
0a b9 22 6a a8 8b ac
Preimage:
7d 76 02 c3 c4 ba ae 46 a2 74 47 64 a4 01 bc 3e 75 98 77 18 66 63 2a 35 6b
99 f1 41 25 b3 57 64
7d 76 02 c3 c4 ba ae 46 a2 74 47 64 a4 01 bc 3e 75 98 77 18 66 63 2a 35 6b
99 f1 41 25 b3 57 64
Common hash value (both with and without the tweak):
16 ac 5f e8 0a fc db 58 86 64 6f f0 2a 6b 95 82 7f 98 04 61 31 5a ae 67 3c
1a aa 0f af eb 05 e2
Now, will you actually take my suggestion, and study your own hash function
to see why I can make collisions so easily???
--
poncho
>Thanks for suggestions. Is that mean that the patch is working?
>
>Rade
Scott has already answered that; your patch is broken. In the past I
proposed my own hash function here, and Scott very kindly tore it
apart and helped me to discover its weaknesses for myself.
You are being offered some very high quality tuition here, for free.
I suggest that you don't waste the opportunity.
Your hash is broken. You need to try to work out how Scott has broken
it. For instance compare the first two preimages he found:
>Preimage1:
>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>39 75 56 cf 1a 22 9e f5 4f 6c 8e 35 6d 22 69 a2 66 32 17 b0 4d eb 53
>
>Preimage 2:
>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>39 75 56 cf 1a 22 9e f5 4f 6c 8e 7a 26 9d 6e 41 11 4d ff b0 4d eb 53
>
>Common Hash Value:
>25 5d 77 b1 2a 47 38 21 e4 2a 97 9f 75 5a 34 97 b7 2d 1f a0 32 96 b5 80 8f
>88 c7 72 5d f4 cd ee
Have a look at the two preimages. Which bytes are the same? Which
bytes are different? Now look at your code and try to work out why
Scott picked those bytes to change while leaving all the others
unchanged. That will get you started on the weaknesses of you
proposal.
rossum
Why, thank you!
>
> You are being offered some very high quality tuition here, for free.
> I suggest that you don't waste the opportunity.
>
> Your hash is broken. You need to try to work out how Scott has broken
> it. For instance compare the first two preimages he found:
>
>>Preimage1:
>>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>>39 75 56 cf 1a 22 9e f5 4f 6c 8e 35 6d 22 69 a2 66 32 17 b0 4d eb 53
>>
>>Preimage 2:
>>1d 87 30 02 9e 91 f9 51 c9 26 2a 5d e8 11 3f b0 19 f9 8a 86 91 30 70 fc 44
>>39 75 56 cf 1a 22 9e f5 4f 6c 8e 7a 26 9d 6e 41 11 4d ff b0 4d eb 53
>>
>>Common Hash Value:
>>25 5d 77 b1 2a 47 38 21 e4 2a 97 9f 75 5a 34 97 b7 2d 1f a0 32 96 b5 80 8f
>>88 c7 72 5d f4 cd ee
>
> Have a look at the two preimages. Which bytes are the same? Which
> bytes are different? Now look at your code and try to work out why
> Scott picked those bytes to change while leaving all the others
> unchanged. That will get you started on the weaknesses of you
> proposal.
Hmmmm, for normal hash functions, that would be a good place to start.
However, Shahaha is not structured like a normal hash function, and so I
suspect that wondering why certain bytes were picked might be misleading.
On my first attempt, I didn't quite fully understand the hash function, and
so I thought certain bytes had to be fixed for the attack to work, when
there is no such requirement. You can hardly be expected to reconstruct my
misunderstanding. If you take a look at my latter efforts, such as my most
recent 4-way collision, the answer to 'which bytes change' is 'all of them'.
Instead, I suspect that a more productive avenue might be to inquire 'what
does the internal hash state look like midway through the hash function?
When you compare the processing of the two preimages, what's the same?'. If
you can answer that (and can also answer *why* it happens), you'll be a lot
closer.
--
poncho
Thanks again. I will check a few things and report of Shahaha status.
Rade
The problem with current design of Shahaha is that some local changes
in initial state not propagate globally (because of linear cost) as it
is not case with rule 30 hash proposal. The fix may be in the form of
processing last state via some sort of Merkle - Damgard construct. But
in the other hand whole Shahaha may be constructed around Merkle -
Damgard.
Rade
You seemed to ignore my reply ... but let me restate. If you don't
make the hash practical it doesn't really matter if it's secure or
not.
Also, fix your memory leak.
Tom
As I mentioned Shahaha may be designed using Merkle - Damgard
construct, and I think the further design effort is probably going
that way.
Rade
No. Shahaha has already been designed. It either follows the Merkle–
Damgård construction or it does not, and unless I'm grossly
misunderstanding the code, it does not. Scott Fluhrer broke Shahaha.
> and I think the further design effort is probably going that way.
The Merkle–Damgård structure still has advocates, but I think "wide
pipe" constructions are the future of hash function design. Either
way, Shahaha is a discredited mistake of the past.
Rade, if you want to get any value from Shahaha, take Scott's advice
and study how to break it. Can you produce similar results? Can you
improve upon them? If your answer is 'yes', can you actually show the
preimages that prove it, as Scott did, or are you merely imagining
that you could?
Less central, but worth noting, can you fix the memory leak in your
code? Tom tried to clue you in, and though I might describe the defect
differently, Tom pointed out real problem. Furthermore, what was the
point of your functions Init(), Update(), and Final() returning a
value of type HashReturn, while your function Hash() calls them but
ignores the return values?
Defining codes for impossible errors is arguably defensible, and so is
ignoring impossible errors, but here we have the indefensible case of
incorrect results returned as success. Init() and Update() can return
NOMEM. Why does Hash() return SUCCESS even when it fails?
The news on Shahaha is not good. The coding is bad, and the cryptology
is worse. The best we can say is that the lessons are there to be
learned.
--
--Bryan
My apologies if my naming convention is not coherent. Shahaha for me
is a cellular automata rule which is used in a hash construction. In
posted attempt the rule did not do the job. As I mentioned the problem
with Shahaha is that part of the message (preferably second part of
the message) can be changed and that changes are not propagating
globally (because carry variable is sole carrier of the message change
and as was shown can be manipulated), consequently not changing the
final hash. The Wolfram rule 30 apparently does not have that problem
but the construct is not efficient (see proposal
http://www.google.com/url?sa=D&q=http://groups.google.com/group/sci.crypt/browse_thread/thread/118b5fdf66144d64).
From my point of view the rule behind Shahaha may have future ie be
efficient and secure. The Merkle - Dagmard construct may be the way
to go and the consequent design will be named appropriately. Also I
think that programming issues raised are not primary concern when
actually a hash concept is discussed.
Rade