What's the probability that 2 files of n bytes have the same hash using SHA1?

PeterMacGonagan

unread,

Dec 13, 2009, 8:10:48 AM12/13/09

to

Hello,

My question is about SHA1 hash function: http://en.wikipedia.org/wiki/SHA_hash_functions.

"SHA-1 produces a 160-bit digest from a message with a maximum length
of (2^64 - 1) bits. ... A brute-force search would require 2^80
operations."
I don't understand how it's possible to compute the required amount of
operations to uncipher the hash generated (2^80)? A new method can
brute force the hash in 2^33 operations.

My question is:

What's the probability that 2 files of n bytes have the same hash
using SHA1?

I'm interrested by the technique you'll use to answer my question ;o)

Thank you in advance.
Best Regards.
Peter MacGonagan

mike clark

unread,

Dec 13, 2009, 11:07:11 AM12/13/09

to

2^80 comes from the Birthday Problem (http://en.wikipedia.org/wiki/
Birthday_problem). I don't know where the 2^33 comes from. Any
references? Furthermore, you are not unciphering the generated hash.
That is impossible to do since hashes have less bits than potential
input files. There are infinitely many files with the same hash; it is
just computationally infeasible to find them.

I believe that the probability that 2 files have the same hash depends
on the requirements, but I could be wrong. Are you given one file and
expected to find another with the same hash? Or, are you just allowed
to create 2 files at random? Or, are you allowed to create any number
of files and only require that two be the same?

Joseph Ashwood

unread,

Dec 13, 2009, 10:11:17 AM12/13/09

to

"PeterMacGonagan" <rodrigu...@gmail.com> wrote in message
news:39c88e97-a7e0-4a0b...@n35g2000yqm.googlegroups.com...

> What's the probability that 2 files of n bytes have the same hash
> using SHA1?

Assuming sufficient n, 1/2^160.

> I'm interrested by the technique you'll use to answer my question ;o)

I used elementary probability, do your own homework.
Joe

unruh

unread,

Dec 13, 2009, 2:46:42 PM12/13/09

to

On 2009-12-13, PeterMacGonagan <rodrigu...@gmail.com> wrote:
> Hello,
>
> My question is about SHA1 hash function: http://en.wikipedia.org/wiki/SHA_hash_functions.
>
> "SHA-1 produces a 160-bit digest from a message with a maximum length
> of (2^64 - 1) bits. ... A brute-force search would require 2^80
> operations."

That is the number of operations required to find two files which have
the same hashi (Birthday paradox). To find another file with the same hash as a given file
requires about 2^160 operations ( where an operation is the hashing of
the file. ) (assuming sha1 is indistiguishable from a random function.)

> I don't understand how it's possible to compute the required amount of
> operations to uncipher the hash generated (2^80)? A new method can
> brute force the hash in 2^33 operations.

No, brute force by definition is "try all inputs until you find one thar
gives the desired output"

>
> My question is:
>
> What's the probability that 2 files of n bytes have the same hash
> using SHA1?

2^(-160)

Kristian Gjøsteen

unread,

Dec 15, 2009, 3:42:08 AM12/15/09

to

On 13.12.09 14.10, PeterMacGonagan wrote:
> My question is about SHA1 hash function: http://en.wikipedia.org/wiki/SHA_hash_functions.
>
> "SHA-1 produces a 160-bit digest from a message with a maximum length
> of (2^64 - 1) bits. ... A brute-force search would require 2^80
> operations."
> I don't understand how it's possible to compute the required amount of
> operations to uncipher the hash generated (2^80)? A new method can
> brute force the hash in 2^33 operations.

You should read the Wikipedia article carefully, as well as related
theory. Pay attention to mentions of the "birthday paradox" and exactly
what is attacked (the full SHA-1 or a reduced version).

> My question is:
>
> What's the probability that 2 files of n bytes have the same hash
> using SHA1?

Unless you specify how the files are chosen, it's impossible to answer,
of course.

-- kg

PeterMacGonagan

unread,

Dec 15, 2009, 5:59:40 AM12/15/09

to

Ok, thank you for all your answers.

I read about the birthday paradox and I think I undestand it. As we
can read, we can approximate the number of attempts to generate a
collision using brute force by:
Q(H)\approx \sqrt{\frac{\pi}{2}H}.

For a 160-bit hash, there are 2^160 possibilities (approximately
1,4615e+48 different outputs). So, it'll take approximately 1.515e+24
attempts to generate a collision (2^80 = 1.2e+24).

"It is easy to see that if the outputs of the function are distributed
unevenly, then a collision can be found even faster. The notion of
'balance' of a hash function quantifies the resistance of the function
to birthday attacks and allows the vulnerability of popular hashes
such as MD and SHA to be estimated (Bellare and Kohno, 2004)."
I don't see how they can evaluate the "alance" (uniformity???) of this
kind of functions?

For my problem. Let say that one file is fixed: it's the reference
with product the hash of reference. The other is choosen randomly
except the 4 first bytes. You can take as exemple a file size of 1Mo,
please.
It's very interresting!

PeterMacGonagan

unread,

Dec 15, 2009, 6:01:22 AM12/15/09

to

> I used elementary probability, do your own homework.

It's not a homework! It's just for me. I'm trying to understand how
these things work!

unruh

unread,

Dec 15, 2009, 11:28:12 AM12/15/09

to

On 2009-12-15, PeterMacGonagan <rodrigu...@gmail.com> wrote:
> Ok, thank you for all your answers.
>
> I read about the birthday paradox and I think I undestand it. As we
> can read, we can approximate the number of attempts to generate a
> collision using brute force by:
> Q(H)\approx \sqrt{\frac{\pi}{2}H}.
>
> For a 160-bit hash, there are 2^160 possibilities (approximately
> 1,4615e+48 different outputs). So, it'll take approximately 1.515e+24
> attempts to generate a collision (2^80 = 1.2e+24).

Well, depends on what you mean by "collision". It it means "find two
files with the same hash" you are right. If it means ( find a second
file with the same hash as this given file) then this is wrong. It would
take 2^159 on average.

Giuliano Bertoletti

unread,

Dec 19, 2009, 7:50:46 AM12/19/09

to

Hello,

when you speak about probability you have to define exacltly how the
assoicated experiment is setup. For an interesting example see:

http://en.wikipedia.org/wiki/Bertrand's_paradox_(probability)

What I mean is that:

if you have a fixed file (so a fixed hash) and you look for another
different file matching that hash, your experiment is equivalent to
drawing balls from a container which holds 2^160 different objects. So
the probability to find what you want from a single draw is 1/(2^160).

On the other hand if you only need to show a collision, you might
require two randomly files have the same hash and that's where the
birthday paradox kicks in. In this case indeed the result is roughly a
collision after an average of 2^80 drawings. Choose how many drawings
you're going to make and you can compute the probability.

Finally if a mathematical exploit is found which is faster then
bruteforce then the computational complexity required to find a
collision may be lower depending on how well such an exploit works and
in general this has little or nothing to do with the framework used above.

Cheers,
Giulio.

PeterMacGonagan ha scritto:

Peter MacGonagan

unread,

Dec 24, 2009, 6:49:13 AM12/24/09

to

Thank you. I understood.

Merry Christmas!