Recombinant Hashes

Chris Dew

unread,

Aug 17, 2009, 3:36:59 AM8/17/09

to Hash Functions

Hi, I did a little bit of playing around with hash functions - I'm
very interested to see if I've discovered/invented something new, or
whether there is another name for a function which can accept lots of
hashes (of the same original data) and reconstitute the data.

http://finalcog.com/recombinant-hashes

If anyone is interested enough to want a demo, it will just take me a
couple of evenings to put a web interface on the front of it.

Thanks for any interest,

Chris.

Andres

unread,

Aug 17, 2009, 3:49:47 AM8/17/09

to Hash Functions

Chris,

In the page, you write

> A couple of years ago, while playing around with hash-function-generating-functions (as you do), I came up with a way of generating a (practically limitless) series of hash functions and a recombinator. Take a lump of data and use as many of these hash functions as you like to make hashes of the original data. Tell the recombinator the length (in bits) of the original data and give it any selection of the hashes, whose total length exceeds the size of the original data by at least one bit, and it will split out the original data. This allows the generation of far more hashes than are needed, and provided that a quorum are available, the original data can be retrieved. e.g. if you generated hashes, whose combined length was three times the length of the source data, then the source data could be recreated provided at least a third of the hashes were available.

If I am reading this right, you have a data set D and a recombinator R
which takes a number of hash functions f1, f2, fk so that the total
number of bits coming from the hash functions is larger than the
largest element in D. I am assuming this also means that the number
of bits given by R is such that |D| < 2^{bits given by R}.

To me, this says that R produces more integers than there are elements
in D. Hence, the elements of D can be numbered by R because R has
"integer labels to spare". Now, assuming that the recombinator does
not have collisions (d1 ~= d2 but R(d1)=R(d2)), clearly it is possible
to at least theoretically reconstruct d from R(d) (for some d in D),
even if by brute forcing D through R.

Am I understanding this correctly so far? If so, I have some
questions.

* How do you ensure that R has no collisions?

* How do you choose the hash functions you give to R?

* How does R combine the output of the hash functions? I am assuming
bit concatenation, is that so?

* I assume that R has the requirement of having no collisions. Thus,
at first sight, one could use one (or a couple) crypto hashes. After
all, assuming one uses two 128 bit hashes, one would need a monster
data set D to find a simultaneous collision in both hash functions and
thus in R. I'd expect that, in practice, even one 128 bit crypto hash
would be sufficient. Why the need for a recombinator?

* If the hash functions are required to be longer than the data, why
not use the data itself? Then, reconstruction of d from R(d) would be
trivial because d = R(d).

Do you have a concrete example you can share? Maybe I am thinking
this too abstractly and I am missing something.

Andres.

Chris Dew

unread,

Aug 17, 2009, 4:24:29 AM8/17/09

to Hash Functions

The hash functions aren't given to the recombinator, only the
hashes.

A maximum size of original data M
Orgininal data D
A set of hash functions f0...fn
Hashes of D, H0...Hn
Recombinator R

f0...fn and R are generated by one procedure with a parameter of M.
f0...fn are then applied to D to get H0...Hn
Provided a large enough subset of H0...Hn are provided to R, then it
can recreate D. R needs to know which hash function generated each of
H0...Hn, but only by sequence number.

> Am I understanding this correctly so far? If so, I have some
> questions.
>
> * How do you ensure that R has no collisions?

R has no collisions because there is a maximum size to the original
data.

>
> * How do you choose the hash functions you give to R?

The hash functions are generated by a hash function generator and are
uniquely identifiable with a simple sequence number. i.e. the fifth
hash function when max_data_size == 4096 bits.

>
> * How does R combine the output of the hash functions? I am assuming
> bit concatenation, is that so?

It needs to know which of the hash functions generated each hash that
the recombinator is given.

>
> * I assume that R has the requirement of having no collisions. Thus,
> at first sight, one could use one (or a couple) crypto hashes. After
> all, assuming one uses two 128 bit hashes, one would need a monster
> data set D to find a simultaneous collision in both hash functions and
> thus in R. I'd expect that, in practice, even one 128 bit crypto hash
> would be sufficient. Why the need for a recombinator?

Thinking about this one...

>
> * If the hash functions are required to be longer than the data, why
> not use the data itself? Then, reconstruction of d from R(d) would be
> trivial because d = R(d).

The hashes (in total) are required to contain one bit more data than
the original data. The advantage is that the hashes can be
distributed all over the internet, but only their locations need to be
known the get back the original data. Add to that the fact that you
can generate far more hashes than are required and so long as a quorum
are available you can get the original data back.

>
> Do you have a concrete example you can share? Maybe I am thinking
> this too abstractly and I am missing something.

Yes, I'll spend a couple of evenings creating a web demo and post it
here next weekend.

>
> Andres.

Andres

unread,

Aug 17, 2009, 4:32:09 AM8/17/09

to Hash Functions

Chris,

> Provided a large enough subset of H0...Hn are provided to R, then it
> can recreate D. R needs to know which hash function generated each of
> H0...Hn, but only by sequence number.

I get the impression that the hash functions must be of a known shape
so that their operation can be reversed.

> > Am I understanding this correctly so far? If so, I have some
> > questions.
>
> > * How do you ensure that R has no collisions?
>
> R has no collisions because there is a maximum size to the original
> data.

This also means that the hash functions won't have simultaneous
collisions over D (in general, I'd expect this to be unlikely... is it
a requirement though?).

> > * How do you choose the hash functions you give to R?
>
> The hash functions are generated by a hash function generator and are
> uniquely identifiable with a simple sequence number. i.e. the fifth
> hash function when max_data_size == 4096 bits.

Can you share the hash function generator?

> Yes, I'll spend a couple of evenings creating a web demo and post it
> here next weekend.

Cool! Let me know when it's available :).

Andres.

Chris Dew

unread,

Aug 17, 2009, 5:21:28 AM8/17/09

to Hash Functions

On 17 Aug, 09:32, Andres <andres.vall...@gmail.com> wrote:
> Chris,
>
> > Provided a large enough subset of H0...Hn are provided to R, then it
> > can recreate D. R needs to know which hash function generated each of
> > H0...Hn, but only by sequence number.
>
> I get the impression that the hash functions must be of a known shape
> so that their operation can be reversed.

Yes, they are. There is a 'meta' hash function, which when it is
partially applied to a set of values, become the set of the hash
functions.

>
> > > Am I understanding this correctly so far? If so, I have some
> > > questions.
>
> > > * How do you ensure that R has no collisions?
>
> > R has no collisions because there is a maximum size to the original
> > data.
>
> This also means that the hash functions won't have simultaneous
> collisions over D (in general, I'd expect this to be unlikely... is it
> a requirement though?).

Yes, as far as I can work out, it is a requirement and it is satisfied
by the set of generated hash functions. I may be wrong - I'm a
physicist by training and my discrete mathematics is all wikipedia-
taught.

>
> > > * How do you choose the hash functions you give to R?
>
> > The hash functions are generated by a hash function generator and are
> > uniquely identifiable with a simple sequence number. i.e. the fifth
> > hash function when max_data_size == 4096 bits.
>
> Can you share the hash function generator?

I'm considering writing a file backup/distribution/sharing system
using this code and releasing it under GPL. In the unlikely event of
getting a call from google, or another company, they may have other
plans.

>
> > Yes, I'll spend a couple of evenings creating a web demo and post it
> > here next weekend.
>
> Cool! Let me know when it's available :).

Will do.
>
> Andres.

Chris Dew

unread,

Aug 23, 2009, 7:50:37 AM8/23/09

to Hash Functions

A demo is now up and working (so long as a long-running process
doesn't die).

http://www.finalcog.com/recombinant-hashes-demo

Regards,

Chris.

On 17 Aug, 09:32, Andres <andres.vall...@gmail.com> wrote:

Reply all

Reply to author

Forward