Chris,
In the page, you write
> A couple of years ago, while playing around with hash-function-generating-functions (as you do), I came up with a way of generating a (practically limitless) series of hash functions and a recombinator. Take a lump of data and use as many of these hash functions as you like to make hashes of the original data. Tell the recombinator the length (in bits) of the original data and give it any selection of the hashes, whose total length exceeds the size of the original data by at least one bit, and it will split out the original data. This allows the generation of far more hashes than are needed, and provided that a quorum are available, the original data can be retrieved. e.g. if you generated hashes, whose combined length was three times the length of the source data, then the source data could be recreated provided at least a third of the hashes were available.
If I am reading this right, you have a data set D and a recombinator R
which takes a number of hash functions f1, f2, fk so that the total
number of bits coming from the hash functions is larger than the
largest element in D. I am assuming this also means that the number
of bits given by R is such that |D| < 2^{bits given by R}.
To me, this says that R produces more integers than there are elements
in D. Hence, the elements of D can be numbered by R because R has
"integer labels to spare". Now, assuming that the recombinator does
not have collisions (d1 ~= d2 but R(d1)=R(d2)), clearly it is possible
to at least theoretically reconstruct d from R(d) (for some d in D),
even if by brute forcing D through R.
Am I understanding this correctly so far? If so, I have some
questions.
* How do you ensure that R has no collisions?
* How do you choose the hash functions you give to R?
* How does R combine the output of the hash functions? I am assuming
bit concatenation, is that so?
* I assume that R has the requirement of having no collisions. Thus,
at first sight, one could use one (or a couple) crypto hashes. After
all, assuming one uses two 128 bit hashes, one would need a monster
data set D to find a simultaneous collision in both hash functions and
thus in R. I'd expect that, in practice, even one 128 bit crypto hash
would be sufficient. Why the need for a recombinator?
* If the hash functions are required to be longer than the data, why
not use the data itself? Then, reconstruction of d from R(d) would be
trivial because d = R(d).
Do you have a concrete example you can share? Maybe I am thinking
this too abstractly and I am missing something.
Andres.