I have a message of N bits. I want to represent this message in V
separate volumes of Y bits each. The message must be coded such that
the entire message can be reconstructed from any subset of X volumes.
What is the mathematical relationship between N, V, X, and Y?
>From simple thought experiments, I can demonstrate that
Y=N/X when X=V;
Y=N/X when X=V-1; and
Y=N/X when X=1.
>From this, I'm tempted to conclude that Y=N/X, but I can't prove it; in
fact, it seems to be false for specific cases (e.g. when N=2, V=4, and
X=2). What is the true relationship between the variables? Can anyone
point me to the answer?
-TC
VY >= N.
Thank you for the reply. Yes, I can see that
VY >= XY >= N >= Y
Unfortunately, that's a broad range; I had hoped for an answer which
would give me a more precice value for Y. From the terseness of your
answer, should I infer that you know of no narrower solution?
-TC
Take the simple case where (V-1)Y=N. That is, simply chop up your
data into V-1 equal sized slices. Let the V'th share be the xor of
the first V-1 shares. Now if you lose any one share, you can
reconstruct it by xor'ing together the remaining ones. This is
roughly how RAID-5 storage systems usually work.
If you want to get fancier, like you want ten slices where any seven
can reconstruct the data, there are schemes for that too, but they're
more complicated than just xor'ing. Basically you use a type of error
correcting code called a deletion code. Example: say you have V data
points D1,D2,...,DV. Think of the ordered pairs (1,D1), (2,D2),
... (V,DV) as points on a curve. They completely determine the degree
V-1 Lagrange interpolation polynomial passing through those points.
Now you can compute a bunch of additional points on the curve, say at
V+1, V+2, etc. Now you can take any V of the resulting collection
of points and reconstruct the polynomial and evaluate it at 1,...,V
to get back the original points.
If the data points are k bits wide and you do this in the field
GF(2**k), I'm told the resulting scheme turns out to be the same as a
Reed-Solomon code, but I haven't worked that out myself.
-TC