Redundancy Required to Overcome Data Loss

TC

unread,

Dec 24, 2005, 6:01:26 PM12/24/05

to

I suspect this is a classic problem in information theory, but it is
new to me.

I have a message of N bits. I want to represent this message in V
separate volumes of Y bits each. The message must be coded such that
the entire message can be reconstructed from any subset of X volumes.
What is the mathematical relationship between N, V, X, and Y?

>From simple thought experiments, I can demonstrate that
Y=N/X when X=V;
Y=N/X when X=V-1; and
Y=N/X when X=1.
>From this, I'm tempted to conclude that Y=N/X, but I can't prove it; in
fact, it seems to be false for specific cases (e.g. when N=2, V=4, and
X=2). What is the true relationship between the variables? Can anyone
point me to the answer?

-TC

Paul Rubin

unread,

Dec 24, 2005, 6:08:14 PM12/24/05

to

"TC" <golem...@yahoo.com> writes:
> >From this, I'm tempted to conclude that Y=N/X, but I can't prove it; in
> fact, it seems to be false for specific cases (e.g. when N=2, V=4, and
> X=2). What is the true relationship between the variables? Can anyone
> point me to the answer?

VY >= N.

TC

unread,

Dec 24, 2005, 8:16:10 PM12/24/05

to

Paul,

Thank you for the reply. Yes, I can see that

VY >= XY >= N >= Y

Unfortunately, that's a broad range; I had hoped for an answer which
would give me a more precice value for Y. From the terseness of your
answer, should I infer that you know of no narrower solution?

-TC

Paul Rubin

unread,

Dec 24, 2005, 8:45:53 PM12/24/05

to

"TC" <golem...@yahoo.com> writes:
> VY >= XY >= N >= Y
>
> Unfortunately, that's a broad range; I had hoped for an answer which
> would give me a more precice value for Y. From the terseness of your
> answer, should I infer that you know of no narrower solution?

Take the simple case where (V-1)Y=N. That is, simply chop up your
data into V-1 equal sized slices. Let the V'th share be the xor of
the first V-1 shares. Now if you lose any one share, you can
reconstruct it by xor'ing together the remaining ones. This is
roughly how RAID-5 storage systems usually work.

If you want to get fancier, like you want ten slices where any seven
can reconstruct the data, there are schemes for that too, but they're
more complicated than just xor'ing. Basically you use a type of error
correcting code called a deletion code. Example: say you have V data
points D1,D2,...,DV. Think of the ordered pairs (1,D1), (2,D2),
... (V,DV) as points on a curve. They completely determine the degree
V-1 Lagrange interpolation polynomial passing through those points.
Now you can compute a bunch of additional points on the curve, say at
V+1, V+2, etc. Now you can take any V of the resulting collection
of points and reconstruct the polynomial and evaluate it at 1,...,V
to get back the original points.

If the data points are k bits wide and you do this in the field
GF(2**k), I'm told the resulting scheme turns out to be the same as a
Reed-Solomon code, but I haven't worked that out myself.

TC

unread,

Dec 25, 2005, 12:15:21 AM12/25/05

to

Thank you for the explanation. I found it extremely useful. Following
your lead, I did some research on Reed-Solomon codes, and I believe I
now have answers to my questions.

-TC

Message has been deleted