Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Compressing "random" data? They are not that random, actually we have a HINT...

50 views
Skip to first unread message

Waseihou Watashi

unread,
Mar 17, 2012, 7:53:25 PM3/17/12
to
Let's describe a problem that I want to solve. I need to compress
blocks of seemengly random data, which are created in a way described
there:

http://en.wikipedia.org/wiki/OFFSystem

Suppose that block size is always 128Kb for all blocks.

A compressible block of data A is xored with a block of data R that is
random perfectly random and saved into block B, which is then also
statistically perfectly random. Blocks R and B can be downloaded so
that anyone who has knowledge that B = R xor A can obtain A = R xor
B. I need to be able to construct an algorithm that will compress
block B so that it can be send through computer network with less
overhead.

Knowledge of compresing agent:

blocks A,R,B
knowledge B = A xor R

output: compressed block B1 and an decompressing algorithm L so that H
+L < A
There is no point in compressing R because it is perfectly random and
cannot be compressed.

Knowledge of decompressing agent:
blocks B1,R
knowledge that B1 is block B which can be decompressed by algorithm L
knowledge that A = B xor R
so it's obvious here how to decompress it...

The question is that if A is nonrandom (for example Word document,
well compressible file), then an effective algorithm L can be found
for this case. The algorithm MUST NOT contain the information that B =
A xor R, A = B xor R, R = A xor B because L is to be made public and
it must be assured that the node does not know this information for
plausible deniability purpose.

I do not need to know why the method works. I need to know:
a) if it's possible
b) functional description of the method so that it can be implemented

Thomas Richter

unread,
Mar 18, 2012, 8:49:43 AM3/18/12
to
> b) functional description of the method so that it can be implemented.

Unfortunately, I do not know enough about "OFF" to really judge, but for
me it looks like you want to find a compression algorithm of sources
A,B,C that, while operating independently, are correlated. Here A,B,C
would be the source nodes of your P2P network, and the receiver would
require all input messages to reconstruct the output. If this type of
setting applies, then you're in the domain of "multi-channel-coding",
and the answer whether compression is possible in this case is "yes",
even though it is a bit counter-intuitive. There is an abstract theorem
called the "Slepian-Wolf Theorem" that says that in this situation, the
sources can be compressed independently such that the overall rate
(channel capacity) required to transmit this message over multiple
channels is, up to a small error, identical to the channel capacity
required to transmit the same message from a single source over a single
channel. IOW, it should be theoretically possible to encode
*independently* (i.e. without knowledge from the other parts of the
message) at each of the sources and still reach a rate that is not much
larger than a rate from compressing the sources combined.

I'm certainly not an expert in multi-channel coding, but it is clearly a
hard problem and only approximate solutions to this problem have been
found. While I can't give you a receipt how to make such a scheme
working in general, I can at least quote a couple of papers where
implications of this theorem are discussed to reach compression even in
case the input data is "seemingly random".

This paper here from the Proceedings of the 2009 DCC:

On Compression of Data Encrypted with Block Ciphers
Demijan Klinc, Carmit Hazay, Ashish Jagmohan‡, Hugo Krawczyk,
and Tal Rabin

discusses an application of Slepian-Wolf to compression of encrypted
data where encryption is based on block-cyphers. It is there shown that
you can compress an, say SSL-encoded stream, without actually having to
know the cypher, i.e. you do not need to decode the message to be able
to encode it. It is probably the closest match to your question.

Here is another application of Slepian-Wolf that might fit into your
application:

A Reliable Chunkless Peer-to-Peer Architecture for Multimedia Streaming
R. Bernardini, R. Rinaldo, and A. Vitali

Again, I cannot provide a ready-made algorithm since I do not know
enough about "OFF" and multi-channel coding in general, but I believe
that you *can* actually compress in your setting, though I believe it is
certainly non-obvious and requires quite some work from your side.

Good luck,
Thomas
0 new messages