It is my first time in this news group, and I hope the question will not
sound to stupid.
I would like to know if the existence of a logarithmic compression(e.g.
1M -> O(log(1M)) which run in EXPTIME
is already achieved somewhere.
Best Regards,
Yes, provided that the data you want to compress is in a form that can
be described by the output of a program that inputs a number N and
outputs N symbols of data. Some examples of such data might be
- a sequence of N zero bits
- the first N digits of pi
- N zero bits encrypted with AES-256 in CBC mode with key "foobar23".
Not all data can be represented by a program that depends on N. Text
and images probably do not. If that is the case, then the algorithm
described below may fail:
The compression algorithm is to try all possible programs in
lexicographcial order starting with the smallest and up to size O(log
N), running each one with input N for at most O(2^N) time, and testing
whether it outputs the data you are trying to compress. If it does,
output the program and N. To decompress, run the program with input N.
The compression algorithm runs in exponential time. If you restrict
each test program to run in polynomial time, then compression also runs
in polynomial time and it would still compress the examples I gave.
Even so, it would be too slow to be practical.
This isn't a homework question, is it?
-- Matt Mahoney
Hint: Ask that question *before* you give him the answer :-)
regarding the home work issue, you can all be rest assure that it is not
homework.
regarding Matt Mahoney,
1. "output the program and N" - if N>O(log N) then compression > O(log N)
2. "try all possible programs" - which mean any PROGRAM where
1<|program|<log N e.g. 2^(log N) programs any one of the program can run 2^N
->2^N^(2^logN)>EXPTIME
3. lets P be a program and X input, [[P]]X does not stop or >=EXPTIME and
[[P]]Y is the correct one,but since X<Y we would not find it.
from 2,3 we do not cover all the possible options.
4. you did not prove that exist such a P .
5. P will not be genric algorithm for any N.
did not understood the polynomial time?
what I think I have is a program which can be compiled,and it is a
logarithmic compression program which run in EXPTIME,
and since it is not my subject the question is :
is it already achieved somewhere?
Let me try to state more precisely what I mean. First, there is no
algorithm that will compress all possible inputs by even one bit, much
less logarithmically. However if your data has a certain form which I
gave examples for, but need to define more precisely, then there is an
algorithm to compress it logarithmically and this algorithm runs in
EXPTIME. In fact, with some minor assumptions, the compression
algorithm runs in polynomial time, which is much faster. By polynomial
time I mean it runs in O(n^k) time where n is the size of the input and
k is some finite integer. By EXPTIME I assume you mean O(C^n) for some
constant C > 1.
First let me define logarithmic compression. Let S be an infinite
sequence of strings, such that the i'th element S_i of S has length i,
and is obtained by appending one symbol from a fixed alphabet (say 0
and 1) to the (i-1)'th element. (The 0'th element is the empty
string). Then define logarithmic compressibility as follows: there
exists a compression function f (an injection from strings to strings)
and positive constant C such that the number of S_i for which |f(S_i)|
> C*log(i) is finite. In other words, the compressed output has size O(log n).
The examples I gave earlier are logarithmically compressible, but not
all sequences are. For example, the sequence S where the i'th bit is 1
if the i'th program in some language L halts, and 0 otherwise, is not
logarithmically compressible (or even computable).
So let me address your questions.
1. To output the program and N means to output a binary representation
of N, which has O(log N) size. The program P has fixed size, so the
total size is O(log N).
2. The compression algorithm, f, is to try the first Ca * A^N programs
for some constants Ca > 0, A > 1, and let each one run for Cb * B^N
time for some Cb > 0, B > 1 to test whether it outputs S_N. So the run
time is Ca * A^N * Cb * B^N = Ca * Cb * (A+B)^N which is in EXPTIME.
If such a program P exists, then f will find it for sufficiently large
Ca, A, Cb, B, and N.
3. f is not guaranteed to find the smallest P such that P(i) = S_i,
just find one such P, if one exists.
4. There are some sequences for which P does not exist, in which case
f fails. I gave one example. There are probably others for which P
doesn't exist, but we can't prove it. The counting argument requires
that P not exist for the vast majority of sequences. However P does
exist for many useful sequences, namely those which you can describe as
being generated by a computable process, such as the sequence of prime
numbers.
5. P doesn't need to be generic. It is the output of the compression
program and is specific to S.
f could be implemented on a Turing machine, but is too slow to be
practical. So to answer your question, probably nobody has written a
*useful* compressor this way.
-- Matt Mahoney