Levin’s coding theorem can be written as follows:
K(x) = -log m(x) + O(1)
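For reference, under one standard set of conventions (U a universal prefix machine, L(p) the length of a program p, and m the corresponding a priori probability; other presentations differ only in the constants), this can be written out as:

```latex
% Levin's coding theorem, with m(x) spelled out as the a priori probability
% of x under a universal prefix machine U (one standard convention).
\begin{align*}
  m(x) &= \sum_{p \,:\, U(p) = x} 2^{-L(p)}, \\
  K(x) &= -\log m(x) + O(1).
\end{align*}
```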
I claim that there exists a similar theorem that would be more useful, because the O(1) term would be bounded by a smaller constant. In addition, the proof would be more natural and less contrived. The relaxation can be done by replacing K(x) with its expectation value E(K(x)), so the new theorem would be:
E(K(x)) = -log m(x) + O(1)
And here is my proof outline:
E(K(x)) ≥ -log m(x) is trivial because 2^(-K(x)) ≤ m(x). Therefore, our focus is on proving the opposite inequality: E(K(x)) ≤ -log m(x) + O(1).
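For completeness, here is the one-line derivation behind the trivial direction (assuming the definition of m above, so that the shortest program for x is one term of the sum); whatever averaging E is meant to denote, it should preserve this pointwise bound:

```latex
% The shortest program for x already contributes 2^{-K(x)} to m(x),
% which gives the pointwise lower bound on K(x).
\begin{align*}
  m(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-L(p)} \;\ge\; 2^{-K(x)}
  \quad\Longrightarrow\quad
  K(x) \;\ge\; -\log m(x).
\end{align*}
```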
Loosely speaking, it says that if a large fraction of long programs output x, we can expect a short program that outputs x to exist as well. The main proof idea is that, due to the sheer number of long programs up to some length, we should statistically expect at least one of them to be highly compressible itself, so that the short program can be obtained directly from it.
In order to show this, we just need to calculate the expected minimum of K(p) over all programs p that output x. To avoid confusion, note that this is a minimum of a minimum, since K(p) is itself already a minimum as well.
The result of this calculation should then equal -log m(x) + O(1) or so.
Deriving this result should not be too hard if we know the probability distribution of K(p) for a given L(p), which we can write as:
P( K(y)=k | L(y)=n ).
However, I don't know of a theorem where such a probability distribution is derived, so some help would be appreciated here. I would expect it to be roughly of the following form:
P( K(y)=k | L(y)=n ) ≅ 2^(k-n-O(1))
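If one grants a distribution of this form and, as a strong simplification, treats the complexities of the roughly m(x)*2^n length-n programs that output x as independent draws from it, then the expected-minimum idea from above does give the intended answer. A back-of-the-envelope sketch (a heuristic, not a proof):

```latex
% Heuristic: with about m(x) 2^n length-n programs outputting x, each with
% K distributed as conjectured and treated as independent, the minimum of K
% over these programs concentrates near -log m(x).
\begin{align*}
  P\bigl(K(p) \le k \mid L(p) = n\bigr)
    &\approx \sum_{j \le k} 2^{\,j - n - O(1)} \approx 2^{\,k - n - O(1)}, \\
  P\Bigl(\min_{p \,:\, U(p) = x} K(p) > k\Bigr)
    &\approx \bigl(1 - 2^{\,k - n - O(1)}\bigr)^{m(x)\, 2^{\,n}}
     \approx \exp\!\bigl(-m(x)\, 2^{\,k - O(1)}\bigr),
\end{align*}
% which switches from about 1 to about 0 around k = -log m(x) + O(1),
% so the expected minimum is -log m(x) + O(1).
```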
Sidenote: As an improvement of the proof, one could take into account that the programs p are not allowed to be prefixes of each other. Given the knowledge of this restriction, each program p actually carries about log(L(p)) bits less algorithmic information than K(p).
Can you please clarify what you mean by the expected value? Over what probability measure? And if you take the mean value over x, how can it still be a function of x? The expected value means p(x_1)*K(x_1) + p(x_2)*K(x_2) + …
P( K(y)=k | L(y)=n ) ≅ 2^(k-n-O(1))
And I was hoping that someone would already know such a theorem.
I should also mention that this formula would only be valid for k < n + log(n) + O(1), and the probability distribution would be zero for larger k, since generally K(y) < L(y) + log(L(y)) + O(1). So the distribution increases exponentially with k until it suddenly drops to zero.
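As a quick sanity check of this heuristic picture, here is a toy simulation. Everything in it is an assumption of the toy model rather than a claim about the real quantities: the program count m(x)*2^n, the cutoff of the distribution at k = n instead of n + log(n), and the independence of the draws.

```python
import random

# Toy Monte-Carlo sketch of the heuristic: draw K-values for the length-n
# programs that output x from the conjectured distribution P(K=k | L=n) ∝ 2^(k-n)
# (cut off at k = n for simplicity), treat them as independent, and check that
# the minimum lands near -log m(x).  Purely illustrative; not a proof.

def min_K_over_programs(n, log_m, rng=random):
    """Minimum K over the ~m(x)*2^n length-n programs assumed to output x."""
    num_programs = int(2 ** (n + log_m))      # heuristic count of such programs
    ks = range(1, n + 1)
    weights = [2.0 ** (k - n) for k in ks]    # conjectured P(K=k | L=n), unnormalized
    return min(rng.choices(ks, weights=weights, k=num_programs))

if __name__ == "__main__":
    n, log_m = 24, -8                         # toy values: program length 24, log m(x) = -8
    minima = [min_K_over_programs(n, log_m) for _ in range(10)]
    print("-log m(x) =", -log_m)
    print("observed minima of K over the long programs:", minima)
```

With these toy numbers the minima should come out within a few bits of -log m(x) = 8, illustrating the "minimum of a minimum" idea numerically.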
I just read the proof of Levin's coding theorem in the book 'An Introduction to Kolmogorov Complexity and Its Applications', Section 4.3.4, Theorem 4.3.3. The proof takes almost two pages, and at the end it mentions an additive constant of 4 bits: K_T(x) ≤ -log m(x) + 4. Somewhat disappointingly, however, the Kolmogorov complexity K_T is based on a rather contrived reference machine T, which is not going to be the simplest machine, so the final additive constant will be much larger than 4. In practice, the usefulness of the theorem depends on how small this constant is. At least the proof shows that the constant will not be astronomical. But because the probability grows exponentially with the number of bits, I would still consider it useful for the constant to be quite small.
Has anyone estimated how large this constant actually is?