Hi Nil,
See Chapters 4 and 5 of the Li & Vitányi book.
The usual formulation relies on measure theory, but it's perfectly fine to use your probabilistic formulation (in particular your second point). A bit more precisely, but with some simplifications: a semi-measure µ assigns a probability weight µ(x_{1:∞}) to each infinite string x_{1:∞}, and these weights must sum to 1 (or to at most 1, hence "semi-"measure):

\sum_{x_{1:∞}} µ(x_{1:∞}) ≤ 1
Note that some (many) strings may have a zero weight. (In particular, in the case of a deterministic µ, only a single infinite sequence has a unitary weight, all others have a zero weight.)
Then, the probability of a finite prefix string x_{1:t} is just the sum of the weights this measure assigns to all infinite strings that start with this prefix:

µ(x_{1:t}) = \sum_{x_{t+1:∞}} µ(x_{1:t} x_{t+1:∞})
The conditional probability follows by definition:

µ(x_{t+1} | x_{1:t}) = µ(x_{1:t+1}) / µ(x_{1:t})
From a more practical point of view, you can also start with the conditional probability, ensuring it sums to at most 1, and work your way backwards as in your point 2.
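To make the "start from the conditionals" direction concrete, here is a tiny sketch (my own toy example, not from the message): a measure defined by its conditional probabilities (here an i.i.d. coin with a bias I picked arbitrarily), with the prefix probability recovered as the product of conditionals via the chain rule.

```python
def bernoulli_cond(next_bit, prefix, p=0.7):
    # Conditional probability mu(next_bit | prefix).
    # Here it ignores the prefix (i.i.d. coin with bias p);
    # a general measure may depend on the whole prefix.
    return p if next_bit == 1 else 1 - p

def prefix_prob(bits, cond):
    # mu(x_{1:t}) = product over k of mu(x_k | x_{1:k-1}).
    prob = 1.0
    for k in range(len(bits)):
        prob *= cond(bits[k], bits[:k])
    return prob

print(prefix_prob([1, 0, 1], bernoulli_cond))  # ≈ 0.147 = 0.7 * 0.3 * 0.7
```

Note that the conditionals for each prefix sum to at most 1 by construction, which is exactly the condition from your point 2.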
Now define the Solomonoff-Levin mixture M, where p(µ) is the prior weight of µ, and is positive for any (semi-)computable µ (and they must sum to at most 1):
M(x_{1:t}) = \sum_µ p(µ) µ(x_{1:t})
Then we easily obtain that, for any µ and in particular the "source" µ* that generates the sequence x_{1:t}:
M(x_{1:t}) ≥ p(µ*) µ*(x_{1:t})
(assuming the source is (semi-)computable)
i.e., M dominates µ*. Since p(µ*) is a constant independent of x_{1:t}, M is almost as good as the source. We often use the logarithmic loss to compare two probabilistic models:
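The dominance property can be checked numerically on a toy finite mixture (my own illustration: just two Bernoulli environments with equal prior weight, standing in for the full enumeration of semi-computable µ):

```python
def bernoulli(bits, p):
    # mu_p(x_{1:t}) for an i.i.d. coin with bias p.
    prob = 1.0
    for b in bits:
        prob *= p if b == 1 else 1 - p
    return prob

# Toy prior: two environments, weights summing to 1.
priors = {0.5: 0.5, 0.9: 0.5}

def M(bits):
    # M(x_{1:t}) = sum over mu of p(mu) * mu(x_{1:t}).
    return sum(w * bernoulli(bits, p) for p, w in priors.items())

x = [1, 1, 1, 0]
for p, w in priors.items():
    # M dominates every environment in the mixture:
    assert M(x) >= w * bernoulli(x, p)
```

Dominance is immediate here since M is a sum of non-negative terms, one of which is p(µ*) µ*(x_{1:t}); the real M just runs this sum over all (semi-)computable µ.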
-log M(x_{1:t}) ≤ -log µ*(x_{1:t}) - log p(µ*)
or, as a "redundancy" of M with respect to µ*:

-log M(x_{1:t}) + log µ*(x_{1:t}) ≤ -log p(µ*)
In particular if p(µ*)= 2^{-K(µ*)} then the redundancy of M compared to µ* is at most K(µ*). Looks pretty hard to beat that :)
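Continuing the toy two-environment mixture from before (same made-up biases and uniform prior), one can check the redundancy bound numerically: with p(µ*) = 1/2, the log-loss of M never exceeds that of the true source by more than 1 bit, regardless of the sequence.

```python
from math import log2

def bernoulli(bits, p):
    prob = 1.0
    for b in bits:
        prob *= p if b == 1 else 1 - p
    return prob

priors = {0.5: 0.5, 0.9: 0.5}

def M(bits):
    return sum(w * bernoulli(bits, p) for p, w in priors.items())

# Suppose the true source mu* is the coin with bias 0.9:
x = [1, 1, 1, 1, 0, 1, 1, 1]
redundancy = -log2(M(x)) + log2(bernoulli(x, 0.9))
assert redundancy <= -log2(priors[0.9])  # at most -log p(mu*) = 1 bit
```

With the universal prior p(µ*) = 2^{-K(µ*)}, the same computation bounds the redundancy by K(µ*) bits.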
Levin showed that this probabilistic formulation is equivalent (up to a constant dependent on the reference machine) to Solomonoff's formulation with deterministic programs. Hutter showed that there exists a reference machine for which this constant is 0 (IIRC).
Hope this helps,
Laurent