Berry's Paradox and universal distribution


Wei Dai

May 28, 2009, 7:56:54 AM
to one-...@googlegroups.com
This is a reply to a part of Abram's "join post" on the everything-list
(http://dragonlogic-ai.blogspot.com/2008/11/interesting-mailing-list-i-recently.html).
I'm the owner of that mailing list but haven't been following it closely in
recent times, so I only just noticed Abram's post. I'm replying here since
the topic might be more suitable for this list.

Abram Demski wrote:
> As I said, I'm also interested in the notion of probability. I
> disagree with Solomonoff's universal distribution
> (http://en.wikipedia.org/wiki/Ray_Solomonoff), because it assumes that
> the universe is computable. I cannot say whether the universe we
> actually live in is computable or not; however, I argue that,
> regardless, an uncomputable universe is at least conceivable, even if
> it has a low credibility. So, a universal probability distribution
> should include that possibility.

This is my position as well: there is no reason to assume that the universe
must be computable, as Solomonoff's universal distribution does. For a while
I tried to learn enough logic to see what a truly universal distribution
would look like. But then I found an argument, related to Berry's Paradox,
that seems to show that any attempt at a truly universal distribution must
fail. (From Wikipedia: The Berry paradox is a self-referential paradox
arising from the expression "the smallest possible integer not definable by
a given number of words.")

The argument is very simple. Consider an arbitrary probability distribution
P, and the smallest integer (or the lexicographically least object) x such
that P(x) < 1/3^^^3 (in Knuth's up-arrow notation). Since x has a short
description, a universal distribution shouldn't assign it such a low
probability, but P does, so P can't be a universal distribution.

(I've stated variants of this argument before at [1] and [2], but this
version seems to be more general.)
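To make the self-reference explicit (this is just the same argument written
out, using the Occam-style idea that short descriptions should get
non-negligible probability):

    Let x* = the lexicographically least object such that P(x*) < 1/3^^^3.

The defining phrase above is itself a description of x*, only a constant
longer than a description of P, so any prior that gives short descriptions
non-negligible weight must assign x* far more than 1/3^^^3 -- but P assigns
it less, by construction.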

Is there a way around this? If not, how should we (or an AI) deal with
uncertainty without being able to use a universal distribution as the prior?

[1] "is induction unformalizable?"
http://groups.google.com/group/everything-list/browse_thread/thread/c7442c13ff1396ec
[2] http://lesswrong.com/lw/f9/a_request_for_open_problems/bws

Eliezer Yudkowsky

May 28, 2009, 11:21:46 PM
to One Logic
Solomonoff's universal distribution isn't trying to assume the
universe is computable. It's trying to beat any computable
probability distribution you could put over the universe. In other
words, if your attempt to *calculate*, to *talk about* and *reason
about* the universe is computable - even if you are talking *about*
systems that you think are uncomputable - then your ratiocinations
appear somewhere in Solomonoff induction.

On May 28, 4:56 am, "Wei Dai" <wei...@weidai.com> wrote:
> This is a reply to a part of Abram's "join post" on the everything-list
> (http://dragonlogic-ai.blogspot.com/2008/11/interesting-mailing-list-i...).
> [1] "is induction unformalizable?"http://groups.google.com/group/everything-list/browse_thread/thread/c...
> [2]http://lesswrong.com/lw/f9/a_request_for_open_problems/bws

Abram Demski

May 29, 2009, 3:31:29 PM
to one-...@googlegroups.com
Eliezer,

Good point. Although, since there are other uncomputable distributions
that dominate the Solomonoff distribution, that description does not
uniquely recommend the Solomonoff distribution. One alternative would
be Jürgen Schmidhuber's universal distribution over what he called
"descriptions". But, that distribution is not even computable in the
limit, unlike the Solomonoff distribution. This could be used to argue
that the Solomonoff distribution is the "best we can do" in some
sense... but, even though we cannot approximate Jürgen's distribution
to arbitrary precision given enough processing power, I suppose we
could *partially* approximate it, using a theorem prover to prove
convergence for as many descriptions as possible... and, a similar
argument holds for even more difficult distributions.

The question is, is there a significant sense in which this is
different from approximating the Solomonoff distribution?

(For those on this list not familiar with these issues, I recommend
starting by looking up "algorithmic information theory".)

--Abram
--
Abram Demski
http://dragonlogic-ai.blogspot.com/

Wei Dai

May 29, 2009, 3:34:55 PM
to One Logic
Eliezer Yudkowsky wrote:
> Solomonoff's universal distribution isn't trying to assume the
> universe is computable. It's trying to beat any computable
> probability distribution you could put over the universe. In other
> words, if your attempt to *calculate*, to *talk about* and *reason
> about* the universe is computable - even if you are talking *about*
> systems that you think are uncomputable - then your ratiocinations
> appear somewhere in Solomonoff induction.

That's an interesting perspective. I agree that a human being, assuming he
is computable, couldn't beat Solomonoff's universal distribution in a
prediction game. He might come to believe that the universe has uncomputable
physics, but that doesn't seem to help him gain an advantage over
Solomonoff.

But what if the human could augment himself with uncomputable physics, and
start thinking uncomputable thoughts? He could then have an uncomputable
prior, which could possibly beat Solomonoff's universal distribution. Can a
Bayesian agent do the same thing? My initial thought is "no", since a
Bayesian isn't supposed to change his prior. But a Bayesian agent can
perhaps build external calculating devices (e.g. hypercomputers) to help
himself, and maybe the combined system of agent+hypercomputer can be modeled
as an extended agent with an uncomputable prior. I'll have to think about
this some more...

Abram Demski

May 29, 2009, 3:40:59 PM
to one-...@googlegroups.com
Wei Dai,

Glad to hear from you. I am struck anew by the simplicity of this
paradox every time I look at it.

For those on this list not familiar with the background material for
this paradox, it essentially relies on a formalization of Occam's
Razor. The basic assumption is that, in the absence of evidence, the only
deciding factor between two theories is the length of their descriptions when
written down--
probability decreases as length increases. So, anything with a short
description length is fairly probable.
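One common way to make this precise (just a sketch; Solomonoff's actual
definition differs in technical details) is to weight each object x by its
shortest description:

    P(x) proportional to 2^(-K(x)),

where K(x) is the length in bits of the shortest program that prints x on
some fixed universal machine. Solomonoff's distribution is roughly this, with
programs for a universal Turing machine playing the role of descriptions.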

I've attempted to reply to this paradox before, but I only sort of
waved my hands and pointed to assumptions that might be questionable.

Right now, my first thought is that the lexicographically least object x
such that P(x) < 1/3^^^3 is ill-defined. (For those not familiar
with Knuth's up-arrow notation, I recommend looking it up, just
because it is fun. Although, really, all you need to know is that
1/3^^^3 is a very small probability with a fairly short description.)

So, what plausible arguments make such an entity ill-defined?

Clearly it will not have a place on the Tarski hierarchy, since it
refers to all levels.

In a loopy theory, I'd think it would turn out to be one of those
things about which we must remain silent. (That is, a loopy theory
like Kripke's theory of truth, but applied to probability as well...)

Of course, these strategies are (imo) not "referentially complete",
which is the whole point...

Overall, my feeling is that any treatment of the more standard
paradoxes tends to carry over to a treatment of this paradox.

--Abram


Wei Dai

May 29, 2009, 9:20:42 PM
to one-...@googlegroups.com
Abram Demski wrote:
> I've attempted to reply to this paradox before, but I only sort of
> waved my hands and pointed to assumptions that might be questionable.

Did you post your previous reply somewhere?

> Right now, my first thought is that the lexicographically least object x
> such that P(x) < 1/3^^^3 is ill-defined.

But that object is well-defined for any particular P that's well-defined,
isn't it?

> Clearly it will not have a place on the Tarski hierarchy, since it
> refers to all levels.

I might be out of my depth here, but it seems to me that the
lexicographically least object x such that P(x) < 1/3 ^^^ 3 belongs to at
most one level above P on the Tarski hierarchy. Does the trouble start if we
try to quantify over all well-defined P and conclude that a well-defined
universal distribution doesn't exist? If so, we're still left with an easy
way to shoot down any particular well-defined candidate someone might
propose for a universal distribution.

BTW, I've found some academic research into whether it's ok to quantify over
"indefinitely extensible totalities". See section 11 of
http://www.st-andrews.ac.uk/~arche/old/pages/papers/wright-shapiro%201.pdf
for example. I'm not sure if any of that helps here though.

Abram Demski

May 30, 2009, 7:34:10 PM
to one-...@googlegroups.com
Wei Dai,

> Did you post your previous reply somewhere?

I sent it to you, if I recall correctly.

All I did was question the assumption that description length should
be the only factor in determining probability. I still find this
objection plausible, but perhaps not the best objection. After all, I
do not take your paradox as *proof* that description length should not
be the only factor.

Does the paradox rely critically on an assumption that description
length should be the only factor? Or can you derive the paradox just
from the assumption that the prior should dominate other priors?

> Does the trouble start if we
> try to quantify over all well-defined P and conclude that a well-defined
> universal distribution doesn't exist?

That was my intention, yes.

> universal distribution doesn't exist? If so, we're still left with an easy
> way to shoot down any particular well-defined candidate someone might
> propose for a universal distribution.

Yes, sounds right. In such an approach the prior would fail to be
well-defined. The semantics of the logic would fail to be well-defined
according to itself, too, so that is not surprising.

So, what I am saying is that I do not agree with *current* solutions
to these problems, but for those solutions, it looks like everything
carries over just fine to your paradox. So it seems as if any *actual*
solution will have the same property.

On the other hand, a solution might still fall prey to your paradox,
but give a good story as to why the situation is acceptable. This
seems fairly plausible, too.

--Abram

Wei Dai

May 30, 2009, 10:46:06 PM
to one-...@googlegroups.com
> I sent it to you, if I recall correctly.

Yes, I see that we had an email conversation on this topic last June. I
apologize for my poor memory.

> Does the paradox rely critically on an assumption that description
> length should be the only factor? Or can you derive the paradox just
> from the assumption that the prior should dominate other priors?

Ok, here's a version based on dominance: for any probability distribution P
definable at level L of the Tarski hierarchy, there exists a P' at level at
most L+1 which it doesn't dominate. For each n, let x_n be the
lexicographically least object such that P(x_n) < 1/3^^^n, and set P'(x_n) =
1/3^n (taking the smallest such n if the same object is least for several);
for every other x, let P'(x) = c*P(x), with c chosen so that P' sums to 1.
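To check the non-dominance step explicitly: dominance would require a single
constant c0 > 0 with P(x) >= c0*P'(x) for every x, but

    P(x_n)/P'(x_n) < (1/3^^^n)/(1/3^n) = 3^n/3^^^n -> 0 as n grows,

so no such c0 exists.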

> So, what I am saying is that I do not agree with *current* solutions
> to these problems, but for those solutions, it looks like everything
> carries over just fine to your paradox. So it seems as if any *actual*
> solution will have the same property.

Yes, I'm hoping that's the case too, which is why I'm here. :)

> On the other hand, a solution might still fall prey to your paradox,
> but give a good story as to why the situation is acceptable. This
> seems fairly plausible, too.

I'm still thinking through implications of Eliezer's response, which seems
to say that the situation is acceptable. Does his response carry over to the
other paradoxes?

Abram Demski

Jun 2, 2009, 9:58:40 AM
to one-...@googlegroups.com
Wei,

Hmmm. With respect to inductive priors, I understand Eliezer's
response as saying that what *I'm* doing, in asking for a better
prior, is (at best) only making *more specific* requirements on a
prior-- since it will be computably approximated, there is no way it
can strictly dominate the solomonoff prior. In other words: any
description-length-based prior over some computable deduction system
*is* a solomonoff prior, if the deduction system is turing-complete;
but I'm hoping it may also have some additional desirable properties.

(Such properties include explicit recognition of a halting oracle in
the environment, if there is sufficient evidence for it.)

How would Eliezer's response carry over to other paradoxes?
"First-order logic does not necessarily assume that the domain of
discourse is computable, but rather that the logic must be run on a
computer." Something like that?

--Abram

Wei Dai

Jun 10, 2009, 2:05:42 AM
to one-...@googlegroups.com
> (Such properties include explicit recognition of a halting oracle in
> the environment, if there is sufficient evidence for it.)

I have found some discussion about how human beings might be able to
recognize a halting oracle:

http://www.cs.nyu.edu/pipermail/fom/2004-March/008010.html

> "First-order logic does not necessarily assume that the domain of
> discourse is computable, but rather that the logic must be run on a
> computer." Something like that?

But what's the counterpart of Solomonoff's Universal Distribution in this
analogy? How about this instead: "Consider a language L that contains all
truths that can be proved by computable minds. We know that there are truths
that L does not contain, but we can never prove any of them so there is no
point trying to extend L any further (assuming we are computable)." Can we
define this L formally?

Abram Demski

Jun 10, 2009, 2:58:15 PM
to one-...@googlegroups.com
Wei Dai,

This proposal seems wrong to me. A language that contains all truths
that can be proved by computable logics will simply contain all truths
that can be stated finitely! For each truth, there is a computable
logic that just spits out that one statement. So, your language L would
just contain all true finite sentences. Practically, what this means
is, we can always usefully extend a computable logic by adding more
true axioms. (The trouble, of course, is knowing which axioms are
true.)

For me, this is an indication that finding the widest set of knowable
mathematical truths isn't the real problem (though it is still an
interesting problem). The more basic question is what the broadest set
of meaningful mathematical *statements* is. IE, what is the universal
language? Set theory? Kripke's theory of truth? Etc.

--Abram

Wei Dai

Jun 10, 2009, 11:34:15 PM
to One Logic
Eliezer Yudkowsky wrote:
> Solomonoff's universal distribution isn't trying to assume the
> universe is computable. It's trying to beat any computable
> probability distribution you could put over the universe. In other
> words, if your attempt to *calculate*, to *talk about* and *reason
> about* the universe is computable - even if you are talking *about*
> systems that you think are uncomputable - then your ratiocinations
> appear somewhere in Solomonoff induction.

Damn my poor memory again. I just remembered that I already came up with an
example where a human being can beat Solomonoff's universal distribution.
It's at http://www.overcomingbias.com/2008/11/complexity-and/. But Eliezer
wasn't convinced that the game was fair.

Let me try a different example here. Consider a hyperintelligent being (i.e.
someone who has access to a hypercomputer) who wants to find a
(hypercomputable) input sequence and a computable algorithm that will
humiliate Solomonoff's universal distribution: the computable algorithm will
predict the sequence better than the UD, by an infinite margin. The game is,
in each round each agent will output a single prediction for the next
symbol. UD loses the game if the computable algorithm's number of correct
predictions minus UD's number of correct predictions goes to positive
infinity. Does this seem fair?

To make things simple, the HI chooses the computable algorithm to be "always
predict 0". And it chooses the input sequence by simulating the UD-based
predictor: if the predictor predicts 0 as the next symbol, then it chooses
1, otherwise it chooses 0. Now consider what happens. The computable
algorithm gains a point whenever the UD predicts a 1. And that will occur an
infinite number of times, because every time it predicts a 0 it will see a
1, and it will eventually predict a 1 after seeing enough 1s. (I can try to
prove this if it's not intuitively obvious to you.)
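Here is a toy sketch of the construction in Python, in case it helps. The
real UD-based predictor is uncomputable, so I substitute a trivial computable
predictor purely to illustrate the diagonalization:

    # Toy illustration: 'predictor' stands in for the UD-based predictor,
    # which is uncomputable; any function from a history list to a 0/1 guess
    # will do.
    def adversarial_sequence(predictor, n_rounds):
        # The HI's input: each symbol is the opposite of the prediction.
        history = []
        for _ in range(n_rounds):
            guess = predictor(history)
            history.append(1 if guess == 0 else 0)
        return history

    def correct_predictions(predictor, sequence):
        return sum(1 for i, s in enumerate(sequence)
                   if predictor(sequence[:i]) == s)

    # Computable stand-in: guess the majority symbol seen so far (0 on ties).
    def majority_predictor(history):
        return 1 if 2 * sum(history) > len(history) else 0

    seq = adversarial_sequence(majority_predictor, 50)
    print(correct_predictions(majority_predictor, seq))  # 0: wrong every round
    print(correct_predictions(lambda h: 0, seq))         # correct whenever the
                                                         # stand-in guessed 1

Any deterministic predictor fed its own diagonal sequence is wrong in every
round, while "always predict 0" is right exactly in the rounds where that
predictor guessed 1.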

So, Solomonoff's universal distribution is guaranteed to beat (technically,
not lose by more than a finite margin) any computable distribution, *if the
input distribution is computable*, otherwise it can be beaten by a
computable distribution. Maybe this explains why we failed to "carry over"
Eliezer's response to the other paradoxes.

Eliezer Yudkowsky

Jun 11, 2009, 12:51:49 AM
to One Logic
On Jun 10, 8:34 pm, "Wei Dai" <wei...@weidai.com> wrote:
>
> Let me try a different example here. Consider a hyperintelligent being (i.e.
> someone who has access to a hypercomputer) who wants to find a
> (hypercomputable) input sequence and a computable algorithm that will
> humiliate Solomonoff's universal distribution: the computable algorithm will
> predict the sequence better than the UD, by an infinite margin. The game is,
> in each round each agent will output a single prediction for the next
> symbol. UD loses the game if the computable algorithm's number of correct
> predictions minus UD's number of correct predictions goes to positive
> infinity. Does this seem fair?

Actually, no. I believe the proofs are over probabilities, and the
most elegant proofs are over accumulated log probabilities (Bayes-
score). So what we want to do is establish that in the limit, and
regardless of the environment, whether the environment is computable
or not, there is no computable algorithm whose accumulated log-score
beats the Solomonoff distribution - an (uncomputable) weighted mixture
of all computable mixtures. And then the proof seems relatively
trivial, since any computable algorithm has a finite weight in this
mixture. Indeed the difference in their scores is bounded by (the log
of) the initial finite weight, unless I've missed something.
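Sketching that bound, in case it's useful (and someone correct me if I'm
misstating it): write M for the mixture and w_q > 0 for the prior weight of a
computable distribution q, so that M(x_1..x_n) >= w_q * q(x_1..x_n) for every
sequence. Taking logs,

    -log M(x_1..x_n) <= -log q(x_1..x_n) + (-log w_q),

so M's cumulative log-score is never worse than q's by more than the constant
-log w_q, whatever the environment does.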

So it's possible that you could put forth some sort of theorem about a
computable decision process being defeated by an uncomputable
environment. In fact this is quite easy - let a sequence of ladies
and tigers always be placed by the environment to make a decision
process perfectly wrong. Then it will be dominated by *any* other
process that is not perfectly identical to it, including, probably,
"always choose the door on the right", and so on.

But *epistemically* speaking, Solomonoff induction cannot do
infinitely worse than anything computable. In the lady and the tiger
case, I expect it would assign probabilities closer and closer to
exactly 50%, so that despite being "wrong" on every occasion - despite
assigning greater than 50% probability to the wrong answer on each
occasion - the margin of greaterness would become so tiny that it
would lose by only a finite amount to an algorithm that treated the
ladies and tigers as a fair coin. (And of course there is no
computable algorithm that actually makes the correct predictions!)

Wei Dai

Jun 11, 2009, 5:31:15 PM
to One Logic
Eliezer Yudkowsky wrote:
> Actually, no. I believe the proofs are over probabilities, and the
> most elegant proofs are over accumulated log probabilities (Bayes-
> score).

Solomonoff's original proof was over probabilities, but other proofs are
over the number of errors. From http://www.hutter1.net/ai/perrbnd.pdf:

Another natural question is to ask for relations between the total number of
expected errors E_ξ in Solomonoff prediction and the total number of
prediction errors E_μ in the informed scheme. Unfortunately [9] does not
bound E_ξ in terms of E_μ in a satisfactory way. For example it does not
exclude the possibility of an infinite E_ξ even if E_μ is finite. Here we
want to prove upper bounds to E_ξ in terms of E_μ ensuring as a corollary
that the above case cannot happen.

> So what we want to do is establish that in the limit, and
> regardless of the environment, whether the environment is computable
> or not, there is no computable algorithm whose accumulated log-score
> beats the Solomonoff distribution - an (uncomputable) weighted mixture
> of all computable mixtures. And then the proof seems relatively
> trivial, since any computable algorithm has a finite weight in this
> mixture. Indeed the difference in their scores is bounded by (the log
> of) the initial finite weight, unless I've missed something.

I think you're right, but only if the computable algorithm isn't allowed to
use noncomputable notation to represent its beliefs. Suppose a human
believes that the next symbol is 0 with probability 1/BB(100), and the
payoff for guessing 0 correctly is BB(101), then he can decide to bet on 0
without actually being able to compute BB(100). So I see no reason why such
beliefs should not be allowed. But if you do allow such beliefs, then the
Solomonoff distribution is no longer guaranteed to be the best
epistemically.
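To make the arithmetic explicit (with the toy assumption, not part of the
setup above, that the alternative bet pays at most 1 per round):

    EU(bet on 0) = (1/BB(100)) * BB(101) = BB(101)/BB(100) > 1 >= EU(alternative),

and BB(101) > BB(100) is provable without being able to compute either value,
so the whole comparison can be carried out in this non-algorithmic notation.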

So, given that the Solomonoff distribution is not guaranteed to be best
either epistemically or in decision making, there seems to be no reason to
stop looking for better alternatives.

Eliezer Yudkowsky

Jun 11, 2009, 8:23:00 PM
to One Logic
On Jun 11, 2:31 pm, "Wei Dai" <wei...@weidai.com> wrote:
> Eliezer Yudkowsky wrote:
> > Actually, no.  I believe the proofs are over probabilities, and the
> > most elegant proofs are over accumulated log probabilities (Bayes-
> > score).
>
> Solomonoff's original proof was over probabilities, but other proofs are
> over the number of errors.

Hey, well, more the worse for them, I'd say. If you're dealing with
non-exceptional situations - non-devilish environments - then
shouldn't a proof of *epistemic* error-boundedness generally carry
over to a proof of decision error-boundedness? In other words, are
you sure you're not assuming that the environment is a
superintelligent adversary which is strictly more superintelligent
than you, which is the sort of reasoning that leads people to adopt
randomized algorithms? In general, even though I see epistemic
rationality as ultimately deriving from instrumental rationality, I
think that epistemics are simpler than decisions, and so I would trust
more a proof that began with epistemic correctness and moved from
there to instrumental correctness.

> > So what we want to do is establish that in the limit, and
> > regardless of the environment, whether the environment is computable
> > or not, there is no computable algorithm whose accumulated log-score
> > beats the Solomonoff distribution - an (uncomputable) weighted mixture
> > of all computable mixtures.  And then the proof seems relatively
> > trivial, since any computable algorithm has a finite weight in this
> > mixture.  Indeed the difference in their scores is bounded by (the log
> > of) the initial finite weight, unless I've missed something.
>
> I think you're right, but only if the computable algorithm isn't allowed to
> use noncomputable notation to represent its beliefs. Suppose a human
> believes that the next symbol is 0 with probability 1/BB(100), and the
> payoff for guessing 0 correctly is BB(101), then he can decide to bet on 0
> without actually being able to compute BB(100). So I see no reason why such
> beliefs should not be allowed. But if you do allow such beliefs, then the
> Solomonoff distribution is no longer guaranteed to be the best
> epistemically.

I'm not clear on what you mean by this. The Solomonoff distribution
contains all computable and consistent probability mixtures. Is the
idea that there are concrete decision problems where we can do better
by admitting computable yet inconsistent probability mixtures? This,
to say the least, is a non-obvious idea, and it needs to be cashed out
a bit more clearly in terms of the admissible language of the problem
and its solution.

Any concrete sequence of sensory data can be predicted by Solomonoff
as well as by any computable consistent mixture, up to a bounded total
probabilistic error. So what sort of decision problem are we talking
about, with a computable decision algorithm, which goes outside this
formalism, in a way that enables it to exhibit decisive instrumental
superiority despite its lack of decisive epistemic superiority? Is it
only pathological cases with adversarial superior superintelligences
that would be expected to produce an infinite series of lady-and-
tiger bad bets, or is there some more natural case?

> So, given that the Solomonoff distribution is not guaranteed to be best
> either epistemically or in decision making, there seems to be no reason to
> stop looking for better alternatives.

By the way, if at any point I stop making sense - I'm being briefer
here than I am in my blog posts - be sure to let me know.

Abram Demski

Jun 11, 2009, 9:10:53 PM
to one-...@googlegroups.com
Eliezer, Wei Dai,

I am thinking that a better way of interpreting Wei Dai's idea is to allow
*incompletely specified* probability distributions, not inconsistent
ones. So, for example, a probability distribution that is described in
terms of the busy beaver function would be incompletely specified,
because the logic we're using will only be able to prove some facts
about it.

In Wei Dai's example, the epistemology and the decision theory
interact in an interesting way-- so that although we cannot compute
the utility, we know something about how it compares to the
probability.

It is hard to see exactly how this would lead to action, however,
since a specific utility value is not known.

--Abram

Wei Dai

Jun 11, 2009, 11:00:48 PM
to one-...@googlegroups.com
To restate my idea, a computable decision process isn't necessarily limited
to using a computable probability distribution. Suppose he has a formally
specified, consistent probability function that isn't computable. He may be
able to prove that EU(A) > EU(B) even if he can't compute EU(A) and EU(B),
where A and B are choices that he faces. (This can be the case whether or
not his utility function is computable.) This way, he can possibly make
better decisions than an agent using Solomonoff's distribution. If you ask
him to give you a prediction about something as a probability, he might have
to give you the probability in a non-algorithmic notation, like 1/BB(100).
Internally, he is also reasoning about probabilities and expected utilities
in these non-algorithmic forms, so it seems reasonable to say that he in
fact has P(0) = 1/BB(100) as an epistemic belief.

Does that clear it up?

(Of course in general he's likely to have choices C and D such that he can't
prove either EU(C) > EU(D) or EU(D) > EU(C). How to make decisions in cases
like that would depend on a theory of how to handle mathematical
uncertainty, which I don't have...)

Eliezer asks for more natural examples. That's reasonable and something I'll
think about.

Wei Dai

Jun 12, 2009, 3:02:23 PM
to one-...@googlegroups.com
I've come up with a more natural example of a decision problem where
Solomonoff's distribution does infinitely worse than a human/computable
agent. It doesn't depend on an adversarial superintelligence. Consider the
sequence of unary encodings of BB(1), BB(2), BB(4), ... again. It occurs to
me that Solomonoff's distribution actually makes two kinds of mistakes. The
one I stated at http://www.overcomingbias.com/2008/11/complexity-and/ was
that after seeing 1s long enough, it eventually predicts a probability for 0
that's too low. The other mistake is, for quite a while after seeing a 0,
it predicts a probability for 0 that's too high. It's easier to construct an
example using the latter error instead of the former.

So consider a game where in each round the player is asked to bet on 0 or 1.
The payoff for 0 in round i is equal to 1 minus the i-th bit of the sequence
of unary encodings of BB(1), BB(2), BB(4), ..., and the payoff for 1 is
always 1/2^100.
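Here is a toy sketch of that payoff bookkeeping in Python. The real BB values
are uncomputable, so I plug in tiny made-up stand-ins, and I'm assuming the
unary encoding of k is k ones followed by a zero:

    # Toy illustration only: small stand-in values replace the uncomputable
    # BB(1), BB(2), BB(4), BB(8).
    def unary_bits(values):
        # Assumed convention: k is encoded as k ones followed by a single zero.
        bits = []
        for v in values:
            bits.extend([1] * v)
            bits.append(0)
        return bits

    def total_payoff(bits, bet, eps=2 ** -100):
        # 'bet' maps the round index to 0 or 1.
        total = 0.0
        for i, b in enumerate(bits):
            if bet(i) == 0:
                total += 1 - b   # betting 0 pays 1 exactly when bit i is 0
            else:
                total += eps     # betting 1 always pays 1/2^100
        return total

    toy_values = [3, 7, 20, 1000]
    bits = unary_bits(toy_values)
    print(total_payoff(bits, lambda i: 0))   # "always bet on 0"
    print(total_payoff(bits, lambda i: 1))   # "always bet on 1"

With these tiny stand-ins "always bet on 0" comes out ahead, but with the
real BB(2^n) values the 1/2^100 collected every round over the astronomically
long runs of 1s swamps the single unit collected at each 0, which is the
point of the example.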

It seems clear that for a human, except for the first few rounds where he
has a reasonable chance of guessing BB(2^n) correctly, the best strategy is
to always bet on 1. Consider the situation after he sees BB(2^n). He knows
that if he keeps betting on 0, by the time he gets a payoff of 1, he could
have gotten a total payoff of BB(2^(n+1))/2^100 by betting on 1 instead.

The uncomputable probability distribution that the human is using here is
P_i(1) = the i-th bit of the sequence of unary encodings of BB(1), BB(2),
BB(4), ... (This is different from what I
suggested before.) He doesn't know how to compute this, obviously, nor the
expected utility of betting on 0 in any particular round, but he can prove
that the strategy "always bet on 1" has a higher expected payoff than
"always bet on 0" and it is in fact better than any other computable
strategy.

What about the agent using Solomonoff's distribution? After seeing
BB(1),...,BB(2^n), the algorithmic complexity of BB(1),...,BB(2^n) is sunk,
so to speak. It will predict a higher expected payoff for playing 0 in any
round i where the conditional complexity K(i | BB(1),...,BB(2^n)) < 100.
This includes for example 2*BB(2^n), 2*BB(2^n)+1, BB(2^n)^2 * 3 + 4,
BB(2^n)^^^3, etc. It will bet on 0 in these rounds (erroneously, since
K(BB(2^(n+1)) | BB(2^n)) > 100 for large n), and therefore lose relative to
a human.

Is this example more convincing?

Oh, in case there's objection that the frequency of Solomonoff's
distribution losing goes to 0 too quickly in this game, we can replace the
sequence BB(2^n) by some other mathematically well-defined sequence S(n)
such that K(S(n+1) | S(1), ..., S(n)) > 100, but S(n) doesn't increase so
quickly. For example, let S(n+1) be the smallest integer such that K(S(n+1)
| S(1), ..., S(n)) > 100. This sequence increases slowly, since S(n) + a
random 101-bit number has conditional complexity more than 100, so S(n+1)
can't be bigger than S(n)+2^101.
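To spell out the counting behind that last step: there are 2^101 integers of
the form S(n)+m with 1 <= m <= 2^101, but at most 2^101 - 1 descriptions of
length at most 100 (even relative to S(1), ..., S(n)), so at least one
integer in that range has conditional complexity above 100; the least integer
with that property is therefore at most S(n)+2^101.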

Wei Dai

Jun 19, 2009, 3:26:00 AM
to one-...@googlegroups.com
I wrote earlier:

> To restate my idea, a computable decision process isn't necessarily
> limited
> to using a computable probability distribution. Suppose he has a formally
> specified, consistent probability function that isn't computable. He may
> be
> able to prove that EU(A) > EU(B) even if he can't compute EU(A) and EU(B),
> where A and B are choices that he faces.

I recalled that Jürgen Schmidhuber's Gödel Machine already contains ideas
that are very similar to the above. See specifically question 4, page 22 in
ftp://ftp.idsia.ch/pub/juergen/gm6.pdf. So I'll recap how my position
differs from his:

1. A Gödel machine starts with an initial strategy and tries to find
provable improvements to it. Since humans can handle mathematical
uncertainty, I think machines should be able to as well (even if we don't
know how yet). Therefore I suggest that a machine should look for *likely*
improvements that are not necessarily provable.

2. A Gödel machine has fixed "environment axioms", essentially a
generalization of the Bayesian prior. Schmidhuber writes "All such prior
assumptions are perfectly formalizable in an appropriate A (otherwise we
could not write scientific papers about them)." But as I pointed out earlier
in this thread, we don't know how to formalize Occam's Razor, if we
interpret it as something like "the environment likely has a short
mathematical description".

Abram Demski

Jun 19, 2009, 12:45:16 PM
to one-...@googlegroups.com
Wei Dai,

Yesterday, I wrote up some thoughts on why uncomputable models might
be useful in everyday life:

http://dragonlogic-ai.blogspot.com/2009/06/importance-of-uncomputable-models-i.html

--Abram