self-referential probability theory


Abram Demski

Aug 25, 2009, 8:09:43 PM
to one-...@googlegroups.com
Hi all,

The following blog post (by me) rants for a while about how terribly
annoying Tarski's Undefinability theorem is, and then presents my
current thinking on how to
"get around" the bloody thing (or, why it is OK that we can't):

http://dragonlogic-ai.blogspot.com/2009/08/climbing-mountain-undefinability-is.html

Basically, what I'm suggesting is that it's OK that no rational system
can fully understand itself, because it can learn about itself. It
becomes a different system in doing so, and so it can't ever really
finish, but this is not a death spiral; rather, it is a form of
self-improvement.

As I say there, I think this view really needs to be formalized before
it can be properly evaluated... what I mean to do in this email is
outline what I think that formalization looks like.

The initial seed of an idea came when I posted a short "paradox" to
the AGI list run by Ben Goertzel (http://www.agiri.org/email/).
Thinking of the Liar paradox, and of the idea that probability is a
real-valued generalization of boolean truth values, I came up with the
idea that a "naive probability theory" (a probability theory which
allows self-referential sentences) must be inconsistent just like the
theory of naive truth. This amused me greatly, so I posted the
argument.

Dr. Matthias Heger showed that my argument had a (rather obvious in
hindsight) flaw; the equivalent of Tarski's T-schema does not hold. We
cannot reason "if X, then P(X)=1", nor can we reason "if not X, then
P(X)=0." Thus, self-referential probability theory appears to be
consistent (it is immune to the most obvious translation of the Liar
paradox, at any rate).
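
To make that concrete, here is my reconstruction of the failed paradox
(my own formulation, not Heger's exact one). Take a sentence that
asserts its own probability is less than 1:

\[ L \;\leftrightarrow\; P(\ulcorner L \urcorner) < 1 \]

If the probabilistic T-schema held, i.e. if $X \Rightarrow
P(\ulcorner X \urcorner) = 1$ and $\lnot X \Rightarrow
P(\ulcorner X \urcorner) = 0$, then: if $L$ is true, the schema forces
$P(\ulcorner L \urcorner) = 1$, contradicting what $L$ says; if $L$ is
false, then $P(\ulcorner L \urcorner) \geq 1$, hence $= 1$, while the
schema demands $0$. Contradiction either way; dropping the schema is
precisely what blocks the derivation.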

So, my idea is this: perhaps a system which has probabilistic
self-reference can learn about itself in the way that I am proposing,
so that over time it can climb the Tarski hierarchy rather than
remaining stuck on any particular level.

So, here's my beginning of a formalization for that....

Start by adding syntactic self-reference to first-order logic, such as
arises from Godel numbers within Peano arithmetic. Note, however, that
as I add more axioms and inference rules to the system, I mean it to
be understood that the self-reference is similarly modified so that
the system keeps accurately reflecting its whole set of axioms and
rules.
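
For concreteness, here is a toy Godel numbering in Python (the encoding
is an illustrative choice of mine, not the classical prime-exponent
construction):

```python
# Toy Godel numbering: map each symbol to a nonzero code and read the
# formula as a number in base len(SYMBOLS)+1. Formulas thereby become
# numbers, so formulas about numbers can talk about formulas.

SYMBOLS = "0Sx+*=()&|~AE"   # toy alphabet; 'A' = forall, 'E' = exists
BASE = len(SYMBOLS) + 1     # codes run 1..13, so no zero digits

def godel_number(formula: str) -> int:
    n = 0
    for ch in formula:
        n = n * BASE + SYMBOLS.index(ch) + 1
    return n

def decode(n: int) -> str:
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(SYMBOLS[r - 1])
    return "".join(reversed(out))

g = godel_number("Ax(x=x)")
assert decode(g) == "Ax(x=x)"
print(g)   # the number this formula denotes -- and can be named by
```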

Add to the system the standard axioms of probability theory, operating
upon its own statements (with "and" playing the role of set
intersection, "or" playing the role of set union, "for all" playing
the role of countable intersection, "exists" playing the role of
countable union, and of course "not" playing the role of set
complement).
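
Spelled out, the correspondence I have in mind looks like this (my
rendering of the standard axioms, with the quantifier clauses playing
the role of countable additivity):

\begin{align*}
P(\lnot A) &= 1 - P(A) \\
P(A \lor B) &= P(A) + P(B) - P(A \land B) \\
P(\forall n\, \phi(n)) &= \lim_{N \to \infty} P(\phi(0) \land \dots \land \phi(N)) \\
P(\exists n\, \phi(n)) &= \lim_{N \to \infty} P(\phi(0) \lor \dots \lor \phi(N))
\end{align*}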

Also, add a method for doing a Bayesian update of the probability
distribution. This gets messy, because we can't possibly just update
the whole distribution in one big step when a new piece of evidence
presents itself; so, we need a mechanism for proving things about bits
of the current distribution we're worried about. This is one of the
holes in the formalization that I need to fill in.
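
To show the flavor of what I mean, here is a deliberately tiny sketch
(the class, names, and brute-force enumeration are all mine; a real
system obviously cannot enumerate truth assignments, which is exactly
why a proof mechanism is needed):

```python
# "Update on demand": store evidence as it arrives, and compute
# P(query | evidence) lazily, only for the sentences we care about,
# instead of recomputing the whole distribution in one big step.

from itertools import product

class LazyPosterior:
    def __init__(self, variables, prior):
        # prior maps a full truth assignment (tuple of bools) to a weight
        self.variables = variables
        self.prior = prior
        self.evidence = []          # accumulated conditioning events

    def observe(self, event):
        # event is a predicate on assignments; nothing is recomputed here
        self.evidence.append(event)

    def prob(self, query):
        # the expensive work happens only at query time
        num = den = 0.0
        for bits in product([False, True], repeat=len(self.variables)):
            w = self.prior(bits)
            if all(e(bits) for e in self.evidence):
                den += w
                if query(bits):
                    num += w
        return num / den

post = LazyPosterior(["A", "B"], prior=lambda bits: 1.0)  # uniform
post.observe(lambda bits: bits[0] or bits[1])             # learn "A or B"
print(post.prob(lambda bits: bits[0]))                    # P(A | A or B) = 2/3
```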

In addition to updating the probability distribution based on data
coming in from the outside world, I want the system to update based on
facts it derives about itself and about mathematics in general. The
update method has more constraints placed on it by this desire: since
the axioms imply everything else, and since events that are logically
implied by events having probability 1 also must have probability 1,
it looks as if we've got nothing to update on; our prior will already
agree with all the consequences of our axioms. But, that isn't what we
want; we want mathematical facts to still "feel" to the system like
they require explanation, so that it will invent concepts to help
explain them. This means that we should give the axioms a probability
less than 1 in the prior, and then start updating based on them. But,
an update rule that can do this in a satisfying way is still needed.
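
To see why giving the axioms probability less than 1 leaves something
to update on, here is the simplest version of the calculation (a toy
example of mine): let the axiom set $A$ have prior $P(A) = 1 -
\epsilon$, and suppose we then verify a theorem $T$ with $A \vdash T$,
so $P(T \mid A) = 1$. Bayes gives

\[ P(A \mid T) \;=\; \frac{P(T \mid A)\,P(A)}{P(T)}
   \;=\; \frac{1-\epsilon}{(1-\epsilon) + P(T \mid \lnot A)\,\epsilon}
   \;>\; 1-\epsilon \]

whenever $P(T \mid \lnot A) < 1$. Verified consequences confirm the
axioms, which is just the "mathematical facts still require
explanation" behavior I want.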

Finally, the prior itself needs to be specified. My current assumption
is that it would be a length-based prior. But, it needs to conform to
the axioms of probability theory, too.
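
The shape I have in mind is something like

\[ P_0(s) \;\propto\; 2^{-\ell(s)} \]

where $\ell(s)$ is the length of $s$ in some prefix-free binary
encoding (so the weights sum to at most 1, by Kraft's inequality). But
raw length weighting does not automatically respect the constraints
above -- nothing forces $P_0(\lnot s) = 1 - P_0(s)$, for instance --
so this can only be the seed of the prior, not the prior itself.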

--Abram Demski

Vladimir Nesov

Aug 26, 2009, 6:11:21 AM
to one-...@googlegroups.com
Did you read J.Y. Girard, "From foundations to ludics"
(http://iml.univ-mrs.fr/~girard/bsl.pdf.gz)? There might be something
to this...

--
Vladimir Nesov

Abram Demski

Aug 28, 2009, 6:18:57 PM
to one-...@googlegroups.com
Vladimir,

Thanks! I had intended to read about ludics a few years ago, actually,
and never got around to it.

So far (about 1/3 through) the paper is little more than scoffing at
the standard approach to foundations... but I am being patient with
it...

--Abram
--
Abram Demski
http://dragonlogic-ai.blogspot.com/
http://groups.google.com/group/one-logic

Vladimir Nesov

Aug 28, 2009, 6:45:52 PM
to one-...@googlegroups.com
On Sat, Aug 29, 2009 at 2:18 AM, Abram Demski<abram...@gmail.com> wrote:
>
> Vladimir,
>
> Thanks! I had intended to read about ludics a few years ago, actually,
> and never got around to it.
>
> So far (about 1/3 through) the paper is little more than scoffing at
> the standard approach to foundations... but I am being patient with
> it...
>

It's somewhat specific scoffing, though, that gives a direction. I
haven't read about ludics itself yet, and I'm not sure I'll go there
soon. Just something to get around the perspective that leads to
standard limitations of meta.

I'm trying to grok Linear Logic as a way of describing uncertainty
about a process (or uncertainty/knowledge of a process about its
environment), unsuccessfully as yet. Next on the menu is its
connection to models of concurrency.

--
Vladimir Nesov

Abram Demski

Aug 30, 2009, 5:18:46 PM
to one-...@googlegroups.com
Vladimir,

That seems wise... it does look like I lack some of the background
that is being assumed in this exposition of Ludics. Most notably (so
far) I am unfamiliar with "realizability" and not too familiar with
linear logic. I'll see how much I can understand without taking the
time to fully study those things first, though.

Since you also follow www.lesswrong.com, I imagine you noticed that at
the end of http://lesswrong.com/lw/16f/decision_theory_an_outline_of_some_upcoming_posts/
there was mention of some approach to the problem of dealing with
mathematical uncertainty... though nothing too specific. Since that's
*also* necessary for what I was trying to describe in my post here, I
obviously think this is a big, important problem. The problem has
several elements:

1) Standard Bayesian probability takes logical consequence as given;
that is, if we update a probability distribution on a fact, we also
update on all of its consequences. Yet, in practice, we want to
maintain uncertainty here so that we can delay calculating; so, we
need to be able to have some uncertainty about logical implication,
rather than being forced to carry it out in order to calculate
probabilities.
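
A minimal illustration of the state I want to permit (my own toy, not
from anything above): a reasoner keeps an explicit 0.5 credence in an
arithmetic claim it has not yet checked, and moves to 0 or 1 only once
it pays the computational cost.

```python
# Standard probability theory forbids the intermediate state below: the
# claim is a logical consequence (or anti-consequence) of the axioms of
# arithmetic, so its probability is "already" 0 or 1. Deferring the
# calculation is exactly the point.

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

credence = {"1000003 is prime": 0.5}   # honest uncertainty, pre-computation
credence["1000003 is prime"] = 1.0 if is_prime(1000003) else 0.0
print(credence)                        # collapses once the work is done
```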

2) Thanks to the halting problem, mathematical uncertainty which
ranges over wide enough mathematical domains (that is, which ranges
over questions such as halting) and which relies on a computable prior
is bound to get arbitrarily bad surprises (i.e., there will be programs
that it is arbitrarily sure are non-halting, but which halt). This was
pointed out to me by Eliezer. What this means is, we should not trust
the results of this sort of probabilistic reasoning. But, as I
mentioned in my blog post, evolution would construct minds with
essentially this methodology... it would not distinguish between
genuine logical truths and statements which imply only truths within
lifespan-like computation times but eventually imply falsehoods. So,
are our beliefs
fundamentally unjustified with respect to mathematics? Moreover,
humans appear to use this sort of reasoning over one lifetime, and
rejecting it seems rather crippling to our ability to accept the
consistency of systems... for example, it seems reasonable to argue
that ZFC is probably consistent because it has been around a long time
and nobody has found an inconsistency. Should we be barred from
concluding this with any definite probability?
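
The ZFC argument can at least be cashed out as ordinary Bayes (a toy
model, not a serious calibration): suppose that if ZFC were
inconsistent, each year of active search would independently turn up
the contradiction with probability $q > 0$. Then

\[ P(\mathrm{Con} \mid \text{no contradiction in } n \text{ years})
   \;=\; \frac{P(\mathrm{Con})}{P(\mathrm{Con}) + (1-q)^n\,(1 - P(\mathrm{Con}))}
   \;\to\; 1 \]

as $n$ grows. The question is whether a rigorously specified prior is
even allowed to make this move.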

--Abram

Vladimir Nesov

Aug 30, 2009, 7:39:01 PM
to one-...@googlegroups.com
I don't follow your argument very well...

For me, a salient problem with standard expected utility maximization
is that it hinges on very naive semantics of action and observation.
It's basically assumed that you "are" in a state that is associated
with a set of "outcomes", and that any action "selects" a given subset
in it. The subsequent actions are supposed to repeat this on the
subset selected by the preceding actions. There are minor variations,
but the picture is roughly this.

Then, probability and utility are defined on the points of this set,
defining preference over outcomes. In other words, you take the
semantics of the logic of interaction, in the usual case the naive
semantics, and augment it by the preference.
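
Schematically (my shorthand for the picture above): an action $a$
selects an event $O_a \subseteq \Omega$, and the agent is supposed to
choose

\[ a^* \;=\; \arg\max_a \; \mathbb{E}[U \mid O_a]
   \;=\; \arg\max_a \; \frac{\sum_{o \in O_a} P(o)\,U(o)}{P(O_a)} \]

which already presupposes that "what happens if I do $a$" is a
well-defined subset of a fixed outcome set.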

The trouble is that the logic describing interaction of the agent with
environment can be more subtle than that, which is apparent from the
thought experiments/models such as Prisoner's dilemma, Parfit's
Hitchhiker, Newcomb's problem, Counterfactual mugging (the last is
pretty damning on the usual approaches).

My current project is to figure out how to more adequately logically
model the situations above as well as others, and how to integrate
preference in that model.

Coherent space semantics for Linear Logic (see [*], [**]) seems like a
good candidate -- it assigns topological spaces to types, which is a
perfect fit to probability spaces (it only lacks the measures that
would specify preference, and you can do without a utility function by
using two measures instead -- see
http://lesswrong.com/lw/148/bayesian_utility_representing_preference_by/
). The term-as-a-clique interpretation is also pretty enlightening.
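
(Roughly, the two-measures point as I'd compress it: given probability
$P$ and utility $U$, define the measure $F(A) = \int_A U\,dP$; then
$\mathbb{E}[U \mid A] = F(A)/P(A)$, so the pair of measures $(P, F)$
carries the whole preference with no distinguished utility function.)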

____
[*] J. Y. Girard, et al. (1989). Proofs and types. Cambridge
University Press, New York, NY, USA.
http://www.paultaylor.eu/stable/prot.pdf
[**] J. Y. Girard (1987). `Linear logic'. Theor. Comput. Sci.
50(1):1-102. http://iml.univ-mrs.fr/~girard/linear.pdf

--
Vladimir Nesov

Abram Demski

Aug 30, 2009, 11:23:48 PM
to one-...@googlegroups.com
Vladimir,

Sounds interesting, though I lack the background to fully judge...
but, what is the motivation behind moving to that particular
formalism? The adjustments to decision theory under discussion at Less
Wrong seem quite well-motivated to me. Do you think those are
insufficient? (I suppose I could find the answer to that by reading
more of the comment threads, since you are fairly prolific there!)

Do you also have a reference for this "term as clique" stuff?

(I wish I could just read all these papers you are suggesting straight
away! Unfortunately, I've got classes, job, life, et cetera...)

It seems to me that handling mathematical uncertainty is *the* major
hole left to be filled in the current analysis which the Less Wrong
crowd has put together. Do you disagree?

--Abram

Vladimir Nesov

Aug 31, 2009, 5:28:09 AM
to one-...@googlegroups.com
On Mon, Aug 31, 2009 at 7:23 AM, Abram Demski<abram...@gmail.com> wrote:
>
> Vladimir,
>
> Sounds interesting, though I lack the background to fully judge...
> but, what is the motivation behind moving to that particular
> formalism? The adjustments to decision theory under discussion at Less
> Wrong seem quite well-motivated to me. Do you think those are
> insufficient? (I suppose I could find the answer to that by reading
> more of the comment threads, since you are fairly prolific there!)
>
> Do you also have a reference for this "term as clique" stuff?
>
> (I wish I could just read all these papers you are suggesting straight
> away! Unfortunately, I've got classes, job, life, et cetera...)
>
> It seems to me that handling mathematical uncertainty is *the* major
> hole left to be filled in the current analysis which the Less Wrong
> crowd has put together. Do you disagree?
>

The last paragraph only assumed the knowledge from the references I
gave. You can get the idea by reading ch. 8 of [*]. It's fairly short,
and I gather you have prerequisite knowledge to start on it straight
away.

Stuff discussed on Less Wrong isn't strong enough. "UDT" is more of an
errata item, pointing out what's going wrong with the naive semantics
of observation, but not presenting the mathematical understanding for
dealing with that problem. TDT can't handle updateless reasoning,
fails on problems like counterfactual mugging, and it isn't obvious it
can be redeemed. For all the ideas about decision theories, problem
statements (e.g. causal graphs) still appear by magic.

____________


[*] J. Y. Girard, et al. (1989). Proofs and types. Cambridge
University Press, New York, NY, USA.
http://www.paultaylor.eu/stable/prot.pdf

--
Vladimir Nesov
