The following blog post (by me) rants for a while about how terribly
annoying Tarski's Undefinability theorem is, and then presents my
current thinking on how to
"get around" the bloody thing (or, why it is OK that we can't):
http://dragonlogic-ai.blogspot.com/2009/08/climbing-mountain-undefinability-is.html
Basically, what I'm suggesting is that it's OK that no rational system
can fully understand itself, because it can learn about itself. It
becomes a different system in doing so, and so it can't ever really
finish, but this is not a death spiral; rather, it is a form of
self-improvement.
As I say there, I think this view really needs to be formalized before
it can be properly evaluated... what I mean to do in this email is
outline what I think that formalization looks like.
The initial seed of an idea came when I posted a short "paradox" to
the AGI list run by Ben Goertzel (http://www.agiri.org/email/).
Thinking of the Liar paradox, and of the idea that probability is a
real-valued generalization of boolean truth values, I came up with the
idea that a "naive probability theory" (a probability theory which
allows self-referential sentences) must be inconsistent just like the
theory of naive truth. This amused me greatly, so I posted the
argument.
Dr. Matthias Heger showed that my argument had a (rather obvious in
hindsight) flaw; the equivalent of Tarski's T-schema does not hold. We
cannot reason "if X, then P(X)=1", nor can we reason "if not X, then
P(X)=0." Thus, self-referential probability theory appears to be
consistent (it is immune to the most obvious translation of the Liar
paradox, at any rate).
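To make the failure point concrete, here is my reconstruction (not the
exact argument from the list post) of where the Liar translation breaks
down:

    Let $L$ be a sentence asserting, via diagonalization, that $P(L) < 1$.
    The Liar-style derivation would run:
      1. Suppose $L$.  The schema $X \Rightarrow P(X) = 1$ gives $P(L) = 1$,
         while $L$ itself asserts $P(L) < 1$; contradiction, so $\neg L$.
      2. $\neg L$ asserts $P(L) \geq 1$, hence $P(L) = 1$; but the schema
         $\neg X \Rightarrow P(X) = 0$ applied to $L$ gives $P(L) = 0$.
         Contradiction.
    Both steps lean on exactly the schemas that Heger's objection removes;
    without them no contradiction follows (from this argument, at any rate),
    and the system is free to assign $P(L)$ whatever value its other
    axioms permit.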
So, my idea is this: perhaps a system which has probabilistic
self-reference can learn about itself in the way that I am proposing,
so that over time it can climb the Tarski hierarchy rather than
remaining stuck on any particular level.
So, here's my beginning of a formalization for that....
Start by adding syntactic self-reference to first-order logic, such as
arises from Godel numbers within Peano arithmetic. Note, however, that
as I add more axioms and inference rules to the system, I mean it to
be understood that the self-reference is similarly modified so that
the system keeps accurately reflecting its whole set of axioms and
rules.
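For concreteness, here is a toy illustration (in Python) of the kind of
encoding I have in mind; the alphabet and scheme are arbitrary stand-ins,
and in the actual system the encoding would of course live inside the
arithmetic itself:

    # Toy Godel numbering: each formula (a string over a fixed alphabet)
    # gets a unique natural number, which other formulas can then mention
    # as a numeral.
    ALPHABET = "()~&|EAxyzP=<0123456789 "
    BASE = len(ALPHABET)

    def godel_number(formula):
        """Encode a formula as a natural number (bijective base-BASE code)."""
        n = 0
        for ch in formula:
            n = n * BASE + (ALPHABET.index(ch) + 1)
        return n

    def decode(n):
        """Invert godel_number."""
        chars = []
        while n > 0:
            n, r = divmod(n - 1, BASE)
            chars.append(ALPHABET[r])
        return "".join(reversed(chars))

    assert decode(godel_number("Ax(x=x)")) == "Ax(x=x)"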
Add to the system the standard axioms of probability theory, operating
upon its own statements (with "and" playing the role of set
intersection, "or" playing the role of set union, "for all" playing
the role of countable intersection, and "exists" playing the role of
countable union, and of course "not" playing the role of set
complement).
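Spelled out, that correspondence amounts to axiom schemas roughly like
the following (reading the quantifiers as limits over an enumeration
$t_1, t_2, \dots$ of the closed terms is one possible way to cash out the
countable union/intersection requirement):

    $P(\top) = 1, \qquad P(\neg A) = 1 - P(A),$
    $P(A \lor B) = P(A) + P(B) - P(A \land B),$
    $P(A) = P(B)$ whenever $A$ and $B$ are provably equivalent,
    $P(\exists x\, \phi(x)) = \lim_{n \to \infty} P(\phi(t_1) \lor \dots \lor \phi(t_n)),$
    with "for all" handled dually as the limit of the finite conjunctions.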
Also, add a method for doing a Bayesian update of the probability
distribution. This gets messy, because we can't possibly just update
the whole distribution in one big step when a new piece of evidence
presents itself; so, we need a mechanism for proving things about bits
of the current distribution we're worried about. This is one of the
holes in the formalization that I need to fill in.
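As a placeholder for that missing mechanism, here is a toy sketch (in
Python) of updating only over an explicitly enumerated, finite set of
sentences; everything about the representation here is a stand-in, not a
proposal:

    # Toy local Bayesian update: restrict attention to a finite set of
    # sentences, enumerate the truth assignments ("worlds") over just those
    # sentences, and condition on one of them being observed true.
    from itertools import product

    def make_worlds(sentences):
        """All truth assignments over the sentences currently of interest."""
        return [dict(zip(sentences, vals))
                for vals in product([True, False], repeat=len(sentences))]

    def update(prior, worlds, evidence):
        """Condition prior (a list aligned with worlds) on `evidence` being true."""
        total = sum(p for w, p in zip(worlds, prior) if w[evidence])
        return [(p / total if w[evidence] else 0.0)
                for w, p in zip(worlds, prior)]

    sentences = ["A", "B"]
    worlds = make_worlds(sentences)
    prior = [1.0 / len(worlds)] * len(worlds)
    posterior = update(prior, worlds, "A")  # mass on not-A worlds goes to 0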
In addition to updating the probability distribution based on data
coming in from the outside world, I want the system to update based on
facts it derives about itself and about mathematics in general. The
update method has more constraints placed on it by this desire: since
the axioms imply everything else, and since events that are logically
implied by events having probability 1 also must have probability 1,
it looks as if we've got nothing to update on; our prior will already
agree with all the consequences of our axioms. But, that isn't what we
want; we want mathematical facts to still "feel" to the system like
they require explanation, so that it will invent concepts to help
explain them. This means that we should give the axioms a probability
less than 1 in the prior, and then start updating based on them. But,
an update rule that can do this in a satisfying way is still needed.
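To illustrate the constraint (only the desideratum; the formulas below
presuppose the very update rule that is still missing): if $P(A) = 1$ for
every axiom $A$, then any theorem $T$ with $A \vdash T$ already has
$P(T) = 1$, and conditioning on $T$ is vacuous, $P(\cdot \mid T) = P(\cdot)$.
If instead $P(A) = 1 - \epsilon$, then in general

    $P(T) = P(T \mid A)(1 - \epsilon) + P(T \mid \neg A)\,\epsilon < 1,$

and an update on $T$ does real work, e.g.
$P(A \mid T) = P(T \mid A)(1 - \epsilon) / P(T) > P(A)$ whenever
$P(T \mid A) > P(T \mid \neg A)$.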
Finally, the prior itself needs to be specified. My current assumption
is that it would be a length-based prior. But, it needs to conform to
the axioms of probability theory, too.
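Very roughly, the kind of thing I mean by a length-based prior
(restricted to a finite list of sentences just so it can be written down;
making it cohere with the probability axioms is the open part):

    # Rough length-based prior: weight each sentence by 2**(-length), then
    # normalize over the (finite, illustrative) list.  This ignores the hard
    # part -- coherence with the axioms above, e.g. P(~A) = 1 - P(A).
    def length_prior(sentences):
        weights = {s: 2.0 ** -len(s) for s in sentences}
        total = sum(weights.values())
        return {s: w / total for s, w in weights.items()}

    print(length_prior(["P(0=0)", "Ax(x=x)", "Ex(~(x=x))"]))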
--Abram Demski
J.-Y. Girard, 'From foundations to ludics'
(http://iml.univ-mrs.fr/~girard/bsl.pdf.gz)? There might be something
to this...
--
Vladimir Nesov
It's somewhat specific scoffing, though, which gives a direction. I
haven't read about ludics itself yet, and I'm not sure I'll go there
soon. Just something to get around the perspective that leads to
standard limitations of meta.
I'm trying to grok Linear Logic as a way of describing uncertainty
about a process (or uncertainty/knowledge of a process about its
environment), unsuccessfully as yet. Next on the menu is its
connection to models of concurrency.
--
Vladimir Nesov
For me, a salient problem with standard expected utility maximization
is that it hinges on very naive semantics of action and observation.
It's basically assumed that you "are" in a state that is associated
with a set of "outcomes", and that any action "selects" a subset of
it. Subsequent actions are supposed to repeat this on the subset
selected by the preceding actions. There are minor variations,
but the picture is roughly this.
Then, probability and utility are defined on the points of this set,
defining preference over outcomes. In other words, you take the
semantics of the logic of interaction, in the usual case the naive
semantics, and augment it with the preference.
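In symbols, the picture is roughly (my shorthand): there is a set of
outcomes $\Omega$, each action $a$ selects a subset $O_a \subseteq \Omega$,
and the agent picks

    $a^* = \arg\max_a \sum_{\omega \in O_a} P(\omega)\, U(\omega),$

with subsequent actions repeating the maximization over $O_{a^*}$ in
place of $\Omega$.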
The trouble is that the logic describing interaction of the agent with
environment can be more subtle than that, which is apparent from the
thought experiments/models such as Prisoner's dilemma, Parfit's
Hitchhiker, Newcomb's problem, Counterfactual mugging (the last is
pretty damning on the usual approaches).
My current project is to figure out how to more adequately logically
model the situations above as well as others, and how to integrate
preference in that model.
Coherent space semantics for Linear Logic (see [*], [**]) seems like a
good candidate -- it assigns topological spaces to types, which is a
perfect fit for probability spaces (it only lacks the measures that
would specify preference, and you can do without a utility function by using two
measures instead -- see
http://lesswrong.com/lw/148/bayesian_utility_representing_preference_by/
). Term-as-a-clique interpretation is also pretty enlightening.
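Roughly, the construction from that post: instead of a prior $P$ and a
utility function $U$, keep two measures, $P$ (probability) and $Q$
(shouldness), with $Q(A) \propto \int_A U \, dP$. The expected utility of
an event is then just the ratio $Q(A)/P(A)$, and preference between
events is $A \succ B \iff Q(A)/P(A) > Q(B)/P(B)$, so no separate utility
function is needed.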
____
[*] J. Y. Girard, et al. (1989). Proofs and types. Cambridge
University Press, New York, NY, USA.
http://www.paultaylor.eu/stable/prot.pdf
[**] J. Y. Girard (1987). `Linear logic'. Theor. Comput. Sci.
50(1):1-102. http://iml.univ-mrs.fr/~girard/linear.pdf
--
Vladimir Nesov
The last paragraph only assumed the knowledge from the references I
gave. You can get the idea by reading ch. 8 of [*]. It's fairly short,
and I gather you have the prerequisite knowledge to start on it straight
away.
Stuff discussed on Less Wrong isn't strong enough. "UDT" is more of an
errata item, pointing out what's going wrong with the naive semantics
of observation, but not presenting the mathematical understanding for
dealing with that problem. TDT can't handle updateless reasoning,
fails on problems like counterfactual mugging, and it isn't obvious it
can be redeemed. For all the ideas about decision theories, problem
statements (e.g. causal graphs) still appear by magic.
____________
[*] J. Y. Girard, et al. (1989). Proofs and types. Cambridge
University Press, New York, NY, USA.
http://www.paultaylor.eu/stable/prot.pdf
--
Vladimir Nesov