nonmonotonicity


YKY (Yan King Yin, 甄景贤)

Sep 26, 2009, 3:46:18 AM
to one-...@googlegroups.com
Nonmonotonicity is another problem that faces classical logic. It
seems that it is impossible to encode nonmonotonic logic within
classical logic. So the current theorem provers would be unusable for
AGI without modifications...

YKY

Russell Wallace

Sep 26, 2009, 11:15:48 AM
to one-...@googlegroups.com
Good point. Nonmonotonicity is something that will certainly have to
be addressed sooner or later. It seems to me there are two approaches:

1. Punt nonmonotonicity to a separate heuristic layer and forget about
it until we get monotonic logic nailed down solid.

2. Start off with nonmonotonic reasoning and then embed monotonic
logic within it as a special case.

I had been going with the first option, but this latest round of
problems with making higher-order logic both consistent and expressive
is starting to make me wonder, as a physicist once put it, "is nature
trying to tell us something?"


Abram Demski

Sep 26, 2009, 8:56:26 PM
to one-...@googlegroups.com
Russell, YKY,

I have mixed feelings about this. I might agree that nonmonotonic
logic provides a faster approximation of basic probabilistic
reasoning. However, nonmonotonic systems get quite complicated, and
subtle questions of conflict resolution arise. My understanding is
that probability theory clears up all these questions, leaving a
simple, elegant base theory where we'd have a mess if we tried to use
nonmonotonic logic. However, this impression is not thanks to an
in-depth knowledge of the field, but rather is gleaned from the
commentary of others.

The following paper from the first AGI conference shows something of
the state of the art of nonmonotonic reasoning, although it does not
contain a formal specification:

http://oscarhome.soc-sci.arizona.edu/ftp/PAPERS/General%20intelligence.pdf

The point for me here is that (although the system presented by the
author is general-purpose) the author points out that most work in
nonmonotonic logic deals only with limited subsets of first-order
logic. My question is, if it gets really complicated to do
nonmonotonic reasoning in general domains, is it still worth it? Or
should one just approximate probabilities directly?

Perhaps I should clarify what I mean. My thinking is that for any
nonmonotonic reasoning system, we *in fact* would want to place more
confidence on an assertion if it goes longer without being refuted. So
just having an on/off asserted-or-not should be thought of as a rough
approximation, where in fact we want degrees of belief. (And, of
course, at present I take "degree of belief" to be best formalized by
probability theory.)

Nonmonotonic logic as a basic entity, rather than just an
approximation of probability theory, *does* have some interesting
properties. Nonmonotonic logics can "solve" the halting problem by
assuming that all Turing machines are nonhalting, until proven
otherwise. The system will eventually converge to the correct belief
for each machine. However, there are harder problems that still can't
be completely characterized in nonmonotonic logics; incompleteness
still applies, even for this weaker version of correctness based on
convergence. Still, this might be seen as an improvement. My current
thinking, though, is that we'd still like to have degrees of belief
for these things... it seems as if we should be surer that a Turing
machine is nonhalting if we've run it for a long time.
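To make the "assume nonhalting until proven otherwise" idea concrete, here is a toy sketch in Python; the step budget and the confidence formula are invented for illustration, not part of any of the systems under discussion:

```python
# Toy sketch (all names and the confidence formula are my own invention,
# not a worked-out system): assume "does not halt" by default, and revise
# only when a run actually halts.  The longer a run goes without halting,
# the more confidence we heuristically place in the default.

def believe_nonhalting(program, max_steps=10_000):
    """program: a zero-argument generator function that yields once per
    simulated step and returns when the simulated machine halts."""
    steps = 0
    for _ in program():
        steps += 1
        if steps >= max_steps:
            # Default assumption stands; confidence grows with steps run.
            return {"halts": False, "confidence": steps / (steps + 1000)}
    return {"halts": True, "confidence": 1.0}  # the default got refuted

def looper():        # stands in for a nonhalting machine
    while True:
        yield

def quick():         # halts after three steps
    for _ in range(3):
        yield

print(believe_nonhalting(looper))   # default retained
print(believe_nonhalting(quick))    # default refuted
```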

(Whether being surer is *actually* justified is a complicated question
which I hope to deal with, but not in this email... but, at least it
seems clear that humans become surer in such situations.)

Now, Russell's point:

> I had been going with the first option, but this latest round of
> problems with making higher-order logic both consistent and expressive
> is starting to make me wonder, as a physicist once put it, "is nature
> trying to tell us something?"

I personally find it very interesting how "naive" theories (naive set
theory, naive theories of truth, naive higher-order logic, and so on)
seem to capture everything that one wants to capture in a sort of
magical way... and then explode into a horrible mess. The interesting
part is how separate-feeling these two parts are. Yes, the theory can
prove anything... but it doesn't feel as if that's *why* it can prove
the particular things one is after. One doesn't travel through the
Curry sentence on the way to getting the result one wants; instead,
everything along the way seems perfectly natural and good. Naive set
theory (and the others) "feel right". Why is this?

How useful are paraconsistent theories derived from naive set theory
or naive higher-order logic? Are they practical? What do they do for
us? I feel I don't know enough about that area.

My basic objection to paraconsistent techniques is simply that a
statement's being both true and false seems to lack meaning. Lately,
though, I've been thinking of it in terms of a simple form of
nonmonotonic logic: if we prove a statement in a paraconsistent
system, then we'll accept it until we also find a refutation.
(Similarly, if we disprove a statement, we'll deny it until we also
find a proof?) If we use this sort of interpretation, I'm wondering,
should we take certain paraconsistent foundational theories to be
true? Where do they stand?

--Abram
--
Abram Demski
http://dragonlogic-ai.blogspot.com/
http://groups.google.com/group/one-logic

Russell Wallace

Sep 26, 2009, 9:33:49 PM
to one-...@googlegroups.com
Abram - excellent post, and I still need to spend more time thinking
about your other points, and following that link; but I can comment on
this bit now:

> I have mixed feelings about this. I might agree that nonmonotonic
> logic provides a faster approximation of basic probabilistic
> reasoning. However, nonmonotonic systems get quite complicated, and
> subtle questions of conflict resolution arise. My understanding is
> that probability theory clears up all these questions, leaving a
> simple, elegant base theory where we'd have a mess if we tried to use
> nonmonotonic logic. However, this impression is not thanks to an
> in-depth knowledge of the field, but rather is gleaned from the
> commentary of others.

There are two problems with this.

First, probability theory is great stuff where applicable, but not
everything is a matter of probability. Consider:
Is the sun a big star?
Is there life on Mars?
Is the millionth digit of pi odd?
If pressed to give an answer in the range 0..1, I might answer each of
these with 0.5, but I would not mean the same thing as I mean when I
say there is a 0.5 probability that a coin will land heads.

Second, there is the issue of time, and of updating beliefs when new
information arrives. Even if an AI program operates purely in batch
mode, it will need to reason about processes that operate online.

Abram Demski

Sep 26, 2009, 11:41:32 PM
to one-...@googlegroups.com
Russell,

> First, probability theory is great stuff where applicable, but not
> everything is a matter of probability. Consider:
> Is the sun a big star?
> Is there life on Mars?
> Is the millionth digit of pi odd?
> If pressed to give an answer in the range 0..1, I might answer each of
> these with 0.5, but I would not mean the same thing as I mean when I
> say there is a 0.5 probability that a coin will land heads.

I'd like to hear what you think the difference is. I think there are
two questions, the second more practical:

(1) What does the number mean?
(2) How is the number used in inference and decision making?

For the second question: do you think the way you should bet on the
questions would differ, or you only think that the meaning of the
number differs?

--Abram

Russell Wallace

Sep 27, 2009, 12:08:07 AM
to one-...@googlegroups.com
It makes a difference to how the numbers are used.

Is the sun a big star?
If this were a question of probability, we would expect to be able to
resolve it by measuring the Sun's mass, thereby resolving the 0.5 to
either zero or one. But the measurement has been done, and the 0.5
remains, because the result of the measurement and comparison with
other stars is that the Sun is a medium-sized star; the 0.5 is a
measure of fuzziness not uncertainty.

Is the millionth digit of pi odd?
This one is a matter of uncertainty, but if I were to offer you a bet
on it, would you be wise to take it? No, because I could already have
calculated the answer, and it will always be the same answer no matter
who does the calculation -- the actual probability is either zero or
one, you just don't know which. There are contexts in which it would
make sense to take it as having a subjective probability of 0.5, but
there are also contexts in which it does not, so however you do it,
you have to keep track of the difference between this kind of
pseudo-probability and a real probability.

Abram Demski

Sep 27, 2009, 1:27:28 AM
to one-...@googlegroups.com
Russell,

I am quite tempted to retort that the actual probability of *any*
event is either 0 or 1! Either it happens, or it doesn't.

Likewise, any gamble can be rigged! That's not the point of the question.

But fuzziness, yea, that is separate.

In typical situations there is no way to turn down a gamble; that is,
every choice is a gamble to a lesser or greater extent. So, it seems
to me that one must have a probability distribution over everything
that can bear on decision-making. Otherwise, how do we calculate
expected utilities? Of course, standard probability theory becomes
intractable; we need to approximate...
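As a trivial illustration of the expected-utility point (toy numbers, nothing more):

```python
# Toy illustration: without a probability distribution over outcomes, the
# expected-utility sum below cannot even be written down.  All numbers here
# are invented.

def expected_utility(outcome_probs, utility):
    return sum(p * utility(o) for o, p in outcome_probs.items())

# Two available choices -- every choice is a gamble to a lesser or greater extent.
bet_on_heads = {"heads": 0.5, "tails": 0.5}
decline_bet  = {"status_quo": 1.0}

payoff = {"heads": 10.0, "tails": -8.0, "status_quo": 0.0}

for name, dist in [("bet", bet_on_heads), ("decline", decline_bet)]:
    print(name, expected_utility(dist, payoff.get))   # bet: 1.0, decline: 0.0
```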

Hopefully I'll have more to say on this later.

Cheers,

--Abram

YKY (Yan King Yin, 甄景贤)

Sep 27, 2009, 7:23:28 AM
to one-...@googlegroups.com
On Sun, Sep 27, 2009 at 8:56 AM, Abram Demski <abram...@gmail.com> wrote:

> I have mixed feelings about this. I might agree that nonmonotonic
> logic provides a faster approximation of basic probabilistic
> reasoning. However, nonmonotonic systems get quite complicated, and
> subtle questions of conflict resolution arise. My understanding is
> that probability theory clears up all these questions, leaving a
> simple, elegant base theory where we'd have a mess if we tried to use
> nonmonotonic logic. However, this impression is not thanks to an
> in-depth knowledge of the field, but rather is gleaned from the
> commentary of others.

I don't like binary nonmonotonic logics either; but my intention is
to suggest that classical binary logic should be abandoned because it
cannot handle *nonmonotonicity*.

> Perhaps I should clarify what I mean. My thinking is that for any
> nonmonotonic reasoning system, we *in fact* would want to place more
> confidence on an assertion if it goes longer without being refuted. So
> just having an on/off asserted-or-not should be thought of as a rough
> approximation, where in fact we want degrees of belief. (And, of
> course, at present I take "degree of belief" to be best formalized by
> probability theory.)

I think "degree of belief" cannot be formalized by probability theory
alone. NARS has advocated the use of more than 1 number, and both PLN
and my PZ logic have followed suite. But I agree that the
formalization should better be compatible with probability theory.

In particular, belief revision can be handled easily with NARS-style
confidence, but it seems much harder to solve with single-number
probability theory alone.
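For a rough sketch of the two-number idea, loosely based on NARS truth values as I understand them (this is my own simplification, not the official definitions of NARS, PLN, or PZ):

```python
# Sketch: a belief carries positive evidence w_plus and total evidence w;
# frequency and confidence are derived numbers, and revision just pools
# independent evidence -- which is what makes belief revision
# straightforward in this style.  The constant and formulas are my reading
# of NARS, hedged accordingly.

K = 1.0  # "evidential horizon" constant

def truth(w_plus, w):
    frequency = w_plus / w if w > 0 else 0.5
    confidence = w / (w + K)
    return frequency, confidence

def revise(a, b):
    """a, b: (w_plus, w) evidence pairs from independent sources."""
    return (a[0] + b[0], a[1] + b[1])

source1 = (4.0, 5.0)   # 4 positive observations out of 5
source2 = (1.0, 4.0)   # 1 positive observation out of 4
pooled = revise(source1, source2)
print(truth(*source1), truth(*source2), truth(*pooled))
```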

I think nonmonotonicity should be handled by combining probability
theory and classical logic -- but that may change the proof theory of
classical logic. So I suggest moving away from classical logic to
probabilistic logic...

YKY

YKY (Yan King Yin, 甄景贤)

Sep 27, 2009, 7:38:33 AM
to one-...@googlegroups.com
On Sun, Sep 27, 2009 at 12:08 PM, Russell Wallace
<russell...@gmail.com> wrote:

> Is the sun a big star?

This one is a matter of fuzziness. My point has long been that we need
both fuzziness and probability for AGI...

> Is the millionth digit of pi odd?

You seem to have some conceptual problems with subjective
probabilities. We form subjective probabilities in our minds because
we do not know everything. But we can *update* those probabilities
when given more information. So there is no problem here.

YKY

Russell Wallace

Sep 27, 2009, 9:47:13 AM
to one-...@googlegroups.com
2009/9/27 YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com>:

>
> On Sun, Sep 27, 2009 at 12:08 PM, Russell Wallace
> <russell...@gmail.com> wrote:
>
>> Is the sun a big star?
>
> This one is a matter of fuzziness.  My point has long been that we need both
> fuzziness and probability for AGI...

Yep.

>> Is the millionth digit of pi odd?
>
> You seem to have some conceptual problems with subjective
> probabilities.  We form subjective probabilities in our minds because
> we do not know everything.  But we can *update* those probabilities
> when given more information.  So there is no problem here.

I would say that what I have a problem with is the conflating of
subjective and objective probability (the latter ultimately boils down
to indexical uncertainty). We can assign a subjective probability of
0.5 to an arbitrary bit of pi, and our reasoning about this need not
-- must not -- violate the laws of probability theory, but it must
also, by whatever method, keep track of the fact that this is not an
irreducible indexical uncertainty; it is a logically determined result
that we have not computed. If we do keep track of this, and if our
inference machinery correctly updates when required, that is no
problem; does anyone have an outline design for some --
computationally tractable -- such machinery?
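For concreteness, the kind of bookkeeping I have in mind might look like the following toy sketch (every name here is invented, and it is not an answer to the tractability question):

```python
# Toy sketch: each estimate is tagged with whether it is irreducible
# indexical uncertainty or a logically determined value we simply have not
# computed yet, together with a recipe for resolving it when we choose to
# pay for the computation.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Belief:
    statement: str
    probability: float
    kind: str                                   # "indexical" or "computable"
    resolver: Optional[Callable[[], bool]] = None

    def try_resolve(self):
        if self.kind == "computable" and self.resolver is not None:
            self.probability = 1.0 if self.resolver() else 0.0

def digit_parity_stub():
    # placeholder for the real (expensive) computation of the digit's parity
    return True

b = Belief("millionth digit of pi is odd", 0.5, "computable", digit_parity_stub)
b.try_resolve()
print(b.probability)   # the 0.5 collapses once we actually compute
```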

YKY (Yan King Yin, 甄景贤)

Sep 27, 2009, 10:16:07 AM
to one-...@googlegroups.com

Of course, this is what NARS and PLN and P(Z)C are concerned with, ie,
probabilistic belief revision.

But I suspect that this aspect is not easily captured by classical /
binary logic. So there is a real need to abandon that approach...

YKY

Abram Demski

Sep 27, 2009, 4:58:06 PM
to one-...@googlegroups.com
Russell,

I agree that there is a distinction to be made about probabilities
which are due to mathematical uncertainty (as with the expansion of
pi) and those which involve "real" uncertainty (as with uncertainty
about the physical laws). However, I'm not so sure about a distinction
between "subjective" and "objective" probabilities. Are there true
"physical random processes" to have objective probabilities about?
What exactly does such a thing mean, and how is it fundamentally
different from uncertainty about non-random processes which we lack
full knowledge of?

Suppose we've got a set of physical laws PL which determine some
things, but not quite everything. In universe A, the things not
determined are decided randomly; ie, physical random processes. In
universe B, these situations are decided by a set of additional laws
which state one way or another in each indeterminate case that
actually occurs. (Thus it could be finite.) The laws of universe B
happen to coincide with the random results from universe A.

In each universe, there is an observer named Ted. In universe A, he is
using objective probabilities when he's thinking about certain events,
whereas in universe B, he'd be classified as using subjective
probabilities. Right? My conclusion, of course, is that there's no
real difference between the two...

--Abram

Abram Demski

Sep 27, 2009, 5:40:38 PM
to one-...@googlegroups.com
YKY,

> I think "degree of belief" cannot be formalized by probability theory
> alone. NARS has advocated the use of more than one number, and both PLN
> and my PZ logic have followed suit. But I agree that the
> formalization had better be compatible with probability theory.

When I say "probability theory," I mean it in a fairly broad sense;
interval probabilities, higher-order probabilities, and other such
entities are fine with me. Even things like intuitionistic probability
theory are fine with me if they are justified properly.

> I think nonmonotonicity should be handled by combining probability
> theory and classical logic -- but that may change the proof theory of
> classical logic. So I suggest moving away from classical logic to
> probabilistic logic...

Agreed... I think probability theory is needed to unravel the
paradoxes. But, I don't yet have the system to substantiate that
claim. ;)

--Abram


Russell Wallace

Sep 27, 2009, 7:11:57 PM
to one-...@googlegroups.com
Well, philosophically, e.g. Schrödinger's cat is a matter of objective
probability because it is a matter of indexical uncertainty: 50% of me
is in universes where the cat is alive, 50% in universes where the cat
is dead, whereas either 100% of me is in universes where the millionth
digit of pi is odd, or 100% of me is in universes where it is even, I
just don't know which is the case.

But really that doesn't matter; you are right that true randomness
might be impossible in practice to distinguish from e.g. the output of
a cryptographically secure algorithmic random number generator.

What matters in practice for an AI program is that it keep track of
things like what evidence a subjective probability estimate is based
on, whether it might be possible to obtain better evidence, and what
information various agents might have and what beliefs they might hold
on the matter.

Abram Demski

Sep 28, 2009, 7:00:09 PM
to one-...@googlegroups.com
Russell,

I think I'd even argue that indexical uncertainty is not different in
principle, but I'm not totally sure.

Now, about mathematical uncertainty:

[The considerations ended up much longer and more meandering than I
expected; be warned. Perhaps I will try to write some blog posts
dealing with issues more systematically.]

If we are uncertain about the result of a computation, the correct
distribution to use is the Solomonoff distribution. Not very helpful,
right? :) I'm replacing a merely-difficult computation with a
literally uncomputable measure of uncertainty. However, what I'm
thinking is that simple approximations of the distribution will still
be useful. We'll have an optimal policy dependent on the cost of
computation, bounded of course by the case in which we decide to just
execute the function.

What I'm imagining is that we run a quick compression program on any
possibilities we are considering as the output, taking the length to
indicate the prior probability.
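A crude, literal rendering of that idea (zlib is just a stand-in for whatever quick compressor one picks, and I make no claim that its output length is a good complexity proxy for short strings):

```python
# Sketch: compress each candidate output and treat 2^(-compressed length in
# bits) as an unnormalised prior, then normalise over the candidates under
# consideration.

import zlib

def compression_prior(candidates):
    weights = {}
    for c in candidates:
        bits = 8 * len(zlib.compress(c.encode("utf-8")))
        weights[c] = 2.0 ** (-bits)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

candidates = ["0" * 64, "01" * 32, "a fairly irregular string: 8#kZq!x"]
for c, p in compression_prior(candidates).items():
    print(repr(c[:16]), round(p, 6))   # more compressible -> higher prior
```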

There's no exact sense of optimal here, since we're purposefully
ignoring information; what I propose is that there will be different
"optimal" strategies based on what information we decide to take into
account. We can't in general execute code to determine how much
information we should take into account in a particular case, because
of course that costs processing time as well, so we'd want to ask if
it's worth it to do *that* as well, and so on... thus I imagine this
to be somewhat a matter of experimentation to see which
clever-sounding ideas actually work well in practice.

Example of a clever idea: perhaps we could *partially* execute the
expensive function, by paying attention to a high-level part of the
expression but ignoring certain subportions. (In other words, if the
program is represented as an expression tree, we snip off parts.) We
then apply the Solomonoff-approximation idea, but with the Solomonoff
distribution filling in the holes rather than representing the final
result. This idea may or may not have merit.

To summarize the approach I'm suggesting: we decide on the partial
information we want to use, then we determine the optimal strategy
given that partial info, then we approximate that strategy. Different
systems can choose different partial information to use, and other
methodologies must be used to decide which systems are better; still,
I think the approach is not too bad.

However, the real problem is a theory of using partial information
which already contains (1) approximations due to other
partial-information integration, and (2) approximations due to other
approximation algorithms (such as the approximate compression
algorithm used above).

It seems useful to, every time we *use* an approximate value, spend a
little time improving the approximation (or perhaps just record the
amount, so that the time can be lumped together later to improve
effectiveness).

Imagine a system where we remember the results of every calculation
that we perform, and the estimates of every calculation which we don't
fully perform but instead approximate. When evaluating an expression,
what we do is look up every sub-expression we need to evaluate, and
then combine them. Of course, the sub-expression may not be completely
evaluated. In such a case what we do is either fully evaluate,
approximate, or totally fudge (ie use a simple probability
distribution over the possible values). Every time we look up such a
value again, we take what time we have to improve the estimate. This
might mean something like the following:

If we've got no time: return some default approximation.

If we've got some time:
-- Spend a fixed fraction on evaluating the expression directly, and
save the results. We keep finishing a bit more of the computation
every time we come back to visit.
-- Spend the rest on approximation methods which return a result *right
now*. Again, we save the results, and hopefully can use them to help
produce a better approximation next time.

If we've got lots of time:
-- Launch a planner subroutine with some set fraction of the time. With
the rest, do what it said (which could involve more planning).
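In code, the no-time/some-time core of this might look roughly like the following toy sketch (the memo table, the step budget, and the naive default are all stand-ins, and the planner case is omitted):

```python
# Toy sketch of the memo-and-resume idea: every expensive call is backed by
# a resumable generator; each lookup spends a small step budget advancing
# it, and until it finishes we return the best estimate so far.  Real
# "approximation methods which return a result right now" would replace the
# naive default used here.

_cells = {}   # key -> {"gen": resumable generator, "value": final result, "estimate": best guess}

def slow_sum_of_squares(n):
    """Resumable computation: yields its running partial result each step."""
    total = 0
    for i in range(1, n + 1):
        total += i * i
        yield total

def lookup(key, make_gen, default=0, budget=100):
    cell = _cells.setdefault(key, {"gen": make_gen(), "value": None, "estimate": default})
    if cell["value"] is not None:
        return cell["value"], True
    for _ in range(budget):
        try:
            cell["estimate"] = next(cell["gen"])    # finish a bit more each visit
        except StopIteration:
            cell["value"] = cell["estimate"]
            return cell["value"], True
    return cell["estimate"], False                   # not done yet: best estimate so far

print(lookup("ssq(150)", lambda: slow_sum_of_squares(150)))   # partial after 100 steps
print(lookup("ssq(150)", lambda: slow_sum_of_squares(150)))   # resumed; now exact
```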

To completely specify the arrangement, specify the computational
primitives in use, and how each gets approximated. An approximation
technique should be an anytime algorithm, which will get stopped and
resumed throughout the life of the system. (We'll also want some
notion of forgetting in the actual version, since we don't want to
remember *everything*.)

This sort of system will, apparently, do the best that it can with the
resources it has.

However!

We want a system which represents our *uncertainty* about the
computation. The system as given will spit out bad answers with no
indication of their badness.

To fix this problem, we've got to pass around probability
distributions rather than approximate answers. Rather than working on
improving the approximation when it gets processing time, the system
works on getting a less ambiguous probability distribution. The only
thing that changes here is that rather than specifying an
approximation anytime-alg for each computational primitive, we specify
an anytime-alg which returns tighter and tighter probability
distributions as it is given more time.
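A minimal example of such an anytime-alg (the Hoeffding-style 95% band is just one way to report the tightening distribution, not a considered choice):

```python
# Sketch of an anytime estimator that returns tighter and tighter intervals
# the more time it is given: a Monte Carlo frequency with a band whose
# width shrinks roughly as 1/sqrt(samples).

import math, random

def anytime_probability(sample_event, batch=1000):
    """Generator: each next() adds a batch of samples and yields (low, high)."""
    hits = trials = 0
    while True:
        for _ in range(batch):
            trials += 1
            hits += sample_event()
        f = hits / trials
        half_width = math.sqrt(math.log(2 / 0.05) / (2 * trials))
        yield max(0.0, f - half_width), min(1.0, f + half_width)

# Example event: the sum of two dice exceeds 8 (true probability 10/36).
est = anytime_probability(lambda: random.randint(1, 6) + random.randint(1, 6) > 8)
for _ in range(3):
    print(next(est))   # successive intervals tighten as more time is granted
```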

Note: Since the probabilities themselves will be a subject of
computation, we *can* apply the strategy recursively to 'em. If we do,
we'll get arbitrarily high-order probabilities.

This seems to me to be the correct way of dealing with mathematical uncertainty.

Practically speaking, we may want to violate the
probability-distribution requirement, and sometimes merely return
approximate answers rather than probability distributions. This is
particularly tempting in the higher-order territory. Imagine a
function returning an expression which would evaluate to the expected
utility of something. We want to call a function to do this evaluating
for us; but if we call an approximation function that keeps to the
probability-distribution requirement, what we'll get back is another
probability distribution! No good.

Actually, upon further reflection, I think that "example" corresponds
to the general case where we'd really really want to apply
approximations. Other than when we want to take actions, probability
distributions would be acceptable. So, it's when probabilities
interact with *utilities* that we need to truly approximate.

(Of course, it can still be useful elsewhere.)

--Abram


Russell Wallace

Sep 29, 2009, 6:56:18 PM
to one-...@googlegroups.com
What you're saying makes sense, though my intuition, for what it's worth,
is that that's not how the trade-offs will go.

Consider that memory is more often a limiting factor than time.

Given that, I'm inclined to think a good first-draft strategy, when
given a literal computation to perform, is likely to be:

-- Try to compute it, using a few seconds of CPU time and all available
working memory.
-- If this fails, forget about computing it, discard all intermediate
results and reclaim the memory, and flag it as probably nonterminating.
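Something along these lines, say (a rough sketch; the two-second budget and all the names are placeholders, and running the work in a child process is just one way to make "reclaim the memory" literal):

```python
# Sketch: run the computation in a separate process so that killing it
# genuinely discards intermediate results and reclaims memory; on timeout,
# flag the item as probably nonterminating.

import multiprocessing as mp

def _worker(fn, args, out):
    out.put(fn(*args))

def compute_or_flag(fn, args=(), seconds=2.0):
    out = mp.Queue()
    p = mp.Process(target=_worker, args=(fn, args, out))
    p.start()
    p.join(seconds)
    if p.is_alive():
        p.terminate()          # discard intermediate results, reclaim the memory
        p.join()
        return {"status": "probably_nonterminating"}
    return {"status": "computed", "value": out.get()}

def spin():                    # stand-in for a nonterminating computation
    while True:
        pass

if __name__ == "__main__":
    print(compute_or_flag(sum, (range(10_000_000),)))   # finishes within budget
    print(compute_or_flag(spin))                         # gets flagged
```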

Abram Demski

Oct 1, 2009, 1:00:16 PM
to one-...@googlegroups.com
Russell,

Fair point. The "saving all the results" part of the system is just a
random implementation... I should not have phrased it as such an
essential part of the idea. The more important concept is requiring
that we have probability estimates that tighten if we do more
calculation, rather than approximate answers that get better (but come
with no indication of their goodness).

In other words,

--In situations of limited processing power, typical theories of
rationality are insufficient, because they always assume sufficient
processing power to perform the calculations prescribed by the theory.
--In real situations, one wants to be able to act without deducing all
the relevant facts (ie running all the relevant computations).
--My claim: running fewer of the relevant computations should result in
lower-certainty beliefs, rather than less-accurate beliefs, to remain
rational. (Argument: it is not rational to have higher certainty than
is justified by the computations performed.)

This essentially is a stance against nonmonotonic default assumptions,
and to an extent, against approximations more generally...

Problems:

--Returning a probability distribution rather than a guess of an
answer defers some computation until later (since a probability
distribution is a function that returns a value when evaluated for
some particular guess). It is not clear how to manage the computation
effort if this is a nontrivial part of it.
--One way to make the probability calculations more tractable is to
apply typical approximations, but the theory suggests that instead we
should use methods that produce uncertain results (which in this case
are higher-order probabilities). This means we return yet another
probability distribution, delaying computation even further (but
potentially making it an easier computation when the time comes).
--At some point, definite decisions need to be made. If we've got a
distribution that represents the uncertainty about which move is best,
we attempt to determine the highest-probability action and do it.
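As a toy illustration of that last step (numbers invented): given uncertain value estimates for each action, estimate the probability that each is best and commit to the front-runner.

```python
# Toy sketch: beliefs about each action's utility are independent normals;
# sample to estimate P(action is best), then act on the highest-probability
# action.

import random

def prob_best(beliefs, samples=20_000):
    wins = {a: 0 for a in beliefs}
    for _ in range(samples):
        draws = {a: random.gauss(mu, sigma) for a, (mu, sigma) in beliefs.items()}
        wins[max(draws, key=draws.get)] += 1
    return {a: w / samples for a, w in wins.items()}

beliefs = {"explore": (4.0, 3.0), "exploit": (5.0, 0.5)}   # (mean, std dev)
p = prob_best(beliefs)
print(p, "-> choose", max(p, key=p.get))
```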

Hope that's a bit clearer.

--Abram
