Alexey, Nil, Zar, Linas, others...
****
Ah, I have some new thoughts on the "theorem proving + AGI + parsing"
side, but will have to wait to type them in till I get a little time
at the computer, I'm traveling between meetings and conferences in
Europe just now...
****
And here we go...
GENERAL BLATHER
About the general issue of what role logic and theorem-proving have to
play in AGI, obviously there is room for different opinions on this….
My own view, as most of you know, is
1) There are going to be many viable routes to human-level and
superhuman AGI, not just one
2) A route with a large role for probabilistic-logic theorem-proving
is one viable route
3) Advantages of a route of type 2) would seem to be:
3a) A system that can make good use of current computing hardware
which is logic-based (whereas e.g. a neural net based architecture
would make better use of analog computing hardware)
3b) A system that should be good at scientific and engineering
thinking, and also good at meta-thinking about how to improve its own
code … i.e. a good candidate to figure out how to cure aging, create
nanotech and femtotech, launch the Singularity, etc.
3c) A system that should be coherent in maintaining its goals over
time, as compared to a system operating more similarly to mammals
…
Anyway, that is my view, which is the result of a lot of thought from
diverse directions; but of course none of us knows for sure how to
create advanced AGI yet, so disagreement is understandable...
I note also that since, by Curry-Howard correspondence, programs and
proofs are equivalent, the statement that AGI can be founded centrally
on theorem-proving is equivalent to the statement that AGI can be
founded centrally on program-execution. The statement that AGI can
be founded centrally on probabilistic logic theorem proving, is
equivalent to the statement that AGI can be founded centrally on
probabilistic programming. The difference between a theorem proving
based AI and a program learning based AI is merely an “implementation
detail” ;-) …
Now more interestingly let me explore some specifics I have been
thinking about lately!
CONNECTOR THEOREM PROVING
First of all, Linas, if you haven’t seen it before you will enjoy the
diagrams in
http://www.scholarpedia.org/article/Connection_method
which are explained in more detail in
“A Vision for Automated Deduction Rooted in the Connection Method” (W. Bibel)
https://www.researchgate.net/publication/318226306_A_Vision_for_Automated_Deduction_Rooted_in_the_Connection_Method
and in even more detail in the book
Wolfgang Bibel:
Automated Theorem Proving. Vieweg Verlag, Wiesbaden, 293 pp. (1982);
2nd edition, 289 pp., 1987.
which I found online in pdf via sci-hub …
This is connection-based theorem proving, in which a proof of a
theorem is constructed by drawing connections (links) between terms in
the theorem, in such a way that each link joins two instances of the
same predicate, one in negated form and one in non-negated form
(where “negated vs. non-negated” is assessed based on whether an
instance would be negated in a DNF-normalized version of the theorem).
A proof is a set of links such that every complete path through the
theorem's matrix of clauses contains some linked pair, and such that
there is a single unification of all the linked terms that makes each
linked pair of instances equal.
A complication is that sometimes there need to be several links
emanating from a given instance, i.e. an instance in the theorem may
get used twice, three times, etc. (This counting of usages would
provide an obvious connection with linear logic, which may or may not
be useful…)
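As a rough illustration of what these links are, here is a Python sketch with an invented literal representation; a real connection prover would also track unification, which this toy version omits:

```python
from itertools import product

# A literal is (predicate, negated, args); a DNF "matrix" is a list of
# clauses, each clause a list of literals. (Representation invented here.)
def connections(matrix):
    """Enumerate candidate links: pairs of literals in different clauses
    with the same predicate but opposite polarity."""
    links = []
    for i, ci in enumerate(matrix):
        for j, cj in enumerate(matrix):
            if i >= j:
                continue
            for a, b in product(ci, cj):
                if a[0] == b[0] and a[1] != b[1]:
                    links.append(((i, a), (j, b)))
    return links

# Toy matrix with three clauses over predicates P and Q
matrix = [
    [("P", False, ())],                   # P
    [("P", True, ()), ("Q", False, ())],  # ~P, Q
    [("Q", True, ())],                    # ~Q
]
# Yields two candidate links: P in clause 0 with ~P in clause 1,
# and Q in clause 1 with ~Q in clause 2.
```

The real work, of course, is in choosing which subset of candidate links forms a spanning set with a consistent unifier.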
This general concept has been around since the 1980s, but recently has
been used within some highly effective theorem-provers, e.g. leanCoP
http://www.leancop.de
and its descendants. These provers are interesting, among other
reasons, because they consist of extremely concise Prolog code, yet
they work nearly as well as state-of-the-art theorem-provers whose
code is much more complex and has been pretty heavily optimized...
leanCoP operates on theorems in disjunctive normal form. However, a
variant called nanoCoP
http://www.leancop.de/nanocop/
operates on non-normalized theorems, using a straightforward and
elegant variant of the connection-based method, given here:
http://www.jens-otten.de/papers/concalc_tab11.pdf
Also, randoCoP
http://ceur-ws.org/Vol-373/paper-08.pdf
is like leanCoP but with a more efficient search strategy. leanCoP
and nanoCoP basically use chaining-based search involving two
operators, “extension” and “reduction.” What randoCoP does is
consider many randomizations of the order of the clauses in the DNF
version of the theorem, and do the chaining search on each of these.
This ends up working faster. (This is not surprising; it’s in line
with many other results in the search literature showing that
doing a shallow search from a lot of randomly selected starting points
can often be better than doing a deep search from a single starting
point…)
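The restart strategy is simple enough to sketch generically. In this Python toy, `prove` stands in for any depth-limited chaining prover; the interface and all names are invented for illustration:

```python
import random

def randomized_restarts(prove, clauses, tries=50, depth=5, seed=0):
    """randoCoP-style search (sketch): instead of one deep search, run a
    depth-limited prover on many random reorderings of the clause list.
    `prove(clauses, depth)` is assumed to return a proof or None."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    for _ in range(tries):
        shuffled = clauses[:]
        rng.shuffle(shuffled)
        proof = prove(shuffled, depth)
        if proof is not None:
            return proof
    return None
```

Each restart is cheap because the depth limit keeps the individual searches shallow; the diversity comes from the shuffled clause orderings.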
SAT SOLVERS FOR CONNECTOR THEOREM PROVING
As a semi-aside here, one purely algorithmic thought I had when
reading all this is that the search+unification here can probably be
done way faster using a SAT solver, at least for all but the shortest
proofs. This is vaguely similar to how, with the link parser, it's
way faster for long sentences to use SAT for parsing than a standard
NL parsing algorithm...
Use of SAT for theorem-proving has been tried already in a slightly
different context, see
https://www.lsi.upc.edu/~oliveras/espai/smtSlides/lynch.pdf
There are also various other tricky ways to code e.g. the unification
part as a SAT problem; see
https://pdfs.semanticscholar.org/59ad/1b79d7a99c9330a494787deb8d6d3020d376.pdf
(that paper's not about SAT, it's about unification using neural nets,
but the part where they code unification as constraint satisfaction
would apply to SAT as well as to NNs).
So the bottom line is, there are lots of ways to play with the
encoding, but one could use SAT or SMT here as a substitute for, or
augmentation of, chaining.
To use SAT here you’d presumably have to set a bound on how many times
each instance can be re-used. Then you could increase the bound
incrementally, trying the SAT solver again for the constraints
generated with each new bound. Though the SAT solver would want to
work on the normalized version of the theorem, one could use
information from the grouping of instances in the non-normalized
version to guide the search inside the SAT solver.
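A minimal Python sketch of that incremental-bound loop, with a brute-force enumerator standing in for a real SAT solver; the `encode(k)` interface (returning clauses that are satisfiable iff there is a proof reusing each instance at most k times) is an invented assumption:

```python
from itertools import product

def brute_sat(clauses, n_vars):
    """Toy stand-in for a SAT solver. Clauses are lists of DIMACS-style
    ints: positive = variable, negative = negated variable."""
    for assign in product([False, True], repeat=n_vars):
        if all(any(assign[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return assign
    return None

def prove_with_bound(encode, max_bound=4):
    """Increase the instance-reuse bound k until the encoding becomes
    satisfiable, i.e. a connection proof within the bound exists."""
    for k in range(1, max_bound + 1):
        clauses, n_vars = encode(k)
        model = brute_sat(clauses, n_vars)
        if model is not None:
            return k, model
    return None

def toy_encode(k):
    # Purely illustrative: unsatisfiable at k=1, satisfiable for k >= 2.
    return ([[1], [-1]], 1) if k == 1 else ([[1]], 1)
```

With a real incremental SAT solver one would add the new bound's clauses to the existing solver state rather than restarting from scratch each time.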
CONNECTORS, PROOF SKETCHES AND AGI
Now why do I like this from an AGI perspective? Because it lets one
do higher-level, abstract probabilistic reasoning just by reasoning
about the links (and setting the unification aside).
To see how this might work, consider that a “proof sketch” could be
written in connector form as a set of connectors that is not
complete and skips some steps.
For instance, in a proof sketch one might have a link skipping from an
instance of P in clause 1 to an instance of P in clause 10, bypassing
instances of P in clauses 3 and 5.
Or, in a proof sketch, one might have an instance of P used for the
second time, but not for the first time (leaving it open when it will
be used for the first time). Or one might have an instance of P used
for the k’th time in one link, and used for the m’th time in another
link, with the constraint that k<m but no commitment made about the
values of k and m.
In general, then, a proof sketch specifies a set of proofs sharing
some common elements.
If one is doing probabilistic induction or abduction across a bunch of
connector proofs, one is going to learn a lot about which connections
and which combinations of connections occur in which kind of contexts
within which kinds of proofs. This learning will naturally lend
itself to various conjectural proof sketches for newly presented
theorems.
The process of filling in a proof sketch to get a proof can be
confronted just like the problem of proving a theorem de novo - by
chaining or by SAT/SMT or some combination thereof.
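As a toy illustration of that filling-in process (Python; all names are invented, and a real version would interleave unification with the search):

```python
def complete(sketch, candidate_links, is_proof):
    """Depth-first completion of a proof sketch: keep adding candidate
    links to the partial link-set until `is_proof` accepts it."""
    if is_proof(sketch):
        return sketch
    for link in candidate_links:
        if link in sketch:
            continue
        done = complete(sketch | {link}, candidate_links, is_proof)
        if done is not None:
            return done
    return None

# Toy run: the sketch starts empty, links are just integers, and a
# "proof" is any link-set containing links 1 and 3.
proof = complete(set(), [1, 2, 3], lambda s: {1, 3} <= s)
```

The point is that the sketch only constrains the search space; the same chaining or SAT/SMT machinery does the rest.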
SYNTACTIC LINKS AND LOGICAL CONNECTORS
As a minor aside, given the loose analogy between link parsing and
connector proofs, it’s also interesting to look at how the links in a
sentence’s link parse relate to the connectors in that same sentence’s
logical interpretation.
The links in a sentence’s link parse become atomic relationships
Li(w_a, w_b)
where w_a and w_b are word-instances and Li is a link of type i. The
logical interpretation of a sentence then involves a bunch of
implications such as
L1(w_1, w_3) & L5(w_3,w_7) & L9(w_1,w_7) ==> P4(w_1,w_7) & P8(w_7, w_3) <p>
where the Pi are logical relationships and <p> is a probability value.
Sometimes there may be complex quantifier relations on the right hand
side of the implication.
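To make the shape of these link-to-logic rules concrete, here is a toy Python sketch; the link types S and O, the example sentence, and the probability 0.9 are all invented for illustration:

```python
# Hypothetical link-parse of "dogs chase cats": a subject link and an
# object link, each of the form (link_type, word_a, word_b).
links = [("S", "dogs", "chase"), ("O", "chase", "cats")]

def interpret(links):
    """Toy interpretation rule: S(x, v) & O(v, y) ==> v(x, y) <p>,
    i.e. a subject link and an object link sharing a verb yield one
    probability-tagged predicate instance."""
    subjects = [(a, b) for t, a, b in links if t == "S"]
    objects = [(a, b) for t, a, b in links if t == "O"]
    facts = []
    for x, v1 in subjects:
        for v2, y in objects:
            if v1 == v2:
                facts.append((v1, x, y, 0.9))  # 0.9 is a made-up <p>
    return facts
```

Running `interpret(links)` gives `[("chase", "dogs", "cats", 0.9)]`, i.e. one probability-tagged logical relationship extracted from two syntactic links.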
The overall semantic interpretation of a sentence then looks like an
implication of the form
(conjunction of all the syntactic links found in the sentence)
==>
(logical formula involving conjunctions and disjunctions and negations
of logical relationships P_i between logical terms corresponding to
the word-instances in the sentence)
where, in the logical formula on the right hand side, each term may be
tagged with a probability value.
The existence of a syntactic link between two word-instances in a
sentence has the effect of causing the logical terms corresponding to
the word-instances to be grouped into (one or more of) the same
probability-tagged conjunctions of logical relationships in the
logical interpretation of the sentence.
If the sentence describes N different situations (each one with
different observation-instances providing groundings for the
word-instances) S1, S2, …, S_N , then a proof of the validity of the
logical interpretation of the sentence as a model of the situations,
is produced by drawing connectors from the observation-instances to
the logical terms…