Linas, Nil, others,
OK, here is a more serious reply... which to be frank took a few hours
to write out... it may still not satisfy you, but at least it was
interesting to write down this stuff that has been bouncing in my head
a while ;) ;p ...
GENERAL FRAMEWORK OF OBJECTS AND TRANSFORMATIONS
OK, so let’s look at this at the level of abstraction of a general
rule engine… this is one level at which parsing and inference are the
same thing, right? (and after all, we do have a general rule engine
in OpenCog, so...)
In that context, what we have is some set of objects, and some set of
transformations…
Each transformation maps certain objects into certain other objects….
We could view each transformation as a category obviously… or we could
view any set of transformations as a category …
The transformations can also be viewed as objects. (Sometimes a
transformation is an object for a different transformation; but it’s
also possible for a transformation to be an object for itself and
transform itself.)
Some objects refer to observations from sensors or sets thereof; these
objects, among others, may come with count values. In the simplest
case, the count of an object may refer to the number of times an
observation falling into the set denoted by the object has been made
by the AI system associated with all these objects and
transformations…
Some objects are n-ary relations among other objects. For instance,
binary inheritance relations; predicates that transform objects into
truth values (truth values also being considerable as objects in this
general sense); etc.
We may also have types and type inheritance relationships (which are
a kind of object), so that objects may be seen as falling into a type
hierarchy. Since functions are a kind of object too, dependent types
may exist in this framework (and are also a kind of object)…
That is just the simple, abstract set-up, right? That obviously
encompasses parsing, logical inference, and plenty of other stuff…
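The abstract setup above can be sketched in a few lines of code. This is a toy illustration only, not the OpenCog rule-engine API; all class and function names here are invented:

```python
# Minimal sketch of the abstract setup: objects, and transformations that
# map objects to objects. Transformations are themselves objects, so a
# transformation can take another transformation as its input.

class Obj:
    def __init__(self, name, count=0):
        self.name = name      # label for the object
        self.count = count    # e.g. number of supporting observations

class Transformation(Obj):
    def __init__(self, name, fn):
        super().__init__(name)   # a transformation is also an object
        self.fn = fn             # maps a tuple of objects to a new object

    def apply(self, *objs):
        return self.fn(*objs)

# A binary relation (e.g. inheritance) is just an object built from others:
def make_inheritance(a, b):
    return Obj(f"Inheritance({a.name},{b.name})")

inherit = Transformation("inherit", make_inheritance)
dogs, animals = Obj("dogs", count=12), Obj("animals", count=40)
link = inherit.apply(dogs, animals)

# A transformation applied to a transformation -- since they are objects too:
meta = Transformation("name_of", lambda t: Obj("name:" + t.name))
print(link.name)                 # Inheritance(dogs,animals)
print(meta.apply(inherit).name)  # name:inherit
```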
SYNTAX PARSING
For the case of parsing… suppose we have a sentence object, i.e. a
linear series of word-instance objects. In link grammar (which seems
equivalent to pregroup grammar), each word can be viewed as a
transformation that maps (usually other) words into phrases. So for
instance if we have
  +---S---
  |
dogs
i.e. “dogs” with an S connector on the right, this is a transformation
that transforms {any word with an S connector on its left} into a
phrase. We may have
---S---+
       |
     bark
… in which case letting “dogs” transform “bark”, it will transform it into
  +---S---+
  |       |
dogs     bark
i.e. into the phrase
dogs bark
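This connector-matching step can be sketched as follows; the Word class and the way connectors are encoded are invented for illustration, not the actual link-grammar data structures:

```python
# Toy link-grammar-style connection: a word offering a rightward "S"
# connector can attach to a word expecting a leftward "S" connector,
# producing a phrase.

class Word:
    def __init__(self, text, left=None, right=None):
        self.text = text
        self.left = left      # connector expected on the word's left
        self.right = right    # connector offered on the word's right

def connect(w1, w2):
    """Attach w1 to the left of w2 if their connectors match."""
    if w1.right is not None and w1.right == w2.left:
        return f"{w1.text} {w2.text}"   # the resulting phrase
    return None

dogs = Word("dogs", right="S")   # "dogs" offers S on its right
bark = Word("bark", left="S")    # "bark" expects S on its left
print(connect(dogs, bark))   # dogs bark
print(connect(bark, dogs))   # None -- grammar is left-right asymmetric
```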
We can model this grammar categorically, by considering the process of
connecting a word to another word or phrase as a “product” …. Then
the entity “dog as an entity to connect to the left of another word or
phrase” is the right adjoint of “dog”; and the entity “dog as an
entity to connect to the right of another word” is the left adjoint of
“dog”, and so we have a monoidal category -- evidently a non-symmetric
(asymmetric) monoidal category, because grammar is left-right
asymmetric: a word’s potential syntactic connections on the left are
not the same as its potential syntactic connections on the right.
Whether modeling the grammar categorically is useful, I am unsure.
Parsing a sentence is a case where we have a limited number of
objects of interest in a given multi-transformation process. That is,
a sentence may consist of (say) 5 specific word-instances (e.g. “The
dogs bark at me”) …. And we have a constraint that if we apply the
word-instance “dogs” to the word-instance “bark” as indicated above,
we can’t also apply the word-instance “dogs” to the word-instance “me”
(as in the case “This sentence dogs me” ) …
So in this case we need some notion of a “constrained transformation
system” … where we have a certain set of objects, and then a certain
set of transformations, and a set of constraints on which
transformations can be used together. The objects plus
transformations can be thought of as a category. We then have a
bunch of disjunctions among (object, transformation) pairs, indicating
that if e.g. (O1,T1) is in the transformation system, then (O2,T2)
cannot be…. These disjunctions are the “constraints” …
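A minimal sketch of such a constrained transformation system, with invented names and a simple pairwise-exclusion representation of the disjunctive constraints:

```python
# Sketch of a "constrained transformation system": (object, transformation)
# pairs plus disjunctive constraints saying that using one pair excludes
# using another.

class ConstrainedSystem:
    def __init__(self):
        self.used = set()        # (object, transformation) pairs applied so far
        self.exclusions = []     # (pairA, pairB): using either forbids the other

    def add_exclusion(self, pair_a, pair_b):
        self.exclusions.append((pair_a, pair_b))

    def can_apply(self, pair):
        for a, b in self.exclusions:
            if pair == b and a in self.used:
                return False
            if pair == a and b in self.used:
                return False
        return True

    def apply(self, pair):
        if not self.can_apply(pair):
            raise ValueError(f"constraint violated by {pair}")
        self.used.add(pair)

system = ConstrainedSystem()
# letting "dogs" transform "bark" excludes letting "dogs" transform "me":
system.add_exclusion(("dogs", "bark"), ("dogs", "me"))
system.apply(("dogs", "bark"))
print(system.can_apply(("dogs", "me")))   # False
```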
CONSTRAINED TRANSFORMATION SYSTEMS IN PLN LOGIC
A logic like PLN is, in the sense of the above paragraph, also a
“constrained transformation system”, where here the constraint
pertains to avoiding multiple-counting of evidence…. The objects of PLN
may be any objects of appropriate types (the types that can serve as
premises to the logic rules, e.g. an InheritanceLink and its
associated targets and truth value); and the transformations are the
inference rules. The constraints are things like:
“
If we accumulate a piece of indirect evidence E in favor of
“Inheritance A C” as part of the input to the deduction-rule
transformation applied to the list object composed of the objects “Inheritance
A B” and “Inheritance B C” … then we cannot use this same piece of
evidence E as part of the input to a deduction-rule transformation
whose input contains “Inheritance A C” and whose output contains
“Inheritance A B” …
“
Since in practical PLN we are not keeping track of individual pieces
of evidence, we need to say this as
“
If we use the truth value of “Inheritance A C” as part of the input to
the deduction-rule transformation applied to the list object composed
of the objects “Inheritance A B” and “Inheritance B C” … then we cannot use
this same truth value as part of the input to a deduction-rule
transformation whose input contains “Inheritance A C” and whose output
contains “Inheritance A B” …
“
Inference trails are a crude way of enforcing the disjunctive
constraint involved in PLN (or NARS or other similar uncertain logic
systems), when the latter is considered as a constrained
transformation system.
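One crude way to picture the trail mechanism in code; the Atom class and the trail-as-premise-set representation are invented for illustration, and real PLN trails are more involved:

```python
# Each conclusion carries the set of premises (its "trail") it was derived
# from; the deduction rule refuses premises whose trails overlap, which is
# one way to block double-counting of evidence.

class Atom:
    def __init__(self, name, trail=None):
        self.name = name
        self.trail = set(trail or {name})   # a fresh atom's trail is itself

def deduce(ab, bc):
    """Deduction A->B, B->C |- A->C, blocked if the trails overlap."""
    if ab.trail & bc.trail:
        return None   # the same evidence would be counted twice
    a, b1 = ab.name.split("->")
    b2, c = bc.name.split("->")
    assert b1 == b2
    return Atom(f"{a}->{c}", ab.trail | bc.trail)

ab = Atom("A->B")
bc = Atom("B->C")
ac = deduce(ab, bc)        # fine: the trails are disjoint
print(ac.name)             # A->C
print(deduce(ac, bc))      # None -- "B->C" is already in ac's trail
```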
We can also view PLN categorically if we want to. PLN can be viewed
as building on typed lambda calculus, adding onto it probabilistic
truth values attached to expressions. In its full generality PLN
should be viewed as building on lambda calculus with dependent types.
Just as simply typed lambda calculus corresponds to Cartesian closed
categories,
http://www.goodmath.org/blog/2012/03/11/interpreting-lambda-calculus-using-closed-cartesian-categories/
similarly, lambda calculus with dependent types is known to
correspond to a locally Cartesian closed category. A category C is
locally Cartesian closed, if all of its slices C/X are Cartesian
closed (i.e. if it’s Cartesian closed for each parameter value used in
a parameter-dependent type). An arrow F: X ---> Y can be interpreted as
a variable substitution and as a family of types indexed over Y in the
type theory. Basically, the category associated with a typed lambda
calculus has objects equal to the types, and morphisms equal to
type-inheritance relations between the types.
Note that unlike the asymmetric monoidal category used to model
syntax, here we have a SYMMETRIC monoidal category modeling logic expressions.
This is, crudely, because logic doesn’t care about left versus right,
whereas link grammar (and syntax in general) does.
MAPPING SYNTAX TO LOGIC
“RelEx + RelEx2Logic” maps syntactic structures into logical
structures. It takes in structures that care about left vs. right,
and outputs symmetric structures that don’t care about left vs. right.
The output of this semantic mapping framework, given a sentence, can
be viewed as a set of type judgments, i.e. a set of assignments of
terms to types. (Categorially, assigning a closed term t to type T
corresponds to an arrow “t \circ ! : Gamma ---> T”, where ! is the
unique arrow from Gamma to the unit (terminal) object of the category,
Gamma is the set of type definitions of the typed lambda calculus in
question, and \circ is function composition.)
For instance, if we map “dogs bark” into
EvaluationLink
PredicateNode “bark”
ConceptNode “dogs”
then we are in effect assigning “dogs” to the type T corresponding to
EvaluationLink
PredicateNode “bark”
ConceptNode $X
with free variable $X .
Categorially, the arrow
(ConceptNode $X) ----> (EvaluationLink (PredicateNode “bark”) (ConceptNode $X))
corresponds to this type expression.
The symmetry here consists in the fact that it is not “dog as it
connects to the left” or “dog as it connects to the right” that
belongs to this type T, it is just plain old “dog”…
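The type-judgment view of this mapping can be sketched with plain tuples standing in for Atomese nodes; the representation is invented for illustration:

```python
# A type is an expression with a free variable; a type judgment assigns a
# term to that type by filling the variable in. Tuples stand in for
# Atomese links and nodes here.

def eval_link(pred, arg):
    return ("EvaluationLink", ("PredicateNode", pred), ("ConceptNode", arg))

def type_with_free_var(pred):
    # the type T:  EvaluationLink (PredicateNode pred) (ConceptNode $X)
    return lambda x: eval_link(pred, x)

T = type_with_free_var("bark")
judgment = T("dogs")   # assign the term "dogs" to the type T
print(judgment)
# ('EvaluationLink', ('PredicateNode', 'bark'), ('ConceptNode', 'dogs'))
```

Note that it is the plain term "dogs" that inhabits T -- no left/right connector information survives the mapping, which is the symmetry referred to above.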
Dependent types come in here when one has semantic mappings that are
left unresolved at the time of mapping. Anaphora are a standard
example, for instance sentences like “Every man who owns a donkey,
beats it”
http://www.slideshare.net/kaleidotheater/hakodate2015-julyslide
The point here is that we want to give “beat” a type of the form like
EvaluationLink
PredicateNode “beat”
ListLink
SomeType $y
SomeType $z
where then the type of $y is the type of men who own donkeys, and the
type of $z is the type of donkeys. But the rule for parsing “beats
it” should be generic and not depend on the specific types of the
subject and object of “beats”, which will be different in different
cases.
INFERENCE CONTROL
So, given this simple, general, abstract set-up, then we confront the
question of control….
What “control” means mostly here is something like pursuit of the
following goals:
“
We have a bunch of objects, and want to apply a bunch of
transformations to them, to produce a bunch of other objects
satisfying some desirable characteristics (where the “desirable
characteristics” are often formulable in terms of transformations
mapping objects into numbers)
“
Or
“
We have a specific transformation, and want to find/create some object
that this transformation can act on … or some object that this
transformation will map into the largest number possible … etc.
“
In cases like this we need to apply multiple transformations (could
be in serial or in parallel) to achieve the goal, and there is a
problem of knowing which ones to try and in what order.
The “meta-learning” approach to this is: Keep a record of what sets of
(serial/parallel) transformations have helped achieve similar goals.
Then, use this record as the starting-point for inference regarding
what transformations seem most probable to achieve the current goals.
Try these transformations that are “probable-looking based on
history” and see what happens. The information on what happened,
provides more data for meta-learning.
This can be viewed as a kind of probabilistic programming, as follows.
Consider a probability distribution p(T,G) over transformations of
an object, aimed at goal G, where the probability of a transformation
T is proportional to:
· How useful T was for achieving goal G1
· How similar G1 is to G
Consider a similar distribution over objects A relative to goal G,
i.e. p(A,G) proportional to
· How useful transforming A was for achieving goal G1
· How similar G1 is to G
Then, consider a process as follows:
· Choose an object A relative to p(A,G)
· Choose a transformation T relative to p(T,G)
· Store the result
· Do some more inference to update the p(*,G) distributions
· Repeat
This is what I’ve called PGMC or “Probabilistic Growth and Mining of
Combinations”, and what Nil calls “metalearning for inference
control.”
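A toy version of this sampling loop, with invented weight tables standing in for the p(T,G) and p(A,G) distributions and a stand-in success test; nothing here is the actual PGMC implementation:

```python
# Keep usefulness records for (transformation, goal) and (object, goal),
# sample proportionally to them, record what happened, and let the record
# shift the distributions on the next round.
import random

random.seed(0)

usefulness_T = {"deduce": 3.0, "abduce": 1.0}   # how useful T was for similar goals
usefulness_A = {"atom1": 2.0, "atom2": 2.0}     # how useful transforming A was

def sample(weights):
    """Draw a key with probability proportional to its weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    acc = 0.0
    for k, w in weights.items():
        acc += w
        if r <= acc:
            return k
    return k

for _ in range(5):
    a = sample(usefulness_A)        # choose an object relative to p(A,G)
    t = sample(usefulness_T)        # choose a transformation relative to p(T,G)
    succeeded = (t == "deduce")     # stand-in for evaluating the result
    # update: success raises the weight, failure lowers it a little
    usefulness_T[t] += 1.0 if succeeded else -0.2
    usefulness_A[a] += 0.5 if succeeded else 0.0

print(usefulness_T["deduce"] > usefulness_T["abduce"])   # the record now favors "deduce"
```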
RESOURCE DEPENDENCY AND LOGIC
Now let’s address linear logic related ideas.
The lambda calculus with multiplicities is straightforward to think about,
ftp://ftp-sop.inria.fr/mimosa/personnel/gbo/rr2025.ps
https://www-lipn.univ-paris13.fr/~manzonetto/papers/bcem12.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.471.7213&rep=rep1&type=pdf
Basically, one does lambda calculus, but on multisets rather than
ordinary sets. There is some math (see the last paper above)
showing that this is equivalent to a form of linear logic, under some
circumstances. One can handle cases where some quantities can be
used indefinitely many times, and others can be used only a certain
fixed number of times (their degree of multiset membership).
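The multiset idea can be sketched as a resource ledger where each resource has either a finite multiplicity or an unbounded one; the Ledger class and names are invented for illustration:

```python
# Resource ledger in the spirit of lambda calculus with multiplicities:
# each resource is a multiset element whose multiplicity is a finite
# count, or None for "usable indefinitely often".
from collections import Counter

class Ledger:
    def __init__(self):
        self.finite = Counter()    # resource -> remaining uses
        self.unbounded = set()     # resources usable indefinitely often

    def grant(self, res, times=None):
        if times is None:
            self.unbounded.add(res)
        else:
            self.finite[res] += times

    def use(self, res):
        if res in self.unbounded:
            return True
        if self.finite[res] > 0:
            self.finite[res] -= 1
            return True
        return False

led = Ledger()
led.grant("S-connector-of-dogs", times=1)   # each connector usable once
led.grant("axiom")                          # axioms reusable forever
print(led.use("S-connector-of-dogs"))       # True
print(led.use("S-connector-of-dogs"))       # False -- multiplicity exhausted
print(led.use("axiom"), led.use("axiom"))   # True True
```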
In terms of constrained transformation systems as I discussed them
above, resource constraints are one possible type of constraint. In
the case of the link grammar / pregroup grammar system, the
constraints involved in parsing a sentence come down to assuming that
the left and right transformations corresponding to each
word-instance are finite resources with a multiplicity of one.
In the PLN case, the constraints involved in avoiding
multiple-counting of evidence in a sense come down to assuming that
each piece of evidence is a finite resource with a multiplicity of
one. However, this is a bit tricky, because each piece of evidence
can be used many times overall. But within a limited scope of
inference (such as application of the deduction rule among three
premises Inheritance(A,B), Inheritance(B,C), Inheritance(A,C)), then
the single observation must be treated as a finite resource with a
multiplicity of one.
In general, though, it seems to me that the management of resource
limitations in OpenCog is going to be more complicated than lambda
calculus with multiplicities or linear logic. There are some cases
where the issue is that a certain Atom, in a certain context, can only
be transformed in a certain way once. But in many other cases, the
resource limitations are different… the issues are stuff like: We
would rather prioritize use of an Atom that has a lot of other uses,
so that the marginal memory cost of this particular use of the Atom is
as small as possible. In this case, the issue isn’t that each Atom
can be used only a fixed number of times. Rather, it’s more like: we
want to probabilistically prioritize Atoms that have already been used
a lot of times. This is somewhat conceptually related to the
situations modeled by “lambda calculus with resources” and linear
logic, but not quite the same.
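That softer policy can be sketched as weighted sampling over usage counts, rather than a hard cap on uses; the counts and names here are invented:

```python
# Prioritize Atoms probabilistically: sample with probability proportional
# to how often each Atom has already been used, so heavily shared Atoms
# are preferred and the marginal memory cost of a use stays small.
import random

random.seed(1)
use_counts = {"atomA": 10, "atomB": 1}

def pick_atom(counts):
    """Draw an atom with probability proportional to its usage count."""
    total = sum(counts.values())
    r = random.uniform(0, total)
    acc = 0.0
    for atom, c in counts.items():
        acc += c
        if r <= acc:
            return atom
    return atom

picks = [pick_atom(use_counts) for _ in range(100)]
# with weights 10 vs 1, atomA dominates the sample
print(picks.count("atomA"), picks.count("atomB"))
```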
POSSIBLE WORLDS
In cases where we have objects that are supposed to model some
situation (e.g. the PLN logical links coming out of parsing and
interpreting a sentence, or interpreting a visual scene, etc.), then
we can associate each set of objects with the set of situations that
they correctly model.
For instance, an interpretation of a sentence in the context of a
certain class of situations, is associated with the set of possible
situations within that class for which that interpretation is
accurate.
Similarly, a conclusion of a logical inference in the context of a
certain class of situations, is associated with the set of possible
situations within that class for which the conclusion is accurate.
(Or we can make it fuzzy if appropriate, and say that the conclusion
is fuzzily accurate regarding different situations, to different
degrees….)
In some cases there will be contradictory interpretations that cannot
(or are very unlikely to) both be accurate. In this case, if we
don’t have enough evidence to distinguish, we may want to keep both
around in the Atomspace, wrapped in different contexts. For
instance, we may do this with multiple parses of a sentence and their
ensuing logical interpretations. Or, in PLN, the Rule of Choice
specifies that we should do this if we have two very different
estimates of the truth value of the same Atom (keep them separate
rather than merge them via some sort of weighted average).
FORWARD AND BACKWARD CHAINING, ITERATED GOAL-ORIENTED PROBABILISTIC
OBJECT/TRANSFORMATION CHOICE, WHAT ELSE?
Forward and backward chaining are clearly somewhat crude control
mechanisms; however, they are what are standardly used in
theorem-provers. It’s not clear what great alternatives exist in a
general setting.
However, one alternative suggested by the above “meta-learning”
approach to inference control is to just use sampling based on the
specified distributions p(A,G) and p(T,G). If the transformations
involved include both e.g. forward and backward applications of PLN
logic rules, then this sort of sampling approach is going to combine
“forward and backward chains” in a manner guided by the distributions.
When the most likely-looking inference steps are forward, the process
will do forward chaining a while. When the most likely-looking
inference steps are backward, the process will do backward chaining a
while. Sometimes it will mix up forward and backward steps freely.
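This mixing of directions can be sketched as follows; the weights and the "direction just proved useful" update are stand-ins for the real distribution inference:

```python
# Direction-free chaining: each step samples either a forward or a
# backward rule application according to its current weight, so the
# process interleaves forward and backward chaining as the weights shift.
import random

random.seed(2)
weights = {"forward": 1.0, "backward": 1.0}

def step():
    total = sum(weights.values())
    r = random.uniform(0, total)
    return "forward" if r <= weights["forward"] else "backward"

history = []
for _ in range(10):
    d = step()
    history.append(d)
    weights[d] += 0.5   # pretend this direction just proved useful

print(history)          # a mix of forward and backward steps
```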