OpenCog, DNNs, PPLs: Atoms vs Values


Alexey Potapov

May 20, 2018, 11:53:26 AM
to Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
Ben, Nil, Linas, Cassio, and whoever might be interested,

2018-05-20 12:54 GMT+03:00 Ben Goertzel <b...@goertzel.org>:
> > But how will you calculate P(image|crow,black)?
>
> Well as you know, if you really want to, something like "the RGB value
> of the pixel at coordinate (444,555) is within a distance .01 of
> (.3,.7,.8)" can be represented as a logical atom  ... so there is no
> problem using logic to reason about perceptual data in a very raw way if you want to
>
> OTOH I don't really want to do it that way... instead, as you know, I
> want to model visual data using deep NNs of the right sort, and then
> feed info about the structured latent variables of these NNs and their
> interrelationships into the logical reasoning engine....   This is
> because it seems like NNs, rather than explicit logic or probabilistic
> programming, are more efficient at processing large-scale raw video
> data...

Yeah... and here is the dilemma.
We consider two different yet connected tasks:
– Connecting OpenCog with deep neural networks (more specifically, with the TensorFlow library);
– Implementing efficient probabilistic programming with the use of OpenCog.

Both tasks can be considered as part of the Semantic Vision problem, but their solutions can be useful in a more general context.

OpenCog + TensorFlow
The depth of OpenCog+TensorFlow integration can vary considerably. Shallow integration implies that TensorFlow is used as an external module, and communication between TensorFlow and OpenCog is limited to passing the activities of neurons, which are represented by both TensorFlow and Atomspace nodes.
The most restricted way is simply to run (pre-trained) TF models on input data and to set the values of Atomspace nodes in correspondence with the activities of output neurons. What will be missing in this case: feedback connections from the cognitive level to the perception system, and online (and joint) training of neural networks and OpenCog.
Let us consider the Visual Question Answering (VQA) task as a motivating example. How will OpenCog be able to answer questions such as “What is the color of the dress of the girl standing to the left of the man in a blue coat?” If our network is pre-trained to detect and recognize all objects in the image and to supplement them with detailed descriptions of colors, shapes, poses, textures, etc., then the Pattern Matcher will be able to answer such questions (converted to corresponding queries). However, this approach is not computationally feasible: there are too many objects in images, and too many grounded predicates that can be applied to them. Thus, the question should influence the process by which the image is interpreted.
For example, even if we detected bounding boxes (BBs) for all objects and inserted them into the AtomSpace, the predicate “left to” should not immediately be evaluated for all pairs of BBs. Instead, it will be evaluated during query execution by the Pattern Matcher (hopefully) only for the relevant BBs labeled “girl” and “man”. Similarly, the grounded predicate “is blue”, implemented by a neural subnetwork, can be computed only in the course of query execution, meaning that the work of the Pattern Matcher should be extended down to the neural network level. Indeed, purely DNN solutions for VQA usually implement some top-down processes, at least in the form of attention mechanisms.
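To make the laziness concrete, here is a minimal Python sketch (the names and the box representation are hypothetical, invented for this email, not OpenCog API): the spatial predicate is evaluated only for the label pair the query actually mentions, not for all O(n^2) pairs of boxes.

```python
# Minimal sketch (hypothetical names, not OpenCog API): the "left of"
# predicate is evaluated lazily, only for the label pair the query
# actually mentions, instead of for all O(n^2) pairs of boxes.
boxes = [
    {"label": "girl", "x": 50,  "w": 40},
    {"label": "man",  "x": 200, "w": 50},
    {"label": "tree", "x": 300, "w": 90},
]

def left_of(a, b):
    # Grounded predicate: box a lies entirely to the left of box b.
    return a["x"] + a["w"] < b["x"]

def query_left_of(boxes, label_a, label_b):
    # Only boxes carrying the requested labels are ever examined.
    return [(a, b)
            for a in boxes if a["label"] == label_a
            for b in boxes if b["label"] == label_b
            if left_of(a, b)]

pairs = query_left_of(boxes, "girl", "man")   # one (girl, man) pair
```

The "tree" box is never touched by the predicate, which is the behavior we would want from the Pattern Matcher during query execution.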
Apparently, cognitive feedback to perception is necessary for AGI in general.
It is not a problem to feed TensorFlow models with data generated by OpenCog via placeholders, but OpenCog will also need some interface for executing computational graphs in TensorFlow. This can be done by binding the corresponding Session.run calls to Grounded Predicate/Schema nodes.
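The binding could look roughly like the following sketch (purely illustrative: the registry, the schema names, and the stand-in for Session.run are all invented here; a real implementation would wrap an actual TensorFlow session):

```python
# Illustrative sketch only: the registry, the names, and the stand-in
# for Session.run are invented for this example. The idea is that a
# GroundedSchemaNode carries a string key, and executing it dispatches
# to a registered Python callable wrapping a TensorFlow Session.run call.
SCHEMA_REGISTRY = {}

def register_schema(name, fn):
    SCHEMA_REGISTRY[name] = fn

def execute_grounded_schema(name, *args):
    # What the Pattern Matcher would invoke when it hits the schema node.
    return SCHEMA_REGISTRY[name](*args)

def run_linear_model(x):
    # Stand-in for sess.run(y, feed_dict={x_ph: x}) on a graph y = W*x + b.
    W, b = 2.0, 0.5
    return W * x + b

register_schema("py:run_linear_model", run_linear_model)
result = execute_grounded_schema("py:run_linear_model", 3.0)   # 6.5
```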
The question is how to combine OpenCog and neural networks at the algorithmic level. Let us return to the VQA query considered above. We can imagine a grounded schema node which detects all bounding boxes with a given class label and inserts them into the Atomspace, so that the Pattern Matcher or Backward Chainer can then evaluate some grounded predicates over them, eventually finding an answer to the question. However, the question might be “What is the rightmost object in the scene?” In this case, we don’t expect our system to find all objects, but rather to examine the image starting from its right border. We can imagine queries presupposing other strategies of image processing/examination. In general, we would like not to hardcode all possible cases, but to have a general mechanism which can be trained to execute different queries.
To make neural networks transparent to the Pattern Matcher, we need to make the nodes of TensorFlow graphs inhabitants of the Atomspace as well. The same is needed for the general case of unsupervised learning. In particular, architecture search is needed in order to achieve better generalization with neural networks, or simply to choose an appropriate structure for the latent code. Thus, OpenCog should be able to add or delete nodes in TensorFlow graphs.
These nodes correspond not just to neural layers, but also to operations over them. One can imagine TensorNode nodes connected by PlusLink, TimesLink, etc. There can be tricky technical issues with TensorFlow (e.g., compilation of dynamic graphs), but they should be solvable.
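A toy sketch of what such a graph-in-Atomspace might look like (the names mirror the suggestion above, but none of this is real Atomese, and "tensors" are plain Python lists here):

```python
# Toy sketch of representing a tensor computation graph with atom-like
# nodes and links (names mirror the suggestion in the text; none of
# this is real Atomese, and "tensors" are plain Python lists here).
def TensorNode(data):
    return {"type": "TensorNode", "data": data}

def Link(op, *children):
    return {"type": op, "children": children}

def evaluate(node):
    # Leaves yield their data; links combine evaluated children.
    if node["type"] == "TensorNode":
        return node["data"]
    kids = [evaluate(c) for c in node["children"]]
    if node["type"] == "PlusLink":
        return [a + b for a, b in zip(kids[0], kids[1])]
    if node["type"] == "TimesLink":
        return [a * b for a, b in zip(kids[0], kids[1])]
    raise ValueError("unknown link type: " + node["type"])

graph = Link("PlusLink",
             TensorNode([1.0, 2.0]),
             Link("TimesLink", TensorNode([3.0, 4.0]), TensorNode([2.0, 2.0])))
out = evaluate(graph)   # [7.0, 10.0]
```

The point of making the graph atom-like is that its topology would then be visible to the Pattern Matcher, which a black-box TF graph is not.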
A conceptual problem lies in the fact that the Pattern Matcher works with Atoms, but not with Values. Apparently, the activities of neurons should be Values. However, evaluation of, e.g., GreaterThanLink requires NumberNode nodes. Operations over (truth) values are usually implemented in Scheme within rules fed to the URE. This might be enough for dealing with individual neuron activities as truth values and with neural networks as grounded predicates, but patterns in values cannot be matched or mined directly (while the idea of SynerGANs implied the necessity of mining patterns in the activities of neurons of the latent code).
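The asymmetry can be illustrated with a toy model (this is not the real AtomSpace API, just a sketch of the Atom/Value split as described):

```python
# Toy model of the Atom/Value split (not the real AtomSpace API):
# atoms are interned and indexed by name; values are mutable payloads
# attached to atoms and invisible to any index.
atom_index = {}   # name -> atom: globally unique, searchable
values = {}       # (atom name, key) -> payload: mutable, not indexed

def add_node(name):
    # Atoms are deduplicated: adding the same name twice yields one atom.
    return atom_index.setdefault(name, {"name": name})

def set_value(atom, key, payload):
    values[(atom["name"], key)] = payload

girl = add_node("girl")
set_value(girl, "activation", [0.1, 0.9, 0.3])

# The index answers "is there an atom named girl?" in one lookup...
found = "girl" in atom_index
# ...but "which atoms have activation[1] > 0.5?" needs a linear scan:
# there is no index over values, which is why patterns in values
# cannot currently be matched or mined directly.
hits = [name for (name, key), v in values.items()
        if key == "activation" and v[1] > 0.5]
```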

I was going to illustrate with concrete examples the same kind of problems in implementing probabilistic programming with OpenCog, but I guess it's already TL;DR.

So, briefly speaking, we need the Pattern Matcher and Pattern Miner to work over Values/Valuations, which is not the case now (OpenCog uses only truth and attention values, and Atomese/Pattern Matcher doesn't have built-in semantics even for them). I cite Linas here:
"Atoms are:

* slow to create, hard to destroy

* are indexed and globally unique

* are searchable

* are immutable


Values are:

* fast and easy to create, destroy, change

* values are highly mutable.

* values are not indexed, are not searchable, are not globally unique."

But we need "fast and easy to create, destroy, change, highly mutable, but searchable" entities. So, this is not only a technical, but also a conceptual problem...

I would really like to hear your opinion on this. What should we do? Resort to the most shallow integration between OpenCog and DNNs? In this case, SynerGANs will not work, since we will not be able to mine patterns in values, and we will not be able to use the Pattern Matcher to solve VQA. Express the output of DNNs as Atoms? Linas objected even to the idea of expressing coordinates and labels of bounding boxes as Atoms. To do this with the activities of neurons would be even worse. Put everything into the Space-Time server? But then the idea of using the power of the Pattern Matcher, URE, etc. will not be achievable. Extend the Pattern Matcher to work with Values? Maybe... /*I like the idea of embedding TF computational graphs into the Atomspace, but tf.mul works over Values (tensors) - not NumberNodes. Thus, in this case, it would be required to make all links (like TimesLink) work not only with NumberNodes, but also with Values... but I foresee objections from Linas here... Also, I believe it would be useful in general, since Values are not first-class objects in Atomese - you have to use Scheme/Python/C to describe how to recalculate truth values; you cannot reason about them directly...

Or should we try to use a sort of PPL as a bridge between Values and Atoms? Maybe... Or should we do something unifying all of these.*/


The question is not just about binding vision and PLN. It is more general. Say, if you are driving a car, you estimate the distances and velocities of other cars and take actions on this basis. These are also Values, and you 'reason' over them using both 'number crunching' and 'logic' simultaneously (I don't mean procedural knowledge here in the sense of GroundedSchemaNode). So, I don't think that we should limit ourselves to a shallow integration and use DNNs/PPLs/etc. only peripherally...


Ben Goertzel <b...@goertzel.org>:

> if one stays in the world of finite discrete
> distributions, one can construct probabilistic logics with
> sampling-based semantics... https://arxiv.org/pdf/1602.06420.pdf


Sounds quite interesting. I'll study it in detail...

 -- Alexey


Ben Goertzel

May 20, 2018, 5:45:59 PM
to Alexey Potapov, opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
***
But we need "fast and easy to create, destroy, change, highly mutable,
but searchable" entities. So, this is not only technical, but also
conceptual problem...

I would really like to hear your opinion on this. What should we do?
***

Hmm, well when I think about the algorithms involved, I do not see why
the Pattern Miner and Pattern Matcher would be unable to search for
patterns involving Values... I think they could.... It's true the
code doesn't do this now though...

The point is that both of these PM algorithms are based on local
search, internally. The Pattern Miner takes a pattern and expands it
incrementally. The Pattern Matcher uses graph search with
backtracking.... In each case the search is proceeding by following
links from the existing patterns already found, and there seems no
reason that the search couldn't explore Values as well as Links while
doing this expansion...

It is true that Values are not indexed globally. But it seems to me
that the search algorithms inside the PMs do not need such indexes...

...

Now coordinate values of bounding boxes ... If we are talking about
something like the bounding box of Ben's face during a conversation,
which changes frequently, this would be appropriately stored in the
Atomspace using a StateLink,

https://wiki.opencog.org/w/StateLink

I would think... Note a comment on that page,

"At this time, the lookup of atom properties is not optimized for
speed, but it could be, by caching properties in the StateLink C++
code."

...

In any case I am confused about how these technical OpenCog plumbing
issues relate to the general issues you raise...

One question is: Is probabilistic logic an appropriate method for the
core of an AGI system, given that this AGI system must proceed largely
on observation-based semantics ...

I think the answer is YES

Another question is: Is the current OpenCog infrastructure fully ready
to support scalable probabilistic logic on real-time observation
data...

I think the answer is NOT QUITE
--
Ben Goertzel, PhD
http://goertzel.org

"Only those who will risk going too far can possibly find out how far
they can go." - T.S. Eliot

Ben Goertzel

May 20, 2018, 5:57:04 PM
to Alexey Potapov, opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
> One question is: Is probabilistic logic an appropriate method for the
> core of an AGI system, given that this AGI system must proceed largely
> on observation-based semantics ...
>
> I think the answer is YES
>
> Another question is: Is the current OpenCog infrastructure fully ready
> to support scalable probabilistic logic on real-time observation
> data...
>
> I think the answer is NOT QUITE

Similarly, we could ask

One question is: Is probabilistic programming an appropriate method for the
core of an AGI system, given that this AGI system must proceed largely
on observation-based semantics ...

I think the answer is YES

Another question is: Is any currently available probabilistic
programming infrastructure fully ready
to support scalable probabilistic programming on real-time observation
data...

I think the answer is NO... or maybe (??) NOT QUITE

...

Regarding the comparison between probabilistic logic and probabilistic
programming, I would note that

-- dealing with quantifiers and their binding functions in
probabilistic logic is a pain in the ass

-- dealing with execution traces in probabilistic programming is a
pain in the ass

[But ofc, to do probabilistic program learning in any AGI-ish sense,
you need to be modeling execution traces
and all the variable state changes and interrelationships in there etc. ]

So there is copious mess about variables, of different sorts, in both
paradigms..

...

Semi-relatedly, it seems to me that if one takes the connector
approach to proofs, then the set of connectors
comprising a proof can be viewed as a set of dependent types -- and a
proof then can be translated
to a program via following the prescription embodied in the Agda
language, but assuming Agda has
at its disposal a library function that carries out unification ...

First order unification in Agda seems OK

https://github.com/wenkokke/FirstOrderUnificationInAgda

higher order also seems to work

https://github.com/Saizan/miller

but may have bigger scalability issues...

So the mapping from connector proofs to procedures becomes pretty
concrete in this sense

...

The paper I linked in my previous email shows how to (for discrete
pdfs over finite domains) map
probabilistic logic into simple probabilistic programs.... However
it only deals with first-order probability
distros.... When we extend these methods to 2nd and 3rd order
probability distros, we run into the
issue that doing probabilistic program learning via MC sampling or
anything similar to that becomes
extremely slow.... One then wants to do inference to bypass the need
for sampling. But what kind
of inference? Perhaps PLN type abductive and inductive inference?
In this case one needs the probabilistic
logic in order to actually do learning over probabilistic programs
without incurring unrealistic overhead...

...

Overall, my feeling is that probabilistic programming will be better
for procedural knowledge, and probabilistic
logic will be better for declarative knowledge. Converting between
the two will be valuable also. Exactly
where each formulation will be most useful, we will need to determine
via experiment...

-- Ben

Nil Geisweiller

May 21, 2018, 3:36:31 AM
to Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
Hi Alexey,

these are valid points. Currently, as you probably already understand,
the (only?) way to match values is to resort to grounded schemata, see
for instance

https://github.com/opencog/opencog/blob/ea987668ed713c55c2df087b81f55736d7469772/opencog/learning/miner/rules/shallow-abstraction.scm#L72

where absolutely-true-eval is defined here

https://github.com/opencog/atomspace/blob/master/opencog/scm/opencog/rule-engine/rule-engine-utils.scm#L434

For similar reasons PLN formulas are programmed with grounded schemata.
A way to address that would be to complement Atomese with links encoding
operators to access and modify values, GetValueLink, etc. This wouldn't
make the pattern matcher more efficient (initially), but at least it
would allow OpenCog to reason about values.
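In toy Python terms (the names and the store are hypothetical, not actual Atomese), the intended semantics might look like this:

```python
# Hypothetical sketch (invented names, not actual Atomese) of what a
# GetValueLink-style operator would buy: evaluating it yields the value
# attached to an atom under a key, so a numeric link can then compare
# Values instead of requiring NumberNodes.
value_store = {("box-1", "x"): 120.0, ("box-2", "x"): 45.0}

def get_value(atom, key):
    # GetValueLink analogue: fetch the value attached to atom under key.
    return value_store[(atom, key)]

def greater_than(a, b):
    # GreaterThanLink analogue, now applicable to fetched values.
    return a > b

result = greater_than(get_value("box-1", "x"), get_value("box-2", "x"))   # True
```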

Nil


Alexey Potapov

May 21, 2018, 10:42:26 AM
to Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin

> Hmm, well when I think about the algorithms involved, I do not see why
> the Pattern Miner and Pattern Matcher would be unable to search for
> patterns involving Values... I think they could....  It's true the
> code doesn't do this now though...

Yes, it should be quite possible algorithmically. And that's exactly why we are discussing this - because we want to use the PM algorithms on Values. However, to implement this, some architectural and organizational decisions have to be made (should we generalize existing values to tensors or introduce a separate type of values; should we overload TimesLink, etc. to work both with NumberNodes and Values, or introduce new types of Links, or introduce special links that "atomize" values, etc.; should this be done in a separate repo, keeping the core PM algorithms unchanged, or should the core PM be modified, and by whom, etc.). We have a few people who can work on this, but we need to know the preferable way.
 

> It is true that Values are not indexed globally.   But it seems to me
> that the search algorithms inside the PMs do not need such indexes...

seems so
 

> Now coordinate values of bounding boxes ... If we are talking about
> something like the bounding box of Ben's face during a conversation,
> which changes frequently, this would be appropriately stored in the
> Atomspace using a StateLink,
>
> https://wiki.opencog.org/w/StateLink


We considered StateLink as a way to feed OpenCog with observations within the reinforcement learning direction. But the current question remains the same: should we use NumberNodes or Values?..
Also, DNNs are trained on (mini-)batches. This is not too natural from an autonomous agent perspective, but it is efficient.

 

> In any case I am confused about how these technical OpenCog plumbing
> issues related to the general issues you raise...

The difference between Atoms and Values is relevant, but this relevance will be seen much better when we go from just Atoms vs. Values to the inference processes over them (declarative logic represents computations inversely, and the back-inversion to direct computations performed by processors is done by the inference engine; that's why logic deals poorly with number crunching, i.e. Value manipulation, while it is good for reasoning over Atoms), which I have not yet discussed on a technical level. However, I mentioned this problem in my long message using the example of applying the PM to VQA. Maybe we should not discuss all these questions simultaneously, but I can try to elaborate on this if you wish.

 

> One question is: Is probabilistic logic an appropriate method for the
> core of an AGI system, given that this AGI system must proceed largely
> on observation-based semantics ...
>
> I think the answer is YES

I think it is necessary but not sufficient
 

> Another question is: Is the current OpenCog infrastructure fully ready
> to support scalable probabilistic logic on real-time observation
> data...
>
> I think the answer is NOT QUITE

True.


> Similarly, we could ask
>
> One question is: Is probabilistic programming an appropriate method for the
> core of an AGI system, given that this AGI system must proceed largely
> on observation-based semantics ...
>
> I think the answer is YES

Well, as I have already said once, I don't think that (existing) probabilistic programming really solves anything. It is a good way to pose problems uniformly. So, I wouldn't say it's an appropriate *method*, but it is an appropriate (though again not sufficient) way of framing the AGI problem.
 

> Another question is: Is any currently available probabilistic
> programming infrastructure fully ready
> to support scalable probabilistic programming on real-time observation
> data...
>
> I think the answer is NO... or maybe (??) NOT QUITE

Definitely.


> Regarding the comparison btw probabilistic logic and probabilistic
> programming, I would note that
>
> -- dealing with quantifiers and their binding functions in
> probabilistic logic is a pain in the ass
>
> -- dealing with execution traces in probabilistic programming is a
> pain in the ass
>
> [But ofc, to do probabilistic program learning in any AGI-ish sense,
> you need to be modeling execution traces
> and all the variable state changes and interrelationships in there etc. ]
>
> So there is copious mess about variables, of different sorts, in both
> paradigms..

Sure.


> When we extend these methods to 2nd and 3rd order
> probability distros, we run into the
> issue that doing probabilistic program learning via MC sampling or
> anything similar to that becomes
> extremely slow....   One then wants to do inference to bypass the need
> for sampling.   But what kind
> of inference?  Perhaps PLN type abductive and inductive inference?
> In this case one needs the probabilistic
> logic in order to actually do learning over probabilistic programs
> without incurring unrealistic overhead...


Exactly. Probabilistic logic is a way to make inference over probabilistic programs much more efficient. I have specific examples for this in mind.

> Overall, my feeling is that probabilistic programming will be better
> for procedural knowledge, and probabilistic
> logic will be better for declarative knowledge

Hmm... not precisely. In the context of probabilistic inference, purely procedural knowledge is the result of specializing a general inference procedure w.r.t. a specific generative model; that is, discriminative models are purely procedural. With the use of generative models, you can infer (and should infer, with the use of search as in probabilistic logic) truth values for any conditional expression, but these models don't say how exactly to calculate these values, so they don't represent procedural knowledge in this sense, and have some features of declarative knowledge. I couldn't call generative models declarative knowledge either. So, I'm slightly confused about how to classify them...

Alexey Potapov

May 21, 2018, 3:51:08 PM
to Nil Geisweiller, Ben Goertzel, opencog, Константин Тимофеев, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
Hi Nil,

> Currently, as you probably already understand, the (only?) way to match values is to resort to grounded schemata.
 
Conceptually, this might be OK. We have conscious processes implemented in self-reflective Atomese and operating over atoms, and subconscious processes operating over values in native code. However, what we lack in this case is meta-computation: specialization of conscious decision-making w.r.t. some specific task should yield its efficient implementation in native code (or trained DNNs). What we also lack is a general API to connect these conscious and subconscious processes.

> For similar reasons PLN formulas are programmed with grounded schemata. A way to address that would be to complement Atomese with links encoding operators to access and modify values, GetValueLink, etc. This wouldn't make the pattern matcher more efficient (initially), but at least it would allow OpenCog to reason about values.
 

What do you expect GetValueLink to do?
Do you mean that this link is needed for the Pattern Matcher to explicitly know that we want to apply TimesLink, etc. to Values? I guess it makes sense, although this might not be too efficient, indeed.


-- Alexey

Linas Vepstas

May 21, 2018, 4:11:40 PM
to Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Hi Alexey,

I will answer this email in several parts.  Re: atoms vs values, my thinking is this:

-- Use Atoms to represent the "topology" of a network: what is connected to what.  Atoms express (long-term, slowly-varying) relationships between things.

-- Use Values to hold fast-changing data.  For example, you could have a (C++) VideoValue object that, when you attach to it, provides you with a video stream.  (Perhaps you want a VideoProducerValue and a VideoConsumerValue. I have not thought about that very much).  The point is that, using today's code base, as it exists right now, you could write the code for a VideoValue object "in an afternoon", and it would work, with no performance bottlenecks, no excess RAM usage, no excess CPU overhead.  (The "afternoon" might actually be a few days -- but it would not be a few weeks.  You can get started now.)

I want to say that Values can be used to carry things that "flow around on the network", but this idea has not been explored very much.  Right now, Values are only "fast-changing-things attached to an atom".  How that Atom might represent the "topology" (the connection) between "things" does not yet have any clear policy.   I have been advocating the idea of using "connectors" to connect things. I've tortured Anton Kolonin and the language-learning crew with this idea, but the concept of forming connections is more general than just linguistics.

Should I repeat some basics?  Atoms are heavy-weight, precisely because they create and update caches of what they are connected to.  That makes it easy and fast to find what an atom is connected to, but slow to actually make the atom.  Atoms are also held in an index, so that they can be searched by name, by type. Insertion into an index is expensive -- and stupid, if you never use the index.  Values avoid this overhead.

--linas


--
cassette tapes - analog TV - film cameras - you

Alexey Potapov

May 21, 2018, 4:40:33 PM
to Linas Vepstas, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Linas,
 

> I will answer this email in several parts.  Re: atoms vs values, my thinking is this:
>
> -- Use Atoms to represent the "topology" of a network: what is connected to what.  Atoms express (long-term, slowly-varying) relationships between things.
>
> -- Use Values to hold fast-changing data.  For example, you could have a (C++) VideoValue object that, when you attached to it, provided you with a video-stream.  (Perhaps you want a VideoProducerValue and a VideoConsumerValue. I have not thought about that very much).  The point is that, using today's code base, as it exists right now, you could write the code for a VideoValue object "in an afternoon", and it would work, with no performance bottlenecks, no excess RAM usage, no excess CPU overhead.  (The "afternoon" might actually be a few days -- but it would not be a few weeks.  You can get started now.)
>
> I want to say that Values can be used to carry things that "flow around on the network", but this idea has not been explored very much.  Right now, Values are only "fast-changing-things attached to an atom".  How that Atom might represent the "topology" (the connection) between "things" does not yet have any clear policy.   I have been advocating the idea of using "connectors" to connect things. I've tortured Anton Kolonin and the language-learning crew with this idea, but the concept of forming connections is more general than just linguistics.
>
> Should I repeat some basics?  Atoms are heavy-weight, precisely because they create and update caches of what they are connected to.  That makes it easy and fast to find what an atom is connected to, but slow to actually make the atom.  Atoms are also held in an index, so that they can be searched by name, by type. Insertion into an index is expensive -- and stupid, if you never use the index.  Values avoid this overhead.


Yes, this is clear (although I'd like to know more about your ideas regarding connectors), and I agree with this. But this is not an answer to the question, what is the best way to integrate DNNs with Atomese.

-- Alexey

Linas Vepstas

May 21, 2018, 5:11:29 PM
to Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Ongoing breakup of TL;DR into small pieces.

On Sun, May 20, 2018 at 10:53 AM, Alexey Potapov <pot...@aideus.com> wrote:

> Both tasks can be considered as a part of the Semantic Vision problem, but their solution can be useful in a more general context.
>
> OpenCog + Tensorflow
> Depth of OpenCog+Tensoflow integration can be quite different. Shallow integration implies that Tensorflow is used as an external module, and communication between Tensorflow and OpenCog is limited to passing activities of neurons, which are represented both by Tensorflow and Atomspace nodes.
> The most restricted way is just to run (pre-trained) TF models on input data and to set values of Atomspace nodes in correspondence with the activities of output neurons. What will be missing in this case: feedback connections from the cognitive level to the perception system; online (and joint) training of neural networks and OpenCog.
> Let us consider the Visual Question Answering (VQA) task as a motivating example. How will OpenCog be able to answer such questions as “What is the color of the dress of the girl standing to the left of the man in a blue coat?” If our network is pre-trained to detect and recognize all objects in the image and supplement them with detailed descriptions of colors, shapes, poses, textures, etc., then Pattern Matcher will be able to answer such questions (converted to corresponding queries). However, this approach is not computationally feasible: there are too many objects in images, and too many grounded predicates which can be applied to them.

Is that true? Maybe. Certainly, "color of a dress" is a long-term durable property of a dress: it will not change for hours. In that sense, it is appropriate to record it, statically, in the AtomSpace.

One form of autism, I am told, is that the brain is overwhelmed with sensory data: one is seeing and hearing everything, and cannot focus on any one thing.  Perhaps this could become a risk for the atomspace. But -- "too many objects in images, and too many grounded predicates" -- How many are we talking about, here? dozens, hundreds of objects? hundreds of predicates per object? That is 100x100 = 10K and, currently, you can create and add maybe 100K atoms/sec to the atomspace (via C++, less by scheme, python, due to wrapper overhead). So this seems manageable.

Of course, it can be much more efficient to "not notice something until someone asks you about it". And then you can respond, and say "Hey, I never noticed that before, but yes, now that you asked, I can now clearly see that her dress is blue".   My son was trampling over flowers, the other day, which, for some reason, he had not noticed until I pointed at them. Odd, since they were bright blue, albeit quite small.
 
> Thus, the question should influence the process of how the image is interpreted.
> For example, even if we detected bounding boxes (BBs) for all objects and inserted them into AtomSpace, predicate “left to” is not immediately evaluated to all pairs of BBs. Instead, it will be evaluated during query execution by Pattern Matcher (hopefully) only for relevant BBs labeled as “girl” and “man”.
Yes.
 
Similarly, the grounded predicate “is blue”, implemented by a neural subnetwork, can be computed only in the course of query execution, which means that the work of the Pattern Matcher should extend down to the neural-network level.

There is a generic mechanism called "GroundedPredicateNode", and it can call arbitrary C++/scheme/python/haskell code, which must return a true/false value.  True means "yes, match and continue with the rest of the query".

Unfortunately, GroundedPredicateNodes are "black boxes"; we do not know what is inside. Thus, it is useful to sometimes define "clear boxes":  for example: GreaterThanLink.  The GreaterThanLink can handle an infinite number of inputs, but it is not a black box: we know exactly what kind of inputs it expects, what it produces, what it does.  Thus, it is possible to perform logical reasoning on GreaterThanLinks, and/or perform algebraic simplification (a<b<c implies a<c, etc)
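The payoff of a "clear box" can be sketched in a few lines of Python (purely illustrative; the real GreaterThanLink is C++ inside the AtomSpace): because the link's semantics are known, a reasoner can close a set of greater-than facts under transitivity without ever evaluating a number.

```python
from itertools import product

def transitive_closure(greater_than):
    """Close a set of (a, b) meaning 'a > b' under transitivity.
    No numeric values are ever inspected -- possible only because
    the semantics of 'greater than' are known (a clear box)."""
    closed = set(greater_than)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))   # a > b and b > d, hence a > d
                changed = True
    return closed

facts = {("c", "b"), ("b", "a")}          # c > b, b > a
assert ("c", "a") in transitive_closure(facts)   # derived: c > a
```

A black-box GroundedPredicateNode admits no such derivation: the only thing one can do with it is call it.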

 
Indeed, purely DNN solutions for VQA usually implement some top-down processes, at least in the form of attention mechanisms.
Apparently, cognitive feedback to perception is necessary for AGI in general.
It is not a problem to feed Tensorflow models with data generated by OpenCog via placeholders, but OpenCog will also need some interface for executing computational graphs in Tensorflow. This can be done by binding the corresponding Session.run calls to GroundedPredicate/GroundedSchema nodes.
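A rough sketch of that binding (pure Python; the stub function stands in for an actual Session.run call, and none of the names here are the real OpenCog or TensorFlow API):

```python
# Toy grounded-schema registry: dispatch by name to arbitrary
# host-language code, the way a GroundedSchemaNode does conceptually.
GROUNDED_SCHEMAS = {}

def register_schema(name, fn):
    GROUNDED_SCHEMAS[name] = fn

def execute_schema(name, *args):
    """Execute the host-language procedure bound to `name`."""
    return GROUNDED_SCHEMAS[name](*args)

# Stub playing the role of sess.run(output, feed_dict={placeholder: x});
# pretend this doubling is a trained network's forward pass.
def fake_session_run(inputs):
    return [2.0 * x for x in inputs]

register_schema("py:run_tf_model", fake_session_run)
assert execute_schema("py:run_tf_model", [1.0, 3.0]) == [2.0, 6.0]
```

The real binding would replace `fake_session_run` with a closure holding a session and a computational graph, but the dispatch shape is the same.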

... or to Values.

The question is how to combine OpenCog and neural networks on the algorithmic level. Let us return to the VQA query considered above. We can imagine a grounded schema node, which detects all bounding boxes with a given class label and inserts them into Atomspace,

For example, one creates a ConceptNode "dress". One also creates a PredicateNode "*-bounding-box-*". Then one writes C++ code to implement the TensorFlowBBValue object. One then associates all three:

(cog-set-value! (Concept "dress") (Predicate "*-bounding-box-*") (TensorFlowBBValue "obj-id-42"))

What is the current bounding box for that dress?  I don't know, but I can find out:

(cog-value->list (cog-value (Concept "dress") (Predicate "*-bounding-box-*")))

returns 2 or 4 floating point numbers, as a list.    Is Susan wearing that dress?

(cog-set-value! (Concept "Face-of-Susan") (Predicate "*-bounding-box-*") (TensorFlowBBValue "obj-id-66"))

(define (is-near? A B)
  (> 0.1 (distance (cog-value A (Predicate "*-bounding-box-*"))
                   (cog-value B (Predicate "*-bounding-box-*")))))

returns true if there is less than 0.1 meters distance between the bounding boxes on A and B.

The actual location of the bounding boxes is never stored, and never accessed, unless the is-near? predicate runs.
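The lazy-lookup idea above can be sketched in Python (illustrative names only, not the actual AtomSpace API): each atom is a small key-to-value store, and the bounding box behind a key is computed only at the moment a predicate asks for it.

```python
import math

class Atom:
    """Toy atom: a name plus a key -> thunk store of values."""
    def __init__(self, name):
        self.name = name
        self._values = {}             # key -> thunk producing the value

    def set_value(self, key, thunk):
        self._values[key] = thunk     # nothing is computed yet

    def get_value(self, key):
        return self._values[key]()    # computed on demand

BB_KEY = "*-bounding-box-*"

dress = Atom("dress")
susan = Atom("Face-of-Susan")
# The lambdas stand in for TensorFlowBBValue objects fetching live data.
dress.set_value(BB_KEY, lambda: (0.30, 0.70))
susan.set_value(BB_KEY, lambda: (0.32, 0.65))

def is_near(a, b, threshold=0.1):
    ax, ay = a.get_value(BB_KEY)      # fetched only now, when asked
    bx, by = b.get_value(BB_KEY)
    return math.hypot(ax - bx, ay - by) < threshold

assert is_near(dress, susan)
```

Until `is_near` runs, no coordinates exist anywhere in the "atomspace"; only the thunks do.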

 
so that the Pattern Matcher or the Backward Chainer can further evaluate some grounded predicates over them, finally finding an answer to the question. However, the question can be “What is the rightmost object in the scene?” In this case, we don't expect our system to find all objects, but rather to examine the image starting from its right border.

This is uglier, and there are several reasonable solutions. It depends on whether or not you want to waste CPU cycles maintaining a left-right sorted list of objects. Performing ten sorts per second is expensive if you are almost never interested in the right-most object.
  
We can imagine queries presupposing other strategies of image processing/examination. In general, we would like not to hardcode all possible cases, but to have a general mechanism that can be trained to execute different queries.
Yes.
 

To make neural networks transparent to the Pattern Matcher, we need to make the nodes of Tensorflow graphs inhabitants of Atomspace as well.
Yes.
 
The same is needed for the general case of unsupervised learning. In particular, architecture search is needed in order to achieve better generalization with neural networks, or simply to choose an appropriate structure for the latent code. Thus, OpenCog should be able to add or delete nodes in Tensorflow graphs.

Yes. Atoms are best used for representing graphs and relationships that are stable over long periods of time (more than a few seconds, on current-generation CPU's)

These nodes correspond not just to neural layers, but also to operations over them. One can imagine TensorNode nodes connected by PlusLink, TimesLink, etc.

Yes.  However, we might also need PlusValue or TimesValue.  I do not know why, yet, but these are potentially useful, as well.
 
There can be tricky technical issues with Tensorflow (e.g. compilation of dynamic graphs), but they should be solvable.
A conceptual problem is that the Pattern Matcher works with Atoms, not with Values. Apparently, the activities of neurons should be Values. However, the evaluation of, e.g., GreaterThanLink requires NumberNode nodes.

This is a historical accident. GreaterThanLink and NumberNodes were invented long before the idea of Values became clear. Now that the usefulness of Values is becoming clear, it's time to redesign GreaterThanLink.

Perhaps we need an IsLeftOfLink that knows automatically to obtain the "*-centroid-*" value on two atoms, and then return true/false depending on the result (or throw exception if there is no *-centroid-* value.)

The pattern matcher can then work, without any modification at all, with IsLeftOfLink.  I assume the same would be true for URE/PLN.

Bonus: because IsLeftOfLink is a "clearbox" link, we can reason about it, without actually having to access any values. We know that, "A left-of B left-of C" implies that "A left-of C"
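That transitivity claim is easy to sketch (hypothetical Python, not a real IsLeftOfLink): given only stored "A left-of B" relations, one can decide "A left-of C" by reachability, without touching any centroid values.

```python
def implies_left_of(facts, a, c):
    """Decide 'a left-of c' purely from a set of (x, y) facts meaning
    'x left-of y', by transitivity alone -- no coordinates inspected."""
    frontier, seen = [a], set()
    while frontier:
        x = frontier.pop()
        if x == c:
            return True
        seen.add(x)
        frontier += [y for (p, y) in facts if p == x and y not in seen]
    return False

facts = {("girl", "man"), ("man", "tree")}
assert implies_left_of(facts, "girl", "tree")      # derived, values untouched
assert not implies_left_of(facts, "tree", "girl")
```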

--linas
 
Operations over (truth) values are usually implemented in Scheme, within rules fed to the URE. This might be enough for dealing with individual neuron activities as truth values, and with neural networks as grounded predicates, but patterns in values cannot be matched or mined directly (while the idea of SynerGANs implied the necessity of mining patterns in the activities of the neurons of the latent code).

I was going to illustrate with concrete examples the same kind of problems with implementing probabilistic programming in OpenCog, but I guess this is already TL;DR.

So, briefly speaking, we need the Pattern Matcher and Pattern Miner to work over Values/Valuations, which is not the case now (OpenCog uses only truth and attention values, and Atomese/Pattern Matcher doesn't have built-in semantics even for those). I cite Linas here:
"Atoms are:
* slow to create, hard to destroy
* are indexed and globally unique
* are searchable
* are immutable

Values are:
* fast and easy to create, destroy, change
* values are highly mutable.
* values are not indexed, are not searchable, are not globally unique."
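The quoted contrast can be made concrete with a toy sketch (illustrative classes, not the AtomSpace API): atoms are interned in an index and come back globally unique, while values are plain mutable objects that nothing tracks.

```python
class AtomSpace:
    """Toy atomspace: atoms are indexed, searchable, globally unique."""
    def __init__(self):
        self._index = {}

    def add_node(self, type_, name):
        key = (type_, name)
        if key not in self._index:          # interning: create at most once
            self._index[key] = key
        return self._index[key]

    def find(self, type_, name):
        return self._index.get((type_, name))   # searchable by type + name

class FloatValue:
    """Toy value: not interned, not indexed, freely mutable."""
    def __init__(self, floats):
        self.floats = list(floats)

space = AtomSpace()
a = space.add_node("ConceptNode", "dress")
b = space.add_node("ConceptNode", "dress")
assert a is b                               # unique: the same atom returns

v1, v2 = FloatValue([0.3]), FloatValue([0.3])
assert v1 is not v2                         # values: two distinct objects
v1.floats[0] = 0.9                          # mutation is cheap and local
```

The "fast but searchable" entities asked for below would need both halves at once, which is exactly the tension.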

But we need entities that are "fast and easy to create, destroy, change; highly mutable, but searchable". So this is not only a technical but also a conceptual problem...

I would really like to hear your opinion on this. What should we do? Resort to the most shallow integration between OpenCog and DNNs? In this case, SynerGANs will not work, since we will not be able to mine patterns in values, and we will not be able to use the Pattern Matcher to solve VQA. Express the output of DNNs as Atoms? Linas objected even to the idea of expressing the coordinates and labels of bounding boxes as Atoms; doing this with the activities of neurons would be even worse. Put everything into the Space-Time server? Then the idea of using the power of the Pattern Matcher, URE, etc. would not be achievable. Extend the Pattern Matcher to work with Values? Maybe... I like the idea of embedding the TF computational graph into Atomspace, but tf.mul works over Values (tensors), not NumberNodes. Thus, in this case, it would be required to make all links (like TimesLink) work not only with NumberNodes, but also with Values... though I foresee objections from Linas here. Also, I believe this would be useful in general, since Values are not first-class objects in Atomese: you have to use Scheme/Python/C to describe how to recalculate truth values; you cannot reason about them directly...

Or should we try to use a sort of PPL as a bridge between Values and Atoms? Maybe... Or we should do something unifying all of these.


The question is not just about binding vision and PLN. It is more general. Say, if you are driving a car, you estimate the distances and velocities of other cars and take actions on this basis. These are also Values, and you 'reason' over them using both 'number crunching' and 'logic' simultaneously (I don't mean procedural knowledge here in the sense of GroundedSchemaNode). So, I don't think that we should limit ourselves to a shallow integration and use DNNs/PPLs/etc. only peripherally...


Ben Goertzel <b...@goertzel.org>:

if one stays in the world of finite discrete
distributions, one can construct probabilistic logics with
sampling-based semantics... https://arxiv.org/pdf/1602.06420.pdf


Sounds quite interesting. I'll study it in detail...

 -- Alexey


Linas Vepstas

May 21, 2018, 5:28:55 PM
to Ben Goertzel, Alexey Potapov, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Sun, May 20, 2018 at 4:45 PM, Ben Goertzel <b...@goertzel.org> wrote:


Hmm, well when I think about the algorithms involved, I do not see why
the Pattern Miner and Pattern Matcher would be unable to search for
patterns involving Values...

Here is the key difference between the two: Atoms know what they are attached to. A link knows its outgoing set: it's a C++ vector, accessible in nanoseconds. Any atom knows its incoming set: a set of C++ "weak pointers", accessible in fractions of a microsecond. Thus, graph traversal is very fast.
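A minimal Python sketch of this layout (toy classes, not the C++ implementation): links hold direct pointers to their outgoing atoms, and each atom keeps weak back-references to its incoming links.

```python
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.incoming = weakref.WeakSet()   # links pointing at this atom

class Link:
    def __init__(self, *outgoing):
        self.outgoing = list(outgoing)      # direct pointers: O(1) access
        self.incoming = weakref.WeakSet()
        for atom in outgoing:
            atom.incoming.add(self)         # register the back-pointer

girl, man = Node("girl"), Node("man")
rel = Link(girl, man)

assert rel.outgoing[0] is girl              # link -> atoms, instantly
assert rel in girl.incoming                 # atom -> links, no global search
```

A bare value has neither list, which is precisely why, given a value, one cannot find its holder without scanning everything.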

To create an atom, you have to create the incoming and outgoing sets and perform various consistency checks. This takes tens of microseconds.

Values have no clue who they belong to. Given a Value, it's impossible to know which atom it belongs to (except by searching all atoms). Thus, creating values, and altering them, is fast.

Values are stored "inside of" atoms, under a key.  Thus, every atom
is a key-value database (i.e. every atom is a stand-alone noSQL
database).

To use Values in the pattern matcher, you have to create a predicate that can answer the question: "does this atom have a value filed under key X, and is that value the desired shape/form/value?" (viz., you have to perform a "nosql lookup" that returns yes/no as an answer).
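Sketched in Python (illustrative structures, not the real pattern-matcher callback API), such a predicate is just a guarded lookup that reduces a value question to a yes/no answer:

```python
def has_value(atom_values, key, want):
    """Does this atom's key -> value store hold something under `key`,
    and does that value have the desired form?  A nosql-style lookup
    collapsed to a bool, usable as a match filter."""
    return key in atom_values and want(atom_values[key])

# A toy atom represented directly as its key -> value store.
atom = {"*-bounding-box-*": (0.30, 0.70, 0.05, 0.10)}

assert has_value(atom, "*-bounding-box-*", lambda v: len(v) == 4)
assert not has_value(atom, "*-centroid-*", lambda v: True)
```

The pattern matcher itself never traverses values; it only asks this predicate whether to accept a candidate grounding.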

--linas










--
Ben Goertzel, PhD
http://goertzel.org

"Only those who will risk going too far can possibly find out how far
they can go." - T.S. Eliot

Linas Vepstas

May 21, 2018, 5:52:31 PM
to Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Mon, May 21, 2018 at 3:40 PM, Alexey Potapov <pot...@aideus.com> wrote:

Yes, this is clear (although I'd like to know more about your ideas regarding connectors),

Sure; say when. I can talk about them for days.  Unfortunately, the core idea is so simple, so obvious, that it becomes very difficult to talk about the advanced concepts, so this needs to be a distinct conversation.

 
and I agree with this. But this is not an answer to the question, what is the best way to integrate DNNs with Atomese.

"The best way" is a very abstract question.  I know two ways of answering it.

a) I study DNN's for a long time, think hard about DNN's for a long time, find an answer that I like, and then attempt to explain it to a disbelieving, perplexed, ungrateful, hostile audience.

b) You ask me narrow, focused questions about certain specific tasks, and I answer how they could be accomplished, and how much work that would take.

I find that b) is much easier.

Currently, I don't know what more you want, besides what I've already written, in the last several emails.

Alexey Potapov

May 21, 2018, 7:11:20 PM
to opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
2018-05-22 0:11 GMT+03:00 Linas Vepstas <linasv...@gmail.com>:

How many are we talking about, here? dozens, hundreds of objects? hundreds of predicates per object? That is 100x100 = 10K and, currently, you can create and add maybe 100K atoms/sec to the atomspace (via C++, less by scheme, python, due to wrapper overhead). So this seems manageable.

Thousands or even millions of objects. I can ask you a question about a speck of dust sparkling in the sunlight, a hot pixel on your screen, a tiny birthmark on a face, a hole in a button with a thread passing through it, etc. Each pixel belongs to tens of "objects"...
Predicates per object? Maybe dozens or hundreds, yes, but it's difficult to count.
The problem is not (only) in Atomspace, but in DNNs. They cannot recognize small objects in a bottom-up way. A thread doesn't differ from a wire. A chair leg doesn't differ from a pencil. We need to reconstruct context, go back, analyze the image again, etc. This already requires some higher-level knowledge, not only bottom-up discriminative models. It requires too much computation to analyze each pixel in each context/at each scale...
Maybe it will be possible to analyze images and construct a more or less complete description of them given much more computational resources than are available now (but not unrealistically much) -- this is not how humans do it, but computers may have greater capabilities (in the future). But still, questions can be about a priori unknown objects, which pre-trained discriminative models cannot detect (or whose interpretation has low probability: "does this cloud look more like a horse or an elephant?").
Thus, I believe, we cannot limit ourselves to too loose an integration of reasoning and perception.
Queries/questions may force us to analyze images in a different way, and this can be viewed as an extension of the OpenCog-style pattern matcher to lower perception levels.
 

Similarly, the grounded predicate “is blue”, implemented by a neural subnetwork, can be computed only in the course of query execution, which means that the work of the Pattern Matcher should extend down to the neural-network level.

There is a generic mechanism called "GroundedPredicateNode", and it can call arbitrary C++/scheme/python/haskell code, which must return a true/false value.  True means "yes, match and continue with the rest of the query".

Unfortunately, GroundedPredicateNodes are "black boxes"; we do not know what is inside. Thus, it is useful to sometimes define "clear boxes":  for example: GreaterThanLink.  The GreaterThanLink can handle an infinite number of inputs, but it is not a black box: we know exactly what kind of inputs it expects, what it produces, what it does.  Thus, it is possible to perform logical reasoning on GreaterThanLinks, and/or perform algebraic simplification (a<b<c implies a<c, etc)

Yes, I meant this.

 

The question is how to combine OpenCog and neural networks on the algorithmic level. Let us return to the VQA query considered above. We can imagine a grounded schema node, which detects all bounding boxes with a given class label and inserts them into Atomspace,

For example, one creates a ConceptNode "dress". One also creates a PredicateNode "*-bounding-box-*". Then one writes C++ code to implement the TensorFlowBBValue object. One then associates all three:

(cog-set-value! (Concept "dress") (Predicate "*-bounding-box-*") (TensorFlowBBValue "obj-id-42"))

What is the current bounding box for that dress?  I don't know, but I can find out:

(cog-value->list (cog-value (Concept "dress") (Predicate "*-bounding-box-*")))

returns 2 or 4 floating point numbers, as a list.    Is Susan wearing that dress?

(cog-set-value! (Concept "Face-of-Susan") (Predicate "*-bounding-box-*") (TensorFlowBBValue "obj-id-66"))

(define (is-near? A B)
  (> 0.1 (distance (cog-value A (Predicate "*-bounding-box-*"))
                   (cog-value B (Predicate "*-bounding-box-*")))))

returns true if there is less than 0.1 meters distance between the bounding boxes on A and B.

The actual location of the bounding boxes is never stored, and never accessed, unless the is-near? predicate runs.

Well... the problem is that our system should learn most of this somehow, and it cannot learn C++ or Scheme code (at least for now, without meta-computations). We would like to hardcode as little as possible. We can (and likely should) code TensorFlowValues and cog-set-value! them, but we would like to avoid hardcoding (is-near? A B).
 

 Yes... Yes...

Good that we are on the same page here.
 

These nodes correspond not just to neural layers, but also to operations over them. One can imagine TensorNode nodes connected by PlusLink, TimesLink, etc..

Yes.  However, we might also need PlusValue or TimesValue.  I do not know why, yet, but these are potentially useful, as well.

This is exactly my question whether we need them or not :)
Nil also proposed to use GetValueLink...
 

 
There can be tricky technical issues with Tensorflow (e.g. compilation of dynamic graphs), but they should be solvable.
A conceptual problem is that the Pattern Matcher works with Atoms, not with Values. Apparently, the activities of neurons should be Values. However, the evaluation of, e.g., GreaterThanLink requires NumberNode nodes.

This is a historical accident. GreaterThanLink and NumberNodes were invented long before the idea of Values became clear. Now that the usefulness of Values is becoming clear, it's time to redesign GreaterThanLink.

Nice.
 

Perhaps we need an IsLeftOfLink that knows automatically to obtain the "*-centroid-*" value on two atoms, and then return true/false depending on the result (or throw exception if there is no *-centroid-* value.)

Sorry, I didn't precisely get this. What is a centroid and how is it connected to IsLeftOf?
 
-- Alexey

Ben Goertzel

May 21, 2018, 10:09:19 PM
to opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
Hi Alexey,

> Yes, it should be quite possible algorithmically. And that's exactly why we
> discuss this - because we want to use PM algorithms on Values. However, to
> implement this, some architectural and organizational decisions should be
> made (should we generalize existing values to tensors or introduce a
> separate type of values; should we overload TimesLink, etc. to work both
> with NumberNodes and Values, or introduce new types of Links, or introduce
> special links that "atomize" values, etc.; should this be done in
> a separate repo with keeping core PM algorithms unchanged, or should the
> core PM be modified, and by whom, etc.). We have few guys who can work on
> this, but we need to know the preferable way.

I think it's better, if possible, to figure out a way to suitably
modify the core PM rather than
using a separate repo ...

However, I guess the PM tweaks would need to be done by someone on your team, as
Linas and Nil probably are too busy and we don't have a lot of others
who can rapidly
perform such changes...

I would personally be in favor of overloading stuff like TimesLink in
order to apply to both
NumberNodes and Values, because it seems to me that the Atom/Value
distinction is more of
an efficiency-driven implementation distinction rather than a
fundamental mathematical/conceptual
distinction...

Nil and Linas should be consulted on this stuff, but at this point you
are also in the exalted
"inner circle" with foundational input on these OpenCog-architecture issues...

>> Now coordinate values of bounding boxes ... If we are talking about
>> something like the bounding box of Ben's face during a conversation,
>> which changes frequently, this would be appropriately stored in the
>> Atomspace using a StateLink,
>>
>> https://wiki.opencog.org/w/StateLink
>>
>
> We considered StateLink as a way to feed OpenCog with observations within
> the reinforcement learning direction. But the current question remains the
> same: should we use NumberNodes or Values?..

See my comments below on that... maybe we want some special TensorValues ..

> Also, DNNs are trained on (mini-)batches. It is not too natural from an
> autonomous agent perspective, but efficient.

Yes I see. Again maybe some new TensorValue construct will be needed, we just
need to understand clearly what the requirements are in terms of any special
indexing etc.

> Difference between Atoms and Values is relevant, but this relevance will be
> much better seen when we go from just Atoms vs Values to the inference
> processes over them (declarative logic represents computations inversely;
> and back inversion to direct computations performed by processors is done by
> the inference engine; that's why logic poorly deals with number crunching,
> i.e. Values manipulation, while it is good for reasoning over Atoms), which
> I have not yet discussed on a technical level. However, I mentioned this
> problem in my long message on example of PM application to VQA. Maybe we
> should not discuss all these question simultaneously, but I can try to
> elaborate on this if you wish.

The difference between Atoms and Values is just an implementation-efficiency
tactic...

Values as currently implemented have some properties of Atoms but not others...

Possibly different implementation-efficiency tactics may be of value
in a "tensorial
Atomese" context...

If needed we could also introduce some sort of entity that is between
a Value and an Atom
in some sense -- i.e. we could introduce some sort of TensorValue entity
that

1) Perhaps, knows what links to it (like an Atom but unlike a Value)

2) has an internal tensor that is mutable

There is nothing prohibiting one from building something like this
into Atomspace,
though obviously not breaking various mechanisms would require some care...
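One possible reading of this in-between entity, sketched in Python (purely hypothetical; no such class exists in the AtomSpace): a TensorValue that knows which atoms hold it, like an Atom, while carrying a mutable internal tensor, like a Value.

```python
import weakref

class TensorValue:
    """Hypothetical hybrid: mutable payload + back-pointers to holders."""
    def __init__(self, tensor):
        self.tensor = list(tensor)           # mutable internal tensor
        self.holders = weakref.WeakSet()     # atoms holding this value

class Atom:
    def __init__(self, name):
        self.name = name
        self._values = {}

    def set_value(self, key, value):
        self._values[key] = value
        value.holders.add(self)              # unlike a plain Value,
                                             # the value learns its holder

a = Atom("layer-7-activations")
tv = TensorValue([0.1, 0.2, 0.3])
a.set_value("*-activation-*", tv)

tv.tensor[0] = 0.9                           # cheap in-place mutation
assert a in tv.holders                       # and it knows who holds it
```

The cost, of course, is maintaining those back-pointers on every set, which is part of what makes real Atoms slow to create.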

>> One question is: Is probabilistic logic an appropriate method for the
>> core of an AGI system, given that this AGI system must proceed largely
>> on observation-based semantics ...
>>
>> I think the answer is YES
>
>
> I think it is necessary but not sufficient

Sure, clearly I agree w that which is why OpenCog has all this other
shit in it too ;) ... and then the interesting questions come up, like
which other methods do we need and how do they need to interoperate...

>
> Exactly. Probabilistic logic is a way to make inference over probabilistic
> programs much more efficient. I have specific examples for this in mind.

It will be good to hear the examples when you have time...

>> Overall, my feeling is that probabilistic programming will be better
>> for procedural knowledge, and probabilistic
>> logic will be better for declarative knowledge
>
>
> Hmm... not precisely. In the context of probabilistic inference, purely
> procedural knowledge is the result of specialization of a general inference
> procedure w.r.t. specific generative model, that is, discriminative models
> are purely procedural. With the use of generative models, you can infer (and
> should infer with the use of search like in probabilistic logic) truth
> values for any conditional expression, but these models don't say how
> exactly to calculate these values, so they don't represent procedural
> knowledge in this sense, and have some features of declarative knowledge. I
> couldn't call generative models a declarative knowledge either. So, I'm
> slightly confused how to classify them...

Yes, the language we have for describing these things introduces confusions...

For instance, I like to think about evolutionary programming (e.g.
MOSES) as a tool for learning procedural knowledge, but OTOH our main
use of this tool right now is for learning classification rules. Now
a program embodying a classification rule is, in a sense, a "procedure
for performing the classification" ... but then in this sense, every
logical inference is also a cognitive procedure ;p

So you're right, we don't have the right language for describing which
problems are best addressed by a programming-language-ish approach and
which by a logic-ish approach.... (and noting that there is a
sorta-fast conversion btw the two approaches... even so in practice
the conversion is not sooo fast as to obviate the value of looking at
the two approaches sorta-separately, at the moment..)

This seems a solvable language/conceptual problem, but I don't have
time to think about it hard right now either...

ben

Nil Geisweiller

May 22, 2018, 1:35:04 AM
to Alexey Potapov, Nil Geisweiller, Ben Goertzel, opencog, Константин Тимофеев, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
On 05/21/2018 10:51 PM, Alexey Potapov wrote:
> processes operating over values in a native code. However, what we lack
> in this case is meta-computations: specialization of a conscious
> decision-making w.r.t. some specific task should yield its efficient
> implementation in a native code (or trained DNNs). What we also lack is
> a general API to connect these conscious and subconscious processes.

Good point. I believe a possible way to connect these is via
reasoning, initially at least before this turns into schemata as well
(some of it anyway), something that is able to weight the pros and
cons of using a subconscious efficient process vs a more conscious and
general one.

What seems to happen in humans is that high-confidence knowledge tends to go subconscious, while lower-confidence knowledge tends to be the subject of attention. So, for instance, at first you may focus on edge detection, etc.; once you've built some high-confidence model about the relationships within this domain, you move that to the subconscious (the neocortex? I'm not enough into neuroscience to tell), and then you can focus on the next abstraction.

So, transposed to OpenCog, I don't really know; it could mean, for instance, that once you have built, say, an ImplicationLink with sufficient confidence (with a Dirac-like second-order distribution), you no longer need to bother updating it, unless perhaps the likelihood of the incoming data deviates considerably from normal, in which case it may mean that you need to go back to basics (I suspect psychedelic drugs do something like that, though I don't know whether that's a side effect or a fundamental effect; I lack the experience to really tell).

> For similar reasons PLN formulas are programmed with grounded
> schemata. A way to address that would be to complement Atomese with
> links encoding operators to access and modify values, GetValueLink,
> etc. This wouldn't make the pattern matcher more efficient
> (initially), but at least it would allow OpenCog to reason about values.
>
>
> What do you suppose GetValueLink to do?

For instance

GetValueLink
<atom>
<key>

would return the value corresponding to `key` in `atom`. Note that `key` is itself an atom. However, the returned value may not necessarily be an atom; it may be a proto-atom, in which case it would need to be "atomized". This makes me think that ProtoAtoms probably need some "atomize" method or something.
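The GetValueLink-plus-atomize pipeline can be sketched in Python (hypothetical names throughout; this is not a proposal for the real API): fetch the raw value stored under a key, and wrap it into a NumberNode-like atom only at the moment downstream links need an atom.

```python
def get_value_link(atom_values, key):
    """What GetValueLink would do: fetch the raw value under `key`."""
    return atom_values[key]

def atomize(value):
    """Wrap a raw (proto-atom) value as a NumberNode-like tuple.
    Volatile: it need not be inserted into any atomspace."""
    return ("NumberNode", str(value))

values = {"*-confidence-*": 0.87}
raw = get_value_link(values, "*-confidence-*")
assert atomize(raw) == ("NumberNode", "0.87")
```

Keeping the atomized result volatile, as Nil suggests, is what would make this cheap: no indexing, no interning, just a transient wrapper for the duration of one evaluation.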

> Do you mean that this link is needed for Pattern Matcher to explicitly
> know that we want to apply TimesLink, etc. to Values? I guess, it makes
> sense, although this might be not too efficient, indeed.

The atomization might be the problem, but if it is meant to be volatile (not stored in the atomspace), then maybe the whole thing can be done efficiently.

It seems it would be good if Atomese offered operators to create, destroy, etc., atomspaces, because then temporary computations could be moved into small atomspaces, which I suspect would still be able to manipulate atoms with less overhead (though I don't know well where the overhead of inserting atoms into an atomspace comes from).

Nil

>
>
> -- Alexey

Nil Geisweiller

May 22, 2018, 1:48:52 AM
to linasv...@gmail.com, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On 05/21/2018 11:11 PM, Linas Vepstas wrote:
> actually make the atom.  Atoms are also held in an index, so that they
> can be searched by name, by type. Insertion into an index is expensive
> -- and stupid, if you never use the index.  Values avoid this overhead.

Could this indexing be made lazy?

Nil

Nil Geisweiller

May 22, 2018, 2:05:32 AM
to Nil Geisweiller, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
On 05/22/2018 08:34 AM, Nil Geisweiller wrote:
>> Do you mean that this link is needed for Pattern Matcher to explicitly
>> know that we want to apply TimesLink, etc. to Values? I guess, it
>> makes sense, although this might be not too efficient, indeed.
>
> The atomization might be the problem, but if it is meant to be
> volatile (not be stored in the atomspace) then maybe the whole thing
> can be done efficiently.

Actually, I don't think the Atomese interpreter (the Instantiator class) inserts intermediate created atoms into the atomspace, so it might actually be almost efficient out-of-the-box.

Nil

Nil Geisweiller

May 22, 2018, 2:10:22 AM
to Alexey Potapov, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On 05/22/2018 02:11 AM, Alexey Potapov wrote:
> Perhaps we need an IsLeftOfLink that knows automatically to obtain
> the "*-centroid-*" value on two atoms, and then return true/false
> depending on the result (or throw exception if there is no
> *-centroid-* value.)
>
>
> Sorry, I did't not precisely get this. What is centroid and how is it
> connected to IsLeftOf?

I think what Linas means is that atoms could optionally have a "*-centroid-*" key, with associated coordinates as the value, and IsLeftOf may use such valuations to calculate this information on the fly.

See the implementations of Atom::setTruthValue and Atom::getTruthValue
to understand what I mean.
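Concretely, the on-the-fly evaluation Nil describes might look like this (toy Python; atoms represented as their key-value stores, names illustrative): IsLeftOf fetches the centroid valuation from each atom when invoked, and raises if one is missing.

```python
CENTROID = "*-centroid-*"

def is_left_of(a_values, b_values):
    """Compare x-coordinates of the *-centroid-* valuations, fetched
    only at evaluation time; raise if either atom lacks one."""
    try:
        (ax, _), (bx, _) = a_values[CENTROID], b_values[CENTROID]
    except KeyError:
        raise ValueError("atom has no *-centroid-* valuation")
    return ax < bx

girl = {CENTROID: (0.2, 0.5)}
man = {CENTROID: (0.6, 0.5)}
assert is_left_of(girl, man)
assert not is_left_of(man, girl)
```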

Nil

Alexey Potapov

May 22, 2018, 10:26:50 PM
to opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
Hi Ben,


I think it's better, if possible, to figure out a way to suitably
modify the core PM rather than using a separate repo ...

However, I guess the PM tweaks would need to be done by someone on your team, as
Linas and Nil probably are too busy and we don't have a lot of others
who can rapidly perform such changes...

I would personally be in favor of overloading stuff like TimesLink in
order to apply to both
NumberNodes and Values, because it seems to me that the Atom/Value
distinction is more of
an efficiency-driven implementation distinction rather than a
fundamental mathematical/conceptual distinction...

Nil and Linas should be consulted on this stuff, but at this point you
are also in the exalted
"inner circle" with foundational input on these OpenCog-architecture issues...

Got this.
 

If needed we could also introduce some sort of entity that is between
a Value and an Atom in some sense -- i.e. we could introduce some sort of TensorValue entity that

1) Perhaps, knows what links to it (like an Atom but unlike a Value)

2) has an internal tensor that is mutable

There is nothing prohibiting one from building something like this
into Atomspace,
though obviously not breaking various mechanisms would require some care...

OK, we will think about this.
 
>
> Exactly. Probabilistic logic is a way to make inference over probabilistic
> programs much more efficient. I have specific examples for this in mind.

It will be good to hear the examples when you have time...

Sure. I'm traveling now, and I'll give a talk at a conference soon. After that, I will have time to go into detail.
 

For instance, I like to think about evolutionary programming (e.g.
MOSES) as a tool for learning procedural knowledge, but OTOH our main
use of this tool right now is for learning classification rules.  Now
a program embodying a classification rule is, in a sense, a "procedure
for performing the classification" ... but then in this sense, every
logical inference is also a cognitive procedure ;p

I also have an idea about using evolutionary programming for training combined Atomese/DNN models (e.g. for SynerGAN-ish models or VQA)...

-- Alexey

Ben Goertzel

unread,
May 22, 2018, 11:06:49 PM5/22/18
to opencog, Константин Тимофеев, Nil Geisweiller, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
> I also have an idea about using evolutionary programming for training combined Atomese/DNN models (e.g. for SynerGAN-ish models or VQA)...

Ah, interesting.

Yes, an advantage of evolutionary algorithms is that they apply to
basically any data type, or any combination thereof, etc. ..

They are roughly equally mediocre for a wide variety of data types and
fitness functions ;)

ben
> --
> You received this message because you are subscribed to the Google Groups
> "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to opencog+u...@googlegroups.com.
> To post to this group, send email to ope...@googlegroups.com.
> Visit this group at https://groups.google.com/group/opencog.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/opencog/CABpRrhzQdwWp-gPQ6mGEjyF6WJpax1ya0MJePgQD6scZbk6aYQ%40mail.gmail.com.
>
> For more options, visit https://groups.google.com/d/optout.

Alexey Potapov

unread,
May 23, 2018, 8:19:49 AM5/23/18
to Linas Vepstas, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Linas,
 

Yes, this is clear (although I'd like to know more about your ideas regarding connectors),

Sure; say when. I can talk about them for days.  Unfortunately, the core idea is so simple, so obvious, that it becomes very difficult to talk about the advanced concepts, so this needs to be a distinct conversation.

If you had a paper (not necessarily a journal-style paper) systematically describing the idea and its application to OpenCog, it would be really nice.
 

b) You ask me narrow, focused questions about certain specific tasks, and I answer how they could be accomplished, and how much work that would take.

I find that b) is much easier.

Currently, I don't know what more you want, besides what I've already written, in the last several emails.

OK, we'll return to this question with more examples.

-- Alexey

Alexey Potapov

unread,
May 23, 2018, 8:44:18 AM5/23/18
to Nil Geisweiller, Ben Goertzel, opencog, Константин Тимофеев, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
Nil,
 

What seems to happen in humans is that high-confidence knowledge tends
to go subconscious, while lower-confidence knowledge tends to be the
subject of attention. So for instance at first you may focus on
edge detection, etc.; once you've built some high-confidence model
about the relationships within this domain, you move that to the
subconscious (the neocortex? I'm not enough into neuroscience to
tell), then you can focus on the next abstraction.

Just a small remark: it's not (only) about confidence, but also about efficient computation. If you face a P-class problem whose solution you don't know, you will first apply general inference/search methods, which will be exponentially slow at first (or, given limited time, very imprecise). Specializing a general inference engine w.r.t. a narrow problem results in inference procedures with a completely different computational structure. The reason these procedures are unconscious lies not in the removal of linear overheads (interpretation vs. compilation), but in moving from a general inference operating over unified search spaces to direct computations hardly applicable outside the scope of the corresponding narrow task.
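The point can be shown with a toy example in plain Python (an illustration only, not OpenCog code): the same function computed by directly applying its defining rules, as a stand-in for general inference, versus by a procedure specialized to the task.

```python
# Toy illustration: same answers, completely different computational
# structure, which is the sense of "specialization" described above.

def fib_general(n: int) -> int:
    """Direct use of the defining recurrence: exponential time."""
    return n if n < 2 else fib_general(n - 1) + fib_general(n - 2)

def fib_specialized(n: int) -> int:
    """Specialized procedure: linear time, no search over the rule space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

The specialized version shares nothing with the structure of the recurrence it was derived from, even though the two always agree.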
 

So transposed to OpenCog, I don't really know; it could mean that, for
instance, once you have built an ImplicationLink with sufficient
confidence (with a Dirac-like second-order distribution), you no
longer need to bother updating it, unless perhaps the likelihood of
the incoming data deviates considerably from normal, in which case it
may mean that you need to go back to basics (I suspect psychedelic
drugs do something like that, though I don't know if that's a side
effect or a fundamental effect; I lack the experience to really tell).

Well... it should not necessarily be about high confidence. You can have a narrow task with a very small amount of data, but very efficient procedures to calculate these low-confidence probabilities.
 

For instance

GetValueLink
  <atom>
  <key>

would return the value corresponding to `key` in `atom`. Note that
`key` is itself an atom. However, the returned value may not
necessarily be an atom; it may be a proto atom, in which case it would
need to be "atomized". This makes me think that ProtoAtoms probably
need some "atomize" method or something.

I don't clearly understand what you mean by "links returning something". Links per se do nothing, especially in terms of "returning something". Returning to where? Do you mean that GetValueLink will be executable? In any case, we are talking about applying pattern matching to Values, so I'm interested in how you see GetValueLink behaving within the Pattern Matcher.
 
The atomization might be the problem, but if it is meant to be
volatile (not be stored in the atomspace) then maybe the whole thing
can be done efficiently.

It seems it would be good if Atomese offered operators to create,
destroy, etc., atomspaces, because then temporary computations could be
moved into a small atomspace, which I suspect would still be able to
manipulate atoms with less overhead (though I don't know well where
the overhead of inserting atoms into an atomspace comes from).

Got it.

-- Alexey

Nil Geisweiller

unread,
May 23, 2018, 10:00:21 AM5/23/18
to Alexey Potapov, Nil Geisweiller, Ben Goertzel, opencog, Константин Тимофеев, Linas Vepstas, Vitaly Bogdanov, Cassio Pennachin
Alexey,

On 05/23/2018 03:44 PM, Alexey Potapov wrote:
> Nil,
>
>
> What seems to happen in humans is that high-confidence knowledge tends
> to go subconscious, while lower-confidence knowledge tends to be the
> subject of attention. So for instance at first you may focus on
> edge detection, etc.; once you've built some high-confidence model
> about the relationships within this domain, you move that to the
> subconscious (the neocortex? I'm not enough into neuroscience to
> tell), then you can focus on the next abstraction.
>
>
> Just a small remark: it's not (only) about confidence, but also about
> efficient computation. If you face a P-class problem whose solution
> you don't know, you will first apply general inference/search methods,
> which will be exponentially slow at first (or, given limited time,
> very imprecise). Specializing a general inference engine w.r.t. a
> narrow problem results in inference procedures with a completely
> different computational structure. The reason these procedures are
> unconscious lies not in the removal of linear overheads
> (interpretation vs. compilation), but in moving from a general
> inference operating over unified search spaces to direct computations
> hardly applicable outside the scope of the corresponding narrow task.

Yep, I agree, it's not just about confidence, but also, very
importantly, about efficiency.

There is this notion of supercompilation that Ben worked on with some
colleagues a while ago, which is very relevant. It develops the
execution tree of a program and restructures it, simplifies it, removes
any redundancy, etc. That's pretty powerful in principle, though
probably extremely slow.

That said, I do believe that inference control alone could make
inference pretty fast, over restricted domains of course, without the
need to immediately turn it into a specialized program.

So it seems what we want is

1. Perform general inference
2. Learn control rules to speed it up
3. Speed up inference + control rules into an even faster program, by
supercompiling it

This makes a lot of sense for any process that is reused over and over,
so that it's worth the effort of supercompiling it.

>
>
> So transposed to OpenCog, I don't really know; it could mean that, for
> instance, once you have built an ImplicationLink with sufficient
> confidence (with a Dirac-like second-order distribution), you no
> longer need to bother updating it, unless perhaps the likelihood of
> the incoming data deviates considerably from normal, in which case it
> may mean that you need to go back to basics (I suspect psychedelic
> drugs do something like that, though I don't know if that's a side
> effect or a fundamental effect; I lack the experience to really tell).
>
>
> Well... it should not necessarily be about high confidence. You can
> have a narrow task with a very small amount of data, but very
> efficient procedures to calculate these low-confidence probabilities.

True.

> For instance
>
> GetValueLink
>   <atom>
>   <key>
>
> would return the value corresponding to `key` in `atom`. Note that
> `key` is itself an atom. However, the returned value may not
> necessarily be an atom; it may be a proto atom, in which case it would
> need to be "atomized". This makes me think that ProtoAtoms probably
> need some "atomize" method or something.
>
>
> I don't clearly understand what you mean by "links returning
> something". Links per se do nothing, especially in terms of "returning
> something". Returning to where? Do you mean that GetValueLink will be
> executable? In any case, we are talking about applying pattern
> matching to Values, so I'm interested in how you see GetValueLink
> behaving within the Pattern Matcher.

I mean when run by cog-execute!, which also happens to be the standard
way of invoking the pattern matcher.

Nil

Linas Vepstas

unread,
May 24, 2018, 12:33:52 AM5/24/18
to opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Mon, May 21, 2018 at 6:11 PM, Alexey Potapov <pot...@aideus.com> wrote:


2018-05-22 0:11 GMT+03:00 Linas Vepstas <linasv...@gmail.com>:

How many are we talking about here? Dozens, hundreds of objects? Hundreds of predicates per object? That is 100x100 = 10K and, currently, you can create and add maybe 100K atoms/sec to the atomspace (via C++; less via Scheme or Python, due to wrapper overhead). So this seems manageable.

Thousands or even millions of objects. I can ask you a question about a speck of dust sparkling in the sunlight, hot pixel on your screen, tiny birthmark on a face, a hole in a button with a thread passing through it, etc. Each pixel belongs to tens of "objects"...

I want to keep this conversation realistic.  Sophia, today, struggles to see human faces. That's like, 1 or 2 of them. On a good demo day, she can see facial expressions. Sort-of. Distinguishing between at most half-a-dozen of them. With low accuracy. If you get lucky. That's pretty much it.  OK, so using stuff like RealSense and custom software she can kind-of-ish see hands and arms, *if* they are dead-ahead, well-posed, good lighting, no background movements (e.g. 5 other people crowding around).  But if you tilt your head, or have bad lighting (viz., what we consider "normal" lighting), she's blind.  If she could become consistently aware of even just one object, besides a face, in her field of view, without cheats like coloring it some bright color, that would be great. If she could see ten things, that would be mind-blowingly awesome. She could go on TV interviews and answer questions like "what do you see?".

Realistic compute power -- lets say several laptops worth of compute, and a GPU card that doesn't have some insanely whirry fan.  This is what you can get on-site, at the location where the vision is happening.

Of course, you can also stream data to the cloud, and do lots of processing there, but then there are bandwidth issues, and latency issues.

So the 100K Atoms/sec number is on a one-or-two-core 2014-2016 vintage average desktop-type CPU.
 

 

The question is how to combine OpenCog and neural networks on the algorithmic level. Let us return to the considered request for VQA. We can imagine a grounded schema node which detects all bounding boxes with a given class label and inserts them into Atomspace,

For example, one creates a ConceptNode "dress".  One also creates a PredicateNode "*-bounding-box-*"  Then one writes C++ code to implement the TensorFlowBBValue object.   One then associates all three:

(cog-set-value! (Concept "dress") (Predicate "*-bounding-box-*") (TensorFlowBBValue "obj-id-42"))

What is the current bounding box for that dress?  I don't know, but I can find out:

(cog-value->list (cog-value (Concept "dress") (Predicate "*-bounding-box-*")))

returns 2 or 4 floating point numbers, as a list.    Is Susan wearing that dress?

(cog-set-value! (Concept "Face-of-Susan") (Predicate "*-bounding-box-*") (TensorFlowBBValue "obj-id-66"))

(define (is-near? A B)
  (> 0.1 (distance (cog-value A (Predicate "*-bounding-box-*"))
                   (cog-value B (Predicate "*-bounding-box-*")))))

returns true if there is less than 0.1 meters distance between the bounding boxes on A and B.

The actual location of the bounding boxes is never stored, and never accessed, unless the is-near? predicate runs.
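A hypothetical Python sketch of this access pattern (the names and box format are illustrative, not the actual OpenCog API): bounding-box data lives in mutable Value objects attached to atoms under a key, and is only dereferenced when the is-near? predicate actually runs.

```python
import math

class FloatValue:
    def __init__(self, floats): self.floats = list(floats)

class Atom:
    def __init__(self, name):
        self.name = name
        self.values = {}                 # key -> value, as in cog-set-value!
    def set_value(self, key, value): self.values[key] = value
    def value(self, key): return self.values[key]

BBOX = "*-bounding-box-*"                # stands in for the Predicate key

def centroid(v):
    # centroid of an axis-aligned box given as (x1, y1, x2, y2)
    x1, y1, x2, y2 = v.floats
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def is_near(a, b, thresh=0.1):
    # analogue of (is-near? A B): centroid distance below a threshold
    (ax, ay), (bx, by) = centroid(a.value(BBOX)), centroid(b.value(BBOX))
    return math.hypot(ax - bx, ay - by) < thresh

dress = Atom("dress")
susan = Atom("Face-of-Susan")
dress.set_value(BBOX, FloatValue([0.0, 0.0, 0.2, 0.2]))
susan.set_value(BBOX, FloatValue([0.15, 0.15, 0.3, 0.3]))
```

Nothing about the box is stored or computed anywhere else; the floats are read only inside is_near, mirroring the lazy-access point above.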

> Well... the problem is that our system should learn most of this somehow,

I agree!

> and it cannot learn C++ or Schema code

Of course! The whole point of Atomese is that it is a kind-of programming language that can be machine-manipulated, machine-learned.

> We would like to hardcode as little as possible. We can (and likely should) code TensorFlowValue

I think that would be a good experiment to conduct.  While Ben and others enjoy designing systems top-down, I like to pursue a bottom-up approach -- build something, see how well it works. If it works poorly, make sure that we understood *why* it failed, and what parts were good, and then try again.  So, for me, a TensorFlowValue object would highlight what's good and what's bad in the current design.  Engineering hill-climbing.


> but we would like to avoid hardcoding (is-near? A B).

I agree, sort-of-ish.  English-language prepositions are a "closed class" - it's a finite list, and a fairly small one - a few dozen that are truly practical. A few hundred, if you start listing archaic, obsolete, rare ones, ones inapplicable to images ... https://en.wikipedia.org/wiki/List_of_English_prepositions   So for now, I find it acceptable to hard-code a certain subset.

A discussion about "how can we learn prepositions from nothing?" would have to be a distinct conversation.
 
 

These nodes correspond not just to neural layers, but also to operations over them. One can imagine TensorNode nodes connected by PlusLink, TimesLink, etc.

Yes.  However, we might also need PlusValue or TimesValue.  I do not know why, yet, but these are potentially useful, as well.

> This is exactly my question whether we need them or not :)

Whether they are needed or not depends a lot on what kind of data is exposed by TensorFlowValue, and how that data is then routed up into the natural-language and reasoning layers. There are multiple possible designs for this; there is no particular historical precedent (in the atomspace) for this.

> Nil also proposed to use GetValueLink...

I didn't really understand that proposal. He seemed to be talking about truth values, not values in general.
 

 

Perhaps we need an IsLeftOfLink that knows automatically to obtain the "*-centroid-*" value on two atoms, and then return true/false depending on the result (or throw exception if there is no *-centroid-* value.)

> Sorry, I didn't precisely get this. What is a centroid and how is it connected to IsLeftOf?

https://en.wikipedia.org/wiki/Centroid

It avoids some of the complexity of bounding boxes (which might be touching, overlapping, or inside one another).

Following bottom-up design principles, I would rather have a simple, well-thought-out, fast, clear, working proof-of-concept before adding many dozens of complex spatial and temporal relationships.

The *-someword-* is just a common way of naming quasi-global variables in scheme. It's an "eyecatcher", ascii-art visual bling.  So I imagine that there could be a (PredicateNode "*-centroid-*") that acts as a key, and the value for it would be x,y,z floating point values.   Meanwhile, (PredicateNode "*-bounding-box-*") would be associated with 6 floats -- two opposed corners of a cuboid -- or a (PredicateNode "*-ellipsoid-*") might return 15 floating-point numbers given an ellipsoid.

Linas.

Nil Geisweiller

unread,
May 24, 2018, 12:57:04 AM5/24/18
to linasv...@gmail.com, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On 05/24/2018 06:33 AM, Linas Vepstas wrote:
> > Nil also proposed to use GetValueLink...
>
> I didn't really understand that proposal. He seemed to be talking about
> truth values, not values in general.

No, I meant values in general. So for instance if atom (Node "A") holds
a FloatValue (5.2, 0.1, 4.5) at key (Schema "*-my-key-*"), then

(cog-execute!
  (GetValueLink
    (Node "A")
    (Schema "*-my-key-*")))

would return something like

(List (Number 5.2) (Number 0.1) (Number 4.5))

Here the conversion from FloatValue to atom is kind of obvious; for
other, more sophisticated values (like TensorFlowValue) a method will
have to be provided.

I suppose we would add some

ProtoAtom::to_atom() (or perhaps to_handle)

In case the proto atom is an atom it would return itself, otherwise it
would construct the corresponding atom.
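In toy Python (a sketch of this proposal, not existing AtomSpace code), the atomize idea would look roughly like this: atoms atomize to themselves, while values construct a corresponding volatile atom on demand.

```python
# Sketch of the "atomize" proposal; all class names mirror the discussion
# above but are hypothetical Python, not the real C++ hierarchy.

class ProtoAtom:
    def to_atom(self):
        raise NotImplementedError

class Atom(ProtoAtom):
    def to_atom(self):
        return self                     # an atom is already an atom

class NumberNode(Atom):
    def __init__(self, n): self.n = n

class ListLink(Atom):
    def __init__(self, outgoing): self.outgoing = list(outgoing)

class FloatValue(ProtoAtom):
    def __init__(self, floats): self.floats = list(floats)
    def to_atom(self):
        # FloatValue (5.2 0.1 4.5) -> (List (Number 5.2) (Number 0.1) (Number 4.5))
        return ListLink(NumberNode(f) for f in self.floats)

value = FloatValue([5.2, 0.1, 4.5])
atomized = value.to_atom()
```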

Nil

Linas Vepstas

unread,
May 24, 2018, 1:01:21 AM5/24/18
to Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Hi Ben,

I'm trying to spend just enough time to transmit all of the key design principles and constraints to Alexey and his team (and Anton and his team), as they are first in line to then disseminate this knowledge to a wider group.  Transmitting this information accurately is very important at this point in time -- mistakes here will be amplified over time.


On Mon, May 21, 2018 at 9:09 PM, Ben Goertzel <b...@goertzel.org> wrote:


I think it's better, if possible, to figure out a way to suitably
modify the core PM rather than
using a separate repo ...

Nothing I've heard so far requires any modification at all to the PM.

However, I guess the PM tweaks would need to be done by someone on your team, as
Linas and Nil are probably too busy and we don't have a lot of others
who can rapidly
perform such changes...

Yes. Keeping up with email is almost full-time.
 

I would personally be in favor of overloading stuff like TimesLink in
order to apply to both
NumberNodes and Values, because it seems to me that the Atom/Value
distinction is more of
an efficiency-driven implementation distinction rather than a
fundamental mathematical/conceptual
distinction...

:-/

It's both efficiency and fundamental. Square pegs and round holes.   You cannot put a Value in the outgoing set of a Link.  That means that you cannot overload a TimesLink.   There are lots of other really neat things you can do, however.
 

> Also, DNNs are trained on (mini-)batches. It is not too natural from an
> autonomous agent perspective, but efficient.

Yes I see.   Again maybe some new TensorValue construct will be needed, we just
need to understand clearly what the requirements are in terms of any special
indexing etc.

At this point, the natural progression would be to start writing some over-all design proposal.  What's the input, what's the output, what's connected to what, what data is being generated, what data is being consumed.  This will make it much more clear exactly what code needs to be written where.
 

The difference between Atoms and Values is just an implementation-efficiency
tactic...

Um, no, not really. They are fundamentally very different beasts.  This is key, and trying to blur the distinction is going to lead to trouble.  Trust me, for now; if you want to come back to this in half a year, once all of the various team members are up to speed, then we can have a rational discussion. But if we have that discussion now, it will crash and burn.
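The structural nature of the distinction can be shown with a toy Python model (illustrative only, not the real C++ class hierarchy): a Link's outgoing set may contain only Atoms, so a plain Value can never sit under a TimesLink.

```python
class Value:
    """Mutable, unindexed; cannot appear in an outgoing set."""

class Atom(Value):
    """Immutable, globally deduplicated; may appear in outgoing sets."""

class FloatValue(Value):
    def __init__(self, floats): self.floats = list(floats)

class Link(Atom):
    def __init__(self, outgoing):
        if not all(isinstance(a, Atom) for a in outgoing):
            raise TypeError("outgoing set may contain only Atoms")
        self.outgoing = tuple(outgoing)

a, b = Atom(), Atom()
pair = Link([a, b])                      # fine: atoms in the outgoing set

value_rejected = False
try:
    Link([a, FloatValue([1.0, 2.0])])    # square peg, round hole
except TypeError:
    value_rejected = True
```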
 

-- Linas

Linas Vepstas

unread,
May 24, 2018, 1:19:09 AM5/24/18
to Nil Geisweiller, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On Tue, May 22, 2018 at 12:34 AM, Nil Geisweiller <ngei...@googlemail.com> wrote:



What do you suppose GetValueLink to do?

For instance

GetValueLink
  <atom>
  <key>

would return the value corresponding to `key` in `atom`. Note that
`key` is itself an atom. However, the returned value may not
necessarily be an atom; it may be a proto atom, in which case it would
need to be "atomized". This makes me think that ProtoAtoms probably
need some "atomize" method or something.

Yeah. OK-ish, maybe, sort-of-ish.  There's a bunch of ugly little details that are going to torpedo this proposal.

I would be much much happier if, instead, we had a collection of predicates that responded to questions like "is it true that this value has some property?" -- for example, "is it true that object X is to the left of object Y?"  This would be a lot easier than trying to shove x,y,z values into NumberNodes, and then doing some kind of numerical computations inside the atomspace.

I  strongly urge that we look at some practical examples, first.


It seems it would be good if Atomese offered operators to create,
destroy, etc., atomspaces,

This is coming; it's part of github issue #something-or-other.  An important prerequisite is clearing out most SetLink usages.  Once this is done, then we can create an AtomSpaceLink that would be a lot like a MemberLink (conceptually; performance requires a rather strange implementation).
 
because then temporary computations could be
moved into a small atomspace, which I suspect would still be able to
manipulate atoms with less overhead (though I don't know well where
the overhead of inserting atoms into an atomspace comes from).

The primary overhead is the scanning of an atom, to see if it is unique (viz., whether we already have it in the atomspace or not). This is not conceptually hard, but it places an upper limit on the atom insertion and deletion rate. It's the primary bottleneck.

This scanning takes slightly longer in large atomspaces, but is still significant for tiny atomspaces.  As a rule, you want to avoid placing NumberNodes into the atomspace.
 
Linas.

Linas Vepstas

unread,
May 24, 2018, 1:24:38 AM5/24/18
to Nil Geisweiller, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
No, it's fundamental to what the definition of the atomspace is. It's the only way that you can have a single, unique (Concept "cat") in the system.  It also provides memory management, so that (Concept "cat") doesn't disappear when the last pointer to it is released.

Of course, you don't have to use the atomspace, if you don't want to -- but then you have to (a) provide your own memory management, and (b) provide your own way of finding all instances of (Concept "cat") that you care about. But if you just want (a) and (b), then why not just use the atomspace?
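A minimal interning sketch in Python (hypothetical, not the real AtomSpace implementation) shows both points at once: adding a node first scans an index keyed by (type, name) -- that scan is what makes (Concept "cat") unique, and the index also keeps the atom alive.

```python
class AtomSpace:
    def __init__(self):
        self._index = {}                        # (type, name) -> atom

    def add_node(self, atom_type, name):
        key = (atom_type, name)
        if key not in self._index:              # the uniqueness check
            self._index[key] = {"type": atom_type, "name": name, "tv": None}
        return self._index[key]                 # always the same object

aspace = AtomSpace()
cat1 = aspace.add_node("Concept", "cat")
cat1["tv"] = (0.6, 0.7)
cat2 = aspace.add_node("Concept", "cat")        # fetched, not re-created
cat2["tv"] = (0.8, 0.9)                         # updates the same atom
```

Both set-tv calls land on one atom because both additions resolve through the same index, which is the mechanism referred to above.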

--linas

Linas Vepstas

unread,
May 24, 2018, 1:29:38 AM5/24/18
to Nil Geisweiller, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
Yes, I worked very hard to try to make the Instantiator not insert anything into the atomspace until the very last possible moment.  I was mostly successful.  There are some hard parts.

But at some point, the Instantiator has to return to the caller, and then what? The caller gets this mess, what does the caller want to do with it? You can't pass the buck forever.

Linas Vepstas

unread,
May 24, 2018, 1:43:15 AM5/24/18
to Alexey Potapov, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Tue, May 22, 2018 at 9:26 PM, Alexey Potapov <pot...@aideus.com> wrote:

 

If needed we could also introduce some sort of entity that is between
a Value and an Atom in some sense -- i.e. we could introduce some sort of TensorValue entity that

1) Perhaps, knows what links to it (like an Atom but unlike a Value)

2) has an internal tensor that is mutable

There is nothing prohibiting one from building something like this
into Atomspace,
though obviously not breaking various mechanisms would require some care...

OK, we will think about this.

Actually, I want you to not think about this. I strongly believe that pretty much anything you can think of will fit nicely into an Atom, or into a Value.  I do not want to see a third kind of "generic object system" being created, that would be a deep mistake.

However, you can create a FooBarValue C++ class, and put whatever kinds of methods that you want into it.  Some methods might be commonly used, almost generic.

For example, every atom  type that derives from FunctionLink will always have an `execute()` C++ method on it; this is used to umm .. "make things happen".   Atom types that do not derive from this do not have this method.

--linas

Linas Vepstas

unread,
May 24, 2018, 1:48:26 AM5/24/18
to Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Wed, May 23, 2018 at 7:19 AM, Alexey Potapov <pot...@aideus.com> wrote:
Linas,

 

Yes, this is clear (although I'd like to know more about your ideas regarding connectors),

Sure; say when. I can talk about them for days.  Unfortunately, the core idea is so simple, so obvious, that it becomes very difficult to talk about the advanced concepts, so this needs to be a distinct conversation.

If you had a paper (not necessarily a journal-style paper) systematically describing the idea and its application to OpenCog, it would be really nice.

Based on previous experience, though, it seems like everyone finds the first half of it so obviously trivial that they can't understand the second half.  Feedback has been very frustrating, because I think there's an important idea in there; it's just hard to communicate.

-- Linas

Linas Vepstas

unread,
May 24, 2018, 2:06:11 AM5/24/18
to Nil Geisweiller, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin

On Wed, May 23, 2018 at 9:00 AM, Nil Geisweiller <ngei...@googlemail.com> wrote:

>  I mean when run by cog-execute!, which also happens to be the standard way of invoking the pattern matcher.

 
Any atom type that derives from FunctionLink has the C++ method

    virtual Handle execute() const;

method on it ... which cog-execute! calls. This typically triggers a long chain reaction of executions (because it's recursive ... there's even a way of doing infinite recursion with it, and starting/stopping/joining threads with it, in pure Atomese. There are Atoms that create threads and join them. This is why Atomese is a "programming language" -- it supports recursion, etc.).  Note that this method returns an Atom.
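A toy Python analogue (illustrative, not the actual C++ implementation) of this chain reaction: executing a link recursively executes its outgoing set, and the result is itself an atom.

```python
class Atom:
    def execute(self):
        return self                  # non-function atoms execute to themselves

class NumberNode(Atom):
    def __init__(self, n): self.n = n

class PlusLink(Atom):
    def __init__(self, *outgoing): self.outgoing = outgoing
    def execute(self):
        # the recursion: execute every child, then combine the results
        return NumberNode(sum(a.execute().n for a in self.outgoing))

# (Plus (Number 1) (Plus (Number 2) (Number 3)))
expr = PlusLink(NumberNode(1), PlusLink(NumberNode(2), NumberNode(3)))
result = expr.execute()              # a NumberNode holding 6
```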

There used to be a cog-evaluate! that was similar but returned a TruthValue.   I do not recall if we removed it or not; if it still exists, then it is not widely used.

It is possible that we might someday need to introduce some kind of cog-valuate! method that returns a value, but otherwise works much like cog-execute!  However, I am extremely super-nervous about having such a conversation right now. I think the newcomers need to have another half-year-ish of hands-on experience before we debate such fairly significant architectural changes.  It's very important that everyone understand the current architecture, before we just start changing it.

The existing architecture has room for a lot of things, a lot of freedom for designing things. I'd like to stick to it as much as possible.

Ben Goertzel

unread,
May 24, 2018, 2:08:23 AM5/24/18
to opencog, Nil Geisweiller, Alexey Potapov, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
***
The existing architecture has room for a lot of things, a lot of
freedom for designing things. I'd like to stick to it as much as
possible.
***

It's very understandable, however "... as possible" is key here, and
it's hard to see how the current system can scalably deal with tensors
from sensory processing tools, without some at least modest
changes/additions...

Nil Geisweiller

unread,
May 24, 2018, 2:18:19 AM5/24/18
to linasv...@gmail.com, Nil Geisweiller, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On 05/24/2018 07:24 AM, Linas Vepstas wrote:
> Could this indexing be made lazy?
>
>
> No, its fundamental to what the definition of the atomspace is. Its the
> only way that you can have a single, unique (Concept "cat") in the
> system.
But the notion of uniqueness is only relevant when you query something
about cat, like its incoming set, etc.

Now regarding memory management, yes it's true, although I guess it
could still be lazy and only index when the memory grows too much.

I'm not suggesting we even consider doing that; it seems like a massive
can of worms we don't want to open, not in the near future anyway, but I
was still wondering...

Nil

Linas Vepstas

unread,
May 24, 2018, 2:21:03 AM5/24/18
to opencog, Nil Geisweiller, Alexey Potapov, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On Thu, May 24, 2018 at 1:08 AM, Ben Goertzel <b...@goertzel.org> wrote:
***
The existing architecture has room for a lot of things, a lot of
freedom for designing things. I'd like to stick to it as much as
possible.
***

It's very understandable, however "... as possible" is key here, and
it's hard to see how the current system can scalably deal with tensors
from sensory processing tools, without some at least modest
changes/additions...


Nothing I've heard so far requires any changes at all, and I can see a reasonable, simple solution, just fine.  Of course, we might be talking about different things.  But also - I've not seen enough detail here to force me to think deeply or hard about the problem - so perhaps I've over-simplified things in my head.

-- Linas

Linas Vepstas

unread,
May 24, 2018, 2:35:24 AM5/24/18
to Nil Geisweiller, Alexey Potapov, Ben Goertzel, opencog, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On Thu, May 24, 2018 at 1:18 AM, Nil Geisweiller <ngei...@googlemail.com> wrote:
On 05/24/2018 07:24 AM, Linas Vepstas wrote:
    Could this indexing be made lazy?


No, its fundamental to what the definition of the atomspace is. Its the only way that you can have a single, unique (Concept "cat") in the system.
But the notion of uniqueness is only relevant when you query something about cat, like its incoming set, etc.

If you say
   
    (cog-set-tv! (Concept "cat")  (stv 0.6 0.7))

and then later on, you say

   (cog-set-tv! (Concept "cat")  (stv 0.8 0.9))

how do you imagine that the system knows that this is the same "cat", and not two different instances of "cat" floating around in the system?  Answer: they are both the same atom because both statements cause that atom to be fetched from the same atomspace.   There's no magic here. There is an underlying mechanism that makes this possible.
  

Now regarding memory management, yes it's true, although I guess it could still be lazy and only index when the memory grows too much.

No, you have to index immediately, because otherwise the memory has to be freed instantly, on the spot.  Either there is at least one valid pointer to an Atom, or that memory is freed.  It is undesirable to have a block of memory with zero valid pointers to it -- this is called a memory leak.  A block of memory with zero pointers to it is lost, forever and ever.

Alexey Potapov

unread,
May 24, 2018, 7:03:16 AM5/24/18
to opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Linas,
 
I want to keep this conversation realistic.  Sophia, today, struggles to see human faces.

We are not talking about applying existing narrow methods. These methods may be needed/realistic/practical now, but they don't bring us much closer to AGI. If we were talking about 'realistic' things in this sense, we would not talk about AGI at all.
Our task is to move toward the goal of creating a vision system for AGI. It's not about making a better narrow face-recognition algo. We can do this, but it is not our task now.
 
Realistic compute power -- lets say several laptops worth of compute, and a GPU card that doesn't have some insanely whirry fan.  This is what you can get on-site, at the location where the vision is happening.

Realistic - not now, but 5 or 10 years from now. I remember we were developing an image matching system which ran an 'unrealistic' few minutes in the late 1990s. But then it took less than a second. 'Unrealistic' now is maybe 10^6 times slower than needed...
 

> We would like to hardcode as little as possible. We can (and likely should) code TensorFlowValue

I think that would be a good experiment to conduct.  While Ben and other enjoy designing systems top-down, I like to pursue a bottom-up approach -- build something, see how well it works. If it works poorly, make sure that we understood *why* it failed, and what parts were good, and then try again.  So, for me a TensorFlowValue object would highlight what's good and what's bad in the current design.  Engineering hill-climbing.

Nice.
 


> but we would like to avoid hardcoding (is-near? A B).

I agree, sort-of-ish.  English language prepositions are a "closed class" - it's a finite list, and a fairly small list -- a few dozen that are truly practical. A few hundred, if you start listing archaic, obsolete, rare ones, ones inapplicable to images ... https://en.wikipedia.org/wiki/List_of_English_prepositions   So for now, I find it acceptable to hard code a certain subset.

A discussion about "how can we learn prepositions from nothing?" would have to be a distinct conversation.

True, but the problem is not in the number of these prepositions, but in their applications in different contexts. If we hard-code (is-near? A B), say, for rectangular regions on images, it will be inapplicable even to regions of a different shape. So, these prepositions can have some built-in templates, but not a procedural implementation.
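To make the point concrete, here is a hypothetical Python sketch (not any actual OpenCog API) of a shape-agnostic "near" template: instead of hard-coding (is-near? A B) for rectangles, represent any region as a set of pixel coordinates and parameterize the predicate by a distance threshold, so the same template covers rectangles, blobs, or any other mask.

```python
# Hypothetical sketch of a shape-agnostic "near" template. Any region is a
# set of (x, y) pixel coordinates; the predicate is a template parameterized
# by a distance threshold rather than a rectangle-only procedure.

def min_distance(region_a, region_b):
    """Smallest Euclidean distance between any two pixels of the regions."""
    return min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay) in region_a for (bx, by) in region_b)

def is_near(region_a, region_b, threshold=5.0):
    """True if the regions come within `threshold` pixels of each other."""
    return min_distance(region_a, region_b) <= threshold

# Works for rectangles, blobs, or any other pixel mask alike:
blob = {(0, 0), (1, 0), (1, 1)}                          # an irregular region
rect = {(x, y) for x in range(3, 5) for y in range(3, 5)}  # a small rectangle

print(is_near(blob, rect))          # closest points ~2.83 pixels apart -> True
print(is_near(blob, {(100, 100)}))  # far away -> False
```

The threshold (and the pixel-set representation) are of course assumptions for illustration; the template idea is that only such parameters vary per context, not the predicate's code.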
 
 
 PlusLink, TimesLink, etc..
> This is exactly my question whether we need them or not :)

Whether they are needed or not depends a lot on what kind of data is exposed by TensorFlowValue, and how that data is then routed up into the natural-language and reasoning layers. There are multiple possible designs for this; there is no particular historical precedent (in the atomspace) for this.

OK
 

It avoids some of the complexity of bounding boxes (which might be touching, overlapping or inside-of.)

It will not work. In the case of centroids, a nose can come out IsLeftOf a face even though it is inside the face. So, we shouldn't oversimplify either...
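A tiny hypothetical Python sketch of the centroid pitfall: a nose detected inside a face still comes out "left of" the face if the spatial predicate only compares centroids.

```python
# Hypothetical sketch (illustrative names, not OpenCog code): with centroids
# only, a nose inside a face can still be classified as "left of" the face.

def centroid(box):
    """Center (x, y) of a bounding box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def is_left_of_centroid(a, b):
    """Centroid-only version: compares x coordinates of the centers."""
    return centroid(a)[0] < centroid(b)[0]

def is_inside(a, b):
    """Bounding-box version: a lies entirely within b."""
    return a[0] >= b[0] and a[1] >= b[1] and a[2] <= b[2] and a[3] <= b[3]

face = (100, 100, 300, 300)  # face bounding box
nose = (140, 180, 180, 220)  # nose inside the face, slightly left of center

print(is_left_of_centroid(nose, face))  # True: "the nose is left of the face"
print(is_inside(nose, face))            # True: yet the nose is inside the face
```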
 
-- Alexey

Alexey Potapov

May 24, 2018, 7:13:41 AM5/24/18
to opencog, Linas Vepstas, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin


No, I meant values in general. So for instance if atom (Node "A") holds a FloatValue (5.2, 0.1, 4.5) at key (Schema "*-my-key*-"), the following would return something like

(cog-execute!
  (GetValueLink
    (Node "A")
    (Schema "*-my-key-*")))

(List (Number 5.2) (Number 0.1) (Number 4.5))

Will this list then be added to the Atomspace, or will it exist temporarily while Pattern Matching is working? I mean, the idea was not to introduce any global changes to the Pattern Matcher. So, once the PM encounters a GetValueLink, it calls cog-execute! on it and receives a NumberNode, which will then be used to evaluate e.g. a GreaterThanLink connecting the GetValueLink to another GetValueLink or NumberNode. This was my initial understanding. Is it right?

Linas Vepstas

May 24, 2018, 5:42:31 PM5/24/18
to opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Thu, May 24, 2018 at 6:03 AM, Alexey Potapov <pot...@aideus.com> wrote:
Linas,
 
I want to keep this conversation realistic.  Sophia, today, struggles to see human faces.

We are not talking about applying existing narrow methods. These methods may be needed/realistic/practical now, but they don't bring us much closer to AGI. If we were talking about 'realistic' things in this sense, we would not talk about AGI at all.
Our task is to move toward the goal of creating a vision system for AGI. It's not about making a better narrow face-recognition algo. We can do this, but it is not our task now.

And that is not what I was saying, at all. What I was talking about was the principles of software architecture: if you want to write code, then you have to design for present-day platforms, present-day speeds-and-feeds.  If you want to speculate about what kind of hardware we'll have in 10 or 20 years, that's a different game from attempting to build something semi-workable today.



> but we would like to avoid hardcoding (is-near? A B).

I agree, sort-of-ish.  English language prepositions are a "closed class" - it's a finite list, and a fairly small list -- a few dozen that are truly practical. A few hundred, if you start listing archaic, obsolete, rare ones, ones inapplicable to images ... https://en.wikipedia.org/wiki/List_of_English_prepositions   So for now, I find it acceptable to hard code a certain subset.

A discussion about "how can we learn prepositions from nothing?" would have to be a distinct conversation.

> True, but the problem is not in the number of these prepositions, but in
> their applications in different contexts. If we hard-code (is-near? A B),
> say, for rectangular regions on images, it will be inapplicable even to
> regions of a different shape. So, these prepositions can have some
> built-in templates, but not a procedural implementation.

In any sort of engineering progression, there are subsystems that you can work on and perfect today, and other subsystems that you just punt on, hack, and say to yourself "I'll fix this sometime later".   How to learn prepositional relationships from scratch is an interesting discussion, but it is different from the question "how can I attach tensorflow to opencog in the next month or two?"

Perhaps it is possible to figure out how to learn prepositions, from scratch, in a month or two.  But I doubt it. Unless you happen to have some very clear ideas about this, because I certainly don't.
 
 
 PlusLink, TimesLink, etc..
> This is exactly my question whether we need them or not :)

Whether they are needed or not depends a lot on what kind of data is exposed by TensorFlowValue, and how that data is then routed up into the natural-language and reasoning layers. There are multiple possible designs for this; there is no particular historical precedent (in the atomspace) for this.

> OK

Again, it would be great if we could nail down the next level of details.  Exactly what kind of output is generated by tensorflow, and exactly what we want to do with it in opencog.

Or perhaps its a different question: maybe the question is "how can we map tf.keras to atomese?"
Because this snippet:

model = tf.keras.Sequential([
  tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),  # input shape required
  tf.keras.layers.Dense(10, activation="relu"),
  tf.keras.layers.Dense(3)
])

appears to be a purely declarative definition of a network topology, which we could map to Atomese.
This would allow us to write tensorflow programs in Atomese. Why is that interesting? Not because
we want humans to write tensorflow models in atomese, but because maybe we can have PLN
perform reasoning about tensorflow models, or because we can use MOSES to create, control
and evaluate tensorflow models, or perhaps you have some probabilistic-programming idea that could
auto-generate different tensorflow models.
So far, I am very unclear about exactly what problem we are trying to solve here (other than the
"problem of AGI").

 

It avoids some of the complexity of bounding boxes (which might be touching, overlapping or inside-of.)

> It will not work. In the case of centroids, a nose can come out IsLeftOf a face even though it is inside the face. So, we shouldn't oversimplify either...

Sure. But a common engineering progression is to have the system architect and/or a senior programmer create a functioning prototype, and then have 3 or 5 junior programmers run around and replace the centroids by bounding boxes, or whatever. It's a division-of-labor issue.
 
Linas.

Linas Vepstas

May 24, 2018, 6:14:43 PM5/24/18
to Alexey Potapov, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Thu, May 24, 2018 at 6:13 AM, Alexey Potapov <pot...@aideus.com> wrote:


No, I meant values in general. So for instance if atom (Node "A") holds a FloatValue (5.2, 0.1, 4.5) at key (Schema "*-my-key*-"), the following would return something like

(cog-execute!
  (GetValueLink
    (Node "A")
    (Schema "*-my-key-*")))

(List (Number 5.2) (Number 0.1) (Number 4.5))

As I stated earlier, I think that using a collection of predicates is superior to using a GetValueLink, both for performance, and for ease-of-use.  But lets assume that there is a GetValueLink. To answer your questions:

Will this list then be added to Atomspace or will it exist temporally while Pattern Matching is working?

Yes. During pattern matching, a temporary atomspace is created to hold temporary results. Whether or not any atoms are actually placed into the main atomspace depends on what the actual pattern was.

The pattern matcher is essentially a stack machine. As it explores graphs, it places intermediate results on stack, either the actual runtime stack, or in one of several stacks in the code base.  Some graph traversals cause atoms to be created; however, if it later back-tracks, because that particular branch exploration failed, then those created atoms have to be removed, as they would be "incorrect" for other branches.

At the end of this process, when a match is found, there is an opportunity to create atoms that are placed in the main, primary atomspace.  Some queries do not create any atoms; they merely return a true/false reply.   Other queries can have the side-effect of modifying values on existing atoms, instead of creating or destroying any atoms.
 
I mean, the idea was not to introduce any global changes to the Pattern Matcher. So, once the PM encounters a GetValueLink, it calls cog-execute! on it and receives a NumberNode, which will then be used to evaluate e.g. a GreaterThanLink connecting the GetValueLink to another GetValueLink or NumberNode. This was my initial understanding. Is it right?

Yes. Those  atoms would be placed into the temporary atomspace. For various arcane technical reasons (viz memory management), we have to place them into some atomspace, somewhere, but I believe that the use of temporary atomspaces is fairly low-cost and quite reasonable.

But it would be even faster/better, if there was, say ValueGreaterLink, which combined into one the operations of value-getting and comparing.  So, for example:

ValueGreaterLink
      Atom A
      Atom B
      Predicate "key-of-value-on A"
      Predicate "key-of-value-on B"

which would, when executed, fetch the values and perform the comparison. This way, we avoid having to call operator new on a NumberNode, set up the incoming set on the NumberNode, then operator new on a ListLink, etc., stuff it all into the temp atomspace, run a comparison, and then discard the whole mess.
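The fused operation can be modeled in a few lines of Python. Everything here is a toy stand-in (the dict plays the role of the AtomSpace, and the elementwise semantics is just one plausible choice, since the thread leaves it unspecified); the point is only that the comparison reads the stored float vectors directly and never materializes intermediate NumberNodes or ListLinks.

```python
# Toy model of the fused ValueGreaterLink idea (hypothetical names, not the
# actual AtomSpace API): fetch the float vectors stored under a key on two
# atoms and compare them directly, with no temporary NumberNodes/ListLinks.

values = {  # atom name -> {key -> float vector}, standing in for the AtomSpace
    "A": {"*-my-key-*": [5.2, 0.1, 4.5]},
    "B": {"*-my-key-*": [1.0, 0.2, 9.9]},
}

def value_greater(atom_a, atom_b, key_a, key_b):
    """Fused get-and-compare: elementwise a > b on the stored vectors."""
    va = values[atom_a][key_a]
    vb = values[atom_b][key_b]
    return [a > b for a, b in zip(va, vb)]

print(value_greater("A", "B", "*-my-key-*", "*-my-key-*"))
# -> [True, False, False]
```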

I'm excited that we'll have system-programmer-class people who will be able to carefully explore system performance.  The current system works, and I think it works pretty well, but love and attention could make it better. (The wrong kind of love and attention could make it worse, so I am somewhat nervous, as well. I've seen plenty of projects go from "OK" to "crazy-badly wrong" when they scaled up in size.  It's really very easy to get high-quality system programmers to do the wrong thing.  It happens all the time, everywhere.)

Linas

Alexey Potapov

May 25, 2018, 5:35:06 AM5/25/18
to Linas Vepstas, opencog, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
2018-05-24 8:42 GMT+03:00 Linas Vepstas <linasv...@gmail.com>:

Actually, I want you to not think about this. I strongly believe that pretty much anything you can think of will fit nicely into an Atom, or into a Value.  I do not want to see a third kind of "generic object system" being created, that would be a deep mistake.

OK. Actually, I was not too interested in something between Values and Atoms, but more in something between deep learning and opencog. The latter doesn't necessarily require the former. So, I promise not to think about this, unless I find it necessary :)


I think the newcomers need to have another half-year-ish of hands-on experience before we debate such fairly significant architectural changes.

I understand your worries, but
- we are not debating right now, but asking.
- we have tasks we would like to start solving right now (of course, it's up to Ben to tell us to gain another half-year-ish of hands-on experience on secondary issues).

 Nothing I've heard so far requires any changes at all, and I can see a reasonable, simple solution, just fine.

I'm not sure what you mean by "any changes at all", when even basic problems require some changes (e.g. TensorValue, TimesLinkValue, etc.).

And that is not what I was saying, at all. What I was talking about was the principles of software architecture

I was talking not about software development, but about R&D.

 Again, it would be great if we could nail down the next level of details. 

Sure, sure. Actually, the SynerGANs and VQA examples are already enough for the discussion, but I guess I need to describe them to you in more detail. I hope to return to this in 10 days with more details and more examples. Let's take a small break until then.

 
This would allow us to write tensorflow programs in Atomese. Why is that interesting? Not because we want humans to write tensorflow models in atomese, but because maybe we can have PLN perform reasoning about tensorflow models, or because we can use MOSES to create, control and evaluate tensorflow models, or perhaps you have some probabilistic-programming idea that could auto-generate different tensorflow models.
Yes. Automatic design of DNNs is another example I have in mind. Didn't I mention it? Never mind. I'm afraid I need to write a very long document to describe all the aspects. I just tried to focus on the first small step regarding applying the PM to Values. But it seems we need the whole picture to proceed.

-- Alexey

Alexey Potapov

May 28, 2018, 9:27:24 AM5/28/18
to Linas Vepstas, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Hi.
Here are some additional thoughts on OpenCog+PPL, which I didn't include in the first message

OpenCog + PPL

 

One of the ideas/tasks was to make OpenCog a “better Church”.

 

1.     Why is it possible

Universal probabilistic programming languages (PPLs) utilize sampling-based approaches to infer posterior probabilities. They do not “reason” about tasks.

Consider a number of examples.

1.1)

(rejection-query
 (define n (random-integer 1000000))
 n
 (= n 10))

 

This program says that n is a random integer from 0 to 999999, and it tries to estimate P(n|n=10). It will take quite a long time, although it is obvious that P(n=10|n=10)=1. We can imagine an (extended) Pattern Matcher easily deducing n=10 and checking that it fits within [0, 999999].
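The cost of the blind approach is easy to see in a minimal Python sketch of rejection-query (scaled down to a range of 1000 so that it terminates quickly): the sampler keeps drawing until the condition holds, even though n = 10 could be read off the condition directly.

```python
import random

# Minimal Python sketch of rejection-query, scaled down from 10^6 to 10^3 so
# it runs quickly. The sampler blindly draws until the condition is met, even
# though deduction would give n = 10 immediately.
random.seed(0)

def rejection_query(sampler, condition):
    trials = 0
    while True:
        trials += 1
        n = sampler()
        if condition(n):
            return n, trials

n, trials = rejection_query(lambda: random.randrange(1000), lambda n: n == 10)
print(n)       # always 10 once a sample is accepted
print(trials)  # on average ~1000 blind draws per accepted sample
```

At the original range of 10^6, the expected number of draws per accepted sample grows to about a million, which is the inefficiency the deductive shortcut avoids.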

 

1.2)

(rejection-query
 (define n (random-integer 10))
 (define m (+ n 5))
 n
 (= m 0))

 

Here, rejection-query will hang forever, since the condition cannot be met. Again, the absence of the answer can be easily deduced (given some properties of +).

 

1.3) Einstein's (zebra) puzzle can be easily represented in Church

(define where list-index)
(define what list-ref)

(define (is? xs x ys y)
  (eq? (what ys (where xs x)) y))

(define (neighbour? x y)
  (or (= x (+ y 1))
      (= x (- y 1))))

(rejection-query
 (define colors (shuffle '(Yellow Blue Red Ivory Green)))
 (define nationalities (shuffle '(Norwegian Ukrainian Englishman Spaniard Japanese)))
 (define drinks (shuffle '(Water Tea Milk OrangeJuice Coffee)))
 (define smokes (shuffle '(Kools Chesterfield OldGold LuckyStrike Parliament)))
 (define pets (shuffle '(Fox Horse Snails Dog Zebra)))

 (what nationalities (where drinks 'Water))

 (and (is? nationalities 'Englishman colors 'Red)
      (is? nationalities 'Spaniard pets 'Dog)
      (is? drinks 'Coffee colors 'Green)
      (is? nationalities 'Ukrainian drinks 'Tea)
      (= (where colors 'Green) (+ (where colors 'Ivory) 1))
      (is? smokes 'OldGold pets 'Snails)
      (is? smokes 'Kools colors 'Yellow)
      (= (where drinks 'Milk) 2)
      (= (where nationalities 'Norwegian) 0)
      (neighbour? (where smokes 'Chesterfield) (where pets 'Fox))
      (neighbour? (where smokes 'Kools) (where pets 'Horse))
      (is? smokes 'LuckyStrike drinks 'OrangeJuice)
      (is? nationalities 'Japanese smokes 'Parliament)
      (neighbour? (where nationalities 'Norwegian) (where colors 'Blue))))

 

Again, the blind search performed by PPLs is too inefficient here. OpenCog can solve this problem quite efficiently, and we can imagine that an equivalent probabilistic program is written in Atomese and the URE deductively infers the answer starting from the constraints, without actually sampling the random variables.
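For contrast, a conventional crisp solution (a Python sketch of the standard permutations-with-pruning approach, not Atomese) finds the unique answer essentially instantly, which illustrates the efficiency gap between blind sampling and constraint-driven search:

```python
from itertools import permutations

# Crisp constraint search for the zebra puzzle: enumerate permutations of
# house assignments, pruning each category as soon as a constraint fails.
# This is the deductive shortcut that blind PPL sampling misses.

def solve_zebra():
    houses = range(5)
    first, middle = 0, 2
    right_of = lambda a, b: a - b == 1
    next_to = lambda a, b: abs(a - b) == 1
    for red, green, ivory, yellow, blue in permutations(houses):
        if not right_of(green, ivory):
            continue
        for english, spaniard, ukrainian, norwegian, japanese in permutations(houses):
            if english != red or norwegian != first or not next_to(norwegian, blue):
                continue
            for coffee, tea, milk, oj, water in permutations(houses):
                if coffee != green or ukrainian != tea or milk != middle:
                    continue
                for oldgold, kools, chesterfield, luckystrike, parliament in permutations(houses):
                    if kools != yellow or luckystrike != oj or japanese != parliament:
                        continue
                    for dog, snails, fox, horse, zebra in permutations(houses):
                        if (spaniard == dog and oldgold == snails
                                and next_to(chesterfield, fox)
                                and next_to(kools, horse)):
                            nat = {english: "Englishman", spaniard: "Spaniard",
                                   ukrainian: "Ukrainian", norwegian: "Norwegian",
                                   japanese: "Japanese"}
                            return nat[water], nat[zebra]

print(solve_zebra())  # -> ('Norwegian', 'Japanese')
```

The early pruning at each level is what keeps the search small; a rejection sampler over the same (5!)^5 space, by contrast, gets no credit for partially satisfied constraints.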

 

We can see that, from the PPL perspective, OpenCog can be used to make inference over probabilistic programs much more efficient (at least in some cases). Why not write these programs in Atomese directly? For what reason do we need a PPL metaphor?

 

2.     What is missing

Consider the following example.

2.1)

(rejection-query
 (define x (gaussian 0 1))
 (define y (gaussian 0 1))
 x
 (= (+ x y) 1))

 

It will actually not work in Church, because the strict condition will never be satisfied (though it can be modified with the use of soft equality). The URE (given the necessary axioms) can easily infer y=1–x. However, it will not be able to ground the variables. Actually, these are not variables to be grounded using number nodes; rather, values should be assigned to them. Also, we shouldn't just sample x and then calculate y=1–x, because different values of y have different prior probabilities. Thus, if we want to estimate posterior probabilities, we should add specific mechanisms for taking the prior probabilities of inferred values into account. Of course, we will also need some basic random distributions.
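The missing mechanism can be sketched in Python as importance weighting (an illustrative sketch, not an existing OpenCog facility): deduce y = 1 − x, sample x from its prior, and weight each sample by the prior density of the *inferred* y. For x, y ~ N(0,1) conditioned on x + y = 1, the exact posterior of x is N(0.5, 0.5), and the weighted estimate recovers that mean.

```python
import math
import random

# Sketch of "taking prior probabilities of inferred values into account":
# y = 1 - x is deduced rather than sampled, and each sample of x is weighted
# by the prior density of the deduced y. Exact posterior of x is N(0.5, 0.5).
random.seed(0)

def normal_pdf(v, mu=0.0, sigma=1.0):
    return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

samples, weights = [], []
for _ in range(100000):
    x = random.gauss(0, 1)          # sample x from its prior
    y = 1.0 - x                     # y is deduced, not sampled
    samples.append(x)
    weights.append(normal_pdf(y))   # weight by the prior density of y

posterior_mean = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
print(posterior_mean)               # close to the exact value 0.5
```

Dropping the weights (i.e. ignoring y's prior) would instead give a mean near 0, which is exactly the bias Alexey warns about.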

 

2.2) Fitting a polynomial with an unknown degree.

(define (calc-poly x ws)
  (if (null? ws) 0
      (+ (car ws) (* x (calc-poly x (cdr ws))))))

(define (calc-poly-noise x ws sigma)
  (+ (calc-poly x ws) (gaussian 0 sigma)))

(define (generate xs ws sigma)
  (map (lambda (x) (calc-poly-noise x ws sigma)) xs))

(define (sum-dif2 xs ys)
  (if (null? xs) 0
      (+ (* (- (car xs) (car ys)) (- (car xs) (car ys)))
         (sum-dif2 (cdr xs) (cdr ys)))))

(define (samples xs ys)
  (mh-query 10 10
            (define degree (sample-integer 4))
            (define ws (repeat (+ 1 degree) (lambda () (gaussian 0 3))))
            (define sigma (gamma 1 2))
            degree
            (< (sum-dif2 (generate xs ws sigma) ys) 0.5)))

(define xs '(0 1 2 3))
(define ys (generate xs '(0.1 1 2) 0.01))
(hist (samples xs ys) "degree")

 

This program actually works and finds the correct degree of the polynomial using the Bayesian Occam's razor that is available “for free” in PPLs. Here, Atomese appears to be rather useless (the constraints cannot be deductively propagated/pattern-matched, and there is no set of atoms to which the variables can be grounded). Small modifications of the Pattern Matcher might be enough (e.g. introducing a RandomVariableNode and sampling its value from a given distribution instead of systematically enumerating all possible groundings), but some problems still remain (e.g. with marginalization, stochastic recursion, the inefficiency of blind guessing in comparison with metaheuristic search, etc.).



-- Alexey

 


Linas Vepstas

May 28, 2018, 4:49:42 PM5/28/18
to Alexey Potapov, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
On Mon, May 28, 2018 at 8:27 AM, Alexey Potapov <pot...@aideus.com> wrote:
Hi.
Here are some additional thoughts on OpenCog+PPL, which I didn't include in the first message

OpenCog + PPL

 

One of the ideas/tasks was to make OpenCog a “better Church”.


Oooof. I wrote a long diatribe about religion, until I realized you must be referring to something that Alonzo Church must have written or invented.   I am not that familiar with his work, so I don't know what a "better Church" would be. Is there some specific paper or book?
 

 

1.     Why is it possible

Universal probabilistic programming languages (PPLs) utilize sampling-based approaches to infer posterior probabilities. They do not “reason” about tasks.

Consider a number of examples.

1.1)

(rejection-query
 (define n (random-integer 1000000))
 n
 (= n 10))

 

This program says that n is a random integer from 0 to 999999, and it tries to estimate P(n|n=10). It will take quite a long time, although it is obvious that P(n=10|n=10)=1. We can imagine an (extended) Pattern Matcher easily deducing n=10 and checking that it fits within [0, 999999].


Or you could apply analytic combinatorics to obtain this result. Philippe Flajolet has a 600 or 800 page-long book showing how to do this.  It's called "analytic combinatorics".
 

 

1.2)

(rejection-query

 (define n (random-integer 10))

 (define m (+ n 5))

 n

 (= m 0))

 

Here, rejection-query will hang forever, since the condition cannot be met. Again, the absence of the answer can be easily deduced (given some properties of +).


Yep.
 
So why bring it up?
 

OpenCog can solve this problem quite efficiently,


Well, that depends on how the problem is coded, representationally. If it is coded as a graph search, then there is a combinatoric explosion -- there are 5^5=3125 branches, and it takes a long time to explore each branch.  Due to representational difficulties in atomese, I think the representation used in the unit test has far more branches, using more variables than might otherwise be needed (e.g. for neighbors, it uses unordered links, which have 5! permutations. So there are some factors of 5! in that unit test - which is due to an over-simplified representation.)

The classic, efficient solution to this problem is to use a 5^5 tensor, and knock out the tensor rows and columns, which is efficient, but it is not obvious how to represent this, how to code this up in atomese.  I'm not saying it's impossible, only that it's not obvious: for a human, it remains a difficult task to write the code that solves this efficiently.

Constraint-satisfaction programming languages do make it easy to specify these kinds of problems: viz, it is probably quite easy to write down this problem using ASP.  However, the current opencog rule engine does not use the ASP syntax, (or any other constraint-satisfaction syntax that I know of) and thus is not an efficient constraint-satisfaction solver.
 

and we can imagine that an equivalent probabilistic program is written in Atomese and URE deductively infers the answer starting from constraints without actually sampling random variables.

 

We can see that from PPLs perspective, OpenCog can be used to make inference over probabilistic programs much more efficient (at least, in some cases). Why not write these programs in Atomese directly? For what reason do we need a PPL metaphor?


Well, for several reasons. First, Atomese is a representational language; it is NOT a compute platform.  This is important to understand: Atomese is designed to represent things, to say things, to provide a syntax that gives meaning to ideas.  Atomese is NOT a compute platform (OK, so, de facto, it can compute, because it does have an existing implementation; this implementation is... inadequate in many ways).

The goal of representation is so that syntactic transformations can be applied to the network-structures.  That is, if one has a structure of some particular shape, then it can be transformed to another structure, in any one of several different ways.  In wikipedia, these structural transformations are called "rewrite rules". In model theory, these structural transformations are called "a theory". In proof theory, these structural transformations are called "rules of inference".   In gcc (the compiler) these structural transformations are called "gimple". LLVM and JVM and C# have yet other names for them. All of these different names really describe the same general concept: the syntactic (homotopic) transformations between shapes. (thus homotopy type theory)  Atomese is a device for representing shapes (and thus, implicitly, their semantic content), and the syntactic transformations that can be applied to the shapes.

It has been convenient to also perform computations "under the covers", sort-of hidden away, in C++ classes. So, in this sense, Atomese is "accidentally" a computational platform.  However, it is terribly inefficient.  If you compare Atomese to, say, the Java bytecode, the C# bytecode, or the guile bytecode, it is literally thousands, tens-of-thousands of times slower (but not a million; it's not that slow).

The point here is that Atomese is a representation.  What is done under the covers is the implementation, and that is quite a different thing.  The current implementation is what it is. It can be improved, in a homotopic, continuous fashion, but that will take some hard work by systems programmers.

The URE is a different beast.  The original intent was that, since our data consists of graphical structures, and we have a basic single-step graph-rewriting primitive (the pattern matcher), we should build a system that can apply multiple syntactic transformations to the knowledge graph. From this comes the concept of "chaining": one can apply one transformation after another after another, in a linear sequence or "chain". This then splits into forward and backward chaining.  And this idea got implemented and is called the URE.

It was a worthwhile experiment, but, in the end, perhaps too simplistic and naive in conception. There are better ways. By the 1980's, academia knew that chaining is fundamentally inefficient, and that there are several ways to do better. One way is to approach the problem as a "parsing problem": instead of chaining, one asks, how can one select a set of rules, and assemble them (as "puzzle pieces") in such a way as to provide an extended transformation?  Once you start thinking of it that way, you can ask, "what are the efficient parse algorithms?"  The second approach to this problem is to realize that parsing is a constraint-satisfaction problem, and so one asks, "what are the efficient constraint-satisfaction solvers?"  Continuing in that direction leads down the road towards SAT solvers.

Thus, in the future, I would like to see URE-II which assembles rules as if it were solving a parsing or constraint-satisfaction problem. This would be much much more efficient.  The syntax of the rules would have to change; a side effect of working with the URE has been the discovery that "variables are harmful" - we use variables for connectors, and this requires the concept of quoting, which turns out to be a major mistake.  Live and learn.

Whatever.

OK, so now on to your question "what do we need probabilistic programming languages for?" To answer this, I first have to explain where probability comes in.

The graphs, above, and the syntactic transformation rules being applied to them: they do NOT come uniformly distributed.  Even if they did, after a few short turns of the crank, the distribution would no longer be uniform.  There are two ways to see this: pattern generation, and pattern recognition. In pattern generation, one applies a sequence of rules, and obtains fractals -- clearly, very highly non-uniform in distribution.  More abstractly, there might be dozens or hundreds of different rule-applications that all lead to the same end-point (and thus have a large measure  or "probability" assigned to that endpoint).  Other endpoints may be very "thin", with only one, single rule-sequence that leads to that endpoint. (and thus that endpoint has a small measure or "probability").

In pattern-recognition (e.g. language recognition by finite automata or stack automata, as a basic example; or in natural-language-parsing as a more sophisticated example) - in pattern recognition, some rules are used more frequently than others, and thus "should be tried first".  In natural language, where there are ambiguities, there is a correlation between the "correct parse", or, at least, the "most likely intended parse", and the rules used to obtain it.   There is a likelihood that can be assigned to rules and rule combinations.

So, in light of this, what should one do?  There are two answers.  One is the answer that Ben proposed some decade ago or more, with "probabilistic logic networks" -- Assign probabilities to knowledge-representations, assign probabilities to transformational rules, and use certain, specific formulas (the "PLN formulas") to attempt to track or emulate or approximate the true flow of probability through the system. There are several problems with this: it is not prima facie obvious that the PLN formulas are correct. I guess they offer some reasonable approximation of probability flow, but it seems to be invented from thin air, a decent rule-of-thumb, but that's all.  A second problem is that there is a combinatorial explosion during rule application -- this is rooted in the combinatorial explosion in rule chaining (backward or forward).  If one instead replaced the chainers by constraint solvers, how would one, could one apply the PLN formulas? It's not clear.

So, the idea is that probabilistic sampling, via probabilistic programming techniques, offers a different mechanism for obtaining the probabilities.

This is best illustrated by example.  Suppose, in the Einstein problem, that instead, we had that one person had a French mother and a Dutch father, and thus was half-French, half Dutch.  Suppose the tea-drinker also drank coffee one morning a week. Suppose the Kools smoker preferred Kools, unless he ran out, and then smoked OldGold. And so on.  At this point, the puzzle will no longer have a unique solution, but will have several solutions, some with high probability, and thus likely; while others are merely "plausible": consistent with the facts, but unlikely.   There are also still a very large number of impossible assignments.   NOW how does one try to solve the puzzle?

One answer is "use PLN". Another answer is "use probabilistic programming".  Here, it seems possible that random sampling will be faster than the recursive application of rules and probability propagation via rule-formulas. It also seems that it will be more accurate.  Again: the PLN formulas are ad hoc: they were not derived in a rigorous way (e.g. via "analytic combinatorics", which gives exact results).  Applying analytic combinatorics to the (probabilistic) Einstein problem is difficult.  Thus, frequentist sampling, using probabilistic programming, will probably be not only faster, but also far more accurate.

There is still one fairly large problem with probabilistic programming; this goes back to the "Medieval reasoning" examples I gave earlier (which I wrote up in a blog entry: https://blog.opencog.org/2018/05/27/medieval-thinking-and-political-agi/ )  Probabilistic programming is effectively a sampling technique; it does not reveal cause and effect.  That is, it will generate plausible answers (sure, the kools smoker might be one-fourth-Greek, which is compatible with orange juice drinking), and even the likeliest answer (the one with the highest weight assignments)  But it does not provide a "model of the world" or a "case for the prosecution", the way that a (Medieval Scholastic... or modern) legal system might.    That is, in a court of law, the statement that "the kools smoker might drink orange juice" is tantamount to saying "the accused had time to drive there, kill the victim and drive back": it is not a probability assignment, but rather a ruling out of counter-factuals.   I do not know how to use probabilistic programming to rule out counter-factuals.   Just because, out of 1000 samples, some branch was not taken, that does not mean that the branch will not be taken on the 1001st sample. That is, probabilistic programming provides a way for estimating probabilities. For many cases, it may be faster, and provide more accurate answers, than using PLN and the URE rule chainer. However, it does not provide any mechanism for counter-factual reasoning. It does not provide a mechanism to build competing "models of the world" (viz a model where the accused is innocent, another where the accused is guilty)

Model of the world: to be clear: the Einstein puzzle allows one to build court cases, and make accusations: "the prosecution asserts that the Norwegian drinks juice!" which may be true or false, but where did the Einstein puzzle come from?  How do we even know that we should be solving the Einstein puzzle, and not some other puzzle? Why are we talking about Norwegian juice drinkers instead of school shooters and #metoo victims?  Building world models requires pattern discovery.

--------
I don't know if you followed all of that, or if you found it to be too abstract or too hand-wavey.

There is a very specific, narrowly-focused place where I think that probabilistic programming could be immediately and directly useful, and for which it might be mostly straight-forward to write the code -- this would be for a sampling replacement for PLN.   The algo goes like this:

For each input fact, assign an a-priori probability. Next, sample: assign each input fact a crisp truth value of T or F (according to the probability) and perform crisp-logic reasoning, propagating these crisp truth values. Arrive at the conclusion(s), and count.  This gives a frequentist sampling result for the final probability.
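The scheme just described can be sketched in a few lines of Python. The facts ("raining", "sprinkler_on", "grass_wet") and the rules are made up here purely for illustration:

```python
import random

# A-priori probabilities for the input facts (hypothetical example).
priors = {"raining": 0.3, "sprinkler_on": 0.5}

# Crisp implications: if all antecedents are True, the conclusion is True.
rules = [({"raining"}, "grass_wet"),
         ({"sprinkler_on"}, "grass_wet")]

def one_sample():
    # Assign each input fact a crisp T/F value according to its prior.
    world = {f: random.random() < p for f, p in priors.items()}
    # Forward-chain the crisp rules to a fixed point.
    changed = True
    while changed:
        changed = False
        for antecedents, conclusion in rules:
            if all(world.get(a, False) for a in antecedents) and not world.get(conclusion, False):
                world[conclusion] = True
                changed = True
    return world

def estimate(conclusion, n=10000):
    # Frequentist estimate: fraction of sampled crisp worlds where it holds.
    return sum(one_sample().get(conclusion, False) for _ in range(n)) / n
```

Here the estimate for "grass_wet" should come out near 1 - (1-0.3)(1-0.5) = 0.65; comparing such frequentist estimates against what the PLN formulas produce on the same inputs is exactly the proposed validation.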

Several interesting things happen.
(1) it might be faster to do 1000 crisp-logic samples, than to use PLN
(2) it validates the PLN formulas (or not) - the frequentist answers should be assumed to be "correct" -- when does PLN agree with the frequentist results, and when does PLN diverge? If it diverges, then why?

I think that this is a sufficiently concrete, down-to-earth, practical problem that it can be tackled in some reasonable time-frame.  It does not require any of the loosey-goosey abstract nonsense I was talking about before.
 
 

 

2.     What is missing

Consider the following example.

2.1)

(rejection-query
 (define x (gaussian 0 1))
 (define y (gaussian 0 1))
 x
 (= (+ x y) 1))

 

It will actually not work in Church, because the strict condition will never be satisfied (but it can be modified with the use of soft equality). The URE (given the necessary axioms) can easily infer y=1–x. However, it will not be able to ground the variables. Actually, these are not variables to be grounded using number nodes; rather, values should be assigned to them. Also, we shouldn’t just sample x and then calculate y=1–x, because different values of y have different prior probability. Thus, if we want to estimate posterior probabilities, we need specific mechanisms for taking the prior probabilities of inferred values into account. Of course, we will also need some basic random distributions.
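The point about priors of inferred values can be made concrete with a plain-Python sketch (no PPL library). After deducing y = 1 - x, a correct posterior for x weights each sample by the prior density of the inferred y, which amounts to importance sampling; for this model the exact posterior is x ~ N(0.5, 1/2):

```python
import math, random

def normal_pdf(v, mu=0.0, sigma=1.0):
    # Density of the Gaussian prior.
    return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior_mean_x(n=100000):
    # Sample x from its prior, deduce y = 1 - x, and weight the sample
    # by the prior density of the deduced y (importance sampling).
    num = den = 0.0
    for _ in range(n):
        x = random.gauss(0.0, 1.0)
        w = normal_pdf(1.0 - x)   # prior density of the inferred y
        num += w * x
        den += w
    return num / den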


Again, the URE is not very efficient.  Also, the "rules" would have to be obtained via analytic combinatorics, and this is not easy. If it were easy, there would not be a 600-page book on it.  It is not practical for humans to spend time trying to hand-write rules, anyway -- the whole point is to automate the learning of rules, from nothing, rather than to have humans write the rules.

But the learning of rules is another, different abstract discussion.

 

 

2.2) Fitting a polynomial with an unknown degree.

(define (calc-poly x ws)
  (if (null? ws) 0
      (+ (car ws) (* x (calc-poly x (cdr ws))))))

(define (calc-poly-noise x ws sigma)
  (+ (calc-poly x ws) (gaussian 0 sigma)))

(define (generate xs ws sigma)
  (map (lambda (x) (calc-poly-noise x ws sigma)) xs))

(define (sum-dif2 xs ys)
  (if (null? xs) 0
      (+ (* (- (car xs) (car ys)) (- (car xs) (car ys)))
         (sum-dif2 (cdr xs) (cdr ys)))))

(define (samples xs ys)
  (mh-query 10 10
            (define degree (sample-integer 4))
            (define ws (repeat (+ 1 degree) (lambda () (gaussian 0 3))))
            (define sigma (gamma 1 2))
            degree
            (< (sum-dif2 (generate xs ws sigma) ys) 0.5)))

(define xs '(0 1 2 3))
(define ys (generate xs '(0.1 1 2) 0.01))
(hist (samples xs ys) "degree")

 

This program actually works and finds the correct degree of the polynomial using the Bayesian Occam's razor that comes “for free” in PPLs.
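For readers who don't follow the Church syntax, here is a rough Python transcription of the same model, using plain rejection sampling in place of mh-query (the acceptance threshold is exposed as a parameter, since strict rejection sampling can be very slow):

```python
import random

def calc_poly(x, ws):
    # Horner-style evaluation: ws holds coefficients from the constant term up.
    return 0.0 if not ws else ws[0] + x * calc_poly(x, ws[1:])

def generate(xs, ws, sigma):
    # Evaluate the polynomial at each point and add Gaussian noise.
    return [calc_poly(x, ws) + random.gauss(0.0, sigma) for x in xs]

def sum_dif2(xs, ys):
    return sum((a - b) ** 2 for a, b in zip(xs, ys))

def samples(xs, ys, n=10, threshold=0.5, max_tries=1000000):
    # Rejection sampling: keep the sampled degree whenever the generated
    # curve is close enough to the observed data.
    out, tries = [], 0
    while len(out) < n and tries < max_tries:
        tries += 1
        degree = random.randrange(4)
        ws = [random.gauss(0.0, 3.0) for _ in range(degree + 1)]
        sigma = random.gammavariate(1.0, 2.0)
        if sum_dif2(generate(xs, ws, sigma), ys) < threshold:
            out.append(degree)
    return out
```

With a tight threshold the accepted degrees should concentrate on the true degree 2: lower degrees cannot fit the data, and higher degrees are penalized by having more coefficients that must all come out right -- the Bayesian Occam's razor at work.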



I did not take the effort to try to decipher the program up above. Our traditional answer for these kinds of problems is "use MOSES".  I did spend some time working on a framework that would allow MOSES to solve differential equations (that is, given a (noisy) time series of data, automatically discover the differential equation that describes that motion). It seems doable and even reasonably efficient.  (Viz, Newton differences; "umbral calculus" effectively means that polynomials and differential equations are "the same thing")

For some things (such as discovering differential equations in motion data) it seems semi-reasonable to have human beings write some special-purpose, custom software.  But in general, this runs counter to the automatic discovery of patterns.   The above example is a large complex program that is hard to write, hard to understand. We cannot require humans to write large complex programs, every time some new problem comes up.  Pattern discovery needs to be automated.
 

Here, Atomese appears to be rather useless (constraints cannot be deductively propagated/pattern-matched, and there is no set of atoms to which variables can be grounded).


Well, careful in understanding what "Atomese" is. Again:

-- Atomese is a knowledge representation system. In so far as the above is some snippet of knowledge, it can be represented in Atomese.

-- If you are thinking of the pattern matcher, and of using forward-chaining to solve the problem of fitting a polynomial to a sequence of data points, I think this is doable; you'd have to create a set of rules defining least-squares fitting.  One of the example demo programs in the examples directory shows how to create a finite probabilistic state machine in Atomese. Another demo extends this into a min-quasi-crippled hidden-markov-model type system.  In my imagination, least-squares polynomial fitting could be done the same way, too. It seems like a reasonable "homework problem" for someone.

But again, having humans write least-square fitting programs in Atomese is kind-of opposite to the general direction I want to go in.  It *is* useful to try to solve different kinds, different classes of problems in Atomese, such as polynomial fitting, or bayesian networks or whatever -- these are all worthwhile experiments -- it highlights, indicates where the hard parts are, where the easy parts are, what the short-comings are, what is good and what is bad, what needs to be rethought, re-designed.

If you are thinking of the URE -- then yes, in retrospect, the concept of Variables is clearly a mistake.  It was a reasonable mistake to make, but still a mistake.  It is an example of an experiment in Atomese. We learned something from it.

 

Small modifications of the Pattern Matcher might be enough (e.g. introducing a RandomVariableNode and sampling its value from a given distribution instead of systematically enumerating all possible groundings),


We already have that; except the name is different. Two of them, actually: RandomNumberLink, and RandomChoiceLink
https://wiki.opencog.org/w/RandomNumberLink
https://wiki.opencog.org/w/RandomChoiceLink

These were originally implemented for doing a probabilistic-programming approach to controlling the Hanson Robotics Sophia robot.

We do NOT have a CountVariableNode that could accumulate counts of how often some particular event/branch/clause was explored.  Currently we accumulate frequentist counts in two major OpenCog subsystems: one is in the Pattern Miner, the other is in the natural-language-learning subsystem.   Both accumulate the counts in an ad-hoc way, outside of atomese.

It might be (should be?!!) interesting to accumulate counts inside of Atomese proper, in a principled way.  However, this is easier said than done.  For example, in language learning, first we observe pairs of words, and increment counts -- not done in Atomese, but it could be, and it's not that hard.  But the second step is to perform an "MST parse", create disjuncts and accumulate statistics on those. Neither the MST parse nor the disjunct extraction is written in Atomese ... nor, at this stage, would I want it to be.

Experimenting with count accumulation in Atomese would be an interesting thing to play with, I suppose.  I can even offer some prototype suggestions -- the natural place to store counts is in Values.  We would need something like an

IncrementCountLink
     SomeAtom
     PredicateNode "some key"

which adds 1.0 to the FloatValue attached to SomeAtom at "some key".
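The intended semantics can be mocked in a few lines of plain Python. This is a deliberate toy -- the dict below stands in for atom-attached Values, and "SomeAtom" / "some key" are the same placeholders as above, not the real AtomSpace API:

```python
# A toy stand-in for atom-attached Values: each atom maps keys to floats.
values = {}

def increment_count(atom, key, amount=1.0):
    # What an IncrementCountLink would do: bump the FloatValue at `key`.
    atom_values = values.setdefault(atom, {})
    atom_values[key] = atom_values.get(key, 0.0) + amount
    return atom_values[key]

# Observing the same event three times accumulates a count of 3.
for _ in range(3):
    increment_count("SomeAtom", "some key")
```

Counts accumulated this way live beside the atom rather than in the graph of Atoms itself, which keeps the frequentist bookkeeping out of the knowledge representation proper.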

-------------

In the end, I am not at all clear on what you and Ben have agreed on w.r.t. probabilistic programming.   Personally, myself, I am very interested in having a probabilistic-sampling rule-engine. The original idea was to convert rules to ASP, then use the Potsdam ASP solver (which is both fast, and has a nice API) to repeatedly run the rules a zillion times, each with different probabilistic samplings, and then accumulate averages. I think that such a probabilistic rule engine could potentially beat PLN at its own game.

I'm also interested in replacing the URE by a URE-II that avoids the use of variables, and replaces chaining by constraint satisfaction. Exactly how to make that probabilistic ... well, I sketched some hints, above, but it's hard. Currently, I use the link-grammar parser for experiments in probability, but it is a very special-purpose system; it's not general.

Separately, I am very interested in various probabilistic pattern detection problems, but for those, well, this email is too long, and my thinking is cloudier.

--linas


but there still remain some problems (e.g. with marginalization, stochastic recursion, the inefficiency of blind guess in comparison with the metaheuristic search, etc.).



-- Alexey

 


Cassio Pennachin

May 28, 2018, 4:52:59 PM
to Linas Vepstas, Alexey Potapov, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov

Oooof. I wrote a long diatribe about religion, until I realized you must be referring to something that Alonzo Church must have written or invented.   I am not that familiar with his work, so I don't know what a "better Church" would be. Is there some specific paper or book?

Nil Geisweiller

May 29, 2018, 3:22:41 AM
to linasv...@gmail.com, Alexey Potapov, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Linas,

On 05/28/2018 11:49 PM, Linas Vepstas wrote:
> I don't know what a "better Church" would be.

That is because you use the editor of the beast ;-)

> through the system. There are several problems with this: it is not
> prima facie obvious that the PLN formulas are correct. I guess they

The deduction rule (A->B & B->C |- A->C) as currently implemented makes
some independence assumption about B to simplify P(not B|A), and
uncertainty management (second order distribution approximation) is very
crude. Other than that, it's just probability theory, and can be fixed,
though probably not without slowing it down.
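For concreteness, the deduction strength formula with an independence assumption of this sort can be written down directly. A Python sketch, using the form commonly given for PLN deduction (treat the exact formula as an assumption here):

```python
def pln_deduction(sAB, sBC, sB, sC):
    # P(C|A) = P(C|B) P(B|A) + P(C|not B) P(not B|A), where the
    # independence assumption gives
    # P(C|not B) ~= (P(C) - P(B) P(C|B)) / (1 - P(B)).
    if sB >= 1.0:
        return sBC  # everything is a B, so P(C|A) = P(C|B)
    return sAB * sBC + (1.0 - sAB) * (sC - sB * sBC) / (1.0 - sB)
```

One sanity check: when B and C are independent (sBC = sC), the formula returns sC regardless of sAB, as it should.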

> Several interesting things happen.
> (1) it might be faster to do 1000 crisp-logic samples, than to use PLN
> (2) it validates the PLN formulas (or not) - the frequentist answers
> should be assumed to be "correct" -- when does PLN give answers with the
> frequentist results, and when does PLN diverge? If it diverges, then why?

Worthwhile project indeed.

> I'm also interested in replacing the URE by a URE-II that avoids the use
> of variables, and replaces chaining by constraint satisfaction.

When you get a chance I would really like to understand more what you
mean by that.

Nil

Nil Geisweiller

May 29, 2018, 3:57:25 AM
to Alexey Potapov, Linas Vepstas, opencog, Ben Goertzel, Константин Тимофеев, Nil Geisweiller, Vitaly Bogdanov, Cassio Pennachin
Alexey,

yes, that matches more or less the way I thought about OpenCog+PPL
(though I didn't take the time to understand the Fitting Poly example).
BTW, I gave a small related presentation a while ago

https://www.youtube.com/watch?v=CvUDMvMnFVc&t=933s

with some associated code

https://github.com/ngeiswei/opencog/commits/ppl

It doesn't go far at all but who knows it may help. In particular the
use of the to-be-implemented GDTV
https://github.com/opencog/atomspace/issues/833 (which Roman Treutlien
is giving a shot BTW) could be useful.

Nil

On 05/28/2018 04:27 PM, Alexey Potapov wrote:
> Hi.
> Here are some additional thoughts on OpenCog+PPL, which I didn't include
> in the first message
>
> *OpenCog + PPL*
>
> One of the ideas/tasks was to make an OpenCog a “better Church”.
>
> *1.**Why is it possible*
>
> Universal probabilistic programming languages (PPLs) utilize
> sampling-based approaches to infer posterior probabilities. They do not
> “reason” about tasks.
>
> Consider a number of examples.
>
> 1.1)
>
> (rejection-query
>  (define n (random-integer 1000000))
>  n
>  (= n 10))
>
> This program says that /n/ is a random integer number from 0 to 999999,
> and it tries to estimate P(/n/|/n/=10). It will take quite long, even
> though it is obvious that P(/n/=10|/n/=10)=1. We can imagine an
> (extended) Pattern Matcher easily deducing /n/=10 and checking that it
> fits [0, 999999].

Linas Vepstas

May 29, 2018, 12:30:17 PM
to Nil Geisweiller, Alexey Potapov, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On Tue, May 29, 2018 at 2:22 AM, Nil Geisweiller <ngei...@googlemail.com> wrote:

I'm also interested in replacing the URE by a URE-II that avoids the use of variables, and replaces chaining by constraint satisfaction.

When you get a chance I would really like to understand more what you mean by that.

Perhaps it would be easiest if you sent an example of two or three typical inference rules that you use, and how you chain them together. I'll take a shot at converting them to a different notation, and see if I can actually make this work.  For now, this has been a gut sense; but having to actually make it work will expose the practical problems.

Ben had previously sent a URL for a scholarpedia article that describes the idea - it dates back to the 1980's.  However, that article used a very different notation from what I have in mind, and it could get away with that only because it could interpret implication as "(not P) or Q" acting on crisp truth values. I believe I can propose an alternate notation, better suited to our needs.

Linas.

Linas Vepstas

May 29, 2018, 3:53:43 PM
to Nil Geisweiller, Alexey Potapov, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin


On Tue, May 29, 2018 at 2:57 AM, Nil Geisweiller <ngei...@googlemail.com> wrote:

https://github.com/ngeiswei/opencog/commits/ppl

I looked at this. Its interesting. It almost works.

First: IfThenElseLink exists, but is called SequentialAndLink .. although maybe SequentialAnd did not exist when you wrote that demo.

You can also implement a coin-flip as:
 (define exec-flip (RandomChoiceLink (NumberNode 0) (NumberNode 1)))

Again, maybe RandomChoice did not exist when you created that demo.

I'm not sure why you decided to use variables; the following works:

(define pl-6 (Equal (Number 2) (Plus exec-flip exec-flip exec-flip)))
(cog-evaluate! pl-6)

Now, here's where it gets interesting.  You wrote:

(define pl-4 (Put (Equal (Number 2) (Plus X Y Z))
                  (List exec-flip exec-flip exec-flip)))

So,
(cog-evaluate! pl-4)
works as expected, returning alternating true and false values.

(cog-execute! pl-4)
does not work "as expected" -- the Put is performed, but the resulting EqualLink is not evaluated.  I just now tried to "fix" this, in a hacky way, in pull req #1717 - unfortunately my "fix" causes ten unit tests to fail, so I had to disable it. There's a whole lot of arcane distinctions between execution and evaluation in the code, and it would be a giant effort to figure out whether all of these really make sense, or whether it's some historical artifact and the distinction can be removed.  This effort would be huge, difficult for me; I suspect it would be a half-year project for someone new to opencog.

As mentioned before, it is surely a poor idea to place randomly-computed nodes and links in the atomspace - it pollutes the atomspace with atoms that will never be searched for, that will never be of interest to anyone later on. The atomspace is for "remembering things"; randomly generated sampling values are not memorable.

Thus, it makes more sense to try to do everything with values, instead of atoms.  Below is some code that attaches a stream of random number values (uniformly distributed 0 to 1) to some atom, at some key.  It then adds this together, three times, checks to see if the value is greater than 2 or not.   This code works as of right now, although I had to fix more than a few bugs to make it work, so you have to git pull and rebuild to run it.

 
(use-modules (opencog) (opencog exec))

(define someplace (Concept "A"))
(define key (PredicateNode "*-uniform-*"))
(cog-set-value! someplace key (RandomStream 1))

Above places a stream of random uniformly distributed numbers 0<x<1 on the atom.

(define uniform (ValueOfLink someplace key))
(define pl-7 (GreaterThan (Number 2) (Plus uniform uniform uniform)))
(cog-evaluate! pl-7)

Above should return true about 5/6ths of the time (not 2/3rds): by the Irwin-Hall distribution, the sum of three uniform [0,1] samples falls below 2 with probability 5/6.
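A quick Monte Carlo check of that probability in Python (the exact value for the sum of three uniform [0,1] draws falling below 2 is 5/6, from the Irwin-Hall distribution):

```python
import random

def fraction_below(threshold=2.0, n=200000):
    # Fraction of trials where the sum of three uniform [0,1] draws
    # stays below the threshold.
    hits = sum(random.random() + random.random() + random.random() < threshold
               for _ in range(n))
    return hits / n
```

fraction_below() comes out near 0.8333, i.e. the GreaterThan check above is true about 5/6ths of the time rather than 2/3rds.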

We have no gaussian noise sources currently defined in opencog.

There's a way of doing tail-recursive loops in atomese. Here's how:

(Define (DefinedPredicate "keep going?")
   (GreaterThan (Number 2.5) (Plus uniform uniform uniform)))

(define (print-stuff) (display "hi there!\n") (stv 1 1))

(Define
   (DefinedPredicate "some loop")
   (SequentialAnd
      (DefinedPredicate "keep going?")
      (Evaluation (GroundedPredicate "scm: print-stuff") (List))
      (DefinedPredicate "some loop")
   ))

(cog-evaluate! (DefinedPredicate "some loop"))

I think that maybe the above provides enough infrastructure to port the polynomial-fitting demo to atomese. But I'm not sure.  I don't want to do that exercise myself.

-- Linas

Alexey Potapov

May 30, 2018, 11:12:33 AM
to Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
Nil,

2018-05-29 10:57 GMT+03:00 Nil Geisweiller <ngei...@googlemail.com>:
Alexey,

yes, that matches more or less the way I thought about OpenCog+PPL (though I didn't take the time to understand the Fitting Poly example). BTW, I gave a small related presentation a while ago

https://www.youtube.com/watch?v=CvUDMvMnFVc&t=933s

with some associated code

https://github.com/ngeiswei/opencog/commits/ppl

It doesn't go far at all but who knows it may help. In particular the use of the to-be-implemented GDTV https://github.com/opencog/atomspace/issues/833 (which Roman Treutlien is giving a shot BTW) could be useful.

Nil

Yes, I saw this presentation a year ago, although I didn't study the associated code separately. Its general content is good. However, what we need is not just to implement RejectionSampling, but to integrate it with the Pattern Matcher / URE, as I tried to emphasize above. In particular, implementing flip as a grounded schema node is not suitable in this case... I also remember the discussion about GDTV, but I need to rethink it. We will discuss this in more detail later.

Nil Geisweiller

May 31, 2018, 12:52:44 AM
to Alexey Potapov, Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On 05/30/2018 06:12 PM, Alexey Potapov wrote:
> associated code separately. It's general content is good. However, what
> do we need is not just to implement RejectionSampling, but to integrate
> it with Pattern Matcher / URE as I tried to emphasize above. In
> particular, implementation of flip as a grounded schema node is not
> suitable in this case... I also remember the discussion about GDTV, but
> I need to rethink it. We will discuss this in more detail later.

I totally agree. So ideally, we want

1. Sampling based inference iterated as a URE rule

2. Such rule should be able to benefit from the broader inference
capabilities of the URE

Item 1 is trivial (just wrap whatever sampling code in a rule). Item
2 is less so; for instance, it should be able to recreate the kind of
program analysis presented in "Efficiently Sampling Probabilistic
Programs via Program Analysis", and beyond. I think some wisdom from
Ben's SampleLink can be borrowed here; obviously, the devil is in
the details.

Maybe we want some form of adaptive iterative sampling, with some kind
of context sensitive sampling rules, I don't know.

Nil

Alexey Potapov

May 31, 2018, 2:40:36 PM
to Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
Linas (and Nil),
thanks for the ValueOf link. It works just as we needed in our toy example with bounding boxes.
This is the first small, but very important step. Pattern Matcher over values works! I see a lot of cool things that can be built on top of it. However, we need to move step-by-step.
The next step is to have variable/unknown/random values in some generic way. Let me explain this on a number of examples.

1) In webPPL we can write
var isAlarm = function(burglary, earthquake) {
    if(burglary && flip(0.95)) return true
    if(earthquake && flip(0.3)) return true
    if(flip(0.001)) return true
    return false
}
var generate = function () {
    var burglary = flip(0.001)
    var earthquake = flip(0.002)
    var alarm = isAlarm(burglary, earthquake)
    return alarm
}
Infer({method: 'enumerate', model: generate})
(The same can be written in Church; I use webPPL here just for diversity)
This means that burglary and earthquake are variables with attached distributions. In Atomese, they should be ConceptNodes whose truth values are defined as (very simple) distributions. One could just write something like (Concept burglary (stv 0.001 1)), or use the (not implemented) GDTV. But this is not precisely what we want! Because we might want to infer a posterior truth value of burglary given some observations. This posterior should not replace the prior probability/truth value, because we might want to use the prior in another inference with different conditions. Thus, flip(0.001) should be a part of the Atomspace. Flip should be a node (or link?), and 0.001 should be a value for this atom.
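For reference, the posterior that motivates keeping the prior around can be computed by exhaustive enumeration (which is what the 'enumerate' method above does internally). A self-contained Python sketch, using the probabilities from the model above:

```python
def p_alarm_given(burglary, earthquake):
    # Alarm fires if any of the three independent flips succeeds.
    return 1.0 - (1.0 - 0.95 * burglary) * (1.0 - 0.3 * earthquake) * (1.0 - 0.001)

def posterior_burglary_given_alarm(p_b=0.001, p_e=0.002):
    # Enumerate all four (burglary, earthquake) worlds.
    joint_b_alarm = 0.0
    p_alarm = 0.0
    for b in (0, 1):
        for e in (0, 1):
            w = (p_b if b else 1 - p_b) * (p_e if e else 1 - p_e)
            pa = w * p_alarm_given(b, e)
            p_alarm += pa
            if b:
                joint_b_alarm += pa
    return joint_b_alarm / p_alarm
```

P(burglary | alarm) comes out around 0.37 -- far above the 0.001 prior -- and, as argued above, this posterior should not overwrite the prior stored with the flip.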
Similarly, we might want to define
var diceRoll = randomInteger(6)
randomInteger fits neither stv nor GDTV. It is a (prior) distribution over Values (not truth values in this case). Thus, to keep things general, we would like to be able to attach undefined/random values to atoms through some proxy atoms of a specific type.
We should not directly implement flip or randomInteger as GroundedSchemaNode!!! This might work in simple cases, but we might want not to sample these values, but infer them in a different way.

2) In Tensorflow we can write

import numpy as np
import tensorflow as tf

# model
x = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None])
w = tf.Variable(tf.random_uniform([4], -1, 1))
exps = tf.scan(lambda a, _: a + 1., w, initializer=-1.)
y = tf.reduce_sum(w * tf.pow(x, [exps]), axis=1)
# train
loss = tf.reduce_mean((y - y_) ** 2)
step = tf.train.AdamOptimizer(0.1).minimize(loss)
# data
xs = np.linspace(1., 10., 10)
ys = 0.1 * (xs ** 3) + xs + 5
xs = xs.reshape((xs.size, 1))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5000):
        _l, _ = sess.run([loss, step], feed_dict={x: xs, y_: ys})
        print(_l)
    print(sess.run(w))

Again, w is a variable. It is even initialized by sampling from a distribution (and if we run global_variables_initializer in a cycle, we will indeed get different samples). But in this example, its value is optimized by gradient descent. So, we don't perform sampling here, but adjust w based on its previous value. Actually, in PPLs, if we use MCMC, we also remember the previous values of random variables and adjust them based on a proposal distribution. Again, we might want to have a Node w (a TensorNode in general, though it is questionable whether we need such a type of node, given that we connect nodes to values through keys; for TensorNode we would like to have a direct connection, as is the case for stv: (ConceptNode burglary (stv 0.001 1)) ~~> (TensorNode w (tensor_value ...)); but this is a side remark... I don't propose to implement it this way, but we will need to discuss this at some point)...
So, tf.Variable is explicitly a part of a TF computational graph. It "screens" the subgraph (!) tf.random_uniform([4], -1, 1) from evaluation when w is evaluated. This "screen" is penetrated by the gradient descent optimizer, which explicitly uses tf.assign for changing variables' values.
Similarly, tf.Variable should be a part of Atomspace. That is, an Atomese analog of tf.Variable should be a link that connects Node w with an initializer of its value. This initializer might be a prior distribution that might be used for probabilistic inference (sampling or direct evaluation of posteriors) or might be used just for initializing this random variable in gradient descent or meta-heuristic search.
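That separation -- initializer on one side, in-place updates on the other -- can be illustrated with a toy Python sketch (all names here are invented for illustration, not a proposed API): the variable holds a mutable value, its initializer only sets the starting point, and an optimizer then updates the value in place, "penetrating the screen" the way tf.assign does:

```python
import random

class Variable:
    # The initializer is kept separate from the current value: it sets the
    # starting point (and could serve as a prior), but inference updates
    # the value in place without re-sampling it.
    def __init__(self, initializer):
        self.initializer = initializer
        self.value = initializer()

def sgd_fit_w(xs, ys, lr=0.01, steps=2000):
    # Fit y = w * x by gradient descent on the mean squared error.
    w = Variable(lambda: random.uniform(-1.0, 1.0))
    for _ in range(steps):
        grad = sum(2 * (w.value * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w.value -= lr * grad   # the optimizer's tf.assign-like update
    return w.value
```

The same Variable object could just as well be driven by an MCMC proposal or a meta-heuristic search instead of gradient descent; only the update rule changes, not the variable-with-initializer structure.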

3) Tensorflow Probability / Edward (some random piece of code):
...
z = Normal(loc=tf.zeros([FLAGS.M, FLAGS.d]),
             scale=tf.ones([FLAGS.M, FLAGS.d]))
hidden = tf.layers.dense(z, 256, activation=tf.nn.relu)
x = Bernoulli(logits=tf.layers.dense(hidden, 28 * 28))
x_ph = tf.placeholder(tf.int32, [FLAGS.M, 28 * 28])
hidden = tf.layers.dense(tf.cast(x_ph, tf.float32), 256,
                           activation=tf.nn.relu)
inference = ed.KLqp({z: qz}, data={x: x_ph})
optimizer = tf.train.RMSPropOptimizer(0.01, epsilon=1.0)
inference.initialize(optimizer=optimizer)
...
Here, Normal and Bernoulli hide tf.Variable, but it is still there. Gradient descent is used for optimization, but the loss function includes prior probabilities of the variables. So, we have a mixed case here.

It can be seen that all cases are quite similar in that there are nodes in computational graphs, which describe variable values, which are modified during inference. This is somewhat similar to VariableNode, but there are too many "technical" differences.
Actually, in functional PPLs "Atoms" and "Values" are the same. One can easily write a program that is a generative model of another program, or a generative model of some numeric data. However, this causes another distinction: such PPLs work on program traces, which differ from initial programs. These traces are dynamic and mutable like values; they are not persistent and not self-referential. At the same time, initial programs are completely immutable. This is not too good...
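The trace idea can be illustrated with a deliberately minimal Python mock (not any real PPL's machinery): the immutable "program" is a generative function, and a run of it records every random choice into a mutable trace that can later be replayed or perturbed:

```python
import random

def traced_run(program, trace=None):
    # Replay recorded choices when available; otherwise sample and record.
    trace = [] if trace is None else list(trace)
    pos = [0]
    def flip(p):
        if pos[0] < len(trace):
            v = trace[pos[0]]           # reuse the recorded choice
        else:
            v = random.random() < p     # fresh sample, recorded in the trace
            trace.append(v)
        pos[0] += 1
        return v
    result = program(flip)
    return result, trace

# The immutable "program": a tiny generative model.
def model(flip):
    return flip(0.5) and flip(0.9)
```

An MCMC-over-traces sampler would mutate one entry of the trace and re-run traced_run(model, trace) to get a proposal, leaving the program itself untouched -- which is exactly the dynamic/mutable vs. immutable split described above.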
Thus, we want to keep the distinction between Atoms and Values, although we want to be able to mix them. Pattern Matcher with the use of ValueOf link is an excellent (but still quite simple) example of this.
Of course, introducing something like a VariableValue link or node will not be interesting without corresponding inference mechanisms, which are now absent, but it should be implemented first.
We can discuss, how the code in Atomese can look like, and how the gradient descent optimization or another kind of inference over values can be combined with Pattern Matcher or URE in more detail...


I would like something more generic as described above.

-- Alexey

Linas Vepstas

May 31, 2018, 9:08:23 PM
to Alexey Potapov, Nil Geisweiller, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On Thu, May 31, 2018 at 1:40 PM, Alexey Potapov <pot...@aideus.com> wrote:
Linas (and Nil),
thanks for ValueOf link. It works just as we needed in our toy example with bounding boxes.
This is the first small, but very important step. Pattern Matcher over values works! I see a lot of cool things that can be built on top of it. However, we need to move step-by-step.
The next step is to have variable/unknown/random values in some generic way. Let me explain this on a number of examples.

1) In webPPL we can write
var isAlarm = function(burglary, earthquake) {
    if(burglary && flip(0.95)) return true
    if(earthquake && flip(0.3)) return true
    if(flip(0.001)) return true
    return false
}
var generate = function () {
    var burglary = flip(0.001)
    var earthquake = flip(0.002)
    var alarm = isAlarm(burglary, earthquake)
    return alarm
}
Infer({method: 'enumerate', model: generate})
(The same can be written in Church; I use webPPL here just for diversity)
This means that burglary and earthquake are variables with attached distributions. In Atomese, they should be ConceptNodes, which truth values are defined as (very simple) distributions. One could just write something like (Concept burglary (stv 0.001 1)), or to use (not implemented) GDTV. But this is not precisely what we want! Because we might want to infer a posterior truth value of burglary given some observations. This posterior should not replace the prior probability/truth value, because we might want to use it in another inference with different conditions. Thus, flip(0.001) should be a part of Atomspace. Flip should be a node (or link?), and 0.001 should be a value for this atom.

I thought that this was demonstrated in an earlier email.  There are now two ways of doing this in Atomese: with a traditional pure-atom solution, using RandomNumberLink, or the new-fangled  Value system, using RandomStream.

I could try to write out a working example of each, but it would be more educational if someone else did this.

Note that the Sophia robot already runs the pure-atom version; viz, instead of earthquakes and burglaries, it's the number of people visible, and whether or not she recognizes them.  The guys working on ghost run this kind of code; the HK offices have expertise in this.  Some old, stale (non-ghost) code is here: https://github.com/opencog/opencog/blob/master/opencog/eva/behavior/express.scm  https://github.com/opencog/opencog/blob/master/opencog/eva/behavior/face-priority.scm  https://github.com/opencog/opencog/blob/master/opencog/eva/behavior/primitives.scm and other files in that directory.  I think the ghost guys don't use this code, but I'm not sure.

SequentialAnd takes the place of the if-statements.  LambdaLink takes the place of the function declarations.  Some design commentary:

1) perhaps RandomNumberLink and RandomStream should be collapsed down to just one thing, instead of there being two. Perhaps the API to either/both should be altered.  I have some opinions on this, but not for this email.

2) I'm starting to think that LambdaLink is another design flaw. Here's why.  (a) Lambdas make it easier for *humans* to declare functions, so as to provide encapsulation and modularity. But Atomese is not intended for humans; it's intended for things like MOSES and PLN and the URE to perform reasoning.  I suspect that Lambda makes reasoning harder, not easier -- viz. unification, for example.  (b) Lambda is meant to allow a single implementation to stand for repeated invocations of the same thing, viz, to avoid cutting-n-pasting of code.  However, the atomspace already avoids this, automatically: atoms are globally unique, and so lambda is not required to obtain uniqueness.

3) If/when you rewrite the above example in atomese, you will find that it is much more verbose than webPPL. Again, that is because webPPL is designed for human programmers to express structure in human-readable, human-understandable ways.  Atomese is not meant to be human-readable, it is meant to be machine-readable.  So, the question really is: if you had the above function, written in Atomese, what are you going to do with it? What's the point? Why do you want to write it in Atomese?

These are not rhetorical questions. I want Atomese because I want to make it easy for Sophia to learn: viz., if two people are in the room, and the video camera shakes, and one of the people says "it's an earthquake!" and the other says "it's a burglary!", then she can associate video-camera shaking with burglaries and earthquakes. Or at least, alarms. I'm not particularly interested in having human programmers writing isAlarm code.
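For reference, the earthquake/burglary example alluded to throughout this thread is the standard noisy-or alarm model from the webPPL tutorials. A minimal Python sketch of it (the probabilities are illustrative placeholders, not values taken from any OpenCog or webPPL code in this thread):

```python
import random

def flip(rng, p):
    """Bernoulli draw: True with probability p."""
    return rng.random() < p

def sample_alarm(rng):
    """One forward sample of a noisy-or burglary/earthquake alarm model.
    All probabilities here are illustrative placeholders."""
    burglary = flip(rng, 0.01)
    earthquake = flip(rng, 0.001)
    # Noisy-or: either cause can independently trigger the alarm.
    alarm = (burglary and flip(rng, 0.95)) or (earthquake and flip(rng, 0.3))
    return burglary, earthquake, alarm

def posterior_burglary_given_alarm(n=100_000, seed=0):
    """Rejection-sampling estimate of P(burglary | alarm)."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(n):
        burglary, _, alarm = sample_alarm(rng)
        if alarm:
            total += 1
            hits += burglary
    return hits / total
```

The learning question posed above is then: can the structure of `sample_alarm` itself, rather than just its outputs, be acquired from experience?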

 
Similarly, we might want to define
var diceRoll = randomInteger(6)

RandomChoiceLink.    https://wiki.opencog.org/w/RandomChoiceLink  it works.
 
randomInteger fits neither to stv nor to GDTV.

No, because that's not how it works. It fits into atoms (if you use the older system) and it fits into values (if you use the newer system). Again, it would be better if you or someone in your group attempted to write the atomese for this. We've been doing this for a while for Sophia.

But much more important is to answer the questions in point (3) first. Because answering those questions will help make clear WHY the code should be written one way instead of another.

 
It is a (prior) distribution over Values (not truth values in this case). Thus, to keep things general, we would like to be able to attach undefined/random values to atoms through some proxy atoms of a specific type.
We should not directly implement flip or randomInteger as GroundedSchemaNode!!! This might work in simple cases, but we might want not to sample these values, but infer them in a different way.

Yeah GroundedSchema are for "emergency only" use, needed to interface to other major subsystems (e.g. ROS)
 

2) In Tensorflow we can write

Let me comment in next email.

Linas Vepstas

unread,
May 31, 2018, 10:00:01 PM5/31/18
to Alexey Potapov, Nil Geisweiller, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On Thu, May 31, 2018 at 1:40 PM, Alexey Potapov <pot...@aideus.com> wrote:

2) In Tensorflow we can write

OK, so I read the rest of this email.  I'm not sure how to respond to it.

I could try to write/invent-as-needed the kind of Atomese that expresses the same thing.  Or I could suggest that someone else does this.

I fear that I've already promised too many things to too many people, and won't have the time to think about this correctly.


> It can be seen that all cases are quite similar in that there are nodes in computational graphs, which describe variable values, which are modified during inference. This is somewhat similar to VariableNode, but there are too many "technical" differences.

Yes. The existing VariableNode is mis-used/over-used already.  We need something else....

> Thus, we want to keep the distinction between Atoms and Values, although we want to be able to mix them.

You can place an Atom into (almost) any location that wants a Value, but not vice-versa


> Pattern Matcher with the use of ValueOf link is an excellent (but still quite simple) example of this.

It was meant to be simple, a proof-of-concept.

> We can discuss what the code in Atomese could look like, and how gradient descent optimization or another kind of inference over values can be combined with the Pattern Matcher or URE in more detail...

Yes.  I still have the meta-question: why are we doing this in Atomese? Presumably, so that some *other* algorithm can then modify the Atomese, as needed. Perhaps we should discuss what this other algorithm would do, how it would work?  I vaguely envision some kind of "moses for tensorflow models"

--linas

Ben Goertzel

unread,
May 31, 2018, 10:02:59 PM5/31/18
to Linas Vepstas, Alexey Potapov, Nil Geisweiller, opencog, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
***
Yes. I still have the meta-question: why are we doing this in
Atomese? Presumably, so that some *other* algorithm can then modify
the Atomese, as needed. Perhaps we should discuss what this other
algorithm would do, how it would work? I vaguely envision some kind
of "moses for tensorflow models"
***

One desire here is to use PLN abductive inference to generalize from
these models..

ben

Nil Geisweiller

unread,
Jun 1, 2018, 1:37:49 AM6/1/18
to Alexey Potapov, opencog, Ben Goertzel, vsb...@gmail.com, Cassio Pennachin, Linas Vepstas
Alexey,

On 05/31/2018 09:40 PM, Alexey Potapov wrote:
> One could just write
> something like (Concept burglary (stv 0.001 1)), or to use (not
> implemented) GDTV. But this is not precisely what we want! Because we
> might want to infer a posterior truth value of burglary given some
> observations. This posterior should not replace the prior
> probability/truth value, because we might want to use it in another
> inference with different conditions.

A workaround is to use Context https://wiki.opencog.org/w/ContextLink,
or even Implication, or possibly in this case Inheritance link.

> var diceRoll = randomInteger(6)
> randomInteger fits neither to stv nor to GDTV. It is a

Actually it could be a GDTV. GDTV is meant to be a second order
distribution of anything (that's why it's also called a DV, for
Distributional Value). One of my goals in introducing it was to make
it possible/easy to perform any probabilistic/statistic calculations,
basically turning PLN into an artificial statistician. This can be
done without GDTV, by moving the distribution description in atom
instead of value, but the latter is likely more adequate.

> (prior) distribution over Values (not truth values in this case). Thus,
> to keep things general, we would like to be able to attach
> undefined/random values to atoms through some proxy atoms of a specific
> type.
> We should not directly implement flip or randomInteger as
> GroundedSchemaNode!!! This might work in simple cases, but we might want
> not to sample these values, but infer them in a different way.

Note that you can axiomatize a GroundedSchemaNode and reason about it.

> 2018-05-31 7:52 GMT+03:00 Nil Geisweiller <ngei...@googlemail.com
> I totally agree. So ideally, we want
>
> 1. Sampling based inference iterated as a URE rule
>
> 2. Such rule should be able to benefit from the broader inference
> capabilities of the URE
>
> Item 1. is trivial (just wrap whatever sampling code in a rule). Item
> 2. is less, for instance it should be able to recreate the kind of
> program analysis presented in "Efficiently Sampling Probabilistic
> Programs via Program Analysis" and beyond. I think some wisdom from
> Ben's SampleLink can be borrowed here, obviously the devil is in
> the details.
>
> Maybe we want some form of adaptive iterative sampling, with some kind
> of context sensitive sampling rules, I don't know.
>
>
> I would like something more generic as described above.

Awesome, me too.

Nil

Nil Geisweiller

unread,
Jun 1, 2018, 1:47:17 AM6/1/18
to linasv...@gmail.com, Alexey Potapov, Nil Geisweiller, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
On 06/01/2018 04:07 AM, Linas Vepstas wrote:
> 2) I'm starting to think that LambdaLink is another design flaw. Here's
> why. (a) lambdas make it easier for *humans* to declare functions, so
> as to provide encapsulation, modularity. Atomese is not intended for
> humans, its intended for things like MOSES and PLN and URE to perform
> reasoning. I suspect that Lambda makes reasoning harder, not easier.
> viz. unification for example. (b) lambda is meant to allow a single
> implementation to stand for repeated invocations of the same thing, viz
> to avoid cutting-n-pasting of code. However, the atomspace already
> avoids this, automatically: atoms are globally unique, and so lambda is
> not required to obtain uniqueness.

I agree, there are other transformations on functions (beside lambda
abstraction and application) that may be easier to reason with. For
instance, if you wish to build the sum of 2 functions you may write

Lambda
  X
  Plus
    ExecutionOutput
      Schema "f1"
      X
    ExecutionOutput
      Schema "f2"
      X

Or you may overload Plus to operate on functions and write

Plus
  Schema "f1"
  Schema "f2"

the latter is obviously easier to reason about.
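The same trade-off can be illustrated outside Atomese: overloading an operator on function objects keeps the sum as a structured, inspectable object, rather than hiding it inside an opaque lambda. A Python sketch (the `Schema` class is a toy analogue invented here, not an OpenCog API):

```python
class Schema:
    """A named function wrapper, loosely analogous to a Schema atom."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def __call__(self, x):
        return self.fn(x)

    def __add__(self, other):
        # The sum stays a structured object: its operands remain visible
        # to any reasoning procedure, unlike a closed-over lambda.
        s = Schema(f"(Plus {self.name} {other.name})",
                   lambda x: self(x) + other(x))
        s.operands = (self, other)
        return s

f1 = Schema("f1", lambda x: 2 * x)
f2 = Schema("f2", lambda x: x + 1)
f_sum = f1 + f2
```

A reasoner can inspect `f_sum.operands` directly, whereas a bare `lambda x: f1(x) + f2(x)` exposes no structure to inspect.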

Nil

Alexey Potapov

unread,
Jun 1, 2018, 5:33:57 AM6/1/18
to Nil Geisweiller, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin, Linas Vepstas
Linas, Nil,
it seems you didn't get my main point. RandomNumberLink, RandomStream, and RandomChoiceLink are fine, but they are not what I was talking about. They can only be used to sample atoms or values. But we might want not to sample these values, but to infer them by some other inference algorithm. We may want to use basic distributions declaratively, not just functionally to sample values (that's why we might want a SampleLink which explicitly samples values from random variables). In PPLs, any random choice is automatically a variable, whose value is kept in the computation trace and used for subsequent sampling. In Tensorflow, you explicitly use tf.Variable to say that you want to screen the computation subgraph from evaluation and to memorize its previous evaluation result. Since we don't want to make OpenCog just a traditional PPL, we don't want to treat RandomNumberLink, RandomStream, RandomChoiceLink, etc. as variables by default. And currently they don't work as variables at all.
RandomStream works like tf.random_uniform([n], 0, 1), but not like tf.Variable(tf.random_uniform([n], 0, 1)). You can use RandomNumberLink, RandomStream, and RandomChoiceLink only to implement the most basic rejection sampling. To implement, say, MCMC or simulated annealing, you'll need a lot of other machinery.
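The stream-versus-variable distinction being drawn here can be stated in a few lines of Python: a stream re-samples on every read, while a variable screens its defining expression from re-evaluation and memoizes one draw that inference can later overwrite (the class names are illustrative, not AtomSpace API):

```python
import random

class RandomStream:
    """Re-samples on every read, like tf.random_uniform or RandomStream."""
    def __init__(self, rng):
        self.rng = rng

    def get(self):
        return self.rng.random()

class Variable:
    """Samples once and memoizes, like tf.Variable(tf.random_uniform(...)).
    An inference algorithm can overwrite .value instead of resampling it."""
    def __init__(self, stream):
        self.value = stream.get()

    def get(self):
        return self.value

rng = random.Random(42)
stream = RandomStream(rng)
var = Variable(stream)
```

MCMC-style inference needs the `Variable` semantics: the current state of each random choice must persist between proposals.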
Maybe you can explicitly code some workaround in Atomese (e.g. you can "screen" RandomChoice with a LambdaLink and explicitly code a sampling procedure in Atomese that simply executes the screened random choice instead of having a generic SampleLink), but it would be infeasible to code everything we need in Atomese. Just as the Pattern Matcher tries to ground VariableNodes, optimization and inference algorithms should try to infer the Values of some VariableValues, or something like this. I don't insist that we should have an analog of VariableNode for Values, but we should have some standard way of saying that these are unknown values which should be either marginalized out, inferred, sampled, or whatever...

As I said, if this is still unclear, we can try to write sketches of possible use cases...

Nil Geisweiller

unread,
Jun 1, 2018, 6:14:54 AM6/1/18
to Alexey Potapov, Nil Geisweiller, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin, Linas Vepstas
Alexey,

I think I understand what you mean; that is, I believe it's what I meant by
having sampling-based inference rules in the URE, so that analytical and
sampling-based reasoning can seamlessly inter-operate.

I can clearly see how one could have some sampling-based inference
rule(s) beside analytical rules, utilizing axioms about probability
(not just Kolmogorov's axioms, which would be impractical, but more
developed and specialized knowledge), like, say

(X1, ..., Xk) = Dirichlet(a1, ..., ak)
=>
(X1, ..., Xi+Xj, ..., Xk) = Dirichlet(a1, ..., ai+aj, ..., ak)

which could be represented as Implication links, etc.
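The Dirichlet aggregation rule above can be checked numerically via the gamma construction, using only the standard library; this says nothing about how the rule would actually be encoded as Implication links:

```python
import random

def dirichlet(rng, alphas):
    """Sample from Dirichlet(alphas) via normalized independent gammas."""
    gs = [rng.gammavariate(a, 1.0) for a in alphas]
    s = sum(gs)
    return [g / s for g in gs]

def merged_component_mean(alphas, i, j, n=50_000, seed=0):
    """Empirical mean of X_i + X_j under Dirichlet(alphas)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = dirichlet(rng, alphas)
        total += x[i] + x[j]
    return total / n

alphas = [2.0, 3.0, 5.0]
# By the aggregation rule, X_0 + X_1 is a component of Dirichlet(2+3, 5),
# so its mean should be (2+3)/(2+3+5) = 0.5.
empirical = merged_component_mean(alphas, 0, 1)
```

A URE rule embodying this identity would let an inference chain replace an expensive sampling step with an exact rewrite.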

On top of that, the operators involved in Atomese probabilistic
programs would need to be axiomatized as well, partly or wholly.

So that part is clear to me. What is less clear is how to control the
beast: not just how to mix sampling-based and analytic rules, because
that's just a particular case of inference control, but how to modulate
the sampling-based rules to be efficient as well! These rules need to
somehow recurse; they need to ask "is taking this sampling path gonna be
fruitful?" That is, the AGI-recursion needs to take place inside the
rule itself. That is what I meant by using SampleLink as an inspiration
(or perhaps more).

Regarding modeling the unknown vs the known/inferred, I think the
uncertainty management of OpenCog can take care of that. Currently it's
not optimal: the TV of higher confidence overwrites the TV of lower
confidence regardless of how much they have in common (it assumes they
have as much in common as they could, which is very conservative); if
that can be improved, then I think it would provide a solution. It
could ultimately be that we need an arbitrary number of levels of
distribution order (i.e. third, fourth, etc. order distributions), as we
discussed once, but that doesn't change the idea much.

Nil

Nil Geisweiller

unread,
Jun 1, 2018, 6:20:08 AM6/1/18
to Alexey Potapov, Nil Geisweiller, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin, Linas Vepstas
On 06/01/2018 01:14 PM, Nil Geisweiller wrote:
> So that part is clear to me. What is less clear is how to control the
> beast, not just how to mix sampling-based and analytic rules, because
> it's just a particular case of inference control, but how to modulate
> the sampling-based rules to be efficient as well! These rules need to
> somehow recurse, they need to ask "is taking this sampling path gonna be
> fruitful?". That is, the AGI-recursion need to take place inside the
> rule itself. That is what I meant by using SampleLink as an inspiration
> (or more perhaps).

So that's why I suggested having incremental sampling-based rules, so
that the decision of how to sample can be deferred to the inference
control mechanism. It's not too clear to me how that would work,
although I can see that it's possible in principle.

Agreed we need to craft some simple examples to help us reason about that...

Nil

Alexey Potapov

unread,
Jun 1, 2018, 9:00:48 AM6/1/18
to Linas Vepstas, Nil Geisweiller, opencog, Ben Goertzel, Константин Тимофеев, Vitaly Bogdanov, Cassio Pennachin
Linas,
I hope to get back to your messages and answer in more detail later, if there is still a need for it. Here, I just try to answer your explicit questions.

2018-06-01 4:07 GMT+03:00 Linas Vepstas <linasv...@gmail.com>:


3) If/when you rewrite the above example in atomese, you will find that it is much more verbose than webPPL. Again, that is because webPPL is designed for human programmers to express structure in human-readable, human-understandable ways.  Atomese is not meant to be human-readable, it is meant to be machine-readable.  So, the question really is: if you had the above function, written in Atomese, what are you going to do with it? What's the point? Why do you want to write it in Atomese?

These are not rhetorical questions. I want Atomese, because I want to make it easy for Sophia to learn, viz, if two people are in the room, and video camera shakes, and one of the people says "its an earthquake!" and the other says "its a burglary!" that she can associate video-camera shaking with burglaries and earthquakes.  Or at least, alarms.  I'm not particularly interested in having human programmers writing isAlarm code. 

I'm OK with more verbose programs. Why do I need this in Atomese? Yes, to make Sophia able to associate video-camera shaking with earthquakes, for example. To do this, we need to bridge the symbolic/subsymbolic gap. To do this, we need a tight integration of PLN/URE reasoning with subsymbolic inference. To do this, we need this inference integrated into Atomese. And this is exactly what I'm talking about right now (in my last few messages). We started with examples of SynerGANs, VQA, etc., and they are still here.
So, what am I going to do with a generative probabilistic or DNN model integrated into Atomese? Do inference over it, or train it together with a symbolic part. In the SynerGAN example, we need to learn a deep generative model with a latent code over which structural dependencies are imposed. To do this, it's not enough to train InfoGANs first and then apply the Pattern Miner to the latent-code activations. Until recently, even this was not possible, since the Pattern Miner couldn't work with patterns in Values. Now it might be possible, but it will be useless for a number of reasons. What I want is to make it work. For this, we first need some "infrastructural" extensions like a ValueOf link, VariableValues, or something like that. Then we will need to implement some inference procedures which span both values and atoms (or to make MOSES work for this case, or to make the URE work in synergy with PPL/DNN-style inference, which is more applicable to tensors, or whatever...).

Linas Vepstas

unread,
Jun 1, 2018, 4:55:03 PM6/1/18
to Nil Geisweiller, Alexey Potapov, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
On Fri, Jun 1, 2018 at 5:14 AM, Nil Geisweiller <ngei...@googlemail.com> wrote:
 but how to modulate the sampling-based rules to be efficient as well! These rules need to somehow recurse, they need to ask "is taking this sampling path gonna be fruitful?". That is, the AGI-recursion need to take place inside the rule itself. That is what I meant by using SampleLink as an inspiration (or more perhaps).

I need to ponder Alexey's email a bit more; however, I can make one quick remark here: the intent of the stream/value/atom wrappers is not to perform computations "in the atomspace", but rather to represent the computations being done somewhere else: in some library, somewhere, possibly on some GPUs somewhere.

Thus, there would be either a TFVariableStream (a Value) or maybe a TFVariableLink (an ordinary atom), and, when properly connected/instantiated, these cause the standard, stock, unmodified tensorflow code to launch, start, stop, and do whatever it does, on GPU's or wherever.  Viz, just like many other Atoms/Values, TFVariableStream/TFVariableLink would be C++ classes, hooked into tensorflow libraries in some appropriate way, that either alter the internal state of tensorflow, or query what tensorflow is doing.

My interest in Values/Streams is simply as a way of monitoring what it's doing, for example, "are you done yet?" This could even be, say, a TFTruthValue: just like a regular TruthValue, except that its "strength" might be "percentage finished" and its "confidence" would be "degree of convergence", or something like that. Every time PLN looked at the TFTruthValue, the underlying C++ class would call into tensorflow, obtain some percent-done number, and return that to PLN.
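A dynamic truth value of this sort is straightforward to sketch: every read of strength or confidence goes through a callback into the external job. TFTruthValue is the hypothetical name used above; nothing in this sketch is an existing OpenCog class, and the fake job stands in for a real tensorflow process:

```python
class TFTruthValue:
    """A TV whose components are polled from an external training job:
    strength = fraction finished, confidence = degree of convergence.
    Both are callables, so every read reflects the job's current state."""
    def __init__(self, percent_done_fn, convergence_fn):
        self._done = percent_done_fn
        self._conv = convergence_fn

    @property
    def strength(self):
        return self._done()

    @property
    def confidence(self):
        return self._conv()

class FakeJob:
    """Stand-in for an external (e.g. tensorflow) training run."""
    def __init__(self):
        self.steps, self.total = 0, 100

    def step(self, n):
        self.steps = min(self.total, self.steps + n)

job = FakeJob()
tv = TFTruthValue(lambda: job.steps / job.total,
                  lambda: 1.0 - 1.0 / (1 + job.steps))
```

Any reasoner holding `tv` sees fresh progress numbers on each access, without the atomspace storing or recomputing anything itself.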

To answer "is this sampling path gonna be fruitful?" would require having run multiple previous, similar sampling paths, summarizing the results of those previous runs as one or more TVs (static TVs this time, not dynamic ones, so that we can save them in persistence-to-disk/persistence-to-cloud). Then PLN compares this sampling path to previous results on similar sampling paths, and performs the required exploit-or-explore multi-armed bandit decision.
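The exploit-or-explore step described here is the classic multi-armed bandit setting. A minimal UCB1 sketch over competing "sampling paths", with previous runs summarized as count/mean statistics (a crude stand-in for the static TVs mentioned above):

```python
import math, random

class PathStats:
    """Summarizes previous runs of one sampling path: a crude analogue
    of a static TV (mean ~ strength, pull count ~ confidence)."""
    def __init__(self):
        self.pulls = 0
        self.mean = 0.0

    def update(self, reward):
        self.pulls += 1
        self.mean += (reward - self.mean) / self.pulls

def ucb1_choose(stats, t):
    """Pick the path maximizing mean + exploration bonus."""
    for i, s in enumerate(stats):
        if s.pulls == 0:
            return i  # try every path at least once
    return max(range(len(stats)),
               key=lambda i: stats[i].mean
               + math.sqrt(2 * math.log(t) / stats[i].pulls))

def run(true_fruitfulness, rounds=2000, seed=0):
    """Simulate choosing among paths whose true success rates are given."""
    rng = random.Random(seed)
    stats = [PathStats() for _ in true_fruitfulness]
    for t in range(1, rounds + 1):
        i = ucb1_choose(stats, t)
        stats[i].update(1.0 if rng.random() < true_fruitfulness[i] else 0.0)
    return stats

stats = run([0.2, 0.8, 0.5])
```

After enough rounds the most fruitful path accumulates most of the pulls while the others are still probed occasionally, which is exactly the exploit/explore balance wanted for inference control.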

Alexey Potapov

unread,
Jun 3, 2018, 5:13:35 PM6/3/18
to Linas Vepstas, Nil Geisweiller, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
Hi Linas, Nil.

Let's start to craft an example.
Consider a (naive) generative model of images of faces:

(define P-face (uniform 0 1))

(define face (flip P-face))

(define P-eyes-if-face (uniform 0 1))

(define P-eyes-if-noface (uniform 0 1))

(define eyes (flip (if face P-eyes-if-face P-eyes-if-noface)))

(define P-both-eyes-if-face (uniform 0 1))

(define P-both-eyes-if-noface (uniform 0 1))

(define P-both-eyes (if face P-both-eyes-if-face P-both-eyes-if-noface))

(define both-eyes (flip P-both-eyes))

(define left-eye (if eyes (if both-eyes #t (flip 0.5)) #f))

(define right-eye (if eyes (if both-eyes #t (not left-eye)) #f))

#…

(define appearance (repeat 20 (lambda () (gaussian 0 1))))

(define z (append (list left-eye right-eye nose … ) appearance))

(define image-generated (DNN z W))


Here, we have some prior probabilities, which themselves are sampled (unknown). We have some "symbolic" concepts and relations, and a subsymbolic generative network hidden in DNN. W might be also sampled from some prior distribution.
Ultimately, we would like our model to learn all concepts and relations, together with the structure of the rest of the latent code and the architecture of the DNN. But let's start with a very simplified setting.
The question is how to map this to Atomese. If we limit ourselves to the inference (generation) stage, it looks mostly straightforward. Nevertheless, I'd like to know which Links and Atoms you would propose to use.
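Pinning down the sampling semantics first may help the Atomese discussion. A direct Python transcription of the model above (the DNN decoder is stubbed out, the elided "nose" etc. are dropped, and right-eye defaults to false when there are no eyes):

```python
import random

def flip(rng, p=0.5):
    """Bernoulli draw: True with probability p."""
    return rng.random() < p

def dnn_stub(z):
    """Placeholder for (DNN z W): any deterministic decoder of the code z."""
    return [float(v) for v in z]

def sample_face_model(rng):
    """One forward sample of the naive generative face model."""
    p_face = rng.random()                      # (uniform 0 1)
    face = flip(rng, p_face)
    p_eyes_if_face = rng.random()
    p_eyes_if_noface = rng.random()
    eyes = flip(rng, p_eyes_if_face if face else p_eyes_if_noface)
    p_both_if_face = rng.random()
    p_both_if_noface = rng.random()
    both_eyes = flip(rng, p_both_if_face if face else p_both_if_noface)
    # left-eye: present only if eyes; forced when both-eyes, else a coin flip.
    left_eye = eyes and (both_eyes or flip(rng))
    # right-eye: forced when both-eyes, otherwise the complement of left-eye;
    # absent when there are no eyes at all.
    right_eye = eyes and (both_eyes or not left_eye)
    appearance = [rng.gauss(0, 1) for _ in range(20)]
    z = [left_eye, right_eye] + appearance     # "nose" etc. elided
    return face, eyes, both_eyes, left_eye, right_eye, dnn_stub(z)
```

The invariants the Atomese version has to preserve are visible here: no eyes implies neither eye, both-eyes implies both, and exactly one eye otherwise.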

What is much trickier is to train this model (even without structural learning).
The loss function is the marginal probability of generating training samples. We could calculate it using sampling with a soft condition:
(mh-query ...
  ; model ...
  ; what we would like to infer ...
  (flip (exp (- (* precision (sqr (tf.reduce_mean image-generated image-real))))))
)
This is, of course, pseudo-code. Nevertheless, one can imagine this code working in a PPL. Of course, blind sampling will do the trick only with very low precision, and we need a variational approximation or an adversarial loss to make training more efficient. In either case, whether using some advanced sampling or gradient descent, we need to keep at least one set of values for all variables, including not only the DNN parameters W but also P-face and all other values of random choices. And we need some optimization/inference process that will span both the DNN parameters and the probabilities/values associated with symbolic concepts.
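The soft condition in the pseudo-code, accepting a sample with probability exp(-precision * squared-error), is an unnormalized Gaussian likelihood, so the marginal can also be estimated by importance weighting instead of rejection. A toy stdlib-only sketch (the one-parameter "decoder" is a stand-in, not a real model):

```python
import math, random

def soft_weight(generated, observed, precision):
    """exp(-precision * mean squared error): the soft-condition acceptance
    probability from the pseudo-code, used here as an importance weight."""
    mse = sum((g - o) ** 2 for g, o in zip(generated, observed)) / len(observed)
    return math.exp(-precision * mse)

def estimate_marginal(observed, precision, n=5000, seed=0):
    """Likelihood-weighting estimate of the (soft) marginal probability of
    the data under a toy generator: every pixel equals one latent value."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        latent = rng.gauss(0, 1)
        generated = [latent for _ in observed]   # toy one-parameter decoder
        total += soft_weight(generated, observed, precision)
    return total / n

observed = [0.3, 0.3, 0.3]
marginal = estimate_marginal(observed, precision=1.0)
```

As noted above, raising the precision sharpens the condition and collapses the acceptance weights, which is exactly why blind sampling stops working and variational or adversarial surrogates become necessary.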
Well... there are many aspects regarding learning. Maybe, we should start with the inference stage. This stage fits what Pattern Matcher is doing, while learning is an iterative process which is more similar to URE.
So, how do you think the above example can be better implemented in Atomese?

-- Alexey

Nil Geisweiller

unread,
Jun 4, 2018, 3:29:17 AM6/4/18
to Alexey Potapov, Linas Vepstas, Nil Geisweiller, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
Hi,

here's a possible Atomese representation. Not necessarily the right
one, partly because it "kills" the PPL logic a bit, but I think it's
worth presenting.

(define face-tv (stv 1 0))
(Predicate "face" face-tv)

(stv 1 0) being equivalent to a uniform second order probability
distribution (assuming a Bayesian prior) of having

(Evaluation #t
  (Predicate "face")
  X)

where X is any atom, supposedly representing an image. If X were to be
drawn from a subset of all possible atoms, context link could be used,
such as

(Context face-tv
  (Predicate "image")
  (Predicate "face"))

where (Predicate "image") would be a predicate that is true when its
argument is an image, and false otherwise,

which, in this case, would be equivalent to

(And face-tv
  (Predicate "image")
  (Predicate "face"))

since (Predicate "face") is marginal.

I'll continue further without using ContextLink, assuming the universe
is a set of images.

(define eyes-if-face-tv (stv 1 0))
(define eyes-if-noface-tv (stv 1 0))

Now I'm gonna use a to-be-implemented constructor dv, for
Distributional Value, see
https://github.com/opencog/atomspace/issues/833

(define eyes-dv (dv (list #t eyes-if-face-tv) (list #f eyes-if-noface-tv)))

this is a conditional distributional value (specifically here a
conditional truth value as well) describing the second order
distribution of the probability of having eyes if there is a face in
the image, and the second order distribution of the probability of
having eyes if there is no face in the image.

(Implication eyes-dv
  (Predicate "face")
  (Predicate "eyes"))

(define both-eyes-if-face-tv (stv 1 0))
(define both-eyes-if-noface-tv (stv 1 0))

(define both-eyes-dv (dv (list #t both-eyes-if-face-tv)
                         (list #f both-eyes-if-noface-tv)))

(Implication both-eyes-dv
  (Predicate "face")
  (Predicate "both-eyes"))

Now let's describe the relationship between eyes, both-eyes and
left-eye

(Implication (dv (list (list #f #f) 0)
                 (list (list #f #t) 0)
                 (list (list #t #f) 0.5)
                 (list (list #t #t) 1))
  (List
    (Predicate "eyes")
    (Predicate "both-eyes"))
  (Predicate "left-eye"))

this essentially describes a Bayesian network edge between the pair of
variables (eyes, both-eyes) and the variable left-eye. To be in
accordance with Alexey's example, it is a first order (conditional)
distribution. Respectively, 0 is the probability of having a left-eye
given having no eyes, 0.5 is the probability of having left-eye given
eyes but no both-eyes, etc.
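Read as a conditional probability table, the dv above can be exercised directly; a small Python sketch with the table as a dict keyed by (eyes, both-eyes):

```python
import random

# The conditional distribution of left-eye given (eyes, both-eyes),
# transcribed from the dv above: P(left-eye = True | eyes, both-eyes).
LEFT_EYE_CPT = {
    (False, False): 0.0,
    (False, True):  0.0,
    (True,  False): 0.5,
    (True,  True):  1.0,
}

def sample_left_eye(rng, eyes, both_eyes):
    """Draw left-eye from the conditional table, given its parents."""
    return rng.random() < LEFT_EYE_CPT[(eyes, both_eyes)]

rng = random.Random(1)
```

This is the Bayesian-network-edge reading of the Implication: the antecedent List supplies the key, the dv supplies the row.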

Maybe the formalism to describe the conditional probabilities could be
more PPL like, which could help having more compact representation.

Let us now describe the relationship between eyes, both-eyes,
left-eyes and right-eye

(Implication (dv (list (list #f _ _) 0)
                 (list (list #t #f #f) 1)
                 (list (list #t #f #t) 0)
                 (list (list #t #t _) 1))
  (List
    (Predicate "eyes")
    (Predicate "both-eyes")
    (Predicate "left-eye"))
  (Predicate "right-eye"))

where _ is used to avoid duplicating probabilities.

There are other/better ways to represent that, which depart even
further from the PPL logic of the example, for instance

(Implication #t
  (Predicate "both-eyes")
  (And
    (Predicate "left-eye")
    (Predicate "right-eye")))

(Implication #t
  (And
    (Predicate "eyes")
    (Not (Predicate "both-eyes")))
  (XOr
    (Predicate "left-eye")
    (Predicate "right-eye")))

etc.

Once we have all that then we can generate samples like

(Evaluation (Predicate "face") X)

for specific Xs, maybe via a schema (a generative model) or sensory
inputs. Then infer knowledge about these instances using conditional
instantiation, etc., to obtain

(Evaluation (Predicate "left-eye") X)

etc.

Nil

Alexey Potapov

unread,
Jun 4, 2018, 3:54:37 AM6/4/18
to Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
Nil,
thanks. So, do you think GDTV is really necessary here?
Also, I'm not quite sure whether you wrote a program (or whatever we call this) for recognizing the presence of a face, or for generating an image of a face.
As far as I understand, (Evaluation (Predicate "face") X) tries to evaluate whether (Predicate "face") is true for a given X, right?
If so, this is also relevant as an example, but I asked about generating/imagining X from "nothing". This means that we randomly sample whether there is a face, whether this face has two eyes, etc. So, it would be OK to use RandomStream for (define appearance (repeat 20 (lambda () (gaussian 0 1)))), except that there will be a problem with learning.
I expected to see ImplicationLinks, stv's, etc., and one of my implicit questions was what will invoke sampling through Implication and other pieces of declarative knowledge, and how sampling with the use of (gd?)tv will be done...

Nil Geisweiller

unread,
Jun 4, 2018, 6:05:22 AM6/4/18
to Alexey Potapov, Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
On 06/04/2018 10:54 AM, Alexey Potapov wrote:
> If so, this is also relevant as an example, but I asked about
> generating/imagining X from "nothing". This means that we randomly
> sample if there is a face, if this face has two eyes, etc. So, it would

There are no URE rules that currently do that.

Currently the closest you can get without writing new rules is to have
some process that inserts into the atomspace knowledge such as

(Evaluation #t
  (Predicate "image")
  (Concept "i1"))

where i1 is generated. So this process adds the assumed knowledge that
"i1" is an image. Then using knowledge like

Implication
  P
  Q

and instances of P like

Evaluation px1-tv
  P
  X1
...

Then the conditional instantiation rule can infer knowledge like

Evaluation qx1-tv
  Q
  X1
...

with qx1-tv calculated to represent the probability distribution of
such an instance. In the end you end up with a "sample prototype" plus
probability distributions, rather than specific samples drawn from such
distributions.

So, for example, if, say, you generate an image instance called i1 and
add as knowledge that it is an image

(Evaluation #t
  (Predicate "image")
  (Concept "i1"))

you can infer that

(Evaluation (stv 1 0)
  (Predicate "face")
  (Concept "i1"))

(using the Context-to-Implication rule plus conditional instantiation),
etc., till you get something like

(Evaluation some-tv
  (Predicate "left-eye")
  (Concept "i1"))

where some-tv is not #t or #f but rather a distribution. If you want
to realize (collapse the probability distribution of) that instance,
it's possible, but you'd need to code some URE rule for that. If you
don't want to spoil the true inferred tv, some-tv, it could be
expressed using ContextLink, something like

(Context #t
  (Predicate "my-imaginary-world-1")
  (Evaluation some-tv
    (Predicate "left-eye")
    (Concept "i1")))

(Context #f
  (Predicate "my-imaginary-world-2")
  (Evaluation some-tv
    (Predicate "left-eye")
    (Concept "i1")))

so these wouldn't overwrite the universal truth of some-tv, while
generating plausible samples of it.

I suppose one could skip the use of probabilistic inference + Context
by using URE rules that directly sample instead. Maybe each PLN rule
could be used in sampling mode instead of full second-order
probability inference mode.

It would be an interesting exercise to implement such rules (or a rule
transformer if we want to use the existing PLN rules). So, for
instance, given

Implication 0.6 (to use a simple first-order probability)
  P
  Q

and

Evaluation #t
  P
  i1

it would generate

Evaluation #t or #f
  Q
  i1

with #t or #f drawn from the probability distribution described in the
implication.
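Such a sampling-mode rule is nearly a one-liner once the implication strength is read off; a Python sketch with illustrative names:

```python
import random

def sample_instantiation(rng, implication_strength, antecedent_true):
    """Given Implication <s> P Q and Evaluation #t P i1, draw a crisp
    truth value for Q(i1) from the first-order strength s.
    Returns None when the antecedent is not known to hold."""
    if not antecedent_true:
        return None
    return rng.random() < implication_strength

rng = random.Random(3)
draws = [sample_instantiation(rng, 0.6, True) for _ in range(10000)]
```

Repeated draws recover the implication strength as a frequency, so iterating such rules yields a forward sampler over the whole network of implications.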

Nil

Alexey Potapov

unread,
Jun 4, 2018, 7:04:46 AM6/4/18
to Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
OK. Let's start from the end.
Assume we have a Node "image", whose value is calculated by some DNN. How can we best represent this?
We would like to invoke this calculation using a ValueOf link applied to "image".
The DNN has a parameter, i.e. the value of some atom, whose calculation is invoked when we want to compute the DNN output.
Curtis Faith

unread,
Jun 5, 2018, 1:06:01 PM6/5/18
to ope...@googlegroups.com
I will have more on this later but let me say that I enjoy nothing more than writing a new language to unify and translate. See my linkedIN profile on my p2c work, and Jim Schmidt's work while running my engineering when we built the thing Apple failed at doing, but ironically enough Apple dropped the business line just as that was to be our fucking competitive advantage. It had native controls on Windows with a superior Mac programming model. I've been thinking about the programming model and am ready to call a team to D.C. to start it soon.

I found an original Like a Prayer platinum certification, like hanging on the artist's wall but not the writer's wall because they had to hock it and got shit for it, album in the original mint condition everything album, which I will be leveraging through my trading acumen into a sum you might imagine low at.... but that I conservatively estimate to be worth more than any record ever auctioned but this one won't be. Jim will be coming down later to secure it with me so we can lock it down. Damn that was a lucky find.

It would be good to see you guys and anyone you want to bring if anyone wants to come to D.C. now or in the next few.

- Curtis



--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/CABpRrhwjbFqve_sREBSraJsZCDPYzSq_eQWcqYTLyr7COUCvrA%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.

Nil Geisweiller

unread,
Jun 6, 2018, 1:18:38 AM6/6/18
to ope...@googlegroups.com, Curtis Faith
Hi Curtis,

Good to hear from you. Just so you know, I think you've sent your
message to the wrong recipient.

Nil

Nil Geisweiller

Jun 6, 2018, 1:31:47 AM6/6/18
to Alexey Potapov, Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
Hi Alexey,

On 06/04/2018 02:04 PM, Alexey Potapov wrote:
> OK. Let's start from the end.
> Assume we have a Node "image" whose value is calculated by some DNN.
> How can we best represent this?
> We would like to invoke this calculation using a ValueOf link applied to
> "image".
> The DNN has a parameter, i.e. the value of some atom, whose calculation is
> invoked when we want to calculate the DNN output.

I don't understand the question very well. I suppose you may want to
store both parameters and outputs as values.

As for calculating DNN output, I don't think Atomese alone, without
resorting to grounded schemata, can do it ATM, as I don't see any atom
link type value modifier (as opposed to ValueOf).

If your question is about StreamValue, I don't know much about it.

Linas?

Nil

Vitaly Bogdanov

Jun 6, 2018, 8:27:54 AM6/6/18
to Nil Geisweiller, Alexey Potapov, Linas Vepstas, opencog, Ben Goertzel, Cassio Pennachin
Assume we have a Node "image" whose value is calculated by some DNN. How can we best represent this?
We would like to invoke this calculation using a ValueOf link applied to "image".
The DNN has a parameter, i.e. the value of some atom, whose calculation is invoked when we want to calculate the DNN output.

I don't understand the question very well. I suppose you may want to
store both parameters and outputs as values.

As for calculating DNN output, I don't think Atomese alone, without
resorting to grounded schemata, can do it ATM, as I don't see any atom
link type value modifier (as opposed to ValueOf).

If your question is about StreamValue, I don't know much about it.

The following code example illustrates a possible representation in Atomese:

;; A grounded Scheme procedure standing in for the DNN invocation.
(define (test atom)
    (display "My func called with atom arguments") (newline)
    (display atom) (newline)
)

;; Store the DNN's input argument as a value on a holder atom.
(cog-set-value! (Concept "argHolder") (Predicate "arg") (ConceptNode "value"))

;; Attach, as a value on "image", an evaluatable expression that calls
;; the grounded procedure with the argument fetched via ValueOf.
(cog-set-value!
    (Concept "image")
    (Predicate "dnn")
    (EvaluationLink
        (GroundedPredicateNode "scm: test")
        (ListLink (ValueOf (Concept "argHolder") (Predicate "arg")))
    )
)

;; Trigger the computation by evaluating the stored expression.
(cog-evaluate! (ValueOf (Concept "image") (Predicate "dnn")))
 

This code works, but some things either don't work or are not natural:
1) argHolder holds an argument, and the argument cannot be a ProtoAtom at the moment. If any value ((RandomStream 1), for instance) is used instead of (ConceptNode "value") in the example above, then the Instantiator dynamically casts it to a Handle, the cast returns a null pointer, and the "test" function is called with an empty argument list.
2) There is a ValueOfLink to get a value, but there is no atom to set a value. In the example above, (cog-set-value!) is used to set values on atoms.

Vitaly

Alexey Potapov

Jun 7, 2018, 2:48:56 PM6/7/18
to Vitaly Bogdanov, Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Cassio Pennachin
2018-06-06 15:27 GMT+03:00 Vitaly Bogdanov <vsb...@gmail.com>:

The following code example illustrates a possible representation in Atomese:

(define (test atom)
    (display "My func called with atom arguments") (newline)
    (display atom) (newline)
)

(cog-set-value! (Concept "argHolder") (Predicate "arg") (ConceptNode "value"))

(cog-set-value!
    (Concept "image")
    (Predicate "dnn")
    (EvaluationLink
        (GroundedPredicateNode "scm: test")
        (ListLink (ValueOf (Concept "argHolder") (Predicate "arg")))
    )
)

(cog-evaluate! (ValueOf (Concept "image") (Predicate "dnn")))
 

This code works, but some things either don't work or are not natural:
1) argHolder holds an argument, and the argument cannot be a ProtoAtom at the moment. If any value ((RandomStream 1), for instance) is used instead of (ConceptNode "value") in the example above, then the Instantiator dynamically casts it to a Handle, the cast returns a null pointer, and the "test" function is called with an empty argument list.
2) There is a ValueOfLink to get a value, but there is no atom to set a value. In the example above, (cog-set-value!) is used to set values on atoms.

It might not be entirely clear why we are not satisfied with this solution.
Let's imagine that we want OpenCog to learn to generate images distributed similarly to some training set. We can define some loss function. But over which variables should we optimize this loss function?
In TensorFlow, all tf.Variables that the loss function depends on are automatically taken as the parameters to be optimized (if the set of variables to optimize is not provided explicitly).
In PPLs, all random choices such as flip or gaussian are treated as such variables.
Since we don't necessarily want to deal with PPL-style inference in OpenCog, explicit indication of variable values is necessary. We lack this right now.
Then, we need to automatically find the values to be optimized or marginalized. The only case in which we don't need to do this explicitly is rejection-sampling-style inference, because there we can simply run a query (conditional sampling) many times and build a histogram of the variable of interest.
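The rejection-sampling style just mentioned can be sketched in plain Python; this is a toy stand-in, not OpenCog code, and `sample_model`, `rejection_query`, and the two-coin-flip model are all hypothetical illustrations:

```python
import random
from collections import Counter

def sample_model():
    """One forward run of a toy generative model: two fair coin flips."""
    a = random.random() < 0.5
    b = random.random() < 0.5
    return a, b

def rejection_query(condition, query, n=10000):
    """Rerun the model from scratch n times, keep only the samples that
    satisfy the condition, and histogram the queried variable."""
    hist = Counter()
    for _ in range(n):
        sample = sample_model()
        if condition(sample):
            hist[query(sample)] += 1
    return hist

random.seed(0)
# Conditional query: P(a | a or b); analytically 0.5 / 0.75 = 2/3.
hist = rejection_query(condition=lambda s: s[0] or s[1],
                       query=lambda s: s[0])
```

Because every query reruns the model from scratch, nothing about previous samples has to be memorized, which is exactly why this style needs no explicit marking of variable values.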
However, if we want to use even simulated annealing, we need to memorize previous random choices (or the values of the variables being optimized) and, instead of resampling them from scratch, modify their previous values (i.e. sample new values from a distribution shifted towards the previous ones). In the case of simulated annealing, we sample new values from the shifted distribution first, and then accept or reject the new candidate solution based on the value of the loss function.
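That annealing loop can be sketched as follows; this is a minimal Python sketch with a hypothetical quadratic loss standing in for an Atomese-defined loss:

```python
import math
import random

def loss(x):
    """Hypothetical stand-in for an Atomese-defined loss function."""
    return (x - 3.0) ** 2

def simulated_annealing(x0, steps=2000, t0=1.0):
    """Memorize the previous choice and perturb it, rather than
    resampling from scratch; accept or reject by the loss value."""
    x, fx = x0, loss(x0)
    for k in range(steps):
        t = t0 / (1 + k)             # cooling schedule
        cand = random.gauss(x, 0.5)  # distribution shifted towards the previous value
        fc = loss(cand)
        # Always accept improvements; accept worse candidates with
        # Boltzmann probability exp(-(fc - fx) / t).
        if fc < fx or random.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
    return x

random.seed(0)
x_opt = simulated_annealing(-10.0)
```

The point of the sketch is the state threaded through the loop: `x` is the memorized previous choice, and each proposal is drawn from a Gaussian centred on it, not from the prior.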
1) If we cannot express our loss functions in pure Atomese, we cannot hope to implement inference procedures in Atomese (and thus achieve self-referentiality, cognitive synergy between different procedures, etc.). Well... we can wrap cog-set-value! and cog-evaluate! over a ValueOf link into some predicates to be used everywhere, but this seems less efficient and cumbersome...
2) The variable/unknown values on which a given loss function depends can be naturally determined by the Pattern Matcher during its recursive evaluation of subgraphs. Thus, we have a number of options. We can introduce some specific link such as OptimizeLink or ProbabilisticQueryLink, which would be a counterpart of BindLink but for values (I mean OptimizeLink should do one step of optimization, not the whole optimization process). Or, we can try to implement this functionality at the level of the URE (which might be less natural, but more flexible). In both cases, we need some means of expressing variable values, so that the PM could identify them as such, or so that we could program the URE with rules involving this type of link. I'd like to postpone the discussion of whether OptimizeLink/ProbabilisticQueryLink should be implemented as an extension/particular case of BindLink or as rules for the URE. However, I'd like to emphasize that to proceed we need some unavoidable small modifications (setting values by means of Atomese, passing values as arguments, indicating inferable/optimizable values, or establishing a convention that all uncertain values should be interpreted as such).
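The idea of determining the unknown values a loss depends on during a recursive traversal can be illustrated with a toy expression walker; the tuple-based encoding and `collect_vars` here are pure-Python hypothetical stand-ins for the Pattern Matcher machinery:

```python
# Toy stand-in: expressions are nested tuples; a ("var", name) leaf marks
# an unknown value, analogous to a variable value the Pattern Matcher
# would identify while recursively evaluating subgraphs.
def collect_vars(expr):
    """Recursively gather the free variables an expression depends on."""
    if isinstance(expr, tuple):
        if expr[0] == "var":
            return {expr[1]}
        found = set()
        for sub in expr[1:]:
            found |= collect_vars(sub)
        return found
    return set()  # constants and literals contribute no variables

# loss = (w * x) - 1.0, depending on the unknown values w and x
loss_expr = ("sub", ("mul", ("var", "w"), ("var", "x")), ("const", 1.0))
```

The walk discovers exactly the set of values an optimizer would have to vary, which is the analogue of TensorFlow's automatic collection of the tf.Variables a loss depends on.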

So, I'd like us to make the next small step... We can implement the necessary functionality ourselves, but I'm afraid it will take too long either to coordinate each such small step, or to wait for Nil or Linas to have the time and a clear understanding of what we need in order to implement this. What is the best strategy to move forward faster?
Well... we can try doing as much prototyping as possible by wrapping existing Scheme interfaces into Atomese predicates, without modifying core modules, but...


-- Alexey

Nil Geisweiller

Jun 8, 2018, 1:44:50 AM6/8/18
to Alexey Potapov, Vitaly Bogdanov, Nil Geisweiller, Linas Vepstas, opencog, Ben Goertzel, Cassio Pennachin
Hi,

On 06/07/2018 09:48 PM, Alexey Potapov wrote:
> So, I'd like us to make the next small step... We can implement the
> necessary functionality ourselves, but I'm afraid it will take too
> long either to coordinate each such small step, or to wait for Nil or Linas
> to have the time and a clear understanding of what we need to implement this.

I don't worry too much about the time it would take you to implement
that (I would think Vitaly has gathered enough understanding to handle
that for instance) as long as you are confident that it is what you
need.

> What is the best strategy to move forward faster?

Maybe write a provisional design document detailing what you want to
achieve, how, etc, going into specifics when possible, supported by
examples, etc.

It would certainly help me :-)

> Well... we can try doing as much prototyping wrapping existing Scheme
> interfaces into Atomese predicates as possible without modifying core
> modules, but...

Sure; maybe also use atoms instead of values, as inefficient as that may
be, just to get a working prototype, even if it has plenty of shortcomings.

Nil

Linas Vepstas

Jun 10, 2018, 3:55:32 PM6/10/18
to Alexey Potapov, Nil Geisweiller, opencog, Ben Goertzel, Vitaly Bogdanov, Cassio Pennachin
Hi Alexey,

I have to ignore this email exchange for just a little while longer; a different project needs attention. I'll try to write back "real soon now".  On a related note, I would like to draw your attention to https://github.com/opencog/atomspace/blob/master/opencog/matrix/README.md  -- it is a technique for making portions of the atomspace look "just like vectors" -- thereby enabling those kinds of machine-learning algos that view the world as vectors.  Neural nets are at least partly vector-style machines.

The above has little/nothing to do with probabilistic programing, but it does provide an alternative window into the contents of the atomspace, that is more convenient to certain classes of algos.

--linas