UnionLink, IntersectionLink, ComplementLink


Nil Geisweiller

Sep 10, 2021, 3:52:53 AM
to ope...@googlegroups.com
Everybody,

Linas and I (and a mathematical divine entity, I'm sure) decided to
introduce

https://wiki.opencog.org/w/UnionLink
https://wiki.opencog.org/w/IntersectionLink
https://wiki.opencog.org/w/ComplementLink

to be used instead of OrLink, AndLink and NotLink when dealing with
concepts/sets instead of predicates.

Sorry for making such an important decision without much outside
consultation, but it happened "naturally" while dealing with the atom
type checker. It makes the code simpler, and I believe it will also
make things simpler for human use. After all, we already have
Inheritance vs. Implication, so why not go all the way?

We're obviously open to reverting that change if it turns out to be a
bad idea, but I don't expect it will.

Nil

Linas Vepstas

Sep 10, 2021, 11:41:27 AM
to opencog
Heh.

Nil gives credit to the divine for what I would ascribe to plain old
bumbling around. The definitions for these new link types were in the
definitions file for almost a decade, but stubbed out ... because no
one needed them or asked for them.

Some history: in "boolean algebra" (which deals with infinite sets),
the set-or is the same thing as set-union, which makes sense when
you look at it a bit. In comp-sci and logic, we are used to thinking
that logical-or operates only on "boolean" true/false (and no one
ever calls it "logical-union").

In practice, we need both: PLN deals with named sets (for example:
Concept "animals") while low-level systems, like robot control, need
logical true/false (for example: if(block-on-table) then
pick-up-block).

The second system is very particular about what it allows. You can
write programs in Atomese, and run them. For example: (SequentialAnd
(Predicate "block-on-table") (Predicate "gripper-is-free") (Predicate
"pick-up-block")) will run these three actions in order, and stop at
the first one that fails. So if (Predicate "gripper-is-free") returns
false, then (Predicate "pick-up-block") will never run.
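The short-circuit behavior described here can be sketched in plain Python (a toy stand-in for the Atomese SequentialAnd; the predicate names are just stand-ins for the Predicates above):

```python
# A minimal sketch of SequentialAnd's short-circuit behavior:
# run each step in order, stopping at the first that fails.
def sequential_and(*predicates):
    for predicate in predicates:
        if not predicate():
            return False        # stop: later predicates never run
    return True

ran = []                        # record which steps actually ran

def block_on_table():
    ran.append("block-on-table"); return True

def gripper_is_free():
    ran.append("gripper-is-free"); return False

def pick_up_block():
    ran.append("pick-up-block"); return True

result = sequential_and(block_on_table, gripper_is_free, pick_up_block)
print(result, ran)   # False ['block-on-table', 'gripper-is-free']
```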

Because Atomese programs need to be, well, runnable without errors,
they need to be well-formed. That means that everything in a
SequentialAnd block *must* be evaluatable, i.e. *must* return
true/false when evaluated. Thus the idea of a type checker was born.
The code in Checkers.cc actually looks to make sure that everything in
a SequentialAnd block is evaluatable, and throws an error if not.

In practice, this prevented you from writing expressions like (OrLink
(Concept "sticky") (Concept "slimy")) because Concepts are not
evaluatable; they're sets. Predicates are evaluatable: (OrLink
(Predicate "sticky") (Predicate "slimy")) is acceptable. To keep the
peace with those who need to work with sets, UnionLink was invented.
This allows OrLink to continue onwards with strict type-checking,
while allowing UnionLink to express ideas about set-unions.
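To illustrate the distinction in plain Python rather than Atomese: set-union combines the sets themselves, while logical-or combines the truth values of evaluations. The concept names here are invented for illustration:

```python
# Concepts denote sets; Predicates evaluate to true/false.
sticky_things = {"tape", "honey", "gum"}
slimy_things  = {"slug", "honey"}

# UnionLink-style: a set operation, yielding another set.
sticky_or_slimy = sticky_things | slimy_things

# OrLink-style: a boolean operation over evaluations.
def is_sticky(x): return x in sticky_things
def is_slimy(x):  return x in slimy_things

print(sticky_or_slimy)                        # the combined set
print(is_sticky("slug") or is_slimy("slug"))  # True
```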

Nil, BTW: we could arrange the type checker to check that UnionLink
only ever works with Sets and Concepts, and throw an error in all
other cases. This is probably stricter than you need it to be, right
now, but still, type-checking can be a useful thing for debugging.
----
As to divine intervention vs. bumbling around: I'm still working on
unsupervised learning, which I hope will someday be able to learn the
rules of (common-sense) inference. I think I know how to apply it to
audio and video data, and am looking for anyone who is willing to get
neck-deep in both code and theory. In particular, for audio and
video, I need someone who knows GPU audio/video processing libraries,
and is willing to learn how to wrap them in Atomese. For starters.
After that comes the painful slog of actually mangling data.

-- linas



--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.

Adrian Borucki

Sep 12, 2021, 9:29:52 AM
to opencog
I might have some time to help with this - I only did a bit of video/audio processing for ML, but I have
some familiarity with the AtomSpace, so that part should be easier.

Linas Vepstas

Sep 12, 2021, 12:55:23 PM
to opencog
On Sun, Sep 12, 2021 at 8:29 AM Adrian Borucki <gent...@gmail.com> wrote:
>
>> ----
>> As to divine intervention vs. bumbling around: I'm still working on
>> unsupervised learning, which I hope will someday be able to learn the
>> rules of (common-sense) inference. I think I know how to apply it to
>> audio and video data, and am looking for anyone who is willing to get
>> neck-deep in both code and theory. In particular, for audio and
>> video, I need someone who knows GPU audio/video processing libraries,
>> and is willing to learn how to wrap them in Atomese. For starters.
>
>
> I might have some time to help with this - I only did a bit of video / audio processing for ML but I have
> some familiarity of AtomSpace, so that part should be easier.
>

Wow! That would be awesome!

I thought some more about the initial steps. A large part of this
would be setting up video/audio filters to run on GPU's, with the goal
of being able to encode the filtering pipeline in Atomese -- so that
expressions like "apply this filter, then that filter, then combine
this and that" are stored as expressions in the AtomSpace.

The research program would then be to look for structural correlations
in the data. Generate some "random" filter sequences (building on
previously "known good" filter structures) and see if they have
"meaningful" correlations in them. Build up a vocabulary of "known
good" filter sequences.

One tricky part is finding something simple to start with. I imagined
the local webcam feed: it should be able to detect when I'm in front
of the keyboard, and when not, and rank that as an "interesting" fact.
Possibly also detect day-night cycles. A very fancy thing to do would
be to notice that faces have two eyes that occur above a mouth, above
a chin, symmetrically arranged. That is, "eyes", "mouth" are two
"words" of a grammar, and the grammar is very strict: the only
grammatical "sentences" are those where the eyes are equidistantly
arranged in proportion to a mouth (an isosceles triangle). A goal
would be to learn that grammar. If the word "grammar" is confusing,
here, I can explain in greater detail. In short, all patterns have
grammars, and pattern recognition is the same thing as grammar
recognition.

This is a three-stage project: building enough infrastructure to
run experiments; running the experiments to see if they work; and
refining the theory when they don't. All three are different skill
sets ... that's what makes it a challenge.

-- Linas

Adrian Borucki

Sep 13, 2021, 7:49:19 AM
to opencog
Sounds like something that would be processed with a library like OpenCV — though it's important to distinguish between
loading the video data and running GPU-accelerated operations on it. My experience with the latter is limited, as it is usually wrapped by a
library like PyTorch or RAPIDS. There is also a difference between running something on-line and batch-processing a dataset — you mostly gain from GPU acceleration
with the latter, unless it's something computationally expensive that has to run in real time.

First, we need to elucidate what the actual "filters" are supposed to be — once we have a list, I can think about how the operations would be run.
Second, if you don't have an existing dataset that we can use, then we have to build one; that is probably the most time- and resource-consuming task here… it should probably be done first, actually.
There are existing video datasets that might be useful; it's worth looking into those.

Linas Vepstas

Sep 13, 2021, 1:53:55 PM
to opencog, link-grammar
Good. Before that, though, I think we need to share a general vision
of what the project "actually is", because that will determine
datasets, libraries, etc. I tried to write those down in a file
https://github.com/opencog/learn/blob/master/README-Vision.md -- but
it is missing important details, so let me try an alternate sketch.

So here's an anecdote from Sophia the Robot: she had this habit of
trying to talk through an audience clapping. Basically, she could not
hear, and didn't know to pause when the audience clapped. (Yes, almost
all her performances are scripted. Some small fraction are ad libbed.)
A manual operator in the audience would have to hit a pause button, to
keep her from rambling on. So I thought: "How can I build a clap
detector?" Well, it would have to be some kind of audio filter -- some
level of white noise (broad-spectrum noise), but with that peculiar
clapping sound (so, not pure white noise, but dense shot noise),
elevated above a threshold T for some time period S at least one
second long. It is useful to think of this as a wiring diagram: some
boxes connected with lines; each box might have some control
parameters: length, threshold, time, frequency.
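The clap-detector "box" sketched above amounts to a threshold-plus-duration test. A minimal Python sketch, with made-up numbers standing in for real audio levels:

```python
def sustained_above(samples, threshold, min_run):
    """True if the level exceeds `threshold` for at least
    `min_run` consecutive samples -- the clap-detector box,
    with the threshold and duration as its control parameters."""
    run = 0
    for level in samples:
        run = run + 1 if level > threshold else 0
        if run >= min_run:
            return True
    return False

# Quiet speech, then a sustained burst of broadband noise.
signal = [0.1, 0.2, 0.1] + [0.9] * 10 + [0.2]
print(sustained_above(signal, threshold=0.5, min_run=8))  # True
```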

So how do I build a clap detector? Well, download some suitable audio
library, get some sound samples, and start trying to wire up some
threshold detector *by hand*. Oooof. Yes, you can do it that way:
classical engineering. After that, you have a dozen different other
situations: booing. Laughing. Tense silence. Chairs scraping. And
after that, a few hundred more... it's impossible to hand-design a
filter set for every interesting case. So, instead: unleash automated
learning. That is, represent the boxes and wires as Nodes and Links
in the AtomSpace (the audio stream itself would be an
AudioStreamValue) and let some automated algo rearrange the wiring
diagram until it finds a good one.

But what is a "good wiring diagram"? Well, the current very
fashionable approach is to develop a curated labelled training set,
and train on that. "Curated" means "organized by humans" (ooof-dah,
humans in the loop again!) and "labelled" means each snippet has a
tag: "clapping" - "cheering" - "yelling". (Yuck. What kind of yelling?
Happy? Hostile? Asking for help? Are the labels even correct?) This
might be the way people train neural nets, but really, it's the wrong
approach for AGI. I don't want to do supervised training. (I mean, we
could do supervised training in the opencog framework, but I don't see
any value in that, right now.) So, let's do unsupervised training.

But how? Now for a conceptual leap. This leap is hard to explain in
terms of audio filters (it's rather abstract), so I want to switch to
vision, before getting back to audio. For vision, I claim there
exists something called a "shape grammar". I hinted at this in the
last email. A human face has a shape to it - a pair of eyes,
symmetrically arranged above a mouth, in good proportion, etc. This
shape has a "grammar" that looks like this:

left-eye: (connects-to-right-to-right-eye) and (connects-below-to-mouth) and (connects-above-to-forehead);
forehead: (connects-below-to-left-eye) and (connects-below-to-right-eye) and (connects-above-to-any-background);

Now, if you have some filter collection that is able to detect eyes,
mouths and foreheads, you can verify whether you have detected an
actual face by checking against the above grammar. If all of the
connectors are satisfied, then you have a "grammatically correct
description of a face". So, although your filter collection was
plucking eye-like and mouth-like features out of an image, the fact
that they could be arranged into a grammatically-correct arrangement
raises your confidence that you are seeing a face.
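The connector-checking step can be sketched in Python. The part names and connector labels below are invented for illustration, and real Link Grammar connectors are considerably richer:

```python
# Each "word" of the shape grammar lists the connectors it must
# satisfy (labels invented for illustration).
grammar = {
    "left-eye":  {"right-to-right-eye", "below-to-mouth"},
    "right-eye": {"left-to-left-eye", "below-to-mouth"},
    "mouth":     {"above-to-left-eye", "above-to-right-eye"},
}

def grammatical(detected):
    """detected maps each part to the set of connectors actually
    satisfied in the image; accept only if every required
    connector of every part links up."""
    return all(part in detected and grammar[part] <= detected[part]
               for part in grammar)

face = {
    "left-eye":  {"right-to-right-eye", "below-to-mouth"},
    "right-eye": {"left-to-left-eye", "below-to-mouth"},
    "mouth":     {"above-to-left-eye", "above-to-right-eye"},
}
broken = {**face, "mouth": {"above-to-left-eye"}}  # one link unmet
print(grammatical(face), grammatical(broken))  # True False
```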

Those people familiar with Link Grammar will recognize the above as a
peculiar variant of a Link-Grammar dictionary. (and thus I am cc'ing
the mailing list.)

But where did the grammar come from? For that matter, where did the
eye and mouth filters come from? It certainly would be a mistake to
have an army of grad students writing shape grammars by hand. The
grammar has to be learned automatically, in an unsupervised fashion.
... and that is what the opencog/learn project is all about.

At this point, things become very highly abstract very quickly, and I
will cut this email short. Very roughly, though: one looks for
pair-wise correlations in data. Having found good pairs, one then
draws maximum spanning trees (or maximum planar graphs) with those
pairs, and extracts frequently-occurring vertex-types, and their
associated connectors. That gives you a raw grammar. Generalization
requires clustering specific instances of this into general forms. I'm
working on those algos now.
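A crude Python sketch of the spanning-tree step (greedy Kruskal over pair scores; the pair scores and part names are toy values, not the actual learn-project code):

```python
from itertools import combinations

def max_spanning_tree(items, score):
    """Greedy Kruskal: keep the highest-scoring pairs that
    don't form a cycle (union-find over connected components)."""
    parent = {x: x for x in items}
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    edges = sorted(combinations(items, 2),
                   key=lambda e: score[e], reverse=True)
    tree = []
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:            # no cycle: accept this edge
            parent[ra] = rb
            tree.append((a, b))
    return tree

# Toy pair scores (e.g. mutual information of co-occurrence).
score = {("eye", "mouth"): 3.0, ("eye", "chin"): 1.0,
         ("mouth", "chin"): 2.0}
tree = max_spanning_tree(["eye", "mouth", "chin"], score)
print(tree)  # [('eye', 'mouth'), ('mouth', 'chin')]
```

The frequently co-occurring vertex types and their cut half-edges would then be read off such trees.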

The above can learn (should be able to learn) both a "shape grammar"
and also a "filter grammar" ("meaningful" combinations of processing
filters -- meaningful in that they extract correlations in the data).

So that is the general idea. Now, to get back to your question: what
sort of video (or audio) library? What sort of dataset? I dunno.
Beats me. Best to start small: find some incredibly simple problem,
and prove that the general idea works on that. Scale up from there.
You get to pick that problem, according to taste.

One idea was to build a "French flag detector": this should be "easy":
it's just three color bars, side by side. The grammar is very
simple. The training set might be a bunch of French flags. Now, if
the goal is to ONLY learn the shape grammar, then you have to hack up,
by hand, some ad hoc color and hue and contrast filters. If you want to
learn the filter grammar, then ... well, that's a lot harder for
vision, because almost all images are extremely information-rich. The
training corpus would have to be selected to be very simple: only
those flags in canonical position (not draped). Then, either one has
extremely simple backgrounds, or one has a very large corpus, as
otherwise you risk training on something in the background, instead
of the flags.
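For concreteness, a toy Python sketch of a tricolor "flag grammar" check over a grid of coarse color labels (the layout and labels are invented; a real pipeline would get the labels from learned hue filters, not from raw pixels):

```python
def dominant(labels):
    # Most frequent color label in a collection.
    return max(set(labels), key=labels.count)

def is_french_flag(image):
    """image: a list of rows of color labels. Split the image into
    three vertical thirds and check each third's dominant color
    against the blue-white-red "grammar"."""
    width = len(image[0])
    third = width // 3
    cols = list(zip(*image))          # transpose to columns
    bands = [cols[:third], cols[third:2*third], cols[2*third:]]
    labels = [dominant([c for col in band for c in col])
              for band in bands]
    return labels == ["blue", "white", "red"]

flag = [["blue"]*2 + ["white"]*2 + ["red"]*2 for _ in range(4)]
print(is_french_flag(flag))  # True
```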

For automated filter-grammars, perhaps audio is simpler? Because most
audio samples are not as information-rich as video/photos?

I dunno. This is where it becomes hard. Even before all the fancy
theory and what-not, finding a suitable toy problem that is solvable
without a hopeless amount of CPU processing and practical stumbling
blocks ... that's hard. Even worse is that state-of-the-art neural-net
systems have billions of CPU-hours behind them, computed with
well-written, well-debugged, highly optimized software, created by
armies of salaried PhDs working at the big tech companies. Any
results we get will look pathetic, compared to what those systems can
do.

The reason I find it promising is this: all those neural-net systems
do is supervised training. They don't actually "think"; they don't
need to. They don't need to find relationships out of thin air. So I
think this is something brand new that we're doing that no one else
does. Another key difference is that we are working explicitly at the
symbolic level. By having a grammar, we have an explicit part-whole
relationship. This is something the neural-net guys cannot do. (Hinton,
I believe, has a paper on how one day in the distant future, neural
nets might be able to solve the part-whole relationship problem. By
contrast, we've already solved it, more or less from day one.)

We've also "solved" the "symbol grounding problem" -- from day one.
This is another problem that AI researchers have been wringing their
hands about, from the 1960's onwards. Our symbols are grounded, from
the start: our symbols are the filter sets, the grammatical dictionary
entries, and we "know what they mean" because they work with explicit
data.

Another very old AI problem is the "frame problem", and I think that
we've got that one licked, too, although this is a far more tenuous
claim. The "frame problem" is one of selecting only those things that
are relevant to a particular reasoning problem, and ignoring all of
the rest. Well, hey: this is exactly what grammars do: they tell you
exactly what is relevant, and they ignore the rest. The grammars have
learned to ignore the background features that don't affect the
current situation. But whatever... This gets abstract and can lead to
an endless spill of words. I am much more interested in creating
software that actually works.

So .. that's it. What are the next steps? How can we do this?

-- Linas

Nil Geisweiller

Sep 14, 2021, 4:46:59 AM
to ope...@googlegroups.com
On 9/10/21 18:41, Linas Vepstas wrote:
> Nil, BTW: we could arrange the type checker to check that UnionLink
> only ever works with Sets and Concepts, and throw an error in all
> other cases. This is probably stricter than you need it to be, right
> now, but still, type-checking can be a useful thing for debugging.

Sure. Let's give some time to gradually transition (there might still
be some code using Or, And, Not over sets/concepts), then add these checks.

Nil

Adrian Borucki

Sep 14, 2021, 10:09:30 AM
to opencog
Here’s the part I have questions about: how do you deal with the fact that the regions often won’t be connected?
I am familiar with the idea of using Region Connection Calculus, mentioned in places like “Symbol Grounding via Chaining of Morphisms” and chapter 17 on spatio-temporal inference from EGI vol. 2.
And it seems you have to use fuzzy versions of these relationships because, using the face-grammar example, you won’t get a situation where, for instance, detected eye regions (like bounding boxes from an object detector) are exactly connected together — there is going to be some distance in between.

So how do you deal with this? The STI chapter mentions certain computational difficulties with the fuzzy approach and proposes that using some crude assumptions you could have something that could then be trained on a dataset to further improve it.
Is this part of the “learn” project or is there some other approach to it?
Well, we can reuse some of those for our purposes — a generic object detection model can be used to spot all sorts of things on an image, we just need to find one that was trained with a taxonomy that suits us.
Using such models with OpenCog has been done already by Alexei Potapov et al. if I remember correctly. It’s mostly a matter of adapting that scheme to the specifics of this project.

The challenge is, as always, to find data that has a model that can detect the things we want — with faces, for example, I can’t find detectors for face parts, but I can find models detecting key points, which include mouths and eyes.

Linas Vepstas

Sep 14, 2021, 10:53:18 PM
to opencog
Trimming back the first part of the conversation...

On Tue, Sep 14, 2021 at 9:09 AM Adrian Borucki <gent...@gmail.com> wrote:

>
> Here’s the part I have questions about: how do you deal with the fact that the regions won’t often be connected?

I don't understand the question. What regions? Where did they come
from? What do you mean by "region"?

> I am familiar with an idea of using Region Connection Calculus mentioned in places like “Symbol Grounding via Chaining of Morphisms” and chapter 17 on spatio-temporal inference from EGI vol. 2.

I'm not familiar with this. What is a "region"?

> And it seems you have to use fuzzy versions of these relationships because,

Sorry, fuzzy version of what relationship?

> using the face grammar example, you won’t get a situation where, for instance, detected eye regions

Detecting eyes will be very hard; that won't be possible until a
rather large and complex software stack is working. That's why I
suggested starting with something simple -- for vision, detecting
tricolor flags in canonical position. Or maybe a video camera aimed at
a room or a sidewalk or street where there is low activity. For audio,
perhaps shifts in volume and frequency distribution.

I dunno -- do I need to try to think of other simple data streams? I
guess so ... Some people have recommended that video games be used as
input, but I really don't like that ... it seems too artificial. It's
problematic, for multiple reasons. What other kind of visual input is
simple enough to process, to be debuggable, as a proof-of-concept?

> (like bounding boxes from an object detector) are exactly connected together — there is going to be some distance in between.

What bounding boxes? Why would bounding boxes be needed? What would
you do with them?

> So how do you deal with this?

You splatted a bunch of questions without defining any of the
terminology, so I don't know how to respond... you seem to be thinking
of something very different from what I'm thinking of ... but I can't
tell what that is ...

> The STI chapter mentions certain computational difficulties with the fuzzy approach and proposes that using some crude assumptions you could have something that could then be trained on a dataset to further improve it.

What STI chapter? What fuzzy approach? Why do we need fuzzy-anything?
I thought I spelled out a rather specific, precise algorithm; the word
"fuzzy" did not appear in it ...

> Is this part of the “learn” project or is there some other approach to it?

The "learn" project has maybe 300+ pages of docs, but the basic ideas
are spelled out in a bunch of README's and overviews. It is possible
that these fail to communicate the ideas correctly, and .. that's
fixable, but will take some time. I'd rather exchange emails and
take steps one-at-a-time, rather than send you out to read hundreds of
pages of stuff...

> Well, we can reuse some of those for our purposes — a generic object detection model can be used to spot all sorts of things on an image,

Sure, but it will take many years if not a decade to build a "generic
object detection model". I don't think this is something easy or
quick -- that's the end-point, not the start point.

> we just need to find one that was trained with a taxonomy that suits us.

Learning the "taxonomy" would be a rather advanced stage of the
project. I'm sort-of-ish exploring some basic aspects of something
like that at the NLP level, but so far, its mostly ideas and very
little functional code. It will be at least a year and probably a lot
more, before we can learn taxonomy of visual or audio inputs. There's
a huge amount of preliminaries that have to be gotten out of the way.

> Using such models with OpenCog has been done already by Alexei Potapov et al. if I remember correctly. It’s mostly a matter of adapting that scheme to the specifics of this project.

? Alexei is working on something else entirely. I don't know what it
is, but it's pretty much totally unrelated to what I'm working on ...
unless he's keeping some secrets from me ...

> The challange is, as always, to find data that has a model that can detect things we want — with faces for example I can’t find detectors for face parts but I can find models detecting key points, which includes mouths and eyes.

OK, this is a misunderstanding. The goal is to NOT use some
pre-trained, pre-built, human-engineered face detector trained on a
corpus carefully curated by humans. If you are using systems that are
hand-crafted, hand-curated by humans, it's not AGI any more. I'm very
much trying to go in the exact opposite direction. The goal is to get
the human data-engineering out of the loop.

Detecting faces will be hard. It *might* be possible, maybe, once
everything is wired up, tested, debugged, tuned, tweaked, re-designed
and re-written a few times. I doubt face detection will be achievable
any sooner than a year from now, and that's only if it's a year of
full-time hard work and a whole lot of luck. Otherwise, I think face
detection is probably out of reach, for the short term. Lots of much
more basic things have to come together, first.

I dunno, maybe one could get magically lucky, but I doubt it ...

-- linas

Adrian Borucki

Sep 15, 2021, 7:59:19 AM
to opencog
Okay, some clarification is needed, because there is a sentence:
> Now, if you have some filter collection that is able to detect eyes, mouths and foreheads

That suggests using some pre-existing (i.e. human-engineered) solution to find things like eyes and mouths.
That said, I’ve looked again into the README and see that you specifically mention segmentation as an image-processing step.
That makes sense, as segmentation means assigning a label to each pixel of the image, so everything is going to be connected to something (in the worst case, that something being the background).

All in all, I think using a pre-existing segmentation model would for now simplify the project but that is your call to make of course.
I can’t really opine about anything related to doing the “classic” Computer Vision — my only guess is that you can probably hook up OpenCV with AtomSpace. Also a handcrafted segmentator is still a human-engineered solution — I don’t know if it’s really simpler because it requires more domain-specific knowledge to understand and modify and is not going to be very robust.
Anyway, for now I don’t see much more to discuss actually, when you have decided what data to use we can just move on to implementing the basic functionality.

By the way, there is also research into unsupervised segmentation, with models like MONet or GENESIS that can be trained on arbitrary data to try to figure out what things to segment as “anonymous” objects.
It is still in fairly early stages, though — those models handle just 64x64-pixel images (now with colour, thankfully) and of course are not particularly cheap to train… on an easy dataset, using one of the findable checkpoints from such pre-trained models might work; that would have to be tested.

Adrian Borucki

Sep 15, 2021, 10:20:25 AM
to opencog
As a side note about the potential performance of the image grammar described in the README:
A nice thing about using cardinal directions to relate structure elements to one another is that it gives you translation invariance “for free”: for example, you can shift a face around the image and its elements will always be oriented the same way relative to each other.
What it does *not* give you is rotational invariance: for instance, in the face-detection example you relate the eyes as having an East-West relationship — that won’t be the case if a face is sideways; it will be North-South. Then there are two sideways directions, and upside-down too…
That said, it seems to me that a learned grammar should still exclude at least some structures that are not faces, like one eye being on the other side of the mouth. It is going to be more complicated than what is proposed in the README, though.

Adrian Borucki

Sep 15, 2021, 1:07:45 PM
to opencog
On Wednesday, 15 September 2021 at 13:59:19 UTC+2 Adrian Borucki wrote:
> Okay, some clarification is needed because there is a sentence
> > Now, if you have some filter collection that is able to detect eyes, mouths and foreheads
>
> That suggests using some pre-existing (i. e. human-engineered) solution to find things like eyes and mouths.
> That said, I’ve looked again into the README and see that you specifically mention segmentation as an image processing step.
> That makes sense, as segmentation means assigning a label to each pixel of the image. That means everything is going to be connected to something (in the worst case that something being the background).
Eh, sorry for the confusion — I shouldn’t have used the term “connected”; I confused myself by thinking about two different contexts while using the same word. “Adjacent” is the better word. It’s also obviously not important when considering directional connections; it’s not necessary for two things to be adjacent to calculate that, say, one of them is to the left of the other.

Linas Vepstas

Sep 15, 2021, 1:55:04 PM
to opencog
On Wed, Sep 15, 2021 at 6:59 AM Adrian Borucki <gent...@gmail.com> wrote:
>
> Okay, some clarification is needed because there is a sentence
> > Now, if you have some filter collection that is able to detect eyes, mouths and foreheads
>
> That suggests using some pre-existing (i. e. human-engineered) solution to find things like eyes and mouths.

Ah, my mistake then. The intent was to illustrate a concept of a
"shape grammar": a means of describing the relationships of things in
2D, 3D or N-dimensional spaces. The "things" have labelled edges
connecting them; the "grammar" is what you get if you cut the edges in
half: a "thing" with a collection of labelled half-edges
("connectors").

If the "things" can be connected up in a grammatically-valid way, then
one has some assurance that a "face" was correctly recognized, because
all the parts are where they should be.

I used "face" as an example, because it seemed like it was easy to
explain. Of course, there is a recursion problem: how do you know that
something is an eye or a mouth? Those problems are also solved in a
similar fashion: a networked arrangement of filters -- a graph with
labelled edges -- having a grammar to it.

I'll try to fix the README to make this clearer.

> That said, I’ve looked again into the README and see that you specifically mention segmentation as an image processing step.

Ugh. I mentioned that only as one possible "quick hack" (scaffolding)
to get a proof-of-concept working. It would have to be replaced by a
proper (learned) filter set for the final version. (Of course, the
learner might learn how to segment, but that is an unrelated result.)

> That makes sense, as segmentation means assigning a label to each pixel of the image. That means everything is going to be connected to something (in the worst case that something being the background).

No, I want to go very much in the opposite of that direction. I do NOT
want any pixels anywhere in the pipeline, and certainly not any
labelled pixels.

When I say "filter", I am envisioning a detector, for example,
something that says "the upper half of the visual field is blue and
the lower half of the visual field is green", and this gives a one-bit
result: true or false. This is not quite a primitive filter, but
rather is composed of some filters for hue and maybe brightness, and
some other filters that accept or reject upper and lower parts of the
visual field.

Exactly what sequence of image operations it is composed of would have
to be learned (rather than hand built). It would be learned by
observing many photos of outdoor scenes, and statistically noting that
blue is always above (even in photos of city scenes) and that green is
often below (in nature photos). Heck -- just being able to detect
"blue is above" and converting that into a one-bit true-false value
becomes an indicator that it is an outdoor scene. The "parsed image"
is the combination of the grammatical elements, the linkage that there
is blue above, and something else below (that is, a vertical change in
hue or brightness or saturation -- a somewhat sharp change -- perhaps
one with many sharp but randomly oriented derivatives).
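A hand-built Python stand-in for the "blue above, green below" one-bit detector (the thresholds and test values are made up; in the project this filter chain would be learned, not written by hand):

```python
def blue_above_green_below(image):
    """image: list of rows of (r, g, b) tuples in [0, 1]. Compare
    the mean blue of the top half against the mean green of the
    bottom half and emit a single bit: outdoor-scene-like or not."""
    half = len(image) // 2
    mean = lambda vals: sum(vals) / len(vals)
    top_blue  = mean([px[2] for row in image[:half] for px in row])
    bot_green = mean([px[1] for row in image[half:] for px in row])
    return top_blue > 0.5 and bot_green > 0.5   # one-bit output

# Sky over grass: blue-ish top rows, green-ish bottom rows.
sky   = [(0.2, 0.3, 0.9)] * 4
grass = [(0.1, 0.8, 0.2)] * 4
print(blue_above_green_below([sky, sky, grass, grass]))  # True
```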

These filters need to be pixel-independent: I want to avoid the
silliness of having to write 1024 different filters, each having the
horizon in a slightly different pixel position. This means that the
filters really need to be wavelet filters, so that relative sizes and
scales are handled automatically.
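The lowest-order Haar response in one direction is just "mean of one half minus mean of the other half", which is what makes it insensitive to the exact pixel position of an edge. A one-dimensional Python sketch (illustration only; a real pipeline would use a proper wavelet library on the GPU):

```python
def haar_response(signal):
    """Lowest-order Haar wavelet: mean of the first half minus
    mean of the second half. A large magnitude means a broad
    edge, regardless of exactly which sample the edge sits on."""
    half = len(signal) // 2
    mean = lambda xs: sum(xs) / len(xs)
    return mean(signal[:half]) - mean(signal[half:])

# A vertical "blueness" profile: blue-dominant above, not below.
blueness = [0.9, 0.9, 0.8, 0.2, 0.1, 0.1]
print(haar_response(blueness) > 0.5)  # True
```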

... at least, that is the long-run idea. For bring-up and debugging,
almost any and all hacks are allowed, as otherwise rapid development
and testing is impossible.

> All in all, I think using a pre-existing segmentation model would for now simplify the project but that is your call to make of course.

Heh. Well, not "my call" -- this needs to be a collaborative project,
and I have no desire to project an authoritarian personality. So,
more like "your call", but I want you to make the right decision,
based on the understanding of what the project actually is. Given
that this is experimental and exploratory, it is entirely normal that
the process will be filled with bad decisions and failed designs.

For bring-up, to develop and prove that a shape grammar can actually
be learned, I suppose that some pre-existing segmentation model might
be OK, maybe. It makes me nervous, though, because it builds in a
component that might be hard to remove later. Also, I think that
learning the shape grammar is easy. People have talked about shape
grammars for 50 years; it's not a new or novel concept, and it should not
be that hard. The hard part is learning filter sequences that produce
useful outputs.

... and my philosophy of development is to focus on the hard parts
first. It's always easy to do the easy parts later.

> I can’t really opine about anything related to doing the “classic” Computer Vision — my only guess is that you can probably hook up OpenCV with AtomSpace.

Yes, and that would be an important part of the project. The tricky
part is how to not waste too much time on this -- how to hook up just
enough to get the basic ideas working.

This means picking half-a-dozen or a dozen basic image operations --
hue, brightness filters, maybe some edge detectors, laplacians,
threshold filters -- and figuring out how to compose them together. In
Atomese, it would look something like this:

(GreaterThanLink 0.5                      ; select blue values
   (HueFilterLink (Number 0.0 0.0 1.0)    ; RGB: no red, no green, only blue
      (HaarWavelet (Number 0 1)           ; lowest-order Haar vertically; none horizontally
         (VariableNode "x"))))

The above specifies a filter arrangement in Atomese, that would get
bound to a specific OpenCV pipe, when (Variable "x") is bound to a
webcam or photo. The above is just an example -- the learning process
would attempt different combinations of such things, and vary the
parameters, looking for "meaningful" pipelines and parameters.

Note that there are no pixels and no segmentation in the above. I
guess we could have a (ConvexContiguousRegionLink ...) that detects a
convex group of pixels that are mostly the same color... but this does
not seem required right now.

In terms of AI, this is again nothing new: people have been writing
evolutionary algorithms to automatically discover these kinds of
processing pipelines, for many decades. At least, they were before
deep-learning. I think the progress in deep learning has halted work
on such ideas, because the neural nets work so fast and so well. So
what I'm proposing above is a big step backwards -- a big step
backwards in time, a big step backwards in computational ability,
compared to neural nets. The hoped-for gain is to have explicit
symbolic control over the elements in the pipeline. The learned
pipelines and parameters may be pretty random-looking, but they will
have an explicit symbolic representation, thus making them open to
reasoning, inference, deduction, and assorted abstract symbolic
manipulations, which neural nets cannot do.
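To make the "random-looking but explicitly symbolic" idea concrete, here is a hedged Python sketch (the primitive names, parameter encoding, and tuple layout are all invented for illustration): candidate pipelines are plain nested tuples, so a search loop can generate, mutate, score, and -- unlike a neural net -- simply print exactly what a learned detector does.

```python
import random

# Hypothetical sketch: candidate filter pipelines as symbolic expressions.
# A pipeline is a list of (operation, parameters) stages feeding a
# final threshold test; nothing here is a real OpenCog or OpenCV name.

PRIMITIVES = ["HueFilter", "BrightnessFilter", "EdgeDetect", "HaarWavelet"]

def random_stage(rng):
    """One randomly generated stage: an operation plus numeric parameters."""
    op = rng.choice(PRIMITIVES)
    params = tuple(round(rng.random(), 2) for _ in range(2))
    return (op, params)

def random_pipeline(rng, depth=3):
    """A full candidate: stages composed under a threshold comparison."""
    stages = [random_stage(rng) for _ in range(depth)]
    return ("GreaterThan", round(rng.random(), 2), stages)

rng = random.Random(42)          # seeded, so runs are reproducible
candidate = random_pipeline(rng)
print(candidate)                 # the whole detector, readable as data
```

A learner would score many such candidates against images and keep the "meaningful" ones; because each survivor is just data, it remains open to the symbolic manipulations mentioned above.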


> Also a handcrafted segmentator is still a human-engineered solution — I don’t know if it’s really simpler because it requires more domain-specific knowledge to understand and modify and is not going to be very robust.

Right. So maybe these should be avoided.

> Anyway, for now I don’t see much more to discuss actually, when you have decided what data to use we can just move on to implementing the basic functionality.
>
> By the way there is also research into unsupervised segmentation with models like MONet or GENESIS that could be trained on arbitrary data to try to figure out what things to segment as “anonymous" objects.

I saw that other email thread, I'll respond to it later (a few days,
maybe). We do need a source of test images. This could be (for the
above example) a collection of out-door photos with blue skies in
them. But maybe also a collection of children's toys on a table,
with photos taken from many different angles -- the different angles
would cause the filters to learn about objects in space. A second,
more difficult training set would involve the same toys, rearranged in
different locations.

--linas
> --
> You received this message because you are subscribed to the Google Groups "opencog" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to opencog+u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/4447730d-70d7-47ac-b225-a0c19c36d64fn%40googlegroups.com.

Linas Vepstas

Sep 15, 2021, 2:07:41 PM
to opencog
On Wed, Sep 15, 2021 at 9:20 AM Adrian Borucki <gent...@gmail.com> wrote:
>
> As a side note about the potential performance of the image grammar described in the README:
> A nice thing about using cardinal directions to relate structure elements to one another is that it gives you translation invariance “for free”: for example you can shift a face around the image and its elements will always be oriented in the same way to each other.
> What it does *not* give you is rotational invariance: for instance, in the face detection example you relate eyes as having this East-West relationship — that won’t be the case if a face is sideways, it will be North-South. Then there are two sideways directions and upside-down too…
> That said, it seems to me that a learned grammar should still exclude at least some structures that are not faces, like if one eye is on the other side of the mouth. It is going to be more complicated than what is proposed in the README though.

Yes, to all of these. The cardinal directions were meant to be simple
examples. Rotational invariance could be achieved by... I dunno ..
maybe haar wavelets in polar coordinates or something like that. That
would be for later...

Sometimes you want rotational invariance, and sometimes you don't. For
stop lights, you always want the red light above the green light. For
the German flag, you always want the black bar above the yellow bar.

There is a famous old experiment in neuroscience, where volunteers
wear eyeglass prisms that reverse everything left-right, or up-down.
It takes them a few days to adapt, but after that, everything seems
completely normal ... until they take off the glasses!

At any rate, all initial examples have to be simple...

--linas

Adrian Borucki

Sep 16, 2021, 4:02:16 PM
to opencog
Yeah, this is clear to me now — the grammar learning part is kind of a given; the real question is how well this “image predicate” learning can go… This is a deep question, as no one is even sure why neural nets themselves work so well.
What needs clarification is what the structure of this filter learning would be — what is the algorithm and what direct learning objective is it given?
Like in the above example, where are all these filters and numerical arguments even coming from? The numerical part is especially difficult, given that you seemingly want to get some symbolic structure out of it.

Going back to neural nets, the obvious problem is that if we make one big neural “filter” then you don’t know what is going on inside — so the learning will be “shallower”. The question is how much of a problem this really is.
Is learning down to the low-level filtering operations a viable approach right now?
An interesting research question is whether you could train a neural net that can be “queried”, possibly in natural language or some simple formal one, so that the system on top of it can learn to “extract” various statements about an image out of it — these predicates would essentially be hooked to queries that get sent to the underlying model. Technically this probably falls somewhere in the Visual Question Answering field… the challenge is that these models are trained to answer questions about more abstract things like objects, not low-level features of the image.

The final big question is what you can really do after you get that grammar. What sort of inferences? How useful are they? The key thing here is that if you, say, have a system that classifies pictures, and being built on top of this whole grammar-and-filter-learning pipeline means it doesn’t achieve performance competitive with neural nets, then it’s difficult to see what its comparative advantage is — beyond the obvious advantage of interpretability, which won’t save the solution if its performance is considerably lower.
Well, the problem is not really with grammars, that can definitely be useful, but if that “filter sequence” part works poorly then it will bottleneck the performance of the entire system. If that low level layer outputs garbage, then all the upper layers get garbage, and we know what happens when you have garbage inputs in this field...

Linas Vepstas

Sep 16, 2021, 4:32:11 PM
to opencog
Hi Michael,

On Thu, Sep 16, 2021 at 2:19 PM Michael Duncan <mjsd...@gmail.com> wrote:
>
> i don't have specific code references off the top of my head but imported moses models are boolean functions using these link types, is that relevant here?

Xabush would know for sure. The goal here was to avoid writing things
like (And (Concept "cat") (Concept "dog")) because cat and dog are not
true/false values. By contrast (Union (Concept "cat") (Concept "dog"))
does make sense; it's the set-union of all cats and dogs. Likewise, (Or
(Predicate "is a cat") (Predicate "is a dog")) makes sense, because
predicates are true/false.

If moses/asmoses is NOT using this format, it should probably switch
over to this. This was provoked by a pull req from Hedra earlier this
year, a moses-related pull req to allow taking (And (Set ...)); the
proposed solution is to instead take (Intersection (Set ...)). See
issue #2814

-- Linas

Michael Duncan

Sep 17, 2021, 7:02:58 AM
to opencog
moses output functions are imported as predicate nodes:
and
  predicate "is over-expressed"
    gene "XYZ"
  not
    predicate "is over-expressed"
      gene "PDQ"

Linas Vepstas

Sep 17, 2021, 4:19:47 PM
to opencog
Hi Adrian,

On Thu, Sep 16, 2021 at 3:02 PM Adrian Borucki <gent...@gmail.com> wrote:
>
> Yeah, this is clear to me now — the grammar learning part is kind of a given; the real question is how well this “image predicate” learning can go…

Yes, that is a question. Based on current experience, I'll say "very
far" or at least, "much farther than anyone else has gone". But that
is rather speculative: it's based on what I've been learning in a 1D
setting, and so any doubters or skeptics in the audience are
justified in doubting. Basically, I'm proposing this because it looks
promising.

It does not help that I am just one person proposing a rather novel,
radical, counter-cultural idea that flies in the face of conventional
wisdom. I'm quite aware of this. My burden of proof is much higher,
and I am trying to supply it as best as I can. Keep asking doubtful
questions, this is maybe the most useful thing you can do right now.
So I like how this is going. I'm only irritated that you can't read my
mind :-)

> This is a deep question as no one is even sure why neural nets themselves work so well.

Well, again, this goes in a very different direction. Here, the
reason that it would "work so well" is much more obvious: we ourselves
are very good at spotting part-whole structure. Why, in just a few
minutes, I can write down the obvious grammar for stop lights: glowing
red above yellow above green, surrounded by a painted yellow or black
harness. This is "obvious", and detecting this in images seems like it
should be pretty easy.

This is in very sharp contrast to what neural nets do: you are right:
when a neural net picks out a stoplight from an image, we have no idea
how it is doing that. Perhaps somewhere in there are some weight
vectors for red, yellow, green, but where are they? Where are they
hiding? How do neural nets handle part-whole relationships? There is
a paper (from Hinton?) stating that the part-whole relationship for
neural nets is the grand challenge of the upcoming decades. By
contrast, the part-whole relationship for grammars is "obvious".

> What needs clarification is what the structure of this filter learning would be — what is the algorithm and what direct learning objective is it given?

The exact same algo as in the existing grammar learning code, modulo
needed tweaks. That code is debugged and works well. Getting it going
on images does pose some serious challenges and open questions, but I
think the general ideas survive.

To recap that algo: given a set of inputs, one explores the parameter
space, and looks for high mutual-information correlations between
pairs. Once high-MI pairs are discovered, the dataset is passed over a
second time, this time, creating maximal spanning trees. The tree
edges are then cut to give the grammar components.
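A toy Python sketch of that recap might look like the following (the co-occurrence counts and the crude marginal normalization are invented for illustration; the real code works on far larger, sparser data): score pairs by mutual information, then keep only the edges of a maximum spanning tree.

```python
import math

# Toy sketch of the recap above: high-MI pairs, then a maximum spanning
# tree; the kept edges would become the grammar components.

# Made-up co-occurrence counts between observed elements.
counts = {
    ("blue", "above"): 40, ("green", "below"): 35,
    ("blue", "green"): 5,  ("above", "below"): 3,
    ("blue", "below"): 2,  ("green", "above"): 2,
}
total = sum(counts.values())
marg = {}
for (a, b), n in counts.items():
    marg[a] = marg.get(a, 0) + n
    marg[b] = marg.get(b, 0) + n

def mi(pair):
    # Pointwise mutual information; each count feeds two marginals,
    # hence the deliberately crude 2*total normalization.
    a, b = pair
    p_ab = counts[pair] / total
    p_a = marg[a] / (2 * total)
    p_b = marg[b] / (2 * total)
    return math.log2(p_ab / (p_a * p_b))

# Kruskal's algorithm, taking edges in order of decreasing MI,
# yields a *maximum* spanning tree.
parent = {w: w for w in marg}
def find(w):
    while parent[w] != w:
        w = parent[w]
    return w

tree = []
for a, b in sorted(counts, key=mi, reverse=True):
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb
        tree.append((a, b))

print(tree)  # the high-MI edges survive; the low-MI cross pairs are cut
```

On this toy data the tree keeps ("blue", "above") and ("green", "below") and discards the low-MI cross pairs, which is the cutting step that exposes the grammar components.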

The above yields extremely high-dimensional sparse vectors: dimension
of a million. By comparison, the highest dimension that neural nets go
up to is about a thousand. So this is one of the big differences
between the two approaches. The other, of course, is that the basis is
labelled symbolically: you can see exactly which basis element
attaches to what ("red above yellow", etc.)

I'm currently working on the best ways to cluster these vectors into
groupings. Early results look pretty good, but also show that these
can be made much better. I can say much more on this.

> Like in the above example, where are all these filters and numerical arguments even coming from?

Randomly generated. With or without some sampling bias.

> The numerical part is especially difficult, given that you seemingly want to get some symbolic structure out of it.

I don't understand this statement.

>
> Going back to neural nets, the obvious problem is that if we make one big neural “filter” then you don’t know what is going on inside —

That's correct.

> so the learning will be “shallower”. The question is how much of a problem this really is.

Well, the leading lights of neural-net world claim that this is one of
the grand challenges of the upcoming decades, and I won't argue with
them about that.

> Is learning down to the low-level filtering operations a viable approach right now?

Yes, absolutely, I think so. Obviously, I haven't convinced you yet.
That is in part because I have not fully (clearly?) communicated the
general idea, just yet.

> An interesting research question is if you could train a neural net that can be “queried”, possibly in natural language or some simple formal one, so that the system on top of it can learn to “extract” various statements about an image out of it — so these predicates would be essentially hooked to some queries that get send to the underlying model.

Sure, there are hundreds of people working on this, and they are
making progress. You can go to seminars, new results are regularly
presented on this.

> Technically this probably falls somewhere in the Visual Question Answering field… the challenge is that these models are trained to answer questions about more abstract things like objects, not some low level features of the image.

Yes. The lack of symbolic structure in neural nets impedes desirable
applications, such as symbolic reasoning.

> The final big question is what can you really do after you get that grammar? What sort of inferences? How useful they are?

Well, for starters, if the system recognizes a stop light, you can ask
it: "how do you know it's a stop light?" and get an answer: "because
red above yellow above green." You can ask "and what else?" and get
the answer "on a painted black or yellow background" -- "and what
else?" "the colors glow in the dark" -- "and what else?" "they are
round" -- "and what else?" "only one comes on at a time" -- "and what
else?" "the cycle time varies from 30 seconds to three minutes" --
"what is a cycle time?" "the parameter on the time filter by which
repetition repeats" -- "what do you mean by round?" "the image area of
the light is defined via a circular aperture filter."

Good luck getting a neural net answering even one of those questions,
never mind all of them.

> The key thing here is that if you, say, have a system that classifies pictures, if it being built on top of this whole grammar and filter learning pipeline means it doesn’t achieve competitive performance with neural nets then it’s difficult to see what the comparative advantage of it is — beyond the obvious advantage of interpretability, but that won’t save that solution if its performance is considerably lower.

Really? The ability to do symbolic reasoning is valueless if it is
slow? If the filter that recognizes that lights are round also
appears in other grammatically meaningful situations, you can ask a
question "what else is round?" "the sun, the moon, billiard balls,
bowling balls, baseballs, basketballs". I think we are very very far
away from having a neural net do that kind of question answering. I
think this is well within reach of grammatical systems.

Associations between symbols and the things they represent is the
famous "symbol grounding problem", considered to be a very difficult,
unsolved problem in AI. I'm sketching a technique that solves this
problem. I think this is unique in the history of AI research. I don't
see that anyone else has ever proposed a plausible solution to the
symbol grounding problem.

> Well, the problem is not really with grammars, that can definitely be useful, but if that “filter sequence” part works poorly then it will bottleneck the performance of the entire system.

Learning it, or running it, once learned? Clearly, running it can be
superfast .. even 1980's-era DSP's did image processing quite well.
Even single-threaded CPU's have no particular problem; these days we
have multi-core CPU's and oodles of GPU's.

The learning algo is ..something else. There are two steps: Step one:
can we get it to work, at any speed? (I think we can) Step two: can we
get it to work fast? (Who knows -- compare to deep learning, which
took decades of basic research spanning hundreds of PhD theses before
it started running fast. You and I and whatever fan-base might
materialize are not going to replicate a few thousand man-years of
basic research into performance.)

> If that low level layer outputs garbage, then all the upper layers get garbage, and we know what happens when you have garbage inputs in this field...

Don't feed it garbage!

--linas

Linas Vepstas

Sep 17, 2021, 4:31:54 PM
to opencog
On Fri, Sep 17, 2021 at 6:03 AM Michael Duncan <mjsd...@gmail.com> wrote:
>
> moses output functions are imported as predicate nodes:
> and
> predicate "is over-expressed"
> gene "XYZ"
> not
> predicate "is over-expressed"
> gene "PDQ"

That looks reasonable to me.

(Well, to be syntactically correct, it would have to be either
(Evaluation (Predicate "is over-expressed") (Gene "XYZ")) or
(Evaluation (Predicate "is over-expressed") (List (Gene "XYZ"))), so
that the gene and the predicate it applies to appear next to each
other. I assume that's what your indentation means.)

If your code is not throwing errors, then it's probably OK. Tanksha was
dealing with an actual error message.

--linas

Adrian Borucki

Sep 19, 2021, 12:57:19 PM
to opencog
Just to clarify: by “performance” I mean the rate of success on a given task, not necessarily speed.

Anyway: I’m afraid I can’t help with the visual processing part then — I know nothing of using wavelets for image analysis so I can’t really say anything further until how this is supposed to work is fully sorted out.

Linas Vepstas

Sep 20, 2021, 5:03:20 PM
to opencog
On Sun, Sep 19, 2021 at 11:57 AM Adrian Borucki <gent...@gmail.com> wrote:
>
> Just to clarify: by “performance” I mean the rate of success on a given task, not necessarily speed.

Well, I think it's likely to be successful, but clearly I have not
convinced you of that.

> Anyway: I’m afraid I can’t help with the visual processing part then — I know nothing of using wavelets for image analysis

You don't have to use wavelets. You do have to have a basic
understanding of image processing and how one applies image processing
primitives to extract information.

There is an easy way to learn this, though: The earliest programming
task is simply to write atomese wrappers for common textbook
image-processing primitives. This means, in practice, downloading a
copy of OpenCV, and reading through its documentation. Writing
Atomese wrappers for it would allow you to learn image processing
"hands on" -- there are a number of OpenCV demos; you can run them,
convert them to Atomese, run the Atomese versions, and verify that you
get the same results. There are textbooks on image processing, filled
with examples; converting them to atomese and running them would be a
good, practical way of learning the core concepts.

An alternative would be to do this for audio; in some sense, this
would be simpler, but certainly a lot geekier: audio does not have the
immediate visual feedback of image processing. It's more abstract.

> so I can’t really say anything further until how this is supposed to work is fully sorted out.

I'm sorry to hear this. You seem to be politely backing away from the
project; I'm not sure what you expected it to be, but clearly what I
painted is not what you'd hoped for.

The project is "sorted out", but I guess I'm not communicating
something important about it. Again: the pipeline is already working
in the language domain. I tried to provide enough of an explanation
and pseudocode snippets to explain how to port it over for vision and
audio. It's pretty concrete; there's no airy-fairy hand-waving, just
a pile of pseudocode that needs to be converted to real code.

I'm guessing that, somehow, I still failed to explain what this
is all about. Perhaps I should bounce you to the abstract theory
papers? There are two: one that's hand-wavey with no math, another
with lots of math. These are

https://github.com/opencog/atomspace/blob/master/opencog/sheaf/docs/sheaves.pdf

and

https://github.com/opencog/learn/blob/master/learn-lang-diary/skippy.pdf

--linas

Adrian Borucki

Sep 20, 2021, 6:53:09 PM
to opencog
On Monday, 20 September 2021 at 23:03:20 UTC+2 linas wrote:
> There is an easy way to learn this, though: The earliest programming
> task is simply to write atomese wrappers for common textbook
> image-processing primitives.
Okay, if there is a concrete set of primitives you’d like to see incorporated that would be useful to your project, that’s something I can help with.
Basically, what I had in mind in the beginning was that I can spare some time to do some grunt work for this project.
I don’t think I can offer anything beyond that at the moment.
 

> Perhaps I should bounce you to the abstract theory papers?
I hope I will find some time to dive deeper into the theory but no promises.

Linas Vepstas

Sep 20, 2021, 10:40:21 PM
to opencog
On Mon, Sep 20, 2021 at 5:53 PM Adrian Borucki <gent...@gmail.com> wrote:
>
> Okay, if there is a concrete set of primitives you’d like to see incorporated that would be useful to your project, that’s something I can help with.
> Basically, what I had in mind in the beginning was that I can spare some time to do some grunt work for this project.
> I don’t think I can offer anything beyond that at the moment.

Ah .. silly me! I have a bad habit of assuming people are just like me!

There's plenty of grunt work! OK. So to keep things specific, let's
say the project is to wrap Atomese around a tiny subset of OpenCV.
Does that sound reasonable?

Steps:
*) Create github repo. I just did that:
https://github.com/opencog/vision and I gave you admin permissions on
it.

*) Set up the conventional opencog repo boilerplate. A cut-n-paste of
the makefiles in opencog/atomspace-rocks might be easiest. Should I
ask you to do this, or should I?

*) Create an atom types file, etc. A cut-n-paste of the types in
opencog/spacetime might be easiest. It's mostly boilerplate ... except
for the types themselves.

*) For starters, we need to get still images into the system. I'm
thinking that this could be an ImageNode (or IMAGE_NODE <- NODE in the
types file). Corresponding to this type is a C++ class that
implements a wrapper to the suitable OpenCV analog.

The ImageNode is a type of Node, the string would be the name of the
file (or URL) containing the image. The C++ constructor can either do
nothing, or maybe it can try to open that file. The C++ class would
have as private/protected members whatever is needed from
https://docs.opencv.org/4.5.3/d4/da8/group__imgcodecs.html which
seems to be how OpenCV reads images. So I guess a Mat, which seems to
be the OpenCV handle to an image.

*) Next, we need to do something with that image. So, I'm looking at
https://docs.opencv.org/4.5.3/d4/d86/group__imgproc__filter.html and
cv:blur() looks like a good place to start. So, we need an
ImageBlurLink which expects .. well, for now one parameter: a
NumberNode for the size, and the image to apply it to. So a typical
use would be (ImageBlurLink (ImageNode "/tmp/foo.jpg") (NumberNode 20))

Calling the C++ method ImageBlurLink::execute() would apply the
filter to the image, and return a ... well, that's a good question.
It could return another ImageNode holding a handle to the mutated
image. But I don't want to place these "temporary results" into Atoms
.. they really need to be Values ... so it should return an
ImageValue, which is almost exactly the same thing as an ImageNode,
except that ... it's a Value, not an Atom.
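To illustrate the intended Atom-vs-Value split without the real C++ AtomSpace machinery, here is a pure-Python mock (the class names follow the proposal above, but the implementation is entirely invented): the blur *recipe* is an Atom-like object that could live in the atomspace, while the blurred *result* is a transient Value-like object that never does.

```python
# A pure-Python mock of the design sketched above -- NOT the real
# AtomSpace C++ API. It only shows the shape of the idea.

class ImageNode:
    """Atom: names an image by file path; the pixels stay outside the atomspace."""
    def __init__(self, name):
        self.name = name

class ImageValue:
    """Value: transient handle to a processed image (here, just a description)."""
    def __init__(self, handle):
        self.handle = handle

class ImageBlurLink:
    """Atom: a symbolic recipe 'blur this image with this kernel size'."""
    def __init__(self, image, size):
        self.image, self.size = image, size

    def execute(self):
        # In the real thing, this would call something like cv::blur()
        # and wrap the resulting cv::Mat. Here we just record what happened.
        return ImageValue(f"blur({self.image.name}, size={self.size})")

blurred = ImageBlurLink(ImageNode("/tmp/foo.jpg"), 20).execute()
print(blurred.handle)
```

The point of the mock is that the Link tree is the durable, reusable description of the processing, while executing it yields throwaway Values.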

So .. how do Values work? They're just like Atoms, except you can't
store them in the atomspace. If you don't know values well, please
review the demo.
https://github.com/opencog/atomspace/blob/master/examples/atomspace/values.scm

The StreamValue was invented to hold things like audio, video, and I
guess its OK to use it for static images, too. See
https://github.com/opencog/atomspace/blob/master/examples/atomspace/stream.scm

Just to confuse things, I should mention three more demos:
https://github.com/opencog/atomspace/blob/master/examples/atomspace/formulas.scm
and https://github.com/opencog/atomspace/blob/master/examples/atomspace/flows.scm
and https://github.com/opencog/atomspace/blob/master/examples/atomspace/flow-formulas.scm
... Those examples all use TruthValues to do their stuff; the general
idea is that something similar would be used to route video and
sound around. But maybe we can ignore these for now.

I don't know how well you know the atomspace internals ... I'm hoping
the above is not overwhelming. It does require some thinking. I do
expect you to get confused. I do expect to give you wrong suggestions
from time to time. I expect I'll have to jump in and debug some code
sometimes... Values were invented for this kind of stuff, but using
them for audio, video would be new.

The general idea is that the ImageValues will hold a handle to the
image being processed, and the ImageBlurLink, and other things of that
kind specify the kinds of transformations to be applied to the image.
The actual images are never in the atomspace, only the abstract
processing tree, and a way of hooking that tree to the image(s) to be
processed.

*) Once this most basic demo works, the rest is just plain easy -- it's
just cut-n-paste from this basic example, to add more filter types.
I'll have to crawl through the OpenCV docs to pick out two or three
more must-have ops.

-- Linas

Adrian Borucki

Sep 21, 2021, 12:32:37 PM
to opencog
On Tuesday, 21 September 2021 at 04:40:21 UTC+2 linas wrote:
> There's plenty of grunt work! OK. So to keep things specific, let's
> say the project is to wrap Atomese around a tiny subset of OpenCV.
> Does that sound reasonable?
Sure, I’ve already forked the repository below and started adding things, I don’t know if I’m going to have something working this week or if I get stuck, we’ll see.
It seems like streams correspond to a concept of the same name in some other programming languages (or to the concept named “generators”).
That should mean that if we have a list of image files to process, then we can iterate through that, getting the “next” image each time.
The RandomStream should probably be renamed to something more descriptive, so that it is clear it produces a specific data type (the lack of namespaces in Atomese hurts here, but that’s a side note).
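The generator reading can be sketched in a few lines of Python (the file names are invented, and the real StreamValue would yield decoded frames rather than strings):

```python
# Sketch of a "stream" over a list of image files that hands back the
# next image each time it is sampled -- a generator in Python terms.

def image_stream(paths):
    for p in paths:
        yield f"decoded:{p}"   # stand-in for loading/decoding the file

stream = image_stream(["a.png", "b.png"])
print(next(stream))  # decoded:a.png
print(next(stream))  # decoded:b.png
```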

Linas Vepstas

unread,
Sep 21, 2021, 3:32:57 PM
to opencog
Hey!

On Tue, Sep 21, 2021 at 11:32 AM Adrian Borucki <gent...@gmail.com> wrote:
>
> Sure, I’ve already forked the repository below and started adding things,

Feel free to push to the main opencog repo. Either push directly, or
use pull requests. Probably easier to push directly. The only things I
insist on is that the makefiles and directory structures follow that
of the other repos, and you've done that. (well, quibble:
`opencog/visops` should be `opencog/atoms/visops` but this probably
doesn't matter.)

> I don’t know if I’m going to have something working this week or if I get stuck, we’ll see.

I don't think you'll get stuck.

>> The StreamValue was invented to hold things like audio, video, and I
>> guess its OK to use it for static images, too. See
>> https://github.com/opencog/atomspace/blob/master/examples/atomspace/stream.scm
>>
> It seems like streams correspond to a concept of the same name in some other programming languages (or to the concept named “generators”).

You are right. The intent is generators, not streams, so these are
perhaps misnamed. The only defense I have is that "streams" is easier
to type than "generators", and that the atomspace does not have loop
constructs, nor does it have any "get-next" constructs, and so, at the
atomese level, both streams and generators are "the same thing". More
or less.

There is very little experience in how these things should work, in
Atomese. The existing streams were created to be just enough to allow
the basic demos, and that's all. They do work "as intended", and
that's all. There may be better ways.

One interesting variant is the QueueValue, which allows multiple
threads to push stuff onto a queue for later pickup. This was created
to allow a parallelized pattern engine; a few years ago, Ben was
pushing hard to have it run in parallel to get faster results. Now it
does, although the interest has waned. This means that the QueueValue
is stream-like and not generator-like. Basically, the data-producer
(the pattern engine) is slower than the data-consumer, and so we want
to operate in a mode where it's creating data as fast as possible.
This is a weird mirror-symmetric variation to "lazy evaluation": now
that the consumer has asked the producer for some data, the consumer
expects the producer to work as fast as possible, and dribble in the
results as they become ready, rather than saving them up to be
delivered in one big batch at the end.
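A rough plain-Python analogue of the QueueValue behaviour described above: the producer pushes each result from its own thread as soon as it is ready, and the consumer picks them up incrementally rather than waiting for one big batch at the end. (This is an illustrative sketch, not AtomSpace code; the names are invented.)

```python
# Producer-consumer sketch: results dribble in as they become ready.
import queue
import threading

def producer(out: "queue.Queue", n: int) -> None:
    for i in range(n):
        out.put(i * i)        # push each result as soon as it is computed
    out.put(None)             # sentinel: no more results coming

q: "queue.Queue" = queue.Queue()
threading.Thread(target=producer, args=(q, 4)).start()

results = []
item = q.get()
while item is not None:       # consume incrementally, not in one batch
    results.append(item)
    item = q.get()
print(results)  # [0, 1, 4, 9]
```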

What's the right way to deal with audio and video (or image) data?
Right now, I don't know, beyond some gut-feels. Something simple that
works is better than something complicated. Don't add complexity
unless you really really need it. So I'm quite happy to be ambiguous
as to whether these things are generators or streams or promises or
something else similar to all that. Something that works is better
than something fancy that doesn't work.

> That should mean that if we have a list of image files to process, then we can iterate through that, getting the “next” image each time.

Ah! That's a trick question, with two answers. First knee-jerk answer
is "yes". Since atomese has no explicit iterators, or loops or "do the
next one" constructs, all of this iteration has to happen under the
covers.
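In Python terms, the "yes" answer looks like a generator: the iteration lives under the covers, and the consumer just pulls the "next" image on demand. (A toy sketch; the string stand-in for a decoded image is obviously invented.)

```python
# Generator sketch: no explicit loop at the call site -- each pull
# yields the "next" image, as the iteration happens under the covers.
def image_files(names):
    for name in names:
        yield f"<image:{name}>"   # stand-in for actually loading the file

gen = image_files(["cat.png", "dog.png"])
print(next(gen))  # <image:cat.png>
print(next(gen))  # <image:dog.png>
```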

For the learning pipeline, though, it's trickier. Let me sketch that
out. Currently, the learning pipeline is a large collection of mostly
scheme code, rather than Atomese, that processes data files in an ad
hoc fashion, feeding them into the pipeline, accumulating counts in
the atomspace. It's "ad hoc" because there hasn't been any reason to
do anything better/fancier. It's in scheme, not c++ or python or
atomese, because that was (for me) the easiest and fastest way to get
things working. Someday, it could be redesigned, but not today.

So, the learning pipeline for images, as I currently envision it,
would work like so:

Create N=50 to N=500 random filter sequences. Given a single image,
each filter sequence produces a single-bit t/f output. Given one image
and N filters, there are N(N-1)/2 result pairs. If both ends of the
pair are t, then the count is incremented for that pair; otherwise
not.
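The pair-count step above can be sketched in a few lines of Python. This is only an illustration of the counting rule as described (both ends true, bump the pair's count); the function and variable names are invented, not the learn repo's actual code.

```python
# Given the N boolean filter outputs for one image, increment the count
# of every unordered pair whose ends are both true. There are N(N-1)/2
# possible pairs; only the all-true subset gets counted.
from itertools import combinations
from collections import Counter

def count_pairs(filter_outputs, counts):
    """filter_outputs: list of N bools, one per random filter sequence."""
    hot = [i for i, bit in enumerate(filter_outputs) if bit]
    for pair in combinations(hot, 2):   # pairs with both ends true
        counts[pair] += 1
    return counts

counts = Counter()
count_pairs([True, False, True, True], counts)
print(dict(counts))  # {(0, 2): 1, (0, 3): 1, (2, 3): 1}
```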

Given M input images, apply the above to each of the images. The
result is a collection of pairs, with varying pair-counts (up to a
maximum of M; the bigger the M, the better, as a general rule).
Given this raw info on pairs, the generic learning pipeline kicks in,
and does the rest. The generic pipeline computes the mutual
information of the pairs, it extracts disjuncts, it merges disjuncts
into classes, and ... whatever will come next.

There are two aspects that are different with the image pipeline, as
compared to before. One is that some of these random filters may be
generating useless noise. These are presumably those with the lowest
marginal MI. They need to be discarded, and replaced, so that we build
up a good collection of "useful" or "meaningful" filters. The other is
that the filters with the highest MI with each other might in fact be
nearly identical, and so we only need one of these, not both. One of
the two needs to be discarded. How exactly this gets handled is a big
TBD question.
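Since the "how exactly" is a TBD question, here is just one naive way the two clean-up steps could go: drop filters with the lowest marginal MI (presumed noise), and keep only one end of any pair with very high mutual MI (presumed near-duplicates). Everything here, including the thresholds, is a placeholder.

```python
# Naive filter clean-up sketch: prune presumed-noise filters by a low
# marginal-MI cutoff, then merge presumed-duplicate pairs by a high
# pairwise-MI cutoff, keeping one end of each such pair.
def prune_filters(marginal_mi, pair_mi, low=0.1, high=3.0):
    keep = {f for f, mi in marginal_mi.items() if mi >= low}  # drop noise
    for (a, b), mi in pair_mi.items():
        if mi >= high and a in keep and b in keep:
            keep.discard(b)          # near-identical pair: keep one end
    return keep

kept = prune_filters({0: 0.5, 1: 0.05, 2: 0.8, 3: 0.7},
                     {(2, 3): 3.5, (0, 2): 1.0})
print(sorted(kept))  # [0, 2]
```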

The point of my writing out the above is to show what the "stream"
looks like, today. All of the above (for sentences, not for images) is
implemented in the "ad hoc" processing pipeline. A sequence of bits
corresponding to a sequence of images might be useful, but not
necessary. A sequence of bit-pairs might be useful, but not necessary.
Could the pipeline be redesigned to work with such streams? Possibly.
Does it seem urgent, right now? No.

(Well, actually, now that I think about it: I am struggling with how
to implement incremental learning aka "lifetime learning", and moving
the code to a stream/generator infrastructure may be just the
thing...)

> The RandomStream should probably be renamed to something more descriptive, so that it is clear it produces a specific data type (the lack of name spaces in Atomese hurts here but that’s a side note).

Atomese has many issues. The ones that get fixed tend to be the ones
that people complain about the most (and that have a clear solution).

-- linas

Ben Goertzel

Sep 21, 2021, 9:24:59 PM
to opencog
Hi, the RCC stuff was work done by me and Keyvan Sadeghi quite some
years ago, which was paused not because it wasn't working but because
Keyvan moved on to other stuff...

Linas never dealt with that stuff, as far as I recall ...

I think to make that sort of approach work scalably, you would need to
use a hybrid inference engine that uses a specialized prover for
fuzzy-RCC, interoperating with a general-purpose PLN prover for
general conceptual relationships among the entities occupying the
regions... But we never got there and shifted attention to other
things...

ben



--
Ben Goertzel, PhD
http://goertzel.org

“He not busy being born is busy dying" -- Bob Dylan

Linas Vepstas

Sep 21, 2021, 9:30:28 PM
to opencog
Hi Ben,

By RCC, I guess you mean the "region calculus"? This isn't that. This
is more like moses-for-images. Except it's unsupervised. So more like
"pattern miner for images". Except it's not using the pattern miner
infrastructure, it's using the vector+matrix infrastructure.

--linas