Graphs for the two representations of the knowledge and ideas about the third generation Cog system - MOC - MetaOmegaCog?

Alex

Sep 30, 2017, 6:20:10 PM

to opencog

Hi!

I have heard that the AtomSpace is a big hypergraph, and the names of its representation elements - nodes and links - suggest exactly that.

So - knowledge can be represented as nodes and links, i.e. as a hypergraph - let's call it the "Type1" labelled hypergraph - the semantic graph.

But I guess that the same AtomSpace/OpenCog knowledge can be represented as mathematical formulas, e.g. as term logic formulas. Those formulas can have attributes that correspond to the probabilities, but they are formulas anyway. Each formula is a word or sentence in some formal language and therefore it has a syntactic graph representation as well. Let's call it the "Type2" labelled hypergraph - the syntactic graph.

The question is - what is the connection between Type1 (semantic) and Type2 (syntactic) graphs? I guess semantic rules can be established that allow one to construct a Type1 graph from Type2 graphs and back. What is the right type for the representation? I guess Type2 graphs are more appropriate for formal reasoning (e.g. sequent calculus).
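To make the distinction concrete, here is a minimal sketch of one possible reading of "Type1" versus "Type2". It assumes a formula is a nested tuple, builds the syntactic graph as the formula's syntax tree (one node per symbol occurrence), and builds the semantic graph as a labelled edge between the concepts the formula talks about. The function names and the tuple encoding are illustrative assumptions; they are not taken from OpenCog or MMT.

```python
from itertools import count

# A term-logic style formula as a nested tuple: (operator, arg1, arg2, ...).
# Leaves are plain strings. This encoding is an assumption of the sketch.
formula = ("Inheritance", "cat", "animal")


def syntactic_graph(term):
    """Type2 reading: the syntax tree, one node per symbol occurrence.

    Returns (nodes, edges), where nodes maps node id -> label and each edge
    is (parent id, argument position, child id).
    """
    ids = count()
    nodes, edges = {}, []

    def walk(t):
        nid = next(ids)
        if isinstance(t, tuple):
            nodes[nid] = t[0]
            for pos, arg in enumerate(t[1:]):
                edges.append((nid, pos, walk(arg)))
        else:
            nodes[nid] = t
        return nid

    walk(term)
    return nodes, edges


def semantic_graph(term):
    """Type1 reading: concepts are nodes, the relation is a labelled edge."""
    op, *args = term
    if len(args) == 2 and all(isinstance(a, str) for a in args):
        return [(args[0], op, args[1])]
    raise ValueError("this sketch only handles flat binary relations")


def formula_from_semantic(edge):
    """Rebuild the formula from one semantic edge (the 'and back' direction)."""
    source, label, target = edge
    return (label, source, target)
```

For flat binary relations the two directions are inverse to each other, so the round trip loses nothing. Nested formulas are where the choice of semantic rules starts to matter, and this sketch deliberately does not attempt them.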

I have this question because there is this formalism - MMT https://uniformal.github.io/ - Meta-Meta-Theory, which tries to unify all (in)formal knowledge in one foundation-free framework. There is a wealth of literature on how Florian Rabe and his collaborators try to encode every big formalism (Axiomatic Set Theory, Constructive Type Theory (Coq culture), Higher Order Logic (Isabelle/HOL culture)) in one modular language. So - the result can be a formalism that allows one to express every formula of every formalism in one common language and - of course - that means that every formula can have a Type2 graph assigned to it! And that means that we can encode all possible formulas (all possible non-multimedia knowledge) in one graph database.

MMT is largely completed work, so only the technical work remains - one can take the best open source graph database (JanusGraph is the best) and encode this knowledge and attain the most universal knowledge base possible, one that is certainly more expressive than OpenCog (which currently uses (probabilistic) term logic).

One should add that each formula's syntactic graph (Type2 graph) can have associated semantic graph representations (Type1 graphs) - there can be several semantic representations for one syntactic one. Sadly, the relationship between syntactic and semantic graphs is a very little researched field. There is, of course, research about semantic graphs themselves (every knowledge representation with graphs does this), but about the connection between syntactic and semantic graphs there is only one work that I am aware of: http://homepages.inf.ed.ac.uk/ldixon/papers/dixon-camcad-09.pdf - about logical graphs.

So - we can have a knowledge system that has the best of both worlds:

- The knowledge representation form is taken from MMT

- The knowledge representation technique is taken from JanusGraph - the best that the industry can provide.

That can be the future of knowledge representation systems. Sometimes I am very, very suspicious about efforts to build custom knowledge bases. So many resources are necessary to implement the technicalities that I cannot believe that a custom knowledge base can compete with a universal, industrial-quality graph database. My guess is - if industrial graph databases had been around at the time of the inception of cognitive architectures (Soar, Clarion, Cog, etc.), then all the cognitive architectures would have been built around/using industrial-quality graph databases.

So - one graph database can host both types of graphs - both syntactic and semantic graphs - and this graph database can also host reasoning and self-development procedures (which are programs, which can be represented as syntactic trees and saved in the same graph database as the remaining knowledge) for self-(re)evolution. So - a big, big self-evolving system of hypergraphs that lives in an industrial-grade graph database. Maybe this can be the start of AGI?

I call my system MOC - MetaOmegaCog (MetaOmega stands for meta-meta-meta...)

What are your thoughts about such plans?

Ivan Vodišek

Sep 30, 2017, 8:56:18 PM

to ope...@googlegroups.com

Hi Alex :)

I'm just a lurker here most of the time, so you may want to skip my post. From what I've learned so far about OpenCog, AtomSpace is pretty much what MMT is about: a foundation for implementing whatever framework is suitable for accomplishing some task. To digress a bit, I've been working on something like MMT myself, and I like my results so far. But to return to OpenCog, AtomSpace should be able to describe all kinds of logics - if I'm not mistaken. And that shouldn't really be a big deal. Any theory, be it lambda calculus, type theory, or one of the vast number of available programming languages (of course, every programming language is a theory in which you can describe computations and do the interpretations), can be described in any other if they are complete enough, just like you can program the whole of Python in Basic and the other way around. One of the questions is the completeness of such a theory (the notion of Turing completeness rings a bell). Another question that arises is the language's simplicity, which dictates how widely that language would be accepted. Most AI researchers have developed their own languages for representing knowledge because the other solutions just "didn't seem right" or "could be improved", and among the others, the AtomSpace was born. But what I'd really like to see is some standards being set in the knowledge representation field, something that every AI researcher would have in mind when developing her/his ideas. Right now, I see a bunch of rough inference tools across the web, each of them doing basically the same stuff, but overall they are all incompatible with one another. I wonder what it takes for a knowledge base tool to be widely accepted. Probably its usability in industrial projects plays a significant role, but who knows? Maybe MMT would be able to change the game.

If you like MMT very much, and would like to see its implementation in OpenCog, you should probably make a comparison chart between AtomSpace and MMT. If MMT shows some benefits over AtomSpace, then there is a possibility that you'll be heard. As for a back-end database, any would do, because (in my opinion) the main database should be the AtomSpace itself (or something else chosen), regardless of low-level solutions like the Postgres DB that power it behind the scenes. The front-end is what should matter the most.

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/c436ad44-7d70-4af2-a3b9-cae81d6d2788%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Linas Vepstas

Sep 30, 2017, 10:58:28 PM

to opencog

Hi Alex,

I think Ivan Vodisek has already given much of the correct answer, but I will amplify and say, yes, it's correct. Some important inline comments below.

On Sun, Oct 1, 2017 at 6:20 AM, Alex <alexand...@gmail.com> wrote:

> Hi!

> I have heard that the AtomSpace is a big hypergraph, and the names of its representation elements - nodes and links - suggest exactly that.

> So - knowledge can be represented as nodes and links, i.e. as a hypergraph - let's call it the "Type1" labelled hypergraph - the semantic graph.

> But I guess that the same AtomSpace/OpenCog knowledge can be represented as mathematical formulas, e.g. as term logic formulas. Those formulas can have attributes that correspond to the probabilities, but they are formulas anyway. Each formula is a word or sentence in some formal language and therefore it has a syntactic graph representation as well. Let's call it the "Type2" labelled hypergraph - the syntactic graph.

> The question is - what is the connection between Type1 (semantic) and Type2 (syntactic) graphs? I guess semantic rules can be established that allow one to construct a Type1 graph from Type2 graphs and back. What is the right type for the representation? I guess Type2 graphs are more appropriate for formal reasoning (e.g. sequent calculus).

They're both graphs. You answer your own question by saying MMT below, so I don't understand why you asked that question.

> I have this question because there is this formalism - MMT https://uniformal.github.io/ - Meta-Meta-Theory, which tries to unify all (in)formal knowledge in one foundation-free framework. There is a wealth of literature on how Florian Rabe and his collaborators try to encode every big formalism (Axiomatic Set Theory, Constructive Type Theory (Coq culture), Higher Order Logic (Isabelle/HOL culture)) in one modular language. So - the result can be a formalism that allows one to express every formula of every formalism in one common language and - of course - that means that every formula can have a Type2 graph assigned to it! And that means that we can encode all possible formulas (all possible non-multimedia knowledge) in one graph database.

I have never heard of MMT before, but I believe that the opencog representation is probably going to be very similar to what MMT does. It would be an excellent exercise for you (or for someone) to compare the two, and see where they differ, how they differ.

I would very happily take the best ideas from MMT and put them in opencog, or find some way to collaborate with the MMT community -- I think we are working on the same general ideas.

> MMT is largely completed work, so only the technical work remains - one can take the best open source graph database (JanusGraph is the best)

This is not enough. The MMT landing page already lists several dozen things that JanusGraph does not do. Likewise, I have not looked at JanusGraph carefully, but I'm certain that opencog does many things it does not do. Our focus is NOT to be "just" a graph storage system, but a graph storage system with many additional services (the MMT page lists many of these).

Our big ones are:

* the pattern matcher

* the pattern miner

* the rule engine

Our smaller ones are:

* a sparse matrix subsystem

* a parsing (categorial grammar) subsystem

The atomspace does have a postgres backend, and perhaps it would be excellent to add a JanusGraph backend. I don't have time to do this myself; we need programmer volunteers to do this.

> and encode this knowledge and attain the most universal knowledge base possible, one that is certainly more expressive than OpenCog (which currently uses (probabilistic) term logic).

False statement; you misunderstand opencog, or are envisioning it incorrectly. It is probably safe to say that atomese is more advanced than JanusGraph+MMT today, but I might be wrong, because I never heard of these two before today. I might be wrong, but I doubt it.

> One should add that each formula's syntactic graph (Type2 graph) can have associated semantic graph representations (Type1 graphs) - there can be several semantic representations for one syntactic one. Sadly, the relationship between syntactic and semantic graphs is a very little researched field. There is, of course, research about semantic graphs themselves (every knowledge representation with graphs does this), but about the connection between syntactic and semantic graphs there is only one work that I am aware of: http://homepages.inf.ed.ac.uk/ldixon/papers/dixon-camcad-09.pdf - about logical graphs.

Perhaps we need more tutorials and wiki pages about how to do this. We've been doing logical graphs like what Dixon discusses since about forever. It's all there in atomese, and it's all "old hat", old news to us.

> So - we can have a knowledge system that has the best of both worlds:
> - The knowledge representation form is taken from MMT
> - The knowledge representation technique is taken from JanusGraph - the best that the industry can provide.

I am pretty sure that atomese today already meets these goals, and goes far beyond this proposal. I might be wrong, but I doubt it.

> That can be the future of knowledge representation systems. Sometimes I am very, very suspicious about efforts to build custom knowledge bases. So many resources are necessary to implement the technicalities that I cannot believe that a custom knowledge base can compete with a universal, industrial-quality graph database. My guess is - if industrial graph databases had been around at the time of the inception of cognitive architectures (Soar, Clarion, Cog, etc.), then all the cognitive architectures would have been built around/using industrial-quality graph databases.

Atomese already has a way of creating a JanusGraph plugin. Perhaps JanusGraph has some features that our current plugin API does not support -- and for this, it would be excellent to expand/improve our plugin API. It would be interesting for me to get into that level of detail.

> So - one graph database can host both types of graphs - both syntactic and semantic graphs - and this graph database can also host reasoning and self-development procedures (which are programs, which can be represented as syntactic trees and saved in the same graph database as the remaining knowledge) for self-(re)evolution. So - a big, big self-evolving system of hypergraphs that lives in an industrial-grade graph database. Maybe this can be the start of AGI?

Yes, that's part of the vision of opencog.

> I call my system MOC - MetaOmegaCog (MetaOmega stands for meta-meta-meta...)

> What are your thoughts about such plans?

Come study atomese more carefully, and perhaps we can talk about how to do this.

--linas

--

"The problem is not that artificial intelligence will get too smart and take over the world," computer scientist Pedro Domingos writes, "the problem is that it's too stupid and already has."

Ed Pell

Sep 30, 2017, 11:36:26 PM

to opencog

JanusGraph is just a graph store. It has no available reasoners.

It is best developed for Java and is weak for Python. As far as I can tell there is no QA process.

JanusGraph is developed largely by IBM and Google as a replacement for Titan which has been taken proprietary.

IBM would benefit far more by adopting Atomspace and its reasoners than Opencog would benefit from JanusGraph.

Linas Vepstas

Oct 1, 2017, 12:09:56 AM

to opencog

Hi Ed,

Thanks. It turns out that I have glanced at JanusGraph in the past. The main landing page for JanusGraph does make it sound very impressive.

Here's my experience and I would love to get help with it. I am now building graphs that are so large, that they no longer fit into RAM (on a machine with 256GB RAM). I'm slowly moving to algorithms that can page in only the needed subgraphs, on demand and then drop these when not needed.

In the past, I used to run postgres without any protections: My default config for postgres was to disable sync-to-disk, and tune all sorts of other parameters for performance. This worked great, or it worked "well enough". And then came a thunderstorm, and then another, and I got very shy about disabling the various writeback and sync features in postgres. Basically, I experienced data loss and database corruption. Which is semi-tolerable (for my current datasets), but quite painful and unpleasant and nerve-wracking.

So I turned the safety features back on, but now access to atoms is maybe 10x slower than before. Yes, I bought SSD's for database storage, this helped a lot. Yes, I bought an uninterruptible power supply. For the short-term, I am good to go. But this is very home-brew. There's a big difference between tinkering in your garage, and building a factory floor.

In the long-term, though, a cloud solution, with high-speed access to a distributed database is needed. I have no clue what sort of performance numbers are achievable, or what it would take to improve these (as the initial attempt is bound to be bad), or how much of this would require either a redesign or large new features in atomese. I might hope "relatively little" but I am too world-wise to entertain such hopes when I'm not drunk.

--linas

Amirouche Boubekki

Oct 1, 2017, 10:07:28 AM

to ope...@googlegroups.com

Héllo all,

On Sun, Oct 1, 2017 at 6:09 AM Linas Vepstas <linasv...@gmail.com> wrote:

> Hi Ed,

> Here's my experience and I would love to get help with it. I am now building graphs that are so large, that they no longer fit into RAM (on a machine with 256GB RAM). I'm slowly moving to algorithms that can page in only the needed subgraphs, on demand and then drop these when not needed.

This is exactly the problem I want (or wanted) to solve using the wiredtiger database engine. The problem is that I need help. I need to know:

0) Which header files are relevant?

1) Which classes must be overridden?

2) How to test that my code works as expected?

Obviously, if nobody can guide me a *little* through the codebase, I will not be able to help. I cannot convince you that I will be successful; after all, I am just another webdev (working on a 200k SLOC project) lurking around...

The main difference from the current postgresql backend is that the wiredtiger database engine is a library: wiredtiger is embedded in the program that uses it. Also, it handles caching for you, so you don't need to load/unload subgraphs manually. That will be done for you by the engine, which will know the atomspace data layout precisely, and that should lead to efficient caching.
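As a rough illustration of what "the engine handles caching for you" can mean, here is a generic read-through cache with least-recently-used eviction. It is only a sketch of the general idea: it says nothing about how wiredtiger is implemented internally, and `ReadThroughCache` and its `fetch` callback are hypothetical names invented for this example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Bounded cache: loads on a miss, evicts the least recently used entry."""

    def __init__(self, fetch, capacity):
        self._fetch = fetch          # callback that reads one item from disk
        self._capacity = capacity    # maximum number of items kept in RAM
        self._items = OrderedDict()  # oldest entry first
        self.misses = 0

    def get(self, key):
        if key in self._items:
            # Hit: mark as most recently used and return the cached value.
            self._items.move_to_end(key)
            return self._items[key]
        # Miss: read from the backing store, then make room if needed.
        self.misses += 1
        value = self._fetch(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)
        return value
```

The caller only ever calls `get`; loading and unloading happen as a side effect. The trade-off is that a cache like this reacts to accesses and cannot know in advance which atoms an algorithm will want next.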

BTW, wiredtiger has been used by mongodb since 3.2. It's the underlying default engine. That said, it's more powerful than what mongodb exposes. Some flaws of mongodb were kept for backward compatibility (I guess).

> So I turned the safety features back on, but now access to atoms is maybe 10x slower than before. Yes, I bought SSD's for database storage, this helped a lot. Yes, I bought an uninterruptible power supply. For the short-term, I am good to go. But this is very home-brew. There's a big difference between tinkering in your garage, and building a factory floor.

How is postgresql fine-tuning helping? I was under the impression that atomspace loaded the whole database into main memory and dumped it on demand.

> In the long-term, though, a cloud solution, with high-speed access to a distributed database is needed. I have no clue what sort of performance numbers are achievable, or what it would take to improve these (as the initial attempt is bound to be bad), or how much of this would require either a redesign or large new features in atomese. I might hope "relatively little" but I am too world-wise to entertain such hopes when I'm not drunk.

idk.

Linas Vepstas

Oct 1, 2017, 11:39:42 AM

to opencog

Hi Amirouche,

Let me top-post, it will be easier. First: bulk load and bulk save of the atomspace is part of the API, but it's very blunt and ugly and useless. I never-ever bulk-load or bulk-save my data.

The more fine-grained API allows:

* Specific atoms to be loaded (i.e. the values, truthvalues, etc attached to those atoms)

* The entire incoming set of a specific atom to be loaded.

* Load only that portion of the incoming set that is some specific type.

* Load all atoms of a specific type.

* Save just one specific atom.

Let me give an example: So, first, I load all atoms of type WordNode. There are maybe 100K or 200K of these, it depends. Next, I pick one word, let's say (WordNode "the"), and load all SectionLinks with that word in it. (Sections are link-grammar disjuncts). For a word like "the", there might be 20K or 50K or maybe more sections. By loading only the SectionLinks, I can avoid loading the word-pairs (of which one word is "the"), because I don't need the word-pairs, and there's like maybe 100K of them that I don't need clogging up RAM. Then I run my algo, and then pick a different word, and repeat. Pretty much all words have far fewer sections than "the". The total number of sections is maybe 25 million or double or one-tenth of that (it depends), which is probably too much to load all at the same time. I don't really need all 25M at the same time.
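The load-by-type, load-incoming-set, run, drop loop described above can be sketched roughly as follows. Everything here is a toy stand-in: `ToyStore`, `fetch_all_of_type`, `fetch_incoming_by_type` and `process_words` are hypothetical names for this illustration and are not the AtomSpace or BackingStore API. Atoms are modelled as plain tuples.

```python
from collections import defaultdict


class ToyStore:
    """Toy stand-in for a persistent store; atoms are (type, payload) tuples.

    A node's payload is its name; a link's payload is a tuple of member atoms.
    """

    def __init__(self, atoms):
        self._by_type = defaultdict(list)
        self._incoming = defaultdict(list)
        for atom in atoms:
            self._by_type[atom[0]].append(atom)
            if isinstance(atom[1], tuple):  # a link: index it by its members
                for member in atom[1]:
                    self._incoming[member].append(atom)

    def fetch_all_of_type(self, atom_type):
        return list(self._by_type[atom_type])

    def fetch_incoming_by_type(self, atom, link_type):
        # Only links of the requested type are returned, so unrelated links
        # (for example word-pairs) never have to enter RAM.
        return [ln for ln in self._incoming[atom] if ln[0] == link_type]


def process_words(store, algo):
    """Run algo on each word with only that word's sections resident."""
    resident = set(store.fetch_all_of_type("WordNode"))
    words = sorted(resident)
    results, peak = {}, len(resident)
    for word in words:
        sections = store.fetch_incoming_by_type(word, "SectionLink")
        resident.update(sections)          # page in what this step needs
        peak = max(peak, len(resident))
        results[word[1]] = algo(word, sections)
        resident.difference_update(sections)  # drop it again afterwards
    return results, peak


the, cat = ("WordNode", "the"), ("WordNode", "cat")
store = ToyStore([
    the, cat,
    ("SectionLink", (the, cat)),
    ("SectionLink", (the,)),
    ("EvaluationLink", (the, cat)),  # a word-pair that is never loaded
])
counts, peak = process_words(store, lambda word, sections: len(sections))
```

In this toy run the word-pair link stays in the store the whole time, and the resident set never holds all five atoms at once.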

So how can wiredtiger help? To summarize, here's what I got:

So my algo knows exactly which atoms it wants loaded at any given time, and I can also provide fairly strong hints about which ones are no longer needed.

I absolutely, totally must have these certain kinds of atoms loaded at certain times, otherwise the algo totally fails. The atomspace API allows me to ask for exactly those atoms that I want, when I want them. The current API stalls (does not return to caller) until the requested atoms are fully loaded in the atomspace. For all I care, the loading could be done async, BUT the atoms must be there when they are accessed. (We would need to change the API to do this kind of async load, but that's doable. Hmmm. good idea, even, I should have done this earlier....)

Maybe with wiredtiger, we could make loading async, so that the atoms are not fully loaded until they are accessed. I'm not picky. I can give hints about which ones to load, when.
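One way the "hint now, block only on access" idea could look is sketched below using Python's standard `concurrent.futures`. This is an assumption-laden illustration rather than a proposal for the actual API: `AsyncLoader`, `hint`, `get` and `drop` are hypothetical names, and `fetch` stands for whatever call really reads atoms from storage.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncLoader:
    """Start loads in the background; block only when the data is accessed."""

    def __init__(self, fetch, workers=2):
        self._fetch = fetch
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = {}  # key -> Future holding the loaded data

    def hint(self, key):
        """Say 'I will need this soon'; returns immediately."""
        if key not in self._pending:
            self._pending[key] = self._pool.submit(self._fetch, key)

    def get(self, key):
        """Return the data, waiting only if the load has not finished yet."""
        self.hint(key)
        return self._pending[key].result()

    def drop(self, key):
        """Say 'I no longer need this'; forget the loaded data."""
        self._pending.pop(key, None)

    def close(self):
        self._pool.shutdown()
```

With this shape the algorithm can hint the next word while it is still working on the current one, and the guarantee that atoms are present when accessed is kept because `get` waits for the load to complete.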

I have no clue how wiredtiger works, so I don't really know what to suggest to you. I can only point you at the current, actual API and its documentation. It is here. If you want a different API, that's OK, I'm OK with that, as long as it can actually work. I do NOT need crazy ideas that will never work.

The API is here:

https://github.com/opencog/atomspace/blob/master/opencog/atomspace/BackingStore.h

an example implementation is here:

https://github.com/opencog/atomspace/blob/master/opencog/persist/sql/multi-driver/SQLAtomStorage.cc

If you try to figure out how the one is wired to the other, you will get confused; there is some historical perversity that makes it more stupidly complicated than it should be. Oh well. Just skip that part.

So I will help you make wiredtiger work, if you explain to me how it can "magically" load the needed atoms at the right time. Because otherwise, it just seems like magic to me.

If we can expose whizzy features in wiredtiger, that's fine too, but I have no clue about that.

--linas

Linas Vepstas

Oct 1, 2017, 11:47:16 AM

to opencog

Oh, and I forgot to mention: opencog atoms are small -- maybe a few hundred bytes or so. The performance of most popular web databases totally sucks when the data is that small -- they are tuned for storing mp3's and jpeg files, which are megabytes each, and they are great for that - but they suck for teeny-weeny atoms. Been there, done that.

--linas

Alex

Oct 3, 2017, 1:29:55 PM

to opencog

> Our focus is NOT to be "just" a graph storage system, but a graph storage system with many additional services (the MMT page lists many of these).
>
> Our big ones are:
> * the pattern matcher
> * the pattern miner
> * the rule engine
>
> Our smaller ones are:
> * a sparse matrix subsystem
> * a parsing (categorial grammar) subsystem

I did a Google search - there is already extensive research on graph pattern matching and graph pattern mining, and there are some open source projects and tools for the matcher and miner, and I have even found one open source project for a rule engine that is implemented on top of a graph database. I am currently investigating these projects more deeply and deciding whether they are appropriate for me and whether they are mature enough.

I just wanted to say that maybe relying on (and contributing to) open source projects for general features/foundation has some advantages.

Nil Geisweiller

Oct 3, 2017, 2:40:28 PM

to ope...@googlegroups.com

On 10/03/2017 08:29 PM, Alex wrote:
> source projects and tools for the matcher and miner, and I have even found one
> open source project for a rule engine that is implemented on top of a graph

Interesting, which one?

> I just wanted to say that maybe relying on (and contributing to) open
> source projects for general features/foundation has some advantages.

Absolutely.

Nil

Alex

Oct 3, 2017, 3:03:46 PM

to opencog

Well, maybe I am too optimistic...

Regarding the rules, I am investigating the following (not impressive):

https://github.com/threatgrid/naga

https://link.springer.com/chapter/10.1007/978-3-319-21542-6_14

https://www.google.com/patents/US20150302300

Rules over graph databases are a far less popular theme than graph matching and mining, but I guess graph matching is by far the most complex and important part of any rule engine.

Well - I suggest you not waste your time on this; I will investigate further open source projects and then report back. I don't want to distract anyone from their plans.

Linas Vepstas

Oct 4, 2017, 1:45:58 AM

to opencog

Hi Alex,

On Wed, Oct 4, 2017 at 3:03 AM, Alex <alexand...@gmail.com> wrote:

> Well, maybe I am too optimistic...

> Regarding the rules, I am investigating the following (not impressive):
> https://github.com/threatgrid/naga

Atomese is definitely inspired by datalog. About 10 years ago, we actually had a datalog API to the atomspace, it was created by one of the early contributors, and was used by the State of Florida in some web-service.

> https://link.springer.com/chapter/10.1007/978-3-319-21542-6_14

I keep trying to convince Ben and/or anyone at all to write up the atomspace and submit for publication at one of these conferences, but no one ever does. It would be very cool -- a simple, easy-to-write paper describing the newest ideas -- and it would be stuff we've already got code for. Free advertising for opencog.

> https://www.google.com/patents/US20150302300

Heh. Well, we've got lots and lots of prior art for that patent. Foo.

> Rules over graph databases are a far less popular theme than graph matching and mining, but I guess graph matching is by far the most complex and important part of any rule engine.

Yes. Well, once you have the concept of graph matching, then the concept of a rule engine is (should be) "obvious". There are many more things that we've done in the atomspace that are now "obvious" in hindsight, but were not when going forwards.
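To show why a rule engine follows almost directly from graph matching, here is a minimal sketch over labelled triples. It is not the OpenCog pattern matcher or rule engine; `match`, `query` and `apply_rule` are hypothetical names, variables are simply strings starting with `$`, and the matching is brute force.

```python
def match(pattern, fact, bindings):
    """Unify one pattern triple with one fact under existing bindings."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("$"):
            # A variable: bind it, or check it agrees with an earlier binding.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def query(patterns, facts, bindings=None):
    """Graph matching: yield every binding that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        extended = match(patterns[0], fact, bindings)
        if extended is not None:
            yield from query(patterns[1:], facts, extended)


def apply_rule(premises, conclusion, facts):
    """A rule is a query plus a template filled in once per match."""
    return {
        tuple(b.get(term, term) for term in conclusion)
        for b in query(premises, facts)
    }


facts = {("cat", "isa", "mammal"), ("mammal", "isa", "animal")}
derived = apply_rule(
    premises=[("$x", "isa", "$y"), ("$y", "isa", "$z")],
    conclusion=("$x", "isa", "$z"),
    facts=facts,
)
```

Once `query` exists, `apply_rule` is a few lines: the premises are a graph pattern and the conclusion is a template. Repeating the step until no new facts appear would give a naive forward chainer.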

I am not surprised that other people are inventing/rediscovering and implementing these things -- that's what happens when an idea is "obvious". I am somewhat frustrated that some of these other projects get more exposure, more advertising, and become more popular than opencog, and I wish I could change that.

I wish that some of the energy expended on creating graph databases had been expended on opencog instead. I wish that the tinkerpop people had used the opencog query language, instead of inventing their own.

Maybe this was the usual java vs. c++ thing. Maybe it's the Apache license vs. the GPL license. Some (many?) decisions are political, not technical.

Anyway, I am pretty sure that the atomspace design is five or six steps ahead of everyone else, not just one or two. Most of the world has not yet discovered why these steps are important and interesting. Maybe in 5 or 10 years, there will be an Apache Whatever that will also do these things in a faster, more scalable way. Maybe someone will patent these ideas, and we'll have prior art. But right now, there isn't, and I can't wait 5 or 10 years.

What is critically important is to make the atomspace better known, easier to understand, more popular, easier to use, more scalable, faster, more efficient. All of this is hard to do, however.

--linas

> Well - I suggest you not waste your time on this; I will investigate further open source projects and then report back. I don't want to distract anyone from their plans.

