Questions on a Knowledge Representation Standard for AGI - Help me not waste my time :-)


Adam Gwizdala

Apr 1, 2017, 7:47:03 AM
to opencog
Hey OpenCog,

I've been following your work for a few years now, great effort, and some solid justification for your design principles. Keep on truckin' with it :-)

I'm currently working to define my thesis, which will focus on concept pattern mining, description logic (DL), and ontology learning, specifically in the AGI context.

In particular, I wanted to develop a KR standard for AGI (like OWL2 on steroids) that is extensible enough to let AGI researchers collaborate effectively and plug in learning algorithms or other modules more readily, but that also allows low-level types/relationships to be defined so that, for example, economics or probability concepts can be implemented. I still wanted to keep the formalisation in view (e.g. inference, satisfiability, chaining, uniform interpolation: all the good stuff we get from a formalised KR like OWL, where it applies).

As part of my pre-work I am considering the AtomSpace in detail, due to some of its properties: e.g. large-scale KR, a query engine, and a bias towards modular/hybrid AGI. But also because any standard would need to meet advanced requirements like those found in OpenCog to be effective.

I have a couple of questions I was hoping someone could answer to help me decide to progress:

Given that you guys have gone through the process of implementing the AtomSpace, do you think that such a 'standard AGI KR' would be practical in real terms? Or would it be a bit too much of a monster to define, with too steep a learning curve to attract a new user base?

Also, in many of the AtomSpace-related publications there is frequent mention of performance trade-offs and data-persistence dynamics. Do you feel that distributed computation and general HPC should be considered a central principle of such a standard KR? E.g. in the same way OWL is 'web-biased', the AGI standard would be 'HPC-biased'.

Given that perspectives on AGI research differ significantly between individuals, do you think a KR standard which tries to unify viewpoints/requirements would end up being so generalised that you might as well just not bother?

Thanks

Adam Gwizdala

Ed Pell

Apr 2, 2017, 12:51:51 AM
to opencog

I am impressed with OWL2 and its support of spatial and temporal knowledge. HermiT seems to support reasoning over these. What features will you be adding?

Adam Gwizdala

Apr 2, 2017, 1:48:14 PM
to opencog
Same for me. I'm quite new to OWL2, but looking at some of the reasoning that's possible over well-thought-out, human-generated ontologies is what got me thinking.

In terms of features, I'm building a laundry list of things that OWL2 doesn't support readily. Most of these so far have been numerical concepts, or places where you might want to represent uncertainty in relationships.
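To make that last point concrete, here is a minimal Python sketch of what an "uncertain relationship" could look like, loosely inspired by OpenCog-style truth values that pair a strength with a confidence. The class, field names, and the merge rule below are all hypothetical choices for illustration, not part of any existing standard:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WeightedTriple:
    """A subject-predicate-object assertion annotated with uncertainty.

    Loosely inspired by OpenCog's SimpleTruthValue, which pairs a
    strength (degree of truth) with a confidence (weight of evidence).
    Plain OWL2 axioms are crisp, so this extra pair is exactly the kind
    of extension a 'KR standard for AGI' would have to pin down.
    """
    subject: str
    predicate: str
    obj: str
    strength: float = 1.0     # degree of truth in [0, 1]
    confidence: float = 1.0   # weight of evidence in [0, 1]

def merge(a: WeightedTriple, b: WeightedTriple) -> WeightedTriple:
    """Combine two observations of the same assertion by
    confidence-weighted averaging of strengths (one simple
    hypothetical rule; a real standard would have to specify this)."""
    assert (a.subject, a.predicate, a.obj) == (b.subject, b.predicate, b.obj)
    total = a.confidence + b.confidence
    s = (a.strength * a.confidence + b.strength * b.confidence) / total
    return WeightedTriple(a.subject, a.predicate, a.obj, s, min(1.0, total))

t1 = WeightedTriple("raven", "has-color", "black", strength=0.9, confidence=0.5)
t2 = WeightedTriple("raven", "has-color", "black", strength=0.7, confidence=0.5)
print(round(merge(t1, t2).strength, 3))  # 0.8
```

The point is only that "uncertainty" forces design decisions (what the numbers mean, how evidence combines) that a crisp ontology language never has to make.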

This list is long though (!). Once I've finished this feasibility bit, I will post.

Linas Vepstas

Apr 2, 2017, 2:01:08 PM
to opencog
Hi Adam,

My personal instinct is that a human-curated KR system is kind of pointless. Let me explain why. I've actually tried to create one several times now, and have been dissatisfied with the results.

The first time, I thought I could do it with "semantic triples" -- subject-verb-object type structures -- and this seems to work, sort of, at the simplest, most naive level.  But very quickly one discovers that one needs to deal with modifiers: adverbs, adjectives, prepositions, relative clauses, models of mental states ("John thinks that ...").  It turns out that the reason natural language is complicated is that there is an actual need for that complexity to convey factual knowledge.  That complex structure is hard to capture.
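A toy sketch of the problem Linas describes (all identifiers here are hypothetical): flat triples have no slot for a modifier, so each modified assertion has to be reified into an event node, and nested mental-state claims then stack reifications on top of that.

```python
# Naive semantic triples: bare (subject, verb, object) tuples.
facts = {("John", "gave", "book")}

# Fine for the simplest assertions, but "John reluctantly gave Mary
# the book yesterday" has no slot for 'reluctantly', 'Mary', or
# 'yesterday'.  The usual escape hatch is reification: promote the
# event itself to a node and hang the modifiers off it.
event = "give-event-1"  # hypothetical event identifier
reified = {
    (event, "type", "give"),
    (event, "agent", "John"),
    (event, "theme", "book"),
    (event, "recipient", "Mary"),
    (event, "manner", "reluctantly"),
    (event, "time", "yesterday"),
}

# And for "John THINKS that Mary gave him the book", the whole event
# node must itself become the object of a 'believes' triple -- and so
# on, nesting indefinitely as the language gets more natural.
belief = ("John", "believes", event)

print(len(reified))  # 6
```

One English sentence already cost six triples plus an invented node; that is the complexity explosion in miniature.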

But if you still do want to hand-create such a system, the best place to start would be the DSynt (deep-syntactic) layer of MTT -- the "Meaning-Text Theory" of linguistics.

One of the many problems with traditional KR systems is that they fail to deal with grounding in the physical world: that discussions about object X actually pertain to an actual object in the field of view of some camera, or maybe some sound heard on a microphone.  Or that talk about some action is actually about an action that must be undertaken in the real world: say, for example, you wanted to tell a self-driving car to turn left.

My current plan/hope is to build a system that can develop its own internal KR automatically, instead of having a human design one. Prototypes of this concept have been published in academic journals for decades, starting with papers from 10 or 20 years ago that discuss the automated learning of synonymous words and synonymous phrases.

I'm actually starting work on this now, full time, but am still at the very earliest stages.  Ask me again in a few months.

Anyway, if one does have a system capable of learning a KR by itself, then the best possible representation to feed it is just a large collection of short English-language sentences asserting facts.  That's it. Just read the corpus.  The system will assign the facts into slots, as needed.

 --linas



--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/5064103e-2eb0-41a1-a9dc-feeec578b962%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Adam Gwizdala

Apr 2, 2017, 5:47:28 PM
to opencog, linasv...@gmail.com
Hi Linas,

Thanks for the response below and good counter examples.

Agreed on your point regarding the modifiers and models. I've been trying to consider both natural language and basic scenes (e.g. camera input) this week, but even in the most trivial cases I'm finding that I need many additional models and extensions to represent even a short English sentence effectively, often with limited continuity between my efforts in each case.

I hadn't really considered your second point on grounding in the real world, outside of considering automated techniques to build the KR. I've not seen anything in traditional KR which considers that... it's always domain knowledge that's represented, not a representation of how the domain concepts are mapped back to the world.

Your work on English sentences and learning an internal KR structure sounds mega interesting. I thought at length about language after reading some odd bits on Universal Grammar, Merge and I-language. I found the poverty-of-stimulus problem in language acquisition of particular interest, and had considered that the human language acquisition mechanism and its internal representations might have special significance in an AGI design.

I will ask in a few months for sure.

Thanks

Adam

Andrew Buck

Apr 4, 2017, 9:40:36 AM
to opencog, linasv...@gmail.com
Linas,

You mentioned that a corpus of short factual statements is something that could be useful in the self-learning approach you are using.  In another thread I asked how volunteers like me, who lack the coding knowledge to contribute directly, might be able to help out.  Working on a corpus of short, grammatically correct sentences is something I had in mind.  Does such a corpus exist, and if not, can you give us a bit of guidance on what your "ideal" corpus might look like?  I know there are things like ConceptNet, and I also know there are a lot of problems with them that make them difficult to use for something like OpenCog.

I have lots of time available to work on something like this, I just need to know what to actually work on so that my efforts are not a waste.

-AndrewBuck

Andrew Buck

Apr 4, 2017, 1:01:10 PM
to opencog, linasv...@gmail.com
To expand on this idea a bit further, and maybe formulate something a bit more concrete: how about we start a repository under the opencog space on GitHub?  I'm not sure what the term for them is, but there are already sub-repositories for the atomspace, relex, the minecraft code, etc., all under the overall umbrella of the opencog organisation.

This repository would host the various statements and other files like short stories, etc., as well as maybe a few minimal scripts in python or similar to do maintenance on the corpus: things like looking for duplicate sentences.  Additionally there might be some scripts to feed these sentences to something like relex, but those are probably best left to the other repositories, as they are more specialised.  The idea is just to have one place where all this corpus and training data can be aggregated, under the control of the opencog organisation.
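As a sketch of what one such maintenance script could look like, here is a hypothetical duplicate-sentence check in python (the normalization rule -- lowercase, strip punctuation -- is just one possible choice):

```python
import re

def normalize(sentence: str) -> str:
    """Lowercase and strip punctuation so that near-identical
    sentences compare equal."""
    return re.sub(r"[^a-z0-9 ]+", "", sentence.lower()).strip()

def find_duplicates(lines):
    """Return (first_seen, repeat) pairs of sentences that collide
    after normalization."""
    seen, dupes = {}, []
    for line in lines:
        key = normalize(line)
        if not key:
            continue  # skip blank lines
        if key in seen:
            dupes.append((seen[key], line))
        else:
            seen[key] = line
    return dupes

corpus = [
    "Cereal is a common breakfast food.",
    "Breakfast is a meal eaten at the start of the day.",
    "Cereal is a common breakfast food",   # duplicate up to punctuation
]
print(find_duplicates(corpus))
```

Something this small could run as a pre-merge check on pull requests, so the "stable" branch stays free of redundant entries that were not intended.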

Making our own corpus has obvious advantages: it can be carefully vetted to make sure the grammar and punctuation are correct, it doesn't contain nonsense or poorly formatted text, and it can be arranged in a way that is best suited to its purpose of being a learning space and/or a source of test data for the various reasoning portions of the opencog system.  Obviously the AI will eventually need to learn to deal with language that is not grammatically clean and tidy, but it is going to be far easier to start out with a clean corpus and then grow its understanding over time.  Also, by hosting it on GitHub, we can take advantage of things like pull requests, so that anyone can easily suggest changes/additions to the text and play around with it, but it won't be put into the "stable" master branch of the repository until it meets whatever standards we decide upon.  This way, at any given time, you know that if you run tests using the master branch of the corpus you should expect a similar response, just like with stable releases of a software program.

The way I envision such a system working is by having individual files with a bunch of short sentences on a single topic: something like "breakfast".  In here you would have things like "Cereal is a common breakfast food", "Breakfast is a meal eaten at the start of the day", etc.  The idea is that each file would contain lots of overlapping and redundant statements about some small aspect of human experience.  Redundancy will be a key idea here; unlike something like ConceptNet, where each idea is contained in a single sentence, we might have 10 different sentence variations on the same idea.  Each of these files would be something akin to what a kindergartner might learn in one afternoon at school.  Then we would have some "meta" files which contain lists of these individual files, grouping them into something more akin to a chapter in a grade school textbook.  One such example list would be the files on breakfast, lunch, dinner, common food items, restaurants, soup kitchens, etc.  Basically grouping lots of little blocks of knowledge into something more broad.
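A minimal python sketch of how such topic files and "meta" chapter files might be resolved -- the layout and all file names here are hypothetical, just to make the structure concrete:

```python
import tempfile
from pathlib import Path

def load_chapter(root: Path, chapter: str) -> list[str]:
    """Resolve a 'meta' chapter file into the concatenated sentences
    of every topic file it lists.  Hypothetical layout:
      <root>/chapters/<chapter>.txt -- one topic name per line
      <root>/topics/<topic>.txt     -- one short sentence per line
    """
    sentences = []
    meta = (root / "chapters" / f"{chapter}.txt").read_text().splitlines()
    for topic in meta:
        topic = topic.strip()
        if not topic or topic.startswith("#"):
            continue  # allow blank lines and comments in meta files
        topic_file = root / "topics" / f"{topic}.txt"
        sentences += [s for s in topic_file.read_text().splitlines() if s.strip()]
    return sentences

# Build a throwaway example corpus and resolve one "chapter".
root = Path(tempfile.mkdtemp())
(root / "topics").mkdir()
(root / "chapters").mkdir()
(root / "topics" / "breakfast.txt").write_text(
    "Cereal is a common breakfast food.\n"
    "Breakfast is a meal eaten at the start of the day.\n")
(root / "chapters" / "meals.txt").write_text("# meals chapter\nbreakfast\n")
print(load_chapter(root, "meals"))
```

Because the meta files are just lists of names, regrouping topics into new "chapters" never requires touching the sentence files themselves.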

The idea is not to make this grouping a "formal logic" kind of grouping in the sense of subsets and predicate logic, but rather to group them in the sense of "if you were learning about subject X, what are the kinds of things you would read about?"  The list might contain some subjects which are only tangentially related to the core topic, but all should have at least some overlap in context.  By making lots of little self-contained files and then grouping them like this, we make it easy to test the system's ability to integrate more and more complex ideas.  If you want to simply test a parser to see if it tags parts of speech correctly, you can run through individual files and not be bombarded with too much information.  Then, when you want to test higher-level things like word-sense disambiguation, you can easily run through a few "chapters" to see how the same words might be used in different contexts, etc.

I think I might start putting together some small examples of such a system, but it would be good to have feedback from the devs on this to avoid making something that is not useful.  Ultimately you are the ones who would be using this kind of data, so I want to structure it in the way that makes it easiest for you to feed into the various things you are testing.

-AndrewBuck

Andrew Buck

Apr 4, 2017, 2:31:24 PM
to opencog
In order to prevent derailing this thread I am going to start a new thread on the corpus discussion.

AndrewBuck