automated code generation


Dmitry Ponyatov

May 7, 2017, 11:46:21 AM
to opencog
Is opencog suitable for automated code generation ?

I'm searching for AI technologies that can be applied to transformational programming, especially automated code generation: synthesizing program code in mainstream languages like C(++), Java, JS, etc., based on declarative high-level descriptions, reusing code blocks while taking into account the target system, a predefined class structure, a library of code snippets, and so on.

Ivan Vodišek

May 7, 2017, 2:19:19 PM
to ope...@googlegroups.com
Correct me if I'm wrong, but isn't the backward chainer what Dmitry is looking for?

Is it possible to insert a set of formulas:

(a -> T, b -> T, c -> T)

(a -> T, b -> F, c -> T)

(a -> F, b -> T, c -> T)

(a -> F, b -> F, c -> T)

(a -> T, b -> F, c -> F)

(a -> F, b -> T, c -> F)


(a1 ∧ a2) -> a3

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/052d5af3-68db-4e08-9690-b66774d69b05%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ivan Vodišek

May 7, 2017, 2:32:18 PM
to ope...@googlegroups.com
Oops, I accidentally hit the send button prematurely...

What I mean is:

Disjunction:

((a -> T) ∧ (b -> T) ∧ (c -> T))

((a -> T) ∧ (b -> F) ∧ (c -> T))

((a -> F) ∧ (b -> T) ∧ (c -> T))

((a -> F) ∧ (b -> F) ∧ (c -> T))

((a -> T) ∧ (b -> F) ∧ (c -> F))

((a -> F) ∧ (b -> T) ∧ (c -> F))


Result:

(a ∧ b) -> c

Shouldn't the backward chainer be able to conclude this result from the disjunction above?
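To make this concrete, the rows can be checked mechanically. Here is a small plain-Python sketch (nothing OpenCog-specific) that verifies the listed rows are consistent with (a ∧ b) -> c, and that the one assignment falsifying that conclusion is exactly the row absent from the disjunction:

```python
from itertools import product

# The six rows of the disjunction above, read as (a, b, c) truth assignments.
rows = [(True, True, True), (True, False, True), (False, True, True),
        (False, False, True), (True, False, False), (False, True, False)]

def implies(p, q):
    return (not p) or q

# Every listed row is consistent with the candidate conclusion (a ∧ b) -> c.
assert all(implies(a and b, c) for (a, b, c) in rows)

# The only assignment that falsifies (a ∧ b) -> c is (T, T, F), which is
# exactly the row missing from the disjunction.
falsifying = [r for r in product([True, False], repeat=3)
              if not implies(r[0] and r[1], r[2])]
print(falsifying)  # [(True, True, False)]
```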

In my opinion, the difference between this kind of reasoning and the construction of code lies only in the choice of task domain and result domain. In a code-generation example, we would start from a natural-language specification and search for an equivalent piece of source code by backward chaining. The only "but" is the complexity of binding source-code fragments to natural language. And it is a big "but", in my opinion.

 

Ben Goertzel

May 7, 2017, 11:03:53 PM
to opencog
In principle the opencog Rule Engine and backward chainer can do this...

Getting it to run quickly and scalably on such tasks without
combinatorial explosion is gonna be nontrivial however... but
important and interesting..



--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

Ivan Vodišek

May 8, 2017, 7:26:23 AM
to ope...@googlegroups.com
There is a thing I've been working on more thoroughly recently, and that is an "intensional" representation of the deduction/abduction tree.

The classic way of deducing conclusions is extensional: we range over a starting set of formulas and enrich it with new conclusions, repeating the process whenever new elements show up. The process is extensional in the sense that we explicitly insert into the set each new formula we conclude. Of course, this set could grow infinitely, and that is where we run out of resources for computing a complete solution.

But if we look more carefully, the set of conclusions is a possibly infinite set that follows some pattern. It is a kind of "expression fractal" that, instead of explicitly stating every concluded formula, can be described by a recursive expression function that connects the right parts of formulas in the right places. What we get is a big but finite compound formula that describes every possible conclusion following from the starting set.

Now, given this kind of (possibly infinite) set of conclusions written as a finite compound formula, it becomes possible to state any arbitrary formula and check whether that formula belongs to the set. We can then state "false" and check whether "false" belongs to the set, which could be applied to form a "proof by contradiction" method. It is about the solve/check relation: solving could take an infinite amount of resources, while checking happens in a finite amount of time and memory.
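The solve/check distinction can be sketched in a few lines of Python. This is a toy, with hypothetical stand-ins for rules and formulas: the deductive closure is enumerated lazily (it may be infinite), and membership checking scans it up to a bound rather than materializing the whole set:

```python
from itertools import islice

def conclusions(axioms, rules):
    # Lazily enumerate the deductive closure (the extensional view,
    # possibly infinite); nothing is materialized until it is yielded.
    seen = set(axioms)
    frontier = list(axioms)
    while frontier:
        f = frontier.pop(0)
        yield f
        for rule in rules:
            g = rule(f)
            if g not in seen:
                seen.add(g)
                frontier.append(g)

# One toy rule: from any formula f, conclude f' (an infinite chain).
chain_rule = lambda f: f + "'"

def belongs(formula, axioms, rules, bound=1000):
    # "Check": scan the lazy enumeration, giving up after `bound` steps.
    return formula in islice(conclusions(axioms, rules), bound)

print(belongs("a'''", {"a"}, [chain_rule]))  # True
print(belongs("b", {"a"}, [chain_rule]))     # False (within the bound)
```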

It reminds me a bit of the P versus NP problem.

To return to the subject: in my understanding, deduction is what the forward chainer is dedicated to, and abduction is what the backward chainer does. So, in the same way that a (possibly infinite) deduction set is formed, it is possible to form a (possibly infinite) abduction set, giving an opportunity to construct algorithms from natural-language specifications while avoiding combinatorial explosion, by constructing "expression fractals" instead of each instance of parts of the algorithm set. All we have to do after processing is pick any specific instance of this fractal, and we get one of the possible algorithms constructed from the specifications.

I still have to resolve some checking problems; it is a matter of functions and their inverse forms, but it seems that some human-aided workarounds are possible.


2017-05-08 5:03 GMT+02:00 Ben Goertzel <b...@goertzel.org>:
In principle the opencog Rule Engine and backward chainer can do this...

Getting it to run quickly and scalably on such tasks without
combinatorial explosion is gonna be nontrivial however... but
important and interesting..



Alex

May 9, 2017, 2:08:50 PM
to opencog
Ivan, 

Is this new idea what you are writing about, this "intensional" representation of the deduction/abduction tree? Do you have any literature or references regarding this idea? It would be great if you could share the sources of your idea.

Automatic programming is quite an old field that ceased to be developed in the nineties; since then there has been only the development of templates. I am not aware of any big theories or sophisticated technologies behind model-driven development.

Code generation is one of my goals, but I have not implemented it yet. I plan to do it in a very crude way: I will have an inference process whose result will be an abstract syntax tree (of some industrial programming language like Java, TypeScript, or HTML) as Scheme code, and then I will generate the actual code (Java, TypeScript, HTML) from this syntax tree. Functional programming languages make no distinction between data and code (at least I have heard such a thesis in a Scheme book), meaning Scheme data structures can be perceived as executable programs, and that is why generation of Scheme data structures can be the same as generation of executable code (JavaScript has some similar features). But I am not pursuing this path, because I want object-oriented code; the generated code should be of industrial quality.
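The AST-to-code step can be illustrated in miniature. Here is a hedged Python sketch (Python tuples standing in for Scheme s-expressions; the node shapes and names are invented for illustration) that walks a tiny tree and emits Java-like source:

```python
# A toy AST walker: the tree is plain data, the walker emits target code.
def emit_java(node):
    kind = node[0]
    if kind == "class":
        _, name, members = node
        body = "\n".join("    " + emit_java(m) for m in members)
        return f"public class {name} {{\n{body}\n}}"
    if kind == "field":
        _, typ, name = node
        return f"private {typ} {name};"
    raise ValueError(f"unknown node kind: {kind}")

# An AST such as an inference process might produce, as nested data.
ast = ("class", "Point", [("field", "int", "x"), ("field", "int", "y")])
print(emit_java(ast))
```

Running it prints a small Java class declaration; a real generator would of course need many more node kinds, but the shape of the problem is the same.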

There is work on implementations of the Goedel machine (self-improving software), and as far as I have read about it, they are using lambda calculus.

In AGI books (on engineering AGI) there are chapters about evolutionary code generation, but in that case the generated code is the result of a genetic programming process. Genetic programming is quite a relaxed and heuristic method; I would prefer more rigorous automatic programming methods: code should be generated by the inference process from the specification, and the specification should be automatically generated from the norms, best practices, and interaction with the customer. Of course, in this inference process we can use adaptable, probabilistic, non-classical logics, but nevertheless, logics should be used. Evolutionary search cannot give the guarantees that an inference process can.

Dmitry Ponyatov

May 9, 2017, 4:09:35 PM
to opencog
but I am not pursuing this path because I want object-oriented code - the generated code should be of industrial quality.

I see Prolog derivatives as implementations of a backtracking reasoner on top of a hypergraph knowledge base; Prolog rules look like an exclusively hypergraph beast.


Maybe you should look here? This Ergo/Flora system includes a mix of:
  • Minsky frames, which look like a native representation for object-based software systems, and
  • Transaction Logic reasoning, which represents state-machine behavior in the logic programming domain.


Flora-2  (a.k.a. Ergo Lite) is an advanced object-oriented knowledge representation and reasoning system. It is a dialect of F-logic with numerous extensions, including meta-programming in the style of HiLog, logical updates in the style of Transaction Logic, and defeasible reasoning. Applications include intelligent agents, Semantic Web, knowledge-based networking, ontology management, integration of information, security policy analysis, and more.



At first I thought about OpenCog as a partner for Flora: firstly as a visualization tool (it seems Flora lacks one), and also as a generic engine for non-backtracking applications.

But I found a lot of problems with OpenCog's non-portability and the lack of prebuilt packages for Debian Linux, not to mention the necessity of shoving a buggy VirtualBox onto my win32 host system.

Now I'm playing with http://hypergraphdb.org as a stand-in for OpenCog in this role.

Ivan Vodišek

May 9, 2017, 4:41:21 PM
to ope...@googlegroups.com
Hi, Alex :)
 
Is this new idea what you are writing about, this "intensional" representation of the deduction/abduction tree? Do you have any literature or references regarding this idea? It would be great if you could share the sources of your idea.

I came upon this idea on my own (but maybe I'm just reinventing the wheel) and there is no literature known to me that describes this kind of reasoning. Actually, things are not that complicated or mystical below the surface of the problem. Let's consider a complex logic formula, consisting of operators for negation, implication, conjunction and disjunction. The structure of this logic formula forms an expression tree where each leaf holds only one expression. Some of these expressions, when applied to logic axioms (and relevant other leaves), yield new formulas that can be placed as disjunctions with the original leaves, in place of the original leaves. It's a kind of equivalent to the abstract syntax forest that comes as a result of parsing ambiguous data.

One approach is to enumerate all of these leaves/trees in advance, before querying them for some purpose. The other approach is to have a "function" in place of each leaf/tree whose result could be an enumeration of every leaf/tree that logically follows in the right places. But we don't enumerate these leaves/trees in advance. Rather, we enumerate them lazily, only when checking whether a searched-for formula corresponds to a certain leaf/tree in the whole forest. Knowing in advance what to search for in the forest, we can optimize the search process, rejecting paths that don't match the formula we search for. This way, we can compare only fragments (say, the beginning) of the searched-for formula, and check the rest only if the previous fragments passed. Basically, the algorithmic branching for checking remains the same, while the memory space for expressing each leaf/tree is no longer needed, as we procedurally build the branch we want to check on demand. This eliminates one of the two important components responsible for combinatorial explosion, namely memory-space usurpation.
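The prefix-rejection idea can be sketched very simply. In this toy Python example (a two-symbol alphabet standing in for formula fragments, purely illustrative), branches of the forest are grown on demand and abandoned as soon as their prefix stops matching the searched-for formula:

```python
# The forest of candidate formulas is never materialized; branches are grown
# one symbol at a time, and a branch is pruned the moment its prefix stops
# matching the target, so unmatched subtrees cost nothing.

ALPHABET = "ab"

def search(target, prefix=""):
    if not target.startswith(prefix):
        return False              # prune: this whole subtree cannot match
    if prefix == target:
        return True               # the searched-for formula is in the forest
    if len(prefix) >= len(target):
        return False
    return any(search(target, prefix + sym) for sym in ALPHABET)

print(search("abba"))  # True
print(search("abc"))   # False ('c' is outside the alphabet)
```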

The other component responsible for combinatorial explosion is algorithmic branching complexity, and it seems that this is going to be a hard nut to crack. The basis of it should be converting recursive definitions (of, say, even numbers, like "f(x) -> (x, f(x + 2)), first x being 0") to their non-recursive equivalents (like "f(x) -> every x where x / 2 is an integer and x >= 0"). It turns out that this is not such a trivial task, and I think we are going to need an AGI to solve these kinds of questions at a general level.
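The even-numbers example above can be written out in Python, showing the two equivalent definitions side by side: the recursive enumeration (which, taken naively, never terminates) and the closed-form membership test (which always answers in constant time):

```python
def evens():
    """Recursive/extensional view: f(x) -> (x, f(x + 2)), first x being 0."""
    x = 0
    while True:
        yield x
        x += 2

def is_even_closed_form(n):
    """Non-recursive equivalent: every x where x / 2 is an integer, x >= 0."""
    return n >= 0 and n % 2 == 0

def is_even_lazy(n):
    # Membership via lazy enumeration: terminates here only because the
    # sequence is increasing, so we can stop once we pass the target.
    for x in evens():
        if x == n:
            return True
        if x > n:
            return False

assert is_even_lazy(10) == is_even_closed_form(10) == True
assert is_even_lazy(7) == is_even_closed_form(7) == False
```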

I think for now I'm going to settle for the first part, the non-greedy-memory approach, leaving a solution to the algorithmic search-space explosion for the future. Until then, my recommendation for users would be to avoid writing recursive definitions, noting them in non-recursive notation where possible.

Code generation is one of my goals, but I have not implemented it yet. I plan to do it in a very crude way: I will have an inference process whose result will be an abstract syntax tree (of some industrial programming language like Java, TypeScript, or HTML) as Scheme code, and then I will generate the actual code (Java, TypeScript, HTML) from this syntax tree.

My approach is to start from the bottom: to define an assembler. Then to give the user an opportunity to define higher-level expressions that translate to assembler. Then the user can define even higher-level expressions that translate to the level below, and so on with ever higher levels, one of which may be natural-language expressions themselves...
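This bottom-up layering can be sketched as macro expansion. In this toy Python example (all opcodes and macro names are invented for illustration), each higher-level expression is defined only by its expansion into the level below, and expansion recurses until everything is "assembler":

```python
# The bottom level: instructions the machine understands directly.
assembler = {"PUSH", "ADD", "PRINT"}

# Each higher level is defined purely by its expansion into the level below.
macros = {
    "add2":     lambda a, b: [("PUSH", a), ("PUSH", b), ("ADD",)],
    "show-sum": lambda a, b: [("add2", a, b), ("PRINT",)],
}

def expand(program):
    # Recursively expand until only assembler instructions remain.
    out = []
    for instr in program:
        op, *args = instr
        if op in assembler:
            out.append(instr)
        else:
            out.extend(expand(macros[op](*args)))
    return out

print(expand([("show-sum", 2, 3)]))
# [('PUSH', 2), ('PUSH', 3), ('ADD',), ('PRINT',)]
```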
 
Functional programming languages make no distinction between data and code (at least I have heard such a thesis in a Scheme book), meaning Scheme data structures can be perceived as executable programs, and that is why generation of Scheme data structures can be the same as generation of executable code (JavaScript has some similar features). But I am not pursuing this path, because I want object-oriented code; the generated code should be of industrial quality.

The similarity between data and code is obvious for the Lisp family of languages, but you could say that of any language around, as you can always express any code as an abstract syntax tree. What you do with that tree, execute it (treat it as code) or analyze it (treat it as data), is up to you.

I believe that the object-oriented approach to programming is just one side of the story: it is our current industrial reality. The other side is functional programming in the sense of lambda calculus. Some pretty mean possibilities open up in functional programming. You can think of an inference process as a set of functions from assumptions to conclusions. In that sense, it would be a loss to overlook it and concentrate only on OOP. I'm still not sure, but I think there is a special place for functional programming in the future: a place for a scientific approach to solving problems.

In AGI books (on engineering AGI) there are chapters about evolutionary code generation, but in that case the generated code is the result of a genetic programming process. Genetic programming is quite a relaxed and heuristic method; I would prefer more rigorous automatic programming methods: code should be generated by the inference process from the specification, and the specification should be automatically generated from the norms, best practices, and interaction with the customer. Of course, in this inference process we can use adaptable, probabilistic, non-classical logics, but nevertheless, logics should be used. Evolutionary search cannot give the guarantees that an inference process can.

Interactive, successive problem specification could reduce the number of possible results, but if there are still many results in the process, a combination of genetic guessing and methodical inference could be the way to go.

-- ivan



Linas Vepstas

May 9, 2017, 6:26:19 PM
to opencog
What Ben and Ivan say.

On Sun, May 7, 2017 at 10:03 PM, Ben Goertzel <b...@goertzel.org> wrote:
In principle the opencog Rule Engine and backward chainer can do this...

In principle, yes. In practice, you should get to know the internals of gcc, or of clang/llvm, or the Java bytecode compiler, or even the guile-scheme bytecode compiler. All of these have "intermediate languages" (ILs) inside them. The ILs are designed to be
-- easy to machine-read
-- easy to machine-write
-- easy to machine-apply rewrite rules.  The rewrite rules are typically optimizations.

That is, an IL is a kind of "programming language" designed for use by algorithms rather than by humans.
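A toy illustration of the point: an IL is just data that rewrite rules chew on. This Python sketch (nested tuples standing in for a real IL, nothing gcc/llvm-specific) applies two classic optimization rewrites bottom-up:

```python
def optimize(expr):
    # Expressions are ("op", lhs, rhs) tuples or atoms; rewrite bottom-up.
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr
    a, b = optimize(a), optimize(b)
    if op == "+" and b == 0:
        return a                       # rewrite rule: x + 0 -> x
    if op == "+" and isinstance(a, int) and isinstance(b, int):
        return a + b                   # rewrite rule: constant folding
    return (op, a, b)

print(optimize(("+", ("+", "x", 0), ("+", 2, 3))))  # ('+', 'x', 5)
```

Easy to machine-read, easy to machine-write, easy to apply rewrites to; painful for a human to program in directly, which is exactly the trade-off being described.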

In a pretty strong sense, opencog "atomese" is meant to be a super-general, super-whizzy IL. However, the way it is designed would make it rather inefficient for use as an IL inside a compiler: gcc and llvm have nearly a century's worth of hand-crafting between them, making them extremely fast and well suited for compilation.

If you don't know ILs, then I'd recommend looking at the guile IL. I've never had to use it, but I've skimmed it, and it looks... interesting, as far as such things go. Note that the guile IL has both Scheme and some version of JavaScript sitting on top of it.

Anyway, kind of the point of having atomese is to *avoid* the problem you mention: using a machine to synthesize "mainstream high-level languages" is awkward, precisely because those languages are designed for use by humans, not by machines.

There has been some work in this area; for example, if I recall correctly, "stalin" was a compiler that converted Scheme to C code, which you could then compile with gcc to get high-performance compiled Scheme.

Cython does something similar: it generates C code too, although "high performance" is not what it does. Also, the generated code is just barely human-readable. It's not obfuscated on purpose; it's just not... human.

--linas
 




Linas Vepstas

May 9, 2017, 6:44:47 PM
to opencog
Oh, and also to clarify what Alex says: yes, I forgot -- from the 1990s onward (and even earlier, e.g. Smalltalk) there was this idea that people would draw diagrams, hit a button, and the diagrams would get converted into high-level code. Alex's email uses the various catch-phrases that were popular in that era.

The problem with this "model-driven" programming style is that it's a one-way street: you can convert your model into code, but you can't go the other way. There's no "uncompiler" or "disassembler".

One of the points of atomese, or more narrowly of the pattern miner, or of learning, is to be able to "uncompile" or understand crazy sensory perceptions. But before you can do that, you need all sorts of infrastructure to allow this two-way movement.

None of the 1990s systems were ever designed to allow uncompiling: e.g. you cannot get the original Scheme code back after running stalin -- https://en.wikipedia.org/wiki/Stalin_(Scheme_implementation) -- and you cannot get back the original cython/python code after running the cython compiler.

For me, some form of "uncompiling" is critical -- I'd argue that "learning", "inference", "deduction", etc. are all a kind of uncompiling of raw data into abstract structures that can be manipulated by thought.

Consider, for example, machine vision or face-recognition algos: they are "uncompiling" an array of pixels into named objects with bounding boxes. This is the reverse of, and a lot harder than, e.g. the 3D graphics that every computer game can do.

--linas

Linas Vepstas

May 9, 2017, 6:57:07 PM
to opencog
Yes, opencog atomese is very much influenced by ideas from prolog.

However, unlike flora2 and ergo, it just seemed easier to cut loose and ignore all the other buzzwords -- "semantic web", "W3C", RIF, etc. -- because trying to track all of that, being buzzword-compliant, was just wayyy too much work.

Also, atomese is unlike prolog (or flora or ergo) because it's very interested in probabilistic methods: not boolean true/false, but probabilities attached to everything. This means that, in the end, all the buzzwords in flora/ergo would need to get ported to a probabilistic, uncertain-inference framework. And that changes the game completely.
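To illustrate how the game changes, here is a deliberately naive Python sketch of attaching strengths instead of booleans. The product formula below is only a toy (it assumes independence; it is not the actual PLN deduction formula):

```python
def and_strength(p_a, p_b):
    # Toy combination rule: under an independence assumption,
    # P(A and B) = P(A) * P(B). Real uncertain-inference rules also
    # track confidence, not just strength.
    return p_a * p_b

# Probabilities attached to everything, rather than true/false.
truth = {"a": 0.9, "b": 0.8}
truth["a_and_b"] = and_strength(truth["a"], truth["b"])
print(round(truth["a_and_b"], 2))  # 0.72
```

Even this tiny example shows why a crisp-logic rule set cannot be ported one-to-one: every rule needs a story about how uncertainty propagates.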

Also, another difference: those URLs mention https://en.wikipedia.org/wiki/F-logic, whereas atomese is a "meta-logic", so you can layer F-logic, or whatever your favorite system of logic is, on top of atomese. It tries hard not to care about what logic or KR system you want to use. It got pushed that way because of the goofy arguments between Pei Wang's NARS group and OpenCog's PLN: I just said -- screw it, make a system that can do either, or both at the same time. Whatever structure or formula you want to use, Bayesian probability or something else, it's up to you. The chainers and the pattern tools are meant to be generic.



--linas



Ivan Vodišek

May 9, 2017, 7:19:45 PM
to ope...@googlegroups.com
I think it was a very good decision not to be tied to a specific logic system, but to go a level further: to develop Atomese, which can describe any system. I just don't get why all the OpenCog add-ons (like MOSES) weren't developed in Atomese. Why do you still have to use C, Scheme, Python and possibly other languages? In my opinion, a universal rule-rewriting system such as Atomese, powered by chainers (did I get that right?), should be as complete as lambda calculus, and thus capable of describing any conceivable algorithm.



Linas Vepstas

May 9, 2017, 8:54:11 PM
to opencog
Hi Ivan,

On Tue, May 9, 2017 at 6:19 PM, Ivan Vodišek <ivan....@gmail.com> wrote:
I think it was a very good decision not to be tied to a specific logic system, but to go a level further: to develop Atomese, which can describe any system. I just don't get why all the OpenCog add-ons (like MOSES) weren't developed in Atomese.

History. The moses vertex was supposed to be the same thing as the atom; that was always the intent, but no one really knew how to design an atom correctly. It still doesn't quite feel right; it still seems kind-of heavy, clunky, somehow.

Anyway, given the pressure of writing code and getting good performance out of it... design decisions get made, and the moses vertex-atom was created before the opencog-atom was fully designed.
 
Why do you still have to use C, Scheme, Python and possibly other languages?

Because atomese is a horrible language for humans. It's not meant for humans -- it's really like an IL; it's meant to hold data structures that algorithms can mutate into various forms.

It would be cool if we had a compiler for it, but that does not yet seem urgent, and might be premature; it's still not clear that everything got done right.
 
In my opinion, a universal rule-rewriting system such as Atomese, powered by chainers (did I get that right?), should be as complete as lambda calculus, and thus capable of describing any conceivable algorithm.

Uh, yes, there is an explicit lambda atom in atomese, and also an explicit beta-reduction atom, and the evaluator can evaluate them, so you can certainly map all of lambda calculus onto it. You can even do a typed, probabilistic lambda calculus; atomese has a fairly rich type system.

 http://wiki.opencog.org/w/LambdaLink_and_ScopeLink
http://wiki.opencog.org/w/PutLink

http://wiki.opencog.org/w/Cog-evaluate!
http://wiki.opencog.org/w/Cog-execute!
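What a lambda atom plus beta reduction amounts to can be sketched in ordinary Python, with lambda terms represented as plain data (this is a generic lambda-calculus toy, not the actual LambdaLink/PutLink API, and the substitution is naive, with no variable-capture avoidance):

```python
# Term shapes: ("lam", var, body) | ("app", f, x) | str (a variable name).

def substitute(body, var, value):
    # Naive substitution: replace every occurrence of `var` in `body`.
    if body == var:
        return value
    if isinstance(body, tuple):
        return tuple(substitute(t, var, value) for t in body)
    return body

def beta_reduce(term):
    # One beta step: (app (lam v body) arg) -> body[v := arg].
    if isinstance(term, tuple) and term[0] == "app":
        f, arg = term[1], term[2]
        if isinstance(f, tuple) and f[0] == "lam":
            _, var, body = f
            return substitute(body, var, arg)
    return term

# (λx. (x x)) applied to y  -->  (y y)
print(beta_reduce(("app", ("lam", "x", ("app", "x", "x")), "y")))
# ('app', 'y', 'y')
```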


I mean, I guess you could layer lisp or scheme on top of atomese, but it would be painfully slow with the current evaluator, and would be quite bloated, since atoms are fat. Atoms get indexed in the atomspace, and that ends up being costly in both CPU and RAM.

We index them because we want to use the atomspace for KR... which conflicts with using them only for performing calculations. It's a balancing act; it's hard to figure out where the imbalance is.

--linas

Ivan Vodišek

May 10, 2017, 2:31:22 AM
to ope...@googlegroups.com
Linas, thanks for the thoughtful explanations :-)

Andi

May 10, 2017, 8:46:34 AM
to opencog, linasv...@gmail.com
Linas, thank you for your precise and profound explanations!


I mean, I guess you could layer lisp or scheme on top of atomese, but it would be painfully slow with the current evaluator, and would be quite bloated, since atoms are fat. Atoms get indexed in the atomspace, and that ends up being costly in both CPU and RAM.

We index them because we want to use the atomspace for KR... which conflicts with using them only for performing calculations. It's a balancing act; it's hard to figure out where the imbalance is.
 
As far as I understand what is going on here at OpenCog, an Atom is the most universal thing in the universe, able to represent "all that is the case", as Witti would say.

Universality is always in contradiction with performance. One cannot balance this.
I think a step toward overcoming this is to compile certain types of atoms at run time into something optimized for performance, and then recompile the results back to regular atoms.

Humans do this -- why shouldn't an AGI do the same?

Maybe especially in your main topic, link grammar.
Somewhere I read your complaints about how slow it became when you ported it to the atomspace.

My thought about this was that there should be a possibility to transform a given text corpus into a list of integers, where every int represents a word or sign, operate on this list, and bring the results back to the atomspace.

Humans do this. Remember a situation in which you tried to understand something new. You read about it, you draw diagrams, etc., until you have the feeling that you have completely understood it.

In the case of words: if it is possible to transform a text corpus into 32-bit ints or even 16-bit ints, then it, or chunks of it, could be put into the CPU cache and be worked on very fast.

My feeling is that this could close the gap between performance and universality: agents that transform formats at run time, and specialized code for special tasks; in many cases even GPUs could be involved.

NL-comprehension is crucial for AGI. You're going to have to face that - no escape ;) 

An AGI needs narrow AI-agents. A human genius is often someone who is able to perform narrow AI tasks.

--Andi


 

Linas Vepstas

May 10, 2017, 1:06:45 PM
to Andi, opencog
On Wed, May 10, 2017 at 7:46 AM, Andi <gabil...@gmail.com> wrote:
Linas, thank you for your precise and profound explanations!

You are welcome! The more who understand this stuff, the better!
 
As far as I understand, what is going on here at OpenCog, an Atom is the most universal thing in the universe - able to represent  "all that is the case" - how Witti would say.

Yeah, I'm not sure where that name comes from. Opencog stole it from textbooks on logic; where it was before that, I don't know. It might date to Whitehead and Hilbert.

Universality is always in contradiction with performance. One cannot balance this.
I think a step toward overcoming this is to compile certain types of atoms at run time into something optimized for performance, and then recompile the results back to regular atoms.

Well, we do: some atoms have C++ counterparts. The most complicated of these is the PatternLink, which stores a pre-compiled copy of the pattern that it searches for. That way, when you call it, all the machinery is there, warm and ready to go.
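The "pre-compile once, query warm" idea shows up in many systems. Here is a minimal Python analogy (a regex standing in for a pattern atom; this is an illustration of the design choice, not how PatternLink actually works): all the setup cost is paid at construction time, so repeated queries skip it.

```python
import re

class CompiledPattern:
    def __init__(self, pattern):
        # All the machinery is built up front, once.
        self._rx = re.compile(pattern)

    def matches(self, text):
        # Repeated calls reuse the warm, pre-compiled machinery.
        return self._rx.search(text) is not None

p = CompiledPattern(r"\(Inheritance \w+ \w+\)")
print(p.matches("(Inheritance cat animal)"))  # True
print(p.matches("(Similarity cat dog)"))      # False
```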

> Maybe especially at your main topic - link grammar.
> Somewhere I read your complaints, how slow it became when you ported it to the atomspace.

> My thoughts about this was that there should be a possibility to transform a
> given  text corpus to a list of integers, where every int represents a word or
> sign, operate on this list and bring back the results to the atom space. 

Heh. You are on a slippery slope here. **Everything** inside a computer is a "list of integers". The question is always "which list of integers should it be".

-- Linas



Andi

May 10, 2017, 1:46:37 PM
to opencog, gabil...@gmail.com, linasv...@gmail.com
Come on, Linas, it seems that you don't understand me on purpose :)

You make a Python-like dictionary for every single symbol in your text; one symbol (word, sign, space, etc.) is represented by one int. If there are fewer than 64k different symbols (words), which will be true for most books, you can take 16 bits. With this you can put a medium-sized book of maybe 400 pages directly into the CPU cache and do your operations very quickly...
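The scheme sketched above is just a few lines of Python: build the dictionary on the fly, and store the token ids as packed unsigned 16-bit ints (this works as long as the corpus has fewer than 64k distinct symbols):

```python
import array

def encode(tokens):
    # Dictionary built on the run: first occurrence of a symbol gets
    # the next free id; the corpus becomes a compact 16-bit int array.
    vocab = {}
    ids = array.array("H")            # "H": unsigned 16-bit integers
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)   # valid while len(vocab) < 65536
        ids.append(vocab[tok])
    return vocab, ids

vocab, ids = encode("the cat sat on the mat".split())
print(list(ids))   # [0, 1, 2, 3, 0, 4]
print(len(vocab))  # 5
```

At two bytes per token, a 400-page book (say, a few hundred thousand tokens) indeed fits in well under a megabyte.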

Building such a dictionary should be very quick on the run and should speed up everything, and something like a tree of distances would become handy.

knowing that this is your special domain....

But this is what I am thinking about it today :)

--Andi

Alex

May 10, 2017, 3:20:22 PM
to opencog
Well, the idea was mentioned here that the Atomspace/Atomese can be perceived as a logical framework, a meta-logic in which every other kind of logic can be encoded. But logic is deductive reasoning, and deductive reasoning is only one type of reasoning, besides inductive, abductive and analogical reasoning. So maybe not just a meta-logic but even more: a meta-reasoning system...

But I was taken aback by the fact that PLN (and NARS) is currently based on Aristotelian term logic (just the monadic fragment of first-order predicate logic), as I understand it. I believe that PLN can be extended to full predicate logic or even to more general logics, but for my present endeavours I have decided to use the Atomspace and Atomspace Scheme for encoding the entire knowledge base (facts, rules), while for specific reasoning tasks I will try to call an external reasoning system.

For example, legal reasoning (of which I am a fan) consists of commonsense reasoning, which can be done in the Atomspace, but also - in special cases - of formal inference that should be done in specific logics (e.g. deontic defeasible logics). In this formal case I will formulate the subtask in terms of the Atomspace and forward it to an external reasoning engine; that engine will do the reasoning and return its results to the Atomspace, and I will be able to continue my inference in the Atomspace with some strongly and provably deduced facts. How does this sound? Is it possible to call external functions from the Atomspace? E.g. can an Atomspace BindLink call an external program in its rule head?

For the external reasoning system I have chosen the Coq proof assistant (which I am currently learning), which is capable of acting as a meta-logic framework and in which it is possible to encode a wealth of logics (e.g. http://www.cs.nuim.ie/~jpower/Research/LinearLogic/ Working with linear logic in Coq). Well - there are lots of meta-logic systems - including the MMT project (Latin) https://kwarc.info/people/frabe/Research/rabe_habil_14.pdf and also other proof assistants - but there are so many good references about Coq that I am sticking with it.

There is a long way to go before the Atomspace can be used as a strong meta-logic framework - a provable and formal syntax and semantics should be defined for the Atomspace parts (maybe a subset of the Atomese atom and link types) that will be used in a formal/rigorous reasoning engine. E.g. the Atomspace should contain some atoms and links that can act like the CIC (Calculus of Inductive Constructions) language in Coq, in which a wealth of other logics and programming languages can be encoded. I believe that such a formalization of a subspace of the Atomspace can be done, but to achieve results here and now I am currently going with the combination of Atomspace and Coq.

Linas Vepstas

May 10, 2017, 5:34:46 PM
to Andi, opencog
Hi Andi, I'm not sure how to respond to this. Every word is already an int --  a 32-bit int, which happens to be a pointer to the string of letters in the word. Maybe you could squeeze this down to 16 bits, but what's the point? 

The association of integers to "things" is called an "index", and there are many kinds of indexes: vectors (arrays), rb-trees and hash tables being the most popular.  Pretty much all software that does almost anything at all is packed to the gills with indexes of every kind. It's pretty fundamental to the definition of what computing is all about.

Simply having an index of words is not enough to do any kind of textual analysis at all. Typically, you need to know how often a word occurs, how often it occurs next to other words, whether it occurs more frequently on one page than another. You've got to compute all this information, and more, store it somewhere too, and apply god-knows-what algorithms to it: LSA or MI or word2vec or whatever.  Replacing a 32-bit pointer to a word string with a 16-bit int does pretty much precisely nothing to simplify the complexity of the data analysis problem.
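To make the point concrete, here is a rough Python sketch (illustrative only) of the kind of counting the index alone doesn't give you: unigram counts, adjacent-pair counts, and a pointwise mutual information score built from them.

```python
import math
from collections import Counter

def cooccurrence_stats(tokens):
    """Count individual words and adjacent word pairs in a token stream."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def pmi(pair, unigrams, bigrams, n):
    """Pointwise mutual information of an adjacent pair, in bits."""
    w1, w2 = pair
    p_pair = bigrams[pair] / (n - 1)   # probability of the adjacent pair
    p_w1 = unigrams[w1] / n
    p_w2 = unigrams[w2] / n
    return math.log2(p_pair / (p_w1 * p_w2))

tokens = "a b a b a c".split()
uni, bi = cooccurrence_stats(tokens)
score = pmi(("a", "b"), uni, bi, len(tokens))
```

None of this arithmetic gets any cheaper if the tokens are 16-bit ints instead of 32-bit pointers — the counting and storage dominate.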

Seriously, think about it. Google invented map-reduce to solve their data analysis problems. Apache Foundation shepherds along hadoop and tinkerpop and cassandra to deal with the indexing problem.  Text analysis and big data are just huge parts of the economy these days.  Quite infamously, Cambridge Analytica used text analysis to help get Trump elected. We are not living in the 1960's.

--linas

Linas Vepstas

May 10, 2017, 6:16:51 PM
to opencog
Hi Alex,

I'll try to be brief.

On Wed, May 10, 2017 at 2:20 PM, Alex <alexand...@gmail.com> wrote:
Well - here was mentioned idea that Atomspace/Atomese can be perceived like logical framework, like meta-logic in which every other kind of logic can be encoded. Well - logic is deductive reasoning and deductive reasoning is only one type of reasoning besides inductive, abductive and analogical reasoning. So - maybe not just meta-logic but even more - meta-reasoning system...

Reasoning, in any logical framework, can be accomplished by rule application. These rules are often called "axioms", depending on the textbook.

As far as I know, there is little or no scholarly work on "meta-reasoning" or "meta-logic", and the reason is that the academic literature uses a different set of words to talk about this. This includes "proof theory" and the related topics of topos theory, sheaf theory and type theory. A fun recent read on the topic is ludics, from Jean-Yves Girard.  These are the things that are foundational to "meta-reasoning".

But I was taken aback by the fact, that PLN (and NARS) currently is based on Aristotelian term logic (just monadic fragment of the first order predicate logic),
as I understand. I believe that PLN can be extended to full predicate logic or even to more general logics,

I think you misunderstand. The word "term" being used here is not in the sense of "term logic", but in the sense of a "term" in logic, as in "term algebra". Term algebras underlie things like "model theory", which in turn underlies logics of various sorts.

https://en.wikipedia.org/wiki/Term_algebra

The "atoms" in opencog are the same thing as the "atoms" in that wikipedia article.  So, for example, these articles describe opencog atoms:

https://en.wikipedia.org/wiki/Atomic_formula
https://en.wikipedia.org/wiki/Atomic_sentence

Opencog truth values are interpretations of structures, which are defined here:

https://en.wikipedia.org/wiki/Structure_(mathematical_logic)
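For what it's worth, the nested-term picture those articles describe is easy to mock up. The following is just an illustration of "terms as trees" in Python — not the actual Atomspace API — with variables marked by a `$` prefix, an assumed convention for this sketch:

```python
# A term is either a variable (a string starting with '$'), a plain
# constant (any other string), or a (functor, arg, arg, ...) tuple.

def is_variable(t):
    return isinstance(t, str) and t.startswith("$")

def is_ground(t):
    """A term is ground (variable-free) if no variable occurs in it."""
    if is_variable(t):
        return False
    if isinstance(t, tuple):
        functor, *args = t
        return all(is_ground(a) for a in args)
    return True  # plain constant

# ("Inheritance", "cat", "animal") plays the role of an atomic, ground
# formula; ("Inheritance", "$x", "animal") contains a variable.
```

Ground atomic formulas are exactly the things an interpretation assigns truth values to, which is where the "structures" article above picks up.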

For example, legal reasoning (of which I am fan) consists from the commonsense reasoning which can be done in Atomspace but also it consists - in special cases - from formal inference that should be done in specific logics (e.g. deontic defeasible logics).

Sure you can do that. the atomspace doesn't care which logic you care to use.  Just plug in the rules (axioms) that define deontic logic, add in some data, turn on the chainers, and off you go.
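As a toy illustration of "plug in the rules and turn on the chainer" — a naive forward chainer over crisp, propositional Horn-style rules, nothing like the real opencog chainer, with a made-up deontic-flavoured example:

```python
def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                changed = True
    return facts

# Toy axiom: an obligation propagates along a known implication.
rules = [
    ({"obligatory(pay_tax)", "implies(pay_tax, file_return)"},
     "obligatory(file_return)"),
]
derived = forward_chain(
    {"obligatory(pay_tax)", "implies(pay_tax, file_return)"}, rules)
```

The point of the analogy: the chainer itself is logic-agnostic; everything specific to deontic (or any other) logic lives in the rule set you feed it.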


Is it possible to call external functions from the Atomspace? E.g. can Atomspace BindLink call external program in its rule head?

Yes. RTFM

For the external reasoning system I have choosen Coq proof asistant (which I am currently learning) which is capable of acting as meta-logic framework and in which it is possible to encode wealth of logics

Yes, that's fine. you should learn coq. It will help you understand what some parts of opencog are like.

Note, however, that the "interpretations of structures" that coq assigns are all crisp true/false boolean values.  In the atomspace, and in the chainers, we allow arbitrary groupings of floating-point numbers.  This allows us to model bayesian probabilities, among other things, which, to the best of my understanding, coq cannot do.  Unfortunately, this means that the algorithms that underlie coq, such as DPLL and SAT solvers, simply cannot be used for inference control in the opencog chainers.
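To see why crisp solvers don't apply, here is a simplified, independence-assuming deduction formula in the spirit of PLN (a sketch for illustration, not the exact formula opencog implements): the strength of A→C is computed from the strengths of A→B, B→C and the node probabilities, and it is a float, not a boolean.

```python
def deduction_strength(s_ab, s_bc, s_b, s_c):
    """Strength of A->C from A->B and B->C, assuming independence.
    A SAT solver has no analogue of this: truth here is a float in [0,1]."""
    if s_b >= 1.0:
        return s_c  # degenerate case: B is always true
    return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)

s = deduction_strength(s_ab=0.9, s_bc=0.8, s_b=0.5, s_c=0.6)  # -> 0.76
```

With crisp inputs (all strengths 0 or 1) this degenerates to ordinary modus-ponens-style chaining, which is the only case DPLL-style solvers handle.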
 
 
> There is long way to go before Atomspace can be used as strong meta-logic framework

You only say that because you don't understand what the atomspace is.

-- linas

Dmitry Ponyatov

May 13, 2017, 12:06:22 AM
to opencog
Impressive paper on hypergraph software representation:
http://kobrix.com/documents/rse.pdf
[Borislav Iordanov] Rapid Software Evolution
