Speculation/musing -- possible future Common-LISP / CLASP wrapping for OpenCog and related tools


Ben Goertzel

Jun 11, 2017, 12:11:57 AM
to opencog
!!! Caveat: This post presents some speculative suggestions, not to be
interpreted

as a definitive plan or mandate for work-to-be-done or anything like

that; just as an interesting-looking direction for investigation


TL;DR — particularly for use of OpenCog in bioinformatics and other

scientific-computing applications — but also for other applications like

corpus linguistics where one deals with large datasets and wants to

combine OpenCog with other command-line tools — it might be a good

idea to replace or supplement our current Guile shell with a wrapping-up

of OpenCog in Common Lisp … and in particular in the CLASP Common

Lisp framework that Christian Schafmeister has developed. This would

also have some other smaller side-benefits like letting us exploit the CL

bindings for Jupyter Notebooks which would be cool for OpenCog

tutorials; and making it easy to generate LISP bindings for ROS, thus

avoiding the need to deal with ros-py for OpenCog robotics applications…


Basic reasons are: CLASP is efficient at handling large datasets and

piping them from one place to another. It can also automatically

generate bindings for C++ code, which could be used to auto-generate

LISP bindings for ROS….





So I met Christian Schafmeister at an AI-nanotech workshop in Palo

Alto late last month, and I became aware of this really cool LISP

environment for scientific computing that he's been developing...


https://github.com/drmeister/clasp


https://github.com/drmeister/cando


https://drmeister.wordpress.com/


He has some strong arguments as to why this is a better way than R or

python to get scientific computing done on large datasets….


One thing he found is that in many of his computational chemistry

analysis scripts, the bulk of compute time was getting taken up

passing around datasets and results between different C++ programs,

in R or python or whatever other glue language was being used…


Via using Common LISP compiled into LLVM, he found this could be

worked around, because one could then script stuff in CL but have

efficient garbage collection done via LLVM …


My speculative line of thinking now is that it might be interesting to


-- supplement or replace Guile with Clasp as a shell for working with OpenCog


-- Integrate C++ bio-analytics tools into Clasp, in a similar way to

what Schafmeister has been doing for chem-analytics tools


-- Integrate R bio-analytics tools into Clasp as well, using the following

LLVM compiler for R


https://github.com/duncantl/RLLVMCompile


https://arxiv.org/abs/1409.3144


(although I note this compiler appears to still need a bit of work to be fully

generally usable…)


-- Use the Clasp tools for automatically generating and updating LISP bindings

for C++ code, to auto-generate ROS bindings for CL


-- Use the Jupyter Notebooks wrapping for CL to make Jupyter tutorials

for OpenCog


(although, as a side note, integrating Guile with Jupyter Notebooks

would also not be impossible ... it would just be a bunch of work,

like any of this…)





I note, one can auto-convert Guile to Common Lisp using existing available

scripts, though obviously this will require some hand-checking afterwards...

For OpenCog uses of Guile that are basically just “Guile wrapping Atomese”,

auto-conversion to CL would likely work fine. For cases where there is

more actual programming done in Guile, more hand-adjustment of conversions

to CL would probably be necessary..


One could also emulate what Schafmeister did for C++, and auto-generate CL

bindings for R packages. Although much of the time, the R packages are just

wrapping C++ functions; so in those cases, it might sometimes be better just

to bypass R and go straight from the underlying C++ functions to CL …


Another possibility would be to develop R-like syntax in a domain-specific

language within CL … much as we’re now developing ChatScript-like

syntax within Guile for authoring content for the Hanson robots…





Anyway it is not yet clear that the above would be a great idea, and obviously

it would be a lot of work to do…. However I think that, insofar as we can find

time to consider new development directions, this is worth thinking about…


— Ben


--
Ben Goertzel, PhD
http://goertzel.org

"I am God! I am nothing, I'm play, I am freedom, I am life. I am the
boundary, I am the peak." -- Alexander Scriabin

AmeBel

Jun 11, 2017, 11:03:17 PM
to opencog

Ben Goertzel

Jun 12, 2017, 2:08:54 AM
to opencog
nice!

--
You received this message because you are subscribed to the Google Groups "opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opencog+unsubscribe@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit https://groups.google.com/d/msgid/opencog/5b2ee105-32bb-4a1f-b284-e3c15c555fc3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Linas Vepstas

Jun 17, 2017, 10:48:59 PM
to opencog
Hi Ben,

I like the general idea, but many particulars I don't like.  Yes, we need something like this, but maybe there's an alternate route.

TL;DR:  Maybe we should pick the science tools first, and then ask which languages we should target.

You met a really great salesman who convinced you that his CL product is the best, but I think you need to step back and get some bearings:

-- why is his CL better than bigger, established CL projects?
-- why is some other CL project better than some other scheme project?
-- why not consider scala or haskell?
-- are there science libraries for scala or haskell, or some other scheme or lisp?
-- maybe it's easier to fix cython, and just use scipy with cython?  assuming scipy is adequate for the science needs?

The "autogenerating c++ bindings" is total sales bullhockey.  Recall that cython, guile, swig, and two dozen other things were invented to "autogenerate C++ bindings".  Even haskell has this ability, and you've seen how it took Roman just one afternoon to do this!

--linas


On Sat, Jun 10, 2017 at 11:11 PM, Ben Goertzel <b...@goertzel.org> wrote:


TL;DR — particularly for use of OpenCog in bioinformatics and other

scientific-computing applications — but also for other applications like

corpus linguistics where one deals with large datasets and wants to

combine OpenCog with other command-line tools —

"with other sparse-data, big-data and graph processing tools" would be more appropriate.
 
it might be a good

idea to replace or supplement our current Guile shell with a wrapping-up

of OpenCog in Common Lisp …

Here and elsewhere, you confuse scheme, guile and  lisp.  So

1) scheme is a dialect of lisp, so is common lisp.

2) guile is a specific implementation of scheme that allows scheme to be used from c++.  There are very few implementations of scheme or lisp that allow this to take place.

3) Creating language bindings for opencog is a lot of work. Creating common lisp bindings would be roughly as much work as it is for guile, python or haskell.
 
and in particular in the CLASP Common

Lisp framework that Christian Schafmeister has developed. 

Personally, I am nervous about one-man shows.  The one man has a historical tendency to get bored and move on to something else, and then everything stops and falls apart.
 
 This would

also have some other smaller side-benefits like letting us exploit the CL

bindings for Jupyter Notebooks which would be cool for OpenCog

tutorials;

I played with Jupyter a fair amount, hoping to use it as a personal diary of the data analysis I do in opencog.  Upshot: it's way cool, and very immature.  It struggles to do even basic stuff, and you've got to load and configure all sorts of add-ons to get it to behave properly. So it's a cool idea, but still very immature and half-baked.
 
and making it easy to generate LISP bindings for ROS,

those already exist, but we fall into the age-old trap: exactly which variant of common lisp do you propose?  I count seven major, 20 minor variants:

http://www.cliki.net/Common+Lisp+Implementation

it's like picking racket-scheme over guile-scheme over mzscheme over chicken-scheme.

the point behind scheme is that it's "modern", fixing many of the "bugs" in lisp.

 
thus

avoiding the need to deal with ros-py for OpenCog robotics applications…


Basic reasons are: CLASP is efficient at handling large datasets and

piping them from one place to another.   It can also automatically

generate bindings for C++ code, which could be used to auto-generate

LISP bindings for ROS….

Yeah, like total bullshit.  Everyone has something called "FFI" (foreign function interface) and everyone always says that FFI makes it trivial to interface their programming language to c++ code.  Which is half-way correct, for small and simple C/C++ libraries.

This falls apart for something more complex, which is why things like SWIG and Cython  and Guile get invented.   There's got to be a dozen-or-two more of these "automatically generate bindings for c++ code" tools out there.  They all work sort-of-ish OK.  Up to a point, and then hell breaks loose.

Opencog is clearly on the other side of that: otherwise Roman would have finished the haskell bindings in an afternoon, just by autogenerating them with FFI.

"It's easy" is the program manager's swan song. What it really means is "I see a way of doing something"




He has some strong arguments as to why this is a better way than R or

python to get scientific computing done on large datasets….
 
That is probably true. I don't get the impression that R or scipy are ready for large datasets. 


One thing he found is that in many of his computational chemistry

analysis scripts, the bulk of compute time was getting taken up

passing around datasets and results between different C++ programs,

in R or python or whatever other glue language was being used…

That must surely be very true.

On the other hand, the atomspace is a kind-of database for holding datasets, so what is really needed is a way to have these tools talk directly to the data that is already there, in the atomspace.  The point is: bring the tools to where the data is, don't try to cart the data around between the tools.


Via using Common LISP compiled into LLVM, he found this could be

worked around, because one could then script stuff in CL but have

efficient garbage collection done via LLVM …

one can script stuff in scheme, and have efficient garbage collection with bdwgc ... oh wait, we already have that.  Don't let curtis confuse you: if you do garbage collection once every sentence, then, yes, garbage collection will take up a large fraction of the time.  Don't do garbage collection on every sentence, and the problem goes away.
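
A minimal sketch of that scheduling point, in Python rather than Guile. Python's `gc` module is only a stand-in for bdwgc here, and `process_corpus` and its `collect_every` parameter are hypothetical names invented for illustration. The sketch just counts how many full collections get forced for the same amount of work under the two policies.

```python
import gc


def process_corpus(sentences, collect_every):
    """Process sentences, forcing a full collection every `collect_every` items.

    Returns the number of forced collections. The per-sentence work is a
    placeholder, not real parsing.
    """
    forced = 0
    for i, sentence in enumerate(sentences, start=1):
        tokens = sentence.split()  # stand-in for the real per-sentence work
        del tokens
        if i % collect_every == 0:
            gc.collect()  # explicit full collection
            forced += 1
    return forced


sentences = ["the cat sat on the mat"] * 200

# Policy 1: collect after every sentence.
per_sentence = process_corpus(sentences, collect_every=1)

# Policy 2: collect once per batch of 50 sentences.
batched = process_corpus(sentences, collect_every=50)

print(per_sentence, batched)
```

The work done per sentence is identical in both runs; only the number of forced collections differs, which is the whole of the argument above.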


My speculative line of thinking now is that it might be interesting to


-- supplement or replace Guile with Clasp as a shell for working with OpenCog

Non-starter, for the above reasons. First, it will take just as much or more time than haskell, cython or guile took, and clasp is far more obscure, and has a far smaller user base.


-- Integrate C++ bio-analytics tools into Clasp,

Or pick some other, more popular variant of lisp or scheme.
 
in a similar way to

what Schafmeister has been doing for chem-analytics tools


— Integrate R bio-analytics tools into Clasp as well, using the following

LLVM compiler for R

Or integrate lisp into java, like clojure, or integrate Groovy or Scala or Jython or ....

Besides Haskell, I think scala is the other interesting one to consider.


— Use the Jupyter Notebooks wrapping for CL to make Jupyter tutorials

for OpenCog


Jupyter + scheme would be the way to go. A lot easier, I'm guessing maybe 50 times less work.
 


(although, as a side note, integrating Guile with Jupyter Notebooks

would also not be impossible ... it would just be a bunch of work,

like any of this…)

It would be like several orders of magnitude less work.  Like maybe weeks, instead of half-year++ or year++


That's it. I think that basically, you met a really great salesman who convinced you that his CL product is the best, but I think you need to step back and get some bearings:

-- why is his CL better than bigger, established CL projects?
-- why is some other CL project better than some other scheme project?
-- why not consider scala or haskell?
-- are there science libraries for scala or haskell, or some other scheme?
-- maybe it's easier to fix cython, and just use scipy with cython?  assuming scipy is adequate for the science needs?

Are we putting the cart before the horse? Maybe we should pick the science tools first, and then ask which languages we should target?


 


Ben Goertzel

Jun 17, 2017, 11:12:06 PM
to opencog
Hi Linas, thanks for the detailed thoughts and responses... I will
write an equally careful reply when I get time; or else I will return
to all this in detail when/if resources are obtained to pursue these
sorts of directions... (which I hope will be soon...)

About Jupyter Notebooks, I note that Google is now using it to supply
new Tensorflow tutorials... I do think it would be great if we could
get our OpenCog tutorials working with Jupyter. Perhaps making
Jupyter support Guile is indeed the easiest way... it seems this
"shouldn't be hard" ... I agree Jupyter is not perfect, but it's a
lot nicer than embedding code in the wiki pages like we are now,
tutorial-wise...

I agree that if we wanted to go with CL, we would want to test
Christian's CL approach thoroughly versus others before making any
sort of decision to undertake a lot of work...

About using OpenCog in a data analysis context -- so for applications
like bio-data-analysis, or analyzing network data for a networking
company, it's clear there are non-OpenCog tools that will be used for
data preprocessing, for results evaluation, etc. Putting all these
other tools in Atomspace/OpenCog isn't viable. There are going to be
scripts that pass data thru various other things, then through
OpenCog/Atomspace, then through various other things.... For the
next 5 years at least, probably more.... So finding a great way to
support this kind of workflow is going to be important. Maybe the
Guile shell is a great way to support this kind of workflow.

In the bioinformatics case it happens that the preponderance of needed
tools are coded in R, not sci-py ... we need e.g. the 100 or so
scripts in bioconductor (an R package). It is true that the heavy
lifting behind R packages is usually done in C++, but we need more
than just the heavy lifting, we need the bio-specific data-munging
code that is in the R code also.... So that is part of what I'm
wrestling with. How to build a framework that makes sense in an
OpenCog-universe context, and also makes sense in terms of the
existing ecosystem of bio-AI/informatics tools that we need OpenCog to
be used together with...

Schafmeister is using a whole other set of tools for his
nanotech/cheminformatics work... C++ and FORTRAN not R.... But
genomics is in R these days...

-- Ben



Linas Vepstas

Jun 17, 2017, 11:43:27 PM
to opencog, Nala Ginrut

1) jupyter is probably great for tutorials, and is nice for collaboration, but is immature and inadequate if you want to use it as an actual diary for actual results (which is my personal use-case).  So sure, install it now on the opencog website, and have at it.  It's clearly some kind of wave of some kind of future.  I doubt that sticking guile in it would be hard. Maybe we should ask Nala Ginrut about this.

2)  I most emphatically did NOT say "put other tools inside of opencog". I said the exact opposite.  Our data is in opencog. Put a pipeline between where the data is: in opencog, and these other tools. 

3) the "guile shell" is wayyyyy too low-level for a data API. Using scheme and using atomese are like programming in assembly code.  Don't do that.  Instead I want to say things like "I have a sparse matrix, please do PCA or SVD or factor analysis or blah blah algo on my sparse matrix."  I am willing to write a small shim (in guile or in c++ or maybe even ... shiver... python) that translates between their sparse-matrix format, and mine (which are EvaluationLinks).  The key is that this shim must be small, simple, easy-to-write --- an afternoon, a few days, a week at most, or it's just not worth it.

4) I have no clue at all on how R accesses data. Can't you write some small shim, that allows R to reach into opencog to get the data it needs?  How hard can this be?  Surely lots of people do something like this, right?

The point is that you are NOT creating a programming language API for opencog, which is hard to do. The point is that you are creating a pipeline to move data to and from opencog, which is a lot easier, orders of magnitude easier.  And is much closer to what you want, in the end, anyway.
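
As a rough illustration of how small such a shim could be, here is a sketch in Python. It assumes the pair data has already been pulled out of the atomspace into a plain dict keyed by `(row_name, col_name)`; that dict, and the function names `pairs_to_coo` and `coo_to_pairs`, are hypothetical and not part of any OpenCog API. The target is the common coordinate-list (COO) layout that many sparse-matrix tools accept: three parallel lists of row indices, column indices and values.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pairs_to_coo(pairs: Dict[Pair, float]):
    """Convert {(row_name, col_name): value} into COO-style parallel lists.

    Returns (rows, cols, vals, row_names, col_names), where row_names[i]
    is the name that was assigned integer row index i (same for columns).
    """
    row_index: Dict[str, int] = {}
    col_index: Dict[str, int] = {}
    rows: List[int] = []
    cols: List[int] = []
    vals: List[float] = []
    for (r, c), v in pairs.items():
        # Assign the next free integer index the first time a name is seen.
        rows.append(row_index.setdefault(r, len(row_index)))
        cols.append(col_index.setdefault(c, len(col_index)))
        vals.append(v)
    return rows, cols, vals, list(row_index), list(col_index)


def coo_to_pairs(rows, cols, vals, row_names, col_names) -> Dict[Pair, float]:
    """Inverse of pairs_to_coo: map integer indices back to names."""
    return {(row_names[r], col_names[c]): v for r, c, v in zip(rows, cols, vals)}


# Toy data standing in for values read off a set of pair links.
pairs = {("the", "cat"): 2.0, ("the", "dog"): 1.0, ("a", "cat"): 3.0}
rows, cols, vals, row_names, col_names = pairs_to_coo(pairs)
round_trip = coo_to_pairs(rows, cols, vals, row_names, col_names)
```

The name-to-index tables are the only state the shim has to carry, so results computed on the integer-indexed matrix can be mapped back to the original names afterwards.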

--linas





Ben Goertzel

Jun 17, 2017, 11:52:09 PM
to opencog, Nala Ginrut
On Sun, Jun 18, 2017 at 11:43 AM, Linas Vepstas <linasv...@gmail.com> wrote:
> 4) I have no clue at all on how R accesses data. Can't you write some small
> shim, that allows R to reach into opencog to get the data it needs? How
> hard can this be? Surely lots of people do something like this, right?


We'll see...

We have an R wrapper for MOSES now which is being used to apply MOSES
to genomic data...

Now Hedra and Mike are experimenting with PLN backward chaining
inference on bio-data in the Atomspace (building on stuff Eddie and
Misgana did way back w/ the forward chainer)... we will want to make
some good way to access this from R, but need to figure out what this
will be....

Linas Vepstas

Jun 17, 2017, 11:57:04 PM
to opencog, Michael Duncan
I mean for R -- seriously, this cannot possibly be hard.  I used R for a few days, almost a week, and got the impression it was mostly vectors and arrays.   I'm pretty sure that in R, there is a C/C++ wrapper, where you can create a "special" R vector, or R array, and whenever you access item k in the vector, or item i,j,k in the array, that wrapper calls your C++ code, and the only thing your C++ code has to do is to return a float or an int or a string.

For us, under the covers that C++ code just gets or sets some truth-value (or other value) on some pile of atoms in the atomspace.  This can't possibly be more than a few days worth of work.  and bingo: you can now move data between R and the atomspace.

I'm really pretty sure I saw this kind of API in R, somewhere.

--linas

Linas Vepstas

Jun 18, 2017, 12:08:06 AM
to opencog
I quote:

Overview

The Rcpp package provides C++ classes that greatly facilitate interfacing C or C++ code in R packages using the .Call() interface provided by R. Rcpp provides matching C++ classes for a large number of basic R data types. Hence, a package author can keep his data in normal R data structures without having to worry about translation or transferring to C++. At the same time, the data structures can be accessed as easily at the C++ level, and used in the normal manner. The mapping of data types works in both directions. It is as straightforward to pass data from R to C++, as it is to return data from C++ to R. The following two sections list supported data types.

That describes, to me, an almost nearly perfectly ideal API between opencog and R.

In our case, (my case) I have a 2D matrix, which is an (evaluationlink (predicate "foo") (List x,y)) and the x,y are the coordinates of the matrix.  If R asks for x,y, and I have this eval link, then I return the TV on it. If I don't have this pair, I return zero.  And vice versa: if R asks me to store a value for x,y, then just create BlahNode X and blortNode Y and stuff them into the evaluation link, store the number in the TV or other value, and you're done. Generic atomspace values work now; they are vectors of floats, strings or other values.
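
The get/set behavior described above can be sketched as follows. This is Python, not Rcpp, and it does not touch the real atomspace: the class `PairMatrix` is a hypothetical stand-in in which a plain dict keyed by `(predicate, x, y)` plays the role of the evaluation links and their values. It only shows the contract: a missing pair reads as zero, and a store creates the entry.

```python
class PairMatrix:
    """Toy 2D matrix whose cells live in a key/value store.

    The dict below is an assumption standing in for the atomspace; each key
    (predicate, x, y) mimics one evaluation link over the pair (x, y).
    """

    def __init__(self, predicate):
        self.predicate = predicate
        self._values = {}

    def _key(self, x, y):
        return (self.predicate, x, y)

    def get(self, x, y):
        # A pair that was never stored reads as zero.
        return self._values.get(self._key(x, y), 0.0)

    def set(self, x, y, value):
        # Storing creates the entry if it does not exist yet.
        self._values[self._key(x, y)] = float(value)


m = PairMatrix("foo")
before = m.get("a", "b")    # nothing stored yet
m.set("a", "b", 0.75)
after = m.get("a", "b")     # the stored value
other = m.get("b", "a")     # a different pair, still absent
```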

https://journal.r-project.org/archive/2011-2/RJournal_2011-2_Plummer.pdf


--linas
