Link-grammar 5.0.0 build error.

Amen Belayneh

Apr 9, 2014, 4:48:48 AM
to link-g...@googlegroups.com
I get the following output when running make:

make[1]: Entering directory `/home/vagrant/link-grammar-5.0.0/link-grammar'
make[2]: Entering directory `/home/vagrant/link-grammar-5.0.0/link-grammar'
  CC     analyze-linkage.lo
  CC     api.lo
  CC     build-disjuncts.lo
  CC     constituents.lo
  CC     count.lo
  CC     dict-common.lo
  CC     dictionary.lo
  CC     read-dict.lo
  CC     read-regex.lo
  CC     word-file.lo
  CC     disjunct-utils.lo
  CC     disjuncts.lo
  CC     error.lo
  CC     expand.lo
  CC     extract-links.lo
  CC     fast-match.lo
  CC     idiom.lo
  CC     post-process.lo
  CC     pp_knowledge.lo
  CC     pp_lexer.lo
  CC     pp_linkset.lo
  CC     preparation.lo
  CC     print.lo
  CC     print-util.lo
  CC     prune.lo
  CC     regex-morph.lo
  CC     resources.lo
  CC     spellcheck-hun.lo
  CC     string-set.lo
  CC     tokenize.lo
  CC     utilities.lo
  CC     word-utils.lo
make[2]: Leaving directory `/home/vagrant/link-grammar-5.0.0/link-grammar'
make[1]: Leaving directory `/home/vagrant/link-grammar-5.0.0/link-grammar'
Making all in viterbi
make[1]: Entering directory `/home/vagrant/link-grammar-5.0.0/viterbi'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/home/vagrant/link-grammar-5.0.0/viterbi'
Making all in bindings
make[1]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings'
Making all in java
make[2]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings/java'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings/java'
Making all in ocaml
make[2]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings/ocaml'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings/ocaml'
Making all in perl
make[2]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings/perl'
make  all-am
make[3]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings/perl'
  CXX    clinkgrammar_la-lg_perl_wrap.lo
../../bindings/perl/lg_perl_wrap.cc: In function 'void boot_clinkgrammar(PerlInterpreter*, CV*)':
../../bindings/perl/lg_perl_wrap.cc:4917:3: warning: unused variable 'items' [-Wunused-variable]
  CXXLD  clinkgrammar.la
make[3]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings/perl'
make[2]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings/perl'
Making all in python
make[2]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings/python'
make  all-am
make[3]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings/python'
  CXX    _clinkgrammar_la-lg_python_wrap.lo
../../bindings/python/lg_python_wrap.cc: In function 'void init_clinkgrammar()':
../../bindings/python/lg_python_wrap.cc:6185:21: warning: variable 'md' set but not used [-Wunused-but-set-variable]
  CXXLD  _clinkgrammar.la
make[3]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings/python'
make[2]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings/python'
make[2]: Entering directory `/home/vagrant/link-grammar-5.0.0/bindings'
make[2]: Nothing to be done for `all-am'.
make[2]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings'
make[1]: Leaving directory `/home/vagrant/link-grammar-5.0.0/bindings'
Making all in link-parser
make[1]: Entering directory `/home/vagrant/link-grammar-5.0.0/link-parser'
  CC     link-parser.o
  CC     command-line.o
  CCLD   link-parser
../link-grammar/.libs/liblink-grammar.so: undefined reference to `dictionary_create_from_db'
../link-grammar/.libs/liblink-grammar.so: undefined reference to `check_db'
collect2: ld returned 1 exit status
make[1]: *** [link-parser] Error 1
make[1]: Leaving directory `/home/vagrant/link-grammar-5.0.0/link-parser'
make: *** [all-recursive] Error 1

As a result, it is not installing. Any help on how to fix it is greatly appreciated :-)

Linas Vepstas

Apr 9, 2014, 11:06:33 AM
to link-grammar
The short-term fix is to install sqlite3-dev. I'll issue a 5.0.1 that doesn't need that in a few minutes.

--linas



Amen Belayneh

Apr 10, 2014, 10:24:59 AM
to link-g...@googlegroups.com, linasv...@gmail.com
I tried to build 5.0.1, but I get the following error:

Making all in link-grammar
make[1]: Entering directory `/home/vagrant/link-grammar-5.0.1/link-grammar'
make[2]: Entering directory `/home/vagrant/link-grammar-5.0.1/link-grammar'
  CC     analyze-linkage.lo
  CC     and.lo
  CC     api.lo
  CC     build-disjuncts.lo
  CC     constituents.lo
  CC     count.lo
  CC     dict-common.lo
In file included from dict-common.c:23:0:
dict-sql/read-sql.h:25:9: warning: no previous prototype for 'check_db' [-Wmissing-prototypes]
dict-sql/read-sql.h:26:12: warning: no previous prototype for 'dictionary_create_from_db' [-Wmissing-prototypes]
  CC     dictionary.lo
In file included from dict-file/dictionary.c:27:0:
./dict-sql/read-sql.h:25:9: warning: no previous prototype for 'check_db' [-Wmissing-prototypes]
./dict-sql/read-sql.h:26:12: warning: no previous prototype for 'dictionary_create_from_db' [-Wmissing-prototypes]
  CC     read-dict.lo
  CC     read-regex.lo
  CC     word-file.lo
  CC     read-sql.lo
  CC     disjunct-utils.lo
  CC     disjuncts.lo
  CC     error.lo
  CC     expand.lo
  CC     extract-links.lo
  CC     fast-match.lo
  CC     fat.lo
  CC     idiom.lo
  CC     massage.lo
  CC     post-process.lo
  CC     pp_knowledge.lo
  CC     pp_lexer.lo
  CC     pp_linkset.lo
  CC     prefix.lo
  CC     preparation.lo
  CC     print.lo
  CC     print-util.lo
  CC     prune.lo
  CC     regex-morph.lo
  CC     resources.lo
  CC     spellcheck-aspell.lo
  CC     spellcheck-hun.lo
  CC     string-set.lo
  CC     tokenize.lo
  CC     utilities.lo
  CC     word-utils.lo
.libs/dictionary.o: In function `check_db':
/home/vagrant/link-grammar-5.0.1/link-grammar/./dict-sql/read-sql.h:25: multiple definition of `check_db'
.libs/dict-common.o:/home/vagrant/link-grammar-5.0.1/link-grammar/dict-sql/read-sql.h:25: first defined here
.libs/dictionary.o: In function `dictionary_create_from_db':
/home/vagrant/link-grammar-5.0.1/link-grammar/./dict-sql/read-sql.h:26: multiple definition of `dictionary_create_from_db'
.libs/dict-common.o:/home/vagrant/link-grammar-5.0.1/link-grammar/dict-sql/read-sql.h:26: first defined here
collect2: ld returned 1 exit status
make[2]: *** [liblink-grammar.la] Error 1
make[2]: Leaving directory `/home/vagrant/link-grammar-5.0.1/link-grammar'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/vagrant/link-grammar-5.0.1/link-grammar'
make: *** [all-recursive] Error 1

Running

sudo apt-get install sqlite3 libsqlite3-dev
make clean
./configure
make

didn't fix it either.

Linas Vepstas

Apr 10, 2014, 12:16:30 PM
to Amen Belayneh, link-grammar
arghhh. There is now a version-5.0.2 that should work. Actually tested, this time.


--linas

Danny Brian

Apr 15, 2014, 8:22:51 PM
to link-g...@googlegroups.com
5.0.3 is giving me these compilation errors on OS X:

duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/build-disjuncts.o
duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/dict-common.o
duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/dictionary.o
duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/read-dict.o
duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/word-file.o
duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/read-sql.o
duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/idiom.o
duplicate symbol _afdict_classnum in:
    .libs/api.o
    .libs/tokenize.o
ld: 8 duplicate symbols for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [liblink-grammar.la] Error 1

On Ubuntu, I'm getting further, but then getting this:

  Making all in perl
  make[2]: Entering directory `/home/Downloads/link-grammar-5.0.3/bindings/perl'
  make[2]: *** No rule to make target `../../bindings/perl/lg_perl_wrap.cc', needed by `all'.  Stop.




Danny Brian

Apr 15, 2014, 9:13:34 PM
to link-g...@googlegroups.com
Prior to installing SWIG on Linux, I was getting this output from configure:

Swig interfaces generator:      no
Perl interfaces:                yes

So I'm guessing that the logic to build the Perl interfaces needs to add SWIG as a dependency.

As for the OS X build failure, I'm at a bit of a loss.

Linas Vepstas

Apr 15, 2014, 10:52:34 PM
to link-grammar
The work-around for the afdict problem is this:

Index: dict-common.h
===================================================================
--- dict-common.h (revision 34173)
+++ dict-common.h (working copy)
@@ -29,7 +29,7 @@
 
 /* Connector names for the affix class lists in the affix file */
 
-enum {
+typedef enum {
  AFDICT_RPUNC=0,
  AFDICT_LPUNC,
  AFDICT_QUOTES,



I'll put out a version 5.0.4 with this fix, tomorrow.


Linas Vepstas

Apr 15, 2014, 11:04:11 PM
to link-grammar
On 15 April 2014 19:22, Danny Brian <da...@brians.org> wrote:

On Ubuntu, I'm getting further, but then getting this:

  Making all in perl
  make[2]: Entering directory `/home/Downloads/link-grammar-5.0.3/bindings/perl'
  make[2]: *** No rule to make target `../../bindings/perl/lg_perl_wrap.cc', needed by `all'.  Stop.


Hmm. That's confusing. That file is included in the tarball; it should be found. What directory do you build in? Do you just say ./configure; make, or do you do something like mkdir build; cd build; ../configure; make?

If you can still reproduce this, then send me the tail end of a make V=1, which spews verbose output.

-- Linas

Danny Brian

Apr 15, 2014, 11:19:33 PM
to link-g...@googlegroups.com
Ahh. It's make clean.

bindings/perl/Makefile

298: BUILT_SOURCES = $(top_builddir)/bindings/perl/lg_perl_wrap.cc

305: CLEANFILES = $(BUILT_SOURCES) $(dist_pkgperl_SCRIPTS)

584:  clean-generic:
585:        -test -z "$(CLEANFILES)" || rm -f $(CLEANFILES)




Linas Vepstas

Apr 15, 2014, 11:47:10 PM
to link-grammar
OK, yes, that's my fault. I'll stick that into 5.0.4 as well. 

jack...@topicquests.org

Apr 18, 2020, 7:39:45 PM
to link-grammar
I know this is an ancient bug, but I'm building 5.8.0 on macOS 10.15.5.
After using brew to fill in some missing stuff, I did the autoconf.sh, jumped to /build, did a make clean just to be sure, then a make.
This is the gist of my error; on the surface, it's not unlike what's described in this thread, ending with this:

Undefined symbols for architecture x86_64:
  "_regex_tokenizer_test", referenced from:
     -exported_symbol[s_list] command line option
ld: symbol(s) not found for architecture x86_64

clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [liblink-grammar.la] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive] Error 1

My google-foo isn't getting me much past that. I'd appreciate any comments or thoughts.
Thanks
Jack



Amir Plivatsky

Apr 18, 2020, 7:48:34 PM
to link-grammar
Hello Jack,
Just remove regex_tokenizer_test from the file link-grammar/link-grammar.def and rebuild.

Amir

Jack Park

Apr 18, 2020, 8:19:23 PM
to link-g...@googlegroups.com
Thank you, Amir. That got a clean compile, but now I see a different make check error (Sorry, different subject)
Here's the console trace

Making check in python-examples
/Applications/Xcode.app/Contents/Developer/usr/bin/make  check-TESTS
FAIL: tests.py
============================================================================
Testsuite summary for link-grammar 5.8.0
============================================================================
# TOTAL: 1
# PASS:  0
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
============================================================================
See bindings/python-examples/test-suite.log
Please report to https://github.com/opencog/link-grammar
============================================================================
make[4]: *** [test-suite.log] Error 1
make[3]: *** [check-TESTS] Error 2
make[2]: *** [check-am] Error 2
make[1]: *** [check-recursive] Error 1
make: *** [check-recursive] Error 1

The test-suite.log says this
Running by: /usr/local/opt/python/bin/python3.7
Running ../../../bindings/python-examples/tests.py in: /Users/jackpark/Documents/workspaceEclipse/tqos-relex-kafka-agent/relex/link-grammar-link-grammar-5.8.0/build/bindings/python-examples
PYTHONPATH=../../../bindings/python-examples/../python:../python:../python/.libs
srcdir=../../../bindings/python-examples
LINK_GRAMMAR_DATA=../../../bindings/python-examples/../../data
Traceback (most recent call last):
  File "/Users/jackpark/Documents/workspaceEclipse/tqos-relex-kafka-agent/relex/link-grammar-link-grammar-5.8.0/bindings/python/linkgrammar.py", line 10, in <module>
    import linkgrammar.clinkgrammar as clg
ModuleNotFoundError: No module named 'linkgrammar.clinkgrammar'; 'linkgrammar' is not a package

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../../../bindings/python-examples/tests.py", line 25, in <module>
    from linkgrammar import (Sentence, Linkage, ParseOptions, Link, Dictionary,
  File "/Users/jackpark/Documents/workspaceEclipse/tqos-relex-kafka-agent/relex/link-grammar-link-grammar-5.8.0/bindings/python/linkgrammar.py", line 13, in <module>
    import clinkgrammar as clg
  File "/Users/jackpark/Documents/workspaceEclipse/tqos-relex-kafka-agent/relex/link-grammar-link-grammar-5.8.0/build/bindings/python/clinkgrammar.py", line 15, in <module>
    import _clinkgrammar
ImportError: dlopen(/Users/jackpark/Documents/workspaceEclipse/tqos-relex-kafka-agent/relex/link-grammar-link-grammar-5.8.0/build/bindings/python/.libs/_clinkgrammar.so, 2): Library not loaded: /usr/local/lib/liblink-grammar.5.dylib
  Referenced from: /Users/jackpark/Documents/workspaceEclipse/tqos-relex-kafka-agent/relex/link-grammar-link-grammar-5.8.0/build/bindings/python/.libs/_clinkgrammar.so
  Reason: image not found
FAIL tests.py (exit status: 1)

Linas Vepstas

Apr 18, 2020, 9:35:29 PM
to link-grammar
If you don't need the python bindings, then just ignore the error :-) If you do need the python bindings ... Ooof.

I see "relex" in the paths .. do you really need/plan to use relex? Note that relex uses java, not python, so you can ignore python errors. Relex is .. old, and has not really been maintained. I think it works... but its .. old.

--linas




Amir Plivatsky

Apr 18, 2020, 9:42:31 PM
to link-grammar
Hi Jack,
The test failed because the linkgrammar Python module uses the linkgrammar library from the system, which is not installed yet.
So in order to test the LG library you will need to install it. (Next time I touch it on macOS I'll try to fix that.)

Amir

Jack Park

Apr 18, 2020, 9:49:41 PM
to link-g...@googlegroups.com
Thanks to Amir and to Linas. I am not using anything but Java. My plan is to run Relex in a kafka cluster alongside SpaCy and see what it takes to resolve their differences. I'll go ahead and install and see what happens. Many thanks!


Linas Vepstas

Apr 18, 2020, 9:58:46 PM
to link-grammar
Hi Jack,

Caution: relex is not nearly as accurate/correct as the base link-grammar it's working from. It's old, it's not been maintained, it's almost surely making various false inferences. Performance-wise, it's also probably a bottleneck. LG has gotten fast; relex adds a hefty overhead to that. To be clear: relex doesn't actually "do anything" -- it converts the LG-style parse to an alternate representation -- a dependency grammar that looks kind-of-like the old Stanford parser style. The conversion is .. imprecise. And, in many ways, just not really needed.

So if you really want to do head-to-head compares, use raw LG, and not relex. I'd like to hear about how it compares to SpaCy; I've never attempted to do any compares myself. A writeup would be nice.

--linas


Jack Park

Apr 18, 2020, 10:05:08 PM
to link-g...@googlegroups.com
Good advice!
LG is running.
In the past, I did run LG alone, but never really got the hang of interpreting its output. I suppose it's time to dive into that. I have partial code from prior experiments I can build on.
Will report back.
Thanks!

Jack Park

Apr 19, 2020, 1:28:22 PM
to link-g...@googlegroups.com
I have it running:
Starting Link Grammar Server at port 9000, with 1 available processing threads and  with default dictionary location.
link-grammar: Info: JNI: dictionary language 'en' version 5.8.0

I did a test with a simple HTTP client; it responded immediately but complained about the URL-encoded query.

I then switched to a new (Java) socket("localhost", 9000).

I send it the same message; it does nothing - no hint in the console, and the socket's inputStream is "not ready"

If I run the example from the readme
echo "text:this is a test" | nc localhost 9000
it runs just fine.

This has got to be a really simple error on my part.

What am I missing?
Thanks
Jack

Linas Vepstas

Apr 19, 2020, 2:27:17 PM
to link-grammar
The link-grammar server doesn't serve HTTP ... nor does it generate HTML ...

That server was created for people who wanted to have a JSON-style output. It was created before we had any javascript bindings for LG. To use the JSON, you'd still have to get some json code from somewhere, and wire it up to do whatever it is you want to do.

Again, as a server, it just adds overhead: CPU overhead to send/receive packets, CPU overhead to write out json, cpu overhead to read the JSON back in ... this is OK, I guess, if you want to build some whizzy cloudy network-server thingy, but it does not offer any additional linguistic capabilities. (It was used in some State of Florida website, for something, for a while).

The meta question is: what do you want to actually do? Once you answer this question (for yourself, not necessarily for me) the next question is: how will you convert the LG output into your desired input? There are multiple solutions to this, but it depends on what input you need/want.

--linas




Jack Park

Apr 19, 2020, 3:03:12 PM
to link-g...@googlegroups.com
The server use was based on having LG as a remote server in a network of readers.
Http was abandoned in favor of what the readme says: a TCP socket, but it's not responding to that.

So, I began experimenting with adapting code from Relex to just use JNI; I am still deep into debugging that.

In theory, I like the JSON idea because it's close to my internal transport mechanism; beyond that, taking apart the JNI results, as coded in the Relex LGParser appears reasonable. I am cloning and experimenting with that code, but am some distance from making it work.



Linas Vepstas

Apr 19, 2020, 3:49:50 PM
to link-grammar
On Sun, Apr 19, 2020 at 2:03 PM Jack Park <jack...@topicquests.org> wrote:
The server use was based on having LG as a remote server in a network of readers.
Http was abandoned in favor of what the readme says: a TCP socket, but it's not responding to that.

You may need to append a newline, not sure. Make sure you flush the socket -- TCP can sometimes just buffer up, and not send anything until you explicitly force it to send. Also, I think the server is designed to hang up after every sentence, so one way to flush would be to close the outgoing socket of the pair, wait for a reply on the incoming socket, and then close that.

--linas

Jack Park

Apr 19, 2020, 4:21:08 PM
to link-g...@googlegroups.com
Appended "\n" to the string. Already flushed the output. Added a wait loop and sure enough, it returned. The wait loop was never hit. Not finished yet.
It returned with an int 88 followed by a basically empty JSON string - empty links.
Ran the very same sentence in a console with echo and it returned 2542 followed by a very large JSON string.

Trying to sort out why echo gets a full parse, but the socket doesn't.

Ran it one more time on a much longer sentence; got the same 88 and no links.

Turned out closing the output closed the socket.

Linas Vepstas

Apr 19, 2020, 4:29:28 PM
to link-grammar
On Sun, Apr 19, 2020 at 3:21 PM Jack Park <jack...@topicquests.org> wrote:
Appended "\n" to the string. Already flushed the output. Added a wait loop and sure enough, it returned. The wait loop was never hit. Not finished yet.
It returned with an int 88 followed by a basically empty JSON string - empty links.

88 is the number of bytes that follow, I think.

Ran the very same sentence in a console with echo and it returned 2542 followed by a very large JSON string.

Trying to sort out why echo gets a full parse, but the socket doesn't.

I think it expects "text:" as the first five bytes, and then searches for a newline. Without the newline, it hangs... I just now inserted some extra newlines, but they did not seem to make a difference.

--linas

Linas Vepstas

Apr 19, 2020, 4:30:25 PM
to link-grammar
Also I get 88 if I do this:

echo "text:" | nc localhost 9000
88
"numSkippedWords":0,"linkages":[],"version":"link-grammar-5.8.0","dictVersion":"5.8.0"}

Jack Park

Apr 19, 2020, 4:36:07 PM
to link-g...@googlegroups.com
This is the text

"text:this is a sentence to parse"

This is how I sent it
out.writeUTF(QUERY+"\n");

The addition of the newline made it work in the first place. Adding two more newlines changed nothing. It's as if it did not send that.
I am using DataOutputStream.writeUTF instead of a PrintWriter.

I might return to a PrintWriter to see if anything changes



Jack Park

Apr 19, 2020, 4:40:25 PM
to link-g...@googlegroups.com
Switching to a PrintWriter changed everything. Seems to be working now.
Many thanks for the help, Linas.
I'm off now to write an interpreter to pull out what I need.
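
For anyone hitting the same thing later: DataOutputStream.writeUTF prepends a two-byte length field before the string bytes, so the server most likely saw those extra bytes ahead of "text:" and returned an empty parse. Below is a minimal, untested sketch of the plain-character approach that worked, assuming the line-oriented protocol shown earlier with nc (send "text:<sentence>\n", then read until the server hangs up); the class name is just a placeholder.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a client for the link-grammar JSON server (assumptions above).
public class LgServerClientSketch {
    public static void main(String[] args) throws Exception {
        try (Socket sock = new Socket("localhost", 9000);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(sock.getOutputStream(), StandardCharsets.UTF_8), false);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(sock.getInputStream(), StandardCharsets.UTF_8))) {
            // Plain characters plus a newline -- no length prefix, unlike writeUTF().
            out.print("text:this is a test\n");
            out.flush();
            // The reply is a byte count followed by the JSON, as seen with nc above;
            // just read everything until the server closes the connection.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}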

Linas Vepstas

Apr 19, 2020, 4:48:42 PM
to link-grammar
Oh. Java. Right. I remember java, but not fondly. I do recall it had a very persnickety, fragile sockets interface.  Don't know why, because sockets are supposed to be dirt-easy to use, but they managed to create something that had unlimited quantities of unexpected behaviors.  Sorry.

--linas

Jack Park

Apr 19, 2020, 5:05:47 PM
to link-g...@googlegroups.com
I feel the same way about Python, sometimes 😉

No apologies necessary; we're all having fun here.


Jack Park

Apr 23, 2020, 9:14:07 PM
to link-g...@googlegroups.com
I originally started to build an agent which talks to LG over TCP and to my OpenSherlock ecosystem over Kafka, but decided to break the LG client out to its own library.

I have it running, including an experimental interpreter which takes parses and constructs an array of Feature objects, some of which are an individual "word" from the parser, some of which span whole phrases. This is on a path towards mimicking what SpaCy does, though I am not all that certain anymore that full SpaCy replication is either necessary or desirable; I'm pretty happy with what it is producing.

In doing the deep read on how the parser works, I see that it mirrors many of my thoughts on how an anticipatory text reader should work; working with LG is being quite entertaining. I imagine something like that which takes entire subjects found while reading and forms links which get satisfied later in the document.

My plan, now that it is giving satisfactory results on the same sentences I ran through SpaCy and through my earlier linguistic-rules-based reader, is to put it up at GitHub, perhaps tomorrow or this weekend. Still, a huge amount of hacking is left on it.

But, my real dream is to do the same with JNI and build a tiny bean which can talk to Kafka and run in a server farm of Raspi4s.

More soon.

Linas Vepstas

Apr 24, 2020, 6:02:15 PM
to link-grammar
On Thu, Apr 23, 2020 at 8:14 PM Jack Park <jack...@topicquests.org> wrote:

In doing the deep read on how the parser works, I see that it mirrors many of my thoughts on how an anticipatory text reader should work; working with LG is being quite entertaining.

The parser itself is sentence-by-sentence, and is not "anticipatory" word-by-word. For a while I had an idea that one could/should do a word-by-word parser; I called this the "viterbi parser" because that's what Viterbi does, but soon came to the conclusion that this offers no benefits, and mostly disadvantages. We could debate this, if interested, but really, it was about what's faster, and what controls combinatorial explosion better.
 
I imagine something like that which takes entire subjects found while reading forming links which get satisfied later in the document.

Well, but of course. I can sketch this in greater detail. But first, a baby-step or two. So, there is this idea that "words have meanings" and that a word can have multiple meanings. It turns out that you can "guess" word meanings statistically, if you have access to a dictionary (e.g. to WordNet). The canonical example is "I heard the church bells ring on Sunday", and you can infer that "ring" means "sound" and not "ring on finger" because of "hear". You can infer that "church" means "the building" and not "the abstract institution" because the abstract institution doesn't have bells. It also cannot be localized to a point in time: "Sunday". Stuff like that.

So what I'd always wanted/hoped/liked to do would be to build up a network of possible inter-relationships between different word-meanings, assign a probability weight to each link, and then crank on the network, raising or lowering the probability on each link based on evidence of the surrounding links, eventually arriving at a most-likely interpretation or meaning-assignment.
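
(Purely as a toy illustration of that "crank on the network" idea -- nothing here is from LG or the atomspace, and all the senses and compatibility numbers are invented: candidate senses start uniform, each pass raises or lowers a sense's weight by its support from the context, and the weights are renormalized per word.)

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sense-relaxation sketch for "I heard the church bells ring on Sunday".
public class SenseRelaxationSketch {
    public static void main(String[] args) {
        Map<String, List<String>> senses = new LinkedHashMap<>();
        senses.put("ring", List.of("ring/sound", "ring/jewelry"));
        senses.put("church", List.of("church/building", "church/institution"));

        // Hypothetical compatibility of each sense with the context word "hear".
        Map<String, Double> compat = Map.of(
                "ring/sound", 0.9, "ring/jewelry", 0.1,
                "church/building", 0.7, "church/institution", 0.3);

        // Start with a uniform weight on each word's candidate senses.
        Map<String, Double> weight = new HashMap<>();
        senses.forEach((word, list) -> list.forEach(s -> weight.put(s, 1.0 / list.size())));

        // Raise/lower each sense by its evidence, then renormalize, until it settles.
        for (int iter = 0; iter < 10; iter++) {
            for (List<String> alternatives : senses.values()) {
                Map<String, Double> support = new HashMap<>();
                double norm = 0.0;
                for (String s : alternatives) {
                    double w = weight.get(s) * compat.get(s);
                    support.put(s, w);
                    norm += w;
                }
                for (String s : alternatives) {
                    weight.put(s, support.get(s) / norm);
                }
            }
        }
        System.out.println(weight); // "ring/sound" and "church/building" dominate
    }
}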

The AtomSpace, if this is not yet clear, is an apparatus for storing such networks. It allows links to be created between things, and weights to be assigned to those links, and then an assortment of tools for walking over that graph, updating link-strengths, and other such generic operations.

Of course, the neural-net systems, starting with word2vec and getting progressively more sophisticated, already do "something like this", but they obfuscate the network, and they have any number of other fundamental flaws. I've tried to sketch out some more principled, more reasonable ways of going about doing this, but have had trouble gaining an audience. So a bit stuck, dead-in-the-water, for just right now.

-- Linas

Jack Park

Apr 24, 2020, 8:08:14 PM
to link-g...@googlegroups.com
Thanks, Linas, for those comments. I'd like to just make a small set of comments here and change the subject because of that.

Long before I ever heard of AtomSpace, I created an experiment as part of some dissertation research following defending a thesis proposal which mentioned "anticipatory story reading".

I decided - for reasons which now might seem mysterious - that a wordgram (n-gram with words) network might provide a kind of long-term memory for reading. So, that project became OpenSherlock after a while. Each WordGram stores a record of the sentenceId in which it was detected, giving you word frequencies in context; each WordGram has edges corresponding to sentenceId (cardinality of that list would serve the same purpose), but, while terminals - single words are the initial graph, after parsing, pairs, triples, etc, replace their terminals in the graph, and those pairs, triples, etc, also have usage cardinalities.
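
(A minimal sketch of that data structure as I read the description -- the class and method names are hypothetical, not the actual OpenSherlock code.)

import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical WordGram: identified by the word ids it contains, and it remembers
// every sentence it was seen in, so the cardinality of that set doubles as a
// usage/frequency count in context.
public class WordGramSketch {
    final long[] wordIds;                        // single word, pair, triple, ...
    final Set<String> sentenceIds = new LinkedHashSet<>();

    WordGramSketch(long... wordIds) { this.wordIds = wordIds; }

    void observe(String sentenceId) { sentenceIds.add(sentenceId); }

    int usageCardinality() { return sentenceIds.size(); }

    // Content-addressable key built from the numeric word identities.
    String key() {
        StringBuilder sb = new StringBuilder();
        for (long id : wordIds) sb.append(id).append('.');
        return sb.toString();
    }
}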

I was once asked to compare my code's performance with some trials someone did with the OpenIE jar. They are not really comparable because OpenIE does things which my ASR component (anticipatory story reader) does, and vice versa, but the report is here

More recently, I did discover AtomSpace, and as you might recall, started interpreting its code into Java. That was an enormously useful exercise. But, the road ahead for OpenSherlock is not yet cast in any concrete; I remain interested in things like Datalog, Prolog, and other technologies which need to be explored; the central concept in OpenSherlock is a topic map, a highly specialized knowledge graph.

The thesis behind ASR is that expectations form first when a query finds a paper, then is continuously embellished and refined while reading.

Since WordGrams are identified by the numeric identities of the words they contain, those identities make them content addressable.

I have a friend experimenting with doing topic modeling not on words but on wordgram identifiers. We shall see what that produces.

Just a half Euro for the day.
Cheers,
Jack


Linas Vepstas

Apr 24, 2020, 9:30:25 PM
to link-grammar
Hi Jack!

On Fri, Apr 24, 2020 at 7:08 PM Jack Park <jack...@topicquests.org> wrote:
Thanks, Linas, for those comments. I'd like to just make a small set of comments here and change the subject because of that.

Long before I ever heard of AtomSpace, I created an experiment as part of some dissertation research following defending a thesis proposal which mentioned "anticipatory story reading".

I decided - for reasons which now might seem mysterious - that a wordgram (n-gram with words) network might provide a kind of long-term memory for reading.

Not unreasonable. This is the original idea behind corpus linguistics. Which was talking about  word-grams since the IBM PC era, where you actually had a practical tool that you could use. The idea is even older -- biblical concordances, hand-assembled by scholastic monks in medieval times. Bit tedious though, doing it by hand :-)

So, that project became OpenSherlock after a while. Each WordGram stores a record of the sentenceId in which it was detected, giving you word frequencies in context; each WordGram has edges corresponding to sentenceId (cardinality of that list would serve the same purpose), but, while terminals - single words are the initial graph, after parsing, pairs, triples, etc, replace their terminals in the graph, and those pairs, triples, etc, also have usage cardinalities.

Well, the insight I keep offering up is that you can gain a lot of power and insight by using parses instead of n-grams. For example, cataloging sentences which differ only in some adjective, or cataloging sentences where a pair of words is separated by a very long modifying phrase. For example, "the dog ran in the park" and "the dog, a black cocker spaniel, ran in the park" are effectively the same sentence, and that's easy to catch in a parse, but hard to catch with n-grams. (unless you use sliding-window skip-grams, yadda, yadda... which then begs the question "why the complexity?")

I was once asked to compare my code's performance with some trials someone did with the OpenIE jar. They are not really comparable because OpenIE does things which my ASR component (anticipatory story reader) does, and vice versa, but the report is here

More recently, I did discover AtomSpace, and as you might recall, started interpreting its code into Java.

Ugh. Don't do that. Waste of time.

That was an enormously useful exercise.

Unless, of course, it's intellectually satisfying!

But, the road ahead for OpenSherlock is not yet cast in any concrete; I remain interested in things like Datalog, Prolog, and other technologies which need to be explored; the central concept in OpenSherlock is a topic map, a highly specialized knowledge graph.

Well, software design is about making choices. Datalog is interesting, but there is no "natural" way for it to store probabilities (or other numbers or tagging information). So that's how I got to the atomspace -- all other choices (including neo4j, etc.) seemed lacking and incapable and underpowered (and too hard to use).

Prolog is interesting, but it only does crisp logic. Anyway, that old prolog backward/forward-chainer technology is obsolete; it's been replaced by answer-set programming (ASP), which looks just like prolog (it is notationally the same) but uses the new fast SAT solvers instead.

Since prolog/ASP only work for crisp logic .. no one has done a probabilistic-programming version of prolog. Now, PLN kind-of-ish tried to be that, but PLN is not done yet. A whizzy probabilistic programming system is Pyro: http://pyro.ai/examples/intro_part_i.html but actually, I think probabilistic programming is stupid and boring, but that's a different topic. (well, OK, probabilistic prolog would be interesting but .. some other day).

My own interests are about automatically discovering via unsupervised learning, all of these network relationships... so step one is to kind of throw all pre-existing structures out the window, as the goal is to find them ab initio.

-- linas


Jack Park

Apr 24, 2020, 11:49:59 PM
to link-g...@googlegroups.com
Linas, I believe we are orbiting the very same attractor basin, albeit in different orbits. A couple of comments before I annotate below:

There is something called problog - a probabilistic prolog

And, there is the abandoned (due to tragic loss of the developer) probabilistic datalog buried in the Pire project
Since it was abandoned, I secured kind permission to resurrect the codebase and mavenize it; it is here:

On Fri, Apr 24, 2020 at 6:30 PM Linas Vepstas <linasv...@gmail.com> wrote:
Hi Jack!

On Fri, Apr 24, 2020 at 7:08 PM Jack Park <jack...@topicquests.org> wrote:
Thanks, Linas, for those comments. I'd like to just make a small set of comments here and change the subject because of that.

Long before I ever heard of AtomSpace, I created an experiment as part of some dissertation research following defending a thesis proposal which mentioned "anticipatory story reading".

I decided - for reasons which now might seem mysterious - that a wordgram (n-gram with words) network might provide a kind of long-term memory for reading.

Not unreasonable. This is the original idea behind corpus linguistics. Which was talking about  word-grams since the IBM PC era, where you actually had a practical tool that you could use. The idea is even older -- biblical concordances, hand-assembled by scholastic monks in medieval times. Bit tedious though, doing it by hand :-)

Indeed! I have it automated.

So, that project became OpenSherlock after a while. Each WordGram stores a record of the sentenceId in which it was detected, giving you word frequencies in context; each WordGram has edges corresponding to sentenceId (cardinality of that list would serve the same purpose), but, while terminals - single words are the initial graph, after parsing, pairs, triples, etc, replace their terminals in the graph, and those pairs, triples, etc, also have usage cardinalities.

Well, the insight I keep offering up is that you can gain a lot of power and insight by using parses instead of n-grams. For example, cataloging sentences which differ only in some adjective, or cataloging sentences where a pair of words is separated by a very long modifying phrase. For example, "the dog ran in the park" and "the dog, a black cocker spaniel, ran in the park" are effectively the same sentence, and that's easy to catch in a parse, but hard to catch with n-grams. (unless you use sliding-window skip-grams, yadda, yadda... which then begs the question "why the complexity?")

That concept, the idea of pulling out the triples is what I do.


I was once asked to compare my code's performance with some trials someone did with the OpenIE jar. They are not really comparable because OpenIE does things which my ASR component (anticipatory story reader) does, and vice versa, but the report is here

More recently, I did discover AtomSpace, and as you might recall, started interpreting its code into Java.

Ugh. Don't do that. Waste of time.

That was an enormously useful exercise.

Unless, of course, it's intellectually satisfying!

Exactly! It was very satisfying to "get inside the heads of AtomSpace developers" and figure out what is going on.


But, the road ahead for OpenSherlock is not yet cast in any concrete; I remain interested in things like Datalog, Prolog, and other technologies which need to be explored; the central concept in OpenSherlock is a topic map, a highly specialized knowledge graph.

Well, software design is about making choices. Datalog is interesting, but there is no "natural" way for it to store probabilities (or other numbers or tagging information). So that's how I got to the atomspace -- all other choices (including neo4j, etc.) seemed lacking and incapable and underpowered (and too hard to use).

Prolog is interesting, but it only does crisp logic. Anyway, that old prolog backward/forward-chainer technology is obsolete; it's been replaced by answer-set programming (ASP), which looks just like prolog (it is notationally the same) but uses the new fast SAT solvers instead.

Since prolog/ASP only work for crisp logic .. no one has done a probabilistic-programming version of prolog. Now, PLN kind-of-ish tried to be that, but PLN is not done yet. A whizzy probabilistic programming system is Pyro: http://pyro.ai/examples/intro_part_i.html but actually, I think probabilistic programming is stupid and boring, but that's a different topic. (well, OK, probabilistic prolog would be interesting but .. some other day).

My own interests are about automatically discovering via unsupervised learning, all of these network relationships... so step one is to kind of throw all pre-existing structures out the window, as the goal is to find them ab initio.

That has been the goal of OpenSherlock, though it is cast in the shadow of Watson, so there is an interactive tell-ask component which engages with wetware.

Linas Vepstas

Apr 25, 2020, 12:39:19 AM
to link-grammar
On Fri, Apr 24, 2020 at 10:50 PM Jack Park <jack...@topicquests.org> wrote:

There is something called problog - a probabilistic prolog

Cool! Nice notation, too. 


Exactly! It was very satisfying to "get inside the heads of AtomSpace developers" and figure out what is going on.

Funny, I was trying to do the same thing. :-) It's the end result of "following one's nose", taking the small, obvious steps at the time that they seem most obvious, not over-thinking it, and seeing where I ended up. Ended up in some very interesting places. Frustrated, a bit, that it's not popular, and that the rest of the world lags behind by 5-10 years, but the rest of the world seems to otherwise be on a similar trajectory.

Don't know what it would take to increase its popularity. Well, I do know, but perhaps I'm hoping someone else will do that work. Life is short; one must prioritize.
 


But, the road ahead for OpenSherlock is not yet cast in any concrete; I remain interested in things like Datalog, Prolog, and other technologies which need to be explored; the central concept in OpenSherlock is a topic map, a highly specialized knowledge graph.


That has been the goal of OpenSherlock, though it is cast in the shadow of Watson, so there is an interactive tell-ask component which engages with wetware.

Sure. We had an OpenCog question-answering chatbot 10 years ago; it bit-rotted. Let me use that for a rambling story. Why did it bit-rot? No one understood how it worked; no one wanted to continue enhancing it. Why? Because it revealed certain shortcomings in the fundamental design. Despite this, I felt it should have been maintained anyway, despite its shortcomings, because it could serve as a show-case product that would attract attention, even if, under the covers, it was flawed. Getting attention is good. It builds excitement. For open source, it attracts volunteers and users.

Instead, we fell back into doing more scientific research, and more tool-building. But this is non-glamorous. Here's a tool-building example: we could take problog, the probabilistic prolog above, and provide an atomspace wrapper for it. That would instantly and immediately open up the possibility of doing symbolic reasoning in a reliable, robust way in the atomspace. Wow! Awesome! So why not do it?

OK, well who would use it? Oh, ah, someone who needs it. But who? Where does the data come from? Oh, the language subsystem. Oh, but what's the status of that? Err ...The output of link-grammar is too low-level to do concept-level reasoning on. Sooo.. well, we had relex2logic, but that is unfinished .. and fragile, .. fragile in the sense that all hand-built systems are fragile. So why integrate problog into the atomspace, again?

The fragility is why I'm ignoring those paths, and trying to do grammar learning.  Unfortunately, I'm distracted .. and unpaid. I'm looking for funding, grants. Recommendations?

-- Linas




Jack Park

Apr 25, 2020, 10:36:07 AM
to link-g...@googlegroups.com
"Unfunded and looking for grants" R us.

Marc-Antoine Parent installed problog on my MacBook when he visited last year, including booting it in Jupyter, but I haven't spent much time with it; I have the criterion that it must be able to store facts and rules on a scale-out database and reason across that; few prologs do that. The PDatalog platform uses MySQL, and I am modifying it to use Postgres. I had in mind that it might be possible to maintain some kind of Bayesian belief network across the entire wordgram graph.

I do revisit my notes on AtomSpace from time to time, mostly out of an interest to see if one of a) it can do what I want, or b) I can borrow from it where useful, or c) there's a hybrid lurking between the two.

Over


Linas Vepstas

Apr 25, 2020, 5:40:18 PM
to link-grammar
On Sat, Apr 25, 2020 at 9:36 AM Jack Park <jack...@topicquests.org> wrote:

I do revisit my notes on AtomSpace from time to time, mostly out of an interest to see if one of a) it can do what I want, or b) I can borrow from it where useful, or c) there's a hybrid lurking between the two.

Anything you can store in datalog, you can store in the atomspace. The meta-questions are: is it more compact? easier to use? faster? what's the API?

The atomspace does have a postgres backend, but performance is so-so. You need SSDs, not spinning disks, to get OK performance (so clearly SATA and the disk drive itself is the bottleneck), but even then, performance is disappointing. The atomspace works best when everything fits in RAM -- loading up that RAM, and saving it back to disk, is the bottleneck. (I do incremental load/save, as data is touched.) As someone recently noted, saving smaller datasets to file, as ASCII strings, and loading those up is 10x faster than postgres. By "smaller files" I mean gigabyte-sized.

re: "do what I want" ... well, that depends. Right now, the #1 strongest most versatile aspect of it is the graph query subsystem. You can search for arbitrarily complex graphs, and that works fast, bug-free. It's gotten to be very mature - all the bells & whistles, whiz-bang features, all the corner cases work well. Beyond that ... I don't know what you need/want.

--linas

Jack Park

Apr 25, 2020, 7:57:12 PM
to link-g...@googlegroups.com
Perhaps this thread belongs over in the opencog list rather than here; it's fine with me either way. Meanwhile, this is interesting.

On Sat, Apr 25, 2020 at 2:40 PM Linas Vepstas <linasv...@gmail.com> wrote:


On Sat, Apr 25, 2020 at 9:36 AM Jack Park <jack...@topicquests.org> wrote:

I do revisit my notes on AtomSpace from time to time, mostly out of an interest to see if one of a) it can do what I want, or b) I can borrow from it where useful, or c) there's a hybrid lurking between the two.

Anything you can store in datalog, you can store in the atomspace. The meta-questions are: is it more compact? easier to use? faster? what's the API?


In my universe of design and thinking, there are two primary stores - perhaps they could be just one; they are:

A topic map, which is a graph, but also in which the edges can also be vertices (fully addressable topics); since links extend atoms in AtomSpace, I saw that as a bonus.

A wordgram graph, where the edges are just edges.

I like that the wordgram graph provides enough information to support probabilistic reasoning; since many of the vertices in the wordgram graph are affiliated with topics in the topic map, there may emerge the ability to perform probabilistic reasoning over the topic map. Too soon (for me) to tell.

In that universe, I write external code which manipulates those two graphs.

There may be a sense in which datalog or prolog could fuse code and data.


atomspace does have a postgres backend, but performance is so-so. You need SSD hard drives, not spinning disks, to get OK performance, (so clearly SATA and the disk drive itself is the bottleneck) but even then, performance is disappointing. The atomspace works best when everything fits in RAM -- loading up that RAM, and saving it back to disk is the bottleneck.  (I do incremental load/save, as data is touched) As someone recently noted: saving smaller datasets to file, as ascii strings, and loading those up is 10x faster than postgres. By "smaller files" I mean gigabyte-sized.

I have never been a big fan of in-memory reasoning across massive graphs; perhaps mostly because I don't like buying the computers to support that, though I have a 2 U dual xeon box with 64gb ram just for such experiments.

I'm much more interested in sorting out paging algorithms which could make it possible to speed up inferencing across terabytes of graph data, or splitting up the task in the fashion of hadoop. But, in truth, for now, my focus is on getting algorithms working which reliably do the task at any cost. One simple speed up for the time being is to create a tiny raspi server farm of LG parsers and just pass sentences at them till I run out of sentences to read - which might never happen.


re: "do what I want" ... well, that depends. Right now, the #1 strongest most versatile aspect of it is the graph query subsystem. You can search for arbitrarily complex graphs, and that works fast, bug-free. It's gotten to be very mature - all the bells & whistles, whiz-bang features, all the corner cases work well. Beyond that ... I don't know what you need/want.

 Graph search is useful. I don't rule out SQL as well for some tasks. If I am not mistaken, VLog is a datalog which runs on a column database and offers SQL as well. So many choices, but for now, the game is to make the algorithms do what I need, then worry about a refactor to something better. I have not ruled out AtomSpace, but am not yet ready to mount that learning curve. Doing an experimental transliteration to Java was still useful.

Linas Vepstas

Apr 25, 2020, 10:48:19 PM
to link-grammar
On Sat, Apr 25, 2020 at 6:57 PM Jack Park <jack...@topicquests.org> wrote:
Perhaps this thread belongs over in the opencog list rather than here;

May as well continue here.

A topic map, which is a graph, but also in which the edges can also be vertices (fully addressable topics);

We call this the hypergraph. In addition to storing the edges/vertices, it also:
* stores a type
* stores back-links to everything pointing at it, thus making graph-traversal possible
* stores a key-value lookup table, allowing arbitrary info to be associated w/ each edge/vertex
* atoms are globally unique (so you can always find it, even if the only thing you know is its name) (just like textbook examples of SQL table "employee", so you don't have multiple entries for one person)
* many/most predefined types know how to "do things", e.g. "plus" knows how to actually add numbers, although most are far more abstract. In this sense, "plus" is both a node in a graph (and so you can do symbolic algebra with it) and also executable (so you can plug numbers in and get a result) -- for the more complex examples, the "programming API" just also happens to be "a graph", so you can work with the API directly.
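
(A toy illustration of those bullet points -- emphatically not the actual AtomSpace API, and every name below is hypothetical: a globally-unique table of typed atoms, with back-links and a per-atom key-value table.)

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AtomTableSketch {
    static final class Atom {
        final String type, name;
        final List<Atom> outgoing;                   // edge targets (empty for a plain node)
        final Set<Atom> incoming = new HashSet<>();  // back-links: who points at me
        final Map<String, Object> values = new HashMap<>(); // arbitrary attached data
        Atom(String type, String name, List<Atom> outgoing) {
            this.type = type; this.name = name; this.outgoing = outgoing;
        }
    }

    // Global uniqueness: one atom per (type, name, outgoing) combination.
    private final Map<String, Atom> table = new HashMap<>();

    Atom add(String type, String name, Atom... out) {
        // Identity of the targets is enough for the key, since atoms are interned.
        String key = type + ":" + name + ":" + Arrays.toString(out);
        Atom atom = table.computeIfAbsent(key, k -> new Atom(type, name, List.of(out)));
        for (Atom target : atom.outgoing) target.incoming.add(atom); // maintain back-links
        return atom;
    }

    public static void main(String[] args) {
        AtomTableSketch space = new AtomTableSketch();
        Atom dog = space.add("Concept", "dog");
        Atom animal = space.add("Concept", "animal");
        Atom isa = space.add("Inheritance", "", dog, animal);
        isa.values.put("probability", 0.95);         // a weight attached to the link
        System.out.println(dog.incoming.size());     // 1 -- traverse via back-links
    }
}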

If you don't need one or more of the above, then .. hard-coding in java or your favorite language of choice will be ... I dunno .. easier? faster?
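For intuition only, here's a minimal Python sketch of a store with the properties listed above -- typed atoms, back-links (an "incoming set"), a per-atom key-value table, and global uniqueness via a symbol table. The names (Atom, AtomStore) are invented for this sketch; this is not the real AtomSpace API, just a toy to make the list concrete.

    # Toy illustration only -- invented names, NOT the AtomSpace API.
    class Atom:
        def __init__(self, atom_type, name, outgoing=()):
            self.type = atom_type            # every atom carries a type
            self.name = name
            self.outgoing = tuple(outgoing)  # edges are atoms too: they point at other atoms
            self.incoming = set()            # back-links: everything pointing at this atom
            self.values = {}                 # arbitrary key-value data per atom

    class AtomStore:
        """A glorified symbol table: (type, name) is globally unique."""
        def __init__(self):
            self._index = {}

        def add(self, atom_type, name, outgoing=()):
            key = (atom_type, name)
            if key in self._index:           # uniqueness: one "joe", ever
                return self._index[key]
            atom = Atom(atom_type, name, outgoing)
            self._index[key] = atom
            for target in atom.outgoing:     # maintain back-links for traversal
                target.incoming.add(atom)
            return atom

    store = AtomStore()
    joe = store.add("Concept", "joe")
    cat = store.add("Concept", "cat")
    likes = store.add("Evaluation", "joe-likes-cat", outgoing=(joe, cat))
    likes.values["confidence"] = 0.9
    assert store.add("Concept", "joe") is joe    # found again, by name alone
    print([a.name for a in joe.incoming])        # -> ['joe-likes-cat']

The executable-types point (e.g. "plus") would amount to attaching an evaluate() method to certain atom types; that part is left out of the toy.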

 


I have never been a big fan of in-memory reasoning across massive graphs;
...
I'm much more interested in sorting out paging algorithms which could make it possible to speed up inferencing across terabytes of graph data,
...

I don't have time for that. There are plenty of gear-heads who know how to write code that does this. Let them do the work. I'd rather work on the stuff that no one else understands :-/ which, sadly, means that, well .. no one else understands it so they go WTF???

getting algorithms working which reliably do the task at any cost.

Bingo!

One simple speed up for the time being is to create a tiny raspi server farm of LG parsers

Yeah, I doubt that. Current LG, on a 6-year-old computer, can crank through the 1000 sentences in the unit test in 3 seconds; the 5K sentences in the extended unit test in 10 seconds; the 6K rather long, complex, sometimes-archaic sentences of another century found in Pride & Prejudice in under 5 minutes. I'm certain that LG will never be the bottleneck in your pipeline. I mean, if you're not careful, simply pushing that text through java might take longer. Doing whatever analysis you need will surely take longer.


 Graph search is useful. I don't rule out SQL as well for some tasks.

The problem with SQL is you must define your tables before you even start. Another problem with SQL is that every row in a table must have exactly the same columns. This prevents ad-hoc, on-the-fly generic data. This is partly why graph databases are popular: you can create ad-hoc, on-the-fly graphs. Otherwise, sure, SQL could have been enough, in some super-abstract sense.

So, in prolog/datalog, you can define on-the-fly tree-shaped graphs. (Under the covers, that's what the atomspace does too: they're "hypergraphs".) You can have arbitrary, ad-hoc shapes. The biggest problem with prolog is the lack of a type system. In SQL, you can declare things to be an int or a string or other types. Compare to java/c++, where every object has a type. There's no OO-prolog. The atomspace would look like prolog, if we threw away the type system. The type system makes it hard to map to/from prolog. Similar-but-different considerations for javascript.

Another issue is with uniqueness: in prolog, if you create a node called "joe" and use it in two trees, but you want both to refer to the same node, well, you can, but the way prolog does this is by shoving it into a symbol table. The atomspace is a glorified symbol table.

... and prolog's symbol table is the "actual database". And that symbol table is in-RAM, and not paged/hadooped/whatevered. And, depending on the version of prolog, you're not going to easily map that symbol table onto some other, external distributed, scalable system.

The atomspace would not exist if there were a "typed prolog" that offered access to the symbol table. Or if there were a typed javascript that offered access to the symbol table. I'd point at haskell or CamL, but those languages have utterly, totally and completely forgotten about the existence of data. They pretend that data doesn't exist. They think that programming is all there is to life.

At least prolog was very aware of data. Unfortunately, prolog is old, and it predates modern concepts like object orientation, closures, type systems, type inference, functional programming. The atomspace is kind-of-ish prolog, with these modern concepts added.

And that's why the atomspace is the weird frankenstein that it is: a glorified symbol table. With a type system. With a functional programming style. The data is queryable like SQL, but the data can be any ad-hoc (hyper-)graph. It's javascripty, except when it isn't...

I dislike promoting competitors, but the grakn.ai system has taken some baby steps in this general direction. I'm envious that they are far more popular/funded/supported/used than the atomspace.

--linas

Jack Park

unread,
Apr 26, 2020, 11:05:24 AM4/26/20
to link-g...@googlegroups.com
While testing the parser on biomedical sentences, the phrase "type 2 diabetes mellitus" becomes problematic.

An apparent reason for that is this: the vocabulary only knows "type.v" and not "type.n", as evidenced by what the parse returns.

It's not obvious that I can simply add "type.n" to words.n.1-const or any of the other noun collections without rebuilding.

What is the process associated with augmenting the vocabulary?

Thanks in advance.
Jack

Jack Park

unread,
Apr 26, 2020, 11:24:50 AM4/26/20
to link-g...@googlegroups.com
That's a useful observation. I wonder if it has anything to do with the fact that they appear to offer a platform aimed at solving real-world problems, as compared to a platform for language-modeling research?

Their landing page reads rather differently from that of the AtomSpace landing page.  They make results-oriented promises; I don't see that on the AtomSpace landing page.

Their landing page appears to be the work of skilled marketing types; the AtomSpace landing page appears to be the work of, well, not skilled marketing types. Their top nav bar says Products, Solutions, Use Cases, Community, ...; AtomSpace is a MediaWiki, one very familiar to developers, but not to business oriented people.

What if AtomSpace ignored all the cool buzz words and opened with problems it can solve, ways people can start using it right out of the box without building it, tuning it, etc?

Would you, as a scientist, be able to live with wall-street-oriented thinkers taking over your pet projects and packaging them up for far less-skilled consumers?

I think this is an issue faced by a lot of us.

I studied Grakn carefully. The front story, for me, is one of some attraction to the apparent simplicity of the system; the back story, for me, is that Grakn appears optimized in ways which prevent me from pushing it in the directions I believe it should be pushed -- a problem of ontological commitments I cannot undo, so I just walk away.

For a really interesting case for comparison, take a look at https://opencrux.com/

On Sat, Apr 25, 2020 at 7:48 PM Linas Vepstas <linasv...@gmail.com> wrote:
<snip>

Linas Vepstas

unread,
Apr 26, 2020, 1:11:19 PM4/26/20
to link-grammar
Quite right.  We've never had anyone interested in marketing opencog participate in the project. This is a chicken-and-egg issue. But also $$$ -- marketing types work for money, developers do stuff cause they feel like it. It's pre-game-B.

--linas


Linas Vepstas

unread,
Apr 26, 2020, 2:04:02 PM4/26/20
to link-grammar, opencog
About opencrux... below,

On Sun, Apr 26, 2020 at 10:24 AM Jack Park <jack...@topicquests.org> wrote:
That's a useful observation. I wonder if it has anything to do with the fact that they appear to offer a platform aimed at solving real-world problems, as compared to a platform for language-modeling research?

Their landing page reads rather differently from that of the AtomSpace landing page.  They make results-oriented promises; I don't see that on the AtomSpace landing page.

Their landing page appears to be the work of skilled marketing types; the AtomSpace landing page appears to be the work of, well, not skilled marketing types. Their top nav bar says Products, Solutions, Use Cases, Community, ...; AtomSpace is a MediaWiki, one very familiar to developers, but not to business oriented people.

What if AtomSpace ignored all the cool buzz words and opened with problems it can solve, ways people can start using it right out of the box without building it, tuning it, etc?

Would you, as a scientist, be able to live with wall-street-oriented thinkers taking over your pet projects and packaging them up for far less-skilled consumers?

I think this is an issue faced by a lot of us.

Quite right.  We've never had anyone interested in marketing participate in the project. Marketing is a skill, and unlike open source, there is no "open marketing", so you have to pay these people actual $$$ to get slick content. 

I studied Grakn carefully. The front story, for me, is one of some attraction to the apparent simplicity of the system; the back story, for me, is that Grakn appears optimized in ways which prevent me from pushing it in the directions I believe it should be pushed -- a problem of ontological commitments I cannot undo, so I just walk away.

For a really interesting case for comparison, take a look at https://opencrux.com/

Well, care to be specific? I suppose it's possible to put a prolog/datalog API on top of the atomspace. I'm sufficiently removed from that world that I would not want to even begin without a lot of arm-twisting, but I can consult.

I'm not sure what to say about other things. Bitemporality? We have a concept of a space-time server, optimized for dealing with time .. and space coordinates. It's neglected...

Document-graph? Sure, cause why not? Seems easy to me...

Back-ends other than postgres? That's interesting. It's not "hard" -- it's not conceptually hard, and also the infrastructure/API is already in place. But it does require some fair amount of slogging. One would have to be enthusiastic about it.

Datalog queries? I'm certain that anything you can say in datalog, you can say in Atomese, and so it's a matter of writing a converter/translator that accepts datalog as input and spews atomese as output. How hard is this? No clue. You've actually messed with this kind of stuff, you'd know better.  Again -- everything that is a "symbol" in prolog is an atom in atomese. The atomspace is pretty much nothing more than a glorified symbol table.

... and this seems to be the key insight that the opencrux developers have made: symbol tables are (ad-hoc, in-RAM, poorly-designed) databases. Rip out the ad-hoc symbol table of programming language XYZ, replace it with a real, actual database, and wow ... off we go on a wild ride. I'm trying to think of which programming languages XYZ this could be the most interesting for... where you'd get the most bang for the buck. Maybe prolog -- maybe that is the lesson from opencrux?

Care to suggest an easily-hackable version of prolog/datalog on which this experiment could be done? Just for the heck of it?

I mean, you could do this trick for python or javascript ... I don't think the python community would accept it; it would be too crazy and weird for them. The javascript folks might .. but they already have some pretty decent infrastructure, so they don't need something this low-level. They've done this integration at a higher level already. (Programming in javascript is far more mind-expanding than python. Python shuts down your world-view, narrows your thinking. Blinds you to possibilities. Javascript does the opposite.)

--linas

On Sat, Apr 25, 2020 at 7:48 PM Linas Vepstas <linasv...@gmail.com> wrote:
<snip>
I dislike promoting competitors, but the grakn.ai system has taken some baby-steps in this general direction. I'm envious that they are far more popular/funded/suported/used than the atomspace.

--linas


Linas Vepstas

unread,
Apr 26, 2020, 2:23:51 PM4/26/20
to link-grammar
On Sun, Apr 26, 2020 at 10:05 AM Jack Park <jack...@topicquests.org> wrote:
While testing the parser on biomedical sentences, the phrase "type 2 diabetes mellitus" becomes problematic.

What's problematic about it?  I get this:

               +---------------Os--------------+
               |      +-----------AN-----------+
    +--->WV--->+      |   +---------AN---------+
    +->Wd--+-Ss+      |   |       +-----AN-----+
    |      |   |      |   |       |            |
LEFT-WALL he has.v type.n 2 diabetes.n-u mellitus[!].a

and I can only guess that you must have wanted this:

               +---------------Os--------------+
               |      +-----------AN-----------+
    +--->WV--->+      |                        |
    +->Wd--+-Ss+      +-A-+       +-----AN-----+
    |      |   |      |   |       |            |
LEFT-WALL he has.v type.n 2 diabetes.n-u mellitus[!].a


An apparent reason for that is this: the vocabulary only knows "type.v" and not "type.n" as evidenced in what the parse returns.

? those "subscripts" just help narrow down which rule-set gets used. They're otherwise meaningless.

It's not obvious that I can simply add "type.n" to words.n.1-const or any of the other noun collections without rebuilding.

There's nothing to rebuild... none of the dictionary contents are in C++ code.


What is the process associated with augmenting the vocabulary?

:-)  ... It's ... not that hard, but there's a learning curve that you will not want to face. Describe the problem, maybe there's an easy fix.
 
--linas

Thanks in advance.
Jack


Jack Park

unread,
Apr 26, 2020, 2:30:24 PM4/26/20
to link-g...@googlegroups.com, opencog
I'm trying hard not to be an advocate of anything; rather, I'm in exploration mode.

Tell you what: I'd like to see (have not yet found) some solid examples of AtomSpace - full-on knowledge graph implementations, whatever. Something to be able to follow through one or two complete examples which are more than toy exercises.



Jack Park

unread,
Apr 26, 2020, 2:33:11 PM4/26/20
to link-g...@googlegroups.com
My sentence is this

The pandemic of obesity, type 2 diabetes mellitus (T2DM) and nonalcoholic fatty liver disease (NAFLD) has frequently been associated with dietary intake of saturated fats (1) and specifically with dietary palm oil (PO) (2).

I never see "type.n" - only "type.v". This could be the fact that I am using a tcp client which returns JSON when "text:<sentence>" is sent. No telling (yet) what the difference might be.



Linas Vepstas

unread,
Apr 26, 2020, 3:18:26 PM4/26/20
to link-grammar
I just fixed this in the master git branch.  That was ..interesting. First, the fix was remarkably easy. Second, the lack of the fix caused the parser to become profoundly confused about the correct parse, generating something quite insane.

Anyway: you can either try to pull from git master, or you can hand-edit `data/en/4.0.dict` and insert the following:

% Numerical identifiers
% NM+ & AN+ : "Please use a number 2 pencil"
%             "He has type 2 diabetes"
number.i batch.i group.i type.i:
  NM+ & AN+;

So NM is a link to a "numerical modifier"  while AN is a link to an "adjectival noun" (nouns that can be used as if they were adjectives).  So, for example:

               +--------Ou--------+
    +--->WV--->+      +-----AN----+
    +->Wd--+-Ss+      +NMn+       |
    |      |   |      |   |       |
LEFT-WALL he has.v type.i 2 diabetes.n-u



--linas


Linas Vepstas

unread,
Apr 26, 2020, 3:26:19 PM4/26/20
to link-grammar
Oh, a bit more:

On Sun, Apr 26, 2020 at 2:18 PM Linas Vepstas <linasv...@gmail.com> wrote:
I just fixed this in the master git branch.  That was ..interesting. First, the fix was remarkably easy. Second, the lack of the fix caused the parser to become profoundly confused about the correct parse, generating something quite insane.

Anyway: you can either try to pull from git master, or you can hand-edit `data/en/4.0.dict` and insert the following:

% Numerical identifiers
% NM+ & AN+ : "Please use a number 2 pencil"
%             "He has type 2 diabetes"
number.i batch.i group.i type.i:
  NM+ & AN+;

So NM is a link to a "numerical modifier"  while AN is a link to an "adjectival noun" (nouns that can be used as if they were adjectives).  So, for example:

               +--------Ou--------+
    +--->WV--->+      +-----AN----+
    +->Wd--+-Ss+      +NMn+       |
    |      |   |      |   |       |
LEFT-WALL he has.v type.i 2 diabetes.n-u

In this example, NMn means it's actually a number. If you say "he has type A diabetes" you get NMa, which indicates that it's "just like a numerical modifier, but it's alphabetic". The upper-case link types describe the actual linguistic relationship; the lower-case subscripts refine various special cases thereof -- e.g. "he has vitamin D deficiency".

--linas

Linas Vepstas

unread,
Apr 26, 2020, 4:50:09 PM4/26/20
to link-grammar, opencog
On Sun, Apr 26, 2020 at 1:30 PM Jack Park <jack...@topicquests.org> wrote:
I'm trying hard not to be an advocate of anything; rather, I'm in exploration mode.

Tell you what: I'd like to see (have not yet found) some solid examples of AtomSpace - full-on knowledge graph implementations, whatever. Something to be able to follow through one or two complete examples which are more than toy exercises.

The latest blog entry?  https://blog.opencog.org/

You're not going to find "complete examples" anywhere -- if you can't master the toy examples, a "complete example" will not make anything easier. That said -- this all is open source, so ---

The mozi biochem annotation code -- https://github.com/MOZI-AI/annotation-scheme
It handles datasets with some 10-50 million protein and gene encodings.  You would have to write to the developers to get sample datasets, I'm not sure if they are proprietary or what.

There used to be the old Hanson Robotics behavior code; it has fallen away, but it had a bunch of scripts for reacting, via facial expressions, to visual cues (people entering the room, making hand gestures). It implemented something called "behavior trees" (see wikipedia) -- what's left of it is here https://github.com/opencog/opencog/tree/master/opencog/eva but I'm not sure where the ROS integration has moved to. It used to be here: https://github.com/opencog/ros-behavior-scripting -- the repo is still there, but the contents were gutted and moved somewhere else, not sure where. Maybe some Hanson Robotics tree.  The robot control stuff was here: https://github.com/opencog/blender_api  -- with some effort, the original pipeline can be made to work again.

Somewhere there's an AIML import module: you can import everything that AIML does, and run it in a compatibility mode. The goal of doing this was to also attach vision and motor control. That never got far, because AIML sucked. AIML had maybe 50K or 100K stimulus-response pairs, so that was maybe under 1M atoms, not sure. Here: https://github.com/opencog/opencog/tree/master/opencog/nlp/aiml -- it worked in real time, well enough to get the robot to trade shows and on TV.

Then there was a switch to ChatScript, but ChatScript is far more complex than AIML, so they used something called "ghost" -- https://github.com/opencog/opencog/tree/master/opencog/ghost -- the ChatScript content also has about 20K-ish conversational interactions, gambits, responses, so maybe 200K atoms?? No clue, actually.  Also this: https://github.com/opencog/loving-ai-ghost  and this: https://github.com/opencog/ghost_bridge and this: https://github.com/opencog/loving-ai

The robot stuff and chatbots were abandoned for a variety of reasons. Partly because chatbots suck and partly because there's no money. But also because chatbots have little/nothing to do with the "hard problem" of AI, so it's boring to Ben, and to me, and to most anyone else working on this stuff. Again -- that's why there's a focus on learning.

That leaves the genomics/proteomics stuff, which is interesting because you can actually do data mining that other bio systems cannot do.

-- Linas


Jack Park

unread,
Apr 26, 2020, 6:20:43 PM4/26/20
to link-g...@googlegroups.com
It's going to be a while. The version on my ubuntu box is older than current; just swapping in that file from git in place of the old one and restarting changed nothing. I brought down the latest git (I built the old one from the zip file); I'm having to relearn how to build it -- seems I cannot recall how I did it before.

Jack Park

unread,
Apr 26, 2020, 6:25:33 PM4/26/20
to link-g...@googlegroups.com, opencog
I'll start with the blog post.
Thank you.

Jack Park

unread,
Apr 26, 2020, 8:36:32 PM4/26/20
to link-g...@googlegroups.com
Installing the latest LG repo pull on ubuntu 18.04, I get this error message:
mv: cannot move 'tmp-pp_lexer.c' to '../lex.yypost-process/pp_lexer.c': No such file or directory
CC post-process/pp_lexer.lo
gcc: error post-process/pp_lexer.c : no such file

I cannot copy the whole thing because copy and paste has gone south on this computer for reasons I do not understand.

I am running this from the install directory. There is a build directory.
On the hunch that it couldn't move owing to permissions, I did sudo make install (I'm not a unix jockey, what do I know?)

Thanks in advance for ideas.

Linas Vepstas

unread,
Apr 26, 2020, 8:53:50 PM4/26/20
to link-grammar
run `sudo make install` from the same directory (the build directory) as you ran `make` in.

--linas


Jack Park

unread,
Apr 26, 2020, 9:13:20 PM4/26/20
to link-g...@googlegroups.com
Ran make clean just to be sure - it appeared to run fine.
sudo make install got the same error immediately.

Possible issue: looking in config.status, I found some fatal errors; sorry, I must attach the file -- copy and paste is not even working in a Chrome browser.

g++: fatal error: unrecognized command-line option '-qversion'
apparently at configure line 4713

config.status

Linas Vepstas

unread,
Apr 26, 2020, 9:25:06 PM4/26/20
to link-grammar
I can't figure out what you're saying. It would be best to build link-grammar from a tarball, and copy the dictionaries as needed. To build from git, you need a variety of extra packages installed, including automake, autoconf, and the autoconf-archive macros .. and modern g++ compilers, and lex and yacc (or rather bison, the yacc substitute). Then you have to run ./autogen.sh to rebuild the configure file. For beginners, it's daunting, which is why I recommend the tarball.

I'm thinking that you may have out-of-date versions of some of these tools. Also, if you have an existing version of link-grammar installed, and it's quite old, it might interfere with the build of the newer one. It shouldn't, not really, but once luck starts going bad, this is something to check for.


Jack Park

unread,
Apr 26, 2020, 9:32:04 PM4/26/20
to link-g...@googlegroups.com
The first one was built from a tarball downloaded from GitHub. On my MacBook, it was built from a tarball. This is the first attempt to build from a pull. I'll try with a tarball.

Linas Vepstas

unread,
Apr 26, 2020, 9:34:55 PM4/26/20
to link-grammar
Do not use the tarballs from GitHub. They are broken. This is a stupid automatic service from GitHub that cannot be disabled. Use the tarballs from here: https://www.abisource.com/projects/link-grammar/

--linas

Jack Park

unread,
Apr 26, 2020, 9:37:31 PM4/26/20
to link-g...@googlegroups.com
That tarball will not have your patch to the dict -- I will have to bring that down from GitHub.

Jack Park

unread,
Apr 26, 2020, 9:55:38 PM4/26/20
to link-g...@googlegroups.com
Installed. Thanks for the help!

Jack Park

unread,
Apr 26, 2020, 9:58:55 PM4/26/20
to link-g...@googlegroups.com
My parse is still not like yours above; it still shows type as a verb, but it picked up the NM and followed that with an appropriate AN, so the parse is good. Now I must rerun all the other sentences to ensure nothing else changed. Thanks!

Linas Vepstas

unread,
Apr 26, 2020, 10:18:55 PM4/26/20
to link-grammar
That's odd .. the parse with type.i has a score of 
Linkage 1, cost vector = (UNUSED=0 DIS= 5.07 LEN=120)
and thus should be reported first, before the type.v one which reports
Linkage 2, cost vector = (UNUSED=0 DIS= 5.97 LEN=107)
... oh, right, the java bindings also weight the length in some way, that might explain it. ... you might want to disable that ...

At any rate, the parse with type.v is obviously wrong; it's garbage.

Very roughly, the cost is (minus the) logarithm of the probability.
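To put rough numbers on the two scores above -- assuming the log is natural, which is my assumption, not something stated here:

    cost ~ -log P
    P(type.i) / P(type.v) ~ exp(5.97 - 5.07) = exp(0.9) ~ 2.5

i.e. on that reading, the type.i parse is about two and a half times as likely as the type.v one.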

--linas

Jack Park

unread,
Apr 27, 2020, 10:43:09 AM4/27/20
to link-g...@googlegroups.com
It is not at all clear to me where one controls weights in the java code.

Linas Vepstas

unread,
Apr 27, 2020, 2:25:57 PM4/27/20
to link-grammar
On Mon, Apr 27, 2020 at 9:43 AM Jack Park <jack...@topicquests.org> wrote:
It is not at all clear to me where one controls weights in the java code.

Heh. I was wrong... the weighting is in the relex java code; there's none of that in the simple server.

OK, so now for a dive into technicalities. This is a large, complex sentence, and the dictionary allows a (very) large number of alternative parses. Each parse is ranked, and, in principle, the one with the lowest "cost" is the one most likely to be correct (i.e. is the parse that a human would interpret).

In this case, there are apparently 313120 linkages .. out of these, a random drawing of 1000 is made, and these are ranked and presented to you.  Every time you parse, you'll get a different random drawing.

* There's a flag to disable the random drawing, and to use a deterministic drawing instead.
* There's an option to change the 1000 to any other number (see the sketch below). If you change it to something larger than 313120, then all parses will be ranked, and the lowest-cost one will be returned first. This can take a bit more CPU time ... but it's not too bad, it seems, at least for this sentence.
* For this sentence, after looking at all possibilities, there are 8 that are tied for the best score, and many more that are not far away. They are all ... OK at the phrase level, but have objectionable parses overall. There's a fair amount of confusion about where the "and" links. So, for example, should it be a "pandemic of (obesity and fatty liver)" or should it be a "(pandemic of obesity) and (fatty liver)"? As humans, we know we want the former, not the latter. The parser, however, is not AGI, and it does not have enough smarts to distinguish between these two, and is offering up the wrong thing.

...  The only way I know of improving on this situation is via unsupervised training. But that is a long program with only partial results.
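Here's a sketch of the second bullet above, using the Python bindings. The ParseOptions keyword names (linkage_limit, max_parse_time) are from memory and should be checked against your installed bindings; Dictionary('en') assumes the standard English dictionary is installed.

    # Sketch: rank all candidate parses instead of a random sample of 1000.
    from linkgrammar import Dictionary, ParseOptions, Sentence

    po = ParseOptions(linkage_limit=400000,  # larger than the ~313120 linkages reported
                      max_parse_time=30)     # give up after 30 seconds, just in case
    sent = Sentence("The pandemic of obesity, type 2 diabetes mellitus (T2DM) "
                    "and nonalcoholic fatty liver disease (NAFLD) has frequently "
                    "been associated with dietary intake of saturated fats.",
                    Dictionary('en'), po)
    linkages = sent.parse()
    best = next(iter(linkages))              # linkages come back lowest-cost first
    print(best.diagram())

The Java/TCP path would need the equivalent option set on the server side; I'm only sketching the idea here.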

-- Linas.





Jack Park

unread,
Apr 27, 2020, 3:05:27 PM4/27/20
to link-g...@googlegroups.com
While I am somewhat liking the results I have gotten so far, I might be faced with choices. At this very moment, I got the parse I want, and it makes sense. On all the other sentences (I must go back and try them again with the latest and upgraded LG install) I got parses that were quite reasonable, some better than I was getting with spaCy.

For me, the work I must do is to make sense of these parses; my prime choice is whether to drop LG entirely and continue with spaCy, or to continue with the comparison.

I'll just do something else for a bit and let this one percolate.
Thanks


Linas Vepstas

unread,
Apr 27, 2020, 7:30:37 PM4/27/20
to link-grammar
If it helps you percolate: I presume the reason that you find the link-grammar link types challenging is that there are so many of them. They provide a lot of very detailed linguistic relationships, far beyond what anything else provides. If you want to simplify your life, you can "blur" these down to a smaller set -- just plain "noun modifiers" or just-plain "verb modifiers" or whatever.  Just discard the extra markup, bin them into a handful of relationships.

This is what relex did -- it just simplified everything down to the more-or-less-standard-ish set of dependencies that the original Stanford parser provided. There are maybe 20-ish of those, if I remember correctly.  I said bad things about relex, but if you don't need the firehose of detail, you might want to try that -- I think it works well-enough.

Even link-grammar itself provides a phrase-structure-grammar compatibility mode -- reducing everything to S, NP, VP, ADVP, ADJP and a handful of  others.

I've been avoiding these, because for me, blurring things down wasn't useful. Sometimes, you really do want to know if something is singular or plural, or acts as a predicative adjective or not, or whether someone is expressing an opinion about something else. The Stanford markup and the HPSG markup lose that information.

That said, I don't understand what you are doing, so maybe it's enough to just tag verbs and prepositional phrases and be done with it.  Certainly, in machine learning, there are often large benefits to bunching up data into coarse-grained bins -- it avoids over-fitting and smooths out noisy inputs.
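If you do go the blur-it-yourself route, the binning can be as crude as keeping the upper-case part of each link label (the lower-case subscripts are the fine detail, as noted earlier) and mapping that to a handful of relations. A Python sketch -- the COARSE table is an invented example, not a standard mapping:

    import re

    # Invented example mapping from LG link types to coarse relations.
    COARSE = {
        "S": "subject", "O": "object",
        "A": "noun-modifier", "AN": "noun-modifier",
        "E": "verb-modifier", "MV": "verb-modifier",
        "J": "prep-object",
    }

    def coarse_label(label):
        """'Ss*b' -> 'subject', 'MVp' -> 'verb-modifier', unknown -> 'other'."""
        major = re.match(r"[A-Z]+", label).group()   # strip lower-case subscripts
        while major and major not in COARSE:         # e.g. 'SJ' falls back to 'S'
            major = major[:-1]
        return COARSE.get(major, "other")

    print(coarse_label("Ss*b"))   # subject
    print(coarse_label("MVp"))    # verb-modifier
    print(coarse_label("Xca"))    # other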

--linas

Jack Park

unread,
Apr 27, 2020, 10:13:33 PM4/27/20
to link-g...@googlegroups.com
Thanks Linas.

My primary - first-level - goal is to pluck all the triples out of sentences. The sentence I used there, being conjunctive, has 3x2 = 6 of them, and in later processing I find there to be a couple of isAs which can be inferred following "pandemic", and acronyms following a couple of nouns. I'd be happy for now with reliably doing that.


Linas Vepstas

unread,
Apr 27, 2020, 10:38:02 PM4/27/20
to link-grammar
Hmm. OK, then in the general case, you only care about four links. WV points from the wall to the head verb -- so I guess that's the middle word of your triple. W points from the wall to the head noun (sometimes Q, for questions). S connects the head noun to the head verb (the "subject"), and O connects to the object. And that's it.
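A sketch of that recipe in Python, working over a list of (left-word, right-word, label) links -- the link-list input format is my assumption; adapt it to whatever your TCP/JSON client actually returns:

    # Pull (subject, verb, object) triples out of one linkage.
    def extract_triples(links):
        subjects, objects = {}, {}
        for left, right, label in links:
            major = "".join(ch for ch in label if ch.isupper())  # drop subscripts
            if major == "S":                 # S link: subject <-> verb
                subjects.setdefault(right, []).append(left)
            elif major == "O":               # O link: verb <-> object
                objects.setdefault(left, []).append(right)
        return [(subj, verb, obj)
                for verb in subjects
                for subj in subjects[verb]
                for obj in objects.get(verb, [])]

    # Links hand-copied from the earlier "he has type 2 diabetes" diagram:
    links = [
        ("LEFT-WALL", "has.v", "WV"),
        ("LEFT-WALL", "he", "Wd"),
        ("he", "has.v", "Ss"),
        ("has.v", "diabetes.n-u", "Ou"),
        ("type.i", "2", "NMn"),
        ("type.i", "diabetes.n-u", "AN"),
    ]
    print(extract_triples(links))   # [('he', 'has.v', 'diabetes.n-u')]

This only handles plain S and O links; conjunctions, passives, questions (SI, Q) and the modifier problem are exactly the complications discussed below.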

But triples... ooooff. The question-answering chatbot from days of yore started out with triples, and the lesson I learned there was that you can't simplify English down to triples. The first observation was that modifiers matter, and you can't really squeeze them into triples, not without lots of additional complexity. And then it slides downhill from there. Turns out that people make complex syntactic constructions for a reason: to actually discriminate ambiguous situations. Who wudda thunk?  That was the day my worldview changed entirely.

"The green ball is under the couch. The red ball is on the table.  Where is the green ball? "

"The blue rock is on the floor. The black cat sat on the mat on the table. Where is the white cat?"

-- Linas
