Re: [Link Grammar] Choice of language for Link Grammar?

Linas Vepstas

Oct 2, 2021, 4:48:26 PM
to link-grammar, opencog
Hi Calvin,

On Sat, Oct 2, 2021 at 9:31 AM Calvin Irby <calvin...@gmail.com> wrote:
>
> Hello Link Grammar Community,
>
> I was just wondering about the way Link Grammar is written. I know it has changed a lot over time and support for various other languages has been added. But why was Link Grammar written in C?

Because neither Python nor Java existed when Link Grammar was created
-- version 1.0 came out in 1991, so it's celebrating its 30th
anniversary now.

> Is it because C is fast, and natural language processing requires the C language so that calculations are done as fast as possible? If that's the case, where would that leave other high level languages? Could Link Grammar ever be rewritten in a language like Python?

Yes, it could be. Any system can be rewritten in any (Turing complete)
language whatsoever. Most programming languages are Turing complete;
the few exceptions are deliberately not. Regexes, for example, describe
finite state machines only.

For the link parser, Python would be a poor choice, for several
reasons. (1) Good god, why? The code already exists, it's debugged, it
works. (2) It would be 10x or 20x slower. Many of the structures
inside of LG are hand-tuned to fit inside the cache line of a typical
modern CPU, and a fair chunk of its performance comes from this kind
of tuning. This tuning is (literally) impossible in Python and Java.
(3) If you do want to create something new, well -- I'm developing a
theory and an infrastructure that is Link-Grammar-like, but is more
general: I believe it will work for audio and video (and other things,
too) but the parser will remain very link-grammar-like. See the
READMEs in https://github.com/opencog/learn and the documentation in
https://github.com/opencog/atomspace/sheaf for more details.
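
To make the cache-line point in (2) concrete, here is a minimal C++ sketch of that kind of layout control. It is not taken from the Link Grammar sources: the 64-byte line size is an assumption (common on current x86-64 hardware, but not universal), and `PackedConnector`, its fields, and `fits_in_one_line` are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Check the target CPU before relying on it.
constexpr std::size_t kCacheLine = 64;

// Hypothetical record, NOT the real Link Grammar layout. alignas() forces
// each instance to start on a cache-line boundary, and the compiler pads
// the struct so that sizeof is a multiple of that alignment.
struct alignas(kCacheLine) PackedConnector {
    std::uint64_t label_hash;     // 8 bytes
    std::uint32_t word_index;     // 4 bytes
    std::uint16_t nearest_word;   // 2 bytes
    std::uint16_t farthest_word;  // 2 bytes
    std::uint8_t  flags;          // 1 byte; the rest is padding
};

// Compile-time guards: adding a field that overflows one line breaks the build.
static_assert(sizeof(PackedConnector) == kCacheLine,
              "PackedConnector must occupy exactly one cache line");
static_assert(alignof(PackedConnector) == kCacheLine,
              "PackedConnector must start on a cache-line boundary");

// True if the byte range [p, p + size) lies within a single cache line.
inline bool fits_in_one_line(const void* p, std::size_t size) {
    assert(size > 0);
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return (addr / kCacheLine) == ((addr + size - 1) / kCacheLine);
}
```

The `static_assert` lines are the useful part: they turn a layout assumption into a build failure, which is a guarantee that Python and Java do not let you express about object layout.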

> It just seems like High Level Languages like Java and Python make it difficult because developers never seem to learn all of the language and are always fighting an uphill battle just trying to understand all of a language's ecosystem.

Python is the Visual Basic of the 21st century. A lot of people who
don't know how to program, who are terrible programmers, write code in
Python. It's OK to keep them at arm's length.

The Java infrastructure is daunting. There are practical issues:
requiring 16GB of RAM for infrastructure just to write and run "hello
world" is more than I can handle. I spent years coding in Java, and
came to dislike it. It's very verbose -- you have to write 5-10 lines
of code to do just the simplest things.

Much much more important is the theory: It is very difficult to write
abstract, recursive algorithms in Java. The entire idea of
object-orientation fights against the idea of recursive data
structures. Things like Link Grammar, and the AtomSpace are all about
recursive data structures. It would be miserable to have to work with
these in Java. It's like .. I dunno, wearing pants for a shirt.
Wearing a shirt for pants. Shirts and pants are great, but you have to
wear them on the correct part of your body. The shape of recursive
data structures simply does not fit conventional OO languages - not
just Java, but also JavaScript (although that is for a very different
reason, having to do with named data structure locations, as opposed
to anonymous ones).
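
For readers unsure what "recursive data structure" means here, a minimal C++ sketch follows. It is only an illustration: `Expr`, `depth`, and `to_sexpr` are hypothetical names, the node layout is not how Link Grammar or the AtomSpace store anything, and the code assumes C++17 (for a `std::vector` of the type being defined).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical nested-expression node: a label plus any number of
// sub-expressions of the same type. The type refers to itself.
struct Expr {
    std::string label;
    std::vector<Expr> children;  // assumes C++17
};

// Depth of the tree; a node with no children has depth 1.
std::size_t depth(const Expr& e) {
    std::size_t deepest = 0;
    for (const Expr& child : e.children) {
        deepest = std::max(deepest, depth(child));
    }
    return deepest + 1;
}

// Render the tree as an S-expression, e.g. "(S (NP the cat) (VP sat))".
std::string to_sexpr(const Expr& e) {
    if (e.children.empty()) {
        return e.label;
    }
    std::string out = "(" + e.label;
    for (const Expr& child : e.children) {
        out += " " + to_sexpr(child);
    }
    return out + ")";
}
```

Every function over such a type is itself recursive, and the nodes are anonymous values rather than named objects.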

This is why most of my code is in Scheme, these days. It's not because
Scheme is a great language. Actually, it kind of sucks - it's hard to
use, hard to understand, difficult to read, difficult to modify. But
it fits the data structures much much better. ***For this particular
project*** (and not other projects!) one line of Scheme does more than
20 lines of contorted, complexticated Java or Python boilerplate.
It's all about the data structure.

> Not only that, but my prediction is that the C language is a safe bet now for most projects as AI and ML techniques will soon be taking over the development world. We are already seeing a movement for "No code" and Tim O'Reilly even commented recently that "The Golden Age of the Programmer is over". Is C going to be the surviving dinosaur language that will continue to thrive in the future while all the other various high level languages die out to the technologies that big corporations are developing? Google and Microsoft seem to be on a race to develop AI that helps out developers.

You're talking about developers who are inexperienced -- the "Visual
Basic" kind of thinkers -- the "Power Users". The people who were
crappy programmers to begin with. Sure, AI will allow them to evolve
into better crappy programmers, or not be programmers at all.

-- Linas

--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.

Jacques Basaldúa

Oct 3, 2021, 4:28:32 AM
to link-grammar, opencog
Hi,

If you want to use LinkGrammar in **any** language a very simple way to go is: refactor it as a single class in C++ and use swig to wrap it to any language.

class LinkGrammar {

public:

    // const char * so that string literals can be passed (required since C++11).
    LinkGrammar (const char *dict_name, const char *pp_name, const char *cons_name, const char *affix_name, int a_verbosity = -1);
    LinkGrammar (int a_verbosity = -1) : LinkGrammar("4.0.dict", "4.0.knowledge", "4.0.constituent-knowledge", "4.0.affix", a_verbosity) {}
    ~LinkGrammar ();

    // Includes the dictionary load, check .dict == NULL for error.
    Dictionary dict;

    // Native API (uses Link Grammar types)
    Sentence sentence_create (char *input_string);
    int sentence_parse (Sentence sent);
    Linkage linkage_create (int k, Sentence sent);
    char *linkage_sprint_diagram (Linkage linkage);
    void string_delete (char *p);
    char *linkage_sprint_constituents (Linkage linkage);
    void sentence_delete (Sentence sent);
};

Then, use this http://www.swig.org/Doc1.3/Python.html to build a Python wrapper and there you are. Note that anything that works well in Python (including its own core library, numpy, scipy, pytorch, etc.) is written in C.

I think that was done at some time in LG, but possibly abandoned.

Personally, I am very interested in the dictionary itself and not so much in the solver that comes with LinkGrammar. 

The reason why is: I do not like forcing the sentence to be such an important part of the language. It's like a self contained whole, but it is still unrelated with the previous or the next sentence. If the sentence is "ungrammatical" we throw it out completely.
I consider language much more of a flow and I am interested in how words fit together at a smaller scale and possibly that becomes even larger than a sentence. It is like a tree search. Avoiding the idea that there is just one correct solution.

See how beautifully described the links are: https://www.link.cs.cmu.edu/link/dict/section-CC.html (The index is here https://www.link.cs.cmu.edu/link/dict/index.html)

I say this to just throw in my idea:

Why not make it open ended? Make the dictionary a well documented, exportable as a single file, resource that explains all the possible relations in the English language in text form. There are just around 100 links in all of English and let the user imagine how to use that to "crack language" in an explainable way. I am a huge fan since I first saw it (in this mailing list), and I find understanding the dictionary very hard and when it is used, it is full of hacks to fit special cases rather than just letting the dictionary be the reference and let the AI figure out from the context what applies best where.

Just my 2 cents.

Jacques.






--
You received this message because you are subscribed to the Google Groups "link-grammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to link-grammar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/link-grammar/CAHrUA36Pa39%3D-56sjqPdD2OxoP7R800rxrz%3DwK_%3DuDEtMYV%3DWQ%40mail.gmail.com.

Linas Vepstas

Oct 3, 2021, 2:52:49 PM
to link-grammar, opencog
Hi Jacques,

I'm glad to hear from you ... I wrote a long reply ... it may seem
stringent but there has been a lot happening in the last 10 years of
the link-grammar project that you are not aware of ... much of what
you mention is in the process of being addressed.

-- linas

On Sun, Oct 3, 2021 at 3:28 AM Jacques Basaldúa
<jacques...@gmail.com> wrote:
>
> Hi,
>
> If you want to use LinkGrammar in **any** language a very simple way to go is: refactor it as a single class in C++ and use swig to wrap it to any language.

That is exactly how it is done for 3 or 4 of the current language
bindings, including python.

> Personally, I am very interested in the dictionary itself and not so much in the solver that comes with LinkGrammar.

In case you aren't aware of this, I'm working very hard on
automatically creating dictionaries for any language. This includes
the language of shapes and sounds! The project is at
https://github.com/opencog/learn -- and now to be realistic: so far,
I've been able to put together some demos for English, only. Even
those are a bit shaky, I'm currently redesigning internals. Work on
sound/vision has not begun; basic DSP/GPU libraries need to be wrapped
first.

> The reason why is: I do not like forcing the sentence to be such an important part of the language. It's like a self contained whole, but it is still unrelated with the previous or the next sentence.

Yes, this is a short-coming with the current design. Although -- be
aware that the current design can correctly parse 2-3-4 sentences at
a time. After that, I think it gets a bit slow (? maybe?), since it
needs to explore the locations of sentence boundaries. After that,
the printed output is hard to read, so most people don't do this.

There is a need for an incremental parser, and there's some work on
this that has been done.

> I consider language much more of a flow and I am interested in how words fit together at a smaller scale and possibly that becomes even larger than a sentence.

Sure, except one of the interesting possibilities is anaphora resolution...

> It is like a tree search. Avoiding the idea that there is just one correct solution.

Link grammar does not assume "one correct solution" -- it gives you a
ranked list of possibilities, from most likely to less-likely.
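
As a rough picture of what "a ranked list" means in practice, here is a small C++ sketch. It does not use the Link Grammar API: `CandidateParse`, `rank_parses`, and `most_likely` are hypothetical names, and a single numeric cost where lower means more likely is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one parse of a sentence.
struct CandidateParse {
    std::string description;  // e.g. a printed diagram
    double cost;              // assumption: lower cost = more likely
};

// Return the candidates ordered from most likely to least likely.
// stable_sort keeps the original order among equal-cost candidates.
std::vector<CandidateParse> rank_parses(std::vector<CandidateParse> parses) {
    std::stable_sort(parses.begin(), parses.end(),
                     [](const CandidateParse& a, const CandidateParse& b) {
                         return a.cost < b.cost;
                     });
    return parses;
}

// The top-ranked candidate; the caller remains free to look further down.
const CandidateParse& most_likely(const std::vector<CandidateParse>& ranked) {
    assert(!ranked.empty());
    return ranked.front();
}
```

The point of returning the whole list rather than only the front element is that a later stage, with more context, can still pick the second or third candidate.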

> See how beautifully described the links are: https://www.link.cs.cmu.edu/link/dict/section-CC.html (The index is here https://www.link.cs.cmu.edu/link/dict/index.html)

HEY PAY ATTENTION! That documentation is 15 years out of date! The
current documentation is located at
https://www.abisource.com/projects/link-grammar/dict/

In particular, the CC link is obsolete, and was removed five years
ago! If you are using the CMU dictionaries, you are missing fifteen
years of changes, updates, improvements. The current dictionaries
cover a far far larger subset of English. The current parser is
literally 100x faster.

>
> Why not make it open ended?

Yes, there is extensive work in OpenCog to do exactly that.

> Make the dictionary a well documented, exportable as a single file,

One of the additions, made about 7 years ago, allows you to store the
dictionary in an sqlite3 database. This is, in fact, not ideal, but
... it does give you a "single file".

Note also we have dictionaries for Russian, and partial dictionaries
for Farsi, Arabic, German, and demos of another half-dozen languages.

> resource that explains all the possible relations in the English language in text form. There are just around 100 links in all of English and let the user imagine how to use that to "crack language" in an explainable way. I am a huge fan since I first saw it (in this mailing list), and I find understanding the dictionary very hard and when it is used, it is full of hacks to fit special cases rather than just letting the dictionary be the reference and let the AI figure out from the context what applies best where.

Yes. I will be giving a talk in two weeks on "explainable patterns"
which uses LG as a central inspiration for how one can learn new
things in an unsupervised setting, and, unlike neural nets, have
access to the symbolic internals so that you can actually understand
what was learned.

Again, the goal of the https://github.com/opencog/learn project is to
do this not just for English, but for any language, modern or extinct,
and also for sound and vision, or, more generally, anything in the
noosphere for which you have a perception interface.

-- linas

Jacques Basaldúa

Oct 3, 2021, 3:11:07 PM
to link-grammar, opencog
Hi Anton and Linas,

Thank you both for your long answers. 

1. I signed up for the INLP Workshop and will follow your presentations with great interest.
2. Linas, I did not know about your https://github.com/opencog/learn project and will have a look at that too.
3. My bad, I was sloppy not checking for the latest version, but I am just in the process of understanding it, I can easily switch to the latest version.
4. Happy to know that you have similar ideas for OpenCog. When I am there too, (my project goes slowly and I have other things first), I will keep you updated and, hopefully, some of the things I do may be useful for OpenCog too.

Jacques.

Linas Vepstas

Oct 3, 2021, 3:43:19 PM
to link-grammar, opencog
Hi Jacques!

On Sun, Oct 3, 2021 at 2:11 PM Jacques Basaldúa
<jacques...@gmail.com> wrote:
>
> 3. My bad, I was sloppy not checking for the latest version, but I am just in the process of understanding it, I can easily switch to the latest version.

Ah ha! Well then -- a completely different set of remarks is in order!

So, first -- the older English dict is smaller and simpler, and
therefore might be easier to understand. Yes, it is quite daunting at
first, but if you compare the dictionary to the resulting parses, it
becomes clear how things work. The parser provides some useful tools
for this exploration:

  !disjuncts    Display of disjuncts used
  !links        Display of complete link data
  !!<string>    Print all the dictionary words that match <string>.
  !help
  !var          List user-settable variables and their functions.

Please use the new parser (even if you use the old dictionaries).

In the end, understanding the dict is not really all that hard,
although it is confusing at first.

For all the other projects ... well, there's a lot more to say, and
it's all very complicated, so I'll leave it at that.

If you have questions about link-grammar, how it works, if you're
confused about something .. just write! I'll try to provide a
friendly answer.