Parsing Based on Link-Grammars and SAT Solvers, unpublished draft paper.

Amirouche Boubekki

Aug 30, 2019, 3:05:39 PM8/30/19
to link-grammar
I am willing to try to port the Link Grammar SAT parser to Scheme.


I could not find a copy of P. Janičić, B. Goertzel, Parsing Based on Link-Grammars and SAT Solvers, unpublished draft paper. Can anyone share a copy with me please?

Which SAT solver do you recommend using?

Any advice welcome.

Linas Vepstas

Sep 2, 2019, 12:57:33 PM9/2/19
to link-grammar
Why?  What do you hope to accomplish?

Sample answers:
1) self-edification, entertainment, fun-n-games. Well, that's a reasonable answer, and I can't say no to that.

2) Performance. Well, since that PDF was written, we've learned a thing or two. First, the SAT solver was never faster than the "regular" LG parser, for short sentences. Since then, the LG parser has gotten faster .. much much faster. For short and long sentences. The most recent merged pull req, #996 ... well, I have not measured, but I suspect there's now parity.

3) scheme bindings. LG has lisp bindings, but no scheme bindings. The Lisp bindings are probably bit-rotted. New bindings are good.

4) Some kind of generalization.  At this time, LG is designed for, meant for UTF-8 text strings. There are some reasons why it should be generalized to something other than text strings, to be somehow made more general-purpose. But this has nothing to do with scheme ....

I suspect that you're shooting for #1, and indeed, SAT is interesting to study and understand, and its relationship to general compute problems is interesting. But, as something for the benefit of LG, the thrill has evaporated.

Now, interfacing a SAT solver onto the opencog atomspace ... that could be interesting ... that could be a wide-open interesting project. And, FYI, there exists working and maintained code that imports LG dictionaries into the AtomSpace, so the entire dictionary structure is there. But a native parser using that structure is not.  (Native=white-box; there is a black-box interface between LG and the atomspace, but to the atomspace, LG is a magic black box).

--linas




--
You received this message because you are subscribed to the Google Groups "link-grammar" group.
To unsubscribe from this group and stop receiving emails from it, send an email to link-grammar...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/link-grammar/c4a14618-1524-4daf-8e10-a74c9656a825%40googlegroups.com.


--
cassette tapes - analog TV - film cameras - you

Linas Vepstas

Sep 2, 2019, 1:40:39 PM9/2/19
to link-grammar
p.s. I just now tried to compare the SAT parser to the current 'regular' parser, and it appears that SAT is always slower.  On one real-world text, it is at least 10x slower, and at a minimum 2x slower on another, and basically, I got bored waiting so don't have numbers more accurate than that. It is perhaps time to remove the SAT parser, as it would appear that it offers no benefits of any kind, while also missing some useful features (e.g. null word support).

--linas

Amir Plivatsky

Sep 15, 2019, 5:42:19 AM9/15/19
to link-g...@googlegroups.com
On Friday, August 30, 2019 at 10:05:39 PM UTC+3, Amirouche Boubekki wrote:
I am willing to try to port the Link Grammar SAT parser to Scheme.


As Linas said, it is much better to just provide Scheme bindings.
This way you will be able to use the whole LG library from Scheme, including the SAT solver.
Implementing this is not hard.

Note that a SAT parser implemented directly in Scheme would be extremely slow... There is no point in doing that unless you would like to exercise Scheme (and writing Scheme bindings is a better Scheme exercise anyway, and surely more useful).
 

I could not find a copy of P. Janičić, B. Goertzel, Parsing Based on Link-Grammars and SAT Solvers, unpublished draft paper. Can anyone share a copy with me please?

From my experience, for understanding the internals of the SAT parser, the provided PDF is more than enough.

Which SAT solver do you recommend using?

In any case, an implementation of the SAT parser shouldn't depend on the particular SAT solver used. I tried linking the LG library with several of them, and all worked fine. Which one is faster varies depending on the exact release, and the speed differences were not big. In any case, the classic parser is now far faster. The SAT parser could use many of the classic parser's old and new performance enhancements, but unless someone helps me do so, this is not at the top of my LG TODO list, as it is better to first continue improving the performance of the classic parser.
 

Any advice welcome.

To sum up: Go for the Scheme bindings. For how to do it, see:  http://www.swig.org/Doc3.0/SWIGDocumentation.html#Guile
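For the curious, a minimal SWIG interface file for Guile bindings might look roughly like the sketch below. The module name "lg", the header path, and the assumption that wrapping the whole public header works cleanly are all my own illustrative guesses, not anything from the LG sources:

```swig
/* lg.i -- hypothetical minimal SWIG interface for Guile bindings.
   "lg" as the module name and the header path are assumptions. */
%module lg
%{
#include <link-grammar/link-includes.h>
%}
/* Let SWIG generate wrappers for everything the public header declares. */
%include <link-grammar/link-includes.h>
```

Running something like `swig -guile lg.i` would then generate wrapper C code to be compiled against Guile and linked with the LG library; real-world interface files usually need typemaps and selective `%include`s rather than wrapping the whole header.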

Amir

Amir Plivatsky

Sep 15, 2019, 6:08:18 AM9/15/19
to link-grammar
The current SAT parser is indeed much slower than the current classic parser.
In addition, the classic parser is going to be very much faster than it is now, both for short and long sentences. New releases with enhanced speed depend on how quickly I am able to convert my proof-of-concept performance demos into production code.

As I already said in my previous post here, the SAT parser may be made faster too. For example, I have a version that uses the power-pruning of the classic parser (in a modular way, without even touching the SAT parser code itself, a thing that looks impossible at first glance) for a significant speedup. A trivial change to the SAT parser code to then use nearest_word will make it even faster. And so on. I also have other fixes that make it faster; for example, a change in generating XOR relations makes it 2x faster. I have also started to introduce null-word support into it. But for now I don't have enough time to release these changes, as I'm concentrating on implementing "revolutionary" ideas for speeding up the classic parser.
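For readers unfamiliar with such encodings: mutual-exclusion constraints (each word uses exactly one disjunct, two conflicting links cannot both be chosen, and so on) are commonly expressed in CNF as one at-least-one clause plus pairwise at-most-one clauses. Whether this is exactly what the XOR-relation change above refers to is my guess; the snippet is a generic illustration only:

```python
# Generic "exactly one of these variables is true" CNF encoding, the kind
# of mutual-exclusion constraint a SAT encoding of parsing is full of.
# Variables are positive ints; a clause is a list of ints (negative = NOT).

def exactly_one(variables):
    """CNF clauses forcing exactly one of `variables` to be true."""
    clauses = [list(variables)]  # at least one of them
    for i in range(len(variables)):
        for j in range(i + 1, len(variables)):
            # at most one: never both v_i and v_j
            clauses.append([-variables[i], -variables[j]])
    return clauses

print(exactly_one([1, 2, 3]))
# [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```

The pairwise scheme is quadratic in the number of choices; real encoders often switch to commander or sequential encodings when the choice lists get long.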


On Monday, September 2, 2019 at 8:40:39 PM UTC+3, Linas Vepstas wrote:
p.s. I just now tried to compare the SAT parser to the current 'regular' parser, and it appears that SAT is always slower.  On one real-world text, it is at least 10x slower, and at a minimum 2x slower on another, and basically, I got bored waiting so don't have numbers more accurate than that. It is perhaps time to remove the SAT parser, as it would appear that it offers no benefits of any kind, while also missing some useful features (e.g. null word support).

--linas

On Mon, Sep 2, 2019 at 7:57 PM Linas Vepstas <linasv...@gmail.com> wrote:
Why?  What do you hope to accomplish?

Sample answers:
1) self-edification, entertainment, fun-n-games. Well, that's a reasonable answer, and I can't say no to that.

2) Performance. Well, since that PDF was written, we've learned a thing or two. First, the SAT solver was never faster than the "regular" LG parser, for short sentences. Since then, the LG parser has gotten faster .. much much faster. For short and long sentences. The most recent merged pull req, #996 ... well, I have not measured, but I suspect there's now parity.

3) scheme bindings. LG has lisp bindings, but no scheme bindings. The Lisp bindings are probably bit-rotted. New bindings are good.

4) Some kind of generalization.  At this time, LG is designed for, meant for UTF-8 text strings. There are some reasons why it should be generalized to something other than text strings, to be somehow made more general-purpose. But this has nothing to do with scheme ....

I will add (4) to my TODO list...
 

I suspect that you're shooting for #1, and indeed, SAT is interesting to study and understand, and its relationship to general compute problems is interesting. But, as something for the benefit of LG, the thrill has evaporated.

Now, interfacing a SAT solver onto the opencog atomspace ... that could be interesting ... that could be a wide-open interesting project. And, FYI, there exists working and maintained code that imports LG dictionaries into the AtomSpace, so the entire dictionary structure is there. But a native parser using that structure is not.  (Native=white-box; there is a black-box interface between LG and the atomspace, but to the atomspace, LG is a magic black box).

--linas



BTW, understanding the opencog framework is already on my TODO list, and I will then see how I can help with that.

Amir




Amirouche Boubekki

Sep 29, 2019, 4:08:02 PM9/29/19
to link-grammar
On Fri, Aug 30, 2019 at 9:05 PM Amirouche Boubekki
<amirouche...@gmail.com> wrote:
>
> I am willing to try to port the Link Grammar SAT parser to Scheme.
>
> I started reading the paper at https://raw.githubusercontent.com/opencog/link-grammar/master/link-grammar/sat-solver/sat-based-link-grammar-parser.pdf
>
> I could not find a copy of P. Janičić, B. Goertzel, Parsing Based on Link-Grammars and SAT Solvers, unpublished draft paper. Can anyone share a copy with me please?
>

>
> Any advice welcome.
>

To summarize:

a) According to both Amir and Linas, the SAT solver is slow, but there
is room for improvement.

b) Interfacing with the AtomSpace would be nice.

c) Building Scheme bindings (using SWIG) would be nice.

> Which SAT solver do you recommend using?

SAT solvers have more or less a common interface, so it will be easy to
swap implementations.
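That common-interface point can be made concrete: if the encoder produces plain clause lists in the DIMACS integer convention, any backend is interchangeable, from MiniSat down to the toy DPLL solver sketched below (an illustration of the interface, not production code):

```python
# A minimal DPLL SAT solver sketch. The point is the interface, not speed:
# clauses are lists of nonzero ints (DIMACS convention: -3 means "not x3"),
# so any backend (MiniSat, CryptoMiniSat, this toy) is interchangeable.

def solve(clauses, assignment=None):
    """Return a satisfying assignment as a set of true literals, or None."""
    if assignment is None:
        assignment = set()
    simplified = []
    for clause in clauses:
        if any(lit in assignment for lit in clause):
            continue  # clause already satisfied
        reduced = [lit for lit in clause if -lit not in assignment]
        if not reduced:
            return None  # clause became empty: conflict
        simplified.append(reduced)
    if not simplified:
        return assignment  # every clause satisfied
    # Unit propagation: a one-literal clause forces that literal.
    for clause in simplified:
        if len(clause) == 1:
            return solve(simplified, assignment | {clause[0]})
    # Branch on the first literal of the first remaining clause.
    lit = simplified[0][0]
    return (solve(simplified, assignment | {lit})
            or solve(simplified, assignment | {-lit}))

# Toy use: (x1 or x2) and (not x1 or x2) forces x2 to be true.
model = solve([[1, 2], [-1, 2]])
print(model is not None and 2 in model)  # True
```

Swapping in a serious solver then only means replacing the `solve` call, since the clause-list representation stays the same.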

My goal has not changed in 5 years! I want to create a mini-opencog
framework, in the spirit of Scheme, that builds abstractions on top of
powerful primitives such as SAT solvers.

Like we discussed previously, I think the mix of programming languages
(C, C++, Python, Scheme, Java, Haskell) is not helping people embrace
the power of opencog. So I will try to rely only on proven C libraries.
Link Grammar is such a beast. But:

a) As far as I understand, there are still some moving pieces in how LG
is implemented, and there are still improvements that could be made to
the core mechanics that parse natural languages. That is, not
everything about LG engineering is performance optimization. A
high-level language is a good candidate for experimenting with new
features in LG.

b) I would like to better understand the link grammar theory. A bit of
thinking, goofing around, and reading led me to the personal discovery
that LG is a very peculiar kind of software, because it relies upon a
programming language, namely the language in which dictionaries are
expressed, that is very broad and powerful (as others have noted). It
goes along with the idea of a Domain Specific Language, where one
builds a programming language to solve a particular task. I think LG
is the best example of a DSL I know. As such, it calls for more study,
experimentation, and understanding. Even if LG or a particular
dictionary, e.g. the English dictionary, is flawed (somehow?), it is a
significant software feat that I am sure will be taken as inspiration
in future human endeavours.

c) One area where the Link Grammar software will probably need to
improve is the ability to create, fix, extend, and improve the
dictionaries. That is, it needs a user interface and user experience
that looks better than notepad or the current REPL CLI tool called
link-parser. Alas, I have no better idea than a REPL of some sort, but
using voice... One thing that could improve the UI/UX of creating and
maintaining a dictionary is better integration with the AtomSpace and,
in general, with the end-user application. That is, it would be neat
to allow the user to fix the dictionary, something that is made
difficult by the current microservice-like setup of opencog. The point
is to have a quick feedback loop between LG, the knowledge base, and
the user. To make it clearer: the current unit tests are obviously a
good thing, and one should bring that feature, unit testing, into the
client. We could have ground-truth knowledge, similar to the expected
parse trees in the current unit tests, even taking into account
inferred knowledge based on parse trees, and eventually a version of
the dictionary that is checked as soon as the client user makes a
change to it. Basically, give the user access to some knobs that are
frozen right now, and give the user the tools necessary to make sure:
It Works (tm). The reason for this is two-sided: 1) for rare
languages, it should be possible to build an LG dictionary from the
user interface, probably with a bootstrap language like Lojban or
English; 2) I think the LG dictionary language (or something similar)
is a good candidate for inclusion in projects such as Wikidata, but
before that happens it must be possible to check the correctness of
changes, since it is possible to do so (unlike common sense,
encyclopedic knowledge, and so-called lexicographic knowledge, which
are ground truth).

d) As I tried to explain above, I prefer ease of coding. Fast
programs, speedy processors, et al. have proven numerous times in
recent years to be false friends. So, as Rob Pike might have said:
"make it work, then make it fast".

My understanding is that GOFAI has nightmares about slowness. As I
tried to explain somewhere else, there are workarounds for slow
processes, like a) lazy algorithms or beam search, b) probabilistic
models, c) a slow overall workflow.

The last point is interesting: my idea is that the problems that AGI
needs to solve are big and slow, and sometimes not even advancing at
all when humans try to tackle them. So, if the computer is "slow"
compared to ordering a pizza, the user will be thankful even if it
replies with a message saying: need more input.

--
Amirouche ~ amz3 ~ https://hyper.dev

Linas Vepstas

Sep 29, 2019, 9:13:12 PM9/29/19
to link-grammar, opencog, lang-learn
CC'ing the opencog mailing list.

On Sun, Sep 29, 2019 at 3:08 PM Amirouche Boubekki <amirouche...@gmail.com> wrote:

My goal has not changed since 5 years! I want to create a mini-opencog
framework. In the spirit of Scheme that builds abstractions on top of
powerful primitives, such as a SAT solvers.

One very interesting abstraction on top of SAT is "answer set programming" (ASP). Now, ASP looks a lot like Prolog, except that ASP solvers are orders of magnitude faster than traditional Prolog, because they replace the stack-based backwards/forwards chainers with the SAT algo.

I would love to see an atomspace interface into an ASP solver.  In this view, ASP would look like a crisp-logic variant of PLN.  We could even approximate PLN by taking a statistical average of hundreds of ASP solutions. It might even be faster than the current rule-engine backwards/forwards chainers, for the same reason that ASP/SAT is faster than traditional prolog backward/forward chaining.
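The "statistical average of hundreds of solutions" idea can be sketched without any ASP machinery at all: enumerate the models of a (tiny) clause set by brute force and read off the fraction of models in which each variable holds. The clause set below and the reading of that fraction as a crude probability are illustrative assumptions, not PLN semantics:

```python
# Sketch of the "average over many crisp solutions" idea: enumerate every
# model of a small clause set by brute force, then read off the fraction
# of models in which a variable is true, as a crude probability estimate.
from itertools import product

def all_models(clauses, nvars):
    """Yield every total assignment (tuple of bools, index i = var i+1)
    satisfying all clauses (DIMACS-style integer literals)."""
    for bits in product([False, True], repeat=nvars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            yield bits

# (x1 or x2) and (not x1 or not x2): exactly one of x1, x2 holds.
clauses = [[1, 2], [-1, -2]]
models = list(all_models(clauses, 2))
p_x1 = sum(m[0] for m in models) / len(models)
print(len(models), p_x1)  # 2 models; x1 true in half of them -> 0.5
```

A real system would of course sample models from an ASP solver instead of enumerating all assignments, which is exponential in the number of variables.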


Like we discussed previously, I think the mix of programming languages
(C, C++, Python, Scheme, Java, Haskell) is not helping people embrace
the power of opencog. So I will try to rely only on proven C libraries.
Link Grammar is such a beast. But:

a) As far as I understand, there are still some moving pieces in how LG
is implemented, and there are still improvements that could be made to
the core mechanics that parse natural languages. That is, not
everything about LG engineering is performance optimization. A
high-level language is a good candidate for experimenting with new
features in LG.

Yes, but ... Although LG was originally developed to "parse natural language", what it is actually doing is to parse any linear (time-ordered) sequence of tokens, to extract structural relationships between those tokens.  With Amir's work on the tokenizer, it can even discover different, competing (contradictory) token boundaries, splitting an input stream in different ways.

There are possible enhancements to LG, such as "multi-color parsing" or "link crossing" but discussing these properly is beyond the scope of this email.

At the level of this email, LG really is more-or-less "feature complete" and nothing to be done except to performance tune.  Yet, also, as you suggest, we could build better tools on top of it...


b) I would like to better understand the link grammar theory. A bit of
thinking, goofing around, and reading led me to the personal discovery
that LG is a very peculiar kind of software, because it relies upon a
programming language, namely the language in which dictionaries are
expressed, that is very broad and powerful (as others have noted). It
goes along with the idea of a Domain Specific Language,

The theory is that of jigsaw-puzzle pieces, as described in the very first paper. If you knew a lot more category theory, you would recognize these as monoidal categories.  For example, LG resembles "pregroup grammars" (see wikipedia) and this is no accident: the same theory describes both, more or less.

The current text-file dictionaries in LG are a kind of DSL, but that's misleading, because any system that can describe a monoidal category in a type-theoretic way can be handled by LG.  So there's a whole class of DSL's that you could layer on top of LG.
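To make the jigsaw-puzzle intuition concrete, here is a toy sketch. The three-word dictionary and the restriction to chains of adjacent links are my own invented illustration, nothing like the real disjunct syntax or planar-linkage machinery of LG:

```python
# A toy of the jigsaw-piece idea: each word carries disjuncts, i.e. choices
# of (left-connector, right-connector) pairs that must snap into matching
# connectors on neighbouring words. This only handles chains of adjacent
# links, not the full planar linkages of real Link Grammar.
TOY_DICT = {
    "the":  [(None, "D")],   # determiner: offers a D connector to the right
    "cat":  [("D", "S")],    # noun: needs D on its left, offers S rightward
    "runs": [("S", None)],   # verb: needs a subject link S on its left
}

def chain_links(words):
    """True if some choice of one disjunct per word snaps the chain shut."""
    def go(i, needed_left):
        if i == len(words):
            return needed_left is None  # no dangling right connector
        for left, right in TOY_DICT[words[i]]:
            if left == needed_left and go(i + 1, right):
                return True
        return False
    return go(0, None)

print(chain_links(["the", "cat", "runs"]))   # True
print(chain_links(["cat", "the", "runs"]))   # False
```

Nothing here says the tokens must be English words: any time-ordered token stream with a connector dictionary works, which is exactly the generality point made above.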
 
where one builds a programming language to solve a particular task. I
think LG is the best example of a DSL I know. As such, it calls for
more study, experimentation, and understanding. Even if LG or a
particular dictionary, e.g. the English dictionary, is flawed
(somehow?), it is a significant software feat that I am sure will be
taken as inspiration in future human endeavours.

The "flaw" is that the English language cannot be described by a small ruleset, no matter what DSL you would care to use. This is the general lesson of linguistics: there are dozens of linguistic theories, and hundreds of variants; they are all appealing for various reasons, and all adept at demonstrating various linguistic phenomena, but as soon as you try to craft "the rules of English" by hand, the number of rules grows exponentially.  Pick some theory of language, I don't care which, and a few dozen rules will capture simplistic English. A few hundred rules will give reasonable accuracy at some elementary-school reading level; you need thousands of rules to begin to get acceptable quality on newspaper English, and tens of thousands of rules to start getting broad coverage (science literature, tweets, 18th-century English, etc.). And tens of thousands of rules can no longer be managed by hand.
 

c) One area where the Link Grammar software will probably need to
improve is the ability to create, fix, extend, and improve the
dictionaries. That is, it needs a user interface and user experience
that looks better than notepad

My core claim is that even if you invent some really cool-looking DSL for describing a monoidal category in a type theoretic way, and also developed a nice GUI for it, you would still be faced with the need to maintain thousands or tens of thousands of rules. At least, for natural language.

If you wanted to use LG to parse *some other* time series -- I dunno, something from biochemistry or some network-hacking trace log or whatever, something that could be described by only dozens or hundreds of rules, then yes, a GUI would be great to have.

However, all linguists who have attempted to build any kind of GUI for any kind of parser for any natural language -- they all hit a wall.  Maintaining tens of thousands of rules is just too hard.
 
or the current REPL CLI tool called link-parser. Alas, I have no
better idea than a REPL of some sort, but using voice... One thing
that could improve the UI/UX of creating and maintaining a dictionary
is better integration with the AtomSpace and, in general, with the
end-user application.

This veers in a different direction.  Roughly speaking, the atomspace is "just like" any other database. Have you ever seen a nice UI/UX for the maintenance of a database? Gee golly, well, why not?  Because databases are too abstract to mash into a GUI or UI/UX. Database tables can be anything at all. There's no way to UI/UX that, (for the same reason that no general-purpose programming language has a GUI) (I mean, LEGO Mindstorms is almost a GUI, and COBOL is almost a GUI, but once you understand either of these, you promptly realize that a plain-old text editor is just faster and easier.)
 
That is, it would be neat to allow the user to fix the dictionary,
something that is made difficult by the current microservice-like
setup of opencog. The point is to have a quick feedback loop between
LG, the knowledge base, and the user. To make it clearer: the current
unit tests are obviously a good thing, and one should bring that
feature, unit testing, into the client. We could have ground-truth
knowledge, similar to the expected parse trees in the current unit
tests, even taking into account inferred knowledge based on parse
trees, and eventually a version of the dictionary that is checked as
soon as the client user makes a change to it.

This is impossible. The current LG test sets contain about 7K test sentences, and it is impossible to fix one without breaking something else. The best you can do is to fix more than what you break.

The current English dict has approximately 2K rules in it, and it's impossible to make changes to one of them without carefully understanding most of the others.

This is not like ordinary software.  Human natural language is much much messier than any software program.

The goal of the language-learning project is to automatically learn the rules of any natural language, given a sampling of its corpus.

The point being that humans should not be writing these dictionaries. Machines should be. 

-- linas
