Re: [Link Grammar] Parsing Based on Link-Grammars and SAT Solvers, unpublished draft paper.


Linas Vepstas

Sep 29, 2019, 9:13:12 PM
to link-grammar, opencog, lang-learn
CC'ing the opencog mailing list.

On Sun, Sep 29, 2019 at 3:08 PM Amirouche Boubekki <amirouche...@gmail.com> wrote:

My goal has not changed in 5 years! I want to create a mini-opencog
framework, in the spirit of Scheme, that builds abstractions on top of
powerful primitives, such as SAT solvers.

One very interesting abstraction on top of SAT is "answer set programming" (ASP). Now, ASP looks a lot like prolog, except that ASP solvers are orders of magnitude faster than traditional prolog, because they replace the stack-based backwards/forwards chainers by the SAT algo.

I would love to see an atomspace interface into an ASP solver.  In this view, ASP would look like a crisp-logic variant of PLN.  We could even approximate PLN by taking a statistical average of hundreds of ASP solutions. It might even be faster than the current rule-engine backwards/forwards chainers, for the same reason that ASP/SAT is faster than traditional prolog backward/forward chaining.
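The "statistical average of hundreds of solutions" idea can be illustrated with a toy sketch. The following pure-Python brute-force model enumerator is a stand-in for a real SAT/ASP solver (such as clingo); the clause encoding and the tiny knowledge base are made up purely for illustration:

```python
from itertools import product

def models(num_vars, clauses):
    """Enumerate all satisfying assignments (models) of a CNF formula.
    A clause is a list of signed ints: 2 means x2 is true, -2 means x2 is false."""
    for bits in product([False, True], repeat=num_vars):
        assign = {i + 1: bits[i] for i in range(num_vars)}
        if all(any(assign[abs(l)] == (l > 0) for l in clause) for clause in clauses):
            yield assign

# Toy knowledge base: x1 -> x2 (i.e. -x1 v x2), and x1 v x3.
clauses = [[-1, 2], [1, 3]]
ms = list(models(3, clauses))

# Crisp-to-fuzzy: the fraction of models in which x2 holds acts as a
# crude "truth value", in the spirit of averaging many ASP answer sets.
p_x2 = sum(m[2] for m in ms) / len(ms)
print(len(ms), p_x2)  # -> 4 0.75
```

A real ASP solver enumerates answer sets far more efficiently than this exhaustive loop, but the averaging step would look the same.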


As we discussed previously, I think the mix of programming languages
(C, C++, Python, Scheme, Java, Haskell) is not helping people embrace the
power of opencog. So I will try to rely only on proven C libraries. Link
Grammar is such a beast. But:

a) As far as I understand, there are still some moving pieces in how LG is
implemented, and improvements could still be made to the core mechanics
that parse natural languages. That is, not everything in LG engineering
is performance optimization. A high-level language is a good candidate
for experimenting with new features in LG.

Yes, but ... Although LG was originally developed to "parse natural language", what it actually does is parse any linear (time-ordered) sequence of tokens, extracting structural relationships between those tokens.  With Amir's work on the tokenizer, it can even discover different, competing (contradictory) token boundaries, splitting an input stream in different ways.

There are possible enhancements to LG, such as "multi-color parsing" or "link crossing" but discussing these properly is beyond the scope of this email.

At the level of this email, LG really is more-or-less "feature complete", with nothing to be done except performance tuning.  Yet, also, as you suggest, we could build better tools on top of it...
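To make the "structural relationships between tokens" idea concrete, here is a deliberately tiny sketch of LG-style connector matching. The three-word dictionary is made up, and it only tries links between adjacent words, whereas a real LG parse allows links at any distance (subject to no-crossing constraints):

```python
# Toy jigsaw-piece matching: an "X+" connector on a word links to an
# "X-" connector on a word to its right. The dictionary is hypothetical.
DICT = {
    "the":    ["D+"],
    "cat":    ["D-", "S+"],
    "sleeps": ["S-"],
}

def linkage(words):
    """Return the links formed between adjacent words, or None if any
    connector is left unsatisfied (a toy stand-in for a full LG parse)."""
    unused = [list(DICT[w]) for w in words]
    links = []
    for i in range(len(words) - 1):
        for c in list(unused[i]):
            if c.endswith("+") and c[:-1] + "-" in unused[i + 1]:
                unused[i].remove(c)
                unused[i + 1].remove(c[:-1] + "-")
                links.append((words[i], c[:-1], words[i + 1]))
    if any(unused):  # some jigsaw tab was left dangling
        return None
    return links

print(linkage(["the", "cat", "sleeps"]))
# -> [('the', 'D', 'cat'), ('cat', 'S', 'sleeps')]
```

Scrambling the word order leaves connectors unsatisfied, so `linkage(["cat", "the", "sleeps"])` returns None; that is the whole trick, scaled up enormously in the real parser.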


b) I would like to better understand link-grammar theory. A bit of
thinking, goofing around and reading led me to the personal discovery
that LG is a very peculiar kind of software, because it relies upon a
programming language, namely the language in which the dictionaries
are expressed, that is very broad and powerful (as others have
noted). It follows the idea of a Domain Specific Language,

The theory is that of jigsaw-puzzle pieces, as described in the very first paper. If you knew a lot more category theory, you would recognize these as monoidal categories.  For example, LG resembles "pregroup grammars" (see Wikipedia) and this is no accident: the same theory describes both, more or less.

The current text-file dictionaries in LG are a kind of DSL, but that's misleading, because any system that can describe a monoidal category in a type-theoretic way can be handled by LG.  So there's a whole class of DSL's that you could layer on top of LG.
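For the curious, the pregroup connection can be sketched in a few lines. The mini-lexicon below is hypothetical, and real pregroup grammars need iterated adjoints and more machinery; this only shows the basic cancellation that does the work:

```python
# A minimal sketch of pregroup-style type reduction. Types are lists of
# simple types; "n.r" stands for the right adjoint of "n", "n.l" for the
# left adjoint. Adjacent pairs x . x.r and x.l . x cancel, and a sentence
# is grammatical if everything reduces to ["s"]. Lexicon is made up.
TYPES = {
    "the":    ["n", "n.l"],   # determiner: n . n^l
    "cat":    ["n"],          # noun
    "sleeps": ["n.r", "s"],   # intransitive verb: n^r . s
}

def reduce_types(ts):
    """Repeatedly cancel adjacent adjoint pairs until no rule applies."""
    ts = list(ts)
    changed = True
    while changed:
        changed = False
        for i in range(len(ts) - 1):
            a, b = ts[i], ts[i + 1]
            if b == a + ".r" or a == b + ".l":
                del ts[i:i + 2]
                changed = True
                break
    return ts

sentence = sum((TYPES[w] for w in ["the", "cat", "sleeps"]), [])
print(reduce_types(sentence))  # -> ['s']
```

The cancellations here play exactly the role of the "+"/"-" connector matches in an LG dictionary, which is the sense in which the same monoidal-category theory describes both.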
 
where one
builds a programming language to solve a particular task. I think LG
is the best example of a DSL that I know of. As such, it calls for more
study, experimentation and understanding. Even if LG or a particular
dictionary, e.g. the English dictionary, is flawed (somehow?), it is a
significant software feat that I am sure will be taken as inspiration
in future human endeavours.

The "flaw" is that the English language cannot be described by a small ruleset, no matter what DSL you care to use. This is the general lesson of linguistics: there are dozens of linguistic theories, and hundreds of variants; they are all appealing for various reasons, and all adept at demonstrating various linguistic phenomena, but as soon as you try to craft "the rules of English" by hand, the number of rules grows exponentially.  Pick some theory of language, I don't care which: a few dozen rules will capture simplistic English. A few hundred rules will give reasonable accuracy at some elementary-school reading level; you need thousands of rules to begin to get acceptable quality on newspaper English; and you need tens of thousands of rules to start getting broad coverage (science literature, tweets, 18th-century English, etc.). At that scale, the ruleset can no longer be managed by hand.
 

c) One area where the Link Grammar software will probably need to
improve is the ability to create, fix, extend and improve the
dictionaries. That is, it needs a user interface and user experience
that looks better than notepad

My core claim is that even if you invent some really cool-looking DSL for describing a monoidal category in a type theoretic way, and also developed a nice GUI for it, you would still be faced with the need to maintain thousands or tens of thousands of rules. At least, for natural language.

If you wanted to use LG to parse *some other* time series -- I dunno, something from biochemistry or some network-hacking trace log or whatever, something that could be described by only dozens or hundreds of rules, then yes, a GUI would be great to have.

However, all linguists who have attempted to build any kind of GUI for any kind of parser for any natural language -- they all hit a wall.  Maintaining tens of thousands of rules is just too hard.
 
or the current REPL CLI tool called
link-parser. Alas, I have no better idea than a REPL of some sort, but
using voice... One thing that could improve the UI/UX of the creation
and maintenance of the dictionary is better integration with the
AtomSpace, and in general with the end-user application.

This veers in a different direction.  Roughly speaking, the atomspace is "just like" any other database. Have you ever seen a nice UI/UX for the maintenance of a database? Gee golly, well, why not?  Because databases are too abstract to mash into a GUI or UI/UX. Database tables can be anything at all. There's no way to UI/UX that, for the same reason that no general-purpose programming language has a GUI. (I mean, LEGO Mindstorms is almost a GUI, and COBOL is almost a GUI, but once you understand either of these, you promptly realize that a plain-old text editor is just faster and easier.)
 
That is, it
would be neat to allow the user to fix the dictionary (something that
is made difficult by the current microservice-like setup of opencog),
in order to have a quick feedback loop between LG, the knowledge base
and the user. To make it clearer: the current unit tests are obviously
a good thing. One should bring that feature, unit testing, into the
client. We could have ground-truth knowledge, similar to the expected
parse trees in the current unit tests, that even takes into account
knowledge inferred from parse trees, and eventually a version of the
dictionary that would be checked as soon as the client user makes a
change to it.

This is impossible. The current LG test sets contain about 7K test sentences, and it is impossible to fix one without breaking something else. The best you can do is to fix more than what you break.

The current English dict has approximately 2K rules in it, and it's impossible to change one of them without carefully understanding most of the others.

This is not like ordinary software.  Human natural language is much much messier than any software program.

The goal of the language-learning project is to automatically learn the rules of any natural language, given a sampling of its corpus.

The point being that humans should not be writing these dictionaries. Machines should be. 
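As a rough illustration of the statistical starting point of such learning (the actual language-learning pipeline is far more involved), one can count nearest-neighbour word pairs in a corpus and score them by mutual information; the corpus below is a made-up toy:

```python
# Count adjacent word pairs and score them with pointwise (log base-2)
# mutual information -- only the first step of grammar induction.
from collections import Counter
from math import log2

corpus = [
    "the cat sleeps",
    "the dog sleeps",
    "the cat runs",
]

word_count = Counter()
pair_count = Counter()
for sentence in corpus:
    words = sentence.split()
    word_count.update(words)
    pair_count.update(zip(words, words[1:]))

n_words = sum(word_count.values())
n_pairs = sum(pair_count.values())

def mi(a, b):
    """Pointwise mutual information of the ordered pair (a, b)."""
    p_ab = pair_count[(a, b)] / n_pairs
    p_a = word_count[a] / n_words
    p_b = word_count[b] / n_words
    return log2(p_ab / (p_a * p_b))

print(round(mi("the", "cat"), 3))
```

High-MI pairs suggest which words want to link to which, from which connector types (and thus dictionary rules) can later be clustered, by machine rather than by hand.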

-- linas

Basically, give the user access
to some knobs that are frozen right now, and give the user the tools
necessary to make sure: It Works (tm). The reason for that is
two-sided: 1) for rare languages, it should be possible to build an
LG dictionary from the user interface, probably with a bootstrap
language like Lojban or English; 2) I think the LG dictionary language
(or something similar) is a good candidate for inclusion in projects
such as Wikidata, but before that happens it must be possible to test
and check the correctness of changes, since it is possible to do so
(unlike common-sense, encyclopedic and so-called lexicographic
knowledge, which are ground truth).

d) As I tried to explain above, I prefer ease of coding. Fast programs,
speedy processors et al. have proven numerous times in recent years to
be false friends. So, as Rob Pike might have said: "make it work,
then make it fast".

My understanding is that GOFAI has nightmares about slowness. As I
tried to explain elsewhere, there are workarounds for slow
processes, like a) lazy algorithms or beam search, b) probabilistic
models, c) a slower overall workflow.

The last point is interesting: my idea is that the problems that AGI
needs to solve are big and slow, and sometimes not advancing at
all when humans try to tackle them. So, if the computer is "slow"
compared to ordering a pizza, the user will be thankful even if it
replies with a message saying: need more input.

--
Amirouche ~ amz3 ~ https://hyper.dev



--
cassette tapes - analog TV - film cameras - you