Re: English Sentence Parser


Linas Vepstas

Jun 14, 2021, 7:31:58 PM
to Stephen Frechette, Dominic Lachowicz, link-grammar

Salut Stephen,

Good to hear from you. I respond in-line below.

On Mon, Jun 14, 2021 at 6:00 PM Stephen Frechette <> wrote:

> I have been using your parser to help teach others English, and am finding it extremely useful.

I am cc'ing the link-grammar mailing list; perhaps others will take interest.

> I am writing to you because I found a few things I think may be errors in the program.

A minor bit of technical background: there are two parts to the "program". The first part does the parsing; it is not aware of the language it parses. The second part is the lexis, or dictionary, which encodes a specific language. English is the best developed, followed by Russian and perhaps Persian; there is also a small German dictionary. The accuracy and coverage are entirely determined by that dictionary.
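To make the split between a language-agnostic parser and a language-specific dictionary concrete, here is a toy sketch in Python. It is NOT link-grammar's code or its dictionary format: the dictionary layout, the connector names (D, S, O) and the function names are all hypothetical, and it drops nearly every real feature (costs, multi-connectors, subscript matching, the LEFT-WALL). Each word lists "disjuncts" (links it needs to its left and right); a parse picks one disjunct per word and wires the connectors into non-crossing links that join all the words. The only point is that `parse()` knows nothing about English: swap the dictionary and you swap the language.

```python
from itertools import product

# Hypothetical toy dictionary (not link-grammar's real file format): each word
# maps to a list of disjuncts; a disjunct is (left_connectors, right_connectors),
# each list ordered nearest-word-first.
ENGLISH = {
    "the":    [([], ["D"])],
    "cat":    [(["D"], ["S"]), (["D", "O"], [])],
    "dog":    [(["D"], ["S"]), (["D", "O"], [])],
    "chased": [(["S"], ["O"])],
    "sleeps": [(["S"], [])],
}


def link_disjuncts(choice):
    """Wire one chosen disjunct per word into non-crossing links.

    Right connectors open links and left connectors close them, like labelled
    brackets, so a stack does the matching. Returns [(i, j, label)] or None.
    """
    stack, links = [], []
    for j, (left, right) in enumerate(choice):
        for label in left:              # nearest link must match the stack top
            if not stack or stack[-1][0] != label:
                return None
            _, i = stack.pop()
            links.append((i, j, label))
        for label in reversed(right):   # push farthest first, nearest on top
            stack.append((label, j))
    return links if not stack else None  # leftover openers mean unmet links


def connected(n, links):
    """Check that the links join all n words into one graph (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for i, j, _ in links:
        parent[find(i)] = find(j)
    return len({find(k) for k in range(n)}) == 1


def parse(words, dictionary):
    """Language-agnostic: all knowledge of the language lives in `dictionary`."""
    if any(w not in dictionary for w in words):
        return []
    linkages = []
    for choice in product(*(dictionary[w] for w in words)):
        links = link_disjuncts(choice)
        if links is not None and connected(len(words), links):
            linkages.append(links)
    return linkages
```

For example, `parse("the cat chased the dog".split(), ENGLISH)` returns a single linkage, `[(0, 1, 'D'), (1, 2, 'S'), (3, 4, 'D'), (2, 4, 'O')]`, while `"the cat"` returns an empty list, because every disjunct for "cat" needs a second link that nothing supplies. Loosening a rule in this picture means adding a disjunct to the dictionary, which widens coverage but also lets more bad sentences through.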

Crafting a dictionary by hand is easy at first, and then becomes ever harder as finer and more varied linguistic expressions are added. I invite everyone to try their hand at creating a dictionary for a new language, with the understanding that this is a multi-year task, not something you can knock off in a weekend. It is kind of pleasant and relaxing, though: somewhere between solving crossword puzzles and solving sudoku puzzles. Difficult but mindless; invigorating but soothing.

> First, I noticed that for certain relative clauses with "to be", the parser accepts sentences such as "That is what is that" (as well as the correct "That is what that is"), although the former is almost never written or spoken by speakers of General American.

I'll see if I can fix this, and the problems below. There are two key issues. First, "to be" is the most complicated of all the words in English, with amazingly complicated rules behind it; that these rules might be inaccurate is not a surprise. Second, in order to allow broader coverage, that is, to correctly parse a greater number of sentences, some rules were loosened and then never tightened back up again. I've thought it much more important to correctly understand good sentences than to reject bad ones. This is partly to allow slang, patois, street-talk and mixed dialects to be parsed. Thus, fixing some of these is difficult.

    +----------->WV---------->+           |
    +-------->Ws---------+    |           |
    |             +<-CO<-+    |           |
    |       +_ICCI+Xc+   +Ss*w+--Ost-+    |
    |       |     |  |   |    |      |    |
LEFT-WALL that   is  , what is.v that.j-p ?

Note the CO link; this is the "Clause Opener" link, as in "Hey Joe, what is that?", "I mean, what is that?" or "That is to say, what is that?". I can't really fix "that is what is that", except perhaps by demanding a question mark, or a comma. The CO link is the give-away of what is really happening.

I'll take a shot at fixing the problems below. You can also report issues by opening a bug report, here:

Oh, and, um, which version are you currently using?

Re: my thoughts, below -- these are all reasonable complaints. Fixing them without breaking something else is .. well, I'll see. Sometimes it is hard, and sometimes it is easy.


> Secondly, the parser seems to have problems recognizing postpositive adjectives, which, while not the norm in English, are certainly used.
> Some examples taken from Wikipedia: "They took him to the people responsible." implies a very different meaning from "They took him to the responsible people.", but the parser will not recognize the first sentence at all.
> "Every visible star is named" also has a different meaning from "Every star visible is named"; the parser will not recognize the second sentence.
> "She was the queen regent" is incorrectly parsed with "queen" being a modifying noun.

> Thirdly, the parser seems to have difficulty distinguishing some gerunds with objects from adjectival present participles.
> Example: in "He grew up speaking English", "grow up" is an intransitive verb, yet the parser erroneously treats "English" as the object of the sentence and "speaking" as an adjective modifying "English".

> Lastly, a minor thing: the parser does not seem to recognize "provided" as a conjunction.

> I appreciate the program, and I may be wrong about some of these being errors; I am curious to hear your thoughts.


Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
