Extract predicates


Pierre

Jan 28, 2021, 11:11:34 AM
to link-grammar
Hello, 
As a new user of the marvelous Link Grammar, I wonder whether Link Grammar could help to extract predicates from sentences.
Is there any predicate-argument extractor based on Link Grammar?

Ex:
Input: I love dogs and cats.
Expected output: I love dogs # I love cats.

Regards
   Pierre 




Linas Vepstas

Feb 3, 2021, 12:30:01 AM
to link-grammar
Hi Pierre,

There is nothing built-in, but you can readily follow the link types. For example ...
(I look, and, alarmingly, embarrassingly, the very first parse is incorrect! Aieee!) 
So I look at the second parse:

    +---->WV---->+------Op-----+
    +->Wd--+-Sp*i+      +<SJlp<+->SJrp>+
    |      |     |      |      |       |
LEFT-WALL I.p love.v cats.n and.j-n dogs.n


The O link points to the object of the sentence (the "p" in Op means plural); SJl points to the left object and SJr points to the right object. The algorithm would be to pick either SJl or SJr, remove the other, remove the conjunction link, and what's left is what you wanted.
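A minimal sketch of that link-walking, written as an illustration rather than anything shipped with Link Grammar. The links of the parse above are copied out by hand as (label, left word, right word) triples; in a real program you would read them from the C API or from the language bindings instead.

# Links of the second parse above, copied by hand.
# (label, left word, right word)
LINKS = [
    ("WV",   "LEFT-WALL", "love.v"),
    ("Wd",   "LEFT-WALL", "I.p"),
    ("Sp*i", "I.p",       "love.v"),
    ("Op",   "love.v",    "and.j-n"),
    ("SJlp", "and.j-n",   "cats.n"),
    ("SJrp", "and.j-n",   "dogs.n"),
]

def word(w):
    """Strip the .v/.n/.p subscripts that LG attaches to words."""
    return w.split(".")[0]

def split_conjunction(links):
    """Follow the S link for subject and verb, the O link for the object;
    if the object is a conjunction, substitute each SJ conjunct in turn."""
    subj, verb = next((l, r) for lbl, l, r in links
                      if lbl.startswith("S") and not lbl.startswith("SJ"))
    obj = next(r for lbl, l, r in links if lbl.startswith("O") and l == verb)
    conjuncts = [r for lbl, l, r in links if lbl.startswith("SJ") and l == obj]
    for c in (conjuncts or [obj]):
        yield f"{word(subj)} {word(verb)} {word(c)}"

print(" # ".join(split_conjunction(LINKS)))   # I love cats # I love dogs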

-- Linas
 

Дмитро Досин

Feb 3, 2021, 1:37:27 AM
to link-g...@googlegroups.com
Dear Linas,
I am also looking for predicates, using your Link Grammar (and RelEx) software, with the aim of updating an OWL ontology by learning from natural-language texts. The RelEx approach discourages me because the predicates it forms are separate roles for each word in the sentence, rather than a relation between a constituent subject noun group and a constituent object noun group, with the verb group (or similar) as the name of the relation. I understand that this is my own problem, but how hard do you think this task is? (I am also trying to use a Naive Bayes learning approach to get from your 'interword' meta-semantic relations to the real semantics of the sentence.)
Best regards,
Dmytro Dosyn


Linas Vepstas

Feb 3, 2021, 4:03:25 PM
to link-grammar
Hi Dmytro,

The problem with RelEx was that it over-simplified things. So: if you know only a little bit about grammar and linguistics, then concepts like subject, verb, object sound like a good idea, and Stanford-style dependency markup achieves those ideas: the "subj" relation links the verb to the subject, the "obj" relation links the verb to the object, and so on. RelEx is a collection of rules that takes LG as input, and generates Stanford dependencies as output.  I was unable to use RelEx for anything useful, mostly because it discarded too much information. There is a lot of information in the LG parse that is ignored, thrown away by RelEx, and what was left was "information poor".
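To make the information loss concrete, here is an illustration of my own (not literal RelEx output): a Stanford-style reduction of "I love cats" keeps only a couple of labelled word pairs, and everything else in the LG parse -- link types, subscripts, conjunction structure -- is discarded.

# Illustrative only -- not literal RelEx output.
relations = [
    ("subj", "love", "I"),     # the subj relation links the verb to the subject
    ("obj",  "love", "cats"),  # the obj relation links the verb to the object
]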

You are asking about "constituent subject noun groups" and "constituent object noun groups", and a similar set of remarks apply.  So, first of all, LG does provide constituents:

linkparser> !con=3
constituents set to 3
linkparser> I love black and white cats

                 +--------------Op--------------+
    +---->WV---->+              +-------A-------+
    +->Wd--+-Sp*i+      +<-AJla<+->AJra>+       |
    |      |     |      |       |       |       |
LEFT-WALL I.p love.v black.a and.j-a white.a cats.n

(S (NP I.p) (VP love.v (NP (ADJP black.a and.j-a white.a) cats.n)))


Type !help at the command line for more info about settings. There are also docs on the website.

So:
1. The LG constituents are obtained from the LG parse using a collection of rules (so it's like RelEx, but the rules are very different).
2. The constituent form throws away a lot of information from the parse. It is "informationally starved" - anorexic, bulimic.
3. If you don't like the above format, or want something different, you too can create your own system, with your own rules, to trace links and make inferences. For example, the Op arrow points from the verb "love" to the object phrase "black and white cats" and indicates that "cats" is the head-word of that phrase. The A arrow points from the head-word "cats" to the modifier phrase "black and white"; it points to the head-word of that phrase, which is "and". The two AJ arrows point to the two leaves. You can certainly write your own code to walk this tree, and to perform algorithmic, mechanical transformations on it, to convert it into some different form (a sketch follows after this list).
4. Unless you have a tight, narrow application, with fairly rigid and controlled language input, you will suffer from information loss during the conversion. 
5. Pretty much anyone who tries to use LG, or any other parsing system, to build some kind of AI robot, hits this information-loss wall. The prototype works great on small, simple sentences! The full-scale production machine is unleashed, and then promptly collapses and dies on sentences like this (Mark Twain): "Well, say, Joe, you can be Friar Tuck or Much the miller's son, and lam me with a quarter-staff; or I'll be the Sheriff of Nottingham and you be Robin Hood a little while and kill me." Even if you had a "perfect" parse of this sentence, what would you do with the information in it?
6. I think it is possible to build "information amplifiers" (information extractors) instead of information reducers, but you have to be very, very careful and clever in defining "information". Practical experience has shown me that RelEx and constituency parsers are "information destroyers". Achieving amplification is ... hard. We have not yet invented the transistor. (Although I do think I know how to build one, but that is a different topic.)

These comments apply both to what you are trying to do and to what Pierre was describing.

Oh, and 7. -- The Russian dict does not have a constituency form. Someone would need to write a "4.0.constituent-knowledge" for Russian.
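As an illustration of item 3, here is a minimal sketch of my own (not part of LG) that walks the constituent string from the session above and pulls out the constituent subject noun group, the verb, and the constituent object noun group. The string is copied by hand here; how you obtain it (the link-parser tool, the C API, or the bindings) is up to you.

import re

CONSTITUENTS = "(S (NP I.p) (VP love.v (NP (ADJP black.a and.j-a white.a) cats.n)))"

def parse_sexpr(text):
    """Turn the LG constituent string into nested Python lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    def read(pos):
        node = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child, pos = read(pos + 1)
                node.append(child)
            else:
                node.append(tokens[pos])
                pos += 1
        return node, pos + 1
    tree, _ = read(1)   # skip the outermost "("
    return tree

def words(node):
    """Flatten a subtree to its words, dropping the .p/.v/.a subscripts."""
    if isinstance(node, str):
        return [node.split(".")[0]]
    return [w for child in node[1:] for w in words(child)]

tree    = parse_sexpr(CONSTITUENTS)
subject = next(c for c in tree[1:] if isinstance(c, list) and c[0] == "NP")
vp      = next(c for c in tree[1:] if isinstance(c, list) and c[0] == "VP")
verb    = next(c for c in vp[1:]   if isinstance(c, str))
obj     = next(c for c in vp[1:]   if isinstance(c, list) and c[0] == "NP")
print(words(subject), verb.split(".")[0], words(obj))
# -> ['I'] love ['black', 'and', 'white', 'cats']

The same walk works on any S-expression the !con setting produces; the hard part, per the points above, is everything the tree no longer tells you.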

-- Linas.



--
Patrick: Are they laughing at us?
Sponge Bob: No, Patrick, they are laughing next to us.
 

dmytro.dosyn

Feb 6, 2021, 4:38:59 AM
to link-grammar
Dear Linas,
thank you for such detailed explanations! For me, the possibility itself is what matters most. I am trying to use the context of a parsed message (the context being represented by an ontology) to supplement it with a piece of new knowledge. I plan to use the formal text of scientific paper summaries, although I realize the sentences there may not be simple enough. Constituents could help along the way. But the Java API did not let me set the configuration needed to obtain the constituent structure of the message.
I'm not using Russian at all, because I am Ukrainian.
Best wishes,
Dmytro


Linas Vepstas

Feb 7, 2021, 2:46:03 AM
to link-grammar
For marking up sentences with context/ontology, you should look at what Rada Mihalcea did with Word-Sense Disambiguation. I liked it a lot ...

To get constituents in java, you would have to hack the java bindings. They are unmaintained.
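One workaround that avoids the bindings entirely -- a sketch of my own, not a supported API -- is to drive the link-parser command-line tool as a subprocess and send it the same !con=3 command used in the session above; the same idea works from Java via ProcessBuilder. It assumes link-parser is installed and on the PATH.

import subprocess

# Feed link-parser the !con=3 command and a sentence on stdin, then read
# the diagram and the (S (NP ...) (VP ...)) constituent tree from stdout.
session = "!con=3\nI love black and white cats\n"
result = subprocess.run(["link-parser"], input=session,
                        capture_output=True, text=True)
print(result.stdout)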

--linas

Anton Kolonin @ Gmail

Feb 7, 2021, 6:50:04 AM
to link-g...@googlegroups.com

Hi Dmytro,

Regarding Java, we have a limited-functionality research project that can do LG parsing using a pure Java implementation, relying on the features of the English Link Grammar dictionary (i.e., we don't support morphology machinery like that used in the Russian dictionary):

https://github.com/aigents/aigents-java-nlp

So far, we use it for text segmentation and text generation only; here is the latest paper:

http://aigents.com/papers/2020/NLS_IEEE_Xplore.pdf

Best regards,

-Anton

Дмитро Досин

Feb 8, 2021, 8:17:13 AM
to link-g...@googlegroups.com
Thank you for your response! I'll study everything mentioned. I am optimistic about the effectiveness of using the Java bindings to LG. I rely more on using the statistics of the signs that LG gives to users.
Regards,
Dmytro
