I'm working on Machine Reading Comprehension (MRC). To summarise, to start with it targets:
- QA
- text summary

I'm using the DistilBert transformer QA model, running it per paragraph on the lesson of interest and attempting answers for the question of interest. I had explored CoreNLP models extensively for finding parts of speech; they work decently, finding various POS tags and also doing Named Entity Recognition and lexical tree parsing. Then I landed at lib link grammar, which is cool. However, I'm looking for a constituent tree spanning multiple sentences, for example:
- the ability to map pronouns (singular/plural) across the passage to the corresponding proper nouns

I can see that in almost all cases (even with null links allowed), lib link grammar fails to get this done.
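The per-paragraph setup can be sketched as follows. This is a minimal sketch, assuming a QA callable that returns a dict with `answer` and `score` keys (the shape the Hugging Face `transformers` question-answering pipeline returns). The blank-line paragraph split and the helper names `split_paragraphs` and `best_answer` are hypothetical, and a stub QA function stands in for the model so the sketch runs on its own.

```python
from typing import Callable, Dict, List, Optional

# Assumed QA interface: (question, context) -> {"answer": str, "score": float}
QAFn = Callable[[str, str], Dict]

def split_paragraphs(lesson: str) -> List[str]:
    # Hypothetical split rule: paragraphs are separated by blank lines
    return [p.strip() for p in lesson.split("\n\n") if p.strip()]

def best_answer(question: str, lesson: str, qa: QAFn) -> Optional[Dict]:
    """Run the QA callable on every paragraph and keep the highest-scoring answer."""
    best = None
    for para in split_paragraphs(lesson):
        result = qa(question, para)
        if best is None or result["score"] > best["score"]:
            best = {**result, "paragraph": para}
    return best

if __name__ == "__main__":
    # Stub standing in for a real model: high score only if the paragraph mentions "pet"
    def stub_qa(question: str, context: str) -> Dict:
        return {"answer": context.split()[0], "score": 1.0 if "pet" in context else 0.1}

    lesson = "Tom and John are friends.\n\nMarley is the pet of these guys."
    print(best_answer("Who is the pet?", lesson, stub_qa)["answer"])  # Marley
```

With a real model, the stub could be replaced by something like `hf_qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")` wrapped as `lambda q, c: hf_qa(question=q, context=c)`. Picking the single best-scoring paragraph is the simplest policy; it cannot combine evidence spread over several paragraphs, which is exactly where pronoun-to-noun mapping matters.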
I'm looking at tweaking lib link grammar. Am I missing anything in my understanding or usage?
Is my understanding correct that lib link grammar cannot really map/interlink several sentences into a single constituent tree?
Also, CoreNLP/Stanford Parser does not give a mapping of words (pronouns to proper nouns) across a passage.
Please give your thoughts.
Thanks
There is research work on Language Segmentation Based on Link Grammar.
The code is open source and works with the English dictionary (no morphology support).
>The biggest issue with the entire approach is the need to
have humans hand-craft custom rules to handle each of the
exceptions and special cases. It's like trying to build a
sky-scraper out of two-by-fours: after a while, there are too many
pieces, the structure is too complex, the whole thing becomes ever
more fragile and unmaintainable.
You received this message because you are subscribed to the Google Groups "link-grammar" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/link-grammar/CAHrUA369tB%3DfHxUa9ZXZe1oYPM2dnk96ZBdQsvYHW59nJ2wtMw%40mail.gmail.com.
-- 
Anton Kolonin
akol...@aigents.com https://aigents.com
import stanza

# stanza.download('en')  # needed once to fetch the English models
nlp = stanza.Pipeline('en', processors='tokenize,pos,lemma,depparse,ner', use_gpu=False, pos_batch_size=3000)
The results from this pipeline are quite accurate.
Then feed the mappings in to reorganize the input passage before passing it on to DistilBert.
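One way to reorganize the passage is to substitute each mapped pronoun with the text of its antecedent. A minimal sketch, assuming the mappings are already available as a dict from token index to antecedent text; that format and the name `rewrite_passage` are hypothetical, and the Stanza pipeline configured above does not include a coreference step, so the mappings would have to come from elsewhere.

```python
import re
from typing import Dict, List

def rewrite_passage(tokens: List[str], mapping: Dict[int, str]) -> str:
    """Replace the token at each mapped index with its antecedent text."""
    out = [mapping.get(i, tok) for i, tok in enumerate(tokens)]
    text = " ".join(out)
    # Reattach punctuation that tokenization split off from the previous word
    return re.sub(r"\s+([.,!?;:])", r"\1", text)

tokens = ["Tom", "and", "John", "are", "friends", ".", "They", "live", "together", "."]
print(rewrite_passage(tokens, {6: "Tom and John"}))
```

The rewritten passage no longer contains "They", so a QA model reading only the second sentence still sees the names.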
Hi Linas,
Thanks for the pointers and inputs. I will go through https://github.com/opencog/opencog/tree/master/opencog/nlp/anaphora and also https://github.com/opencog/learn. Earlier today I was looking at https://github.com/opencog/relex (I'm still struggling to even get it built properly). It definitely uses liblink-grammar under the hood, so I feel it also will not generate dependency parses across multiple sentences, though I have yet to try it out. Please correct me if I am wrong.
For MRC, the DistilBert Hugging Face model does generate reasonable answers for grade-school-level questions. To improve accuracy, it definitely needs a pre-processing step that builds a dependency tree of all the mappings between proper nouns/pronouns and subjects/objects. A lame method is to compute the Euclidean distance across all pronouns/proper nouns and subjects/objects.
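The distance heuristic can be sketched by treating token positions as 1-D coordinates, so the Euclidean distance between a pronoun at index i and a candidate at index c reduces to |i - c|. This is a minimal sketch under stated assumptions: the original does not say which coordinates the distance is computed over, the UPOS-style tags are supplied by hand, and only preceding candidates are considered (anaphora, not cataphora). The name `nearest_antecedents` is hypothetical.

```python
from typing import Dict, List, Tuple

def nearest_antecedents(tagged: List[Tuple[str, str]]) -> Dict[int, int]:
    """Map each pronoun index to the index of the closest preceding candidate noun.

    `tagged` is a list of (word, tag) pairs; "PRON" marks pronouns and
    "PROPN"/"NOUN" mark candidate antecedents (assumed UPOS-style tags).
    """
    candidates = [i for i, (_, tag) in enumerate(tagged) if tag in ("PROPN", "NOUN")]
    mapping = {}
    for i, (_, tag) in enumerate(tagged):
        if tag != "PRON":
            continue
        preceding = [c for c in candidates if c < i]
        if preceding:
            # On a 1-D token axis the Euclidean distance is just |i - c|
            mapping[i] = min(preceding, key=lambda c: abs(i - c))
    return mapping

tagged = [("Tom", "PROPN"), ("and", "CCONJ"), ("John", "PROPN"), ("are", "AUX"),
          ("friends", "NOUN"), (".", "PUNCT"), ("They", "PRON"), ("live", "VERB"),
          ("together", "ADV")]
print(nearest_antecedents(tagged))
```

Here "They" is linked to "friends" rather than to "Tom" and "John" directly, which shows why the heuristic is lame: it ignores number agreement and syntax, and only a further step (such as following links through a graph) recovers the names.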
I agree the linkages are not expected to intersect in liblink-grammar, which means it won't really work. My only hope was that, with null links allowed, it would generate a combination of linkages; still, sorting out the various linkages and inferring the mappings from them is going to take forever.
I agree with your comments on it.
Here is the adjacency matrix. Proper nouns, pronouns, noun subjects and noun modifiers can be the nodes of the graph:
         Tom     John    friends They    Marley  pet     these   guys
Tom      1       0       1       0       0       0       0       0
John     0       1       1       0       0       0       0       0
friends  1       1       1       1       0       0       0       0
They     0       0       1       1       0       0       1       0
Marley   0       0       0       0       1       1       0       0
pet      0       0       0       0       1       1       0       0
these    0       0       0       1       0       0       1       1
guys     0       0       0       0       0       0       1       1
A BFS followed by a DFS on the graph can give all the relevant mappings.
The links are symmetric: if x -> y, then y -> x.
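The traversal can be sketched over the matrix above. This is a minimal sketch that runs a single breadth-first search from each unvisited node, which on its own already collects every connected group of mentions (a DFS would find the same groups); the function name `mention_groups` is hypothetical.

```python
from collections import deque
from typing import List

def mention_groups(labels: List[str], adj: List[List[int]]) -> List[List[str]]:
    """Return the connected components of a symmetric 0/1 adjacency matrix via BFS."""
    seen = set()
    groups = []
    for start in range(len(labels)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(labels[node])
            # Enqueue every linked node not visited yet
            for nbr, linked in enumerate(adj[node]):
                if linked and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(component)
    return groups

labels = ["Tom", "John", "friends", "They", "Marley", "pet", "these", "guys"]
adj = [
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
print(mention_groups(labels, adj))
```

On this matrix the traversal yields one group containing Tom, John, friends, They, these and guys, and a second group containing Marley and pet. Because the links are symmetric, starting from any member of a group reaches the whole group.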
If the same noun is repeated, we can either collapse it into its earlier occurrence
or add a suffix to make it a new entry in the adjacency matrix.
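Both options for repeated nouns can be sketched as a choice of node names made before the matrix is built. The helper name `matrix_nodes` and the `_2` suffix format are hypothetical choices for illustration.

```python
from typing import List

def matrix_nodes(mentions: List[str], collapse: bool = True) -> List[str]:
    """Return the node names (matrix rows/columns) for a list of mentions.

    collapse=True: a repeated noun folds into the node of its first occurrence.
    collapse=False: each repeat becomes a new node with a numeric suffix.
    """
    if collapse:
        # dict.fromkeys keeps first occurrences in order and drops repeats
        return list(dict.fromkeys(mentions))
    counts = {}
    nodes = []
    for m in mentions:
        counts[m] = counts.get(m, 0) + 1
        nodes.append(m if counts[m] == 1 else f"{m}_{counts[m]}")
    return nodes

print(matrix_nodes(["Tom", "John", "Tom"]))
print(matrix_nodes(["Tom", "John", "Tom"], collapse=False))
```

Collapsing keeps the matrix small but merges the two mentions for good; suffixing keeps them apart so that a link to the earlier "Tom" still has to be established through the graph.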
Hobbs' algorithm does seem to get us closer to the above adjacency matrix.
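import spacy
import neuralcoref

# Load SpaCy (neuralcoref requires spaCy 2.x)
nlp = spacy.load('en_core_web_sm')
# Add neural coref to SpaCy's pipe
coref = neuralcoref.NeuralCoref(nlp.vocab, greedyness=0.5)
nlp.add_pipe(coref, name='neuralcoref')

def resolve_coreferences(text):
    doc = nlp(text)
    tok_list = [token.text_with_ws for token in doc]
    for cluster in doc._.coref_clusters:
        cluster_main_words = set(cluster.main.text.split(' '))
        for mention in cluster.mentions:
            mention_words = set(mention.text.split(' '))
            # Only replace mentions that share no word with the main mention
            if not mention_words.intersection(cluster_main_words):
                # Put the main mention text in place of the first token...
                tok_list[mention.start] = cluster.main.text + doc[mention.end - 1].whitespace_
                # ...and drop the remaining tokens of a multi-token mention
                for i in range(mention.start + 1, mention.end):
                    tok_list[i] = ''
    return ''.join(tok_list)

#text = "Apparently I love the camera and I it"
text = "Tom and John friends. They live together. Marley is pet of these guys"
print(resolve_coreferences(text))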
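To connect the coreference output with the adjacency-matrix idea, the clusters can be turned into a matrix. A minimal sketch, assuming the clusters have already been pulled out as lists of mention strings (for example `[[m.text for m in c.mentions] for c in doc._.coref_clusters]`); the example clusters below are written by hand, not produced by neuralcoref, and `clusters_to_matrix` is a hypothetical name.

```python
from typing import List, Tuple

def clusters_to_matrix(clusters: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a symmetric 0/1 adjacency matrix linking mentions of the same cluster."""
    # One node per distinct mention string, in order of first appearance
    labels = list(dict.fromkeys(m for cluster in clusters for m in cluster))
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    # Every node is linked to itself, as on the diagonal of the hand-built matrix
    adj = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for cluster in clusters:
        # Iterating over all (a, b) pairs sets both a -> b and b -> a
        for a in cluster:
            for b in cluster:
                adj[index[a]][index[b]] = 1
    return labels, adj

# Hypothetical, hand-written clusters
clusters = [["Tom and John", "They", "these guys"], ["Marley", "pet"]]
labels, adj = clusters_to_matrix(clusters)
for label, row in zip(labels, adj):
    print(f"{label:<14}", *row)
```

Linking every pair of mentions in a cluster produces more 1s than a hand-built chain of links, but the connected groups come out the same.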