Hi everyone,
Here are some remarks and comments on last week's submissions.
- Everyone beat the baseline! Well done!
- We have a new best score for the lab! Well done!
- The hidden set had 1599 sentences, with an average of 3.2 ground-truth (GT) extractions per sentence. The minimum was 1 extraction per sentence, so every sentence had at least one SPO extraction.
- The best solution extracted an average of 2.3 triples per sentence.
- In general, precision scores were much higher than recall scores.
- Highest recall: 0.82
- Highest precision: 0.89
Notes:
1. If you got low scores, check your output. Problems we saw:
- the submission used the same value for the verb and the subject
- the submission used `!verb` to denote negation, which the benchmark evaluation cannot handle
- the submission had blank (whitespace-only) subjects/objects
2. Without modification, the current skeleton code cannot return multiple SPO triples for the same verb, since the triples are stored in a dictionary keyed on the predicate verb: a second triple for the same verb overwrites the first.
You could improve your recall by
- appending an ordering suffix to repeated keys
For the sentence: The French and the Portuguese captured the islands.
verbs =
{
'captured': {'subject':'The French', 'object': 'the islands'},
'captured__1': {'subject':'The Portuguese', 'object': 'the islands'},
}
When formatting, remove the suffix from the key (`key.split('__')[0]`) so that only the verb remains.
- modifying the returned dict to map each verb to a list of subject/object pairs:
verbs =
{
'captured': [
{'subject':'The French', 'object': 'the islands'},
{'subject':'The Portuguese', 'object': 'the islands'}
]
}
Here again the formatting code would have to be slightly modified.
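The two options above can be sketched in Python roughly as follows. This is a minimal sketch, assuming the skeleton stores each triple as a `{'subject': ..., 'object': ...}` dict as in the examples. The helper names (`add_triple`, `flatten_suffixed`, `flatten_lists`) are hypothetical, and since the skeleton's actual output format isn't shown here, both flatten helpers simply return `(subject, verb, object)` tuples that you would adapt to your own formatting code.

```python
def add_triple(verbs, verb, subject, obj):
    """Option 1: store under `verb`, or `verb__1`, `verb__2`, ... if the key is taken."""
    key, n = verb, 1
    while key in verbs:
        key = f"{verb}__{n}"
        n += 1
    verbs[key] = {"subject": subject, "object": obj}
    return verbs


def flatten_suffixed(verbs):
    """Option 1 -> (subject, verb, object) tuples, stripping the '__N' suffix."""
    return [(t["subject"], key.split("__")[0], t["object"]) for key, t in verbs.items()]


def flatten_lists(verbs):
    """Option 2 (verb -> list of dicts) -> (subject, verb, object) tuples."""
    return [(t["subject"], verb, t["object"])
            for verb, triples in verbs.items() for t in triples]


suffixed = {}
add_triple(suffixed, "captured", "The French", "the islands")
add_triple(suffixed, "captured", "The Portuguese", "the islands")
print(suffixed)  # keys: 'captured' and 'captured__1'
print(flatten_suffixed(suffixed))

as_lists = {"captured": [{"subject": "The French", "object": "the islands"},
                         {"subject": "The Portuguese", "object": "the islands"}]}
print(flatten_lists(as_lists) == flatten_suffixed(suffixed))  # True
```

Both options end up producing the same triples; option 2 avoids string manipulation on the keys, while option 1 keeps the original one-dict-per-key shape. One caveat of option 1: a verb that itself contains `__` would be split incorrectly, which is unlikely with ordinary tokens.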
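The output problems listed in note 1 can be caught before submitting with a quick check. This is a minimal sketch, assuming the single-dict format `{verb: {'subject': ..., 'object': ...}}` (keys optionally carrying an `__N` suffix); `find_problems` is a hypothetical helper, and "white spaces" is interpreted here as subjects/objects that are empty or contain only whitespace.

```python
def find_problems(verbs):
    """List problems in a {verb: {'subject': ..., 'object': ...}} dict.

    Keys may carry an ordering suffix such as 'captured__1'.
    """
    problems = []
    for key, triple in verbs.items():
        verb = key.split("__")[0]  # ignore the optional ordering suffix
        subject = triple.get("subject", "")
        obj = triple.get("object", "")
        if verb.startswith("!"):
            problems.append(f"{key}: '!' negation marker is not supported by the evaluation")
        if subject.strip() and subject.strip() == verb.strip():
            problems.append(f"{key}: subject is the same as the verb")
        for role, value in (("subject", subject), ("object", obj)):
            if not value.strip():
                problems.append(f"{key}: {role} is empty or whitespace only")
    return problems


ok = {"captured": {"subject": "The French", "object": "the islands"}}
bad = {"!captured": {"subject": " ", "object": "the islands"},
       "ran": {"subject": "ran", "object": "home"}}
print(find_problems(ok))  # []
for problem in find_problems(bad):
    print(problem)
```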
Sample approaches used:
1. Find the verb occurrences using POS tags; the tokens up to the verb form the subject and the tokens following the verb form the object.
Edge cases:
- the sentence begins with a verb (the subject is empty)
- compound sentences: the subject/object may become very long
- object-verb-subject (e.g. passive) sentences, where the semantics may be lost:
"James Cameron directed Avatar ." -> 'directed': {'subject': 'James Cameron', 'object': 'Avatar'}
"Avatar was directed by James Cameron ." -> 'directed': {'subject': 'Avatar', 'object': 'James Cameron'}
2. Baseline: submitting the provided baseline unchanged is not enough to pass. If you fall into this category, you have not been awarded a Pass.
3. Using POS tags and a dependency parser
- For each verb, determine whether it is active (s, p, o) or passive (o, p, s)
- Account for conjunctions
4. Improving upon the baseline:
- The baseline only returns single tokens for subject, predicate and object; include full phrases where present
- Modify it for complex statements, where the direct head of the subject/object may not be a VERB
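Approach 1 (POS split) can be sketched as below. The Penn Treebank tags are written by hand rather than produced by a tagger, and two details are assumptions chosen to reproduce the Avatar examples: a run of consecutive verbs ("was directed") is treated as one group whose last verb is the predicate, and a preposition directly after that group is dropped. `naive_spo` is a hypothetical name.

```python
def naive_spo(tagged):
    """Split a POS-tagged sentence around its first verb group.

    `tagged` is a list of (token, Penn Treebank tag) pairs.
    Returns {predicate: {'subject': ..., 'object': ...}} or {} if there is no verb.
    """
    # Drop sentence-final punctuation (Penn tag '.').
    kept = [(tok, tag) for tok, tag in tagged if tag != "."]
    tokens = [tok for tok, _ in kept]
    tags = [tag for _, tag in kept]
    verb_positions = [i for i, tag in enumerate(tags) if tag.startswith("VB")]
    if not verb_positions:
        return {}
    start = end = verb_positions[0]
    # Treat consecutive verbs ("was directed") as one group; the last one is the predicate.
    while end + 1 < len(tags) and tags[end + 1].startswith("VB"):
        end += 1
    rest = end + 1
    # Assumption: skip a preposition right after the verb group ("by" in a passive).
    if rest < len(tags) and tags[rest] == "IN":
        rest += 1
    return {tokens[end]: {"subject": " ".join(tokens[:start]),
                          "object": " ".join(tokens[rest:])}}


active = [("James", "NNP"), ("Cameron", "NNP"), ("directed", "VBD"), ("Avatar", "NNP"), (".", ".")]
passive = [("Avatar", "NNP"), ("was", "VBD"), ("directed", "VBN"), ("by", "IN"),
           ("James", "NNP"), ("Cameron", "NNP"), (".", ".")]
print(naive_spo(active))   # {'directed': {'subject': 'James Cameron', 'object': 'Avatar'}}
print(naive_spo(passive))  # {'directed': {'subject': 'Avatar', 'object': 'James Cameron'}}
```

The passive sentence comes out with Avatar as the subject, which is exactly the loss of semantics noted in the edge cases, and a sentence that starts with a verb gets an empty subject.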
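Approaches 3 and 4 can be combined roughly as follows. This sketch does not call a parser: each token is a hand-written `(text, pos, dep, head_index)` tuple using spaCy-style English dependency labels (`nsubj`, `nsubjpass`, `dobj`, `agent`, `pobj`, `conj`, `cc`), which is an assumption about your parser's label set; with spaCy the same information is available as `token.pos_`, `token.dep_` and `token.head.i`. Passive voice is detected by an `nsubjpass` child, conjoined subjects/objects are expanded into separate triples, and each argument is grown from its head token to its full phrase while skipping the coordinated branches. All function names are hypothetical.

```python
from itertools import product

# Hand-written parses: (text, pos, dep, head_index). The root token points to itself.
PASSIVE = [  # "Avatar was directed by James Cameron ."
    ("Avatar", "PROPN", "nsubjpass", 2), ("was", "AUX", "auxpass", 2),
    ("directed", "VERB", "ROOT", 2), ("by", "ADP", "agent", 2),
    ("James", "PROPN", "compound", 5), ("Cameron", "PROPN", "pobj", 3),
    (".", "PUNCT", "punct", 2),
]
CONJ = [  # "The French and the Portuguese captured the islands ."
    ("The", "DET", "det", 1), ("French", "PROPN", "nsubj", 5),
    ("and", "CCONJ", "cc", 1), ("the", "DET", "det", 4),
    ("Portuguese", "PROPN", "conj", 1), ("captured", "VERB", "ROOT", 5),
    ("the", "DET", "det", 7), ("islands", "NOUN", "dobj", 5),
    (".", "PUNCT", "punct", 5),
]


def children(parse, i):
    """Indices of the tokens whose head is token i."""
    return [j for j, tok in enumerate(parse) if tok[3] == i and j != i]


def child_with(parse, i, label):
    """First child of token i with the given dependency label, or None."""
    return next((j for j in children(parse, i) if parse[j][2] == label), None)


def with_conjuncts(parse, i):
    """Token i plus every token reachable from it through 'conj' arcs."""
    found = [i]
    for j in children(parse, i):
        if parse[j][2] == "conj":
            found.extend(with_conjuncts(parse, j))
    return found


def subtree(parse, i):
    """Token i and its descendants, skipping coordination and punctuation branches."""
    found = [i]
    for j in children(parse, i):
        if parse[j][2] not in ("conj", "cc", "punct"):
            found.extend(subtree(parse, j))
    return sorted(found)


def phrase(parse, i):
    """Surface text of the (pruned) subtree headed by token i."""
    return " ".join(parse[j][0] for j in subtree(parse, i))


def extract(parse):
    """Return (subject, predicate, object) triples for every VERB in the parse."""
    triples = []
    for v, (text, pos, _dep, _head) in enumerate(parse):
        if pos != "VERB":
            continue
        if child_with(parse, v, "nsubjpass") is not None:
            # Passive (o, p, s): the grammatical subject is the semantic object,
            # and the semantic subject is the object of the "by" agent phrase.
            obj = child_with(parse, v, "nsubjpass")
            agent = child_with(parse, v, "agent")
            subj = child_with(parse, agent, "pobj") if agent is not None else None
        else:
            # Active (s, p, o).
            subj, obj = child_with(parse, v, "nsubj"), child_with(parse, v, "dobj")
        if subj is None or obj is None:
            continue
        # One triple per combination of conjoined subjects and objects.
        for s, o in product(with_conjuncts(parse, subj), with_conjuncts(parse, obj)):
            triples.append((phrase(parse, s), text, phrase(parse, o)))
    return triples


print(extract(PASSIVE))  # [('James Cameron', 'directed', 'Avatar')]
print(extract(CONJ))     # [('The French', 'captured', 'the islands'), ('the Portuguese', 'captured', 'the islands')]
```

This sketch does not handle conjoined verbs (e.g. "captured and held") or arguments whose head is not directly attached to a VERB; those are the kinds of complex statements mentioned in approach 4.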
Hope this helps.
Shrestha