General Feedback Lab06

AKBC2022

Jun 8, 2022, 3:11:20 PM
to AKBC2022
Hi everyone,

Here are some remarks and comments on last week's submissions.

- Everyone beat the baseline! Well done!
- We have a new best score for the lab! Well done!
- The hidden set had 1599 sentences, with 3.2 ground-truth (GT) extractions per sentence on average. The minimum was 1 extraction per sentence, so every sentence had at least one SPO extraction.
- The best solution extracted an average of 2.3 triples per sentence.
- In general, precision scores were much higher than recall.
- Highest recall: 0.82
- Highest precision: 0.89

Notes:
1. If you got a low score, check your output. Problems we saw:
        - submissions contained the same value for the verb and the subject
        - submissions used `!verb` to denote negation, which the benchmark evaluation cannot handle
        - submissions had whitespace for subjects/objects
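A quick self-check along these lines can catch the three problems above before you submit. This is only a sketch: `find_problems` is a hypothetical helper, not part of the skeleton code, and it assumes your extractions are in the `{verb: {'subject': ..., 'object': ...}}` shape used in this post. It reads "whitespace" as an empty value or one with leading/trailing spaces.

```python
def find_problems(verbs):
    """Return human-readable warnings for a {verb: {'subject': ..., 'object': ...}} dict."""
    problems = []
    for verb, args in verbs.items():
        subject = args.get("subject", "")
        obj = args.get("object", "")
        # The evaluation cannot interpret a '!' negation prefix on the verb.
        if verb.startswith("!"):
            problems.append(f"{verb!r}: verb carries a '!' negation marker")
        # Subject and verb should never be the same string.
        if subject.strip().lower() == verb.strip().lower():
            problems.append(f"{verb!r}: subject is identical to the verb")
        # Empty, whitespace-only, or padded arguments.
        for role, value in (("subject", subject), ("object", obj)):
            if not value.strip() or value != value.strip():
                problems.append(f"{verb!r}: {role} is empty or has stray whitespace")
    return problems


sample = {
    "captured": {"subject": "captured", "object": " "},
    "!like": {"subject": "I", "object": "tea"},
}
for line in find_problems(sample):
    print(line)
```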

2. The current skeleton code, without changes to the formatting, cannot support multiple SPO triples for the same verb, since the triples are stored in a dictionary keyed on the predicate verb.
You could improve your recall by
    - suffixing repeated keys with an index so that they stay distinct
        For the sentence: The French and the Portuguese captured the islands.

        verbs =
        {
            'captured': {'subject':'The French', 'object': 'the islands'},
            'captured__1': {'subject':'The Portuguese', 'object': 'the islands'},
        }

        When formatting, take care to strip the index from the key again (`key.split('__')[0]`) to recover the verb.

    - modifying the returned dict to store lists of subjects and objects:
        verbs =
        {
            'captured': [
                {'subject':'The French', 'object': 'the islands'},
                {'subject':'The Portuguese', 'object': 'the islands'}
            ]
        }
        Here again the formatting code would have to be slightly modified.
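Both options can be sketched as follows. The helper names (`add_with_suffix`, `triples_from_suffixed`, `triples_from_lists`) and the flat `(subject, verb, object)` tuples are assumptions made for illustration; the actual formatting code in the skeleton may expect a different output shape.

```python
from collections import defaultdict


def add_with_suffix(verbs, verb, subject, obj):
    """Option 1: store repeats of a verb under verb__1, verb__2, ..."""
    key, n = verb, 0
    while key in verbs:
        n += 1
        key = f"{verb}__{n}"
    verbs[key] = {"subject": subject, "object": obj}


def triples_from_suffixed(verbs):
    """Strip the __n suffix again so the predicate is the plain verb."""
    return [(a["subject"], key.split("__")[0], a["object"]) for key, a in verbs.items()]


def triples_from_lists(verbs):
    """Option 2: every verb maps to a list of argument dicts."""
    return [(a["subject"], verb, a["object"]) for verb, args in verbs.items() for a in args]


# Option 1: suffixed keys
suffixed = {}
add_with_suffix(suffixed, "captured", "The French", "the islands")
add_with_suffix(suffixed, "captured", "The Portuguese", "the islands")

# Option 2: lists of arguments per verb
listed = defaultdict(list)
listed["captured"].append({"subject": "The French", "object": "the islands"})
listed["captured"].append({"subject": "The Portuguese", "object": "the islands"})

print(triples_from_suffixed(suffixed))
print(triples_from_lists(listed))
```

Both calls yield the same two triples. The list variant avoids string manipulation on the keys, while the suffix variant would misbehave if a verb token itself contained `__`.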


Sample approaches used:

1. Find the verb occurrences using POS tags; the tokens up to the verb are the subject and the tokens following the verb are the object.
    Edge cases - when the sentence begins with a verb
               - compound sentences - the object/subject may become very long
               - may not work well for object-verb-subject order (e.g. passive voice), where the semantics may be lost
                       "James Cameron directed Avatar ." ->  'directed': {'subject': 'James Cameron', 'object': 'Avatar'}
                       "Avatar was directed by James Cameron ." -> 'directed': {'subject': 'Avatar', 'object': 'James Cameron'}
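A minimal sketch of this first approach, to make the passive-voice failure concrete. It assumes the sentence is already tagged as `(token, tag)` pairs with universal POS tags (`VERB`, `AUX`, `ADP`, `PUNCT`, ...); the tags below are written by hand rather than produced by a tagger, and `split_on_verbs` is a hypothetical name. Trimming the auxiliary and the leading preposition is an extra assumption added here so that the output matches the two examples above.

```python
def split_on_verbs(tagged):
    """Naive SPO: tokens left of a verb are the subject, tokens right of it the object."""
    words = [(tok, tag) for tok, tag in tagged if tag != "PUNCT"]
    verbs = {}
    for i, (tok, tag) in enumerate(words):
        if tag != "VERB":
            continue
        left, right = words[:i], words[i + 1:]
        # Trim auxiliaries directly before the verb ("was directed")
        # and prepositions directly after it ("by ...").
        while left and left[-1][1] == "AUX":
            left = left[:-1]
        while right and right[0][1] == "ADP":
            right = right[1:]
        subject = " ".join(t for t, _ in left)
        obj = " ".join(t for t, _ in right)
        if subject and obj:  # skips sentences that begin with a verb
            verbs[tok] = {"subject": subject, "object": obj}
    return verbs


active = [("James", "PROPN"), ("Cameron", "PROPN"), ("directed", "VERB"),
          ("Avatar", "PROPN"), (".", "PUNCT")]
passive = [("Avatar", "PROPN"), ("was", "AUX"), ("directed", "VERB"), ("by", "ADP"),
           ("James", "PROPN"), ("Cameron", "PROPN"), (".", "PUNCT")]

print(split_on_verbs(active))
print(split_on_verbs(passive))
```

The passive sentence comes out with `Avatar` as the subject, which is exactly the loss of semantics described in the last edge case.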



2. Baseline: submitting the provided baseline as-is is not enough to pass. If you fall into this category, you have not been awarded a Pass.
   
3. Using POS and dependency parser
    - For each verb, determine whether it is active (s,p,o) or passive (o,p,s)
    - Account for conjunctions

4. Improving upon the baseline:
    The baseline only returns single tokens for subject, predicate and object - include phrases if present
    Handle complex statements, where the direct head of the subject/object may not be a VERB.
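Approaches 3 and 4 could look roughly like the sketch below. To keep it runnable without a parser, the dependency parses are written out by hand in a small hypothetical `Token` class. The labels (`nsubj`, `nsubjpass`, `dobj`, `agent`, `pobj`, `conj`, `det`, `compound`) follow the style used by spaCy's English models, but treat them as an assumption and check what your own parser emits.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str
    dep: str
    head: int  # index of the head token; the root points to itself


def children(tokens, i, deps):
    """Indices of tokens attached to token i with one of the given labels."""
    return [j for j, t in enumerate(tokens) if t.head == i and j != i and t.dep in deps]


def with_conjuncts(tokens, i):
    """Token i plus everything chained to it via 'conj' arcs."""
    out = [i]
    for j in children(tokens, i, {"conj"}):
        out.extend(with_conjuncts(tokens, j))
    return out


def phrase(tokens, i):
    """Head token with its determiner/compound modifiers, in sentence order."""
    idx = sorted([i] + children(tokens, i, {"det", "compound"}))
    return " ".join(tokens[j].text for j in idx)


def extract(tokens):
    triples = []
    for v, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        passive_subjects = children(tokens, v, {"nsubjpass"})
        if passive_subjects:
            # Passive: the grammatical subject is the object, the "by" agent is the subject.
            objects = passive_subjects
            subjects = [p for a in children(tokens, v, {"agent"})
                        for p in children(tokens, a, {"pobj"})]
        else:
            subjects = children(tokens, v, {"nsubj"})
            objects = children(tokens, v, {"dobj"})
        # Expand conjunctions so "A and B captured C" yields two triples.
        subjects = [c for s in subjects for c in with_conjuncts(tokens, s)]
        objects = [c for o in objects for c in with_conjuncts(tokens, o)]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), tok.text, phrase(tokens, o)))
    return triples


# "The French and the Portuguese captured the islands ." (hand-written parse)
conj_sentence = [
    Token("The", "DET", "det", 1), Token("French", "PROPN", "nsubj", 5),
    Token("and", "CCONJ", "cc", 1), Token("the", "DET", "det", 4),
    Token("Portuguese", "PROPN", "conj", 1), Token("captured", "VERB", "ROOT", 5),
    Token("the", "DET", "det", 7), Token("islands", "NOUN", "dobj", 5),
    Token(".", "PUNCT", "punct", 5),
]
# "Avatar was directed by James Cameron ." (hand-written parse)
passive_sentence = [
    Token("Avatar", "PROPN", "nsubjpass", 2), Token("was", "AUX", "auxpass", 2),
    Token("directed", "VERB", "ROOT", 2), Token("by", "ADP", "agent", 2),
    Token("James", "PROPN", "compound", 5), Token("Cameron", "PROPN", "pobj", 3),
    Token(".", "PUNCT", "punct", 2),
]

print(extract(conj_sentence))
print(extract(passive_sentence))
```

Returning a list of triples rather than a dict keyed on the verb sidesteps the duplicate-verb problem from the notes above. The sketch only looks at direct children of the verb, so complex statements where the subject/object head is not a VERB would still need extra handling.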

Hope this helps.

Shrestha
    
