SemEval 2016 Task 5 ABSA Subtask 1: Sentence-level ABSA (Domain: Restaurants)

Nouman Dilawar

Mar 6, 2016, 2:42:38 PM
to Maria Pontiki, semeva...@googlegroups.com
Hi,
I am a student interested in participating in "SemEval 2016 Task 5 ABSA", specifically Subtask 1: Sentence-level ABSA (Domain: Restaurants). I am confused about the difference between the test set and the gold set for Subtask 1. On which set should I evaluate accuracy: the gold set or the test set?
Please help me out!

Thanks

Regards,
Nouman Dilawar

Jakub Macháček

Mar 7, 2016, 6:04:06 PM
to SemEval-ABSA, mpon...@gmail.com
Hello Nouman Dilawar,

You have already missed the evaluation deadline (January 22 and 29). Nevertheless, you can still use the train, test, and gold data sets to compare your future system with the systems submitted on time, and you can contribute to the same task next year if it is reopened. Both the test and gold sets contain the same set of reviews; the difference is that the latter also contains the gold annotations (i.e., what your system should return). The test sets were released just a few days before the submission deadline to prevent participants from overfitting their models. The gold sets were released after the deadline (for obvious reasons) so that participants could analyze the effectiveness of their systems in detail. You don't really need the test set: just download the corresponding gold set and use it both for getting predictions and for comparing them with the gold annotations.
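For illustration, here is a minimal sketch of how you might collect the gold categories per sentence with Python's standard library. The element and attribute names (sentence, Opinion, category, id) follow the SemEval-2016 ABSA XML layout as I remember it, and the file name is just a placeholder for whichever gold file you downloaded, so check both against your copy of the data:

import xml.etree.ElementTree as ET

# Sketch: collect the gold aspect categories for each sentence id.
tree = ET.parse("restaurants_sb1_gold.xml")  # placeholder file name
gold = {}
for sentence in tree.getroot().iter("sentence"):
    cats = {op.get("category") for op in sentence.iter("Opinion")}
    gold[sentence.get("id")] = cats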

Best regards,
Jakub Macháček

On Sunday, March 6, 2016, at 20:42:38 UTC+1, Nouman Dilawar wrote:

Nouman Dilawar

Apr 16, 2016, 8:33:27 AM
to SemEval-ABSA, mpon...@gmail.com
Hey! Thanks for the explanation :)

I have written Python code to predict the aspect categories of sentences, but I am a bit confused about calculating accuracy or micro-averaged F1 scores. Suppose a sentence x belongs to three categories a, b, and c, and my system correctly predicted a and b but failed to predict category c. How should I treat this sort of prediction? Should I mark the whole sentence as wrong and give my system a score of 0, or give it a score of 2 out of 3?

Jakub Macháček

Apr 17, 2016, 3:48:17 AM
to SemEval-ABSA, mpon...@gmail.com
Hello again,

Let's modify your example a little. Suppose a sentence contains aspect categories a and b, and your system returns a and c. Then it correctly predicted a (a true positive), failed to predict b (a false negative), and falsely predicted c (a false positive). True negatives are irrelevant here.

Precision is defined as true_positives/(true_positives+false_positives)

Recall is defined as true_positives/(true_positives+false_negatives)

F1-measure is defined as the harmonic mean of precision and recall, i.e. F1 = 2 * precision * recall / (precision + recall). For the micro average, you sum the true positive, false positive, and false negative counts over all test sentences and then compute precision, recall, and F1 from those totals.
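In Python, a sketch of that micro-averaged computation, assuming gold and pred are dictionaries mapping sentence ids to sets of categories (as in the parsing sketch earlier in the thread), could look like:

def micro_f1(gold, pred):
    # gold and pred: sentence id -> set of aspect categories
    tp = fp = fn = 0
    for sid, gold_cats in gold.items():
        pred_cats = pred.get(sid, set())
        tp += len(gold_cats & pred_cats)  # correctly predicted categories
        fp += len(pred_cats - gold_cats)  # predicted but not in the gold set
        fn += len(gold_cats - pred_cats)  # in the gold set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Your example: gold {a, b}, predicted {a, c} -> tp=1, fp=1, fn=1,
# so precision = recall = 0.5 and F1 = 0.5
print(micro_f1({"x": {"a", "b"}}, {"x": {"a", "c"}}))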


On Saturday, April 16, 2016, at 14:33:27 UTC+2, Nouman Dilawar wrote:

Nouman Dilawar

Apr 17, 2016, 5:56:30 AM
to Jakub Macháček, SemEval-ABSA, Maria Pontiki
Thank you so much, that is a great answer :)

Sorry to bother you again. One more thing I would like to ask: should I take care of out-of-scope sentences, or should I skip them? They don't have any categories in them.

Regards,
Nouman Dilawar



Jakub Macháček

Apr 17, 2016, 10:08:10 AM
to SemEval-ABSA, xtr...@gmail.com, mpon...@gmail.com
The out-of-scope sentences were not included in the official evaluation.
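If you load the XML as sketched earlier in the thread, you can drop them before scoring. As far as I recall, the 2016 data marks them with an OutOfScope attribute on the sentence element, but please verify the attribute name and value against your copy:

for sentence in tree.getroot().iter("sentence"):
    if sentence.get("OutOfScope") == "TRUE":  # assumed attribute name/value
        continue
    # ...collect and score this sentence as before...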

On Sunday, April 17, 2016, at 11:56:30 UTC+2, Nouman Dilawar wrote: