Test data versus Training data

Caroline Brun

Oct 10, 2018, 10:58:44 AM

to SemEval 2019: Task 9

Dear organizers

It seems that most of the sentences (~95%) in TrialData_SubtaskA_Test.csv are also included in the full training set, Training_Full_V1.1.csv, and that
the few remaining sentences are just slight (corrected) variants of sentences in the training set.
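
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how the overlap can be measured. The helper names (`normalize`, `read_sentences`, `overlap_fraction`) are made up for illustration, and the position of the sentence column in the CSV files is an assumption that should be checked against the actual files.

```python
import csv
import re


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", sentence.strip().lower())


def read_sentences(path: str, text_column: int = 1) -> list:
    """Read one column of a CSV file as a list of sentences.

    text_column=1 is an assumption about the file layout, not a documented fact.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return [row[text_column] for row in csv.reader(handle) if len(row) > text_column]


def overlap_fraction(test_sentences, train_sentences) -> float:
    """Return the share of test sentences that also occur in the training sentences."""
    train_set = {normalize(s) for s in train_sentences}
    test_norm = [normalize(s) for s in test_sentences]
    if not test_norm:
        return 0.0
    hits = sum(1 for s in test_norm if s in train_set)
    return hits / len(test_norm)
```

Calling `overlap_fraction(read_sentences("TrialData_SubtaskA_Test.csv"), read_sentences("Training_Full_V1.1.csv"))` would give the exact-match share. Note that this only catches copies that are identical after normalization; the slightly corrected variants would need a fuzzy comparison instead.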

Is this normal? During the trial period, are we indeed supposed to score TrialData_SubtaskA_Test.csv, i.e. a subsample of the training set?

Thanks for clarifying this point.

Best regards,

Caroline

Sapna Negi

Oct 10, 2018, 1:11:25 PM

to Caroline Brun, SemEval 2019: Task 9

Dear Caroline,

No, this wasn’t supposed to be the case.

Thanks for pointing this out. We may have mixed up the files; we will rectify this ASAP.

Regards

Sapna Negi

Oct 11, 2018, 6:00:34 PM

to SemEval 2019: Task 9, Caroline Brun

Dear participants

Further to the report that trial test data was present in the training set, we have updated the GitHub repo with a new version of the training dataset, Training_Full_V1.2.csv, from which the test instances have been removed.

https://github.com/Semeval2019Task9/Subtask-A
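
For participants who keep their own copies or splits of the data, the removal of test instances can be sketched roughly as below. This is not the script used to produce Training_Full_V1.2.csv; the function name `remove_test_instances` and the assumption that the sentence sits in column 1 of each row are illustrative only.

```python
import re


def _norm(sentence: str) -> str:
    """Lowercase and collapse whitespace before comparing sentences."""
    return re.sub(r"\s+", " ", sentence.strip().lower())


def remove_test_instances(train_rows, test_sentences, text_index: int = 1):
    """Return the training rows whose sentence does not occur in the test set.

    train_rows: list of rows (each a list of strings), e.g. from csv.reader.
    text_index: position of the sentence in a row (assumed, check the real files).
    """
    test_set = {_norm(s) for s in test_sentences}
    return [row for row in train_rows if _norm(row[text_index]) not in test_set]
```

Exact matching after normalization will not drop the slightly corrected variants mentioned earlier in the thread, so a manual look at near-duplicates may still be worthwhile.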

Apologies for any inconvenience caused. We also request that participants re-submit their results on CodaLab, using the updated training set.

Also, we will release the final version of the full training set in a few days, with additional training sentences.