Evaluation data live

Scott Hale

Jan 9, 2022, 10:39:24 PM
to semeval-2022-task-8-multilingual-news
Dear all,

The evaluation/test data for SemEval 2022 Task 8: Multilingual News Article Similarity is now available. Please see "The Data" section of the CodaLab website for details.

The automatic scoring system is not yet active; it will be activated later this week. The SSL certificate on codalab.org appears to have expired at the moment. You should be able to bypass the warning in your browser, but if you prefer, you can also download the data from Google Drive.

The text below is a repeat from the website:

[The data] is in the same format as the other datasets but does not have the "Geography", "Entities", "Time", "Narrative", "Overall", "Style", or "Tone" columns. To participate in the competition, you should provide a CSV file with the pair id in column one and your prediction for the numeric Overall score (1-4, decimals ok) in column two. The automatic scoring program will be activated later this week, and further details on the data format will be provided.
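
For illustration, here is a minimal sketch of writing a two-column submission file with Python's standard csv module. It is not an official tool from the organizers: the header row and its names ("pair_id", "Overall") are assumptions, as is clipping out-of-range scores to 1-4, since the exact format details were still to be announced.

```python
import csv
import io


def write_submission(predictions, out):
    """Write one row per pair: pair id in column one, Overall score in column two.

    Assumptions (not confirmed by the organizers): a header row with the
    hypothetical names "pair_id" and "Overall" is included, and scores
    outside the 1-4 range are clipped into range.
    """
    writer = csv.writer(out)
    writer.writerow(["pair_id", "Overall"])  # hypothetical header row
    for pair_id, score in predictions.items():
        # Overall is on a 1-4 scale and decimals are allowed, so only clip.
        clipped = min(4.0, max(1.0, float(score)))
        writer.writerow([pair_id, f"{clipped:.4f}"])


if __name__ == "__main__":
    # Hypothetical pair ids and model scores, for illustration only.
    buf = io.StringIO()
    write_submission({"pair_a": 2.37, "pair_b": 4.6}, buf)
    print(buf.getvalue())
```

To write a real file, pass `open("submission.csv", "w", newline="")` instead of the StringIO buffer; `newline=""` is what the csv module documentation recommends for files handed to csv.writer.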

All news articles referenced in the dataset were available on the Internet Archive at the time of posting. We recommend using the semeval_8_2022_ia_downloader Python package released earlier with the trial and training data. If you have any problems accessing articles, please let us know on the Google group.

As mentioned at the start of the competition, the evaluation data includes surprise languages not seen in training (including in the cross-lingual setting). We plan to enable an English-only option for the competition, in which case you will only be expected to submit similarity scores for pairs where `url1_lang` and `url2_lang` are both "en".
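
As a rough illustration of the English-only selection, the sketch below keeps only rows whose `url1_lang` and `url2_lang` are both "en". The two column names come from the announcement; the "pair_id" column and the sample rows are hypothetical stand-ins for the real evaluation CSV.

```python
import csv
import io


def english_only_pairs(rows):
    """Yield only the rows where both articles are English.

    `rows` is any iterable of dicts, e.g. from csv.DictReader; the column
    names url1_lang and url2_lang are the ones named in the announcement.
    """
    for row in rows:
        if row["url1_lang"] == "en" and row["url2_lang"] == "en":
            yield row


if __name__ == "__main__":
    # Tiny in-memory stand-in for the evaluation CSV (hypothetical rows).
    sample = "pair_id,url1_lang,url2_lang\np1,en,en\np2,en,de\np3,es,es\n"
    reader = csv.DictReader(io.StringIO(sample))
    print([row["pair_id"] for row in english_only_pairs(reader)])  # ['p1']
```

Because the function is a generator, it works the same on a small sample or on the full file read row by row.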

Best wishes,
SemEval Task 8 Organizers

nidhir bhavsar

Feb 1, 2022, 5:02:30 AM
to semeval-2022-task-8-multilingual-news
Hello Organizers,

Out of curiosity, are we allowed to submit English-only results, as mentioned in the earlier email? If so, should they be submitted before the deadline, or should we report the scores directly in the paper?

Best, 
Nidhir

Scott Hale

Feb 1, 2022, 9:14:24 AM
to nidhir bhavsar, semeval-2022-task-8-multilingual-news
We'll release the evaluation data after the current phase ends, and that should allow everyone to calculate their score breakdowns for different languages. 
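
Once gold scores are available, a per-language breakdown could be computed roughly as sketched below. This is an illustration, not the official scorer. It assumes Pearson correlation as the metric (check the task page for the official one), a hypothetical "pair_id" column, and grouping by the (`url1_lang`, `url2_lang`) pair. Groups with fewer than two scored pairs are skipped because correlation is undefined there.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def breakdown_by_language_pair(gold_rows, predictions):
    """Score predictions separately for each (url1_lang, url2_lang) group.

    gold_rows: iterable of dicts with "pair_id" (assumed column name),
    "url1_lang", "url2_lang" and the gold "Overall" score.
    predictions: dict mapping pair_id -> predicted Overall score.
    """
    groups = defaultdict(lambda: ([], []))
    for row in gold_rows:
        pid = row["pair_id"]
        if pid not in predictions:
            continue  # unscored pair: leave it out of the breakdown
        golds, preds = groups[(row["url1_lang"], row["url2_lang"])]
        golds.append(float(row["Overall"]))
        preds.append(float(predictions[pid]))
    return {
        langs: pearson(golds, preds)
        for langs, (golds, preds) in groups.items()
        if len(golds) >= 2
    }
```

On Python 3.10+, `statistics.correlation` could replace the hand-written `pearson`; the explicit version is kept here so the sketch runs on older interpreters too.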

We didn't set up a separate CodaLab site or leaderboard for English-only in the end. We had hoped it could be done with one CodaLab website but later realized that was not possible.

If anyone who has not yet submitted to the task would like to submit to an English-only task, please let me know.

Best wishes,
Scott
