Task 8 Paper Draft


David Jurgens

Mar 14, 2022, 11:59:54 PM
to semeval-2022-task-8-multilingual-news
Hi Task 8 participants,

  We're thrilled to share a draft of the Task 8 paper, which goes into significant detail on how the data was collected and annotated. The task paper also tries to synthesize all of the systems you submitted and suggest key takeaways.

  We welcome any feedback, particularly around whether we have described your system correctly (see Table 6). If you have questions or suggestions for additional things we might include, please let us know. We certainly learned a lot about the systems (and the task) through writing and analyzing all the performances.

  Thanks,
  David Jurgens (on behalf of the organizers)
SemEval-2022_Task-8_draft.pdf

Shotaro Ishihara

Mar 15, 2022, 1:11:15 AM
to David Jurgens, semeval-2022-task-8-multilingual-news
Hello,

Thank you for sharing a draft of the Task 8 paper.
I'm a member of team Nikkei, and we have two pieces of feedback:

- In Table 6, the system of Nikkei does fine-tuning (change "no" to "yes").
- In the citations, the authors of the Nikkei paper are "Shotaro Ishihara and Hono Shirai".

Best,
Shotaro Ishihara at Nikkei

--
You received this message because you are subscribed to the Google Groups "semeval-2022-task-8-multilingual-news" group.
To unsubscribe from this group and stop receiving emails from it, send an email to semeval-2022-task-8-mult...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/semeval-2022-task-8-multilingual-news/CAD9LGsHLjvMHwC6SY545Bj-r56OBB%2BbLx%2B4ZeUDmy%2BNhwMd_YA%40mail.gmail.com.

Emanuela Boros

Mar 15, 2022, 12:27:58 PM
to semeval-2022-task-8-multilingual-news
Hello,

The reference for the paper "EMBEDDIA at SemEval-2022 Task 8: Investigating Sentence, Image, and Knowledge Graph Representations for Multilingual News Article Similarity" is written incorrectly: two authors are missing, and for some reason there is a location reference in place of the other authors.

Thanks.

mattia.samory

Mar 16, 2022, 5:32:17 AM
to semeval-2022-task-8-multilingual-news
Thank you both for spotting the oversights, I have amended them in the paper!
Best,
  m

sandeep

Mar 16, 2022, 10:22:55 AM
to semeval-2022-task-8-multilingual-news
Hi David,

I am from Team HuaAMS. Thanks for sharing the draft of the paper and allowing us to submit feedback.

A couple of things:
  • In Table 6, against HuaAMS, I noticed that under "Data Handling / NER" it says we use NER. Please note that this is NOT the case; we used NER only in our baselines (NER matching + cosine similarity).
  • This brings me to my next point: in Section 4.5 you mention two systems, one that relies on pre-trained embeddings and scores 0.759, and one that uses NER matching + cosine similarity and scores in the top 25%. Neither of these has a reference.
Best,

S


nidhir bhavsar

Mar 16, 2022, 10:32:47 AM
to semeval-2022-task-8-multilingual-news
Continuing the previous discussion: under Section 4.3 there is a mention of using informative parts from an article, which our system also does, so maybe it's worth citing our team's paper there. Thank you.

Михаил Сергеевич Куимов

Apr 10, 2022, 4:28:30 PM
to semeval-2022-task-8-multilingual-news
Hi, David!

It is not clear to me from the draft how cross-lingual pairs were handled in the baseline approaches. It is mentioned that the baselines use Jaccard similarity as a feature, but for all cross-lingual pairs it will be zero. Can you clarify this, please?

Respectfully,
Mikhail Kuimov


Scott Hale

Apr 11, 2022, 2:50:28 AM
to Михаил Сергеевич Куимов, semeval-2022-task-8-multilingual-news
Thank you, Mikhail, for flagging. For cross-lingual pairs, we first extracted named entities in each language. We then compared these named entities to Wikipedia page titles and used the interlanguage links in WikiData to determine if named entities in different languages were in fact the same. The Jaccard similarity for cross-lingual pairs is calculated using only the named entities successfully mapped to Wikipedia page titles (for monolingual pairs it is all named entities extracted). 

I've updated the task paper to the following and will further revise as we prepare the camera-ready version.
> First, the named entities of each article are extracted using spaCy and polyglot. For monolingual pairs, we select pairs of articles having high Jaccard similarity of these named entities. For cross-lingual pairs, we attempt to match the named entities to Wikipedia article titles and store the Wikidata concept ids for matching Wikipedia articles, which are language agnostic. We then select cross-lingual pairs of articles having high Jaccard similarity of these Wikidata concept ids.
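The matching step described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the organizers' actual code: the entity-to-concept-id table below is a hypothetical stand-in for the real Wikipedia-title/interlanguage-link lookup, and the function names `jaccard` and `cross_lingual_similarity` are invented here for the sketch.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical entity -> Wikidata concept id table. In the real pipeline,
# extracted entities are matched against Wikipedia page titles and mapped
# through interlanguage links to language-agnostic Wikidata ids.
CONCEPT_IDS = {
    "Angela Merkel": "Q567", "Ангела Меркель": "Q567",
    "Berlin": "Q64", "Берлин": "Q64",
    "Moscow": "Q649",
}

def cross_lingual_similarity(entities_a, entities_b):
    """Map each article's entities to concept ids, keep only the ones
    that resolved, then compare the resulting id sets."""
    ids_a = {CONCEPT_IDS[e] for e in entities_a if e in CONCEPT_IDS}
    ids_b = {CONCEPT_IDS[e] for e in entities_b if e in CONCEPT_IDS}
    return jaccard(ids_a, ids_b)

# English vs. Russian article sharing Merkel and Berlin; "NATO" is an
# entity that fails to resolve and is dropped, as in the described setup.
sim = cross_lingual_similarity(
    ["Angela Merkel", "Berlin", "NATO"],
    ["Ангела Меркель", "Берлин", "Moscow"],
)
print(round(sim, 3))  # 2 shared ids out of 3 in the union -> 0.667
```

Note that surface-form Jaccard on the same pair would be zero, which is exactly the issue the Wikidata mapping addresses.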

Thank you!
Scott

