Questions about SemEval-2021 Task 12 - Learning with Disagreements


ryan stark

Dec 3, 2020, 10:27:09 PM
to semeval-task12-participants
Hi, I am interested in SemEval-2021 Task 12, but I have some questions that I cannot make sense of. I hope you can answer the questions below. Thank you very much.

Q1: In the PDIS dataset, every case has a "features" field. Your description says: "'features': a list of lists, each inner list representing syntactic features extracted for each mention".
I just don't understand where the 'features' come from, and how should I use them?

Q2: In the PDIS dataset I notice that one annotator_id can have more than one label. How should I understand this? In other words, why is the value type a list rather than an integer in the annotator_id/annotations key-value pairs?

Q3: In the PDIS dataset's submission format section, you say: "where model predictions => a list with 2 elements, representing the probability of the image belonging to each of the 2 classes." What does "image" mean here? Is it a slip of the pen?

Q4: How does one win this competition, and which leaderboard should I focus on in the official competition?

Thanks again!

Alexandra Uma

Dec 4, 2020, 4:22:50 AM
to semeval-task12-participants
Hello Ryan,

Thank you for your interest in our shared task, and for sharing your questions on this forum.

Re-Q1: The features were adapted from the feature set in the paper on incremental fine-grained information status by Hou (2016). The baseline PDIS model provided in starting_kit.zip uses those features to learn the PDIS classifier.

Re-Q2: Yes, the value for each annotator_id is a list of integers, not a single integer. In PDIS, annotators were allowed to annotate the same item (markable, or in this case, mention) more than once. We provide all the annotations given by the annotator for any given item.
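
For illustration, here is a minimal sketch of how such per-annotator label lists could be flattened into a soft label distribution. The values below are made-up placeholders, not an actual dataset entry:

from collections import Counter

# Hypothetical item: each annotator_id maps to a *list* of labels, because an
# annotator may have labelled the same mention more than once.
annotations = {"12": [1], "37": [0, 1], "58": [1]}

# Flatten all crowd labels for this mention and normalise the counts into a
# probability distribution over the two PDIS classes.
counts = Counter(label for labels in annotations.values() for label in labels)
total = sum(counts.values())
soft_label = [counts.get(c, 0) / total for c in (0, 1)]
print(soft_label)  # [0.25, 0.75]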

Re-Q3: Yes, that is a slip of the pen; I'm sorry. We meant "...probability of the mention belonging to each of the two classes". We'll correct that.
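
To make the intended format concrete, a prediction entry would look roughly like the sketch below. The mention identifiers are placeholders; please check the task participation page for the exact submission schema:

# Hypothetical submission entries: one two-element probability list per mention.
predictions = {
    "mention_001": [0.18, 0.82],  # P(class 0), P(class 1); should sum to 1
    "mention_002": [0.64, 0.36],
}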

Re-Q4: You may decide to focus on a single task and propose a disagreement-aware loss function that works for that task alone. If you do this, you will not win the overall competition, but you may well come out on top on the leaderboard for that task.
To win the competition, your novel approach to learning from crowd annotations must have the best average score across all the tasks.

I hope this response answers your questions. If not, please feel free to ask for further clarification.

Thank you.



Venneesha Gudimetla

Oct 18, 2021, 11:17:42 AM
to semeval-task12-participants
Hi,
I am working on SemEval-2021 Task 12 (the humour task) as part of my coursework, and I am looking for some help, as I have only basic programming experience and no prior ML experience.
I downloaded all the training and dev data and starting_kit.zip, and I was able to generate an output.
I have a few questions:
1) I don't see a column for the humour task under the Hard or Soft evaluation. Is the AURC-8 column where I should look for the humour results?
2) I have submitted the output file generated with the given data and model, but I still see a 0 on the scoreboard. What could be wrong?
3) I believe that some modifications are needed to the given starting-kit model, in terms of adding new code or functionality, but I am stuck wondering what they could be.
4) I am also wondering how we evaluate the output against the given input.

I know these questions might seem very lame, but I am really new to ML and am finding it difficult to relate all the pieces and make sense of them.
So any input is appreciated.

Thank you.

Alexandra Uma

Oct 22, 2021, 5:20:09 AM
to Venneesha Gudimetla, semevaltask1...@googlegroups.com
Hello,

Thank you for choosing to participate in the shared task.

To answer your questions:
1. Yes, the humour task results are shown in the AURC-8 column.
2. To answer this question, I need to ask: were your submissions made under the username asuresh5? If so, the submissions don't match the required format. Have you consulted the task participation page?
3. The starting kits provide a base model for learning a task, and training is done in a standard way (computing the cross entropy of the model's predictions with respect to labels aggregated from a crowd consensus, or with respect to gold hard labels). Modifications can be made in one of several ways (see the sketch after this list):
- in how the hard labels are aggregated, as in Dawid and Skene (1979);
- in generating soft labels from the crowd labels, as in Peterson et al. (2019) or Uma et al. (2020);
- or in the way the model computes its loss function using the crowd information, as in Rodrigues and Pereira (2018) or Plank et al. (2014b).
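
As a minimal sketch of the second option, here is one way to train against soft labels derived from crowd counts instead of a single hard label. This uses PyTorch purely for illustration; the tensors are placeholders, not the actual starting-kit code:

import torch
import torch.nn.functional as F

def soft_label_loss(logits, soft_labels):
    # Cross entropy of the model's predictions against a crowd-derived
    # probability distribution, instead of a single aggregated hard label.
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_labels * log_probs).sum(dim=-1).mean()

# Placeholder batch: 3 items, 2 classes.
logits = torch.randn(3, 2, requires_grad=True)   # model outputs
soft_labels = torch.tensor([[0.25, 0.75],        # normalised crowd
                            [1.00, 0.00],        # label counts
                            [0.60, 0.40]])
loss = soft_label_loss(logits, soft_labels)
loss.backward()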
4. Evaluation is done using: (1) F1 with respect to the gold labels (this is the hard evaluation), and (2) cross entropy with respect to the probabilistic soft labels derived from the crowd (this is the soft evaluation). See Uma et al. (2020) or Uma et al. (2021) for further details.
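
As a rough illustration of those two measures (a sketch using scikit-learn and NumPy, not the official scoring script):

import numpy as np
from sklearn.metrics import f1_score

# Placeholder predictions for 3 items over 2 classes.
pred_probs = np.array([[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]])
gold_hard = np.array([1, 0, 0])                                # gold hard labels
crowd_soft = np.array([[0.25, 0.75], [1.0, 0.0], [0.6, 0.4]])  # crowd soft labels

# Hard evaluation: F1 of the argmax predictions against the gold labels.
hard_score = f1_score(gold_hard, pred_probs.argmax(axis=1))

# Soft evaluation: cross entropy of the predicted distribution against the
# crowd-derived soft labels (lower is better).
eps = 1e-12
soft_score = -(crowd_soft * np.log(pred_probs + eps)).sum(axis=1).mean()
print(hard_score, soft_score)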

I hope this response clarifies things. Feel free to reach out again if you have further questions or requests regarding the task - no questions will be considered lame.

Kind regards,
SemEval-2021 Task 12 organizers.

