Training Data Available - Practice Phase has now begun.

h.tayyarmadabushi

Sep 13, 2021, 2:43:39 PM
to semeval-2022-task-2-MWE
Dear all, 

The training data for the task has recently been released and the practice phase has now begun. During this phase, you are encouraged to build your models and submit results to CodaLab (https://competitions.codalab.org/competitions/34710). 

For a detailed description of the tasks and the data format, please see the task description website: https://sites.google.com/view/semeval2022task2-idiomaticity

We have also released some strong baselines and preprocessing scripts as Google Colab notebooks, which you might find useful. In addition, the dataset description paper is available at: https://arxiv.org/abs/2109.04413

To familiarise yourself with this Task and get started, we suggest:

  1. Join the Task mailing list so you receive regular updates on the task.

  2. Read through this page to understand the two Subtasks and their settings.

  3. Familiarise yourself with the timelines of this Task.

  4. Decide on the Subtask(s) and Setting(s) you intend to participate in. You can participate in any one (or more) of the following:

    1. Subtask A: Zero-shot

    2. Subtask A: One-shot

    3. Subtask B: Pre-train

    4. Subtask B: Fine-tune

  5. Step through the Google Colab Notebooks with the Baselines for each Subtask and Setting you wish to participate in so you understand the requirements.

  6. Submit the resultant baseline file to CodaLab so you are clear about the submission format.

  7. Start working on your own method for your chosen Subtask(s) and Setting(s).

  8. Submit your results.


If you have any questions at all, please do not hesitate to post them on this forum. You can also get in touch with the organisers at semeval-2022-task-2-...@sheffield.ac.uk
 
Best wishes,
Harish


Rob v

Dec 20, 2021, 5:35:19 AM
to semeval-2022-task-2-MWE
Thanks for releasing the data. I am a bit confused about the training data labels for Subtask B. Is it correct that it only contains 100% positive examples (i.e. a 1.0 similarity score)? It seems that in your Google Colab example you use predicted labels as gold, right? (in: _get_predictions_for_train_data_labels)

Harish Tayyar Madabushi

Dec 20, 2021, 8:28:56 AM
to semeval-2022-task-2-MWE
Hi, 

That's a great question, and yes, you are right. The similarity scores between sentence pairs that are not equivalent (i.e. where the STS score is not 1) are not provided; instead, you are given a pair of alternative sentences (which will never contain an idiomatic expression) whose similarity can be used as a proxy for that of the original two. _get_predictions_for_train_data_labels, as you've pointed out, generates these labels.

Notice that this provides you with a way to train your model to output similarity scores that are consistent between sentences that contain idiomatic expressions (the original pair) and those that do not (the alternative pair provided) -- which is the goal of Subtask B. Importantly, if we released these scores, then we would be requiring you to train your model to be consistent with the model we used to generate these scores instead of being self-consistent. 

This is the case presented in the table "Subtask B: Sample Training Data (Case 1)" on the task description website.

In summary: 
  1. When the similarity is provided, it will always be 1 and can be used to train your model (called Case 2 in the example on the task description website). 
  2. When the similarity is not provided, you will be given two alternative sentences that do not contain idiomatic expressions, and you will need to find the similarity between those two sentences to use as the training label for the original sentence pair (called Case 1 in the example on the task description website). Typically, you'd first use a model to generate the labels (using the alternative pairs) and then train the *same* model using these generated labels and the original sentence pairs (which contain idiomatic expressions); see the sketch below.
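
To make this concrete, here is a minimal sketch of the procedure (not the organisers' Colab code): generate proxy labels from the alternative pairs with the current model, then fine-tune the same model on the original pairs. It assumes the sentence-transformers library; the iterable train_rows and its column names are hypothetical stand-ins for the training CSV.

```python
# Minimal sketch of the Case 1 / Case 2 training procedure described above.
# NOTE: train_rows and its column names are hypothetical stand-ins for the
# task's training CSV; adapt them to the actual file.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

model = SentenceTransformer("bert-base-multilingual-cased")

def proxy_label(alt_1: str, alt_2: str) -> float:
    """Score the alternative (idiom-free) pair with the current model."""
    emb_1, emb_2 = model.encode([alt_1, alt_2], convert_to_tensor=True)
    return util.cos_sim(emb_1, emb_2).item()

train_examples = []
for row in train_rows:
    if row["sim"] != "":           # Case 2: gold similarity provided (always 1.0)
        label = float(row["sim"])
    else:                          # Case 1: score the alternative pair as a proxy
        label = proxy_label(row["alternative_1"], row["alternative_2"])
    # Train on the ORIGINAL pair (which may contain an idiom) with that label,
    # pushing the model towards self-consistent similarity scores.
    train_examples.append(
        InputExample(texts=[row["sentence_1"], row["sentence_2"]], label=label))

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```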

Hope this clarifies things and do let me know if you have any other questions or run into any difficulties. 

Best
Harish

初征

Dec 29, 2021, 4:36:46 AM
to semeval-2022-task-2-MWE
Hello, organisers! I have some questions about the data restrictions.
1. In Subtask A, can the model be trained using data that is not officially supplied? For example, can I train with data other than [train_one_shot.csv, train_zero_shot.csv], which are provided officially?
2. What is the definition of "idiom specific data" in Subtask B's pre-train setting? Is it data that marks the MWE in a sentence and whether that MWE is an idiom or not?
Thank you. 

Harish Tayyar Madabushi

Dec 31, 2021, 2:15:34 AM
to semeval-2022-task-2-MWE
Hi, 

That's a good question. 

Subtask A
Please do NOT use training data other than that provided [train_one_shot.csv, train_zero_shot.csv] for this subtask. We have carefully designed the tasks to differentiate between the zero-shot and one-shot settings, and we also want to be able to directly compare models, which we would not be able to do if you used your own training data. I've updated the task description website with this note. 

Subtask B
Yes, you are right. Please do not use any data with annotations associated with MWEs or idiomaticity in the pre-train setting of Subtask B. You are free to use any training data (either your own or additional data generated from the data provided) in the fine-tune setting of Subtask B. 

In summary: 
  1. Subtask A, Zero-shot: Use "train_zero_shot.csv" only. Do NOT use your own training data or the one-shot data. 
  2. Subtask A, One-shot: Use "train_zero_shot.csv" and "train_one_shot.csv" only. Do not use your own training data. 
  3. Subtask B, pre-train: Use any training data that you generate which does not explicitly mark out MWEs or idiomaticity (no idiom specific data). 
  4. Subtask B, fine-tune: Use train_data.csv and any training data you generate. 


Hope that helps and let me know if you have any other questions. 

Wishing everyone a fantastic new year. 

Harish

Rob v

Dec 31, 2021, 5:53:03 AM
to semeval-2022-task-2-MWE
Hi Harish, 

I have already prepared my model for Subtask A, which uses multi-task learning (it is really the focus of my approach). It is not trained on similar tasks; can I still submit, and perhaps not participate in the final ranking?

Best, 
Rob

Harish Tayyar Madabushi

Dec 31, 2021, 11:21:11 AM
to semeval-2022-task-2-MWE
Hi Rob (and anyone else in a similar position), 

Yes, absolutely, and we also encourage you to submit a paper. We'll list systems that make use of additional data from dissimilar tasks separately in the task description paper. Unfortunately, we will not be able to include them in the final ranking. 

Best
Harish

Dylan RS Phelps

Jan 1, 2022, 9:33:47 AM
to semeval-2022-task-2-MWE
Hi Harish,

I am hoping to get a clarification about which setting I fall under. I have collected examples for each of the idioms in the dev and eval datasets, and am using these to train my model. Does this fall under the pre-train or fine-tune setting? The data doesn't contain any STS scores, just an idiom and a list of sentences which contain that idiom.

Thanks!

Harish Tayyar Madabushi

Jan 4, 2022, 6:44:08 AM
to semeval-2022-task-2-MWE
Hi, 

Thanks for this question - I should have been clearer in my original post. The restriction for Subtask B, pre-train should have read as follows: 

Subtask B, pre-train: Use any training data that you generate which does NOT explicitly consider MWEs or idiomaticity for the assignment of STS scores (no idiom specific data). An example of data that you can NOT use is any data that is similar to the fine-tune setting's training data. For clarity: You are allowed to use sentences containing MWEs (e.g. for pre-training as in the dataset paper) as long as you do not include associated STS scores.


So, to answer your question: as long as you haven't assigned STS scores to the sentences you have collected, you fall under the "pre-train" setting. Please note that the test set will have a different set of idioms, so you should be prepared to collect whatever associated data you need - you will have from the 10th of January (when the test data will be released) until the end of January (when the evaluation period ends) to do this. 
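
As a concrete (hypothetical) illustration of what the pre-train setting permits, the sketch below does plain masked-language-model pre-training on raw collected sentences, with no STS scores anywhere. It assumes the HuggingFace transformers and datasets libraries; the file name collected_sentences.txt is a placeholder.

```python
# Rough sketch: idiom-agnostic continued pre-training (MLM) on raw sentences.
# No STS scores are used, so this stays within the pre-train setting.
# "collected_sentences.txt" (one sentence per line) is a hypothetical file.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# The sentences may contain MWEs, but they carry no idiomaticity or STS labels.
dataset = load_dataset("text", data_files={"train": "collected_sentences.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-pretrain",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()
```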

 For completeness, I've put down the full list of restrictions below (and have also updated the task description website with this): 
  1. Subtask A, Zero-shot: Use "train_zero_shot.csv" only. Do NOT use your own training data or the one-shot data. 
  2. Subtask A, One-shot: Use "train_zero_shot.csv" and "train_one_shot.csv" only. Do not use your own training data. 
  3. Subtask B, pre-train: Use any training data that you generate which does NOT explicitly consider MWEs or idiomaticity for the assignment of STS scores (no idiom specific data). An example of data that you can NOT use is any data that is similar to the fine-tune setting's training data. For clarity: You are allowed to use sentences containing MWEs (e.g. for pre-training as in the dataset paper) as long as you do not include associated STS scores.
  4. Subtask B, fine-tune: Use train_data.csv and any training data you generate. For clarity: You are allowed to add your own sentence pairs with STS scores for this setting. 

Hope this helps and let me know if anything is unclear. 

Best
Harish