Second CfP: Shared Task on the Disambiguation of German Verbal Idioms at KONVENS 2021


waszcz...@gmail.com

May 17, 2021, 6:55:15 AM
to verbalmwe
[apologies for cross-postings]

Shared Task on the Disambiguation of German Verbal Idioms at KONVENS 2021

https://competitions.codalab.org/competitions/31715

Second call for participation – Training data ready

The shared task on the disambiguation of German verbal idioms (VIDs)
aims to disambiguate instances of a pre-selected set of German VIDs from
their literal counterparts, e.g. /Das Handy fiel ins Wasser/ ('The
mobile phone fell into the water') vs. /Das Konzert fiel ins Wasser/
('The concert was cancelled'). This kind of disambiguation is an
implicit or explicit step during VID identification and it is a
well-known challenge for NLP applications like parsing or machine
translation. The caveat is that literal readings of such expressions are
quite rare relative to idiomatic ones, so one of our goals was to
alleviate this issue by providing a corpus with a lower-than-usual
idiomaticity rate. This allows for the training and evaluation of
classifiers that can distinguish VIDs from their literal counterparts.

#### Shared Task Website

The shared task will be realized on CodaLab:
https://competitions.codalab.org/competitions/31715

Participants will have to register in order to compete, i.e. to submit
their results and have them evaluated.

The data is made available on our official GitHub repository:
https://github.com/rafehr/vid-disambiguation-sharedtask

#### Publication

Shared task participants will be invited to submit a system description
paper which, upon acceptance, will be published in the shared task
proceedings on konvens.org. Acceptance depends on the quality of the
paper rather than on the results obtained in the shared task.

#### Data

The shared task data consists of 9906 instances of a set of German VID
types or their literal counterparts in context. The set of VID types was
pre-selected, so the data constitutes a lexical sample data set. It is a
merger of the COLF-VID
(https://www.aclweb.org/anthology/2020.figlang-1.29.pdf) and the German
SemEval-2013 task 5b data set
(https://www.aclweb.org/anthology/S13-2007.pdf).

The data is released through our official GitHub repository:
https://github.com/rafehr/vid-disambiguation-sharedtask

The training data was released on May 15, and the test data will be
ready on June 23, which also marks the start of the evaluation phase.

#### Evaluation

Participating teams will be required to submit the test data annotated
with the predictions made by their systems. Teams therefore submit only
their results, not the systems themselves. The predictions will be
compared to the gold data. The evaluation will focus on the minority
class of literal readings. Furthermore, we will include unseen VID types
in the dev and test sets to challenge the systems' generalization
capabilities.
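
To illustrate what an evaluation focused on the literal class could look
like, here is a minimal sketch that computes precision, recall and F1 for
a single target class. It is not the official scorer: the exact metric
used on CodaLab is not stated in this call, and the label strings
"literal" and "idiomatic" as well as the function name `literal_prf` are
assumptions made for the example.

```python
from typing import Sequence, Tuple


def literal_prf(
    gold: Sequence[str],
    pred: Sequence[str],
    target: str = "literal",  # assumed label name, not taken from the task data
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one target class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    # True positives: instance is literal and was predicted literal.
    tp = sum(1 for g, p in pairs if g == target and p == target)
    # False positives: predicted literal, but gold says otherwise.
    fp = sum(1 for g, p in pairs if g != target and p == target)
    # False negatives: literal instance that the system missed.
    fn = sum(1 for g, p in pairs if g == target and p != target)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    gold = ["idiomatic", "literal", "literal", "idiomatic"]
    pred = ["idiomatic", "literal", "idiomatic", "literal"]
    print(literal_prf(gold, pred))
```

Scoring only the target class means that a system which labels every
instance as idiomatic gets an F1 of 0, even though its plain accuracy on
idiom-heavy data would be high.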

As mentioned above, we will use CodaLab for the shared task
(https://competitions.codalab.org/competitions/31715). Hence, the
predictions made by the systems will be submitted to the CodaLab site
where they will be automatically evaluated.

#### Important Dates

- Trial data ready: April 23, 2021
- Training data ready: May 15, 2021
- Test data ready: June 23, 2021
- Evaluation end: June 30, 2021
- Paper submission due: July 15, 2021
- Camera ready due: August 10, 2021
- KONVENS 2021: September 6-10, 2021

#### Organizing Team

Rafael Ehren, Laura Kallmeyer, Timm Lichte and Jakub Waszczuk
Contact: vid.disambi...@gmail.com