PARSEME Shared Task 1.2 - 1st Call for Participation


Carlos Ramisch

Nov 13, 2019, 10:50:22 AM
to Parseme ST Core

PARSEME shared task 1.2 on semi-supervised identification of verbal multiword expressions

First call for participation

(Apologies for cross-posting)

The third edition of the PARSEME shared task on automatic identification of verbal multiword expressions (VMWEs) aims at identifying VMWEs in running text.  Verbal MWEs include, among others, idioms (to let the cat out of the bag), light-verb constructions (to make a decision), verb-particle constructions (to give up), multi-verb constructions (to make do) and inherently reflexive verbs (s'évanouir 'to faint' in French).  Their identification is a well-known challenge for NLP applications due to their complex characteristics, including discontinuity, non-compositionality, heterogeneity and syntactic variability.

Previous editions have shown that, while some systems reach high performance (F1>0.7) for identifying VMWEs that were seen in training data, performance on unseen VMWEs is very low (F1<0.2). Hence for this third edition, **emphasis will be put on discovering VMWEs that were not seen in the training data**.

We kindly ask potential participant teams to register using the expression of interest form:

Task updates and questions will be posted on the shared task website:

and announced on our public mailing list:

#### Publication and workshop

Shared task participants will be invited to submit a system description paper to a special track of the Joint Workshop on Multiword Expressions and Electronic Lexicons (MWE-LEX 2020), at COLING 2020, to be held on September 13 or 14, 2020, in Barcelona, Spain:

Submitted system description papers must follow the workshop submission instructions and will go through double-blind peer review by other participants and selected MWE-LEX 2020 program committee members.  Acceptance depends on the quality of the paper rather than on the results obtained in the shared task. Authors of accepted papers will present their work as posters/demos in a dedicated session of the workshop, co-located with COLING 2020.  The submission of a system description paper is not mandatory.

Due to double-blind review, participants are asked to provide a nickname (i.e., a name that does not identify authors, universities, research groups, etc.) for their systems when submitting results and in the submitted papers.

#### Provided data

For each language, we provide participants with corpora in which VMWEs are annotated according to the edition 1.1 shared task guidelines:

On March 18th, we will release, for each language: 

* A training corpus manually annotated for VMWEs;

* A development corpus to tune/optimize the systems' parameters;

* A larger raw corpus to favor semi-supervised and unsupervised methods for VMWE discovery.

On April 28th, we will release, for each language:

* A blind test corpus to be used as input to the systems during the evaluation phase, during which the VMWE annotations will be kept secret.

When available, morphosyntactic data (parts of speech, lemmas, morphological features and/or syntactic dependencies) are also provided, both for annotated and raw corpora.  Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe).

So far we plan to include data for the following languages:

Bulgarian (BG), German (DE), Greek (EL), Basque (EU), French (FR), Hebrew (HE), Hindi (HI), Croatian (HR), Hungarian (HU), Polish (PL), Brazilian Portuguese (PT), Romanian (RO), Swedish (SV).

The amount of annotated data depends on the language.

#### Tracks

System results can be submitted in two tracks:

  * Closed track: Systems using only the provided training and development data (with VMWE and morphosyntactic annotations) plus the provided raw corpora.

  * Open track: Systems that may or may not use the provided training data, plus any additional resources deemed useful (MWE lexicons, symbolic grammars, wordnets, other raw corpora, word embeddings and language models trained on external data, etc.). However, the use of corpora from previous shared task editions is strictly forbidden. This track notably includes purely symbolic and rule-based systems.

Teams submitting systems in the open track will be requested to describe and provide references to all resources used at submission time. Teams are encouraged to favor freely available resources for better reproducibility of their results.

#### Evaluation metrics

Participants will provide the output produced by their systems on the test corpus. This output will be compared with the gold standard (ground truth).
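The comparison with the gold standard can be illustrated with a minimal exact-match F1 computation over annotated VMWE spans. This is a hypothetical sketch, not the official scorer: the function name `f1_exact` and the span representation (sentence id plus set of token indices) are assumptions for illustration, and the official evaluation script released with the trial data defines the authoritative metrics.

```python
def f1_exact(gold, pred):
    """Exact-match F1 over two sets of VMWE spans.

    Each span is a hashable identifier, e.g. a pair
    (sentence_id, frozenset of token indices). A prediction counts
    as a true positive only if it matches a gold span exactly.
    """
    tp = len(gold & pred)                      # exact matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {(1, frozenset({3, 4})), (2, frozenset({0, 5}))}
pred = {(1, frozenset({3, 4})), (2, frozenset({0, 1}))}
# One of two predictions matches: precision = recall = 0.5, F1 = 0.5
```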

Emphasis will be put on discovery of the VMWEs in the test corpus that were unseen in the training data. A VMWE from the test corpus is considered seen if a VMWE with the same (multi-)set of lemmas is annotated at least once in the training corpus.
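The seen/unseen criterion above can be sketched in a few lines: a VMWE is represented by the multiset of its lemmas (so token order and inflection are ignored), and a test VMWE is "seen" if that multiset occurs among the annotated training VMWEs. The helper names below are hypothetical; the official evaluation scripts implement the authoritative version.

```python
from collections import Counter

def lemma_multiset(vmwe_lemmas):
    # Represent a VMWE by the multiset of its lemmas; frozenset of
    # (lemma, count) pairs makes the multiset hashable.
    return frozenset(Counter(vmwe_lemmas).items())

def split_seen_unseen(test_vmwes, train_vmwes):
    """Partition test VMWEs into seen/unseen w.r.t. the training data."""
    seen_keys = {lemma_multiset(v) for v in train_vmwes}
    seen, unseen = [], []
    for v in test_vmwes:
        (seen if lemma_multiset(v) in seen_keys else unseen).append(v)
    return seen, unseen

train = [["make", "decision"], ["give", "up"]]
test = [["decision", "make"], ["let", "cat", "out", "bag"]]
seen, unseen = split_seen_unseen(test, train)
# "decision make" is seen (same lemma multiset as "make decision",
# order ignored); "let cat out bag" is unseen.
```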

#### Important dates


  * Feb 19, 2020: trial data and evaluation script released

  * Mar 18, 2020: training and development data plus raw corpora released

  * Apr 08, 2020: final call for participation

  * Apr 28, 2020: blind test data released

  * Apr 30, 2020: submission of system results

  * May 06, 2020: announcement of results

  * May 20, 2020: shared task system description papers due (same deadline as regular papers)

  * Jun 24, 2020: notification of acceptance

  * Jul 11, 2020: camera-ready system description papers due

  * Sep 13-14, 2020: shared task session at the MWE-LEX 2020 workshop at COLING 2020

#### Organizing team

Carlos Ramisch, Marie Candito, Bruno Guillaume, Agata Savary, Ashwini Vaidya, and Jakub Waszczuk
