==========================================================
WebNLG+: The Second WebNLG Challenge
First call for participation: Training data now
available!
==========================================================
WebNLG goes bilingual (English, Russian) and
bidirectional (generation and parsing)!
TASKS
The challenge comprises two main tasks:
- Task 1, RDF-to-text generation: similar to WebNLG
2017, but with new data and two target languages;
- Task 2, Text-to-RDF semantic parsing: converting a
text into the corresponding set of RDF triples.
For Task 1, given the four RDF triples shown in (a),
the aim is to generate a text such as (b) or (c). For
Task 2, the opposite should be achieved, i.e. to
generate the triples in (a) starting from text as in (b)
or (c).
(a) Set of RDF triples
<entry category="Company" eid="Id21" size="4">
<modifiedtripleset>
<mtriple>Trane | foundingDate |
1913-01-01</mtriple>
<mtriple>Trane | location |
Ireland</mtriple>
<mtriple>Trane | foundationPlace |
La_Crosse,_Wisconsin</mtriple>
<mtriple>Trane | numberOfEmployees |
29000</mtriple>
</modifiedtripleset>
</entry>
(b) English text
Trane, which was founded on January 1st 1913 in La
Crosse, Wisconsin, is based in Ireland. It has 29,000
employees.
(c) Russian text
Компания "Тране", основанная 1 января 1913 года в
Ла-Кроссе в штате Висконсин, находится в Ирландии. В
компании работают 29 тысяч человек.
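As an illustration of the input format in (a), the following
is a minimal Python sketch that reads one entry and splits each
triple into subject, property and object. It assumes that the
three fields are always separated by " | " and that no field
contains a pipe character; the function name parse_entry is
illustrative and not part of any official WebNLG+ tooling.

```python
import xml.etree.ElementTree as ET

# Example entry copied from (a), with each triple on a single line.
ENTRY_XML = """
<entry category="Company" eid="Id21" size="4">
  <modifiedtripleset>
    <mtriple>Trane | foundingDate | 1913-01-01</mtriple>
    <mtriple>Trane | location | Ireland</mtriple>
    <mtriple>Trane | foundationPlace | La_Crosse,_Wisconsin</mtriple>
    <mtriple>Trane | numberOfEmployees | 29000</mtriple>
  </modifiedtripleset>
</entry>
"""


def parse_entry(xml_text):
    """Return (category, [(subject, property, object), ...]) for one entry."""
    entry = ET.fromstring(xml_text.strip())
    triples = []
    for mtriple in entry.iter("mtriple"):
        # Assumption: fields are pipe-separated; trim spaces and newlines.
        parts = [part.strip() for part in mtriple.text.split("|")]
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields, got {len(parts)}: {mtriple.text!r}")
        triples.append(tuple(parts))
    return entry.get("category"), triples


category, triples = parse_entry(ENTRY_XML)
for subject, prop, obj in triples:
    print(subject, prop, obj)
```

For Task 1 such a list of triples would be the system input; for
Task 2 it would be the expected output.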
INDICATIVE DATES
- 15 April 2020: Release of Training and Development
Data
- 30 April 2020: Release of some simple preliminary
evaluation scripts to support development
- 30 May 2020: Release of the final evaluation
scripts
- 13 September 2020: Release of Test Data
- 27 September 2020: Entry submission deadline
- 15-18 December 2020: Results of automatic and human
evaluations and system presentations at INLG 2020
DATA DOWNLOAD and REGISTRATION
To register for the WebNLG+ task and download the
WebNLG+ training and development data, please fill in
the form below:
The data, evaluation scripts and system outputs of
WebNLG 2017 can also be downloaded here:
EVALUATION
For the evaluation phase, new test sets will be
released (see Indicative Dates) for all categories seen
in the training data, and for several new unseen
categories (categories not included in the training
data). For each task, a team may submit more than one
system, but only one output per system; in other words,
multiple submissions of the same non-deterministic
system should be avoided. Participants
are free to choose which task and language they want to
provide results for (generation and/or semantic parsing,
English and/or Russian).
System outputs as well as baseline and human-produced
outputs will be evaluated.
For RDF-to-text generation, two evaluations will be
carried out:
- Automatic evaluation, with standard n-gram-based
and embedding-based metrics such as BLEU, METEOR, TER,
ChrF++ and BERTScore; global and detailed results will
be provided (per DBpedia category, per input size,
per category and input size, etc.).
- Human evaluation: system outputs will be assessed
according to criteria such as
grammaticality/correctness, appropriateness/adequacy and
fluency/naturalness, by native speakers recruited on
crowdsourcing platforms.
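The breakdown of automatic results mentioned above (global, per
category, per input size, per category and input size) can be
sketched as a simple grouping step. The scores below are made-up
placeholder values, and averaging per-entry scores is only an
assumption made for illustration: corpus-level metrics such as
BLEU are normally recomputed on each subset rather than averaged,
and the official scripts may proceed differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-entry scores: (category, input size, metric value).
entry_scores = [
    ("Company", 4, 0.50),
    ("Company", 1, 0.70),
    ("Airport", 4, 0.30),
    ("Airport", 4, 0.50),
]


def aggregate(scores, key):
    """Average the metric value over groups defined by key(category, size)."""
    groups = defaultdict(list)
    for category, size, value in scores:
        groups[key(category, size)].append(value)
    return {group: mean(values) for group, values in groups.items()}


global_score = mean(value for _, _, value in entry_scores)
per_category = aggregate(entry_scores, lambda category, size: category)
per_size = aggregate(entry_scores, lambda category, size: size)
per_category_and_size = aggregate(
    entry_scores, lambda category, size: (category, size)
)
```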
For Text-to-RDF semantic parsing, the automatic
evaluation of three aspects is foreseen, in terms of
recall, precision and F1-score:
- Property identification.
- Subject and object identification.
- Full triple identification.
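To make the three aspects concrete, here is a minimal Python
sketch of exact-match precision, recall and F1 over sets. It is
not the official scorer: the function names are illustrative,
subject and object identification is assumed here to mean
matching (subject, object) pairs, and the actual evaluation
scripts may use a different or more lenient matching scheme.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1, comparing the inputs as sets."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def score_triples(gold, predicted):
    """Score the three aspects; triples are (subject, property, object)."""
    return {
        "property": prf(
            [p for _, p, _ in gold], [p for _, p, _ in predicted]
        ),
        # Assumption: subject and object are scored together as a pair.
        "subject_object": prf(
            [(s, o) for s, _, o in gold], [(s, o) for s, _, o in predicted]
        ),
        "full_triple": prf(gold, predicted),
    }


gold = [
    ("Trane", "foundingDate", "1913-01-01"),
    ("Trane", "location", "Ireland"),
    ("Trane", "foundationPlace", "La_Crosse,_Wisconsin"),
    ("Trane", "numberOfEmployees", "29000"),
]
# Hypothetical system output: one wrong property and one missing triple.
predicted = [
    ("Trane", "foundingDate", "1913-01-01"),
    ("Trane", "headquarter", "Ireland"),
    ("Trane", "numberOfEmployees", "29000"),
]

scores = score_triples(gold, predicted)
for aspect, (precision, recall, f1) in scores.items():
    print(f"{aspect}: P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case the system finds every subject-object pair it
outputs but mislabels one property, so the full-triple scores are
lower than the subject-object scores.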
Initially, preliminary evaluation scripts are
released and can be used to test the models. The final
evaluation scripts and metrics used for WebNLG+ will be
provided at a later stage (see Indicative Dates).
MOTIVATION
The WebNLG data was originally created to promote the
development of RDF verbalisers able to generate short
text and to handle micro-planning (i.e., sentence
segmentation and ordering, referring expression
generation, aggregation); the data for the first
challenge included a total of 15 DBpedia categories. The
2020 challenge aims first of all at extending the
dataset (and hence the coverage of the verbalisers) by
covering more categories and an additional language. The
other main objective of the 2020 edition is to promote
the development of knowledge extraction tools, with a
task that mirrors the verbalisation task.
[RDF Verbalisers] The RDF language, in which DBpedia
is encoded, is widely used within the Linked Data
framework. Many large-scale datasets are encoded in this
language (e.g., MusicBrainz, FOAF, LinkedGeoData) and
official institutions increasingly publish their data in
this format. Being able to generate good-quality text
from RDF data would open the way to many new
applications such as making linked data more accessible
to lay users, enriching existing text with information
drawn from knowledge bases or describing, comparing and
relating entities present in these knowledge bases.
[Multilinguality] By providing a bilingual corpus
(English and Russian), we aim to promote the development
of tools for languages other than English and to allow
for experimentation with pre-training and transfer
approaches (do the English verbalisations of RDF triples
help in better verbalising the triples in Russian?).
[Knowledge extraction] The new semantic parsing task
opens up new lines of research in several directions.
Can it be used to bootstrap entity linkers? How does
RDF-based semantic parsing relate to other semantic
parsing tasks where the output semantic representations
are lambda terms or KB queries? Can semantic parsing be
used to improve generation in ways similar to the
back-translation approaches proposed in machine
translation?
ORGANISING COMMITTEE
* Thiago Castro Ferreira, Federal University of Minas
Gerais, Brazil
* Claire Gardent, CNRS/LORIA, Nancy, France
* Nikolai Ilinykh, University of Gothenburg, Sweden
* Chris van der Lee, Tilburg University, The
Netherlands
* Simon Mille, Universitat Pompeu Fabra, Barcelona,
Spain
* Diego Moussallem, Paderborn University, Germany
* Anastasia Shimorina, Université de Lorraine/LORIA,
Nancy, France
CONTACT
REFERENCES
* Creating Training Corpora for NLG Micro-Planners.
C. Gardent, A. Shimorina, S. Narayan and L.
Perez-Beltrachini. Proceedings of ACL 2017. Vancouver
(Canada).
* The WebNLG challenge: Generating text from RDF
data. C. Gardent, A. Shimorina, S. Narayan and L.
Perez-Beltrachini. Proceedings of INLG, 2017. Santiago
de Compostela (Spain).
* Building RDF Content for Data-to-Text Generation.
L. Perez-Beltrachini, R. Sayed and C. Gardent.
Proceedings of COLING 2016. Osaka (Japan).
* Enriching the WebNLG corpus. T. Castro Ferreira, D.
Moussallem, E. Krahmer and S. Wubben. Proceedings of
INLG, 2018. Tilburg (The Netherlands).
* Creating a corpus for Russian data-to-text
generation using neural machine translation and
post-editing. A. Shimorina, E. Khasanova and C. Gardent.
Proceedings of BSNLP Workshop, 2019. Florence (Italy).