This thread contains a summary of some of the most important questions asked about the shared task so far:
1. Since the W&I+LOCNESS corpus is split into different levels, will the test data also be split into different levels?
No. Participants will be given a single plain text file that contains all the tokenised test sentences of all levels combined. Systems should not expect to know the CEFR level of an input text in advance and should hence be prepared to handle all levels and abilities.
2. Will the test sentences be shuffled?
No. To make sure participants do not know the CEFR level of an essay, the test data will be shuffled at the essay level rather than the sentence level. The sentence order within each essay will be preserved.
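As a rough illustration of what essay-level shuffling means, here is a minimal sketch. It is not the organisers' actual script: the function names, the fixed seed and the toy essays are all assumptions made for the example. Each essay is treated as a list of tokenised sentences, the essays are reordered, and the sentences inside each essay are left untouched.

```python
import random


def shuffle_essays(essays, seed=0):
    """Return the essays in a random order; sentences inside each essay keep their order.

    `essays` is a list of essays, each essay being a list of sentence strings.
    The seed is a hypothetical choice that only makes the example repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(essays)  # shallow copy, so the caller's list is not reordered
    rng.shuffle(shuffled)    # reorders whole essays, never individual sentences
    return shuffled


def flatten(essays):
    """Concatenate the essays into one flat list of sentences, one per output line."""
    return [sentence for essay in essays for sentence in essay]


# Toy data (made up): three essays of different lengths.
essays = [
    ["A1 .", "A2 .", "A3 ."],
    ["B1 .", "B2 ."],
    ["C1 ."],
]

test_sentences = flatten(shuffle_essays(essays, seed=13))
```

In the flattened result the essays may appear in any order, but "A1 ." is always directly followed by "A2 ." and "A3 .", so a system can still use the neighbouring sentences as context.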
3. Will my system be evaluated against a gold annotated reference or an automatically annotated reference?
A gold annotated reference. Although there will be a mismatch if we compare an automatically annotated hypothesis against a gold annotated reference, we previously found no statistically significant difference between a gold and an automatic reference (Bryant et al., 2017: Section 4.1). This is largely because systems tend to struggle with long, complicated edits, and humans tend to annotate them inconsistently anyway.
4. Will punctuation be normalised in the test set?
Yes. Almost all non-ASCII punctuation characters have been normalised to their standard ASCII equivalents.
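If you want to apply a similar normalisation to your own training data, a minimal sketch is shown below. The character mapping is an assumption chosen for illustration and is not the organisers' actual list, which has not been published in this thread; the name `normalise_punctuation` is likewise hypothetical.

```python
# Hypothetical mapping from common non-ASCII punctuation to ASCII equivalents.
# This is an illustrative subset, not the list used to build the test set.
PUNCT_MAP = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def normalise_punctuation(text):
    """Replace the mapped punctuation characters; everything else is left unchanged."""
    return text.translate(PUNCT_MAP)


print(normalise_punctuation("I don\u2019t know \u2026"))  # prints: I don't know ...
```

Characters outside the mapping, such as accented letters, pass through untouched, so the function only affects punctuation.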
5. Can I use just the source or target side of a parallel GEC corpus in the Low Resource Track?
No. Other than the W&I+LOCNESS development set, you cannot use any other learner data.
6. Can I use Wikipedia edits in Tracks 1, 2 or 3?
Wikipedia edits may only be used in Tracks 2 and 3. You should already have enough data in Track 1 not to need Wikipedia edits.