About the NUCLE dataset


14484...@qq.com

Feb 18, 2019, 12:25:26 AM
to BEA 2019 Shared Task: Grammatical Error Correction
In the NUCLE Release 3.3 dataset I requested, the file nucle.train.gold.bea19.m2 contains only about 50,000 sentences. Is that correct?

BEA 2019 Shared Task Organisers

Feb 18, 2019, 8:53:46 AM
to BEA 2019 Shared Task: Grammatical Error Correction
That is correct.

Here are the statistics for all the datasets allowed in the Restricted Track:

FCE: 33,237 sentences (train/dev/test)
Lang-8: 1,037,561 sentences
NUCLE: 57,151 sentences
W&I+LOCNESS: 43,129 sentences (train/dev/test)

Although Lang-8 is the biggest, it is also the noisiest of these datasets. All the others were professionally annotated with GEC in mind and should be of higher quality.
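
For anyone who wants to check a sentence count locally, here is a minimal sketch. It assumes the standard M2 layout, in which each source sentence is on a line starting with "S " and each annotation is on a line starting with "A ". The function name count_m2_sentences and the two-sentence sample are illustrative only and are not taken from the released data.

```python
import io


def count_m2_sentences(lines):
    """Count source sentences in M2-formatted text.

    Assumes every sentence line begins with "S " and that annotation
    lines ("A ...") and blank separator lines should be ignored.
    """
    return sum(1 for line in lines if line.startswith("S "))


# Tiny made-up sample in M2 layout (not real NUCLE data).
sample = io.StringIO(
    "S This are a sentence .\n"
    "A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0\n"
    "\n"
    "S Another sentence .\n"
    "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"
    "\n"
)
print(count_m2_sentences(sample))

# To count a real file, pass an open file handle instead, e.g.:
# with open("nucle.train.gold.bea19.m2", encoding="utf-8") as f:
#     print(count_m2_sentences(f))
```

Running this on nucle.train.gold.bea19.m2 should give a figure close to the NUCLE count listed above, provided the file follows the layout assumed here.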