Question about the difference between the dev set we downloaded and CodaLab's

robin

Mar 6, 2019, 11:06:07 PM
to BEA 2019 Shared Task: Grammatical Error Correction
Hi,

I've found a small difference between the dev set we downloaded and the one on CodaLab. What is the reason, and which one should we use?

Thanks!

BEA 2019 Shared Task Organisers

Mar 7, 2019, 7:14:34 AM
to BEA 2019 Shared Task: Grammatical Error Correction
Hi Robin,

We just double-checked and the files are identical. Are you using version 2 of the W&I+LOCNESS data we uploaded yesterday? What difference did you find?

robin

Mar 7, 2019, 9:33:53 PM
to BEA 2019 Shared Task: Grammatical Error Correction
The number of sentences is different. I downloaded the data from https://www.cl.cam.ac.uk/research/nl/bea2019st/data/wi+locness.bea19.tar.gz and it contains 4377 sentences, but CodaLab's version contains 4384. It seems that in some cases you split one example into two. For example,

"i have a dog it name 's chente , it is a golden retriver . It`s a lover dog , he knows how i feel ." is in the original A.dev.gold.bea19.m2, but in CodaLab's version it is split into "i have a dog it name 's chente , it is a golden retriver ." and "It`s a lover dog , he knows how i feel ."
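
For anyone who wants to run the same check, here is a minimal sketch of how the two versions could be compared. It relies only on the M2 convention that each source sentence sits on a line starting with "S " and each annotation on a line starting with "A ". The function names and the in-memory sample are illustrative assumptions, and the "noop" annotation lines are placeholders, not the real annotations from the data.

```python
def read_m2_sentences(lines):
    """Collect source sentences from an iterable of M2-format lines.

    In M2 files each sentence is on a line starting with "S ";
    annotation lines start with "A " and are ignored here.
    """
    sentences = []
    for line in lines:
        if line.startswith("S "):
            sentences.append(line[2:].rstrip("\n"))
    return sentences


def diff_sentences(old, new):
    """Return the sentences that appear in only one of the two lists."""
    old_set, new_set = set(old), set(new)
    only_old = [s for s in old if s not in new_set]
    only_new = [s for s in new if s not in old_set]
    return only_old, only_new


# Placeholder annotation line (NOT the real annotation for these sentences).
NOOP = "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"

# In-memory stand-ins for the two versions, using the sentence quoted above.
old_m2 = [
    "S i have a dog it name 's chente , it is a golden retriver . "
    "It`s a lover dog , he knows how i feel .\n",
    NOOP,
    "\n",
]
new_m2 = [
    "S i have a dog it name 's chente , it is a golden retriver .\n",
    NOOP,
    "\n",
    "S It`s a lover dog , he knows how i feel .\n",
    NOOP,
    "\n",
]

old_sents = read_m2_sentences(old_m2)
new_sents = read_m2_sentences(new_m2)
only_old, only_new = diff_sentences(old_sents, new_sents)

print(len(old_sents), len(new_sents))
print("only in old:", only_old)
print("only in new:", only_new)
```

With real files, the lists above would be replaced by open file handles, e.g. `read_m2_sentences(open(path, encoding="utf-8"))`, where `path` points to each copy of A.dev.gold.bea19.m2.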

On Thursday, March 7, 2019 at 8:14:34 PM UTC+8, BEA 2019 Shared Task Organisers wrote:

BEA 2019 Shared Task Organisers

Mar 8, 2019, 7:48:47 AM
to BEA 2019 Shared Task: Grammatical Error Correction

You are using an old version of the data. Please download v2 from the website.

Let us know if you still have problems,
Mariano