Regarding chunking sub-task


Mishal Kazmi

Jan 25, 2016, 4:06:23 AM
to ists-s...@googlegroups.com
Hello,

Regarding the chunking sub-task: should we be able to detect whether the text is gold-tokenized (currently this is the case for images and headlines, but not for student answers), or will we be provided with gold-tokenized text?

For example, there are cases where a dot or a comma is marked as a separate token for us.

If we are provided with the gold-tokenized text, we can focus on the chunking itself.

The issue is mostly with the Student-Answers dataset, where the text is not tokenized properly.

Best,
Mishal Kazmi
PhD Student in Electronics Engineering
Human Language and Speech Technologies Lab
Sabanci University

Eneko Agirre

Jan 27, 2016, 7:14:59 AM
to Mishal Kazmi, ists-s...@googlegroups.com

Hi Mishal, all,


The answers-students dataset is not tokenized. As the gold standard and the evaluation dataset depend on the indices of the tokens (where a token is a string separated by whitespace), we suggest that participants do the following for the answers-students dataset:

- Remove all punctuation which is not tokenized.
- Don't do anything which changes the indices of tokens (where token = string separated by whitespace).

Two examples follow:
 
  A battery should connect to a bulb in a closed path.

  bulbs a, b, and c are on a path with the battery

- In the first example, the punctuation at the end of the sentence would be removed (there are 158 such sentences in the test dataset).

- In the second example, the commas of "a," and "b," would be removed (73 sentences affected).

Sorry about this. There was an oversight/misunderstanding among the organizers in the case of this dataset. We hope this solution is acceptable to participants.

best

eneko



On 01/25/2016 at 10:06 AM, Mishal Kazmi wrote:

--

Eneko Agirre
Euskal Herriko Unibertsitatea
University of the Basque Country
http://ixa2.si.ehu.eus/eneko

Peter Schüller

Jan 27, 2016, 9:21:02 AM
to Interpretable STS Semeval Task, misha...@sabanciuniv.edu
Thank you for the clarification!

Best Regards,
Peter Schüller