Maarten van Gompel
Nov 11, 2009, 10:35:43 AM
to SemEval2010.CrossLingualLexicalSubstitution
Dear organisers,
I would like to bring to your attention some technical issues I found
with the trial data, clls.trial.data. You make extensive use of XML
entities, but some of them are broken, possibly by a tokenisation
process. As a result, the XML is not well-formed, and strict XML
parsers may fail on it. See, for example, instance 161; the problem
occurs in several other places as well.
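A minimal sketch of how such broken entities could be located, assuming the breakage leaves an "&" that no longer starts a well-formed reference (for example "&amp;" split into "& amp ;" by a tokeniser, which is only a guess at the cause). The helper names find_broken_entities and is_well_formed are hypothetical, not part of any task tooling:

```python
import re
import xml.etree.ElementTree as ET

# Matches an '&' that does NOT begin a syntactically valid entity or
# character reference such as &amp;, &#233; or &#xE9;.
BROKEN_ENTITY = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def find_broken_entities(text):
    """Return (line_number, column) pairs for every malformed '&'."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in BROKEN_ENTITY.finditer(line):
            hits.append((lineno, match.start()))
    return hits


def is_well_formed(text):
    """True if the standard-library parser accepts the document.

    Note: without a DTD this parser also rejects undeclared named
    entities (e.g. &eacute;), not only syntactically broken ones.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


sample = '<instance id="161">Tom &amp; Jerry, rock & amp ; roll</instance>'
print(is_well_formed(sample))        # False
print(find_broken_entities(sample))  # one hit: the '&' in '& amp ;'
```

The regular expression pinpoints where the damage is, while the parser check confirms whether a given instance would actually be rejected.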
Secondly, there are some encoding issues. Something seems to have gone
wrong here: the text is not valid UTF-8, ISO-8859-1, cp1252, or any
other encoding I recognize. See, for example, instances 16, 100, 112,
133 and 190, among others.
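A rough sketch of a per-instance encoding check, assuming the raw bytes of each instance are available separately (how the file is split into instances is left out). The function name failing_encodings is hypothetical. ISO-8859-1 is deliberately not tested, because it maps every byte to a character and therefore never raises; for that encoding only a manual look at the decoded text can reveal garbling:

```python
def failing_encodings(raw, encodings=("utf-8", "cp1252")):
    """Map each encoding that rejects `raw` to the offset of the first bad byte.

    Decoding is strict, so any invalid byte sequence raises
    UnicodeDecodeError. An empty result means every listed encoding
    accepted the bytes, which does not prove the text is correct.
    """
    failures = {}
    for enc in encodings:
        try:
            raw.decode(enc)
        except UnicodeDecodeError as err:
            failures[enc] = err.start
    return failures


print(failing_encodings(b"caf\xc3\xa9"))  # {}: valid UTF-8 and valid cp1252
print(failing_encodings(b"caf\xe9"))      # {'utf-8': 3}: a lone Latin-1 byte
print(failing_encodings(b"\x81\xff"))     # rejected by both encodings
```

Bytes that fail every candidate encoding, as in the last call, match the situation described above.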
Kind regards,
--
Maarten van Gompel (Proycon)
Induction of Linguistic Knowledge Research Group
University of Tilburg