accents / encoding

Richard Wicentowski

Mar 28, 2010, 10:26:22 PM
to SemEval2010.CrossLingualLexicalSubstitution
Hi,

Two-part question:

1. In the documentation, it says that you removed all diacritics by
converting them to regular characters. Is the gold standard in this
format, or should our answers include diacritics?
2. If the answer to #1 is "include diacritics", should our files be
encoded as Latin-1 or UTF-8?

Thanks,

Rich

Diana McCarthy

Mar 29, 2010, 7:38:49 AM
to clls...@googlegroups.com
Hi Rich,

I understand from my co-organisers that we are removing all diacritics
to make it easier for systems (see 2.2 of the documentation), so don't
include diacritics in your answers.
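
For anyone preparing answer files, one possible way to strip diacritics
is sketched below. This is only an illustration using Python's standard
unicodedata module, not the organisers' own conversion; the function
name strip_diacritics is made up for this example.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining accent marks removed (hypothetical helper)."""
    # NFD splits each accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ["corazón", "niño", "pingüino"]:
        print(word, "->", strip_diacritics(word))
```

Characters with no canonical decomposition (for example ß or ø) pass
through unchanged, so they would need separate handling if they occur.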

Very best,

Diana

--

===========================================================================
Diana McCarthy, http://www.dianamccarthy.co.uk/
Lexical Computing Ltd. http://www.sketchengine.co.uk/
===========================================================================

