Apostrophes and encoding


David P

May 23, 2015, 4:26:34 AM
to antword...@googlegroups.com
Hi,

I'm working on a Mac and have saved Word documents as UTF-8 TXT files using Word for Mac. When I run them through AntWordProfiler (AWP), the word counts are often wrong. This seems to be due to apostrophes, which sometimes show up as follows when I open the files in various apps (LibreOffice, Word, TextEdit):
’
with a space before or within the sequence, which seems to make the token counter count an extra word.

However, the sequence ’ doesn't show up as an off-list word in AWP.

Any advice for sorting this out would be much appreciated,
thanks,
David
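
One way to check what is actually in such a file is to list the apostrophe-like characters it contains. The sketch below is only illustrative: the function names and the set of characters checked are assumptions, not anything from AntWordProfiler. It also shows a well-known effect that may or may not be what happened here: the UTF-8 bytes of a curly apostrophe (U+2019) turn into a three-character sequence when an application reads them as Windows-1252.

```python
import unicodedata
from collections import Counter

# Characters that commonly stand in for an apostrophe (an illustrative set).
APOSTROPHE_LIKE = {"\u0027", "\u2019", "\u2018", "\u02bc", "\u0060", "\u00b4"}


def apostrophe_report(text):
    """Count each apostrophe-like character, keyed by code point and Unicode name."""
    counts = Counter(ch for ch in text if ch in APOSTROPHE_LIKE)
    return {f"U+{ord(ch):04X} {unicodedata.name(ch)}": n for ch, n in counts.items()}


def as_cp1252_mojibake(text):
    """Show how UTF-8 text looks when its bytes are wrongly read as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


if __name__ == "__main__":
    sample = "don\u2019t and it's"
    print(apostrophe_report(sample))
    print(as_cp1252_mojibake("\u2019"))  # the curly apostrophe becomes three characters
```

To run it on a real file, read the file with `open(path, encoding="utf-8")` and pass the contents to `apostrophe_report`.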

David P

May 23, 2015, 8:11:11 AM
to antword...@googlegroups.com
Well, after trying it on a PC, I think I've worked out that AWP treats 't and 's as separate words, which I guess is reasonable :-)

Laurence Anthony

Jun 4, 2015, 8:39:06 AM
to antword...@googlegroups.com
Hi,

Sorry for the long delay. You can also edit the AntWordProfiler token definition to include the apostrophes, but it is not standard practice.

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################
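
To illustrate why the token definition changes the word count, here is a minimal sketch with two regular expressions. Both patterns are hypothetical stand-ins, since the thread does not show AntWordProfiler's real token definition or its syntax. The first treats only letters as token characters, so a contraction splits in two; the second also allows a straight or curly apostrophe inside a word.

```python
import re

# Hypothetical token definitions, for illustration only.
LETTERS_ONLY = re.compile(r"[A-Za-z]+")
LETTERS_AND_APOSTROPHES = re.compile(r"[A-Za-z]+(?:['\u2019][A-Za-z]+)*")


def tokens(text, pattern):
    """Return every non-overlapping match of the token pattern in the text."""
    return pattern.findall(text)


if __name__ == "__main__":
    for sentence in ("I don't know", "I don\u2019t know"):
        split = tokens(sentence, LETTERS_ONLY)
        joined = tokens(sentence, LETTERS_AND_APOSTROPHES)
        print(len(split), split)
        print(len(joined), joined)
```

With the letters-only pattern the contraction counts as two tokens ("don" and "t"), which matches the behaviour David describes. The second pattern keeps it as one token, at the cost of departing from the usual practice Laurence mentions.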
