Using word lists to generate keywords - error messages


David King

Feb 25, 2014, 9:50:42 AM
to ant...@googlegroups.com
Hi Professor Anthony

I am trying to use a word list (generated with Sketch Engine) as a reference against which to compare a corpus we have compiled here at UAL. The word list's format did not match the one AntConc requires, so I reformatted it as RANK FREQUENCY WORD and saved it as a .txt file. Although the rank and frequency columns contain nothing but numbers, I keep getting the error messages "One or more of the rank column values is not a number" and "One or more of the frequency column values is not a number". I have read through and double-checked the read-me file, but I am at a loss to understand what I've done wrong.

Many thanks in advance for your time and attention

David King

PS: I doubt you'll remember our correspondence over the summer of 2013 (regarding type/token ratios and what seemed to be AntConc/AntProfiler discrepancies), but I've since finished that corpus research (it was for my MA at KCL) and I passed with distinction! Sincere, albeit belated, thanks for your input!

Laurence Anthony

Feb 25, 2014, 10:28:51 AM
to ant...@googlegroups.com
Dear David,

Congratulations on your MA distinction. Well done!

If you use AntConc 3.4.1, when there are mismatches, the software will tell you at which lines in your list you have problems. If you go to those lines, you should be able to identify what the problem is.

Also, make sure that your file is saved in the same character encoding as the target files. UTF-8 is the default in AntConc 3.4.1 and I recommend you use it.
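For anyone who prefers a script to a text editor, here is a minimal Python sketch of re-saving a list as UTF-8. It assumes the original file is in the Windows "ANSI" code page and that this code page is cp1252 (true of many English-language Windows setups, but an assumption here); `ansi_to_utf8` is a hypothetical helper, not part of AntConc.

```python
from pathlib import Path


def ansi_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-save the text file `src` as UTF-8 (without a BOM) at `dst`.

    source_encoding is an assumption: Notepad's "ANSI" means the system's
    legacy code page, which is cp1252 on many English-language setups.
    """
    text = Path(src).read_text(encoding=source_encoding)
    # The plain "utf-8" codec writes no BOM ("utf-8-sig" would add one).
    Path(dst).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python ansi_to_utf8.py input.txt output.txt
    ansi_to_utf8(sys.argv[1], sys.argv[2])
```

If the list contains characters outside the assumed code page, pass the correct `source_encoding` instead.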

Laurence.

David King

Mar 4, 2014, 5:56:10 AM
to ant...@googlegroups.com
Hi Laurence

Thank you for your prompt reply.  As it turned out, the file had been saved as ANSI and not UTF-8.  I have since changed that, but the error message I now get is "One or more of the rank column values in file RANKFREQUENCYWORD.txt is not a number. First error on line: 1.  Value: ..."  then there is a small square shape followed by the number 1.  I have gone back into the file and re-typed numbers 1 to 10 in UTF-8 but that doesn't seem to have made a difference.  I'm wondering if there might be another explanation?

Many thanks
David 

Laurence Anthony

Mar 4, 2014, 7:27:56 AM
to ant...@googlegroups.com
Hi David,

When you saved your file as UTF-8, did you save it with a BOM by accident? Saving it without the BOM will work. Unfortunately, Windows Notepad doesn't make this obvious. Perhaps use Notepad++, which is a much better text editor anyway. It gives you lots of options for saving files and changing the character encoding.
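To check for and remove a BOM without opening an editor, the Python sketch below compares the first three bytes of the file with the UTF-8 encoding of the BOM (EF BB BF). The function name `strip_utf8_bom` is hypothetical, and the file is rewritten in place, so keep a copy of the original.

```python
from pathlib import Path

# The byte order mark (U+FEFF) encoded as UTF-8.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`, in place.

    Returns True if a BOM was found and removed, False if the file had none.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(UTF8_BOM):
        # Rewrite the file with everything after the three BOM bytes.
        p.write_bytes(data[len(UTF8_BOM):])
        return True
    return False


if __name__ == "__main__":
    import sys

    # Usage: python strip_bom.py RANKFREQUENCYWORD.txt
    for name in sys.argv[1:]:
        removed = strip_utf8_bom(name)
        print(f"{name}: {'BOM removed' if removed else 'no BOM found'}")
```

Running it a second time on the same file is harmless: it simply reports that no BOM was found.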

Laurence.



###############################################################
Laurence ANTHONY, Ph.D.
Professor
Center for English Language Education in Science and Engineering (CELESE)
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.antlab.sci.waseda.ac.jp/
###############################################################


--
You received this message because you are subscribed to the Google Groups "AntConc-discussion" group.
Visit this group at http://groups.google.com/group/antconc.

David King

Mar 4, 2014, 10:05:39 AM
to ant...@googlegroups.com
Hi Laurence

I have no idea what a BOM is, but after trying a variety of .txt options, I got it to work by saving the output from Sketch Engine as a tab-delimited .txt file (once I had reformatted it as RANK FREQUENCY WORD).
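The thread does not show how AntConc parses the list, so the following is only a hypothetical Python re-creation of the kind of check its error messages suggest, assuming each row is RANK, FREQUENCY and WORD separated by tabs. `first_bad_line` reports the first line whose rank or frequency is not a plain number, or which has fewer than three tab-separated fields (as a space-separated row would).

```python
import re

# A rank or frequency must be one or more ASCII digits.
NUMBER = re.compile(r"[0-9]+")


def first_bad_line(lines):
    """Return (line_number, problem, value) for the first invalid row,
    or None if every row looks like RANK<TAB>FREQUENCY<TAB>WORD.

    `problem` is "format" (fewer than three tab-separated fields),
    "rank", or "frequency" (that field is not a plain number).
    """
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            return (number, "format", line)
        for problem, value in (("rank", fields[0]), ("frequency", fields[1])):
            # strip() removes ordinary whitespace but not an invisible BOM.
            if not NUMBER.fullmatch(value.strip()):
                return (number, problem, value)
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_list.py RANKFREQUENCYWORD.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        print(first_bad_line(f))
```

Opened with `encoding="utf-8"`, a file saved with a BOM keeps U+FEFF at the start of line 1, so this check flags a rank value of "\ufeff1" on line 1, much like the error David reported.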

Many thanks again!
David

Laurence Anthony

Mar 4, 2014, 10:20:45 AM
to ant...@googlegroups.com
Hi David,

Most 'good' text editors would not add the BOM (Byte Order Mark) to a UTF-8 text file, because a BOM is not needed in UTF-8 and many programs do not expect one. However, in classic fashion, Microsoft adds it every time. It allows Microsoft Notepad to recognize the file as UTF-8 and open it correctly.
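For the curious, the Python snippet below shows what the BOM is at the byte level and how an invisible character ends up in front of the first rank value (plausibly the small square shown in the error message David quoted). It uses only Python's standard `utf-8` and `utf-8-sig` codecs; the sample line is made up for illustration.

```python
# The byte order mark is the code point U+FEFF; UTF-8 encodes it as three bytes.
BOM = "\ufeff"
BOM_BYTES = BOM.encode("utf-8")  # b'\xef\xbb\xbf'

# The first line of a word list saved "with BOM", as raw bytes on disk.
raw = BOM_BYTES + "1\t100\tthe\n".encode("utf-8")

# Plain UTF-8 decoding keeps the BOM as an invisible first character,
# so the rank field becomes "\ufeff1" rather than "1".
as_utf8 = raw.decode("utf-8")
print(repr(as_utf8.split("\t")[0]))  # '\ufeff1'

# The "utf-8-sig" codec drops a leading BOM if one is present.
as_utf8_sig = raw.decode("utf-8-sig")
print(repr(as_utf8_sig.split("\t")[0]))  # '1'
```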

Glad you managed to get everything fixed.

Laurence.



Mura Nava

Mar 5, 2014, 7:07:07 PM
to ant...@googlegroups.com

hi david

you may find this interesting for future reference https://plus.google.com/104940199413423400545/posts/UeANcr79eVr

ta
mura