A format for a word list

Илья Бажаев
Jun 3, 2026, 9:26:31 PM
to AntConc-Discussion
Hello,

I'm trying to import a frequency word list (attached below), saved as a UTF-8 .txt file, as a Simple Word List in the Corpus Builder (Word Lists) section of Corpus Manager, using the type/freq format that was previously described here: https://groups.google.com/g/AntConc/c/aEUKYXsIfwM. However, after I press the Create button, the program closes on its own. Could you please tell me whether there is another way to work with .txt word lists now, or whether only the three-file MSTV format is supported?

Thanks in advance,

Ilya
lemmas_60k.txt

Laurence Anthony
Jun 12, 2026, 6:05:39 AM
to ant...@googlegroups.com
Hi,

Sorry for the late reply.

I checked your file and noticed a few things. First, the format was not correct because there was no header for the first column. Second, the numbers contained commas (e.g. as thousands separators), so they were read as strings rather than numbers. Third, the file was not UTF-8 encoded, which is the default encoding in AntConc.

If you want it to work properly, you need to format it as I have done in the attached file.
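The three fixes listed above (a header row, plain integer counts, UTF-8 encoding) can be scripted. The following Python sketch is not part of AntConc and is not the method used to produce the attached file. It assumes one word and one frequency per line separated by whitespace, that commas inside the counts are thousands separators, tab-delimited output, and the hypothetical header names `type` and `freq`; check the delimiter and header names against the corrected attachment before relying on it. The caller must supply the file's actual source encoding (for example `cp1251` for a Windows Cyrillic file, which is only a guess here).

```python
# Hypothetical column names: replace them with the ones used in the
# corrected file attached to the reply.
HEADER = ("type", "freq")


def normalize_word_list(raw: bytes, source_encoding: str) -> str:
    """Return the list as text with a header row and plain integer counts.

    Assumes each line is '<word> <frequency>' separated by whitespace and
    that commas in the frequency are thousands separators ("1,234" -> 1234).
    Output columns are tab-separated (an assumption about the target format).
    """
    text = raw.decode(source_encoding)
    rows = ["\t".join(HEADER)]
    for i, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.rsplit(None, 1)  # split off the last field (frequency)
        digits = parts[-1].replace(",", "")
        if len(parts) < 2 or not digits.isdecimal():
            if i == 1:
                continue  # drop an existing, incomplete header row
            raise ValueError(f"line {i}: expected '<word> <frequency>', got {line!r}")
        rows.append(f"{parts[0].strip()}\t{int(digits)}")
    return "\n".join(rows) + "\n"


def convert_file(src_path: str, dst_path: str, source_encoding: str) -> None:
    """Read src_path in source_encoding and write the fixed list as UTF-8."""
    with open(src_path, "rb") as f:
        fixed = normalize_word_list(f.read(), source_encoding)
    with open(dst_path, "w", encoding="utf-8", newline="\n") as f:
        f.write(fixed)
```

For example, `convert_file("lemmas_60k.txt", "lemmas_60k_fixed.txt", "cp1251")` would write a UTF-8 copy with a header row and comma-free counts. Raising an error on malformed lines, rather than skipping them, keeps words from silently disappearing from the list.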

By the way, I'm not sure what you're trying to do with this list, but if you're trying to create a lemmatized word list of your target corpus, this is not the way to do it. A better way is to load your lemma-tagged corpus directly into AntConc (using the simple_word_pos_headword_indexer option) and then choose to display lemmas in the Word List tool.

I hope that helps.

Laurence.

lemmas_60k.txt