lemma keyword lists


tat...@gogol.f9.co.uk

Nov 5, 2018, 11:05:19 AM
to AntConc-Discussion

Dear Laurence

I am trying to create lemma keyword lists for two corpora, one in Russian and one in Ukrainian. I have the lemma lists for both languages and am following your AntConc 3.2.4 Tutorial 10: Working with lemmas. It works up to a point: I created the reference corpus lemma list, cut it down to 3 columns, and followed the rest of the instructions in the video until I reached the stage of creating a lemma keyword list. At that stage, in both the Russian and the Ukrainian corpora, I get the message "One or more of the rank column values is not a number. First error on line:0 Value 1". If I ignore this message and click Start, I get another one: "NO reference corpus word list was available. Generate a reference corpus".

I have repeated the procedure from the video from scratch a good half a dozen times in both corpora, taking care to follow each step. I use UTF-8 encoding, and I have already used these corpora in AntConc to create keyword lists and to look at collocations and concordances. In short, they work in every tool apart from lemma keyword lists. I know it is difficult to say without seeing exactly what I am doing, but could you suggest what I might be doing wrong?

I hope you can help.


Kind regards

Tatyana

Laurence Anthony

Nov 14, 2018, 10:41:27 PM
to ant...@googlegroups.com
Hi Tatyana,

> I have created the reference corpus lemma list

What you basically need is a "target corpus lemma-based word list" and an equivalent "reference corpus lemma-based word list".

I suggest you start by doing the following:

1) Load in a simple raw file of English (as a target corpus) into AntConc (e.g. the first paragraph of https://en.wikipedia.org/wiki/Corpus_linguistics)

2) Load in a lemma list (e.g. from my website) via the Word list tool preferences

3) Generate a lemma-based target word list using the Word List tool

4) Repeat steps 2-3 for the whole text of https://en.wikipedia.org/wiki/Corpus_linguistics, serving as a reference corpus

5) Load in your lemma-based target word list into the Word List preferences as a target corpus word list

6) Load in your lemma-based reference word list into the Keyword List preferences as a reference corpus word list.

7) Generate your lemma-based keyword lists using the Keywords tool.
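To make it clearer what a "lemma-based word list" contains, here is a minimal Python sketch of the idea behind steps 2-3. It is not AntConc's code. The `FORMS` mapping, the function names, and the three-column rank/frequency/lemma layout (tab-separated) are assumptions for illustration, based on the "3 columns" and "rank column" mentioned in this thread.

```python
import re
from collections import Counter

def lemma_word_list(text, form_to_lemma):
    """Return (rank, frequency, lemma) rows, most frequent lemma first."""
    tokens = re.findall(r"\w+", text.lower())
    # Word forms missing from the lemma list are counted under their own form.
    counts = Counter(form_to_lemma.get(tok, tok) for tok in tokens)
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, freq, lemma)
            for rank, (lemma, freq) in enumerate(ordered, start=1)]

def write_word_list(rows, path):
    # encoding="utf-8" (not "utf-8-sig") writes the file without a BOM.
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        for rank, freq, lemma in rows:
            fh.write(f"{rank}\t{freq}\t{lemma}\n")

# Hypothetical toy lemma list: word form -> lemma.
FORMS = {"corpora": "corpus", "is": "be", "are": "be", "was": "be"}
rows = lemma_word_list("Corpora are large. A corpus is a collection.", FORMS)
for row in rows:
    print(*row, sep="\t")
```

Note that each row holds only one lemma with its combined frequency. The individual word forms do not appear in the list.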

Once you get the above working, you should probably be able to figure out the problem with your own texts.

I hope that helps.

Laurence.





###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################



tat...@gogol.f9.co.uk

Nov 16, 2018, 8:59:42 AM
to AntConc-Discussion

tat...@gogol.f9.co.uk

Nov 16, 2018, 9:34:39 AM
to AntConc-Discussion
Hi Laurence,

Thank you for your reply. I followed all the steps using test English target and reference corpora as you suggested, and managed to create two lemma-based lists, target and reference. But every time I try to load the files into the Word List/Keyword List preferences, most of the time the same message appears ("One or more of the rank column values is not a number. First error on line:0 Value 1"), and otherwise nothing happens at all. Could it be a problem with my computer or software? I am using bog-standard 64-bit Windows 10.
I am very sorry to bother you but I really need to use the lemmatised values in my paper...

Thank you

Tatyana

Laurence Anthony

Nov 16, 2018, 9:45:28 AM
to ant...@googlegroups.com
Hi, 

Can you send me the two lists that you created?

Laurence



Tatyana Karpenko-Seccombe

Nov 16, 2018, 7:22:26 PM
to ant...@googlegroups.com

Hi Laurence,

 

I am ever so grateful for your help. Here they are: two I created following the suggestions in your recent email, and the other two (with "Exel" in their names) I created following the steps in your YouTube video, in which you suggested dropping the results into an Excel file and saving the first three columns only.

 

Thanks again

 

Tatyana

Test Exel Lemma-based reference word list.txt
Test Exel Lemma-based target word list.txt
Test Lemma-based reference word list.txt
Test lemma-based target word list.txt

Laurence Anthony

Nov 18, 2018, 7:56:50 PM
to ant...@googlegroups.com
Hi Tatyana,

I have found the problems:

1) The Test lemma-based target word list.txt and Test Lemma-based reference word list.txt files are clearly not formatted correctly as they contain all the lemma variants in the list. Delete those and everything should be fine.

2) The Test Exel Lemma-based target word list.txt and Test Exel Lemma-based reference word list.txt files are formatted correctly, but they include a BOM (Byte Order Mark) at the beginning (probably introduced by (stupid) Microsoft Notepad). Use a good text editor like Notepad++ and make sure that they are encoded in UTF-8 without the BOM. The attached versions will work as expected.

3) The files produce no Keywords with the default settings because no words are statistically significantly different. If you show all keywords, then you will see some results.
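If you prefer to check for the BOM from point 2 with a script rather than a text editor, the following Python sketch detects and removes a UTF-8 BOM. The function names and the demo file are hypothetical, and the rank check is only my guess at the kind of test that fails, not AntConc's actual code. A BOM is invisible, so a first rank value of "1" with a BOM in front of it still looks like "1" on screen while not being a plain number, which would fit the "line:0 Value 1" message.

```python
import codecs
import os
import tempfile
from pathlib import Path

def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM in place; return True if one was found."""
    raw = Path(path).read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        Path(path).write_bytes(raw[len(codecs.BOM_UTF8):])
        return True
    return False

def first_bad_rank(path):
    """Return (line_index, value) for the first non-numeric rank, else None.

    Assumes tab-separated columns with the rank in the first column.
    """
    with open(path, encoding="utf-8") as fh:
        for index, line in enumerate(fh):
            if not line.strip():
                continue  # ignore blank lines
            rank = line.split("\t")[0]
            if not rank.isdigit():
                return index, rank
    return None

# Demo on a throwaway file that starts with a BOM, as Notepad may save it.
demo = os.path.join(tempfile.mkdtemp(), "demo_word_list.txt")
Path(demo).write_bytes(codecs.BOM_UTF8 + "1\t12\tbe\n2\t7\tcorpus\n".encode("utf-8"))

before = first_bad_rank(demo)   # the BOM is glued to the first rank value
removed = strip_utf8_bom(demo)
after = first_bad_rank(demo)
print(before, removed, after)
```

Reading a file with `encoding="utf-8-sig"` and writing it back with `encoding="utf-8"` is another way to get the same result in Python.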

I hope that helps!

Laurence.


Test Exel Lemma-based reference word list.txt
Test Exel Lemma-based target word list.txt

tat...@gogol.f9.co.uk

Nov 19, 2018, 10:20:24 AM
to AntConc-Discussion
Hi Laurence

Huge thanks - it works a treat!!!!!!!!!!!!!!!!!!!!!!!!!

As soon as I saved my files in Notepad++ it all started working perfectly well!!!!!!

Thank you ever so much! 

Tatyana



Laurence Anthony

Nov 19, 2018, 10:29:02 AM
to ant...@googlegroups.com
Excellent! Yes, use Notepad++.

Notepad is the worst text editor out there. Microsoft should be ashamed of themselves.

Laurence.
