analyzing a word list in a corpus


lina milosevska

Jan 30, 2023, 6:02:14 PM
to AntConc-Discussion
I have compiled a corpus of journalistic texts (about 2 million words) that I am going to analyze using AntConc, which to my absolute pleasure works well with the Macedonian language. My research focuses on examining anglicisms in journalistic texts. For that purpose I have compiled a word list of about 8 thousand anglicisms, extracted from my corpus. I want to examine this word list in the corpus using AntConc, e.g. counts, frequencies, concordances, and the other analyses that AntConc provides. The issue is that it is not clear to me how to do that. Which tools in AntConc allow for examining a word list in a corpus? Also, AntConc would not upload my word list properly, so I uploaded it as a separate corpus, but I am not sure if this is the right thing to do.

Thank you very much

Laurence Anthony

Jan 30, 2023, 8:52:33 PM
to ant...@googlegroups.com
Hi Lina,

I think you might be confusing two different concepts in AntConc. The first is constructing a "word list corpus" in the Corpus Manager. Many people don't have the raw files of a reference corpus and just have a word list from the corpus. The idea of a "word list corpus" is to create a reference corpus from just such a word list. This 'corpus' can then be compared against the target corpus in order to generate keywords in the Keyword tool. The second concept is "searching a target corpus using a list of search query words/phrases". In this case, we are not creating a reference corpus, but just using a list of words, phrases, or other search queries with the main target corpus. I think this is what you want to do. In this case, you can do either of the following:
a) just separate the query items in the main search box using the || wildcard, e.g., dog||cat||mouse
b) open the Advanced Search (Adv Search) box, activate the "Search Query List" and enter the list of queries there.
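
As a minimal sketch of option (a), the query string can be assembled from a word list file outside AntConc and then pasted into the search box. The file name passed to `build_or_query_from_file` and both helper names are hypothetical, and the sketch assumes a UTF-8 file with one query item per line; only the `||` separator comes from the reply above.

```python
from pathlib import Path


def build_or_query(words):
    """Join query items with the || separator, skipping blanks and duplicates."""
    seen = set()
    items = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            items.append(word)
    return "||".join(items)


def build_or_query_from_file(path):
    """Read one query item per line (assumed UTF-8) and build the query string."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return build_or_query(lines)


print(build_or_query(["dog", "cat", "", "mouse", "cat"]))  # prints dog||cat||mouse
```

For a list of several thousand items the resulting string becomes very long, so option (b) is probably the more practical route for large lists.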

Does that help you?

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################



lina milosevska

Jun 20, 2023, 9:03:52 AM
to AntConc-Discussion
Dear prof. Anthony,

I am still having problems counting the words from my word list in the target corpus. I have a word list of over 8 thousand words. It seems that the Advanced Search (search query) allows only a few words (5-6) to be searched at a time. It is also not clear to me how to separate the search items using a wildcard, as you suggested in your email. So my first question is how to insert a wildcard between words. My second question is whether AntConc can analyze a word list of 8,000 words in a target corpus. If yes, please explain the necessary steps, as I have tried many things but none of them work.

Thank you

Laurence Anthony

Aug 19, 2023, 1:40:02 AM
to AntConc-Discussion
Hi Lina,

Sorry for the slow response. I'm still not exactly sure what you are trying to do, but I just released AntConc 4.2.1, which has some updates for loading word lists as corpora. Perhaps you can try that and come back here if you are still having problems. Note that the expression "counting my word list" is a little confusing to me, so you'll need to be clear about what you're trying to do if/when you come back.

I hope that helps.

Laurence.

lina milosevska

Nov 2, 2023, 9:16:17 AM
to AntConc-Discussion
Dear prof. Anthony,

My problem has not been solved. But first, I have an issue creating a word list in AntConc. I have installed the latest version, 4.2.4. Whenever I try to upload my word list as a txt file, AntConc shows this message:

The following word list file could not be read.

See the error report below.

Could not determine delimiter.

When I try to upload the same word list as an Excel file with the .xlsx extension (unlike in the tutorial where you explain creating a word list, where your file has the .csv extension), AntConc simply shuts down. It does not create a word list either way. How can I solve this issue? What should I do? The format of the word list is of course not important; what I need is to be able to create a word list and then do the intended analysis. Please help.
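
The wording "Could not determine delimiter" is the same message that Python's standard `csv.Sniffer` raises when it cannot find a column separator in a sample. Whether AntConc relies on that class is an assumption, not something confirmed in this thread. Under that assumption, the sketch below shows how a one-word-per-line file leads to the message while a tab-separated word/frequency file does not. The helper name `detect_delimiter`, the candidate delimiters, and the sample data are all hypothetical.

```python
import csv


def detect_delimiter(sample, candidates=",\t;"):
    """Return the delimiter csv.Sniffer finds, or None if it cannot find one."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # csv.Sniffer raises csv.Error("Could not determine delimiter") here
        return None


# One word per line: no candidate separator appears anywhere in the sample
single_column = "the\ncat\nmat\n"
# Word and frequency separated by a tab on every line
two_columns = "the\t2\ncat\t1\nmat\t1\n"

print(repr(detect_delimiter(single_column)))
print(repr(detect_delimiter(two_columns)))
```

If the assumption holds, a word list saved as delimited text with a consistent separator on every line would be less likely to trigger the error than a bare single-column file.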


Laurence Anthony

Nov 10, 2023, 11:40:32 PM
to ant...@googlegroups.com
Hi Lina,

Sorry for the late reply. I was away in China. To address your problem, let's start from scratch. Here's the situation as I understand it. Please correct me if I'm wrong. I'll number all the assumptions and steps for clarity:

1) You have a corpus of texts
2) You have a list of words
3) You want to count how often the words in the list appear in the texts

To understand this process, I recommend you start with a much simpler setup. Here are the steps:

1) Create a simple corpus containing a single corpus text containing the text "The cat sat on the mat."
2) Create a simple word list containing a single word "the".
3) Load the corpus into AntConc via the File-Open File(s) as Quick Corpus option

image.png

4) Load the word list via the Global Settings->Tool Filters option

image.png

5) Go to the word list tool, and click start. The frequencies of the words in the word list (in this case "the") will be shown.

image.png

6) Repeat the above steps for your own corpus and word list.
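
The same toy setup can be checked outside AntConc. The following is a minimal sketch, not AntConc's implementation: it assumes a simple tokenizer (runs of word characters, lowercased), which may differ from AntConc's token definition, and the function name `count_listed_words` is hypothetical.

```python
import re
from collections import Counter


def count_listed_words(corpus_text, word_list):
    """Count how often each word in word_list occurs in corpus_text (case-insensitive)."""
    # \w+ matches runs of Unicode word characters, so Cyrillic text is tokenized too
    tokens = re.findall(r"\w+", corpus_text.lower())
    freqs = Counter(tokens)
    # Words absent from the corpus get a count of 0
    return {word: freqs[word.lower()] for word in word_list}


print(count_listed_words("The cat sat on the mat.", ["the"]))  # prints {'the': 2}
```

If AntConc reports a different count for the toy corpus, the difference most likely lies in the filter or token settings rather than in the files themselves.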

I hope that helps!

Laurence.


Amena El-Shafie

Aug 15, 2024, 7:56:55 AM
to AntConc-Discussion
Hi Prof. Anthony, 

Hope all is well. I have been facing Lina's problem (Could not determine delimiter) and when I attempted the same steps that you have shown I get no results: "the" is not even shown, despite uploading a file through the Global Settings Filter Options. 

Am I uploading the wrong file type somehow?

Laurence Anthony

Aug 15, 2024, 10:02:50 PM
to ant...@googlegroups.com
Hi Amena,

After you load your file into AntConc, are you able to see its contents in the File View? Can you also see the list of all the words in the Word List tool? If so, what words do you see in the word list tool after you add "the" in the word filter? My guess is that you have clicked the 'hide' option instead of the 'show' option.

Laurence.


Amena El-Shafie

Aug 15, 2024, 11:08:50 PM
to AntConc-Discussion
Morning Professor, 

I can see the contents of the file (Screenshot 1) 
I can also see the contents of the trial word list (which only includes "the") (Screenshot 2)
The result is still the same (Screenshot 3) 
Do you mean the (Hide words) option in the Tool Filters? 

Appreciate your help, always 

3- Trial.png
2- TrialWordList.png
1- TrialCorpus.png

Laurence Anthony

Aug 15, 2024, 11:11:19 PM
to ant...@googlegroups.com
It looks like you have "The" and "THe". Check your spelling. This shouldn't make a difference, but check that first.

Laurence.


Amena El-Shafie

Sep 1, 2024, 4:28:11 AM
to ant...@googlegroups.com
Dear Professor Anthony

Thanks for your reply 


I am still facing a problem uploading a reference corpus for my own data/corpus.

To break it down: I have a corpus of 20 files (almost half a million words) that I want to compare with the BNC word list corpus; specifically, I want to find the frequency and keyness of the words in those files as compared to the BNC word list. Adding the files to the reference corpus through the Corpus Manager kept producing a variety of errors. One of them was the delimiter error that appeared earlier in this thread. This is what brought me here and led me to try your kind solution.

The Global Settings Filter Solution: 

I am attaching three screenshots showing how it sometimes does not work, although it did work once before. I also tried loading my own corpus and adding only the 'the' file in the Global Settings filter, and it never worked for multiple files.

Thanks again for your help 




--
Amena El-Shafie
3- no result.jpg
1- the cat sat on the mat.jpg
2- the word the.jpg

Laurence Anthony

Sep 1, 2024, 10:10:21 PM
to ant...@googlegroups.com
Hi,

If you want to compare the words in your corpus with words from the BNC wordlist, you don't need to use the global filter option at all. Just do the following:

1) In the corpus manager, create a target corpus from your 20 files (using the raw files option)
2) In the corpus manager, create a reference corpus from the BNC wordlist (using the word list option)
3) In the corpus manager, select these two newly created corpora as the target and reference corpus.
4) In the word list tool, create a word list for your target corpus
5) In the keywords tool, generate the keywords.
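
For intuition about what step 5 computes, keyness is commonly measured with a log-likelihood statistic that compares a word's frequency in the target corpus with its frequency in the reference corpus. The sketch below uses the two-term log-likelihood formula; which statistic and settings AntConc applies is not stated in this thread, and the function name and the frequencies in the example are made up for illustration.

```python
import math


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Two-term log-likelihood keyness for one word.

    freq_target / freq_ref: occurrences of the word in each corpus
    size_target / size_ref: total number of tokens in each corpus
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally common in both corpora
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# Hypothetical word: 20 hits in a 1,000-token target, 10 hits in a 1,000-token reference
print(round(log_likelihood(20, 10, 1000, 1000), 2))
```

A value of 0 means the word is used at the same rate in both corpora; larger values indicate a bigger difference in relative frequency.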

I hope that helps!

Laurence.


