stop words

Eric Lease Morgan

Jan 12, 2022, 4:19:55 PM
to AntConc-Discussion

Today I downloaded the new version of AntConc, and while I like many of the new features, I do miss the stop word list function. All too often I have students load their corpus into AntConc, and then use the Word tab to list the most frequent words and gauge the corpus's aboutness. Sans the stop word function, this is not as easy.

Do y'all have any suggestions? 

--
Eric Lease Morgan
Navari Family Center for Digital Scholarship
Hesburgh Libraries
University of Notre Dame

574/631-8604
https://cds.library.nd.edu

David Adler

Jan 13, 2022, 3:46:27 AM
to AntConc-Discussion
Same problem here!

Laurence Anthony

Jan 13, 2022, 3:56:34 AM
to ant...@googlegroups.com
Hi Eric and David,

I'm in the middle of all sorts of coding right now and haven't had time to really think about this issue. As you say, the stop list function has changed in AntConc 4. Can you confirm that you just want a stop list to apply to the Word List tool results, effectively serving as a filter to hide certain rows?
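As a rough illustration of the filter Laurence describes, the Python sketch below counts every token first and only then hides stop-listed rows, so the frequencies of the remaining words are unchanged. The tokenizer (lowercased runs of letters and apostrophes) and the `word_list` name are illustrative assumptions, not AntConc's implementation.

```python
import re
from collections import Counter


def word_list(text, stop_words=frozenset()):
    """Return (word, frequency) rows, most frequent first, hiding stop words.

    Every token is counted; the stop list only removes rows from the
    output, acting as a filter on the results table.
    """
    # Assumed tokenizer: lowercase, then runs of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # most_common() sorts by frequency; ties keep first-seen order.
    return [(w, f) for w, f in counts.most_common() if w not in stop_words]


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat slept."
    print(word_list(sample, stop_words={"the", "on"}))
    # [('cat', 2), ('sat', 1), ('mat', 1), ('slept', 1)]
```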

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


--
You received this message because you are subscribed to the Google Groups "AntConc-Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antconc+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antconc/8aec1a33-70e8-4b1b-9e6d-f3c87a68ee33n%40googlegroups.com.

David Adler

Jan 13, 2022, 4:35:14 AM
to AntConc-Discussion
For me, it would also be helpful if the stop words were not shown in the collocates/n-grams. I'm not sure whether this is related. (Thanks for the reply. I just mentioned the other post here because I was not sure whether you had seen the comment, given the problems with notifications a few weeks ago.)

Laurence Anthony

Jan 13, 2022, 7:05:50 AM
to ant...@googlegroups.com
Thanks for the update, David.

The problem with stop-words is that they are a very blunt instrument. Depending on the corpus and the research question, some stop-words might be very important and relevant, but the researcher will not notice because they have been eliminated too early. 

In my own research, I never use stop words and instead prefer to use likelihood/effect size statistics, frequency/range thresholds, and comparisons with reference corpora to generate key words.
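For readers who want to try this alternative, here is a Python sketch of one common keyness recipe: log-likelihood (G2) for significance, a log ratio for effect size, and a minimum frequency threshold. It follows the widely used two-corpus G2 formulation; the 0.5 floor for zero counts, the `min_freq` default, and the function names are assumptions, and AntConc's own statistics and defaults may differ. Range thresholds would need per-file counts and are left out.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness for one word.

    a: frequency in the target corpus, c: target corpus size in tokens
    b: frequency in the reference corpus, d: reference corpus size in tokens
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    # A zero observed frequency contributes 0 (the limit of x * ln x).
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def log_ratio(a, b, c, d, floor=0.5):
    """Effect size: binary log of the ratio of relative frequencies.

    Zero frequencies are replaced by `floor` (an assumed smoothing choice).
    """
    return math.log2((max(a, floor) / c) / (max(b, floor) / d))


def keywords(target, reference, min_freq=5):
    """Rank words over-represented in `target` (dicts of word -> count)."""
    c = sum(target.values())
    d = sum(reference.values())
    rows = []
    for word, a in target.items():
        if a < min_freq:  # frequency threshold
            continue
        b = reference.get(word, 0)
        if a / c <= b / d:  # keep only positive keywords
            continue
        rows.append((word, log_likelihood(a, b, c, d), log_ratio(a, b, c, d)))
    # Highest G2 first.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Sorting by G2 answers "how confident are we that the difference is real", while the log ratio answers "how big is the difference"; looking at both is what makes this less blunt than a stop list.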

Applying a blunt stop-word filter to the various lists is now quite simple to implement because all the results are stored in a database. I think a slight modification to the advanced search dialog is all I need to do.

Laurence.


Eric Lease Morgan

Jan 13, 2022, 8:24:21 AM
to ant...@googlegroups.com


Thank you for the prompt replies.

The stop words function is useful to me in the word frequencies, n-grams, and collocations functions. The use of stop words is an easy concept for noobies to get their heads around when beginning with text mining and natural language processing. It would be great if it were added back in, or if some workaround were articulated that was also easy to implement. I sincerely appreciate things like log-likelihood, etc., but such concepts are difficult to explain to students new to the methodology.

David Adler

Jan 13, 2022, 9:49:39 AM
to AntConc-Discussion
I understand that stop words are a very clumsy tool. In my case, I try to use them as a way to keep multiple tweets in one file from affecting each other's results.

With Collocates set to 5L 5R, I would prevent words from the preceding or following tweets from being included in the calculation by putting something like "stopñä stopñä stopñä stopñä stopñä" between the tweets in the source file, and then excluding "stopñä" from the results.

------
<TweetID1>
Some tweet words

stopñä stopñä stopñä stopñä stopñä

<TweetID2>
Some other tweet words

stopñä stopñä stopñä stopñä stopñä

----

Searching for collocates of "some":
  • "words" from TweetID1 and "Some" from TweetID2 are now separated by 5 (unique) words.
  • "stopñä Some" is excluded by the stop word list as irrelevant.
Of course, I don't expect you to include this for such a special case.
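The effect of this separator trick can be checked with a small Python sketch of window-based collocate counting: because there are at least as many separator tokens as the window span, no window around a node word reaches into the neighbouring tweet, and the separator itself is then hidden from the counts. The `collocates` function and its case-insensitive matching are assumptions for illustration, the <TweetID> lines are assumed to be skipped (for example, treated as tags), and AntConc's collocate statistics go beyond these raw counts.

```python
from collections import Counter

SEPARATOR = "stopñä"  # dummy token from the example above


def collocates(tokens, node, left=5, right=5, exclude=frozenset({SEPARATOR})):
    """Count tokens within `left` positions before and `right` positions after
    each occurrence of `node` (case-insensitive), skipping excluded tokens."""
    lowered = [t.lower() for t in tokens]
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(lowered):
        if tok != node:
            continue
        # Window of up to `left` tokens before and `right` tokens after.
        window = lowered[max(0, i - left):i] + lowered[i + 1:i + 1 + right]
        counts.update(w for w in window if w not in exclude)
    return counts


if __name__ == "__main__":
    sep = [SEPARATOR] * 5  # five separators for a 5L 5R window
    tokens = []
    for tweet in ["Some tweet words", "Some other tweet words"]:
        tokens += tweet.split() + sep
    print(collocates(tokens, "some"))
    # Counter({'tweet': 2, 'words': 2, 'other': 1})
```

Without the separators, the second "Some" would pick up "words" from the first tweet in its left window; with five of them, each window stops at the tweet boundary.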

Laurence Anthony

Jan 13, 2022, 11:33:29 AM
to ant...@googlegroups.com
Hi David and Eric,

I understand that your needs for stop words vary significantly. For Eric, I can see where their inclusion would be useful. As for David's use, I think you no longer need to worry about this, as AntConc can easily treat each tweet as a separate document. The only problem comes when there are millions of tweets and AntConc tries to render these as different files in the file viewer on the left of the main window or in the plot tool. The file viewer is something that I can fix by linking the viewer directly to the internal database; then the number of entries becomes irrelevant. The plot tool, however, would still be a problem. For now, I think I will just provide you with a workaround for using traditional stop lists. Just give me a few days, as I'm quite rushed with my day job right now!
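As a sketch of the separate-document approach, the Python below splits a single tweet file into one plain-text file per tweet, which can then be loaded as an ordinary set of files. It assumes each tweet starts with a marker line such as <TweetID1>, as in David's example; the marker pattern, the output file names, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed marker: a line consisting only of <ID>, e.g. "<TweetID1>".
MARKER = re.compile(r"^<([^<>\s]+)>\s*$")


def split_tweets(text):
    """Split text into {tweet_id: body} using <ID> marker lines."""
    tweets, current, lines = {}, None, []
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            if current is not None:
                tweets[current] = "\n".join(lines).strip()
            current, lines = match.group(1), []
        elif current is not None:  # ignore text before the first marker
            lines.append(line)
    if current is not None:
        tweets[current] = "\n".join(lines).strip()
    return tweets


def write_documents(tweets, out_dir):
    """Write each tweet to its own UTF-8 file (hypothetical naming scheme)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for tweet_id, body in tweets.items():
        (out / f"{tweet_id}.txt").write_text(body + "\n", encoding="utf-8")
```

With one tweet per file, the "stopñä" separator lines are no longer needed, so leave them out of the input; otherwise they end up inside the tweet bodies.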

Regards,
Laurence.

Stephen Gourlay

Jan 22, 2022, 11:21:42 AM
to AntConc-Discussion
I agree: a stop-word option is useful in the Word, Collocate, and N-Gram tools.

Emma Goldsmith

Feb 4, 2022, 5:49:30 AM
to AntConc-Discussion
I'd also find a stop-word list useful for the keyword view. When the target and reference corpora are compared, I seem to get a lot of noise from single letters and two-letter words. 

Laurence Anthony

Feb 4, 2022, 6:39:12 AM
to ant...@googlegroups.com
Thanks Emma and all,

Since early January, I've been rushed off my feet with admin duties and class grading, but I'll get back to the AntConc updates in a week or so.

Laurence.


Laurence Anthony

Jul 8, 2022, 9:43:23 PM
to AntConc-Discussion
Hi everyone,

In response to this question, AntConc now has a word filter feature (in Global Settings) that can show only the words that are in a list, or only the words that are not in the list (i.e., serving as a stop list).

Laurence.

Stephen Gourlay

Jul 9, 2022, 3:59:57 AM
to ant...@googlegroups.com
Thanks very much, Laurence
Best wishes
Stephen

Vered Silber-Varod

Jul 17, 2022, 2:36:46 AM
to AntConc-Discussion
Dear Laurence,
Could you please elaborate a little on the filter option? In previous versions, I could use an English stop word list that was embedded in AntConc. Should I use an external list now? Can you recommend a standard American English stop word list? 
Thank you
Vered

Laurence Anthony

Jul 17, 2022, 5:01:02 AM
to ant...@googlegroups.com
Hi Vered,

AntConc has never included a stoplist. It was always required that the user add one.

If you do an Internet search for "Stop list" or "Stop word list", you'll find many, many. I actually don't really like using stop lists, because they are ad hoc lists that are not tailored to the target data, so you might end up deleting/ignoring words that are relevant to the target study, especially as most stop-lists are quite large. My preference is to use a tool like "keywords" to get a more meaningful subset of words that can be explained.

Laurence.


Stephen Gourlay

Jul 17, 2022, 9:07:47 AM
to ant...@googlegroups.com
Hi Vered
If you don't want to use a standardized stop list you can easily make
your own - just list the words to exclude in a plain text file then
load it under Settings > Global settings > Tool Filters > Use words in
file. I find it useful to tidy up displays when I'm not interested in
"and", "a" etc. or there are a number of single letters in the output.
But you need to make sure you don't exclude words that could be
meaningful in your research.
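This recipe can be mirrored in a short Python sketch: parse a plain-text word list and apply it either as a stop list (hide listed words) or as an include list (show only listed words), the two modes Laurence describes for the word filter. One word per line, blank lines ignored, and case-insensitive matching are assumptions here; AntConc's own settings determine how it actually reads and matches the file.

```python
from pathlib import Path


def parse_word_list(lines):
    """Build a set of lowercased words from lines of text, one word per line.

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def load_word_list(path):
    """Read a UTF-8 plain-text word list from `path`."""
    return parse_word_list(Path(path).read_text(encoding="utf-8").splitlines())


def apply_filter(rows, words, mode="exclude"):
    """Filter (word, frequency) rows against `words`.

    mode="exclude": hide listed words (a stop list).
    mode="include": show only listed words.
    """
    if mode not in ("exclude", "include"):
        raise ValueError(f"unknown mode: {mode!r}")
    show_listed = mode == "include"
    return [(w, f) for w, f in rows if (w.lower() in words) == show_listed]
```

A short, hand-made list like this (for example "and", "a", and stray single letters) keeps the filter under your control, which addresses the worry about excluding words that matter to the research.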

Best wishes
Stephen
