Different keyness results when applying different effect size measure settings

Janailton

May 16, 2024, 10:39:43 AM
to AntConc-Discussion
Dear Prof. Anthony and AntConc group members,

I’ve been testing the keyword tool on AntConc for a while and I’d like to better interpret the scale of numbers behind the effect size results when I use different settings.

For example, when I adopt the default AntConc settings (Dice -- All values), combined with the default AntConc log-likelihood settings (4-term - p < 0.05 with Bonferroni correction), the results on the 'keyness (effect)' tab range from 0.069 to 0.00. When I change to MI -- all values, the keyness values range from 2.736 to 0.099. If I change to MI2 -- all values, the values range from 12.699 to 5.58. If I keep changing these thresholds and settings, the keyness results keep coming out differently. Not only that, the keywords also change, which suggests that some words are key only under certain settings and cease to be key under others. I wonder whether this may lead to errors, or whether the results can be accepted as wholly reliable.

When I checked the literature, I found that Brezina (2018, p. 14) gives the standard interpretation of the effect size r as follows: 0.1 = small effect; 0.3 = medium effect; 0.5 = large effect. This doesn't seem to apply to the measures adopted in AntConc, because their keyness scales differ greatly from the one in Brezina's study. I also checked Dice's (1945) original paper and found no guidance there on interpreting Dice values either.

I understand that there is a lot of debate and little consensus on effect size measures. Still, I found no information in the AntConc manual or elsewhere on how to interpret the great variability of keyness effect results under different effect size settings (including the default AntConc setting, Dice -- all values). Do any of you have any clue as to how I can interpret these results more accurately? Should I simply accept that, if one keyword has a keyness effect of 0.069 and another of 0.00, the difference in use between the study and reference corpora is greater for the first word than for the second?

 Thank you very much!

Best,
Janailton.

Laurence Anthony

May 16, 2024, 11:02:47 AM
to ant...@googlegroups.com
Hi,

The issue of which effect size measure to use and how to interpret it is quite complex, and people have quite different opinions on it. Here is my take, based on what you write in your post. I'll list the points for convenience.

1) There is no established effect size measure that the field generally agrees on. Some researchers often use MI with collocates, while others might use a different measure, or even a combination of measures.
2) Each effect size measure will generate values on a different scale so they are not comparable. One may vary from 0 to 1, while another may vary from 0 to infinity.
3) I'm not sure if you are quoting Brezina (2018) accurately, but point 2) holds, so we cannot say 0.1 = small and 0.5 = large. It depends on the measure that you are using.
4) There is no general agreement on how to interpret effect size measures. You need to decide on a measure, understand what the scale is, and then interpret the value appropriately. Take Dice as an example. It varies from 0 (completely different) to 1 (exactly the same). So, if you are comparing groups of people, you would probably not consider a value of 0.5 to be 'very similar', but for two corpora, you might consider a value of 0.5 to suggest that the corpora have 'a lot of overlap' so the effect is 'strong'.
5) AntConc offers a range of effect size measures, so you are free to choose the ones that you feel familiar with or that are intuitively easier to interpret.
6) For keyword analysis, I don't recommend using an effect size measure to rank the keywords. This is why the default setting in AntConc is to rank by log-likelihood (a statistical test, which is not an effect size measure, but may have an effect size component to it).
7) As each effect size measure exaggerates/weights different types of values (some weight low frequencies higher and others weight high frequency values higher), the results will vary dramatically depending on your choice of effect size measure. This is one reason why I don't recommend using them for keywords (see point 6).
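
Points 2 and 7 can be illustrated with a minimal Python sketch. It assumes the textbook 2x2 contingency-table definitions of Dice, MI and MI2 (word vs. other words, target vs. reference corpus); AntConc's internal formulas may differ, and the function name `effect_sizes`, the corpus sizes and the word counts are all made up for illustration.

```python
import math

def effect_sizes(freq_target, freq_ref, size_target, size_ref):
    """Dice, MI and MI2 for one word (assumed textbook formulas, not AntConc's code).

    freq_target must be greater than 0, otherwise the logarithms are undefined.
    """
    total = size_target + size_ref
    word_total = freq_target + freq_ref          # word frequency in both corpora
    expected = word_total * size_target / total  # expected frequency in the target
    return {
        "dice": 2 * freq_target / (word_total + size_target),
        "mi": math.log2(freq_target / expected),
        "mi2": math.log2(freq_target ** 2 / expected),
    }

# Hypothetical corpus sizes and counts: (frequency in target, frequency in reference)
SIZE_TARGET, SIZE_REF = 100_000, 1_000_000
counts = {"the": (6000, 50000), "corpus": (300, 100), "keyness": (10, 0)}

scores = {
    word: effect_sizes(a, b, SIZE_TARGET, SIZE_REF)
    for word, (a, b) in counts.items()
}

def ranking(measure):
    """Words sorted from highest to lowest value of the given measure."""
    return sorted(scores, key=lambda word: scores[word][measure], reverse=True)

for measure in ("dice", "mi", "mi2"):
    print(measure, [(w, round(scores[w][measure], 3)) for w in ranking(measure)])
```

With these invented counts, Dice stays below 0.1 for every word because the target corpus size sits in its denominator, MI puts the rare word first, and Dice and MI2 put the most frequent word first. The values are on different scales and the rank order changes with the measure.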

I hope that helps!

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################


--
You received this message because you are subscribed to the Google Groups "AntConc-Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antconc+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/antconc/6c8f0017-0622-4f9f-aad3-4b44bce565cdn%40googlegroups.com.

Janailton

May 17, 2024, 7:42:40 AM
to AntConc-Discussion
Thank you for your explanation, Prof. Anthony. I have some follow-up questions, though. I'd be glad if you could help me.

Let's take the case that I'm ranking the results according to the effect size measures.

So, as for point 4 in your explanation above, what's the scale for interpreting the effect size values when adopting the default AntConc settings (Dice -- All values), combined with the default AntConc log-likelihood settings (4-term - p < 0.05 with Bonferroni correction)?

I had cited Brezina just as an example for interpreting the results, because the effect size r and its interpretation lie on that scale (0.1 = small effect; 0.3 = medium effect; 0.5 = large effect). So I'm assuming that, if I'm using this effect size (which is unavailable in AntConc), then I should interpret the results that way.

But what is the scale for Dice? Is there a similar scale to the one cited in Brezina? As I mentioned, in one of the results from my tests, the values ranged from 0.069 to 0.00 (that's even lower than 0.1!). Does that mean that 0.069 is to be interpreted as a large effect (that is, a large difference between the study and reference corpora)?

As for points 6 and 7, how should I deal with the effect size measures then, considering that they are available in the program and that I cannot avoid checking their related buttons in the Tool Settings? 

Thanks again, 
J.

Laurence Anthony

May 17, 2024, 11:50:19 PM
to ant...@googlegroups.com
Hi again

> So, as for point 4 in your explanation above, what's the scale for interpreting the effect size values when adopting the default AntConc settings (Dice -- All values), combined with the default AntConc log-likelihood settings (4-term - p < 0.05 with Bonferroni correction)?
> But what is the scale for Dice? Is there a similar scale to the one cited in Brezina? As I mentioned, in one of the results from my tests, the values ranged from 0.069 to 0.00 (that's even lower than 0.1!). Does that mean that 0.069 is to be interpreted as a large effect (that is, a large difference between the study and reference corpora)?

As I wrote:
"Take Dice as an example. It varies from 0 (completely different) to 1 (exactly the same). So, if you are comparing groups of people, you would probably not consider a value of 0.5 to be 'very similar', but for two corpora, you might consider a value of 0.5 to suggest that the corpora have 'a lot of overlap' so the effect is 'strong'."
It's the same as with the effect size measure called 'height'. How do you interpret values like 153 cm vs 195 cm? Which of these values can be considered 'short' or 'tall'? It depends on the context.

> As for points 6 and 7, how should I deal with the effect size measures then, considering that they are available in the program and that I cannot avoid checking their related buttons in the Tool Settings?

Again, as I wrote:
"6) For keyword analysis, I don't recommend using an effect size measure to rank the keywords. This is why the default setting in AntConc is to rank by log-likelihood (a statistical test, which is not an effect size measure, but may have an effect size component to it)."

I don't know what you mean by "I cannot avoid checking their related buttons". The default setting in keywords is to not use effect size. And in the collocates tool, the default setting is to use LL as a cutoff and rank by MI. For collocates, there is some kind of general understanding that you should use MI > 3, but this, again, will depend on the study and context.
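
As a rough illustration of the MI > 3 convention for collocates, here is a minimal Python sketch. It assumes the basic pointwise MI formula, log2(observed / expected), without any window-size adjustment and without the LL cutoff; the node word, the counts and the function name `mutual_information` are hypothetical, and AntConc's own calculation may differ.

```python
import math

def mutual_information(cooccurrences, node_freq, collocate_freq, corpus_size):
    """Basic MI: log2 of observed over expected co-occurrence frequency."""
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(cooccurrences / expected)

# Hypothetical data for one node word in a one-million-token corpus.
CORPUS_SIZE = 1_000_000
NODE_FREQ = 500
# collocate: (co-occurrences with the node, collocate frequency in the corpus)
candidates = {"tea": (40, 800), "the": (120, 60_000), "evidence": (25, 1_500)}

mi_scores = {
    word: mutual_information(cooc, NODE_FREQ, freq, CORPUS_SIZE)
    for word, (cooc, freq) in candidates.items()
}

# Keep collocates above the conventional threshold and rank them by MI.
MI_THRESHOLD = 3
collocates = sorted(
    (word for word, mi in mi_scores.items() if mi > MI_THRESHOLD),
    key=lambda word: mi_scores[word],
    reverse=True,
)
print(collocates)
```

In this made-up example the very frequent word "the" co-occurs with the node most often but falls below the threshold, which shows how MI favours less frequent, more exclusive partners.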

Janailton

May 18, 2024, 6:28:03 AM
to AntConc-Discussion

Hi again, thanks for the response!

Well, just one more question. When I wrote "I cannot avoid checking their related buttons", I meant that, when users open the Tool Settings for keywords, they have to define settings for both statistics and metrics, which means that they have to choose some option for the metrics too (e.g. keyness measures) and cannot leave this space blank. This suggests that both tests and metrics run together when comparing the target and reference corpora and that, in a way, one may influence the other.

That's why I've been confused: if we should ignore the results from the effect size rank altogether, why does the tool make us choose settings for effect size and not just for statistical tests (like log-likelihood), since these are actually more important? I hope that explains my question, thanks! J.

Laurence Anthony

May 18, 2024, 6:39:16 AM
to ant...@googlegroups.com
Hi,

I don't really understand what you mean. I didn't say that you should ignore the effect size measures. Some people find them useful and, as I say, for collocation, the standard practice is to rank by MI. Also, the tool doesn't take you to the settings for effect size. It just takes you to a settings window where all the settings can be set. For effect size, you can always leave the setting at the default value. You can see below that by default, no effect size threshold is set, so it has no impact on the results (except when you decide to rank by effect size).
 
[Screenshots of the AntConc settings windows: image.png, image.png]

If there is anything still unclear, do let me know. 

I hope that helps.

Laurence.


Janailton

May 18, 2024, 7:42:45 AM
to AntConc-Discussion
Thanks for the response and the screenshots; they will be more helpful for explaining. As for the second screenshot, where the keyword settings are laid out, do you mean that when we users leave the effect size measure settings as they are in the images (Dice and all values, respectively), these settings won't affect which keywords appear? Would differences only appear if we decided to adopt measures other than the default ones? (As I mentioned in the first post, I did spot different keywords when adopting different effect size measures.) Thanks!

Laurence Anthony

May 18, 2024, 9:23:51 AM
to ant...@googlegroups.com
Hi again,

If you choose to *rank* the words by the effect size measure (which is *not* the default), then the results will obviously vary depending on which effect size measure you choose as they are not comparable. The default is to cut off the list according to the p-value and rank by the log-likelihood scores.
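
The default behaviour described here (cut off by p-value, rank by log-likelihood) can be sketched in Python as follows. This is only an illustration under stated assumptions: the 4-term log-likelihood is computed from a 2x2 table, the p-value comes from a chi-square distribution with 1 degree of freedom, the Bonferroni threshold is alpha divided by the number of words tested, and all names and counts are hypothetical rather than AntConc's own code or data. The sketch also does not separate positive from negative keywords.

```python
import math

def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """4-term log-likelihood (G2) for a word/other x target/reference table."""
    total = size_target + size_ref
    word_total = freq_target + freq_ref
    other_total = total - word_total
    observed = [
        freq_target, freq_ref,
        size_target - freq_target, size_ref - freq_ref,
    ]
    expected = [
        word_total * size_target / total, word_total * size_ref / total,
        other_total * size_target / total, other_total * size_ref / total,
    ]
    # A cell with an observed count of 0 contributes 0 to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def p_value(g2):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(max(g2, 0.0) / 2))

def keywords(counts, size_target, size_ref, alpha=0.05):
    """Keep words significant after Bonferroni correction, ranked by G2."""
    threshold = alpha / len(counts)  # Bonferroni: one test per word
    scored = [
        (word, log_likelihood(a, b, size_target, size_ref))
        for word, (a, b) in counts.items()
    ]
    kept = [(word, g2) for word, g2 in scored if p_value(g2) < threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical counts: (frequency in target, frequency in reference)
counts = {
    "the": (6000, 50000),
    "and": (2700, 27000),   # same relative frequency in both corpora
    "corpus": (300, 100),
    "keyness": (10, 0),
}
result = keywords(counts, size_target=100_000, size_ref=1_000_000)
for word, g2 in result:
    print(word, round(g2, 2))
```

Because no effect size threshold is applied, the choice of Dice, MI or MI2 plays no part in which words survive this filter; it would only matter if the list were re-sorted by that measure.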

I hope that helps.

Laurence.

Ignacio Rodríguez Sánchez

May 23, 2024, 12:00:01 PM
to ant...@googlegroups.com
Forwarding message.

Ignacio Rodríguez Sánchez
Facultad de Lenguas y Letras
Universidad Autónoma de Querétaro 

Campus Aeropuerto
Anillo Vial Fray Junípero Serra s/n
C.P. 76140
Querétaro, Qro.
México
tel. + (52) 442 1921200 ext. 61270

