Dear Prof Anthony and all,
I've run a keyword analysis in AntConc with these settings: 'Log-likelihood (4-term)' as the likelihood measure and 'p < 0.01 (6.63 with Bonferroni)' as the threshold.
To describe the analysis in the method section, I need to know:
- the log-likelihood procedure
- the assumptions of the statistics for keyness
- the adjusted alpha level after Bonferroni correction
Regarding the second point, it seems that the usual assumptions of inferential statistics, such as random sampling or normally distributed data, are not applicable (or not necessary) for keyness and other corpus analyses, because the purpose of inferential statistics in corpus analysis is not to reject a null hypothesis but to identify words and rank them. I got this impression from sources such as Baker (2006), Bestgen (2014), Rayson (2019), and Pojanapunya and Todd (2018). Am I understanding this correctly? If so, what assumptions apply specifically to corpus analysis statistics?
I would very much appreciate it if you could help me figure out how to describe these three points.
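To make the first and third points concrete, here is a minimal sketch of how I currently understand them; please correct me if it is wrong. I am assuming that the 4-term log-likelihood is the G2 statistic summed over all four cells of the 2x2 contingency table (word vs. all other words, target vs. reference corpus), that the value is compared against a chi-squared distribution with 1 degree of freedom, and that Bonferroni divides alpha by the number of word types tested. I have not confirmed that AntConc computes it exactly this way, and the function names and frequencies are made up for illustration.

```python
import math


def log_likelihood_4term(a, b, c, d):
    """G2 over all four cells of the 2x2 table (assumed '4-term' variant).

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    total = c + d
    word_total = a + b                 # row total: the word itself
    other_total = total - word_total   # row total: all other words
    g2 = 0.0
    for freq, size in ((a, c), (b, d)):
        for observed, row_total in ((freq, word_total),
                                    (size - freq, other_total)):
            expected = size * row_total / total
            if observed > 0:           # a zero cell contributes nothing
                g2 += observed * math.log(observed / expected)
    return 2 * g2


def p_value_df1(g2):
    """Upper-tail p-value of a chi-squared distribution with 1 df."""
    return math.erfc(math.sqrt(g2 / 2))


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after Bonferroni correction (alpha / n_tests)."""
    return alpha / n_tests


# Hypothetical numbers: a word occurring 120 times in a 50,000-token
# target corpus and 300 times in a 1,000,000-token reference corpus,
# with 8,000 word types tested in total.
g2 = log_likelihood_4term(120, 300, 50_000, 1_000_000)
is_key = p_value_df1(g2) < bonferroni_alpha(0.01, 8_000)
```

If this is right, 6.63 would be the critical value for p < 0.01 before any correction, and the adjusted alpha I should report depends on how many word types were tested.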
Thank you!
Best regards,
Hyeseung
*References*
- Baker, P. (2006). Using corpora in discourse analysis. Continuum.
- Bestgen, Y. (2014). Inadequacy of the chi-squared test to examine vocabulary differences between corpora. Literary and Linguistic Computing, 29(2), 164-170. https://doi.org/10.1093/llc/fqt020
- Pojanapunya, P., & Todd, R. W. (2018). Log-likelihood and odds ratio: Keyness statistics for different purposes of keyword analysis. Corpus Linguistics and Linguistic Theory, 14(1), 133-167. https://doi.org/10.1515/cllt-2015-0030
- Rayson, P. (2019). Corpus analysis of key words. In C. A. Chapelle (Ed.), The Concise Encyclopedia of Applied Linguistics (pp. 320-326). Wiley. https://www.wiley.com/en-gb/The+Concise+Encyclopedia+of+Applied+Linguistics-p-9781119147367