Settings for Keyword analysis


Carla Moriarty

Oct 22, 2024, 8:59:15 PM
to AntConc-Discussion

Hi to Dr Anthony and everyone else in this group.

I’m struggling with an area regarding ‘Keywords’ and I’m hoping someone could please steer me in the right direction.

I’ve created my own corpus of text-to-image prompts from Midjourney, which contains 100 million tokens. Now I would like to find which words appear unusually frequently in my Midjourney corpus compared with a reference corpus. I have chosen AmE06 for this (as Midjourney’s biggest group of users resides in the US).

My problem is that AmE06 is quite a bit smaller than my Midjourney corpus (1 million tokens versus 100 million tokens) and I’m not sure if the settings I have chosen reflect this disparity – can anyone please advise? I have selected:
[Attached screenshots: Picture1.png, Picture2.png]

Laurence Anthony

Oct 22, 2024, 9:25:23 PM
to ant...@googlegroups.com
Hi Carla,

The Log-Likelihood statistic takes into account the difference in corpus sizes, so you don't have to worry about that. The only danger is when you are comparing very small frequencies (<30 in the slots of the statistic). In those cases, the keyness values will be inaccurate, so you might want to consider effect sizes instead. If you are only looking at the top keywords, you should usually be fine.
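
To illustrate why the size difference is absorbed by the statistic, here is a minimal Python sketch of the widely used two-corpus log-likelihood (G2) calculation, plus one common effect size (a log ratio of relative frequencies). This is not AntConc's own code and its settings may differ in detail. The function names, the example counts, the 0.5 zero-fill constant, and the reading of "slots" as the four cells of the 2x2 contingency table are all assumptions made for illustration.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness from a 2x2 table: word vs. all other tokens."""
    total_tokens = size_target + size_ref
    total_freq = freq_target + freq_ref
    rest = total_tokens - total_freq
    # Observed counts paired with expected counts under the null hypothesis
    # that the word has the same relative frequency in both corpora.
    # Corpus sizes enter through the expected counts, which is how the
    # statistic accounts for corpora of different sizes.
    cells = [
        (freq_target, size_target * total_freq / total_tokens),
        (freq_ref, size_ref * total_freq / total_tokens),
        (size_target - freq_target, size_target * rest / total_tokens),
        (size_ref - freq_ref, size_ref * rest / total_tokens),
    ]
    # Empty cells contribute nothing (o * ln(o / e) tends to 0 as o -> 0).
    return 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)


def log_ratio(freq_target, size_target, freq_ref, size_ref, zero_fill=0.5):
    """Effect size: log2 of the ratio of relative frequencies."""
    # A zero count is replaced by zero_fill (an assumed smoothing constant)
    # so that the ratio stays defined.
    ft = freq_target or zero_fill
    fr = freq_ref or zero_fill
    return math.log2((ft / size_target) / (fr / size_ref))


def has_small_cell(freq_target, size_target, freq_ref, size_ref, threshold=30):
    """True if any observed cell of the 2x2 table is below the threshold."""
    cells = (freq_target, freq_ref,
             size_target - freq_target, size_ref - freq_ref)
    return min(cells) < threshold


# Hypothetical counts: a word seen 50,000 times in a 100-million-token
# target corpus and 20 times in a 1-million-token reference corpus.
counts = (50_000, 100_000_000, 20, 1_000_000)
print("G2:", log_likelihood(*counts))
print("log ratio:", log_ratio(*counts))
print("small cell:", has_small_cell(*counts))
```

A word with the same relative frequency in both corpora (say 1,000 in 100 million versus 10 in 1 million) scores zero, whatever the absolute sizes. The small-cell check is only a rough flag for the low-frequency cases where an effect size is worth reading alongside the keyness value.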

I hope that helps.

Laurence.

###############################################################
Laurence ANTHONY, Ph.D.
Professor of Applied Linguistics
Faculty of Science and Engineering
Waseda University
3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
E-mail: antho...@gmail.com
WWW: http://www.laurenceanthony.net/
###############################################################



Carla Moriarty

Oct 22, 2024, 11:13:54 PM
to ant...@googlegroups.com
Yes, that does help - thank you so much!



--
Kind regards

Carla Moriarty