Unexpected Negative Keyword Classification in AntConc (Log-Likelihood 4-term)

Illia Ilyin
Feb 22, 2025, 11:31:43 AM
to AntConc-Discussion

Hello everyone,

I'm encountering an issue with negative keyword classification in AntConc (4.3.1) while comparing two corpora using Log-Likelihood (4-term).

Corpus Details:
  • Target Corpus: Pro-immigration discourse (US-immigr-LEMMA), 9,630 tokens.
  • Reference Corpus: Speech by J.D. Vance (Vance-LEMMA), 2,965 tokens.

Observation:

The word democracy seems to be relatively more frequent in the target corpus (12 occurrences, NormFreq = 1246.106) than in the reference corpus (12 occurrences, NormFreq = 4047.218). Given this distribution, I expected it to be a positive keyword, yet in a previous analysis it appeared in the list of negative keywords.
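The NormFreq values quoted above are consistent with a per-million normalization (raw frequency divided by corpus size in tokens, multiplied by 1,000,000). A minimal Python sketch, assuming that this is how AntConc fills its NormFreq column, recomputes both values from the raw counts and corpus sizes given in this post; the helper `norm_freq` is hypothetical and not part of AntConc:

```python
# Per-million normalization; assumed (not confirmed) to match AntConc's NormFreq.
def norm_freq(raw_freq: int, corpus_tokens: int, per: int = 1_000_000) -> float:
    """Return raw_freq rescaled to occurrences per `per` tokens."""
    return raw_freq / corpus_tokens * per


# Raw counts and corpus sizes as reported in this post.
target = norm_freq(12, 9_630)     # US-immigr-LEMMA
reference = norm_freq(12, 2_965)  # Vance-LEMMA

print(f"target NormFreq:    {target:.3f}")     # 1246.106
print(f"reference NormFreq: {reference:.3f}")  # 4047.218
print("higher in target" if target > reference else "higher in reference")
```

Because both corpora contain the same raw count (12), the normalized value is driven entirely by corpus size, so the smaller corpus ends up with the larger NormFreq.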

What I Checked:
  • The reference corpus is correctly set as Vance-LEMMA, and the target corpus is US-immigr-LEMMA.
  • Negative keywords are enabled in the settings.
  • I reviewed raw and normalized frequency values, which confirm that democracy is used more in the target corpus than in the reference.
  • The keyword list is sorted by Log-Likelihood (4-term).

My Question:

Why would democracy be classified as a negative keyword when it is relatively more frequent in the target corpus than in the reference corpus? Could this be a quirk of the Log-Likelihood (4-term) method, or am I missing something in the calculation of negative keywords?
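To check the calculation by hand, a minimal Python sketch of a 4-term log-likelihood (G²) can help. It sums over all four cells of the 2×2 table (the word vs. all other tokens, target vs. reference corpus) and labels the keyword positive when the word's relative frequency is higher in the target corpus, negative otherwise. This is the standard G² formulation; that AntConc's "Log-Likelihood (4-term)" option computes exactly this and uses this sign convention is an assumption, and the function names are hypothetical:

```python
import math


def ll_4term(a: int, b: int, c1: int, c2: int) -> float:
    """G2 summed over all four cells of the 2x2 contingency table.

    a, b:   frequency of the word in the target and reference corpora
    c1, c2: total tokens in the target and reference corpora
    """
    n = c1 + c2
    observed = [a, b, c1 - a, c2 - b]
    expected = [
        c1 * (a + b) / n,      # word, target
        c2 * (a + b) / n,      # word, reference
        c1 * (n - a - b) / n,  # all other tokens, target
        c2 * (n - a - b) / n,  # all other tokens, reference
    ]
    # Cells with an observed count of 0 contribute 0 (limit of x * ln x).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def keyness_sign(a: int, b: int, c1: int, c2: int) -> str:
    """'positive' if the word is relatively more frequent in the target corpus."""
    return "positive" if a / c1 > b / c2 else "negative"


a, b, c1, c2 = 12, 12, 9_630, 2_965  # democracy: target vs. reference
print(f"LL = {ll_4term(a, b, c1, c2):.2f}, {keyness_sign(a, b, c1, c2)} keyword")
```

In this sketch the sign comes from comparing relative frequencies (12/9,630 vs. 12/2,965), not raw counts; the LL value only measures how large the difference is.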

I would greatly appreciate any insights on what might be causing this discrepancy.

Thanks in advance!

Please find the corpus files and the AntConc screenshot attached.

Screenshot_3.jpg
JD Vance Remarks.txt
US-immigr.txt