Unexpected Negative Keyword Classification in AntConc (Log-Likelihood 4-term)

Illia Ilyin

Feb 22, 2025, 11:31:43 AM

to AntConc-Discussion

Hello everyone,

I'm encountering an issue with negative keyword classification in AntConc (4.3.1) while comparing two corpora using Log-Likelihood (4-term).

Corpus Details:

Target Corpus: Pro-immigration discourse (US-immigr-LEMMA), 9,630 tokens.
Reference Corpus: Speech by J.D. Vance (Vance-LEMMA), 2,965 tokens.

Observation:

The word democracy appears relatively more frequently in the target corpus (12 occurrences, NormFreq = 1246.106) than in the reference corpus (12 occurrences, NormFreq = 4047.218). Given this distribution, I expected it to be a positive keyword, yet in a previous analysis it appeared in the list of negative keywords.
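For anyone who wants to reproduce these figures, here is a minimal Python sketch. It assumes that NormFreq is the frequency per million tokens and that the 4-term statistic is the standard four-cell log-likelihood (observed vs. expected counts for the word and for all other tokens in each corpus). The function names are my own, and the rule that the direction follows the difference in relative frequency is an assumption, not something I have confirmed about AntConc's implementation.

```python
import math

def norm_freq(count, corpus_size, per=1_000_000):
    """Relative frequency scaled to `per` tokens (assumed: per million)."""
    return count / corpus_size * per

def log_likelihood_4term(a, b, n1, n2):
    """Four-cell log-likelihood.

    a, b   : frequency of the word in the target / reference corpus
    n1, n2 : total tokens in the target / reference corpus
    """
    c, d = n1 - a, n2 - b          # all other tokens in each corpus
    n = n1 + n2
    cells = [
        (a, n1 * (a + b) / n),     # word, target
        (b, n2 * (a + b) / n),     # word, reference
        (c, n1 * (c + d) / n),     # other tokens, target
        (d, n2 * (c + d) / n),     # other tokens, reference
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)

def keyness_direction(a, b, n1, n2):
    """Assumed rule: positive if the target's relative frequency is higher."""
    diff = a / n1 - b / n2
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "neutral"

# Counts from the post: "democracy", 12 hits in each corpus.
target_nf = norm_freq(12, 9630)
reference_nf = norm_freq(12, 2965)
ll = log_likelihood_4term(12, 12, 9630, 2965)
direction = keyness_direction(12, 12, 9630, 2965)
print(round(target_nf, 3), round(reference_nf, 3), round(ll, 2), direction)
```

With the counts above, the sketch reproduces both NormFreq values shown in AntConc and prints the direction implied by comparing them, together with the log-likelihood score.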

What I Checked:

The reference corpus is correctly set as Vance-LEMMA, and the target corpus is US-immigr-LEMMA.
Negative keywords are enabled in the settings.
I reviewed raw and normalized frequency values, which confirm that democracy is used more in the target corpus than in the reference.
Sorting by Log-Likelihood (4-term).

My Question:

Why would democracy be classified as a negative keyword when it is relatively more frequent in the target corpus than in the reference corpus? Could this be a quirk of the Log-Likelihood (4-term) method, or am I missing something in the calculation of negative keywords?

I would greatly appreciate any insights on what might be causing this discrepancy.

Thanks in advance!

Please find the corpus files and a screenshot from AntConc attached.

Screenshot_3.jpg

JD Vance Remarks.txt

US-immigr.txt
