Logistic regression in methylKit

Moshe Olshansky

Mar 25, 2026, 8:44:59 PM
to methylkit_discussion
Hello,

I have two questions regarding logistic regression:
1. If I am only looking at individual CpGs and no overdispersion correction is applied, why is a Chi-Sq or F test needed? Don't we get the p-values from the logistic regression itself?
2. What is meth.diff? Is it the difference between the estimated logits? If so, how would you estimate the difference between the methylation proportions?

Thank you.

Alexander Blume

Apr 7, 2026, 10:07:59 AM
to methylkit_...@googlegroups.com
Hi Moshe,

Thank you for your questions. 

1. Why a Chi-Sq or F test if p-values come from logistic regression?

When methylKit fits a logistic regression per CpG, it estimates a coefficient for the treatment/group effect. To get a p-value for that coefficient, it performs a likelihood ratio test (Chi-Sq) or an F-test; this is the test of the logistic regression model itself, not an additional test on top of it. The Chi-Sq LRT compares the full model (with the group effect) to a reduced model (without it), which is a standard way to assess significance in GLMs. So you are getting the p-value from the logistic regression, just via a likelihood ratio test rather than a Wald z-test; the LRT is generally more reliable than the Wald test for this type of data.
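To see the two tests side by side, here is a minimal Python sketch (not methylKit's R code) for a single CpG with one binary group covariate. With only a group term, the maximum-likelihood fitted proportions are simply the pooled group proportions, so no iterative fitting is needed. The function names and the example counts are hypothetical.

```python
import math


def _binom_loglik(meth, cov, p, eps=1e-12):
    """Binomial log-likelihood of per-sample counts at a common proportion p
    (constant binomial coefficients are dropped; they cancel in the LRT)."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0) when a group is 0% or 100%
    return sum(m * math.log(p) + (c - m) * math.log(1.0 - p)
               for m, c in zip(meth, cov))


def lrt_and_wald(meth0, cov0, meth1, cov1):
    """LRT and Wald tests for the group coefficient at one CpG.

    meth*/cov*: per-sample methylated counts and coverage for groups 0 and 1.
    """
    m0, c0 = sum(meth0), sum(cov0)
    m1, c1 = sum(meth1), sum(cov1)
    # With a single binary covariate, the MLE fitted values are the pooled
    # group proportions (full model) or the overall proportion (reduced model).
    p0, p1 = m0 / c0, m1 / c1
    p_all = (m0 + m1) / (c0 + c1)

    ll_full = _binom_loglik(meth0, cov0, p0) + _binom_loglik(meth1, cov1, p1)
    ll_reduced = _binom_loglik(list(meth0) + list(meth1),
                               list(cov0) + list(cov1), p_all)
    lrt = max(0.0, 2.0 * (ll_full - ll_reduced))  # deviance(reduced) - deviance(full)
    p_lrt = math.erfc(math.sqrt(lrt / 2.0))        # chi-square, 1 df, upper tail

    # Wald z-test on beta = logit(p1) - logit(p0); undefined if any pooled
    # methylated or unmethylated count is zero (one of the Wald test's weak spots).
    u0, u1 = c0 - m0, c1 - m1
    if min(m0, u0, m1, u1) == 0:
        return lrt, p_lrt, float("nan"), float("nan")
    beta = math.log(m1 / u1) - math.log(m0 / u0)
    se = math.sqrt(1.0 / m0 + 1.0 / u0 + 1.0 / m1 + 1.0 / u1)
    z = beta / se
    p_wald = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal p-value
    return lrt, p_lrt, z, p_wald


# Hypothetical counts: three samples per group at one CpG
lrt, p_lrt, z, p_wald = lrt_and_wald([8, 12, 10], [20, 25, 22],
                                     [15, 18, 20], [20, 24, 25])
print(f"LRT = {lrt:.2f} (p = {p_lrt:.3g}); Wald z = {z:.2f} (p = {p_wald:.3g})")
```

Both p-values come from the same fitted model; they differ only in how the evidence for the group coefficient is summarized.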

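When overdispersion correction is applied (overdispersion="MN" or "shrinkMN", which scales the model variance by a McCullagh and Nelder dispersion parameter rather than fitting a beta-binomial model), methylKit uses an F-test by default to account for the extra variance. Without correction, the Chi-Sq LRT is used by default. Either test can also be chosen explicitly with the `test` argument.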
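The idea behind a scaled F-test can be sketched in Python as follows. This is an illustration under stated assumptions, not methylKit's implementation: the dispersion `phi` is estimated McCullagh and Nelder style as the Pearson chi-square of the group model divided by its residual degrees of freedom, this sketch floors `phi` at 1 (its own choice), and the deviance difference divided by `phi` is referred to an F(1, n - 2) distribution. The F tail is computed from the regularized incomplete beta function so the block needs only the standard library. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # convergence checked on the odd step
            break
    return h


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction where it converges quickly, symmetry otherwise.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """Upper-tail probability P(F > f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def _clamp(p, eps=1e-12):
    return min(max(p, eps), 1.0 - eps)


def _loglik(meth, cov, p):
    p = _clamp(p)
    return sum(m * math.log(p) + (c - m) * math.log(1.0 - p)
               for m, c in zip(meth, cov))


def mn_scaled_f_test(meth0, cov0, meth1, cov1):
    """Dispersion-scaled deviance F-test for a two-group effect at one CpG."""
    p0 = _clamp(sum(meth0) / sum(cov0))
    p1 = _clamp(sum(meth1) / sum(cov1))
    p_all = (sum(meth0) + sum(meth1)) / (sum(cov0) + sum(cov1))
    n, k = len(meth0) + len(meth1), 2  # samples; parameters in the full model
    if n <= k:
        raise ValueError("need more samples than parameters to estimate phi")
    # Pearson chi-square of the full (group) model over its residual df
    pearson = sum((m - c * p) ** 2 / (c * p * (1.0 - p))
                  for ms, cs, p in ((meth0, cov0, p0), (meth1, cov1, p1))
                  for m, c in zip(ms, cs))
    phi = max(1.0, pearson / (n - k))  # sketch choice: never shrink the variance
    dev_diff = 2.0 * (_loglik(meth0, cov0, p0) + _loglik(meth1, cov1, p1)
                      - _loglik(list(meth0) + list(meth1),
                                list(cov0) + list(cov1), p_all))
    dev_diff = max(dev_diff, 0.0)
    f_stat = dev_diff / phi  # numerator df = 1 (one group coefficient)
    return phi, dev_diff, f_stat, f_sf(f_stat, 1, n - k)


# Hypothetical, visibly overdispersed counts (samples disagree within groups)
phi, dev, f_stat, p_f = mn_scaled_f_test([2, 15, 4], [20, 20, 20],
                                         [12, 19, 8], [20, 20, 20])
print(f"phi = {phi:.2f}, F = {f_stat:.2f}, p = {p_f:.3g}")
```

Dividing by `phi` and using an F reference distribution both make the test more conservative when replicates vary more than a binomial model allows.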

2. What is meth.diff?

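meth.diff is the difference in mean methylation percentage between the two groups (not the difference between logit-transformed values). By default the group means are weighted by read coverage; the `effect` argument of calculateDiffMeth() also offers "mean" (unweighted) and "predicted" (means predicted by the logistic regression). Concretely, it is computed as: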

  meth.diff = mean_methylation_group1 (%) − mean_methylation_group2 (%)

So a meth.diff of +20 means group 1 has 20 percentage points higher methylation than group 2 at that CpG. This is a straightforward, interpretable measure of effect size on the proportion scale. If you want to work on the logit scale (e.g., for downstream effect size comparisons), you would need to compute that separately from the coverage and count columns.
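As a rough illustration, here is a minimal Python sketch (not methylKit's code) that computes a meth.diff-style value from per-sample methylated-C counts and coverage (the numCs and coverage columns of a methylBase object), both coverage-weighted and unweighted, plus one simple logit-scale difference. The helper names and counts are hypothetical, and taking the logit of the group means is just one possible choice for a logit-scale effect size.

```python
import math


def group_means(meth, cov):
    """Coverage-weighted and unweighted mean methylation (%) for one group.

    meth/cov: per-sample methylated-C counts and coverage at one CpG.
    """
    # Weighting each sample's proportion m/c by its coverage c reduces to the
    # pooled proportion sum(m) / sum(c).
    weighted = 100.0 * sum(meth) / sum(cov)
    unweighted = 100.0 * sum(m / c for m, c in zip(meth, cov)) / len(meth)
    return weighted, unweighted


def meth_diff(meth1, cov1, meth2, cov2, weighted=True):
    """Hypothetical helper: group 1 minus group 2, in percentage points."""
    i = 0 if weighted else 1
    return group_means(meth1, cov1)[i] - group_means(meth2, cov2)[i]


def logit_diff(pct1, pct2):
    """Logit-scale difference between two methylation percentages (0 < pct < 100)."""
    def logit(q):
        return math.log(q / (1.0 - q))
    return logit(pct1 / 100.0) - logit(pct2 / 100.0)


# Hypothetical counts for one CpG, three samples per group
g1_meth, g1_cov = [18, 9, 40], [20, 10, 50]
g2_meth, g2_cov = [12, 3, 30], [20, 10, 50]
print("weighted meth.diff:  ", meth_diff(g1_meth, g1_cov, g2_meth, g2_cov))
print("unweighted meth.diff:", meth_diff(g1_meth, g1_cov, g2_meth, g2_cov,
                                         weighted=False))

# The same +20 point difference corresponds to different logit-scale effects
print(logit_diff(70, 50), logit_diff(95, 75))
```

The last line shows why the two scales are not interchangeable: a 20-point change near 50% is a smaller logit change than the same 20 points near the top of the range.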

Hope that helps!

Best
Alex

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/methylkit_discussion/95c1345e-09b4-439b-9c32-794dd3c666d1n%40googlegroups.com.