Hi Moshe,
Thank you for your questions.
1. Why a Chi-Sq or F test if p-values come from logistic regression?
When methylKit fits a logistic regression per CpG, it estimates a coefficient for the treatment/group effect. The p-value for that coefficient comes from a likelihood ratio test (Chi-Sq) or an F-test on the fitted model, so it is the logistic regression's own test, not an additional test on top of it. The Chi-Sq LRT compares the full model (with the group effect) to a reduced model (without it), which is the standard way to assess significance in GLMs. So you are getting the p-value from the logistic regression, just via a likelihood ratio test rather than a Wald z-test. The LRT tends to be more reliable than the Wald test when methylation is close to 0% or 100% or when there are few samples.
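When overdispersion correction is applied (overdispersion = "MN" or "shrinkMN", which rescales the model variance by an estimated overdispersion parameter rather than fitting a beta-binomial model), methylKit uses an F-test by default to account for the extra variance. Without correction, the Chi-Sq LRT is used.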
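To make the full-versus-reduced comparison concrete, here is a minimal sketch of the Chi-Sq LRT for one CpG with two groups and no overdispersion correction. It is not methylKit's code: the function names and the example counts are made up for illustration, and it relies on the fact that, with only a group term in the model, the fitted methylation proportion in each group equals the pooled methylated fraction.

```python
import math

def binom_loglik(meth, cov, p):
    """Binomial log-likelihood of the samples at proportion p
    (the constant binomial coefficients are dropped; they cancel in the LRT)."""
    ll = 0.0
    for m, n in zip(meth, cov):
        u = n - m  # unmethylated reads
        if m > 0:
            ll += m * math.log(p)
        if u > 0:
            ll += u * math.log(1.0 - p)
    return ll

def lrt_pvalue(meth1, cov1, meth2, cov2):
    """Likelihood ratio test for a group effect at a single CpG.

    meth*: methylated read counts per sample, cov*: total coverage per sample.
    Returns (LRT statistic, p-value from a chi-square with 1 df).
    """
    # Full model: one proportion per group (MLE = pooled fraction in the group)
    p1 = sum(meth1) / sum(cov1)
    p2 = sum(meth2) / sum(cov2)
    # Reduced model: a single proportion shared by both groups
    p0 = (sum(meth1) + sum(meth2)) / (sum(cov1) + sum(cov2))

    ll_full = binom_loglik(meth1, cov1, p1) + binom_loglik(meth2, cov2, p2)
    ll_reduced = binom_loglik(list(meth1) + list(meth2),
                              list(cov1) + list(cov2), p0)

    # Deviance difference; clamp tiny negative values from rounding
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2))
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval

# Hypothetical counts: two samples per group
stat, pval = lrt_pvalue([90, 85], [100, 100], [20, 25], [100, 100])
print(stat, pval)
```

If both groups have the same pooled proportion, the statistic is 0 and the p-value is 1. The larger the gap between the full and reduced fits, the smaller the p-value.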
2. What is meth.diff?
meth.diff is the difference in mean methylation "percentages" between the two groups (not the difference in logit-transformed values). Concretely, it is computed as:
meth.diff = mean_methylation_group1 (%) − mean_methylation_group2 (%)
So a meth.diff of +20 means group 1 has 20 percentage points higher methylation than group 2 at that CpG. This is a straightforward, interpretable measure of effect size on the proportion scale. If you want to work on the logit scale (e.g., for downstream effect size comparisons), you would need to compute that separately from the per-sample coverage, numCs and numTs columns.
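As a rough illustration of the two scales, the sketch below computes a percentage-point difference and a logit-scale difference from per-sample counts. The function names and counts are hypothetical. Pooling reads within each group (a coverage-weighted mean) is an assumption of this sketch, so check which averaging your methylKit settings use. The small pseudocount is another assumption, added so the logit stays finite at 0% or 100% methylation.

```python
import math

def pooled_pct(meth, cov):
    """Coverage-weighted methylation percentage for one group."""
    return 100.0 * sum(meth) / sum(cov)

def meth_diff(meth1, cov1, meth2, cov2):
    """Difference in methylation percentage points: group 1 minus group 2."""
    return pooled_pct(meth1, cov1) - pooled_pct(meth2, cov2)

def logit_diff(meth1, cov1, meth2, cov2, pseudo=0.5):
    """Difference in log-odds of methylation: group 1 minus group 2.

    pseudo is added to methylated and unmethylated counts (an assumption
    of this sketch) to avoid log(0) at fully (un)methylated sites.
    """
    def logit(meth, cov):
        m = sum(meth) + pseudo               # methylated reads
        u = sum(cov) - sum(meth) + pseudo    # unmethylated reads
        return math.log(m / u)
    return logit(meth1, cov1) - logit(meth2, cov2)

# Hypothetical counts: group 1 is 70% methylated, group 2 is 50%
m1, c1 = [60, 80], [100, 100]
m2, c2 = [50, 50], [100, 100]
print(meth_diff(m1, c1, m2, c2))   # 20.0 percentage points
print(logit_diff(m1, c1, m2, c2))  # log-odds difference
```

The same 20-point difference corresponds to a different log-odds difference depending on where it sits on the 0 to 100% range, which is why the two scales are not interchangeable for effect size comparisons.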
Hope that helps!
Best,
Alex