logistic regression in calculateDiffMeth function


Antoine

Jan 30, 2025, 10:55:51 PM
to methylkit_discussion
Hello,

I am new to methylation analysis.
I am trying to understand the calculateDiffMeth function for logistic regression:
 1. How do we know when we should apply an overdispersion correction?
 2. Same question regarding the choice between the Chi-squared test and the F-test.

My second question is about processBismarkAln:
- Does the read.context option filter for CpG?

Thank you for your help

Alexander Blume

Feb 3, 2025, 7:22:34 AM
to methylkit_...@googlegroups.com
Hi Antoine,

> 1. How do we know when we should apply an overdispersion correction?

Deciding when to apply overdispersion correction and which statistical
test to use depends on your data.

Overdispersion correction (overdispersion=TRUE) is necessary when
there’s a high level of variability across biological replicates,
which can inflate false positives.
This often happens in low-coverage WGBS or RRBS data where read counts
per site are low.
To check for overdispersion, you can look at dispersion estimates from
tools like edgeR (estimateDisp()) or examine the coefficient of
variation in methylation proportions across replicates.
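
As a rough illustration of the second check, here is a minimal base R sketch. The counts are made up, and comparing the observed variance with the binomial expectation is only a crude heuristic, not something methylKit or edgeR does in this form.

```r
# Hypothetical data: rows = CpG sites, columns = replicates of one group.
meth_counts <- matrix(c( 8,  9, 7,
                         2,  9, 5,
                        10, 10, 9), nrow = 3, byrow = TRUE)
coverage <- matrix(10, nrow = 3, ncol = 3)  # assumed 10 reads everywhere

# Methylation proportion per site and replicate.
prop <- meth_counts / coverage

# Coefficient of variation per site: sd / mean across replicates.
cv <- apply(prop, 1, sd) / rowMeans(prop)

# Crude overdispersion indicator: observed variance across replicates
# divided by the variance expected under a pure binomial model, p(1-p)/n.
p_hat     <- rowSums(meth_counts) / rowSums(coverage)
binom_var <- p_hat * (1 - p_hat) / rowMeans(coverage)
var_ratio <- apply(prop, 1, var) / binom_var

round(data.frame(cv = cv, var_ratio = var_ratio), 3)
```

Sites with a ratio well above 1 (the second site here) vary more between replicates than binomial sampling alone would explain.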

> 2. Same question regarding the choice between the Chi-squared test and the F-test.

For statistical tests, the Chi-squared test (test="Chisq") is
typically used when sample sizes are small and overdispersion is not a
concern. It assesses whether methylation proportions differ between
groups.
The F-test (test="F"), on the other hand, is more robust in the
presence of overdispersion and is better suited for larger datasets.
If you’re unsure, a good approach is to check dispersion first—if it’s
high, apply overdispersion correction and opt for the F-test.
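
To see how the two tests differ on the same fit, here is a small base R sketch for one hypothetical CpG site. It uses a plain quasi-binomial `glm()` as a stand-in, not methylKit's internal code, and the read counts are invented.

```r
# Hypothetical single CpG site: 3 treated and 3 control replicates.
site <- data.frame(
  treatment = factor(c(1, 1, 1, 0, 0, 0)),
  numCs     = c(18,  5, 12,  3,  9,  1),  # methylated reads
  numTs     = c( 2, 15,  8, 17, 11, 19)   # unmethylated reads
)

# Quasi-binomial logistic regression: same mean model as binomial,
# plus a dispersion parameter estimated from the Pearson residuals.
fit <- glm(cbind(numCs, numTs) ~ treatment,
           family = quasibinomial, data = site)
phi <- summary(fit)$dispersion

# Same fit, two tests for the treatment effect.
p_chisq <- anova(fit, test = "Chisq")[2, "Pr(>Chi)"]
p_f     <- anova(fit, test = "F")[2, "Pr(>F)"]

c(dispersion = phi, p_chisq = p_chisq, p_f = p_f)
```

With a single tested coefficient the F-test p-value is never smaller than the Chi-squared one, because the F-test also accounts for the uncertainty in the estimated dispersion.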

> My second question is about processBismarkAln :
> - Does the read.context option filter for CpG?

Yes, the read.context option in processBismarkAln() filters CpG sites
if you specify "CpG", though it can be omitted since this is also the
default value.
This option allows you to choose which methylation context to retain
from the Bismark alignment file; other options are "CHG" and "CHH".
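
A minimal sketch of reading each context separately and comparing the number of sites returned. The file name, sample id and assembly are placeholders, and the block only calls methylKit if the package is installed and the file exists.

```r
# Placeholder inputs (assumptions): a sorted Bismark alignment file.
bam_file  <- "sample1.sorted.bam"
sample_id <- "sample1"
assembly  <- "hg38"

contexts <- c("CpG", "CHG", "CHH")
n_sites  <- setNames(rep(NA_integer_, length(contexts)), contexts)

if (requireNamespace("methylKit", quietly = TRUE) && file.exists(bam_file)) {
  for (ctx in contexts) {
    # read.context selects which cytosine context is returned.
    calls <- methylKit::processBismarkAln(
      location     = bam_file,
      sample.id    = sample_id,
      assembly     = assembly,
      read.context = ctx
    )
    n_sites[ctx] <- nrow(methylKit::getData(calls))
  }
}

n_sites  # number of cytosines returned per context (NA if not run)
```

If the three counts are identical, comparing the first few rows of each result is a quick way to tell whether the same positions really come back for every context.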

Best,
Alex

Antoine

Feb 9, 2025, 4:10:14 PM
to methylkit_discussion
Thank you very much for the answers, Alex!

Regarding overdispersion, is there any cut-off for estimateDisp() or the coefficient of variation?

I tried processBismarkAln with the "CHG" and "CHH" options, but I am getting the same result with all three options (CHG, CHH and CpG). Is there an explanation?

Thank you again
Antoine

Alexander Blume

Feb 14, 2025, 5:24:06 AM
to methylkit_...@googlegroups.com
Hi Antoine,

Please note that I suggested the wrong code for enabling overdispersion correction in calculateDiffMeth. The correct setting is either `overdispersion="MN"` or `overdispersion="shrinkMN"` (less tested), as mentioned in the function help and the vignette (https://www.bioconductor.org/packages/release/bioc/vignettes/methylKit/inst/doc/methylKit.html#37_Correcting_for_overdispersion).
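
For reference, a minimal sketch of the corrected call on simulated data, along the lines of the vignette section linked above. The number of sites and the seed are arbitrary choices, and the block is skipped if methylKit is not installed.

```r
diff_mn <- NULL  # stays NULL when methylKit is unavailable

if (requireNamespace("methylKit", quietly = TRUE)) {
  library(methylKit)
  set.seed(1)

  # Simulated methylBase object: 3 treated and 3 control replicates.
  sim <- dataSim(replicates = 6, sites = 200,
                 treatment = c(1, 1, 1, 0, 0, 0))

  # Logistic regression with overdispersion correction ("MN")
  # and a Chi-squared test; use test = "F" for the F-test instead.
  diff_mn <- calculateDiffMeth(sim,
                               overdispersion = "MN",
                               test           = "Chisq",
                               mc.cores       = 1)
  print(head(getData(diff_mn)))
}
```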

I am adding a few more resources explaining the theory of overdispersion [1] and how to estimate dispersion with edgeR [2].


> Regarding overdispersion, is there any cut-off for estimateDisp() or the coefficient of variation?

This article outlines how to perform those checks using edgeR [2].
Once the estimated dispersion is greater than 1, overdispersion correction can be applied [1, 3]. methylKit's code performs the same check (https://github.com/al2na/methylKit/blob/d244b0b975db230bba856a8bbda5e726b7df7964/R/diffMeth.R#L238-L247).
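
To make the "greater than 1" rule concrete, here is a base R sketch of a Pearson-based dispersion estimate for one hypothetical site. It follows the general approach described in [1] and is not a copy of the methylKit code linked above; the counts are invented.

```r
# Hypothetical single CpG site: 3 treated and 3 control replicates.
numCs     <- c(18,  5, 12,  3,  9,  1)  # methylated reads
numTs     <- c( 2, 15,  8, 17, 11, 19)  # unmethylated reads
treatment <- c(1, 1, 1, 0, 0, 0)
n         <- numCs + numTs              # coverage per replicate

# Ordinary binomial logistic regression.
fit <- glm(cbind(numCs, numTs) ~ treatment, family = binomial)
mu  <- fitted(fit)                      # fitted methylation proportions

# Pearson residuals and dispersion estimate:
# sum of squared residuals / (observations - parameters).
pearson <- (numCs - n * mu) / sqrt(n * mu * (1 - mu))
phi     <- sum(pearson^2) / (length(n) - length(coef(fit)))

# Only correct when the data are overdispersed (phi > 1).
phi_used <- max(phi, 1)

c(phi = phi, phi_used = phi_used)
```

A value near 1 means the binomial model already describes the replicate variability; values clearly above 1 indicate overdispersion.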



> I tried processBismarkAln with the "CHG" and "CHH" options, but I am getting the same result with all three options (CHG, CHH and CpG). Is there an explanation?

Could you share your code? If there is a problem with the package, feel free to open an issue on GitHub, and please include a reproducible example.

Best, 
Alex



- [1] Explanation of concept and adjustment for overdispersion (includes code example at the bottom): https://online.stat.psu.edu/stat504/lesson/7/7.3
- [2] Workflow for Differential methylation analysis using edgeR (Dispersion Estimation is highlighted): https://f1000research.com/articles/6-2055/v2#:~:text=three%20cell%20populations.-,Dispersion%20estimation,-With%20the%20design
- [3] An empirical approach to determine a threshold for assessing overdispersion in Poisson and negative binomial models for count data: https://pmc.ncbi.nlm.nih.gov/articles/PMC6290908/#:~:text=Currently%2C%20one%20of%20the%20most,ratio%20is%20greater%20than%20one.