Number of DMRs are higher than number of DMCs

234 views
Skip to first unread message

Berkay Günay

unread,
Apr 10, 2021, 9:34:59 AM4/10/21
to methylkit_discussion
Hi,

I am using Bismark cytosine report files to find differentially methylated bases/regions. Is it normal for the number of differentially methylated regions to be higher than the number of differentially methylated cytosines?

My results:

BASE                                            REGION(100*, 100**, 5***)      REGION(1000*, 1000**, 5***)

Total 1284144 Cytosines         Total 80633 Regions                 Total 52692 Regions     
DMCs: 329                                  DMRs: 1140                                 DMRs: 918

* Window size
** Step size
*** Coverage
Note: Methylation difference threshold: 10, q-value threshold: 0.1.

Codes used for DMCs:

library("methylKit")

file.list = list("D:/Coverage/P1_coverage.CpG_report.txt",
                       "D:/Coverage/P2_coverage.CpG_report.txt",
                       "D:/Coverage/P3_coverage.CpG_report.txt",
                       "D:/Coverage/P7_coverage.CpG_report.txt",
                       "D:/Coverage/P8_coverage.CpG_report.txt",
                       "D:/Coverage/P9_coverage.CpG_report.txt",
                       "D:/Coverage/P4_coverage.CpG_report.txt",
                       "D:/Coverage/P5_coverage.CpG_report.txt",
                       "D:/Coverage/P6_coverage.CpG_report.txt")

myobj = methRead(file.list,
                                   sample.id = list("test1", "test2", "test3", "test4", "test5", "test6", 
                                                                "control1", "control2", "control3"),
                                   assembly = "rn6",
                                   treatment = c(1,1,1,1,1,1,0,0,0),
                                   context = "CpG",
                                   mincov = 5,
                                   pipeline = "bismarkCytosineReport",
                                   header = FALSE)

filtered.myobj=filterByCoverage(myobj,lo.count=5,lo.perc=NULL,
                                      hi.count=NULL,hi.perc=99.9)

meth = unite(filtered.myobj, destrand = FALSE, min.per.group=1L)

myDiff = calculateDiffMeth(meth, adjust = "BH")
myDiff10p.hyper = getMethylDiff(myDiff, difference = 10, qvalue = 0.1, type = "hyper")
myDiff10p.hypo = getMethylDiff(myDiff, difference = 10, qvalue = 0.1, type = "hypo")
myDiff10p = getMethylDiff(myDiff, difference = 10, qvalue = 0.1)

Codes used for DMRs:

library("methylKit")

file.list = list("D:/Coverage/P1_coverage.CpG_report.txt",
                       "D:/Coverage/P2_coverage.CpG_report.txt",
                       "D:/Coverage/P3_coverage.CpG_report.txt",
                       "D:/Coverage/P7_coverage.CpG_report.txt",
                       "D:/Coverage/P8_coverage.CpG_report.txt",
                       "D:/Coverage/P9_coverage.CpG_report.txt",
                       "D:/Coverage/P4_coverage.CpG_report.txt",
                       "D:/Coverage/P5_coverage.CpG_report.txt",
                       "D:/Coverage/P6_coverage.CpG_report.txt")

myobj = methRead(file.list,
                                   sample.id = list("test1", "test2", "test3", "test4", "test5", "test6", 
                                                                "control1", "control2", "control3"),
                                   assembly = "rn6",
                                   treatment = c(1,1,1,1,1,1,0,0,0),
                                   context = "CpG",
                                   mincov = 5,
                                   pipeline = "bismarkCytosineReport",
                                   header = FALSE)

filtered.myobj=filterByCoverage(myobj,lo.count=5,lo.perc=NULL,
                                      hi.count=NULL,hi.perc=99.9)

tiles = tileMethylCounts(filtered.myobj, win.size = 100, step.size = 100, cov.bases = 5)

meth = unite(tiles, destrand = FALSE, min.per.group=1L)

myDiff = calculateDiffMeth(meth, adjust = "BH")
myDiff10p.hyper = getMethylDiff(myDiff, difference = 10, qvalue = 0.1, type = "hyper")
myDiff10p.hypo = getMethylDiff(myDiff, difference = 10, qvalue = 0.1, type = "hypo")
myDiff10p = getMethylDiff(myDiff, difference = 10, qvalue = 0.1)

Best regards,
Berkay


                                                        


Altuna Akalin

unread,
Apr 21, 2021, 1:30:00 PM4/21/21
to methylkit_...@googlegroups.com
this could be because of tiling. When you do that, you have more counts. and for the count-based tests such as fisher's exact or probably logistic regression as well can push the p-values below significance threshold. For example, fisher.test(matrix(c(1,9,2,8),ncol=2)) is not significant but fisher.test(matrix(c(100,900,200,800),ncol=2)) is
both tests represent 10% difference in methylation values.

Best,
Altuna

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/methylkit_discussion/21ac0d87-c276-495e-ab9d-77a20f84ce86n%40googlegroups.com.

Berkay Günay

unread,
Apr 21, 2021, 1:38:03 PM4/21/21
to methylkit_discussion
Dear Altuna,

Thank you so much for your reply.

Best regards,
Berkay

Margot Berger

unread,
May 28, 2024, 11:19:34 AM5/28/24
to methylkit_discussion
Hi ! 

I just saw this issue, and I have kind of a similar problem ... 

I ran the script below, and the results I get are really bizarre, as it identify only 10 DMCs, but 2 493 DMRs ... I get this type of results only for CHH context

Basically, I am running the DMCs and DMRs analysis on two groups of samples, after pooling biological replicates. DMRs are calculated  with tilling window of 100bp with a step of 50bp. 

I extracted the lines from the code I'm using: 


myobj=methRead(file.list,sample.id=list("AS_R37C11","AS_R38C20","AS_R39C20","FES_R18C18","FES_R37C8","FES_R39C19","FV_R18C18","FV_R37C8","FV_R39C19"),pipeline="bismarkCytosineReport",assembly="PN40X",
                treatment=c(1,1,1,3,3,3,2,2,2),context="CHH",mincov=5)

myobj.norm <- normalizeCoverage(myobj,method="median")

meth <- unite(myobj.norm,destrand=FALSE)
meth.filt <- meth

FES_FV_ids <- c("FES_R18C18","FES_R37C8","FES_R39C19","FV_R18C18","FV_R37C8","FV_R39C19")
meth.filt.FES_FV <- reorganize(meth.filt,sample.ids=FES_FV_ids,treatment=c(3,3,3,2,2,2))
meth.filt.FES_FV_pool <- pool(meth.filt.FES_FV,sample.ids=c("FES","FV"))

PoolmyDiff_FES_FV <-calculateDiffMeth(meth.filt.FES_FV_pool,overdispersion="MN", adjust="BH")

PoolmyDiff10p_FES_FV_DMC <- getMethylDiff(PoolmyDiff_FES_FV,difference=10,qvalue=0.05,type="all")
PoolmyDiff10p_FES_FV_DMC <- PoolmyDiff10p_FES_FV_DMC[order(PoolmyDiff10p_FES_FV_DMC$qvalue),]

#DMRs

CHH_tiles_w100s50cov5 =tileMethylCounts(myobj,win.size=100, step.size=50,cov.bases=5)
head(CHH_tiles_w100s50cov5[[1]])

myobj_tilesCHH.norm <- normalizeCoverage(CHH_tiles_w100s50cov5,method="median")

meth_tiles <- unite(myobj_tilesCHH.norm)

meth_tiles.FES_FV <- reorganize(meth_tiles,sample.ids=FES_FV_ids,treatment=c(3,3,3,2,2,2))
meth_tiles.FES_FV_pool <- pool(meth_tiles.FES_FV,sample.ids=c("FES","FV"))
PoolmyDiff_tiles_FES_FV <- calculateDiffMeth(meth_tiles.FES_FV_pool,overdispersion="MN",adjust="BH")

pool_FES_FV_all_diff_CHH_DMR <- getMethylDiff(PoolmyDiff_tiles_FES_FV, difference = 10, qvalue = 0.05, type = "all")
pool_FES_FV_all_diff_CHH_DMR <- pool_FES_FV_all_diff_CHH_DMR[order(pool_FES_FV_all_diff_CHH_DMR$qvalue),]


The results I obtained for CG and CHG with pooled samples seemed accurate so far with (globally, at least more DMCs than DMRs identified, sometimes the difference between the number of DMCs and DMRs identified is not that important, but with a window of 100bp and a step of 50, I think it can make sense ?). 

However, I don't understand how thousands of DMRs can be identified when only 10 DMCs are considered as differentially methylated in a whole genome ... (I am working on grapevine, to it's ~580Mbp)... 

This is the first time I use this tool, and I am wondering where am I doing something wrong ? 

Also, I was wondering what are the criteria applied to identify a window as a DMR ? How many cytosines need to be differentially methylated ? Is there a threshold of differential methylation per cytosine to be considered prior calculating the difference within the entire window between two conditions ? etc ... I read the litterature related to methylkit to try to understand the results I got, but I couldn't put my hand on this information...

Best regards, 

Margot 

Alexander Blume

unread,
Jun 17, 2024, 11:02:28 AM6/17/24
to methylkit_discussion
Hi Margot, 

Altuna's previous answer about the inflation of p-values still holds as the general reason for this big difference in the number of differential sites.

However,  I will still try to answer your other questions:
> I was wondering what are the criteria applied to identify a window as a DMR ?
The same criteria as for the untiled approach, the methylation difference and the q-value. 

> How many cytosines need to be differentially methylated ?
This is not evaluated. After tiling, single CpGs are not considered, their read counts (numCs, numTs) are summed up over the region. 

> Is there a threshold of differential methylation per cytosine to be considered prior calculating the difference within the entire window between two conditions ? 
No, this is currently not supported. You may want to look at https://bioconductor.org/packages/release/bioc/vignettes/dmrseq/inst/doc/dmrseq.html or https://github.com/ben-laufer/DMRichR/ for this kind of analysis. 


Best,
Alex

Reply all
Reply to author
Forward
0 new messages