I ran the script below, and the results I get are really bizarre, as it identify only 10 DMCs, but 2 493 DMRs ... I get this type of results only for CHH context
Basically, I am running the DMCs and DMRs analysis on two groups of samples, after pooling biological replicates. DMRs are calculated with tilling window of 100bp with a step of 50bp.
myobj=methRead(file.list,
sample.id=list("AS_R37C11","AS_R38C20","AS_R39C20","FES_R18C18","FES_R37C8","FES_R39C19","FV_R18C18","FV_R37C8","FV_R39C19"),pipeline="bismarkCytosineReport",assembly="PN40X",
treatment=c(1,1,1,3,3,3,2,2,2),context="CHH",mincov=5)
myobj.norm <- normalizeCoverage(myobj,method="median")
meth <- unite(myobj.norm,destrand=FALSE)
meth.filt <- meth
FES_FV_ids <- c("FES_R18C18","FES_R37C8","FES_R39C19","FV_R18C18","FV_R37C8","FV_R39C19")
meth.filt.FES_FV <- reorganize(meth.filt,sample.ids=FES_FV_ids,treatment=c(3,3,3,2,2,2))
meth.filt.FES_FV_pool <- pool(meth.filt.FES_FV,sample.ids=c("FES","FV"))
PoolmyDiff_FES_FV <-calculateDiffMeth(meth.filt.FES_FV_pool,overdispersion="MN", adjust="BH")
PoolmyDiff10p_FES_FV_DMC <- getMethylDiff(PoolmyDiff_FES_FV,difference=10,qvalue=0.05,type="all")
PoolmyDiff10p_FES_FV_DMC <- PoolmyDiff10p_FES_FV_DMC[order(PoolmyDiff10p_FES_FV_DMC$qvalue),]
#DMRs
CHH_tiles_w100s50cov5 =tileMethylCounts(myobj,win.size=100, step.size=50,cov.bases=5)
head(CHH_tiles_w100s50cov5[[1]])
myobj_tilesCHH.norm <- normalizeCoverage(CHH_tiles_w100s50cov5,method="median")
meth_tiles <- unite(myobj_tilesCHH.norm)
meth_tiles.FES_FV <- reorganize(meth_tiles,sample.ids=FES_FV_ids,treatment=c(3,3,3,2,2,2))
meth_tiles.FES_FV_pool <- pool(meth_tiles.FES_FV,sample.ids=c("FES","FV"))
PoolmyDiff_tiles_FES_FV <- calculateDiffMeth(meth_tiles.FES_FV_pool,overdispersion="MN",adjust="BH")
pool_FES_FV_all_diff_CHH_DMR <- getMethylDiff(PoolmyDiff_tiles_FES_FV, difference = 10, qvalue = 0.05, type = "all")
pool_FES_FV_all_diff_CHH_DMR <- pool_FES_FV_all_diff_CHH_DMR[order(pool_FES_FV_all_diff_CHH_DMR$qvalue),]
The results I obtained for CG and CHG with pooled samples seemed accurate so far with (globally, at least more DMCs than DMRs identified, sometimes the difference between the number of DMCs and DMRs identified is not that important, but with a window of 100bp and a step of 50, I think it can make sense ?).
However, I don't understand how thousands of DMRs can be identified when only 10 DMCs are considered as differentially methylated in a whole genome ... (I am working on grapevine, to it's ~580Mbp)...
This is the first time I use this tool, and I am wondering where am I doing something wrong ?
Also, I was wondering what are the criteria applied to identify a window as a DMR ? How many cytosines need to be differentially methylated ? Is there a threshold of differential methylation per cytosine to be considered prior calculating the difference within the entire window between two conditions ? etc ... I read the litterature related to methylkit to try to understand the results I got, but I couldn't put my hand on this information...