DMRcate for bisulphite sequencing data

1,875 views
Skip to first unread message

Tim Peters

unread,
Dec 15, 2015, 7:20:36 PM12/15/15
to Epigenomics forum
Dear all,

A significantly updated version of DMRcate is now available (http://bioconductor.org/packages/3.2/bioc/html/DMRcate.html). The major change is that it now includes functionality for DMR calling from sequencing data (WGBS, RRBS etc.), provided with total reads and methylated reads for each CpG site. 450K functionality is still available and EPIC array functionality should be coming soon, depending on how regularly Illumina is updating the manifest file.

A worked example for sequencing data can be found in the vignette. Because of the greater number of CpG sites the ideal kernel size should me a fair amount smaller (i.e. argument C is larger when calling dmrcate()) than that for 450K (suggested size is in the vignette). 

A summary of the major included changes is as follows:

- DMR finding from WGBS and RRBS data, using DSS (instead of limma) as a backend for the regression step
- DMR.plot() has been completely rewritten using the Gviz platform (see image below). For the DMR passed via the argument
  • A locus showing the DMR plus 5kb up and downstream in each direction (more if there is another DMR within 5kb) for context
  • Overlapping transcript loci for a choice of hg19, hg38 or mm10
  • Heatmap showing individual methylation for given samples
  • Line plot tracking mean methylation per phenotype
  • Chromosome ideogram
- DMRs are now ranked by their Stouffer values from their individual constituent FDRs (the kernel smoothing and thresholding part of the method hasn't changed). Minpval and meanpval have been removed as a result.

Any feedback is welcome either here or at <t.peters at garvan.org.au>.

Best wishes,
Tim



Yalbi Itzel Balderas Martinez

unread,
Feb 1, 2016, 10:18:22 PM2/1/16
to Epigenomics forum
Dear Tim,

I have found your package very interesting and also your papers, however, I have found some errors in my script, and I am not sure what is happening, I really hope you could help me. 

In particular, in the function 

results.ranges <- extractRanges(dmrcoutput, genome = "hg19")

It appears the error: Error in strsplit(xx, ":") : non-character argument

I tried to solved this, creating the object result.ranges using GRanges, 

This is the object results.ranges:

GRanges object with 1194 ranges and 6 metadata columns:
      seqnames                 ranges strand   |  gene_assoc                             group no.probes      minpval     meanpval
         <Rle>              <IRanges>  <Rle>   | <character>                       <character> <integer>    <numeric>    <numeric>
  136    chr10 [ 64344052,  64344052]      *   |      ZNF365                        5'UTR,Body         1 2.074795e-25 2.074795e-25
  296    chr12 [124246391, 124247223]      *   |      DNAH10 TSS1500,TSS200,1stExon,5'UTR,Body         8 1.399450e-22 8.077516e-09
  654     chr2 [ 74210857,  74212049]      *   |                                                       6 1.798993e-19 1.325255e-07
  944     chr6 [ 28890518,  28892079]      *   |      TRIM27 Body,1stExon,5'UTR,TSS200,TSS1500        51 4.541014e-17 7.079132e-04
  914     chr5 [149492480, 149493258]      *   |       CSF1R              5'UTR,TSS200,TSS1500         7 3.748344e-16 1.193855e-03
  ...      ...                    ...    ... ...         ...                               ...       ...          ...          ...
  207    chr11 [  8289620,   8289620]      *   |                                                       1   0.04913373   0.04913373
  992     chr6 [ 79620031,  79620031]      *   |                                                       1   0.04920299   0.04920299
  702    chr20 [  1374350,   1374350]      *   |      FKBP1A                           TSS1500         1   0.04952731   0.04952731
   58     chr1 [ 67798929,  67798929]      *   |     IL12RB2                              Body         1   0.04975760   0.04975760
  174    chr10 [131665838, 131665838]      *   |        EBF3                              Body         1   0.04985599   0.04985599
        maxbetafc
        <numeric>
  136  0.30608766
  296  0.13140995
  654  0.18093186
  944 -0.06241058
  914  0.06947876
  ...         ...
  207  0.04545169
  992  0.05171461
  702  0.07533435
   58 -0.09641478
  174 -0.01000234
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths 

But then, when I use the function DMR.plot, it does not work again,

 groups <- c(Normal="magenta", Disease="forestgreen")
 DMR.plot(ranges=results.ranges, dmr=1, CpGs=myBetas, 
             phen.col=c("Normal", "Normal", "Disease", "Disease"), genome="hg19",
             samps=samps)

and it appears this error:
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘Rle’ for signature ‘"GRanges", "missing"’

Could you help me in finding what is happening? I am so sorry if this is a basic question....
Thanks,

This is my session info

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
 [1] grid      splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RnBeads_1.2.1                                      methylumi_2.16.0                                  
 [3] FDb.InfiniumMethylation.hg19_2.2.0                 org.Hs.eg.db_3.2.3                                
 [5] RSQLite_1.0.0                                      DBI_0.3.1                                         
 [7] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2            GenomicFeatures_1.22.8                            
 [9] AnnotationDbi_1.32.3                               reshape2_1.4.1                                    
[11] scales_0.3.0                                       illuminaio_0.12.0                                 
[13] matrixStats_0.50.1                                 limma_3.26.5                                      
[15] gridExtra_2.0.0                                    gplots_2.17.0                                     
[17] ggplot2_2.0.0                                      fields_8.3-6                                      
[19] maps_3.0.2                                         spam_1.3-0                                        
[21] ff_2.2-13                                          bit_1.1-12                                        
[23] cluster_2.0.3                                      RColorBrewer_1.1-2                                
[25] MASS_7.3-45                                        IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1
[27] plyr_1.8.3                                         DMRcate_1.6.4                                     
[29] DMRcatedata_1.6.1                                  DSS_2.10.0                                        
[31] bsseq_1.6.0                                        minfi_1.16.0                                      
[33] bumphunter_1.10.0                                  locfit_1.5-9.1                                    
[35] iterators_1.0.8                                    foreach_1.4.3                                     
[37] Biostrings_2.38.3                                  XVector_0.10.0                                    
[39] SummarizedExperiment_1.0.2                         GenomicRanges_1.22.3                              
[41] GenomeInfoDb_1.6.3                                 IRanges_2.4.6                                     
[43] S4Vectors_0.8.7                                    lattice_0.20-33                                   
[45] GEOquery_2.36.0                                    Biobase_2.30.0                                    
[47] BiocGenerics_0.16.1                                biomaRt_2.26.1                                    

loaded via a namespace (and not attached):
 [1] nlme_3.1-124             bitops_1.0-6             tools_3.2.2              doRNG_1.6                nor1mix_1.2-1           
 [6] KernSmooth_2.23-15       rpart_4.1-10             Hmisc_3.17-1             Gviz_1.14.2              colorspace_1.2-6        
[11] nnet_7.3-11              base64_1.1               preprocessCore_1.32.0    chron_2.3-47             pkgmaker_0.22           
[16] rtracklayer_1.30.1       caTools_1.17.1           genefilter_1.52.0        quadprog_1.5-5           stringr_1.0.0           
[21] digest_0.6.9             Rsamtools_1.22.0         foreign_0.8-66           siggenes_1.44.0          R.utils_2.2.0           
[26] dichromat_2.0-0          BSgenome_1.38.0          mclust_5.1               BiocParallel_1.4.3       gtools_3.5.0            
[31] acepack_1.3-3.3          R.oo_1.19.0              VariantAnnotation_1.16.4 RCurl_1.95-4.7           magrittr_1.5            
[36] Formula_1.2-1            futile.logger_1.4.1      Rcpp_0.12.3              munsell_0.4.2            R.methodsS3_1.7.0       
[41] stringi_1.0-1            zlibbioc_1.16.0          gdata_2.17.0             multtest_2.26.0          annotate_1.48.0         
[46] beanplot_1.2             igraph_1.0.1             rngtools_1.2.4           corpcor_1.6.8            codetools_0.2-14        
[51] mixOmics_5.2.0           futile.options_1.0.0     XML_3.98-1.3             biovizBase_1.18.0        latticeExtra_0.6-26     
[56] lambda.r_1.1.7           data.table_1.9.6         gtable_0.1.2             reshape_0.8.5            xtable_1.8-0            
[61] survival_2.38-3          GenomicAlignments_1.6.3  registry_0.3             ellipse_0.3-8            rgl_0.95.1441     


Tim Peters

unread,
Feb 9, 2016, 1:47:10 AM2/9/16
to Epigenomics forum
Dear Yalbi,

I have ironed out a couple of bugs in the last couple of weeks involving extractRanges() and DMR.plot() - could you update DMRcate to version 1.6.51 (plus all dependencies and imports) and try again please?
Thanks,
Tim

Yalbi Itzel Balderas Martinez

unread,
Mar 8, 2016, 10:13:40 PM3/8/16
to Epigenomics forum

Dear Tim, 

I have updated all the packages and libraries, but it is not working, I have this message error when I use result.ranges

"Error in extractRanges(DMRs, genome = "hg19") : 
  object 'tx.hg19' not found"

I am not sure if this is an object from another library??

Regards,

Tim Triche, Jr.

unread,
Mar 9, 2016, 4:24:33 PM3/9/16
to Epigenomics forum
post a traceback (or try debug(extractRanges) beforehand).  I wrote that particular function, it's probably some cruft I left in there.

ps.  If anyone wants to help me add a Savitsky-Golay filter to choose the best cutoff for turning a -log10(p) "pileup" of DMRs into a BED file, please let me know.  There are a number of things I'd like to merge back into the main trunk, so if I'm going to fix one I might as well fix them all.

best,

--t

Tim Peters

unread,
Mar 13, 2016, 5:24:51 AM3/13/16
to Epigenomics forum
Hi Yalbi,

You'll need to load the data package into the workspace via

data(dmrcatedata)

and then try again. This contains the annotation objects tx.hg19, tx.hg38 and tx.mm10.
Also, I discovered a bug on line 41 in DMR.plot() this week that would have thrown an error if there was a "C" in any of the column names of the beta/counts object - I hope it wasn't what was messing up your calls. 1.6.52 is now available to download from Bioconductor with the corrected version of DMR.plot().

Good luck,
Tim
Message has been deleted

Tim Peters

unread,
Apr 12, 2016, 12:52:13 AM4/12/16
to Epigenomics forum
Hi again folks,

Thanks for all your feedback on the new version. A recurring problem with extractRanges() and DMR.plot() that was the result of me not declaring the correct environments for the test data should now be fixed in version 1.6.53. Thanks to Yalbi and others for pointing all of this out. Please keep sending me bug reports and questions - it helps improve the package.

Best,
Tim 

Tim Peters

unread,
May 25, 2016, 9:55:16 PM5/25/16
to Epigenomics forum
Hi all,

The latest version of DMRcate (1.8.5) can now call DMRs from Infinium MethylationEPIC Kit samples. Thanks to Kasper Hansen for making the annotation package available on Bioconductor ( http://bioconductor.org/packages/3.3/data/annotation/html/IlluminaHumanMethylationEPICanno.ilm10b2.hg19.html). 

The workflow is identical to that for 450K arrays, except that annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") should be passed to cpg.annotate() and array.annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") to DMR.plot(). Initial testing looks like optimal kernel size is very similar to 450K, so defaults have been left as is.

Happy DMR finding,
Tim


vk

unread,
Jul 19, 2016, 12:28:50 PM7/19/16
to Epigenomics forum
Hi Tim,

I am working on EPIC data to find differentially methylated regions. I am using DMRcate. I am getting an error for cpg.annotate 

myAnnotation <- cpg.annotate(mVals, datatype = "array",analysis.type="differential", design=design,contrasts = TRUE, cont.matrix = contMatrix,coef="MCFseven - fourAfive")

Your contrast returned 496947 individually significant probes. We recommend the default setting of pcutoff in dmrcate().
Error in data.frame(ID = rownames(object), stat = stat, CHR = RSanno$chr,  :
  arguments imply differing number of rows: 812896, 425236

I saw your previous reply for this error. But Yes, I had removed SNP probes too. Still I get this error. Please have look to my code and functions I used.

targets$Sample_Group
 [1] "A_NT"             "A_BPA"            "A_BaP"            "A_BPABaP"
 [5] "AT1_NT"           "AT1_BPA"          "AT1_BaP"          "AT1_BPABaP"
 [9] "CA_NT"            "A_NT"             "A_BPA"            "A_BaP"
[13] "A_BPABaP"         "AT1_NT"           "AT1_BPA"          "AT1_BaP"
[17] "AT1_BPABaP"       "CA_NT"            "A_NT"             "A_BPA"
[21] "A_BaP"            "A_BPABaP"         "AT1_NT"           "AT1_BPA"
[25] "AT1_BaP"          "AT1_BPABaP"       "CA_NT"            "fourAfive"
[29] "twoFone_wt"       "oneGthree_DEfour" "twoCtwo_DEfour"   "MCFseven"
[33] "fourAfive"        "twoFone_wt"       "oneGthree_DEfour" "twoCtwo_DEfour"
[37] "MCFseven"

mSetRaw <- preprocessRaw(rgSet)
mSetSq <- preprocessFunnorm(rgSet)
[preprocessFunnorm] Background and dye bias correction with noob
[preprocessNoob] Using sample number 24 as reference level...
[preprocessFunnorm] Mapping to genome
[preprocessFunnorm] Quantile extraction
[preprocessFunnorm] Normalization
Warning message:
In .getSex(CN = CN, xIndex = xIndex, yIndex = yIndex, cutoff = cutoff) :
  An inconsistency was encountered while determining sex. One possibility is that only one sex is present. We recommend further checks, for example with the plotSex function.

detP <- detP[match(featureNames(mSetSq),rownames(detP)),]
keep <- rowSums(detP < 0.01) == ncol(mSetSq)
table(keep)
keep
 FALSE   TRUE
  6018 860818

mSetSqFlt <- mSetSq[keep,]

mSetSqFlt
class: GenomicRatioSet
dim: 860818 37
metadata(0):
assays(2): Beta CN
rownames(860818): cg14817997 cg26928153 ... cg07587934 cg16855331
rowData names(0):
colnames(37): A_NT.1 A_BPA.2 ... twoCtwo_DEfour.36 MCFseven.37
colData names(10): Sample_Name Sample_Well ... filenames predictedSex
Annotation
  array: IlluminaHumanMethylationEPIC
  annotation: ilm10b2.hg19
Preprocessing
  Method: Raw (no normalization or bg correction)
  minfi version: 1.18.2
  Manifest version: 0.3.0

keep <- !(featureNames(mSetSqFlt) %in% annEPIC$Name[annEPIC$chr %in% c("chrX","chrY")])
table(keep)
keep
 FALSE   TRUE
 19098 841720
 
mSetSqFlt <- mSetSqFlt[keep,]

mSetSqFlt <- dropLociWithSnps(mSetSqFlt)

mSetSqFlt
class: GenomicRatioSet
dim: 812896 37
metadata(0):
assays(2): Beta CN
rownames(812896): cg14817997 cg26928153 ... cg07660283 cg09226288
rowData names(0):
colnames(37): A_NT.1 A_BPA.2 ... twoCtwo_DEfour.36 MCFseven.37
colData names(10): Sample_Name Sample_Well ... filenames predictedSex
Annotation
  array: IlluminaHumanMethylationEPIC
  annotation: ilm10b2.hg19
Preprocessing
  Method: Raw (no normalization or bg correction)
  minfi version: 1.18.2
  Manifest version: 0.3.0
  

mVals <- getM(mSetSqFlt)
head(mVals[,1:5])
               A_NT.1    A_BPA.2    A_BaP.3 A_BPABaP.4    AT1_NT.5
cg14817997  0.5209928  0.4748454  0.4805088  0.4106944 -0.09737575
cg26928153  3.8652288  3.2916606  4.0673276  3.7638197  4.09015606
cg16269199  1.9067538  1.9842475  2.2688374  1.8933198  2.40681559
cg13869341  1.8235001  2.0258282  1.9080779  1.9390136  2.32162979
cg14008030  2.3367597  2.3639274  2.0101783  2.0230830  1.98342362
cg12045430 -1.6365938 -2.0614246 -2.1343550 -2.0402295 -1.80513521

bVals <- getBeta(mSetSqFlt)
head(bVals[,1:5])
              A_NT.1   A_BPA.2   A_BaP.3 A_BPABaP.4  AT1_NT.5
cg14817997 0.5893127 0.5815496 0.5825045  0.5706912 0.4831325
cg26928153 0.9357864 0.9073421 0.9437076  0.9314310 0.9445423
cg16269199 0.7894579 0.7982473 0.8281594  0.7879060 0.8413460
cg13869341 0.7797060 0.8028491 0.7896104  0.7931505 0.8333046
cg14008030 0.8347563 0.8373375 0.8011264  0.8025477 0.7981553
cg12045430 0.2433499 0.1932747 0.1855147  0.1955757 0.2224883

cellType <- factor(targets$Sample_Group)
design <- model.matrix(~0+cellType, data=targets)
colnames(design) <- c(levels(cellType))
fit <- lmFit(mVals, design)
contMatrix <- makeContrasts(MCFseven-fourAfive,twoFone_wt-fourAfive,oneGthree_DEfour-fourAfive,twoCtwo_DEfour-fourAfive, levels=design)

fit2 <- contrasts.fit(fit, contMatrix)
fit2 <- eBayes(fit2)
summary(decideTests(fit2))
   MCFseven - fourAfive twoFone_wt - fourAfive oneGthree_DEfour - fourAfive
-1               177463                  70477                        78167
0                315949                 489922                       494446
1                319484                 252497                       240283
   twoCtwo_DEfour - fourAfive
-1                      82238
0                      488685
1                      241973

myAnnotation <- cpg.annotate(mVals, datatype = "array",analysis.type="differential", design=design,contrasts = TRUE, cont.matrix = contMatrix,coef="MCFseven - fourAfive")
Your contrast returned 496947 individually significant probes. We recommend the default setting of pcutoff in dmrcate().
Error in data.frame(ID = rownames(object), stat = stat, CHR = RSanno$chr,  :
  arguments imply differing number of rows: 812896, 425236

Looking forward to your response. Thank you in advance.
Message has been deleted

Tim Peters

unread,
Jul 20, 2016, 5:14:27 AM7/20/16
to Epigenomics forum
Hi vk,

First make sure you're using the most recent version of DMRcate (currently 1.8.6).

Then, as per the manual, you'll need to add annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") to your call to cpg.annotate() for EPIC data. It should work from there.

Best,
Tim

vk

unread,
Jul 20, 2016, 5:33:40 AM7/20/16
to Epigenomics forum
Hi Tim,

Thanks for the reply. Yes I am using recent version of DMRcate (currently 1.8.6).

sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
[1] C

attached base packages:
 [1] splines   grid      stats4    parallel  stats     graphics  grDevices
 [8] utils     datasets  methods   base

other attached packages:
 [1] stringr_1.0.0
 [2] DMRcate_1.8.6
 [3] DMRcatedata_1.8.2
 [4] DSS_2.12.0
 [5] bsseq_1.8.2
 [6] Gviz_1.16.1
 [7] minfiData_0.14.0
 [8] IlluminaHumanMethylationEPICmanifest_0.3.0
 [9] IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.3.0
[10] matrixStats_0.50.2
[11] missMethyl_1.6.2
[12] RColorBrewer_1.1-2
[13] minfi_1.18.2
[14] bumphunter_1.12.0

I added annotation before itself.

annEPIC = getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b2.hg19)

This is how contMatrix looks.

contMatrix
                  Contrasts
Levels             MCFseven - fourAfive twoFone_wt - fourAfive
  A_BaP                               0                      0
  A_BPA                               0                      0
  A_BPABaP                            0                      0
  A_NT                                0                      0
  AT1_BaP                             0                      0
  AT1_BPA                             0                      0
  AT1_BPABaP                          0                      0
  AT1_NT                              0                      0
  CA_NT                               0                      0
  fourAfive                          -1                     -1
  MCFseven                            1                      0
  oneGthree_DEfour                    0                      0
  twoCtwo_DEfour                      0                      0
  twoFone_wt                          0                      1
                  Contrasts
Levels             oneGthree_DEfour - fourAfive twoCtwo_DEfour - fourAfive
  A_BaP                                       0                          0
  A_BPA                                       0                          0
  A_BPABaP                                    0                          0
  A_NT                                        0                          0
  AT1_BaP                                     0                          0
  AT1_BPA                                     0                          0
  AT1_BPABaP                                  0                          0
  AT1_NT                                      0                          0
  CA_NT                                       0                          0
  fourAfive                                  -1                         -1
  MCFseven                                    0                          0
  oneGthree_DEfour                            1                          0
  twoCtwo_DEfour                              0                          1
  twoFone_wt                                  0                          0

Kristian Almstrup

unread,
Jul 20, 2016, 8:26:33 AM7/20/16
to Epigenomics forum
Dear Tim

Is there any way to control the heatmap and average tracks individually with the DMR.plot function? Compared to the earlier version of the DMR.plot with only mean plots, the heatmap now overtakes much of the space when many samples are included. I would simply like to do a plot without the heatmaps or minimise them. I guess one solution will be to create my own plotTracks with Gviz and if so, you maybe could direct me how to do that.

Thanks!

Kind regards,
Kristian

Kasper Daniel Hansen

unread,
Jul 20, 2016, 8:54:46 AM7/20/16
to epigenom...@googlegroups.com
Tim, why is this needed if you use new version of minfi?

Best,
Kasper
(Sent from my phone.)

On Jul 20, 2016, at 05:12, Tim Peters <stormy...@gmail.com> wrote:

Hi Par,

First make sure you're using the most recent version of DMRcate (currently 1.8.6).

Then, as per the manual, you'll need to add annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") to your call to cpg.annotate() for EPIC data. It should work from there.

Best,
Tim

 On Wednesday, July 20, 2016 at 2:28:50 AM UTC+10, vk wrote:

--
You received this message because you are subscribed to the Google Groups "Epigenomics forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to epigenomicsfor...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tim Peters

unread,
Jul 20, 2016, 10:57:36 PM7/20/16
to Epigenomics forum
Hi vk,

The annotation argument still needs to be explicitly passed to cpg.annotate(), since it takes a "naked' data matrix of M-values instead of a GenomicRatioSet. So the annotation information isn't retained.

Kasper has just provided a wider context for this - the original version of DMRcate was created with an older version of minfi (i.e. before it had GenomicRatioSets) and the argument format has persisted, so I can see why there might be some confusion. Perhaps in the future I will encode an option that takes GenomicRatioSets as the first argument instead of a matrix.

Cheers,
Tim

Tim Peters

unread,
Jul 20, 2016, 11:10:30 PM7/20/16
to Epigenomics forum
Hi Kristian,

I'm afraid not at the moment. Please feel free to create a custom plotTracks, like you suggest. It shouldn't be too hard to unpack the code and remove the heatmap track. 

Cheers,
Tim

Kasper Daniel Hansen

unread,
Jul 21, 2016, 12:01:30 AM7/21/16
to epigenom...@googlegroups.com
Adding that functionality should be easy. Do an if statement at the beginning of the function detecting whether the input object is a matrix or a GenomicRatioSet. 


Best,
Kasper
(Sent from my phone.)

vk

unread,
Jul 21, 2016, 7:24:34 AM7/21/16
to Epigenomics forum
Thank you very much Tim. It worked. But for DMR.plot I have an error now.

myAnnotation <- cpg.annotate(mVals, datatype = "array",analysis.type="differential", annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19"), design=design,contrasts = TRUE, cont.matrix = contMatrix,coef="A_NT - AT1_NT - CA_NT")
Your contrast returned 755801 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

DMRs <- dmrcate(myAnnotation, lambda=1000, C=2)

head(DMRs$results)
                        coord no.cpgs minfdr Stouffer maxbetafc meanbetafc
236520 chr6:33156164-33181870     268      0        0 -1.836410 -0.5585352
236364 chr6:32143681-32173532     218      0        0 -1.350725 -0.4145994
236278 chr6:31618987-31640160     216      0        0 -1.245563 -0.3788226
236348 chr6:32034322-32059605     208      0        0 -1.457552 -0.5790459
236535 chr6:33279563-33292029     206      0        0 -1.553673 -0.4180961
251989 chr7:27178861-27211741     206      0        0 -1.410725 -0.5099268

#converted the regions to annotated genomic ranges
results.ranges <- extractRanges(DMRs, genome = "hg19")

groups <- pal[1:length(unique(targets$Sample_Group))]
names(groups) <- levels(factor(targets$Sample_Group))
cols <- groups[as.character(factor(targets$Sample_Group))]
samps <- 1:nrow(targets)

par(mfrow=c(1,1))

DMR.plot(ranges=results.ranges, dmr=1, CpGs=bVals, phen.col=cols, pch=16, toscale=TRUE, plotmedians=TRUE, genome="hg19", samps=samps)
Error: Expectation Failed

Not sure what's going on here.

Tim Peters

unread,
Jul 23, 2016, 5:18:50 AM7/23/16
to Epigenomics forum
Hi vk,

Not sure why you've included toscale, pch and plotmedians arguments - these are deprecated from a previous version and are likely interfering with the call. Get rid of them and try again.

Cheers,
Tim 

Olivia Solomon

unread,
Aug 5, 2016, 1:31:39 PM8/5/16
to Epigenomics forum
Hi Tim, 

I've been using DMRcate for a few different analyses and was wondering about the pcutoff() setting in dmrcate().  In analyses that had individually significant probes from the limma step, the default pcutoff="fdr" leads to a cutoffs on orders of 10^-14-10^-12.  
Do you have any suggestions for what would be considered the maximum appropriate manual cutoff to set in dmrcate() before I should be concerned with type 1 errors in cases where there are no individual sites with limma-fdr <0.05? 

Thank you, 
Olivia 
Message has been deleted

Tim Peters

unread,
Aug 7, 2016, 9:51:46 PM8/7/16
to Epigenomics forum

Hi Olivia,


I don't think specifying a canonical cutoff from those p-values is particularly useful. The low values you are seeing are from the post-smoothing χtest, which is very powerful and often gives very low p-values, hence the caveat you see when running cpg.annotate(). Unfortunately tests with varying levels of power (another example is the hypergeometric test when doing functional enrichment over a number of terms with varying numbers of genes) are prone to this bias. I'd say the p-values here are only useful in a relative sense i.e. comparing the DM signal from different loci from the same fit.  DMRs are instead benchmarked against the number of significant limma probes from the given user-defined cutoff fdr in cpg.annotate(), which IMO is much more robust. To vary the cutoff, I would relax the fdr argument in cpg.annotate() to 0.1, 0.25 etc. to see if you get any DM probes at that "significance" level. dmrcate() will then do the rest with the aforementioned benchmarking with the current default. For an overall significance metric per DMR, use the Stouffer transformation (derived from limma FDRs, not the χtest).


Hope this helps,


Tim

Olivia Solomon

unread,
Aug 8, 2016, 3:09:04 PM8/8/16
to Epigenomics forum
Hi Tim, 

Thank you.  That is very helpful.  

Regarding interpreting the Stouffer estimate: assuming we wanted to consider DMRs with an adjusted p-value <=0.05, would the Stouffer estimate then be used for that cutoff?  Some DMRs that passed through both cpg.annotate fdr=0.05 and dmrcate pcutoff="fdr" still have fairly high Stouffer values (as high as 0.5), so I'm not sure what to make of this.  

Thank you, 
Olivia

Tim Peters

unread,
Aug 9, 2016, 12:01:04 AM8/9/16
to Epigenomics forum
Hi Olivia,

It's up to you. Higher Stouffer values suggest there is some heterogeneity of signal within the DMR. If you're concerned about this you can always make lambda smaller to split them up into more discrete DMRs with (hopefully) more uniform DM signal across them.

Cheers,
Tim

Rachel Jepson

unread,
Aug 18, 2016, 6:35:14 AM8/18/16
to Epigenomics forum
Hi Tim

I have noticed in the current version of DMRcate that following cpg.annotate there is no longer any associated gene/group annotation ($gene/$Group). I just wondered why this feature had been dropped and whether there are any suggestions for adding the information to the DMRcate hits?

Thanks 
Rachel

Tim Peters

unread,
Aug 24, 2016, 8:40:39 AM8/24/16
to Epigenomics forum
Hi Rachel,
I dropped these features since the Illumina annotation (with regards to gene loci) was incomplete and contentious in a few cases. Instead I incorporated complete Gencode transcript loci into DMRcatedata for hg19 (and hg38 and mm10 if you're working with WGBS). You can annotate your output from dmrcate() with overlapping Gencode transcript promoter regions using extractRanges() with default settings.

Cheers,
Tim

Aarti Jajoo

unread,
Sep 27, 2016, 5:05:17 PM9/27/16
to Epigenomics forum
Thank you very much for putting this version up.

I have a question regrading unpaired data, in your previous post (for older version ) you have mentioned that one could just use
design =  model.matrix(~0+type)

But when I use this for cpg.annotate(..) it gives me an error
Error: colnames(design)[1] == "(Intercept)" is not TRUE

I tried my best to look into your old posts to find an answer, sorry if you have already addressed this issue before.

Thanks,
Aarti

Tim Peters

unread,
Sep 27, 2016, 10:48:25 PM9/27/16
to Epigenomics forum
Dear Aarti,
Design matrices such as the one you specified will need a contrast matrix and contrasts=T given as arguments to cpg.annotate, otherwise you will run into the intercept error you specified.
If type is binary then you can just specify design =  model.matrix(~type) and coef=2. But if there are 3 or more levels you will want to define contrasts. Have a look at the limma manual, in particular the worked example on page 36. https://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf

Best,
Tim

Aarti Jajoo

unread,
Sep 28, 2016, 10:07:25 AM9/28/16
to Epigenomics forum
Thank you very much for the prompt reply Tim.

type is binary in my case. I had tried  design as suggested by you, but I was getting the following remark "Your contrast returned no individually significant probes.". Hence I was not sure if  design was correct. But thank you for verifying this aspect, and as suggested I will look into limma manual as well to understand things better.

Regards,
Aarti

Priyanka

unread,
Mar 24, 2017, 5:51:32 AM3/24/17
to Epigenomics forum
Dear Tim.

I'm working on 450K data of paired samples. 
I have earlier performed DMCpG analysis using paired design model from limma using the following script

## Load M-value file 
Paired_M_val <- read.csv(".../Paired_M_val.csv", row.names=1, check.names = FALSE)

## Load targets file for analysis
Targets_Paired <- read.csv("..../Paired_targets.csv", row.names=1, check.names = FALSE)
Treat_p_outcome <- factor(Targets_Paired$Sample_Outcome_Details)
Treat_p_Batch <- Targets_Paired$Sentrix_ID_coded

## Model: Adjusted for batch effects
design_paired_model <- model.matrix(~0+Treat_p_outcome+Treat_p_Batch)
cm_paired_model <- makeContrasts(case2vscase1 = Treat_p_outcome2-Treat_p_outcome1,case3vscase1 = Treat_p_outcome3-Treat_p_outcome1)
corfit_paired_model <- duplicateCorrelation(Paired_M_val, design_paired_model, block=Targets_Paired$SampleID)

## Input the inter-subject correlation into the linear model fit:
fit_paired_model <- lmFit(Paired_M_val, design_paired_model, block=Targets_Paired$SampleID, correlation=corfit_paired_model$consensus)

## Computing contrasts and moderated t-tests for cleanM_combat group wise
fit_paired_model_final <- contrasts.fit(fit_paired_model,cm_paired_model)
fit_paired_model_final_eBayes <- eBayes(fit_paired_model_final, trend = TRUE)

## Extracting results
case2vscase1_paired_model_adjusted_BH <- topTable(fit_paired_model_final_eBayes,coef="case2vscase1",adjust="BH")
case3vscase1_paired_model_adjusted_BH <- topTable(fit_paired_model_final_eBayes,coef="case3vscase1",adjust="BH")

I wish to perform DMR analysis using DMRcate. However I'm unable to fit the paired nature of samples in the design matrix.

Following is my session info

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DMRcate_1.10.8             DMRcatedata_1.10.1         DSS_2.14.0                 bsseq_1.10.0               limma_3.30.13             
 [6] minfi_1.20.2               bumphunter_1.14.0          locfit_1.5-9.1             iterators_1.0.8            foreach_1.4.3             
[11] Biostrings_2.42.1          XVector_0.14.1             SummarizedExperiment_1.4.0 GenomicRanges_1.26.4       GenomeInfoDb_1.10.3       
[16] IRanges_2.8.2              S4Vectors_0.12.2           Biobase_2.34.0             BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2                                    siggenes_1.48.0                                    
  [3] mclust_5.2.3                                        IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 
  [5] biovizBase_1.22.0                                   htmlTable_1.9                                      
  [7] base64enc_0.1-3                                     dichromat_2.0-0                                    
  [9] base64_2.0                                          IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0
 [11] interactiveDisplayBase_1.12.0                       AnnotationDbi_1.36.2                               
 [13] codetools_0.2-15                                    R.methodsS3_1.7.1                                  
 [15] methylumi_2.20.0                                    knitr_1.15.1                                       
 [17] Formula_1.2-1                                       Rsamtools_1.26.1                                   
 [19] annotate_1.52.1                                     GO.db_3.4.0                                        
 [21] cluster_2.0.6                                       R.oo_1.21.0                                        
 [23] shiny_1.0.0                                         httr_1.2.1                                         
 [25] backports_1.0.5                                     assertthat_0.1                                     
 [27] Matrix_1.2-8                                        lazyeval_0.2.0                                     
 [29] acepack_1.4.1                                       htmltools_0.3.5                                    
 [31] tools_3.3.3                                         gtable_0.2.0                                       
 [33] doRNG_1.6                                           Rcpp_0.12.10                                       
 [35] multtest_2.30.0                                     preprocessCore_1.36.0                              
 [37] nlme_3.1-131                                        rtracklayer_1.34.2                                 
 [39] stringr_1.2.0                                       mime_0.5                                           
 [41] ensembldb_1.6.2                                     rngtools_1.2.4                                     
 [43] gtools_3.5.0                                        statmod_1.4.29                                     
 [45] XML_3.98-1.5                                        beanplot_1.2                                       
 [47] org.Hs.eg.db_3.4.0                                  AnnotationHub_2.6.5                                
 [49] zlibbioc_1.20.0                                     MASS_7.3-45                                        
 [51] scales_0.4.1                                        BSgenome_1.42.0                                    
 [53] VariantAnnotation_1.20.3                            BiocInstaller_1.24.0                               
 [55] GEOquery_2.40.0                                     RColorBrewer_1.1-2                                 
 [57] missMethyl_1.8.0                                    yaml_2.1.14                                        
 [59] memoise_1.0.0                                       gridExtra_2.2.1                                    
 [61] ggplot2_2.2.1                                       pkgmaker_0.22                                      
 [63] biomaRt_2.30.0                                      rpart_4.1-10                                       
 [65] reshape_0.8.6                                       latticeExtra_0.6-28                                
 [67] stringi_1.1.3                                       RSQLite_1.1-2                                      
 [69] genefilter_1.56.0                                   checkmate_1.8.2                                    
 [71] permute_0.9-4                                       GenomicFeatures_1.26.3                             
 [73] BiocParallel_1.8.1                                  matrixStats_0.51.0                                 
 [75] bitops_1.0-6                                        nor1mix_1.2-2                                      
 [77] lattice_0.20-34                                     ruv_0.9.6                                          
 [79] GenomicAlignments_1.10.1                            htmlwidgets_0.8                                    
 [81] plyr_1.8.4                                          magrittr_1.5                                       
 [83] R6_2.2.0                                            Hmisc_4.0-2                                        
 [85] IlluminaHumanMethylation450kmanifest_0.4.0          DBI_0.6                                            
 [87] foreign_0.8-67                                      survival_2.41-2                                    
 [89] RCurl_1.95-4.8                                      nnet_7.3-12                                        
 [91] tibble_1.2                                          grid_3.3.3                                         
 [93] data.table_1.10.4                                   digest_0.6.12                                      
 [95] xtable_1.8-2                                        httpuv_1.3.3                                       
 [97] R.utils_2.5.0                                       illuminaio_0.16.0                                  
 [99] openssl_0.9.6                                       munsell_0.4.3                                      
[101] registry_0.3                                        BiasedUrn_1.07                                     
[103] Gviz_1.18.2                                         quadprog_1.5-5  

Could you please help me with inputting the paired sample details in cpgannotate function?

Thank you in advance.

-Priyanka

Kristian Almstrup

unread,
Apr 10, 2017, 1:50:23 AM4/10/17
to epigenom...@googlegroups.com
Dear Tim and others

Along the same line.

Will it make sense to included surrogate variables in the design matrix for cpg.annotate when surrogate variables have been calculated from single CpGs and not regions per se?

The question is whether correcting (too much) for variability in single CpGs actually can kill some of the signals on a regional level.

Kind regards,
Kristian


--
You received this message because you are subscribed to a topic in the Google Groups "Epigenomics forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/epigenomicsforum/G3yO92pEqzQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to epigenomicsfor...@googlegroups.com.

rsheib

unread,
May 17, 2017, 5:05:43 AM5/17/17
to Epigenomics forum

Dear Tim, 

Is there anyway to categorize DMRs output from DMRcate into hyper/hypo methylated like what is possible in minfi by using bumphunter method? And also is there a way to extract regions of the hyper-methylated promoters regions from DMRcate.output?

Many thanks in advance, 

Tim Triche, Jr.

unread,
May 18, 2017, 10:34:48 AM5/18/17
to Epigenomics forum
(this is a different Tim replying):

yes, of course.  And I say this having used both the DSS-backed N=1 DMR calling approach and also DMRcate's approach on the same pediatric samples.  It's been an education -- the arrays are better for some things (e.g. CNV) and the WGBS for others (although maybe not the ones that would most immediately come to mind).  Anyways...

Let's use the built-in example.

     data(dmrcatedata)
     myMs <- logit2(myBetas)
     myMs.noSNPs <- rmSNPandCH(myMs, dist=2, mafcut=0.05)
     patient <- factor(sub("-.*", "", colnames(myMs)))
     type <- factor(sub(".*-", "", colnames(myMs)))
     design <- model.matrix(~patient + type)
     myannotation <- cpg.annotate("array", myMs.noSNPs, what="M", arraytype = "450K",
                                  analysis.type="differential", design=design, coef=39)
     dmrcoutput <- dmrcate(myannotation, lambda=1000)

Now we have an object of class "dmrcate.output" (try class(dmrcoutput) if you don't believe me) with an element named "result". 

Let's use that to turn it into something that we can crank out BED files from.

    dmrcoutput$gr <- as(dmrcoutput$results$coord, "GRanges")
    dmrcoutput$gr$score <- dmrcoutput$results$meanbetafc
    dmrcoutput$gr$p <- dmrcoutput$results$Stouffer

    # in this example, it's useless, but just FYI:
    pcutoff <- 0.05 
    dmrcoutput$gr <- subset(dmrcoutput$gr, p < pcutoff)

    # now split by direction
    hyperHypo <- split(dmrcoutput$gr, ifelse(dmrcoutput$gr$score > 0, "hyper", "hypo"))
    sapply(hyperHypo, length)
    # hyper  hypo
    #    336   407

    # now export each 
    library(rtracklayer)
    for (direction in names(hyperHypo)) { 
       export(hyperHypo[[direction]], paste0("/tmp/", direction, "DMRs.hg19.bed"))
    }

And that's that.  Similar logic for the output of DSS' callDMR() output. 

rsheib

unread,
May 21, 2017, 7:43:10 AM5/21/17
to Epigenomics forum
Dear Tim,

I am so grateful for your help and explanation, that helps a lot. I have one more question is the "Score" here (dmrcoutput$results$meanbetafc) a logarithmic value? this is the definition in DMRcate: 'meanbetafc': Mean beta fold change within the region. because if it is not, then we should nt have hyperHypo <- split(dmrcoutput$gr, ifelse(dmrcoutput$gr$score > 1, "hyper", "hypo")) ?

To define, only the hyper DMRs in the promoter region, I did it like this, Is that correct?
promoter_regions <- promoters(TxDb.Hsapiens.UCSC.hg19.knownGene,upstream=1000, downstream=200)
hyper_promoters <- subsetByOverlaps(hyperHypo$hyper, promoter_regions)
hypo_promoters <- subsetByOverlaps(hyperHypo$hypo, promoter_regions)

Many many thanks again for your help and ime,
rsheib
Reply all
Reply to author
Forward
Message has been deleted
0 new messages