DMRcate for bisulphite sequencing data

Tim Peters

unread,

Dec 15, 2015, 7:20:36 PM12/15/15

to Epigenomics forum

Dear all,

A significantly updated version of DMRcate is now available (http://bioconductor.org/packages/3.2/bioc/html/DMRcate.html). The major change is that it now includes functionality for DMR calling from sequencing data (WGBS, RRBS etc.), provided with total reads and methylated reads for each CpG site. 450K functionality is still available and EPIC array functionality should be coming soon, depending on how regularly Illumina is updating the manifest file.

A worked example for sequencing data can be found in the vignette. Because of the greater number of CpG sites the ideal kernel size should me a fair amount smaller (i.e. argument C is larger when calling dmrcate()) than that for 450K (suggested size is in the vignette).

A summary of the major included changes is as follows:

- DMR finding from WGBS and RRBS data, using DSS (instead of limma) as a backend for the regression step

- DMR.plot() has been completely rewritten using the Gviz platform (see image below). For the DMR passed via the argument

A locus showing the DMR plus 5kb up and downstream in each direction (more if there is another DMR within 5kb) for context
Overlapping transcript loci for a choice of hg19, hg38 or mm10
Heatmap showing individual methylation for given samples
Line plot tracking mean methylation per phenotype
Chromosome ideogram

- DMRs are now ranked by their Stouffer values from their individual constituent FDRs (the kernel smoothing and thresholding part of the method hasn't changed). Minpval and meanpval have been removed as a result.

Any feedback is welcome either here or at <t.peters at garvan.org.au>.

Best wishes,

Tim

Yalbi Itzel Balderas Martinez

unread,

Feb 1, 2016, 10:18:22 PM2/1/16

to Epigenomics forum

Dear Tim,

I have found your package very interesting and also your papers, however, I have found some errors in my script, and I am not sure what is happening, I really hope you could help me.

In particular, in the function

results.ranges <- extractRanges(dmrcoutput, genome = "hg19")

It appears the error: Error in strsplit(xx, ":") : non-character argument

I tried to solved this, creating the object result.ranges using GRanges,

This is the object results.ranges:

GRanges object with 1194 ranges and 6 metadata columns:

seqnames ranges strand | gene_assoc group no.probes minpval meanpval

136 chr10 [ 64344052, 64344052] * | ZNF365 5'UTR,Body 1 2.074795e-25 2.074795e-25

296 chr12 [124246391, 124247223] * | DNAH10 TSS1500,TSS200,1stExon,5'UTR,Body 8 1.399450e-22 8.077516e-09

654 chr2 [ 74210857, 74212049] * | 6 1.798993e-19 1.325255e-07

944 chr6 [ 28890518, 28892079] * | TRIM27 Body,1stExon,5'UTR,TSS200,TSS1500 51 4.541014e-17 7.079132e-04

914 chr5 [149492480, 149493258] * | CSF1R 5'UTR,TSS200,TSS1500 7 3.748344e-16 1.193855e-03

... ... ... ... ... ... ... ... ... ...

207 chr11 [ 8289620, 8289620] * | 1 0.04913373 0.04913373

992 chr6 [ 79620031, 79620031] * | 1 0.04920299 0.04920299

702 chr20 [ 1374350, 1374350] * | FKBP1A TSS1500 1 0.04952731 0.04952731

58 chr1 [ 67798929, 67798929] * | IL12RB2 Body 1 0.04975760 0.04975760

174 chr10 [131665838, 131665838] * | EBF3 Body 1 0.04985599 0.04985599

maxbetafc

136 0.30608766

296 0.13140995

654 0.18093186

944 -0.06241058

914 0.06947876

... ...

207 0.04545169

992 0.05171461

702 0.07533435

58 -0.09641478

174 -0.01000234

-------

seqinfo: 24 sequences from an unspecified genome; no seqlengths

But then, when I use the function DMR.plot, it does not work again,

groups <- c(Normal="magenta", Disease="forestgreen")

DMR.plot(ranges=results.ranges, dmr=1, CpGs=myBetas,

phen.col=c("Normal", "Normal", "Disease", "Disease"), genome="hg19",

samps=samps)

and it appears this error:

Error in (function (classes, fdef, mtable) :

unable to find an inherited method for function ‘Rle’ for signature ‘"GRanges", "missing"’

Could you help me in finding what is happening? I am so sorry if this is a basic question....

Thanks,

This is my session info

> sessionInfo()

R version 3.2.2 (2015-08-14)

Platform: x86_64-apple-darwin13.4.0 (64-bit)

Running under: OS X 10.9.5 (Mavericks)

locale:

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:

[1] grid splines stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:

[1] RnBeads_1.2.1 methylumi_2.16.0

[3] FDb.InfiniumMethylation.hg19_2.2.0 org.Hs.eg.db_3.2.3

[5] RSQLite_1.0.0 DBI_0.3.1

[7] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.22.8

[9] AnnotationDbi_1.32.3 reshape2_1.4.1

[11] scales_0.3.0 illuminaio_0.12.0

[13] matrixStats_0.50.1 limma_3.26.5

[15] gridExtra_2.0.0 gplots_2.17.0

[17] ggplot2_2.0.0 fields_8.3-6

[19] maps_3.0.2 spam_1.3-0

[21] ff_2.2-13 bit_1.1-12

[23] cluster_2.0.3 RColorBrewer_1.1-2

[25] MASS_7.3-45 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.2.1

[27] plyr_1.8.3 DMRcate_1.6.4

[29] DMRcatedata_1.6.1 DSS_2.10.0

[31] bsseq_1.6.0 minfi_1.16.0

[33] bumphunter_1.10.0 locfit_1.5-9.1

[35] iterators_1.0.8 foreach_1.4.3

[37] Biostrings_2.38.3 XVector_0.10.0

[39] SummarizedExperiment_1.0.2 GenomicRanges_1.22.3

[41] GenomeInfoDb_1.6.3 IRanges_2.4.6

[43] S4Vectors_0.8.7 lattice_0.20-33

[45] GEOquery_2.36.0 Biobase_2.30.0

[47] BiocGenerics_0.16.1 biomaRt_2.26.1

loaded via a namespace (and not attached):

[1] nlme_3.1-124 bitops_1.0-6 tools_3.2.2 doRNG_1.6 nor1mix_1.2-1

[6] KernSmooth_2.23-15 rpart_4.1-10 Hmisc_3.17-1 Gviz_1.14.2 colorspace_1.2-6

[11] nnet_7.3-11 base64_1.1 preprocessCore_1.32.0 chron_2.3-47 pkgmaker_0.22

[16] rtracklayer_1.30.1 caTools_1.17.1 genefilter_1.52.0 quadprog_1.5-5 stringr_1.0.0

[21] digest_0.6.9 Rsamtools_1.22.0 foreign_0.8-66 siggenes_1.44.0 R.utils_2.2.0

[26] dichromat_2.0-0 BSgenome_1.38.0 mclust_5.1 BiocParallel_1.4.3 gtools_3.5.0

[31] acepack_1.3-3.3 R.oo_1.19.0 VariantAnnotation_1.16.4 RCurl_1.95-4.7 magrittr_1.5

[36] Formula_1.2-1 futile.logger_1.4.1 Rcpp_0.12.3 munsell_0.4.2 R.methodsS3_1.7.0

[41] stringi_1.0-1 zlibbioc_1.16.0 gdata_2.17.0 multtest_2.26.0 annotate_1.48.0

[46] beanplot_1.2 igraph_1.0.1 rngtools_1.2.4 corpcor_1.6.8 codetools_0.2-14

[51] mixOmics_5.2.0 futile.options_1.0.0 XML_3.98-1.3 biovizBase_1.18.0 latticeExtra_0.6-26

[56] lambda.r_1.1.7 data.table_1.9.6 gtable_0.1.2 reshape_0.8.5 xtable_1.8-0

[61] survival_2.38-3 GenomicAlignments_1.6.3 registry_0.3 ellipse_0.3-8 rgl_0.95.1441

Tim Peters

unread,

Feb 9, 2016, 1:47:10 AM2/9/16

to Epigenomics forum

Dear Yalbi,

I have ironed out a couple of bugs in the last couple of weeks involving extractRanges() and DMR.plot() - could you update DMRcate to version 1.6.51 (plus all dependencies and imports) and try again please?

Thanks,

Tim

Yalbi Itzel Balderas Martinez

unread,

Mar 8, 2016, 10:13:40 PM3/8/16

to Epigenomics forum

Dear Tim,

I have updated all the packages and libraries, but it is not working, I have this message error when I use result.ranges

"Error in extractRanges(DMRs, genome = "hg19") :

object 'tx.hg19' not found"

I am not sure if this is an object from another library??

Regards,

Tim Triche, Jr.

unread,

Mar 9, 2016, 4:24:33 PM3/9/16

to Epigenomics forum

post a traceback (or try debug(extractRanges) beforehand). I wrote that particular function, it's probably some cruft I left in there.

ps. If anyone wants to help me add a Savitsky-Golay filter to choose the best cutoff for turning a -log10(p) "pileup" of DMRs into a BED file, please let me know. There are a number of things I'd like to merge back into the main trunk, so if I'm going to fix one I might as well fix them all.

best,

--t

Tim Peters

unread,

Mar 13, 2016, 5:24:51 AM3/13/16

to Epigenomics forum

Hi Yalbi,

You'll need to load the data package into the workspace via

data(dmrcatedata)

and then try again. This contains the annotation objects tx.hg19, tx.hg38 and tx.mm10.

Also, I discovered a bug on line 41 in DMR.plot() this week that would have thrown an error if there was a "C" in any of the column names of the beta/counts object - I hope it wasn't what was messing up your calls. 1.6.52 is now available to download from Bioconductor with the corrected version of DMR.plot().

Good luck,

Tim

Message has been deleted

Tim Peters

unread,

Apr 12, 2016, 12:52:13 AM4/12/16

to Epigenomics forum

Hi again folks,

Thanks for all your feedback on the new version. A recurring problem with extractRanges() and DMR.plot() that was the result of me not declaring the correct environments for the test data should now be fixed in version 1.6.53. Thanks to Yalbi and others for pointing all of this out. Please keep sending me bug reports and questions - it helps improve the package.

Best,

Tim

Tim Peters

unread,

May 25, 2016, 9:55:16 PM5/25/16

to Epigenomics forum

Hi all,

The latest version of DMRcate (1.8.5) can now call DMRs from Infinium MethylationEPIC Kit samples. Thanks to Kasper Hansen for making the annotation package available on Bioconductor ( http://bioconductor.org/packages/3.3/data/annotation/html/IlluminaHumanMethylationEPICanno.ilm10b2.hg19.html).

The workflow is identical to that for 450K arrays, except that annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") should be passed to cpg.annotate() and array.annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") to DMR.plot(). Initial testing looks like optimal kernel size is very similar to 450K, so defaults have been left as is.

Happy DMR finding,

Tim

vk

unread,

Jul 19, 2016, 12:28:50 PM7/19/16

to Epigenomics forum

Hi Tim,

I am working on EPIC data to find differentially methylated regions. I am using DMRcate. I am getting an error for cpg.annotate

myAnnotation <- cpg.annotate(mVals, datatype = "array",analysis.type="differential", design=design,contrasts = TRUE, cont.matrix = contMatrix,coef="MCFseven - fourAfive")

Your contrast returned 496947 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

Error in data.frame(ID = rownames(object), stat = stat, CHR = RSanno$chr, :

arguments imply differing number of rows: 812896, 425236

I saw your previous reply for this error. But Yes, I had removed SNP probes too. Still I get this error. Please have look to my code and functions I used.

targets$Sample_Group

[1] "A_NT" "A_BPA" "A_BaP" "A_BPABaP"

[5] "AT1_NT" "AT1_BPA" "AT1_BaP" "AT1_BPABaP"

[9] "CA_NT" "A_NT" "A_BPA" "A_BaP"

[13] "A_BPABaP" "AT1_NT" "AT1_BPA" "AT1_BaP"

[17] "AT1_BPABaP" "CA_NT" "A_NT" "A_BPA"

[21] "A_BaP" "A_BPABaP" "AT1_NT" "AT1_BPA"

[25] "AT1_BaP" "AT1_BPABaP" "CA_NT" "fourAfive"

[29] "twoFone_wt" "oneGthree_DEfour" "twoCtwo_DEfour" "MCFseven"

[33] "fourAfive" "twoFone_wt" "oneGthree_DEfour" "twoCtwo_DEfour"

[37] "MCFseven"

mSetRaw <- preprocessRaw(rgSet)

mSetSq <- preprocessFunnorm(rgSet)

[preprocessFunnorm] Background and dye bias correction with noob

[preprocessNoob] Using sample number 24 as reference level...

[preprocessFunnorm] Mapping to genome

[preprocessFunnorm] Quantile extraction

[preprocessFunnorm] Normalization

Warning message:

In .getSex(CN = CN, xIndex = xIndex, yIndex = yIndex, cutoff = cutoff) :

An inconsistency was encountered while determining sex. One possibility is that only one sex is present. We recommend further checks, for example with the plotSex function.

detP <- detP[match(featureNames(mSetSq),rownames(detP)),]

keep <- rowSums(detP < 0.01) == ncol(mSetSq)

table(keep)

keep

FALSE TRUE

6018 860818

mSetSqFlt <- mSetSq[keep,]

mSetSqFlt

class: GenomicRatioSet

dim: 860818 37

metadata(0):

assays(2): Beta CN

rownames(860818): cg14817997 cg26928153 ... cg07587934 cg16855331

rowData names(0):

colnames(37): A_NT.1 A_BPA.2 ... twoCtwo_DEfour.36 MCFseven.37

colData names(10): Sample_Name Sample_Well ... filenames predictedSex

Annotation

array: IlluminaHumanMethylationEPIC

annotation: ilm10b2.hg19

Preprocessing

Method: Raw (no normalization or bg correction)

minfi version: 1.18.2

Manifest version: 0.3.0

keep <- !(featureNames(mSetSqFlt) %in% annEPIC$Name[annEPIC$chr %in% c("chrX","chrY")])

table(keep)

keep

FALSE TRUE

19098 841720

mSetSqFlt <- mSetSqFlt[keep,]

mSetSqFlt <- dropLociWithSnps(mSetSqFlt)

mSetSqFlt

class: GenomicRatioSet

dim: 812896 37

metadata(0):

assays(2): Beta CN

rownames(812896): cg14817997 cg26928153 ... cg07660283 cg09226288

rowData names(0):

colnames(37): A_NT.1 A_BPA.2 ... twoCtwo_DEfour.36 MCFseven.37

colData names(10): Sample_Name Sample_Well ... filenames predictedSex

Annotation

array: IlluminaHumanMethylationEPIC

annotation: ilm10b2.hg19

Preprocessing

Method: Raw (no normalization or bg correction)

minfi version: 1.18.2

Manifest version: 0.3.0

mVals <- getM(mSetSqFlt)

head(mVals[,1:5])

A_NT.1 A_BPA.2 A_BaP.3 A_BPABaP.4 AT1_NT.5

cg14817997 0.5209928 0.4748454 0.4805088 0.4106944 -0.09737575

cg26928153 3.8652288 3.2916606 4.0673276 3.7638197 4.09015606

cg16269199 1.9067538 1.9842475 2.2688374 1.8933198 2.40681559

cg13869341 1.8235001 2.0258282 1.9080779 1.9390136 2.32162979

cg14008030 2.3367597 2.3639274 2.0101783 2.0230830 1.98342362

cg12045430 -1.6365938 -2.0614246 -2.1343550 -2.0402295 -1.80513521

bVals <- getBeta(mSetSqFlt)

head(bVals[,1:5])

A_NT.1 A_BPA.2 A_BaP.3 A_BPABaP.4 AT1_NT.5

cg14817997 0.5893127 0.5815496 0.5825045 0.5706912 0.4831325

cg26928153 0.9357864 0.9073421 0.9437076 0.9314310 0.9445423

cg16269199 0.7894579 0.7982473 0.8281594 0.7879060 0.8413460

cg13869341 0.7797060 0.8028491 0.7896104 0.7931505 0.8333046

cg14008030 0.8347563 0.8373375 0.8011264 0.8025477 0.7981553

cg12045430 0.2433499 0.1932747 0.1855147 0.1955757 0.2224883

cellType <- factor(targets$Sample_Group)

design <- model.matrix(~0+cellType, data=targets)

colnames(design) <- c(levels(cellType))

fit <- lmFit(mVals, design)

contMatrix <- makeContrasts(MCFseven-fourAfive,twoFone_wt-fourAfive,oneGthree_DEfour-fourAfive,twoCtwo_DEfour-fourAfive, levels=design)

fit2 <- contrasts.fit(fit, contMatrix)

fit2 <- eBayes(fit2)

summary(decideTests(fit2))

MCFseven - fourAfive twoFone_wt - fourAfive oneGthree_DEfour - fourAfive

-1 177463 70477 78167

0 315949 489922 494446

1 319484 252497 240283

twoCtwo_DEfour - fourAfive

-1 82238

0 488685

1 241973

myAnnotation <- cpg.annotate(mVals, datatype = "array",analysis.type="differential", design=design,contrasts = TRUE, cont.matrix = contMatrix,coef="MCFseven - fourAfive")

Your contrast returned 496947 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

Error in data.frame(ID = rownames(object), stat = stat, CHR = RSanno$chr, :

arguments imply differing number of rows: 812896, 425236

Looking forward to your response. Thank you in advance.

Message has been deleted

Tim Peters

unread,

Jul 20, 2016, 5:14:27 AM7/20/16

to Epigenomics forum

Hi vk,

First make sure you're using the most recent version of DMRcate (currently 1.8.6).

Then, as per the manual, you'll need to add annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") to your call to cpg.annotate() for EPIC data. It should work from there.

Best,

Tim

vk

unread,

Jul 20, 2016, 5:33:40 AM7/20/16

to Epigenomics forum

Hi Tim,

Thanks for the reply. Yes I am using recent version of DMRcate (currently 1.8.6).

sessionInfo()

R version 3.3.0 (2016-05-03)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Ubuntu 14.04.4 LTS

locale:

[1] C

attached base packages:

[1] splines grid stats4 parallel stats graphics grDevices

[8] utils datasets methods base

other attached packages:

[1] stringr_1.0.0

[2] DMRcate_1.8.6

[3] DMRcatedata_1.8.2

[4] DSS_2.12.0

[5] bsseq_1.8.2

[6] Gviz_1.16.1

[7] minfiData_0.14.0

[8] IlluminaHumanMethylationEPICmanifest_0.3.0

[9] IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.3.0

[10] matrixStats_0.50.2

[11] missMethyl_1.6.2

[12] RColorBrewer_1.1-2

[13] minfi_1.18.2

[14] bumphunter_1.12.0

I added annotation before itself.

annEPIC = getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b2.hg19)

This is how contMatrix looks.

contMatrix

Contrasts

Levels MCFseven - fourAfive twoFone_wt - fourAfive

A_BaP 0 0

A_BPA 0 0

A_BPABaP 0 0

A_NT 0 0

AT1_BaP 0 0

AT1_BPA 0 0

AT1_BPABaP 0 0

AT1_NT 0 0

CA_NT 0 0

fourAfive -1 -1

MCFseven 1 0

oneGthree_DEfour 0 0

twoCtwo_DEfour 0 0

twoFone_wt 0 1

Contrasts

Levels oneGthree_DEfour - fourAfive twoCtwo_DEfour - fourAfive

A_BaP 0 0

A_BPA 0 0

A_BPABaP 0 0

A_NT 0 0

AT1_BaP 0 0

AT1_BPA 0 0

AT1_BPABaP 0 0

AT1_NT 0 0

CA_NT 0 0

fourAfive -1 -1

MCFseven 0 0

oneGthree_DEfour 1 0

twoCtwo_DEfour 0 1

twoFone_wt 0 0

Kristian Almstrup

unread,

Jul 20, 2016, 8:26:33 AM7/20/16

to Epigenomics forum

Dear Tim

Is there any way to control the heatmap and average tracks individually with the DMR.plot function? Compared to the earlier version of the DMR.plot with only mean plots, the heatmap now overtakes much of the space when many samples are included. I would simply like to do a plot without the heatmaps or minimise them. I guess one solution will be to create my own plotTracks with Gviz and if so, you maybe could direct me how to do that.

Thanks!

Kind regards,

Kristian

Kasper Daniel Hansen

unread,

Jul 20, 2016, 8:54:46 AM7/20/16

to epigenom...@googlegroups.com

Tim, why is this needed if you use new version of minfi?

Best,

Kasper

(Sent from my phone.)

On Jul 20, 2016, at 05:12, Tim Peters <stormy...@gmail.com> wrote:

Hi Par,

First make sure you're using the most recent version of DMRcate (currently 1.8.6).

Then, as per the manual, you'll need to add annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19") to your call to cpg.annotate() for EPIC data. It should work from there.

Best,

Tim

On Wednesday, July 20, 2016 at 2:28:50 AM UTC+10, vk wrote:

--
You received this message because you are subscribed to the Google Groups "Epigenomics forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to epigenomicsfor...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tim Peters

unread,

Jul 20, 2016, 10:57:36 PM7/20/16

to Epigenomics forum

Hi vk,

The annotation argument still needs to be explicitly passed to cpg.annotate(), since it takes a "naked' data matrix of M-values instead of a GenomicRatioSet. So the annotation information isn't retained.

Kasper has just provided a wider context for this - the original version of DMRcate was created with an older version of minfi (i.e. before it had GenomicRatioSets) and the argument format has persisted, so I can see why there might be some confusion. Perhaps in the future I will encode an option that takes GenomicRatioSets as the first argument instead of a matrix.

Cheers,

Tim

Tim Peters

unread,

Jul 20, 2016, 11:10:30 PM7/20/16

to Epigenomics forum

Hi Kristian,

I'm afraid not at the moment. Please feel free to create a custom plotTracks, like you suggest. It shouldn't be too hard to unpack the code and remove the heatmap track.

Cheers,

Tim

Kasper Daniel Hansen

unread,

Jul 21, 2016, 12:01:30 AM7/21/16

to epigenom...@googlegroups.com

Adding that functionality should be easy. Do an if statement at the beginning of the function detecting whether the input object is a matrix or a GenomicRatioSet.

Best,

Kasper

(Sent from my phone.)

vk

unread,

Jul 21, 2016, 7:24:34 AM7/21/16

to Epigenomics forum

Thank you very much Tim. It worked. But for DMR.plot I have an error now.

myAnnotation <- cpg.annotate(mVals, datatype = "array",analysis.type="differential", annotation=c(array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b2.hg19"), design=design,contrasts = TRUE, cont.matrix = contMatrix,coef="A_NT - AT1_NT - CA_NT")

Your contrast returned 755801 individually significant probes. We recommend the default setting of pcutoff in dmrcate().

DMRs <- dmrcate(myAnnotation, lambda=1000, C=2)

head(DMRs$results)

coord no.cpgs minfdr Stouffer maxbetafc meanbetafc

236520 chr6:33156164-33181870 268 0 0 -1.836410 -0.5585352

236364 chr6:32143681-32173532 218 0 0 -1.350725 -0.4145994

236278 chr6:31618987-31640160 216 0 0 -1.245563 -0.3788226

236348 chr6:32034322-32059605 208 0 0 -1.457552 -0.5790459

236535 chr6:33279563-33292029 206 0 0 -1.553673 -0.4180961

251989 chr7:27178861-27211741 206 0 0 -1.410725 -0.5099268

#converted the regions to annotated genomic ranges

results.ranges <- extractRanges(DMRs, genome = "hg19")

groups <- pal[1:length(unique(targets$Sample_Group))]

names(groups) <- levels(factor(targets$Sample_Group))

cols <- groups[as.character(factor(targets$Sample_Group))]

samps <- 1:nrow(targets)

par(mfrow=c(1,1))

DMR.plot(ranges=results.ranges, dmr=1, CpGs=bVals, phen.col=cols, pch=16, toscale=TRUE, plotmedians=TRUE, genome="hg19", samps=samps)

Error: Expectation Failed

Not sure what's going on here.

Tim Peters

unread,

Jul 23, 2016, 5:18:50 AM7/23/16

to Epigenomics forum

Hi vk,

Not sure why you've included toscale, pch and plotmedians arguments - these are deprecated from a previous version and are likely interfering with the call. Get rid of them and try again.

Cheers,

Tim

Olivia Solomon

unread,

Aug 5, 2016, 1:31:39 PM8/5/16

to Epigenomics forum

Hi Tim,

I've been using DMRcate for a few different analyses and was wondering about the pcutoff() setting in dmrcate(). In analyses that had individually significant probes from the limma step, the default pcutoff="fdr" leads to a cutoffs on orders of 10^-14-10^-12.

Do you have any suggestions for what would be considered the maximum appropriate manual cutoff to set in dmrcate() before I should be concerned with type 1 errors in cases where there are no individual sites with limma-fdr <0.05?

Thank you,

Olivia

Message has been deleted

Tim Peters

unread,

Aug 7, 2016, 9:51:46 PM8/7/16

to Epigenomics forum

Hi Olivia,

I don't think specifying a canonical cutoff from those p-values is particularly useful. The low values you are seeing are from the post-smoothing χ2 test, which is very powerful and often gives very low p-values, hence the caveat you see when running cpg.annotate(). Unfortunately tests with varying levels of power (another example is the hypergeometric test when doing functional enrichment over a number of terms with varying numbers of genes) are prone to this bias. I'd say the p-values here are only useful in a relative sense i.e. comparing the DM signal from different loci from the same fit. DMRs are instead benchmarked against the number of significant limma probes from the given user-defined cutoff fdr in cpg.annotate(), which IMO is much more robust. To vary the cutoff, I would relax the fdr argument in cpg.annotate() to 0.1, 0.25 etc. to see if you get any DM probes at that "significance" level. dmrcate() will then do the rest with the aforementioned benchmarking with the current default. For an overall significance metric per DMR, use the Stouffer transformation (derived from limma FDRs, not the χ2 test).

Hope this helps,

Tim

Olivia Solomon

unread,

Aug 8, 2016, 3:09:04 PM8/8/16

to Epigenomics forum

Hi Tim,

Thank you. That is very helpful.

Regarding interpreting the Stouffer estimate: assuming we wanted to consider DMRs with an adjusted p-value <=0.05, would the Stouffer estimate then be used for that cutoff? Some DMRs that passed through both cpg.annotate fdr=0.05 and dmrcate pcutoff="fdr" still have fairly high Stouffer values (as high as 0.5), so I'm not sure what to make of this.

Thank you,

Olivia

Tim Peters

unread,

Aug 9, 2016, 12:01:04 AM8/9/16

to Epigenomics forum

Hi Olivia,

It's up to you. Higher Stouffer values suggest there is some heterogeneity of signal within the DMR. If you're concerned about this you can always make lambda smaller to split them up into more discrete DMRs with (hopefully) more uniform DM signal across them.

Cheers,

Tim

Rachel Jepson

unread,

Aug 18, 2016, 6:35:14 AM8/18/16

to Epigenomics forum

Hi Tim

I have noticed in the current version of DMRcate that following cpg.annotate there is no longer any associated gene/group annotation ($gene/$Group). I just wondered why this feature had been dropped and whether there are any suggestions for adding the information to the DMRcate hits?

Thanks
Rachel

Tim Peters

unread,

Aug 24, 2016, 8:40:39 AM8/24/16

to Epigenomics forum

Hi Rachel,

I dropped these features since the Illumina annotation (with regards to gene loci) was incomplete and contentious in a few cases. Instead I incorporated complete Gencode transcript loci into DMRcatedata for hg19 (and hg38 and mm10 if you're working with WGBS). You can annotate your output from dmrcate() with overlapping Gencode transcript promoter regions using extractRanges() with default settings.

Cheers,

Tim

Aarti Jajoo

unread,

Sep 27, 2016, 5:05:17 PM9/27/16

to Epigenomics forum

Thank you very much for putting this version up.

I have a question regrading unpaired data, in your previous post (for older version ) you have mentioned that one could just use
design = model.matrix(~0+type)

But when I use this for cpg.annotate(..) it gives me an error
Error: colnames(design)[1] == "(Intercept)" is not TRUE

I tried my best to look into your old posts to find an answer, sorry if you have already addressed this issue before.

Thanks,
Aarti

Tim Peters

unread,

Sep 27, 2016, 10:48:25 PM9/27/16

to Epigenomics forum

Dear Aarti,

Design matrices such as the one you specified will need a contrast matrix and contrasts=T given as arguments to cpg.annotate, otherwise you will run into the intercept error you specified.

If type is binary then you can just specify design = model.matrix(~type) and coef=2. But if there are 3 or more levels you will want to define contrasts. Have a look at the limma manual, in particular the worked example on page 36. https://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf

Best,

Tim

Aarti Jajoo

unread,

Sep 28, 2016, 10:07:25 AM9/28/16

to Epigenomics forum

Thank you very much for the prompt reply Tim.

type is binary in my case. I had tried design as suggested by you, but I was getting the following remark "Your contrast returned no individually significant probes.". Hence I was not sure if design was correct. But thank you for verifying this aspect, and as suggested I will look into limma manual as well to understand things better.

Regards,
Aarti

Priyanka

unread,

Mar 24, 2017, 5:51:32 AM3/24/17

to Epigenomics forum

Dear Tim.

I'm working on 450K data of paired samples.

I have earlier performed DMCpG analysis using paired design model from limma using the following script

## Load M-value file

Paired_M_val <- read.csv(".../Paired_M_val.csv", row.names=1, check.names = FALSE)

## Load targets file for analysis

Targets_Paired <- read.csv("..../Paired_targets.csv", row.names=1, check.names = FALSE)

Treat_p_outcome <- factor(Targets_Paired$Sample_Outcome_Details)

Treat_p_Batch <- Targets_Paired$Sentrix_ID_coded

## Model: Adjusted for batch effects

design_paired_model <- model.matrix(~0+Treat_p_outcome+Treat_p_Batch)

cm_paired_model <- makeContrasts(case2vscase1 = Treat_p_outcome2-Treat_p_outcome1,case3vscase1 = Treat_p_outcome3-Treat_p_outcome1)

corfit_paired_model <- duplicateCorrelation(Paired_M_val, design_paired_model, block=Targets_Paired$SampleID)

## Input the inter-subject correlation into the linear model fit:

fit_paired_model <- lmFit(Paired_M_val, design_paired_model, block=Targets_Paired$SampleID, correlation=corfit_paired_model$consensus)

## Computing contrasts and moderated t-tests for cleanM_combat group wise

fit_paired_model_final <- contrasts.fit(fit_paired_model,cm_paired_model)

fit_paired_model_final_eBayes <- eBayes(fit_paired_model_final, trend = TRUE)

## Extracting results

case2vscase1_paired_model_adjusted_BH <- topTable(fit_paired_model_final_eBayes,coef="case2vscase1",adjust="BH")

case3vscase1_paired_model_adjusted_BH <- topTable(fit_paired_model_final_eBayes,coef="case3vscase1",adjust="BH")

I wish to perform DMR analysis using DMRcate. However I'm unable to fit the paired nature of samples in the design matrix.

Following is my session info

> sessionInfo()

R version 3.3.3 (2017-03-06)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows >= 8 x64 (build 9200)

locale:

[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C

[5] LC_TIME=English_United States.1252

attached base packages:

[1] splines stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:

[1] DMRcate_1.10.8 DMRcatedata_1.10.1 DSS_2.14.0 bsseq_1.10.0 limma_3.30.13

[6] minfi_1.20.2 bumphunter_1.14.0 locfit_1.5-9.1 iterators_1.0.8 foreach_1.4.3

[11] Biostrings_2.42.1 XVector_0.14.1 SummarizedExperiment_1.4.0 GenomicRanges_1.26.4 GenomeInfoDb_1.10.3

[16] IRanges_2.8.2 S4Vectors_0.12.2 Biobase_2.34.0 BiocGenerics_0.20.0

loaded via a namespace (and not attached):

[1] colorspace_1.3-2 siggenes_1.48.0

[3] mclust_5.2.3 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0

[5] biovizBase_1.22.0 htmlTable_1.9

[7] base64enc_0.1-3 dichromat_2.0-0

[9] base64_2.0 IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0

[11] interactiveDisplayBase_1.12.0 AnnotationDbi_1.36.2

[13] codetools_0.2-15 R.methodsS3_1.7.1

[15] methylumi_2.20.0 knitr_1.15.1

[17] Formula_1.2-1 Rsamtools_1.26.1

[19] annotate_1.52.1 GO.db_3.4.0

[21] cluster_2.0.6 R.oo_1.21.0

[23] shiny_1.0.0 httr_1.2.1

[25] backports_1.0.5 assertthat_0.1

[27] Matrix_1.2-8 lazyeval_0.2.0

[29] acepack_1.4.1 htmltools_0.3.5

[31] tools_3.3.3 gtable_0.2.0

[33] doRNG_1.6 Rcpp_0.12.10

[35] multtest_2.30.0 preprocessCore_1.36.0

[37] nlme_3.1-131 rtracklayer_1.34.2

[39] stringr_1.2.0 mime_0.5

[41] ensembldb_1.6.2 rngtools_1.2.4

[43] gtools_3.5.0 statmod_1.4.29

[45] XML_3.98-1.5 beanplot_1.2

[47] org.Hs.eg.db_3.4.0 AnnotationHub_2.6.5

[49] zlibbioc_1.20.0 MASS_7.3-45

[51] scales_0.4.1 BSgenome_1.42.0

[53] VariantAnnotation_1.20.3 BiocInstaller_1.24.0

[55] GEOquery_2.40.0 RColorBrewer_1.1-2

[57] missMethyl_1.8.0 yaml_2.1.14

[59] memoise_1.0.0 gridExtra_2.2.1

[61] ggplot2_2.2.1 pkgmaker_0.22

[63] biomaRt_2.30.0 rpart_4.1-10

[65] reshape_0.8.6 latticeExtra_0.6-28

[67] stringi_1.1.3 RSQLite_1.1-2

[69] genefilter_1.56.0 checkmate_1.8.2

[71] permute_0.9-4 GenomicFeatures_1.26.3

[73] BiocParallel_1.8.1 matrixStats_0.51.0

[75] bitops_1.0-6 nor1mix_1.2-2

[77] lattice_0.20-34 ruv_0.9.6

[79] GenomicAlignments_1.10.1 htmlwidgets_0.8

[81] plyr_1.8.4 magrittr_1.5

[83] R6_2.2.0 Hmisc_4.0-2

[85] IlluminaHumanMethylation450kmanifest_0.4.0 DBI_0.6

[87] foreign_0.8-67 survival_2.41-2

[89] RCurl_1.95-4.8 nnet_7.3-12

[91] tibble_1.2 grid_3.3.3

[93] data.table_1.10.4 digest_0.6.12

[95] xtable_1.8-2 httpuv_1.3.3

[97] R.utils_2.5.0 illuminaio_0.16.0

[99] openssl_0.9.6 munsell_0.4.3

[101] registry_0.3 BiasedUrn_1.07

[103] Gviz_1.18.2 quadprog_1.5-5

Could you please help me with inputting the paired sample details in cpgannotate function?

Thank you in advance.

-Priyanka

Kristian Almstrup

unread,

Apr 10, 2017, 1:50:23 AM4/10/17

to epigenom...@googlegroups.com

Dear Tim and others

Along the same line.

Will it make sense to included surrogate variables in the design matrix for cpg.annotate when surrogate variables have been calculated from single CpGs and not regions per se?

The question is whether correcting (too much) for variability in single CpGs actually can kill some of the signals on a regional level.

Kind regards,

Kristian

--
You received this message because you are subscribed to a topic in the Google Groups "Epigenomics forum" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/epigenomicsforum/G3yO92pEqzQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to epigenomicsfor...@googlegroups.com.

rsheib

unread,

May 17, 2017, 5:05:43 AM5/17/17

to Epigenomics forum

Dear Tim,

Is there anyway to categorize DMRs output from DMRcate into hyper/hypo methylated like what is possible in minfi by using bumphunter method? And also is there a way to extract regions of the hyper-methylated promoters regions from DMRcate.output?

Many thanks in advance,

Tim Triche, Jr.

unread,

May 18, 2017, 10:34:48 AM5/18/17

to Epigenomics forum

(this is a different Tim replying):

yes, of course. And I say this having used both the DSS-backed N=1 DMR calling approach and also DMRcate's approach on the same pediatric samples. It's been an education -- the arrays are better for some things (e.g. CNV) and the WGBS for others (although maybe not the ones that would most immediately come to mind). Anyways...

Let's use the built-in example.

data(dmrcatedata)

myMs <- logit2(myBetas)

myMs.noSNPs <- rmSNPandCH(myMs, dist=2, mafcut=0.05)

patient <- factor(sub("-.*", "", colnames(myMs)))

type <- factor(sub(".*-", "", colnames(myMs)))

design <- model.matrix(~patient + type)

myannotation <- cpg.annotate("array", myMs.noSNPs, what="M", arraytype = "450K",

analysis.type="differential", design=design, coef=39)

dmrcoutput <- dmrcate(myannotation, lambda=1000)

Now we have an object of class "dmrcate.output" (try class(dmrcoutput) if you don't believe me) with an element named "result".

Let's use that to turn it into something that we can crank out BED files from.

dmrcoutput$gr <- as(dmrcoutput$results$coord, "GRanges")

dmrcoutput$gr$score <- dmrcoutput$results$meanbetafc

dmrcoutput$gr$p <- dmrcoutput$results$Stouffer

# in this example, it's useless, but just FYI:

pcutoff <- 0.05

dmrcoutput$gr <- subset(dmrcoutput$gr, p < pcutoff)

# now split by direction

hyperHypo <- split(dmrcoutput$gr, ifelse(dmrcoutput$gr$score > 0, "hyper", "hypo"))

sapply(hyperHypo, length)

# hyper hypo

# 336 407

# now export each

library(rtracklayer)

for (direction in names(hyperHypo)) {

export(hyperHypo[[direction]], paste0("/tmp/", direction, "DMRs.hg19.bed"))

}

And that's that. Similar logic for the output of DSS' callDMR() output.

rsheib

unread,

May 21, 2017, 7:43:10 AM5/21/17

to Epigenomics forum

Dear Tim,

I am so grateful for your help and explanation, that helps a lot. I have one more question is the "Score" here (dmrcoutput$results$meanbetafc) a logarithmic value? this is the definition in DMRcate: 'meanbetafc': Mean beta fold change within the region. because if it is not, then we should nt have hyperHypo <- split(dmrcoutput$gr, ifelse(dmrcoutput$gr$score > 1, "hyper", "hypo")) ?

To define, only the hyper DMRs in the promoter region, I did it like this, Is that correct?

promoter_regions <- promoters(TxDb.Hsapiens.UCSC.hg19.knownGene,upstream=1000, downstream=200)

hyper_promoters <- subsetByOverlaps(hyperHypo$hyper, promoter_regions)

hypo_promoters <- subsetByOverlaps(hyperHypo$hypo, promoter_regions)

Many many thanks again for your help and ime,

rsheib

Reply all

Reply to author

Forward

Message has been deleted