Loading bismark .CpG_report.txt.gz straight into methylkit

1,689 views
Skip to first unread message

avil...@gmail.com

unread,
Jun 1, 2016, 10:15:27 AM6/1/16
to methylkit_discussion
Hi,

This is a question about generating a straightforward standard pipeline from bismark sample-by-sample CpG counting directly into methylKit multi-sample DMR calling.

The methylkit authors kindly provided a loading mechanism for the .CpG_report.txt.gz file from bismark, which means that on a pipeline setup (e.g. Galaxy or Dnanexus), one can go from bismark methylation extractor output directly loaded into methylkit, without intermediate files.

Find below the code snippet with the solution (assumes 2 cases and 2 controls, and min.cov=10).

Feedback welcomed:

$ Rscript run_methylkit_four_samples.R CASE1.CpG_report.txt.gz CASE2.CpG_report.txt.gz CONTROL1.CpG_report.txt.gz CONTROL2.CpG_report.txt.gz 10



########

#!/usr/bin/Rscript

thisdir = getwd()
tempdir <- function() thisdir
unlockBinding("tempdir", baseenv())
assignInNamespace("tempdir", tempdir, ns="base", envir=baseenv())
assign("tempdir", tempdir, baseenv())
lockBinding("tempdir", baseenv())

library(methylKit)
library(tools)
args<-commandArgs(TRUE)

# readBismarkCytosineReport function
devtools::source_gist("4839e615e2401d73fe51")

file.list = list(args[1],args[2],args[3],args[4])

# This reads straight from the output of bismark, works for both strands
myobj = readBismarkCytosineReport(
        file.list,
          sample.id=list(
  	  strsplit(basename(args[1]), '[_]')[[1]][1],
          strsplit(basename(args[2]), '[_]')[[1]][1],
  	  strsplit(basename(args[3]), '[_]')[[1]][1],
  	  strsplit(basename(args[4]), '[_]')[[1]][1]
  	       ),
  	assembly="hg38",
  	treatment=c(1,1,0,0),
  	min.cov=args[5])

pdf(file="Methylkit.01.MethylationStats.Sample01.pdf",width=12,height=12) ; getMethylationStats(myobj[[1]],plot=T,both.strands=T); dev.off()
pdf(file="Methylkit.01.MethylationStats.Sample02.pdf",width=12,height=12) ; getMethylationStats(myobj[[2]],plot=T,both.strands=T); dev.off()
pdf(file="Methylkit.01.MethylationStats.Sample03.pdf",width=12,height=12) ; getMethylationStats(myobj[[3]],plot=T,both.strands=T); dev.off()
pdf(file="Methylkit.01.MethylationStats.Sample04.pdf",width=12,height=12) ; getMethylationStats(myobj[[4]],plot=T,both.strands=T); dev.off()

pdf(file="Methylkit.01.CoverageStats.Sample01.pdf",width=12,height=12) ; getCoverageStats(myobj[[1]],plot=T,both.strands=T); dev.off()
pdf(file="Methylkit.01.CoverageStats.Sample02.pdf",width=12,height=12) ; getCoverageStats(myobj[[2]],plot=T,both.strands=T); dev.off()
pdf(file="Methylkit.01.CoverageStats.Sample03.pdf",width=12,height=12) ; getCoverageStats(myobj[[3]],plot=T,both.strands=T); dev.off()
pdf(file="Methylkit.01.CoverageStats.Sample04.pdf",width=12,height=12) ; getCoverageStats(myobj[[4]],plot=T,both.strands=T); dev.off()

meth = unite(myobj)
pdf(file="Methylkit.02.Meth_CorrelationPlot.pdf",width=12,height=12); getCorrelation(meth,plot=T); dev.off()

# cluster all samples using correlation distance and return a tree object for plclust
hc = clusterSamples(meth, dist="correlation", method="ward", plot=FALSE)

# cluster all samples using correlation distance and plot hiarachical clustering
pdf(file="Methylkit.03.PCASamples.ctreeplot.pdf",width=12,height=12); clusterSamples(meth, dist="correlation", method="ward", plot=TRUE); dev.off()

# screeplot of principal component analysis.
pdf(file="Methylkit.03.PCASamples.screeplot.pdf",width=12,height=12); PCASamples(meth, screeplot=TRUE); dev.off()

# principal component anlaysis of all samples.
pdf(file="Methylkit.03.PCASamples.pcaxyplot.pdf",width=12,height=12); PCASamples(meth); dev.off()

myDiff=calculateDiffMeth(meth)
write.table(myDiff, file="Methylkit.04.DiffMeth.tsv", sep='\t', quote=FALSE)
myDiff25p.hyper = get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")
write.table(myDiff25p.hyper,"Methylkit.04.hyper_methylated.txt",sep='\t', quote=FALSE)
myDiff25p.hypo = get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
write.table(myDiff25p.hypo,"Methylkit.04.hypo_methylated.txt",sep='\t', quote=FALSE)
myDiff25p = get.methylDiff(myDiff,difference=25,qvalue=0.01)
write.table(myDiff25p,"Methylkit.04.differentialy_methylated.txt",sep='\t', quote=FALSE)
pdf("Methylkit.04.DiffMethPerChr.pdf",width=12,height=12); diffMethPerChr(myDiff,plot=TRUE,qvalue.cutoff=0.01,meth.cutoff=25); dev.off()

gene.obj=read.transcript.features("/Annotations/UCSC_database/refseq.hg38.bed.txt")

diffAnn = annotate.WithGenicParts(myDiff25p, gene.obj)
write.table(getAssociationWithTSS(diffAnn),"Methylkit.05.AssociationWithTSS.txt", sep='\t', quote=FALSE)

pdf("Methylkit.05.TargetAnnotation.piechart1.pdf",width=12,height=12); plotTargetAnnotation(diffAnn, precedence = TRUE, main ="differential methylation annotation"); dev.off()

write.table(getFeatsWithTargetsStats(diffAnn, percentage = TRUE),"Methylkit.05.FeatsWithStargetsStats.txt",sep='\t', quote=FALSE)

tmpfiles <- dir(path=getwd(), pattern="*.CpG_report.txt")
file.remove(tmpfiles)

1

#########

Altuna Akalin

unread,
Jun 1, 2016, 10:31:42 AM6/1/16
to methylkit_...@googlegroups.com
Hi Albert,
just one comment, being able to read bismark files is now a part of the development version of methylKit. Here is the full list of current, changes. We changed some function names in order to make naming convention coherent, old names will still work for a while. Maybe you can test your pipeline with the development version. 

methylKit 0.9.6
--------------
IMPROVEMENTS AND BUG FIXES

* changes to following function names:
  read() to methRead()  
  read.bismark to processBismarkAln()
  adjust.methylC() to adjustMethylC() 
  get.methylDiff() to getMethylDiff()
  annotate.WithFeature() to annotateWithFeature()
  annotate.WithFeature.Flank() to annotateWithFeatureFlank()
  annotate.WithGenicParts() to annotateWithGenicParts()
  read.bed() to readBed()
  read.feature.flank() to readFeatureFlank()
  read.transcript.features() to readTranscriptFeatures()
* Improved documentation for methRead() (old read())
* Ported the Perl script for methylation base calling to C/C++ style, 
  for easier integration via Rcpp. Contributed by Alexander Gosdschan.
* methRead() uses data.table::fread() to read files faster.

NEW FUNCTIONS AND FEATURES

* new function methSeg() can segment methylation (methylRaw objects) and
  differential methylation (methylDiff objects) profiles to segments.
  Associated function methSeg2bed() creates BED files from segments.
  see ?methSeg. A test is added to check this in R CMD check.
  Contributed by Arsene Wabo and Alexander Gosdschan.

* Now, bismark cytosine report and coverage files can be read using methRead()
  pipeline argument. see ?methRead

* new tabix based classes methylRawDB, methylRawListDB, methylBaseDB, 
  methylDiffDB and respective methods implemented. Tests are updated to 
  check proper function in R CMD check. Contributed by Alexander Gosdschan.

* calculateDiffMeth() now supports basic overdispersion correction and multiple 
  methods for pvalue correction. The function also now handles covariates 
  such as age,sex etc. A test is added to check this in R CMD check. 
  Contributed by Adrian Bierling.

* New function calculateDiffMethDSS() is using beta-binomial model from DSS 
  package to calculate differential methylation. Contributed by 
  Dhruva Chandramohan with modifications from Altuna Akalin. Original function
  written by Hao Wao in DSS package.

* dataSim creates a methylBase object with simulated methylation data. 
  A test is added to check this in R CMD check. Contributed by Adrian Bierling.


Best,
Altuna

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.

ahmad.m...@gmail.com

unread,
Nov 23, 2017, 10:13:40 AM11/23/17
to methylkit_discussion
Hi

I want use this readBismarkCytosineReport function but after running :
devtools::source_gist("4839e615e2401d73fe51")

I faced with error
Error in r_files[[which]] : invalid subscript type 'closure'


How can I fix it?

Thanks

Altuna Akalin

unread,
Nov 23, 2017, 10:23:36 AM11/23/17
to methylkit_...@googlegroups.com
Install the latest methylKit version and read the readMeth help, this is now integrated there

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
--
Sent from mobile, excuse the brevity

ahmad.m...@gmail.com

unread,
Nov 25, 2017, 5:06:21 AM11/25/17
to methylkit_discussion
Dear Altuna,

I have checked and re-install methylKit several times but still I could not insert CpG_report files in it.

New Bismark version gives several files : bedGraph, Cov and CpG but they are not sync with MethylKit.

Does anybody know straightforward pipeline for Bismark output files?? 

Or How can I convert Bismark output to Methylkit acceptable format?

reagrds,
Ahamd
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsub...@googlegroups.com.

To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.

Altuna Akalin

unread,
Nov 25, 2017, 5:59:50 AM11/25/17
to methylkit_...@googlegroups.com
from the methRead help:
Bismark aligner can output methylation information per base in multiple formats. With pipeline='bismarkCoverage', the function reads bismark coverage files, which have chr,start,end, number of cytosines (methylated bases) and number of thymines (unmethylated bases) format. If bismark coverage files are used the function will not have the strand information,so beware of that fact. With pipeline='bismarkCytosineReport', the function expects cytosine report files from Bismark, which have chr,start, strand, number of cytosines (methylated bases) , number of thymines (unmethylated bases),context and trinucletide context format.

To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsubscrib...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
--
Sent from mobile, excuse the brevity

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsub...@googlegroups.com.
To post to this group, send email to methylkit_discussion@googlegroups.com.

Seyed Ahmad Mousavi

unread,
Nov 25, 2017, 6:39:08 AM11/25/17
to methylkit_...@googlegroups.com, aak...@gmail.com
Dear Altuna,

Thank you so much for prompt reply and complete answer.

Sorry for asking too much questions:


Here is my code and error I faced:

myobj=methRead( f, sample.id=list("S1","S2"),assembly="mm10",treatment= c(0,1), pipeline="bismarkCoverage")

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘methRead’ for signature ‘"character", "list", "character"’


 It gave similar error for bismarkCytosineReport too.

Please let me know how to pass this error too. 






To post to this group, send email to methylkit_discussion@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "methylkit_discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/methylkit_discussion/89gjErcOmmU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to methylkit_discussion+unsub...@googlegroups.com.

Seyed Ahmad Mousavi

unread,
Nov 26, 2017, 6:58:55 AM11/26/17
to Altuna Akalin, methylkit_...@googlegroups.com
dear Altuna ,

It works but  I have 2 other problems,

1-  When I wanted to draw clustering charts I faced with following error:

The "ward" method has been renamed to "ward.D"; note new "ward.D2"
Error in hclust(d, HCLUST.METHODS[hclust.method]) : 
  NA/NaN/Inf in foreign function call (arg 11)
In addition: Warning message:
In cor(t(x), method = METHODS[method]) : the standard deviation is zero


2- I have 20 samples which are divide into 7 groups, How can I make comparison among all conditions?


Kind regards,
Ahmad

On Sat, Nov 25, 2017 at 4:28 PM, Altuna Akalin <aak...@gmail.com> wrote:
f should be a list

To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsub...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
--
Sent from mobile, excuse the brevity

--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discussion+unsub...@googlegroups.com.
To post to this group, send email to methylkit_discussion@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "methylkit_discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/methylkit_discussion/89gjErcOmmU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to methylkit_discussion+unsub...@googlegroups.com.

To post to this group, send email to methylkit_discussion@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.
--

Anita Tripathi

unread,
Jan 27, 2025, 12:38:27 PMJan 27
to methylkit_discussion

Dear Dr. Altuna
I Also have an issue , which file to used after running bismark methylation extractor if I want to perform differential methylation analysis for all 3 contexts through methylkit, and what could be the possible methRead input parameter structure.


--
You received this message because you are subscribed to a topic in the Google Groups "methylkit_discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/methylkit_discussion/89gjErcOmmU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to methylkit_discussion+unsub...@googlegroups.com.
To post to this group, send email to methylkit_...@googlegroups.com.
Visit this group at https://groups.google.com/group/methylkit_discussion.
For more options, visit https://groups.google.com/d/optout.

Alexander Blume

unread,
Jan 30, 2025, 5:16:06 PMJan 30
to methylkit_...@googlegroups.com
Hi Anita,


For keeping track of questions and answers, it would be better if you would open up a new thread for your question next time. Of course you can link to older threads, but this makes it easier for us to follow up.


Now back to your question. To perform differential methylation analysis for all 3 contexts (CpG, CHG, CHH) using methylKit after running Bismark methylation extractor, you should use the comprehensive cytosine report files. These files contain methylation information for all cytosines regardless of context.

For the methRead function call, you can generally follow the vignette (https://www.bioconductor.org/packages/devel/bioc/vignettes/methylKit/inst/doc/methylKit.html#21_Reading_the_methylation_call_files), but you need to adjust the pipeline argument.

For example:

```R
myobj = methRead(file.list,
                 sample.id = list("sample1", "sample2", "sample3", "sample4"),
                 assembly = "genome_assembly",
                 treatment = c(1,1,0,0),
                 context = "CpG",
                 pipeline = "bismarkCytosineReport",
                 mincov = 10)
```

Replace "file.list" with your list of cytosine report file paths, "sample1", etc. with your sample IDs, "genome_assembly" with the appropriate assembly, and adjust the treatment vector to match your experimental design. The "context" parameter can be set to "CpG", "CHG", or "CHH" depending on which context you want to analyze.



Best,
Alex


To unsubscribe from this group and stop receiving emails from it, send an email to methylkit_discus...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/methylkit_discussion/3e5b774f-7f36-478e-855b-fd0d44a68c0bn%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages