Hi,

I'm quite new to epigenetics and DNA methylation analysis in general, so any help would be appreciated.
Apologies in advance if my questions seem too naive or have already been answered elsewhere.

To sum up my project:
I was recently tasked with the analysis of targeted bisulfite sequencing (BS-Seq) data from human patients.
Patients + controls were sequenced on 3 different runs (hello, batch effect!).
They used Illumina's TruSeq MethylCapture EPIC Library prep kit (107 Mb, 3,340,894 CpG sites) and the sequencing was performed on a NextSeq 500.
The data is paired-end (fastq R1 + fastq R2).
I performed the primary analysis with Trim Galore for trimming and Bismark for alignment (hg19) and methylation calling.
I am now planning to use methylKit to perform a differential methylation analysis using the logistic regression model with covariates and the Chi-squared test with overdispersion correction.
For the analysis, I am planning to follow the methylKit user guide written by Altuna Akalin
(https://bioconductor.org/packages/devel/bioc/vignettes/methylKit/inst/doc/methylKit.html#1_introduction), which was very helpful to a beginner like me, but I still have some doubts.
I have several questions:
1) I have a very small sample size (5 cases + 5 controls) => do you think the logistic regression model is the appropriate choice?
2) I was initially planning to use the processBismarkAln function to read methylation calls from the generated BAM files.
Unfortunately, I keep getting the RStudio fatal error described here (no solution was found): https://www.biostars.org/p/241210/
Therefore, I decided to use the readBismarkCoverage function posted on GitHub by Altuna Akalin: https://gist.github.com/al2na/4839e615e2401d73fe51
=> Is there a significant difference between these input methods? I noticed that I won't have the strand information, but I don't know how much that will affect the results...
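For concreteness, here is a minimal sketch of what a coverage-based input carries, written in plain Python rather than methylKit. It assumes Bismark's standard tab-separated coverage layout (chromosome, start, end, methylation percentage, methylated count, unmethylated count); the `CpGCall` and `parse_cov_line` names and the example line are hypothetical, made up for illustration only.

```python
from typing import NamedTuple


class CpGCall(NamedTuple):
    """One line of a Bismark coverage file (hypothetical helper type)."""
    chrom: str
    pos: int       # 1-based position, as written in the coverage file
    n_meth: int    # reads supporting a methylated C
    n_unmeth: int  # reads supporting an unmethylated C

    @property
    def coverage(self) -> int:
        return self.n_meth + self.n_unmeth


def parse_cov_line(line: str) -> CpGCall:
    # Assumed layout: chrom, start, end, percent methylation, count methylated,
    # count unmethylated. There is no strand column to read.
    chrom, start, _end, _pct, n_meth, n_unmeth = line.rstrip("\n").split("\t")
    return CpGCall(chrom, int(start), int(n_meth), int(n_unmeth))


# Made-up example line, not real data.
call = parse_cov_line("chr1\t10469\t10469\t75\t3\t1")
print(call, call.coverage)
```

The point of the sketch is only that each record holds a position and two counts, so any strand-aware step (for example merging the two cytosines of a CpG that sit on opposite strands) has nothing to key on unless the strand is supplied some other way.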
3) I'm working with targeted capture data, so I was thinking of generating a GRanges object containing the coordinates of the captured regions (provided by Illumina in a manifest BED file) and then using methylKit's selectByOverlap function to restrict my analysis to the captured regions.
=> I would really appreciate your opinion on this; I'm hesitating between selecting the targeted regions, a tiling window analysis, and the default parameters...
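To make the "restrict to captured regions" option concrete, here is a minimal sketch of the filtering logic in plain Python. It is not methylKit's selectByOverlap, only an illustration of the same idea. It assumes the manifest uses standard BED coordinates (0-based, half-open), that CpG positions are 1-based, and that the target intervals do not overlap each other; `build_index`, `in_targets` and all coordinates below are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(bed_intervals):
    """Group BED intervals (0-based, half-open) by chromosome, sorted by start."""
    index = defaultdict(list)
    for chrom, start, end in bed_intervals:
        index[chrom].append((start, end))
    for chrom in index:
        index[chrom].sort()
    return index


def in_targets(index, chrom, pos_1based):
    """True if a 1-based position falls inside one of the target intervals.

    Assumes the intervals on each chromosome are non-overlapping.
    """
    intervals = index.get(chrom, [])
    pos0 = pos_1based - 1  # convert to the 0-based BED coordinate system
    # Locate the last interval whose start is <= pos0.
    i = bisect_right(intervals, (pos0, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos0 < intervals[i][1]


# Made-up targets and CpG positions, for illustration only.
targets = build_index([("chr1", 100, 200), ("chr1", 500, 600)])
cpgs = [("chr1", 100), ("chr1", 101), ("chr1", 200), ("chr1", 201), ("chr2", 150)]
kept = [(c, p) for c, p in cpgs if in_targets(targets, c, p)]
print(kept)
```

The off-by-one between the BED manifest and 1-based CpG positions is the detail worth checking when building the GRanges object, since it decides whether CpGs on the region boundaries are kept or dropped.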
4) Bonus question about the primary analysis:
There seem to be divergent opinions about the post-alignment deduplication step for targeted capture BS-Seq data... see this Biostars post: https://www.biostars.org/p/328912/#328955
=> From your point of view, should I perform the post-alignment deduplication in my case?
Sorry to bother you, and sorry for the long post; any help would be appreciated!
Best regards,
--
You received this message because you are subscribed to the Google Groups "methylkit_discussion" group.