Many more A -> G and T -> C mutations than expected


RS

Jul 8, 2015, 11:00:38 AM
to bissn...@googlegroups.com
Hi Yaping,

First, thank you for writing this program.

I am having an issue that I hope you can help me with. My SNP VCF output consists mostly of A -> G and T -> C mutations. This is surprising, because it is the reverse of what I would expect, i.e. many G -> A and C -> T mutations indicative of cytosine deamination. This includes my control, which has the same genetic background as the reference genome.
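One quick way to quantify this pattern is to tally the REF>ALT pairs of single-base calls in the VCF. Because A>G read on one strand is T>C read on the other, the sketch below also folds each change onto a pyrimidine (C or T) reference base, so the A>G/T>C excess shows up as a single T>C class, while the G>A/C>T pattern expected from cytosine deamination shows up as C>T. This is a minimal, stdlib-only sketch that assumes an uncompressed VCF with the standard tab-separated columns; `snv_spectrum` is a hypothetical helper, not part of BisSNP.

```python
import sys
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def snv_spectrum(vcf_lines):
    """Count REF>ALT single-base substitutions in VCF text lines.

    Returns (raw, folded): raw keeps each change as written in the VCF;
    folded maps every change onto a C or T reference base, so A>G and
    T>C end up in the same class (likewise G>A and C>T).
    """
    raw, folded = Counter(), Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information lines, the header row and blanks
        fields = line.rstrip("\n").split("\t")
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):  # each ALT allele counted separately
            # keep only single-base substitutions between A/C/G/T
            if len(ref) != 1 or len(alt) != 1 or ref == alt:
                continue
            if ref not in "ACGT" or alt not in "ACGT":
                continue
            raw[f"{ref}>{alt}"] += 1
            if ref in "AG":  # purine reference: report the complementary strand
                ref_f, alt_f = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
            else:
                ref_f, alt_f = ref, alt
            folded[f"{ref_f}>{alt_f}"] += 1
    return raw, folded


if __name__ == "__main__":
    raw, folded = snv_spectrum(sys.stdin)
    for name, counts in (("raw", raw), ("folded", folded)):
        print(name, dict(counts.most_common()))
```

For example, piping a filtered VCF such as `sample01.snp.filtered.vcf` into the script (saved under a hypothetical name like `vcf_spectrum.py`) and comparing the folded T>C and C>T counts against another caller's VCF makes the difference in spectra explicit.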

I was wondering if you have ever seen this before, or have an idea what could cause this? 

Thanks,
Ray

ping

Jul 9, 2015, 3:59:43 PM
to bissn...@googlegroups.com
Hi Ray,
Thanks for your interest! Is this VCF file the raw SNP VCF or the output of the filtering step? Bisulfite-seq data usually have a high strand bias, which can lead to a high rate of false-positive SNP calls.
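One common way to put a number on strand bias is a Fisher's exact test on a 2x2 table of reference and alternate read counts split by strand, reported as a Phred-scaled p-value; this is the idea behind GATK's FisherStrand (FS) annotation. The sketch below is a stdlib-only illustration under that assumption: it is not BisSNP's own filter, and `fisher_exact_two_sided` and `phred_strand_bias` are hypothetical names. Large scores mean the alternate allele is supported mostly by reads from one strand. In bisulfite data a C/T difference on the converted strand cannot be separated from conversion, so some genuine SNPs are only visible on one strand; a score like this is best read as a diagnostic rather than a hard filter.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, n = a + b, a + b + c + d
    col1 = a + c
    denom = math.comb(n, row1)

    def prob(x):
        # hypergeometric probability of the table with x in the top-left cell
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # sum every table no more probable than the observed one;
    # the small relative tolerance absorbs floating-point ties
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def phred_strand_bias(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand-bias score: -10 * log10(Fisher p-value).

    Rows of the table are alleles (ref, alt); columns are strands (fwd, rev).
    """
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    # max(0.0, ...) turns -0.0 into 0.0; p <= 1 so the score is never negative
    return max(0.0, -10.0 * math.log10(max(p, 1e-300)))


if __name__ == "__main__":
    print(round(phred_strand_bias(10, 10, 10, 10), 2))  # alleles spread evenly over strands
    print(round(phred_strand_bias(20, 0, 0, 20), 2))    # ref only on fwd, alt only on rev
```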

yaping

--
You received this message because you are subscribed to the Google Groups "bissnp-help" group.



--
Yaping Liu, Ph.D.

Postdoctoral Associate
Manolis Kellis Lab
Computer Science and Artificial Intelligence Lab (CSAIL)
Massachusetts Institute of Technology
Broad Institute of MIT and Harvard


RS

Jul 17, 2015, 3:33:39 PM
to bissn...@googlegroups.com
Hi Yaping,

Thank you for getting back to me! Sorry for the delay in my reply; I was testing something out. I get the problematic result in my filtered SNP VCF too: almost exclusively A->G and T->C mutations. I tried another SNP-calling program and it worked as I expected, detecting other types of mutations, so I'm not sure what I'm doing wrong with BisSNP. I also tried your test dataset and it worked fine, but the test-dataset commands in your instructions have fewer steps than the general instructions for starting from scratch, so perhaps the difference lies there. My workflow is as follows:



java -Xmx20g -jar ./picard-tools-1.135/picard.jar AddOrReplaceReadGroups INPUT=sample01.bam OUTPUT=sample01.RG.bam RGID=sample01 RGLB=sample01 RGSM=sample01 SORT_ORDER=coordinate RGPL=illumina RGPU=run CREATE_INDEX=true

java -Xmx20g -jar ./picard-tools-1.135/picard.jar MarkDuplicates INPUT=sample01.RG.bam OUTPUT=sample01.RG.mdups.bam METRICS_FILE=sample01.RG.mdups.metric.txt CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT

java -Xmx20g -jar ./BisSNP-0.82.2.jar -R TAIR10.fa -I sample01.RG.mdups.bam -T BisulfiteCountCovariates -recalFile sample01_before.csv -knownSites known.vcf -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -nt 1

java -Xmx20g -jar ./BisSNP-0.82.2.jar -R TAIR10.fa -I sample01.RG.mdups.bam -o sample01.RG.mdups.recal.bam -T BisulfiteTableRecalibration -recalFile sample01_before.csv -maxQ 40

java -Xmx20g -jar ./BisSNP-0.82.2.jar -R TAIR10.fa -I sample01.RG.mdups.recal.bam -T BisulfiteCountCovariates -recalFile sample01_after.csv -knownSites known.vcf -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -nt 1

java -Xmx20g -jar ./BisulfiteAnalyzeCovariates-0.69.jar -recalFile sample01_before.csv -outputDir before_recal -ignoreQ 5 --max_quality_score 40
java -Xmx20g -jar ./BisulfiteAnalyzeCovariates-0.69.jar -recalFile sample01_after.csv -outputDir after_recal -ignoreQ 5 --max_quality_score 40

java -Xmx20g -jar ./BisSNP-0.82.2.jar -R TAIR10.fa -T BisulfiteGenotyper -I sample01.RG.mdups.recal.bam -vfn1 sample01.snp.raw.vcf -D known.vcf -stand_call_conf 30 -stand_emit_conf 0 -out_modes EMIT_VARIANTS_ONLY

java -Xmx20g -jar ./BisSNP-0.82.2.jar -R TAIR10.fa -T VCFpostprocess -oldVcf sample01.snp.raw.vcf -newVcf sample01.snp.filtered.vcf -snpVcf sample01.snp.raw.vcf -o sample01.snp.raw.filter.summary.txt
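The workflow above can be wrapped in a small script so that exactly the same steps are rerun for each input BAM (for example, the Bismark- and BSMAP-aligned files compared later in this thread). This is a minimal sketch, not part of BisSNP or Picard: the jar paths, flags and file names are copied from the commands above, while `workflow` and `run` are hypothetical helper names, and the default is a dry run that only prints each command.

```python
import shlex
import subprocess

# Paths and shared inputs copied from the commands in this thread.
PICARD = "./picard-tools-1.135/picard.jar"
BISSNP = "./BisSNP-0.82.2.jar"
ANALYZE = "./BisulfiteAnalyzeCovariates-0.69.jar"
REF, KNOWN = "TAIR10.fa", "known.vcf"
COVS = ["-cov", "ReadGroupCovariate", "-cov", "QualityScoreCovariate", "-cov", "CycleCovariate"]


def workflow(s):
    """Return the thread's command lines for sample name s, in order."""
    java = ["java", "-Xmx20g", "-jar"]
    return [
        java + [PICARD, "AddOrReplaceReadGroups", f"INPUT={s}.bam", f"OUTPUT={s}.RG.bam",
                f"RGID={s}", f"RGLB={s}", f"RGSM={s}", "SORT_ORDER=coordinate",
                "RGPL=illumina", "RGPU=run", "CREATE_INDEX=true"],
        java + [PICARD, "MarkDuplicates", f"INPUT={s}.RG.bam", f"OUTPUT={s}.RG.mdups.bam",
                f"METRICS_FILE={s}.RG.mdups.metric.txt", "CREATE_INDEX=true",
                "VALIDATION_STRINGENCY=SILENT"],
        java + [BISSNP, "-R", REF, "-I", f"{s}.RG.mdups.bam", "-T", "BisulfiteCountCovariates",
                "-recalFile", f"{s}_before.csv", "-knownSites", KNOWN] + COVS + ["-nt", "1"],
        java + [BISSNP, "-R", REF, "-I", f"{s}.RG.mdups.bam", "-o", f"{s}.RG.mdups.recal.bam",
                "-T", "BisulfiteTableRecalibration", "-recalFile", f"{s}_before.csv", "-maxQ", "40"],
        java + [BISSNP, "-R", REF, "-I", f"{s}.RG.mdups.recal.bam", "-T", "BisulfiteCountCovariates",
                "-recalFile", f"{s}_after.csv", "-knownSites", KNOWN] + COVS + ["-nt", "1"],
        java + [ANALYZE, "-recalFile", f"{s}_before.csv", "-outputDir", "before_recal",
                "-ignoreQ", "5", "--max_quality_score", "40"],
        java + [ANALYZE, "-recalFile", f"{s}_after.csv", "-outputDir", "after_recal",
                "-ignoreQ", "5", "--max_quality_score", "40"],
        java + [BISSNP, "-R", REF, "-T", "BisulfiteGenotyper", "-I", f"{s}.RG.mdups.recal.bam",
                "-vfn1", f"{s}.snp.raw.vcf", "-D", KNOWN, "-stand_call_conf", "30",
                "-stand_emit_conf", "0", "-out_modes", "EMIT_VARIANTS_ONLY"],
        java + [BISSNP, "-R", REF, "-T", "VCFpostprocess", "-oldVcf", f"{s}.snp.raw.vcf",
                "-newVcf", f"{s}.snp.filtered.vcf", "-snpVcf", f"{s}.snp.raw.vcf",
                "-o", f"{s}.snp.raw.filter.summary.txt"],
    ]


def run(sample, dry_run=True):
    """Print every step; execute them only when dry_run is False."""
    for cmd in workflow(sample):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run("sample01")
```

Keeping the commands in one list means a change made for one aligner's output (say, a different Picard option) is applied identically to every sample being compared.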

RS

Jul 20, 2015, 2:16:28 PM
to bissn...@googlegroups.com
I think it may have to do with my BAM alignment files. I recently tried BisSNP on BSMAP-aligned files, and it seemed to work fine; earlier I had been using Bismark-aligned files. Strangely, the other SNP caller I used showed the opposite: it worked on Bismark-aligned files but not on BSMAP-aligned files. So I need to dig into the alignment files and find out what difference in the BAM output makes BisSNP handle one better than the other (at least with my settings). I suppose the issue could also be at the Picard step.
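A first place to look is the aligner-specific optional tags, since Bismark and BSMAP record strand and conversion information differently. The sketch below assumes the commonly documented conventions (Bismark writes XM for the methylation call string plus XR/XG for read and genome conversion; BSMAP writes ZS for the strand, e.g. `++` or `-+`); check these against the documentation of the aligner versions actually used. It reads SAM text, for example from `samtools view sample01.bam | head -n 200000`, and `summarize_tags` is a hypothetical helper name.

```python
import sys
from collections import Counter

# Assumed tag conventions: Bismark writes XM (methylation call string),
# XR (read conversion) and XG (genome conversion); BSMAP writes ZS (strand).
ALIGNER_TAGS = {"XM": "Bismark", "XR": "Bismark", "XG": "Bismark", "ZS": "BSMAP"}
# Strand-describing tags whose values are worth tabulating (XM is per-base, so skipped).
STRAND_TAGS = {"XR", "XG", "ZS"}


def summarize_tags(sam_lines):
    """Count aligner-specific tags and strand-tag values in SAM text records."""
    present, values = Counter(), Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blanks
        fields = line.rstrip("\n").split("\t")
        # optional TAG:TYPE:VALUE fields follow the 11 mandatory SAM columns
        for opt in fields[11:]:
            parts = opt.split(":", 2)
            if len(parts) != 3:
                continue
            tag, _type, value = parts
            if tag in ALIGNER_TAGS:
                present[tag] += 1
            if tag in STRAND_TAGS:
                values[f"{tag}:{value}"] += 1
    return present, values


if __name__ == "__main__":
    present, values = summarize_tags(sys.stdin)
    for tag, n in present.most_common():
        print(f"{tag} ({ALIGNER_TAGS[tag]}-style): {n} reads")
    for key, n in values.most_common():
        print(f"  {key}: {n}")
```

Running it on the same region from both BAMs, before and after the Picard steps, shows whether the strand information a downstream caller might rely on survives each step and how reads are distributed across strands.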