Large discrepancy between DP and DP4 in VCF file

annie...@gmail.com

Sep 24, 2014, 3:15:03 PM
to bissn...@googlegroups.com
I tried SNP calling on a targeted bisulfite sequencing data set, using all of the default settings in BisSNP BisulfiteGenotyper. What I am noticing is that my DP is sometimes 50% larger than the sum of the values in the DP4 column. I understand that DP4 doesn't count low-quality bases, but it seems more is getting filtered out than that. I took a look at some of my sites in IGV and scrolled across the reads to check the mapping quality (the mmq setting) and the base quality scores. For most of them both were very high. I am not sure whether those are the only settings that can affect DP4. The problem is that I am getting genotype calls that just don't seem right when you look at the raw allele frequencies. Is there something I am missing? Is there a setting that I should maybe relax, or some way to figure out why reads that appear good in IGV are being treated as poor by the genotype caller? I would appreciate any suggestions.
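
To put a number on the discrepancy described above, a minimal Python sketch follows. It assumes DP and DP4 are both stored in the per-sample FORMAT column of the VCF; if a given file carries them in INFO instead, the parsing would need to change. The function name and the example values are made up for illustration and are not taken from the actual data.

```python
def dp_vs_dp4(format_keys, sample_values):
    """Compare DP with the sum of DP4 for one VCF sample column.

    format_keys   -- the FORMAT column, e.g. "GT:DP:DP4" (assumed layout)
    sample_values -- the matching sample column, e.g. "0/1:30:8,7,3,2"
    Returns (DP, sum of DP4, fraction of DP not counted in DP4).
    """
    fields = dict(zip(format_keys.split(":"), sample_values.split(":")))
    dp = int(fields["DP"])
    dp4_sum = sum(int(n) for n in fields["DP4"].split(","))
    dropped = (dp - dp4_sum) / dp if dp else 0.0
    return dp, dp4_sum, dropped


# Made-up example: 30 reads cover the site, but only 20 are counted in DP4.
dp, dp4_sum, dropped = dp_vs_dp4("GT:DP:DP4", "0/1:30:8,7,3,2")
print(dp, dp4_sum, round(dropped, 2))
```

Sorting sites by the dropped fraction is one way to pick which positions to inspect in IGV first.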

Annie

ping

Sep 24, 2014, 4:55:02 PM
to bissn...@googlegroups.com
Hi Annie,
Could you let me know which reads you think are good but are missing from the methylation/genotype call? Please show me the read information from the BAM file and the IGV browser. A lot of read filter rules are applied during genotype calling, and only a few of them can be visualized in the IGV browser.

yaping

--
You received this message because you are subscribed to the Google Groups "bissnp-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bissnp-help...@googlegroups.com.
To post to this group, send email to bissn...@googlegroups.com.
Visit this group at http://groups.google.com/group/bissnp-help.
For more options, visit https://groups.google.com/d/optout.



--
Yaping Liu Ph.D.

Postdoctoral Associate
Manolis Kellis Lab
Computer Science and Artificial Intelligence Lab (CSAIL)
Massachusetts Institute of Technology
Broad Institute of MIT and Harvard


annie...@gmail.com

Sep 25, 2014, 8:25:59 AM
to bissn...@googlegroups.com
Upon further review of the data, I do think the reads are getting filtered out primarily because the default mmq (minimum mapping quality) is set at 40. Obviously I can't change the quality of my data; I have to work with what I have. I've attached 2 SAM files with the reads that overlap the particular site I was looking at. I can see that the mapping quality scores are very low for a good majority of the reads. Can you give me any advice on what I might lower that threshold to and still be fairly accurate? Maybe 40 is best, but would 20 be decent? I am not trying to identify novel SNPs; I am only interested in differences in genotypes at known SNP sites. I would rather more sites be included that weren't ideal than think I have different genotypes where I really don't.
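
For reference, a rough Python sketch of the check described above: it reads SAM text, takes the MAPQ value from the fifth column, and counts how many reads would pass a threshold of 40 versus 20. Whether BisSNP keeps reads with MAPQ equal to the threshold is an assumption here (the sketch uses >=), and the function name and example reads are invented for illustration.

```python
def mapq_pass_counts(sam_lines, thresholds=(40, 20)):
    """Count reads whose MAPQ (SAM column 5) meets each threshold.

    Returns (total reads, {threshold: reads with MAPQ >= threshold}).
    The >= comparison is an assumption about how the caller filters.
    """
    mapqs = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        mapqs.append(int(line.split("\t")[4]))
    kept = {t: sum(q >= t for q in mapqs) for t in thresholds}
    return len(mapqs), kept


# Made-up reads; in practice pass an open file handle for one of the
# attached SAM files instead of this list.
example = [
    "@HD\tVN:1.4",
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t23\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tchr1\t101\t3\t4M\t*\t0\t0\tACGT\tIIII",
]
print(mapq_pass_counts(example))
```

Comparing the counts at 40 and 20 for the reads over one site shows how much coverage a lower threshold would recover there.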

Annie   
11_204_fetal_site.sam
G917_maternal_site.sam

ping

Sep 25, 2014, 9:55:38 AM
to bissn...@googlegroups.com
Hi Annie,
You could try this option to adjust the mapping quality threshold: -mmq 20
and this option for the base quality score threshold: -mbq 5
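
As a small illustration of where these two options go, the Python sketch below appends them to an existing command line. Only -mmq and -mbq come from this reply; the jar file name is a hypothetical placeholder, and the base command stands in for whatever BisulfiteGenotyper invocation was already being run with the default settings.

```python
def add_quality_thresholds(base_cmd, mmq=20, mbq=5):
    """Return a copy of base_cmd with the -mmq and -mbq options appended."""
    return list(base_cmd) + ["-mmq", str(mmq), "-mbq", str(mbq)]


# Placeholder command: the jar name is hypothetical, and the remaining
# arguments of the real run (reference, input BAM, outputs) are omitted.
base_cmd = ["java", "-jar", "BisSNP.jar", "-T", "BisulfiteGenotyper"]
print(" ".join(add_quality_thresholds(base_cmd)))
```

The resulting list can be passed to subprocess.run once the rest of the original arguments are filled in.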

yaping  
