heterozygous calls


Shiaoman Chao

Jun 29, 2015, 6:38:11 PM
to tas...@googlegroups.com
Hi,

I'm testing the TASSEL 5 GBS v2 pipeline on sequence data from 45 durum wheat samples, mostly cultivars. (Durum is a tetraploid wheat, by the way.) The same data has already been analyzed with the original version of the TASSEL pipeline. I set the tag length to 100, and the v2 pipeline called 6X more SNP variants, which is quite impressive. However, v2 also gave an overall het call rate of 30%, compared with <1% from the original version.

I was wondering whether the quantitative SNP calling method was implemented in v2. Increasing the -c parameter from 4 to 10 in the TagExportToFastqPlugin only reduced the number of variants called; it didn't reduce the number of hets. I used the default -eR of 0.01 in the ProductionSNPCallerPluginV2. Other than these two parameters, I'm not sure what else I could adjust.

Thanks for your thoughts.

Shiaoman

Lynn Carol Johnson

Jun 30, 2015, 1:48:42 PM
to tas...@googlegroups.com
Hi Shiaoman -

Yes, the GBS v2 pipeline does use a quantitative SNP calling method. The value given for the -eR parameter is used when calculating the likelihood ratio cutoffs for quantitative SNP calling.
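To make the idea of a likelihood-ratio cutoff concrete, here is a minimal Python sketch of a generic biallelic genotype caller. It is not TASSEL's implementation: the binomial read model, the function name `genotype_call` and the `lr_cutoff` value are assumptions for illustration, and the way ProductionSNPCallerPluginV2 turns -eR into its cutoffs may differ. Under this model, a single minor-allele read at moderate depth is attributed to sequencing error rather than called as a het.

```python
import math


def genotype_call(depth_a, depth_b, error_rate=0.01, lr_cutoff=10.0):
    """Hypothetical biallelic genotype caller (not TASSEL's code).

    Model: a homozygote shows the other allele only through sequencing
    error (rate error_rate); a heterozygote shows each allele with
    probability 0.5. The binomial coefficient is identical for all three
    genotypes, so it cancels and is left out of the log-likelihoods.
    """
    n = depth_a + depth_b
    if n == 0:
        return "N"  # no reads at all: missing
    log_lik = {
        "AA": depth_a * math.log(1 - error_rate) + depth_b * math.log(error_rate),
        "BB": depth_b * math.log(1 - error_rate) + depth_a * math.log(error_rate),
        "AB": n * math.log(0.5),
    }
    ranked = sorted(log_lik.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_ll), (_, runner_up_ll) = ranked[0], ranked[1]
    # Call only if the best genotype is at least lr_cutoff times more likely
    # than the runner-up; otherwise report missing.
    if best_ll - runner_up_ll < math.log(lr_cutoff):
        return "N"
    return best
```

With these defaults, `genotype_call(10, 1)` returns "AA", while `genotype_call(3, 1)` returns "N" because no genotype clears the cutoff.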

To reduce the number of hets, you can either filter out the excessively heterozygous SNPs or try more stringent parameters at the alignment step (Bowtie/BWA).
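Filtering out excessively heterozygous SNPs can be sketched as below, assuming genotype calls are held in a dict mapping a SNP ID to one call per taxon, coded "AA", "AB", "BB" or "N" for missing. This encoding, the function names and the 5% `max_het` threshold are hypothetical, not TASSEL's; for a panel of mostly inbred cultivars, true hets should be rare, so a low threshold is a reasonable place to start.

```python
def het_frequency(calls):
    """Proportion of non-missing calls that are heterozygous ("AB")."""
    called = [c for c in calls if c != "N"]
    if not called:
        return 0.0  # nothing called at this site
    return sum(1 for c in called if c == "AB") / len(called)


def filter_excess_het(snps, max_het=0.05):
    """Keep only SNPs whose het frequency is at or below max_het.

    snps: hypothetical dict of SNP ID -> list of per-taxon calls,
    coded "AA", "AB", "BB", or "N" (missing).
    """
    return {sid: calls for sid, calls in snps.items()
            if het_frequency(calls) <= max_het}


snps = {
    "S1_1000": ["AA", "BB", "AA", "AB"],  # 1 het out of 4 calls
    "S1_2000": ["AA", "BB", "N", "BB"],   # no hets
}
kept = filter_excess_het(snps)
```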

Thanks - Lynn

--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tassel/0d840277-465c-406d-8597-e7eaae4acbd5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shiaoman Chao

Jul 1, 2015, 4:34:04 PM
to tas...@googlegroups.com, lc...@cornell.edu
Thanks Lynn.

I have been using bwa as the aligner. When I compared bwa and Bowtie using the original TASSEL GBS pipeline, I found that bwa aligned far fewer SNPs to the D genome, which durum wheat doesn't have, even though the D-genome sequences are included in the wheat reference. Since the tag length is longer now, I guess I can try increasing the bwa alignment stringency.
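One way to tighten alignments without changing the aligner itself is to filter its SAM output. The sketch below is a swapped-in post-alignment filter, not a bwa option: it drops SAM header lines, unmapped reads, reads with mapping quality below `min_mapq`, and reads whose NM tag (edit distance to the reference) exceeds `max_edit_distance`. The function name and both thresholds are hypothetical; the alignment-time parameters themselves are documented in the bwa manual.

```python
def passes_alignment_filter(sam_line, min_mapq=20, max_edit_distance=2):
    """Return True if a SAM record meets hypothetical stringency thresholds."""
    if sam_line.startswith("@"):
        return False  # SAM header line, not an alignment
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & 0x4:
        return False  # read is unmapped
    if mapq < min_mapq:
        return False  # alignment position is too ambiguous
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields
        if tag.startswith("NM:i:"):
            return int(tag[5:]) <= max_edit_distance
    return True  # no NM tag present: judge on MAPQ alone


record = "tag1\t0\tchr1A\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:1"
```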

Thanks,
Shiaoman

Shiaoman Chao

Jul 13, 2015, 10:49:37 AM
to tas...@googlegroups.com, lc...@cornell.edu
Hi Ed, Lynn,

I was wondering why the v2 pipeline didn't implement the parameter F, the inbreeding coefficient. My guess is that using F helped filter out a lot of false het SNP calls in the 'old' TASSEL pipeline, although I don't have results from a run without F to compare against. Jesse suggested we do post-filtering to remove hets and missing calls.
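The post-filtering Jesse suggested could look like this minimal sketch, which uses a hypothetical "AA"/"AB"/"BB"/"N" call encoding: het calls are set to missing, then SNPs whose missing-call rate exceeds `max_missing` are dropped. The function name and the 20% threshold are assumptions for illustration only.

```python
def post_filter(snps, max_missing=0.2):
    """Set het calls to missing, then drop SNPs with too much missing data.

    snps: hypothetical dict of SNP ID -> per-taxon calls ("AA", "AB", "BB", "N").
    """
    filtered = {}
    for sid, calls in snps.items():
        if not calls:
            continue  # no taxa: nothing to keep
        cleaned = ["N" if c == "AB" else c for c in calls]
        missing_rate = cleaned.count("N") / len(cleaned)
        if missing_rate <= max_missing:
            filtered[sid] = cleaned
    return filtered


result = post_filter({
    "S2_500": ["AA", "AB", "BB", "AA", "BB"],  # 1 of 5 becomes missing
    "S2_900": ["AB", "AB", "AA"],              # 2 of 3 become missing
})
```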

Thanks,
Shiaoman

Edward S. Buckler

Jul 13, 2015, 8:58:09 PM
to tas...@googlegroups.com, Lynn Carol Johnson
Hi Shiaoman-
There is a new characterization plugin that reports results to the DB and a text file.  We then want users to come up with their own criteria, and return a score or list of good and bad SNPs.  
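A user-defined criterion of the kind described here might look like the following sketch. It assumes each SNP's statistics are already in a dict keyed by the column names of the characterization text file (such as propCovered, gapDepthProp and hetFreq_DGE2). The function name `classify_snps` and all thresholds are hypothetical. Because hetFreq_DGE2 is not reported for every site, a missing value is treated as passing.

```python
def classify_snps(rows, min_covered=0.5, max_gap=0.05, max_het=0.1):
    """Split SNPs into (good, bad) lists of (Chromosome, PositionID).

    rows: dicts keyed by the characterization file's column names.
    All thresholds are hypothetical examples, not recommended values.
    """
    good, bad = [], []
    for row in rows:
        het = row.get("hetFreq_DGE2")  # may be missing for some sites
        ok = (
            row["propCovered"] >= min_covered
            and row["gapDepthProp"] <= max_gap
            and (het is None or het <= max_het)
        )
        (good if ok else bad).append((row["Chromosome"], row["PositionID"]))
    return good, bad


rows = [
    {"Chromosome": "1A", "PositionID": "101", "propCovered": 0.9,
     "gapDepthProp": 0.0, "hetFreq_DGE2": 0.02},
    {"Chromosome": "1A", "PositionID": "202", "propCovered": 0.9,
     "gapDepthProp": 0.0, "hetFreq_DGE2": 0.40},
]
good, bad = classify_snps(rows)
```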

Lynn can provide further details if the online docs are not sufficient.

Cheers-
Ed


Shiaoman Chao

Jul 14, 2015, 7:16:19 PM
to tas...@googlegroups.com, lc...@cornell.edu
Hi Ed,

You were probably referring to the SNPQualityProfilerPlugin. I ran this plugin and got outputStat.txt; the headers reported in the text file are listed below. I can see that this information would be a guide for filtering the SNPs. It would be good if Lynn could explain what some of the headers mean, such as minor2DepthProp, propCovered and propCovered2, and what GE2 and DGE2 stand for.

Chromosome
PositionID
avgDepth
minorDepthProp
minor2DepthProp
gapDepthProp
propCovered
propCovered2
taxaCntWithMinorAlleleGE2
genotypeCnt
minorAlleleFreqGE2
hetFreq_DGE2
inbredF_DGE2
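A minimal reader for outputStat.txt, assuming it is tab-delimited with one header row containing the column names listed above (the delimiter is an assumption; check your own file). Chromosome and PositionID are kept as strings, empty or NaN cells become None, and every other column is parsed as a float. The function name `read_snp_stats` is hypothetical.

```python
import csv

TEXT_COLUMNS = {"Chromosome", "PositionID"}


def read_snp_stats(lines):
    """Parse SNPQualityProfilerPlugin-style output into a list of dicts.

    Assumes tab-delimited text with a header row; lines can be an open
    file or any iterable of strings.
    """
    rows = []
    for record in csv.DictReader(lines, delimiter="\t"):
        row = {}
        for key, value in record.items():
            if key is None:
                continue  # extra cells beyond the header: ignore
            if key in TEXT_COLUMNS:
                row[key] = value
            elif value in (None, "", "NaN"):
                row[key] = None  # statistic not reported for this site
            else:
                row[key] = float(value)
        rows.append(row)
    return rows
```

Typical use would be `with open("outputStat.txt", newline="") as fh: stats = read_snp_stats(fh)`.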

Thanks,
Shiaoman

Lynn Carol Johnson

Jul 15, 2015, 6:58:43 AM
to tas...@googlegroups.com
Hi Shiaoman -

Listed below is a description of the SNPQualityProfiler output data. 
  • avgDepth: Average number of times the minor allele appeared in the taxa (total number of times an allele appeared in a taxon divided by the number of taxa)
  • minorDepthProp – the proportion of total depth where minor allele 1 appeared (alleleDepths[1]/total_depths)
  • minor2DepthProp – the proportion of total depth where minor allele 2 appeared (alleleDepths[2]/total_depths)
  • gapDepthProp – proportion of the total depth that represents a gap (allele = GAP_ALLELE, gapDepth/total_depths)
  • propCovered – proportion of taxa that contained the allele
  • propCovered2 – proportion of taxa where the alleles appeared > 1 time
  • taxaCntWithMinorAlleleGE2 – number of taxa with the 2nd minor allele

The following data only appear if there are more than 2 alleles:
  • genotypeCnt: total number of all genotypes (homoMajor, het, homoMinor)
  • minorAlleleFreqGE2 – minor allele frequency (calculated as 1 - major allele frequency)
  • hetFreq_DGE2 – proportion of heterozygotes
  • inbredF_DGE2 – inbreeding coefficient (F), calculated as 1 - (proportion of hets / expected hets)
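The definitions above can be sketched in Python for a single site. The input layout is an assumption: one `[major, minor1, minor2, gap]` depth list per taxon, with total depth including the gap reads. avgDepth follows the parenthetical definition (total depth divided by the number of taxa), and inbredF uses the biallelic expected heterozygosity 2p(1 - p), where p is the major allele frequency. This mirrors the definitions as written, not the plugin's source code; the function names are hypothetical.

```python
def depth_stats(taxon_depths):
    """Depth-based site statistics, following the definitions above.

    taxon_depths: hypothetical layout, one [major, minor1, minor2, gap]
    read-depth list per taxon.
    """
    n_taxa = len(taxon_depths)
    totals = [sum(d[i] for d in taxon_depths) for i in range(4)]
    total_depth = sum(totals)  # includes gap reads
    if n_taxa == 0 or total_depth == 0:
        return None
    per_taxon = [sum(d) for d in taxon_depths]
    return {
        "avgDepth": total_depth / n_taxa,
        "minorDepthProp": totals[1] / total_depth,
        "minor2DepthProp": totals[2] / total_depth,
        "gapDepthProp": totals[3] / total_depth,
        "propCovered": sum(1 for s in per_taxon if s > 0) / n_taxa,
        "propCovered2": sum(1 for s in per_taxon if s > 1) / n_taxa,
    }


def inbred_f(n_hom_major, n_het, n_hom_minor):
    """F = 1 - observed het proportion / expected het proportion (biallelic)."""
    n = n_hom_major + n_het + n_hom_minor
    if n == 0:
        return None
    p = (2 * n_hom_major + n_het) / (2 * n)  # major allele frequency
    expected_het = 2 * p * (1 - p)
    if expected_het == 0:
        return None  # monomorphic site: F is undefined
    return 1 - (n_het / n) / expected_het
```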
Lynn

Edward S. Buckler

Jul 15, 2015, 7:07:13 AM
to tas...@googlegroups.com
"The following data only appear if there are more than 2 alleles"
This should be "more than one allele".
-Ed

Shiaoman Chao

Jul 15, 2015, 2:45:52 PM
to tas...@googlegroups.com
Perfect.  Thanks for the clarifications.  I will study this file a bit and figure out how to use the information for SNP filtering.

Thanks,
Shiaoman



Lynn Carol Johnson

Jul 16, 2015, 8:54:48 AM
to tas...@googlegroups.com
I have added this data to the SNPQualityProfiler section of the GBSv2 wiki. If you find other sections that need clarification, please let me know and I will update them. We hope the wiki pages can answer many of our users' questions.


Thanks - Lynn

jlblanc...@gmail.com

Aug 26, 2015, 9:29:41 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage, lc...@cornell.edu
Hi Lynn,
It seems that the SNPQualityProfilerPlugin is not working correctly. It reports gapDepthProp values of 0.0000000 for loci that appear to have gaps in some samples, as I observed in the TASSEL GUI.

cheers
José.