We ignore the Illumina quality filters. We use the minimum tag count (-c option) instead. The more often a given tag has been observed (across all of the samples), the more likely it is to be real.
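A minimal Python sketch of the minimum-tag-count idea, assuming each sample's reads have already been reduced to tag sequences. The function name, toy tags and data layout are hypothetical, and this is not TASSEL's implementation of the -c option:

```python
from collections import Counter

def filter_tags_by_min_count(tags_by_sample, min_count):
    """Return {tag: total_count} for tags seen at least min_count times across all samples."""
    totals = Counter()
    for tags in tags_by_sample.values():
        totals.update(tags)  # pool counts across samples
    return {tag: n for tag, n in totals.items() if n >= min_count}

tags_by_sample = {
    "S1": ["ACGT", "ACGT", "TTGA"],
    "S2": ["ACGT", "GGCC"],
    "S3": ["ACGT", "GGCC", "ACGA"],  # ACGA and TTGA are each seen only once
}
kept = filter_tags_by_min_count(tags_by_sample, min_count=2)
print(kept)  # {'ACGT': 4, 'GGCC': 2}
```

In this sketch counts are pooled across samples, so one-off tags (often sequencing errors) fall below the cutoff while tags seen repeatedly survive.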
Best,
Jeff
--
Jeff Glaubitz
Project Manager
Genetic Architecture of Maize and Teosinte
National Science Foundation award 0820619
Institute for Genomic Diversity
Cornell University
175 Biotechnology Bldg
Ithaca, NY 14853
Phone: 607-255-1386
--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/tassel/fbab61d9-e2a2-4ce1-a388-1fc28f59c723%40googlegroups.com?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.
> What if all the bases at a specific SNP have low quality scores, even though it passes the tag count?
Then the SNP will get called (provided that those tags align to a unique genomic position). If the SNP is merely an artifact of sequencing errors, then it will hopefully be removed at subsequent filtering steps (e.g., if it is too heterozygous among inbred samples, etc.).
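As an illustration of the kind of downstream filter mentioned above, here is a minimal Python sketch that drops SNPs whose heterozygosity among inbred samples exceeds a threshold. The two-letter genotype encoding, the function names and the 0.1 cutoff are assumptions for illustration, not TASSEL defaults:

```python
def heterozygosity(calls):
    """Fraction of non-missing calls that are heterozygous.

    calls: two-letter genotype strings such as "AA", "AG", "GG"; None marks missing data.
    Returns 0.0 when every call is missing.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    het = sum(1 for c in observed if c[0] != c[1])
    return het / len(observed)

def filter_snps_in_inbreds(snp_calls, max_het=0.1):
    """Keep SNPs whose heterozygosity among inbred samples is at most max_het."""
    return {snp: calls for snp, calls in snp_calls.items()
            if heterozygosity(calls) <= max_het}

snp_calls = {
    "snp1": ["AA", "GG", "AA", None, "GG"],  # 0 of 4 called genotypes heterozygous
    "snp2": ["AG", "AG", "AA", "AG", "GG"],  # 3 of 5 heterozygous: implausible in inbreds
}
kept = filter_snps_in_inbreds(snp_calls)
print(list(kept))  # ['snp1']
```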
Best,
Jeff
From: tas...@googlegroups.com [mailto:tas...@googlegroups.com] On Behalf Of peter
Sent: Thursday, May 30, 2013 1:55 PM
To: tas...@googlegroups.com
Subject: Re: [TASSEL-Group] how to provide my own quality cutoff in GBS pipeline
Thanks Jeff for the kind reply.
I did read the TASSEL documentation, and it says: “We have found that perfectly good reads – exactly matching a 64 base tag that we have seen many times – can fail to pass Illumina’s filters.”
Is that the reason you don't use or provide any quality cutoff? What if all the bases at a specific SNP have low quality scores, even though it passes the tag count? Thanks.
Best,
Peter
On Thursday, May 30, 2013 1:39:19 PM UTC-4, Jeff Glaubitz wrote:
We ignore the Illumina quality filters. We use the minimum tag count (-c option) instead. The more often a given tag has been observed (across all of the samples), the more likely it is to be real.
Best,
Jeff
From: tas...@googlegroups.com [mailto:tas...@googlegroups.com] On Behalf Of peter
Sent: Thursday, May 30, 2013 1:34 PM
To: tas...@googlegroups.com
Subject: [TASSEL-Group] how to provide my own quality cutoff in GBS pipeline
Hi all,
I am curious whether there is a way for me to supply my own quality cutoff. I am aware that you use the Illumina quality filters. What quality cutoff do those Illumina filters use? Thanks a lot!
Best,
Peter
No, there is no Q10 filter in the current pipeline. It has evolved considerably since the Elshire et al. 2011 paper. The current pipeline completely ignores the Illumina quality scores.
Also, your summary of the Elshire et al. 2011 paper is inaccurate. Here is what it says:
“To generate a reference set of 64 base sequence tags to be included in a presence/absence genotype table, only reads with a minimum Q-score of 10 across the first 72 bases) and that occurred at least twice were kept. We opted to use this somewhat low-stringency minimum Q-score cutoff to maximize the number of useful sequence tags. Sequence tags containing random sequencing errors should not occur multiple times in multiple samples and should not map genetically, so they should be filtered out in subsequent steps. To this set of reference tags, the expected 64 base tags from an in silico ApeKI digest of the maize reference genome, B73 RefGen v1 [21], were added (with fragments shorter than 64 bases filled with polyA, as above). To fill in the observed counts in the genotype table, a second pass across the reads for each DNA sample was performed. In this second pass, 64 base reads were counted for each sample (and the count added to the genotype table) if they perfectly matched one of the reference tags, regardless of their minimum Q score. The resulting genotype table was then filtered to remove tags that occurred in 10 or fewer DNA samples; this should remove most of the sequencing errors.”
In other words, it used a Q10 filter in the first step to come up with a master list of tags. It then built a genotype table (i.e., a TBT) by including all reads that perfectly matched a tag in this master list, regardless of their quality score.
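The two-pass procedure quoted above can be sketched in Python roughly as follows. This is a simplified illustration, assuming reads are already demultiplexed and barcode-trimmed, qualities are Phred+33, and the tag is simply the first tag_len bases; polyA padding of short fragments is omitted, and all function and parameter names are hypothetical. Toy lengths are used in the example so it stays readable:

```python
from collections import Counter, defaultdict

def min_phred(qual, window, offset=33):
    """Lowest Phred score among the first `window` quality characters (Phred+33 assumed)."""
    return min(ord(ch) - offset for ch in qual[:window])

def build_master_tags(all_reads, tag_len=64, q_window=72, min_q=10,
                      min_occurrences=2, in_silico_tags=()):
    """Pass 1: keep tags from reads with min Q >= min_q over the first q_window
    bases that occur at least min_occurrences times, then add in silico tags."""
    counts = Counter(
        seq[:tag_len]
        for seq, qual in all_reads
        if len(seq) >= tag_len and min_phred(qual, q_window) >= min_q
    )
    master = {tag for tag, n in counts.items() if n >= min_occurrences}
    return master | set(in_silico_tags)

def build_tag_table(reads_by_sample, master, tag_len=64, max_samples_to_drop=10):
    """Pass 2: count reads whose tag exactly matches a master tag, ignoring quality,
    then drop tags seen in max_samples_to_drop or fewer samples."""
    table = defaultdict(Counter)  # tag -> Counter mapping sample -> read count
    for sample, reads in reads_by_sample.items():
        for seq, _qual in reads:
            tag = seq[:tag_len]
            if tag in master:
                table[tag][sample] += 1
    return {tag: dict(c) for tag, c in table.items() if len(c) > max_samples_to_drop}

# Toy data: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
reads_by_sample = {
    "S1": [("ACGTAC", "IIIIII"), ("ACGTAA", "IIII#I")],
    "S2": [("ACGTTT", "IIIIII"), ("TTTTAC", "IIIIII")],
    "S3": [("ACGTGG", "#IIIII")],
}
all_reads = [read for reads in reads_by_sample.values() for read in reads]
master = build_master_tags(all_reads, tag_len=4, q_window=5)
table = build_tag_table(reads_by_sample, master, tag_len=4, max_samples_to_drop=1)
print(master)  # {'ACGT'}
print(table)   # {'ACGT': {'S1': 2, 'S2': 1, 'S3': 1}}
```

The low-quality reads in S1 and S3 fail the first-pass quality check but are still counted in the second pass because their tags exactly match a master tag, which mirrors the behavior described in the quote.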