How to provide my own quality cutoff in the GBS pipeline

peter

May 30, 2013, 1:34:25 PM

to tas...@googlegroups.com

Hi all,

Is there a way for me to supply my own quality cutoff? My understanding is that you use the Illumina quality filters. What quality cutoff do those filters apply? Thanks a lot!

Best,

Peter

Jeff Glaubitz

May 30, 2013, 1:39:19 PM

to tas...@googlegroups.com

We ignore the Illumina quality filters. We use the minimum tag count (-c option) instead. The more often a given tag has been observed (across all of the samples), the more likely it is real.
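To make the idea concrete, here is a minimal sketch of a minimum tag count filter. It is not TASSEL's code: the function name, the input layout (a dict of sample name to read sequences), and the thresholds are all assumptions for illustration. Only the idea of a minimum count across all samples (the -c option) comes from the reply above.

```python
from collections import Counter


def filter_tags_by_min_count(reads_by_sample, min_count, tag_length=64):
    """Keep tags observed at least `min_count` times across all samples.

    Hypothetical helper, not part of TASSEL. `reads_by_sample` maps a
    sample name to a list of read sequences (barcodes already removed).
    Quality scores are never consulted.
    """
    counts = Counter()
    for reads in reads_by_sample.values():
        for read in reads:
            # Assumption: reads shorter than the tag length are skipped.
            if len(read) >= tag_length:
                counts[read[:tag_length]] += 1
    # A tag survives only if its pooled count reaches the threshold.
    return {tag: n for tag, n in counts.items() if n >= min_count}


# Toy data with a short tag length so the example is easy to read.
reads = {
    "s1": ["ACGTACGTAA", "ACGTACGTCC", "TTTTTTTTGG"],
    "s2": ["ACGTACGTGG", "GGGGGGGGAA"],
}
kept = filter_tags_by_min_count(reads, min_count=2, tag_length=8)
```

With this toy input, only the tag seen in both samples passes; the two tags seen once each are dropped, which is the sense in which repeated observation stands in for a quality cutoff.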

Best,

Jeff

--

Jeff Glaubitz

Project Manager

Genetic Architecture of Maize and Teosinte

National Science Foundation award 0820619

http://www.panzea.org

Institute for Genomic Diversity

Cornell University

175 Biotechnology Bldg

Ithaca, NY 14853

Phone: 607-255-1386

jcg...@cornell.edu

--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tassel/fbab61d9-e2a2-4ce1-a388-1fc28f59c723%40googlegroups.com?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.

peter

May 30, 2013, 1:55:03 PM

to tas...@googlegroups.com

Thanks Jeff for the kind reply.

I did read the TASSEL documentation, which says: "We have found that perfectly good reads – exactly matching a 64 base tag that we have seen many times – can fail to pass Illumina’s filters."

Is that the reason you don't use or provide any quality cutoff? What if all the bases at a specific SNP have low quality scores, even though it passes the tag count? Thanks.

Best,

Peter

Jeff Glaubitz

May 30, 2013, 5:05:15 PM

to tas...@googlegroups.com

>What if all the bases at a specific SNP have low quality scores, even though it passes the tag count?

Then the SNP will get called (provided that those tags align to a unique genomic position). If the SNP is merely an artifact of sequencing errors, then it will hopefully be removed at subsequent filtering steps (e.g., if it is too heterozygous among inbred samples).
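As an illustration of the kind of downstream filter mentioned here, the sketch below drops SNPs whose heterozygous fraction among inbred samples exceeds a threshold. Everything in it is an assumption for illustration: the function names, the genotype encoding (a pair of alleles, or None for a missing call), and the threshold value. It does not reproduce any TASSEL plugin.

```python
def heterozygous_fraction(genotypes):
    """Fraction of called genotypes that are heterozygous.

    `genotypes` is a list of (allele1, allele2) tuples, with None
    marking a missing call. Hypothetical encoding for this sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


def filter_snps_by_heterozygosity(snp_genotypes, max_het):
    """Keep SNPs whose heterozygous fraction is at most `max_het`."""
    return {
        snp: calls
        for snp, calls in snp_genotypes.items()
        if heterozygous_fraction(calls) <= max_het
    }


# Toy data: inbred lines should be mostly homozygous at a real SNP.
snps = {
    "snp1": [("A", "A"), ("G", "G"), ("A", "A"), None],
    "snp2": [("A", "G"), ("A", "G"), ("A", "A"), ("A", "G")],
}
kept = filter_snps_by_heterozygosity(snps, max_het=0.1)
```

In this toy input, snp2 is heterozygous in three of four inbred samples, so it is removed, while snp1 is kept. The 0.1 threshold is arbitrary and would need to be chosen for the population at hand.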

Best,

Jeff


peter

May 31, 2013, 11:41:37 AM

to tas...@googlegroups.com

Thanks Jeff!

I also re-read the original paper today regarding the quality filter. In the first step, the paper doesn't use any quality filter on the raw reads. In the subsequent step, however, it uses a minimum quality cutoff of Q10 for filtering.

Not knowing what is going on is killing me. Is the Q10 filter applied in the current GBS pipeline? Thanks.

Best,

Peter

Jeff Glaubitz

May 31, 2013, 12:00:56 PM

to tas...@googlegroups.com

No, there is no Q10 filter in the current pipeline. The pipeline has evolved considerably since the Elshire et al. 2011 paper. The current pipeline completely ignores the Illumina quality scores.

Also, your summary of the Elshire et al. 2011 paper is inaccurate. Here is what it says:

“To generate a reference set of 64 base sequence tags to be included in a presence/absence genotype table, only reads with a minimum Q-score of 10 across the first 72 bases and that occurred at least twice were kept. We opted to use this somewhat low-stringency minimum Q-score cutoff to maximize the number of useful sequence tags. Sequence tags containing random sequencing errors should not occur multiple times in multiple samples and should not map genetically, so they should be filtered out in subsequent steps. To this set of reference tags, the expected 64 base tags from an in silico ApeKI digest of the maize reference genome, B73 RefGen v1 [21], were added (with fragments shorter than 64 bases filled with polyA, as above). To fill in the observed counts in the genotype table, a second pass across the reads for each DNA sample was performed. In this second pass, 64 base reads were counted for each sample (and the count added to the genotype table) if they perfectly matched one of the reference tags, regardless of their minimum Q score. The resulting genotype table was then filtered to remove tags that occurred in 10 or fewer DNA samples; this should remove most of the sequencing errors.”

In other words, it used a Q10 filter in the first step to come up with a master list of tags, and then built a genotype table (i.e., TBT) by including all reads that perfectly matched a tag in this master list, regardless of their quality score.
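The two-pass procedure quoted above can be sketched roughly as follows. This is an illustration of the description in the quoted paragraph, not the original code: the function names and the data layout (each read as a pair of sequence and per-base Q-scores) are assumptions, and the barcode handling, polyA filling, and in silico digest tags are left out.

```python
from collections import Counter, defaultdict


def build_reference_tags(reads, min_q=10, q_window=72, tag_length=64,
                         min_occurrences=2):
    """Pass 1: master tag list from reads that pass the Q-score filter.

    `reads` is an iterable of (sequence, qualities) pairs pooled over
    all samples. Hypothetical layout for this sketch.
    """
    counts = Counter()
    for seq, quals in reads:
        if len(seq) < tag_length or len(quals) < q_window:
            continue  # assumption: short reads are simply skipped here
        if min(quals[:q_window]) >= min_q:
            counts[seq[:tag_length]] += 1
    return {tag for tag, n in counts.items() if n >= min_occurrences}


def count_tags_by_sample(reads_by_sample, reference_tags, tag_length=64):
    """Pass 2: count reads matching a reference tag, ignoring quality."""
    table = defaultdict(Counter)
    for sample, reads in reads_by_sample.items():
        for seq, _quals in reads:
            tag = seq[:tag_length]
            if tag in reference_tags:
                table[tag][sample] += 1
    return table


def drop_rare_tags(table, max_samples=10):
    """Remove tags that occur in `max_samples` or fewer samples."""
    return {tag: dict(per_sample) for tag, per_sample in table.items()
            if len(per_sample) > max_samples}
```

The point of the design is that quality scores only influence which tags enter the master list; once a tag is on the list, a low-quality read that matches it exactly is still counted in the second pass.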