tassel GBS pipeline MergeMultipleTagCountPlugin and MergeTagsByTaxaFilesPlugin


Paolo Cozzi

Feb 25, 2013, 9:56:35 AM
to tas...@googlegroups.com
Dear all,

I found some problems while running GBS-pipeline (Tassel Version: 3.0.146 Date: January 10, 2013).

The first problem concerns MergeMultipleTagCountPlugin. I can't understand how different tags are merged in the final FASTQ file when the tags end in poly-A. I know that a poly-A tail is added to tags shorter than 64 bp, and that these tags are written to the final FASTQ file at their original size. However, there are some ambiguous situations, for example:
- in the case of the final sequence CAGCACACTGATCCTTGCTTGTTTTTCAATTTTCTGTCGCAGAGTTTCTGTGCAG (55bp, 2 tags), which corresponds to 2 tags of different size (55bp and 64bp), the shorter tag was chosen.
- vice versa, in the case of the final sequence CAGCAAAAACACCAGTAGCCAACACTTCCCAACTATACTACAGCAATTGAACATACTAGCAGAA (64bp, 2 tags), derived from 2 tags of 64 and 62bp, the longer tag was chosen.

So the first question is: how are the merged tags created? Why not merge the tags by identity?

The second problem concerns MergeTagsByTaxaFilesPlugin. When I specify a command line like this: "~/Programmi/tassel3-standalone/run_pipeline.pl -fork1 -MergeTagsByTaxaFilesPlugin -i test/tbt_files -o test/test.tbt.byte -endPlugin -runfork1" the program dies with this status output:

Tassel Pipeline Arguments: -fork1 -MergeTagsByTaxaFilesPlugin -i test/tbt_files -o test/test3.tbt.byte -endPlugin -runfork1
[main] INFO net.maizegenetics.pipeline.TasselPipeline - Tassel Version: 3.0.146 Date: January 10, 2013
[main] ERROR net.maizegenetics.pipeline.TasselPipeline - java.lang.IllegalArgumentException: TasselPipeline: parseArgs: No -endPlugin flag specified.
java.lang.IllegalArgumentException: TasselPipeline: parseArgs: Unknown parameter: -MergeTagsByTaxaFilesPlugin
at net.maizegenetics.pipeline.TasselPipeline.parseArgs(TasselPipeline.java:1229)
at net.maizegenetics.pipeline.TasselPipeline.<init>(TasselPipeline.java:121)
at net.maizegenetics.pipeline.TasselPipeline.main(TasselPipeline.java:161)

I think that this plugin cannot parse the command line correctly. In fact, if I try to create the XML file, I get:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<TasselPipeline>
<fork1>
<MergeTagsByTaxaFilesPlugin>
<i>test/tbt_files</i>
<o>test/test.tbt.byte -endPlugin</o>
<runfork1/>
</MergeTagsByTaxaFilesPlugin>
</fork1>
</TasselPipeline>

If I then fix the XML file manually, TASSEL works correctly.

Thanks for your attention,

Paolo.

Jeff Glaubitz

Feb 25, 2013, 1:00:12 PM
to tas...@googlegroups.com

Hi Paolo,

 

In regard to your first question, tags with internal cut sites (for ApeKI, GCAGC or GCTGC) within the first 64 bases are always trimmed (the GCAG or GCTG part is kept) and padded with polyA. The length of the tag is recorded, so every tag-plus-length combination is unique. In theory, then, a trimmed tag should not be merged with a slightly longer tag that actually ends in polyA in the original sequence (unlikely for a 55 bp tag, but quite possible for a 62 bp tag that ended in GCAGCA, if there was another otherwise identical tag that in reality ended in GCAGAA).

I just looked at the relevant code in MergeMultipleTagCountPlugin, and it turns out that only the sequence (including the padding A’s) is used to compare (and merge) tags; the length is not used. So it seems that you have uncovered a subtle bug in the code. This means that we will, in very rare cases, ignore variation at the end of two tags, one with and one without a cut site, if the tag without a cut site in reality ends in GCAG + polyA (or GCTG + polyA) but is identical to the tag with the cut site up to and including GCAG or GCTG. Thanks for pointing this out. However, it seems like a very minor bug that may not be a very high priority to fix right now.
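The merging behaviour described above can be illustrated with a minimal Python sketch. This is not the TASSEL source: the `pad` helper, the tuple layout, and the read counts are hypothetical, and only the 64 bp sequence is taken from Paolo's second example. It contrasts keying tags on the padded sequence alone with keying on the padded sequence plus the recorded length.

```python
from collections import Counter

TAG_LEN = 64  # fixed tag length discussed in the thread


def pad(seq, length=TAG_LEN):
    """Right-pad a trimmed tag with 'A' up to the fixed tag length (hypothetical helper)."""
    return seq.ljust(length, "A")


# 64 bp tag from Paolo's second example; it really ends in ...GCAGAA.
full_tag = "CAGCAAAAACACCAGTAGCCAACACTTCCCAACTATACTACAGCAATTGAACATACTAGCAGAA"
# Otherwise identical tag that was trimmed at a cut site to 62 bp (ends in GCAG).
trimmed_tag = full_tag[:62]

# (sequence, recorded length, read count); the counts are made up for illustration.
observed = [(full_tag, 64, 3), (trimmed_tag, 62, 5)]

by_sequence = Counter()             # key: padded sequence only
by_sequence_and_length = Counter()  # key: padded sequence plus recorded length

for seq, length, count in observed:
    padded = pad(seq)
    by_sequence[padded] += count
    by_sequence_and_length[(padded, length)] += count

print(len(by_sequence))             # number of distinct tags when length is ignored
print(len(by_sequence_and_length))  # number of distinct tags when length is part of the key
```

Keyed on the padded sequence alone, the 62 bp and 64 bp tags collapse into a single entry; adding the recorded length to the key keeps them apart.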

 

Are your two pairs of tags real examples, or hypothetical?  It seems unlikely that this tag exists in the actual sequence:

CAGCACACTGATCCTTGCTTGTTTTTCAATTTTCTGTCGCAGAGTTTCTGTGCAGAAAAAAAAA

 

As for your second question, I suspect that you have a dash (or space) in your command that is not the same ASCII character as the one on your keyboard.
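One way to check for such characters is to scan the command string for anything outside the ASCII range. The sketch below is a generic Python check, not part of TASSEL; the function name `find_non_ascii` and the sample command (with a no-break space placed before `-endPlugin`) are made up for illustration and are not a claim about what the original command contained.

```python
import unicodedata


def find_non_ascii(command):
    """Return (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 127
    ]


# Hypothetical command fragment: the space before -endPlugin is U+00A0, not a plain space.
suspect = "-o test/test.tbt.byte\u00a0-endPlugin -runfork1"
clean = "-o test/test.tbt.byte -endPlugin -runfork1"

for index, char, name in find_non_ascii(suspect):
    print(f"position {index}: {char!r} ({name})")

print(find_non_ascii(clean))  # empty list: nothing suspicious
```

Retyping the flagged characters by hand in the terminal, rather than pasting the command from a document or web page, is usually enough to fix this kind of problem.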

 

Best,

 

Jeff

 

--

Jeff Glaubitz

Project Manager

Genetic Architecture of Maize and Teosinte

National Science Foundation award 0820619

http://www.panzea.org

Institute for Genomic Diversity

Cornell University

175 Biotechnology Bldg

Ithaca, NY 14853

Phone: 607-255-1386

jcg...@cornell.edu

--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/tassel/-/h3X83daL8sEJ.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Paolo Cozzi

Feb 26, 2013, 4:12:37 AM
to tas...@googlegroups.com
Hi Jeff,

Thanks for your reply. I've checked the command line for MergeTagsByTaxaFilesPlugin and you were right: the command now works fine.

Concerning the tag question, the cases I reported are real; I found them in our data. Of course, the 55 bp tags do not seem to align to the genome (we are working on rice). But in our data we found about 5500 (of 45 million) "merged tag counts" that are composed of tags of different sizes, with an average length of nearly 62 bp.

So the question is: could this type of merging affect the final imputation, either by passing the filters imposed during the GBS pipeline or by aligning to the genome at different positions?

Many thanks,

Paolo.

Jeff Glaubitz

Feb 27, 2013, 5:27:28 PM
to tas...@googlegroups.com

Hi Paolo,

 

I don’t think it will be a big problem.  The 55 bp case must be extremely rare, as the average is 62 bp (which makes sense).  So the (minor) bug results in a small number of SNPs being ignored at the very end of these 5500 reads (about 0.01% of your reads).  This is the more error-prone part of a read, so many of these are probably sequencing errors anyway.
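The 0.01% figure follows from the counts Paolo reported (about 5500 affected merged tags out of 45 million). A quick check in Python, with variable names chosen only for illustration:

```python
affected = 5_500       # merged tag counts built from tags of different sizes
total = 45_000_000     # total merged tag counts reported in the thread

percent = affected / total * 100
print(f"{percent:.4f}%")  # prints 0.0122%
```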

 

It is not very likely that a (for example) 60 base read will align to a different place than a 64 base read ending in AAAA.

 

Best,

 

Jeff


Jeff Glaubitz

Feb 27, 2013, 5:31:28 PM
to tas...@googlegroups.com

Meant to write “It is not very likely that a (for example) 60 base read will align to a different place than AN OTHERWISE IDENTICAL 64 base read ending in AAAA”

 

Jeff

 

 
