Hi Paolo,
Regarding your first question: tags with an internal cut site (for ApeKI, GCAGC or GCTGC) within the first 64 bases are always trimmed (the GCAG or GCTG part is kept) and padded with polyA. The length of the tag is recorded, so every tag plus length combination is unique. In theory, then, a trimmed tag should not be merged with a slightly longer tag that actually ends in polyA in the original sequence (unlikely for a 55 bp tag, but quite possible for a 62 bp tag that ended in GCAGCA, if there was another otherwise identical tag that in reality ended in GCAGAA). I just looked at the relevant code in MergeMultipleTagCountPlugin, and it turns out that only the sequence (including the padding A’s) is used to compare (and merge) tags; the length is not used. So it seems that you have uncovered a subtle bug in the code. This means that we will, in very rare cases, ignore variation at the end of two tags, one with and one without a cut site, if the tag without a cut site in reality ends in GCAG + polyA (or GCTG + polyA) but is identical to the tag with the cut site up to and including the GCAG or GCTG. Thanks for pointing this out. However, it seems like a very minor bug that may not be a very high priority to fix right now.
Are your two pairs of tags from real examples, or hypothetical? It seems unlikely that this tag exists in the actual sequence:
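To make the collision concrete, here is a minimal Python sketch of the trim-and-pad behaviour described above. It is not the TASSEL code: the function name `trim_and_pad`, the regular expression, and the example reads are all assumptions made for illustration, and the real pipeline handles more cases than this.

```python
import re

TAG_LEN = 64                        # tag length in bases, as described above
CUT_SITE = re.compile(r"GC[AT]GC")  # ApeKI site: GCAGC or GCTGC

def trim_and_pad(read):
    """Return (padded_sequence, true_length) for one read (illustrative only)."""
    window = read[:TAG_LEN]
    match = CUT_SITE.search(window)
    if match:
        # Keep the first four bases of the site (GCAG or GCTG), drop the rest.
        window = window[: match.start() + 4]
    length = len(window)
    # Pad with A's up to the full tag length.
    return window.ljust(TAG_LEN, "A"), length

# Hypothetical 58-base prefix that contains no internal cut site.
prefix = "CAGC" + "T" * 54

with_site = prefix + "GCAGCA"     # has a cut site: trimmed to 62 bases, then padded
without_site = prefix + "GCAGAA"  # no cut site: a genuine 64-base tag

seq_a, len_a = trim_and_pad(with_site)
seq_b, len_b = trim_and_pad(without_site)

# Comparing on sequence alone cannot tell the two tags apart...
same_by_sequence = seq_a == seq_b
# ...while comparing on (sequence, length) keeps them separate.
same_by_sequence_and_length = (seq_a, len_a) == (seq_b, len_b)

print(same_by_sequence, same_by_sequence_and_length)
```

In this sketch the two padded sequences are identical while the recorded lengths differ (62 versus 64), which is the situation in which a sequence-only merge would combine two distinct tags.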
CAGCACACTGATCCTTGCTTGTTTTTCAATTTTCTGTCGCAGAGTTTCTGTGCAGAAAAAAAAA
As for your second question, I suspect that you have a dash (or space) in your command that is not the same ASCII character as the one on your keyboard.
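One way to check for this is to scan the command text for anything outside the ASCII range. The following is a small Python sketch; the helper name `find_non_ascii` and the example command are hypothetical and not part of TASSEL.

```python
import unicodedata

def find_non_ascii(command):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 127
    ]

# Hypothetical command copied from a web page or word processor; the
# en dash (U+2013) before "i" looks like a hyphen but is not one.
pasted = "some_program \u2013i input.txt"

for pos, ch, name in find_non_ascii(pasted):
    print(pos, repr(ch), name)
```

If the function reports anything, retyping that character directly in the terminal is usually enough to fix the command.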
Best,
Jeff
--
Jeff Glaubitz
Project Manager
Genetic Architecture of Maize and Teosinte
National Science Foundation award 0820619
Institute for Genomic Diversity
Cornell University
175 Biotechnology Bldg
Ithaca, NY 14853
Phone: 607-255-1386
--
You received this message because you are subscribed to the Google Groups "TASSEL - Trait Analysis by Association, Evolution and Linkage" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
tassel+un...@googlegroups.com.
To post to this group, send email to tas...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msg/tassel/-/h3X83daL8sEJ.
For more options, visit https://groups.google.com/groups/opt_out.
Hi Paolo,
I don’t think it will be a big problem. The 55 bp case must be extremely rare, as the average is 62 bp (which makes sense). So, the (minor) bug results in a small number of SNPs being ignored at the very end of these 5500 reads (0.01% of your reads). This is the more error-prone part of a read, so many of these are probably sequencing errors anyway.
It is not very likely that a (for example) 60 base read will align to a different place than a 64 base read ending in AAAA.
Best,
Jeff
Meant to write “It is not very likely that a (for example) 60 base read will align to a different place than AN OTHERWISE IDENTICAL 64 base read ending in AAAA”
Jeff
From: tas...@googlegroups.com [mailto:tas...@googlegroups.com]
On Behalf Of Jeff Glaubitz
Sent: Wednesday, February 27, 2013 5:27 PM
To: tas...@googlegroups.com
Subject: RE: [TASSEL-Group] tassel GBS pipeline MergeMultipleTagCountPlugin and MergeTagsByTaxaFilesPlugin
From: tas...@googlegroups.com [mailto:tas...@googlegroups.com] On Behalf Of Paolo Cozzi
Sent: Tuesday, February 26, 2013 4:13 AM
To: tas...@googlegroups.com