blacklisted loci


chris blair

Apr 15, 2014, 2:33:14 PM
to stacks...@googlegroups.com
Hi all, 

I have a general question about why Stacks blacklists loci. According to Julian, this often suggests repetitive sequences, but I wanted to know if there are any other reasons why Stacks might blacklist tags. As an example, if I build loci de novo (ustacks), approximately 3% of tags are blacklisted. Conversely, if I use a reference (pstacks), about 33% of tags are blacklisted. Thanks!

Chris

Ryan Waples

Apr 15, 2014, 4:36:05 PM
to stacks...@googlegroups.com
Chris, 

For ustacks, I think that blacklisting results from two separate filters (there could be more?):
- high depth
- too many alleles (where the stack cannot be 'deleveraged')

For pstacks, I'm not quite sure what blacklisted means, but perhaps it refers to reads that aren't uniquely aligned in your SAM file?

When you say 3% of 'tags' are blacklisted (ustacks), do you mean 3% of *loci* are blacklisted?

When you say 33% of 'tags' are blacklisted (pstacks), do you mean 33% of *reads* are blacklisted?

If so, you may not be comparing apples to apples.

-Ryan 





--
Stacks website: http://creskolab.uoregon.edu/stacks/

chris blair

Apr 15, 2014, 5:39:58 PM
to stacks...@googlegroups.com
Hi Ryan, 

Thanks for the message. I used Bowtie2 with default parameters to map reads to a reference, which, to my knowledge, reports only the 'best' alignment in the resulting SAM file (unless I am missing something). For these percentages, I divided the number of blacklisted loci per individual (filtered using the webserver) by the total number of unique stacks for the same individual.
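For anyone repeating this calculation, here is a minimal Python sketch of the ratio described above. The function name and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from my data. Note that the denominator is unique stacks per individual, so the ustacks and pstacks numbers are only comparable if both are counted the same way.

```python
def blacklisted_fraction(n_blacklisted: int, n_total: int) -> float:
    """Fraction of one individual's unique stacks that are blacklisted."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_blacklisted <= n_total:
        raise ValueError("n_blacklisted must be between 0 and n_total")
    return n_blacklisted / n_total


# Hypothetical counts for a single individual (assumed, not real results).
ustacks_frac = blacklisted_fraction(300, 10_000)
pstacks_frac = blacklisted_fraction(3_300, 10_000)
print(f"ustacks: {ustacks_frac:.1%}  pstacks: {pstacks_frac:.1%}")
```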

Chris

Ryan Waples

Apr 15, 2014, 5:50:36 PM
to stacks...@googlegroups.com
Hmm,

Well, you will have to think about how your reference was constructed: how does it deal with homologs and repetitive sequences?

Are reference loci (i.e. contigs) without any depth considered 'blacklisted'?

You could look over the distributions of coverage across all loci from the pstacks run and from the ustacks run. This might give you a hint as to why loci are being blacklisted.
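As a rough sketch of that comparison, the Python below summarizes two lists of per-locus depths side by side. How you pull per-locus depth out of your Stacks output is left out on purpose; the function name and the depth values are placeholders I made up, not real data.

```python
import statistics


def summarize_depths(depths):
    """Return simple summary statistics for a list of per-locus depths."""
    ordered = sorted(depths)
    if len(ordered) < 2:
        raise ValueError("need at least two loci to compute quartiles")
    q1, _, q3 = statistics.quantiles(ordered, n=4)  # quartile cut points
    return {
        "n": len(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": statistics.median(ordered),
        "q3": q3,
        "max": ordered[-1],
    }


# Placeholder depths (assumed values, for illustration only).
ustacks_depths = [8, 10, 11, 12, 12, 13, 15, 40, 250]
pstacks_depths = [1, 2, 2, 9, 10, 12, 12, 14, 300]

for name, depths in [("ustacks", ustacks_depths), ("pstacks", pstacks_depths)]:
    print(name, summarize_depths(depths))
```

A long upper tail would point toward repetitive sequence, while a pile-up of very low depth loci would point toward something else.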


-Ryan 

Julian Catchen

Apr 19, 2014, 6:34:51 PM
to stacks...@googlegroups.com, blai...@gmail.com
Hi Chris,

The only reason loci are blacklisted in pstacks is when there are an excessive number of Ns at the locus that can prevent any haplotypes from being read. So, if your reference is evolutionarily far away from the sequence you are aligning, or you have a low quality draft reference, or you allowed soft-masking in the alignment (where the aligner only aligns a 'seed' from the read and masks the remainder of the read off as Ns), you can end up with a lot of gaps inserted into the alignment. In these cases, even though pstacks can call SNPs from the columns of the stack, it can't successfully read across the rows of the stack to get the haplotype due to the presence of Ns.
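To make the column versus row distinction concrete, here is a toy Python illustration. It is not the pstacks code and does not reproduce its thresholds; the function names, the stack, and the SNP positions are all invented for the example. Each column still shows two alleles, but no single read covers both SNP positions without an N, so no haplotype can be read.

```python
def alleles_in_column(reads, column):
    """Distinct non-N bases observed in one column of the stack."""
    return {read[column] for read in reads} - {"N"}


def read_haplotypes(reads, snp_columns):
    """Read across each row at the SNP columns; drop rows that hit an N."""
    haplotypes = []
    for read in reads:
        hap = "".join(read[c] for c in snp_columns)
        if "N" not in hap:
            haplotypes.append(hap)
    return haplotypes


# Toy stack: each read only aligns over half of the locus (assumed data).
stack = [
    "ACGTNNNN",
    "ATGTNNNN",
    "NNNNACTT",
    "NNNNACTA",
]
snp_columns = [1, 7]

for col in snp_columns:
    print(col, sorted(alleles_in_column(stack, col)))  # two alleles per column
print(read_haplotypes(stack, snp_columns))  # no row spans both SNPs
```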

I would take a look at a few of your pstacks-blacklisted loci in the web interface or in IGV and see if this is the case. You may need to tighten your alignment parameters.
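If you want a quick check before opening IGV, one option is to scan the SAM file for heavily clipped reads. The sketch below is only an assumption about how you might do that: it relies on the standard SAM layout (CIGAR in the sixth tab-separated column, `S` marking soft-clipped bases), and the function names and the 0.5 threshold are arbitrary choices of mine.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_fraction(cigar):
    """Fraction of a read's bases that are soft-clipped (CIGAR op 'S')."""
    clipped = 0
    read_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MIS=X":  # operations that consume read bases
            read_len += n
        if op == "S":
            clipped += n
    return clipped / read_len if read_len else 0.0


def heavily_clipped(sam_lines, threshold=0.5):
    """Yield names of reads whose soft-clipped fraction exceeds threshold."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 5 and soft_clipped_fraction(fields[5]) > threshold:
            yield fields[0]


# Usage with two made-up alignment records (hypothetical read names).
example = [
    "@HD\tVN:1.0",
    "read1\t0\tchr1\t100\t42\t100M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t500\t42\t20M80S\t*\t0\t0\t*\t*",
]
print(list(heavily_clipped(example)))
```

If many reads at the blacklisted loci turn up here, that would be consistent with the explanation above.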

Best,

julian