How do different sequence lengths in ustacks affect downstream analysis in Stacks?


Rose

Jan 15, 2015, 1:12:52 AM
to stacks...@googlegroups.com
Hi Julian,

I am working with non-RAD-seq data, and I went straight to ustacks without demultiplexing the data with process_radtags (because the data is not barcoded). The warning "different sequence lengths detected, this will interfere with Stacks algorithms" appeared even though I had cleaned and trimmed each of my samples to equal length prior to the Stacks analysis.

May I know how different sequence lengths in ustacks may affect downstream analyses in Stacks?



Julian Catchen

Jan 15, 2015, 9:30:18 AM
to stacks...@googlegroups.com
Hi Rose,

What type of data are you using? Is your data shotgun data or was it created with a restriction enzyme?

Even if you don't want to demultiplex, you can still use process_radtags to trim and/or remove low quality reads.

The program has detected different sequence lengths, so presumably your trimming operation did not complete correctly. You can verify this by checking the distribution of sequence lengths:

cat yourinputfile.fq | sed -n '2~4p' | awk '{print length}' | sort -n | uniq -c | sort -n

The ustacks program requires reads to have the same length. Different samples can have different read lengths from one another (ustacks is run on each sample individually), but within a single sample all reads have to be the same length.
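
If the length check above shows a mix of lengths, one possible way to get a uniform-length file is to truncate every read to a fixed length and discard any read shorter than that. The following is only a minimal sketch with awk, assuming an uncompressed FASTQ file with exactly four lines per record; the function name trim_to_length and the output file name are hypothetical, and this is not a Stacks program:

```shell
# Truncate every read to LEN bases and drop reads shorter than LEN.
# Assumes uncompressed FASTQ with exactly four lines per record.
# trim_to_length is a hypothetical helper, not part of Stacks.
trim_to_length() {
    awk -v len="$1" '
        NR % 4 == 1 { header = $0 }          # @read identifier
        NR % 4 == 2 { seq = $0 }             # bases
        NR % 4 == 3 { plus = $0 }            # separator line
        NR % 4 == 0 {                        # quality line closes the record
            if (length(seq) >= len) {
                print header
                print substr(seq, 1, len)    # bases cut to len
                print plus
                print substr($0, 1, len)     # qualities cut to match
            }
        }
    ' "$2"
}

# Example: trim_to_length 90 yourinputfile.fq > yourinputfile.trimmed.fq
```

Re-running the length-distribution command on the trimmed file should then report a single length.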

Best,

julian

mathi

Jan 30, 2015, 11:55:33 AM
to stacks...@googlegroups.com, jcat...@illinois.edu
Hi,
in my experience, different sequence lengths also cause nonsense if you try to build a catalog with cstacks from samples with different sequence lengths: the algorithm will make k-mers of different lengths, and the mismatch parameter will obviously not behave the same when its proportion to the sequence length changes...
Moreover, I think it actually breaks when you try to match long RADtags against catalog loci that are shorter than the query; the other way round is no better...
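
To make the proportion point concrete: a fixed mismatch allowance is a different fraction of the read depending on the read length. A quick arithmetic sketch follows; the allowance of 2 mismatches and the read lengths 50 and 100 are arbitrary illustrations (not Stacks defaults and not values from this thread), and mismatch_pct is a hypothetical helper:

```shell
# Percentage of a read that N allowed mismatches represent.
# mismatch_pct is a hypothetical helper; the numbers are illustrative only.
mismatch_pct() {
    awk -v n="$1" -v len="$2" 'BEGIN { printf "%.1f\n", 100 * n / len }'
}

mismatch_pct 2 50     # prints 4.0
mismatch_pct 2 100    # prints 2.0
```

So the same setting is twice as permissive, relative to read length, for the shorter reads.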
regards,
Mathias