How to use paired end reads to create STACKS and call SNPs

kb

unread,

Oct 24, 2013, 10:27:54 AM10/24/13

to stacks...@googlegroups.com

Hello all,

I have Illumina paired end reads (single digest RAD-Seq project) as an input for STACKS. I have reference genome against which all the demultiplexed samples were mapped using Bowtie2 (Version 2.1.0) and then sam files were used in STACKS (Version 1.06). When looked through STACKS output, it seems like Read1 and Read2 are treated as separate reads and are used to create stacks and call SNPs separately.

How can I use Read1 and Read2 (as paired reads) together to create stacks and call SNPs? Could you please let me know if there are any additional steps that we need to run for paired end data? I looked through 'sort_read_pairs.pl' in STACKS, however failed to understand it being run at the end of stacks pipeline (after running stacks pipeline and calling SNPs from read1 and read2 are used separtely)

Regards

Ketaki

Julian Catchen

unread,

Oct 31, 2013, 12:51:41 AM10/31/13

to stacks...@googlegroups.com, Ketaki Bhide

Hi Ketaki,

You should only align the single-end reads to the reference genome, and then
only feed these single-end alignments to the Stacks pipeline. Stacks relies on
the restriction enzyme cutsites to line up the stacks, whether or not you have a
reference genome. Using your current data set, including both the single and
paired-end reads will likely give you some strange results, since different
groups of paired-end reads will appear to 'stack', despite missing a restriction
enzyme cutsite, and SNP calling will be affected quite a bit, since these false
stacks will have low coverage.

The sort_read_pairs.pl program is intended to bucket paired-end reads together
so they can be assembled de novo. You do not need to do this (obviously) if you
have a reference genome. There are a number of uses of these paired-end contigs,
and you can call SNPs from them with quite a bit of work.

Although we eventually plan to support paired-end reads, currently they can't be
used directly in the Stacks pipeline. If you want to call SNPs in the
paired-ends (whether assembled de novo or aligned to a reference genome) you'll
have to use other tools, such as GATK to call the SNPs, and then integrate them
into the Stacks data. Some people have done this but it is not straightforward
and requires the ability to write some custom code.

Best,

julian

On 10/24/13 7:27 AM, kb wrote:
> Hello all,
>
> I have Illumina paired end reads (single digest RAD-Seq project)
> as an input for STACKS. I have reference genome against which all the
> demultiplexed samples were mapped using Bowtie2 (Version 2.1.0) and then sam
> files were used in STACKS (Version 1.06). When looked through STACKS output, it
> seems like Read1 and Read2 are treated as separate reads and are used to create
> stacks and call SNPs separately.
>
> How can I use Read1 and Read2 (as paired reads) together to create
> stacks and call SNPs? Could you please let me know if there are any additional
> steps that we need to run for paired end data? I looked through

> '*sort_read_pairs.p*l' in STACKS, however failed to understand it being run at

kb

unread,

Oct 31, 2013, 4:26:52 PM10/31/13

to stacks...@googlegroups.com

Hello Julian,
Thank you for your reply. I was able to understand better from your email.

Thank you.
Regards
Ketaki

Reply all

Reply to author

Forward