paired-end ddRAD data, webserver, and ustacks

chris blair

Sep 13, 2013, 2:05:11 PM
to stacks...@googlegroups.com
Hi Julian et al., 

I am working my way through the Stacks pipeline and have a few more questions. I am trying to run the pipeline components one by one to get a good feel for the software. I successfully ran process_radtags, which gave me the 4 files per barcode (all reads trimmed to 145 bp). My first question is what to do with the four files. I have read a few other threads on this issue. Is it best to simply merge the four files together and supply the pipeline with one fq file per barcode?

Second, when I try to run ustacks on one of the merged fq files I get the error: "Unable to open tag file for writing." I know this is probably something simple, but I cannot figure out what.

Third, I would like to utilize the web interface so I can visualize the data. However, I am by no means a computer expert and am having a difficult time figuring out what needs to be installed and how to go about doing it. Does anyone have some simple guidelines that I can use?

Thanks everyone.

Chris

chris blair

Sep 17, 2013, 1:06:37 PM
to stacks...@googlegroups.com
Hi all, 

Just for S & G's I ran one of the 4 output files from process_radtags for an individual through ustacks with no problems. Apparently, the issue came when I used cat to merge the four files together to feed into ustacks. I would prefer not to throw out half of my data (i.e. the paired reads), so any thoughts would be great. Thanks.

Ryan Waples

Sep 17, 2013, 1:27:59 PM
to stacks...@googlegroups.com
Here is my understanding of paired-end reads in Stacks.

Stacks doesn't (currently) use the P2 reads for SNP variant discovery or genotyping.  So you are stuck with just P1 reads for now.

Stacks works in haplotypes; this is easy to do when your sequences are one piece. Once you try to infer haplotypes for an individual by combining multiple paired sequences, you lose the ability to observe haplotypes directly, and they have to be inferred in some way.

-Ryan


chris blair

Sep 17, 2013, 1:53:55 PM
to stacks...@googlegroups.com
Thanks Ryan. I guess I will first run through the pipeline using only the single-end reads. Afterwards, it would be interesting to compare this to a pipeline concatenating the reads. Although concatenating like this is less than ideal and introduces additional sources of error, I think it would be worthwhile.

Chris

Ryan Waples

Sep 17, 2013, 2:00:27 PM
to stacks...@googlegroups.com
Unless your P1 and P2 reads overlap, I don't see concatenation helping. I agree it would be very desirable to leverage the info in the P2 reads, though.

-ryan


Julian Catchen

Sep 17, 2013, 10:09:21 PM
to stacks...@googlegroups.com, Christopher Blair
Hi All,

If you are using double-digested data then Stacks can use both single and
paired-end reads. The software will consider the two ends of the read (each
anchored with its own restriction enzyme) as separate loci.

Chris - your error is not likely related to concatenating the files together; it
is more likely that you asked Stacks to write to a non-existent directory.

Best,

julian
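
A minimal sketch of the point about the output directory, assuming the "Unable to open tag file for writing" error really does come from a missing directory: create the directory first, then build the ustacks call. The helper name `build_ustacks_command` and the file names are hypothetical, and the flags (-t, -f, -o, -i) are as commonly documented for Stacks 1.x; check `ustacks -h` for your installed version.

```python
import os
import tempfile


def build_ustacks_command(fastq_path, out_dir, sample_id):
    """Create out_dir if needed and return a ustacks argument list.

    Hypothetical helper for illustration only.
    """
    # Make sure the output directory exists before ustacks tries to write to it.
    os.makedirs(out_dir, exist_ok=True)
    # Assumed flags: -t file type, -f input file, -o output directory,
    # -i numeric sample ID. Verify against `ustacks -h`.
    return ["ustacks", "-t", "fastq", "-f", fastq_path,
            "-o", out_dir, "-i", str(sample_id)]


if __name__ == "__main__":
    # Demo with a throwaway directory; the FASTQ name is a placeholder.
    demo_dir = os.path.join(tempfile.mkdtemp(), "stacks_out")
    cmd = build_ustacks_command("sample_01.fq", demo_dir, 1)
    print(" ".join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

The command is returned as a list rather than executed, so it can be inspected first and then passed to `subprocess.run` once the paths look right.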

chris blair

Sep 18, 2013, 11:44:37 AM
to stacks...@googlegroups.com
Hi Julian, 

Thanks for the reply. I am a bit hesitant to treat the paired-end reads as separate loci (as they are obviously linked). Perhaps I will run the pipeline both ways (with and without the paired-end data). Thanks.

Chris


Paul Maier

Jan 2, 2014, 1:41:45 AM
to stacks...@googlegroups.com
Hi all,

This thread is very interesting to me because I'm also working with ddRAD paired-end data. Each pair of R1 and R2 is gapped by a known amount +/- approximately 90 bp, so linking these reads as haplotypes would be ideal. The ~90 bp uncertainty is not critically important, and each R1/R2 pair could be concatenated with an appropriate number of Ns. Then each full locus could be run through the pipeline as a coherent whole.

Would this be easy/possible to incorporate into stacks? Does anyone know of another script that implements this?

Thanks,
Paul
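
For illustration, the concatenation Paul describes could be sketched roughly as follows. This is not part of Stacks and not an existing script; the function names are hypothetical, the gap length is whatever you estimate for your library, and it assumes R2 should be reverse-complemented so both reads sit on the same strand. Whether ustacks handles long runs of N sensibly is not something this thread establishes.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    comp = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(comp)[::-1]


def join_pair(r1_seq, r1_qual, r2_seq, r2_qual, gap, pad_qual="!"):
    """Join one R1/R2 pair into a single padded record.

    gap is the assumed number of unsequenced bases between the reads.
    pad_qual is the quality character given to the N padding
    ('!' is the lowest score in Phred+33 encoding).
    """
    # R2 is read from the opposite strand, so flip it to R1's orientation.
    joined_seq = r1_seq + "N" * gap + reverse_complement(r2_seq)
    # Qualities are reversed (not complemented) to stay aligned with the bases.
    joined_qual = r1_qual + pad_qual * gap + r2_qual[::-1]
    return joined_seq, joined_qual


if __name__ == "__main__":
    seq, qual = join_pair("ACGT", "IIII", "AACG", "ABCD", gap=3)
    print(seq)
    print(qual)
```

Applied to every pair in the two FASTQ files from process_radtags, this would give one record per fragment, with R1 and R2 treated as a single locus rather than two.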