process_radtags step organization

john.h...@gmail.com

Jun 3, 2013, 2:14:32 PM
to stacks...@googlegroups.com
Hi, I am new to Stacks and want to double-check that process_radtags does what I think it does.

I have a bunch of files off the Illumina, all named something like: lane1_index01.fastq.gz.  There are 48x barcodes inline at the beginning of the sequence, and 12x third-read barcodes (the 'index01' in the file name).

If I run process_radtags on a directory that contains lane1_index01.fastq.gz and lane2_index01.fastq.gz, it will demultiplex these two files, then concatenate the reads from, say, barcode  AATCG in lane1_index01.fastq.gz and barcode AATCG in lane2_index01.fastq.gz into a single file called sample_AATCG.fq.  I'm happy so far, since lane1 and lane2 contain the same samples.

But if I run it on a directory containing:
 lane1_index01.fastq.gz
 lane2_index01.fastq.gz
 lane1_index02.fastq.gz
 lane2_index02.fastq.gz
It will still output only 48 files, even though there are 96 individuals between index01 and index02.  Is there any way to get around this, or do I just have to preprocess for each third-read index separately?

john.h...@gmail.com

Jun 3, 2013, 3:16:26 PM
to stacks...@googlegroups.com
Also, I have had trouble with the denovo_map.pl script.

I run it and get back the following error message:
Unable to locate sample file './sample_CGGTA.fq '
I've tried putting the complete path instead of just "./sample_X", and I get the exact same error.

ananta

Jun 3, 2013, 3:37:37 PM
to stacks...@googlegroups.com
This is something I would also like to see changed. I actually do my own demultiplexing step, but it's always good to have the same program do the most common things. I have two suggestions for Julian:

1. Allow barcodes of different lengths.
2. Use a barcode file format similar to TASSEL's, which lists flowcell, lane, barcode, and sample; rename all demultiplexed files to the real sample name rather than the barcode; and add an option to merge samples if they are the same.

Thanks

john.h...@gmail.com

Jun 3, 2013, 4:35:16 PM
to stacks...@googlegroups.com
The denovo_map.pl problem turned out to be some sort of line-ending issue.  I still don't know exactly what, but I managed to make a new command script that doesn't trigger it.
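
For anyone who hits the same error: the trailing space inside the quoted file name in the message is consistent with a stray carriage return at the end of a line in the command script, for example after editing it on Windows. That is only a guess, since the exact cause was never pinned down here. A minimal Python sketch that rewrites a script with Unix line endings (the function names are made up for illustration):

```python
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_script(path: str) -> bool:
    """Rewrite the file in place if needed; return True when it was changed."""
    script = Path(path)
    original = script.read_bytes()
    cleaned = normalize_line_endings(original)
    if cleaned != original:
        script.write_bytes(cleaned)
        return True
    return False
```

Calling fix_script() on the shell script that launches denovo_map.pl (whatever it is called on your system) leaves a file without carriage returns. If the error persists after that, the cause is something else.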

Julian Catchen

Jun 3, 2013, 6:22:57 PM
to stacks...@googlegroups.com
Hi,

process_radtags can demultiplex using both sets of barcodes in one execution. In
the barcodes file, place the inline barcode first, followed by a tab, then the
index barcode second. So, you will have a two column file with the inline and
index barcodes in it, respectively. Specify the --inline_index command line
option to process_radtags. process_radtags will look for all the allowable
combinations of the two sets of barcodes in the dataset.
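
As a rough illustration of the two-column barcodes file described above, here is a small Python sketch that writes every inline/index combination, one per line, separated by a tab. The function names and the example barcode lists are placeholders, not part of Stacks; substitute your own barcodes.

```python
from itertools import product


def barcode_lines(inline_barcodes, index_barcodes):
    """Return one 'inline<TAB>index' line for every combination."""
    return [
        f"{inline}\t{index}"
        for inline, index in product(inline_barcodes, index_barcodes)
    ]


def write_barcode_file(path, inline_barcodes, index_barcodes):
    """Write the combinations to a file with Unix line endings."""
    with open(path, "w", newline="\n") as handle:
        for line in barcode_lines(inline_barcodes, index_barcodes):
            handle.write(line + "\n")


if __name__ == "__main__":
    # Placeholder barcodes for illustration only.
    inline = ["AATCG", "GCATG"]
    index = ["ATCACG", "CGATGT"]
    for line in barcode_lines(inline, index):
        print(line)
```

With 48 inline and 12 index barcodes this would produce 576 lines. If not every combination was actually used in the library, it is probably cleaner to list only the pairs that correspond to real samples.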

To process an entire directory at once (as opposed to a single pair of files),
process_radtags expects the files to be named in the standard Illumina format,
which looks like this:

GfddRAD1_001_ATCACG_L008_R1_001.fastq.gz
GfddRAD1_001_ATCACG_L008_R2_001.fastq.gz
...

Basically, process_radtags is looking for the "_R1_" and "_R2_" patterns to
differentiate which file contains which of the paired ends.
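
To check ahead of time whether the files in a directory follow this naming, something like the Python sketch below can help. It only illustrates the "_R1_" / "_R2_" convention and is not the matching logic process_radtags itself uses; the function name is made up.

```python
def pair_reads(filenames):
    """Split file names into (R1, R2) pairs and names without a mate."""
    names = set(filenames)
    pairs, unpaired = [], []
    for name in sorted(names):
        if "_R1_" in name:
            mate = name.replace("_R1_", "_R2_", 1)
            if mate in names:
                pairs.append((name, mate))
            else:
                unpaired.append(name)
        elif "_R2_" in name:
            # An R2 file is only reported when its R1 partner is missing.
            if name.replace("_R2_", "_R1_", 1) not in names:
                unpaired.append(name)
        else:
            # No read marker in the name at all.
            unpaired.append(name)
    return pairs, unpaired


if __name__ == "__main__":
    files = [
        "GfddRAD1_001_ATCACG_L008_R1_001.fastq.gz",
        "GfddRAD1_001_ATCACG_L008_R2_001.fastq.gz",
        "lane1_index01.fastq.gz",
    ]
    pairs, unpaired = pair_reads(files)
    print("pairs:", pairs)
    print("unpaired:", unpaired)
```

Any name that ends up in the unpaired list (such as lane1_index01.fastq.gz) would need renaming before a whole-directory paired-end run.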

john.h...@gmail.com

Jun 4, 2013, 11:30:59 AM
to stacks...@googlegroups.com, jcat...@uoregon.edu
Thanks, I thought I might be missing something.

john.h...@gmail.com

Jun 4, 2013, 6:07:24 PM
to stacks...@googlegroups.com, jcat...@uoregon.edu
What if I don't have paired-end reads?  I.e., I have single reads indexed with "third"-reads, but no second read.

I renamed my reads to follow your format:
dataset_001_ATCACG_L001_R1_001.fastq.gz
and so on for the other 11 combinatorial barcodes, for a total of twelve files.
I gave it a list of barcodes, which was just
inline_index <tab> combinatorial_index
for every possible combination.

Now it is dumping all of my reads as ambiguous.  I looked in the log file, and it finds zero of the appropriate barcodes:

Barcode Total   No RadTag       Retained
GCATG-ATCACG    0       0       0
AACCA-ATCACG    0       0       0
CGATC-ATCACG    0       0       0
TCGAT-ATCACG    0       0       0

And only inline barcodes:

Sequences not recorded
Barcode Total
GAGTC   5744812
GAGAT   5457263
GCATG   5139967
GTAGT   5130677
GGCTC   4748589
AACCA   4647654

What am I doing wrong now?

Julian Catchen

Jun 4, 2013, 7:02:25 PM
to stacks...@googlegroups.com
process_radtags can't currently handle this case. It expects combinatorial
barcodes to be associated with paired-end reads. I'll update the code to handle
this case. If you don't have paired-end reads, it is also unnecessary to rename
the files to account for _R1_ and _R2_, obviously.

In the meantime, you could run process_radtags twice: first specifying the
inline barcodes (giving --inline_null as the command line option), and then a
second time on each output file (using -f to direct it to each file produced
in the first run), specifying the index barcodes (giving --index_null).
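
A rough Python sketch of how the two passes could be scripted. It only builds and prints the command lines; nothing is executed. The -f, --inline_null and --index_null options are the ones mentioned above. The use of -b for the barcodes file and -o for the output directory, the file names, and the one-directory-per-first-pass-file layout are assumptions for illustration, and other options you may need (such as the enzyme) are left out, so check them against the process_radtags help for your version.

```python
import shlex
from pathlib import PurePosixPath


def first_pass_cmd(raw_fastq, inline_barcodes, out_dir):
    """Command for pass 1: demultiplex by the inline barcode only."""
    return ["process_radtags", "-f", raw_fastq, "-b", inline_barcodes,
            "-o", out_dir, "--inline_null"]


def second_pass_cmds(first_pass_files, index_barcodes, out_root):
    """Commands for pass 2: one run per file produced by pass 1."""
    cmds = []
    for fq in first_pass_files:
        # Separate output directory per input file (an assumption), so that
        # outputs named after the same index barcode do not collide.
        stem = PurePosixPath(fq).stem
        cmds.append(["process_radtags", "-f", fq, "-b", index_barcodes,
                     "-o", f"{out_root}/{stem}", "--index_null"])
    return cmds


if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    print(shlex.join(first_pass_cmd("lane1.fastq.gz", "inline.txt", "pass1")))
    for cmd in second_pass_cmds(["pass1/sample_AATCG.fq"], "index.txt", "pass2"):
        print(shlex.join(cmd))
```

The output directories would have to be created before the printed commands are actually run.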

It will take me a day or two to write the new code.



--
Julian M Catchen, Ph.D.
Institute of Ecology and Evolution
University of Oregon
--
jcat...@uoregon.edu
http://www.uoregon.edu/~jcatchen/