Problems with Ambiguous RAD-Tag

Kritika Garg

Feb 26, 2014, 1:57:21 AM
to stacks...@googlegroups.com
Hi All,
I am having a problem running process_radtags on my paired-end ddRAD data. I used the NlaIII and MluCI enzyme combination, and I lose most of my reads to "Ambiguous RAD-Tag". Below is a summary of my results.

Total Sequences     281322138
Ambiguous Barcodes   40525562
Low Quality          12239773
Ambiguous RAD-Tag   140674240
Retained Reads       87882563
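
As a quick check on the numbers above: the four categories add up exactly to the total, about half of all reads (roughly 50%) fall into Ambiguous RAD-Tag, and about 31% are retained. A minimal Python sketch of that arithmetic, using only the counts reported in the summary (the function name is hypothetical):

```python
def read_fate_percentages(counts, total):
    """Return each process_radtags category as a percentage of the total reads."""
    # The categories should partition the input exactly; fail loudly if not.
    if sum(counts.values()) != total:
        raise ValueError("categories do not sum to the total")
    return {name: 100.0 * n / total for name, n in counts.items()}


# Counts copied from the process_radtags summary above.
counts = {
    "Ambiguous Barcodes": 40525562,
    "Low Quality": 12239773,
    "Ambiguous RAD-Tag": 140674240,
    "Retained Reads": 87882563,
}

for name, pct in read_fate_percentages(counts, 281322138).items():
    print(f"{name:<20} {pct:6.2f}%")
```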

I have checked whether my restriction site (AATT) is intact, and most of these reads are discarded because of Ns in it (NNTT, AANT and even NNNN).
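
To check how many reads fail only because of Ns in the cut site (as opposed to genuinely wrong bases), a minimal Python sketch like the following could tally the first four bases of read 2. It assumes the MluCI remnant AATT starts at position 0 of the read (no inline barcode on read 2); set `offset` otherwise. The function names and the file name are hypothetical, not part of Stacks.

```python
import gzip
from collections import Counter


def classify_cut_site(seq, site="AATT", offset=0):
    """Classify the cut-site remnant expected at `offset` in a read.

    "intact"   - bases match `site` exactly
    "n_only"   - every mismatch is an uncalled base (N)
    "mismatch" - at least one called base disagrees with `site`
    """
    remnant = seq[offset:offset + len(site)].upper()
    if len(remnant) < len(site):
        return "mismatch"  # read too short to contain the site
    if remnant == site:
        return "intact"
    if all(b == s or b == "N" for b, s in zip(remnant, site)):
        return "n_only"
    return "mismatch"


def tally_cut_sites(fastq_lines, site="AATT", offset=0):
    """Count remnant classes over FASTQ records (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[classify_cut_site(line.strip(), site, offset)] += 1
    return counts


# Hypothetical usage on a gzipped read-2 file:
# with gzip.open("sample.2.fq.gz", "rt") as fh:
#     print(tally_cut_sites(fh))
```

A large "n_only" count relative to "mismatch" would support the idea that the problem is confined to the cut site.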

Other posts suggest that this can occur because of low base diversity during Illumina sequencing (https://groups.google.com/forum/#!searchin/stacks-users/rem$20file/stacks-users/-8JbcWDwzCw/xNPwaeDuXJMJ). Is there any way to recover these reads? I cannot use --disable_rad_check, because most of these reads are then discarded as low-quality reads by the default filter in Stacks.

The alternative is to use only read 1 after cleanup. During process_radtags most of the read 1 sequences are actually recovered, and I believe they are stored in the .rem file (please correct me if I am wrong). Can I just combine these files and proceed with denovo_map.pl? Is there a way to determine how many reads are rescued?

Thank you

Regards,
Kritika

Julian Catchen

Feb 28, 2014, 6:22:42 AM
to stacks...@googlegroups.com, kritika...@gmail.com
Hi Kritika,

You have to consider the quality of your sequenced library if your
paired-end reads are all discarded due to ambiguous cut sites and then,
if you disable that check, are discarded for low quality instead. If
it is just an isolated problem where you have Ns in the first few
bases but the rest of the read is fine, you can omit the -c option
(which discards reads with Ns) and use only the -q option.

You could use a program like sed to correct your cut sites, since you
know the sequence. Again, I would only do this if the problem is
isolated to the cut site, and not if you have general quality issues
with the entire read.
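
The sed suggestion amounts to overwriting the cut-site bases whenever the only disagreements are Ns. A minimal Python sketch of that idea (not a Stacks feature; the names are hypothetical), assuming the AATT remnant sits at position 0 of read 2. It rewrites only sequence lines and never drops a record, so the read 1 and read 2 files stay in phase; the quality characters for the former N positions are left unchanged. Note that it would also turn NNNN into AATT, so it only makes sense when the rest of the read looks good.

```python
def repair_cut_site(seq, site="AATT", offset=0):
    """Replace the remnant with `site` if every mismatch is an N."""
    window = seq[offset:offset + len(site)].upper()
    if len(window) == len(site) and all(
        b == s or b == "N" for b, s in zip(window, site)
    ):
        return seq[:offset] + site + seq[offset + len(site):]
    return seq  # called bases disagree (or read too short): leave untouched


def repair_fastq(lines, site="AATT", offset=0):
    """Yield FASTQ lines with N-only cut-site errors repaired."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line
            yield repair_cut_site(line.rstrip("\n"), site, offset) + "\n"
        else:  # header, '+' and quality lines pass through unchanged
            yield line


# Hypothetical usage:
# import gzip
# with gzip.open("in.2.fq.gz", "rt") as src, gzip.open("fixed.2.fq.gz", "wt") as dst:
#     dst.writelines(repair_fastq(src))
```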

Finally, you can just use read 1 and combine your reads together as you
suggest. The remainder reads are stored in separate files to keep the
paired reads in phase (so the first single-end read finds its paired-end
read as the first read in the paired-end file, too).
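
The question of how many reads are rescued can be answered by counting records in the remainder file, since each FASTQ record is four lines. A minimal Python sketch that concatenates plain or gzipped FASTQ files and reports the record count of each input; the function names and file names are hypothetical placeholders (process_radtags output naming may differ between versions). The combined file is meant for a single-end (read 1 only) analysis: once remainder reads are appended, it no longer lines up with any read 2 file.

```python
import gzip


def _open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def combine_fastq(inputs, output):
    """Concatenate FASTQ files into `output`; return records per input file."""
    counts = {}
    with open(output, "w") as out:
        for path in inputs:
            n_lines = 0
            with _open_fastq(path) as fh:
                for n_lines, line in enumerate(fh, start=1):
                    # Guard against a missing final newline merging two records.
                    out.write(line if line.endswith("\n") else line + "\n")
            counts[path] = n_lines // 4  # 4 lines per FASTQ record
    return counts


# Hypothetical usage: the count for the remainder file is the number of rescued reads.
# print(combine_fastq(["sample.1.fq.gz", "sample.rem.1.fq.gz"], "sample.combined.1.fq"))
```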

Best,

julian

Kritika Garg

Mar 18, 2014, 7:50:01 AM
to stacks...@googlegroups.com, jcat...@uoregon.edu
Hi Julian,
Thank you for the reply, and sorry I could not get back to you sooner. We have issues only at the restriction enzyme site, so I will try sed to correct my sequences. For now I have concatenated the remainder file and the main file for read 1. I have a few questions regarding the analysis.

1. When we ran the Stacks pipeline on RAD data, the restriction site was removed. With the ddRAD data, I have noticed that the restriction site is still present in the consensus sequence. Is there something wrong with the command I am using to clean the data? Below is the command I used.

process_radtags -P -p ./raw -o ./samples/ -b ./raw/barcodes -r -q --inline_index --renz_1 nlaIII --renz_2 mluCI -i gzfastq -t 80

2. When I use the populations wrapper to generate an output file for SNPs, the terminal shows that I have 1212 loci for which summary statistics are given. But when I check the structure file, there is data for only 517 loci. Is this correct, or am I making a mistake when running the command? The command is below.

populations -P ./default/ -b 1 -k -r 0.5 -f p_value -s -t 36 --structure --genepop --write_single_snp

Thank you

Regards,
Kritika

Julian Catchen

Mar 18, 2014, 1:22:47 PM
to stacks...@googlegroups.com, kritika...@gmail.com
Hi Kritika,

It is normal for process_radtags to leave the restriction enzyme cut
site intact. It will only remove inline barcodes, if you have them.

For the populations program (it is not a wrapper), you have 1212 loci,
but you are asking for only the first SNP at each polymorphic locus to
be output in the structure file (--write_single_snp), resulting in 517
loci. If you count the number of unique loci in your sumstats file, it
should equal 517; a certain number of those loci have more than one SNP.
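
This can be checked directly by counting unique locus IDs versus unique SNPs in the sumstats file; with --write_single_snp, the structure file should contain one SNP per unique locus. A minimal Python sketch; it assumes a tab-separated file with '#'-prefixed header lines, the locus ID in the second column, the SNP column position in the fifth, and one row per SNP per population, so check these column indices against the header of your own file. The function name and file name are hypothetical.

```python
import csv


def count_loci_and_snps(lines, locus_col=1, pos_col=4):
    """Return (unique loci, unique SNPs) from a sumstats-style TSV.

    Column indices are assumptions; adjust them to match your file's header.
    """
    loci = set()
    snps = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip header/comment lines and blank lines
        loci.add(row[locus_col])
        # The same SNP appears once per population, so dedupe on (locus, column).
        snps.add((row[locus_col], row[pos_col]))
    return len(loci), len(snps)


# Hypothetical usage (file name assumed for batch 1):
# with open("batch_1.sumstats.tsv") as fh:
#     print(count_loci_and_snps(fh))
```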