Hi Leanne,
In terms of the restriction enzyme, double digest protocols are generally designed so that all reads from one sequencing direction start at the site of just one of the two restriction enzymes. So even with a double digest, you will likely still have an RE site in your sequences, and yes, you'd need to identify the appropriate one.
If you're unsure which one to use, the easiest way to check might be to view your fastq file and look for a stretch of sequence that is repeated in every read (ignoring sequencing errors). The file is probably too big for most programs to open whole, so a command like 'more' or 'less' in a Terminal window is likely the best way to view it. If you have demultiplexed data and no inline barcodes (see below), this repeated section should be at the beginning of each read. Note that it will probably not make up the entire recognition sequence for the enzyme; the portion that actually appears in the reads is what you want to feed into AftrRAD for the 're' parameter.
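If paging through the file is tedious, a short script can tally the first few bases of each read for you. The sketch below is a minimal Python illustration, not part of AftrRAD: the function name `leading_kmer_counts` and its `k`, `offset` and `max_reads` parameters are hypothetical, it assumes plain (uncompressed) 4-line FASTQ records, and `offset` would be the inline barcode length if your reads still carry barcodes.

```python
from collections import Counter
from itertools import islice


def leading_kmer_counts(fastq_lines, k=5, offset=0, max_reads=100_000):
    """Tally the k bases starting at `offset` in each read of a FASTQ stream.

    fastq_lines: any iterable of lines (an open file works), assumed to hold
    plain 4-line FASTQ records. Only the first `max_reads` records are read,
    so a very large file is never loaded into memory.
    """
    counts = Counter()
    # Each FASTQ record is 4 lines; stop after max_reads records.
    for i, line in enumerate(islice(fastq_lines, max_reads * 4)):
        if i % 4 != 1:  # line 2 of each record is the sequence
            continue
        seq = line.strip()
        if len(seq) >= offset + k:
            counts[seq[offset:offset + k]] += 1
    return counts


# Made-up example reads, just to show the call; pass an open file in practice.
example = ["@r1", "TGCAGAAAC", "+", "IIIIIIIII",
           "@r2", "TGCAGTTTG", "+", "IIIIIIIII",
           "@r3", "TGCATCCCA", "+", "IIIIIIIII"]
print(leading_kmer_counts(example, k=5).most_common(2))
# [('TGCAG', 2), ('TGCAT', 1)]
```

On a real file, one start sequence shared by nearly every read is a good candidate for the 're' value, though you may still need to trim it back to the part contributed by the enzyme.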
In terms of your demultiplex question, I may need a bit of clarification on what you mean when you say the barcodes it creates don't match your samples. The demultiplex option can be a little confusing in general, though, so I'll try to explain it here a little better than the current manual does. If this doesn’t answer your question, let me know.
First, there are two main ways to label each individual sample when prepping libraries. One option is to construct your adaptors such that a “barcode” unique to each sample becomes the first X bases (usually 5-7, but could be more or fewer) of each sequence read for that sample. In this case, if you do a single-read 50 bp Illumina run with 6 bp barcodes, you sacrifice 6 of the 50 bases to the barcode, leaving 44 bases of fragment sequence. I'll refer to these as inline barcodes, or just "barcodes" for short. An alternative is to put an "index" sequence further into the adaptor, away from where the adaptor actually attaches to the fragment. This is the most common method in Illumina sequencing, especially outside of RADseq (though lots of RADseq folks do this too). For clarity, I'll refer to these as "indexes" as opposed to "barcodes", though the two terms are often used interchangeably. With indexes, the sequencer has to be set up for an “indexed run”. In an SR-50 indexed run, it sequences the first 50 bases of your fragments as normal, but then starts a second, separate sequencing reaction with a different sequencing primer, oriented in the opposite direction to the first, that reads just the index. It uses these index reads to sort the sequences into different fastq files (demultiplexing). This is unlike the inline-barcode approach, where the sequencer doesn’t actually interpret the barcodes, but just dumps all the sequences into the same fastq file, and you have to sort them out later.
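To make the inline-barcode case concrete, here is a rough Python sketch of the "sort them out later" step. AftrRAD does its own version of this internally; this is not its code. The sketch assumes exact barcode matches and that all barcodes have the same length (real designs don't always), and the function name and barcode/sample names are hypothetical.

```python
from collections import defaultdict


def split_by_inline_barcode(reads, barcodes):
    """Assign reads to samples by an inline barcode at the start of each read.

    reads: iterable of sequence strings.
    barcodes: dict mapping barcode sequence -> sample name; all barcodes are
    assumed to be the same length (a simplification for this sketch).
    Returns (dict sample -> reads with the barcode removed, number of reads
    whose first bases matched no barcode).
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch needs at least one barcode, all of equal length")
    blen = lengths.pop()
    by_sample = defaultdict(list)
    unmatched = 0
    for seq in reads:
        sample = barcodes.get(seq[:blen])
        if sample is None:
            unmatched += 1  # exact matching only; no tolerance for sequencing errors
        else:
            by_sample[sample].append(seq[blen:])  # the barcode bases are "sacrificed"
    return dict(by_sample), unmatched


# Hypothetical 6 bp barcodes and reads, for illustration only.
samples, n_unmatched = split_by_inline_barcode(
    ["ACGTACTGCAGAA", "TTGCAATGCAGCC", "GGGGGGTGCAG"],
    {"ACGTAC": "sample1", "TTGCAA": "sample2"},
)
print(samples, n_unmatched)
```

With indexes, none of this is needed on your side, because the sequencer has already done the equivalent sorting into separate files.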
When we first put AftrRAD together, we wrote it for our own data (yeah, maybe a little selfish there), which at the time only had inline barcodes (everything was in one fastq file). This is where the Barcode file comes into play: it tells AftrRAD how to sort the reads and assign them to individual samples based on the inline barcodes. As written, AftrRAD needs each sequence read to start with an inline barcode, but datasets constructed with the index approach don't have these; instead, their reads are demultiplexed into individual files (one per sample) by the sequencer. The easiest way for me to deal with this was to generate a unique, random barcode sequence for each of the demultiplexed files and add it to the start of every read in that file, creating inline barcodes. Then we concatenate all of the demultiplexed files (with every read now carrying an inline barcode) into a single file named “AllSamples.txt”. Essentially, we're working backwards, undoing the demultiplexing that the sequencer did. This puts the data in the format that AftrRAD expects. From there, AftrRAD re-demultiplexes the samples as part of a normal run, based on the barcodes that were added.
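The re-barcoding trick above might look roughly like the following Python sketch. It is an illustration under assumptions, not AftrRAD's implementation: the function names are hypothetical, barcodes are drawn without replacement from all 4^length possible sequences so no two files can share one, and the added bases get a placeholder quality character so each quality line stays as long as its sequence line (how AftrRAD itself handles quality for the added bases isn't covered here).

```python
import random
from itertools import product


def random_unique_barcodes(names, length=6, seed=None):
    """Give each name a distinct random barcode of the given length."""
    rng = random.Random(seed)
    pool = ["".join(p) for p in product("ACGT", repeat=length)]
    if len(names) > len(pool):
        raise ValueError("not enough distinct barcodes of this length")
    # Sampling without replacement guarantees every barcode is unique.
    return dict(zip(names, rng.sample(pool, len(names))))


def rebarcode_and_concatenate(samples, length=6, seed=None, qual_char="I"):
    """samples: dict sample name -> list of (header, seq, '+', qual) records.

    Returns (barcode map, list of lines for one combined FASTQ-style file).
    """
    barcodes = random_unique_barcodes(list(samples), length, seed)
    out = []
    for name, records in samples.items():
        bc = barcodes[name]
        for header, seq, plus, qual in records:
            # Prefix the barcode; pad quality so the lengths still agree.
            out.extend([header, bc + seq, plus, qual_char * length + qual])
    return barcodes, out


# Two hypothetical demultiplexed "files", already read into memory.
demux = {
    "sampleA.fq": [("@a1", "TGCAGAA", "+", "IIIIIII")],
    "sampleB.fq": [("@b1", "TGCAGCC", "+", "IIIIIII"),
                   ("@b2", "TGCAGTT", "+", "IIIIIII")],
}
barcode_map, combined = rebarcode_and_concatenate(demux, seed=1)
print(barcode_map)
print("\n".join(combined))
```

Writing `combined` out one line per line would give a single file analogous to AllSamples.txt, and `barcode_map` plays the role of the Barcode file in this sketch.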
So, in terms of your specific question, the barcodes in the Barcode file created when the 'dplexedData' flag is used are randomly generated, so they will not match the index sequences in your adaptors. This is OK, and you are correct that if you're using the 'dplexedData' flag, you shouldn't enter any barcodes at the start of the run. Hopefully this is what you were asking. On the other hand, if using the 'dplexedData' flag caused reads from one original demultiplexed file to get the same inline barcode as reads from a different demultiplexed file, that would certainly be a problem. I’m not aware of anything like this happening, but if you think it may be occurring, please let me know and we’ll look into it more.
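The problem case above, two source files ending up with the same barcode, is easy to check for once you have the barcode-to-file assignments in hand. A minimal Python sketch, assuming you have already read those assignments into a dict (parsing AftrRAD's actual Barcode file format is not shown, and the function name is hypothetical):

```python
from collections import defaultdict


def find_barcode_collisions(barcode_by_file):
    """Return {barcode: [files]} for any barcode assigned to more than one file."""
    files_by_barcode = defaultdict(list)
    for fname, bc in barcode_by_file.items():
        files_by_barcode[bc.upper()].append(fname)  # compare case-insensitively
    return {bc: fs for bc, fs in files_by_barcode.items() if len(fs) > 1}


# Hypothetical assignments; an empty result means every file got its own barcode.
print(find_barcode_collisions({"s1.fq": "ACGTAC", "s2.fq": "TTGCAA"}))
```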
Sorry about the really long response, but hopefully that will clarify something for someone!
Mike
Arguments entered are...
Help 0
re 0
numIndels 3
Phred 33
minQual 20
DataPath Data/
MaxH 90
BarcodePath Barcodes/
P2 noP2
stringLength 15
minIden 90
minParalog 5
dplexedData 1
minDepth 5
But the run and the whole computer freeze at this point...
......
Filtering sequence 366000000.
Use of uninitialized value $TotalBarcodeNonMatches in concatenation (.) or string at AftrRAD.pl line 957.
Use of uninitialized value $TotalBarcodeNonMatches in concatenation (.) or string at AftrRAD.pl line 958.
Demultiplexing samples for data file AllSamples.txt.
No specific error message; I just come back to the computer and it is frozen. I thought it might be a space issue, but the external HD I'm using still has 1 TB available.
Any other ideas?
Thanks
Leanne
Hi Mike,
Apologies if you're getting this message in your inbox several times.
Thanks. I don't think the read lengths should be causing a problem. I realised this could be an issue when I first ran AftrRAD, as my data come from two separate sequencing runs. So I first trimmed everything back to 49 bp, then ran it through cutadapt in two steps: trimming adaptors and then removing reads shorter than 49 bp.
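The length-equalizing part of the steps above (truncate to 49 bp, discard anything shorter) amounts to something like this Python sketch. It is an illustration only: it assumes reads are already parsed into (header, sequence, '+', quality) tuples, the function name is hypothetical, and in the workflow described the short-read filter runs after adaptor trimming rather than in the same pass.

```python
def trim_and_filter(records, length=49):
    """Truncate each FASTQ record to `length` bases; drop records shorter than that.

    records: iterable of (header, seq, plus, qual) tuples. The quality string is
    truncated along with the sequence so the two stay the same length.
    """
    for header, seq, plus, qual in records:
        if len(seq) < length:
            continue  # too short (e.g. after adaptor removal): discard
        yield header, seq[:length], plus, qual[:length]


# Made-up reads of 60, 40 and 49 bp.
recs = [("@r1", "A" * 60, "+", "I" * 60),
        ("@r2", "C" * 40, "+", "I" * 40),
        ("@r3", "G" * 49, "+", "I" * 49)]
kept = list(trim_and_filter(recs))
print([r[0] for r in kept])
```

After this step every surviving read is exactly 49 bp, regardless of which run it came from.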
Leanne
Currently genotyping sample...SscRAD1_201_trimmed49.fq sh: R: command not found
No such file or directory at Genotype.pl line 168.
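'sh: R: command not found' means the shell that Genotype.pl started could not find an executable named R in the directories on its PATH, so R is presumably either not installed or not on the PATH for that session. The following 'No such file or directory' line plausibly follows from R never running (an inference from these two lines, not confirmed). A quick Python check, with a hypothetical helper name, of whether R resolves on the current PATH:

```python
import shutil


def command_available(name):
    """Return True if `name` resolves to an executable on the current PATH."""
    # shutil.which searches the directories listed in PATH for an executable file.
    return shutil.which(name) is not None


path = shutil.which("R")
print(f"R found at {path}" if path else "R is not on the PATH")
```

Running `which R` in a Terminal gives the same answer. Keep in mind that the PATH a script sees can differ from the one in your interactive shell.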