Hi Julian,
thank you for the response. I think you are right about demultiplexing the i5/i7 barcodes only. I was also able to obtain more information about the library preparation: the 3RAD method was used for this project (the third enzyme is used to cut dimers). The Adapterama III paper that describes the method mentions using Stacks for data cleaning and processing, so I assumed that my data should work with the software. First, I want to ask: can 3RAD itself be the issue here?
Three enzymes were used: BamHI, MspI, and ClaI. The Adapterama III paper provides a table of enzymes and says that in this case ClaI is the third enzyme (for cutting dimers).
I looked at the sequences in the two files (which correspond to the two reads for the same sample) and at the barcodes present in the reads. The cut sites match MspI and BamHI.
I first ran the following command on the pair of read files whose sequences I had checked:
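A manual check like this can be repeated over many reads by tallying the first bases of each read. The following is only a minimal sketch and is not part of Stacks: the function names are hypothetical, the toy records are made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from collections import Counter
from itertools import islice


def fastq_sequences(lines):
    """Yield the sequence line (the 2nd of every 4 lines) of FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def prefix_counts(sequences, length, limit=100000):
    """Count the first `length` bases of at most `limit` reads."""
    return Counter(seq[:length] for seq in islice(sequences, limit))


def top_prefixes(path, length, n=5):
    """Return the n most common read prefixes in a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return prefix_counts(fastq_sequences(handle), length).most_common(n)


# Toy records, made up for illustration only (not real data).
toy = ["@r1", "AGCGTTGATCGGACGT", "+", "IIIIIIIIIIIIIIII",
       "@r2", "AGCGTTGATCGGTTTT", "+", "IIIIIIIIIIIIIIII",
       "@r3", "CCCCCCCCCCCCCCCC", "+", "IIIIIIIIIIIIIIII"]
print(prefix_counts(fastq_sequences(toy), 12).most_common())
```

On the real files this could be called as top_prefixes("./AS-1_2-PN_1.fastq.gz", 12) and top_prefixes("./AS-1_2-PN_2.fastq.gz", 13), where 12 and 13 are the lengths of the barcode plus cut-site remnant quoted in this post. If the expected prefix is not among the most common ones in a file, the barcode may sit at a different position or on the other read.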
process_radtags -1 ./AS-1_2-PN_1.fastq.gz -2 ./AS-1_2-PN_2.fastq.gz -o ./cleaned/ -b ./barcodes_no_cutting_site_AS.txt --renz_1 mspI --renz_2 bamHI --inline-inline -r -c -q -y fastq --adapter_1 TACACTCTTTCCCTACACGACGCTCTTCCGATCT --adapter_2 GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT --adapter_mm 2
As you can see from the name of the barcodes file, I removed the cut-site sequences from the barcodes: instead of the MspI barcode AGCGTTGATCGG I used AGCGTTGAT, and instead of the BamHI barcode ATGCTGTCGATCC I used ATGCTGTC (I followed your suggestions from the thread
https://groups.google.com/forum/#!topic/stacks-users/Sa8X4aArDHA).
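The trimming described above can be written down as a small helper, which also guards against removing the wrong number of bases. This is a sketch under assumptions: the function and dictionary names are hypothetical, and the remnants are the bases left on the read after digestion (CGG for MspI, which cuts C^CGG, and GATCC for BamHI, which cuts G^GATCC).

```python
# Bases left at the start of the insert after digestion.
# Keys follow the enzyme names passed to --renz_1 / --renz_2 in this post.
REMNANTS = {"mspI": "CGG", "bamHI": "GATCC"}


def strip_remnant(barcode_with_site, enzyme):
    """Remove the cut-site remnant from the end of a barcode+site string."""
    remnant = REMNANTS[enzyme]
    if not barcode_with_site.endswith(remnant):
        raise ValueError(
            f"{barcode_with_site} does not end with the {enzyme} remnant {remnant}"
        )
    return barcode_with_site[: -len(remnant)]


print(strip_remnant("AGCGTTGATCGG", "mspI"))    # AGCGTTGAT
print(strip_remnant("ATGCTGTCGATCC", "bamHI"))  # ATGCTGTC
```

Both results match the barcodes used in the barcodes file for this run.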
As a result of this analysis, I received the following statistics:
1375804 total sequences
11222 reads contained adapter sequence (0.8%)
702566 barcode not found drops (51.1%)
53098 low quality read drops (3.9%)
16553 RAD cutsite not found drops (1.2%)
592365 retained reads (43.1%)
So 43.1% of the reads were retained, and in 51.1% of the reads, about half, the barcode was not found.
I then tried running the analysis with only one restriction enzyme. First, with only mspI:
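As a sanity check on these numbers, the five categories of the run with both enzymes can be summed and converted to percentages. The sketch below only re-uses the counts reported above; the variable names are hypothetical.

```python
# Counts copied from the process_radtags summary for the run with both enzymes.
total = 1375804
counts = {
    "adapter": 11222,
    "barcode_not_found": 702566,
    "low_quality": 53098,
    "cutsite_not_found": 16553,
    "retained": 592365,
}

# The five categories should account for every input sequence.
assert sum(counts.values()) == total

for name, n in counts.items():
    print(f"{name:18s} {n:8d} {100 * n / total:5.1f}%")
```

The categories add up exactly to the total, so each read is counted once, and the barcode drops alone account for roughly half of the input.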
1375804 total sequences
11470 reads contained adapter sequence (0.8%)
690960 barcode not found drops (50.2%)
60078 low quality read drops (4.4%)
2058 RAD cutsite not found drops (0.1%)
611238 retained reads (44.4%)
These statistics are very similar to those from the run with both enzymes listed: in about half of the reads the barcode was not found.
Second, with only BamHI:
1375804 total sequences
0 reads contained adapter sequence (0.0%)
1375784 barcode not found drops (100.0%)
3 low quality read drops (0.0%)
5 RAD cutsite not found drops (0.0%)
12 retained reads (0.0%)
As a result, 0% of the reads were retained. Based on this, it seems that Stacks cannot detect the BamHI barcode. Is my conclusion correct? The barcode is present in the FASTQ file: I checked the sequence, and it is the same as the one I use in the barcodes file.
I feel stuck at this point. Maybe I am missing something that is obvious to a more experienced eye, but I do not understand why the barcode is not found in 50% of the data.
Do you by chance have any suggestions as to what may be causing this issue and how I should proceed from here?