Hello everyone,
I am sharing a problem that has been driving me insane! I would love your help working out whether it is a wet-lab problem in the library prep itself or something solvable in Stacks.
Problem: Stacks discards about two thirds (roughly 66%) of my reads as either "RAD cutsite not found" or "barcode not found".
Protocol used: double digest with PstI and BfaI, paired-end Illumina sequencing (150 bp reads)
Stacks code used: process_radtags -P -p input_path -o output_path -b barcodes.txt --renz-1 PstI --renz-2 BfaI -r -c -q --inline_null --bestrad --rescue
Stacks version: Stacks/2.59-GCCcore-10.2.0
Example of what my FASTQ file looks like:
@LH00309:469:22WGL2LT4:2:1101:1760:1028 1:N:0:TAAGGC
GGATTACAGGCACCCACCACTACACCTGGCTAATTTTTGTATTTTTAGTAAAGATAGGGAGAATATTTTAAAAAATAGTATAGACTAATTTGTTAAGTAATTATTGAATTATGTGCTACATATGAGCACTTAAGTAGACTTGATTTCTTTC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIII*IIII99I9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9I99IIIIIII9IIIIIIIII*II9I9II*IIII9III9
@LH00309:469:22WGL2LT4:2:1101:2327:1028 1:N:0:TTAGAC
GAAGACCCTCTCGGCATGGACGAGCTGTACAAGCTGTCGGGATCAGGCGGCGGCGGCTCCTTCGAGTGCAAGGATTGCGGCAAGGCCTTCATCCAGAAGAGCAACCTGATCAGACACCAGAGAACACACGGAGGATAAGTTAGTAATGAGC
Stacks output report:
Total Sequences: 77296632
Barcode Not Found: 30135808 (39%)
Low Quality: 110679
RAD Cutsite Not Found: 20963855 (27.11%)
Retained Reads: 26086290 (33.75%)
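As a sanity check on the report, here is a minimal Python sketch that recomputes the percentages from the counts above. Nothing in it comes from Stacks itself; the `percent` helper is just an illustrative name.

```python
# Counts copied from the process_radtags report above.
total = 77296632
counts = {
    "Barcode Not Found": 30135808,
    "Low Quality": 110679,
    "RAD Cutsite Not Found": 20963855,
    "Retained Reads": 26086290,
}

# The four categories should account for every read.
assert sum(counts.values()) == total


def percent(n, whole):
    """Share of `whole` as an unrounded percentage."""
    return 100.0 * n / whole


for name, n in counts.items():
    print(f"{name}: {n} ({percent(n, total):.2f}%)")

# Everything that was not retained counts as discarded.
discarded = total - counts["Retained Reads"]
print(f"Discarded overall: {discarded} ({percent(discarded, total):.2f}%)")
```

The categories add up to the total, so no reads are unaccounted for; the loss is split between the barcode check and the cutsite check.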
I was told that our protocol adds an extra "GG" at the start of the reads, which is why all of our barcodes start with GG (e.g. GGGGAAGAA, GGCAGAGAA, or GGTCGTCAA).
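One way to test the "GG" claim is to tally the first few bases of the raw reads and see whether the expected barcodes dominate. The sketch below is a hedged illustration in Python: the function names are my own, the toy reads are made up, and the TGCAG remnant placed after the barcode is an assumption about what a PstI cut end looks like in a read.

```python
from collections import Counter
from itertools import islice


def read_sequences(fastq_lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def prefix_counts(sequences, k, limit=None):
    """Count the first k bases of each read, optionally only the first `limit` reads."""
    if limit is not None:
        sequences = islice(sequences, limit)
    return Counter(seq[:k] for seq in sequences if len(seq) >= k)


# Toy records for illustration only (not real data). The first two carry one
# of the barcodes quoted above followed by an assumed TGCAG remnant.
toy = [
    "@r1", "GGGGAAGAATGCAGACGT", "+", "IIIIIIIIIIIIIIIIII",
    "@r2", "GGGGAAGAATGCAGTTTT", "+", "IIIIIIIIIIIIIIIIII",
    "@r3", "GGATTACAGGCACCCACC", "+", "IIIIIIIIIIIIIIIIII",
]

top = prefix_counts(read_sequences(toy), k=9)
for prefix, n in top.most_common(5):
    print(prefix, n)
```

On real data the lines would come from the gzipped FASTQ, for example via `gzip.open(path, "rt")` with `path` as a placeholder. If the most common 9-base prefixes are not in the barcode file, the barcodes may sit at a different offset or on the other read.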
A few things I have looked at:
- I noticed that there are inline barcodes in the R2 reads, so I tried the flag --inline_inline and changed the barcode file so that the first two columns contain identical barcodes and the third column contains the sample ID, but it wouldn't run (I think because the barcodes are identical?).
- I thought about disabling the RAD cutsite check, but I am worried that would reduce confidence in my demultiplexing.
- I tried the flag --barcode-dist-2 2, but it didn't dramatically increase the percentage of retained reads.
- Could a problem with reading R2 be causing Stacks to drop reads? The reported total does equal the number of R1 and R2 reads combined.
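To see whether R1 or R2 drives the losses, a rough per-read classifier can be run on each file separately. This is a simplified Python sketch, not how process_radtags actually works: it uses exact matching only (no barcode or cutsite rescue), the function names are hypothetical, and the TGCAG (PstI) and TAG (BfaI) remnants are assumptions to verify against the Stacks documentation.

```python
from collections import Counter


def classify(seq, barcodes, cutsite):
    """Label a read 'retained', 'cutsite_not_found', or 'barcode_not_found'.

    Exact matching only; a simplified stand-in, not Stacks's own logic.
    """
    for bc in barcodes:
        if seq.startswith(bc):
            rest = seq[len(bc):]
            return "retained" if rest.startswith(cutsite) else "cutsite_not_found"
    return "barcode_not_found"


def tally(seqs, barcodes, cutsite):
    """Count how many reads fall into each category."""
    return Counter(classify(s, barcodes, cutsite) for s in seqs)


# The three example barcodes quoted in this post (the full list is longer).
barcodes = ["GGGGAAGAA", "GGCAGAGAA", "GGTCGTCAA"]
PSTI = "TGCAG"  # assumed PstI remnant at the start of the insert
BFAI = "TAG"    # assumed BfaI remnant at the start of the insert

reads = [
    "GGGGAAGAATGCAGACGTACGT",   # made-up read: barcode then cutsite
    "GGCAGAGAAACGTACGTACGT",    # made-up read: barcode, no cutsite
    "GGATTACAGGCACCCACCACTAC",  # first bases of the first example read above
]
print(tally(reads, barcodes, PSTI))

# A read without an inline barcode can be checked by passing [""] as barcodes.
print(classify("TAGACGTACGT", [""], BFAI))
```

Under exact matching against those three barcodes, the first example read in this post would land in "barcode_not_found". Running the tally on R1 and R2 separately, with the full barcode list, should show which file and which check account for most of the discards.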
Is there something obvious that I am missing? I can't think of anything besides something going wrong during the lab prep, but I would love any input!