very low retained reads!

Hagar Soliman

Aug 27, 2025, 12:16:24 PM
to Stacks
Hello everyone,

I am sharing a problem that has been driving me insane! I would love your help figuring out whether it is a wet-lab problem in the library prep itself or something solvable within Stacks.

Problem: Stacks discards roughly two-thirds (about 66%) of my reads, almost all of them as "RAD Cutsite Not Found" or "Barcode Not Found".
Protocol used: double digestion with PstI and BfaI, paired-end Illumina sequencing (150 bp reads)
Stacks code used: process_radtags -P -p input_path -o output_path -b barcodes.txt --renz-1 PstI --renz-2 BfaI -r -c -q --inline_null --bestrad --rescue
Stacks version:  Stacks/2.59-GCCcore-10.2.0
Example of how my Fastq file looks:
@LH00309:469:22WGL2LT4:2:1101:1760:1028 1:N:0:TAAGGC
GGATTACAGGCACCCACCACTACACCTGGCTAATTTTTGTATTTTTAGTAAAGATAGGGAGAATATTTTAAAAAATAGTATAGACTAATTTGTTAAGTAATTATTGAATTATGTGCTACATATGAGCACTTAAGTAGACTTGATTTCTTTC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IIIIIIIIIIIIIIIIIII*IIII99I9IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII9I99IIIIIII9IIIIIIIII*II9I9II*IIII9III9
@LH00309:469:22WGL2LT4:2:1101:2327:1028 1:N:0:TTAGAC
GAAGACCCTCTCGGCATGGACGAGCTGTACAAGCTGTCGGGATCAGGCGGCGGCGGCTCCTTCGAGTGCAAGGATTGCGGCAAGGCCTTCATCCAGAAGAGCAACCTGATCAGACACCAGAGAACACACGGAGGATAAGTTAGTAATGAGC

Stacks output report:
Total Sequences:  77296632
Barcode Not Found: 30135808 (39%)
Low Quality: 110679
RAD Cutsite Not Found: 20963855 (27.11%)
Retained Reads: 26086290 (33.75%)
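
As a quick sanity check on the report above, the four categories can be summed and converted to percentages. This is a minimal sketch of that arithmetic only; the dictionary keys simply mirror the report labels, and the `pct` helper is a name introduced here for illustration.

```python
# Counts copied from the process_radtags report above.
total = 77_296_632
counts = {
    "Barcode Not Found": 30_135_808,
    "Low Quality": 110_679,
    "RAD Cutsite Not Found": 20_963_855,
    "Retained Reads": 26_086_290,
}

# The four categories should account for every input read.
assert sum(counts.values()) == total

def pct(n, total=total):
    """Share of the total, in percent."""
    return 100.0 * n / total

for label, n in counts.items():
    print(f"{label:>22}: {n:>10,} ({pct(n):.2f}%)")

# Everything that was not retained, regardless of the reason.
discarded = total - counts["Retained Reads"]
print(f"{'Discarded (all causes)':>22}: {discarded:>10,} ({pct(discarded):.2f}%)")
```

The categories do sum to the total, and the three discard categories together come to about 66% of all reads.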

I was told that our protocol adds an extra "GG" at the start of the reads, which is why all of our barcodes start with GG (e.g. GGGGAAGAA, GGCAGAGAA, or GGTCGTCAA).

A few things I have looked at:
  • I noticed that there are inline barcodes in the R2 reads, so I tried the --inline_inline flag and changed the barcode file so that the first two columns contain identical barcodes and the third column contains the sample ID, but it wouldn't run (I think because the barcodes are identical?).
  • I thought about disabling the RAD cut site check, but I am worried that would reduce the confidence of my demultiplexing.
  • I tried the flag --barcode-dist-2 2, but it didn't noticeably improve the retained percentage.
  • Could there be a problem with reading R2 that's causing Stacks not to retain reads? (The total number of reads is indeed the number of R1 and R2 reads combined.)

Is there something obvious that I am missing? I can't think of anything else besides something going wrong during the lab prep, but I would love any input!

Catchen, Julian

Aug 27, 2025, 1:31:53 PM
to stacks...@googlegroups.com
Hi Hagar,

These are not BestRAD data (BestRAD is a single-digest RAD protocol), so you should omit that flag. Where do you expect the barcodes to appear? I see an Illumina index barcode in your FASTQ header, but did you also use inline barcodes with your ddRAD library prep? When you look at the raw FASTQ file, do you see the barcodes at the start of your single-end reads? I also do not see the PstI cutsite overhang in your reads. Do you see the PstI and BfaI overhangs in your raw sequenced data?
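
One way to answer the "do you see the barcodes at the start of your reads" question is to tally the first few bases of a sample of raw reads. The sketch below is an illustration, not part of Stacks: the function names are made up here, the 9-base prefix length matches the barcodes quoted in the first post, and the three short records are synthetic stand-ins for a real file.

```python
from collections import Counter
from itertools import islice

def read_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()

def prefix_counts(lines, k=9, max_reads=100_000):
    """Count the first k bases of up to max_reads reads."""
    seqs = islice(read_sequences(lines), max_reads)
    return Counter(seq[:k] for seq in seqs if len(seq) >= k)

# Tiny synthetic example; with real data, pass an open text-mode file handle
# (for example gzip.open(path, "rt")) instead of this list.
example = [
    "@read1", "GGTATGAGATGCAGAAAGC", "+", "IIIIIIIIIIIIIIIIIII",
    "@read2", "GGGACGGAATGCAGTACCC", "+", "IIIIIIIIIIIIIIIIIII",
    "@read3", "GGTATGAGATGCAGTTTTT", "+", "IIIIIIIIIIIIIIIIIII",
]
for prefix, n in prefix_counts(example).most_common(5):
    print(prefix, n)
```

If the expected barcodes dominate the top of this tally for R1 (or R2), the barcodes are inline on that read; if the most common prefixes look unrelated to the barcode list, the mismatch is likely upstream of process_radtags.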

Julian

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/

Hagar Soliman

Aug 28, 2025, 10:27:42 AM
to Stacks
Hi Julian!

  • Our lab used the BestRAD protocol (which is why we use this flag), and when I tried removing the flag, the percentage of retained reads dropped from 30% to 2%.
  • We expect the barcode to be in the first 9 nucleotides, with all our sequences starting with GG. (I was told that the GG is added by the PstI digestion. However, not all of my sequences start with GG, and those that don't get discarded, which could be a wet-lab problem.) The barcode in the header is an i7 Illumina index that is present in every read, which is why we didn't include it in the barcode file. Do you think that could be the problem? Here is what some of my good raw reads look like (by "good" I mean that the read starts with GG):
    @LH00813:53:22VH7TLT4:4:1101:19073:1056 1:N:0:TAGCTT
    GGTATGAGATGCAGAAAGCACCAATCTCCATCCAACCATTTCTACCTTCTTCAGCATGCTGGAATCCTCTAACGATGTAAACAAATCGCCATATTCAATCGCTGCTGATATTACAAACTTGGAAATGGCTATCACTGTGCTGCTTGCACAA
    +
    9II*9****99II*III9II*II99III9I9I9I*I9III9II99I*99II99IIII9I9IIIIII*I9*I*I9II*I9*999***9*****99999***99*9999*99*999*999*99*9**9***99**9*999**9**999*9**9
    @LH00813:53:22VH7TLT4:4:1101:22115:1056 1:N:0:TAGCTT
    GGGACGGAATGCAGTACCCCCCGTGAATACTCCGCCGGTATGAAAAGTTCTTAATGTTAATTGAGTACCCGGTTCTCCAATCGATTGCCCTGCAATAATACCTACAGCTTCTCCCAATTCAACCAGGTCGCCGTGAGTAGGACTTCGGCCA
  • I believe the bases I underlined are the enzyme cut site because they are consistently located after the adaptors. 
What are your thoughts? 

Thank you so much! 
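
To make the comparison between the reads quoted in the two posts concrete, the sketch below looks for the PstI remnant "TGCAG" near the start of each one. It is a hedged illustration: `remnant_offset` and the 20-base search window are choices made here, and only the first 30 bases of each quoted read are used.

```python
PSTI_REMNANT = "TGCAG"  # PstI cut-site remnant expected right after the barcode

def remnant_offset(seq, remnant=PSTI_REMNANT, window=20):
    """Return the 0-based offset of the remnant within the first `window` bases, or None."""
    pos = seq[:window].find(remnant)
    return pos if pos != -1 else None

# First 30 bases of the reads quoted in this thread.
reads = {
    "first post, read 1":  "GGATTACAGGCACCCACCACTACACCTGGC",
    "first post, read 2":  "GAAGACCCTCTCGGCATGGACGAGCTGTAC",
    "second post, read 1": "GGTATGAGATGCAGAAAGCACCAATCTCCA",
    "second post, read 2": "GGGACGGAATGCAGTACCCCCCGTGAATAC",
}
for name, seq in reads.items():
    # Print the 9-base candidate barcode and where (if anywhere) the remnant starts.
    print(name, seq[:9], remnant_offset(seq))
```

In the two reads from the second post the remnant starts at offset 9, immediately after a 9-base prefix beginning with GG. In the two reads from the first post it does not appear in the first 20 bases.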

Catchen, Julian

6:03 PM
to stacks...@googlegroups.com
Hi Hagar,

The published BestRAD protocol is a single-digest protocol. Because of the way the protocol works, the combination of restriction enzyme cut site remnant and barcode can end up on either the single-end or the paired-end read (with standard RAD protocols, it is always on the single-end read). So, when you specify the --bestrad flag to process_radtags, it will search both reads for the cutsite/barcode and reverse-complement them if it finds them on the paired-end read. I don’t know what the reads will look like in a double-digest variant, or whether the flag will behave as it does with a standard BestRAD protocol.
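
The idea of checking both reads of a pair for the barcode plus cut-site remnant can be sketched as follows. This is not the Stacks implementation: the function names are hypothetical, matching is exact (no mismatch rescue), and the sketch only reports which read carries the tag and swaps the pair, without any of the sequence transformations process_radtags applies. The barcodes are the three quoted earlier in the thread; the two reads are synthetic.

```python
def starts_with_tag(seq, barcodes, remnant):
    """True if seq begins with any known barcode followed by the cut-site remnant."""
    return any(seq.startswith(bc + remnant) for bc in barcodes)

def orient_pair(r1, r2, barcodes, remnant="TGCAG"):
    """Return (tagged_read, other_read, swapped), or None if neither read carries the tag."""
    if starts_with_tag(r1, barcodes, remnant):
        return r1, r2, False
    if starts_with_tag(r2, barcodes, remnant):
        return r2, r1, True
    return None

barcodes = ["GGGGAAGAA", "GGCAGAGAA", "GGTCGTCAA"]

# Synthetic pair in which the barcode + remnant sits on the second read.
r1 = "ACGTACGTACGTACGTACGT"
r2 = "GGCAGAGAA" + "TGCAG" + "TTTTTT"
print(orient_pair(r1, r2, barcodes))
```

Running a check like this over a sample of read pairs would show how often the tag lands on R1, on R2, or on neither, which is useful context for deciding whether --bestrad is helping with a double-digest library.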

As far as the barcodes go, if all your barcodes have a ‘GG’ prefix, you can just add that as part of the barcodes in the barcodes file supplied to process_radtags and it should work fine. (The cutsite overhang for PstI is “TGCAG”.)
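
A minimal sketch of building barcode-file lines with the shared prefix included is shown below. It assumes a single inline barcode per sample written as barcode, tab, sample name; the 7-base cores are obtained by stripping "GG" from the barcodes quoted earlier in the thread, and the sample names are hypothetical placeholders.

```python
PREFIX = "GG"

# 7-base cores from the barcodes quoted in the thread, mapped to
# hypothetical sample names.
core_barcodes = {
    "GGAAGAA": "sample_01",
    "CAGAGAA": "sample_02",
    "TCGTCAA": "sample_03",
}

def barcode_lines(cores, prefix=PREFIX):
    """Build 'barcode<TAB>sample' lines with the shared prefix added to each barcode."""
    return [f"{prefix}{core}\t{sample}" for core, sample in cores.items()]

lines = barcode_lines(core_barcodes)
print("\n".join(lines))
# With real data, write the lines to the file passed to process_radtags via -b.
```

If the barcode file already lists the full 9-base barcodes (GG included), as the examples in the first post suggest, no change is needed here and the prefix is not the cause of the "Barcode Not Found" reads.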

Best,

Julian
