>90% barcode not found drops!


Brian A

Apr 15, 2019, 3:32:47 PM
to Stacks
Hello, I'm new to Stacks and have been using it to analyze some single-digest (SbfI) paired-end RADseq data I inherited. I'm not sure if this matters, but the data are from 2 species. The samples use 8-nt barcodes, and according to Rochette & Catchen (2017), I should expect each read to begin with <8-nt barcode>TGCAGG<unique sequence> when I visually inspect some of the raw reads in the fastq file. Instead, the reads show no such shared pattern at the start.

I've used a command to return the next 5 nucleotides downstream of the barcode sequence, and for the most part, the most common sequence is TGCAN, where N is a variable nucleotide. This is a little encouraging since TGCA is part of the SbfI cut site, but the variable nucleotide afterward is troubling.
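
For anyone wanting to repeat this kind of check, here is a minimal sketch of one way to tabulate the bases that follow the barcode. It is not the command used above. It assumes four-line (unwrapped) FASTQ records and an inline 8-nt barcode at the very start of each read; the function name `downstream_kmers` and the demo reads are made up for illustration.

```python
from collections import Counter
from itertools import islice
import io

def downstream_kmers(fastq_lines, barcode_len=8, k=5):
    """Count the k bases that follow the barcode in each read.

    Assumes 4-line FASTQ records, so the sequence is every 4th line
    starting from the 2nd line of the input.
    """
    counts = Counter()
    for seq in islice(fastq_lines, 1, None, 4):
        seq = seq.strip().upper()
        # Skip reads too short to contain barcode + k bases.
        if len(seq) >= barcode_len + k:
            counts[seq[barcode_len:barcode_len + k]] += 1
    return counts

# Made-up records for illustration only (not real data).
demo = io.StringIO(
    "@r1\nAACCGGTTTGCAGGACGT\n+\nIIIIIIIIIIIIIIIIII\n"
    "@r2\nTTGGCCAATGCATTACGT\n+\nIIIIIIIIIIIIIIIIII\n"
    "@r3\nAACCGGTTTGCAGCCCGT\n+\nIIIIIIIIIIIIIIIIII\n"
)
counts = downstream_kmers(demo)
for kmer, n in counts.most_common(10):
    print(kmer, n)
```

On a real file, the same function can be fed an open file handle instead of the demo string (for gzipped data, `gzip.open(path, "rt")` from the standard library). If the top k-mers do not match the cut-site remainder expected for the enzyme passed to `-e`, that points to the enzyme setting or the barcode layout rather than to the reads themselves.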

When I run process_radtags, 97% of the reads are dropped as "barcode not found". The command I used and its output are as follows:

process_radtags -p /home/brallen/Pine_RAD/Pine4/clonefilter -b /home/brallen/Pine_RAD/Pine1/barcode1.txt -o /home/brallen/Pine_RAD/Pine4/clonefilter/process_radtags -e sbfI -P -r -c -q

  • 54502372 total sequences

  • 53065872 barcode not found drops (97.4%)

  •   25879 low quality read drops (0.0%)

  •  702424 RAD cutsite not found drops (1.3%)

  •  708197 retained reads (1.3%)


I've also tried to run clone_filter first, but that didn't improve anything. Any troubleshooting advice on what may be going wrong is appreciated! Specifically, I'm trying to determine if there was a problem with the sequencing or my analysis.

Thank you in advance for any help!
Brian A

Dennis Larsson

Apr 20, 2019, 2:44:12 PM
to Stacks
It looks to me like your reads were cut with PstI. Try the same command, but replace sbfI with pstI (pst + capital 'i'). I am pretty sure my PstI-cut reads start with TGCA.