Hello, I'm new to Stacks and have been using it to analyze some single-digest (SbfI) paired-end RADseq data that I inherited. I'm not sure whether this matters, but the data come from 2 species. The samples use 8-nt barcodes, and according to Rochette & Catchen (2017), when I visually inspect raw reads in the FASTQ file I should expect each sequence to start with
<8-nt barcode>TGCAGG<unique sequence>
Instead, I see no such shared sequence at the beginning of the reads.
I've used a command to return the 5 nucleotides immediately downstream of the barcode, and for the most part the most common sequence is TGCAN, where N is a variable nucleotide. This is a little encouraging, since TGCA is part of the SbfI cut site, but the expected TGCAGG remnant has a fixed G in that fifth position, so the variable nucleotide is troubling.
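For anyone who wants to repeat this check, here is a minimal Python sketch of this kind of tally. It is not the exact command I ran. It assumes plain 4-line FASTQ records and an inline barcode at the very start of each read; the function names and the three demo reads are made up for illustration.

```python
from collections import Counter
from itertools import islice
import gzip


def read_sequences(path):
    """Yield the sequence line of each record in a 4-line FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Sequence lines are the 2nd line of every 4-line record.
        for line in islice(handle, 1, None, 4):
            yield line.strip()


def tally_after_barcode(sequences, barcode_len=8, k=5):
    """Count the k nucleotides that follow the first barcode_len bases."""
    counts = Counter()
    for seq in sequences:
        kmer = seq[barcode_len:barcode_len + k]
        if len(kmer) == k:  # skip reads too short to contain the window
            counts[kmer] += 1
    return counts


# Made-up reads: an 8-nt barcode followed by the start of the cut site.
demo_reads = [
    "ACGTACGTTGCAGGTTTT",
    "TTGGCCAATGCAGGAAAC",
    "GGAATTCCTGCATCCGTA",
]
for kmer, n in tally_after_barcode(demo_reads).most_common():
    print(kmer, n)
```

On a real file, the demo list would be replaced with read_sequences("some_reads.fastq"), where the file name is a placeholder.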
When I run process_radtags, 97% of the reads are dropped as "barcode not found". The command I used and its output are as follows:
process_radtags -p /home/brallen/Pine_RAD/Pine4/clonefilter -b /home/brallen/Pine_RAD/Pine1/barcode1.txt -o /home/brallen/Pine_RAD/Pine4/clonefilter/process_radtags -e sbfI -P -r -c -q
54502372 total sequences
53065872 barcode not found drops (97.4%)
25879 low quality read drops (0.0%)
702424 RAD cutsite not found drops (1.3%)
708197 retained reads (1.3%)
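One way to cross-check the "barcode not found" figure independently of process_radtags would be to count how many reads begin with one of the expected barcodes and to list the most common leading sequences among those that do not. Below is a rough Python sketch of that idea. It assumes inline barcodes of equal length at the very start of the read, and the barcodes and reads shown are made up. It uses exact matching only, so it is stricter than process_radtags with -r, which rescues barcodes.

```python
from collections import Counter


def leading_barcode_report(sequences, barcodes):
    """Return (matched, total, unmatched_prefixes) for inline barcodes.

    `barcodes` is a set of equal-length barcode strings; a read counts as
    matched only if its first bases equal one of them exactly.
    """
    barcode_len = len(next(iter(barcodes)))
    matched = 0
    total = 0
    unmatched = Counter()
    for seq in sequences:
        total += 1
        prefix = seq[:barcode_len]
        if prefix in barcodes:
            matched += 1
        else:
            unmatched[prefix] += 1
    return matched, total, unmatched


# Made-up barcodes and reads, for illustration only.
expected = {"ACGTACGT", "TTGGCCAA"}
reads = [
    "ACGTACGTTGCAGGTTTT",
    "TTGGCCAATGCAGGAAAC",
    "GGAATTCCTGCATCCGTA",
]
matched, total, unmatched = leading_barcode_report(reads, expected)
print(f"{matched} of {total} reads start with an expected barcode")
print("most common unmatched prefixes:", unmatched.most_common(5))
```

If the most common unmatched prefixes turned out to be a small set of consistent 8-mers, that would point toward a mismatch between the barcode file and the reads rather than toward a sequencing failure.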
I've also tried running clone_filter first, but that didn't improve anything. Any troubleshooting advice on what may be going wrong would be appreciated! Specifically, I'm trying to determine whether the problem lies with the sequencing or with my analysis.
Thank you in advance for any help!
Brian A