Low percentage of reads retained in process_radtags

Lindsay Miles

Apr 11, 2025, 12:57:04 PM

to Stacks

Hi all,

I prepared my ddRADseq library following Peterson et al. (2012), using the PstI and MspI cut sites, and sequenced it on a NextSeq 2000 run. I cannot for the life of me get the samples demultiplexed.

It looks like the sequencing facility did not cut the Illumina adapters off and they ran with no index. So I used cutadapt to remove the Illumina adapters. The FastQC run shows high-quality reads, so the data should be there. But when I run process_radtags I run into a series of errors. First, it reported that no cut site was found, but I used --disable_rad_check to bypass that.

With this command, only a small percentage of the reads is retained:

process_radtags -P -p ./ -b ./barcodes.txt -o ./ --renz_1 -pstI --renz_2 -mspI -c -q -r --disable_rad_check -i gzfastq --inline-inline --adapter-mm 1

505555390 total sequences
497262826 barcode not found drops (98.4%)
233631 low quality read drops (0.0%)
0 RAD cutsite not found drops (0.0%)
8058933 retained reads (1.6%)

Any insight on how to recover more reads?

Thanks!

Lindsay

Gonzalo Flores

Apr 11, 2025, 4:40:48 PM

to Stacks

Try doing the mapping next and see what you get.

Angel Rivera-Colón

Apr 14, 2025, 6:20:20 PM

to Stacks

Greetings Lindsay,

I would take a subset of the raw reads (before running cutadapt) and manually inspect them. I would try to find where the key elements (i.e., barcodes, cutsite, adapters) are in the FASTQ files. For example, if the library prep uses a PstI-MspI double digest with in-line barcodes, we expect the forward read to start with the barcode followed by the TGCAG of the PstI cutsite (see section 4.1.1 of the Stacks Manual for some examples). Compare those first few bases of the putative barcode against your list of barcodes to make sure they match, check for any differences in barcode lengths, verify that the enzymes are in the correct orientation (i.e., PstI in the forward and MspI in the reverse read), etc. Similarly, look for the presence of adapters to confirm that the trim is necessary. Any patterns here will give you an idea of the exact issue you are encountering and how to handle it. It might be possible to address some of these issues by changing the parameters in process_radtags; in other cases, however, they might reflect larger problems with the library preparation and sequencing. Either way, keep a close eye on the barcodes, since they seem to be the main reason reads are discarded.
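As a rough illustration of that first check, here is a minimal Python sketch that tallies how the start of each forward read matches the expected "barcode then TGCAG" layout. The function names and the 10,000-read limit are my own choices for illustration and are not part of Stacks. It assumes standard four-line FASTQ records and a list of barcode sequences you supply yourself.

```python
import gzip
from collections import Counter
from itertools import islice

PSTI_REMNANT = "TGCAG"  # PstI remnant expected right after an in-line barcode


def read_sequences(path, limit=10000):
    """Yield up to `limit` sequences from a FASTQ file (gzipped or plain).

    Assumes standard four-line FASTQ records.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(islice(handle, limit * 4)):
            if i % 4 == 1:  # the sequence is the second line of each record
                yield line.strip()


def classify_read(seq, barcodes, cutsite=PSTI_REMNANT):
    """Describe how the start of one read matches the expected layout."""
    barcode_seen = False
    for bc in barcodes:
        if seq.startswith(bc):
            # Check whether the cutsite follows immediately after this barcode.
            if seq.startswith(cutsite, len(bc)):
                return "barcode+cutsite"
            barcode_seen = True
    if barcode_seen:
        return "barcode_only"
    if seq.startswith(cutsite):
        return "cutsite_only"
    return "neither"


def summarize(seqs, barcodes):
    """Count reads in each category."""
    return Counter(classify_read(s, barcodes) for s in seqs)
```

Calling summarize(read_sequences(path), barcodes) on a subset of the raw forward reads gives a quick tally. A large "cutsite_only" or "neither" count would suggest the barcodes are not where process_radtags expects them, which is worth confirming by eye in the reads themselves.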

Afterwards, do the same thing with a subset of reads after processing with cutadapt. Are the barcodes and enzyme cutsites still in the expected location? Given that the main reason for discarding reads is the lack of barcodes, I would confirm that the trimming hasn't removed the barcodes for some reason. In my experience, if using in-line barcodes, the Illumina adapters mainly "eat" into the usable length of the sequencing read, and tend not to interfere with barcodes/cutsites given their position in the reads.
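One way to make that before/after comparison concrete is to record where the PstI remnant first appears near the start of each read. This is only a sketch: the helper name and the 20-base window are arbitrary choices of mine, and it works on plain sequence strings rather than on files.

```python
from collections import Counter


def cutsite_offsets(seqs, cutsite="TGCAG", window=20):
    """Tally the offset of the first cutsite match near the start of each read.

    Only matches that lie entirely within the first `window` bases are
    counted; reads without such a match are tallied under -1.
    """
    offsets = Counter()
    for seq in seqs:
        # str.find returns -1 when the cutsite is not found in seq[0:window]
        offsets[seq.find(cutsite, 0, window)] += 1
    return offsets
```

If the barcodes are intact, I would expect the offsets to pile up at your barcode lengths in both the raw and the trimmed subsets. If the peak moves to 0 after cutadapt, that would suggest the trimming removed the barcodes.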

In summary, I think it is important to check the raw reads to make sure they follow the configuration expected from the library prep. This should help you work out whether the issue can be addressed by simply changing a parameter in the software, or whether it stems from a larger problem in the library preparation.
