PCR duplicates


Silvia Bettencourt

Aug 14, 2025, 7:10:05 AM
to Stacks
Good morning all,

Background: I am just starting to learn how to analyse RAD-seq data. I have 24 samples of a tetraploid species, single-digest RAD, and QC is OK. M/n was optimized at 12, with m=3 and max_locus_stacks=5. As I'm a beginner, I am analysing the data on galaxy.eu, which has Stacks 2.

Data: average of 11M reads per sample, ranging from 0.7M to 42M (only one sample is under 1M). ustacks mean coverage is 12x; gstacks mean coverage is 40x, and ns_coverage per sample ranges from 9x to 124x.

Problem: I don't think my data make sense coverage-wise. Also, the gstacks log file does not tell me how many PCR duplicates were removed. The gstacks output is below.

Attempted to assemble and align paired-end reads for 806501 loci:
  0 loci had no or almost no paired-end reads (0.0%);
  4464 loci had paired-end reads that couldn't be assembled into a contig (0.6%);
  For the remaining 802037 loci (99.4%), a paired-end contig was assembled;
    Average contig size was 235.2 bp;
  13332 paired-end contigs overlapped the forward region (1.7%)
    Mean overlap: 12.3bp; mean size of overlapped loci after merging: 212.2;
  Out of 75904698 paired-end reads in these loci (mean 90.0 reads per locus),
    72205159 were successfuly aligned (95.1%);
  Mean insert length was 295.5, stdev: 116.8 (based on aligned reads in overlapped loci).

Genotyped 802018 loci:
  effective per-sample coverage: mean=40.2x, stdev=36.4x, min=9.0x, max=124.7x
  mean number of sites per locus: 225.2
  a consistent phasing was found for 100705 of out 127196 (79.2%) diploid loci needing phasing

gstacks is done.
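
For anyone who wants to double-check the figures, here is a minimal Python sketch that pulls the headline numbers out of the log text quoted above and recomputes the percentages. The function name `summarize` and the dictionary keys are made up for illustration; the regular expressions assume the log wording is exactly as pasted (including the log's own spelling "successfuly"). It also checks whether the word "duplicate" appears anywhere in this excerpt, which is only a check of the pasted text.

```python
import re

# gstacks log text as quoted in this post (wording and typos are the log's own).
LOG = """\
Attempted to assemble and align paired-end reads for 806501 loci:
  0 loci had no or almost no paired-end reads (0.0%);
  4464 loci had paired-end reads that couldn't be assembled into a contig (0.6%);
  For the remaining 802037 loci (99.4%), a paired-end contig was assembled;
    Average contig size was 235.2 bp;
  13332 paired-end contigs overlapped the forward region (1.7%)
    Mean overlap: 12.3bp; mean size of overlapped loci after merging: 212.2;
  Out of 75904698 paired-end reads in these loci (mean 90.0 reads per locus),
    72205159 were successfuly aligned (95.1%);
  Mean insert length was 295.5, stdev: 116.8 (based on aligned reads in overlapped loci).

Genotyped 802018 loci:
  effective per-sample coverage: mean=40.2x, stdev=36.4x, min=9.0x, max=124.7x
  mean number of sites per locus: 225.2
  a consistent phasing was found for 100705 of out 127196 (79.2%) diploid loci needing phasing

gstacks is done.
"""


def summarize(log):
    """Extract a few headline numbers from the log text (hypothetical helper)."""

    def first_int(pattern):
        # Return the first captured integer for the given pattern.
        return int(re.search(pattern, log).group(1))

    loci_total = first_int(r"paired-end reads for (\d+) loci")
    loci_with_contig = first_int(r"remaining (\d+) loci")
    reads_total = first_int(r"Out of (\d+) paired-end reads")
    reads_aligned = first_int(r"(\d+) were successfuly aligned")

    cov = re.search(
        r"mean=([\d.]+)x, stdev=([\d.]+)x, min=([\d.]+)x, max=([\d.]+)x", log
    )
    mean, stdev, cov_min, cov_max = (float(v) for v in cov.groups())

    return {
        "pct_loci_with_contig": round(100 * loci_with_contig / loci_total, 1),
        "pct_reads_aligned": round(100 * reads_aligned / reads_total, 1),
        # Spread of per-sample coverage relative to its mean.
        "coverage_cv": round(stdev / mean, 2),
        "coverage_max_over_min": round(cov_max / cov_min, 1),
        # True only if the pasted excerpt mentions duplicates at all.
        "mentions_duplicates": "duplicate" in log.lower(),
    }


summary = summarize(LOG)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The recomputed percentages agree with the ones printed in the log, and the coverage ratio makes the per-sample spread (roughly 14-fold between the lowest and highest sample) easier to see at a glance.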

Any suggestions on the coverage?
How can I find out how many PCR duplicates were removed? The gstacks tool I'm using on galaxy.eu does not have a specific option to remove PCR duplicates.


I appreciate your help and input.

Silvia Bettencourt

