Negative correlation between mean read depth and heterozygosity

Liz Boggs

Feb 23, 2026, 7:31:14 PM

to Stacks

Hi Stacks community!

I am working on a RAD-seq dataset with ~160 individuals. I processed it with process_radtags, clone_filter, and ref_map. In ref_map I used flags P1 and R80, then filtered the resulting SNP set using VCFtools (--minDP 10, --min-meanDP 15, --max-missing 0.80, --mac 3). I then removed individuals with >25% missing data, kept the SNP with the highest MAF on each RAD tag, and removed loci out of HWE. The resulting dataset is about 6000 SNPs.

When I plot individual mean depth against individual observed heterozygosity (from the VCFtools --het and --depth outputs), I get a pretty strong negative correlation (see attached plot). Looking at missing data, the samples with the highest heterozygosity (and thus lower mean depth) were, unsurprisingly, also the ones with the most missing data, although no sample can have more than 25% missing and most are below 15%. I saw the same trend in a subset of the data that was filtered slightly differently (--minDP 5 and --maf 0.05; other parameters the same). I also checked the site mean depth and didn't see any extra bumps at higher depths, so I'm not sure paralogous loci are the issue either (but I didn't look into paralogs beyond that).
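For anyone who wants to reproduce the comparison, here is a minimal Python sketch of the calculation. It is an illustration under stated assumptions, not necessarily the exact script behind the attached plot. It assumes the standard VCFtools table layouts (`INDV`, `O(HOM)`, `E(HOM)`, `N_SITES`, `F` for --het; `INDV`, `N_SITES`, `MEAN_DEPTH` for --depth) and takes observed heterozygosity as (N_SITES - O(HOM)) / N_SITES. The function names and the three-sample toy tables are made up for illustration only.

```python
import csv
import io
import math


def read_tsv(handle):
    """Parse a tab-delimited VCFtools table into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))


def observed_het(het_rows):
    """Per-individual observed heterozygosity: (N_SITES - O(HOM)) / N_SITES."""
    het = {}
    for row in het_rows:
        n_sites = float(row["N_SITES"])
        if n_sites > 0:  # skip individuals with no genotyped sites
            het[row["INDV"]] = (n_sites - float(row["O(HOM)"])) / n_sites
    return het


def mean_depth(depth_rows):
    """Per-individual mean depth from the --depth table."""
    return {row["INDV"]: float(row["MEAN_DEPTH"]) for row in depth_rows}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient (stdlib only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def depth_het_correlation(het_handle, depth_handle):
    """Correlate mean depth with observed heterozygosity across shared samples."""
    het = observed_het(read_tsv(het_handle))
    depth = mean_depth(read_tsv(depth_handle))
    shared = sorted(set(het) & set(depth))
    return pearson([depth[s] for s in shared], [het[s] for s in shared])


# Toy tables (hypothetical values, NOT from the real dataset).
TOY_HET = (
    "INDV\tO(HOM)\tE(HOM)\tN_SITES\tF\n"
    "s1\t80\t85.0\t100\t-0.33333\n"
    "s2\t70\t85.0\t100\t-1.00000\n"
    "s3\t60\t85.0\t100\t-1.66667\n"
)
TOY_DEPTH = (
    "INDV\tN_SITES\tMEAN_DEPTH\n"
    "s1\t100\t200.0\n"
    "s2\t100\t100.0\n"
    "s3\t100\t20.0\n"
)

if __name__ == "__main__":
    r = depth_het_correlation(io.StringIO(TOY_HET), io.StringIO(TOY_DEPTH))
    print(f"Pearson r (depth vs. observed het): {r:.3f}")
```

With real data, the `io.StringIO` objects would be replaced by open file handles for the two VCFtools tables (by default, assuming no --out prefix was given, `out.het` and `out.idepth`).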

It makes sense that samples with lower read depth would have more spurious heterozygotes, but I'm wondering whether this is a commonly seen trend in other datasets or whether I should be concerned that it comes from a separate issue with my pipeline. If the default model used is marukilow, could having samples with a mean depth above 200 be a problem? We obviously have a pretty big range of mean depths here.

Thanks in advance!

Best,
Liz
