Unrealistic parameters conversion


Hugo D

Mar 20, 2025, 9:12:44 PM
to dadi-user
Hi Ryan, 

I am trying to convert optimized dadi parameters from 2D models into effective population sizes, divergence times in years, and so on.

The converted effective population sizes are unrealistically large (in the million range) given my study organism, which I suspect comes from an error in my computation of the effective sequence length L. 

I am working with medium coverage (~10X) whole genome sequencing data with SNPs imputed using the GATK pipeline. Following the recommendations in the manual and other posts in this group, I have computed L as 

L = Nsequenced sites * (SNPdadi / SNPdetected)

For Nsequenced sites, I summed the invariant, SNP and INDEL sites called by GATK (~160M sites). 

For SNPdetected, I used the number of SNPs originally detected in a population pair before filtration (~30M SNPs). 

For SNPdadi, I used the number of SNPs used to produce a JAFS in a specific population pair. After filtration for low quality sites, missingness, mac, etc., this resulted in 161,316 SNPs, which was brought down to ~25,000 SNPs after filtering for physical linkage (to get unlinked SNPs for computation of confidence intervals using the FIM). So I used SNPdadi ~ 25,000 in the computation of L. Because I masked low frequency bins to accommodate sequencing errors, the real number of SNPs used is slightly lower, ~21,000. 

This gives me an L value of approximately 130,000. Using a mutation rate of 1.2*10^(-8), which has been estimated in a closely related taxon, and the theta of my optimized models (~10,000), this gives an Nref value of ~2M, which I don't think is realistic. 
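The arithmetic described above can be reproduced with a short sketch. It relies on dadi's definition theta = 4 * Nref * mu * L and on dadi's convention that times are expressed in units of 2 * Nref generations. The function names are made up for illustration, and the divergence time T = 0.1 and the generation time of 1 year are hypothetical placeholders, not values from this thread.

```python
def effective_sequence_length(n_sequenced_sites, n_snps_detected, n_snps_used):
    """Scale the sequenced length by the fraction of detected SNPs kept for dadi."""
    return n_sequenced_sites * (n_snps_used / n_snps_detected)


def reference_population_size(theta, mu, seq_length):
    """Invert theta = 4 * Nref * mu * L to recover Nref."""
    return theta / (4.0 * mu * seq_length)


def time_in_years(t_dadi, n_ref, generation_time_years):
    """Convert a dadi time (units of 2 * Nref generations) to years."""
    return 2.0 * n_ref * t_dadi * generation_time_years


# Approximate values quoted in the post.
L = effective_sequence_length(160e6, 30e6, 25_000)
n_ref = reference_population_size(theta=10_000, mu=1.2e-8, seq_length=L)

# Hypothetical placeholders: T = 0.1 and a 1-year generation time.
t_years = time_in_years(t_dadi=0.1, n_ref=n_ref, generation_time_years=1.0)

print(f"L    ~ {L:,.0f} sites")
print(f"Nref ~ {n_ref:,.0f}")
print(f"T    ~ {t_years:,.0f} years (with placeholder inputs)")
```

With the rounded inputs from the post, this gives L of about 133,000 and Nref of about 1.6 million, consistent with the order of magnitude reported above. Using ~21,000 instead of ~25,000 for SNPdadi would make L smaller and Nref correspondingly larger.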

Do you see a specific point in the procedure that is incorrect?

Thanks a lot for your help, 

Hugo 

Ryan Gutenkunst

Mar 23, 2025, 7:28:22 PM
to dadi-user
Hello Hugo,

I don’t see any error in the logic you’ve outlined below. How big is your sample size? Your species has quite high genetic diversity, if ~1/5 of sites have a SNP in a typical sample size. That would tend to suggest a high effective population size, given the moderate mutation rate.

Best,
Ryan


Hugo D

Mar 26, 2025, 2:47:14 AM
to dadi-user
Hi Ryan, 

Thank you for confirming that there isn't an issue with the computation procedure. You are right that this species seems highly diverse, so perhaps the effective population size is not that unrealistic. 

The original sample size was >70 individuals per population (70-600), but I used only 11 individuals per population in order to maximize the number of SNPs retained after filtration for missing data.
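As a rough cross-check of the diversity argument, the fraction of segregating sites can be turned into an order-of-magnitude effective size with Watterson's estimator. This is only a sketch: it assumes a single panmictic, constant-size population and infinite sites, which a two-population sample with ~1/5 of sites variable does not strictly satisfy. The value of 280 chromosomes is an assumption (the lower bound of 70 diploid individuals in each of two populations), and the function names are made up for illustration.

```python
def harmonic_number(k):
    """Sum of 1/i for i = 1..k (Watterson's a_n uses k = n - 1)."""
    return sum(1.0 / i for i in range(1, k + 1))


def watterson_ne(n_snps, n_sites, n_chromosomes, mu):
    """Rough Ne from Watterson's theta per site: theta_W = S / (L * a_n), Ne = theta_W / (4 * mu)."""
    a_n = harmonic_number(n_chromosomes - 1)
    theta_per_site = n_snps / (n_sites * a_n)
    return theta_per_site / (4.0 * mu)


# Approximate counts quoted earlier in the thread.
snp_fraction = 30e6 / 160e6

# ASSUMPTION: 280 chromosomes = 2 populations x 70 diploid individuals (lower bound).
ne_rough = watterson_ne(n_snps=30e6, n_sites=160e6, n_chromosomes=280, mu=1.2e-8)

print(f"Fraction of sites with a SNP: {snp_fraction:.3f}")
print(f"Rough Watterson-based Ne: {ne_rough:,.0f}")
```

Under these assumptions the estimate comes out at several hundred thousand, which is within the same broad range as the dadi-based Nref. A larger true sample size would lower this figure somewhat, since a_n grows with the number of chromosomes.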

Best, 

Hugo 