Hi Ryan,
I am trying to convert optimized dadi parameters from 2D models to effective population sizes, divergence time in years, etc...
The converted effective population sizes are unrealistically large (in the million range) given my study organism, which I suspect comes from an error in my computation of the effective sequence length L.
I am working with medium-coverage (~10X) whole genome sequencing data with SNPs imputed using the GATK pipeline. Following the recommendations in the manual and other posts in this group, I have computed L as
L = N_sequenced_sites * (SNP_dadi / SNP_detected)
For N_sequenced_sites, I summed the invariant sites, SNPs, and INDELs called by GATK (~160M sites).
For SNP_detected, I used the number of SNPs originally detected in a population pair before filtering (~30M SNPs).
For SNP_dadi, I used the number of SNPs that went into the JAFS for a specific population pair. After filtering for low-quality sites, missingness, minor allele count, etc., this left 161,316 SNPs, which was brought down to ~25,000 SNPs after filtering for physical linkage (to get unlinked SNPs for computing confidence intervals with the FIM). So I used SNP_dadi ~ 25,000 in the computation of L. Because I masked the low-frequency bins to accommodate sequencing errors, the actual number of SNPs used is slightly lower, ~21,000.
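For concreteness, here is roughly how I computed L with the approximate counts above (the variable names are just for this sketch):

n_sequenced  = 160e6   # invariant + SNP + INDEL sites called by GATK
snp_detected = 30e6    # SNPs detected in the population pair before filtering
snp_dadi     = 25e3    # unlinked SNPs kept for the dadi JAFS

L = n_sequenced * (snp_dadi / snp_detected)
print(L)   # ~133,000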
This gives me an L value of approximately 130,000. Using a mutation rate of 1.2*10^(-8) (estimated in a closely related taxon) and the theta of my optimized models (~10,000), this gives an Nref value of ~2M, which I don't think is realistic.
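And here is the conversion from theta to Nref, assuming the standard relation theta = 4 * Nref * mu * L:

mu    = 1.2e-8   # mutation rate estimated in a closely related taxon
theta = 10e3     # theta from the optimized 2D models
L     = 133e3    # effective sequence length computed above

Nref = theta / (4 * mu * L)
print(Nref)   # ~1.6 million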
Do you see a specific point in the procedure that is incorrect?
Thanks a lot for your help,
Hugo