Bias in residuals after physical pruning

Hugo D

Mar 30, 2025, 9:29:28 PM

to dadi-user

Hi Ryan,

I have spent a lot of time optimizing some 2D models (>1,000 runs), using multiple rounds of 3-, 2-, and 1-fold perturbations and trying to explore the whole parameter space properly.
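A multi-round perturbation search of this kind can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions: `perturb` follows the idea behind `dadi.Misc.perturb_params` (multiply each parameter by a random factor between 2**-fold and 2**fold, then keep it inside the bounds), while `staged_search`, `toy_loglik`, and its target values are hypothetical stand-ins. In a real analysis each candidate would seed a full optimization instead of just being evaluated.

```python
import math
import random

def perturb(params, fold, lower, upper, rng=random):
    """Multiply each parameter by 2**u with u ~ Uniform(-fold, fold), then clip to bounds."""
    out = []
    for p, lo, hi in zip(params, lower, upper):
        q = p * 2 ** (fold * (2 * rng.random() - 1))
        out.append(min(max(q, lo), hi))
    return out

def staged_search(p0, loglik, lower, upper, folds=(3, 2, 1), runs_per_fold=20, seed=1):
    """Rounds of perturb-and-evaluate, restarting each round from the best point so far.

    Hypothetical helper: in practice each candidate would start a full
    optimization (e.g. with dadi's optimizers) rather than a single evaluation.
    """
    rng = random.Random(seed)
    best = list(p0)
    best_ll = loglik(best)
    for fold in folds:  # broad exploration first, then finer local refinement
        for _ in range(runs_per_fold):
            cand = perturb(best, fold, lower, upper, rng)
            ll = loglik(cand)
            if ll > best_ll:
                best, best_ll = cand, ll
    return best, best_ll

# Toy usage: a stand-in log-likelihood peaked at hypothetical values (1.0, 2.0).
TARGET = (1.0, 2.0)

def toy_loglik(p):
    return -sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(p, TARGET))

best, best_ll = staged_search([0.1, 10.0], toy_loglik, [1e-3, 1e-3], [100.0, 100.0])
print(best, best_ll)
```

Shrinking the fold factor from round to round trades broad exploration for local refinement; running several independent starts per round is what lets you check whether the top likelihoods converge.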

I was originally using a set of ~160,000 linked loci (no LD or physical pruning), with a JAFS built from 11 individuals from 2 populations using the full dataset with no projection (Group1-Group2_allSNPs_noprojection.png).

Once I had obtained my final model, the residuals looked good, with no clear structure (Group1-Group2_asym_mig_allSNPs_noprojection.pdf), but I was having trouble getting parameter confidence intervals with the GIM (most intervals ranged from 0 to infinity despite running >1,000 bootstraps and trying several eps values).
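The Godambe (GIM) calculation can be sketched in a few lines of NumPy, which helps show why intervals can blow up. This is a hedged sketch, roughly the form of calculation behind dadi's `GIM_uncert`, not dadi's code: `H` stands for the negative Hessian of the composite log-likelihood at the best fit, and `boot_scores` for its gradients evaluated on each bootstrap (in dadi both come from finite differences, whose step is the `eps` value). The function name and inputs are hypothetical. If `H J^-1 H` is nearly singular in some direction (for example a ridge where two parameters trade off), its inverse, and hence the standard errors, become enormous.

```python
import numpy as np

def godambe_uncert(H, boot_scores):
    """Standard errors from Godambe information G = H J^-1 H (sketch).

    H           -- (k, k) sensitivity matrix: negative Hessian at the best fit
    boot_scores -- (B, k) score vectors (log-likelihood gradients at the best
                   fit), one per bootstrap replicate
    """
    H = np.asarray(H, dtype=float)
    U = np.asarray(boot_scores, dtype=float)
    # Variability matrix J: average outer product of the bootstrap scores.
    J = U.T @ U / U.shape[0]
    # G = H J^{-1} H; solve() avoids forming J^{-1} explicitly.
    G = H @ np.linalg.solve(J, H)
    return np.sqrt(np.diag(np.linalg.inv(G)))

H = np.diag([4.0, 9.0])
scores = np.array([[2.0, 3.0], [-2.0, -3.0], [2.0, -3.0], [-2.0, 3.0]])
print(godambe_uncert(H, scores))        # J equals H here, so G equals H
print(godambe_uncert(H, 2 * scores))    # more variable scores, wider errors
```

When J equals H (the unlinked, well-specified case) the GIM reduces to the FIM; doubling the spread of the bootstrap scores doubles every standard error.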

Therefore I decided to use a set of unlinked loci so that I could use the FIM. Based on rates of LD decay (LD_decay_allgroups.png), I chose to remove SNPs within 5 kb of each other to obtain such a dataset. Because this greatly reduced the number of SNPs (to 25,000), the resulting JAFS was quite sparse (Group1-Group2_unlinked_5kb_noprojection.png), so I chose to project down to 9 individuals per population (Group1-Group2_unlinked_5kb_projection_18-18.png).
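Physical pruning of this kind is often done greedily along each chromosome. The sketch below is one hypothetical way to do it (the `(chromosome, position)` input format and the function name are assumptions): a SNP is kept only if it lies at least `min_dist` bp past the previously kept SNP on the same chromosome, so every retained pair is at least that far apart.

```python
def thin_by_distance(snps, min_dist=5000):
    """Greedy physical thinning.

    snps     -- iterable of (chromosome, position) pairs (hypothetical format)
    min_dist -- minimum spacing in bp between retained SNPs on one chromosome
    """
    kept = []
    last_pos = {}  # chromosome -> position of the last SNP kept there
    for chrom, pos in sorted(snps):
        prev = last_pos.get(chrom)
        if prev is None or pos - prev >= min_dist:
            kept.append((chrom, pos))
            last_pos[chrom] = pos
    return kept

snps = [("chr1", 100), ("chr1", 3000), ("chr1", 5200),
        ("chr1", 5300), ("chr2", 50), ("chr1", 12000)]
print(thin_by_distance(snps))
```

Greedy thinning keeps the first SNP in each cluster; other choices (random SNP per window, highest call rate) retain different subsets but give the same spacing guarantee.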

Now, after repeating the optimization steps and picking the best model, the residuals are highly structured, indicating a poor fit (Group1-Group2_20mis_unlinked_5kb_asym_mig_all4.pdf). I am not sure what the reason is. I have tried different thresholds (e.g. 1 kb or 20 kb) for the physical pruning, but the problem remains the same. For comparison, I also tried projecting down my initial set of linked SNPs and using the optimized parameters to visualize the residuals, and this introduced a similar bias (Group1-Group2_20mis_asym_allSNPs_projected.pdf), although I haven't re-optimized the parameters for the projected spectrum in that case.
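Residual structure can also be checked numerically. A minimal sketch, assuming the multinomial approach (theta not free, so the model is rescaled to the data total) and linear Poisson residuals `(model - data) / sqrt(model)`; dadi's comparison plots may default to the closely related Anscombe residual instead. The function names are hypothetical, and all model cells are assumed positive (mask corners or empty cells first).

```python
import numpy as np

def optimally_scaled(model, data):
    """Rescale the model so its total matches the data total (multinomial fit)."""
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    return model * data.sum() / model.sum()

def linear_poisson_residuals(model, data):
    """Cell-by-cell (scaled model - data) / sqrt(scaled model)."""
    data = np.asarray(data, dtype=float)
    m = optimally_scaled(model, data)
    return (m - data) / np.sqrt(m)

print(linear_poisson_residuals([[1, 2], [3, 4]], [[2, 4], [6, 8]]))
print(linear_poisson_residuals([4, 4], [2, 6]))
```

With a well-fitting model the residuals should scatter around zero without systematic runs along rows, columns, or the diagonal.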

I should add that the model (divergence with asymmetrical migration) was chosen based on LLR tests against 4 other models (including divergence without migration and a no-divergence model), and it was consistently better across runs.
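For a nested pair such as divergence with versus without asymmetric migration (two extra parameters, m12 and m21), the basic likelihood-ratio statistic is D = 2 (ll_complex - ll_simple), compared to a chi-square with df equal to the number of extra parameters. The sketch below uses the closed-form chi-square tail for even df and hypothetical log-likelihoods. Two caveats: with linked SNPs the plain test is anticonservative (dadi provides a Godambe-based adjustment, `dadi.Godambe.LRT_adjust`), and because m = 0 sits on the parameter boundary the null distribution is a chi-square mixture, so this p-value is only approximate. Non-nested comparisons (e.g. against the no-divergence model) are outside this test.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square with a positive, even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

def lrt(ll_simple, ll_complex, df):
    """Unadjusted likelihood-ratio statistic and its chi-square p-value."""
    D = 2.0 * (ll_complex - ll_simple)
    return D, chi2_sf_even_df(D, df)

# Hypothetical log-likelihoods: no-migration vs asymmetric-migration model.
D, p = lrt(-100.0, -95.0, df=2)
print(D, p)
```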

Do you think the pruning/projection procedure is removing informative SNPs and causing this bias? Do you have suggestions for a better way to proceed?

Thanks a lot for your help,

Kind regards,

Hugo

Group2-Group4_20mis_unlinked_5kb_asym_mig_all4.pdf

LD_decay_allgroups.png

Group1-Group2_allSNPs_noprojection.png

Group1-Group2_20mis_unlinked_1kb_asym_mig_all4.pdf

Group1-Group2_unlinked_5kb_noprojection.png

Group1-Group2_20mis_asym_allSNPs_projected.pdf

Group1-Group2_asym_mig_allSNPs_noprojection.pdf

Ryan Gutenkunst

Apr 1, 2025, 6:48:03 PM

to dadi...@googlegroups.com

Hello Hugo,

This is a bit perplexing. If you have a well-fitting model with the unprojected data, I’m surprised that it doesn’t fit so well when both the data and the model (same parameters) are projected downward as in Group1-Group2_20mis_asym_allSNPs_projected.pdf. I’ll have to think about that one.
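Projection is a linear operation: each entry is spread over the smaller sample by hypergeometric sampling, applied independently along each population axis, so the same map takes the full model to the projected model and the full data to the projected data. The sketch below implements that idea with hypothetical function names (dadi's `Spectrum.project` is the tool to use in practice); the 22 to 18 chromosome example assumes 11 diploid individuals per population.

```python
from math import comb

import numpy as np

def projection_matrix(n, m):
    """P[i, j]: chance that a SNP with i derived copies among n chromosomes
    shows j derived copies in a random subsample of m (hypergeometric)."""
    P = np.zeros((n + 1, m + 1))
    for i in range(n + 1):
        for j in range(m + 1):
            P[i, j] = comb(i, j) * comb(n - i, m - j) / comb(n, m)
    return P

def project_2d(fs, m1, m2):
    """Project a 2D spectrum of shape (n1+1, n2+1) down to (m1+1, m2+1)."""
    fs = np.asarray(fs, dtype=float)
    n1, n2 = fs.shape[0] - 1, fs.shape[1] - 1
    return projection_matrix(n1, m1).T @ fs @ projection_matrix(n2, m2)

fs = np.arange(23 * 23, dtype=float).reshape(23, 23)  # toy 22 x 22 chromosome spectrum
proj = project_2d(fs, 18, 18)
print(proj.shape, fs.sum(), proj.sum())
```

Each row of the projection matrix sums to one, so the total number of sites is preserved.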

But more broadly, there’s no need to project downward to do the statistics on the pruned data. My suggestion would be to fit the pruned data at full sample size and then evaluate how much the inferred parameters for those data differ from your initial fit. If they’re close, then you can use the pruned data to estimate confidence intervals using the FIM. If they’re qualitatively different, then that suggests there might be an overfitting issue.
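The comparison suggested here can be tabulated simply. A minimal sketch with hypothetical parameter names and made-up estimates, assuming standard errors on the natural (not log) scale and ignoring any extra theta entry an uncertainty routine might return: for each parameter it reports the pruned/linked ratio and whether the linked-data estimate falls inside a normal-approximation interval from the pruned fit.

```python
def compare_fits(names, p_linked, p_pruned, se_pruned, z=1.96):
    """Per parameter: (name, pruned/linked ratio, linked estimate inside pruned CI?)."""
    report = []
    for name, a, b, se in zip(names, p_linked, p_pruned, se_pruned):
        lo, hi = b - z * se, b + z * se
        report.append((name, b / a, lo <= a <= hi))
    return report

NAMES = ["nu1", "nu2", "T", "m12", "m21"]  # hypothetical parameter names
p_linked = [1.0, 2.0, 0.5, 1.0, 0.2]       # hypothetical full-data estimates
p_pruned = [1.1, 1.8, 0.55, 1.2, 0.9]      # hypothetical pruned-data estimates
se_pruned = [0.1, 0.3, 0.05, 0.3, 0.2]     # hypothetical FIM standard errors

for name, ratio, inside in compare_fits(NAMES, p_linked, p_pruned, se_pruned):
    print(f"{name:>4}  ratio={ratio:5.2f}  linked estimate inside CI: {inside}")
```

Here only `m21` would be flagged, the kind of qualitative disagreement that would point toward an overfitting issue rather than sampling noise.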

Best,

Ryan
