troubleshooting poor visual fit

Mario Ernst

Oct 12, 2023, 10:38:40 AM

to dadi-user

Hi Ryan,

I am running three 2D models:

no migration
symmetric migration
secondary contact

The optimization scheme consists of 3 rounds. The first two rounds have 25 replicates each, and the last has 50. Each replicate can run for up to 25 iterations, and the starting parameter values in the three rounds are generated by 3-, 2-, and 1-fold perturbations, respectively.
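For reference, the way starting points are drawn in such a scheme can be sketched as follows. This is a minimal sketch and not the actual pipeline code: it assumes that an f-fold perturbation multiplies each parameter by a random factor between 2^-f and 2^f (my understanding of how dadi's parameter perturbation behaves) and that out-of-range values are simply clipped to the bounds. The function names, the `ROUNDS` layout, and the clipping rule are hypothetical.

```python
import random

# Round settings as described in the post: replicates, max iterations, fold.
ROUNDS = [
    {"replicates": 25, "maxiter": 25, "fold": 3},
    {"replicates": 25, "maxiter": 25, "fold": 2},
    {"replicates": 50, "maxiter": 25, "fold": 1},
]


def perturb_params(params, fold, lower_bound, upper_bound, rng=random):
    """Scale each parameter by a random factor in [2**-fold, 2**fold).

    Assumption: values falling outside the bounds are clipped to them.
    """
    perturbed = []
    for p, lo, hi in zip(params, lower_bound, upper_bound):
        factor = 2 ** (fold * (2 * rng.random() - 1))
        perturbed.append(min(max(p * factor, lo), hi))
    return perturbed


def starting_points(best_params, n_reps, fold, lower_bound, upper_bound, seed=None):
    """Draw one perturbed starting point per replicate around best_params."""
    rng = random.Random(seed)
    return [
        perturb_params(best_params, fold, lower_bound, upper_bound, rng)
        for _ in range(n_reps)
    ]
```

With a 3-fold perturbation a starting value can move by up to a factor of 8 in either direction, so the first round explores widely, while the 1-fold round only refines around the best parameters found so far.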

Based on the AIC, the secondary contact scenario is the best fitting model. However, when visually comparing the observed and the simulated SFS, I am not sure I am satisfied with these results. The simulated SFS under different models look rather similar, and even the secondary contact model fails to explain very prominent features in the observed SFS. What is particularly intriguing to me is the relatively high abundance of alleles at medium frequencies (visible as a bump at the center of the SFS and in the residual plot). This is also visible in the 1D frequency spectra of each population.
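The AIC comparison itself is simple to reproduce. The sketch below assumes the usual definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood of each model; the function names are hypothetical, and no values from the attached results are used here.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(fits):
    """Rank models by AIC.

    fits maps a model name to (log_likelihood, n_params).
    Returns (name, aic, delta_aic) tuples sorted from best to worst,
    where delta_aic is the difference from the best model.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    return sorted(
        ((name, score, score - best) for name, score in scores.items()),
        key=lambda row: row[1],
    )
```

Note that AIC only ranks the candidate models relative to each other. It says nothing about whether the best-ranked model fits the data well in absolute terms, which is why the residual plots matter here.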

To improve the fit of these models, I considered running the optimization for longer, but I would be surprised if this resolved the excess of middle-frequency variants in the residual plots, particularly because the top five replicates seem to broadly converge. Hence, I think a better way forward would be to test additional models, although I wanted to keep the models as simple as possible.

Do you agree that the fit is relatively poor and that the best way to improve it is by testing more complex models? If so, are there any particular demographic events that you would add (such as bottlenecks)?

Thank you very much in advance for your help and for your guidance!

Results_Summary_Extended_top5replicatespermodel.txt

fitted_alleni_r70_pop1_pop2_reps_25_25_50_maxiters_25_25_25_folds_3_2_1.fs_no_mig.pdf

1D_alleni_r70_pop2_observed_sfs.png

fitted_alleni_r70_pop1_pop2_reps_25_25_50_maxiters_25_25_25_folds_3_2_1.fs_sec_contact_sym_mig.pdf

1D_alleni_r70_pop1_observed_sfs.png

fitted_alleni_r70_pop1_pop2_reps_25_25_50_maxiters_25_25_25_folds_3_2_1.fs_sym_mig.pdf

Ryan Gutenkunst

Oct 16, 2023, 2:27:17 PM

to dadi-user

Hello Mario,

I agree that the fit seems relatively poor, particularly for private alleles in population 1.

The bump at frequencies near 50% might be artifactual. Such a bump can arise when reads from paralogous regions of the genome are mis-mapped, which generates apparent mutations at exactly 50% frequency. These sites could potentially be identified by their excess of heterozygotes. Alternatively, you can mask that region of the frequency spectrum.
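A minimal sketch of such a heterozygote-excess check, assuming per-site genotype counts for one population are available (for example, tallied from the VCF). It computes the inbreeding coefficient F = 1 - H_obs / H_exp, with H_exp = 2p(1 - p) under Hardy-Weinberg proportions. A collapsed paralog in which every individual appears heterozygous gives F = -1. The function names and the -0.5 cutoff are hypothetical choices for illustration, not recommended values.

```python
def inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt):
    """Return F = 1 - H_obs / H_exp for one biallelic site.

    Returns None when F is undefined (no genotypes, or a monomorphic site).
    Strongly negative values indicate an excess of heterozygotes.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return None
    p = (2 * n_hom_alt + n_het) / (2 * n)  # alternate allele frequency
    expected_het = 2 * p * (1 - p)         # Hardy-Weinberg expectation
    if expected_het == 0:
        return None
    observed_het = n_het / n
    return 1 - observed_het / expected_het


def flag_het_excess(sites, threshold=-0.5):
    """Return IDs of sites whose F is at or below the threshold.

    sites is an iterable of (site_id, n_hom_ref, n_het, n_hom_alt).
    The default threshold is an arbitrary illustrative cutoff.
    """
    flagged = []
    for site_id, n_hom_ref, n_het, n_hom_alt in sites:
        f = inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt)
        if f is not None and f <= threshold:
            flagged.append(site_id)
    return flagged
```

If the flagged sites are concentrated among the variants near 50% frequency, that would support the mapping-artifact explanation, and the spectrum could be rebuilt without them.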

More complex models, potentially including growth (especially in population 1), seem advisable. Not modeling population size changes can lead to incorrect inferences about migration:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8233503/

Best,
Ryan

