Strong patterning in residuals: model choice or unrealistic spectra?

41 views
Skip to first unread message

Katharine Prata

unread,
Jun 3, 2024, 12:35:45 AMJun 3
to dadi-user
Hi all,

I have five groups and am doing pairwise comparisons amongst all of them. I first ran 1D models on each to look for signals of pop size changes and bottlenecks (i.e., bottlegrowth example model). All of them had bottleneck as the most likely model, where there is an initial expansion then contraction.

I then ran the 2D models for various scenarios of the standard models, divergence with no migration, asymmetric migration and then all divergence scenarios with popsize changes and bottlenecks and then also with periods of isolation and migration (i.e., ancient migration, secondary contact).

However, for all model scenarios I get heavily patterned residuals (see attached).

Is this because the model choices are incorrect or the spectra are unrealistic?

I would have thought that I am running enough of models to at least some to be more realistic?

Appreciate all the advice!

Cheers,
Kat
ahya.unfold.het05.group1.Amil_snp6966117_projectedasym_mig_all4.pdf
ahya.unfold.het05.group1.group3_snp6966117_projectedsplit_bottle_asym_mig_all4.pdf
ahya.unfold.het05.group3.group4_snp6966117_projectedsec_cont_asym_mig_all4.pdf

Ryan Gutenkunst

unread,
Jun 4, 2024, 7:17:10 PMJun 4
to dadi-user
Hello Kat,

I think what you’re seeing here is simply a result of having a lot of SNPs. Almost every bin in your spectra has > 10^4 SNPs, which means there is a lot of power to detect deviations between the model and the data. This makes the residuals all really large, so they’re saturating when you plot with a restricted resid_range. I would start by plotting with that range unrestricted, which will give you a better picture of how the residuals vary across the sfs.

Best,
Ryan
> --
> You received this message because you are subscribed to the Google Groups "dadi-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/f887e03c-4560-42f7-8824-a3a15406a2bcn%40googlegroups.com.
> <ahya.unfold.het05.group1.Amil_snp6966117_projectedasym_mig_all4.pdf><ahya.unfold.het05.group1.group3_snp6966117_projectedsplit_bottle_asym_mig_all4.pdf><ahya.unfold.het05.group3.group4_snp6966117_projectedsec_cont_asym_mig_all4.pdf>

Katharine Prata

unread,
Jun 5, 2024, 11:13:36 PMJun 5
to dadi...@googlegroups.com
Hi Ryan,

Thanks for getting back so quickly!

Ah I see whoops! Here are examples with an unrestricted residual range. The residuals are not shown as extreme but there's still a bit of patterning there.

These examples are from my preliminary runs as I plan to do more optimisations, so these results at the moment are exploratory and results from runs with large fold sizes for parameter optimisation.

I am showing two different models that had first and second lowest AIC for one population piar. Models are 1) divergence in isolation and then a period of asymmetric gene flow (i.e., secondary contact) and 2) divergence in isolation with instantaneous size change, then a second period with exponential size change with asymmetric gene flow (allowing different pop sizes for each pop in each time period).

Both the models are underestimating fixed and shared lower frequency mutations, which could require more divergence or pop contraction for fixed or more migration for shared low freq or do you have any other thoughts on this - could selection possibly be creating this pattern?. The bottleneck does better at the fixed sites which makes sense even though the first simpler model with secondary contact has a lower AIC.

With this patterning and high numbers of SNPs shall I explore even more complex models? Perhaps with separating periods of migration and popsize changes (i.e, 3 epochs). Or maybe just more optimisations with higher point extrapolation?

Just some thoughts for now!

Cheers,
Kat


ahya.unfold.het05.group1.Amil_snp6966117_projectedsec_cont_asym_mig_all4.pdf
ahya.unfold.het05.group1.Amil_snp6966117_projectedsplit_bottle_sec_asym_mig_all4.pdf

Ryan Gutenkunst

unread,
Jun 7, 2024, 1:58:39 PMJun 7
to dadi-user
Hello Kat,

The biggest outlier I see is the large excess of SNPs in the data versus model that are fixed in Amil but absent in group1. I struggle to identify any biology that could cause that. Perhaps there some sort of data processing issues there?

I think you’re right about your interpretation of the residuals at low frequencies. Selection could be contributing, but it might be challenging to segment that out from the numerous other possibilities.

I’m in general not a big fan of really complex models. I think it’s easy to walk down a path were you’re really fitting subtle data biases rather than new aspects of the biology. I would first work to ensure you’re fully optimized and to understand what’s happening in the lower-right of the SFS. You could try running with that corner masked, to see how much it was influencing the overall fit.

Best,
Ryan
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/CAF6Sa_WRRCUbT5WQ%2Bm%2BVxTfbqcXyw6Hgcnt2B2uoLytJKRKvig%40mail.gmail.com.
> <ahya.unfold.het05.group1.Amil_snp6966117_projectedsec_cont_asym_mig_all4.pdf><ahya.unfold.het05.group1.Amil_snp6966117_projectedsplit_bottle_sec_asym_mig_all4.pdf>

Reply all
Reply to author
Forward
0 new messages