Dear Ryan and dadi-users,
First of all, many thanks for maintaining this group! I’ve been learning a lot from the previous discussions.
I’m working with ddRad seq data from 15 populations of a single species, each with 7-10 individuals. I don’t have an outgroup, so I’m working with folded fs and have used only a single SNP per locus, assuming they are unlinked. Using Admixture, I identified 4 clusters among these populations. I pooled all individuals from each cluster into one dadi population and started by testing some 1D models for them.
I followed the optimization routine from Portik et al. (2017), running at least 5 rounds of optimization with 100 replicate optimizations per round. The starting parameters were initially random, but after each round, the best scoring replicate's parameters were used to generate perturbed starting parameters for the next round. Convergence was considered achieved when at least two replicates in each of the last two rounds reached the same optimum (with the best log-likelihood values within a 0.005 difference). If not, I increased the number of rounds until convergence was reached. I repeated this routine multiple times for each model, starting with random parameters each time, to produce independent versions of the optimum. Comparing these versions should help determine if a global optimum was found.
Now I’m a bit stuck with properly understanding the results, especially the estimated parameters. In specific:
I appreciate any insights or advice you can provide!
Best regards,
Yuanshu
On Aug 22, 2024, at 9:18 AM, Yuanshu Pu <yuans...@evobio.eu> wrote:Dear Ryan and dadi-users,
First of all, many thanks for maintaining this group! I’ve been learning a lot from the previous discussions.
I’m working with ddRad seq data from 15 populations of a single species, each with 7-10 individuals. I don’t have an outgroup, so I’m working with folded fs and have used only a single SNP per locus, assuming they are unlinked. Using Admixture, I identified 4 clusters among these populations. I pooled all individuals from each cluster into one dadi population and started by testing some 1D models for them.
I followed the optimization routine from Portik et al. (2017), running at least 5 rounds of optimization with 100 replicate optimizations per round. The starting parameters were initially random, but after each round, the best scoring replicate's parameters were used to generate perturbed starting parameters for the next round. Convergence was considered achieved when at least two replicates in each of the last two rounds reached the same optimum (with the best log-likelihood values within a 0.005 difference). If not, I increased the number of rounds until convergence was reached. I repeated this routine multiple times for each model, starting with random parameters each time, to produce independent versions of the optimum. Comparing these versions should help determine if a global optimum was found.
Now I’m a bit stuck with properly understanding the results, especially the estimated parameters. In specific:
- In the attached table, I summarized 10 optimization versions of a 2-epoch 1D model (population increase followed by a decrease). Two versions (v1, v8) reached the optimum with the lowest AIC (142.68), while five versions (v4,v5,v7,v9,v10) reached an optimum with AIC=142.70, with slightly different parameters (see the attached plots for estimation of parameter T1 and nu1 from different versions). I’m wondering, which optimum should be considered global: the one with the lowest AIC or the one supported by more independent versions? Alternatively, should they be considered the same since the difference is minor? If so, is it valid to report a range (like a 95% confidence interval) of the converged replicates of parameter estimates rather than just the single best log-likelihood one? Or instead, are more rounds of optimization needed for more precise convergence (perhaps aiming for a log-likelihood difference of 0.0001 instead of 0.005)?
- I’m uncertain about the concept of “nested models” with which the AIC are comparable. With one same fs, I tested a 1epoch model with linear population decline (optimized AIC = 203.16, with two parameters; see attached plot of comparison between model and data); and a 2epoch model with population increase in T1 followed by a decline in T2 (optimized AIC = 142.7, with four parameters; I have trouble plotting the result because the current population size at convergence is 0). Can I directly compare these results to interpret that “a historical increase before population decline is better supported”, or should I run the 1epoch model in a 2epoch framework, with the same four parameters but with restrictions between them?
- The parameter estimations of the 1epoch and 2epoch models from question2 are drastically different: the 1epoch model suggested a population decline from Ne=58,6000 to Ne=290 over just 10k years; while the 2epoch model estimated an initial population increase from Ne=50,000 to Ne=1,000,000 over 70[10-90] k years, followed by a decrease to Ne=0 over ~170[150-200] k years. I’m wondering why this occur (i.e. why is the decrease time in the 2epoch model much longer than in the 1epoch model), and how to deal with the large variation in some parameters’ estimations.
- Lastly, is it possible to find different patterns for each population within the same ADMIXTURE cluster, or is pooling all populations of each cluster into one “dadi population” sufficient?
I appreciate any insights or advice you can provide!
Best regards,
Yuanshu
<REST_mod2_incdec_optimum_1_summary_table.jpg>
<REST_mod2_incdec_optimum_1_actual.nu1_violin_plot.png><REST_mod2_incdec_optimum_1_actual.T1_violin_plot.png><REST_mod1_fs.jpg>
--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/ac752ad0-4541-4a7f-b0eb-dc2a712ff74fn%40googlegroups.com.
<REST_mod1_fs.jpg><REST_mod2_incdec_optimum_1_actual.T1_violin_plot.png><REST_mod2_incdec_optimum_1_actual.nu1_violin_plot.png><REST_mod2_incdec_optimum_1_summary_table.jpg>