Some Questions on dadi-cli


RAYIS T R

Jan 9, 2026, 12:29:56 AM
to dadi-user
Hi all,
I recently started using dadi-cli to infer demographic history from ddRAD-seq data. For a set of five species, we generated site frequency spectra (SFS) using easySFS and used these as input for dadi-cli. For each species, we are testing both two-population and three-population demographic scenarios. We used the following set of Portik models to test different scenarios (100 optimisation runs per model) and calculated the AIC using the equation AIC = 2*number of parameters - 2*Log Likelihood. We plan to select the model with the lowest AIC as the best-supported scenario.
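A minimal sketch of the AIC calculation and lowest-AIC ranking described above, in plain Python. The model names match the Portik models listed below, but the log-likelihoods and parameter counts are made-up placeholders, not results from this thread. The ranking also reports delta_AIC (difference from the best model), which makes near-ties easy to spot.

```python
def aic(log_likelihood, n_params):
    """AIC = 2*k - 2*ln(L), the formula used in the post."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(fits):
    """Rank models by AIC.

    fits: {model_name: (log_likelihood, n_params)}
    Returns (model_name, AIC, delta_AIC) tuples sorted from best (lowest AIC)
    to worst; delta_AIC is the difference from the best model.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    ranked = [(name, score, score - best) for name, score in scores.items()]
    return sorted(ranked, key=lambda row: row[1])


# Hypothetical (log-likelihood, number of free parameters) pairs.
fits = {
    "no_mig": (-1200.5, 3),
    "sym_mig": (-1190.2, 4),
    "asym_mig": (-1189.9, 5),
}
for name, score, delta in rank_by_aic(fits):
    print(f"{name:10s} AIC={score:.1f} dAIC={delta:.1f}")
```

With these placeholder numbers, sym_mig ranks first and asym_mig is only 1.4 AIC units behind it.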

Models used:
2D models: no_mig, sym_mig, asym_mig, anc_sym_mig, anc_asym_mig, sec_contact_sym_mig, sec_contact_asym_mig
3D models: split_nomig, split_symmig_all, split_symmig_adjacent

Parameter bounds used:
nu: 1e-3 to 100
m: 0 to 10
T: 0 to 20
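Given these bounds, a quick way to see which fitted parameters sit close to a bound (the situation behind the warning in question 3) is sketched below. This is not dadi-cli's own criterion, which is not reproduced here. The 1% threshold, the parameter-name prefixes (nu1, m12, T1, ...), the function names and the choice of a log10 scale for nu are all assumptions for illustration.

```python
import math

# Bounds quoted in the post. Keys are assumed parameter-name prefixes.
BOUNDS = {"nu": (1e-3, 100.0), "m": (0.0, 10.0), "T": (0.0, 20.0)}


def _bounds_for(name):
    """Look up bounds by parameter-name prefix (hypothetical naming scheme)."""
    for prefix, bounds in BOUNDS.items():
        if name.startswith(prefix):
            return bounds
    raise KeyError(f"no bounds defined for parameter {name!r}")


def near_bounds(params, frac=0.01):
    """Return {name: 'lower' or 'upper'} for parameters within `frac` of a bound.

    Distance is a fraction of the bound width, measured on a log10 scale when
    the lower bound is positive (nu) and on a linear scale otherwise (m, T).
    """
    flagged = {}
    for name, value in params.items():
        lo, hi = _bounds_for(name)
        if lo > 0:
            lo, hi, value = math.log10(lo), math.log10(hi), math.log10(value)
        width = hi - lo
        if value - lo <= frac * width:
            flagged[name] = "lower"
        elif hi - value <= frac * width:
            flagged[name] = "upper"
    return flagged


# Hypothetical best-fit values.
print(near_bounds({"nu1": 0.5, "nu2": 95.0, "m12": 0.02, "T1": 0.001}))
```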

I had the following doubts that I would appreciate some input on:
  1. During the runs, we encountered the following warnings:
    “WARNING:Inference:Model is < 0 where data is not masked.
    WARNING:Inference:Number of affected entries is 25. Sum of data in those entries is 89.7342:”
    Should this be concerning? If so, how can we overcome this?
  2. When I checked for convergence, only a few of the runs had converged. Would you recommend increasing the number of optimisation replicates, and if so, what would be a reasonable number to aim for?
  3. We got the warning:
    “WARNING: The converged parameters are close to the boundaries”
    The values appear to have converged closer to the lower bounds (near 0). We expect our system to have a recent divergence. Should we be concerned about these warnings?
  4. Does our AIC-based approach to model selection make sense, or should we use another method to determine the best-fit model?
  5. We are using VCFs thinned to retain SNPs at least 10kb apart. Is this appropriate for our analysis, or would you suggest LD pruning or using the non-thinned VCF?
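Question 5 describes thinning SNPs to at least 10 kb apart. The idea can be sketched as a greedy pass over sorted positions: per chromosome, keep the first SNP, then keep each later SNP only if it is at least `min_dist` bp past the last kept one. This is an illustration of distance thinning in general, not a description of how any particular tool (vcftools, easySFS, etc.) implements it; `thin_snps` and its (chrom, pos) input format are hypothetical.

```python
def thin_snps(positions, min_dist=10_000):
    """Greedy distance thinning.

    positions: iterable of (chrom, pos) tuples, in any order.
    Returns the kept (chrom, pos) tuples, sorted by chromosome then position.
    """
    kept = []
    last_kept = {}  # chrom -> position of the most recently kept SNP
    for chrom, pos in sorted(positions):
        if chrom not in last_kept or pos - last_kept[chrom] >= min_dist:
            kept.append((chrom, pos))
            last_kept[chrom] = pos
    return kept


snps = [("chr1", 100), ("chr1", 5000), ("chr1", 10100), ("chr1", 10200), ("chr2", 50)]
print(thin_snps(snps))
```

Because the pass is greedy, which SNPs survive depends on where each chromosome starts; other thinning schemes (e.g. one random SNP per RAD locus) would give a different subset.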
Thanks & Regards,
Rayis.

Ryan Gutenkunst

Jan 10, 2026, 5:52:27 PM
to dadi...@googlegroups.com
Hello Rayis,

1. This should not be concerning if it is happening early in runs. If it is near the end, that’s a problem. (The warning indicates that dadi is struggling to compute the SFS for a given parameter value. Early in parameter optimization, that’s okay because it typically happens in corners of parameter space that are unlikely.)

2. Yes, increasing the number of replicates is the best thing to do. You can also try restarting optimizations from your existing set of replicates. Unfortunately, there's no great heuristic for judging how many replicates are enough. Dadi-cli does have an option to force convergence, which will run optimizations until convergence is achieved.

3. No, it shouldn’t be an issue.

4. The AIC is reasonable if you are analyzing unlinked SNPs. I personally am not a big fan of it, because it doesn’t really assess the quality of the fit itself. I highly suggest inspecting residual plots to ensure your model is a good representation of the data.

5. The non-thinned SFS would give greater statistical power. But then you can't use AIC for model selection, because linked SNPs yield a composite likelihood. Dadi-cli has methods built in to deal with that, but the composite-likelihood AIC isn't really well defined. There is a well-defined likelihood ratio test for composite likelihoods, if you are exploring nested models.
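Point 1 suggests checking whether the "Model is < 0 where data is not masked" warning still applies at the end of an optimization. A minimal sketch, assuming the best-fit model SFS, the data SFS and the mask have been flattened into plain Python lists of equal length (mask True = masked). The function name is hypothetical; it only re-derives the two numbers reported in the warning and does not call dadi.

```python
def negative_unmasked(model, data, mask):
    """Count entries where the model SFS is < 0 but the data entry is unmasked.

    Returns (number_of_affected_entries, sum_of_data_in_those_entries),
    the two quantities reported in dadi's warning.
    """
    affected = [
        i for i, (m, masked) in enumerate(zip(model, mask))
        if not masked and m < 0
    ]
    return len(affected), sum(data[i] for i in affected)


# Hypothetical flattened spectra.
print(negative_unmasked([0.5, -0.1, 2.0, -0.3], [3, 2, 5, 1], [True, False, False, True]))
```

A result of (0, 0) for the final best-fit model would suggest the warnings came only from earlier, exploratory parameter values.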
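For point 2, one common style of convergence check compares the best few replicate log-likelihoods. The sketch below is a hypothetical heuristic, not dadi-cli's implementation: `top_k=3` and `rel_tol=1e-4` are arbitrary illustrative defaults, and the log-likelihood values are made up.

```python
def looks_converged(log_likelihoods, top_k=3, rel_tol=1e-4):
    """Hypothetical heuristic: the best `top_k` replicate log-likelihoods
    all lie within `rel_tol` (relative to the best value) of the best one.
    """
    lls = sorted(log_likelihoods, reverse=True)  # best (highest) first
    if len(lls) < top_k:
        return False  # too few replicates to judge
    best = lls[0]
    return all(abs(best - ll) <= rel_tol * abs(best) for ll in lls[:top_k])


print(looks_converged([-1200.00, -1200.05, -1200.10, -1250.0]))
print(looks_converged([-1200.0, -1201.0, -1205.0]))
```

If a check like this fails, the options Ryan mentions apply: add replicates, or restart optimizations from the best existing ones.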
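For point 4, the residuals behind a residual plot can be sketched as linear Poisson residuals, (model - data) / sqrt(model), after scaling the model by the optimal theta = sum(data) / sum(model) over unmasked entries. To our understanding dadi's plots are built from residuals of this general form (dadi can also show Anscombe residuals, not reproduced here). The function name and the flattened-list input are assumptions. Positive values mean the model predicts more SNPs than observed; large residuals clustered in one region of the spectrum point to systematic misfit rather than noise.

```python
import math


def poisson_residuals(model, data, mask=None):
    """Linear Poisson residuals on flattened spectra.

    The model is rescaled by theta_opt = sum(data) / sum(model) over unmasked
    entries. Masked entries, and entries whose scaled model value is <= 0,
    get None.
    """
    if mask is None:
        mask = [False] * len(data)
    keep = [i for i, masked in enumerate(mask) if not masked]
    theta_opt = sum(data[i] for i in keep) / sum(model[i] for i in keep)
    resid = []
    for mod, obs, masked in zip(model, data, mask):
        expected = theta_opt * mod
        if masked or expected <= 0:
            resid.append(None)
        else:
            resid.append((expected - obs) / math.sqrt(expected))
    return resid


# Hypothetical flattened spectra; the first entry is masked.
print(poisson_residuals([1, 10, 5], [0, 12, 3], mask=[True, False, False]))
```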
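For the nested-model comparison in point 5, a sketch of the likelihood ratio statistic D = 2*(ll_full - ll_nested) with a chi-square p-value (degrees of freedom = difference in parameter count). With linked SNPs, D must first be adjusted for the composite likelihood. Here `adj` is only a placeholder for that multiplicative factor (dadi provides a Godambe-information-based adjustment, whose usage is not reproduced here). The chi-square tail uses closed forms for integer degrees of freedom so that only the standard library is needed. If the nested model fixes a parameter at a bound (e.g. m = 0), the true null distribution is a chi-square mixture and this p-value is conservative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1) / (2i-1)!!
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def lrt(ll_nested, ll_full, df, adj=1.0):
    """Return (D, p) for nested models; `adj` is a placeholder adjustment factor."""
    d_stat = adj * 2.0 * (ll_full - ll_nested)
    return d_stat, chi2_sf(d_stat, df)


# Hypothetical log-likelihoods: nested model vs. model with one extra parameter.
print(lrt(-1200.5, -1190.2, df=1))
```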

Best,
Ryan


RAYIS T R

Jan 13, 2026, 5:20:48 AM
to dadi-user
Thanks, Ryan! This is very helpful.

RAYIS T R

Jun 6, 2026, 7:11:47 AM
to dadi-user

Hi Ryan,


Following our previous discussions, we performed demographic inference analyses for a set of species using ddRAD sequencing data. We fitted the following three-population demographic models:


  • split_nomig
  • split_symmig_adjacent
  • split_symmig_all
  • ancmig_adj_1
  • ancmig_adj_2
  • ancmig_adj_3
  • refugia_adj_1
  • refugia_adj_2
  • refugia_adj_3


For reference, I have attached the results for two species datasets: dadi-cli-sample-results


I had a few questions and would appreciate your input.


  1. We are using the following equations to convert the population-scaled estimates from dadi into biological units:


Effective sequence length, L = Number of sequenced sites * (number of SNPs input to dadi / total SNPs detected)

N_ref = theta / (4*mut_rate*L)
Ne = nu*N_ref
M = m / (2*N_ref)
t = T*2*N_ref*gen_time

AIC = 2*Number_of_parameters - 2*Log(likelihood)


Are these fine? Would you recommend any modifications?


  2. We were planning to use AIC as a metric for model selection. But, as you can see from the table (Species A), the AIC values are very similar across several models, with no clearly best-supported model. What would you suggest in such a case?

My concern is that the parameter estimates differ substantially among these competing models. For example, the estimated divergence times for three highlighted models are on the order of 1e5, 1e4, and 2e6 years, respectively.


  3. You previously suggested examining residual plots to evaluate model fit. I have attached the residual plots for the same analyses here: dadi_sample_resedual_plots. Could you please advise on how best to assess model fit from these plots, and whether any of the models appear clearly preferable based on the residual patterns, again taking into consideration the variability of the estimated parameters?


  4. Some models (highlighted in Species B) have not converged despite very long runtimes (~800 CPU hours). Their residual plots look less convincing, but their AICs are lower than those of the converged models. What would you recommend in such cases? Should we exclude them from further analysis, wait for them to converge, or take some other approach?


  5. Our sample sizes are relatively small, with approximately 5–10 individuals per population. Could the limited sample sizes be contributing to these issues?
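The unit conversions in question 1 can be written out as code. This is a sketch of the poster's equations exactly as given, not a recommended recipe. It assumes mut_rate is per site per generation and gen_time is in years; all numeric inputs below are hypothetical. In dadi's parameterisation theta = 4*N_ref*mu*L, migration rates are scaled by 2*N_ref, and times are in units of 2*N_ref generations, which is what these formulas invert.

```python
def effective_length(n_sites, snps_used, snps_total):
    """L = sequenced sites * (SNPs input to dadi / total SNPs detected)."""
    return n_sites * snps_used / snps_total


def to_biological_units(theta, mut_rate, L, gen_time, nu, m, T):
    """Apply the conversions from the post to one set of dadi estimates."""
    n_ref = theta / (4.0 * mut_rate * L)
    return {
        "N_ref": n_ref,
        "Ne": nu * n_ref,                      # effective size of the population
        "M": m / (2.0 * n_ref),                # migrant fraction per generation
        "t_years": T * 2.0 * n_ref * gen_time, # time in years
    }


# Hypothetical inputs, for illustration only.
L = effective_length(n_sites=1_000_000, snps_used=5_000, snps_total=20_000)
print(to_biological_units(theta=500.0, mut_rate=1e-8, L=L, gen_time=1.0,
                          nu=2.0, m=1.0, T=0.5))
```

With these inputs, L = 250,000 sites and N_ref = 50,000, so T = 0.5 corresponds to 50,000 years.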


Thanks and Regards,

Rayis.
