GIM statistic, model comparisons


Hana Majerova

Mar 17, 2026, 8:26:00 AM
to dadi-user

Hi Ryan, 

 

I am using dadi-cli to generate my model data, successfully I hope! However, as statistics is quite new to me, I have some questions regarding the results. It would be very helpful to get some rules of thumb on how to handle the following:

Briefly, I am working with pseudo-diploidized, subsampled, and composite RadSeq data that has been cleared of paralogues. I have tested many models, and most yield similar results to the one appended here.

  1. Uncertainty in theta: For all my parameters except theta, I get reasonable confidence intervals (CIs). Theta, however, is always 'unconfident' (wide CIs). Is this acceptable? I understand theta is used primarily to calculate real-world values (times, population sizes, etc.), but can I still trust the rest of my demographic parameters? Is there a detectable reason why theta is consistently so uncertain? I have seen this behavior even with datasets containing paralogues, as well as on projected and unlinked datasets.
  2. Step Sizes: The confidence intervals for the parameters differ by an order of magnitude depending on the step size used. Is this normal? Should they be approximately the same, or is it okay to select one step size and present that as the result?
  3. Small CIs: Sometimes the confidence intervals are so small they seem biologically unrealistic. Is that okay? I’ve heard this can be due to a 'mathematical collapse' when a likelihood peak is too sharp, but I am unsure how to interpret this.
  4. Model Comparison: Is there a way to compare models generated by dadi-cli on composite data using their log-likelihoods? I know it should be possible to use CLAIC for non-nested models and LRT for nested models, but I cannot find a way to perform these in dadi-cli. Can these be inferred from the GIM (Godambe Information Matrix) results? If dadi-cli doesn’t support this directly, what is the best alternative?                 

Any suggestions or guidance would be greatly appreciated.

Thank you in advance and all the best,

Hana

founder_nomig_with paralogues.InferDM.bestfits.pdf
founder_nomig.Uncertainty_T1.txt
founder_nomig.bestfits.pdf
founder_nomig.InferDM.bestfits

Ryan Gutenkunst

Mar 23, 2026, 5:17:22 PM
to dadi...@googlegroups.com
Hello Hana,

1. It is surprising to see theta estimates with that large an uncertainty. Theta is typically an easy parameter to estimate, since it just relates to the overall scale of the data.
2. You want to see consistency among the step sizes to have confidence in the results.
3. That’s a sign of a problem as well.
4. dadi-cli does not have the composite-likelihood nested LRT implemented right now, but it is available in the dadi Python package: https://dadi.readthedocs.io/en/latest/user-guide/likelihood-ratio-test/
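
For readers following along, here is a rough sketch of the arithmetic behind the adjusted test, not dadi's own code. In the dadi Python package, `dadi.Godambe.LRT_adjust` returns a factor that rescales the raw statistic 2 * (ll_complex - ll_nested) obtained from composite likelihoods. The function names and all numbers below are hypothetical, and the sketch assumes a single nested parameter. It uses either a plain chi-square with 1 degree of freedom, or the 50:50 mixture that applies when the nested value sits on the boundary of the parameter space.

```python
import math


def chi2_1dof_sf(x):
    """Survival function (upper tail) of a chi-square with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def adjusted_lrt_pvalue(ll_complex, ll_nested, adj, on_boundary=False):
    """Return (adjusted statistic, p-value) for one nested parameter.

    adj is assumed to be the factor returned by dadi.Godambe.LRT_adjust.
    """
    d_adj = max(adj * 2.0 * (ll_complex - ll_nested), 0.0)
    p = chi2_1dof_sf(d_adj)
    if on_boundary and d_adj > 0:
        # 50:50 mixture of a point mass at zero and a chi-square with 1 d.o.f.
        p *= 0.5
    return d_adj, p


# Hypothetical log-likelihoods and adjustment factor, for illustration only.
d_adj, p = adjusted_lrt_pvalue(ll_complex=-1000.0, ll_nested=-1005.0, adj=0.4)
print(f"adjusted D = {d_adj:.2f}, p = {p:.4f}")
```

With more than one nested parameter the reference distribution changes, so the page linked above is the place to check the appropriate weights.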

If you have the computational resources, you can also take a brute-force approach and fit your model to many of the bootstrap data sets.
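
One possible way to summarize such brute-force fits, sketched under assumptions: once each bootstrap data set has its own best-fit value for a parameter, a percentile interval over those values is a common choice of confidence interval. The thread does not specify which interval to use, so the percentile method, the function name `percentile_interval`, and the example values are all assumptions for illustration.

```python
def percentile_interval(estimates, level=0.95):
    """Percentile confidence interval from per-bootstrap parameter estimates."""
    xs = sorted(estimates)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two bootstrap estimates")

    def quantile(p):
        # Linear interpolation between the two nearest order statistics.
        pos = p * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return xs[lo] * (1 - frac) + xs[hi] * frac

    alpha = (1 - level) / 2
    return quantile(alpha), quantile(1 - alpha)


# Hypothetical best-fit values of one parameter from 8 bootstrap fits.
boot_T = [0.31, 0.28, 0.35, 0.30, 0.27, 0.33, 0.29, 0.32]
print(percentile_interval(boot_T))
```

In practice many more than eight bootstrap fits would be needed for the tails of the interval to be stable.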

Best,
Ryan


Hana Majerova

Mar 24, 2026, 2:22:20 AM
to dadi...@googlegroups.com
Hello Ryan, 

Thank you for your answer.
I understand that you wouldn't move forward with this statistic.
Do you have any ideas about what might be wrong with my data or approach? It always outputs this same result, regardless of the filters or models I use.

Best, 
Hana.

Hana Majerova

Mar 24, 2026, 4:01:19 AM
to dadi...@googlegroups.com
Hi Ryan,

I am attaching the .bestfits results, as I think they might be helpful.

Best, 
Hana
founder_nomig.InferDM.bestfits

Ryan Gutenkunst

Mar 24, 2026, 5:12:15 PM
to dadi-user
Hello Hana,

Take a look at the bootstrap datasets generated as part of the analysis. They should be similar to the initial input data, both visually and in terms of total segregating sites. A large uncertainty in theta could be caused by some issue with the bootstrapping that yields some data sets with very few SNPs.
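
A minimal sketch of that check, assuming the number of segregating sites has already been extracted for the original spectrum and for each bootstrap spectrum (in the dadi Python package this could be done with `dadi.Spectrum.from_file(path).S()`). The function name `flag_sparse_bootstraps`, the 50% threshold, and the counts are hypothetical choices for illustration.

```python
def flag_sparse_bootstraps(original_s, bootstrap_s, min_fraction=0.5):
    """Return indices of bootstrap data sets with far fewer segregating sites
    than the original data.

    min_fraction is an arbitrary cutoff, not a dadi recommendation.
    """
    threshold = min_fraction * original_s
    return [i for i, s in enumerate(bootstrap_s) if s < threshold]


# Hypothetical counts: bootstraps 2 and 4 look suspiciously sparse.
suspect = flag_sparse_bootstraps(1000, [980, 1010, 12, 995, 300])
print(suspect)
```

Any flagged data set is worth plotting next to the original spectrum before rerunning the uncertainty analysis.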

You might also try rerunning the uncertainty analysis using the --logscale option. That will enforce positivity of the parameters.

Best,
Ryan

Hana Majerova

Mar 26, 2026, 5:53:49 AM
to dadi...@googlegroups.com
Hi Ryan,
 
Thank you very much for your hint, it helped a lot. I checked my bootstraps and realized that my problem was actually a bug in dadi-cli that had already been fixed (https://groups.google.com/g/dadi-user/c/JtWWy7w5Mvo/m/4XDjjbxABQAJ). So I just updated my dadi-cli version.
 
I wonder if I could bother you with one more question regarding theta and step sizes: would you consider the confidence intervals in the appended files reliable? I am still not 100% sure how to interpret them, but I would say that those generated for founder_nomig are not reliable, as they do not follow the decrease in step sizes in the same direction. Those for vic_no_mig seem more reliable, as they decrease consistently with decreasing step sizes (though the decrease is small).
I believe the results should be acceptable and interpretable at a step size of 0.001, since the RADseq data are robust. Is that right? The theta CI should also be fine, as it is approximately 7% of the value estimated by the model.
Is my logic correct?
Thank you very much for your answers.
 
Best,
Hana

vic_no_mig_Uncertainty_T1_updated.txt
founder_nomig_Uncertainty_T1_updated.txt

Ryan Gutenkunst

Mar 29, 2026, 12:29:12 AM
to dadi...@googlegroups.com
Hi Hana,

Regarding the uncertainties and step size, what you’re looking for is (rough) consistency among the step sizes. I don’t think these fit that bill for the first several parameters. Theta is well constrained in all the cases.
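
One way to put a number on "rough consistency", offered as a sketch and not as a dadi feature: for each parameter, take the ratio of the largest to the smallest uncertainty reported across the step sizes. The function name `step_size_spread` and the values below are hypothetical, and the uncertainties are assumed to be positive.

```python
def step_size_spread(uncerts_by_step):
    """Max/min ratio of the reported uncertainty across step sizes, per parameter.

    uncerts_by_step maps each step size to a list of uncertainties, one per
    parameter and in the same parameter order for every step size.
    """
    per_parameter = zip(*uncerts_by_step.values())
    return [max(values) / min(values) for values in per_parameter]


# Hypothetical uncertainties for three parameters at three step sizes.
uncerts = {
    0.1: [0.05, 0.20, 12.0],
    0.01: [0.06, 0.21, 11.5],
    0.001: [0.004, 0.19, 11.8],
}
for name, ratio in zip(["nu", "T", "theta"], step_size_spread(uncerts)):
    print(f"{name}: max/min ratio across step sizes = {ratio:.2f}")
```

A ratio near 1 means the step sizes agree for that parameter. How large a ratio is tolerable remains a judgement call.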

Best,
Ryan


Hana Majerova

Apr 27, 2026, 6:10:08 AM
to dadi...@googlegroups.com
Hello Ryan,
I am still struggling with the interpretation of the GIM statistic across three step sizes. As you mentioned, it should be roughly consistent across all step sizes, but...
I re-created my dataset and re-ran my analyses. I am working with projected and unlinked RADseq data, and my goal is to select the most likely model rather than to find exact parameter values. I have one model where there is nice consistency across all three step sizes (0.1, 0.01, 0.001), and I am inclined to interpret it as the correct one. However, I have a second model with a significantly higher log(L) (or lower AIC), but the uncertainty value drops significantly at the smallest step size (0.001), while remaining consistent across the other two.
As I understand it, the GIM measures the curvature of the likelihood surface at these three step sizes, and ideally, the results should be consistent. However, I believe that at a certain point, the step size becomes too small to measure the real curvature; instead, it picks up the roughness of the likelihood surface and returns artificially small uncertainty values.

  1. In your experience, does this sudden drop at the lowest step size (0.001) occur often with RADseq data, and if so, how is it typically handled?
  2. I am tempted to interpret my data by saying the results are consistent at step sizes 0.1 and 0.01, but the values drop at 0.001 because the evaluator hits numerical noise rather than real curvature. Are there any arguments I might be missing against this interpretation? I assume a lower value implies a steeper surface, so it’s hard to imagine the 0.001 results as a real signal; it seems more like a "jagged" surface appearing when zoomed in. Is a "jagged" surface on an otherwise solid peak a reason to distrust the model?
  3. I noticed this happens mostly with parameters that have the lowest values (e.g. 0.3 in parameter space 0-1). Could this be why some parameters remain consistent across all three steps while others drop?
  4. Which of the two models would you prefer and why? Both are biologically plausible and I am not particularly interested in exact parameter values.
  5. What is the rule of thumb for "roughly consistent"? I assume 10x is too much, but what about 2x or 5x?
I would be very grateful for your opinion, as I find this point crucial for my data interpretation.
Best,
Hana


Ryan Gutenkunst

May 3, 2026, 4:05:42 PM
to dadi-user
Hello Hana,

I’m not aware of a correlation between step size issues and RADseq data. We don’t have a huge sample of cases considered.

One argument for preferring the larger step sizes would be if a step size of 0.1 (10% variation) or 0.01 (1% variation) is comparable to the magnitude of the estimated uncertainties. Then you’re measuring the curvature of the likelihood surface at the same scale as what matters for the actual parameter uncertainties.
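
That comparison can be sketched as follows, reading the step size as a relative perturbation of the parameter, as in the "10% variation" description above. The function name `closest_step`, the log-scale distance used to pick the nearest step, and the example values are assumptions for illustration only.

```python
import math


def closest_step(param_value, uncertainty, steps=(0.1, 0.01, 0.001)):
    """Pick the step size closest, on a log scale, to the relative uncertainty."""
    relative_uncertainty = uncertainty / abs(param_value)
    return min(
        steps,
        key=lambda s: abs(math.log10(s) - math.log10(relative_uncertainty)),
    )


# Hypothetical parameter: value 0.3 with uncertainty 0.02 (about 7% relative),
# so the 0.1 step probes the likelihood surface at the most relevant scale.
print(closest_step(0.3, 0.02))
```

If the step size selected this way is one of those that agree with each other, that supports reporting the uncertainties from that step size.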

I would not consider that step size issue a reason to distrust the model overall.

I don’t have a well-defined rule of thumb for consistency. At this point it’s a judgement call as to whether a 2x or 5x difference changes the biological interpretation of what you’ve learned.

Best,
Ryan
