Inquiry on Performing LRT Between Allotetra & Seg-Allotetra Models in dadi Pipeline


宋炎峰

Nov 15, 2025, 2:02:27 PM
to dadi-user

I am currently conducting population genomic research on polyploid plants. I suspect that my study species may represent a segmental allotetraploid, and I would like to use your pipeline to explore this.

I would like to know how to perform a likelihood ratio test between the segmental allopolyploidy model and the allopolyploid model.
Specifically, which script in your pipeline should I use to compare these two models?

Thank you very much for your time and guidance.

Best regards,
Yan-Feng Song


Ryan Gutenkunst

Nov 24, 2025, 5:23:13 PM
to dadi-user
Hello Yan-Feng,

I don’t think we implement a likelihood ratio test in the published repository. But you can do it within dadi. The description of the methods is here: https://dadi.readthedocs.io/en/latest/user-guide/likelihood-ratio-test/ .

Briefly, you would fit both the “normal” and segmental allopolyploid models to your data. Then you would apply the LRT_adjust procedure to estimate the adjusted D statistic.

You would then be performing a test with one parameter on the boundary of the parameter space.
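
A minimal sketch of that final step, assuming both models have already been fit and that an adjustment factor has been obtained separately (for example from the LRT_adjust procedure described in the linked documentation). The function names and the log-likelihood values in the usage lines are hypothetical. With one parameter on the boundary, the null distribution is taken here to be an equal mixture of a point mass at zero and a chi-square with one degree of freedom. The sketch uses only the Python standard library rather than dadi's own helpers.

```python
import math


def chi2_1df_sf(x):
    """Survival function P(X > x) of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt_pvalue(ll_nested, ll_full, adj=1.0):
    """Return (adjusted statistic, p-value) for a one-parameter boundary LRT.

    ll_nested : log-likelihood of the simpler (allotetraploid) fit
    ll_full   : log-likelihood of the more complex (segmental) fit
    adj       : adjustment factor; 1.0 means no adjustment (an assumption here)
    """
    d_adj = adj * 2.0 * (ll_full - ll_nested)
    if d_adj <= 0.0:
        # All of the mixture's mass lies at or above zero, so the p-value is 1.
        return d_adj, 1.0
    # Only the chi-square(1) half of the mixture can exceed a positive statistic.
    return d_adj, 0.5 * chi2_1df_sf(d_adj)


# Hypothetical log-likelihoods, for illustration only.
stat, pval = boundary_lrt_pvalue(ll_nested=-1000.0, ll_full=-998.0)
print(f"D_adj = {stat:.3f}, p = {pval:.4f}")
```

Halving the chi-square tail probability is what distinguishes this from an ordinary one-degree-of-freedom test; a statistic at or below zero always gives a p-value of 1.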

Best,
Ryan

--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dadi-user/e6cbc3d7-e375-4206-a70f-fa5592987113n%40googlegroups.com.

宋炎峰

Nov 24, 2025, 7:39:45 PM
to dadi-user

Dear Ryan,

Thank you very much for your helpful explanation.

I have two follow-up questions regarding the model comparison:

  1. Selecting the best run:
    I ran both the allopolyploid and the segmental allopolyploid models 10 times each, with 100 optimization steps per run. To choose the best parameter set for each model, is it sufficient to simply select the run with the highest log-likelihood?

  2. Interpreting LRT between Allo vs. Segmental models:
    Based on cytological and genomic evidence, my study system is a homoeologous tetraploid. However, after reading the polyploid continuum paper and fitting the segmental model, I find that the estimated eij parameter is always extremely small. The likelihood ratio test comparing the segmental and allopolyploid models is also not significant.
    Could you help me understand why the segmental model fails to outperform the strict allopolyploid model in this case? Does this simply indicate very low homoeologous exchange, or could it reflect limitations of the data/model?

Thank you again for your time and guidance.

Best regards,
Yan-Feng Song

Paul Blischak

Nov 25, 2025, 12:30:51 PM
to dadi-user
Hi Yan-Feng,

The code that we used to get the likelihood ratio of the allotetraploid vs. segmental allotetraploid model for Capsella is in the `analyze_capsella_results.py` script in the polyploid-demography GitHub repo on lines 296-315 (link: https://github.com/pblischak/polyploid-demography/blob/1a8b1ee2b2bc3abc107bcb68380dd162ce018f1d/capsella/analyze_capsella_results.py#L296-L315). I don't think that we used the LRT_adjust method, but the difference in log-likelihood between the models was huge in our case.

As for your questions:

1) What do you mean when you say 100 optimization steps? Do you mean that you ran independent parameter optimizations from different random starting points? If so, then yes, selecting the parameters from the run with the highest likelihood should be fine. It can help to look at the parameter estimates from the different optimization runs, sorted by decreasing likelihood, to make sure that any runs that were similar to the best run had similar parameter estimates.

2) The estimated rate of homoeologous exchange that we got for Capsella was very small, 6e-8. How much smaller is your estimate? Also, what do plots of the observed SFS look like when plotted with the allotetraploid and segmental allotetraploid best-fit models?
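
The check described in point 1 can be sketched as follows. This is only an illustration: the helper names, the log-likelihood window, and all of the run results are hypothetical, and the sketch assumes that no best-fit parameter is exactly zero.

```python
def rank_runs(runs):
    """Sort (log_likelihood, params) tuples from best to worst log-likelihood."""
    return sorted(runs, key=lambda run: run[0], reverse=True)


def relative_spread(runs, ll_window):
    """Relative spread of each parameter among runs within ll_window of the best.

    Returns one value per parameter: (max - min) / |best-run estimate|.
    """
    ranked = rank_runs(runs)
    best_ll, best_params = ranked[0]
    near_best = [params for ll, params in ranked if best_ll - ll <= ll_window]
    spreads = []
    for i, best_value in enumerate(best_params):
        values = [params[i] for params in near_best]
        spreads.append((max(values) - min(values)) / abs(best_value))
    return spreads


# Hypothetical optimization results: (log-likelihood, [nu, T, eij]).
runs = [
    (-1520.4, [1.90, 0.52, 2.5e-3]),
    (-1518.1, [2.00, 0.50, 2.7e-3]),
    (-1890.7, [0.30, 3.10, 1.0e-6]),
    (-1518.3, [2.02, 0.49, 2.6e-3]),
]
for ll, params in rank_runs(runs):
    print(ll, params)
print(relative_spread(runs, ll_window=5.0))
```

Small spreads among the near-best runs suggest the optimizer is converging to the same optimum; large spreads suggest more runs or different starting points are needed.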

Best regards,
Paul

宋炎峰

Nov 25, 2025, 8:35:09 PM
to dadi-user
[Attached image: 09828dc1-ef7f-43b6-9d9e-9129a4b7e5de.png]
logL(allotet) = -3663220.3940139185
logL(segtet)  = -3907437.159312148
Raw LRT statistic 2*(logL_segtet - logL_allotet) = -488433.5305964593
Truncated X^2 for test (>= 0) = 0.0
Mixture chi-square p-value (0.5*chi^2_0 + 0.5*chi^2_1) = 1.0
The parameter estimates obtained so far show that the best-likelihood estimate of eij (dij) is 2.7 × 10⁻³.
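
The arithmetic in the output above can be reproduced with a short sketch. It also adds a sanity check that rests on an assumption: that the allotetraploid model is a special case of the segmental model (exchange parameter equal to zero). Under that assumption the segmental fit should, at its optimum, have a log-likelihood at least as high as the allotetraploid fit, so a strongly negative raw statistic can point to an optimization that has not converged, or to fits that are not directly comparable. The variable names are hypothetical.

```python
# Log-likelihoods as reported in the output above.
ll_allotet = -3663220.3940139185
ll_segtet = -3907437.159312148

# Raw statistic: twice the log-likelihood gain of the more complex model.
raw_stat = 2.0 * (ll_segtet - ll_allotet)

# Assumed nesting: the complex model should never fit worse at its optimum.
nested_fit_is_consistent = ll_segtet >= ll_allotet

print(f"raw LRT statistic = {raw_stat:.4f}")
print(f"segmental fit at least as good as allotetraploid fit: {nested_fit_is_consistent}")
```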

Ryan Gutenkunst

Nov 26, 2025, 12:00:05 PM
to dadi...@googlegroups.com
Hello Yan-Feng,

Thanks for sending the data fit images. A few thoughts:
1) What is the sample size of your data (in terms of individuals and haplotypes)? Are your data polarized (so you know the ancestral state)?
2) As you’ll see in the paper, we expect an allotetraploid to have a massive spike of alleles at 50% frequency, corresponding to segregation into the two subgenomes. The signal of homoeologous exchange is the lowering and spreading of that spike. There is absolutely zero spike in your data, suggesting that you aren’t analyzing the allotetraploid genomes as the model expects.
3) More generally, none of your models fit the data well. This suggests that you also need to model some population size changes.

Best,
Ryan


宋炎峰

Nov 26, 2025, 8:13:49 PM
to dadi-user
I used 147 tetraploid individuals from a single population, which we had previously assumed to be an autotetraploid. The data come from a VCF in which each individual carries four alleles, and I converted this to a folded frequency spectrum. Although the current evidence suggests that this species is an autotetraploid, we suspect that it may have originated as an allotetraploid, followed by frequent recombination events.
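
For reference, a sketch of the sample-size bookkeeping this implies, assuming every individual is called at all four allele copies at every site (no missing data, which is not stated here). `fold_sfs` is a hypothetical helper written in plain Python, not dadi's own folding routine, and the toy spectrum is invented.

```python
def fold_sfs(sfs):
    """Fold an unfolded 1D SFS whose entries cover 0..n derived-allele copies.

    Returns entries for minor-allele counts 0..n//2.
    """
    n = len(sfs) - 1
    folded = []
    for i in range(n // 2 + 1):
        j = n - i
        # The middle bin (i == j) has no partner and is kept as-is.
        folded.append(sfs[i] if i == j else sfs[i] + sfs[j])
    return folded


n_individuals = 147
ploidy = 4
n_chromosomes = n_individuals * ploidy  # total allele copies sampled
half_bin = n_chromosomes // 2           # minor-allele count at 50% frequency

print(n_chromosomes, half_bin)
print(fold_sfs([0, 5, 3, 2, 0]))  # toy spectrum for n = 4
```

In a folded spectrum the 50% frequency class is the last bin, so that is where an allotetraploid-style spike would be expected to appear.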

Ryan Gutenkunst

Dec 1, 2025, 10:42:27 AM
to dadi-user
Hello Yan-Feng,

That all seems appropriate. I see that your parameter estimate eij is 2.7e-3. Assuming your effective population size is at least a few thousand, in population genetic terms this is a large rate. It may not be easily measurable experimentally, but over many generations it will very effectively mix alleles between subgenomes.
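
A rough worked example of why this counts as large, assuming eij is a per-generation rate and using the product of effective population size and eij as the population-scaled rate. The exact scaling factor (for example 2 or 4 times the population size) depends on the model's parameterization and is not given in this thread, and the population sizes below are hypothetical.

```python
def scaled_rate(n_e, rate_per_generation):
    """Population-scaled rate: effective size times per-generation rate."""
    return n_e * rate_per_generation


e_ij = 2.7e-3  # estimate reported earlier in the thread

# Hypothetical effective population sizes, for illustration only.
for n_e in (1_000, 3_000, 10_000):
    print(n_e, scaled_rate(n_e, e_ij))
```

Scaled rates well above 1 mean that exchange acts faster than genetic drift, which is the sense in which a per-generation rate of this size mixes alleles effectively.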

Best,
Ryan

宋炎峰

Dec 1, 2025, 11:58:15 AM
to dadi-user

Thank you very much! So does this mean that our results do indeed support a segmental allopolyploid origin? If the species were a strictly autopolyploid lineage, what would the expected value of eij be?

Ryan Gutenkunst

Dec 1, 2025, 12:24:49 PM
to dadi-user
Given that neither model fits the data well, and your data don’t show evidence of the expected peak at 50% frequency from an allotetraploid, I would *not* interpret this as support for a segmental allotetraploid model.
