Comparing non-nested models with linked SNPs


Abby Williams

Sep 5, 2025, 11:44:54 AM
to dadi-user
Hi Ryan,

I hope you are well. I am relatively new to SFS-based modelling and have a question about model selection with linked SNPs. I have a dataset of genome-wide SNPs from which I have constructed 2D SFS without pruning for LD. I ran a series of 6 models using the Portik et al. (2017) dadi_pipeline, but I am struggling to understand how to select the best model given that my SNPs are linked. I understand that AIC is not suitable for model comparison with linked SNPs because the likelihood becomes a composite likelihood. Nor can I use a likelihood ratio test, because my models are not nested. I plan to bootstrap the parameters for the best model, but I'm not sure what evidence I need to show that the best model is indeed the best. What sort of metrics or tests would you suggest in this situation?

(For extra context, in case it's useful: I am working with low-coverage (4X) WGS data and have constructed my SFS using ANGSD realSFS, which takes BAM files as input. I have not pruned for LD at any stage. The one tool I have had success with on this kind of data is ngsLD, which takes several weeks to run on my data, and I'm not sure how I would integrate it into my pipeline anyway, so I am keen to avoid this step. If you have any recommendations for data processing, I would be glad to hear them.)

Thank you very much in advance for your help and I look forward to hearing from you.

Kind regards,
Abby


Ryan Gutenkunst

Sep 6, 2025, 10:33:57 PM
to dadi-user
Hello Abby,

I’m sorry I don’t have a very satisfying answer. My group’s approach is generally not to rely on statistical tests based on likelihoods. Instead, we examine residual plots and look for qualitative changes in them when new features are added to the models. With WGS data, I think most approaches based on small likelihood differences will be anti-conservative and biased toward models that are too complex.
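
As a rough illustration of the residual comparison described above, here is a minimal NumPy-only sketch. It assumes the model and data spectra are plain 2D arrays of the same shape, rescales the model to the data total, and uses a simple Poisson-style residual, (model - data) / sqrt(model). The function name and the toy numbers are made up for illustration and are not taken from dadi or from any real dataset.

```python
import numpy as np

def poisson_residuals(model, data):
    """Per-entry residuals (model - data) / sqrt(model), after rescaling
    the model so its total matches the data total. Entries where the
    scaled model is zero are returned as NaN."""
    model = np.asarray(model, dtype=float)
    data = np.asarray(data, dtype=float)
    scaled = model * (data.sum() / model.sum())
    resid = np.full(scaled.shape, np.nan)
    ok = scaled > 0
    resid[ok] = (scaled[ok] - data[ok]) / np.sqrt(scaled[ok])
    return resid

# Toy 3x3 spectra with made-up counts; corner entries are left at zero.
data = np.array([[0.0, 40.0, 10.0],
                 [35.0, 20.0, 8.0],
                 [12.0, 6.0, 0.0]])
model = np.array([[0.0, 20.0, 5.0],
                  [17.0, 10.0, 5.0],
                  [6.0, 2.5, 0.0]])

resid = poisson_residuals(model, data)
print(np.round(resid, 2))
```

With this sign convention, positive entries are bins where the model predicts too many SNPs and negative entries where it predicts too few. Computing the same array for a simpler and a more complex model, and checking whether a coherent block of large residuals disappears, is one way to look for the kind of qualitative change mentioned above.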

If your question is whether a given feature is well-supported by the modeling, the confidence intervals may demonstrate that. For example, a confidence interval for a population bottleneck parameter that excludes 1 would support the bottleneck.
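
A minimal sketch of that check, assuming you already have one fitted value of the bottleneck size parameter per bootstrap replicate (ideally from bootstraps that resample genomic regions, so that linkage is accounted for). The helper names and the replicate values below are hypothetical, and a simple percentile interval is used here; it is not necessarily the method dadi's own uncertainty tools apply.

```python
import numpy as np

def percentile_ci(boot_estimates, level=0.95):
    """Percentile confidence interval from bootstrap parameter estimates."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(np.asarray(boot_estimates, dtype=float),
                         [alpha, 1.0 - alpha])
    return float(lo), float(hi)

def excludes(ci, value):
    """True if `value` lies outside the interval `ci` = (lo, hi)."""
    lo, hi = ci
    return bool(value < lo or value > hi)

# Hypothetical bootstrap estimates of a relative bottleneck size
# (1 would mean no size change); these numbers are made up.
nuB_boot = [0.21, 0.25, 0.19, 0.30, 0.27, 0.23, 0.22, 0.28, 0.24, 0.26]

ci = percentile_ci(nuB_boot)
bottleneck_supported = excludes(ci, 1.0)
print(ci, bottleneck_supported)
```

If the interval sits entirely below (or above) 1, the size change is supported by the fit; if it spans 1, the data do not clearly distinguish the bottleneck model from a constant-size one.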

Best,
Ryan

--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dadi-user/f8cacf33-9867-4053-91dd-1077049bbc6fn%40googlegroups.com.
