Hi Ryan,
I hope you are well. I am relatively new to SFS-based modelling and I have a question about model selection with linked SNPs. I have a dataset of genome-wide SNPs for which I have constructed 2D SFS without pruning for LD. I ran a series of 6 models using the Portik et al. (2017) dadi_pipeline, but I am struggling to understand how I should select the best model given that my SNPs are linked. I understand that AIC is not suitable for model comparison with linked SNPs because the likelihood becomes a composite likelihood. Nor can I use a likelihood ratio test, because my models are not nested. I plan to bootstrap the parameters for the best model, but I'm not sure what evidence I need to show that the best model is indeed the best. What sort of metrics or tests would you suggest in this situation?
(For extra context, in case it's useful: I am working with low-coverage (4X) WGS data and have constructed my SFS using ANGSD realSFS, which takes BAM files as input. I have not pruned for LD at any stage: the one tool I have had success with on this kind of data is ngsLD, which takes several weeks to run on my data, and I'm not sure how I would integrate it into my pipeline anyway, so I am keen to avoid this step. If you have any recommendations for data processing, I would be glad to hear them.)
Thank you very much in advance for your help and I look forward to hearing from you.
Kind regards,
Abby