Linked SNPs and parameter estimation

Yasuto ISHII

Nov 27, 2024, 9:34:03 PM
to fastsimcoal2
Dear Dr. Excoffier,

I hope this email finds you well.
I am getting in touch because I have some questions about fastsimcoal (fsc).

I am currently estimating parameters from SFS data derived from ddRAD-seq. My aim is not only to estimate demographic parameters but also to select the best model. I saw a previous post (link) in this Google group, where you said that pruning SNPs is necessary for model selection but problematic for parameter estimation.

Here are two questions about this post.
1) Why is SNP pruning problematic for model selection?
I read Excoffier et al. (2013, PLoS Genet), but I can't understand the reason. Software that generates the SFS, such as easySFS, assumes unlinked SNPs, so pruning SNPs seems necessary to reduce linkage disequilibrium.

2) What is the best scheme to perform parameter estimation as well as model selection?
In my understanding, the best scheme is to select the optimal model WITHOUT pruning SNPs, and to estimate parameters with pruned SNPs. Yet this seems very time-consuming. Could you tell me a more efficient way, if there is one?

Best regards,
Yasuto

Laurent Excoffier

Nov 28, 2024, 4:36:31 AM
to fastsimcoal2
Hi,

On Thursday, 28 November 2024 at 03:34:03 UTC+1 y.ishii wrote:
1) Why is SNP pruning problematic for model selection?
I read Excoffier et al. (2013, PLoS Genet), but I can't understand the reason. Software that generates the SFS, such as easySFS, assumes unlinked SNPs, so pruning SNPs seems necessary to reduce linkage disequilibrium.

The reasoning is that pruning is done by computing LD under the assumption that all individuals come from a single population in Hardy-Weinberg (HW) equilibrium. If you have some genetic structure, that structure will itself create LD: a series of SNPs associated with high FST between populations will show high levels of LD and might be removed by pruning, even though they are informative about your structure.
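
As an illustration of how structure alone creates LD, here is a minimal Python sketch (not part of fastsimcoal2 or of any pruning tool). It assumes two populations mixed in given proportions, two loci that are in linkage equilibrium within each population, and made-up allele frequencies; the function names are hypothetical.

```python
def r_squared(p_ab, p_a, p_b):
    """Haplotype-based LD (r^2) from the two-locus haplotype frequency
    p_ab and the allele frequencies p_a and p_b."""
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def pooled_r_squared(freqs, weights):
    """r^2 between two loci in a pooled sample of several populations.

    freqs   -- one (freq_locus1, freq_locus2) pair per population
    weights -- mixing proportion of each population (sums to 1)
    Assumes the two loci are independent WITHIN each population.
    """
    p_a = sum(w * f[0] for f, w in zip(freqs, weights))
    p_b = sum(w * f[1] for f, w in zip(freqs, weights))
    # Within a population the haplotype frequency is just the product.
    p_ab = sum(w * f[0] * f[1] for f, w in zip(freqs, weights))
    return r_squared(p_ab, p_a, p_b)


# Two strongly differentiated populations (high FST at both loci).
differentiated = pooled_r_squared([(0.9, 0.9), (0.1, 0.1)], [0.5, 0.5])
# Two populations with identical allele frequencies (no structure).
similar = pooled_r_squared([(0.5, 0.5), (0.5, 0.5)], [0.5, 0.5])

print(f"pooled r^2, differentiated populations: {differentiated:.4f}")
print(f"pooled r^2, identical populations:      {similar:.4f}")
```

With these made-up frequencies the pooled r^2 is about 0.41 for the differentiated pair and 0 for the identical pair, although the loci are unlinked in every population. An LD filter applied to the pooled sample would therefore tend to discard exactly the high-FST SNPs.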

Besides, having SNPs in LD is no problem for parameter estimation, as the likelihood will be maximized for the same parameters as with unlinked SNPs; with unlinked SNPs you will just have fewer SNPs, and so less information. In addition, computing the proportion of monomorphic vs. polymorphic sites will be problematic if you remove some linked SNPs, so that your estimate of the mutation rate will be too high.
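
The first point can be illustrated with a toy multinomial composite likelihood of the SFS. This is a sketch under strong assumptions, not how fastsimcoal2 is implemented: the one-parameter "model" (expected SFS proportional to 1/i^a) is purely hypothetical, the counts are made up, and linked SNPs are idealized as multiplying every SFS entry by the same factor.

```python
import math


def expected_sfs(a, n_classes):
    """Toy expected SFS: class i has probability proportional to 1 / i**a."""
    weights = [1.0 / (i ** a) for i in range(1, n_classes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(counts, a):
    """Multinomial log-likelihood of the observed SFS counts, dropping the
    multinomial coefficient (it does not depend on the parameter a)."""
    probs = expected_sfs(a, len(counts))
    return sum(c * math.log(p) for c, p in zip(counts, probs))


def best_a(counts, grid):
    """Grid-search maximum-likelihood estimate of a."""
    return max(grid, key=lambda a: log_likelihood(counts, a))


grid = [x / 100 for x in range(50, 201)]   # candidate values 0.50 .. 2.00
pruned = [520, 240, 150, 110, 80]          # made-up SFS after pruning
all_snps = [5 * c for c in pruned]         # idealized: 5x more (linked) SNPs

a_pruned = best_a(pruned, grid)
a_all = best_a(all_snps, grid)
ll_pruned = log_likelihood(pruned, a_pruned)
ll_all = log_likelihood(all_snps, a_all)

print("estimate from pruned SNPs:", a_pruned)
print("estimate from all SNPs:   ", a_all)
print("log-likelihood ratio (all / pruned):", ll_all / ll_pruned)
```

In this idealized setting both SFS give the same estimate, because scaling the counts rescales the log-likelihood without moving its maximum. Only the magnitude of the log-likelihood changes, by the same factor of five as the counts.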

2) What is the best scheme to perform parameter estimation as well as model selection?
In my understanding, the best scheme is to select the optimal model WITHOUT pruning SNPs, and to estimate parameters with pruned SNPs. Yet this seems very time-consuming. Could you tell me a more efficient way, if there is one?
 
For parameter estimation, I just gave you the reasons why you should use all SNPs, and for model selection we usually also use all SNPs. These analyses are indeed very time-consuming, but probably less so than the time it took you to collect the data and get the genotypes.

Hope it helps

laurent 