Help with IM models

vivi....@gmail.com

unread,

Aug 27, 2015, 9:10:36 AM8/27/15

to fastsimcoal

Hi all,

I hope some of you can help me. I just started working with fastsimcoal together with some other people in our lab. We would like to test some different IM models using (two population diverging with gene flow) and have tried to make some models to test. We have s site-frequency-spectrum so we want to run several models in order to test which scenario fits our data the best.

So far we came of with a model looking like the one below. I just present a simple case with isolation with migration and constant population sizes and a possible change in the ancestral population. Our problem is that this model looks somewhat different from the one presented in the fastsimcoal manual (page 64) and although we think it is correct we started to have a doubt. The main differences are the way we describe the splitting event but also the way we define the migration priors.

I present both our model and the one from the manual. I hope some of you can comment on whether or not these two models actually simulate the same scenario or if they are different. In the latter case we hope you can explain the difference in the assumed models.

Our IM model. Two populations diverging with gene flow:

//Number of population samples (demes)
2
//Population effective sizes (number of genes)
N1
N2
//Sample sizes
56
56
//Growth rates : negative growth implies population expansion
0
0
//Number of migration matrices : 0 implies no migration between demes
1
//Migration matrix 0
0 M01
M10 0
//historical event: time, source, sink, migrants, new size, new growth rate, migr. matrix
2 historical event
T1 0 0 1 R1 0 0
T1 1 0 1 1 0 0
//Number of independent loci [chromosome]
65000 0
//Per chromosome: Number of linkage blocks
1
//per Block:data type, number of loci, per generation recombination and mutation rates and optional parameters
FREQ 100 0 1e-8

Our prios

// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all Ns are in number of haploid individuals
1 N1 unif 100 5000000 output
1 N2 unif 100 5000000 output
1 NANC unif 100 5000000 output
1 T1 unif 100 5000000 output
0 M01 logunif 1e-20 1e-1 output
0 M10 logunif 1e-20 1e-1 output
[RULES]
[COMPLEX PARAMETERS]
0 R1 = NANC/N1 hide

The IM model from the manual (page 64):

//Parameters for the coalescence simulation program : simcoal.exe
2 samples to simulate :
//Population effective sizes (number of genes)
NPOP1
NPOP2
//Samples sizes and samples age
20
30
//Growth rates: negative growth implies population expansion
0
0
//Number of migration matrices : 0 implies no migration between demes
2
//Migration matrix 0
0 MIG21
MIG12 0
//Migration matrix 1
0 0
0 0
//historical event: time, source, sink, migrants, new deme size, growth rate, migr mat index
1 historical event
TDIV 0 1 1 RESIZE 0 1
//Number of independent loci [chromosome]
1 0
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per gen recomb and mut rates
FREQ 1 0 2.5e-8e-8

// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name #dist.#min #max
//all N are in number of haploid individuals
1 ANCSIZE unif 100 100000 output
1 NPOP1 unif 100 100000 output
1 NPOP2 unif 100 100000 output
0 N1M21 logunif 1e-2 20 hide
0 N2M12 logunif 1e-2 20 hide
1 TDIV unif 100 20000 output
[RULES]
[COMPLEX PARAMETERS]
0 RESIZE = ANCSIZE/NPOP2 hide
0 MIG21 = N1M21/NPOP1 output
0 MIG12 = N2M12/NPOP2 output

Miguel Navascués

unread,

Aug 28, 2015, 4:42:47 AM8/28/15

to fastsimcoal

Hi,

In your model you have migration (migration matrix 0) between the "ancestral" population (pop 0) and pop 1. In the example this is avoided by using a second migration matrix with all migration rates set to zero.

The way the spliting event is set in your model is not a problem, the result will be the same than in the exemple.

Regarding the priors, they are different... the question is whether do you think they are appropriate for your study case. You can plot (for instance with the density() function in R) the results from some simulations to get an idea on the shape of the prior distribution from the exemple and judge from that...

Cheers,

Miguel

Laurent Excoffier

unread,

Aug 30, 2015, 2:55:51 AM8/30/15

to fastsimcoal

Yes, Miguel is right, your model is not an IM model because there will alway be some gene flow between your populations. You need to switch to a new no-migration matrix at the time of the merging of the two daughter populations.
Your event (one is enough in your case) should look like:
1 historical event
T1 1 0 1 1 0 1

Assuming you have a second migration matrix with zeroes in all entries

You should also change

//Number of independent loci [chromosome]
65000 0

to

//Number of independent loci [chromosome]
1 0

best

laurent

vivi....@gmail.com

unread,

Aug 31, 2015, 7:54:16 AM8/31/15

to fastsimcoal

Hi Laurent and Miguel,

Thank you so much for your reply. I have started some analyses now using the model you suggested and it runs smoothly. It will be interesting how much this will differ from my old runs. THANK YOU.

I have another question that we have discussed in my group regarding estimating confidence intervals on the various parameter estimates. It seems to us that the most accurate thing to do is to bootstrap over your own data: meaning that if you use a site-frequency-spectrum then you would bootstrap over your individual SNP data (as an example) in order to generate new SFS data sets for fsc analysis. However, it seems that some people estimate the confidence by generating e.g. 100 data sets corresponding to the estimated parameter values (I think that such can be found in the .par files) and then run fsc on these datasets. Which of these two options would you find most appropriate and why? We talked a lot about this and to us it seems most accurate to bootstrap over your own data instead of simulating new data but we may be mistaken?

A last thing I thought a bit about is if anyone tried to incorporate possible selection into your models. Although most e.g. SNPs would be assumed to be under neutral evolution some may be under positive selection (e.g. between two species) and depending on the scenario these loci may constitute a good part of your data (especially the highly differentiated ones). Did anyone try to play around with selection and how it may bias parameter estimates in fsc??? And is it possible to incorporate selection at all?

Thank you so much for your help
Sincerely Vivi and Magnus

Laurent Excoffier

unread,

Sep 9, 2015, 3:59:07 PM9/9/15

to fastsimcoal

Sorry, no selection...

csm...@willamette.edu

unread,

Dec 27, 2017, 6:59:40 AM12/27/17

to fastsimcoal

What did you all decide about bootstrapping. If you bootstrapped by resampling your own data and recalculating the site frequency spectrum, did you come up with a way to automate this? I could never get Arlequin to calculate the SFS correctly, and ended up doing that bit by hand (... I know...)

Chris

Reply all

Reply to author

Forward