How to infer confidence intervals for estimated parameters

wk8...@gmail.com

Aug 24, 2014, 11:37:32 PM
to fasts...@googlegroups.com
Hello Laurent,

Thank you for building this great software.
Fastsimcoal2 is user friendly, robust and fast, especially with multithreading.

However, I'm not sure about how to infer confidence intervals for estimated parameters.

This is my understanding of the process:

1) Estimate the parameter with FSC.

2) Repeat step 1 50-100 times.

3) Choose the parameters from the run with the highest MaxEstLhood as the "point estimate".

4) Simulate 100 SFS with the parameters (point estimate) from step 3.

5) Re-estimate the parameters for each of the 100 SFS from step 4 (30 replicates for every SFS).

6) The 100 parameter estimates from step 5 can be used to infer the confidence interval, and the 100 CLR values (log10(CLO/CLE)) can be used to evaluate the model.

Please check whether my understanding is right.

Another question is:
When I used the command "fsc25 -i 1PopBot20Mb_maxL.par -n 100" to simulate the SFS, the program reported "Do not output expected MAF spectrum" and "Do not output expected DAF spectrum", and no SFS was written.

Thanks for any help!

Kun

Laurent Excoffier

Aug 25, 2014, 8:25:18 AM
to fasts...@googlegroups.com
Hmm, this is not quite the right procedure. It should be more like:


1) Estimate the parameter with FSC.

2) Use the maximum-likelihood parameters (found in the *_maxL.par file) to generate, say, 100 SFS (preferably from DNA data; you need to modify the _maxL.par file to do this).

3) Re-estimate the parameters for each of the 100 SFS from step 2 (30-50 or more runs for every SFS).

4) The 100 parameter estimates from step 3 can be used to infer the confidence interval.
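
As a minimal sketch of step 4, assuming the estimates of one parameter from the 100 re-estimations have already been collected into a plain Python list, a percentile confidence interval could be computed as follows. The function names and the example values are hypothetical and are not part of fastsimcoal2; the percentile interval is only one common way to summarize bootstrap estimates.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    lower_value = sorted_values[lower_index]
    upper_value = sorted_values[upper_index]
    return lower_value + fraction * (upper_value - lower_value)


def percentile_ci(estimates, level=0.95):
    """Percentile confidence interval from parametric-bootstrap estimates."""
    ordered = sorted(estimates)
    tail = (1.0 - level) / 2.0
    return quantile(ordered, tail), quantile(ordered, 1.0 - tail)


# Hypothetical stand-in for the estimates of one parameter (for example a
# population size) obtained from the best run on each simulated SFS.
bootstrap_estimates = list(range(1, 102))
low, high = percentile_ci(bootstrap_estimates)
print(f"95% CI: {low:.1f} to {high:.1f}")
```

Each value in the list should be the estimate from the highest-likelihood run for one simulated SFS, so the list has one entry per simulated SFS.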

Concerning the evaluation of the fit of the data to the model based on the difference between the "observed" and "maximum" likelihood: I have noticed that this procedure is too stringent. Most of the time you will reject the hypothesis that the data were generated under the model you are simulating (and most of the time this rejection will be correct, since we usually do not know the true evolutionary scenario). I would rather advise comparing models with AIC than hoping to find the absolute true model.
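
A minimal sketch of such an AIC comparison, assuming that the likelihood reported for each model's best run (MaxEstLhood) is on a log10 scale and therefore has to be converted to a natural log first. The model names, likelihood values, and parameter counts below are made up for illustration.

```python
import math


def aic_from_log10_likelihood(log10_likelihood, n_parameters):
    """AIC = 2k - 2 ln(L), with the likelihood supplied in log10 units."""
    ln_likelihood = log10_likelihood * math.log(10)  # log10 -> natural log
    return 2 * n_parameters - 2 * ln_likelihood


# Hypothetical models: (log10 likelihood of the best run, number of parameters)
models = {
    "bottleneck": (-12345.6, 4),
    "constant_size": (-12360.2, 1),
}

aic_values = {
    name: aic_from_log10_likelihood(log10_lhood, k)
    for name, (log10_lhood, k) in models.items()
}
best_model = min(aic_values, key=aic_values.get)  # lowest AIC is preferred
for name, aic in sorted(aic_values.items(), key=lambda item: item[1]):
    print(f"{name}: AIC = {aic:.1f}")
print("Preferred model:", best_model)
```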

See my comment in point 2 above to answer your final question.

Hope it helps

laurent


wk8...@gmail.com

Aug 25, 2014, 8:52:29 AM
to fasts...@googlegroups.com
Thanks for your kind help!

I'm sorry, but I'm still a little confused.

In step 3, we need 30-50 or more runs for every SFS in order to get precise parameter estimates. But why don't we need that in step 1?

wk8...@gmail.com

Aug 25, 2014, 10:03:55 AM
to fasts...@googlegroups.com
Another question is:

If I use DNA data as the output of the simulation, the output file could be very large. That would make it very difficult to simulate a whole genome.

I've tested "-D", "-m" and other FSC options to try to get an SFS output, but none of them worked.

Laurent Excoffier

Aug 25, 2014, 10:25:14 AM
to fasts...@googlegroups.com
Sure, in step 1 you also need to make multiple runs and select the run with the highest likelihood.
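
Selecting the best run can be sketched as below. This assumes each run's summary is available as text with one whitespace-separated header line that includes a MaxEstLhood column, followed by one line of values; that layout, the parameter names NPOP and TBOT, and all numbers are assumptions for illustration only.

```python
def parse_run_summary(text):
    """Parse a two-line summary (header line, value line) into a dict of floats."""
    header_line, value_line = text.strip().splitlines()[:2]
    names = header_line.split()
    values = [float(field) for field in value_line.split()]
    return dict(zip(names, values))


def best_run(run_summaries):
    """Return the run with the highest (least negative) MaxEstLhood."""
    return max(run_summaries, key=lambda run: run["MaxEstLhood"])


# Hypothetical summaries from three independent runs on the same SFS.
run_texts = [
    "NPOP\tTBOT\tMaxEstLhood\tMaxObsLhood\n10250\t310\t-5012.4\t-4950.0\n",
    "NPOP\tTBOT\tMaxEstLhood\tMaxObsLhood\n9870\t295\t-5003.9\t-4950.0\n",
    "NPOP\tTBOT\tMaxEstLhood\tMaxObsLhood\n11020\t342\t-5020.7\t-4950.0\n",
]
runs = [parse_run_summary(text) for text in run_texts]
winner = best_run(runs)
print("Best run parameters:", winner)
```

The same selection applies both to the runs on the observed SFS and to the runs on each simulated SFS.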

Laurent Excoffier

Aug 25, 2014, 10:27:50 AM
to fasts...@googlegroups.com
In principle, you should simulate the same amount of data that you have observed...
But you may not need to simulate a whole genome; 20-100 Mb worth of data should be fine.

To generate the SFS from DNA data, you need to combine -d with -s0 (as mentioned in the manual).
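
Putting the flags from this thread together, the simulation call could be assembled as in the sketch below. It only combines options quoted in this discussion (-i, -n, -d, -s0) and assumes the _maxL.par file has already been modified to simulate DNA data; the helper function is hypothetical, and other options may be needed depending on your setup, so check the manual.

```python
def build_sfs_simulation_command(par_file, n_simulations=100, executable="fsc25"):
    """Build the argument list for simulating SFS from a DNA-data .par file."""
    return [
        executable,
        "-i", par_file,             # input parameter file (edited _maxL.par)
        "-n", str(n_simulations),   # number of simulations to perform
        "-d",                       # used together with -s0 to output the SFS
        "-s0",
    ]


command = build_sfs_simulation_command("1PopBot20Mb_maxL.par")
print(" ".join(command))
# The list can be passed to subprocess.run(command, check=True) to launch it.
```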

wk8...@gmail.com

Aug 25, 2014, 10:36:45 AM
to fasts...@googlegroups.com
It works! Thank you very much for your invaluable help!

Kun

cecilia...@gmail.com

Jan 11, 2020, 1:19:05 PM
to fastsimcoal
Dear Laurent,

What is the rationale for getting the confidence interval by re-estimating parameters from simulations based on the parameters of the best run? Could you please recommend some reading on this (I am a biologist, so I have limited mathematical and statistical knowledge)?

Why is it not recommended to estimate the confidence interval from the distribution of estimates obtained in step 1?

Thanks!

Cecilia

Cecilia Fiorini

Mar 9, 2021, 7:18:02 AM
to fastsimcoal
I just found the answer to my question: https://groups.google.com/g/fastsimcoal/c/sxY_1XASWBE/m/8260wyIbBAAJ :)