Issues with CI estimation

232 views
Skip to first unread message

Alejandro Manuel Ferreiro

unread,
Feb 5, 2024, 10:22:55 AM2/5/24
to fastsimcoal2
Hello,

I am using fsc ver. 2.7 (in the HPC cluster serafin) to estimate the demographic parameters of the following population evolutionary model:

 Fig1_Model.png
I am working with multi-SFS obtained from GBS data (ca. 13000 loci) using the easySFS pipeline. I have used the following input files, for the calculation of point-estimates of each parameter. 

********************** tpl file ***************************************
//Parameters for the coalescence simulation program : fastsimcoal.exe
5 samples to simulate :
//Population effective sizes (number of genes)
$NEST$
$NCAY$
$NMJO$
$NCAA$
$NWST$
//Samples sizes and samples age
50
42
6
16
12
//Growth rates  : negative growth implies population expansion      
0
0
0
0
0
//Number of migration matrices : 0 implies no migration between demes
5
//Migration matrix 0
0 $ESTCAY$ $ESTMJO$ $ESTCAA$ $ESTWST$
$ESTCAY$ 0 $CAYMJO$ $CAYCAA$ $CAYWST$
$ESTMJO$ $CAYMJO$ 0 $MJOCAA$ $MJOWST$
$ESTCAA$ $CAYCAA$ $MJOCAA$ 0 $CAAWST$
$ESTWST$ $CAYWST$ $MJOWST$ $CAAWST$ 0
//Migration matrix 1
0 $ESTCAY1$ $ESTMJO1$ 0 $ESTWST1$
$ESTCAY1$ 0 $CAYMJO1$ 0 $CAYWST1$
$ESTMJO1$ $CAYMJO1$ 0 0 $MJOWST1$
0 0 0 0 0
$ESTWST1$ $CAYWST1$ $MJOWST1$ 0 0
//Migration matrix 2
0 $ESTCAY2$ 0 0 $ESTWST2$
$ESTCAY2$ 0 0 0 $CAYWST2$
0 0 0 0 0
0 0 0 0 0
$ESTWST2$ $CAYWST2$ 0 0 0
//Migration matrix 3
0 $ESTCAY3$ 0 0 0
$ESTCAY3$ 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
//Migration matrix 4
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
//historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index
8 historical event
$TDIV1$ 3 0 1 1 0 1
$TDIV1$ 3 3 0 0 0 1
$TDIV2$ 2 1 1 1 0 2
$TDIV2$ 2 2 0 0 0 2
$TDIV3$ 4 0 1 1 0 3
$TDIV3$ 4 4 0 0 0 3
$TDIV4$ 1 0 1 1 0 4
$TDIV4$ 1 1 0 0 0 4
//Number of independent loci [chromosome]
1 0
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per generation recombination and mutation rates and optional parameters
FREQ 1 0 4.6e-9 OUTEXP
*******************************************************************************************
*************************** est file ********************************************************
// Priors and rules file
// *********************
[PARAMETERS]
//#isInt? #name   #dist.#min  #max
//all Ns are in number of haploid individuals
1 $NEST$ unif 254884 345036 output
1 $NCAY$ unif 165000 264537 output
1 $NMJO$ unif 20307 30902 output
1 $NCAA$ unif 183982 254047 output
1 $NWST$ unif 149990 205336 output
0 $ESTCAY$ unif 0 0.001 output
0 $ESTMJO$ unif 0 0.001 output
0 $ESTCAA$ unif 0 0.001 output
0 $ESTWST$ unif 0 0.001 output
0 $CAYMJO$ unif 0 0.001 output
0 $CAYCAA$ unif 0 0.001 output
0 $CAYWST$ unif 0 0.001 output
0 $MJOCAA$ unif 0 0.001 output
0 $MJOWST$ unif 0 0.001 output
0 $CAAWST$ unif 0 0.001 output
0 $ESTCAY1$ unif 0 0.001 output
0 $ESTMJO1$ unif 0 0.001 output
0 $ESTWST1$ unif 0 0.001 output
0 $CAYMJO1$ unif 0 0.001 output
0 $CAYWST1$ unif 0 0.001 output
0 $MJOWST1$ unif 0 0.001 output
0 $ESTCAY2$ unif 0 0.001 output
0 $ESTWST2$ unif 0 0.001 output
0 $CAYWST2$ unif 0 0.001 output
0 $ESTCAY3$ unif 0 0.001 output
1 $TDIV1$ unif 25202 35284 output
1 $TDIV2$ unif $TDIV1$ 40000 output paramInRange
1 $TDIV3$ unif $TDIV2$ 50000 output paramInRange
1 $TDIV4$ unif $TDIV3$ 211726 output paramInRange

[RULES]

[COMPLEX PARAMETERS]

****************************************************************************************

The point estimates were obtained using  running 40 independent runs with the following code:
/home/aferreiro/fsc27_linux64/fsc27093 -t ${PREFIX}.tpl -e ${PREFIX}.est -m -u -n100000 -L20 -M --logprecision 18 -q -c64 -B64

Point estimates were performed successfully (see values in the file "best_params.txt" attached). However, I noticed that estimates for the effective size of the 5 populations, were outside the range provided in the est file. Despite this, I tried to estimate Confidence Intervals for those point estimates, using the parametric bootstrap procedure described in the manual. As the genomic data used for generating the multi-SFS were 13000 SNP obtained from randomly sampling 1 SNP per locus, I modified the _maxL.par file as the following:

**************** par file  (only Genetic settings) *************************************

//Number of independent loci [chromosome]

13000 0

//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci

1

//per Block:data type, number of loci, per generation recombination and mutation rates and optional parameters

DNA 1 0 4.6e-9 OUTEXP

*********************************************************************************
 
And 65 boostrapped SFS were computed with the following command:
/home/aferreiro/fsc27_linux64/fsc27093 -i ${PREFIX}_boot.par -n65 -j -m -s0 -x –I -q -c10 -B10 -u
Finally, for each bootstrap replicate we performed 30 independent runs with the following command:
/home/aferreiro/fsc27_linux64/fsc27093 -t ${PREFIX}.tpl -e ${PREFIX}.est -m -C10 -n1000 -L20 -M -q -u --initValues ${PREFIX}.pv --logprecision 18 -c64 -B64
 
After that, parameter values representing effective size are equal to or usually lower than those obtained from the point estimate values (see Bootstrap_params.txt), which represents a challenge to calculate a confidence interval for these parameters. Any idea of what could be going wrong? Is there something I am missing?

It is worth mentioning that I have tried to re-run the analysis and use the
bounded flag for the TDIV parameters, but we still are experiencing the same issue.

Thanks for any helps!

Alejandro
best_params.txt
Bootstrap_params.txt
Message has been deleted

Anita Wray

unread,
Feb 27, 2024, 6:54:35 PM2/27/24
to fastsimcoal2
Hi Alejandro, 

I'm having similar issues, where my point estimate is not within my 95% CIs (see attached file). Were you able to figure out a solution to this issue?

The point estimates were obtained using 100 independent runs. I used 20 block-bootstrapped files and ran 50 independent runs each. I am trying to use --initvalues right now to see if that will change anything, but I'm not seeing any major change.

Best, 
Anita
parameters_with_ci_all.txt

Alejandro Manuel Ferreiro

unread,
Mar 4, 2024, 1:05:28 PM3/4/24
to fastsimcoal2
Hi Anita,

I have not solved this issue yet. Currently I am testing other models as i read that poor likelihood scores may be one reason of the issue of narrow CI. I would let you know if this solve the problem. Did you have lucky including the -initvalues?

Best,
Ale
Reply all
Reply to author
Forward
0 new messages