Hi Ryan!
Hope you are doing well! I estimated parameters under a three population divergence model and I am trying to estimate parameter uncertainties using a nonparametric bootstrap. After generating ~100 datasets by randomly sampling SNPs (with replacement) from my dataset, I obtained real unit estimates of N, T, and M:
|
Tn.s |
Tc.s |
N_n |
N_s |
N_c |
m12 |
m13 |
m23 |
| AVERAGE |
206960.2683 |
104601.5494 |
64976.18 |
209939.3 |
89213.69 |
398.0323 |
170.139 |
9311.397 |
| STDEV |
21145.04394 |
17563.81575 |
1901.063 |
9126.291 |
5645.425 |
200.0125 |
88.13612 |
4734.688 |
| OBSERVED |
411967.886 |
180059.2868 |
128905 |
388960.3 |
177993.4 |
1114.351 |
463.508 |
26982.88 |
OBSERVED = my observed parameter estimates, AVERAGE = average parameter estimates over the 100 datasets. My question is this: the averages are about half the observed values -- I feel like this is a math error, but I'm not sure why I need to multiply all these values by 2 again, or where my calculations went wrong. Any thoughts?
Here are the equations and parameters I used to get the data:
| variants
before filtering |
32604 |
| total sites |
2383683 |
| # SNPs after filtering |
5076 |
| effective length (L) |
371107.1 |
| mu |
2.2E-09 |
| theta |
107.412 |
| Nref (theta/(4*L*u) |
32890.54 |
Divergence time - 2*tau*Nref
Effective Population size - Nref*N
Effective migrants - m*Nref*2
I've attached a spreadsheet with the calculations, and all the work is under the 'dadi_bootstrap' worksheet.
thanks a million for your advice!
Mark