Ryan,
I have more of a popgen question, but it hinges on how dadi estimates theta. I ran some simulations, and I am trying to recover the parameter values used in the simulations using dadi, but I am having trouble equating the theta estimates from dadi to the theta value I used in ms. For example, I simulated 50,000 loci of 100bp each (5,000,000 total sites). These contained 106,348 SNPs, of which 105,641 were biallelic. I randomly selected one biallelic SNP per locus for analysis, resulting in a set of 43,654 SNPs. My theta estimates from dadi are all approximately 1.0. I simulated data, however, using per-locus theta values of 0.2 (0.002 per site). I am trying to convert the parameter estimate to see if I can recover the parameter value I used in the simulation.
I think I need to correct for the fact that I only used 1 of every ~2.4 SNPs in the analysis. I was thinking I could do this as follows:
(dadi theta estimate)/(number of sites in dadi analysis)*(total number of SNPs)/(number of SNPs in analysis) = corrected per-site theta
This, however, results in the following:
(1.0/43654)*(106348/43654) = 5.5E-5/site
Which is quite different from the expected 0.002/site. Is this due to estimation error in dadi? Or (more likely) am I doing something wrong?
Many thanks for any assistance,
Mike Harvey