You are correct: there is definitely linkage between SNPs in our data,
so our likelihood function is actually a composite likelihood.
However, that doesn't bias the inference. (See Wiuf, 2006.) It does
affect the analysis, in that standard likelihood ratio tests will be
anti-conservative, as will uncertainties calculated from the Fisher
Information Matrix. This is why we need to bootstrap the fits to
estimate uncertainties, and why we need fits to coalescent simulations
to do likelihood ratio tests.
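
To make the bootstrap idea concrete, here is a minimal sketch in plain
NumPy. It is not dadi code: fit_theta is a hypothetical stand-in for a
real model fit (it uses the neutral expectation theta/k for the
frequency spectrum), and resampling whole genomic regions rather than
single SNPs is an assumption made here so that linked SNPs stay
together in each replicate.

```python
import numpy as np

def fit_theta(sfs):
    """Hypothetical stand-in for a model fit: the closed-form Poisson MLE
    of theta when the expected count in frequency class k is theta / k."""
    k = np.arange(1, len(sfs) + 1)
    return sfs.sum() / np.sum(1.0 / k)

def bootstrap_uncertainty(region_sfs, fit_func, n_boot=200, seed=0):
    """Resample regions with replacement, refit each replicate, and return
    the mean and standard deviation of the refitted estimates."""
    rng = np.random.default_rng(seed)
    n_regions = region_sfs.shape[0]
    estimates = []
    for _ in range(n_boot):
        # Draw region indices with replacement; SNPs within a region
        # are kept together, so linkage within regions is preserved.
        idx = rng.integers(0, n_regions, size=n_regions)
        estimates.append(fit_func(region_sfs[idx].sum(axis=0)))
    estimates = np.asarray(estimates)
    return estimates.mean(axis=0), estimates.std(axis=0, ddof=1)

# Toy data (an assumption for illustration): 50 regions, 4 frequency
# classes, Poisson counts with mean 2 / k in class k.
toy_rng = np.random.default_rng(1)
k = np.arange(1, 5)
region_sfs = toy_rng.poisson(2.0 / k, size=(50, 4))

boot_mean, boot_sd = bootstrap_uncertainty(region_sfs, fit_theta)
print("bootstrap mean:", boot_mean, "bootstrap sd:", boot_sd)
```

With a real analysis, fit_func would be replaced by the full
optimization, which is why the bootstrap is expensive.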
Best,
Ryan
--
Ryan Gutenkunst
Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520)626-0569
http://gutengroup.mcb.arizona.edu
On Sat, Dec 4, 2010 at 4:45 AM, 吴琦 <ribozy...@gmail.com> wrote:
> Thanks for your explanation. From what you say, I infer that if I use a
> data set whose SNPs are guaranteed to be independent, I could estimate the
> parameter uncertainties without using the bootstrap. Is that right? If so, how
> do I do a non-bootstrap estimate?
> For example, if I select SNPs from the genome on the condition that the
> distance between any two SNPs is larger than 10K, how could I estimate the
> parameter uncertainties without using the bootstrap?
If your SNPs are independent (and you haven't projected downward),
then standard likelihood theory applies. In particular, the
maximum-likelihood estimate for your parameters is asymptotically
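normally distributed, given enough data. In that case, the variances
of your parameter estimates can be calculated as the diagonal elements
of the inverse Fisher information matrix, and the uncertainties
(standard errors) are their square roots. (The FIM is the negative of
the expected second-derivative matrix of the log-likelihood with
respect to the parameters; in practice it is evaluated at the
maximum-likelihood parameters.)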
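
As a rough illustration of the FIM calculation, here is a minimal
sketch in plain NumPy. It is not dadi code: the function names, the
finite-difference step, and the toy model (independent Poisson counts
with the neutral expectation theta/k in frequency class k) are all
assumptions chosen for the example. It approximates the Hessian of
the log-likelihood at the maximum by central differences, negates it
to get the observed information, and takes square roots of the
diagonal of the inverse.

```python
import numpy as np

def poisson_loglik(params, data, model_func):
    """Poisson log-likelihood of independent counts, dropping the
    log(data!) term, which does not depend on the parameters."""
    expected = model_func(params)
    return np.sum(data * np.log(expected) - expected)

def numerical_hessian(func, p0, rel_step=1e-3):
    """Central finite-difference Hessian of func at p0."""
    p0 = np.asarray(p0, dtype=float)
    n = len(p0)
    steps = rel_step * np.abs(p0)
    steps[steps == 0] = rel_step  # fall back to an absolute step at zero
    hess = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = steps[i]
            ej[j] = steps[j]
            hess[i, j] = (func(p0 + ei + ej) - func(p0 + ei - ej)
                          - func(p0 - ei + ej) + func(p0 - ei - ej)
                          ) / (4 * steps[i] * steps[j])
    return hess

def fim_standard_errors(loglik, p_mle):
    """Standard errors from the observed information at the MLE."""
    fim = -numerical_hessian(loglik, p_mle)  # negative Hessian of log-likelihood
    cov = np.linalg.inv(fim)                 # asymptotic covariance matrix
    return np.sqrt(np.diag(cov))

# Toy example (an assumption for illustration): one parameter, theta,
# with expected count theta / k in frequency class k.
k = np.arange(1, 5)
data = np.array([120.0, 60.0, 40.0, 30.0])

def neutral_model(params):
    return params[0] / k

theta_hat = data.sum() / np.sum(1.0 / k)  # closed-form MLE for this toy model
se = fim_standard_errors(
    lambda p: poisson_loglik(p, data, neutral_model), [theta_hat])
print("theta_hat:", theta_hat, "standard error:", se[0])
```

For this toy model the answer can be checked by hand: the second
derivative of the log-likelihood is -sum(data)/theta^2, so the
standard error should be close to theta_hat/sqrt(sum(data)).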
The "given enough data" condition is not necessarily trivial. For a
related situation, see Bustamante et al., Genetics 159:1779
(2001). We haven't worked out the details of how much data is
"enough", and it probably needs to be done case-by-case.
This sort of analysis isn't built into dadi at the moment (although
many of the pieces are). If you're interested in pursuing it, I could
help implement it, either directly or by pointing you in the right
direction. To have confidence, you'd probably want to do the full
bootstrap for one case first, to compare with the results from the FIM
approach.
Best,
Ryan