Linkage of the SNP data

吴琦

Dec 3, 2010, 6:40:01 AM

to dadi...@googlegroups.com

Hi,
I am reading the data.txt file in the examples and found that the SNPs in the file are very close together in the sequence. Take the first two lines of the file as an example:

# This data is a subset of the EGP data used in the original dadi paper.
Human   Chimp   Allele1 ASW     YRI     CEU     MXL     CHB     JPT     Allele2 ASW     YRI     CEU     MXL     CHB     JPT     Gene    Position
TTA     TTA     T       27      22      43      42      23      22      G       3       2       1       2       1       2       abcb1   77
GCT     GCT     C       27      22      43      42      23      22      T       3       2       1       2       1       2       abcb1   145
A

They are in the same gene, and the distance between them is only 68 bp (145 - 77 = 68). I think that is too close, and there must be strong linkage between them. Or maybe I misunderstand the meaning of the numbers?

WQ

Ryan Gutenkunst

Dec 3, 2010, 1:06:30 PM

to dadi...@googlegroups.com

Hello WQ,

You are correct, there is definitely linkage between SNPs in our data,
so our likelihood function is actually a composite likelihood.
However, that doesn't bias the inference. (See Wiuf, 2006.) It does
affect the analysis in two ways: standard likelihood ratio tests will
be anti-conservative, and uncertainties calculated from the Fisher
Information Matrix will be too small. This is why we need to fit
bootstrapped data sets to estimate uncertainties, and fit coalescent
simulations to do likelihood ratio tests.
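
The bootstrap idea can be sketched as follows. This is only an
illustration, not dadi code: the gene names other than abcb1, the
per-SNP values, and the toy_estimate function are made up, and in a
real analysis each replicate would be refit with the full demographic
model. Whole genes are resampled with replacement so that linkage
within a gene is preserved in every replicate.

```python
import random
import statistics
from collections import defaultdict

def bootstrap_by_gene(snps, n_reps, seed=0):
    """Yield bootstrap replicates by resampling whole genes with replacement.

    snps: list of (gene, value) tuples. SNPs from the same gene always
    travel together, so linkage inside a gene is kept in each replicate.
    """
    rng = random.Random(seed)
    by_gene = defaultdict(list)
    for gene, value in snps:
        by_gene[gene].append(value)
    genes = sorted(by_gene)
    for _ in range(n_reps):
        # Draw as many genes as the original data set has, with replacement.
        sampled = [rng.choice(genes) for _ in genes]
        yield [v for g in sampled for v in by_gene[g]]

def toy_estimate(values):
    """Hypothetical stand-in for a full model fit: the mean per-SNP value."""
    return sum(values) / len(values)

# Made-up (gene, minor-allele count) records, for illustration only.
snps = [("abcb1", 3), ("abcb1", 3), ("geneB", 1),
        ("geneB", 7), ("geneC", 2), ("geneD", 5)]

estimates = [toy_estimate(rep) for rep in bootstrap_by_gene(snps, n_reps=200)]
# The spread of the estimates across replicates is the bootstrap uncertainty.
boot_se = statistics.stdev(estimates)
```

The unit of resampling matters: resampling individual SNPs would treat
linked sites as independent and underestimate the uncertainty again.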

Best,
Ryan

--
Ryan Gutenkunst
Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520)626-0569
http://gutengroup.mcb.arizona.edu

吴琦

Dec 4, 2010, 6:45:18 AM

to dadi...@googlegroups.com

Hi Ryan,

Thanks for your explanation. From what you said, I infer that if I use a data set whose SNPs are guaranteed to be independent, I could estimate the parameter uncertainties without bootstrapping. Is that right? If so, how do I do a non-bootstrap estimate?
For example, if I select SNPs from the genome so that the distance between any two SNPs is larger than 10 kb, how could I estimate the parameter uncertainties without bootstrapping?
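
The screening step described here could look roughly like the sketch
below. It is an assumption-laden illustration: the thin_snps function
and the chromosome coordinates are hypothetical, and a spacing of more
than 10 kb does not by itself guarantee that the kept SNPs are
independent.

```python
def thin_snps(snps, min_dist=10_000):
    """Greedily keep SNPs so that kept neighbours on the same chromosome
    are more than min_dist base pairs apart.

    snps: iterable of (chromosome, position) tuples.
    """
    kept = []
    last_pos = {}  # last kept position on each chromosome
    for chrom, pos in sorted(snps):
        if chrom not in last_pos or pos - last_pos[chrom] > min_dist:
            kept.append((chrom, pos))
            last_pos[chrom] = pos
    return kept

# Hypothetical coordinates; 77 and 145 echo the two positions quoted above.
example = [("chr7", 77), ("chr7", 145), ("chr7", 20_000), ("chr1", 500)]
thinned = thin_snps(example)
```

Because the positions are sorted within each chromosome, comparing each
SNP with the last kept one is enough to enforce the spacing for every
pair of kept SNPs.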

Sincerely,
WQ

Ryan Gutenkunst

Dec 7, 2010, 10:58:43 PM

to dadi...@googlegroups.com

Hi WQ,

On Sat, Dec 4, 2010 at 4:45 AM, 吴琦 <ribozy...@gmail.com> wrote:
> Thanks for your explanation. Following your words I deduce that if I use a
> data set with SNPs bound to be independent, I could then estimate the
> parameter uncertainties without using bootstrap. Really? If so, how to do a
> non-bootstrap estimate?
> For example, if I screen SNPs from genome with the condition that the
> distance between any two SNPs is larger than 10K, how could I estimate the
> parameter uncertainties without using bootstrap?

If your SNPs are independent (and you haven't projected downward),
then standard likelihood theory applies. In particular, the
maximum-likelihood estimate for your parameters is asymptotically
normally distributed, given enough data. In that case, the variances
of your parameter estimates are the diagonal elements of the inverse
Fisher information matrix, and the standard errors are their square
roots. (The FIM is the negative of the second-derivative matrix of the
log-likelihood with respect to the parameters, in practice evaluated
at the maximum-likelihood estimate.)
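
A minimal sketch of that calculation, under stated assumptions, is
given below. It does not use dadi: the toy two-parameter model (a
neutral spectrum theta/i with a fraction p_misid of SNPs whose
ancestral state is misidentified), the parameter values, and all
function names are chosen only for illustration. The "data" are set
equal to the model expectation, so the chosen parameters are exactly
the maximum-likelihood estimate, and each frequency-spectrum entry is
treated as an independent Poisson count.

```python
import math

def expected_sfs(theta, p_misid, n):
    """Toy expected spectrum for n chromosomes (entries i = 1 .. n-1)."""
    return [theta * ((1 - p_misid) / i + p_misid / (n - i)) for i in range(1, n)]

def log_likelihood(params, data, n):
    """Poisson log-likelihood of independent entries (constant term dropped)."""
    model = expected_sfs(params[0], params[1], n)
    return sum(x * math.log(m) - m for x, m in zip(data, model))

def hessian(f, p, rel_step=1e-3):
    """Central finite-difference second-derivative matrix of f at p.
    Assumes every parameter is nonzero, since steps are relative."""
    k = len(p)
    h = [rel_step * abs(v) for v in p]
    def shifted(i, si, j, sj):
        q = list(p)
        q[i] += si * h[i]
        q[j] += sj * h[j]
        return f(q)
    return [[(shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
              - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4 * h[i] * h[j])
             for j in range(k)] for i in range(k)]

def invert_2x2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

n = 10
p_hat = [100.0, 0.05]                      # made-up theta and p_misid
data = expected_sfs(p_hat[0], p_hat[1], n)  # idealized data, so p_hat is the MLE

H = hessian(lambda q: log_likelihood(q, data, n), p_hat)
fim = [[-v for v in row] for row in H]     # FIM = minus the log-likelihood Hessian
cov = invert_2x2(fim)                      # asymptotic covariance of the estimates
std_errs = [math.sqrt(cov[i][i]) for i in range(2)]
```

With real data one would first maximize the likelihood and then
evaluate the Hessian at the fitted parameters; with more than two
parameters a general matrix inverse replaces invert_2x2.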

The "given enough data" condition is not necessarily trivial. For a
related situation, see the study Bustamante et al. Genetics 159:1779
(2001). We haven't worked out the details of how much data is
"enough", and it probably needs to be done case-by-case.

This sort of analysis isn't built into dadi at the moment (although
many of the pieces are). If you're interested in pursuing it, I could
help implement it, either directly or by pointing you in the right
direction. To have confidence, you'd probably want to do the full
bootstrap for one case first, to compare with the results from the FIM
approach.

Best,
Ryan
