What is "delta method" in the supplementary notes of "LD Score regression distinguishes confounding from polygenicity in genome-wide association ....


Degang WU

Aug 22, 2017, 7:53:04 AM
to ldsc_users
Hi,

I want to re-derive equation 1 of the main text. However, the derivation in the supplementary notes relies on an important approximation relating the expectation of the sample correlation to the population correlation (I'm referring to equation 1.7 in the supplementary notes).

The supplementary notes state that this approximation can be obtained using methods such as the "delta method". I have no clue what this method is. Could anyone point me to some references on the delta method, or on any other method that can be used to derive Eq. (1.7) of the supplementary notes?

Thanks!

Raymond Walters

Aug 23, 2017, 1:51:41 PM
to Degang WU, ldsc_users
Hi,
The delta method is a canonical statistical result (e.g. see Wikipedia). I first learned it from Casella & Berger’s Statistical Inference book, but I’d expect most formal statistics textbooks in the same domain to include some version of it.

I wouldn’t get too hung up on the exact approximation in Equation 1.7 though. Across the multiple ways LD score regression can be derived, the key feature is that there’s an order 1/N upward bias in sample r2 vs. population r2, which corresponds to why chi2 (approximately proportional to n*r2) has an expected value of 1 under the null.
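
As a rough numerical illustration of this point (a sketch, not something taken from the paper): simulate two independent SNPs, so that the population r2 is 0, and check that N times the sample r2 averages to about 1. The allele frequencies, sample size, replicate count, and function names below are arbitrary choices made for illustration.

```python
import random

def sample_r2(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return None  # SNP is monomorphic in this sample, r2 is undefined
    return sxy * sxy / (sxx * syy)

def genotype(rng, freq):
    """Genotype coded 0/1/2: count of alternate alleles drawn at frequency freq."""
    return int(rng.random() < freq) + int(rng.random() < freq)

def mean_n_r2_under_null(n_individuals=200, n_reps=2000, seed=1):
    """Monte Carlo average of N * r2 for two independent (unlinked) SNPs."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _rep in range(n_reps):
        x = [genotype(rng, 0.3) for _i in range(n_individuals)]
        y = [genotype(rng, 0.4) for _i in range(n_individuals)]
        r2 = sample_r2(x, y)
        if r2 is not None:
            total += n_individuals * r2
            used += 1
    return total / used

null_mean = mean_n_r2_under_null()
print("mean of N * sample r2 under the null:", null_mean)
```

The population r2 is exactly 0 here, so whatever the average sample r2 turns out to be is pure upward bias of order 1/N.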

Hope that helps.

Cheers,
Raymond



--
You received this message because you are subscribed to the Google Groups "ldsc_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ldsc_users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ldsc_users/95e9d8ad-396b-4f79-89d3-ad6c45969a85%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Degang WU

Aug 24, 2017, 2:27:11 AM
to ldsc_users, samue...@gmail.com


Hi Raymond,

Thanks for clarifying the meaning of the delta method and referring me to Casella & Berger’s Statistical Inference book (my background is in the physical sciences, not statistics).

I suppose equation 1.7 follows from

[equation image not preserved]

which, depending on the sign of g''(\theta), could indicate the sign of the bias in the estimator.
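
The formula referred to here did not survive in the text. As a generic sketch of the idea (not necessarily the exact expression in the attached screenshot), a second-order Taylor expansion of g about \theta gives:

```latex
\mathbb{E}[g(Y_n)] \;\approx\; g(\theta)
  + g'(\theta)\,\mathbb{E}[Y_n - \theta]
  + \tfrac{1}{2}\, g''(\theta)\,\mathbb{E}\!\big[(Y_n - \theta)^2\big]
```

If Y_n is (approximately) unbiased for \theta, the middle term drops out and the leading bias is \tfrac{1}{2} g''(\theta) Var(Y_n), whose sign is the sign of g''(\theta).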

Therefore, I have one question:

Clearly, g(Y_n) is the square of the sample correlation. But what is Y_n in equation 1.7? Is Y_n the sample correlation r_{jk}, or is Y_n = {X_{ij}, X_{ik}}, the collection of genotypes?

Thanks!
[Attachment: screenshot of Casella & Berger, Statistical Inference, PDF page 270 of 686]

Raymond Walters

Aug 25, 2017, 12:16:35 PM
to Degang WU, ldsc_users
Hi,
I think you want the regular delta method rather than the second-order version you’ve posted (I think it’s Theorem 5.5.24, equation 5.5.10, depending on the book version), but you’re in the right place.

Y_n will be the sample correlation.

Cheers,
Raymond





Degang WU

Aug 25, 2017, 1:03:06 PM
to ldsc_users, samue...@gmail.com
Thanks for your reply. The first-order delta method says that the deviation of g(Y_n) from the population value is asymptotically normal, but since the normal distribution is symmetric, how can the first-order delta method imply an upward or downward bias in g(Y_n)?
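
One way to see how a symmetric distribution for Y_n can coexist with a biased g(Y_n), sketched for g(y) = y^2 (an illustration, not taken from the supplementary notes): the limiting normal distribution describes fluctuations of order 1/\sqrt{n}, while the bias is of order 1/n and only appears once the expectation of the nonlinear function is taken.

```latex
\mathbb{E}[Y_n^2] = \big(\mathbb{E}[Y_n]\big)^2 + \operatorname{Var}(Y_n)
\quad\Longrightarrow\quad
\mathbb{E}[Y_n^2] - \theta^2 \approx \frac{\sigma^2}{n} > 0
\quad \text{if } Y_n \approx \mathcal{N}\!\big(\theta,\ \sigma^2/n\big)
```

So even a Y_n that is perfectly symmetric about \theta has a square that overshoots \theta^2 on average, by roughly the variance of Y_n.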


Thanks!

Raymond Walters

Aug 25, 2017, 9:45:47 PM
to Degang WU, ldsc_users
You’re probably right on that, my mistake. Been a while since I’ve thought about the delta method version of this.

Cheers,
Raymond



T Wu

Sep 12, 2018, 10:21:28 AM
to ldsc_users
Hi Degang,

I also ran into problems when I tried to derive equation (1.7) in the 2015 LD Score regression paper.
May I ask:
1) To use the delta method, I first need a sequence of random variables that is asymptotically normal. But the sample correlation between variants j and k is a single number. Should I regard different combinations of j and k as Y_n?
2) If the answer to 1) is yes, how can I prove that the sequence is asymptotically normal?
3) If g(theta) in (1.7) is theta squared, then the second-order derivative will be 0, which violates the assumption that the second-order derivative exists and is nonzero.

Thank you in advance for your reply.
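
On point 3, writing out the derivatives may help. This is a sketch under the assumptions that g(\theta) = \theta^2, that Y_n is the sample correlation \tilde r_{jk} computed from N individuals, and that \sqrt{N}\,\tilde r_{jk} is asymptotically standard normal when the population correlation is 0:

```latex
g(\theta) = \theta^2, \qquad g'(\theta) = 2\theta, \qquad g''(\theta) = 2 \neq 0

% At \theta = r_{jk} = 0 the first derivative vanishes, g'(0) = 0,
% which is the case the second-order delta method is designed for:
\sqrt{N}\,\tilde r_{jk} \xrightarrow{d} \mathcal{N}(0, 1)
\;\Longrightarrow\;
N\,\tilde r_{jk}^2 \xrightarrow{d} \frac{g''(0)}{2}\,\chi^2_1 = \chi^2_1,
\qquad
\mathbb{E}\big[\tilde r_{jk}^2\big] \approx \frac{1}{N}
```

Under these assumptions it is the first derivative, not the second, that vanishes (and only at \theta = 0). The resulting 1/N matches r_{jk}^2 + (1 - r_{jk}^2)/N evaluated at r_{jk} = 0, and is consistent with chi2 having an expected value of about 1 under the null.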

Best regards,
T

Degang WU

Sep 12, 2018, 10:27:40 PM
to ldsc_users
Sorry, I cannot help with your questions. I have exactly the same questions, and so far I have not received any answers to them.

Degang WU

Aug 1, 2019, 12:26:35 PM
to ldsc_users
Hi, I hope you have not yet abandoned your quest for the truth. I have realized that the derivation of equation 1.7 might have nothing to do with the delta method. Equation 1.7 actually means that the squared sample correlation coefficient is a biased estimator.



T Wu

Aug 12, 2019, 11:16:45 PM
to ldsc_users
Hi, I think you are right about the meaning of 1.7. But I don't agree that the derivation of 1.7 has nothing to do with the delta method, since the paper claims that "one can obtain this approximation via e.g., the δ-method".

My question is: even if I use the delta method, specifically formula (2) in the document below,

[document not preserved]

what I get for E(\tilde{r}_{jk}^2) is just r_{jk}^2, without the (1 - r_{jk}^2)/N term, since E(\tilde{r}_{jk}) = r_{jk}.
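
A small simulation can show whether the extra term is really there. This is only a sketch: it assumes bivariate normal data (genotypes are not normal), and the values of rho, N, the replicate count, and the function names are arbitrary choices for illustration. It estimates both E(\tilde{r}) and E(\tilde{r}^2) so that they can be compared with r, r^2, and r^2 + (1 - r^2)/N.

```python
import math
import random

def sample_corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    return sxy / math.sqrt(sxx * syy)

def simulate_moments(rho=0.2, n=100, n_reps=5000, seed=7):
    """Monte Carlo estimates of (E[r_hat], E[r_hat^2]) for bivariate normal data."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    sum_r = 0.0
    sum_r2 = 0.0
    for _rep in range(n_reps):
        pairs = []
        for _i in range(n):
            x = rng.gauss(0.0, 1.0)
            y = rho * x + c * rng.gauss(0.0, 1.0)  # corr(x, y) = rho
            pairs.append((x, y))
        r = sample_corr(pairs)
        sum_r += r
        sum_r2 += r * r
    return sum_r / n_reps, sum_r2 / n_reps

rho, n = 0.2, 100
mean_r, mean_r2 = simulate_moments(rho, n)
print("E[r_hat]   (simulated):", mean_r)
print("E[r_hat^2] (simulated):", mean_r2)
print("r^2                   :", rho ** 2)
print("r^2 + (1 - r^2)/N     :", rho ** 2 + (1 - rho ** 2) / n)
```

The point of the comparison is that E(\tilde{r}^2) is not (E(\tilde{r}))^2: the variance of \tilde{r} is itself of order 1/N and adds to the second moment, even when \tilde{r} is nearly unbiased.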

What do you think?

Degang WU

Aug 18, 2019, 10:19:38 PM
to ldsc_users
I think I understand your problem. E(\tilde{r}_{jk}) = r_{jk} means that \tilde{r}_{jk} is an unbiased estimator of r_{jk}, which is not true (see https://stats.stackexchange.com/questions/220961/is-the-sample-correlation-coefficient-an-unbiased-estimator-of-the-population-co). Therefore, one of the assumptions of the delta method is not satisfied, and so the delta method should not be used.

samue...@gmail.com

Mar 30, 2021, 6:18:38 AM
to ldsc_users
A bit of progress here. According to Ghosh (1966), the second moment of the sample correlation coefficient can be expressed as
[image not preserved: Screenshot 2021-03-30 6.09.08 PM.png]
Here rho is the population correlation coefficient, n is the number of observations, and F is the hypergeometric function:
[image not preserved: Screenshot 2021-03-30 6.10.14 PM.png]
One can show (with Mathematica, for example) that F is very close to 1 when n is large (> 1000), so we can approximate it as 1 here. It then follows that
[image not preserved: Untitled.png]
which is quite similar to Eq. 1.7 in the supplementary materials.
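
The claim that F is close to 1 can also be checked without Mathematica. The sketch below evaluates the Gauss hypergeometric series directly. Since the screenshots are not available, the expression used for the second moment, E(r^2) = 1 - (n-2)(1-rho^2)/(n-1) * F(1, 1; (n+1)/2; rho^2), is an ASSUMPTION about the form of Ghosh's result and should be checked against the paper; one sanity check in its favor is that at rho = 0 it reduces to 1/(n-1). Function names are hypothetical.

```python
def hyp2f1_series(a, b, c, x, tol=1e-15, max_terms=10000):
    """Gauss hypergeometric function 2F1(a, b; c; x) via its power series.

    Only valid for |x| < 1; terms are accumulated until they become negligible.
    """
    term = 1.0
    total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * x
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def second_moment_assumed(rho, n):
    """E[r^2] under the ASSUMED form of Ghosh's (1966) expression."""
    f = hyp2f1_series(1.0, 1.0, (n + 1) / 2.0, rho * rho)
    return 1.0 - (n - 2) * (1.0 - rho * rho) / (n - 1) * f

def second_moment_eq17(rho, n):
    """The approximation r^2 + (1 - r^2)/N discussed in this thread."""
    return rho * rho + (1.0 - rho * rho) / n

for n_obs in (50, 1000, 100000):
    for rho_val in (0.0, 0.2):
        f_val = hyp2f1_series(1.0, 1.0, (n_obs + 1) / 2.0, rho_val ** 2)
        print(n_obs, rho_val, f_val,
              second_moment_assumed(rho_val, n_obs),
              second_moment_eq17(rho_val, n_obs))
```

Setting F to exactly 1 in the assumed expression gives rho^2 + (1 - rho^2)/(n - 1), which differs from r^2 + (1 - r^2)/N only in having n - 1 instead of N in the denominator.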

References:
Ghosh, B.K. (1966). Asymptotic Expansions for the Moments of the Distribution of Correlation Coefficient. Biometrika 53, 258.