LDSC h2 estimation by hand


cdm...@bristol.ac.uk

Nov 10, 2015, 11:46:44 AM
to ldsc_users
Hi,

I am trying to estimate LDSC h2 by hand, similar to Figure 2 in "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies". I am working with a continuous trait with a GCTA h2 of ~0.20 (P < 0.0001).

So far I have just created LD score bins according to quantiles (e.g. 100), computed mean LD scores and mean chi-square statistics per bin, and plotted them. (I have not even started using gls.) However, my LDSC h2 estimates are not even near the h2 derived with the LDSC Python software.
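For concreteness, the binning step described above can be sketched roughly like this in pure Python (the arrays here are synthetic stand-ins for the real per-SNP LD scores and chi-square statistics, and the bin count is illustrative):

```python
# Sketch of quantile binning: sort SNPs by LD score, cut into
# equal-sized bins, and average L2 and chi^2 within each bin.
# l2 and chi2 below are made-up toy data, not real GWAS values.
import random
from statistics import mean

random.seed(1)
n_snps, n_bins = 10_000, 100
l2 = [random.uniform(1.0, 200.0) for _ in range(n_snps)]       # LD scores
chi2 = [1 + 0.001 * x + random.gauss(0, 0.5) for x in l2]      # toy chi^2

# Sort SNP indices by LD score, then split into equal-sized quantile bins.
order = sorted(range(n_snps), key=lambda i: l2[i])
size = n_snps // n_bins
bins = [order[k * size:(k + 1) * size] for k in range(n_bins)]

mean_l2 = [mean(l2[i] for i in b) for b in bins]
mean_chi2 = [mean(chi2[i] for i in b) for b in bins]
# (mean_l2[k], mean_chi2[k]) are the per-bin points one would plot
# and regress, as in chi2 ~ L2.
```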

LDSC estimates:
Intercept (SE): 0.992 (0.0067)
h2 (SE): 0.26 (0.09)

OLS R estimates:
Call:
lm(formula = chi2 ~ L2, data = L2.chi)

Coefficients:
             Estimate   Std. Error  t value   Pr(>|t|)
(Intercept)  0.9906202  0.0049205   201.326   < 2e-16 ***
L2           0.0013436  0.0001636     8.215   8.8e-13 ***


Any chance you could point me in the right direction?
Is it a question of the regression weights, which I have not included yet?


Many thanks,
Beate


Raymond Walters

Nov 10, 2015, 12:02:16 PM
to cdm...@bristol.ac.uk, ldsc_users
Hi Beate,
The regression weights may have some impact, but the primary difference here is that the regression coefficient for L2 is not the estimate of h2 by itself. Instead, the regression coefficient is N h2 / M, where N is sample size and M is the number of SNPs (see Equation 1 in the LD score paper). Plugging in values for N and M should get you to an h2 estimate roughly consistent with the python package.
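In other words, under Equation 1 of the paper, E[chi^2_j] = (N h2 / M) * l_j + N a + 1, so the fitted slope has to be rescaled by M / N. A toy illustration of the rescaling (the values of N and M below are placeholders, not the actual sample size or SNP count from Beate's data):

```python
# Rescale the OLS slope to h2: slope = N * h2 / M  =>  h2 = slope * M / N.
# N and M are hypothetical placeholder values for illustration only.
slope = 0.0013436        # L2 coefficient from the lm() fit above
N = 5_000                # hypothetical GWAS sample size
M = 1_000_000            # hypothetical number of regression SNPs
h2 = slope * M / N
print(round(h2, 3))      # → 0.269
```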
Cheers,
Raymond



cdm...@bristol.ac.uk

Nov 10, 2015, 5:05:20 PM
to ldsc_users, cdm...@bristol.ac.uk
Hi Raymond,

Thanks a lot for this. Yes, indeed there was a gap in my thought process. I had tried to multiply beta by N/M and failed, but of course the opposite is required: beta is already the product of all these factors, so I need to multiply by M/N.

So, yes, I now get approximately the expected estimates.
I am now trying to refine the estimation.

1) Do you know whether the accuracy of the estimate is sufficient when using LD score bins (e.g. 10,000 quantile bins)?
2) Is the gls function in R (nlme package) a good starting point for implementing the correlation structure and the weights?
It seems quite restrictive in the specification of the correlation (it requires one of the predefined corClasses)...

Thanks again,

Beate

Raymond Walters

Nov 10, 2015, 5:51:12 PM
to cdm...@bristol.ac.uk, ldsc_users
Hi Beate,
1) Hard to say, but with that approach I’d be less concerned about the accuracy of the point estimate than figuring out how to get valid standard errors.
2) I don’t have enough experience with R gls to make a solid recommendation, but I agree it does look fairly restrictive on the structure of the correlation and the weights. 
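For what it's worth, a bare-bones weighted fit of chi2 ~ L2 does not need gls at all; the closed-form solution for weighted simple linear regression is only a few lines. The 1/L2 weights below are a crude heteroskedasticity stand-in, not the actual LDSC weighting scheme (which also involves N, M, and h2 itself, and addresses correlated chi-square statistics):

```python
# Closed-form weighted least squares for a simple regression y ~ x.
def wls_slope_intercept(x, y, w):
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    num = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = num / den
    return slope, ybar - slope * xbar

# Toy data lying exactly on chi2 = 1 + 0.002 * L2, so the fit
# should recover intercept 1 and slope 0.002 regardless of weights.
l2 = [float(v) for v in range(1, 101)]
chi2 = [1 + 0.002 * x for x in l2]
w = [1.0 / x for x in l2]          # crude stand-in weights, NOT LDSC's
slope, intercept = wls_slope_intercept(l2, chi2, w)
```

Getting valid standard errors out of such a fit is the harder part, as noted above; the point estimate itself is straightforward.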

I’m curious, what’s the motivation to reimplement this in R rather than using the python version? Interested to know if there’s a use case the current software doesn’t cover well.

Cheers,
Raymond



Beate StPourcain

Nov 11, 2015, 4:18:48 AM
to Raymond Walters, ldsc_users
Hi Raymond,

Thanks for your reply. Yes, there is actually a reason why I would like to adapt the current model, if possible.
I am facing an increasing multiple-testing burden and I am dealing with longitudinal phenotypes. I am therefore currently looking into model structures that could fit, in addition to weights and correlations, random effects to model temporal structure (say, as crossed random effects). That is my intention behind it. In addition, I have found it very useful to play with the current model to understand how it actually works, in principle.

I am not sure whether it is indeed possible to adapt the current model, but would really value your opinion.

Best wishes,
Beate



======================================
Beate St Pourcain
Lecturer in Genetic Epidemiology (MRC IEU)
Senior Investigator (MPI for Psycholinguistics)

MRC Integrative Epidemiology Unit
University of Bristol
Oakfield House
15-23 Oakfield Grove
Bristol
BS8 2BN
UK

Max Planck Institute for Psycholinguistics
Wundtlaan 1
6525 XD Nijmegen
The Netherlands

Phone: +31 24 3521964
Fax: +31 24 3521213

======================================
    

Raymond Walters

Nov 12, 2015, 11:14:00 AM
to Beate.St...@bristol.ac.uk, ldsc_users
Hi Beate,
That does sound like an interesting challenge! I can see how the genetic correlation framework might be useful for establishing variances for that longitudinal model. I’m not sure though whether that would be preferable to some form of mixed model that would let you use your full set of genotypes/phenotypes rather than just summary statistics? Happy to answer any other questions as you play with the ldsc model either way.
Cheers,
Raymond