Mixed model GWA and LDscore regression

Michel Nivard

Sep 28, 2016, 7:25:50 AM

to ldsc_users

Hi,

I am running into the following bias when applying LDSC to sumstats obtained from a commonly used regression technique in GWAS:

LDSC heritability estimates are zero or negative if the input file is the result of a mixed model GWAS in which the tested SNP is present both as a fixed effect and as a contributor to the GRM for which I estimate the random effect.

Now this is not entirely unexpected: Yang et al. (2014) derive that the mean lambda is 1, regardless of actual stratification, in a mixed model GWA if the fixed effect is also captured in the random effect (http://www.nature.com/ng/journal/v46/n2/full/ng.2876.html).

To confirm, I:

1. simulated a phenotype with h2 = .5 and 10% of SNPs with a true effect.

2. ran a linear association in Plink 1.9, munged the results and performed LDSC, retrieving h2 = .42

3. on the same simulated phenotype, ran a mixed model association in GCTA where all SNPs contribute to the random effect, and retrieved h2 = -0.0809 (0.0675)
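
For readers who want to reproduce step 1, here is a minimal sketch of one way to simulate such a phenotype. It is not the exact procedure used for the numbers above; the function name `simulate_phenotype`, its parameters and the toy genotype matrix are all assumptions made for illustration. The toy genotypes are independent (no LD), so they only demonstrate the phenotype construction; a real LDSC test needs genotypes with realistic LD.

```python
import numpy as np

def simulate_phenotype(genotypes, h2=0.5, prop_causal=0.1, seed=0):
    """Simulate a quantitative phenotype with heritability h2 (hypothetical helper).

    genotypes: (n_individuals, n_snps) array of allele counts (0/1/2),
               with monomorphic SNPs already removed.
    Returns (phenotype, genetic_value, causal_index).
    """
    rng = np.random.default_rng(seed)
    n, m = genotypes.shape

    # Standardize each SNP to mean 0 and variance 1.
    z = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)

    # Pick the causal SNPs at random and draw their effects.
    n_causal = int(round(prop_causal * m))
    causal = rng.choice(m, size=n_causal, replace=False)
    beta = rng.normal(size=n_causal)

    # Genetic value, rescaled so its sample variance is h2.
    g = z[:, causal] @ beta
    g = (g - g.mean()) / g.std() * np.sqrt(h2)

    # Environmental noise, rescaled so its sample variance is 1 - h2.
    e = rng.normal(size=n)
    e = (e - e.mean()) / e.std() * np.sqrt(1.0 - h2)

    return g + e, g, causal

# Toy genotypes: 500 individuals, 1000 independent SNPs (assumed values).
rng = np.random.default_rng(1)
maf = rng.uniform(0.05, 0.5, size=1000)
G = rng.binomial(2, maf, size=(500, 1000))
y, g, causal = simulate_phenotype(G, h2=0.5, prop_causal=0.1)
```

Resampling the causal SNPs between runs, as described below, corresponds to changing `seed`.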

I ran this a couple of times, each time resampling the specific SNPs with a true effect, and results were consistent across runs. My diagnosis would be that when running MLM with the SNP of interest in the random effect matrix (GRM), the standard error for high-LD SNPs will be pushed up disproportionately (after all, they have many LD buddies in the GRM, and therefore their effect estimate will correlate with the random effect).

If I find the time I'll run some more extensive simulations to confirm the diagnosis. Any suggestions from the LDSC group would be welcome; perhaps I have overlooked other possible causes of the results I obtain.

If the LDSC team agrees with my diagnosis, this might be something to note somewhere on the FAQ/readme page on GitHub, since in a GWAMA it needs to be considered at the cohort level, before meta-analysis, and cannot easily be fixed retrospectively.
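
To illustrate why this matters for LDSC, here is a rough sketch of the regression idea, under stated assumptions. It uses the basic LD score regression relation E[chi2_j] = intercept + (N * h2 / M) * l_j and fits it with plain unweighted least squares; the real ldsc software uses a weighted regression with a block jackknife, so this is not its implementation. The LD scores are hypothetical draws, the chi-square statistics are approximated as scaled central chi-squares, and `ldsc_h2_unweighted` is a made-up name.

```python
import numpy as np

def ldsc_h2_unweighted(chi2, ld_scores, n_samples):
    """Unweighted regression of chi-square on LD score (illustrative only).

    Uses E[chi2_j] = intercept + (N * h2 / M) * l_j, so h2 = slope * M / N.
    """
    m = len(chi2)
    slope, intercept = np.polyfit(ld_scores, chi2, deg=1)
    return slope * m / n_samples, intercept

rng = np.random.default_rng(0)
M, N, h2 = 50_000, 20_000, 0.5                  # assumed toy values
ld = rng.gamma(shape=2.0, scale=50.0, size=M)   # hypothetical LD scores

# Linear-regression-like GWAS: expected chi-square rises with LD score.
expected = 1.0 + N * h2 * ld / M
chi2_lr = expected * rng.chisquare(df=1, size=M)

# Test statistics with mean 1 regardless of LD score.
chi2_flat = rng.chisquare(df=1, size=M)

h2_lr, _ = ldsc_h2_unweighted(chi2_lr, ld, N)
h2_flat, _ = ldsc_h2_unweighted(chi2_flat, ld, N)
print(f"h2 from LD-dependent chi2: {h2_lr:.3f}")
print(f"h2 from flat chi2:         {h2_flat:.3f}")
```

When the chi-square statistics no longer increase with LD score, the fitted slope, and with it the h2 estimate, collapses towards zero. If high-LD SNPs are deflated more than low-LD SNPs, the slope can become negative.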

Obviously the resulting effects on LDSC estimates are dependent on what proportion of cohorts in a GWAMA run MLM with all SNPs in the background, but in some cases it could amount to appreciable bias.

Best.

Michel Nivard

Raymond Walters

Sep 28, 2016, 10:36:36 AM

to Michel Nivard, ldsc_users

Hi Michel,

I think there are two pieces to note here.

1) Mixed models where the tested SNP is also present in the GRM are generally bad, as discussed by Yang et al. (2014) in their comparison of MLMi and MLMe. I think your diagnosis of the impact on LDSC is correct: MLMi will tend to be underpowered and particularly deflates results for SNPs whose effects are better tagged by the GRM (aka SNPs with more LD friends), hence the downward bias in h2 estimates.

MLMe, which is probably the recommended approach anyway, should resolve this.

2) Mixed models, including MLMe, also change the expected behavior of the mean chi-square. Following Yang et al.'s notation, in MLMe there's an additional increase in mean chi-square involving a 1/(1 - r^2 h^2_g) term. The LDSC model is derived for linear regression, so it lacks this term and thus provides incorrect estimates if applied to GWAS results from a mixed model analysis. (There may be additional biases, but this is sufficient to establish that the current model is misspecified for MLM results.)
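
To give a sense of the size of that term alone, here is a small arithmetic sketch. It evaluates only the factor 1/(1 - r^2 h^2_g) quoted above for a few assumed input values; it does not model how the term enters the full mean chi-square expression, for which see Yang et al. (2014). The function name `mlme_extra_factor` and the chosen values of r^2 and h^2_g are hypothetical.

```python
def mlme_extra_factor(r2, h2g):
    """Return 1 / (1 - r^2 * h^2_g), the extra term mentioned above."""
    denom = 1.0 - r2 * h2g
    if denom <= 0:
        raise ValueError("r2 * h2g must be below 1")
    return 1.0 / denom

# Assumed example values: h^2_g = 0.5 and three values of r^2.
factors = {r2: mlme_extra_factor(r2, h2g=0.5) for r2 in (0.0, 0.1, 0.5)}
for r2, factor in factors.items():
    print(f"r2 = {r2:.1f}  ->  factor = {factor:.3f}")
```

With r^2 = 0 the factor is exactly 1, and it grows as r^2 h^2_g approaches 1.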

This is something we’re actively working to address, but correcting it is non-trivial especially for case/control data.

You’re probably correct that these deserve mention in the FAQ, especially the second point.

Cheers,

Raymond

Michel Nivard

Sep 28, 2016, 11:04:25 AM

to ldsc_users

Thanks for the quick reply,

I agree MLMi is generally a bad idea. Though I suspect at some point samples will be so large, and genome-wide significant signals so numerous, that the null hypothesis implied in MLMi (SNP effect drawn from the distribution with variance H2/M) will be a potential tool to separate trait-specific SNPs from the infinitesimal effects on a trait induced by other heritable traits influencing the trait of interest.

Beyond that academic point, I am running a large GWAMA in which many family and twin cohorts participate. Is there a case for running a linear regression, ignoring relatedness, and having the LD score intercept deal with the inflation that will arise due to relatedness? Perhaps run a linear regression in parallel to an analysis that corrects for relatedness, only for the benefit of post-hoc LD score analyses?

Added to all this: you're actively working on MLMe. In cohorts of closely related subjects the situation is again slightly different, as the random effect many correct for is based on a GRM with either expected values obtained from a pedigree, or observed relatedness values where values below 0.05 are truncated to zero. Is this something we can look forward to being dealt with, or should we invest some time to deal with this very specific mixed model ourselves?
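
For concreteness, the truncated GRM described above can be sketched as follows. This is only an illustration of the thresholding step, assuming a dense symmetric GRM held in a NumPy array; the helper name `truncate_grm` and the 3 x 3 example matrix are made up, and this is not how any particular software builds its GRM.

```python
import numpy as np

def truncate_grm(grm, threshold=0.05):
    """Set off-diagonal relatedness values below `threshold` to zero."""
    out = np.array(grm, dtype=float, copy=True)
    off_diag = ~np.eye(out.shape[0], dtype=bool)
    out[off_diag & (out < threshold)] = 0.0
    return out

# Hypothetical GRM: individuals 1 and 2 are first-degree relatives,
# individual 3 is unrelated to both.
grm = np.array([
    [1.00, 0.50, 0.02],
    [0.50, 1.01, -0.01],
    [0.02, -0.01, 0.98],
])
sparse_grm = truncate_grm(grm)
print(sparse_grm)
```

Only the close-relative entry survives off the diagonal, so the random effect captures family structure rather than genome-wide LD tagging.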

Best,

Michel

Hilary Martin

Jan 4, 2018, 1:49:46 PM

to ldsc_users

Hi Raymond,

I'm just resurrecting this topic to ask whether this problem (i.e. the LDSC model lacking that extra term present in sumstats from MLMs) has been resolved in the latest version of LDSC. You mentioned you were working on it back in September 2016.

We have been running LDSC to estimate heritability and calculate genetic correlations on sumstats from BOLT-LMM (case-control data, related individuals removed). If the problem remains, can you give us any idea of how much it is likely to affect things? I suppose this may be hard to know without knowing the true underlying architecture of our trait, but anyway, we're estimating a heritability of ~11% and suspect the trait is highly polygenic (it is highly correlated with educational attainment). If the problem is unresolved, I guess we should just run a standard old-fashioned linear model with PCs and compare results.