Ratio of LD Score regression intercept and mean chi^2

VvD

Nov 29, 2016, 8:40:28 AM
to ldsc_users
Hi,

I'm trying to understand the ratio described in the Heritability-and-Genetic-Correlation section of the GitHub tutorial, which measures the proportion of the inflation in the mean chi^2 that the LD Score regression intercept ascribes to causes other than polygenic heritability. It is written as (intercept-1)/(mean(chi^2)-1). Other papers have used it: Okbay (2016), "Genome-wide association study identifies 74 loci associated with educational attainment", for example, reports that 8% of the inflation in chi^2 can be attributed to factors other than polygenic heritability. My question is how, if at all, this translates to the SNP association and heritability results. Is it correct to say that 8% of the heritability is spurious and that the SNP association for each SNP is also inflated by 8%?

Thank you in advance for your help with this issue. 

Raymond Walters

Nov 29, 2016, 11:26:36 AM
to VvD, ldsc_users
Hello,

The mean chi^2 (as well as the median chi^2) is a measure of the “inflation” in a GWAS, e.g. the amount of lift above the null expectation on a QQ plot. Under a null hypothesis of zero genetic effects genome-wide, the expected mean chi^2 is 1. Values > 1 indicate inflation, which can reflect true genetic effects, population stratification, or other biases. Thus the denominator mean(chi^2)-1 is an index of the amount of inflation in the GWAS.

One of the primary aims of LD score regression is to distinguish inflation due to true genetic effects from inflation due to population stratification/etc. Under the LD score model, the intercept term from the regression is 1 plus a population stratification term (regardless of true genetic effects), and thus equals 1 under the null hypothesis that there is no inflation from population stratification. The numerator intercept-1 thus indexes the amount of inflation from stratification and not from genetics.
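
For reference, the model being described can be written out as below. This is a sketch of the basic univariate LD score regression expectation as given in the first LD score paper; the symbols (N for sample size, M for the number of SNPs, h^2 for SNP-heritability, l_j for the LD score of SNP j, and a for the stratification/confounding contribution) are notation chosen here for illustration.

```latex
\mathbb{E}\left[\chi^2_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1,
\qquad
\text{intercept} = N a + 1
```

With no stratification, a = 0 and the intercept is 1 whatever the value of h^2, which is why intercept-1 isolates the non-genetic part of the inflation.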

Putting these pieces together, (intercept-1)/(mean(chi^2)-1) is the ratio of the amount of inflation from stratification only to the amount of inflation from stratification + true genetic effects. If the observed inflation in a GWAS just reflects population stratification it should be approx. 1, and if the inflation reflects just genetic effects it should be approx. 0 (it’s undefined if there’s no inflation in the GWAS from stratification or genetics). Values between 0 and 1 indicate the relative balance of these factors.
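
As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function names `mean_chi2` and `ldsc_ratio` are made up for this example and are not part of ldsc; raising an error when there is no inflation is just one way to handle the undefined case.

```python
def mean_chi2(z_scores):
    """Mean chi^2 of a GWAS, taking chi^2 = z^2 for each SNP (1 df)."""
    chi2 = [z * z for z in z_scores]
    return sum(chi2) / len(chi2)


def ldsc_ratio(intercept, mean_chi2_value):
    """(intercept - 1) / (mean(chi^2) - 1).

    Hypothetical helper, not an ldsc function. The ratio is undefined
    when mean chi^2 shows no inflation, so that case raises here.
    """
    inflation = mean_chi2_value - 1.0
    if inflation <= 0:
        raise ValueError("ratio is undefined when mean chi^2 <= 1")
    return (intercept - 1.0) / inflation


# Illustrative numbers only: an intercept of 1.04 with mean chi^2 of 1.5
# gives a ratio of about 0.08.
example_ratio = ldsc_ratio(1.04, 1.5)
print(round(example_ratio, 4))
```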

So a statement that 8% of the inflation in chi^2 can be attributed to factors other than genetic effects reflects a ratio=0.08. For association results, this suggests that the influence of population stratification/etc is relatively limited and most of the observed signal is polygenic. To address that remaining 8% inflation, it could be desirable to apply genomic control using the LD score intercept as the correction factor in place of lambda_GC. For heritability results, the ratio should have limited impact, since the intercept term exists in the LD score regression model to control for population stratification effects and so give an unbiased estimate of the SNP-heritability.
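
A minimal sketch of genomic control with the intercept as the correction factor, assuming it is applied the same way as the usual lambda_GC correction (divide each chi^2 by the factor, or equivalently multiply each standard error by its square root). The function names are hypothetical, and leaving the statistics unchanged when the intercept is at or below 1 is a choice made for this example.

```python
import math


def gc_correct_chi2(chi2_stats, intercept):
    """Divide each chi^2 statistic by the LD score intercept.

    Hypothetical helper. Statistics are returned unchanged when the
    intercept is <= 1, so the correction never inflates them.
    """
    if intercept <= 1.0:
        return list(chi2_stats)
    return [c / intercept for c in chi2_stats]


def gc_correct_se(standard_errors, intercept):
    """Equivalent correction on the SE scale: multiply by sqrt(intercept)."""
    if intercept <= 1.0:
        return list(standard_errors)
    factor = math.sqrt(intercept)
    return [se * factor for se in standard_errors]


# Illustrative values only.
corrected = gc_correct_chi2([2.2, 11.0], 1.1)
print(corrected)
```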

Further details and discussion on how all of this works can be found in the first LD score paper: https://www.ncbi.nlm.nih.gov/pubmed/25642630

Cheers,
Raymond



--
You received this message because you are subscribed to the Google Groups "ldsc_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ldsc_users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ldsc_users/9fca7fe5-f839-4aa3-876b-bfefb868c14f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

VvD

Nov 30, 2016, 6:45:16 AM
to ldsc_users, vvd...@gmail.com
Many thanks for this. In my own data I'm getting a ratio of around 20% and I'm thinking that this could be a cause for concern. Whilst this is still different from 1 (stratification alone), given your explanation it does seem to suggest that my data could benefit from additional genomic control. Is this a sensible interpretation, and what sort of cutoff for the ratio would you advise?

Cheers, V 

Raymond Walters

Nov 30, 2016, 7:36:55 PM
to VvD, ldsc_users
Hi V,
There’s not really a recommended cutoff for the ratio since it evaluates polygenicity vs. stratification rather than stratification alone (i.e. two studies with the same degree of uncorrected population stratification but different heritabilities will have different ratios).
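
To make that concrete, here is a small sketch with made-up numbers. It assumes the expected mean chi^2 under the basic LD score model is N*h2*mean(LD score)/M + stratification + 1, as in the first LD score paper; the sample size, SNP count, mean LD score, and heritabilities are purely illustrative.

```python
def expected_mean_chi2(n, h2, mean_ld_score, m, strat):
    """Expected mean chi^2 under the basic LD score model.

    strat is the stratification contribution (intercept - 1).
    """
    return n * h2 * mean_ld_score / m + strat + 1.0


def ratio(intercept, mean_chi2):
    return (intercept - 1.0) / (mean_chi2 - 1.0)


# Hypothetical study parameters, identical except for heritability.
N = 50_000
M = 1_000_000
MEAN_LD = 100.0
STRAT = 0.05  # same uncorrected stratification in both studies
intercept = 1.0 + STRAT

ratio_low_h2 = ratio(intercept, expected_mean_chi2(N, 0.1, MEAN_LD, M, STRAT))
ratio_high_h2 = ratio(intercept, expected_mean_chi2(N, 0.4, MEAN_LD, M, STRAT))

# Same intercept, but the less heritable trait shows the larger ratio.
print(round(ratio_low_h2, 3), round(ratio_high_h2, 3))
```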

Generally it’s more informative to consider the intercept directly. If the intercept is substantially greater than 1 then there's probably some lingering population stratification/etc that would be worth addressing, either with genomic control or by adjusting the GWAS (e.g. to include more PCs as covariates). The ldsc output includes the SE for the intercept term, so if you really want a cutoff you can define some p-value threshold and test whether it’s significantly above 1.

Cheers,
Raymond




Judit

Apr 6, 2018, 3:33:07 AM
to ldsc_users
Dear Raymond

Could you explain in more detail how to calculate the intercept p-value, to see whether the intercept is significantly greater than 1?

Thank you very much

Best,

Judit 

Raymond Walters

Apr 9, 2018, 2:07:14 PM
to Judit, ldsc_users
Hi Judit,

You can compute the z-statistic for the difference of the intercept from 1 (i.e. z = [intercept-1] / SE) with the SE that is reported in parentheses after the intercept estimate in the log from ldsc. The p-value of that z-statistic can then be computed as usual by comparison to the standard normal distribution. You'll want the one-sided test for the upper tail of the distribution to test whether the intercept is > 1.
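
A minimal sketch of that calculation in Python, using only the standard library. The function name `intercept_test` and the example intercept and SE values are made up; the intercept and SE would be read off the ldsc log by hand.

```python
import math


def intercept_test(intercept, se):
    """One-sided z-test of H0: intercept = 1 against H1: intercept > 1.

    Returns (z, p), where p is the upper-tail probability of the
    standard normal distribution, computed as 0.5 * erfc(z / sqrt(2)).
    """
    z = (intercept - 1.0) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p


# Hypothetical values standing in for "Intercept: 1.03 (0.01)".
z, p = intercept_test(1.03, 0.01)
print(f"z = {z:.2f}, one-sided p = {p:.4g}")
```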

Cheers,
Raymond


Judit Cabana Dominguez

Apr 11, 2018, 2:04:50 AM
to Raymond Walters, ldsc_users
Thank you very much for your help!!

Best,

Judit

--
Judit Cabana
Dp.Genètica, Fc.Biologia
Universitat de Barcelona
Av.Diagonal, 643. Barcelona (08028)
Tel: 93 403 72 18