Low heritability predictions and strange genetic correlations for heritable trait

Iris

Jan 25, 2018, 10:06:37 AM
to ldsc_users
Hi!

To my understanding, ldsc removes SNPs with extreme p-values before estimating heritability or genetic correlations. I have a couple of questions about this:
1. What is the reasoning behind the exclusion of these SNPs? For instance, APOE would be removed for AD, while this locus might actually explain a substantial part of the heritability, and therefore the heritability might be underestimated?
2. Which step in ldsc removes these SNPs? (Maybe the out-of-bounds p-values reported in the munge log?)
3. How is it possible that for a trait such as AD, which has several strong loci, heritability is so low (~6.5%)? I know that SNP heritability will be much lower than twin heritability, but can the difference be so large? (Twin heritability is 60-80%.) Or is the SNP heritability maybe correlated with the number of significant loci, rather than the strength of the loci?

Thank you,
Iris

Raymond Walters

Jan 25, 2018, 2:13:04 PM
to Iris, ldsc_users
Hi Iris,

You’re correct that ldsc does treat extreme p-values differently. 

The method used by default depends on which type of ldsc analysis is being performed (e.g. basic h2 vs partitioned h2 vs rg). For rg and partitioned h2, SNPs with extreme p-values (defined by default as chi2 > 80 or chi2 > N/1000, whichever threshold is larger) are excluded. For univariate h2 (as long as the intercept isn’t constrained), a two-step procedure is used: SNPs with chi2 < 30 (by default) are used to estimate the intercept, and then all SNPs are used to estimate h2, treating that intercept as fixed.
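
As a rough illustration of the default filters described above, here is a sketch in plain Python. It is not ldsc's actual code, and the function names are made up for this example; only the thresholds (80, N/1000, 30) come from the description above.

```python
def default_chisq_max(n_max, floor=80.0):
    """Default cutoff as described above: the larger of 80 and N/1000."""
    return max(floor, n_max / 1000.0)


def keep_for_rg_or_partitioned(chisq, n_max):
    """Flag SNPs that survive the max-chi2 filter (chi2 > cutoff is dropped)."""
    cutoff = default_chisq_max(n_max)
    return [x <= cutoff for x in chisq]


def use_for_intercept(chisq, two_step_cutoff=30.0):
    """Flag SNPs used in step 1 of the two-step estimator (chi2 < cutoff).

    Step 2 would then use all SNPs with the step-1 intercept held fixed.
    """
    return [x < two_step_cutoff for x in chisq]
```

So with a maximum sample size below 80,000 the cutoff stays at 80, and for larger studies it scales with N.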

This behavior can be changed if desired. The threshold for maximum chi2 in the first case (rg and partitioned) can be changed with --chisq-max, including setting it sufficiently large to avoid any exclusions in your data if desired. For univariate h2, you can set a different chi2 threshold for the first stage using --two-step or you can disable the two-step procedure by specifying a --chisq-max instead (again, with the option of setting it arbitrarily large).
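
For concreteness, here is a small Python sketch that assembles the corresponding univariate h2 command lines. The file names and LD score prefix are hypothetical placeholders, and the helper itself is made up for illustration; only --chisq-max and --two-step are the options discussed above, alongside the usual --h2, --ref-ld-chr, --w-ld-chr and --out arguments. Refusing both options at once is a choice made in this sketch to keep the two cases separate, not a statement about how ldsc handles that combination.

```python
def ldsc_h2_command(sumstats, ld_prefix, out, chisq_max=None, two_step=None):
    """Build an argument list for a univariate h2 run (illustrative only)."""
    if chisq_max is not None and two_step is not None:
        # Sketch-level restriction: pick one of the two cases described above.
        raise ValueError("choose either chisq_max or two_step in this sketch")
    cmd = [
        "ldsc.py",
        "--h2", sumstats,
        "--ref-ld-chr", ld_prefix,
        "--w-ld-chr", ld_prefix,
        "--out", out,
    ]
    if chisq_max is not None:
        # Replaces the two-step procedure with a plain max-chi2 filter;
        # a very large value effectively keeps every SNP.
        cmd += ["--chisq-max", str(chisq_max)]
    if two_step is not None:
        # Changes the chi2 cutoff used for the step-1 intercept estimate.
        cmd += ["--two-step", str(two_step)]
    return cmd


# Hypothetical paths; prints the command instead of running it.
print(" ".join(ldsc_h2_command("trait.sumstats.gz", "eur_w_ld_chr/",
                               "trait_h2_nofilter", chisq_max=1000000)))
```

The same --chisq-max option would be added to an rg or partitioned h2 command to relax the filter there.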

Since this filtering is specific to the analysis, it does not appear in the munge log (the out-of-bounds p-values message is just a check for values outside of 0-1). The log for the analysis should report the filtering though, either as “Removed X SNPs with chi^2 > X” or “Using two-step estimator with cutoff at X”.

As for why this filter exists, it’s primarily to improve the stability of ldsc results for traits with strong single-locus results. Empirically, keeping these outlier loci in the analysis can dramatically increase the SEs of ldsc’s estimates. You’re correct, however, that the increased precision from this filter does come at the cost of potential bias (both in the intercept and the h2 estimate). At the time this procedure was implemented, the biases appeared small enough relative to the benefits for the SE that the filter was deemed worthwhile. We are, however, keeping an eye on its impact, continuing to think about refining the default behavior, and considering more elegant solutions to the single-locus model problem. (See e.g. the section on partitioned ldsc in this blog post that discussed our decision to drop the chi2 filter in initial analyses for UK Biobank.)

(It’s worth noting that while the max chi2 filters are mostly a heuristic, they do have some theoretical justification. In particular, ldsc is derived under a model where SNP effects are random and iid with a consistent variance. Single loci of large effect are arguably a violation of this model; they don’t violate the primary moment conditions, since there still exists some marginal average h2 per SNP that can be estimated, but they do affect the regression weights, making the regression estimates inefficient.)

You correctly infer that this kind of locus exclusion changes the meaning of the ldsc h2 estimate. Instead of being the variance explained by all common SNPs, it is that variance minus the extra variance explained by the dropped loci (i.e. it is estimated under a model assuming that the SNPs dropped from the LD score regression have the same average per-SNP effect sizes as the rest of the genome). To get the total variance explained by SNPs, you’d then want to re-integrate some estimate of the additional variance explained by the dropped loci as a fixed effect. Alternatively, you can run ldsc without the chi2 filters as described above and accept the less efficient (higher SE) estimates.
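
One very simplified way to picture the re-integration step is sketched below in Python. It is not part of ldsc, and it rests on assumptions that would need checking for real data: each dropped locus is summarised by a single lead SNP, the variance explained by each lead SNP has been estimated separately on the same scale as the ldsc h2, the dropped loci are independent of each other, and the average per-SNP contribution is taken as h2 divided by the number of SNPs M. The function and argument names are made up for this example.

```python
def total_snp_h2(h2_ldsc, locus_r2, m_snps):
    """Add back the *extra* variance explained by dropped loci (rough sketch).

    h2_ldsc  : ldsc h2 estimate obtained with large-effect loci filtered
    locus_r2 : variance explained by each dropped locus (one lead SNP each)
    m_snps   : number of SNPs over which h2_ldsc is averaged
    """
    # ldsc implicitly credits each dropped SNP with the genome-wide average.
    average_per_snp = h2_ldsc / m_snps
    # Only the part above that average is missing from the ldsc estimate.
    extra = sum(r2 - average_per_snp for r2 in locus_r2)
    return h2_ldsc + extra
```

Because M is typically very large, the per-SNP average is tiny and the result is close to simply adding the locus-level variance explained to the ldsc estimate.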

Cheers,
Raymond

