Interpretation of Partitioned Heritability

chinyang...@mpi.nl

Nov 21, 2016, 9:24:54 AM
to ldsc_users
Dear Sir/Madam,
       I have a couple of questions about interpreting the results I obtained from the cell-type group analysis in partitioned heritability:
  1. The coefficients from --print-coefficients are actually the coefficient r_c, the per-SNP heritability in category C from Finucane, Bulik-Sullivan et al., bioRxiv. Is this correct?
  2. I ran the 53 categories of the full baseline model together with my own annotation; however, the results gave me a negative but large and significant enrichment. How would you interpret this? The FAQ on your website mentions that negative h2 could indicate model misspecification; however, I have used all 53 categories as the baseline for my own cell-type group. Any ideas?


Thanks in advance.

Kind regards,

Chin Yang

Raymond Walters

Nov 22, 2016, 6:14:57 PM
to chinyang...@mpi.nl, ldsc_users
Hi Chin Yang,

1) The printed regression coefficients are the tau_c described in the paper. The published manuscript has an improved description of the interpretation of this parameter:

tau_C represents the per-SNP contribution to heritability of category C. In particular, if the categories are disjoint, tau_C is the per-SNP heritability in category C and, if the categories overlap, the per-SNP heritability of SNP j is the sum of tau_C over all categories C that contain SNP j, i.e. \sum_{C : j \in C} \tau_C.

The supplementary material has more information on interpreting the parameter, especially as it relates to enrichment and its dependence on the other annotations in the model.
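As a rough illustration of the overlapping-category case described above, the sketch below adds up tau_C over every category a SNP belongs to. The category names, coefficient values, and SNP identifiers are all made up for the example and are not taken from any ldsc output.

```python
# Hypothetical per-category coefficients (tau_C); values are invented for illustration.
tau = {"base": 1.0e-8, "coding": 5.0e-8, "repressed": -0.5e-8}

# Hypothetical SNP-to-category membership; categories are allowed to overlap.
snp_categories = {
    "rs_a": ["base"],
    "rs_b": ["base", "coding"],
    "rs_c": ["base", "repressed"],
}


def per_snp_h2(categories, tau):
    """Sum tau_C over every category C that contains the SNP."""
    return sum(tau[c] for c in categories)


for snp, cats in snp_categories.items():
    print(snp, per_snp_h2(cats, tau))
```

Note that in this toy example the SNP in the "repressed" category has a negative coefficient for that category but still a positive per-SNP heritability, because the coefficient is a contribution conditional on the other categories in the model.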


2) Do you mean that the coefficient is negative, or that the enrichment (and presumably the proportion of heritability) is negative? Negative coefficients are plausible and interpretable, but a substantially negative enrichment would be unexpected (and as you say could mean misspecification or some other source of model instability).

Cheers,
Raymond



-- 
You received this message because you are subscribed to the Google Groups "ldsc_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ldsc_users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ldsc_users/1f0da0f7-a15a-46b5-85aa-111a7643218b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

chinyang...@mpi.nl

Nov 23, 2016, 3:57:21 AM
to ldsc_users, chinyang...@mpi.nl
Hi Raymond,
Thank you for the quick response.
  1. This answer is now clear to me.
  2. Yes, you are correct: both the heritability and the enrichment are negative, and the enrichment is substantially negative (with a large enrichment S.E.) with a very significant p-value (<10^-10). When you say model misspecification, does this mean that there could be something wrong with my own annotation (as I have used 53 categories for the baseline in my analysis), or that the sample sizes for my phenotypes are too small (unlikely, as I have a significant p-value)? And just for clarification, you replied that "Negative coefficients are plausible and interpretable"; could you explain how they are interpretable?
Kind regards,
Chin Yang 

Sandra Sánchez

Nov 25, 2016, 11:39:13 AM
to ldsc_users

Regarding #2, I also found significant negative enrichment in the baseline model (5' UTR, -43.6, s.e. 30.1), which looks bizarre. I also found negative enrichment in several other categories (CTCF+500bp, DHS peak, DHS, FANTOM5 Enhancer+500bp, Fetal DHS, 3' UTR+500bp). Any guidance as to how I should interpret these results, and what this negative enrichment could be due to? I also found very large standard errors for many categories. The SNP-heritability of the trait using LDSC is 7.2 (N = 21k).

Thanks in advance!

Sandra

Raymond Walters

Nov 28, 2016, 8:36:44 PM
to Sandra Sánchez, ldsc_users
Hi Sandra,
Negative coefficients would be interpretable as indicating regions with below-average heritability (where “average” is the expectation conditional on other annotations in the model). For example, this could be reasonable for something like the “repressed” annotation, or for genes only expressed in tissues unrelated to the phenotype (which could thus be below average for coding regions).

Negative heritability is a different issue though. For 5’ UTR, that doesn’t appear to be significant? (-43.6, with se=30.1 is z=-1.45, two-sided p > .14 unadjusted for multiple testing) The coefficient can be significant without the enrichment being significant. Modestly negative heritability estimates can be expected by sampling variation if the true heritability is near zero,  but if you have others that are significantly negative then that’s a red flag. 
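The z-score arithmetic in the paragraph above can be reproduced with a few lines of Python. This is only a sketch of a normal-approximation test on an estimate and its standard error; the function name is made up here, and it is not necessarily how ldsc computes the p-values it reports.

```python
import math


def z_and_two_sided_p(estimate, se):
    """Return the z-score and two-sided normal p-value for estimate / se."""
    z = estimate / se
    # For a standard normal, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# 5' UTR enrichment and standard error quoted earlier in the thread.
z, p = z_and_two_sided_p(-43.6, 30.1)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The same helper can be applied to a printed coefficient and its standard error when no p-value is reported.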

Wide standard errors are not unusual, especially for the enrichment in smaller annotations where it’s a ratio with a small denominator. Smaller annotations are probably also more likely to have model misspecification issues, since they have less room for ldsc’s simplified model of average effect sizes to average out over the annotated region. For those reasons we tend to caution against annotations covering <1% (roughly) of the genome.

I’m guessing you mean snp-heritability of .072? If you have heritability substantially greater than 1 that would definitely indicate some sort of model misspecification.

Cheers,
Raymond



Raymond Walters

Nov 28, 2016, 8:46:16 PM
to chinyang...@mpi.nl, ldsc_users
Hi Chin Yang,
The reply I just addressed to Sandra in this thread covers much of this, but for your highly significant negative enrichment, what proportion of the genome does this annotation cover? Also, what is your sample size? 

By model misspecification, I’m referring to assumptions for the LD score regression model that might not be valid for your data. Could be something unexpected about your annotation, your phenotype, or your study design, among other things.

Cheers,
Raymond


chinyang...@mpi.nl

Nov 29, 2016, 6:13:35 AM
to ldsc_users, chinyang...@mpi.nl
Hi Raymond,

My annotation covers 1,217,311 SNPs and the phenotypes have a sample size of 5268.

Kind regards,
Chin Yang

Sandra Sánchez

Nov 29, 2016, 3:50:43 PM
to ldsc_users, sandra.s...@gmail.com

Thank you for your detailed explanation!

Sandra

chinyang...@mpi.nl

Nov 30, 2016, 8:26:58 AM
to ldsc_users, chinyang...@mpi.nl
Hi Raymond,
      Perhaps some examples of my results will help explain my situation better:

Trait    Category      Baseline  Prop._SNPs  Prop._h2  Prop._h2_std_error  Enrichment    Enrichment_std_error  Enrichment_p
Trait_1  MyAnnotation  L2_0      0.156327    1.538251  0.440422893         9.839948415   2.817315967           0.000192523
Trait_2  MyAnnotation  L2_0      0.156327    10.03792  5.394666379         64.21096812   34.50883225           5.19E-22
Trait_3  MyAnnotation  L2_0      0.156327    -8.09453  8.779557794         -51.77944824  56.16145019           1.85E-10

As you can see, for Trait_1 I have good and significant enrichment. For Trait_2, however, I have significant enrichment but a proportion of heritability greater than 1 (why is this?). Trait_3 has significant and negative enrichment.

Kind regards,
Chin Yang

Raymond Walters

Nov 30, 2016, 8:51:03 PM
to chinyang...@mpi.nl, ldsc_users
Hi Chin Yang,
Thanks for the additional info. That definitely rules out the issue being annotation size, and while the sample size isn’t huge it’s definitely large enough I wouldn’t anticipate results that look quite this strange.

From your output file, you’re getting estimates that your annotation contains 1.5x the total genome-wide heritability for trait 1, 10x the total heritability for trait 2, and -8x the total for trait 3. Given values outside 0-1 are impossible, something strange is definitely happening.
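The reading of the output above can be checked numerically. The sketch below takes Prop._SNPs and Prop._h2 from the table earlier in this thread, recomputes enrichment as their ratio, and flags proportions of heritability outside the 0-1 range. The function names are invented for this example.

```python
# (trait, Prop._SNPs, Prop._h2) copied from the results table earlier in this thread.
rows = [
    ("Trait_1", 0.156327, 1.538251),
    ("Trait_2", 0.156327, 10.03792),
    ("Trait_3", 0.156327, -8.09453),
]


def enrichment(prop_h2, prop_snps):
    """Enrichment is the proportion of h2 divided by the proportion of SNPs."""
    return prop_h2 / prop_snps


def in_unit_interval(prop_h2):
    """A true proportion of heritability must lie between 0 and 1."""
    return 0.0 <= prop_h2 <= 1.0


for trait, prop_snps, prop_h2 in rows:
    print(trait, round(enrichment(prop_h2, prop_snps), 2), in_unit_interval(prop_h2))
```

Because the denominator (Prop._SNPs) is fixed, any implausible proportion of heritability carries straight through to an implausible enrichment.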

The next few things for follow up:
1) Can you verify whether you’re using the --overlap-annot flag?
2) What is your univariate h2 estimate and SE for these 3 traits, before partitioning? If your univariate h2 is very small, that could contribute to the very unstable results here.
3) Is your annotation defined, directly or indirectly, based on the GWAS results for these traits?
4) Did you process your input data with munge_sumstats.py, and/or do any other filtering?

Cheers,
Raymond

chinyang...@mpi.nl

Dec 1, 2016, 7:40:10 AM
to ldsc_users, chinyang...@mpi.nl
Hi Raymond,
Thanks for your quick response,

1.       Yes I have used --overlap-annot flag in my script.

2.       h2 (SE): Trait_1, 0.3992 (0.0949); Trait_2, 0.2593 (0.088); Trait_3, 0.3996 (0.0961).

3.       Sorry, I’m afraid I’m unsure what you meant by direct or indirect, but my annotation was based on GWAS summary statistics.

4.       I only used munge_sumstats.py to process my input data, I also checked the log files for this and there were no errors in reading my input data.

Kind regards

Chin Yang


Raymond Walters

Dec 12, 2016, 4:17:25 PM
to chinyang...@mpi.nl, ldsc_users
Hi Chin Yang,
Point 3 here is almost certainly the issue. 

The LDSC partitioned heritability model depends on the variance of the effect sizes within each annotation. This works well when you can expect roughly normally distributed effect sizes within annotations from some external source, but if you define an annotation directly from GWAS results you’re likely to get odd, truncated distributions whose variance may have unexpected properties. 

In this case it would appear (qualitatively) that your annotation includes SNPs that have almost entirely strong effects (high chi-square values) for trait 2, and almost entirely null effects (possibly closer to zero than chance?) for trait 3. The uninterpretable enrichment/heritability numbers can be blamed on model misspecification.

If your goal is to quantify the impact of variants with known/suspected effects from a GWAS, an analysis using polygenic risk scores (or equivalent) is probably better suited for your question than ldsc.

Cheers,
Raymond




chinyang...@mpi.nl

Dec 13, 2016, 11:59:05 AM
to ldsc_users, chinyang...@mpi.nl
Hi Raymond,
Thank you for the explanation and advice.
Just one last question: in the output file, the proportion of SNPs was 0.156327. I am unsure where ldsc got this number, as I had 1217311 SNPs, and dividing by the number of 1000 Genomes SNPs, which is 9254557, should give me 0.131536.
Could you explain this difference to me?

Kind regards,
Chin Yang

Raymond Walters

Dec 13, 2016, 2:29:41 PM
to chinyang...@mpi.nl, ldsc_users
Hi Chin Yang,

That proportion is based on the number of SNPs reported by the annotation files. The number of SNPs for your annotation will be the sum of the .M_5_50 files from your generated LD scores, and the “total” number of SNPs will be the sum of the .M_5_50 files for the LD scores you are supplying with --ref-ld-chr. (One exception is if you’ve used the --not-M-5-50 flag, in which case .M files are used in place of .M_5_50.)
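To check the proportion by hand along the lines described above, one could sum the counts in the relevant files directly. The sketch below assumes each .M_5_50 (or .M) file holds a single line of whitespace-separated SNP counts, one per annotation, and that the "total" is taken from one chosen column of the reference LD score files (for example an all-SNPs base category). Both points are assumptions to verify against your own files, and the helper names are made up for this example.

```python
from pathlib import Path


def read_counts(path):
    """Read one .M or .M_5_50 file (assumed: whitespace-separated counts, one per annotation)."""
    return [float(x) for x in Path(path).read_text().split()]


def sum_counts(paths):
    """Add up per-annotation counts across files, e.g. one file per chromosome."""
    totals = None
    for path in paths:
        counts = read_counts(path)
        if totals is None:
            totals = counts
        elif len(counts) != len(totals):
            raise ValueError(f"{path} has a different number of annotations")
        else:
            totals = [a + b for a, b in zip(totals, counts)]
    return totals or []


def snp_proportion(annot_files, ref_files, annot_col=0, total_col=0):
    """Annotation SNP count divided by the reference 'total' SNP count.

    Which column represents the total is an assumption the caller must confirm.
    """
    return sum_counts(annot_files)[annot_col] / sum_counts(ref_files)[total_col]
```

Calling snp_proportion with the per-chromosome files for your annotation and for the reference LD scores should then show which SNP counts enter the reported proportion, and why it can differ from a count based on all 1000 Genomes SNPs.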

Cheers,
Raymond



chinyang...@mpi.nl

Dec 14, 2016, 5:33:58 AM
to ldsc_users, chinyang...@mpi.nl
Hi Raymond,
     Thank you very much for your speedy response and answers!
Kind regards,
Chin Yang

Yiyi Ma

Aug 25, 2017, 4:06:14 PM
to ldsc_users, chinyang...@mpi.nl
Hi Raymond,
I also obtained a negative but large and significant enrichment and partitioned heritability for one type of SNPs. I separated all the SNPs into two non-overlapping types. The annotations were not based on the GWAS results. I downloaded publicly available GWAS summary statistics for the partitioned heritability analysis to check the relative contribution of these two types of SNPs. However, I always get an estimated proportion of h2 greater than 1 for one type of SNPs and a negative value for the other, and the two sum to 1. The enrichment value of the type with negative heritability is also negative. Can you tell me what is going on? And what should we do if we encounter negative enrichment?

The other, minor issue is that no file with the extension "*.results" is created if I do not use "--overlap-annot". My annotations do not overlap, so I should not use this flag, but then I do not have the results file that provides the enrichment p-values for each type of SNPs.

Thank you very much!
Best,
Yiyi

Raymond Walters

Aug 25, 2017, 8:42:36 PM
to Yiyi Ma, ldsc_users, chinyang...@mpi.nl
Hi Yiyi,
Those observations all follow from having the negative estimated heritability in a non-overlapping two-way partition of all SNPs (proportions of h2 must sum to 1 by definition, enrichment is negative by definition if proportion h2 is negative, etc), so the negative h2 estimate in the partition is the key question. It’s hard to diagnose what specifically is happening without knowing more about how you’re partitioning, but the literal result is essentially that having more LD to SNPs in that annotation predicts weaker/more null GWAS results.

You’re correct that the *.results file isn’t created unless you’re using --overlap-annot, but in that case the enrichment, etc. should be output in the main ldsc log file. Are they missing from the log?

Cheers,
Raymond


Yiyi Ma

Aug 27, 2017, 2:29:17 PM
to Raymond Walters, ldsc_users, chinyang...@mpi.nl
Hi Raymond,
Do you mean that negative enrichment means that the group of SNPs predicts weaker or even null GWAS results? In this case, can I interpret my results as showing that the group of SNPs with positive enrichment contributes more to the GWAS results than the group of SNPs with negative enrichment?

If I do not use the "--overlap-annot", the log file does not provide the p values of enrichment for each group of SNPs.

Best,
Yiyi


Raymond Walters

Aug 28, 2017, 11:27:27 AM
to Yiyi Ma, ldsc_users, chinyang...@mpi.nl
Hi Yiyi,
P-values aren’t output, but the coefficient and coefficient SE are (or should be), which can be used to compute the z-score and corresponding p-value.

Negative enrichment means that higher LD to the annotation yields lower chi-square statistics. This is highly unusual: even if there are no genetic effects in that annotation (i.e. it’s completely null), that should lead to no relationship with LD (i.e. tagging a little vs. a lot of a null annotation is still null) rather than a negative one. It almost certainly means some form of model misspecification.

We don’t have a formal test for whether the enrichment in two non-overlapping annotations is different. You might be able to approximate something based on the jackknife delete values and assumptions about the distribution of the difference of the fitted coefficients, but it’s definitely not something that’s been formally derived or tested.

Cheers,
Raymond

yiyima...@gmail.com

Aug 28, 2017, 11:54:45 AM
to Raymond Walters, ldsc_users, chinyang...@mpi.nl
What type of model misspecification could this be? How should I correct it? Thanks!

Best,
Yiyi

R Reynolds

Feb 16, 2018, 11:45:51 AM
to ldsc_users
Hi Raymond,

I was just wondering when you say roughly 1% of the genome how do you define the genome? Are you referring to 1% of the genome as defined in the baseline model with 53 annotations (i.e. approx. 90,000 SNPs)? Or are you referring to something else?

Thanks,
Regina

Raymond Walters

Feb 16, 2018, 2:01:56 PM
to R Reynolds, ldsc_users
Hi Regina,
Yes, 90,000 SNPs is probably a good approximation (assuming ~9 million 1000 Genomes variants being annotated). 

The “% of the genome” framing reflects that this scales somewhat with the number of variants in the reference panel, and for region-based annotations there’s often better intuition for the proportion of the genome those regions cover (e.g. the exome is 1-2% of the genome) rather than the number/proportion of SNPs falling in those regions (though the values should be similar).

Cheers,
Raymond


Nancy - Sarah Yacovzada

Oct 11, 2018, 4:33:49 AM
to ldsc_users
Hi Raymond,

Regarding your earlier answer and as described in the paper:
"tau_C represents the per-SNP contribution to heritability of category C. In particular, if the categories are disjoint, tau_C is the per-SNP heritability in category C and, if the categories overlap, the per-SNP heritability of SNP j is the sum of tau_C over all categories C that contain SNP j, i.e. \sum_{C : j \in C} \tau_C."

Therefore, and according to the equations in the paper's online methods, the "total heritability" can be theoretically estimated by sum of tau_c as well. So what I'm wondering is:

1. In partitioned heritability analysis, there seems to be no way to print the "total heritability" of the trait, only the regression coefficients. Why is that? I guess there is a meaningful reason for it, since this is basically a simple sum of values we already obtain during the analysis.

2. Let's imagine a case where I ran a regular LDSC analysis and got, let's say, an h^2 of 10% for that trait. Then I run the partitioned heritability analysis with the same SNP data and the given baseline-LD annotations. Can I assume that, given the same SNPs and LD scores for both methods, the total heritability is constant and therefore should be the same in both analyses? That is, if I sum the tau_C values, should I get 10% as well?

Thanks,
Nancy

Raymond Walters

Oct 18, 2018, 3:32:15 PM
to Nancy - Sarah Yacovzada, ldsc_users
Hi Nancy,

That's correct, the total h2 estimate can be inferred from the tau_c estimates, with the key consideration of accounting for category overlap.

1. The total heritability estimate should be printed by default in the main log of the partitioned heritability analysis, right before the listing of categories. See example log attached showing estimated observed-scale h2 of 0.131.

2. We expect the results to be similar, but not necessarily identical. Differences can arise since these are estimates of SNP-h2 from two different models. In general, we've observed that the total h2 estimates from partitioned heritability analyses tend to be slightly higher on average than estimates from regular univariate analyses, with variability in either direction (e.g. see section on partitioned analysis here). 
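To make the "accounting for category overlap" point concrete, here is a short derivation that follows from the per-SNP definition quoted earlier in the thread. The symbol M_C (the number of SNPs in category C) is notation introduced for this sketch. Each coefficient is weighted by the size of its category, so the total is not a plain sum of the tau_C values.

```latex
h^2_j = \sum_{C \,:\, j \in C} \tau_C
\qquad\Longrightarrow\qquad
h^2 = \sum_{j} h^2_j
    = \sum_{j} \sum_{C \,:\, j \in C} \tau_C
    = \sum_{C} M_C \, \tau_C
```

A SNP that falls in several categories contributes each of their coefficients once, which is how the overlap enters the total. Which SNPs are counted in M_C depends on the count files in use (the .M_5_50 files by default, as discussed earlier in the thread).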

Cheers,
Raymond

BMI_baseline.log