Negative h2, high intercept, ricopili summary stats


mullin...@gmail.com

Mar 26, 2017, 11:24:19 AM
to ldsc_users
I have used the ricopili pipeline to run a case-only GWAS on a secondary phenotype, and I get some strange ldsc results (below). The mean chi2 should be large enough, I get no data-munging errors, and the number of cases and controls is correct. I ran this GWAS using 5 PCs, and when I increased to 10 PCs I saw more or less the same ldsc results. Would anyone have any suggestions on what the issue could be?

Thanks a lot for your help!
Niamh

Call:
./ldsc.py \
--h2 daner_samdd_060317.gz.ldsc.sumstats.gz \
--ref-ld-chr /home/gwas/ldsc//ref/eur_w_ld_chr/ \
--out ldsc.h2.daner_samdd_060317.gz.observed \
--w-ld-chr /home/gwas/ldsc//ref/eur_w_ld_chr/

Beginning analysis at Tue Mar  7 00:47:35 2017
Reading summary statistics from daner_samdd_060317.gz.ldsc.sumstats.gz ...
Read summary statistics for 1078840 SNPs.
Reading reference panel LD Score from /home/gwas/ldsc//ref/eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1293150 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /home/gwas/ldsc//ref/eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1293150 SNPs.
After merging with reference panel LD, 1072077 SNPs remain.
After merging with regression SNP LD, 1072077 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: -0.2763 (0.0486)
Lambda GC: 1.0375
Mean Chi^2: 1.0411
Intercept: 1.1018 (0.0083)
Ratio: 2.4741 (0.2011)


Raymond Walters

Mar 27, 2017, 1:44:30 PM
to mullin...@gmail.com, ldsc_users
Hi Niamh,
I agree those results are quite strange. The oversimplified answer is that you have stronger GWAS results from SNPs with low LD scores (i.e. in areas with limited LD) than from SNPs with high LD scores.
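To make that concrete, here is a minimal Python sketch (not part of ldsc) that reproduces the Ratio line of the log above from its Intercept and Mean Chi^2 lines. It assumes the ratio is computed as (intercept - 1) / (mean chi^2 - 1), which is consistent with both logs in this thread. The function names are made up for illustration, and `implied_slope_term` is only a rough simplification of the fitted regression.

```python
def ldsc_ratio(intercept, mean_chisq):
    """Ratio as printed in the ldsc log: (intercept - 1) / (mean chi^2 - 1)."""
    return (intercept - 1.0) / (mean_chisq - 1.0)


def implied_slope_term(intercept, mean_chisq):
    """Mean chi^2 minus the intercept.

    Simplification (an assumption of this sketch): treat the fitted line as
    passing through the mean point, so this difference approximates the
    average contribution of the LD score slope. A negative value means the
    slope, and hence the h2 estimate, is negative.
    """
    return mean_chisq - intercept


# Values copied from the first log in this thread.
intercept, mean_chisq = 1.1018, 1.0411
print("ratio:", round(ldsc_ratio(intercept, mean_chisq), 3))
print("slope term:", round(implied_slope_term(intercept, mean_chisq), 4))
```

A ratio above 1 means the intercept alone exceeds the mean chi-square, so the only way for the regression to fit the data is with a negative slope on LD score, which is what a negative h2 estimate reflects.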

A few things I’d look at:

1) Can you verify that the LD scores are a good match to the population for your GWAS samples? I.e. since it looks like you’re using the standard European-ancestry LD scores, can you verify that the GWAS samples are European-ancestry?

2) Consider tightening the QC thresholds in munge_sumstats.py to make sure you’re getting SNPs with high INFO scores and fairly consistent sample sizes. (I wouldn’t expect this to be a complete fix, but it can’t hurt.)

3) Are you sure your trait is fairly polygenic? If most of the genetic risk comes from a small number of loci and they happen to have small LD scores, that could drive this kind of result. I haven’t seen that yield results quite this extreme before, and would normally expect to see larger SEs reflecting that kind of instability from outlier loci, but it’s still worth a look. (The easiest check for this is to exclude your top loci and rerun ldsc.)

4) Is it possible that your trait is primarily driven by rare variants rather than common variants? LD scores are correlated with MAF, so if your SNP effects are heavily skewed towards low MAF variants, that could give you a similar skew towards effects in low LD score variants. Can run MAF-stratified ldsc as a partitioned LD score analysis with either:
a) Newly computed LD scores that are partitioned by MAF using the --cts-bin flag
b) Pre-computed scores from downloading the continuous annotations (1000G_Phase3_baselineLD_ldscores.tgz) from https://data.broadinstitute.org/alkesgroup/LDSCORE/, and extracting just the MAFbin and base annotations

5) It might be worth plotting the actual LD score regression, i.e. as in Fig. 2 of Bulik-Sullivan et al. There’s not an automated script for this, but you can extract the LD scores from ./eur_w_ld_chr/*ldscore.gz, match them to your SNPs, and then bin by LD score to get the mean chi-sq by LD bin. That may give some more insight into whether there’s a particular portion of the LD distribution that’s behaving oddly in your data.
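
The binning described in item 5 could be sketched roughly as below, using only the Python standard library. This is an illustration under stated assumptions, not an ldsc script: the helper names are hypothetical, the column names `SNP`, `L2`, and `Z` are assumed to match the LD score and munged sumstats files, and the per-chromosome file name pattern in the commented usage is likewise an assumption to check against your own files.

```python
import gzip
from statistics import mean


def read_column_by_snp(path, value_col, snp_col="SNP"):
    """Read a gzipped, whitespace-delimited table into {SNP: float(value)}."""
    out = {}
    with gzip.open(path, "rt") as fh:
        header = fh.readline().split()
        i_snp, i_val = header.index(snp_col), header.index(value_col)
        for line in fh:
            fields = line.split()
            try:
                out[fields[i_snp]] = float(fields[i_val])
            except (IndexError, ValueError):
                continue  # skip rows with missing or non-numeric values
    return out


def mean_chisq_by_ld_bin(ld_by_snp, z_by_snp, n_bins=50):
    """Return [(mean LD score, mean chi-square)] over equal-count LD score bins."""
    # Keep SNPs present in both inputs; chi-square is the squared Z score.
    shared = sorted(
        (ld_by_snp[s], z_by_snp[s] ** 2)
        for s in ld_by_snp.keys() & z_by_snp.keys()
    )
    n_bins = min(n_bins, len(shared))
    bins = []
    for b in range(n_bins):
        lo = b * len(shared) // n_bins
        hi = (b + 1) * len(shared) // n_bins
        chunk = shared[lo:hi]
        bins.append((mean(x for x, _ in chunk), mean(y for _, y in chunk)))
    return bins


# Hypothetical usage (paths and column names are assumptions):
# ld = {}
# for chrom in range(1, 23):
#     ld.update(read_column_by_snp(f"eur_w_ld_chr/{chrom}.l2.ldscore.gz", "L2"))
# z = read_column_by_snp("my_trait.sumstats.gz", "Z")
# for mean_ld, mean_chisq in mean_chisq_by_ld_bin(ld, z, n_bins=50):
#     print(mean_ld, mean_chisq)
```

Plotting the second value against the first for each bin gives the kind of picture in Fig. 2. Bins are equal-count rather than equal-width here, so the sparse high-LD tail does not end up in nearly empty bins.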

Cheers,
Raymond




--
You received this message because you are subscribed to the Google Groups "ldsc_users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ldsc_users+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ldsc_users/3cdf959f-736f-469c-a59c-6dba6afd8c7a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

hes...@broadinstitute.org

Jul 6, 2017, 1:54:58 PM
to ldsc_users
Hi Raymond et al.,
I'm having a related problem, so I wanted to follow up on this thread. I'm copying below the log file from LDSC, which is giving me suspicious results. I'm also including a plot of Mean Chi-sq vs. Mean LD score for 50 SNP bins from my results to confirm the strongly negative h2 results found by LDSC. I haven't thought of a reason why there is strong signal from regions of low LD. I'm performing a GWAS meta-analysis of a sub-sample from PGC-SCZ (logistic regression with 10 covariates ---> inverse-variance meta-analysis in METAL). 
Can you offer any suggestions?
Best wishes,
Jon

################################
*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.0
* (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--h2 res_scz.sumstats.gz \
--ref-ld-chr eur_w_ld_chr/ \
--out res_scz_h2 \
--w-ld-chr eur_w_ld_chr/

Beginning analysis at Thu Jul  6 19:12:02 2017
Reading summary statistics from res_scz.sumstats.gz ...
Read summary statistics for 1204632 SNPs.
Reading reference panel LD Score from eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 1179415 SNPs remain.
After merging with regression SNP LD, 1179415 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: -0.9682 (0.1103)
Lambda GC: 1.2398
Mean Chi^2: 1.2776
Intercept: 1.5918 (0.0187)
Ratio: 2.132 (0.0672)
Analysis finished at Thu Jul  6 19:12:17 2017
Total time elapsed: 15.73s
################################
ldscore.png

Raymond Walters

Jul 6, 2017, 4:49:18 PM
to hes...@broadinstitute.org, ldsc_users
Hi Jon,
Weird… 
As a starting point:
1) Have you looked into the questions about ancestry and QC thresholds from the previous conversation? (#1 and #2 in my March 27th email)
2) For your attached plot, are you looking at some restricted set of SNPs? The displayed range of LD scores (17-20) is quite narrow.
3) Can you say how the SCZ sub-sample is defined? Wondering if there’s any chance it’s interacting weirdly with the GWAS results. (Can take that conversation out of this thread if you don’t want to talk about the details in a public forum.)

Cheers,
Raymond





hes...@broadinstitute.org

Jul 6, 2017, 4:58:43 PM
to ldsc_users, hes...@broadinstitute.org
Hi Raymond,
Thanks for your prompt reply.
1. I'm using the post-imputation PGC-SCZ data made by the workgroup, so ancestry and QC shouldn't be confounding this.
2. This particular plot shows all the markers that matched to HM3. My primary analysis uses a filtered set of markers, though, and shows the same effect as the full HM3-matched set.
3. Yes, I can send you an e-mail with a description of the sub-sampling procedure.
Best wishes,
Jon

Raymond Walters

Jul 7, 2017, 6:30:49 PM
to hes...@broadinstitute.org, ldsc_users
Hi,
For anyone watching the thread / future reference, it does look like the sampling design is likely responsible for the strange ldsc results. 
Cheers,
Raymond

