Hi Brendan,
We are trying to estimate heritability from the PGC MDD data set, as described in the paper "LD Score regression distinguishes confounding from polygenicity in genome-wide association studies".
The supplementary material reports a heritability estimate for MDD of around 0.409 after adjusting to the liability scale. However, when we run LDSC on the PGC MDD data, we only get an estimate of 0.1783.
Our procedure was as follows:
#Prepare LD Score
1. Download 1000g vcf file
2. plink --vcf <vcf files> --biallelic-only --make-bed --out <output>
3. Extract all EUR samples
4. plink --bfile <EUR output> --make-bed --out <EUR output>.filter --maf 0.002 #Remove singletons (1/503)
5. python ldsc.py --bfile <EUR output>.filter --l2 --ld-wind-kb 1000 --out EUR #We would like to use a 1 Mb window instead of cM
#Prepare test statistic (lift over to hg19)
awk 'NR!=1 {print "chr"$2" "$3-1" "$3" "$1}' pgc.mdd.full.2012-04.txt > mdd.hg18.bed
liftOver mdd.hg18.bed hg18ToHg19.over.chain.gz mdd.hg19.bed mdd.hg19.failed
awk 'NR==FNR {hash[$4]=$3;next}{printf $1"\t"$2"\t"hash[$1]; for(i=4; i <= NF; ++i){printf "\t"$i}; print""}' mdd.hg19.bed pgc.mdd.full.2012-04.txt > pgc.mdd.hg19.txt
#munge
python munge_sumstats.py --sumstats pgc.mdd.hg19.txt --out mdd.ldsc --snp snpid --p pval --N-cas 9240 --N-con 9519 --signed-sumstats or,1
#Perform LD Score Regression
python ldsc.py --h2 mdd.ldsc.sumstats.gz --ref-ld EUR --out mdd.ldsc.res --samp-prev 0.4925636 --pop-prev 0.15 --print-coefficients --w-ld EUR
#Result:
Beginning analysis at Tue Nov 17 10:36:51 2015
Reading summary statistics from mdd.ldsc.sumstats.gz ...
Read summary statistics for 905020 SNPs.
Reading reference panel LD Score from EUR ...
Read reference panel LD Scores for 23735408 SNPs.
Reading regression weight LD Score from EUR ...
Read regression weight LD Scores for 23735408 SNPs.
After merging with reference panel LD, 895562 SNPs remain.
After merging with regression SNP LD, 895562 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Liability scale h2: 0.1783 (0.0327)
Lambda GC: 1.0741
Mean Chi^2: 1.0793
Intercept: 1.016 (0.0085)
Ratio: 0.2023 (0.1068)
Analysis finished at Tue Nov 17 10:40:17 2015
Total time elapsed: 3.0m:25.83s
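As a sanity check on the prevalence inputs above, here is a minimal sketch (not part of LDSC; the function names are illustrative). It recomputes the sample prevalence from the case/control counts and evaluates the observed-to-liability scaling factor, assuming the standard transformation K^2 (1-K)^2 / (P (1-P) z^2), where K is the population prevalence, P the sample prevalence, and z the standard normal density at the liability threshold. Whether LDSC applies exactly this formula internally is an assumption here.

```python
from statistics import NormalDist


def sample_prevalence(n_cases, n_controls):
    """Proportion of cases in the study sample (the value given to --samp-prev)."""
    return n_cases / (n_cases + n_controls)


def obs_to_liab_factor(samp_prev, pop_prev):
    """Scaling factor from observed-scale h2 to liability-scale h2.

    Assumes the standard ascertainment-corrected transformation:
    K^2 (1-K)^2 / (P (1-P) z^2).
    """
    std_normal = NormalDist()
    # Liability threshold above which an individual is a case.
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    return (pop_prev ** 2 * (1.0 - pop_prev) ** 2) / (
        samp_prev * (1.0 - samp_prev) * z ** 2
    )


if __name__ == "__main__":
    P = sample_prevalence(9240, 9519)      # counts from the munge step
    factor = obs_to_liab_factor(P, 0.15)   # --pop-prev from the h2 step
    print("sample prevalence:", P)
    print("observed-to-liability factor:", factor)
    # Implied observed-scale h2 behind the reported liability-scale estimate.
    print("implied observed-scale h2:", 0.1783 / factor)
```

Dividing the reported liability-scale estimate by this factor gives the observed-scale value it implies, which makes it easier to tell whether a discrepancy comes from the prevalence settings or from the regression itself.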
Is this the correct way to use LDSC? If so, why does our estimate differ so much from the published one?
Thank you for your help!
Sam