Hi there,
I find the idea in the paper "Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types" very interesting: select tissue-specific genes and use the SNPs within +/- 100kb of them to test for heritability enrichment. So I tried the same thing with a gene list of interest (500 genes) and a list of 500 random genes (not of interest), in each case including the SNPs within +/- 100kb of the genes. I mostly followed https://github.com/bulik/ldsc/wiki/Heritability-and-Genetic-Correlation to make a sumstats file from my selected SNPs and run the h2 estimate, using the following commands:
./munge_sumstats.py \
--out scz.sumstats.gz \
--merge-alleles w_hm3.snplist \
--N ... \
--sumstats selected500g.txt
ldsc.py \
--h2 scz.sumstats.gz \
--ref-ld-chr eur_w_ld_chr/ \
--w-ld-chr eur_w_ld_chr/ \
--out scz_h2
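For reference, the SNP selection step behind selected500g.txt can be sketched as below. This is a minimal illustration, not something ldsc does itself: it assumes hypothetical inputs where each gene is a (name, chromosome, start, end) record and each SNP is an (ID, chromosome, base-pair position) record, and it keeps every SNP lying between 100 kb before the gene start and 100 kb after the gene end. The function name snps_near_genes is made up for this example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

WINDOW_BP = 100_000  # +/- 100 kb around each gene body

# Hypothetical record layouts: (gene, chrom, start, end) and (snp_id, chrom, bp)
Gene = Tuple[str, str, int, int]
Snp = Tuple[str, str, int]


def snps_near_genes(genes: Iterable[Gene], snps: Iterable[Snp],
                    window: int = WINDOW_BP) -> Set[str]:
    """Return the IDs of SNPs within `window` bp of any gene body."""
    # Group SNPs by chromosome and sort by position so each gene window
    # can be located with two binary searches.
    by_chrom: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for snp_id, chrom, bp in snps:
        by_chrom[chrom].append((bp, snp_id))
    for rows in by_chrom.values():
        rows.sort()
    positions = {c: [bp for bp, _ in rows] for c, rows in by_chrom.items()}

    selected: Set[str] = set()
    for _gene, chrom, start, end in genes:
        if chrom not in by_chrom:
            continue
        # Inclusive window [start - window, end + window]
        lo = bisect_left(positions[chrom], max(0, start - window))
        hi = bisect_right(positions[chrom], end + window)
        selected.update(snp_id for _, snp_id in by_chrom[chrom][lo:hi])
    return selected


if __name__ == "__main__":
    genes = [("GENE_A", "1", 1_000_000, 1_050_000)]
    snps = [("rs1", "1", 899_999), ("rs2", "1", 900_000),
            ("rs3", "1", 1_150_000), ("rs4", "1", 1_150_001),
            ("rs5", "2", 1_000_000)]
    print(sorted(snps_near_genes(genes, snps)))  # ['rs2', 'rs3']
```

Note that ldsc only uses the selected SNPs that also survive --merge-alleles w_hm3.snplist and the merge with the regression LD scores, so the SNP count reported in the log can be much smaller than the selected set.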
But the values I get in the log files look very strange.
For my 500 genes of interest:
After merging with regression SNP LD, 30893 SNPs remain.
WARNING: number of SNPs less than 200k; this is almost always bad.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.3767 (0.1654)
Lambda GC: 1.2365
Mean Chi^2: 1.2041
Intercept: 1.0727 (0.0422)
Ratio: 0.3563 (0.2067)
For the 500 random genes:
After merging with regression SNP LD, 26319 SNPs remain.
WARNING: number of SNPs less than 200k; this is almost always bad.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.7406 (0.1667)
Lambda GC: 1.2365
Mean Chi^2: 1.2651
Intercept: 0.9727 (0.0412)
Ratio < 0 (usually indicates GC correction).
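The last lines of each log can be checked by hand. As far as I understand ldsc's summary output, Ratio is (intercept - 1) / (mean chi^2 - 1) and Lambda GC is the median chi^2 divided by 0.4549 (the median of a 1-df chi-square). The sketch below assumes those definitions; the function names are made up for illustration. With the printed values it gives about 0.356 for the genes of interest (the small gap from 0.3563 is presumably rounding of the printed inputs) and about -0.103 for the random genes, which is why the second log reports "Ratio < 0".

```python
from statistics import median
from typing import Optional, Sequence

CHI2_1DF_MEDIAN = 0.4549  # median of a chi-square distribution with 1 df


def ldsc_ratio(intercept: float, mean_chisq: float) -> Optional[float]:
    """(intercept - 1) / (mean chi^2 - 1); None when mean chi^2 <= 1."""
    if mean_chisq <= 1:
        return None
    return (intercept - 1) / (mean_chisq - 1)


def lambda_gc(chisq: Sequence[float]) -> float:
    """Genomic control factor: observed median chi^2 over the null median."""
    return median(chisq) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Intercept and Mean Chi^2 copied from the two logs above.
    print(round(ldsc_ratio(1.0727, 1.2041), 3))  # 0.356 (genes of interest)
    print(round(ldsc_ratio(0.9727, 1.2651), 3))  # -0.103 (random genes)
```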
I have been reading the tutorials on the wiki and the posts in this group, but I still don't know what is wrong. Which baseline model and weights do you recommend for a heritability analysis of SNPs near a gene list?
Thank you so much!