LDSC_SEG for SNPs near selected genes?

Ying

Apr 17, 2018, 11:22:51 AM

to ldsc_users

Hi there,

I find the idea in the paper "Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types" very interesting: select tissue-specific genes and use the SNPs within +/- 100kb of them to test for heritability enrichment. So I tried it with a list of 500 genes of interest and a second list of 500 random genes (not of interest), including the SNPs near them (+/- 100kb). I mostly followed https://github.com/bulik/ldsc/wiki/Heritability-and-Genetic-Correlation to make a sumstats file of my selected SNPs, using the following:

./munge_sumstats.py \
--out scz.sumstats.gz \
--merge-alleles w_hm3.snplist \
--N ... \
--sumstats selected500g.txt

ldsc.py \
--h2 scz.sumstats.gz \
--ref-ld-chr eur_w_ld_chr/ \
--w-ld-chr eur_w_ld_chr/ \
--out scz_h2

But the values I get in the log file are very weird.

For my 500 genes of interest:

After merging with regression SNP LD, 30893 SNPs remain.

WARNING: number of SNPs less than 200k; this is almost always bad.

Using two-step estimator with cutoff at 30.

Total Observed scale h2: 0.3767 (0.1654)

Lambda GC: 1.2365

Mean Chi^2: 1.2041

Intercept: 1.0727 (0.0422)

Ratio: 0.3563 (0.2067)

For the 500 random genes:

After merging with regression SNP LD, 26319 SNPs remain.

WARNING: number of SNPs less than 200k; this is almost always bad.

Using two-step estimator with cutoff at 30.

Total Observed scale h2: 0.7406 (0.1667)

Lambda GC: 1.2365

Mean Chi^2: 1.2651

Intercept: 0.9727 (0.0412)

Ratio < 0 (usually indicates GC correction).

I have been reading the tutorials wiki and the posts in this group, but I still don't know what is wrong. What baseline models and weights do you recommend for a heritability analysis of SNPs near a gene list?

Thank you so much!

Raymond Walters

Apr 17, 2018, 12:31:01 PM

to Ying, ldsc_users

Hello,

It looks like you are subsetting your GWAS results to the SNPs in the annotation. The partitioned LDSC analysis is performed by defining LD scores that correspond to your annotation of interest and running the LD regression with all SNPs in the GWAS. You can find the tutorial for the partitioned analysis here (including the final example of running cell/tissue types on top of the baseline model) and instructions for computing the LD scores for your custom annotation in the second half of this tutorial.
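
To illustrate what "defining an annotation" means here, a minimal Python sketch follows. It assumes the SNP list comes from the reference panel (not from the GWAS sumstats) and that gene coordinates are available as (chromosome, start, end) tuples. The function name `annotate_snps` and the toy coordinates are hypothetical and not part of LDSC; the exact .annot column layout that LDSC expects is described in the wiki.

```python
# Hypothetical helper (not part of LDSC): flag every SNP that lies within
# +/- `window` base pairs of any gene in the list.
def annotate_snps(snps, genes, window=100_000):
    """snps: list of (chrom, bp, snp_id); genes: list of (chrom, start, end).
    Returns one row per input SNP: (chrom, bp, snp_id, annot), annot = 1 or 0."""
    rows = []
    for chrom, bp, snp_id in snps:
        in_window = any(
            g_chrom == chrom and start - window <= bp <= end + window
            for g_chrom, start, end in genes
        )
        rows.append((chrom, bp, snp_id, int(in_window)))
    return rows


# Toy coordinates, invented for illustration only.
genes = [("1", 1_000_000, 1_050_000)]
snps = [
    ("1", 950_000, "rs_a"),    # 50 kb upstream of the gene start: inside the window
    ("1", 1_200_000, "rs_b"),  # 150 kb past the gene end: outside the window
    ("2", 1_010_000, "rs_c"),  # matching position, but a different chromosome
]
annot = annotate_snps(snps, genes)
```

Note that every SNP keeps a row, whether its value is 0 or 1. The annotation only labels SNPs; none are dropped, and the regression itself still uses all SNPs in the GWAS.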

Hope that helps you get started!

Cheers,

Raymond


Ying

Apr 17, 2018, 4:50:22 PM

to ldsc_users

Thanks Raymond! That was helpful!
I was following the tutorial, and for my purpose I still have a few questions:
1) Should I make an .annot file to compute the heritability of the SNPs belonging to this annotation? That is, for each SNP in my sumstats file, I would put a 1 if it is in my list and a 0 if it is not, and format the result as .annot.
2) Should I then pass that .annot file to --overlap-annot in the following command?
I would really appreciate it if someone could give me tips on these.
Thanks!

Ying

python ldsc/ldsc.py \
  --h2 BMI.sumstats.gz \
  --ref-ld-chr baselineLD. \
  --frqfile-chr 1000G.EUR.QC. \
  --w-ld-chr weights.hm3_noMHC. \
  --overlap-annot my.anno \
  --print-coefficients \
  --print-delete-vals \
  --out BMI.baselineLD
