Error parsing reference panel LD Score

ma34...@gmail.com

Sep 11, 2016, 9:03:29 AM

to ldsc_users

Good afternoon,

I am trying to estimate the partitioned heritability of a new annotation (a set of 34,000 SNPs with MAF > 0.05 in Europeans) with ldsc.py. I followed the instructions, namely:

- I converted the BMI summary statistics from GIANT into .sumstats

- I modified the CNS.*.annot files by changing the last column, setting it to 1 if the SNP is in my set and to 0 otherwise.

- I computed the LD scores for the 22 chromosomes, obtaining the *.l2.ldscore, *.M and *.M_5_50 files.

- Finally, I ran:


python ldsc.py \
    --h2 BMI.sumstats \
    --w-ld-chr weights. \
    --ref-ld-chr my_annotation.,baseline. \
    --overlap-annot \
    --not-M-5-50 \
    --out BMI_my_annotation \
    --print-coefficients

but it returns an error:

Reading summary statistics from BMI.sumstats ...
Read summary statistics for 1062313 SNPs.
Reading reference panel LD Score from my_annotation.,baseline.[1-22] ...
Error parsing reference panel LD Score.
[...]
ValueError: LD Scores for concatenation must have identical SNP columns.

So, does this mean that the LD score files for baseline. and for my_annotation. should have the same number of rows (SNPs)? If that is the reason, why don't the CNS.annot files have the same SNPs as the baseline.annot files?

Thank you very much for your help,

Miguel
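
The error message above says the SNP columns must be identical, so one way to narrow the problem down is to compare the SNP column of the two LD score files for a given chromosome. The following is only a minimal sketch using the Python standard library: it assumes each file is a whitespace-delimited table with a header row containing a column named SNP, and the function names and toy rows are made up for illustration (they are not part of ldsc).

```python
import io


def read_snp_column(handle):
    """Return the SNP column of a whitespace-delimited table with a header row.

    Assumption: the header contains a column literally named 'SNP'.
    """
    header = handle.readline().split()
    snp_idx = header.index("SNP")
    return [line.split()[snp_idx] for line in handle if line.strip()]


def compare_snp_columns(snps_a, snps_b):
    """Summarise how two SNP lists differ (membership and order)."""
    set_a, set_b = set(snps_a), set(snps_b)
    return {
        "identical": snps_a == snps_b,
        "only_in_a": len(set_a - set_b),
        "only_in_b": len(set_b - set_a),
        "same_set_different_order": set_a == set_b and snps_a != snps_b,
    }


# Toy stand-ins for one chromosome of each LD score file (invented rows).
toy_baseline = (
    "CHR\tSNP\tBP\tbaseL2\n"
    "1\trs1\t100\t1.5\n"
    "1\trs2\t200\t2.0\n"
    "1\trs3\t300\t0.7\n"
)
toy_custom = (
    "CHR\tSNP\tBP\tmy_annotationL2\n"
    "1\trs1\t100\t0.1\n"
    "1\trs3\t300\t0.4\n"
)

report = compare_snp_columns(
    read_snp_column(io.StringIO(toy_baseline)),
    read_snp_column(io.StringIO(toy_custom)),
)
print(report)
```

For real files, the `io.StringIO` objects would be replaced by open file handles; if the LD score files are gzip-compressed, `gzip.open(path, "rt")` returns a text handle that works the same way.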

ma34...@gmail.com

Sep 12, 2016, 9:32:44 AM

to ldsc_users

I found out that there are two sets of files called baseline.*.ldscore and baseline.*.annot, both available in the /alkesgroup/LDSCORE repository: one set is in 1000G_Phase1_baseline_ldscores.tgz and the other in 1000G_Phase3_baseline_ldscores.tgz. The analysis described in my previous post ran without problems only when I used the baseline files from 1000G_Phase1_baseline_ldscores.tgz. The two sets differ in the columns that precede the LD scores for the functional regions:

First 6 columns of the Phase 1 *.ldscore files:

CHR SNP BP CM MAF base

First 4 columns of the Phase 3 *.ldscore files:

CHR SNP BP baseL2
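
A quick way to tell which of the two layouts a given LD score file uses is to look at its header line. The sketch below is illustrative only: the function name is hypothetical, and the two header strings simply repeat the leading columns quoted above rather than the full headers of the real files.

```python
def layout_summary(header_line):
    """Describe the leading columns of an LD score file header.

    Assumption: the header is whitespace-delimited, as in the columns
    quoted in this thread.
    """
    cols = header_line.split()
    return {
        "n_columns": len(cols),
        "has_CM": "CM" in cols,
        "has_MAF": "MAF" in cols,
        "columns": cols,
    }


# Leading columns as quoted in the post above (not the complete headers).
phase1_header = "CHR\tSNP\tBP\tCM\tMAF\tbase"
phase3_header = "CHR\tSNP\tBP\tbaseL2"

phase1 = layout_summary(phase1_header)
phase3 = layout_summary(phase3_header)
print(phase1)
print(phase3)
```

A header with CM and MAF columns would point to the Phase 1 style described here, and one without them to the Phase 3 style.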

hil...@mit.edu

Sep 12, 2016, 2:38:24 PM

to ldsc_users

Hi,

Sorry for this bug. I recently updated the baseline LD scores but not the cell type group LD scores. I'll upload new files this week; in the meantime, you can use your own files as long as they have the same SNPs in the same order as the baseline files.

Hilary

hil...@mit.edu

Sep 16, 2016, 10:45:32 AM

to ldsc_users

Hi,

There are already cell type group LD scores for 1000G Phase 3 on the website. Were you using Phase 1 CNS LD scores with Phase 3 baseline LD scores? The set of SNPs has to match when you combine two different sets of reference LD scores, so you should either download the Phase 3 cell type group LD scores and edit one of those, or take the baseline annot files you are using, delete all of the annotation columns, and add a single column containing your annotation.
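
The second option (keep the SNP rows of the baseline annot file, drop its annotation columns, and add one column for the new annotation) could be sketched roughly as follows. This is not code from ldsc: the function name, the annotation name my_annotation, the toy rows, and the assumption that the identifier columns are called CHR, BP, SNP and CM are all illustrative and should be checked against the actual header of the annot files in use.

```python
def make_single_annot(annot_lines, my_snps, name="my_annotation",
                      id_cols=("CHR", "BP", "SNP", "CM")):
    """Build a one-annotation table with the same SNP rows as the input.

    annot_lines: iterable of whitespace-delimited lines, header first.
    my_snps:     set of SNP IDs that belong to the new annotation.
    id_cols:     assumed names of the identifier columns to keep.
    """
    lines = iter(annot_lines)
    header = next(lines).split()
    keep = [header.index(c) for c in id_cols if c in header]
    snp_idx = header.index("SNP")

    out = ["\t".join([header[i] for i in keep] + [name])]
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        flag = "1" if fields[snp_idx] in my_snps else "0"
        out.append("\t".join([fields[i] for i in keep] + [flag]))
    return out


# Toy baseline-style annot rows (invented values and column names).
toy_annot = [
    "CHR\tBP\tSNP\tCM\tbase\tCoding_UCSC",
    "1\t100\trs1\t0.0\t1\t0",
    "1\t200\trs2\t0.1\t1\t1",
    "1\t300\trs3\t0.2\t1\t0",
]

new_annot = make_single_annot(toy_annot, {"rs2"})
print("\n".join(new_annot))
```

Because the rows are copied in their original order, the resulting annot file keeps the same SNPs in the same order as the baseline file it was derived from; LD scores would then still need to be computed from it per chromosome, as in the first post.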