LD Scores for custom annotations not matching baseline model SNPs

tmaj...@broadinstitute.org

Dec 24, 2018, 1:46:53 PM
to ldsc_users
Hi-

I have generated some genome-wide annotations that I would like to add to the "baseline" model when partitioning heritability. I followed the instructions for generating annotation files and LD scores using 1000G Phase 3 data and HapMap 3 SNPs found here: https://github.com/bulik/ldsc/wiki/LD-Score-Estimation-Tutorial

All of this seemed to work until I tried partitioning heritability. I am getting the following error: "ValueError: LD Scores for concatenation must have identical SNP columns."

Looking into the discordance between the two LD score files (baseline and my own), I see that neither file is a superset of the other; each has some SNPs that the other lacks.

For example: 

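library(data.table)

# Baseline LD scores and my custom-annotation LD scores, both for chromosome 19
bt <- fread("1000G_EUR_Phase3_baseline/baseline.19.l2.ldscore.gz", data.table = FALSE, stringsAsFactors = FALSE)
rv <- fread("rarevar.1000G.EUR.QC.19.l2.ldscore", data.table = FALSE, stringsAsFactors = FALSE)

head(bt$SNP)
[1] "rs8100066"  "rs8105536"  "rs2312724"  "rs1020382"  "rs12459906" "rs11084928"

head(rv$SNP)
[1] "rs8100066"  "rs8102615"  "rs8105536"  "rs2312724"  "rs1020382"  "rs12459906"

# SNPs in the baseline file that are missing from my file
sum(!(bt$SNP %in% rv$SNP))
[1] 2051

# SNPs in my file that are missing from the baseline file
sum(!(rv$SNP %in% bt$SNP))
[1] 1533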
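
For anyone who would rather run the same check outside R, here is a minimal Python sketch of the comparison above. It assumes the LD score files are tab-delimited with a header row containing an SNP column (which is what the R session above reads); the helper names read_snps and compare_snps are made up for illustration and are not part of ldsc.

```python
import csv
import gzip


def read_snps(path):
    """Return the SNP IDs, in file order, from a tab-delimited LD score file.

    Assumes a header row with a column named "SNP"; handles plain or gzipped files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row["SNP"] for row in reader]


def compare_snps(path_a, path_b):
    """Return (only_in_a, only_in_b) as sorted lists of SNP IDs."""
    snps_a = set(read_snps(path_a))
    snps_b = set(read_snps(path_b))
    return sorted(snps_a - snps_b), sorted(snps_b - snps_a)


# Example call, using the file names from the R session above:
# only_baseline, only_custom = compare_snps(
#     "1000G_EUR_Phase3_baseline/baseline.19.l2.ldscore.gz",
#     "rarevar.1000G.EUR.QC.19.l2.ldscore",
# )
# print(len(only_baseline), len(only_custom))
```

Writing out the two returned lists makes it easy to check whether the extra SNPs on either side come from a different SNP list having been used when the LD scores were printed.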



I used the PLINK files and HapMap SNPs recommended in the documentation: 1000G_Phase3_plinkfiles.tgz and hapmap3_snps.tgz.

Any ideas as to why I'm not getting the same SNPs would be appreciated.

Thanks.

leihou...@gmail.com

Jan 22, 2019, 10:55:35 AM
to ldsc_users
I ran into the same problem, using the baseline annotations from the folder 1000G_Phase3_baselineLD_v2.1_ldscores.

Did you hear anything back from them?

tmaj...@broadinstitute.org

Feb 14, 2019, 11:56:28 AM
to ldsc_users
No, I have not heard back.

Paul Hook

Feb 18, 2019, 10:16:29 AM
to ldsc_users
Hello all,

I ran into this same issue with "1000G_Phase3_baseline_ldscores.tgz".

Within that baseline folder was a file called "print_snps.txt", which (it seems) was used to create the baseline LD scores. It appears to be a different list of SNPs from the HapMap SNPs referred to in the tutorial (https://github.com/bulik/ldsc/wiki/LD-Score-Estimation-Tutorial).

I got around this issue by reprocessing the baseline BED files with the HapMap SNPs.
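
As a rough illustration only (this is not the script used for the workaround above), the reprocessing could be sketched as generating the tutorial's make_annot.py and ldsc.py commands for each chromosome. The flags are the ones shown in the LD Score Estimation Tutorial; the directory names, file stems, and the function names reprocess_commands and as_shell are assumptions made for this example and should be adjusted to the local layout.

```python
import shlex


def reprocess_commands(bed_file, prefix, chroms=range(1, 23),
                       bfile_stem="1000G_EUR_Phase3_plink/1000G.EUR.QC",
                       snp_stem="hapmap3_snps/hm"):
    """Build make_annot.py and ldsc.py argument lists for each chromosome.

    bfile_stem and snp_stem are assumed locations of the unpacked PLINK
    files and HapMap 3 SNP lists; change them to match your own paths.
    """
    commands = []
    for chrom in chroms:
        annot = f"{prefix}.{chrom}.annot.gz"
        bfile = f"{bfile_stem}.{chrom}"
        # Step 1: turn the BED file into a thin annotation file.
        commands.append([
            "python", "make_annot.py",
            "--bed-file", bed_file,
            "--bimfile", f"{bfile}.bim",
            "--annot-file", annot,
        ])
        # Step 2: compute LD scores, printing only the chosen SNP list.
        commands.append([
            "python", "ldsc.py", "--l2",
            "--bfile", bfile,
            "--ld-wind-cm", "1",
            "--annot", annot,
            "--thin-annot",
            "--out", f"{prefix}.{chrom}",
            "--print-snps", f"{snp_stem}.{chrom}.snp",
        ])
    return commands


def as_shell(commands):
    """Render the argument lists as shell-quoted command lines."""
    return [shlex.join(cmd) for cmd in commands]


# Example: print the commands for chromosome 19 of a hypothetical BED file.
# for line in as_shell(reprocess_commands("my_annotation.bed", "my_annotation", chroms=[19])):
#     print(line)
```

The point that matters is that every set of LD scores passed together to the partitioned heritability step is printed with the same SNP list, since the error in the original post is raised when the SNP columns differ.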

Paul

Koen

Jul 25, 2019, 4:27:52 PM
to ldsc_users
Hi Paul,

I was wondering if you have any scripts available to reprocess the BED files?

Best,
Koen

On Monday, February 18, 2019 at 16:16:29 UTC+1, Paul Hook wrote: